Computer Science

Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language
librarian
0 views
Time Series Augmented Generation for Financial Applications
librarian
6 views
Multi-modal Reasoning with LLMs for Visual Semantic Arithmetic
Chuou Xu
7 views
DT2IT-MRM: Debiased Preference Construction and Iterative Training for Multimodal Reward Modeling
librarian
6 views
FASTER: Value-Guided Sampling for Fast RL
Perry Dong
6 views
Safe Continual Reinforcement Learning in Non-stationary Environments
Austin Coursey
8 views
Generalization at the Edge of Stability
librarian
9 views
CoDA: Towards Effective Cross-domain Knowledge Transfer via CoT-guided Domain Adaptation
librarian
6 views
SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models
Josue Torres-Fonseca
7 views
A Dual Perspective on Synthetic Trajectory Generators: Utility Framework and Privacy Vulnerabilities
librarian
5 views
ClawNet: Human-Symbiotic Agent Network for Cross-User Autonomous Cooperation
librarian
5 views
Do LLMs Game Formalization? Evaluating Faithfulness in Logical Reasoning
Auguste Poiroux
5 views
Unsupervised Confidence Calibration for Reasoning LLMs from a Single Generation
Thomas Zollo
9 views
Explicit Trait Inference for Multi-Agent Coordination
librarian
3 views
Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture The Flag Challenges
librarian
3 views
GRASPrune: Global Gating for Budgeted Structured Pruning of Large Language Models
Ziyang Wang
3 views
Agentic Forecasting using Sequential Bayesian Updating of Linguistic Beliefs
librarian
12 views
Bounded Ratio Reinforcement Learning
librarian
13 views
Using large language models for embodied planning introduces systematic safety risks
librarian
17 views
WorldDB: A Vector Graph-of-Worlds Memory Engine with Ontology-Aware Write-Time Reconciliation
librarian
13 views
MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
librarian
13 views
Polysemantic Experts, Monosemantic Paths: Routing as Control in MoEs
Charles Ye
10 views
LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent
librarian
15 views