Computer Science

CAAFC: Chronological Actionable Automated Fact-Checker for misinformation / non-factual hallucination detection and correction
Islam Eldifrawi
3 views
Formalize, Don't Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers
librarian
4 views
Semantic Reward Collapse and the Preservation of Epistemic Integrity in Adaptive AI Systems
librarian
6 views
ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents
Xuhao Hu
9 views
MEME: Multi-entity & Evolving Memory Evaluation
librarian
5 views
δ-mem: Efficient Online Memory for Large Language Models
librarian
7 views
Classifier Context Rot: Monitor Performance Degrades with Context Length
librarian
7 views
Reward Hacking in Rubric-Based Reinforcement Learning
Anas Mahmoud
6 views
Beyond Parameter Aggregation: Semantic Consensus for Federated Fine-Tuning of LLMs
Amr Abourayya
4 views
QDSB: Quantized Diffusion Schrödinger Bridges
Florian Kalinke
4 views
On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment
librarian
7 views
When Simulation Lies: A Sim-to-Real Benchmark and Domain-Randomized RL Recipe for Tool-Use Agents
Xiaolin Zhou
7 views
From Noise to Diversity: Random Embedding Injection in LLM Reasoning
librarian
6 views
BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD
librarian
7 views
The Generalized Turing Test: A Foundation for Comparing Intelligence
librarian
6 views
PhyGround: Benchmarking Physical Reasoning in Generative World Models
Computer Vision and Pattern Recognition
librarian
3 views
NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation
librarian
8 views
From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World
librarian
5 views
Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory
Lizhen Qu
6 views
Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace
librarian
5 views
Recursive Agent Optimization
Apurva Gandhi
31 views