BENCHMARKS | Function Calling Harness Pushes Qwen From 6.75 Percent to 100 Percent Success on Complex Schemas | 8/10 | 4 min read | 1 month ago
BENCHMARKS | UC Berkeley Expands BFCL Benchmark to v4, Adding Agentic Evaluation for LLM Function Calling | 4/10 | 2 min read | 2 months ago