RESEARCH Chatbots Struggle With News Accuracy and Sourcing Ahead of U.S. Midterms, Bloomberg Reports 8/10 3 min read 4 weeks ago
RESEARCH Andrej Karpathy Joins Anthropic’s Pretraining Team, Leaves OpenAI for Frontier LLM Work 8/10 3 min read 1 month ago
RESEARCH Microsoft’s MDASH Uses 100+ AI Agents to Find Windows Vulnerabilities, Tops CyberGym Benchmark 7/10 3 min read 1 month ago
RESEARCH DeepMind Unveils Gemini-Powered AI Pointer That Understands What You’re Pointing At 7/10 3 min read 1 month ago
RESEARCH Stanford Study: Overworked AI Agents Adopt Marxist Language and Pass Solidarity Messages 8/10 3 min read 1 month ago
RESEARCH Google Researchers Say AI Was Used to Build a Zero-Day Hacking Tool 7/10 3 min read 1 month ago
RESEARCH MATS, Redwood, Oxford, and Anthropic Find SFT+RL Combo Recovers 88-99% of AI Capability After Sandbagging 7/10 4 min read 1 month ago
RESEARCH METR Says It Can Barely Measure Claude Mythos; Palo Alto Calls Frontier Models ‘Step-Change’ 8/10 4 min read 1 month ago
RESEARCH Palisade Research: AI Self-Replication via Hacking Jumps from 6% to 81% Success Rate in One Year 8/10 4 min read 1 month ago
RESEARCH Fields Medalist Tim Gowers: ChatGPT 5.5 Pro Produced ‘PhD-Level’ Math Research in 2 Hours With Zero Guidance 8/10 4 min read 1 month ago
RESEARCH Anthropic’s Natural Language Autoencoders Catch Claude Opus 4.6 Suspecting It’s Being Tested in Audits 7/10 4 min read 1 month ago