BullshitBench Results Show Anthropic Claude Models Dominate Top Seven Spots in Nonsense Detection Rankings