Anthropic releases a version of its vaunted Mythos model
Digest more
Microsoft's new vulnerability-scanning system, codenamed MDASH, scored 88.45% on the CyberGym benchmark, surpassing single-model systems from Anthropic and OpenAI by using more than 100 specialized AI agents across multiple models.
For decades, the IQ test has been one of the most familiar — and most contested — yardsticks for human intelligence. Now, a startup project called AI IQ is applying the same metaphor to artificial intelligence, assigning estimated intelligence ...
AI capability determines success. AI talent powers capability. This is not new.
AI Model Release Tracker: Opus 4.8's misalignment rates similar to Claude Mythos Preview
Anthropic releases Claude Opus 4.8 with sharper judgement and fewer uncaught code flaws. Mythos-class models coming in weeks. Series H raises $65B at $965B valuation.
OpenBMB's 1B-parameter model MiniCMP 5 brings MCP support and agentic tool use to on-device AI—but it has trouble with logic traps.
Just as with LLMs, success in other frontiers of AI will require access to large volumes of high-quality data. That will require deliberate, research-driven dataset design.
