In a blog post published Monday, Anthropic said that it tested its latest model, Claude 3.7 Sonnet, on the Game Boy classic ...
OpenAI researchers accused xAI about publishing misleading Grok 3 benchmarks. The truth is a little more nuanced.
Welcome to TechCrunch’s regular AI newsletter! We’re going on hiatus for a bit, but you can find all our AI coverage, including my columns, our daily analysis, and breaking news stories, at TechCrunch ...
Elon Musk 's AI firm, xAI, has been accused by an OpenAI employee of releasing deceptive benchmark results for Grok 3. The ...
Artificial intelligence model makers routinely publish benchmark scores of their performance, but the leaderboard race may be ...
Rigetti Computing (RGTI) is catching the spotlight as the quantum computing race heats up. After Microsoft (MSFT) unveiled ...
A new test from OpenAI researchers found that LLMs were unable to resolve some freelance coding tests, failing to earn full ...
Elon Musk's xAI launches Grok 3, outperforming ChatGPT and Google Gemini in benchmarks with 200,000 GPUs and advanced ...
Microsoft Research introduced Magma, an integrated AI foundation model that combines visual and language processing to ...
Did xAI manipulate Grok-3’s benchmarks? Explore the controversy, strengths, and weaknesses of this AI model in our in-depth ...
Here at TC, we often reluctantly report benchmark figures because they're one of the few (relatively) standardized ways the AI industry measures model improvements. Popular AI benchmarks tend to ...