ai benchmark - Search News

Anthropic used Pokémon to benchmark its newest AI model

In a blog post published Monday, Anthropic said that it tested its latest model, Claude 3.7 Sonnet, on the Game Boy classic Pokémon Red. The company equipped the model with basic memory, screen pixel input, and function calls to press buttons and navigate around the screen, allowing it to play Pokémon continuously.

Reuters · 9h

Anthropic launches advanced AI hybrid reasoning model

Anthropic on Monday launched an advanced AI model that can produce faster responses or display its step-by-step reasoning process, as it looks to gain a competitive edge in the generative artificial intelligence industry.

Bloomberg L.P. · 9h

Anthropic’s New AI Model Lets Users Decide How Much It Reasons

The OpenAI rival wants to simplify the user experience and rethink when people actually need AI systems to mimic human reasoning.

The Verge on MSN · 9h

Anthropic’s new ‘hybrid reasoning’ AI model is its smartest yet

In addition to a new model, Anthropic is also releasing a “limited research preview” of its “agentic” coding tool called Claude Code. While Anthropic already powers AI coding tools like Cursor, it’s pitching Claude Code as “an active collaborator that can search and read code,

Daily Journal · 7h

Anthropic releases its 'smartest' AI model

OpenAI rival Anthropic on Monday released what it said is its smartest artificial intelligence model to date, particularly when it comes to computer coding.

NBC New York · 9h

Anthropic says it's released its ‘most intelligent' AI model yet as competition ramps up

Anthropic unveiled its latest frontier model, Claude Sonnet 3.7, on Monday and claims it’s the company’s “most intelligent” version yet.

Thurrott · 6h

Anthropic’s First Reasoning AI Model is Now Available

Anthropic today announced the immediate availability of Claude 3.7 Sonnet, its first reasoning AI model, to all customers.

Did xAI lie about Grok 3’s benchmarks?

OpenAI researchers accused xAI of publishing misleading Grok 3 benchmarks. The truth is a little more nuanced.

The Register on MSN · 9d

Why AI benchmarks suck

Anyone remember when Volkswagen rigged its emissions results? Oh... AI model makers love to flex their benchmark scores. But ...

healthcareinfosecurity.com · 7d

Researchers Caution AI Benchmark Score Reliability

Artificial intelligence model makers routinely publish benchmark scores of their performance, but the leaderboard race may be ...

on MSN · 5d

This Week in AI: Maybe we should ignore AI benchmarks for now

Welcome to TechCrunch’s regular AI newsletter! We’re going on hiatus for a bit, but you can find all our AI coverage, including my columns, our daily analysis, and breaking news stories, at TechCrunch ...

newsbytesapp.com · 1d

Musk's xAI may have fudged Grok 3's AI benchmark results

Elon Musk's AI firm, xAI, has been accused by an OpenAI employee of releasing deceptive benchmark results for Grok 3. The ...

18h

Rigetti Computing Stock (RGTI): Benchmark Raises Price Target on Quantum Progress and AI Potential

Rigetti Computing (RGTI) is catching the spotlight as the quantum computing race heats up. After Microsoft (MSFT) unveiled ...

Futurism on MSN · 2d

Microsoft CEO Admits That AI Is Generating Basically No Value

Microsoft CEO Satya Nadella, whose company has invested billions of dollars in ChatGPT maker OpenAI, has had it with the AI ...

AI can fix bugs—but can’t find them: OpenAI’s study highlights limits of LLMs in software engineering

A new test from OpenAI researchers found that LLMs were unable to resolve some freelance coding tests, failing to earn full ...

Microsoft’s new AI agent can control software and robots

Microsoft Research introduced Magma, an integrated AI foundation model that combines visual and language processing to ...

Did xAI Cheat and Manipulate Grok-3 Benchmarks?

Did xAI manipulate Grok-3’s benchmarks? Explore the controversy, strengths, and weaknesses of this AI model in our in-depth ...

Phone World · 1d

AI Benchmark Controversy: OpenAI and xAI Clash Over Grok 3’s Performance Claims

OpenAI and Elon Musk’s AI company, xAI, are engaged in a public dispute over recent test results of Grok 3's performance.
