Anthropic launches updated Claude 3.5 Sonnet model that beats GPT-4o and Gemini 1.5 Pro

Anthropic today announced the updated Claude 3.5 Sonnet model and the new Claude 3.5 Haiku model. The updated Claude 3.5 Sonnet model delivers improvements across the board, with significant gains in coding. Claude 3.5 Haiku is Anthropic's answer to OpenAI's GPT-4o Mini and Google's Gemini 1.5 Flash. It will be available for the same price as its predecessor but with significant performance improvements.

Claude 3.5 Sonnet improvements:

The SWE-bench Verified score increased from 33.4% to 49.0%, which Anthropic says is higher than that of any other publicly available model. The TAU-bench score increased from 62.6% to 69.2% in the retail domain and from 36.0% to 46.0% in the airline domain. GPQA and MMLU Pro scores increased to 65% and 78%, respectively, both higher than Gemini 1.5 Pro's scores.

The new Claude 3.5 Haiku model beats Claude 3 Opus, the largest model in Anthropic's previous generation, on many AI benchmarks. Claude 3.5 Haiku scores 40.6% on SWE-bench Verified, outperforming the original Claude 3.5 Sonnet and OpenAI GPT-4 Turbo. Claude 3.5 Haiku will initially be available as a text-only model, with image support coming later.

Anthropic also highlighted that the US AI Safety Institute (US AISI) and the UK AI Safety Institute (UK AISI) jointly conducted pre-deployment testing of the new Claude 3.5 Sonnet model as part of the agreement signed earlier this year. Under Anthropic's Responsible Scaling Policy, the updated Claude 3.5 Sonnet model falls under the ASL-2 Standard.

The updated Claude 3.5 Sonnet is now available to all developers at the same price as before via the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI. The new Claude 3.5 Haiku model will be available later this month.

The improved performance and affordability of these new Claude 3.5 models make them attractive options for developers and businesses seeking advanced language models for their AI applications.
