Skip to main content

GPT-4o and Gemini 1.5 Pro just got beat in the AI race

a screenshot of claude 3.5 sonnet, with an 8-bit crab
Anthropic

There’s a new leader, technically, in the race for AI assistant dominance, and it’s Anthropic’s new Claude 3.5 Sonnet. The newly released model outperforms both Gemini 1.5 Pro and ChatGPT-4o across a spectrum of benchmark tests, the company announced on Thursday.

This new iteration of Sonnet is the first in Anthropic’s upcoming line of 3.5 models, and it significantly outperforms the more expansive Opus 3.0 model, and does so at a fraction of the larger model’s energy cost. Compute efficiency is becoming an increasingly important aspect of AI system design, especially as the cost of both powering and cooling AI data centers soars while the infrastructure pushes into the gigawatt range.

Claude 3.5 Sonnet for vision

“Claude 3.5 Sonnet operates at twice the speed of Claude 3 Opus,” the Anthropic team wrote in a blog post. “This performance boost, combined with cost-effective pricing, makes Claude 3.5 Sonnet ideal for complex tasks such as context-sensitive customer support and orchestrating multistep workflows.”

Recommended Videos

The new model has reportedly set benchmark results across three standardized tests: graduate-level reasoning with GPQA, undergraduate-level knowledge with MMLU, and coding proficiency with HumanEval. It beat out Google’s Gemini 1.5 Pro, Meta’s Llama-400b, and OpenAI’s ChatGPT-4o, though not by any huge margin and typically only by a couple percentage points.

A table showing Claude 3.5 Sonnet's performance compared to other leading AI systems.
Anthropic

Sonnet 3.5 is being billed as Anthropic’s “strongest vision model yet. ” It’s capable of performing a number of vision-based tasks — like interpreting charts and graphs or transcribing text from imperfect image sources like screenshots or scanned receipts — more accurately than Opus 3.0. In fact, Sonnet 3.5 beat out Opus 3.0 by anywhere from 6 to 17 points across industry standard vision benchmarks. The new model is also reportedly much more competent at handling humor and can converse in a much more lifelike manner.

Sonnet will also be the first Anthropic AI to offer the Artifacts feature to users. Rather than generate images or code snippets directly into the flow of the conversation, Artifacts will create that content in a dedicated space to the side of the chat. This allows users to create “a dynamic workspace where they can see, edit, and build upon Claude’s creations in real time, seamlessly integrating AI-generated content into their projects and workflows,” the Anthropic team claims. It also announced that Claude will soon support team collaboration wherein a company can store its data, documents and projects in a single, central silo, with Claude acting as an on-demand assistant.

You can try out Claude 3.5 Sonnet today for free on the Claude.ai website and the Claude iOS app (a Claude Pro or Team subscription will garner you significantly higher rate limits). Third-party integration is also available through the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. Claude Haiku 3.5 and Opus 3.5 are scheduled for release later in the year.

Andrew Tarantola
Andrew Tarantola is a journalist with more than a decade reporting on emerging technologies ranging from robotics and machine…
AMD’s RDNA 4 may surprise us in more ways than one
AMD RX 7800 XT and RX 7700 XT graphics cards.

Thanks to all the leaks, I thought I knew what to expect with AMD's upcoming RDNA 4. It turns out I may have been wrong on more than one account.

The latest leaks reveal that AMD's upcoming best graphics card may not be called the RX 8800 XT, as most leakers predicted, but will instead be referred to as the  RX 9070 XT. In addition, the first leaked benchmark of the GPU gives us a glimpse into the kind of performance we can expect, which could turn out to be a bit of a letdown.

Read more
This futuristic mechanical keyboard will set you back an eye-watering $1,600
Hands typing on The Icebreaker keyboard.

I've complained plenty about how some of the best gaming keyboards are too expensive, from the Razer Black Widow V4 75% to the Wooting 80HE, but nothing comes remotely close to The Icebreaker. Announced nearly a year ago by Serene Industries, The Icebreaker is unlike any keyboard I've ever seen -- and it's priced accordingly at $1,600. Plus shipping, of course.

What could justify such an extravagant price? Aluminum, it turns out. The keyboard is constructed of one single block of 6061 aluminum in what Serene Industries calls an "unorthodox wedge form." As if that wasn't enough metal, the keycaps are also made of aluminum, and Serene says they include "about 800" micro-perforations that allow the LED backlight of the keyboard to shine through.

Read more
Google one-ups Microsoft by making chats easier to transfer
Google Spaces in Google Chat on a MacBook.

In a recent blog post, Google announced that it is making it easier for admins to migrate from Microsoft Teams to Google Chat to reduce downtime. Admins can easily do this within the Google Chat migration menu and connect to opposing Microsoft accounts to transfer Teams data.

Google gave step-by-step instructions for admins on how to transfer the messages. Admins need to connect to their Microsoft account and upload a CSV of the Teams from where they transfer the messages. From there, it requires just entering a starting date for messages to be migrated from Teams and clicking Star migration. Once it's complete, it'll make the migrated space, messages, and conversation data available to Google Workspace users.

Read more