
Anthropic aims to fix one of the biggest problems in AI right now

Hot on the heels of the announcement that its Claude 3.5 Sonnet large language model beat out other leading models, including GPT-4o and Llama-400B, AI startup Anthropic announced Monday that it plans to launch a new program to fund the development of independent, third-party benchmark tests against which to evaluate its upcoming models.

Per a blog post, the company is willing to pay third-party developers to create benchmarks that can “effectively measure advanced capabilities in AI models.”

“Our investment in these evaluations is intended to elevate the entire field of AI safety, providing valuable tools that benefit the whole ecosystem,” Anthropic wrote in the post. “Developing high-quality, safety-relevant evaluations remains challenging, and the demand is outpacing the supply.”

The company wants submitted benchmarks to help measure the relative “safety level” of an AI based on a number of factors, including how well it resists attempts to coerce responses that pose cybersecurity risks; chemical, biological, radiological, and nuclear (CBRN) risks; and misalignment, social manipulation, and other national security risks. Anthropic is also looking for benchmarks to help evaluate models’ advanced capabilities and is willing to fund the “development of tens of thousands of new evaluation questions and end-to-end tasks that would challenge even graduate students.” Such tests would essentially measure a model’s ability to synthesize knowledge from a variety of sources, its ability to refuse cleverly worded malicious user requests, and its ability to respond in multiple languages.

Anthropic is looking for “sufficiently difficult,” high-volume tasks that can involve as many as “thousands” of testers across a diverse set of test formats and that help inform the company’s “realistic and safety-relevant” threat modeling efforts. Interested developers can submit their proposals to the company, which plans to evaluate them on a rolling basis.

Andrew Tarantola
Andrew Tarantola is a journalist with more than a decade reporting on emerging technologies ranging from robotics and machine…
Microsoft is fixing my biggest problem with Windows 11 on handhelds

We're finally starting to make some progress on the handheld experience of Windows 11. Although Windows 11 handhelds like the ROG Ally X are some of the best handheld gaming PCs you can buy, that's despite their use of Windows, not because of it. Now, the latest Windows 11 Insider preview (build 22631.4387) adds a feature that should make navigating the OS much easier on a handheld -- a keyboard built for gamepads.

Windows has included an onscreen keyboard for years, and updates over the last couple of years have even made it usable with touch inputs. On a handheld, however, there are two problems with the keyboard. You can't invoke it naturally -- you have to bind Windows + Ctrl + O to a hotkey -- and you can't use your controller to navigate it. With the new update, Microsoft is fixing that last point, at the very least.

Nvidia’s CEO — yes, one person — is now worth more than all of Intel

Nvidia is one of the richest companies in the world, so it's no surprise that the company's CEO, Jensen Huang, is quite wealthy. The most recent net worth numbers from Forbes put into context just how wealthy the executive really is, though. Huang has an estimated net worth of $109.2 billion, which is around $13 billion more than Intel's entire market cap.

Although Nvidia makes some of the best graphics cards, the obscene amount of money the company has racked up over the past two years stems from its AI accelerators. In 2020, Forbes estimated that Huang was worth $4.7 billion, and even in 2023, after ChatGPT had already exploded onto the scene, the executive was worth $21.1 billion. Now, Huang is the 11th richest person in the world, outpacing Bill Gates, Michael Dell, and Michael Bloomberg.

From OpenAI to hacked smart glasses, here are the 5 biggest AI headlines this week

We officially transitioned into Spooky Season this week and, between OpenAI's $6.6 billion funding round, Nvidia's surprise LLM, and some privacy-invading Meta Smart Glasses, we saw a scary number of developments in the AI space. Here are five of the biggest announcements.
OpenAI secures $6.6 billion in latest funding round

Sam Altman's charmed existence continues apace with news this week that OpenAI has secured an additional $6.6 billion in investment as part of its most recent funding round. Existing investors like Microsoft and Khosla Ventures were joined by newcomers SoftBank and Nvidia. The AI company is now valued at a whopping $157 billion, making it one of the most valuable private enterprises on Earth.
