
ChatGPT now interprets photos better than an art critic and an investigator combined


ChatGPT’s recent image generation capabilities have challenged our previous understanding of AI-generated media. The recently announced GPT-4o model demonstrates a noteworthy ability to interpret images with high accuracy and recreate them with viral effects, such as the one inspired by Studio Ghibli. It even masters text in AI-generated images, which has previously been difficult for AI. And now, OpenAI is launching two new models capable of dissecting images for cues and gathering far more information than a human might catch at a glance.

OpenAI announced two new models earlier this week that take ChatGPT’s thinking abilities up a notch. Its new o3 model, which OpenAI calls its “most powerful reasoning model,” improves on the existing interpretation and perception abilities, getting better at “coding, math, science, visual perception, and more,” the organization claims. Meanwhile, o4-mini is a smaller and faster model for “cost-efficient reasoning” in the same areas. The news follows OpenAI’s recent launch of the GPT-4.1 class of models, which brings faster processing and deeper context.

ChatGPT is now “thinking with images”

With improved reasoning abilities, both models can now incorporate images into their chain of thought, which makes them capable of “thinking with images,” OpenAI proclaims. Going beyond basic image analysis, the o3 and o4-mini models can investigate images more closely and even manipulate them through actions such as cropping, zooming, flipping, or enriching details to extract visual cues that could improve ChatGPT’s ability to provide solutions.

Introducing OpenAI o3 and o4-mini—our smartest and most capable models to date.

For the first time, our reasoning models can agentically use and combine every tool within ChatGPT, including web search, Python, image analysis, file interpretation, and image generation. pic.twitter.com/rDaqV0x0wE

— OpenAI (@OpenAI) April 16, 2025

According to the announcement, the models blend visual and textual reasoning, which can be combined with other ChatGPT features such as web search, data analysis, and code generation. This blend is expected to become the basis for more advanced AI agents with multimodal analysis.


Among other practical applications, you can share pictures of a multitude of items, from flow charts and scribbled handwritten notes to images of real-world objects, and expect ChatGPT to understand them more deeply and produce better output, even without a descriptive text prompt. With this, OpenAI is inching closer to Google’s Gemini, which offers the impressive ability to interpret the real world through live video.

Despite the bold claims, OpenAI is limiting access to paid members, presumably to prevent its GPUs from “melting” again as it struggles to keep up with the compute demand for new reasoning features. As of now, the o3, o4-mini, and o4-mini-high models are available exclusively to ChatGPT Plus, Pro, and Team members, while Enterprise and Education tier users will get them in one week’s time. Meanwhile, Free users will get limited access to o4-mini when they select the “Think” button in the prompt bar.

Tushar Mehta
Tushar is a freelance writer at Digital Trends and has been contributing to the Mobile Section for the past three years…