OpenAI launches GPT-5.4 with native computer use mode, financial plugins for Microsoft Excel, Google Sheets

The AI updates aren’t slowing down. Just two days after OpenAI launched a new underlying AI model for ChatGPT called GPT-5.3 Instant, the company has unveiled another, even more massive upgrade: GPT-5.4.

Actually, GPT-5.4 comes in two varieties: GPT-5.4 Thinking and GPT-5.4 Pro, the latter designed for the most complex tasks.

Both will be available in OpenAI’s paid application programming interface (API) and Codex software development application, while GPT-5.4 Thinking will be available to all paid subscribers of ChatGPT (Plus, the $20-per-month plan, and up) and Pro will be reserved for ChatGPT Pro ($200 monthly) and Enterprise plan users.

ChatGPT Free users will also get a taste of GPT-5.4, but only when their queries are auto-routed to the model, according to an OpenAI spokesperson.

The big headlines in this release are efficiency, with OpenAI reporting that GPT-5.4 uses far fewer tokens (47% fewer on some tasks) than its predecessors, and, arguably even more impressively, a new “native” Computer Use mode available through the API and Codex that lets GPT-5.4 navigate a user’s computer like a human and work across applications.

The company is also releasing a new suite of ChatGPT integrations that allow GPT-5.4 to be plugged directly into users’ Microsoft Excel and Google Sheets spreadsheets and cells, enabling granular analysis and automated task completion that should speed up work across the enterprise, but may make fears of white-collar layoffs even more pronounced, coming on the heels of similar offerings from Anthropic’s Claude and its new Cowork application.

OpenAI says GPT-5.4 supports up to 1 million tokens of context in the API and Codex, enabling agents to plan, execute, and verify tasks across long horizons. However, it charges double the price per 1 million tokens once the input exceeds 272,000 tokens.

Native computer use: a step toward autonomous workflows

The most consequential capability OpenAI highlights is that GPT-5.4 is its first general-purpose model released with native, state-of-the-art computer-use capabilities in Codex and the API, enabling agents to operate computers and carry out multi-step workflows across applications.

OpenAI says the model can both write code to operate computers through libraries like Playwright and issue mouse and keyboard commands in response to screenshots. OpenAI also claims a jump in agentic web browsing.
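To make the screenshot-driven mode concrete, here is a minimal sketch of the observe, decide, act loop that such an agent runs. Everything in it is hypothetical and for illustration only: the `Action` type, the `run_agent` function, and the stubbed "model" in `demo` are assumptions, not OpenAI’s API, and a real system would replace the stub with calls to the model and to an actual input driver.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One hypothetical UI action; the field names are illustrative only."""
    kind: str      # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(
    take_screenshot: Callable[[], bytes],
    choose_action: Callable[[bytes, List[Action]], Action],
    apply_action: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Loop: capture the screen, ask the policy for an action, apply it."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()
        action = choose_action(screenshot, history)
        history.append(action)
        if action.kind == "done":
            break
        apply_action(action)
    return history


def demo():
    """Run the loop against a fake one-field form instead of a real desktop."""
    state = {"focused": False, "typed": ""}

    def take_screenshot() -> bytes:
        # Stand-in for real pixels: the UI state encoded as bytes.
        return f"focused={state['focused']};typed={state['typed']}".encode()

    def choose_action(screenshot: bytes, history: List[Action]) -> Action:
        # Stand-in for the model: a fixed rule set, not an API call.
        view = screenshot.decode()
        if "focused=False" in view:
            return Action("click", x=100, y=40)
        if not view.endswith("typed=hello"):
            return Action("type", text="hello")
        return Action("done")

    def apply_action(action: Action) -> None:
        if action.kind == "click":
            state["focused"] = True
        elif action.kind == "type":
            state["typed"] += action.text

    return run_agent(take_screenshot, choose_action, apply_action), state
```

Calling `demo()` returns the list of actions taken and the final state of the fake form. The `max_steps` cap is a design choice in this sketch so that a confused policy cannot loop forever.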

Benchmark results are presented as evidence that this is not merely a UI wrapper.

On BrowseComp, which measures how well AI agents can persistently browse the web to find hard-to-locate information, OpenAI reports GPT-5.4 improving by 17 percentage points over GPT-5.2, and GPT-5.4 Pro reaching 89.3%, described as a new state of the art.

On OSWorld-Verified, which measures desktop navigation using screenshots plus keyboard and mouse actions, OpenAI reports GPT-5.4 at 75.0% success, compared to 47.3% for GPT-5.2, and notes reported human performance at 72.4%.

On WebArena-Verified, GPT-5.4 reaches 67.3% success using both DOM- and screenshot-driven interaction, compared to 65.4% for GPT-5.2. On Online-Mind2Web, OpenAI reports 92.8% success using screenshot-based observations alone.

OpenAI also links computer use to improvements in vision and document handling. On MMMU-Pro, GPT-5.4 reaches 81.2% success without tool use, compared with 79.5% for GPT-5.2, and OpenAI says it achieves that result using a fraction of the “thinking tokens.”

On OmniDocBench, GPT-5.4’s average error is reported at 0.109, improved from 0.140 for GPT-5.2. The post also describes expanded support for high-fidelity image inputs, including an “original” detail level up to 10.24M pixels.

OpenAI positions GPT-5.4 as built for longer, multi-step workflows: work that increasingly looks like an agent keeping state across many actions rather than a chatbot responding once.

Tool search and improved tool orchestration

As tool ecosystems get larger, OpenAI argues that the naive approach of dumping every tool definition into the prompt creates a tax paid on every request: cost, latency, and context pollution.

GPT-5.4 introduces tool search in the API as a structural fix. Instead of receiving all tool definitions upfront, the model receives a lightweight list of tools plus a search capability, and it retrieves full tool definitions only when they’re actually needed.

OpenAI describes the efficiency win with a concrete comparison: on 250 tasks from Scale’s MCP Atlas benchmark, running with 36 MCP servers enabled, the tool-search configuration reduced total token usage by 47% while achieving the same accuracy as a configuration that exposed all MCP functions directly in context.

That 47% figure is specifically about the tool-search setup in that evaluation, not a blanket claim that GPT-5.4 uses 47% fewer tokens for every kind of task.
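The lazy-loading idea behind tool search can be illustrated with a small in-memory registry. This is a sketch under stated assumptions, not OpenAI’s implementation or API: the `ToolDefinition` and `ToolRegistry` names, the keyword-matching search, and the example tools are all hypothetical. The point is only that the prompt carries one short line per tool, while full schemas are fetched on demand.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ToolDefinition:
    """Hypothetical tool record: a short summary plus a full JSON-style schema."""
    name: str
    summary: str
    schema: dict


class ToolRegistry:
    """Keeps full definitions out of the prompt until a search asks for them."""

    def __init__(self, tools: Iterable[ToolDefinition]) -> None:
        self._tools = {tool.name: tool for tool in tools}

    def lightweight_list(self) -> List[str]:
        # What would be sent upfront: one short line per tool, no schema.
        return [f"{t.name}: {t.summary}" for t in self._tools.values()]

    def search(self, query: str, limit: int = 3) -> List[ToolDefinition]:
        # Naive keyword scoring (an assumption; a real system may rank differently).
        terms = query.lower().split()
        scored = []
        for tool in self._tools.values():
            haystack = f"{tool.name} {tool.summary}".lower()
            score = sum(term in haystack for term in terms)
            if score:
                scored.append((score, tool))
        scored.sort(key=lambda pair: (-pair[0], pair[1].name))
        return [tool for _, tool in scored[:limit]]


registry = ToolRegistry([
    ToolDefinition("get_weather", "Look up the current weather for a city",
                   {"type": "object", "properties": {"city": {"type": "string"}}}),
    ToolDefinition("send_email", "Send an email message",
                   {"type": "object", "properties": {"to": {"type": "string"}}}),
    ToolDefinition("query_invoices", "Search invoices by customer",
                   {"type": "object", "properties": {"customer": {"type": "string"}}}),
])

# Only the matching tool's full definition is pulled into context.
matches = registry.search("weather city")
```

With dozens of servers and hundreds of tools, the gap between the lightweight list and the full set of schemas is where the reported token savings would come from.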

Improvements for developers and coding workflows

OpenAI’s coding pitch is that GPT-5.4 combines the coding strengths of GPT-5.3-Codex with stronger tool and computer-use capabilities that matter when tasks aren’t single-shot.

OpenAI says GPT-5.4 matches or outperforms GPT-5.3-Codex on SWE-Bench Pro while delivering lower latency across reasoning efforts.

Codex also gets workflow-level knobs. OpenAI says /fast mode delivers up to 1.5× faster performance across supported models, including GPT-5.4, describing it as the same model and intelligence “just faster.”

And it describes releasing an experimental Codex skill, “Playwright (Interactive)”, meant to show how coding and computer use can work in tandem: visually debugging web and Electron apps and testing an app as it’s being built.

OpenAI for Microsoft Excel and Google Sheets

Alongside GPT-5.4, OpenAI is announcing a suite of secure AI products in ChatGPT built for enterprises and financial institutions, powered by GPT-5.4 for advanced financial reasoning and Excel-based modeling.

The centerpiece is ChatGPT for Excel and Google Sheets (beta), which OpenAI describes as ChatGPT embedded directly in spreadsheets to build, analyze, and update complex financial models using the formulas and structures teams already rely on.

The suite also includes new ChatGPT app integrations meant to unify market, company, and internal data into a single workflow, naming FactSet, MSCI, Third Bridge, and Moody’s.

And it introduces reusable “Skills” for recurring finance work such as earnings previews, comparables analysis, DCF analysis, and investment memo drafting.

OpenAI anchors the finance push with an internal benchmark claim: model performance increased from 43.7% with GPT-5 to 88.0% with GPT-5.4 Thinking on an OpenAI internal investment banking benchmark.

Measuring AI performance against professional work

OpenAI leans on benchmarks meant to resemble real office deliverables, not just puzzle-solving. On GDPval, an evaluation spanning “well-specified knowledge work” across 44 occupations, OpenAI reports that GPT-5.4 matches or exceeds industry professionals in 83.0% of comparisons, compared to 71.0% for GPT-5.2.

The company also highlights specific improvements in the kinds of artifacts that tend to expose model weaknesses: structured tables, formulas, narrative coherence, and design quality.

In an internal benchmark of spreadsheet modeling tasks modeled after what a junior investment banking analyst might do, GPT-5.4 reaches a mean score of 87.5%, compared to 68.4% for GPT-5.2.

And on a set of presentation evaluation prompts, OpenAI says human raters preferred GPT-5.4’s presentations 68.0% of the time over GPT-5.2’s, citing stronger aesthetics, greater visual variety, and more effective use of image generation.

Improving reliability and reducing hallucinations

OpenAI describes GPT-5.4 as its most factual model yet and connects that claim to a practical dataset: de-identified prompts where users previously flagged factual errors. On that set, OpenAI reports GPT-5.4’s individual claims are 33% less likely to be false and its full responses are 18% less likely to contain any errors compared to GPT-5.2.

In statements provided to VentureBeat by OpenAI and attributed to early GPT-5.4 testers, Daniel Swiecki of Walleye Capital says that on internal finance and Excel evaluations, GPT-5.4 improved accuracy by 30 percentage points, which he links to expanded automation for model updates and scenario analysis.

Brendan Foody, CEO of Mercor, calls GPT-5.4 the best model the company has tried and says it’s now top of Mercor’s APEX-Agents benchmark for professional services work, emphasizing long-horizon deliverables like slide decks, financial models, and legal analysis.

Pricing and availability

In the API, OpenAI says GPT-5.4 Thinking is available as gpt-5.4 and GPT-5.4 Pro as gpt-5.4-pro. Pricing is as follows:

GPT-5.4: $2.50 / 1M input tokens; $15 / 1M output tokens
GPT-5.4 Pro: $30 / 1M input tokens; $180 / 1M output tokens
Batch + Flex: half-rate; Priority processing: 2× rate

This makes GPT-5.4 among the more expensive models to run over the API compared to the entire field, as seen in the table below (prices per 1 million tokens).

Model	Input	Output	Total Cost	Source
Qwen 3 Turbo	$0.05	$0.20	$0.25	Alibaba Cloud
Qwen3.5-Flash	$0.10	$0.40	$0.50	Alibaba Cloud
deepseek-chat (V3.2-Exp)	$0.28	$0.42	$0.70	DeepSeek
deepseek-reasoner (V3.2-Exp)	$0.28	$0.42	$0.70	DeepSeek
Grok 4.1 Fast (reasoning)	$0.20	$0.50	$0.70	xAI
Grok 4.1 Fast (non-reasoning)	$0.20	$0.50	$0.70	xAI
MiniMax M2.5	$0.15	$1.20	$1.35	MiniMax
Gemini 3.1 Flash-Lite	$0.25	$1.50	$1.75	Google
MiniMax M2.5-Lightning	$0.30	$2.40	$2.70	MiniMax
Gemini 3 Flash Preview	$0.50	$3.00	$3.50	Google
Kimi-k2.5	$0.60	$3.00	$3.60	Moonshot
GLM-5	$1.00	$3.20	$4.20	Z.ai
ERNIE 5.0	$0.85	$3.40	$4.25	Baidu
Claude Haiku 4.5	$1.00	$5.00	$6.00	Anthropic
Qwen3-Max (2026-01-23)	$1.20	$6.00	$7.20	Alibaba Cloud
Gemini 3 Pro (≤200K)	$2.00	$12.00	$14.00	Google
GPT-5.2	$1.75	$14.00	$15.75	OpenAI
Claude Sonnet 4.6	$3.00	$15.00	$18.00	Anthropic
GPT-5.4	$2.50	$15.00	$17.50	OpenAI
Gemini 3 Pro (>200K)	$4.00	$18.00	$22.00	Google
Claude Opus 4.6	$5.00	$25.00	$30.00	Anthropic
GPT-5.2 Pro	$21.00	$168.00	$189.00	OpenAI
GPT-5.4 Pro	$30.00	$180.00	$210.00	OpenAI

Another important note: with GPT-5.4, requests that exceed 272,000 input tokens are billed at 2X the normal rate, reflecting the ability to send prompts larger than earlier models supported.

In Codex, compaction defaults to 272k tokens, and the higher long-context pricing applies only when the input exceeds 272k. This means developers can keep sending prompts at or under that size without triggering the higher rate, but can opt into larger prompts by raising the compaction limit, with only those larger requests billed differently.
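For a rough sense of what these rates mean per request, the arithmetic can be sketched as below. The per-token rates, the tier multipliers, and the 272,000-token threshold come from the figures reported above; the function name `estimate_cost` is hypothetical, and two modeling choices are assumptions rather than confirmed billing rules: that the 2X long-context rate applies to both input and output tokens of the request, and that it stacks multiplicatively with the Batch, Flex, or Priority multiplier.

```python
# Rates in USD per 1M tokens as reported in the article: (input, output).
RATES = {
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.4-pro": (30.00, 180.00),
}

# Batch and Flex are reported as half-rate, Priority as 2x.
TIER_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "flex": 0.5, "priority": 2.0}

# Requests whose input exceeds this many tokens are billed at 2x.
LONG_CONTEXT_THRESHOLD = 272_000


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  tier: str = "standard") -> float:
    """Estimate the USD cost of one request under the assumptions stated above."""
    input_rate, output_rate = RATES[model]
    base = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    # Assumption: the surcharge doubles the whole request, not just the input.
    long_context = 2.0 if input_tokens > LONG_CONTEXT_THRESHOLD else 1.0
    # Assumption: tier and long-context multipliers stack multiplicatively.
    return base * TIER_MULTIPLIER[tier] * long_context


short_request = estimate_cost("gpt-5.4", input_tokens=100_000, output_tokens=10_000)
long_request = estimate_cost("gpt-5.4", input_tokens=300_000, output_tokens=10_000)
```

Under these assumptions, a prompt just over the threshold costs roughly twice as much as one just under it, which is why the Codex compaction default sits at 272k.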

An OpenAI spokesperson said that in the API the maximum output is 128,000 tokens, the same as earlier models.

Finally, on why GPT-5.4 is priced higher at baseline, the spokesperson attributed it to three factors: higher capability on complex tasks (including coding, computer use, deep research, advanced document generation, and tool use), major research improvements from OpenAI’s roadmap, and more efficient reasoning that uses fewer reasoning tokens for comparable tasks. The spokesperson added that OpenAI believes GPT-5.4 remains below comparable frontier models on pricing even with the increase.

The broader shift

Across the launch and the follow-up clarifications, GPT-5.4 is positioned as a model meant to move beyond “answer generation” and into sustained professional workflows: ones that require tool orchestration, computer interaction, long context, and outputs that look like the artifacts people actually use at work.

OpenAI’s emphasis on token efficiency, tool search, native computer use, and reduced user-flagged factual errors all point in the same direction: making agentic systems more viable in production by lowering the cost of retries, whether that retry is a human re-prompting, an agent calling another tool, or a workflow re-running because the first pass didn’t stick.
