Weibo's new open source AI model VibeThinker-1.5B outperforms DeepSeek-R1 on a $7,800 post-training budget



Another day in late 2025, another impressive result from a Chinese company in open source artificial intelligence.

Chinese social networking company Weibo's AI division recently released its open source VibeThinker-1.5B, a 1.5-billion-parameter large language model (LLM) that is a fine-tuned variant of rival Chinese tech firm Alibaba's Qwen2.5-Math-1.5B.

It's available now for free download and use by researchers and enterprise developers, even for commercial purposes, under a permissive MIT License on Hugging Face, GitHub and ModelScope, with a technical report on the open-access science publishing site arxiv.org.

And yet, despite its compact size, VibeThinker-1.5B achieves benchmark-topping reasoning performance on math and code tasks, rivaling or surpassing models hundreds of times its size, even outperforming Chinese rival DeepSeek's famed R1, the 671-billion-parameter model that went viral at the start of this year, on formal reasoning benchmarks.

It further eclipses Mistral AI's Magistral Medium and holds its own against Anthropic's Claude Opus 4 and OpenAI's gpt-oss-20B Medium, all while requiring a fraction of the infrastructure and investment.

It also does so having been post-trained on a budget of merely $7,800 USD for compute resources (3,900 GPU hours on Nvidia H800s, or roughly $2 per GPU-hour), far less than the tens or even hundreds of thousands of dollars typically required to fine-tune models of similar or larger scale.

Recall this is not the total cost of the model's development, however: LLMs are trained in stages. First comes pre-training, when the model learns basic language structure and general knowledge by predicting the next word across vast amounts of text from the web, books, and articles. This gives it fluency, but not much sense of how to follow instructions or hold a conversation.

Post-training comes next, using much smaller, higher-quality datasets, typically collections of example questions, prompts, and expert-written answers, to teach the model how to respond helpfully, reason through problems, and align with human expectations. Even so, Weibo's post-training cost-effectiveness on VibeThinker-1.5B is noteworthy and deserves credit.

The open-source release upends assumptions about parameter scale, compute intensity, and the minimum viable size for high-performance LLMs.

A Different Training Approach: Spectrum-to-Signal

VibeThinker-1.5B owes its performance not to scale, but to the training framework behind it: the Spectrum-to-Signal Principle (SSP).

Instead of optimizing a model purely for single-answer correctness (Pass@1), the SSP framework decouples supervised fine-tuning (SFT) and reinforcement learning (RL) into two distinct phases with different objectives:

  • SFT (“Spectrum Phase”): The model is trained to maximize diversity across potential correct answers, improving its Pass@K score (estimated as in the sketch after this list). This builds a wide range of plausible solution paths.

  • RL (“Signal Phase”): A second-stage reinforcement learning system (called MaxEnt-Guided Policy Optimization, or MGPO) is then used to identify and amplify the most correct paths from this diverse solution pool. MGPO prioritizes problems where the model is most uncertain, using entropy-based weighting to focus learning.
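
Pass@K asks whether at least one of K sampled answers to a problem is correct, which makes it a natural target for a diversity-maximizing phase. Here is a minimal sketch of the standard unbiased Pass@K estimator (the formula popularized by OpenAI's Codex paper; the function name and sample counts are illustrative, not taken from the VibeThinker report):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@K estimate: n samples drawn, c of them correct.

    Returns the probability that at least one of k randomly chosen
    samples (out of the n drawn) is correct.
    """
    if n - c < k:
        return 1.0  # too few incorrect samples to fill all k slots
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 64 sampled solutions per problem, 12 of them correct.
print(pass_at_k(n=64, c=12, k=1))   # ~0.19, i.e. single-shot accuracy
print(pass_at_k(n=64, c=12, k=16))  # much higher: diversity pays off at large K
```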

The authors argue this separation allows small models to explore the reasoning space more effectively, achieving signal amplification without relying on massive parameter counts.
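
The report describes MGPO as steering training signal toward problems where the model's empirical success rate sits nearest 50%, the point of maximum uncertainty. Below is a hedged sketch of that entropy-based weighting idea, assuming a binary correct/incorrect reward per sampled rollout (the function and exact weighting form are illustrative, not Weibo's implementation):

```python
import math

def entropy_weight(pass_rate: float, eps: float = 1e-6) -> float:
    """Binary-entropy weight for a problem, given its empirical pass rate.

    Maximal when pass_rate == 0.5 (the model is most uncertain) and near
    zero when the problem is already solved or hopeless, so RL updates
    concentrate on problems where there is still learning signal.
    """
    p = min(max(pass_rate, eps), 1.0 - eps)  # clamp away from 0 and 1
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

# Per-problem pass rates estimated from sampled rollouts:
for p in (0.05, 0.25, 0.5, 0.9):
    print(f"pass_rate={p:.2f} -> weight={entropy_weight(p):.3f}")
# These weights would then scale each problem's policy-gradient loss term.
```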

VibeThinker-1.5B makes a compelling case that the industry's reliance on parameter scaling as the only route to better reasoning performance may be outdated.

By adopting a diversity-first training pipeline, WeiboAI has shown that smaller, more accessible models can match or even outperform billion-dollar systems on logic-heavy tasks.

The low resource footprint is among the most significant aspects of VibeThinker-1.5B. At under $8,000, the post-training cost is 30–60x lower than that of models like DeepSeek R1 and MiniMax-M1, which cost between $294K and $535K to train.

Performance Across Domains

Despite its small size, VibeThinker-1.5B delivers cross-domain reasoning that outpaces many larger open-source and commercial models:

Model                  AIME25   LiveCodeBench v6   GPQA-Diamond
VibeThinker-1.5B       74.4     51.1               46.7
GPT-OSS-20B-Medium     72.1     54.9               66.0
Claude Opus 4          69.2     56.6               79.6
MiniMax M1 (456B)      74.6     62.3               69.2
DeepSeek R1 (671B)     70.0     65.9               71.5
Kimi K2 (1.09T)        49.5     53.7               75.1

VibeThinker was benchmarked against both reasoning-centric models (Magistral, Claude, OpenAI o3-mini) and non-reasoning LLMs (GPT-4.1, Kimi K2, DeepSeek V3). Across structured reasoning benchmarks, the model consistently outperformed non-reasoning models, regardless of size:

  • On AIME24 (math), it beat Kimi K2 (1.09T) by over 10 points (80.3 vs. 69.6).

  • On LiveCodeBench v6, it surpassed Claude Opus 4 (51.1 vs. 47.4).

  • On GPQA, it scored below GPT-4.1 and Claude, but still nearly tripled its base model's score (from 16.4 to 46.7).

This supports the authors' claim that size is not the only path to reasoning capability: with proper training design, smaller models can reach or even exceed the performance of far larger systems on targeted tasks.

Notably, it achieves parity with models hundreds of times larger on math and code, though it lags behind on general-knowledge reasoning (GPQA), where larger models maintain an edge.

This suggests a potential specialization trade-off: while VibeThinker excels at structured logical tasks, it has less capacity for wide-ranging encyclopedic recall, a known limitation of smaller architectures.

Guidance for Enterprise Adoption

The release includes recommended inference settings (temperature = 0.6, top_p = 0.95, max tokens = 40960).
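
As a minimal sketch of how those settings might be applied with the Hugging Face transformers library (the repository ID and the prompt below are assumptions for illustration; check the model card on Hugging Face for the exact repo name and chat template):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo ID; verify against the actual model card.
model_id = "WeiboAI/VibeThinker-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "If 3x + 5 = 20, what is x? Think step by step."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The release's recommended sampling settings.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    max_new_tokens=40960,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```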

The model is small enough to be deployed on edge devices, including mobile phones and vehicle-embedded systems, while inference costs are estimated to be 20–70x lower than with large models.

This positions VibeThinker-1.5B not just as a research achievement, but as a potential foundation for cost-efficient, locally deployable reasoning systems.

Weibo's Strategy and Market Position

Weibo, launched by Sina Corporation in 2009, remains a cornerstone of China's social media ecosystem. Often described as China's version of X (formerly Twitter), the platform blends microblogging, multimedia content, and trending-topic features with a regulatory environment shaped by tight government oversight.

Despite counting 600 million monthly active users (more than twice that of X), investors are not optimistic about its advertising revenue growth potential in the near term, and Weibo is navigating intensifying competition from video-first platforms like Douyin, which are drawing younger users and growing time spent elsewhere.

In response, Weibo has leaned into creator-economy monetization, live-streaming, and vertical video, adding tools for influencer engagement, e-commerce integration, and richer analytics for brands.

The platform's role as a digital public square also makes it a focus of regulatory scrutiny. Chinese authorities continue to apply pressure on issues ranging from content governance to data security. In September 2025, Weibo was among the platforms cited in official warnings, highlighting its ongoing exposure to policy risks.

Weibo's push into AI R&D, exemplified by the launch of VibeThinker-1.5B, signals a shift in ambition. Beyond being a media platform, Weibo is positioning itself as a participant in the next phase of Chinese AI development, using its capital reserves, user behavior data, and in-house research capacity to pursue adjacent technical domains.

What It Means for Enterprise Technical Decision Makers

For engineering leaders and enterprise AI teams, VibeThinker's release has practical implications for everything from orchestration pipelines to cost modeling.

A 1.5B-parameter model that outperforms models 100x its size on math and programming tasks doesn't just save compute; it shifts the architectural balance. It enables LLM inference on constrained infrastructure, reduces latency at the edge, and lowers the barrier to entry for applications that would otherwise have required API access to closed, frontier-scale models.

That matters for enterprise ML leads trying to deploy reasoning-capable agents within existing systems, or for platform owners tasked with integrating LLMs into automated workflows.

It also speaks to teams running reinforcement learning from human feedback (RLHF) pipelines or managing inference optimization across hybrid cloud environments.

The model's post-training methodology, particularly its entropy-targeted reinforcement learning approach, offers a roadmap for teams looking to refine smaller checkpoints instead of relying on large-scale pretraining.

VibeThinker's benchmark transparency and data decontamination steps also address another growing priority in enterprise AI: auditability. While its performance on general-knowledge tests still trails large frontier models, its task-specific reliability makes it an attractive candidate for controlled environments where correctness matters more than coverage.

In short, VibeThinker-1.5B isn't just a research milestone; it's a strong candidate for practical enterprise use, deployment, and learning. It suggests that a new class of compact, reasoning-optimized models is viable for enterprise use cases that were previously the domain of far larger systems. For organizations trying to balance cost, latency, interpretability, and control, it's a new option on the long and growing list of Chinese open source offerings.



