Nous Research, the San Francisco-based artificial intelligence startup, released on Tuesday an open-source mathematical reasoning system called Nomos 1 that achieved near-elite human performance on this year’s William Lowell Putnam Mathematical Competition, one of the most prestigious and notoriously difficult undergraduate math contests in the world.
The Putnam is known for its difficulty: while a perfect score is 120, this year’s top score was 90, and the median was just 2. Nomos 1, by contrast, scored 87 points, a result that would have ranked second out of 3,988 participants in the 2024 competition, according to the company.
The release marks an inflection point in the rapidly accelerating race to build AI systems capable of sophisticated mathematical reasoning. Unlike the massive, compute-intensive models deployed by major technology companies, Nomos 1 achieves its results with a relatively compact architecture: 30 billion total parameters, of which roughly 3 billion are active for any given token, using a mixture-of-experts design based on Alibaba’s Qwen3 model.
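The gap between total and active parameters comes from how mixture-of-experts layers work: a small router scores every expert for each token, and only the top-k experts actually run, so most weights sit idle on any given token (here, roughly 3 billion of 30 billion, about a tenth). The sketch below is a toy illustration in plain Python under stated assumptions, not Qwen3’s implementation: the linear router, the expert count, the value of `k`, and the function names are all hypothetical choices made for readability.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Vector,
    router_weights: Sequence[Vector],
    experts: Sequence[Expert],
    k: int,
) -> Tuple[Vector, List[int]]:
    """Hypothetical toy top-k mixture-of-experts layer for one token vector x.

    Only the k highest-scoring experts are evaluated; the others are skipped,
    which is why "active" parameters are a fraction of total parameters.
    """
    # One router logit per expert: dot product of x with that expert's router row.
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    # Indices of the k largest logits, best first.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Gate weights are renormalized over the selected experts only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        y = experts[i](x)  # only the selected experts do any work
        out = [o + gate * y_j for o, y_j in zip(out, y)]
    return out, top
```

For example, with four experts that scale their input by 0, 1, 2, and 3 and router rows `[[0.0], [3.0], [1.0], [2.0]]`, the input `[1.0]` is routed to experts 1 and 3 when `k=2`, and experts 0 and 2 are never called.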
“This score would rank #2/3988 in 2024 and marks our first step with Hillclimb AI towards creating a SOTA AI mathematician,” Nous Research announced on social media Tuesday.
The same base model scored 24 points without Nous Research’s specialized training
Perhaps most striking is the gap between Nomos 1 and its base model. When Nous Research ran the same Qwen3-30B-A3B-Thinking-2507 model through an identical testing harness, it scored just 24 out of 120, a result that underscores the critical importance of post-training optimization and specialized reasoning techniques over raw model scale.
“Nomos 1 achieved an 87/120 with 8 perfect scores,” the company said, noting that the performance difference “is largely due to post-training and data quality rather than the harness.”
The results were verified through blind grading by a human expert who had previously finished in the top 200 on the Putnam. Nous Research provided the anonymized submissions to the grader, then published the full set of de-anonymized files and the runbooks used to generate them on GitHub.
Why the Putnam competition is considered the ultimate test of mathematical reasoning
The William Lowell Putnam Mathematical Competition is an annual mathematics competition for undergraduate students enrolled at institutions of higher learning in the United States and Canada. It is widely considered the most prestigious university-level mathematics competition in the world.
The notoriously brutal Putnam is more of a mathematical sporting event than an academic test. The exam consists of two 3-hour sessions separated by a 2-hour break. There are 12 questions in total, 6 in each session. Each question is worth 10 points, for a maximum of 120 points.
Putnam questions are not the kind that come up in regular exams or textbooks. They are more like puzzles than calculations, often requiring students to find different ways to represent a problem before a solution begins to emerge.
Last year, nearly 4,000 students across the continent wrote the Putnam. Sixty-one per cent scored three points or fewer, according to the Mathematical Association of America, which organizes the competition. The top score was 90 out of 120.
Many Putnam Fellows have gone on to become distinguished researchers in mathematics and other fields, including three Fields Medalists (John Milnor, David Mumford, and Daniel Quillen) and two Nobel laureates in physics (Richard Feynman and Kenneth Wilson).
Inside the two-phase reasoning system that powers Nomos 1’s mathematical breakthroughs
Nomos 1 is a specialization of Qwen’s Qwen3-30B-A3B-Thinking model, optimized for mathematical problem-solving and proof-writing in natural language. The system was developed in collaboration with Hillclimb AI.
What distinguishes Nomos 1 from simple model inference is its sophisticated reasoning harness, an open-source framework that orchestrates how the model approaches and solves problems. The harness operates in two distinct phases within a three-hour time limit, mirroring the three-hour sessions of the actual Putnam competition.
In the solving phase, parallel workers tackle problems simultaneously using a priority-based system. Each worker picks a problem, generates a submission, then scores its own work on a scale of 1 to 7. Problems with the fewest perfect scores receive priority, so the system concentrates its compute on the hardest challenges. This process continues until either every problem has reached a target number of self-critiqued perfect scores or time runs out.
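The scheduling rule in the solving phase (always work next on the problem with the fewest perfect self-scores) can be sketched as follows. This is a simplified, single-worker illustration under stated assumptions, not code from Nous Research’s harness: the `generate` and `self_score` callables stand in for model calls, the real harness runs many workers in parallel, and every name here is hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple

PERFECT = 7  # self-assessed scores run from 1 (worst) to 7 (perfect)


def solving_phase(
    problems: List[str],
    generate: Callable[[str], str],
    self_score: Callable[[str, str], int],
    target_perfect: int,
    deadline: float,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, List[Tuple[str, int]]]:
    """Hypothetical single-worker version of a least-solved-first solving loop.

    Returns every (submission, self_score) pair collected for each problem.
    """
    submissions: Dict[str, List[Tuple[str, int]]] = {p: [] for p in problems}
    perfect_counts = {p: 0 for p in problems}

    while clock() < deadline:
        # Problems that have not yet reached the target number of perfect scores.
        open_problems = [p for p in problems if perfect_counts[p] < target_perfect]
        if not open_problems:
            break  # every problem hit its target, so stop early
        # Priority: fewest perfect self-scores first (ties go to list order).
        problem = min(open_problems, key=lambda p: perfect_counts[p])
        answer = generate(problem)
        score = self_score(problem, answer)
        submissions[problem].append((answer, score))
        if score == PERFECT:
            perfect_counts[problem] += 1

    return submissions
```

Injecting `clock` keeps the time budget testable; in a full run, the deadline would presumably be the moment the finalization phase takes over. Because a problem that keeps earning perfect self-scores drops out of the open set, the remaining budget flows to the problems the model finds hardest.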
The finalization phase begins 15 minutes before the time limit (or at the 50% mark for shorter runs) and uses a two-stage selection process. First, a consolidation step groups submissions by their conclusion and attempts to identify the correct group, which, importantly, is not necessarily the majority group. Then a pairwise, single-elimination tournament determines the final submission for each problem.
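The two-stage finalization can be sketched in the same spirit. The article does not say how the harness decides which conclusion group is correct, only that it need not be the majority, so that choice is delegated to a hypothetical `pick_group` judge; likewise, `better` stands in for whatever pairwise comparison the harness actually performs. Everything below, including the bye given to an odd candidate out, is an illustrative assumption rather than Nous Research’s code.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def consolidate(
    submissions: Sequence[T],
    conclusion_of: Callable[[T], str],
    pick_group: Callable[[Dict[str, List[T]]], str],
) -> List[T]:
    """Group submissions by final conclusion and keep the group the judge picks.

    pick_group sees every group, so it is free to choose a minority conclusion.
    """
    groups: Dict[str, List[T]] = defaultdict(list)
    for s in submissions:
        groups[conclusion_of(s)].append(s)
    return groups[pick_group(dict(groups))]


def single_elimination(candidates: Sequence[T], better: Callable[[T, T], T]) -> T:
    """Pairwise knockout: winners advance until one candidate remains."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    round_ = list(candidates)
    while len(round_) > 1:
        next_round = [
            better(round_[i], round_[i + 1]) for i in range(0, len(round_) - 1, 2)
        ]
        if len(round_) % 2 == 1:
            next_round.append(round_[-1])  # odd one out gets a bye
        round_ = next_round
    return round_[0]


def finalize(
    submissions: Sequence[T],
    conclusion_of: Callable[[T], str],
    pick_group: Callable[[Dict[str, List[T]]], str],
    better: Callable[[T, T], T],
) -> T:
    """Consolidate by conclusion, then run a knockout within the chosen group."""
    return single_elimination(consolidate(submissions, conclusion_of, pick_group), better)
```

For instance, given submissions `"41:x"`, `"41:y"`, `"41:z"`, `"42:short"`, and `"42:a longer proof"`, a judge that picks the minority conclusion `"42"` and a comparison that prefers longer text would return `"42:a longer proof"`, showing how consolidation can override a simple majority vote.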
“Our open source reasoning system consists of a solving phase, where workers attempt a least-solved problem and self-assess, followed by a finalization phase, which consolidates submissions to choose a final submission for each problem,” Nous Research explained.
How Nomos 1 compares to mathematical AI systems from DeepSeek, Google, and OpenAI
The Nomos 1 results arrive amid a flurry of advances in mathematical reasoning AI. DeepSeek’s DeepSeekMath-V2 scored 118 out of 120 points on questions from the 2024 William Lowell Putnam Mathematical Competition, beating the top human score of 90. The model also performed at the level of gold medalists at the International Mathematical Olympiad.
At this year’s International Mathematical Olympiad (IMO), an advanced version of Google’s Gemini Deep Think operated end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions, all within the 4.5-hour competition time limit.
What makes Nomos 1’s achievement notable is not raw performance (it trails the 118/120 that DeepSeek reported on the 2024 exam) but its accessibility and efficiency. At 30 billion parameters with only 3 billion active, the model can run on consumer-grade hardware, a stark contrast to the massive compute clusters required by frontier models from OpenAI and Google.
Hermes 4.3 arrived just six days earlier, trained on a decentralized blockchain network
The Nomos 1 announcement follows closely on the heels of Nous Research’s December 3 release of Hermes 4.3, a general-purpose language model that marked another significant milestone for the company.
Hermes 4.3, based on ByteDance’s Seed-OSS-36B-Base model, is the first production model that Nous Research trained entirely on its Psyche network, a distributed training infrastructure that uses a novel optimizer called DisTrO to coordinate training across nodes in data centers connected over the open internet, secured by consensus on the Solana blockchain.
The company trained Hermes 4.3 both with traditional centralized methods and on the Psyche network, specifically to verify that distributed training could match or exceed centralized performance for production workloads. The Psyche-trained version outperformed the centralized version across a suite of downstream tasks, the company reported.
“The training run proved stable throughout, averaging 144k tokens/second spread across 24 Psyche nodes,” Nous Research said. “Using DisTrO’s overlapped collective strategy, the entirety of the P2P communications were hidden by the training time, effectively achieving equivalent throughput to traditional, centralized training.”
Hermes 4.3 also achieved state-of-the-art results on RefusalBench, a new benchmark that measures a model’s willingness to be helpful across a variety of scenarios that other models commonly restrict. The model answered 74.60% of RefusalBench questions in non-reasoning mode, surpassing its predecessor Hermes 4 70B (59.50%) and outperforming closed models including Grok 4 (51.30%) and Gemini 2.5 Pro (24.23%).
Small models with smart training are closing the gap with trillion-parameter giants
Together, the two releases in a single week signal Nous Research’s strategic bet: that smaller, more efficient models with sophisticated post-training techniques and reasoning harnesses can compete with, and in some cases outperform, the massive models developed by better-funded rivals.
For enterprise decision-makers, the implications are significant. Mathematical reasoning capabilities have applications far beyond academic competitions: they are essential for formal verification, theorem proving, scientific modeling, cryptographic analysis, and any domain that requires rigorous logical deduction.
Because both releases are open source (Nomos 1 is available under the Apache 2.0 license on Hugging Face, with the full reasoning harness on GitHub), organizations can deploy these capabilities on their own infrastructure without relying on API calls to major cloud providers.
“For the first time, anyone can run or access a state-of-the-art AI mathematician,” one observer noted on social media. “This lowers the barrier to serious math research, proof verification, modeling complex systems, advanced reasoning work.”
The key contributors to Nomos 1 include Roger Jin, who led the training; Jeffrey Quesnelle and Dakota Mahan, who built the infrastructure; Chen Guang, who advised; and Ryan Teknium and Jeffrey Quesnelle, who provided leadership. The model was developed with contributions from Hillclimb AI and a team of math experts including Samuel Kim, Miron Yurkevich, and others.
The race to build AI mathematicians is accelerating faster than anyone predicted
The 86th Putnam Competition took place on Saturday, December 6, 2025, just three days before Nous Research released Nomos 1. The timing underscores how rapidly the field is moving: companies are now releasing mathematical AI systems capable of near-elite human performance within days of the competitions they are designed to tackle.
Competition in mathematical AI has intensified dramatically in recent months. In July, an advanced version of Google DeepMind’s Gemini model and an experimental reasoning model from OpenAI both achieved gold-medal-level performance at IMO 2025. DeepSeek’s new model matched that performance, solving 5 of 6 problems.
But the resource requirements of these frontier systems remain prohibitive for most organizations. OpenAI’s o1-pro is estimated at over 1.8 trillion parameters, and Google’s Gemini 2.5 Pro likely exceeds 400 billion; neither company has disclosed these figures, so both are outside estimates. Nomos 1, by contrast, achieves competitive results with a fraction of that footprint.
The gap between massive frontier models and efficient open-source alternatives is narrowing. And for organizations that need mathematical reasoning capabilities without the budget for hyperscale compute, that gap may have just closed enough to matter.
As one observer put it on social media: “This marks a big leap for AI math models that are small enough to run on your laptop.”
A laptop that can now outperform nearly 4,000 of the continent’s best undergraduate mathematicians.
Disclaimer: This article is sourced from external platforms. OverBeta has not independently verified the information. Readers are advised to verify details before relying on them.