Today, MLCommons announced new results for the MLPerf Training v5.1 benchmark suite, highlighting the rapid evolution and growing richness of the AI ecosystem, as well as significant performance improvements from new generations of systems.
Go here to view the full results for MLPerf Training v5.1 and find further information about the benchmarks.
The MLPerf Training benchmark suite comprises full-system tests that stress models, software, and hardware for a range of machine learning (ML) applications. The open-source and peer-reviewed benchmark suite provides a level playing field for competition that drives innovation, performance, and energy efficiency across the entire industry.
Version 5.1 set new records for the diversity of the systems submitted. Participants in this round of the benchmark submitted 65 unique systems, featuring 12 different hardware accelerators and a variety of software frameworks. Nearly half of the submissions were multi-node, an 86 percent increase from the version 4.1 round one year ago. The multi-node submissions employed several different network architectures, many incorporating custom solutions.
This round recorded substantial performance improvements over the version 5.0 results for two benchmark tests focused on generative AI scenarios, outpacing the rate of improvement predicted by Moore’s Law.
Relative performance improvements across the MLPerf Training benchmarks, normalized to the Moore’s Law trendline at the point in time when each benchmark was introduced. (Source: MLCommons)
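To make the normalization in the chart concrete, here is a minimal Python sketch. It assumes a Moore’s Law trendline of 2x every two years, measured from the date a benchmark was introduced; the function names and example figures are hypothetical illustrations, not MLCommons’ published methodology.

```python
from datetime import date

def moores_law_factor(introduced: date, measured: date, doubling_years: float = 2.0) -> float:
    """Speedup Moore's Law would predict between two dates (2x every `doubling_years`)."""
    years = (measured - introduced).days / 365.25
    return 2.0 ** (years / doubling_years)

def normalized_improvement(measured_speedup: float, introduced: date, measured: date) -> float:
    """Measured speedup divided by the Moore's Law prediction; values above 1.0 outpace Moore's Law."""
    return measured_speedup / moores_law_factor(introduced, measured)

# Hypothetical example: a benchmark introduced in mid-2023 whose best time-to-train
# improved 4x by late 2025 would sit roughly 1.73x above the Moore's Law trendline.
print(normalized_improvement(4.0, date(2023, 6, 1), date(2025, 11, 1)))  # ≈ 1.73
```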
“More choices of hardware systems allow customers to evaluate systems on state-of-the-art MLPerf benchmarks and make informed buying decisions,” said Shriya Rishab, co-chair of the MLPerf Training working group. “Hardware providers are using MLPerf as a way to showcase their products in multi-node settings with great scaling efficiency, and the performance improvements recorded in this round demonstrate that the vibrant innovation in the AI ecosystem is making a big difference.”
The MLPerf Training v5.1 round includes performance results from 20 submitting organizations: AMD, ASUSTeK, Cisco, Datacrunch, Dell, Giga Computing, HPE, Krai, Lambda, Lenovo, MangoBoost, MiTAC, Nebius, NVIDIA, Oracle, Quanta Cloud Technology, Supermicro, Supermicro + MangoBoost, University of Florida, and Wiwynn. “We would especially like to welcome first-time MLPerf Training submitters Datacrunch, University of Florida, and Wiwynn,” said David Kanter, Head of MLPerf at MLCommons.
The pattern of submissions also shows an increasing emphasis on benchmarks focused on generative AI (genAI) tasks, with a 24 percent increase in submissions for the Llama 2 70B LoRA benchmark and a 15 percent increase for the new Llama 3.1 8B benchmark over the test it replaced (BERT). “Taken together, the increased submissions to genAI benchmarks and the sizable performance improvements recorded in these tests make it clear that the community is heavily focused on genAI scenarios, to some extent at the expense of other potential applications of AI technology,” said Kanter. “We’re proud to be delivering these kinds of key insights into where the field is headed, which allow all stakeholders to make more informed decisions.”
Robust participation by a broad set of industry stakeholders strengthens the AI ecosystem as a whole and helps to ensure that the benchmark is serving the community’s needs. We invite submitters and other stakeholders to join the MLPerf Training working group and help us continue to evolve the benchmark.
MLPerf Training v5.1 Updates Two Benchmarks
The collection of tests in the suite is curated to keep pace with the field, with individual tests added, updated, or removed as deemed necessary by a panel of experts from the AI community.
In the 5.1 benchmark release, two prior tests were replaced with new ones that better represent the state-of-the-art technology solutions for the same task. Specifically: Llama 3.1 8B replaces BERT, and Flux.1 replaces Stable Diffusion v2.
Llama 3.1 8B is a benchmark test for pretraining a large language model (LLM). It belongs to the same “herd” of models as the Llama 3.1 405B benchmark already in the suite, but because it has fewer trainable parameters, it can be run on just a single node and deployed to a broader range of systems. This makes the test accessible to a wider range of potential submitters, while remaining a good proxy for the performance of larger clusters. More details on the Llama 3.1 8B benchmark can be found in this white paper: https://mlcommons.org/2025/10/training-llama-3-1-8b/.
Flux.1 is a transformer-based text-to-image benchmark. Since Stable Diffusion v2 was introduced into the MLPerf Training suite in 2023, text-to-image models have evolved in two important ways: they have integrated a transformer architecture into the diffusion process, and their parameter counts have grown by an order of magnitude. Flux.1, incorporating a transformer-based 11.9 billion-parameter model, reflects the current state of the art in generative AI for text-to-image tasks. This white paper provides more information on the Flux.1 benchmark: https://mlcommons.org/2025/10/training-flux1/.
“The field of AI is a moving target, constantly evolving with new scenarios and capabilities,” said Paul Baumstarck, co-chair of the MLPerf Training working group. “We will continue to evolve the MLPerf Training benchmark suite to ensure that we are measuring what is important to the community, both today and tomorrow.”