MLCommons is growing its suite of MLPerf AI benchmarks with the addition ...
In an article recently submitted to the arXiv preprint server, researchers introduced LiveBench, a benchmark designed to prevent test set contamination and biases from large language model (LLM) judging and ...
Artificial intelligence startup Galileo Technologies Inc. today released the results of a benchmark test that compared the accuracy of the industry’s most popular large language models. The ...
This study introduces MathEval, a comprehensive benchmarking framework designed to systematically evaluate the mathematical reasoning capabilities of large language models (LLMs). Addressing key ...
MLCommons is out today with its MLPerf 4.0 benchmarks for inference, once ...
Google and Nvidia split the top scores in the twice-yearly benchmark test of artificial intelligence program training, according to data released Wednesday by MLCommons, the industry consortium ...