Featured Articles

- MLCommons and AI Verify to collaborate on AI Safety Initiative
  Agreeing to a memorandum of intent to collaborate on a set of AI safety benchmarks for LLMs
- Creating a comprehensive Test Specification Schema for AI Safety
  Helping to systematically document the creation, implementation, and execution of AI safety tests
- Announcing MLCommons AI Safety v0.5 Proof of Concept
  Achieving a major milestone towards standard benchmarks for evaluating AI Safety
- The AI Safety Ecosystem Needs Standard Benchmarks
  IEEE Spectrum contributed blog excerpt, authored by the MLCommons AI Safety working group
- New MLPerf Inference Benchmark Results Highlight the Rapid Growth of Generative AI Models
  With 70 billion parameters, Llama 2 70B is the largest model added to the MLPerf Inference benchmark suite
- Llama 2 70B: An MLPerf Inference Benchmark for Large Language Models
  The MLPerf Inference task force shares insights on the selection of Llama 2 for the latest MLPerf Inference benchmark round
Blog

- Creating a comprehensive Test Specification Schema for AI Safety
  Helping to systematically document the creation, implementation, and execution of AI safety tests
- Unveiling the PRISM Alignment Project
  Prioritizing the Data-Centric Human Factors for Aligning Large Language Models
- The AI Safety Ecosystem Needs Standard Benchmarks
  IEEE Spectrum contributed blog excerpt, authored by the MLCommons AI Safety working group
News

- MLCommons and AI Verify to collaborate on AI Safety Initiative
  Agreeing to a memorandum of intent to collaborate on a set of AI safety benchmarks for LLMs
- MLPerf Mobile v4.0 application adds new benchmark, expands hardware support
  The mobile benchmark suite adds a brand-new image classification model and supports neural acceleration on some of the latest mobile devices
- Unveiling the PRISM Alignment Project
  Prioritizing the Data-Centric Human Factors for Aligning Large Language Models