Large Language Models Benchmarks

Morning Overview on MSN

China’s open AI models are neck-and-neck with the West. What’s next

China’s latest generation of open large language models has moved from catching up to actively challenging Western leaders on ...

How 2025 Recalibrated AI Models Race

In 2025, large language models moved beyond benchmarks to efficiency, reliability, and integration, reshaping how AI is ...

Morningstar

Logical Intelligence Achieves 76 Percent on Putnam Benchmark, Highlighting Shift Beyond Large Language Models to Language-free, Mathematically Grounded Models

Over the last decade, artificial ...

The Brighterside of News on MSN

New memory structure helps AI models think longer and faster without using more power

Researchers from the University of Edinburgh and NVIDIA have introduced a new method that helps large language models reason ...

Z.ai Open-Sources GLM-4.7, a New Generation Large Language Model Built for Real Development Workflows

Z.ai released GLM-4.7 ahead of Christmas, marking the latest iteration of its GLM large language model family. As open-source models move beyond chat-based applications and into production ...

VietNamNet

CMC OpenAI unveils Vietnam’s first legal LLM and benchmark suite

The Vietnamese tech group CMC is shaping the country’s legal AI future through VLegal-Bench and CMC-AI-Legal-32B, pioneering ...

ZDNet

With AI models clobbering every benchmark, it's time for human evaluation

Artificial intelligence has traditionally advanced through automatic accuracy tests in tasks meant to approximate human knowledge. Carefully crafted benchmark tests such as The General Language ...

5d on MSN

The AI History That Explains Fears of a Bubble

The history of AI shows how setting evaluation standards fueled progress. But today's LLMs are asked to do tasks without ...

The New York Sun

AI Scores Poorly on Articulating Christian Values, Reports Tech Firm Measuring Support for Human Flourishing

A study of 10 large language models finds that most large language models give generic, secular answers to Christian-based ...
