Monday, September 15, 2025
FeeOnlyNews.com

AI keeps getting more powerful, making it harder to judge how smart models actually are

by FeeOnlyNews.com
2 months ago
in Business



How do you judge an AI model when it’s already starting to perform better than human beings? That’s the challenge faced by researchers like Russell Wald, executive director of the Stanford Institute for Human-Centered Artificial Intelligence (HAI). 

“As of 2024, there are very few task categories where human ability surpasses AI, and even in these areas, the performance gap between AI and humans is shrinking rapidly,” Wald said last week in a presentation hosted at the Fortune Brainstorm AI Singapore conference. “AI is exceeding human capabilities and it’s becoming increasingly harder for us to benchmark.”

Each year, HAI releases the AI Index, which aims to provide a comprehensive, data-driven snapshot of where AI is today. At Fortune Brainstorm AI Singapore, Wald shared a few highlights from the 2025 edition of the AI Index, such as the increasing power of today’s models, the growing dominance of industry on the AI frontier, and how China is poised to overtake the U.S.

The following transcript has been lightly edited for conciseness and clarity.

I’m Russell Wald, the executive director of the Stanford Institute for Human-Centered Artificial Intelligence, or what we call “HAI”. 

We are Stanford University’s globally recognized interdisciplinary research institute at the forefront of shaping AI development for the public good. HAI was established in 2019 with the goal of advancing AI research, education, policy and practice. And, through our convening role and rigorous study of AI, we have become the trusted partner on AI governance for decision makers in industry, government and civil society. 

I’m going to talk about what we produce at HAI, which is the AI Index, an annual data-driven analysis of trends in AI that tracks research, development, deployment and the socio-economic impact of AI across academia, government and industry.

We see AI performance consistently improve year over year. We use Midjourney, a text-to-image generator, asking for a hyper-realistic image of Harry Potter. And from February 2022 to July 2024, we see rapidly increasing quality in these generated images. 

In 2022, the model produced cartoonish, inaccurate renderings of Harry Potter, but by 2024, it could create startlingly realistic depictions. We have gone from what mirrors a Picasso painting to an uncanny rendering of Daniel Radcliffe, the actor who played Harry Potter in the movies. 

Because of this consistent performance growth, we are increasingly challenged when it comes to benchmarking these models. As of 2024, there are very few task categories where human ability surpasses AI, and even in these areas, the performance gap between AI and humans is shrinking rapidly. From image recognition to competition-level mathematics to PhD-level science questions, AI is exceeding human capabilities and it’s becoming increasingly harder for us to benchmark.

From healthcare to transportation, AI is rapidly moving from the lab to our daily life. In 2023, the U.S. Food and Drug Administration approved 223 AI-enabled medical devices, up from just six in 2015. 

On the roads, self-driving cars are no longer experimental. For example, Waymo, which I regularly take while living in San Francisco, is one of the largest U.S. operators and provides over 150,000 autonomous rides each week, while Baidu’s affordable Apollo Go robotaxi has a fleet now that serves numerous cities across China. 

Business use of AI increased significantly after stagnating from 2017 to 2023. The latest McKinsey report reveals that 78% of surveyed respondents say their organizations have begun to use AI in at least one business function, marking a significant increase from 55% in 2023. 

Driven by increasingly capable small models, the inference cost for a system performing at the level of [GPT-3.5] dropped over 280-fold between November 2022 and October 2024. Hardware costs have declined 30% annually, while energy efficiency has improved by 40% each year.
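For readers who want to see how those rates compound, here is a minimal Python sketch. It is an illustration rather than anything from the AI Index: the function names are made up for this example, the November 2022 to October 2024 window is assumed to be 23 months, "over 280-fold" is treated as exactly 280, and the declines are assumed to be smooth.

```python
def annualized_factor(total_factor: float, months: float) -> float:
    """Equivalent per-year multiple for a change of total_factor over `months` months."""
    return total_factor ** (12 / months)


def remaining_fraction(annual_decline: float, years: float) -> float:
    """Fraction of the original cost left after compounding an annual decline."""
    return (1 - annual_decline) ** years


# Assumption: November 2022 to October 2024 is treated as 23 months,
# and "over 280-fold" is taken as exactly 280.
inference_per_year = annualized_factor(280, 23)
print(f"Inference cost: roughly {inference_per_year:.0f}x cheaper per year")

# Hardware costs falling 30% a year, compounded over three years.
print(f"Hardware cost after 3 years: {remaining_fraction(0.30, 3):.0%} of the original")
```

The same two helpers can be reused with other figures from the talk, as long as the change is assumed to be steady over the period.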

Open-weight models are also closing the gap with closed models, reducing the performance [gap] from 8% to just 1.7% on some benchmarks in a single year. Together, these trends are rapidly lowering the barriers to advanced AI. 

However, even with inference and hardware costs going down, training costs remain out of reach for academia and most small players. Nearly 90% of notable AI models in 2024 came from industry, which is up from 60% in 2023. And while academia remains a top source of highly cited research, it does struggle at this point to stay as advanced at the frontier level. 

Model scale continues to grow rapidly. Training compute doubles every five months, datasets every eight, and power use annually. Yet performance gaps are shrinking. The score difference between the top and 10th ranked models fell from 11.9% to 5.4% in a year, and the top two models are now separated by just 0.7%. The frontier is increasingly competitive and increasingly crowded. 
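To put those doubling times on a common footing, a doubling period of m months corresponds to an annual growth factor of 2^(12/m). The short Python sketch below works that out. The function name is illustrative, and the calculation assumes steady exponential growth, which the talk does not state explicitly.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Growth multiple over 12 months, assuming steady exponential growth."""
    return 2 ** (12 / doubling_months)


# Doubling periods in months, as quoted in the talk.
doubling_times = {"training compute": 5, "dataset size": 8, "power use": 12}

for name, months in doubling_times.items():
    print(f"{name}: about {annual_growth_factor(months):.1f}x per year")
```

Under that assumption, a five-month doubling time works out to a little over fivefold growth per year, compared with just under threefold for datasets and twofold for power use.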

In recent years, AI model performance at the frontier has converged, with multiple providers now offering highly capable models. This marks a shift from late 2022, when ChatGPT’s launch, widely seen as AI’s breakthrough into the public consciousness, coincided with the landscape dominated by just two players: OpenAI and Google. 

One of the most important things to note is that the transformer model cost $930 for Google to train in 2017—and that is the T in GPT, the baseline level of architecture—and now today we’re at $200 million to train Gemini Ultra. 

Last year’s AI Index was among the first publications to highlight the lack of standard benchmarks for AI safety and responsibility evaluations. The index has also been analyzing global public opinion. If you are from a non-Western industrialized nation, you are more likely to view AI positively than not. China has an 83% positive view, Indonesia 80%, and Thailand 77%, whereas Canada is at 40%, the U.S. 39%, and the Netherlands 36%.

I’ll close with the geopolitical situation. The U.S. still maintains a lead in AI, followed closely by China. However, this gap is tightening. My intention is not to exacerbate the idea of an AI arms race between China and the U.S., but instead to highlight the different approaches between the most advanced frontier AI model developers. 

Over the last several years, the U.S. has relied on a few proprietary model providers. Meanwhile, China has deeply invested in its talent base, and more importantly, an open-source environment. If this trend continues, and I appear next year, at this rate, China would surpass the U.S. in terms of model performance. 




Copyright © 2022-2024 All Rights Reserved
See articles for original source and related links to external sites.
