Description
If AI is to power the entire economy, inference must become affordable, scalable and widely available.
In this third part of Opening Voices, Quentin Adam continues the conversation with Steeve Morin, founder and CEO of ZML, to explore what it really takes to industrialise inference. They discuss:
why AI must move from “chatbots as products” to AI as an infrastructure primitive
why inference will power every sector: banks, startups, and industry
how efficiency gains (sometimes 5x, 10x, even 100x+) are still possible
why GPUs are not the only path forward
how new chips (TPUs, NPUs and emerging players) are reopening the semiconductor market
why power, density and optimisation now matter more than raw experimentation
This episode explains why the next wave is not about building better models, but about making inference economically viable at scale.
—
Episode Chapters: Making Inference Available
00:00 – Introduction and Context
01:38 – AI as a Primitive vs. AI as a Product
04:19 – The Economic Unit of the Token
05:15 – Scaling Compute for Inference
07:31 – A Revolution Comparable to Mobile
08:43 – Beyond GPUs
10:56 – Compiler Errors and Efficiency Waste
12:38 – Understanding Chips
15:31 – The New "Blue Ocean" of Semiconductors
20:40 – Nvidia's Strategy and Competition
21:43 – Conclusion and Next Episode