UNESCO’s Hands-On AI Supervision

AI Safety & Benchmarking: Building Trustworthy Evaluation Ecosystems

34min | 02/01/2026

Description

Effective AI supervision requires reliable benchmarking ecosystems. Nicolas Miailhe discusses why benchmarks matter, how they should be constructed, and what regulators need to know about safety evaluations. The conversation highlights emerging international efforts to standardise safety testing and ensure comparability across models.

Speaker: Nicolas Miailhe (PRISM Eval)

Interviewer: Doaa Abu Elyounes, Programme Specialist, Ethics of AI Unit, UNESCO

Hosted on Ausha. See ausha.co/privacy-policy for more information.

About UNESCO’s Hands-On AI Supervision

UNESCO’s Hands-On AI Supervision: Lessons from Practice is a six-episode mini podcast series showcasing concrete lessons from the 2nd Expert Roundtable on AI Supervision, convened by UNESCO. Each episode distils insights from hands-on exercises with leading experts on AI risk mapping, evaluations, red teaming, benchmarking, cybersecurity, and engagement with market actors. Designed for regulators, policymakers, and practitioners, the series explores practical methodologies, emerging challenges, and the institutional capacities needed for effective AI oversight. Through focused conversations with specialists, the series provides accessible, actionable knowledge to strengthen technical readiness and foster ongoing dialogue across the global AI supervision community.

