CDLI Inclusion Project

CDLI Inclusive Speech Technology

Senses Hub partners with CDLI, GDI Hub (UCL), and Strathmore University to build inclusive automatic speech recognition (ASR) for non-standard speech across African languages. The project is funded by UK International Development and Google.org. It is collecting 50 hours per language of speech from people with dysarthria, stuttering, and cleft palate, across Kenyan English, Kiswahili, Ugandan English, Luganda, Kinyarwanda, and Rwandese English.

  • Build inclusive ASR for non-standard speech across African languages.
  • Collect 50 hours per language of dysarthria, stuttering, and cleft palate speech data.
  • Cover Kenyan English, Kiswahili, Ugandan English, Luganda, Kinyarwanda, and Rwandese English.
CDLI field participant taking part in inclusive speech data collection
6 African Languages · 300h Inclusive Speech Target

Project Overview

Senses Hub partners with CDLI, GDI Hub (UCL), and Strathmore University to build inclusive ASR for non-standard speech across African languages. The project is funded by UK International Development and Google.org. It is collecting 50 hours per language (300 hours in total) of dysarthria, stuttering, and cleft palate speech data across six languages: Kenyan English, Kiswahili, Ugandan English, Luganda, Kinyarwanda, and Rwandese English.

Key Partners

CDLI · GDI Hub · UCL · Strathmore University · iLabAfrica · Njeri Maria Foundation

Funders

UK International Development · Google.org

Non-Standard Speech Datasets

50 hours per language across 6 African languages (300 hours in total). Conditions covered are dysarthria, stuttering, and cleft palate. An open-source release is planned.

Custom Cards Workshops

Co-design phrase collection in Nairobi (June 2025). Languages include English, Kiswahili, Sheng, and regional dialects.

Innovation Sprint 2025

5-month hackathon (July to November 2025) across 3 tracks: Research, Modelling, and Product. Prize pool: $5,000. Demo Day: 21 November 2025 at Strathmore University.

Demo Day video embed (captioned):

Ethics and Consent Framework

Dedicated section with a plain-language explanation of how speech data is collected, stored, anonymised, and used. This framework should be linked from all project pages sitewide.

  • Informed consent before recording
  • Participant-rights and withdrawal process
  • Secure storage and role-based access controls
  • Clear data use policy for research and open-source outputs

Impact and Outputs

Updated quarterly via the CMS. Includes links to published papers and cdl-inclusion.com.

6 Languages · 300 Total Hours (Target) · 3 Sprint Tracks · 1 Open-Source Release Wave

Get Involved - 4 Pathways

1. Record Your Speech

Contribute your voice to improve inclusive ASR models.

Start Recording

2. Annotator / Speech Therapist

Support labelling and clinical interpretation of speech data.

Apply as Specialist

3. Developer / Researcher

Build models, tools, and accessible applications with the datasets.

Join the Build

4. Organisation Partner

Collaborate on training, deployments, and community outreach.

Partner With Us

Contact: CDLI@senseshub.vision