CDLI Inclusion Project

CDLI Inclusive Speech Technology

Senses Hub partners with CDLI, GDI Hub (UCL), and Strathmore University to build inclusive ASR for non-standard speech across African languages. The project is funded by UK International Development and Google.org. It is collecting 50 hours per language of dysarthria, stuttering, and cleft palate speech data across Kenyan English, Kiswahili, Ugandan English, Luganda, Kinyarwanda, and Rwandese English.


Explore Project Details · Join the Initiative

CDLI field participant taking part in inclusive speech data collection

6 African Languages

300h Inclusive Speech Target

Project Overview

Senses Hub partners with CDLI, GDI Hub (UCL), and Strathmore University to build inclusive ASR for non-standard speech across African languages. The project is funded by UK International Development and Google.org, and is collecting 50 hours per language of dysarthria, stuttering, and cleft palate speech data across Kenyan English, Kiswahili, Ugandan English, Luganda, Kinyarwanda, and Rwandese English.

Key Partners

CDLI · GDI Hub · UCL · Strathmore University · iLabAfrica · Njeri Maria Foundation

Funders

UK International Development · Google.org

Non-Standard Speech Datasets

50 hours per language across 6 African languages (300 hours in total). Conditions include dysarthria, stuttering, and cleft palate, with an open-source release planned.

Custom Cards Workshops

Co-design phrase collection in Nairobi (June 2025). Languages include English, Kiswahili, Sheng, and regional dialects.

Community participants co-designing phrase cards at the Nairobi workshop.

Facilitators and speech therapists guiding a multilingual phrase collection session.

Participant sharing non-standard speech samples during field recording across local dialects.

Researchers and community team reviewing consent forms and phrase workshop outcomes.

Innovation Sprint 2025

5-month hackathon (Jul-Nov 2025) across 3 tracks: Research, Modelling, and Product. Prize pool: $5,000. Demo Day: 21 Nov at Strathmore University.

Demo Day video (captioned)

Ethics and Consent Framework

A plain-language explanation of how speech data is collected, stored, anonymised, and used. This framework should be linked from every project page on the site.

Informed consent before recording
Participant-rights and withdrawal process
Secure storage and role-based access controls
Clear data use policy for research and open-source outputs

Impact and Outputs

Updated quarterly via the CMS. Includes links to published papers and cdl-inclusion.com.

6 Languages

300 Total Hours (Target)

3 Sprint Tracks

Open-Source Release Wave

Get Involved - 4 Pathways

1. Record Your Speech

Contribute your voice to improve inclusive ASR models.

Start Recording

2. Annotator / Speech Therapist

Support labelling and clinical interpretation of speech data.

Apply as Specialist

3. Developer / Researcher

Build models, tools, and accessible applications with the datasets.

Join the Build

4. Organisation Partner

Collaborate on training, deployments, and community outreach.

Partner With Us

Contact: CDLI@senseshub.vision