Skip to main content

New projects to promote locally developed language datasets across Africa

The Lacuna Fund announces its second cohort of supported projects, whose teams will create openly accessible text and speech datasets that will fuel natural language processing (NLP) technologies in 29 languages across Africa.
A man looking at financial reports on his laptop.
Adeolu Eletu / Unsplash
One project is developing a speech dataset that makes it possible to access digital financial services in some Ghanaian local indigenous languages.

Funding recipients will produce training datasets in Eastern, Western, and Southern Africa that will support a range of needs for low-resourced languages. These needs include machine translation, speech recognition, named-entity recognition, part-of-speech tagging, sentiment analysis, and multi-modal datasets. All the datasets will be locally developed and owned, and will be openly accessible to the international data community.

“In South Africa, the government uses chatbots to provide daily updates on COVID,” explained Vukosi Marivate, Absa Chair of Data Science at the University of Pretoria. “Right now, translating those updates to Latin languages is really easy, but the datasets necessary to translate those updates to a range of African languages don’t exist, which means that the government isn’t currently able to communicate with many of its people in their native languages. That is one of the many examples of why we need this work now,” he said.

For more information on the selected projects, please visit

About the Lacuna Fund:

The Lacuna Fund is the world’s first collaborative effort to provide data scientists, researchers, and social entrepreneurs in low- and middle-income contexts with the resources they need to produce training datasets that address urgent problems in their communities. It began as a collaborative between The Rockefeller Foundation,, and IDRC, with support from the German development agency GIZ on behalf of the Federal Ministry for Economic Cooperation and Development (BMZ). It has since evolved into a multi-stakeholder engagement composed of technical experts, thought leaders, local beneficiaries, and end users. The Lacuna Fund is committed to creating and mobilizing training datasets that solve urgent local problems and lead to a step change in machine learning’s potential worldwide.