But few people had enough mastery of the language to manually transcribe the audio. Inspired by voice assistants like Siri, Mahelona began looking into natural-language processing. “Teaching the computer to speak Māori became absolutely necessary,” Jones says.
But Te Hiku faced a chicken-and-egg problem. To build a te reo speech recognition model, it needed an abundance of transcribed audio. To transcribe the audio, it needed the advanced speakers whose small numbers it was trying to compensate for in the first place. There were, however, plenty of beginning and intermediate speakers who could read te reo words aloud better than they could recognize them in a recording.
So Jones and Mahelona, along with Te Hiku COO Suzanne Duncan, devised a clever solution: rather than transcribe existing audio, they would ask people to record themselves reading a series of sentences designed to capture the full range of sounds in the language. To an algorithm, the resulting data set would serve the same function. From those thousands of pairs of spoken and written sentences, it would learn to recognize te reo syllables in audio.
The team announced a competition. Jones, Mahelona, and Duncan contacted every Māori community group they could find, including traditional or hug dance troupes and waka ama canoe-racing teams, and revealed that whichever one submitted the most recordings would win a $ 5,000 grand prize.
The entire community mobilized. Competition got heated. One Māori community member, Te Mihinga Komene, an educator and advocate of using digital technologies to revitalize te reorecorded 4,000 phrases alone.
Money wasn’t the only motivator. People bought into Te Hiku’s vision and trusted it to safeguard their data. “Te Hiku Media said, ‘What you give us, we’re here as kaitiaki [guardians]. We look after it, but you still own your audio, ‘”says Te Mihinga. “That’s important. Those values define who we are as Māori. ”
Within 10 days, Te Hiku amassed 310 hours of speech-text pairs from some 200,000 recordings made by roughly 2,500 people, an unheard-of-level engagement among researchers in the AI community. “No one could’ve done it except for a Māori organization,” says Caleb Moses, a Māori data scientist who joined the project after learning about it on social media.
The amount of data was still small compared to the thousands of hours typically used to train English language models, but it was enough to get started. Using the data to bootstrap an existing open-source model from the Mozilla Foundation, Te Hiku created its very first te reo speech recognition model with 86% accuracy.