(March 17, 2024) In the scenic mountains of Himachal Pradesh lies a river valley – Kangra – home to warm Himachali people who cherish their language and customs. The language extends its influence to Northern Punjab, where Delhi teen Navvye Anand has his familial roots. Though his family settled in Delhi, his connection to his culture and heritage remains strong through the Kangri language. Upon discovering that Kangri was on UNESCO’s list of 10 endangered languages, he felt called to take action. “I started to look for ways to revitalise the language, and focused primarily on leveraging the oral medium. Recognising the dearth of written literature in Kangri, I turned to ASR (Automatic Speech Recognition) – which converts human speech into written text,” he tells Global Indian.
Traditionally, linguists would spend hours engaging with local people to manually transcribe oral traditions, often encountering bottlenecks due to the enormous time and effort required, along with the scope for human error. “ASR can be used to streamline the transcription process. Recent advancements in AI made it possible to use ASR at a much higher level than before,” adds Navvye, whose project, Cross-Lingual Automatic Speech Recognition for Endangered Languages, won him the Spirit of Ramanujan Grant, worth $4,800. Each year, the University of Virginia and the Templeton World Charity Foundation jointly award the grant to high school students who demonstrate exceptional talent in mathematics and science.
With the grant, Navvye attended the Wolfram High School Summer Program in the US. “I learnt from Dr Stephen Wolfram, who is a pioneering computer scientist and a linguist, and I had the opportunity to refine my linguistic abilities and learn more about linguistics,” says the teen, who also attended the Euler Circle Program on number theory. “I used the grant to support my education.”
Love for words and languages
Growing up with a grandfather who was a polyglot fluent in seven languages, Navvye was attracted to words and languages like a moth to a flame. “My love for languages is inherited; it played a crucial role in my upbringing,” says the teen, adding, “We bonded over our common love for language, and every time I found a new Pandora’s box in an unknown language, I’d walk up to him and discuss it. I loved talking about languages with him. We’d often fixate upon certain peculiarities of a language – such as the resemblance between shakkar (jaggery), an Urdu word, and sugar in English. My love for linguistics was innately embedded in me.”
After his grandfather passed away in 2022, Navvye decided to pay him homage by working towards revitalising a dialect of his native language. Moreover, his visit to the Kangra Valley region in 2018 made him fall in love with the warmth of the people. “They always had some folklore or a story to tell and were proud of their culture and heritage. I thought it was paradoxical that Kangri was an endangered language because these people love their culture and heritage. I thought if I could unite their love with advancements in AI, then that would be a great project to start,” he reveals.
Using AI to preserve Kangri
This led him to read papers from past researchers who had used ASR for other languages. One particularly intriguing study was by Emily Prud’hommeaux, an assistant professor at Boston College, who attempted to revitalise Seneca – an endangered language in the US. “Her research papers helped me understand the methodology and how researchers use ASR.” Later, he reached out to Dr Shweta Chauhan, a researcher at the National Institute of Technology Hamirpur, who had curated a text corpus for the Kangri language. “She invited me to intern at her lab, and ever since she has been an invaluable mentor.”
Advances in ASR allow linguists to record conversations in their natural environment and capture their essence without manually digitising the oral material. Explaining the process, Navvye says that audio from a regular microphone can be fed into the ASR model, which converts the speech into a text transcription. “Currently, the accuracy stands at 85 percent, and over time, my aim is to gather additional data and enhance the system to achieve a target accuracy of 95 percent.” The project operates on two fronts: on one, Navvye personally collects data by recording conversations using ASR; on the other, he connects with local translators who send him audio transcriptions using ASR. “This allows me to build a robust audio repertoire. Additionally, I’m partnering with the Indian government through their Bhashini program, leveraging their resources to collect more Kangri data. I’m looking forward to expanding the audio repertoire as it will provide a vast dataset to further fine-tune the model with improved accuracy.”
When Navvye started, he was only 15, but his passion to translate his dream into reality kept him going, along with the support of his parents and the people of Kangra. Along the way, he encountered some technical hiccups, primarily related to data collection, cleaning, model selection and fine-tuning. “After experimenting with other models, I settled for OpenAI’s Whisper, which is the state-of-the-art speech recognition model. It is difficult to bring a simple idea to fruition but when the cause is noble, people will support you,” he adds.
Creating an impact
In the last two years, Navvye’s work has empowered various translators by connecting them to MNCs operating in the Kangri domain. “I’ve helped a couple of translators gather the requisite information to contact Lenovo, created their LinkedIn profiles, and filled out technical documents for them,” says Navvye, who is also creating awareness about the importance of the Kangri language among school children. Asked about the potential reason behind Kangri being an endangered language, he promptly replies, “More people are now speaking Hindi as compared to Kangri as they are dissuaded from speaking their native dialect due to globalisation. It’s not considered cool enough – something we need to counter.”
Proud to be preserving his ancestors’ fading language, Navvye says the fruits of his labour have been immense but the job isn’t finished yet. “There is a long way to go but I’m happy with the way it’s been going. I’m honoured to join the efforts towards the preservation of my language, which is a rich amalgamation of history and discourse,” adds Navvye. As he plans to join the California Institute of Technology this fall, he wants to stay committed to the project, confident that technology will let him continue the work remotely. “I will have a proper support system to enhance my knowledge. I already have a new idea about classifying dialects using embeddings which can help cluster different dialects and identify them,” reveals Navvye, adding that it can be used as a model for other languages.
Offering advice to fellow teenagers, Navvye asks them to stop being afraid of taking a leap of faith. “Being afraid of failure is a sign of failure itself,” he says, adding, “Don’t worry if it will work out or not, you will find your way. In case it doesn’t work out, you will learn something new in the process. Maybe you can tweak it so that it works better in the future.”