(March 23, 2022) Humans aren’t the best at preservation. Case in point: the world as we know it. Consider that India, which once had 1,100 languages, has lost 220 of them forever. That 20 percent is only the tip of the iceberg, and the findings across the world are just as concerning. Of the world’s 7,000 recognised languages, around half are currently endangered, and over 1,500 will no longer be spoken by the end of this century. Language redeemer Shruti Rijhwani decided to address this lacuna and help preserve them. The Pittsburgh-based coding whiz has used algorithms to help preserve languages like Hokkaido Ainu (spoken by a few elderly people in Japan), Griko (Italiot Greek), Yakkha (from Nepal and Sikkim) and Kwak’wala (estimates suggest only 200 people speak it fluently in Western Canada).
Back in 2011, a young girl who aspired to a career in technology set off for the Birla Institute of Technology and Science, Pilani, for a BSc in computer science. There, she strengthened her foundation before moving on to Carnegie Mellon University for an MS in language technologies, followed by a PhD in the same field at the School of Computer Science. Due to graduate in May 2022, Shruti Rijhwani was named to the Forbes 30 under 30 list for 2022. Her métier is restoring lost languages using artificial intelligence and machine learning and, in doing so, restoring world history. The Bloomberg PhD fellow first got interested in languages as a research intern at Microsoft Research (2015).
Now busy with her final thesis ahead of graduating with a PhD, Shruti Rijhwani speaks to Global Indian from Pittsburgh. “My PhD thesis at CMU encompasses my research on developing machine learning algorithms to improve the accuracy of extracting text in endangered languages from printed books. The books and documents in these languages do not have a digital format. My research works towards improving automatic digitisation using machine learning and natural language processing,” explains the traditionalist, who was honoured by Forbes in the science category.
A girl who loved computers
Shruti was brought up in Bengaluru, India. Her parents and sister, though far away from her, are a constant source of motivation. “I really miss my family in India,” says the language champion.
It is not just her family that Shruti misses, but also the visits she made to India before Covid-19 hit. “I really miss the food – I always look forward to visiting my favourite restaurants in Bengaluru whenever I visit my family,” says the NLP expert.
For the layperson, Shruti explains the quest into languages and technology that led her to become a research fellow at Microsoft Research and then Bloomberg AI, publish numerous papers, and eventually become consumed by the world of languages.
“I became interested in NLP after an internship at Microsoft Research in Bengaluru,” says Shruti, who then realised that many existing language technologies support only a limited number of languages rather than the 7,000+ languages in the world.
“Many communities that speak endangered languages want language technologies that work well for their language, but it’s challenging. My project tries to build algorithms that work well for endangered languages in collaboration with linguists and endangered language communities,” explains the language redeemer.
At heart, a language expert
For Rijhwani, the primary technical challenge was that most state-of-the-art NLP methods rely on a large amount of text resources, the data that ML needs, which endangered languages don’t have. “My research helps overcome a part of this challenge by developing techniques that perform well without much data,” she explains.
Complex as it sounds, the language redeemer breaks it down. “The algorithms take scanned images of non-digitised books and handwritten documents, automatically recognise characters, and produce digitised text. To recognise a character, the algorithm looks at its shape and tries to match it to an existing pattern,” explains Shruti excitedly. The recognition is not perfect, so she has developed algorithms that use NLP techniques to correct the errors automatically.
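To make the two steps concrete, here is a toy sketch in Python: match each scanned character shape to the closest known pattern, then repair misread words against a word list. It is only an illustration of the general idea and not Rijhwani’s actual method, which the article does not detail. The 3x3 glyph templates, the lexicon and all function names are made up for this example.

```python
from difflib import get_close_matches

# Hypothetical 3x3 binary glyph templates (1 = ink, 0 = blank paper).
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
    "O": ("111", "101", "111"),
}

# Hypothetical word list standing in for knowledge of the language.
LEXICON = ["TOLL", "TILT", "LOT"]


def pixel_distance(a, b):
    """Count the pixels on which two equally sized glyphs disagree."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognise_glyph(glyph):
    """Return the template character whose shape is closest to the glyph."""
    return min(TEMPLATES, key=lambda ch: pixel_distance(glyph, TEMPLATES[ch]))


def recognise_word(glyphs):
    """Recognise each glyph independently and join the characters."""
    return "".join(recognise_glyph(g) for g in glyphs)


def correct_word(word, lexicon):
    """Swap a recognised word for its closest lexicon entry, if one is similar enough."""
    matches = get_close_matches(word, lexicon, n=1, cutoff=0.6)
    return matches[0] if matches else word


# A scan of the word "TOLL" in which the T has lost its top bar and picked up a smudge.
smudged_t = ("010", "010", "011")
scanned = [smudged_t, TEMPLATES["O"], TEMPLATES["L"], TEMPLATES["L"]]

raw = recognise_word(scanned)        # the smudged T is misread as "I"
fixed = correct_word(raw, LEXICON)   # the lexicon pulls the word back to "TOLL"
print(raw, "->", fixed)
```

A word list is the crudest possible form of correction, and it presumes exactly the kind of resource that endangered languages tend to lack, which is why techniques that work without much data matter here.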
Incredibly honored to be recognized on the 2022 @Forbes 30 under 30 list in Science!
✨*HUGE* thanks to my collaborators and mentors, @mulix, @anas_ant, @gneubig
✨More about my recent work: https://t.co/Qbum8a2qvL @ForbesUnder30 #ForbesUnder30 https://t.co/xfdWhMffXP
Shruti Rijhwani (@shrutirij), December 2, 2021
Elaborating on NLP, the 29-year-old explains that it is a broad name for technologies that enable computers to understand human languages. “There are multiple applications: automatically translating text (e.g., Google Translate), searching the web, or automatic question answering. Some work I did early in my PhD builds NLP models for automatically processing entities in human language (like location and person names). Now, I am using NLP techniques to improve text extraction accuracy for endangered languages,” adds the PhD student, who is grateful for her mentor Graham Neubig’s guidance.
Those early days as a research intern at Microsoft Research inspired Shruti to apply for a PhD. “During two summers, I worked as a research intern at Bloomberg AI. I enjoyed both, and it gave me a good sense of how NLP research works. It also brought about collaborations with researchers at Bloomberg, leading to published research papers,” she adds.
Creating a niche
The Forbes 30 under 30 honoree reveals that she had a fairly normal childhood and grew up in a space where independence was respected. Though not drawn to science as a child, Shruti enjoyed computer programming. Now, with graduation due in May, she is deep in her dissertation and a final project on improving text extraction from endangered-language texts. She loves working in her beautiful office, writing code, doing data analysis, or talking with collaborators. “The environment at CMU is exciting for research as many students are working on diverse and challenging problems, so it’s fun to learn about and discuss different research ideas,” says the language restorer.
Shruti loves a challenge, so working on difficult research problems is hugely motivating. “I’m not one to run from a challenge. I enjoy taking up risky projects. I believe my projects have significant practical or real-world impact,” explains this student for life.
Predictably, her dream job also involves developing ML and natural language processing algorithms to solve large-scale, real-life challenges. “I want to develop NLP models and techniques to expand technologies to more languages and tasks, supporting populations that don’t have access to them,” says the coder.
The adventurer behind the coder
All coding and no play is not something Shruti subscribes to, either. “I love spending time outdoors: I often go hiking in Pittsburgh, there are so many amazing state and city parks. For a vacation, I’d go near the ocean as I love snorkelling and I’m a certified scuba diver,” says the language redeemer.
During Covid-19, Shruti discovered a talent for woodwork. She even bought a few power tools and ended up building multiple pieces of furniture and décor.
Her long-term partner has been her constant source of support through school and now her PhD. “He is incredible at helping me balance work and life, ensuring I take breaks and enjoy life outside of research,” reveals the ML and AI language whiz, who wants students, especially girls, to enter STEM. “Science and STEM research is awesome! It’s an exciting career, technologies are being rapidly developed and it’s a lot of fun to learn, and discover new things every day. It’s challenging no doubt, this direction, but it’s absolutely rewarding,” concludes the language redeemer.