(December 12, 2022) During his high-school years at Northwest Guilford High School, Neil Shah started looking for opportunities to get involved in Computer Science Research. He emailed many professors looking for an opportunity to help them with their research, even for free.
Neil had programming experience and skills, and a desire to learn, but no degree or advanced training. Eventually, Prof. Nagiza Samatova, a professor at NC State University, responded to his email, and he ended up spending a summer helping her graduate students with their research projects.
“This experience helped me discover that I had a real passion for getting deep into problems. I enjoyed wracking my brain on one problem for a long time and this neuroticism served me well, then and now,” smiles Neil Shah, now Lead Research Scientist at Snap Inc., Seattle. His work broadly spans data mining, machine learning, network science and computational social science. Over the years, his research has resulted in 45+ journal publications, besides best paper awards.
The immigrant life
Neil’s parents moved from Mumbai to the US when they were about 30, and he was one-and-a-half years old. His father works as Director, Global Customs Compliance at a textile company, while his mother is a Staff Software Quality Assurance Engineer at a fuel dispenser manufacturing company.
“My parents are first generation immigrants, and they worked hard to build a life for me in this country. They instilled great values in me, especially a strong work ethic, integrity, and persistence,” he shares with Global Indian. For the first few years, the Shah family lived in Raleigh, NC, US and later moved to Greensboro, where Neil eventually graduated from high school. At home, he enjoyed playing video games, browsing the internet and finding tutorials to learn how to program software, etc. In middle school, his school required students to purchase TI-83+ graphing calculators to help them learn some concepts in algebra / geometry.
First steps as a coder
One of his first serious experiences with programming was using the simple programming language built into these calculators to write basic math and science software. “I also used to write simple ‘choose your own adventure’ (CYOA) style games on the TI-83+,” says the 30-year-old, who enjoyed creating new tools.
Interestingly, his collaboration with Prof. Samatova, which began in his high-school years, continued for years afterwards. He also worked with her daughter, Katie (also in high school at the time), on a major research competition for high school students.
“Nagiza and her colleague Prof. Anatoli Melechko mentored us on a project towards identifying instabilities in plasma in computer-simulated nuclear fusion reactor data, which ended with us winning $50K as a team ($25K each between myself and Katie), and helping us pay for undergraduate schooling,” smiles Neil, who went on to join NC State for undergraduate schooling after finishing high school.
Data mining
As he did research at NC State University, Neil also worked on data management and compression – namely, how to handle storage and indexing of very large datasets.
“One particularly fascinating aspect of data mining and machine learning is that a large amount of data generated today is social in nature, by which I mean that it reflects human behaviour and actions. For example, how humans interact with each other, or how they choose to spend their time watching online videos or engaging with content.
“These types of interactions create immensely valuable data that fundamentally encapsulates information about how humans behave. This data can be used as a lens into understanding people, which is a central focus of the computational social science discipline.” He says understanding that human behaviour has predictability and order was something extremely enlightening for him. Neil graduated with a BS in Computer Science and a Minor in Mathematics in 2013.
PhD from CMU
Neil spent a little over four years at Carnegie Mellon University, where he pursued his PhD from 2013 to 2017, starting immediately after graduating from NC State University.
“My work at CMU was focused on understanding and modeling large-scale graph data, specifically in the context of identifying anomalous, suspicious or abusive behaviours in social networks and online platforms,” explains Neil.
Given that online perception is so critical to our impressions of online brands, influencers, and merchants, there are tremendous financial and social incentives to manipulate this perception, for instance, by purchasing fake followers on social platforms, fake reviews on rating and e-commerce platforms, says the research scientist.
Neil’s thesis focused on methods to automatically discover such nefarious behaviours in large-scale graph datasets by identifying the anomalous interaction patterns that these behaviours leave behind as traces. These methods were used in deployed systems at Google, Flipkart, Twitch and more.
After defending his PhD in October 2017, Neil worked with renowned cyberspace expert Prof. Srijan Kumar to write a survey paper titled “False Information on the Web and Social Media.” It provided an overview of a large variety of relevant academic works on these topics. This work has been cited over 370 times in the last few years.
At Work
He joined Snap very shortly after completing his PhD, towards the end of 2017. He leads initiatives in graph ML and manages a team of scientists, engineers and research interns towards development of state-of-the-art graph ML methods.
“My team works on both enabling internal applications of graph ML methods to business problems (recommendation and ranking models), as well as impactful research that is externally visible, accessible (e.g. at top conferences) and open-source,” he says.
His work mostly focuses on machine learning techniques on graph data, towards applications of modeling user behaviour on social network data. This includes improving user experience by detecting fake users, fraudulent actions and spam, as well as bettering ranking and recommendation systems.
Graph ML
“Graphs” are a fundamental data structure in computer science which represent objects (called nodes or vertices) and the interactions between them (called edges). Graph ML is a branch of machine learning which seeks to make sense of the relational data encoded in graph structure, towards applications like modeling and predicting behaviours on graphs (e.g. What will a person do in the future? Which other people or objects will they interact with?).
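To make the idea of nodes and edges concrete, here is a minimal Python sketch. It is purely illustrative and is not the approach used by Neil or his team: the names and friendships are hypothetical toy data, and the scoring rule is the classical “common neighbours” heuristic, which simply guesses that two people who share many connections are likely to interact in the future.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph as a mapping from each node to its set of neighbours."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def common_neighbour_scores(graph, node):
    """Score every non-neighbour of `node` by how many neighbours it shares with `node`."""
    scores = {}
    for neighbour in graph[node]:
        for candidate in graph[neighbour]:
            # Skip the node itself and anyone it is already connected to.
            if candidate != node and candidate not in graph[node]:
                scores[candidate] = scores.get(candidate, 0) + 1
    return scores


# Hypothetical toy data: each pair is an edge (an interaction between two people).
edges = [
    ("ana", "ben"),
    ("ana", "cal"),
    ("ben", "dee"),
    ("cal", "dee"),
    ("dee", "eli"),
]
graph = build_graph(edges)
scores = common_neighbour_scores(graph, "ana")
print(scores)
```

In this toy example, the only candidate for “ana” is “dee”, who shares two of her connections. Modern graph ML methods learn far richer patterns from the graph than this hand-written rule, but the underlying question is the same.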
Research
A prolific researcher, Neil has a long list of works and publications to his credit. At Twitch, for instance, the popular livestreaming platform that allows gaming enthusiasts to find gaming content and creators, he helped tackle a major “viewbotting” problem, in which streamers were paying botnet providers to inflate viewership metrics. Neil’s work was published at TheWebConf 2017.
At Microsoft, Neil and his team built the Microsoft Academic Graph, working to measure the impact of scientific research in ways that went beyond simple count-based metrics like citation count, h-index and journal impact factors, he says. At his first internship, at Lawrence Livermore National Laboratory, he worked to automatically identify and summarise behavioural patterns in time-evolving graph datasets. He has also worked on identifying misinformation from website screenshots in Twitter data.
Scholarships
Neil was spared major financial challenges because of quite a few scholarships. He was able to offset a significant chunk of his schooling costs by pursuing undergraduate research at NC State University, getting his BS in Computer Science without any debt.
Neil says he was lucky to get his PhD “for free,” given how the Carnegie Mellon University CS program operates. “My research and stipend here was also supported by the NSF Graduate Research Fellowship, which allowed me to keep a reasonable standard of living as I studied,” he says.
Future plans
“I would like to continue doing research in industry. I love constantly learning and improving myself incrementally,” says Neil. He finds it immensely rewarding to help others understand how to think about the impact of problems, break them down into achievable steps, and persist until they are able to contribute to scientific innovation, and to see their long-term success and growth.
In leisure…
“I enjoy reading, lifting weights, and playing video games,” says Neil, who has been reading quite a few Stephen King books lately. He has been lifting weights for many years now. “I used to compete in powerlifting when I was in graduate school,” says Neil, who finds it a therapeutic and solitary activity after a long day of thinking. He can also spend hours playing StarCraft 2 and Dota 2, two of the biggest e-sports.