About Me

I am a Principal Applied Researcher in the Microsoft Machine Translation team in Hyderabad, India. I am a founding member and co-lead of AI4Bharat, a research center based at IIT Madras that works to drive advances and build resources for Indian language NLP. I am honored to currently serve as an area chair for ACL Rolling Review (ARR) in the multilinguality and low-resource/efficient NLP areas. I have also previously served as an adjunct faculty member in the Department of Computer Science, IIT Madras.

My research areas are Natural Language Processing, Machine Learning, Information Extraction, and Retrieval.

My research interests include multilingual learning and LLMs, post-training of LLMs, reasoning and evaluation in LLMs, representation learning, NLP for related languages, machine translation, and transliteration. I am interested in building tools and resources for Indian language NLP.

Over the last decade, I have built or contributed to large-scale, broad-coverage resources such as the Indic NLP Library, IndicTrans/Sata-Anuvaadak Translation systems, IndicLLMSuite, Airavata LLM, IIT Bombay Parallel Corpus, Samanantar Corpus, Indic NLP/NLG Suite, and Aksharantar/BrahmiNet transliteration corpora.

I completed my Ph.D. in 2018 at the Department of Computer Science and Engineering, IIT Bombay. I did my research under the guidance of Prof. Pushpak Bhattacharyya at the Center for Indian Language Technology. My doctoral research work explored various facets of machine translation and transliteration between related languages.


News

  • 24 Jun 2025: Tutorial on Building Multilingual NLP Datasets at Scale at the IASNLP Summer School, IIIT Hyderabad. [slides]
  • 19 Jun 2025: Upcoming talk at OdiaGen (IIT Bhubaneswar) on reasoning models.
  • 22 May 2025: Hands-on tutorial from our team at Microsoft Build 2025 on training reasoning models at scale with open-source software, using Azure ML to simplify the process. [Demo at 27 min]    [Jupyter Notebook]
  • 16 May 2025: 3 papers accepted to ACL 2025. Congratulations to all my collaborators and students! Continuing our efforts to improve Indian language NLP and to understand multilingual models! [BhasaAnuvaad]    [CIA: Cross-lingual LLM Evaluation]    [RomanLens]
  • 12 Apr 2025: Happy to be part of panel discussions on Indian language NLP and LLMs at AI Days 2025, Hyderabad.
  • Apr 2025: Honoured to be part of the Academic Council at IIIT Hyderabad.
  • Apr 2025: Honoured to be part of the CLD Program Curriculum and Review Committee at IIIT Hyderabad.
  • 8 Feb 2025: Invited talk, "An Introduction to Reasoning Models with DeepSeek R1", as part of CSE Department Day, IIT Hyderabad. The talk introduces reasoning models, particularly DeepSeek R1, and the open-source reasoning efforts initiated since R1's release. [slides]
  • 12 Jan 2025: Lecture on Multilingual Language Modeling as part of the winter school on "Deep Learning for Vision and Language Modelling" at IIT Guwahati. The talk covers various aspects of multilingual learning, from the beginning of the deep learning era to current multilingual LLMs. [slides]