News

  • 8 Feb 2025: Invited talk on An Introduction to Reasoning Models with DeepSeek R1 as part of CSE Department Day, IIT Hyderabad. The talk introduces reasoning models, particularly DeepSeek R1 and open-source reasoning efforts initiated since R1's release. [slides]
  • 12 Jan 2025: Lecture on Multilingual Language Modeling as part of winter school on "Deep Learning for Vision and Language Modelling" at IIT Guwahati. The talk covers various aspects of multilingual learning from the beginning of the deep learning era to current multilingual LLMs. [slides]
  • 26 Dec 2024: Honoured to receive the KnowDis Machine Learning Award for 2024 along with Prof. Mitesh Khapra [article]
  • 7 Dec 2024: Talk about NLP at scale for Indian languages on the occasion of the 25th anniversary celebrations of Language Technologies Research Centre (LTRC), International Institute of Information Technology Hyderabad (IIITH). Glad to participate in the celebrations and meet a lot of old friends, colleagues and mentors from across India. Congratulations to LTRC on this great milestone! [talk recording] [slides]
  • 21 Aug 2024: Honoured to be given the opportunity to serve as Standing Reviewer for the Transactions of ACL journal.
  • 14 Aug 2024: Two of our papers got awards at ACL 2024: (a) Area Chair Award for RomanSetu, (b) Outstanding Paper Award for IndicLLMSuite. Honoured at this recognition from ACL.
  • 9 Aug 2024: Happy to share a new version of the survey tutorial on Extending English Large Language Models to New Languages. This is an extended and revised version of presentations I did at IIT Hyderabad, IIIT Delhi (Summer School 2024), IIIT Hyderabad (IASNLP Summer School) and Microsoft. Also introducing a GitHub repo for reading material, code, etc. on this research area. [slides]    [github]
  • 8 Jul 2024: New pre-print on a new method for vocabulary expansion and initialization (ConstrainedWord2Vec) for language expansion in LLMs along with a comparison of many approaches [arxiv]
  • 16 May 2024: 3 papers accepted to ACL 2024
  • 2 Apr 2024: Happy to share a survey tutorial on Extending English Large Language Models to New Languages. This is an extended version of presentations I did at IIT Hyderabad and Microsoft. [slides]    [linkedin]
  • 11 Mar 2024: New pre-print on the IndicLLMSuite, AI4Bharat's large-scale collection of data resources for training Indian language LLMs. [arxiv]    [github]
  • 8 Mar 2024: New pre-print on using romanization for cross-lingual transfer to non-Latin script languages in autoregressive English-heavy LLMs. We call this method RomanSetu, and it shows promising results. [arxiv]
  • Feb 2024: Honoured to be given the opportunity to serve as area chair at ACL Rolling Reviews (ARR).
  • 27 Jan 2024: Our paper on machine translation for extremely low-resource languages that utilizes lexical similarity has been accepted to EACL 2024. [pre-print]
  • 26 Jan 2024: Work from my team at Microsoft Translator on supporting 2 new Indian languages (Manipuri, Chhattisgarhi) is now live. [Details]
  • 25 Jan 2024: AI4Bharat releases Airavata, a new instruction-tuned model for Hindi, along with finetuning datasets and a benchmark collection. [blog]
  • 15 Nov 2023: Our paper "A Comprehensive Analysis of Adapter Efficiency" has been accepted to CoDS-COMAD 2024. [pre-print]
  • 6 Oct 2023: Three papers accepted to EMNLP 2023 (1 Main, 2 Findings). All three works will be presented as posters at EMNLP. [Details]
  • 5 Oct 2023: Work from my team at Microsoft Translator on supporting 4 new Indian languages (Bhojpuri, Bodo, Dogri, and Kashmiri) is now live. [Details]
  • 1 Sep 2023: Starting new role as Principal Applied Researcher at Microsoft India.
  • 26 Jun 2023: IndicSUPERB - our benchmark for Speech Language Understanding tasks accepted to AAAI [link]
  • 26 Jun 2023: Shrutilipi - our work on mining ASR corpora from All India Radio accepted to ICASSP [link]
  • 20 Jun 2023: Our work on Comprehensive Analysis of Adapter Efficiency accepted to the ES-FoMo: Efficient Systems for Foundation Models workshop at ICML (non-archival) [arxiv]
  • 25 May 2023: Public Release of IndicTrans2, the first MT system supporting 22 Indian languages [arxiv]    [Developer Site] [Try it out]
  • 23 May 2023: Jugalbandi, a chatbot powered with Azure OpenAI and AI4Bharat translation/ASR/TTS models, showcased at Build2023. [video]
  • 22 May 2023: New pre-print on example selection for MT with LLMs [arxiv]
  • 19 May 2023: Work from my team at Microsoft Translator on supporting 4 new Indian languages (Konkani, Maithili, Sindhi, Sinhala) is now live. [Details]
  • 12 May 2023: New pre-print on Comprehensive Analysis of Adapter Efficiency [arxiv]
  • 10 May 2023: New pre-print on machine translation for extremely low-resource languages [arxiv]
  • 1 May 2023: Four papers on Indian language NLP accepted to ACL 2023. [Details]
  • 24 Jan 2023: Invited talk at IIT Hyderabad on Mining Datasets at scale for Building High-quality NLP Models [slides]
  • 28 Jul 2022: Inauguration of the AI4Bharat (https://ai4bharat.iitm.ac.in) center at IIT Madras.
  • 14 Apr 2022: Invited talk at IISER Bhopal on Multilingual Learning and Mining Datasets for Building High-quality NLP Models [slides]
  • 10 Mar 2022: IndicNLG Suite released with 5 generation tasks for 11 Indian languages [paper] [homepage]
  • 4 Mar 2022: I conducted lectures on sequence labeling and sequence-to-sequence learning covering RNN, LSTM, Transformers, etc. for CS-772 (Deep Learning for NLP) by Prof. Pushpak Bhattacharyya.
  • 4 Mar 2022: Our paper on IndicBART, a seq2seq pretrained model for 11 Indian languages accepted to Findings of ACL 2022
  • 31 Dec 2021: I presented a tutorial on the AI4Bharat Initiative at ICON 2021 with Mitesh Khapra and Pratyush Kumar
  • 10 Dec 2021: Our paper on IndicWav2Vec, a pretrained speech model for 40 Indian languages accepted to AAAI 2022
  • 4 Dec 2021: I presented an invited talk at Tamil Internet Conference 2021 on Indian Language Computing: A Multilingual Perspective
  • 20 Oct 2021: Work from my team at Microsoft Translator on supporting Dhivehi (language spoken in Maldives) is now live. [Details]
  • 5 Aug 2021: Glad to be part of the Samanantar team that received the NASSCOM AI Gamechangers Award 2021.
  • 15 Aug 2021: Glad to chair the SIGKDD 2021 Data Science in India Workshop networking session on NLP.
  • 15 Jul 2021: I conducted sessions on Machine Translation at the ACM NLP Summer school.
  • 1 Jul 2021: Our survey paper on Multilingual Pre-trained models is now available on arxiv.
  • 15 Jun 2021: Happy to be part of the CIIL panel discussion on "Language Resources for AI in Indian Languages".
  • 18 Apr 2021: My team at Microsoft India will be presenting our work at EACL 2021 on large-scale multilingual transliteration for Indian languages on mined transliteration corpora of 600k word pairs between English and 10 Indic languages.
  • 13 Apr 2021: We at AI4Bharat with EkStep Foundation released Samanantar, the largest publicly available corpus for Indian languages containing 46M sentence pairs between English and 11 Indian languages.
  • 15 Feb 2021: I conducted lectures on sequence labeling and sequence-to-sequence learning covering RNN, LSTM, Transformers, etc. for CS-772 (Deep Learning for NLP) by Prof. Pushpak Bhattacharyya.
  • 2 Jan 2021: Glad to chair an NLP Session at CoDS-COMAD 2021.
  • 20 Dec 2020: Glad to chair a Machine Translation Session at ICON 2020.
  • 03 Dec 2020: Presented a talk at Prof. Tanmoy Chakraborty's ML course (IIIT Delhi) on Bridging the gap between Experimental Prototypes and Production ML systems.
  • 10 Nov 2020: I will be part of a panel discussion on NLP/MT for low-resource languages at WMT 2020.
  • 19 Oct 2020: Invited Talk on Indic NLP: A Multilinguality and Language Relatedness Perspective at Vaibhav Summit (Organized by MyGov). [slides]
  • 18 Oct 2020: Lecture on Understanding the Indian Languages: Challenges & Opportunities for Atal Faculty Development Program on Artificial Intelligence in Natural Language Processing at KIIT University, Bhubaneswar. [slides]
  • 29 Sep 2020: Work from my team at Microsoft Translator on supporting Assamese is now live. [Details]
  • 22 Sep 2020: IndicNLPSuite released containing large monolingual corpora, BERT models, embeddings and NLU datasets.
  • 15 Sep 2020: Our paper on NLP resources for Indian languages, IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages, accepted to EMNLP Findings 2020 [preprint].
  • 13 Aug 2020: Work from my team at Microsoft Translator on supporting Odia is now live. [Details]
  • 09 Aug 2020: IITB Parallel Corpus v3.0 released. 47,000 new sentence pairs added. See details [HERE].
  • 09 Aug 2020: Finally documented the BrahmiNet-ITRANS transliteration scheme. See details [HERE].
  • 15 Jul 2020: Indian language multilingual translation shared task for WAT 2020 launched. We are resuming this task with larger parallel corpora. See details [HERE].
  • 09 Jul 2020: Bamdev presented our paper on Geometric Meta-embeddings at the REPL4NLP workshop (ACL 2020) [VIDEO]
  • 09 Jul 2020: We showcased the AI4Bharat-IndicNLP dataset at the REPL4NLP workshop (ACL 2020) [VIDEO]
  • 27 Jun 2020: It was great to moderate a talk by my advisor Prof. Pushpak Bhattacharyya on Imparting Sentiment and Politeness on Computers at the IIT Alumni Center Bangalore [video]
  • 10 Jun 2020: ACM Computing Surveys has accepted our survey paper on Multilingual NMT. Camera-ready coming soon. [HERE]
  • 24 May 2020: Keynote Talk on NLP for Indian Languages: A Language Relatedness Perspective at 5th WILDRE workshop (under LREC 2020). [slides]
  • 16 May 2020: Lecture on Machine Translation at IIT Hyderabad as part of NLP course. [slides]
  • 16 May 2020: IndoWordnet Parallel Corpus v0.2 released. Fixes critical issues with v0.1. [link]
  • 01 May 2020: IndoWordnet Parallel Corpus is being used for the WMT 2020 shared task on similar language translation for Hindi-Marathi translation. [link]
  • 30 Apr 2020: AI4Bharat-IndicNLP dataset released (built in collaboration with IIT Madras). Contains NLP resources for 10+ Indian languages. [Link to paper]
  • 15 Apr 2020: Work from my team at Microsoft Translator on supporting 5 new Indian languages (Marathi, Gujarati, Punjabi, Malayalam and Kannada) is now live. [Details]
  • 25 Mar 2020: IndoWordnet Parallel Corpus v0.1 released. [link]
  • 20 Mar 2020: Manuscript of Utilizing Language Relatedness to improve Machine Translation: A Case Study on Languages of the Indian Subcontinent available on arxiv. [link]
  • Feb 2020: IndicNLP library featured on analyticsindiamag [link]
  • 23 Jan 2020: IndicNLP library featured on AnalyticsVidhya [link]
  • 16 Jan 2020: Lecture on Neural MT at CEP course on Deep Learning for Natural Language Processing at IIT Patna [slides]
  • 05 Jan 2020: Our revised and expanded survey on Multilingual Neural MT is available.
  • 26 Oct 2019: Tutorial on Multilingual NMT accepted at COLING 2020, Barcelona, September 2020 with Raj Dabre and Chenhui Chu.
  • 26 Oct 2019: Workshop on Asian Translation (WAT) 2020 to be co-located with AACL/IJCNLP 2020. We will have en-hi and en-ta tasks. We may also have a multilingual Indic language translation task.
  • 30 Aug 2019: Started a collaborative catalog for Indian language NLP resources. Please contribute to improve the catalog.
  • 30 Aug 2019: Invited talk at NASSCOM DSAI-CoE on NLP for Indian Languages: A Language Relatedness Perspective. [slides]
  • 29 Jul 2019: Presented our paper Learning Multilingual Word Embeddings in Latent Metric Space: A Geometric Approach at ACL 2019. [video]
  • 27 Jul 2019: Tutorial at the IIT Alumni Center Bengaluru AI Deep Dive Workshop 2019 on Natural Language Processing - A Distributional Approach. [slides]
  • 4 Sep 2018: Work from my team at Microsoft Translator on supporting Telugu is now live. [Details]