Publications

You can also find my articles on my Google Scholar profile, which may be more up to date. The list here, however, can be filtered by research area.

Filter by Research Area:

    2025

  1. Data and Model Centric Approaches for Expansion of Large Language Models to New languages
    Anoop Kunchukuttan, Raj Dabre, Rudramurthy V, Mohammed Safi Ur Rahman, Thanmay Jayakumar
    EMNLP (to appear), 2025
    [paper]
  2. RomanLens: The Role Of Latent Romanization In Multilinguality In LLMs
    Alan Saji, Jaavid Aktar Husain, Thanmay Jayakumar, Raj Dabre, Anoop Kunchukuttan, Ratish Puduppully
    arXiv preprint arXiv:2502.07424, 2025
    [paper]
    2024

  4. An Empirical Comparison of Vocabulary Expansion and Initialization Approaches for Language Models
    Nandini Mundra, Aditya Nanda Kishore, Raj Dabre, Ratish Puduppully, Anoop Kunchukuttan, Mitesh M Khapra
    CoNLL, 2024
    [paper]
  5. A Comprehensive Analysis of Adapter Efficiency
    Nandini Mundra, Sumanth Doddapaneni, Raj Dabre, Anoop Kunchukuttan, Ratish Puduppully, Mitesh M Khapra
    CoDS-COMAD, 2024
    [paper]
  6. Utilizing Lexical Similarity to Enable Zero-Shot Machine Translation for Extremely Low-resource Languages
    Kaushal Kumar Maurya, Rahul Kejriwal, Maunendra Sankar Desarkar, Anoop Kunchukuttan
    EACL, 2024
    [paper]
  7. How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages
    Anushka Singh, Ananya B. Sai, Raj Dabre, Ratish Puduppully, Anoop Kunchukuttan, Mitesh M Khapra
    ACL, 2024
    [paper]
  8. IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages
    Mohammed Safi Ur Rahman Khan, Priyam Mehta, Ananth Sankar, Umashankar Kumaravelan, Sumanth Doddapaneni, Sparsh Jain, Anoop Kunchukuttan, Pratyush Kumar, Raj Dabre, Mitesh M. Khapra
    ACL, 2024
    [paper]
  9. RomanSetu: Efficiently unlocking multilingual capabilities of Large Language Models via Romanization
    Jaavid Aktar Husain, Raj Dabre, Aswanth Kumar, Ratish Puduppully, Anoop Kunchukuttan
    ACL, 2024
    [paper]
  10. Synthetic Data Generation and Joint Learning for Robust Code-Mixed Translation
    Kartik, Sanjana Soni, Anoop Kunchukuttan, Tanmoy Chakraborty, Md. Shad Akhtar
    LREC-COLING, 2024
    [paper]
  11. Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs
    Sumanth Doddapaneni, Mohammed Safi Ur Rahman Khan, Dilip Venkatesh, Raj Dabre, Anoop Kunchukuttan, Mitesh M Khapra
    arXiv preprint arXiv:2410.13394, 2024
    [paper]
  12. BhasaAnuvaad: A Speech Translation Dataset for 13 Indian Languages
    Sparsh Jain, Ashwin Sankar, Devilal Choudhary, Dhairya Suman, Nikhil Narasimhan, Mohammed Safi Ur Rahman Khan, Anoop Kunchukuttan, Mitesh M Khapra, Raj Dabre
    arXiv preprint arXiv:2411.04699, 2024
    [paper]
  13. Pralekha: An Indic Document Alignment Evaluation Benchmark
    Sanjay Suryanarayanan, Haiyue Song, Mohammed Safi Ur Rahman Khan, Anoop Kunchukuttan, Mitesh M Khapra, Raj Dabre
    arXiv preprint arXiv:2411.19096, 2024
    [paper]
  14. Extending English Large Language Models to New Languages: A Survey
    Anoop Kunchukuttan
    [paper]
    2023

  16. IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation Metrics for Indian Languages
    Ananya Sai B, Tanay Dixit, Vignesh Nagarajan, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M Khapra, Raj Dabre
    ACL, 2023
    [paper]
  17. Evaluating Inter-Bilingual Semantic Parsing for Indian Languages
    Divyanshu Aggarwal, Vivek Gupta, Anoop Kunchukuttan
    NLP4ConvAI, 2023
    [paper]
  18. In-context Example Selection for Machine Translation Using Multiple Features
    Aswanth Kumar, Ratish Puduppully, Raj Dabre, Anoop Kunchukuttan
    EMNLP Findings, 2023
    [paper]
  19. Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models
    Ratish Puduppully, Anoop Kunchukuttan, Raj Dabre, Ai Ti Aw, Nancy F Chen
    EMNLP, 2023
    [paper]
  20. IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages
    Jay Gala, Pranjal A Chitale, Raghavan AK, Sumanth Doddapaneni, Varun Gumma, Aswanth Kumar, Janki Nawale, Anupama Sujatha, Ratish Puduppully, Vivek Raghavan, Pratyush Kumar, Mitesh M Khapra, Raj Dabre, Anoop Kunchukuttan
    TMLR, 2023
    [paper]
  21. Bhasha-Abhijnaanam: Native-script and romanized Language Identification for 22 Indic languages
    Yash Madhani, Mitesh M Khapra, Anoop Kunchukuttan
    ACL, 2023
    [paper]
  22. Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages
    Arnav Mhaske, Harshit Kedia, Sumanth Doddapaneni, Mitesh M Khapra, Pratyush Kumar, Rudra Murthy V, Anoop Kunchukuttan
    ACL, 2023
    [paper]
  23. IndicSUPERB: A speech processing universal performance benchmark for Indian languages
    Tahir Javed, Kaushal Bhogale, Abhigyan Raman, Pratyush Kumar, Anoop Kunchukuttan, Mitesh M Khapra
    AAAI, 2023
    [paper]
  24. Effectiveness of mining audio and text pairs from public data for improving ASR systems for low-resource languages
    Kaushal Bhogale, Abhigyan Raman, Tahir Javed, Sumanth Doddapaneni, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M Khapra
    ICASSP, 2023
    [paper]
  25. Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages
    Sumanth Doddapaneni, Rahul Aralikatte, Gowtham Ramesh, Shreya Goyal, Mitesh M Khapra, Anoop Kunchukuttan, Pratyush Kumar
    ACL, 2023
    [paper]
  26. Aksharantar: Towards building open transliteration tools for the next billion users
    Yash Madhani, Sushane Parthan, Priyanka Bedekar, Ruchi Khapra, Vivek Seshadri, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M Khapra
    EMNLP Findings, 2023
    [paper]
    2022

  28. Samanantar: The largest publicly available parallel corpora collection for 11 Indic languages
    Gowtham Ramesh, Sumanth Doddapaneni, Aravinth Bheemaraj, Mayank Jobanputra, Raghavan AK, Ajitesh Sharma, Sujit Sahoo, Harshita Diddee, Divyanshu Kakwani, Navneet Kumar, Aswin Pradeep, Kumar Deepak, Vivek Raghavan, Anoop Kunchukuttan, Pratyush Kumar, Mitesh Shantadevi Khapra
    TACL, 2022
    [paper]
  29. IndicBART: A Pre-trained Model for Indic Natural Language Generation
    Raj Dabre, Himani Shrotriya, Anoop Kunchukuttan, Ratish Puduppully, Mitesh M Khapra, Pratyush Kumar
    ACL Findings, 2022
    [paper]
  30. Towards Building ASR Systems for the Next Billion Users
    Tahir Javed, Sumanth Doddapaneni, Abhigyan Raman, Kaushal Santosh Bhogale, Gowtham Ramesh, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M Khapra
    AAAI, 2022
    [paper]
  31. IndicNLG benchmark: Multilingual datasets for diverse NLG tasks in Indic languages
    Aman Kumar, Himani Shrotriya, Prachi Sahu, Amogh Mishra, Raj Dabre, Ratish Puduppully, Anoop Kunchukuttan, Mitesh M Khapra, Pratyush Kumar
    EMNLP, 2022
    [paper]
  32. IndicXNLI: Evaluating multilingual inference for Indian languages
    Divyanshu Aggarwal, Vivek Gupta, Anoop Kunchukuttan
    EMNLP, 2022
    [paper]
  33. Bilingual tabular inference: A case study on Indic languages
    Chaitanya Agarwal, Vivek Gupta, Anoop Kunchukuttan, Manish Shrivastava
    NAACL, 2022
    [paper]
    2021

  35. An Empirical Investigation of Multi-bridge Multilingual NMT models
    Anoop Kunchukuttan
    arXiv preprint arXiv:2110.07304, 2021
    [paper]
  36. Proceedings of the 8th Workshop on Asian Translation (WAT2021)
    Toshiaki Nakazawa, Hideki Nakayama, Isao Goto, Hideya Mino, Chenchen Ding, Raj Dabre, Anoop Kunchukuttan, Shohei Higashiyama, Hiroshi Manabe, Win Pa Pa, Shantipriya Parida, Ondřej Bojar, Chenhui Chu, Akiko Eriguchi, Kaori Abe, Yusuke Oda, Katsuhito Sudoh, Sadao Kurohashi, Pushpak Bhattacharyya
    WAT, 2021
    [paper]
  37. Itihasa: A large-scale corpus for Sanskrit to English translation
    Rahul Aralikatte, Miryam de Lhoneux, Anoop Kunchukuttan, Anders Søgaard
    WAT, 2021
    [paper]
  38. The AI4Bharat Initiative
    Anoop Kunchukuttan, Mitesh Khapra, Pratyush Kumar
    ICON, 2021
    [paper]
  39. Machine Translation and Transliteration involving Related, Low-resource Languages
    Anoop Kunchukuttan, Pushpak Bhattacharyya
    CRC Press, 2021
    [paper]
  40. A large-scale evaluation of neural machine transliteration for Indic languages
    Anoop Kunchukuttan, Siddharth Jain, Rahul Kejriwal
    EACL, 2021
    [paper]
  41. A primer on pretrained multilingual language models
    Sumanth Doddapaneni, Gowtham Ramesh, Mitesh M Khapra, Anoop Kunchukuttan, Pratyush Kumar
    arXiv preprint arXiv:2107.00676, 2021
    [paper]
    2020

  43. Multilingual neural machine translation
    Raj Dabre, Chenhui Chu, Anoop Kunchukuttan
    COLING, 2020
    [paper]
  44. Utilizing language relatedness to improve machine translation: A case study on languages of the Indian subcontinent
    Anoop Kunchukuttan, Pushpak Bhattacharyya
    arXiv preprint arXiv:2003.08925, 2020
    [paper]
  45. Contact Relatedness can help improve multilingual NMT: Microsoft STCI-MT @ WMT20
    Vikrant Goyal, Anoop Kunchukuttan, Rahul Kejriwal, Siddharth Jain, Amit Bhagwat
    WMT, 2020
    [paper]
  46. Learning Geometric Word Meta-Embeddings
    Pratik Jawanpuria, NTV Dev, Anoop Kunchukuttan, Bamdev Mishra
    REPL4NLP, 2020
    [paper]
  47. The IndoWordnet Parallel Corpus
    Anoop Kunchukuttan
    [paper]
  48. AI4Bharat-IndicNLP corpus: Monolingual corpora and word embeddings for Indic languages
    Anoop Kunchukuttan, Divyanshu Kakwani, Satish Golla, Avik Bhattacharyya, Mitesh M Khapra, Pratyush Kumar
    REPL4NLP/non-archival, 2020
    [paper]
  49. Overview of the 7th Workshop on Asian Translation
    Toshiaki Nakazawa, Hideki Nakayama, Chenchen Ding, Raj Dabre, Shohei Higashiyama, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Shantipriya Parida, Ondřej Bojar, Sadao Kurohashi
    WAT, 2020
    [paper]
  50. A Survey of Multilingual Neural Machine Translation
    Raj Dabre, Chenhui Chu, Anoop Kunchukuttan
    ACM Computing Surveys, 2020
    [paper]
  51. IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages
    Divyanshu Kakwani, Anoop Kunchukuttan, Satish Golla, Gokul N.C., Avik Bhattacharyya, Mitesh M. Khapra, Pratyush Kumar
    EMNLP Findings, 2020
    [paper]
  52. Indic NLP Library: A unified approach to NLP for Indian languages
    Anoop Kunchukuttan
    [paper]
    2019

  54. Learning multilingual word embeddings in latent metric space: a geometric approach
    Pratik Jawanpuria, Arjun Balgovind, Anoop Kunchukuttan, Bamdev Mishra
    TACL, 2019
    [paper]
  55. Addressing word-order Divergence in Multilingual Neural Machine Translation for extremely Low Resource Languages
    Rudra Murthy V, Anoop Kunchukuttan, Pushpak Bhattacharyya
    NAACL, 2019
    [paper]
  56. Proceedings of the 6th Workshop on Asian Translation
    Toshiaki Nakazawa, Chenchen Ding, Raj Dabre, Anoop Kunchukuttan, Nobushige Doi, Yusuke Oda, Ondřej Bojar, Shantipriya Parida, Isao Goto, Hideya Mino
    WAT, 2019
    [paper]
    2018

  58. NICT's Participation in WAT 2018: Approaches Using Multilingualism and Recurrently Stacked Layers
    Raj Dabre, Anoop Kunchukuttan, Atsushi Fujita, Eiichiro Sumita
    WAT, 2018
    [paper]
  59. Overview of the 5th Workshop on Asian Translation
    Toshiaki Nakazawa, Shohei Higashiyama, Chenchen Ding, Raj Dabre, Anoop Kunchukuttan, Win Pa Pa, Isao Goto, Hideya Mino, Katsuhito Sudoh, Sadao Kurohashi
    WAT, 2018
  60. Machine Translation and Transliteration involving Related, Low-resource Languages
    Anoop Kunchukuttan
    IIT Bombay, 2018
  61. Multilingual Indian Language Translation System at WAT 2018: Many-to-one Phrase-based SMT
    Tamali Banerjee, Anoop Kunchukuttan, Pushpak Bhattacharyya
    WAT, 2018
    [paper]
  62. Judicious Selection of Training Data in Assisting Language for Multilingual Neural NER
    V Rudramurthy, Anoop Kunchukuttan, Pushpak Bhattacharyya
    ACL, 2018
    [paper]
  63. Leveraging Orthographic Similarity for Multilingual Neural Transliteration
    Anoop Kunchukuttan, Mitesh Khapra, Gurneet Singh, Pushpak Bhattacharyya
    TACL, 2018
    [paper]
  64. McTorch, a manifold optimization library for deep learning
    Mayank Meghwanshi, Pratik Jawanpuria, Anoop Kunchukuttan, Hiroyuki Kasai, Bamdev Mishra
    Workshop on Machine Learning Open Source Software @NIPS, 2018
    [paper]
  65. The IIT Bombay English-Hindi Parallel Corpus
    Anoop Kunchukuttan, Pratik Mehta, Pushpak Bhattacharyya
    LREC, 2018
    [paper]
    2017

  67. Learning variable length units for SMT between related languages via Byte Pair Encoding
    Anoop Kunchukuttan, Pushpak Bhattacharyya
    First Workshop on Subword and Character LEvel Models in NLP (SCLeM), 2017
    [paper]
  68. Comparing Recurrent and Convolutional Architectures for English-Hindi Neural Machine Translation
    Sandhya Singh, Ritesh Panjwani, Anoop Kunchukuttan, Pushpak Bhattacharyya
    WAT, 2017
    [paper]
  69. Utilizing Lexical Similarity between Related, Low-resource Languages for Pivot-based SMT
    Anoop Kunchukuttan, Maulik Shah, Pradyot Prakash, Pushpak Bhattacharyya
    IJCNLP, 2017
    [paper]
    2016

  71. Orthographic Syllable as basic unit for SMT between Related Languages
    Anoop Kunchukuttan, Pushpak Bhattacharyya
    EMNLP, 2016
    [paper]
  72. Statistical machine translation between related languages
    Pushpak Bhattacharyya, Mitesh M Khapra, Anoop Kunchukuttan
    NAACL Tutorial, 2016
    [paper]
  73. Faster decoding for subword level Phrase-based SMT between related languages
    Anoop Kunchukuttan, Pushpak Bhattacharyya
    Third Workshop on NLP for Similar Languages, Varieties and Dialects, 2016
    [paper]
  74. Substring-based Unsupervised Transliteration with Phonetic and Contextual Knowledge
    Anoop Kunchukuttan, Mitesh Khapra, Pushpak Bhattacharyya
    CoNLL, 2016
    [paper]
  75. IIT Bombay’s English-Indonesian submission at WAT: Integrating neural language models with SMT
    Sandhya Singh, Anoop Kunchukuttan, Pushpak Bhattacharyya
    WAT, 2016
    [paper]
    2015

  77. Addressing Class Imbalance in Grammatical Error Detection with Evaluation Metric Optimization
    Anoop Kunchukuttan, Pushpak Bhattacharyya
    International Conference on Natural Language Processing (ICON), 2015
    [paper]
  78. Translation & Transliteration between Related Languages
    Anoop Kunchukuttan, Mitesh Khapra
    ICON, 2015
    [paper]
  79. Investigating the potential of postordering SMT output to improve translation quality
    Pratik Mehta, Anoop Kunchukuttan, Pushpak Bhattacharyya
    International Conference on Natural Language Processing (ICON), 2015
    [paper]
  80. Brahmi-Net: A transliteration and script conversion system for languages of the Indian subcontinent
    Anoop Kunchukuttan, Ratish Puduppully, Pushpak Bhattacharyya
    NAACL: System Demonstrations, 2015
    [paper]
  81. Augmenting Pivot based SMT with word segmentation
    Rohit More, Anoop Kunchukuttan, Raj Dabre, Pushpak Bhattacharyya
    International Conference on Natural Language Processing (ICON), 2015
    [paper]
  82. SarcasmBot: An open-source sarcasm-generation module for chatbots
    Aditya Joshi, Anoop Kunchukuttan, Pushpak Bhattacharyya, Mark James Carman
    WISDOM Workshop, 2015
    [paper]
  83. Data representation methods and use of mined corpora for Indian language transliteration
    Anoop Kunchukuttan, Pushpak Bhattacharyya
    Proceedings of the Fifth Named Entity Workshop, 2015
    [paper]
    2014

  85. The IIT Bombay SMT System for ICON 2014 Tools Contest
    Anoop Kunchukuttan, Ratish Puduppully, Rajen Chatterjee, Abhijit Mishra, Pushpak Bhattacharyya
    NLP Tools Contest at ICON 2014, 2014
    [paper]
  86. Boosting Phrase-based SMT with Unsupervised Morph-Analysis and Transliteration Mining
    Anoop Kunchukuttan, Ratish Puduppully, Rajen Chatterjee, Abhijit Mishra, Pushpak Bhattacharyya
    NLP Tools Contest: ICON 2014, 2014
  87. Supertag Based Pre-ordering in Machine Translation
    Rajen Chatterjee, Anoop Kunchukuttan, Pushpak Bhattacharyya
    International Conference on Natural Language Processing, 2014
    [paper]
  88. Tuning a grammar correction system for increased precision
    Anoop Kunchukuttan, Sriram Chaudhury, Pushpak Bhattacharyya
    Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, 2014
    [paper]
  89. Machine Learning For Machine Translation
    Pushpak Bhattacharyya, Anoop Kunchukuttan, Piyush Dungarwal, Shubham Gautam
    ICON, 2014
    [paper]
  90. When Transliteration Met Crowdsourcing : An Empirical Study of Transliteration via Crowdsourcing using Efficient, Non-redundant and Fair Quality Control
    Mitesh Khapra, Ananthakrishnan Ramanathan, Anoop Kunchukuttan, Karthik Visweswariah, Pushpak Bhattacharyya
    Language Resources and Evaluation Conference, 2014
    [paper]
  91. The IIT Bombay Hindi⇔English Translation System at WMT 2014
    Piyush Dungarwal, Rajen Chatterjee, Abhijit Mishra, Anoop Kunchukuttan, Ritesh Shah, Pushpak Bhattacharyya
    ICON Shared Task, 2014
    [paper]
  92. Crowdsourcing translation services
    Anoop Kunchukuttan, Shourya Roy, Mitesh Khapra, Nicola Cancedda, Pushpak Bhattacharyya
    [paper]
  93. Śata-Anuva̅dak: Tackling Multiway Translation of Indian Languages
    Anoop Kunchukuttan, Abhijit Mishra, Rajen Chatterjee, Ritesh Shah, Pushpak Bhattacharyya
    Language Resources and Evaluation Conference, 2014
    2013

  95. IITB System for CoNLL 2013 Shared Task: A Hybrid Approach to Grammatical Error Correction
    Anoop Kunchukuttan, Ritesh Shah, Pushpak Bhattacharyya
    Proceedings of the Seventeenth Conference on Computational Natural Language Learning, 2013
    [paper]
  96. TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domain
    Anoop Kunchukuttan, Rajen Chatterjee, Shourya Roy, Abhijit Mishra, Pushpak Bhattacharyya
    Proceedings of the Association for Computational Linguistics (demo), 2013
    [paper]
    2012

  98. Experiences in resource generation for machine translation through crowdsourcing
    Anoop Kunchukuttan, Shourya Roy, Pratik Patel, Kushal Ladha, Somya Gupta, Mitesh Khapra, Pushpak Bhattacharyya
    LREC, 2012
    [paper]
  99. Partially modelling word reordering as a sequence labelling problem
    Anoop Kunchukuttan, Pushpak Bhattacharyya
    Workshop on Reordering for Statistical Machine Translation, 2012
    [paper]
  100. Multiword Expressions in the CLIA Project
    Anoop Kunchukuttan, Munish Minia, Pushpak Bhattacharyya
    Vishwabharat, 2012
    [paper]
  101. The Reordering Problem in Statistical Machine Translation
    Anoop Kunchukuttan
    2008

  103. A system for compound noun multiword expression extraction for Hindi
    Anoop Kunchukuttan, Om Prakash Damani
    ICON, 2008
    [paper]
    2007

  105. Multiword Expression Recognition
    Anoop Kunchukuttan
    [paper]
    2006

  107. Evaluation of Information Retrieval Systems
    Anoop Kunchukuttan
    [paper]