Publications
You can also find my articles on my Google Scholar profile. That list might be more up to date; however, you can filter the list here by research area.
Filter by Research Area:
- Data and Model Centric Approaches for Expansion of Large Language Models to New Languages. Anoop Kunchukuttan, Raj Dabre, Rudramurthy V, Mohammed Safi Ur Rahman, Thanmay Jayakumar. EMNLP (to appear), 2025
- RomanLens: The Role Of Latent Romanization In Multilinguality In LLMs. Alan Saji, Jaavid Aktar Husain, Thanmay Jayakumar, Raj Dabre, Anoop Kunchukuttan, Ratish Puduppully. arXiv preprint arXiv:2502.07424, 2025
- An Empirical Comparison of Vocabulary Expansion and Initialization Approaches for Language Models. Nandini Mundra, Aditya Nanda Kishore, Raj Dabre, Ratish Puduppully, Anoop Kunchukuttan, Mitesh M Khapra. CoNLL, 2024
- A Comprehensive Analysis of Adapter Efficiency. Nandini Mundra, Sumanth Doddapaneni, Raj Dabre, Anoop Kunchukuttan, Ratish Puduppully, Mitesh M Khapra. CoDS-COMAD, 2024
- Utilizing Lexical Similarity to Enable Zero-Shot Machine Translation for Extremely Low-resource Languages. Kaushal Kumar Maurya, Rahul Kejriwal, Maunendra Sankar Desarkar, Anoop Kunchukuttan. EACL, 2024
- How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages? Anushka Singh, Ananya B. Sai, Raj Dabre, Ratish Puduppully, Anoop Kunchukuttan, Mitesh M Khapra. ACL, 2024
- IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages. Mohammed Safi Ur Rahman Khan, Priyam Mehta, Ananth Sankar, Umashankar Kumaravelan, Sumanth Doddapaneni, Sparsh Jain, Anoop Kunchukuttan, Pratyush Kumar, Raj Dabre, Mitesh M. Khapra. ACL, 2024
- RomanSetu: Efficiently unlocking multilingual capabilities of Large Language Models via Romanization. Jaavid Aktar Husain, Raj Dabre, Aswanth Kumar, Ratish Puduppully, Anoop Kunchukuttan. ACL, 2024
- Synthetic Data Generation and Joint Learning for Robust Code-Mixed Translation. Kartik, Sanjana Soni, Anoop Kunchukuttan, Tanmoy Chakraborty, Md. Shad Akhtar. COLING-LREC, 2024
- Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs. Sumanth Doddapaneni, Mohammed Safi Ur Rahman Khan, Dilip Venkatesh, Raj Dabre, Anoop Kunchukuttan, Mitesh M Khapra. arXiv preprint arXiv:2410.13394, 2024
- BhasaAnuvaad: A Speech Translation Dataset for 13 Indian Languages. Sparsh Jain, Ashwin Sankar, Devilal Choudhary, Dhairya Suman, Nikhil Narasimhan, Mohammed Safi Ur Rahman Khan, Anoop Kunchukuttan, Mitesh M Khapra, Raj Dabre. arXiv preprint arXiv:2411.04699, 2024
- Pralekha: An Indic Document Alignment Evaluation Benchmark. Sanjay Suryanarayanan, Haiyue Song, Mohammed Safi Ur Rahman Khan, Anoop Kunchukuttan, Mitesh M Khapra, Raj Dabre. arXiv preprint arXiv:2411.19096, 2024
- Extending English Large Language Models to New Languages: A Survey. Anoop Kunchukuttan
- IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation Metrics for Indian Languages. Ananya Sai B, Tanay Dixit, Vignesh Nagarajan, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M Khapra, Raj Dabre. ACL, 2023
- Evaluating Inter-Bilingual Semantic Parsing for Indian Languages. Divyanshu Aggarwal, Vivek Gupta, Anoop Kunchukuttan. NLP4ConvAI, 2023
- In-context Example Selection for Machine Translation Using Multiple Features. Aswanth Kumar, Ratish Puduppully, Raj Dabre, Anoop Kunchukuttan. EMNLP Findings, 2023
- Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models. Ratish Puduppully, Anoop Kunchukuttan, Raj Dabre, Ai Ti Aw, Nancy F Chen. EMNLP, 2023
- IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages. Jay Gala, Pranjal A Chitale, Raghavan AK, Sumanth Doddapaneni, Varun Gumma, Aswanth Kumar, Janki Nawale, Anupama Sujatha, Ratish Puduppully, Vivek Raghavan, Pratyush Kumar, Mitesh M Khapra, Raj Dabre, Anoop Kunchukuttan. TMLR, 2023
- Bhasha-Abhijnaanam: Native-script and romanized Language Identification for 22 Indic languages. Yash Madhani, Mitesh M Khapra, Anoop Kunchukuttan. ACL, 2023
- Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages. Arnav Mhaske, Harshit Kedia, Sumanth Doddapaneni, Mitesh M Khapra, Pratyush Kumar, Rudra Murthy V, Anoop Kunchukuttan. ACL, 2023
- IndicSUPERB: A speech processing universal performance benchmark for indian languages. Tahir Javed, Kaushal Bhogale, Abhigyan Raman, Pratyush Kumar, Anoop Kunchukuttan, Mitesh M Khapra. AAAI, 2023
- Effectiveness of mining audio and text pairs from public data for improving ASR systems for low-resource languages. Kaushal Bhogale, Abhigyan Raman, Tahir Javed, Sumanth Doddapaneni, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M Khapra. ICASSP, 2023
- Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages. Sumanth Doddapaneni, Rahul Aralikatte, Gowtham Ramesh, Shreya Goyal, Mitesh M Khapra, Anoop Kunchukuttan, Pratyush Kumar. ACL, 2023
- Aksharantar: Towards building open transliteration tools for the next billion users. Yash Madhani, Sushane Parthan, Priyanka Bedekar, Ruchi Khapra, Vivek Seshadri, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M Khapra. EMNLP Findings, 2023
- Samanantar: The largest publicly available parallel corpora collection for 11 Indic languages. Gowtham Ramesh, Sumanth Doddapaneni, Aravinth Bheemaraj, Mayank Jobanputra, Raghavan AK, Ajitesh Sharma, Sujit Sahoo, Harshita Diddee, Divyanshu Kakwani, Navneet Kumar, Aswin Pradeep, Kumar Deepak, Vivek Raghavan, Anoop Kunchukuttan, Pratyush Kumar, Mitesh Shantadevi Khapra. TACL, 2022
- IndicBART: A Pre-trained Model for Indic Natural Language Generation. Raj Dabre, Himani Shrotriya, Anoop Kunchukuttan, Ratish Puduppully, Mitesh M Khapra, Pratyush Kumar. ACL Findings, 2022
- Towards Building ASR Systems for the Next Billion Users. Tahir Javed, Sumanth Doddapaneni, Abhigyan Raman, Kaushal Santosh Bhogale, Gowtham Ramesh, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M Khapra. AAAI, 2022
- IndicNLG benchmark: Multilingual datasets for diverse NLG tasks in indic languages. Aman Kumar, Himani Shrotriya, Prachi Sahu, Amogh Mishra, Raj Dabre, Ratish Puduppully, Anoop Kunchukuttan, Mitesh M Khapra, Pratyush Kumar. EMNLP, 2022
- IndicXNLI: Evaluating multilingual inference for Indian languages. Divyanshu Aggarwal, Vivek Gupta, Anoop Kunchukuttan. EMNLP, 2022
- Bilingual tabular inference: A case study on indic languages. Chaitanya Agarwal, Vivek Gupta, Anoop Kunchukuttan, Manish Shrivastava. NAACL, 2022
- An Empirical Investigation of Multi-bridge Multilingual NMT models. Anoop Kunchukuttan. arXiv preprint arXiv:2110.07304, 2021
- Proceedings of the 8th Workshop on Asian Translation (WAT2021). Toshiaki Nakazawa, Hideki Nakayama, Isao Goto, Hideya Mino, Chenchen Ding, Raj Dabre, Anoop Kunchukuttan, Shohei Higashiyama, Hiroshi Manabe, Win Pa Pa, Shantipriya Parida, Ondřej Bojar, Chenhui Chu, Akiko Eriguchi, Kaori Abe, Yusuke Oda, Katsuhito Sudoh, Sadao Kurohashi, Pushpak Bhattacharyya. WAT, 2021
- Itihasa: A large-scale corpus for Sanskrit to English translation. Rahul Aralikatte, Miryam de Lhoneux, Anoop Kunchukuttan, Anders Søgaard. WAT, 2021
- The AI4Bharat Initiative. Anoop Kunchukuttan, Mitesh Khapra, Pratyush Kumar. ICON, 2021
- Machine Translation and Transliteration involving Related, Low-resource Languages. Anoop Kunchukuttan, Pushpak Bhattacharyya. CRC Press, 2021
- A large-scale evaluation of neural machine transliteration for indic languages. Anoop Kunchukuttan, Siddharth Jain, Rahul Kejriwal. EACL, 2021
- A primer on pretrained multilingual language models. Sumanth Doddapaneni, Gowtham Ramesh, Mitesh M Khapra, Anoop Kunchukuttan, Pratyush Kumar. arXiv preprint arXiv:2107.00676, 2021
- Multilingual neural machine translation. Raj Dabre, Chenhui Chu, Anoop Kunchukuttan. COLING, 2020
- Utilizing language relatedness to improve machine translation: A case study on languages of the indian subcontinent. Anoop Kunchukuttan, Pushpak Bhattacharyya. arXiv preprint arXiv:2003.08925, 2020
- Contact Relatedness can help improve multilingual NMT: Microsoft STCI-MT @ WMT20. Vikrant Goyal, Anoop Kunchukuttan, Rahul Kejriwal, Siddharth Jain, Amit Bhagwat. WMT, 2020
- Learning Geometric Word Meta-Embeddings. Pratik Jawanpuria, NTV Dev, Anoop Kunchukuttan, Bamdev Mishra. REPL4NLP, 2020
- The IndoWordnet Parallel Corpus. Anoop Kunchukuttan
- AI4Bharat-IndicNLP corpus: Monolingual corpora and word embeddings for Indic languages. Anoop Kunchukuttan, Divyanshu Kakwani, Satish Golla, Avik Bhattacharyya, Mitesh M Khapra, Pratyush Kumar. REPL4NLP/non-archival, 2020
- Overview of the 7th Workshop on Asian Translation. Toshiaki Nakazawa, Hideki Nakayama, Chenchen Ding, Raj Dabre, Shohei Higashiyama, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Shantipriya Parida, Ondřej Bojar, Sadao Kurohashi. WAT, 2020
- A Survey of Multilingual Neural Machine Translation. Raj Dabre, Chenhui Chu, Anoop Kunchukuttan. ACM Computing Surveys, 2020
- IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages. Divyanshu Kakwani, Anoop Kunchukuttan, Satish Golla, Gokul N.C., Avik Bhattacharyya, Mitesh M. Khapra, Pratyush Kumar. EMNLP Findings, 2020
- Indic NLP Library: A unified approach to NLP for Indian languages. Anoop Kunchukuttan
- Learning multilingual word embeddings in latent metric space: a geometric approach. Pratik Jawanpuria, Arjun Balgovind, Anoop Kunchukuttan, Bamdev Mishra. TACL, 2019
- Addressing word-order Divergence in Multilingual Neural Machine Translation for extremely Low Resource Languages. Rudra Murthy V, Anoop Kunchukuttan, Pushpak Bhattacharyya. NAACL, 2019
- Proceedings of the 6th Workshop on Asian Translation. Toshiaki Nakazawa, Chenchen Ding, Raj Dabre, Anoop Kunchukuttan, Nobushige Doi, Yusuke Oda, Ondřej Bojar, Shantipriya Parida, Isao Goto, Hideya Mino. WAT, 2019
- NICT's Participation in WAT 2018: Approaches Using Multilingualism and Recurrently Stacked Layers. Raj Dabre, Anoop Kunchukuttan, Atsushi Fujita, Eiichiro Sumita. WAT, 2018
- Overview of the 5th Workshop on Asian Translation. Toshiaki Nakazawa, Shohei Higashiyama, Chenchen Ding, Raj Dabre, Anoop Kunchukuttan, Win Pa Pa, Isao Goto, Hideya Mino, Katsuhito Sudoh, Sadao Kurohashi. WAT, 2018
- Machine Translation and Transliteration involving Related, Low-resource Languages. Anoop Kunchukuttan. IIT Bombay, 2018
- Multilingual Indian Language Translation System at WAT 2018: Many-to-one Phrase-based SMT. Tamali Banerjee, Anoop Kunchukuttan, Pushpak Bhattacharyya. WAT, 2018
- Judicious Selection of Training Data in Assisting Language for Multilingual Neural NER. V Rudramurthy, Anoop Kunchukuttan, Pushpak Bhattacharyya. ACL, 2018
- Leveraging Orthographic Similarity for Multilingual Neural Transliteration. Anoop Kunchukuttan, Mitesh Khapra, Gurneet Singh, Pushpak Bhattacharyya. TACL, 2018
- McTorch, a manifold optimization library for deep learning. Mayank Meghwanshi, Pratik Jawanpuria, Anoop Kunchukuttan, Hiroyuki Kasai, Bamdev Mishra. Workshop on Machine Learning Open Source Software @NIPS, 2018
- The IIT Bombay English-Hindi Parallel Corpus. Anoop Kunchukuttan, Pratik Mehta, Pushpak Bhattacharyya. LREC, 2018
- Learning variable length units for SMT between related languages via Byte Pair Encoding. Anoop Kunchukuttan, Pushpak Bhattacharyya. First Workshop on Subword and Character LEvel Models in NLP (SCLeM), 2017
- Comparing Recurrent and Convolutional Architectures for English-Hindi Neural Machine Translation. Sandhya Singh, Ritesh Panjwani, Anoop Kunchukuttan, Pushpak Bhattacharyya. WAT, 2017
- Utilizing Lexical Similarity between Related, Low-resource Languages for Pivot-based SMT. Anoop Kunchukuttan, Maulik Shah, Pradyot Prakash, Pushpak Bhattacharyya. IJCNLP, 2017
- Orthographic Syllable as basic unit for SMT between Related Languages. Anoop Kunchukuttan, Pushpak Bhattacharyya. EMNLP, 2016
- Statistical machine translation between related languages. Pushpak Bhattacharyya, Mitesh M Khapra, Anoop Kunchukuttan. NAACL Tutorial, 2016
- Faster decoding for subword level Phrase-based SMT between related languages. Anoop Kunchukuttan, Pushpak Bhattacharyya. Third Workshop on NLP for Similar Languages, Varieties and Dialects, 2016
- Substring-based Unsupervised Transliteration with Phonetic and Contextual Knowledge. Anoop Kunchukuttan, Mitesh Khapra, Pushpak Bhattacharyya. CoNLL, 2016
- IIT Bombay’s English-Indonesian submission at WAT: Integrating neural language models with SMT. Sandhya Singh, Anoop Kunchukuttan, Pushpak Bhattacharyya. WAT, 2016
- Addressing Class Imbalance in Grammatical Error Detection with Evaluation Metric Optimization. Anoop Kunchukuttan, Pushpak Bhattacharyya. International Conference on Natural Language Processing (ICON), 2015
- Translation & Transliteration between Related Languages. Anoop Kunchukuttan, Mitesh Khapra. ICON, 2015
- Investigating the potential of postordering SMT output to improve translation quality. Pratik Mehta, Anoop Kunchukuttan, Pushpak Bhattacharyya. International Conference on Natural Language Processing (ICON), 2015
- Brahmi-Net: A transliteration and script conversion system for languages of the Indian subcontinent. Anoop Kunchukuttan, Ratish Puduppully, Pushpak Bhattacharyya. NAACL: System Demonstrations, 2015
- Augmenting Pivot based SMT with word segmentation. Rohit More, Anoop Kunchukuttan, Raj Dabre, Pushpak Bhattacharyya. International Conference on Natural Language Processing (ICON), 2015
- SarcasmBot: An open-source sarcasm-generation module for chatbots. Aditya Joshi, Anoop Kunchukuttan, Pushpak Bhattacharyya, Mark James Carman. WISDOM Workshop, 2015
- Data representation methods and use of mined corpora for Indian language transliteration. Anoop Kunchukuttan, Pushpak Bhattacharyya. Proceedings of the Fifth Named Entity Workshop, 2015
- The IIT Bombay SMT System for ICON 2014 Tools Contest. Anoop Kunchukuttan, Ratish Puduppully, Rajen Chatterjee, Abhijit Mishra, Pushpak Bhattacharyya. NLP Tools Contest at ICON 2014, 2014
- Boosting Phrase-based SMT with Unsupervised Morph-Analysis and Transliteration Mining. Anoop Kunchukuttan, Ratish Puduppully, Rajen Chatterjee, Abhijit Mishra, Pushpak Bhattacharyya. NLP Tools Contest: ICON 2014, 2014
- Supertag Based Pre-ordering in Machine Translation. Rajen Chatterjee, Anoop Kunchukuttan, Pushpak Bhattacharyya. International Conference on Natural Language Processing, 2014
- Tuning a grammar correction system for increased precision. Anoop Kunchukuttan, Sriram Chaudhury, Pushpak Bhattacharyya. Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, 2014
- Machine Learning For Machine Translation. Pushpak Bhattacharyya, Anoop Kunchukuttan, Piyush Dungarwal, Shubham Gautam. ICON, 2014
- When Transliteration Met Crowdsourcing: An Empirical Study of Transliteration via Crowdsourcing using Efficient, Non-redundant and Fair Quality Control. Mitesh Khapra, Ananthakrishnan Ramanathan, Anoop Kunchukuttan, Karthik Visweswariah, Pushpak Bhattacharyya. Language Resources and Evaluation Conference, 2014
- The IIT Bombay Hindi⇔English Translation System at WMT 2014. Piyush Dungarwal, Rajen Chatterjee, Abhijit Mishra, Anoop Kunchukuttan, Ritesh Shah, Pushpak Bhattacharyya. ICON Shared Task, 2014
- Crowdsourcing translation services. Anoop Kunchukuttan, Shourya Roy, Mitesh Khapra, Nicola Cancedda, Pushpak Bhattacharyya
- Śata-Anuva̅dak: Tackling Multiway Translation of Indian Languages. Anoop Kunchukuttan, Abhijit Mishra, Rajen Chatterjee, Ritesh Shah, Pushpak Bhattacharyya. Language Resources and Evaluation Conference, 2014
- IITB System for CoNLL 2013 Shared Task: A Hybrid Approach to Grammatical Error Correction. Anoop Kunchukuttan, Ritesh Shah, Pushpak Bhattacharyya. Proceedings of the Seventeenth Conference on Computational Natural Language Learning, 2013
- TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domain. Anoop Kunchukuttan, Rajen Chatterjee, Shourya Roy, Abhijit Mishra, Pushpak Bhattacharyya. Proceedings of the Association of Computational Linguistics (demo), 2013
- Experiences in resource generation for machine translation through crowdsourcing. Anoop Kunchukuttan, Shourya Roy, Pratik Patel, Kushal Ladha, Somya Gupta, Mitesh Khapra, Pushpak Bhattacharyya. LREC, 2012
- Partially modelling word reordering as a sequence labelling problem. Anoop Kunchukuttan, Pushpak Bhattacharyya. Workshop on Reordering for Statistical Machine Translation, 2012
- Multiword Expressions in the CLIA Project. Anoop Kunchukuttan, Munish Minia, Pushpak Bhattacharyya. Vishwabharat, 2012
- The Reordering Problem in Statistical Machine Translation. Anoop Kunchukuttan
- A system for compound noun multiword expression extraction for hindi. Anoop Kunchukuttan, Om Prakash Damani. ICON, 2008
- Multiword Expression Recognition. Anoop Kunchukuttan
- Evaluation of Information Retrieval Systems. Anoop Kunchukuttan