Hi! I’m Maharaj, a 4th-year Computer Science Ph.D. student at the Natural Language and Information Processing (NLIP) Lab in the Department of Computer Science & Engineering at the Indian Institute of Technology Hyderabad (IITH), supervised by Prof. Maunendra Sankar Desarkar and Dr. Anoop Kunchukuttan.
My research is driven by a desire to build equitable and culturally grounded AI. My specific interests include Culture NLP, Tokenization, Multilingual NLP, Large Language Models, and Machine Translation.
I actively contribute to the NLP research community through conference service, peer review, community volunteering, and the development of open research resources. I have served as a volunteer for ACL 2023 (Virtual) and the ACM ARCS & ACM India Annual Event 2026, supporting the organization of major research events. I have also served as a reviewer for several leading conferences and shared tasks, including EMNLP 2023, ACL 2025, EMNLP 2025, WMT 2025, EACL 2026, ACL 2026, MBCC 2026, NeurIPS 2026, and EMNLP 2026. I also believe in contributing to the community by developing and maintaining open datasets and benchmarks, including DIWALI, to facilitate reproducible research and advance multilingual and culturally grounded NLP.
Prior to joining the Ph.D. program, I completed my Master of Technology (M.Tech.) at Central Institute of Technology Kokrajhar (CITK), CFTI, Deemed to be University under MoE, India. I was fortunate to be supervised by Dr. Sanjib Narzary, and I worked on Machine Translation for an under-resourced Indian language. I served as a Teaching Assistant (TA) for the master’s course Advanced Computer Network Lab (PCSE271) and the undergraduate course Programming for Problem Solving Lab (UCSE271). I was also a TA for the master’s course Mobile and Pervasive Computing (PCSE115), instructed by Prof. Pranav Kumar Singh.
In 2020, I had the good fortune to co-found the startup “DigitalOma” with my friends.
I received my Bachelor of Technology (B.Tech.) in Computer Science & Engineering from CIT Kokrajhar, India, in 2019, where I worked on a thesis titled “English-Bodo Neural Machine Translation using Attention Mechanism”.
You can find more information in my CV.
News:
- Attended Microsoft Research India Academic Summit 2026 hosted by MSR Bangalore, India.
- 🏆 Our Image Transcreation submission ranked first in the Machine Translation for Vision Challenge at MAPS@CVPR.
- Paper accepted! Paper titled "Multilingual Tokenization through the Lens of Indian Languages: Challenges and Insights" accepted at ACL 2026 (Findings). Thanks to my collaborators Karthika, Rajat, and Saketh. Work done in collaboration with IIT Bombay and IIT Mandi.
- Our EMNLP 2025 paper, "DIWALI - Diversity and Inclusivity aWare cuLture specific Items for India: Dataset and Assessment of LLMs for Cultural Text Adaptation in Indian Context," has been selected for an oral presentation!
- I will be attending IndoML 2025 at BITS Hyderabad, where I'll be presenting a poster.
- Paper accepted! Paper titled "DIWALI - Diversity and Inclusivity aWare cuLture specific Items for India: Dataset and Assessment of LLMs for Cultural Text Adaptation in Indian Context" accepted at EMNLP 2025.
- Paper accepted! Paper titled "MorphTok: Morphologically Grounded Tokenization for Indic languages" accepted at Tokenization Workshop (TokShop) @ ICML 2025. Work done in collaboration with IIT Bombay and IIT Mandi.
- Had my JRF-to-SRF conversion seminar - officially transitioned from Junior to Senior Research Fellow. Time flies!
- 🏆 Won Best Poster Award (2nd) at IndoML 2024. Check out the poster!
- Attending IndoML 2024 at BITS Pilani Goa.