About Me
I am a final-year Master’s student at IIT Delhi, studying Mathematics and Computing. Born in Pune, Maharashtra, I now live in St. John’s, Newfoundland, Canada. I am fluent in English, Hindi, and Marathi.
I was a Research Assistant in the NatLang Lab at Simon Fraser University, Burnaby, Canada for the Spring and Summer 2023 terms, where I worked with Prof. Anoop Sarkar on Simultaneous Machine Translation.
I was also a visiting research intern in the 3DLG Lab at SFU.
Experience
Simultaneous Machine Translation (Published in IWSLT Workshop @ ACL 2023)
Graduate Research Assistant at NatLang Lab, SFU
Supervised by Dr. Anoop Sarkar (Dec 2022 - Aug 2023)
https://github.com/AditiJain14/TIQTok
Simultaneous machine translation (SiMT) models struggle to maintain translation quality while keeping latency low. In this work, we augmented the training of the decoder with an auxiliary language model. Under this target-adaptive training scheme, generating rare or difficult tokens is rewarded, which improves translation quality while reducing latency. The language model's token-level predictions are combined with the standard cross-entropy loss, freeing the decoder to focus on source-side context. This training method significantly improved BLEU scores on the English-Vietnamese and English-German language pairs, especially at low latencies, where translation is hardest.
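To give a flavour of the idea, here is a minimal sketch of a token-level loss where an auxiliary language model's probabilities upweight rare or difficult target tokens. The function name, the weighting formula, and `alpha` are all illustrative assumptions, not the paper's exact formulation:

```python
import math

def weighted_nll(decoder_probs, lm_probs, targets, alpha=0.5):
    """Per-token negative log-likelihood where each token's loss is scaled
    up when an auxiliary LM assigns it low probability (i.e. the token is
    rare or difficult). Illustrative sketch only."""
    total = 0.0
    for t, y in enumerate(targets):
        # weight grows as the LM finds token y harder to predict
        weight = 1.0 + alpha * (1.0 - lm_probs[t][y])
        total += weight * -math.log(decoder_probs[t][y])
    return total / len(targets)
```

With equal decoder probabilities, a target token the LM considers unlikely incurs a strictly larger loss, which is the reweighting effect described above.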
TriCoLo - Trimodal Contrastive Loss for Text to 3D Shape Retrieval
Graduate Research Assistant at 3DLG Lab, SFU
Supervised by Dr. Angel Chang (May 2022 - July 2022)
https://arxiv.org/pdf/2201.07366.pdf
The text, images, and voxels of objects in the Text2Shape dataset are embedded into a common space using a trimodal contrastive loss. I experimented with introducing different augmentations to the text and image encoding modules, and implemented methods like SimCSE on sentence embeddings with augmentations such as word dropout and switch-out. I also compared improvements in sentence embeddings from pretrained models such as BERT, GloVe, and SentenceBERT.
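As a rough sketch of how embeddings from multiple modalities can be pulled into a common space, here is a pairwise InfoNCE loss summed over the three modality pairs. This shows only the generic pairwise contrastive idea; the actual TriCoLo formulation (including its trimodal term) differs, and the temperature and function names here are assumptions:

```python
import math

def info_nce(anchors, positives, temp=0.1):
    """Symmetric InfoNCE over a batch: each anchor should score highest
    (by dot product) against its matching positive among all candidates."""
    def softmax_nll(a, cands, idx):
        sims = [sum(x * y for x, y in zip(a, c)) / temp for c in cands]
        m = max(sims)  # max-shift for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in sims))
        return log_z - sims[idx]
    return sum(softmax_nll(a, positives, i)
               for i, a in enumerate(anchors)) / len(anchors)

def trimodal_loss(text, image, voxel):
    """Sum of the three pairwise contrastive terms between modalities."""
    return (info_nce(text, image) + info_nce(text, voxel)
            + info_nce(image, voxel))
```

Correctly aligned text/image/voxel embeddings yield a lower loss than mismatched ones, which is what drives the three encoders toward a shared space.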
Optimizing Bloom Filters with Learned Hashes
Summer Undergraduate Research Intern at IIT Delhi
Supervised by Dr. Srikanta Bedathur (May 2021 - Nov 2021)
https://dl.acm.org/doi/10.14778/3538598.3538613
I replaced traditional hash functions with a random projection-based approach that exploits patterns in the data by projecting each data point onto a set of vectors. Each projection value was binned to map to an index of the bit array, and the best vector set for a particular dataset was selected over several iterations. This method significantly reduced the memory footprint of a Bloom filter while maintaining the desired false positive rate (FPR). I also derived upper bounds on the size and FPR of the proposed method with rigorous mathematical grounding. The project was awarded the Undergraduate Research Award and laid the groundwork for a VLDB publication.
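A minimal sketch of the projection-as-hash idea: each point is projected onto k random vectors and each projection is binned to a bit index. The class name, Gaussian vectors, and binning scheme are illustrative assumptions; the paper's data-dependent vector selection is omitted:

```python
import random

class ProjectionBloomFilter:
    """Bloom filter whose 'hash functions' are random projections: a point
    is projected onto k vectors, and each projection value is binned to an
    index of the m-bit array. Sketch only; vector-set selection omitted."""
    def __init__(self, m, k, dim, seed=0):
        rng = random.Random(seed)
        self.m, self.bits = m, [False] * m
        self.vecs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(k)]

    def _indices(self, x):
        for v in self.vecs:
            proj = sum(a * b for a, b in zip(v, x))
            # bin the projection into one of m buckets (illustrative binning)
            yield int(proj * 1000) % self.m

    def add(self, x):
        for i in self._indices(x):
            self.bits[i] = True

    def query(self, x):
        return all(self.bits[i] for i in self._indices(x))
```

Like a standard Bloom filter, this structure has no false negatives: every inserted point queries positive, while unseen points may collide with set bits at some false positive rate.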
Publications
Language Model Based Target Token Importance Rescaling for Simultaneous Neural Machine Translation
Aditi Jain, Nishant Kambhatla, Anoop Sarkar, Association for Computational Linguistics (ACL) International Conference on Spoken Language Translation (IWSLT) Workshop 2023. [DOI]
Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological Image Datasets
Kumar Abhishek, Aditi Jain, Ghassan Hamarneh, arXiv preprint [Under Review]
Projects
Analysis of Breast Histopathology Images
Supervised by Dr. Jayashree Kalpathy-Cramer, Harvard University (Aug 2022 - Nov 2022)
- I worked on segmentation and classification of whole-slide images from breast cancer patients to analyse how the spatial distribution and count of immune cells predict breast cancer subtypes and treatment outcomes. I also generated heatmaps to add clinical interpretability to attention-based models such as CLAM.
Improving Robustness of Visual Question Answering (VQA) Systems
Supervised by Dr. Ashish Anand, IIT Guwahati (May 2020 - Jul 2020)
https://doi.org/10.1007/978-3-030-68790-8_28
- I helped annotate a new VQA dataset with 3 categorical rephrasings (30k questions) and contributed to an ICPRW paper. I trained character-level and word-level RNN encoder-decoder models for rephrasing generation, and experimented with a pretrained DistilBERT model and a Gensim embedding layer to improve consistency by 15%.
Education
Indian Institute of Technology, Delhi
Integrated Bachelor's and Master's | Mathematics and Computing
2019 - 2024
During my undergrad, I took introductory and advanced courses in probability and statistics, linear algebra, calculus, game theory, optimization, data mining, and data structures and algorithms. I also took two linguistics electives on the structure and function of words in language.
I was also an active member of the Literary Society at IIT Delhi.
A Little More About Me
Alongside my interests in NLP and maths, some of my other interests and hobbies are:
- Reading (mostly fiction; you can check out my latest reads on my Goodreads)
- Hiking (I like taking nature pictures!)
- Obsessing over cats (to an unhealthy degree)