About Me
I am a final-year Master’s student at IIT Delhi, studying Mathematics and Computing. Born in Pune, Maharashtra, I now live in St. John’s, Newfoundland, Canada. I am fluent in English, Hindi, and Marathi.
I was a Research Assistant in the NatLang Lab at Simon Fraser University, Burnaby, Canada for the Spring and Summer 2023 terms, where I worked with Prof. Anoop Sarkar on Simultaneous Machine Translation.
I was also a visiting research intern in the 3DLG Lab at SFU.
Experience
Simultaneous Machine Translation (Published in IWSLT Workshop @ ACL 2023)
Graduate Research Assistant at NatLang Lab, SFU
Supervised by Dr. Anoop Sarkar (Dec 2022 - Aug 2023)
https://github.com/AditiJain14/TIQTok
Simultaneous machine translation (SiMT) models struggle to maintain translation quality while keeping latency low. In this work, we augmented the training of the decoder with an auxiliary language model. Under this target-adaptive training scheme, generating rare or difficult tokens is rewarded, which improves translation quality while reducing latency. The language model's token-level predictions are combined with the standard cross-entropy loss, freeing the decoder to focus on source-side context. This training method significantly improved BLEU scores on the English-Vietnamese and English-German language pairs, especially at low latencies, where translation is hardest.
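To give a flavour of the idea, here is a minimal sketch of a token-level loss where an auxiliary language model's probabilities upweight rare or difficult target tokens. The function name, the weighting formula, and `alpha` are all illustrative assumptions, not the paper's exact formulation:

```python
import math

def weighted_nll(decoder_probs, lm_probs, targets, alpha=0.5):
    """Per-token negative log-likelihood where each token's loss is scaled
    up when an auxiliary LM assigns it low probability (i.e. the token is
    rare or difficult). Illustrative sketch only."""
    total = 0.0
    for t, y in enumerate(targets):
        # weight grows as the LM finds token y harder to predict
        weight = 1.0 + alpha * (1.0 - lm_probs[t][y])
        total += weight * -math.log(decoder_probs[t][y])
    return total / len(targets)
```

With equal decoder probabilities, a target token the LM considers unlikely incurs a strictly larger loss, which is the reweighting effect described above.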
TriCoLo - Trimodal Contrastive Loss for Text to 3D Shape Retrieval
Graduate Research Assistant at 3DLG Lab, SFU
Supervised by Dr. Angel Chang (May 2022 - July 2022)
https://arxiv.org/pdf/2201.07366.pdf
The text, images, and voxels of objects in the Text2Shape dataset are embedded into a common space using a trimodal contrastive loss. I experimented with introducing different augmentations to the text and image encoding modules, and implemented methods like SimCSE on sentence embeddings with augmentations such as word dropout and switch-out. I also compared improvements in sentence embeddings from pretrained models such as BERT, GloVe, and SentenceBERT.
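As a rough sketch of how embeddings from multiple modalities can be pulled into a common space, here is a pairwise InfoNCE loss summed over the three modality pairs. This shows only the generic pairwise contrastive idea; the actual TriCoLo formulation (including its trimodal term) differs, and the temperature and function names here are assumptions:

```python
import math

def info_nce(anchors, positives, temp=0.1):
    """Symmetric InfoNCE over a batch: each anchor should score highest
    (by dot product) against its matching positive among all candidates."""
    def softmax_nll(a, cands, idx):
        sims = [sum(x * y for x, y in zip(a, c)) / temp for c in cands]
        m = max(sims)  # max-shift for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in sims))
        return log_z - sims[idx]
    return sum(softmax_nll(a, positives, i)
               for i, a in enumerate(anchors)) / len(anchors)

def trimodal_loss(text, image, voxel):
    """Sum of the three pairwise contrastive terms between modalities."""
    return (info_nce(text, image) + info_nce(text, voxel)
            + info_nce(image, voxel))
```

Correctly aligned text/image/voxel embeddings yield a lower loss than mismatched ones, which is what drives the three encoders toward a shared space.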
Optimizing Bloom Filters with Learned Hashes
Summer Undergraduate Research Intern at IIT Delhi
Supervised by Dr. Srikanta Bedathur (May 2021 - Nov 2021)
https://dl.acm.org/doi/10.14778/3538598.3538613
I replaced traditional hash functions with a random projection-based approach that exploits patterns in the data by projecting each data point onto a set of vectors. Each projection value was binned to map to an index of the bit array, and the best vector set for a particular dataset was selected over several iterations. This method significantly reduced the memory footprint of a Bloom filter while maintaining the desired false positive rate (FPR). I also derived upper bounds on the size and FPR of the proposed method with rigorous mathematical grounding. The project was awarded the Undergraduate Research Award and laid the groundwork for a VLDB publication.
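A minimal sketch of the projection-as-hash idea: each point is projected onto k random vectors and each projection is binned to a bit index. The class name, Gaussian vectors, and binning scheme are illustrative assumptions; the paper's data-dependent vector selection is omitted:

```python
import random

class ProjectionBloomFilter:
    """Bloom filter whose 'hash functions' are random projections: a point
    is projected onto k vectors, and each projection value is binned to an
    index of the m-bit array. Sketch only; vector-set selection omitted."""
    def __init__(self, m, k, dim, seed=0):
        rng = random.Random(seed)
        self.m, self.bits = m, [False] * m
        self.vecs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(k)]

    def _indices(self, x):
        for v in self.vecs:
            proj = sum(a * b for a, b in zip(v, x))
            # bin the projection into one of m buckets (illustrative binning)
            yield int(proj * 1000) % self.m

    def add(self, x):
        for i in self._indices(x):
            self.bits[i] = True

    def query(self, x):
        return all(self.bits[i] for i in self._indices(x))
```

Like a standard Bloom filter, this structure has no false negatives: every inserted point queries positive, while unseen points may collide with set bits at some false positive rate.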
Publications
Language Model Based Target Token Importance Rescaling for Simultaneous Neural Machine Translation
Aditi Jain, Nishant Kambhatla, Anoop Sarkar, Association for Computational Linguistics (ACL) International Conference on Spoken Language Translation (IWSLT) Workshop 2023. [DOI]
Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological Image Datasets
Kumar Abhishek, Aditi Jain, Ghassan Hamarneh, arXiv preprint [Under Review]
Projects
Analysis of Breast Histopathology Images
Supervised by Dr. Jayashree Kalpathy-Cramer, Harvard University (Aug 2022 - Nov 2022)
- I worked on segmentation and classification of whole-slide images from breast cancer patients to analyse how the spatial distribution and count of immune cells predict breast cancer subtypes and treatment outcomes. I also generated heatmaps to add clinical interpretability to attention-based models such as CLAM.
Improving Robustness of Visual Question Answering (VQA) Systems
Supervised by Dr. Ashish Anand, IIT Guwahati (May 2020 - Jul 2020)
https://doi.org/10.1007/978-3-030-68790-8_28
- I helped annotate a new VQA dataset with 3 categorical rephrasings (30k questions) and contributed to an ICPRW paper. I trained character-level and word-level RNN encoder-decoder models for rephrasing generation, and experimented with a pretrained DistilBERT model and a Gensim embedding layer to improve consistency by 15%.
Education
Indian Institute of Technology, Delhi
Integrated Bachelor's and Master's | Mathematics and Computing
2019 - 2024
During my undergrad, I took introductory and advanced courses in probability and statistics, linear algebra, calculus, game theory, optimization, data mining, and data structures and algorithms. I also took two linguistics electives on the structure and function of words in language.
I was also an active member of the Literary Society at IIT Delhi.
A Little More About Me
Alongside my interests in NLP and maths, some of my other interests and hobbies are:
- Reading (mostly fiction; you can check out my latest reads on my Goodreads)
- Hiking (I like taking nature pictures!)
- Obsessing over cats (to an unhealthy degree)