Hi 👋! I am a Research Scientist at Databricks Mosaic Research, where I study clever methods of using data to train/evaluate foundation models (e.g., LLMs) more efficiently.
Previously, I received my Ph.D. from Purdue University in the Probabilistic and Understandable Machine Learning Lab, where I had the pleasure of working with Dr. David Inouye and Dr. Murat Kocaoglu. I have also worked in various ML research roles in industry, with both production-level and research-level impact. This includes working with Ankur Mallick and Kevin Hsieh from Microsoft Research, Bhavya Kailkhura from Lawrence Livermore National Lab, and Nicholas Waytowich from the Army Research Lab.
My work has been published in top-tier conferences such as NeurIPS, ICML, ICLR, and CVPR, is being patented by Microsoft, and has been integrated into Microsoft Azure’s ML monitoring toolbox as well as Microsoft Office’s Query Understanding pipeline.
PhD in Computer Engineering, 2019 - Dec 2023
Purdue University
BS in Electrical Engineering, 2015 - 2019
Purdue University
We build generative models by learning latent causal models from data observed from different domains for the purpose of generating domain counterfactuals, and further characterize the equivalence classes for such latent causal models.
We answer the question “What is a distribution shift explanation?” and introduce a novel framework for explaining distribution shifts via transportation maps between a source and target distribution, where the maps are either inherently interpretable or interpreted using post-hoc interpretability methods.
We introduce a large-scale, easy-to-use spatial reasoning dataset of 3.6 million images, each summarizing 10 seconds of a human-played match from the StarCraft II video game.
This work with Microsoft365 Research studied using generative language models (e.g., LLMs) to improve enterprise search results in Microsoft Apps by adding related search terms to the user’s search query.