I am a coterminal M.S. student in Computer Science at Stanford University and a native of Pittsburgh, Pennsylvania.
I am currently doing research in the Stanford Intelligent and Interactive Autonomous Systems Group and am grateful to be advised by Prof. Dorsa Sadigh and Sidd Karamcheti. My interests include interactive systems, grounded language understanding, and reinforcement learning.
Last summer, I interned at Facebook Messenger, improving models of user-to-business communication using public conversational data. I have also interned at Telling.ai, a startup spun off from Carnegie Mellon's Language Technologies Institute that gleans biometric information from voice samples. These signals can help predict chronic lung disease and respiratory infections.
At Stanford, I was President of the Stanford Speakers Bureau, managing high-profile speaker events and shows for the university and the surrounding community. I also have thirteen years of piano training, and I led Stanford MELODY, a volunteer organization that provides free piano tutoring to children from underprivileged backgrounds.
I have always believed strongly in environmental sustainability. In 2014, I was featured in Science for a project showing that switching typefaces can save a significant amount of ink and toner.
I am the proud younger brother of pianist Rishi Mirchandani.
Most recently, I have been doing research in the Stanford Intelligent and Interactive Autonomous Systems Group under Prof. Dorsa Sadigh. I'm also fortunate to be mentored by Sidd Karamcheti and Erdem Bıyık.
Currently, I'm focusing on how linguistic properties can be leveraged to help guide exploration in language-conditioned reinforcement learning (preprint below), as well as "active teaching" algorithms whereby an automated expert can concisely teach a user new tasks.
ELLA: Exploration through Learned Language Abstraction (2021)
Building agents capable of understanding language instructions is critical to effective and robust human-AI collaboration. Recent work focuses on training these instruction-following agents via reinforcement learning in environments with synthetic language; however, these instructions often define long-horizon, sparse-reward tasks, and learning policies requires many episodes of experience. To address this, we introduce ELLA: Exploration through Learned Language Abstraction, a reward shaping approach that correlates high-level instructions with simpler low-level instructions to enrich the sparse rewards afforded by the environment. ELLA has two key elements: (1) a termination classifier that identifies when agents complete low-level instructions, and (2) a relevance classifier that correlates low-level instructions with success on high-level tasks. We learn the termination classifier offline from pairs of instructions and terminal states. Notably, departing from prior work in language and abstraction, we learn the relevance classifier online, without relying on an explicit decomposition of high-level instructions into low-level instructions. On a suite of complex grid world environments with varying instruction complexities and reward sparsity, ELLA shows a significant gain in sample efficiency across several environments compared to competitive language-based reward shaping and no-shaping methods.
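As a rough illustration of the reward-shaping idea described above, the sketch below adds a small bonus the first time a low-level instruction is both judged complete (termination classifier) and judged relevant to the high-level instruction (relevance classifier). The classifiers are stand-in callables, and the function name `shaped_reward`, the bonus size, and the threshold are hypothetical choices for illustration. This is not the ELLA implementation and omits details such as how the relevance classifier is learned online.

```python
from typing import Callable, List, Set


def shaped_reward(
    env_reward: float,
    state: object,
    high_level: str,
    low_level_instructions: List[str],
    terminated: Callable[[str, object], float],  # hypothetical: P(low-level instruction done | state)
    relevance: Callable[[str, str], float],      # hypothetical: P(low-level instruction relevant | high-level)
    already_rewarded: Set[str],
    bonus: float = 0.1,      # hypothetical shaping bonus
    threshold: float = 0.5,  # hypothetical decision threshold for both classifiers
) -> float:
    """Add a bonus the first time each relevant low-level instruction is completed."""
    reward = env_reward
    for instruction in low_level_instructions:
        if instruction in already_rewarded:
            continue  # each low-level instruction is rewarded at most once per episode
        done = terminated(instruction, state) > threshold
        relevant = relevance(high_level, instruction) > threshold
        if done and relevant:
            reward += bonus
            already_rewarded.add(instruction)
    return reward


# Toy usage with stub classifiers: the "state" is just a set of completed subgoals.
done_stub = lambda ll, s: 1.0 if ll in s else 0.0
relevance_stub = lambda hl, ll: 1.0 if ll == "pick up the key" else 0.0
seen: Set[str] = set()
print(shaped_reward(0.0, {"pick up the key"}, "open the door",
                    ["pick up the key", "go to the ball"], done_stub, relevance_stub, seen))
```

Tracking `already_rewarded` keeps the agent from farming the bonus by repeatedly completing the same subgoal.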
Developing a Pragmatic Framework for Evaluating Color Captioning Systems (2019)
with Benjamin Louis Newman and Julia Gong
We present a framework for evaluating natural language descriptions in the color captioning problem. In this task, two agents are given a set of three colors, and one of them generates a description of a target color for the other agent. Our approach is pragmatically motivated: we measure the effectiveness of a caption in terms of how well a trained model can select the correct color given the caption. We investigate four models, two of which explicitly model pragmatic reasoning, and we formulate a performance metric based on Gricean maxims to compare the effectiveness of the models. Our results indicate that although modeling pragmatic reasoning explicitly does improve evaluation performance by a small margin, it may not be essential from a practical perspective. Overall, we believe this evaluation framework is a promising start for evaluating natural language descriptions of captioning systems.
Analyzing an Extension of the Rational Speech Acts Model for the Figurative Use of Number Words (2019)
with Bhagirath Mehtha
Rules governing cooperative speaking have been quantified through the Rational Speech Acts (RSA) model. The model encapsulates Gricean pragmatics through a series of probabilistic inferences over a hypothetical literal listener, a pragmatic speaker, and a pragmatic listener to encode the probabilities of utterances and meanings. We study the work of Kao et al. (2014), who extend the RSA model to account for different conversational goals across the dimensions of literal meaning and affect, or subtext, in the interpretation of number words. The central objective of this work is to computationally implement this extended RSA modeling framework as devised by Kao et al. and to analyze the strengths and weaknesses of the approach.
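For readers unfamiliar with the model, the sketch below implements the basic RSA recursion described above (literal listener, pragmatic speaker, pragmatic listener) over a finite set of utterances and meanings. It is the vanilla model only, not the Kao et al. (2014) extension with affect and conversational goals, and the toy "some"/"all" lexicon is a standard textbook illustration rather than data from this project.

```python
import numpy as np


def rsa(lexicon, prior, alpha=1.0, cost=None):
    """Basic RSA recursion. Returns (L0, S1, L1), each indexed [utterance, meaning].

    lexicon[u][m] is 1 if utterance u is literally true of meaning m, else 0.
    Assumes alpha > 0, every utterance is true of some meaning with nonzero prior,
    and every meaning has at least one true utterance.
    """
    lexicon = np.asarray(lexicon, dtype=float)
    prior = np.asarray(prior, dtype=float)
    cost = np.zeros(lexicon.shape[0]) if cost is None else np.asarray(cost, dtype=float)

    # Literal listener: P_L0(m | u) is proportional to [[u]](m) * P(m).
    l0 = lexicon * prior
    l0 /= l0.sum(axis=1, keepdims=True)

    # Pragmatic speaker: P_S1(u | m) is proportional to exp(alpha * (log P_L0(m | u) - cost(u))).
    with np.errstate(divide="ignore"):
        utility = np.log(l0) - cost[:, None]
    s1 = np.exp(alpha * utility)         # false utterances get exp(-inf) = 0
    s1 /= s1.sum(axis=0, keepdims=True)  # normalize over utterances for each meaning

    # Pragmatic listener: P_L1(m | u) is proportional to P_S1(u | m) * P(m).
    l1 = s1 * prior
    l1 /= l1.sum(axis=1, keepdims=True)
    return l0, s1, l1


# Toy scalar implicature: meanings = [some-but-not-all, all], utterances = ["some", "all"].
L0, S1, L1 = rsa(lexicon=[[1, 1], [0, 1]], prior=[0.5, 0.5])
print(L1[0])  # interpretation of "some": approximately [0.75, 0.25]
```

The literal listener splits "some" evenly between the two meanings; the pragmatic listener shifts toward "some but not all" because a speaker who meant "all" would likely have said so.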
Designing a Spoken Dialogue System to Assist L2 Acquisition in People who are Hard of Hearing (2018)
Advised by Dr. Maria Wolters, University of Edinburgh
Spoken dialogue systems can be a tool for conversational practice to support second language learning. To assess the inclusivity of such systems, we conduct a pilot study on the effect of hearing loss on interaction efficacy with a conversation practice system. Specifically, we examine the restaurant ordering context and measure how well participants (both fluent speakers and language learners) identify and recall food options under three conditions of simulated hearing loss. We find that inherent qualities of the synthetic voice under hearing loss conditions impact subjective intelligibility scores. Additionally, for both selection and recall tasks, the effect of hearing loss under moderate conditions was exacerbated by lack of language fluency. Based on these results, we explore optimizing an objective speech intelligibility metric by preprocessing the text with Speech Synthesis Markup Language (SSML) properties, and we offer a technique for this optimization based on Gaussian processes.
Using Partially Observable MDPs to Learn Language in a Spatial Reference Game (2018)
with Benjamin Louis Newman and Levi Lian
Much of early human language learning takes place in an unsupervised setting. In this work, we investigate how autonomous agents can use goal-oriented tasks in a spatial reference game to learn language. This problem is made difficult by the high dimensionality of the state and action spaces, as well as by the fact that it ties achieving one objective (i.e., reaching a goal) to achieving a secondary one (i.e., learning directional language). We formalize this problem both as a Markov decision process (MDP) and as a partially observable Markov decision process (POMDP). We analyze the performance of the agent under different conditions using dynamic programming and online POMDP solution techniques, and we simulate and visualize the resulting policies along with real-time updates of the belief states. We observe that knowing the language can influence the time it takes to arrive at a goal state, and that completely learning the language can be incentivized by explicitly optimizing for that task.
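The belief-state updates mentioned above follow the standard discrete Bayes filter used in POMDPs: after taking action a and observing o, the new belief is b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). The minimal sketch below implements that update; the two-state model in the usage lines is a hypothetical toy, not the spatial reference game's actual state or observation model.

```python
import numpy as np


def update_belief(belief, transition, obs_likelihood):
    """One discrete POMDP belief update for a fixed action a and observation o.

    belief:         shape [S], current b(s)
    transition:     shape [S, S], transition[s, s2] = T(s2 | s, a)
    obs_likelihood: shape [S], O(o | s2, a) for the observation actually received
    """
    predicted = np.asarray(belief, dtype=float) @ np.asarray(transition, dtype=float)  # sum_s b(s) T(s2 | s, a)
    unnormalized = np.asarray(obs_likelihood, dtype=float) * predicted
    total = unnormalized.sum()
    if total == 0:
        raise ValueError("observation has zero probability under the current belief")
    return unnormalized / total


# Hypothetical toy: two states, an action that leaves the state unchanged,
# and an observation that is 4x more likely in state 0 than in state 1.
print(update_belief([0.5, 0.5], np.eye(2), [0.8, 0.2]))
```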
Real-time Acoustic Modeling with Convolutional Neural Networks (2018)
with Ying Hang Seah and Levi Lian
Acoustic modeling with Hidden Markov Models and Gaussian Mixture Models was the standard approach for automatic speech recognition (ASR) until the introduction of Convolutional Neural Networks (CNNs). We investigate the use of CNNs for a smaller task, phoneme recognition, and extend the model to allow for real-time classification. The real-time nature of the task poses challenges for streaming both the input and the output. We show that the CNN achieves decent performance on audio inputs, given their unique characteristics. Additionally, we adapt the real-time classification task to streaming data visualization. This provides a basis for a phoneme practice tool for people with speech difficulties. Future research can improve the usability of this system and extend the approach beyond English phonemes.
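One common way to handle the streaming-input challenge described above is to buffer incoming audio chunks and emit fixed-length, overlapping windows to the classifier as soon as enough samples have arrived. The generator below is a minimal sketch of that idea; the name `stream_windows`, the window and hop sizes, and the assumption that audio arrives as lists of samples are illustrative choices, not details taken from the project.

```python
from typing import Iterable, Iterator, List


def stream_windows(chunks: Iterable[List[float]], window: int, hop: int) -> Iterator[List[float]]:
    """Yield overlapping fixed-length windows from audio arriving in arbitrary-size chunks.

    window: samples per classifier input; hop: samples to advance between windows.
    """
    if not 0 < hop <= window:
        raise ValueError("require 0 < hop <= window")
    buffer: List[float] = []
    for chunk in chunks:
        buffer.extend(chunk)
        # Emit every complete window currently in the buffer.
        while len(buffer) >= window:
            yield buffer[:window]
            del buffer[:hop]  # drop samples that no later window will need


# Toy usage: chunks of uneven size, 4-sample windows advancing by 2 samples.
for frame in stream_windows([[1, 2, 3], [4, 5], [6]], window=4, hop=2):
    print(frame)  # each frame would be passed to the phoneme classifier
```

Because windows are produced as soon as they are complete, latency is bounded by the window length rather than by the size of the incoming chunks.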
Analyzing Approaches to Remove Gender Bias from Word Embeddings (2019)
Recent literature has shown that word embeddings reify social biases to a disturbing degree. In this paper, I focus on gender bias. After introducing how embeddings can capture bias, I present techniques for identifying and mitigating it from the perspective of Gendered Innovations and the Methods of Sex and Gender Analysis (Schiebinger et al., 2018). Next, I explain the implications of debiased word embeddings for natural language processing. Finally, I discuss the inadequate attention paid to nonbinary gender in current work on word embeddings.
This work is an attempt to bridge the gap between the technical approaches to removing computational gender bias and the methods of analysis in the Gendered Innovations project.
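To make "identifying and mitigating bias" concrete, the sketch below shows one widely cited technique, the "neutralize" step of hard debiasing (Bolukbasi et al., 2016): measure a word vector's cosine similarity with an estimated gender direction, then remove its component along that direction. The direction here comes from a single hypothetical he/she pair of toy 3-dimensional vectors; real methods typically use several definitional pairs, and this sketch is not necessarily one of the approaches analyzed in the paper.

```python
import numpy as np


def gender_direction(he, she):
    """Unit vector along the difference of one (hypothetical) definitional pair."""
    d = np.asarray(he, dtype=float) - np.asarray(she, dtype=float)
    return d / np.linalg.norm(d)


def bias_score(v, g):
    """Cosine similarity between word vector v and the unit gender direction g."""
    v = np.asarray(v, dtype=float)
    return float(v @ g / np.linalg.norm(v))


def neutralize(v, g):
    """Remove v's projection onto the unit direction g."""
    v = np.asarray(v, dtype=float)
    return v - (v @ g) * g


# Toy vectors for illustration only (not real embeddings).
g = gender_direction([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0])
toy_vec = [-0.5, 1.0, 0.0]
print(bias_score(toy_vec, g))                 # negative: leans toward the "she" side
print(bias_score(neutralize(toy_vec, g), g))  # 0.0 after neutralizing
```

Neutralizing only removes the component along the chosen direction; bias encoded elsewhere in the geometry can survive, which is one reason the choice of direction and word lists matters.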
Approximate Solutions to the Vehicle Routing Problem with Time Windows (2016)
At the 2016 Pennsylvania Governor's School for the Sciences, I was part of a ten-person team working on a variant of the Traveling Salesman Problem known as the Vehicle Routing Problem with Time Windows (VRPTW). The objective is to find the most efficient routes for a number of trucks making deliveries to customers with given locations and delivery time windows. On eight VRPTW instances, our approach found solutions as efficient as the world-record solutions to those instances. More information about the project is available at cmu.edu and the PGSS Blog.
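A building block of most VRPTW heuristics is checking whether a single truck's route respects every customer's time window. The sketch below assumes the conventions that node 0 is the depot, the truck leaves at time 0, waits if it arrives before a window opens, and fails if it arrives after a window closes; vehicle capacity is omitted, and the function name and toy data are hypothetical rather than taken from the team's approach.

```python
from typing import List, Sequence, Tuple


def check_route(
    route: Sequence[int],
    travel: List[List[float]],
    windows: List[Tuple[float, float]],
    service: List[float],
) -> Tuple[bool, float]:
    """Return (feasible, distance traveled so far) for a route that starts and ends at depot 0."""
    time = 0.0
    distance = 0.0
    prev = 0
    for node in list(route) + [0]:  # visit customers in order, then return to the depot
        time += travel[prev][node]
        distance += travel[prev][node]
        earliest, latest = windows[node]
        if time > latest:
            return False, distance  # arrived after the window closed
        time = max(time, earliest) + service[node]  # wait if early, then serve
        prev = node
    return True, distance


# Hypothetical toy instance: depot 0 and customers 1 and 2.
travel = [[0, 2, 4], [2, 0, 3], [4, 3, 0]]
windows = [(0, 100), (0, 5), (6, 10)]
service = [0, 1, 1]
print(check_route([1, 2], travel, windows, service))  # feasible
print(check_route([2, 1], travel, windows, service))  # misses customer 1's window
```

A local-search heuristic would call a check like this on each candidate move and keep only feasible routes with lower total distance.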
Automated Illustration of Text to Improve Semantic Comprehension (2016)
Over a million Americans suffer from aphasia, a disorder that severely inhibits language comprehension. Medical professionals suggest that individuals with aphasia have a noticeably greater understanding of pictures than of the written or spoken word. Accordingly, we design a text-to-image converter that augments linguistic communication, overcoming the highly constrained input strings and predefined output templates of previous work. This project offers four primary contributions. First, we develop an image processing algorithm that finds a simple graphical representation for each noun in the input text by analyzing Hu moments of contours in photographs and clipart images. Second, we construct a dataset of human-centric action verbs annotated with corresponding body positions and train models to match verbs outside the dataset with appropriate body positions; our system illustrates body positions and emotions with a generic human representation. Third, we design an algorithm that maps abstract nouns to concrete ones that can be illustrated easily. To accomplish this, we use spectral clustering to identify abstract noun classes and match these classes with representative concrete nouns. Finally, our system parses two datasets of pre-segmented and pre-captioned real-world images (ImageCLEF and Microsoft COCO) to identify graphical patterns that accurately represent semantic relationships between the words in a sentence. Our tests on human subjects establish the system's effectiveness in communicating text using images.
Taiwan International Science Fair, 2016
– Selected, one of two students to represent the U.S.
– First Prize, Computer Science and Information Engineering category
MIT THINK Scholars Program, 2015
– Selected, one of six national finalists
Pittsburgh Regional Science and Engineering Fair, 2015
– First Place, Computer Science category
– Recipient, Intel Excellence in Computer Science Award
– Recipient, Sponsor Award from Carnegie Mellon University
– Recipient, Carnegie Science Award (awarded to the top project overall in grades 9-12)
Intel International Science and Engineering Fair, 2015
– Recipient, Fourth Place Grand Award in Systems Software category
– Recipient, Sponsor awards from China Association for Science and Technology, Association for the Advancement of Artificial Intelligence
– Recipient, Trip to European Organization for Nuclear Research – CERN
Fuzzy Logic Based Eye-Brain Controlled Web Access System (2014)
Accessing the Web is crucially important in today’s society because of the communication, education, and entertainment opportunities it offers. Paralyzed or paretic individuals are unable to capitalize on these opportunities using traditional human-computer interaction methods. We develop a low-cost web browsing system for such individuals that integrates eye and brain control in a novel fashion to relay and interpret navigation commands. The system combines gaze position estimates obtained from a new image processing algorithm with brain concentration levels sensed and transmitted by an electroencephalogram (EEG) headset. Since user intent may itself be uncertain, the system incorporates a novel fuzzy logic algorithm that combines brainwave and eye position inputs to determine the user’s targeted hyperlink. The algorithm uses exponential smoothing to efficiently keep a record of historical signals. Experimental evaluation established, with 95% confidence, that the system's first-attempt success rate lies between 87% and 95%. Error recovery accuracy is 98.4%, resulting in a second-attempt success rate of 99.1%.
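Exponential smoothing, mentioned above as the way historical signals are tracked, keeps a single running value s_t = α·x_t + (1 − α)·s_{t−1}, so recent readings count more while older ones decay geometrically, all in constant memory. The class below is a minimal generic sketch; the smoothing factor and sample readings are hypothetical, and it does not reproduce the system's fuzzy logic algorithm.

```python
from typing import Optional


class ExponentialSmoother:
    """Running estimate s_t = alpha * x_t + (1 - alpha) * s_{t-1}, stored in O(1) memory."""

    def __init__(self, alpha: float) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value: Optional[float] = None

    def update(self, x: float) -> float:
        if self.value is None:
            self.value = x  # initialize with the first observation
        else:
            self.value = self.alpha * x + (1.0 - self.alpha) * self.value
        return self.value


# Hypothetical stream of concentration readings from a headset.
smoother = ExponentialSmoother(alpha=0.5)
for reading in [10.0, 0.0, 0.0]:
    print(smoother.update(reading))  # prints 10.0, then 5.0, then 2.5
```

A larger alpha reacts faster to changes in the signal; a smaller alpha suppresses momentary noise, such as a brief lapse in concentration or a jittery gaze estimate.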
Intel International Science and Engineering Fair, 2014
– Recipient, $1500 Web Innovator Award from GoDaddy
The ElderBots Project: An Open-Source Social Robot for the Elderly (2014)
During an internship at Carnegie Mellon University’s Quality of Life Technology Center, I worked on an open-source "social robot" designed to help the elderly cope with isolation or depression. Specifically, I worked on an iOS controller app for the robot. The app was published in the App Store in August 2014. (A video is available here.) Additional information about the project is available at elderbots.org and romibo.org.
The Effect of Typeface on Ink and Toner Costs (2014)
In my sixth-grade science fair project, I estimated how much my school district would save in ink and toner costs by switching to a more ink-efficient typeface. I published my findings in the Journal of Emerging Investigators, and the journal's editors encouraged me to extend my analysis to the United States Government. The response was extremely positive: I was very fortunate to share my findings on CNN, to be featured in Science Magazine, and to receive an invitation from HP's CEO to present my findings at HP's headquarters in Palo Alto, CA.
Science, CNN, Associated Press, CBS This Morning, Financial Times Magazine, Forbes, HuffPost Live, TIME
– Featured, Science Magazine
– Invited by Ms. Meg Whitman, CEO of Hewlett-Packard, to tour HP Labs in Palo Alto, CA and present findings to HP engineers
– Recognized, Outstanding Community Service, State of California Senate, 2014
Note: I have presented a response to some misinformed articles attempting to "debunk" the study. To obtain a copy of the full rebuttal, please contact me.
Prior to college, I studied piano for a number of years under the tutelage of Prof. Luz Manríquez. Some old recordings are featured below.
Feel free to email me at email@example.com, or use the form below.