Anima Anandkumar

Recent advances in NLP

Large deep learning models have recently shown promise in generating paragraphs of coherent text. I will describe NVIDIA's Megatron, the largest language model trained to date, and the issues that arise in scaling it across roughly 512 GPUs. On the other end of the spectrum, I will demonstrate how incorporating probabilistic models into FastText embeddings leads to unsupervised learning of word senses. Finally, I will discuss the applicability of NLP models to program code and new architectures for incorporating unbounded vocabularies.


Anima Anandkumar is a Bren Professor of Computing and Mathematical Sciences at the California Institute of Technology and Director of Research in Machine Learning at NVIDIA. Her research is in the areas of large-scale machine learning and high-dimensional statistics, and in particular, the development of tensor methods that scale machine learning to higher dimensions. She is also the recipient of the Alfred P. Sloan Fellowship, the Microsoft Faculty Fellowship, ARO and AFOSR Young Investigator Awards, the NSF CAREER Award, and several paper awards. She received her B.Tech in Electrical Engineering from IIT Madras in 2004 and her PhD from Cornell University in 2009. She was a Postdoctoral Researcher in the Stochastic Systems Group at MIT from 2009 to 2010, an Assistant Professor at UC Irvine from 2010 to 2016, and a Principal Scientist at Amazon Web Services from 2016 to 2018.

Marjan Ghazvininejad

Mask-Predict: Parallel Decoding of Conditional Masked Language Models

Most machine translation systems generate text auto-regressively from left to right. In this talk, I propose a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a partially masked target translation. This approach allows for efficient iterative decoding, where we first predict all of the target words non-autoregressively, and then repeatedly mask out and regenerate the subset of words that the model is least confident about.
By applying this strategy for a constant number of iterations, our model improves state-of-the-art performance levels for non-autoregressive and parallel decoding translation models by over 4 BLEU on average. It is also able to reach within about 1 BLEU point of a typical left-to-right transformer model, while decoding significantly faster.
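The iterative decoding loop described above can be sketched as follows. This is an illustrative toy, not the paper's actual interface: `predict_fn`, the linear masking schedule, and all names are assumptions made for the sketch.

```python
MASK = "<mask>"

def mask_predict(predict_fn, src, tgt_len, iterations=4):
    """Mask-Predict decoding (sketch).

    predict_fn(src, partial) -> list of (token, confidence), one per
    target position, conditioned on the source and a partially masked
    target. All names here are illustrative, not the paper's API.
    """
    # Iteration 0: predict every target token in parallel.
    partial = [MASK] * tgt_len
    preds = predict_fn(src, partial)
    tokens = [tok for tok, _ in preds]
    confs = [c for _, c in preds]

    for t in range(1, iterations):
        # Linear decay: mask fewer tokens at each iteration.
        n_mask = int(tgt_len * (iterations - t) / iterations)
        if n_mask == 0:
            break
        # Mask the n_mask least-confident positions...
        worst = set(sorted(range(tgt_len), key=lambda i: confs[i])[:n_mask])
        partial = [MASK if i in worst else tokens[i] for i in range(tgt_len)]
        # ...and regenerate them, conditioned on the tokens we kept.
        for i, (tok, c) in enumerate(predict_fn(src, partial)):
            if i in worst:
                tokens[i], confs[i] = tok, c
    return tokens
```

Running a constant number of iterations keeps the total decoding cost a constant multiple of one parallel pass, which is the source of the speedup over left-to-right decoding.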


Marjan Ghazvininejad is a Research Scientist in Facebook AI Research. She is interested in text representation, language generation, and machine translation. Her recent focus has been on new approaches for modeling and training of these systems to generate high quality, coherent, and creative text. She received her PhD in 2018 from the University of Southern California under the supervision of Kevin Knight.

Liang Huang

Linear-Time Parsing Meets RNA Folding

Predicting the secondary structure of a ribonucleic acid (RNA) sequence is useful in many applications. Existing algorithms, based on dynamic programming, suffer from a major limitation: their runtimes scale cubically with the RNA length, and this slowness limits their use in genome-wide applications.

We first present a novel alternative O(n^3)-time dynamic programming algorithm for RNA folding that is amenable to heuristics that make it run in O(n) time and O(n) space, while producing a high-quality approximation to the optimal solution. Inspired by the author’s earlier work on incremental parsing for context-free grammars in computational linguistics, our alternative dynamic programming algorithm scans the sequence in a left-to-right (5'-to-3') direction rather than in a bottom-up fashion, which allows us to employ the effective beam pruning heuristic. Our work, though inexact, is the first RNA folding algorithm to achieve linear runtime (and linear space) without imposing constraints on the output structure. Surprisingly, our approximate search results in even higher overall accuracy on a diverse database of sequences with known structures. More interestingly, it leads to significantly more accurate predictions on the longest sequence families in that database (16S and 23S Ribosomal RNAs), as well as improved accuracies for long-range base pairs (500+ nucleotides apart), both of which are well known to be challenging for the current models.
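As a rough illustration of the beam-pruning idea, the following toy scans the sequence left to right and keeps, at each position, only the highest-scoring spans ending there. It optimizes the much simpler Nussinov objective (maximize base pairs) rather than the talk's scoring model, and all names are assumptions of the sketch; with a constant beam width the scan runs in linear time, at the cost of exactness.

```python
def can_pair(a, b):
    # Watson-Crick pairs plus the G-U wobble pair.
    return {a, b} in ({"A", "U"}, {"C", "G"}, {"G", "U"})

def beam_fold(s, beam=20):
    """Max base pairs via a left-to-right Nussinov-style scan (sketch).

    best[j][i] = best score of a structure over s[i..j].
    With beam pruning, each step touches O(beam^2) states.
    """
    n = len(s)
    best = [dict() for _ in range(n)]

    def relax(cand, i, sc):
        if sc > cand.get(i, -1):
            cand[i] = sc

    for j in range(n):
        cand = {}
        relax(cand, j, 0)                       # s[j] starts a new span
        if j > 0:
            for i, sc in best[j - 1].items():   # s[j] left unpaired
                relax(cand, i, sc)
        # s[j] pairs with s[k], enclosing s[k+1..j-1] (possibly empty).
        prev = best[j - 1].items() if j > 0 else []
        inners = [(j - 1, 0)] + [(i2 - 1, sc) for i2, sc in prev]
        for k, sc_in in inners:
            if k < 0 or not can_pair(s[k], s[j]):
                continue
            sc_pair = sc_in + 1
            relax(cand, k, sc_pair)             # pair opens the span at k
            if k > 0:                           # combine with a prefix span
                for i, sc_left in best[k - 1].items():
                    relax(cand, i, sc_left + sc_pair)
        # Beam pruning: keep only the top `beam` spans ending at j.
        best[j] = dict(sorted(cand.items(), key=lambda kv: -kv[1])[:beam])
    return best[n - 1].get(0, 0)
```

With the beam made large enough the scan recovers the exact Nussinov optimum; shrinking the beam trades exactness for the linear runtime and space the abstract describes.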


Liang Huang is a Distinguished Scientist at Baidu Research USA and an Assistant Professor at Oregon State University. Before that he was an Assistant Professor for three years at the City University of New York (CUNY) and a part-time Research Scientist with IBM's Watson Group. He graduated in 2008 from Penn and has worked as a Research Scientist at Google and a Research Assistant Professor at USC/ISI. Most of his work develops fast algorithms and provable theory to speed up large-scale natural language processing, structured machine learning, and computational structural biology. He has received a Best Paper Award at ACL 2008, a Best Paper Honorable Mention at EMNLP 2016, several best paper nominations (ACL 2007, EMNLP 2008, and ACL 2010), two Google Faculty Research Awards (2010 and 2013), a Yahoo! Faculty Research Award (2015), and a University Teaching Prize at Penn (2005). His research has been supported by DARPA, NSF, Google, and Yahoo. He also co-authored a best-selling textbook in China on algorithms for programming contests.

Kevin Knight

NLP in Hollywood?

Is there a role for recurrent neural networks in the making of Hollywood movies? I'll answer in the affirmative, and give a few examples.


Kevin Knight is Chief Scientist for Natural Language Processing at Didi Chuxing. He received a PhD in computer science from Carnegie Mellon University and a bachelor's degree from Harvard University. Dr. Knight's research interests include human-machine communication, machine translation, language generation, automata theory, and decipherment. He has co-authored over 150 research papers on natural language processing, as well as the widely adopted textbook "Artificial Intelligence" (McGraw-Hill). In 2001, he co-founded Language Weaver, Inc., a machine translation company acquired by SDL plc in 2010. Dr. Knight served as President of the Association for Computational Linguistics (ACL) in 2011, as General Chair for ACL in 2005, and as General Chair for the North American ACL in 2016. He is a Fellow of the ACL, USC/ISI, and AAAI.

Ndapa Nakashole

Sparsity Regularizers for Generalizing Representations of Language

A familiar scenario in NLP is one where a model trained on one dataset fails to generalize to data drawn from distributions other than that of the training data. I will talk about our work on sparsity regularizers as one way toward generalizing representations of language.


Ndapa Nakashole is an Assistant Professor at the University of California, San Diego, where she has been teaching and doing research on statistical natural language processing since 2017.
Before that she did a postdoc at the machine learning department at Carnegie Mellon University.
She obtained her PhD from Saarland University and the Max Planck Institute for Informatics.
She completed undergraduate studies in Computer Science at the University of Cape Town, South Africa.

Sameer Singh

Discovering Bugs in NLP Models Using Natural Perturbations

Determining when a machine learning model is "good enough" is challenging, since held-out accuracy metrics often significantly overestimate real-world performance. In this talk, I will describe automated techniques to detect bugs that can occur naturally when a model is deployed. I will start by identifying "semantically equivalent" replacement rules that should not change the meaning of the input, yet lead to a change in the model's predictions. Then I will present our work on evaluating the consistency of the model by exploring its performance on new instances that are implied by its own predictions. I will also describe a method to understand and debug models by identifying keywords that "trigger" the model into misbehaving. The talk will include applications of these ideas to a number of NLP tasks, such as reading comprehension, entailment, visual QA, sentiment analysis, and language modeling.
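The first technique above can be sketched in a few lines: apply a meaning-preserving rewrite rule to an input and flag any resulting change in the model's prediction as a bug. The function names, the rule format, and the toy model below are all illustrative assumptions, not the actual system.

```python
def find_flips(model, rules, inputs):
    """Flag prediction flips under meaning-preserving rewrites (sketch).

    model:  callable mapping a string to a predicted label.
    rules:  (old, new) substring pairs assumed to preserve meaning.
    Returns (original, perturbed) pairs where the prediction changed.
    """
    bugs = []
    for x in inputs:
        for old, new in rules:
            if old not in x:
                continue
            x_pert = x.replace(old, new)
            # A semantically equivalent input should get the same label;
            # a flip indicates a bug in the model, not in the input.
            if model(x_pert) != model(x):
                bugs.append((x, x_pert))
    return bugs
```

In practice the rules themselves are induced automatically rather than hand-written, and each candidate rule is validated for semantic equivalence before its flips are counted as bugs.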


Sameer Singh is an Assistant Professor of Computer Science at the University of California, Irvine. He is working on large-scale and interpretable machine learning applied to natural language processing. Sameer was a Postdoctoral Research Associate at the University of Washington and received his PhD from the University of Massachusetts, Amherst, during which he also worked at Microsoft Research, Google Research, and Yahoo! Labs. His group has received funding from Allen Institute for AI, NSF, DARPA, Adobe Research, and FICO, and was selected as a DARPA Riser in 2015. Sameer has published extensively at top-tier machine learning and natural language processing conferences.

Luke Zettlemoyer

Learning to Understand Entities In Text

Real-world entities such as people, organizations, and countries play a critical role in text. Reading offers rich explicit and implicit information about these entities, such as the categories they belong to, the relationships they have with other entities, and the events they participate in. In this talk, we introduce approaches to infer implied information about entities, and to automatically query such information in an interactive setting. This expands the scope of information that can be learned from text for a range of tasks, including sentiment extraction, entity typing, and question answering. To this end, we introduce new ideas for how to find effective training data, including crowdsourcing and large-scale naturally occurring weak supervision. We also describe new computational models that represent rich social and conversational contexts to tackle these tasks. Together, these advances significantly expand the scope of information that can be incorporated into the next generation of machine reading systems. This work was primarily done by Eunsol Choi, in collaboration with Hannah Rashkin, He He, Mohit Iyyer, Omer Levy, Mark Yatskar, Yejin Choi, Percy Liang, and Scott Yih.


Luke Zettlemoyer is an Associate Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, and a Research Scientist at Facebook. His research focuses on empirical methods for natural language understanding, and involves designing machine learning algorithms and building large datasets. Honors include multiple paper awards, a PECASE award, and an Allen Distinguished Investigator Award. Luke received his PhD from MIT and was a postdoc at the University of Edinburgh.