COSC 7336 Advanced Natural Language Processing

Spring 2016

Roy G. Cullen (C) 106

Instructor: Arjun Mukherjee


Overview

This is an advanced course on Natural Language Processing (NLP) and applied NLP in Web mining. The course is intended to develop advanced skills in NLP and in Web data and text mining through NLP applications. The broader goal is twofold: (1) gain a thorough understanding of statistical NLP techniques (e.g., latent variable models, graphical models in NLP) and learn to build tools for solving practical text mining problems; (2) explore recent papers in the field through presentations, talks, critiques, and defenses. Throughout the course, strong emphasis will be placed on tying NLP techniques to specific real-world applications through hands-on experience. The course covers fundamental topics in statistical machine learning and touches upon various topics in NLP for the Web.


Administrative details

Office hours

Instructor office hours: M 2-4 pm

Prerequisites

The course requires a solid background in mathematics and sufficient programming skill. Having taken and done well in one or more equivalent courses/topics such as Algorithms, Data Mining, Machine Learning, or Natural Language Processing, or having a good background in probability/statistics, will be helpful. The course, however, reviews and covers the required mathematical and statistical foundations. Sufficient experience building projects in a high-level programming language (e.g., Java) is required.

Required reference materials:

Online resources (OR) per topic as appearing in the schedule below.
Course Materials including books and lecture notes

Grading

Component             Contribution
Project               25%
Paper Presentations   55%
Critique              15%
Class Participation   5%


Rules and policies

Late Assignments: Late assignments will not, in general, be accepted. They will never be accepted unless the student has made special arrangements with me at least one day before the assignment is due, and there must be a justifiable reason owing to extenuating circumstances. If a late assignment is accepted, it is subject to a reduction in score as a late penalty.
Cheating: All submitted work (code, homework, exams, etc.) must be your own. If evidence of code sharing is found, there will be consequences affecting your grade in the course. Please refer to the student handbook for details on academic honesty.
Statute of limitations: Grading questions or complaints will, in general, not be addressed more than one week after the item in question has been returned.


Paper Reading Assignments/Project due dates

Assignments Due date
Project 4/18
Paper: Domain Adaptation with Structural Correspondence Learning [Blitzer et al., 2006] Presenter/Defender: Fan Critique: Yifan Next regular meeting
Paper: Distance Metric Learning for Large Margin Nearest Neighbor Classification [Weinberger et al., 2006] Presenter/Defender: Marjan Critique: Huijie Next regular meeting
Paper: One-Class SVMs for Document Classification [Manevitz et al., 2001] Presenter/Defender: Santosh Critique: Marjan Next regular meeting
Paper: Hinge Loss Markov Random Fields [Bach et al., 2013] Presenter/Defender: Dainis Critique: Huijie Next regular meeting
Paper: AFRAID: Fraud Detection via Active Inference in Time-evolving Social Networks [Vlasselaer et al., 2015] Presenter/Defender: Huijie Critique: Santosh Next regular meeting
Paper: Efficient Estimation of Word Representations in Vector Space [Mikolov et al., 2013]. Ref. for background: [1], [2] Presenter/Defender: Yifan Critique: Dainis Next regular meeting
Paper: Learning Latent Representations for Domain Adaptation using Supervised Word Clustering [Xiao et al., 2013] Presenter/Defender: Fan Critique: Santosh Next regular meeting
Paper: Co-Training for Domain Adaptation [Chen et al., 2011] Presenter/Defender: Marjan Critique: Fan Next regular meeting
Paper: Supervised Random Walks [Backstrom and Leskovec, 2011] Presenter/Defender: Santosh Critique: Huijie Next regular meeting
Paper: Distributed Representations of Words and Phrases and their Compositionality [Mikolov et al., 2013] Presenter/Defender: Dainis Critique: Yifan Next regular meeting
Paper: Understanding and Combating Link Farming in the Twitter Social Network [Ghosh et al., 2012] Presenter/Defender: Huijie Critique: Santosh Next regular meeting
Paper: DeepWalk: Online Learning of Social Representations [Perozzi et al., 2014] Presenter/Defender: Yifan Critique: Santosh Next regular meeting
Paper: Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach [Glorot et al., 2011] Presenter/Defender: Fan Critique: Marjan Next regular meeting
Paper: CRF Autoencoders for Unsupervised Structured Prediction [Ammar et al., 2014] Presenter/Defender: Yifan Critique: Fan Next regular meeting
Paper: Authorship Verification as a One-Class Classification Problem [Koppel and Schler, 2004] Presenter/Defender: Marjan Critique: Dainis Next regular meeting
Paper: Unsupervised Cross-Domain Word Representation Learning [Bollegala et al., 2015] Presenter/Defender: Dainis Critique: Fan Next regular meeting
Paper: Deep Semantic Frame-based Deceptive Opinion Spam Analysis [Kim et al., 2015] Presenter/Defender: Huijie Critique: Marjan Next regular meeting
Paper: Collective Opinion Spam Detection [Rayana and Akoglu, 2015] Presenter/Defender: Santosh Critique: Dainis Next regular meeting
Paper: Learning with Marginalized Corrupted Features [Maaten et al. 2013] Presenter/Defender: Fan Critique: Marjan Next regular meeting
Paper: Bidirectional LSTM-CRF Models for Sequence Tagging [Huang et al., 2015] Ref. for background: [1] Presenter/Defender: Dainis Critique: Yifan Next regular meeting
Paper: Joint Modeling of Opinion Expression Extraction and Attribute Classification [Yang and Cardie 2014] Presenter/Defender: Yifan Critique: Dainis Next regular meeting
Paper: Frustratingly Easy Domain Adaptation [Daume III, 2007] Presenter/Defender: Marjan Critique: Dainis Next regular meeting
Paper: From Word Embeddings to Document Distances [Kusner et al., 2015] Presenter/Defender: Santosh Critique: Fan Next regular meeting
Paper: BIRDNEST: Bayesian INference for Review Rating Fraud [Hooi et al., 2015] Presenter/Defender: Huijie Critique: Santosh Next regular meeting


Schedule of topics

Please note that the following is a tentative list of topics. During the course, interleaved with the lectures, time will be devoted to review questions, homework solutions, discussion of novel ideas, paper critiques, and concept review.

Topic(s) Resources: Readings, Slides, Lecture notes, Papers, Pointers to useful materials, etc.
Brief Introduction to NLP
Course administrivia, semester plan, course goals
NLP Resources
Language as a probabilistic phenomenon
Word collocations, NLP and text retrieval basics
Text categorization
Introduction to topics to be covered in the course
Required readings:
Lecture notes/slides
Chapter 1 FSNLP (Sections 1.2.3, 1.4, 1.4.1, 1.4.2, 1.4.3, 1.4.4)
Boolean retrieval slides by H.Schutze
Boolean retrieval [Manning et al., 2008] (up to Section 1.4)
F. Keller's tutorial on Naive Bayes + A. Moore's notes for the graph view (Slide 8)
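To complement the Naive Bayes and text categorization material above, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. It is illustrative only: the class and method names are hypothetical and it is not tied to the course toolkits or readings.

import java.util.*;

// Minimal multinomial Naive Bayes for text categorization (illustrative sketch).
// Class and method names are hypothetical; no course toolkit is assumed.
public class NaiveBayesSketch {
    private final Map<String, Integer> docCount = new HashMap<>();               // docs per class
    private final Map<String, Map<String, Integer>> wordCount = new HashMap<>();  // word counts per class
    private final Map<String, Integer> totalWords = new HashMap<>();              // total tokens per class
    private final Set<String> vocab = new HashSet<>();
    private int totalDocs = 0;

    // Add one labeled, already-tokenized document to the training counts.
    public void train(String label, List<String> tokens) {
        totalDocs++;
        docCount.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCount.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : tokens) {
            counts.merge(w, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocab.add(w);
        }
    }

    // Return the class with the highest log posterior: log P(c) + sum_w log P(w|c).
    public String classify(List<String> tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCount.keySet()) {
            double score = Math.log(docCount.get(label) / (double) totalDocs);   // log prior
            Map<String, Integer> counts = wordCount.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String w : tokens) {
                int c = counts.getOrDefault(w, 0);
                // add-one (Laplace) smoothing over the vocabulary
                score += Math.log((c + 1.0) / (total + vocab.size()));
            }
            if (score > bestScore) { bestScore = score; best = label; }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("pos", Arrays.asList("great", "movie", "loved", "it"));
        nb.train("neg", Arrays.asList("boring", "plot", "hated", "it"));
        System.out.println(nb.classify(Arrays.asList("loved", "the", "movie")));  // prints "pos"
    }
}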
Statistical foundations I: Basics
Probability theory
Conditional probability and independence
Required Readings:
Lecture notes/slides
Chapter 2 FSNLP (Section 2.1.1 - 2.1.10), Chapter 1 SI (Selected topics covered in class and solved examples)
OR01: X.Zhu's notes on mathematical background for NLP
Slides:
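Worked example: as a quick illustration of conditional probability and Bayes' rule from this unit (all numbers below are invented for the example), suppose 20% of emails are spam, the word "free" appears in 40% of spam emails and in 5% of non-spam emails. Then

\[
P(\mathrm{spam} \mid \mathrm{free})
  = \frac{P(\mathrm{free} \mid \mathrm{spam})\,P(\mathrm{spam})}
         {P(\mathrm{free} \mid \mathrm{spam})\,P(\mathrm{spam}) + P(\mathrm{free} \mid \neg\mathrm{spam})\,P(\neg\mathrm{spam})}
  = \frac{0.4 \times 0.2}{0.4 \times 0.2 + 0.05 \times 0.8} \approx 0.67 .
\]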
Statistical foundations II: Random Variables and Distributions
Random variables, density and mass functions
Mean, Variance
Common families of distributions
Multiple random variables: joints and marginals
Required Readings:
Lecture notes/slides
Chapter 2 SI (Theorem 2.1.10, 2.2, 2.2.1, 2.2.2, 2.2.3, 2.2.5, 2.3.1, 2.3.2, 2.3.4, and topics covered in class).
Chapter 3 SI (all sections + worked-out examples up to 3.4); focus on the distributions/problems covered in class and skip other topics.
Chapter 4 SI (4.1, 4.1.1, 4.1.2, 4.1.3, 4.1.4, 4.1.5, 4.1.6, 4.1.10, 4.1.11, 4.1.12, 4.2.1, 4.2.2, 4.2.3, 4.2.4, 4.2.5).
OR02: K.Zhang's notes on common families of distributions with worked-out examples [skip the hypergeometric and negative binomial distributions and focus on the ones covered in class].

Optional Recommended reading/solved examples:
OR03: Notes on Joint, marginals, worked out examples by S.Fan
OR04: Tutorial on joints and marginals by M.Osborne [Contains NLP specific examples]
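Worked example: as a small illustration of joints and marginals covered above (the joint table is invented for illustration), let X and Y be binary with P(X=0,Y=0)=0.3, P(X=0,Y=1)=0.2, P(X=1,Y=0)=0.1, P(X=1,Y=1)=0.4. Then

\[
\begin{aligned}
P(X=1) &= \sum_{y} P(X=1, Y=y) = 0.1 + 0.4 = 0.5, \\
P(Y=1) &= 0.2 + 0.4 = 0.6, \\
P(Y=1 \mid X=1) &= \frac{P(X=1, Y=1)}{P(X=1)} = \frac{0.4}{0.5} = 0.8 ,
\end{aligned}
\]

and since P(X=1)\,P(Y=1) = 0.30 \neq 0.40 = P(X=1, Y=1), X and Y are not independent.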
Hierarchical models and mixture distributions
Parameter estimation: MLE vs MAP
Prior, posterior, conjugate priors
Binomial-Poisson hierarchy
Beta-Binomial hierarchy
Required Readings:
Lecture notes/slides + Chapter 4 SI (4.4, 4.4.1, 4.4.2, 4.4.5 - 4.4.6)
OR05: P. Robinson's notes on parameter estimation [Slides 1-35]

Optional reference:
OR06: Notes on conjugate models by P. Lam [Slides 1-49]
Conjugate priors for common families of distributions
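For reference alongside the Beta-Binomial hierarchy and MLE vs. MAP topics above, the standard conjugacy identity is: with k successes observed in n Bernoulli trials and a Beta(\alpha, \beta) prior on the success probability \theta,

\[
p(\theta \mid k) \;\propto\; \theta^{k}(1-\theta)^{\,n-k}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}
  \;=\; \theta^{\,k+\alpha-1}(1-\theta)^{\,n-k+\beta-1},
\qquad \theta \mid k \;\sim\; \mathrm{Beta}(k+\alpha,\; n-k+\beta).
\]

Hence \hat{\theta}_{\mathrm{MLE}} = k/n while \hat{\theta}_{\mathrm{MAP}} = (k+\alpha-1)/(n+\alpha+\beta-2); with the uniform prior \alpha = \beta = 1 the two coincide.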
Text Clustering: Semantic Clustering and Topic Models
Latent semantics and clustering problem
Introduction to Bayes nets and PGMs
Latent Dirichlet Allocation
Learning and evaluating Topic Models

Required Readings:
Lecture notes + Stat review: Sampling from distributions (previous slides/lecture notes)
OR13: Tutorial by D.Blei (Slides 1-17)
OR14: Gibbs sampling tutorial by M.Bahadori (Slides 1, 3-5, 7, 16-20, 22)
Gibbs sampler derivation for Latent Dirichlet Allocation.
Comprehensive explanation/derivation of LDA by Gregor Heinrich
LDA Gibbs sampler implementation [Java/Eclipse project]

Programming resources, tools, libraries for projects and homeworks:
Mallet, LingPipe
Java Topic Modeling Toolkit [with implementation of LabeledLDA]
Matlab Topic Modeling Toolkit [with implementation of Author-Topic model]
[Implementation of advanced models]
G.Heinrich's LDA and statistics base classes for sampling based algorithms in Java
Supervised Topic Models

Optional recommended reading for research/projects:
Understanding Gibbs sampling with derivation for the Naive Bayes model (unsupervised) [Resnik and Hardisty, 2010]
D.Blei's tutorial on Dirichlet priors (Slides 32-39)
LDA Gibbs Sampler derivation (Chapter 2) by Yi Wang
Author topic model [Rosen-Zvi et al., 2004]; Derivation and details.
Applications of topic models (NIPS Workshop)
Topic coherence metric for evaluating topic models [Mimno et al., 2011]
Generic Gibbs sampling for Topic Models by G. Heinrich
Supervised Topic Models
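The following sketch illustrates one sweep of a collapsed Gibbs sampler for LDA with symmetric priors, i.e., resampling each topic assignment with probability proportional to (n_{d,k} + \alpha)\,(n_{k,w} + \beta)/(n_k + V\beta), as in the derivations referenced above. It is only a sketch: the field and method names are hypothetical, the count arrays are assumed to be initialized elsewhere, and this is not the referenced Java/Eclipse implementation.

import java.util.Random;

// Illustrative sketch of one collapsed Gibbs sampling sweep for LDA
// (symmetric priors alpha, beta). Names are hypothetical; the count arrays
// (ndk, nkw, nk, z, docs) are assumed to be built and maintained elsewhere.
public class LdaGibbsSweepSketch {
    int K, V;                 // number of topics, vocabulary size
    double alpha, beta;       // symmetric Dirichlet hyperparameters
    int[][] docs;             // docs[d][i] = word id of the i-th token in doc d
    int[][] z;                // z[d][i]    = current topic of that token
    int[][] ndk;              // ndk[d][k]  = tokens in doc d assigned to topic k
    int[][] nkw;              // nkw[k][w]  = tokens of word w assigned to topic k
    int[] nk;                 // nk[k]      = total tokens assigned to topic k
    Random rng = new Random();

    void sweep() {
        double[] p = new double[K];
        for (int d = 0; d < docs.length; d++) {
            for (int i = 0; i < docs[d].length; i++) {
                int w = docs[d][i], k = z[d][i];
                // remove the current assignment from the counts
                ndk[d][k]--; nkw[k][w]--; nk[k]--;
                // p(z = t | rest) proportional to (ndk + alpha) * (nkw + beta) / (nk + V*beta)
                double sum = 0.0;
                for (int t = 0; t < K; t++) {
                    p[t] = (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta);
                    sum += p[t];
                }
                // draw the new topic from the unnormalized distribution p
                double u = rng.nextDouble() * sum;
                int newK = 0;
                for (double acc = p[0]; acc < u && newK < K - 1; acc += p[++newK]) { }
                z[d][i] = newK;
                // add the new assignment back into the counts
                ndk[d][newK]++; nkw[newK][w]++; nk[newK]++;
            }
        }
    }
}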
Sentiment Analysis and Psycholinguistics
Aspect extraction
Deception and opinion spam
Required Readings:
Lecture notes + slides + selected topics (covered in lectures) from Chapter 11, WDM
Programming resources, tools, libraries for projects and homeworks:
Pos/Neg Sentiment Lexicon, SentiWordNet, Deep learning for sentiment analysis

Optional topics/concepts useful for research/projects
Papers on opinion spam: [Mukherjee et al., 2013], [Mukherjee et al., 2012]
Papers on topic modeling: [Blei et al., 2003], [Resnik and Hardisty, 2010]
Aspect and Sentiment Model: [Jo and Oh, 2011], slides, Accompanying data and source code
Other relevant papers: [Lin and He, 2009], [Zhao et al., 2010], [Mukherjee and Liu, 2012]
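Finally, since sentiment lexicons (the Pos/Neg lexicon, SentiWordNet) appear as resources in this unit, a minimal lexicon-based polarity scorer is sketched below. The tiny word lists and the simple negation handling are invented placeholders, not the contents of any actual lexicon, and real systems would handle scope, intensity, and domain effects far more carefully.

import java.util.*;

// Minimal lexicon-based polarity scorer (illustrative sketch; the word lists
// below are placeholders, not an actual sentiment lexicon).
public class LexiconSentimentSketch {
    private static final Set<String> POSITIVE = new HashSet<>(
            Arrays.asList("good", "great", "excellent", "love", "wonderful"));
    private static final Set<String> NEGATIVE = new HashSet<>(
            Arrays.asList("bad", "poor", "terrible", "hate", "boring"));
    private static final Set<String> NEGATORS = new HashSet<>(
            Arrays.asList("not", "never", "no"));

    // Score = (#positive - #negative) tokens, flipping polarity after a negator.
    public static int score(String text) {
        int score = 0;
        boolean negated = false;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (NEGATORS.contains(token)) { negated = true; continue; }
            int s = POSITIVE.contains(token) ? 1 : NEGATIVE.contains(token) ? -1 : 0;
            score += negated ? -s : s;
            if (s != 0) negated = false;   // negation scope ends at the next sentiment word
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(score("The plot was not bad and the acting was great"));  // prints 2
    }
}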