Introduction to Natural Language Processing
Fall, 2015

Instructor: Jackie Chi Kit Cheung
Time: Tuesdays and Thursdays, 1pm – 2:30pm
Location: McConnell 103
Office hours: Tuesdays 2:30pm – 3:30pm in MC 108N
Course outline

TA: Priya Sidhaye

This course presents an introduction to the computational modelling of natural language. Topics covered include: computational morphology, language modelling, syntactic parsing, lexical and compositional semantics, and discourse analysis. We will consider selected applications such as automatic summarization, machine translation, and speech processing. We will also study machine learning algorithms that are used in natural language processing.

Prerequisites: Knowledge of probability and statistics (e.g., MATH 323 or ECSE 305); algorithms (COMP 251 or COMP 252); programming experience.
Useful but not required: Background in artificial intelligence (e.g., COMP 424); introductory course in linguistics (LING 201).

Instructor permission is required to register. To obtain this permission, send me an e-mail stating how you meet the prerequisites listed above, along with your McGill ID, and whether you are an undergraduate or graduate student.

Announcements

• Office hours on Oct 27 will be shortened to 2:30pm-3pm, and will take place in MC 103.

• There was a mistake in Assignment 1 Question 3 and in lecture4.pdf for the definition of Good-Turing smoothing. These have now been fixed.

• I'm holding make-up office hours on Sept 23, 3pm-4pm in MC 108N.

• Office hours are cancelled on Sept 15 and 22. E-mail me to make an appointment if you need to meet!

• The Schulich Library is offering a workshop on Library Research Methods for Computer Science Topics. You are encouraged to attend!
Description: In this hands-on workshop, you will learn to: (1) efficiently use relevant library resources to search for a variety of research material on a computer science research topic, (2) manage the references that you gather throughout the research by using EndNote, a citation management program (freely available for McGill students), and (3) address common questions about writing and citing.
When: Thursday, October 1st, from 3:00 to 4:30 pm
Where: Schulich Library room 313

Lectures and Readings

Week | Lectures | Readings
1 | Sept 8 - Intro to NLP | J&M Ch. 1
  | Sept 10 - Morphology, FSAs, and FSTs | J&M Ch. 2, 3.1–3.8
2 | Sept 15 - N-grams and language modelling | J&M Ch. 4.1–4.4
  | Sept 17 - Model complexity and smoothing | J&M Ch. 4.5, Sections 2–2.1 and 3–3.1 of this set of notes by Kevin Murphy
3 | Sept 22 - Python tutorial, intro to summarization (given by Priya Sidhaye) | scikit-learn
  | Sept 24 - Basic machine learning and text classification |
4 | Sept 29 - POS tagging and HMMs: intro | J&M Ch. 5.1–5.3
  | Oct 1 - POS tagging and HMMs: algorithms (updated to fix errors in equations) | J&M Ch. 5.5, 6.1–6.5
5 | Oct 6 - Linear-chain CRFs | Tutorial by Sutton and McCallum, Sections 1, 2 up to 2.3, 3 up to 3.1
  | Oct 8 - English syntax and CFGs | J&M Ch. 12
6 | Oct 13 - The CKY algorithm and PCFGs | J&M Ch. 13–13.4
  | Oct 15 - Advanced techniques for training PCFGs | J&M Ch. 14–14.5; Klein and Manning, 2003; Petrov et al., 2006
7 | Oct 20 - Lexical semantics: intro and WSD | J&M Ch. 19, 20–20.5; WordNet; Yarowsky, 1995
  | Oct 22 - More lexical semantics | J&M rest of Ch. 20; FrameNet
8 | Oct 27 - Compositional semantics: FOL and lambda calculus | J&M Ch. 17–17.3
  | Oct 29 - Definite descriptions, quantifiers, Cooper storage | J&M Ch. 18
9 | Nov 3 - Coreference resolution | J&M Ch. 21.3–21.8
  | Nov 5 - Discourse coherence and cohesion | J&M Ch. 21.1, 21.2; Barzilay and Lapata, 2008
10 | Nov 10 - Midterm |
   | Nov 12 - Automatic summarization | J&M Ch. 23.3–23.7
11 | Nov 17 - Natural language generation |
   | Nov 19 - Machine translation |
12 | Nov 24 - Machine translation: EM and IBM Models 1 and 2 | Slides by Koehn, 2009
   | Nov 26 - The language acquisition debate | See slides
13 | Dec 1 - Guest lecture by Xiaodan Zhu: Sentiment analysis of social media texts |
   | Dec 3 - Evaluation in NLP and AI: the Turing test and Winograd Schema Challenge | Levesque, 2013

Acknowledgements: Portions of the course slides are based on material from a similar course by Frank Rudzicz at the University of Toronto.

Assignments

Assignment 1 - due on Sept 29 at the start of class (1pm)
Data for question 4 - password given in class
Starter code for Q4

Assignment 2 - due on Oct 20 at the start of class (1pm)

Assignment 3 - due on Nov 17 at the start of class (1pm)
Starter code and data for question 2
Mitchell and Lapata, 2008
Please submit the hard copy of the response to the paper separately from the rest of the assignment.

Assignment 4 - due on Nov 26 at the start of class (1pm)
Langkilde and Knight, 1998
Please submit the hard copy of the response to the paper separately from the rest of the assignment.

Midterm

The midterm will be held in class on Tuesday, Nov 10. It will cover everything up to the end of week 9. The best way to study is to review the course slides, in-class exercises, and the assignments. Here is an additional list of optional exercises from the textbook.

Midterm
Solutions

Final Project

Project description