Introduction to Natural Language Processing
Fall, 2016
Instructor: Jackie Chi Kit Cheung
Time: Mondays and Wednesdays, 1pm – 2:30pm
Location: McConnell 103
Office hours: Wednesdays, 2:30pm – 3:30pm in McConnell 108N
Course outline
TA: Jad Kabbara
This course presents an introduction to the computational modelling of natural language. Topics covered include: computational morphology, language modelling, syntactic parsing, lexical and compositional semantics, and discourse analysis. We will consider selected applications such as automatic summarization, machine translation, and speech processing. We will also study machine learning algorithms that are used in natural language processing.
Prerequisites: Knowledge of probability and statistics (e.g., MATH 323 or ECSE 305); algorithms (COMP 251 or COMP 252); programming experience.
Useful but not required: Background in artificial intelligence (e.g., COMP 424); introductory course in linguistics (LING 201).
Instructor permission is required to register. To obtain this permission, send me an e-mail stating (1) how you meet the prerequisites listed above, (2) your McGill ID, and (3) whether you are an undergraduate or graduate student. I will process these requests at the beginning of each month, and ask the administrator to add permissions for you to register.
Announcements
- Instructions for the final presentation have been posted.
- Class and office hours on Nov 2 are cancelled. Make-up office hours will be held on Tuesday, November 8, from 2:30pm – 4pm in MC108N.
- The course is full! You can still send me an e-mail to try to register, and you are welcome to attend lectures until you (hopefully) get a spot, provided that the room has capacity.
Lectures and Readings
Date | Topic | Readings |
---|---|---|
Sept 2 | Introduction to Natural Language Processing | J&M Ch 1 (both 1st ed and 2nd ed) |
Sept 5 | Labour Day – No class | |
Sept 7 | Morphology, FSAs and FSTs – Updated Sept 10 | J&M Ch 2.2, Ch 3 (both 1st ed and 2nd ed) |
Sept 12 | Language modelling | J&M Ch 6.1, 6.2 (1st ed); J&M Ch 4.1 – 4.4 (2nd ed) |
Sept 14 | Smoothing and model complexity | J&M Ch 6.3 (1st ed); J&M Ch 4.5 (2nd ed); Notes by Kevin Murphy |
Sept 19 | Feature extraction and classification | |
Sept 21 | Support Vector Machines, Python intro, given by Jad Kabbara | |
Sept 26 | Part of Speech Tagging: Markov chains and hidden Markov models | J&M Ch. 8.1–8.3 (1st ed); J&M Ch. 5.1–5.3 (2nd ed) |
Sept 28 | Part of Speech Tagging: Algorithms | J&M Ch. 7.2–7.3, 8.5 (1st ed); J&M Ch. 5.5, 6.1–6.5 (2nd ed) |
Oct 3 | Linear-Chain Conditional Random Fields | Tutorial by Sutton and McCallum. Sections 1, 2–2.3, 3–3.1 |
Oct 5 | Introduction to Context-Free Grammars | J&M Ch. 9 (1st ed); J&M Ch. 12 (2nd ed) |
Oct 10 | Thanksgiving – No class | |
Oct 12 | Parsing with the CYK Algorithm | J&M Ch. 10, 12, especially 12.1 (1st ed); J&M Ch. 13, 14, especially 14.2 (2nd ed) |
Oct 17 | Topics in parsing: Context, dependency parsing | Klein and Manning, 2003; Petrov et al., 2006; Eisner, 1997; Ryan McDonald's thesis, pp 34 – 37 (contains a clearer description of the Eisner algorithm) |
Oct 19 | Lexical semantics | J&M Ch. 16, 17–17.3 (1st ed); J&M Ch. 19, 20–20.5 (2nd ed); WordNet; Yarowsky, 1995 |
Oct 24 | Lexical semantics 2 | J&M Rest of Ch. 17 (1st ed); J&M Rest of Ch. 20 (2nd ed); FrameNet |
Oct 26 | Compositional semantics | J&M Ch. 14 – 14.3 (1st ed); J&M Ch. 17 – 17.4 (2nd ed) |
Oct 31 | Compositional semantics: quantifiers and underspecification | J&M Ch. 15 (1st ed); J&M Ch. 18 (2nd ed) |
Nov 2 | No class | |
Nov 7 | Coreference resolution | J&M Ch. 18.1 (1st ed); J&M Ch. 21.3–21.8 (2nd ed) |
Nov 9 | Midterm | |
Nov 14 | Discourse coherence | J&M Ch. 18.2, 18.3 (1st ed); J&M Ch. 21.1, 21.2 (2nd ed); Barzilay and Lapata, 2008 |
Nov 16 | Automatic summarization | J&M Ch. 23.3–23.7 (2nd ed); Survey by Nenkova and McKeown, 2011, Chapters 1 and 6 |
Nov 21 | Natural language generation | |
Nov 23 | Machine translation | |
Nov 28 | Machine translation: EM and IBM Models 1 and 2 | Slides by Koehn, 2009 |
Nov 30 | Neural networks for NLP | Primer by Goldberg, 2015 |
Dec 5 | Evaluation issues in NLP | |
Assignments
Assignment 1
Due Wednesday, September 28 at 1:05pm
Assignment 2
Due Wednesday, October 19 at 1:05pm
Assignment 3
Due Wednesday, November 16 at 1:05pm
- Handout
- Data files for Q2
- Reading for Q3 — Please hang on to a hard copy of Q3 to submit after the in-class discussion on November 16th.
Assignment 4
Due Monday, December 5 at 1:05pm
Please submit the reading assignment separately from the rest of the assignment.
- Handout
- Reading for Q2 — Please hang on to a hard copy of Q2 to submit after the in-class discussion on December 5th.
Midterm
The midterm will take place in class on Wednesday, November 9, 2016. It will be closed book, and no electronic aids will be allowed. It will cover everything up to and including the class on Monday, November 7, 2016.
- Practice problems from the textbook.
- Last year's midterm
- Solutions
Final Project
See here for a description of the requirements of the final project. The project proposal is due on Wednesday, October 26.
Final Presentation
Here are instructions for the final oral examination/presentation.