A graduate course on advanced techniques to automatically interpret large amounts of structured, semi-structured, and unstructured data, infer useful knowledge from it, and present this knowledge to users in a convenient form, with a special focus on applications to software engineering problems.
Offered by Martin Robillard in the McGill School of Computer Science in Winter 2012 (4 credits). Mondays and Wednesdays 1:00-2:30 in MC103.
Software developers, like many other types of knowledge workers, have sophisticated information needs, while at the same time being overwhelmed with information. For example, searching for "How do I send an email with Java" finds 143,000,000 related documents including articles, forum posts, mailing list archives, etc. This could be too much information. Recommendation Systems are tools that help users navigate large information spaces by providing guidance and assistance in the form of recommendations, or pieces of information estimated to be relevant in the context of a given task. Recommendation Systems for Software Engineering, or RSSEs, provide recommendations in highly technical contexts where analyses of structured data (such as source code) must often complement traditional data mining techniques. For a more detailed overview of RSSEs, read this IEEE Software article, and check out this website.
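To make the idea of a recommendation concrete, here is a minimal sketch of a content-based recommender that ranks a few documents by how much vocabulary they share with a task context. It is only an illustration under stated assumptions: the toy corpus, the function names, and the choice of Jaccard similarity over word sets are all hypothetical and are not taken from the course material or from any particular RSSE.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(task_context, documents, top_n=2):
    """Rank documents by token overlap with the task context.

    Returns up to top_n (title, score) pairs with a non-zero score,
    highest score first (ties broken alphabetically by title).
    """
    context_tokens = tokenize(task_context)
    scored = [
        (title, jaccard(context_tokens, tokenize(body)))
        for title, body in documents.items()
    ]
    scored = [(title, score) for title, score in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Hypothetical toy corpus standing in for a large information space.
DOCUMENTS = {
    "JavaMail tutorial": "send an email with Java using the JavaMail API",
    "Swing layout guide": "arrange buttons in a Java Swing panel",
    "Python SMTP recipe": "send an email with Python smtplib",
}

if __name__ == "__main__":
    for title, score in recommend("How do I send an email with Java", DOCUMENTS):
        print(f"{score:.2f}  {title}")
```

A practical RSSE would go well beyond word overlap, for example by analyzing the structure of source code, but the shape is the same: take a context, estimate relevance, and return a short ranked list.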
The course will cover topics in three major areas:
The course will combine "roadmap" lectures and invited lectures on selected topics, student presentations and discussions of research papers, and "work-in-progress" project presentations. It includes a major project: the development of a prototype RSSE. The final grade will take into account the project (50%), class participation (30%), and a take-home exam (20%). The course will be based on the book "Recommender Systems: An Introduction" by Jannach et al., 2010, and on selected scientific papers.
Official Academic Integrity Statement: McGill University values academic integrity. Therefore all students must understand the meaning and consequences of cheating, plagiarism and other academic offenses under the Code of Student Conduct and Disciplinary Procedures (see www.mcgill.ca/students/srr for more information).
Language Policy: In accord with McGill University’s Charter of Students’ Rights, students in this course have the right to submit in English or in French any written work that is to be graded.
The course project is to develop a prototype RSSE. You can choose whatever application and technique you like, as long as it involves the analysis of software engineering artifacts. Although you will be expected to develop a complete and functional RSSE, you are encouraged to focus on a specific aspect that corresponds to your research area of interest (e.g., mining algorithms, data preprocessing, UIs, etc.).
At the same time as you develop the technical aspects of your project, you will write a report on it using ACM's conference formatting guidelines. There are three milestones:
Details on the format of the reports and presentation, and general guidelines and advice, will be provided in class.
The take-home exam is a one-page essay answering a synthesis question, to be completed on your own within a 24-hour period at some point after the project demos.
This schedule is subject to change. Seminar articles are in bold.
Date | Class Topics | Reading |
---|---|---|
Mon 9 Jan | Introduction to software engineering research. Roadmap: Recommendation systems. Overview of the project. | [RWZ2010] |
Wed 11 Jan | Roadmap: Data mining software repositories | [XTL2009] |
Mon 16 Jan | Seminar: Early Systems: CodeBroker and ExpertiseBrowser | [YF2002] [MH2002] |
Wed 18 Jan | Seminar: Recommendations for the web: tags and shortcuts | [LM2010] [BCC2009] |
Mon 23 Jan | Seminar: Applications of content-based recommendations: features and bug reports | [AHM2006] [DGH2011] |
Wed 25 Jan | Seminar: Code comprehension: reuse and debugging | [HRR2009] [AJL2009] |
Mon 30 Jan | Project proposals | |
Wed 1 Feb | Seminar: Mining code usage | [LZ2005] [BMM2009] |
Mon 6 Feb | Seminar: Finding code examples | [SC2006] [BOL2010] |
Wed 8 Feb | Seminar: Synthesizing code examples | [MXB2005] [DR2011] |
Mon 13 Feb | Invited Lecture: Partial program analysis and the SemDiff recommender | [DR2008] |
Wed 15 Feb | Invited Lecture: Mining user interaction data | [YR2011] |
Mon 20 Feb | No class - Study break | |
Wed 22 Feb | No class - Study break | |
Mon 27 Feb | Work in progress presentations | |
Wed 29 Feb | Seminar: Specification Mining | [ABL2002] [GS2009] |
Mon 5 Mar | Seminar: API property inference | [ZZX2009] [HST2010] |
Wed 7 Mar | Seminar: Code Quality | [ECH2001] [KR2009] |
Mon 12 Mar | Seminar: Bug prediction | [SZW2007] [BMN2011] |
Wed 14 Mar | Seminar: Software Evolution | [ZWD2004] [KN2009] |
Mon 19 Mar | Roadmap: Metrics and evaluation | [RRS2009] Chapter 8 [JZF2010] Chapter 7 |
Wed 21 Mar | Seminar: Personalization | [FYW2004] [TDH2005] |
Mon 26 Mar | Seminar: Interaction traces | [PG2006] [FOM2010] |
Wed 28 Mar | Seminar: User interfaces | [KRW2011] [SS2011] |
Mon 2 Apr | Seminar: Explanation | [HKR2000] [VSR2009] |
Wed 4 Apr | Roundtable: Privacy Issues in Recommender Systems | Selected by students |
Mon 9 Apr | No class - Easter Monday | |
Wed 11 Apr | Project presentations | |
Mon 16 Apr | Project presentations | |
Sources not explicitly discussed as part of the seminar, but that will provide useful additional background on the course in general, or on specific topics.
This course draws inspiration from many sources, and in particular: discussions on RSSEs with the co-organizers of the RSSE workshop (Walid Maalej, Rob Walker, and Tom Zimmermann), joint work on API property inference with Mira Mezini and Eric Bodden at TU Darmstadt, Ahmed Hassan's course on Mining Software Engineering Data, and the exciting work of my graduate students.