Machine Learning (COMP-652)
Fall 2009

Project

Goal

The main goal of the project for this course is to encourage you to experiment with some of the machine learning methods that we discussed in class, in the context of problems that are of interest to you. At this point in the course, everyone should have chosen a project.

Requirements

You should turn in a project report and prepare a project presentation.

The report is due by Wednesday, December 16, at 11:59pm. You should e-mail the report, in pdf format, directly to Doina (not to the usual homework e-mail address).

The report should contain the following components:

  1. An introduction/motivation section, in which you explain why the problem you are about to address is interesting and challenging
  2. A section describing the basic approach that you will take. You should assume that algorithms we discussed in class (e.g., SVMs, HMMs, etc.) are known, but summarize any algorithms that were not discussed in class (e.g., conditional random fields, multi-dimensional scaling, etc.)
  3. A section describing the experimental setup. Here you describe your data set, along with any pre-processing steps that you might have taken (e.g. to remove noise, select attributes etc.). You should describe this in enough detail that someone with access to your data could exactly reproduce the results
  4. A section describing your results, along with a discussion of what you observed. It is important to perform your experiments in such a way that the results are meaningful (e.g., make sure you use cross-validation and report test set results). If you use statistical testing to assess the significance of your results, make sure that the test you choose is appropriate for your data. If appropriate, report running time in addition to performance
  5. A section containing conclusions and possible future work directions.
  6. A section of references
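
As an illustration of the evaluation protocol mentioned in item 4, the sketch below selects a hyperparameter by cross-validation on a development set and then reports accuracy once on a held-out test set. Everything in it is an assumption made for illustration: the synthetic one-dimensional data, the k-nearest-neighbour classifier, the candidate values of k, and all function names. It is a minimal example of the idea, not a required procedure for your project.

```python
import random

def k_fold_indices(n, n_folds, rng):
    """Shuffle range(n) and split it into n_folds nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D inputs)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def accuracy(train, held_out, k):
    """Fraction of held-out points classified correctly using `train`."""
    correct = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return correct / len(held_out)

def cross_val_accuracy(data, k, n_folds, rng):
    """Average validation accuracy over n_folds train/validation splits."""
    scores = []
    for fold in k_fold_indices(len(data), n_folds, rng):
        fold_set = set(fold)
        val = [data[i] for i in fold]
        train = [p for i, p in enumerate(data) if i not in fold_set]
        scores.append(accuracy(train, val, k))
    return sum(scores) / len(scores)

def make_data(n, rng):
    """Synthetic data (assumption): class 1 ~ N(1, 1), class 0 ~ N(-1, 1)."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = rng.gauss(1.0 if y else -1.0, 1.0)
        data.append((x, y))
    return data

rng = random.Random(0)  # fixed seed so the experiment can be reproduced
data = make_data(200, rng)

# Set the test set aside first; it is never used for model selection.
test_set, dev_set = data[:50], data[50:]

# Choose k using 5-fold cross-validation on the development set only.
candidates = [1, 5, 15]
cv_scores = {k: cross_val_accuracy(dev_set, k, 5, rng) for k in candidates}
best_k = max(candidates, key=lambda k: cv_scores[k])

# Report test accuracy once, for the selected k.
test_acc = accuracy(dev_set, test_set, best_k)
print("cross-validation accuracy per k:", cv_scores)
print("selected k:", best_k, "test accuracy:", test_acc)
```

The point of the design is that the test set plays no part in choosing k, so the final number is not biased by the selection step. Fixing the random seed also helps with the reproducibility asked for in item 3.
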
There is no set format for the report; use your favorite. There are also no fixed length requirements: just make sure that you give the right amount of information for someone to understand what you did. In the past, reports have averaged 8-10 pages in single-column format, but use this number only as a guideline. Note that many of you are working with very interesting and challenging data. As a result, the goal of the grading will be to assess your competence in applying machine learning methods to a specific, large problem, not to look for a set level of performance.

The presentation should be 8 minutes long, with 2 minutes allowed for questions. This means that you should present at most 8 slides, including your title slide. For such a short presentation, you usually do not need an outline slide. You should motivate your topic, present the general approach, the results and a short discussion, and have a conclusion/future work slide. Please e-mail the presentation to Doina along with your report. The presentation should be in pdf format as well. Even if you are not available on the presentation day, you should still e-mail a set of slides.