
Machine Learning Project

Goal : The purpose of this project is to learn to use a real-world machine learning library of your choice and apply it to data that interests you. Unlike data mining, where the goal is often just to explore the data and look for patterns, this project should focus on determining whether and how a set of features can be used to predict another feature (this assumes you’re doing supervised learning, though unsupervised learning is possible as well).

Guidelines : 1-3 people per group.

Libraries : You may use any modern machine learning library. Some of the ones I suggest are:

  • TensorFlow (mostly neural nets)
  • OpenCV (computer vision + ML algorithms)
  • Keras (neural nets)
  • PyTorch (mostly neural nets)
  • scikit-learn (used in class, lots of algorithms)

The project is extremely open-ended. It should consist of the following:

  • Find or collect a data set of interest. There are many sources on the web for data sets. I would prefer the data set to be reasonably large, but really large data sets can bog down computers. A lower limit for data size is 100 training examples, though in special circumstances you might get away with fewer (run it by me).
  • Formulate at least two questions you would like to answer from your data, in the form of predicting some variable from other variables.
  • Using the machine learning library, train at least two machine learning models per question, for a total of four models trained.
  • Evaluate how well each model does. Keep a testing set separate from the training set, and report how well your models perform.
  • What conclusions can you draw?
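
The steps above can be sketched for a single question as follows. This is only an illustration, assuming scikit-learn and one of its bundled data sets (`load_breast_cancer`) as a stand-in for your own data; the two model choices, the 25% test split, and accuracy as the metric are assumptions made for the example, not requirements of the project.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data (assumption): swap in the data set you chose.
X, y = load_breast_cancer(return_X_y=True)

# Hold out a testing set; stratify keeps class proportions similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Two models for the same question (illustrative choices).
models = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training set, then score it on both splits.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "train": accuracy_score(y_train, model.predict(X_train)),
        "test": accuracy_score(y_test, model.predict(X_test)),
    }

for name, scores in results.items():
    print(f"{name}: train accuracy = {scores['train']:.3f}, "
          f"test accuracy = {scores['test']:.3f}")
```

Comparing the training and testing scores side by side is one way to start drawing conclusions: a large gap between them suggests overfitting. A second question would repeat the same pattern with a different target variable.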

Data

There are lots of data sets available online. Pick something that you will enjoy working on, and something where there is a rich source of data available. Take some time in selecting a good data set - feel free to ask me for suggestions.

Proposal (due on Canvas at 11:59pm, April 5)

Your ML project proposal should be 1 page and contain the following elements:

  • List of group members.
  • What data set is being used - where does the data come from, and what are some characteristics of it (number of features, number of training examples, types of attributes).
  • Is there a reason you picked this data set? Tell me.
  • What are the question(s) of interest? Be specific. Tell me the variables you’re going to predict, and which variables you will use as your input features.
  • What machine learning algorithms do you plan to use - understanding that this might change.
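
The data set characteristics asked for above (number of features, number of examples, attribute types) can usually be read off in a few lines. A minimal sketch, assuming scikit-learn and NumPy and using the bundled `load_wine` data set purely as a placeholder for your own data:

```python
import numpy as np
from sklearn.datasets import load_wine

# Placeholder data (assumption): replace with your own data set.
data = load_wine()
X, y = data.data, data.target

# Rows are examples, columns are features.
n_examples, n_features = X.shape

# Count how many examples fall into each class of the target variable.
classes, counts = np.unique(y, return_counts=True)

print(f"examples: {n_examples}, features: {n_features}")
print(f"feature dtype: {X.dtype}")
print("first feature names:", list(data.feature_names[:3]))
for c, n in zip(classes, counts):
    print(f"class {data.target_names[c]}: {n} examples")
```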

While I’d like you to think through your plan carefully, please understand that this is a proposal, and nothing you write is set in stone.