In IEOR 142: Introduction to Machine Learning and Data Analytics, Professor Paul Grigas uses real datasets to teach students key techniques in machine learning and data analytics. The class culminates in a final project in which students identify a real-world problem, collect and process data, and then apply models and concepts taught in class to solve the problem.
1. NBA Sports Betting
IEOR Students: Bennett Cohen, Christopher Landgrebe, Arman Vaziri, Leonardo Biral, Calvin Shuester
As one of the world’s most popular, widely viewed sports leagues, the National Basketball Association (NBA) also boasts a sports betting market of over $500 million, with bets placed on every game throughout the NBA’s 82-game season. In this project, students explored how to use machine learning techniques to make more accurate sports betting predictions and maximize profits. By collecting data from the NBA’s own stats website (www.nba.com/stats/) and one of the most popular sports betting websites in the world (www.sportsbookreviewsonline.com), students constructed features with predictive power and then implemented various machine learning techniques to predict the outcomes of NBA games. By the end of the project, students could demonstrate an appreciable increase in betting accuracy with their models.
2. Song Popularity
IEOR Students: Air Saengthongsrikamol, Ankita Kini, Henry Cheong, Grace Qiu, Vaibhav Gettani
What makes a song a hit? This student project explores the world of music by using machine learning to predict whether a song will make it into the Billboard Top 100 list. Drawing on data collected for over 18,000 Billboard Top 100 songs, students cleaned and processed the data and engineered features that included lyric sentiment, danceability, genre, loudness, Twitter data, and more. Students used different machine learning models and validation techniques to test the predictive power of song features. Gradient boosting classification provided the most accurate results, showing the most predictive song features to be ‘follower count’, ‘loudness’, and ‘pop genre’. The project could prove useful for budding artists, who can use the prediction engine to try out music with different features and predict its potential on the market.
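The article does not describe the students' implementation, so the following is only a minimal sketch of how gradient boosting classification can rank features by importance. Everything in it is an assumption made for illustration: the data is synthetic, the feature names are borrowed from the article purely as labels, `noise` is a hypothetical irrelevant feature, and the from-scratch booster (one-split decision stumps fitted to log-loss residuals, with importance measured as each feature's share of the total squared-error reduction) stands in for whatever library the students used.

```python
import math
import random

# Hypothetical feature labels; the values below are synthetic, not Billboard data.
FEATURES = ["follower_count", "loudness", "is_pop", "noise"]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Find the one-feature threshold split that best fits the residuals (least squares)."""
    mean_all = sum(residuals) / len(residuals)
    base_sse = sum((r - mean_all) ** 2 for r in residuals)
    best = None  # (gain, feature index, threshold, left value, right value)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            gain = base_sse - sse
            if best is None or gain > best[0]:
                best = (gain, j, t, lm, rm)
    return best

def gradient_boost(X, y, n_rounds=15, learning_rate=0.3):
    """Boost stumps on the negative gradient of the log-loss; return stumps and importances."""
    scores = [0.0] * len(X)            # current log-odds prediction for each row
    importance = [0.0] * len(X[0])     # accumulated squared-error reduction per feature
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        gain, j, t, lv, rv = fit_stump(X, residuals)
        lv, rv = learning_rate * lv, learning_rate * rv
        stumps.append((j, t, lv, rv))
        importance[j] += gain
        scores = [s + (lv if row[j] <= t else rv) for row, s in zip(X, scores)]
    total = sum(importance) or 1.0
    return stumps, [g / total for g in importance]

def predict(stumps, row):
    """Return 1 (hit) if the summed log-odds are positive, else 0."""
    score = sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps)
    return 1 if score > 0 else 0

# Synthetic songs: the label depends mostly on follower_count by construction.
random.seed(0)
X, y = [], []
for _ in range(100):
    followers, loudness = random.random(), random.random()
    is_pop, noise = random.choice([0, 1]), random.random()
    signal = 2.0 * followers + 0.5 * loudness + 0.3 * is_pop + random.gauss(0, 0.2)
    X.append([followers, loudness, is_pop, noise])
    y.append(1 if signal > 1.4 else 0)

stumps, importances = gradient_boost(X, y)
train_accuracy = sum(predict(stumps, row) == label for row, label in zip(X, y)) / len(y)

for name, share in sorted(zip(FEATURES, importances), key=lambda p: -p[1]):
    print(f"{name}: {share:.2f}")
print(f"training accuracy: {train_accuracy:.2f}")
```

The ranking here reflects only how the synthetic labels were generated. On real data, importances would normally be read alongside held-out validation scores, in line with the validation techniques the article mentions.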
3. Rent Housing Costs Across the U.S.
IEOR Students: Vinson Chiu, Jianing Yu, Zijun Zhang, Felicia Xu, and Sean Tsung
In this project, students used machine learning to explore factors that can predict rent housing costs in different cities across the United States. Students used multiple data sources, combining data about individual properties, such as number of rooms, with regional factors like crime rates, proximity to public schools, and county population. The students engineered nine features from the data set and conducted modeling with multiple machine learning techniques. Their results indicated that median cost of rent in a city, the area of the house, and number of public schools in the neighborhood were features that impact rent housing costs most significantly.
4. Using Machine Learning to Battle COVID-19 Falsehoods
IEOR Students: Ahmet Turunc, Catherine Le, Pranav Viswanathan, Wako Morimoto
In the United States, the COVID-19 pandemic has been accompanied by a proliferation of misleading and false information, from fake cures and fatal preventive measures to conspiracy theories about the virus’s origin, with much of the misinformation published on social media platforms like Twitter and Facebook. In this project, students respond to the coronavirus “infodemic” by creating machine learning classification models that predict whether a given short text post contains COVID-19-related falsehoods. Students were able to engineer multiple high-performing models, which can be used in efforts to identify and minimize the spread of COVID-19 misinformation and ultimately help protect public health.
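The article does not say which models the students built. As one illustration of what classifying a short text post can look like, here is a minimal sketch of a bag-of-words Naive Bayes classifier with add-one smoothing. The four training posts, the labels `falsehood` and `reliable`, and all function names are invented for this example and are not the students' data or code.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase the post and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def train_naive_bayes(posts, labels):
    """Count words per label; return the counts, label frequencies, and vocabulary."""
    word_counts = {label: Counter() for label in set(labels)}
    label_counts = Counter(labels)
    for text, label in zip(posts, labels):
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab

def classify(model, text):
    """Return the label with the highest log-probability under the model."""
    word_counts, label_counts, vocab = model
    total_posts = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_posts)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy posts, for illustration only.
posts = [
    "miracle tea cures the virus overnight",
    "the virus is a hoax spread by phone towers",
    "wash your hands and wear a mask indoors",
    "vaccines reduce the risk of severe illness",
]
labels = ["falsehood", "falsehood", "reliable", "reliable"]

model = train_naive_bayes(posts, labels)
print(classify(model, "this tea cures the virus"))
print(classify(model, "wear a mask and wash hands"))
```

A model of this kind is only as good as its labeled training posts. A realistic system would need far more data and held-out evaluation before being used to flag misinformation.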
5. Preparing for Future Pandemics
IEOR Students: Shreyas Hariharan, Dyllan Liu, Edrea Low, Shirley Wang
This student project explores county-level demographic and economic characteristics such as age, the percentage of the population without medical insurance, income per capita, and population size. The goal is to assess the extent to which each factor may affect a county’s ability to respond effectively to a pandemic and allocate resources. The project’s predictive engine, which determines the importance of economic and demographic county characteristics, is intended to help federal, state, and county-level decision-makers allocate resources efficiently during future public health emergencies.