Starts: June 29 - 12 noon ET
In this group, we discuss the basic algorithms and fundamentals of different ML techniques used to personalize services online.
Prior experience using Python is required for this project. We recommend that users are also familiar with scikit-learn, TensorFlow/PyTorch, Pandas and NumPy for ML-related modelling tasks. Linear algebra knowledge is essential as well.
Working Group Leads / Advisors
❗ Meeting Link: https://meet.jit.si/WGRecSys
❗ Meeting Time: 12-1 PM ET on Wednesdays (starting the 29th of June)
❗ Slack channel [Communication point]: click here to join
Project Overview
- Practice organizing and structuring big datasets for streamlined use in ML projects
- Train an ensemble of ML classifiers including support vector machines, random forests and collaborative filtering techniques for personalization services
- Examine model performance and robustness when tested against future transactions
- Learn how algorithms from fields such as graph neural networks (GNNs) and deep learning (DL) can be leveraged in personalization models
- Minimal Project Goal: Develop a recommender system on the open-source Yelp dataset: https://www.yelp.com/dataset/download
- Stretch Goal: Quantify how different models perform on the dataset when they use different parts of the input data, and build a personalization model that shows users the products they are genuinely interested in. Determine which offline metrics are appropriate for such a project, such as precision, recall, CTR and accuracy. Write a short article that examines this problem and shares your findings.
Tentative Project Timeline
# | Major Milestones | Expected time to finish |
1 | Get familiar with the project domain | 2-3 weeks |
2 | Download & process Yelp Review data | 1 week |
3 | Test different features and determine the right attributes | 1 week |
4 | Develop two ML classifiers | 4 weeks |
5 | Build appropriate offline metrics | 1 week |
6 | Summarize the differences between the models, identifying potential weaknesses and strengths | 1 week |
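
As a rough illustration of milestone 2 and of testing against future transactions, here is a minimal sketch using only the Python standard library. It assumes the review file is newline-delimited JSON and that each record carries `user_id`, `business_id`, `stars` and `date` fields, with dates as sortable `YYYY-MM-DD HH:MM:SS` strings; check these assumptions against the dataset documentation before relying on them. The function names `parse_reviews` and `temporal_split` and the sample records are hypothetical.

```python
import json


def parse_reviews(lines):
    """Parse newline-delimited JSON reviews, keeping only the fields we need."""
    keep = ("user_id", "business_id", "stars", "date")  # assumed field names
    reviews = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        reviews.append({key: record[key] for key in keep})
    return reviews


def temporal_split(reviews, test_fraction=0.2):
    """Sort by date and hold out the most recent reviews as the test set."""
    # Assumes dates are ISO-like strings, so string order matches time order.
    ordered = sorted(reviews, key=lambda r: r["date"])
    cut = int(len(ordered) * (1 - test_fraction))
    return ordered[:cut], ordered[cut:]


# Made-up records standing in for the real review file.
sample_lines = [
    '{"user_id": "u1", "business_id": "b1", "stars": 5, "date": "2019-01-03 10:00:00", "text": "great"}',
    '{"user_id": "u2", "business_id": "b1", "stars": 3, "date": "2021-06-11 09:30:00", "text": "ok"}',
    '{"user_id": "u1", "business_id": "b2", "stars": 4, "date": "2018-11-20 18:45:00", "text": "good"}',
    '{"user_id": "u3", "business_id": "b3", "stars": 2, "date": "2020-02-14 12:00:00", "text": "meh"}',
]

reviews = parse_reviews(sample_lines)
train, test = temporal_split(reviews, test_fraction=0.25)
```

For the real file, passing an open file handle to `parse_reviews` would stream it line by line. Splitting by date rather than at random keeps future interactions out of training, which is closer to how a deployed recommender is evaluated.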
Solution Architecture + Resources

All Resources / Recipes below have either been coded out or expert-checked for quality
Yelp Dataset: https://www.yelp.com/dataset/download
Pre-processing: The same for each paper. Skim to get some quick intuition.
Pre-Reqs:
- Convolutional Networks: TBA
- Graph Structures: TBA
- Collaborative Filtering: TBA
Neural Graph Collaborative Filtering (NGCF):
LightGCN:
Self Supervised Graph Learning for Recommendations (SGL):
Model comparison component:
Precision & Recall (more emphasis on Recall):
NDCG
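
To make the comparison metrics concrete, here is a minimal sketch of precision@k, recall@k and NDCG@k for a single user, assuming binary relevance (an item is either in the user's held-out set or not). The function names and the example item IDs are hypothetical, and the working group may settle on different definitions, for example graded relevance for NDCG.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    # A hit at 0-based position `rank` contributes 1 / log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked[:k])
        if item in relevant
    )
    # The ideal ranking places all relevant items first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Made-up example: model ranking and held-out items for one user.
ranked_items = ["b3", "b1", "b7", "b2", "b9"]
relevant_items = {"b1", "b2", "b5"}

p = precision_at_k(ranked_items, relevant_items, k=5)
r = recall_at_k(ranked_items, relevant_items, k=5)
n = ndcg_at_k(ranked_items, relevant_items, k=5)
```

In practice these per-user scores would be averaged over all test users. Recall and NDCG complement each other: recall ignores the order within the top k, while NDCG rewards placing relevant items nearer the top.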
We will write an article on what we learn as we compare the models above. The Core Team will be recognized as co-authors.
Why join?
Aggregate Intellect hosts one of the most diverse ML communities in the world. Over the course of the working group:
- You’ll get an immersion into that community & walk out with some cool new friends.
- Learn how to download and interact with large scale real data in Python
- Advance your ML skills by working on real world problems with classification algorithms of increasing sophistication
- Contribute to a study area (land cover classification) which has major impacts to resource management practices, wildlife habitat protection and in advancing our understanding of the Earth’s biophysical systems
Looking forward to meeting everyone in our study group! Please feel free to reach out if you have any questions about the planned project.