Data Science & Machine Learning

Data science and machine learning teaching materials portfolio.

| Topic | Slides | Videos | Project | Dataset | Tools/Libraries | Learning Goals |
|---|---|---|---|---|---|---|
| Logistic regression | PDF | Part I, Part II | GitHub | Banking marketing campaign dataset (48,895 records with customer demographics, financial history, and campaign outcomes) | Python, pandas, scikit-learn, matplotlib, seaborn, numpy | Binary classification, hyperparameter optimization with GridSearchCV, confusion matrix analysis, threshold tuning |
| Data preprocessing | PDF | Part I, Part II | GitHub | AirBnB NYC 2019 dataset (48,895 listings with price, location, room type, host info, review data) | Python, pandas, numpy, matplotlib, seaborn, scikit-learn, scipy | Data cleaning, statistical analysis, feature relationships with Chi-squared and Kruskal-Wallis tests, missing value imputation, categorical encoding, Box-Cox transformation |
| Linear regression | PDF | YouTube | GitHub | Medical insurance cost dataset (1,338 policyholders with demographics, BMI, smoking status, region) | Python, pandas, numpy, scikit-learn, matplotlib, seaborn | Linear relationships, least squares estimation, feature engineering, polynomial features, model evaluation metrics, class imbalance with over-sampling |
| Regularized linear regression | PDF | YouTube | GitHub | US county-level sociodemographic and health data (2018-2019) for morbidity prediction | Python, pandas, numpy, scikit-learn, matplotlib, seaborn | Ridge and Lasso regression (L2 and L1 regularization, respectively), overfitting prevention, hyperparameter tuning, polynomial feature engineering, bias-variance tradeoff |
| Decision trees & ensemble methods | PDF | Part I, Part II, Part III | GitHub | Diabetes physiology dataset (biomedical features from 768 patients with binary diabetes label) | Python, pandas, scikit-learn, matplotlib | Decision tree construction & pruning techniques, overfitting mitigation, ensemble methods, feature importance, tree visualization, hyperparameter optimization |
| Naive Bayes | PDF | YouTube | GitHub | Google Play Store app reviews dataset for sentiment analysis (positive/negative polarity) | Python, pandas, numpy, scikit-learn, NLTK, matplotlib, seaborn, scipy | Text preprocessing with lemmatization, comparison of multiple Naive Bayes variants, dimensionality reduction with PCA and Feature Agglomeration, cross-validation, NLP techniques |
| K-nearest neighbors | PDF | YouTube | GitHub | Red wine quality dataset (4,898 wine samples with chemical composition features and quality ratings from 0 to 10) | Python, pandas, numpy, scikit-learn, matplotlib | Distance metrics (Euclidean, Manhattan), k-value selection, nearest neighbor voting, model performance evaluation with classification/regression metrics, computational complexity considerations |
| K-means clustering | PDF | YouTube | GitHub | California housing dataset (20,640 records with geographic coordinates and median income) | Python, pandas, scikit-learn, numpy, matplotlib, seaborn, plotly | Unsupervised learning, clustering algorithms for market segmentation, geographic data visualization, supervised classification for cluster prediction, 2D and 3D visualization |
| Time series forecasting | PDF | YouTube | GitHub | Airline Passengers dataset from Seaborn (1949-1960 monthly passenger counts with seasonal patterns) | Python, pandas, numpy, matplotlib, seaborn, scikit-learn, pmdarima, statsmodels, scipy | Time series analysis, stationarity testing, baseline models, ARIMA modeling with auto_arima, TimeSeriesSplit validation, trend and seasonality analysis |
| Image classification | PDF | Part I, Part II | GitHub | Dogs vs Cats dataset from Kaggle competition (image classification with Kaggle API integration) | Python, TensorFlow/Keras, numpy, matplotlib, kaggle API, Inception-V3 | Convolutional neural networks, deep learning, image preprocessing, model training with GPU, hyperparameter optimization, fine-tuning, Kaggle API usage, binary image classification |
| Natural language processing | PDF | YouTube | GitHub | URL dataset for binary classification (spam detection) | Python, pandas, numpy, scikit-learn, matplotlib, seaborn, NLTK | Text preprocessing, tokenization, TF-IDF vectorization, NLP pipeline development, support vector machines/classifiers |
| Recommender systems | PDF | YouTube | GitHub | IMDB movie database (4,803 movies with text features such as description, genres, keywords, and cast names) | Python, pandas, scikit-learn, NLTK, matplotlib | Text preprocessing, tokenization, TF-IDF vectorization, NLP pipeline development, k-nearest neighbors |
| ML app deployment | PDF | YouTube | GitHub | Deployment of the movie recommender from the previous project | Gunicorn, Flask, Render | Refactoring, model serving, Flask applications, web services, cloud deployment |