How-to guides & instructions

  1. DevOps Guides Overview
  2. Plotting Overview
  3. Statistics Overview
  4. Data Wrangling Overview
  5. Feature Engineering Overview
  6. Regression Overview
  7. Classification Overview

Cheat sheets & syntax reference

  1. Jupyter notebooks
  2. VS Code (Windows)
  3. VS Code (macOS)
  4. Git
  5. NumPy
  6. Pandas

Data science library information

  1. NumPy: A core library for efficient numerical computations and multi-dimensional array operations in Python.
  2. Pandas: Provides high-level data structures (DataFrame, Series) and powerful tools for data manipulation and analysis.
  3. Matplotlib: A versatile plotting library for creating static, animated, and interactive visualizations in Python.
  4. Seaborn: A statistical data visualization library built on Matplotlib that provides attractive themes and higher-level plotting functions.
  5. SciPy: A collection of scientific computing tools built on NumPy for optimization, integration, signal processing, and more.
  6. Statsmodels: Offers classes and functions for estimating statistical models, conducting hypothesis tests, and performing data exploration.

  1. Further topics in data wrangling/data analysis
    • For an interesting alternative to Pandas, see Polars
    • For N-dimensional, labeled arrays, see Xarray
    • For parallel, distributed dataframes, see PySpark and Dask
    • For GPU-accelerated data analysis, see CuPy and RAPIDS
    • For data pipeline workflow management, see Luigi or Airflow
  2. Data visualization

Incremental capstone slides

Unit 2: Applied Data Science with Python

  1. Incremental capstone 1: import and clean data

Unit 3: Machine Learning

  1. Incremental capstone 5: Florida Bike Rentals