# Lesson 10 Activity: Working with Pandas

## Learning Objectives

By the end of this activity, you will be able to:
- Create Pandas Series and DataFrames
- Load data from CSV files
- Perform basic data exploration and analysis
- Calculate descriptive statistics
- Filter and manipulate DataFrame data

## Tips

- **Creating DataFrames:** Use `pd.DataFrame(dictionary)` where dictionary keys become column names
- **Loading CSV files:** Use `pd.read_csv('filename.csv')`
- **Basic exploration:** Use the methods `.head()`, `.tail()`, `.info()`, and `.describe()`, plus the `.shape` attribute (no parentheses)
- **Filtering data:** Use conditions like `df[df['column'] > value]`
- **Column selection:** Use `df['column_name']` or `df[['col1', 'col2']]`
- **Adding columns:** Use `df['new_column'] = calculation`
- **Statistics:** Use `.mean()`, `.max()`, `.min()`, `.sum()` methods

**Remember:** Take your time with each step and test your code frequently!
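
The tips above can be sketched on a small made-up table. The `fruit_df` name and its contents are assumptions chosen only for illustration; they are not one of the activity datasets, so adapt the pattern rather than copying it.

```python
import pandas as pd

# Hypothetical data for illustration only (not an activity dataset)
fruit_df = pd.DataFrame({
    "fruit": ["apple", "banana", "cherry", "date"],
    "price": [0.50, 0.25, 3.00, 4.50],
    "quantity": [10, 20, 5, 2],
})

# Basic exploration: .shape is an attribute, so it takes no parentheses
print(fruit_df.shape)    # (rows, columns)
print(fruit_df.head(2))  # first two rows

# Column selection: single brackets give a Series, double brackets a DataFrame
prices = fruit_df["price"]
subset = fruit_df[["fruit", "price"]]

# Filtering: the condition builds a boolean mask that keeps matching rows
expensive = fruit_df[fruit_df["price"] > 1.00]

# Adding a column from a calculation on an existing column
fruit_df["price_cents"] = fruit_df["price"] * 100

# Statistics on a single column
mean_price = fruit_df["price"].mean()
max_quantity = fruit_df["quantity"].max()
```

Running each statement in its own cell and inspecting the result is a good way to see what every step returns.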

In [1]:
import pandas as pd
import numpy as np

---
## Problem 1: Creating Your First DataFrame

**Scenario:** You're working at a bookstore and need to create a simple inventory system.

**Your Task:**
1. Create a DataFrame called `books_df` with the following data:
   - Book titles: ["Python Basics", "Data Science Handbook", "Web Development Guide"]
   - Authors: ["John Smith", "Jane Doe", "Mike Johnson"]
   - Prices: [29.99, 45.50, 35.00]
   - Stock: [15, 8, 12]

2. Display the DataFrame
3. Print the shape of the DataFrame
4. Display basic information about the DataFrame using `.info()`

In [None]:
# Step 1: Create the DataFrame


In [None]:
# Step 2: Display the DataFrame


In [None]:
# Step 3: Print the shape


In [None]:
# Step 4: Display info


---
## Problem 2: Loading and Exploring Student Data

**Scenario:** You're a teacher analyzing student performance data.

**Your Task:**
1. Load the `students.csv` file into a DataFrame called `students_df`
2. Display the first 3 rows using `.head()`
3. Display the last 2 rows using `.tail()`
4. Show descriptive statistics for numerical columns using `.describe()`
5. Find the average grade of all students

The student data file is available for download here: [students.csv](https://gperdrizet.github.io/FSA_devops/assets/data/unit2/students.csv)
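
As a rough sketch of the load-and-explore workflow, the example below reads CSV text from memory instead of a file so that it runs anywhere. The `cities_df` name, the column names, and the values are all hypothetical; the real `students.csv` will have its own columns.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = """city,population
Springfield,30000
Shelbyville,25000
Ogdenville,12000
"""

# read_csv accepts a file path, a URL, or a file-like object such as StringIO
cities_df = pd.read_csv(io.StringIO(csv_text))

print(cities_df.head(2))     # first 2 rows
print(cities_df.tail(1))     # last row
print(cities_df.describe())  # summary statistics for numeric columns

# Average of a single numeric column
mean_population = cities_df["population"].mean()
print(mean_population)
```

With a downloaded file, the same call takes the file name instead, as in `pd.read_csv('filename.csv')`.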

In [None]:
# Step 1: Load the CSV file


In [None]:
# Step 2: Display first 3 rows


In [None]:
# Step 3: Display last 2 rows


In [None]:
# Step 4: Show descriptive statistics


In [None]:
# Step 5: Calculate average grade


---
## Problem 3: Data Filtering and Selection

**Scenario:** Continue working with the student data to find specific information.

**Your Task:**
1. Display only the 'name' and 'grade' columns from `students_df`
2. Find all students who scored above 85
3. Find all students studying 'Math'
4. Find the highest grade in the dataset
5. Count how many students are in each subject
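
Two patterns in this problem go slightly beyond the Tips section: filtering on a text value uses `==`, and counting rows per category is commonly done with `.value_counts()`. The sketch below uses a hypothetical `pets_df` table, not the student data.

```python
import pandas as pd

# Hypothetical data for illustration only
pets_df = pd.DataFrame({
    "name": ["Rex", "Tom", "Fido", "Luna", "Kiwi"],
    "species": ["dog", "cat", "dog", "cat", "bird"],
    "age": [3, 5, 7, 2, 1],
})

# Filter rows where a text column equals a given value
dogs = pets_df[pets_df["species"] == "dog"]

# Count how many rows fall into each category
species_counts = pets_df["species"].value_counts()

# Largest value in a numeric column
oldest_age = pets_df["age"].max()

print(dogs)
print(species_counts)
print(oldest_age)
```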

In [None]:
# Step 1: Display only name and grade columns


In [None]:
# Step 2: Students with grades above 85


In [None]:
# Step 3: Students studying Math


In [None]:
# Step 4: Highest grade


In [None]:
# Step 5: Count students by subject


---
## Problem 4: Sales Data Analysis

**Scenario:** You're analyzing sales data for an electronics store.

**Your Task:**
1. Load the `sales.csv` file into a DataFrame called `sales_df`
2. Calculate the total value for each product (price × quantity)
3. Add this as a new column called 'total_value' to the DataFrame
4. Find the product with the highest total value
5. Calculate the grand total of all sales

The sales data file is available for download here: [sales.csv](https://gperdrizet.github.io/FSA_devops/assets/data/unit2/sales.csv)
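
One possible way to find the row that holds the largest value in a column is to combine `.idxmax()` with `.loc`. The sketch below shows this on a hypothetical `shifts_df` table; its column names are assumptions and will differ from those in `sales.csv`.

```python
import pandas as pd

# Hypothetical data for illustration only
shifts_df = pd.DataFrame({
    "worker": ["Ana", "Ben", "Cho"],
    "hours": [10, 8, 12],
    "hourly_rate": [20.0, 30.0, 15.0],
})

# New column computed element-wise from two existing columns
shifts_df["pay"] = shifts_df["hours"] * shifts_df["hourly_rate"]

# idxmax() returns the index label of the largest value;
# .loc then retrieves the whole row with that label
top_row = shifts_df.loc[shifts_df["pay"].idxmax()]

# Sum of the new column across all rows
total_pay = shifts_df["pay"].sum()

print(top_row["worker"])
print(total_pay)
```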

In [None]:
# Step 1: Load the sales data


In [None]:
# Step 2 & 3: Calculate total value and add as new column


In [None]:
# Step 4: Find product with highest total value


In [None]:
# Step 5: Calculate grand total of all sales


---
## Problem 5: Series Creation and Manipulation

**Scenario:** Create and work with Pandas Series for daily temperature data.

**Your Task:**
1. Create a Pandas Series called `temperatures` with the following data:
   - Values: [22, 25, 23, 26, 24, 27, 25]
   - Index: ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
2. Find the temperature for Wednesday
3. Find days with temperature above 24 degrees
4. Calculate the average temperature for the week
5. Find the day with the highest temperature
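
The Tips section does not cover Series, so here is a minimal sketch of the same operations on a hypothetical `rainfall` Series. The values and labels are made up for illustration.

```python
import pandas as pd

# A Series is a single labeled column: values plus an index
rainfall = pd.Series([0.0, 5.2, 1.1, 8.4], index=["Jan", "Feb", "Mar", "Apr"])

# Look up one value by its index label
feb_rain = rainfall["Feb"]

# Filter with a boolean condition, just like a DataFrame column
wet_months = rainfall[rainfall > 2.0]

# Summary statistic over all values
average_rain = rainfall.mean()

# idxmax() returns the index label of the largest value
wettest_month = rainfall.idxmax()

print(feb_rain)
print(wet_months)
print(average_rain)
print(wettest_month)
```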

In [None]:
# Step 1: Create the temperature series


In [None]:
# Step 2: Temperature for Wednesday


In [None]:
# Step 3: Days with temperature above 24


In [None]:
# Step 4: Average temperature


In [None]:
# Step 5: Day with highest temperature


---
## Reflection Questions

Please answer these questions after completing the activity:

1. **What is the difference between a Pandas Series and a DataFrame?**
   
   *Your answer:*

2. **What are the advantages of using Pandas over working with plain Python lists and dictionaries?**
   
   *Your answer:*

3. **Describe a real-world scenario where you might use the filtering techniques you learned in Problem 3.**
   
   *Your answer:*

4. **What did you find most challenging about working with Pandas in this activity?**
   
   *Your answer:*