# Lesson 09: NumPy Challenge - SOLUTIONS

This notebook contains complete solutions for the Lesson 09 NumPy Challenge activities.

## Instructions:
- Review each solution carefully
- Compare with your own approach
- Run the cells to see the output
- Note that there are often multiple valid ways to solve each problem!

In [1]:
# Import NumPy
import numpy as np

---
## Problem 1: The Grade Book Analyzer

**Objective:** Create a grade analysis system using NumPy arrays and statistical functions.

In [2]:
# Problem 1: Solution

def analyze_grades(grades):
    """
    Analyzes a grade book and returns various statistics.
    
    Args:
        grades (np.ndarray): 2D array of grades (students x tests)
    
    Returns:
        dict: Dictionary containing various grade statistics
    """
    
    # Calculate overall class average (mean of all grades)
    class_average = np.mean(grades)
    
    # Calculate average for each student (mean along axis=1, across columns)
    student_averages = np.mean(grades, axis=1)
    
    # Calculate average for each test (mean along axis=0, down rows)
    test_averages = np.mean(grades, axis=0)
    
    # Find highest and lowest scores
    highest_score = np.max(grades)
    lowest_score = np.min(grades)
    
    # Calculate passing rate (percentage of grades >= 60)
    passing_grades = grades >= 60
    passing_rate = (np.sum(passing_grades) / grades.size) * 100
    
    # Return all statistics in a dictionary
    return {
        'class_average': class_average,
        'student_averages': student_averages,
        'test_averages': test_averages,
        'highest_score': highest_score,
        'lowest_score': lowest_score,
        'passing_rate': passing_rate
    }

# Test case
grades = np.array([[85, 90, 78, 92],
                   [76, 88, 81, 79],
                   [93, 95, 89, 97],
                   [67, 72, 65, 70]])

results = analyze_grades(grades)

print("Grade Analysis Results:")
print(f"class_average: {results['class_average']:.2f}")
print(f"student_averages: {results['student_averages']}")
print(f"test_averages: {results['test_averages']}")
print(f"highest_score: {results['highest_score']}")
print(f"lowest_score: {results['lowest_score']}")
print(f"passing_rate: {results['passing_rate']:.2f}%")

Grade Analysis Results:
class_average: 82.31
student_averages: [86.25 81.   93.5  68.5 ]
test_averages: [80.25 86.25 78.25 84.5 ]
highest_score: 97
lowest_score: 65
passing_rate: 100.00%
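
As one of the "multiple valid ways" mentioned in the instructions, the passing rate can also be computed as the mean of the boolean mask, since `True` counts as 1 and `False` as 0. The sketch below reuses the test grades from above; the names `rate_from_sum` and `rate_from_mean` are illustrative only and are not part of the original solution.

```python
import numpy as np

grades = np.array([[85, 90, 78, 92],
                   [76, 88, 81, 79],
                   [93, 95, 89, 97],
                   [67, 72, 65, 70]])

# Boolean mask: True where the grade is passing
passing = grades >= 60

# Approach used in the solution: count of True values over total count
rate_from_sum = np.sum(passing) / grades.size * 100

# Equivalent shortcut: the mean of a boolean array is the fraction of True values
rate_from_mean = np.mean(passing) * 100

print(rate_from_sum, rate_from_mean)
```

Both expressions give the same percentage; the second simply skips the explicit division.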


---
## Problem 2: The Array Transformer

**Objective:** Create a flexible array manipulation function using NumPy operations.

In [3]:
arr = np.array([[85, 90, 78],
                [76, 88, 81],
                [93, 95, 89],
                [67, 72, 65]])

min_val = np.min(arr, axis=1, keepdims=True)
max_val = np.max(arr, axis=1, keepdims=True)

print(f'Mins: {min_val}')
print(f'Maxes: {max_val}')

Mins: [[78]
 [76]
 [89]
 [65]]
Maxes: [[90]
 [88]
 [95]
 [72]]


In [4]:
print((arr - min_val) / (max_val - min_val))

[[0.58333333 1.         0.        ]
 [0.         1.         0.41666667]
 [0.66666667 1.         0.        ]
 [0.28571429 1.         0.        ]]
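
The row-wise normalization above depends on `keepdims=True`. The following sketch, using the same array, suggests why: without it, the per-row minimum has shape `(4,)`, which cannot broadcast against the `(4, 3)` array. The variable names here are assumptions made for illustration.

```python
import numpy as np

arr = np.array([[85, 90, 78],
                [76, 88, 81],
                [93, 95, 89],
                [67, 72, 65]])

# keepdims=True keeps the reduced axis as length 1, so the result is a column
mins_kept = np.min(arr, axis=1, keepdims=True)
print(mins_kept.shape)   # (4, 1)

# Without keepdims the reduced axis is dropped entirely
mins_flat = np.min(arr, axis=1)
print(mins_flat.shape)   # (4,)

# (4, 3) - (4, 1) broadcasts: each row has its own minimum subtracted
shifted = arr - mins_kept

# (4, 3) - (4,) does not broadcast: the trailing dimensions 3 and 4 differ
try:
    arr - mins_flat
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)
```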


In [11]:
# Problem 2: Solution

def transform_array(arr, operation='normalize', axis=None):
    """
    Transforms an array using various mathematical operations.
    
    Args:
        arr (np.ndarray): Input array
        operation (str): Type of transformation to apply
        axis (int or None): Axis along which to apply the operation; None uses the whole array
    
    Returns:
        np.ndarray: Transformed array
    """
    
    if operation == 'normalize':

        # Min-max normalization: (x - min) / (max - min)
        min_val = np.min(arr, axis=axis, keepdims=True)
        max_val = np.max(arr, axis=axis, keepdims=True)
        
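        # Handle edge case where all values along the axis are the same:
        # swap a zero range for 1 so those entries become 0 instead of NaN
        value_range = max_val - min_val
        safe_range = np.where(value_range == 0, 1, value_range)
        
        return (arr - min_val) / safe_range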
    
    elif operation == 'standardize':

        # Z-score standardization: (x - mean) / std
        mean = np.mean(arr, axis=axis, keepdims=True)
        std = np.std(arr, axis=axis, keepdims=True)
        
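        # Handle edge case where std is 0; np.where works for any axis,
        # whereas `if std == 0` raises ValueError when std has several elements
        safe_std = np.where(std == 0, 1, std)
        
        return (arr - mean) / safe_std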
    
    elif operation == 'square':

        # Square all values
        return arr ** 2
    
    elif operation == 'sqrt':

        # Take square root of all values
        return np.sqrt(arr)
    
    else:
        return f"Error: Unknown operation '{operation}'"

# Test cases
print("Test 1 (normalize):")
arr1 = np.array([[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]])
print(f"Original: {arr1}")
print(f"Normalized: {transform_array(arr1, 'normalize')}")

print("\nTest 2 (sqrt):")
arr2 = np.array([1, 4, 9, 16])
print(f"Original: {arr2}")
print(f"Square root: {transform_array(arr2, 'sqrt')}")

print("\nTest 3 (standardize):")
arr3 = np.array([10, 20, 30, 40, 50])
print(f"Original: {arr3}")
print(f"Standardized: {transform_array(arr3, 'standardize')}")

print("\nTest 4 (square):")
arr4 = np.array([1, 2, 3, 4])
print(f"Original: {arr4}")
print(f"Squared: {transform_array(arr4, 'square')}")

Test 1 (normalize):
Original: [[1 2 3 4 5]
 [5 4 3 2 1]]
Normalized: [[0.   0.25 0.5  0.75 1.  ]
 [1.   0.75 0.5  0.25 0.  ]]

Test 2 (sqrt):
Original: [ 1  4  9 16]
Square root: [1. 2. 3. 4.]

Test 3 (standardize):
Original: [10 20 30 40 50]
Standardized: [-1.41421356 -0.70710678  0.          0.70710678  1.41421356]

Test 4 (square):
Original: [1 2 3 4]
Squared: [ 1  4  9 16]


---
## Problem 3: The Matrix Operations Toolkit

**Objective:** Build a function that performs various matrix operations.

In [12]:
# Problem 3: Solution

def matrix_operations(matrix1, matrix2=None, operation='transpose'):
    """
    Performs various matrix operations.
    
    Args:
        matrix1 (np.ndarray): First matrix
        matrix2 (np.ndarray): Second matrix (optional)
        operation (str): The operation to perform
    
    Returns:
        np.ndarray or str: Result of the operation or error message
    """
    
    if operation == 'transpose':
        # Return the transpose of matrix1
        return matrix1.T
    
    elif operation == 'flatten':
        # Flatten matrix1 to 1D array
        return matrix1.flatten()
    
    # For operations requiring two matrices, check if matrix2 is provided
    if matrix2 is None:
        return f"Error: Operation '{operation}' requires two matrices"
    
    if operation == 'multiply':

        # Element-wise multiplication
        if matrix1.shape != matrix2.shape:
            return "Error: Matrices must have the same shape for element-wise multiplication"
        
        return matrix1 * matrix2
    
    elif operation == 'add':

        # Matrix addition
        if matrix1.shape != matrix2.shape:
            return "Error: Matrices must have the same shape for addition"
        
        return matrix1 + matrix2
    
    elif operation == 'matmul':

        # Matrix multiplication
        # Check if dimensions are compatible: (m x n) @ (n x p) = (m x p)
        if matrix1.shape[1] != matrix2.shape[0]:
            return f"Error: Incompatible shapes for matrix multiplication: {matrix1.shape} and {matrix2.shape}"
        
        return matrix1 @ matrix2
    
    else:
        return f"Error: Unknown operation '{operation}'"

# Test cases
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print("Matrix A:")
print(A)
print("\nMatrix B:")
print(B)

print("\nTranspose of A:")
print(matrix_operations(A, operation='transpose'))

print("\nElement-wise multiplication (A * B):")
print(matrix_operations(A, B, operation='multiply'))

print("\nMatrix multiplication (A @ B):")
print(matrix_operations(A, B, operation='matmul'))

print("\nAddition (A + B):")
print(matrix_operations(A, B, operation='add'))

print("\nFlatten A:")
print(matrix_operations(A, operation='flatten'))

Matrix A:
[[1 2]
 [3 4]]

Matrix B:
[[5 6]
 [7 8]]

Transpose of A:
[[1 3]
 [2 4]]

Element-wise multiplication (A * B):
[[ 5 12]
 [21 32]]

Matrix multiplication (A @ B):
[[19 22]
 [43 50]]

Addition (A + B):
[[ 6  8]
 [10 12]]

Flatten A:
[1 2 3 4]


---
## Problem 4: Fixing NumPy Bugs

**Objective:** Debug and fix code snippets that contain common NumPy-related errors.

### Bug 1: The Shape Mismatch

**Problem:** Array has 8 elements but trying to reshape to 3x3 (9 elements)

**Solution:** Need 9 elements or reshape to compatible dimensions (e.g., 2x4, 4x2)

In [7]:
# Bug 1: Fixed code

# Fix: Need 9 elements for a 3x3 array
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])  # Added 9 to make it 9 elements
reshaped = arr.reshape(3, 3)

print(reshaped)

# Alternative fix: Reshape to compatible dimensions with 8 elements
arr2 = np.array([1, 2, 3, 4, 5, 6, 7, 8])
reshaped2 = arr2.reshape(2, 4)  # or (4, 2)
print("\nAlternative (2x4):")
print(reshaped2)

[[1 2 3]
 [4 5 6]
 [7 8 9]]

Alternative (2x4):
[[1 2 3 4]
 [5 6 7 8]]
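
When only one dimension is known in advance, another option is to pass `-1` and let NumPy infer the other dimension from the total size. The sketch below also includes a small size check; `can_reshape` is a hypothetical helper written for this example, not a NumPy function.

```python
import numpy as np

arr = np.arange(1, 9)  # 8 elements: 1 through 8

# -1 asks NumPy to infer that dimension from the total size
two_rows = arr.reshape(2, -1)
print(two_rows.shape)  # (2, 4)

def can_reshape(array, rows, cols):
    """Hypothetical helper: True if array has exactly rows * cols elements."""
    return array.size == rows * cols

print(can_reshape(arr, 3, 3))  # 8 != 9
print(can_reshape(arr, 4, 2))  # 8 == 8
```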


### Bug 2: The Indexing Error

**Problem:** Using `[:, 1]` gets the second column, not the second row

**Solution:** Use `[1, :]` to get the second row (row index 1, all columns)

In [8]:
# Bug 2: Fixed code

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

print("Matrix:")
print(matrix)

# Fix: Use [1, :] to get the second row (row index 1, all columns)
# The original code [:, 1] gets the second column instead
second_row = matrix[1, :]  # or simply matrix[1]
print("\nSecond row:", second_row)

# For comparison, here's how to get the second column:
second_column = matrix[:, 1]
print("Second column:", second_column)

Matrix:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Second row: [4 5 6]
Second column: [2 5 8]


### Bug 3: The Data Type Problem

**Problem:** The array defaults to an integer dtype, which caused floor division with `/` under Python 2 (and still does with `//` in Python 3)

**Solution:** Convert array to float type before division, or use float in array creation

In [9]:
# Bug 3: Fixed code

# Fix Option 1: Explicitly specify float dtype when creating array
numbers = np.array([10, 15, 20, 25, 30], dtype=float)
divisor = 3

result = numbers / divisor

print("Option 1 - Using dtype=float:")
print(f"Numbers: {numbers}")
print(f"Divided by {divisor}: {result}")
print(f"Data type: {result.dtype}")

# Fix Option 2: Convert to float after creation
numbers2 = np.array([10, 15, 20, 25, 30])
result2 = numbers2.astype(float) / divisor

print("\nOption 2 - Using astype(float):")
print(f"Divided by {divisor}: {result2}")
print(f"Data type: {result2.dtype}")

# Note: In Python 3 and modern NumPy, division with / always produces floats,
# but explicitly setting the dtype is still good practice for clarity

Option 1 - Using dtype=float:
Numbers: [10. 15. 20. 25. 30.]
Divided by 3: [ 3.33333333  5.          6.66666667  8.33333333 10.        ]
Data type: float64

Option 2 - Using astype(float):
Divided by 3: [ 3.33333333  5.          6.66666667  8.33333333 10.        ]
Data type: float64
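
To see the difference between the two division operators on an integer array, a short sketch (assuming Python 3 and a current NumPy) can compare the resulting dtypes. The names `true_div` and `floor_div` are illustrative.

```python
import numpy as np

numbers = np.array([10, 15, 20, 25, 30])  # integer dtype by default

# True division: the result is a float array even for integer inputs
true_div = numbers / 3

# Floor division: the result keeps an integer dtype and drops the remainder
floor_div = numbers // 3

print(true_div.dtype)
print(floor_div, floor_div.dtype)
```

The exact integer dtype printed for `floor_div` depends on the platform, so it is not shown here.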


---
## Bonus Challenge: The Data Filter

**Objective:** Apply multiple NumPy concepts to solve a real-world data filtering problem.

In [10]:
# Bonus Challenge: Solution

def filter_data(data, condition='positive', threshold=0):
    """
    Filters data based on various conditions.
    
    Args:
        data (np.ndarray): Input data array
        condition (str): The filtering condition to apply
        threshold (float): Threshold value for certain conditions
    
    Returns:
        tuple: (filtered_data, count)
    """
    
    # Flatten the array if it's multi-dimensional
    flat_data = data.flatten()
    
    # Apply the appropriate filter based on condition
    if condition == 'positive':

        # Keep only positive values
        mask = flat_data > 0
    
    elif condition == 'negative':

        # Keep only negative values
        mask = flat_data < 0
    
    elif condition == 'threshold':

        # Keep only values greater than threshold
        mask = flat_data > threshold
    
    elif condition == 'range':

        # Keep only values between -threshold and +threshold (inclusive)
        mask = (flat_data >= -threshold) & (flat_data <= threshold)
    
    else:
        return (np.array([]), 0)
    
    # Apply the mask to filter the data
    filtered_data = flat_data[mask]
    count = len(filtered_data)
    
    return (filtered_data, count)

# Test cases
data = np.array([-5, 10, -3, 15, 0, -8, 20, 3])
print(f"Original data: {data}\n")

filtered, count = filter_data(data, condition='positive')
print(f"Positive values: {filtered}")
print(f"Count: {count}\n")

filtered, count = filter_data(data, condition='negative')
print(f"Negative values: {filtered}")
print(f"Count: {count}\n")

filtered, count = filter_data(data, condition='threshold', threshold=10)
print(f"Values > 10: {filtered}")
print(f"Count: {count}\n")

filtered, count = filter_data(data, condition='range', threshold=5)
print(f"Values in range [-5, 5]: {filtered}")
print(f"Count: {count}")

Original data: [-5 10 -3 15  0 -8 20  3]

Positive values: [10 15 20  3]
Count: 4

Negative values: [-5 -3 -8]
Count: 3

Values > 10: [15 20]
Count: 2

Values in range [-5, 5]: [-5 -3  0  3]
Count: 4


---
## Reflection Questions - Sample Answers

Here are thoughtful answers to the reflection questions:

### Your Reflections:

1. **How does NumPy's vectorization make operations faster compared to Python loops?**

   NumPy's vectorization performs an operation on an entire array in a single call into optimized, compiled C code, rather than iterating through elements one by one in the Python interpreter. For example, in Problem 2, `arr ** 2` squares every element without a Python-level loop, which is much faster than a list comprehension like `[x**2 for x in arr]`. The compiled loops avoid per-element interpreter overhead and can use CPU-level optimizations like SIMD (Single Instruction, Multiple Data), often making them 10-100x faster for large datasets.

2. **What is the purpose of the `axis` parameter in NumPy functions?**

   The `axis` parameter specifies which dimension an operation collapses. In Problem 1, `axis=0` averages down each column (across students) to get one value per test, while `axis=1` averages across each row (across tests) to get one value per student. Think of it as: `axis=0` collapses the rows, leaving one result per column; `axis=1` collapses the columns, leaving one result per row. With the default `axis=None`, the operation runs over every element and returns a single value.

3. **Explain the difference between element-wise multiplication and matrix multiplication.**

   Element-wise multiplication (`*`) multiplies corresponding elements: `[[1,2],[3,4]] * [[5,6],[7,8]] = [[5,12],[21,32]]`. Matrix multiplication (`@` or `np.matmul()`) takes the dot product of rows and columns: for 2x2 matrices, result[i,j] = sum of (row i of A * column j of B). Matrix multiplication requires compatible shapes (m×n @ n×p = m×p), while element-wise multiplication requires identical (or broadcast-compatible) shapes. Matrix multiplication is fundamental to linear algebra and machine learning transformations.

4. **Why is it important to pay attention to array shapes?**

   Array shapes determine whether operations are valid and affect the results. Bug 1 showed that reshape requires compatible dimensions (8 elements can't become 3×3). Bug 2 demonstrated how indexing depends on understanding row vs column dimensions. Shape mismatches cause ValueError in operations like matrix multiplication or addition. In data science, shape errors often indicate conceptual mistakes - like trying to multiply features with the wrong number of samples. Always check shapes with `.shape` before operations.

5. **How can boolean indexing be used for data filtering?**

   Boolean indexing creates a mask of True/False values based on conditions, then uses it to select elements. In Problem 1, `grades >= 60` created a boolean array, which we used to count passing grades. The Bonus Challenge showed more complex filtering with conditions like `(data >= -threshold) & (data <= threshold)`. Real-world example: filtering customer data to find high-value purchases over $100: `expensive_items = prices[prices > 100]`. This is much cleaner and faster than loops, and it's essential for data cleaning, outlier detection, and feature engineering in data science workflows.
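
To make the vectorization point from the first answer concrete, the sketch below squares an array both with a Python-level loop and with a single NumPy expression, then times each. The function names are hypothetical, and the timings depend on the machine, so no particular speedup is asserted.

```python
import timeit

import numpy as np

data = np.arange(10_000)

def square_with_loop(values):
    """Square each element one at a time in the Python interpreter."""
    return np.array([x ** 2 for x in values])

def square_vectorized(values):
    """Square the whole array in a single NumPy call."""
    return values ** 2

# Both approaches produce the same result
same = np.array_equal(square_with_loop(data), square_vectorized(data))
print("Same result:", same)

# Time five runs of each; results vary by machine
loop_time = timeit.timeit(lambda: square_with_loop(data), number=5)
vec_time = timeit.timeit(lambda: square_vectorized(data), number=5)
print(f"Loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")
```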

---
## Additional Tips and Best Practices

### Performance Tips:
1. **Avoid loops** - Use vectorized operations whenever possible
2. **Preallocate arrays** - Use `np.zeros()` or `np.empty()` instead of growing arrays
3. **Prefer views to copies** - Basic slicing creates views, while boolean and integer-array indexing create copies; use `.copy()` only when needed
4. **Choose appropriate dtype** - Use the smallest dtype that fits your data
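
The distinction between views and copies can be checked with `np.shares_memory`. This is a minimal sketch with illustrative variable names: a basic slice shares memory with its source, while a boolean-mask selection does not.

```python
import numpy as np

original = np.arange(10)

# Basic slicing returns a view onto the same memory
view = original[2:5]
view[0] = 99          # also changes original[2]

# Boolean (and integer-array) indexing returns a copy
picked = original[original > 5]
picked[0] = -1        # original is unaffected

# .copy() makes an independent array explicitly
independent = original[2:5].copy()

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, picked))       # False
print(np.shares_memory(original, independent))  # False
print(original)
```

Writing through `view` changes `original`, which is why an explicit `.copy()` is worth the memory when the source array must stay untouched.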

### Common Pitfalls:
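1. **Modifying arrays in place** - Some operations return new arrays (e.g., `np.sort(arr)`), while others change the array itself (e.g., `arr.sort()`)
2. **Integer division** - `/` always returns floats in Python 3, but `//` floors the result and an in-place `arr /= 2` on an integer array raises a casting error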
3. **Broadcasting confusion** - Understand how NumPy expands dimensions
4. **Memory efficiency** - Large arrays can consume lots of memory; be mindful of copies
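
For the broadcasting pitfall, a small sketch may help. It assumes two hypothetical arrays, `row` and `col`, and shows that shapes `(3,)` and `(3, 1)` combine into a `(3, 3)` grid, while shapes `(3,)` and `(4,)` are rejected.

```python
import numpy as np

row = np.array([1, 2, 3])            # shape (3,)
col = np.array([[10], [20], [30]])   # shape (3, 1)

# Shapes are compared from the right; a dimension of 1 is stretched to match
grid = row + col
print(grid.shape)  # (3, 3)
print(grid)

# Shapes (3,) and (4,) are incompatible: neither is 1 and they differ
try:
    row + np.array([1, 2, 3, 4])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```

A result that is larger than either input, like `grid` here, is often the first sign that broadcasting did something unintended.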

### Debugging Strategies:
1. **Print shapes** - Use `print(arr.shape)` frequently
2. **Check dtypes** - Use `print(arr.dtype)` to verify data types
3. **Test with small arrays** - Debug with 2-3 element arrays first
4. **Use array_equal** - Compare arrays with `np.array_equal(a, b)`; for floating-point results, prefer `np.allclose(a, b)`
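
When comparing floating-point arrays, exact equality can fail because of rounding. The sketch below contrasts `np.array_equal` with the tolerance-based `np.allclose`; the arrays `a` and `b` are made up for illustration.

```python
import numpy as np

a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, 1.0])

# Exact comparison fails because 0.1 + 0.2 is not exactly 0.3 in floating point
exact = np.array_equal(a, b)
print(exact)   # False

# Tolerance-based comparison treats the two arrays as equal
close = np.allclose(a, b)
print(close)   # True
```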

---
## Congratulations!

You've completed the Lesson 09 NumPy Challenge solutions! Key takeaways:

- **Array operations** are vectorized and much faster than Python loops
- **Shape awareness** is critical for avoiding errors and understanding results
- **Boolean indexing** provides powerful data filtering capabilities
- **Statistical functions** with axis parameters enable sophisticated data analysis
- **Matrix operations** form the foundation of linear algebra in data science

Continue practicing these concepts as you move forward with pandas, scikit-learn, and other data science libraries. NumPy is the foundation of the entire scientific Python ecosystem!