Regression & Prediction
In this lesson, you will learn how to use scatterplots and regression lines in Python to predict nitrate levels from RGB color values on test strips.
Materials Needed:
🎯 Checkpoint 1. a: Review—Looking back to Workshop #2
Last workshop we learned:
🎯 Checkpoint 7. a: Using RGB Values to Create Color
Materials Needed:
🎯 Checkpoint 7. b: Use the slider tool!
What color does this RGB value represent? (110, 164, 212)
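One way to check your guess on a computer is to turn the three numbers into a hex color code, which most color pickers accept. This is only a small sketch; the function name `rgb_to_hex` is made up for this example and is not part of the workshop applet.

```python
def rgb_to_hex(red, green, blue):
    """Turn three 0-255 channel values into a #RRGGBB hex color string."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each channel must be between 0 and 255")
    # :02X writes each value as two uppercase hexadecimal digits
    return f"#{red:02X}{green:02X}{blue:02X}"

hex_code = rgb_to_hex(110, 164, 212)
print(hex_code)
```

Paste the printed code into any color picker to see the color.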
Just like the computer did last workshop, try to predict what fruit this could be!
We met Smokey Buoy last workshop and learned a bit about how it works.
Let’s recall Smokey’s components and how they work together
Smokey Buoy’s camera has two jobs:

| # of Cubes | Height (inches) |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 10 | 10 |
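The pattern in the table can be written as a tiny rule in Python. This is a sketch that assumes every cube is exactly 1 inch tall, as in the table; the names `CUBE_HEIGHT_INCHES` and `tower_height` are made up for this example.

```python
CUBE_HEIGHT_INCHES = 1  # assumption: every cube is exactly 1 inch tall

def tower_height(num_cubes):
    """Predict tower height in inches from the number of stacked cubes."""
    return num_cubes * CUBE_HEIGHT_INCHES

# Check the rule against every row of the table
for cubes in [1, 2, 3, 4, 5, 10]:
    print(cubes, "cubes ->", tower_height(cubes), "inches")
```

Because the rule is the same for every cube, it works for any number of cubes, not only the ones we stacked.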

🎯 Checkpoint 2. a:
Q1: How tall will our tower be if we use 100 cubes?
Q2: Your friend argues that the tower will be 99 inches tall. What would you tell him? Why?
Oh no! Uncle Bob’s cubes are all slightly different heights!
🎯 Checkpoint 2. b:
How will Uncle Bob’s new cubes affect our tower height predictions?


Bob’s cubes saved us money, but now tower heights are all over the place.
We need a better way to predict.


In Workshop #2, we learned how computers “see” color through RGB values. The buoy takes all the pixels for each test pad and finds their mean.
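Finding the mean color of a test pad could look something like the sketch below. The pixel values are hypothetical (a real test pad has many more pixels), and `mean_rgb` is a name invented for this example, not the buoy's actual code.

```python
# Hypothetical (R, G, B) pixels from one test pad
pad_pixels = [
    (188, 120, 140),
    (186, 122, 138),
    (190, 118, 142),
    (188, 120, 140),
]

def mean_rgb(pixels):
    """Return the mean of each color channel as an (R, G, B) tuple."""
    count = len(pixels)
    return tuple(sum(pixel[channel] for pixel in pixels) / count
                 for channel in range(3))

pad_mean = mean_rgb(pad_pixels)
print(pad_mean)
```

The result is one R, one G, and one B number that together describe the whole pad.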
In our applet (for the next activity), we split the pixels into a grid and took the mean of each section. We will treat the center box like the test pad, so use the number from that box.
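Here is a rough sketch of the grid idea. It assumes a 3 x 3 grid and a tiny made-up 6 x 6 image with only one color channel; the applet's real grid size and code may be different, and `grid_means` is a name invented for this example.

```python
def grid_means(image, rows=3, cols=3):
    """Split a 2D list of channel values into rows x cols boxes and
    return the mean of each box (leftover edge pixels are ignored)."""
    box_h = len(image) // rows
    box_w = len(image[0]) // cols
    means = []
    for r in range(rows):
        row_means = []
        for c in range(cols):
            box = [image[y][x]
                   for y in range(r * box_h, (r + 1) * box_h)
                   for x in range(c * box_w, (c + 1) * box_w)]
            row_means.append(sum(box) / len(box))
        means.append(row_means)
    return means

# Hypothetical red-channel values; the middle four pixels are the test pad
image = [
    [200, 200, 190, 190, 200, 200],
    [200, 200, 190, 190, 200, 200],
    [195, 195, 176, 178, 195, 195],
    [195, 195, 174, 176, 195, 195],
    [200, 200, 190, 190, 200, 200],
    [200, 200, 190, 190, 200, 200],
]

means = grid_means(image)
center = means[1][1]  # middle row, middle column
print(center)
```

Only the center box is used, so background pixels around the pad do not change the reading.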
Materials Needed:
🎯 Checkpoint 4. a: Read water quality test data like a computer!
For this portion, each group is assigned either red, green, or blue, based on the color of your stickers!
🎯 Checkpoint 4. b: Plotting R, G or B values against nitrate concentrations.
🎯 Checkpoint 5. a: Storing your team’s data in lists!
Replace the ?? placeholders with your color values and nitrate concentrations.
IMPORTANT: Values must be in the exact same order.
Click ▶️ Run Code to store your data.
[188, 179, 176, 176, 172, 167, 166, 162]
[0, 1, 2, 3, 4, 5, 6, 7]
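To see why the order matters, here is a short sketch that pairs each color value with its nitrate concentration. It uses the example values shown above; the length check and the `pairs` variable are additions for illustration, not part of the workshop code.

```python
color_data = [188, 179, 176, 176, 172, 167, 166, 162]
nitrate_ppm = [0, 1, 2, 3, 4, 5, 6, 7]

# Each color value must line up with exactly one nitrate value
if len(color_data) != len(nitrate_ppm):
    raise ValueError("Both lists need the same number of values")

# zip() pairs the lists by position: first with first, second with second...
pairs = list(zip(color_data, nitrate_ppm))
for color, nitrate in pairs:
    print(f"color {color} -> {nitrate} ppm")
```

If one list were typed in a different order, the pairs would no longer match the test strips.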
🎯 Checkpoint 5. b: Scatterplots in Python!
Replace the ?? placeholders:
- color_data
- nitrate_ppm

Click ▶️ Run Code to create a scatterplot matching your group’s paper plot.

Materials Needed: Complete Scatterplot, Yardstick
With your group, discuss where your regression line will likely fall on your scatterplot. Once your group has made a decision, place your yardstick on top of your scatterplot as if it’s the line.
Note: Lines will likely vary across groups, given that you are working with different color channels!
🎯 Checkpoint 5. c: Running a regression in Python!
Replace the ?? placeholders:
- color_data
- nitrate_ppm

Click ▶️ Run Code to generate the line of best fit equation.
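Curious what a regression does behind the scenes? The sketch below uses the standard least-squares formulas for a line of best fit. It is an illustration only: the workshop's own `linear_regression` helper is not shown here and may be written differently, and `least_squares_line` is a name invented for this example.

```python
def least_squares_line(x_values, y_values):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    # How x and y vary together, and how much x varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

color_data = [188, 179, 176, 176, 172, 167, 166, 162]
nitrate_ppm = [0, 1, 2, 3, 4, 5, 6, 7]

slope, intercept = least_squares_line(color_data, nitrate_ppm)
print(f"Slope: {slope:.2f}, Intercept: {intercept:.2f}")
```

The slope is negative for this example data: as the color value goes down, the nitrate concentration goes up.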
🎯 Checkpoint 5. d: Printing the Regression Equation
The linear regression produces the equation for the line of best fit.
Remember y = mx + b? We can just substitute!
Click ▶️ Run Code to see the slope-intercept equation.
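Substituting into y = mx + b can be done with an f-string. This sketch uses the rounded slope and intercept from the example data in this lesson (-0.29 and 53.10); your group's numbers will differ. The sign handling is an extra touch so that a negative intercept prints as "- 2.00" instead of "+ -2.00".

```python
slope = -0.29      # m, from the example data in this lesson
intercept = 53.10  # b, from the example data in this lesson

def line_equation(m, b):
    """Format a line as a slope-intercept equation string."""
    sign = "+" if b >= 0 else "-"
    return f"y = {m:.2f}x {sign} {abs(b):.2f}"

equation = line_equation(slope, intercept)
print(equation)
```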
🎯 Checkpoint 5. e: Plotting regression line in Python
Click ▶️ Run Code to plot the scatterplot with the regression line.
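import matplotlib.pyplot as plt  # harmless if an earlier cell already imported it

# Predicted nitrate for every measured color value (y = mx + b).
# color_data must support array math here (process_data is assumed to return an array).
nitrate_pred = slope * color_data + intercept
plt.figure(figsize=(6, 5))
plt.scatter(color_data, nitrate_ppm, s=60, color="black")
plt.plot(color_data, nitrate_pred, color="black", linewidth=1)
plt.title("Nitrate Concentration vs. Color")
plt.xlabel("Color")
plt.ylabel("Nitrate Concentration (ppm)")
# Write the regression equation in the top-left corner of the plot
plt.text(
    min(color_data), max(nitrate_ppm),
    f"y = {slope:.2f}x + {intercept:.2f}\n",
    ha="left", fontsize=10, color="gray"
)
plt.tight_layout()
plt.show()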
🎯 Checkpoint 5. f: Plotting regression line on paper
Materials Needed:
Using the exact positioning Python generated, copy the regression line onto your group’s scatterplot. When finished, bring your group’s plot to the front of the class.
🎯 Checkpoint 6. a: Storing your team’s data in lists (again)!
Re-enter your color values and nitrate concentrations.
NOTE: You can copy/paste the two lists from the previous activity.
Click ▶️ Run Code to re-load your data.
color_data = [188, 179, 176, 176, 172, 167, 166, 162]
nitrate_ppm = [0, 1, 2, 3, 4, 5, 6, 7]
# Do not change the code below ------------------------------------
color_data, nitrate_ppm = process_data(color_data, nitrate_ppm)
slope, intercept = linear_regression(color_data, nitrate_ppm)
print(f"Slope: {slope:.2f}, Intercept: {intercept:.2f}")
Slope: -0.29, Intercept: 53.10
🎯 Checkpoint 6. b: Predicting nitrate concentrations from your regression model (like the buoy)!
Now that we have our regression model, we can predict nitrate concentrations for new color values — even ones we didn’t measure!
Click ▶️ Run Code, then type any color value into the box to see the predicted nitrate concentration.
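Under the hood, a prediction is just the regression equation with a new color value plugged in for x. The sketch below uses the rounded slope and intercept printed for the example data (-0.29 and 53.10), so its answers can differ slightly from the unrounded model. The name `predict_nitrate` and the three color values are made up for this example.

```python
SLOPE = -0.29      # rounded slope from the example data
INTERCEPT = 53.10  # rounded intercept from the example data

def predict_nitrate(color_value):
    """Predict nitrate concentration (ppm) from one color value using y = mx + b."""
    return SLOPE * color_value + INTERCEPT

# Try a few color values, including ones that were never measured
for color in [180, 170, 165]:
    print(f"color {color} -> {predict_nitrate(color):.2f} ppm")
```

Try a color value far outside the measured range (for example 250) and discuss whether the prediction still makes sense.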
Materials Needed:
- 💻 RGB Applet V2
- Three additional “Nitrate Test Pad” cards from real buoy data (with RGB values printed)
- Handout / Pencil
| Sample # | Red Model | Green Model | Blue Model |
|---|---|---|---|
| Sample 1 | | | |
| Sample 2 | | | |
| Sample 3 | | | |
Compare with the other groups — what do you notice?
How might we improve our model and, therefore, our predictions?
Materials Needed: Handout, Pencil
🎯 Checkpoint 6. e: Stop and Jot: An agriculture scenario
Scenario: A local strawberry farm sells berries at small stands around town. They want to sell out every day! Some days are slower than others, so they’re thinking about offering discounts to sell more. What makes a day slow or busy? What clues could help them predict when to offer a discount?
🎉 Great job! You’ve learned so much!
Share what you’ve learned on the Exit Ticket.
Want to practice what we’ve learned?
Try the Exercises.
AAIC — Water Quality