Our next topic is Typicality. This topic serves as a bridge between our foundational topics (shapeshifting, mad science, and justice) and our application topics (health decisions, personal finance decisions, and business decisions).
The English language makes it complicated to talk about what "typical" means. There are many synonyms for the word.
Real life also makes it complicated to talk about what "typical" means. Many groups of numbers do not have a value that happens most often, or is most representative.
Tables can help summarize data to help us see typicality. But tables can also be used to cloud issues about typicality.
Charts can clearly show typicality with a nice visual picture. But charts can also be used to create a false illusion of typicality.
We will look closely at all these issues surrounding typicality. That way we will be prepared to think clearly when our upcoming applications use the word "typical" or its synonyms.
How many calories per day should a typical person eat?
What is a healthy aerobic exercise heart rate for an average person?
In general, what percentage of an onion is removed while peeling and trimming?
What is a normal percentage of household income to spend on housing?
How much do most people save for retirement?
What is the standard markup used in that type of retail business?
What will be the expected value in twenty years for that investment portfolio?
As we study this topic, work on making helpful and organized notes, so you have handy the comments, formulas, and example problems you need.
An elementary school teacher asked her students, "What type of animal makes the best pet?" Each student was allowed one reply. The teacher then prepared the image below to show all their responses condensed together.
What a vibrantly unhelpful way to present the survey results!
The colorful speech bubbles are a meaningless distraction. Each time the word "cat" appears means one more student picked that answer, no matter where in the image the word "cat" was put.
Six
(If you are having trouble finding them, two are in yellow, one in red, and three in green.)
Surely we can create something with more clarity.
First let's make a frequency table with two columns to show the categories and how many times each category is counted.
Much clearer! Now even a quick glance lets us see that cat was picked most (6 times) and bird almost as often (5 times).
A frequency table is not as cute as the image the teacher prepared for her class. But it works much better for doing math, and circumvents her clip art's lack of multiculturalism.
6 cats + 2 horses + 4 dogs + 1 unicorn = 13 four-legged animals
There were 13 answers that were four-legged animals. We can count 21 answers total.
So the fraction is ^{13}⁄_{21}
There were 13 answers that were four-legged animals. We can count 21 answers total.
percentage = part ÷ whole = 13 ÷ 21 = 0.619 ≈ 62%
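The tally and the four-legged calculation can also be sketched in a few lines of Python. This is only an illustration: the counts are the ones from the frequency table, and the names `counts`, `four_legged`, `part`, and `whole` are made up for this sketch.

```python
from collections import Counter
from fractions import Fraction

# Counts taken from the frequency table in the text.
counts = {"cat": 6, "fish": 3, "bird": 5, "horse": 2, "dog": 4, "unicorn": 1}

# Rebuild the raw list of 21 answers, then tally it the way we would by hand.
answers = [animal for animal, n in counts.items() for _ in range(n)]
frequency = Counter(answers)

# Which answers count as four-legged follows the sum used in the text.
four_legged = ["cat", "horse", "dog", "unicorn"]
part = sum(frequency[animal] for animal in four_legged)
whole = sum(frequency.values())

fraction = Fraction(part, whole)      # part over whole, as a fraction
percent = round(100 * part / whole)   # part ÷ whole, as a whole percent

print(frequency.most_common(2))       # the two most popular answers
print(fraction, f"{percent}%")
```

The same `part` and `whole` feed both the fraction and the percentage, which is why the two answers share the same first sentence.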
I have a sudden urge to rotate that frequency table 90 degrees counterclockwise.
Next I have a sudden urge to replace the frequency numbers with colorful bars.
Ugh. That is way too ugly. But you can see the theory behind turning a frequency table into the type of chart below.
Much nicer! This chart is pretty enough to be useful when answering questions.
With the bar chart it is easy to see that the top three were cat, bird, and dog.
Definition
A bar chart compares the frequencies of several unordered categorical data types.
Some people really like sorting their bar chart categories from tallest bar to shortest bar. These are called basic Pareto charts.
Definition
A basic Pareto chart is a bar chart with categories sorted from tallest bar to shortest bar.
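Sorting the categories for a basic Pareto chart is a one-line step. The sketch below uses the pet counts from the text and prints a rough text-only bar chart. The `#` bars are just a stand-in for a real chart.

```python
# Counts taken from the frequency table in the text.
counts = {"cat": 6, "fish": 3, "bird": 5, "horse": 2, "dog": 4, "unicorn": 1}

# Sort categories from tallest bar to shortest bar.
pareto = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for animal, n in pareto:
    # One "#" per student stands in for the height of the bar.
    print(f"{animal:<8}{'#' * n} {n}")
```

Because pet choices have no inherent order, sorting by frequency loses nothing and makes the ranking easy to read.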
Let's make our frequency table about the 21 pet choices even better.
Using percentage = part ÷ whole we can find what percentage of the students picked each pet.
For example, 6 cat ÷ 21 total ≈ 29% of the students picked "cat".
The most helpful frequency tables have this information in a third column. To be extra clear, we call the numbers we actually counted the counted frequency and the percentages the relative frequency.
Animal  Counted Frequency  Relative Frequency 

cat  6  29% 
fish  3  14% 
bird  5  24% 
horse  2  10% 
dog  4  19% 
unicorn  1  5% 
The relative frequencies would look like the counted frequencies with a percent symbol attached.
Let's spend some time with a different set of data.
This table shows the data generated in a past term from Gretchen Rubin's online Four Tendencies Quiz.
Response  Frequency (raw count)  Relative Frequency (as fractions)  Relative Frequency (as decimals)  Relative Frequency (as percentages) 

Questioner  5  ^{5}⁄_{22}  
Rebel  4  0.18  18%  
Obliger  11  
Upholder  2  
Total  22 
The fractions are ^{5}⁄_{22}, ^{4}⁄_{22}, ^{11}⁄_{22}, ^{2}⁄_{22}, and ^{22}⁄_{22}.
The decimals (rounding to the nearest hundredths if appropriate) are 0.23, 0.18, 0.5, 0.09, and 1
The percentages (rounding to the nearest percent) are 23%, 18%, 50%, 9%, and 100%
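The three relative frequency columns all come from the same division. Here is a minimal sketch using the counts from the table. The dictionary layout and the name `table` are assumptions made for this illustration.

```python
# Raw counts from the Four Tendencies table.
counts = {"Questioner": 5, "Rebel": 4, "Obliger": 11, "Upholder": 2}
total = sum(counts.values())  # 22 responses

table = {}
for response, n in counts.items():
    table[response] = {
        "fraction": f"{n}/{total}",         # left unreduced, as in the text
        "decimal": round(n / total, 2),     # nearest hundredth
        "percent": round(100 * n / total),  # nearest percent
    }

for response, row in table.items():
    print(response, row["fraction"], row["decimal"], f"{row['percent']}%")
```

Each row divides its own count (the part) by the same total (the whole).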
Your response is personal. For me, it was surprising to see so many people in that class pick "Obliger".
Notice that when making a bar chart we can label the vertical axis with either the counted frequency or the relative frequency.
Let's do both!
They look the same except for how the vertical axis is labeled.
Next we will make this data into a pie chart.
Response  Relative Frequency (as decimals)  Degrees for Pie Chart 

Questioner  
Rebel  0.18  0.18 × 360° ≈ 65° 
Obliger  
Upholder  
Total 
The decimals are the same as before. If we round to the nearest hundredths when appropriate they are 0.23, 0.18, 0.5, 0.09, and 1
The degree amounts are 83°, 65°, 180°, 32°, and 360°
Now we can make a pie chart. We could use a protractor. Or we can estimate the angles using the fact that a relative frequency of 25% would become a 90 degree angle, a relative frequency of 50% would become a 180 degree angle, etc.
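The degree column is the relative frequency scaled by 360. A small sketch, starting from the rounded decimals in the table:

```python
# Relative frequencies (as decimals) from the table.
decimals = {"Questioner": 0.23, "Rebel": 0.18, "Obliger": 0.5, "Upholder": 0.09}

# A full circle is 360 degrees, so each slice gets its share of 360.
degrees = {response: round(d * 360) for response, d in decimals.items()}

for response, angle in degrees.items():
    print(f"{response}: {angle} degrees")

print("Total:", sum(degrees.values()), "degrees")
```

With these numbers the rounded angles happen to total exactly 360. With other data, rounding can leave the total a degree or two off.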
Bar charts are good for comparing frequencies. Pie charts are usually less clear.
But pie charts can be very dramatic. For example, here is a famous pie chart of how five big tech companies dominated the S&P 500 in the year 2018.
Pie charts can also help compare just a few things, especially when those few things change over time. For example, these charts compare the popular votes and Electoral College votes of presidential elections between 1860 and 2012.
Applications of relative frequency are usually percent of... style problems. We solve them when we multiply the relative frequency by some other number.
Let's do three examples using our pet data. Here is that relative frequency table again.
Animal  Counted Frequency  Relative Frequency 

cat  6  29% 
fish  3  14% 
bird  5  24% 
horse  2  10% 
dog  4  19% 
unicorn  1  5% 
We see that 29% of the votes were for "cat".
So we ask, "What is 29% of 80?"
Then 80 × 0.29 ≈ 23 days with cat facts during that term
We see that 29% of the votes were for "cat".
So we ask, "What is 29% of 35?"
Then 0.29 × 35 ≈ 10 photos of cats on the wall
We see that 19% of the votes were for "dog".
So we ask, "What is 19% of 252?"
Then 0.19 × 252 ≈ 48 dog cookies
We can view relative frequencies as scale factors. We scaled the total days of the term, and the total pictures on the wall.
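The three "percent of" answers above all follow one pattern, sketched here as a tiny helper. The function name `percent_of` is made up for this illustration.

```python
def percent_of(relative_frequency, amount):
    """Scale an amount by a relative frequency given as a decimal."""
    return round(relative_frequency * amount)

cat_days = percent_of(0.29, 80)      # days with cat facts in an 80-day term
cat_photos = percent_of(0.29, 35)    # cat photos among 35 photos on the wall
dog_cookies = percent_of(0.19, 252)  # dog cookies among 252 cookies

print(cat_days, cat_photos, dog_cookies)
```

Only the second argument changes from problem to problem, which is the sense in which a relative frequency acts as a scale factor.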
Relative frequencies have a lot more practical application than counted frequencies. No one cares that the classroom happened to have 21 students in attendance on the day the teacher asked about favorite pets. The number 21 is unimportant. What might be important is scaling the relative frequencies to fit a different situation.
Notice that we can order the categories in our example however we want. There is no inherent ordering to pet choices. This is an example of categorical data. The frequencies belong to categories that lack a naturally preferred order.
Categorical data happens all the time in real life.
Examples of Categorical Data
What are the populations of various states?
The states are the categories, and the populations the frequencies. We could sort the states alphabetically, by when they joined the union, by increasing or decreasing population size, etc.
How many students in a classroom have black, brown, green, or blue eyes?
The colors are the categories, and the counts of students with each eye color are the frequencies. We could sort the eye colors alphabetically, in the order they first appear in the survey answers, by increasing or decreasing frequency, etc.
In an extended family, how many people can play different types of musical instruments?
The instruments are the categories, and the counts of how many family members can play each instrument are the frequencies. We could sort the musical instruments alphabetically, by physical size, by increasing or decreasing frequency, etc.
Some categories could be arranged in multiple ways without too much fuss, but there is a "most natural" way to order the categories. This situation is named ordinal data.
Ordinal data also happens all the time in real life.
Examples of Ordinal Data
How many students are in each grade level at a certain elementary school?
The grade levels are the categories, which are almost numbers but also include kindergarten. The counts of students in each grade level are the frequencies. Any ordering other than starting at kindergarten and then increasing looks wrong.
In a college writing class, how many students earned each letter grade?
The letters are the categories, and the counts of students for each letter grade are the frequencies. Any ordering other than F, D, C, B, A looks wrong.
Chart the highest level of education earned by people in a certain city.
The categories are probably named something like "never completed high school", "completed high school", "two-year college degree", "four-year college degree", "graduate degree", and "completed postgraduate studies". The counts of people with each as their highest are the frequencies. Any ordering that does not start "never completed high school" and work its way up through "completed postgraduate studies" looks wrong.
Finally, many situations have numerical data in which the categories themselves are numbers. With numerical data the horizontal axis of a bar chart looks like a number line.
Numerical data also happens all the time in real life.
Examples of Numerical Data
How many calories do I eat on each day of January?
The day numbers are numeric categories, and the counts of calories each day are the frequencies. The horizontal axis is a number line from 1 to 31.
Everyone in a math class scored between 70 and 100 on a test. How many students got each score?
The scores are numeric categories, and the counts of students for each score are the frequencies. The horizontal axis is a number line from 70 to 100.
In a certain neighborhood home prices range from $100,000 to $180,000. Each home value is rounded to the nearest $10,000. How many homes have each price?
The prices are the categories, and the counts of homes with each price are the frequencies. The horizontal axis is a number line from $100,000 to $180,000 counting with intervals of $10,000.
At this point we have established some vocabulary and can make frequency tables and bar charts. That can help us present data clearly. But for most applications we need more tools.
When we make a bar chart with numerical data something special can happen.
Definition
A histogram is like a bar chart, but instead of comparing several unordered categorical data types it shows how a single quantitative data type is distributed among ordered "bins" each with a range of values.
Histograms may have their vertical axis labeled with either counted or relative frequency.
In a histogram, the categories can be numeric values, such as ages, weights, heights, test scores, etc. When the categories are numeric they will have a natural order from smallest to greatest.
Not all histograms have numeric categories. Common examples include populations for different countries, costs of living in different cities, popularity of different foods, etc.
In a histogram, the height of the bar counts how many things are in each category. So the height is always numeric.
Below is a chart that shows midterm scores for a Math 25 class back in Winter term 2016.
Notice that each student in the class has his or her own piece of bar height somewhere on the histogram. Bars that are 1 high are each representing a certain student. Bars that are 2 high are each counting two students. And so on. We could point to any unit of bar height and sensibly ask, "Which student in the class is this?"
The word "inclusive" tells us to include both the 80% bar and the 90% bar. So we are being asked to total the heights of all the yellow bars.
1 + 1 + 2 + 3 + 2 + 1 = 10 students
We are being asked to total the heights of all the bars. Let's group them by color just to help avoid making a careless mistake.
(1 + 1 + 1 + 1) + (1 + 1 + 1) + (1 + 1 + 2 + 3 + 2 + 1) + (3 + 2 + 1) = 23 students
The histogram above was also color-coded to group the categories into D, C, B, and A grades. But color-coding is not normally a part of histograms.
All histograms are designed. The histogram above uses categories of size 1.7% (starting at the lowest score of 48%). This visually sorts the students a certain way. The color coding emphasizes that there is a green A subgroup that did amazing, a big yellow B subgroup that did well, three orange C students who squeaked by, and four blue D students who definitely need improvement.
If the instructor instead used wider categories of size 5% the histogram would change. Below is exactly the same set of test scores, but with wider categories and altered color-coding.
Their new category appears to be a low A, which is a big improvement from the histogram above where their category appeared to be a middle B. They would like the second histogram much better!
In the new histogram he is in the category "78% to almost 83%". This puts him at the highest score in the yellow B category. He would also prefer the second histogram.
In the new histogram she is grouped with a student who did slightly worse. Both are together in the category "73% to almost 78%". This puts her at the lowest score in the yellow B category. She would also prefer the second histogram.
Mathematicians call histogram categories bins. It helps to picture the sections of the x-axis as physical bins that items are put into. That reminds us that designing the bins is a choice with consequences.
Someone who is careless, and makes a histogram without thinking carefully about the categories, is still making a choice! It is an unintentional and sloppy choice. Hopefully it does no harm.
Please remember to pay attention to histogram bins. Perhaps the person who made the histogram has an agenda. Perhaps that person is sleepy and not paying attention to why bin size matters. Perhaps that person is you!
Person  Finger Snaps in 15 Sec 

Annette Azzlemahm  40 and 46 
Bronson Boldstock  54 and 57 
Clarabelle Crinkmack  47 and 48 
Dexter Dazzlespout  48 and 45 
Eliza Ebbletack  40 and 45 
Frederick Fiddlefeather  52 and 46 
Ginger Gupperworth  52 and 60 
Heathcliff Hablingford  57 and 20 
Your Attempts 
Bin  Frequency (raw count)  Relative Frequency (as decimals)  Relative Frequency (as percentages) 

20 to ?  
? to ?  
? to ?  
? to ?  
? to ?  
? to 60  
Total 
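To see how much the choice of bins matters, here is a sketch that sorts the sixteen finger snap counts from the table above into bins two different ways. The bin widths of 10 and 5 and the helper name `bin_counts` are assumptions chosen for illustration. They are not the bins intended for the blank table.

```python
from collections import Counter

# Both attempts from each of the eight people in the table.
snaps = [40, 46, 54, 57, 47, 48, 48, 45, 40, 45, 52, 46, 52, 60, 57, 20]

def bin_counts(values, start, width):
    """Count values per bin; each bin is labeled by its lowest value."""
    tally = Counter(start + ((v - start) // width) * width for v in values)
    return dict(sorted(tally.items()))  # empty bins are simply left out

wide = bin_counts(snaps, start=20, width=10)   # hypothetical bins of width 10
narrow = bin_counts(snaps, start=20, width=5)  # hypothetical bins of width 5

print("width 10:", wide)
print("width 5: ", narrow)
```

With the wider bins most attempts land in a single 40s bin. The narrower bins split that clump apart. It is the same data, but the picture is different.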
Khan Academy
OCLPhase2
Finding the mean of a data set
Finding the mean of a data list
Finding the mean from a frequency table
The trickiest kind of frequency tables measure how two different effects overlap.
That kind of overlap is called contingency. So these are named contingency tables.
They are easiest to understand with an example.
A survey of seventy LCC students asked about exercise and whether they homeschooled kids in 2020.
How Much are you Exercising?  Homeschooling  Not Homeschooling  Total 

Not at all  16  10  26 
A little, but not enough  10  18  28 
Yes, enough  2  14  16 
Total  28  42  70 
Notice that contingency tables will not normally have a relative frequency column or row. That looks too crowded. But we can still find that type of information.
percentage = part ÷ whole = 28 ÷ 70 = 0.4 = 40%
The tricky part of using contingency tables is that so many different kinds of questions can be asked. This means that when finding relative frequency with percentage = part ÷ whole we need to be really careful about what is "part" and what is "whole".
The whole amount is quite often implicitly the grand total.
percentage = part ÷ whole = 2 ÷ 70 ≈ 0.029 ≈ 3%
The whole amount can also be smaller than the grand total.
percentage = part ÷ whole = 2 ÷ 28 ≈ 0.07 ≈ 7%
Stay alert!
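The three percentages above differ only in which numbers play the roles of "part" and "whole". The sketch below recomputes them from the table. Reading 28 as the homeschooling column total is an assumption, since the "A little, but not enough" row also totals 28.

```python
# The contingency table from the text, without its total row and column.
table = {
    "Not at all": {"Homeschooling": 16, "Not Homeschooling": 10},
    "A little, but not enough": {"Homeschooling": 10, "Not Homeschooling": 18},
    "Yes, enough": {"Homeschooling": 2, "Not Homeschooling": 14},
}

grand_total = sum(sum(row.values()) for row in table.values())
# Assumption: the 28 used in the text is this column total.
homeschooling_total = sum(row["Homeschooling"] for row in table.values())

def percent(part, whole):
    return round(100 * part / whole)

enough_and_homeschooling = table["Yes, enough"]["Homeschooling"]

print(percent(homeschooling_total, grand_total))               # whole = everyone
print(percent(enough_and_homeschooling, grand_total))          # whole = everyone
print(percent(enough_and_homeschooling, homeschooling_total))  # whole = homeschoolers only
```

The last two lines use the same part but a different whole, and the answer more than doubles.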
Conquering the Negativity Instinct
The second chapter of Factfulness discusses ways to avoid negativity. Hans Rosling advises us that good situations, especially gradual improvements, are seldom reported. So most news is bad news—and when we hear bad news we should ask both "What good situation was not reported?" and "Is this bad situation, although bad, getting better?"
When studying math two attitudes (not from the book) counteract negativity.
The first attitude is confidence. This word has a special meaning in the setting of personal growth, including any college class.
If we knew a situation would have success, we would have certainty. If we were less sure but had a lot of hope for success, we would have optimism.
If we instead embraced the uncertainty, recognized that life is more complex than success or failure, and acted with the expectation that the situation will be worthwhile for teaching us something about life or ourselves—that is confidence.
Be confident! One lesson of weighted average math problems is that worthwhile situations have many possibilities, and we can consider them all without focusing on success or failure.
In the words of Mark Manson:
Happiness comes from solving problems. The keyword here is "solving." If you're avoiding your problems or feel like you don't have any problems, then you're going to make yourself miserable. If you feel like you have problems that you can't solve, you will likewise make yourself miserable. The secret sauce is in the solving of the problems, not in not having problems in the first place.
When the standard of success becomes merely acting—when any result is regarded as progress and important, when inspiration is seen as a reward rather than a prerequisite—we propel ourselves ahead.
Because here's something that's weird but true: we don't actually know what a positive or negative experience is. Some of the most difficult and stressful moments of our lives also end up being the most formative and motivating. Some of the best and most gratifying experiences of our lives are also the most distracting and demotivating. Don't trust your conception of positive/negative experiences. All that we know for certain is what hurts in the moment and what doesn't. And that's not worth much.
The second attitude is holistic philosophy.
Our earliest understanding of self is based on self-observation and information from authority figures. A girl's parents tell her "You love to dance!" when she was three years old. She had not really bothered to think about it, but they were right.
Cartesian philosophy breaks wholes into parts for understanding. This can allow a different understanding. Dancing involves moving lots of bones and muscles. The joy a dancer feels is part of endorphins and other aspects of brain chemistry.
(But it would be dreadful to assume the Cartesian thought must somehow oppose or debunk the earlier type of thought. No one would say, "Little girl, your dancing is just bones and muscles moving. Your joy is just brain chemistry.")
Systems Theory philosophy views parts in networks. Once the girl who loves dance is a little older she starts to understand how she is part of a family, and a school, and a community, and a dance class, etc.—and what those connections offer her and what she offers to others.
Quantum Mechanical philosophy teaches us to see things as clouds of possibilities. That girl as a young teen wants to grow up to be a dance teacher. She might! But maybe she will also become a writer. That would also be nice. And so on. All those possibilities are a part of her. But not everything is a possibility. She is not going to grow up to be an umbrella, or a shepherd in Alaska.
Modern philosophy notices that Systems Theory philosophy always looks from the outside at a network. If we asked the young woman who loves dancing what she herself thinks about her network, what would she say? What does her network think of her? Then words like "justice", "inspiration" and "integrity" appear that do not have a place in Systems Theory.
All these philosophies can exist together in a holistic and complementary way. For not only does our fictional young woman see herself in all these ways, but she wants others to see her in all those ways too. She has a certain height and appearance. Her years of ballet have done some harm to her ankles and toes, which affects her today. She is a mother, teacher, and friend. She still has dreams and possibilities. She helps inspire people to better understand themselves and their community.
It is often a challenge to see other people holistically. But doing so—considering their appearance, parts, networks, possibilities, and internal thoughts and points of view—is a key part of treating other people as we treat ourselves.
Be holistic! One lesson of weighted average math problems is that people are a mix of many possibilities, and it helps to consider a person as a cloud of possible future versions of themselves.
Watch this video of Ryan Hayashi's coin magic. He is uncertain! He is nervous! His hands shake like crazy! But he is utterly convinced that the situation is worthwhile and meaningful, and has moved beyond thinking about success or failure. And in response, for the first and only time, Penn and Teller tell a magician that they got so drawn into his act, and wrapped up in his confident energy, that they actually forgot to keep analyzing his routine.
Watch Mindwalk for a relaxing overview of the evolution of philosophy. Note that the film is from 1990 and thus stops with Systems Theory philosophy.
Pattern blocks come in several shapes.
The green triangle is smallest.
Two green triangles make a blue diamond. (If you want, be especially mathy and call it a rhombus.)
Three green triangles make a red trapezoid.
Six green triangles make a yellow hexagon.
Pattern blocks can be used for many kinds of math activities.
If we call a green triangle "one", then we can teach multiplication. We can physically model 2 × 6 = 12 by asking how many green triangles are needed to cover two yellow hexagons.
If we call a yellow hexagon "one", then we can teach fractions. We can physically model 12 × ^{1}⁄_{3} = 4 by asking how many yellow hexagons are covered by twelve blue diamonds.
But we are going to use pattern blocks while being a bit more philosophical. Instead of explaining arithmetic, we want to investigate language.
My scale tells me that a green triangle pattern block weighs 1.5 grams.
Hm. What do we mean by typical? Discuss this with your classmates and develop an answer that you are prepared to defend before the class, using the pattern blocks as props if that helps.
There is no right answer. The definition of the word typical is a somewhat vague language issue.
However, people's answers tend to focus on three slightly different ideas:
Some people focus on what appears the most often.
Those people pick the blue block, because we have six of those. So a "typical" block would be blue, and would weigh 1.5 grams × 2 = 3 grams.
Some people throw out items that only appear once or twice. Those items do not appear often enough to be considered representative of the group. They focus on what appears the most representative.
Those people also pick the blue block. But not because of how many blue blocks there were. Instead, because they threw out the green and red blocks. Again a "typical" block would be blue, and would weigh 1.5 grams × 2 = 3 grams.
Some people feel a need for either inclusiveness or precision, and take an average. They sum the values and then divide by how many values there were.
Those people would not pick a kind of block. Instead they do more calculation. The total weight is (1.5 × 2 greens) + (3 × 6 blues) + (4.5 × 1 red) = 25.5 total grams. Then dividing by how many blocks there are gives an average weight of 25.5 total grams ÷ 9 blocks ≈ 2.8 grams.
Note that the first and second kind of people would say a blue block is typical. But the third kind of people would say that a blue block is a bit more than typical. Does that matter?
The people that focus on most often would still pick the blue block, because we have six of those, and still say 3 grams.
The people that focus on most representative would still pick the blue block, because we have only one or two of the other kinds, and still say 3 grams.
The people that calculate the average would now get a total weight of (1.5 × 2 greens) + (3 × 6 blues) + (4.5 × 1 red) + (9 × 2 yellows) = 43.5 total grams, which would change the average weight to 43.5 total grams ÷ 11 blocks ≈ 4 grams.
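The "most often" and "average" answers can be checked side by side. In this sketch the block weights come from the text (a green triangle is 1.5 grams and the other shapes are multiples of it), while the names `first_pile`, `second_pile`, `most_often`, and `mean` are made up for illustration.

```python
from collections import Counter

# Weights in grams: blue = 2 greens, red = 3 greens, yellow = 6 greens.
GREEN, BLUE, RED, YELLOW = 1.5, 3.0, 4.5, 9.0

first_pile = [GREEN] * 2 + [BLUE] * 6 + [RED] * 1
second_pile = first_pile + [YELLOW] * 2  # the same pile plus two hexagons

def most_often(weights):
    """The weight that appears the most times."""
    return Counter(weights).most_common(1)[0][0]

def mean(weights):
    """Sum the values, then divide by how many values there are."""
    return sum(weights) / len(weights)

for pile in (first_pile, second_pile):
    print(most_often(pile), round(mean(pile), 1))
```

Adding the two yellow hexagons leaves the "most often" answer at 3 grams but pulls the average up to about 4 grams.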
There is no right answer. We can disagree about the definition of the word typical.
It was perhaps Benjamin Disraeli who first said, "There are three kinds of lies: lies, damned lies, and statistics." That quotation is often said to imply that statistics can be purposefully used to mislead.
But the point to our investigation of the word typical is how people can get different answers without any purposeful deceit. Statistics can be naturally uncertain because language is naturally ambiguous.
Fortunately, mathematicians have tools to remove the uncertainty and ambiguity.
Certainly we want some math terms to help us recognize when someone might be misleading us with statistics.
But more important is to develop an intuition about what happens when real life is not as typical as the formulas expect.
The formulas and guidelines are accurate and reliable! But the ones about home ownership assume a typical home, and the ones about saving for retirement assume a typical life progression from younger (with little income) to older (with more income). In reality, homes and lives are all unique, and future home prices and salaries are guesses. We plug fuzzy numbers into reliable formulas.
Good news! Math can still help us, even when we do not fully trust the numbers. Let's see how.
I see that the life of this place is always emerging beyond expectation or prediction or typicality, that it is unique, given to the world minute by minute, only once, never to be repeated. That this is when I see that this life is a miracle, absolutely worth having.
 Wendell Berry
That is the crowning unlikelihood, the thermodynamic miracle...Come, dry your eyes, for you are life, rarer than a quark and unpredictable beyond the dreams of Heisenberg; the clay in which the forces that shape all things leave their fingerprints most clearly.
 Watchmen, Chapter IX
The average we used above with pattern blocks is formally called the mean.
The Mean
To find the mean of a group of numbers, first add up all numbers and then divide by how many numbers are in the group.
When people say "average" it should be safe to assume they are talking about the mean, unless they say otherwise.
The mean is the most commonly used average because in many everyday situations the mean does what we want. It looks at a set of numbers and provides an answer close to most often but more accurate, and close to most representative but more inclusive.
First we add up all the numbers.
34 + 36 + 36 + 36 + 37 + 37 + 38 + 39 + 39 + 40 + 40 + 41 = 453 pounds
Then we divide that total by 12, because there are 12 students.
453 pounds ÷ 12 = 37.75 ≈ 38 pounds
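The same two steps, add then divide, look like this as a short sketch. The twelve weights are the preschool weights used in this example.

```python
# Weights of the twelve preschoolers, in pounds.
weights = [34, 36, 36, 36, 37, 37, 38, 39, 39, 40, 40, 41]

total = sum(weights)          # first add up all the numbers
mean = total / len(weights)   # then divide by how many numbers there are

print(total, "pounds in total")
print(mean, "pounds, which rounds to", round(mean))
```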
Notice why the answer to the previous problem feels right. It is close to what number appears most often. Also, looking at the histogram it seems most representative.
That last point is important. The mean is where a histogram balances if it were a measuring scale.
First we add up all the numbers.
2 + 2 + 6 + 10 = 20
Then we divide that total by 4, because there are 4 blocks.
20 ÷ 4 = 5
(We can also see this illustration in a different way. The two blocks on the left side are each 3 spots from the center, for a total left hand weight of 6. The two blocks on the right side are 1 and 5 spots from the center, for a total right hand weight of 6. The balance spot is accurate because the left and right hand weights both total six.)
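The balance idea can be checked numerically: the distances from the mean on the left should total the same as the distances on the right. A minimal sketch for the four blocks, with the variable names invented here:

```python
# Positions of the four blocks along the scale.
positions = [2, 2, 6, 10]

mean = sum(positions) / len(positions)

# Total distance from the balance point on each side.
left = sum(mean - p for p in positions if p < mean)
right = sum(p - mean for p in positions if p > mean)

print(mean, left, right)
```

The two sides match for any list of numbers, not just this one, which is why the mean works as a balance point.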
When we want to talk about averages, not all groups of numbers are equally friendly. Here are four histograms that show test scores for a History class.
The class shown in histogram A has two subgroups of students. About half the students did poorly on the test, and half did well. We foolishly could calculate the mean of this group of numbers but doing so would be inappropriate and misleading. There are really two subgroups, each with their own typicality.
The class shown in histogram B has two subgroups of students, and test scores are even more extreme. In this class the students tended to do really terrible or really amazing. Very few were in between. As before, we foolishly could calculate the mean of this group of numbers but doing so would be inappropriate and misleading because there are really two subgroups with their own typicalities.
The class shown in histogram C does not have subgroups. This histogram has one big clump. For this class we sensibly could calculate the mean of this group of numbers. Doing so is appropriate and helpful. The class has meaningful typicality.
The class shown in histogram D has neither one big clump nor multiple subgroups. In this class the student scores look almost random. For this class we hesitantly could calculate the mean of this group of numbers. Doing so is in theory appropriate because the students are in a single, meaningful group. But it is probably not helpful. The class lacks typicality because no score is most common or most representative of the random mess. Yes, its single group has an average, but so what? For what purpose are we trying to condense this mess of test scores into a single summary number?
There is a name for how those four histograms differ in obvious and important ways.
The name comes from considering a question. If the mean was representative of a standard value for the group, how much do all the numbers in the group deviate from that standard?
Standard Deviation
The measure of how poorly a group of numbers forms a single, meaningful clump is called standard deviation.
The formula for calculating standard deviation is not a part of this math class. But without a formula you can visually sort the four histograms.
Histogram C has the smallest standard deviation because in that group the numbers huddle in one big clump. They deviate only a little bit from the mean.
Histogram D has a bigger standard deviation because in that group the numbers look random. They are neither pulled toward nor pushed away from the mean.
Histogram A has a big standard deviation because in that group the numbers look pushed away from the mean. They are forming a pattern away from a single standard value.
Histogram B has the biggest standard deviation because in that group the numbers look almost allergic to the mean. They are huddling against the far edges, as if avoiding being anywhere near a single standard value.
To repeat, the concept of standard deviation is important. Not all groups of numbers have a meaningful representative average. It is useful to be able to talk about this, to know there is a measure for this, and to realize you can visually sort using this.
If you ever actually wanted to know the value of the standard deviation for a group of numbers, you could always use an online tool.
The four standard deviations are:
 Group A: 27
 Group B: 37
 Group C: 11
 Group D: 26
Yes, our visual sorting was correct. Also notice how much lower the value for group C is than the others! It is the only group of test scores for which the average value is clearly meaningful.
All four of these histograms were mostly symmetric. What happens if our group of numbers has some atypically low or high values that make its histogram unsymmetric?
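If you would rather not use an online tool, a few lines of Python can do the same job. This is only a minimal sketch. The two score lists are made up for illustration (they are not the data behind histograms A through D), and `statistics.pstdev` is Python's built-in population standard deviation.

```python
from statistics import mean, pstdev

# Hypothetical test scores, invented for illustration only.
clumped = [68, 70, 71, 72, 72, 73, 74, 75, 76, 79]    # one big clump
edges = [20, 22, 25, 27, 30, 95, 96, 97, 98, 100]     # huddled at the far edges

for name, scores in [("clumped", clumped), ("edges", edges)]:
    # pstdev treats the list as the whole group (population standard deviation).
    print(name, "mean:", round(mean(scores), 1), "std dev:", round(pstdev(scores), 1))
```

The clumped list should report a much smaller standard deviation than the list whose scores avoid the middle.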
Let's return to that preschool classroom in which the students' weights are 34, 36, 36, 36, 37, 37, 38, 39, 39, 40, 40, and 41 pounds.
Now three of the preschoolers insist their favorite stuffed animals, with weights 1, 1, and 2 pounds, also be included.
Those new weights are atypically low. They make the histogram unsymmetric!
First we add up all the numbers.
34 + 36 + 36 + 36 + 37 + 37 + 38 + 39 + 39 + 40 + 40 + 41 + 1 + 1 + 2 = 457 pounds
Then we divide that total by 15, because there are 15 "friends".
457 pounds ÷ 15 ≈ 30 pounds
We calculated that answer correctly. But the answer feels very wrong. The number 30 does not describe what is most common, and is not representative of anything in the room.
It feels wrong that the stuffed animals have such a big influence. What can we do better?
We need a different kind of average. What if, instead of doing any calculation, we simply picked the middle number of a sorted list?
The Median
To find the median of a group of numbers, first sort the list of numbers in order and then pick the middle number in that sorted list.
If the list has an even number of values then there will be no middle value. Instead we find the mean of the two values most in the middle.
First we sort the list: 1, 1, 2, 34, 36, 36, 36, 37, 37, 38, 39, 39, 40, 40, 41
The list has 15 values, so there is a single middle value. It is the eighth number, 37 pounds.
This answer feels right. There are indeed preschoolers who weigh 37 pounds. The number 37 does an okay job showing what is most common, and is representative for that classroom.
That median is less than the mean when we did not include any stuffed animals. But it feels okay that the stuffed animals have a measurable but not dramatic influence.
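The comparison between the two kinds of average can be checked with a short sketch. It uses the twelve preschooler weights and the three stuffed animal weights listed above. The variable names are only illustrative.

```python
from statistics import mean, median

preschoolers = [34, 36, 36, 36, 37, 37, 38, 39, 39, 40, 40, 41]
stuffed_animals = [1, 1, 2]
everyone = preschoolers + stuffed_animals

for label, weights in [("preschoolers only", preschoolers),
                       ("with stuffed animals", everyone)]:
    # median() sorts the list itself and averages the two middle values
    # when the count is even.
    print(label, "mean:", round(mean(weights), 2), "median:", median(weights))
```

Adding the stuffed animals pulls the mean down by about seven pounds, while the median barely moves.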
Notice that the process of finding the median throws out any atypical smallest or largest values. This is the type of average for that second category of people who felt most natural defining "typical" by throwing out suspiciously extreme or rare amounts.
In other words, the median focuses on the most representative numbers of the group. As the previous problem demonstrated, the median can also do a respectable job at estimating the most common.
You can think of the median as ignoring the most extreme values on a histogram and then finding where the rest of them balance. That is not quite how the median works, especially if the group of numbers is small. But in the real life situations where the median is used it will ideally behave that way.
The median is appropriate and often used for company salaries, state incomes, neighborhood house values, and other situations where the lowest and highest values really should not be thought of as representative of the group.
Just for the sake of completeness, know there is a third kind of average called the mode.
The Mode
The mode of a group of numbers is the number that appears most often. If there is a tie, all ties are modes.
(On a histogram, the modes are the tallest bars.)
The only number to appear three times is 36. So the mode is 36 pounds.
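For completeness, here is a small sketch of finding modes in Python. It assumes Python 3.8 or later, where `statistics.multimode` returns every value tied for most frequent. The second list is a made-up example of a tie.

```python
from statistics import multimode

weights = [34, 36, 36, 36, 37, 37, 38, 39, 39, 40, 40, 41, 1, 1, 2]

# multimode returns every value tied for most frequent, as a list.
print(multimode(weights))          # the classroom weights
print(multimode([1, 1, 2, 2, 3]))  # a made-up tie, so both values are modes
```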
The mode is so very rarely used in real life that it is only taught in math classes because of tradition. It will not appear in homework or a test.
The mode can count the "popular vote" in an election. But it focuses too much on most common and as a result does a terrible job of measuring most representative.
Try these ten exercises on scratch paper. Work in a study group if you can! Notice where your notes need improvement. After you are very happy with your answers, you can use this form to ask me to check your work. Can you get at least 8 out of 10 correct?
1. Find the mean of these six numbers: 135, 95, 11, 5, 33, 15.
2. Continuing the previous problem, find the median of those six numbers.
3. The bar chart below (original source) shows the number of books read by different children. What is the mean number of books read?
4. Continuing the previous problem, what is the median number of books read?
5. The home values on a certain street, in thousands of dollars, are: 384, 364, 342, 346, 360, 356, 265, 417, and 530. What is the mean of these home values?
6. Continuing the previous problem, what is the median of those home values? Why does this type of average better communicate the "typical" value of a home on that street?
7. A shipping company needs to transport seven freight containers. Their weights are 10, 16, 16, 18, 20, 60, and 77 tons. What is the mean and median weight of these freight containers?
8. Two company clerks receive a report that only contains the mean and median weights, and number of containers, from the previous problem. The first clerk tries to find the total weight by multiplying the mean by the number of containers. The second clerk tries to find the total weight by multiplying the median by the number of containers. Which clerk is correct? Why? How much error does the other clerk have?
9. Some news articles make a big deal when many countries have an average temperature increase well above global average (Alaska, Canada, Russia, Norway, Finland, Switzerland, China, Singapore, Australia, South Africa, etc.) How does a better understanding of averages explain that having many items above average is neither surprising nor sensationalism?
10. During the 2007 strike of the Writers Guild of America, two different news reports painted very different pictures of these screen and television writers.
• According to CNBC, there were 4,434 guild writers who worked full-time in 2006, and their average salary was $204,000. (CNBC headline, October 11, 2007)
• According to the Los Angeles Times, the median income of the writers from their guild-covered employment is $5,000 a year. (Howard A. Rodman, October 17, 2007)
Were Hollywood's writers very wealthy and going to strike even though they earned much more than most Americans? Or were they poor and going on strike to defend the few thousand dollars they could earn from their writing? The headlines leave out two important facts. First, almost half of the guild's writers don't write anything in a given year (their salary that year is $0). Second, a very few writers earn millions of dollars. How does a better understanding of averages explain the situation more clearly?
Try these exercises on scratch paper. Work in a study group if you can! Notice where your notes need improvement. Check your work when you are done.
Conquering the Generalization Instinct
The sixth chapter of Factfulness includes many examples of when sorting information into the wrong "bins" causes incorrect conclusions. Hans Rosling describes several ways this happens.
Bins might hide differences. A bar graph showing the average income with a bin for each country has no practical value. Within each country are dramatic differences in income. Those bins attempt to group together people who are actually too different.
Bins might hide similarities. Everywhere in the globe, the main factor that affects how people live is their income level, not their country, culture, or religion. A bar graph showing literacy rates with a bin for each religion is promoting stereotypes. The meaningful comparison is literacy rates with income level. If bins about religion make any trend it would only be a side effect of how religions and income levels match up.
Bins might measure averages. As we have seen, many groups of numbers have no meaningful average. No single number is "most common" or "most representative". If college students tend to equally drink no tea or a lot of tea, then a histogram showing the average number of cups of tea drunk by the students in different classrooms will have valid bins (each classroom) but meaningless heights (no students actually drink what looks like a common medium amount).
Bins might measure majorities. Many situations have no meaningful majorities. Most math students get a majority of questions correct on their math tests. But test scores of 51% and 91% are hugely different!
Bins might measure something meaningless. A high school could publish a histogram of its graduates' SAT scores. But those SAT scores do not reliably predict anything, so the histogram has no meaningful implications.
Bins might make generalizations. A bar graph might show what percentage of people in each country have annual dental cleanings. But different cultures place different emphasis on dental care. That statistic might meaningfully represent one country's overall health care, but say nothing meaningful about another country.
Only special cases might be included in the bins. A histogram showing the test scores of all a high school's math students would look very different from a histogram showing the test scores of the students in the AP math class.
Heritability describes something about a population of people, not an individual. It makes no more sense to talk about the heritability of an individual's IQ than it does to talk about his birthrate.
 Richard Herrnstein
Some histograms form one very symmetric clump. These are called bell curves because they are shaped like a bell. They are sometimes instead called normal curves or a normal distribution.
Bell curves can happen when an average result is the most common, being higher or lower than average is equally likely, and being way higher or lower could happen but is rare.
Repeatedly finding the sum of two dice will create a bell curve.
Notice that making a bell curve reliably requires counting a very large number of things. The histogram for each student group might not look at all like one symmetrical clump. But after combining the entire classroom's counts we do get a bell curve.
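The dice experiment can also be simulated. This is a rough sketch, not part of the class activity. The seed and the number of rolls are arbitrary choices, and the sideways text histogram stands in for the classroom chart.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the sketch is repeatable; any seed works
rolls = 36_000  # arbitrary choice; large enough for the shape to settle

# Count how often each sum of two dice appears.
counts = Counter(random.randint(1, 6) + random.randint(1, 6) for _ in range(rolls))

for total in range(2, 13):
    # One "#" per 200 rolls gives a sideways text histogram.
    print(f"{total:2d} {'#' * (counts[total] // 200)}")
```

Try lowering `rolls` to 20 or 30 to imitate a single student group. The shape usually becomes lumpy, which is why combining the whole classroom's counts matters.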
We saw how a random event can make a bell curve. Where else do bell curves naturally happen?
Heights and weights are two natural physical characteristics of people that will form a bell curve if a large enough group of people are measured.
Not many! Natural characteristics that form bell curves can be difficult to find.
Vision is one. Most people have close to "normal" vision with nearsightedness and farsightedness forming the sides of the curve.
Bicep strength is another. Perhaps our class should try dumbbell curls?
Grip strength is another.
Within a homogenous population, blood pressure and life expectancy can be others.
Consider a pair of mostly similar bell curves counting people.
Because the two bell curves have the same area, they both count the same number of people.
Because the two bell curves have the same middle value, the average is the same for both sets of people.
We already have the vocabulary to describe the difference. The blue curve is more spread out, so it has a higher standard deviation.
In the pink bell curve more people are average or nearly average, and fewer are notably higher or lower. Because of how bell curves work, this effect is exaggerated near the middle and extremes. The pink bell curve has a lot more average people, and a lot fewer extremely high or low people.
In the blue bell curve more people are funky. Most are still average or nearly average. But comparatively more are notably higher or lower. Because of how bell curves work, this effect is exaggerated near the middle and extremes. The blue bell curve has a lot fewer average people, and a lot more extremely high or low people.
Both bell curves have mostly average people. Both curves have the same meaningful average that represents both most common and most typical. Neither curve is a random mess.
Both bell curves have some very high or low people. The "tails" of both bell curves remain above zero as they go on and on. Neither curve has a monopoly on extremes.
Imagine these curves measured height. The pink bell curve is measuring a group of people that strongly tend to be about average height: not too many are much taller or shorter, and the group has very, very few giants and dwarves. The blue bell curve is measuring a group of people with less tendency to be average height: many more are taller or shorter than average, and although the group still has very few giants and dwarves, if you happened to see a giant or dwarf it would probably be from the blue group.
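The pink and blue comparison can be made concrete with a small sketch. Everything numeric here is an assumption for illustration: heights in centimeters, a shared mean of 170, and standard deviations of 5 (pink) and 10 (blue). It uses `statistics.NormalDist` from Python 3.8 or later.

```python
from statistics import NormalDist

# Hypothetical heights in centimeters: same mean, different standard deviations.
pink = NormalDist(mu=170, sigma=5)    # tightly clumped group
blue = NormalDist(mu=170, sigma=10)   # more spread out group

def share_between(dist, low, high):
    """Fraction of the group with a value between low and high."""
    return dist.cdf(high) - dist.cdf(low)

for name, dist in [("pink", pink), ("blue", blue)]:
    near_average = share_between(dist, 165, 175)
    extreme = 1 - share_between(dist, 150, 190)
    print(f"{name}: within 5 cm of average {near_average:.1%}, "
          f"more than 20 cm from average {extreme:.3%}")
```

With these made-up numbers the pink group has far more people close to average, and the blue group has far more people at the extremes, even though both groups share the same average.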
Remember that few natural demographic characteristics formed bell curves. We had trouble brainstorming any besides height and weight.
Far more common is when people purposefully construct tests designed to sort people into a bell curve, such as IQ Tests or college entrance exams. Because these tests are designed to make the bell curve happen, they can do it very well. But that comes at the cost of sacrificing doing anything else well. IQ Tests are famous for measuring not intelligence but "whatever it is IQ Tests measure". College entrance exams do not reliably predict whether a student is actually ready for college and will successfully earn a college degree.
Be especially wary if you read about socioeconomic characteristics that form a bell curve!
Very, very few places have bell curve distributions of wealth, income, education, etc. Those bell curves are rare!
But the field of sociology (especially in America) has an alarming track record of expecting these to form bell curves. So researchers use increasingly sloppy methods of data collection or analysis until their expectations are finally met.
Remember that histogram bin choices can change the shape of the histogram. There are other ways that sloppy choices (even if made unknowingly and accidentally) can misrepresent data.
Part of math jargon is giving certain words a very precise definition. The words "average", "function", and "parallel" have a meaning in math that resembles their standard use but is more specific. In other math classes you might learn about special math definitions for the words "commute", "set", "group", or "normal".
Sometimes the opposite happens, and phrases that were originally a precise math definition shift to casual English usage and acquire a broader and fuzzier meaning.
In math jargon, "grading on a curve" means something very specific. The test must be of a kind designed to have its scores produce a bell curve (such as the SAT or ACT college entrance exams). There must be enough scores to make sure that bell curve happens (the way it took many coin flips to make that situation have a bell curve).
Only then would it make sense to use the bell curve histogram to assign grades. Most scores are average and get a C. A smaller number of scores are slightly below or above average and get a D or B (and the number of D's and B's is nearly equal). A very few scores are exceptionally low or high and get an F or A.
When designed properly, and refined over time, tests like these are so predictable that grades can be assigned to scores before the test happens! The previous group of scores will have histograms almost identical to the next group of scores. The test designers know where the distinction between each letter grade will happen even before the next test happens.
All of this is what mathematicians mean by "grading on a curve". The test has such a careful design and well-established history that everyone knows in advance where the distinction between each letter grade will happen.
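To see what "knowing the cutoffs in advance" could look like, here is a hedged sketch. The mean of 500, the standard deviation of 100, and the cutoff distances are all invented for illustration. Real tests choose their own values, and nothing in this class depends on these.

```python
from statistics import NormalDist

# Assumed for illustration: scores follow a bell curve with this mean and
# standard deviation, and letter grades split at these distances from the mean.
scores = NormalDist(mu=500, sigma=100)
cutoffs = {"D": -1.5, "C": -0.5, "B": 0.5, "A": 1.5}  # in standard deviations

def letter_grade(score):
    """Return the letter for a score using the fixed, pre-announced cutoffs."""
    grade = "F"
    for letter, z in cutoffs.items():
        if score >= scores.mean + z * scores.stdev:
            grade = letter
    return grade

for s in [320, 430, 500, 580, 690]:
    print(s, letter_grade(s))
```

Because the cutoffs depend only on the assumed mean and standard deviation, they can be written down before anyone takes the test.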
Needless to say, there are very, very few college tests that fit this model. Only large universities have classes with enough students so that there are enough scores to make a nicely full histogram. Even in those big classes, only a few tests are designed primarily to create a bell curve of scores.
And remember that tests designed to make a bell curve usually cannot also be designed to measure student readiness for future success—most instructors value the latter and try to write tests accordingly.
So when most college instructors say they "grade on a curve" they are probably not using those words as a mathematician would.
Ask those instructors what they do mean by that phrase!
Often an instructor who says that he or she "grades on a curve" is actually using that phrase as part of an attempt to explain that he or she knows the tests will not reliably and predictably form a bell curve, and that he or she knows better than to actually grade on a curve!
On another note, the following image from a book about mental health attempts to show that a normal amount of stimulus is healthy even though excessive stimulus is too stressful. The graph below is not a histogram, and thus not a bell curve! Why not?
How could we change one word to make the graph a bell curve describing a population's stress level?
How could we change one word to make the graph a bell curve describing a person's stress level on different days of their life?
Try these ten exercises on scratch paper. Work in a study group if you can! Notice where your notes need improvement. After you are very happy with your answers, you can use this form to ask me to check your work. Can you get at least 8 out of 10 correct?
The most interesting part of bell curves is how they are used to sort people and shape society. That is not a great source of small math problems! So please pardon a brief tangent into our local contributions towards climate change.
In Oregon, a household's energy usage per month forms a histogram that resembles a bell curve reasonably well for real-life data.
The histogram below shows the monthly electrical usage for a household that has electric heat.
1. In which month does this household use the most electricity?
2. Which month has the least deviation from year to year?
3. Someone asks, "What is this household's typical monthly electrical usage?" What would make a numeric answer to this question meaningful?
4. When did this family replace their old electric furnace with a modern and more efficient heat pump?
5. The total electrical usage for 2017 was 14,780 kilowatt hours. The total electrical usage for 2019 was 11,513 kilowatt hours. How much less electricity was used in 2019?
6. Electricity costs an average of 11.3 cents per kilowatt hour. How much less money was spent on electricity in 2019 than in 2017?
7. What was the percentage decrease of this household's total annual electricity usage when comparing 2017 to 2019?
8. In this city 80 percent of the electric power is from carbon-free hydroelectric energy, making an overall CO2 emission of 16.2 grams per kilowatt hour. How many kilograms of CO2 did this household's electric use create in 2019?
9. A typical gasoline automobile's emission is 8.8 kilograms of CO2 per gallon. This household's car gets 30 miles per gallon. How many miles would they need to drive to equal the CO2 emissions of their annual electricity use?
10. That household actually drives 10,000 miles per year, which is equivalent to about 333 gallons of gasoline. Find this household's total kg of CO2 emissions for house and car, and then divide by 1,000 to convert kilograms to metric tons. Purchasing a carbon offset costs about $14 per metric ton of CO2. What is the value of the carbon offset cost for this household's house and car?
For most households the energy used to grow and transport food is a larger carbon footprint than home heating or vehicle driving. You can use a website such as CarbonFootprint to estimate your own numbers.
The Central Lane Metropolitan Planning Organization has found that in the Eugene-Springfield area the mean household carbon footprint is 31.9 metric tons of CO2, and the mean carbon footprint per person is 13.8 metric tons of CO2.
The ninth chapter of Factfulness talks about blames versus causes. Hans Rosling writes:
The blame instinct makes us exaggerate the importance of individuals or of particular groups. This instinct to find a guilty party derails our ability to develop a true, fact-based understanding of the world: it steals our focus as we obsess about someone to blame, then blocks our learning because once we have decided who to punch in the face we stop looking for explanations elsewhere.
The same instinct is triggered when things go well. "Claim" comes just as easily as "blame". When something goes well, we are very quick to give credit to an individual or simple cause, when again it is usually more complicated.
...It's almost always about multiple interacting causes—a system. If you really want to change the world, you have to understand how it actually works and forget about punching anyone in the face.
Statistics can be used to place blame. For example, Coleman Hughes explains the gaplens and pastlens with fascinating examples.
Bell curves especially are misused by people to try to create villains or heroes. Bell curves can distract us from looking at root causes and systems.
"Did you know that adult chronic criminals, as a group, tend to have unusually low IQs? That must mean that IQ predicts something about misbehavior!" Actually there is no connection between IQ and villains. Instead there is a systemic cause. Men born with chromosomal disorders have a higher than normal chance of developing both behavior problems and lower IQs. Many of these men become adult chronic criminals. The actual issue is whether society can develop a system for dealing with chromosomal disorders, not how it blames low-IQ individuals.
We get to play with more toys. Yay!
Each group of students shares one plastic spinner with a clear base.
We can put a spinner onto a circle like the one below to make a game.
The first circle is a very boring game. Half the time a player wins $1. Half the time a player loses $1.
What we want to focus on is that the game is fair. This means that if someone played it a whole lot—long enough for any rare streaks of good or bad luck to cancel out—then overall they would not gain or lose money.
Pretend the circle is a pie, and we are cutting the pie into slices.
You could think that the +2 needs to happen half as often as the −1.
Or you could think that the −1 needs to happen twice as often as the +2.
Either way, we should give the −1 two pie slices, but only give the +2 one pie slice.
That is a total of 2 + 1 = 3 pie slices. So we cut the pie into thirds. The result looks like this:
Pretend the circle is a pie, and we are cutting the pie into slices.
You could think that the +3 needs to happen one-third as often as the −1.
Or you could think that the −1 needs to happen three times as often as the +3.
Either way, we should give the −1 three pie slices, but only give the +3 one pie slice.
That is a total of 3 + 1 = 4 pie slices. So we cut the pie into quarters. The result looks like this:
Pretend the circle is a pie, and we are cutting the pie into slices.
If we give the +1 and +2 each one slice of pie, the slice for the +2 counts double. We effectively have three slices of pie for winning +1.
This means we need three slices of pie for the −1.
The two winning numbers each get one slice. The −1 needs three slices.
That is a total of 2 + 3 = 5 pie slices. So we cut the pie into fifths. The result looks like this:
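One way to double-check that each spinner is fair is to compute the average winnings per spin, which is sometimes called the expected value. The sketch below assumes every pie slice is the same size. The function name and the dictionary layout are only illustrative.

```python
from fractions import Fraction

def expected_value(slices):
    """Average winnings per spin; slices maps each payout to its slice count."""
    total_slices = sum(slices.values())
    return sum(Fraction(payout * count, total_slices)
               for payout, count in slices.items())

games = {
    "+2 vs -1": {2: 1, -1: 2},            # pie cut into thirds
    "+3 vs -1": {3: 1, -1: 3},            # pie cut into quarters
    "+1, +2 vs -1": {1: 1, 2: 1, -1: 3},  # pie cut into fifths
}

for name, slices in games.items():
    # A fair game has an expected value of exactly 0.
    print(name, expected_value(slices))
```

If a game came out above or below zero, changing the slice counts until it reaches zero would make it fair again.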
The probability of a situation happening is the ratio of desirable outcomes to total outcomes.
Because of tradition, probabilities are usually written as unreduced fractions or changed into percent format.
Problems that involve probability almost always involve a bunch of counting. Usually there are no convenient formulas to help us. We need to make lists or tables to count the outcomes.
A classic example of probability is rolling two dice and adding their values.
Looking at the green boxes on the chart, we see that six out of thirty-six possibilities have a sum of seven.
So the probability is 6/36
We could change this fraction into percent format. 6 ÷ 36 ≈ 0.167 = 16.7%
We would not usually reduce this to 1/6 because that would imply a simpler situation with only six outcomes, of which one outcome is considered desirable.
Looking at the pink boxes on the chart, we see that three out of thirty-six possibilities have a sum of ten.
So the probability is 3/36
We could change this fraction into percent format. 3 ÷ 36 ≈ 0.083 = 8.3%
We would not usually reduce this to 1/12 because that would imply a simpler situation with only twelve outcomes, of which one outcome is considered desirable.
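Counting boxes on the chart can be mirrored in code by listing all thirty-six outcomes and counting the ones we want. This is a minimal sketch. The helper name is made up, and it returns the two counts separately so the fraction stays unreduced.

```python
from itertools import product

# All 36 equally likely outcomes of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))

def probability_of_sum(target):
    """Return (desirable, total) so the fraction stays unreduced."""
    desirable = sum(1 for a, b in outcomes if a + b == target)
    return desirable, len(outcomes)

for target in (7, 10):
    desirable, total = probability_of_sum(target)
    print(f"sum of {target}: {desirable}/{total} = {desirable / total:.1%}")
```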
Imagine there is a gumball machine with equal amounts of three colors of gumballs: red, green, and blue. The table below shows all twenty-seven possibilities for getting three gumballs.
Nineteen of the twenty-seven possibilities have at least one blue gumball.
So the probability is 19/27
We could change this fraction into percent format. 19 ÷ 27 ≈ 0.704 = 70.4%
We would not usually reduce this if we could, because that would imply a simpler situation.
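The gumball table can be rebuilt the same way. This sketch assumes, as the problem does, that each color is equally likely for every gumball.

```python
from itertools import product

colors = ["red", "green", "blue"]

# Every ordered way to get three gumballs when each color is equally likely.
possibilities = list(product(colors, repeat=3))
with_blue = [p for p in possibilities if "blue" in p]

print(len(with_blue), "out of", len(possibilities))
print(f"{len(with_blue) / len(possibilities):.1%}")
```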
Notice that the probability of a certain event is 1 (or 100%).
Notice that the probability of an impossible event is 0 (or 0%).
Notice that the probability of all events must be between 0 and 1 (so between 0% and 100%).
The odds of a situation happening is the ratio of desirable outcomes to undesirable outcomes.
Because of tradition, odds are usually written as reduced fractions. They are not changed into percent format.
Looking at the green boxes on the chart, we see that six out of thirty-six possibilities have a sum of seven, and thirty do not.
So the odds are 6 to 30, which normally would be written reduced to 1 to 5.
Looking at the pink boxes on the chart, we see that three out of thirty-six possibilities have a sum of ten, and thirty-three do not.
So the odds are 3 to 33, which normally would be written reduced to 1 to 11.
Looking at the gumball chart, we see that 19 out of twenty-seven possibilities have at least one blue, and 8 do not.
So the odds are 19 to 8, which cannot be reduced. That is a little better than 2 to 1.
In this class we will always write odds using the word "to". For example, 1 to 5. Other math books, websites, and real-life contexts might use a colon instead, and write the same ratio 1 : 5.
Notice that the odds of a certain event is 1 to 0.
Notice that the odds of an impossible event is 0 to 1.
Notice that the odds of all events could have any two numbers. A lottery can have odds of "a million to one".
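Turning counts into reduced odds is mechanical enough to sketch in code. The function below is illustrative only. It divides both numbers by their greatest common divisor, which is what "reducing" means.

```python
from math import gcd

def odds(desirable, total):
    """Return reduced odds (for, against) from counts of outcomes."""
    against = total - desirable
    divisor = gcd(desirable, against)
    return desirable // divisor, against // divisor

for label, desirable, total in [("sum of seven", 6, 36),
                                ("sum of ten", 3, 36),
                                ("at least one blue", 19, 27)]:
    a, b = odds(desirable, total)
    print(f"{label}: {a} to {b}")
```

The same function gives 1 to 0 for a certain event and 0 to 1 for an impossible event.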
There are a few important probability issues more complicated than simply counting the probability or odds of an outcome happening. To discuss them clearly we must unfortunately introduce more jargon.
The individual and specific ways a situation can happen are called outcomes.
A collection of outcomes that we group together as a single desired result is called an event.
So in our table of rolling two dice and adding their values the six green squares (where we colored a sum of seven) were six different outcomes grouped together as one event.
We could list these six different outcomes if we really wanted: roll 1 and 6, roll 2 and 5, roll 3 and 4, roll 4 and 3, roll 5 and 2, roll 6 and 1.
We cannot list the event, we can only name it: "the sum is seven".
Some events contain a single outcome. In the table above, the event "the sum is 12" only happens with the outcome of rolling two 6s. An event that only contains one outcome is named a simple event.
Many events contain several outcomes. In the table above, the event "the sum is 10 or more" contains six outcomes (the three shaded pink outcomes with a sum of 10 as well as the three outcomes below and to the right of those with higher sums). Those are called compound events.
Notice that in the table above, the event "the sum is 7" is a compound event. There are six shaded green outcomes within this event.
The complete group of all events is named the sample space.
When we roll two dice and add their values, the sample space contains eleven events: "the sum is 2", "the sum is 3", and so on all the way up through "the sum is 12". In the table those events in the sample space are the eleven diagonals going southwest-to-northeast.
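The difference between outcomes and events may be easier to see when the grouping is done explicitly. This sketch sorts the thirty-six outcomes into events named by their sum. The dictionary name is only illustrative.

```python
from itertools import product
from collections import defaultdict

# Group the 36 outcomes of two dice into events named by their sum.
events = defaultdict(list)
for first, second in product(range(1, 7), repeat=2):
    events[first + second].append((first, second))

print(len(events), "events in the sample space")
print("the sum is 7:", events[7])      # a compound event with six outcomes
print("the sum is 12:", events[12])    # a simple event with one outcome
```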
Notice that a real-life situation can have many different sample spaces, depending upon which questions are asked. When rolling two dice, we could ask lots of things about the sum.
Notice that events can overlap. When rolling two dice, we could ask, "What is the chance of rolling 5 or more?" and also ask, "What is the chance of rolling 9 or more?"
Many real-life problems involve a sample space that is completely covered by two non-overlapping events. Either we roll a 7 or we do not. Either we roll an odd number or we do not. We call those two complete yet exclusive options complementary events.
Unfortunately, very similar jargon is used when two events have no predictive effect on each other. For example, when we roll two dice they do not affect each other's results. (In contrast, the weather in two nearby cities will have a predictive effect. Rain in one will mean a higher chance of rain in the other.) Events that contain no predictive information about the other are called independent events.
It is very common for students who are first studying probability to confuse complementary events and independent events. Perhaps the following example will help you remember.
Complementary vs. Independent Events
Whenever Slightly Sleazy Sam goes to a dance he randomly tells half the women he dances with that they have pretty eyes.
The events "he says I have pretty eyes" and "he does not say that" are complementary. (Although neither is genuinely complimentary).
The events "he says I have pretty eyes" and "he says my best friend has pretty eyes" are independent. (He does not realize you two are friends).
When events are not independent we can talk about how much they are conditional on each other: how much the probability of one event changes if we know (either by measuring or assuming) that the other did indeed happen.
An example of conditional events involves home pregnancy tests, which very seldom give false positives but commonly give false negatives. The table below summarizes a study in which 900 women were asked to use a home pregnancy test, and then also used a completely reliable blood test at a doctor's office to double-check.
                    | Pregnant             | Not Pregnant
Positive Home Test  | 322                  | 3 (false positive)
Negative Home Test  | 277 (false negative) | 298
There were 322 people who both were pregnant and had a positive home test. There were 900 people total.
So the probability would be 322 ÷ 900 ≈ 0.36 = 36%.
Unsurprisingly, about twice as many women who suspected they might be pregnant volunteered for this study. That skewed the totals to about 67% vs. 33% for pregnant or not. Among those who were pregnant, the test correctly identified more than half, resulting in our answer of 36%.
There were 322 people who both were pregnant and had a positive home test. There were 325 people with a positive home test.
So the probability would be 322 ÷ 325 ≈ 0.99 = 99%.
The home test is very reliable when it is positive.
There were 277 people who both were pregnant and had a negative home test. There were 575 people with a negative home test.
So the probability would be 277 ÷ 575 ≈ 0.48 = 48%.
The home test is not very reliable when it is negative. There is a 48% chance the result is a false negative!
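The three calculations above differ only in which total goes on the bottom of the fraction. The sketch below recomputes them from the four counts in the study table. The variable and function names are made up for illustration.

```python
# Counts from the study table: (home test result, blood test result).
counts = {
    ("positive", "pregnant"): 322,
    ("positive", "not pregnant"): 3,
    ("negative", "pregnant"): 277,
    ("negative", "not pregnant"): 298,
}
total = sum(counts.values())

def home_test_total(result):
    """Number of women whose home test gave this result."""
    return sum(n for (home, _), n in counts.items() if home == result)

# Pregnant and positive, out of everyone in the study.
p_pregnant_and_positive = counts[("positive", "pregnant")] / total
# Pregnant, out of only those with a positive home test.
p_pregnant_given_positive = counts[("positive", "pregnant")] / home_test_total("positive")
# Pregnant, out of only those with a negative home test.
p_pregnant_given_negative = counts[("negative", "pregnant")] / home_test_total("negative")

print(f"{p_pregnant_and_positive:.0%} {p_pregnant_given_positive:.0%} "
      f"{p_pregnant_given_negative:.0%}")
```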
Some events will only happen if another event happens. This most extreme type of conditional events is called contingent events.
It is easy to brainstorm contingent events. I only eat birthday cake if I am at a birthday party. I only climb trees when I am outside.
Most dice games ask the players to pick up all the dice with each roll. Situations like this are called outcomes with replacement. It does not matter what either die rolled previously, because it will be picked up and rerandomized for the next roll.
A Ten Card Puzzle: With Replacement
Someone makes a small deck of cards with the ace through ten of hearts.
The person shuffles and reveals the top card. Then the person puts that card back, reshuffles, and again reveals the top card. What is the chance both revealed cards are odd numbers? (Ace is considered an odd number.)
Our outcomes are pairs of cards that can contain duplicates. We can list them: two aces, ace and two, two and ace, and so on...
Our events are pairs of odd/even. When we make a table of all four possible events we have described the sample space.
even then even  even then odd  odd then even  odd then odd 
All four events are equally likely.
even then even ^{1}⁄_{4} 
even then odd ^{1}⁄_{4} 
odd then even ^{1}⁄_{4} 
odd then odd ^{1}⁄_{4} 
The rightmost table entry tells us the probability that both revealed cards are odd is ^{1}⁄_{4} or 25%.
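If you would rather let a computer list the outcomes, here is a minimal sketch. It assumes we number the cards 1 through 10 (ace is 1), and the variable names are our own.

```python
from fractions import Fraction
from itertools import product

deck = range(1, 11)  # ace counts as 1, so the cards are 1 through 10

# With replacement: the second draw sees the full deck again,
# so every ordered pair (duplicates included) is a possible outcome.
outcomes = list(product(deck, repeat=2))

both_odd = [pair for pair in outcomes if pair[0] % 2 == 1 and pair[1] % 2 == 1]

probability = Fraction(len(both_odd), len(outcomes))
print(len(outcomes), len(both_odd), probability)  # 100 25 1/4
```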
Most card games, however, ask the players to leave certain cards face up after shuffling and revealing the top card. Situations like this are called outcomes without replacement. It does matter what the first revealed card is, because we then know the rest of the deck does not contain that card.
A Ten Card Puzzle: Without Replacement
Someone makes a small deck of cards with the ace through ten of hearts.
The person shuffles and reveals the top card. But the person does not put that card back! If they drew the ace, for example, the deck now has only nine cards.
The person then reveals the next top card. What is the chance both revealed cards are odd numbers? (Ace is considered an odd number.)
Our outcomes are pairs of cards that cannot contain duplicates. We can list them: ace and two, two and ace, and so on...
Our events are pairs of odd/even. When we make a table of all four possible events we have described the overall sample space.
even then even  even then odd  odd then even  odd then odd 
Actually, it will be more helpful to write each entry in the table in reverse chronological order. We will rephrase each event to first describe what happens last in time.
even after even  odd after even  even after odd  odd after odd 
All four events are not equally likely.
Consider the event named "odd after even". The probability that the first card is even is ^{5}⁄_{10}. But then the deck has only nine cards, of which five are still odd (since this event requires first removing an even card).
So the probability that the second card is odd is ^{5}⁄_{9}.
We must ask ourselves, "What is ^{5}⁄_{9} of the chance the first card is even?"
In other words, that question becomes "What is ^{5}⁄_{9} of ^{5}⁄_{10}?"
The word of signifies multiplication, as it usually does. So that event's overall probability is ^{25}⁄_{90}.
The other three events can be found similarly.
even after even ^{4}⁄_{9} × ^{5}⁄_{10} = ^{20}⁄_{90} 
odd after even ^{5}⁄_{9} × ^{5}⁄_{10} = ^{25}⁄_{90} 
even after odd ^{5}⁄_{9} × ^{5}⁄_{10} = ^{25}⁄_{90} 
odd after odd ^{4}⁄_{9} × ^{5}⁄_{10} = ^{20}⁄_{90} 
Now we see why it helps to describe the table entries in reverse chronological order. It allows us to more naturally use the English language to ask "What is a more recent chance that modifies a previous chance?"
The rightmost table entry tells us the probability that both revealed cards are odd is ^{20}⁄_{90} or about 22%.
Changing the rules to make the situation without replacement reduced the probability of our desired outcome from 25% to about 22%, a drop of about 3 percentage points!
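The same four events can be counted by brute force. This sketch again assumes the cards are numbered 1 through 10, and it lists every ordered pair of different cards. The helper name `label` is our own.

```python
from fractions import Fraction
from itertools import permutations

deck = range(1, 11)  # ace counts as 1

# Without replacement: ordered pairs of two DIFFERENT cards.
outcomes = list(permutations(deck, 2))

def label(card):
    return "odd" if card % 2 == 1 else "even"

# Tally each event, named in reverse chronological order as in the text.
event_counts = {}
for first, second in outcomes:
    event = f"{label(second)} after {label(first)}"
    event_counts[event] = event_counts.get(event, 0) + 1

probabilities = {e: Fraction(n, len(outcomes)) for e, n in event_counts.items()}

for event, n in event_counts.items():
    print(event, f"{n}/{len(outcomes)}")
```

The tallies come out as 20, 25, 25, and 20 out of 90, matching the table built by multiplying fractions.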
The previous example makes me uneasy.
Yes, I agree that the situation has an end result, what I called the overall sample space, with only four options.
But when I use the English verb "sample" in daily life it means to take something once, at random.
When I sample some chocolates I take one piece (or one pinch with a few pieces) but I do not reach into the bowl many times. That would be a different verb: helping myself to the chocolates, or eating the chocolates, or pigging out on chocolate.
When I sample some music I listen to each song briefly. I do not cycle among the songs, listening to each multiple times. That would be a different verb: comparing the songs, or evaluating my mood, or deciding what to listen to.
So my brain more naturally says the second card dealing situation has two active sample spaces: one with ten cards, then another with nine cards. Those are the groups of cards from which we actually take a single sample.
If that distinction also makes your brain happier then you will like what we do next.
A toddler owns four books.
Like most toddlers, she is fascinated by putting things in order and making combinations and patterns.
In how many orders can she put those books on her shelf?
We could make a table with every outcome. But it would be a big table! Too much work.
Instead let's think carefully.
When we pick out the first book to put on the shelf the current sample space has four options. So we have four choices.
When we pick out the second book to put on the shelf the current sample space has three options. So for each of those above four in-progress points we have three choices. We are up to 4 × 3 = 12 different ways.
When we pick out the third book to put on the shelf the current sample space has two options. So for each of those above twelve in-progress points we have two choices. We are up to 12 × 2 = 24 different ways.
When we pick out the fourth book to put on the shelf it is the only book left in our hands. The current sample space has only one option. We have no choice. We remain at 24 different ways.
We could draw a tree of the six choices that follow first picking The Alphabet Room.
Similar trees could show the six choices that followed any of the other first picks.
Notice that what we ended up doing was multiplying 4 × 3 × 2 × 1 = 24.
Why did we multiply by a list of decreasing numbers? For the first book we had 4 options. For the second book we had 3 options. For the third book we had 2 options. For the last book we had 1 option.
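We said the full table would be too much work by hand, but a computer does not mind. Here is a small sketch that lists every order and compares the count with our multiplication. Only The Alphabet Room is named in the text, so the other three titles are placeholders.

```python
from itertools import permutations

# Only "The Alphabet Room" is named in the text; the other titles are placeholders.
books = ["The Alphabet Room", "Book B", "Book C", "Book D"]

# Every possible order of all four books on the shelf.
orders = list(permutations(books))
print(len(orders))  # 24

# The same count from multiplying the shrinking number of choices: 4 × 3 × 2 × 1.
ways = 1
for choices_left in range(len(books), 0, -1):
    ways *= choices_left
print(ways)  # 24

# Orders that start with The Alphabet Room: the six branches of the tree.
starting_with_alphabet = [o for o in orders if o[0] == "The Alphabet Room"]
print(len(starting_with_alphabet))  # 6
```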
Now we can talk about arrangements in more general terms.
An arrangement is a possible way to place items from a group. Arrangements do not allow repetition. Once we put an item in its place, it stays there. We do not select or put it again.
There are two questions to ask about arrangements.
• Do we place all of the items, or stop after we place only some of them?
• After we are done, do we care about the order of placed items? If we do, we say the arrangement is ordered. If we do not, we say the arrangement is unordered.
Here is a handy chart, which we will then explain.
As we talk about each type of arrangements, we will also ask what is the chance in that situation for a particular arrangement to happen if the options are all equally likely.
(Remember that arrangements have no repetition allowed.)
Type of Arrangement  n Ordered Items, Use Only r Items  n Ordered Items, Use All Items  n Unordered Items, Use Only k Items  n Unordered Items, Use All Items
An Arrangement is Called  Partial Permutation  Complete Permutation  Combination  Trivial
Way to Write How Many Possible Arrangements  P(n,r), "n permute r"  n!, "n factorial"  C(n,k), "n choose k"  1
Real-Life Example  who earns 1st, 2nd, 3rd place in a contest  put books on a library shelf  pick a few people from a group  everyone gets their hand stamped
Formula  n! ÷ (n−r)!  n!  n! ÷ (n−k)! ÷ k!  1
An ordered arrangement of all items is called a complete permutation. (Think of putting away books on a library shelf. All the books are put away. The order of the books does matter in a library.)
While looking at toddler books, we already saw the pattern for finding the number of possible complete permutations. We multiply, starting with the number of items and decreasing by 1 each time. This happens because the active sample space decreases in size by 1 each time we place a book on the shelf.
This pattern happens often enough that mathematicians give it a name, a symbol, and a button on the calculator. It is named "factorial" and the symbol is an exclamation mark.
We already saw that 4! = 4 × 3 × 2 × 1 = 24
As another example, 5! = 5 × 4 × 3 × 2 × 1 = 120
As another example, 6! = 6 × 5 × 4 × 3 × 2 × 1 = 720
In general, we use the letter n to represent a variable that is always a counting number (never a negative or decimal amount) and so write n! for "n factorial".
The Factorial Symbol for Complete Permutations
n! = n × (n − 1) × (n − 2) × ... × 2 × 1
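For readers who like to see a formula as a recipe, here is a minimal sketch of the factorial pattern. Our own `factorial` function is compared with Python's built-in `math.factorial`.

```python
import math

def factorial(n):
    """Multiply n × (n − 1) × ... × 2 × 1 for a counting number n."""
    result = 1
    for factor in range(n, 1, -1):
        result *= factor
    return result

# Our version next to the built-in one, for the three examples in the text.
for n in (4, 5, 6):
    print(n, factorial(n), math.factorial(n))
```

The loop prints 24, 120, and 720, matching the examples worked by hand.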
The number of possible arrangements is 4! = 4 × 3 × 2 × 1 = 24 complete permutations of gumball colors.
So the chance of getting that particular arrangement is 1 ÷ 24 ≈ 0.04 = 4%.
An ordered arrangement of some items is called a partial permutation. (Think of a contest with three winners. Only some people in the contest win. The order of who earns 1st, 2nd, and 3rd place matters!) A partial permutation was called a variation in older literature.
We can think of a partial permutation as starting to make a complete permutation but then stopping early. Imagine that in our example with a toddler's four books, the toddler only put two books on the shelf and then decided to leave the others on the floor to chew on.
The multiplication for a partial permutation thus begins the same. The toddler has 4 options (of which we only pictured one above, first picking the book The Alphabet Room) and then 3 options (numbered with green).
But after doing that 4 × 3 she quits to start teething. She never gets to the × 2 × 1 part (marked with grey).
The clever way to write this involves using division of factorials.
First write too much of the multiplication. Go all the way down the decreasing list. We will color the numbers we really want green, and the others red. 4 × 3 × 2 × 1. Then we will remove the unwanted red numbers by using division. They are merely a smaller factorial, after all.
We can use parentheses and color to emphasize how bits cancel out: 4 × 3 × (2 × 1) ÷ (2 × 1) = 12.
In other words, the number of partial permutations that involve a group with a total of n items but only use r of them is n! ÷ (n − r)!
This pattern happens often enough that mathematicians give it a name, two symbols, and a button on the calculator. It is named "n permute r" and the symbols are _{n}P_{r} or P(n,r) depending upon whether you find subscripts or parentheses more annoying.
The Permute Symbol for Partial Permutations
P(n,r) = n! ÷ (n − r)!
The number of possible arrangements is P(10,5) = 10! ÷ (10 − 5)! = 10! ÷ 5! = 30,240 partial permutations of five digits.
So the chance of getting that particular arrangement is 1 ÷ 30,240 ≈ 0.00003 = 0.003%. Three thousandths of a percent is extremely unlikely to happen!
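The permute formula can be sketched in a few lines. The function name `permute` is our own. Python 3.8 and later also has a built-in `math.perm` that should give the same answers.

```python
import math

def permute(n, r):
    """P(n, r) = n! ÷ (n − r)!, the number of partial permutations."""
    return math.factorial(n) // math.factorial(n - r)

print(permute(4, 2))     # 12, the toddler who shelves only two of four books
print(permute(10, 5))    # 30240
print(math.perm(10, 5))  # 30240, the built-in agrees

# Chance of one particular arrangement when all are equally likely.
print(f"{1 / permute(10, 5):.3%}")  # 0.003%
```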
An unordered arrangement of some items is called a combination. (Think of a raffle with three identical prizes for three winners. Only some people in the raffle win. The order of who wins does not matter.) A combination is also called a binomial coefficient in some literature.
When we moved from complete to partial permutations we "crossed off" or "cancelled out" many of the possible arrangements. We said that complete arrangements with identical endings should be considered the same. When only considering the first two letters, the words RITE and RIET are the same.
Now we do that even more. We look at the partial permutations and say that the order of the letters does not matter. When only considering the first two letters and ignoring the order then the words RITE, RIET, IRTE, and IRET are the same.
So we need to "cancel out" those partial permutations that are merely reorderings of the same r items. How many reorderings of r items are there? We have already found this answer. There are r! of them.
In other words, the number of combinations that involve a group with a total of n items but only use r of them is n! ÷ (n − r)! ÷ r!
To be extra clear, many texts switch which letters they use when talking about combinations. The letter n is still used for the total number of items. But the number of items used in the combination is k (instead of r). That way students who see an r know the problem deals with permutations, and students who see a k know the problem deals with combinations. This website will do this. You are encouraged to do this too, so that your scratch work will be as clear as possible when working with classmates or getting graded.
The combination pattern happens often enough that mathematicians give it a name, three symbols, and a button on the calculator. It is named "n choose k" and the symbols are _{n}C_{k} or C(n,k) or n written above k inside tall parentheses, depending upon whether you find subscripts or parentheses more annoying.
The Choose Symbol for Partial Combinations
C(n,k) = n! ÷ (n − k)! ÷ k!
The number of possible arrangements is C(15,3) = 15! ÷ (15 − 3)! ÷ 3! = 455 combinations of winners.
So the chance for a particular arrangement of three winners to happen is 1 ÷ 455 ≈ 0.002 = 0.2%. Two tenths of a percent is very unlikely to happen!
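Here is a similar sketch for the choose formula. The function name `choose` is our own. Python 3.8 and later has a built-in `math.comb` for comparison.

```python
import math

def choose(n, k):
    """C(n, k) = n! ÷ (n − k)! ÷ k!, the number of combinations."""
    return math.factorial(n) // math.factorial(n - k) // math.factorial(k)

print(choose(15, 3))     # 455
print(math.comb(15, 3))  # 455, the built-in agrees

# A combination count is the partial permutation count with the k! reorderings cancelled.
print(math.perm(15, 3) // math.factorial(3))  # 455

# Chance of one particular group of winners when all are equally likely.
print(f"{1 / choose(15, 3):.1%}")  # 0.2%
```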
An unordered arrangement of all items is trivial. We put all the items on the shelf. Done! There is only 1 solution, because all shufflings are considered to be the same solution.
What happens if we do allow repetition?
An ordered mixture of some items that allows repetition is called a word, or an n-tuple. If an alphabet has n different letters, the number of (probably unpronounceable nonsense) words with c letters that we can make is n^{c} because the choice of n letters is repeated on c occasions. You can think of the exponent c as counting the number of characters in the word.
The number of possible four-character "words" that can be made out of ten "letters" is 10^{4} = 10,000.
So the chance of that particular "word" is 1 ÷ 10,000 = 0.0001 = 0.01%. A hundredth of a percent is not very likely to happen!
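A quick sketch can confirm the n^{c} pattern by listing every "word". It assumes the ten "letters" are the digits 0 through 9, which is only a guess on our part about the example.

```python
from itertools import product

letters = "0123456789"  # ten "letters" (assumed here to be the digits)
word_length = 4

# Repetition is allowed, so each of the four positions has all ten choices.
words = ["".join(chars) for chars in product(letters, repeat=word_length)]

print(len(words))                   # 10000
print(len(letters) ** word_length)  # 10000, the n^c formula agrees
print(f"{1 / len(words):.2%}")      # 0.01%
```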
An unordered combination of some items that allows repetition is uninteresting except for Scrabble players. No one else wonders what words would be like if STOPS and SPOTS were the same word.
When we do arithmetic with numbers the basic operations are +, −, ×, and ÷. Sometimes there are extra rules with fractions.
When we do arithmetic with probabilities the basic operations are and and or. Sometimes there are extra rules with conditional events.
Playing cards are the traditional situation used in example problems for probability arithmetic. Drawing a card is an outcome with two features: suit and rank. This allows us to ask questions such as "What is the chance of drawing a king or a red card?" that are interestingly complex because some kings are red cards and others are not.
But those example problems create headaches. Too many 52s and other awkwardly large numbers.
Normal six-sided dice have nice small numbers, but do not naturally make questions of equivalent interest and complexity.
So we will modify six-sided dice to get the best of both worlds: interesting complexity with small numbers. Huzzah!
Imagine two young siblings invented a special dice game. They put stickers on a six-sided die in the following ways:
You could use stickers and a six-sided die to make your own copy. Or you could print, cut out, fold, and tape/glue this version:
The rules to the game are:
1. Both kids start with a pile of pennies they pretend are spaceships. Perhaps these piles have equal size, or perhaps one kid is younger and gets to start with more. Doesn't matter. Also, some extra pennies are set to the side. The kids also get a small number of tiny pebbles.
2. On your turn roll the die. Rolling a spaceship means your fleet builds a new spaceship: take a new penny from the extras. Rolling an explosion means you defeat an enemy ship: remove one of your sibling's pennies. Rolling a bomb means one enemy ship is doomed: put three pebbles on one of your sibling's pennies.
3. If you rolled an explosion, roll again! This can keep happening until you do not roll an explosion. Then your turn ends and the other kid starts a turn.
4. Remember bombs? At the start of your turn, remove one pebble from all of your ships that have any pebbles. When the last pebble is removed from a ship, that ship blows up: also remove that penny.
5. You win when your sibling's fleet is completely destroyed!
When doing probability with or we add, being careful to subtract any overlap.
For example, the chance to roll a ship is ½. The chance to roll an explosion is ½. But the chance to roll a ship or explosion is not ½ + ½ = 1. Do you see why? How do we fix that addition?
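One way to check your thinking is to count faces directly. The sketch below uses an assumed sticker layout, since the picture of the die is not reproduced here. Five faces are chosen to match the chances quoted in this section (ship ½, explosion ½, "only ship" ⅓, "bomb and explosion" ⅙). The sixth face is a guess.

```python
from fractions import Fraction

# ASSUMED face layout: the sticker picture is not reproduced in the text.
# Five faces match the chances quoted in this section; the sixth is a guess.
faces = [
    {"ship"},
    {"ship"},
    {"explosion"},
    {"explosion", "ship"},
    {"bomb", "explosion"},
    {"bomb"},  # guess: any face with neither a ship nor an explosion works here
]

def chance(event):
    """Fraction of the six faces for which event(face) is true."""
    return Fraction(sum(1 for face in faces if event(face)), len(faces))

p_ship = chance(lambda f: "ship" in f)
p_explosion = chance(lambda f: "explosion" in f)
p_both = chance(lambda f: "ship" in f and "explosion" in f)
p_either = chance(lambda f: "ship" in f or "explosion" in f)

print(p_ship, p_explosion, p_both)    # 1/2 1/2 1/6
print(p_ship + p_explosion - p_both)  # 5/6, adding and then subtracting the overlap
print(p_either)                       # 5/6, counting faces directly
```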
When combining independent events with and we multiply.
For example, the chance to roll an explosion is ½. Then we can roll again, and on the second roll we again will get an explosion half the time. So the chance to roll at least two explosions is ½ × ½ = ¼. Do you see why?
When combining conditional events with and we imagine that we are telling a story, going chronologically backwards. "We defeated the dragon! But first we had to find its cave." Start at the most recent event, and multiply by the probability that we actually did get that far already.
For example, what is the chance to get all three outcomes once? This can only happen one way: first rolling "bomb and explosion" and then rolling "only ship". The chance to roll "only ship" is ⅓. The chance to have actually rolled "bomb and explosion" earlier is ⅙. So the chance both happen is ⅓ × ⅙ = ^{1}⁄_{18}. Do you see why?
What happens when we tell a story involving or?
For example, what is the chance that a player's first two rolls damage exactly three ships with bombs or explosions? There are four ways this could happen. A player could first roll "bomb and explosion", and then on the second roll get either "only explosion" or "explosion and ship". Or that player could first roll either "only explosion" or "explosion and ship", and then on the second roll get "bomb and explosion". Finish answering this problem.
Unfortunately, there is vagueness about how to measure the probability of a change.
Imagine a medicine that can reduce a family member's cancer risk from 44 cases among 10,000 people down to 11 cases among 10,000 people. The medicine has some bad side effects. Is the reduction in cancer risk worth suffering these side effects?
Absolute Change Example
The old risk is 0.44%. The new risk is 0.11%.
0.44% − 0.11% = 0.33%
We could say that with the medication the risk is reduced by 0.33%. (Only a third of one percent? That does not sound like much.)
We could also use a percent change to measure how much less likely is the new risk. Like all percent changes, this is a ratio comparing change to original.
(The funky part of this problem is how the change and original amount are both percentages.)
Relative Change (Less Likely)
The old risk is 0.44%. The new risk is 0.11%. Subtracting tells us the change is 0.33%. Then we do change ÷ original.
0.33% ÷ 0.44% = 0.75 = 75%
We could say that with the medication the occurrence of cancer is 75% less likely than before. (That sounds impressive!)
Relative Change (As Likely)
The new risk is 0.11%. The old risk is 0.44%.
0.11% ÷ 0.44% = 0.25 = 25%
We could say that with the medication the occurrence of cancer is only 25% as likely as before. (That still sounds impressive!)
The moral of the story is to pay attention (especially when dealing with small numbers) to whether a speaker is using an absolute change or a relative change. The former made the medicine sound like it is probably not worth the risk of its side effects. The latter made the medicine sound amazing.
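The three ways of describing the same change can be put side by side in a short sketch. The variable names are our own. The numbers are the 44 and 11 cases per 10,000 people from the example.

```python
from fractions import Fraction

people = 10_000
old_risk = Fraction(44, people)  # 0.44%
new_risk = Fraction(11, people)  # 0.11%

# Absolute change: plain subtraction of the two risks.
absolute_change = old_risk - new_risk

# Relative change (less likely): change ÷ original.
relative_change = absolute_change / old_risk

# Relative change (as likely): new ÷ original.
as_likely = new_risk / old_risk

print(f"{float(absolute_change):.2%}")  # 0.33%
print(f"{float(relative_change):.0%}")  # 75%
print(f"{float(as_likely):.0%}")        # 25%
```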
The weighted average of a group of situations measures the "average result" of that group.
To find a weighted average, use a table. Each possible outcome is a row. Work across with multiplication: the value for that outcome times its percent probability. Then add those products.
This answer requires making a table, as below. The answer is not surprising. Most people already know that the "average value" when rolling two dice is seven. The expected value table confirms that common knowledge is precise instead of rounded: the expected value is indeed seven exactly, not slightly more or less. (The original Google spreadsheet is here.)
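The two-dice claim can also be checked without a spreadsheet. This is a minimal sketch of the same kind of table: one row per possible sum, with the value times its probability, and then the products added up. The variable names are our own.

```python
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely rolls give each sum.
ways = {}
for a, b in product(range(1, 7), repeat=2):
    ways[a + b] = ways.get(a + b, 0) + 1

# One table row per outcome: value, probability, value × probability.
expected_value = Fraction(0)
for total in sorted(ways):
    probability = Fraction(ways[total], 36)
    expected_value += total * probability
    print(total, probability, total * probability)

print(expected_value)  # 7
```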
Here is a sample spreadsheet that shows Billy spends an average of about 2.1¢ each trip to the grocery store.
For many students the most commonly used weighted average table is finding their overall grade in a class.
Here is a sample spreadsheet that shows the overall grade is 81.9 in the class.
The weighted average is sometimes called the expected value. It does make sense to say "the expected value of the sum of two dice is 7". It almost makes sense to say "the expected value of one of Billy's trips to buy a gumball is 2.1¢." But it does not make sense to call the overall class grade an "expected value" because that situation does not involve condensing mutually exclusive outcomes into an average outcome.
Try these ten exercises on scratch paper. Work in a study group if you can! Notice where your notes need improvement. After you are very happy with your answers, you can use this form to ask me to check your work. Can you get at least 8 out of 10 correct?
1. When rolling two dice, what is the probability of the sum being an even number?
2. When rolling two dice, what are the odds of the sum being an even number?
3. When rolling two dice, what is the probability of the sum being 8 or more?
4. When rolling two dice, what are the odds of the sum being 8 or more?
5. The medicine trastuzumab, which fights breast cancer in women who already have breast cancer, was popularized because of a certain study. In the control group of 1,700 women, 34 died. In the group treated with trastuzumab, 23 of 1,643 women died. What percentage of the women in the control group died? What percentage of the women in the treated group died?
6. Continuing the previous problem, what was the absolute change (subtraction) in risk?
7. Continuing the previous problem, what was the relative change (percent change) in risk?
8. Trastuzumab also has some dangerous side effects. Most notably, 40% of the women who take it develop flu-like symptoms, 7% develop mild heart problems, and 5% suffer a stroke or severe heart failure. About how many of the 1,643 women in the study who were treated with trastuzumab suffered a stroke or severe heart failure because of the drug?
(Tangentially, if you were in charge of publicity for this drug, what type of claim could you truthfully make about the medicine? If you were trying to discredit trastuzumab—perhaps concerned about the side effects and trying to convince a family member with breast cancer not to take the medicine—what type of claim could you truthfully make about the medicine?)
9. Your little brother thinks that ten is a very big number. He wants to play a dice game about the number ten. He proposes a game where you each start with a pile of candies, and he finds the sum of two dice several times. Whenever the sum is less than ten, he gives you one candy. Whenever the sum is ten or greater, you give him more than one candy, but he is not sure how many is fair. Help your brother finish inventing his game by using an expected value table to find how many candies you must give him when he "wins" so that the game has an expected value of zero.
10. Your friend is starting a food cart business. She has read that new food carts have a 35% chance to go out of business during the first year with a $10,000 loss, a 30% chance to earn $20,000 profit the first year, a 15% chance to earn $30,000 profit the first year, a 15% chance to earn $40,000 profit the first year, and a 5% chance to earn $50,000 profit the first year. Assuming these numbers are true, and your friend has typical skill and luck in her new business, what is the expected value of her first year's income?
Try these exercises on scratch paper. Work in a study group if you can! Notice where your notes need improvement. Check your work when you are done.
Brainstorming it Personal
Share specific examples from your own life:
 a percentage that is a part of a whole, and a percentage that is not a part of a whole
 a list of categorical data, and a list of quantitative data
 a list of data whose mean and median are very different
 a list of data whose range is greater than its highest value and which does not have a mode
 a list of data that has high standard deviation, and a list of data that has zero standard deviation
 a data set appropriately written in a frequency table, and a data set inappropriate to write in a frequency table
 a data set appropriately written in a contingency table, and a data set inappropriate to write in a contingency table
 a data set appropriately drawn as a pie chart, and a data set inappropriate to draw as a pie chart
 a compliment, and complementary outcomes
 a simple outcome, and a compound outcome