Data literacy is the ability to make sense of data, to incorporate the information it conveys, and to share that information with others.
Why Should I Be Data Literate?
A new poll says that 73% of Americans approve of new legislation. Four out of five dentists like whitening toothpaste. That new film has a rating of 4.5 stars.
We are inundated with data, so it behooves us to understand the data we are presented with.
Take the statement "4 out of 5 dentists like whitening toothpaste." How many dentists were actually asked? If they only asked 5 dentists and 4 of them liked it, would you be convinced that dentists really like whitening toothpaste? What if they asked 5,000 dentists and 4,000 of them liked it?
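To put numbers on the dentist question, here is a minimal Python sketch. It assumes the dentists were picked at random, which the claim itself does not tell us, and computes a Wilson score interval, one standard way to estimate a range of plausible values for the true share. With 4 of 5 dentists, the range runs from roughly 38% to 96%; with 4,000 of 5,000, it narrows to about 79% to 81%. The function name `wilson_interval` is just a label chosen for this example.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% confidence."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width

# Same 80% result, very different sample sizes.
for liked, asked in [(4, 5), (4000, 5000)]:
    low, high = wilson_interval(liked, asked)
    print(f"{liked} of {asked} dentists: plausible range {low:.0%} to {high:.0%}")
```

Both samples report the same 80%, but only the large one pins the true share down.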
And that's not all!
Your streaming service gives you increasingly good recommendations, and your map app warns you about traffic on your route.
That's Big Data at work. Your streaming service knows what you've watched on it, and the more you use it, the more it learns about what you like to watch. It compares you with other people who watched the same shows and recommends other shows those people have watched.
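The comparison described above can be sketched in a few lines of Python. This is a toy version of an approach often called user-based collaborative filtering, not how any particular streaming service actually works; the viewers, the shows, and the `recommend` function are all hypothetical.

```python
from collections import Counter

# Hypothetical viewing histories (illustrative only).
histories = {
    "you":   {"Show A", "Show B"},
    "user1": {"Show A", "Show B", "Show C"},
    "user2": {"Show B", "Show C", "Show D"},
    "user3": {"Show E"},
}

def recommend(user, histories):
    """Suggest shows watched by people who share at least one show with `user`,
    weighting each suggestion by how many shows that person has in common."""
    mine = histories[user]
    scores = Counter()
    for other, theirs in histories.items():
        if other == user:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # nothing in common: this viewer tells us nothing about you
        for show in theirs - mine:
            scores[show] += overlap
    return [show for show, _ in scores.most_common()]

print(recommend("you", histories))
```

Here "you" share two shows with `user1` and one with `user2`, so Show C (watched by both) ranks above Show D, and `user3`, who has nothing in common with you, contributes nothing.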
Our behavior online is data, and not just when we order from Amazon: we also generate data when we use wearable electronics (like Fitbit), use smart speakers, download apps, and more. We need to remember that and be careful with our data.
When you think of data, you probably think of numbers. But there is more to data than just numbers. There are two types of data: qualitative data and quantitative data.
Quantitative data is numeric data. These are things like age, height, or weight limit. The hard sciences use primarily quantitative data.
Qualitative data, also known as categorical data, is data that falls into groups like gender, medications, or opinions. The social sciences use a lot of qualitative data.
While the hard sciences may primarily use quantitative data and the social sciences may primarily use qualitative data, neither can rely on just one data type. Accurately measuring and describing our world through data requires both.
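A small Python illustration, using made-up survey responses, of why the two types are handled differently: numbers can be averaged, while categories are summarized by counting how often each one appears. Averaging "agree" and "disagree" would not mean anything.

```python
import statistics
from collections import Counter

# Hypothetical survey responses (illustrative only).
heights_cm = [150, 162, 171, 158]                     # quantitative: numbers
opinions = ["agree", "disagree", "agree", "neutral"]  # qualitative: categories

# Numeric data supports arithmetic such as an average...
print(statistics.mean(heights_cm))      # 160.25
# ...while categorical data is summarized by counting each category.
print(Counter(opinions).most_common())  # [('agree', 2), ('disagree', 1), ('neutral', 1)]
```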
Poorly labelled graphs can cause confusion.
For instance, these two charts are based on the same data: the amount of money Company A spent on a project over the years. The first chart shows the cumulative amount the company had spent by the end of each two-year period. The second chart shows the amount of money actually spent in each period.
If the company released the first graph and claimed it was constantly spending more on the project, you would likely think it spent 7 times as much in 2020 as it did in 2010, when in fact it spent only twice as much in 2020. And that's exactly what the company wants you to think.
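The gap between the two charts comes down to running totals versus per-period amounts. The Python sketch below uses hypothetical spending figures (in thousands of dollars, for two-year periods ending 2010 through 2020), chosen only so the ratios match the text; they are not Company A's real numbers. Note that the running total rises every period even though spending actually dipped in 2012 and 2016.

```python
from itertools import accumulate

def per_period(cumulative):
    """Recover the amount spent in each period from a list of running totals."""
    return [cumulative[0]] + [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]

# Hypothetical spending per two-year period, in $1,000s (NOT real data).
years = [2010, 2012, 2014, 2016, 2018, 2020]
spent = [100, 80, 120, 90, 110, 200]
running_total = list(accumulate(spent))  # what a cumulative chart plots

for year, total, amount in zip(years, running_total, spent):
    print(f"{year}: cumulative {total:>4}, spent this period {amount:>4}")

# The per-period chart can always be rebuilt from the cumulative one.
assert per_period(running_total) == spent

# Comparing the last bar of each chart with the first bar:
print("cumulative chart suggests:", running_total[-1] / spent[0], "times as much")
print("actual per-period ratio:  ", spent[-1] / spent[0], "times as much")
```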
Using the wrong type of graph
Different graph types serve different purposes, and if someone uses the wrong type of graph for their information, their message can be muddled. If a university wants to show the proportion of staff to faculty, a pie chart works well. But if it wants to show how those proportions change (or stay consistent) over the years, pie charts would be confusing and unhelpful; a stacked bar chart would be a better choice.
Media reporting is an important place to apply data literacy. Reports range from scientific studies to local polls and everything in between. There are two data literacy points in reporting that people should note.
When you look at reported data, pay attention to how it was sampled. This has two aspects: sample size and sample population.
Sample size is important. If the news tells you 80% of people like the new park the city just put in, that sounds like it's really well liked. But what if they only asked 5 people? Then that 80% is just 4 people, and you can no longer be certain that the new park is liked at all.
A good sample size depends on the size of the population. Say 100 people at a college take a survey. At a college of 2,000 undergraduates, that's 1 out of every 20 students. At a smaller college of 500 undergraduates, it's 1 out of every 5, which is actually a great response rate: each respondent represents only 5 people.
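For readers who want to see where population size enters the math, here is a minimal Python sketch. It assumes a simple random sample and the usual normal approximation for a proportion, with p = 0.5 as the most cautious choice. Population size comes in through the finite population correction, which shrinks the margin of error when the sample is a sizable share of the population. For 100 respondents, the margin comes out to about ±9.6% at a college of 2,000 and about ±8.8% at a college of 500. `margin_of_error` is a name chosen for this example.

```python
import math

def margin_of_error(n, population, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion estimated from a simple
    random sample of n people drawn without replacement from `population`."""
    standard_error = math.sqrt(p * (1 - p) / n)
    # Finite population correction: shrinks the error as n approaches the population size.
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc

n = 100
for population in (2000, 500):
    print(f"{population} undergraduates: 1 respondent per {population // n} students, "
          f"margin of error about ±{margin_of_error(n, population):.1%}")
```

The difference is modest here because the correction only matters much once the sample is a large fraction of the population.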
The population is also important. (Note: the population refers to whatever is being studied, whether people, animals, objects, or anything else.) You want the sample you study to accurately represent the full population. If a local paper declares that 98% of locals think NMSU should win the Battle of I-10, one might assume that NMSU is probably having a strong season. But what if 90% of the people they asked are NMSU supporters? That would bias the results.
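A quick way to see the bias is to treat the result as a weighted average of two groups. The Python sketch below uses made-up answer rates (96% of NMSU supporters and 30% of everyone else say NMSU should win) and a hypothetical true share of 30% supporters among locals; none of these figures come from a real poll. With the same answer rates, a sample that is 90% supporters reports about 89%, while the whole population would come out near 50%.

```python
def yes_share(share_supporters, p_yes_supporters, p_yes_others):
    """Overall share answering 'yes' in a group made of supporters and everyone else."""
    return share_supporters * p_yes_supporters + (1 - share_supporters) * p_yes_others

# Hypothetical answer rates, assumed identical wherever people are asked.
P_YES_SUPPORTERS = 0.96  # NMSU supporters who say NMSU should win
P_YES_OTHERS = 0.30      # everyone else who says NMSU should win

surveyed = yes_share(0.90, P_YES_SUPPORTERS, P_YES_OTHERS)  # skewed sample: 90% supporters
everyone = yes_share(0.30, P_YES_SUPPORTERS, P_YES_OTHERS)  # hypothetical: 30% of locals are supporters
print(f"Survey result: {surveyed:.0%}; whole population: {everyone:.0%}")
```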
An important note about statistics: a statistical relationship can show correlation, but by itself it cannot show causation. These two ideas are vastly different. Correlation means there is a connection (positive or negative) between two variables. Causation means one variable directly affects (positively or negatively) the other. There is a correlation between the shoe and clothing sizes of infants and toddlers, but an increase in shoe size doesn't cause an increase in clothing size; both are caused by the children growing.