Healthy Datasets

DATA FUELS AI

Just like food fuels the human body!
Large collections of data are called datasets.

Healthy datasets
lead to correct actions with AI!
Healthy datasets have lots of different examples!

IT'S IMPORTANT TO SPEND TIME GATHERING HEALTHY DATA!

 If your data is healthy …

Healthy dataset → Finds correct patterns → Healthy prediction! → Correct actions or decisions!

 If your data is not healthy …

Unhealthy dataset → Finds bad patterns → Bad prediction! → Incorrect actions or decisions!

But how do we make healthy datasets?

Healthy datasets

Lots of data

Different examples of data

The right kind of data

(example: not using pictures if sounds would be better)

Correct actions or decisions

REMEMBER OUR MAKE ME HAPPY MODEL?

How healthy was this dataset?

Finds the right patterns → Correct prediction for happy or sad → Correctly shows a happy or sad face

Is the Make Me Happy dataset healthy?

Does it have lots of data? Were there equal amounts in each class?

What different examples of data does it have? Could we have added more, different data?

Does it use the right kind of data? Why?

Makes the right decision about the type of sentence and displays the correct picture.

Let’s check if the dataset is healthy!

More than 10 pieces of data means we are off to a good start with lots of data!

Our different example sentences give us different examples of data!

Our model uses sentences, so a text dataset (happy and sad sentences) is the right kind of data!

Likely to make the right decision and display the correct happy or sad face!
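
If you like to code, the same checks can be written as a small program. This is only a sketch: the sentences below are made up for illustration (they are not the real Make Me Happy training data), and the function name check_dataset is invented here. It counts the data, checks that each class has the same amount, and checks that no sentence is repeated.

```python
from collections import Counter

# Made-up example sentences (an assumption, not the real Make Me Happy data).
examples = [
    ("I love ice cream", "happy"),
    ("Today is a great day", "happy"),
    ("My friend made me laugh", "happy"),
    ("We won the game", "happy"),
    ("I got a new puppy", "happy"),
    ("The sun is shining", "happy"),
    ("I lost my favorite toy", "sad"),
    ("It rained on my birthday", "sad"),
    ("My friend moved away", "sad"),
    ("I dropped my sandwich", "sad"),
    ("We lost the game", "sad"),
    ("I feel lonely today", "sad"),
]


def check_dataset(examples, min_total=10):
    """Return simple yes/no answers to the healthy dataset questions."""
    # Count how many sentences carry each label ("happy" or "sad").
    counts = Counter(label for _, label in examples)
    # Lowercase the sentences so "Hello" and "hello" count as the same one.
    unique_sentences = {sentence.lower() for sentence, _ in examples}
    return {
        # Lots of data: more than min_total pieces of data.
        "lots_of_data": len(examples) > min_total,
        # Equal amounts in each class: every label has the same count.
        "equal_classes": len(set(counts.values())) == 1,
        # Different examples: no sentence appears twice.
        "different_examples": len(unique_sentences) == len(examples),
    }


print(check_dataset(examples))
```

A program like this can count data for you, but it cannot tell whether sentences are the right kind of data for your problem. That part still needs your own thinking!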

Can you make the dataset for Make Me Happy better?

Healthy datasets

Lots of data: are there more sentences we can add to the training data?

Different examples of data: are there sentences in another language we should add?

The right kind of data: instead of text, should we use audio clips (sound)?

You can make an even better happy/sad recognizer!

Below are some activities to get you ready to work with datasets! First, you will judge some existing datasets to see if they are healthy, and then you’ll practice planning a dataset for a given problem.

ACTIVITY 1: JUDGE THE DATASETS

Download the file and fill out the worksheet

ACTIVITY 2: PLAN A DATASET

Download the file and fill out the worksheet