DATA FUELS AI
Healthy datasets lead to correct actions with AI!
Healthy datasets have lots of different examples!
IT'S IMPORTANT TO SPEND TIME GATHERING HEALTHY DATA!
If your data is healthy …
Healthy dataset → Finds correct patterns → Healthy prediction! → Correct actions or decisions!
If your data is not healthy …
Unhealthy dataset → Finds bad patterns → Bad prediction! → Incorrect actions or decisions!
But how do we make healthy datasets?
Healthy datasets
Lots of data
Different examples of data
The right kind of data (example: not using pictures if sounds would be better)
REMEMBER OUR MAKE ME HAPPY MODEL?
How healthy was this dataset?
Finds the right patterns → Correct prediction for happy or sad → Correctly shows a happy or sad face
Is the Make Me Happy dataset healthy?
Does it have lots of data? Were there equal amounts in each class?
What different examples of data does it have? Could we have added more, different data?
Does it use the right kind of data? Why?
Let’s check if the dataset is healthy!
More than 10 pieces of data means we are off to a good start with lots of data!
We have different example sentences as our different types of data.
Our model uses sentences, so a text dataset (happy and sad sentences) is the right kind of data!
With a healthy dataset, the model is likely to make the right decision and display the correct happy or sad face!
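If you want a computer to run the same checks, here is a minimal sketch in Python. It is only an illustration: the function name `check_dataset_health`, the example sentences, and the exact rules (more than 10 pieces of data, the same number of sentences in each class, no repeated sentences) are assumptions made for this example, not part of the Make Me Happy project itself.

```python
from collections import Counter


def check_dataset_health(examples, min_total=10):
    """Run three simple checks on a list of (sentence, label) pairs.

    The thresholds are assumptions for this sketch, not fixed rules.
    """
    # Count how many sentences each class (label) has.
    counts = Counter(label for _, label in examples)
    # Treat sentences as the same if they differ only by case or outer spaces.
    unique_sentences = {sentence.strip().lower() for sentence, _ in examples}
    return {
        "total": len(examples),
        "per_class": dict(counts),
        "lots_of_data": len(examples) > min_total,
        "equal_classes": len(set(counts.values())) == 1,
        "all_different": len(unique_sentences) == len(examples),
    }


# Made-up training sentences, for illustration only.
happy = [
    "I love this",
    "You are great",
    "What a nice day",
    "This is fun",
    "You make me smile",
    "Well done",
]
sad = [
    "I do not like this",
    "You are mean",
    "What a bad day",
    "This is boring",
    "You make me cry",
    "Go away",
]
examples = [(s, "happy") for s in happy] + [(s, "sad") for s in sad]

report = check_dataset_health(examples)
print(report)
```

A check like this can only count and compare. Deciding whether sentences are the right kind of data for the problem still takes a person.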
Can you make the dataset for Make Me Happy better?
Healthy datasets
Lots of data – are there more sentences we can add to the training data?
Different examples of data – are there sentences in another language we should add?
The right kind of data – instead of text should we use audio clips (sound)?
You can make an even better happy/sad recognizer!
Below are some activities to get you ready to work with datasets! First, you will judge some existing datasets to see if they are healthy, and then you’ll practice planning a dataset for a given problem.