Large collections of data are called datasets. Healthy datasets help AI take correct actions!
Healthy datasets have lots of different examples!
IT'S IMPORTANT TO SPEND TIME GATHERING HEALTHY DATA!
If your data is healthy …
Healthy dataset
Finds correct patterns
Healthy prediction!
Correct actions or decisions!
If your data is not healthy …
Unhealthy dataset
Finds bad patterns
Bad prediction!
Incorrect actions or decisions!
But how do we make healthy datasets?
Healthy datasets
Lots of data
Different examples of data
The right kind of data
(example: not using pictures if sounds would be better)
Correct actions or decisions
REMEMBER OUR MAKE ME HAPPY MODEL?
How healthy was this dataset?
Finds the right patterns
Correct prediction for happy or sad
Correctly shows a happy or sad face
Is the Make Me Happy dataset healthy?
Does it have lots of data? Were there equal amounts in each class?
What different examples of data does it have? Could we have added more, different data?
Does it use the right kind of data? Why?
Makes the right decision about the type of sentence and displays the correct picture.
Let’s check if the dataset is healthy!
More than 10 pieces of data means we are off to a good start with lots of data!
We have different example sentences as our different types of data.
Our model uses sentences, so a text dataset (happy and sad sentences) is the right kind of data!
Likely to make the right decision and display the correct happy or sad face!
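If you would like to try these checks with code, here is one possible sketch in Python. It is not part of the Make Me Happy project itself. The function name `check_dataset_health`, the `min_total` setting, and the example sentences are all made up for illustration. It assumes the dataset is a list of (sentence, label) pairs, and it treats "different examples" as simply "no repeated sentences", which is only a rough stand-in for real variety.

```python
from collections import Counter


def check_dataset_health(examples, min_total=10):
    """Run three simple health checks on a list of (sentence, label) pairs.

    Assumption: "lots of data" means more than `min_total` examples in total,
    following the "more than 10 pieces of data" rule of thumb above.
    """
    # Count how many examples each class (for example "happy" or "sad") has.
    counts = Counter(label for _, label in examples)

    # Treat sentences as the same if they differ only by case or outer spaces.
    unique_sentences = {sentence.strip().lower() for sentence, _ in examples}

    return {
        "counts": dict(counts),
        # Lots of data: more than min_total examples overall.
        "lots_of_data": len(examples) > min_total,
        # Equal amounts in each class: every class has the same count.
        "balanced": len(counts) > 0 and len(set(counts.values())) == 1,
        # Different examples: no sentence appears twice.
        "different_examples": len(unique_sentences) == len(examples),
    }


# Hypothetical mini dataset, far too small to count as "lots of data".
tiny_dataset = [
    ("I love ice cream", "happy"),
    ("Today is a great day", "happy"),
    ("I lost my toy", "sad"),
    ("It is raining and I am lonely", "sad"),
]
print(check_dataset_health(tiny_dataset))
```

A report like this cannot tell you whether you picked the right kind of data (text, pictures, or sound). That question still needs a person to think about the problem.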
Can you make the dataset for Make Me Happy better?
Healthy datasets
Lots of data – are there more sentences we can add to the training data?
Different examples of data – are there sentences in another language we should add?
The right kind of data – instead of text should we use audio clips (sound)?
You can make an even better happy/sad recognizer!
Below are some activities to get you ready to work with datasets! First, you will judge some existing datasets to see if they are healthy, and then you’ll practice planning a dataset for a given problem.