Remember, healthy predictions need healthy data!
Healthy dataset → Finds correct patterns → Healthy prediction! → Correct actions or decisions!
But what makes data healthy data?
Can you make a list?
Healthy data
Lots of data
Accurate
Matches your problem and solution
Different examples of data
The right kind of data
You have permission to use it
Healthy prediction!
Better actions or decisions
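Some items on this list can be checked with a little code. Here is a minimal sketch, assuming your dataset is a list of rows where each row is a Python dictionary. The function name `check_dataset`, the `min_rows` limit, and the "3 times more" balance rule are made-up choices for illustration, not official rules. Code cannot tell you whether the data is accurate or whether you have permission to use it; you still need to check those yourself.

```python
from collections import Counter

def check_dataset(rows, label_key, min_rows=50):
    """Return a list of warnings about a dataset given as a list of dicts.

    min_rows and the balance rule below are assumptions for illustration.
    """
    warnings = []

    # "Lots of data": warn when there are very few rows.
    if len(rows) < min_rows:
        warnings.append(f"Only {len(rows)} rows; aim for at least {min_rows}.")

    # Gaps in the data: count empty values.
    missing = sum(1 for row in rows for value in row.values() if value in ("", None))
    if missing:
        warnings.append(f"Empty values found: {missing}.")

    # "Different examples of data": compare how often each label appears.
    counts = Counter(row[label_key] for row in rows)
    if len(counts) < 2:
        warnings.append("Only one kind of example; add different examples.")
    elif max(counts.values()) > 3 * min(counts.values()):
        warnings.append("Some labels have far more examples than others.")

    return warnings

# A tiny made-up dataset, just to show how the function is called.
sample_rows = [
    {"temperature": 20, "label": "cool"},
    {"temperature": "", "label": "cool"},
    {"temperature": 30, "label": "warm"},
]
for warning in check_dataset(sample_rows, "label", min_rows=5):
    print(warning)
```

For the tiny sample above, the function reports two problems: too few rows and one empty value.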
THERE ARE 3 WAYS YOU CAN COLLECT HEALTHY DATA
What path is right for you?
Gather data from your community if the information you need is available there and you can get permission to use it.
Use sensors if you want your app to collect data on its own.
Use a public dataset if you need lots and lots of data or you’re solving a problem that is more global.
Plan: What type of data do you need from your community? How will you gather it?
Voice assistants like Siri or Alexa use a microphone, which is a sound sensor, to gather data (like what a person is saying).
This smart crosswalk has a light sensor to gather information about how light or dark it is and a motion sensor to detect movement.
Different sensors gather different types of data!
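To see how sensor readings can turn into a dataset, here is a minimal sketch that saves readings as rows in CSV format (a common format for datasets). It is only a simulation: `read_light_level` and `read_motion` are hypothetical stand-ins that return made-up values, because the real code depends on which device and sensor you choose.

```python
import csv
import io
import random
from datetime import datetime, timezone

def read_light_level():
    """Hypothetical stand-in for a light sensor: returns a made-up value from 0 to 100."""
    return random.randint(0, 100)

def read_motion():
    """Hypothetical stand-in for a motion sensor: returns a made-up True or False."""
    return random.choice([True, False])

def collect_readings(count):
    """Take `count` readings and store each one as a row with a timestamp."""
    rows = []
    for _ in range(count):
        rows.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "light_level": read_light_level(),
            "motion_detected": read_motion(),
        })
    return rows

def to_csv(rows):
    """Turn the rows into CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["time", "light_level", "motion_detected"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

readings = collect_readings(5)
print(to_csv(readings))
```

Each sensor becomes a column and each reading becomes a row, so different sensors give you different columns of data.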
HARDWARE OPTIONS
You will need devices to help you capture your data. Consider using:
Your phone has many sensors that can be used to gather information... Think mobile app!
Small circuit board that can gather data and is built to use AI.
Mini-computer (board) as powerful as a laptop!
For more, check out this big list of sensors.
This video from our World Summit 2020 has lots of great information on hardware options!
You might want to use a public dataset if …
You want more complete data.
You want to use a lot of data.
You want accurate data.
WHERE TO FIND PUBLIC DATASETS
Here are some good places to start.
Here are some links to sample datasets according to problem types:
WHERE ELSE COULD YOU LOOK?
You can try:
It’s not easy finding the perfect dataset. Remember to ask for help!
The activity below is practice in finding and using a public dataset and training a model using Machine Learning For Kids. The dataset has information about iris flowers, and the model will predict the type of iris based on length and width of flower parts.
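If you are curious what predicting the type of iris from measurements can look like in code, here is a minimal sketch. It is not how Machine Learning For Kids works inside. It uses a simple swapped-in idea called nearest neighbor: find the training example whose measurements are closest to the new flower and copy its type. The six training rows are a small sample from the classic iris dataset; the function names are made up for illustration.

```python
# A small sample of rows from the classic iris dataset:
# (sepal length, sepal width, petal length, petal width) in cm, plus the iris type.
TRAINING_DATA = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]

def distance_squared(a, b):
    """Add up the squared differences between two sets of measurements."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict_iris(measurements):
    """Return the type of the training example closest to the new measurements."""
    closest = min(
        TRAINING_DATA,
        key=lambda example: distance_squared(example[0], measurements),
    )
    return closest[1]

print(predict_iris((5.0, 3.4, 1.5, 0.2)))  # prints: setosa
```

With only six examples this model is easy to fool, which is one reason a healthy dataset needs lots of data.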
In the next unit, we’ll use the iris AI model in a Scratch project!
You may not be able to gather the perfect dataset, but gather what you can.
Proxy data is data that is a placeholder for a later version of your dataset.
JUMP IN AND START PUTTING YOUR DATASET TOGETHER!
Check off these items to make sure you create a strong dataset!