Collecting Data

Remember, healthy predictions need healthy data!

Healthy dataset → Finds correct patterns → Healthy prediction! → Correct actions or decisions!

But what makes data healthy?

Can you make a list?

Healthy data:

  • Lots of data
  • Accurate
  • Matches your problem and solution
  • Different examples of data
  • The right kind of data
  • You have permission to use it

Healthy prediction! → Better actions or decisions
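If your data is in a table, a short program can help you check some of these points. The sketch below is only one possible way to do it. The function name `check_dataset`, the made-up sample rows, and the limits it uses (at least 50 rows, no label more than 3 times as common as another) are all assumptions for illustration, not fixed rules.

```python
from collections import Counter

def check_dataset(rows, label_key, min_rows=50):
    """Return a list of warnings about a dataset stored as a list of dicts."""
    warnings = []

    # Lots of data: very small datasets rarely show real patterns.
    # (min_rows is an assumed limit, not an official number.)
    if len(rows) < min_rows:
        warnings.append(f"Only {len(rows)} rows; aim for at least {min_rows}.")

    # Accurate: count rows where some value is empty or missing.
    incomplete = sum(
        1 for row in rows if any(value in ("", None) for value in row.values())
    )
    if incomplete:
        warnings.append(f"Rows with missing values: {incomplete}")

    # Different examples: each label should show up a fair number of times.
    counts = Counter(row[label_key] for row in rows)
    if len(counts) < 2:
        warnings.append("Only one label found; add other kinds of examples.")
    elif max(counts.values()) > 3 * min(counts.values()):
        warnings.append(f"Labels are unbalanced: {dict(counts)}")

    return warnings

# Made-up sample rows, just to show how the check is called.
sample = [
    {"sound": "dog.wav", "label": "bark"},
    {"sound": "cat.wav", "label": "meow"},
    {"sound": "", "label": "bark"},
]
for warning in check_dataset(sample, label_key="label", min_rows=3):
    print(warning)
```

A program like this cannot tell you whether the data matches your problem or whether you have permission to use it. Those checks are still up to you.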

THERE ARE 3 WAYS YOU CAN COLLECT HEALTHY DATA

  • Collect your own training data from your community
  • Gather data with sensors or user input
  • Use data from public datasets

What path is right for you?

  • Community: if the information you need is available from your community, and you can get permission to use it.
  • Sensors: if you want your app to collect data on its own.
  • Public datasets: if you need lots and lots of data or you’re solving a problem that is more global.

COMMUNITY DATA

Plan: What type of data do you need from your community? How will you gather it?

  • Surveys
  • Take pictures
  • Make appointments
  • Record sounds

SENSOR DATA

Voice assistants like Siri or Alexa have a microphone, which is a sound sensor, to gather data (like what a person is saying).

This smart crosswalk has a light sensor to gather information about how light and dark it is and a motion sensor to detect movement.
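Here is a small sketch of how a program might combine those two sensor readings. It is not the real crosswalk's code. The function name, the 0 to 100 brightness scale, the threshold of 30, and the readings are all made up for illustration.

```python
# Hypothetical threshold: readings below this count as "dark".
DARK_THRESHOLD = 30  # made-up value on an assumed 0-100 brightness scale

def should_turn_on_lights(light_level, motion_detected):
    """Turn the lights on only when it is dark AND something is moving."""
    is_dark = light_level < DARK_THRESHOLD
    return is_dark and motion_detected

# Made-up readings standing in for real sensor data.
readings = [
    (80, True),   # bright, person walking
    (10, False),  # dark, nobody there
    (10, True),   # dark, person walking
]
for light_level, motion in readings:
    print(light_level, motion, should_turn_on_lights(light_level, motion))
```

Each sensor gives one piece of the picture. The decision only works when both pieces are put together.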

Different sensors gather different types of data!

Camera
Proximity sensor
Speedometer
Infrared thermometer
Light sensor
Microphone

HARDWARE OPTIONS

You will need devices to help you capture your data. Consider using:

Phone

Your phone has many sensors that can be used to gather information... Think mobile app!

Arduino

Small circuit board that can gather data and is built to use AI.

micro:bit

Small circuit board that can gather data and is built to use AI.

Raspberry Pi

Mini-computer on a single board that can run a full desktop operating system!

For more, check out this big list of sensors.

This video from our World Summit 2020 has lots of great information on hardware options!

PUBLIC DATASETS

You might want to use a public dataset if …

You want more complete data.

You want to use a lot of data.

You want accurate data.

WHERE TO FIND PUBLIC DATASETS

Here are some good places to start. (Click images to visit sites.)

Here are some links to sample datasets according to problem types:

WHERE ELSE COULD YOU LOOK?

You can try:

  • local organizations
  • local schools
  • local libraries
  • where else?

It’s not easy finding the perfect dataset. Remember to ask for help!

The activity below gives you practice finding a public dataset and using it to train a model with Machine Learning for Kids. The dataset has information about iris flowers, and the model will predict the type of iris based on the length and width of flower parts.

ACTIVITY: TRAIN A MODEL

Follow the video below to use a public dataset in an AI model.

In the next unit, we’ll use the iris AI model in a Scratch project!
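If you are curious how a prediction like this could work in code, here is a tiny sketch. It uses six rows in the style of the classic iris dataset and picks the type of the most similar flower (a "nearest neighbor" approach). This is an assumption chosen for illustration. Machine Learning for Kids may use a different method, and a real model would train on many more rows.

```python
import math

# A few sample rows in the style of the classic iris dataset:
# (sepal length, sepal width, petal length, petal width) in cm, plus the type.
TRAINING = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]

def predict(measurements):
    """Return the type of the training flower closest to the measurements."""
    nearest = min(
        TRAINING,
        key=lambda example: math.dist(example[0], measurements),
    )
    return nearest[1]

# Two made-up flowers to classify.
print(predict((5.0, 3.4, 1.5, 0.2)))
print(predict((6.0, 3.0, 4.6, 1.4)))
```

With only six examples, the predictions are easy to fool. This is one reason healthy datasets need lots of different examples.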

DO THE BEST YOU CAN!

You may not be able to gather the perfect dataset, but gather what you can. 

Proxy data is data that is a placeholder for a later version of your dataset. 

JUMP IN AND START PUTTING YOUR DATASET TOGETHER!

Check off these items to make sure you create a strong dataset!
