
Web Apps Using AI Models

  • Train a machine learning model using Python libraries in a Jupyter Notebook
  • Incorporate a trained machine learning model into a Streamlit web app

These are the activities for this lesson:

WORKING WITH MACHINE LEARNING MODELS

Another major feature of Jupyter Notebooks, Python, and Streamlit is the ability to train machine learning models and make predictions. 

If you are new to Artificial Intelligence,  you might want to view the AI lessons in this curriculum to learn the basics before jumping into the more advanced coding involved here. You can use a user-friendly machine learning model platform like Teachable Machine to make a model and still incorporate it into a Python web app. 

If you have had some experience with artificial intelligence and working with datasets using Jupyter Notebooks, this is a good next step for you. 

In this lesson, you will learn about some of the Python machine learning libraries, and some of the different machine learning models you can create using Python. 

To review, creating a machine learning model involves three main parts:

DATASET → FINDS PATTERNS WITH LEARNING ALGORITHM → PREDICTION!

The dataset is your input for the model. It can include text, images, sounds, or poses. We worked with text and numerical data in Unit 2 using Jupyter Notebooks. We will continue to work with text data in this lesson, in the form of a spreadsheet.

Finding patterns is essentially building the machine learning model from the dataset. Python has many libraries that can build an AI model from data. In this curriculum, we will use many of the functions from the scikit-learn package. In addition to the code libraries, the scikit-learn website contains lots of excellent information about machine learning and the process of building models. It is a great resource for learning more!

Once you have created your model, the model can be used to predict an outcome based on new information. Once again, Python supplies libraries that enable this.

PREPROCESSING DATA

Before your dataset can be sent to the algorithm for building the model, it must be preprocessed, or “cleaned” so that the model building algorithm can work with it and make the most accurate model possible. In fact, the bulk of the work for creating a machine learning model is in the preprocessing. You will need to look carefully at the data to decide what is important, what can be left out, and what needs to be cleaned up.

What does preprocessing involve? With a text-based dataset, here are some things to deal with.

 

Null Values

Sometimes the dataset contains blank or null values, especially if the data comes from a survey. One option is simply to eliminate any rows that contain null values.

However, if the number of samples is low, you may not want to eliminate them. Another option is to replace a null value with some other value. It could be zero, or it could be the average of all the other values for that field.
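For example, here is a minimal sketch of both options using pandas (the file name survey.csv and the age column are just placeholders):

```python
import pandas as pd

# Load a hypothetical survey dataset that has some missing values
df = pd.read_csv("survey.csv")

# Option 1: drop every row that contains a null value
df_no_nulls = df.dropna()

# Option 2: fill nulls in the "age" column with the average of the other values
df["age"] = df["age"].fillna(df["age"].mean())
```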

Outliers

Sometimes the data contains one or two samples that are very different from the rest of the data. This might skew the model. You don't want the outliers to affect how the model is created, so often outliers are eliminated from the dataset.

For example, you might have a dataset where 95% of the samples are people aged 10-30, but you have a few random samples where the people are over 50. Since the vast majority of samples are in the 10-30 age group, you can consider removing the samples from the older age group.
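A rough sketch of removing those outliers with pandas might look like this (again assuming a hypothetical survey.csv file with an age column):

```python
import pandas as pd

df = pd.read_csv("survey.csv")  # hypothetical dataset with an "age" column

# Keep only the samples in the main 10-30 age group;
# the few samples over 50 are treated as outliers and dropped
df = df[(df["age"] >= 10) & (df["age"] <= 30)]
```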

Standardization

Often the numeric ranges in a large dataset vary widely, depending on which features are represented. For example, age might vary from 0 to 70, while salary might range from 0 to 500,000! The scales are very different, so one feature could count for more than it should in the model.

To fix this, you can standardize the data so it's on a single scale. scikit-learn provides a StandardScaler, which rescales each feature so the mean is 0 and the standard deviation is 1.
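Here is a small sketch of what that could look like, assuming hypothetical age and salary columns:

```python
from sklearn.preprocessing import StandardScaler
import pandas as pd

df = pd.read_csv("survey.csv")  # hypothetical dataset with numeric columns

scaler = StandardScaler()
# Rescale the age and salary columns so each has mean 0 and standard deviation 1
df[["age", "salary"]] = scaler.fit_transform(df[["age", "salary"]])
```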

Another thing to consider is a balanced number of samples for each class or label. You want to make sure there are about the same number of samples for each class in your dataset.

Encoding

Machine learning algorithms work with numbers, not words, so it helps to convert all of the data to numbers. scikit-learn provides encoder functions, so you can easily convert a range of possible text values to a range of numbers.

An example could be activity levels, with sample values sedentary, light, moderate, high. Those responses could be encoded to be values 0, 1, 2, and 3, which is much easier for the model building algorithm to handle.
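A minimal sketch of that encoding, using scikit-learn's OrdinalEncoder with the category order spelled out:

```python
from sklearn.preprocessing import OrdinalEncoder
import pandas as pd

df = pd.DataFrame({"activity_level": ["sedentary", "light", "moderate", "high"]})

# Map the text values to 0, 1, 2, 3 in a meaningful order
encoder = OrdinalEncoder(categories=[["sedentary", "light", "moderate", "high"]])
df["activity_level"] = encoder.fit_transform(df[["activity_level"]]).ravel()

print(df["activity_level"].tolist())  # [0.0, 1.0, 2.0, 3.0]
```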

SPLITTING DATA

Once you have preprocessed your data, you need to split it into a training set and a testing set. The training set will be used to train and create the model. Then you will test your model using the test set to see how it performs.

There are standard ways to split it (usually 75% for training and 25% for testing), but you can split it however you want. Again, scikit-learn provides functions, so this is all automated for you.
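For example, scikit-learn's train_test_split function handles the split in one call (a sketch, assuming a hypothetical preprocessed stroke dataset with a stroke target column):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("stroke_data_clean.csv")   # hypothetical preprocessed dataset

X = df.drop(columns=["stroke"])             # feature columns
y = df["stroke"]                            # target (label) column

# Hold back 25% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
```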

CREATING THE MODEL

The next step is creating the model. A big decision to make is, which algorithm do I use? There are many different supervised learning algorithms to choose from, and it is hard to know which one to use. A good process is to try out several different algorithms and then see which one gives you the best accuracy. 

The first step is to decide whether you need a classification algorithm or a regression algorithm. That depends on what you are trying to predict. 

Classification algorithms are used to predict discrete targets or classes. For example, classifying email as spam or not spam would be a classification problem.

Regression algorithms are used to predict something that is along a continuous range. One example would be predicting how much salary a person will be paid. The prediction is a range of numbers and the output could be any value along that range. 

Here are just a few of the popular types of model creation algorithms.


Classification

  • Decision Tree
  • Random Forest 
  • K-Nearest Neighbors
  • Naive Bayes
  • Logistic Regression
  • Support Vector Machine

Regression

  • Linear regression
  • Ridge Regression
  • Lasso Regression
  • Polynomial Regression
  • Bayesian Linear Regression
  • Support Vector Regression

Note that some algorithms come in both flavors. For example, there is a Decision Tree Classifier as well as a Decision Tree Regressor, and a Support Vector Machine works for classification while Support Vector Regression works for regression. 

So, how do you decide which one to use? You can research what other data scientists use in different situations to see what might work for your model. You should also try out different algorithms with your data and find which one provides the best accuracy. You can also tweak the parameters of a particular algorithm to see if it returns a more accurate model. 

scikit-learn provides functions for all of these algorithms, so it’s straightforward to create the model. 
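As a rough sketch of that pattern, here is how you might train and compare two classifiers (using scikit-learn's built-in iris dataset as stand-in data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Use scikit-learn's built-in iris dataset just to illustrate the pattern
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Train two different classifiers on the same data and compare their accuracy
tree_model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
knn_model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

print("Decision tree accuracy:", tree_model.score(X_test, y_test))
print("K-nearest neighbors accuracy:", knn_model.score(X_test, y_test))
```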

EVALUATING THE MODEL

You want your model to be the best it can be, so you need to evaluate its performance. Two common variables in evaluating a model are bias and variance. 

Bias is the difference between the model’s predicted value and the correct value. 

Variance is how much the predictions change when different data is used. 

You want to achieve a good balance between bias and variance.

High bias -> underfitting.

Underfitting happens when the model is too simple to capture the patterns in the training data. This can happen if there is not enough data, not enough features (columns), or too much noise in the data. If a model doesn’t perform well on either the training data or the testing data, that signals underfitting.

High variance -> overfitting.

Overfitting happens if you train a model on one set of data and it performs very well, but if you then present it with new data, it does not perform well at all. This can happen if the model is overly complex and it tries to fit too closely to the training data. The model may predict very well on the training data, but then performs poorly on testing data.

One technique to check the performance of a model is cross-validation.

Cross-validation means training your model several times, using a different split of training/testing data each time. Your dataset is split into several folds, or subsets. One fold is held out as the validation or test set, and the remaining folds are used to train the model. This is repeated several times, so the training and test sets change each time. 
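scikit-learn's cross_val_score function handles the folding for you. Here is a small sketch using the built-in iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Split the data into 5 folds; each fold takes one turn as the test set
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=5)

print("Accuracy for each fold:", scores)
print("Average accuracy:", scores.mean())
```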

 

scikit-learn also provides a metrics library so you can easily get performance scores for your models. 

  • accuracy score = correct predictions/total predictions
  • precision = true positives/(true positives + false positives)
  • recall = true positives/(true positives + false negatives)
  • F1 score = (2 x precision x recall)/(precision + recall)
  • specificity = true negatives/(true negatives + false positives)
  • confusion matrix  – shows true positive, true negative, false positive, and false negative counts

By checking metrics from various algorithms, you can choose the best model. 
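As a small illustration of those metrics functions (using tiny made-up labels rather than a real model's output):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Tiny made-up example: true labels vs. a model's predictions (1 = at risk, 0 = not)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("Accuracy: ", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1 score: ", f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
```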

PREDICTING!

Once you have a model that you are satisfied with, you then want to use it in your app. 

It is common practice to do the preprocessing, creation, and evaluation of your model using Python, in an environment like Jupyter Notebooks. From there, you can export your model as a file.

Then, within your Streamlit app, you can load the model and use it to make predictions. 
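A rough sketch of that last step might look like the following; the file name, features, and widgets here are assumptions for illustration, not the exact app from the activity video:

```python
import pickle
import streamlit as st

# Load the model that was exported from the notebook, for example with:
#   pickle.dump(model, open("stroke_model.pkl", "wb"))
model = pickle.load(open("stroke_model.pkl", "rb"))

st.title("Stroke Risk Predictor")
age = st.number_input("Age", min_value=0, max_value=100, value=30)
glucose = st.number_input("Average glucose level", min_value=0.0, value=100.0)

if st.button("Predict"):
    # The feature order here must match the columns the model was trained on
    prediction = model.predict([[age, glucose]])
    st.write("Predicted stroke risk:", int(prediction[0]))
```

You would launch the script with the streamlit run command, and Streamlit renders the inputs and the prediction in the browser.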

In this lesson’s activities, you will go through this entire process using a stroke risk dataset. You will see how to preprocess the data, create models using different algorithms, and then use a model in a simple Streamlit app to predict the risk of stroke, given some input characteristics. 

ACTIVITY 1: TRAIN AN AI MODEL IN JUPYTER NOTEBOOK

Estimated time: 60 minutes

Explore a Stroke Risk Dataset to build an AI model

Follow this video to:
  1. Download a stroke prediction dataset from Kaggle.
  2. Work with the data in a Jupyter Notebook to:
    • Review the data
    • Preprocess the data to prepare it for the model
    • Create some different models
    • Evaluate and choose a model for your app
    • Export the model
Download Notebook

CHALLENGE

Try out a different model than the ones in the Jupyter Notebook. 

  1. Research the scikit-learn website for other classification algorithms, and look at other model-building examples on Kaggle.
  2. Choose one algorithm and add the code to your notebook to create the model.  
  3. Use the scikit-learn metrics to check for accuracy. 

How does your model perform? Is it better than any of the other algorithms in the notebook?

ACTIVITY 2: BUILD A PREDICTION APP

Estimated time: 45 minutes

Use your model in a Streamlit App

Follow this video to make a web app that will make a stroke risk prediction based on user input.

CHALLENGE

The Iris dataset is a classic dataset that classifies iris flowers into 3 species (setosa, versicolor, and virginica) based on petal and sepal dimensions.

  1. Do some research on the dataset to learn its features and targets. 
  2. You can download the dataset and create your own model or use this model (pickle file) created using K-nearest neighbors. Note that no scaler was needed for this dataset.  The pickle file contains just the model.
  3. Import the model and create a Streamlit app to predict the iris species based on the four dataset features.

TECHNOVATION INSPIRATION

Here are some pretty amazing examples from Technovation participants who used Python and Streamlit to build web apps incorporating machine learning models. 

REFLECTION

You have gone through the entire process of preprocessing a dataset, building several models, and evaluating and choosing one to use in an app.  That is A LOT to learn in one lesson!

What type of model do you think you will need for your solution - classification or regression?
What parts did you find difficult to understand?
What steps can you take now to learn more so you can build your solution?

REVIEW OF KEY TERMS

  • Preprocessing – taking a dataset and making sure the data in it is suitable to train a machine learning model with
  • Classification algorithm – an algorithm used to train a machine learning model that will classify or predict discrete values
  • Regression algorithm – algorithm used to train a machine learning model to predict a value on a continuous range
  • Bias – the difference between the model’s predicted value and the correct value, due to incorrect assumptions that simplify the model
  • Variance – the amount of variability in a model’s predictions; high variance means the model is unable to generalize when faced with new data
  • Overfitting – when the model fits the training data so closely that it cannot predict well on new data, caused by high variance in the model
  • Underfitting – when a model is simplified too much and does not perform well on either training or testing data, caused by high bias or assumptions in the model

ADDITIONAL RESOURCES

Machine Learning 

Streamlit


Web Apps: Diving In

  • Make a web app that displays images and plays sounds
  • Learn how to make data graphs in Python using Jupyter Notebooks
  • Make a data dashboard web app with Streamlit

STREAMLIT

Making a web app with Streamlit and Python is straightforward. Like block-based coding platforms, the Streamlit platform includes many components and widgets that can be added to your app with a single line of code. Most of the code is already written and packaged for you, so you can focus on the goals of your app, rather than getting bogged down in lots of difficult code. 
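For example, the core of a sound-playing page can be just a few lines (a minimal sketch; the image and sound file names are placeholders, not the actual activity assets):

```python
import streamlit as st

st.title("Mini Soundboard")

st.image("person1.jpg")        # hypothetical image file in the app folder
if st.button("Play speech"):   # a single line adds a working button
    st.audio("speech1.mp3")    # hypothetical sound file
```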

To practice using Streamlit, we will take one of our mobile app examples from Thunkable and App Inventor, and show how the same app can be built in web app form using Streamlit. 

ACTIVITY 1: SOUNDBOARD APP

Estimated time: 30 minutes

Code a Streamlit Web App

  1. Click button below to download assets (images and sound files) that will be needed to make the app.
  2. Follow this video to create a simple Soundboard app that plays sounds when a button is pressed.
    If you haven't already installed Python and Streamlit, complete the Exploring Web App Builders Activity before doing this activity.
  3. Add a fourth person to your app. Find an image and short sound file to add to the app. Here is a link to some famous speeches.
Download asset files

WORKING WITH DATA

The Python language works well with data. Python has many libraries specifically made to allow coders to read, manipulate, and plot data. When combined with the Streamlit platform, coders can easily make apps that analyze and display data for users. And you can take the next step to incorporate datasets and machine learning models into an app.


Most programmers and data scientists work with data in Python using software called notebooks. One of the most popular notebook interfaces is Jupyter Notebook. According to the Kaggle Survey 2022 results, Jupyter Notebooks are the most popular data science interactive development environment (IDE), used by over 80% of respondents. 

Jupyter Notebook runs in a browser, although there are other interfaces. For example, it can integrate directly in Visual Studio Code. 

The engine behind the notebook that runs the code is called a kernel. For Python, you will use the IPython kernel. 

The notebooks allow you to write text as well as Python code. Text is written using a markdown language, with simple commands to format the text. It’s a good way to add headers and explanations of the code included in the notebook.

You can also execute Python code directly in the notebook. 
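For instance, a notebook might pair a markdown cell with a code cell like this (a small sketch with a hypothetical dataset file):

```python
# A markdown cell might contain formatted text such as:
#   ## Exploring the Survey Data
#   This section loads the dataset and shows the first few rows.

# The code cell below it can then be run on its own:
import pandas as pd

df = pd.read_csv("survey.csv")   # hypothetical dataset file
df.head()                        # the notebook displays the first five rows
```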


Similar to a physical notebook you might use in school, Jupyter Notebooks are a great way to take notes, organize your thoughts with a data project, and explore information. The added feature of executing code allows you to experiment with Python code in a controlled and organized way. 

Jupyter Notebooks help you to plan out and test different aspects of your web app before you jump into a code editor like Visual Studio Code to build the actual app.

PYTHON LIBRARIES

There are many libraries that you will need to use in your code to build your web app. A library is a collection of pre-written code that performs particular tasks. Libraries are very powerful and let your app do complex things with just a few lines of code. 

For Python, most libraries require that you first install them on your computer, then in your Python script file, you will import the libraries that you need. 

An example of the libraries you will need for using data are numpy and pandas.

Pandas allows your app to work easily with large amounts of data. It puts the data into something called a dataframe, and your app works with the dataframe. Numpy has many functions for performing numerical operations on the data in the dataframe.
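A minimal sketch of that workflow, assuming a hypothetical survey.csv file with an hours_per_day column:

```python
import pandas as pd
import numpy as np

# Read a hypothetical survey file into a dataframe
df = pd.read_csv("survey.csv")

print(df.head())                       # show the first five rows
print(np.mean(df["hours_per_day"]))    # numpy calculates the mean of one column
```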

In addition, there are many plotting and graphing libraries, which allow users to visualize the data. The most popular visualization libraries in Python are matplotlib, plotly, and seaborn.
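A quick plot from that same hypothetical dataframe might look like this, using matplotlib through pandas:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("survey.csv")   # hypothetical dataset

# Draw a histogram of one numeric column
df["hours_per_day"].plot(kind="hist", title="Listening hours per day")
plt.show()
```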

The following activity will use all of these libraries. The activity will take a dataset from a music and mental health survey to create a data dashboard app that displays the data in different ways for the user to interact with.

ACTIVITY 2: DATA DASHBOARD

Estimated time: 90 minutes

Build a Streamlit Web App

  1. Follow this video (Part 1) to install and run the Jupyter Notebook. Click link below to download the notebook file.
  2. Follow this video (Part 2) to build the data dashboard app using Streamlit.
  3. Your turn: Add one more graph to your data dashboard. You can choose from some of the other graphs from the original Jupyter Notebook, or make a new plot/graph in the Jupyter Notebook, and then integrate the code to Streamlit to add it to the dashboard.
Download Notebook

REFLECTION

Congratulations, you have made two web apps in Streamlit! Ask yourself these questions:

Did you run into any issues installing or running Jupyter Notebook or Streamlit?
How did you overcome issues when you ran into them?
How can you use the ideas from this lesson in your project?

REVIEW OF KEY TERMS

  • Jupyter Notebook – popular data science interactive development environment to work with data through Python coding
  • Kernel – a process that runs and acts as the engine behind Jupyter Notebooks
  • Markdown language – a language that allows you to format text easily so it is more readable
  • Library – collection of pre-written code that performs particular tasks

ADDITIONAL RESOURCES

Jupyter Notebooks

 

Streamlit


Exploring Web App Builders

  • Learn about web apps and how they differ from mobile apps
  • Learn about different options for coding and building web apps
  • Install the necessary software to build a web app

These are the activities for this lesson:

WEB APPS

For your Technovation project, you have the option of building a mobile app or a web app.

If you have participated in Technovation before and are looking for a new challenge, or you are a new participant with prior coding experience, you might consider making a web app for your Technovation project.

Many participants will opt for building a mobile app, with one of our suggested app builders, App Inventor or Thunkable. If you are new to coding or know you want to use App Inventor or Thunkable to code your app, you can skip this lesson! 

Making a web app involves text-based coding and is more advanced than block-based coding with App Inventor or Thunkable. 

Let’s start with a review of the difference between mobile apps, web apps, and progressive web apps. 

Mobile App

  • a program that runs natively on the phone
  • downloaded and installed on the device 
  • can access the phone’s features, such as GPS and camera
  • platform-specific (iOS or Android) 
  • coded with particular languages to match the operating system

Web App

  • looks a lot like a mobile app
  • runs in an internet browser
  • not native to a particular device (iOS or Android) 
  • is generally coded with HTML, CSS, Javascript and Python
  • cannot run when offline

Progressive Web App

  • special type of web app that is a hybrid between a mobile app and web app
  • runs in a browser
  • can also be installed on the mobile device like a regular mobile app
  • can run even when user is offline

Note that a web app differs from a website. A website is static, coded using HTML and CSS. Web apps are dynamic and changing, based on user input and other external interactions.  For your Technovation project, a website is not acceptable.

We will explore some beginner options for creating web apps.

One big difference from the app building platforms we cover for mobile apps is that you will create a web app using a text-based programming language instead of a block-based language. 

There are two main languages that are used to create web apps.

JAVASCRIPT


Javascript, or JS, is a scripting language. This means that the code is executed at runtime instead of being compiled ahead of time, as a mobile app is. It’s like an actor running through her script each time the show runs.

Javascript is often combined with HTML and CSS to make websites. HTML and CSS are used to make static websites, which can present information but do not change. Javascript adds interactivity and the ability for the website to change and update based on external factors. And a dynamic, interactive website is essentially a web app.

PYTHON


Python is a very popular general-purpose programming language. Python is both a programming language and a scripting language, so it can be compiled to run but also can execute at runtime.

It is seen as a straightforward, versatile language that is accessible for new coders. It is used in many different aspects of software development. One area is web development. Another is machine learning. So, Python is a great option for learning and developing more advanced AI web apps. 

In this curriculum, we will focus on making web apps with Python. To easily build a web app using Python, we’ll use a framework called Streamlit.  Streamlit allows you to build powerful, interactive web apps with little code.  It specializes in apps involving data, and allows you to easily use Python machine learning libraries to incorporate AI in your apps. 
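For instance, a complete Streamlit app can be just a few lines of Python (a minimal sketch; save it as a file and start it with the streamlit run command):

```python
import streamlit as st

st.title("Hello, Technovation!")

number = st.slider("Pick a number", 0, 10)
st.write("Your number squared is", number ** 2)
```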

Here are some main features.

  • Good option for people who have lots of blocks-based coding experience and are looking for a new challenge
  • Good for people who have done some text-based coding
  • Very versatile language used widely
  • You will need to install software on your computer
      • Python and associated libraries
      • A code editor
  • Streamlit does have an option to run in the browser, using GitHub
      • We won’t cover this option in this curriculum
  • You can use AI with it
      • Most popular language for building and using machine learning models
      • We’ll use Jupyter Notebooks in this curriculum for model building

GETTING STARTED

To code web apps for your Technovation project, you will need:

  • a computer or laptop
  • Internet access

You should be somewhat comfortable using the Terminal window on a Mac or Linux, and either Terminal or the Command Prompt on Windows. If you do not have experience, or don't even know what the Terminal window is, check out these beginner videos before moving on to the activity.

Watch the appropriate video for your operating system.

Using the Terminal


ACTIVITY: GET STARTED WITH PYTHON

Estimated time: 45 minutes

INSTALL SOFTWARE & CODE STARTER APP

NOTE: Following the instructions below, you might run into some issues, so be patient and be prepared to troubleshoot as you go!

  1. Step 1: Install Python. Here is a good set of instructions for Windows, Mac, or Linux.
  2. Step 2: Install a code editor and Streamlit. This video shows you how to install Visual Studio Code and then install Streamlit to run it from the code editor.
  3. Step 3: Run a very simple Streamlit web app in the VS Code environment by following this video.
  4. Celebrate by taking a screenshot of your first web app and sending it with a note to your mentor!

REVIEW OF KEY TERMS

  • Web App –  application that looks like a mobile app but runs in an internet browser and is coded using HTML, CSS, and Javascript or Python

REFLECTION

Congratulations on trying out some text-based coding! Here are some reflection questions for you to consider with your team and with your mentor.

Did you find any challenges installing and/or working with a text-based language?
How did you overcome the challenges?

ADDITIONAL RESOURCES

You’ll need to refer to documentation and support for help working with Python & Streamlit. Below are some good places to start.