# TINY MACHINE LEARNING LESSON 7

## TOPICS INDEX

- **Warnings**
- **Copyright Notice**
- **First Step for the creation of the Trained Model “Hello World”**
- **The Training Dataset**
- **PAI-015: The first steps in Google Colab**
- **PAI-016: Realization of the “Hello World” Dataset on Colab**
- **PAI-017: Data for Training, Data for Validation and Data for Tests**
- **PAI-018: Completing Datasets for Model Training**

## Warnings

With regard to safety: the projects run on a very low voltage supplied by the PC's USB port or by batteries or power supplies with a maximum output of 9V, so there are no particular electrical risks. However, a short circuit caused during the exercises could damage the PC or the furnishings and, in extreme cases, even cause burns. For this reason, every time a circuit is assembled or modified, do so with the power disconnected, and at the end of each exercise disconnect the circuit by removing both the USB cable connected to the PC and any batteries or external power connectors. In addition, again for safety reasons, it is strongly recommended to build the projects on an insulating, heat-resistant mat, available in any electronics store or from specialized websites.

At the end of the exercises it is advisable to wash your hands, as electronic components may carry processing residues that could be harmful if ingested or if they come into contact with eyes, mouth, skin, etc. Although the individual projects have been tested and are safe, anyone who decides to follow this document assumes full responsibility for what may happen while carrying out the exercises it describes. For younger children and/or those new to Electronics, it is advisable to perform the exercises with the help and in the presence of an adult.

## Copyright Notice

*All trademarks are the property of their respective owners. Third-party trademarks, product names, trade names, corporate names and companies mentioned may be trademarks of their respective owners or registered trademarks of other companies, and are used purely for explanatory purposes and to the benefit of their owners, with no intent to infringe applicable copyright. The content of this document is the property of Roberto Francavilla and is protected by Italian and European copyright law; any texts taken from other sources are likewise protected by copyright and remain the property of their respective owners. All the information and contents (texts, graphics, images, etc.) reported here are, to the best of my knowledge, in the public domain. If material subject to copyright or in violation of the law has been published unintentionally, please notify me by email at **info@bemaker.org** and I will promptly remove it.*

## Roberto Francavilla

## First Step for the creation of the Trained Model "Hello World"

In Lesson 6 we looked at the sine function, sin(x) (written sen(x) in Italian). Setting aside the fact that a mathematical function already computes the sine exactly, we will try to have our microcontroller reproduce it using TinyML.

We can also use this function as a PWM signal to gradually and alternately light two LEDs, one green and one red, to indicate when we are in the positive half-wave (green) and when we are in the negative half-wave (red).
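The mapping from a sine value to the two LEDs can be tried out in Python before writing it for the microcontroller. This is a hypothetical sketch: the function name `sine_to_led_duty` and the 8-bit duty-cycle range 0 to 255 (common on microcontroller PWM outputs) are assumptions, not code from this course.

```python
import math

def sine_to_led_duty(y, max_duty=255):
    """Hypothetical helper: map a sine value to (green, red) PWM duty cycles.

    Positive values drive the green LED, negative values the red LED;
    brightness is proportional to the magnitude of y.
    """
    y = max(-1.0, min(1.0, y))          # clamp: a model's predictions may slightly exceed [-1, 1]
    duty = int(round(abs(y) * max_duty))
    if y >= 0:
        return duty, 0                  # positive half-wave: green on, red off
    return 0, duty                      # negative half-wave: red on, green off

# Walk through one period in eight steps
for step in range(8):
    angle = step * 2 * math.pi / 8
    print(f"{angle:4.2f} rad -> (green, red) = {sine_to_led_duty(math.sin(angle))}")
```

Clamping the input is a design choice: a trained model only approximates the sine, so its output could fall just outside the range from -1 to +1.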

For now, though, let's focus on building our machine learning model.

As we also saw in Lesson 6, a Machine Learning model is built in phases. The first phase I want to tackle is building the Dataset.

## The Training Dataset

The Dataset is the set of data used for the initial training of the Model.

Naturally, the Dataset depends on what we want our model to learn to predict. For example, to train a model to make weather forecasts, the Dataset must contain values of pressure, temperature and humidity (perhaps measured daily over several years and in different seasons) together with the weather that actually followed. In this way the model can learn how to make weather forecasts.

Now, going back to our Hello World ML model: as I wrote in Lesson 6, in this case the dataset is built by the Machine Learning programmer.

So let's get to work. First, I need to introduce a very useful tool that will help us a lot in building our Dataset.

This tool is called **Jupyter Notebooks**, and Google offers it for free in the cloud as **Google Colab**. To use it you need a Google account, which I recommend anyway: thanks to Google Drive we can also store our projects in the cloud and even share them.

Colab is a particularly practical, easy-to-use development environment, and it also lets us check the state of our Dataset visually, through plots.

The programming language used in Colab is Python, but don't worry: it's all easy, and I'll guide you step by step.

## PAI-015: The first steps in Google Colab

It is not my intention to give a course on Colab, so don't worry. My goal is to give you the basic tools to start using Colab right away. Colab is a very powerful and interesting tool, and there are many tutorials that explain how it works and how to use it, so I leave any deeper study to those. Here I only intend to explain, step by step, how to build our Dataset, and along the way how to use Colab in a simple, effective way.

To get started, once you have created your Google account, log in to Colab by clicking on **the link**

The welcome window appears:

Click **File** (top left) and then **New notebook**.

You now have a notebook page where you can enter code or text, and also images, videos, etc. We don't need any of that: we will only use Colab to test our initial Dataset.

The notebook is organized into cells. Cells that contain program code can be run by clicking the “play” arrow at the top left of the cell; once the code has run, a check mark appears next to it.

Let's take a practical example: we load the math and plotting libraries and draw the sine function. To do this, we write in the code cell:

```python
import numpy as np
import math
import matplotlib.pyplot as plt
```

With the instructions above we tell Python (which runs our code in Colab) to load three libraries: two for mathematics (“numpy” and “math”) and one for graphics (“matplotlib.pyplot”), used to plot the results. The “import” statement loads a library, and the “as” keyword gives it an alias: from then on, for example, the “numpy” library is referred to as “np”. In essence, “as” lets us assign names that are easier to remember and quicker to type. The names “np” and “plt” are arbitrary, although they are the conventional choice.
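A quick way to confirm that an alias is only a second name for the same library: the sketch below imports numpy both with and without “as” and checks that the two names refer to the same module object.

```python
import numpy
import numpy as np  # the same library, under a shorter alias

# Both names refer to the very same module object...
print(np is numpy)          # True
# ...so the functions they expose are the same objects too
print(np.sin is numpy.sin)  # True
```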

Then we add a new code cell and write in it:

```python
x_values = np.linspace(0, 2 * math.pi, 1000)
y_values = np.sin(x_values)
plt.plot(x_values, y_values, 'r')
```

In the first line we build an array (i.e. a vector) of 1000 evenly spaced values ranging from 0 to 2π, both endpoints included (since π ≈ 3.14, 2π ≈ 6.28).

This value is expressed in radians and corresponds to a full turn, i.e. 360°.
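To see what np.linspace produces, it helps to use only a few points. The sketch below uses 5 points instead of 1000 and converts the angles to degrees with np.degrees, which applies degrees = radians × 180/π.

```python
import math
import numpy as np

# 5 evenly spaced angles from 0 to 2*pi; linspace includes both endpoints by default
angles = np.linspace(0, 2 * math.pi, 5)
print(angles)              # approximately 0, pi/2, pi, 3pi/2, 2pi
print(np.degrees(angles))  # approximately 0, 90, 180, 270, 360

# With 1000 points there are 999 gaps, each 2*pi/999 wide
x_values = np.linspace(0, 2 * math.pi, 1000)
print(x_values[1] - x_values[0])  # about 0.00629 rad
```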

In the second line we create another array of the same size as x_values, whose elements are the sine of the corresponding angles in radians. In essence, we have built an array with the values of the sine for angles from 0 to 360°.
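np.sin works on a whole array in a single call (it is “vectorized”). The sketch below checks this against a plain Python loop with math.sin, using four angles whose sines we know: 0, π/2, π and 3π/2.

```python
import math
import numpy as np

angles = np.array([0, math.pi / 2, math.pi, 3 * math.pi / 2])

# One call computes the sine of every element
vectorized = np.sin(angles)

# The same result, one element at a time, with Python's math module
looped = np.array([math.sin(a) for a in angles])

print(vectorized)                       # approximately [0, 1, 0, -1] (tiny rounding errors near 0)
print(np.allclose(vectorized, looped))  # True
```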

In the third line we plot the input data (angles in radians) against the output data (sine values) as a red line (“r”). Clicking the play button of the two cells, we get:

… a wonderful sine wave….

## PAI-016: Realization of the "Hello World" Dataset on Colab

Click **File** (top left) and then **New notebook**.

… and in the first cell of Colab we copy the code below:

```python
# TensorFlow is an open source machine learning library
%tensorflow_version 2.x
import tensorflow as tf
# Numpy is a math library
import numpy as np
# Matplotlib is a graphing library
import matplotlib.pyplot as plt
# math is Python's math library
import math
```

Lines beginning with “#” are comments.

The line “%tensorflow_version 2.x” is a Colab “magic” command rather than Python: it asks Colab to use TensorFlow 2. Current versions of Colab include only TensorFlow 2, so there this line is no longer needed.

The line “import tensorflow as tf” loads the TensorFlow library (already installed in Colab) and renames it “tf”.

The remaining lines are the same as in the previous example: the “numpy” and “matplotlib.pyplot” libraries are imported and renamed “np” and “plt”, while “math” keeps its own name. As a reminder, these are the libraries for mathematics and for plotting results.

At this point we run the code in the cell by clicking the play button.

Our goal is to train a model that, given an input angle expressed in radians, predicts the sine of that angle. It follows that our Dataset must be built from known sine values. From our study of the sine function, we know that it is periodic: it repeats after every period, which corresponds to one full turn, i.e. 360° (2π radians).

So, as the angle goes from 0 to 2π, the sine starts at 0, rises to 1 (at π/2), returns to 0 (at π), falls to -1 (at 3π/2) and finally returns to 0 (at 2π). After that, the values of the function repeat.
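The periodicity can be checked numerically: shifting every angle by one full turn (2π) leaves the sine unchanged, up to floating-point rounding. The sketch also prints the key points of one period.

```python
import math
import numpy as np

x = np.linspace(0, 2 * math.pi, 9)   # one period, in steps of pi/4

# sin(x + 2*pi) equals sin(x): the function repeats every full turn
print(np.allclose(np.sin(x + 2 * math.pi), np.sin(x)))  # True

# Key points at 0, pi/2, pi, 3pi/2, 2pi: expected 0, 1, 0, -1, 0
print(np.round(np.sin(x[::2]), 6))
```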

To build this Dataset we add another cell in Colab and write the following program lines:

```python
# We'll generate this many sample datapoints
SAMPLES = 1000

# Set a "seed" value, so we get the same random numbers each time we run this
# notebook. Any number can be used here.
SEED = 1337
np.random.seed(SEED)
tf.random.set_seed(SEED)

# Generate a uniformly distributed set of random numbers in the range from
# 0 to 2π, which covers a complete sine wave oscillation
x_values = np.random.uniform(low=0, high=2*math.pi, size=SAMPLES)

# Shuffle the values to guarantee they're not in order
np.random.shuffle(x_values)

# Calculate the corresponding sine values
y_values = np.sin(x_values)

# Plot our data. The 'b.' argument tells the library to print blue dots.
plt.plot(x_values, y_values, 'b.')
plt.show()
```

Let's go through what we wrote, line by line….

```python
SAMPLES = 1000
```

The variable “SAMPLES” sets the number of values (samples) we want to store in the input and output arrays. In this case we store 1000 samples.

```python
SEED = 1337
```

The variable “SEED” fixes the “seed” of the random number generator. Fixing the seed means that the same random numbers are generated every time you run the Colab notebook, so the results are reproducible. The value 1337 is arbitrary (I kept 1337 because it is the value used in the public Hello World example that you can find on GitHub).

```python
np.random.seed(SEED)
```

The function “np.random.seed(SEED)” initializes NumPy's random number generator with the given seed. If no seed is set, NumPy seeds the generator from a source of system randomness (or from the clock), so each run produces different numbers.
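The effect of the seed is easy to verify: reseeding NumPy's generator with the same value reproduces exactly the same sequence, while continuing without reseeding gives new numbers.

```python
import numpy as np

SEED = 1337

np.random.seed(SEED)
first_run = np.random.uniform(low=0, high=1, size=5)

np.random.seed(SEED)   # reset the generator to the same starting point
second_run = np.random.uniform(low=0, high=1, size=5)

# Without reseeding, the generator simply continues its sequence
third_run = np.random.uniform(low=0, high=1, size=5)

print(np.array_equal(first_run, second_run))  # True
print(np.array_equal(second_run, third_run))  # False: different numbers
```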

```python
tf.random.set_seed(SEED)
```

The function “tf.random.set_seed(SEED)” sets TensorFlow's global random seed. TensorFlow has its own random number generator, separate from NumPy's (it is used, for example, when the model's weights are initialized), so it needs its own seed value too.
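The fact that each library keeps its own generator is why the notebook seeds NumPy and TensorFlow separately. So that this sketch runs without TensorFlow, it shows the same principle with Python's built-in random module standing in for TensorFlow: seeding NumPy leaves the other generator untouched.

```python
import random
import numpy as np

random.seed(42)
a = random.random()

random.seed(42)
np.random.seed(1337)   # seeding NumPy...
b = random.random()    # ...does not change Python's own generator

print(a == b)  # True: the random module ignored np.random.seed
```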

```python
x_values = np.random.uniform(low=0, high=2*math.pi, size=SAMPLES)
```

Once the generator is initialized, the function “np.random.uniform(low=0, high=2*math.pi, size=SAMPLES)” generates 1000 random numbers uniformly distributed over the range from 0 to 2π (0 is included, 2π is excluded) and stores them in the array named “x_values”.
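A quick check of what np.random.uniform returns: an array of SAMPLES elements, all at least 0 and below 2π (NumPy documents the upper limit as excluded, apart from rare floating-point rounding).

```python
import math
import numpy as np

SAMPLES = 1000
SEED = 1337
np.random.seed(SEED)

x_values = np.random.uniform(low=0, high=2 * math.pi, size=SAMPLES)

print(x_values.shape)                  # (1000,)
print(x_values.min(), x_values.max())  # both within the range 0 to 2*pi
print(x_values[:5])                    # the first few angles, in random order
```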

```python
np.random.shuffle(x_values)
```

The function “np.random.shuffle(x_values)” randomly reorders the elements of the array, in place. The reason is that the training process used in deep learning depends on the order in which the data are fed to it: data in a truly random order generally lead to a more accurate model than data sent in an ordered sequence. Values produced by np.random.uniform are already in random order, so here the shuffle is an extra guarantee, as the comment in the code says.
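np.random.shuffle works in place: it returns None and rearranges the existing array, keeping exactly the same elements. A small ordered array makes this visible.

```python
import numpy as np

np.random.seed(1337)
values = np.arange(10)           # 0, 1, ..., 9 in order

result = np.random.shuffle(values)

print(result)                    # None: the array itself was modified
print(values)                    # the same ten numbers, in a random order
print(np.array_equal(np.sort(values), np.arange(10)))  # True: nothing lost or duplicated
```

Because the return value is None, writing `x_values = np.random.shuffle(x_values)` would replace the data with None; the call is meant to stand on its own line, as in the notebook.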

```python
y_values = np.sin(x_values)
```

The function “np.sin(x_values)” computes, element by element, the sine of each angle in “x_values” and returns an array of the same size containing the results.

```python
plt.plot(x_values, y_values, 'b.')
plt.show()
```

These instructions plot the values contained in the two arrays x_values and y_values. The format string “b.” means blue (“b”) dots (“.”) with no connecting line; “plt.show()” displays the figure.

At this point we have built our training data. However, real measurements are never perfectly clean, so to simulate data collected from the real world we add a small random deviation to each sine value. In jargon, we say that we add noise to the data. This makes the data more realistic, which should help the model learn the general shape of the curve rather than memorize exact values.

To do this, we add a small random value to each element of the array containing the sine values (the random value can also be negative, so some values actually decrease).

In terms of code, we add another cell and write the following program:

```python
# Add a small random number to each y value
y_values += 0.1 * np.random.randn(*y_values.shape)

# Plot our data
plt.plot(x_values, y_values, 'r.')
plt.show()
```

Let's analyze the instruction “y_values += 0.1 * np.random.randn(*y_values.shape)”. The expression “y_values.shape” is the shape of the array, here the tuple (1000,), and the “*” unpacks it, so “np.random.randn(*y_values.shape)” is the same as “np.random.randn(1000)”: it generates one random number for each element of y_values. The sine values themselves are not used to generate the noise. The numbers produced by “np.random.randn” follow a standard normal (Gaussian) distribution with mean 0 and standard deviation 1, so they are not limited to the range from -1 to +1. Multiplying them by 0.1 gives noise with a standard deviation of 0.1, which is then added to the corresponding element of y_values.

In this way we obtain a set of points that still follows the shape of the sine wave but no longer lies exactly on it: about 68% of the points are within ±0.1 of the true sine value, and nearly all of them are within ±0.3.
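The properties of the noise can be measured directly. The sketch below rebuilds the sine values, shows the shape tuple that “*” unpacks, and checks that the noise has a standard deviation close to 0.1 and that about 68% of it falls within ±0.1.

```python
import math
import numpy as np

np.random.seed(1337)
x_values = np.random.uniform(low=0, high=2 * math.pi, size=1000)
y_values = np.sin(x_values)

# y_values.shape is the tuple (1000,); the * unpacks it, so this is randn(1000)
print(y_values.shape)
noise = 0.1 * np.random.randn(*y_values.shape)

print(noise.std())                      # close to 0.1
print(np.mean(np.abs(noise) <= 0.1))    # roughly 0.68
print(np.abs(noise / 0.1).max() > 1)    # True: randn itself is not limited to [-1, 1]

y_noisy = y_values + noise              # same effect as y_values += noise
```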

Then we plot the values again, this time in red.

This is how we created our Dataset, but it is not yet complete.

## PAI-017: Data for Training, Data for Validation and Data for Tests

At this point we need to introduce another important concept to complete our Dataset: for the ML model to be trained and evaluated properly, the Dataset must be divided into three parts: **Data for Training**, **Data for Validation** and **Data for Tests**. The training data are used to fit the model; the validation data are used to check how well the model is doing during training; the test data are kept aside until the end, to evaluate the final model on data it has never seen.

A commonly used split is:

- 60% data for Training
- 20% data for Validation
- 20% data for Tests
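Turning the percentages into numbers of samples takes one line each. The helper below (split_sizes is a hypothetical name, not part of any library) uses the same int(...) rounding as the code later in this lesson; since int() rounds down, any leftover samples end up in the last part.

```python
def split_sizes(samples, train=0.6, test=0.2):
    """Hypothetical helper: number of samples in the train, test and validation parts."""
    train_end = int(train * samples)             # index where the test data start
    test_end = int(test * samples + train_end)   # index where the validation data start
    return train_end, test_end - train_end, samples - test_end

print(split_sizes(1000))  # (600, 200, 200)
print(split_sizes(20))    # (12, 4, 4)
print(split_sizes(1001))  # (600, 200, 201): the remainder goes to validation
```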

To see how to divide a set of data according to our needs, let's first look at an elementary example that uses a function from the “numpy” library called **np.split**.

Click **File** (top left) and then **New notebook**, and write the following code:

```python
import numpy as np

Campioni = 20   # number of samples ("campioni" in Italian)
Seme = 1337     # random seed ("seme" in Italian)
np.random.seed(Seme)

x_values = np.random.uniform(low=0, high=6.28, size=Campioni)
print("Array X=", x_values)

y_values = np.sin(x_values)
print("Array Y=", y_values)
```

With the code above, using instructions and functions we have already seen, we create an array of 20 random values between 0 and 6.28. We then build a second array of 20 elements, each equal to the sine of the corresponding element of x_values.

We add another cell and write the following code:

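```python
# Index where the training data end ("dati di addestramento" = training data)
Dati_Addestramento = int(0.6 * Campioni)
# Index where the test data end: training part + 20% of the samples
Dati_Test = int(0.2 * Campioni + Dati_Addestramento)

# Split both arrays at the same two indices, so x and y values stay paired
x_train, x_test, x_val = np.split(x_values, [Dati_Addestramento, Dati_Test])
y_train, y_test, y_val = np.split(y_values, [Dati_Addestramento, Dati_Test])

# Check that no sample was lost
assert (x_train.size + x_val.size + x_test.size) == Campioni

print("Array X_T=", x_train)
print("Array Y_T=", y_train)
print("Array X_TS=", x_test)
print("Array Y_TS=", y_test)
print("Array X_V=", x_val)
print("Array Y_V=", y_val)
```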

Let's look at this second cell in detail.

With the calculation “Dati_Addestramento = int(0.6 * Campioni)” we define an integer variable equal to 60% of the number of samples (“Campioni”), so with 20 samples the variable takes the value 12.

With the calculation “Dati_Test = int(0.2 * Campioni + Dati_Addestramento)” we assign to the integer variable Dati_Test the sum of the previously calculated variable (Dati_Addestramento) and 20% of the total number of samples: 12 plus 20% of 20, i.e. 4, for a total of 16.

These two values are the split points used by the **np.split** function. With “x_train, x_test, x_val = np.split(x_values, [Dati_Addestramento, Dati_Test])” we divide the array into three parts: elements 1 to 12 are the training values, 13 to 16 the test values, and 17 to 20 the validation values (Python counts from 0, so these are indices 0 to 11, 12 to 15 and 16 to 19).
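Using np.arange(20) in place of the random values makes the effect of np.split easy to read, because each element equals its own index. The same split points are applied to a second array, as the lesson does with x_values and y_values, to show that corresponding elements stay paired.

```python
import numpy as np

x = np.arange(20)      # stands in for x_values: element i has value i
y = x * 10             # stands in for y_values: y[i] belongs with x[i]

x_train, x_test, x_val = np.split(x, [12, 16])
y_train, y_test, y_val = np.split(y, [12, 16])

print(x_train)  # indices 0 to 11  (12 training samples)
print(x_test)   # indices 12 to 15 (4 test samples)
print(x_val)    # indices 16 to 19 (4 validation samples)
print(np.array_equal(y_test, x_test * 10))  # True: pairs stay aligned
```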

Running the two cells prints the two original arrays, followed by the three parts of each:

At this point we can complete our Dataset for model training.

## PAI-018: Completing Datasets for Model Training

We open the Colab notebook from PAI-016, here called Seno_Data (“seno” is Italian for “sine”), add a code cell and write the following code:

```python
# We'll use 60% of our data for training and 20% for testing. The remaining 20%
# will be used for validation. Calculate the indices of each section.
TRAIN_SPLIT = int(0.6 * SAMPLES)
TEST_SPLIT = int(0.2 * SAMPLES + TRAIN_SPLIT)

# Use np.split to chop our data into three parts.
# The second argument to np.split is an array of indices where the data will be
# split. We provide two indices, so the data will be divided into three chunks.
x_train, x_test, x_validate = np.split(x_values, [TRAIN_SPLIT, TEST_SPLIT])
y_train, y_test, y_validate = np.split(y_values, [TRAIN_SPLIT, TEST_SPLIT])

# Double check that our splits add up correctly
assert (x_train.size + x_test.size + x_validate.size) == SAMPLES

# Plot the data in each partition in different colors:
plt.plot(x_train, y_train, 'b.', label="Train")
plt.plot(x_test, y_test, 'r.', label="Test")
plt.plot(x_validate, y_validate, 'y.', label="Validate")
plt.legend()
plt.show()
```

Clicking the cell's “play” button shows how the data are distributed: training data in blue, test data in red and validation data in yellow.
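In the plot, the three colors should appear mixed along the whole curve. This follows from the random order of x_values: np.split cuts the array into consecutive chunks, so if the angles were sorted, each part would cover only one stretch of the sine wave. The sketch below rebuilds the angles without TensorFlow and without plotting, and compares the two cases.

```python
import math
import numpy as np

SAMPLES = 1000
np.random.seed(1337)
x_values = np.random.uniform(low=0, high=2 * math.pi, size=SAMPLES)
np.random.shuffle(x_values)

TRAIN_SPLIT = int(0.6 * SAMPLES)
TEST_SPLIT = int(0.2 * SAMPLES + TRAIN_SPLIT)

# Random order: each part covers (almost) the whole range from 0 to 2*pi
x_train, x_test, x_validate = np.split(x_values, [TRAIN_SPLIT, TEST_SPLIT])
for name, part in [("train", x_train), ("test", x_test), ("validate", x_validate)]:
    print(f"{name:8s} {part.size:4d} samples, angles {part.min():.2f} to {part.max():.2f}")

# Sorted order: each part would cover a separate stretch of the curve
s_train, s_test, s_validate = np.split(np.sort(x_values), [TRAIN_SPLIT, TEST_SPLIT])
print(f"sorted: train ends at {s_train.max():.2f} rad, test starts at {s_test.min():.2f} rad")
```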

This completes the Dataset for training our model.