Paolo Perrotta: A Deep Learning Adventure

published Oct 25, 2019

Keynote talk by Paolo Perrotta at the Plone Conference 2019 in Ferrara.

It is hard to see which parts of machine learning are overhyped, and which parts are actually useful.

The basis of ML (machine learning) is to take input and transform it into output. The ML gets a picture of a duck, and it gives the answer "duck". That is image recognition. You train the ML with images that you have already labeled. Then you give it a new image, and hope it gives a good answer. The same goes for translating English text to Japanese, or turning an fMRI scan into a visualized image.

A simpler example: a solar panel. Input: the time of day; output: how much power is generated. You would train this with data. The ML would turn the data into a function that gives an approximately good answer.

Simplest model: linear regression. Turn the data into a function like this:

a * X + b

No line through the training points will be perfect. You find the line that minimizes the average error. So the ML uses linear regression to find a and b.

In our example, we try to guess the number of mojitos we sell, based on the number of passengers that a daily boat brings to our beach bar.
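
Training such a line could look roughly like this (a sketch with made-up passenger and mojito counts, not the program from the talk):

```python
# Minimal linear regression with gradient descent.
# The passenger and mojito numbers below are made up for illustration.
import numpy as np

passengers = np.array([13, 2, 14, 23, 13, 1, 18, 10, 26, 3], dtype=float)
mojitos = np.array([33, 16, 32, 51, 27, 16, 34, 17, 29, 15], dtype=float)

a, b = 0.0, 0.0   # start with an arbitrary line
lr = 0.001        # learning rate: how big each correction step is

for _ in range(100_000):
    predictions = a * passengers + b
    error = predictions - mojitos
    # Nudge a and b in the direction that shrinks the mean squared error.
    a -= lr * (error * passengers).mean()
    b -= lr * error.mean()

print(f"mojitos is roughly {a:.2f} * passengers + {b:.2f}")
```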

But this may also depend on the temperature, or the number of sharks. With two variables you would not have a line, but a plane. With n, you would get an n-dimensional shape. You have n inputs, and give a weight to each input, and add them.
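
In code, that weighted sum could look like this (with hypothetical temperature and shark counts thrown in):

```python
import numpy as np

# Each row is one day: [passengers, temperature, sharks] -- made-up values.
X = np.array([
    [13, 26.0, 0],
    [23, 31.0, 1],
    [18, 29.0, 0],
])

# One weight per input, plus a bias; training would find these values.
w = np.array([1.5, 0.8, -10.0])
b = 2.0

# The prediction is just the weighted sum of the inputs, plus the bias.
predicted_mojitos = X @ w + b
print(predicted_mojitos)
```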

This is about numbers. For image recognition, you don't get a number as output. But we can apply a function to the result and get a number between 0 and 1 that gives a likelihood. For an image we may have 0.02 certainty that it is a cat and 0.9 that it is a duck, so our answer will be the highest: a duck. Translated to the mojitos: what is the likelihood that my bar breaks even?
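
A common choice for that squashing function is the sigmoid; for example:

```python
import numpy as np

def sigmoid(z):
    # Squashes any number into the range (0, 1).
    return 1 / (1 + np.exp(-z))

print(sigmoid(-3.0))  # ~0.05: "probably not a duck"
print(sigmoid(2.2))   # ~0.90: "probably a duck"
```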

This system with weights and a reducer function is called a perceptron. With a short Python program I got 90 percent accuracy on the standard MNIST test. We can do better.
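
The talk did not show that program, but a single-layer classifier of that kind looks roughly like this (a sketch, assuming X holds flattened MNIST images with a bias column and Y holds one-hot labels):

```python
import numpy as np

def sigmoid(z):
    # Reducer function: squashes the weighted sums into 0..1 likelihoods.
    return 1 / (1 + np.exp(-z))

def predict(X, w):
    # One weighted sum per class, per image.
    return sigmoid(X @ w)

def train(X, Y, iterations=200, lr=1e-5):
    # One weight per pixel, per class; start at zero and step downhill.
    w = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(iterations):
        error = predict(X, w) - Y
        w -= lr * (X.T @ error)
    return w

def classify(X, w):
    # Pick the class with the highest likelihood for each image.
    return np.argmax(predict(X, w), axis=1)
```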

We look at neural networks. This is basically: mash two perceptrons together. Or more. We are finding a function that approximates the data, reducing the error iteratively.
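
"Mashing perceptrons together" means the output of one layer becomes the input of the next. A bare-bones forward pass with one hidden layer, using made-up sizes:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x, w1, w2):
    hidden = sigmoid(x @ w1)       # first "perceptron" layer
    output = sigmoid(hidden @ w2)  # second layer reads the first one's output
    return output

# Hypothetical sizes: 784 inputs (28x28 pixels), 100 hidden nodes, 10 classes.
rng = np.random.default_rng(0)
w1 = rng.normal(size=(784, 100))
w2 = rng.normal(size=(100, 10))
x = rng.normal(size=(1, 784))
print(forward(x, w1, w2).shape)  # (1, 10): one likelihood per digit
```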

What would I say deep learning is? Why has it grown?

  1. Neural networks with many layers...
  2. ... trained with a lot of data...
  3. ... with specialized architectures.

We helped Facebook with neural networks by tagging our photos. We gave them training data!

A lot of engineering is going into deep learning.

Generative Adversarial Networks: GANs. Example: a horse discriminator. Train the system with images of horses and other images, and it should answer: horse or no horse.

The other part: a horse generator. It randomly generates images to feed to the horse discriminator. You train it by saying: you did or did not manage to trick the discriminator. And it tries to get better. I trained this overnight, and after about a million iterations I got pictures that are not quite horses (there is something wrong with them), but they are amazingly close.
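
Paolo did not show his code, but the core training loop of a GAN looks roughly like this (a sketch using PyTorch, with hypothetical image and noise sizes; not necessarily what he used):

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 64x64 grayscale images, flattened, and 100-dimensional noise.
IMG_SIZE, NOISE_SIZE = 64 * 64, 100

# Discriminator: image in, "is this a real horse?" likelihood out.
discriminator = nn.Sequential(
    nn.Linear(IMG_SIZE, 256), nn.ReLU(),
    nn.Linear(256, 1), nn.Sigmoid(),
)

# Generator: random noise in, fake image out.
generator = nn.Sequential(
    nn.Linear(NOISE_SIZE, 256), nn.ReLU(),
    nn.Linear(256, IMG_SIZE), nn.Tanh(),
)

loss_fn = nn.BCELoss()
d_opt = torch.optim.Adam(discriminator.parameters(), lr=0.0002)
g_opt = torch.optim.Adam(generator.parameters(), lr=0.0002)

def training_step(real_images):
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # Train the discriminator: real horses should score 1, generated images 0.
    fakes = generator(torch.randn(batch, NOISE_SIZE))
    d_loss = (loss_fn(discriminator(real_images), real_labels)
              + loss_fn(discriminator(fakes.detach()), fake_labels))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Train the generator: it "wins" when the discriminator says 1 for its fakes.
    g_loss = loss_fn(discriminator(fakes), real_labels)
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```

Repeat that step for enough batches (about a million iterations in his overnight run) and the generator slowly learns to produce almost-horses.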