
Neural networks learning spirals


Whisper Transcript

00:00:00.000 | Let's use TensorFlow Playground to see what kind of neural network can learn to partition the space
00:00:04.960 | for the binary classification problem between the blue and the orange dots. First is an easier
00:00:10.560 | binary classification problem with a circle and a ring distribution around it. Second is a more
00:00:17.040 | difficult binary classification problem of two dueling spirals. This little visualization tool
00:00:24.160 | on playground.tensorflow.org is really useful for getting an intuition about how the size of the
00:00:29.680 | network and the various hyperparameters affect what kinds of representations the network is
00:00:34.320 | able to learn. The input to the network is the position of the point in the 2d plane and the
00:00:39.520 | output of the network is the classification of whether it's an orange or a blue dot. We'll hold
00:00:44.960 | all the hyperparameters constant for this little experiment and just vary the number of neurons and
00:00:50.000 | hidden layers. The hyperparameters are a batch size of one, a learning rate of 0.03, ReLU
00:00:56.720 | activation, and L1 regularization with a rate of 0.001. So let's start with one hidden layer and
00:01:03.600 | one neuron and gradually increase the size of the network to see what kind of representation it's
00:01:07.760 | able to learn. Keep your eye on the right side of the screen that shows the test loss and the
00:01:11.920 | training loss and the plot that shows sample points from the two distributions and then the
00:01:16.960 | shading in the background of the plot shows the partitioning function that the neural network is
00:01:21.680 | learning. So a successful function is able to separate the orange and the blue dots. One hidden
00:01:27.520 | layer with one neuron, two neurons, three neurons,
00:01:37.120 | four neurons,
00:01:41.680 | eight neurons.
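The setup described above can be sketched in plain numpy: a circle-and-ring dataset, and a one-hidden-layer ReLU network trained with batch-size-1 SGD at learning rate 0.03, as in the video. This is a minimal sketch, not the Playground's actual implementation; the dataset radii, the He-style initialization, and the omission of L1 regularization are assumptions made to keep it short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Circle-and-ring dataset (loosely mirrors the Playground's "circle" data):
# blue points inside radius 1, orange points in a ring between 1.5 and 2.5.
# Radii here are scaled down from the Playground's; an assumption for brevity.
def make_circle_data(n=200):
    n_half = n // 2
    r = np.concatenate([rng.uniform(0.0, 1.0, n_half),
                        rng.uniform(1.5, 2.5, n_half)])
    theta = rng.uniform(0, 2 * np.pi, n)
    X = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)
    y = np.concatenate([np.ones(n_half), np.zeros(n_half)])  # 1 = blue
    return X, y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden ReLU layer, sigmoid output, plain SGD with batch size 1 and
# learning rate 0.03 (the video's settings; L1 regularization is omitted).
def train(X, y, hidden=8, lr=0.03, epochs=100):
    W1 = rng.normal(0, 1.0, (2, hidden))          # He-style init for 2 inputs
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, np.sqrt(2 / hidden), (hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x = X[i]
            h = np.maximum(0, x @ W1 + b1)        # hidden ReLU activations
            p = sigmoid(h @ W2 + b2)[0]           # predicted P(blue)
            d_logit = p - y[i]                    # d(cross-entropy)/d(logit)
            d_h = (W2[:, 0] * d_logit) * (h > 0)  # backprop through ReLU
            W2 -= lr * np.outer(h, d_logit)
            b2 -= lr * d_logit
            W1 -= lr * np.outer(x, d_h)
            b1 -= lr * d_h
    return W1, b1, W2, b2

def predict(X, W1, b1, W2, b2):
    h = np.maximum(0, X @ W1 + b1)
    return (sigmoid(h @ W2 + b2)[:, 0] > 0.5).astype(float)

X, y = make_circle_data()
params = train(X, y)
acc = (predict(X, *params) == y).mean()
```

With eight hidden neurons the intersecting ReLU half-planes can enclose the inner circle, which is why the video's eight-neuron run succeeds where one or two neurons cannot.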
00:01:49.760 | Now let's take a look at the trickier spiral dataset keeping most of the hyperparameters the
00:01:54.800 | same but decreasing the learning rate to 0.01 and adding extra features to the input of the
00:02:02.960 | neural network: not just the coordinates of the point, but also the squares of the coordinates,
00:02:08.400 | their product, and the sine of each coordinate. Let's start with one hidden layer,
00:02:13.760 | one neuron, two neurons, four neurons,
00:02:24.640 | six neurons,
00:02:32.160 | eight neurons.
00:02:40.160 | Two hidden layers, two neurons in the second layer,
00:02:59.040 | four neurons,
00:03:06.080 | six neurons,
00:03:34.080 | eight neurons.
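The feature augmentation described for the spiral experiment can be written out explicitly: the seven inputs are the raw coordinates, their squares, their product, and the sine of each coordinate. The spiral generator below is a rough stand-in for the Playground's dataset; its exact parameterization and noise level are assumptions.

```python
import numpy as np

# Two interleaved spirals: radius grows with angle, and the second spiral
# is the first rotated by pi. Parameterization and noise are assumptions.
def make_spiral_data(n=200, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_half = n // 2
    t = np.sqrt(rng.uniform(0, 1, n_half)) * 3 * np.pi
    spiral = np.stack([t * np.cos(t), t * np.sin(t)], axis=1) / (3 * np.pi)
    X = np.concatenate([spiral, -spiral]) + rng.normal(0, noise, (n, 2))
    y = np.concatenate([np.ones(n_half), np.zeros(n_half)])  # 1 = blue
    return X, y

# The seven input features used in the spiral experiment: x1, x2,
# x1^2, x2^2, x1*x2, sin(x1), sin(x2).
def augment(X):
    x1, x2 = X[:, 0], X[:, 1]
    return np.stack([x1, x2, x1**2, x2**2, x1 * x2,
                     np.sin(x1), np.sin(x2)], axis=1)

X, y = make_spiral_data()
F = augment(X)  # shape (n, 7): richer inputs for the same network
```

Feeding these transformed coordinates to the network gives it periodic and quadratic building blocks, which is what lets comparatively small networks in the video carve out the spiral shape.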
00:03:40.960 | There you go. That's a basic illustration with the playground.tensorflow.org that I recommend
00:04:00.960 | you try that shows the connection between neural network architecture, dataset characteristics,
00:04:07.360 | and different training hyperparameters. It's important to note that the initialization of
00:04:12.240 | the neural network has a big impact in many cases, but the purpose of this video was not
00:04:17.200 | to show the minimal neural network architecture that's able to represent the spiral dataset but
00:04:23.600 | rather to provide a visual intuition about which kind of networks are able to learn which kinds of
00:04:28.960 | datasets. There you go. I hope you enjoyed these quick little videos, whether they make you think,
00:04:34.480 | give you new insights, or are just fun and inspiring. See you next time,
00:04:39.360 | and remember, try to challenge yourself and learn something new every day.