Neural networks learning spirals

Let's use TensorFlow Playground to see what kind of neural network can learn to partition the space for a binary classification problem between blue and orange dots. First is an easier binary classification problem with a circle and a ring distributed around it. Second is a more difficult binary classification problem of two dueling spirals. This little visualization tool at playground.tensorflow.org is really useful for getting an intuition about how the size of the network and the various hyperparameters affect what kinds of representations the network is able to learn. The input to the network is the position of a point in the 2D plane, and the output of the network is the classification of whether it's an orange or a blue dot. We'll hold all the hyperparameters constant for this little experiment and just vary the number of neurons and hidden layers. The hyperparameters are a batch size of one, a learning rate of 0.03, the ReLU activation function, and L1 regularization with a rate of 0.001.

So let's start with one hidden layer and one neuron and gradually increase the size of the network to see what kind of representation it's able to learn. Keep your eye on the right side of the screen, which shows the test loss and the training loss, and on the plot that shows sample points from the two distributions; the shading in the background of the plot shows the partitioning function that the neural network is learning. A successful function is able to separate the orange and the blue dots. One hidden layer with one neuron... two neurons... three neurons...
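The circle-versus-ring experiment above can be sketched outside the browser. Here is a minimal NumPy version, assuming a synthetic inner-disc/outer-ring dataset in Playground's spirit: one hidden ReLU layer, a sigmoid output, SGD with batch size one, and a learning rate of 0.03. The L1 regularization is omitted for brevity, and all dataset details (radii, sample counts) are illustrative rather than Playground's exact values.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_circle_data(n=200):
    """Inner disc (class 1) vs. surrounding ring (class 0)."""
    m = n // 2
    r = np.concatenate([rng.uniform(0.0, 2.0, m),    # blue: inner disc
                        rng.uniform(3.0, 5.0, n - m)])  # orange: outer ring
    theta = rng.uniform(0.0, 2 * np.pi, n)
    X = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1) / 5.0  # scale to [-1, 1]
    y = np.concatenate([np.ones(m), np.zeros(n - m)])
    return X, y

def train_mlp(X, y, hidden=4, lr=0.03, epochs=60):
    """One hidden ReLU layer, sigmoid output, SGD with batch size one."""
    W1 = rng.normal(0.0, 1.0, (2, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x = X[i:i + 1]                          # a "batch" of one point
            h = np.maximum(0.0, x @ W1 + b1)        # ReLU hidden activations
            p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid output
            d2 = p - y[i]                           # dBCE/dlogit for the output
            gW2, gb2 = h.T @ d2, d2.ravel()
            d1 = (d2 @ W2.T) * (h > 0)              # backprop through ReLU
            gW1, gb1 = x.T @ d1, d1.ravel()
            W2 -= lr * gW2; b2 -= lr * gb2
            W1 -= lr * gW1; b1 -= lr * gb1
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, X @ W1 + b1)
    return (1.0 / (1.0 + np.exp(-(h @ W2 + b2)))).ravel()

X, y = make_circle_data()
params = train_mlp(X, y)
acc = np.mean((predict(params, X) > 0.5) == y)
```

Because the data is radially separable, even a handful of ReLU units can carve out a polygon around the inner disc, which is the same effect you watch emerge in the Playground shading.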
Now let's take a look at the trickier spiral dataset, keeping most of the hyperparameters the same but decreasing the learning rate to 0.01 and adding extra features to the input of the neural network beyond just the coordinates of the point: the squares of the coordinates, their product, and the sine of each coordinate. Let's start with one hidden layer... Two hidden layers, with two neurons in the second layer...
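The spiral run relies on those hand-crafted input features on top of the raw coordinates. A small sketch of that feature expansion, with an illustrative two-spiral generator standing in for Playground's exact dataset:

```python
import numpy as np

def make_spirals(n=200, noise=0.0, rng=np.random.default_rng(0)):
    """Two interleaved spirals, one per class (a stand-in for Playground's data)."""
    m = n // 2
    t = np.linspace(0.0, 3 * np.pi, m)      # angle along each spiral arm
    r = t / (3 * np.pi) * 5.0               # radius grows with angle, up to 5
    X0 = np.stack([r * np.cos(t), r * np.sin(t)], axis=1)
    X1 = -X0                                # second spiral: first one rotated 180 degrees
    X = np.concatenate([X0, X1]) + noise * rng.normal(size=(2 * m, 2))
    y = np.concatenate([np.zeros(m), np.ones(m)])
    return X, y

def playground_features(X):
    """Expand (x1, x2) into the seven Playground-style inputs:
    x1, x2, x1^2, x2^2, x1*x2, sin(x1), sin(x2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.stack([x1, x2, x1 ** 2, x2 ** 2, x1 * x2,
                     np.sin(x1), np.sin(x2)], axis=1)

X, y = make_spirals()
F = playground_features(X)
```

Feeding the network these nonlinear features instead of raw coordinates changes what a given architecture can represent, which is why the same small networks behave so differently on the spiral problem.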
There you go. That's a basic illustration with playground.tensorflow.org, which I recommend you try, showing the connection between neural network architecture, dataset characteristics, and different training hyperparameters. It's important to note that the initialization of the neural network has a big impact in many of these cases, but the purpose of this video was not to show the minimal neural network architecture that's able to represent the spiral dataset, but rather to provide a visual intuition about which kinds of networks are able to learn which kinds of datasets. There you go. I hope you enjoyed these quick little videos, whether they make you think, give you new kinds of insights, or are just fun and inspiring. See you next time, and remember: try to challenge yourself and learn something new every day.