Stoic Philosophy Text Generation with TensorFlow
Chapters
0:00 Introduction
3:28 Embedding Array
9:59 Building the Model
10:26 Embedding Layer
14:36 Summarizing the Model
21:42 Predict Method
00:00:00.000 |
Alright, in this video we're going to go through, or rather what I'm going to do is redesign, 00:00:11.160 |
the code that we are currently using to actually train a recurrent neural network model. 00:00:17.120 |
It's been trained on these two texts, that's here. 00:00:36.120 |
The model is actually training on Meditations by Marcus Aurelius and 00:00:42.000 |
also Letters from a Stoic by Seneca, which has a Latin title that I'm not going to try to pronounce. 00:01:04.560 |
This is the parent page, and then we're using Beautiful Soup to 00:01:10.440 |
pull the individual letters; there are 124 of them. 00:01:14.700 |
So it pulls them all in, puts them into a dictionary, and then 00:01:22.040 |
from that dictionary we extract the text we need and join it all together. 00:01:26.640 |
Then we join that with Meditations and that is our training data, so literally just one huge string 00:01:33.760 |
containing all of Meditations and all of the letters. 00:01:41.320 |
You can see here we just join it together. So this is, 00:01:45.320 |
this is the function that gets the letters in, 00:01:52.000 |
and then we have the local web address and the text itself. 00:01:56.160 |
So we join all of those together into one big string. 00:02:12.880 |
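As a rough illustration of that scraping step, something like the sketch below would pull the letters from a parent page and join them into one string. The Wikisource URL and the link filter here are assumptions for illustration, not the exact code from the repo.

```python
import requests
from bs4 import BeautifulSoup

# Assumed parent page listing the individual letters
PARENT = "https://en.wikisource.org/wiki/Moral_letters_to_Lucilius"

def get_letters():
    soup = BeautifulSoup(requests.get(PARENT).text, "html.parser")
    letters = {}
    for a in soup.find_all("a", href=True):
        if "Letter" in a.get_text():  # assumed filter for the 124 letter links
            url = "https://en.wikisource.org" + a["href"]
            page = BeautifulSoup(requests.get(url).text, "html.parser")
            letters[a.get_text()] = page.get_text()
    return letters

letters = get_letters()
letters_text = "\n".join(letters.values())  # join the letters, then concatenate with Meditations
```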
So I'll quickly summarize what we do to 00:02:18.760 |
preprocess the data and actually build the recurrent neural network, and then I'll get started with it. 00:02:32.440 |
First we build a set, so it's a set of all of the unique words that we have within 00:02:37.400 |
our data, although actually, sorry, for this we're using characters and not words. 00:02:42.320 |
Okay, so you can use either in NLP; sometimes you use words, sometimes you use characters, and characters are just a bit easier to set up. 00:02:59.480 |
But in this case we're going to use characters, and then we create a character-to-index dictionary. 00:03:16.400 |
The characters need to become numbers, okay, they need to be integers, and those integers represent an index in an array. So we will have 00:03:31.600 |
what is essentially a three-dimensional embedding array, 00:03:39.920 |
which has the indices of each character going down here, and 00:03:44.280 |
these across here are the embedding dimension. So, you know, character 'a' 00:03:50.840 |
in our case will have two hundred and fifty-six 00:03:54.840 |
floating-point values that represent it. Okay, and then it's three-dimensional because we have a batch size of 64. 00:04:14.880 |
Each character gets converted into an index which can then be read into that embedding layer, and then we have the reverse mapping for 00:04:21.680 |
pulling it back. So after we've trained and we want to convert our predictions 00:04:28.120 |
into human-readable characters again, we use this, which goes from the index back to the character. 00:04:36.280 |
And this is just kind of converting our data into indices 00:04:40.000 |
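A minimal sketch of those two mappings, following the standard TensorFlow character-RNN setup this appears to be based on (variable names are assumptions):

```python
import numpy as np

# `text` is the single big training string built above
vocab = sorted(set(text))                          # the ~85 unique characters
char2idx = {ch: i for i, ch in enumerate(vocab)}   # character -> integer index
idx2char = np.array(vocab)                         # index -> character, for decoding predictions

text_as_int = np.array([char2idx[ch] for ch in text])
```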
Here's the sequence length. So that is the window of text the model sees at once. 00:04:52.800 |
Imagine our window is four characters, okay, and we have the input of 'hello'. 00:05:07.080 |
Our input has a window size of four, okay, so it reads 'hell', 00:05:16.200 |
and it outputs, because we are using this to predict or generate text, 00:05:30.240 |
a letter; that will be our target. Okay, and 00:05:34.160 |
we do this throughout the entire text. Okay, the only difference for us is that we are using a 00:05:54.000 |
sequence length of 100, and then, like I said, a batch size of 64 and an embedding dimension of 256. 00:06:04.360 |
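To make the window idea concrete: in the usual setup (which the sequence-length-plus-one split later in the video matches), each chunk of sequence_length + 1 characters is split so the target is the input shifted by one character. A minimal sketch:

```python
# The 'hello' example with a window of four characters: the chunk is split so the
# target is the input shifted one character to the right, i.e. every position
# predicts the next letter.
def split_input_target(chunk):
    return chunk[:-1], chunk[1:]

print(split_input_target("hello"))  # ('hell', 'ello')
```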
Okay, I'll come down to here. So here we're using a 00:06:10.440 |
TensorFlow Dataset object, which is really useful, and 00:06:14.800 |
essentially we put our data into this Dataset object, and 00:06:35.640 |
that's what I said before with the 'hello' example, right, so the chunk could be 'hello', 00:06:43.680 |
the input data would be 'hell' and the output would be 00:06:59.560 |
'ello'. All right, so that's why we're splitting into our input and target data here. Okay, and then we map 00:07:06.080 |
the Dataset object, here, okay, we map the Dataset object onto this function to get our input and target sequences. 00:07:26.400 |
So this is where we are splitting into sequences, with 101 characters at a time, so you can see sequence length plus 1. Okay. 00:07:34.000 |
Now, anything remaining after that, because obviously the dataset is very unlikely to divide evenly into those sequences, 00:07:47.960 |
we just drop; so we drop the remaining characters if there are any, and there probably will be, but it's not going to be many, so it's fine. 00:07:56.400 |
We do that here, mapping like I just said, and then here we shuffle the dataset. Okay, the buffer size we define up here. 00:08:14.600 |
Shuffling gives a better representation of the data with every single batch. Okay, so we're going to train on a batch of 00:08:23.000 |
sequences at any one time and then update the model weights. 00:08:33.680 |
So we have the first part of the data being Meditations and the second part being the letters, right? 00:08:41.680 |
If we didn't shuffle the data, it would be training on one of those at a time. 00:08:49.560 |
Okay, and then within that it would probably also be speaking about a specific topic at any one time, 00:08:56.080 |
or a specific few topics, which doesn't give a very good overview of everything 00:09:02.040 |
we're training on, so it updates those weights according to that specific topic and that specific 00:09:09.320 |
book, okay, a specific text, so Meditations or the letters, right? 00:09:14.200 |
So we shuffle the dataset to give a better representation of the data in every single batch. 00:09:21.920 |
So now, instead of having, you know, just Meditations, just one topic, 00:09:25.420 |
it's going to have a few different topics from Meditations and a few different topics from the letters. 00:09:37.360 |
And then batch: so we already used batch here to split into sequence lengths, 00:09:42.000 |
and then here we use batch again to split into batches of 64 sequences. 00:10:06.160 |
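Putting that pipeline together, a sketch along the lines of the standard TensorFlow text-generation tutorial this follows, reusing text_as_int and split_input_target from the sketches above (the shuffle buffer size is an assumption):

```python
import tensorflow as tf

seq_length = 100
BATCH_SIZE = 64
BUFFER_SIZE = 10000   # assumed shuffle buffer size

char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

# first batch(): cut the character stream into chunks of seq_length + 1,
# dropping whatever is left over at the end
sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)

# map each chunk to an (input, target) pair
dataset = sequences.map(split_input_target)

# shuffle so each batch mixes Meditations and the letters, then the second batch():
# group into batches of 64 sequences
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
```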
I have two different model types; I'm going to change all of this by the way, but this is what we're doing: 00:10:10.320 |
so we have a GRU unit model or an LSTM unit model, okay, and you can change that here. 00:10:20.160 |
So I'm not going to go into too much detail, I'm going to be quick, 00:10:25.040 |
but the embedding layer, as I said before, takes a vocab size. 00:10:29.880 |
The vocab size is how many unique characters we have, which in our case is, I think, 85. 00:10:39.880 |
The embedding dimension is 256; it's how detailed the representation of every single character is. 00:10:53.840 |
Then the batch size is the number of sequences that we're going to put in there at any one time, which is 64 in this case. 00:11:00.360 |
Then we have our LSTM unit; this is where the actual sequence learning comes into play. 00:11:06.320 |
So that stands for long short-term memory unit, and this one is a gated recurrent unit. 00:11:11.560 |
The point of using these over a plain recurrent neural network is that they retain a sense of memory long term. 00:11:20.960 |
They do that through these different gates within the units, 00:11:26.440 |
which is really useful, obviously, with text. And then the other point is 00:11:31.760 |
that we use a dropout of 10% on both of these. So these units are naturally very deep, 00:11:39.680 |
so they can overfit really easily. Having a dropout of 10% 00:11:44.740 |
means that we mask 10% of the inputs at any one time. So in a sentence, 00:11:57.720 |
well, 10% isn't something we can really split out exactly here, so say 00:12:01.920 |
we mask one letter. Okay, so 'hello' will become 'h_llo', right? 00:12:10.200 |
And that just helps the model generalize. And then here, 00:12:15.120 |
this is our classification layer. So it's just a typical neural network layer, densely connected, hence Dense. 00:12:26.800 |
Its output size is our vocab size, which is, I think, 85, where 00:12:35.920 |
output zero will map to 'a', output one will map to 'b', and so on. 00:12:42.160 |
Okay, and then after that we use the index-to-character dictionary again. 00:12:47.480 |
Okay, so here we're just building the model; this is all in a function. 00:12:52.040 |
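A hedged sketch of what such a build function typically looks like for this setup; the number of RNN units, the exact dropout placement, and the layer options are assumptions based on the description above, not the repo's actual code:

```python
import tensorflow as tf

def build_model(vocab_size, embedding_dim, rnn_units, batch_size, unit_type="gru"):
    # choose between the GRU and LSTM variants described above
    rnn_layer = tf.keras.layers.GRU if unit_type == "gru" else tf.keras.layers.LSTM
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        rnn_layer(rnn_units, return_sequences=True, stateful=True,
                  recurrent_initializer="glorot_uniform",
                  dropout=0.1),               # the 10% dropout mentioned above
        tf.keras.layers.Dense(vocab_size)     # one logit per character in the vocab
    ])

model = build_model(vocab_size=len(vocab), embedding_dim=256,
                    rnn_units=1024, batch_size=64)   # rnn_units is an assumption
```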
Okay, then we're summarizing the model and creating a loss function, 00:12:55.120 |
and compiling the model using the Adam optimizer and, for the loss function, sorry, sparse categorical cross-entropy. 00:13:16.600 |
We also save checkpoints during training. Okay, so for callbacks there's a checkpoint callback, which is here. 00:13:23.800 |
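Roughly, the compile step and checkpoint callback could look like the following, reusing the model from the sketch above; the directory name and the epoch count are assumptions:

```python
import os
import tensorflow as tf

# sparse categorical cross-entropy on logits (the Dense layer has no softmax)
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

model.compile(optimizer="adam", loss=loss)

# save weights at the end of every epoch
checkpoint_dir = "./training_checkpoints"   # assumed directory name
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=os.path.join(checkpoint_dir, "ckpt_{epoch}"),
    save_weights_only=True)

model.fit(dataset, epochs=30, callbacks=[checkpoint_callback])   # epoch count is an assumption
```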
Then at the end we restore the final checkpoint and 00:13:26.560 |
rebuild the model. Okay, so here we're building it again, but this time, instead of a batch size of 64, it has a batch size of 1. 00:13:44.480 |
The reason we do that is we don't want a batch size of 64 when we're predicting, 00:13:50.080 |
because then we'd have to put in, you know, a list of 64 00:13:56.320 |
starting strings, and we don't want to do that; we'd have to feed in something like 'from' 64 times. 00:14:03.720 |
So instead of doing that, and it would take a lot more computing power as well, okay, we 00:14:12.140 |
essentially flatten the model a little bit to 00:14:20.200 |
a batch size of one at a time. So then we just feed in the word 'from' and it will predict from there. 00:14:31.920 |
We load the weights into it and build it again. Okay, then we just summarize the model, 00:14:38.360 |
which is the same as above, but like I said, 1 instead of 64 for the batch size. 00:14:43.440 |
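A sketch of that rebuild step, again following the common pattern for this kind of model and reusing the names from the earlier sketches:

```python
import tensorflow as tf

# same architecture, but batch size 1 so we can feed a single seed string like "from"
model = build_model(vocab_size=len(vocab), embedding_dim=256,
                    rnn_units=1024, batch_size=1)
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))
model.summary()
```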
Here I'm just clearing out the checkpoint files, because there's a lot of them and they take up a lot of space, 00:14:51.040 |
so I do that after the most recent one has been loaded. 00:14:54.860 |
And then here we're saving the model and the character-to-index dictionary, which I'll go through later. 00:15:03.200 |
Then here we generate text. This is the old text generation function, 00:15:11.920 |
which I can show you later. So here there's a data writer, 00:15:15.920 |
saving the model and saving the character-to-index dictionary. 00:15:23.880 |
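For illustration, saving both artifacts might look like this, reusing the objects from the sketches above; the file names and formats are assumptions:

```python
import json

model.save("model.h5")                   # assumed file name/format for the trained model
with open("char2idx.json", "w") as f:    # the character-to-index dictionary
    json.dump(char2idx, f)
```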
Now, what we have down here is kind of interesting. 00:15:33.000 |
In this scenario, I don't really think I've seen it happen, 00:15:42.360 |
or at least I haven't seen anyone else do it: 00:15:51.800 |
we score the models based on their output, based on their English and how 00:15:56.800 |
grammatically correct everything is, and so on, 00:16:00.320 |
scoring their outputs and then choosing a winner. 00:16:07.040 |
So you can see here there are some really rubbish ones, because they're not properly trained, 00:16:13.240 |
okay, but some of them are obviously a lot better. So these ones, you can see where it says 'meditations', 00:16:19.280 |
they've only been trained on Meditations by Marcus Aurelius, and they 00:16:22.280 |
tend to do worse a lot of the time; these others are trained on both texts. 00:16:26.640 |
Okay, so this one has the top score, almost 20: 'what is a cloth vessel if it were no useful book, 00:16:44.720 |
mot of all those in which it's worth something'. Okay. 00:16:50.440 |
And actually you can see, so it's these ones here, 00:16:54.960 |
these are the scores. So if we scroll up, we can see one of them went a little bit crazy 00:17:01.480 |
and got a really bad score, -120. I don't know why this happens; it's very weird. 00:17:06.720 |
But sometimes they just go crazy, which is why I sort of built this ensemble recurrent neural network method, 00:17:15.680 |
because some of them occasionally go a bit crazy. 00:17:21.880 |
We have all the models, or three of the models in this case, 00:17:25.860 |
which are there to back each other up. So if one of them goes crazy, 00:17:31.440 |
one of the other models takes over. Okay, which is 00:17:34.820 |
really useful. I saw the output from one of these that I haven't really trained very much, and it was so much better. 00:17:46.680 |
So the thing that you might think is, okay, you're using different models, so how do you keep them, you know, coherent with each other? 00:17:56.480 |
They do that through the winning sentence: so we split the output into sentences, 00:18:01.760 |
we rate each sentence, and the winning sentence goes back into the models 00:18:11.160 |
to generate more text, so they're continuously being updated with the best new sentence. Okay. 00:18:23.680 |
So this is the new text generation function; it still needs some work really, so 00:18:35.640 |
I'll just kind of go over it quickly. First, if the text is empty, because I was getting some errors before where the text would be empty, 00:18:43.000 |
and I'm not sure if that was an error in my code or the models were just being weird, 00:18:47.240 |
I think it must be an error in the code, so I need to remove this and then actually figure out what's going on. 00:18:53.200 |
Okay, so if the text is empty we just return it, because otherwise it'll throw an error while rating the rest of it. 00:19:02.640 |
Then we normalize the text, as in we remove all punctuation and lowercase everything. 00:19:07.440 |
Okay, and then here we check for correct punctuation. So at the end, right, 00:19:14.560 |
at the end, is it a full stop, exclamation mark, question mark, or, 00:19:18.560 |
near the end, e.g. if there's a full stop and then a newline character, or a 00:19:25.360 |
question mark? This doesn't fully work at the moment because of where it's splitting, where it stops the 00:19:35.280 |
generation. Alright, so for example, I need to update that so that it stops its text generation when there's a full stop, 00:19:43.080 |
question mark, or newline character. The other time it stops text generation is when 00:19:50.960 |
it has generated too many characters, which is a limit of about 500 at the moment. 00:20:01.560 |
And then we check for too much repetition, like if it's going to say, you know, 'there there there there'. 00:20:11.560 |
That happened occasionally, not so much with these models, but I have seen it happen quite often before, 00:20:18.240 |
so that's quite a good thing to rate against as well. 00:20:21.820 |
And then here we check that all the words are actual words according to the vocabulary that we have. 00:20:31.560 |
So I built this vocabulary separately. I don't know if I still have the code for it; I think it's in the comments. So I just 00:20:45.280 |
split the text into words and saved that into a text document. All right. 00:20:56.720 |
I don't know if you can see this in PyCharm very well, but this is what I've done: 00:21:10.600 |
you can see it's just a list of all of the words. Okay, and then we just put that into a regex. 00:21:22.200 |
If a word is in there, if it's a real word, it gets a good rating; otherwise, it doesn't. 00:21:44.700 |
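A hedged sketch of what a rating function with those checks might look like; the point values and the word-list file name are assumptions, not the actual numbers from the repo:

```python
import re

# Assumed vocabulary file built from the corpus, one word per line
with open("words.txt") as f:
    REAL_WORDS = set(f.read().split())

def rate(text):
    if not text:                              # empty text: return early, nothing to rate
        return 0.0
    score = 0.0
    stripped = text.rstrip()
    if stripped and stripped[-1] in ".!?":
        score += 5.0                          # ends with a full stop, exclamation or question mark
    words = re.sub(r"[^a-z\s]", "", text.lower()).split()   # normalize: lowercase, drop punctuation
    if words and len(set(words)) / len(words) < 0.5:
        score -= 5.0                          # penalize heavy repetition ("there there there ...")
    score += sum(1.0 if w in REAL_WORDS else -1.0 for w in words)  # real words per the vocabulary
    return score
```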
So we initialize a predictions dictionary on the class, 00:21:53.520 |
with the model name, the score and then the text. Okay, which you can see here. 00:21:59.800 |
And then that is used by the gladiator predict method, which also controls 00:22:07.640 |
this function here. So this function actually generates the text, and 00:22:20.000 |
then it finds the highest-scoring sentence or sequence, 00:22:27.320 |
so the highest-scoring one is added to the text, and then the highest-scoring one is set as the 00:22:35.560 |
new start sequence. And then initially we keep the start 00:22:39.800 |
string, because it's something we typed in, like 'from', and we need to keep that in for the first 00:22:45.040 |
iteration to make sense, but then after the first iteration, we don't want to keep what we're feeding back into it, because it's a previous sentence; 00:22:52.700 |
we just want the new text after that point. So we just set that to false, and then it loops through 00:23:00.360 |
depending on how many times we have set it to; so here it's 10, okay, so that's going to loop through 10 times and produce the text. 00:23:22.880 |
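Conceptually, that loop amounts to something like the following sketch; the function and parameter names here are illustrative stand-ins for the repo's per-model generation function and the rating function sketched earlier, not its actual API:

```python
def gladiator_predict(models, generate_fn, rate_fn, start_string="from", rounds=10):
    """generate_fn(model, seed) -> str;  rate_fn(text) -> float."""
    text = start_string           # the typed seed is kept only for the first round
    seed = start_string
    for _ in range(rounds):
        candidates = [generate_fn(m, seed) for m in models]  # every model generates from the seed
        best = max(candidates, key=rate_fn)                  # highest-scoring sequence wins
        text += " " + best                                   # the winner is added to the output
        seed = best                                          # and becomes the new start string
    return text
```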
Now, what I want to do is refactor this into something that is clean and 00:23:29.640 |
not so messy, so I'm going to rebuild it into, I think, a new module, sorry, 00:23:47.760 |
I'm going to call it train, and then we will refer to that whenever we are building a new model, 00:23:57.520 |
so then everything is segregated a bit more nicely. 00:24:06.480 |
I'll sort of describe what I'm doing every now and again, but for the most part I'm just going to fast-forward so you can see the code being made. 00:29:19.640 |
Okay, so what we have done now is just refactor, 00:29:24.640 |
just rebuild that code into a class here, as you can see. 00:29:37.620 |
We format the data and then we build the model. 00:29:41.160 |
So my intention in splitting it up like this is 00:29:47.240 |
that I can then pass multiple model-build parameters to it. 00:29:58.720 |
So I haven't built that up yet, I'm going to do that, 00:30:04.360 |
and it will loop through different model parameters 00:30:13.840 |
to give us models that we can then use in the Ensemble class up here, okay? 00:30:18.840 |
And then after that, we will have several good models. 00:30:43.400 |
So I'm going to set this up and run it.
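For reference, here is a rough skeleton of the kind of class this refactor is heading toward, based only on what's described in the video; the class and method names, the parameter grid, and the RNN sizes are assumptions, not the final code:

```python
import tensorflow as tf

class Train:
    def __init__(self, text, seq_length=100, batch_size=64, embedding_dim=256):
        self.text = text
        self.seq_length = seq_length
        self.batch_size = batch_size
        self.embedding_dim = embedding_dim
        self.dataset = self.format_data()

    def format_data(self):
        # character-level encoding and the tf.data pipeline described earlier
        vocab = sorted(set(self.text))
        self.char2idx = {c: i for i, c in enumerate(vocab)}
        ids = [self.char2idx[c] for c in self.text]
        ds = tf.data.Dataset.from_tensor_slices(ids)
        ds = ds.batch(self.seq_length + 1, drop_remainder=True)
        ds = ds.map(lambda chunk: (chunk[:-1], chunk[1:]))
        return ds.shuffle(10000).batch(self.batch_size, drop_remainder=True)

    def build_model(self, rnn_units=1024, unit_type="gru"):
        # one model per parameter combination, to be collected by the Ensemble class
        layer = tf.keras.layers.GRU if unit_type == "gru" else tf.keras.layers.LSTM
        return tf.keras.Sequential([
            tf.keras.layers.Embedding(len(self.char2idx), self.embedding_dim,
                                      batch_input_shape=[self.batch_size, None]),
            layer(rnn_units, return_sequences=True, stateful=True, dropout=0.1),
            tf.keras.layers.Dense(len(self.char2idx)),
        ])

# looping over different build parameters to get several models for the Ensemble class
trainer = Train(text)
models = [trainer.build_model(unit_type=u) for u in ("gru", "lstm")]
```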