
Stoic Philosophy Text Generation with TensorFlow


Chapters

0:00
3:28 Embedding Array
9:59 Building the Model
10:26 Embedding Layer
14:36 Summarizing the Model
21:42 Predict Method

Transcript

All right, in this video we're going to go through, or rather redesign, this code here, which is the code we're currently using to train a recurrent neural network model. It's been trained on these two texts, which I can show you over here. So the model is trained on Meditations by Marcus Aurelius and also Letters from a Stoic by Seneca (I'm not going to try to pronounce the Latin title). We're pulling them from these two sources, both open source, which is really good. For Letters from a Stoic, this is the parent page, and then we're using Beautiful Soup to pull the individual letters.
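
I won't show the exact scraping code here, but a minimal sketch of that step might look like the following. The index URL, the link pattern, and the parsing details are all assumptions rather than what is actually in the video:

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical parent page listing all 124 letters; the real URL and
# markup used in the video may differ.
INDEX_URL = "https://en.wikisource.org/wiki/Moral_letters_to_Lucilius"

def get_letters(index_url=INDEX_URL):
    """Pull every letter linked from the parent page into a dictionary."""
    soup = BeautifulSoup(requests.get(index_url).text, "html.parser")
    letters = {}
    for link in soup.select("a"):
        href = link.get("href", "")
        if "Letter_" in href:  # assumption: letter pages share this URL pattern
            page = BeautifulSoup(
                requests.get("https://en.wikisource.org" + href).text,
                "html.parser")
            letters[link.text.strip()] = {
                "url": href,                           # local web address
                "text": page.get_text(separator=" "),  # the letter itself
            }
    return letters
```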

So there are 124 of them. It pulls them all in, puts them into a dictionary, and then from that dictionary we extract the text we need and join it all together. Then we join that with Meditations, and that is our training data. It's literally just one huge string containing all of Meditations and all of the letters, which you can see here, all joined together.

So this is the get-letters function. We have the letter name, then the local web address and the text itself. We join all of those together into one big string, and after that we join Meditations and the letters into the data that we use for training. Let me quickly summarize how we preprocess the data and build the recurrent neural network, and then I'll get started with it.

We take a vocab, which is a set of all of the unique characters we have within our data. Sorry, we're using characters here, not words. In NLP you can use either; sometimes you use words, sometimes characters. Characters are just a bit easier to set up, you don't have to do as much, and there are drawbacks and benefits to both. Usually you'd probably use words, but in this case we're going to use characters. Then we create a character-to-index dictionary.
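
To make that concrete, the character vocabulary and the two lookup tables look roughly like this. The variable names are mine, but this mirrors the standard TensorFlow character-level setup:

```python
import numpy as np

# `data` is the big training string: Meditations plus the letters joined together
vocab = sorted(set(data))                          # every unique character
char2idx = {ch: i for i, ch in enumerate(vocab)}   # character -> integer index
idx2char = np.array(vocab)                         # integer index -> character

text_as_int = np.array([char2idx[ch] for ch in data])  # the whole corpus as integers
```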

When we feed the characters into the model they need to be numbers, integers, and those integers represent an index in an array. So the first layer in the model is an embedding layer, which is essentially an array with the index of each character going down one axis, and across the other axis is the embedding dimension.

So the character 'a', for example, will in our case be represented by 256 floating-point values. The output becomes three-dimensional because we have a batch size of 64; we feed in 64 sequences at any one point. That's why we have the character-to-index dictionary: to convert each character into an index that can be read into that embedding layer. And then we have the reverse mapping for pulling it back.
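
In other words, the embedding layer is a lookup table of shape (vocab_size, 256), and with a batch size of 64 its output for a batch of sequences is a (64, sequence_length, 256) tensor. A quick sketch, borrowing the layer arguments from the classic TensorFlow text-generation recipe rather than the exact code on screen:

```python
import tensorflow as tf

vocab_size = 85      # number of unique characters (roughly 85, per the video)
embedding_dim = 256  # floating-point values per character
batch_size = 64

embedding = tf.keras.layers.Embedding(
    vocab_size, embedding_dim, batch_input_shape=[batch_size, None])

# 64 sequences of 100 character indices -> a (64, 100, 256) tensor of embeddings
dummy_ids = tf.random.uniform((batch_size, 100), maxval=vocab_size, dtype=tf.int32)
print(embedding(dummy_ids).shape)  # (64, 100, 256)
```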

After training, when we want to convert our predictions back into human-readable characters, we use this, which goes from index to character. And this is just converting our data into indices. Here's the sequence length: that's the window of text we read at any one point. Let me give you an example. Imagine our window is four characters.

And say we have the input 'hello', with two L's, not three. Our input has a window size of four, so it reads 'hell'. Because we're using this to predict, or generate, text, the output is pushed along by one character. So where our input is 'hell', the output will be 'ello', and that is our target.
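
That shift is usually done with a small helper applied to each chunk. The function name follows the standard TensorFlow tutorial convention, so the video's code may name it differently:

```python
def split_input_target(chunk):
    # "hello" -> input "hell", target "ello": the target is the same
    # sequence shifted one character to the right
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

print(split_input_target("hello"))  # ('hell', 'ello')
```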

We do this throughout the entire text. The only difference for us is that we're doing it in sequences of 101. So we define these up here: sequence length 100, and then, like I said, batch size 64 and embedding dimension 256. So we have a few things in there. I'll come down to here. Here we're using a TensorFlow Dataset object, which is really useful: we put our data into this Dataset object and then we can use its handy methods, like batching, shuffling, and then batching again here. And this here is what I said before with the 'hello' example. The chunk could be 'hello', and the input data... actually, let me put a comment in.

So if we're inputting 'hello', the input data will be 'hell' and the target will be 'ello'. That's why we're splitting into input and target data here. Then we map the dataset object through this function to get our correctly formatted input/output dataset. We're also using shuffle. Sorry, let me go through this quickly: this is where we split into sequences of length 101 at a time; you can see sequence length plus one.

Now, anything remaining after that, because the dataset is very unlikely to divide exactly into chunks of 101, we just drop. There probably will be a few leftover characters, but not many, so it's fine. We do that here, then the mapping like I just said, and then here we shuffle the dataset.
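
Putting those pieces together, the dataset pipeline described here looks roughly like this; the shuffle buffer size is an assumption:

```python
import tensorflow as tf

seq_length = 100
batch_size = 64
buffer_size = 10000  # assumed shuffle buffer size

# text_as_int is the whole corpus encoded with char2idx (see above)
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

# chunks of 101 characters; drop whatever doesn't fit at the end
sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)

# map each chunk to an (input, target) pair, then shuffle and batch into 64s
dataset = (sequences
           .map(split_input_target)
           .shuffle(buffer_size)
           .batch(batch_size, drop_remainder=True))
```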

Both of those sizes we define up here. So we shuffle the dataset, and the reason we do that is to give a better representation of the data in every single batch. We're going to train on 64 sequences at any one time and then update the model weights. Shuffling matters especially with this data, because the first part is Meditations and the second part is the Letters. If we didn't shuffle, the model would be training on 64 sequences of Meditations at any one time, and within that it would probably also be covering one specific topic at any one time.

Or a specific few topics, which doesn't give a very good overview of everything we're training on; the weights get updated according to that specific topic and that specific text, Meditations or the Letters. So we shuffle the dataset to give a better representation of the data in every single batch. Now, instead of just Meditations and just one topic, each batch will have a few different topics from Meditations and a few different topics from the Letters, which works a lot better. And then batch: we already used batch here to split into sequences, and here we use batch again to split into batches of 64 sequences.

Okay, so here we're building the model. I have two different versions (I'm going to change all of this, by the way): a GRU-unit model and an LSTM-unit model, which you switch here. I'm not going to go into too much detail, I'll be quick. The embedding layer, as I said before, takes the vocab size, which is how many unique characters we have, in this case I think 85, and the embedding dimension, which is 256.

That's how detailed the representation of every single character is. Then the batch input shape is the number of sequences we're going to put in at any one time, which is 64 in this case. Then we have our LSTM unit; this is where the actual sequence learning comes into play. It stands for long short-term memory unit.

This one is a gated recurrent unit. The point of using these over a plain recurrent neural network is that they retain a sense of memory over the long term. They do that through the different gates within the units, which is obviously really useful with text. The other point is that we use a dropout of 10% on both of these.

These units are naturally very deep, so they can overfit really easily. Having a dropout of 10% means we mask 10% of the inputs at any one time. So take a sentence, or a word, say 'hello' again. Ten percent of five characters isn't an exact number of letters, so in practice...

We'll mask one letter: 'hello' becomes 'h', blank, 'l', 'l', 'o', and that just helps the model generalize. Then here is our classification layer, just a typical densely connected neural network layer, hence Dense, and it outputs into our vocab size, which is, I think, 85. That essentially means output zero maps to 'a', output one maps to 'b', and so on, and after that we use the index-to-character dictionary again. So here we're just building the models.
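
So the whole network is just three layers: embedding, a recurrent unit (GRU or LSTM), and a dense output over the vocabulary. A sketch of that build function; the number of recurrent units (1024 here) is an assumption, and the 10% dropout is the one mentioned above:

```python
import tensorflow as tf

def build_model(vocab_size, embedding_dim, rnn_units, batch_size, use_lstm=True):
    """Embedding -> LSTM or GRU (10% dropout) -> Dense over the vocab."""
    rnn_cls = tf.keras.layers.LSTM if use_lstm else tf.keras.layers.GRU
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(
            vocab_size, embedding_dim,
            batch_input_shape=[batch_size, None]),
        rnn_cls(rnn_units,
                return_sequences=True,
                stateful=True,
                dropout=0.1),               # mask 10% of the inputs
        tf.keras.layers.Dense(vocab_size),  # one logit per character
    ])

model = build_model(vocab_size=85, embedding_dim=256, rnn_units=1024, batch_size=64)
```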

This is all inside a function. Then we summarize the model, create a loss function, and compile the model with the Adam optimizer and, for the loss function, sparse categorical cross-entropy. Here we save the model weights every epoch, defined by this here, which is fed into TensorFlow during training.
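
The compile and checkpoint steps just described would look roughly like this; the checkpoint directory and the number of epochs are placeholders, not values from the video:

```python
import os
import tensorflow as tf

def loss(labels, logits):
    # the Dense layer outputs raw logits, so from_logits=True
    return tf.keras.losses.sparse_categorical_crossentropy(
        labels, logits, from_logits=True)

model.compile(optimizer="adam", loss=loss)

# save the model weights at the end of every epoch
checkpoint_dir = "./training_checkpoints"  # placeholder path
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=os.path.join(checkpoint_dir, "ckpt_{epoch}"),
    save_weights_only=True)

history = model.fit(dataset, epochs=30, callbacks=[checkpoint_callback])
```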

Okay, so the callbacks: the checkpoint callback, which is here. Then at the end we restore the final checkpoint and rebuild the model. So here we're building it again, but this time, instead of a batch size of 64, it has a batch size of 1. The reason we do that is that we don't want a batch size of 64 when we're predicting, because then we'd have to pass in a list of 64 starting strings, and we don't want to do that.

We'd have to repeat 'from' 64 times in a list, which doesn't make sense, and it would take a lot more computing power as well. So instead of doing that, we essentially flatten the model a little bit so it only has a batch size of one at a time.

Then we can just feed in the word 'from' and it will predict; it will generate text. So we rebuild the model, load the weights into it, and build it again. Then we summarize the model, which is the same as above except, like I said, a batch size of 1 instead of 64. Here I'm just clearing out the checkpoints, because there are a lot of them and they take up a lot of space; I do that after the most recent one has been loaded. And then here we're saving the model and the character-to-index dictionary, which I'll go through later.
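
Rebuilding the network with a batch size of 1 and loading the trained weights back in typically looks like this, reusing the build_model sketch from above:

```python
import tensorflow as tf

checkpoint_dir = "./training_checkpoints"  # same placeholder directory as above

# same architecture, but batch size 1 so we can feed a single seed string like "from"
model = build_model(vocab_size=85, embedding_dim=256, rnn_units=1024, batch_size=1)
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))
model.summary()
```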

Then here we generate text. This is an old text-generation function; it's been updated a lot now, and I can show you the new one later. So here: the data writer, saving the model, saving the character-to-index dictionary. And what we have down here is kind of interesting. I haven't seen this done that often before; in this scenario I don't think I've seen it at all. I've seen someone do something similar for a chatbot, but other than that I haven't seen anyone else do it. It's using multiple recurrent neural networks, scoring them based on their output (how English it is, how grammatically correct everything is, and so on) and then choosing a winner. You can see here there are some really rubbish ones, because they're not properly trained, but some of them are obviously a lot better.

So the ones where it says 'meditations' have only been trained on Meditations by Marcus Aurelius, and they tend to do worse a lot of the time; these ones were trained on both. This one has the top score, almost 20: "what is a cloth vessel if it were no useful book?"

And then here, with a score of 14.12: "these are a pleasure and a mot of all those in which it's worth something." These here are the scores, and if we scroll up we can see one of them went a little bit crazy and got a really bad score, -120.

I don't know why this happens. It's very weird, but sometimes they just go crazy, which is why I built this ensemble recurrent neural network method in the first place. Because some of them occasionally go a bit crazy, you have all of the models, or three of the models in this case, there to back each other up.

So if one of them goes crazy, one of the other models takes over, which is really useful. I saw the output from one of these ensembles that I hadn't really trained very much, and it was already so much better. The one thing you might wonder is: okay, if you're using different models, how do you keep them all speaking about the same thing?

So how do you keep them like, you know so speaking about the same thing and They do that by the the winning so we split in sentences we we rate a sentence the winning sentence goes back in to the models and Then that is used To generate more text so they're continuously being updated would do with the best new sentence.

Okay Which is a pretty pretty good So this is the the new text generation function it still needs some work and so really quickly put together at the moment the rating function So I'll just kind of go over really quickly if the text is empty because I was getting some errors before where text would be Empty every now and again I'm not sure if that was an error in my code or the models were just being weird I think it must be an error in the code.

So I need to remove this and actually figure it out. Okay: if the text is empty, we just return it, because otherwise it would throw an error while rating the rest of it. Then we normalize the text, meaning we remove all punctuation and lowercase everything. And then here we check for correct punctuation.

So at the end: is it a full stop, exclamation mark, or question mark? Or near the end, e.g. if there's a full stop and then a newline character, is there a full stop, exclamation mark, or question mark? This doesn't fully work at the moment, because the text generation stops whenever there's a full stop, for example. So I need to update it so that generation stops on a full stop, exclamation mark, question mark, or newline character. The other way generation stops is when it has generated too many characters; the limit is around 500 at the moment. So it does that too.

Then we check for too much repetition. If it starts saying, you know, "there there there there", that's probably a problem. That happened occasionally, not so much with these models, but I have seen it happen quite often before, so that's quite a good rating check as well. And then here we check that all the words are actual words according to the vocabulary that we have. We actually saved a vocab, which I built separately.

I don't know if I still have the code for it; I'll put it in the comments. I just read in all the data that we have, split it into words, and saved that into a text document, which is here. Can we see this? I don't know if you can see it very well.

Oh, what have I done. Okay, so here it is. You can see it's just a list of all of the words. Then we just put each word into a regex and search for it.
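
I haven't seen the full implementation, but a rating function along the lines described (empty-text check, end punctuation, repetition, and the vocabulary lookup) might look something like this. The function name and all the scoring weights are made up for illustration:

```python
import re

def rate(text, vocab_words):
    """Score a generated passage; higher is better. Weights are illustrative."""
    if not text:
        return 0.0  # empty text: nothing to rate

    score = 0.0

    # reward text that ends (or nearly ends) with proper punctuation
    stripped = text.rstrip()
    if stripped and stripped[-1] in ".!?":
        score += 5.0
    elif re.search(r"[.!?]\n", text[-20:]):
        score += 2.5  # punctuation followed by a newline near the end

    # normalize: strip punctuation and lowercase before the word-level checks
    words = re.sub(r"[^\w\s]", "", text).lower().split()
    if not words:
        return score

    # penalize heavy repetition ("there there there there ...")
    if len(set(words)) / len(words) < 0.5:
        score -= 10.0

    # reward words that appear in the saved vocabulary text file
    hits = sum(1 for w in words if w in vocab_words)
    score += 10.0 * hits / len(words)

    return score
```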

If a word is in there, if it's a real word, it gets a good rating; otherwise it doesn't. And then here is our ensemble class. Here is the predict method: we initialize a prediction dictionary on the class, which will have the model name, the score, and then the text, as you can see here. That is then used by the gladiator predict method, which also controls this function here.

So this function actually generates text and scores it; this one here just controls that function. It runs it and then finds the highest-scoring sentence, or sequence. The highest-scoring one is added to the output text and is also set as the new start sequence. Initially we keep the start string, because it's something we typed in, like 'from', and we need it for the first iteration to make sense. But after the first iteration we don't want to keep what we're feeding back in, because it's a previous sentence; we just want the new text from that point on. So we set that flag to False, and then it loops through however many times we've set, here it's 10, so it's going to loop through 10 times and produce winning outputs.
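
The names below are mine, not the actual code, but the loop just described (every model generates from the current seed, each output is rated, the winner is appended and becomes the next seed) works roughly like this. Here generate_text is a hypothetical stand-in for the usual character-sampling routine, and rate is the scoring function sketched earlier:

```python
def gladiator_predict(models, start_string, vocab_words, rounds=10):
    """Ensemble generation: each round, the best-scoring output wins
    and becomes the seed for the next round. All names are illustrative."""
    full_text = start_string   # keep the typed seed for the first round only
    seed = start_string

    for _ in range(rounds):
        # each model generates a candidate continuation from the current seed
        candidates = {name: generate_text(model, seed)
                      for name, model in models.items()}

        # score every candidate and pick the winner
        scores = {name: rate(text, vocab_words) for name, text in candidates.items()}
        winner = max(scores, key=scores.get)
        winning_text = candidates[winner]

        # the winning sentence is appended to the output and becomes the new seed
        full_text += " " + winning_text
        seed = winning_text

    return full_text
```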

So we just set that to false and then it goes through loops through Depending on how many times we have set it to so here's 10. Okay, that's gonna loop through 10 times and produce winning Outputs, okay so That's what we have so far What I'm going to do now What we need to do now is refactor this into something that is clean and Not so messy so I'm gonna rebuild it into I think a new model a new sorry new Python file Called train, I think Yeah, I think train is fine I'm gonna call it train and then we will refer to that whenever we are building a new model And training the model So then everything is so segregated bit nicer Okay, so I'm gonna go ahead and Get on with that I'll sort of describe what I'm doing every now and again, but for the most part I'm just gonna fast-forward you see the code being made It should be okay I I I I I I I I I I I I (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) (upbeat music) Okay, so what we have done now is just refactor, just rebuild that code into a class here, as you can see.

So now the model is initialized here, we format the data, and then we build the model. My intention in splitting it like this is that we can format the data once, and then I can pass multiple sets of model-build parameters to it and build several models at once.

I haven't built that part yet; I'm going to do it next. I'll just put in a loop here, and it will loop through different model parameters and build them all, overnight or something, so we have several different models that we can then use in the ensemble class up here.
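
The refactor isn't shown line by line, but the shape described (format the data once, then loop over different parameter sets to build several models) might look something like this skeleton. Every name here is an assumption, and build_dataset stands in for the preprocessing covered earlier:

```python
class Train:
    """Hypothetical skeleton of the refactored training class."""

    def __init__(self, data):
        self.data = data
        self.dataset = None
        self.models = {}

    def format_data(self, seq_length=100, batch_size=64):
        # vocab, char2idx, chunking, shuffling, batching (as covered above)
        self.dataset = build_dataset(self.data, seq_length, batch_size)

    def build(self, name, **params):
        # pass different parameter sets to build several models in a loop
        self.models[name] = build_model(**params)
        return self.models[name]

trainer = Train(data)
trainer.format_data()
for units in (512, 1024):  # loop over different model parameters overnight
    trainer.build(f"lstm_{units}", vocab_size=85, embedding_dim=256,
                  rnn_units=units, batch_size=64)
```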

And after that we'll have several, hopefully good, models all competing to produce the best text. So I think that's pretty good so far. One thing I just noticed here, though: this is a typo. So I'm going to fix that, run it, and see what we get. Okay, cool.