
Stoic Philosophy Text Generation with TensorFlow


Chapters

0:0
3:28 Embedding Array
9:59 Building the Model
10:26 Embedding Layer
14:36 Summarizing the Model
21:42 Predict Method

Whisper Transcript

00:00:00.000 | All right, in this video we're going to go through, or rather I'm going to redesign,
00:00:06.460 | this code here, which is the code
00:00:11.160 | that we are currently using to train a recurrent neural network model.
00:00:17.120 | It's being trained on these two datasets here,
00:00:21.160 | Which I can actually show you over here
00:00:25.040 | so the
00:00:28.200 | Sorry this one
00:00:30.200 | So the
00:00:36.120 | model is actually training on Meditations by Marcus Aurelius and
00:00:42.000 | also Letters from a Stoic by Seneca, which in Latin is this; I'm not going to pronounce it.
00:00:49.400 | So we're pulling them from these two sources,
00:00:55.840 | and both are open source, which is really good.
00:01:00.080 | The thing with Letters from a Stoic is
00:01:04.560 | that this is the parent page, and then we're using Beautiful Soup to
00:01:10.440 | pull the individual letters. There are 124 of them.
00:01:14.700 | So it pulls them all in, puts them into a dictionary, and then
00:01:22.040 | from that dictionary we extract the text we need and join it all together.
00:01:26.640 | Then we join that with Meditations, and that is our training data. So it's literally just one huge string
00:01:33.760 | containing all the Meditations and all the letters,
00:01:37.260 | which you can see
00:01:41.320 | here, just joined together.
00:01:45.320 | So this is the format we get the letters in.
00:01:49.180 | Okay, so we have the letter name,
00:01:52.000 | and then we have the local web address and the text itself.
00:01:56.160 | So we join all those together into one big string, and
00:02:01.140 | then after that we join
00:02:04.400 | Meditations and the letters into
00:02:07.040 | the data that we use for training.
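The joining step described above can be sketched as follows. This is a minimal illustration, not the code from the video: the dictionary layout (`name -> {"url", "text"}`), the example URLs, and the placeholder strings are all assumptions standing in for the scraped letters and the Meditations text.

```python
# Hypothetical stand-ins for the scraped data; the real texts are far longer.
letters = {
    "Letter 1": {"url": "https://example.com/letter-1", "text": "First letter text."},
    "Letter 2": {"url": "https://example.com/letter-2", "text": "Second letter text."},
}
meditations = "Placeholder for the full text of Meditations."

# Pull the text out of each dictionary entry and join it into one string.
letters_text = "\n".join(entry["text"] for entry in letters.values())

# Training data is simply Meditations followed by all of the letters.
data = meditations + "\n" + letters_text
print(len(data), "characters of training text")
```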
00:02:12.880 | So I'll quickly summarize what we do
00:02:18.760 | to preprocess the data and actually build the recurrent neural network, and then I'll get started with it. So
00:02:24.480 | we take a vocab, which is a
00:02:28.600 | unique
00:02:32.440 | set, sorry, it's a set of all of the unique words that we have within
00:02:37.400 | our data. But actually, sorry, for this we're using characters and not words.
00:02:42.320 | Okay, so you can use either in NLP: sometimes you use words, sometimes you use characters. Characters are just kind of easier to set up;
00:02:50.140 | you don't really have to do as much with them.
00:02:52.900 | There are drawbacks and benefits to both.
00:02:56.440 | Usually you'd probably use words,
00:02:59.480 | but in this case we're going to use characters. And then we create a character-to-index dictionary. So
00:03:09.440 | when we feed the characters
00:03:13.120 | into the model, they need to be
00:03:16.400 | numbers, okay, they need to be integers. Those integers represent an index in an array. So
00:03:24.840 | the first layer in the model is an
00:03:28.780 | embedding array, which is
00:03:31.600 | like a three-dimensional
00:03:38.560 | tensor,
00:03:39.920 | which has the indices of each letter going down here, and
00:03:44.280 | across here is the embedding dimension. So, you know, the character "a"
00:03:50.840 | in our case will have 256
00:03:54.840 | floating-point values that represent it. Okay, and then it's three-dimensional because we have a batch size of 64,
00:04:03.240 | so we feed in
00:04:04.760 | 64 of everything at any one point, okay?
00:04:08.280 | So that's why we have the character-to-index dictionary, to
00:04:12.080 | convert from a character
00:04:14.880 | into an index, which can then be read into that embedding layer. And then we have the
00:04:21.680 | reverse, pulling it back. So after we've trained and we want to convert our predictions
00:04:28.120 | into readable characters again, we use this, which goes from the index to the character.
00:04:36.280 | And this is just converting our data into indices.
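A small sketch of the character vocabulary and the two lookups just described. The names `char2idx` and `idx2char` are assumptions chosen for illustration, and the short string stands in for the real training text.

```python
# Placeholder text; in the video this is the joined Meditations + letters string.
data = "hello stoic"

# The vocab is the sorted set of unique characters in the data.
vocab = sorted(set(data))

# Character -> integer index, used before feeding text to the embedding layer.
char2idx = {ch: i for i, ch in enumerate(vocab)}

# Integer index -> character, used to turn predictions back into text.
idx2char = vocab  # a list already maps position -> character

# Convert the whole text into indices, then back again as a sanity check.
data_idx = [char2idx[ch] for ch in data]
decoded = "".join(idx2char[i] for i in data_idx)
print(data_idx)
print(decoded)
```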
00:04:40.000 | Here's the sequence length. That is the window of
00:04:44.440 | text that we read at any one point.
00:04:47.480 | So imagine,
00:04:50.400 | let me give you an example.
00:04:52.800 | Imagine our window is four characters, okay, and we have the input "hello",
00:05:02.080 | with two L's, not three.
00:05:07.080 | Our input has a window size of four, okay, so it reads "hell".
00:05:16.200 | The outputs, because we are using this to predict or generate text,
00:05:20.720 | will be pushed one step along.
00:05:24.480 | Okay, so whereas our input is "hell", the
00:05:30.240 | output will be "ello"; that will be our target. Okay, and
00:05:34.160 | we do this throughout the entire text. The only difference for us is that we are doing it in
00:05:42.120 | sequences of 100.
00:05:47.840 | Okay, so
00:05:49.600 | we define these,
00:05:51.360 | so you define them here:
00:05:54.000 | sequence length 100, and then, like I said, the batch size of 64 and embedding dimension of 256.
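To make the "hello" example concrete, here is a minimal sketch of the input/target split. The function name `split_input_target` and the constant names are assumptions for illustration; the values 100, 64, and 256 are the ones quoted in the video.

```python
SEQ_LENGTH = 100   # window of text read at any one point
BATCH_SIZE = 64    # sequences fed in per training step
EMBED_DIM = 256    # floating-point values representing each character


def split_input_target(chunk):
    """Split a chunk of length n + 1 into an input and a target of length n.

    The target is the input shifted one character along.
    """
    return chunk[:-1], chunk[1:]


# With a window of four characters, the five-character chunk "hello" gives:
inp, tgt = split_input_target("hello")
print(inp, tgt)
```

In the real pipeline each chunk would be 101 characters long (sequence length plus one), giving 100-character inputs and targets.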
00:06:02.360 | So we have a few things in there.
00:06:04.360 | Okay, I'll come down to here. So here we're using a
00:06:10.440 | TensorFlow Dataset object, which is really useful.
00:06:14.800 | Essentially we put our data into this Dataset object, and
00:06:19.480 | then we can use
00:06:22.200 | the functions,
00:06:24.120 | the useful methods,
00:06:25.680 | on that Dataset object, like batching,
00:06:27.680 | shuffling, and also batching again here.
00:06:31.120 | Okay, so that's really useful. And this here
00:06:35.640 | is what I said before with the "hello", right? So that chunk can be "hello".
00:06:43.680 | For the input data, actually, I'll put a comment. Okay.
00:06:48.600 | So we're inputting "hello".
00:06:52.600 | This, the input data, will be "hell",
00:06:57.480 | and this will be
00:06:59.560 | "ello". All right, so that's why we're splitting into our input and target data here. Okay, and then we map
00:07:06.080 | the Dataset object, here, okay, we map the Dataset object through this function to get our
00:07:15.040 | correctly formatted input-output dataset.
00:07:21.800 | We're using shuffle.
00:07:23.800 | So here, sorry, let me just go through this quickly.
00:07:26.400 | This is where we are splitting into sequences of length 101 at a time, so you can see sequence length plus 1. Okay.
00:07:34.000 | Now, anything remaining after that, because obviously the dataset is very unlikely to divide exactly into
00:07:43.200 | sets of 101,
00:07:47.960 | we just drop. So we drop the remaining characters if there are any, and there probably will be, but it's not going to be many, so it's fine.
00:07:56.400 | We do the mapping here, like I just said, and then here we shuffle the dataset. Okay, the buffer size we define up here.
00:08:04.240 | So we shuffle the dataset, and the
00:08:09.440 | reason that we do that is to
00:08:14.600 | give a better representation of the data with every single batch. Okay, so we're going to train on
00:08:23.000 | 64 sequences at any one time and then update the model weights.
00:08:28.640 | Shuffling the dataset
00:08:30.640 | matters especially with this data,
00:08:33.680 | because we have Meditations in the first part and the letters in the second part, right?
00:08:41.680 | If we didn't shuffle the data, it would be training on
00:08:44.680 | 64 sequences of
00:08:47.240 | Meditations at any one time.
00:08:49.560 | Okay, and then within that it would probably also be speaking about one specific topic at any one time,
00:08:56.080 | or a specific few topics. Now, that doesn't give a very good overview of everything
00:09:02.040 | we're training on, so it updates those weights according to that specific topic and that specific
00:09:09.320 | book, okay, that specific text, so Meditations or the letters, right?
00:09:14.200 | So we shuffle the dataset to give a better representation of the data in every single batch.
00:09:21.920 | So now, instead of having, you know, just Meditations, just one topic,
00:09:25.420 | it's going to have a few different topics from Meditations and a few different
00:09:28.920 | topics from
00:09:31.000 | the letters, okay?
00:09:32.720 | So that works a lot better.
00:09:37.360 | And then batch. We already used batch here to split into sequence lengths,
00:09:42.000 | and then here we use batch again to split into
00:09:47.760 | batches of 64 sequences. Okay.
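The batch, map, shuffle, batch pipeline described above might look roughly like this with `tf.data`. It is a sketch under assumptions: the random integers stand in for the encoded text, `BUFFER_SIZE = 10000` is a guessed value, and dropping the remainder on the final batch is an assumption (the video only confirms dropping the leftover characters when cutting 101-character chunks).

```python
import numpy as np
import tensorflow as tf

SEQ_LENGTH = 100
BATCH_SIZE = 64
BUFFER_SIZE = 10000  # assumed shuffle buffer size; not stated in the video

# Placeholder for the text encoded as character indices (vocab of about 85).
data_idx = np.random.randint(0, 85, size=20000)


def split_input_target(chunk):
    # Input is the first 100 characters, target is the same window shifted by one.
    return chunk[:-1], chunk[1:]


dataset = tf.data.Dataset.from_tensor_slices(data_idx)

# First batch: cut the stream into chunks of 101, dropping leftover characters.
sequences = dataset.batch(SEQ_LENGTH + 1, drop_remainder=True)

dataset = (
    sequences.map(split_input_target)        # (input, target) pairs
    .shuffle(BUFFER_SIZE)                    # mix sequences from across the text
    .batch(BATCH_SIZE, drop_remainder=True)  # second batch: 64 sequences per step
)

for x, y in dataset.take(1):
    print(x.shape, y.shape)
```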
00:09:51.960 | Okay, so here we're building the model.
00:10:06.160 | I have two different ones; I'm going to change all of this, by the way. This is what we're doing:
00:10:10.320 | we have a GRU unit model or an LSTM unit model, okay, so it's a change here.
00:10:20.160 | I'm not going to go into too much detail; I'm going to be quick.
00:10:25.040 | But the embedding layer, as I said before, takes the vocab size.
00:10:29.880 | So vocab size is how many characters we have, which in this case is, I think, 85.
00:10:36.640 | The embedding dimension,
00:10:39.880 | which is 256, is how detailed the representation of every single character is.
00:10:45.760 | Okay, and
00:10:49.280 | then the batch input shape is the
00:10:53.840 | number of sequences that we're going to put in at any one time, which is 64 in this case.
00:11:00.360 | Then we have our LSTM unit. This is where the actual sequence learning comes into play.
00:11:06.320 | So this one is a long short-term memory unit, and this is a gated recurrent unit.
00:11:11.560 | The point of using these over a plain recurrent neural network is that they retain a sense of memory long term.
00:11:20.960 | They do that through the different gates within the units,
00:11:26.440 | which is really useful, obviously, with text. And then the other point is
00:11:31.760 | that we're using a dropout of 10% on both of these. These units are naturally very deep,
00:11:39.680 | so they can overfit really easily. Having a dropout of 10%
00:11:44.740 | means that we mask 10% of the inputs at any one time. So in a sentence,
00:11:50.640 | or in a word,
00:11:53.760 | say "hello" again,
00:11:55.720 | this will mask,
00:11:57.720 | well, we can't really split out exactly 10% there, so
00:12:01.920 | we'll mask one letter. Okay, so "hello" will become "h", blank, "llo", right?
00:12:10.200 | And that just helps the model generalize. And then here,
00:12:15.120 | this is our classification layer. It's just a typical neural network layer, densely connected, hence Dense,
00:12:22.120 | and that
00:12:24.800 | outputs into
00:12:26.800 | our vocab size, which I think is 85, which
00:12:33.080 | essentially means
00:12:35.920 | output zero will map to "a", output one will map to "b", and so on.
00:12:42.160 | Okay, and then after that we use the index-to-character dictionary again.
00:12:47.480 | Okay, so here we're just building the model. This is in a function.
00:12:52.040 | Okay, summarizing the model, creating a loss function,
00:12:55.120 | compiling the model using the Adam optimizer and, for the loss function, sparse categorical cross-entropy.
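A hedged sketch of a model along the lines described: embedding, then a GRU or LSTM with 10% dropout, then a dense layer sized to the vocab, compiled with Adam and sparse categorical cross-entropy. This is not the code from the video. `RNN_UNITS = 256` is an assumed value, the function and argument names are illustrative, and the fixed batch size is set through `tf.keras.Input(..., batch_size=...)` here rather than through a batch input shape on the embedding layer.

```python
import tensorflow as tf

VOCAB_SIZE = 85   # approximate figure quoted in the video
EMBED_DIM = 256
RNN_UNITS = 256   # assumption: the number of recurrent units is not stated
BATCH_SIZE = 64


def build_model(vocab_size, embed_dim, rnn_units, batch_size, use_lstm=False):
    """Embedding -> GRU or LSTM (10% dropout) -> Dense over the vocab."""
    rnn_cls = tf.keras.layers.LSTM if use_lstm else tf.keras.layers.GRU
    inputs = tf.keras.Input(shape=(None,), batch_size=batch_size, dtype="int32")
    x = tf.keras.layers.Embedding(vocab_size, embed_dim)(inputs)
    # dropout=0.1 drops 10% of the layer's input units during training.
    x = rnn_cls(rnn_units, return_sequences=True, dropout=0.1)(x)
    # One logit per character in the vocab, for every position in the sequence.
    outputs = tf.keras.layers.Dense(vocab_size)(x)
    return tf.keras.Model(inputs, outputs)


def loss(labels, logits):
    # The Dense layer has no softmax, so the loss works on raw logits.
    return tf.keras.losses.sparse_categorical_crossentropy(
        labels, logits, from_logits=True
    )


model = build_model(VOCAB_SIZE, EMBED_DIM, RNN_UNITS, BATCH_SIZE)
model.compile(optimizer="adam", loss=loss)
model.summary()
```

Calling `build_model(..., batch_size=1)` gives the same architecture with a batch size of one, which is the shape wanted at prediction time.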
00:13:04.840 | Here we are
00:13:06.840 | saving the model weights at every epoch,
00:13:10.720 | defined by this here, which is fed into
00:13:13.840 | TensorFlow
00:13:16.600 | during training. Okay, so callbacks, checkpoint callback, which is here.
00:13:23.800 | Then at the end we restore the final checkpoint
00:13:26.560 | and rebuild the model. Okay, so here we're building it again, but this time, instead of a batch size of 64, it has a batch size of 1.
00:13:37.920 | So
00:13:44.480 | the reason we do that is we don't want a batch size of 64 when we're predicting,
00:13:50.080 | because then we'd have to put in, you know, a list of 64
00:13:56.320 | starting strings, and we don't want to do that. We'd have to put, like, "From"
00:14:00.340 | 64 times in a list, which doesn't make sense,
00:14:03.720 | and that would take a lot more computing power as well. Okay, so we
00:14:12.140 | essentially flatten the model a little bit, and
00:14:17.040 | then it will only have one batch at a time,
00:14:20.200 | or rather a batch size of one at a time. So then we just feed in the word "From" and it will predict.
00:14:26.960 | Okay, it will generate text.
00:14:29.080 | So we rebuild it, yeah,
00:14:31.920 | load the weights into it, and build it again. Okay, then we just summarize the model,
00:14:38.360 | which is the same as above but, like I said, with 1 instead of 64 for the batch size.
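To show why a batch size of one is convenient at prediction time, here is a plain-Python sketch of a generation loop that feeds a single start string such as "From". The `random_predictor` stub is purely hypothetical and stands in for the trained network; it is not how the real model picks characters.

```python
import random


def generate(predict_next, char2idx, idx2char, start="From", n_chars=50):
    """Generate text one character at a time from a single start string."""
    ids = [char2idx[ch] for ch in start]
    for _ in range(n_chars):
        batch = [ids]  # one sequence only, i.e. a batch size of 1
        ids.append(predict_next(batch))
    return "".join(idx2char[i] for i in ids)


# Tiny placeholder vocab that contains the characters of the start string.
vocab = sorted(set("From the stoics."))
char2idx = {ch: i for i, ch in enumerate(vocab)}
idx2char = vocab

rng = random.Random(0)


def random_predictor(batch):
    # Stub: ignores the input and returns a random character index.
    # A trained model would return the index sampled from its output logits.
    return rng.randrange(len(vocab))


sample = generate(random_predictor, char2idx, idx2char)
print(sample)
```

With a batch size of 64 the same call would need 64 start strings in `batch`, which is the situation the rebuild avoids.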
00:14:43.440 | Here I'm just clearing out the memory of the checkpoints, because there are a lot of them and they take up a lot of memory.
00:14:51.040 | So I do that after the most recent one is loaded.
00:14:54.860 | And then here we're saving the model and the character-to-index dictionary, which I'll go through later.
00:15:03.200 | Then here we generate text. This is an old text generation
00:15:08.080 | function; it's been updated a lot now,
00:15:11.920 | which I'm sure I can show you later. So here, data writer,
00:15:15.920 | saving the model, saving the character-to-index dictionary, and
00:15:20.240 | here, so,
00:15:23.880 | what we have down here, which is kind of interesting,
00:15:28.920 | and which I haven't seen done that often before,
00:15:33.000 | in this scenario, I don't really think I've seen it happen.
00:15:39.440 | I've seen someone else do it for a chatbot,
00:15:42.360 | but other than that, I haven't seen anyone else do it,
00:15:45.760 | is using
00:15:48.960 | multiple recurrent neural networks,
00:15:51.800 | scoring them based on their output, based on their English and how
00:15:56.800 | grammatically correct everything is, and so on,
00:16:00.320 | scoring their outputs and then choosing a winner.
00:16:07.040 | So you can see here, and there are some really rubbish ones because they're not properly trained,
00:16:13.240 | okay, but some of them are obviously a lot better. So these ones, where you can see it says "meditations",
00:16:19.280 | have only been trained on Meditations by Marcus Aurelius, and they
00:16:22.280 | tend to do worse a lot of the time. These are trained on both.
00:16:26.640 | Okay, so this one has the top score, almost 20: "and what is a cloth vessel if it were no useful book?"
00:16:38.720 | And then here,
00:16:40.440 | with a score of 14.12,
00:16:42.440 | "these are a pleasure and a
00:16:44.720 | Mot of all those in which it's worth something." Okay.
00:16:50.440 | And actually you can see, so it's these ones here,
00:16:54.960 | these are the scores. So if we scroll up, we can see one of them went a little bit crazy
00:17:01.480 | and got a really bad score, -120. I don't know why this happens. It's very weird.
00:17:06.720 | But sometimes they just go crazy, which is why I built this ensemble recurrent neural network method,
00:17:15.680 | because some of them occasionally go a bit crazy.
00:17:18.960 | So by doing that you have
00:17:21.880 | all the other models, or three of the models in this case,
00:17:25.860 | which are there to back it up. So if one of them goes crazy,
00:17:31.440 | one of the other models takes over. Okay, which is
00:17:34.820 | really useful. I saw the output from one of these, which I haven't really trained very much, and it was so much better.
00:17:42.200 | Already, it was so much better.
00:17:44.760 | The only,
00:17:46.680 | so the thing that you might think is: okay, we're using different models, so how do you keep them, you know,
00:17:53.320 | speaking about the same thing?
00:17:56.480 | They do that through the winning sentence. So we split into sentences,
00:18:01.760 | we rate each sentence, and the winning sentence goes back into the models,
00:18:08.160 | and then that is used
00:18:11.160 | to generate more text. So they're continuously being updated with the best new sentence. Okay,
00:18:18.760 | which is pretty good.
00:18:23.680 | So this is the new text generation function. It still needs some work; it's been really
00:18:30.440 | quickly put together at the moment.
00:18:32.960 | Then the rating function.
00:18:35.640 | So I'll just go over it really quickly. First, if the text is empty, because I was getting some errors before where the text would be
00:18:41.320 | empty every now and again.
00:18:43.000 | I'm not sure if that was an error in my code or the models were just being weird.
00:18:47.240 | I think it must be an error in the code, so I need to remove this and actually figure it out.
00:18:53.200 | Okay, so if the text is empty, just return, because otherwise it'll throw an error while rating the rest of it.
00:18:59.840 | So then we
00:19:02.640 | normalize the text, as in we remove all punctuation and lowercase everything.
00:19:07.440 | Okay, and then here we check for correct punctuation. So at the end, right,
00:19:14.560 | at the end, is there a full stop, exclamation mark, or question mark? Or,
00:19:18.560 | near the end, e.g. if there's a full stop and then a newline character,
00:19:22.560 | is there a full stop, exclamation mark, or
00:19:25.360 | question mark? This doesn't fully work at the moment because it stops the
00:19:32.000 | text generation when there's a full stop,
00:19:35.280 | for example. So I need to update that so that it stops text generation when there's a full stop,
00:19:41.240 | exclamation mark,
00:19:43.080 | question mark, or newline character. The other alternative, where it stops text generation, is
00:19:50.960 | when it has generated too many characters, which is a limit of like 500 at the moment.
00:19:55.840 | So it does that too.
00:20:01.560 | And then we check for too much repetition, like if it's going to say, you know, "the the the the".
00:20:07.320 | Now, that's probably a problem, I think.
00:20:11.560 | That happens occasionally, not so much with these models, but I have seen it happen quite often before.
00:20:18.240 | So that's quite a good thing to include in the rating as well.
00:20:21.820 | And then here we're checking that all the words are actual words according to the vocabulary that we have.
00:20:28.720 | So we actually saved a vocab.
00:20:31.560 | I built this separately. I don't know if I still have the code somewhere in the comments. So I just
00:20:40.760 | read in all the data that we
00:20:42.760 | have, and then
00:20:45.280 | split it into words and put that into a text document. All right,
00:20:50.280 | which is,
00:20:54.720 | can we see this?
00:20:56.720 | I don't know if you can see this in PyCharm. Oh, what have I done?
00:21:02.160 | Okay, so here it is.
00:21:10.600 | So you can see it's just a list of all of the words. Okay, and then we just put that into a regex,
00:21:17.320 | yeah, and we, you know, search it. Okay.
00:21:22.200 | If the word is in there, it gets a good rating. Otherwise, no.
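The rating steps walked through above (empty check, normalization, end punctuation, repetition, known words) can be sketched like this. The weights, the function name `rate`, and the tiny word list are assumptions for illustration, and the known-word check uses a plain set lookup rather than the regex search mentioned in the video.

```python
import string


def rate(text, vocab_words):
    """Score generated text; higher is better. The weights here are arbitrary."""
    # Empty text is returned early so the later checks cannot fail on it.
    if not text.strip():
        return 0.0

    score = 0.0

    # Reward text that ends in a full stop, exclamation mark, or question mark.
    if text.rstrip().endswith((".", "!", "?")):
        score += 1.0

    # Normalize: lowercase everything and remove all punctuation.
    normalized = text.lower().translate(str.maketrans("", "", string.punctuation))
    words = normalized.split()
    if not words:
        return score

    # Penalize repetition such as "the the the the".
    repeats = sum(1 for a, b in zip(words, words[1:]) if a == b)
    score -= repeats / len(words)

    # Reward the share of words found in the saved vocabulary.
    known = sum(1 for w in words if w in vocab_words)
    score += known / len(words)
    return score


vocab_words = {"the", "soul", "is", "calm"}  # placeholder word list
print(rate("The soul is calm.", vocab_words))
print(rate("the the the xqz", vocab_words))
```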
00:21:28.340 | Yeah, okay
00:21:32.480 | And then here is our ensemble class
00:21:40.920 | Here is where so this predict method here
00:21:44.700 | So initialize a prediction dictionary called self documentations
00:21:50.160 | so this will have the
00:21:53.520 | Model name the score and then the text. Okay, which you can see here
00:21:59.800 | And then that is used by gladiator predict which also controls
00:22:07.640 | This function here. So this function actually generates text and
00:22:10.740 | Scores it this one here just controls
00:22:14.200 | that function
00:22:17.040 | So it runs that function. Yeah, and
00:22:20.000 | Then it finds the highest scoring sentence or sequence
00:22:27.320 | So the highest-scoring one is added to the text, and then the highest-scoring one is set as the
00:22:35.560 | new start sequence. Initially we keep the start
00:22:39.800 | string, because it's something we typed in, like "From", and we need to keep that in for the first
00:22:45.040 | iteration to make sense. But after the first iteration, we don't want to keep what we're feeding back into it, because it's the previous sentence.
00:22:52.700 | We just want the new text after that point, so we set that to false, and then it loops through
00:23:00.360 | depending on how many times we have set it to. So here it's 10. Okay, that's going to loop through 10 times and produce
00:23:07.940 | winning
00:23:10.240 | outputs, okay?
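The ensemble loop described here (every model generates, the outputs are scored, the winner is appended and fed back in as the new start sequence) could be sketched as below. The generator and scoring functions are hypothetical stubs, and `ensemble_generate` is an illustrative name, not the method from the video.

```python
def ensemble_generate(generators, rate, start, rounds=10):
    """Let several generators compete; the best sentence seeds the next round."""
    pieces = []
    seed, keep_seed = start, True
    for _ in range(rounds):
        candidates = {}
        for name, generate in generators.items():
            sentence = generate(seed)
            if keep_seed:
                # Only the first round keeps the typed-in start string.
                sentence = seed + " " + sentence
            candidates[name] = (rate(sentence), sentence)
        # Pick the model whose sentence scored highest.
        best_name = max(candidates, key=lambda n: candidates[n][0])
        best = candidates[best_name][1]
        pieces.append(best)
        # The winning sentence becomes the start sequence for every model.
        seed, keep_seed = best, False
    return " ".join(pieces)


# Stub generators standing in for trained models.
generators = {
    "model_a": lambda seed: "virtue is enough.",
    "model_b": lambda seed: "zzz zzz",
}


def toy_rate(sentence):
    # Stub score: penalize the nonsense token.
    return -sentence.count("zzz")


result = ensemble_generate(generators, toy_rate, "From", rounds=3)
print(result)
```

If one model "goes crazy" on a given round, its low score simply means another model's sentence is chosen instead.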
00:23:13.720 | That's what we have so far
00:23:15.960 | What I'm going to do now
00:23:19.280 | What we need to do now is
00:23:22.880 | refactor this into something that is clean and
00:23:29.640 | not so messy. So I'm going to rebuild it into, I think, a new, sorry, a
00:23:36.920 | new Python file
00:23:39.800 | called train, I think.
00:23:44.080 | Yeah, I think train is fine.
00:23:47.760 | I'm going to call it train, and then we will refer to that whenever we are building a new model
00:23:54.600 | and training the model.
00:23:57.520 | So then everything is segregated a bit more nicely.
00:24:00.240 | Okay, so I'm going to go ahead and
00:24:04.160 | get on with that.
00:24:06.480 | I'll describe what I'm doing every now and again, but for the most part I'm just going to fast-forward so you see the code being written.
00:24:12.880 | It should be okay
00:29:19.640 | Okay, so what we have done now is just refactor,
00:29:24.640 | just rebuild that code into a class here, as you can see.
00:29:31.420 | So now the model is initialized here.
00:29:37.620 | We format the data and then we build the model.
00:29:41.160 | So my intention of like splitting it like this
00:29:44.520 | is that we can format the data, okay?
00:29:47.240 | And then I can pass multiple model build parameters to it
00:29:52.240 | and build several models at once.
00:29:58.720 | So I haven't built that up yet, I'm going to do that.
00:30:01.920 | So I'll just put in like a loop here
00:30:04.360 | and it will loop through different model parameters
00:30:07.120 | and build them all up
00:30:09.400 | overnight or something.
00:30:11.040 | So we have several different models
00:30:13.840 | that we can then use in the Ensemble class up here, okay?
00:30:18.840 | And then after that, we will have several good models,
00:30:26.160 | hopefully good models all competing
00:30:29.360 | to get the best text, right?
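The idea of formatting the data once and then looping over several sets of build parameters might look like this. Everything here is a hypothetical stand-in: the `Trainer` class, its method names, and the parameter values are invented for illustration, and `build_model` only records a configuration where the real class would build and train a network.

```python
class Trainer:
    """Hypothetical stand-in for the refactored training class."""

    def __init__(self, text):
        self.text = text
        self.models = {}

    def format_data(self):
        # Done once: build the vocab and encode the text as indices.
        self.vocab = sorted(set(self.text))
        self.char2idx = {ch: i for i, ch in enumerate(self.vocab)}
        self.data_idx = [self.char2idx[ch] for ch in self.text]

    def build_model(self, name, **params):
        # Placeholder: the real version would build and train a Keras model.
        self.models[name] = {"vocab_size": len(self.vocab), **params}


# Assumed example parameter sets; the video does not list specific values.
param_grid = [
    {"name": "gru_256", "rnn_type": "gru", "rnn_units": 256},
    {"name": "lstm_512", "rnn_type": "lstm", "rnn_units": 512},
]

trainer = Trainer("placeholder training text")
trainer.format_data()           # format the data a single time
for params in param_grid:       # then build one model per parameter set
    trainer.build_model(**params)

print(sorted(trainer.models))
```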
00:30:31.960 | So I think that is pretty good so far.
00:30:36.540 | So one thing I just noticed actually here,
00:30:43.400 | this is types, so I'm going to set up and run that
00:30:48.400 | and see what we get, okay, cool.