Stanford CS224N NLP with Deep Learning | 2023 | Python Tutorial, Manasi Sharma

All right, hi everyone. Welcome to the 224N Python review session. The goal of the session really will be to sort of give you the basics of Python and NumPy in particular that you'll be using a lot in your second homework. And the homeworks that will come after that as well.

We're pitching this tutorial at anyone who hasn't really touched programming languages before. But for people who have, we'll be going through a lot of that material very quickly and we'll be progressing to NumPy as well. And as I mentioned, first and foremost, the session is really meant for the people who are here in person.

So if you'd like me to slow down, speed up at any point, need time for clarifications, feel free to ask. It's really meant for you first here. And I really would like it to be sort of an interactive session as well. All right, so these are the topics we'll be covering today.

Going through first of all, why Python as a language? Why have we chosen it for sort of this course? And in general, why do people prefer it to some extent for machine learning and natural language processing? Some basics of the language itself, common data structures. And then getting to sort of the meat of it through NumPy, which as I mentioned you'll be extensively using in your homeworks going forward.

And then some practical tips about how to use things in Python. All right, so first thing, why Python? So a lot of you who might have been first introduced to programming, might have done Java before. A lot of people use MATLAB in other fields as well. So why Python?

Python is generally used, for one, because it's a very high-level language. It can look very English-like, and so it's really easy to work with, especially when people are getting started. It has a lot of scientific computing functionality as well, similar to MATLAB. So when we talk about NumPy, you'll see that it provides very quick and efficient operations involving math or matrices.

And that's very, very useful in applications such as deep learning. And for deep learning in particular, a lot of frameworks that people use, for example PyTorch and TensorFlow, interface directly with Python. And so for those main reasons, people generally tend to use Python within deep learning. Okay, so the setup information is in the slides if you'd like to look at them offline.

I will be jumping over that for now because I wanna get to the introduction to the language itself. And if we have time, we'll come back to the setup information. A lot of it's pretty direct. You can walk through it. It gives you steps for how to install packages.

It explains what a conda environment is, for example, and it gets you set up with your first working Python environment, so you can run simple and basic commands to get used to the language. But for now, I'm gonna be skipping over this and coming back to it if we have time.

All right, language basics. So in Python, you have variables, and these variables can take on different values. The assignment operation, or the equal sign, will allow you to assign a particular value to a variable. A nice thing with Python is you don't have to declare the type of the variable to begin with, and then only assign values of that type.

So for example, in certain languages, we first say that this variable, x, is only gonna be of type int. And any value aside from that assigned to it will throw an error. Python's pretty flexible. So if I want to, I can reassign, I can start with x is equal to 10.

And then later on, like five lines later, I can say x is equal to "hi" as a string, and there would be no issue. You can do simple mathematical operations, such as with the plus and division signs. You can do exponentiation, which is raising one value to the power of another. So x to the power of y, for example, using the double asterisk.
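As a minimal sketch of the reassignment and arithmetic just described: the values 10 and 3 are illustrative, and the variable names `total`, `quotient`, and `power` are made up for this example.

```python
# The same name can hold an int first and a string later; no type is declared.
x = 10
print(type(x).__name__)   # int
x = "hi"
print(type(x).__name__)   # str

# Basic arithmetic on two illustrative values.
x, y = 10, 3
total = x + y       # addition
quotient = x / y    # division (gives a float in Python 3)
power = x ** y      # exponentiation: x to the power of y
print(total, quotient, power)
```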

You can do type casting for float division. So if you wanna ensure that dividing your values results in a float value, and not just an integer division, you can cast to different types like float. If you want something to be specifically an int, you can also just put int instead of float, with parentheses around the result, and that'll give you an integer value.

And then you can also do type casting to, for example, convert from integers to strings. So in this case, if I wanted to, instead of doing 10 plus 3 as a mathematical operation, I just wanted to write out 10 plus 3. Then I can convert the x and y values, for example, to strings, and then add the plus sign as a character as well to create a string.
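A short sketch of the casts described above, using the same illustrative values 10 and 3. One thing worth knowing: in Python 3 a single slash already returns a float, and the double slash does integer (floor) division, so the explicit `float(...)` mostly matters for readability or for older Python 2 code.

```python
x, y = 10, 3

as_float = float(x) / y    # explicit cast to float before dividing
as_int = int(x / y)        # cast the result back to an integer (drops the fraction)
floor_div = x // y         # integer (floor) division operator

# Build the text "10 + 3" instead of computing the sum.
expression = str(x) + " + " + str(y)
print(as_float, as_int, floor_div, expression)
```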

And so a lot of these common operations you can look up online as well. People have lists of them, and you can just see how they're done in Python. All right, some other quick things. So Boolean values, True and False, are always written with a capital first letter. In some other languages they might be lowercase, so that's just one thing to know.

Python also doesn't have a null value. The equivalent of a null value is None. So sometimes you want a function to return None, to say that it's not really returning anything here. Or you wanna do checks, for example in if statements, to say that something doesn't have a value, and then you can assign it None.

So None sort of functions as a null equivalent: you're not really returning anything, it doesn't have a value. It's not the same as zero. And another nice thing about Python is lists, which are mutable lists of objects. We'll come to that a little bit later.

Mutable means that you can change them, and they can hold elements of any type. So you can have a mixture of integers, None values, strings, etc. And yeah, functions can return the None value as well. And another quick thing: instead of using the double ampersand for and, as people might do in some other languages, with Python, as I mentioned earlier, it's very English-like.

So you can actually just write out if x is equal to 3 and, the word and in English, y is equal to 4, then return True or something. It's quite nice that way, so you can use and, or, and not. And then the comparison operators, equal equal and not equal, will check for equality and inequality.

This one's pretty standard, I feel, across many languages, and you can use them in Python as well. And yeah, remember, just a quick thing, the equal equal to sign is different from the assignment operator. This one checks for equality, that one is just assigning a value. So single equal sign versus two of them.
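Here is a small sketch of the Boolean operators, comparisons, and None described above. The helper `find_first_negative` is a hypothetical function written only for this illustration.

```python
x, y = 3, 4   # single equal sign: assignment

both = (x == 3 and y == 4)    # double equal sign: equality check
either = (x == 5 or y == 4)
negated = not both
different = (x != y)
print(both, either, negated, different)

def find_first_negative(values):
    """Hypothetical helper: return the first negative number, or None if there is none."""
    for v in values:
        if v < 0:
            return v
    return None

result = find_first_negative([1, 2, 3])
if result is None:
    print("no negative value found")

# None is not the same thing as zero.
none_equals_zero = (result == 0)
print(none_equals_zero)
```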

All right, and then also in Python, you don't use curly brackets to mark blocks of code. In Python, you use indentation, basically spaces or tabs, so indents of either 2 or 4 spaces, to mark what is contained within a function, or within an if statement, a for statement, or any loop, for example.

And so the main thing is you can choose whether to use 2 or 4. You just have to be consistent, at the very least within each block and ideally throughout your entire code base, otherwise Python will throw an error. Now we'll go to some common data structures, and for this we'll transition to the Colab. So this one will sort of show you in real time.

This is, by the way, a Colab. A Colab is basically a Jupyter Notebook, for those of you who are familiar with those, that you can use and that's hosted on Google servers. The really nice thing about Jupyter Notebooks is you don't have to run an entire file all together, you can run it step by step in what are called cells.

So if you want to see an intermediate output, you can see that pretty easily. And you can also write, for example, descriptions alongside the cells, which is really, really nice to have as well. So a lot of people tend to use these when they're starting off a project and want to debug things.

And Colab allows you to use these Jupyter Notebook type applications, hosted on their servers for free, basically. So anyone can create one of these and run their code. All right, so lists are mutable arrays. Mutable means that you can change them, so that once you declare them, you can add to them, you can delete from them, and they're optimized for that purpose.

So they expect to be changed very often. We'll come to what are called NumPy arrays later, and those tend to be pretty much fixed in size. When you want to add to one, you basically have to create a new array, which will have the additional information. Lists, on the other hand, are highly optimized for changing things.

So if you know, for example, that you're in a loop and you're adding different elements to, let's say, a bigger collection, you'd want to use something like a list, because you're going to be changing it very often. So let's see how they work. So we start off with a names list with Zack and Jay.

You can index into the list by index, which means that you can pick out the elements in the list depending on what's called the index. So it's what place that value is at within the list. Zero refers to the first element, because Python is what's called zero-indexed, which means it starts with zero, and then it goes to one.

So here, zero will be Zack. And then let's say I want to append something to the end. So to add something to the end of the list, the term is append, not add. And so if I append, the list is changed in place: it's the original list itself, now with the added last element.

And what would currently be the length of this? It would be three, because you have three elements. And you can just quickly get that by using the len function, not length, just three letters, len. All right, it's also really nice because Python has overloaded the plus operation to be able to concatenate lists.

So here, I have a separate list, right? And all you need for a list definition is just brackets. So this is a separate list altogether, even though I haven't saved it in the variable, just Abhi and Kevin. And I can just do a plus equal to, which means that names is equal to names plus Abhi and Kevin.

And this should output the full list. You can create an empty list by just putting the plain brackets, or create a list from an existing list. And then as I mentioned earlier, your list can have a variety of types within it. So here, this list contains an integer value, a list value (so you can have a list of lists, with as many sublists as you'd like), a float value, and a None value.
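The list operations walked through above can be sketched as follows. The appended name "Richard" and the contents of `mixed` are assumptions for illustration; the transcript does not show the exact values used in the Colab.

```python
names = ["Zack", "Jay"]
first = names[0]              # zero-indexed: the first element

names.append("Richard")       # append modifies the list in place (name is assumed)
count = len(names)            # len, not length

names += ["Abhi", "Kevin"]    # same as names = names + ["Abhi", "Kevin"]
print(names)

empty = []                    # plain brackets give an empty list
copied = list(names)          # a new list built from an existing one

# A list can mix types: an int, a nested list, a float, and None.
mixed = [1, [2, 3], 4.5, None]
print(first, count, mixed)
```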

And this is completely valid within Python. Slicing refers to how you can access only parts of the list. So if I only want, for example, in this numbers array, I only want 0, 1, 2. Slicing is a way that you can extract only those parts. So the way slicing works is, the first element is included and the last element is excluded.

So here, I go from 0 to 3, counting 0, 1, 2, 3. So 3 is not included and so 0, 1, 2 will be printed out. There are also shorthands. If you know that you're going to be starting with the first element of the list, so I want 0, 1, 2 and it starts at 0, then you don't even need to include the first index.

You can just leave that out and include only the last index, the one that would be excluded. So that would be blank, colon, 3. And same deal with the end. If you know that you want to take everything, let's say from 5 and 6 till the end of the list, you can start with the index you'd like.

So counting 0, 1, 2, 3, 4, 5, you start at 5, go till the end, and leave the end index out. Fun fact about this colon: when you take just the colon, it'll take everything in the list, but it'll also create a duplicate in memory. That's a subtle but very useful thing to know, because when you pass lists around in Python, which is out of scope of this tutorial, you only pass a reference to the list.

So if you change the list through that reference, the original gets changed. The colon will create a separate copy of the list in memory (a shallow copy, so any nested lists inside it are still shared). So if you make any changes to the copy, it won't affect your original list. So this is a pretty neat way to do that. Then another fun thing that Python has, which is pretty unique, is you can index negatively.

So negative indexing means you index from the back of the list. So minus 1 refers to the last element of the list, and minus 3 refers to the third-last element. So what minus 1 will give you will be 6 in this case, and what minus 3, colon, will give you will be everything from there on, because you're starting at the third-last element.

So minus 1, minus 2, minus 3, and then till the end. Then this one seems kind of confusing, right? 3 to minus 2. So what this will do is: counting 0, 1, 2, 3, you start at index 3. And then counting from the back, minus 1, minus 2, you stop at minus 2, which is left off because the end index is excluded in slices.

You'd only get 3 and 4. That's what this is. Okay. That's about lists. Tuples are immutable arrays. So once you declare the values of these, they cannot be changed. So I start with, you know, we started with like the list of Zack and Jay. Tuples, you start with Zack and Jay.
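Going back to list slicing for a moment, here is a sketch of the slices and negative indices just described. It assumes `numbers` holds 0 through 6, which is consistent with the values mentioned in the walkthrough.

```python
numbers = [0, 1, 2, 3, 4, 5, 6]   # assumed contents

first_three = numbers[0:3]    # start included, end excluded
same_thing = numbers[:3]      # leaving out the start means "from the beginning"
tail = numbers[5:]            # leaving out the end means "until the end"

# A bare colon makes a (shallow) copy; plain assignment only copies the reference.
copy = numbers[:]
alias = numbers
copy[0] = 99                  # does not touch the original list

last = numbers[-1]            # last element
last_three = numbers[-3:]     # from the third-last element to the end
middle = numbers[3:-2]        # from index 3 up to, but not including, the second-last

print(first_three, tail, last, last_three, middle)
```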

You can still access them. You know, I can still print out names 0, same as I did with lists. But if I try to change it, in this case, it'll throw an error. So tuples, once you've instantiated them, cannot be changed. To create an empty tuple, you can either use the tuple function, or oftentimes you can just use empty parentheses.

So you can just say, for example, as you did here, just parentheses to instantiate something. All right. And yeah, this one, we'll come to a little bit later in shapes. But you can also have a tuple of a single value. And all you have to do there is just put the value and put a comma.

So that just shows that you have a tuple, which is like an immutable array. So you can't change it. It's a list, but only of one item. And that's here. Okay. I'll quickly move to dictionaries. For those of you who might be familiar with other languages, this is the equivalent of a hash map or hash table.
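The tuple behavior described above can be sketched like this. The replacement name "Richard" and the value 10 are made up for the example.

```python
names = ("Zack", "Jay")
print(names[0])               # reading works just like a list

# Assigning to an element of a tuple raises a TypeError.
try:
    names[0] = "Richard"
except TypeError as err:
    error_message = str(err)
    print("cannot modify a tuple:", error_message)

empty_a = tuple()             # empty tuple via the tuple function
empty_b = ()                  # empty tuple via bare parentheses

single = (10,)                # the trailing comma makes this a one-element tuple
not_a_tuple = (10)            # without the comma this is just the integer 10
print(type(single).__name__, type(not_a_tuple).__name__)
```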

What this is useful for essentially is mapping one value to another in a really, really quick way. So if I want to map, for example, a string to an index, which you will happen to do a lot of in your homeworks, this is a really, really useful way to do that.

And so what you do is you instantiate this dictionary. And it says that Zack is going to correspond to this string value, whatever it is. And so anytime I want to retrieve the string value, I just use this dictionary. I index into it by the key, which is what I do here, and then it outputs the corresponding value.

And it does that really, really quickly. And yeah, so it's really useful and very commonly used, especially when, for example, you have a list of strings or a list of items, and you want to have a corresponding index for them. Because as you'll see in NLP, oftentimes you're working with indices and numbers in particular.

So it's a really great way to move from string formats to just numerical index values. There are some other things you can do with dictionaries. You can check whether certain elements are in there. So if you, for example, try to index phone book with Monty, it'll throw an error, because there's no key that says Monty in that phone book dictionary.

And so sometimes you might want to do checks before you extract a value. And so this will just check: for example, if I print Monty in phone book, it should say False, or here, Kevin in phone book, it should say False. While for something that's actually in that dictionary, Zack, it will be True.

Okay. And then if you'd like to delete an entry from the dictionary, you can just do that using the del command. All right. Let's move to loops quickly. So loops are a really great way to repeat the same kind of operation over and over.
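A sketch of the dictionary operations covered above. The variable name `phonebook`, the phone number strings, and the `word_to_index` vocabulary are all assumptions for illustration.

```python
# Keys map to values; the number strings here are made up.
phonebook = {"Zack": "12-37", "Jay": "34-23"}
zack_number = phonebook["Zack"]      # look up a value by its key

# Membership checks look at the keys.
has_monty = "Monty" in phonebook
has_zack = "Zack" in phonebook

# Indexing with a missing key raises a KeyError.
try:
    phonebook["Monty"]
except KeyError:
    print("Monty is not in the phone book")

del phonebook["Jay"]                 # remove an entry

# The kind of string-to-index mapping that comes up a lot in NLP.
word_to_index = {"the": 0, "cat": 1, "sat": 2}
print(zack_number, has_monty, has_zack, phonebook, word_to_index["cat"])
```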

It's also a great way to sequentially go over those list-type or array-type objects we were talking about earlier. You know, you have a list of names, right? How do you access all of them? So loops are a really great way to do that.

In Python, they've abstracted away a lot of the parts that might be confusing in other languages. You can, for example, first loop over numbers. So what you do is you have a range function that you call. So here you say range, and then the number you want to stop before.

So what this range function will give you is 0, 1, 2, 3, 4, and each of those in turn is what will be stored in this i value. And here it's just printing out that i value. So if I want to, for example, loop over a list of size 10, I just have to do for i in range 10, and then index that corresponding part of the list.

You technically don't even have to do that, because in Python you can just directly get the elements of the list. So here I have a list of names where I have Zack, Jay, and Richard. Instead of first taking the length of the list and then doing this range operation, I can just directly say for name in names, and then print out the name, and it will just directly get each element in the list.

But sometimes you might want both. You might want both this element, Zack, as well as its position in the list. And for that, you can actually use this really helpful function called enumerate. And so enumerate will basically pair those two values, and it'll give you both the value, which is here in name for example, and its corresponding index within the list, together.

So that's really, really convenient, versus, for example, having to do this slightly more complicated range operation, where you first take the range and then you index the list. How do you iterate over a dictionary? So for dictionaries, you can iterate over what are called the keys.

These are all of the first items that you put into the dictionary, and you can iterate over them the same way you would a list. You just say for name in, for example, phone book, and you can output the keys. If you want to iterate over what is stored in the dictionary under each key, which is called a value, you'd have to use the dictionary's dot values.

And if you want both, you use the dot items function. And so that will print out both of these. All right. So this is sort of covering the overarching, most commonly used structures: lists, dictionaries, and then loops, and how to efficiently use them within your code.
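The looping patterns described above can be sketched as follows. The `phonebook` contents are made up, and the result lists (`counted`, `by_index`, `pairs`, and so on) exist only so the outcome of each loop is easy to inspect.

```python
# range(5) yields 0, 1, 2, 3, 4.
counted = []
for i in range(5):
    counted.append(i)

names = ["Zack", "Jay", "Richard"]

# Index-based loop: take the range of the length, then index the list.
by_index = []
for i in range(len(names)):
    by_index.append(names[i])

# Direct loop: get each element without indexing.
for name in names:
    print(name)

# enumerate gives the position and the element together.
pairs = []
for i, name in enumerate(names):
    pairs.append((i, name))

phonebook = {"Zack": "12-37", "Jay": "34-23"}   # made-up values

keys = []
for name in phonebook:                 # iterating a dict yields its keys
    keys.append(name)

values = []
for number in phonebook.values():      # only the values
    values.append(number)

items = []
for name, number in phonebook.items(): # key and value together
    items.append((name, number))

print(counted, pairs, keys, values, items)
```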

We'll quickly be moving to the meat of what is really, really strong about Python, and what you'll be using a lot for your coming homeworks, especially homework two, which is NumPy. Okay. So for NumPy I'm also going to be going to the Colab, but I just quickly wanted to mention what NumPy is.

So NumPy is basically an optimized library for mathematical operations. You know, people tend to like MATLAB because it's very, very useful for these mathematical operations, which people use in their research. Python's solution to that is to have a separate library entirely, which makes use of subroutines, which are sort of like scripts written in a different language, C or C++, that are highly optimized for efficiency.

So the reason C and C++ are much faster than Python is because they're closer to what's called machine language, which is what the computer will read. I mentioned earlier, one of the nice things about Python is it's kind of high level. It looks like English, right? Just like I said.

You know, we say literally, if x is equal to one or x is equal to two, right? But that also means that there's a lot more translation required on the computer's part before it understands what you mean. And that's useful when we're writing code that we want to understand, but it's a little bit less useful when you're running a lot of operations on a lot of data.

So the real benefit of something like NumPy is that if you have your memory and your data in a particular format, it'll call these C scripts, or what are called subroutines in a different language, and that'll make your operations very, very fast. And so that's the real benefit of using NumPy.

And almost everyone in NLP is very familiar with this, because you'll be running a lot of operations on, for example, co-occurrence matrices, which are really, really big, and it's very useful to have them optimized for time. So that's really the benefit of using NumPy.

And NumPy basically is used for all these math and matrix and vector calculations. And a NumPy array is different from a list. Although you can easily translate between a list and a NumPy array, NumPy arrays are specifically, as I mentioned, designed to be used in these subroutines. So they have a specific format, they're instantiated differently, and you can translate between this and your standard list easily.

But know that NumPy operations really work on NumPy arrays. You can't do array operations like element-wise math on a list directly. You'd first have to convert it, which is really simple. You just use this NumPy dot array function. Many NumPy functions will do that conversion for you when you hand them a list, but just know that underneath they operate on NumPy arrays.
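A small sketch of the difference between a list and a NumPy array, and of converting between them with `np.array`. The values 1, 2, 3 are illustrative.

```python
import numpy as np

values = [1, 2, 3]            # a plain Python list
arr = np.array(values)        # convert the list into a NumPy array

# The same operator behaves differently on the two types.
repeated_list = values * 2    # list repetition: the list twice in a row
doubled_arr = arr * 2         # element-wise math: every element times two

back_to_list = arr.tolist()   # and back to a standard list

print(repeated_list)
print(doubled_arr)
print(type(arr).__name__, type(back_to_list).__name__)
```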

Okay. So for NumPy, we're gonna be going back to the Colab. And then, as I mentioned earlier, the real strength of NumPy is that it supports these large multi-dimensional arrays and matrices, with very optimized high-level mathematical functions. And just to step back for a quick second, what is a matrix?

Matrices are basically rectangular structures of numbers, and you can treat them with specific rules for operations between them. So if you have a lot of data, instead of individually multiplying things, if you can store them in this rectangular format, you have specific rules about how this matrix, for example, will interact with a different one.

And by doing that, which is matrix multiplication or matrix math, you can do a wide variety of mathematical operations. A vector is generally, and this is conventional, none of these are hard and fast rules, but conventionally, a vector is a matrix in one dimension. So it's usually a row vector or a column vector, which usually just means that it's a list of values in only one dimension.

So for example, here, when I come down to x is equal to NumPy array of 1, 2, 3, that's a list in only one dimension. Versus, for example, z, this is z down here, which is what's called a two-dimensional array, because you have rows, for example 6, 7, and then 8, 9. In the first one, you only have three values in one dimension.

So that's sort of the conventional difference between the two. Another convention is that matrices generally refer to two-dimensional objects. So this, as I mentioned, z, is two-dimensional. You might have heard the word tensor also. Tensors by convention usually are higher dimensional objects. So instead of having two dimensions, you know, 2 by 2, you can have n dimensions.

You can have 2, 2, 2, 2, 2, 2, for like five or six dimensions. And those are perfectly valid to do mathematical operations on, and those are often colloquially called tensors. In addition, and this will be covered in the next tutorial on PyTorch, those larger tensors are also optimized for efficiency, to be used on GPUs.

And so they're called tensors in a more concrete way there, because you're using these tensors with PyTorch and other packages to directly do those quicker GPU operations for deep learning. So that's a quick terminology difference between the three. Okay. So now, let's start off with some quick representations of how these matrices and vectors are represented in NumPy.

This sort of goes back to your question about what the difference is between a shape like three comma versus one comma three. So three comma in NumPy usually just means that you have one list of, for example, one, two, three, so there are three values. Versus if you add another list on top of that: one comma three essentially refers to the fact that there's a list of lists.

So anytime you have two dimensions, it always means that there's a list of lists, where each inner list is, for example, a row. So here, one comma three means that there's one row and then three columns. So it's saying there's one row of three, four, five essentially, and then each of those values is a separate column.

You can easily reshape them. So these hold basically the same data, but from NumPy's perspective, as you'll see a little bit later for operations such as broadcasting, you sometimes need to have it in this one comma three format or three comma one format. And as I said, three comma just represents three numbers.

One comma three means one row of three elements. Three comma one means you have three rows of one element each, so essentially each value sits in its own separate array. So you'll see brackets around each of them. There's an example that comes a little bit later in this Colab which will make it a little clearer.

So here, if you can see the difference between like x and y, one of them has only one bracket which just says it's one list, only one list of one comma two comma three. The second one is two brackets which says it's a list with only one list in it.

It's a list of a list. That's really the main difference between like these sort of two representations. So I could have like, let's say like a separate one. I'm going to call this A, and I just do this. So it's the same sort of elements, but this will be one comma three because it's showing that there's one outer list which shows the rows, and then one inner list which will have each of those values.
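To make the shape distinction concrete, here is a sketch using the arrays mentioned in the walkthrough (`x`, `y`, `a`, `z`). The name `col` and the exact values are assumptions pieced together from the numbers that were read aloud.

```python
import numpy as np

x = np.array([1, 2, 3])          # one bracket: a single list, shape (3,)
y = np.array([[3, 4, 5]])        # two brackets: a list holding one list, shape (1, 3)
a = np.array([[1, 2, 3]])        # same values as x, but shape (1, 3)
col = x.reshape((3, 1))          # three rows of one element each, shape (3, 1)
z = np.array([[6, 7], [8, 9]])   # a two-dimensional array, shape (2, 2)

print(x.shape, y.shape, a.shape, col.shape, z.shape)
print(col)                       # each value sits in its own inner brackets
```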

So the benefit will come with what I'm coming to a little bit later, which is broadcasting. Essentially the shape will help you determine which dimensions you want to match against. Because sometimes you'd want to have a one comma three array, like 1, 2, 3, applied only to the rows of some other matrix.

We'll come to that a little bit later. But sometimes you might want to have it applied only to the columns. And so, say I have a separate matrix of, for example, 0, 0, 0, 0, 0, 0, 0, 0, and I want the resulting matrix to be, for example, 1, 2, 3, 1, 2, 3, 1, 2, 3 along the rows.

Let me actually draw this out. It might be easier. So, let's say I have the 0, 0, 0, 0, 0, 0, 0, 0. And I might want a matrix that has 1, 2, 3 repeated along each row, versus one that has 1, 2, 3 repeated down each column.

The difference in how to generate these two will be the difference in the shape, in how you represent their shape. It's the same 1, 2, 3, but the resulting array you're generating by repeating the 1, 2, 3 values requires a difference in shape. And so, we'll come to that a little bit later, because this process of how you generate these arrays is called broadcasting.

But that's the real benefit of having an understanding of the shapes. The 1, 2, 3 values are the same. It's just how they're used with regard to other arrays. All right. So, yeah, vectors can be represented, and this is what I was talking about earlier, as n dimensions, or n by 1 or 1 by n dimensions, and these can result in the different behavior that I just talked about.
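The whiteboard example can be sketched roughly as below. The 3 by 3 size of the zero matrix is an assumption (the drawing is not visible in the transcript), and the names `row`, `col`, `along_rows`, and `down_cols` are made up.

```python
import numpy as np

zeros = np.zeros((3, 3), dtype=int)    # assumed 3 by 3 matrix of zeros

row = np.array([[1, 2, 3]])            # shape (1, 3)
col = np.array([[1], [2], [3]])        # shape (3, 1)

# Same values, different shapes, different results when broadcast.
along_rows = zeros + row   # 1, 2, 3 repeated in every row
down_cols = zeros + col    # 1, 2, 3 repeated down every column

print(along_rows)
print(down_cols)
```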

Matrices are usually in two dimensions, represented as m by n. These are just two examples. And then you can also reshape. So, I start with, for example, this array which is a list of 10. Oh, sorry, I need to run the import again quickly.

So, I start off with this array A, which is basically a one-dimensional list of 10 values. I can reshape it into a 5 by 2 matrix. You just have to make sure that your dimensions match, which means that you can multiply them together and get the original size.

So, if I start off with the array of 10, I can make a 2 by 5 matrix, I can make a 5 by 2 matrix, I can make a 10 by 1 or a 1 by 10. I can't make, for example, a 3 by 5, because that wouldn't fit the original size.

And for that, this operation called reshape is really useful. You might be wondering why there are two sets of parentheses. The way that reshape works is essentially that it takes in a tuple. So, remember what I was talking about earlier with tuples: they're immutable objects and they're defined by parentheses.

So, the outer parentheses represent what you're inputting to the function, and what you're inputting is a tuple, so it uses a second set of parentheses. So, now, let's go to some array operations. I start off with, you know, this array X. When you apply simple operations, for example a max operation, sometimes you might want the max of the entire array.
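A sketch of the reshape behavior described above. It assumes the ten-element array is simply 0 through 9 (built here with `np.arange`); the actual contents used in the Colab are not stated.

```python
import numpy as np

a = np.arange(10)              # assumed contents: 0, 1, ..., 9; shape (10,)

five_by_two = a.reshape((5, 2))   # the inner parentheses are the shape tuple
two_by_five = a.reshape((2, 5))
ten_by_one = a.reshape((10, 1))

# 3 * 5 = 15 does not match the 10 original elements, so this fails.
try:
    a.reshape((3, 5))
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("cannot reshape:", err)

print(five_by_two.shape, two_by_five.shape, ten_by_one.shape)
```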

So, if I do the max of the entire array, what's the max value of the entire array by the way? Just the entire thing. Yes, six, right? So, if I just do np.max of X, it'll return one value, it'll return six. Well, let's say I want the max of every row, right?

Like in every, in each of these rows, I say I want, let's say the max of each row. I want two and then four and then six. How do you do that? And so, NumPy always has like usually in most of their functions an access variable. And what the access variable will do is it'll tell you which of these dimensions do you want to take the max over.

And the way to sort of think about it is, this is going to be a little bit tricky, um, but the way people describe it is, the access is what you want to apply your function over, what you want to reduce over. And what that means is I print out the shape of the original array, it's three by two.

I want to apply access one, where as I remember, you know, NumPy is zero index, it'll be zero one. So, I want to apply the max over the second dimension. The second dimension means that for each of these essentially, you know that like for, like the row dimension is the first dimension.

So, it's not along, along the rows, I'm going to be comparing columns. And so, compare this entire column to this entire column. And so, just remember for axes, um, usually the axis zero refers to the row axis, and then the axis one refers to the column axis. Um, if you don't even want to remember that, you can just remember that from the original dimension, which of these it's referring to.

Um, and that's the dimension you want to compare over or reduce over. So, it can be a little bit harder to grasp around. It- it- usually the best way to sort of get around is like just play with a bunch of sort of operations of min-max, um, and things like that.

But just remember like the axis is what you want to compare over, not the resulting thing. So, axis one means here column, I want to compare between the columns. I want to get, for example, comparing one to two, three to four, five to six. Does that make sense? Okay.

And what this will do is, if I just do np.max with axis one, since I'm comparing these columns, it'll just return a resultant column. And so, as I mentioned, over axis one you get three values, because you're comparing over these columns and each column has three values.

If I'm comparing over rows, as you mentioned, I get two values, right? And so the shape will just be a tuple with a trailing comma, like three comma, which is just indicating that it's just a list. It's not a list of lists, it's just a list. But let's say I want a list of lists, you know, maybe I want to do those operations I talked about earlier.

Instead of reshaping, which is always there, it's always an option, you can also use this feature called keepdims. And what that'll do is it'll take the original number of dimensions, which is two, right? Because you have three comma two, there's two of them, and it'll keep that consistent.

So it'll be three comma one. But it just means that instead of returning just the extracted column, which is just a list, it'll basically keep the column in the context of the original sort of x, and it'll be- it'll keep it as like a two-dimensional value. All right. Now, these are just some operations.
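Going back to keepdims for a moment, here is a minimal sketch of the difference, again assuming the same three by two array of the values one through six:

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4],
              [5, 6]])

flat = np.max(x, axis=1)                  # shape (3,): just a list
kept = np.max(x, axis=1, keepdims=True)   # shape (3, 1): a list of lists
reshaped = flat.reshape(3, 1)             # the manual alternative

print(flat.shape, kept.shape)             # (3,) (3, 1)
print(np.array_equal(kept, reshaped))     # True
```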

So in NumPy, you can use an asterisk as an element-wise multiplication. So an asterisk means that I'm going to be multiplying every single value with the corresponding value in another matrix. And you need your matrices to also be the same size for this one.

So this one is basically an element-wise multiplication. It's not a matrix multiplication, so you need to have them be the exact same size. So this will multiply, for example, one into three, two into three, three into three, and four into three. All right. You can also do matrix multiplication, which is a different operation entirely.
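Before moving on, the element-wise case can be sketched like this. The two matrices are an assumption based on the spoken example (one through four against all threes):

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])
y = np.array([[3, 3],
              [3, 3]])

elementwise = x * y     # 1*3, 2*3, 3*3, 4*3
print(elementwise)      # [[ 3  6]
                        #  [ 9 12]]
```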

For those of you unfamiliar with matrix multiplication, you would basically be multiplying a row of one matrix with a column of another matrix. And for that to be possible, you need to have the second dimension of the first array be equal to the first dimension of the second array.

So for matrix multiplication, if I have an a by b and a c by d shaped matrix, b and c have to be equal. Just something to keep in mind, because oftentimes if you're doing matrix multiplication, you have to make sure that these dimensions are the same.

Which means that, for example, this is a valid operation, but this one can throw an error. So it's just important to make sure that these are exactly equal. You can actually just print out the shapes and make sure that these are equal before doing matrix multiplication.

And then for matrix multiplication, there's a couple of functions you can use. The first one is just np.matmul, which is short for matrix multiplication. You can also just use the at operator, the @ symbol. You can choose whichever one. They'll result in the same exact operation.

And just a quick demo to show what this will do: it'll take a row of the first matrix, like one, two, against a column of the second. So it'll do one into three, two into three, and add those two values. That's what matrix multiplication will do. Okay.
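Here is a rough sketch of matrix multiplication and the shape check, assuming the same illustrative matrices (one through four against all threes). The array named `bad` is made up just to trigger a mismatch:

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])
y = np.array([[3, 3],
              [3, 3]])

# Inner dimensions must agree: (2, 2) @ (2, 2) is fine.
print(x.shape, y.shape)

prod_matmul = np.matmul(x, y)
prod_at = x @ y                 # same operation, operator form
print(prod_matmul)              # [[ 9  9]
                                #  [21 21]]

# (2, 3) @ (2, 3): inner dimensions are 3 and 2, so NumPy raises an error.
bad = np.ones((2, 3))
mismatch_raised = False
try:
    bad @ bad
except ValueError as err:
    mismatch_raised = True
    print("shape mismatch:", err)
```

The top-left entry is one into three plus two into three, which is nine.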

And then dot products. What a dot product does is it takes two vectors. So usually it operates on vectors. And a vector, as I mentioned, is just like a one-dimensional matrix. So it's just basically three cross one, for example, or four cross one. It'll element-wise multiply between two different vectors and will sum up those values.

And so here, what a dot product would do would be one into one, plus two into 10, plus three into 100. And in NumPy, you can just do np.dot with both of those vectors. This one is just an aside on how you would want the structure of the dot product to be.

For arrays that are more, okay, so how to phrase this the best way. For single-dimensional vectors, this operation works directly. Anytime it's a multi-dimensional matrix, the np.dot function treats it as a matrix multiplication. So for a two by two matrix versus a two by two matrix dot product, it's not going to return the sum, it's going to return the matrix multiplication.
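A small sketch of both behaviours of np.dot. The vectors come from the spoken example; the two by two matrices are an assumption for illustration:

```python
import numpy as np

# 1-D vectors: multiply element-wise and sum.
x = np.array([1, 2, 3])
y = np.array([1, 10, 100])
print(np.dot(x, y))        # 1*1 + 2*10 + 3*100 = 321

# 2-D arrays: np.dot behaves like matrix multiplication, not a single sum.
a = np.array([[1, 2],
              [3, 4]])
b = np.array([[3, 3],
              [3, 3]])
dot_2d = np.dot(a, b)
print(dot_2d.shape)                     # (2, 2)
print(np.array_equal(dot_2d, a @ b))    # True
```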

Now that's just something to keep in mind. If you want to make sure that your dot product is happening in the correct way, you would want to make sure, similar to what I was talking about earlier, that the dimensions line up. Here, I think this is the way to show it.

Okay. So you would want, like I mentioned, the last dimension of the first one to match with the first dimension of the next one, because it's treating it as a matrix multiplication. Here, the error that it's throwing is for a three comma two combined with a three comma.

And so the way to fix that would be to, for example, switch the two so you'd have two comma three and then three comma. It's really a dimension matching thing at this point. So it can be a little bit confusing, but the main thing to keep in mind is, for single-dimensional vectors, you can just do np.dot directly and it'll give you the dot product value.

For higher dimensional matrices, it treats it as a matrix multiplication. And so if you still want, for those higher dimensional values, to ensure that you're getting a dot product, you'd have to make sure that the dimensions are aligned similar to these.
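The shape mismatch described here can be reproduced with a sketch like the following. The actual arrays used in the session are not in the transcript, so `w` and `v` are stand-ins with the same shapes, (3, 2) and (3,):

```python
import numpy as np

w = np.array([[1, 2],
              [3, 4],
              [5, 6]])          # shape (3, 2)
v = np.array([1, 2, 3])         # shape (3,)

# Last dimension of w is 2, first dimension of v is 3: not aligned.
dot_failed = False
try:
    np.dot(w, v)
except ValueError as err:
    dot_failed = True
    print("not aligned:", err)

# Switching w to (2, 3) makes the dimensions match: (2, 3) with (3,).
fixed = np.dot(w.T, v)
print(fixed, fixed.shape)       # [22 28] (2,)
```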

So anything that's two by two or more for both, any matrix that doesn't have a single dimension, yes, it would treat it as a matmul, the same thing. Okay. All right. I'm going to move to indexing.

So similar to what I was talking about earlier, remember with lists, I was saying if you just do the colon, it'll create the same list again. Same deal here. The colon just means that you take everything from the original array. One thing to keep in mind: for a list, that gives you a separate copy in memory, but for a NumPy array a plain slice gives you a view of the same data, so if you want a completely separate copy, call copy on it.

Okay. Now, I'm going into more detail on how to index quickly. So if I, for example, have, let's say, this three by four matrix, and I only want to select the zeroth and the second rows, how would I do that? So what's useful is that in NumPy you can treat different dimensions differently for indexing.

So a colon means you select everything in that dimension. For example, here there's a colon in the second dimension, which means I'm taking all of the column values. Versus what's in the first dimension here, it's a NumPy array of zero and two. So it's saying only the zero index and only the two index, which means only the zeroth row and only the second row.

So what this would look like would be something like, I have a matrix. Okay. I have a matrix and I only want to select the zeroth row and the second row, zero and second, and everything in the columns. All right. And then similarly, for example, if I want to select in the column dimension, I want to select the first and second columns, and only the first row, I can do that.
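A sketch of indexing each dimension separately. The contents of the three by four matrix are not given in the transcript, so the values 0 through 11 are an assumption chosen to make the selections easy to read:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Rows 0 and 2, every column.
rows_0_and_2 = x[np.array([0, 2]), :]
print(rows_0_and_2)     # [[ 0  1  2  3]
                        #  [ 8  9 10 11]]

# Row 1 only, columns 1 and 2.
row_1_cols_1_2 = x[1, 1:3]
print(row_1_cols_1_2)   # [5 6]
```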

So you can basically treat them separately. You can think how many columns do I want, how many rows do I want, and then index those separately. And that goes for as many dimensions as you want in your entire tensor. Um, so nice things also, if I want to for example take- I have this like- let me print out actually x here.

I'll just generate the x. Okay. So this is x, right? So if I want to take all the values of x that are above 0.5 for example, I can do that by using what's called Boolean indexing. So I just basically will say x indexed by everything in x that's bigger than 0.5.

So it's pretty direct and it'll just output all the values in this entire array that are bigger than 0.5. All right. Um, this one is also another way to do reshaping. So I kind of mentioned earlier, you know, sometimes you want- have this like list of three elements and you want to reshape it to a three by one array for example.
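Boolean indexing can be sketched as follows. The random array and the seed are assumptions for illustration, so the exact values will differ from the session:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only to make the run repeatable
x = rng.random((3, 4))           # values in [0, 1)

mask = x > 0.5                   # boolean array, same shape as x
selected = x[mask]               # 1-D array of the values above 0.5

print(mask.shape)                # (3, 4)
print(selected)
```

Note that the result is flattened: it is a one-dimensional array of the matching values, not a three by four array with holes in it.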

You can also use what's called np.newaxis. This will essentially add another axis in whatever dimension you want. So if I want to go from this three by four array to a three by four by one, then I can just add an np.newaxis there.

An even simpler way to think about it would be going from a two comma to a two comma one. And so it's another way to do what essentially would be the reshaping operation. Does that make sense? Also, what this would look like, for example, let me make it a little bit more concrete.

So basically I have this list, right? I have a single list, and then in that list I get a list of lists. So I have a list with element one and a list with element two. So this is what that reshape operation will do. And what np.newaxis will enable you to do as well.

All right. I think we're good for time. So the last main topic we'll be covering is broadcasting. And what's really great about broadcasting is it'll allow you to operate with NumPy arrays that are of different shapes, as long as the operation can be repeated across them, and it does that in a very efficient manner.
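A minimal sketch of np.newaxis next to the equivalent reshape, assuming a two-element vector and a three by four array of zeros as stand-ins:

```python
import numpy as np

v = np.array([1, 2])             # shape (2,)
col = v[:, np.newaxis]           # shape (2, 1)
print(col)                       # [[1]
                                 #  [2]]
print(np.array_equal(col, v.reshape(2, 1)))   # True

x = np.zeros((3, 4))
x3 = x[:, :, np.newaxis]         # shape (3, 4, 1)
print(x3.shape)
```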

And this is actually one of the most I would say useful things about numpy and one of its defining features. And what that means is if for example in this case, right? If we go back to this example that I had with- I start off with the 0, 0, 0 array.

How do I generate this array versus how do I generate this array, right? Instead of me saying, okay, element 0, 0 plus 1, element 0, 1 plus 2, all that stuff, right? Instead of doing that one by one, what broadcasting allows me to do is I can have only one vector of size 1, 2, 3.

And it'll- depending on how I do the broadcasting which I'll come to in a second, I can duplicate it along the row dimension, or I can duplicate it along the column dimension. And numpy allows for that. It'll do that on its own in the back end. And so that's really what broadcasting means is I don't need to for example, create a new array saying I wanted like create a new array to begin with, which is already like this and then add those two together.

I can just duplicate this and get this. All right. So now some rules for broadcasting. And I mean just we visually also just show what broadcasting will do. Oh, sorry. So broadcasting, this is a pretty good visual analogy. I have this 1 by 1, 1, 2, 3 vector, right?

And I want to basically add, let's say only the columns with this 1, 2, 3 vector. So what broadcasting allows you to do is you only pass these two values in, and on the back end it'll duplicate this along the column dimension. So let's say I have 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, and then it'll do the addition.

Similarly, if I pass it a vector 1, 2, 3, 4, and I want it to be added to each of the rows instead of each of the columns, it'll be able to do that by sort of duplicating it on the back end. So this is visually what's happening with broadcasting.
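The two pictures can be sketched like this. The four by three array of zeros is an assumption based on the description (1, 2, 3 repeated four times, and a 1, 2, 3, 4 vector going the other way):

```python
import numpy as np

base = np.zeros((4, 3), dtype=int)

row = np.array([1, 2, 3])                # shape (3,)
col = np.array([[1], [2], [3], [4]])     # shape (4, 1)

# row is repeated for each of the 4 rows.
print(base + row)    # [[1 2 3]
                     #  [1 2 3]
                     #  [1 2 3]
                     #  [1 2 3]]

# col is repeated across each of the 3 columns.
print(base + col)    # [[1 1 1]
                     #  [2 2 2]
                     #  [3 3 3]
                     #  [4 4 4]]
```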

All right. Now some rules. So how does NumPy know when and how to do broadcasting? So the two main rules to keep in mind for broadcasting are, one, it can only happen if all of the dimensions, every single dimension between the two arrays, are compatible. And what is compatible? Either the dimension values are equal or one of them is equal to one.

And that is the only rule required. So for example, I start off with this X array, right? I have this 3 by 4 X array. Will Y, which has shape 3, 1, be compatible? Yes, it will be. Why? Because you have three in the first dimension of both, which is the same, and in the second dimension you have four and you have one.

So those are compatible values. And so what this tells NumPy on the back end is, I'm doing, for example, an addition operation X plus Y. It knows that, okay, three and three are the same, but four and one are not the same. One of them is one in that dimension.

So I need to duplicate this Y along the second dimension, which means I need to duplicate it along the column dimension. And once it duplicates it, it'll get a 3 by 4 array, and then it can do the addition. And it does that really fast. So it's better to use broadcasting in this way than for you to create a separate array already duplicated and then add them.

Similarly, I have this Z array which is 1, 4. What X into Z will do is, first, it'll check, okay, three and one. Is that compatible? Yes, because you have three in that dimension for one array and one for the other, and four and four are compatible. So it knows that these two are compatible in the second dimension, I don't need to change anything.

In the first dimension, it'll know to duplicate it, basically. So you don't have to duplicate Z yourself, repeat it three times in the row dimension, create a separate array, and then multiply those two. So this is giving you an example of saying I started off with X, I have Y, and the final shape will be 3, 4.
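To check the compatibility rule by hand, here is a sketch with the shapes from the walkthrough: (3, 4), (3, 1), and (1, 4). The values are made up; np.repeat and np.tile stand in for the manual duplication that broadcasting saves you from:

```python
import numpy as np

x = np.ones((3, 4))
y = np.arange(3).reshape(3, 1)   # shape (3, 1)
z = np.arange(4).reshape(1, 4)   # shape (1, 4)

sum_xy = x + y                   # y is stretched along the column axis
prod_xz = x * z                  # z is stretched along the row axis
print(sum_xy.shape, prod_xz.shape)   # (3, 4) (3, 4)

# The explicit versions give the same result, with extra arrays in memory.
y_dup = np.repeat(y, 4, axis=1)  # (3, 4)
z_dup = np.tile(z, (3, 1))       # (3, 4)
print(np.array_equal(sum_xy, x + y_dup))    # True
print(np.array_equal(prod_xz, x * z_dup))   # True
```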

So a lot of times in deep learning, you will have the same- basically, you'll have different batches of different images coming in. But you want to apply, let's say, the same weight matrix to all of them. And instead of duplicating that weight matrix a hundred or even like potentially depending on the size of your batch size like a thousand times, and then adding those together, you use the same matrix and it'll know, okay, if I'm going to be duplicating over the batch dimension, it'll do that for you on the back end.

So it's used a lot of times in deep learning because of this. And basically, in your second homework, that's basically what you'll be doing. We're implementing a feed-forward network in NumPy. And it'll say you have this W matrix, you have this B matrix, which is a bias, we'll come to those in class.

And it'll ask you to implement it in NumPy, because that's basically what you're doing. It's like you have this input image, you have a weight matrix which will somehow scale it to an output. And that weight matrix will be applied to multiple images in your batch. And those images can be different, but their sizes will be the same and it's optimized for that.
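As a rough illustration only, and not the homework's actual specification, a single layer applied to a whole batch might look like the sketch below. The sizes (5 inputs, 4 features, 3 outputs) and the names `batch`, `W`, and `b` are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded only to make the run repeatable

batch = rng.random((5, 4))          # hypothetical: 5 inputs, 4 features each
W = rng.random((4, 3))              # hypothetical weight matrix
b = np.array([0.1, 0.2, 0.3])       # hypothetical bias, shape (3,)

# One matrix multiplication covers the whole batch; the bias of shape (3,)
# is broadcast across all 5 rows of the (5, 3) result.
out = batch @ W + b
print(out.shape)                    # (5, 3)

# Same as handling one input at a time.
print(np.allclose(out[2], batch[2] @ W + b))   # True
```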

Okay. So this is just more examples of sort of the same thing. Your final thing that you'll be coming to is the size of 3,4. Let's see. This one's sort of the example that I showed right here, right? Which is that I have this array of like say zeros.

I have this NumPy, this B array of size, what size were they? What would this be? Yes. Good. Because you have one outer list, and inside this you have one inner list. So it's just basically one row and then three values inside. So yes. And so would this be compatible?

Yes. And so it'll know basically to duplicate over the row dimension. And so you're going to get duplicates in the row dimensions. You're going to get 1, 2, 3, 1, 2, 3, 1, 2, 3. And that's what's happening here. So these are for example a little bit sometimes when it says more complex behavior.

What this basically just means is that like if I have this B vector, which is 3,1. If I'm doing this B plus B dot transpose, by the way transpose is just changing the dimensions and switching them. So if I have a two by three matrix, transpose will be a three by two matrix.

What that means visually is something like your row and column dimensions will get switched. X goes to, I believe it's like 1, 2, 3, 4, 5, 6. So three rows versus three columns. And what this is just saying is that a three by one and a one by three, both of those vectors will be compatible, because remember, in each dimension it's either the same or one.

And so it knows to duplicate over both of those dimensions. And that's what's happening here. Okay. So I think we are right at time. And what I would recommend is basically playing with variations of this for broadcasting. And see, just remember the two rules for broadcasting is just, if it's compatible it's either the same value or it's one.
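The three by one plus one by three case can be sketched as follows, assuming the vector holds 1, 2, 3:

```python
import numpy as np

b = np.array([[1],
              [2],
              [3]])              # shape (3, 1)

print(b.T.shape)                 # (1, 3)

# (3, 1) + (1, 3): both are stretched, giving a (3, 3) result.
total = b + b.T
print(total)                     # [[2 3 4]
                                 #  [3 4 5]
                                 #  [4 5 6]]
```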

And whatever is the one dimension is what's going to be duplicated over on the back end. So yeah, it's not going to be compatible just because they're divisible, for example, right? So if you have, let's say, six and three, that's not compatible. You can reshape it and then see if you'd like to have a one.

There's tricks you can use where you're thinking, on the back end, how do I want this data to be multiplied? You can maybe reshape everything into a one by 18 matrix and then multiply everything and then reshape it back. That's what you can do, but you can never just directly, for example, make six and three compatible.
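A sketch of the six versus three case, with made-up values. The reshape shown is just one possible way to line the data up, not necessarily the one meant in the session:

```python
import numpy as np

a = np.arange(6)                 # shape (6,)
b = np.arange(3)                 # shape (3,)

# 6 and 3 are neither equal nor 1, so this does not broadcast.
broadcast_failed = False
try:
    a + b
except ValueError as err:
    broadcast_failed = True
    print("not compatible:", err)

# Reshaping a to (2, 3) makes the last dimensions match (3 and 3).
fixed = a.reshape(2, 3) + b
print(fixed)                     # [[0 2 4]
                                 #  [3 5 7]]
```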

Okay. So I think let's wrap up. This one's just a quick example of another use of efficient NumPy code. Quick note: preferably don't use loops whenever you're dealing with large data matrices. Mostly because loops are often on the order of 100 times slower. NumPy is usually very, very efficient.

As this is just an example of what you can accomplish with NumPy and same thing using loops. So what this is saying is I have an x matrix of size 1000 by 1000. And I want to apply, you know, let's say I want to add everything from row 100 onwards with plus five.

So visually what that will look like is something like, I have this full matrix and I want everything here basically to be added with plus five. Then in the loop format, I can basically loop over the first dimension from 100 onwards and do that. Or in NumPy, I can basically use what's called np.arange, which will generate integers, like 1, 2, 3, 4, 5, 6, all the way up to the value you give it.

In this case, it's between 100 and 1000. So start with 100, 101, 102, all the way up to, but not including, 1000 in the first dimension, and then just add five to that. So this is just an example of how you would switch from using loops to using NumPy. And it's a lot, lot faster.
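The switch from loops to NumPy can be sketched like this. The random matrix is an assumption; no timings are claimed here, so measure on your own machine if you want to compare:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only to make the run repeatable
x = rng.random((1000, 1000))

# Loop version: visit every element from row 100 onwards.
x_loop = x.copy()
for i in range(100, 1000):
    for j in range(x_loop.shape[1]):
        x_loop[i, j] += 5

# NumPy version: index rows 100..999 in one go and add five.
x_vec = x.copy()
x_vec[np.arange(100, 1000), :] += 5

print(np.allclose(x_loop, x_vec))   # True
```

A plain slice, `x_vec[100:, :] += 5`, selects the same rows and is the more common way to write it.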
