
How to use the Kaggle API in Python


Chapters

0:00
0:10 Pip Install Kaggle
2:20 Import the KaggleApi Class
3:01 Downloading the Competition Data Sets
5:27 Download the Standalone Data Sets

Transcript

Hi and welcome to this video, where we are going to go through setting up and using the Kaggle API. The first thing we want to do is pip install kaggle. I already have it installed, so I'm not going to install it again, but once you do have it installed you can try to import the kaggle module, and you will get this error here.

So this OSError simply tells you that it could not find kaggle.json and that you need to add it to this location here. The reason it's telling you this is that we use kaggle.json to authenticate our API access. Obviously Kaggle is not going to let just anyone access their API; you need to have an account before you start downloading their data.

So to get our kaggle.json credentials we simply go over to kaggle.com. If you don't have an account, you'll have to go ahead and create one. Once you've created your account, you go over to this little icon in the top right, click Account, and scroll down until you see this API section.

Now all you need to do is create a new API token. This creates the kaggle.json credentials and allows me to save them to my computer. I'm just going to save them in my Documents for now, then head back to the notebook, where we can see the location we need to save it to.

So I'm going to copy and paste that across, and here we have the directory that we need to put our kaggle.json in. I'm going to take my kaggle.json and simply move it into here. To check that it's worked, we rerun this cell, and there we can see that our Kaggle API is now functional.
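The move described here can also be scripted. The following is a minimal sketch, assuming the client looks in a `.kaggle` folder inside the user's home directory (the walkthrough does not spell the path out, so check the path shown in your own error message). `install_kaggle_token` is a hypothetical helper name, not part of the kaggle package.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_token(src, dest_dir=None):
    """Copy a downloaded kaggle.json into the Kaggle config directory."""
    src = Path(src)
    # Assumption: the client reads ~/.kaggle/kaggle.json unless told otherwise.
    dest_dir = Path(dest_dir) if dest_dir is not None else Path.home() / ".kaggle"
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / "kaggle.json"
    shutil.copyfile(src, dest)
    # Restrict the file to the current user, since it holds an API key.
    os.chmod(dest, 0o600)
    return dest


# Example (hypothetical source path):
# install_kaggle_token(Path.home() / "Documents" / "kaggle.json")
```

Passing `dest_dir` explicitly is useful if the error message on your machine points somewhere other than the assumed default.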

Now we don't actually need this import kaggle; instead we need to import the KaggleApi class from the kaggle.api.kaggle_api_extended module. Once we've imported that, we initialize our API and then authenticate it. Now we're ready to start downloading datasets, and the Kaggle API gives us several options for doing this.
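As a rough sketch of the import, initialize, and authenticate steps just described: the wrapper function name `get_authenticated_api` is made up for illustration, and the import is kept inside the function so that nothing touches the credentials until it is called.

```python
def get_authenticated_api():
    """Create a KaggleApi client and authenticate it with kaggle.json."""
    # Imported lazily so that defining this function never needs credentials.
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()  # reads the kaggle.json credentials
    return api


# Usage (needs the kaggle package installed and kaggle.json in place):
# api = get_authenticated_api()
```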

The two that you're most likely to use are for downloading competition datasets or standalone datasets. A competition dataset is related to a current or past competition. For example, there is a Sentiment Analysis on Movie Reviews competition. We can find it over here, and you can see in the URL that kaggle.com is followed by this c, which essentially means that this is a competition. We can also see "Playground Prediction Competition", so everything is telling us that this is a competition, and this competition comes with some data.

Now this is different to a standalone dataset, and these standalone datasets can be uploaded by anyone. So if we go to the Sentiment140 dataset here and look in the URL, we can see that this dataset has been uploaded by kazanova, and there's a slightly different structure to the dataset page as well.

We can see here it's a dataset, the first tab takes us to Data, and we can scroll down and see the data that we can get here. There are two different methods, one for downloading each of these: we can't download competition datasets with the standalone dataset method, and we can't download standalone datasets with the competition dataset method.
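To make the URL distinction concrete, here is a small sketch of a helper that tells the two kinds of page apart. `classify_kaggle_url` is a hypothetical name, and the handling of the longer `/competitions/` and `/datasets/` prefixes is an assumption about other URL forms you may encounter; the walkthrough itself only shows the `/c/` and `username/dataset` forms.

```python
from urllib.parse import urlparse


def classify_kaggle_url(url):
    """Return ("competition", slug) or ("dataset", "owner/name") for a Kaggle page URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Competition pages: kaggle.com/c/<slug> (or, by assumption, /competitions/<slug>).
    if len(parts) >= 2 and parts[0] in ("c", "competitions"):
        return "competition", parts[1]
    # Dataset pages: kaggle.com/<owner>/<name>, optionally prefixed with /datasets/.
    if parts and parts[0] == "datasets":
        parts = parts[1:]
    if len(parts) >= 2:
        return "dataset", f"{parts[0]}/{parts[1]}"
    raise ValueError(f"not a competition or dataset URL: {url}")
```

The kind returned tells you which of the two download methods applies, and the second value is the identifier that method expects.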

So we'll start with the competition dataset. To download one of these, all we need to do is use the competition_download_file method, passing the competition name followed by the file name. If we head back over here, we can see the competition name is this, and the data that we would like is train.tsv.zip. That is downloaded into our current directory, as you can see here.
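A minimal sketch of that call, written as a function that takes an already authenticated client. The wrapper name `download_competition_file` is made up for illustration, and the competition slug in the usage comment is assumed to be the part of the competition URL after `/c/`.

```python
def download_competition_file(api, competition, file_name, path="."):
    """Download a single file from a competition into `path`."""
    # competition: the slug from the competition URL
    # file_name:   one of the files listed on the competition's Data tab
    api.competition_download_file(competition, file_name, path=path)


# Usage, assuming `api` is an authenticated KaggleApi instance:
# download_competition_file(api, "sentiment-analysis-on-movie-reviews", "train.tsv.zip")
```

Kaggle may refuse the download until you have accepted the competition's rules on the website with the same account.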

Okay, so that's how we download the competition datasets. We can also download the standalone datasets. To do so we use the dataset_download_file method, and here we need to pass the username followed by the dataset name. If we head over here, you can find both in the URL, so this one is kazanova/sentiment140.
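The standalone case can be sketched the same way. `download_dataset_file` is a hypothetical wrapper name; the call also takes a file name, which is left as a placeholder here because it has to be copied from the dataset page.

```python
def download_dataset_file(api, owner, dataset_name, file_name, path="."):
    """Download a single file from a standalone dataset into `path`."""
    # The API identifies a dataset as "<owner>/<dataset name>", as seen in its URL.
    dataset = f"{owner}/{dataset_name}"
    api.dataset_download_file(dataset, file_name, path=path)
    return dataset


# Usage, assuming `api` is an authenticated KaggleApi instance:
# download_dataset_file(api, "kazanova", "sentiment140", "<file name from the Data tab>")
```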

We also need to specify the file name, which in this case is this text here. Then we just execute that, and now we can see that we have downloaded both files here. You will notice that both of these files are zipped, so we can quickly unzip them using Python. All we need to do is import zipfile, and with zipfile.ZipFile we specify the path to the data, which in this case is just the file name, and we specify that we are simply reading it.
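The unzip step, including the extraction call, might look like the following sketch. The function name `unzip` and its `dest` parameter are illustrative choices; only the `zipfile` usage comes from the walkthrough.

```python
import zipfile
from pathlib import Path


def unzip(archive, dest="."):
    """Extract every member of a zip archive into `dest` and return their names."""
    # "r" opens the archive for reading only.
    with zipfile.ZipFile(archive, "r") as zf:
        zf.extractall(dest)
        return zf.namelist()


# Usage, for the competition file downloaded into the current directory:
# unzip("train.tsv.zip")
```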

And then we simply call the extractall method, and we have our dataset here, and we see everything is in the right format. So that's everything for this tutorial on using the Kaggle API. If you have any questions, just let me know in the comments below, but otherwise thank you for watching and I will see you again next time.