Now that we’ve learned the basic concepts of data mining, we’re ready to do some REAL data mining. For that, we need real data. Where can we get data to practice on? Kaggle is the right place to go. A large number of datasets, and even code that uses them, are shared on Kaggle. There’s also a community where you can post questions and answers, and competitions in which participants try to find a better solution.
How can you get more out of Kaggle? If you install the Kaggle API, you can download data and take part in competitions easily. In this article, we will go step by step through installing the Kaggle API with Anaconda and using it on Google Colab.
■ Sign up for Kaggle and get an API token
Go to the Kaggle site and sign up. Signing up with a Google ID is the easiest way. When you log in, there is a duck image in the upper right corner. Clicking the image takes you to Your Profile.
Click the Account section, then scroll down a bit to find the API section. Click Create New API Token to download the kaggle.json file.
If you open kaggle.json in Notepad, you will see a username and a key. The username is literally your Kaggle username, and the key works like a password, so it should not be disclosed anywhere. (If it is leaked, or you want to change the key for some other reason, you can click the Expire API Token button next to Create New API Token and then create a new token.)
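If you would rather check the token from Python than from Notepad, here is a minimal sketch. It assumes only what is described above, namely that kaggle.json holds a "username" and a "key" field. The function names read_kaggle_token and mask are made up for this example and are not part of the Kaggle API.

```python
import json
from pathlib import Path


def read_kaggle_token(path):
    """Return (username, key) from a kaggle.json file, raising if either is missing."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [field for field in ("username", "key") if not data.get(field)]
    if missing:
        raise ValueError(f"kaggle.json is missing: {', '.join(missing)}")
    return data["username"], data["key"]


def mask(secret, visible=4):
    """Hide all but the first few characters so the key is never shown in full."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)
```

For example, username, key = read_kaggle_token("kaggle.json") followed by print(username, mask(key)) confirms that the file is readable without showing the whole key on screen.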
■ Kaggle installation
Now open a terminal, such as Windows PowerShell or Anaconda Prompt, to install Kaggle. Type the following:
conda install -c conda-forge kaggle
When you run the command, a summary of what will be installed appears. Enter y to proceed. If the installation is successful, you will see ‘done’ on the screen.
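If you want to double-check the installation from Python, the following sketch looks for the kaggle command on your PATH using only the standard library. The helper names find_cli and describe are hypothetical, and the hint about conda environments is an assumption about a common cause, not a diagnosis.

```python
import shutil


def find_cli(name="kaggle"):
    """Return the full path of a command found on PATH, or None if it is absent."""
    return shutil.which(name)


def describe(name="kaggle"):
    """Build a one-line status message for the given command."""
    path = find_cli(name)
    if path is None:
        return f"{name} was not found on PATH; check that the right conda environment is active."
    return f"{name} is available at {path}"


if __name__ == "__main__":
    print(describe())
```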
Now type kaggle in the same terminal and press Enter.
Then go to your user folder, which is where the terminal usually starts (in other words, This PC > Local Disk (C:) > Users > your PC’s username), and you’ll see a new folder named .kaggle. Move kaggle.json into this folder and you are really done!
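The move can also be scripted. The sketch below copies a downloaded kaggle.json into the .kaggle folder under your home directory, creating the folder if needed. The function name install_kaggle_token and the example path are made up for illustration. Restricting the file permissions is a precaution that matters mostly on Linux and macOS; on Windows the chmod call has little effect.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_token(downloaded_file, home=None):
    """Copy kaggle.json into <home>/.kaggle and return the new path."""
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)  # create .kaggle if it is missing
    target = target_dir / "kaggle.json"
    shutil.copyfile(downloaded_file, target)
    os.chmod(target, 0o600)  # owner read/write only, since the key is a secret
    return target


if __name__ == "__main__":
    # Hypothetical location: adjust to wherever your browser saved the file.
    downloaded = Path.home() / "Downloads" / "kaggle.json"
    if downloaded.exists():
        print("Token installed at", install_kaggle_token(downloaded))
```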
■ Kaggle API on Google Colab
The method above works on your local PC. If you mainly use Google Colab, you can use the Kaggle API as follows:
import os
os.environ['KAGGLE_USERNAME'] = 'put username value in kaggle.json'
os.environ['KAGGLE_KEY'] = 'put key value in kaggle.json'
# Make sure it works well
!kaggle -h
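Pasting the key into a notebook cell makes it easy to leak if you share the notebook. As one possible alternative, the sketch below reads the two values from a kaggle.json file and sets the same two environment variables. It assumes you have already uploaded the file to the Colab session. The function name set_kaggle_env and the path "kaggle.json" are hypothetical.

```python
import json
import os


def set_kaggle_env(token_path, environ=os.environ):
    """Set KAGGLE_USERNAME and KAGGLE_KEY from a kaggle.json file."""
    with open(token_path, encoding="utf-8") as f:
        token = json.load(f)
    environ["KAGGLE_USERNAME"] = token["username"]
    environ["KAGGLE_KEY"] = token["key"]
    return token["username"]  # return only the non-secret part


if __name__ == "__main__":
    # Assumed location of the uploaded token file.
    if os.path.exists("kaggle.json"):
        print("Credentials set for", set_kaggle_env("kaggle.json"))
```

After this cell runs, !kaggle -h can be used in the same way to confirm that the command works.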
■ Download data using the Kaggle API
Now let’s see how to use the Kaggle API. First, find the data you want to use. Here, let’s use Amazon stock data. To download it, click the little dots on the right of the dataset page, and you will see Copy API command. Click it, and the command is copied automatically.
Now simply paste the command into Google Colab or a terminal, and you’re finished 🙂 (The leading ! is only needed in Colab; leave it out in a terminal.)
!kaggle datasets download -d varpit94/amazon-stock-data
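The download arrives as a zip archive, so the next step is usually to unpack it. Here is a minimal sketch using Python's standard zipfile module. The archive name amazon-stock-data.zip is an assumption based on the dataset name in the command, so check the actual file name in your working folder. The function name extract_dataset and the data folder are made up for this example.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest_dir="data"):
    """Unpack a downloaded dataset archive and return the sorted file names inside it."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return sorted(archive.namelist())


if __name__ == "__main__":
    # Assumed archive name; adjust if the downloaded file is named differently.
    archive_path = Path("amazon-stock-data.zip")
    if archive_path.exists():
        print(extract_dataset(archive_path))
```

The returned list shows which files were extracted, so you know what to load next.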
If you want to see the other commands, you can find them on the Kaggle API GitHub page.