How to access datasets directly from Kaggle

  • Kaggle is one of the largest data science community platforms that provides access to various datasets, competitions, resources, and powerful tools to practice data science and machine learning.
  • Kaggle allows us to use its datasets by downloading them or by using its API.
  • In this article, we will look at the latter: using the API key that Kaggle.com provides, which can be stored anywhere on your Google Drive.

Prerequisites

To follow along with this article, you need a Kaggle account (to generate the API key) and a Google account (to use Google Colab).

Generating the API Key

To generate the Kaggle API Key, follow the given steps:

  1. Log in to your kaggle.com account.
  2. In the top right corner, you can see your profile. Clicking it shows the options Your Profile, Account Settings, and Logout. Click Account Settings (indicated by the gear icon).
Going to your Kaggle Account
  3. On your account page, scroll down until you see the API section, which contains a Create New API Token button. Click it.
Generating API Key
  4. You will be given a JSON file named kaggle.json. It contains the API key, which is private to your account and must not be shared.
  5. Store this file in a folder named .kaggle in your home directory, because the Kaggle API client looks for it there (~/.kaggle/kaggle.json) by default.
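
Before copying the file anywhere, it can help to confirm that the download is intact. The following is a minimal sketch, assuming kaggle.json holds the two fields "username" and "key" (the username and API key mentioned later in this article). The helper names load_kaggle_token and mask are hypothetical, not part of the Kaggle library.

```python
import json
from pathlib import Path


def load_kaggle_token(path):
    """Read a kaggle.json file and return its contents as a dict.

    Raises ValueError if the assumed "username" or "key" field is missing.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [field for field in ("username", "key") if not data.get(field)]
    if missing:
        raise ValueError(f"kaggle.json is missing: {', '.join(missing)}")
    return data


def mask(secret, visible=4):
    """Hide all but the first few characters so the key is safe to print."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Example usage (the path is an assumption; point it at your own download):
# token = load_kaggle_token("kaggle.json")
# print(token["username"], mask(token["key"]))
```

Printing the masked key rather than the real one keeps the secret out of the notebook output.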

Setting things up

  • In this article, I will show how to access the token through Google Drive.
  • Before running the required scripts, you first need to upload your kaggle.json file to Google Drive.
  • Meanwhile, you can create a new Colab notebook to follow along with this article.
  • After you have uploaded the file, mount your Drive storage in the new Colab notebook using the following code:
from google.colab import drive
drive.mount('/content/drive')
  • You will be prompted to give access to your drive storage by selecting your account and authenticating using a key.
Authorizing your Connection to Google Drive
  • Now that you have mounted your Drive, we can install the necessary libraries on this Colab instance.
  • We only need the official kaggle package, which provides the kaggle command used in the rest of this article. The separate kaggle-cli package is an older, unofficial tool and is not needed for these commands:
!pip install -q kaggle
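  • Now, run the script below. It creates a folder named .kaggle in the home directory of the Colab instance (not on your Drive), copies the kaggle.json file into it, and changes the permissions so that only you can read and write the file. The path assumes kaggle.json sits at the top level of your Drive (MyDrive); adjust it if you stored the file elsewhere. The final cat command prints the file so you can confirm the copy worked:
!mkdir -p ~/.kaggle
!cp "/content/drive/MyDrive/kaggle.json" ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!cat ~/.kaggle/kaggle.json
  • The output should be your Kaggle username and your API key. Since the key is private, clear this cell's output before sharing the notebook. We are now set to download datasets.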
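
The same setup can also be done from Python instead of shell commands. This is a rough sketch of the mkdir, cp, and chmod steps using only the standard library. The function name install_kaggle_token is hypothetical, and the Drive path in the usage comment is the one assumed in this article.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(source, config_dir="~/.kaggle"):
    """Copy kaggle.json into config_dir and restrict it to the owner."""
    target_dir = Path(config_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)   # same as: mkdir -p ~/.kaggle
    target = target_dir / "kaggle.json"
    shutil.copyfile(source, target)                 # same as: cp ... ~/.kaggle/
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)   # same as: chmod 600
    return target


# Example usage in Colab, after mounting Drive:
# install_kaggle_token("/content/drive/MyDrive/kaggle.json")
```

Unlike the shell version, this variant never prints the key, so nothing sensitive ends up in the cell output.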

Accessing a publicly available dataset

  • To download a dataset, copy the identifying part of its URL, i.e. the username of the uploader followed by the name of the dataset (username/dataset_name).
  • The first line below shows the general form of the command; the second is a concrete example:
!kaggle datasets download -d username/dataset_name
!kaggle datasets download -d nicholasjhana/energy-consumption-generation-prices-and-weather
  • You can watch the download progress and then check that the files appear in the file browser on the left side of your Colab interface.
Example of downloading a dataset
  • But the data is in a zip file. You can extract the contents using the following command:
!unzip /content/energy-consumption-generation-prices-and-weather.zip
  • You can now use the pandas library to check the data.
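
The unzip and pandas steps can be combined in Python. The sketch below assumes the archive contains one or more CSV files. It does not assume any particular file names, since the article does not list them, and the function name extract_and_preview is hypothetical.

```python
import zipfile
from pathlib import Path

import pandas as pd


def extract_and_preview(zip_path, out_dir, nrows=5):
    """Unzip an archive and return the first rows of every CSV found in out_dir."""
    out_dir = Path(out_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
    previews = {}
    for csv_path in sorted(out_dir.rglob("*.csv")):
        # Read only a few rows so large files stay cheap to inspect.
        previews[csv_path.name] = pd.read_csv(csv_path, nrows=nrows)
    return previews


# Example usage with the archive downloaded above (the output folder is an assumption):
# previews = extract_and_preview(
#     "/content/energy-consumption-generation-prices-and-weather.zip",
#     "/content/data",
# )
# for name, frame in previews.items():
#     print(name, list(frame.columns))
```

Reading only the first few rows is enough to check column names and types before loading a full file.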

Accessing a Competition dataset

  • The procedure is the same, except that you first need to accept the terms and conditions of the competition on its Kaggle page.
  • To download the data, copy the competition name from the competition's URL.
  • The first line below shows the general form of the command; the second is a concrete example:
!kaggle competitions download -c competition_name
!kaggle competitions download -c tabular-playground-series-feb-2022
  • Again, the download is a zip file, which you can extract using the !unzip command.
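
Copying the right part of the URL by hand is easy to get wrong, so here is a small sketch that builds the download command from a pasted URL. It assumes URLs of the forms kaggle.com/datasets/username/dataset_name, kaggle.com/username/dataset_name, kaggle.com/competitions/name, and kaggle.com/c/name. Other Kaggle pages (notebooks, for example) are not handled. The function name kaggle_download_command is hypothetical.

```python
from urllib.parse import urlparse


def kaggle_download_command(url):
    """Build the kaggle CLI download command for a dataset or competition URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if parts and parts[0] in ("competitions", "c"):
        if len(parts) < 2:
            raise ValueError(f"no competition name in {url!r}")
        return f"kaggle competitions download -c {parts[1]}"
    if parts and parts[0] == "datasets":
        parts = parts[1:]  # drop the "datasets" prefix used by newer URLs
    if len(parts) < 2:
        raise ValueError(f"no username/dataset_name in {url!r}")
    return f"kaggle datasets download -d {parts[0]}/{parts[1]}"


# Example usage:
# print(kaggle_download_command(
#     "https://www.kaggle.com/competitions/tabular-playground-series-feb-2022"
# ))
```

In a Colab cell, the returned string can be run by prefixing it with an exclamation mark, as in the commands above.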

Conclusion

  • And that’s actually it…
  • You can access the notebook that I have created for your reference here.
  • All you need to do is generate your API key and upload it to your Google Drive before running the notebook.

INSAID

One of India’s leading institutions providing world-class Data Science & AI programs for working professionals with a mission to groom Data leaders of tomorrow!