Upskill your org in ML/AI with Kaggle

1. Introduction

47566e1490c16443.png

Last Updated: 2024-05-10

What is Kaggle?

Kaggle is the largest AI & ML community, the ultimate platform for data science and machine learning enthusiasts of all levels to level up with the latest techniques and technologies. Discover a vast repository of datasets, notebooks, and pre-trained models to kickstart your next project. Participate in competitions, learn from courses, and connect with a diverse community of over 18 million users from around the globe. Whether you're a beginner or a seasoned pro, Kaggle is the place to hone your skills, stay ahead of the curve, and collaborate on cutting-edge projects.

What you'll build

In this codelab you'll create, configure, and launch a Kaggle competition. You'll walk through the competitor experience and learn best practices for running an engaging competition.

What you'll learn

  • Understand how to create and manage a Kaggle competition from the host's side
  • Navigate the competitor experience, from exploration to submission
  • Learn best practices for running an engaging competition

This codelab is focused on creating a competition quickly and leverages Kaggle's growing competition library.

What you'll need

  • A recent web browser
  • Basic knowledge of Python

2. Getting set up

Create a Kaggle Account

Visit the Kaggle website (https://www.kaggle.com/) and click "Register" to create a free account.

Verify your account

  1. In the upper right corner of the page click on your profile image
  2. Click "Your Profile"
  3. Click on the "Settings" button on the right side of the profile content
  4. Under "Phone Verification", follow the instructions to verify your account

3. Creating your first competition

Introducing AI generated competition templates

AI Generated Competitions is a new feature on Kaggle that allows users to create machine learning competitions quickly and easily. It leverages AI to generate synthetic datasets that mimic the statistical properties of existing datasets without containing any personally identifiable information.

Here's how it works:

  1. Choose a template: Select from a list of templates based on different machine learning tasks (e.g., classification, regression).
  2. AI generates a dataset: Kaggle's AI creates a new dataset for your competition based on your chosen template. This dataset is similar to the original but uses a subset of features and has slightly different feature distributions.
  3. Customize your competition: Enter basic details like the competition name, description, and timeline. You can also choose the privacy settings for your competition.
  4. Launch: After finalizing the details, you're ready to launch your competition.

This feature streamlines the competition creation process, making it accessible to more users and enabling them to focus on the machine learning aspects rather than dataset preparation.

Create a competition

Navigate to https://www.kaggle.com/competitions/new and select "New AI Generated Competition".

2629bf77a282a46c.png

Select the "Regression with a Crab Age Dataset" Competition.

Competition Details

2dd2228b9d686a6e.png

Fill out a descriptive name and subtitle. For example, you could use "<Your Name>'s Test Crab Competition" as the title and "Creating my first competition to see how it works" as the subtitle. Note that the competition URL is automatically filled in based on the title.

Visibility and Access

We now need to set the visibility and access for the competition.

5c7dcae412ddd574.png

Visibility

  • Public: Your competition is visible to anyone on Kaggle. It'll show up in search results, so anyone interested can join.
  • Private: Your competition is hidden from public view. It won't appear in searches, and only people you specifically invite can participate.

Who Can Join

  • Anyone: This is like an open door policy. Anyone on Kaggle can join your competition.
  • Only people with a link: This is more exclusive. You'll generate a special link, and only people with that link can join.
  • Restricted email list: This is the most controlled option. You provide a list of specific email addresses or domains (like @yourschool.edu), and only people with those addresses can join.
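To illustrate how a restricted list with both full addresses and domains might be matched, here is a minimal sketch. The function name `is_allowed`, the matching rules, and the sample addresses are assumptions made for illustration; the source does not describe how Kaggle implements this check.

```python
def is_allowed(email: str, allowlist: list[str]) -> bool:
    """Return True if the email matches a full address or an '@domain' entry.

    Hypothetical helper: the matching rules here are an assumption,
    not Kaggle's documented behavior.
    """
    email = email.strip().lower()
    for entry in allowlist:
        entry = entry.strip().lower()
        if entry.startswith("@"):
            # Domain entry: match any address that ends with this domain.
            if email.endswith(entry):
                return True
        elif email == entry:
            # Full-address entry: require an exact match.
            return True
    return False


# Illustrative allowlist with one domain and one individual address.
allowlist = ["@yourschool.edu", "guest.judge@example.com"]

print(is_allowed("student@yourschool.edu", allowlist))
print(is_allowed("Guest.Judge@example.com", allowlist))
print(is_allowed("someone@elsewhere.org", allowlist))
```

A domain entry is convenient for a class or organization, while individual addresses suit a handful of outside participants.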

We'll talk more about the Enable Notebooks and Models setting later. For now, make sure it is toggled on. For our example competition, set Visibility to "Private" and Who Can Join to "Only people with a link".

Read and agree to the terms and click "Create Competition".

4. Understanding and configuring your competition

Behind the scenes we've created a completely new competition with a unique dataset. Let's do a quick review of the competition settings.

Host Tab

The Host tab contains everything you need as a host to properly configure your competition. In particular, see the list of pages on the right side of the page:

bcedd6768cc4f32c.png

Basic Details

This section includes:

  • General
  • Privacy, Access & Resources
  • Timeline
  • Scoring & Teams

We covered the General and Privacy sections when creating the competition.

Timeline

The competition end date is timezone aware.
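A timezone-aware deadline is a single instant that shows up as different local clock times depending on where a competitor is. The sketch below uses only Python's standard library; the deadline value and the fixed UTC offsets are illustrative assumptions, not values taken from Kaggle.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical deadline, stored as a timezone-aware UTC datetime.
deadline_utc = datetime(2024, 6, 30, 23, 59, tzinfo=timezone.utc)

# Fixed offsets chosen for illustration only (no daylight-saving logic).
utc_minus_7 = timezone(timedelta(hours=-7))
utc_plus_9 = timezone(timedelta(hours=9))

deadline_west = deadline_utc.astimezone(utc_minus_7)
deadline_east = deadline_utc.astimezone(utc_plus_9)

# The local clock times differ, but all three values are the same instant.
print(deadline_utc.isoformat())
print(deadline_west.isoformat())
print(deadline_east.isoformat())
```

When announcing a deadline to a class or group, stating the timezone explicitly avoids confusion about which calendar day it falls on.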

7141f4aea90bccb0.png

Scoring & Teams

The Scoring & Teams section lets you control how many people can join a team, how many times they can submit each day, and how many of their submissions they choose for final evaluation.

5efb6387612db941.png

Images

The Images section lets you customize the banner and thumbnail for your competition. These appear on the competition home page and in the listing entry for your competition.

6dfd442376a1c702.png

Hosts

Here you can add other Kaggle users as hosts for your competition. Other hosts have full access to your competition, including the ability to launch it.

8f8c90eb6baa7747.png

Evaluation Metric

The Evaluation Metric tab is the heart of the competition. When creating a competition from scratch, this is where you think carefully about which evaluation (or scoring) metric to use, upload your solution file, define the public/private test split, and provide a sample submission. However, since we used a generated competition, we don't need to do any of this!

Scoring Metric

This determines how a submission is scored against the solution file. Each metric has documentation and actual code available.
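To make scoring against a solution file concrete, here is a minimal sketch that assumes mean absolute error (MAE) as the metric and matches rows by id. The function name, the dictionaries, and the values are illustrative assumptions; the code Kaggle actually runs is the version available from the metric's documentation.

```python
def mean_absolute_error_by_id(solution: dict, submission: dict) -> float:
    """Average absolute difference between predictions and true values.

    Both arguments map a row id to a numeric value. This is an
    illustrative stand-in, not Kaggle's metric implementation.
    """
    missing = set(solution) - set(submission)
    if missing:
        raise ValueError(f"submission is missing ids: {sorted(missing)}")
    total = sum(abs(solution[i] - submission[i]) for i in solution)
    return total / len(solution)


# Made-up solution rows (id -> true age) and predictions (id -> predicted age).
solution = {1: 9.0, 2: 11.0, 3: 7.0}
submission = {1: 10.0, 2: 11.0, 3: 5.0}

score = mean_absolute_error_by_id(solution, submission)
print(score)  # (1 + 0 + 2) / 3 = 1.0
```

With an error metric like this one, lower scores are better.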

Solution File

Since we are using a generated competition, this file is unique to your competition!

89fa1f42d177505a.png

The Solution Sampling setting lets you adjust how much of the solution file is used to score submissions during the competition (the public leaderboard) versus how many rows are used to determine the final leaderboard (the private leaderboard). During the competition, users can select which of their submissions are used for the final leaderboard; the Scored Private Submissions setting controls how many they can select.

This process helps ensure that competitors are not rewarded for overfitting to the public leaderboard or for flooding the competition with submissions.
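The effect of the public/private split can be sketched with a toy example. Everything below is an illustrative assumption: the helper names, the 30% public fraction, the random assignment of rows, and the use of mean absolute error as the metric.

```python
import random


def split_solution(ids, public_fraction, seed=0):
    """Randomly assign solution row ids to a public and a private set."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * public_fraction)
    return set(shuffled[:cut]), set(shuffled[cut:])


def mae(solution, submission, ids):
    """Mean absolute error restricted to the given row ids."""
    return sum(abs(solution[i] - submission[i]) for i in ids) / len(ids)


# Toy solution file with 10 rows (id -> true value).
solution = {i: float(i) for i in range(10)}
public_ids, private_ids = split_solution(solution, public_fraction=0.3)

# A submission tuned to the public rows: perfect there, off by 2 elsewhere.
overfit = {i: solution[i] if i in public_ids else solution[i] + 2.0 for i in solution}
# A submission that is consistently off by 0.5 on every row.
steady = {i: solution[i] + 0.5 for i in solution}

for name, sub in [("overfit", overfit), ("steady", steady)]:
    public_score = mae(solution, sub, public_ids)
    private_score = mae(solution, sub, private_ids)
    print(f"{name}: public={public_score}, private={private_score}")
```

In this toy case the overfit submission leads on the public rows but loses on the private rows, which are the ones that decide the final ranking.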

Sandbox Submissions

Sandbox submissions let competition hosts check that scoring works as anticipated and set "benchmark" submissions for competitors to compare against. These benchmark submissions show up on the leaderboard.

Teams & Submissions

During the competition, this tab lets hosts download all the scores and manage teams. Before the competition starts, it is empty.

Launch Checklist

This will be covered in the next section!

5. Launching your competition

50b03df072c02e6a.png

From the top of the competition page, click on the "Launch Checklist" button.

Launch Checklist

The Launch Checklist shows the steps required before launching a competition. Since we started from a competition template, most of these steps are already completed! Only two tasks remain: setting a deadline and updating the competition rules.

938b9ed7bc4e0597.png

Set Deadline

First, click the arrow next to Set Deadline. Competitions usually last at least a couple of months. The maximum length for a competition is one year.

Edit Rules

Your competition rules need to be updated from the default template before launching. If you are running this competition for a class or group this is a good place to put any information about expectations.

Launch

We are ready for launch! Go ahead and launch your competition! You are now ready for competitors to join!

6. Competitor Experience

Now that you've launched your competition, let's take a look at the competitor experience. We'll cover joining the competition and making a submission. For this, you can join the Google IO Demo Competition here: https://www.kaggle.com/competitions/google-io-demo-competition

Joining the competition

After navigating to the competition home page, click the "Join Competition" button in the upper right, then read and acknowledge the rules.

Making your first submission

Go to the "Code" tab and click "New Notebook". This opens a notebook from which you can submit to the competition.

First, we read in the train and test data.

# read the train and test data

import pandas as pd

train = pd.read_csv('/kaggle/input/google-io-demo-competition/train.csv')

test = pd.read_csv('/kaggle/input/google-io-demo-competition/test.csv')

Let's take a look at the data.

# take a look at some of the data

train.head()

Let's prepare the data for training. In this case we drop Sex because it is not a numeric value. (Hint: figuring out how to include it should improve the performance of your model.)

# separate the features from the target (Age); Sex is dropped because it is not numeric

data = train.drop(columns=['Age', 'Sex'])

answers = train['Age']
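Following the hint above, one common way to keep Sex is to one-hot encode it. The sketch below runs on tiny made-up frames rather than the competition files; the `Length` column and the category values `M`, `F`, and `I` are assumptions about the data, so check `train.head()` before adapting it.

```python
import pandas as pd

# Tiny made-up frames standing in for train.csv and test.csv.
train_toy = pd.DataFrame({
    "id": [0, 1, 2],
    "Sex": ["M", "F", "I"],
    "Length": [1.2, 1.5, 0.9],
    "Age": [9, 11, 6],
})
test_toy = pd.DataFrame({
    "id": [3, 4],
    "Sex": ["F", "F"],
    "Length": [1.4, 1.1],
})

# One-hot encode Sex: each category becomes its own indicator column.
features = pd.get_dummies(train_toy.drop(columns=["Age"]), columns=["Sex"])
test_features = pd.get_dummies(test_toy, columns=["Sex"])

# Align the test columns to the training columns. Categories that never
# appear in the test data become all-zero columns instead of going missing.
test_features = test_features.reindex(columns=features.columns, fill_value=0)

print(list(features.columns))
print(test_features)
```

The alignment step matters because the model expects the same columns at prediction time as it saw during training.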

Then we create a model. In this case we use a random forest.

# imports for the model

from sklearn.model_selection import train_test_split

from sklearn.ensemble import RandomForestRegressor

from sklearn.metrics import mean_absolute_error

model = RandomForestRegressor()

# train the model

model.fit(data, answers)
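The imports above include `train_test_split` and `mean_absolute_error`, which the walkthrough does not otherwise use. One plausible use is to hold out part of the training data and estimate the error locally before submitting. The sketch below runs on synthetic data so that it is self-contained; the feature count, the 20% holdout, and the choice of MAE are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the competition data: 200 rows, 3 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 10 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200)

# Hold out 20% of the rows for validation.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit on the training part only, then score on the held-out part.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
valid_mae = mean_absolute_error(y_valid, model.predict(X_valid))

# Baseline for comparison: always predict the mean of the training target.
baseline_pred = np.full(len(y_valid), y_train.mean())
baseline_mae = mean_absolute_error(y_valid, baseline_pred)

print(f"validation MAE: {valid_mae:.3f} (mean-only baseline: {baseline_mae:.3f})")
```

A local estimate like this lets you compare ideas without spending daily submissions on each one.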

Create a submission:

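# predict on the test data, dropping Sex to match the training features

predictions = model.predict(test.drop(columns=['Sex']))

# build the submission with one predicted Age per test id

submission = pd.DataFrame({'id': test['id'], 'Age': predictions})

submission.to_csv('submission.csv', index=False)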
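Before submitting, it can help to sanity-check the file. The sketch below assumes the expected format is an `id` column followed by an `Age` column with one row per test row, matching the submission built above; `check_submission` is a hypothetical helper, and the competition's sample submission remains the authoritative reference for the format.

```python
import pandas as pd


def check_submission(submission, test, id_col="id", target_col="Age"):
    """Return a list of problems; an empty list means every check passed."""
    expected = [id_col, target_col]
    if list(submission.columns) != expected:
        return [f"expected columns {expected}, got {list(submission.columns)}"]
    problems = []
    if len(submission) != len(test):
        problems.append(f"expected {len(test)} rows, got {len(submission)}")
    if set(submission[id_col]) != set(test[id_col]):
        problems.append("ids do not match the test data")
    if submission[target_col].isna().any():
        problems.append("predictions contain missing values")
    return problems


# Made-up frames for illustration.
test_toy = pd.DataFrame({"id": [10, 11, 12]})
good = pd.DataFrame({"id": [10, 11, 12], "Age": [9.5, 10.0, 7.2]})
bad = pd.DataFrame({"id": [10, 11], "Age": [9.5, None]})

print(check_submission(good, test_toy))
print(check_submission(bad, test_toy))
```

In the notebook, the same function could be called as `check_submission(submission, test)` right before writing the CSV.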

Then you can submit to the competition by selecting "Submit to Competition" on the right side menu.

1cf17449cae53abe.png

Tips for running a great competition

  1. Make sure to include a starter notebook that makes a basic submission
  2. Encourage use of the discussions and sharing notebooks early in the competition
  3. Have fun!