Deploy an End-to-End TensorFlow Pipeline on Kubeflow

Mihir Parmar
Searce
8 min read · Mar 21, 2020


Have you ever tried to deploy a machine learning model in production?

If yes, then you might know that it is not just about training the model: it also includes data validation, preprocessing, model validation, scalability, and model serving. So, to remove the inconsistencies between training a model locally and serving it in production, an end-to-end pipeline that takes all these components into account is a good option to have.

'TensorFlow Extended (TFX)' is one such platform that helps create and manage a production pipeline. It can be combined with Kubeflow to deploy the machine learning pipeline on AI Platform for online predictions.

This article will guide you through setting up an end-to-end pipeline with these services for the Chicago Taxi Trips dataset.

Dataset

This dataset contains Chicago taxi trip records from 2013 to the present. It has 23 features, including a unique key, taxi ID, fare, and tips. In this article, we predict whether a customer will pay a tip greater than 20% of the fare.
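As a hedged sketch of the prediction target (the 20% threshold comes from the sentence above; treating `tips` and `fare` as plain numeric values is an assumption based on the dataset's feature list), the binary label could be derived like this:

```python
def big_tipper(tips, fare):
    """Return True if the tip exceeds 20% of the fare.

    Rides with a zero or missing fare return False, since no
    meaningful tip percentage can be computed for them.
    """
    if not fare or fare <= 0:
        return False
    return tips / fare > 0.20
```

For example, `big_tipper(tips=5.0, fare=20.0)` is a 25% tip and counts as a big tipper, while a $2 tip on the same fare does not.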

Overview of Tensorflow Extended (TFX)

TFX is an end-to-end platform for deploying production ML pipelines. It provides a configuration framework and shared libraries to integrate the common components needed to define, launch, and monitor an ML system.

Read more about the TFX libraries in the official documentation.

Why choose TFX?

TFX is a platform designed specifically for building end-to-end machine learning pipelines. Its components are designed for scalable, high-performance machine learning tasks.

Preprocessing the data becomes easy using the functionality of the TFX components and libraries. For example, the library 'TensorFlow Data Validation (TFDV)' automatically identifies anomalies such as missing features, out-of-range values, or wrong feature types, so one doesn't have to look for anomalies and make changes manually, which makes the task time-efficient.
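TFDV itself infers a schema from dataset statistics and flags deviations from it. As a toy, pure-Python illustration of the kinds of checks it automates (the schema values below are invented for illustration, not taken from the real dataset):

```python
# A tiny hand-written "schema": expected type and allowed range per feature.
schema = {
    'fare': {'type': float, 'min': 0.0, 'max': 1000.0},
    'tips': {'type': float, 'min': 0.0, 'max': 500.0},
}

def find_anomalies(row):
    """Flag missing features, wrong feature types, and out-of-range values."""
    anomalies = []
    for name, spec in schema.items():
        if name not in row:
            anomalies.append(f'{name}: missing feature')
        elif not isinstance(row[name], spec['type']):
            anomalies.append(f'{name}: wrong type {type(row[name]).__name__}')
        elif not spec['min'] <= row[name] <= spec['max']:
            anomalies.append(f'{name}: value {row[name]} out of range')
    return anomalies
```

A row like `{'fare': -3.0}` would be flagged twice: the fare is out of range and the tips feature is missing. TFDV does the same kind of validation, but infers the schema automatically and scales to full datasets.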

'TensorFlow Model Analysis (TFMA)' allows users to evaluate their models on large amounts of data in a distributed manner.

Also, once the pipeline is ready, it is easy to deploy the model on any serving architecture using the 'Pusher' component.

The whole task, from data preprocessing to model deployment, can be done from the same script by using TFX pipelines.

Because of this rich functionality, TFX is widely used for machine learning pipelines, especially for use cases like this one.

TFX Pipeline Implementation

For this TFX pipeline, the pipeline code — 'TFX Example.ipynb' — is used from the GitHub repository.

Perform the following actions in the GCP console before you run the pipeline file.

  1. Create a GCP project and make sure that you have enabled billing.
  2. In Cloud Storage, create two buckets with appropriate names. Here, two buckets named 'model_input' and 'model_output' are created.
  3. In the 'model_input' bucket, upload the file named 'taxi_utils.py'.
  4. Create a folder in the storage bucket ‘model_input’ with the name ‘data_input’.
  5. Upload the dataset in ‘.csv’ format in the ‘data_input’ folder.
  6. Enable ‘Dataflow API’ if it is not enabled.
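The console steps above can also be scripted. A hedged sketch with gsutil and gcloud follows; the bucket names match the ones used in this article, but bucket names must be globally unique, so you will likely need your own, and the dataset filename is a placeholder:

```shell
# Create the two buckets (names from the steps above; adjust to be unique).
gsutil mb gs://model_input
gsutil mb gs://model_output

# Upload the module file and the dataset (placeholder filename).
gsutil cp taxi_utils.py gs://model_input/
gsutil cp data.csv gs://model_input/data_input/

# Enable the Dataflow API.
gcloud services enable dataflow.googleapis.com
```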

Next Step: Set up the environment.

Setting up the Environment

A conda virtual environment is used here for installing the required dependencies.

To run the TFX pipeline, Python version 3.6 or later is needed.

All other required dependencies are listed below:

To satisfy the above-mentioned requirements, follow these steps:

  • Create a conda virtual environment with Python 3.6 and activate the environment.
  • Install the libraries in this virtual environment.
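The two bullets above might look like this in a terminal (the environment name tfx is a choice, not a requirement, and `pip install tfx` is a hedged stand-in for whatever dependency list your setup specifies):

```shell
# Create and activate a conda environment with Python 3.6.
conda create -n tfx python=3.6
conda activate tfx

# Install TFX, which pulls in TFDV, TFMA, and the other TFX libraries.
pip install tfx
```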

Next Step: Make the following changes in ‘TFX Example.ipynb’ file.

Directory and data locations

TFX example file

Give the paths for the input and output buckets according to the buckets you created in your Google Cloud Storage, and provide your project ID.

Configure the TFX pipeline example

Load the pipeline file in your jupyter notebook using the following code.

%load https://raw.githubusercontent.com/tensorflow/tfx/master/tfx/examples/chicago_taxi_pipeline/taxi_pipeline_kubeflow_gcp.py

Make the following changes in the file that opens up.

Import Dependencies

Since the data is being read from a '.csv' file, we have to import the CsvExampleGen component and some other dependencies, as below -

1. Import CsvExampleGen

from tfx.components.example_gen.csv_example_gen.component import CsvExampleGen

2. Import external_input

from tfx.utils.dsl_utils import external_input

Parameter Values

Fill in your details for Pipeline Name, Input bucket, Output Bucket, module file, and Project ID in the variables defined.

Changes in ‘_create_pipeline’ function

You are required to update the path in the CsvExampleGen call to match your data source file path.

example_gen = CsvExampleGen(input=external_input("<path_to_data.csv_file>"))

Changes in ‘Trainer’ and ‘Pusher’ functions

For this blog, the model deployment will be on AI Platform. So, provide the following arguments in the Trainer and Pusher components of the pipeline:

In the Trainer component, change

'ai_platform_trainer_executor.TRAINING_ARGS_KEY' to 'ai_platform_training_args'.

Similarly, in the Pusher component, change

'ai_platform_pusher_executor.SERVING_ARGS_KEY' to 'ai_platform_serving_args'.
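After these changes, the custom_config passed to each component can be sketched as below. This is a hedged sketch, not the article's exact code: the dict contents are placeholders modeled on the public chicago_taxi_pipeline example, and the surrounding Trainer/Pusher arguments are omitted.

```python
# Placeholder argument dicts; the keys ('project', 'region', 'model_name',
# 'project_id') follow the public chicago_taxi_pipeline example and may
# differ in your setup.
_ai_platform_training_args = {
    'project': '<your-project-id>',
    'region': 'us-central1',
}
_ai_platform_serving_args = {
    'model_name': '<your-model-name>',
    'project_id': '<your-project-id>',
}

# After the renames described above, each component's custom_config
# carries the plain string keys:
trainer_custom_config = {'ai_platform_training_args': _ai_platform_training_args}
pusher_custom_config = {'ai_platform_serving_args': _ai_platform_serving_args}
```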

We have made the necessary changes in our pipeline that will allow us to deploy our model on the AI Platform and serve online predictions.

The successful run of this file will generate and save a file named <name of the pipeline>.tar.gz in your directory. It will also generate a folder in the output bucket containing the files produced by the successful run of the pipeline.

Pipeline File

Download this file on your machine.

Now, the next step is to deploy the pipeline on the AI platform using kubeflow.

What is Kubeflow?

Kubeflow is an open-source Kubernetes-native platform for developing, orchestrating, deploying, and running scalable and portable machine learning workloads.

It does not recreate other services, but provides a straightforward way to deploy best-of-breed open-source ML systems to diverse infrastructures.

Also, anywhere we can run Kubernetes, we can run Kubeflow.

Kubeflow Deployment

After creating the Kubernetes cluster on GCP, we can deploy Kubeflow on the cluster.

Follow these steps to deploy Kubeflow on the Kubernetes cluster.

  1. Use the Kubeflow UI link to access Kubeflow for deployment on GCP. It will redirect you to a page where you will be asked to fill in a few details.
  2. Provide your project ID, deployment name, and other required details. Choose the us-central1-a zone and Kubeflow version 'v0.7.1', and click on 'Create Deployment'.

This task will take about 30 minutes to complete.

After completion, a link will be generated in the logs which will redirect you to the kubeflow UI page.

Kubeflow UI

If you can access this page, it means that Kubeflow has been successfully deployed on top of the Kubernetes cluster.

Next step: Upload the pipeline (.tar.gz) file that was downloaded after running the 'TFX Example.ipynb' file previously.

You need to upload the pipeline file in the Pipelines section using the Kubeflow UI.

Follow these steps to upload the pipeline on Kubeflow.

  1. Select your namespace on the Kubeflow UI from the drop-down menu.
  2. Go to Pipelines on the Kubeflow UI and click on 'Upload pipeline', or directly use the shortcut as shown in the snapshot below.
Upload Pipeline

3. Select the pipeline (<your_pipeline_name>.tar.gz) file.

4. After uploading the pipeline, <your_pipeline_name> will be visible in the Pipelines section. The page containing the pipeline graph will load once you click on it.

Pipeline Graph

5. To run this pipeline, click on the 'Create run' button. You will be asked to fill in a few details.

Create Run

6. For the run of the pipeline, either use the default experiment or create a new one.

7. Start the run after filling in the necessary details.

As it goes through each component defined in the 'TFX Example.ipynb' file, the run might take several hours.

On successful completion of the run, a graph will be generated as shown below.

Successful Run

The successful run of the pipeline implies that the model has been deployed on the Google Cloud AI Platform. You can view the model version details on AI Platform.

Model version details on AI platform

Our next task is to make online predictions from the model that is deployed on the AI platform using Kubeflow.

Prediction

GCP AI Platform provides two types of prediction: online prediction and batch prediction. We have used online prediction for this model deployment.

Online predictions are fast, as they take one instance per request and return predictions directly in the response message.

An API is created using GCP Cloud Functions, where we have hosted the prediction code for the model deployed on AI Platform. The main function code snippet is as follows -

Cloud Function

The URL is globally accessible; you just need to hit it with the test data to be predicted, and you'll get predictions in return.

Prediction Response-1

In the response, a score is predicted for each class, where class '0' means that the user will give a tip of more than 20% and class '1' means the user will give a tip of less than 20%.

The score is the confidence value for the predicted class.

For example, in the screenshot above, there is an 80% chance that the tip will be more than 20%.
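As a toy illustration of reading such a response (the score values are taken from the screenshot described above, and the class labels follow the mapping stated above):

```python
# Scores from the first prediction response: [class 0, class 1].
scores = [0.80, 0.20]

# Pick the class with the highest confidence score.
predicted_class = max(range(len(scores)), key=lambda i: scores[i])

# Map the class index to its meaning, per the mapping above.
label = {0: 'tip > 20%', 1: 'tip <= 20%'}[predicted_class]
```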

Another Prediction Response is given in the screenshot below -

Prediction Response-2

For this test data, there is a 78% chance that the user will pay a tip of more than 20%.

Conclusion

If you made it this far, a big shout-out to you! We have learned how to set up a TFX pipeline using Kubeflow, deploy the model on AI Platform, and make online predictions with the Chicago Taxi Trips dataset.

I hope you enjoyed reading this!

You can also write to us in the comments if you have any queries. For more interesting things to read, visit our engineering blog.
