
NOTE: The Docker initialization action has been deprecated. Please use the Docker Component instead.

The Docker Component is the best way to use Docker with Cloud Dataproc. To learn more about Dataproc Components, see the Dataproc Components documentation.
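As a rough sketch of the recommended path, the function below creates a cluster with the Docker optional component instead of the initialization action. It assumes an image version that offers the Docker component. `create_cluster_with_docker_component` is a hypothetical wrapper, and setting `GCLOUD=echo` prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: create a Dataproc cluster with the Docker optional
# component (assumes the chosen image version offers it).
# Set GCLOUD=echo for a dry run that only prints the gcloud arguments.
create_cluster_with_docker_component() {
  local cluster_name="$1" region="$2"
  "${GCLOUD:-gcloud}" dataproc clusters create "${cluster_name}" \
    --region "${region}" \
    --optional-components=DOCKER
}
```

For example, `create_cluster_with_docker_component "${CLUSTER_NAME}" "${REGION}"` reuses the same placeholders as the initialization-action command later in this README.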


Docker Initialization Action

This initialization action installs a binary release of Docker on a Google Cloud Dataproc cluster. After installation, it adds the yarn user to the docker group so that applications run by YARN can access Docker.
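To confirm on a node that the yarn user ended up in the docker group, a small check along these lines may help. `user_in_group` is a hypothetical helper and is not part of `docker.sh`; it only reads group membership through `id -nG`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of docker.sh): succeeds if the user is listed
# in the group according to `id -nG`, and fails otherwise, including when the
# user does not exist.
user_in_group() {
  local user="$1" group="$2"
  id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$group"
}

# Example: check the membership this initialization action sets up.
if user_in_group yarn docker; then
  echo "yarn is in the docker group"
else
  echo "yarn is NOT in the docker group"
fi
```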

Using this initialization action

⚠️ NOTICE: See the best practices for using initialization actions in production.

  1. Use the gcloud command to create a new cluster with this initialization action.

    REGION=<region>
    CLUSTER_NAME=<cluster_name>
    gcloud dataproc clusters create ${CLUSTER_NAME} \
        --region ${REGION} \
        --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/docker/docker.sh

  2. Docker is installed and configured on all nodes of the cluster (both master and workers). To check that it works, log in to the master node and run a test command:

    sudo docker run hello-world

    Or, to run it as the yarn user:

    sudo su yarn
    docker run hello-world
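For scripted smoke tests, the same check can be run without an interactive shell. The sketch below assumes the `hello-world` image still prints its usual "Hello from Docker!" greeting; `hello_world_ok` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `docker run hello-world` output on stdin and
# succeeds only if it contains the greeting the hello-world image prints.
hello_world_ok() {
  grep -q 'Hello from Docker!'
}
```

On the master node, `sudo -u yarn docker run --rm hello-world | hello_world_ok && echo "Docker works for yarn"` performs the yarn-user check above in a single command.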