The Docker Component is the best way to use Docker with Cloud Dataproc. To learn more about Dataproc Components, see the Dataproc Components documentation.
This initialization action installs a binary release of Docker on a Google Cloud Dataproc cluster. After installation, it adds the yarn user to the special docker group so that YARN-executed applications can access Docker.
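The group step described above can be sketched as follows. This is a minimal illustration, not the contents of docker.sh: the function names `user_in_group` and `ensure_yarn_in_docker_group` are hypothetical, and the sketch assumes a Linux node where `id` and `usermod` are available and the `yarn` user and `docker` group already exist.

```shell
#!/bin/sh
# Hypothetical helper: succeed if user $1 is a member of group $2.
user_in_group() {
  # id -nG prints the user's group names separated by spaces;
  # put one name per line so grep -x can match the whole name.
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$2"
}

# Hypothetical helper: add yarn to the docker group if it is not a member yet.
# Must be run as root; usermod -aG appends a supplementary group.
ensure_yarn_in_docker_group() {
  if user_in_group yarn docker; then
    echo "yarn is already in the docker group"
  else
    usermod -aG docker yarn
  fi
}
```

On a cluster node, running `ensure_yarn_in_docker_group` as root and then `id -nG yarn` should list `docker` among the groups. Processes that were already running as `yarn` generally see the new membership only after they are restarted.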
- Use the gcloud command to create a new cluster with this initialization action:

REGION=<region>
CLUSTER_NAME=<cluster_name>
gcloud dataproc clusters create ${CLUSTER_NAME} \
    --region ${REGION} \
    --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/docker/docker.sh

- Docker is installed and configured on all nodes of the cluster (both master and workers). You can log in to the master node and run a test command to see that it works:
sudo docker run hello-world
Or, to run the same test as the yarn user would:
sudo su yarn
docker run hello-world
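The two checks above can also be wrapped in a small helper. This is a sketch under stated assumptions: `docker_smoke_test` and the `DRY_RUN` variable are hypothetical names introduced here, and the helper uses `sudo -u <user>` (a one-shot alternative to switching users with `sudo su yarn`) together with `docker run --rm` so the test container is removed after it exits.

```shell
#!/bin/sh
# Hypothetical helper: run the hello-world check, optionally as another user.
# Usage: docker_smoke_test [user]
# With DRY_RUN=1 the command is printed instead of executed.
docker_smoke_test() {
  user="${1:-}"
  if [ -n "$user" ]; then
    # sudo -u runs a single command as the given user without opening a shell.
    set -- sudo -u "$user" docker run --rm hello-world
  else
    set -- sudo docker run --rm hello-world
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

On the master node, `docker_smoke_test` runs the check as root and `docker_smoke_test yarn` runs it as the yarn user. Setting `DRY_RUN=1` first shows the exact command without touching Docker.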