H2O Sparkling Water Initialization Action

This initialization action installs H2O Sparkling Water on all nodes of a Google Cloud Dataproc cluster.

This initialization action works with Dataproc image versions 1.3 and newer, except the 1.5 image.

Using this initialization action

⚠️ NOTICE: See best practices for using initialization actions in production.

You can use this initialization action to create a new Dataproc cluster with H2O Sparkling Water installed:

  1. To create a Dataproc 1.3 cluster, use the conda initialization action:

    REGION=<region>
    CLUSTER_NAME=<cluster_name>
    gcloud dataproc clusters create ${CLUSTER_NAME} \
        --region ${REGION} \
        --image-version 1.3 \
        --scopes "cloud-platform" \
        --initialization-actions "gs://goog-dataproc-initialization-actions-${REGION}/conda/bootstrap-conda.sh,gs://goog-dataproc-initialization-actions-${REGION}/h2o/h2o.sh"
  2. To create a Dataproc 1.4 cluster, use the ANACONDA optional component:

    REGION=<region>
    CLUSTER_NAME=<cluster_name>
    gcloud dataproc clusters create ${CLUSTER_NAME} \
        --region ${REGION} \
        --image-version 1.4 \
        --optional-components ANACONDA \
        --scopes "cloud-platform" \
        --initialization-actions "gs://goog-dataproc-initialization-actions-${REGION}/h2o/h2o.sh"
  3. To create a Dataproc 2.0 or newer cluster, no additional initialization actions or optional components are needed:

    REGION=<region>
    CLUSTER_NAME=<cluster_name>
    gcloud dataproc clusters create ${CLUSTER_NAME} \
        --region ${REGION} \
        --image-version 2.0 \
        --scopes "cloud-platform" \
        --initialization-actions "gs://goog-dataproc-initialization-actions-${REGION}/h2o/h2o.sh"
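
The three cases above differ only in the extra flags passed to `gcloud dataproc clusters create`. As a rough sketch, they could be folded into one shell helper. The function name `extra_create_flags` and the example region are hypothetical and not part of this repository, and versions newer than 2.x are deliberately left unhandled because they are not covered here.

```shell
# Hypothetical helper (not part of this repository): print the extra
# cluster-creation flags for a given Dataproc image version.
# Assumes REGION is already set; mirrors the three cases listed above.
extra_create_flags() {
    image_version="$1"
    bucket="gs://goog-dataproc-initialization-actions-${REGION}"
    case "${image_version}" in
        1.3|1.3[.-]*)
            # 1.3 needs the conda initialization action before h2o.sh
            echo "--initialization-actions ${bucket}/conda/bootstrap-conda.sh,${bucket}/h2o/h2o.sh"
            ;;
        1.4|1.4[.-]*)
            # 1.4 uses the ANACONDA optional component instead
            echo "--optional-components ANACONDA --initialization-actions ${bucket}/h2o/h2o.sh"
            ;;
        1.5|1.5[.-]*)
            # the 1.5 image is explicitly unsupported
            echo "image version ${image_version} is not supported" >&2
            return 1
            ;;
        2.*)
            # 2.x needs only the h2o initialization action
            echo "--initialization-actions ${bucket}/h2o/h2o.sh"
            ;;
        *)
            echo "unrecognized image version: ${image_version}" >&2
            return 1
            ;;
    esac
}

REGION="us-central1"  # assumption: example region, replace with your own
extra_create_flags 1.4
```

The printed flags can be appended to a `gcloud dataproc clusters create` call together with `--image-version` and `--scopes "cloud-platform"`. The helper returns a non-zero status for the 1.5 image, so a wrapper script can stop before attempting to create the cluster.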

Submit a sample job:

REGION=<region>
CLUSTER_NAME=<cluster_name>
gcloud dataproc jobs submit pyspark --cluster ${CLUSTER_NAME} \
    --region ${REGION} \
    "gs://goog-dataproc-initialization-actions-${REGION}/h2o/sample-script.py"

Supported metadata parameters

  • H2O_SPARKLING_WATER_VERSION: Sparkling Water version number. Available versions are listed on the releases page on GitHub. Default is 3.30.1.2-1.
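
Metadata parameters are normally passed at cluster creation with the `--metadata` flag of `gcloud dataproc clusters create`. The sketch below shows how the version parameter might be set on a 2.0 cluster. The region and cluster name are placeholder assumptions, and the version shown is simply the documented default. The command is only printed for review, so nothing is created until the leading `echo` is removed.

```shell
REGION="us-central1"          # assumption: example region
CLUSTER_NAME="h2o-cluster"    # assumption: example cluster name
# Documented default; replace with the Sparkling Water release you need.
H2O_SPARKLING_WATER_VERSION="3.30.1.2-1"

INIT_ACTION="gs://goog-dataproc-initialization-actions-${REGION}/h2o/h2o.sh"
METADATA="H2O_SPARKLING_WATER_VERSION=${H2O_SPARKLING_WATER_VERSION}"

# Print the command for review; remove the leading "echo" to run it.
echo gcloud dataproc clusters create "${CLUSTER_NAME}" \
    --region "${REGION}" \
    --image-version 2.0 \
    --scopes "cloud-platform" \
    --metadata "${METADATA}" \
    --initialization-actions "${INIT_ACTION}"
```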