Hive-HCatalog action flaky on HA #205

Closed
DariuszAniszewski opened this issue Mar 5, 2018 · 3 comments

@DariuszAniszewski
Contributor

I noticed that the hive-hcatalog initialization action is flaky on a High Availability cluster.

The cluster was created using the following command:

gcloud dataproc clusters create 'hive-ha' \
  --initialization-actions 'gs://dataproc-initialization-actions/hive-hcatalog/hive-hcatalog.sh' \
  --num-workers 2 \
  --num-masters 3 \
  --worker-machine-type n1-standard-4 \
  --master-machine-type n1-standard-4

While the cluster seems to be created properly and the init action is executed (I manually checked for artifacts of the action), jobs only succeed on the main master (m-0). I simply wanted to run a SHOW TABLES; query against the cluster using:

gcloud dataproc jobs submit hive --cluster 'hive-ha' -e 'SHOW TABLES;'

The result is flaky and depends on which master node picks up the job.

The job below was executed on m-0 and succeeded:

$ gcloud dataproc jobs submit hive --cluster 'hive-ha' -e 'SHOW TABLES;'
Job [381c440c-146e-4794-8752-de1420f050ca] submitted.
Waiting for job output...
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://hive-ha-m-0:10000
Connected to: Apache Hive (version 2.1.1)
Driver: Hive JDBC (version 2.1.1)
18/03/05 13:29:55 [main]: WARN jdbc.HiveConnection: Request to set autoCommit to false; Hive does not support autoCommit=false.
Transaction isolation: TRANSACTION_REPEATABLE_READ
+-----------+--+
| tab_name  |
+-----------+--+
+-----------+--+
No rows selected (0.112 seconds)
Beeline version 2.1.1 by Apache Hive
Closing: 0: jdbc:hive2://hive-ha-m-0:10000
Job [381c440c-146e-4794-8752-de1420f050ca] finished successfully.

The job below was executed on m-1 and failed:

$ gcloud dataproc jobs submit hive --cluster 'hive-ha' -e 'SHOW TABLES;'
Job [e46b6906-aeea-4b3a-aefd-4c8b517cd360] submitted.
Waiting for job output...
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://hive-ha-m-1:10000
18/03/05 13:32:10 [main]: WARN jdbc.HiveConnection: Failed to connect to hive-ha-m-1:10000
Could not open connection to the HS2 server. Please check the server URI and if the URI is correct, then ask the administrator to check the server status.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://hive-ha-m-1:10000: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
No current connection
ERROR: (gcloud.dataproc.jobs.submit.hive) Job [e46b6906-aeea-4b3a-aefd-4c8b517cd360] entered state [ERROR] while waiting for [DONE].

The job below was executed on m-2 and failed:

$ gcloud dataproc jobs submit hive --cluster 'hive-ha' -e 'SHOW TABLES;'
Job [1cbc6ced-6e50-42c2-a29f-f8d500043d7d] submitted.
Waiting for job output...
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://hive-ha-m-2:10000
18/03/05 13:29:07 [main]: WARN jdbc.HiveConnection: Failed to connect to hive-ha-m-2:10000
Could not open connection to the HS2 server. Please check the server URI and if the URI is correct, then ask the administrator to check the server status.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://hive-ha-m-2:10000: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
No current connection
ERROR: (gcloud.dataproc.jobs.submit.hive) Job [1cbc6ced-6e50-42c2-a29f-f8d500043d7d] entered state [ERROR] while waiting for [DONE].

@pmkc
Contributor

pmkc commented Mar 6, 2018

You have diagnosed the issue almost perfectly. We point beeline at the current master (whichever master picks up the job), but we only run HiveServer2 on m-0.

We are looking into HIVE-14063 for a long-term solution, but in the short term we are fixing the mapping. The fix should be rolled out by Friday, March 16th.
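To confirm this diagnosis on a cluster like the one above, one could probe the HiveServer2 port (10000, as in the JDBC URIs in the logs) on each master. The following is a minimal sketch, assuming bash with `/dev/tcp` support and coreutils `timeout`, run from a node that can resolve the master hostnames; the helper name `hs2_listening` and the `CLUSTER`/`HS2_PORT` variables are hypothetical, not part of Dataproc or the init action. Given the diagnosis, on an unpatched cluster only `hive-ha-m-0` would be expected to accept connections.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check which masters accept TCP connections on the
# HiveServer2 port. Assumes bash built with /dev/tcp support and coreutils
# `timeout`, and that the master hostnames resolve from where this runs.

CLUSTER="${CLUSTER:-hive-ha}"   # cluster name from the create command above
HS2_PORT="${HS2_PORT:-10000}"   # port used in the jdbc:hive2:// URIs above

# Returns 0 if a TCP connection to host:port succeeds within 3 seconds,
# non-zero if it is refused, unresolvable, or times out.
hs2_listening() {
  local host="$1" port="$2"
  timeout 3 bash -c ": </dev/tcp/${host}/${port}" 2>/dev/null
}

# HA clusters have three masters: <cluster>-m-0, -m-1 and -m-2.
for i in 0 1 2; do
  host="${CLUSTER}-m-${i}"
  if hs2_listening "$host" "$HS2_PORT"; then
    echo "${host}:${HS2_PORT} open"
  else
    echo "${host}:${HS2_PORT} refused or unreachable"
  fi
done
```

A raw TCP probe only shows that something is listening on the port; it does not prove HiveServer2 is healthy, which is why the `SHOW TABLES;` job above remains the end-to-end check.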

pmkc self-assigned this Mar 6, 2018
@DariuszAniszewski
Contributor Author

Sounds good 👍

@pmkc
Contributor

pmkc commented Mar 19, 2018

This was fixed in the March 16th release.

pmkc closed this as completed Mar 19, 2018