
Errors importing lots of entities to DataStore using the emulator #95

Closed
JustinBeckwith opened this issue May 31, 2018 · 6 comments · Fixed by #544
Labels: api: datastore (Issues related to the googleapis/nodejs-datastore API.) · type: question (Request for information or clarification. Not an issue.)

JustinBeckwith (Contributor) commented:

From @glenpike on May 23, 2018 10:26

[x] - Search the issues already opened: https://github.com/GoogleCloudPlatform/google-cloud-node/issues
[x] - Search StackOverflow: http://stackoverflow.com/questions/tagged/google-cloud-platform+node.js
[404] - Check our Troubleshooting guide: https://googlecloudplatform.github.io/google-cloud-node/#/docs/guides/troubleshooting
[404] - Check our FAQ: https://googlecloudplatform.github.io/google-cloud-node/#/docs/guides/faq

If you are still having issues, please be sure to include as much information as possible:

Environment details

  • gcloud SDK: 202.0.0
  • OS: OS X El Capitan (10.11.6), using about 12 GB of 16 GB memory
  • Node.js version: v8.11.2
  • npm version: v5.6.0
  • google-cloud-node version:
├─┬ @google-cloud/[email protected]
│ ├─┬ @google-cloud/[email protected]
├─┬ @google-cloud/[email protected]
│ └─┬ @google-cloud/[email protected]
│   ├─┬ @google-cloud/[email protected]
│   ├─┬ @google-cloud/[email protected]
├─┬ @google-cloud/[email protected]
│ ├─┬ @google-cloud/[email protected]

Using DataStore via: [email protected]

Steps to reproduce

Looping through a list of data, creating a model for each item, and then calling a function that uses save:

    // `ctx.body.new` holds the list of models to import.
    const { body } = ctx;
    const promises = [];

    // Create an entity for each model and kick off every save at once.
    body.new.forEach((model) => {
        const createdEntity = FromModel.create(model, model.id);
        promises.push(createdEntity.upsert());
    });

    // All ~2.5k upserts are in flight concurrently.
    const response = await Promise.all(promises);

Trying to import about 2.5k models, we get a lot of errors that appear to be coming from gRPC. A workaround is to split the data into chunks; importing a quarter of the data at a time works.
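
For reference, a minimal sketch of that chunking workaround, reusing the FromModel calls from the snippet above (the chunk size of 500 is illustrative, not a tested value):

    // Upsert at most `chunkSize` entities at a time instead of all
    // ~2.5k at once. Each chunk must finish before the next starts,
    // so the emulator never sees more than `chunkSize` concurrent
    // requests.
    async function upsertInChunks(models, chunkSize = 500) {
        const results = [];
        for (let i = 0; i < models.length; i += chunkSize) {
            const chunk = models.slice(i, i + chunkSize);
            results.push(...await Promise.all(
                chunk.map((model) => FromModel.create(model, model.id).upsert())
            ));
        }
        return results;
    }

    // Inside the same async handler as the original snippet:
    const response = await upsertInChunks(ctx.body.new);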

The log of errors looks like this ('...' replaces runs of repeated events):

10:02:46.826Z ERROR import: 1 CANCELLED: Received RST_STREAM with error code 8
10:02:46.826Z ERROR import: 13 INTERNAL: Half-closed without a request
10:02:46.827Z ERROR import: 1 CANCELLED: Received RST_STREAM with error code 8
...
10:02:46.841Z ERROR import: 1 CANCELLED: Received RST_STREAM with error code 8
10:02:46.841Z ERROR import: 13 INTERNAL: Half-closed without a request
10:02:46.841Z ERROR import: 1 CANCELLED: Received RST_STREAM with error code 8
10:02:46.841Z ERROR import: 1 CANCELLED: Received RST_STREAM with error code 8
10:02:46.841Z ERROR import: 1 CANCELLED: Received RST_STREAM with error code 8
10:02:46.841Z ERROR import: 13 INTERNAL: Half-closed without a request
10:02:46.841Z ERROR import: 1 CANCELLED: Received RST_STREAM with error code 8
...
10:02:46.858Z ERROR import: 1 CANCELLED: Received RST_STREAM with error code 8
10:02:46.858Z ERROR import: 13 INTERNAL: Half-closed without a request
10:02:46.858Z ERROR import: 13 INTERNAL: Half-closed without a request
10:02:46.858Z ERROR import: 1 CANCELLED: Received RST_STREAM with error code 8
10:02:46.858Z ERROR import: 1 CANCELLED: Received RST_STREAM with error code 8
10:02:46.858Z ERROR import: 1 CANCELLED: Received RST_STREAM with error code 8
10:02:46.858Z ERROR import: 13 INTERNAL: Half-closed without a request
10:02:46.859Z ERROR import: 1 CANCELLED: Received RST_STREAM with error code 8
10:02:46.873Z ERROR import: 13 INTERNAL: Half-closed without a request
10:02:46.874Z ERROR import: 1 CANCELLED: Received RST_STREAM with error code 8
...
10:02:46.898Z ERROR import: 1 CANCELLED: Received RST_STREAM with error code 8
10:02:46.898Z ERROR import: 13 INTERNAL: Half-closed without a request
10:02:46.898Z ERROR import: 1 CANCELLED: Received RST_STREAM with error code 8
10:02:46.898Z ERROR import: 1 CANCELLED: Received RST_STREAM with error code 8
10:02:46.899Z ERROR import: 13 INTERNAL: Half-closed without a request
10:02:46.899Z ERROR import: 1 CANCELLED: Received RST_STREAM with error code 8
10:03:43.145Z ERROR import: 4 DEADLINE_EXCEEDED: Deadline Exceeded
10:03:43.145Z ERROR import: 4 DEADLINE_EXCEEDED: Deadline Exceeded

The DEADLINE_EXCEEDED error seems to correspond with this in the emulator:

[datastore] May 23, 2018 11:03:16 AM com.google.cloud.datastore.emulator.impl.LocalDatastoreFileStub$7 run
[datastore] INFO: Time to persist datastore: 198 ms
[datastore] Exception in thread "LocalDatastoreService-1" java.lang.OutOfMemoryError: unable to create new native thread
[datastore] 	at java.lang.Thread.start0(Native Method)
[datastore] 	at java.lang.Thread.start(Thread.java:714)
[datastore] 	at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
[datastore] 	at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1018)
[datastore] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
[datastore] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[datastore] 	at java.lang.Thread.run(Thread.java:745)
[datastore] Exception in thread "LocalDatastoreService-4" java.lang.OutOfMemoryError: unable to create new native thread
[datastore] 	at java.lang.Thread.start0(Native Method)
[datastore] 	at java.lang.Thread.start(Thread.java:714)
[datastore] 	at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
[datastore] 	at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1018)
[datastore] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
[datastore] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[datastore] 	at java.lang.Thread.run(Thread.java:745)

Copied from original issue: googleapis/google-cloud-node#2822

JustinBeckwith (Contributor, Author) commented:

@glenpike does this reproduce when using the Datastore service, and not the emulator? I'm trying to figure out whether this is an issue with the client library or the emulator :)

JustinBeckwith added the type: question and needs more info labels May 31, 2018
glenpike commented Jun 1, 2018:

Hi @JustinBeckwith - this happens with the emulator; on the 'live' system it behaves fine.

sduskis removed the needs more info label Dec 4, 2018
ideasculptor commented Jul 30, 2019:

This problem persists in the emulator.

I have a reasonably small production Datastore instance - barely over 1 GB including indexes. I exported a small fraction of that data, just a few of the entity types, constituting about 125 MB of data in the storage bucket. But I have been utterly unable to import that data into the emulator. No matter how much memory I give the running process, it eventually errors out with OOM errors (when I gave the Docker container I was running it in 8 GB of memory, it finally completed). Total size on disk was about 160 MB. The runtime memory requirements relative to the total dataset size seem more than a little out of whack.

I'm just using a basic import command via curl, just as the documentation suggests (documentation which never makes a single mention of memory management). We're talking about tens of thousands of entities here, not millions; this ought to be a trivial workload for any database. Are there any workarounds? I'm on a host with 16 GB of memory and plenty of disk space.
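
For reference, here is the import call the documentation describes, expressed as a small Node script instead of curl; a minimal sketch assuming the emulator's default endpoint (localhost:8081), with the project ID and export path as placeholders:

    // POST the documented :import request to the local emulator.
    // Host, port, project ID, and export path are placeholders;
    // substitute your own values.
    const http = require('http');

    const body = JSON.stringify({
        input_url: '/path/to/export/export.overall_export_metadata',
    });

    const req = http.request({
        host: 'localhost',
        port: 8081, // default Datastore emulator port
        path: '/v1/projects/my-project:import',
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
    }, (res) => {
        console.log('status:', res.statusCode);
        res.pipe(process.stdout); // echo the emulator's response
    });

    req.on('error', console.error);
    req.end(body);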

tylervick commented:

+1 - I hit this issue following the recommended documentation (posting via cURL). A single kind with a few thousand records (~2 GB) should not choke it. Any ideas?

JustinBeckwith (Contributor, Author) commented:

@stephenplusplus could I trouble you to take a look?

ohmpatel1997 commented Nov 15, 2019:

I encountered the same issue while importing around 1 GB of data, so the problem is in the emulator only. The solution is to increase the memory allocated to the JVM. It will eat up your entire CPU, but it will work.

Stop your emulator.

Then increase the memory by passing larger heap settings to the JVM, for example:

-Xms512m -Xmx1152m -XX:MaxPermSize=256m -XX:MaxNewSize=256m
-Xms: initial heap size
-Xmx: maximum heap size

(Note that -XX:MaxPermSize is ignored on Java 8 and later.) These are JVM flags rather than a standalone command; one way to hand them to the emulator's JVM, assuming a JDK that honors the standard JAVA_TOOL_OPTIONS environment variable, is:

JAVA_TOOL_OPTIONS="-Xms512m -Xmx1152m" gcloud beta emulators datastore start

Increase the values according to your needs and run your import again.

Hope this works for you...!

google-cloud-label-sync bot added the api: datastore label Jan 31, 2020