Slow download and upload to Google Cloud Storage

Alknes
New Member

I need to upload a batch of small files, around 70k of them, using the Google Cloud Storage client for Python. None of the files is larger than 1 KB. At the start of the upload I managed about 20 files per second, but then I suddenly started getting 429 errors. After I added retries that wait a few seconds between requests whenever I get a 429, every upload now takes more than 10 seconds.

I tried checking whether we were hitting a quota, but nothing indicates that anything is wrong with the setup. Any suggestions on what could be causing the issue?

3 REPLIES

My guess is that you are running into the limits on how fast you can ramp up the request rate.

 
For that to be true, you would have to be making the requests asynchronously and maybe in parallel, since you'd need about 1000 requests per second to trigger this limit. You would likely do that with transfer_manager.upload_many_from_filenames or similar. Is that possibly true of your setup? If so, you need to find a way to reduce the rate at which you are making requests.
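One way to reduce the request rate is to pace calls on the client side and raise the ceiling only gradually. The following is a minimal sketch, not code from the thread: `allowed_rate` and `Pacer` are hypothetical helpers, and the starting rate and doubling period are assumed values that should be checked against Google's current request-rate guidelines.

```python
import time


def allowed_rate(elapsed_s, start_rate=500.0, doubling_period_s=1200.0):
    """Requests per second permitted after elapsed_s seconds of ramp-up.

    start_rate and doubling_period_s are illustrative assumptions.
    """
    doublings = int(elapsed_s // doubling_period_s)
    return start_rate * (2 ** doublings)


class Pacer:
    """Blocks just long enough to keep calls at or below the allowed rate.

    Not thread-safe: guard wait() with a lock if several threads share it.
    """

    def __init__(self, start_rate=500.0, doubling_period_s=1200.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.start_rate = start_rate
        self.doubling_period_s = doubling_period_s
        self.clock = clock
        self.sleep = sleep
        self.started = clock()
        self.next_slot = self.started

    def wait(self):
        now = self.clock()
        if now < self.next_slot:
            # Too early for the next request: sleep until the slot opens.
            self.sleep(self.next_slot - now)
            now = self.next_slot
        rate = allowed_rate(now - self.started, self.start_rate,
                            self.doubling_period_s)
        # Reserve the next slot one interval later at the current rate.
        self.next_slot = now + 1.0 / rate
```

In an upload loop, calling `pacer.wait()` immediately before each upload call would keep a single-threaded uploader under the chosen ceiling.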
 

 

When I use upload_many_from_filenames with all the values, the code keeps running for more than 11 minutes, and I had to forcibly interrupt and terminate it. I have only 4 files in the source directory. Why does it take so long without any result?

It seems like you've already implemented a mechanism to handle 429 errors by retrying with a delay. However, if your uploads are still slow, there could be several reasons for this behavior. I solved the same issue on my car loan project, and I am sure this will help you. Here are some suggestions to help you troubleshoot the issue:

  1. Check for Rate Limiting or Quota Exceedance:

    • Review the Google Cloud Storage quotas and limits to ensure you're not exceeding any of them.
    • Confirm that your retries with delays are working as expected and not contributing to a longer overall processing time.
  2. Review Your Retry Strategy:

    • Ensure that your retry strategy is optimized. For example, you might want to implement an exponential backoff mechanism, where the delay between retries increases exponentially.
  3. Network Latency:

    • Check for network latency issues between your application and Google Cloud Storage. You can use tools like traceroute or ping to identify potential network bottlenecks.
  4. Check Server-Side Logging:

    • Review the server-side logs for your Google Cloud Storage bucket. There might be additional error messages or information that could help pinpoint the issue.
  5. Google Cloud Storage Client Configuration:

    • Review your Google Cloud Storage client configuration. Ensure that you are using the latest version of the client library and that your client configuration is appropriate for your use case.
  6. Consider Batch Operations:

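    • Instead of uploading files one by one, consider reducing the number of objects you upload, for example by bundling many small files into a single archive on the client before uploading. Note that the Google Cloud Storage JSON API's compose method concatenates objects that are already in the bucket, so it can merge objects after upload but does not by itself reduce the number of upload requests.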
  7. Monitor Resource Utilization:

    • Monitor the resource utilization of your application during the file uploads. Check for any potential bottlenecks in CPU, memory, or disk usage.
  8. Review API Request and Response Times:

    • Monitor the time it takes for each API request and response. This can help identify if the issue is related to API responsiveness.
  9. Consider Asynchronous Operations:

    • Depending on your use case, consider using asynchronous programming techniques to handle multiple uploads concurrently.
  10. Contact Google Cloud Support:

    • If the issue persists and you can't identify the root cause, consider reaching out to Google Cloud Support for assistance.
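
The exponential backoff mentioned in item 2 could look roughly like the sketch below. It is an illustration under assumptions, not the poster's code: `TooManyRequests` here is a hypothetical stand-in for whatever 429 exception the client raises (with the real client this would likely be google.api_core.exceptions.TooManyRequests), and the delay values are arbitrary.

```python
import random
import time


class TooManyRequests(Exception):
    """Hypothetical stand-in for the client's HTTP 429 error."""


def backoff_delays(initial=1.0, multiplier=2.0, maximum=60.0, attempts=6):
    """Yield the capped delay ceiling for each retry attempt."""
    delay = initial
    for _ in range(attempts):
        yield min(delay, maximum)
        delay *= multiplier


def call_with_backoff(func, retry_on=(TooManyRequests,), sleep=time.sleep,
                      rng=random.random, **backoff):
    """Call func(), retrying on retry_on with jittered exponential delays."""
    for ceiling in backoff_delays(**backoff):
        try:
            return func()
        except retry_on:
            # Full jitter: sleep a random fraction of the current ceiling so
            # that parallel workers do not retry in lockstep.
            sleep(rng() * ceiling)
    return func()  # final attempt; let any exception propagate
```

The client library also ships its own retry support (google.api_core.retry.Retry), which may be preferable to a hand-rolled loop; the sketch only shows the shape of the delays.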

Remember to carefully review your code and configurations, and consider the specific characteristics of your workload and network environment.
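
Items 8 and 9 above (timing each request and running uploads concurrently) can be combined in a small harness. This is a sketch with hypothetical helper names (`timed`, `run_bounded`, `slowest`); `func` stands for whatever per-file upload function you already have, and the worker count is an assumed value.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def run_bounded(func, items, max_workers=8):
    """Apply func to each item with at most max_workers calls in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with items.
        return list(pool.map(lambda item: timed(func, item), items))


def slowest(results, n=5):
    """Return the n largest elapsed times, for spotting stalled requests."""
    return sorted((elapsed for _, elapsed in results), reverse=True)[:n]
```

If the slowest calls are all close to the retry delay, the time is probably being spent waiting between retries and not in the upload itself.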