Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry GoogleDataTransport Connection Errors #1705

Merged
merged 1 commit into from
Jun 23, 2020
Merged

Conversation

samedson
Copy link
Contributor

@samedson samedson commented Jun 23, 2020

  • In cases where there is bad connection, the current behavior deletes the cached reports because they are caught as an IOException. They appear to be encoding errors, when in reality it's an error opening the OutputStream that is the connection.

  • This changes it so that when the IOException is due to connection problems, we retry the upload instead of deleting it.

This is where we transform status numbers into success, fatal, and transient errors: CctTransportBackend.java

  • 400-level errors (except 404) are treated as fatal errors, which are deleted
  • 500-level errors are treated as transient errors, which are retried

This is where we decide to delete fatal errors, and retry transient errors: Uploader.java

@googlebot googlebot added the cla: yes Override cla label Jun 23, 2020
@samedson samedson requested a review from rlazo June 23, 2020 17:57
@google-oss-bot
Copy link
Contributor

Binary Size Report

Affected SDKs

  • firebase-encoders-json

    Type Base (037282f) Head (94d853c7) Diff
    apk (aggressive) 10.9 kB 11.0 kB +15 B (+0.1%)
    apk (debug) 28.6 kB 28.6 kB +3 B (+0.0%)
  • transport-api

    Type Base (037282f) Head (94d853c7) Diff
    apk (aggressive) 10.9 kB 11.0 kB +15 B (+0.1%)
    apk (debug) 23.0 kB 23.0 kB -13 B (-0.1%)
  • transport-backend-cct

    Type Base (037282f) Head (94d853c7) Diff
    aar 39.0 kB 39.0 kB +67 B (+0.2%)
    apk (aggressive) 47.9 kB 48.0 kB +57 B (+0.1%)
    apk (debug) 102 kB 102 kB +51 B (+0.1%)
    apk (release) 82.4 kB 82.4 kB +41 B (+0.0%)
  • transport-runtime

    Type Base (037282f) Head (94d853c7) Diff
    apk (debug) 80.6 kB 80.6 kB +2 B (+0.0%)

Test Logs

Notes

Head commit (94d853c7) is created by Prow via merging commits: 037282f e2af542.

@google-oss-bot
Copy link
Contributor

Coverage Report

Affected SDKs

  • transport-backend-cct

    SDK overall coverage changed from 78.45% (037282f) to 78.14% (94d853c7) by -0.31%.

    Filename Base (037282f) Head (94d853c7) Diff
    CctTransportBackend.java 96.24% 94.71% -1.53%

Test Logs

Notes

HTML coverage reports can be produced locally with ./gradlew <product>:checkCoverage.
Report files are located at <product-build-dir>/reports/jacoco/.

Head commit (94d853c7) is created by Prow via merging commits: 037282f e2af542.

@samedson
Copy link
Contributor Author

/test smoke-tests

@ashwinraghav
Copy link
Contributor

Links were super helpful.
It seems like we model IOExceptions as transient errors already

This would result in retrying ? What am i missing

@samedson
Copy link
Contributor Author

@ashwinraghav The issue is IOException from this error don't bubble up to that try catch because they're caught right here, and returned as a HttpResponse(400, null, 0);.

So when this error happens that we are testing, we get a valid response here. And then it becomes a fatalError here before, because it was a 400. Not since we return a HttpResponse(500, null, 0);, it becomes a transientError

@ashwinraghav
Copy link
Contributor

@ashwinraghav The issue is IOException from this error don't bubble up to that try catch because they're caught right here, and returned as a HttpResponse(400, null, 0);.

So when this error happens that we are testing, we get a valid response here. And then it becomes a fatalError here before, because it was a 400. Not since we return a HttpResponse(500, null, 0);, it becomes a transientError

Makes sense.

@ashwinraghav
Copy link
Contributor

Can you add a test similar to

@@ -268,6 +270,9 @@ private HttpResponse doSend(HttpRequest request) throws IOException {
// JsonWriter often writes one character at a time.
dataEncoder.encode(
request.requestBody, new BufferedWriter(new OutputStreamWriter(outputStream)));
} catch (ConnectException | UnknownHostException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking out loud: This seems like a pretty conservative approach, as opposed to treating all IO Exceptions as retry-able.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah exactly - the reason we did this is there are legit IO Exceptions that would not be retryable, eg. encoding errors of some sort

@samedson
Copy link
Contributor Author

@ashwinraghav we tried to write a test, but it seems really difficult with the current code setup, and with WireMock. We're gonna punt for now and sync up with Vlad on Friday to see if he knows a way.

@samedson samedson merged commit 08507fa into master Jun 23, 2020
@samedson samedson deleted the gdt-connection-e branch June 23, 2020 20:22
@firebase firebase locked and limited conversation to collaborators Jul 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants