Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Firestore: Deleting a couple of documents returned by a limited subcollection query triggers all documents to disappear locally #1548

Closed
aleh opened this issue Jul 17, 2018 · 12 comments
Assignees
Milestone

Comments

@aleh
Copy link

aleh commented Jul 17, 2018

[REQUIRED] Step 2: Describe your environment

  • Xcode version: 9.4 (9F1027a)
  • Firebase SDK version: 5.3.0
  • Firebase Component: Firestore
  • Component version: 5.3.0

[REQUIRED] Step 3: Describe the problem

Steps to reproduce:

  • Set up a query on any subcollection with a limit smaller than the number of elements in the subcollection already, for example, 5.

  • Set up another query on the same subcollection with a limit set to 1. (Possibly unneeded step.)

  • Start deleting documents returned by the first query randomly one by one.

  • Observe that at some points the documents will begin disappearing from the query snapshot on their own. They will remain gone even if the app restarts but will be still visible via Firestore Console.

Relevant Code:

See example app along with more detailed description and video here.

Also, the following code in Firestore/Source/Remote/FSTRemoteEvent.mm in remoteEventAtSnapshotVersion: method seems to have something to do with this:

      if (targetState.current && [queryData.query isDocumentQuery]) {
        // Document queries for document that don't exist can produce an empty result set. To update
        // our local cache, we synthesize a document delete if we have not previously received the
        // document. This resolves the limbo state of the document, removing it from
        // limboDocumentRefs.
        FSTDocumentKey *key = [FSTDocumentKey keyWithPath:queryData.query.path];
        if (_pendingDocumentUpdates.find(key) == _pendingDocumentUpdates.end() &&
            ![self containsDocument:key inTarget:targetID]) {
          [self removeDocument:[FSTDeletedDocument documentWithKey:key version:snapshotVersion]
                       withKey:key
                    fromTarget:targetID];
        }
      }
@aleh aleh changed the title Firestore: Deleting a couple of documents returned by limited subcollection query triggers all documents to disappear locally Firestore: Deleting a couple of documents returned by a limited subcollection query triggers all documents to disappear locally Jul 17, 2018
@mikelehen mikelehen self-assigned this Jul 17, 2018
@mikelehen
Copy link
Contributor

Thank you for the repro app and video! I'll take a look and get back to you.

@wilhuff
Copy link
Contributor

wilhuff commented Jul 18, 2018

Thanks for filing this. It turns out to be the cause of this discussion. We'll treat this as the canonical issue for this bug. The problem exists on all Firestore mobile/web SDKs.

For context, when a client is listening to a limit query (and in certain other cases) and a document moves out of the query results the server is allowed to tell the client merely that a document is removed without fully describing why that was. This is an intentional ambiguity that gives us some wiggle room in the server implementation.

When this happens the client considers the document to be in limbo and attempts to figure out what happened to that document by requesting it out of band. Without this limbo resolution process, the next time the client ran the query you would see a flicker: the client would serve stale data from the cache which would disappear when the server gave up-to-date results again. This is something we try to avoid.

So with that background, the bug is in this limbo resolution process. There's a certain order of events from the server that can cause the client to mistakenly determine that a limbo document doesn't exist and this is causing the client to populate negative entries in its cache. The fix for this is in flight here: firebase/firebase-js-sdk#1014.

This alone would not be a problem because the next update from the server would fix the client's cache, but the client is also recording the commit time of the delete as the current time of the limbo resolution read, which is typically later than the last server-side modification. There are corner cases around updates to documents that don't match a query where the server is not obligated to send the absolute latest version of a document so the client will prefer its version of a document if it's later than the server issued one. This compounds the issue and causes it to be persistent. A fix that will allow the client to self-heal its cache is progress here: firebase/firebase-js-sdk#1015.

(Please don't be alarmed: we're fixing the js-sdk first because we have a cross-platform test suite written in typescript. It's easier to fix there and then port.)

@aleh
Copy link
Author

aleh commented Jul 18, 2018

@mikelehen @wilhuff Thanks for the detailed update!

wilhuff added a commit to firebase/firebase-js-sdk that referenced this issue Jul 18, 2018
This is a force fix for potential existence filter mismatches caused by
firebase/firebase-ios-sdk#1548

The essential problem is that when resuming a query, the server is
allowed to forget deletes. If the number of incorrectly synthesized
deletes matches the number of server-forgotten deletes then the
existence filter can give a false positive, preventing the cache from
self healing.

Dropping the query cache clears any client-side resume token which
prevents a false positive existence filter mismatch.

Note that the remote document cache and mutation queues are unaffected
so any cached documents that do exist will still work while offline.
wilhuff added a commit that referenced this issue Jul 19, 2018
This is a force fix for potential existence filter mismatches caused by
#1548

The essential problem is that when resuming a query, the server is
allowed to forget deletes. If the number of incorrectly synthesized
deletes matches the number of server-forgotten deletes then the
existence filter can give a false positive, preventing the cache from
self healing.

Dropping the query cache clears any client-side resume token which
prevents a false positive existence filter mismatch.

Note that the remote document cache and mutation queues are unaffected
so any cached documents that do exist will still work while offline

firebase/firebase-js-sdk#1019
wilhuff added a commit to firebase/firebase-js-sdk that referenced this issue Jul 19, 2018
* Add a schema migration that drops the query cache

This is a force fix for potential existence filter mismatches caused by
firebase/firebase-ios-sdk#1548

The essential problem is that when resuming a query, the server is
allowed to forget deletes. If the number of incorrectly synthesized
deletes matches the number of server-forgotten deletes then the
existence filter can give a false positive, preventing the cache from
self healing.

Dropping the query cache clears any client-side resume token which
prevents a false positive existence filter mismatch.

Note that the remote document cache and mutation queues are unaffected
so any cached documents that do exist will still work while offline.

* Implement review feedback
@paulb777 paulb777 added this to the 5.4.1 milestone Jul 19, 2018
wilhuff added a commit that referenced this issue Jul 19, 2018
…tes in it (#1557)

* Update spec tests from the js-sdk

* Allow remote updates from watch to heal a cache with synthesized deletes in it

Port of firebase/firebase-js-sdk#1015

Addresses #1548
@scottmas
Copy link

I see the js and ios fixes have been merged. Into master. Any ETA on when we can expect new version releases with the fix?

@mikelehen
Copy link
Contributor

@scottmas There's still one more part to the iOS fix (#1558) and then we'll work on getting a new release out with the fixes. We'll update this thread as that happens.

wilhuff added a commit that referenced this issue Jul 19, 2018
This is a force fix for potential existence filter mismatches caused by
#1548

The essential problem is that when resuming a query, the server is
allowed to forget deletes. If the number of incorrectly synthesized
deletes matches the number of server-forgotten deletes then the
existence filter can give a false positive, preventing the cache from
self healing.

Dropping the query cache clears any client-side resume token which
prevents a false positive existence filter mismatch.

Note that the remote document cache and mutation queues are unaffected
so any cached documents that do exist will still work while offline

firebase/firebase-js-sdk#1019
wilhuff added a commit that referenced this issue Jul 19, 2018
…tes in it (#1557)

* Update spec tests from the js-sdk

* Allow remote updates from watch to heal a cache with synthesized deletes in it

Port of firebase/firebase-js-sdk#1015

Addresses #1548
wilhuff added a commit that referenced this issue Jul 19, 2018
This is a force fix for potential existence filter mismatches caused by
#1548

The essential problem is that when resuming a query, the server is
allowed to forget deletes. If the number of incorrectly synthesized
deletes matches the number of server-forgotten deletes then the
existence filter can give a false positive, preventing the cache from
self healing.

Dropping the query cache clears any client-side resume token which
prevents a false positive existence filter mismatch.

Note that the remote document cache and mutation queues are unaffected
so any cached documents that do exist will still work while offline

firebase/firebase-js-sdk#1019
@wilhuff
Copy link
Contributor

wilhuff commented Jul 19, 2018

@scottmas It's now possible to try this out. Add these lines to your Podfile (replacing any existing entries for FirebaseCore and FirebaseFirestore.

pod 'FirebaseCore', :git => 'https://github.com/firebase/firebase-ios-sdk.git', :branch => 'wilhuff/pre-5.4.1'
pod 'FirebaseFirestore', :git => 'https://github.com/firebase/firebase-ios-sdk.git', :branch => 'wilhuff/pre-5.4.1'

@scottmas
Copy link

Great! What about Android?

@wilhuff
Copy link
Contributor

wilhuff commented Jul 19, 2018

That's coming, but not ready yet. I'm mostly interested in verifying that with this in place your iOS simulator with the broken calendar entries is fixed.

@scottmas
Copy link

scottmas commented Jul 19, 2018 via email

@wilhuff
Copy link
Contributor

wilhuff commented Jul 19, 2018

Could you elaborate on how it didn't work?

Note Firebase/Core is the Core subspec of the Firebase pod. It's distinct from FirebaseCore which is the source pod actually containing the code. Similarly Firebase/Firestore is distinct from FirebaseFirestore.

After pruning unrelated things out I have the following in my Podfile for a test app and I'm able to build successfully:

target 'mcg-playground' do
  use_frameworks!

  pod 'FirebaseCore', :git => 'https://github.com/firebase/firebase-ios-sdk.git', :branch => 'wilhuff/pre-5.4.1'
  pod 'FirebaseFirestore', :git => 'https://github.com/firebase/firebase-ios-sdk.git', :branch => 'wilhuff/pre-5.4.1'

  pod 'Firebase/Core'
  pod 'Firebase/Firestore'
  pod 'Firebase/Auth'
end

@wilhuff
Copy link
Contributor

wilhuff commented Jul 20, 2018

This has been released in Firebase JavaScript SDK 5.3.0 and Firebase iOS SDK 5.4.1. Android will follow separately.

@scottmas
Copy link

scottmas commented Jul 20, 2018 via email

schmidt-sebastian added a commit to firebase/firebase-js-sdk that referenced this issue Jul 25, 2018
* Add @davideast as a CODEOWNER (#996)

* Embed metadata directly into the RPC call (#979)

* Embed metadata directly into the RPC call

* [AUTOMATED]: Prettier Code Styling

* Use ...args

* [AUTOMATED]: Prettier Code Styling

* Minimize diff

* Add the OAuth assertion back in

* Added missing type for optional database url. (#1001)

* RxFire Realtime Database (#997)

* initial database code

* test setup

* database tests

* auditTrail and database tests

* [AUTOMATED]: Prettier Code Styling

* [AUTOMATED]: License Headers

* Josh's comments. Database docs

* [AUTOMATED]: Prettier Code Styling

* Firestore docs

* auth docs

* declaration fixes

* switch to peerDeps

* [AUTOMATED]: Prettier Code Styling

* test config

* Expose array transforms and array contains queries. (#1004)

Also remove test code that was combining multiple array contains queries since those were disallowed in 04c9c3a.

* Move fieldFilter (free function) to Filter.create() (#988)

This is a refactoring to unify filter creation across platforms.

* Enable firestore sdk to talk to emulator (#1007)

* Enable firestore sdk to talk to emulator

* [AUTOMATED]: Prettier Code Styling

* Revert firestore sdk changes

* [AUTOMATED]: Prettier Code Styling

* Revert credentials.ts

* Cleanup

* [AUTOMATED]: Prettier Code Styling

* Set webSafe=false

* Combine initializeTestApp and initializeFirestoreTestApp

* [AUTOMATED]: Prettier Code Styling

* Cleanup

* [AUTOMATED]: Prettier Code Styling

* Update major version since this is a breaking change that will cause the testing sdk to no longer work with old versions of the RTDB emulator

* Completely remove admin sdk

* Change version back to 0.1.0

* Setting GarbageSource in SyncEngine's constructor (#1010)

* b/72533250: Fix issue with limbo resolutions triggering incorrect manufactured deletes. (#1014)

This fixes an issue occurring when a limbo target receives a documentUpdate,
then a global snapshot, and then a CURRENT. Because there was a global
snapshot before the CURRENT, WatchChangeAggregator has no pending document
updates and calls SyncEngine.targetContainsDocument() to see if we previously got any
document from the backend for the target. See:
https://github.com/firebase/firebase-js-sdk/blob/6905339235ad801291edc696dd75a08e80647f5b/packages/firestore/src/remote/watch_change.ts#L422

Prior to this change, targetContainsDocument() returned false because
it relies on our Views to track the contents of the target, and we don't
have Views for limbo targets. Thus WatchChangeAggregator incorrectly
manufactures a NoDocument document update which deletes data from our
cache.

The fix is to have SyncEngine track the fact that we did indeed get
a document for the limbo resolution and return true from
targetContainsDocument().

* Updating yarn.lock

* Add @firebase/util as a dep of @firebase/testing

* Allow remote updates from watch to heal a cache with synthesized deletes in it (#1015)

* Write a spec test for the busted cache

* Modify spec test to demonstrate deletedDoc issue. (#1017)

* Allow updates for targets where the document is modified

* Fix getRemoteKeysForTarget() method name in comment. (#1020)

While porting I noticed this was slightly wrong. targetContainsDocument() is the method in WatchChangeAggregator. The SyncEngine method I meant to reference is getRemoteKeysForTarget().

* Making sure we don't export 'experimental' (#1023)

* Add a schema migration that drops the query cache (#1019)

* Add a schema migration that drops the query cache

This is a force fix for potential existence filter mismatches caused by
firebase/firebase-ios-sdk#1548

The essential problem is that when resuming a query, the server is
allowed to forget deletes. If the number of incorrectly synthesized
deletes matches the number of server-forgotten deletes then the
existence filter can give a false positive, preventing the cache from
self healing.

Dropping the query cache clears any client-side resume token which
prevents a false positive existence filter mismatch.

Note that the remote document cache and mutation queues are unaffected
so any cached documents that do exist will still work while offline.

* Implement review feedback

* Add a release note for the fix to #1548 (#1024)

* Ensure that we create an empty TargetGlobal row. (#1029)

Ensure the v3 migration unconditionally creates the TargetGlobal row. Remove the no-longer-necessary v2 schema migration.

* Remove unnecessary `any` (#1030)

* Fix an errant any usage

* [AUTOMATED]: Prettier Code Styling

* Publish [email protected]

* Unify local.QueryData with the other platforms (#1027)

This makes it line up with it's own docs, and also the other platforms.

* Fix to #1027 to allow SnapshotVersion == 0 (#1033)

* Add iat to fake access token payload (#1022)

* Add iat to fake access token payload

* [AUTOMATED]: Prettier Code Styling

* Simpler tests

* [AUTOMATED]: Prettier Code Styling

* Do not clobber iat

* catch server error RESET_PASSWORD_EXCEED_LIMIT (#1037)

* Merging Master into Multi-Tab (#1038)
@firebase firebase locked and limited conversation to collaborators Oct 30, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants