Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow starting readwrite transactions while readonly transactions are running? #253

Closed
inexorabletash opened this issue Jan 18, 2019 · 11 comments · Fixed by #319
Closed

Comments

@inexorabletash
Copy link
Member

The text in transaction lifetime about "when a transaction can be started" doesn't define an algorithm (unlike e.g. web locks). Instead, it just defines preconditions. Specifically for this case:

  • ... As long as a read-only transaction is running, the data that the implementation returns through requests created with that transaction must remain constant ...

Which is then followed by the note:

There are a number of ways that an implementation can ensure this. The implementation could prevent any read/write transaction, whose scope overlaps the scope of the read-only transaction, from starting until the read-only transaction finishes. Or the implementation could allow the read-only transaction to see a snapshot of the contents of the object stores which is taken when the read-only transaction started.

Chrome implemented snapshots, and therefore allowed a readwrite transaction to start even while a readonly transaction overlapping scope was still running. e.g.

const tx1 = db.transaction('store', 'readonly');
const tx2 = db.transaction('store', 'readwrite');

Both tx1 and tx2 could start immediately; operations within tx1 would see a snapshot of the data.

Other engines (Firefox, Safari, Edge) did not implement this; tx2 would not start until tx1 finished.

Chrome is changing this as part of a rework of the implementation; we actually had a non-WPT test asserting this behavior, which broke.

Following this change, all engines will behave the same way - the readonly transactions must finish before the readwrite transaction can start. Should we codify that in the spec and WPTs?

@pwnall
Copy link

pwnall commented Jan 18, 2019

I'm conflicted between the two. I'll explain my arguments for both sides, to help the decision.

No ordering

As a user, the tutorials and MDN guides I have access to don't describe the ordering guarantees. For example, purely based on the guides, I didn't know that I can rely on requests completing in order. For example, when I had to fire 3 put requests, I wrote code waiting for all 3 to complete before proceeding. I didn't realize that I can just wait for the last request. I claim that most folks writing IndexedDB code operate in a similar mode.

As an implementer, the request ordering costs us extra complexity and will cost us extra memory. AFAIK the Cache Storage API has ran into a similar problem, and they decided to sacrifice ordering for performance. I definitely wish we didn't have ordering constraints -- user code that needs ordering can add extra coordination, and code that doesn't could squeeze out a bit more performance.

So, I think we shouldn't add extra ordering and, instead, we should call out that the ordering is implementation-dependent, but must be consistent with some ordering where readonly and readwrite don't overlap. To be clear, I don't know of a concrete setup where this would be a net-benefit... I'm just thinking about not over-constraining the future.

Ordering

Having strict ordering requirements and associated tests means we'll have consistency across IndexedDB implementations, increasing the odds that content will work well across browsers.

On the testing front -- after Chrome removes its snapshotting system, we won't have any implementation that does snapshots. So, we won't have any guarantee that WPT tests won't accidentally start depending on ordering anyways.

The Cache Storage API parallel isn't perfect. In their case, prioritizing performance is easily the right call, because Cache Storage reads generally block page loads. IndexedDB is used more often outside of critical paths.

@vweevers
Copy link

Chrome is changing this as part of a rework of the implementation

Is Chrome making this change merely to align with other engines, because of technical limitations of the rework, or because the new behavior is deemed better (and if so, why?).

I'm curious, because being able to (slowly) read and write simultaneously, and for reads to not be affected by writes, seems like a fairly fundamental requirement for a database.

@inexorabletash
Copy link
Member Author

Is Chrome making this change merely to align with other engines, because of technical limitations of the rework, or because the new behavior is deemed better (and if so, why?).

The implementation ends up being slightly simpler. The new behavior "falls out" of the new implementation. We could maintain Chrome's current behavior with additional code, which would not be too burdensome, but it's not clear there's a good reason to do so. (Anyone relying on it would have cross-browser bugs.) We'll be monitoring metrics to see if it affects performance of sites in a measurable way.

@vweevers
Copy link

Anyone relying on it would have cross-browser bugs

How so? Is that based on the assumption that people are relying on the ordering of transactions? Like @pwnall is describing, I did not know I could rely on the order, nor would I (if and when I'd need it, I would handle such a thing in the application).

We'll be monitoring metrics to see if it affects performance of sites in a measurable way.

IMO IndexedDB has never been optimal for large amounts of data (with backpressure), which means you'll have a "blind spot" in the spectrum of use cases. If no one uses IndexedDB like that (because they can't, not because they don't want to) then metrics will not show a performance degradation.

@pwnall
Copy link

pwnall commented Jan 18, 2019

@vweevers The "traditional" database systems I've looked at use some form of locking, so at least some reads will be blocked on some writes. SQLite only has one database-level S/X lock, and a bunch of systems only implement table-level locking. I don't consider row-level locking or snapshotting to be "fundamental" requirements for a DB system, and I think that SQLite's adoption proves this to be correct.

In other words, given the available embedded database systems, I'd strongly object to making snapshots (or any other form of concurrent reads and writes) required in IndexedDB.

If your application relies on concurrent reads and writes to the same store, I'm guessing you have large transactions. You'll enjoy better performance on all browser engines if you break up your writes into smaller batches. This will also give your read transactions an opportunity to get scheduled.

@vweevers
Copy link

so at least some reads will be blocked on some writes.

That's fine. Completely different from blocking all reads or writes.

In other words, given the available embedded database systems, I'd strongly object to making snapshots (or any other form of concurrent reads and writes) required in IndexedDB.

There aren't many available embedded database systems in browsers with persistent storage. Emscripten has the IDBFS file system which (full-circle) uses IndexedDB, and WORKERFS, which is only available in workers. I'm gonna be looking into this though. Particularly interested in the wasm+leveldb option that @inexorabletash mentioned elsewhere.

You'll enjoy better performance on all browser engines if you break up your writes into smaller batches. This will also give your read transactions an opportunity to get scheduled.

Yeah, batching writes is how I managed to get 10+ million writes working in a now-discontinued Chrome extension, where the background page was silently synchronizing data with a backend while other contexts were reading from the same store. Thanks to snapshots, I did not have to worry about the order of transactions or what other contexts were doing in the mean time. I still had to work around the limited lifetime of transactions though, having to make a choice between snapshot guarantees and proper backpressure.

@asutherland
Copy link
Collaborator

As things stand, from a standards-perspective, I think the obvious right thing to do is to codify that the readonly transaction must finish before the readwrite can start. If all code will be written on implementations that assume the tx1 and tx2 transactions and their application logic can't interleave, then it's almost certain there will be emergent application bugs when they can interleave.

However, at TPAC 2018 in the ServiceWorkers meeting, there was the ongoing issue of how SWs should deal with complicated issues like LRU eviction from the Cache API. And a serious proposal was that perhaps we should turn to IndexedDB in those cases (see w3c/ServiceWorker#863 (comment)), enabling it to persist Request and Response objects. Its indices enable all sorts of custom logic and lets us avoid complicating the Cache API. I think it makes sense to make sure there is a game-plan around the realities of that use-case before changing the spec for this related issue.

If we pursue that course of action, it's likely that we'd have to face the reality of a SW that wants to serve pre-readwrite-transaction data while there's an ongoing readwrite-transaction that's updating the database. In those cases, it wouldn't be acceptable for "fetch" events to be stalled by a potentially long-running "readwrite" transaction. A snapshot based mechanism would seem helpful. However, that could also just be a new type of transaction with new semantics that are "any 'readonly' transactions opened before the 'fancy' transaction completes will see the database's state before the 'fancy' transaction was opened."

(Existing "readonly" transactions wouldn't really work, even with snapshot required, because they would have to be opened prior to the "readwrite" transaction and then spammed with read requests to keep the transaction open under the existing transaction model. Currently we require any transaction created after the "readwrite" one to see its changes and therefore block on it.)

@dmurph
Copy link

dmurph commented Jan 25, 2019

If the expectation is that these IndexedDB cache-api calls are going to have multiple...cycles? (if they are going to issue an operation, wait until the result, and then issue another operation) - then I can see how the write-while-reading pattern could slow things down.

However, if we are only expecting these IndexedDB transactions to be a single 'get' or a single 'put', then I believe the API user could achieve optimal performance by calling 'commit()' on all of their transactions. I could be misunderstanding this though.

@asutherland
Copy link
Collaborator

I guess I'm picturing 2 primary use-cases:

  1. Something like LRU eviction that operates on streaming processing of the current contents of the database where the reads use cursors/similar to perform mutations in bite-sized pieces. This could be a fairly large amount of data and take a fairly long time.
  2. Complicated upgrade scenarios that may require IndexedDB enhancements.

Complicated Upgrade Scenarios

I think the idealized upgrade scenario for ServiceWorkers as naturally flows from the spec is that the SW script is changed whenever an upgrade needs to happen and it populates a new CacheStorage cache asynchronously during a waitUntil()'ed "install". Normal activation happens or a complicated dance with skipWaiting()/claim() happens. The newly activated ServiceWorker serves out of the new cache.

In practice, it seems like many ServiceWorker authors do not want to change the ServiceWorker script when they make changes to their app. Instead, they want the ServiceWorker to be parameterized. This can work with the above, but one problem is that if the cache name/IDB name isn't something hardcoded into the SW script that gets updated, a mechanism like w3c/ServiceWorker#1331 that's synchronously available is desired so the right cache/db can be opened. Or there needs to be a way to pivot the new cache/database to replace the old database.

Pivoting from old to new storage could happen in a few ways:

  • Atomically renaming caches / IDB databases. This is not yet a thing.
  • Use a lock or transaction to stall new requests while the new, already-fetched state is propagated into the "current" database. The IDB transaction model doesn't quite work here without jumping through some hoops to spam the target "readwrite" database with no-op requests if the source isn't delivering data fast enough. This could all be made less ugly if IDB started supporting waitUntil()/web-locks, but then things could end up blocking for quite some time. So then a snapshot mechanism potentially looks more attractive, especially because it could also simplify the update process.
  • Something involving storage buckets. Be able to clone/fork them and/or rename them.
  • Addressing A simple globally available store of data per-registration ServiceWorker#1331 with some means of having a per-ServiceWorker or per-registration store that has very good performance characteristics (synchronous or guaranteed <10ms fast async) so that people use it to identify the current db/cache names.

Tying this back to the issue, I think it would be good to have a general consensus about what directions we expect to move the IndexedDB spec to support storing Requests/Responses, and how large or over-engineered sites will deal with the upgrade mechanism. I personally like the idea of having SW's create custom rename-able storage buckets that we are able to tie to their registrations (which helps with automatic cleanup), and giving them the option to fork existing storage buckets as a starting point. Clever IDB/Cache implementations could use snapshot/copy-on-write/unionfs/refcounting/whatever mechanisms under the hood, whereas naive implementations could just brute force copy the data across. And we'd never need to expose snapshot semantics to IDB.

@inexorabletash
Copy link
Member Author

Chrome no longer has snapshots here, per @dmurph

@inexorabletash
Copy link
Member Author

TPAC 2019 Web Apps Indexed DB triage notes:

Since behaviors align across browsers, we should spec it.

@inexorabletash inexorabletash added this to the V3 milestone Sep 20, 2019
inexorabletash added a commit that referenced this issue Feb 27, 2020
All implementations have a strict ordering of transactions with
overlapping scopes; read-only transactions can run in parallel but
block a later read/write transaction from starting, and the read/write
transactions similarly block later read/write and read-only
transactions.

Tighten up the constraints definition to something precise, and move
the wordy implications into a non-normative aside.

Also, define "overlapping scopes" as a term.

Closes #253
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this issue Feb 29, 2020
In service of w3c/IndexedDB#253 move some
transaction scheduling tests from Blink to WPT.

This involved converting them from js-test.js to testharness.js, but
the overall logic of each test was retained.

This also adds one new test which verifies the change described
in w3c/IndexedDB#319 - all browsers implicitly
block R-O transactions behind overlapping R/W transactions.

Change-Id: I596aaa75b79bf3bf3e17a2553abb4e11329d59ab
inexorabletash added a commit that referenced this issue Mar 2, 2020
All implementations have a strict ordering of transactions with
overlapping scopes; read-only transactions can run in parallel but
block a later read/write transaction from starting, and the read/write
transactions similarly block later read/write and read-only
transactions.

Tighten up the constraints definition to something precise, and move
the wordy implications into a non-normative aside.

Also, define "overlapping scopes" as a term.

Closes #253
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this issue Mar 3, 2020
In service of w3c/IndexedDB#253 move some
transaction scheduling tests from Blink to WPT.

This involved converting them from js-test.js to testharness.js, but
the overall logic of each test was retained.

This also adds one new test which verifies the change described
in w3c/IndexedDB#319 - all browsers implicitly
block R-O transactions behind overlapping R/W transactions.

Change-Id: I596aaa75b79bf3bf3e17a2553abb4e11329d59ab
Bug: 921193
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this issue Mar 3, 2020
In service of w3c/IndexedDB#253 move some
transaction scheduling tests from Blink to WPT.

This involved converting them from js-test.js to testharness.js, but
the overall logic of each test was retained.

This also adds one new test which verifies the change described
in w3c/IndexedDB#319 - all browsers implicitly
block R-O transactions behind overlapping R/W transactions.

Change-Id: I596aaa75b79bf3bf3e17a2553abb4e11329d59ab
Bug: 921193
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2081237
Reviewed-by: Daniel Murphy <[email protected]>
Commit-Queue: Joshua Bell <[email protected]>
Auto-Submit: Joshua Bell <[email protected]>
Cr-Commit-Position: refs/heads/master@{#746553}
blueboxd pushed a commit to blueboxd/chromium-legacy that referenced this issue Mar 3, 2020
In service of w3c/IndexedDB#253 move some
transaction scheduling tests from Blink to WPT.

This involved converting them from js-test.js to testharness.js, but
the overall logic of each test was retained.

This also adds one new test which verifies the change described
in w3c/IndexedDB#319 - all browsers implicitly
block R-O transactions behind overlapping R/W transactions.

Change-Id: I596aaa75b79bf3bf3e17a2553abb4e11329d59ab
Bug: 921193
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2081237
Reviewed-by: Daniel Murphy <[email protected]>
Commit-Queue: Joshua Bell <[email protected]>
Auto-Submit: Joshua Bell <[email protected]>
Cr-Commit-Position: refs/heads/master@{#746553}
stephenmcgruer pushed a commit to web-platform-tests/wpt that referenced this issue Mar 4, 2020
In service of w3c/IndexedDB#253 move some
transaction scheduling tests from Blink to WPT.

This involved converting them from js-test.js to testharness.js, but
the overall logic of each test was retained.

This also adds one new test which verifies the change described
in w3c/IndexedDB#319 - all browsers implicitly
block R-O transactions behind overlapping R/W transactions.

Change-Id: I596aaa75b79bf3bf3e17a2553abb4e11329d59ab
Bug: 921193
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2081237
Reviewed-by: Daniel Murphy <[email protected]>
Commit-Queue: Joshua Bell <[email protected]>
Auto-Submit: Joshua Bell <[email protected]>
Cr-Commit-Position: refs/heads/master@{#746553}

Co-authored-by: Joshua Bell <[email protected]>
inexorabletash added a commit that referenced this issue Mar 4, 2020
All implementations have a strict ordering of transactions with
overlapping scopes; read-only transactions can run in parallel but
block a later read/write transaction from starting, and the read/write
transactions similarly block later read/write and read-only
transactions.

Tighten up the constraints definition to something precise, and move
the wordy implications into a non-normative aside.

Also, define "overlapping scopes" as a term.

Closes #253
xeonchen pushed a commit to xeonchen/gecko that referenced this issue Mar 7, 2020
…cheduling tests, a=testonly

Automatic update from web-platform-tests
WPT: Upstream and add some transaction scheduling tests (#22027)

In service of w3c/IndexedDB#253 move some
transaction scheduling tests from Blink to WPT.

This involved converting them from js-test.js to testharness.js, but
the overall logic of each test was retained.

This also adds one new test which verifies the change described
in w3c/IndexedDB#319 - all browsers implicitly
block R-O transactions behind overlapping R/W transactions.

Change-Id: I596aaa75b79bf3bf3e17a2553abb4e11329d59ab
Bug: 921193
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2081237
Reviewed-by: Daniel Murphy <[email protected]>
Commit-Queue: Joshua Bell <[email protected]>
Auto-Submit: Joshua Bell <[email protected]>
Cr-Commit-Position: refs/heads/master@{#746553}

Co-authored-by: Joshua Bell <[email protected]>

--

wpt-commits: 5e1160e543827227c05b10c7660795436a76b307
wpt-pr: 22027
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Mar 7, 2020
…cheduling tests, a=testonly

Automatic update from web-platform-tests
WPT: Upstream and add some transaction scheduling tests (#22027)

In service of w3c/IndexedDB#253 move some
transaction scheduling tests from Blink to WPT.

This involved converting them from js-test.js to testharness.js, but
the overall logic of each test was retained.

This also adds one new test which verifies the change described
in w3c/IndexedDB#319 - all browsers implicitly
block R-O transactions behind overlapping R/W transactions.

Change-Id: I596aaa75b79bf3bf3e17a2553abb4e11329d59ab
Bug: 921193
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2081237
Reviewed-by: Daniel Murphy <[email protected]>
Commit-Queue: Joshua Bell <[email protected]>
Auto-Submit: Joshua Bell <[email protected]>
Cr-Commit-Position: refs/heads/master@{#746553}

Co-authored-by: Joshua Bell <[email protected]>

--

wpt-commits: 5e1160e543827227c05b10c7660795436a76b307
wpt-pr: 22027
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this issue Mar 9, 2020
…cheduling tests, a=testonly

Automatic update from web-platform-tests
WPT: Upstream and add some transaction scheduling tests (#22027)

In service of w3c/IndexedDB#253 move some
transaction scheduling tests from Blink to WPT.

This involved converting them from js-test.js to testharness.js, but
the overall logic of each test was retained.

This also adds one new test which verifies the change described
in w3c/IndexedDB#319 - all browsers implicitly
block R-O transactions behind overlapping R/W transactions.

Change-Id: I596aaa75b79bf3bf3e17a2553abb4e11329d59ab
Bug: 921193
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2081237
Reviewed-by: Daniel Murphy <dmurphchromium.org>
Commit-Queue: Joshua Bell <jsbellchromium.org>
Auto-Submit: Joshua Bell <jsbellchromium.org>
Cr-Commit-Position: refs/heads/master{#746553}

Co-authored-by: Joshua Bell <inexorabletashgmail.com>

--

wpt-commits: 5e1160e543827227c05b10c7660795436a76b307
wpt-pr: 22027

UltraBlame original commit: 6b1929e476680a2aafc78037914b756199333f09
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this issue Mar 9, 2020
…cheduling tests, a=testonly

Automatic update from web-platform-tests
WPT: Upstream and add some transaction scheduling tests (#22027)

In service of w3c/IndexedDB#253 move some
transaction scheduling tests from Blink to WPT.

This involved converting them from js-test.js to testharness.js, but
the overall logic of each test was retained.

This also adds one new test which verifies the change described
in w3c/IndexedDB#319 - all browsers implicitly
block R-O transactions behind overlapping R/W transactions.

Change-Id: I596aaa75b79bf3bf3e17a2553abb4e11329d59ab
Bug: 921193
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2081237
Reviewed-by: Daniel Murphy <dmurphchromium.org>
Commit-Queue: Joshua Bell <jsbellchromium.org>
Auto-Submit: Joshua Bell <jsbellchromium.org>
Cr-Commit-Position: refs/heads/master{#746553}

Co-authored-by: Joshua Bell <inexorabletashgmail.com>

--

wpt-commits: 5e1160e543827227c05b10c7660795436a76b307
wpt-pr: 22027

UltraBlame original commit: 6b1929e476680a2aafc78037914b756199333f09
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this issue Mar 10, 2020
…cheduling tests, a=testonly

Automatic update from web-platform-tests
WPT: Upstream and add some transaction scheduling tests (#22027)

In service of w3c/IndexedDB#253 move some
transaction scheduling tests from Blink to WPT.

This involved converting them from js-test.js to testharness.js, but
the overall logic of each test was retained.

This also adds one new test which verifies the change described
in w3c/IndexedDB#319 - all browsers implicitly
block R-O transactions behind overlapping R/W transactions.

Change-Id: I596aaa75b79bf3bf3e17a2553abb4e11329d59ab
Bug: 921193
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2081237
Reviewed-by: Daniel Murphy <dmurphchromium.org>
Commit-Queue: Joshua Bell <jsbellchromium.org>
Auto-Submit: Joshua Bell <jsbellchromium.org>
Cr-Commit-Position: refs/heads/master{#746553}

Co-authored-by: Joshua Bell <inexorabletashgmail.com>

--

wpt-commits: 5e1160e543827227c05b10c7660795436a76b307
wpt-pr: 22027

UltraBlame original commit: 6b1929e476680a2aafc78037914b756199333f09
inexorabletash added a commit that referenced this issue Feb 1, 2021
All implementations have a strict ordering of transactions with
overlapping scopes; read-only transactions can run in parallel but
block a later read/write transaction from starting, and the read/write
transactions similarly block later read/write and read-only
transactions.

Tighten up the constraints definition to something precise, and move
the wordy implications into a non-normative aside.

Also, define "overlapping scopes" as a term.

Closes #253
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants