Bucket hooks #18

annevk · 2016-03-31T16:38:18Z

@jakearchibald @jungkees hey! I was wondering what kind of hooks you need to make it clear e.g., service worker registrations and the Cache API are stored in a box.

In #4 we are discussing the cleanup steps for when a box gets closed, but maybe we should also have formal language for actually storing something inside?

annevk · 2017-11-20T13:59:43Z

On IRC Jake suggested that we could just have "bucket has an associated X" where X could be service worker registrations and such. This assumes that clearing a bucket replaces it with a new one (effectively allowing X to be GC'd, as there are no more references to it). Is that the model we want? Currently we just say a bucket is cleared.

One problem is that we'd have to copy some state over from the old bucket, such as persistence and potentially more in the future once we start expanding the concept. At least, I think if you clear, you don't necessarily expect to have to invoke persist() again.

Thoughts?

An alternative is that a bucket has something like a specification-level GetStorageHandler(Identifier, optional ClearCallback) operation that returns a StorageHandler in which you can store stuff.

cc @inexorabletash @mikewest

jakearchibald · 2017-11-20T14:54:29Z

> An alternative is that a bucket has something like a specification-level GetStorageHandler(Identifier, optional ClearCallback) operation that returns a StorageHandler in which you can store stuff.

"A bucket has storages", and it's the storages that become detached.

Adding callback steps for cleanup is fine unless the order becomes observable.

annevk · 2017-11-20T15:10:56Z

I think the order will be observable given the combination of navigator.storage.clear(), IDB, and the Cache API. Probably also with other APIs.

https://w3c.github.io/webappsec-clear-site-data/#abstract-opdef-clear-dom-accessible-storage-for-origin deals with this through enumeration (though doesn't list the Cache API). My idea with the identifier was that we'd first sort lexicographically and then invoke the ClearCallback, but perhaps it's better to just list everything in the Storage Standard and require it to be updated as new things are added.
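A minimal sketch of the identifier-plus-callback idea, assuming a bucket keeps one handler per identifier and invokes the clear callbacks in sorted identifier order. `BucketSketch`, `getStorageHandler`, and `clear` are hypothetical names for illustration only; none of them is defined by the Storage Standard.

```typescript
// Hypothetical illustration; not an API from any specification.
type ClearCallback = () => void;

interface HandlerEntry {
  data: Map<string, unknown>;
  onClear?: ClearCallback;
}

class BucketSketch {
  private handlers = new Map<string, HandlerEntry>();

  // Loosely mirrors GetStorageHandler(Identifier, optional ClearCallback).
  getStorageHandler(identifier: string, onClear?: ClearCallback): Map<string, unknown> {
    let entry = this.handlers.get(identifier);
    if (entry === undefined) {
      entry = { data: new Map<string, unknown>(), onClear };
      this.handlers.set(identifier, entry);
    }
    return entry.data;
  }

  // Clears every handler in sorted identifier order, so the observable
  // order does not depend on which API registered first.
  clear(): string[] {
    // The default sort compares strings by UTF-16 code units.
    const order = Array.from(this.handlers.keys()).sort();
    order.forEach((identifier) => {
      const entry = this.handlers.get(identifier)!;
      entry.data.clear();
      entry.onClear?.();
    });
    return order;
  }
}
```

Listing every storage mechanism explicitly in one place, as discussed next, would replace the sort with a fixed, specified order.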

jakearchibald · 2017-11-20T15:12:16Z

> perhaps it's better to just list everything in the Storage Standard and require it to be updated as new things are added.

That seems fine. Doesn't hurt to have all origin storage referenced from one place.

mikewest · 2017-11-21T07:22:16Z

> perhaps it's better to just list everything in the Storage Standard and require it to be updated as new things are added.

I'd agree that this is the right approach. Clear-Site-Data would be better if it deferred to Storage, rather than requiring additional enumeration of storage mechanisms.

annevk · 2020-04-16T16:16:46Z

FWIW, I have the feeling I'm missing a simpler solution here and as you can tell this is very much a sketch. Would love to hear your thoughts.

The idea here is to define existing storage APIs, such as service workers and localStorage, on top of these primitives so we get a well-defined Clear-Site-Data and hopefully some other benefits too. I suspect this architecture might also work for the Storage Access API in due course, though it depends a bit on how all that will pan out.

Storage APIs (e.g., localStorage) need to define:

- A storage identifier (a string), e.g., "localStorage". (These should match those of UsageDetails from https://github.com/whatwg/storage/pull/69/files.)
- A replace algorithm to abort transactions or some such in the event of storage bucket replacement. (Could be nothing if there's no cleanup to be done.)
- They need to invoke the "obtain a storage bucket area map" algorithm (outlined below) for environments that end up using the API and use the returned map as the place to store all their data.

The Storage Standard needs to define:

A registry of all storage identifiers and an easy way to get from one to its corresponding replace algorithm.

A storage bucket holds a map of storage identifiers to storage areas.

A storage area is a struct consisting of a map and a proxy map pointer set.

(The idea is that storage area's map holds the actual storage. It's in a map because those are easy to work with. How the map is persisted is implementation-defined. How to make it available across process boundaries is implementation-defined.)

A proxy map has identical operations to a map and performs those on its underlying map.

(We hand out a proxy map to a storage API so we can replace the actual map behind the scenes.)
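The structures described above could be modeled roughly as follows. This is a sketch under stated assumptions: the type and field names (`ProxyMap`, `underlying`, `proxyMaps`, `areas`) are made up for illustration, and an in-memory `Map` stands in for storage whose persistence is implementation-defined.

```typescript
// Hypothetical model of the sketch's data structures; all names are illustrative.

// A proxy map offers map operations and forwards them to an underlying map
// that can be swapped out behind the scenes.
class ProxyMap {
  underlying: Map<string, unknown>;
  constructor(underlying: Map<string, unknown>) {
    this.underlying = underlying;
  }
  get(key: string): unknown { return this.underlying.get(key); }
  set(key: string, value: unknown): void { this.underlying.set(key, value); }
  has(key: string): boolean { return this.underlying.has(key); }
  delete(key: string): boolean { return this.underlying.delete(key); }
  get size(): number { return this.underlying.size; }
}

// A storage area: the map holding the actual data, plus the set of proxy
// maps that currently point at it.
interface StorageArea {
  map: Map<string, unknown>;
  proxyMaps: Set<ProxyMap>;
}

// A storage bucket: a map of storage identifiers to storage areas.
interface StorageBucket {
  areas: Map<string, StorageArea>;
}
```

Because a storage API only ever holds the proxy, reassigning `underlying` is enough to point it at a fresh map without the API having to re-fetch anything.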

New algorithms:

To obtain a storage bucket area map, given a storage identifier identifier and an environment environment, run these steps:

1. Let key be the result of obtaining a storage key from environment. (This should be less hand-wavy.)
2. Let bucket be the storage bucket for key. (This should be less hand-wavy.)
3. Let storageArea be bucket's map[identifier].
4. Let proxyMap be a new proxy map.
5. Append a pointer to proxyMap to storageArea's proxy map pointer set.
6. Set proxyMap's underlying map to storageArea's map.
7. Return proxyMap.

(The above algorithm is intended for storage APIs. They would invoke this upon initialization to get a map to store things in.)
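The "obtain a storage bucket area map" steps can be sketched as below. Several parts are assumptions made only to get something runnable: the storage key is taken to be the environment's origin string, buckets are created lazily in a module-level `bucketsByKey` map, and `STORAGE_IDENTIFIERS` stands in for the registry of storage identifiers. All names are hypothetical.

```typescript
// Hypothetical sketch; the key and bucket lookups are deliberately simplistic.
class ProxyMap {
  underlying: Map<string, unknown>;
  constructor(underlying: Map<string, unknown>) {
    this.underlying = underlying;
  }
  get(key: string): unknown { return this.underlying.get(key); }
  set(key: string, value: unknown): void { this.underlying.set(key, value); }
}

interface StorageArea { map: Map<string, unknown>; proxyMaps: Set<ProxyMap>; }
interface StorageBucket { areas: Map<string, StorageArea>; }
interface Environment { origin: string; }

// Stand-in for the registry of storage identifiers (assumed contents).
const STORAGE_IDENTIFIERS = ["caches", "indexedDB", "localStorage"];
const bucketsByKey = new Map<string, StorageBucket>();

function createBucket(): StorageBucket {
  const areas = new Map<string, StorageArea>();
  STORAGE_IDENTIFIERS.forEach((identifier) => {
    areas.set(identifier, { map: new Map<string, unknown>(), proxyMaps: new Set<ProxyMap>() });
  });
  return { areas };
}

function obtainStorageBucketAreaMap(identifier: string, environment: Environment): ProxyMap {
  // Steps 1 and 2: key and bucket (assumption: the key is the origin string).
  const key = environment.origin;
  let bucket = bucketsByKey.get(key);
  if (bucket === undefined) {
    bucket = createBucket();
    bucketsByKey.set(key, bucket);
  }
  // Step 3: look up the storage area for this identifier.
  const storageArea = bucket.areas.get(identifier);
  if (storageArea === undefined) {
    throw new Error(`unregistered storage identifier: ${identifier}`);
  }
  // Steps 4 to 7: create the proxy, record it, and hand it out.
  const proxyMap = new ProxyMap(storageArea.map);
  storageArea.proxyMaps.add(proxyMap);
  return proxyMap;
}
```

Two callers with the same origin and identifier get distinct proxies over the same underlying map, so a write through one is visible through the other.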

To replace a storage bucket old with a storage bucket new, run these steps:

1. Atomically:
  1. Replace old with new. (This should be less hand-wavy and probably talk about the site storage unit.)
  2. For each identifier → storageArea of old's map:
    1. For each proxyMapPointer of storageArea's proxy map pointer set:
      1. Let newStorageArea be new's map[identifier].
      2. Set the value of proxyMapPointer's underlying map to newStorageArea's map.
      3. Append proxyMapPointer to newStorageArea's proxy map pointer set.
2. For each impacted agent of ...: (This should be less hand-wavy.)
  1. Queue a task to:
    1. For each identifier of ...:
      1. Run identifier's corresponding replace algorithm with ....

(There are a couple of things that need to be filled out here, including what kind of details the replace algorithm might need to clean up the relevant APIs.)
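A rough sketch of the replacement steps, under several assumptions: a synchronous function stands in for "atomically", a plain array (`taskQueue`) stands in for queuing a task on each impacted agent, replace algorithms take no arguments, and identifiers missing from the new bucket are skipped. The parts the algorithm leaves open are not filled in here, and every name is hypothetical.

```typescript
// Hypothetical sketch of bucket replacement; names are illustrative only.
class ProxyMap {
  underlying: Map<string, unknown>;
  constructor(underlying: Map<string, unknown>) {
    this.underlying = underlying;
  }
  get(key: string): unknown { return this.underlying.get(key); }
  set(key: string, value: unknown): void { this.underlying.set(key, value); }
}

interface StorageArea { map: Map<string, unknown>; proxyMaps: Set<ProxyMap>; }
type StorageBucket = Map<string, StorageArea>;

// Stand-in for "queue a task": the caller drains this array later.
const taskQueue: Array<() => void> = [];

function replaceStorageBucket(
  buckets: Map<string, StorageBucket>,
  key: string,
  next: StorageBucket,
  replaceAlgorithms: Map<string, () => void>,
): void {
  const old = buckets.get(key);
  if (old === undefined) throw new Error(`no storage bucket for key: ${key}`);

  // "Atomically": everything up to the task queuing runs synchronously.
  buckets.set(key, next);
  old.forEach((storageArea, identifier) => {
    const newStorageArea = next.get(identifier);
    if (newStorageArea === undefined) return; // assumption: nothing to repoint
    storageArea.proxyMaps.forEach((proxyMap) => {
      // Existing proxies now read from and write to the new, empty map.
      proxyMap.underlying = newStorageArea.map;
      newStorageArea.proxyMaps.add(proxyMap);
    });
  });

  // Afterwards, schedule each identifier's replace algorithm.
  old.forEach((_storageArea, identifier) => {
    const algorithm = replaceAlgorithms.get(identifier);
    if (algorithm !== undefined) taskQueue.push(algorithm);
  });
}
```

A storage API that obtained its proxy before the replacement keeps using the same object; it simply finds the map empty afterwards, and its replace algorithm runs later to abort transactions or do similar cleanup.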

inexorabletash · 2020-04-16T17:22:01Z

Just to be crystal clear (still waking up ☕ vs. multiple levels of indirection), the usage of the storage area's map is up to the particular storage API, i.e. for localStorage the map's keys/values are literally the (local) storage area's keys/values; for Indexed DB the keys/values would be database names/database constructs, for Cache Storage the keys/values would be cache names/caches, etc. Or a storage API could have a single entry in its storage area, and put all of its structure inside the single value. The need for this map is just because it's a common pattern across all storage APIs.
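To make the three usage patterns concrete, here is a small sketch of what each API might put in its storage area's map. The value shapes (`DatabaseSketch`, the cache-as-map, the single `"state"` entry) are invented for illustration and are not taken from the Indexed DB, Cache, or Service Worker specifications.

```typescript
// Hypothetical value shapes; only the "name -> thing" pattern comes from the discussion.

// localStorage: the map's entries are literally the stored key/value pairs.
const localStorageArea = new Map<string, string>();
localStorageArea.set("theme", "dark");

// Indexed DB: database name -> database construct.
interface DatabaseSketch {
  version: number;
  objectStores: Map<string, Map<string, unknown>>;
}
const indexedDBArea = new Map<string, DatabaseSketch>();
indexedDBArea.set("notes", { version: 1, objectStores: new Map<string, Map<string, unknown>>() });

// Cache Storage: cache name -> cache (modeled here as URL -> body text).
const cachesArea = new Map<string, Map<string, string>>();
cachesArea.set("v1", new Map<string, string>([["/index.html", "<!doctype html>"]]));

// Alternative: a single entry whose value holds all of the API's structure.
const singleEntryArea = new Map<string, { registrations: string[] }>();
singleEntryArea.set("state", { registrations: ["https://example.com/sw.js"] });
```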

"Storage area" as a term seems to conflict with HTML's use for localStorage, but maybe they can coalesce? Or HTML can get a new term as part of refactoring to align with this. (I don't think it's formally defined in HTML?)

I think this proposal works for Indexed DB. (From a spec level; haven't thought about implementation impact, especially the replacement part.)

hober · 2020-04-16T20:50:39Z

Overall, @annevk, your sketch looks really good to me. One really basic question:

> Let key be the result of obtaining a storage key from environment. (This should be less hand-wavy.)

I imagine the obtain a storage key from an environment algorithm could return a (registrable domain, registrable domain) tuple for partitioned storage, and a registrable domain otherwise?

annevk · 2020-04-17T14:15:16Z

Yeah, the map is there so APIs don't have to design their own infrastructure. (It also isn't quite clear to me what the alternative would be.)
Yeah, HTML needs a rewrite, at which point it no longer needs the storage area it talks about. I guess one thing that's a bit unclear is sessionStorage. That might warrant some indirection so different sessions get their own. (Edit: the better solution here would be for it to request a map in a session bucket, which would also solve #71 (Allow 'session' bucket) and not require sessionStorage to manage the lifetime.)
Yeah, the storage key can/needs to account for partitioning efforts.

domenic · 2020-04-17T16:17:41Z

Relaying some discussion from IRC:

> Let key be the result of obtaining a storage key from environment. (This should be less hand-wavy.)

I assume most of the time the storage key will be an origin. But not always. In particular this step will allow us to define both double-keying and blocking of storage.

Note that currently storage is blocked in opaque origins on a per-API basis (e.g. localStorage, idb.open()). Those mechanisms should probably be subsumed here, so that if environment is an opaque origin, key is failure, and the rest of the algorithm fails. This also allows other scenarios to block storage by intervening at the "obtain a key" stage.
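One way to picture the "obtain a key" stage, as a sketch only: the function returns `null` (standing in for failure) for an opaque origin, and otherwise returns a key that optionally carries a top-level site for double-keying. The `Environment` shape, the `topLevelSite` field, and the `StorageKey` type are assumptions made for this example.

```typescript
// Hypothetical shapes; not definitions from HTML or the Storage Standard.
type Origin =
  | { kind: "tuple"; serialized: string }
  | { kind: "opaque" };

interface Environment {
  origin: Origin;
  // Assumption: present only when storage should be double-keyed.
  topLevelSite?: string;
}

interface StorageKey {
  origin: string;
  topLevelSite?: string;
}

// Returns null to represent failure, which callers treat as "storage blocked".
function obtainStorageKey(environment: Environment): StorageKey | null {
  const origin = environment.origin;
  if (origin.kind === "opaque") {
    return null;
  }
  if (environment.topLevelSite !== undefined) {
    return { origin: origin.serialized, topLevelSite: environment.topLevelSite };
  }
  return { origin: origin.serialized };
}
```

Centralizing the check here means individual APIs would no longer need their own opaque-origin handling; they would just propagate the failure.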

mkruisselbrink · 2020-04-17T16:44:23Z

Generally I think this all looks good. sessionStorage is definitely the odd one out, and I'm not quite sure how that would fit in here. I suppose it would get its own very special "obtaining a storage key from environment" algorithm. (Also, as currently specified, sessionStorage is the only storage mechanism that is supposed to work in opaque origins. That is not implemented in Chrome, though.)

annevk · 2020-04-30T15:25:55Z

I think I uncovered how sessionStorage ought to work, that's whatwg/html#5498 now, but I'll work on infrastructure to allow defining it properly and also allow for #71.

#86 is my WIP PR to define all this. Probably best to keep high-level discussion here for now until I've made it somewhat more concrete, but feedback welcome on what is there now. (Note that there isn't much there yet compared to my comment above, but there is a bit. I hope to get to the remainder tomorrow/next week (tomorrow is a holiday I just realized).)


annevk mentioned this issue Apr 1, 2016

Allow caches to opt-in to granular cleanup w3c/ServiceWorker#863

Open

annevk mentioned this issue Apr 22, 2016

Clarify relation to existing storage APIs #24

Closed

annevk mentioned this issue Jun 25, 2016

Miscellaneous editorial feedback #23

Closed

annevk changed the title ~~Box hooks~~ Bucket hooks Nov 20, 2017

annevk mentioned this issue Nov 20, 2017

Merge with Storage w3c/webappsec-clear-site-data#20

Open

jakearchibald mentioned this issue Nov 20, 2017

Merge with storage w3c/ServiceWorker#1230

Open

annevk mentioned this issue Jun 2, 2018

Proposal: Add detailed usage breakdown in estimate() #63

Open

This was referenced Apr 2, 2020

Have deviceId follow partitioning rules for storage (such as localStorage) w3c/mediacapture-main#674

Merged

Reference storage spec once its hooks are ready #80

Closed

Use new Storage endpoint model for deviceId w3c/mediacapture-main#675

Open

This was referenced Apr 17, 2020

Define how localStorage is synchronized between browser tabs/windows whatwg/html#403

Open

Rewrite spec text for the Storage interface operations to use ordered maps whatwg/html#3252

Closed

annevk mentioned this issue Apr 30, 2020

sessionStorage copying into popups whatwg/html#5498

Closed

annevk added a commit that referenced this issue May 2, 2020

Finally define storage infrastructure

853158d

TODO Closes #18.

annevk added a commit that referenced this issue May 2, 2020

Finally define storage infrastructure

fd84f61

TODO Closes #18.

asutherland mentioned this issue May 12, 2020

Clarify storage infrastructure #86

Merged

3 tasks

This was referenced May 15, 2020

Do service/shared workers and BroadcastChannel deserve a special strategy? privacycg/storage-partitioning#9

Closed

Replacement design #88

Open

annevk closed this as completed in 25ac6d4 May 15, 2020

asutherland mentioned this issue May 18, 2020

Define Indexed DB as a storage endpoint, use hooks w3c/IndexedDB#334

Open

domenic mentioned this issue Oct 27, 2020

How to block storage access pre-activation? WICG/nav-speculation#7

Open
