Bucket hooks #18

Closed
annevk opened this issue Mar 31, 2016 · 12 comments

Comments

@annevk
Member

annevk commented Mar 31, 2016

@jakearchibald @jungkees hey! I was wondering what kind of hooks you need to make it clear that, e.g., service worker registrations and the Cache API are stored in a box.

In #4 we are discussing the cleanup steps for when a box gets closed, but maybe we should also have formal language for actually storing something inside?

@annevk
Member Author

annevk commented Nov 20, 2017

On IRC Jake suggested that we could just have "bucket has an associated X" where X could be service worker registrations and such. This assumes that, on clearing, a bucket is replaced with a new one (effectively allowing X to be GC'd, as there are no more references to it). Is that the model we want? Currently we just say a bucket is cleared.

One problem is that we'd have to copy some state over from the old bucket, such as persistence and potentially more in the future once we start expanding the concept. At least, I think if you clear, you don't necessarily expect to have to invoke persist() again.

Thoughts?

An alternative is that a bucket has something like a specification-level GetStorageHandler(Identifier, optional ClearCallback) operation that returns a StorageHandler in which you can store stuff.
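
For illustration, the GetStorageHandler alternative could look roughly like the Python sketch below. Everything in it is an assumption made for the example: the Bucket and StorageHandler classes, the method names, and the choice to run clear callbacks in registration order are not settled anywhere in this thread.

```python
from typing import Callable, Dict, Optional

ClearCallback = Callable[[], None]


class StorageHandler:
    """Hypothetical container in which one storage API keeps its data."""

    def __init__(self, identifier: str,
                 on_clear: Optional[ClearCallback] = None) -> None:
        self.identifier = identifier
        self.data: Dict[str, object] = {}
        self.on_clear = on_clear


class Bucket:
    """Hypothetical bucket exposing a GetStorageHandler-style operation."""

    def __init__(self) -> None:
        self.persisted = False
        self._handlers: Dict[str, StorageHandler] = {}

    def get_storage_handler(self, identifier: str,
                            on_clear: Optional[ClearCallback] = None) -> StorageHandler:
        # Reuse an existing handler so repeat callers share one store.
        handler = self._handlers.get(identifier)
        if handler is None:
            handler = StorageHandler(identifier, on_clear)
            self._handlers[identifier] = handler
        return handler

    def clear(self) -> None:
        # Empty each handler in place; bucket state such as persistence
        # is untouched because the bucket itself is not replaced.
        for handler in self._handlers.values():
            handler.data.clear()
            if handler.on_clear is not None:
                handler.on_clear()
```

With this shape the bucket object survives a clear, so nothing like persistence has to be copied over; the cost is that each API needs a callback if it has cleanup to do.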

cc @inexorabletash @mikewest

annevk changed the title from "Box hooks" to "Bucket hooks" on Nov 20, 2017

@jakearchibald
Contributor

An alternative is that a bucket has something like a specification-level GetStorageHandler(Identifier, optional ClearCallback) operation that returns a StorageHandler in which you can store stuff.

"A bucket has storages", and it's the storages that become detached.

Adding callback steps for cleanup is fine unless the order becomes observable.

@annevk
Member Author

annevk commented Nov 20, 2017

I think the order will be observable given the combination of navigator.storage.clear(), IDB, and the Cache API. Probably also with other APIs.

https://w3c.github.io/webappsec-clear-site-data/#abstract-opdef-clear-dom-accessible-storage-for-origin deals with this through enumeration (though doesn't list the Cache API). My idea with the identifier was that we'd first sort lexicographically and then invoke the ClearCallback, but perhaps it's better to just list everything in the Storage Standard and require it to be updated as new things are added.
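
To make the two options concrete, here is a small Python sketch. The identifiers, the function names, and the order of the central list are all made up for the example; the only point is that both approaches give a deterministic order, and they differ in where that order is defined.

```python
from typing import Callable, Dict, List

Callbacks = Dict[str, Callable[[], None]]


def clear_sorted(callbacks: Callbacks) -> List[str]:
    """Option 1: invoke clear callbacks in lexicographic identifier order."""
    order = sorted(callbacks)
    for identifier in order:
        callbacks[identifier]()
    return order


# Option 2: one central list, updated whenever a new storage API is added.
# The entries and their order are illustrative, not normative.
REGISTERED_ORDER = ["indexedDB", "caches", "localStorage"]


def clear_registered(callbacks: Callbacks) -> List[str]:
    """Invoke clear callbacks in the order given by the central list."""
    order = [identifier for identifier in REGISTERED_ORDER
             if identifier in callbacks]
    for identifier in order:
        callbacks[identifier]()
    return order
```

Under option 1 the order falls out of the identifier strings, so renaming an identifier could change observable behavior. Under option 2 the order is an explicit decision recorded in one place.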

@jakearchibald
Contributor

perhaps it's better to just list everything in the Storage Standard and require it to be updated as new things are added.

That seems fine. Doesn't hurt to have all origin storage referenced from one place.

@mikewest
Member

perhaps it's better to just list everything in the Storage Standard and require it to be updated as new things are added.

I'd agree that this is the right approach. Clear-Site-Data would be better if it deferred to Storage, rather than requiring additional enumeration of storage mechanisms.

@annevk
Member Author

annevk commented Apr 16, 2020

FWIW, I have the feeling I'm missing a simpler solution here and as you can tell this is very much a sketch. Would love to hear your thoughts.

The idea here is to define existing storage APIs, such as service workers and localStorage, on top of these primitives so we get a well-defined Clear-Site-Data and hopefully some other benefits too. I suspect this architecture might also work for the Storage Access API in due course, though it depends a bit on how all that will pan out.

Storage APIs (e.g., localStorage) need to define:

  • A storage identifier (a string), e.g., "localStorage". (These should match those of UsageDetails from https://github.com/whatwg/storage/pull/69/files.)
  • A replace algorithm to abort transactions or some such in the event of storage bucket replacement. (Could be nothing if there's no cleanup to be done.)
  • They need to invoke the "obtain a storage bucket area map" algorithm (outlined below) for environments that end up using the API and use the returned map as the place to store all their data.

The Storage Standard needs to define:

A registry of all storage identifiers and an easy way to get from one to its corresponding replace algorithm.

A storage bucket holds a map of storage identifiers to storage areas.

A storage area is a struct consisting of a map and a proxy map pointer set.

(The idea is that a storage area's map holds the actual storage. It's in a map because those are easy to work with. How the map is persisted is implementation-defined. How to make it available across process boundaries is implementation-defined.)

A proxy map has identical operations to a map and performs those on its underlying map.

(We hand out a proxy map to a storage API so we can replace the actual map behind the scenes.)
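
As a rough Python illustration of these two structures (class and field names are assumptions, and a plain list stands in for the proxy map pointer set):

```python
from collections.abc import MutableMapping
from dataclasses import dataclass, field
from typing import Any, Dict, List


class ProxyMap(MutableMapping):
    """Has the same operations as a map and forwards them to an
    underlying map that can be swapped out behind the scenes."""

    def __init__(self, underlying: Dict[str, Any]) -> None:
        self.underlying = underlying

    def __getitem__(self, key):
        return self.underlying[key]

    def __setitem__(self, key, value):
        self.underlying[key] = value

    def __delitem__(self, key):
        del self.underlying[key]

    def __iter__(self):
        return iter(self.underlying)

    def __len__(self):
        return len(self.underlying)


@dataclass
class StorageArea:
    # The actual storage for one storage identifier.
    map: Dict[str, Any] = field(default_factory=dict)
    # Every proxy map handed out for this area (the "pointer set").
    proxy_maps: List[ProxyMap] = field(default_factory=list)
```

A storage API only ever holds the ProxyMap. Assigning a different dict to `underlying` redirects all later reads and writes without the API noticing.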

New algorithms:

To obtain a storage bucket area map, given a storage identifier identifier and an environment environment, run these steps:

  1. Let key be the result of obtaining a storage key from environment. (This should be less hand-wavy.)
  2. Let bucket be the storage bucket for key. (This should be less hand-wavy.)
  3. Let storageArea be bucket's map[identifier].
  4. Let proxyMap be a new proxy map.
  5. Append a pointer to proxyMap to storageArea's proxy map pointer set.
  6. Set proxyMap's underlying map to storageArea's map.
  7. Return proxyMap.

(The above algorithm is intended for storage APIs. They would invoke this upon initialization to get a map to store things in.)
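
The seven steps can be followed one by one in a Python sketch. The hand-wavy steps are filled in with placeholder assumptions: the environment is a dict with an "origin" entry, the storage key is just that origin string, buckets live in a module-level dict, and a missing bucket or storage area is created on demand.

```python
from collections.abc import MutableMapping
from dataclasses import dataclass, field
from typing import Any, Dict, List


class ProxyMap(MutableMapping):
    """Map-like object forwarding to a replaceable underlying map."""

    def __init__(self) -> None:
        self.underlying: Dict[str, Any] = {}

    def __getitem__(self, key):
        return self.underlying[key]

    def __setitem__(self, key, value):
        self.underlying[key] = value

    def __delitem__(self, key):
        del self.underlying[key]

    def __iter__(self):
        return iter(self.underlying)

    def __len__(self):
        return len(self.underlying)


@dataclass
class StorageArea:
    map: Dict[str, Any] = field(default_factory=dict)
    proxy_maps: List[ProxyMap] = field(default_factory=list)


@dataclass
class StorageBucket:
    areas: Dict[str, StorageArea] = field(default_factory=dict)


# Assumption: exactly one bucket per storage key.
BUCKETS: Dict[str, StorageBucket] = {}


def obtain_storage_bucket_area_map(identifier: str,
                                   environment: Dict[str, str]) -> ProxyMap:
    key = environment["origin"]                        # step 1 (placeholder)
    bucket = BUCKETS.setdefault(key, StorageBucket())  # step 2 (placeholder)
    storage_area = bucket.areas.setdefault(identifier, StorageArea())  # step 3
    proxy_map = ProxyMap()                             # step 4
    storage_area.proxy_maps.append(proxy_map)          # step 5
    proxy_map.underlying = storage_area.map            # step 6
    return proxy_map                                   # step 7
```

Two environments with the same key and identifier get distinct proxy maps over the same underlying map, so a write through one is visible through the other.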

To replace a storage bucket old with a storage bucket new, run these steps:

  1. Atomically:
    1. Replace old with new. (This should be less hand-wavy and probably talk about the site storage unit.)
    2. For each identifier → storageArea of old's map:
      1. For each proxyMapPointer of storageArea's proxy map pointer set:
        1. Let newStorageArea be new's map[identifier].
        2. Set the value of proxyMapPointer's underlying map to newStorageArea's map.
        3. Append proxyMapPointer to newStorageArea's proxy map pointer set.
  2. For each impacted agent of ...: (This should be less hand-wavy)
    1. Queue a task to:
      1. For each identifier of ...:
        1. Run identifier's corresponding replace algorithm with ....

(There are a couple of things that need to be filled out here, including what kind of details the replace algorithm might need to clean up the relevant APIs.)
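
A single-threaded Python sketch of the replacement steps, under loud assumptions: "atomically" is trivially true because nothing runs concurrently, there is one impacted agent whose task queue is a plain list, replace algorithms take no arguments, and the new bucket already has a storage area for every identifier in the old one. The registry and all names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


class ProxyMap:
    """Reduced to the one field this sketch needs."""

    def __init__(self, underlying: dict) -> None:
        self.underlying = underlying


class StorageArea:
    def __init__(self) -> None:
        self.map: dict = {}
        self.proxy_maps: List[ProxyMap] = []


class StorageBucket:
    def __init__(self, identifiers: Iterable[str]) -> None:
        self.areas: Dict[str, StorageArea] = {i: StorageArea() for i in identifiers}


# Hypothetical registry: storage identifier -> replace algorithm.
REPLACE_ALGORITHMS: Dict[str, Callable[[], None]] = {}


def replace_storage_bucket(buckets: Dict[str, StorageBucket], key: str,
                           new: StorageBucket,
                           task_queue: List[Callable[[], None]]) -> None:
    old = buckets[key]

    # Step 1 (atomic in the sketch because it is single-threaded).
    buckets[key] = new                                   # 1.1
    for identifier, storage_area in old.areas.items():   # 1.2
        new_area = new.areas[identifier]
        for proxy in storage_area.proxy_maps:
            proxy.underlying = new_area.map              # repoint the proxy
            new_area.proxy_maps.append(proxy)

    # Step 2: queue one task (for the single assumed agent) that runs
    # each identifier's replace algorithm, if it registered one.
    def run_replace_algorithms() -> None:
        for identifier in old.areas:
            algorithm = REPLACE_ALGORITHMS.get(identifier)
            if algorithm is not None:
                algorithm()

    task_queue.append(run_replace_algorithms)
```

Because the proxies are repointed before any task runs, an API that writes between the swap and its replace algorithm is already writing into the new bucket.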

@inexorabletash
Member

inexorabletash commented Apr 16, 2020

Just to be crystal clear (still waking up ☕ vs. multiple levels of indirection), the usage of the storage area's map is up to the particular storage API, i.e. for localStorage the map's keys/values are literally the (local) storage area's keys/values; for Indexed DB the keys/values would be database names/database constructs, for Cache Storage the keys/values would be cache names/caches, etc. Or a storage API could have a single entry in its storage area, and put all of its structure inside the single value. The need for this map is just because it's a common pattern across all storage APIs.
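
Spelled out as data, with plain Python dicts standing in for each storage area's map (every key and value below is an invented placeholder, not a real database or cache object):

```python
from typing import Any, Dict

# localStorage: the map's entries are literally the stored keys and values.
local_storage_area: Dict[str, Any] = {"theme": "dark", "lang": "en"}

# Indexed DB: database names map to database constructs.
indexed_db_area: Dict[str, Any] = {
    "notes": {"version": 3, "object_stores": ["drafts"]},
}

# Cache Storage: cache names map to caches.
cache_storage_area: Dict[str, Any] = {
    "static-v1": {"/app.js": "<response>"},
}

# Alternative: a single entry whose value holds all of the API's structure.
single_entry_area: Dict[str, Any] = {
    "root": {"registrations": [], "settings": {}},
}
```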

"Storage area" as a term seems to conflict with HTML's use for localStorage, but maybe they can coalesce? Or HTML can get a new term as part of refactoring to align with this. (I don't think it's formally defined in HTML?)

I think this proposal works for Indexed DB. (From a spec level; haven't thought about implementation impact, especially the replacement part.)

@hober

hober commented Apr 16, 2020

Overall, @annevk, your sketch looks really good to me. One really basic question:

  1. Let key be the result of obtaining a storage key from environment. (This should be less hand-wavy.)

I imagine the obtain a storage key from an environment algorithm could return a (registrable domain, registrable domain) tuple for partitioned storage, and a registrable domain otherwise?
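
In Python terms, the shape being asked about might be something like the following. The function name, its parameters, and the (top-level, embedded) ordering of the tuple are assumptions for illustration; computing a registrable domain from a URL is out of scope, so domains are passed in directly.

```python
from typing import Optional, Tuple, Union

StorageKey = Union[str, Tuple[str, str]]


def storage_key_for(domain: str,
                    top_level_domain: Optional[str] = None,
                    partitioned: bool = False) -> StorageKey:
    """Return a (top-level, embedded) tuple when storage is partitioned,
    and the bare registrable domain otherwise."""
    if partitioned and top_level_domain is not None:
        return (top_level_domain, domain)
    return domain
```

With a tuple key, the same embedded domain under two different top-level sites yields two different keys and therefore two different buckets.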

@annevk
Member Author

annevk commented Apr 17, 2020

  • Yeah, the map is there so APIs don't have to design their own infrastructure. (It also isn't quite clear to me what the alternative would be.)
  • Yeah, HTML needs a rewrite at which point it no longer needs the storage area it talks about. I guess one thing that's a bit unclear is sessionStorage. That might warrant some indirection so different sessions get their own. (Edit: the better solution here would be for it to request a map in a session bucket, which would also solve #71, "Allow 'session' bucket", and not require sessionStorage to manage the lifetime.)
  • Yeah, the storage key can/needs to account for partitioning efforts.

@domenic
Member

domenic commented Apr 17, 2020

Relaying some discussion from IRC:

Let key be the result of obtaining a storage key from environment. (This should be less hand-wavy.)

I assume most of the time the storage key will be an origin. But not always. In particular this step will allow us to define both double-keying and blocking of storage.

Note that currently storage is blocked in opaque origins on a per-API basis (e.g. localStorage, idb.open()). Those mechanisms should probably be subsumed here, so that if environment is an opaque origin, key is failure, and the rest of the algorithm fails. This also allows other scenarios to block storage by intervening at the "obtain a key" stage.
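
A minimal Python sketch of that idea, assuming an environment is a dict whose "origin" is None for an opaque origin and which may carry a hypothetical "storage_blocked" flag for other blocking scenarios. Failure is modelled as None, and the exception type is invented for the example.

```python
from typing import Any, Dict, Optional


class StorageUnavailable(Exception):
    """Raised when no storage key can be obtained (hypothetical)."""


def obtain_storage_key(environment: Dict[str, Any]) -> Optional[str]:
    """Return the storage key, or None to signal failure."""
    origin = environment.get("origin")
    if origin is None:  # opaque origin
        return None
    if environment.get("storage_blocked", False):  # any other policy
        return None
    return origin


def obtain_area(identifier: str, environment: Dict[str, Any],
                buckets: Dict[str, Dict[str, dict]]) -> dict:
    key = obtain_storage_key(environment)
    if key is None:
        # Every storage API fails here, in one place, rather than each
        # API carrying its own opaque-origin check.
        raise StorageUnavailable(identifier)
    return buckets.setdefault(key, {}).setdefault(identifier, {})
```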

@mkruisselbrink

Generally I think this all looks good. sessionStorage is definitely the odd one out, and I'm not quite sure how that would fit in here. I suppose it would get its own very special "obtaining a storage key from environment" algorithm (also, as currently specced, sessionStorage is the only storage mechanism that is supposed to work in opaque origins, though that is not implemented in Chrome).

@annevk
Member Author

annevk commented Apr 30, 2020

I think I uncovered how sessionStorage ought to work, that's whatwg/html#5498 now, but I'll work on infrastructure to allow defining it properly and also allow for #71.

#86 is my WIP PR to define all this. Probably best to keep high-level discussion here for now until I've made it somewhat more concrete, but feedback is welcome on what is there now. (Note that there isn't much there yet compared to my comment above, but there is a bit. I hope to get to the remainder tomorrow/next week; tomorrow is a holiday, I just realized.)

7 participants