
Add regionCode attribute #690

Merged
merged 12 commits into from
May 16, 2018
128 changes: 122 additions & 6 deletions index.html
@@ -2148,6 +2148,7 @@ <h2>
readonly attribute DOMString postalCode;
readonly attribute DOMString recipient;
readonly attribute DOMString region;
readonly attribute DOMString regionCode;
readonly attribute DOMString sortingCode;
readonly attribute FrozenArray&lt;DOMString&gt; addressLine;
};
@@ -2199,6 +2200,75 @@ <h2>
</li>
</ol>
</li>
<li>If <var>details</var>["<a>regionCode</a>"] is present and not
the empty string:
<ol>
<li>Let <var>regionCode</var> be the result of <a>strip leading
and trailing ASCII whitespace</a> from
<var>details</var>["<a>regionCode</a>"] and then
<a data-cite="!INFRA#ascii-uppercase">ASCII uppercasing</a>
the result.
</li>
<li>
<p>
If <var>regionCode</var> is not a valid <a>country
subdivision code element</a> as per [[!ISO3166-2]]'s
section 5.2 "Structure of country subdivision code
elements" (non-normative details below), throw a
<a>RangeError</a> exception.
</p>
<div class="note" title=
"Structure of country subdivision code elements">
<p>
<strong>Do not implement from this note.</strong> The
structure of a <a>country subdivision code element</a> is
formally defined in [[!ISO3166-2]] (section 5.2).
Although, at the time of writing, the structure is not
expected to change, implementers are expected to track
updates to [[!ISO3166-2]] directly from ISO.
</p>
<p>
As [[!ISO3166-2]] is not freely available to the general
public, the structure of a <a>country subdivision code
element</a> at the time of publication is as follows:
</p>
<ul>
<li>Two <a>code points</a> that match an [[!ISO3166-1]]
alpha-2 country code.
</li>
<li>A single U+002D (-) <a>code point</a>.
</li>
<li>One, two, or three <a data-cite=
"INFRA#ascii-alphanumeric">ASCII alphanumeric</a> code
points, in any order.
</li>
</ul>
</div>
</li>
<li>Set <var>address</var>.<a>[[\regionCode]]</a> to
<var>regionCode</var>.
</li>
<li>Let <var>region</var> be the corresponding <a>country
Member Author

(Adding comment again, so we can discuss here)

@rsolomakhin, I've tried to summarize what you said about how Chrome does the ordering, while keeping the order in which things are matched optional. However, I added matching on the language of the document's body element first, to match what we say in .show(). "Any other criteria the user agent deems suitable" is for the fallback you described, whereby you pick the first available.

WDYT? Would this work?

Collaborator

This works.

Member

I'm not sure what the specific question is… do you want me to comment on the locale ordering or the spec change in general? For the locale ordering, as long as the UA isn't required to show the PR dialog in a language that the UA doesn't have installed then I'm fine with it.

Member

On a separate note, I wonder whether we really need to derive region from the regionCode… it would be simpler to also return the ISO3166-2 code as the region.

One concern I have with the spec change is that I've never seen ISO3166-2 used with the country code prefix in the wild. As a web developer I would expect regionCode to be "CA" for California, not "US-CA" if I hadn't read documentation on the API and only saw the webidl FWIW. I'm not sure if there are exceptions to cutting off the first 3 characters in order to get the region code specific to the country. If there is then this will likely cause issues for merchants when mapping the PR's address to existing systems.

Member Author

As a web developer I would expect regionCode to be "CA" for California, not "US-CA"

With the information I have at the moment, I'm leaning towards just returning the ISO3166-2 format. It might save us from having to deal with special cases where there is ambiguity, but I also don't know if there are any. I'll try to do a bit more reading.

I'm not sure if there are exceptions to cutting off the first 3 characters in order to get the region code specific to the country.

I don't think there are, from my reading of the ISO3166-2 spec. But I'll take a closer look as it's been a few weeks.

Member Author

Coming back to this and re-reading the ISO spec, I can't determine that there would be significant issues with chopping off country and dash. I'm still reluctant to do it tho. I'd rather just stick to the full code "just in case" and just so we are not creating new conventions. @mnoorenberghe, is that ok?

Although I'm always hesitant to parse other people's strings, ISO 3166-2 is defined to use ISO 3166-1 alpha-2 code elements as the first two characters of the string, followed by the hyphen character, followed by the region itself. So I think we'd be safe in truncating the first three chars.
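A minimal JavaScript sketch of the constructor's normalization and shape check, plus the three-character truncation discussed here. `parseRegionCode` is a hypothetical helper, not part of the API; it follows the non-normative structure in the spec note (an ISO 3166-1 alpha-2 code, U+002D, then one to three ASCII alphanumerics) and does not check whether ISO 3166-2 actually assigns the code.

```javascript
// Hypothetical helper: normalizes a regionCode and checks only its *shape*.
// It does not consult ISO 3166-2, so an unassigned code such as "US-ZZ" passes.
function parseRegionCode(input) {
  // Strip leading/trailing ASCII whitespace (TAB, LF, FF, CR, SPACE).
  const stripped = input.replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "");
  // ASCII uppercase: only a-z change, so non-ASCII letters are left alone.
  const code = stripped.replace(/[a-z]/g, (c) => c.toUpperCase());
  // Alpha-2 country code, U+002D (-), then one to three ASCII alphanumerics.
  const match = /^([A-Z]{2})-([A-Z0-9]{1,3})$/.exec(code);
  if (match === null) {
    throw new RangeError(`Invalid country subdivision code element: "${input}"`);
  }
  return { code, country: match[1], subdivision: match[2] };
}
```

For example, `parseRegionCode(" us-ca ")` returns `{ code: "US-CA", country: "US", subdivision: "CA" }`, and `parseRegionCode("California")` throws a `RangeError`. Because a valid prefix is always two letters and a hyphen, `subdivision` equals `code.slice(3)`.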

Member Author

I guess it does also remove a source of ambiguity/redundancy, as .country will now need to be there to derive .region. Do we still attempt to do validation on a synthetic 3166-2 code? I would guess no: we attempt to populate region iff country and regionCode are both there and together form a known 3166-2 code. Can work with that.
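Sketched in JavaScript, the "populate region iff country and regionCode together form a known code" idea might look like this. `deriveRegion` and `SUBDIVISION_NAMES` are hypothetical; the three entries stand in for the full ISO 3166-2 list a user agent would need to ship, and no error is raised for unknown codes, in line with the guess above.

```javascript
// Hypothetical stand-in for ISO 3166-2 data: full code element -> subdivision name.
const SUBDIVISION_NAMES = new Map([
  ["US-CA", "California"],
  ["PT-11", "Lisboa"],
  ["AD-07", "Andorra la Vella"],
]);

// Populates region only when country and a short regionCode (e.g. "CA")
// together form a known code; otherwise returns "" without throwing.
function deriveRegion(country, shortCode) {
  if (!country || !shortCode) return "";
  // toUpperCase() is adequate here because valid inputs are ASCII.
  const fullCode = `${country.toUpperCase()}-${shortCode.toUpperCase()}`;
  return SUBDIVISION_NAMES.get(fullCode) ?? "";
}
```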

subdivision name</a> for <var>regionCode</var>. Where
[[!ISO3166-2]] defines multiple <a>country subdivision
names</a> for a <var>regionCode</var>, it is RECOMMENDED that the
user agent select one by matching on:
<ol>
<li>The <a data-cite="!HTML#language">language</a> of
<a data-cite="HTML#the-body-element-2">the body
element</a>.
</li>
<li>The user's preferred languages.
</li>
<li>Any other criteria the user agent deems suitable.
</li>
</ol>
</li>
<li>Set <var>address</var>.<a>[[\region]]</a> to
Member Author

@marcoscaceres Mar 22, 2018

@domenic, stop reading this! get back to vacation! 🍹

But when you are back... by design, address.region is overridable by the developer. So, eventually:

const dict = {
   country: "PT",
   regionCode: "PT-11",
}
const address = new PaymentAddress(dict);
address.region; // "Lisboa", per ISO3166-2

// And... then.. 
const dict = {
   country: "PT",
   regionCode: "PT-11",
   region: "Lisbon",
}
const address = new PaymentAddress(dict);
address.region; // "Lisbon", per developer override.

Hmm.... now I'm wondering if we should allow some members to be nullable on the AddressInit dictionary:

const dict = {
   country: "PT",
   regionCode: "PT-11",
   region: null, // or undefined, means "derive it for me" 
}

While:

const dict = {
   country: "PT",
   regionCode: "PT-11",
   region: "", // trash it ... I'll handle it
}

Member Author

@domenic, if you are back from vacation, #690 (comment)

Collaborator

@domenic Apr 6, 2018

I think undefined ("not present" in dictionary-speak) is the right signifier. So basically only execute this step if details["region"] does not exist. Otherwise set address.[[region]] to details["region"].

Whether we want to allow details["region"] to be nullable or not, as some kind of explicit "don't derive this for me, but also make it not a real region value"---I'm not sure.
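A hypothetical JavaScript sketch of the rule suggested here: derive `region` only when `details.region` is not present (`undefined`), and otherwise keep the developer's value, including the empty string. `initialRegion` and the one-entry table are illustrative assumptions; the open question about a nullable `region` is not modeled.

```javascript
// Hypothetical stand-in for ISO 3166-2 data (full code element -> name).
const SUBDIVISION_NAMES = new Map([["PT-11", "Lisboa"]]);

// Returns the initial [[region]] value for an AddressInit-like dictionary.
// As with WebIDL dictionaries, a member whose value is undefined is "not present".
function initialRegion(details) {
  if (details.region !== undefined) {
    // Present: the developer's value wins, even "" ("I'll handle it").
    return details.region;
  }
  // Not present: derive from regionCode, or fall back to "".
  const code = (details.regionCode ?? "").toUpperCase();
  return SUBDIVISION_NAMES.get(code) ?? "";
}
```

With `{ country: "PT", regionCode: "PT-11" }` this yields `"Lisboa"`; adding `region: "Lisbon"` yields `"Lisbon"`, and `region: ""` yields `""`.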

<var>region</var>.
</li>
</ol>
</li>
Collaborator

Should we consider deriving region from regionCode, so the caller only has to pass one? I'm thinking ahead to the constructor; do we want to prevent new PaymentAddress({ ..., regionCode: "PT-11", region: "Tasmania" })?

I guess for the cases where there is no corresponding regionCode, we'd still want to pass region. But if regionCode is not the empty string, deriving region from it might be good...

Member Author

@marcoscaceres Mar 16, 2018

But if regionCode is not the empty string, deriving region from it might be good...

Agree. It's definitely doable. ISO3166-2 provides lookup tables, for example:

AD-07* Andorra la Vella
AD-02* Canillo
AD-03* Encamp
AD-08* Escaldes-Engordany
AD-04* La Massana
AD-05* Ordino
AD-06* Sant Julià de Lòria

Where column 2 (defined thing) is "...the country subdivision names in the administrative language of the country concerned, where relevant with diacritic signs..." (as Unicode). Note that "a country’s administrative language is a written language used by the administration of the country at the national level". So, the result won't be in English a lot of the time - but that's fine, IMO. This is not for display purposes.

So, something like:

The steps to derive a region from a country subdivision code element are as follows. The algorithm takes a DOMString subCode as input and returns a DOMString.

  1. If validate a region code with subCode returns false, throw a RangeError exception.
  2. ASCII uppercase subCode, and let normalizeCode be the result.
  3. Find normalizeCode in Section 8, "List of country subdivision names and their code elements" of ISO3166-2 and return the value of "column 2" for the matching country subdivision code element.

Sound good?
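The three proposed steps can be tried out against the Andorra rows quoted above. In this JavaScript sketch, `buildTable`, `deriveRegionName`, and the empty-string result for a well-formed but unlisted code are assumptions (the proposal does not say what happens then), and the trailing `*` in the excerpt is simply skipped.

```javascript
// Excerpt of ISO 3166-2 rows for Andorra, as quoted in this thread.
const ANDORRA_ROWS = `
AD-07* Andorra la Vella
AD-02* Canillo
AD-03* Encamp
AD-08* Escaldes-Engordany
AD-04* La Massana
AD-05* Ordino
AD-06* Sant Julià de Lòria`;

// Builds a Map from code element to "column 2" (the subdivision name).
function buildTable(rows) {
  const table = new Map();
  for (const line of rows.split("\n")) {
    const m = /^([A-Z]{2}-[A-Z0-9]{1,3})\*?\s+(.+)$/.exec(line.trim());
    if (m) table.set(m[1], m[2]);
  }
  return table;
}

const TABLE = buildTable(ANDORRA_ROWS);

function deriveRegionName(subCode) {
  // 1. Validate the shape case-insensitively (uppercasing happens in step 2).
  if (!/^[A-Za-z]{2}-[A-Za-z0-9]{1,3}$/.test(subCode)) {
    throw new RangeError(`Not a country subdivision code element: "${subCode}"`);
  }
  // 2. ASCII uppercase to get the normalized code.
  const normalizedCode = subCode.replace(/[a-z]/g, (c) => c.toUpperCase());
  // 3. Return column 2 of the matching row ("" if absent: an assumption).
  return TABLE.get(normalizedCode) ?? "";
}
```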

Collaborator

That sounds great! Probably we should get implementer buy-in though, maybe as a separate PR? Because it'd mean having to ship that table with the browser...

Member Author

Agree... ☝️@rsolomakhin, @mnoorenberghe, something to start thinking about. Would appreciate your input.

Member

We are already using libaddressinput (and I believe Chromium does too), so it would be great if we could also use that here instead of having two versions of similar data. Does ISO3166-2 only have one name per region? What about when there are multiple official languages? Consider CA-QC: is it "Quebec" or "Québec" in ISO3166-2? libaddressinput provides both but without a clear way to know which one would match ISO3166-2.

Member

Presumably, we can just grab the "en" column 2

I'm not quite sure how this all fits together but won't this affect what gets returned in the region property and therefore may be displayed back to the user on the payment confirmation page the merchant renders? If so, it would seem like we should use the language best for the user.

Member Author

We can probably perform some kind of lookup, based on the document language. But then it gets into preference order for when it doesn't match the user's language (and there is no "en").

Member Author

Seems crazy town, but:

  1. Try to match document language.
  2. Try to match user language (like Accept-Language)?
  3. Try to match "en".
  4. Um... just give me whatever you got.
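A JavaScript sketch of that four-step fallback, using hypothetical names for CA-QC (whether ISO 3166-2 lists both spellings is the open question above). Language matching is reduced to comparing primary subtags, which is a simplification; real BCP 47 matching is more involved.

```javascript
// Hypothetical data: names for one subdivision, keyed by primary language subtag.
const QC_NAMES = new Map([
  ["en", "Quebec"],
  ["fr", "Québec"],
]);

// "fr-CA" -> "fr". Comparing primary subtags is a simplification of BCP 47 matching.
const primarySubtag = (tag) => tag.split("-")[0].toLowerCase();

// Fallback order from the comment: 1. document language, 2. user's languages,
// 3. "en", 4. whatever name is available (the first entry).
function pickSubdivisionName(namesByLang, documentLang, userLangs) {
  const candidates = [documentLang, ...userLangs, "en"]
    .filter(Boolean)
    .map(primarySubtag);
  for (const lang of candidates) {
    if (namesByLang.has(lang)) return namesByLang.get(lang);
  }
  const first = namesByLang.values().next();
  return first.done ? "" : first.value;
}
```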

Collaborator

We use libaddressinput in Chromium and @sebsg is helping us to add the ISO codes to that.

Collaborator

But if regionCode is not the empty string, deriving region from it might be good.

The regionCode field takes care of interop, so we don't necessarily have to agree on how the region field is derived from that. Having said that, the rule of thumb we have been using in Chrome is to use the browser user interface language when deciding between Quebec and Québec. If the browser user interface language does not match any of the languages supported by the country, then Chrome falls back to the first language specified in libaddressinput data for that country. For example, if the browser user interface language is Italian, then Canadian addresses will be in English. Although the default language works out to be English for Canada, that's not something that we should hardcode for all countries in the world, IMHO.
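For comparison, the Chrome rule of thumb described here reduces to a two-step choice. The record below is hypothetical and only loosely modeled on the idea of per-country language lists; it is not libaddressinput's actual data format or API.

```javascript
// Hypothetical per-country record (NOT libaddressinput's real format):
// languages in the order the data lists them, and names per code and language.
const CANADA = {
  languages: ["en", "fr"],
  names: { "CA-QC": { en: "Quebec", fr: "Québec" } },
};

// Use the browser UI language if the country supports it; otherwise use the
// first language listed for the country (no hardcoded "en").
function regionNameForUi(countryData, code, uiLang) {
  const ui = uiLang.split("-")[0].toLowerCase();
  const lang = countryData.languages.includes(ui) ? ui : countryData.languages[0];
  const names = countryData.names[code];
  return names ? (names[lang] ?? "") : "";
}
```

With an Italian UI (`"it-IT"`), this picks `"Quebec"`, because Italian is not among Canada's listed languages and English is listed first, matching the example above.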

<li>If <var>details</var>["<a>languageCode</a>"] is present:
<ol>
<li>If <a data-cite=
@@ -2316,6 +2386,17 @@ <h2>
internal slot.
</p>
</section>
<section>
<h2>
<dfn>regionCode</dfn> attribute
</h2>
<p data-link-for="">
Represents the <a>region</a> of the address as an [[!ISO3166-2]]
<a>country subdivision code element</a>. When getting, returns the
value of the <a>PaymentAddress</a>'s <a>[[\regionCode]]</a>
internal slot.
</p>
</section>
<section>
<h2>
<dfn>city</dfn> attribute
@@ -2434,9 +2515,20 @@ <h2>
<dfn>[[\region]]</dfn>
</td>
<td>
A <a>region</a> as a country subdivision name or the empty
string, such as "Victoria", representing the state of Victoria
in Australia.
A <a>region</a> as a <a>country subdivision name</a> or the
empty string, such as "Victoria", representing the state of
Victoria in Australia.
</td>
</tr>
<tr>
<td>
<dfn>[[\regionCode]]</dfn>
</td>
<td>
A <a>region</a> represented as a [[!ISO3166-2]] <a>country
subdivision code element</a> stored in its canonical uppercase
form, or the empty string. For example, "<code>PT-11</code>"
represents the Lisbon district of Portugal.
</td>
</tr>
<tr>
@@ -2516,6 +2608,7 @@ <h2>
DOMString country;
sequence&lt;DOMString&gt; addressLine;
DOMString region;
DOMString regionCode;
DOMString city;
DOMString dependentLocality;
DOMString postalCode;
@@ -2550,6 +2643,13 @@ <h2>
<dd>
A <a>region</a>.
</dd>
<dt>
<dfn>regionCode</dfn> member
</dt>
<dd>
A <a>region</a>, represented as a <a>country subdivision code
element</a>.
</dd>
<dt>
<dfn>city</dfn> member
</dt>
@@ -2684,9 +2784,17 @@ <h2>
<var>details</var>["<a>recipient</a>"] to the user-provided recipient
of the transaction, or to the empty string if none was provided.
</li>
<li>If "region" is not in <var>redactList</var>, set
<var>details</var>["<a>region</a>"] to the user-provided region, or
to the empty string if none was provided.
<li>If "region" is not in <var>redactList</var>:
<ol>
<li>Set <var>details</var>["<a>region</a>"] to the user-provided
region, or to the empty string if none was provided.
</li>
<li>If <var>details</var>["<a>region</a>"] has a corresponding
<a>country subdivision code element</a>, set
<var>details</var>["<a>regionCode</a>"] to that <a>country
subdivision code element</a>.
</li>
</ol>
</li>
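A hypothetical JavaScript sketch of the new `region` step: copy the user-provided region unless it is redacted, then fill in `regionCode` when the region has a corresponding code element. `applyRegion`, `CODES_BY_NAME`, and the `userRegion` parameter are illustrative assumptions.

```javascript
// Hypothetical reverse lookup: subdivision name -> country subdivision code element.
const CODES_BY_NAME = new Map([
  ["California", "US-CA"],
  ["Lisboa", "PT-11"],
]);

// Sketch of the "region" step when building details from user input.
// userRegion is whatever the user entered (undefined if nothing).
function applyRegion(details, userRegion, redactList) {
  if (redactList.includes("region")) return details; // redacted: leave both unset
  details.region = userRegion ?? "";
  const code = CODES_BY_NAME.get(details.region);
  if (code !== undefined) details.regionCode = code;
  return details;
}
```

Keying the reverse lookup on the name alone is the simplest reading of the step; since the same subdivision name may occur in more than one country, an implementation would likely also consider `details["country"]`.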
<li>If "sortingCode" is not in <var>redactList</var>, set
<var>details</var>["<a>sortingCode</a>"] to the user-provided sorting
@@ -3923,6 +4031,14 @@ <h2>
This specification relies on several other underlying specifications.
</p>
<dl data-sort="">
<dt>
ISO 3166-2
</dt>
<dd>
<dfn data-lt="country subdivision names">Country subdivision
name</dfn> and <dfn>country subdivision code element</dfn> are
defined in [[!ISO3166-2]].
</dd>
<dt>
Infra
</dt>