[css-text] Better describe the likely outcomes of hyphenation (editorial) #5973

Closed
r12a opened this issue Feb 10, 2021 · 16 comments
Labels
Closed Accepted by CSSWG Resolution; css-text-3 (Current Work); css-text-4; i18n-tracker (Group bringing to attention of Internationalization, or tracked by i18n but not needing response); Tested (Memory aid - issue has WPT tests)

Comments

@r12a
Contributor

r12a commented Feb 10, 2021

8 Breaking Within Words
https://drafts.csswg.org/css-text-4/#hyphenation
https://drafts.csswg.org/css-text-3/#hyphenation

I think it would be worthwhile to add a note which explains that hyphenation should produce a number of effects, depending on the language in question, and give examples, in order to remind implementers to implement a solution that is open to cultural adaptation. These examples include:

  1. In some cases the spelling of a word needs to be changed around hyphenation, for example in Dutch cafeetje → café-tje and skiërs → ski-ers, and in Hungarian Összeg → Ösz-szeg.
  2. The symbol used to indicate that a word was broken at the end of a line is not always one that looks like a hyphen. Cree uses ᐀ [U+1400 CANADIAN SYLLABICS HYPHEN], Armenian uses ֊ [U+058A ARMENIAN HYPHEN], Balinese uses ᭠ [U+1B60 BALINESE PAMENENG], etc. Some languages, such as Malayalam or Tamil, may produce no visual marker.
  3. The location of the mark is not always at the end of the line – some languages put it at the start of the following line, such as Traditional Mongolian.

It should also be made clear that such effects are triggered not only by browser code applying algorithms, but also by &shy; or wbr (see #5972) when they fall within a range to which the hyphens property has been applied (with relevant values), and that &shy; should only produce a glyph at the end of a line that looks like a hyphen if that is appropriate for the language in question.
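
For readers less familiar with the property involved, here is a minimal sketch of the three hyphens values referred to above. The class names are hypothetical, and what actually appears at a break (spelling change, marker glyph, marker position) depends on the content language and on the hyphenation resources of the user agent.

```css
/* Hypothetical class names, for illustration only. */

/* Automatic hyphenation: the UA chooses break points itself, based on the
   content language, so the lang attribute on the content must be correct. */
.auto-hyphenated {
  hyphens: auto;
}

/* Manual hyphenation (the initial value): words break only at explicit
   opportunities, such as U+00AD SOFT HYPHEN (&shy;) placed by the author. */
.manually-hyphenated {
  hyphens: manual;
}

/* No hyphenation: explicit opportunities such as &shy; are ignored. */
.not-hyphenated {
  hyphens: none;
}
```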

@faceless2

Polish language is one I believe? I understand the OpenOffice hyphenation rules for Polish apply a hyphen to both the end of the first line and the start of the next.

While you're looking at this, we noticed that hyphenate-character allows you to override the defaults, but it doesn't allow you to specify whether your overriding character goes at the end of the first line or the start of the second (or both). Two easy ways to specify this would be to give hyphenate-character an optional second argument, e.g. hyphenate-character: "" "-", or a single string with a newline to separate the two values, e.g. hyphenate-character: "-\A-". How necessary this is, I don't know.
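
For context, a minimal sketch of what hyphenate-character accepts as drafted in css-text-4: either auto or a single string. The selectors are only illustrative. The two-value and newline-separated forms suggested above are proposals from this comment, not existing syntax.

```css
/* Illustrative selectors; every declaration uses the existing one-string form. */

/* Let the UA choose the hyphenation character for the content language. */
p {
  hyphenate-character: auto;
}

/* Override with U+058A ARMENIAN HYPHEN for Armenian content. */
p:lang(hy) {
  hyphenate-character: "\58A";
}

/* Override with U+1400 CANADIAN SYLLABICS HYPHEN for Cree content. */
p:lang(cr) {
  hyphenate-character: "\1400";
}
```

None of these declarations can say where the character is placed, which is the gap this comment points out.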

@r12a r12a added the i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. label Feb 11, 2021
@r12a
Contributor Author

r12a commented Feb 11, 2021

Here's another example of the visual marker appearing at the beginning of a line. Unicode Standard, v11, p536:

In writing Mongolian and Todo, U+1806 MONGOLIAN TODO SOFT HYPHEN is used at the beginning of the second line to indicate resumption of a broken word. It functions like U+2010 HYPHEN, except that U+1806 appears at the beginning of a line rather than at the end.

@frivoal
Collaborator

frivoal commented Apr 23, 2021

Do you have a suggestion for a short example that we could include, for instance right after the first paragraph of 5.4? If we do add something, it would be good to keep it short, just to illustrate the point that hyphenation can be different from, or more complicated than, what is typical of English. But I wouldn't attempt to list too many cases. As much as I find that sort of thing interesting, css-text cannot scale to describing all the peculiarities of all the world's languages :)

@frivoal
Collaborator

frivoal commented Apr 23, 2021

(Given that the spec's language generically allows and expects "the right thing" for all languages, automated tests in WPT might be a more effective place to highlight the specifics of various languages.)

@xfq
Member

xfq commented Apr 24, 2021

The line breaking / hyphenation of pinyin can be an example too, but it may be less common than the above examples (and may be more suitable for css-ruby?).

(Related clreq issue: w3c/clreq#351)

fantasai added a commit that referenced this issue Dec 28, 2022
frivoal added a commit that referenced this issue Dec 28, 2022
Follow-up on the fix for #5973
@fantasai
Collaborator

@r12a @xfq We've added a short table of examples illustrating the spelling changes (which are normatively noted in the paragraph above) here:
https://drafts.csswg.org/css-text-3/#hyphenation

If you have other examples you want to add, we can do that; but please remember we're not trying to make the spec examples exhaustive. :) It might be useful to compile your more exhaustive notes into the Typography index, though, and we can link there if you want.

We also clarified the spec to say that hyphenation character changes must, and spelling changes should, apply. (The SHOULD is because, if the spelling differs between hyphenated and unhyphenated forms, then depending on where the author ended up inserting the hyphenation opportunity, the UA might not be able to match the author's chosen hyphenation opportunity against its hyphenation dictionary.)

We did not make any changes for WBR; see @frivoal's comments in #5972 (comment) and whatwg/html#6326 (comment). Note that if HTML does introduce a way to mark up explicit hyphenation opportunities in the future, the spec is already written to be generic to such mechanisms.

Agenda+ for CSSWG review.

frivoal added a commit to frivoal/wpt that referenced this issue Dec 29, 2022
@frivoal frivoal added Tested Memory aid - issue has WPT tests and removed Needs Testcase (WPT) labels Dec 29, 2022
frivoal added a commit to web-platform-tests/wpt that referenced this issue Dec 29, 2022
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Jan 5, 2023
…tion, a=testonly

Automatic update from web-platform-tests
Add tests for i18n variations on hyphenation

See w3c/csswg-drafts#5973

--

wpt-commits: b63c4e6e1c7b4f0ce9deb4a2ef76e90b7758f4ed
wpt-pr: 37694
jakearchibald pushed a commit to jakearchibald/csswg-drafts that referenced this issue Jan 16, 2023
Follow-up on the fix for w3c#5973
@r12a
Contributor Author

r12a commented Mar 20, 2023

> @r12a @xfq We've added a short table of examples illustrating the spelling changes (which are normatively noted in the paragraph above) here:
> https://drafts.csswg.org/css-text-3/#hyphenation

[1] The Uighur example is missing the 'hyphen'. It should be a short baseline extension, separated from the last letter by a small space. Here's an example. It's not entirely clear how the line should be produced. Some say that the font should automatically drop and lengthen a normal hyphen, but others say you should use ـ U+0640 ARABIC TATWEEL. In the meantime, perhaps an SVG image would be better here.

[2] Although the introductory text mentions that other symbols may be used, rather than a hyphen, the list of examples doesn't back that up convincingly - it only shows hyphens. I can provide one extra example for you, but how would you like it? I can provide text, but others may not be able to see the text, or i could provide an SVG image which could be displayed at approximately normal text size.

@r12a r12a reopened this Mar 20, 2023
@fantasai
Collaborator

@r12a The backing store is actually using U+0640 but it looks like it needs some kind of thin space to create the visual separation. What do you recommend here?

@r12a
Contributor Author

r12a commented Mar 30, 2023

How about using these images:

[Images: damydi_full, damydi_initial, damydi-final]

If that works for you, i can provide another set to show another non-hyphen based hyphenation in a different script.

@r12a
Contributor Author

r12a commented Mar 30, 2023

PS: If you like, you can also use those images plus the following 2 for Example 18, which looks a little ragged as a bitmap.

[Images: damydi_initial_wrong, damydi_final_wrong]

frivoal added a commit that referenced this issue Aug 26, 2023
As otherwise the browser-supplied rendering in the spec isn't
necessarily quite right.

See #5973
@frivoal
Collaborator

frivoal commented Aug 26, 2023

@r12a I've updated the spec to use the images you provided.

As for this:

> Although the introductory text mentions that other symbols may be used, rather than a hyphen, the list of examples doesn't back that up convincingly - it only shows hyphens. I can provide one extra example for you, but how would you like it? I can provide text, but others may not be able to see the text, or i could provide an SVG image which could be displayed at approximately normal text size.

Should we consider that the Uyghur example is using a U+0640 ARABIC TATWEEL and call it done, or do you want to supply some alternative example? If you want to offer something else, SVG is indeed good, as that provides reliable rendering.

@r12a
Contributor Author

r12a commented Sep 12, 2023

I don't think we need to worry (for this context) about which character is used if we use the images.

The answer to the question about which character should be used – for implementers of Uighur hyphenation – is not clear, afaict even among the Unicode folks, and needs further discussion. My personal preference is to use tatweel, fwiw.

@frivoal
Collaborator

frivoal commented Oct 4, 2023

At this point, I am not sure what the request is on the spec. Do we consider the examples already present good enough to show some diversity, or not?

@r12a
Contributor Author

r12a commented Oct 4, 2023

@frivoal i think we're almost done, but here are some final suggestions:

  1. Per the discussion in this comment, I would like to add one more line to the Hyphenation Across Languages examples, to show a hyphenation mark in LTR text that is clearly not a hyphen. I suggest using the svg images for Plains Cree at the bottom of this comment field.
  2. I don't think it helps much to have to expand example 16 to view it – the table isn't huge, so i'd remove the details tag.
  3. The first para in 5.4 says "and visually indicating the split (usually by inserting a hyphen, U+2010)" (my emphasis). My view is that, even though we call this 'hyphenation', that is a western bias, and 'hyphenation' actually relates to splitting of words to fit on a line, and doesn't necessarily indicate that there needs to be a visual indicator. For example, Malayalam needs hyphenation rules but uses no visual indicator – see an example. I think the text could say "usually indicating the split", or we could drop that phrase.

SVG images for Cree example:

[Images: kasitaniwaninik, kasitani-, waninik]

@frivoal
Collaborator

frivoal commented Oct 13, 2023

Done (af3f01a). Also added a test in WPT for Cree (web-platform-tests/wpt#42523). Thanks for supplying this example.

@frivoal frivoal closed this as completed Oct 13, 2023
frivoal added a commit to frivoal/wpt that referenced this issue Oct 13, 2023
frivoal added a commit to web-platform-tests/wpt that referenced this issue Oct 13, 2023
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Oct 26, 2023
…only

Automatic update from web-platform-tests
Add hyphenatin example for cree

See w3c/csswg-drafts#5973

--

wpt-commits: b79a9b9a36e3165401fb2a70106969d65c0f749a
wpt-pr: 42523

6 participants