
New syntax for multiline list items / talk page comments
Open, Needs Triage, Public

Description

List items are fragile in wikitext, and there are many things that can't be effectively embedded in lists. In theory we can just use a <dl><dd> tag in wikitext to work around this, but this isn't done in practice, mostly because it looks very ugly.

So (using talk pages as an example) instead we have:

: This is a comment
: This is still the same comment!
{|
|+ I wanted to embed a table but I had to break out of the list item to do that
|}
: Ok, another item.  I'm done with my comment now... ~~~~

A better syntax for multiline list items would help. T114432: [RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments) inter alia proposes a syntax that looks like:

: <<< This is a comment
This is still the same comment!
{|
|+ I wanted to embed a table but I had to break out of the list item to do that
|}
Ok, another item.  I'm done with my comment now... ~~~~ >>>

But there are other alternatives that have been mooted.

One is to use {* ... *} / {: ... :} syntax, which is based on the wikitext 2.0 proposal. The nested form of that would look something like:

:::{: line 1
line 2 :}

although different variants are proposed.

Another proposal is to use "traditional" email quoting characters:

> line 1
> line 2
>
> * embedded list

This doesn't actually conflict with HTML, since < is the special character in HTML; > is only a metacharacter after an initial < has been seen. The angle brackets also work well in an RTL context, since they are "logical" punctuation that is mirrored in right-to-left text, for example: https://ar.wikipedia.org/wiki/%D9%85%D8%B3%D8%AA%D8%AE%D8%AF%D9%85:Cscott/AngleTest . However, it is then difficult to cut and paste content into a comment, since you need to go through and manually add > to each line. In theory you could mix this with heredoc quoting to avoid this, but that results in:

>>> Deeply nested reply <<<
multi

line 
* embedded list
>>>
>>>> Next item

That might be considered bracket overload. By making this syntax only work at start-of-line, we can probably avoid most breakage of legacy content, although probably some folks mask whitespace in wikitext using constructs like:

<span class="foo"
>something something...

Presumably we'd use entity encoding or <nowiki/> to handle quoting content that started with a literal > character (this should be rare!), but the old usenet quoting standard considered >> and > > to be different: the former was a two-level-deep quote, and the latter was a one-level-deep quote starting with a literal angle bracket (the intervening space is removed). No reason we have to implement usenet quoting exactly, of course.
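To make the usenet distinction concrete, here is a minimal Python sketch of that old convention, assuming depth is the length of the unbroken run of leading > characters and that exactly one space after that run is dropped. The function name `usenet_quote_depth` is hypothetical and not part of any MediaWiki code.

```python
def usenet_quote_depth(line: str) -> tuple[int, str]:
    """Split a usenet-style quoted line into (depth, content).

    Hypothetical sketch: each consecutive '>' adds one quote level; a '>'
    separated from the prefix by a space is literal content, and the single
    intervening space is removed.
    """
    depth = 0
    while depth < len(line) and line[depth] == ">":
        depth += 1
    content = line[depth:]
    if depth and content.startswith(" "):
        content = content[1:]  # drop the one separator space after the prefix
    return depth, content
```

Under this rule `">> hi"` is a two-level quote of `hi`, while `"> >hi"` is a one-level quote whose content starts with a literal `>`.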


We should consider how well these various multiline content proposals integrate with, or account for:

  1. Interoperability with existing :-style items, since old-school editors will continue to mix "traditional" syntax and new
  2. Extensions to add attributes to the item: T230658: Syntax for list item attributes or similar
  3. The desire to have an HTML box around "just this one comment". The old-style : context technically nests replies in the same box as the original comment, so it is hard to use styles which highlight "just this one comment not its replies". T230654: Parser support for talk pages discusses this issue.


Event Timeline


I agree entirely with Anomie in T230683#5434186. His shot at a mockup is a bit reddit-ish too, which seems to be accepted elsewhere....

If they are willing to use new markup for talk page comments, we should give them new markup that makes sense as wikitext rather than trying to hack something ugly into colon-lists to make talk page commenting slightly less bad.

Agreed. I really like the > syntax. Another suggestion during the consult was use of % as an indicator. The community will categorically reject the {} and >>> <<< forms in the context of talk pages unless you can guarantee (you can't) that's what will end up in the final wikitext somehow.

Then we can also look at improving wikitext lists separately, without also having to try to have it make sense for talk page comments too.

There are already a few of these tasks hanging out there; some linked above. I didn't see T11996: Multiline tags in lists should be output more intelligently but that's the Duh use for it and perhaps discussion about multi-line list items should occur there instead. Speaking of which, why are you [cscott] trying to solve that problem instead of the one you are faced with in regard to the talk page consult? :)

The community will categorically reject the {} and >>> <<< forms in the context of talk pages unless you can guarantee (you can't) that's what will end up in the final wikitext somehow.

Right now, at the English Wikipedia, I'm writing comments that look like this:

**:::My reply is long.  {{pb}} My other thought is this.  ~~~~

At other wikis, I have to write something like this, even though I'm pretty sure that you told me that it's technically not the best HTML approach:

**:::My reply is long. <br /> My other thought is this.  ~~~~

Given that I'm already coping with complicated formatting, why wouldn't I be just as willing to type something like this, with the added benefit that it will work unfailingly on all wikis?

**:::<<<My reply is long.  
My other thought is this.  ~~~~>>>

Rough consensus seems to be forming around:

:::: <<< line 1

line 2 >>>

which is mentioned as a future extension at the end of RFC T114432, approved 2019-08. T114432#5786885 considers some relevant details.

Open questions are T230658: Syntax for list item attributes and T230659: Automatically-assigned id attributes for list items. I've proposed concrete strawmen for discussion in T230658#5786980 and T230659#5787050. There's no consensus implied around those; I just felt that discussion would be facilitated if we had specific proposals.

Rough consensus seems to be forming around:

Besides you (supporting, presumably) and me (opposed), the only people I see weighing in are @Izno (opposed per T230683#5549340) and @Whatamidoing-WMF (possibly supporting per T230683#5786927). That seems very far from any sort of consensus.

which is mentioned as a future extension at the end of RFC T114432, approved 2019-08.

Err, similar syntax in a very different context was supported there.

The "rough consensus" in T230683#5787066 was the result of a meeting held for that purpose, involving @cscott, @ssastry, @Whatamidoing-WMF, @dchan, @matmarex, @ppelberg, Jazmin, @Esanders, @DLynch, and Marcella. (I don't personally have a strong preference among the alternatives suggested and didn't vote.) That meeting obviously doesn't include all possible stakeholders, which is why I posted here to try to keep everyone in sync. As I understand it, the goal is to put forward an RfC with one of these proposed syntaxes, which would of course get further debated as usual for that process.

If you have an internal meeting, it's better to say "We had an internal meeting, and like X because A, B, and C. We didn't like Y because of D, E, and F." rather than announcing an otherwise-unsupported "consensus".

I was deliberately trying to downplay "consensus" with "rough" and suggest only that the winds were vaguely blowing in that direction. But it's clear the words I chose read differently to you than they did in my head. I apologize.

I'll clean up the meeting notes and post bits here, so folks can see the pros/cons discussed at least.

Syntax comparison from the meeting notes:

"Wikitext 2.0" {: ... :}

  • Example
:::{: line 1

line 2 :}
  • Pros:
    • “Wikitext 2.0” used {<somechar> ... <somechar>} as a consistent bracketing strategy for many different wikitext constructs; for example {= =} {* *} (as well as {- -} {' '} {'' ''} which are further out there)
    • “Just another list item syntax” / no special reply syntax needed / interoperable w/ existing talk page markup
    • Obvious how to further nest these constructs, if that were to be necessary
  • Cons:
    • Perhaps ahead of its time? (Are we really going to introduce the other corresponding forms?)
    • “Just another list item syntax”; doubles down on “a reply is a list item”
    • Not obvious how to add attributes or extensions
    • I bet {: is used on wiki already, although could be linted away
    • There's no real "wikitext 2.0" (it was just one proposal in 2017), and it's unclear what the transition paths would be. Syntax proposals should be evaluated on their own, not in any broader “wikitext 2.0” context.
  • Open Questions
    • Where would list item attributes go?
    • A variant: {:2 :} {:3 :}
    • Verify that this makes sense in an RTL context (ie, that default directionality at the start/end of the construct is usually correct, etc)
  • Initial straw poll votes: (qty) vote ("comments")
    • (4) +1 ("sounds okay")
    • (1) -1 ("Too similar to existing syntaxes (uses too many of the same characters)")
    • (1) -0.5 ("I originally liked it but not sure of this being the first step we do; this might be a good followup / secondary step maybe")
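The "obvious how to further nest" point can be illustrated with a small Python sketch that reads {: ... :} items into a tree. This is a hypothetical illustration only: the function name is invented, an unmatched :} is kept as literal text, and an unclosed {: simply raises an error rather than modelling any recovery strategy.

```python
def parse_braced_items(text: str) -> list:
    """Parse nested "{: ... :}" items into a tree of strings and sub-lists.

    Hypothetical sketch of the "Wikitext 2.0" bracketing idea: "{:" opens a
    nested item, ":}" closes the innermost open item. A ":}" with nothing
    open is kept as literal text; an unclosed "{:" raises ValueError.
    """
    root: list = []
    stack = [root]
    buf: list[str] = []

    def flush() -> None:
        # Move pending literal characters into the innermost open item.
        if buf:
            stack[-1].append("".join(buf))
            buf.clear()

    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "{:":
            flush()
            child: list = []
            stack[-1].append(child)
            stack.append(child)
            i += 2
        elif pair == ":}" and len(stack) > 1:
            flush()
            stack.pop()
            i += 2
        else:
            buf.append(text[i])
            i += 1
    flush()
    if len(stack) > 1:
        raise ValueError("unclosed '{:' item")
    return root
```

Because every opener pushes one level and every closer pops one, deeper replies are just further {: ... :} pairs inside the current one.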

"Heredoc" <<<...>>>

  • Example
:::: <<< line 1

line 2 >>>
  • Pros
  • Cons
    • “Just a more consistent way to escape text” / not reply-specific
    • Syntax for further nesting is a bit awkward
  • Open Questions
    • How general do we want to make this quote mechanism?
    • How should you attach attributes to list items?
    • Require no space between : and <<< ?
    • Since not reply-specific, do we add id attributes to all lists / all DL lists / add list items which end in {{#~}} or what?
    • Nesting: keep the (alpha)numeric prefix?
    • Q: What if the closing bracket is not found? A: Backtrack and treat the opener as literal text? (This prevents breaking the rest of the page.)
  • Initial straw poll votes: (qty) vote ("comments")
    • (5) +1 ("sounds okay", "Low-risk approach + rfc approved.", "I don’t personally like it, but I can’t think of a good reason why not")
    • (1) ±0 ("more cluttered than the above. And our users will always prefer fewer keystrokes.")
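The "backtrack and treat the open as literal text" answer can be sketched in Python as follows. Everything here is a hypothetical assumption rather than the RFC's specification: the function name, the opener pattern (list markers, at most one space, then <<<), and the simplification that any text after >>> on the closing line is ignored.

```python
import re

# Hypothetical opener: one or more list markers, optional single space, "<<<".
_OPEN = re.compile(r"^([:*#]+) ?<<<")


def parse_heredoc_item(lines: list[str], start: int):
    """Try to read a heredoc list item beginning at lines[start].

    Returns (prefix, body, next_index), or None if the line is not an opener.
    If no closing ">>>" is found, backtracks: the opening line becomes an
    ordinary item whose text keeps the literal "<<<", so later lines are
    left untouched. Text after ">>>" on the closing line is ignored here.
    """
    m = _OPEN.match(lines[start])
    if m is None:
        return None
    prefix = m.group(1)
    candidate = lines[start][m.end():]
    body_lines = []
    i = start
    while True:
        close = candidate.find(">>>")
        if close != -1:
            body_lines.append(candidate[:close])
            return prefix, "\n".join(body_lines).strip(), i + 1
        body_lines.append(candidate)
        i += 1
        if i >= len(lines):
            # No closer anywhere: fall back to a one-line literal item.
            literal = lines[start][len(prefix):].lstrip()
            return prefix, literal, start + 1
        candidate = lines[i]
```

The fallback only consumes the opening line, which is what keeps a forgotten >>> from swallowing the rest of the page.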

Email-style lines starting with >

  • Example
>>>> line 1
>>>>
>>>> line 2
  • Pros
    • Precedent for replies from email quoting
    • Also familiar from markdown and other contexts, so people are used to it
    • Reply-specific, not just another list item
    • Repeated brackets make it clear how many indent levels we’re in
    • Might be feasible to make this apply only at start of line; as such fit in naturally w/ other sol-sensitive constructs (but note that these existing SOL constructs have never played nicely w/ templates; cf T2529)
    • Reasonable way for others to insert a further reply in the middle of a long reply. (But do we want that?)
    • Idiot-proof: no closing symbol, so no way "forgetting to close" will break the rest of the page.
  • Cons
    • Repeated brackets are annoying to write
    • Doesn’t fit naturally into our existing parsing grammar
    • Difficult to cut-and-paste content; difficult to adjust indentation level (in both cases you need to add symbols to every line)
    • Doesn’t play nicely with heredoc syntax, so templates and replies don’t play nicely together.
    • If only works at start of line, again doesn’t play nicely with templates etc
    • Not interoperable w/ existing talk page markup (what if you wanted to use old syntax to respond to a new-style thread, or vice-versa?)
    • No unique “list start” symbol because continuation is the same sequence as list start. And look for {{#~}} to delimit end?
  • Open Questions
    • Syntax for attributes? (Can’t fall back to tag syntax since this is a new construct aliased to existing tags)
    • WP:LISTGAP / Multiple replies at same level.
    • What about:
>>> line 1
>>
>>> Is this paragraph 2 of the “blah blah blah” comment, or did the previous line separate this into two replies (maybe with an empty reply in between), or what?
    • What about:
> > > > Four nested comments?  Or one comment indented to the fourth level?
  • Initial straw poll votes: (qty) vote ("comments")
    • (1) +1 ("sounds okay")
    • (1) ±0 ("i’m not a huge fan of this, it just doesn’t feel “wikitext” to me (or at least, not part of the “good parts” of wikitext, has some similarities with the “bad parts” of wikitext, which break templates etc). But I could be convinced; it has historical precedent.")
    • (4) -1 ("Doesn’t seem to mesh with the rest of our syntax")

Baseline-ish proposal (ie, make work what we have)

  • Example
::: <dd> line 1
line 2 </dd>
  • Pros
    • Almost works already
    • Attributes also work / are obvious
  • Cons
    • Mixing : and <dl> doesn’t work right now; you’d actually have to write block tags for every indent level (but this could be fixed)
    • <dd> without an opening <dl> doesn’t work right now, you’d have to allow that
      • Need to check for consistency w/ HTML5 spec
    • Are we writing wikitext, or HTML?
  • Open questions
    • Maybe HTML syntax, but with a custom tag type, like:
<reply id=foo class=outdent>Line 1

Line 2</reply>
    • Pre-save transform (e.g. ~~~~) doesn’t work inside normal MediaWiki “XML-style tags”, so we’d have to somehow fix that, or implement the tag at the preprocessor level (like <noinclude>)
    • MediaWiki “XML-style tags” also can’t be nested
  • Initial straw poll votes: (qty) vote ("comments")
    • (2) -1 ("minus a million", "NO!")
    • (3) ±0 ("I don’t want this as the primary, but it might be nice to make work if we can, just to improve markup handling", "I concur", "I could get on board with this")
    • (1) -1 (but "I could get on board with this" aka not using as the primary but making it work)

Is it really appropriate to keep using definition lists for comment nesting? As @stjn notes in this topic, it's semantically incorrect to do so. If new syntax is introduced, it just doesn't make sense to me to keep using definition lists for nesting in discussions. Perhaps it could be justified from a usability POV, but it otherwise seems short-sighted.

@cscott, could you make clear whether the "rough consensus" was pertaining to multiline list item syntax in general, or to talk page syntax specifically?

"Wikitext 2.0" {: ... :}
[...]

  • Cons:

Also

  • Requirement for matched starting and ending sequences is a significant change to commenting, and could be easy to screw up.
  • Easy to lose the structure of the discussion when looking at the wikitext, since the multiple lines of the comment aren't indented.

"Heredoc" <<<...>>>
[...]

  • Cons

Again,

  • Requirement for matched starting and ending sequences is a significant change to commenting, and could be easy to screw up.
  • Easy to lose the structure of the discussion when looking at the wikitext, since the multiple lines of the comment aren't indented.

Email-style lines starting with >
[...]

  • Pros

[...]

  • Might be feasible to make this apply only at start of line; as such fit in naturally w/ other sol-sensitive constructs (but note that these existing SOL constructs have never played nicely w/ templates; cf T2529)

T14974 may have been a better link there. That was caused by the "fix" for T2529, and has been an annoying misbehavior ever since.

  • Cons
    • Repeated brackets are annoying to write

On the other hand, people are already very used to repeating the ":::::" syntax on multi-line replies.

  • Doesn’t fit naturally into our existing parsing grammar

It's similar at a grammar level to the use of a line-leading space to delimit a <pre>...</pre> block.

Although in implementation I'd suggest handling it more like you'd probably handle any of the above proposals: collect the lines in the block, parse them, then inject them into the nested structure.

  • Difficult to cut-and-paste content; difficult to adjust indentation level (in both cases you need to add symbols to every line)

Both of these seem to be uncommon needs, and neither is worse than the status quo of using ":::::" or "***".

  • Doesn’t play nicely with heredoc syntax, so templates and replies don’t play nicely together.

Since the heredoc syntax isn't actually implemented yet (as far as I know), it's not too late to change it.

Other than that, it doesn't seem to have any issues with templates that the other proposals (and status quo) don't already have.

  • If only works at start of line,

Err, so do the other proposals. And why would we want to start a comment at someplace else than the start of a line?

  • Not interoperable w/ existing talk page markup (what if you wanted to use old syntax to respond to a new-style thread,

Why would you want to, other than just to be disruptive?

But if you do, it probably won't be much worse than when people mismatch the colons and asterisks now.

or vice-versa?)

It's a pretty simple conversion, just replace the existing colons and/or asterisks with the equal number of ">".

Or just don't, and the resulting visual rendering probably won't be much worse than what you get when you mismatch the colons and asterisks now.
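The conversion described above is mechanical enough to sketch in a few lines of Python. This is a hypothetical helper, assuming only leading : and * characters are replaced; # (numbered) and ; markers are not handled, and the distinction between indents and bullets is deliberately lost.

```python
import re

# Leading run of colon/asterisk indentation markers.
_LEGACY_PREFIX = re.compile(r"^[:*]+")


def legacy_to_angle(line: str) -> str:
    """Replace a leading run of ':' and '*' with the same number of '>'."""
    m = _LEGACY_PREFIX.match(line)
    if m is None:
        return line
    return ">" * len(m.group(0)) + line[m.end():]
```

For example, `**:::My reply is long.` becomes `>>>>>My reply is long.`, keeping the same nesting depth.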

  • No unique “list start” symbol because continuation is the same sequence as list start. And look for {{#~}} to delimit end?

I suggested using blank lines in T230683#5422768. That matches well with what editors already do with the colon-and-asterisk markup, even though for those it produces poor HTML (WP:LISTGAP).

  • Open Questions
    • Syntax for attributes? (Can’t fall back to tag syntax since this is a new construct aliased to existing tags)

Why would attributes even be needed?

  • WP:LISTGAP / Multiple replies at same level.

As above I suggested using blank lines in T230683#5422768 to separate multiple replies at the same level.
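A rough Python sketch of that segmentation, under two assumptions that are not spelled out in the text quoted here: a line consisting only of > characters (like ">>>>") is a paragraph break inside a reply, and a change in the number of leading > characters also starts a new reply. Function names are hypothetical.

```python
def quote_depth(line: str) -> int:
    """Number of leading '>' characters (spaces between them are not skipped)."""
    return len(line) - len(line.lstrip(">"))


def split_replies(lines: list[str]) -> list[tuple[int, list[str]]]:
    """Group lines into (depth, lines) replies.

    A blank line ends the current reply, and so does a change in depth.
    A line made only of '>' characters stays inside the current reply.
    """
    replies = []
    current: list[str] = []
    depth = None
    for line in lines:
        if not line.strip():
            if current:
                replies.append((depth, current))
            current, depth = [], None
            continue
        d = quote_depth(line)
        if current and d != depth:
            replies.append((depth, current))
            current = []
        current.append(line)
        depth = d
    if current:
        replies.append((depth, current))
    return replies
```

Two consecutive replies at the same level are then simply two same-depth groups separated by an empty line.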

  • What about:
> > > > Four nested comments?  Or one comment indented to the fourth level?

The latter, I'd say. It follows naturally from how this really should work with respect to other SOL constructs like tables and lists.

  • Initial straw poll votes: (qty) vote ("comments")
    • (1) ±0 ("i’m not a huge fan of this, it just doesn’t feel “wikitext” to me (or at least, not part of the “good parts” of wikitext, has some similarities with the “bad parts” of wikitext, which break templates etc). But I could be convinced; it has historical precedent.")
    • (4) -1 ("Doesn’t seem to mesh with the rest of our syntax")

To me, this is the only proposal that feels "wikitext" and meshes with the rest of our syntax.

Although in implementation I'd suggest handling it more like you'd probably handle any of the above proposals: collect the lines in the block, parse them, then inject them into the nested structure.

I think this would probably be necessary from a usability perspective if one marker is going to be added per line, since otherwise the output of the inline editing field would behave differently to other editors' output (see T241388 and T240696). Currently this is mostly fine since users know they're actually adding list items, but it'd probably be confusing and counterintuitive otherwise.

To me, this is the only proposal that feels "wikitext" and meshes with the rest of our syntax.

I think I largely agree with this, although I think start and end tags could still work if they could be made simple enough. Wikitext markup generally tries to be intuitive by requiring as few characters as possible for formatting. If the software generates that markup most of the time, rather than users regularly being expected to write it themselves, it's less of a problem here than it would be elsewhere (if the inline editor always works, that is...), but it's still not ideal to introduce overly complicated constructs when simpler ones would suffice.

To some degree "HTML markup on talk pages" and "wikitext markup on talk pages" are orthogonal. If we can agree on what "proper" wikitext looks like, then we can certainly tweak the parser so that it outputs different semantic tags for that markup when in the talk namespace. That's oversimplifying a bit -- a bunch of downstream stuff would need to know about the output change as well (VE, gadgets, CSS, etc) -- but all that downstream stuff would also need to change if we added new markup specifically for talk page replies. So conceptually we can probably separate the concerns.

I'll also note that "just another list item syntax" was consistently listed as both a pro and a con wherever it appeared in those meeting notes. ;) As a pro, it means that nothing forces you to use the new syntax unless you're specifically trying to do something impossible with the current markup. So *most* comments would/could look exactly the same as they do now (there's a non-technical policy question hiding in there; perhaps the wiki or the tool would choose to enforce a certain syntax, but in most cases there's no technical reason to; they both render to the same HTML). As a con, it means that (absent an orthogonal change to list rendering on talk pages) most comments are expressed and rendered using list item semantics.

(FWIW, I believe that markdown renders > as <blockquote> which isn't accurate in our use case; it's not entirely clear what a proper "semantic" HTML5 element for a comment actually should be! I think current W3C/WHATWG guidance would be to use RDFa and/or microdata to indicate the semantic content, not rely on the tag name. Concrete discussion implicit in T234966, although that concentrates mostly on the subtree corresponding to the signature block.)

"New syntax specifically for comments" (the flip side of "just another list item syntax") has a similar non-technical policy question, wrt this situation:

: Hi, I'm power user one! I've been editing for decades, never gonna change my markup.
>> Hi, I'm a newbie, I heard this was the way to go now!
::: Ugh.  Newbies.  No one's going to report me for WP:MOS:VAR if I change those to colons, are they?
>>>> But I need to do something new, which the colon syntax doesn't support, like, uh, embed a table?  Here goes!
>>>>
>>>> {|
>>>> :{| class="wikitable"
>>>> |+ Caption
>>>> ! Header cell !! Header cell
>>>> |-
>>>> | Content cell || Content cell
>>>> |}

Depending on what policies the wiki decides to enforce, this sort of thing can greatly complicate tool support. In addition to two different wikitext *input* formats to parse on talk pages, depending on decisions made, you might end up with two different *output HTML* structures that tools need to parse, in the name of "more semantic" markup. (If > is instead "just another list item syntax", aliased to : with some different newline handling, it only works for definition lists, not any other sort of list...)

On that topic: "different newline handling" seems to be the crux of why > feels different from other wikitext markup. It's also been a problem with integrating the <poem> extension (T54061), which also had its own newline and whitespace handling that didn't fit neatly into the rest of the parser. (I think the proposal is to make whitespace significant between > which isn't the case w/ most other wikitext markup, although that seems less core to the proposal.)

Returning to the issue of forking input and/or output format: this is also an issue with the "baseline" proposal, which replaces colon in some cases with an html-ish tag. On the input side we're bound to see mixtures; indeed one of the issues flagged was that currently *all* colons need to be replaced with the htmlish tag, which brings up WP:MOS:VAR. If that html-ish tag is "<dd>" then it is "just another list item syntax", but if the tag is something more "semantic" then we have the "new syntax for comments" issue with gadgets and tools trying to make sense of comments with a variety of different HTML tag structures.

Finally, having to prefix every line with an increasing number of symbols seems to be one of those "both pro and con" things. Some folks like the constant reminder of your current indentation level. Others find it distracting and inconvenient line noise. For the common case of single-paragraph comments, there isn't really any significant difference.

I don't think there's a clear "right answer" for this syntax; every choice involves some degree of tradeoff.

Actually, this discussion is exactly why I suggested reframing the syntax-choice discussion as: "what syntactical support is needed to proceed that doesn’t foreclose future cleanup options”. Multi-line list items have always been a big pain point, especially in talk pages. Instead of trying to propose totally new syntax that is narrowly targeted to talk pages, or having to bikeshed the choice of what kind of new list syntax we want, or whether we want new syntax at all... all of which are going to bog down the talk page reboot project unnecessarily, I feel it is useful to refocus our attention narrowly on what is needed to help that project succeed for everyone's sake (not just for our own convenience as developers). Certainly, there are arguments to be made for better syntax for talk pages or "indent-pre" (I dislike that syntax personally, for the record) or whatever else we are unhappy about in wikitext. The question there is whether we should be making piecemeal decisions or thinking about new syntax more holistically. I don't know what approach will succeed, but what I do know is that we should avoid getting the first minimal talk page reboot project stuck on new syntax proposals for list items or only talk pages.

So, that is precisely the reason why I recommended that we piggyback on top of the heredoc syntax which has the benefit that its use for one narrow use case has been approved and is something that will definitely be implemented in that form since it is part of the Parsing Team's long term goals of bringing more structure to templates and wikitext. In addition, it is also clear that heredoc syntax is a more generic nesting / new-doc / nesting mechanism, and it seems like a totally natural fit to adopt it for list items (and other contexts, if required, but probably nothing else requires it at this time). So, why invent yet another syntactical construct instead of reusing it? And, it helps with the effort to bring more structure to talk page comments incrementally. Certainly, no one is forced to use it, and the talk page UI tool can emit that syntax for those who use it. And, in parallel, if new syntax for list items or talk pages is really needed, we can work that out separately and roll it out afterwards.

But, in the end, I feel this is a decision for the Editing Team ( cc @ppelberg ) to make after factoring in their project goals / deliverables / roadmap besides all the feedback in various forums (including this one).

Ugh. Newbies. No one's going to report me for WP:MOS:VAR if I change those to colons, are they?

Maybe they would.

(I think the proposal is to make whitespace significant between > which isn't the case w/ most other wikitext markup, although that seems less core to the proposal.)

It's more a side effect. The proposal is "collect the block of consecutive lines beginning with the same number of >, run s/^>+ *// over the block, parse it as an independent subtree,[1] then wrap the result in whatever HTML is needed to make it display as a comment in the discussion".

It doesn't happen with something like * * * * because list items just inject <ol><li> or </li><li> or the like, without the parsing as an independent subtree bit.

I wouldn't really care if > > > > wound up rendering like * * * * does, as a level-1 comment where the line begins with ">". But it seems to me that that might be more complicated to make happen while all other SOL syntax still works as expected inside a comment.

[1]: i.e. basically the same thing we do for extension tags like <ref>, where we collect the text inside the tag and process it independently. Also not too different from the future envisioned for "balanced templates", where we extract the whole template transclusion, process it to a DOM subtree, then inject that DOM subtree into the parent document.

Actually, this discussion is exactly why I suggested reframing the syntax-choice discussion as: "what syntactical support is needed to proceed that doesn’t foreclose future cleanup options”. Multi-line list items have always been a big pain point, especially in talk pages. Instead of trying to propose totally new syntax that is narrowly targeted to talk pages, or having to bikeshed the choice of what kind of new list syntax we want, or whether we want new syntax at all... all of which are going to bog down the talk page reboot project unnecessarily, I feel it is useful to refocus our attention narrowly on what is needed to help that project succeed for everyone's sake (not just for our own convenience as developers).

New syntax is being proposed on all sides. We should take the time to consider the best new syntax, rather than arbitrarily choosing one for expediency.

I think actual "multi-line list syntax" (versus talk page comments) might easily be solved by redefining list syntax slightly, to run to the next newline not nested inside some other tag/construct. In other words, make

*** <div>Text text

Text text</div>

work like it already does with an opaque extension tag like <syntaxhighlight>.

Assuming multi-line list items are even a concern outside of the abuse of wikitext list syntax for talk page formatting, of course.
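A minimal Python sketch of the redefinition suggested above: a list item ends at the first newline that is not inside an open tag. As a simplifying assumption, only `<div>`, `<table>` and `<blockquote>` are tracked, and stray closing tags are ignored; real wikitext has many more constructs that would need the same treatment. The function name is hypothetical.

```python
import re

# Hypothetical: only these block tags are tracked in this sketch.
_TAG = re.compile(r"<(/?)(div|table|blockquote)\b[^>]*>", re.IGNORECASE)


def list_item_end(text: str, start: int = 0) -> int:
    """Return the index of the newline that ends a list item starting at `start`.

    The item runs to the first newline not nested inside an open tracked tag,
    or to len(text) if there is no such newline.
    """
    depth = 0
    i = start
    while i < len(text):
        if text[i] == "\n" and depth == 0:
            return i
        m = _TAG.match(text, i)
        if m:
            depth += -1 if m.group(1) else 1
            depth = max(depth, 0)  # ignore stray closing tags
            i = m.end()
        else:
            i += 1
    return len(text)
```

For the `*** <div>Text text` example, the blank line inside the `<div>` no longer ends the item; the item ends at the newline after `</div>`.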

So, that is precisely the reason why I recommended that we piggyback on top of the heredoc syntax which has the benefit that its use for one narrow use case has been approved and is something that will definitely be implemented in that form since it is part of the Parsing Team's long term goals of bringing more structure to templates and wikitext.

"one narrow use case" being a key phrase there, IMO.

In addition, it is also clear that heredoc syntax is a more generic nesting / new-doc / nesting mechanism,

This does not seem clear to me.

Certainly, no one is forced to use it, and the talk page UI tool can emit that syntax for those who use it.

Everyone is forced to read it in the wikitext, even if they don't personally produce it.

And, in parallel, if new syntax for list items or talk pages is really needed, we can work that out separately and roll it out afterwards.

Then we wind up with multiple different syntaxes trying to solve essentially the same problem. That does not seem like a win.

*** <div>Text text

Text text</div>

work like it already does with an opaque extension tag like <syntaxhighlight>.

That's more-or-less the "baseline" proposal in the meeting notes above, or at least a variant on it.

One implicit-but-important consideration here is compatibility with old wikitext. "Just" changing the newline semantics of <div> is something that my intuition says would actually be incredibly difficult to roll out, because we'd have to lint the entire universe of wikitext for this case first to prevent breaking existing pages. But my intuition could be wrong. At some point we need to do some large-scale dump greps / linting to determine what sorts of showstoppers are actually present in existing wikitext. For the email-list syntax proposal, for example, it worries me a lot that a single > character that happened to be at the start of a line would be enough to break the page. There's obviously a trade-off here: the "uglier" the syntax, the less likely it is to already exist in large quantities on existing pages.
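As a starting point for that kind of dump grep, a hypothetical Python sketch that counts lines beginning with a literal > per page. It assumes the pages have already been extracted from a dump as (title, wikitext) pairs; reading the dump itself is not shown. It would over-count (for instance inside <nowiki> or <pre>) and would miss lines produced by templates, so it can only give a rough upper bound.

```python
import re

# Any line that begins with '>' would be affected by an email-style syntax.
_SOL_GT = re.compile(r"^>", re.MULTILINE)


def count_sol_gt(pages) -> dict[str, int]:
    """Map page title -> number of lines starting with '>'.

    `pages` is any iterable of (title, wikitext) pairs; pages with no
    matching lines are omitted from the result.
    """
    hits = {}
    for title, text in pages:
        n = len(_SOL_GT.findall(text))
        if n:
            hits[title] = n
    return hits
```

The whitespace-masking pattern `<span class="foo"` followed by a line starting with `>something` is exactly the kind of existing content such a scan would flag.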

I do find @ssastry's heredoc argument on the grounds of "it's already being implemented for similar purposes elsewhere" to be quite compelling. I also agree on the useful clarity of it being a fairly generic thing that can be used to bypass wikitext newline behavior.

Everyone is forced to read it in the wikitext, even if they don't personally produce it.

For talk pages in particular, that seems... less-relevant. It's an environment with a strong cultural norm around not editing the wikitext others produce, no? And people using whatever multiline syntax we introduce will either be doing so via a tool (and so filtering all their interactions through that tool, without really needing to understand the construct), or by having learned the new syntax because they agree it's useful. The former are a whole class of people who won't be included in that "everyone is forced to read it".

You could argue against it for list items on article pages as well, I suppose... but any argument there would presumably run up against the "we're using it for templates, people will get used to it there anyway" counter.

Actually, this discussion is exactly why I suggested reframing the syntax-choice discussion as: "what syntactical support is needed to proceed that doesn't foreclose future cleanup options". Multi-line list items have always been a big pain point, especially in talk pages. Instead of proposing totally new syntaxes that are narrowly targeted to talk pages, or having to bikeshed the choice of what kind of new list syntax we want, or whether we want new syntax at all... all of which are going to bog down the talk page reboot project unnecessarily, I feel it is useful to refocus our attention narrowly on what is needed to help that project succeed for everyone's sake (not just for our own convenience as developers).

New syntax is being proposed on all sides. We should take the time to consider the best new syntax, rather than arbitrarily choosing one for expediency.

I was saying: heredoc is NOT new syntax. But, I suppose that requires an agreement that it is a generic construct vs. being a special-case construct for template args. That is definitely up for discussion since I see you don't agree with that positioning. But, if for a moment you agreed that it is a generic construct, my proposal is indeed: "let us take the time to figure out the broader syntactic issues".

I think actual "multi-line list syntax" (versus talk page comments) might easily be solved by redefining list syntax slightly, to run to the next newline not nested inside some other tag/construct. In other words, make

*** <div>Text text

Text text</div>

work like it already does with an opaque extension tag like <syntaxhighlight>.

This is T134469 and as @cscott indicated earlier "fixing" that would probably cause a fair amount of breakage and while it is still something we want to do, it would probably require linting and some degree of cleanup. Not something that can be done now. We thought of tackling it during the Tidy replacement project, but decided to wait for Parsoid to become the default.

Assuming multi-line list items are even a concern outside of the abuse of wikitext list syntax for talk page formatting, of course.

I have had some editors (at wikimania / other venues I don't remember) ask me about the inability to embed multi-line constructs inside list items, so yes, it is a concern beyond talk page formatting.

So, that is precisely the reason why I recommended that we piggyback on top of the heredoc syntax which has the benefit that its use for one narrow use case has been approved and is something that will definitely be implemented in that form since it is part of the Parsing Team's long term goals of bringing more structure to templates and wikitext.

"one narrow use case" being a key phrase there, IMO.

Right, but that is why there is an RFC to consider this along with other proposals. Anyway, my proposal requires seeing heredoc as a generically useful mechanism that is usable in other contexts. If not, then yes, it is similar to the other new-syntax proposals.

I feel it is useful to refocus our attention narrowly on what is needed to help that project succeed for everyone's sake...
...But, in the end, I feel this is a decision for the Editing Team ( cc @ppelberg ) to make after factoring in their project goals / deliverables / roadmap besides all the feedback in various forums (including this one).

I appreciate you saying this, @ssastry, and I agree. I think it is important for the Editing Team to share a clear set of requirements for this new syntax and then let those requirements guide the approach.

To this end, we – the Editing Team – are doing just this: discussing these requirements as a team and will have something to share here next week.


Note: I hope my brevity and lack of acknowledgement of the rest of the conversation doesn't come across as dismissive or unappreciative, I'm just wanting to make sure it's clear to everyone what the Editing Team is thinking about and when you can expect to hear from us again.

: Hi, I'm power user one! I've been editing for decades, never gonna change my markup.

On the other hand, it's nominally acceptable to correct people's formatting errors, though it's not clear what the actual community reaction would be if people started fixing talk page nesting all the time. It would have to happen at some point depending on the syntax, since there would be tens of thousands of ongoing discussions at the point in time that the syntax is introduced, but it would probably be somewhat messy unless the implementation's unusually smooth.

This is a comparison table of the various proposals and existing constructs that I'm currently aware of, with a focus on the syntax differences. (n is the nesting level; ⏎ indicates a line break; the "Compatible?" column indicates whether the proposal would allow mixing of old and new syntax.) There (understandably) aren't a lot of editors putting out fully realized proposals, so I've had to guess how Wnt's and Jeblad's proposals would have worked syntax-wise.

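| Proposal | S/O/N | Start marker | Length | Para. marker | Length | End marker | Length | Other changes | Compatible? |
|---|---|---|---|---|---|---|---|---|---|
| Status quo | | ⏎::, ... | n+1 | ⏎::, ... | n+1 | ~~~~⏎ | 5 | | Yes (by definition) |
| · (variant) | | ⏎::, ... | n+1 | {{pb}} | 6 | ~~~~⏎ | 5 | | Yes (by definition) |
| · (Russian) | | ⏎** | n+1 | <br> | 4 | ~~~~⏎ | 5 | | Yes (by definition) |
| Wikitext 2.0 | 4/2/0 | ⏎::{:, ... | n+2 | ⏎⏎ | 2 | ~~~~:}⏎ | 7 | | Yes |
| Heredoc | 5/0/1 | ⏎::<<<, ... | n+3 | ⏎⏎ | 2 | ~~~~>>>⏎ | 8 | | Yes |
| Email | 1/4/1 | ⏎>> | n+1 | ⏎>> | n+1 | ~~~~⏎ | 5 | | Maybe? |
| XML | 0/3/3 | ⏎::<dd>, ... | n+5 | ⏎⏎? | 2? | ~~~~</dd>⏎ | 10 | | Yes |
| Jc86035 | | ⏎>2:, ... | floor(log(n))+3 | ⏎⏎ | 2 | ~~~~⏎ | 5 | ~~~~ result, others | Maybe? |
| Wnt | | ⏎::=, ... | n+2 | ⏎::, ... | n+1 | ~~~~⏎? | 5? | := expansion, others | Maybe? |
| Jeblad | | (extension tag) | ~n+6? | ⏎⏎ | 2 | (signature, tag) | ~12? | ~~~~ result, reply-to, others | No |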
  • I've chosen the shortest possible character combinations in all cases.
  • My proposal is from my comment at 17:15, 27 February 2019 (UTC). It's sort of similar to the email-like system proposed in the discussion, but line breaks work normally and the nesting level is indicated as an integer, and there are a bunch of other possible bells and whistles which would be enabled primarily through extension tags (basically T230653).
  • Wnt's proposal is from his comment at 13:59, 25 February 2019 (UTC). It's fairly similar to the status quo, but has a few additional features influenced by Structured Discussions. I've assumed that "some other sequence that indicates 'end of reply'" would be generated by the four tildes, since it's the shortest possible way it could be done. The equals sign is supposed to expand to become an anchor. Wnt also proposed the use of % as an 8-level "outdent" character to avoid excessive indentation.
  • @jeblad's proposal is a bit hard to summarize, but seems to be mainly influenced by Structured Discussions. He doesn't seem to specify how he wants indentation to look (or I wasn't able to figure it out), so I've just assumed the extension tag would be something like ::<post>...</post>, since this would be shorter than including any attributes. I think he probably had something different in mind, though.

Thanks for the concise summary. A quick observation is that <<<, {:, <post>, and <dd> are conceptually somewhat similar. The difference perhaps is that :{:, <post>, and <dd> demarcate the entire comment with balanced syntactical markers, and the intent there is for them to be used uniformly everywhere, although they would also be compatible with using them only when needed. <<< is used only on demand and is technically not new syntax if it is implemented in wikitext for other purposes; the proposal only advocates expanding that use here. Strictly speaking, <<< need not be attached to :: either. It could conceptually be used anywhere it is needed, although as a matter of style, using it at the very start might look cleaner.

So, in the interest of reducing the number of proposals to evaluate, I would pool them all in one class of proposals and distinguish them in terms of whether they propose custom syntax only for talk-page list items, for all list items, or reuse syntax from other uses. And perhaps other concerns specific to the proposals (ex: <dd> has other issues).

"Just" changing the newline semantics of <div>

Note that's not what I proposed. There's nothing special about <div> there. The change in semantics is to SOL list markup, to "ignore" newlines embedded deeper in the DOM subtree when looking for the end of the list item.

But yes, if we wanted to investigate that proposal we'd have to do a bunch of linting to find out what it might break. There's probably wikitext out there relying on Remex closing unclosed tags at the </li> (i.e. relying on the behavior that T11996 considers a bug).
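Below is a minimal sketch of the rule described above. It assumes plain HTML tags are the only nesting constructs that matter, so templates, tables and extension tags are ignored, and `list_item_end` is a hypothetical name. The idea is to scan forward from the start of the list item, keep a count of open HTML elements, and end the item only at a newline seen at depth zero. The same depth check hints at what a lint pass for this change would look for. Items whose first newline falls inside an unclosed element are exactly the ones whose rendering would change.

```python
import re

# Opening or closing HTML tag: group 1 is "/" for a closing tag, group 2 the
# tag name, group 3 "/" for a self-closing tag such as <br/>. Rough: it does
# not understand quoted attribute values containing ">" or "/>".
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\b[^>]*?(/?)>")
VOID_TAGS = {"br", "hr", "img", "wbr"}


def list_item_end(text: str, start: int = 0) -> int:
    """Return the index of the newline that ends the list item at `start`.

    Newlines inside an open HTML element do not end the item. Returns
    len(text) if the item runs to the end of the input.
    """
    depth = 0
    i = start
    while i < len(text):
        if text[i] == "<":
            m = TAG_RE.match(text, i)
            if m:
                closing, name, self_closing = m.groups()
                if name.lower() not in VOID_TAGS and not self_closing:
                    # Clamp at zero so a stray close tag cannot go negative.
                    depth = max(depth + (-1 if closing else 1), 0)
                i = m.end()
                continue
        elif text[i] == "\n" and depth == 0:
            return i
        i += 1
    return len(text)


if __name__ == "__main__":
    page = "*** <div>Text text\n\nText text</div>\n* Next item"
    print(repr(page[: list_item_end(page)]))
    # '*** <div>Text text\n\nText text</div>'
```

Today's parsers end the item at the first newline whatever the tag depth, which is why the <div> in that example gets split.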

Obviously a trade-off here -- the "uglier" the syntax, the less likely it is to already exist in large quantities on existing pages.

OTOH, the uglier the syntax the less likely people will be to want to use it. ;)

Everyone is forced to read it in the wikitext, even if they don't personally produce it.

For talk pages in particular, that seems... less-relevant. It's an environment with a strong cultural norm around not editing the wikitext others produce, no?

Even if I don't edit your comment, I have to edit the page and therefore see the syntax you used and figure it out well enough to be able to put my reply in the right place.

I was saying: heredoc is NOT new syntax.

It doesn't actually exist yet in MediaWiki, even in master.

But, I suppose that requires an agreement that it is a generic construct vs. being a special-case construct for template args. That is definitely up for discussion since I see you don't agree with that positioning. But, if for a moment you agreed that it is a generic construct, my proposal is indeed: "let us take the time to figure out the broader syntactic issues".

Indeed, if we want to use it in a broader context we need to figure out what those contexts are and what it means in those contexts. The proposal in T114432 only defines it "immediately after = or | inside double braces", and seems to intend it as simply ignoring any contained = or | or }} for the purposes of finding the end of the template argument.

Wikitext list context might be one, as proposed here, to ignore the newlines when looking for the end of the list.

Also table syntax?

{|
| <<< This is one cell, despite the || in here that would normally split it into two.
|-
| This is still the original cell.
|}
And even this is still the original cell. >>>
|}

Links, like [[Foo|<<< Some text with ]] in it >>>]]? Or [[What|<<<would [[this]] do?>>>]]?

<ref><<<Can we do subrefs like <ref>this</ref> now?>>></ref>?

{{Will it get|
* confusing <<< if the | different contexts
are nested?
>>>}}
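To make the T114432 behaviour concrete, here is a minimal sketch with a hypothetical function name. It splits the text between {{ and }} into arguments at top-level | characters. Everything from <<< to the next >>> is treated as opaque, so |, = and }} inside it do not end the argument. For simplicity it accepts <<< anywhere, which is looser than T114432's "immediately after = or |" rule. It also does not handle nested heredocs or [[...]] links.

```python
def split_template_args(inner: str) -> list[str]:
    """Split template text (without the outer {{ }}) at top-level '|'.

    Text from '<<<' to the next '>>>' is opaque: '|', '=' and '}}' inside it
    do not end the argument, and the markers are kept in the output. Nested
    {{ ... }} is skipped too. Links ([[a|b]]) are not handled.
    """
    parts: list[str] = []
    current: list[str] = []
    braces = 0
    in_heredoc = False
    i = 0
    while i < len(inner):
        if not in_heredoc and inner.startswith("<<<", i):
            in_heredoc, token = True, "<<<"
        elif in_heredoc and inner.startswith(">>>", i):
            in_heredoc, token = False, ">>>"
        elif not in_heredoc and inner.startswith("{{", i):
            braces, token = braces + 1, "{{"
        elif not in_heredoc and braces and inner.startswith("}}", i):
            braces, token = braces - 1, "}}"
        elif not in_heredoc and not braces and inner[i] == "|":
            parts.append("".join(current))  # top-level '|': argument ends
            current = []
            i += 1
            continue
        else:
            token = inner[i]
        current.append(token)
        i += len(token)
    parts.append("".join(current))
    return parts


if __name__ == "__main__":
    print(split_template_args("Foo|a=1|<<<x|y}}z>>>"))
    # ['Foo', 'a=1', '<<<x|y}}z>>>']
```

Applied to the inside of the {{Will it get|...}} example, this loose version returns two arguments. That is exactly why the nesting question matters: under a stricter rule, that <<< would not be recognized at all.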
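| Proposal | S/O/N | Start marker | Length | Para. marker | Length | End marker | Length | Other changes | Compatible? |
|---|---|---|---|---|---|---|---|---|---|
| Status quo | | ⏎::, ... | n+1 | ⏎::, ... | n+1 | ~~~~⏎ | 5 | | Yes (by definition) |
| · (variant) | | ⏎::, ... | n+1 | {{pb}} | 6 | ~~~~⏎ | 5 | | Yes (by definition) |
| · (Russian) | | ⏎** | n+1 | <br> | 4 | ~~~~⏎ | 5 | | Yes (by definition) |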

Note none of these really have an "end" marker. If you forget the ~~~~, the page structuring still works.

The "variant" is the only one with a real paragraph break. The others visually create a break, but e.g. screen readers will probably not read it as such.

BTW, on enwiki I've seen people sometimes use ⏎::⏎:: for a paragraph break, as it makes the break a bit more consistent in the wikitext.

| Wikitext 2.0 | 4/2/0 | ⏎::{:, ... | n+2 | ⏎⏎ | 2 | ~~~~:}⏎ | 7 | | Yes |
| Heredoc | 5/0/1 | ⏎::<<<, ... | n+3 | ⏎⏎ | 2 | ~~~~>>>⏎ | 8 | | Yes |

On the other hand, if you forget/mangle the ":}" or ">>>" here then the whole rest of the page is considered as being part of the comment.
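Here is a tiny sketch of that failure mode. It assumes, hypothetically, that a heredoc-style comment simply runs to the next >>>. If the marker is missing, the search for it only stops at the end of the page.

```python
def heredoc_comment_end(page: str, start: int, close: str = ">>>") -> int:
    """Return the index just past the close marker of a comment at `start`.

    Hypothetical helper: if the close marker is missing, the comment runs to
    the end of the page and swallows every later reply and section.
    """
    end = page.find(close, start)
    return len(page) if end == -1 else end + len(close)


if __name__ == "__main__":
    broken = ":: <<< Reply one ~~~~\n:: Reply two ~~~~\n== Next section =="
    print(heredoc_comment_end(broken, 3) == len(broken))  # True
```

With the line-based : syntax, by contrast, each line's nesting comes from its own prefix. A forgotten ~~~~ therefore only affects attribution, not the structure of later replies.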

| Email | 1/4/1 | ⏎>> | n+1 | ⏎>> | n+1 | ~~~~⏎ | 5 | | Maybe? |

Again, if you forget the ~~~~ here it doesn't break the rest of the page.

A paragraph break, as I proposed it, would be ⏎>>⏎>>. {{pb}} would still work too, and so would <br>, visually.
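Below is a minimal sketch of how the email-style syntax could be read. It assumes contiguous markers (>> rather than > >) and drops at most one space after the markers. `parse_email_quotes` is a hypothetical name. Consecutive lines with the same number of > form one block, and the stripped text is ordinary wikitext. A marker-only line such as >> therefore becomes an empty line, which is the usual paragraph break.

```python
def parse_email_quotes(wikitext: str) -> list[tuple[int, str]]:
    """Split email-style quoted wikitext into (depth, inner wikitext) blocks.

    depth is the number of leading '>' characters; consecutive lines with the
    same depth form one block. Markers and one following space are stripped,
    so a marker-only line like '>>' becomes an empty line (a paragraph break).
    """
    blocks: list[tuple[int, list[str]]] = []
    for line in wikitext.split("\n"):
        depth = len(line) - len(line.lstrip(">"))
        inner = line[depth:]
        if inner.startswith(" "):
            inner = inner[1:]
        if blocks and blocks[-1][0] == depth:
            blocks[-1][1].append(inner)  # same depth: same block
        else:
            blocks.append((depth, [inner]))  # depth changed: new block
    return [(depth, "\n".join(lines)) for depth, lines in blocks]


if __name__ == "__main__":
    talk = ">> First paragraph.\n>>\n>> * an embedded list ~~~~\n>>> A reply. ~~~~"
    for depth, text in parse_email_quotes(talk):
        print(depth, repr(text))
    # 2 'First paragraph.\n\n* an embedded list ~~~~'
    # 3 'A reply. ~~~~'
```

The same function also shows the compatibility risk raised earlier in the thread. Existing text that merely happens to start a line with > (for example "> 5 kg is the limit") comes out as a depth-1 quote.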

I was saying: heredoc is NOT new syntax.

It doesn't actually exist yet in MediaWiki, even in master.

I was sloppy in my choice of words. What I meant was: "heredoc is not new syntax in the sense that its existence is guaranteed independent of what we do here".

"Just" changing the newline semantics of <div>

Note that's not what I proposed. There's nothing special about <div> there. The change in semantics is to SOL list markup, to "ignore" newlines embedded deeper in the DOM subtree when looking for the end of the list item.

Yes, I apologize, my wording was sloppy. What you said is what I meant. It's changing how SOL list markup works.
@ssastry mentioned T134469: doBlockLevels() inserts <p> and </p> randomly with no regard for HTML validity, which I believe is part, but not all, of the problem. For example, this wikitext:

::<dl><dd>foo
bar</dd></dl>

Currently results in:

<dl><dd><dl><dd><dl><dd>foo</dd></dl></dd></dl>
bar</dd></dl>

There are no <p> tags involved, and frankly the place where the bar ends up is baffling even to me. (But at least Parsoid and the legacy parser agree, even if I don't understand why.)

We'd certainly like to rationalize this behavior, along with the rest of the doBlockLevels craziness, but doing so will involve a lot of time-consuming wikilint work on the projects before we could alter those behaviors.

Links, like [[Foo|<<< Some text with ]] in it >>>]]? Or [[What|<<<would [[this]] do?>>>]]?
<ref><<<Can we do subrefs like <ref>this</ref> now?>>></ref>?

Moved discussion of this interesting example to T114432#5793209. The last is actually written as:

{{#tag:ref|<<<We can do subrefs like <ref>this</ref> now!>>>}}

For the record: (Russian Wikipedia’s) Convenient Discussions script should also insert {{pb}} after two line breaks, but not everyone, obviously, uses that ability (I personally don’t, because it seems strange to insert a paragraph where paragraphs are typically not rendered).

As for the general proposal, isn’t it a bit strange to come up with new syntax motivated by something used as infrequently as tables in discussions? I agree that it’s hard to write them correctly in the current layout, but it seems to me that something more generic, like *+ Part of the same list item (not a concrete proposal), could cover 80% of the cases for this need without the dramatic upheaval of what you expect from wikitext. Markdown has something like this, using a line break and a leading space to denote a continuing list item, even though its list syntax is wildly inferior to ours. As for the remaining 20%, it may be easier to add something like a parser function for tables that could be used in these contexts rather than a ‘break wikitext’ syntax.

As a general note, I don’t agree that a worse syntax will be less likely to be used by people. As long as it provides a helpful function, people put up even with stuff like <onlyinclude></onlyinclude> in articles, which is hundreds of times worse than whatever the result here will be.
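The *+ idea above is explicitly not a concrete proposal, so what follows is only a sketch under invented rules. A line whose list prefix is followed by + continues the previous item, provided that item has the same prefix. `group_list_items` is a hypothetical name. Continuation lines are joined with newlines, so a single item's body can hold a table or other multi-line wikitext.

```python
import re

# Hypothetical rule: "<prefix>+" continues the previous item with the same prefix.
ITEM_RE = re.compile(r"^([*#:]+)(\+?) ?(.*)$")


def group_list_items(wikitext: str) -> list[tuple[str, str]]:
    """Group lines into (list prefix, item body) pairs; non-list lines get ''."""
    items: list[tuple[str, list[str]]] = []
    for line in wikitext.split("\n"):
        m = ITEM_RE.match(line)
        if m is None:
            items.append(("", [line]))  # ordinary, non-list line
            continue
        prefix, plus, body = m.groups()
        if plus and items and items[-1][0] == prefix:
            items[-1][1].append(body)  # continuation of the previous item
        else:
            items.append((prefix, [body]))
    return [(prefix, "\n".join(lines)) for prefix, lines in items]


if __name__ == "__main__":
    text = "* First item\n*+ {|\n*+ | a table cell\n*+ |}\n* Second item"
    for prefix, body in group_list_items(text):
        print(prefix, repr(body))
    # * 'First item\n{|\n| a table cell\n|}'
    # * 'Second item'
```

Like the Markdown continuation rule, this keeps every line visibly marked as belonging to a list item. The cost is that pasted content still needs a marker added to each line.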

For example, this wikitext:

::<dl><dd>foo
bar</dd></dl>

Currently results in:

<dl><dd><dl><dd><dl><dd>foo</dd></dl></dd></dl>
bar</dd></dl>

There are no <p> tags involved, and frankly the place where the bar ends up is baffling even to me. (But at least Parsoid and the legacy parser agree, even if I don't understand why.)

I think I can explain it. Deconstructing the HTML:

  • <dl><dd><dl><dd> comes from the ::
  • <dl><dd>foo comes from the wikitext <dl><dd>foo
  • </dd></dl></dd></dl> is inserted because the parser sees that the next line doesn't begin with : or ::, so it closes the lists that were opened for the :: at the start of this line.
  • bar</dd></dl> again comes from the wikitext bar</dd></dl>.

That winds up balanced as far as something like Remex is concerned, even though it's probably not what was actually intended.

If we replace the :: with **, the HTML (before Remex fixes it up) would look something like

<ul><li><ul><li><dl><dd>foo</li></ul></li></ul>
bar</dd></dl>

which makes it a bit more clear what came from where.
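The deconstruction above can be reproduced with a deliberately simplified model of the legacy start-of-line list handling. The model covers only the *, # and : prefixes and copies each line's body through untouched, which is exactly why the inner <dl><dd> plays no part in deciding which lists to close. Paragraphs, ; items, whitespace trimming and Remex cleanup are all left out. `render_lists` and `list_prefix` are hypothetical names, and the newline placement in real parser output may differ for other inputs.

```python
# Simplified model of legacy start-of-line list handling. Assumptions: only
# '*', '#' and ':' prefixes; no paragraphs, ';' items, trimming or Remex.
LIST_TAGS = {"*": ("ul", "li"), "#": ("ol", "li"), ":": ("dl", "dd")}


def list_prefix(line: str) -> str:
    """Return the run of list characters at the start of a line."""
    i = 0
    while i < len(line) and line[i] in LIST_TAGS:
        i += 1
    return line[:i]


def render_lists(wikitext: str) -> str:
    out = ""
    prev = ""  # list prefix of the previous line
    for n, line in enumerate(wikitext.split("\n")):
        prefix = list_prefix(line)
        common = 0
        while common < min(len(prev), len(prefix)) and prev[common] == prefix[common]:
            common += 1
        # Close, innermost first, every list the new line no longer shares.
        for ch in reversed(prev[common:]):
            outer, item = LIST_TAGS[ch]
            out += f"</{item}></{outer}>"
        # Same or shallower depth: finish the item and start a sibling item.
        sibling = bool(prefix) and common == len(prefix)
        if sibling:
            out += f"</{LIST_TAGS[prefix[-1]][1]}>"
        if n > 0:
            out += "\n"
        if sibling:
            out += f"<{LIST_TAGS[prefix[-1]][1]}>"
        # Open the lists that are new on this line.
        for ch in prefix[common:]:
            outer, item = LIST_TAGS[ch]
            out += f"<{outer}><{item}>"
        out += line[len(prefix):]  # the body, HTML tags and all, is copied as-is
        prev = prefix
    for ch in reversed(prev):  # close whatever is still open at the end
        outer, item = LIST_TAGS[ch]
        out += f"</{item}></{outer}>"
    return out


if __name__ == "__main__":
    print(render_lists("::<dl><dd>foo\nbar</dd></dl>"))
    # <dl><dd><dl><dd><dl><dd>foo</dd></dl></dd></dl>
    # bar</dd></dl>
```

Swapping :: for ** in the input gives the unbalanced <ul><li><ul><li><dl><dd>foo</li></ul></li></ul> form shown above. That is the string Remex then has to repair.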

The last is actually written as:

{{#tag:ref|<<<We can do subrefs like <ref>this</ref> now!>>>}}

Using #tag you don't even need the <<< ... >>>, unless it's just for a | or }} or the like.

My point was that if <<< ... >>> is going to be a general-purpose "escaping" construct for terminators in wikitext, why doesn't it escape the </ref> too?

Short answer: because the contents of an extension tag aren't wikitext.

The <<< does allow/protect the inner <ref> -- but the outer <ref> doesn't parse anything inside it (extension tags never parse any of the extension content), so it terminates at the first </ref>, which just happens to be after a <<< sequence (which doesn't mean anything to the extension).

That is, {{#tag:ref|<<<doesn't matter what's inside; eg <ref> is fine >>>}} works as you expect. But when the outermost context is <ref> we're using "extension tag quoting" rules. There are decent reasons for why those work as they do, although you could imagine future tweaks as I outlined in T114432#5793209. That's not related to this task (nor to heredocs) though: once you're inside the extension tag you're not in "wikitext" any more. Changing how the extension tag mechanism works is a different kettle of fish. If you don't like having a different escape mechanism when you leave wikitext, use {{#tag}} and stay in wikitext-land (T204370).
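The termination rule described here fits in a few lines. The helper below is hypothetical, not MediaWiki's actual extension-tag code. It finds the opening tag, then takes everything up to the first matching close tag as opaque content, without looking inside it. It does not handle self-closing tags such as <ref name="x" />. It also does not track nested same-name tags, and that is precisely why the inner <ref> is never recognized.

```python
import re
from typing import Optional


def extract_extension_tag(text: str, name: str) -> Optional[tuple[str, str]]:
    """Return (content, rest) for the first <name ...>...</name> in `text`.

    The content is opaque: nothing inside it (not <<< ... >>>, not another
    <name>) is parsed, so the tag ends at the first matching close tag.
    Returns None if the opening or closing tag is missing.
    """
    tag = re.escape(name)
    opening = re.search(rf"<{tag}(\s[^>]*)?>", text, re.IGNORECASE)
    if opening is None:
        return None
    closing = re.compile(rf"</{tag}\s*>", re.IGNORECASE).search(text, opening.end())
    if closing is None:
        return None
    return text[opening.end():closing.start()], text[closing.end():]


if __name__ == "__main__":
    content, rest = extract_extension_tag(
        "<ref><<<Can we do subrefs like <ref>this</ref> now?>>></ref>?", "ref"
    )
    print(repr(content))  # '<<<Can we do subrefs like <ref>this'
    print(repr(rest))     # ' now?>>></ref>?'
```

With {{#tag:ref|...}}, the argument is split by the template rules instead, so a heredoc there behaves like any other wikitext argument.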

You and I know the weird way it works, but to most people <ref> looks like it just contains wikitext.

So, in the interest of reducing the number of proposals to evaluate, I would pool them all in one class of proposals and distinguish them in terms of whether they propose custom syntax only for talk-page list items, for all list items, or reuse syntax from other uses. And perhaps other concerns specific to the proposals (ex: <dd> has other issues).

I think it would make sense to do this in two or more rounds (i.e. picking some general direction(s) and only then picking specific options), although the structure might depend more on how the scope of the discussion is defined.

Looking at the individual differences between the proposals would probably help, although in some cases (particularly in the proposals by editors that I mentioned) certain features might have to be considered together because they depend on each other. For example, in my proposal, the lack of a closing tag is more or less allowed specifically because a magic word/parser function would be included in the 4-tilde signature output, which would let the parser demarcate comments without matching signatures. (I don't really know how feasible the proposal would be, but it's from almost a year ago; if I had made the proposal more recently it would probably be fairly different.)

Heredoc looks nice for its consistency and general-ness, but I do hope we come up with something for LISTGAP. The definition-list is still a very bad idea, but then we can also just use ** <<< now.

I understand why some might find the current syntax hard to use, but claiming tables or other things cannot effectively be embedded in talk page comments is just wrong. Taking the original example from above:

: This is a comment
: This is still the same comment!
:{|
|+ I can easily embed tables in comments...
|}
:<syntaxhighlight lang="js">
// or even code or other preformatted text - which would probably
// be harder with the proposed "<<<" syntax.
</syntaxhighlight>
: Ok, another item.  I'm done with my comment now... ~~~~

So why should we change the existing syntax and risk breaking existing content? What exactly could *only* be done by introducing a new syntax?

Having multiple rows in a table inside a definition list item will cause a parser error and will not display as expected (now and/or into the future). So no, tables cannot be embedded in such constructs.

@Tkarcher There are a few cases that work, but in general it doesn't work. For example, the same syntax but with * instead of : doesn't work at all (and we'll probably need to support replying using * at some point, T259864). And the :{| syntax always causes a "list gap", which is an accessibility issue (https://en.wikipedia.org/wiki/WP:LISTGAP explains it well).