Proposal: Digit separators #216

Closed
MadsTorgersen opened this issue Feb 3, 2015 · 41 comments
Labels: Area-Language Design, Feature Request, Feature Specification, Language-C#, Resolution-External (the behavior lies outside the functionality covered by this repository)

@MadsTorgersen
Contributor

Being able to group digits in large numeric literals would have great readability impact and no significant downside.

Adding binary literals (#215) would increase the likelihood of numeric literals being long, so the two features enhance each other.

We would follow Java and others, and use an underscore _ as a digit separator. It would be able to occur everywhere in a numeric literal (except as the first and last character), since different groupings may make sense in different scenarios and especially for different numeric bases:

int bin = 0b1001_1010_0001_0100;
int hex = 0x1b_a0_44_fe;
int dec = 33_554_432;
int weird = 1_2__3___4____5_____6______7_______8________9;
double real = 1_000.111_1e-1_000;

Any sequence of digits may be separated by underscores, possibly more than one underscore between two consecutive digits. They are allowed in decimals as well as exponents, but following the previous rule, they may not appear next to the decimal (10_.0), next to the exponent character (1.1e_1), or next to the type specifier (10_f). When used in binary and hexadecimal literals, they may not appear immediately following the 0x or 0b.

The syntax is straightforward, and the separators have no semantic impact - they are simply ignored.

This has broad value and is easy to implement.
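As an illustration only (not part of the proposal text), the placement rules above can be sketched for integer literals. The class and method names here are hypothetical, and real literals and type suffixes are left out to keep the sketch short:

```csharp
using System;

static class DigitSeparatorSketch
{
    // Hypothetical helper: parses an integer literal with an optional 0x/0b
    // prefix, applying the proposed rules for '_' placement.
    public static long Parse(string literal)
    {
        int radix = 10;
        string body = literal;
        if (literal.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
        {
            radix = 16;
            body = literal.Substring(2);
        }
        else if (literal.StartsWith("0b", StringComparison.OrdinalIgnoreCase))
        {
            radix = 2;
            body = literal.Substring(2);
        }

        // A separator may not be the first character (which also covers
        // "immediately following the 0x or 0b") or the last character.
        if (body.Length == 0 || body[0] == '_' || body[body.Length - 1] == '_')
            throw new FormatException("Misplaced digit separator: " + literal);

        // Separators have no semantic impact, so they are simply dropped.
        // Any run length is accepted, as in the proposal.
        string digits = body.Replace("_", "");
        return Convert.ToInt64(digits, radix);
    }
}
```

Under these assumptions, `Parse("33_554_432")` and `Parse("1_2__3___4")` succeed, while `Parse("0x_ff")` and `Parse("10_")` throw a `FormatException`.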

@svick
Contributor

svick commented Feb 3, 2015

Does this apply to real literals as well? For example, would 1_0_._5_e_-_1_6_m_ be valid?

I have no idea if this would be useful, just curious.

@monoman

monoman commented Feb 4, 2015

👍

@chrisaut

chrisaut commented Feb 4, 2015

Don't shoot me, but would it be too hard to parse "space" as a separator? Or does that make the grammar ambiguous?

int two = 0b 10;
short max = 0x ffff;
long oneMillion = 1 000 000;

Just thinking out loud.

@AdamSpeight2008
Contributor

Digit separators were included in VB.net (vNext CTP); would it be beneficial to also describe what was allowed in VB? I think
'1_000' was allowed but 1__000 wasn't.
Comma usage could be an issue as it would clash with array literals: you couldn't tell what was a number and what was an array element.

@mburbea

mburbea commented Feb 4, 2015

I agree, I like space more than underscore. It's generally easier to type and makes it easier when working with something like hex numbers, e.g.
0x8080 8080 8080 8080UL is so much easier to read and make sure I've filled all the slots vs something like 0x8080808080808080UL, where I have to sit and count to see if I got 16 characters or I only typed 14 or something. How about ' as well?

@alanfo

alanfo commented Feb 6, 2015

I don't see how you could use a space as the separator because numeric literals would then potentially consist of not one but several tokens. This would make them very difficult to parse.

The underscore seems the best choice of separator to me, particularly as it's already used by several other languages.

I'm not so keen on allowing multiple consecutive underscores but I suppose it does no harm.

@AdamSpeight2008
Contributor

This grammar wouldn't allow consecutive separators.

  digit ::= '0' - '9'
  sep   ::= '_'
 prefix ::=
literal ::= prefix (sep? digit)+

I think spaces could also be possible

    digit ::= '0'-'9'
separator ::= ' '
  literal ::= digit (separator? digit)*
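A rough way to try out grammars of this shape is to write them as regular expressions. The sketch below is an assumption-laden illustration: it uses the `digit (separator? digit)*` form for both separators, ignores the prefix rule, and the class and method names are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

static class SeparatorGrammarSketch
{
    // digit (sep? digit)* with '_' as the separator: at most one
    // separator between two digits, none at either end.
    static readonly Regex Underscore = new Regex(@"\A[0-9](_?[0-9])*\z");

    // The same shape with a single space as the separator.
    static readonly Regex Space = new Regex(@"\A[0-9]( ?[0-9])*\z");

    public static bool MatchesUnderscoreForm(string s) => Underscore.IsMatch(s);

    public static bool MatchesSpaceForm(string s) => Space.IsMatch(s);
}
```

With these patterns, `1_000` and `1 000 000` match their respective forms, while `1__000`, `_1`, and `1000_` do not.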

@paulomorgado

I think it would be very hard to use spaces.

I haven't looked at the parser, but it's probably doing something like breaking the text at white space, parentheses, braces, and so on, and analyzing the tokens from there. Allowing for the rest of a numeric literal to follow after a space is doable, but I don't think it is worth the cost.

And what next? This?

var a = 1111
        1111
        1111
        1111;

Or this?

var a = 1111    // comment
        1111    // comment
        1111    // comment
        1111;   // comment

Although it might be a little harder to type on most keyboard layouts, the semantic break of the numeric literal is the same with the _, and I would argue that it's even better because it gives both separation and cohesion.

@AdamSpeight2008
Contributor

Wonder if the parser supports significant whitespace?

@AnthonyDGreen
Contributor

The VB implementation of digit group separators prototyped last year actually supported three different separators originally: underscore, back tick, and space. So you could write &B1111 0010 or 1_000_000 or 3`600. We quickly decided that back tick didn't make enough sense to anyone and cut it. The VB preview still supported both underscores and spaces. The biggest motivation for spaces was binary literals, another feature prototyped at the same time, because binary numbers are conventionally separated with spaces.

As to implementation, it's not hard at all really, at least in VB, particularly when you don't allow multiple consecutive separators. Normally the scanner encounters a digit and starts scanning an integral literal one character at a time until it encounters a character that's not a digit for the base being used (decimal, hex, octal); then it stops. We changed it so that if the non-digit character were an underscore or space it would peek one more character ahead, and if that character were a digit it would keep scanning it as a single token. There are some corner cases you have to put extra recovery around, but it's not very complicated, particularly because in VB it's not valid to have two integer literals follow one another, so it's non-breaking to interpret 1 1 as 11. I think C# is the same here, though in C# we were pretty settled that underscore would be the sole separator.

I think the biggest concern about that is that tools would be confused thinking the space was a word boundary (not VS, the editor is smart enough in VS to handle space) and we just couldn't foresee what havoc spaces would be unleashing on the world (if any).

Another more minor concern was complexity - would users benefit more from having a single consistent separator used everywhere? If we decided to pick one it would likely be the underscore so space was only a possibility if we were ok with having two separators which was an open question.

-ADG
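The scanner change described in the comment above can be sketched roughly as follows. This is not the actual VB or Roslyn scanner: the names are hypothetical, only base 10 is handled, and the corner-case recovery mentioned in the comment is omitted:

```csharp
using System;

static class LiteralScannerSketch
{
    // Returns the index just past the integer literal that starts at `start`.
    // Assumes text[start] is a decimal digit.
    public static int ScanDecimalLiteral(string text, int start)
    {
        int i = start;
        while (i < text.Length)
        {
            char c = text[i];
            if (IsDigit(c))
            {
                i++;
            }
            else if ((c == '_' || c == ' ')
                     && i + 1 < text.Length && IsDigit(text[i + 1]))
            {
                // One separator followed by a digit: peek succeeded, so keep
                // scanning the same token. Two separators in a row end it.
                i += 2;
            }
            else
            {
                break;
            }
        }
        return i;
    }

    static bool IsDigit(char c) => c >= '0' && c <= '9';

    // Convenience for experiments: the token's digits with separators removed.
    public static string ScanDigits(string text, int start)
    {
        int end = ScanDecimalLiteral(text, start);
        return text.Substring(start, end - start).Replace("_", "").Replace(" ", "");
    }
}
```

With this sketch, `ScanDigits("1_000_000;", 0)` yields `1000000`, `ScanDigits("1 1", 0)` yields `11` (the single-token reading of `1 1` described above), and `ScanDigits("1__0", 0)` stops after the first digit.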

@thomaslevesque
Member

Using space as a separator would probably be a bad idea, because it would cause hard-to-spot mistakes. For instance, int[] numbers = { 1 2 } looks like an array with the numbers 1 and 2, but it would actually be an array with only the number 12. Forgetting a comma would silently change the meaning of the code, instead of causing an error.

@chrisaut

chrisaut commented Feb 7, 2015

@thomaslevesque that is a very good point. Before I suggested it I quickly tried to think of places where two numbers would follow each other, but I had totally missed this obvious one. I think that is probably a deal breaker.

Seems generally people are not for using space, and I think I have come to agree with this point. Still don't like how "1_000" looks, but it might be the best and easiest option.

@AdamSpeight2008
Contributor

Isn't this proposal about digit separators for literals that have a prefix?

@thomaslevesque
Member

@AdamSpeight2008, no, it's for all numeric literals.

@AnthonyDGreen
Contributor

@AdamSpeight2008, we did consider restricting space in particular to its most obvious use case - binary literals. It would be unusual, but I think it's worth considering if it gives us more confidence in the feature.

@thomaslevesque, @chrisaut, I find that developers tend to bias negatively on what would confuse other developers and how often. Just about every feature ever proposed or introduced has someone saying "this will cause hard to spot mistakes for everyone ever". There are also features which at first seem harmless - then later turn out to be pits of failure. Fortunately, with "Roslyn" and a managed code base it's much easier to quickly prototype language features - even the scary ones and experiment and make decisions after making observations. I think that will give us the most room to explore the full potential of the language without being committed to doing or not doing a feature a particular way too early. It's still very very early in the design of VB15 (this idea has 0% chance of making it into C#) and given how often space has been proposed or preferred by different VB users we've spoken to I'd hate to cut the idea down prematurely if it could actually produce a better experience for those users.

Regards,

-ADG

@mikedn

mikedn commented Feb 8, 2015

I'd say ' or ` are better choices than _:

  • They're easier to type (single keystroke instead of combination)
  • Even though they are placed at the top of the line, they look more similar to the commas and dots that are used as digit separators in various cultures
  • The _ might be useful to other features, such as user defined literals.

@d-kr

d-kr commented Feb 8, 2015

@mikedn

They're easier to type (single keystroke instead of combination)

Sadly this holds true only for the US keyboard layout. At least in the German layout all three require two keystrokes. Only space is one keystroke here, too.

@tomasr

tomasr commented Feb 8, 2015

Agreed ` or ' are undesirable for the reasons already mentioned. I actually don't mind using _ as a separator at all, and, frankly, anything here is better than nothing :)

Using space seems like a recipe for conflicts all over the place, and I don't see it adding that much value. I dislike the idea of allowing multiple, alternative separators: while anyone reusing Roslyn wouldn't care, other tools doing their own lexing of C# code would have to do much more work.

@AdamSpeight2008
Contributor

' is used for a comment in VB.net

@AdamSpeight2008
Contributor

In VB.net _ is also used as a line continuation.
Would that cause a misread of the user's intent?

@ViIvanov

@mikedn, @tomasr ` or ' is good only for decimal digits. Let's see other cases:

int bin = 0b`1001`1010`0001`0100;
int hex = 0x1b`a0`44`fe;
int dec = 33`554`432;
int weird = 1`2``3```4````5`````6``````7```````8````````9`````````;

I think _ is better because it is more universal.

@AdamSpeight2008
Contributor

@ViIvanov ` and ' make it look like the numbers are indicating degrees, or feet and inches.

@AnthonyDGreen
Contributor

@AdamSpeight2008, in VB the explicit line continuation is actually to ensure that the underscore is never a trailing character of an identifier or other token so it wouldn't be a problem.

I agree that ` and ' look more like units of measurement. _ has a precedent in identifiers as a chunk separator. Space is used for binary numbers in particular and has been recommended by various bodies as a standard separator alternative to either comma or period (http://en.wikipedia.org/wiki/Decimal_mark#Digit_grouping).

I haven't seen a good scenario for multiple consecutive separators yet and am likely to advocate disallowing them.

@yume-chan

@gafter So the final decision is disallowing separators immediately after prefixes?

@AdamSpeight2008
Contributor

@CnSimonChan I think it is implemented in the Future branch, but it needs the feature flag to be set (or the language version to be VB15). Not sure if these features are available by default in that version (15) of the language.

@jskeet

jskeet commented Jul 22, 2016

@zippec: Completely agree. @jaredpar should we break out the feature request for 0x_1001_1000 to be valid into a separate issue?

@jaredpar
Member

@jskeet yes let's use a separate issue since this feature is implemented as spec'd here. We can use the new issue to track changing to allow that syntax.

@weitzhandler
Contributor

weitzhandler commented Mar 6, 2017

Would be nice. Although space feels less C#ish, I still vote for spaces, I mean can it go wrong as long as we're expecting a ;?
Anyway, I think it should only be allowed in binary/hex/o̷c̷t̷ etc.?

@paulomorgado

@weitzhandler, I think that changing C# 7 and Visual Studio for tomorrow is, most probably, out of the question. 😄

@alrz
Contributor

alrz commented Mar 6, 2017

so

var a = 1                                                                                                                                                                                                                                      0;

is actually just ten?

@paulomorgado

@alrz, that's no worse than

var a = 1______________________________________________________________________________________________________________________________________________________________________________________________________________________________________0;

The greater issue here is that, in this particular case and only in this particular case, space is a special case for white spaces. And that's bad. Very bad.

@alrz
Contributor

alrz commented Mar 6, 2017

@paulomorgado

No, the space is worse because it's invisible. In your example it's impossible to overlook the zero because the literal goes on and on and on.

@weitzhandler
Contributor

weitzhandler commented Mar 6, 2017

Limit to single space (surely no line breaks 😡).

_ is definitely more C#ish anyway.
And separation only makes sense in binary/hex.

@alrz
Contributor

alrz commented Mar 6, 2017

@weitzhandler No it doesn't. C# doesn't mind how many spaces you are using between tokens at all.

@weitzhandler
Contributor

We should keep the discussion here.

@gafter
Member

gafter commented Mar 6, 2017

@weitzhandler I think you mean discussion has moved here.

@paulomorgado

@alrz, Visual Studio can make white space visible. But I still think that would be the least of the problems.

@gafter
Member

gafter commented Mar 7, 2017

Discussion for this feature has been moved here.

@gafter closed this as completed Mar 7, 2017
@gafter added the Resolution-External label (the behavior lies outside the functionality covered by this repository) and removed the 4 - In Review label (a fix for the issue is submitted for review) Mar 7, 2017