gh-76535: Make `PyUnicode_ToLowerFull` and friends public #136176

lysnikolaou · 2025-07-01T13:30:09Z

Make _PyUnicode_ToLowerFull, _PyUnicode_ToUpperFull, _PyUnicode_ToTitleFull and _PyUnicode_ToFoldedFull public and rename them to PyUnicode_ToLower etc.

Issue: Unclear intention of deprecating Py_UNICODE_TOLOWER / Py_UNICODE_TOUPPER #76535

📚 Documentation preview 📚: https://cpython-previews--136176.org.readthedocs.build/


Doc/c-api/unicode.rst

lysnikolaou · 2025-07-01T14:53:13Z

Thanks for taking a look @vstinner! Feedback addressed.

serhiy-storchaka

In #76535 (comment), @vstinner suggested providing a constant for the minimum buffer size.

If this is indeed a hard constant which will never be changed in future Unicode standards, then I prefer this way. It is too expensive to allocate the output buffer dynamically.

cc @ezio-melotti, our Unicode expert.
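A minimal sketch of the constant-based design described above, assuming a hypothetical `TOY_CASE_MAX_CHARS` constant of 3 and a hypothetical `toy_to_upper_full()` function (neither is CPython API). The table entries are real full uppercase mappings from Unicode's SpecialCasing.txt; U+FB03 and U+0390 each expand to three code points, so uppercasing needs more room than the two characters quoted for lowercasing later in the thread. `uint32_t` stands in for `Py_UCS4` so the block compiles without `Python.h`.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ucs4;  /* stand-in for Py_UCS4 (assumption) */

/* Hypothetical constant: longest full case mapping, in code points. */
#define TOY_CASE_MAX_CHARS 3

struct toy_mapping {
    ucs4 ch;                        /* input code point */
    ucs4 out[TOY_CASE_MAX_CHARS];   /* full uppercase mapping */
    int len;                        /* number of code points in out */
};

/* Real entries from SpecialCasing.txt (uppercase column). */
static const struct toy_mapping toy_upper_table[] = {
    {0x00DF, {0x0053, 0x0053}, 2},           /* sharp s -> "SS" */
    {0xFB03, {0x0046, 0x0046, 0x0049}, 3},   /* ffi ligature -> "FFI" */
    {0x0390, {0x0399, 0x0308, 0x0301}, 3},   /* iota w/ dialytika+tonos -> 3 code points */
};

/* Constant-based API: no size argument. The caller MUST pass a buffer of
   at least TOY_CASE_MAX_CHARS code points; returns the count written. */
static int toy_to_upper_full(ucs4 ch, ucs4 *buffer)
{
    size_t n = sizeof(toy_upper_table) / sizeof(toy_upper_table[0]);
    for (size_t i = 0; i < n; i++) {
        if (toy_upper_table[i].ch == ch) {
            for (int j = 0; j < toy_upper_table[i].len; j++) {
                buffer[j] = toy_upper_table[i].out[j];
            }
            return toy_upper_table[i].len;
        }
    }
    /* Fallback for this toy: ASCII letters only, everything else unchanged. */
    buffer[0] = (ch >= 'a' && ch <= 'z') ? ch - 0x20 : ch;
    return 1;
}
```

Nothing in this signature stops a caller from passing a buffer that is too small; the function simply trusts the constant.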

lysnikolaou · 2025-07-01T15:09:49Z

Another question I have is whether we want to expose something like the following to handle the Greek letter sigma edge case:

`int PyUnicode_ToLowerHandleSigma(Py_UCS4 *str, Py_UCS4 ch, Py_UCS4 *buffer, int size)`

vstinner

Thanks, I prefer this API: it is more future-proof because it doesn't depend on a specific Unicode version.

Objects/unicodeobject.c

Doc/c-api/unicode.rst

vstinner · 2025-07-01T15:14:26Z

PyUnicode_ToLowerHandleSigma

Would you mind elaborating? I'm not aware of this special case.

vstinner · 2025-07-01T15:18:24Z

If this is indeed a hard constant which will never be changed in future Unicode standards

Even if it's a constant which will never (!) change, IMO it's better to request the size as an argument and make the caller responsible for checking the buffer size. APIs which accept a pointer with no size are a bad pattern, like the deprecated gets() function.
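A minimal sketch of the size-checked shape argued for here, using a hypothetical `toy_to_lower_full_checked()` that knows only one non-trivial mapping (U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE, whose full lowercase mapping is U+0069 U+0307) plus ASCII. It returns -1 instead of writing past a too-small buffer; a real C API function would presumably also set an exception. It deliberately does not implement the NULL-buffer size query discussed further down.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ucs4;  /* stand-in for Py_UCS4 (assumption) */

/* Hypothetical size-checked API: `size` is the capacity of `buffer` in
   code points. Returns the number written, or -1 if it would not fit. */
static int toy_to_lower_full_checked(ucs4 ch, ucs4 *buffer, int size)
{
    ucs4 tmp[2];
    int len;

    if (ch == 0x0130) {              /* U+0130 -> U+0069 U+0307 (i + combining dot) */
        tmp[0] = 0x0069;
        tmp[1] = 0x0307;
        len = 2;
    }
    else {                           /* toy fallback: ASCII letters only */
        tmp[0] = (ch >= 'A' && ch <= 'Z') ? ch + 0x20 : ch;
        len = 1;
    }

    if (buffer == NULL || size < len) {
        return -1;                   /* never write past the caller's buffer */
    }
    for (int i = 0; i < len; i++) {
        buffer[i] = tmp[i];
    }
    return len;
}
```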

lysnikolaou · 2025-07-01T15:33:17Z

Further feedback addressed.

Would you mind elaborating? I'm not aware of this special case.

There's one special case, the Greek letter sigma, where the result of lower casing is context-specific. More specifically, Σ gets lower-cased to ς if it's at the end of the word or to σ otherwise. This is handled in lower_ucs4 right now.
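The rule can be sketched as Unicode's Final_Sigma condition: Σ becomes ς when it is preceded by a cased letter and not followed by one, skipping case-ignorable characters in both directions. This is a simplified sketch: `toy_is_cased()` and `toy_is_case_ignorable()` are hypothetical stand-ins covering only ASCII letters, basic Greek letters and the apostrophe, not the real Cased and Case_Ignorable properties used by CPython. It also shows why a single-character API cannot handle this case: the check needs the whole string and an index.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ucs4;  /* stand-in for Py_UCS4 (assumption) */

/* Toy predicates (simplifications, not the real Unicode properties). */
static int toy_is_cased(ucs4 c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || (c >= 0x0391 && c <= 0x03A9) || (c >= 0x03B1 && c <= 0x03C9);
}

static int toy_is_case_ignorable(ucs4 c)
{
    return c == '\'';
}

/* Final_Sigma for s[i]: a cased letter before it (skipping case-ignorables)
   and no cased letter after it (skipping case-ignorables). */
static int toy_is_final_sigma(const ucs4 *s, size_t len, size_t i)
{
    size_t j = i;
    while (j > 0 && toy_is_case_ignorable(s[j - 1])) {
        j--;
    }
    if (j == 0 || !toy_is_cased(s[j - 1])) {
        return 0;
    }
    j = i + 1;
    while (j < len && toy_is_case_ignorable(s[j])) {
        j++;
    }
    return !(j < len && toy_is_cased(s[j]));
}

/* Lowercase s[i]; capital sigma (U+03A3) needs the surrounding string. */
static ucs4 toy_lower_at(const ucs4 *s, size_t len, size_t i)
{
    ucs4 c = s[i];
    if (c == 0x03A3) {
        return toy_is_final_sigma(s, len, i) ? 0x03C2 : 0x03C3;  /* final : medial */
    }
    if ((c >= 0x0391 && c <= 0x03A9) || (c >= 'A' && c <= 'Z')) {
        return c + 0x20;             /* basic Greek and ASCII capitals */
    }
    return c;
}
```

For "ΟΔΟΣ" the last Σ lowercases to ς, while a lone "Σ" lowercases to σ because no cased letter precedes it; Python's `str.lower()` behaves the same way.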

vstinner

Can you try to add tests to Modules/_testcapi/unicode.c and Lib/test/test_capi/test_unicode.py?

Doc/c-api/unicode.rst

vstinner · 2025-07-01T15:46:38Z

There's one special case, the Greek letter sigma, where the result of lower casing is context-specific. More specifically, Σ gets lower-cased to ς if it's at the end of the word or to σ otherwise.

Oh, that's a tricky case. The proposed API takes a single character, so it cannot know whether Σ is at the end of a word. I don't think it's worth handling this special case in the proposed API.

lysnikolaou · 2025-07-01T16:32:41Z

Can you try to add tests to Modules/_testcapi/unicode.c and Lib/test/test_capi/test_unicode.py?

Done.

serhiy-storchaka · 2025-07-01T16:53:18Z

If we add too many parameters and runtime checks, this will make the API slower and more difficult to use.

Modules/_testcapi/unicode.c

Lib/test/test_capi/test_unicode.py

vstinner · 2025-07-02T09:34:04Z

Doc/c-api/unicode.rst

+   Convert *ch* to lower case, store the result in *buffer*, which should be
+   able to hold as many characters as needed for *ch* to be lower cased
+   (e.g. a maximum of two characters for Unicode 16.0), and
+   return the number of characters stored. Passing a ``NULL`` buffer returns


Is it an important use case for you to be able to pass a NULL buffer to get the buffer size? Removing this feature would make the implementation simpler.

No, it's not an important use case. I implemented it because it was suggested on the issue, and it also seems to be standard practice for some C functions that accept buffers.

Do you think I should remove it?
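For reference, the best-known standard C precedent for a NULL buffer as a size query is `snprintf()`: with a NULL buffer and a size of 0 it writes nothing and returns the number of characters the output needs (excluding the terminating NUL). A minimal sketch of the resulting two-call pattern; the helper name `format_int_dup()` is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Query the size, allocate, then call again to fill the buffer. */
static char *format_int_dup(int value)
{
    int needed = snprintf(NULL, 0, "%d", value);     /* size query only */
    if (needed < 0) {
        return NULL;
    }
    char *buf = malloc((size_t)needed + 1);          /* +1 for the NUL */
    if (buf == NULL) {
        return NULL;
    }
    snprintf(buf, (size_t)needed + 1, "%d", value);  /* second call fills it */
    return buf;                                      /* caller must free() */
}
```

This query, allocate, fill sequence is the pattern considered too heavy for converting a single character.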

I don't think anyone will call the function with NULL, allocate a heap buffer of the returned size, call the function again, and then release the buffer. That sounds complicated just to transform a single character.

Just overallocate 10 characters on the stack, call the function once, and that's it. You get the size as well; if you don't need the characters, just ignore them.
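A minimal sketch of that single-call pattern, assuming a hypothetical `toy_to_upper_full(ch, buffer, size)` with the same shape as the API under review but knowing only ß (U+00DF, uppercased to "SS") and ASCII. The hypothetical helper `toy_upper_string()` converts each code point with one call into a 10-element stack buffer and uses the return value as the character count; like the `chars <= 0` check in the test snippet below, it treats 0 as an error.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ucs4;  /* stand-in for Py_UCS4 (assumption) */

/* Hypothetical case function; this toy only knows U+00DF and ASCII.
   Returns the number of code points written, or -1 if they don't fit. */
static int toy_to_upper_full(ucs4 ch, ucs4 *buffer, int size)
{
    if (ch == 0x00DF) {                       /* sharp s -> "SS" */
        if (size < 2) {
            return -1;
        }
        buffer[0] = 0x0053;
        buffer[1] = 0x0053;
        return 2;
    }
    if (size < 1) {
        return -1;
    }
    buffer[0] = (ch >= 'a' && ch <= 'z') ? ch - 0x20 : ch;
    return 1;
}

/* Uppercase `len` code points of `src` into `dst` (capacity `cap`).
   Returns the output length, or -1 on error or insufficient capacity. */
static long toy_upper_string(const ucs4 *src, size_t len, ucs4 *dst, size_t cap)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        ucs4 buf[10];                         /* overallocated stack buffer */
        int n = toy_to_upper_full(src[i], buf, (int)(sizeof buf / sizeof buf[0]));
        if (n <= 0 || out + (size_t)n > cap) {  /* 0 is unexpected too */
            return -1;
        }
        for (int j = 0; j < n; j++) {
            dst[out++] = buf[j];
        }
    }
    return (long)out;
}
```

For "straße" this produces the seven code points of "STRASSE", and it returns -1 if the output array holds only six.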

Modules/_testcapi/unicode.c

vstinner · 2025-07-02T12:01:59Z

Modules/_testcapi/unicode.c

-    assert(chars >= 1);
+    Py_UCS4 buf[3];
+    int chars = function(c, buf, Py_ARRAY_LENGTH(buf));
+    if (chars <= 0) {


Suggested change

if (chars <= 0) {

if (chars < 0) {

0 is also an unexpected result here; it should never happen.

Modules/_testcapi/unicode.c

pythongh-76535: Make PyUnicode_ToLowerFull and friends public

431abba

Make `PyUnicode_ToLowerFull`, `PyUnicode_ToUpperFull` and `PyUnicode_ToTitleFull` public and rename them to `PyUnicode_ToLower` etc.

bedevere-app bot added the awaiting core review label Jul 1, 2025

bedevere-app bot mentioned this pull request Jul 1, 2025

Unclear intention of deprecating Py_UNICODE_TOLOWER / Py_UNICODE_TOUPPER #76535

Open

lysnikolaou mentioned this pull request Jul 1, 2025

gh-76535: Add C API functions for changing case of a single codepoint #117117

Closed

vstinner reviewed Jul 1, 2025

Doc/c-api/unicode.rst (outdated)

vstinner reviewed Jul 1, 2025

Doc/c-api/unicode.rst (outdated)

Address feedback; add size parameter and do PyUnicode_ToFolded as well

d604fc8

📜🤖 Added by blurb_it.

fbbf841

serhiy-storchaka reviewed Jul 1, 2025

vstinner reviewed Jul 1, 2025

Objects/unicodeobject.c (outdated)

Doc/c-api/unicode.rst (outdated)

Address more feedback; assert return value and raise ValueError

f17aa0c

vstinner reviewed Jul 1, 2025

Doc/c-api/unicode.rst

Add tests

4a70489

Document the maximum numbers of characters needed in the buffer

61afd9a

vstinner reviewed Jul 2, 2025

Modules/_testcapi/unicode.c

Modules/_testcapi/unicode.c (outdated)

Lib/test/test_capi/test_unicode.py

Lib/test/test_capi/test_unicode.py

vstinner reviewed Jul 2, 2025

Address feedback; test more characters and refactor _testcapi functions

7885b17

vstinner reviewed Jul 2, 2025

Address more review comments

6f9cb95
