Skip to content

[RFC] Add compile-time checking of mp_printf format strings #17556

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

jepler
Copy link
Contributor

@jepler jepler commented Jun 24, 2025

Summary

It's always nice when errors can be detected at compile time. In traditional C programs, gcc can check that the printf argument types match the printf format string. This has not been possible up to now with mp_printf, because it has both extensions to standard printf (e.g., the %q format type) and is missing things in standard printf (e.g., %zd is not supported).

To that end, I have developed a GCC plugin that does this checking at compile time. I've also made the necessary changes for the unix coverage build to complete with the plugin enabled, and enabled it during the coverage build process.

Creating in draft mode to get feedback and also because this is cumulative with some other outstanding PRs that are needed to get the CI board to green.

Testing

I built the unix port coverage variant & ran the tests locally. The plugin itself should cause no code changes. There is a small code growth reported, so one of the added casts must not actually be a no-op. I have not determined which one.

Trade-offs and Alternatives

As a gcc plugin this can only support gcc-based toolchains. clang and proprietary compilers would not work. This does not seem important, as this feature only produces diagnostics.

The plugin is GPL licensed. I started with a GPL-licensed plugin template, and plugins need to be GPL-or-compatible in license in order to be loaded in gcc anyway. The plugin code IMO does not affect the license situation of the output object code, as you'd get the exact same code with or without the plugin.

Missing support for:

  • Knowing whether %ll is runtime-supported
  • handling enum argmuents properly (just one enum was %d printed in the coverage test and I added a cast instead of fixing the checker)
  • Whether to add & use defines similar to standard PRId32 for printing mp_{u,}int_t values: some ports need %d and others %ld, and maybe some even need %lld (I think maybe 32-bit nanbox builds would require this, for example). e.g., #define PRIdPY "lld" next to typedef long long mp_int_t. This would replace (int) casts which was the easiest way to get local builds to finish.

CI may need a new package installed -- debian needed gcc-12-plugin-dev and there's no gcc-plugin-dev (w/o version number) package to install the 'usual' one). I tried to code this, we'll see if it works.

Whether to enable it on more ports. This could catch problems in port-specific files, or for different fundamental object sizes.

Adding support for cmake-based builds and any other oddball build configurations

jepler added 6 commits June 22, 2025 08:11
This side-steps the problem where the output does not match when the
host python version is 3.12.x (as it is on unix ci today).

Signed-off-by: Jeff Epler <[email protected]>
If the fields added for MICROPY_PY_SYS_SETTRACE are not
initialized properly, their value in a thread is
indeterminate. In particular, if the callback is not NULL,
it will be invoked as a function.

Signed-off-by: Jeff Epler <[email protected]>
When MICROPY_PY_SYS_SETTRACE was enabled, a crash was seen in
the qemu_mips build. It seems likely that this was due to these
added fields not being initialized.

Signed-off-by: Jeff Epler <[email protected]>
The argument corresponding to a `%q` specifier must be of type
`qstr`, not a narrower type like `int16_t`. Not ensuring this
caused an assertion error on one Windows x64 build.

The argument corresponding to a `%d` specifier must be of type
`int`, not a potentially-wider type like `mp_uint_t`. Not ensuring
this prevented the function name from being printed on the
unix nanbox build.

Signed-off-by: Jeff Epler <[email protected]>
@jepler
Copy link
Contributor Author

jepler commented Jun 24, 2025

Example diagnostic:

coverage.c: In function ‘extra_coverage’:
coverage.c:206:9: error: argument 3: expected ‘long int’ or ‘long unsigned int’, not ‘int’ [-
Werror=format=]
  206 |         mp_printf(&mp_plat_print, "%ld\n", 123); // long
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

@jepler jepler force-pushed the compile-time-format-checker branch 2 times, most recently from df0062d to c39e2bd Compare June 24, 2025 08:49
Copy link

codecov bot commented Jun 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.23%. Comparing base (09541b7) to head (bc2a207).
Report is 17 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #17556      +/-   ##
==========================================
- Coverage   98.56%   98.23%   -0.34%     
==========================================
  Files         169      171       +2     
  Lines       21950    22160     +210     
==========================================
+ Hits        21636    21769     +133     
- Misses        314      391      +77     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +4 +0.001% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:    +0 +0.000% RPI_PICO_W
       samd:    +4 +0.001% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +4 +0.001% VIRT_RV32

@jepler jepler force-pushed the compile-time-format-checker branch 2 times, most recently from 59942b2 to 41fcad4 Compare June 24, 2025 14:50
jepler added 10 commits June 24, 2025 18:47
This test is flaky on GitHub CI in the sanitize_undefined build.
This build adds extra runtime overhead, but there's not a simple
way to test for its presence at runtime from micropython. So, just
bump the top spec thread count down globally.

Signed-off-by: Jeff Epler <[email protected]>
The name field of type objects is of type uint16_t for efficiency,
but when the type is passed to mp_printf it must be cast explicitly
to type qstr.

These locations were found using an experimental gcc plugin
for mp_printf error checking, cross-building for x64 windows
on Linux.

Signed-off-by: Jeff Epler <[email protected]>
The type of the argument must match the format string. Add
casts and/or modify format strings in order to ensure that they do.

These locations were found using an experimental gcc plugin
for mp_printf error checking, cross-building for x64 windows
on Linux.

In one case there was already a cast, but it was written
incorrectly and did not have the intended effect.

Signed-off-by: Jeff Epler <[email protected]>
we still want this not to crash a runtime but the
new static checker wouldn't like it.

Signed-off-by: Jeff Epler <[email protected]>
@jepler jepler force-pushed the compile-time-format-checker branch from 0153502 to 536f6fb Compare June 24, 2025 16:48
@jepler
Copy link
Contributor Author

jepler commented Jun 24, 2025

the plugin can also run during the gcc windows builds. I tested it locally using the cross-building steps but I assume it'd work on windows too. I won't complicate this PR by adding it, but I'd plan to add it in a subsequent PR. Some of the format string "findings" came out of me doing that locally.

@jepler
Copy link
Contributor Author

jepler commented Jun 25, 2025

@dpgeorge Please let me know if you think this is worth pursuing.

@jepler
Copy link
Contributor Author

jepler commented Jun 25, 2025

I did some checks in a sibling branch, and on macos, gcc is clang (!) which has a different incompatible plugin api; gcc-14 is also installed via brew in the runner, but it is missing some indirect dependency required for plugin building (/opt/homebrew/Cellar/gcc@14/14.3.0/bin/../lib/gcc/14/gcc/aarch64-apple-darwin23/14/plugin/include/system.h:700:10: fatal error: 'gmp.h' file not found). this may be solvable with brew, but as this would not cover any additional source files or integer size models it's probably not worth the time to figure out how to fix it.

@jepler
Copy link
Contributor Author

jepler commented Jun 25, 2025

Plugin support on Windows/MinGW has a number of limitations and additional requirements so adding support for that would better be postponed to a subsequent PR.

On the nanbox build, `o->obj` is a 64-bit type but `%p` formats
a 32-bit type. This is incorrect, as detected by the format string
checker plugin.

Signed-off-by: Jeff Epler <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant