Moving Python's bugs to GitHub

By Jake Edge
February 23, 2022

Over the past seven years or so, Python has slowly been moving its development infrastructure to GitHub; we covered some of the early discussions at the end of 2014. One piece of that infrastructure, bug tracking, has not been moved from bugs.python.org, but plans are underway to make that happen soon. It is not a simple or straightforward process to do so, however, so the transition will take up to a week to complete; there are a number of interesting facets to the switch, as it entails clearing some technical, and even legal, hurdles.

The plan

Python's developer-in-residence, Łukasz Langa, announced the plan and schedule on the Python Discourse forum on February 18. As described in PEP 581 ("Using GitHub Issues for CPython"), the Roundup-based bugs.python.org (often abbreviated as "bpo" or "BPO") will be retired, but live on in read-only mode so that the existing URLs still work. Each of the entries on BPO will be enhanced with information about where the corresponding issue lives on GitHub; after the transition, all new issues will be added to GitHub.

After the discussions in 2014, the move to GitHub got rolling with PEP 512 ("Migrating from hg.python.org to GitHub"); as the title indicates, it was a move away from the Mercurial-based repositories to Git and GitHub. (Mercurial uses "hg" as its nickname and the name of its main binary, after the chemical symbol for mercury.) Brett Cannon, who authored PEP 512 and has been one of the driving forces behind the workflow changes that came with the move to GitHub, reported on the progress of the project at the Python Language Summits in 2016 and in 2017. At the summit in 2018, Mariatta Wijaya proposed switching to GitHub Issues for bug tracking, which resulted in PEP 581; it was approved in 2019 and is coming to fruition now.

As Langa noted, though, there are various difficulties in making the switch:

Unfortunately, this is not an easy task technically, procedurally, or legally, as it involves coordinating with several external actors and solving technical challenges mostly unique to our current circumstances. As a result, while progress was steady, it took a long while to get to this point. I was asked by the Steering Council to take over project management on the migration.

He has been working with CPython developer Ezio Melotti and "our friends on the Github side" to push the task forward. His announcement marked the beginning of a two-week feedback-gathering phase. Then, on March 4, a test migration will be done using 10% of the bugs on BPO; if that is successful, and no show-stopping problems are encountered, the migration will start by making BPO read-only on March 10 and beginning the transfer of everything on March 11.

The migration is estimated to take anywhere from 3 to 7 days, depending on the load on Github.com. This is why we will be performing the bulk of it during the weekend to speed things up.

During that time, no new issues can be opened in either place, but GitHub pull requests (PRs) can be created and used as normal. As issues are migrated from BPO and start showing up at GitHub, which will be ongoing during the process, they can be edited there, "but destructive actions (changing issue titles, editing comment content, deleting comments, removal of labels) are HIGHLY DISCOURAGED". Making those kinds of changes will make it more difficult to audit the completeness of the migration.
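To see why such edits complicate an audit, consider a minimal sketch of a completeness check. Everything in it is hypothetical: the `IssueSnapshot` record, the `audit_migration()` function, and the fields being compared are illustrative assumptions, not the tooling actually used for the migration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IssueSnapshot:
    """Hypothetical summary of one issue; not a real BPO or GitHub schema."""
    title: str
    comment_count: int
    labels: frozenset


def audit_migration(source, migrated):
    """Compare two {issue_number: IssueSnapshot} mappings.

    Returns {issue_number: [problem, ...]} for every source issue that is
    missing from, or differs in, the migrated set.
    """
    problems = {}
    for number, original in source.items():
        copy = migrated.get(number)
        if copy is None:
            problems[number] = ["missing on GitHub"]
            continue
        found = []
        if copy.title != original.title:
            found.append("title changed")
        if copy.comment_count < original.comment_count:
            found.append("comments missing")
        if not original.labels <= copy.labels:
            found.append("labels removed")
        if found:
            problems[number] = found
    return problems


# Made-up data: issue 1 was retitled after migration, issue 2 never arrived.
bpo = {
    1: IssueSnapshot("Crash in parser", 3, frozenset({"type-crash"})),
    2: IssueSnapshot("Docs typo", 1, frozenset()),
}
github = {
    1: IssueSnapshot("Parser crash", 3, frozenset({"type-crash"})),
}
print(audit_migration(bpo, github))
# → {1: ['title changed'], 2: ['missing on GitHub']}
```

A check of this kind cannot tell a well-meaning retitle from a migration error; both show up as a mismatch that someone has to investigate by hand.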

There is a contingency plan should things stretch out too long: "In the unlikely case that the migration cannot be completed in 7 days, the Steering Council decided that we would abort it and re-enable BPO again." Further details on the plan, its risks, and possible mitigations for them can be found in a GitHub issue. That issue is part of the gh-migration repository, which is where problems should be reported as part of the feedback process: "You can treat it as exercise in using Github issues 😉". There are also example migrated issues available on GitHub for Python developers and others to examine, as well as documentation updates (coming from this PR).

The main legal question to resolve was whether the Python Software Foundation (PSF) is able to move the user-generated content, with its potentially personally identifiable information (PII), from BPO to GitHub. The steering council and PSF lawyers determined that no user consent was required to do so:

Both BPO and Github are public-facing systems. Users actively placed their information (including PII) in the BPO system, which actively grants consent for that information to be stored, publicly accessible, and distributed on-demand. Changing our backend to Github does not revoke that permission. At the same time, the migration will not be surfacing any new user information that wasn't previously publicly accessible in the BPO system.

Concerns

As might be guessed, one of the concerns expressed in the forum regarded the multi-day pause in the use of the bug tracker. Eric Snow wondered if the older closed and inactive issues could be locked and migrated first. "Assuming the remaining issues would be much fewer than the inactive ones, I'd expect the disruptive part of the migration would be much (proportionally?) shorter." Langa said that the difficulty with doing that is there is no way to disable GitHub Issues during the migration; as soon as some issues are migrated, there would be two trackers in operation, in effect. "The idea to have two issue trackers open at the same time is making me nervous."

In the announcement, Langa noted that Python and GitHub were able to learn from the experience of the LLVM project, which migrated from Bugzilla to GitHub Issues back in December. That migration took 21 days, so the hope is that experience will lead to a smoother (and quicker) transition for Python. Snow said that the estimate of three to seven days "feels like the end of the world" in terms of its impact on core workflow, but, obviously, 21 days is far worse. Melotti said that he has been in contact with LLVM and others:

If I understand correctly the actual transfer eventually took them a couple of days, but it had a few false starts and issues. I've been talking with the project manager of the LLVM project and a few other people that performed similar migrations in the past, so that we could learn from their mistakes and avoid them.

Irit Katriel suggested that post-migration would make for a good time "to review old issues and close them if they are no longer relevant". Langa agreed with that idea, as did Melotti, who added it to the issue tracking notification for BPO users. A notification email of the change will be sent to BPO contributors, listing the issues they have submitted, been assigned, or were following, along with a link to the corresponding new GitHub Issue.

Victor Stinner asked about a related concern; normally an update to a BPO issue will send an email to those people who have added themselves to the "nosy" list for it. He wondered if the update of the BPO entries to add the new GitHub link would generate said emails; "I'm in the nosy list of 885 BPO issues. Should I expect 885 emails [...]?" Melotti sympathized (at least in part because he is on the nosy list for over 4000 entries), but did not directly address whether the BPO change emails would be generated. He did say that he was still hoping to be able to automatically convert the nosy list subscriptions to their GitHub equivalent. Otherwise, active contributors will need to go into each new bug, one by one, and add themselves.

Steve Dower wondered whether it made sense to migrate the closed issues at all. "While there's always some amount of further discussion on closed issues, the vast majority are never going to be touched again. Why recreate them?" But Katriel said that the closed issues still have useful information, which is best kept in one place:

If you want to search closed tickets for some error message, for instance, you want to search in only one place.

There are issues where the problem is not fixed, but the ticket has relevant discussion and workarounds.

The conversation is still ongoing as of this writing, and presumably will be for another week or more. None of the concerns raised so far seem like they will be all that hard to deal with, though it may still be a pretty painful transition, especially for active, longtime contributors. Whether that all gets worked out on the timeline laid out remains to be seen; it would not be a huge shock if the final transition had to be pushed back a time or two. There are quite a number of moving parts that need to be in alignment for this kind of a transition. Hopefully, it all goes off without a hitch—though that may be a tad overoptimistic.

Just as Python has learned from LLVM's experience, other projects can watch this transition with interest. That is one of the strengths of open source and openly developed software; there is much to be learned from the experience of other projects. In fact, the history of the whole transition from self-hosted infrastructure to GitHub is recorded in the Python mailing lists, forum posts, PEPs, and so on; projects thinking about making a switch like that can prepare themselves better by standing on the shoulders of the projects that have gone before.

The switch away from Roundup also largely completes Python's transition of its development infrastructure from open-source, Python-based tools (Mercurial, Roundup) to the proprietary GitHub "software as a service" offering, which is certainly sad in some ways. But Python has always been a fairly pragmatic project—something it seems to have inherited from former benevolent dictator for life Guido van Rossum—and the intent of these moves was geared toward attracting new developers who are familiar with and comfortable using GitHub. Over the last few years, the project does seem to have picked up some steam—and lots of new faces—so it looks like that effort may be paying off. That, too, may be instructive to other projects.



Moving Python's bugs to GitHub

Posted Feb 24, 2022 10:11 UTC (Thu) by LtWorf (subscriber, #124958) [Link] (3 responses)

The problem will be moving away from github.

Moving Python's bugs to GitHub

Posted Feb 24, 2022 10:11 UTC (Thu) by LtWorf (subscriber, #124958) [Link]

But now that I think of it, microsoft employs Guido, and owns github…

Moving Python's bugs to GitHub

Posted Mar 2, 2022 23:56 UTC (Wed) by johannbg (guest, #65743) [Link]

There is no vendor lock-in on GitHub, so I'm unsure why you think that exporting something from GitHub is an issue. And since it's owned by Microsoft, it won't go through financial difficulties like the ones GitLab, for example, is experiencing, so the Python community choosing GitHub as its Git forge was a wise choice.

At this point I would not be surprised if gitlab was relying on IBM/RH buying them given how RH seems to be heavily using it :)

Moving Python's bugs to GitHub

Posted Mar 4, 2022 6:57 UTC (Fri) by oldtomas (guest, #72579) [Link]

I'm also concerned seeing more and more strategic (for free software) projects moving to Github.

I'm not even sure vendor lockin is the problem. It might be. But having seen how Github (before it was Microsoft) transformed an inherently decentral thing as Git into a centralised service without forcing anyone, and given Microsoft's track record on top of that...

I keep asking myself: what kind of shenanigans are up their sleeve to monetise the $7B they dumped into Github, and in which way are those shenanigans affecting us?

I've lived for too long. I don't trust them. A bit.

Real accounts

Posted Feb 24, 2022 20:35 UTC (Thu) by Empterdose (subscriber, #152954) [Link] (1 responses)

From the example migration, it looks like the migrated issues and comments are at least associated with real, individual GitHub user accounts. (Well, mostly. I've never seen a “mannequin” user before!) Thank goodness – I've previously seen issue migrations to GitHub use a single bot account to create the issues and comments, and the conversations end up being nearly unreadable, or at least very difficult to follow. When did GitHub add the ability to do this? Is it only available for big projects like this one? The migration plan mentions “ECI (Enterprise Cloud Importer)”, but the obvious documentation only mentions code, not issues.

Real accounts

Posted Feb 24, 2022 22:56 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

From the LLVM thread that I perused, it seems it involved GitHub engineers being in the loop. So…not something available to us peons. Which is fair as the power to impersonate any account while posting a comment seems…like a lot to just have as an API endpoint hanging out there.

Moving Python's bugs to GitHub: postponed till March 24

Posted Mar 10, 2022 8:45 UTC (Thu) by douglasbagnall (subscriber, #62736) [Link]

This has been delayed by a war and a security release:

https://discuss.python.org/t/github-issues-migration-what...

(I'm following this because I've got a non-urgent bug that I've been holding off reporting).


Copyright © 2022, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds