tech-repository archive
Re: Core statement on version control systems
> Date: Sun, 4 Jan 2015 16:31:10 +0200
> From: Alan Barrett <apb%netbsd.org@localhost>
>
> The NetBSD core group has been asked to make a statement on version
> control systems.
>
> There is a strong case for switching from CVS to a modern
> distributed version control system (DVCS). However, the proponents
> of particular DVCS systems have not presented a coherent plan that
> can be used to implement a transition from the current system to a
> new system.
Here is a plan to transition NetBSD from CVS to a hybrid of Mercurial
and Git, based on the CVS -> Mercurial conversion process we already
have under way.
Summary:
- Developers can pull _and push_ via either hg or git at their choice.
- Primary repository is hg.
- Git mirror via git-cinnabar -- pushes to git are relayed
synchronously to hg.
=> Developers using git need not install or configure any special
software like git-cinnabar; it will act just like a regular git
remote.
=> No more forced updates -- that was an artefact of the ongoing
conversion from CVS.
=> Read-only github mirror will remain.
- For now, linear-history workflow with rebase, just like CVS.
=> We may transition in the future to merge-based workflow
integrated with releng automatic testbed, but not today.
Details below. Not everything in this draft plan is finished, but the
parts marked XXX are largely minor. Parts of this are to be
integrated with other existing notes scattered around the wiki in a
more obvious central place. Aspects that are specifically relevant to
security, particularly how we keep audit trails without regression
from CVS and authenticate repository history, are marked [SECURITY].
The subsection on TNF's private infrastructure tooling is omitted from
this public message.
> Some of the things that we would like to be addressed in a transition
> plan are:
** matrix like https://wiki.freebsd.org/VersionControl
> * How well the proposed system satisfies the requirements and
> desires of the community, in terms of features, ease of use,
> performance, and other considerations that have been mentioned
> in the tech-repository mailing list. It would be useful to
> have a matrix similar to the one produced by FreeBSD, available
> at <https://wiki.freebsd.org/VersionControl>.
I think that matrix serves reasonably well. Some updates:
- git has better `partial clone' support than it did at the time (via
shallow and/or filtered clones, and sparse checkouts).
- hg can import the full repository just as well as git (the git
conversion is derived from the hg conversion).
** low-memory issues
> * Performance implications of the desired VCS system, especially
> for hosts with low or moderate amounts of memory.
hg can operate in about 512 MB of working memory without the evolve
extension for revising work-in-progress development topics. (With the
evolve extension, it requires about 3 GB.)
XXX TBD: Test whether remotefilelog reduces memory usage.
git can operate in about 512 MB of working memory, with the following
tunables in ~/.gitconfig (with the caveat that they may increase disk
space usage):
[core]
packedGitWindowSize = 1k
packedGitLimit = 1k
deltaBaseCacheLimit = 1k
compression = 0
loosecompression = 0
bigFileThreshold = 1m
[pack]
window = 1
depth = 1
windowMemory = 1k
deltaCacheSize = 1k
deltaCacheLimit = 1
threads = 1
packSizeLimit = 1m
While this memory usage is considerably higher than what CVS required,
it's also about the minimum you need anyway to build NetBSD.
> Whether low performance hosts:
>
> - will be able to use the new system fully; or
Below around 512 MB, not fully.
> - will be able to use a degraded mode that will allow at
> least HEAD and branch checkouts and commits (but perhaps
> without the ability to create new branches or to merge); or
Systems with 256 MB -- and possibly much less than that -- should be
able to work in Git sparse checkouts of shallow filtered clones.
> - will be able to use a front-end or mirror that provides
> CVS-like capabilities and performance; or
See also below about making a read-only CVS mirror with hooks.
> - will not be able to use revision control at all.
** flaky network issues
Not mentioned in the core statement, but worth addressing here:
`hg clone' and `hg pull' can't resume an interrupted transfer, nor can
`git clone' and `git pull', but there are several mitigations:
- We can periodically generate bundles, which are single files that you
can download resumably with, e.g., `ftp -R' or `curl -C -' (which
restart and continue where they left off if the network flakes).
`hg clone' and `hg pull' can automatically use bundles announced by the
remote server via hash and external URL (though they can't resume those
downloads either), so the bundles can safely be delivered through the
CDN (hashed bundle references are new in Mercurial 6.9).
XXX git has some support for automatic use of bundles but I haven't
tried it yet, and [SECURITY] it does not clearly have a way to make
secure hashed references to bundles:
https://git-scm.com/docs/bundle-uri
For example, we could generate monthly full bundles, weekly
incremental bundles since the last monthly, and daily incremental
bundles since the last weekly.
This infrastructure is already partly in place for hg.
- We can mount a daily snapshot of the repositories, say a snapshot of
/hg mounted at /snap/hg, and allow developers to copy the snapshot
with rsync; then they can track further updates with `hg pull'.
Similarly, potentially, for git.
- For hg, developers and users don't always need to download the file
content of all history all at once:
. With the remotefilelog extension, you only need to download the
metadata of the repository history, not all the historical file
contents.
. With the narrow extension, you can limit the working directory to a
subset of files in the repository with `hg clone --narrow --include
<file>'.
A shallow clone with remotefilelog cuts src down from several
gigabytes to 300-400 MB as of 2024.
- For git, developers and users don't always need to download the whole
tree or history all at once:
. With _shallow clones_ (git clone --depth=1), you need only fetch
the most recent commit.
. With _filtered clones_, you need only fetch part of the content of
each commit you get, and git will fetch more on demand if you do
something that needs it, like `git blame' (show what commit each line
in a file came from) or `git log -S<pat>' (search for commits adding or
deleting <pat>):
=> With _blobless clones_ (git clone --filter=blob:none), you need
only fetch the commit metadata (commit message and history
pointers) and tree structure, not the content of every file.
=> With _treeless clones_ (git clone --filter=tree:0), you need
only fetch the commit metadata (commit message and history
pointers), not even the tree structure.
. With _sparse checkouts_ (git sparse-checkout) in filtered clones,
you can check out and make commits in a subtree, and git will only
download the file content for files touched by commits in that
subtree.
E.g., to do development on src/bin/sh, you only need to download
about 34 MB with a sparse checkout of a treeless shallow clone.
(That's still more than CVS, but not a lot more.)
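To make the monthly/weekly/daily bundle layering in the first item concrete, here is a minimal Python sketch of which files a fresh client would fetch on a given day. The schedule details (full bundle on the 1st, weekly bundles on Mondays, daily bundles covering changes since the newest full or weekly bundle) and the file names are assumptions for illustration, not what the existing hg infrastructure produces:

```python
from datetime import date, timedelta


def bundles_needed(day: date) -> list[str]:
    """Return the (hypothetical) bundle names a fresh client fetches on `day`.

    Assumed schedule: a full bundle on the 1st of each month, a weekly
    incremental bundle every Monday covering changes since that month's
    full bundle, and a daily incremental bundle covering changes since
    the newest full or weekly bundle.
    """
    monthly = day.replace(day=1)
    names = [f"full-{monthly.isoformat()}.hg"]

    # Most recent Monday on or before `day` (weekday() is 0 for Monday).
    monday = day - timedelta(days=day.weekday())
    base = monthly
    if monday > monthly:
        # A weekly bundle only exists if it was cut after the full bundle.
        names.append(f"weekly-{monday.isoformat()}.hg")
        base = monday

    # On the day a full or weekly bundle is cut, a daily one would be empty.
    if day > base:
        names.append(f"daily-{day.isoformat()}.hg")
    return names
```

Under this schedule a client never needs more than three downloads, each of which can be resumed with `ftp -R' or `curl -C -', before a final `hg pull' picks up anything newer.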
Once you have a full repository, you can work entirely offline to
commit, edit, and organize changes. You can also maintain multiple
work trees in parallel from a single clone of the repository with `hg
share' or `git worktree' -- no need to download again.
** continued CVS use in parallel
> * Whether it will be possible to run the existing CVS system
> in parallel with the new version control system during a
> transition period, and if so, how commits made to one system
> will be mirrored to the other.
We could use a Mercurial changegroup hook to commit changes to a
read-only CVS mirror.
=> This would require some thought to make it work correctly with tags
and branches -- maybe limited to release branches/tags like
netbsd-11, netbsd-11-base, and netbsd-11-0-release, requiring manual
intervention by releng.
=> This probably wouldn't preserve revision numbers.
=> We might reasonably limit it to netbsd-9 and netbsd-10 and then
stop once netbsd-10 is EOL.
** continued real-time conversion with other VCSs
> * Whether one-way or bidirectional near-real-time conversion to
> other VCS systems will be possible on an ongoing basis, and
> if so, to which systems (other than the system chosen as the
> master), and how it will be configured.
We can provide a push/pull bridge to git using git-cinnabar
(https://github.com/glandium/git-cinnabar) that works as follows:
1. The primary repository is hg.
2. Pushing to hg triggers an asynchronous pull by the git repository.
The underlying mechanism is a Mercurial txnclose hook.
(If git pull fails, the git repository effectively becomes read-only
until admins intervene.)
3. Pushing to git synchronously pushes back to hg.
This way, if hg rejects the push, you get feedback in the git push.
4. XXX hg topics and git per-developer-namespace feature branches TBD:
https://github.com/glandium/git-cinnabar/issues/326
git-cinnabar has had incompatible changes in its default conversion,
but it usually provides configuration knobs, covered by automatic tests,
to restore the old behaviour. For example, in the transition from 0.6
to 0.7, it changed from appending NUL bytes to appending line feeds to
certain commit messages, but kept the old behaviour available.
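Step 2 works because the hook only signals that the git side should catch up, while a separate long-running process does the slow pull. Here is a minimal Python sketch of that decoupling. It assumes the sync operation always transfers all new heads, so one run can satisfy any number of earlier signals. `SyncTrigger' and `sync' are hypothetical names, and a thread stands in for the real hook-to-daemon channel:

```python
import threading


class SyncTrigger:
    """Coalesce many "repository changed" signals into few sync runs.

    Hypothetical stand-in for the hook -> worker arrangement: trigger()
    plays the part of the repository hook, and the worker thread plays
    the part of the process that runs the slow sync (e.g. a git-cinnabar
    pull), passed in here as the callable `sync`.
    """

    def __init__(self, sync):
        self._sync = sync
        self._pending = threading.Event()
        self._stop = False
        self.last_error = None
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def trigger(self):
        # Cheap and non-blocking, so the push that fired the hook does
        # not wait for the mirror to catch up.
        self._pending.set()

    def _run(self):
        while True:
            self._pending.wait()
            if self._stop:
                return
            # Clear *before* syncing: signals that arrive while the sync is
            # running set the flag again and cause one more run, however
            # many of them there were.
            self._pending.clear()
            try:
                self._sync()
            except Exception as exc:
                # A failed sync leaves the mirror stale until the next
                # signal; a real daemon would log this and alert admins.
                self.last_error = exc

    def close(self):
        """Stop the worker; a signal still pending at this point is dropped."""
        self._stop = True
        self._pending.set()
        self._worker.join()
```

Clearing the flag before running the sync, rather than after, keeps a push that lands mid-sync from being missed.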
** matrix of tools using cvs and plans for hg
> * A matrix of all the supporting tools and code that currently
> assumes CVS, with plans and actions for the conversion of each
> item to the new VCS.
Automation plan split into three phases:
1. Needed for ongoing NetBSD development. Can't be put off.
2. Important but not needed to prevent development from stalling, like
browsing the source tree on the web.
3. Nice-to-have but not used for anything important, like posting RSS
change logs.
A general note on automation: Many tools rely on `cvs update' to get an
up-to-date snapshot, without storing unnecessary history. Mercurial
doesn't natively have this operation, but:
(a) We could use a periodic snapshot with hg archive and rsync instead.
(b) We could use the hg remotefilelog extension, which is shipped with
Mercurial but requires some care to enable:
https://bz.mercurial-scm.org/show_bug.cgi?id=6936
https://bz.mercurial-scm.org/show_bug.cgi?id=6937
[TNF private infrastructure details omitted]
** how to set up tnf servers
> * How the NetBSD project's official servers should be set up,
> taking into account the needs of developers with write access,
> read-only access for the public to mirror the repository,
> read-only access for the public to check out the tree (if
> that is different from mirroring the repository), backups,
> redundancy, audit trail, email for commit messages, and any
> other issues identified with the assistance of NetBSD admins.
*** TNF server setup
**** hg.NetBSD.org (alias git.NetBSD.org)
1. hg repos stored in /hg (XXX currently /repo) with multiple shares
(i.e., same repo content sharing storage) with filtered views:
(a) src-public exposes only public changesets on default (main
development branch) and long-term branches, which never change
once published
(b) src-draft additionally exposes draft changesets, which can be
obsoleted by successor changesets by rebase/histedit/amend
(c) src-all additionally exposes hidden changesets, which are the
predecessors -- earlier revisions before rebase/histedit/amend
-- of the current draft changesets
Thus, src-public <= src-draft <= src-all; src-all contains the
entire revision history, including the revisions to revisions.
(Read `<=' as `subset of or equal to'.)
Physical layout:
- [src-all] /hg/.joined/src is where the content is stored
=> .hg/hgrc -> /etc/mercurial/hgrc-joined (symlink)
- [src-draft] /hg/src-draft is an hg share of /hg/.joined/src
=> .hg/hgrc -> /etc/mercurial/hgrc-draft (symlink)
- [src-public] /hg/src-public is an hg share of /hg/.joined/src
=> .hg/hgrc -> /etc/mercurial/hgrc-public (symlink)
- [src-public] /hg/src -> src-public (symlink)
Symlink /hg/src means that using ssh://hg.NetBSD.org//hg/src gets
you src-public.
2. git-cinnabar bridge at /git/src, /git/pkgsrc, &c.
3. [SECURITY] restricted developer access via hg/git over ssh
4. [SECURITY]
XXX Future change: auditlog configured to record who pushed what
XXX (currently using deprecated pushlog instead)
5. hooks to:
(a) reject pushes that create multiple heads (i.e., no push -f)
(python:hgext.hooklib.reject_new_heads.hook)
(b) reject pushing merge commits (for now)
(python:hgext.hooklib.reject_merge_commits.hook)
(c) [SECURITY] require committer name match ssh login
(d) limit namespaces, e.g. allow developers to create topics named
DEVNAME-*, but only allow releng to create branches and tags
like netbsd-X, netbsd-X-base, netbsd-X-Y-release
(XXX TBD: https://bz.mercurial-scm.org/show_bug.cgi?id=6944)
(can do this with a custom hook even if acl ext lacks support)
6. hg hooks to:
(a) hg push to anonhg
=> /etc/mercurial/hgrc has:
[hooks]
changegroup.anonhg = /etc/mercurial/trigger-sync
This connects to a socket, /home/srcmastr/sync-*.
=> /home/srcmastr/startup starts up tmux sessions that run...
=> /home/srcmastr/syncd runs
hg push --hidden -B @ --new-branch -r 'heads(:)' -f <anonhgrepo>
and then waits for a socket connection and repeats.
(b) send mail notification via the `notify' extension
=> /etc/mercurial/hgrc [notify] section has some config
=> /etc/mercurial/notify.conf has the rest of the config
(c) trigger `git pull' in git-cinnabar bridge
7. cron jobs to:
(a) `hg verify' all hg repositories weekly (XXX TBD)
(b) create hg bundles, /etc/mercurial/build-*bundles (XXX TBD)
(c) `git fsck' or `git maintenance' all repositories regularly (XXX TBD)
(d) create git bundles (XXX TBD)
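Hooks 5(a) and 5(b) come down to two checks on the changeset graph after a push. Mercurial already ships them as hgext.hooklib.reject_new_heads and hgext.hooklib.reject_merge_commits, so the following is not their code. It is only a toy Python model of what they enforce. It assumes the repository is a dict mapping each node to (branch, parents), and it ignores topics, which the plan wants to exempt from the single-head rule:

```python
def branch_heads(changesets):
    """Map each named branch to its set of heads.

    `changesets` maps node -> (branch, parents). A head of branch B is a
    changeset on B with no descendant on B.
    """
    children = {node: [] for node in changesets}
    for node, (_branch, parents) in changesets.items():
        for parent in parents:
            children[parent].append(node)

    def has_descendant_on(node, branch):
        stack, seen = list(children[node]), set()
        while stack:
            child = stack.pop()
            if child in seen:
                continue
            seen.add(child)
            if changesets[child][0] == branch:
                return True
            stack.extend(children[child])
        return False

    heads = {}
    for node, (branch, _parents) in changesets.items():
        if not has_descendant_on(node, branch):
            heads.setdefault(branch, set()).add(node)
    return heads


def check_push(changesets, pushed):
    """Return reasons to reject a push (an empty list means accept).

    `changesets` is the repository as it would look after the push;
    `pushed` lists the nodes the push adds.
    """
    errors = []
    for node in pushed:
        if len(changesets[node][1]) > 1:  # like 5(b): no merges, for now
            errors.append(f"{node}: merge commits are not accepted")
    for branch, heads in sorted(branch_heads(changesets).items()):
        if len(heads) > 1:  # like 5(a): one head per branch
            errors.append(f"branch {branch!r} would have {len(heads)} heads")
    return errors
```

A developer whose push is rejected this way does what the workflow section describes: `hg pull --rebase' and push again, which turns the fork back into a single line.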
**** anonhg.NetBSD.org (alias anongit.NetBSD.org)
1. clone of hg.n.o repositories with same public/draft/all division
Physical layout:
- /hg/.joined/src is where the content is stored
- [src-all] /hg/src-all is an hg share of /hg/.joined/src
=> .hg/hgrc includes /etc/mercurial/hgrc-all
- [src-draft] /hg/src-draft is an hg share of /hg/.joined/src
=> .hg/hgrc includes /etc/mercurial/hgrc-draft
- [src-public] /hg/src-public is an hg share of /hg/.joined/src
=> .hg/hgrc includes /etc/mercurial/hgrc-public
URLs under https://anonhg.netbsd.org are determined by
/etc/mercurial/hgweb.config, e.g.:
[paths]
/src = /hg/src-public
/src-draft = /hg/src-draft
XXX Why do we have /hg/src-all in addition to /hg/.joined/src?
Something about hg.n.o's hg push for the conversion needing
different configuration (.joined) from the public view (-all)?
2. anonymous read-only access via hg over ssh as user `anonhg'
=> authenticated via ssh host key fingerprint shipped in NetBSD
=> use contrib/hg-ssh (https://wiki.mercurial-scm.org/SharedSSH) to
restrict access to read-only; see
https://wiki.mercurial-scm.org/SecuringRepositories for security
advice
=> [SECURITY] This is necessary so we can ship the ssh host key
fingerprint with NetBSD instead of just relying on HTTPS CAs for
https://anonhg.n.o.
3. anonymous read-only access via hgweb over http/https, under nginx or
apache2 wsgi socket
=> can't use bozohttpd for lack of wsgi
=> authenticated via HTTPS CAs, in principle vulnerable to rogue CAs
4. anonymous read-only human-readable access via hgweb over http/https,
under nginx or apache2
XXX We might want to move this to another host, like hgweb.n.o (and
gitweb.n.o) in order to mitigate the impact of robot abuse on real
hg/git clients.
5. similarly with git-http-backend and gitweb
=> XXX can't use bozohttpd for lack of chunked input, PR bin/58354
https://gnats.NetBSD.org/58354
**** periodic backup methods
[TNF private infrastructure details omitted]
*** Requirements
**** developers with write access
To get started:
hg clone ssh://hg.NetBSD.org//hg/src
or, on flaky networks, download the latest clonebundle.hg and:
hg init src
cd src
hg unbundle ../clonebundle.hg
and set `[paths] default = ssh://hg.NetBSD.org//hg/src' in the .hg/hgrc
file.
To keep updated:
hg pull --rebase
To publish changes (may require pull/rebase cycle):
hg push
To share work-in-progress, put it on a topic DEVNAME-PRNUM-SUMMARY:
(https://www.mercurial-scm.org/doc/evolution/tutorials/topic-tutorial.html),
add `[paths] draft:pushurl = ssh://hg.NetBSD.org//hg/src-draft' to
.hg/hgrc and:
hg push -t DEVNAME-PRNUM-SUMMARY draft
***** Git bridge
To get started:
git clone <developer>@git.NetBSD.org:/git/src
or, on flaky networks, download the latest bundle.git and:
git clone bundle.git src
git -C src remote set-url origin <developer>@git.NetBSD.org:/git/src
To keep updated:
git fetch
git rebase origin/default
To publish changes (may require fetch/rebase cycle):
git push
XXX Share work-in-progress via GIT_NAMESPACE in private per-developer
refs/namespaces/<devname>/, mirrored to hg topics under (say)
default//<devname>/.
**** read-only access for the public
To get started:
hg clone https://anonhg.NetBSD.org/src
or, on flaky networks, download the latest clonebundle.hg and:
hg init src
cd src
hg unbundle ../clonebundle.hg
and set `[paths] default = https://anonhg.NetBSD.org/src' in the
.hg/hgrc file.
To keep updated:
hg pull --rebase
To share changes:
hg export
***** Git bridge
To get started:
git clone https://anongit.NetBSD.org/src
or, on flaky networks, download the latest bundle.git and:
git clone bundle.git src
git -C src remote set-url origin https://anongit.NetBSD.org/src
To keep updated:
git fetch
git rebase origin/default
To share changes:
git format-patch ...
**** backups
(see `franklin.NetBSD.org periodic backup methods' above)
1. dump /hg on hg.n.o with dump(8)/restore(8)
2. dump /git on git.n.o with dump(8)/restore(8)
3. hg/git clone and hg/git pull
4. archive deduplicated snapshots of uncompressed bundle exports
**** redundancy
For write access, we don't have redundancy with CVS and we won't have
redundancy with Mercurial: if cvs.n.o goes down, development halts; if
hg.n.o goes down, development would also halt. No regression.
For read access by developers, hg.n.o and anonhg.n.o provide
redundancy. No regression.
For read access by others, we could easily mirror anonhg.n.o, subject
to infrastructure requirements. The general public can also mirror it
with `hg clone', albeit without authentication. We can also publish
bundles from time to time, served by the CDN ([SECURITY] with hashed
references from anonhg.n.o), to reduce the load on anonhg.n.o. No
regression.
We could potentially implement replication on push via a three-phase
commit protocol through hg repository hooks, like GitHub does with Git:
https://github.blog/engineering/infrastructure/stretching-spokes/
https://lore.kernel.org/git/20241105013433.4E52260A64%jupiter.mumble.net@localhost/T/
https://bz.mercurial-scm.org/show_bug.cgi?id=6903
However, this may require new engineering.
**** audit trail [SECURITY]
We can use a pretxnchangegroup hook to require the ssh login name to
match the author name on any changesets pushed to the repository. This
provides the same audit trail as CVS. No regression.
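At its core, such a pretxnchangegroup check compares the ssh login with each new changeset's author. Here is a minimal Python sketch of just that comparison. It assumes authors are recorded either as a bare login or as `Name <login@NetBSD.org>'. The exact author policy is an assumption, and the sketch leaves out how the hook obtains the login and the new changesets:

```python
import re
from typing import Optional

# Assumed author formats: a bare login ("alice") or "Name <login@NetBSD.org>".
_ADDR = re.compile(r"[^<>]*<([^<>@\s]+)@([^<>@\s]+)>")
_LOGIN = re.compile(r"[a-z0-9_][a-z0-9_.-]*")


def author_login(author: str, domain: str = "netbsd.org") -> Optional[str]:
    """Return the NetBSD login an author string claims, or None."""
    author = author.strip()
    m = _ADDR.fullmatch(author)
    if m:
        local, host = m.groups()
        # Only addresses in the project's own domain name a login.
        return local if host.lower() == domain else None
    if _LOGIN.fullmatch(author):
        return author
    return None


def foreign_authors(login: str, authors: list[str]) -> list[str]:
    """Return the authors in a push that do not match the pusher's login.

    In a real hook, `authors` would be the user field of every new
    changeset and `login` would come from the ssh session; both are
    assumptions of this sketch. A non-empty result means "reject".
    """
    return [a for a in authors if author_login(a) != login]
```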
XXX TBD: hg auditlog (replacing hg pushlog) to record who pushed what
without requiring the authorship to match, so we can more easily credit
outside contributors.
Mercurial also supports a mailmap so we can retrospectively change
display names in hg commit logs, in case of, e.g., deadnames. But the
committer login names will remain intact for audit trail. This isn't
perfect (developers may change their login names too on transition) but
it's not a regression and it's probably the most reasonable compromise
between auditability and gracefully handling deadnames.
NOTE: We do NOT rely on collision resistance of SHA-1 for audit trail
or authentication. Instead, we rely on the security of the repository
server, just like we did with CVS -- no regression.
**** email for commit messages
We can use the `notify' extension to send commit notifications by email
to source-changes@.
XXX Decide how to filter revisions to existing work-in-progress
changesets.
** how to bootstrap
> * How tools will be incorporated into the src tree, or
> bootstrapped from pkgsrc.
1. Via binary installation:
pkg_add mercurial
pkg_add pyN-hg-evolve
- This path is authenticated at the transport layer, now that we
have TLS trust anchors in base as of NetBSD 10.0.
- With pkgsrc cross-builds, we can build mercurial packages for
every platform, even if we don't currently do bulk builds for all
platforms.
2. From source (src and pkgsrc):
./build.sh tools distribution
follow pkgsrc/doc/HOWTO-use-crosscompile to build mercurial package
(and pyN-hg-evolve)
It is technically feasible to put git in base, but that brings in the
maintenance burden of keeping curl up-to-date in base. Better to work
on package cross-builds, authenticated binary distributions, and
integration with build.sh.
** workflow
> * How developers and non-developers will interact with the
> system, including workflow options for official release or
> feature branches, and for personal public or private branches
> (by both developers with write access, and non-developers).
Functionally, same workflow as CVS, spelled differently.
We generally work in linear history on the main branch, which is called
`HEAD' in CVS and `default' in hg (currently called `trunk' in the hg
and git conversions but this will change in the final conversion).
Rough correspondence:
cvs checkout                              hg clone
cvs update                                hg commit -m WIP, hg pull --rebase,
                                          hg uncommit
cvs commit                                hg commit,
                                          (`hg log -pvr .' to review), hg push
                                          (If hg push fails because of multiple
                                          heads, hg pull --rebase, resolve
                                          conflicts, ideally re-test each
                                          commit, and try again.)
cvs add                                   hg add
cvs rm                                    hg rm
cvs annotate                              hg annotate
cvs log                                   hg log
cvs diff                                  hg diff
cvs update -C FILE                        hg revert FILE
Checking out a long-term branch like netbsd-10:
cvs checkout -rBRANCH                     hg clone -b BRANCH ...
                                          (or use hg share if you already
                                          have a clone)
cvs update -rBRANCH                       hg update BRANCH
cvs update -A                             hg update default
Checking out a tag or exporting a tag's contents:
cvs checkout -rTAG                        hg clone -u TAG ...
cvs update -rTAG                          hg update TAG
cvs export -rTAG -dDIR                    hg archive -r TAG DIR
Starting a work-in-progress patch series outside the main branch:
cvs rtag -rHEAD DEV-NAME-base0 src        hg topic DEV-NAME
cvs rtag -b -rDEV-NAME-base0 DEV-NAME src hg commit ...
cvs commit ...                            hg push -r 'topic(DEV-NAME)'
                                          (`hg outgoing' to review first)
`Syncing with HEAD':
cvs update -P -rDEV-NAME                  hg update DEV-NAME
cvs rtag -F -rHEAD DEV-NAME-baseN src     hg rebase
cvs update -jDEV-NAME-base(N-1) -jDEV-NAME-baseN
cvs commit
`Merging into HEAD':
cvs update -jDEV-NAME                     hg update default
                                          hg rebase -r 'topic(DEV-NAME)' -d .
cvs commit                                hg push
(If push fails because it would create a new head, start over and try
again.)
We can use a hook to deny multiple heads for main and stable branches,
even if the client used hg push -f, while still allowing them for topics.
Reserved names:
- Tags, bookmarks, and topics named <developer>-* reserved to
<developer>. (Consider using local tags.)
- All other tags (particularly netbsd-*) reserved to releng.
- All branches (long-term branches) reserved to releng.
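The reserved-name rules above, like hook 5(d) in the server setup, can be expressed as a single predicate. Here is a minimal Python sketch. It assumes the hook knows the pushing developer's login and whether they are in releng. The rules don't say how bookmarks and topics outside a developer's own prefix are treated, so refusing them is an assumption; `may_create' is a hypothetical name:

```python
def may_create(kind: str, name: str, user: str, is_releng: bool) -> bool:
    """Decide whether `user` may create the tag/bookmark/topic/branch `name`.

    Encodes the reserved-name rules: <developer>-* tags, bookmarks and
    topics belong to that developer; all other tags (notably netbsd-*)
    and all named branches belong to releng.
    """
    if kind == "branch":
        return is_releng
    if kind not in ("tag", "bookmark", "topic"):
        raise ValueError(f"unknown kind: {kind!r}")
    if name.startswith(user + "-"):
        return True
    if kind == "tag":
        return is_releng
    # Assumption: the rules leave other bookmark and topic names
    # unassigned; this sketch refuses them.
    return False
```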
** commit references
> * New standards for log messages that refer to earlier commits,
> to avoid tying us to any particular VCS in the future.
> (Roughly, what to say in log messages instead of "revision
> <number>" or "commit <hash>" or "the previous commit".)
If we create a tag for every version bump, on the main development
branch and on release branches, we can use
hg identify -T '{latesttag}-{changessincelatesttag}-hg-{id}\n'
to obtain references like
10.99.10-2924-hg-34a495b4a160681952dd78e519c9bc58eb98a4e8
for the commit
https://mail-index.netbsd.org/source-changes/2024/05/24/msg151526.html
which is 2924 linear commits after the 10.99.10 bump in sys/param.h,
and (in the current hg conversion) had commit hash
34a495b4a160681952dd78e519c9bc58eb98a4e8.
Even if we switch Mercurial from SHA-1 to SHA-256, or if we switch from
Mercurial to Git or Bikeshed or Monotreme, only the commit hash part of
the reference will be invalidated; the rest will remain valid.
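Part of what makes this reference format robust is that it can be split mechanically into a stable part and a VCS-specific part. Here is a minimal Python parser. It assumes references always end in -<count>-<vcs>-<hexhash> and that the tag itself may contain hyphens, as release-branch tags do. `parse_ref' and `CommitRef' are hypothetical names:

```python
import re
from typing import NamedTuple


class CommitRef(NamedTuple):
    tag: str          # version-bump tag, e.g. "10.99.10"
    commits: int      # linear commits since that tag
    vcs: str          # VCS that produced the hash, e.g. "hg"
    commit_hash: str  # VCS-specific; may be invalidated by a later conversion


# Anchor on the trailing -<count>-<vcs>-<hash> suffix rather than splitting
# on "-", because the tag itself may contain hyphens.
_REF = re.compile(r"(?P<tag>.+)-(?P<commits>\d+)-(?P<vcs>[a-z]+)-(?P<hash>[0-9a-f]+)")


def parse_ref(ref: str) -> CommitRef:
    m = _REF.fullmatch(ref.strip())
    if m is None:
        raise ValueError(f"not a commit reference: {ref!r}")
    return CommitRef(m["tag"], int(m["commits"]), m["vcs"], m["hash"])
```

Only `commit_hash' depends on the VCS and hash function. The pair (tag, commits) identifies the same commit in any conversion that preserves the branch's linear history.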
We can recover the revision id -- even if the hashes change in the
future, e.g. because of a transition from SHA-1 to another hash
function -- by:
hg log -r 'branch(tag("10.99.10")) &
last(descendants(
descendants(tag("10.99.10"), 1),
2924))'
This can be written as an hg revsetalias in .hgrc like so:
[revsetalias]
revsincetag(t, n) = branch(tag(t)) & last(descendants(descendants(tag(t), 1), n))
`git describe' produces output like `10.99.10-2924-gd01834fb75de',
which works with `git show' as is:
$ git show 10.99.10-2924-gd01834fb75de
commit d01834fb75de27dacaba086af30015a41040446f
Author: tsutsui <tsutsui%NetBSD.org@localhost>
Date: Fri May 24 10:13:44 2024 +0000
Pull sharable src/usr.sbin/installboot/cd9660.c.
...
Recovering the commit from just the tag and revision number (because it
came from hg instead of git, or because of a `git filter-branch' to
expunge tainted history, or because we changed hash functions from
SHA-1 to something more reasonable) is a little more work:
$ git describe --match 10.99.10 HEAD
10.99.10-2951-g05fd7334f534
$ git show HEAD~$((2951 - 2924))
commit d01834fb75de27dacaba086af30015a41040446f
Author: tsutsui <tsutsui%NetBSD.org@localhost>
Date: Fri May 24 10:13:44 2024 +0000
Pull sharable src/usr.sbin/installboot/cd9660.c.
...
This isn't perfect, but it's doable. Maybe there'll be a tidier syntax
in the future for scanning forward in linear history (rather than
backward, which is already convenient).
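The arithmetic in the last example can be scripted. Here is a minimal Python sketch. It assumes the output of `git describe --long --match TAG HEAD' (--long keeps the -<count>-g<hash> suffix even when HEAD is exactly at the tag). It also assumes linear history between the tag and HEAD, which is what makes HEAD~N land on the right commit. `git_rev_for' is a hypothetical name:

```python
import re

# `git describe --long` output: <tag>-<commits since tag>-g<abbreviated hash>
_DESCRIBE = re.compile(r"(?P<tag>.+)-(?P<commits>\d+)-g(?P<abbrev>[0-9a-f]+)")


def git_rev_for(describe_output: str, tag: str, commits: int, tip: str = "HEAD") -> str:
    """Turn (tag, commits since tag) into a git revision relative to `tip`.

    `describe_output` is what `git describe --long --match TAG TIP` printed.
    Assumes linear history from the tag to the tip, so that counting back
    from the tip with `~` lands on the commit the reference names.
    """
    m = _DESCRIBE.fullmatch(describe_output.strip())
    if m is None or m["tag"] != tag:
        raise ValueError(f"unexpected describe output: {describe_output!r}")
    back = int(m["commits"]) - commits
    if back < 0:
        raise ValueError("referenced commit is newer than the tip; fetch first")
    return f"{tip}~{back}"
```

For the example above, feeding it `10.99.10-2951-g05fd7334f534' with tag 10.99.10 and count 2924 yields HEAD~27, the same revision passed to `git show'.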
** how cvs conversion happens
> * How the existing repository will be converted. The following
> items would be nice to have (in decreasing order of importance):
We adopt essentially the current hg conversion plus tags, done afresh
to discard all past hiccups in the conversion.
XXX Not sure if the conversion script can be convinced to create tags
or we have to invent a new way to do that.
22:51 < joerg> de-sparsifying branches should be considered
22:51 < joerg> checking for split commits
22:51 < joerg> utf8-iying commit messages should be considered
22:52 < joerg> checking and deciding what to do with forced commits
22:53 < joerg> rechecking for bad branch points
Other cleanups:
- any unprocessed admins@ tickets asking for deletion of accidental
imports
*** how to deal with vendor branches used for different purposes
> - how CVS vendor branches will be handled, including
> cases where the same vendor tag has been used for
> logically-distinct branches (as is common in pkgsrc).
I don't think it's very important to address that immediately.
- The release tag is more important for actually merging changes.
- For future vendor imports, we can just create a new finer-grained
vendor branch.
*** how to deal with historical log messages
> - how (if at all) historical log messages will be edited
> during the conversion (for example, to adjust the
> character set, to convert user names to email addresses,
> or to fix up references to CVS numeric revision numbers,
> to make them use the newly defined standards);
**** character set
We should find any non-ASCII commit messages and try to convert them to
UTF-8.
The hg/git conversion already attempts this, but there are some
stragglers that it can't handle automatically.
**** convert user names to email addresses
We can use .mailmap for that.
**** fix up references to CVS numeric revision numbers
I don't think it is feasible to go through the content of commit
messages and fix up anything that says `foo.c rev. 1.23'. But we could
make it easier to resolve these references in the future.
Some possible approaches:
- metadata in hg commit extra
XXX Waiting on joerg for how to query the current conversion for
mapping CVS revision numbers to changeset ids.
*** how to deal with renames
> - how (if at all) historical repository moves and copies
> will be identified and fixed up during the conversion;
XXX to discuss with joerg
*** whether pre-CVS history can be included later
> - whether pre-CVS history (such as the older SCCS history)
> can also be included, either at the time of conversion, or
> later;
XXX to discuss with joerg
*** whether content can be removed
> - whether information removed as a result of the USL v. BSDI
> lawsuit could ever be reinstated, if legal issues were
> resolved in the future.
Two options:
1. hg censor extension:
- keeps data stored in repository
- prevents server from serving to clone/pull clients
- leaves changeset hashes intact
2. hg histedit:
- removes data from repository
- rewrites all changeset hashes, requiring a flag day
** considerations to avoid lock-in
> * Considerations to avoid lock-in to a particular version
> control system, but to allow for a future change to yet
> another system. (For example, we could choose a VCS system
> with a widely supported import and export format, and restrict
> our workflow to features that are supported by many VCS
> systems, and avoid the use of features that are unique to the
> chosen system; however, the set of widely-supported features
> should be identified.)
`hg fastexport' exports the repository content in a format fit for
`git fast-import', also supported by all other halfway serious
alternatives (like fossil, darcs, pijul).
** estimated timeline
> * An estimated time-line of the conversion, together with a list
> of people responsible for it and their respective tasks.
Phase 1: February 2025.
Phase 2: March 2025.
Phase 3: April 2025.
Mercurial infrastructure is already there and working and has been for
years. Main hurdles are performance and just flipping the switch.
Releng has a draft of autobuild with git, which is the highest-priority
automation we need. Adapting autobuild to hg is probably trivial.
Bracket is unhappy with non-linear commit dates but it can probably be
made to use hg bisect instead.