tech-repository archive
Reply to David Holland's notes and comments
Apologies for the slightly belated reply; I'm not subscribed to this
list yet and found David Holland's comments when checking the list
archives to make sure my technical proposal had come through.
(Alan Barrett: I haven't seen a reply from you. If you sent one,
please resend.)
Please copy replies to esr%thyrsus.com@localhost
David Holland:
>While in general I agree, you do realize we already have one family of
>incremental conversions running, right?
Yes. And knowing what I know about CVS malformations, that makes me a
little nervous about the output. No matter; we can clear all that up.
I have good tools for checking conversion quality.
Specifically, I have a script wrapper that, after conversion to git,
checks for a content match at every tag and branch head. Later today
I'll run it on src; I haven't had time to do so since the Great Beast
arrived.
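A minimal sketch of that kind of check, assuming each tag or branch
head has already been checked out from both the CVS repository and the
converted git repository into two plain directories. The function
names and the skip list are illustrative; this is not the actual
wrapper script.

```python
import hashlib
import os

# VCS metadata directories expected to differ between checkouts (assumed list).
SKIP_DIRS = {"CVS", ".git"}

def tree_manifest(root):
    """Map each relative file path under root to the SHA-256 of its content."""
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            manifest[os.path.relpath(path, root)] = digest
    return manifest

def compare_trees(cvs_root, git_root):
    """Return (only_in_cvs, only_in_git, differing) as sorted path lists."""
    a, b = tree_manifest(cvs_root), tree_manifest(git_root)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differing = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return only_a, only_b, differing
```

Three empty lists for a given tag or branch head mean the two checkouts
match in content at that point.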
>We're more or less aware of that - the possible choices are git, hg,
>maybe Fossil, and "write something", where that last isn't very
>realistic.
No, it isn't. :-)
Your conversion target is for you to decide. As I've noted, I don't
think Fossil would scale up well enough to be used here, but since
you've already got a Fossil conversion process in place you can run
your own performance tests to check that.
>So, because git doesn't have real branches (only git-branches) the
>current conversion loses branch information. Is this limitation also
>present in the git-fast-export format? If so, is there a way to avoid
>throwing away branch information when converting to hg?
I don't understand what is "real" about CVS branches that isn't "real"
about git branches. Both are simply labels pointing to tip revisions
in a tree. Can you clarify what "branch information" you believe is
being lost?
>That's not what "low or moderate" means in these parts.
Right. I see from later in the archives that good performance results
are being obtained on small systems from git shallow clones, which is
another argument in favor of git.
>There are enough references to CVS version numbers outside the
>repository (mail archives, published and signed security advisories,
>bug reports) that we need to preserve the CVS version numbers either
>as searchable metadata in the VCS or in some external searchable table
>of equivalences.
This requirement is pretty standard. You'll get an equivalence table
as a byproduct of the conversion.
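As an illustration of how such a table might be queried, here is a
small sketch. The tab-separated layout (path, CVS revision, commit id)
is an assumption made for the example, not the format the conversion
tools actually emit.

```python
import csv

def load_equivalences(stream):
    """Read tab-separated lines of: path, CVS revision, commit id (assumed layout)."""
    table = {}
    for path, cvs_rev, commit in csv.reader(stream, delimiter="\t"):
        table[(path, cvs_rev)] = commit
    return table

def lookup(table, path, cvs_rev):
    """Return the commit id recorded for a CVS file revision, or None if unknown."""
    return table.get((path, cvs_rev))
```

Keying on the (path, revision) pair matters because CVS revision
numbers are per-file; "1.2" alone does not identify anything.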
>My opinion on this is that all or nearly all of these more or less
>bogus branches should just be eliminated and the import turned into a
>regular add and commit. It might take hand review to identify which
>branches need this treatment; but a good approximation (once one has
>changesets) is any vendor branch import changeset in pkgsrc where the
>same files have never had another version imported on that or any
>other vendor branch.
From your description it probably is going to take hand review. I'll
need to look at some concrete examples to be sure I understand all the
ramifications.
>This does not address the other (real) vendor branches in src; I think
>it's clear what the proper semantics are there though.
Agreed. I don't anticipate any real problems there.
>I... had thought it deduced and stored the information at commit time.
Nope.
>This may be slightly off topic in this thread, but: how does this
>work, and how can it possibly both scale and work reliably? Does it
>check every other file in the repository for similarity (and in every
>previous version) every time you do git log?
I don't know how it works internally. I believe part of the answer is
that they got acceptable scaling of rename and copy detection at the
cost of not having it work reliably - that is, it can occasionally throw
false negatives.
What I know is that rename and copy matches are detected by the
porcelain, not natively represented in plumbing (git's filesystem-like
storage engine). You can explicitly tell the exporter to *generate* R
and C ops (which you want to do if you're shipping to a
container-tracking VCS like hg under which the importer will interpret
them) but the exporter uses a heuristic (probably based on SHA1
matching) to generate them.
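For reference, git fast-export takes -M and -C options to request
rename and copy detection. The sketch below is for illustration only
and is not git's algorithm: it shows the simplest heuristic of this
kind, pairing a path that vanished between two adjacent snapshots with
a path that appeared when their content hashes are identical. The
function name and the dict-of-bytes snapshot representation are
assumptions made for the example.

```python
import hashlib

def detect_exact_renames(before, after):
    """Pair vanished paths with newly appeared paths of identical content.

    before/after map path -> file content (bytes) for two adjacent snapshots.
    Returns a list of (old_path, new_path) pairs.
    """
    deleted = {p: c for p, c in before.items() if p not in after}
    added = {p: c for p, c in after.items() if p not in before}
    by_hash = {}
    for path in sorted(deleted):
        digest = hashlib.sha1(deleted[path]).hexdigest()
        by_hash.setdefault(digest, []).append(path)
    renames = []
    for path in sorted(added):
        candidates = by_hash.get(hashlib.sha1(added[path]).hexdigest())
        if candidates:
            # Use each vanished path at most once.
            renames.append((candidates.pop(0), path))
    return renames
```

A file that is renamed and edited in the same commit no longer hashes
the same, so an exact-match heuristic like this one misses it. That is
one way false negatives arise.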
>To what extent do your tools allow importing external annotations
>about renames?
Not at all. The reason should be clear from the foregoing.
>...as above, what about branch metadata?
Again, I don't know what branch metadata you intend. What is there in
CVS beyond the branch name itself?
>Given that we've had conversions running for some time, which required
>doing a lot of cleanup and turned up some fascinatingly broken things,
>it seems likely to me that we've already stepped on most of these
>problems.
Let us devoutly hope so.
Vendor branches are a defect attractor. The remaining trouble spots
likely cluster around those.
--
Eric S. Raymond <http://www.catb.org/~esr/>