Subject: Re: Removing pkgviews?
To: Alistair Crooks <agc@pkgsrc.org>
From: Johnny Lam <jlam@pkgsrc.org>
List: tech-pkg
Date: 03/28/2006 19:40:26

Alistair Crooks wrote:
> On Fri, Mar 24, 2006 at 11:26:35AM -0500, Johnny C. Lam wrote:
>> I hope this is not *too* controversial, but I am thinking of removing
>> the existing pkgviews implementation in pkgsrc after the 2006Q1 branch.
>> The existing code is complicating development of some new
>> infrastructure components that I'm pursuing. I'd like to get a sense of
>> the number of users that this might impact.
>
> I'm interested in your new infrastructure, but would like to know a
> bit more about why it's so invasive that pkgviews needs to go. I'd
> also like to know what needs to be done to complete pkgviews.
>
> My understanding is that you still need to "reference count"
> directories (which can be manoeuvered around by adding a switch to
> pkg_delete to ignore the return results from an rmdir(2), and which I
> think would be a good addition anyway), and also the problems arising
> from the mini-packages debate, which I think are greatly reduced by the
> former change. Are there any more showstoppers?

For clarification, I will refer to "package views" as the concept, and
"pkgviews" as the current implementation in pkgsrc today.

I outlined what I think needs to happen for pkgviews to be "finished" in
my talk at the last pkgsrcCon[1]. The problem with the existing
implementation boils down to needing a globally-unique way to refer to
files in the latest version of a package from another package. I
outlined three possible solutions to this problem, but I'm not really
satisfied with any of them. The solutions that don't involve
source-level modification of packages can be summarized as "just use
more symlinks", and here I'm talking about a *lot* more symlinks. Those
solutions all lack elegance, which is irksome from both a developer's
and a system administrator's standpoint.

Looking back, I think that pkgviews is a flawed implementation of the
package views concept. The original package views paper[2] noted that
there were problems with wildcard dependencies if we had pkgviews
packages link directly against files in other depot directories, but
didn't really cover what those problems might be. I think we now have a
clear understanding of what those problems are, especially as to how
they relate to pkgviews in pkgsrc today.

If I were to implement package views again, or at least something quite
like it, I think I would do it completely differently, and the design
would be driven by what it means for a package to depend on another
package. I think the relationship that is important is which package
"uses" the other package, regardless of how the dependency relationship
is expressed between those two packages. For example, links-gui depends
on png and links-gui does indeed use libpng.so, whereas p5-CGI depends
on perl, but perl is the one that actually uses CGI.pm (i.e. 'use CGI').

If package A uses package B, then we should just symlink the contents of
package B's depot directory into package A's depot directory. This
gives each package a well-known way to refer to files that belong to
another package -- it simply pretends they're installed into the same
place as its own files, directly under ${PREFIX}. Thus, if a package
installs a Perl script, then it "uses" the perl package so the perl
binary is symlinked to ${PREFIX}/bin/perl, and the Perl script can
simply start with "#!${PREFIX}/bin/perl". If a package needs shared
libraries provided by a library package, then those shared libraries are
symlinked into the package depot directory, and we just need
${PREFIX}/lib in the run-time library search path.
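To make the "A uses B" rule concrete, here is a minimal Python sketch of the
symlinking step. It is only an illustration of the idea described above, not
pkgsrc code: the function name link_used_package and the depot paths passed to
it are hypothetical, and it assumes a depot is just a directory tree on disk.

```python
import os
import tempfile


def link_used_package(used_depot, user_depot):
    """Symlink every file of the used package (B) into the same relative
    location inside the depot of the package that uses it (A).

    Hypothetical helper for illustration; returns the relative paths linked.
    """
    linked = []
    for dirpath, dirnames, filenames in os.walk(used_depot):
        rel = os.path.relpath(dirpath, used_depot)
        target_dir = os.path.normpath(os.path.join(user_depot, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            link = os.path.join(target_dir, name)
            if os.path.lexists(link):
                # Never clobber a file that the using package installed itself.
                continue
            os.symlink(os.path.join(dirpath, name), link)
            linked.append(os.path.relpath(link, user_depot))
    return sorted(linked)


if __name__ == "__main__":
    # Throwaway layout: a perl depot and a depot for a package shipping a script.
    root = tempfile.mkdtemp()
    perl = os.path.join(root, "perl")
    script_pkg = os.path.join(root, "myscript")
    os.makedirs(os.path.join(perl, "bin"))
    os.makedirs(os.path.join(script_pkg, "bin"))
    open(os.path.join(perl, "bin", "perl"), "w").close()
    open(os.path.join(script_pkg, "bin", "myscript"), "w").close()
    print(link_used_package(perl, script_pkg))
```

After the call, bin/perl exists inside the script package's depot as a symlink,
so the script can name the interpreter relative to its own prefix.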
In this design, the depot directory of a package is not a sacrosanct
location that only contains files belonging to that package -- instead,
it becomes a collection of files and symlinks needed to make that
package work. We can generalize this by not requiring depot directories
at all, but rather have directories where whole interrelated families of
packages could co-exist, i.e. multiple LOCALBASEs. We also don't need
the package's meta-data directory to be the parent directory for the
depot directory in this design, so we can keep using either a single
PKG_DBDIR, or multiple PKG_DBDIRs specified in a "PKG_DBPATH". In this
generalized design, you could implement the depot directory idea
contained in the pkgviews paper, but this is also flexible enough for
most folks to do what I think they want, which is to install families of
packages into a few separate LOCALBASEs.
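As a rough sketch of how a lookup over multiple PKG_DBDIRs might work, the
Python below searches a path list in order and returns the first meta-data
directory found for a package. The colon-separated format of "PKG_DBPATH", the
first-match-wins rule, and the function name find_package are all assumptions
made for illustration; nothing like this exists in the pkg_install tools today.

```python
import os


def find_package(pkgname, dbpath):
    """Return the meta-data directory for pkgname from the first database
    directory in dbpath that has one, or None if no directory does.

    dbpath is assumed to be a list of PKG_DBDIR-style directories joined
    with os.pathsep (':' on Unix), each holding one subdirectory per package.
    """
    for dbdir in dbpath.split(os.pathsep):
        if not dbdir:
            continue  # tolerate empty components such as "a::b"
        candidate = os.path.join(dbdir, pkgname)
        if os.path.isdir(candidate):
            return candidate
    return None
```

Searching in order means an earlier database directory shadows a later one,
which is one possible policy among several.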
There's no beating around the bush -- we're still using a lot of
symlinks to make this happen. I think we could improve this by doing
"tree-folding"[3] like GNU Stow does. There are also ramifications
regarding PKG_SYSCONFDIR and VARBASE when they are shared between
packages that need to be addressed, but I haven't thought through them
yet (I note that this is still a problem for the existing pkgviews
implementation). Lastly, we still need better tools for managing these
symlink farms.
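For readers unfamiliar with tree-folding, here is a simplified Python sketch
of the technique: a whole subtree is covered by a single symlink when nothing
else occupies that location, and a folded directory is "unfolded" into a real
directory of symlinks only when a second package needs to share it. This is
an approximation written for this discussion, not GNU Stow's actual algorithm,
and the names fold_into and unfold are hypothetical.

```python
import os


def unfold(link):
    """Replace a directory symlink with a real directory whose entries are
    symlinks to the entries of the directory it used to point at."""
    original = os.path.realpath(link)
    os.unlink(link)
    os.mkdir(link)
    for name in os.listdir(original):
        os.symlink(os.path.join(original, name), os.path.join(link, name))


def fold_into(source_dir, target_dir):
    """Link the contents of source_dir into target_dir, using one symlink
    per subtree wherever the location in target_dir is still free."""
    source_dir = os.path.abspath(source_dir)
    os.makedirs(target_dir, exist_ok=True)
    for name in sorted(os.listdir(source_dir)):
        src = os.path.join(source_dir, name)
        dst = os.path.join(target_dir, name)
        if not os.path.lexists(dst):
            # Free slot: a single symlink covers a file or a whole subtree.
            os.symlink(src, dst)
        elif os.path.isdir(src) and not os.path.islink(src) and os.path.isdir(dst):
            if os.path.islink(dst):
                # Another package folded this directory; split it up first.
                unfold(dst)
            fold_into(src, dst)
        # Otherwise the name is already taken; this sketch leaves it alone.
```

With two packages that both install into lib/, the first call creates a single
lib symlink, and the second call turns lib into a real directory containing
one symlink per library.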
I'd be happy to keep discussing this further, as we still don't have a
decent way to manage multiple installations of the same package in
pkgsrc, and package views is really the only proposal put forth to
address this in a general way.

Cheers,
-- Johnny Lam <jlam@pkgsrc.org>

[1] http://www.pkgsrccon.org/2005/slides/jlam/pkgviews.html
[2] http://www.netbsd.org/Documentation/software/pkgviews.pdf
[3] http://www.gnu.org/software/stow/manual.html#IDX11