On Sun, 15 Apr 2007, Ed Hill wrote:
> From a packaging (not a user) perspective there are a number of ways that merging Core+Extras was/is a big improvement. Dependencies between packages (Core items could not depend on Extras) were, for instance, an annoying problem that now vanishes.
Fair enough, although pretty much by definition core items should not ever HAVE to depend on extras -- it is what makes the core the core. I've always liked the idea of the core remaining a VERY minimal set that is pretty much "just enough" to bootstrap an install.

One of the things that from time immemorial has bugged me about the red hat install process is its complete lack of robustness: if for any reason it fails in midstream, one pretty much has to start over. This has always been pretty silly. The correct way for the install to proceed, especially post-yum, is for a minimal diskful installation of "the core" to take place almost immediately, leaving the system bootable, and then for yum to be used OUTSIDE of the basic install mode to complete the installation, because a yum install on a package list is essentially restartable (see the sketch below for the sort of thing I mean). Yes, one "can" with some effort do this by hand, using e.g. kickstart (it is pretty difficult from the graphical installer, actually), but it is a PITA because the installer isn't designed to work that way; it is designed for you to select e.g. a "gnome workstation" from a checkbox list and do it all in one pass.

On a fast, reliable LAN connection to a fast, reliable server, installing on known compatible, reliable hardware, of course this works just fine. Over a relatively slow DSL link, connecting to a heavily loaded public server that is apt to reject a connection in midstream, onto hardware that may (gasp) have some bugs that get tweaked -- especially in the install kernel, which is not necessarily the updated kernel that actually works with the hardware once the install and update are finished -- well, let's just say that I find myself even now cursing and pulling out hair. And then there is the stupidity of having to do an install and THEN a yum update. Using yum for all installation past a basic bootstrap install, against the full repo set INCLUDING updates, one can just install the up-to-date versions of all packages in the first place. I know that this is where everything is moving, but RPM-based installs cannot get to a two-step model (a real "core" plus a post-install "everything else") fast enough from my point of view.

So yes, it does worry me a bit that there will "just" be a core repo. This flattens dependency resolution, sure, but by eliminating any sort of functional/library grouping of packages it makes maintenance of the entire suite a monolithic nightmare from the point of view of debugging. For example, I personally could easily see all of X -- libraries, modules, and the basic X-based packages that depend only on core libraries to build -- going into a repo all by itself. In the development process, one then a) updates core; b) ports/patches X on top of "just core"; c) ports/patches anything else with mixed dependencies on top of "both core and X and this and that". Having a functional separation at the level of API and library is a good thing.

Does this "have" to be done at the repo level? Obviously not. Groups could do it. But there are two aspects of groups. One is the human-oriented grouping of items by human-relevant category -- games, scientific, office, whatever. The other is the system-oriented grouping by dependency trees. Those trees "should" be organized in such a way that there are at least a few points of clear separation -- core should be the root or primary trunk of the tree, with NO dependencies EVER on anything not in core (where core is minimal, NOT monolithic, which is, yes, the other way to ensure that this is true).
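To be concrete about the two-stage install above, since I promised a sketch: assume a bootstrap kickstart whose %packages section contains little more than @core and yum, and a hypothetical per-host package list in /root/pkglist.txt. Then the entire second stage is, untested and hand-waving:

    # finish the box from the full repo set, updates included, after the
    # bare-bones bootstrap; yum skips anything already installed, so if the
    # DSL link or the mirror hiccups halfway through you just run it again
    yum -y install $(cat /root/pkglist.txt)

No separate install-then-update pass, and nothing lost if you have to run it three times to get all the way through.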
As one moves up the trunk, there are a few other fairly clear forking points -- X-based software, for example, all comes back to an "X core". Eventually -- possibly even fairly quickly -- one gets to levels where packages have mixed dependencies across several of these second-level "cores", or provide library support to many different kinds of application, and clear separation is no longer possible. But it does seem very useful to maintain this separation to the highest functional levels possible, to orthogonalize and decompose the global debugging problem down to some minimal level of possible circular reference.

So it seems to me that there is a need for two distinct kinds of grouping, only one of which can conveniently be accommodated by "package groups". Maybe multiple repos aren't a good way of separating out the required functionality, although I personally think that if "scientific linux" had been from the beginning NOT a distro but just a yum repo built and maintained on TOP of a distro -- or better yet, packaged up to be built and delivered on top of several distros -- it would have been finished and in universal use years ago. "Extras" isn't a great solution either, as it is already way too BIG -- recall that I started by making this observation, and the problem that prompted it isn't really made better by making the repo the "extras" packages live in even bigger.

The problem is inheritable. Does unifying "everything" with "core" in FC mean that "everything" will be unified in the next RHEL release built from FC? If so, then the cost of doing QA and support for RHEL increases exponentially or worse with the number of packages added (just as it really does with FC, according to the argument above). Maintaining a logical and physical separation of "scientific linux" as an RPM repo built >>on top of<< FC >>or<< Centos >>or<< RHEL >>or<< (the rpm-based distro of your choice) means that one can safely and reliably install RHEL or Centos on a cluster or LAN and then add SL on top via yum, with a neat isolation of the associated problems that may or may not arise -- a broken application, for example. It makes it very easy to back off to a functional core, to reinstall to a functional core, to determine where the problem lies, and to fix it.

Games are another obvious example of something that can and should be in a repo all their own -- a layer that NOTHING in the core should EVER depend on. Office packages ditto -- I love open office, but I definitely don't want something in the actual core required to make e.g. a beowulf cluster node function to depend on an open-office-supplied library, and again, making "Office Linux" into a repo and package suite would focus development attention on what is there in a very appropriate way. By keeping any or all of these things separated from the defended core on the basis of library-dependency decomposition AND function, one makes it easy to build and maintain systems based on modular package selections.

I'm very concerned that flattening everything out and relying on package groups on the one hand and internal dependency resolution on the other (as has been done for many years now) to provide functional decomposition in more than two dimensions will simply perpetuate several problems that have existed for many years and that plague RPM-based system users and managers. To understand this problem and where it is headed, it might be really useful to use rpm tools to map out the complete dependency matrix for e.g. FC 6 and note how it has grown relative to e.g. FC 4 and FC 5. If one maps it out hierarchically, looking for optimal decomposition points and decomposition planes in the multivariate space thus represented, it would be even more useful. With that in hand, one could very likely at least anticipate what hierarchical additions might be required to accommodate the otherwise rapidly diverging complexity without an attendant divergence in distributed management cost.
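Even a crude pass over the rpm database gives one something to look at. Something along these lines (untested, slow, ignores versioned constraints, filenames arbitrary) would dump the installed dependency graph as an edge list one could then feed to whatever graph analysis one likes:

    #!/bin/sh
    # dump the installed-package dependency graph as an edge list
    # ("package provider", one pair per line) so that one can start
    # looking for natural decomposition points and planes
    rpm -qa --qf '%{NAME}\n' | sort -u > pkgs.txt

    while read pkg; do
        rpm -q --requires "$pkg" | awk '{print $1}' | sort -u |
        while read cap; do
            # skip rpmlib() pseudo-capabilities and anything that no
            # installed package provides
            if prov=$(rpm -q --whatprovides "$cap" --qf '%{NAME}\n' 2>/dev/null); then
                prov=$(printf '%s\n' "$prov" | head -n 1)
                [ "$prov" != "$pkg" ] && echo "$pkg $prov"
            fi
        done
    done < pkgs.txt | sort -u > edges.txt

    # crude growth measure -- run the same thing on an FC 4, FC 5 and FC 6 box
    echo "packages: $(wc -l < pkgs.txt)  dependency edges: $(wc -l < edges.txt)"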
> But getting to your point about package segregation into named repos for end-user manageability -- I think I see what you want. Perhaps it is something that can be better handled by improving the package "groups/categories" (aka "Comps") situation? It's a topic that has been discussed within Fedora and it will hopefully get more attention (and better tools) as the total number of packages grows.
It needs more attention quite rapidly. The complexity of the dependency tree is (I suspect) highly nonlinear as a function of its size -- I'd guess greater than exponential -- although it is still inheriting benefits from the de facto decompositions built into it by its development history and the economic constraints inherent therein. Add ten thousand packages to what WAS the core -- which is obviously one of those de facto decomposition points (with several others implicit therein) -- and flatten everything, and the full force of that complexity will rapidly come out, as the maintainer of a single package in that set will have 9999 ways to wreak dependency havoc and create deeply hidden bugs with no obvious "owner".

My concern may be silly, of course. The complexity problem will be under considerable economic pressure to self-organize into functional decompositions that keep some sort of lid on the otherwise enabled (mathematical) catastrophe, and sane, intelligent humans will probably find a way to muddle through. It does seem to me that it might be wiser to do some numerical/statistical/mathematical studies of the decomposition problem, ANTICIPATE the catastrophe, and think now about ways of doing better than just "muddling through".

I'd think that Red Hat itself would pretty much demand this sort of foresight as a condition of supporting FC development, being as how they will willy-nilly inherit the real economic burden of maintenance problems created by n-tupling (with n > 4) the number of packages "in" RHEL in two more years. Of course they won't -- they'll do their OWN line-drawing and decomposition right back into RHEL and an "extras" that consists of all the FC packages that they don't want to be directly responsible for supporting -- but it does seem to me to be highly desirable to INTEGRATE this sort of decomposition now rather than impose it a posteriori onto a dependency space that will rapidly mix once the "requirement" that the core build independently is relaxed and extended over the full FC package space.
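And the simplest such measurement is nearly free. Given the edges.txt produced by the sketch earlier and a proposed core list (core.txt, one package name per line -- hypothetical, obviously), counting the dependencies that leak out of the core in the forbidden direction is a one-liner:

    # count edges where a "core" package depends on a non-core provider --
    # exactly the kind of leak that should be zero if core is really core
    awk 'NR == FNR { core[$1] = 1; next }
         core[$1] && !core[$2]    { bad++ }
         END { printf "%d core -> non-core dependencies\n", bad + 0 }' \
        core.txt edges.txt

Run against any candidate decomposition -- core, "X core", office, games, whatever -- numbers like that would tell you immediately whether the proposed planes actually separate anything.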
> If you have a desire to improve the situation, the best place to start is: http://fedoraproject.org/wiki/ and volunteer to help with some aspect.
I'm a bit overcommitted -- I'm trying to get three or four distinct applications I have written and personally support ready to go INTO FC extras (which is not trivial because of the various disciplines this requires!) AND I teach full time (and a half, this semester) AND I do a bit of consulting AND I write far too much on this list AND I try to have a life that doesn't involve JUST typing at my keyboard, with decreasing success. So I'd much rather just predict doom and gloom in a public forum about the possible consequences of unifying and flattening a rather complex dependency space without thinking first about the problems of dependency decomposition planes in an abstract space of rather high dimensionality, and about the economics of hierarchical decomposition of the associated debugging and maintenance problems. That way, if people have actually thought about these problems and have a solution, or have concrete empirical reasons to believe that they won't become problems after all (rgb, you ignorant slut!), all is well and good. If not, well, maybe they WILL think about them and either reconsider or deliberately engineer a solution that can be expected to scale nicely up to 20+ kpkgs and eventually beyond.

This is serious business. I can without even trying name a half dozen times in the past that people built de facto scaling limits into operating systems that would "never" be exceeded and that were, of course, exceeded. We have been discussing the 32 bit problem. Then there is the Unix clock problem. There are all sorts of places inside the Linux kernel that went from being unsigned ints to being unsigned long long ints (32 bit ints to 64 bit ints, at any rate) because yes, one CAN receive more than 2^32 packets on a network between boots (at, say, 100,000 packets per second a 32 bit counter wraps in roughly twelve hours), etc. There is the famous 10-bit boot address problem. There are the limits in PID space and UID space (which at one point were signed 16 bit ints and probably still are, AFAICT, looking over PIDs returned by ps aux). There are limits in the number of open sockets and device handles and more.

Some of these limits have a "low cost" associated with pushing the limits of scaling -- many of them are disjoint linear problems that scale at zero cost until they fail altogether and require a one-time costly fix. In a way they are NICER because of this. In others -- the problem associated with having to stat very large directories in a flat FHS-dictated layout, for example -- the potential scaling catastrophe has been ameliorated faster than it developed by virtue of Moore's law (competing exponents, as it were) and a human time constraint rather than a system efficiency constraint. In this case the economic impact of a poor design decision is quite large, even given that the "cost" is mostly opportunity-cost time DONATED to the project and hence easily viewed as an inexhaustible resource. ;-)

rgb
> Ed
-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525  email:[EMAIL PROTECTED]

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf