On Sun, Nov 30, 2008 at 3:12 PM, Diego 'Flameeyes' Pettenò
<[EMAIL PROTECTED]> wrote:
> "Alec Warner" <[EMAIL PROTECTED]> writes:
>
>> Diego, What are the concrete benefits of your proposal?
>
> As I said:
>
> - no need to replicate homepage data between versions; even though forks
>   can change homepage, I would expect that to at worst split in two a
>   package, or have to be different by slot, like Java;
> - allows proper handling of packages lacking a HOMEPAGE;
> - less data in metadata cache;
> - users can check the metadata much more easily by just opening the xml
>   file or interfacing to that rather than having to skim through the
>   ebuild, the xml files are probably more user readable than ebuilds
>   using multiple eclasses;
> - displaying info about the package does not require parsing the full
>   ebuild file, with its eclasses;
> - extensible to provide more links than just the homepage (forums,
>   trackers, gentoo-specific documentation, ...);
> - if we also move DESCRIPTION, search software can ignore everything
>   about ebuild parsing, and just use the metadata.xml files; considering
>   how many people actually use or used eix, it would make sense to allow
>   third-party applications to be able to search through the tree;
> - webapps like packages.gentoo.org would be able to display basic
>   information without having to parse the ebuilds or the metadata cache.
> - as much as people might think metadata is easier to parse than
>   anything, XML has one huge advantage: there are plenty of parsers for
>   any language without having to actually write one, even as easy as it
>   can be, and it's easily interfaced with anything; I wrote a simple XSL
>   file that outputs the basic metadata details for packages without
>   having any parser or executable code but xsltproc (or any other XSLT
>   software), correlating data with herds.xml too;
> - it really is metadata, and it makes very little sense to need parsing
>   of eclasses and EAPI handling to get some data from a package that is
>   non-functional in nature and free form (just like DESCRIPTION, and
>   unlike LICENSE like Alec said), and that changes at worst once each
>   slot (unlike LICENSE that can change at any given version).
>
> Disadvantages:
>
> - it requires user-interface software to parse metadata.xml to show
>   data for a package; which is already needed to show per-package USE
>   flag meaning;
>
> General points:
>
> - it does not solve unrelated problems like code replication;
>
> Can someone come up with any other point beside "I don't like XML"
> (which I already said is a puny answer) and "it can theoretically be 10
> different homepages for 10 different versions" (which I have sincerely
> some beef with myself since if you fork a software you might as well
> change its name)?
>
> As I said, moving out the HOMEPAGE field from a package manager
> perspective is non functional; if you're showing to the user some data
> about a package you might as well show as much as you can, like long
> descriptions, other links, and USE flags. And the fact that you can ask
> the package manager for something is for me not a valid reason to avoid
> moving something in a more approachable place for other software.
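
To make the "plenty of parsers" point concrete, here is a minimal sketch of reading a homepage out of metadata.xml with a stock XML library. It assumes a hypothetical <homepage> element, which is what the proposal would add and is not part of the current metadata.xml format; the sample document and the homepages() helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample. <homepage> is hypothetical: it is what the proposal
# would add, not an element that metadata.xml defines today.
SAMPLE = """\
<pkgmetadata>
  <herd>example-herd</herd>
  <longdescription>Example long description.</longdescription>
  <homepage>http://www.example.org/</homepage>
</pkgmetadata>
"""


def homepages(xml_text):
    """Return the text of every <homepage> element in a metadata.xml string."""
    root = ET.fromstring(xml_text)
    # Skip empty elements so a bare <homepage/> does not yield None.
    return [el.text.strip() for el in root.iter("homepage") if el.text]


if __name__ == "__main__":
    for url in homepages(SAMPLE):
        print(url)
```

Nothing here touches ebuilds, eclasses, or the package manager, which is the accessibility argument being made in the quoted text.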

Ciaran covered most of my points already.

Third party programs should not parse ebuilds and eclasses by hand. I'd
expect half of them to get it wrong if they tried. Ebuild parsing is
hard; that is why we have three complex software packages that for the
most part do it properly.

Why is 'ask the package manager' an invalid reason not to make
something more accessible? How accessible must this data be? Writing an
XML parser is not accessible enough (for me); we should just put it in
plain text on the hard drive, perhaps in
"/var/cache/edb/dep/${PORTDIR}/$C/$PV". Oh wait, we do that already[1].

So really this is where I'm confused. If third parties are using the
package manager APIs to get at this data, the only rationale to move it
out of ebuilds is:

- Space savings. Certainly your scheme may be smaller, but the XML tag
  overhead may eat into the savings. You should do some estimates to
  show the community how much smaller the tree will be from this
  proposal.

  Randomly looking:

    cd /var/cache/edb/dep/usr/portage
    grep -hR HOMEPAGE . | wc -m

  yields 1.1 million characters. Each character is 1 byte (is that so
  in UTF-8?). So at best you could save the 1.2GB tree 2.2 million bytes
  (about 2 megs) if your scheme was (more than) 100% efficient. The
  extra 1.1 million characters come from the space freed in the cache
  (since we don't cache metadata.xml).

  2 megs into 1200 megs is... ".166666%" of the tree. As I thought, not
  very compelling.

  Looking at DESCRIPTION:

    grep -hR DESCRIPTION . | wc -m

  yields ~1.5 million characters. Nice! So if we purge that from the
  cache and replace it with a (more than) 100% efficient metadata.xml
  solution we could save: 3 megs

  3 megs saved + 2 megs saved = 5 megs saved.
  5 / 1200 = .416666% of the tree.

  Still, again, not very compelling.

- Extra Tags. Extending HOMEPAGE is harder than changing metadata.xml,
  which I imagine is part of the reason why you proposed it.
  It will be EAPI3 at the earliest before we can get the HOMEPAGE tags
  implemented in ebuilds, and then we have to bump affected ebuilds to
  EAPI3.

However, if we drop the 'extra tags' bit then the only reason to move
the data is space, and I imagine the space savings will not be
compelling; but feel free to prove me wrong.

[1] For ebuilds that have cache entries, using the default cache
implementation for portage.

>
> --
> Diego "Flameeyes" Pettenò
> http://blog.flameeyes.eu/
>
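
For anyone who wants to redo the space estimate from this message with their own numbers, here is a rough sketch of the arithmetic. The inputs are the approximate figures quoted above, not fresh measurements, and the helper names are made up. It assumes one byte per character and zero XML overhead, i.e. the "(more than) 100% efficient" best case.

```python
# Back-of-the-envelope check of the figures quoted in this message.
# All inputs are the rough numbers from the text, not new measurements.

TREE_BYTES = 1200 * 10**6  # the "1.2GB tree", taken as 1200 megs


def best_case_savings(field_chars):
    """Bytes saved if a field disappears from both the ebuilds and the cache.

    Assumes one byte per character and no XML overhead at all, so this
    is an upper bound on the savings, not an estimate of them.
    """
    return 2 * field_chars


def percent_of_tree(saved_bytes, tree_bytes=TREE_BYTES):
    """Express a byte count as a percentage of the whole tree."""
    return 100.0 * saved_bytes / tree_bytes


homepage = best_case_savings(1_100_000)     # HOMEPAGE: ~1.1 million chars
description = best_case_savings(1_500_000)  # DESCRIPTION: ~1.5 million chars

if __name__ == "__main__":
    print(f"HOMEPAGE only: {percent_of_tree(homepage):.3f}% of the tree")
    print(f"both fields:   {percent_of_tree(homepage + description):.3f}% of the tree")
```

Without rounding to whole megs first, the percentages come out slightly higher than the ones in the text, but they stay well under half a percent, so the conclusion does not change.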