Hi,

Thanks for the response and interest. I should note that the package names listed in the dataset are source package names, not binary package names.
The method I am using is based on similarity between filename lists of source packages. I use the Jaccard index (http://en.wikipedia.org/wiki/Jaccard_index) between sets of filenames to calculate similarity. This was done as an offshoot of the PhD research I'm currently undertaking at Deakin University.

-- Silvio

On Fri, Jan 21, 2011 at 9:02 AM, Enrico Zini <enr...@enricozini.org> wrote:

> On Wed, Jan 19, 2011 at 10:54:44AM +1100, Silvio Cesare wrote:
>
> > I have generated a list of roughly equivalent packages between Linux
> > distributions (currently Debian 5 and Fedora 13). The list is
> > automatically generated.
> [...]
>
> Hi Silvio,
>
> thank you for your work, it is extremely valuable work. I'm currently
> at a cross-distro meeting on app installers[1] and it's precisely
> something we've been working on today. I'd be greatly interested to
> exchange algorithms with you.
>
> The main use case we have in mind is to be able to fall back on other
> distros when a package doesn't have some piece of information. For
> example:
>
>  - does package $foo have a screenshot in Debian?
>  - if no, how about in Fedora?
>  - if no, how about in OpenSUSE?
>  - if no, how about in Mandriva?
>
> The example uses screenshots, but it could be other kinds of metadata,
> like categories (it's a way for example to port at least some of Debtags
> to other distros), ratings or user comments.
>
> The heuristics I've been implementing so far are:
>
>  - trivial package name matching
>  - 'stemming' specific kinds of package names (debian:libfoo-dev->foo;
>    fedora:foo-devel->foo)
>  - matching packages that contain the same .desktop files or the same
>    pkg-config files
>  - similarity matching of file lists
>
> I still don't have results because the implementation is not complete,
> but I should have something in a day or two. You have something *today*,
> which is, wow. Tomorrow (Friday) I'll download your dataset and try to
> add another heuristic that just uses it.
> It'll also be interesting to use
> all these methods to cross-validate each other.
>
> [1] http://distributions.freedesktop.org/wiki/Meetings/AppInstaller2011
>
>
> Ciao,
>
> Enrico
>
> --
> GPG key: 4096R/E7AD5568 2009-05-08 Enrico Zini <enr...@enricozini.org>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (GNU/Linux)
>
> iQEcBAEBCAAGBQJNOLDeAAoJEON4Oc9CHQta7ckH/1IsATAFZss4NprTfzO0LMWi
> hXn8ds1GvPIxzokgKnX6v3JAq0rX56kFe4yDMFL2JA0GHTHR7bpXtClYBFtP9ErX
> XWv6caymfqmJVQLDDwUuDMPUBrVLeT+U4syv7B47JI/paGMfDPYfcRn74qEVrSlL
> T3P9cMYKzAwvgrNpL+EGAP3Kw34nfiMra3hmD7SeeYluo3trNUV3/BP6oRxIiLu0
> RBSvRzf6+W2P+jE2TsR/KSPYQQ9Ji6CjFPElzNYgW6N3ZKte985vA5AadX91pE2G
> QuKeW9PouddjCok1G9qgUCbDLz/WEQqbwkvC6/Wi5TVvpyRwqWmoj6Pmcx9klKM=
> =KtYB
> -----END PGP SIGNATURE-----
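
For anyone who wants to try the file-list similarity idea mentioned above, here is a minimal Python sketch of the Jaccard index over two sets of filenames. It only illustrates the measure itself. The function name, the sample filenames, and the choice to score two empty lists as 0.0 are assumptions made for this example; how the filenames were normalised and what cut-off was used to produce the dataset are not stated in this message.

```python
from typing import Iterable


def jaccard_index(files_a: Iterable[str], files_b: Iterable[str]) -> float:
    """Return |A & B| / |A | B| for two collections of filenames."""
    a, b = set(files_a), set(files_b)
    union = a | b
    if not union:
        # Assumption for this sketch: two empty file lists are not
        # treated as similar, which also avoids dividing by zero.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical filename lists for two source packages (made up for
# illustration, not taken from the dataset).
debian_files = ["README", "configure.ac", "src/main.c", "src/util.c"]
fedora_files = ["README", "configure.ac", "src/main.c", "src/extra.c"]

similarity = jaccard_index(debian_files, fedora_files)
print(f"{similarity:.2f}")  # 3 shared names out of 5 distinct names: 0.60
```

A score of 1.0 means the two filename sets are identical and 0.0 means they share nothing, so candidate pairs could be ranked by this value and cross-checked against the name-based heuristics.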