Hi, thanks for the response and interest.

I should note that the package names listed in the dataset are source
package names, not binary package names.

The method I am using is based on the similarity between the filename lists
of source packages. I use the Jaccard index
(http://en.wikipedia.org/wiki/Jaccard_index) between sets of filenames to
calculate similarity. This work is an offshoot of the PhD research I'm
currently undertaking at Deakin University.
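
For reference, the Jaccard index of two sets A and B is |A ∩ B| / |A ∪ B|:
1.0 for identical filename sets, 0.0 for disjoint ones. The following is a
minimal Python sketch of scoring one package's file list against a set of
candidate packages. It is an illustration, not the tool used to build the
dataset: the function names, the 0.5 threshold, the convention that two empty
sets score 1.0, and the sample file lists are all assumptions.

```python
from typing import Dict, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets of filenames."""
    if not a and not b:
        # Assumption: two empty file lists are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)


def best_match(
    files: Set[str],
    candidates: Dict[str, Set[str]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the dataset
) -> Optional[Tuple[str, float]]:
    """Return (name, score) of the most similar candidate package,
    or None if no candidate reaches the threshold."""
    best: Optional[Tuple[str, float]] = None
    for name, other in candidates.items():
        score = jaccard(files, other)
        if score >= threshold and (best is None or score > best[1]):
            best = (name, score)
    return best


if __name__ == "__main__":
    # Hypothetical file lists, for illustration only.
    debian_foo = {"foo", "foo.1", "libfoo.so.1"}
    fedora = {
        "foo-utils": {"foo", "foo.1", "libfoo.so.1", "README"},
        "bar": {"bar", "bar.1"},
    }
    print(best_match(debian_foo, fedora))  # ('foo-utils', 0.75)
```

Here "foo-utils" shares 3 of the 4 distinct filenames with "foo", giving
3/4 = 0.75, while "bar" shares none and scores 0.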

--
Silvio

On Fri, Jan 21, 2011 at 9:02 AM, Enrico Zini <enr...@enricozini.org> wrote:

> On Wed, Jan 19, 2011 at 10:54:44AM +1100, Silvio Cesare wrote:
>
> >    I have generated a list of roughly equivalent packages between Linux
> >    distributions (currently Debian 5 and Fedora 13). The list is
> >    automatically generated.
> [...]
>
> Hi Silvio,
>
> thank you for your work; it is extremely valuable. I'm currently
> at a cross-distro meeting on app installers[1] and it's precisely
> something we've been working on today. I'd be greatly interested to
> exchange algorithms with you.
>
> The main use case we have in mind is to be able to fall back on other
> distros when a package doesn't have some piece of information. For
> example:
>
>  - does package $foo have a screenshot in Debian?
>  - if no, how about in Fedora?
>  - if no, how about in OpenSUSE?
>  - if no, how about in Mandriva?
>
> The example uses screenshots, but it could be other kinds of metadata,
> like categories (it's a way for example to port at least some of Debtags
> to other distros), ratings or user comments.
>
> The heuristics I've been implementing so far are:
>
>  - trivial package name matching
>  - 'stemming' specific kinds of package names (debian:libfoo-dev->foo;
>   fedora:foo-devel->foo)
>  - matching packages that contain the same .desktop files or the same
>   pkg-config files
>  - similarity matching of file lists
>
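
The name-based heuristics in the list above (trivial name matching plus
'stemming') could look roughly like the Python sketch below. It is only an
illustration under stated assumptions: the two regular expressions cover just
the libfoo-dev and foo-devel examples, the names STEM_RULES, stem and
stem_matches are hypothetical, and these rules apply to binary package names
rather than the source package names used in the dataset. Names that match no
rule stem to themselves, so trivial name matching falls out as a special case.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical rules covering only the two examples given:
# Debian "libfoo-dev" -> "foo", Fedora "foo-devel" -> "foo".
STEM_RULES = {
    "debian": re.compile(r"lib(.+)-dev"),
    "fedora": re.compile(r"(.+)-devel"),
}


def stem(distro: str, name: str) -> str:
    """Strip a distro-specific development-package prefix/suffix, if any."""
    rule = STEM_RULES.get(distro)
    if rule is not None:
        m = rule.fullmatch(name)
        if m:
            return m.group(1)
    return name  # no rule matched: fall back to the plain name


def stem_matches(debian: List[str], fedora: List[str]) -> Set[Tuple[str, str]]:
    """Pair Debian and Fedora package names whose stems are equal."""
    by_stem: Dict[str, List[str]] = {}
    for name in fedora:
        by_stem.setdefault(stem("fedora", name), []).append(name)
    return {(d, f) for d in debian for f in by_stem.get(stem("debian", d), [])}


if __name__ == "__main__":
    print(sorted(stem_matches(["libfoo-dev", "bar"], ["foo-devel", "bar", "baz"])))
    # [('bar', 'bar'), ('libfoo-dev', 'foo-devel')]
```

Rules this simple will misfire on some names (for example, "libc6-dev" stems
to "c6"), which is one reason cross-validating against file-list similarity
is useful.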
> I still don't have results because the implementation is not complete,
> but I should have something in a day or two. You have something *today*,
> which is, wow. Tomorrow (Friday) I'll download your dataset and try to
> add another heuristic that just uses it. It'll also be interesting to use
> all these methods to cross-validate each other.
>
> [1] http://distributions.freedesktop.org/wiki/Meetings/AppInstaller2011
>
>
> Ciao,
>
> Enrico
>
> --
> GPG key: 4096R/E7AD5568 2009-05-08 Enrico Zini <enr...@enricozini.org>
