Equivalent packages between Linux distributions

2011-01-18 Thread Silvio Cesare
I have generated a list of roughly equivalent packages between Linux
distributions (currently Debian 5 and Fedora 13). The list is automatically
generated.

https://github.com/silviocesare/Equivalent-Packages/blob/master/NearestNeighbour/Debian5_Fedora13_Matches

An example entry in this file is:

ack-grep:ack:0.914286

This means that package ack-grep in Debian is 91.4286% similar to package
ack in Fedora. I treat two packages as equivalent when they are more than
60% similar; this threshold can be tuned. The similarity is computed
between source packages.
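
As a rough illustration of how an entry in this format could be read and
checked against the threshold, here is a minimal Python sketch. The names
Match, parse_match and is_equivalent are hypothetical and are not part of
the published tooling; the only things taken from the text above are the
"debian:fedora:similarity" layout and the 60% cut-off.

```python
from typing import NamedTuple, Optional

THRESHOLD = 0.60  # tunable cut-off described in the text


class Match(NamedTuple):
    debian: str
    fedora: str
    similarity: float


def parse_match(line: str) -> Optional[Match]:
    """Parse 'debian_pkg:fedora_pkg:similarity'; return None if malformed."""
    # Split from the right so the score is always the last field.
    parts = line.strip().rsplit(":", 2)
    if len(parts) != 3:
        return None
    debian, fedora, score = parts
    try:
        return Match(debian, fedora, float(score))
    except ValueError:
        return None


def is_equivalent(match: Match, threshold: float = THRESHOLD) -> bool:
    """Packages count as equivalent when strictly above the threshold."""
    return match.similarity > threshold


if __name__ == "__main__":
    m = parse_match("ack-grep:ack:0.914286")
    print(m, is_equivalent(m))
```

Raising or lowering THRESHOLD trades missed matches against false matches.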

Do you think such a list could be useful to Debian? A possible use would be
that a user could identify an equivalent package knowing only Fedora's
package name.

Please CC me on any responses.

--
Silvio Cesare


Re: Equivalent packages between Linux distributions

2011-01-21 Thread Silvio Cesare
Hi, thanks for the response and interest.

I should note the package names I have listed in the dataset are source
package names, not binary package names.

The method I am using is based on similarity between filename lists of
source packages. I use the Jaccard index (
http://en.wikipedia.org/wiki/Jaccard_index) between sets of filenames to
calculate similarity. This was done as an offshoot of the PhD research I'm
currently undertaking at Deakin University.
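
For readers unfamiliar with the Jaccard index, it is the size of the
intersection of two sets divided by the size of their union. The sketch
below is only an illustration: the function name and the two file lists
are made up, and whether full paths or base names are compared in the
real tool is not specified here.

```python
def jaccard(a, b):
    """Jaccard index: |A intersect B| / |A union B|, in the range 0.0 to 1.0."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty sets: define the similarity as 0.0 to avoid dividing by zero.
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical filename sets for two source packages.
    debian_files = {"ack", "Makefile.PL", "README", "t/basic.t", "extra.txt"}
    fedora_files = {"ack", "Makefile.PL", "README", "t/basic.t"}
    print(jaccard(debian_files, fedora_files))  # 4 shared / 5 total = 0.8
```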

--
Silvio

On Fri, Jan 21, 2011 at 9:02 AM, Enrico Zini wrote:

> On Wed, Jan 19, 2011 at 10:54:44AM +1100, Silvio Cesare wrote:
>
> >I have generated a list of roughly equivalent packages between Linux
> >distributions (currently Debian 5 and Fedora 13). The list is
> >automatically generated.
> [...]
>
> Hi Silvio,
>
> thank you for your work, it is extremely valuable work.  I'm currently
> at a cross-distro meeting on app installers[1] and it's precisely
> something we've been working on today. I'd be greatly interested to
> exchange algorithms with you.
>
> The main use case we have in mind is to be able to fall back on other
> distros when a package doesn't have some piece of information. For
> example:
>
>  - does package $foo have a screenshot in Debian?
>  - if no, how about in Fedora?
>  - if no, how about in OpenSUSE?
>  - if no, how about in Mandriva?
>
> The example uses screenshots, but it could be other kinds of metadata,
> like categories (it's a way for example to port at least some of Debtags
> to other distros), ratings or user comments.
>
> The heuristics I've been implementing so far are:
>
>  - trivial package name matching
>  - 'stemming' specific kinds of package names (debian:libfoo-dev->foo;
>   fedora:foo-devel->foo)
>  - matching packages that contain the same .desktop files or the same
>   pkg-config files
>  - similarity matching of file lists
>
> I still don't have results because the implementation is not complete,
> but I should have something in a day or two. You have something *today*,
> which is, wow. Tomorrow (Friday) I'll download your dataset and try to
> add another heuristic that just uses it. It'll also be interesting to use
> all these methods to cross-validate each other.
>
> [1] http://distributions.freedesktop.org/wiki/Meetings/AppInstaller2011
>
>
> Ciao,
>
> Enrico
>
> --
> GPG key: 4096R/E7AD5568 2009-05-08 Enrico Zini 


CPE lists (was Re: Equivalent packages between Linux distributions)

2011-01-31 Thread Silvio Cesare
I created an automatically generated CPE list for Fedora 13 packages. It only
has 300 or so packages in it, but this will improve as, say, Debian increases
the number of packages it tracks (it tracks only 1100 or so currently).

https://github.com/silviocesare/Equivalent-Packages/blob/master/CPE/Fedora13.CPE.generated

To generate the list I build a list of equivalent packages between Debian
and Fedora:

https://github.com/silviocesare/Equivalent-Packages/blob/master/NearestNeighbour/Debian5_Fedora13_Matches

I then use Debian's CPE list
(http://svn.debian.org/wsvn/secure-testing/data/CPE/list) to document the
equivalent packages in Fedora.
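
The two steps can be sketched roughly as a join between the match list
and the Debian CPE list. This is an illustration under stated
assumptions, not the script that produced the file above: the function
names are hypothetical, the CPE list is assumed to hold one
"package;cpe" pair per line, and the CPE string in the example is a
placeholder.

```python
def load_matches(lines, threshold=0.60):
    """Map Debian package -> Fedora package for entries above the threshold."""
    matches = {}
    for line in lines:
        parts = line.strip().rsplit(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        debian, fedora, score = parts
        try:
            similarity = float(score)
        except ValueError:
            continue
        if similarity > threshold:
            matches[debian] = fedora
    return matches


def load_cpe(lines):
    """Map Debian package -> CPE name.

    ASSUMPTION: each line looks like 'package;cpe:/a:vendor:product'.
    """
    cpes = {}
    for line in lines:
        line = line.strip()
        if not line or ";" not in line:
            continue
        package, cpe = line.split(";", 1)
        cpes[package] = cpe
    return cpes


def fedora_cpe(matches, debian_cpes):
    """Carry each Debian CPE name over to the equivalent Fedora package."""
    return {
        fedora: debian_cpes[debian]
        for debian, fedora in matches.items()
        if debian in debian_cpes
    }


if __name__ == "__main__":
    matches = load_matches(["ack-grep:ack:0.914286"])
    cpes = load_cpe(["ack-grep;cpe:/a:example:ack"])  # placeholder CPE name
    for package, cpe in sorted(fedora_cpe(matches, cpes).items()):
        print(f"{package};{cpe}")
```

Only packages that appear in both inputs get an entry, which is why the
generated list grows as the Debian CPE list grows.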

This should work fine for other distributions as well.

--
Silvio Cesare


Audit of Debian/Ubuntu for unfixed vulnerabilities because of embedded code copies

2012-07-02 Thread Silvio Cesare
Hi,

I have been working on a tool called Clonewise
(http://www.github.com/silviocesare/Clonewise and http://www.FooCodeChu.com)
to automatically identify code copies in Linux and try to infer if any of
these code copies are causing security issues because they haven't been
updated. The goal is for Debian's security team to use Clonewise to
find bugs and track code copies. Clonewise has found tens of bugs in the
past, but the current version uses different approaches and code from my
earlier work. I'm working on getting it ready for release.

I recently ran the tool and cross referenced identified code copies with
Debian's security tracking of affected packages by CVE. I did this for all
CVEs in 2010, 2011, and 2012.
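
To make the cross-referencing step concrete, here is a simplified sketch
of the idea. It is not Clonewise's code: the function name, the data
layout and the package and CVE names are all hypothetical placeholders.
It assumes we already know which packages embed a copy of which library,
and which packages the security tracker lists for each CVE.

```python
def unfixed_candidates(embedded_in, tracker):
    """Return (cve, package, library) triples worth a manual look.

    embedded_in: library package -> set of packages embedding a copy of it
    tracker:     CVE id -> set of packages the tracker lists for that CVE

    A package is reported when it embeds a library listed for a CVE but is
    not itself listed for that CVE. These are candidates, not confirmed bugs.
    """
    findings = []
    for cve, tracked in sorted(tracker.items()):
        for library in sorted(tracked):
            for package in sorted(embedded_in.get(library, ())):
                if package not in tracked:
                    findings.append((cve, package, library))
    return findings


if __name__ == "__main__":
    # Placeholder data for illustration only.
    embedded_in = {"libfoo": {"bar", "baz"}}
    tracker = {"CVE-0000-0001": {"libfoo", "bar"}}
    for cve, package, library in unfixed_candidates(embedded_in, tracker):
        print(f"{cve}: {package} embeds {library} but is not tracked")
```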

The report can be found here
http://www.foocodechu.com/downloads/Clonewise-report.txt

Clonewise reported 138 potentially unfixed code copies related to specific
CVEs in 22 packages.

Now some of these cases are going to be false positives. From looking at
the results, many of the vulns were probably fixed but have not been
reported in the security tracker. The report tries to be self explanatory
and justify why it thinks it's found a code copy based on the source code
being similar. It also tells you which source file has the vuln based on
the CVE summary.

I will work on going through this report myself, but I thought I'd post it
to the list and see if anyone wants to help. If you find false positives,
or actual vulnerabilities, please tell me about it so I can tally up the
results, and also so I can improve the tool to have fewer false positives
in the future. If you think the report is missing something that would make
it easier to read, be sure to tell me.

Thanks,

Silvio Cesare
Deakin University
http://www.FooCodeChu.com


Re: Audit of Debian/Ubuntu for unfixed vulnerabilities because of embedded code copies

2012-07-02 Thread Silvio Cesare
Last I checked, ia32-libs on squeeze didn't have the openssl patches for
0.9.8. I may have to check more thoroughly to be sure. It might have some
other vulns as well.

--
Silvio

On Mon, Jul 2, 2012 at 8:27 PM, Bernd Zeimetz wrote:

> On 07/02/2012 10:53 AM, Silvio Cesare wrote:
> > Hi,
> > [ ... ]
> > Now some of these cases are going to be false positives. From looking at
> > the results, many of the vulns were probably fixed but have not been
> > reported in the security tracker. The report tries to be self
> > explanatory and justify why it thinks it's found a code copy based on
> > the source code being similar. It also tells you which source file has
> > the vuln based on the CVE summary.
>
> The ia32-libs stuff are all false positives (assuming the package was
> updated after the security fixes came out, I'm not 100% sure about that
> :) And the openssl source is expected to contain the openssl source.
>
> Otherwise I think it might be worth integrating such a check into the
> QA tools Debian runs regularly.
>
> Thanks for your work!
>
> Cheers,
>
> Bernd
>
>
>
> --
>  Bernd Zeimetz                            Debian GNU/Linux Developer
>  http://bzed.de                                http://www.debian.org
>  GPG Fingerprint: ECA1 E3F2 8E11 2432 D485  DD95 EB36 171A 6FF9 435F
>