Re: Non-identical files with identical md5sums on Debian systems?

2013-08-05 Thread Chow Loong Jin
On Mon, Aug 05, 2013 at 06:44:49AM +0200, Fabian Greffrath wrote:
> Hi all,
> 
> I do occasionally check for identical files on different systems by
> comparing their md5sums. So, just out of interest, could someone tell me
> (how to find out) how many non-identical files with identical md5sums
> there are on a typical (say, amd64) Debian system?

How about this?


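#!/bin/sh
# Collect every "md5  path" line dpkg knows about; sort -u also groups equal md5s.
cat /var/lib/dpkg/info/*.md5sums | sort -u > md5sums-files.txt
# Keep only the md5 values that are listed for more than one path.
awk '{print $1}' md5sums-files.txt | uniq -c | awk '$1 > 1 {print $2}' > dup.txt

while read -r md5; do
    grep "^$md5" md5sums-files.txt | sed -re 's/^[a-f0-9]+[[:space:]]+//' |
    (
        # Paths in the .md5sums files are relative to /, hence the "/$file".
        IFS= read -r file
        shasum1=$(sha256sum "/$file" | awk '{print $1}')

        # Report any further file whose SHA-256 differs from the first one.
        while IFS= read -r file; do
            if [ "$(sha256sum "/$file" | awk '{print $1}')" != "$shasum1" ]; then
                echo "$md5 $file"
            fi
        done
    )
done < dup.txt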


I tried running it and didn't find anything on my Ubuntu installation.

-- 
Kind regards,
Loong Jin




Re: Non-identical files with identical md5sums on Debian systems?

2013-08-05 Thread Helmut Grohne
On Sun, Aug 04, 2013 at 10:24:59PM -0700, Vincent Cheng wrote:
> On Sun, Aug 4, 2013 at 9:44 PM, Fabian Greffrath  wrote:
> > I do occasionally check for identical files on different systems by
> > comparing their md5sums. So, just out of interest, could someone tell me
> > (how to find out) how many non-identical files with identical md5sums
> > there are on a typical (say, amd64) Debian system?
> 
> The closest thing to what you want may be dedup.debian.net, but I
> don't think it lets you filter out non-identical files.

Indeed this task can be solved with the software backing
dedup.debian.net. The general assumption is that sha512 is
collision-free. I can give a rough idea on how to do that:

1) Obtain the software.
2) Modify schema.sql to add md5 to the functions table.
3) Modify importpkg.py to record md5 hashes.
4) Follow the steps in README to import a local Debian mirror.
   (This takes about 7 hours on a quick 8 core box and 3 days on a
   slower single core.)
5) Look for files that have the same md5 hash but different sha512 hashes.
   Something like this SQL query will give you an answer (untested).

   SELECT h1.cid, h2.cid
     FROM hash AS h1
     JOIN hash AS h2 ON h1.fid = h2.fid AND h1.hash = h2.hash
     JOIN hash AS h3 ON h1.cid = h3.cid
     JOIN hash AS h4 ON h2.cid = h4.cid AND h3.fid = h4.fid
     JOIN function AS f1 ON h1.fid = f1.id
     JOIN function AS f3 ON h3.fid = f3.id
    WHERE h3.hash != h4.hash
      AND f1.name = 'md5'
      AND f3.name = 'sha512';

   It gives keys into the content table to look up the actual filenames
   and packages.
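
To see what the query in step 5 computes, here is a minimal Python sketch that runs it against an in-memory SQLite database. The two-table schema (`function(id, name)` and `hash(cid, fid, hash)`) is an assumption, reduced to the columns the query touches; it is not the real schema.sql. `find_md5_collisions` is a name made up for this illustration.

```python
import sqlite3

QUERY = """
SELECT h1.cid, h2.cid
  FROM hash AS h1
  JOIN hash AS h2 ON h1.fid = h2.fid AND h1.hash = h2.hash
  JOIN hash AS h3 ON h1.cid = h3.cid
  JOIN hash AS h4 ON h2.cid = h4.cid AND h3.fid = h4.fid
  JOIN function AS f1 ON h1.fid = f1.id
  JOIN function AS f3 ON h3.fid = f3.id
 WHERE h3.hash != h4.hash
   AND f1.name = 'md5'
   AND f3.name = 'sha512'
"""


def find_md5_collisions(rows):
    """rows: iterable of (cid, function_name, hash_value) tuples."""
    db = sqlite3.connect(":memory:")
    # Toy schema (assumed): only the columns referenced by the query.
    db.executescript("""
        CREATE TABLE function (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE hash (cid INTEGER, fid INTEGER, hash TEXT);
        INSERT INTO function VALUES (1, 'md5'), (2, 'sha512');
    """)
    fid = {"md5": 1, "sha512": 2}
    db.executemany(
        "INSERT INTO hash VALUES (?, ?, ?)",
        [(cid, fid[name], value) for cid, name, value in rows],
    )
    return sorted(db.execute(QUERY).fetchall())


if __name__ == "__main__":
    sample = [
        (1, "md5", "aaa"), (1, "sha512", "x"),
        (2, "md5", "aaa"), (2, "sha512", "y"),  # same md5 as 1, other content
        (3, "md5", "bbb"), (3, "sha512", "z"),
    ]
    print(find_md5_collisions(sample))
```

Because the self-join is symmetric, each colliding pair comes back twice, once in each order; adding `AND h1.cid < h2.cid` to the WHERE clause would keep only one of them.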

In case you have any questions, just ask (mail or #-qa on oftc).

Helmut


-- 
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20130805084636.ga10...@alf.mars



Re: Finding correct component for Virtual Box / Debian / screen resolution issue

2013-08-05 Thread Paul Wise
On Sun, Aug 4, 2013 at 3:09 PM, Cyril Brulebois wrote:

> Doesn't look like something to be run in d-i.

As I understand it, isenkram is just a proof of concept of the idea.
It also seems to be a reimplementation of discover?

> discover already pulls virtualbox bits in. Mentioned not so long ago in:
>   https://lists.debian.org/20130726091036.gb22...@mraw.org

Aha, so that is where I should file a patch for installing
thinkfan/etc on Thinkpads when installing with d-i.

It appears that discover and isenkram use a hard-coded list (in
discover-data) of mappings between devices and packages. If DEP-11
support were added to the archive and to discover, maintaining that
list of mappings would be delegated to the individual maintainers of
the packages in question. discover/isenkram/PackageKit would then use
apt data to discover which packages to install on which hardware.

BTW, virtualbox-ose-guest-x11 got renamed to virtualbox-guest-x11 in
2011 so the discover-data package needs updating for that transition.

-- 
bye,
pabs

http://wiki.debian.org/PaulWise


-- 
Archive: 
http://lists.debian.org/CAKTje6FdCdMEif+ha1cdZ=7q4g3nlgt7avewjsjggne7wxx...@mail.gmail.com



Bug#718769: ITP: clsync -- live sync tool based on inotify, written in GNU C

2013-08-05 Thread Artyom A Anikeev
Package: wnpp
Severity: wishlist
Owner: Artyom A Anikeev 

* Package name: clsync
  Version : 0.0
  Upstream Author : Dmitry Yu Okunev 
* URL : https://github.com/xaionaro/clsync
* License : GPL-3+
  Programming Lang: C
  Description : live sync tool based on inotify, written in GNU C

Clsync recursively watches a source directory and executes an external
program to sync the changes. Clsync is designed to be used together with
rsync. This utility is much more lightweight than its competitors and
supports features such as a separate queue for big files, a regex file
filter and multi-threading.


-- 
Archive: 
http://lists.debian.org/20130805090403.15637.78167.report...@icarus.mephi.ru



Re: Non-identical files with identical md5sums on Debian systems?

2013-08-05 Thread Adam Borowski
On Sun, Aug 04, 2013 at 10:21:09PM -0700, Russ Allbery wrote:
> Fabian Greffrath  writes:
> 
> > I do occasionally check for identical files on different systems by
> > comparing their md5sums. So, just out of interest, could someone tell me
> > (how to find out) how many non-identical files with identical md5sums
> > there are on a typical (say, amd64) Debian system?
> 
> Unless you have a collection of MD5 collision attacks, or have installed a
> package that includes a sample MD5 collision, the chances are quite good
> that the answer is "zero."  MD5 is no longer considered cryptographically
> strong, but that doesn't mean it's not a fairly random 128-bit hash.  You
> need a *lot* of files before even the birthday paradox will give you much
> likelihood of an MD5 collision that wasn't intentionally constructed.

Let's assume every hard drive produced so far in human history is combined
in a single RAID0 array, and formatted using a typical filesystem without
an inode limit, then filled with small files.  If my estimate is correct,
then, by the birthday paradox, there's around a 0.001% chance there will be
at least one non-constructed MD5 collision.
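
The birthday bound behind such an estimate can be sketched in a few lines of Python. This assumes MD5 behaves like a uniformly random 128-bit function and uses the usual approximation p ≈ 1 - exp(-n(n-1)/2^129); the function names are invented for the illustration, and no attempt is made to reproduce the hard-drive estimate itself.

```python
import math

MD5_BITS = 128


def collision_probability(n_files, bits=MD5_BITS):
    """Approximate chance that at least two of n_files random hashes collide."""
    pairs = n_files * (n_files - 1) / 2
    # 1 - exp(-pairs / 2**bits); expm1 keeps precision for tiny probabilities.
    return -math.expm1(-pairs / 2.0 ** bits)


def files_for_probability(p, bits=MD5_BITS):
    """Approximate number of files needed to reach collision probability p."""
    return math.sqrt(2 * 2.0 ** bits * -math.log1p(-p))


if __name__ == "__main__":
    n = files_for_probability(1e-5)  # 0.001% written as a fraction
    print(f"files needed for a 0.001% chance: {n:.2e}")
    print(f"chance with ten million files: {collision_probability(10**7):.2e}")
```

Under these assumptions a 0.001% chance needs roughly 8 × 10^16 files, which gives a feel for how far an ordinary installation is from an accidental collision.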

Also, there is no known practical preimage attack against MD5; collision
attacks are considerably less dangerous, as the attacker would first need to
give you a legitimate version of the file she wants to replace.

-- 
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ


-- 
Archive: http://lists.debian.org/20130805100834.ga2...@angband.pl



Bug#718775: ITP: clojurehelper -- Helper scripts for packaging Clojure programs

2013-08-05 Thread Eugenio Cano-Manuel Mendoza
Package: wnpp
Severity: wishlist
Owner: "Eugenio Cano-Manuel Mendoza" 

* Package name: clojurehelper
  Version : 0.1
  Upstream Author : Eugenio Cano-Manuel Mendoza 
* URL : http://anonscm.debian.org/gitweb/?p=pkg-clojure/clojurehelper.git
* License : MIT
  Programming Lang: Python
  Description : Helper scripts for packaging Clojure programs

Clojurehelper contains several scripts which help in packaging Clojure
programs:
 * lein_makepkg generates a template for a Debian Clojure package.
 * lein_builddocs creates html documentation from Markdown format.
 * lein_build creates jar files from Clojure sources.
 * lein-xml is a plugin for Leiningen that exports project.clj files to xml.
This package provides a dh sequence that can be used alongside javahelper to
build Clojure packages.


-- 
Archive: http://lists.debian.org/20130805022738.8425.50929.reportbug@localhost



Re: new hashes (SHA512, SHA3) in apt metadata and .changes files?

2013-08-05 Thread Ian Jackson
Ondřej Surý writes ("Re: new hashes (SHA512, SHA3) in apt metadata and .changes 
files?"):
> SHA512 doesn't bring any advantage over SHA256.

AIUI SHA-512 is faster than SHA-256 on many processors, and not
usually slower on the others.  If the hashes are too long, they can be
truncated.

Ian.


--
Archive: http://lists.debian.org/20991.39828.481089.77...@chiark.greenend.org.uk



Re: We need a global decision about R data in binary format, and stick to it.

2013-08-05 Thread Ian Jackson
Paul Tagliamonte writes ("Re: We need a global decision about R data in binary
format, and stick to it."):
> On Mon, Aug 05, 2013 at 09:57:35AM +0900, Charles Plessy wrote:
> > it is the common practice in upstream R packages to store data in binary
> > objects.  Those objects can be modified with R, and exported into various
> > formats.  The Debian archive is full of them.
> 
> This is not unlike a Python pickle.
> 
> However, even more to the point, with *this* package, that was a
> *generated data table*. These *generated* values are clearly not the
> preferred form of modification. I asked the uploader to point to where
> they came from. I don't think this is unfair.

We need to separate these two issues.

One is the file format question.  It doesn't seem to me that there is
anything wrong with a binary format as the preferred form for
modification, in principle.  For a file which is typically edited
using R, including by upstream when they want to edit it, then there
is no problem.

The other is the assertion that this particular case involves a
generated data table.  If this is the case then the source package
needs to contain the source code which generates the table - and,
really, it should regenerate the table during the build.  (The source
might be in the form of another R binary object.)

(Of course there is a third issue: it is probably not the best
engineering decision to use a binary save format rather than text
source code.  But that's not something the Debian maintainer
necessarily gets to choose and it's not a reason for an ftpmaster
reject.)

> > The question asked by Paul is a recurrent question that comes each
> > time the FTP trainees rotate (basically once per release cycle,
> > because during the Freeze the FTP trainees find other exciting
> > tasks to do, and then do not seem to have much time to process NEW
> > anymore).
> 
> This must mean many people who care deeply about this topic see this as an
> issue.

I don't think this is a helpful response to someone who is raising
what they see as a systematic problem.

Paul, would it be possible to update the ftpmaster assistant reference
materials to discuss R's binary files ?

Ian.


-- 
Archive: 
http://lists.debian.org/20991.42219.341036.231...@chiark.greenend.org.uk



Re: Non-identical files with identical md5sums on Debian systems?

2013-08-05 Thread Ian Jackson
Russ Allbery writes ("Re: Non-identical files with identical md5sums on Debian 
systems?"):
> Unless you have a collection of MD5 collision attacks, or have installed a
> package that includes a sample MD5 collision, [...]

For the sake of sanity of our (still) MD5-based tools, I hope that
no-one uploads into our archive a package with an example MD5
collision.  (Unless the colliding files are wrapped up somehow, to
protect our infrastructure from any untoward behaviour.)

Ian.


-- 
Archive: 
http://lists.debian.org/20991.42365.739458.834...@chiark.greenend.org.uk



Bug#718791: ITP: mikutter -- Simple, powerful and moeful twitter client

2013-08-05 Thread HIGUCHI Daisuke (VDR dai)
Package: wnpp
Severity: wishlist
Owner: "HIGUCHI Daisuke (VDR dai)" 

* Package name: mikutter
  Version : 0.2.2.1318
  Upstream Author : Toshiaki Asai
* URL : http://mikutter.hachune.net/
* License : GPL-3, CC-BY-SA-3.0
  Programming Lang: Ruby
  Description : Simple, powerful and moeful twitter client

 Mikutter is a simple, powerful and moeful twitter client.
 .
 Mikutter provides several advanced features:
   * Multi pane
   * Reply view
   * Thread view
   * Followee, Follower list
   * Profile view
   * Search view
   * List view
   * Activity view


-- 
Archive: 
http://lists.debian.org/20130805133501.29901.57302.report...@lilith.infoblue.home



Re: We need a global decision about R data in binary format, and stick to it.

2013-08-05 Thread Paul Tagliamonte
On Mon, Aug 05, 2013 at 02:13:15PM +0100, Ian Jackson wrote:
> We need to separate these two issues.

Aye.

IMVHO, this is the same as how we should treat images (I mean, for any
data format, not just this one case of a pickled object) - if the image
was a photo, clearly the .jpg or .png or whatever we get is the best way
to communicate this data, but if the image was generated off an .svg,
it should be distributed with it (and even rebuilt at build-time).

> One is the file format question.  It doesn't seem to me that there is
> anything wrong with a binary format as the preferred form for
> modification, in principle.  For a file which is typically edited
> using R, including by upstream when they want to edit it, then there
> is no problem.

Sure. If this data wasn't collected off some scientific
instrument or lovingly hand-made, I strongly believe that we should
rebuild such objects at build time, and use those in the binary
packages.

> The other is the assertion that this particular case involves a
> generated data table.  If this is the case then the source package
> needs to contain the source code which generates the table - and,
> really, it should regenerate the table during the build.  (The source
> might be in the form of another R binary object.)

I completely agree.

> (Of course there is a third issue: it is probably not the best
> engineering decision to use a binary save format rather than text
> source code.  But that's not something the Debian maintainer
> necessarily gets to choose and it's not a reason for an ftpmaster
> reject.)
> 
> > > The question asked by Paul is a recurrent question that comes each
> > > time the FTP trainees rotate (basically once per release cycle,
> > > because during the Freeze the FTP trainees find other exciting
> > > tasks to do, and then do not seem to have much time to process NEW
> > > anymore).
> > 
> > This must mean many people who care deeply about this topic see this as an
> > issue.
> 
> I don't think this is a helpful response to someone who is raising
> what they see as a systematic problem.

I'm sorry, Charles. Ian's right. That was a poor tone.

> 
> Paul, would it be possible to update the ftpmaster assistant reference
> materials to discuss R's binary files ?

I would be happy to document what is and isn't OK with these files. I'll
have to seek a bit of consensus from the rest of the ftp-team, but I
think treating them as if they were any other data format should be
fine.

> 
> Ian.

Thanks, Ian,
  Paul




-- 
 .''`.  Paul Tagliamonte 
: :'  : Proud Debian Developer
`. `'`  4096R / 8F04 9AD8 2C92 066C 7352  D28A 7B58 5B30 807C 2A87
 `- http://people.debian.org/~paultag




Re: We need a global decision about R data in binary format, and stick to it.

2013-08-05 Thread Bastien ROUCARIES
Le 5 août 2013 15:42, "Paul Tagliamonte"  a écrit :
>
> On Mon, Aug 05, 2013 at 02:13:15PM +0100, Ian Jackson wrote:
> > We need to separate these two issues.
>
> Aye.
>
> IMVHO, this is the same as how we should treat images (I mean, for any
> data format, not just this one case of a pickled object) - if the image
> was a photo, clearly the .jpg or .png or whatever we get is the best way
> to communicate this data, but if the image was generated off an .svg,
> it should be distributed with it (and even rebuilt at build-time).

Could we make an exception for images specially crafted to exercise
buffer overflows? (I am thinking particularly of libpng and ImageMagick.)
>
> > One is the file format question.  It doesn't seem to me that there is
> > anything wrong with a binary format as the preferred form for
> > modification, in principle.  For a file which is typically edited
> > using R, including by upstream when they want to edit it, then there
> > is no problem.
>
> Sure. If this data wasn't collected off some scientific
> instrument or lovingly hand-made, I strongly believe that we should
> rebuild such objects at build time, and use those in the binary
> packages.
>
> > The other is the assertion that this particular case involves a
> > generated data table.  If this is the case then the source package
> > needs to contain the source code which generates the table - and,
> > really, it should regenerate the table during the build.  (The source
> > might be in the form of another R binary object.)
>
> I completely agree.
>
> > (Of course there is a third issue: it is probably not the best
> > engineering decision to use a binary save format rather than text
> > source code.  But that's not something the Debian maintainer
> > necessarily gets to choose and it's not a reason for an ftpmaster
> > reject.)
> >
> > > > The question asked by Paul is a recurrent question that comes each
> > > > time the FTP trainees rotate (basically once per release cycle,
> > > > because during the Freeze the FTP trainees find other exciting
> > > > tasks to do, and then do not seem to have much time to process NEW
> > > > anymore).
> > >
> > > This must mean many people who care deeply about this topic see this
> > > > as an issue.
> >
> > I don't think this is a helpful response to someone who is raising
> > what they see as a systematic problem.
>
> I'm sorry, Charles. Ian's right. That was a poor tone.
>
> >
> > Paul, would it be possible to update the ftpmaster assistant reference
> > materials to discuss R's binary files ?
>
> I would be happy to document what is and isn't OK with these files. I'll
> have to seek a bit of consensus from the rest of the ftp-team, but I
> think treating them as if they were any other data format should be
> fine.
>
> >
> > Ian.
>
> Thanks, Ian,
>   Paul
>
>
>
>
> --
>  .''`.  Paul Tagliamonte 
> : :'  : Proud Debian Developer
> `. `'`  4096R / 8F04 9AD8 2C92 066C 7352  D28A 7B58 5B30 807C 2A87
>  `- http://people.debian.org/~paultag


Re: We need a global decision about R data in binary format, and stick to it.

2013-08-05 Thread Ian Jackson
Bastien ROUCARIES writes ("Re: We need a global decision about R data in binary
format, and stick to it."):
> Le 5 août 2013 15:42, "Paul Tagliamonte"  a écrit :
> > IMVHO, this is the same as how we should treat images (I mean, for any
> > data format, not just this one case of a pickled object) - if the image
> > was a photo, clearly the .jpg or .png or whatever we get is the best way
> > to communicate this data, but if the image was generated off an .svg,
> > it should be distributed with it (and even rebuilt at build-time).
> 
> Could we make an exception for images specially crafted to exercise
> buffer overflows? (I am thinking particularly of libpng and ImageMagick.)

I think this is something of a red herring corner case, and not really
related to the question about R binary objects.

If the last thing that happened to the image file was that upstream
edited it with a hex editor to introduce a buffer overflow, then the
resulting binary file is the preferred form for modification (after
all, that's how the last person to do so modified it...)

Ian.


--
Archive: 
http://lists.debian.org/20991.46045.904978.836...@chiark.greenend.org.uk



Re: We need a global decision about R data in binary format, and stick to it.

2013-08-05 Thread Sune Vuorela
On 2013-08-05, Paul Tagliamonte  wrote:
> IMVHO, this is the same as how we should treat images (I mean, for any
> data format, not just this one case of a pickled object) - if the image
> was a photo, clearly the .jpg or .png or whatever we get is the best way
> to communicate this data, but if the image was generated off an .svg,
> it should be distributed with it (and even rebuilt at build-time).

What about SVG files that are converted into PNGs and then manually
adjusted?

/Sune


-- 
Archive: http://lists.debian.org/slrnkvvdku.j0.nos...@sshway.ssh.pusling.com



Re: We need a global decision about R data in binary format, and stick to it.

2013-08-05 Thread Jeremy Stanley
On 2013-08-05 14:13:15 +0100 (+0100), Ian Jackson wrote:
[...]
> The other is the assertion that this particular case involves a
> generated data table. If this is the case then the source package
> needs to contain the source code which generates the table - and,
> really, it should regenerate the table during the build.
[...]

No argument on the first, but the second sets a bad precedent if
interpreted strongly. For example I have a program which relies on a
fairly large set of correlative data requiring hours of expensive
computation to generate. In the source package I include the
original data on which the resulting tables are based and provide a
means to regenerate it on the fly at package build time, but disable
it by default so that it doesn't chew up build resources
unnecessarily.

Since I need to generate the correlation data for other (non-Debian)
users of the software anyway, I ship the generated files in the
source package too and just include them in the binary package
(along with instructions and tooling for the end user to be able to
build datasets they can use to override the default ones provided).
While my example is Python rather than R, I expect it's
representative of situations for many scientific tools. Perhaps some
guidance on when this tactic is or is not appropriate would be
beneficial.
-- 
{ PGP( 48F9961143495829 ); FINGER( fu...@cthulhu.yuggoth.org );
WWW( http://fungi.yuggoth.org/ ); IRC( fu...@irc.yuggoth.org#ccl );
WHOIS( STANL3-ARIN ); MUD( kin...@katarsis.mudpy.org:6669 ); }


-- 
Archive: http://lists.debian.org/20130805151657.gd1...@yuggoth.org



Re: We need a global decision about R data in binary format, and stick to it.

2013-08-05 Thread Ian Jackson
Jeremy Stanley writes ("Re: We need a global decision about R data in binary 
format, and stick to it."):
> No argument on the first, but the second sets a bad precedent if
> interpreted strongly. For example I have a program which relies on a
> fairly large set of correlative data requiring hours of expensive
> computation to generate. In the source package I include the
> original data on which the resulting tables are based and provide a
> means to regenerate it on the fly at package build time, but disable
> it by default so that it doesn't chew up build resources
> unnecessarily.

That makes sense, and is IMO a good reason for not doing the complete
from-scratch build each time.

> Since I need to generate the correlation data for other (non-Debian)
> users of the software anyway, I ship the generated files in the
> source package too and just include them in the binary package
> (along with instructions and tooling for the end user to be able to
> build datasets they can use to override the default ones provided).
> While my example is Python rather than R, I expect it's
> representative of situations for many scientific tools. Perhaps some
> guidance on when this tactic is or is not appropriate would be
> beneficial.

There should IMO be a standard way to request a source package to do
from-scratch rebuilds for this kind of thing, for QA purposes.

Ian.


-- 
Archive: 
http://lists.debian.org/20991.51097.617273.783...@chiark.greenend.org.uk



Re: We need a global decision about R data in binary format, and stick to it.

2013-08-05 Thread Paul Wise
On Mon, Aug 5, 2013 at 4:28 PM, Sune Vuorela wrote:

> What about svg files that are converted into png's and then manually
> adjusted?

I'd say the "source" is the combination of the SVG files plus the adjusted PNGs.

I guess you are thinking of a particular case here? What is the reason
for manually adjusting them?

-- 
bye,
pabs

http://wiki.debian.org/PaulWise


-- 
Archive: 
http://lists.debian.org/caktje6e1xcubomaeuajzmkvdhjzumhnpwh04+fw8m19qd8p...@mail.gmail.com



Re: We need a global decision about R data in binary format, and stick to it.

2013-08-05 Thread Jeremy Stanley
On 2013-08-05 16:41:13 +0100 (+0100), Ian Jackson wrote:
[...]
> There should IMO be a standard way to request a source package to do
> from-scratch rebuilds for this kind of thing, for QA purposes.

I absolutely agree. If there were a standard make target or envvar
for this purpose I would gladly implement it in my debian/rules.
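
As a rough sketch of what such an opt-in switch could look like inside a build helper, here is a small Python example. The variable name `REGENERATE_DATA` and both function names are purely hypothetical; no such convention is defined by Debian policy or by this thread.

```python
import os

REGEN_VAR = "REGENERATE_DATA"  # hypothetical name, not an existing convention


def wants_regeneration(env=None):
    """True only when the (assumed) opt-in variable is set to "1"."""
    env = os.environ if env is None else env
    return env.get(REGEN_VAR, "0") == "1"


def build_tables(shipped, regenerate, env=None):
    """Use the shipped tables unless a from-scratch rebuild was requested."""
    if wants_regeneration(env):
        return regenerate()  # the expensive computation, run only on request
    return shipped


if __name__ == "__main__":
    mode = "regenerating" if wants_regeneration() else "using shipped tables"
    print(mode)
```

A QA rebuild would then set the variable once for the whole build, while ordinary builds keep using the pre-generated files.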
-- 
{ PGP( 48F9961143495829 ); FINGER( fu...@cthulhu.yuggoth.org );
WWW( http://fungi.yuggoth.org/ ); IRC( fu...@irc.yuggoth.org#ccl );
WHOIS( STANL3-ARIN ); MUD( kin...@katarsis.mudpy.org:6669 ); }


-- 
Archive: http://lists.debian.org/20130805155503.ge1...@yuggoth.org



Re: Finding correct component for Virtual Box / Debian / screen resolution issue

2013-08-05 Thread adrelanos
Paul Wise:
>> This question is about Virtual Box / Debian / screen resolution without
>> having guest additions installed.
> 
> I see, is there any reason to not do that?

Security reasons. It weakens isolation between guest and host. See also
[1]. Another reason is that the guest additions are, every now and then,
not installable.

> Anyway, looking at the Xorg.log you posted, it is using VESA. It
> rejects (various reasons) all the modes returned by the virtual
> firmware and uses some hard-coded built-in modes instead. Probably
> this is either #566153 or #563203 and I think has been present
> forever;

Maybe. Have they been forwarded upstream? Are there workarounds?

>> (It should work. Grub can do higher resolutions in grub boot menu as
>> noted in my bug report. Why Linux can not?)
> 
> I missed that point. Do you know which driver/module grub is loading
> to achieve that? I expect it is using VESA and trusting the virtual
> firmware instead.

In /etc/default/grub using GRUB_GFXMODE="1280x1024" works, but only for
the grub boot menu.

I don't know which driver/module grub is loading to achieve that. Other
than the GRUB_GFXMODE="1280x1024" change, I made no other changes. So it is
the grub default, whatever that is. Is there any way I could find out? It
is probably indeed VESA. (Because, as far as I know, the other standards
available at that early phase don't even support higher resolutions in
principle.)

[1] http://www.phoronix.com/scan.php?page=news_item&px=OTk5Mw


-- 
Archive: http://lists.debian.org/51ffcfa1.8010...@riseup.net



Re: new hashes (SHA512, SHA3) in apt metadata and .changes files?

2013-08-05 Thread Helmut Grohne
On Mon, Aug 05, 2013 at 01:33:24PM +0100, Ian Jackson wrote:
> AIUI SHA-512 is faster than SHA-256 on many processors, and not
> usually slower on the others.  If the hashes are too long, they can be
> truncated.

Not that I think it matters, but this got me interested. It appears
that in practice this depends entirely on the word size: SHA-256 is
faster on 32-bit architectures and SHA-512 is faster on 64-bit
architectures. The other aspect is that a block update of SHA-256 uses
64 rounds for a 64-byte block, whereas SHA-512 uses 80 rounds for a
128-byte block. So SHA-512 lowers the rounds/byte ratio. Now what can
we do with this knowledge? Probably not much.
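
The rounds/byte arithmetic, plus a quick way to measure the difference on a given machine, can be sketched in Python with the standard `hashlib` and `timeit` modules. The helper names are invented for this example, and the measured numbers depend heavily on the CPU and on how the underlying library was built, so treat them as indicative only.

```python
import hashlib
import timeit

# Rounds and block sizes as quoted in the message above.
PARAMS = {
    "sha256": {"rounds": 64, "block_bytes": 64},
    "sha512": {"rounds": 80, "block_bytes": 128},
}


def rounds_per_byte(name):
    """Compression-function rounds spent per input byte."""
    p = PARAMS[name]
    return p["rounds"] / p["block_bytes"]


def throughput_mb_s(name, size=1 << 20, repeat=5):
    """Hash `size` zero bytes `repeat` times and return the best MB/s seen."""
    data = bytes(size)
    timings = timeit.repeat(
        lambda: hashlib.new(name, data).digest(), number=1, repeat=repeat
    )
    return size / min(timings) / 1e6


if __name__ == "__main__":
    for name in PARAMS:
        print(name, rounds_per_byte(name), f"{throughput_mb_s(name):.0f} MB/s")
```

The ratio comes out at 1.0 for SHA-256 and 0.625 for SHA-512, but SHA-512 rounds work on 64-bit words, which is why the word size of the machine matters so much in practice.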

Helmut


-- 
Archive: http://lists.debian.org/20130805162104.ga32...@alf.mars



Fwd: /etc/hosts and resolving of the local host/domainname - 127.0.0.1 vs. 127.0.1.1

2013-08-05 Thread Thomas Hood
Sorry I'm a bit late contributing to this discussion.

Christoph Anton Mitterer wrote:
> The eventual result[1] was that Debian nowadays ships
> /etc/hosts like these per default:
>
> 127.0.0.1 localhost
> 127.0.1.1 <host_name>.<domain_name> <host_name>
>
> As also described in the Debian reference[2].

That's not entirely accurate. Wheezy and Ubuntu Desktop install
an /etc/hosts like the following, without a domain_name.

   127.0.0.1 localhost
   127.0.1.1 <host_name>

The Debian Reference is out of date.

Some years ago it was the case that if a machine had a static
external IP address then this was listed instead of '127.0.1.1'.
I presume that this is still the case but I haven't checked
(and I am on the road, so can't easily check, sorry).

> The hostname is not necessarily a domain name, at least not
> de jure.

Right. Ideally nothing would blindly treat the system hostname
as a domain name. I don't know how that practice ever got
started, but it overlooked the fact that machines can have
multiple domain names and multiple IP addresses, any of
which can be externally administered and any of which can
be changed at any time. The machine itself doesn't even know
when its domain names change.

> But in reality, many programs and people rely or are at least
> used to the hostname being resolvable.
> That practise won't change and we cannot do much about it.

That seems too pessimistic to me.  If there are broken programs
we can patch them.

> - Most applications that listen to the loopback actually
> only listen to 127.0.0.1 (and perhaps ::1) but often not
> to 127.0.0.0/8.

Last time I checked, most applications that listen on 127.0.0.1
listen on all addresses, thus including 127.0.0.0/8.  This is why
resolving the hostname to 127.0.1.1 actually causes few if any
problems in practice.
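
One way to check this on a given machine is a small Python sketch like the
following. It is only an illustration: the helper name reachable() is
made up here, and it assumes a Linux-style loopback interface where the
whole 127.0.0.0/8 range is local. A socket bound to the wildcard address
0.0.0.0 should accept connections made to 127.0.1.1, while a socket
bound only to 127.0.0.1 should not.

```python
import socket


def reachable(bind_addr: str, connect_addr: str, timeout: float = 1.0) -> bool:
    """Listen on bind_addr (ephemeral port) and try a TCP connect to connect_addr."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((bind_addr, 0))  # port 0: let the kernel pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        try:
            # The handshake completes in the listen backlog; no accept() needed.
            with socket.create_connection((connect_addr, port), timeout=timeout):
                return True
        except OSError:
            # Refused, unreachable or timed out: treat all of these as "no".
            return False


if __name__ == "__main__":
    for bind_addr in ("0.0.0.0", "127.0.0.1"):
        for target in ("127.0.0.1", "127.0.1.1"):
            result = reachable(bind_addr, target)
            print(f"bind {bind_addr:<9} connect {target:<9} -> {result}")
```

A service that shows up as unreachable in the last combination is one that
would need patching if the hostname keeps resolving to 127.0.1.1.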

> => so the overall proposal (I) is:
> If no one has any technical reasons against, can we stop using
> 127.0.1.1 and let the hostname point to 127.0.0.1 as in:
> 127.0.0.1   localhost
> 127.0.0.1   foobar[.bar.net foobar]

Strictly speaking, each IP address in /etc/hosts should be
represented by no more than one line.

Your proposal has the consequence that 'localhost' is the
canonical name for 'foobar'. Please don't do this. I don't
want to return to the days of 'localhost' appearing in log
files and command line prompts.

Simon McVittie wrote:
> libnss-myhostname is basically this, and is packaged. It tries
> to return a public address if possible, only falling back to
> 127.0.0.2 (upstream), 127.0.1.1 (as patched in Debian) or ::1
> (IPv6) if there's nothing more suitable.

This is exactly what you need if you need the system hostname
to be resolvable to an IP address. (And I am prepared to believe
that we still need that, even though I haven't tested it recently.)

With the nsswitch configuration

hosts:  files ... dns ... myhostname

myhostname resolves the system hostname if nothing else does
so first. So it can be overridden either by DNS or by /etc/hosts.
If the system hostname changes, no file has to be edited.  Nice.

Also nice is the fact that myhostname resolves the system hostname
to an external address if there is one, increasing the chances that
the result is similar to what would be obtained from DNS.
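
The lookup order can be modelled roughly as follows. This is a simplified
sketch, not how glibc or libnss-myhostname are implemented: the function
names are invented for illustration, 'files' and 'dns' are stubbed with
plain tables, and the status/action rules that nsswitch.conf also supports
are left out.

```python
from typing import Callable, Dict, List, Optional, Tuple

Source = Callable[[str], Optional[str]]


def table_source(table: Dict[str, str]) -> Source:
    """Stand-in for 'files' or 'dns': a fixed name -> address table."""
    return lambda name: table.get(name)


def myhostname_source(hostname: str, external_addrs: List[str],
                      fallback: str = "127.0.1.1") -> Source:
    """Answer only for the system hostname; prefer an external address."""
    def lookup(name: str) -> Optional[str]:
        if name != hostname:
            return None
        return external_addrs[0] if external_addrs else fallback
    return lookup


def resolve(name: str,
            sources: List[Tuple[str, Source]]) -> Optional[Tuple[str, str]]:
    """Try each source in order; the first one that answers wins."""
    for label, lookup in sources:
        addr = lookup(name)
        if addr is not None:
            return label, addr
    return None


if __name__ == "__main__":
    files = table_source({"localhost": "127.0.0.1"})
    dns = table_source({})  # pretend DNS knows nothing about this host
    fallback_only = myhostname_source("foobar", [])
    chain = [("files", files), ("dns", dns), ("myhostname", fallback_only)]
    print(resolve("localhost", chain))
    print(resolve("foobar", chain))
```

Putting an entry for the hostname into the 'files' table makes that entry
win, which is the override behaviour described above.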

Wouter Verhelst wrote:
> The right way, in my opinion, is that /etc/hosts should
> look like this:
>
> 127.0.0.1 localhost
> 127.0.0.1 hostname.domain hostname

Strictly speaking there should be no more than one line per
IP address, so that would be

127.0.0.1 localhost hostname.domain hostname

in which case 'localhost' is the canonical name for alias 'hostname'.

> or, alternatively:
>
> 127.0.0.1 hostname.domain hostname localhost

In that case 'hostname.domain' is the canonical name for alias 'localhost'.
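
To make the canonical-name point concrete, here is a small Python sketch
that reads hosts-style text and reports which name would be treated as
canonical. It assumes a simple "first matching line wins, first name on the
line is canonical" model; the function names are invented for illustration
and real resolver libraries may differ in details.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, str, List[str]]  # (address, canonical name, aliases)


def parse_hosts(text: str) -> List[Entry]:
    """Parse hosts-style text, ignoring comments and blank lines."""
    entries = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1], fields[2:]))
    return entries


def canonical_for_address(entries: List[Entry], addr: str) -> Optional[str]:
    """Reverse lookup: canonical name of the first line for addr."""
    for entry_addr, canonical, _aliases in entries:
        if entry_addr == addr:
            return canonical
    return None


def canonical_for_name(entries: List[Entry], name: str) -> Optional[str]:
    """Forward lookup: canonical name of the first line mentioning name."""
    for _addr, canonical, aliases in entries:
        if name == canonical or name in aliases:
            return canonical
    return None


if __name__ == "__main__":
    layouts = {
        "two lines": "127.0.0.1 localhost\n127.0.0.1 hostname.domain hostname\n",
        "localhost first": "127.0.0.1 localhost hostname.domain hostname\n",
        "localhost last": "127.0.0.1 hostname.domain hostname localhost\n",
    }
    for label, text in layouts.items():
        entries = parse_hosts(text)
        print(label,
              canonical_for_address(entries, "127.0.0.1"),
              canonical_for_name(entries, "hostname"),
              canonical_for_name(entries, "localhost"))
```

Under this model the "localhost first" layout reports 'localhost' as the
canonical name for 'hostname', which is the behaviour objected to above.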

Before any move is made to conflate the system hostname with
'localhost' in this way I'd like to see some proof that this no longer
causes any malfunction, or if it does cause malfunction (e.g.,
'localhost' appearing in log files) then I'd like to see the
malfunctioning packages fixed in advance of the transition from
127.0.1.1 to 127.0.0.1. And before making this potentially disruptive
change, I'd like to see evidence that the current practice actually
causes problems --- problems that can't easily be solved by patching
individual packages either to make them listen on 127.0.1.1 on the one
hand or to make them talk to localhost on the other.
--
Thomas Hood


Archive: 
http://lists.debian.org/cajn8kfcqbzh6scduqya1udjn397xo3wwvaj1mbfvzhvghkk...@mail.gmail.com



Re: We need a global decision about R data in binary format, and stick to it.

2013-08-05 Thread Tollef Fog Heen
]] Ian Jackson 

> Bastien ROUCARIES writes ("Re: We need a global decision about R data in 
> binary format, and stick to it."):
> > On 5 Aug 2013 at 15:42, "Paul Tagliamonte" wrote:
> > > IMVHO, this is the same as how we should treat images (I mean, for any
> > > data format, not just this one case of a pickled object) - if the image
> > > was a photo, clearly the .jpg or .png or whatever we get is the best way
> > > to communicate this data, but if the image was generated off an .svg,
> > > it should be distributed with it (and even rebuilt at build-time).
> > 
> > Could we make an exception for specially crafted images meant to exercise
> > buffer overflows? (I am thinking particularly of libpng and ImageMagick.)
> 
> I think this is something of a red herring corner case, and not really
> related to the question about R binary objects.

Agreed.

> If the last thing that happened to the image file was that upstream
> edited it with a hex editor to introduce a buffer overflow, then the
> resulting binary file is the preferred form for modification (after
> all, that's how the last person to do so modified it...)

Or more precisely, it's no longer an image that you tend to use for,
well, displaying something.  It's a test for a buffer overflow that also
happens to be an image.  (Saying that just because somebody last edited
a file with a hex editor then that's the preferred form for modification
leaves a pretty large hole.  If I make a change to a blob and change a
2012 to 2013 in a copyright notice, it's obvious that the blob isn't its
own source.)

-- 
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are


Archive: http://lists.debian.org/m2siyoqa93@rahvafeir.err.no



Re: We need a global decision about R data in binary format, and stick to it.

2013-08-05 Thread Don Armstrong
On Mon, 05 Aug 2013, Ian Jackson wrote:
> The other is the assertion that this particular case involves a
> generated data table. If this is the case then the source package
> needs to contain the source code which generates the table - and,
> really, it should regenerate the table during the build. (The source
> might be in the form of another R binary object.)

I know of almost no cases where someone actually generated the R binary
object directly.

In general, you have a data table represented as some kind of text file,
and then you do operations on it, which result in a R binary object
being created from a collection of text files. Subsequently, you might
load the R binary object and modify it within R, but for some
modifications, you might want to go back to the original data table.

It's unfortunately common practice for R upstreams to ship the binary
object instead of the combination of original tables and R source
necessary to generate the actual R binary save data, but this is
something that should be changed, and Debian should be working to lead
the charge to do this.

In almost all cases, dropping the R binary object(s) does not appreciably
change the functionality of the R module; it just means that it is more
difficult to use the examples because there is no example data.

-- 
Don Armstrong  http://www.donarmstrong.com

in Just-
spring  when the world is mud-
luscious the little lame baloonman 

whistles   far   and wee 
 -- e.e. cummings "[in Just-]"


Archive: http://lists.debian.org/20130805224955.gd14...@rzlab.ucr.edu



Re: [Debian-med-packaging] We need a global decision about R data in binary format, and stick to it.

2013-08-05 Thread Charles Plessy
Hi Joerg and Paul,

thank you for your prompt answers, and thanks to everybody for their contributions.

I would like to focus my questions on R binary objects that represent data that
was not entirely computer-generated (that is, for which the source code cannot
be summarised by a mathematical formula and simple starting values).  Note also
that many other programs, LibreOffice for instance, can store unformatted
textual data as a binary object.  Therefore "binary object" does not mean that
the content is impractical to retrieve.

My first question is: to what extent do we need to verify that the object can
be regenerated?

  - The starting point is a source package with an R binary object.
  - With this starting point only, it may be impossible to know whether it has
    a source or not.  Did the upstream developer type the values by hand in
    an R session, for instance when collecting data from a table in a printed
    report?  Did he collect his data in a file that is not provided in the
    source package?  Or does he need a combination of data and scripts to
    regenerate the binary object?  Unless the answer can be found on the
    Internet, one has to ask the author directly.
  - If we have to ask, how long do we need to wait for the answer, and what
    is the conclusion if there is no answer?

My second question is: to what extent do we need the source?

  - When the R binary object is a table that has been generated by hand,
    my understanding is that it does not matter which format Upstream
    prefers, since it is trivial for anybody to export the R object into
    his favorite format for modification.
  - When the data in the R binary object has been produced by processing
    another data file, how far back do we need to go?  This is an important
    question, because at the end of the chain of rebuildability, there can
    be gigabytes of data.
  - When the source of the binary object is not strictly necessary for
    making relevant modifications, can we distribute the package in Debian?

My last question is: given the answers to the previous questions, what do we do
with the R packages that are already in the archive and contain data that is
editable as is but does have an original source?  Who will do it, and what is
the timeline in case of inaction?

Also, since the case of pictures has been discussed, here is a parallel
between R objects and PNG files.

1) In the PNG file's metadata, there is a field that can indicate, for
instance, that it was made by Inkscape.  However, the presence of that field
does not tell us whether the SVG source still exists, or whether it exists on
the computer of a contributor but the upstream developers decided to discard it.

2) If a program displays an image in PNG format and does not use its SVG
source, one can regret that the source is not available, but this does not
prevent anyone from editing the PNG, or even replacing it entirely.

3) One could consider scanning the Debian archive for PNG files made with
Inkscape with no corresponding SVG file in the source package.  Would such
packages be non-Free?  If yes, how long would you wait before removing the
package?
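
For what it is worth, the check in point 3 could start from something like
the Python sketch below. It is only an illustration: it assumes that the
exporter recorded itself in a tEXt chunk with the "Software" keyword,
it ignores the compressed zTXt and iTXt variants, and the function names are
made up here. As said in point 1, a hit only shows which tool wrote the
file, not whether an SVG source still exists.

```python
import struct
import zlib
from typing import Dict, Iterator, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(data: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (type, body) for each chunk of a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # length field + type + body + CRC
        if ctype == b"IEND":
            break


def text_fields(data: bytes) -> Dict[str, str]:
    """Collect uncompressed tEXt chunks as a keyword -> text mapping."""
    fields = {}
    for ctype, body in iter_chunks(data):
        if ctype == b"tEXt" and b"\x00" in body:
            key, _sep, value = body.partition(b"\x00")
            fields[key.decode("latin-1")] = value.decode("latin-1")
    return fields


def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Build one chunk with its CRC; used only for the demo data below."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


if __name__ == "__main__":
    # Hand-built sample without image data: enough to exercise the chunk
    # parser, but not a displayable picture.
    header = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    sample = (PNG_SIGNATURE
              + make_chunk(b"IHDR", header)
              + make_chunk(b"tEXt", b"Software\x00www.inkscape.org")
              + make_chunk(b"IEND", b""))
    print(text_fields(sample).get("Software"))
```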

While writing this answer, I also read Don's email advocating for Debian to
take the lead and change the current practice in the R community, which prefers
to distribute data as R binary objects in the source packages.  This is
laudable, but I expect that it will take time, and it needs people who have
roots in both communities.

In the current situation, which I describe as "active bitrotting", we do not
apply the same rules to the packages that enter the archive and the packages
that are already in it, which causes the packages under active development to
become obsolete each time new dependencies cannot enter Debian.  Given the
rotten tomatoes that fly at my face because I can no longer update the
r-cran-ggplot2 package, I do not feel fit for the task of negotiating with the
R community to change its traditions.

In any case, I think that we need clear guidelines that help to foresee whether
an R package is acceptable in Debian or not, so that we can better decide
whether to undertake the work at all.

Currently, my take would be to move packages to non-free.  This would also
allow us to ship the PDF documentation that we currently delete.

Cheers,

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan


Archive: http://lists.debian.org/20130805232904.ga8...@falafel.plessy.net



Re: Bug#718791: ITP: mikutter -- Simple, powerful and moeful twitter client

2013-08-05 Thread Chris Bannister
On Mon, Aug 05, 2013 at 10:35:01PM +0900, HIGUCHI Daisuke (VDR dai) wrote:
> Package: wnpp
> Severity: wishlist
> Owner: "HIGUCHI Daisuke (VDR dai)" 
> 
> * Package name: mikutter
>   Version : 0.2.2.1318
>   Upstream Author : Toshiaki Asai
> * URL : http://mikutter.hachune.net/
> * License : GPL-3, CC-BY-SA-3.0
>   Programming Lang: Ruby
>   Description : Simple, powerful and moeful twitter client
                                       ^^^^^^

>  Mikutter is a simple, powerful and moeful twitter client.
                                      ^^^^^^

I can't find any definition of "moeful", so it is more of a
hindrance to understanding the description than an aid.


>* Followee, Follower list
   
   No such word.

-- 
"If you're not careful, the newspapers will have you hating the people
who are being oppressed, and loving the people who are doing the 
oppressing." --- Malcolm X


Archive: http://lists.debian.org/20130805235929.GC23885@tal



Re: [Debian-med-packaging] We need a global decision about R data in binary format, and stick to it.

2013-08-05 Thread Don Armstrong
On Tue, 06 Aug 2013, Charles Plessy wrote:
> My first question is: to what extent do we need to verify that the
> object can be regenerated.
> 
>   - The starting point is a source package with an R binary object.
>   - With this starting point only, it may be impossible to know if it
>   has a source or not. 
[...]
>   Unless the answer can be found on the Internet, one has to ask the
>   author directly.
>   - If we have to ask, how long do we need to wait for the answer, and
>   what is the conclusion in case there is no answer.

We should ask if there is any question. If we get no answer, we should
use our best judgment as to the likely case. Non-responsive upstreams
also should cause us to question whether we should be distributing the
package at all.
 
> My second question is: to what extent do we need the source.
> 
>   - When the R binary object is a table that has been generated by
>   hand, my understanding is that it does not matter whatever format
>   Upstream prefers, since it is trivial for anybody to export the R
>   object into his favorite format for modification.

The original table in any form is source, then. But if there are any
subsequent alterations to the table, we should distribute those
subsequent alterations. In many cases, you take the original raw data,
and then alter it. If the code to do that exists, we should take the
original raw data, and do the alterations. [This should really be SOP
for all modules in R, because to do otherwise means that it is very
difficult to reproduce your alterations in the event of wrong data or
new data.]

>   - When the data in the R binary object has been produced by
>   processing another data file, to what point do we need to go
>   backwards ? This is an important question, because at the end of the
>   chain of rebuildability, there can be gigabytes of data.

This is a far more difficult case, but if this data exists and can be
digitally distributed Debian should have it and distribute it. Perhaps
not in the source package, but almost certainly in a data package
somewhere. [And honestly, there are very few interesting R packages
which we can actually distribute where this is really the case. I can't
think of any we currently distribute, and the main ones I can think of
involve databases of sequences for microarrays, and there you actually
want the complete data anyway.]

>   - When the source of the binary object is not strictly necessary for
>   making relevant modifications, can we distribute the package in
>   Debian ?

If the source isn't strictly necessary, we should remove the binary
object, and distribute the package.
 
> My last question is, given the answers to the previous questions, what
> do we do with the R packages that are already in the archive and also
> contain data that is editable as is but do have an original source,
> who will do it, and what is the timeline in case of inaction.

The package maintainer should handle it; in the case of inaction from
upstream, the package maintainer can then either remove the data, split
the package, move the package to non-free, or remove the package from
Debian entirely. The timeline should be the standard one that is used
for all RC bugs.

> In the current situation, that I describe as "active bitrotting", we
> do not apply the same rules to the packages that enter the archive and
> the packages that are already in, which cause the packages under
> active development to become obsolete each time new dependencies can
> not enter in Debian.

We actually do and should apply the same rules. Sometimes violations of
the rules are missed for a while, though, and we have to come back and
file bugs with severity serious to deal with the problem.

> Currently, my take would be to move packages to non-free. This would
> also allow us to ship the PDF documentation that we currently delete.

In these cases, we should split the package out into a non-free
component and a free component.

I should note that I'm currently distributing via debian-r.debian.net a
few hundred packages which probably have this particular problem too.

-- 
Don Armstrong  http://www.donarmstrong.com

in Just-
spring  when the world is mud-
luscious the little lame baloonman 

whistles   far   and wee 
 -- e.e. cummings "[in Just-]"


Archive: http://lists.debian.org/20130806004416.gf14...@rzlab.ucr.edu



Re: Non-identical files with identical md5sums on Debian systems?

2013-08-05 Thread Chow Loong Jin
On Mon, Aug 05, 2013 at 02:15:41PM +0100, Ian Jackson wrote:
> Russ Allbery writes ("Re: Non-identical files with identical md5sums on 
> Debian systems?"):
> > Unless you have a collection of MD5 collision attacks, or have installed a
> > package that includes a sample MD5 collision, [...]
> 
> For the sake of sanity of our (still) MD5-based tools, I hope that
> no-one uploads into our archive a package with an example MD5
> collision.  (Unless the colliding files are wrapped up somehow, to
> protect our infrastructure from any untoward behaviour.)

What in our infrastructure would break on an MD5 collision anyway? The closest
thing I could think of is dedup.debian.net, but that appears to use SHA512.

-- 
Kind regards,
Loong Jin




Re: Bug#718791: ITP: mikutter -- Simple, powerful and moeful twitter client

2013-08-05 Thread Chow Loong Jin
On Tue, Aug 06, 2013 at 11:59:29AM +1200, Chris Bannister wrote:
> On Mon, Aug 05, 2013 at 10:35:01PM +0900, HIGUCHI Daisuke (VDR dai) wrote:
> > Package: wnpp
> > Severity: wishlist
> > Owner: "HIGUCHI Daisuke (VDR dai)" 
> > 
> > * Package name: mikutter
> >   Version : 0.2.2.1318
> >   Upstream Author : Toshiaki Asai
> > * URL : http://mikutter.hachune.net/
> > * License : GPL-3, CC-BY-SA-3.0
> >   Programming Lang: Ruby
> >   Description : Simple, powerful and moeful twitter client
>^^
> 
> >  Mikutter is a simple, powerful and moeful twitter client.
>   ^^
> 
> I can't find any definition of "moeful" and therefore is more of a
> hindrance to understanding the description than an aid.

Probably a compound of "moe" and "-ful". Just "moe" would probably describe
this better.

> >* Followee, Follower list
>
>No such word.

-- 
Kind regards,
Loong Jin

