Re: Non-identical files with identical md5sums on Debian systems?
On Mon, Aug 05, 2013 at 06:44:49AM +0200, Fabian Greffrath wrote:
> Hi all,
>
> I do occasionally check for identical files on different systems by
> comparing their md5sums. So, just out of interest, could someone tell me
> (how to find out) how many non-identical files with identical md5sums
> there are on a typical (say, amd64) Debian system?

How about this?

#!/bin/sh
# Collect every (md5, path) pair recorded by dpkg, sorted and de-duplicated.
cat /var/lib/dpkg/info/*.md5sums | sort -u > md5sums-files.txt
# Keep only the md5 values that are recorded for more than one path.
awk '{print $1}' md5sums-files.txt | uniq -c | awk '$1 > 1 {print $2}' > dup.txt
while read -r md5; do
    grep "^$md5" md5sums-files.txt | sed -re 's/^[a-f0-9]+[[:space:]]+//' | (
        # Paths in the .md5sums files are relative to /, so prepend it.
        read -r file
        shasum1=$(sha256sum "/$file" | awk '{print $1}')
        while read -r file; do
            if [ "$(sha256sum "/$file" | awk '{print $1}')" != "$shasum1" ]; then
                echo "$md5 /$file"
            fi
        done
    )
done < dup.txt

I tried running it, didn't find anything on my Ubuntu installation.

--
Kind regards,
Loong Jin
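For readers who prefer something easier to test than a shell pipeline, here is a minimal Python sketch of the same idea: group the paths recorded in dpkg's .md5sums files by MD5, then report groups whose members differ under SHA-256. The function names (parse_md5sums, sha256_of, md5_collisions) and the injectable digest parameter are illustrative assumptions, not part of any Debian tool.

```python
import hashlib
import posixpath
from collections import defaultdict


def parse_md5sums(lines):
    """Group paths by recorded MD5; keep only MD5s shared by several paths."""
    groups = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split(None, 1)
        if len(fields) == 2:
            md5, path = fields
            groups[md5].add(path)
    return {md5: paths for md5, paths in groups.items() if len(paths) > 1}


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, or None if it is unreadable."""
    digest = hashlib.sha256()
    try:
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
    except OSError:
        return None
    return digest.hexdigest()


def md5_collisions(lines, root="/", digest=sha256_of):
    """Yield (md5, paths) for groups whose files differ under `digest`."""
    for md5, paths in sorted(parse_md5sums(lines).items()):
        # dpkg records paths relative to /, so join them onto `root`.
        seen = {digest(posixpath.join(root, rel)) for rel in paths}
        seen.discard(None)  # ignore files that are missing or unreadable
        if len(seen) > 1:
            yield md5, sorted(paths)
```

To run it against a real system, one would feed it the lines of every file matching /var/lib/dpkg/info/*.md5sums; like the script above, it only compares files that are still present on disk.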
Re: Non-identical files with identical md5sums on Debian systems?
On Sun, Aug 04, 2013 at 10:24:59PM -0700, Vincent Cheng wrote:
> On Sun, Aug 4, 2013 at 9:44 PM, Fabian Greffrath wrote:
> > I do occasionally check for identical files on different systems by
> > comparing their md5sums. So, just out of interest, could someone tell me
> > (how to find out) how many non-identical files with identical md5sums
> > there are on a typical (say, amd64) Debian system?
>
> The closest thing to what you want may be dedup.debian.net, but I
> don't think it lets you filter out non-identical files.

Indeed this task can be solved with the software backing
dedup.debian.net. The general assumption is that sha512 is
collision-free. I can give a rough idea on how to do that:

1) Obtain the software.
2) Modify schema.sql to add md5 to the functions table.
3) Modify importpkg.py to record md5 hashes.
4) Follow the steps in README to import a local Debian mirror. (This
   takes about 7 hours on a quick 8 core box and 3 days on a slower
   single core.)
5) Look for files that have the same md5 hash but a different sha512
   hash. Something like this SQL query will give you an answer
   (untested).

SELECT h1.cid, h2.cid
  FROM hash AS h1
  JOIN hash AS h2 ON h1.fid = h2.fid AND h1.hash = h2.hash
  JOIN hash AS h3 ON h1.cid = h3.cid
  JOIN hash AS h4 ON h2.cid = h4.cid AND h3.fid = h4.fid
  JOIN function AS f1 ON h1.fid = f1.id
  JOIN function AS f3 ON h3.fid = f3.id
 WHERE h3.hash != h4.hash AND f1.name = 'md5' AND f3.name = 'sha512';

It gives keys into the content table to look up the actual filenames
and packages.

In case you have any questions, just ask (mail or #-qa on oftc).

Helmut

--
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20130805084636.ga10...@alf.mars
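The query can be tried out without importing a mirror. The sketch below runs it against an in-memory SQLite database. The two-table layout (function(id, name) and hash(cid, fid, hash)) is only inferred from the column names used in the query and is an assumption, not the real dedup.debian.net schema; the rows are made-up toy data.

```python
import sqlite3

# The query from the message above, unchanged.
QUERY = """
SELECT h1.cid, h2.cid
  FROM hash AS h1
  JOIN hash AS h2 ON h1.fid = h2.fid AND h1.hash = h2.hash
  JOIN hash AS h3 ON h1.cid = h3.cid
  JOIN hash AS h4 ON h2.cid = h4.cid AND h3.fid = h4.fid
  JOIN function AS f1 ON h1.fid = f1.id
  JOIN function AS f3 ON h3.fid = f3.id
 WHERE h3.hash != h4.hash AND f1.name = 'md5' AND f3.name = 'sha512';
"""


def build_toy_db():
    """Create an in-memory database with a hypothetical, minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE function (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE hash (cid INTEGER, fid INTEGER, hash TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO function VALUES (?, ?)", [(1, "md5"), (2, "sha512")]
    )
    conn.executemany(
        "INSERT INTO hash VALUES (?, ?, ?)",
        [
            # Contents 1 and 2 share an md5 but differ in sha512.
            (1, 1, "aaa"), (1, 2, "x"),
            (2, 1, "aaa"), (2, 2, "y"),
            # Content 3 is unrelated.
            (3, 1, "bbb"), (3, 2, "z"),
        ],
    )
    return conn


def find_md5_collisions(conn):
    """Return the sorted (cid, cid) pairs reported by the query."""
    return sorted(conn.execute(QUERY).fetchall())
```

On this toy data the query reports each colliding pair twice, once in each order. Adding a condition such as h1.cid < h2.cid would probably be enough to list each pair only once.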
Re: Finding correct component for Virtual Box / Debian / screen resolution issue
On Sun, Aug 4, 2013 at 3:09 PM, Cyril Brulebois wrote:
> Doesn't look like something to be run in d-i.

As I understand it, isenkram is just a proof of concept of the idea.
It also seems to be a reimplementation of discover?

> discover already pulls virtualbox bits in. Mentioned not so long ago in:
> https://lists.debian.org/20130726091036.gb22...@mraw.org

Aha, so that is where I should file a patch for installing
thinkfan/etc on Thinkpads when installing with d-i.

It appears that discover and isenkram use a hard-coded list (in
discover-data) of mappings between devices and packages. If DEP-11
support were to be added to the archive and to discover, maintaining
the list of mappings between devices and packages would be delegated
to individual maintainers of the packages in question.
discover/isenkram/PackageKit would then use apt data to discover which
packages to install on which hardware.

BTW, virtualbox-ose-guest-x11 got renamed to virtualbox-guest-x11 in
2011, so the discover-data package needs updating for that transition.

--
bye,
pabs

http://wiki.debian.org/PaulWise
Bug#718769: ITP: clsync -- live sync tool based on inotify, written in GNU C
Package: wnpp
Severity: wishlist
Owner: Artyom A Anikeev

* Package name    : clsync
  Version         : 0.0
  Upstream Author : Dmitry Yu Okunev
* URL             : https://github.com/xaionaro/clsync
* License         : GPL-3+
  Programming Lang: C
  Description     : live sync tool based on inotify, written in GNU C

Clsync recursively watches a source directory and executes an external
program to sync the changes. Clsync is adapted for use together with
rsync. This utility is much more lightweight than its competitors and
supports such features as a separate queue for big files, a regex file
filter and multi-threading.
Re: Non-identical files with identical md5sums on Debian systems?
On Sun, Aug 04, 2013 at 10:21:09PM -0700, Russ Allbery wrote:
> Fabian Greffrath writes:
>
> > I do occasionally check for identical files on different systems by
> > comparing their md5sums. So, just out of interest, could someone tell me
> > (how to find out) how many non-identical files with identical md5sums
> > there are on a typical (say, amd64) Debian system?
>
> Unless you have a collection of MD5 collision attacks, or have installed a
> package that includes a sample MD5 collision, the chances are quite good
> that the answer is "zero." MD5 is no longer considered cryptographically
> strong, but that doesn't mean it's not a fairly random 128-bit hash. You
> need a *lot* of files before even the birthday paradox will give you much
> likelihood of an MD5 collision that wasn't intentionally constructed.

Let's assume every hard drive produced so far in human history is
combined in a single RAID0 array, and formatted using a typical
filesystem without an inode limit, then filled with small files. If my
estimate is correct, thanks to the birthday paradox there's around a
0.001% chance there will be at least one non-constructed MD5 collision.

Also, there is no known preimage attack against MD5; collision attacks
are far less dangerous, as the attacker would need to first give you a
legitimate version of the file she wants to replace.

--
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ
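The birthday-paradox estimate can be sanity-checked with a few lines of Python. This sketch uses the standard approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n uniformly random b-bit hashes. It says nothing about how many small files all the world's hard drives could hold; that figure remains the poster's assumption. The function names are illustrative only.

```python
import math


def collision_probability(n_files, hash_bits=128):
    """Approximate chance of at least one accidental collision among n files."""
    exponent = n_files * (n_files - 1) / (2 ** (hash_bits + 1))
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-exponent)


def files_needed(probability, hash_bits=128):
    """Invert the approximation: how many files give this collision chance."""
    return math.sqrt(2 ** (hash_bits + 1) * -math.log1p(-probability))
```

Under this approximation, a 0.001% chance (1e-5) of an accidental MD5 collision needs on the order of 8 × 10^16 files, while a billion files give a probability far below 1e-20.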
Bug#718775: ITP: clojurehelper -- Helper scripts for packaging Clojure programs
Package: wnpp
Severity: wishlist
Owner: "Eugenio Cano-Manuel Mendoza"

* Package name    : clojurehelper
  Version         : 0.1
  Upstream Author : Eugenio Cano-Manuel Mendoza
* URL             : http://anonscm.debian.org/gitweb/?p=pkg-clojure/clojurehelper.git
* License         : MIT
  Programming Lang: Python
  Description     : Helper scripts for packaging Clojure programs

Clojurehelper contains several scripts which help in packaging Clojure
programs:

 * lein_makepkg generates a template for a Debian Clojure package.
 * lein_builddocs creates html documentation from Markdown format.
 * lein_build creates jar files from Clojure sources.
 * lein-xml is a plugin for Leiningen that exports project.clj files
   to xml.

This package provides a dh sequence that can be used alongside
javahelper to build Clojure packages.
Re: new hashes (SHA512, SHA3) in apt metadata and .changes files?
Ondřej Surý writes ("Re: new hashes (SHA512, SHA3) in apt metadata and .changes files?"):
> SHA512 doesn't bring any advantage over SHA256.

AIUI SHA-512 is faster than SHA-256 on many processors, and not
usually slower on the others. If the hashes are too long, they can be
truncated.

Ian.
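As an illustration of the truncation idea, the following sketch keeps only the first 64 hex digits (256 bits) of a SHA-512 digest, using Python's hashlib. The function name truncated_sha512 is an assumption made for this example. Note that the standardized SHA-512/256 variant uses different initial values, so its output is not the same as a plainly truncated SHA-512 digest.

```python
import hashlib


def truncated_sha512(data, hex_digits=64):
    """Return the first `hex_digits` hex characters of SHA-512(data)."""
    return hashlib.sha512(data).hexdigest()[:hex_digits]
```

The result has the same length as a SHA-256 digest but is a different value, so tools comparing checksums would need to know which scheme produced them.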
Re: We need a global decision about R data in binary format, and stick to it.
Paul Tagliamonte writes ("Re: We need a global decision about R data in binary format, and stick to it."):
> On Mon, Aug 05, 2013 at 09:57:35AM +0900, Charles Plessy wrote:
> > it is the common practice in upstream R packages to store data in binary
> > objects. Those objects can be modified with R, and exported into various
> > formats. The Debian archive is full of them.
>
> This is not unlike a Python pickle.
>
> However, even more to the point, with *this* package, that was a
> *generated data table*. These *generated* values are clearly not the
> preferred form of modification. I asked the uploader to point to where
> they came from. I don't think this is unfair.

We need to separate these two issues.

One is the file format question. It doesn't seem to me that there is
anything wrong with a binary format as the preferred form for
modification, in principle. For a file which is typically edited
using R, including by upstream when they want to edit it, there
is no problem.

The other is the assertion that this particular case involves a
generated data table. If this is the case then the source package
needs to contain the source code which generates the table - and,
really, it should regenerate the table during the build. (The source
might be in the form of another R binary object.)

(Of course there is a third issue: it is probably not the best
engineering decision to use a binary save format rather than text
source code. But that's not something the Debian maintainer
necessarily gets to choose and it's not a reason for an ftpmaster
reject.)

> > The question asked by Paul is a recurrent question that comes each
> > time the FTP trainees rotate (basically once per release cycle,
> > because during the Freeze the FTP trainees find other exciting
> > tasks to do, and then do not seem to have much time to process NEW
> > anymore).
>
> This must mean many people who care deeply about this topic see this as an
> issue.

I don't think this is a helpful response to someone who is raising
what they see as a systematic problem.

Paul, would it be possible to update the ftpmaster assistant reference
materials to discuss R's binary files?

Ian.
Re: Non-identical files with identical md5sums on Debian systems?
Russ Allbery writes ("Re: Non-identical files with identical md5sums on Debian systems?"):
> Unless you have a collection of MD5 collision attacks, or have installed a
> package that includes a sample MD5 collision, [...]

For the sake of sanity of our (still) MD5-based tools, I hope that
no-one uploads into our archive a package with an example MD5
collision.

(Unless the colliding files are wrapped up somehow, to protect our
infrastructure from any untoward behaviour.)

Ian.
Bug#718791: ITP: mikutter -- Simple, powerful and moeful twitter client
Package: wnpp
Severity: wishlist
Owner: "HIGUCHI Daisuke (VDR dai)"

* Package name    : mikutter
  Version         : 0.2.2.1318
  Upstream Author : Toshiaki Asai
* URL             : http://mikutter.hachune.net/
* License         : GPL-3, CC-BY-SA-3.0
  Programming Lang: Ruby
  Description     : Simple, powerful and moeful twitter client

Mikutter is a simple, powerful and moeful twitter client.

Mikutter provides several advanced features:

 * Multi pane
 * Reply view
 * Thread view
 * Followee, Follower list
 * Profile view
 * Search view
 * List view
 * Activity view
Re: We need a global decision about R data in binary format, and stick to it.
On Mon, Aug 05, 2013 at 02:13:15PM +0100, Ian Jackson wrote:
> We need to separate these two issues.

Aye.

IMVHO, this is the same as how we should treat images (I mean, for any
data format, not just this one case of a pickled object) - if the image
was a photo, clearly the .jpg or .png or whatever we get is the best way
to communicate this data, but if the image was generated off an .svg,
it should be distributed with it (and even rebuilt at build-time).

> One is the file format question. It doesn't seem to me that there is
> anything wrong with a binary format as the preferred form for
> modification, in principle. For a file which is typically edited
> using R, including by upstream when they want to edit it, there
> is no problem.

Sure. If this data wasn't collected off some scientific
instrument or lovingly hand-made, I strongly believe that we should
rebuild such objects at build time, and use those in the binary
packages.

> The other is the assertion that this particular case involves a
> generated data table. If this is the case then the source package
> needs to contain the source code which generates the table - and,
> really, it should regenerate the table during the build. (The source
> might be in the form of another R binary object.)

I completely agree.

> (Of course there is a third issue: it is probably not the best
> engineering decision to use a binary save format rather than text
> source code. But that's not something the Debian maintainer
> necessarily gets to choose and it's not a reason for an ftpmaster
> reject.)
>
> > > The question asked by Paul is a recurrent question that comes each
> > > time the FTP trainees rotate (basically once per release cycle,
> > > because during the Freeze the FTP trainees find other exciting
> > > tasks to do, and then do not seem to have much time to process NEW
> > > anymore).
> >
> > This must mean many people who care deeply about this topic see this as an
> > issue.
>
> I don't think this is a helpful response to someone who is raising
> what they see as a systematic problem.

I'm sorry, Charles. Ian's right. That was a poor tone.

> Paul, would it be possible to update the ftpmaster assistant reference
> materials to discuss R's binary files?

I would be happy to document what is and isn't OK with these files. I'll
have to seek a bit of consensus from the rest of the ftp-team, but I
think treating them as if they were any other data format should be
fine.

> Ian.

Thanks, Ian,
  Paul

--
 .''`.  Paul Tagliamonte
: :'  : Proud Debian Developer
`. `'`  4096R / 8F04 9AD8 2C92 066C 7352 D28A 7B58 5B30 807C 2A87
 `-     http://people.debian.org/~paultag
Re: We need a global decision about R data in binary format, and stick to it.
On 5 August 2013 at 15:42, "Paul Tagliamonte" wrote:
> On Mon, Aug 05, 2013 at 02:13:15PM +0100, Ian Jackson wrote:
> > We need to separate these two issues.
>
> Aye.
>
> IMVHO, this is the same as how we should treat images (I mean, for any
> data format, not just this one case of a pickled object) - if the image
> was a photo, clearly the .jpg or .png or whatever we get is the best way
> to communicate this data, but if the image was generated off an .svg,
> it should be distributed with it (and even rebuilt at build-time).

Could we make an exception for images specially crafted to exercise
buffer overflows? (I am thinking particularly of libpng and
ImageMagick.)

[...]
Re: We need a global decision about R data in binary format, and stick to it.
Bastien ROUCARIES writes ("Re: We need a global decision about R data in binary format, and stick to it."):
> On 5 August 2013 at 15:42, "Paul Tagliamonte" wrote:
> > IMVHO, this is the same as how we should treat images (I mean, for any
> > data format, not just this one case of a pickled object) - if the image
> > was a photo, clearly the .jpg or .png or whatever we get is the best way
> > to communicate this data, but if the image was generated off an .svg,
> > it should be distributed with it (and even rebuilt at build-time).
>
> Could we make an exception for images specially crafted to exercise
> buffer overflows? (I am thinking particularly of libpng and ImageMagick.)

I think this is something of a red herring corner case, and not really
related to the question about R binary objects.

If the last thing that happened to the image file was that upstream
edited it with a hex editor to introduce a buffer overflow, then the
resulting binary file is the preferred form for modification (after
all, that's how the last person to do so modified it...)

Ian.
Re: We need a global decision about R data in binary format, and stick to it.
On 2013-08-05, Paul Tagliamonte wrote:
> IMVHO, this is the same as how we should treat images (I mean, for any
> data format, not just this one case of a pickled object) - if the image
> was a photo, clearly the .jpg or .png or whatever we get is the best way
> to communicate this data, but if the image was generated off an .svg,
> it should be distributed with it (and even rebuilt at build-time).

What about svg files that are converted into PNGs and then manually
adjusted?

/Sune
Re: We need a global decision about R data in binary format, and stick to it.
On 2013-08-05 14:13:15 +0100 (+0100), Ian Jackson wrote:
[...]
> The other is the assertion that this particular case involves a
> generated data table. If this is the case then the source package
> needs to contain the source code which generates the table - and,
> really, it should regenerate the table during the build.
[...]

No argument on the first, but the second sets a bad precedent if
interpreted strongly. For example I have a program which relies on a
fairly large set of correlative data requiring hours of expensive
computation to generate. In the source package I include the
original data on which the resulting tables are based and provide a
means to regenerate it on the fly at package build time, but disable
it by default so that it doesn't chew up build resources
unnecessarily.

Since I need to generate the correlation data for other (non-Debian)
users of the software anyway, I ship the generated files in the
source package too and just include them in the binary package
(along with instructions and tooling for the end user to be able to
build datasets they can use to override the default ones provided).
While my example is Python rather than R, I expect it's
representative of situations for many scientific tools. Perhaps some
guidance on when this tactic is or is not appropriate would be
beneficial.

--
{ PGP( 48F9961143495829 ); FINGER( fu...@cthulhu.yuggoth.org );
WWW( http://fungi.yuggoth.org/ ); IRC( fu...@irc.yuggoth.org#ccl );
WHOIS( STANL3-ARIN ); MUD( kin...@katarsis.mudpy.org:6669 ); }
Re: We need a global decision about R data in binary format, and stick to it.
Jeremy Stanley writes ("Re: We need a global decision about R data in binary format, and stick to it."):
> No argument on the first, but the second sets a bad precedent if
> interpreted strongly. For example I have a program which relies on a
> fairly large set of correlative data requiring hours of expensive
> computation to generate. In the source package I include the
> original data on which the resulting tables are based and provide a
> means to regenerate it on the fly at package build time, but disable
> it by default so that it doesn't chew up build resources
> unnecessarily.

That makes sense, and is IMO a good reason for not doing the complete
from-scratch build each time.

> Since I need to generate the correlation data for other (non-Debian)
> users of the software anyway, I ship the generated files in the
> source package too and just include them in the binary package
> (along with instructions and tooling for the end user to be able to
> build datasets they can use to override the default ones provided).
> While my example is Python rather than R, I expect it's
> representative of situations for many scientific tools. Perhaps some
> guidance on when this tactic is or is not appropriate would be
> beneficial.

There should IMO be a standard way to request a source package to do
from-scratch rebuilds for this kind of thing, for QA purposes.

Ian.
Re: We need a global decision about R data in binary format, and stick to it.
On Mon, Aug 5, 2013 at 4:28 PM, Sune Vuorela wrote:
> What about svg files that are converted into png's and then manually
> adjusted?

I'd say the "source" is the combination of the SVG files plus the
adjusted PNGs.

I guess you are thinking of a particular case here? What is the reason
for manually adjusting them?

--
bye,
pabs

http://wiki.debian.org/PaulWise
Re: We need a global decision about R data in binary format, and stick to it.
On 2013-08-05 16:41:13 +0100 (+0100), Ian Jackson wrote:
[...]
> There should IMO be a standard way to request a source package to do
> from-scratch rebuilds for this kind of thing, for QA purposes.

I absolutely agree. If there were a standard make target or envvar
for this purpose I would gladly implement it in my debian/rules.

--
{ PGP( 48F9961143495829 ); FINGER( fu...@cthulhu.yuggoth.org );
WWW( http://fungi.yuggoth.org/ ); IRC( fu...@irc.yuggoth.org#ccl );
WHOIS( STANL3-ARIN ); MUD( kin...@katarsis.mudpy.org:6669 ); }
Re: Finding correct component for Virtual Box / Debian / screen resolution issue
Paul Wise:
>> This question is about Virtual Box / Debian / screen resolution without
>> having guest additions installed.
>
> I see, is there any reason to not do that?

Security reasons. It weakens isolation between guest and host. See
also [1].

Another reason is that guest additions are every now and then not
installable.

> Anyway, looking at the Xorg.log you posted, it is using VESA. It
> rejects (various reasons) all the modes returned by the virtual
> firmware and uses some hard-coded built-in modes instead. Probably
> this is either #566153 or #563203 and I think has been present
> forever;

Maybe. Have they been forwarded upstream? Are there workarounds?

>> (It should work. Grub can do higher resolutions in grub boot menu as
>> noted in my bug report. Why can't Linux?)
>
> I missed that point. Do you know which driver/module grub is loading
> to achieve that? I expect it is using VESA and trusting the virtual
> firmware instead.

In /etc/default/grub, setting GRUB_GFXMODE="1280x1024" works, but only
for the grub boot menu. I don't know which driver/module grub is
loading to achieve that. Other than the GRUB_GFXMODE="1280x1024"
change, I made no other changes, so it is the grub default, whatever
that is. Is there any way I could find out?

Probably indeed VESA. (Because other standards available at that early
phase don't even support higher resolutions in principle, as far as I
know.)

[1] http://www.phoronix.com/scan.php?page=news_item&px=OTk5Mw
Re: new hashes (SHA512, SHA3) in apt metadata and .changes files?
On Mon, Aug 05, 2013 at 01:33:24PM +0100, Ian Jackson wrote:
> AIUI SHA-512 is faster than SHA-256 on many processors, and not
> usually slower on the others. If the hashes are too long, they can be
> truncated.

Not that I think it matters, but this got me interested. It appears
that in practice this depends entirely on the word size. So SHA-256 is
faster on 32bit architectures and SHA-512 is faster on 64bit
architectures.

The other aspect is that a block update of SHA-256 uses 64 rounds for
a 64 byte block, whereas SHA-512 uses 80 rounds for a 128 byte block
update. So SHA-512 lowers the rounds/byte ratio.

Now what can we do with this knowledge? Probably very little.

Helmut
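The rounds-per-byte arithmetic, plus a rough way to measure the difference on a given machine, can be sketched in Python as follows. The round counts and block sizes are the ones quoted in the message; the helper names and the benchmark parameters are assumptions made for illustration. Measured results will vary with the CPU and the underlying crypto library (some processors have dedicated SHA-256 instructions), so no expected timings are given.

```python
import hashlib
import timeit

# (rounds per block, block size in bytes), as quoted in the message above.
ROUNDS_AND_BLOCK = {"sha256": (64, 64), "sha512": (80, 128)}


def rounds_per_byte(name):
    """Compression-function rounds spent per input byte."""
    rounds, block_bytes = ROUNDS_AND_BLOCK[name]
    return rounds / block_bytes


def throughput_mb_s(name, size=1 << 24, repeat=3):
    """Best-of-`repeat` hashing throughput in MB/s for `size` zero bytes."""
    data = bytes(size)
    best = min(
        timeit.repeat(lambda: hashlib.new(name, data).digest(),
                      number=1, repeat=repeat)
    )
    return size / best / 1e6
```

Here rounds_per_byte gives 1.0 for SHA-256 and 0.625 for SHA-512; calling throughput_mb_s("sha256") and throughput_mb_s("sha512") on the machine of interest shows whether that ratio translates into real speed there.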
Fwd: /etc/hosts and resolving of the local host/domainname - 127.0.0.1 vs. 127.0.1.1
Sorry I'm a bit late contributing to this discussion. Christoph Anton Mitterer wrote: > The eventual result[1] was that Debian nowadays ships > /etc/hosts like these per default: > > 127.0.0.1 localhost > 127.0.1.1 . > > As also described in the Debian reference[2]. That's not entirely accurate. Wheezy and Ubuntu Desktop install an /etc/hosts like the following, without a domain_name. 127.0.0.1 localhost 127.0.1.1 The Debian Reference is out of date. Some years ago it was the case that if a machine had a static external IP address then this was listed instead of '127.0.1.1'. I presume that this is still the case but I haven't checked (and I am on the road, so can't easily check, sorry). > The hostname is not necessarily a domain name, at least not > de jure. Right. Ideally nothing would blindly treat the system hostname as a domain name. I don't know how that practice ever got started, but it overlooked the fact that machines can have multiple domain names and multiple IP addresses, any of which can be externally administered and any of which can be changed at any time. The machine itself doesn't even know when its domain names change. > But in reality, many programs and people rely or are at least > used to the hostname being resolvable. > That practise won't change and we cannot do much about it. That seems too pessimistic to me. If there are broken programs we can patch them. > - Most applications that listen to the loopback actually > only listen to 127.0.0.1 (and perhaps ::1) but often not > to 127.0.0.0/8. Last time I checked, most applications that listen on 127.0.0.1 listen on all addresses, thus including 127.0.0.0/8. This is why resolving the hostname to 127.0.1.1 actually causes few if any problems in practice. 
> => so the overall proposal (I) is: > If no one has any technical reasons against, can we stop using > 127.0.1.1 and let the hostname point to 127.0.0.1 as in: > 127.0.0.1 localhost > 127.0.0.1 foobar[.bar.net foobar] Strictly speaking, each IP address in /etc/hosts should be represented by no more than one line. Your proposal has the consequence that 'localhost' is the canonical name for 'foobar'. Please don't do this. I don't want to return to the days of 'localhost' appearing in log files and command line prompts. Simon McVittie wrote: > libnss-myhostname is basically this, and is packaged. It tries > to return a public address if possible, only falling back to > 127.0.0.2 (upstream), 127.0.1.1 (as patched in Debian) or ::1 > (IPv6) if there's nothing more suitable. This is exactly what you need if you need the system hostname to be resolvable to an IP address. (And I am prepared to believe that we still need that, even though I haven't tested it recently.) With the nsswitch configuration hosts: files ... dns ... myhostname myhostname resolves the system hostname if nothing else does so first. So it can be overridden either by DNS or by /etc/hosts. If the system hostname changes, no file has to be edited. Nice. Also nice is the fact that myhostname resolves the system hostname to an external address if there is one, increasing the chances that the result is similar to what would be obtained from DNS. Wouter Verhelst wrote: > The right way, in my opinion, is that /etc/hosts should > look like this: > > 127.0.0.1 localhost > 127.0.0.1 hostname.domain hostname Strictly speaking there should be no more than one line per IP address, so that would be 127.0.0.1 localhost hostname.domain hostname in which case 'localhost' is the canonical name for alias 'hostname'. > or, alternatively: > > 127.0.0.1 hostname.domain hostname localhost In that case 'hostname.domain' is the canonical name for alias 'localhost'. 
Before any move is made to conflate the system hostname with 'localhost' in this way I'd like to see some proof that this no longer causes any malfunction, or if it does cause malfunction (e.g., 'localhost' appearing in log files) then I'd like to see the malfunctioning packages fixed in advance of the transition from 127.0.1.1 to 127.0.0.1.

And before making this potentially disruptive change, I'd like to see evidence that the current practice actually causes problems --- problems that can't easily be solved by patching individual packages either to make them listen on 127.0.1.1 on the one hand or to make them talk to localhost on the other.

-- 
Thomas Hood

Archive: http://lists.debian.org/cajn8kfcqbzh6scduqya1udjn397xo3wwvaj1mbfvzhvghkk...@mail.gmail.com
Re: We need a global decision about R data in binary format, and stick to it.
]] Ian Jackson

> Bastien ROUCARIES writes ("Re: We need a global decision about R data in
> binary format, and stick to it."):
> > On 5 August 2013 15:42, "Paul Tagliamonte" wrote:
> > > IMVHO, this is the same as how we should treat images (I mean, for any
> > > data format, not just this one case of a pickled object) - if the image
> > > was a photo, clearly the .jpg or .png or whatever we get is the best way
> > > to communicate this data, but if the image was generated off an .svg,
> > > it should be distributed with it (and even rebuilt at build-time).
> >
> > Could we make an exception for specially crafted images meant to exercise
> > buffer overflows? (I am thinking particularly of libpng and ImageMagick.)
>
> I think this is something of a red herring corner case, and not really
> related to the question about R binary objects.

Agreed.

> If the last thing that happened to the image file was that upstream
> edited it with a hex editor to introduce a buffer overflow, then the
> resulting binary file is the preferred form for modification (after
> all, that's how the last person to do so modified it...)

Or more precisely, it's no longer an image that you tend to use for, well, displaying something. It's a test for a buffer overflow that also happens to be an image.

(Saying that a file is its own preferred form for modification just because somebody last edited it with a hex editor leaves a pretty large hole. If I make a change to a blob and change a 2012 to 2013 in a copyright notice, it's obvious that the blob isn't its own source.)

-- 
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are

Archive: http://lists.debian.org/m2siyoqa93@rahvafeir.err.no
Re: We need a global decision about R data in binary format, and stick to it.
On Mon, 05 Aug 2013, Ian Jackson wrote:
> The other is the assertion that this particular case involves a
> generated data table. If this is the case then the source package
> needs to contain the source code which generates the table - and,
> really, it should regenerate the table during the build. (The source
> might be in the form of another R binary object.)

I know of almost no cases where someone actually generated the R binary object directly. In general, you have a data table represented as some kind of text file, and then you do operations on it, which result in an R binary object being created from a collection of text files. Subsequently, you might load the R binary object and modify it within R, but for some modifications, you might want to go back to the original data table.

It's unfortunately common practice for R upstreams to ship the binary object instead of the combination of original tables and R source necessary to generate the actual R binary save data, but this is something that should be changed, and Debian should be working to lead the charge to do this.

In almost all cases, dropping the R binary object(s) does not appreciably change the functionality of the R module; it just means that it is more difficult to use the examples because there is no example data.

-- 
Don Armstrong http://www.donarmstrong.com

in Just- spring when the world is mud- luscious the little lame balloonman whistles far and wee
 -- e.e. cummings "[in Just-]"

Archive: http://lists.debian.org/20130805224955.gd14...@rzlab.ucr.edu
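The workflow described above (a plain-text table plus a script, with the binary object regenerated at build time) can be sketched as follows. This is a Python analogue using `csv` and `pickle` rather than R's own serialisation, and everything named here (`build_binary_object`, `SOURCE_TABLE`, the `weight` column) is hypothetical and only for illustration.

```python
import csv
import io
import pickle


def build_binary_object(table_text):
    """Regenerate the binary object from the plain-text data table."""
    rows = list(csv.DictReader(io.StringIO(table_text)))
    # Example "operation" on the raw table: convert the numeric column.
    for row in rows:
        row["weight"] = float(row["weight"])
    return pickle.dumps(rows)


# Hypothetical text table that would ship in the source package.
SOURCE_TABLE = "sample,weight\na,1.5\nb,2.25\n"

# What a build step would produce and install instead of a shipped blob.
blob = build_binary_object(SOURCE_TABLE)
print(pickle.loads(blob))
```

With this layout the text table and the script are the source, and the binary object is a build product that anyone can reproduce or change by editing the table.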
Re: [Debian-med-packaging] We need a global decision about R data in binary format, and stick to it.
Hi Joerg and Paul,

thank you for your prompt answers, and thanks to everybody for contributing.

I would like to focus my questions on R binary objects that represent data that was not entirely computer-generated (that is, for which the source code cannot be summarised by a mathematical formula and simple starting values). Note also that many other programs, LibreOffice for instance, can store unformatted textual data as a binary object. Therefore "binary object" does not mean that the content is impractical to retrieve.

My first question is: to what extent do we need to verify that the object can be regenerated?

- The starting point is a source package with an R binary object.

- With this starting point only, it may be impossible to know whether it has a source or not. Did the upstream developer type the results by hand in an R session, for instance when collecting data from a table in a printed report? Did he collect his data in a file that is not provided in the source package? Or does he need a combination of data and scripts to regenerate the binary object? Unless the answer can be found on the Internet, one has to ask the author directly.

- If we have to ask, how long do we need to wait for the answer, and what is the conclusion if there is no answer?

My second question is: to what extent do we need the source?

- When the R binary object is a table that has been generated by hand, my understanding is that it does not matter which format Upstream prefers, since it is trivial for anybody to export the R object into his favorite format for modification.

- When the data in the R binary object has been produced by processing another data file, how far back do we need to go? This is an important question, because at the end of the chain of rebuildability, there can be gigabytes of data.

- When the source of the binary object is not strictly necessary for making relevant modifications, can we distribute the package in Debian?
My last question is: given the answers to the previous questions, what do we do with the R packages that are already in the archive and contain data that is editable as is but does have an original source? Who will do it, and what is the timeline in case of inaction?

Also, since the case of pictures has been discussed, here is a parallel between R objects and PNG files.

1) In the PNG file's metadata, there is a field that can indicate whether, for instance, it was made by Inkscape. However, the presence of that field does not tell us whether the SVG source still exists, or whether it exists on the computer of a contributor but the upstream developers decided to discard it.

2) If a program displays an image in PNG format and does not use its SVG source, then while one can regret that the source is not available, this does not prevent anyone from editing the PNG, or even replacing it entirely.

3) One could consider scanning the Debian archive for PNG files made with Inkscape with no corresponding SVG file in the source package. Would such packages be non-Free? If yes, how long would you wait before removing the package?

While writing this answer, I also read Don's email advocating that Debian take the lead in changing the current practice in the R community, which prefers to distribute data as R binary objects in the source packages. This is laudable, but I expect that it will take time, and it needs people who have roots in both communities. In the current situation, which I describe as "active bitrotting", we do not apply the same rules to the packages that enter the archive and the packages that are already in it, which causes the packages under active development to become obsolete each time new dependencies cannot enter Debian. Given the rotten tomatoes that fly in my face because I can no longer update the r-cran-ggplot2 package, I do not feel fit for the task of negotiating with the R community to change its traditions.
In any case, I think that we need clear guidelines that help to foresee whether an R package is acceptable in Debian, so that we can better decide whether to undertake the work at all. Currently, my take would be to move packages to non-free. This would also allow us to ship the PDF documentation that we currently delete.

Cheers,

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan

Archive: http://lists.debian.org/20130805232904.ga8...@falafel.plessy.net
Re: Bug#718791: ITP: mikutter -- Simple, powerful and moeful twitter client
On Mon, Aug 05, 2013 at 10:35:01PM +0900, HIGUCHI Daisuke (VDR dai) wrote:
> Package: wnpp
> Severity: wishlist
> Owner: "HIGUCHI Daisuke (VDR dai)"
>
> * Package name: mikutter
> Version : 0.2.2.1318
> Upstream Author : Toshiaki Asai
> * URL : http://mikutter.hachune.net/
> * License : GPL-3, CC-BY-SA-3.0
> Programming Lang: Ruby
> Description : Simple, powerful and moeful twitter client
                                     ^^^^^^
> Mikutter is a simple, powerful and moeful twitter client.
                                     ^^^^^^

I can't find any definition of "moeful", so it is more of a hindrance to understanding the description than an aid.

> * Followee, Follower list

No such word.

-- 
"If you're not careful, the newspapers will have you hating the people who are being oppressed, and loving the people who are doing the oppressing." --- Malcolm X

Archive: http://lists.debian.org/20130805235929.GC23885@tal
Re: [Debian-med-packaging] We need a global decision about R data in binary format, and stick to it.
On Tue, 06 Aug 2013, Charles Plessy wrote:
> My first question is: to what extent do we need to verify that the
> object can be regenerated.
>
> - The starting point is a source package with a R binary object.
> - With this starting point only, it may be impossible to know if it
> has a source or not. [...]
> Unless the answer can be found on the Internet, one has to ask the
> author directly.
> - If we have to ask, how long do we need to wait for the answer, and
> what is the conclusion in case there is no answer.

We should ask whenever there is any question about it. If we get no answer, we should use our best judgment as to the likely case. Non-responsive upstreams also should cause us to question whether we should be distributing the package at all.

> My second question is: to what extent do we need the source.
>
> - When the R binary object is a table that has been generated by
> hand, my understanding is that it does not matter whatever format
> Upstream prefers, since it is trivial for anybody to export the R
> object into his favorite format for modification.

The original table in any form is source, then. But if there are any subsequent alterations to the table, we should distribute those subsequent alterations.

In many cases, you take the original raw data, and then alter it. If the code to do that exists, we should take the original raw data, and do the alterations. [This should really be SOP for all modules in R, because to do otherwise means that it is very difficult to reproduce your alterations in the event of wrong data or new data.]

> - When the data in the R binary object has been produced by
> processing another data file, to what point do we need to go
> backwards ? This is an important question, because at the end of the
> chain of rebuildability, there can be gigabytes of data.

This is a far more difficult case, but if this data exists and can be digitally distributed, Debian should have it and distribute it.
Perhaps not in the source package, but almost certainly in a data package somewhere. [And honestly, there are very few interesting R packages which we can actually distribute where this is really the case. I can't think of any we currently distribute, and the main ones I can think of involve databases of sequences for microarrays, and there you actually want the complete data anyway.]

> - When the source of the binary object is not strictly necessary for
> making relevant modifications, can we distribute the package in
> Debian ?

If the source isn't strictly necessary, we should remove the binary object, and distribute the package.

> My last question is, given the answers to the previous questions, what
> do we do with the R packages that are already in the archive and also
> contain data that is editable as is but do have an original source,
> who will do it, and what is the timeline in case of inaction.

The package maintainer should handle it; in the case of inaction from upstream, the package maintainer can then either remove the data, split the package, move the package to non-free, or remove the package from Debian entirely. The timeline should be the standard one that is used for all RC bugs.

> In the current situation, that I describe as "active bitrotting", we
> do not apply the same rules to the packages that enter the archive and
> the packages that are already in, which cause the packages under
> active development to become obsolete each time new dependencies can
> not enter in Debian.

We actually do and should apply the same rules. Sometimes violations of the rules are missed for a while, though, and we have to come back and file bugs with severity serious to deal with the problem.

> Currently, my take would be to move packages to non-free. This would
> also allow us to ship the PDF documentation that we currently delete.

In these cases, we should split the package out into a non-free component and a free component.
I should note that I'm currently distributing via debian-r.debian.net a few hundred packages which probably have this particular problem too.

-- 
Don Armstrong http://www.donarmstrong.com

in Just- spring when the world is mud- luscious the little lame balloonman whistles far and wee
 -- e.e. cummings "[in Just-]"

Archive: http://lists.debian.org/20130806004416.gf14...@rzlab.ucr.edu
Re: Non-identical files with identical md5sums on Debian systems?
On Mon, Aug 05, 2013 at 02:15:41PM +0100, Ian Jackson wrote:
> Russ Allbery writes ("Re: Non-identical files with identical md5sums on
> Debian systems?"):
> > Unless you have a collection of MD5 collision attacks, or have installed a
> > package that includes a sample MD5 collision, [...]
>
> For the sake of sanity of our (still) MD5-based tools, I hope that
> no-one uploads into our archive a package with an example MD5
> collision. (Unless the colliding files are wrapped up somehow, to
> protect our infrastructure from any untoward behaviour.)

What in our infrastructure would break on an MD5 collision anyway? The closest thing I could think of is dedup.debian.net, but that appears to use SHA512.

-- 
Kind regards,
Loong Jin
Re: Bug#718791: ITP: mikutter -- Simple, powerful and moeful twitter client
On Tue, Aug 06, 2013 at 11:59:29AM +1200, Chris Bannister wrote:
> On Mon, Aug 05, 2013 at 10:35:01PM +0900, HIGUCHI Daisuke (VDR dai) wrote:
> > Package: wnpp
> > Severity: wishlist
> > Owner: "HIGUCHI Daisuke (VDR dai)"
> >
> > * Package name: mikutter
> > Version : 0.2.2.1318
> > Upstream Author : Toshiaki Asai
> > * URL : http://mikutter.hachune.net/
> > * License : GPL-3, CC-BY-SA-3.0
> > Programming Lang: Ruby
> > Description : Simple, powerful and moeful twitter client
>                                      ^^^^^^
> > Mikutter is a simple, powerful and moeful twitter client.
>                                      ^^^^^^
>
> I can't find any definition of "moeful" and therefore is more of a
> hindrance to understanding the description than an aid.

Probably a combination of "moe" and "-ful". Just "moe" would probably better describe this.

> > * Followee, Follower list
>
> No such word.

-- 
Kind regards,
Loong Jin