Package: moreutils
Version: 0.41 

I have a huge number of files in a directory (which is on a tmpfs mount):

> ls |wc -l
  9010
> du -sh
12G
> ls *.out |wc -l
5003

This is a dual-CPU hexacore machine with 96GB RAM, so I have 12 cores in all.
Now I would like to compute the md5sum of all *.out files in parallel and
write the results to a file. My first attempt:

> rm -f /tmp/list ; parallel.moreutils md5sum -- *.out > /tmp/list ; wc -l /tmp/list
4666 /tmp/list

I repeated the command several times:
> rm -f /tmp/list ; parallel.moreutils md5sum -- *.out > /tmp/list ; wc -l /tmp/list
4656 /tmp/list
> rm -f /tmp/list ; parallel.moreutils md5sum -- *.out > /tmp/list ; wc -l /tmp/list
4660 /tmp/list
> rm -f /tmp/list ; parallel.moreutils md5sum -- *.out > /tmp/list ; wc -l /tmp/list
4687 /tmp/list
> rm -f /tmp/list ; parallel.moreutils md5sum -- *.out > /tmp/list ; wc -l /tmp/list
4683 /tmp/list


Adding -n100 or reducing the number of parallel jobs using -j3 does
not help. The number of lines differs from run to run and never reaches
the expected 5003.

GNU parallel 20120322 does the same job perfectly. It always produces 5003
lines in the output file.
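
To see which files are missing from a given list, something like the
following sketch could be used. The function name missing_from_list is
made up for illustration. It assumes file names without newlines or
backslashes, and that each md5sum line is a 32-digit hex hash followed by
a two-character separator and the file name.

```shell
# Hypothetical helper (name is an assumption, not part of moreutils):
# print every *.out file in directory $1 that has no intact line in the
# md5sum output file $2.
missing_from_list() {
    tmp=$(mktemp -d) || return 1
    # Names that should be present, sorted bytewise.
    ( cd "$1" && ls -- *.out ) | LC_ALL=C sort > "$tmp/expected"
    # Names actually present: strip "<hash> " plus the mode character.
    # Truncated or interleaved lines will not match and are kept as-is,
    # so the files they belong to are reported as missing.
    sed 's/^[0-9a-f]\{32\} [ *]//' "$2" | LC_ALL=C sort > "$tmp/seen"
    # Lines only in "expected" are the files missing from the list.
    LC_ALL=C comm -23 "$tmp/expected" "$tmp/seen"
    rm -rf "$tmp"
}
```

It would be called as "missing_from_list . /tmp/list" from inside the
directory right after one of the failing runs.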


I wonder why piping the output of parallel.moreutils through cat fixes this:
> rm -f /tmp/list ; parallel.moreutils -n100 md5sum -- *.out | cat > /tmp/list ; wc -l /tmp/list
5003 /tmp/list

Is this a lack of documentation or a bug?

-- 
regards Thomas

