Here document / redirection / background process weirdness
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64' -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-unknown-linux-gnu' -DCONF_VENDOR='unknown' -DLOCALEDIR='/home/mjackson/src/bash-4.2/_install/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H -I. -I. -I./include -I./lib -g -O2
uname output: Linux stagecoach 3.5.0-25-generic #39-Ubuntu SMP Mon Feb 25 18:26:58 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-unknown-linux-gnu

Bash Version: 4.2
Patch Level: 0
Release Status: release

Description:
        When executing a here document in bash, with the here document piped
        to another instance of bash, where the here document contains
        "echo <&- &; wait", the here document gets executed twice. I have
        seen this on Ubuntu both with the current 4.2.37 from Ubuntu and the
        latest bash tarball (details above).

Repeat-By:
        #!/home/mjackson/src/bash-4.2/_install/bin/bash
        /home/mjackson/src/bash-4.2/_install/bin/bash <
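The Repeat-By script above is cut off, so the following is only a rough sketch of the setup the Description implies (the echoed text, the here-document delimiter, and the plain "bash" path are assumptions, not the reporter's exact script):

        #!/bin/bash
        # Pipe a here document to a second bash instance; the here-document
        # body backgrounds a command with stdin closed and then waits, as
        # described in the report above.
        bash <<'EOF'
        echo hi <&- &
        wait
        EOF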
Re: documentation... readonly help is not accurate
On 3/4/13 11:12 AM, Gotmy Nick wrote:
> On 4 March 2013 14:51, Chet Ramey wrote:
>>
>> This isn't accurate.  Run the following script:
>>
>> foo()
>> {
>> echo foo
>> }
>> bar=quux
>>
>> readonly foo
>> readonly bar
>>
>> readonly
>>
>> In addition to the built-in bash readonly variables, both bar and foo will
>> be listed.
>
> Maybe I'm wrong, but this is creating a variable "foo" which is empty.

Yes, you're correct.  My fault.  I'll take another look.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
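As a small sketch of the point being conceded here (using only the standard readonly and declare options; the function body is just the one from the script above): "readonly foo" operates on the variable namespace, so it creates an empty read-only variable named foo alongside the function, which is why foo shows up in the plain "readonly" listing.

        foo() { echo foo; }
        readonly foo       # creates/marks an empty *variable* foo read-only
        declare -p foo     # shows the read-only variable foo
        readonly -p        # lists read-only variables; foo appears here
        readonly -f foo    # this form is what marks the *function* read-only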
Re: gnu parallel in the bash manual
John Kearney wrote:
> The example is bad anyway as you normally don't want to parallelize disk
> io , due to seek overhead and io bottle neck congestion. This example
> will be slower and more likely to damage your disk than simply using mv
> on its own. but thats another discussion.
---
	That depends on how many IOPS your disk subsystem can
handle and how much cpu is between each of the IO calls.
Generally, unless you have a really old, non-queuing disk,
>1 procs will be of help.  If you have a RAID, it can go
up with # of data spindles (as a max, though if all are reading
from the same area, not so much...;-))...

	Case in point, I wanted to compare rpm versions of files
on disk in a dir to see if there were duplicate version, and if so,
only keep the newest (highest numbered) version) (with the rest
going into a per-disk recycling bin (a fall-out of sharing
those disks to windows and implementing undo abilities on
the shares (samba, vfs_recycle).

	I was working directories with 1000's of files -- (1 dir,
after pruning has 10,312 entries).  Sequential reading of those files
was DOG slow.

	I parallelized it (using perl) first by sorting all the names,
then breaking it into 'N' lists -- doing those in parallel, then
merging the results (and comparing end-points -- like end of one list
might have been diff-ver from beginning of next).  I found a dynamic
'N' based on max cpu load v.disk (i.e. no matter how many procs I
threw at it, it still used about 75% cpu).

	So I chose 9:

Hot cache:
Read 12161 rpm names.
Use 1 procs w/12162 items/process
#pkgs=10161, #deletes=2000, total=12161
Recycling 2000 duplicates...Done
Cumulative  This Phase  ID
    0.000s      0.000s  Init
    0.000s      0.000s  start_program
    0.038s      0.038s  starting_children
    0.038s      0.001s  end_starting_children
    8.653s      8.615s  endRdFrmChldrn_n_start_re_sort
   10.733s      2.079s  afterFinalSort
17.94sec 3.71usr 6.21sys (55.29% cpu)
---
Read 12161 rpm names.
Use 9 procs w/1353 items/process
#pkgs=10161, #deletes=2000, total=12161
Recycling 2000 duplicates...Done
Cumulative  This Phase  ID
    0.000s      0.000s  Init
    0.000s      0.000s  start_program
    0.032s      0.032s  starting_children
    0.036s      0.004s  end_starting_children
    1.535s      1.500s  endRdFrmChldrn_n_start_re_sort
    3.722s      2.187s  afterFinalSort
10.36sec 3.31usr 4.47sys (75.09% cpu)

Cold Cache:

Read 12161 rpm names.
Use 1 procs w/12162 items/process
#pkgs=10161, #deletes=2000, total=12161
Recycling 2000 duplicates...Done
Cumulative  This Phase  ID
    0.000s      0.000s  Init
    0.000s      0.000s  start_program
    0.095s      0.095s  starting_children
    0.096s      0.001s  end_starting_children
   75.067s     74.971s  endRdFrmChldrn_n_start_re_sort
   77.140s      2.073s  afterFinalSort
84.52sec 3.62usr 6.26sys (11.70% cpu)

Read 12161 rpm names.
Use 9 procs w/1353 items/process
#pkgs=10161, #deletes=2000, total=12161
Recycling 2000 duplicates...Done
Cumulative  This Phase  ID
    0.000s      0.000s  Init
    0.000s      0.000s  start_program
    0.107s      0.107s  starting_children
    0.112s      0.005s  end_starting_children
   29.350s     29.238s  endRdFrmChldrn_n_start_re_sort
   31.497s      2.147s  afterFinalSort
38.27sec 3.35usr 4.47sys (20.47% cpu)

---
hot cache savings:  42%
cold cache savings: 55%
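For the shape of that approach in shell terms, here is a rough sketch of the sort / split into N lists / run in parallel / merge pattern described above (the per-chunk worker "compare-versions" and the file names are made up; the actual implementation was in Perl and compared rpm versions):

        #!/bin/bash
        # Sort the names, split them into N lists, process the lists in
        # parallel, then merge the per-chunk results.
        N=9
        printf '%s\n' *.rpm | sort > names.lst
        split -n l/"$N" names.lst chunk.          # GNU split: N line-based chunks
        for f in chunk.*; do
            ./compare-versions "$f" > "$f.out" &  # hypothetical per-chunk worker
        done
        wait                                      # wait for all workers to finish
        sort -m chunk.*.out > merged.out          # merge the (sorted) chunk outputs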
Re: gnu parallel in the bash manual
On 06.03.2013 01:03, Linda Walsh wrote:
>
> John Kearney wrote:
>> The example is bad anyway as you normally don't want to parallelize disk
>> io , due to seek overhead and io bottle neck congestion. This example
>> will be slower and more likely to damage your disk than simply using mv
>> on its own. but thats another discussion.
> ---
> That depends on how many IOPS your disk subsystem can
> handle and how much cpu is between each of the IO calls.
> Generally, unless you have a really old, non-queuing disk,
> >1 procs will be of help.  If you have a RAID, it can go
> up with # of data spindles (as a max, though if all are reading
> from the same area, not so much...;-))...
>
> Case in point, I wanted to compare rpm versions of files
> on disk in a dir to see if there were duplicate version, and if so,
> only keep the newest (highest numbered) version) (with the rest
> going into a per-disk recycling bin (a fall-out of sharing
> those disks to windows and implementing undo abilities on
> the shares (samba, vfs_recycle).
>
> I was working directories with 1000's of files -- (1 dir,
> after pruning has 10,312 entries).  Sequential reading of those files
> was DOG slow.
>
> I parallelized it (using perl) first by sorting all the names,
> then breaking it into 'N' lists -- doing those in parallel, then
> merging the results (and comparing end-points -- like end of one list
> might have been diff-ver from beginning of next).  I found a dynamic
> 'N' based on max cpu load v.disk (i.e. no matter how many procs I
> threw at it, it still used about 75% cpu).
>
> So I chose 9:
>
> Hot cache:
> Read 12161 rpm names.
> Use 1 procs w/12162 items/process
> #pkgs=10161, #deletes=2000, total=12161
> Recycling 2000 duplicates...Done
> Cumulative  This Phase  ID
>     0.000s      0.000s  Init
>     0.000s      0.000s  start_program
>     0.038s      0.038s  starting_children
>     0.038s      0.001s  end_starting_children
>     8.653s      8.615s  endRdFrmChldrn_n_start_re_sort
>    10.733s      2.079s  afterFinalSort
> 17.94sec 3.71usr 6.21sys (55.29% cpu)
> ---
> Read 12161 rpm names.
> Use 9 procs w/1353 items/process
> #pkgs=10161, #deletes=2000, total=12161
> Recycling 2000 duplicates...Done
> Cumulative  This Phase  ID
>     0.000s      0.000s  Init
>     0.000s      0.000s  start_program
>     0.032s      0.032s  starting_children
>     0.036s      0.004s  end_starting_children
>     1.535s      1.500s  endRdFrmChldrn_n_start_re_sort
>     3.722s      2.187s  afterFinalSort
> 10.36sec 3.31usr 4.47sys (75.09% cpu)
>
> Cold Cache:
>
> Read 12161 rpm names.
> Use 1 procs w/12162 items/process
> #pkgs=10161, #deletes=2000, total=12161
> Recycling 2000 duplicates...Done
> Cumulative  This Phase  ID
>     0.000s      0.000s  Init
>     0.000s      0.000s  start_program
>     0.095s      0.095s  starting_children
>     0.096s      0.001s  end_starting_children
>    75.067s     74.971s  endRdFrmChldrn_n_start_re_sort
>    77.140s      2.073s  afterFinalSort
> 84.52sec 3.62usr 6.26sys (11.70% cpu)
>
> Read 12161 rpm names.
> Use 9 procs w/1353 items/process
> #pkgs=10161, #deletes=2000, total=12161
> Recycling 2000 duplicates...Done
> Cumulative  This Phase  ID
>     0.000s      0.000s  Init
>     0.000s      0.000s  start_program
>     0.107s      0.107s  starting_children
>     0.112s      0.005s  end_starting_children
>    29.350s     29.238s  endRdFrmChldrn_n_start_re_sort
>    31.497s      2.147s  afterFinalSort
> 38.27sec 3.35usr 4.47sys (20.47% cpu)
>
> ---
> hot cache savings:  42%
> cold cache savings: 55%
>

That's a different use case; you can't really compare mv to data processing. And generally it is a bad idea unless you know what you are doing: trying to parallelize mv /* is a bad idea unless you are on some expensive hardware.
This is because of the sequential nature of the access model. Your use case was a sparse access model, and there is normally no performance penalty for interleaving sparse access patterns. Depending on the underlying hardware it can be very costly to interleave sequential access streams, especially on embedded devices, e.g. eMMC. Not to mention the sync-object overhead you may be incurring in the fs driver and/or hardware driver.

With 13000 files in one directory you must have been taking a directory-listing and file-open access penalty. What fs was that on? I'm tempted to say one reason parallelization helped in your example above was the fs overhead for a directory that size. Generally I don't advise having more files in a directory than can be contained in one extent.

In general, max cpu is not the best metric to go by; rather compare real time, as the two don't always correlate. Was that
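On the "compare real time, not max cpu" point, the comparison is simply between wall-clock figures; a sketch (scan.sh and its worker-count argument are placeholders):

        # "real" is elapsed wall-clock time; "user"+"sys" is CPU time consumed.
        time ./scan.sh 1    # hypothetical serial run
        time ./scan.sh 9    # hypothetical 9-worker run
        # A higher %cpu in the second run does not by itself mean it finished
        # sooner; compare the two "real" figures (hot and cold cache separately).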