On 06.03.2013 at 01:03, Linda Walsh wrote:
>
> John Kearney wrote:
>> The example is bad anyway as you normally don't want to parallelize disk
>> io, due to seek overhead and io bottleneck congestion. This example
>> will be slower and more likely to damage your disk than simply using mv
>> on its own. but thats another discussion.
> ---
> That depends on how many IOPS your disk subsystem can
> handle and how much cpu is between each of the IO calls.
> Generally, unless you have a really old, non-queuing disk,
> >1 procs will be of help.  If you have a RAID, it can go
> up with # of data spindles (as a max, though if all are reading
> from the same area, not so much...;-))...
>
> Case in point, I wanted to compare rpm versions of files
> on disk in a dir to see if there were duplicate versions, and if so,
> only keep the newest (highest numbered) version (with the rest
> going into a per-disk recycling bin -- a fall-out of sharing
> those disks to windows and implementing undo abilities on
> the shares (samba, vfs_recycle)).
>
> I was working directories with 1000's of files -- (1 dir,
> after pruning, has 10,312 entries).  Sequential reading of those files
> was DOG slow.
>
> I parallelized it (using perl) first by sorting all the names,
> then breaking it into 'N' lists -- doing those in parallel, then
> merging the results (and comparing end-points -- like end of one list
> might have been diff-ver from beginning of next).  I found a dynamic
> 'N' based on max cpu load v. disk (i.e. no matter how many procs I
> threw at it, it still used about 75% cpu).
>
> So I chose 9:
>
> Hot cache:
> Read 12161 rpm names.
> Use 1 procs w/12162 items/process
> #pkgs=10161, #deletes=2000, total=12161
> Recycling 2000 duplicates...Done
> Cumulative   This Phase   ID
>  0.000s       0.000s      Init
>  0.000s       0.000s      start_program
>  0.038s       0.038s      starting_children
>  0.038s       0.001s      end_starting_children
>  8.653s       8.615s      endRdFrmChldrn_n_start_re_sort
> 10.733s       2.079s      afterFinalSort
> 17.94sec 3.71usr 6.21sys (55.29% cpu)
> ---------------
> Read 12161 rpm names.
> Use 9 procs w/1353 items/process
> #pkgs=10161, #deletes=2000, total=12161
> Recycling 2000 duplicates...Done
> Cumulative   This Phase   ID
>  0.000s       0.000s      Init
>  0.000s       0.000s      start_program
>  0.032s       0.032s      starting_children
>  0.036s       0.004s      end_starting_children
>  1.535s       1.500s      endRdFrmChldrn_n_start_re_sort
>  3.722s       2.187s      afterFinalSort
> 10.36sec 3.31usr 4.47sys (75.09% cpu)
>
> Cold Cache:
> ============
> Read 12161 rpm names.
> Use 1 procs w/12162 items/process
> #pkgs=10161, #deletes=2000, total=12161
> Recycling 2000 duplicates...Done
> Cumulative   This Phase   ID
>  0.000s       0.000s      Init
>  0.000s       0.000s      start_program
>  0.095s       0.095s      starting_children
>  0.096s       0.001s      end_starting_children
> 75.067s      74.971s      endRdFrmChldrn_n_start_re_sort
> 77.140s       2.073s      afterFinalSort
> 84.52sec 3.62usr 6.26sys (11.70% cpu)
> ----
> Read 12161 rpm names.
> Use 9 procs w/1353 items/process
> #pkgs=10161, #deletes=2000, total=12161
> Recycling 2000 duplicates...Done
> Cumulative   This Phase   ID
>  0.000s       0.000s      Init
>  0.000s       0.000s      start_program
>  0.107s       0.107s      starting_children
>  0.112s       0.005s      end_starting_children
> 29.350s      29.238s      endRdFrmChldrn_n_start_re_sort
> 31.497s       2.147s      afterFinalSort
> 38.27sec 3.35usr 4.47sys (20.47% cpu)
>
> ---
> hot cache savings:  42%
> cold cache savings: 55%

That is a different use case; you can't really compare mv to data processing. And generally, parallelizing disk IO is a bad idea unless you know what you are doing.
Trying to parallelize mv <dir1>/* <dir2> in particular is a bad idea unless you are on some expensive hardware, because of the sequential nature of the access model.
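
To make that concrete, this is roughly the construct people mean by "parallelizing mv". It is my sketch rather than anything from this thread, and it assumes GNU xargs/coreutils plus two hypothetical directories src/ and dst/:

    # Sequential baseline: one mv doing all the work.
    time mv src/* dst/

    # "Parallel" variant: up to 8 concurrent mv processes, 100 names each.
    time printf '%s\0' src/* | xargs -0 -n 100 -P 8 mv -t dst/

Within one filesystem mv is just a rename() per file, so there is almost nothing to overlap; across filesystems every worker is copying data, and on a single spindle those copies mostly interleave and seek against each other.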
Your use case was a sparse access model, and there is normally no performance penalty for interleaving sparse accesses. Depending on the underlying hardware, though, it can be very costly to interleave sequential access streams, especially on embedded devices, e.g. eMMC. That is not to mention the sync-object overhead you may be incurring in the fs driver and/or the hardware driver.

With 13000 files in one directory you must have been paying a directory-listing and file-open penalty. What fs was that on? I'm tempted to say that one reason parallelization helped in your example above is the fs overhead for a directory of that size. Generally I don't advise having more files in a directory than can be contained in one extent.

In general, max cpu is not the best metric to go by; compare real (wall-clock) time instead, as the two don't always correlate.

Was that comment about RAID from me? Anyway, it depends on whether you have software or hardware RAID, and on the RAID type. Take, for example, a decent SAS controller with 16 fast SSD drives in a RAID 1+0 configuration: at least for reads, there are no real effective limits on parallel access. If, however, you have software RAID 5/6, it might as well be a single spindle drive from this perspective.
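
On the real-time point: the quickest sanity check is to time both variants with a cold cache and compare the 'real' line rather than the cpu percentage. A rough sketch (mine, not from the thread; it is Linux-specific, needs root to drop the page cache, and scan-rpms is just a hypothetical stand-in for whatever script is being measured):

    # Cold-cache timing comparison of 1 vs 9 workers.
    for procs in 1 9; do
        sync                                      # flush dirty pages first
        echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
        echo "== $procs worker(s), cold cache =="
        time ./scan-rpms --procs "$procs"         # compare the 'real' times
    done

Running the same loop again without dropping the cache gives the hot-cache comparison.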