Arcady Genkin put forth on 7/12/2010 11:52 AM:
> On Mon, Jul 12, 2010 at 02:05, Stan Hoeppner <s...@hardwarefreak.com> wrote:
>
>> lvcreate -i 10 -I [stripe_size] -l 102389 vg0
>>
>> I believe you're losing 10x performance because you have a 10 "disk" mdadm
>> stripe but you didn't inform lvcreate about this fact.
>
> Hi, Stan:
>
> I believe that the -i and -I options are for using *LVM* to do the
> striping, am I wrong?
If this were the case, lvcreate would require the set of physical or
pseudo (mdadm) device IDs to stripe across, wouldn't it? There are no
options in lvcreate to specify physical or pseudo devices. The only
input to lvcreate is a volume group ID. Therefore, lvcreate is ignorant
of the physical devices underlying it, is it not?

> In our case (when LVM sits on top of one RAID0 MD stripe) the option
> -i does not seem to make sense:
>
> test4:~# lvcreate -i 10 -I 1024 -l 102380 vg0
>   Number of stripes (10) must not exceed number of physical volumes (1)

It makes sense once you accept the fact that lvcreate is ignorant of
the underlying disk device count/configuration. Once you accept that,
you will see that -i is the option that lets you educate lvcreate that
there are, in your case, 10 devices underneath it across which you want
data striped. I believe that is precisely why the -i option exists.

> My understanding is that LVM should be agnostic of what's underlying
> it as the physical storage, so it should treat the MD stripe as one
> large disk, and thus let the MD device handle the load balancing
> (which it seems to be doing fine).

If lvcreate is agnostic of the underlying structure, why does it have
stripe width and stripe size options at all? As a parallel example,
filesystems such as XFS are ignorant of the underlying disk structure
as well, yet mkfs.xfs has no fewer than four sub-options for optimizing
its performance atop RAID stripes. One of its options, sw, specifies
stripe width, which is the number of physical or logical devices in the
RAID stripe. In your case, if you use XFS, this would be "-d sw=10"
(sketched further down). The stripe options in lvcreate serve the same
function as those in mkfs.xfs: optimizing performance atop a RAID
stripe.

> Besides, the speed we are getting from the LVM volume is more than
> twice slower than an individual component of the RAID10 stripe. Even
> if we assume that LVM somehow manages to distribute its data so that
> it always hits only one physical disk (a disk triplet in our case),
> there would still be the question why it is doing it *that* slow.
> It's 57 MB/s vs 134 MB/s that an individual triplet can do:

Forget comparing performance to one of your single mdadm mirror sets.
What's key here, and why I suggested "lvcreate -i 10 .." to begin with,
is the fact that your LVM performance is almost exactly 10 times lower
than that of the underlying mdadm device, which has exactly 10 physical
stripes. Isn't that more than just a bit coincidental? The 10x drop
only occurs when talking to the LVM device. Put on your Sherlock Holmes
hat for a minute.

> We are using a chunk size of 1024 (i.e. 1MB) with the MD devices. For
> the record, we used the following commands to create the md devices:
>
> For N in 0 through 9:
> mdadm --create /dev/mdN -v --raid-devices=3 --level=raid10 \
>   --layout=n3 --metadata=0 --bitmap=internal --bitmap-chunk=2048 \
>   --chunk=1024 /dev/sdX /dev/sdY /dev/sdZ

Is that a typo, or are you turning those 3-disk mdadm sets into RAID 10
as shown above, instead of the 3-way mirror sets you stated previously?
RAID 10 requires a minimum of 4 disks; you have 3. Something isn't
right here...

> Then the big stripe:
> mdadm --create /dev/md10 -v --raid-devices=10 --level=stripe \
>   --metadata=1.0 --chunk=1024 /dev/md{0,5,1,6,2,7,3,8,4,9}

And I'm pretty sure this is the stripe lvcreate needs to know about to
fix the 10x performance drop issue.
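Before cutting a new volume, it may be worth a quick sanity check of
what md and LVM each report for the current layout. This is just a
sketch; the device and VG names are taken from the commands you posted,
so adjust as needed:

  # chunk size and member count of the big stripe, as md sees it
  mdadm --detail /dev/md10
  cat /proc/mdstat

  # how LVM has laid out the existing LV: stripe count (#Str) and
  # stripe size per segment, plus which PVs each segment sits on
  pvs
  lvs --segments -o+devices vg0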
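Roughly what I have in mind, purely as a sketch (the LV name and size
below are placeholders, and -I is in kilobytes, so 1024 matches your
1MB md chunk):

  # striped test LV across the 10 members, 1MB stripe size
  lvcreate -i 10 -I 1024 -L 400G -n stripetest vg0

  # if you put XFS on it, hand mkfs the same geometry:
  # su = md chunk size, sw = number of devices in the stripe
  mkfs.xfs -d su=1024k,sw=10 /dev/vg0/stripetest

  # crude sequential read check, bypassing the page cache
  dd if=/dev/vg0/stripetest of=/dev/null bs=1M count=4096 iflag=direct

Running iostat -x 1 in another window while the dd runs should show
whether all ten md members are being hit.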
Create a new LVM test volume with the lvcreate options I've mentioned,
and see how it performs against the current 400GB test volume that's
running slow.

--
Stan