Arcady Genkin put forth on 7/12/2010 11:52 AM:
> On Mon, Jul 12, 2010 at 02:05, Stan Hoeppner <s...@hardwarefreak.com> wrote:
> 
>> lvcreate -i 10 -I [stripe_size] -l 102389 vg0
>>
>> I believe you're losing 10x performance because you have a 10 "disk" mdadm
>> stripe but you didn't inform lvcreate about this fact.
> 
> Hi, Stan:
> 
> I believe that the -i and -I options are for using *LVM* to do the
> striping, am I wrong?  

If this were the case, lvcreate would require the set of physical or pseudo
(mdadm) device IDs to stripe across, wouldn't it?  There are no options in
lvcreate to specify physical or pseudo devices.  The only input to lvcreate is
a volume group ID.  Therefore, lvcreate is ignorant of the physical devices
underlying it, is it not?
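
For reference, I'm assuming vg0 was built directly on the single striped md
device, roughly like this (a sketch based on your earlier mails, not your
actual commands):

  pvcreate /dev/md10
  vgcreate vg0 /dev/md10
  pvs   # should report a single PV (/dev/md10) backing vg0

which is consistent with lvcreate later reporting only one physical volume in
the group.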

> In our case (when LVM sits on top of one RAID0
> MD stripe) the option -i does not seem to make sense:
> 
> test4:~# lvcreate -i 10 -I 1024 -l 102380 vg0
>   Number of stripes (10) must not exceed number of physical volumes (1)

It makes sense once you accept that lvcreate is ignorant of the underlying
disk count and configuration.  Once you accept that, you'll see that the -i
option is what lets you educate lvcreate that there are, in your case, 10
devices underneath it across which you want data striped.  I believe the -i
option exists merely to educate lvcreate about the underlying device
structure.

> My understanding is that LVM should be agnostic of what's underlying
> it as the physical storage, so it should treat the MD stripe as one
> large disk, and thus let the MD device handle the load balancing
> (which it seems to be doing fine).

If lvcreate is agnostic of the underlying structure, why does it have stripe
width and stripe size options at all?  As a parallel example, filesystems such
as XFS are likewise ignorant of the underlying disk structure, yet mkfs.xfs
has no fewer than 4 sub-options to optimize its performance atop RAID stripes.
One of its options, sw, specifies stripe width, which is the number of
physical or logical devices in the RAID stripe.  In your case, if you use XFS,
this would be "-d sw=10".  These options in lvcreate serve the same function
as those in mkfs.xfs: optimizing performance atop a RAID stripe.
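
To make that concrete, here is roughly what I'd expect the mkfs.xfs call to
look like for your layout (the LV path is a placeholder for whatever you name
the test volume; su simply mirrors your 1024 KiB md chunk size):

  mkfs.xfs -d su=1024k,sw=10 /dev/vg0/testlv

Running xfs_info on the mounted filesystem afterward will show the sunit and
swidth values it actually recorded, if you want to double-check.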

> Besides, the speed we are getting from the LVM volume is less than half
> that of an individual component of the RAID10 stripe.  Even if we
> assume that LVM somehow manages to distribute its data so that it
> always hits only one physical disk (a disk triplet in our case), there
> would still be the question of why it is doing it *that* slowly.  It's
> 57 MB/s vs the 134 MB/s that an individual triplet can do:

Forget comparing performance to one of your single mdadm mirror sets.  What's
key here, and why I suggested "lvcreate -i 10 ..." to begin with, is that your
LVM performance is almost exactly 10 times lower than that of the underlying
mdadm device, which has exactly 10 physical stripes.  Isn't that more than
just a bit coincidental?  The 10x drop only occurs when talking to the LVM
device.  Put on your Sherlock Holmes hat for a minute.
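
If you want to pin down where the drop happens, a quick sequential read
comparison of the raw stripe against the LV should show it.  The LV path
below is a guess at whatever your slow test volume is called, and the read
size is arbitrary:

  # read straight from the 10-wide md stripe
  dd if=/dev/md10 of=/dev/null bs=1M count=8192 iflag=direct
  # then from the logical volume sitting on top of it
  dd if=/dev/vg0/testlv of=/dev/null bs=1M count=8192 iflag=direct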

> We are using chunk size of 1024 (i.e. 1MB) with the MD devices.  For
> the record, we used the following commands to create the md devices:
> 
> For N in 0 through 9:
> mdadm --create /dev/mdN -v --raid-devices=3 --level=raid10 \
>   --layout=n3 --metadata=0 --bitmap=internal --bitmap-chunk=2048 \
>   --chunk=1024 /dev/sdX /dev/sdY /dev/sdZ

Is that a typo, or are you turning those 3-disk mdadm sets into RAID10 as
shown above, instead of the 3-way mirror sets you stated previously?  RAID 10
requires a minimum of 4 disks, but you have 3.  Something isn't right here...
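
For what it's worth, the kernel will tell you what it actually built for
those triplets:

  cat /proc/mdstat
  mdadm --detail /dev/md0

That should clear up whether they really are plain 3-way mirrors or something
else.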

> Then the big stripe:
> mdadm --create /dev/md10 -v --raid-devices=10 --level=stripe \
>   --metadata=1.0 --chunk=1024 /dev/md{0,5,1,6,2,7,3,8,4,9}

And I'm pretty sure this is the stripe that lvcreate needs to know about to
fix the 10x performance drop.  Create a new LVM test volume with the lvcreate
options I've mentioned, and see how it performs against the current 400 GB
test volume that's running slow.
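
For concreteness, this is roughly the command I have in mind (the LV name and
size are placeholders; -i matches the 10 md devices in your stripe and -I
matches your 1024 KiB chunk size):

  lvcreate -i 10 -I 1024 -L 400G -n striped_test vg0
  lvdisplay -m vg0/striped_test   # shows stripe count and stripe size per segment

If lvcreate still refuses because it only sees one physical volume, that at
least tells us it treats the whole md stripe as a single PV.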

-- 
Stan

