I've been struggling with LVM snapshot issues, in one form or another,
for at least a year.

I'm not sure if the kernel engineers working on this issue have nailed
down the precise cause, but I have made some observations over the past
week that may help.

1) The issue has a definite positive correlation with disk activity
 I am snapshotting a database partition.  If I try to snapshot the partition 
while there is database activity, the snapshot almost never succeeds (I've 
had up to 10 snapshot failures in a row).  If I suspend the database, which 
eliminates *most* of the disk activity, the snapshot almost always succeeds.

2) Starting in Feisty, I get corrupted snapshots
 Snapshots were working so poorly for me that I started booting the Edgy kernel 
on my Feisty system.  This solved almost all of my problems (except for a very 
occasional snapshot creation failure).  Ubuntu released a few updates to the 
Feisty kernel, so I decided to give it a try in the interest of helping people 
debug Feisty issues.  When I use the Feisty kernel, some of my snapshots 
contain corrupted data.  This never (AFAIK) happened with the Edgy kernel.  
Here is what I am doing:
On the MySQL server I am snapshotting, I issue
flush tables with read lock
then I lvcreate a snapshot of the mysql partition
then I issue unlock tables.

The MySQL session is kept open during the snapshotting, which keeps the
lock in place, and the tables closed.

With the Edgy kernel, I get database tables that are completely closed
and not corrupt.

With the Feisty kernel (2.6.20-16-server), I sometimes get tables
that are 'still open', and sometimes I get corrupted tables.

I can't find documentation anywhere that describes what creating a
snapshot actually does.  Does it ask the ext3 filesystem to sync its
buffers before the snapshot?  Is ext3 supposed to close its journal for
the snapshot?  Does it flush outstanding write pages sitting in the
kernel before the snapshot?  I do not know, and I don't know if it's
supposed to (but I sure would be curious).

Barring a bug in MySQL, when the 'flush tables with read lock' command
returns, it's supposed to have finished closing all of its tables.
I would expect that when an lvcreate snapshot comes around, those pages
(and anything else that has hit the kernel) would be part of that
snapshot.  In Feisty, they don't always appear to be.  If I put a 'sync'
in front of my lvcreate, I haven't been able to get corrupted tables
(yet; I haven't tried all that much).  If I don't put the sync in front
of the lvcreate, I can sometimes get corrupted tables.

So if I *am* supposed to be getting those pages, perhaps there is a race
condition when disk activity is present on a partition being
snapshotted, and that is part of the core issue here with snapshot
creation failures.  Especially since I see a very high percentage of
snapshot failures when I have activity on the partition.

-- 
snapshot creation failure race "in use: not deactivating"
https://bugs.launchpad.net/bugs/105936