Hi, Matt,
I believe I did reproduce the problem. The difficulty was really in creating
an L1 hole, which I managed with a zfs recv of an empty L1 range from one zvol
to another. The target zvol had an L1 hole in place of the L1 range filled with
L0 holes in the source zvol.
The issue that I see is as follows (the datasets have compression on, the pool
has hole_birth feature active). If the L1 hole is later partially overwritten
with non-zero data, then the result is that a new L1 block is allocated and is
partially filled in with new L0 block pointers pointing to non-zero blocks.
Unfortunately, the rest of the L1 block appears to be left initialized with
zeros (to zdb, it looks like a bunch of holes with birth epoch 0). But this is
the wrong thing to do, because now the hole at the end of the L1 range in
question is "old", whereas it should retain the birth epoch of the original L1
hole ("new"). So the next zfs send disregards this hole, which results in lost
FREE record(s) in the corresponding zfs send stream.
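To make the failure mode concrete, here is a toy model of it in Python. It is
only a sketch of the reasoning above, not the actual send code: the BlkPtr
class, send_records() and FROM_TXG are names and values I made up for
illustration, and the real traversal is far more involved. The offsets, the 128
pointers per L1 and the birth values 548 and 1749 are taken from the zdb output
in this message.

```python
from dataclasses import dataclass

@dataclass
class BlkPtr:
    offset: int    # byte offset of the range this pointer covers
    birth: int     # birth txg; 0 looks like "never written"
    is_hole: bool

def send_records(bps, from_txg):
    """Toy incremental send: skip every pointer not born after from_txg."""
    records = []
    for bp in bps:
        if bp.birth <= from_txg:
            continue  # looks unchanged since the incremental source snapshot
        records.append(("FREE" if bp.is_hole else "WRITE", bp.offset))
    return records

FROM_TXG = 500       # ASSUMPTION: hypothetical txg of the source snapshot
BLKSZ = 0x10000      # 64k volume block size
L1_START = 0x800000  # offset of the L1 hole

def l1_children(hole_birth):
    """Children of the new L1 block: 10 written blocks, then 118 holes."""
    data = [BlkPtr(L1_START + i * BLKSZ, 1749, False) for i in range(10)]
    holes = [BlkPtr(L1_START + i * BLKSZ, hole_birth, True)
             for i in range(10, 128)]
    return data + holes

# What I observe: the trailing holes have birth 0.
observed = send_records(l1_children(hole_birth=0), FROM_TXG)
# What I would expect: the trailing holes keep the L1 hole's birth (548).
expected = send_records(l1_children(hole_birth=548), FROM_TXG)

observed_frees = [r for r in observed if r[0] == "FREE"]
expected_frees = [r for r in expected if r[0] == "FREE"]
print(len(observed_frees), len(expected_frees))
```

In this model the observed layout yields no FREE records at all for the tail of
the range, while the expected layout yields one per trailing hole.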
I have the datasets snapped and local, I can reproduce this problem and can
dump any zdb data if needed. Here are some snippets of the zdb output.
Before the overwrite:
7c0000 L0 1:2584800:10000 10000L/10000P F=1 B=346/346
7d0000 L0 1:2594800:10000 10000L/10000P F=1 B=346/346
7e0000 L0 1:25a4800:10000 10000L/10000P F=1 B=346/346
7f0000 L0 1:25b4800:10000 10000L/10000P F=1 B=346/346
800000 L1 4000L B=548
1000000 L1 0:7c23e00:400 1:7c2d200:400 4000L/400P F=128 B=1268/1268
1000000 L0 0:7808c00:10000 10000L/10000P F=1 B=1268/1268
1010000 L0 0:7818c00:10000 10000L/10000P F=1 B=1268/1268
1020000 L0 0:7828c00:10000 10000L/10000P F=1 B=1268/1268
The L1 hole is at offset 800000. After the partial overwrite (10 blocks written
at the beginning of the L1 range):
7d0000 L0 1:2594800:10000 10000L/10000P F=1 B=346/346
7e0000 L0 1:25a4800:10000 10000L/10000P F=1 B=346/346
7f0000 L0 1:25b4800:10000 10000L/10000P F=1 B=346/346
800000 L1 0:ea36000:600 1:f2d6000:600 4000L/600P F=10 B=1749/1749
800000 L0 1:f246000:10000 10000L/10000P F=1 B=1749/1749
810000 L0 1:f256000:10000 10000L/10000P F=1 B=1749/1749
820000 L0 1:f266000:10000 10000L/10000P F=1 B=1749/1749
830000 L0 1:f276000:10000 10000L/10000P F=1 B=1749/1749
840000 L0 1:f286000:10000 10000L/10000P F=1 B=1749/1749
850000 L0 1:f296000:10000 10000L/10000P F=1 B=1749/1749
860000 L0 1:f2a6000:10000 10000L/10000P F=1 B=1749/1749
870000 L0 1:f2c6000:10000 10000L/10000P F=1 B=1749/1749
880000 L0 1:f2b6000:10000 10000L/10000P F=1 B=1749/1749
890000 L0 0:ea26000:10000 10000L/10000P F=1 B=1749/1749
1000000 L1 0:7c23e00:400 1:7c2d200:400 4000L/400P F=128 B=1268/1268
1000000 L0 0:7808c00:10000 10000L/10000P F=1 B=1268/1268
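For reference, the range affected can be worked out from the sizes in the
output above. This is just arithmetic in Python; the only value not shown in
the zdb output is the 128-byte block pointer size, and the variable names are
mine.

```python
BLKPTR_SIZE = 128           # bytes per block pointer
INDIRECT_SIZE = 0x4000      # "4000L": logical size of an L1 block
DATA_BLOCK_SIZE = 0x10000   # "10000L": 64k volume block size

ptrs_per_l1 = INDIRECT_SIZE // BLKPTR_SIZE   # block pointers in one L1 block
l1_span = ptrs_per_l1 * DATA_BLOCK_SIZE      # bytes covered by one L1 block

l1_start = 0x800000                          # offset of the former L1 hole
l1_end = l1_start + l1_span                  # where the next L1 range begins
written = 10                                 # L0 blocks written by the overwrite

first_unwritten = l1_start + written * DATA_BLOCK_SIZE
remaining_holes = ptrs_per_l1 - written

print(hex(l1_span), hex(first_unwritten), hex(l1_end), remaining_holes)
```

So the L1 block covers 800000 through ffffff, the overwrite touched 800000
through 89ffff, and the 118 pointers from 8a0000 onward are the ones that end
up as holes with birth 0.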
Dump of the new L1 block's contents:
# zdb tpool -R 0:ea36000:600:di
Found vdev: /dev/sdk1
DVA[0]=<1:f246000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous
unique single size=10000L/10000P birth=1749L/1749P fill=1
cksum=2041f382f58d:408f994ef048de7:daf0e1cadf74f53:47c3fdb952a1e13f
DVA[0]=<1:f256000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous
unique single size=10000L/10000P birth=1749L/1749P fill=1
cksum=20306f1938d5:403381ccd468d8a:713a193137858160:d91b1c5cecb306af
DVA[0]=<1:f266000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous
unique single size=10000L/10000P birth=1749L/1749P fill=1
cksum=1fe160444ab9:3fcbaeb1f31c86e:11198655b490d3b5:76cd3d278385af3e
DVA[0]=<1:f276000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous
unique single size=10000L/10000P birth=1749L/1749P fill=1
cksum=201cb0f386db:4035cd7deb41749:6b4d734a11ce04b5:1fbc2dc2f169dcae
DVA[0]=<1:f286000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous
unique single size=10000L/10000P birth=1749L/1749P fill=1
cksum=202d87c1a695:403d7ff6a1a6f53:66b049fa47216fb4:848b133855fab5b
DVA[0]=<1:f296000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous
unique single size=10000L/10000P birth=1749L/1749P fill=1
cksum=200db48ae914:40163392a6e1f2a:62ad5c6b01c39d36:b1fa1b14d986fa82
DVA[0]=<1:f2a6000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous
unique single size=10000L/10000P birth=1749L/1749P fill=1
cksum=1feb72709d3a:3f8dbd7a3f1e98f:ab207f926cc8b2fc:c9a1145f06e1f9ab
DVA[0]=<1:f2c6000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous
unique single size=10000L/10000P birth=1749L/1749P fill=1
cksum=1fd90de96ff0:3f98dc852f15900:a3cd2aed016bc0a9:eb9f507ffe495f15
DVA[0]=<1:f2b6000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous
unique single size=10000L/10000P birth=1749L/1749P fill=1
cksum=20361039de8d:40b6d13e4438295:75686dbb7da50937:3217ceae84d5b538
DVA[0]=<0:ea26000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous
unique single size=10000L/10000P birth=1749L/1749P fill=1
cksum=202a397fecd2:405130f54a9a83d:cda024b471659627:edf740c0ca1563eb
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
....
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
The blocks are uncompressed most likely because the data came from
/dev/urandom. The volume does have lz4 compression set (and had it before the
overwrite, inherited from the pool).
By induction, a similar issue is likely to arise with an Ln hole when it is
partially overwritten with non-hole block pointers. The remainder of the new
indirect block allocated in place of the Ln hole needs to be backfilled with
Ln-1 holes with the same birth epoch as the original Ln hole.
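A minimal sketch of the backfill I have in mind, again as a Python toy model
and not as a proposal for where this belongs in the code. BlkPtr and
expand_hole() are hypothetical names, and the fanout of 128 is the value seen
for this zvol.

```python
from dataclasses import dataclass

@dataclass
class BlkPtr:
    level: int
    birth: int
    is_hole: bool

def expand_hole(parent, written, write_txg, fanout=128):
    """Children of the indirect block that replaces a partially
    overwritten Ln hole. Slots that were not written become Ln-1 holes
    that inherit the birth of the original hole instead of birth 0."""
    assert parent.is_hole and parent.level >= 1
    children = []
    for i in range(fanout):
        if i in written:
            # Slot touched by the overwrite: a new non-hole pointer.
            children.append(BlkPtr(parent.level - 1, write_txg, False))
        else:
            # Untouched slot: backfill with a hole of the same birth epoch.
            children.append(BlkPtr(parent.level - 1, parent.birth, True))
    return children

# The case from this message: L1 hole born at 548, first 10 slots
# overwritten at txg 1749.
children = expand_hole(BlkPtr(1, 548, True), set(range(10)), 1749)
holes = [c for c in children if c.is_hole]
print(len(holes), {c.birth for c in holes})
```

For n > 1 the same rule would have to be applied again inside each written
Ln-1 slot that is itself only partially filled, which is the induction step
mentioned above.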
At this time, it is not clear to me how this is best accomplished. Any pointers
are highly appreciated.
Best regards,
Boris.
________________________________
From: Matthew Ahrens <[email protected]>
Sent: Monday, November 16, 2015 5:14 PM
To: Boris
Cc: [email protected]; [email protected]
Subject: Re: [OpenZFS Developer] zfs send not detecting new holes
On Mon, Nov 16, 2015 at 4:36 AM, Boris <[email protected]> wrote:
I should have been more specific, in my case I see the problem with zvols: the
first snapshot has a non-zero block, the next snapshot has the block
overwritten with zeros, but the stream lacks the free record. The zvol is
~1.2T, 64k block
size, sparse, has lz4 compression on.
In that case I don't think your problem is related to the bug I mentioned,
which only has to do with objects that have been reallocated. You must be
seeing a different issue. We also cannot reproduce your issue with a simple
test case.
--matt
Typos courtesy of my iPhone
On Nov 15, 2015, at 12:25 PM, Matthew Ahrens <[email protected]> wrote:
btw, here is the bug you're asking about:
https://www.illumos.org/issues/6370
--matt
On Sun, Nov 15, 2015 at 9:24 AM, Matthew Ahrens <[email protected]> wrote:
We have a fix for this that we need to upstream. We are waiting on code
reviews for another change to send/receive:
https://github.com/openzfs/openzfs/pull/23
6393 zfs receive a full send as a clone
I'll probably stop waiting soon and RTI it, then we can get our fix for this in.
--matt
On Sun, Nov 15, 2015 at 8:37 AM, Boris <[email protected]> wrote:
Hi, guys,
I've been looking at an issue where sometimes, after non-zero data blocks are
overwritten with zero blocks with compression on, the corresponding incremental
send stream does not include the FREE record for those blocks. The zdb -ddddddd
output seems to indicate that the blocks in question have never been written
(the offsets for them are not listed in the output).
This looks like the issue addressed by
commit a4069eef2e403a3b2a307b23b7500e2adc6ecae5
Author: Prakash Surya <[email protected]>
Date: Fri Mar 27 13:03:22 2015 +1100
Illumos 5695 - dmu_sync'ed holes do not retain birth time
but I certainly do have that commit. I have experimented with overwriting
blocks at different offsets, ranges of blocks spanning L1 and L2 block
pointers, but I cannot reproduce the issue.
Any suggestions for directions to look in? Perhaps for a way to shape the block
tree such that this problem could arise?
Best regards,
Boris.
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer