Re: [Qemu-devel] QCOW2 deduplication design

2013-01-10 Thread Stefan Hajnoczi
On Thu, Jan 10, 2013 at 4:18 PM, Benoît Canet wrote:
>> Now I understand. This case covers overwriting existing data with new
>> contents. That is common :).
>>
>> But are you seeing a cluster with refcount > 1 being overwritten
>> often? If so, it's worth looking into why that happens. It may

Re: [Qemu-devel] QCOW2 deduplication design

2013-01-10 Thread Benoît Canet
> Now I understand. This case covers overwriting existing data with new
> contents. That is common :).
>
> But are you seeing a cluster with refcount > 1 being overwritten
> often? If so, it's worth looking into why that happens. It may be a
> common pattern for certain file systems or applica

Re: [Qemu-devel] QCOW2 deduplication design

2013-01-10 Thread Stefan Hajnoczi
On Wed, Jan 9, 2013 at 5:40 PM, Benoît Canet wrote:
>> > I.5) cluster removal
>> > When an L2 entry to a cluster becomes stale, the qcow2 code decrements the
>> > refcount.
>> > When the refcount reaches zero, the L2 hash block of the stale cluster
>> > is written to clear the hash.
>> > This happens ofte

Re: [Qemu-devel] QCOW2 deduplication design

2013-01-09 Thread Stefan Hajnoczi
On Wed, Jan 9, 2013 at 5:32 PM, Eric Blake wrote:
> On 01/09/2013 09:16 AM, Stefan Hajnoczi wrote:
>
>>> I.6) max refcount reached
>>> The L2 hash block of the cluster is written in order to remember at next startup
>>> that it must not be used anymore for deduplication. The hash is dropped

Re: [Qemu-devel] QCOW2 deduplication design

2013-01-09 Thread Benoît Canet
> > Two GTrees are used to give access to the hashes: one indexed by hash and
> > another indexed by physical offset.
>
> What is the GTree indexed by physical offset used for?

I think I can get rid of the second GTree for RAM-based deduplication. It needs to:
-Start qcow2 with the deduplicati

Re: [Qemu-devel] QCOW2 deduplication design

2013-01-09 Thread Stefan Hajnoczi
On Wed, Jan 9, 2013 at 4:24 PM, Benoît Canet wrote:
> Here is a mail to open a discussion on QCOW2 deduplication design and
> performance.
>
> The current deduplication strategy is RAM based.
> One of the goals of the project is to plan and implement an alternative way to do
> the lookups from di

Re: [Qemu-devel] QCOW2 deduplication design

2013-01-09 Thread Benoît Canet
> > What is the GTree indexed by physical offset used for?

It's used for two things: deletion and loading of the hashes.

-Deletion is a hook in the refcount code that triggers when zero is reached. The only information the code has is the physical offset of the cluster about to be discarded. The hash m
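The role of the offset-indexed tree in deletion can be sketched as follows. This is only an illustration, not the qcow2 code: plain Python dicts stand in for the two GTrees, SHA-256 is an assumed hash function, and the names `DedupIndex`, `cluster_hash`, and `on_refcount_zero` are hypothetical.

```python
import hashlib


def cluster_hash(data: bytes) -> bytes:
    """Hash a cluster's contents (SHA-256 is an assumption made for this sketch)."""
    return hashlib.sha256(data).digest()


class DedupIndex:
    """Two views over the same entries, standing in for the two GTrees."""

    def __init__(self):
        self.by_hash = {}    # hash -> physical offset (used on the write path)
        self.by_offset = {}  # physical offset -> hash (used on the delete path)

    def insert(self, digest: bytes, offset: int) -> None:
        self.by_hash[digest] = offset
        self.by_offset[offset] = digest

    def lookup(self, digest: bytes):
        """Return the physical offset of a cluster with this hash, or None."""
        return self.by_hash.get(digest)

    def on_refcount_zero(self, offset: int):
        """Hypothetical refcount hook: only the physical offset is known here,
        so the hash has to be recovered through the offset-indexed view."""
        digest = self.by_offset.pop(offset, None)
        if digest is not None:
            self.by_hash.pop(digest, None)
        return digest
```

Without the offset-indexed view, the hook would have to scan every hash entry to find the one pointing at the discarded cluster.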

Re: [Qemu-devel] QCOW2 deduplication design

2013-01-09 Thread Eric Blake
On 01/09/2013 09:16 AM, Stefan Hajnoczi wrote:
>> I.6) max refcount reached
>> The L2 hash block of the cluster is written in order to remember at next startup
>> that it must not be used anymore for deduplication. The hash is dropped from the
>> gtrees.
>
> Interesting case. This means
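A minimal sketch of the "max refcount reached" case, under stated assumptions: the 16-bit limit, the `state` dictionary layout, and the function names are hypothetical, and the on-disk write of the L2 hash block is reduced to adding the offset to an in-memory `no_dedup` set.

```python
MAX_REFCOUNT = 0xFFFF  # assumption: 16-bit refcount entries


def new_state():
    """In-memory stand-ins for the hash index, refcount table and on-disk flag."""
    return {"by_hash": {}, "refcount": {}, "no_dedup": set()}


def write_cluster(state, digest, new_offset, max_refcount=MAX_REFCOUNT):
    """Return the physical offset that the written cluster ends up referencing."""
    offset = state["by_hash"].get(digest)
    if offset is None:
        # No usable duplicate: store the data at new_offset and index its hash.
        state["by_hash"][digest] = new_offset
        state["refcount"][new_offset] = 1
        return new_offset

    # Duplicate found: share the existing cluster.
    state["refcount"][offset] += 1
    if state["refcount"][offset] == max_refcount:
        # The cluster is saturated: drop its hash from the index and remember
        # that it must not be used for deduplication anymore.
        del state["by_hash"][digest]
        state["no_dedup"].add(offset)
    return offset
```

In this sketch, the next write with the same contents allocates a fresh cluster and indexes it again, so deduplication continues against the new copy. Whether the actual design behaves this way is not stated in the excerpt above.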

[Qemu-devel] QCOW2 deduplication design

2013-01-09 Thread Benoît Canet
Hello,

Here is a mail to open a discussion on QCOW2 deduplication design and performance.

The current deduplication strategy is RAM based. One of the goals of the project is to plan and implement an alternative way to do the lookups from disk for bigger images.

I will in a first section enumerate