Re: Distributed storage. Move away from char device ioctls.

2007-10-26 Thread Evgeniy Polyakov
Returning to this, since block-based storage, which can act as a shared storage/transport layer, is ready with the 5th release of the DST. A couple of notes on the proposed data distribution algorithm in FS. On Sun, Sep 16, 2007 at 03:07:11AM -0400, Kyle Moffett ([EMAIL PROTECTED]) wrote: > >I ac

Re: Distributed storage. Security attributes and ducumentation update.

2007-09-22 Thread Evgeniy Polyakov
Hi Pavel. On Mon, Sep 17, 2007 at 06:22:30PM +, Pavel Machek ([EMAIL PROTECTED]) wrote: > > I'm pleased to announce third release of the distributed storage > > subsystem, which allows to form a storage on top of remote and local > > nodes, which in turn can be exported to another storage as

Re: Distributed storage. Security attributes and ducumentation update.

2007-09-21 Thread Pavel Machek
Hi! > I'm pleased to announce third release of the distributed storage > subsystem, which allows to form a storage on top of remote and local > nodes, which in turn can be exported to another storage as a node to > form tree-like storages. How is this different from raid0/1 over nbd? Or raid0/1 o

Re: Distributed storage. Move away from char device ioctls.

2007-09-16 Thread Evgeniy Polyakov
On Sat, Sep 15, 2007 at 11:24:46AM -0600, Andreas Dilger ([EMAIL PROTECTED]) wrote: > > When Chris Mason announced btrfs, I found that quite a few new ideas > > are already implemented there, so I postponed project (although > > direction of the development of the btrfs seems to move to the zfs

Re: Distributed storage. Move away from char device ioctls.

2007-09-16 Thread Kyle Moffett
On Sep 15, 2007, at 13:24:46, Andreas Dilger wrote: On Sep 15, 2007 16:29 +0400, Evgeniy Polyakov wrote: Yes, block device itself is not able to scale well, but it is the place for redundancy, since filesystem will just fail if underlying device does not work correctly and FS actually does n

Re: Distributed storage. Move away from char device ioctls.

2007-09-15 Thread Andreas Dilger
On Sep 15, 2007 12:20 -0400, Robin Humble wrote: > On Sat, Sep 15, 2007 at 10:35:16AM -0400, Jeff Garzik wrote: > >Lustre is tilted far too much towards high-priced storage, > > many (most?) Lustre deployments are with SATA and md raid5 and GigE - > can't get much cheaper than that. I have to ag

Re: Distributed storage. Move away from char device ioctls.

2007-09-15 Thread Andreas Dilger
On Sep 15, 2007 16:29 +0400, Evgeniy Polyakov wrote: > Yes, block device itself is not able to scale well, but it is the place > for redundancy, since filesystem will just fail if underlying device > does not work correctly and FS actually does not know about where it > should place redundancy bit

Re: Distributed storage. Move away from char device ioctls.

2007-09-15 Thread Robin Humble
On Sat, Sep 15, 2007 at 10:35:16AM -0400, Jeff Garzik wrote: >Robin Humble wrote: >>On Fri, Sep 14, 2007 at 03:07:46PM -0400, Jeff Garzik wrote: >>>I've been waiting for years for a smart person to come along and write a >>>POSIX-only distributed filesystem. >>it's called Lustre. >>works well, sca

Re: Distributed storage. Move away from char device ioctls.

2007-09-15 Thread Jeff Garzik
Robin Humble wrote: On Fri, Sep 14, 2007 at 03:07:46PM -0400, Jeff Garzik wrote: It is my hope that you will put your skills towards a distributed filesystem :) Of the current solutions, GFS (currently in kernel) scales poorly, and NFS v4.1 is amazingly bloated and overly complex. I've been

Re: Distributed storage. Move away from char device ioctls.

2007-09-15 Thread Robin Humble
On Fri, Sep 14, 2007 at 03:07:46PM -0400, Jeff Garzik wrote: >It is my hope that you will put your skills towards a distributed >filesystem :) Of the current solutions, GFS (currently in kernel) >scales poorly, and NFS v4.1 is amazingly bloated and overly complex. > >I've been waiting for years

Re: Distributed storage. Move away from char device ioctls.

2007-09-15 Thread Evgeniy Polyakov
Hi Mike. On Fri, Sep 14, 2007 at 10:54:56PM -0400, Mike Snitzer ([EMAIL PROTECTED]) wrote: > This distributed storage is very much needed; even if it were to act > as a more capable/performant replacement for NBD (or MD+NBD) in the > near term. Many high availability applications don't _need_ al

Re: Distributed storage. Move away from char device ioctls.

2007-09-15 Thread Evgeniy Polyakov
Hi Jeff. On Fri, Sep 14, 2007 at 03:07:46PM -0400, Jeff Garzik ([EMAIL PROTECTED]) wrote: > >Further TODO list includes: > >* implement optional saving of mirroring/linear information on the remote > > nodes (simple) > >* new redundancy algorithm (complex) > >* some thoughts about distributed

Re: Distributed storage. Move away from char device ioctls.

2007-09-14 Thread J. Bruce Fields
On Sat, Sep 15, 2007 at 12:08:42AM -0400, Jeff Garzik wrote: > J. Bruce Fields wrote: >> No, servers are required to support ordinary nfs operations to the >> metadata server. >> At least, that's the way it was last I heard, which was a while ago. I >> agree that it'd stink (for any number of reas

Re: Distributed storage. Move away from char device ioctls.

2007-09-14 Thread Jeff Garzik
J. Bruce Fields wrote: On Fri, Sep 14, 2007 at 06:32:11PM -0400, Jeff Garzik wrote: J. Bruce Fields wrote: On Fri, Sep 14, 2007 at 05:14:53PM -0400, Jeff Garzik wrote: NFSv4.1 adds to the fun, by throwing interoperability completely out the window. What parts are you worried about in partic

Re: Distributed storage. Move away from char device ioctls.

2007-09-14 Thread Mike Snitzer
On 9/14/07, Jeff Garzik <[EMAIL PROTECTED]> wrote: > Evgeniy Polyakov wrote: > > Hi. > > > > I'm pleased to announce fourth release of the distributed storage > > subsystem, which allows to form a storage on top of remote and local > > nodes, which in turn can be exported to another storage as a no

Re: Distributed storage. Move away from char device ioctls.

2007-09-14 Thread J. Bruce Fields
On Fri, Sep 14, 2007 at 06:32:11PM -0400, Jeff Garzik wrote: > J. Bruce Fields wrote: >> On Fri, Sep 14, 2007 at 05:14:53PM -0400, Jeff Garzik wrote: >>> J. Bruce Fields wrote: On Fri, Sep 14, 2007 at 03:07:46PM -0400, Jeff Garzik wrote: > I've been waiting for years for a smart person to

Re: Distributed storage. Move away from char device ioctls.

2007-09-14 Thread Jeff Garzik
J. Bruce Fields wrote: On Fri, Sep 14, 2007 at 05:14:53PM -0400, Jeff Garzik wrote: J. Bruce Fields wrote: On Fri, Sep 14, 2007 at 03:07:46PM -0400, Jeff Garzik wrote: I've been waiting for years for a smart person to come along and write a POSIX-only distributed filesystem. What exactly do y

Re: Distributed storage. Move away from char device ioctls.

2007-09-14 Thread J. Bruce Fields
On Fri, Sep 14, 2007 at 05:14:53PM -0400, Jeff Garzik wrote: > J. Bruce Fields wrote: >> On Fri, Sep 14, 2007 at 03:07:46PM -0400, Jeff Garzik wrote: >>> I've been waiting for years for a smart person to come along and write a >>> POSIX-only distributed filesystem. > >> What exactly do you mean by

Re: Distributed storage. Move away from char device ioctls.

2007-09-14 Thread Jeff Garzik
J. Bruce Fields wrote: On Fri, Sep 14, 2007 at 03:07:46PM -0400, Jeff Garzik wrote: I've been waiting for years for a smart person to come along and write a POSIX-only distributed filesystem. What exactly do you mean by "POSIX-only"? Don't bother supporting attributes, file modes, and other

Re: Distributed storage. Move away from char device ioctls.

2007-09-14 Thread J. Bruce Fields
On Fri, Sep 14, 2007 at 03:07:46PM -0400, Jeff Garzik wrote: > My thoughts. But first a disclaimer: Perhaps you will recall me as one > of the people who really reads all your patches, and examines your code and > proposals closely. So, with that in mind... > > I question the value of distrib

Re: Distributed storage. Move away from char device ioctls.

2007-09-14 Thread Al Boldi
Jeff Garzik wrote: > Evgeniy Polyakov wrote: > > Hi. > > > > I'm pleased to announce fourth release of the distributed storage > > subsystem, which allows to form a storage on top of remote and local > > nodes, which in turn can be exported to another storage as a node to > > form tree-like storage

Re: Distributed storage. Move away from char device ioctls.

2007-09-14 Thread Jeff Garzik
Evgeniy Polyakov wrote: Hi. I'm pleased to announce fourth release of the distributed storage subsystem, which allows to form a storage on top of remote and local nodes, which in turn can be exported to another storage as a node to form tree-like storages. This release includes new configuratio

Re: Distributed storage. Security attributes and ducumentation update.

2007-09-13 Thread Paul E. McKenney
On Thu, Sep 13, 2007 at 04:22:59PM +0400, Evgeniy Polyakov wrote: > Hi Paul. > > On Mon, Sep 10, 2007 at 03:14:45PM -0700, Paul E. McKenney ([EMAIL > PROTECTED]) wrote: > > > Further TODO list includes: > > > * implement optional saving of mirroring/linear information on the remote > > > nodes

Re: Distributed storage. Security attributes and ducumentation update.

2007-09-13 Thread Evgeniy Polyakov
Hi Paul. On Mon, Sep 10, 2007 at 03:14:45PM -0700, Paul E. McKenney ([EMAIL PROTECTED]) wrote: > > Further TODO list includes: > > * implement optional saving of mirroring/linear information on the remote > > nodes (simple) > > * implement netlink based setup (simple) > > * new redundancy alg

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-09-01 Thread Daniel Phillips
On Friday 31 August 2007 14:41, Alasdair G Kergon wrote: > On Thu, Aug 30, 2007 at 04:20:35PM -0700, Daniel Phillips wrote: > > Resubmitting a bio or submitting a dependent bio from > > inside a block driver does not need to be throttled because all > > resources required to guarantee completion mu

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-31 Thread Alasdair G Kergon
On Thu, Aug 30, 2007 at 04:20:35PM -0700, Daniel Phillips wrote: > Resubmitting a bio or submitting a dependent bio from > inside a block driver does not need to be throttled because all > resources required to guarantee completion must have been obtained > _before_ the bio was allowed to procee

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-31 Thread Evgeniy Polyakov
Hi Daniel. On Thu, Aug 30, 2007 at 04:20:35PM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > On Wednesday 29 August 2007 01:53, Evgeniy Polyakov wrote: > > Then, if of course you will want, which I doubt, you can reread > > previous mails and find that it was pointed to that race and > > pos

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-30 Thread Daniel Phillips
On Wednesday 29 August 2007 01:53, Evgeniy Polyakov wrote: > Then, if of course you will want, which I doubt, you can reread > previous mails and find that it was pointed to that race and > possibilities to solve it way too long ago. What still bothers me about your response is that, while you kno

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-29 Thread Evgeniy Polyakov
On Tue, Aug 28, 2007 at 02:08:04PM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > On Tuesday 28 August 2007 10:54, Evgeniy Polyakov wrote: > > On Tue, Aug 28, 2007 at 10:27:59AM -0700, Daniel Phillips ([EMAIL > > PROTECTED]) wrote: > > > > We do not care about one cpu being able to increase

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-28 Thread Daniel Phillips
On Tuesday 28 August 2007 10:54, Evgeniy Polyakov wrote: > On Tue, Aug 28, 2007 at 10:27:59AM -0700, Daniel Phillips ([EMAIL PROTECTED]) > wrote: > > > We do not care about one cpu being able to increase its counter > > > higher than the limit, such inaccuracy (maximum bios in flight > > > thus ca

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-28 Thread Evgeniy Polyakov
On Tue, Aug 28, 2007 at 10:27:59AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > > We do not care about one cpu being able to increase its counter > > higher than the limit, such inaccuracy (maximum bios in flight thus > > can be more than limit, difference is equal to the number of CPUs - >

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-28 Thread Daniel Phillips
On Tuesday 28 August 2007 02:35, Evgeniy Polyakov wrote: > On Mon, Aug 27, 2007 at 02:57:37PM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > > Say Evgeniy, something I was curious about but forgot to ask you > > earlier... > > > > On Wednesday 08 August 2007 03:17, Evgeniy Polyakov wrote: > >

Re: Distributed storage.

2007-08-28 Thread Evgeniy Polyakov
On Fri, Aug 03, 2007 at 09:04:51AM +0400, Manu Abraham ([EMAIL PROTECTED]) wrote: > On 7/31/07, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote: > > > TODO list currently includes following main items: > > * redundancy algorithm (drop me a request of your own, but it is highly > > unlikely

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-28 Thread Evgeniy Polyakov
On Mon, Aug 27, 2007 at 02:57:37PM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > Say Evgeniy, something I was curious about but forgot to ask you > earlier... > > On Wednesday 08 August 2007 03:17, Evgeniy Polyakov wrote: > > ...All operations are not atomic, since we do not care about prec

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-27 Thread Daniel Phillips
Say Evgeniy, something I was curious about but forgot to ask you earlier... On Wednesday 08 August 2007 03:17, Evgeniy Polyakov wrote: > ...All operations are not atomic, since we do not care about precise > number of bios, but a fact, that we are close or close enough to the > limit. > ... in bi

Re: Distributed storage. Mirroring to any number of devices.

2007-08-14 Thread Evgeniy Polyakov
On Tue, Aug 14, 2007 at 07:20:49PM +0200, Jan Engelhardt ([EMAIL PROTECTED]) wrote: > >I'm pleased to announce second release of the distributed storage > >subsystem, which allows to form a storage on top of remote and local > >nodes, which in turn can be exported to another storage as a node to >

Re: Distributed storage. Mirroring to any number of devices.

2007-08-14 Thread Jan Engelhardt
On Aug 14 2007 20:29, Evgeniy Polyakov wrote: > >I'm pleased to announce second release of the distributed storage >subsystem, which allows to form a storage on top of remote and local >nodes, which in turn can be exported to another storage as a node to >form tree-like storages. I'll be quick: w

Re: Block device throttling [Re: Distributed storage.]

2007-08-14 Thread Daniel Phillips
On Tuesday 14 August 2007 05:46, Evgeniy Polyakov wrote: > > The throttling of the virtual device must begin in > > generic_make_request and last to ->endio. You release the throttle > > of the virtual device at the point you remap the bio to an > > underlying device, which you have convinced your

Re: Block device throttling [Re: Distributed storage.]

2007-08-14 Thread Evgeniy Polyakov
On Tue, Aug 14, 2007 at 05:32:29AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > On Tuesday 14 August 2007 04:50, Evgeniy Polyakov wrote: > > On Tue, Aug 14, 2007 at 04:35:43AM -0700, Daniel Phillips > ([EMAIL PROTECTED]) wrote: > > > On Tuesday 14 August 2007 04:30, Evgeniy Polyakov wrote:

Re: Block device throttling [Re: Distributed storage.]

2007-08-14 Thread Daniel Phillips
On Tuesday 14 August 2007 04:50, Evgeniy Polyakov wrote: > On Tue, Aug 14, 2007 at 04:35:43AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > > On Tuesday 14 August 2007 04:30, Evgeniy Polyakov wrote: > > > > And it will not solve the deadlock problem in general. (Maybe > > > > it works for y

Re: Block device throttling [Re: Distributed storage.]

2007-08-14 Thread Evgeniy Polyakov
On Tue, Aug 14, 2007 at 04:35:43AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > On Tuesday 14 August 2007 04:30, Evgeniy Polyakov wrote: > > > And it will not solve the deadlock problem in general. (Maybe it > > > works for your virtual device, but I wonder...) If the virtual > > > device

Re: Block device throttling [Re: Distributed storage.]

2007-08-14 Thread Daniel Phillips
On Tuesday 14 August 2007 04:30, Evgeniy Polyakov wrote: > > And it will not solve the deadlock problem in general. (Maybe it > > works for your virtual device, but I wonder...) If the virtual > > device allocates memory during generic_make_request then the memory > > needs to be throttled. > > D

Re: Block device throttling [Re: Distributed storage.]

2007-08-14 Thread Evgeniy Polyakov
On Tue, Aug 14, 2007 at 04:13:10AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > On Tuesday 14 August 2007 01:46, Evgeniy Polyakov wrote: > > On Mon, Aug 13, 2007 at 06:04:06AM -0700, Daniel Phillips > ([EMAIL PROTECTED]) wrote: > > > Perhaps you never worried about the resources that the d

Re: Block device throttling [Re: Distributed storage.]

2007-08-14 Thread Daniel Phillips
On Tuesday 14 August 2007 01:46, Evgeniy Polyakov wrote: > On Mon, Aug 13, 2007 at 06:04:06AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > > Perhaps you never worried about the resources that the device > > mapper mapping function allocates to handle each bio and so did not > > consider thi

Re: Block device throttling [Re: Distributed storage.]

2007-08-14 Thread Evgeniy Polyakov
On Mon, Aug 13, 2007 at 06:04:06AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > Perhaps you never worried about the resources that the device mapper > mapping function allocates to handle each bio and so did not consider > this hole significant. These resources can be significant, as is

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 02:12, Jens Axboe wrote: > > It is a system wide problem. Every block device needs throttling, > > otherwise queues expand without limit. Currently, block devices > > that use the standard request library get a slipshod form of > > throttling for free in the form of limit

Re: Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 05:18, Evgeniy Polyakov wrote: > > Say you have a device mapper device with some physical device > > sitting underneath, the classic use case for this throttle code. > > Say 8,000 threads each submit an IO in parallel. The device mapper > > mapping function will be called

Re: Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Evgeniy Polyakov
On Mon, Aug 13, 2007 at 05:18:14AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > > If limit is for > > 1gb of pending block io, and system has for example 2gbs of ram (or > > any other reasonable parameters), then there is no way we can deadlock > > in allocation, since it will not force pag

Re: Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Evgeniy Polyakov
On Mon, Aug 13, 2007 at 04:18:03AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > > No. Since all requests for virtual device end up in physical devices, > > which have limits, this mechanism works. Virtual device will > > essentially call either generic_make_request() for new physical > > de

Re: Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 05:04, Evgeniy Polyakov wrote: > On Mon, Aug 13, 2007 at 04:04:26AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > > On Monday 13 August 2007 01:14, Evgeniy Polyakov wrote: > > > > Oops, and there is also: > > > > > > > > 3) The bio throttle, which is supposed to prev

Re: Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Evgeniy Polyakov
On Mon, Aug 13, 2007 at 04:04:26AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > On Monday 13 August 2007 01:14, Evgeniy Polyakov wrote: > > > Oops, and there is also: > > > > > > 3) The bio throttle, which is supposed to prevent deadlock, can > > > itself deadlock. Let me see if I can reme

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 04:03, Evgeniy Polyakov wrote: > On Mon, Aug 13, 2007 at 03:12:33AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > > > This is not a very good solution, since it requires all users of > > > the bios to know how to free it. > > > > No, only the specific ->endio needs t

Re: Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 01:23, Evgeniy Polyakov wrote: > On Sun, Aug 12, 2007 at 10:36:23PM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > > (previous incomplete message sent accidentally) > > > > On Wednesday 08 August 2007 02:54, Evgeniy Polyakov wrote: > > > On Tue, Aug 07, 2007 at 10:55:

Re: Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 01:14, Evgeniy Polyakov wrote: > > Oops, and there is also: > > > > 3) The bio throttle, which is supposed to prevent deadlock, can > > itself deadlock. Let me see if I can remember how it goes. > > > > * generic_make_request puts a bio in flight > > * the bio gets pas

Re: Distributed storage.

2007-08-13 Thread Evgeniy Polyakov
On Mon, Aug 13, 2007 at 03:12:33AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > > This is not a very good solution, since it requires all users of the > > bios to know how to free it. > > No, only the specific ->endio needs to know that, which is set by the > bio owner, so this knowledge

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 03:22, Jens Axboe wrote: > I never compared the bio to struct page, I'd obviously agree that > shrinking struct page was a worthy goal and that it'd be ok to uglify > some code to do that. The same isn't true for struct bio. I thought I just said that. Regards, Daniel -

Re: Distributed storage.

2007-08-13 Thread Jens Axboe
On Mon, Aug 13 2007, Daniel Phillips wrote: > On Monday 13 August 2007 03:06, Jens Axboe wrote: > > On Mon, Aug 13 2007, Daniel Phillips wrote: > > > Of course not. Nothing I said stops endio from being called in the > > > usual way as well. For this to work, endio just needs to know that > > > o

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 03:06, Jens Axboe wrote: > On Mon, Aug 13 2007, Daniel Phillips wrote: > > Of course not. Nothing I said stops endio from being called in the > > usual way as well. For this to work, endio just needs to know that > > one call means "end" and the other means "destroy", thi

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 02:18, Evgeniy Polyakov wrote: > On Mon, Aug 13, 2007 at 02:08:57AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > > > But that idea fails as well, since reference counts and IO > > > completion are two completely separate entities. So unless end IO > > > just happens

Re: Distributed storage.

2007-08-13 Thread Jens Axboe
On Mon, Aug 13 2007, Daniel Phillips wrote: > On Monday 13 August 2007 02:13, Jens Axboe wrote: > > On Mon, Aug 13 2007, Daniel Phillips wrote: > > > On Monday 13 August 2007 00:45, Jens Axboe wrote: > > > > On Mon, Aug 13 2007, Jens Axboe wrote: > > > > > > You did not comment on the one about put

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 02:13, Jens Axboe wrote: > On Mon, Aug 13 2007, Daniel Phillips wrote: > > On Monday 13 August 2007 00:45, Jens Axboe wrote: > > > On Mon, Aug 13 2007, Jens Axboe wrote: > > > > > You did not comment on the one about putting the bio > > > > > destructor in the ->endio handl

Re: Distributed storage.

2007-08-13 Thread Evgeniy Polyakov
On Mon, Aug 13, 2007 at 02:08:57AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > > But that idea fails as well, since reference counts and IO completion > > are two completely separate entities. So unless end IO just happens > > to be the last user holding a reference to the bio, you cannot

Re: Distributed storage.

2007-08-13 Thread Jens Axboe
On Mon, Aug 13 2007, Daniel Phillips wrote: > On Monday 13 August 2007 00:45, Jens Axboe wrote: > > On Mon, Aug 13 2007, Jens Axboe wrote: > > > > You did not comment on the one about putting the bio destructor > > > > in the ->endio handler, which looks dead simple. The majority of > > > > cases

Re: Distributed storage.

2007-08-13 Thread Jens Axboe
On Mon, Aug 13 2007, Daniel Phillips wrote: > On Monday 13 August 2007 00:28, Jens Axboe wrote: > > On Sun, Aug 12 2007, Daniel Phillips wrote: > > > Right, that is done by bi_vcnt. I meant bi_max_vecs, which you can > > > derive efficiently from BIO_POOL_IDX() provided the bio was > > > allocated

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 00:45, Jens Axboe wrote: > On Mon, Aug 13 2007, Jens Axboe wrote: > > > You did not comment on the one about putting the bio destructor > > > in the ->endio handler, which looks dead simple. The majority of > > > cases just use the default endio handler and the default > >

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 00:28, Jens Axboe wrote: > On Sun, Aug 12 2007, Daniel Phillips wrote: > > Right, that is done by bi_vcnt. I meant bi_max_vecs, which you can > > derive efficiently from BIO_POOL_IDX() provided the bio was > > allocated in the standard way. > > That would only be feasible,

Re: Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Evgeniy Polyakov
On Sun, Aug 12, 2007 at 10:36:23PM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > (previous incomplete message sent accidentally) > > On Wednesday 08 August 2007 02:54, Evgeniy Polyakov wrote: > > On Tue, Aug 07, 2007 at 10:55:38PM +0200, Jens Axboe wrote: > > > > So, what did we decide? To

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Evgeniy Polyakov
Hi Daniel. On Sun, Aug 12, 2007 at 04:16:10PM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > Your patch is close to the truth, but it needs to throttle at the top > (virtual) end of each block device stack instead of the bottom > (physical) end. It does head in the direction of eliminatin

Re: Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Evgeniy Polyakov
On Sun, Aug 12, 2007 at 11:44:00PM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > On Sunday 12 August 2007 22:36, I wrote: > > Note! There are two more issues I forgot to mention earlier. > > Oops, and there is also: > > 3) The bio throttle, which is supposed to prevent deadlock, can itsel

Re: Distributed storage.

2007-08-13 Thread Jens Axboe
On Mon, Aug 13 2007, Jens Axboe wrote: > > You did not comment on the one about putting the bio destructor in > > the ->endio handler, which looks dead simple. The majority of cases > > just use the default endio handler and the default destructor. Of the > > remaining cases, where a specializ

Re: Distributed storage.

2007-08-13 Thread Jens Axboe
On Sun, Aug 12 2007, Daniel Phillips wrote: > On Tuesday 07 August 2007 13:55, Jens Axboe wrote: > > I don't like structure bloat, but I do like nice design. Overloading > > is a necessary evil sometimes, though. Even today, there isn't enough > > room to hold bi_rw and bi_flags in the same variabl

Re: Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Daniel Phillips
On Sunday 12 August 2007 22:36, I wrote: > Note! There are two more issues I forgot to mention earlier. Oops, and there is also: 3) The bio throttle, which is supposed to prevent deadlock, can itself deadlock. Let me see if I can remember how it goes. * generic_make_request puts a bio in fl

Re: Block device throttling [Re: Distributed storage.]

2007-08-12 Thread Daniel Phillips
(previous incomplete message sent accidentally) On Wednesday 08 August 2007 02:54, Evgeniy Polyakov wrote: > On Tue, Aug 07, 2007 at 10:55:38PM +0200, Jens Axboe wrote: > > So, what did we decide? To bloat bio a bit (add a queue pointer) or > to use physical device limits? The latter requires to r

Re: Block device throttling [Re: Distributed storage.]

2007-08-12 Thread Daniel Phillips
On Wednesday 08 August 2007 02:54, Evgeniy Polyakov wrote: > On Tue, Aug 07, 2007 at 10:55:38PM +0200, Jens Axboe ([EMAIL PROTECTED]) wrote: > > So, what did we decide? To bloat bio a bit (add a queue pointer) or > to use physical device limits? The latter requires to replace all > occurrence of bi

Re: Distributed storage.

2007-08-12 Thread Daniel Phillips
On Tuesday 07 August 2007 13:55, Jens Axboe wrote: > I don't like structure bloat, but I do like nice design. Overloading > is a necessary evil sometimes, though. Even today, there isn't enough > room to hold bi_rw and bi_flags in the same variable on 32-bit archs, > so that concern can be scratche

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-12 Thread Daniel Phillips
Hi Evgeniy, Sorry for not getting back to you right away, I was on the road with limited email access. Incidentally, the reason my mails to you keep bouncing is, your MTA is picky about my mailer's IP reversing to a real hostname. I will take care of that pretty soon, but for now my direct m

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-08 Thread Evgeniy Polyakov
On Wed, Aug 08, 2007 at 02:17:09PM +0400, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote: > This throttling mechanism allows to limit maximum amount of queued bios > per physical device. By default it is turned off and old block layer > behaviour with unlimited number of bios is used. When turned on

[1/1] Block device throttling [Re: Distributed storage.]

2007-08-08 Thread Evgeniy Polyakov
This throttling mechanism allows to limit maximum amount of queued bios per physical device. By default it is turned off and old block layer behaviour with unlimited number of bios is used. When turned on (queue limit is set to something different than -1U via blk_set_queue_limit()), generic_make

Block device throttling [Re: Distributed storage.]

2007-08-08 Thread Evgeniy Polyakov
On Tue, Aug 07, 2007 at 10:55:38PM +0200, Jens Axboe ([EMAIL PROTECTED]) wrote: > I don't like structure bloat, but I do like nice design. Overloading is So, what did we decide? To bloat bio a bit (add a queue pointer) or to use physical device limits? The latter requires to replace all occurrence

Re: Distributed storage.

2007-08-07 Thread Jens Axboe
On Tue, Aug 07 2007, Daniel Phillips wrote: > On Tuesday 07 August 2007 05:05, Jens Axboe wrote: > > On Sun, Aug 05 2007, Daniel Phillips wrote: > > > A simple way to solve the stable accounting field issue is to add a > > > new pointer to struct bio that is owned by the top level submitter > > > (

Re: Distributed storage.

2007-08-07 Thread Daniel Phillips
On Tuesday 07 August 2007 05:05, Jens Axboe wrote: > On Sun, Aug 05 2007, Daniel Phillips wrote: > > A simple way to solve the stable accounting field issue is to add a > > new pointer to struct bio that is owned by the top level submitter > > (normally generic_make_request but not always) and is n

Re: Distributed storage.

2007-08-07 Thread Jens Axboe
On Sun, Aug 05 2007, Daniel Phillips wrote: > A simple way to solve the stable accounting field issue is to add a new > pointer to struct bio that is owned by the top level submitter > (normally generic_make_request but not always) and is not affected by > any recursive resubmission. Then getti

Re: Distributed storage.

2007-08-06 Thread Evgeniy Polyakov
On Sun, Aug 05, 2007 at 02:35:04PM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > On Sunday 05 August 2007 08:01, Evgeniy Polyakov wrote: > > On Sun, Aug 05, 2007 at 01:06:58AM -0700, Daniel Phillips wrote: > > > > DST original code worked as device mapper plugin too, but its two > > > > addi

Re: Distributed storage.

2007-08-06 Thread Evgeniy Polyakov
On Sun, Aug 05, 2007 at 02:23:45PM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > On Sunday 05 August 2007 08:08, Evgeniy Polyakov wrote: > > If we are sleeping in memory pool, then we already do not have memory > > to complete previous requests, so we are in trouble. > > Not at all. Any re

Re: Distributed storage.

2007-08-05 Thread Daniel Phillips
On Sunday 05 August 2007 08:01, Evgeniy Polyakov wrote: > On Sun, Aug 05, 2007 at 01:06:58AM -0700, Daniel Phillips wrote: > > > DST original code worked as device mapper plugin too, but its two > > > additional allocations (io and clone) per block request ended up > > > for me as a show stopper. >

Re: Distributed storage.

2007-08-05 Thread Daniel Phillips
On Sunday 05 August 2007 08:08, Evgeniy Polyakov wrote: > If we are sleeping in memory pool, then we already do not have memory > to complete previous requests, so we are in trouble. Not at all. Any requests in flight are guaranteed to get the resources they need to complete. This is guaranteed

Re: Distributed storage.

2007-08-05 Thread Evgeniy Polyakov
Hi Daniel. On Sun, Aug 05, 2007 at 01:04:19AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > > we can wait in it for memory in mempool. Although that means we > > already in trouble. > > Not at all. This whole block writeout path needs to be written to run > efficiently even when normal

Re: Distributed storage.

2007-08-05 Thread Evgeniy Polyakov
On Sun, Aug 05, 2007 at 01:06:58AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > > DST original code worked as device mapper plugin too, but its two > > additional allocations (io and clone) per block request ended up for > > me as a show stopper. > > Ah, sorry, I misread. A show stopper i

Re: Distributed storage.

2007-08-05 Thread Daniel Phillips
On Saturday 04 August 2007 09:44, Evgeniy Polyakov wrote: > > On Tuesday 31 July 2007 10:13, Evgeniy Polyakov wrote: > > > * storage can be formed on top of remote nodes and be > > > exported simultaneously (iSCSI is peer-to-peer only, NBD requires > > > device mapper and is synchronous) > > >

Re: Distributed storage.

2007-08-05 Thread Daniel Phillips
On Saturday 04 August 2007 09:37, Evgeniy Polyakov wrote: > On Fri, Aug 03, 2007 at 06:19:16PM -0700, I wrote: > > To be sure, I am not very proud of this throttling mechanism for > > various reasons, but the thing is, _any_ throttling mechanism no > > matter how sucky solves the deadlock problem.

Re: Distributed storage.

2007-08-04 Thread Evgeniy Polyakov
On Fri, Aug 03, 2007 at 09:04:51AM +0400, Manu Abraham ([EMAIL PROTECTED]) wrote: > On 7/31/07, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote: > > > TODO list currently includes following main items: > > * redundancy algorithm (drop me a request of your own, but it is highly > > unlikely

Re: Distributed storage.

2007-08-04 Thread Evgeniy Polyakov
Hi Daniel. > On Tuesday 31 July 2007 10:13, Evgeniy Polyakov wrote: > > * storage can be formed on top of remote nodes and be exported > > simultaneously (iSCSI is peer-to-peer only, NBD requires device > > mapper and is synchronous) > > In fact, NBD has nothing to do with device mappe

Re: Distributed storage.

2007-08-04 Thread Evgeniy Polyakov
On Fri, Aug 03, 2007 at 06:19:16PM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > It depends on the characteristics of the physical and virtual block > devices involved. Slow block devices can produce surprising effects. > Ddsnap still qualifies as "slow" under certain circumstances (big

Re: Distributed storage.

2007-08-03 Thread Manu Abraham
On 8/4/07, Dave Dillow <[EMAIL PROTECTED]> wrote: > On Fri, 2007-08-03 at 09:04 +0400, Manu Abraham wrote: > > On 7/31/07, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote: > > > > > TODO list currently includes following main items: > > > * redundancy algorithm (drop me a request of your own, but it

Re: Distributed storage.

2007-08-03 Thread Dave Dillow
On Fri, 2007-08-03 at 09:04 +0400, Manu Abraham wrote: > On 7/31/07, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote: > > > TODO list currently includes following main items: > > * redundancy algorithm (drop me a request of your own, but it is highly > > unlikely that Reed-Solomon based wil

Re: Distributed storage.

2007-08-03 Thread Daniel Phillips
On Friday 03 August 2007 03:26, Evgeniy Polyakov wrote: > On Thu, Aug 02, 2007 at 02:08:24PM -0700, I wrote: > > I see bits that worry me, e.g.: > > > > + req = mempool_alloc(st->w->req_pool, GFP_NOIO); > > > > which seems to be callable in response to a local request, just the > > case w

Re: Distributed storage.

2007-08-03 Thread Daniel Phillips
Hi Mike, On Thursday 02 August 2007 21:09, Mike Snitzer wrote: > But NBD's synchronous nature is actually an asset when coupled with > MD raid1 as it provides guarantees that the data has _really_ been > mirrored remotely. And bio completion doesn't? Regards, Daniel

Re: Distributed storage.

2007-08-03 Thread Daniel Phillips
Hi Evgeniy, Nit alert: On Tuesday 31 July 2007 10:13, Evgeniy Polyakov wrote: > * storage can be formed on top of remote nodes and be exported > simultaneously (iSCSI is peer-to-peer only, NBD requires device > mapper and is synchronous) In fact, NBD has nothing to do with device

Re: Distributed storage.

2007-08-03 Thread Daniel Phillips
On Friday 03 August 2007 07:53, Peter Zijlstra wrote: > On Fri, 2007-08-03 at 17:49 +0400, Evgeniy Polyakov wrote: > > On Fri, Aug 03, 2007 at 02:27:52PM +0200, Peter Zijlstra wrote: > > ...my main position is to > > allocate per socket reserve from socket's queue, and copy data > > there from main

Re: Distributed storage.

2007-08-03 Thread Daniel Phillips
On Friday 03 August 2007 06:49, Evgeniy Polyakov wrote: > ...rx has global reserve (always allocated on > startup or sometime way before reclaim/oom) where data is originally > received (including skb, shared info and whatever is needed, page is > just an example), then it is copied into per-socket
