On 2011-12-08 08:40, Jonathan Ellis wrote:
2011/12/8 Piotr Kołaczkowski <pkola...@ii.pw.edu.pl>:
Right, this would be the best option: having the ability to write to
multiple log files placed on multiple disks. I'm not sure whether that is
part of this ticket, though.
It's not. I don't think anyone needs more than 80MB/s or so of
commitlog bandwidth for a while.
BTW: I'm not so sure whether multiple parallel writes to a memory-mapped file
would actually be slower or faster than sequential writes. I think the OS
would optimise the writes so that they are physically sequential, or
even delay them until fsync (or until the cached disk buffers run low), so no
performance loss would occur.
Right. What we're trying to fix here is the single thread doing the
copying + checksumming being a bottleneck. The I/O pattern should
stay more or less the same.
Thanks for the explanation. This is exactly what I understood from the
ticket. Also, calculating the serialized size twice looks like a waste of
CPU to me (or am I wrong and it is calculated only once?).
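
Just to illustrate what I mean (invented names, not the actual classes): the
size could be computed once and kept next to the mutation, so nothing
downstream has to recompute it:

import java.util.function.ToLongFunction;

// Sketch only: pair a value with its serialized size, computed exactly once.
final class Sized<T>
{
    final T value;
    final long serializedSize;

    private Sized(T value, long serializedSize)
    {
        this.value = value;
        this.serializedSize = serializedSize;
    }

    static <T> Sized<T> of(T value, ToLongFunction<T> sizer)
    {
        return new Sized<T>(value, sizer.applyAsLong(value));  // the only place the size is calculated
    }
}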
Now, the longer I think about this ticket, the more questions I have.
Can someone tell me what the usage pattern of the CommitLog#add method is?
Is it possible that a single thread calls add many times, remembers the
returned Future objects and only *then* waits on all or some of them? Or is
it always: add, then wait (until the Future is ready), add, wait, add, wait,
and so on? If the former is true, then we would benefit from returning the
Future objects as early as possible, doing no heavy work in the add method
itself, and parallelising the work on the output side of the queue, using
some kind of thread pool executor (or giving the current commit log executor
more than one worker thread). Then, even if a single thread writes many
RowMutations to the CommitLog, the CRC and copying would still run in
parallel and stay fast. What do you think? Does it make sense? In the future,
such an architecture could be extended to support multiple log files on
separate disks :)
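
To show the caller-side pattern I have in mind, here is a tiny self-contained
sketch (the CommitLogLike interface is purely hypothetical, standing in for a
CommitLog whose add() only enqueues work and returns the Future immediately):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;

final class BatchedAddExample
{
    // Stand-in for a commit log whose add() does no heavy work itself.
    interface CommitLogLike { Future<?> add(byte[] serializedMutation); }

    static void writeBatch(CommitLogLike log, List<byte[]> mutations) throws Exception
    {
        List<Future<?>> pending = new ArrayList<Future<?>>();
        for (byte[] m : mutations)
            pending.add(log.add(m));   // cheap call: the Future comes back immediately
        for (Future<?> f : pending)
            f.get();                   // the caller waits only once, after the whole batch is queued
    }
}

If callers really do work in batches like this, the serialization and CRC for
the whole batch can proceed in parallel on the worker threads while the caller
keeps submitting.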
To summarize:

The current architecture:
many threads (calc. size) -> queue -> one thread (calc. size, serialize, CRC, allocate, copy, fsync)

My 1st proposal:
many threads (calc. size, serialize, CRC) -> queue -> one thread (allocate, copy, fsync)

My 2nd proposal:
many threads (calc. size, allocate, serialize, CRC, copy) -> queue -> one thread (fsync)

My 3rd proposal:
many threads (calc. size, allocate, serialize directly into buffer, CRC) -> queue -> one thread (fsync)

My 4th proposal:
many threads (no op) -> queue -> n threads, where n = number of cores (calc. size, allocate, serialize, CRC, copy) -> queue -> one thread (fsync)
Which one do you like the most?
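
To make the 4th proposal a bit more concrete, here is a rough, hedged sketch
(the class names, the 10 ms sync period and the segment handling are all
invented; it is only meant to show the two-stage executor layout):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.zip.CRC32;

final class StagedCommitLogSketch
{
    // Stand-in for RowMutation; only the ability to serialize itself matters here.
    interface Mutation { byte[] serialize(); }

    // stage 2: n workers do the CPU-heavy part (size, serialize, CRC, copy) in parallel
    private final ExecutorService workers =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    // stage 3: a single thread whose only job is the periodic fsync
    private final ScheduledExecutorService syncer =
        Executors.newSingleThreadScheduledExecutor();

    StagedCommitLogSketch()
    {
        syncer.scheduleWithFixedDelay(() -> sync(), 10, 10, TimeUnit.MILLISECONDS);
    }

    // stage 1: the caller thread does no real work, it only hands the mutation over
    Future<Long> add(Mutation mutation)
    {
        return workers.submit(() -> {
            byte[] bytes = mutation.serialize();   // serialization off the caller thread
            CRC32 crc = new CRC32();
            crc.update(bytes, 0, bytes.length);    // checksum computed on the worker, too
            // ... reserve a slice of the current segment and copy bytes + checksum into it ...
            return crc.getValue();
        });
    }

    private void sync()
    {
        // ... MappedByteBuffer.force() on the dirty segment would go here ...
    }
}

Splitting the log across multiple files on separate disks would then mostly be
a matter of running one syncer (and one set of segments) per disk.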
--
Piotr Kołaczkowski
Instytut Informatyki, Politechnika Warszawska
Nowowiejska 15/19, 00-665 Warszawa
e-mail: pkola...@ii.pw.edu.pl
www: http://home.elka.pw.edu.pl/~pkolaczk/