On 2011-12-08 08:40, Jonathan Ellis wrote:
2011/12/8 Piotr Kołaczkowski <pkola...@ii.pw.edu.pl>:
Right, this would be the best option: having the ability to write to
multiple log files placed on multiple disks. I'm not sure whether that is
part of this ticket, though.
It's not. I don't think anyone needs more than 80MB/s or so of
commitlog bandwidth for a while.
BTW: I'm not so sure whether multiple parallel writes to a memory-mapped file
would actually be slower or faster than sequential writes. I think the OS
would optimise the writes so that they are physically sequential, or
even delay them until fsync (or until the cached disk buffers run low), so no
performance loss would occur.
Right. What we're trying to fix here is the single thread doing the
copying + checksumming being a bottleneck. The I/O pattern should
stay more or less the same.
Thanks for the explanation. This is exactly what I understood from the
ticket. Also, calculating the serialized size twice looks like a waste of
CPU to me (or am I wrong and it is calculated only once?).
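
Just to illustrate what I mean (invented names, not the actual classes): the
size could be computed once and kept next to the mutation, so nothing
downstream has to recompute it:

import java.util.function.ToLongFunction;

// Sketch only: pair a value with its serialized size, computed exactly once.
final class Sized<T>
{
    final T value;
    final long serializedSize;

    private Sized(T value, long serializedSize)
    {
        this.value = value;
        this.serializedSize = serializedSize;
    }

    static <T> Sized<T> of(T value, ToLongFunction<T> sizer)
    {
        return new Sized<T>(value, sizer.applyAsLong(value));  // the only place the size is calculated
    }
}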
Now, the longer I think about this ticket, the more questions I have.
Can someone tell me what the usage pattern of the CommitLog#add method is?
Is it possible that a single thread calls add many times, remembers the
returned Future objects and only *then* waits on all or some of them? Or is
it always: add, then wait (until the Future is ready), add, wait, add, wait,
and so on? If the former is true, then we would benefit from returning the
Future objects as early as possible, doing no heavy work in the add method
itself, and parallelising the work on the output side of the queue, using
some kind of thread pool executor (or giving the current commit log executor
more than one worker thread). Then, even if a single thread writes many
RowMutations to the CommitLog, the CRC and copying would still run in
parallel and stay fast. What do you think? Does it make sense? In the future,
such an architecture could be extended to support multiple log files on
separate disks :)
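
To show the caller-side pattern I have in mind, here is a tiny self-contained
sketch (the CommitLogLike interface is purely hypothetical, standing in for a
CommitLog whose add() only enqueues work and returns the Future immediately):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;

final class BatchedAddExample
{
    // Stand-in for a commit log whose add() does no heavy work itself.
    interface CommitLogLike { Future<?> add(byte[] serializedMutation); }

    static void writeBatch(CommitLogLike log, List<byte[]> mutations) throws Exception
    {
        List<Future<?>> pending = new ArrayList<Future<?>>();
        for (byte[] m : mutations)
            pending.add(log.add(m));   // cheap call: the Future comes back immediately
        for (Future<?> f : pending)
            f.get();                   // the caller waits only once, after the whole batch is queued
    }
}

If callers really do work in batches like this, the serialization and CRC for
the whole batch can proceed in parallel on the worker threads while the caller
keeps submitting.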
To summarize:

The current architecture:
many threads (calc. size) -> queue -> one thread (calc. size, serialize, CRC, allocate, copy, fsync)

My 1st proposal:
many threads (calc. size, serialize, CRC) -> queue -> one thread (allocate, copy, fsync)

My 2nd proposal:
many threads (calc. size, allocate, serialize, CRC, copy) -> queue -> one thread (fsync)

My 3rd proposal:
many threads (calc. size, allocate, serialize directly into buffer, CRC) -> queue -> one thread (fsync)

My 4th proposal:
many threads (no op) -> queue -> n threads, where n = number of cores (calc. size, allocate, serialize, CRC, copy) -> queue -> one thread (fsync)
Which one do you like the most?
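
To make the 4th proposal a bit more concrete, here is a rough, hedged sketch
(the class names, the 10 ms sync period and the segment handling are all
invented; it is only meant to show the two-stage executor layout):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.zip.CRC32;

final class StagedCommitLogSketch
{
    // Stand-in for RowMutation; only the ability to serialize itself matters here.
    interface Mutation { byte[] serialize(); }

    // stage 2: n workers do the CPU-heavy part (size, serialize, CRC, copy) in parallel
    private final ExecutorService workers =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    // stage 3: a single thread whose only job is the periodic fsync
    private final ScheduledExecutorService syncer =
        Executors.newSingleThreadScheduledExecutor();

    StagedCommitLogSketch()
    {
        syncer.scheduleWithFixedDelay(() -> sync(), 10, 10, TimeUnit.MILLISECONDS);
    }

    // stage 1: the caller thread does no real work, it only hands the mutation over
    Future<Long> add(Mutation mutation)
    {
        return workers.submit(() -> {
            byte[] bytes = mutation.serialize();   // serialization off the caller thread
            CRC32 crc = new CRC32();
            crc.update(bytes, 0, bytes.length);    // checksum computed on the worker, too
            // ... reserve a slice of the current segment and copy bytes + checksum into it ...
            return crc.getValue();
        });
    }

    private void sync()
    {
        // ... MappedByteBuffer.force() on the dirty segment would go here ...
    }
}

Splitting the log across multiple files on separate disks would then mostly be
a matter of running one syncer (and one set of segments) per disk.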
--
Piotr Kołaczkowski
Instytut Informatyki, Politechnika Warszawska
Nowowiejska 15/19, 00-665 Warszawa
e-mail: pkola...@ii.pw.edu.pl
www: http://home.elka.pw.edu.pl/~pkolaczk/