Jialin,

I'm curious to know why you're asking all these questions. Are you working on 
some research project that involves BookKeeper? Otherwise, what's your use case 
if you don't mind sharing?


-Flavio



On Monday, July 21, 2014 1:34 PM, Ivan Kelly <[email protected]> wrote:
 

>
>
>We have considered something like this in the past. However, it would
>mean that reads would affect the latency of writes, as they would move
>the disk head.
>
>It's also the case that the interleaved entrylog performs really badly
>on reads. Work has been done recently to improve this, by buffering
>entries and sorting them by ledger id before flushing to the
>entrylog. This means that reads for a specific ledger will be
>sequential as opposed to jumping all over the place as it has to do
>now. If we used the journal for this, then we wouldn't be able to do
>this processing, as the point of the journal is to ensure that the
>entry is on persistent storage before replying to the client. If we
>buffered enough to get benefit from sorting, write latency would be
>enormous.
>
>-Ivan
>
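
[Editor's sketch] The sorted-flush idea Ivan describes — buffering entries and sorting them by ledger id before flushing to the entry log — can be illustrated with a small Python model. This is not BookKeeper code; the function name and tuple layout are hypothetical:

```python
# Illustrative sketch (not BookKeeper's implementation): sorting buffered
# (ledger_id, entry_id, data) tuples by ledger id before flushing means
# each ledger's entries land contiguously in the entry log, so reads of
# one ledger are sequential instead of jumping all over the file.

def flush_sorted(buffer):
    """Order buffered entries by (ledger_id, entry_id) before writing."""
    return sorted(buffer, key=lambda e: (e[0], e[1]))

# Writes from three ledgers arrive interleaved:
buffer = [(2, 0, b"x"), (1, 0, b"a"), (2, 1, b"y"), (1, 1, b"b"), (3, 0, b"p")]
log = flush_sorted(buffer)

# After sorting, reading ledger 1 touches adjacent log positions:
positions = [i for i, e in enumerate(log) if e[0] == 1]
assert positions == [0, 1]
```

Note this only works because the entry log is flushed lazily; the journal cannot be reordered this way, since it must hit disk before the client is acknowledged.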
>
>On Sat, Jul 19, 2014 at 01:55:16PM -0700, Jaln wrote:
>> Thank you so much, Rakesh,
>> Setting performance aside, can we just maintain one file? For
>> example, the journal file, plus an index for each entry.
>> 
>> Best,
>> Jaln
>> 
>> 
>> On Fri, Jul 18, 2014 at 11:23 PM, Rakesh Radhakrishnan <
>> [email protected]> wrote:
>> 
>> > Hi Jaln,
>> >
>> > >>>>>>for the data in the journal file(*.txn) and the entry log
>> > >>>>>>file(*.log), are they similar?
>> > >>>>>>for example, when I add an entry, this operation and the entry
>> > >>>>>>data will be logged in the journal file,
>> > >>>>>>and the entry data will be logged in the entry log file (*.log),
>> > >>>>>>right?
>> >
>> > As I mentioned earlier, when an entry is added, the bookie server will
>> > add only that entry to the journal file and will send a response back to
>> > the client after a successful flush to disk. Later, at checkpointing
>> > time, the server will read the journal entries and add them to the entry
>> > logger files. It will also generate index files for each ledger, for
>> > faster access. The old journal file can then be garbage collected,
>> > because all of its entries have been mapped into the entry logger.
>> >
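[Editor's sketch] The write path Rakesh describes — journal first, ack after the sync, entry log and index filled in at checkpoint time — can be modelled in a few lines of Python. The class and method names here are hypothetical, not BookKeeper's real API:

```python
# Minimal model of the described write path: an add goes to the journal
# (which would be fsync'd before the ack), and a later checkpoint copies
# entries into the entry log, builds a per-ledger index, and leaves the
# old journal contents eligible for garbage collection.

class Bookie:
    def __init__(self):
        self.journal = []    # write-ahead log, synced on every add
        self.entry_log = []  # flat log, filled lazily at checkpoint time
        self.index = {}      # ledger_id -> {entry_id: entry_log offset}

    def add_entry(self, ledger_id, entry_id, data):
        self.journal.append((ledger_id, entry_id, data))
        # fsync(journal) would happen here, then the client gets its ack
        return "OK"

    def checkpoint(self):
        for ledger_id, entry_id, data in self.journal:
            offset = len(self.entry_log)
            self.entry_log.append(data)
            self.index.setdefault(ledger_id, {})[entry_id] = offset
        self.journal.clear()  # old journal entries can now be GC'd

b = Bookie()
b.add_entry(1, 0, b"hello")
b.checkpoint()
assert b.journal == [] and b.index[1][0] == 0
```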
>> > >>>>>what's the purpose of the two files?
>> > AFAIK, adding to the entry log and generating the index are costly I/O
>> > operations and would affect performance. That's the reason the server
>> > first only adds transactions to the journal file and sends a response
>> > quickly; later it adds them to the entry log and index files offline.
>> >
>> > Total bookie stored data = entry logger data + journal data (most recent
>> > data)
>> >
>> > *For example:* I'll call a write operation a "transaction". Assume the
>> > client has performed 20 transactions; all of these exist only in the
>> > journal file. Say checkpointing is now triggered: it will add those 20
>> > transactions to the entry logger file and generate indexes. Then assume
>> > the user performs 10 more transactions. Now we have 30 transactions in
>> > total.
>> >
>> > Bookie data(30 transactions) = 20 + 10.
>> >
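[Editor's sketch] Rakesh's 20 + 10 accounting, restated as a one-line check — total bookie data is what has already been checkpointed into the entry log plus what still lives only in the journal:

```python
# The worked example above, as arithmetic: the figures are from the
# example in the mail, not measured values.
checkpointed = 20   # transactions moved to the entry log at checkpoint
journal_only = 10   # transactions added since the last checkpoint
total = checkpointed + journal_only
assert total == 30
```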
>> > Regards,
>> > Rakesh
>> >
>> >
>> >
>> > On Sat, Jul 19, 2014 at 9:52 AM, Jaln <[email protected]> wrote:
>> >
>> > > Thanks Rakesh,
>> > > for the data in the journal file(*.txn) and the entry log file(*.log),
>> > are
>> > > they similar?
>> > > for example, when I add an entry, this opeartion and the entry data will
>> > be
>> > > logged in the journal file,
>> > > and the entry data will be logged in the entry log file (*.log), right?
>> > > what's the purpose of the two files?
>> > >
>> > > Thanks,
>> > > Jaln
>> > >
>> > > On Fri, Jul 18, 2014 at 8:16 PM, Rakesh Radhakrishnan <
>> > > [email protected]> wrote:
>> > >
>> > > > Hi Jaln,
>> > > >
>> > > > No, both are different. I assume you are asking about 'entry log'
>> > > > files and 'journal' files.
>> > > >
>> > > > *Journal: *When a client performs a write operation (such as adding
>> > > > an entry), it is first recorded in the journal file. The journal is
>> > > > flushed and synced after every write operation before a success code
>> > > > is returned to the client. This ensures that no operation is lost
>> > > > due to machine failure.
>> > > >
>> > > > *Entry Log: *It is not updated on every write operation; the bookie
>> > > > server does this lazily, because writing out the ledger involves
>> > > > updating the ledger index files (for faster lookup) and adding the
>> > > > entry to the entry log file. That would be a costly operation and
>> > > > would affect performance.
>> > > >
>> > > > In the bookie, there is a dedicated thread that replays journal
>> > > > transactions and adds them to the entry log lazily; this is called
>> > > > the checkpointing operation. It is performed periodically, at which
>> > > > point the data is persisted to the ledger index files and entry
>> > > > logger. By default the 'flushInterval' is 100 milliseconds. You can
>> > > > configure a bigger value to see the difference.
>> > > >
>> > > > *"SyncThread"* is a background thread which helps with
>> > > > checkpointing. After the ledger storage is checkpointed, the journal
>> > > > files added before the checkpoint will be garbage collected.
>> > > >
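[Editor's sketch] The SyncThread behaviour described above — a background loop that wakes periodically, checkpoints the ledger storage, and garbage-collects journal files written before the checkpoint — can be modelled like this. Class and method names are illustrative, not BookKeeper's real classes:

```python
# Hypothetical model of a periodic checkpointing thread: every
# flush interval it checkpoints the storage and then discards journal
# files that predate the checkpoint.
import threading
import time

class Storage:
    def __init__(self):
        self.journal_files = ["journal-0.txn"]
        self.checkpoints = 0

    def checkpoint(self):
        """Persist index + entry log; return a mark for journal GC."""
        self.checkpoints += 1
        return len(self.journal_files)

    def gc_journals_before(self, mark):
        # Keep the current journal file; drop the fully-checkpointed ones.
        del self.journal_files[:mark - 1]

def sync_thread(storage, flush_interval_ms, stop):
    while not stop.is_set():
        time.sleep(flush_interval_ms / 1000.0)
        mark = storage.checkpoint()
        storage.gc_journals_before(mark)

s = Storage()
stop = threading.Event()
t = threading.Thread(target=sync_thread, args=(s, 10, stop), daemon=True)
t.start()
time.sleep(0.05)   # let a few flush intervals elapse
stop.set()
t.join()
assert s.checkpoints >= 1
```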
>> > > > Cheers,
>> > > > Rakesh
>> > > >
>> > > >
>> > > > On Sat, Jul 19, 2014 at 1:41 AM, Jaln <[email protected]> wrote:
>> > > >
>> > > > > Hi,
>> > > > > are the ledger file and the journal file the same?
>> > > > > I ran BookKeeper and started a bookie, and
>> > > > > inside the bookie I found the journal file and ledger file are
>> > > > > almost the same.
>> > > > >
>> > > > > Best,
>> > > > > Jialin
>> > > > >
>> > > >
>> > >
>> >
>
>
>
