On Fri, Jul 25, 2014 at 12:04 PM, Jaln <[email protected]> wrote:

> Thanks Ivan,
> From the BookKeeper tutorial (
> http://zookeeper.apache.org/doc/r3.4.1/bookkeeperOverview.html)
> it says: `A server maintains an in-memory data structure (with periodic
> snapshots for example) and logs changes to that structure before it
> applies the change.'
> But from what I see by opening the ledger file and the journal file, both
> contain the same thing, i.e., entry data. (For example, if I use Hedwig
> to publish to a topic, both the ledger and journal files contain only the
> topic/contents, which is what I call entry data; no change information,
> e.g., the publish operation, is logged.) Why is there no such `change'
> information that could be used to recover from a failure?
>
> Maybe my understanding is wrong; please correct me. Thanks a lot.
>

There is no update or modification of an existing entry. BK only appends
entries to a ledger, so all entries are new data. That's why we record an
entry as an 'add' in the journal.
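To make the append-only point concrete, here is a minimal sketch (not BookKeeper's actual on-disk format; the op code and record layout are invented for illustration) of a journal where every record is an 'add' carrying a ledger id, entry id, and payload. Because entries are never modified in place, recovery is just replaying the appends; no update or delete cases need to exist:

```python
import struct

ADD_OP = 1  # hypothetical op code; BookKeeper's real disk format differs

def journal_record(ledger_id: int, entry_id: int, data: bytes) -> bytes:
    """Serialize one append-only journal record: op, ids, length, payload."""
    header = struct.pack(">BQQI", ADD_OP, ledger_id, entry_id, len(data))
    return header + data

def replay(journal: bytes) -> dict:
    """Replay records on recovery. Every record is an 'add', so replay
    simply re-applies appends in order."""
    entries = {}
    off = 0
    while off < len(journal):
        op, lid, eid, n = struct.unpack_from(">BQQI", journal, off)
        off += struct.calcsize(">BQQI")
        entries[(lid, eid)] = journal[off:off + n]
        off += n
    return entries

journal = journal_record(1, 0, b"hello") + journal_record(1, 1, b"world")
print(replay(journal))  # {(1, 0): b'hello', (1, 1): b'world'}
```

This is why the journal and the entry log end up looking so similar: both hold the entry data itself, since an 'add' of new data is the only mutation there is to record.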


>
> Best,
> Jialin
>
> On Mon, Jul 21, 2014 at 5:31 AM, Ivan Kelly <[email protected]> wrote:
>
> > We have considered something like this in the past. However, it would
> > mean that reads would affect the latency of writes, as they would move
> > the disk head.
> >
> > It's also the case that the interleaved entrylog performs really badly
> > on reads. Work has been done recently to improve this, by buffering
> > entries and sorting them by ledger id before flushing to the
> > entrylog. This means that reads for a specific ledger will be
> > sequential as opposed to jumping all over the place as it has to do
> > now. If we used the journal for this, then we wouldn't be able to do
> > this processing, as the point of the journal is to ensure that the
> > entry is on persistent storage before replying to the client. If we
> > buffered enough to get benefit from sorting, write latency would be
> > enormous.
> >
> > -Ivan
> >
> > On Sat, Jul 19, 2014 at 01:55:16PM -0700, Jaln wrote:
> > > Thank you so much, Rakesh,
> > > Without consideration of performance, can we maintain just one
> > > file, for example the journal file, plus an index for each entry?
> > >
> > > Best,
> > > Jaln
> > >
> > >
> > > On Fri, Jul 18, 2014 at 11:23 PM, Rakesh Radhakrishnan <
> > > [email protected]> wrote:
> > >
> > > > Hi Jaln,
> > > >
> > > > >>>>>> for the data in the journal file (*.txn) and the entry log
> > > > >>>>>> file (*.log), are they similar?
> > > > >>>>>> for example, when I add an entry, this operation and the
> > > > >>>>>> entry data will be logged in the journal file,
> > > > >>>>>> and the entry data will be logged in the entry log file
> > > > >>>>>> (*.log), right?
> > > >
> > > > As I mentioned earlier, when an entry is added, the Bookie server
> > > > adds only that entry to the journal file and sends a response back
> > > > to the client after a successful flush to disk. Later, at
> > > > checkpointing time, the server reads the journal entries and adds
> > > > them to the entry logger files. It also generates index files
> > > > corresponding to each ledger for faster access. The old journal file
> > > > can then be garbage collected, because all of its entries have been
> > > > mapped to the entry logger.
> > > >
> > > > >>>>> what's the purpose of the two files?
> > > > AFAIK, adding to the entry log and generating the index is a costly
> > > > I/O operation and would hurt performance. That's why the server
> > > > first only adds transactions to the journal file and responds
> > > > quickly; later it adds them to the entry log and index files
> > > > offline.
> > > >
> > > > Total bookie stored data = entry logger data + journal data (most
> > > > recent data)
> > > >
> > > > *For example:* I'm calling a write operation a transaction. Assume
> > > > the client has performed 20 transactions. These exist only in the
> > > > journal file. Say checkpointing is now triggered: it adds these 20
> > > > transactions to the entry logger file and generates the indexes. Now
> > > > assume the user performs 10 more transactions. We have 30
> > > > transactions in total.
> > > >
> > > > Bookie data (30 transactions) = 20 (entry log) + 10 (journal).
> > > >
> > > > Regards,
> > > > Rakesh
> > > >
> > > >
> > > >
> > > > On Sat, Jul 19, 2014 at 9:52 AM, Jaln <[email protected]> wrote:
> > > >
> > > > > Thanks Rakesh,
> > > > > for the data in the journal file (*.txn) and the entry log file
> > > > > (*.log), are they similar?
> > > > > for example, when I add an entry, this operation and the entry
> > > > > data will be logged in the journal file,
> > > > > and the entry data will be logged in the entry log file (*.log),
> > > > > right?
> > > > > what's the purpose of the two files?
> > > > >
> > > > > Thanks,
> > > > > Jaln
> > > > >
> > > > > On Fri, Jul 18, 2014 at 8:16 PM, Rakesh Radhakrishnan <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > Hi Jaln,
> > > > > >
> > > > > > No, they are different. I assume you are asking about 'entry
> > > > > > log' files and 'journal' files.
> > > > > >
> > > > > > *Journal:* When a client performs a write operation (such as
> > > > > > adding an entry), it is first recorded in the journal file. The
> > > > > > journal is flushed and synced after every write operation,
> > > > > > before a success code is returned to the client. This ensures
> > > > > > that no operation is lost due to machine failure.
> > > > > >
> > > > > > *Entry Log:* It is not updated on every write operation; the
> > > > > > bookie server does this lazily, because writing out the ledger
> > > > > > involves updating the ledger index files (for faster lookup) and
> > > > > > adding the entry to the entry log file. This is a costly
> > > > > > operation and would affect performance.
> > > > > >
> > > > > > In the bookie, there is a dedicated thread that replays journal
> > > > > > transactions and adds them to the entry logger lazily; this is
> > > > > > called the checkpointing operation. It is performed
> > > > > > periodically, at which point the data is persisted to the ledger
> > > > > > index files and the entry logger. By default the 'flushInterval'
> > > > > > is 100 milliseconds. You could configure a bigger value to see
> > > > > > the difference.
> > > > > >
> > > > > > *"SyncThread"* is a background thread which help checkpointing.
> > After a
> > > > > > ledger storage is checkpointed, the journal files added before
> > > > checkpoint
> > > > > > will be garbage collected.
> > > > > >
> > > > > > Cheers,
> > > > > > Rakesh
> > > > > >
> > > > > >
> > > > > > On Sat, Jul 19, 2014 at 1:41 AM, Jaln <[email protected]>
> > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > > are the ledger file and the journal file the same?
> > > > > > > I ran BookKeeper and generated a bookie;
> > > > > > > inside the bookie, I found the journal file and the ledger
> > > > > > > file are almost the same.
> > > > > > >
> > > > > > > Best,
> > > > > > > Jialin
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>
>
>
> --
>
> Genius only means hard-working all one's life
>
