Sorry Shawn,

Somehow I am still not quite grasping it. I would really appreciate it if somebody (or even you) could have another go at a very small part of this. Maybe it will clear it up:

> Similarly, in the performance section of the Wiki, it says: "A commit (including a soft commit) will free up almost all heap memory"

Why? What is the "hard work" that a hard commit does and a soft commit does not, if a soft commit still commits to disk? Is it some sort of Lucene segment finalization and new segment creation?
Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)


On Fri, Feb 8, 2013 at 2:57 AM, Shawn Heisey <s...@elyograg.org> wrote:

> On 2/7/2013 9:29 PM, Alexandre Rafalovitch wrote:
>
>> Hello,
>>
>> What actually happens when using a soft (as opposed to a hard) commit?
>>
>> I understand the very high-level picture somewhat (documents become available faster, but you may lose them on power loss). I don't care about low-level implementation details.
>>
>> But I am trying to understand what is happening at the medium level of detail.
>>
>> For example, what are the stages of a document if we are using all of the available options: transaction log, soft commit, and hard commit? It feels like there are three stages:
>> *) Uncommitted (soft or hard): accessible only via direct real-time get?
>> *) Soft-committed: accessible through all search operations? (but not on disk? but where is it? in memory?)
>> *) Hard-committed: all the same as soft-committed, but it is now on disk
>>
>> Similarly, in the performance section of the Wiki, it says: "A commit (including a soft commit) will free up almost all heap memory" - why would a soft commit free up heap memory? I thought it was not flushed to disk.
>>
>> Also, with soft commits and the transaction log enabled, doesn't the transaction log allow replaying/recovering the latest state after a crash? I believe that's what a transaction log does for a database. If not, how does one recover, if at all?
>>
>> And where does openSearcher=false fit into that? Does it cause inconsistent results somehow?
>>
>> I am missing something, but I am not sure what or where. Any pointers in the right direction would be appreciated.
>>
>
> Let's see if I can answer your questions without giving you incorrect information.
>
> New indexed content is not searchable until you open a new searcher, regardless of the type of commit that you do.
>
> A hard commit will close the current transaction log and start a new one. It will also instruct the Directory implementation to flush to disk. If you specify openSearcher=false, then the content that has just been committed will NOT be searchable, as discussed in the previous paragraph. The existing searcher will remain open and continue to serve queries against the same index data.
>
> A soft commit does not flush the new content to disk, but it does open a new searcher. I'm sure that the amount of memory available for caching this content is not large, so it's possible that if you do a lot of indexing with soft commits and your hard commits are too infrequent, you'll end up flushing part of the cached data to disk anyway. I'd love to hear from a committer about this, because I could be wrong.
>
> There's a caveat with that 'flush to disk' operation -- the default Directory implementation in the Solr example config, which is NRTCachingDirectoryFactory, will cache the last few megabytes of indexed data and not flush it to disk even with a hard commit. If your commits are small, then the net result is similar to a soft commit. If the server or Solr were to crash, the transaction logs would be replayed on Solr startup, recovering those last few megabytes. The transaction log may also recover documents that were soft-committed, but I'm not 100% sure about that.
>
> To take full advantage of NRT functionality, you can commit as often as you like with soft commits. On some reasonable interval, say every one to fifteen minutes, you can issue a hard commit with openSearcher set to false, to flush things to disk and cycle through transaction logs before they get huge. Solr will keep a few of the transaction logs around, and if they are huge, it can take a long time to replay them. You'll want to choose a hard commit interval that doesn't create giant transaction logs.
>
> If any of the info I've given here is wrong, someone should correct me!
>
> Thanks,
> Shawn
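As a rough illustration of the visibility and durability rules in Shawn's reply (and of the three stages in the original question), here is a toy Python model. It is a sketch, not Solr code, and says nothing about Solr internals: the class ToyCore and its methods are hypothetical, "disk" is just a list that survives crash_and_restart(), and it assumes that recovery replays every update logged since the last hard commit (Shawn was not certain this covers soft-committed documents). The NRTCachingDirectory caveat, heap usage, and retained old transaction logs are not modeled.

```python
"""Toy model of the commit rules described in this thread. NOT Solr code.

Assumptions (hypothetical, for illustration only):
- "disk" (durable) survives crash_and_restart(); everything else is lost.
- Recovery replays every update logged since the last hard commit.
- NRTCachingDirectory caching, heap usage and old tlogs are not modeled.
"""
from dataclasses import dataclass, field


@dataclass
class ToyCore:
    durable: list = field(default_factory=list)   # hard-committed ("on disk")
    volatile: list = field(default_factory=list)  # soft-committed, not durable
    buffer: list = field(default_factory=list)    # indexed, not yet committed
    tlog: list = field(default_factory=list)      # updates since last hard commit
    visible: tuple = ()                           # what the open searcher sees

    def add(self, doc):
        # Every update goes to the in-memory buffer and the transaction log.
        self.buffer.append(doc)
        self.tlog.append(doc)

    def realtime_get(self, doc):
        # Real-time get sees a document even before any searcher is opened.
        return doc in self.durable + self.volatile + self.buffer

    def search(self, doc):
        # Normal searches only see the snapshot the current searcher opened on.
        return doc in self.visible

    def _open_searcher(self):
        self.visible = tuple(self.durable + self.volatile)

    def soft_commit(self):
        # Make buffered documents searchable without making them durable.
        self.volatile.extend(self.buffer)
        self.buffer.clear()
        self._open_searcher()

    def hard_commit(self, open_searcher=True):
        # Make everything durable, then close the log and start an empty one.
        self.durable.extend(self.volatile + self.buffer)
        self.volatile.clear()
        self.buffer.clear()
        self.tlog = []
        if open_searcher:
            self._open_searcher()
        # With open_searcher=False the old searcher keeps serving old data.

    def crash_and_restart(self):
        # In-memory state is lost; the tlog (on disk) is replayed into the buffer.
        self.volatile.clear()
        self.buffer = list(self.tlog)
        # In this sketch the fresh searcher opens on durable data only.
        self._open_searcher()


core = ToyCore()
core.add("doc1")
assert core.realtime_get("doc1") and not core.search("doc1")  # uncommitted
core.soft_commit()
assert core.search("doc1") and "doc1" not in core.durable     # soft-committed
core.add("doc2")
core.hard_commit(open_searcher=False)
assert "doc2" in core.durable and not core.search("doc2")     # durable, not visible
core.add("doc3")
core.soft_commit()                                            # searchable, not durable
core.crash_and_restart()
assert core.realtime_get("doc3") and "doc3" not in core.durable  # recovered via tlog
```

Keeping searcher visibility (`visible`) separate from durability (`durable`) mirrors the first point in the reply: opening a searcher and flushing to disk are independent, which is why a hard commit with openSearcher=false yields documents that are durable but not yet searchable.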