I think the attachment got stripped. Here it is: http://www.flickr.com/photos/otis/8409088080/in/photostream
Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Tue, Jan 22, 2013 at 12:36 PM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:

> Same here - I've seen some document caches that were huge and highly
> utilized. Check out the screenshot of the SPM for Solr dashboard that
> shows pretty high hit rates on all caches. I've circled the parts to look
> at. ML manager may strip the attachment, of course. :)
>
> In addition to multiple in-request lookups and hits in document cache,
> document caches provide value when queries are frequently somewhat similar
> and thus return some of the same hits as previous queries.
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
> On Mon, Jan 21, 2013 at 1:39 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> Hmm, interesting. I'll have to look closer...
>>
>> On Sun, Jan 20, 2013 at 3:50 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>> > I routinely see hit rates over 75% on the document cache. Perhaps yours
>> > is too small. Mine is set at 10240 entries.
>> >
>> > wunder
>> >
>> > On Jan 20, 2013, at 8:08 AM, Erick Erickson wrote:
>> >
>> >> About your question about document cache: Typically the document cache
>> >> has a pretty low hit-ratio. I've rarely, if ever, seen it get hit very
>> >> often. And remember that this cache is only hit when assembling the
>> >> response for a few documents (your page size).
>> >>
>> >> Bottom line: I wouldn't worry about this cache much. It's quite useful
>> >> for processing a particular query faster, but not really intended for
>> >> cross-query use.
>> >>
>> >> Really, I think you're getting the cart before the horse here. Run it
>> >> up the flagpole and try it. Rely on the OS to do its job
>> >> (http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html).
>> >> Find a bottleneck _then_ tune. Premature optimization and all
>> >> that....
>> >>
>> >> Several tens of millions of docs isn't that large unless the text
>> >> fields are enormous.
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Sat, Jan 19, 2013 at 2:32 PM, Isaac Hebsh <isaac.he...@gmail.com> wrote:
>> >>> Ok. Thank you everyone for your helpful answers.
>> >>> I understand that fieldValueCache is not used for resolving queries.
>> >>> Is there any cache that can help this basic scenario (a lot of different
>> >>> queries, on a small set of fields)?
>> >>> Does Lucene's FieldCache help (implicitly)?
>> >>> How can I use RAM to reduce I/O in this type of query?
>> >>>
>> >>> On Fri, Jan 18, 2013 at 4:09 PM, Tomás Fernández Löbbe <tomasflo...@gmail.com> wrote:
>> >>>
>> >>>> No, the fieldValueCache is not used for resolving queries. Only for
>> >>>> multi-token faceting and apparently for the stats component too. The
>> >>>> document cache maintains in memory the stored content of the fields you are
>> >>>> retrieving or highlighting on. It'll hit if the same document matches the
>> >>>> query multiple times and the same fields are requested, but as Erick said,
>> >>>> it is important for cases when multiple components in the same request need
>> >>>> to access the same data.
>> >>>>
>> >>>> I think soft committing every 10 minutes is totally fine, but you should
>> >>>> hard commit more often if you are going to be using the transaction log.
>> >>>> openSearcher=false will essentially tell Solr not to open a new searcher
>> >>>> after the (hard) commit, so you won't see the new indexed data and caches
>> >>>> won't be flushed. openSearcher=false makes sense when you are using
>> >>>> hard-commits together with soft-commits: as the "soft-commit" is dealing
>> >>>> with opening/closing searchers, you don't need hard commits to do it.
>> >>>>
>> >>>> Tomás
>> >>>>
>> >>>> On Fri, Jan 18, 2013 at 2:20 AM, Isaac Hebsh <isaac.he...@gmail.com> wrote:
>> >>>>
>> >>>>> Unfortunately, it seems
>> >>>>> (http://lucene.472066.n3.nabble.com/Nrt-and-caching-td3993612.html) that
>> >>>>> these caches are not per-segment. In this case, I want to (soft) commit
>> >>>>> less frequently. Am I right?
>> >>>>>
>> >>>>> Tomás, as the fieldValueCache is very similar to Lucene's FieldCache, I
>> >>>>> guess it makes a big contribution to the time of standard (not only
>> >>>>> faceted) queries. SolrWiki claims that it is primarily used by faceting.
>> >>>>> What does that say about complex textual queries?
>> >>>>>
>> >>>>> documentCache:
>> >>>>> Erick, after query processing is finished, don't some documents stay in
>> >>>>> the documentCache? Can't I use it to accelerate queries that should
>> >>>>> retrieve stored fields of documents? In this case, a big documentCache can
>> >>>>> hold more documents.
>> >>>>>
>> >>>>> About commit frequency:
>> >>>>> HardCommit: "openSearcher=false" seems like a nice solution. Where can I read
>> >>>>> about this? (I found nothing but one unexplained sentence in SolrWiki.)
>> >>>>> SoftCommit: In my case, the required index freshness is 10 minutes. The
>> >>>>> plan to soft commit every 10 minutes is similar to storing all of the
>> >>>>> documents in a queue (outside of Solr), and indexing a bulk every 10
>> >>>>> minutes.
>> >>>>>
>> >>>>> Thanks.
>> >>>>>
>> >>>>> On Fri, Jan 18, 2013 at 2:15 AM, Tomás Fernández Löbbe <tomasflo...@gmail.com> wrote:
>> >>>>>
>> >>>>>> I think fieldValueCache is not per segment, only fieldCache is.
>> >>>>>> However, unless I'm missing something, this cache is only used for
>> >>>>>> faceting on multivalued fields.
>> >>>>>>
>> >>>>>> On Thu, Jan 17, 2013 at 8:58 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>> >>>>>>
>> >>>>>>> filterCache: This is bounded by 1M * (maxDoc) / 8 * (num filters in
>> >>>>>>> cache). Notice the /8. This reflects the fact that the filters are
>> >>>>>>> represented by a bitset on the _internal_ Lucene ID. UniqueId has no
>> >>>>>>> bearing here whatsoever. This is, in a nutshell, why warming is
>> >>>>>>> required: the internal Lucene IDs may change. Note also that it's
>> >>>>>>> maxDoc; the internal arrays have "holes" for deleted documents.
>> >>>>>>>
>> >>>>>>> Note this is an _upper_ bound. If there are only a few docs that
>> >>>>>>> match, the size will be (num of matching docs) * sizeof(int).
>> >>>>>>>
>> >>>>>>> fieldValueCache. I don't think so, although I'm a bit fuzzy on this.
>> >>>>>>> It depends on whether these are "per-segment" caches or not. Any "per
>> >>>>>>> segment" cache is still valid.
>> >>>>>>>
>> >>>>>>> Think of documentCache as intended to hold the stored fields while
>> >>>>>>> various components operate on it, thus avoiding repeatedly fetching
>> >>>>>>> the data from disk. It's _usually_ not too big a worry.
>> >>>>>>>
>> >>>>>>> About hard-commits once a day. That's _extremely_ long. Think instead
>> >>>>>>> of committing more frequently with openSearcher=false. If nothing
>> >>>>>>> else, your transaction log will grow lots and lots and lots. I'm
>> >>>>>>> thinking on the order of 15 minutes, or possibly even much less. With
>> >>>>>>> softCommits happening more often, maybe every 15 seconds. In fact, I'd
>> >>>>>>> start out with soft commits every 15 seconds and hard commits
>> >>>>>>> (openSearcher=false) every 5 minutes.
>> >>>>>>> The problem with hard commits
>> >>>>>>> being once a day is that, if for any reason the server is interrupted,
>> >>>>>>> on startup Solr will try to replay the entire transaction log to
>> >>>>>>> assure index integrity. Not to mention that your tlog will be huge.
>> >>>>>>> Not to mention that there is some memory usage for each document in
>> >>>>>>> the tlog. Hard commits roll over the tlog, flush the in-memory tlog
>> >>>>>>> pointers, close index segments, etc.
>> >>>>>>>
>> >>>>>>> Best
>> >>>>>>> Erick
>> >>>>>>>
>> >>>>>>> On Thu, Jan 17, 2013 at 1:29 PM, Isaac Hebsh <isaac.he...@gmail.com> wrote:
>> >>>>>>>> Hi,
>> >>>>>>>>
>> >>>>>>>> I am going to build a big Solr (4.0?) index, which holds some dozens of
>> >>>>>>>> millions of documents. Each document has some dozens of fields, and one big
>> >>>>>>>> textual field.
>> >>>>>>>> The queries on the index are non-trivial, and a little bit long (might be
>> >>>>>>>> hundreds of terms). No query is identical to another.
>> >>>>>>>>
>> >>>>>>>> Now, I want to analyze the cache performance (before setting up the whole
>> >>>>>>>> environment), in order to estimate how much RAM I will need.
>> >>>>>>>>
>> >>>>>>>> filterCache:
>> >>>>>>>> In my scenario, every query has some filters. Let's say that each filter
>> >>>>>>>> matches 1M documents, out of 10M. Should the estimated memory usage be
>> >>>>>>>> 1M * sizeof(uniqueId) * num-of-filters-in-cache?
>> >>>>>>>>
>> >>>>>>>> fieldValueCache:
>> >>>>>>>> Due to the difference between queries, I guess that fieldValueCache is the
>> >>>>>>>> most important factor in query performance. Here comes a generic question:
>> >>>>>>>> I'm indexing new documents to the index constantly. Soft commits will be
>> >>>>>>>> performed every 10 mins.
>> >>>>>>>> Does that mean the cache is meaningless after
>> >>>>>>>> every 10 minutes?
>> >>>>>>>>
>> >>>>>>>> documentCache:
>> >>>>>>>> enableLazyFieldLoading will be enabled, and "fl" contains a very small set
>> >>>>>>>> of fields. BUT, I need to return highlighting on about (possibly) 20
>> >>>>>>>> fields. Does the highlighting component use the documentCache? I guess that
>> >>>>>>>> highlighting requires the whole field to be loaded into the documentCache.
>> >>>>>>>> Will it happen only for fields that matched a term from the query?
>> >>>>>>>>
>> >>>>>>>> And one more question: I'm planning to hard-commit once a day. Should I
>> >>>>>>>> prepare for significant RAM usage growth between hard-commits? (Consider a
>> >>>>>>>> lot of new documents in this period...)
>> >>>>>>>> Does this RAM come from the same pool as the caches? Can an OutOfMemory
>> >>>>>>>> exception happen in this scenario?
>> >>>>>>>>
>> >>>>>>>> Thanks a lot.
>> >
>> > --
>> > Walter Underwood
>> > wun...@wunderwood.org
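
For anyone who wants to redo the filterCache arithmetic from the thread, here is a rough sketch in Python. It only encodes the two rules of thumb Erick gave: a bitset costs about maxDoc / 8 bytes per cached filter, and a filter with few matches costs about (matching docs) * sizeof(int), taken here as 4 bytes. Taking the smaller of the two per filter is a simplifying assumption, not a statement of how Solr actually picks a representation. The function name and the figure of 512 cached filters are made up for illustration, and per-entry object overhead is ignored.

```python
def filter_cache_bytes(max_doc, matches_per_filter, num_filters):
    """Rough filterCache size estimate in bytes (illustrative only)."""
    # Bitset form: one bit per internal Lucene doc ID, rounded up to whole bytes.
    bitset_bytes = -(-max_doc // 8)
    # Sparse form: one 4-byte int per matching document.
    int_list_bytes = matches_per_filter * 4
    # Assumption: each cached filter costs the smaller of the two forms.
    per_filter = min(bitset_bytes, int_list_bytes)
    return per_filter * num_filters


if __name__ == "__main__":
    # Isaac's scenario: 10M docs, each filter matching 1M docs.
    # 512 cached filters is a hypothetical cache size.
    total = filter_cache_bytes(10_000_000, 1_000_000, 512)
    print(f"{total / 1e6:.0f} MB")
```

With these inputs each filter costs about 1.25 MB as a bitset, so the total is driven by maxDoc and the number of cached filters, not by the size of the uniqueKey field.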