bq: My guess so far is that the filter has to fetch the unique key for all documents in results, which consumes a lot of resources.
Guessing here and going from memory, but...

If you have some code like
reader.get(doc).get("id")
it'll totally barf. Problem here is that to get the id field, it has to go out to disk, decompress the data and pull out the field. You want to use docValues to read the data from the _index_ rather than the stored field.

Like I said, this is a guess...

Best,
Erick

On Thu, Mar 17, 2016 at 1:27 AM, John Smith <solr-u...@remailme.net> wrote:
> Hi,
>
> The purpose of the project is an actual RT Search, not NRT, but with a
> specific condition: when an updated document meets a fixed criteria, it
> should be filtered out from future results (no reuse of the document).
> This criteria is present in the search query but of course doesn't work
> for uncommitted documents.
>
> What I wrote is a combination of the following:
> - an UpdateRequestProcessor in the update chain storing the document
> unique key in a local cache when the condition is met
> - a postCommit listener clearing the cache
> - a PostFilter collecting documents that aren't found in the cache,
> activated in the search query as a fq parameter
>
> Functionally it does the job, however for large indexes the filter takes
> a hit. The index that poses problem has 18 mil documents in 13Gb, and
> queries return an average of 25,000 docs in results. The VM has 8 cores
> and 20Gb RAM, and uses nimble storage (combination of ssd & hd). Without
> the code Solr works like a charm. My guess so far is that the filter has
> to fetch the unique key for all documents in results, which consumes a
> lot of resources.
>
> What would be your advice?
> - Could I use the internal document id instead of a field value? This id
> would have to be available both in the UpdateRequestProcessor and
> PostFilter: is it the case and how can I access it? I suppose the
> SolrInputDocument in the request processor doesn't have it yet anyway.
> - If I reduce the autoSoftCommit maxDocs value (how far?), would it be
> wise (and feasible) to convert the PostFilter into a plain filter query
> such as "*:* NOT (id:1 OR id:2)" or something similar? How could I
> implement this and how to estimate the filter cost in order for Solr to
> execute it at the right position?
> - Maybe I took the wrong path altogether?
>
> Thanks in advance
> John
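
On the plain filter query idea in the last set of questions: here is a minimal sketch of how such an fq string could be assembled from the cached unique keys. It is plain Java with no Solr dependency, and the names ExclusionFilterQuery, build and quote are made up for illustration. Whether this would actually beat the PostFilter is not something this thread establishes.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.stream.Collectors;

// Hypothetical helper: turns the cached unique keys into a filter query
// of the form shown in the original message.
public class ExclusionFilterQuery {

    // Wrap a key in double quotes, escaping backslashes and embedded quotes
    // so that keys containing query syntax characters stay literal.
    static String quote(String key) {
        return "\"" + key.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Build "*:* NOT (field:"k1" OR field:"k2" ...)".
    // With nothing to exclude, fall back to the match-all query.
    static String build(String keyField, Collection<String> excludedKeys) {
        if (excludedKeys.isEmpty()) {
            return "*:*";
        }
        String clauses = excludedKeys.stream()
                .map(k -> keyField + ":" + quote(k))
                .collect(Collectors.joining(" OR "));
        return "*:* NOT (" + clauses + ")";
    }

    public static void main(String[] args) {
        // prints: *:* NOT (id:"1" OR id:"2")
        System.out.println(build("id", Arrays.asList("1", "2")));
    }
}
```

Each excluded key becomes one boolean clause, so the list has to stay under Solr's maxBooleanClauses setting (1024 by default). That gives one practical bound on how many updates can accumulate between commits.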