On 3/16/2017 6:02 AM, Ganesh M wrote:
> We have 1 million of documents and would like to query with multiple fq
> values.
>
> We have kept the access_control ( multi value field ) which holds
> information about for which group that document is accessible.
>
> Now to get the list of all the documents of an user, we would like to pass
> multiple fq values ( one for each group user belongs to )
>
> q:somefiled:value&fq:access_control:g1&fq:access_control:g2&fq:access_control:g3&fq:access_control:g4&fq:access_control:g5...
>
> Like this, there could be 100 groups for an user.
The correct syntax is fq=field:value -- what you have there is not going to work. Even with the syntax fixed, this might not do what you expect: filter queries are ANDed together -- *every* filter must match. If a document that you want has only one of those values in access_control, or has 98 of them but not all 100, then the query isn't going to match that document. The solution is one filter query that can match ANY of them, which may also run faster. I can't say whether this is a problem for you or not; your data might be completely correct for matching 100 filters.

Also keep in mind that there is a limit to the size of a URL that you can send to any webserver, including the container that runs Solr. That default limit is 8192 bytes, and it includes the "GET " or "POST " at the beginning and the " HTTP/1.1" at the end (note the spaces). The filter query information for 100 of the filters you mentioned is going to be over 2K, which will fit in the default, but if your query has more complexity than you have mentioned here, the total URL might not fit. There's a workaround for this -- use a POST request and put the parameters in the request body.

> If we fire query with 100 values in the fq, whats the penalty on the
> performance ? Can we get the result in less than one second for 1 million
> of documents.

With one million documents, each internal filter query result is 125000 bytes -- the number of documents divided by eight. That's 12.5 megabytes for 100 of them. In addition, every time a filter is run, it must examine every document in the index to create that 125000 byte structure, which means that filters which *aren't* found in the filterCache are relatively slow. If they are found in the cache, they're lightning fast, because the cache will contain the entire 125000 byte bitset. If you make your filterCache large enough, it's going to consume a LOT of Java heap memory, particularly if the index gets bigger.
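For reference, the filterCache is configured in solrconfig.xml. A sketch of the relevant entry (the numbers are illustrative, not recommendations -- tune them against your own heap size and commit frequency):

```xml
<!-- solrconfig.xml: the filterCache holds one bitset per cached filter.
     size caps how many entries (and thus how much heap) it can hold;
     autowarmCount controls how many entries are re-executed when a
     commit opens a new searcher. -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="32"/>
```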
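To get ANY-of-the-groups semantics, the groups can be combined into one fq instead of 100 separate ones. A minimal sketch in Python of building that filter string (the field name access_control comes from your message; the group list and join logic are illustrative):

```python
# Build a single filter query that matches documents whose
# access_control field contains ANY of the user's groups.
groups = ["g1", "g2", "g3", "g4", "g5"]

# One fq with OR semantics, instead of many separate (ANDed) fq params:
fq = "access_control:(" + " OR ".join(groups) + ")"
print(fq)  # access_control:(g1 OR g2 OR g3 OR g4 OR g5)
```

A single fq like this is also cached as ONE filterCache entry, rather than 100 separate entries.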
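The "over 2K" estimate for 100 filter parameters can be checked with quick back-of-the-envelope arithmetic (short group names g1 .. g100 are an assumption based on your message):

```python
# Rough size of 100 separate fq parameters in a GET URL, assuming
# short group names g1 .. g100.
params = ["fq=access_control:g%d" % i for i in range(1, 101)]
total = sum(len(p) + 1 for p in params)  # +1 per param for the '&'
print(total)  # 2192 -- already over 2K before q, wt, and the rest
```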
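The POST workaround can be sketched with the Python standard library (the host, core name, and query values here are placeholders, not from your setup):

```python
from urllib.parse import urlencode

# Put the query parameters in the request body instead of the URL, so
# the 8192-byte URL limit no longer applies.  Repeated ("fq", ...)
# tuples become repeated fq parameters, just as in a GET URL.
params = [("q", "somefield:value"),
          ("fq", "access_control:g1"),
          ("fq", "access_control:g2")]
body = urlencode(params)

# Sending it (sketch only -- requires a running Solr at this address;
# Request with a data argument defaults to the POST method):
# import urllib.request
# req = urllib.request.Request("http://localhost:8983/solr/mycore/select",
#                              data=body.encode("utf-8"))
# response = urllib.request.urlopen(req)
```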
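The memory arithmetic scales linearly with index size -- each cached filter is one bit per document in the index. As a quick sketch:

```python
# One bit per document per cached filter result.
num_docs = 1_000_000
bitset_bytes = num_docs // 8           # 125000 bytes per cached filter
cache_bytes = bitset_bytes * 100       # 100 such filters in the cache
print(bitset_bytes)                    # 125000
print(cache_bytes / 1_000_000)         # 12.5 (megabytes)
```

Double the index to two million documents and every cached filter doubles to 250000 bytes.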
The nice thing about the filterCache is that once the cache entries exist, the filters are REALLY fast, and if they're all cached, you would DEFINITELY be able to get results in under one second. I have no idea whether the same would happen when filters aren't cached. It might.

Filters that do not exist in the cache will be executed in parallel, so the number of CPUs that you have in the machine, along with the query rate, will have a big impact on the overall performance of a single query with a lot of filters.

Also related to the filterCache: keep in mind that every time a commit is made that opens a new searcher, the filterCache will be autowarmed. If the autowarmCount value for the filterCache is large, that can make commits take a very long time, which will cause problems if commits are happening frequently. On the other hand, a very small autowarmCount can cause slow performance after a commit if you use a lot of filters.

My reply is longer and more dense than I had anticipated. Apologies if it's information overload.

Thanks,
Shawn