Very low filter cache hit ratio
Hi All, I am running an index on SolrCloud 7.3.1 with 3 nodes. The plan is to run a full index once a day and a delta index every 30 minutes. The reason for tolerating a somewhat stale index was to make use of Solr's caches. To my surprise, when I put real traffic on this index, cache usage was very low. It varied between 0 and 10% irrespective of the size of the filter cache. I tried cache sizes of 1024, 10024, and 100024, but nothing changed and usage stayed very low. Most of the fields in the index are stored/docValues. What could be the reasons for the low cache usage? How can I leverage the caches for high-traffic indexes? Thanks Saurabh Sharma
Re: Very low filter cache hit ratio
On 5/29/2019 6:57 AM, Saurabh Sharma wrote: What can be the possible reasons for low cache usage? How can I leverage cache feature for high traffic indexes? Your usage apparently does not use the exact same query (or filter query, in the case of filterCache) very often. In order to achieve a high hit ratio on a cache, the same query will need to be used by many users. That's not happening here. I'm betting that each user is sending something unique to Solr - which means it will be impossible to get a hit, unless that user sends the same query again. Thanks, Shawn
Re: Very low filter cache hit ratio
Hi Shawn, Many filters are common among the queries. AFAIK, filter cache entries are created per filter, so by that logic one should get a good hit ratio for those cached filter conditions. I tried a cache size of 100K, and that did not produce a good hit ratio either. Is there any documentation or advice on using the various caches efficiently and on how they work internally? Thanks Saurabh
Re: Very low filter cache hit ratio
You can refer to this one: https://teaspoon-consulting.com/articles/solr-cache-tuning.html HTH, Atita
RE: Very low filter cache hit ratio
Hello, What is missing in that article is you must never use NOW without rounding it down in a filter query. If you have it, round it down to an hour, day or minute to prevent flooding the filter cache. Regards, Markus
ExactSharedStatsCache vs LRUStatsCache
Running 6.6, why should I prefer one over the other? And what kind of cache does Exact use if it isn’t LRU? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)
Re: Very low filter cache hit ratio
On 5/29/2019 7:33 AM, Saurabh Sharma wrote: Many filters are common among the queries. AFAIK, filter cache are created against filters and by that logic one should get good hit ratio for those cached filter conditions.i tried to create a cache of 100K size and that too was not producing good hit ratio. Any document/suggetion about efficient usage of various caches and their internal working. In order to produce a cache hit, the query or filter must be identical in every way. Whitespace and all. And it must be identical after parts of it are substituted or expanded by Solr. Take note of the reply you received from Markus Jelsma. The "NOW" keyword is replaced by a current timestamp with millisecond accuracy -- which effectively means that queries using NOW are always different and cannot produce a cache hit. Rounding the timestamp using NOW/HOUR or NOW/DAY, if that fits user requirements, can be one solution to that problem. Be careful with defining a large filterCache. The memory requirements can become VERY extreme. Thanks, Shawn
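To illustrate the point about NOW, here is a minimal Python sketch, not Solr code. It assumes a toy cache keyed on the filter string after NOW has been substituted, a hypothetical field named `timestamp`, and a made-up request pattern of one request every 137 ms. The `expand_now` helper is an invented stand-in for Solr's date-math handling and only does a textual substitution.

```python
from datetime import datetime, timedelta, timezone


def expand_now(fq: str, now: datetime) -> str:
    """Toy stand-in for Solr's NOW substitution (hypothetical helper)."""
    def stamp(t: datetime) -> str:
        # ISO-8601 timestamp with millisecond precision.
        return t.strftime("%Y-%m-%dT%H:%M:%S.") + f"{t.microsecond // 1000:03d}Z"

    hour = now.replace(minute=0, second=0, microsecond=0)
    # Substitute the rounded form first so the plain NOW rule cannot clobber it.
    return fq.replace("NOW/HOUR", stamp(hour)).replace("NOW", stamp(now))


def hit_ratio(fq: str, request_times: list) -> float:
    """Fraction of requests whose expanded filter was already in the toy cache."""
    cache = set()
    hits = 0
    for t in request_times:
        key = expand_now(fq, t)
        if key in cache:
            hits += 1
        else:
            cache.add(key)
    return hits / len(request_times)


# 1000 requests, 137 ms apart, all inside the same clock hour (assumed traffic).
start = datetime(2019, 5, 29, 13, 0, 0, tzinfo=timezone.utc)
times = [start + timedelta(milliseconds=137 * i) for i in range(1000)]

unrounded = "timestamp:[NOW-7DAYS TO NOW]"
rounded = "timestamp:[NOW/HOUR-7DAYS TO NOW/HOUR]"

print("unrounded:", hit_ratio(unrounded, times))
print("rounded:  ", hit_ratio(rounded, times))
```

In this toy model every unrounded request expands to a different string, so it never hits, while the rounded filter misses once and then hits until the hour rolls over. The real hit ratio depends on the actual filter queries, which is why seeing a representative sample of them matters.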
Re: Very low filter cache hit ratio
You must show us the _exact_ filter queries you’re using, or at least a representative sample. Bumping the cache up very high is almost always the wrong thing to do. Each entry takes approximately maxDoc/8 bytes so unless your corpus is very small, you’ll eventually blow memory up. To Markus’ point about NOW, a full treatment is here: https://dzone.com/articles/solr-date-math-now-and-filter Best, Erick
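As a rough worked example of the maxDoc/8 rule of thumb, the Python sketch below multiplies the per-entry cost by the three cache sizes tried earlier in the thread. The maxDoc of 10 million is an assumption chosen for illustration, since the thread never states the corpus size, and the function name is hypothetical. It treats every entry as a full bitset, so read the result as an approximate worst case for a completely full cache rather than a measurement.

```python
def filter_cache_worst_case_bytes(max_doc: int, cache_size: int) -> int:
    """Approximate memory for a full filterCache, at about maxDoc/8 bytes per entry."""
    bytes_per_entry = max_doc // 8  # one bit per document in the index
    return bytes_per_entry * cache_size


ASSUMED_MAX_DOC = 10_000_000  # assumption: not stated anywhere in the thread

for size in (1024, 10024, 100024):
    gib = filter_cache_worst_case_bytes(ASSUMED_MAX_DOC, size) / 2**30
    print(f"size={size:>6}: about {gib:.1f} GiB if every slot is filled")
```

Under these assumptions each entry is about 1.25 MB, so even the smallest of the three sizes could need more than a gigabyte of heap once it fills.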
Re: problem indexing GPS metadata for video upload
Sorry Tim! I missed your last message about this issue! Thank you very much for the information. Does the latest Tika release, 1.21, already include the change? And what about Solr? Thanks! On Fri, May 3, 2019 at 11:28 AM Where is Where wrote: > Thank you very much Tim, I wonder how to make the Tika change apply to > Solr? I saw Tika core, parse and xml jar files tika-core.jar > tika-parsers.jar tika-xml.jar in solr contrib/extraction/lib folder. Do we > just replace these files? Thanks! > > On Thu, May 2, 2019 at 12:16 PM Where is Where wrote: > >> Thank you Alex and Tim. >> I have looked at the solrconfig.xml file (I am trying the techproducts >> demo config), the only related place I can find is the extract handle >> >> > startup="lazy" >> class="solr.extraction.ExtractingRequestHandler" > >> >> true >> >> >> >> true >> links >> ignored_ >> >> >> >> I am using this command bin/post -c techproducts >> example/exampledocs/1.mp4 -params "literal.id=mp4_1&uprefix=attr_" >> >> I have tried commenting out ignored_ and >> changing to div >> but still not working. I don't quite get why image is getting gps etc >> metadata but video is acting differently while it is using the same >> solrconfig and the gps metadata are in the same fields. There is no >> differentiation in solrconfig setting between image and video. >> >> Tim yes this is related to the TIKA link. Thank you! >> >> Here is the output in solr for mp4. 
>> >> { >> "attr_meta":["stream_size", >> "5721559", >> "date", >> "2019-03-29T04:36:39Z", >> "X-Parsed-By", >> "org.apache.tika.parser.DefaultParser", >> "X-Parsed-By", >> "org.apache.tika.parser.mp4.MP4Parser", >> "stream_content_type", >> "application/octet-stream", >> "meta:creation-date", >> "2019-03-29T04:36:39Z", >> "Creation-Date", >> "2019-03-29T04:36:39Z", >> "tiff:ImageLength", >> "1080", >> "resourceName", >> "/Volumes/Data/inData/App/solr/example/exampledocs/1.mp4", >> "dcterms:created", >> "2019-03-29T04:36:39Z", >> "dcterms:modified", >> "2019-03-29T04:36:39Z", >> "Last-Modified", >> "2019-03-29T04:36:39Z", >> "Last-Save-Date", >> "2019-03-29T04:36:39Z", >> "xmpDM:audioSampleRate", >> "1000", >> "meta:save-date", >> "2019-03-29T04:36:39Z", >> "modified", >> "2019-03-29T04:36:39Z", >> "tiff:ImageWidth", >> "1920", >> "xmpDM:duration", >> "2.64", >> "Content-Type", >> "video/mp4"], >> "id":"mp4_4", >> "attr_stream_size":["5721559"], >> "attr_date":["2019-03-29T04:36:39Z"], >> "attr_x_parsed_by":["org.apache.tika.parser.DefaultParser", >> "org.apache.tika.parser.mp4.MP4Parser"], >> "attr_stream_content_type":["application/octet-stream"], >> "attr_meta_creation_date":["2019-03-29T04:36:39Z"], >> "attr_creation_date":["2019-03-29T04:36:39Z"], >> "attr_tiff_imagelength":["1080"], >> >> "resourcename":"/Volumes/Data/inData/App/solr/example/exampledocs/1.mp4", >> "attr_dcterms_created":["2019-03-29T04:36:39Z"], >> "attr_dcterms_modified":["2019-03-29T04:36:39Z"], >> "last_modified":"2019-03-29T04:36:39Z", >> "attr_last_save_date":["2019-03-29T04:36:39Z"], >> "attr_xmpdm_audiosamplerate":["1000"], >> "attr_meta_save_date":["2019-03-29T04:36:39Z"], >> "attr_modified":["2019-03-29T04:36:39Z"], >> "attr_tiff_imagewidth":["1920"], >> "attr_xmpdm_duration":["2.64"], >> "content_type":["video/mp4"], >> "content":[" \n \n \n \n \n \n \n \n \n \n \n \n \n \n >> \n \n \n \n \n \n \n \n \n "], >> "_version_":1632383499325407232}] >> }} >> >> JPEG is getting these: >> 
"attr_meta":[ >> "GPS Latitude", >> "37° 47' 41.99\"", >> >> "attr_gps_latitude":["37° 47' 41.99\""], >> >> >> On Wed, May 1, 2019 at 2:57 PM Where is Where wrote: >> >>> uploading video to solr via tika >>> https://lucene.apache.org/solr/guide/7_7/uploading-data-with-solr-cell-using-apache-tika.html >>> The index has no video GPS metadata which is extracted and indexed for >>> images such as jpeg. I have checked both MP4 and MOV files, the files I >>> checked all have GPS Exif data embedded in the same fields as image. Any >>> idea? Thanks! >>> >>