Highlighting Performance improvement suggestions required - Solr 6.5.1
Hi All, I found quite a few discussions on the highlighting performance issue. Though I tried to implement most of the suggestions, performance did not improve. The index is currently very small, about 922 records, but the field on which highlighting is done holds a large amount of data. Queries with highlighting are taking a long time, with 85-90% of the time spent on highlighting.
The configuration in my schema.xml is as below:
fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
The query used in Solr is:
hl=true&hl.fl=customContent&hl.fragsize=500&hl.simple.pre=&hl.simple.post=&hl.snippets=1&hl.method=unified&hl.bs.type=SENTENCE&hl.fragListBuilder=simple&hl.maxAnalyzedChars=214748364&facet=true&facet.mincount=1&facet.limit=-1&facet.sort=count&debug=timing&facet.field=contentSpecific
We had also tried the FastVectorHighlighter, but the result was not positive. Once, when we tried hl.offsetSource="term_vectors" with the unified highlighter, the result came back in half a second, but it did not contain any highlight snippets.
One of the debug outputs returned by Solr is shared below for reference:
time=8833.0,prepare={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},terms={time=0.0},debug={time=0.0}},process={time=8826.0,query={time=867.0},facet={time=2.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=7953.0},stats={time=0.0},expand={time=0.0},terms={time=0.0},debug={time=0.0}},loadFieldValues={time=28.0}}
Any suggestions to improve the performance would be of great help.
Thanks, Arun -- View this message in context: http://lucene.472066.n3.nabble.com/Highlighting-Performance-improvement-suggestions-required-Solr-6-5-1-tp4349767.html Sent from the Solr - User mailing list archive at Nabble.com.
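To check where the request time goes, the debug=timing string above can be split into per-component numbers. This is a minimal Python sketch that assumes each phase is printed as `phase={time=T,component={time=X},...}` exactly as in the output above; `phase_times` is a hypothetical helper, not part of Solr or any client library.

```python
import re

# Debug timing copied from the output above.
DEBUG_TIMING = (
    "time=8833.0,"
    "prepare={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},"
    "mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},"
    "terms={time=0.0},debug={time=0.0}},"
    "process={time=8826.0,query={time=867.0},facet={time=2.0},facet_module={time=0.0},"
    "mlt={time=0.0},highlight={time=7953.0},stats={time=0.0},expand={time=0.0},"
    "terms={time=0.0},debug={time=0.0}},"
    "loadFieldValues={time=28.0}}"
)


def phase_times(timing: str, phase: str) -> dict:
    """Return {"total": ms, component: ms, ...} for one phase ("prepare" or "process").

    Hypothetical helper: assumes the phase block has no deeper nesting than
    phase={time=T,name={time=X},...,name={time=Y}}.
    """
    # Non-greedy match up to the first "}}", which closes the phase block.
    match = re.search(re.escape(phase) + r"=\{time=([\d.]+),(.*?\})\}", timing)
    if match is None:
        raise ValueError(f"phase {phase!r} not found in timing output")
    times = {"total": float(match.group(1))}
    for name, value in re.findall(r"(\w+)=\{time=([\d.]+)\}", match.group(2)):
        times[name] = float(value)
    return times


process = phase_times(DEBUG_TIMING, "process")
share = process["highlight"] / process["total"]
print(f"highlight: {process['highlight']:.0f} ms of {process['total']:.0f} ms ({share:.0%})")
```

For this output the sketch prints `highlight: 7953 ms of 8826 ms (90%)`, which agrees with the 85-90% estimate in the message.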
Re: Highlighting Performance improvement suggestions required - Solr 6.5.1
Hi Amrit, Thanks for the response. I did go through both, and that is how I ended up using the unified highlighter. Thanks, Arun
Solr performance issue on querying --> Solr 6.5.1
Hi All, I have been using Solr for some time now, but mostly in standalone mode. My current project uses Solr 6.5.1 hosted on Hadoop. In the production environment, query performance seems to be really slow. Can anyone give me a few pointers on how to improve it?
My solrconfig.xml has the following configuration:
${solr.hdfs.home:}
${solr.hdfs.blockcache.enabled:true}
${solr.hdfs.blockcache.slab.count:1}
${solr.hdfs.blockcache.direct.memory.allocation:false}
${solr.hdfs.blockcache.blocksperbank:16384}
${solr.hdfs.blockcache.read.enabled:true}
${solr.hdfs.blockcache.write.enabled:false}
${solr.hdfs.nrtcachingdirectory.enable:true}
${solr.hdfs.nrtcachingdirectory.maxmergesizemb:16}
${solr.hdfs.nrtcachingdirectory.maxcachedmb:192}
hdfs
There are 6 collections, with the following sizes:
Collection 1 --> 6.41 MB
Collection 2 --> 634.51 KB
Collection 3 --> 4.59 MB
Collection 4 --> 1,020.56 MB
Collection 5 --> 607.26 MB
Collection 6 --> 102.4 KB
Each collection has 5 shards. The heap allocated to the young generation is about 8 GB and to the old generation about 24 GB, and GC analysis showed that peak heap utilization is far below these values. Even so, queries against Collection 4 and Collection 5 respond very slowly, even though we are not running any complex queries. The output of a query run with debug=timing is given below for reference. Can anyone suggest a way to improve the performance?
Response to query:
responseHeader: zkConnected=true, status=0, QTime=3962
params: ("hybrid electric powerplant" "hybrid electric powerplants" "Electric" "Electrical" "Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid Electric Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid" "hybrid electric" "electric powerplant") edismax true on host title url customContent contentSpecificSearch id contentTagsCount 0 OR OR 3985d7e2-3e54-48d8-8336-229e85f5d9de 600 ("hybrid electric powerplant"^100.0 "hybrid electric powerplants"^100.0 "Electric"^50.0 "Electrical"^50.0 "Electricity"^50.0 "Engine"^50.0 "fuel economy"^50.0 "fuel efficiency"^50.0 "Hybrid Electric Propulsion"^50.0 "Power Systems"^50.0 "Powerplant"^50.0 "Propulsion"^50.0 "hybrid"^15.0 "hybrid electric"^15.0 "electric powerplant"^15.0)
timing: time=15374.0,prepare={time=2.0,query={time=2.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},terms={time=0.0},debug={time=0.0}},process={time=15363.0,query={time=1313.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},terms={time=0.0},debug={time=14048.0}}
Thanks, Arun -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Solr performance issue on querying --> Solr 6.5.1
Hi Erick, Thank you for the quick response. Query time was relatively faster once the data was being read from memory, but I have always felt the response time could be far better. As suggested, we will try setting up a non-HDFS environment and will update you on the results. Thanks, Arun
Re: Solr performance issue on querying --> Solr 6.5.1
Hi Erick, QTime comes down when rows is set to 1. It was also noted that QTime comes down, to about 900 ms, when the debug parameter is not added to the query. Thanks, Arun
Re: Solr performance issue on querying --> Solr 6.5.1
Hi Emir, Please find below the response without the bq parameter and with debugQuery set to true. It was also noted that QTime comes down drastically, to about 700-800 ms, without the debug parameter.
responseHeader: zkConnected=true, status=0, QTime=3446
params: ("hybrid electric powerplant" "hybrid electric powerplants" "Electric" "Electrical" "Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid Electric Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid" "hybrid electric" "electric powerplant") edismax on host title url customContent contentSpecificSearch id contentOntologyTagsCount 0 OR 3985d7e2-3e54-48d8-8336-229e85f5d9de 600 true
... solr-prd-cluster-m-GooglePatent_shard4_replica2-1506504238282-20 35 159 GET_TOP_IDS 41294
... 29 165 GET_TOP_IDS 40980
... 31 200 GET_TOP_IDS 41006
... 43 208 GET_TOP_IDS 41040
... 181 466 GET_TOP_IDS 41138
... 1518 1523 GET_FIELDS,GET_DEBUG 110
... 1562 1573 GET_FIELDS,GET_DEBUG 115
... 1793 1800 GET_FIELDS,GET_DEBUG 120
... 2153 2161 GET_FIELDS,GET_DEBUG 125
... 2957 2970 GET_FIELDS,GET_DEBUG 130
...
timing: time=10302.0,prepare={time=2.0,query={time=2.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},terms={time=0.0},debug={time=0.0}},process={time=10288.0,query={time=661.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},terms={time=0.0},debug={time=9627.0}}
("hybrid electric powerplant" "hybrid electric powerplants" "Electric" "Electrical" "Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid Electric Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid" "hybrid electric" "electric powerplant") ("hybrid electric powerplant" "hybrid electric powerplants" "Electric" "Electrical" "Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid Electric Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid" "hybrid electric" "electric powerplant") (+(DisjunctionMaxQuery((host:hybrid electric powerplant | contentSpecificSearch:"hybrid electric powerplant" | customContent:"hybrid electric powerplant" | title:hybrid electric powerplant | url:hybrid electric powerplant)) DisjunctionMaxQuery((host:hybrid electric powerplants | contentSpecificSearch:"hybrid electric powerplants" | customContent:"hybrid electric powerplants" | title:hybrid electric powerplants | url:hybrid electric powerplants)) DisjunctionMaxQuery((host:Electric | contentSpecificSearch:electric | customContent:electric | title:Electric | url:Electric)) DisjunctionMaxQuery((host:Electrical | contentSpecificSearch:electrical | customContent:electrical | title:Electrical | url:Electrical)) DisjunctionMaxQuery((host:Electricity | contentSpecificSearch:electricity | customContent:electricity | title:Electricity | url:Electricity)) DisjunctionMaxQuery((host:Engine | contentSpecificSearch:engine | customContent:engine | title:Engine | url:Engine)) DisjunctionMaxQuery((host:fuel economy | contentSpecificSearch:"fuel economy" | customContent:"fuel economy" | title:fuel economy | url:fuel economy)) DisjunctionMaxQuery((host:fuel efficiency | contentSpecificSearch:"fuel efficiency" | customContent:"fuel efficiency" | title:fuel efficiency | url:fuel efficiency)) DisjunctionMaxQuery((host:Hybrid Electric Propulsion | contentSpecificSearch:"hybrid 
electric propulsion" | customContent:"hybrid electric propulsion" | title:Hybrid Electric Propulsion | url:Hybrid Electric Propulsion)) DisjunctionMaxQuery((host:Power Systems | contentSpecificSearch:"power systems" | customContent:"power systems" | title:Power Systems | url:Power Systems)) DisjunctionMaxQuery((host:Powerplant | contentSpecificSearch:powerplant | customContent:powerplant | title:Powerplant | url:Powerplant)) DisjunctionMaxQuery((host:Propulsion | contentSpecificSearch:propulsion | customContent:propulsion | title:Propulsion | url:Propulsion)) DisjunctionMaxQuery((host:hybrid | contentSpecificSearch:hybrid | customContent:hybrid | title:hybrid | url:hybrid)) DisjunctionMaxQuery((host:hybrid electric | contentSpecificSearch:"hybrid electric" | customContent:"hybrid electric" | title:hybrid electric | url:hybrid electric)) DisjunctionMaxQuery((host:electric powerplant | contentSpecificSearch:"electric powerplant" | customContent:"electric powerplant" | title:electric powerplant | url:electric powerplant/no_coord +((host:hybrid electric powerplant | contentSpecificSearch:"hybrid electric powerplant" | customContent:"hybrid electric powerplant" | title:hybrid electric powerplant | url:hybrid electric powerplant) (host:hybrid electric powerplants | contentSpecificSearch:"hybrid electric powerplants" | customContent:"hybrid electric powerplants" | title:hybrid electric powerplants | url:hybrid electric powerplants) (host:Electric | contentSpecificSearch:electric | customContent:electric | title:Electric | url:Electric) (host:Electrical | contentSpecificSearch:electrical | customContent:electrical | title:Electrical | url:Electrical) (host:Electricity | contentSpecificSearch:electricity | customContent:electricity | title:Electricity | url:Electricity) (host:Engine | contentSpecificSearch:engine | customContent:engine | title:Engine | url:Engine) (host:fuel econ
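The timing numbers in the two responses above can also be mapped back to components programmatically, to see how much of each request goes to the debug component. This Python sketch assumes the values follow the component order of the labelled debug=timing output in the highlighting thread (query, facet, facet_module, mlt, highlight, stats, expand, terms, debug), laid out as total, prepare total, prepare components, process total, process components; `COMPONENTS` and `label_timing` are hypothetical names.

```python
# Component order assumed from the labelled debug=timing output earlier in the archive.
COMPONENTS = ["query", "facet", "facet_module", "mlt", "highlight",
              "stats", "expand", "terms", "debug"]


def label_timing(flat: str) -> dict:
    """Map a space-separated timing list back to {"total", "prepare", "process"}.

    Assumed layout: total, prepare total, one value per component,
    process total, one value per component.
    """
    values = [float(v) for v in flat.split()]
    n = len(COMPONENTS)
    if len(values) != 2 * n + 3:
        raise ValueError(f"expected {2 * n + 3} values, got {len(values)}")
    prepare = {"total": values[1], **dict(zip(COMPONENTS, values[2:2 + n]))}
    process = {"total": values[2 + n], **dict(zip(COMPONENTS, values[3 + n:]))}
    return {"total": values[0], "prepare": prepare, "process": process}


responses = {
    "QTime 3962": "15374.0 2.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 "
                  "15363.0 1313.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 14048.0",
    "QTime 3446": "10302.0 2.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 "
                  "10288.0 661.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 9627.0",
}

for name, flat in responses.items():
    process = label_timing(flat)["process"]
    share = process["debug"] / process["total"]
    print(f"{name}: debug {process['debug']:.0f} ms of {process['total']:.0f} ms ({share:.0%})")
```

Under that assumption, the debug component accounts for roughly 91% and 94% of the process phase, which lines up with the observation in the thread that QTime drops sharply when the debug parameter is removed.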
Re: Solr performance issue on querying --> Solr 6.5.1
Hi Erick, As suggested, I tried a non-HDFS SolrCloud instance, and its response looks much better. On the configuration side, I am mostly using the default settings, with solr.hdfs.blockcache.direct.memory.allocation set to false. Analysis of the HDFS cache shows that evictions are on the high side. Thanks, Arun
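The eviction observation can be put in perspective with a rough capacity estimate. This Python sketch assumes the 8 KB block size used by Solr's HDFS block cache, so that cache size = slab.count × blocksperbank × 8 KB, and plugs in the values from the solrconfig.xml posted earlier in this thread (slab.count 1, blocksperbank 16384) together with the reported sizes of Collections 4 and 5. It ignores the other collections and how the cache is shared between cores, so it is only an order-of-magnitude check; `block_cache_bytes` is a hypothetical helper.

```python
# Assumed 8 KB block size for Solr's HDFS block cache.
BLOCK_SIZE_BYTES = 8 * 1024
MB = 1024 * 1024


def block_cache_bytes(slab_count: int, blocks_per_bank: int,
                      block_size: int = BLOCK_SIZE_BYTES) -> int:
    """Capacity of the HDFS block cache: slabs x blocks per slab x block size."""
    return slab_count * blocks_per_bank * block_size


# Values from the solrconfig.xml in this thread: slab.count=1, blocksperbank=16384.
cache_mb = block_cache_bytes(slab_count=1, blocks_per_bank=16384) / MB

# Index sizes reported for the two slow collections (MB).
index_mb = {"Collection 4": 1020.56, "Collection 5": 607.26}

for name, size in index_mb.items():
    print(f"{name}: {size:.2f} MB index vs {cache_mb:.0f} MB cache "
          f"({size / cache_mb:.1f}x the cache)")
```

Under these assumptions the cache holds 128 MB, and each slow collection is several times larger than that, which would be consistent with the high eviction rate reported above.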
Clarification on Suggester Component of Solr 6.5.1
Hi All, Recently I was able to configure the Solr Suggester for recommendations on my site with the following settings:
mySuggester AnalyzingLookupFactory DocumentDictionaryFactory query_suggest text_suggester false false true 10 mySuggester suggest
With the above configuration I am able to get suggestions from Solr. The only point of confusion is that when I repeatedly search for the same word, the results come back in a different order. Is this expected behavior for the suggester component? An example of this pattern is given below for reference:
localhost:8983/solr/techproduct/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&wt=json&suggest.q=lat&suggest.cfq=memory
Sample Result 1
{"responseHeader":{"zkConnected":true,"status":0,"QTime":16},"command":"build","suggest":{"mySuggester":{"lat":{"numFound":3,"suggestions":[{"term":"latest development in electrification","weight":0,"payload":""},{"term":"latest development in the area of digital pdp","weight":0,"payload":""},{"term":"latest technology for materials","weight":0,"payload":""}]
Sample Result 2
{"responseHeader":{"zkConnected":true,"status":0,"QTime":14},"command":"build","suggest":{"mySuggester":{"lat":{"numFound":3,"suggestions":[{"term":"latest development in the area of digital pdp","weight":0,"payload":""},{"term":"latest technology for materials","weight":0,"payload":""},{"term":"latest development in electrification","weight":0,"payload":""}]
The first suggestion is different in the two cases. Please advise.
Thanks, Arun
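Every suggestion in both samples has a weight of 0, so the response itself gives no basis for ranking one tied entry above another. If a stable display order is needed, one client-side option (a sketch, not a Solr feature) is to sort by weight and then alphabetically. The JSON below is Sample Result 2 with the closing brackets added so it parses; `stable_suggestions` is a hypothetical helper.

```python
import json

# Sample Result 2 from the message above, with closing brackets added so it is valid JSON.
SAMPLE_RESULT_2 = """
{"responseHeader":{"zkConnected":true,"status":0,"QTime":14},"command":"build",
 "suggest":{"mySuggester":{"lat":{"numFound":3,"suggestions":[
   {"term":"latest development in the area of digital pdp","weight":0,"payload":""},
   {"term":"latest technology for materials","weight":0,"payload":""},
   {"term":"latest development in electrification","weight":0,"payload":""}]}}}}
"""


def stable_suggestions(response: dict, dictionary: str, query: str) -> list:
    """Return suggestion terms ordered by weight (highest first), ties broken alphabetically."""
    suggestions = response["suggest"][dictionary][query]["suggestions"]
    ordered = sorted(suggestions, key=lambda s: (-s["weight"], s["term"]))
    return [s["term"] for s in ordered]


print(stable_suggestions(json.loads(SAMPLE_RESULT_2), "mySuggester", "lat"))
```

Because the ordering depends only on weight and term text, both sample results would be displayed in the same order: "latest development in electrification", "latest development in the area of digital pdp", "latest technology for materials".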
Highlighting keywords which are not in close proximity within a field
Hi All, Currently, when I search for the phrase "Artificial Intelligence in space", the keywords "Artificial Intelligence" get highlighted, because that term occurs many times in the document, mostly near the start. The word "space", on the other hand, appears only near the bottom of the document, so it is not shown in the highlighting snippet. Is there a way to highlight keywords that are not in close proximity to each other? Thanks, Arun
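One possible direction, not confirmed in this thread: if only one snippet is requested (hl.snippets defaults to 1), only the top-ranked passage comes back, which is likely one dense with "Artificial Intelligence". Requesting several snippets gives a later passage containing "space" a chance to be returned, and hl.maxAnalyzedChars (Solr's default is 51200 characters) must be large enough to reach the end of the field. This Python sketch only builds the request URL; the host, the collection name `mycollection`, the field `customContent` and the chosen values are assumptions.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and collection; adjust to the real deployment.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def highlight_url(query: str, field: str, snippets: int, max_chars: int) -> str:
    """Build a select URL asking the unified highlighter for several snippets."""
    params = {
        "q": query,
        "hl": "true",
        "hl.method": "unified",
        "hl.fl": field,
        # More than one snippet, so a passage with a rarer term can also be returned.
        "hl.snippets": snippets,
        # Analyze far enough into the field to reach text near the end of the document.
        "hl.maxAnalyzedChars": max_chars,
    }
    return SOLR_SELECT + "?" + urlencode(params)


url = highlight_url("Artificial Intelligence in space", "customContent",
                    snippets=3, max_chars=1_000_000)
print(url)
```

Whether this surfaces "space" still depends on how the highlighter scores passages in that particular document, so a few values of hl.snippets are worth trying.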
Re: Highlighting Solr 8
Hi Eric, The unified highlighter does not have an option to provide an alternate field when highlighting; that option is available with the Original and FastVector highlighters. As indicated in the Solr documentation, the unified highlighter is the recommended method and meets most use cases. Please share more details if you are facing a specific issue with highlighting. Thanks, Arun
Re: Query related APACHE SOLR 8.2.0
Hi Rohit, The Solr distribution comes with a Jetty server by default and does not require a Tomcat instance to run. Although earlier versions of Solr were distributed as a WAR file, Solr 5.0 and later versions no longer support deployment in user-provided servlet containers. Details are available at the link below for reference: https://cwiki.apache.org/confluence/display/solr/WhyNoWar Details of the system requirements are available here: https://lucene.apache.org/solr/guide/8_2/solr-system-requirements.html#supported-operating-systems Thanks, Arun