Hi Erick,

I changed all my url fields to text (they were string fields before) and added a WordDelimiterFilterFactory, so that url fields are tokenized into several words. But I still get a response time of around 15 seconds, measured with debugQuery=on, and most of the time is still spent in DebugComponent. The query I used did not have any prepended asterisk. (Excuse me if the context description is still not complete enough.)

Is there any other room to improve the query performance?

Spark

2012/1/10 yu shen <shenyu...@gmail.com>

> Hi Erick,
>
> I only added debugQuery=on to the url, and did not do any configuration
> with regard to DebugComponent. It seems the 'string' type should be
> replaced with the 'text' type.
>
> I will paste the results here after I have done some experiments.
>
> Spark
>
>
> 2012/1/9 Erick Erickson <erickerick...@gmail.com>
>
>> Do you by chance have debugQuery on by default?
>> Because if you look down in the "timing" section,
>> you can see the times the various components took to do
>> their work; there are two sections, "prepare" and "process".
>>
>> The cumulative time is 17.156 seconds, of which 17.156
>> seconds is reported to be in the DebugComponent.
>>
>> So what happens if you just turn that component off? Because
>> I don't see anything in your output that really looks like it is
>> taking any time. Of course, if you've changed your code from
>> *url* to url*, that will account for time too, since the infix case
>> requires that every term in the fields in question be examined.
>>
>> About WordDelimiterFilterFactory: that is irrelevant for a "string"
>> type. It's an open question whether a string type is what you
>> want, but that is determined by your problem space. You might
>> spend some time with admin/analysis to see the effects of
>> various analysis chains. "string" is used when you want no
>> tokenization, no case transformations, etc.
>>
>> Best
>> Erick
>>
>> On Mon, Jan 9, 2012 at 10:04 AM, yu shen <shenyu...@gmail.com> wrote:
>> > Hi Erick,
>> >
>> > Thanks for your reply. Actually I did the following search:
>> > survey_url:http\://www.someurl.com/sch/i.html*
>> > referal_url:http\://www.someurl.com/sch/i.html*
>> > page_url:http\://www.someurl.com/sch/i.html*
>> >
>> > I did not prepend any asterisk to the field values, but only appended
>> > one to them.
>> >
>> > I analyzed the url field on the Solr admin page, and it gave me this,
>> > meaning the url is not tokenized. I notice you mentioned
>> > WordDelimiterFilterFactory. Do I need to configure it in schema.xml
>> > or some place else?
>> >
>> > term position     1
>> > term text         http://www.someurl.com/sch/i.html*
>> > term type         word
>> > source start,end  0,31
>> >
>> > I added debugQuery=on to the query url and got this (sorry to paste
>> > such long cryptic output here; it is really mysterious to me):
>> >
>> > <lst name="debug">
>> > <str name="rawquerystring">survey_url:http\://www.someurl.com/sch/i.html* referal_url:http\://www.someurl.com/sch/i.html* page_url:http\://www.someurl.com/sch/i.html*</str>
>> > <str name="querystring">survey_url:http\://www.someurl.com/sch/i.html* referal_url:http\://www.someurl.com/sch/i.html* page_url:http\://www.someurl.com/sch/i.html*</str>
>> > <str name="parsedquery">survey_url:http://www.someurl.com/sch/i.html* referal_url:http://www.someurl.com/sch/i.html* page_url:http://www.someurl.com/sch/i.html*</str>
>> > <str name="parsedquery_toString">survey_url:http://www.someurl.com/sch/i.html* referal_url:http://www.someurl.com/sch/i.html* page_url:http://www.someurl.com/sch/i.html*</str>
>> > <lst name="explain">
>> > <str name="5007688343">
>> > 0.76980036 = (MATCH) product of:
>> >   1.1547005 = (MATCH) sum of:
>> >     0.57735026 = (MATCH) ConstantScoreQuery(referal_url:http://www.someurl.com/sch/i.html*), product of:
>> >       1.0 = boost
>> >       0.57735026 = queryNorm
>> >     0.57735026 = (MATCH) ConstantScoreQuery(page_url:http://www.someurl.com/sch/i.html*), product of:
>> >       1.0 = boost
>> >       0.57735026 = queryNorm
>> >   0.6666667 = coord(2/3)
>> > </str>
>> > <str name="5007648909">
>> > 0.76980036 = (MATCH) product of:
>> >   1.1547005 = (MATCH) sum of:
>> >     0.57735026 = (MATCH) ConstantScoreQuery(referal_url:http://www.someurl.com/sch/i.html*), product of:
>> >       1.0 = boost
>> >       0.57735026 = queryNorm
>> >     0.57735026 = (MATCH) ConstantScoreQuery(page_url:http://www.someurl.com/sch/i.html*), product of:
>> >       1.0 = boost
>> >       0.57735026 = queryNorm
>> >   0.6666667 = coord(2/3)
>> > </str>
>> > <str name="5007653989">
>> > 0.76980036 = (MATCH) product of:
>> >   1.1547005 = (MATCH) sum of:
>> >     0.57735026 = (MATCH) ConstantScoreQuery(referal_url:http://www.someurl.com/sch/i.html*), product of:
>> >       1.0 = boost
>> >       0.57735026 = queryNorm
>> >     0.57735026 = (MATCH) ConstantScoreQuery(page_url:http://www.someurl.com/sch/i.html*), product of:
>> >       1.0 = boost
>> >       0.57735026 = queryNorm
>> >   0.6666667 = coord(2/3)
>> > </str>
>> > <str name="5007709065">
>> > 0.76980036 = (MATCH) product of:
>> >   1.1547005 = (MATCH) sum of:
>> >     0.57735026 = (MATCH) ConstantScoreQuery(referal_url:http://www.someurl.com/sch/i.html*), product of:
>> >       1.0 = boost
>> >       0.57735026 = queryNorm
>> >     0.57735026 = (MATCH) ConstantScoreQuery(page_url:http://www.someurl.com/sch/i.html*), product of:
>> >       1.0 = boost
>> >       0.57735026 = queryNorm
>> >   0.6666667 = coord(2/3)
>> > </str>
>> > <str name="5007710379">
>> > 0.76980036 = (MATCH) product of:
>> >   1.1547005 = (MATCH) sum of:
>> >     0.57735026 = (MATCH) ConstantScoreQuery(referal_url:http://www.someurl.com/sch/i.html*), product of:
>> >       1.0 = boost
>> >       0.57735026 = queryNorm
>> >     0.57735026 = (MATCH) ConstantScoreQuery(page_url:http://www.someurl.com/sch/i.html*), product of:
>> >       1.0 = boost
>> >       0.57735026 = queryNorm
>> >   0.6666667 = coord(2/3)
>> > </str>
>> > <str name="5007739634">
>> > 0.76980036 = (MATCH) product of:
>> >   1.1547005 = (MATCH) sum of:
>> >     0.57735026 = (MATCH) ConstantScoreQuery(referal_url:http://www.someurl.com/sch/i.html*), product of:
>> >       1.0 = boost
>> >       0.57735026 = queryNorm
>> >     0.57735026 = (MATCH) ConstantScoreQuery(page_url:http://www.someurl.com/sch/i.html*), product of:
>> >       1.0 = boost
>> >       0.57735026 = queryNorm
>> >   0.6666667 = coord(2/3)
>> > </str>
>> > <str name="5007753066">
>> > 0.76980036 = (MATCH) product of:
>> >   1.1547005 = (MATCH) sum of:
>> >     0.57735026 = (MATCH) ConstantScoreQuery(referal_url:http://www.someurl.com/sch/i.html*), product of:
>> >       1.0 = boost
>> >       0.57735026 = queryNorm
>> >     0.57735026 = (MATCH) ConstantScoreQuery(page_url:http://www.someurl.com/sch/i.html*), product of:
>> >       1.0 = boost
>> >       0.57735026 = queryNorm
>> >   0.6666667 = coord(2/3)
>> > </str>
>> > <str name="5007756045">
>> > 0.76980036 = (MATCH) product of:
>> >   1.1547005 = (MATCH) sum of:
>> >     0.57735026 = (MATCH) ConstantScoreQuery(referal_url:http://www.someurl.com/sch/i.html*), product of:
>> >       1.0 = boost
>> >       0.57735026 = queryNorm
>> >     0.57735026 = (MATCH) ConstantScoreQuery(page_url:http://www.someurl.com/sch/i.html*), product of:
>> >       1.0 = boost
>> >       0.57735026 = queryNorm
>> >   0.6666667 = coord(2/3)
>> > </str>
>> > <str name="5007832978">
>> > 0.76980036 = (MATCH) product of:
>> >   1.1547005 = (MATCH) sum of:
>> >     0.57735026 = (MATCH) ConstantScoreQuery(referal_url:http://www.someurl.com/sch/i.html*), product of:
>> >       1.0 = boost
>> >       0.57735026 = queryNorm
>> >     0.57735026 = (MATCH) ConstantScoreQuery(page_url:http://www.someurl.com/sch/i.html*), product of:
>> >       1.0 = boost
>> >       0.57735026 = queryNorm
>> >   0.6666667 = coord(2/3)
>> > </str>
>> > <str name="5007849124">
>> > 0.76980036 = (MATCH) product of:
>> >   1.1547005 = (MATCH) sum of:
>> >     0.57735026 = (MATCH) ConstantScoreQuery(referal_url:http://www.someurl.com/sch/i.html*), product of:
>> >       1.0 = boost
>> >       0.57735026 = queryNorm
>> >     0.57735026 = (MATCH) ConstantScoreQuery(page_url:http://www.someurl.com/sch/i.html*), product of:
>> >       1.0 = boost
>> >       0.57735026 = queryNorm
>> >   0.6666667 = coord(2/3)
>> > </str>
>> > </lst>
>> > <str name="QParser">LuceneQParser</str>
>> > <lst name="timing">
>> > <double name="time">17156.0</double>
>> > <lst name="prepare">
>> > <double name="time">0.0</double>
>> > <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
>> > <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
>> > <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
>> > <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
>> > <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
>> > <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
>> > </lst>
>> > <lst name="process">
>> > <double name="time">17156.0</double>
>> > <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
>> > <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
>> > <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
>> > <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
>> > <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
>> > <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">17156.0</double></lst>
>> > </lst>
>> > </lst>
>> > </lst>
>> >
>> >
>> >
>> > 2012/1/9 Erick Erickson <erickerick...@gmail.com>
>> >
>> >> Yu Shen & Arian:
>> >>
>> >> We can't help much without more information. In particular, how are
>> >> the fields in question analyzed? What is the result of looking
>> >> at the admin/analysis page? What do you get when you
>> >> attach &debugQuery=on to the query?
>> >>
>> >> You might review:
>> >> http://wiki.apache.org/solr/UsingMailingLists
>> >>
>> >> But at a wild guess, you have something like WordDelimiterFilterFactory
>> >> in your analysis chain, and it's splitting up your input into
>> >> "www" "someurl" "com" as separate tokens, and www matches
>> >> all documents so Solr is having to score all documents in your corpus,
>> >> but that's just a guess. See the admin/schema browser page and find the
>> >> most frequent terms for the field in question; that should indicate
>> >> whether you have some tokens that appear in all docs. Try searching on
>> >> plain "someurl". Is that slow? Or "someurl.anotherpart" or whatever.
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> 2012/1/9 François Schiettecatte <fschietteca...@gmail.com>:
>> >> > About the search 'referal_url:*www.someurl.com*', having a wildcard
>> >> > at the start will cause a dictionary scan for every term you search
>> >> > on unless you use ReversedWildcardFilterFactory. That could be the
>> >> > cause of your slowdown if you are I/O bound, and even if you are CPU
>> >> > bound for that matter.
>> >> >
>> >> > François
>> >> >
>> >> >
>> >> > On Jan 8, 2012, at 8:44 PM, yu shen wrote:
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> My Solr document has up to 20 fields, containing data such as
>> >> >> product name, date, url etc.
>> >> >>
>> >> >> The volume of documents is around 1.5m.
>> >> >>
>> >> >> My symptom is that a url search like [ url:*www.someurl.com*
>> >> >> referal_url:*www.someurl.com* page_url:*www.someurl.com* ] gets an
>> >> >> extraordinarily long response time, while for searches against all
>> >> >> other fields the response time is normal.
>> >> >>
>> >> >> Can anyone share any insights on this?
>> >> >>
>> >> >> Spark
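
The point made in this thread about wildcard position (a trailing wildcard such as url* is cheap, while a leading or infix wildcard such as *url* forces every term to be examined) can be illustrated with a toy sorted term dictionary. What follows is only a minimal Python sketch under stated assumptions, not Solr's or Lucene's actual code: the term list, the function names, and the counting of "terms examined" are all hypothetical and chosen for illustration. The last function mimics the general idea behind indexing reversed tokens, which is the approach ReversedWildcardFilterFactory relies on, by turning a leading-wildcard query into a prefix lookup on reversed terms.

```python
from bisect import bisect_left

def prefix_matches(sorted_terms, prefix):
    """Trailing wildcard (prefix*): seek into the sorted terms, then stop early.

    Returns (matching_terms, number_of_terms_examined).
    """
    i = bisect_left(sorted_terms, prefix)  # binary search for the first candidate
    matches, examined = [], 0
    while i < len(sorted_terms):
        examined += 1
        if not sorted_terms[i].startswith(prefix):
            break  # sorted order guarantees no later term can match
        matches.append(sorted_terms[i])
        i += 1
    return matches, examined

def infix_matches(sorted_terms, fragment):
    """Leading/infix wildcard (*fragment*): sort order does not help, scan all terms."""
    matches = [t for t in sorted_terms if fragment in t]
    return matches, len(sorted_terms)

def suffix_matches_reversed(sorted_reversed_terms, suffix):
    """Leading wildcard (*suffix) answered as a prefix lookup on reversed terms."""
    hits, examined = prefix_matches(sorted_reversed_terms, suffix[::-1])
    return [h[::-1] for h in hits], examined

# Hypothetical, untokenized url terms (as with a "string" field type).
terms = sorted([
    "http://www.another.org/index.html",
    "http://www.someurl.com/about.html",
    "http://www.someurl.com/sch/i.html",
    "http://www.someurl.com/sch/i.html?q=1",
    "http://www.third.net/sch/i.html",
])
reversed_terms = sorted(t[::-1] for t in terms)

print(prefix_matches(terms, "http://www.someurl.com/sch/i.html"))
print(infix_matches(terms, "www.someurl.com"))
print(suffix_matches_reversed(reversed_terms, "/sch/i.html"))
```

In this toy model the prefix lookup touches only the matching terms plus one, while the infix lookup touches every term in the dictionary; with 1.5m documents and untokenized urls the difference would be correspondingly larger. It says nothing about the time reported under DebugComponent, which has to be measured separately by running the same query without debugQuery=on.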