Hi Erick,

I changed all my url fields to text (they were string fields before) and
added a WordDelimiterFilterFactory, so that url fields are tokenized
into several words. But I still get a response time of around 15 seconds,
measured using debugQuery=on, and most of the time is still spent in
DebugComponent. The queries I use do not have any prepended asterisk.
(Excuse me if the context description is still not complete enough.)
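
For reading the numbers, here is a minimal sketch of how the per-component
times can be pulled out of the "timing" section, assuming that section of the
debugQuery=on response has been saved as XML text. The helper names
(component_times, slowest) and the trimmed SAMPLE are only illustrative and
are not part of Solr.

```python
import xml.etree.ElementTree as ET

# Trimmed sample in the shape of the "timing" section of a debugQuery=on
# response (assumption: only a few components are kept for brevity).
SAMPLE = """<lst name="timing">
  <double name="time">17156.0</double>
  <lst name="prepare">
    <double name="time">0.0</double>
    <lst name="org.apache.solr.handler.component.QueryComponent">
      <double name="time">0.0</double>
    </lst>
  </lst>
  <lst name="process">
    <double name="time">17156.0</double>
    <lst name="org.apache.solr.handler.component.QueryComponent">
      <double name="time">0.0</double>
    </lst>
    <lst name="org.apache.solr.handler.component.DebugComponent">
      <double name="time">17156.0</double>
    </lst>
  </lst>
</lst>"""


def component_times(timing_xml):
    """Return {(phase, component): milliseconds} for a timing element."""
    root = ET.fromstring(timing_xml)
    times = {}
    for phase in root.findall("lst"):        # the "prepare" and "process" phases
        for comp in phase.findall("lst"):    # one <lst> per search component
            short_name = comp.get("name").rsplit(".", 1)[-1]
            millis = float(comp.find("double").text)
            times[(phase.get("name"), short_name)] = millis
    return times


def slowest(timing_xml):
    """Return the ((phase, component), milliseconds) entry with the largest time."""
    times = component_times(timing_xml)
    return max(times.items(), key=lambda item: item[1])
```

Calling slowest(SAMPLE) picks out the "process" phase of DebugComponent, which
matches what the timing output in this thread reports.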

Is there anything else I can do to improve the query performance?

Spark

2012/1/10 yu shen <shenyu...@gmail.com>

> Hi Erick,
>
> I only added debugQuery=on to the url, and did not do any configuration
> with regard to DebugComponent. It seems the 'string' type should be
> replaced with the 'text' type.
>
> I will paste the results here after I have done some experiments.
>
> Spark
>
>
> 2012/1/9 Erick Erickson <erickerick...@gmail.com>
>
>> Do you by chance have debugQuery on by default?
>> Because if you look down in the "timing" section,
>> you can see the times the various components took to do
>> their work. There are two sections, "prepare" and "process".
>>
>> The cumulative time is 17.156 seconds, all of which
>> is reported to be in the DebugComponent.
>>
>> So what happens if you just turn that component off? Because
>> I don't see anything in your output that really looks like it is
>> taking any time. Of course if you've changed your code from
>> *url* to url*, that will account for time too, since the infix case
>> requires that every term in the fields in question be examined.
>>
>> About WordDelimiterFilterFactory: that is irrelevant for a "string"
>> type. It's an open question whether a string type is what you
>> want, but that is determined by your problem space. You might
>> spend some time with admin/analysis to see the effects of
>> various analysis chains. "string" is used when you want no
>> tokenization, no case transformations etc.
>>
>> Best
>> Erick
>>
>> On Mon, Jan 9, 2012 at 10:04 AM, yu shen <shenyu...@gmail.com> wrote:
>> > Hi Erick,
>> >
>> > Thanks for your reply. Actually I did the following search:
>> > survey_url:http\://www.someurl.com/sch/i.html*
>> > referal_url:http\://www.someurl.com/sch/i.html*
>> > page_url:http\://www.someurl.com/sch/i.html*
>> >
>> > I did not prepend any asterisk to the field values, but only appended
>> > one to them.
>> >
>> > I analyzed the url field on the Solr admin page, and it gave me the
>> > output below, meaning the url is not tokenized. I notice you mentioned
>> > a WordDelimiterFilterFactory. Do I need to configure it in schema.xml
>> > or some place else?
>> >
>> > term position     1
>> > term text         http://www.someurl.com/sch/i.html*
>> > term type         word
>> > source start,end  0,31
>> >
>> > I added debugQuery=on to the query url and got this (sorry to paste
>> > such long, cryptic output here; it is really mysterious to me):
>> > <lst name="debug">
>> >    <str name="rawquerystring">survey_url:http\://www.someurl.com/sch/i.html*
>> > referal_url:http\://www.someurl.com/sch/i.html*
>> > page_url:http\://www.someurl.com/sch/i.html*</str>
>> >    <str name="querystring">survey_url:http\://www.someurl.com/sch/i.html*
>> > referal_url:http\://www.someurl.com/sch/i.html*
>> > page_url:http\://www.someurl.com/sch/i.html*</str>
>> >    <str name="parsedquery">survey_url:http://www.someurl.com/sch/i.html*
>> > referal_url:http://www.someurl.com/sch/i.html*
>> > page_url:http://www.someurl.com/sch/i.html*</str>
>> >    <str name="parsedquery_toString">survey_url:http://www.someurl.com/sch/i.html*
>> > referal_url:http://www.someurl.com/sch/i.html*
>> > page_url:http://www.someurl.com/sch/i.html*</str>
>> >    <lst name="explain">
>> >        <str name="5007688343">
>> > 0.76980036 = (MATCH) product of:
>> >  1.1547005 = (MATCH) sum of:
>> >    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
>> > http://www.someurl.com/sch/i.html*), product of:
>> >      1.0 = boost
>> >      0.57735026 = queryNorm
>> >    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
>> > http://www.someurl.com/sch/i.html*), product of:
>> >      1.0 = boost
>> >      0.57735026 = queryNorm
>> >  0.6666667 = coord(2/3)
>> >        </str>
>> >        <str name="5007648909">
>> > 0.76980036 = (MATCH) product of:
>> >  1.1547005 = (MATCH) sum of:
>> >    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
>> > http://www.someurl.com/sch/i.html*), product of:
>> >      1.0 = boost
>> >      0.57735026 = queryNorm
>> >    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
>> > http://www.someurl.com/sch/i.html*), product of:
>> >      1.0 = boost
>> >      0.57735026 = queryNorm
>> >  0.6666667 = coord(2/3)
>> >        </str>
>> >        <str name="5007653989">
>> > 0.76980036 = (MATCH) product of:
>> >  1.1547005 = (MATCH) sum of:
>> >    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
>> > http://www.someurl.com/sch/i.html*), product of:
>> >      1.0 = boost
>> >      0.57735026 = queryNorm
>> >    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
>> > http://www.someurl.com/sch/i.html*), product of:
>> >      1.0 = boost
>> >      0.57735026 = queryNorm
>> >  0.6666667 = coord(2/3)
>> >        </str>
>> >        <str name="5007709065">
>> > 0.76980036 = (MATCH) product of:
>> >  1.1547005 = (MATCH) sum of:
>> >    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
>> > http://www.someurl.com/sch/i.html*), product of:
>> >      1.0 = boost
>> >      0.57735026 = queryNorm
>> >    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
>> > http://www.someurl.com/sch/i.html*), product of:
>> >      1.0 = boost
>> >      0.57735026 = queryNorm
>> >  0.6666667 = coord(2/3)
>> >        </str>
>> >        <str name="5007710379">
>> > 0.76980036 = (MATCH) product of:
>> >  1.1547005 = (MATCH) sum of:
>> >    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
>> > http://www.someurl.com/sch/i.html*), product of:
>> >      1.0 = boost
>> >      0.57735026 = queryNorm
>> >    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
>> > http://www.someurl.com/sch/i.html*), product of:
>> >      1.0 = boost
>> >      0.57735026 = queryNorm
>> >  0.6666667 = coord(2/3)
>> > </str><str name="5007739634">
>> > 0.76980036 = (MATCH) product of:
>> >  1.1547005 = (MATCH) sum of:
>> >    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
>> > http://www.someurl.com/sch/i.html*), product of:
>> >      1.0 = boost
>> >      0.57735026 = queryNorm
>> >    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
>> > http://www.someurl.com/sch/i.html*), product of:
>> >      1.0 = boost
>> >      0.57735026 = queryNorm
>> >  0.6666667 = coord(2/3)
>> > </str><str name="5007753066">
>> > 0.76980036 = (MATCH) product of:
>> >  1.1547005 = (MATCH) sum of:
>> >    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
>> > http://www.someurl.com/sch/i.html*), product of:
>> >      1.0 = boost
>> >      0.57735026 = queryNorm
>> >    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
>> > http://www.someurl.com/sch/i.html*), product of:
>> >      1.0 = boost
>> >      0.57735026 = queryNorm
>> >  0.6666667 = coord(2/3)
>> > </str><str name="5007756045">
>> > 0.76980036 = (MATCH) product of:
>> >  1.1547005 = (MATCH) sum of:
>> >    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
>> > http://www.someurl.com/sch/i.html*), product of:
>> >      1.0 = boost
>> >      0.57735026 = queryNorm
>> >    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
>> > http://www.someurl.com/sch/i.html*), product of:
>> >      1.0 = boost
>> >      0.57735026 = queryNorm
>> >  0.6666667 = coord(2/3)
>> > </str><str name="5007832978">
>> > 0.76980036 = (MATCH) product of:
>> >  1.1547005 = (MATCH) sum of:
>> >    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
>> > http://www.someurl.com/sch/i.html*), product of:
>> >      1.0 = boost
>> >      0.57735026 = queryNorm
>> >    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
>> > http://www.someurl.com/sch/i.html*), product of:
>> >      1.0 = boost
>> >      0.57735026 = queryNorm
>> >  0.6666667 = coord(2/3)
>> > </str><str name="5007849124">
>> > 0.76980036 = (MATCH) product of:
>> >  1.1547005 = (MATCH) sum of:
>> >    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
>> > http://www.someurl.com/sch/i.html*), product of:
>> >      1.0 = boost
>> >      0.57735026 = queryNorm
>> >    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
>> > http://www.someurl.com/sch/i.html*), product of:
>> >      1.0 = boost
>> >      0.57735026 = queryNorm
>> >  0.6666667 = coord(2/3)
>> > </str></lst><str name="QParser">LuceneQParser</str><lst
>> > name="timing"><double name="time">17156.0</double><lst
>> > name="prepare"><double name="time">0.0</double><lst
>> > name="org.apache.solr.handler.component.QueryComponent"><double
>> > name="time">0.0</double></lst><lst
>> > name="org.apache.solr.handler.component.FacetComponent"><double
>> > name="time">0.0</double></lst><lst
>> > name="org.apache.solr.handler.component.MoreLikeThisComponent"><double
>> > name="time">0.0</double></lst><lst
>> > name="org.apache.solr.handler.component.HighlightComponent"><double
>> > name="time">0.0</double></lst><lst
>> > name="org.apache.solr.handler.component.StatsComponent"><double
>> > name="time">0.0</double></lst><lst
>> > name="org.apache.solr.handler.component.DebugComponent"><double
>> > name="time">0.0</double></lst></lst><lst name="process"><double
>> > name="time">17156.0</double><lst
>> > name="org.apache.solr.handler.component.QueryComponent"><double
>> > name="time">0.0</double></lst><lst
>> > name="org.apache.solr.handler.component.FacetComponent"><double
>> > name="time">0.0</double></lst><lst
>> > name="org.apache.solr.handler.component.MoreLikeThisComponent"><double
>> > name="time">0.0</double></lst><lst
>> > name="org.apache.solr.handler.component.HighlightComponent"><double
>> > name="time">0.0</double></lst><lst
>> > name="org.apache.solr.handler.component.StatsComponent"><double
>> > name="time">0.0</double></lst><lst
>> > name="org.apache.solr.handler.component.DebugComponent"><double
>> > name="time">17156.0</double></lst></lst></lst></lst>
>> >
>> >
>> >
>> > 2012/1/9 Erick Erickson <erickerick...@gmail.com>
>> >
>> >> Yu Shen & Arian:
>> >>
>> >> We can't help much without more information. In particular, how are
>> >> the fields in question analyzed? What is the result of looking
>> >> at the admin/analysis page? What do you get when you
>> >> attach &debugQuery=on to the query?
>> >>
>> >> You might review:
>> >> http://wiki.apache.org/solr/UsingMailingLists
>> >>
>> >> But at a wild guess, you have something like WordDelimiterFilterFactory
>> >> in your analysis chain, and it's splitting up your input into
>> >> "www" "someurl" "com" as separate tokens, and www matches
>> >> all documents so Solr is having to score all documents in your corpus,
>> >> but that's just a guess. See the admin/schema browser page and find the
>> >> most frequent terms for the field in question, that should indicate
>> >> whether you have some tokens that appear in all docs. Try searching on
>> >> plain "someurl". Is that slow? Or "someurl.anotherpart" or whatever.
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> 2012/1/9 François Schiettecatte <fschietteca...@gmail.com>:
>> >> > About the search 'referal_url:*www.someurl.com*', having a wildcard
>> >> > at the start will cause a dictionary scan for every term you search
>> >> > on unless you use ReversedWildcardFilterFactory. That could be the
>> >> > cause of your slowdown if you are I/O bound, and even if you are CPU
>> >> > bound for that matter.
>> >> >
>> >> > François
>> >> >
>> >> >
>> >> > On Jan 8, 2012, at 8:44 PM, yu shen wrote:
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> My solr document has up to 20 fields, containing data such as product
>> >> >> name, date, url etc.
>> >> >>
>> >> >> The volume of documents is around 1.5m.
>> >> >>
>> >> >> My symptom is that a url search like [ url:*www.someurl.com*
>> >> >> referal_url:*www.someurl.com* page_url:*www.someurl.com*] gets an
>> >> >> extraordinarily long response time, while searches against all other
>> >> >> fields have a normal response time.
>> >> >>
>> >> >> Can anyone share any insights on this?
>> >> >>
>> >> >> Spark
>> >> >
>> >>
>>
>
>
