Re: Phrase Query Performance Question and score threshold

2007-11-05 Thread Yonik Seeley
On 11/5/07, Haishan Chen <[EMAIL PROTECTED]> wrote: > As for the first issues. The number of different phrase queries have > performance issues I found so far are about 10. If these are normal phrase queries (no slop), a good solution might be to simply index and query these phrases as a single t
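
The reply above is cut off, but it appears to suggest treating a small, fixed set of slow exact phrases as single indexed units. A minimal sketch of that idea follows, assuming the same preprocessing is applied to document text before indexing and to the user's query before searching. The names `SLOW_PHRASES` and `collapse_phrases` and the underscore-joining convention are hypothetical illustrations, not part of Solr or Lucene; the example phrases are taken from the benchmarks quoted elsewhere in this thread.

```python
import re

# Hypothetical list of known-slow phrases (the thread mentions about 10).
SLOW_PHRASES = ["auto repair", "car repair", "business service", "shopping center"]


def collapse_phrases(text, phrases=SLOW_PHRASES):
    """Rewrite each known phrase as one underscore-joined token.

    Run this on both the indexed text and the incoming query so that
    an exact phrase lookup becomes a plain single-term lookup.
    """
    out = text.lower()
    for phrase in phrases:
        # Match the phrase words separated by any run of whitespace.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in phrase.split()) + r"\b"
        out = re.sub(pattern, phrase.replace(" ", "_"), out)
    return out


indexed_text = collapse_phrases("Best Auto Repair and car  repair shop")
query_text = collapse_phrases("auto repair")
```

With this in place the query side searches for the single token `auto_repair`, which avoids the position matching a phrase query needs. The trade-off is that it only helps exact phrases known in advance; sloppy phrase queries would still need positions.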

RE: Phrase Query Performance Question and score threshold

2007-11-05 Thread Haishan Chen
> Date: Mon, 5 Nov 2007 14:55:21 -0500> From: [EMAIL PROTECTED]> To: > solr-user@lucene.apache.org> Subject: Re: Phrase Query Performance Question > and score threshold> > On 11/5/07, Haishan Chen <[EMAIL PROTECTED]> wrote:> > > If I limit the docume

Re: Phrase Query Performance Question and score threshold

2007-11-05 Thread Yonik Seeley
On 11/5/07, Haishan Chen <[EMAIL PROTECTED]> wrote: > If I limit the documents returned based on a score threshold (filter by > score) will it be able to improve query performance? No. Taking a different approach can really speed up queries though. To figure out what approach you should take, we

RE: Phrase Query Performance Question and score threshold

2007-11-05 Thread Haishan Chen
u offer advice on the best way to implement score threshold in SOLR with minimum overhead? Appreciate if anyone can help Thank you Haishan > Date: Fri, 2 Nov 2007 12:31:29 -0700> From: [EMAIL PROTECTED]> To: > solr-user@lucene.apache.org> Subject: Re: Phrase Query Per

RE: Phrase Query Performance Question

2007-11-02 Thread Haishan Chen
> Date: Fri, 2 Nov 2007 12:31:29 -0700> From: [EMAIL PROTECTED]> To: > solr-user@lucene.apache.org> Subject: Re: Phrase Query Performance Question> > > > : It still feels to me that you are trying doing something unique with > your> : phrase queries. Unfortuna

Re: Phrase Query Performance Question

2007-11-02 Thread Chris Hostetter
: It still feels to me that you are trying doing something unique with your : phrase queries. Unfortunately, you still haven't said what you are trying to : do in general terms, which makes it very difficult for people to help you. Agreed. This seems very special case, but we don't know what th

Re: Phrase Query Performance Question

2007-11-02 Thread Mike Klaas
On 2-Nov-07, at 10:03 AM, Haishan Chen wrote: Date: Fri, 2 Nov 2007 07:32:30 -0700> Subject: Re: Phrase Query Performance Question> From: [EMAIL PROTECTED]> To: solr- [EMAIL PROTECTED]> > He means "extremely frequent" and I agree. --wunder Then it means

RE: Phrase Query Performance Question

2007-11-02 Thread Haishan Chen
> Date: Fri, 2 Nov 2007 07:32:30 -0700> Subject: Re: Phrase Query Performance > Question> From: [EMAIL PROTECTED]> To: solr-user@lucene.apache.org> > He > means "extremely frequent" and I agree. --wunder Then it means a PHRASE (combination of terms exc

Re: Phrase Query Performance Question

2007-11-02 Thread Walter Underwood
He means "extremely frequent" and I agree. --wunder On 11/2/07 1:51 AM, "Haishan Chen" <[EMAIL PROTECTED]> wrote: > Thanks for the advice. You certainly have a point. I believe you mean a query > term that appears in 5-10% of an index in a natural language corpus is > extremely INFREQUENT?

RE: Phrase Query Performance Question

2007-11-02 Thread Haishan Chen
> From: [EMAIL PROTECTED]> Subject: Re: Phrase Query Performance Question> > Date: Thu, 1 Nov 2007 11:25:26 -0700> To: solr-user@lucene.apache.org> > On > 31-Oct-07, at 11:54 PM, Haishan Chen wrote:> > >> >> Date: Wed, 31 Oct 2007 > 17:54:53 -070

Re: Phrase Query Performance Question

2007-11-01 Thread Mike Klaas
On 31-Oct-07, at 11:54 PM, Haishan Chen wrote: Date: Wed, 31 Oct 2007 17:54:53 -0700> Subject: Re: Phrase Query Performance Question> From: [EMAIL PROTECTED]> To: solr- [EMAIL PROTECTED]> > "hurricane katrina" is a very expensive query against a collection>

RE: Phrase Query Performance Question

2007-10-31 Thread Haishan Chen
> Date: Wed, 31 Oct 2007 17:54:53 -0700> Subject: Re: Phrase Query Performance > Question> From: [EMAIL PROTECTED]> To: solr-user@lucene.apache.org> > > "hurricane katrina" is a very expensive query against a collection> focused > on Hurricane Kat

RE: Phrase Query Performance Question

2007-10-31 Thread Haishan Chen
> Date: Wed, 31 Oct 2007 19:19:07 -0700> From: [EMAIL PROTECTED]> To: > solr-user@lucene.apache.org> Subject: RE: Phrase Query Performance Question> > > > : ("auto repair") 100384 hits 946 ms(auto repair) 100384 hits 31ms("car > > : repair

RE: Phrase Query Performance Question

2007-10-31 Thread Chris Hostetter
: ("auto repair") 100384 hits 946 ms; (auto repair) 100384 hits 31 ms; ("car repair"~100) 112183 hits 766 ms; (car repair) 112183 hits 63 ms; ("business service"~100) 1209751 hits 1500 ms; (business service) 1209751 hits 234 ms; ("shopping center"~100) 119481 hits 359 ms; (shopping c

Re: Phrase Query Performance Question

2007-10-31 Thread Walter Underwood
"hurricane katrina" is a very expensive query against a collection focused on Hurricane Katrina. There will be many matches in many documents. If you want to measure worst-case, this is fine. I'd try other things, like: * ninth ward * Ray Nagin * Audubon Park * Canal Street * French Quarter * FEM

RE: Phrase Query Performance Question

2007-10-31 Thread Haishan Chen
> From: [EMAIL PROTECTED]> Subject: Re: Phrase Query Performance Question> > Date: Wed, 31 Oct 2007 15:25:42 -0700> To: solr-user@lucene.apache.org> > On > 31-Oct-07, at 2:40 PM, Haishan Chen wrote:> > >> > > http://mail-archives.apache.org/mod_mbox/l

Re: Phrase Query Performance Question

2007-10-31 Thread Mike Klaas
On 31-Oct-07, at 2:40 PM, Haishan Chen wrote: http://mail-archives.apache.org/mod_mbox/lucene-java-user/200512.mbox/[EMAIL PROTECTED] It mentioned that http://websearch.archive.org/katrina/ (in nutch) had 10M documents and a search of "hurricane katrina" was able to return in 1.35 second

RE: Phrase Query Performance Question

2007-10-31 Thread Haishan Chen
> From: [EMAIL PROTECTED]> Subject: Re: Phrase Query Performance Question> > Date: Tue, 30 Oct 2007 11:22:17 -0700> To: solr-user@lucene.apache.org> > On > 30-Oct-07, at 6:09 AM, Yonik Seeley wrote:> > > On 10/30/07, Haishan Chen > <[EMAIL PROTECTED

Re: Phrase Query Performance Question

2007-10-30 Thread Mike Klaas
On 30-Oct-07, at 6:09 AM, Yonik Seeley wrote: On 10/30/07, Haishan Chen <[EMAIL PROTECTED]> wrote: Thanks a lot for replying Yonik! I am running solr on a windows 2003 server (standard version). intel Xeon CPU 3.00GHz, with 4.00 GB RAM. The index is located on Raid5 with 2 million documents.

Re: Phrase Query Performance Question

2007-10-30 Thread Yonik Seeley
On 10/30/07, Haishan Chen <[EMAIL PROTECTED]> wrote: > Thanks a lot for replying Yonik! > > I am running solr on a windows 2003 server (standard version). intel Xeon CPU > 3.00GHz, with 4.00 GB RAM. > The index is located on Raid5 with 2 million documents. Is there any way to > improve query perfo

RE: phrase query performance

2007-10-30 Thread Haishan Chen
> Date: Fri, 26 Oct 2007 11:09:21 -0400> From: [EMAIL PROTECTED]> To: > solr-user@lucene.apache.org> Subject: Re: phrase query performance> > The > differences lie in Lucene.> Instead of thinking of phrase queries as slow, > think of term queries as fast :-)

RE: Phrase Query Performance Question

2007-10-30 Thread Haishan Chen
66 ms repeatable. What are the > factors affecting phrase query performance? How come the phrase query > content:("auto repair") is almost 20 times slower than content:(auto repair)? > I also notice that the phrase query with a slop is always faster than the one > without a sl
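
As a quick check of the "almost 20 times slower" figure, the sketch below computes slowdown ratios from the timings reported in this thread (47 ms, 937 ms and 766 ms). The dictionary layout and the `slowdown` helper are illustrative assumptions; only the three timings come from the original messages.

```python
# Timings in milliseconds as reported by the original poster.
timings_ms = {
    'content:(auto repair)': 47,        # plain term query
    'content:("auto repair")': 937,     # exact phrase query
    'content:("auto repair"~1)': 766,   # phrase query with slop 1
}


def slowdown(timings, baseline_query):
    """Return each query's time as a multiple of the baseline query's time."""
    base = timings[baseline_query]
    return {query: round(ms / base, 1) for query, ms in timings.items()}


ratios = slowdown(timings_ms, 'content:(auto repair)')
for query, ratio in ratios.items():
    print(f"{query}: {ratio}x")
```

On these numbers the exact phrase query takes roughly 19.9 times as long as the term query, and the sloppy phrase query roughly 16.3 times as long.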

Re: phrase query performance

2007-10-26 Thread Yonik Seeley
I > was doing the query on the one field only. content:(auto repair) 47 ms repeatable; content:("auto repair") 937 ms repeatable; content:("auto repair"~1) 766 ms repeatable. What are the > factors affecting phrase query perfor

Phrase Query Performance Question

2007-10-26 Thread Haishan Chen
following. I was doing the query on the one field only. content:(auto repair) 47 ms repeatable; content:("auto repair") 937 ms repeatable; content:("auto repair"~1) 766 ms repeatable. What are the factors affecting phrase query performance? How c

phrase query performance

2007-10-25 Thread Haishan Chen
following. I was doing the query on the one field only. content:(auto repair) 47 ms repeatable; content:("auto repair") 937 ms repeatable; content:("auto repair"~1) 766 ms repeatable. What are the factors affecting phrase query performance? How c