Performance of putting the solr data in SAN.

2008-08-29 Thread Yongjun Rong
Hi,
  I'm just wondering if anybody has experience with putting the Solr
data on a SAN instead of local disk. Is there a big performance penalty?
Please share with me your experiences.
  Thank you very much.
  Yongjun Rong


RE: Performance of putting the solr data in SAN.

2008-09-02 Thread Yongjun Rong
 Hi,
  I did not get any response from this mailing list about this question.
Does that mean no one on this list uses Solr with a SAN? Please reply
if you use Solr with a SAN.
  Thank you very much.
  Yongjun Rong



Upgrade Lucene to 2.3

2008-04-29 Thread Yongjun Rong
Hi,
  It seems the latest Lucene 2.3 has some performance improvements.
I'm just wondering whether we can easily upgrade Solr's Lucene from
2.1 to 2.3. Is there anything special we need to know beyond just
replacing the Lucene jars in the lib directory?
  Thank you very much.
  Yongjun Rong
 


RE: SOLR OOM (out of memory) problem

2008-05-21 Thread Yongjun Rong
I had the same problem some weeks ago. You can try these:
1. Check the hit ratio for each cache via solr/admin/stats.jsp. If the
hit ratio is very low, just disable that cache; it will save you some
memory.
2. Setting -Xms and -Xmx to the same size will help GC performance.
3. Check which GC you use. The default is the parallel collector; the
concurrent collector can help a lot.
4. These are my Sun HotSpot JVM startup options: -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=50 -XX:-UseGCOverheadLimit
The above cannot solve the OOM forever, but it helps a lot.
Hope this helps.
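Put together, a startup line with those settings might look like this (a sketch only; the heap sizes and the start.jar path are assumptions for a Jetty-based Solr install of that era, not something from this thread):

```shell
# Equal -Xms/-Xmx plus the concurrent collector, per the tips above.
# Heap size and start.jar location are assumptions; adjust for your install.
java -Xms2g -Xmx2g \
     -XX:+UseConcMarkSweepGC \
     -XX:CMSInitiatingOccupancyFraction=50 \
     -XX:-UseGCOverheadLimit \
     -jar start.jar
```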

-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 21, 2008 2:23 PM
To: solr-user@lucene.apache.org
Subject: Re: SOLR OOM (out of memory) problem


On 21-May-08, at 4:46 AM, gurudev wrote:

>
> Just to add more:
>
> The JVM heap allocated is 6GB with initial heap size as 2GB. We use 
> quadro(which is 8 cpus) on linux servers for SOLR slaves.
> We use facet searches, sorting.
> document cache is set to 7 million (which is total documents in index)

> filtercache 1

You definitely don't have enough memory to keep 7 million documents,
fully realized in Java object form, in memory.

Nor would you want to.  The document cache should aim to keep the most
frequently occurring documents in memory (in the thousands, perhaps tens
of thousands).  By devoting more memory to the OS disk cache, more of
the 12GB index can be cached by the OS, speeding up all document
retrieval.
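
In solrconfig.xml terms, that sizing advice might look like the following (the numbers here are illustrative assumptions, not recommendations from the thread):

```xml
<!-- Keep the document cache in the tens of thousands, not millions,
     leaving the rest of RAM to the OS disk cache. Sizes are assumptions;
     tune them against the hit ratios on solr/admin/stats.jsp. -->
<documentCache class="solr.LRUCache"
               size="20000"
               initialSize="20000"
               autowarmCount="0"/>
```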

-Mike



RE: SOLR OOM (out of memory) problem

2008-05-22 Thread Yongjun Rong
 That looks good; keep using those caches. Keeping them will help your
search performance. Try the concurrent GC and see if you get better
results. Please let me know how it goes.
  Best,
  Yongjun Rong

-Original Message-
From: gurudev [mailto:[EMAIL PROTECTED] 
Sent: Thursday, May 22, 2008 7:28 AM
To: solr-user@lucene.apache.org
Subject: RE: SOLR OOM (out of memory) problem


Hi Rong,

My cache hit ratio are:

filtercache: 0.96
documentcache:0.51
queryresultcache:0.58

Thanx
Pravesh


Yongjun Rong-2 wrote:
> 
> I had the same problem some weeks before. You can try these:
> 1. Check the hit ratio for the cache via the solr/admin/stats.jsp. If 
> the hit ratio is very low. Just disable those cache. It will save you 
> some memory.
> 2. set -Xms and -Xmx to the same size will help improve GC
performance. 
> 3. Check what's GC do you use? Default will be parallel. You can try 
> use concurrent GC which will help a lot.
> 4. This is my sun hotspot jvm startup options: -XX:+UseConcMarkSweepGC

> -XX:CMSInitiatingOccupancyFraction=50 -XX:-UseGCOverheadLimit The 
> above cannot solve the OOM forever. But they help a lot.
> Wish this can help.
> 
> -Original Message-
> From: Mike Klaas [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, May 21, 2008 2:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR OOM (out of memory) problem
> 
> 
> On 21-May-08, at 4:46 AM, gurudev wrote:
> 
>>
>> Just to add more:
>>
>> The JVM heap allocated is 6GB with initial heap size as 2GB. We use 
>> quadro(which is 8 cpus) on linux servers for SOLR slaves.
>> We use facet searches, sorting.
>> document cache is set to 7 million (which is total documents in 
>> index)
> 
>> filtercache 1
> 
> You definitely don't have enough memory to keep 7 million document, 
> fully realized in java-object form, in memory.
> 
> Nor would you want to.  The document cache should aim to keep the most

> frequently-occuring documents in memory (in the thousands, perhaps 
> 10's of thousands).  By devoting more memory to the OS disk cache, 
> more of the 12GB index can be cached by the OS and thus speed up all 
> document retreival.
> 
> -Mike
> 
> 

--
View this message in context:
http://www.nabble.com/SOLR-OOM-%28out-of-memory%29-problem-tp17364146p17
402234.html
Sent from the Solr - User mailing list archive at Nabble.com.



Search query optimization

2008-05-29 Thread Yongjun Rong
Hi,
  I have a question about how the Lucene query parser works. For
example, I have the query "A AND B AND C". Will Lucene pull all
documents satisfying condition A into memory and then filter them with
conditions B and C, or will only the documents satisfying "A AND B AND
C" be put into memory? Are there any articles discussing how to build
an optimized query to save memory and improve performance?
  Thank you very much.
  Yongjun Rong


RE: Search query optimization

2008-05-29 Thread Yongjun Rong
Hi Yonik,
  Thanks for your quick reply. I'm very new to the Lucene source code.
Can you give me a little more detailed explanation of this?
Do you think it would save some memory, when find_match("A") returns
more documents than find_match("B"), to put B at the front of the AND
query, like "B AND A AND C"? How about sorting (sort=A,B,C&q=A AND B
AND C)? Do you think the order of the conditions (A, B, C) in a query
affects its performance?
  Thank you very much.
  Yongjun


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Thursday, May 29, 2008 4:12 PM
To: solr-user@lucene.apache.org
Subject: Re: Search query optimization

On Thu, May 29, 2008 at 4:05 PM, Yongjun Rong <[EMAIL PROTECTED]>
wrote:
>  I have a question about how the lucene query parser. For example, I 
> have query "A AND B AND C". Will lucene extract all documents satisfy 
> condition A in memory and then filter it with condition B and C?

No, Lucene will try and optimize this the best it can.

It roughly goes like this:
docnum = find_match("A")
docnum = find_first_match_after(docnum, "B")
docnum = find_first_match_after(docnum, "C")
etc...
until the same docnum is returned for "A", "B", and "C".

See ConjunctionScorer for the gritty details.
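
Yonik's loop can be sketched roughly as follows. This is a simplified stand-in for what ConjunctionScorer does, not Lucene's actual code; the posting lists here are plain sorted arrays of doc ids:

```java
import java.util.ArrayList;
import java.util.List;

public class ConjunctionSketch {
    // First doc id >= target in a sorted posting list (linear scan for
    // simplicity; Lucene uses skip lists). MAX_VALUE means exhausted.
    static int advance(int[] postings, int target) {
        for (int doc : postings) {
            if (doc >= target) return doc;
        }
        return Integer.MAX_VALUE;
    }

    // Leapfrog intersection: keep advancing every list to the current
    // candidate until all lists land on the same doc id.
    static List<Integer> intersect(int[][] lists) {
        List<Integer> hits = new ArrayList<>();
        int candidate = 0;
        while (candidate != Integer.MAX_VALUE) {
            int max = candidate;
            boolean agreed = true;
            for (int[] postings : lists) {
                int doc = advance(postings, candidate);
                if (doc > max) max = doc;
                if (doc != candidate) agreed = false;
            }
            if (agreed) {
                hits.add(candidate);   // every list matched this doc
                candidate = candidate + 1;
            } else {
                candidate = max;       // leapfrog to the furthest list
            }
        }
        return hits;
    }
}
```

Note that at no point is the full result set for any single term materialized; each posting list is only walked forward.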

-Yonik





RE: Search query optimization

2008-06-17 Thread Yongjun Rong
Hi,
  Thanks for your reply. I did some tests on my test machine.
http://stage.boomi.com:8080/solr/select/?q=account:1&rows=1000 returns
a result set of 384 in 3 ms. If I add a new AND condition, as in
http://stage.boomi.com:8080/solr/select/?q=account:1+AND+recordeddate_dt:[NOW/DAYS-7DAYS+TO+NOW]&rows=1000,
it takes 18236 ms to return a result set of 21. If I only use the
recordeddate_dt condition, as in
http://stage.boomi.com:8080/solr/select/?q=recordeddate_dt:[NOW/DAYS-7DAYS+TO+NOW]&rows=1000,
it takes 20271 ms to get 412800 results. All the above URLs are live;
you can test them.

Can anyone give me some explanation of why this happens if we have the
query optimization? Thank you very much.
Yongjun Rong
 

-Original Message-
From: Walter Underwood [mailto:[EMAIL PROTECTED] 
Sent: Thursday, May 29, 2008 4:57 PM
To: solr-user@lucene.apache.org
Subject: Re: Search query optimization

The people working on Lucene are pretty smart, and this sort of query
optimization is a well-known trick, so I would not worry about it.

A dozen years ago at Infoseek, we checked the count of matches for each
term in an AND, and evaluated the smallest one first.
If any of them had zero matches, we didn't evaluate any of them.

I expect that Doug Cutting and the other Lucene folk know those same
tricks.
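
The Infoseek trick Walter describes can be sketched like this (hypothetical helper names and a toy document-frequency table, not a Lucene API):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;

public class AndOrderingSketch {
    // Toy stand-in for the index's document-frequency statistics.
    static int docFreq(Map<String, Integer> stats, String term) {
        return stats.getOrDefault(term, 0);
    }

    // Plan an AND query: evaluate the rarest term first; if any term
    // matches zero documents, return null and evaluate nothing at all.
    static String[] planConjunction(Map<String, Integer> stats, String... terms) {
        for (String t : terms) {
            if (docFreq(stats, t) == 0) return null;
        }
        String[] ordered = terms.clone();
        Arrays.sort(ordered, Comparator.comparingInt((String t) -> docFreq(stats, t)));
        return ordered;
    }
}
```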

wunder



RE: Search query optimization

2008-06-17 Thread Yongjun Rong
Thanks for the reply. Here is the debugQuery output:

account:1 AND recordeddate_dt:[NOW/DAYS-1DAYS TO NOW]

account:1 AND recordeddate_dt:[NOW/DAYS-1DAYS TO NOW]

+account:1 +recordeddate_dt:[2008-06-16T00:00:00.000Z TO 2008-06-17T17:07:57.420Z]

+account:1 +recordeddate_dt:[2008-06-16T00:00:00.000 TO 2008-06-17T17:07:57.420]

id=e03dbd92-3d41-4693-8b69-ac9a0d332446-atom-d52484f5-7aa8-40b3-ad6f-ba3a9071999e,internal_docid=6515410

10.88071 = (MATCH) sum of:
  10.788804 = (MATCH) weight(account:1 in 6515410), product of:
    0.9957678 = queryWeight(account:1), product of:
      10.834659 = idf(docFreq=348, numDocs=6515640)
      0.09190578 = queryNorm
    10.834659 = (MATCH) fieldWeight(account:1 in 6515410), product of:
      1.0 = tf(termFreq(account:1)=1)
      10.834659 = idf(docFreq=348, numDocs=6515640)
      1.0 = fieldNorm(field=account, doc=6515410)
  0.09190578 = (MATCH) ConstantScoreQuery(recordeddate_dt:[2008-06-16T00:00:00.000-2008-06-17T17:07:57.420]), product of:
    1.0 = boost
    0.09190578 = queryNorm


 

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 17, 2008 12:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Search query optimization

Hi,

Probably because the [NOW/DAYS-7DAYS+TO+NOW] part gets rewritten as lots of OR
clauses.  I think you'll see that if you add &debugQuery=true to the URL.
Make sure your recordeddate_dt is not too granular (e.g. if you don't need
minutes, round the values to hours; if you don't need hours, round the values
to days).
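
That pre-index rounding can be done on the client before documents are ever sent to Solr, e.g. (a sketch using the modern java.time API, which postdates this thread):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class DateGranularity {
    // Truncate a millisecond-precision UTC timestamp to day granularity;
    // index this value instead of the raw timestamp.
    static String roundToDay(String isoTimestamp) {
        return Instant.parse(isoTimestamp)
                      .truncatedTo(ChronoUnit.DAYS)
                      .toString();
    }
}
```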


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





RE: Search query optimization

2008-06-17 Thread Yongjun Rong
Hi Otis,
  Thanks for your advice. Do you mean that when we index the date data we need
to carefully choose the granularity of the date field to make it coarser? How
can we do this? We only access Solr via the HTTP URL, not the API. If you mean
the query syntax, we do have NOW/DAY to round to the day.
  Yongjun Rong
   

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 17, 2008 1:32 PM
To: solr-user@lucene.apache.org
Subject: Re: Search query optimization

Hi,

This is what I was talking about:

recordeddate_dt:[2008-06-16T00:00:00.000Z TO 2008-06-17T17:07:57.420Z]

Note that the granularity of this date field is down to milliseconds.  You 
should change that to be more coarse if you don't need such precision (e.g. no 
milliseconds, no seconds, no minutes, no hours...)


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



RE: Search query optimization

2008-06-17 Thread Yongjun Rong
Hi Chris,
   Thanks for your suggestions. I did try [NOW/DAY-7DAYS TO
NOW/DAY], but it was not better. And when I tried [NOW/DAY-7DAYS TO
NOW/DAY+1DAY], I got an exception, as below:
org.apache.solr.core.SolrException: Query parsing error: Cannot parse
'account:1 AND recordeddate_dt:[NOW/DAYS-7DAYS TO NOW/DAY 1DAY]':
Encountered "1DAY" at line 1, column 57.
Was expecting:
    "]" ...

	at org.apache.solr.search.QueryParsing.parseQuery(QueryParsing.java:104)
	at org.apache.solr.request.StandardRequestHandler.handleRequestBody(StandardRequestHandler.java:109)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
	at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:66)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1093)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:185)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1084)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:360)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:726)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:206)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:324)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:828)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:514)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380)
	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:395)
	at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:450)
Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse
'account:1 AND recordeddate_dt:[NOW/DAYS-7DAYS TO NOW/DAY 1DAY]':
Encountered "1DAY" at line 1, column 57.
Was expecting:
    "]" ...

	at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:152)
	at org.apache.solr.search.QueryParsing.parseQuery(QueryParsing.java:94)
	... 26 more

I will also try enabling the cache and see if I get better query times.
I will let you know.
Thank you very much.
Yongjun Rong

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 17, 2008 1:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Search query optimization


: Probably because the [NOW/DAYS-7DAYS+TO+NOW] part gets rewritten as lots
: of OR clauses.  I think that you'll see that if you add &debugQuery=true
: to the URL.  Make sure your recordeddate_dt is not too granular (e.g.
: if you don't need minutes, round the values to hours. If you don't need
: hours, round the values to days).

for the record: it doesn't get rewritten to a lot of OR clauses; it's
using ConstantScoreRangeQuery.

granularity is definitely important however, both when indexing and when
querying.

"NOW" is milliseconds, so every time you execute that query it's
different and there is almost no caching possible.

if you use [NOW/DAY-7DAYS TO NOW/DAY] or even [NOW/DAY-7DAYS TO
NOW/HOUR] you'll get a lot better caching behavior.  it looks like you
are trying to find anything in the past week, so you may want
[NOW/DAY-7DAYS TO NOW/DAY+1DAY] (to go to the end of the current day).

once you have a less granular date restriction, it can frequently make
sense to put this in a separate fq clause, so it will get cached
independently of your main query.
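
As a sketch, the separate-fq form of the earlier request would be something like this (host and field names taken from the thread; the `+` between the range endpoints encodes a space, while the literal date-math `+` must be escaped as %2B):

```
http://stage.boomi.com:8080/solr/select/?q=account:1&fq=recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY%2B1DAY]&rows=1000
```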

But Otis's point about reducing granularity can also help when indexing
... the fewer "unique" dates that appear in your index, the faster range
queries will be ... if you've got 1000 documents that all have a
recordeddate of June 11 2008, but at different times, and you're never
going to care about the times (just the date), then strip those times off
when indexing so they all have the same indexed value.

RE: Search query optimization

2008-06-17 Thread Yongjun Rong
Hi Chris,
  Thank you very much for the detailed suggestions. I just did the cache
test. If most requests return the same set of data, the cache improves
query performance. But in our usage, almost all requests return
different data sets, so the cache hit ratio is very low. That's the
reason we disabled the caches to save memory.  Another question:
q=account:1+AND+recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY] combines
the result sets of account:1 and
recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY]. How does Lucene handle it?
From my previous test examples, it seems Lucene does not check the size
of the subconditions (like account:1 or
recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY]). q=account:1 returns a
small set of data, but q=recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY]
returns a large one. If we combine them with "AND", as in
q=account:1+AND+recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY], it should
find the small set first and then apply the subcondition
recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY]. But from the response
time, it seems that is not the case.
Can anyone give me a detailed explanation of this?
Thank you very much.
Yongjun Rong

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 17, 2008 2:32 PM
To: solr-user@lucene.apache.org
Subject: RE: Search query optimization

:Thanks for your suggestions. I did try the [NOW/DAY-7DAYS TO
: NOW/DAY], but it is not better. And I tried [NOW/DAY-7DAYS TO
: NOW/DAY+1DAY], I got some exception as below:
: org.apache.solr.core.SolrException: Query parsing error: Cannot parse
: 'account:1 AND recordeddate_dt:[NOW/DAYS-7DAYS TO NOW/DAY 1DAY]':
: Encountered "1DAY" at line 1, column 57.

you need to properly URL-escape the "+" character as %2B in your URLs.
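
A quick way to see the escaping (standard java.net.URLEncoder; note it escapes the "/" as well):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EscapePlus {
    // Encode a Solr date-math expression so the literal '+' reaches the
    // server as %2B rather than being decoded to a space.
    static String encode(String dateMath) {
        try {
            return URLEncoder.encode(dateMath, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }
}
```

URLEncoder.encode is a blunt instrument; apply it to the parameter value, not the full URL.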

: And I will try to open the cache and see if I can get better query time.

the first request won't be any faster, but the second request will be.

and if filtering by week is something you expect people to do a lot of,
you can put it in a newSearcher warming query so it's always warmed up
and fast for everyone.


-Hoss