RE: issues with solr

2008-04-15 Thread dudes dudes

Thanks for your help, Erik.

ak

> From: [EMAIL PROTECTED]
> Subject: Re: issues with solr
> Date: Mon, 14 Apr 2008 14:50:34 -0400
> To: solr-user@lucene.apache.org
> 
> There is an "Ant script" section on that mySolr page.
> 
> But there is no need to use any of that for your project.  All you  
> need is Solr's  WAR file and the appropriate Solr configuration files  
> and you're good to go.
> 
>   Erik
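
A sketch of what that can look like on Tomcat, for instance -- the paths
and file name are illustrative, and the solr/home JNDI entry is what
points the WAR at your configuration directory:

  <!-- e.g. $CATALINA_HOME/conf/Catalina/localhost/solr.xml -->
  <Context docBase="/path/to/solr.war" debug="0" crossContext="true">
    <!-- Solr home holds conf/solrconfig.xml, conf/schema.xml, etc. -->
    <Environment name="solr/home" type="java.lang.String"
                 value="/path/to/solr/home" override="true"/>
  </Context>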
> 
> On Apr 14, 2008, at 9:12 AM, dudes dudes wrote:
>>
>> thanks Erik,
>>
>> Basically I have used the build file from Solr, not from that
>> page. I have had a look and couldn't really find their build.xml
>> file!
>>
>> thanks
>> ak
>>
>> 
>>> From: [EMAIL PROTECTED]
>>> Subject: Re: issues with solr
>>> Date: Mon, 14 Apr 2008 08:54:39 -0400
>>> To: solr-user@lucene.apache.org
>>>
>>> The mysolr.dist target is defined in the Ant file on that page.  My
>>> guess is that you were not using the Ant build file bits there.
>>>
>>> My take is that the mySolr page is not quite what folks should be
>>> cloning for incorporation of Solr into their application.  Maybe that
>>> page should be removed or reworked?
>>>
>>> Erik
>>>
>>>
>>> On Apr 14, 2008, at 8:21 AM, dudes dudes wrote:

  Hello there

 I'm new to Solr.
 I'm trying to deploy the example under http://wiki.apache.org/solr/mySolr .
 However, every time I issue "ant mysolr.dist" it generates:

  Buildfile: build.xml

  BUILD FAILED
  Target "mysolr.dist" does not exist in the project "solr".

 I'm running Ubuntu Gutsy and the Ant version is 1.7.0.

 What have I missed?

 many thanks for your help

  ak

>>>
>>
> 


Re: too many queries?

2008-04-15 Thread Jonathan Ariel
My index is 4GB on disk. My servers have 8 GB of RAM each (the OS is 32
bits).
It is optimized twice a day, it takes around 15 minutes to optimize.
The index is updated (commits) every two minutes. There are between 10 and
100 inserts/updates every 2 minutes.
The cache configuration is:
filterCache
autowarmCount=256
lookups : 24241
hits : 21575
hitratio : 0.89
inserts : 3708
evictions : 3155
size : 512
cumulative_lookups : 2662056
cumulative_hits : 2355474
cumulative_hitratio : 0.88
cumulative_inserts : 382039
cumulative_evictions : 365038
queryResultCache
autowarmCount=256
lookups : 2303
hits : 271
hitratio : 0.11
inserts : 2308
evictions : 1774
size : 512
cumulative_lookups : 237586
cumulative_hits : 39555
cumulative_hitratio : 0.16
cumulative_inserts : 201009
cumulative_evictions : 180025
documentCache
lookups : 58032
hits : 33759
hitratio : 0.58
inserts : 24273
evictions : 23761
size : 512
cumulative_lookups : 6694035
cumulative_hits : 3906883
cumulative_hitratio : 0.58
cumulative_inserts : 2787152
cumulative_evictions : 2752219


The CPU usage is usually 50%.
I give the JVM "java -server -Xmx2048m" when I start Solr.

Thanks!

Jonathan




On Mon, Apr 14, 2008 at 8:24 PM, Otis Gospodnetic <
[EMAIL PROTECTED]> wrote:

> It's hard to tell from the info given, though something doesn't sound
> ideal.  Even if Solr's caching doesn't help, with only 4M documents, your
> Solr search slaves should be able to keep the whole index in RAM, assuming
> your index is not huge.
>
> How large is the index? (GB on disk)
> Is it optimized?
> How often is it changed on the master - i.e. how often does your Searcher
> need to be reopened?
> What are cache hits and evictions like (Solr admin page)?
> What are cache sizes like and how is the warm-up configured?
> Is there any IO on the slaves? (run vmstat or iostat or some such)
> How is the CPU usage looking?
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message 
> From: Jonathan Ariel <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, April 14, 2008 5:50:08 PM
> Subject: too many queries?
>
> Hi,
> I have some questions about performance for you guys.
> So basically I have 2 slave Solr servers and 1 master Solr server, load
> balanced, with around 100 requests/second, approx. 50 requests per second
> per Solr server.
> My index is about 4 million documents and the average query response time
> is 0.6 seconds, retrieving just 4 documents per query.
> What happens is that there are too many requests to Solr and the load
> keeps growing every second, so eventually my site stops working.
>
> I don't know if these stats are enough to tell whether the servers should
> be able to handle this amount of requests. Maybe it's a configuration
> problem. I don't think that caching in Solr would help in this case
> because all the queries are different (I'm not sure how caching works,
> but if it's per query it won't help much in this case).
>
> Any thoughts about this?
>
> Thanks!
>
> Jonathan
>
>
>
>


Re: too many queries?

2008-04-15 Thread Erik Hatcher
Filter cache evictions are a big red flag.  Try bumping up the size  
of your filter cache to avoid regenerating filters.
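
Cache sizes live in solrconfig.xml; a sketch of a larger filterCache,
with the numbers as illustrative starting points rather than tuned
values:

  <!-- size is the maximum number of cached entries (512 in the stats
       above); autowarmCount is how many entries are copied over from
       the old searcher when a new one is opened -->
  <filterCache
      class="solr.LRUCache"
      size="16384"
      initialSize="4096"
      autowarmCount="1024"/>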


Erik




On Apr 15, 2008, at 8:38 AM, Jonathan Ariel wrote:

filterCache
autowarmCount=256
lookups : 24241
hits : 21575
hitratio : 0.89
inserts : 3708
evictions : 3155
size : 512
cumulative_lookups : 2662056
cumulative_hits : 2355474
cumulative_hitratio : 0.88
cumulative_inserts : 382039
cumulative_evictions : 365038

The CPU usage is usually 50%.
I give the JVM "java -server -Xmx2048m" when I start Solr.

Thanks!

Jonathan




On Mon, Apr 14, 2008 at 8:24 PM, Otis Gospodnetic <
[EMAIL PROTECTED]> wrote:


It's hard to tell from the info given, though something doesn't sound
ideal.  Even if Solr's caching doesn't help, with only 4M documents, your
Solr search slaves should be able to keep the whole index in RAM, assuming
your index is not huge.

How large is the index? (GB on disk)
Is it optimized?
How often is it changed on the master - i.e. how often does your Searcher
need to be reopened?
What are cache hits and evictions like (Solr admin page)?
What are cache sizes like and how is the warm-up configured?
Is there any IO on the slaves? (run vmstat or iostat or some such)
How is the CPU usage looking?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Jonathan Ariel <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, April 14, 2008 5:50:08 PM
Subject: too many queries?

Hi,
I have some questions about performance for you guys.
So basically I have 2 slave Solr servers and 1 master Solr server, load
balanced, with around 100 requests/second, approx. 50 requests per second
per Solr server.
My index is about 4 million documents and the average query response time
is 0.6 seconds, retrieving just 4 documents per query.
What happens is that there are too many requests to Solr and the load
keeps growing every second, so eventually my site stops working.

I don't know if these stats are enough to tell whether the servers should
be able to handle this amount of requests. Maybe it's a configuration
problem. I don't think that caching in Solr would help in this case
because all the queries are different (I'm not sure how caching works,
but if it's per query it won't help much in this case).

Any thoughts about this?

Thanks!

Jonathan








RE: Slow Highlighting -> CopyField maxSize property

2008-04-15 Thread Nicolas DESSAIGNE
Koji,

The patch is now available at https://issues.apache.org/jira/browse/SOLR-538

Tell me if it fits your needs.
Nicolas

-Original Message-
From: Koji Sekiguchi [mailto:[EMAIL PROTECTED]
Sent: Friday, March 21, 2008 16:50
To: solr-user@lucene.apache.org
Subject: Re: Slow Highlighting -> CopyField maxSize property

Hello Nicolas,

This has been in the back of my mind for some time.
Can you make a patch for it? I'd like to use it.

Thank you,

Koji

[EMAIL PROTECTED] wrote:
> Hi all,
>
>
>
> I would like to propose a new property on copy fields that limits the number
> of characters that are copied.
>
>
>
> The use case is the following: Among other documents, we index very big
> documents (several MB of text) and want to be able to use highlighting.
> However, as soon as one or more big documents are included in the matches,
> the response time is awful. The maxAnalyzedChars setting is not enough, as
> the full document is loaded into memory before any processing is done, and
> that alone can take very long.
>
>
>
> For this kind of situation, we propose to use a dedicated copy field for
> highlighting and to limit the number of characters that are copied. For
> example:
>
> <copyField source="text" dest="text_highlight" maxSize="10000"/>
>
>
>
> This approach also has the advantage of limiting the index size for large
> documents (the original text field does not need to be stored or to have
> term vectors). However, the index is bigger for small documents...
>
>
>
> Of course, if the only terms that are matched by a query are after the
> limit, no highlighting is possible.
>
>
>
> What do you think of this feature?
>
>
>
> Best regards,
>
> Nicolas
>
>


Re: too many queries?

2008-04-15 Thread Jonathan Ariel
Thanks. It should be around lookups*1.5, right?
Is this measured in bytes?

On Tue, Apr 15, 2008 at 11:26 AM, Erik Hatcher <[EMAIL PROTECTED]>
wrote:

> Filter cache evictions are a big red flag.  Try bumping up the size of
> your filter cache to avoid regenerating filters.
>
>    Erik
>
>
>
>
> On Apr 15, 2008, at 8:38 AM, Jonathan Ariel wrote:
>
> > filterCache
> > autowarmCount=256
> > lookups : 24241
> > hits : 21575
> > hitratio : 0.89
> > inserts : 3708
> > evictions : 3155
> > size : 512
> > cumulative_lookups : 2662056
> > cumulative_hits : 2355474
> > cumulative_hitratio : 0.88
> > cumulative_inserts : 382039
> > cumulative_evictions : 365038
> >
> > The CPU usage is usually 50%.
> > I give the JVM "java -server -Xmx2048m" when I start Solr.
> >
> > Thanks!
> >
> > Jonathan
> >
> >
> >
> >
> > On Mon, Apr 14, 2008 at 8:24 PM, Otis Gospodnetic <
> > [EMAIL PROTECTED]> wrote:
> >
> > > It's hard to tell from the info given, though something doesn't sound
> > > ideal.  Even if Solr's caching doesn't help, with only 4M documents,
> > > your
> > > Solr search slaves should be able to keep the whole index in RAM,
> > > assuming
> > > your index is not huge.
> > >
> > > How large is the index? (GB on disk)
> > > Is it optimized?
> > > How often is it changed on the master - i.e. how often does your
> > > Searcher
> > > need to be reopened?
> > > What are cache hits and evictions like (Solr admin page)?
> > > What are cache sizes like and how is the warm-up configured?
> > > Is there any IO on the slaves? (run vmstat or iostat or some such)
> > > How is the CPU usage looking?
> > >
> > > Otis
> > > --
> > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > >
> > > - Original Message 
> > > From: Jonathan Ariel <[EMAIL PROTECTED]>
> > > To: solr-user@lucene.apache.org
> > > Sent: Monday, April 14, 2008 5:50:08 PM
> > > Subject: too many queries?
> > >
> > > Hi,
> > > I have some questions about performance for you guys.
> > > So basically I have 2 slave Solr servers and 1 master Solr server,
> > > load balanced, with around 100 requests/second, approx. 50 requests
> > > per second per Solr server.
> > > My index is about 4 million documents and the average query response
> > > time is 0.6 seconds, retrieving just 4 documents per query.
> > > What happens is that there are too many requests to Solr and the load
> > > keeps growing every second, so eventually my site stops working.
> > >
> > > I don't know if these stats are enough to tell whether the servers
> > > should be able to handle this amount of requests. Maybe it's a
> > > configuration problem. I don't think that caching in Solr would help
> > > in this case because all the queries are different (I'm not sure how
> > > caching works, but if it's per query it won't help much in this case).
> > >
> > > Any thoughts about this?
> > >
> > > Thanks!
> > >
> > > Jonathan
> > >
> > >
> > >
> > >
> > >
>


Re: Slow Highlighting -> CopyField maxSize property

2008-04-15 Thread Koji Sekiguchi

Hello Nicolas,

Thank you for letting me know this.

Yes, your patch will solve my problem (highlighter performance w/ large 
doc).

BTW, I posted a similar ticket to solve another problem of mine
(hl.alternateField w/ large field).

https://issues.apache.org/jira/browse/SOLR-516

Thank you again,

Koji

Nicolas DESSAIGNE wrote:

Koji,

The patch is now available at https://issues.apache.org/jira/browse/SOLR-538

Tell me if it fits your needs.
Nicolas

  




Re: Fuzzy queries in dismax specs?

2008-04-15 Thread Chris Hostetter

: I've started implementing something to use fuzzy queries for selected fields
: in dismax. The request handler spec looks like this:
: 
:    exact~0.7^4.0 stemmed^2.0

That's a pretty cool idea ... usually when people talk about adding
support for other query types in dismax they mean in the query syntax, but
you are adding more info to the qf to specify how the field should be
handled in general -- I like it.

I think if I had it to do over again (now that dismax supports multiple
param values, and per-field overrides) I would have made qf and pf
multivalued params containing just the field names, and gotten the boost
value from a per-field overridable fieldBoost param, so adding a
fuzzyDistance param would also be trivial (without needing to parse crazy
syntax).

(hmmm... ps could be a per-field overridable param too ... dismax v2.0
maybe)
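
A sketch of how those imagined params might sit in a handler's defaults
-- to be clear, fieldBoost and fuzzyDistance are hypothetical, and the
f.<field>.<param> override style is just borrowed from Solr's existing
per-field convention:

  <lst name="defaults">
    <str name="qf">exact</str>
    <str name="qf">stemmed</str>
    <str name="f.exact.fieldBoost">4.0</str>
    <str name="f.exact.fuzzyDistance">0.7</str>
    <str name="f.stemmed.fieldBoost">2.0</str>
  </lst>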


-Hoss



Re: Interleaved results from different sources

2008-04-15 Thread Chris Hostetter

: > We have an index of documents from different sources and we want to make
: > sure the results we display are interleaved from the different sources and
: > not only ranked based on relevancy. Is there a way to do this?
: 
: By far the easiest way is to get the top N/2 results from each source and
: interleave on the client side.

Actually, for a search with no a priori information about the results, you
need to fetch N from both sources in case one of them has no matches.  (In
a paginated system, assuming you know the total number of results from the
first page, subsequent pages can ask for N/2 from each source as long as
you know that the current page won't exhaust either source.)



-Hoss



Re: Interleaved results from different sources

2008-04-15 Thread peter360

How do you get the top N/2 results from each source?  What if you have more
than 2 sources?


Mike Klaas wrote:
> 
> By far the easiest way is to get the top N/2 results from each source  
> and interleave on the client side.
> 
> regards,
> -Mike
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Interleaved-results-form-different-sources-tp16693128p16703399.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Interleaved results from different sources

2008-04-15 Thread Mike Klaas


first query:
q=foo&fq=source:one&rows=5

second query:
q=foo&fq=source:two&rows=5

I don't know the answer to your second question, since I don't
understand the use case for interleaving two sources anyway (I would
try to create scores for the sources that were comparable in some way
and combine them using score).


-Mike

On 15-Apr-08, at 10:29 AM, peter360 wrote:


How do you get the top N/2 results from each source?  What if you
have more than 2 sources?


Mike Klaas wrote:


By far the easiest way is to get the top N/2 results from each source
and interleave on the client side.

regards,
-Mike




--
View this message in context: 
http://www.nabble.com/Interleaved-results-form-different-sources-tp16693128p16703399.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: filtering search using regex

2008-04-15 Thread Chris Hostetter

Solr doesn't provide any regex-based searching features out of the box.

There are some regex-based query classes in Lucene; if you wrote a custom
Solr plugin to do the query parsing, you could use them.

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341
http://people.apache.org/~hossman/#xyproblem

If you could elaborate a little more on the exact use case you are trying
to solve, people might be able to offer you alternative solutions you've
never thought of ... supporting regex search is a much harder problem than
finding creative ways to support range queries on unclean data (which is
what the root of your issue seems to be).

Tell us more about your data, and the types of queries you need to support
(without assuming that regexes are the best way to
support them).


-Hoss



Re: too many queries?

2008-04-15 Thread Otis Gospodnetic
Yeah, lots of evictions and tiny caches.  Why not increase them?  It looks like 
you have memory to spare.  And since you reopen the searcher so often, you can 
play with increasing the warm-up time if you want to preserve more cached items 
from the previous searcher.

Evictions are measured in the number of occurrences, not bytes.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Jonathan Ariel <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, April 15, 2008 11:03:53 AM
Subject: Re: too many queries?

Thanks. It should be around lookups*1.5, right?
Is this measured in bytes?

On Tue, Apr 15, 2008 at 11:26 AM, Erik Hatcher <[EMAIL PROTECTED]>
wrote:

> Filter cache evictions are a big red flag.  Try bumping up the size of
> your filter cache to avoid regenerating filters.
>
>    Erik
>
>
>
>
> On Apr 15, 2008, at 8:38 AM, Jonathan Ariel wrote:
>
> > filterCache
> > autowarmCount=256
> > lookups : 24241
> > hits : 21575
> > hitratio : 0.89
> > inserts : 3708
> > evictions : 3155
> > size : 512
> > cumulative_lookups : 2662056
> > cumulative_hits : 2355474
> > cumulative_hitratio : 0.88
> > cumulative_inserts : 382039
> > cumulative_evictions : 365038
> >
> > The CPU usage is usually 50%.
> > I give the JVM "java -server -Xmx2048m" when I start Solr.
> >
> > Thanks!
> >
> > Jonathan
> >
> >
> >
> >
> > On Mon, Apr 14, 2008 at 8:24 PM, Otis Gospodnetic <
> > [EMAIL PROTECTED]> wrote:
> >
> > > It's hard to tell from the info given, though something doesn't sound
> > > ideal.  Even if Solr's caching doesn't help, with only 4M documents,
> > > your
> > > Solr search slaves should be able to keep the whole index in RAM,
> > > assuming
> > > your index is not huge.
> > >
> > > How large is the index? (GB on disk)
> > > Is it optimized?
> > > How often is it changed on the master - i.e. how often does your
> > > Searcher
> > > need to be reopened?
> > > What are cache hits and evictions like (Solr admin page)?
> > > What are cache sizes like and how is the warm-up configured?
> > > Is there any IO on the slaves? (run vmstat or iostat or some such)
> > > How is the CPU usage looking?
> > >
> > > Otis
> > > --
> > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > >
> > > - Original Message 
> > > From: Jonathan Ariel <[EMAIL PROTECTED]>
> > > To: solr-user@lucene.apache.org
> > > Sent: Monday, April 14, 2008 5:50:08 PM
> > > Subject: too many queries?
> > >
> > > Hi,
> > > I have some questions about performance for you guys.
> > > So basically I have 2 slave Solr servers and 1 master Solr server,
> > > load balanced, with around 100 requests/second, approx. 50 requests
> > > per second per Solr server.
> > > My index is about 4 million documents and the average query response
> > > time is 0.6 seconds, retrieving just 4 documents per query.
> > > What happens is that there are too many requests to Solr and the load
> > > keeps growing every second, so eventually my site stops working.
> > >
> > > I don't know if these stats are enough to tell whether the servers
> > > should be able to handle this amount of requests. Maybe it's a
> > > configuration problem. I don't think that caching in Solr would help
> > > in this case because all the queries are different (I'm not sure how
> > > caching works, but if it's per query it won't help much in this case).
> > >
> > > Any thoughts about this?
> > >
> > > Thanks!
> > >
> > > Jonathan
> > >
> > >
> > >
> > >
> > >
>





Re: Snippets Solr/nutch

2008-04-15 Thread khirb7



Mike Klaas wrote:
> 
> On 13-Apr-08, at 3:25 AM, khirb7 wrote:
>>
>> it doesn't work, Solr still uses the default value fragsize=100. Also
>> I am not able to specify the regex fragmenter, due to this version
>> problem I suppose, or the way I am declaring
>> <highlighting> ... </highlighting>, because both of:
> 
> Hi khirb,
> 
> It might be easier for people to help you if you keep things in one  
> thread.
> 
> I notice that you're trying to apply a patch that has long since been  
> applied to Solr (another thread).  What version of Solr are you  
> using?  How did you acquire it?
> 
> -Mike
> 
hi mike 

Thank you a lot, you are very helpful. Concerning my Solr, I am using the
1.2.0 version; I downloaded it from the Apache download mirror
http://www.apache.org/dyn/closer.cgi/lucene/solr/ . I didn't fully
understand you when you said:

you're trying to apply a patch that has long since been  
applied to Solr.

thank you mike.


-- 
View this message in context: 
http://www.nabble.com/Snipets-Solr-nutch-tp16537216p16708645.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: too many queries?

2008-04-15 Thread Mike Klaas

On 15-Apr-08, at 5:38 AM, Jonathan Ariel wrote:

My index is 4GB on disk. My servers have 8 GB of RAM each (the OS is 32
bits).
It is optimized twice a day, it takes around 15 minutes to optimize.
The index is updated (commits) every two minutes. There are between
10 and 100 inserts/updates every 2 minutes.


Caching could help--you should definitely start there.

The commit every 2 minutes could end up being an insurmountable
problem.  You may have to partition your data into a large, mostly
static set and a small dynamic set, combining the results at query time.


-Mike


Re: Snippets Solr/nutch

2008-04-15 Thread Mike Klaas

On 15-Apr-08, at 1:37 PM, khirb7 wrote:


Thank you a lot, you are very helpful. Concerning my Solr, I am using
the 1.2.0 version; I downloaded it from the Apache download mirror
http://www.apache.org/dyn/closer.cgi/lucene/solr/ . I didn't fully
understand you when you said:

you're trying to apply a patch that has long since been
applied to Solr.


Hi khirb,

You could try looking at "trunk" (the development version of Solr that
hasn't yet been released).  It contains all the features you were
trying to add manually to your version.


You can download a "nightly" build of Solr here:

http://people.apache.org/builds/lucene/solr/nightly/

regards,
-Mike