RE: issues with solr
thanks for your help Erik,

ak

> From: [EMAIL PROTECTED]
> Subject: Re: issues with solr
> Date: Mon, 14 Apr 2008 14:50:34 -0400
> To: solr-user@lucene.apache.org
>
> There is an "Ant script" section on that mySolr page.
>
> But there is no need to use any of that for your project. All you
> need is Solr's WAR file and the appropriate Solr configuration files
> and you're good to go.
>
>	Erik
>
> On Apr 14, 2008, at 9:12 AM, dudes dudes wrote:
>>
>> thanks Erik,
>>
>> Basically I have used the build file from Solr, not from that
>> page. I have had a look and couldn't really find their build.xml
>> file!
>>
>> thanks
>> ak
>>
>>> From: [EMAIL PROTECTED]
>>> Subject: Re: issues with solr
>>> Date: Mon, 14 Apr 2008 08:54:39 -0400
>>> To: solr-user@lucene.apache.org
>>>
>>> The mysolr.dist target is defined in the Ant file on that page. My
>>> guess is that you were not using the Ant build file bits there.
>>>
>>> My take is that the mySolr page is not quite what folks should be
>>> cloning for incorporation of Solr into their application. Maybe that
>>> page should be removed or reworked?
>>>
>>>	Erik
>>>
>>> On Apr 14, 2008, at 8:21 AM, dudes dudes wrote:
>>>> Hello there,
>>>>
>>>> I'm new to Solr. I'm trying to deploy the example under
>>>> http://wiki.apache.org/solr/mySolr. However, every time I issue
>>>> "ant mysolr.dist" it generates:
>>>>
>>>>   Buildfile: build.xml
>>>>   BUILD FAILED
>>>>   Target "mysolr.dist" does not exist in the project "solr".
>>>>
>>>> I'm running Ubuntu Gutsy and the Ant version is 1.7.0.
>>>> What have I missed?
>>>>
>>>> many thanks for your help
>>>> ak
Re: too many queries?
My index is 4GB on disk. My servers have 8 GB of RAM each (the OS is 32
bits). The index is optimized twice a day; it takes around 15 minutes to
optimize. The index is updated (commits) every two minutes. There are
between 10 and 100 inserts/updates every 2 minutes.

The cache configuration is:

filterCache (autowarmCount=256)
  lookups : 24241
  hits : 21575
  hitratio : 0.89
  inserts : 3708
  evictions : 3155
  size : 512
  cumulative_lookups : 2662056
  cumulative_hits : 2355474
  cumulative_hitratio : 0.88
  cumulative_inserts : 382039
  cumulative_evictions : 365038

queryResultCache (autowarmCount=256)
  lookups : 2303
  hits : 271
  hitratio : 0.11
  inserts : 2308
  evictions : 1774
  size : 512
  cumulative_lookups : 237586
  cumulative_hits : 39555
  cumulative_hitratio : 0.16
  cumulative_inserts : 201009
  cumulative_evictions : 180025

documentCache
  lookups : 58032
  hits : 33759
  hitratio : 0.58
  inserts : 24273
  evictions : 23761
  size : 512
  cumulative_lookups : 6694035
  cumulative_hits : 3906883
  cumulative_hitratio : 0.58
  cumulative_inserts : 2787152
  cumulative_evictions : 2752219

The CPU usage is usually 50%. I give the JVM "java -server -Xmx2048m"
when I start Solr.

Thanks!

Jonathan

On Mon, Apr 14, 2008 at 8:24 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> It's hard to tell from the info given, though something doesn't sound
> ideal. Even if Solr's caching doesn't help, with only 4M documents, your
> Solr search slaves should be able to keep the whole index in RAM, assuming
> your index is not huge.
>
> How large is the index? (GB on disk)
> Is it optimized?
> How often is it changed on the master - i.e. how often does your Searcher
> need to be reopened?
> What are cache hits and evictions like (Solr admin page)?
> What are cache sizes like and how is the warm-up configured?
> Is there any IO on the slaves? (run vmstat or iostat or some such)
> How is the CPU usage looking?
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Jonathan Ariel <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, April 14, 2008 5:50:08 PM
> Subject: too many queries?
>
> Hi,
> I have some questions about performance for you guys.
> So basically I have 2 slave Solr servers and 1 master Solr server, load
> balanced, and around 100 requests/second, approx. 50 requests per second
> per Solr server.
> My index is about 4 million documents and the average query response time
> is 0.6 seconds, retrieving just 4 documents per query.
> What happens is that there are too many requests to Solr, and the rate
> grows every second, so eventually my site stops working.
>
> I don't know if these stats are enough to tell whether the servers should
> be able to handle this amount of requests. Maybe it's a configuration
> problem. I don't think that caching in Solr would help in this case
> because all the queries are different (I'm not sure how caching works,
> but if it's per query it won't help much in this case).
>
> Any thoughts about this?
>
> Thanks!
>
> Jonathan
Re: too many queries?
Filter cache evictions are a big red flag. Try bumping up the size of
your filter cache to avoid regenerating filters.

	Erik

On Apr 15, 2008, at 8:38 AM, Jonathan Ariel wrote:

> filterCache (autowarmCount=256)
>   lookups : 24241
>   hits : 21575
>   hitratio : 0.89
>   inserts : 3708
>   evictions : 3155
>   size : 512
>   cumulative_lookups : 2662056
>   cumulative_hits : 2355474
>   cumulative_hitratio : 0.88
>   cumulative_inserts : 382039
>   cumulative_evictions : 365038
>
> The CPU usage is usually 50%.
> I give the JVM "java -server -Xmx2048m" when I start Solr.
>
> Thanks!
>
> Jonathan
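For anyone tuning this: the cache sizes live in solrconfig.xml. A minimal
sketch of an enlarged filterCache (the size and autowarm values below are
illustrative assumptions to tune against your own eviction numbers, not
figures recommended in this thread):

    <!-- solrconfig.xml: enlarged filter cache; size/initialSize/autowarmCount
         are illustrative, raise them until evictions stay near zero -->
    <filterCache
        class="solr.LRUCache"
        size="16384"
        initialSize="4096"
        autowarmCount="1024"/>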
RE: Slow Highlighting -> CopyField maxSize property
Koji,

The patch is now available at https://issues.apache.org/jira/browse/SOLR-538
Tell me if it fits your needs.

Nicolas

-----Original Message-----
From: Koji Sekiguchi [mailto:[EMAIL PROTECTED]]
Sent: Friday, 21 March 2008 16:50
To: solr-user@lucene.apache.org
Subject: Re: Slow Highlighting -> CopyField maxSize property

Hello Nicolas,

This has been in the back of my mind for a while. Can you make a patch
for it? I'd like to use it.

Thank you,

Koji

[EMAIL PROTECTED] wrote:
> Hi all,
>
> I would like to propose a new property on copy fields that limits the
> number of characters that are copied.
>
> The use case is the following: among other documents, we index very big
> documents (several MB of text) and want to be able to use highlighting.
> However, as soon as one or more big documents are included in the
> matches, the response time is awful. The maxAnalyzedChars parameter is
> not enough, as the full document is loaded into memory before any
> processing is done, and that alone can be very long.
>
> For this kind of situation, we propose to use a dedicated copy field
> for highlighting and to limit the number of characters that are copied.
> For example:
> [the copyField declaration was stripped by the list archive]
>
> This approach also has the advantage of limiting the index size for
> large documents (the original text field does not need to be stored and
> to have term vectors). However, the index is bigger for small
> documents...
>
> Of course, if the only terms that are matched by a query are after the
> limit, no highlight is possible.
>
> What do you think of this feature?
>
> Best regards,
>
> Nicolas
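The copyField example in the proposal above did not survive archiving. A
sketch of what the proposed schema.xml declaration might have looked like,
assuming the maxChars attribute name that the SOLR-538 patch introduces
(the field names and the 50000 limit are illustrative):

    <!-- schema.xml: a dedicated highlighting field that copies at most the
         first 50000 chars of the source; maxChars is assumed from the
         SOLR-538 patch, the value is illustrative -->
    <field name="text" type="text" indexed="true" stored="false"/>
    <field name="text_hl" type="text" indexed="true" stored="true"
           termVectors="true"/>
    <copyField source="text" dest="text_hl" maxChars="50000"/>

With this layout the full "text" field stays lean (indexed only), while
highlighting runs against the bounded "text_hl" copy.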
Re: too many queries?
Thanks. It should be around lookups*1.5, right? Is this measured in bytes?

On Tue, Apr 15, 2008 at 11:26 AM, Erik Hatcher <[EMAIL PROTECTED]> wrote:

> Filter cache evictions are a big red flag. Try bumping up the size of
> your filter cache to avoid regenerating filters.
>
>	Erik
Re: Slow Highlighting -> CopyField maxSize property
Hello Nicolas,

Thank you for letting me know. Yes, your patch will solve my problem
(highlighter performance with large documents). BTW, I posted a similar
ticket to solve another problem of mine (hl.alternateField with large
fields):

https://issues.apache.org/jira/browse/SOLR-516

Thank you again,

Koji

Nicolas DESSAIGNE wrote:
> Koji,
>
> The patch is now available at
> https://issues.apache.org/jira/browse/SOLR-538
> Tell me if it fits your needs.
>
> Nicolas
Re: Fuzzy queries in dismax specs?
: I've started implementing something to use fuzzy queries for selected fields
: in dismax. The request handler spec looks like this:
:
:    exact~0.7^4.0 stemmed^2.0

That's a pretty cool idea ... usually when people talk about adding
support for other query types in dismax they mean in the query syntax,
but you are adding more info to the qf to specify how the field should
be handled in general -- I like it.

I think if I had it to do over again (now that dismax supports multiple
param values, and per-field overrides) I would have made qf and pf
multivalued params containing just the field names, and gotten the boost
value from a per-field overridable fieldBoost param, so adding a
fuzzyDistance param would also be trivial (without needing to parse
crazy syntax).

(hmmm... ps could be a per-field overridable param too ... dismax v2.0
maybe)

-Hoss
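For context, a sketch of where such a spec would sit, using the stock
dismax handler configuration in solrconfig.xml (note that the ~0.7 fuzzy
suffix in qf is the proposed extension under discussion, not syntax that
stock dismax understands; the field names come from the mail):

    <!-- solrconfig.xml: dismax handler with the *proposed* fuzzy qf syntax;
         the ~0.7 suffix is the extension being discussed, not stock Solr -->
    <requestHandler name="dismax" class="solr.DisMaxRequestHandler">
      <lst name="defaults">
        <str name="qf">exact~0.7^4.0 stemmed^2.0</str>
      </lst>
    </requestHandler>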
Re: Interleaved results form different sources
: > We have an index of documents from different sources and we want to make
: > sure the results we display are interleaved from the different sources and
: > not only ranked based on relevancy. Is there a way to do this?
:
: By far the easiest way is to get the top N/2 results from each source and
: interleave on the client side.

Actually, for a search with no a priori information about the results,
you need to fetch N from both sources in case one of them has no matches.

(in a paginated system, assuming you know the total number of results
from the first page, subsequent pages can ask for N/2 from each source
as long as you know that the current page won't exhaust either source)

-Hoss
Re: Interleaved results form different sources
How do you get the top N/2 results from each source? What if you have
more than 2 sources?

Mike Klaas wrote:
>
> By far the easiest way is to get the top N/2 results from each source
> and interleave on the client side.
>
> regards,
> -Mike
Re: Interleaved results form different sources
first query:  q=foo&fq=source:one&rows=5
second query: q=foo&fq=source:two&rows=5

I don't know the answer to your second question, since I don't
understand the use case for interleaving two sources anyway (I would try
to create scores for the sources that were comparable in some way and
combine them using score).

-Mike

On 15-Apr-08, at 10:29 AM, peter360 wrote:

> How do you get the top N/2 results from each source? What if you have
> more than 2 sources?
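Generalizing to more than two sources follows the same pattern: issue one
filtered query per source (fq=source:X&rows=N) and merge the results
round-robin on the client. A minimal client-side sketch of the merge, in
plain Java and independent of any Solr client API (it assumes each
source's results are already fetched into a list, in rank order):

    import java.util.ArrayList;
    import java.util.List;

    public class Interleaver {
        /**
         * Round-robin merge of per-source result lists: take the next
         * result from each source in turn until we have `limit` results
         * or every source is exhausted.
         */
        public static <T> List<T> interleave(List<List<T>> perSource,
                                             int limit) {
            List<T> merged = new ArrayList<T>();
            int rank = 0;          // current position within each source
            boolean tookAny = true;
            while (merged.size() < limit && tookAny) {
                tookAny = false;
                for (List<T> source : perSource) {
                    if (rank < source.size() && merged.size() < limit) {
                        merged.add(source.get(rank));
                        tookAny = true;
                    }
                }
                rank++;
            }
            return merged;
        }
    }

Fetching a full N rows per source (rather than N/k) keeps the page full
even when some sources return few or no matches, per Hoss's point above.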
Re: filtering search using regex
Solr doesn't provide any regex-based searching features out of the box.
There are some regex-based query classes in Lucene; if you wrote a
custom Solr plugin to do the query parsing, you could use them.

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about
"Y" without giving more details about the "X" so that we can understand
the full issue. Perhaps the best solution doesn't involve "Y" at all?

See Also:
http://www.perlmonks.org/index.pl?node_id=542341
http://people.apache.org/~hossman/#xyproblem

If you could elaborate a little more on the exact use case you are
trying to solve, people might be able to offer you alternative solutions
you've never thought of ... supporting regex search is a much harder
problem than finding creative ways to support range queries on unclean
data (which is what the root of your issue seems to be).

Tell us more about your data, and the types of queries you need to
support (without assuming that regexes are the best way to support
them).

-Hoss
Re: too many queries?
Yeah, lots of evictions and tiny caches. Why not increase them? It looks
like you have memory to spare. And since you reopen the searcher so
often, you can play with increasing the warm-up if you want to preserve
more cached items from the previous searcher.

Evictions are measured in the number of occurrences, not bytes.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Jonathan Ariel <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, April 15, 2008 11:03:53 AM
Subject: Re: too many queries?

Thanks. It should be around lookups*1.5, right? Is this measured in
bytes?
Re: Snipets Solr/nutch
Mike Klaas wrote:
>
> On 13-Apr-08, at 3:25 AM, khirb7 wrote:
>>
>> it doesn't work, Solr still uses the default value fragsize=100. Also,
>> I am not able to specify the regex fragmenter, due to this version
>> problem I suppose, or the way I am declaring it in <highlighting>,
>> because both of:
>> [the configuration snippets were stripped by the list archive]
>
> Hi khirb,
>
> It might be easier for people to help you if you keep things in one
> thread.
>
> I notice that you're trying to apply a patch that has long since been
> applied to Solr (another thread). What version of Solr are you
> using? How did you acquire it?
>
> -Mike

Hi Mike,

Thank you a lot, you are very helpful. Concerning my Solr version, I am
using 1.2.0; I downloaded it from the Apache download mirror
http://www.apache.org/dyn/closer.cgi/lucene/solr/. I didn't quite
understand what you meant when you said: "you're trying to apply a patch
that has long since been applied to Solr".

Thank you, Mike.
Re: too many queries?
On 15-Apr-08, at 5:38 AM, Jonathan Ariel wrote:

> My index is 4GB on disk. My servers have 8 GB of RAM each (the OS is
> 32 bits). It is optimized twice a day; it takes around 15 minutes to
> optimize. The index is updated (commits) every two minutes. There are
> between 10 and 100 inserts/updates every 2 minutes.

Caching could help--you should definitely start there.

The commit every 2 minutes could end up being an insurmountable problem.
You may have to partition your data into a large, mostly static set and
a small dynamic set, combining the results at query time.

-Mike
Re: Snipets Solr/nutch
On 15-Apr-08, at 1:37 PM, khirb7 wrote:

> Thank you a lot, you are very helpful. Concerning my Solr version, I am
> using 1.2.0; I downloaded it from the Apache download mirror
> http://www.apache.org/dyn/closer.cgi/lucene/solr/. I didn't quite
> understand what you meant when you said: "you're trying to apply a
> patch that has long since been applied to Solr".

Hi khirb,

You could try looking at "trunk" (the development version of Solr that
hasn't yet been released). It contains all the features you were trying
to add manually to your version. You can download a "nightly" build of
Solr here:

http://people.apache.org/builds/lucene/solr/nightly/

regards,
-Mike