RE: Solr 1.1 HTTP server stops responding
Hi Otis. I'm filling in for the guy who installed the software for us (he's long gone), so I'm just getting familiar with all of this. Can you elaborate on what you mean?

DW

> -----Original Message-----
> From: Otis Gospodnetic
> Sent: Friday, July 27, 2007 10:01 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 1.1 HTTP server stops responding
>
> Hi David,
>
> Have you ruled out your servlet container as the source of this bug?
>
> Otis
Solr 1.1 HTTP server stops responding
Hi All.

We're running Solr 1.1 and we're seeing intermittent cases where Solr stops responding to HTTP requests. It seems like the listener on port 8983 just doesn't respond.

We stop and restart Solr and everything works fine for a few hours, and then the problem returns. We can't seem to point to any single factor that would lead to this problem, and I'm hoping to get some hints on how to diagnose it.

Here's what I can tell you now, and I can provide more info on request:

1) The query load (via /solr/select) isn't that high. Maybe 20 or 30 requests per minute, tops.

2) The insert load (via /solr/update) is very high. We commit almost 500,000 documents per day. We also trim out the same number, however, so the net number of documents should stay around 20 million.

3) We do see Out of Memory errors sometimes, especially when making facet queries (which we do most of the time).

We think Solr is great, and we want to keep using it, but the downtime makes the product (and us) look bad, so we need to solve this soon.

Thanks in advance for your help!

DW
RE: Solr 1.1 HTTP server stops responding
We're using Jetty. I don't know what version, though. To my knowledge, Solr is the only thing running inside it.

Yes, we cannot get to the admin pages either. Nothing on port 8983 responds.

So maybe it's actually Jetty that's messing me up? How can I make sure of that?

Thanks for the help!

DW

> -----Original Message-----
> From: Otis Gospodnetic
> Sent: Friday, July 27, 2007 10:40 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 1.1 HTTP server stops responding
>
> Solr runs as a webapp (think .war file) inside a servlet
> container (e.g. Tomcat, Jetty, Resin...). It could be that the
> servlet container itself has a bug that prevents it from
> responding properly after a while. If you have other webapps
> in the same container, do they still respond? Can you get to
> *any* of Solr's "pages" (e.g. the admin page)? Anything in the
> container or Solr logs?
>
> Otis
> --
> Lucene Consulting - http://lucene-consulting.com/
RE: Solr 1.1 HTTP server stops responding
Hi All.

I'm still hoping to get some insight into how I can solve this issue. If Jetty is the problem I'll happily get rid of it, but I'd feel better if I could do some tests first to be sure I'm solving the problem.

Has anyone else had this problem in the past?

Thanks,

DW
Please help! Solr 1.1 HTTP server stops responding
Guys:

Can anyone help me? Things are getting serious at my company and heads are going to roll. I need to figure out why Solr just suddenly stops responding without any warning.

DW
RE: Please help! Solr 1.1 HTTP server stops responding
Hi Yonik! I'm glad to finally get to talk to you. We're all very impressed with Solr, and when it's running it's really great.

We increased the heap size to 1500M and that didn't seem to help. In fact, the crashes seem to occur more now than ever. We're constantly restarting Solr just to get a response.

I don't know enough to know where the log files are to answer your question (again, I'm filling in for the guy who set us up with all this). Can I ask for your patience so we can figure this out?

Thanks!

Dave W

> -----Original Message-----
> From: Yonik Seeley
> Sent: Monday, July 30, 2007 2:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Please help! Solr 1.1 HTTP server stops responding
>
> It may be related to the out-of-memory errors you were seeing.
> Severe errors like that should never be ignored.
> Do you see any other warning or severe errors in your logs?
>
> -Yonik
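[Editor's note: with the stock Jetty example that ships with Solr, the heap ceiling mentioned above comes from the java command line. A sketch only, since this installation's actual launch script is unknown:

```
# launch the example Jetty/Solr with an explicit heap (paths hypothetical)
java -Xms512m -Xmx1500m -jar start.jar
```

If the heap is being raised somewhere else (an init script, JAVA_OPTS), the -Xmx flag still has to reach the JVM that actually hosts Solr.]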
RE: Please help! Solr 1.1 HTTP server stops responding
Yonik:

> If that's not the problem, you could decrease memory usage
> due to faceting by upgrading to Solr 1.2 and using
> facet.enum.cache.minDf

Is it hard to upgrade from 1.1 to 1.2? We were considering making that change if it wouldn't cost us a lot of downtime.

Can you help me understand what "using facet.enum.cache.minDf" means? Is that a setting in the config file?

Dave W

> -----Original Message-----
> From: Yonik Seeley
> Sent: Monday, July 30, 2007 3:29 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Please help! Solr 1.1 HTTP server stops responding
>
> Grep for PERFORMANCE in the logs to make sure that you aren't
> running into a scenario where more than one searcher is
> warming in the background.
>
> If that's not the problem, you could decrease memory usage
> due to faceting by upgrading to Solr 1.2 and using
> facet.enum.cache.minDf.
>
> -Yonik
>
> On 7/30/07, Kevin Holmes wrote:
> > Just got this:
> >
> > Jul 30, 2007 3:02:14 PM org.apache.solr.core.SolrException log
> > SEVERE: java.lang.OutOfMemoryError: Java heap space
> >
> > Jul 30, 2007 3:02:30 PM org.apache.solr.core.SolrException log
> > SEVERE: java.lang.OutOfMemoryError: Java heap space
> >
> > Kevin Holmes, eNR Services, Inc.
> >
> > -----Original Message-----
> > From: Yonik Seeley
> > Sent: Monday, July 30, 2007 2:55 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Please help! Solr 1.1 HTTP server stops responding
> >
> > On 7/30/07, David Whalen wrote:
> > > We increased the heap size to 1500M and that didn't seem to help.
> > > In fact, the crashes seem to occur more now than ever. We're
> > > constantly restarting solr just to get a response.
> > >
> > > I don't know enough to know where the log files are to
> > > answer your question
> >
> > Me neither ;-)
> > Solr's example app that uses Jetty just has logging going to stdout
> > (the console) to make it clear and visible to new users when an
> > error happens. Hopefully you've configured Jetty to log to files,
> > or at least redirected Jetty's stdout/stderr to a file.
> > You need to look around and try to find those log files.
> > If you find them, one thing to look for would be "WARNING" in the
> > log files. Another thing to look for would be "Exception" or
> > "Memory".
> >
> > > So maybe it's actually Jetty that's messing me up? How can I
> > > make sure of that?
> >
> > Perhaps point your browser at http://localhost:8983/ and see if
> > you get any response at all.
> >
> > -Yonik
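[Editor's note: Yonik's log checks can be sketched as shell commands. The sample file below is fabricated for illustration, but the PERFORMANCE and SEVERE patterns are the ones Solr actually emits:

```shell
# Fabricated log sample standing in for a real Jetty/Solr log file
cat > /tmp/solr_sample.log <<'EOF'
Jul 30, 2007 3:02:14 PM org.apache.solr.core.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
Jul 30, 2007 3:05:02 PM org.apache.solr.core.SolrCore log
PERFORMANCE WARNING: Overlapping onDeckSearchers=2
EOF

# Overlapping searcher warmups show up as PERFORMANCE warnings:
grep -c "PERFORMANCE" /tmp/solr_sample.log
# → 1

# Memory trouble shows up as SEVERE / OutOfMemoryError entries:
grep -E "SEVERE|OutOfMemoryError|WARNING" /tmp/solr_sample.log
```

On a real installation, point these at wherever Jetty's stdout/stderr was redirected.]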
faceting on multiple columns
Hi All.

I am using facets to help me build an ajax-driven tree for search results. When the search is first run, all I need to do is show the counts per facet, for example:

  search results for "fred"
  +--A (102)
  +--B (234)
  +--C (721)
  +--D (512)

Sounds simple, but I also need to break down the results from "D" by a different index in Lucene:

  search results for "fred"
  +--A (102)
  +--B (234)
  +--C (721)
  +--D (512)
     +--D1 (19)
     +--D2 (34)
     +--D3 (45)

What I have been doing in my Solr querystring looks like this:

  rows=0&facet=true&facet.limit=-1&facet.field=&facet.field=

Unfortunately we're seeing really bad performance and we're constantly running out of heap space on this type of query. So, my question is, would breaking this into two calls perform better? That is:

  rows=0&facet=true&facet.limit=-1&facet.field=

and then:

  rows=0&facet=true&facet.limit=-1&facet.field=

It seems to me that two calls would have more overhead than one, but it might lessen the impact on the heap space on my server. Anyone work enough with facets to throw in their two cents?

Thanks!

Dave W.
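[Editor's note: the one-call vs. two-call split above can be sketched as query strings. The field names here are made-up placeholders; the real ones were lost from the archived message:

```python
from urllib.parse import urlencode

# Common parameters from the post; field names below are hypothetical.
base = [("q", "fred"), ("rows", "0"), ("facet", "true"), ("facet.limit", "-1")]

# One request faceting on both fields (repeat facet.field per field):
single = urlencode(base + [("facet.field", "category"),
                           ("facet.field", "subcategory")])

# The same work split into two requests:
first = urlencode(base + [("facet.field", "category")])
second = urlencode(base + [("facet.field", "subcategory")])

print(single)
# → q=fred&rows=0&facet=true&facet.limit=-1&facet.field=category&facet.field=subcategory
```

Splitting changes *when* the memory is used rather than how much each facet computation needs, so measuring both forms against a realistic index is the safer way to answer the heap question.]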
RE: Any clever ideas to inject into solr? Without http?
What we're looking for is a way to inject *without* using curl, or wget, or any other HTTP-based communication. We'd like for the HTTP daemon to only handle search requests, not indexing requests on top of them. Plus, I have to believe there's a faster way to get documents into Solr/Lucene than using curl.

david whalen
senior applications developer
eNR Services, Inc.

> -----Original Message-----
> From: Clay Webster
> Sent: Thursday, August 09, 2007 11:43 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Any clever ideas to inject into solr? Without http?
>
> Condensing the loader into a single executable sounds right
> if you have performance problems. ;-)
>
> You could also try adding multiple documents in a single post if
> you notice your problems are with TCP setup time, though if
> you're doing localhost connections that should be minimal.
>
> If you're already local to the Solr server, you might check
> out the CSV slurper: http://wiki.apache.org/solr/UpdateCSV
> It's a little specialized.
>
> And then there's of course the question of "are you doing
> full re-indexing or incremental indexing of changes?"
>
> --cw
>
> On 8/9/07, Kevin Holmes wrote:
> > I inherited an existing (working) Solr indexing script that
> > runs like this:
> >
> > A Python script queries the MySQL DB, then calls a bash script.
> > The bash script performs a curl POST submit to Solr.
> >
> > We're injecting about 1000 records/minute (constantly),
> > frequently pushing the edge of our CPU / RAM limitations.
> >
> > I'm in the process of building a Perl script to use DBI and
> > lwp::simple::post that will perform this all from a single
> > script (instead of 3).
> >
> > Two specific questions:
> >
> > 1: Does anyone have a clever (or better) way to perform this
> > process efficiently?
> >
> > 2: Is there a way to inject into Solr without using POST /
> > curl / http?
> >
> > Admittedly, I'm no Solr expert - I'm starting from someone
> > else's setup, trying to reverse-engineer my way out. Any input
> > would be greatly appreciated.
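[Editor's note: Clay's suggestion of batching several documents into one post can be sketched with Solr's XML update format, where one `<add>` element carries many `<doc>` elements. Field names below are illustrative, not the poster's schema:

```python
import xml.etree.ElementTree as ET

def build_add_payload(docs):
    """Build one Solr XML <add> payload holding many documents,
    so a single HTTP POST carries a whole batch instead of one doc."""
    add = ET.Element("add")
    for doc in docs:
        d = ET.SubElement(add, "doc")
        for name, value in doc.items():
            f = ET.SubElement(d, "field", name=name)
            f.text = str(value)
    return ET.tostring(add, encoding="unicode")

# Hypothetical records -- in the thread's setup these would come from MySQL.
batch = [{"id": 1, "title": "first"}, {"id": 2, "title": "second"}]
payload = build_add_payload(batch)
print(payload)
```

Posting one payload like this per N rows amortizes the per-request overhead the thread is worried about, whatever client (curl, LWP, urllib) actually sends it.]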
Problem with stemming
Hi All.

We're running into a problem with stemming that I can't figure out. For example, searching for the word "transit" (whether in quotes or not) returns documents with the word "transition" in them.

How do I disable this? We want our engine to be as literal as possible. If a user mis-types a word, that's too bad for them.

TIA,

DW
RE: Problem with stemming
Yonik:

I only raised the question to the group after I had looked in the schema.xml. There are a lot of comments in that file, but they make no sense to me.

I'd appreciate some specific help on what to do...

DW

> -----Original Message-----
> From: Yonik Seeley
> Sent: Monday, August 13, 2007 3:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Problem with stemming
>
> Use a different field type for those fields that you want
> exact matching for (and then re-index).
> Read through schema.xml if you haven't... there are quite a
> few comments in there.
> You may want a field type with just a whitespace tokenizer
> followed by a lowercase filter.
>
> -Yonik
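[Editor's note: as a sketch of Yonik's suggestion, a field type with just a whitespace tokenizer and a lowercase filter (no Porter stemmer) would look roughly like this in schema.xml. The type name is invented; the factory classes are Solr's stock analysis components:

```xml
<!-- Exact-match text: split on whitespace, lowercase, no stemming -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The relevant `<field>` entries would then be pointed at this type, followed by a full re-index, as Yonik notes.]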
RE: Problem with stemming
Thanks, guys. I'm sure that by the time I get the book and learn all about Lucene, the CEO of my company will have insisted we find another search engine. But the book will look great on my coffee table.

> -----Original Message-----
> From: Lance Norskog
> Sent: Monday, August 13, 2007 4:37 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Problem with stemming
>
> (Oops, try again.)
>
> You need this book:
>
> http://www.amazon.com/Lucene-Action-Erik-Hatcher/dp/1932394281
>
> Lucene in Action by Erik Hatcher and Otis Gospodnetic. It does
> not cover Solr really, but you will understand what Lucene does
> and how it works. Until then you will not really get anywhere.
>
> Cheers,
>
> Lance
RE: Problem with stemming
So I shut it off by removing these tags from my schema.xml file? Seems like it's this Porter thing that's messing me up.

> -----Original Message-----
> From: Tom Mastre
> Sent: Monday, August 13, 2007 5:19 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Problem with stemming
>
> Go here:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-88cc86e4432b359030cffdb32d095062b843d4f5
>
> Look for this:
>
> solr.PorterStemFilterFactory
>
> Thomas M. Mastre
> Manager, Homeland Security Digital Library
> Center for Homeland Defense and Security
> http://www.hsdl.org
Effects of changing schema?
Hi All. I'm unclear on whether changing the schema.xml file automatically causes a reindex or not. If I'm adding a field to the schema (and removing some unused ones), does Solr do the reindex, or do I have to kick it off myself? Ideally, we'd like to avoid a reindex... Thanks! DW
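(For anyone reading this in the archive: Solr does not reindex existing documents when schema.xml changes; the index on disk keeps whatever analysis was in effect when each document was added. A manual rebuild looks roughly like this -- the paths assume the stock example layout:)

```
# stop the servlet container, then wipe the old index
rm -rf $SOLR_HOME/data/index
# restart the container -- Solr creates an empty index under the new schema
# re-post every source document to /solr/update, then send <commit/>
```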
searching where a value is not null?
Hi all. I'm trying to construct a query that in pseudo-code would read like this: field != '' I'm finding it difficult to write this as a Solr query, though. Stuff like: NOT field:() doesn't seem to do the trick. Any ideas? DW
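One suggestion (not from the original thread): an open-ended range query matches any document where the field holds a value, and its negation matches the missing/empty case. Older Solr versions are safest with an explicit `*:*` in front of a pure negation:

```
field:[* TO *]             -- documents where "field" has some value
*:* AND -field:[* TO *]    -- documents where "field" is missing
```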
quirks with sorting
Hi All. I'm seeing a weird problem with sorting that I can't figure out. I have a query that uses two fields -- a "source" column and a date column. I search on the source and I sort by the date descending. What I'm seeing is that depending on the value in the source, the date sort works in reverse. For example, the query: content_source:(mv); content_date desc returns 2007-09-10T09:25:00.000Z in its first row, which is what I expect. BUT, the query: content_source:(thomson); content_date desc returns 2008-08-17T00:00:00.000Z, which is the first date we put into SOLR. So, simply by changing the value in the field, the sort seems to be reversed (or ignored outright). Now, before you ask, I did a "sanity-check" query to make sure that there is in fact data for that source from today, and there is. Can anyone help shed some light on this? TIA DW
RE: quirks with sorting
You know, I must have looked at that date 10 times and I never noticed the year. Sorry everyone! > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf > Of Yonik Seeley > Sent: Monday, September 10, 2007 11:23 AM > To: solr-user@lucene.apache.org > Subject: Re: quirks with sorting > > On 9/10/07, David Whalen <[EMAIL PROTECTED]> wrote: > > I'm seeing a weird problem with sorting that I can't figure out. > > > > I have a query that uses two fields -- a "source" column and a date > > column. I search on the source and I sort by the date descending. > > > > What I'm seeing is that depending on the value in the > source, the date > > sort works in reverse. > > > > For example, the query: > > > > content_source:(mv); content_date desc > > > > returns 2007-09-10T09:25:00.000Z in its first row, which is what I > > expect. > > > > BUT, the query: > > > > content_source:(thomson); content_date desc > > > > returns 2008-08-17T00:00:00.000Z, which is the first date > we put into > > SOLR. > > Isn't it the last (highest date) since it's 2008? > > -Yonik >
Selecting Distinct values?
Hi there. Is there a query I can use to select distinct values in an index? I thought I could use a facet, but the facets don't seem to return all the distinct values in the index, only the highest-count ones. Is there another query I can try? Or, can I adjust the facets somehow to make this work? Thanks, DW
RE: Selecting Distinct values?
Silly me. Thanks! > -Original Message- > From: Mike Klaas [mailto:[EMAIL PROTECTED] > Sent: Thursday, September 27, 2007 4:46 PM > To: solr-user@lucene.apache.org > Subject: Re: Selecting Distinct values? > > On 27-Sep-07, at 12:01 PM, David Whalen wrote: > > > Hi there. > > > > Is there a query I can use to select distinct values in an index? > > I thought I could use a facet, but the facets don't seem to > return all > > the distinct values in the index, only the highest-count ones. > > > > Is there another query I can try? Or, can I adjust the > facets somehow > > to make this work? > > http://wiki.apache.org/solr/SimpleFacetParameters#head-1b28106 > 7d007d3fb66f07a3e90e9b1704cbc59a3 > > cheers, > -Mike > >
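For the archive, the SimpleFacetParameters page Mike links boils down to a request along these lines (the parameter names are standard Solr facet parameters; the field name is illustrative):

```
/solr/select?q=*:*&rows=0&facet=true&facet.field=myfield&facet.limit=-1&facet.mincount=1
```

`facet.limit=-1` lifts the default cap on returned facet values, which is why only the highest-count values were coming back before.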
Availability Issues
Hi All. I'm seeing all these threads about availability and I'm wondering why my situation is so different than others'. We're running SOLR 1.2 with a 2.5G heap size. On any given day, the system becomes completely unresponsive. We can't even get /solr/admin/ to come up, much less any select queries. The only thing we can do is kill the SOLR process and re-start it. We are indexing over 25 million documents and we add about as much as we remove daily, so the number remains fairly constant. Again, it seems like other folks are having a much easier time with SOLR than we are. Can anyone help by sharing how you've got it configured? Does anyone have a similar experience? TIA. DW
RE: Availability Issues
Hi Tom. The logs show nothing but regular activity. We do a "tail -f" on the logfile and we can read it during the unresponsive period and we don't see any errors. I've attached our schema/config files. They are pretty much out-of-the-box values, except for our index. Dave > -Original Message- > From: Tom Hill [mailto:[EMAIL PROTECTED] > Sent: Monday, October 08, 2007 2:22 PM > To: solr-user@lucene.apache.org > Subject: Re: Availability Issues > > Hi - > > We're definitely not seeing that. What do your logs show? > What do your schema/solrconfig look like? > > Tom > > > On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote: > > > > Hi All. > > > > I'm seeing all these threads about availability and I'm > wondering why > > my situation is so different than others'. > > > > We're running SOLR 1.2 with a 2.5G heap size. On any given > day, the > > system becomes completely unresponsive. > > We can't even get /solr/admin/ to come up, much less any select > > queries. > > > > The only thing we can do is kill the SOLR process and re-start it. > > > > We are indexing over 25 million documents and we add about > as much as > > we remove daily, so the number remains fairly constant. > > > > Again, it seems like other folks are having a much easier time with > > SOLR than we are. Can anyone help by sharing how you've got it > > configured? Does anyone have a similar experience? > > > > TIA. > > > > DW > > > > > >
RE: Availability Issues
Hi Yonik. > What version of Solr are you running? We're running: Solr Specification Version: 1.2.2007.08.24.08.06.00 Solr Implementation Version: nightly ${svnversion} - yonik - 2007-08-24 08:06:00 Lucene Specification Version: 2.2.0 Lucene Implementation Version: 2.2.0 548010 - buschmi - 2007-06-16 23:15:56 > Is the CPU pegged at 100% when it's unresponsive? It's a little difficult to be sure. We have a HT box and the CPU % we get back is misleading. I think it's safe to say we may spike up to 100% but we don't necessarily stay pegged there. > Have you taken a thread dump to see what is going on? We can't do it b/c during the unresponsive time we can't access the admin site (/solr/admin) at all. I don't know how to do a thread dump via the command line. > Do you get into a situation where more than one searcher is > warming at a time? (there is configuration that can prevent > this one from happening). Forgive me when I say I'm not totally clear on what this question means. The index is constantly getting hit with a myriad of queries, if that's what you meant. Thanks, Dave > -Original Message- > From: Yonik Seeley [mailto:[EMAIL PROTECTED] > Sent: Monday, October 08, 2007 2:23 PM > To: solr-user@lucene.apache.org > Subject: Re: Availability Issues > > On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote: > > We're running SOLR 1.2 with a 2.5G heap size. On any given > day, the > > system becomes completely unresponsive. > > We can't even get /solr/admin/ to come up, much less any select > > queries. > > What version of Solr are you running? > The first step to diagnose something like this is to figure > out what is going on... > Is the CPU pegged at 100% when it's unresponsive? > Have you taken a thread dump to see what is going on? > Do you get into a situation where more than one searcher is > warming at a time? (there is configuration that can prevent > this one from happening). > > -Yonik > >
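A quick sketch of taking a thread dump from the command line, with no admin UI involved (this assumes a JDK on the box and a single Java process for Solr -- adjust the process pattern to your launcher):

```
pid=$(pgrep -f start.jar)     # find the Jetty/Solr JVM
jstack "$pid" > solr-threads.txt
# or, if jstack isn't available:
kill -3 "$pid"                # SIGQUIT; the dump goes to the JVM's stdout log
```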
RE: Availability Issues
Hi Yonik. > Do you see any requests that took a really long time to finish? The requests that take a long time to finish are just simple queries. And the same queries run at a later time come back much faster. Our logs contain 99% inserts and 1% queries. We are constantly adding documents to the index at a rate of 10,000 per minute, so the logs show mostly that. > Start with the thread dump. > I bet it's multiple queries piling up around some > synchronization points in lucene (sometimes caused by > multiple threads generating the same big filter that isn't > yet cached). What would be my next steps after that? I'm not sure I'd understand enough from the dump to make heads-or-tails of it. Can I share that here? Dave > -Original Message- > From: Yonik Seeley [mailto:[EMAIL PROTECTED] > Sent: Monday, October 08, 2007 3:01 PM > To: solr-user@lucene.apache.org > Subject: Re: Availability Issues > > On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote: > > The logs show nothing but regular activity. We do a "tail -f" > > on the logfile and we can read it during the unresponsive > period and > > we don't see any errors. > > You don't see log entries for requests until after they complete. > When a server becomes unresponsive, try shutting off further > traffic to it, and let it finish whatever requests it's > working on (assuming that's the issue) so you can see them in > the log. Do you see any requests that took a really long > time to finish? > > -Yonik > >
RE: Availability Issues
> Oh, so you are using the same boxes for updating and querying? Yep. We have a MySQL database on the box and we query it and POST directly into SOLR via wget in Perl. We then also hit the box for queries. [We'd be very interested in hearing about best practices on how to separate out the data from the index and how to balance them when the inserts outweigh the selects by factors of 50,000:1] > When you insert, are you using multiple threads? If so, how many? We're not threading at all. We have a Perl script that does a select statement out of a MySQL database and runs POSTs sequentially into SOLR, one per document. After a batch of 10,000 POSTs, we run a background commit (using waitFlush and waitSearcher). Again, I'd be very grateful for success stories from people in terms of good server architecture. We are ready and willing to change versions of linux, of the Java container, etc. And we're ready to add more boxes if that'll help. We just need some guidance. > What is the full URL of those slow query requests? They can be anything. For example: [08/10/2007:18:51:55 +] "GET /solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on HTTP/1.1" 200 45799 > Do the slow requests start after a commit? Based on the way the logs read, you could argue that point. The stream of POSTs ends in the logs and then subsequent queries take longer to run, but it's hard to be sure there's a direct correlation. > Yes, post it here. Most likely a majority of the threads > will be blocked somewhere deep in lucene code, and you will > probably need help from people here to figure it out. Next time it happens I'll shoot it over. --Dave > -Original Message- > From: Yonik Seeley [mailto:[EMAIL PROTECTED] > Sent: Monday, October 08, 2007 3:42 PM > To: solr-user@lucene.apache.org > Subject: Re: Availability Issues > > On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote: > > > Do you see any requests that took a really long time to finish? 
> > > > The requests that take a long time to finish are just > simple queries. > > And the same queries run at a later time come back much faster. > > > > Our logs contain 99% inserts and 1% queries. We are > constantly adding > > documents to the index at a rate of 10,000 per minute, so the logs > > show mostly that. > > Oh, so you are using the same boxes for updating and querying? > When you insert, are you using multiple threads? If so, how many? > > What is the full URL of those slow query requests? > Do the slow requests start after a commit? > > > > Start with the thread dump. > > > I bet it's multiple queries piling up around some synchronization > > > points in lucene (sometimes caused by multiple threads generating > > > the same big filter that isn't yet cached). > > > > What would be my next steps after that? I'm not sure I'd > understand > > enough from the dump to make heads-or-tails of it. Can I > share that > > here? > > Yes, post it here. Most likely a majority of the threads > will be blocked somewhere deep in lucene code, and you will > probably need help from people here to figure it out. > > -Yonik > >
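The one-document-per-POST loop described above carries a lot of HTTP overhead at 10,000 documents per minute; a common alternative is to pack many documents into a single `<add>` message and POST it once. A minimal sketch in Python (the field names, batch size, and URL are illustrative, not taken from the original script):

```python
from xml.sax.saxutils import escape

def batch_add_xml(docs):
    """Render a batch of documents as one Solr <add> update message."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # escape the text node so & < > in article bodies stay legal XML
            parts.append('<field name="%s">%s</field>' % (name, escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)

docs = [{"id": str(i), "text": "article %d" % i} for i in range(100)]
payload = batch_add_xml(docs)
# POST `payload` once to http://localhost:8983/solr/update,
# then send <commit/> separately -- instead of 100 individual POSTs.
```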
RE: Availability Issues
Hi Chris. My logs don't look anything like that. They look like HTTP requests. Am I looking in the wrong place? Dave > -Original Message- > From: Chris Hostetter [mailto:[EMAIL PROTECTED] > Sent: Monday, October 08, 2007 5:02 PM > To: solr-user > Subject: RE: Availability Issues > > > : > Do the slow requests start after a commit? > : > : Based on the way the logs read, you could argue that point. > : The stream of POSTs end in the logs and then subsequent queries > : take longer to run, but it's hard to be sure there's a direct > : correlation. > > you would know based on the INFO level messages related to a > commit ... > you'll see messages that look like this when the commit starts... > > Oct 8, 2007 1:56:48 PM > org.apache.solr.update.DirectUpdateHandler2 commit > INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true) > > ...then you'll see a message like this... > > Oct 8, 2007 1:56:48 PM > org.apache.solr.update.DirectUpdateHandler2 commit > INFO: end_commit_flush > > ...if you have autowarming you'll see a bunch of logs about > that, and then eventually you'll see a message like this... > > Oct 8, 2007 1:56:48 PM > org.apache.solr.update.processor.LogUpdateProcessor finish > INFO: {commit=} 0 299 > > ...the important question is how many of these hangs or > really long queries happen in the midst of all that ... how > many happen very quickly after it (which may indicate not > enough warming) > > (NOTE: some of those log messages may look different in your > nightly snapshot version, but the main gist should be the > same .. i don't remember when exactly the LogUpdateProcessor > was added). > > > > > -Hoss > > >
RE: Availability Issues
Thanks for letting me know that. Okay, here they are: BEGIN SCHEMA.XML === id text END SCHEMA.XML === BEGIN CONFIG.XML === false true 10 1000 2147483647 1 1000 1 true 10 1000 2147483647 1 true 1024 false 10 false explicit 50 10 * 2.1 --> explicit 0.01 text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9 ord(poplarity)^0.5 recip(rord(price),1,1000,1000)^0.3 id,name,price,score 2<-1 5<-2 6<90% 100 explicit text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 2<-1 5<-2 6<90% incubationdate_dt:[* TO NOW/DAY-1MONTH]^2.2 inStock:true cat manu_exact price:[* TO 500] price:[500 TO *] inStock:true text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 2<-1 5<-2 6<90% 5 solr solrconfig.xml schema.xml admin-extra.html qt=dismax&q=solr&start=3&fq=id:[* TO *]&fq=cat:[* TO *] END CONFIG.XML === > -Original Message- > From: Chris Hostetter [mailto:[EMAIL PROTECTED] > Sent: Monday, October 08, 2007 4:56 PM > To: solr-user > Subject: RE: Availability Issues > > : I've attached our schema/config files. They are pretty much > : out-of-the-box values, except for our index. > > FYI: the mailing list strips most attachemnts ... the best > thing to do is just inline them in your mail. > > Quick question: do you have autoCommit turned on in your > solrconfig.xml? > > Second question: do you have autowarming on your caches? > > > > -Hoss > > >
RE: Availability Issues
Chris: We're using Jetty also, so I get the sense I'm looking at the wrong log file. On that note -- I've read that Jetty isn't the best servlet container to use in these situations, is that your experience? Dave > -Original Message- > From: Chris Hostetter [mailto:[EMAIL PROTECTED] > Sent: Monday, October 08, 2007 11:20 PM > To: solr-user > Subject: RE: Availability Issues > > > : My logs don't look anything like that. They look like HTTP > : requests. Am I looking in the wrong place? > > what servlet container are you using? > > every servlet container handles application logs differently > -- it's especially tricky because even the format can be > changed, the examples i gave before are in the default format > you get if you use the jetty setup in the solr example (which > logs to stdout), but many servlet containers won't include > that much detail by default (they typically leave out the > classname and method name). there's also typically a setting > that controls the verbosity -- so in some configurations only > the SEVERE messages are logged and in others the INFO > messages are logged ... you're going to want at least the > INFO level to debug stuff. > > grep all the log files you can find for "Solr home set to" > ... that's one of the first messages Solr logs. if you can > find that, you'll find the other messages i was talking about. > > > -Hoss > > >
RE: Availability Issues
All: How can I break up my install onto more than one box? We've hit a learning curve here and we don't understand how best to proceed. Right now we have everything crammed onto one box because we don't know any better. So, how would you build it if you could? Here are the specs: a) the index needs to hold at least 25 million articles b) the index is constantly updated at a rate of 10,000 articles per minute c) we need to have faceted queries Again, real-world experience is preferred here over book knowledge. We've tried to read the docs and it's only made us more confused. TIA Dave W > -Original Message- > From: Yonik Seeley [mailto:[EMAIL PROTECTED] > Sent: Monday, October 08, 2007 3:42 PM > To: solr-user@lucene.apache.org > Subject: Re: Availability Issues > > On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote: > > > Do you see any requests that took a really long time to finish? > > > > The requests that take a long time to finish are just > simple queries. > > And the same queries run at a later time come back much faster. > > > > Our logs contain 99% inserts and 1% queries. We are > constantly adding > > documents to the index at a rate of 10,000 per minute, so the logs > > show mostly that. > > Oh, so you are using the same boxes for updating and querying? > When you insert, are you using multiple threads? If so, how many? > > What is the full URL of those slow query requests? > Do the slow requests start after a commit? > > > > Start with the thread dump. > > > I bet it's multiple queries piling up around some synchronization > > > points in lucene (sometimes caused by multiple threads generating > > > the same big filter that isn't yet cached). > > > > What would be my next steps after that? I'm not sure I'd > understand > > enough from the dump to make heads-or-tails of it. Can I > share that > > here? > > Yes, post it here. 
Most likely a majority of the threads > will be blocked somewhere deep in lucene code, and you will > probably need help from people here to figure it out. > > -Yonik > >
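For the record, the stock Solr 1.2 answer to this split is "collection distribution": one master box takes the whole update stream, and read-only query slaves pull index snapshots from it over rsync. The script names below are the real ones shipped in Solr's src/scripts directory, though the wiring shown is only a sketch:

```
# on the master, after each commit/optimize:
snapshooter       # takes a hard-link snapshot of the index
# on each query slave, from cron:
snappuller        # rsyncs the latest snapshot from the master
snapinstaller     # swaps it in and opens a new searcher
```

Faceted queries then run against the slaves, so the 10,000-docs/minute insert stream never competes with searches for the same box.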
Facets and running out of Heap Space
Hi All. I run a faceted query against a very large index on a regular schedule. Every now and then the query throws an out of heap space error, and we're sunk. So, naturally we increased the heap size and things worked well for a while and then the errors would happen again. We've increased the initial heap size to 2.5GB and it's still happening. Is there anything we can do about this? Thanks in advance, Dave W
RE: Facets and running out of Heap Space
Hi Yonik. According to the doc: > This is only used during the term enumeration method of > faceting (facet.field type faceting on multi-valued or > full-text fields). What if I'm faceting on just a plain String field? It's not full-text, and I don't have multiValued set for it Dave > -Original Message- > From: Yonik Seeley [mailto:[EMAIL PROTECTED] > Sent: Tuesday, October 09, 2007 12:47 PM > To: solr-user@lucene.apache.org > Subject: Re: Facets and running out of Heap Space > > On 10/9/07, David Whalen <[EMAIL PROTECTED]> wrote: > > I run a faceted query against a very large index on a regular > > schedule. Every now and then the query throws an out of heap space > > error, and we're sunk. > > > > So, naturally we increased the heap size and things worked > well for a > > while and then the errors would happen again. > > We've increased the initial heap size to 2.5GB and it's still > > happening. > > > > Is there anything we can do about this? > > Try facet.enum.cache.minDf param: > http://wiki.apache.org/solr/SimpleFacetParameters > > -Yonik > >
RE: Facets and running out of Heap Space
> Then you will be using the FieldCache counting method, and > this param is not applicable :-) Are all your field that you > facet on like this? Unfortunately yes. Could I improve my situation by changing them to multiValued? _____ david whalen senior applications developer eNR Services, Inc. [EMAIL PROTECTED] 203-849-7240 > -Original Message- > From: Yonik Seeley [mailto:[EMAIL PROTECTED] > Sent: Tuesday, October 09, 2007 2:14 PM > To: solr-user@lucene.apache.org > Subject: Re: Facets and running out of Heap Space > > On 10/9/07, David Whalen <[EMAIL PROTECTED]> wrote: > > > This is only used during the term enumeration method of faceting > > > (facet.field type faceting on multi-valued or full-text fields). > > > > What if I'm faceting on just a plain String field? It's not > > full-text, and I don't have multiValued set for it > > Then you will be using the FieldCache counting method, and > this param is not applicable :-) Are all your field that you > facet on like this? > > The FieldCache entry might be taking up too much room, esp if > the number of entries is high, and the entries are big. The > requests themselves can take up a good chunk of memory > temporarily (4 bytes * nValuesInField). > > You could try a memory profiling tool and see where all the > memory is being taken up too. > > -Yonik > >
RE: Facets and running out of Heap Space
> is this the same 25,000,000 document index you mentioned before? Yep. > how big is your index on disk? are you faceting or sorting on > other fields as well? running 'du -h' on my index directory returns 86G. We facet on almost all of our index fields (they were added to the index solely for that purpose, otherwise we'd remove them). Here's the meaty part of the config again: I'm sure we could stop storing many of these columns, especially if someone told me that would make a big difference. > what does the LukeReqeust Handler tell you about the # of > distinct terms in each field that you facet on? Where would I find that? I could probably estimate that myself on a per-column basis. it ranges from 4 distinct values for media_type to 30-ish for location to 200-ish for country_code to almost 10,000 for site_id to almost 100,000 for journalist_id. Thanks very much for your help so far, Chris! Dave > -Original Message- > From: Chris Hostetter [mailto:[EMAIL PROTECTED] > Sent: Tuesday, October 09, 2007 2:48 PM > To: solr-user > Subject: Re: Facets and running out of Heap Space > > > : So, naturally we increased the heap size and things worked > : well for a while and then the errors would happen again. > : We've increased the initial heap size to 2.5GB and it's > : still happening. > > is this the same 25,000,000 document index you mentioned before? > > 2.5GB of heap doesn't seem like much if you are also doing > faceting ... 
> even if you are faceting on an int field, there's going to be > 95MB of FieldCache for that field, you said this was a string > field, so it's going to be 95MB+however much space is needed > for all the terms (presumably if you are faceting on this > field every doc doesn't have a unique value, but even > assuming a conservative 10% unique values of 10 characters > each that's another ~50MB, so we're up to about 150MB of > FieldCache to facet that field -- and we haven't even started > talking about how big the index is itself (or how big the > filterCache gets, or how many other fields you are faceting on) > > how big is your index on disk? are you faceting or sorting on > other fields as well? > > what does the LukeReqeust Handler tell you about the # of > distinct terms in each field that you facet on? > > > > > -Hoss > > >
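Hoss's numbers can be reproduced with quick arithmetic: a FieldCache entry for a string field holds a 4-byte ord per document plus the distinct term strings themselves (about 2 bytes per character in Java). The 10%-unique / 10-character figures are his stated guesses, not measurements:

```python
def fieldcache_mb(n_docs, unique_ratio=0.10, term_chars=10):
    """Rough FieldCache footprint for one faceted string field, in MiB."""
    ord_bytes = n_docs * 4                                    # one int ord per doc
    term_bytes = int(n_docs * unique_ratio) * term_chars * 2  # UTF-16 chars, no overhead
    return ord_bytes / 2.0**20, (ord_bytes + term_bytes) / 2.0**20

ord_mb, total_mb = fieldcache_mb(25_000_000)
# ord_mb lands around 95 MiB and total_mb around 143 MiB, matching the
# "95MB ... about 150MB" estimate above -- and that is per faceted field.
```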
RE: Facets and running out of Heap Space
> Make sure you have: > <requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" /> > defined in solrconfig.xml What's the consequence of me changing the solrconfig.xml file? Doesn't that require a restart of Solr? > for a large index, this can be very slow but the results are valuable. In what way? I'm still not clear on what this does for me. > -Original Message- > From: Ryan McKinley [mailto:[EMAIL PROTECTED] > Sent: Tuesday, October 09, 2007 4:01 PM > To: solr-user@lucene.apache.org > Subject: Re: Facets and running out of Heap Space > > > > >> what does the LukeReqeust Handler tell you about the # of distinct > >> terms in each field that you facet on? > > > > Where would I find that? > > check: > http://wiki.apache.org/solr/LukeRequestHandler > > Make sure you have: > <requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" /> > defined in solrconfig.xml > > for a large index, this can be very slow but the results are valuable. > > ryan > >
RE: Facets and running out of Heap Space
It looks now like I can't use facets the way I was hoping to because the memory requirements are impractical. So, as an alternative I was thinking I could get counts by doing rows=0 and using filter queries. Is there a reason to think that this might perform better? Or, am I simply moving the problem to another step in the process? DW > -Original Message- > From: Stu Hood [mailto:[EMAIL PROTECTED] > Sent: Tuesday, October 09, 2007 10:53 PM > To: solr-user@lucene.apache.org > Subject: Re: Facets and running out of Heap Space > > > Using the filter cache method on the things like media type and > > location; this will occupy ~2.3MB of memory _per unique value_ > > Mike, how did you calculate that value? I'm trying to tune my > caches, and any equations that could be used to determine > some balanced settings would be extremely helpful. I'm in a > memory limited environment, so I can't afford to throw a ton > of cache at the problem. > > (I don't want to thread-jack, but I'm also wondering whether > anyone has any notes on how to tune cache sizes for the > filterCache, queryResultCache and documentCache). > > Thanks, > Stu > > > -Original Message- > From: Mike Klaas <[EMAIL PROTECTED]> > Sent: Tuesday, October 9, 2007 9:30pm > To: solr-user@lucene.apache.org > Subject: Re: Facets and running out of Heap Space > > On 9-Oct-07, at 12:36 PM, David Whalen wrote: > > >(snip) > > I'm sure we could stop storing many of these columns, > especially if > >someone told me that would make a big difference. > > I don't think that it would make a difference in memory > consumption, but storage is certainly not necessary for > faceting. Extra stored fields can slow down search if they > are large (in terms of bytes), but don't really occupy extra > memory, unless they are polluting the doc cache. Does 'text' > need to be stored? > > > >> what does the LukeReqeust Handler tell you about the # of distinct > >> terms in each field that you facet on? > > > > Where would I find that? 
I could probably estimate that > myself on a > > per-column basis. it ranges from 4 distinct values for > media_type to > > 30-ish for location to 200-ish for country_code to almost > 10,000 for > > site_id to almost 100,000 for journalist_id. > > Using the filter cache method on the things like media type > and location; this will occupy ~2.3MB of memory _per unique > value_, so it should be a net win for those (although quite > close in space requirements for a 30-ary field on your index size). > > -Mike > >
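One plausible way to arrive at Mike's "~2.3MB per unique value" figure (my reconstruction, not his stated math): each cached filter is a bitset with one bit per document in the index, so its size depends only on the total document count, not on how many documents match:

```python
def filter_bitset_mb(max_doc):
    """Size of one filterCache bitset in MiB: one bit per document."""
    return max_doc / 8 / 2.0**20   # bits -> bytes -> MiB

mb = filter_bitset_mb(20_000_000)
# roughly 2.4 MiB per cached facet value on a ~20M-doc index,
# in line with the 2.3MB quoted above
```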
RE: Facets and running out of Heap Space
According to Yonik I can't use minDf because I'm faceting on a string field. I'm thinking of changing it to a tokenized type so that I can utilize this setting, but then I'll have to rebuild my entire index. Unless there's some way around that? > -Original Message- > From: Mike Klaas [mailto:[EMAIL PROTECTED] > Sent: Wednesday, October 10, 2007 4:56 PM > To: solr-user@lucene.apache.org > Cc: stuhood > Subject: Re: Facets and running out of Heap Space > > On 10-Oct-07, at 12:19 PM, David Whalen wrote: > > > It looks now like I can't use facets the way I was hoping > to because > > the memory requirements are impractical. > > I can't remember if this has been mentioned, but upping the > HashDocSet size is one way to reduce memory consumption. > Whether this will work well depends greatly on the > cardinality of your facet sets. facet.enum.cache.minDf set > high is another option (will not generate a bitset for any > value whose facet set is less than this value). > > Both options have performance implications. > > > So, as an alternative I was thinking I could get counts by doing > > rows=0 and using filter queries. > > > > Is there a reason to think that this might perform better? > > Or, am I simply moving the problem to another step in the process? > > Running one query per unique facet value seems impractical, > if that is what you are suggesting. Setting minDf to a very > high value should always outperform such an approach. > > -Mike > > > DW > > > > > > > >> -Original Message- > >> From: Stu Hood [mailto:[EMAIL PROTECTED] > >> Sent: Tuesday, October 09, 2007 10:53 PM > >> To: solr-user@lucene.apache.org > >> Subject: Re: Facets and running out of Heap Space > >> > >>> Using the filter cache method on the things like media type and > >>> location; this will occupy ~2.3MB of memory _per unique value_ > >> > >> Mike, how did you calculate that value? 
I'm trying to tune > my caches, > >> and any equations that could be used to determine some balanced > >> settings would be extremely helpful. I'm in a memory limited > >> environment, so I can't afford to throw a ton of cache at the > >> problem. > >> > >> (I don't want to thread-jack, but I'm also wondering > whether anyone > >> has any notes on how to tune cache sizes for the filterCache, > >> queryResultCache and documentCache). > >> > >> Thanks, > >> Stu > >> > >> > >> -Original Message- > >> From: Mike Klaas <[EMAIL PROTECTED]> > >> Sent: Tuesday, October 9, 2007 9:30pm > >> To: solr-user@lucene.apache.org > >> Subject: Re: Facets and running out of Heap Space > >> > >> On 9-Oct-07, at 12:36 PM, David Whalen wrote: > >> > >>> (snip) > >>> I'm sure we could stop storing many of these columns, > >> especially if > >>> someone told me that would make a big difference. > >> > >> I don't think that it would make a difference in memory > consumption, > >> but storage is certainly not necessary for faceting. Extra stored > >> fields can slow down search if they are large (in terms of bytes), > >> but don't really occupy extra memory, unless they are > polluting the > >> doc cache. Does 'text' > >> need to be stored? > >>> > >>>> what does the LukeReqeust Handler tell you about the # > of distinct > >>>> terms in each field that you facet on? > >>> > >>> Where would I find that? I could probably estimate that > >> myself on a > >>> per-column basis. it ranges from 4 distinct values for > >> media_type to > >>> 30-ish for location to 200-ish for country_code to almost > >> 10,000 for > >>> site_id to almost 100,000 for journalist_id. > >> > >> Using the filter cache method on the things like media type and > >> location; this will occupy ~2.3MB of memory _per unique > value_, so it > >> should be a net win for those (although quite close in space > >> requirements for a 30-ary field on your index size). > >> > >> -Mike > >> > >> > > >
RE: Facets and running out of Heap Space
I'll see what I can do about that. Truthfully, the most important facet we need is the one on media_type, which has only 4 unique values. The second most important one to us is location, which has about 30 unique values. So, it would seem like we actually need a counter-intuitive solution. That's why I thought filter queries might be the solution. Is there some reason to avoid setting multiValued to true here? It sounds like it would be the true cure-all. Thanks again! Dave > -Original Message- > From: Mike Klaas [mailto:[EMAIL PROTECTED] > Sent: Wednesday, October 10, 2007 6:20 PM > To: solr-user@lucene.apache.org > Subject: Re: Facets and running out of Heap Space > > On 10-Oct-07, at 2:40 PM, David Whalen wrote: > > > According to Yonik I can't use minDf because I'm faceting > on a string > > field. I'm thinking of changing it to a tokenized type so > that I can > > utilize this setting, but then I'll have to rebuild my entire index. > > > > Unless there's some way around that? > > For the fields that matter (many unique values), this is > likely to result in a performance regression. > > It might be better to try storing less unique data. For > instance, faceting on the blog_url field, or create_date in > your schema would cause problems (they probably have millions > of unique values). > > It would be helpful to know which field is causing the > problem. One way would be to do a sorted query on a > quiescent index for each field, and see if there are any > suspiciously large jumps in memory usage. > > -Mike > > > > > > > > >> -Original Message- > >> From: Mike Klaas [mailto:[EMAIL PROTECTED] > >> Sent: Wednesday, October 10, 2007 4:56 PM > >> To: solr-user@lucene.apache.org > >> Cc: stuhood > >> Subject: Re: Facets and running out of Heap Space > >> > >> On 10-Oct-07, at 12:19 PM, David Whalen wrote: > >> > >>> It looks now like I can't use facets the way I was hoping > >> to because > >>> the memory requirements are impractical. 
> >> > >> I can't remember if this has been mentioned, but upping the > >> HashDocSet size is one way to reduce memory consumption. > >> Whether this will work well depends greatly on the > >> cardinality of your facet sets. facet.enum.cache.minDf set > >> high is another option (will not generate a bitset for any > >> value whose facet set is less that this value). > >> > >> Both options have performance implications. > >> > >>> So, as an alternative I was thinking I could get counts by doing > >>> rows=0 and using filter queries. > >>> > >>> Is there a reason to think that this might perform better? > >>> Or, am I simply moving the problem to another step in the process? > >> > >> Running one query per unique facet value seems impractical, > >> if that is what you are suggesting. Setting minDf to a very > >> high value should always outperform such an approach. > >> > >> -Mike > >> > >>> DW > >>> > >>> > >>> > >>>> -Original Message- > >>>> From: Stu Hood [mailto:[EMAIL PROTECTED] > >>>> Sent: Tuesday, October 09, 2007 10:53 PM > >>>> To: solr-user@lucene.apache.org > >>>> Subject: Re: Facets and running out of Heap Space > >>>> > >>>>> Using the filter cache method on the things like media type and > >>>>> location; this will occupy ~2.3MB of memory _per unique value_ > >>>> > >>>> Mike, how did you calculate that value? I'm trying to tune > >> my caches, > >>>> and any equations that could be used to determine some balanced > >>>> settings would be extremely helpful. I'm in a memory limited > >>>> environment, so I can't afford to throw a ton of cache at the > >>>> problem. > >>>> > >>>> (I don't want to thread-jack, but I'm also wondering > >> whether anyone > >>>> has any notes on how to tune cache sizes for the filterCache, > >>>> queryResultCache and documentCache). > >>>> > >>>> Thanks, > >>>> Stu > >>>> > >>>> > >>>> -Original Message- > >>>> From: Mike Klaas <[EMAIL PROTECTED]> > >>>> Sent: Tuesday, October 9, 2007 9:30pm
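[Editor's note: the ~2.3MB-per-value figure Stu asks about is consistent with a filter cache entry being stored as a bitset with one bit per document in the index, i.e. roughly maxDoc / 8 bytes per cached facet value. A back-of-the-envelope sketch, using the ~20 million document count mentioned earlier in the thread (treating each entry as a plain bitset is an assumption; small sets may be stored more compactly as a HashDocSet):]

```python
def bitset_bytes(max_doc: int) -> int:
    """Approximate memory for one cached filter: one bit per document."""
    return max_doc // 8

# ~20 million documents, as described earlier in the thread
per_value = bitset_bytes(20_000_000)
print(per_value / (1024 * 1024))  # roughly 2.4 MB per unique facet value
```

With ~30 unique location values, that would put the location facet alone in the neighborhood of 70MB of filter cache, which lines up with the heap pressure described in this thread.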
comment-out a filter?
Hi All. I want to comment out a filter in my schema.xml, specifically the solr.EnglishPorterFilterFactory filter. I want to know -- will this force me to rebuild my index, or will a restart of Solr get the job done? Thanks! Dave W
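[Editor's note: the change itself is a one-line edit to the analyzer chain in schema.xml. A minimal sketch -- the fieldType name and the surrounding tokenizer/filters are hypothetical and will differ in the actual schema:]

```xml
<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- stemming disabled:
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    -->
  </analyzer>
</fieldType>
```

Note that a restart only changes how documents and queries are analyzed from that point on; documents already in the index keep their stemmed tokens, so queries can return inconsistent results against old documents until a full reindex.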
Date range problems
Hi All. We're seeing a really interesting problem when searching by date range. We have two fields of type "date" in our index (they are both indexed and stored): content_date and created_date.

We can run any date-range query we want against content_date and we get the expected results. However, when we run similar queries against created_date we consistently get 0 results. Now, here's the interesting part -- if we do a plain search without a date range, BUT sort by created_date desc, we get properly sorted results. So, it seems like the index works for sorting but not for searching. Does that make any sense? Does anyone have ideas on how we can diagnose this issue?

Here's the relevant block from our schema (before you ask):

TIA, Dave W.
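[Editor's note: one thing worth checking when a Solr date range matches nothing is the value format. Solr's DateField expects full ISO 8601 UTC timestamps ending in "Z" (e.g. 2007-07-27T00:00:00Z), both at index time and at both ends of the range query. A small sketch of building such a query -- the helper name is made up for illustration:]

```python
from datetime import datetime

def to_solr_date(dt: datetime) -> str:
    """Format a datetime the way Solr's DateField expects: UTC, 'Z' suffix."""
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")

start = to_solr_date(datetime(2007, 1, 1))
end = to_solr_date(datetime(2007, 7, 27))
query = f"created_date:[{start} TO {end}]"
print(query)  # created_date:[2007-01-01T00:00:00Z TO 2007-07-27T00:00:00Z]
```

If created_date values were indexed in a different format (or a different timezone representation) than content_date, that would explain range queries failing while sorting still works on the raw indexed terms.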