RE: Solr 1.1 HTTP server stops responding
Hi Otis. I'm filling in for the guy who installed the software for us (he's long gone), so I'm just getting familiar with all of this. Can you elaborate on what you mean?

DW

> -----Original Message-----
> From: Otis Gospodnetic
> Sent: Friday, July 27, 2007 10:01 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 1.1 HTTP server stops responding
>
> Hi David,
>
> Have you ruled out your servlet container as the source of this bug?
>
> Otis
Solr 1.1 HTTP server stops responding
Hi All.

We're running Solr 1.1 and we're seeing intermittent cases where Solr stops responding to HTTP requests. It seems like the listener on port 8983 just doesn't respond.

We stop and restart Solr and everything works fine for a few hours, and then the problem returns. We can't seem to point to any single factor that would lead to this problem, and I'm hoping to get some hints on how to diagnose it.

Here's what I can tell you now, and I can provide more info on request:

1) The query load (via /solr/select) isn't that high. Maybe 20 or 30 requests per minute, tops.

2) The insert load (via /solr/update) is very high. We commit almost 500,000 documents per day. We also trim out the same number, however, so the net number of documents should stay around 20 million.

3) We do see Out of Memory errors sometimes, especially when making facet queries (which we do most of the time).

We think Solr is great, and we want to keep using it, but the downtime makes the product (and us) look bad, so we need to solve this soon.

Thanks in advance for your help!

DW
RE: Solr 1.1 HTTP server stops responding
We're using Jetty. I don't know what version, though. To my knowledge, Solr is the only thing running inside it.

Yes, we cannot get to the admin pages either. Nothing on port 8983 responds.

So maybe it's actually Jetty that's messing me up? How can I make sure of that?

Thanks for the help!

DW

> -----Original Message-----
> From: Otis Gospodnetic
> Sent: Friday, July 27, 2007 10:40 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 1.1 HTTP server stops responding
>
> Solr runs as a webapp (think .war file) inside a servlet
> container (e.g. Tomcat, Jetty, Resin...). It could be that the
> servlet container itself has a bug that prevents it from
> responding properly after a while. If you have other webapps
> in the same container, do they still respond? Can you get to
> *any* of Solr's "pages" (e.g. the admin page)? Anything in the
> container or Solr logs?
>
> Otis
> --
> Lucene Consulting - http://lucene-consulting.com/
RE: Solr 1.1 HTTP server stops responding
Hi All.

I'm still hoping to get some insight into how I can solve this issue. If Jetty is the problem I'll happily get rid of it, but I'd feel better if I could do some tests first to be sure I'm solving the problem.

Has anyone else had this problem in the past?

Thanks,

DW
Please help! Solr 1.1 HTTP server stops responding
Guys:

Can anyone help me? Things are getting serious at my company and heads are going to roll. I need to figure out why Solr just suddenly stops responding without any warning.

DW
RE: Please help! Solr 1.1 HTTP server stops responding
Hi Yonik! I'm glad to finally get to talk to you. We're all very impressed with Solr, and when it's running it's really great.

We increased the heap size to 1500M and that didn't seem to help. In fact, the crashes seem to occur more now than ever. We're constantly restarting Solr just to get a response.

I don't know enough to know where the log files are to answer your question (again, I'm filling in for the guy who set us up with all this). Can I ask for your patience so we can figure this out?

Thanks!

Dave W

> -----Original Message-----
> From: Yonik Seeley
> Sent: Monday, July 30, 2007 2:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Please help! Solr 1.1 HTTP server stops responding
>
> It may be related to the out-of-memory errors you were seeing.
> Severe errors like that should never be ignored.
> Do you see any other warning or severe errors in your logs?
>
> -Yonik
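[Editor's note: with the stock Jetty example that ships with Solr, the heap ceiling mentioned above comes from the java command line. A sketch only, since this installation's actual launch script is unknown:

```
# launch the example Jetty/Solr with an explicit heap (paths hypothetical)
java -Xms512m -Xmx1500m -jar start.jar
```

If the heap is being raised somewhere else (an init script, JAVA_OPTS), the -Xmx flag still has to reach the JVM that actually hosts Solr.]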
RE: Please help! Solr 1.1 HTTP server stops responding
Yonik:

> If that's not the problem, you could decrease memory usage
> due to faceting by upgrading to Solr 1.2 and using
> facet.enum.cache.minDf

Is it hard to upgrade from 1.1 to 1.2? We were considering making that change if it wouldn't cost us a lot of downtime.

Can you help me understand what "using facet.enum.cache.minDf" means? Is that a setting in the config file?

Dave W

> -----Original Message-----
> From: Yonik Seeley
> Sent: Monday, July 30, 2007 3:29 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Please help! Solr 1.1 HTTP server stops responding
>
> Grep for PERFORMANCE in the logs to make sure that you aren't
> running into a scenario where more than one searcher is
> warming in the background.
>
> If that's not the problem, you could decrease memory usage
> due to faceting by upgrading to Solr 1.2 and using
> facet.enum.cache.minDf.
>
> -Yonik
>
> On 7/30/07, Kevin Holmes wrote:
> > Just got this:
> >
> > Jul 30, 2007 3:02:14 PM org.apache.solr.core.SolrException log
> > SEVERE: java.lang.OutOfMemoryError: Java heap space
> >
> > Jul 30, 2007 3:02:30 PM org.apache.solr.core.SolrException log
> > SEVERE: java.lang.OutOfMemoryError: Java heap space
> >
> > Kevin Holmes, eNR Services, Inc.
> >
> > -----Original Message-----
> > From: Yonik Seeley
> > Sent: Monday, July 30, 2007 2:55 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Please help! Solr 1.1 HTTP server stops responding
> >
> > On 7/30/07, David Whalen wrote:
> > > We increased the heap size to 1500M and that didn't seem to help.
> > > In fact, the crashes seem to occur more now than ever. We're
> > > constantly restarting solr just to get a response.
> > >
> > > I don't know enough to know where the log files are to
> > > answer your question
> >
> > Me neither ;-)
> > Solr's example app that uses Jetty just has logging going to stdout
> > (the console) to make it clear and visible to new users when an
> > error happens. Hopefully you've configured Jetty to log to files,
> > or at least redirected Jetty's stdout/stderr to a file.
> > You need to look around and try to find those log files.
> > If you find them, one thing to look for would be "WARNING" in the
> > log files. Another thing to look for would be "Exception" or
> > "Memory".
> >
> > > So maybe it's actually Jetty that's messing me up? How can I
> > > make sure of that?
> >
> > Perhaps point your browser at http://localhost:8983/ and see if
> > you get any response at all.
> >
> > -Yonik
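[Editor's note: Yonik's log checks can be sketched as shell commands. The sample file below is fabricated for illustration, but the PERFORMANCE and SEVERE patterns are the ones Solr actually emits:

```shell
# Fabricated log sample standing in for a real Jetty/Solr log file
cat > /tmp/solr_sample.log <<'EOF'
Jul 30, 2007 3:02:14 PM org.apache.solr.core.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
Jul 30, 2007 3:05:02 PM org.apache.solr.core.SolrCore log
PERFORMANCE WARNING: Overlapping onDeckSearchers=2
EOF

# Overlapping searcher warmups show up as PERFORMANCE warnings:
grep -c "PERFORMANCE" /tmp/solr_sample.log
# → 1

# Memory trouble shows up as SEVERE / OutOfMemoryError entries:
grep -E "SEVERE|OutOfMemoryError|WARNING" /tmp/solr_sample.log
```

On a real installation, point these at wherever Jetty's stdout/stderr was redirected.]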
faceting on multiple columns
Hi All.

I am using facets to help me build an ajax-driven tree for search results. When the search is first run, all I need to do is show the counts per facet, for example:

  search results for "fred"
  +--A (102)
  +--B (234)
  +--C (721)
  +--D (512)

Sounds simple, but I also need to break down the results from "D" by a different index in Lucene:

  search results for "fred"
  +--A (102)
  +--B (234)
  +--C (721)
  +--D (512)
     +--D1 (19)
     +--D2 (34)
     +--D3 (45)

What I have been doing in my Solr querystring looks like this:

  rows=0&facet=true&facet.limit=-1&facet.field=&facet.field=

Unfortunately we're seeing really bad performance and we're constantly running out of heap space on this type of query. So, my question is, would breaking this into two calls perform better? That is:

  rows=0&facet=true&facet.limit=-1&facet.field=

and then:

  rows=0&facet=true&facet.limit=-1&facet.field=

It seems to me that two calls would have more overhead than one, but it might lessen the impact on the heap space on my server. Anyone work enough with facets to throw in their two cents?

Thanks!

Dave W.
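[Editor's note: the one-call vs. two-call split above can be sketched as query strings. The field names here are made-up placeholders; the real ones were lost from the archived message:

```python
from urllib.parse import urlencode

# Common parameters from the post; field names below are hypothetical.
base = [("q", "fred"), ("rows", "0"), ("facet", "true"), ("facet.limit", "-1")]

# One request faceting on both fields (repeat facet.field per field):
single = urlencode(base + [("facet.field", "category"),
                           ("facet.field", "subcategory")])

# The same work split into two requests:
first = urlencode(base + [("facet.field", "category")])
second = urlencode(base + [("facet.field", "subcategory")])

print(single)
# → q=fred&rows=0&facet=true&facet.limit=-1&facet.field=category&facet.field=subcategory
```

Splitting changes *when* the memory is used rather than how much each facet computation needs, so measuring both forms against a realistic index is the safer way to answer the heap question.]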
RE: Any clever ideas to inject into solr? Without http?
What we're looking for is a way to inject *without* using curl, or wget, or any other HTTP-based communication. We'd like for the HTTP daemon to only handle search requests, not indexing requests on top of them. Plus, I have to believe there's a faster way to get documents into Solr/Lucene than using curl.

david whalen
senior applications developer
eNR Services, Inc.

> -----Original Message-----
> From: Clay Webster
> Sent: Thursday, August 09, 2007 11:43 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Any clever ideas to inject into solr? Without http?
>
> Condensing the loader into a single executable sounds right
> if you have performance problems. ;-)
>
> You could also try adding multiple documents in a single post if
> you notice your problems are with TCP setup time, though if
> you're doing localhost connections that should be minimal.
>
> If you're already local to the Solr server, you might check
> out the CSV slurper: http://wiki.apache.org/solr/UpdateCSV
> It's a little specialized.
>
> And then there's of course the question of "are you doing
> full re-indexing or incremental indexing of changes?"
>
> --cw
>
> On 8/9/07, Kevin Holmes wrote:
> > I inherited an existing (working) Solr indexing script that
> > runs like this:
> >
> > A Python script queries the MySQL DB, then calls a bash script.
> > The bash script performs a curl POST submit to Solr.
> >
> > We're injecting about 1000 records/minute (constantly),
> > frequently pushing the edge of our CPU / RAM limitations.
> >
> > I'm in the process of building a Perl script to use DBI and
> > lwp::simple::post that will perform this all from a single
> > script (instead of 3).
> >
> > Two specific questions:
> >
> > 1: Does anyone have a clever (or better) way to perform this
> > process efficiently?
> >
> > 2: Is there a way to inject into Solr without using POST /
> > curl / http?
> >
> > Admittedly, I'm no Solr expert - I'm starting from someone
> > else's setup, trying to reverse-engineer my way out. Any input
> > would be greatly appreciated.
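[Editor's note: Clay's suggestion of batching several documents into one post can be sketched with Solr's XML update format, where one `<add>` element carries many `<doc>` elements. Field names below are illustrative, not the poster's schema:

```python
import xml.etree.ElementTree as ET

def build_add_payload(docs):
    """Build one Solr XML <add> payload holding many documents,
    so a single HTTP POST carries a whole batch instead of one doc."""
    add = ET.Element("add")
    for doc in docs:
        d = ET.SubElement(add, "doc")
        for name, value in doc.items():
            f = ET.SubElement(d, "field", name=name)
            f.text = str(value)
    return ET.tostring(add, encoding="unicode")

# Hypothetical records -- in the thread's setup these would come from MySQL.
batch = [{"id": 1, "title": "first"}, {"id": 2, "title": "second"}]
payload = build_add_payload(batch)
print(payload)
```

Posting one payload like this per N rows amortizes the per-request overhead the thread is worried about, whatever client (curl, LWP, urllib) actually sends it.]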
Problem with stemming
Hi All.

We're running into a problem with stemming that I can't figure out. For example, searching for the word "transit" (whether in quotes or not) returns documents with the word "transition" in them.

How do I disable this? We want our engine to be as literal as possible. If a user mis-types a word, that's too bad for them.

TIA,

DW
RE: Problem with stemming
Yonik:

I only raised the question to the group after I had looked in the schema.xml. There are a lot of comments in that file, but they make no sense to me.

I'd appreciate some specific help on what to do...

DW

> -----Original Message-----
> From: Yonik Seeley
> Sent: Monday, August 13, 2007 3:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Problem with stemming
>
> Use a different field type for those fields that you want
> exact matching for (and then re-index).
> Read through schema.xml if you haven't... there are quite a
> few comments in there.
> You may want a field type with just a whitespace tokenizer
> followed by a lowercase filter.
>
> -Yonik
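[Editor's note: as a sketch of Yonik's suggestion, a field type with just a whitespace tokenizer and a lowercase filter (no Porter stemmer) would look roughly like this in schema.xml. The type name is invented; the factory classes are Solr's stock analysis components:

```xml
<!-- Exact-match text: split on whitespace, lowercase, no stemming -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The relevant `<field>` entries would then be pointed at this type, followed by a full re-index, as Yonik notes.]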
RE: Problem with stemming
Thanks, guys. I'm sure that by the time I get the book and learn all about Lucene, the CEO of my company will have insisted we find another search engine. But the book will look great on my coffee table.

> -----Original Message-----
> From: Lance Norskog
> Sent: Monday, August 13, 2007 4:37 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Problem with stemming
>
> (Oops, try again.)
>
> You need this book:
>
> http://www.amazon.com/Lucene-Action-Erik-Hatcher/dp/1932394281
>
> Lucene in Action by Erik Hatcher and Otis Gospodnetic. It does
> not cover Solr really, but you will understand what Lucene does
> and how it works. Until then you will not really get anywhere.
>
> Cheers,
>
> Lance
RE: Problem with stemming
So I shut it off by removing these tags from my schema.xml file? Seems like it's this Porter thing that's messing me up.

> -----Original Message-----
> From: Tom Mastre
> Sent: Monday, August 13, 2007 5:19 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Problem with stemming
>
> Go here:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-88cc86e4432b359030cffdb32d095062b843d4f5
>
> Look for this:
>
> solr.PorterStemFilterFactory
>
> Thomas M. Mastre
> Manager, Homeland Security Digital Library
> Center for Homeland Defense and Security
> http://www.hsdl.org
Effects of changing schema?
Hi All. I'm unclear on whether changing the schema.xml file automatically causes a reindex or not. If I'm adding a field to the schema (and removing some unused ones), does Solr do the reindex, or do I have to kick it off myself? Ideally, we'd like to avoid a reindex... Thanks! DW
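(For anyone reading this in the archive: Solr does not reindex existing documents when schema.xml changes; the index on disk keeps whatever analysis was in effect when each document was added. A manual rebuild looks roughly like this -- the paths assume the stock example layout:)

```
# stop the servlet container, then wipe the old index
rm -rf $SOLR_HOME/data/index
# restart the container -- Solr creates an empty index under the new schema
# re-post every source document to /solr/update, then send <commit/>
```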
searching where a value is not null?
Hi all. I'm trying to construct a query that in pseudo-code would read like this: field != '' I'm finding it difficult to write this as a Solr query, though. Stuff like: NOT field:() doesn't seem to do the trick. Any ideas? DW
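One suggestion (not from the original thread): an open-ended range query matches any document where the field holds a value, and its negation matches the missing/empty case. Older Solr versions are safest with an explicit `*:*` in front of a pure negation:

```
field:[* TO *]             -- documents where "field" has some value
*:* AND -field:[* TO *]    -- documents where "field" is missing
```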
quirks with sorting
Hi All. I'm seeing a weird problem with sorting that I can't figure out. I have a query that uses two fields -- a "source" column and a date column. I search on the source and I sort by the date descending. What I'm seeing is that depending on the value in the source, the date sort works in reverse. For example, the query: content_source:(mv); content_date desc returns 2007-09-10T09:25:00.000Z in its first row, which is what I expect. BUT, the query: content_source:(thomson); content_date desc returns 2008-08-17T00:00:00.000Z, which is the first date we put into SOLR. So, simply by changing the value in the field, the sort seems to be reversed (or ignored outright). Now, before you ask, I did a "sanity-check" query to make sure that there is in fact data for that source from today, and there is. Can anyone help shed some light on this? TIA DW
RE: quirks with sorting
You know, I must have looked at that date 10 times and I never noticed the year. Sorry everyone! > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf > Of Yonik Seeley > Sent: Monday, September 10, 2007 11:23 AM > To: solr-user@lucene.apache.org > Subject: Re: quirks with sorting > > On 9/10/07, David Whalen <[EMAIL PROTECTED]> wrote: > > I'm seeing a weird problem with sorting that I can't figure out. > > > > I have a query that uses two fields -- a "source" column and a date > > column. I search on the source and I sort by the date descending. > > > > What I'm seeing is that depending on the value in the > source, the date > > sort works in reverse. > > > > For example, the query: > > > > content_source:(mv); content_date desc > > > > returns 2007-09-10T09:25:00.000Z in its first row, which is what I > > expect. > > > > BUT, the query: > > > > content_source:(thomson); content_date desc > > > > returns 2008-08-17T00:00:00.000Z, which is the first date > we put into > > SOLR. > > Isn't it the last (highest date) since it's 2008? > > -Yonik >
Selecting Distinct values?
Hi there. Is there a query I can use to select distinct values in an index? I thought I could use a facet, but the facets don't seem to return all the distinct values in the index, only the highest-count ones. Is there another query I can try? Or, can I adjust the facets somehow to make this work? Thanks, DW
RE: Selecting Distinct values?
Silly me. Thanks! > -Original Message- > From: Mike Klaas [mailto:[EMAIL PROTECTED] > Sent: Thursday, September 27, 2007 4:46 PM > To: solr-user@lucene.apache.org > Subject: Re: Selecting Distinct values? > > On 27-Sep-07, at 12:01 PM, David Whalen wrote: > > > Hi there. > > > > Is there a query I can use to select distinct values in an index? > > I thought I could use a facet, but the facets don't seem to > return all > > the distinct values in the index, only the highest-count ones. > > > > Is there another query I can try? Or, can I adjust the > facets somehow > > to make this work? > > http://wiki.apache.org/solr/SimpleFacetParameters#head-1b28106 > 7d007d3fb66f07a3e90e9b1704cbc59a3 > > cheers, > -Mike > >
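For the archive, the SimpleFacetParameters page Mike links boils down to a request along these lines (the parameter names are standard Solr facet parameters; the field name is illustrative):

```
/solr/select?q=*:*&rows=0&facet=true&facet.field=myfield&facet.limit=-1&facet.mincount=1
```

`facet.limit=-1` lifts the default cap on returned facet values, which is why only the highest-count values were coming back before.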
Availability Issues
Hi All. I'm seeing all these threads about availability and I'm wondering why my situation is so different than others'. We're running SOLR 1.2 with a 2.5G heap size. On any given day, the system becomes completely unresponsive. We can't even get /solr/admin/ to come up, much less any select queries. The only thing we can do is kill the SOLR process and re-start it. We are indexing over 25 million documents and we add about as much as we remove daily, so the number remains fairly constant. Again, it seems like other folks are having a much easier time with SOLR than we are. Can anyone help by sharing how you've got it configured? Does anyone have a similar experience? TIA. DW
RE: Availability Issues
Hi Tom. The logs show nothing but regular activity. We do a "tail -f" on the logfile and we can read it during the unresponsive period and we don't see any errors. I've attached our schema/config files. They are pretty much out-of-the-box values, except for our index. Dave > -Original Message- > From: Tom Hill [mailto:[EMAIL PROTECTED] > Sent: Monday, October 08, 2007 2:22 PM > To: solr-user@lucene.apache.org > Subject: Re: Availability Issues > > Hi - > > We're definitely not seeing that. What do your logs show? > What do your schema/solrconfig look like? > > Tom > > > On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote: > > > > Hi All. > > > > I'm seeing all these threads about availability and I'm > wondering why > > my situation is so different than others'. > > > > We're running SOLR 1.2 with a 2.5G heap size. On any given > day, the > > system becomes completely unresponsive. > > We can't even get /solr/admin/ to come up, much less any select > > queries. > > > > The only thing we can do is kill the SOLR process and re-start it. > > > > We are indexing over 25 million documents and we add about > as much as > > we remove daily, so the number remains fairly constant. > > > > Again, it seems like other folks are having a much easier time with > > SOLR than we are. Can anyone help by sharing how you've got it > > configured? Does anyone have a similar experience? > > > > TIA. > > > > DW > > > > > >
RE: Availability Issues
Hi Yonik. > What version of Solr are you running? We're running: Solr Specification Version: 1.2.2007.08.24.08.06.00 Solr Implementation Version: nightly ${svnversion} - yonik - 2007-08-24 08:06:00 Lucene Specification Version: 2.2.0 Lucene Implementation Version: 2.2.0 548010 - buschmi - 2007-06-16 23:15:56 > Is the CPU pegged at 100% when it's unresponsive? It's a little difficult to be sure. We have a HT box and the CPU % we get back is misleading. I think it's safe to say we may spike up to 100% but we don't necessarily stay pegged there. > Have you taken a thread dump to see what is going on? We can't do it b/c during the unresponsive time we can't access the admin site (/solr/admin) at all. I don't know how to do a thread dump via the command line. > Do you get into a situation where more than one searcher is > warming at a time? (there is configuration that can prevent > this one from happening). Forgive me when I say I'm not totally clear on what this question means. The index is constantly getting hit with a myriad of queries, if that's what you meant. Thanks, Dave > -Original Message- > From: Yonik Seeley [mailto:[EMAIL PROTECTED] > Sent: Monday, October 08, 2007 2:23 PM > To: solr-user@lucene.apache.org > Subject: Re: Availability Issues > > On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote: > > We're running SOLR 1.2 with a 2.5G heap size. On any given > day, the > > system becomes completely unresponsive. > > We can't even get /solr/admin/ to come up, much less any select > > queries. > > What version of Solr are you running? > The first step to diagnose something like this is to figure > out what is going on... > Is the CPU pegged at 100% when it's unresponsive? > Have you taken a thread dump to see what is going on? > Do you get into a situation where more than one searcher is > warming at a time? (there is configuration that can prevent > this one from happening). > > -Yonik > >
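A quick sketch of taking a thread dump from the command line, with no admin UI involved (this assumes a JDK on the box and a single Java process for Solr -- adjust the process pattern to your launcher):

```
pid=$(pgrep -f start.jar)     # find the Jetty/Solr JVM
jstack "$pid" > solr-threads.txt
# or, if jstack isn't available:
kill -3 "$pid"                # SIGQUIT; the dump goes to the JVM's stdout log
```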
RE: Availability Issues
Hi Yonik. > Do you see any requests that took a really long time to finish? The requests that take a long time to finish are just simple queries. And the same queries run at a later time come back much faster. Our logs contain 99% inserts and 1% queries. We are constantly adding documents to the index at a rate of 10,000 per minute, so the logs show mostly that. > Start with the thread dump. > I bet it's multiple queries piling up around some > synchronization points in lucene (sometimes caused by > multiple threads generating the same big filter that isn't > yet cached). What would be my next steps after that? I'm not sure I'd understand enough from the dump to make heads-or-tails of it. Can I share that here? Dave > -Original Message- > From: Yonik Seeley [mailto:[EMAIL PROTECTED] > Sent: Monday, October 08, 2007 3:01 PM > To: solr-user@lucene.apache.org > Subject: Re: Availability Issues > > On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote: > > The logs show nothing but regular activity. We do a "tail -f" > > on the logfile and we can read it during the unresponsive > period and > > we don't see any errors. > > You don't see log entries for requests until after they complete. > When a server becomes unresponsive, try shutting off further > traffic to it, and let it finish whatever requests it's > working on (assuming that's the issue) so you can see them in > the log. Do you see any requests that took a really long > time to finish? > > -Yonik > >
RE: Availability Issues
> Oh, so you are using the same boxes for updating and querying? Yep. We have a MySQL database on the box and we query it and POST directly into SOLR via wget in Perl. We then also hit the box for queries. [We'd be very interested in hearing about best practices on how to separate out the data from the index and how to balance them when the inserts outweigh the selects by factors of 50,000:1] > When you insert, are you using multiple threads? If so, how many? We're not threading at all. We have a Perl script that does a select statement out of a MySQL database and runs POSTs sequentially into SOLR, one per document. After a batch of 10,000 POSTs, we run a background commit (using waitFlush and waitSearcher). Again, I'd be very grateful for success stories from people in terms of good server architecture. We are ready and willing to change versions of linux, of the Java container, etc. And we're ready to add more boxes if that'll help. We just need some guidance. > What is the full URL of those slow query requests? They can be anything. For example: [08/10/2007:18:51:55 +] "GET /solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on HTTP/1.1" 200 45799 > Do the slow requests start after a commit? Based on the way the logs read, you could argue that point. The stream of POSTs ends in the logs and then subsequent queries take longer to run, but it's hard to be sure there's a direct correlation. > Yes, post it here. Most likely a majority of the threads > will be blocked somewhere deep in lucene code, and you will > probably need help from people here to figure it out. Next time it happens I'll shoot it over. --Dave > -Original Message- > From: Yonik Seeley [mailto:[EMAIL PROTECTED] > Sent: Monday, October 08, 2007 3:42 PM > To: solr-user@lucene.apache.org > Subject: Re: Availability Issues > > On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote: > > > Do you see any requests that took a really long time to finish? 
> > > > The requests that take a long time to finish are just > simple queries. > > And the same queries run at a later time come back much faster. > > > > Our logs contain 99% inserts and 1% queries. We are > constantly adding > > documents to the index at a rate of 10,000 per minute, so the logs > > show mostly that. > > Oh, so you are using the same boxes for updating and querying? > When you insert, are you using multiple threads? If so, how many? > > What is the full URL of those slow query requests? > Do the slow requests start after a commit? > > > > Start with the thread dump. > > > I bet it's multiple queries piling up around some synchronization > > > points in lucene (sometimes caused by multiple threads generating > > > the same big filter that isn't yet cached). > > > > What would be my next steps after that? I'm not sure I'd > understand > > enough from the dump to make heads-or-tails of it. Can I > share that > > here? > > Yes, post it here. Most likely a majority of the threads > will be blocked somewhere deep in lucene code, and you will > probably need help from people here to figure it out. > > -Yonik > >
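The one-document-per-POST loop described above carries a lot of HTTP overhead at 10,000 documents per minute; a common alternative is to pack many documents into a single `<add>` message and POST it once. A minimal sketch in Python (the field names, batch size, and URL are illustrative, not taken from the original script):

```python
from xml.sax.saxutils import escape

def batch_add_xml(docs):
    """Render a batch of documents as one Solr <add> update message."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # escape the text node so & < > in article bodies stay legal XML
            parts.append('<field name="%s">%s</field>' % (name, escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)

docs = [{"id": str(i), "text": "article %d" % i} for i in range(100)]
payload = batch_add_xml(docs)
# POST `payload` once to http://localhost:8983/solr/update,
# then send <commit/> separately -- instead of 100 individual POSTs.
```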
RE: Availability Issues
Hi Chris. My logs don't look anything like that. They look like HTTP requests. Am I looking in the wrong place? Dave > -Original Message- > From: Chris Hostetter [mailto:[EMAIL PROTECTED] > Sent: Monday, October 08, 2007 5:02 PM > To: solr-user > Subject: RE: Availability Issues > > > : > Do the slow requests start after a commit? > : > : Based on the way the logs read, you could argue that point. > : The stream of POSTs end in the logs and then subsequent queries > : take longer to run, but it's hard to be sure there's a direct > : correlation. > > you would know based on the INFO level messages related to a > commit ... > you'll see messages that look like this when the commit starts... > > Oct 8, 2007 1:56:48 PM > org.apache.solr.update.DirectUpdateHandler2 commit > INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true) > > ...then you'll see a message like this... > > Oct 8, 2007 1:56:48 PM > org.apache.solr.update.DirectUpdateHandler2 commit > INFO: end_commit_flush > > ...if you have autowarming you'll see a bunch of logs about > that, and then eventually you'll see a message like this... > > Oct 8, 2007 1:56:48 PM > org.apache.solr.update.processor.LogUpdateProcessor finish > INFO: {commit=} 0 299 > > ...the important question is how many of these hangs or > really long queries happen in the midst of all that ... how > many happen very quickly after it (which may indicate not > enough warming) > > (NOTE: some of those log messages may look different in your > nightly snapshot version, but the main gist should be the > same .. i don't remember when exactly the LogUpdateProcessor > was added). > > > > > -Hoss > > >
RE: Availability Issues
Thanks for letting me know that. Okay, here they are: BEGIN SCHEMA.XML === id text END SCHEMA.XML === BEGIN CONFIG.XML === false true 10 1000 2147483647 1 1000 1 true 10 1000 2147483647 1 true 1024 false 10 false explicit 50 10 * 2.1 --> explicit 0.01 text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9 ord(poplarity)^0.5 recip(rord(price),1,1000,1000)^0.3 id,name,price,score 2<-1 5<-2 6<90% 100 explicit text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 2<-1 5<-2 6<90% incubationdate_dt:[* TO NOW/DAY-1MONTH]^2.2 inStock:true cat manu_exact price:[* TO 500] price:[500 TO *] inStock:true text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 2<-1 5<-2 6<90% 5 solr solrconfig.xml schema.xml admin-extra.html qt=dismax&q=solr&start=3&fq=id:[* TO *]&fq=cat:[* TO *] END CONFIG.XML === > -Original Message- > From: Chris Hostetter [mailto:[EMAIL PROTECTED] > Sent: Monday, October 08, 2007 4:56 PM > To: solr-user > Subject: RE: Availability Issues > > : I've attached our schema/config files. They are pretty much > : out-of-the-box values, except for our index. > > FYI: the mailing list strips most attachemnts ... the best > thing to do is just inline them in your mail. > > Quick question: do you have autoCommit turned on in your > solrconfig.xml? > > Second question: do you have autowarming on your caches? > > > > -Hoss > > >
RE: Availability Issues
Chris: We're using Jetty also, so I get the sense I'm looking at the wrong log file. On that note -- I've read that Jetty isn't the best servlet container to use in these situations, is that your experience? Dave > -Original Message- > From: Chris Hostetter [mailto:[EMAIL PROTECTED] > Sent: Monday, October 08, 2007 11:20 PM > To: solr-user > Subject: RE: Availability Issues > > > : My logs don't look anything like that. They look like HTTP > : requests. Am I looking in the wrong place? > > what servlet container are you using? > > every servlet container handles application logs differently > -- it's especially tricky because even the format can be > changed, the examples i gave before are in the default format > you get if you use the jetty setup in the solr example (which > logs to stdout), but many servlet containers won't include > that much detail by default (they typically leave out the > classname and method name). there's also typically a setting > that controls the verbosity -- so in some configurations only > the SEVERE messages are logged and in others the INFO > messages are logged ... you're going to want at least the > INFO level to debug stuff. > > grep all the log files you can find for "Solr home set to" > ... that's one of the first messages Solr logs. if you can > find that, you'll find the other messages i was talking about. > > > -Hoss > > >
RE: Availability Issues
All: How can I break up my install onto more than one box? We've hit a learning curve here and we don't understand how best to proceed. Right now we have everything crammed onto one box because we don't know any better. So, how would you build it if you could? Here are the specs: a) the index needs to hold at least 25 million articles b) the index is constantly updated at a rate of 10,000 articles per minute c) we need to have faceted queries Again, real-world experience is preferred here over book knowledge. We've tried to read the docs and it's only made us more confused. TIA Dave W > -Original Message- > From: Yonik Seeley [mailto:[EMAIL PROTECTED] > Sent: Monday, October 08, 2007 3:42 PM > To: solr-user@lucene.apache.org > Subject: Re: Availability Issues > > On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote: > > > Do you see any requests that took a really long time to finish? > > > > The requests that take a long time to finish are just > simple queries. > > And the same queries run at a later time come back much faster. > > > > Our logs contain 99% inserts and 1% queries. We are > constantly adding > > documents to the index at a rate of 10,000 per minute, so the logs > > show mostly that. > > Oh, so you are using the same boxes for updating and querying? > When you insert, are you using multiple threads? If so, how many? > > What is the full URL of those slow query requests? > Do the slow requests start after a commit? > > > > Start with the thread dump. > > > I bet it's multiple queries piling up around some synchronization > > > points in lucene (sometimes caused by multiple threads generating > > > the same big filter that isn't yet cached). > > > > What would be my next steps after that? I'm not sure I'd > understand > > enough from the dump to make heads-or-tails of it. Can I > share that > > here? > > Yes, post it here. 
Most likely a majority of the threads > will be blocked somewhere deep in lucene code, and you will > probably need help from people here to figure it out. > > -Yonik > >
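For the record, the stock Solr 1.2 answer to this split is "collection distribution": one master box takes the whole update stream, and read-only query slaves pull index snapshots from it over rsync. The script names below are the real ones shipped in Solr's src/scripts directory, though the wiring shown is only a sketch:

```
# on the master, after each commit/optimize:
snapshooter       # takes a hard-link snapshot of the index
# on each query slave, from cron:
snappuller        # rsyncs the latest snapshot from the master
snapinstaller     # swaps it in and opens a new searcher
```

Faceted queries then run against the slaves, so the 10,000-docs/minute insert stream never competes with searches for the same box.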
Facets and running out of Heap Space
Hi All. I run a faceted query against a very large index on a regular schedule. Every now and then the query throws an out of heap space error, and we're sunk. So, naturally we increased the heap size and things worked well for a while and then the errors would happen again. We've increased the initial heap size to 2.5GB and it's still happening. Is there anything we can do about this? Thanks in advance, Dave W
RE: Facets and running out of Heap Space
Hi Yonik. According to the doc: > This is only used during the term enumeration method of > faceting (facet.field type faceting on multi-valued or > full-text fields). What if I'm faceting on just a plain String field? It's not full-text, and I don't have multiValued set for it Dave > -Original Message- > From: Yonik Seeley [mailto:[EMAIL PROTECTED] > Sent: Tuesday, October 09, 2007 12:47 PM > To: solr-user@lucene.apache.org > Subject: Re: Facets and running out of Heap Space > > On 10/9/07, David Whalen <[EMAIL PROTECTED]> wrote: > > I run a faceted query against a very large index on a regular > > schedule. Every now and then the query throws an out of heap space > > error, and we're sunk. > > > > So, naturally we increased the heap size and things worked > well for a > > while and then the errors would happen again. > > We've increased the initial heap size to 2.5GB and it's still > > happening. > > > > Is there anything we can do about this? > > Try facet.enum.cache.minDf param: > http://wiki.apache.org/solr/SimpleFacetParameters > > -Yonik > >
RE: Facets and running out of Heap Space
> Then you will be using the FieldCache counting method, and > this param is not applicable :-) Are all your field that you > facet on like this? Unfortunately yes. Could I improve my situation by changing them to multiValued? _____ david whalen senior applications developer eNR Services, Inc. [EMAIL PROTECTED] 203-849-7240 > -Original Message- > From: Yonik Seeley [mailto:[EMAIL PROTECTED] > Sent: Tuesday, October 09, 2007 2:14 PM > To: solr-user@lucene.apache.org > Subject: Re: Facets and running out of Heap Space > > On 10/9/07, David Whalen <[EMAIL PROTECTED]> wrote: > > > This is only used during the term enumeration method of faceting > > > (facet.field type faceting on multi-valued or full-text fields). > > > > What if I'm faceting on just a plain String field? It's not > > full-text, and I don't have multiValued set for it > > Then you will be using the FieldCache counting method, and > this param is not applicable :-) Are all your field that you > facet on like this? > > The FieldCache entry might be taking up too much room, esp if > the number of entries is high, and the entries are big. The > requests themselves can take up a good chunk of memory > temporarily (4 bytes * nValuesInField). > > You could try a memory profiling tool and see where all the > memory is being taken up too. > > -Yonik > >
RE: Facets and running out of Heap Space
> is this the same 25,000,000 document index you mentioned before? Yep. > how big is your index on disk? are you faceting or sorting on > other fields as well? running 'du -h' on my index directory returns 86G. We facet on almost all of our index fields (they were added to the index solely for that purpose, otherwise we'd remove them). Here's the meaty part of the config again: I'm sure we could stop storing many of these columns, especially if someone told me that would make a big difference. > what does the LukeReqeust Handler tell you about the # of > distinct terms in each field that you facet on? Where would I find that? I could probably estimate that myself on a per-column basis. it ranges from 4 distinct values for media_type to 30-ish for location to 200-ish for country_code to almost 10,000 for site_id to almost 100,000 for journalist_id. Thanks very much for your help so far, Chris! Dave > -Original Message- > From: Chris Hostetter [mailto:[EMAIL PROTECTED] > Sent: Tuesday, October 09, 2007 2:48 PM > To: solr-user > Subject: Re: Facets and running out of Heap Space > > > : So, naturally we increased the heap size and things worked > : well for a while and then the errors would happen again. > : We've increased the initial heap size to 2.5GB and it's > : still happening. > > is this the same 25,000,000 document index you mentioned before? > > 2.5GB of heap doesn't seem like much if you are also doing > faceting ... 
> even if you are faceting on an int field, there's going to be > 95MB of FieldCache for that field, you said this was a string > field, so it's going to be 95MB+however much space is needed > for all the terms (presumably if you are faceting on this > field every doc doesn't have a unique value, but even > assuming a conservative 10% unique values of 10 characters > each that's another ~50MB, so we're up to about 150MB of > FieldCache to facet that field -- and we haven't even started > talking about how big the index is itself (or how big the > filterCache gets, or how many other fields you are faceting on) > > how big is your index on disk? are you faceting or sorting on > other fields as well? > > what does the LukeReqeust Handler tell you about the # of > distinct terms in each field that you facet on? > > > > > -Hoss > > >
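Hoss's numbers can be reproduced with quick arithmetic: a FieldCache entry for a string field holds a 4-byte ord per document plus the distinct term strings themselves (about 2 bytes per character in Java). The 10%-unique / 10-character figures are his stated guesses, not measurements:

```python
def fieldcache_mb(n_docs, unique_ratio=0.10, term_chars=10):
    """Rough FieldCache footprint for one faceted string field, in MiB."""
    ord_bytes = n_docs * 4                                    # one int ord per doc
    term_bytes = int(n_docs * unique_ratio) * term_chars * 2  # UTF-16 chars, no overhead
    return ord_bytes / 2.0**20, (ord_bytes + term_bytes) / 2.0**20

ord_mb, total_mb = fieldcache_mb(25_000_000)
# ord_mb lands around 95 MiB and total_mb around 143 MiB, matching the
# "95MB ... about 150MB" estimate above -- and that is per faceted field.
```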
RE: Facets and running out of Heap Space
> Make sure you have: > <requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" /> > defined in solrconfig.xml What's the consequence of me changing the solrconfig.xml file? Doesn't that require a restart of Solr? > for a large index, this can be very slow but the results are valuable. In what way? I'm still not clear on what this does for me. > -Original Message- > From: Ryan McKinley [mailto:[EMAIL PROTECTED] > Sent: Tuesday, October 09, 2007 4:01 PM > To: solr-user@lucene.apache.org > Subject: Re: Facets and running out of Heap Space > > > > >> what does the LukeReqeust Handler tell you about the # of distinct > >> terms in each field that you facet on? > > > > Where would I find that? > > check: > http://wiki.apache.org/solr/LukeRequestHandler > > Make sure you have: > <requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" /> > defined in solrconfig.xml > > for a large index, this can be very slow but the results are valuable. > > ryan > >
RE: Facets and running out of Heap Space
It looks now like I can't use facets the way I was hoping to because the memory requirements are impractical. So, as an alternative I was thinking I could get counts by doing rows=0 and using filter queries. Is there a reason to think that this might perform better? Or, am I simply moving the problem to another step in the process? DW > -Original Message- > From: Stu Hood [mailto:[EMAIL PROTECTED] > Sent: Tuesday, October 09, 2007 10:53 PM > To: solr-user@lucene.apache.org > Subject: Re: Facets and running out of Heap Space > > > Using the filter cache method on the things like media type and > > location; this will occupy ~2.3MB of memory _per unique value_ > > Mike, how did you calculate that value? I'm trying to tune my > caches, and any equations that could be used to determine > some balanced settings would be extremely helpful. I'm in a > memory limited environment, so I can't afford to throw a ton > of cache at the problem. > > (I don't want to thread-jack, but I'm also wondering whether > anyone has any notes on how to tune cache sizes for the > filterCache, queryResultCache and documentCache). > > Thanks, > Stu > > > -Original Message- > From: Mike Klaas <[EMAIL PROTECTED]> > Sent: Tuesday, October 9, 2007 9:30pm > To: solr-user@lucene.apache.org > Subject: Re: Facets and running out of Heap Space > > On 9-Oct-07, at 12:36 PM, David Whalen wrote: > > >(snip) > > I'm sure we could stop storing many of these columns, > especially if > >someone told me that would make a big difference. > > I don't think that it would make a difference in memory > consumption, but storage is certainly not necessary for > faceting. Extra stored fields can slow down search if they > are large (in terms of bytes), but don't really occupy extra > memory, unless they are polluting the doc cache. Does 'text' > need to be stored? > > > >> what does the LukeReqeust Handler tell you about the # of distinct > >> terms in each field that you facet on? > > > > Where would I find that? 
I could probably estimate that > myself on a > > per-column basis. it ranges from 4 distinct values for > media_type to > > 30-ish for location to 200-ish for country_code to almost > 10,000 for > > site_id to almost 100,000 for journalist_id. > > Using the filter cache method on the things like media type > and location; this will occupy ~2.3MB of memory _per unique > value_, so it should be a net win for those (although quite > close in space requirements for a 30-ary field on your index size). > > -Mike > >
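One plausible way to arrive at Mike's "~2.3MB per unique value" figure (my reconstruction, not his stated math): each cached filter is a bitset with one bit per document in the index, so its size depends only on the total document count, not on how many documents match:

```python
def filter_bitset_mb(max_doc):
    """Size of one filterCache bitset in MiB: one bit per document."""
    return max_doc / 8 / 2.0**20   # bits -> bytes -> MiB

mb = filter_bitset_mb(20_000_000)
# roughly 2.4 MiB per cached facet value on a ~20M-doc index,
# in line with the 2.3MB quoted above
```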
RE: Facets and running out of Heap Space
According to Yonik I can't use minDf because I'm faceting on a string field. I'm thinking of changing it to a tokenized type so that I can utilize this setting, but then I'll have to rebuild my entire index. Unless there's some way around that? > -Original Message- > From: Mike Klaas [mailto:[EMAIL PROTECTED] > Sent: Wednesday, October 10, 2007 4:56 PM > To: solr-user@lucene.apache.org > Cc: stuhood > Subject: Re: Facets and running out of Heap Space > > On 10-Oct-07, at 12:19 PM, David Whalen wrote: > > > It looks now like I can't use facets the way I was hoping > to because > > the memory requirements are impractical. > > I can't remember if this has been mentioned, but upping the > HashDocSet size is one way to reduce memory consumption. > Whether this will work well depends greatly on the > cardinality of your facet sets. facet.enum.cache.minDf set > high is another option (will not generate a bitset for any > value whose facet set is less than this value). > > Both options have performance implications. > > > So, as an alternative I was thinking I could get counts by doing > > rows=0 and using filter queries. > > > > Is there a reason to think that this might perform better? > > Or, am I simply moving the problem to another step in the process? > > Running one query per unique facet value seems impractical, > if that is what you are suggesting. Setting minDf to a very > high value should always outperform such an approach. > > -Mike > > > DW > > > > > > > >> -Original Message- > >> From: Stu Hood [mailto:[EMAIL PROTECTED] > >> Sent: Tuesday, October 09, 2007 10:53 PM > >> To: solr-user@lucene.apache.org > >> Subject: Re: Facets and running out of Heap Space > >> > >>> Using the filter cache method on the things like media type and > >>> location; this will occupy ~2.3MB of memory _per unique value_ > >> > >> Mike, how did you calculate that value? 
I'm trying to tune > my caches, > >> and any equations that could be used to determine some balanced > >> settings would be extremely helpful. I'm in a memory limited > >> environment, so I can't afford to throw a ton of cache at the > >> problem. > >> > >> (I don't want to thread-jack, but I'm also wondering > whether anyone > >> has any notes on how to tune cache sizes for the filterCache, > >> queryResultCache and documentCache). > >> > >> Thanks, > >> Stu > >> > >> > >> -Original Message- > >> From: Mike Klaas <[EMAIL PROTECTED]> > >> Sent: Tuesday, October 9, 2007 9:30pm > >> To: solr-user@lucene.apache.org > >> Subject: Re: Facets and running out of Heap Space > >> > >> On 9-Oct-07, at 12:36 PM, David Whalen wrote: > >> > >>> (snip) > >>> I'm sure we could stop storing many of these columns, > >> especially if > >>> someone told me that would make a big difference. > >> > >> I don't think that it would make a difference in memory > consumption, > >> but storage is certainly not necessary for faceting. Extra stored > >> fields can slow down search if they are large (in terms of bytes), > >> but don't really occupy extra memory, unless they are > polluting the > >> doc cache. Does 'text' > >> need to be stored? > >>> > >>>> what does the LukeReqeust Handler tell you about the # > of distinct > >>>> terms in each field that you facet on? > >>> > >>> Where would I find that? I could probably estimate that > >> myself on a > >>> per-column basis. it ranges from 4 distinct values for > >> media_type to > >>> 30-ish for location to 200-ish for country_code to almost > >> 10,000 for > >>> site_id to almost 100,000 for journalist_id. > >> > >> Using the filter cache method on the things like media type and > >> location; this will occupy ~2.3MB of memory _per unique > value_, so it > >> should be a net win for those (although quite close in space > >> requirements for a 30-ary field on your index size). > >> > >> -Mike > >> > >> > > >
RE: Facets and running out of Heap Space
I'll see what I can do about that. Truthfully, the most important facet we need is the one on media_type, which has only 4 unique values. The second most important one to us is location, which has about 30 unique values. So, it would seem like we actually need a counter-intuitive solution. That's why I thought filter queries might be the solution. Is there some reason to avoid setting multiValued to true here? It sounds like it would be the true cure-all. Thanks again! Dave > -Original Message- > From: Mike Klaas [mailto:[EMAIL PROTECTED] > Sent: Wednesday, October 10, 2007 6:20 PM > To: solr-user@lucene.apache.org > Subject: Re: Facets and running out of Heap Space > > On 10-Oct-07, at 2:40 PM, David Whalen wrote: > > > According to Yonik I can't use minDf because I'm faceting > on a string > > field. I'm thinking of changing it to a tokenized type so > that I can > > utilize this setting, but then I'll have to rebuild my entire index. > > > > Unless there's some way around that? > > For the fields that matter (many unique values), this is > likely to result in a performance regression. > > It might be better to try storing less unique data. For > instance, faceting on the blog_url field, or create_date in > your schema would cause problems (they probably have millions > of unique values). > > It would be helpful to know which field is causing the > problem. One way would be to do a sorted query on a > quiescent index for each field, and see if there are any > suspiciously large jumps in memory usage. > > -Mike > > > > > > > > >> -Original Message- > >> From: Mike Klaas [mailto:[EMAIL PROTECTED] > >> Sent: Wednesday, October 10, 2007 4:56 PM > >> To: solr-user@lucene.apache.org > >> Cc: stuhood > >> Subject: Re: Facets and running out of Heap Space > >> > >> On 10-Oct-07, at 12:19 PM, David Whalen wrote: > >> > >>> It looks now like I can't use facets the way I was hoping > >> to because > >>> the memory requirements are impractical. 
> >> > >> I can't remember if this has been mentioned, but upping the > >> HashDocSet size is one way to reduce memory consumption. > >> Whether this will work well depends greatly on the > >> cardinality of your facet sets. facet.enum.cache.minDf set > >> high is another option (will not generate a bitset for any > >> value whose facet set is less that this value). > >> > >> Both options have performance implications. > >> > >>> So, as an alternative I was thinking I could get counts by doing > >>> rows=0 and using filter queries. > >>> > >>> Is there a reason to think that this might perform better? > >>> Or, am I simply moving the problem to another step in the process? > >> > >> Running one query per unique facet value seems impractical, > >> if that is what you are suggesting. Setting minDf to a very > >> high value should always outperform such an approach. > >> > >> -Mike > >> > >>> DW > >>> > >>> > >>> > >>>> -Original Message- > >>>> From: Stu Hood [mailto:[EMAIL PROTECTED] > >>>> Sent: Tuesday, October 09, 2007 10:53 PM > >>>> To: solr-user@lucene.apache.org > >>>> Subject: Re: Facets and running out of Heap Space > >>>> > >>>>> Using the filter cache method on the things like media type and > >>>>> location; this will occupy ~2.3MB of memory _per unique value_ > >>>> > >>>> Mike, how did you calculate that value? I'm trying to tune > >> my caches, > >>>> and any equations that could be used to determine some balanced > >>>> settings would be extremely helpful. I'm in a memory limited > >>>> environment, so I can't afford to throw a ton of cache at the > >>>> problem. > >>>> > >>>> (I don't want to thread-jack, but I'm also wondering > >> whether anyone > >>>> has any notes on how to tune cache sizes for the filterCache, > >>>> queryResultCache and documentCache). > >>>> > >>>> Thanks, > >>>> Stu > >>>> > >>>> > >>>> -Original Message- > >>>> From: Mike Klaas <[EMAIL PROTECTED]> > >>>> Sent: Tuesday, October 9, 2007 9:30pm
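[Editor's note: the ~2.3MB-per-value figure Stu asks about is consistent with a filter cache entry being stored as a bitset with one bit per document in the index, i.e. roughly maxDoc / 8 bytes per cached facet value. A back-of-the-envelope sketch, using the ~20 million document count mentioned earlier in the thread (treating each entry as a plain bitset is an assumption; small sets may be stored more compactly as a HashDocSet):]

```python
def bitset_bytes(max_doc: int) -> int:
    """Approximate memory for one cached filter: one bit per document."""
    return max_doc // 8

# ~20 million documents, as described earlier in the thread
per_value = bitset_bytes(20_000_000)
print(per_value / (1024 * 1024))  # roughly 2.4 MB per unique facet value
```

With ~30 unique location values, that would put the location facet alone in the neighborhood of 70MB of filter cache, which lines up with the heap pressure described in this thread.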
comment-out a filter?
Hi All. I want to comment out a filter in my schema.xml, specifically the solr.EnglishPorterFilterFactory filter. I want to know -- will this force me to rebuild my index, or will a restart of Solr get the job done? Thanks! Dave W
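[Editor's note: the change itself is a one-line edit to the analyzer chain in schema.xml. A minimal sketch -- the fieldType name and the surrounding tokenizer/filters are hypothetical and will differ in the actual schema:]

```xml
<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- stemming disabled:
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    -->
  </analyzer>
</fieldType>
```

Note that a restart only changes how documents and queries are analyzed from that point on; documents already in the index keep their stemmed tokens, so queries can return inconsistent results against old documents until a full reindex.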
Date range problems
Hi All. We're seeing a really interesting problem when searching by date range. We have two fields of type "date" in our index (they are both indexed and stored): content_date and created_date.

We can run any date-range query we want against content_date and we get the expected results. However, when we run similar queries against created_date we consistently get 0 results. Now, here's the interesting part -- if we do a plain search without a date range, BUT sort by created_date desc, we get properly sorted results. So, it seems like the index works for sorting but not for searching. Does that make any sense? Does anyone have ideas on how we can diagnose this issue?

Here's the relevant block from our schema (before you ask):

TIA, Dave W.
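[Editor's note: one thing worth checking when a Solr date range matches nothing is the value format. Solr's DateField expects full ISO 8601 UTC timestamps ending in "Z" (e.g. 2007-07-27T00:00:00Z), both at index time and at both ends of the range query. A small sketch of building such a query -- the helper name is made up for illustration:]

```python
from datetime import datetime

def to_solr_date(dt: datetime) -> str:
    """Format a datetime the way Solr's DateField expects: UTC, 'Z' suffix."""
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")

start = to_solr_date(datetime(2007, 1, 1))
end = to_solr_date(datetime(2007, 7, 27))
query = f"created_date:[{start} TO {end}]"
print(query)  # created_date:[2007-01-01T00:00:00Z TO 2007-07-27T00:00:00Z]
```

If created_date values were indexed in a different format (or a different timezone representation) than content_date, that would explain range queries failing while sorting still works on the raw indexed terms.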