I"ll be watching this one as I  hope to be loading lots of docs soon.
Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Fri, 7/2/10, Jim Blomo <jim.bl...@pbworks.com> wrote:

> From: Jim Blomo <jim.bl...@pbworks.com>
> Subject: Re: general debugging techniques?
> To: solr-user@lucene.apache.org
> Date: Friday, July 2, 2010, 7:06 PM
> Just to confirm I'm not doing something insane, this is my general setup:
> 
> - index approx 1MM documents including HTML, pictures, office files, etc.
> - files are not local to the Solr process
> - use upload/extract to extract text from them through Tika (see the
>   curl sketch after this list)
> - use commit=1 on each POST (reasons below)
> - use optimize=1 every 150 documents or so (reasons below)
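> 
> For reference, each upload in this setup is roughly the curl call
> below (a minimal sketch: the host, id, and file name are
> placeholders, and /update/extract assumes the stock
> ExtractingRequestHandler mapping in solrconfig.xml):
> 
>   curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" \
>     -F "myfile=@/path/to/document.pdf"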
> 
> Through many manual restarts and modifications to the upload script,
> I've gotten about halfway (numDocs: 467372, disk usage 1.6G). The
> biggest problem is that any serious problem cannot be recovered from
> without a restart of Tomcat, and serious problems can't be
> differentiated at the client level from non-serious ones (e.g. Tika
> exceptions thrown by bad documents).
> 
> On Wed, Jun 9, 2010 at 10:13 AM, Jim Blomo <jim.bl...@pbworks.com> wrote:
> > In any case I bumped up the heap to 3G as suggested, which has
> > helped stability.  I have found that in practice I need to commit
> > every extraction, because a crash or error will wipe out all
> > extractions after the last commit.
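> 
> An explicit commit after each upload is just a POST to the update
> handler, e.g. (host and core are placeholders in this sketch):
> 
>   curl "http://localhost:8983/solr/update" \
>     -H "Content-Type: text/xml" --data-binary "<commit/>"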
> 
> I've also found that I need to optimize very regularly, because I
> kept getting "too many file handles" errors (though they usually
> came up as the more cryptic "directory, but cannot be listed: list()
> returned null" error).
> 
> What I am running into now is:
> 
> SEVERE: Exception invoking periodic operation:
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at java.lang.String.substring(String.java:1940)
> [full backtrace below]
> 
> After a restart and an optimize this goes away for a while (~100
> documents), but then it comes back, and every request after the
> error fails.  Even if I can't prevent this error, is there a way I
> can recover from it better?  Perhaps an option to Solr or Tomcat to
> just restart itself if it hits that error?
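> 
> Something like the HotSpot OnOutOfMemoryError hook is the kind of
> thing I have in mind (a sketch only: the heap size and kill command
> are assumptions, shell quoting of the embedded space can be finicky
> in catalina.sh, and an external supervisor would still have to start
> Tomcat back up):
> 
>   export CATALINA_OPTS="-Xmx3g -XX:OnOutOfMemoryError='kill -9 %p'"
> 
> Disabling the check with -XX:-UseGCOverheadLimit would only trade
> this error for a plain OutOfMemoryError later, so it doesn't help by
> itself.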
> 
> Jim
> 
> SEVERE: Exception invoking periodic operation:
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at java.lang.String.substring(String.java:1940)
>         at java.lang.String.substring(String.java:1905)
>         at java.io.File.getName(File.java:401)
>         at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:229)
>         at java.io.File.isDirectory(File.java:754)
>         at org.apache.catalina.startup.HostConfig.checkResources(HostConfig.java:1000)
>         at org.apache.catalina.startup.HostConfig.check(HostConfig.java:1214)
>         at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:293)
>         at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
>         at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1306)
>         at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1570)
>         at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1579)
>         at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1559)
>         at java.lang.Thread.run(Thread.java:619)
> Jul 3, 2010 1:32:20 AM org.apache.solr.update.processor.LogUpdateProcessor finish
