I"ll be watching this one as I hope to be loading lots of docs soon. Dennis Gearon
Signature Warning
----------------
EARTH has a Right To Life,
otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Fri, 7/2/10, Jim Blomo <jim.bl...@pbworks.com> wrote:

> From: Jim Blomo <jim.bl...@pbworks.com>
> Subject: Re: general debugging techniques?
> To: solr-user@lucene.apache.org
> Date: Friday, July 2, 2010, 7:06 PM
>
> Just to confirm I'm not doing something insane, this is my general setup:
>
> - index approx 1MM documents, including HTML, pictures, office files, etc.
> - files are not local to the Solr process
> - use upload/extract to extract text from them through Tika
> - use commit=1 on each POST (reasons below)
> - use optimize=1 every 150 documents or so (reasons below)
>
> Through many manual restarts and modifications to the upload script,
> I've gotten about halfway (numDocs: 467372, disk usage 1.6G). The
> biggest problem is that any serious problem cannot be recovered from
> without a restart of Tomcat, and serious problems can't be
> differentiated at the client level from non-serious problems (e.g.
> Tika exceptions thrown by bad documents).
>
> On Wed, Jun 9, 2010 at 10:13 AM, Jim Blomo <jim.bl...@pbworks.com> wrote:
> > In any case I bumped up the heap to 3G as suggested, which has
> > helped stability. I have found that in practice I need to commit
> > after every extraction, because a crash or error will wipe out all
> > extractions since the last commit.
>
> I've also found that I need to optimize very regularly, because I
> kept getting "too many file handles" errors (though they usually came
> up as the more cryptic "directory, but cannot be listed: list()
> returned null" error).
>
> What I am running into now is:
>
> SEVERE: Exception invoking periodic operation:
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at java.lang.String.substring(String.java:1940)
> [full backtrace below]
>
> After a restart and optimize this goes away for a while (~100
> documents), but then it comes back and every request after the error
> fails. Even if I can't prevent this error, is there a way I can
> recover from it better? Perhaps an option to Solr or Tomcat to
> restart itself if it hits that error?
>
> Jim
>
> SEVERE: Exception invoking periodic operation:
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at java.lang.String.substring(String.java:1940)
>         at java.lang.String.substring(String.java:1905)
>         at java.io.File.getName(File.java:401)
>         at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:229)
>         at java.io.File.isDirectory(File.java:754)
>         at org.apache.catalina.startup.HostConfig.checkResources(HostConfig.java:1000)
>         at org.apache.catalina.startup.HostConfig.check(HostConfig.java:1214)
>         at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:293)
>         at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
>         at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1306)
>         at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1570)
>         at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1579)
>         at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1559)
>         at java.lang.Thread.run(Thread.java:619)
>
> Jul 3, 2010 1:32:20 AM
> org.apache.solr.update.processor.LogUpdateProcessor finish
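For concreteness, here is a minimal sketch of the upload loop described in
the setup above (commit on every POST, optimize every 150 documents),
assuming Solr's ExtractingRequestHandler is mounted at the default
/update/extract path and that Python's requests library is available; the
host, port, and id scheme are hypothetical.

import requests

SOLR_URL = "http://localhost:8983/solr"  # hypothetical host/port
OPTIMIZE_EVERY = 150                     # per the setup described above

def extract_and_index(path, doc_id):
    # POST one file through Tika extraction; commit=true on every
    # request so a later crash cannot wipe out this document.
    with open(path, "rb") as f:
        resp = requests.post(
            SOLR_URL + "/update/extract",
            params={"literal.id": doc_id, "commit": "true"},
            files={"file": f},
        )
    resp.raise_for_status()  # bad documents typically surface as HTTP 500

def optimize():
    # Merge index segments to keep the open-file count down.
    resp = requests.post(SOLR_URL + "/update", params={"optimize": "true"})
    resp.raise_for_status()

def run(paths):
    for n, path in enumerate(paths, start=1):
        try:
            extract_and_index(path, doc_id=path)
        except requests.HTTPError as err:
            # From here, a Tika parse failure and a dying JVM can look
            # identical (both usually come back as a 500), which is the
            # diagnosis problem described in the mail.
            print("failed on %s: %s" % (path, err))
        if n % OPTIMIZE_EVERY == 0:
            optimize()

Committing on every POST trades indexing speed for crash safety, as the mail
notes; the optimize call merges segments, which keeps the number of open
files down at the cost of heavy I/O.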
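On the restart question at the end of the mail, one hedged possibility,
assuming a Sun/HotSpot JVM (the Thread.java:619 frame suggests Java 6):
HotSpot accepts -XX:OnOutOfMemoryError=<cmd>, which runs the given command
the first time the JVM throws an OutOfMemoryError. It could be added to
Tomcat's startup environment, for example (the script path is hypothetical;
it would kill and relaunch the Tomcat process, or leave the relaunch to an
external supervisor):

CATALINA_OPTS="$CATALINA_OPTS -XX:OnOutOfMemoryError=/usr/local/bin/restart-tomcat.sh"

Note that "GC overhead limit exceeded" is a java.lang.OutOfMemoryError, so
it should fire this hook.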