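P.S. By default, Lucene limits the maximum field length to 10,000 tokens
and silently ignores tokens beyond it, so you have to raise that limit for
large files (in Solr, that is the maxFieldLength setting in solrconfig.xml).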

Erick

On Jan 16, 2008 11:04 AM, Erick Erickson <[EMAIL PROTECTED]> wrote:

> I don't think this is a StringBuilder limitation; rather, your JVM isn't
> starting with enough heap memory (see the -Xmx option).
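A quick way to check how much heap the JVM actually has, which is the limit -Xmx sets, is to ask the standard `Runtime` API. This is a minimal sketch; the class name `HeapCheck` is made up for illustration.

```java
// Sketch: print the heap limits of the running JVM.
public class HeapCheck {

    /** Converts a byte count to whole mebibytes (rounding down). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap this JVM will try to use; -Xmx sets it.
        // It may be Long.MAX_VALUE if there is no inherent limit.
        long max = rt.maxMemory();
        // totalMemory(): heap currently reserved; freeMemory(): unused part of it.
        long total = rt.totalMemory();
        long used = total - rt.freeMemory();
        System.out.println("max heap (MiB):   " + toMiB(max));
        System.out.println("total heap (MiB): " + toMiB(total));
        System.out.println("used heap (MiB):  " + toMiB(used));
    }
}
```

Running it plain and then as `java -Xmx512m HeapCheck` should show the first line change. The traces below show the tests running under Ant's junit task. In that setup, the forked test JVM's heap is set with the task's `maxmemory` attribute, which is ignored unless `fork` is enabled.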
>
> In raw Lucene, I've indexed 240M files.
>
> Best
> Erick
>
>
> On Jan 16, 2008 10:12 AM, David Thibault <[EMAIL PROTECTED]> wrote:
>
> > All,
> > I just found a thread about this in the mailing list archives because
> > I'm troubleshooting the same problem. The kicker is that it doesn't
> > take particularly large files to kill the StringBuilder. I have
> > discovered the following:
> >
> >
> > With a text file of 3,443,464 bytes or less, I get no error.
> >
> > At 3,443,465 bytes:
> >
> >
> > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> >        at java.lang.String.<init>(String.java:208)
> >        at java.lang.StringBuilder.toString(StringBuilder.java:431)
> >        at org.junit.Assert.format(Assert.java:321)
> >        at org.junit.ComparisonFailure$ComparisonCompactor.compact(ComparisonFailure.java:80)
> >        at org.junit.ComparisonFailure.getMessage(ComparisonFailure.java:37)
> >        at java.lang.Throwable.getLocalizedMessage(Throwable.java:267)
> >        at java.lang.Throwable.toString(Throwable.java:344)
> >        at java.lang.String.valueOf(String.java:2615)
> >        at java.io.PrintWriter.print(PrintWriter.java:546)
> >        at java.io.PrintWriter.println(PrintWriter.java:683)
> >        at java.lang.Throwable.printStackTrace(Throwable.java:510)
> >        at org.apache.tools.ant.util.StringUtils.getStackTrace(StringUtils.java:96)
> >        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.getFilteredTrace(JUnitTestRunner.java:856)
> >        at org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.formatError(XMLJUnitResultFormatter.java:280)
> >        at org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.addError(XMLJUnitResultFormatter.java:255)
> >        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner$4.addError(JUnitTestRunner.java:988)
> >        at junit.framework.TestResult.addError(TestResult.java:38)
> >        at junit.framework.JUnit4TestAdapterCache$1.testFailure(JUnit4TestAdapterCache.java:51)
> >        at org.junit.runner.notification.RunNotifier$4.notifyListener(RunNotifier.java:96)
> >        at org.junit.runner.notification.RunNotifier$SafeNotifier.run(RunNotifier.java:37)
> >        at org.junit.runner.notification.RunNotifier.fireTestFailure(RunNotifier.java:93)
> >        at org.junit.internal.runners.TestMethodRunner.addFailure(TestMethodRunner.java:104)
> >        at org.junit.internal.runners.TestMethodRunner.runUnprotected(TestMethodRunner.java:87)
> >        at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:34)
> >        at org.junit.internal.runners.TestMethodRunner.runMethod(TestMethodRunner.java:75)
> >        at org.junit.internal.runners.TestMethodRunner.run(TestMethodRunner.java:45)
> >        at org.junit.internal.runners.TestClassMethodsRunner.invokeTestMethod(TestClassMethodsRunner.java:71)
> >        at org.junit.internal.runners.TestClassMethodsRunner.run(TestClassMethodsRunner.java:35)
> >        at org.junit.internal.runners.TestClassRunner$1.runUnprotected(TestClassRunner.java:42)
> >        at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:34)
> >        at org.junit.internal.runners.TestClassRunner.run(TestClassRunner.java:52)
> >        at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:32)
> >
> >
> >
> > At 3,443,466 bytes (or more):
> >
> >
> > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> >        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
> >        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
> >        at java.lang.StringBuilder.append(StringBuilder.java:120)
> >        at org.junit.Assert.format(Assert.java:321)
> >        at org.junit.ComparisonFailure$ComparisonCompactor.compact(ComparisonFailure.java:80)
> >        at org.junit.ComparisonFailure.getMessage(ComparisonFailure.java:37)
> >        at java.lang.Throwable.getLocalizedMessage(Throwable.java:267)
> >        at java.lang.Throwable.toString(Throwable.java:344)
> >        at java.lang.String.valueOf(String.java:2615)
> >        at java.io.PrintWriter.print(PrintWriter.java:546)
> >        at java.io.PrintWriter.println(PrintWriter.java:683)
> >        at java.lang.Throwable.printStackTrace(Throwable.java:510)
> >        at org.apache.tools.ant.util.StringUtils.getStackTrace(StringUtils.java:96)
> >        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.getFilteredTrace(JUnitTestRunner.java:856)
> >        at org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.formatError(XMLJUnitResultFormatter.java:280)
> >        at org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.addError(XMLJUnitResultFormatter.java:255)
> >        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner$4.addError(JUnitTestRunner.java:988)
> >        at junit.framework.TestResult.addError(TestResult.java:38)
> >        at junit.framework.JUnit4TestAdapterCache$1.testFailure(JUnit4TestAdapterCache.java:51)
> >        at org.junit.runner.notification.RunNotifier$4.notifyListener(RunNotifier.java:96)
> >        at org.junit.runner.notification.RunNotifier$SafeNotifier.run(RunNotifier.java:37)
> >        at org.junit.runner.notification.RunNotifier.fireTestFailure(RunNotifier.java:93)
> >        at org.junit.internal.runners.TestMethodRunner.addFailure(TestMethodRunner.java:104)
> >        at org.junit.internal.runners.TestMethodRunner.runUnprotected(TestMethodRunner.java:87)
> >        at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:34)
> >        at org.junit.internal.runners.TestMethodRunner.runMethod(TestMethodRunner.java:75)
> >        at org.junit.internal.runners.TestMethodRunner.run(TestMethodRunner.java:45)
> >        at org.junit.internal.runners.TestClassMethodsRunner.invokeTestMethod(TestClassMethodsRunner.java:71)
> >        at org.junit.internal.runners.TestClassMethodsRunner.run(TestClassMethodsRunner.java:35)
> >        at org.junit.internal.runners.TestClassRunner$1.runUnprotected(TestClassRunner.java:42)
> >        at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:34)
> >        at org.junit.internal.runners.TestClassRunner.run(TestClassRunner.java:52)
> >
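Reading both traces from the bottom up, the OutOfMemoryError appears to be raised while JUnit builds the message for a failed string comparison. The path runs from `ComparisonFailure.getMessage` through `ComparisonCompactor.compact` into `Assert.format`, while Ant's formatter prints the failure. JUnit 4 throws `ComparisonFailure` when `assertEquals` finds two Strings unequal, so if this reading is right the test is also genuinely failing. One workaround, under those assumptions, is to compare the large strings yourself and report only a bounded summary of the first difference. The sketch uses plain JDK classes; `CompactCompare`, `firstMismatch` and `describeMismatch` are hypothetical names.

```java
import java.util.Arrays;

// Sketch: compare large strings and describe a mismatch in bounded space,
// instead of relying on assertEquals, whose failure-message formatting is
// where the traces above run out of heap.
public class CompactCompare {

    /** Index of the first differing char, or -1 if the strings are equal. */
    static int firstMismatch(String expected, String actual) {
        int n = Math.min(expected.length(), actual.length());
        for (int i = 0; i < n; i++) {
            if (expected.charAt(i) != actual.charAt(i)) {
                return i;
            }
        }
        // No difference in the common prefix: equal, or one is a prefix of the other.
        return expected.length() == actual.length() ? -1 : n;
    }

    /** Short description around the first mismatch, or null if the strings are equal. */
    static String describeMismatch(String expected, String actual, int context) {
        int i = firstMismatch(expected, actual);
        if (i < 0) {
            return null;
        }
        // Copy at most 'context' chars from each string, starting at the mismatch.
        String exp = expected.substring(i, Math.min(expected.length(), i + context));
        String act = actual.substring(i, Math.min(actual.length(), i + context));
        return "lengths " + expected.length() + " vs " + actual.length()
                + ", first mismatch at index " + i
                + ": expected <" + exp + "> but was <" + act + ">";
    }

    public static void main(String[] args) {
        // Two strings of about 3.4 million chars that differ only in the last one.
        char[] buf = new char[3443465];
        Arrays.fill(buf, 'a');
        String base = new String(buf);
        System.out.println(describeMismatch(base + "x", base + "y", 20));
    }
}
```

In a JUnit 4 test, something like `String msg = CompactCompare.describeMismatch(expected, actual, 40); if (msg != null) fail(msg);` keeps the failure message small regardless of input size.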
> >
> > I am writing a filesystem crawler, so I need to be able to crawl and
> > index files of any size (within reason), and a 3-4MB file is certainly
> > within reason. I rewrote my code to store the file contents in a file
> > and read/write them one line at a time. However, when I post the XML
> > file to Solr using SimplePostTool, I get another OutOfMemoryError about
> > Java heap space (thrown from org.xmlpull... again). In any case, does
> > anyone have any ideas about this? Has anyone successfully posted
> > documents with contents larger than 3.5MB to Solr? If so, how was it
> > done? I'm using Solr v1.2.
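On the crawler side, the line-at-a-time approach can be carried through to writing the Solr update XML, so the client never holds a whole file as one String. This is a minimal sketch, assuming Solr's XML `<add><doc><field name="...">` update format and a schema with fields named `id` and `text` (both field names are assumptions); `SolrXmlStreamer` and its methods are hypothetical. It only bounds memory while the XML is written. Wherever that XML is later parsed and indexed, the whole field value is still held in memory, so that JVM's heap (Erick's -Xmx point) still matters.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Sketch: stream text into a Solr <add> document one line at a time,
// escaping XML special characters as it goes.
public class SolrXmlStreamer {

    /** Writes s to out, replacing &, <, > and " with XML entities. */
    static void writeEscaped(Writer out, String s) throws IOException {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.write("&amp;"); break;
                case '<': out.write("&lt;"); break;
                case '>': out.write("&gt;"); break;
                case '"': out.write("&quot;"); break;
                default: out.write(c);
            }
        }
    }

    /**
     * Writes one document with an "id" field and a "text" field read from 'in'.
     * The field names are assumptions and must match the schema in use.
     */
    static void writeAddDoc(String id, Reader in, Writer out) throws IOException {
        BufferedReader lines = new BufferedReader(in);
        out.write("<add><doc><field name=\"id\">");
        writeEscaped(out, id);
        out.write("</field><field name=\"text\">");
        String line;
        while ((line = lines.readLine()) != null) {
            writeEscaped(out, line);
            out.write('\n'); // readLine() drops terminators; this adds '\n' after every line
        }
        out.write("</field></doc></add>");
        out.flush();
    }

    /** In-memory convenience wrapper (StringReader/StringWriter do not fail in practice). */
    static String toAddXml(String id, String text) {
        StringWriter out = new StringWriter();
        try {
            writeAddDoc(id, new StringReader(text), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAddXml("doc1", "a < b\nc & d"));
    }
}
```

For real files, the reader would be something like `new InputStreamReader(new FileInputStream(path), "UTF-8")` and the writer a buffered `OutputStreamWriter` with the same charset. Characters that are illegal in XML 1.0 (most control characters) are not filtered here.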
> >
> >
> > Best,
> >
> > Dave
> >
>
>
