Just as an addendum: I have deleted the whole index directory and loaded the data from scratch. After the data was loaded (and committed) I ran CheckIndex again. Again, there was a bunch of broken segments.
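For readers following along, this is roughly how the CheckIndex runs above are invoked from the command line; the jar name and index path are illustrative assumptions, not the exact paths from this setup:

```shell
# Inspect a Solr/Lucene index with CheckIndex (read-only by default).
# NOTE: jar name and index path are assumptions; adjust to your build/layout.
java -cp lucene-core-4.0-SNAPSHOT.jar \
  org.apache.lucene.index.CheckIndex /var/lib/solr/data/index

# Only after reviewing the report: -fix rewrites the segments file and
# permanently drops every segment reported as broken (and its documents).
# java -cp lucene-core-4.0-SNAPSHOT.jar \
#   org.apache.lucene.index.CheckIndex /var/lib/solr/data/index -fix
```

Avoid concurrent writes while CheckIndex runs, and take a backup before ever using -fix.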
I will try with the latest trunk to see if the problem still exists.

Regards, Rok

On Mon, Jun 11, 2012 at 8:32 AM, Rok Rejc <rokrej...@gmail.com> wrote:
> Hi all,
>
> I have run CheckIndex. It seems that the index is corrupted. I've got
> plenty of exceptions like:
>
> test: terms, freq, prox...ERROR: java.lang.ArrayIndexOutOfBoundsException
> java.lang.ArrayIndexOutOfBoundsException
>         at org.apache.lucene.store.ByteArrayDataInput.readBytes(ByteArrayDataInput.java:181)
>         at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.nextLeaf(BlockTreeTermsReader.java:2414)
>         at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.next(BlockTreeTermsReader.java:2400)
>         at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next(BlockTreeTermsReader.java:2074)
>         at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:771)
>         at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1164)
>         at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:602)
>         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1748)
>
> and
>
> test: terms, freq, prox...ERROR: java.lang.RuntimeException: term [6f 70 65 72 61 63 69 6a 61]: doc 105407 <= lastDoc 105407
> java.lang.RuntimeException: term [6f 70 65 72 61 63 69 6a 61]: doc 105407 <= lastDoc 105407
>         at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:858)
>         at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1164)
>         at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:602)
>         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1748)
> test: stored fields.......OK [723321 total field count; avg 3 fields per doc]
>
> The final warning was:
>
> WARNING: 154 broken segments (containing 48127608 documents) detected
> WARNING: would write new segments file, and 48127608 documents would be lost, if -fix were specified
>
> As I mentioned, I ran the optimization after the initial import (no
> further adds or deletions were made). For the import I create CSV files
> and load them through CSV upload with multiple threads.
>
> The index is otherwise queryable.
>
> Any ideas what I should do next? Is this a bug in Lucene?
>
> Many thanks...
>
> Rok
>
> On Thu, Jun 7, 2012 at 5:05 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>> Is the index otherwise usable for queries? And is it only the optimize
>> that is failing?
>>
>> I suppose it is possible that the index could be corrupted, but it is
>> also possible that there is a bug in Lucene.
>>
>> I would suggest running Lucene "CheckIndex" next. See what it has to say.
>>
>> See:
>> https://builds.apache.org/job/Lucene-trunk/javadoc/core/org/apache/lucene/index/CheckIndex.html#main(java.lang.String[])
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Rok Rejc
>> Sent: Thursday, June 07, 2012 5:50 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Exception when optimizing index
>>
>> Hi Jack,
>>
>> It is a virtual machine running on VMware vSphere 5 Enterprise Plus.
>> The machine has 30 GB vRAM, an 8-core vCPU at 3.0 GHz, and 2 TB SATA
>> RAID-10 over iSCSI. The operating system is CentOS 6.2 64-bit.
>>
>> Here are the Java infos:
>>
>> - catalina.base = /usr/share/tomcat6
>> - catalina.home = /usr/share/tomcat6
>> - catalina.useNaming = true
>> - common.loader = ${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar
>> - file.encoding = UTF-8
>> - file.encoding.pkg = sun.io
>> - file.separator = /
>> - java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
>> - java.awt.printerjob = sun.print.PSPrinterJob
>> - java.class.path =
>>     /usr/share/tomcat6/bin/bootstrap.jar
>>     /usr/share/tomcat6/bin/tomcat-juli.jar
>>     /usr/share/java/commons-daemon.jar
>> - java.class.version = 50.0
>> - java.endorsed.dirs =
>> - java.ext.dirs =
>>     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/ext
>>     /usr/java/packages/lib/ext
>> - java.home = /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre
>> - java.io.tmpdir = /var/cache/tomcat6/temp
>> - java.library.path =
>>     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/server
>>     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64
>>     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/../lib/amd64
>>     /usr/java/packages/lib/amd64 /usr/lib64 /lib64 /lib /usr/lib
>> - java.naming.factory.initial = org.apache.naming.java.javaURLContextFactory
>> - java.naming.factory.url.pkgs = org.apache.naming
>> - java.runtime.name = OpenJDK Runtime Environment
>> - java.runtime.version = 1.6.0_22-b22
>> - java.specification.name = Java Platform API Specification
>> - java.specification.vendor = Sun Microsystems Inc.
>> - java.specification.version = 1.6
>> - java.util.logging.config.file = /usr/share/tomcat6/conf/logging.properties
>> - java.util.logging.manager = org.apache.juli.ClassLoaderLogManager
>> - java.vendor = Sun Microsystems Inc.
>> - java.vendor.url = http://java.sun.com/
>> - java.vendor.url.bug = http://java.sun.com/cgi-bin/bugreport.cgi
>> - java.version = 1.6.0_22
>> - java.vm.info = mixed mode
>> - java.vm.name = OpenJDK 64-Bit Server VM
>> - java.vm.specification.name = Java Virtual Machine Specification
>> - java.vm.specification.vendor = Sun Microsystems Inc.
>> - java.vm.specification.version = 1.0
>> - java.vm.vendor = Sun Microsystems Inc.
>> - java.vm.version = 20.0-b11
>> - javax.sql.DataSource.Factory = org.apache.commons.dbcp.BasicDataSourceFactory
>> - line.separator =
>> - os.arch = amd64
>> - os.name = Linux
>> - os.version = 2.6.32-220.13.1.el6.x86_64
>> - package.access = sun.,org.apache.catalina.,org.apache.coyote.,org.apache.tomcat.,org.apache.jasper.,sun.beans.
>> - package.definition = sun.,java.,org.apache.catalina.,org.apache.coyote.,org.apache.tomcat.,org.apache.jasper.
>> - path.separator = :
>> - server.loader =
>> - shared.loader =
>> - sun.arch.data.model = 64
>> - sun.boot.class.path =
>>     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/resources.jar
>>     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/rt.jar
>>     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/sunrsasign.jar
>>     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/jsse.jar
>>     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/jce.jar
>>     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/charsets.jar
>>     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/netx.jar
>>     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/plugin.jar
>>     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/rhino.jar
>>     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/modules/jdk.boot.jar
>>     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/classes
>> - sun.boot.library.path = /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64
>> - sun.cpu.endian = little
>> - sun.cpu.isalist =
>> - sun.io.unicode.encoding = UnicodeLittle
>> - sun.java.command = org.apache.catalina.startup.Bootstrap start
>> - sun.java.launcher = SUN_STANDARD
>> - sun.jnu.encoding = UTF-8
>> - sun.management.compiler = HotSpot 64-Bit Tiered Compilers
>> - sun.os.patch.level = unknown
>> - tomcat.util.buf.StringCache.byte.enabled = true
>> - user.country = US
>> - user.dir = /usr/share/tomcat6
>> - user.home = /usr/share/tomcat6
>> - user.language = en
>> - user.name = tomcat
>> - user.timezone = Europe/Ljubljana
>>
>> As far as I can see from the JIRA issue, I have the patch applied (as
>> mentioned, I have a trunk version from May 12). Any ideas?
>>
>> Many thanks!
>>
>> On Wed, Jun 6, 2012 at 2:49 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>>
>>> It could be related to https://issues.apache.org/jira/browse/LUCENE-2975.
>>> At least the exception comes from the same function.
>>>
>>> "Caused by: java.io.IOException: Invalid vInt detected (too many bits)
>>>         at org.apache.lucene.store.DataInput.readVInt(DataInput.java:112)"
>>>
>>> What hardware and Java version are you running?
>>>
>>> -- Jack Krupansky
>>>
>>> -----Original Message----- From: Rok Rejc
>>> Sent: Wednesday, June 06, 2012 3:45 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Exception when optimizing index
>>>
>>> Hi all,
>>>
>>> I have a Solr installation (version 4.0 from trunk, 1st May 2012).
>>>
>>> After I imported the documents (99831145 documents) I ran the
>>> optimization.
I got an exception:
>>>
>>> <response><lst name="responseHeader"><int name="status">500</int><int
>>> name="QTime">281615</int></lst><lst name="error"><str name="msg">background
>>> merge hit exception: _8x(4.0):C202059 _e0(4.0):C192649 _3r(4.0):C205785
>>> _1s(4.0):C203526 _4w(4.0):C199793 _7f(4.0):C193108 _dy(4.0):C185814
>>> _7d(4.0):C190364 _c5(4.0):C187881 _8u(4.0):C185001 _r(4.0):C183475
>>> _1r(4.0):C185622 _2s(4.0):C174349 _3s(4.0):C171683 _7h(4.0):C170618
>>> _fj(4.0):C179232 _2t(4.0):C161907 _fi(4.0):C168713 _1q(4.0):C165402
>>> _2r(4.0):C152995 _e1(4.0):C146080 _f4(4.0):C155072 _af(4.0):C149113
>>> _dx(4.0):C147298 _3t(4.0):C150806 _q(4.0):C146874 _4v(4.0):C146324
>>> _fc(4.0):C141426 _al(4.0):C125361 _64(4.0):C119208 into _ft
>>> [maxNumSegments=1]</str><str name="trace">java.io.IOException: background
>>> merge hit exception: _8x(4.0):C202059 _e0(4.0):C192649 _3r(4.0):C205785
>>> _1s(4.0):C203526 _4w(4.0):C199793 _7f(4.0):C193108 _dy(4.0):C185814
>>> _7d(4.0):C190364 _c5(4.0):C187881 _8u(4.0):C185001 _r(4.0):C183475
>>> _1r(4.0):C185622 _2s(4.0):C174349 _3s(4.0):C171683 _7h(4.0):C170618
>>> _fj(4.0):C179232 _2t(4.0):C161907 _fi(4.0):C168713 _1q(4.0):C165402
>>> _2r(4.0):C152995 _e1(4.0):C146080 _f4(4.0):C155072 _af(4.0):C149113
>>> _dx(4.0):C147298 _3t(4.0):C150806 _q(4.0):C146874 _4v(4.0):C146324
>>> _fc(4.0):C141426 _al(4.0):C125361 _64(4.0):C119208 into _ft
>>> [maxNumSegments=1]
>>>         at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1475)
>>>         at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1412)
>>>         at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:385)
>>>         at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
>>>         at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
>>>         at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:783)
>>>         at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
>>>         at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:155)
>>>         at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
>>>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
>>>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
>>>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
>>>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
>>>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>>         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>>>         at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:865)
>>>         at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
>>>         at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1556)
>>>         at java.lang.Thread.run(Thread.java:679)
>>> Caused by: java.io.IOException: Invalid vInt detected (too many bits)
>>>         at org.apache.lucene.store.DataInput.readVInt(DataInput.java:112)
>>>         at org.apache.lucene.codecs.lucene40.Lucene40PostingsReader$AllDocsSegmentDocsEnum.nextUnreadDoc(Lucene40PostingsReader.java:557)
>>>         at org.apache.lucene.codecs.lucene40.Lucene40PostingsReader$SegmentDocsEnumBase.refill(Lucene40PostingsReader.java:408)
>>>         at org.apache.lucene.codecs.lucene40.Lucene40PostingsReader$AllDocsSegmentDocsEnum.nextDoc(Lucene40PostingsReader.java:508)
>>>         at org.apache.lucene.codecs.MappingMultiDocsEnum.nextDoc(MappingMultiDocsEnum.java:85)
>>>         at org.apache.lucene.codecs.PostingsConsumer.merge(PostingsConsumer.java:65)
>>>         at org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:82)
>>>         at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:54)
>>>         at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:356)
>>>         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:115)
>>>         at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3382)
>>>         at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3004)
>>>         at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:382)
>>>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:451)
>>> </str><int name="code">500</int></lst></response>
>>>
>>> What could be wrong? The exception is reproducible. Is the exception
>>> fixed in later versions?
>>>
>>> Many thanks...
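For completeness, the load-then-optimize flow described in this thread can be driven against Solr's CSV update handler and an explicit commit/optimize request, roughly as below. The host, port, and file names are assumptions, not values taken from this setup:

```shell
# Upload CSV files through Solr's CSV update handler; several of these
# can run concurrently, matching the multi-threaded import in the thread.
# NOTE: host/port and file names are assumptions.
curl 'http://localhost:8983/solr/update/csv' \
  -H 'Content-Type: text/csv; charset=utf-8' \
  --data-binary @part-00.csv

# Commit once after all files are loaded...
curl 'http://localhost:8983/solr/update?commit=true'

# ...then force-merge down to one segment ("optimize"), which is the step
# that hit the "Invalid vInt detected" merge exception above.
curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=1'
```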