Just as an addendum:

I have deleted the whole index directory and loaded the data from scratch.
After the data was loaded (and committed) I ran CheckIndex again.
Again, there was a bunch of broken segments.
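As a side note, the raw term bytes that CheckIndex prints in those errors are plain UTF-8, so decoding them shows which term is affected. A minimal sketch (the class and helper names are mine, not part of Lucene):

```java
import java.nio.charset.StandardCharsets;

public class DecodeTerm {
    // Turn CheckIndex's hex byte dump (e.g. "6f 70 65 ...") back into text.
    static String decodeTerm(String hexDump) {
        String[] parts = hexDump.trim().split("\\s+");
        byte[] raw = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            raw[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Bytes copied from the CheckIndex error quoted below.
        System.out.println(decodeTerm("6f 70 65 72 61 63 69 6a 61")); // prints "operacija"
    }
}
```

So the `doc 105407 <= lastDoc 105407` error above hits the term "operacija".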

I will try with the latest trunk to see if the problem still exists.
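For context on the "Invalid vInt detected (too many bits)" exception quoted further down: Lucene stores many integers as variable-length vInts, with 7 payload bits per byte and the high bit set when another byte follows. An int fits in at most 5 such bytes, so a longer run of continuation bytes cannot be a valid vInt, which is a strong hint that the merge was reading corrupted bytes. A rough sketch of the idea, deliberately simplified and not Lucene's actual DataInput code:

```java
public class VIntSketch {
    // Simplified vInt decoding: 7 payload bits per byte, high bit set means
    // "another byte follows". After 5 bytes all 32 bits of an int are spent,
    // so a longer run means the input is not a valid vInt.
    // (Lucene's real reader throws a checked IOException here; an unchecked
    // exception keeps this sketch simple.)
    static int readVInt(byte[] buf, int pos) {
        int value = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            byte b = buf[pos++];
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
        }
        throw new IllegalStateException("Invalid vInt detected (too many bits)");
    }

    public static void main(String[] args) {
        // 300 = 0b1_0010_1100 -> bytes 0xAC (low 7 bits + continuation), 0x02
        System.out.println(readVInt(new byte[] {(byte) 0xAC, 0x02}, 0)); // prints 300
    }
}
```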

Regards,
Rok


On Mon, Jun 11, 2012 at 8:32 AM, Rok Rejc <rokrej...@gmail.com> wrote:

> Hi all,
>
> I have run CheckIndex. It seems that the index is corrupted. I've got
> plenty of exceptions like:
>
>   test: terms, freq, prox...ERROR: java.lang.ArrayIndexOutOfBoundsException
> java.lang.ArrayIndexOutOfBoundsException
>         at
> org.apache.lucene.store.ByteArrayDataInput.readBytes(ByteArrayDataInput.java:181)
>         at
> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.nextLeaf(BlockTreeTermsReader.java:2414)
>         at
> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.next(BlockTreeTermsReader.java:2400)
>         at
> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next(BlockTreeTermsReader.java:2074)
>         at
> org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:771)
>         at
> org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1164)
>         at
> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:602)
>         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1748)
>
>
> and
>
>   test: terms, freq, prox...ERROR: java.lang.RuntimeException: term [6f 70
> 65 72 61 63 69 6a 61]: doc 105407 <= lastDoc 105407
> java.lang.RuntimeException: term [6f 70 65 72 61 63 69 6a 61]: doc 105407
> <= lastDoc 105407
>         at
> org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:858)
>         at
> org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1164)
>         at
> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:602)
>         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1748)
>     test: stored fields.......OK [723321 total field count; avg 3 fields
> per doc]
>
>
>
> The final warning was:
>
>
> WARNING: 154 broken segments (containing 48127608 documents) detected
> WARNING: would write new segments file, and 48127608 documents would be
> lost, if -fix were specified
>
>
> As I mentioned, I ran the optimization after the initial import (no further
> adds or deletions were made).
> For the import I'm creating CSV files and loading them through CSV upload
> with multiple threads.
>
> The index is otherwise queryable.
>
> Any ideas what I should do next? Is this a bug in Lucene?
>
> Many thanks...
>
> Rok
>
>
> On Thu, Jun 7, 2012 at 5:05 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>
>> Is the index otherwise usable for queries? And it is only the optimize
>> that is failing?
>>
>> I suppose it is possible that the index could be corrupted, but it is
>> also possible that there is a bug in Lucene.
>>
>> I would suggest running Lucene "CheckIndex" next. See what it has to say.
>>
>> See:
>> https://builds.apache.org/job/Lucene-trunk/javadoc/core/org/apache/lucene/index/CheckIndex.html#main(java.lang.String[])
>>
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Rok Rejc
>> Sent: Thursday, June 07, 2012 5:50 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Exception when optimizing index
>>
>>
>> Hi Jack,
>>
>> It's a virtual machine running on VMware vSphere 5 Enterprise Plus.
>> The machine has 30 GB vRAM, an 8-core 3.0 GHz vCPU, and 2 TB SATA RAID-10
>> over iSCSI. The operating system is CentOS 6.2 64-bit.
>>
>> Here is the Java info:
>>
>>
>>  - catalina.base = /usr/share/tomcat6
>>  - catalina.home = /usr/share/tomcat6
>>  - catalina.useNaming = true
>>  - common.loader = ${catalina.base}/lib,${catalina.base}/lib/*.jar,
>>    ${catalina.home}/lib,${catalina.home}/lib/*.jar
>>  - file.encoding = UTF-8
>>  - file.encoding.pkg = sun.io
>>  - file.separator = /
>>  - java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
>>  - java.awt.printerjob = sun.print.PSPrinterJob
>>  - java.class.path =
>>    /usr/share/tomcat6/bin/bootstrap.jar
>>    /usr/share/tomcat6/bin/tomcat-juli.jar
>>    /usr/share/java/commons-daemon.jar
>>  - java.class.version = 50.0
>>  - java.endorsed.dirs =
>>  - java.ext.dirs =
>>    /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/ext
>>    /usr/java/packages/lib/ext
>>  - java.home = /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre
>>  - java.io.tmpdir = /var/cache/tomcat6/temp
>>  - java.library.path =
>>    /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/server
>>    /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64
>>    /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/../lib/amd64
>>    /usr/java/packages/lib/amd64
>>    /usr/lib64
>>    /lib64
>>    /lib
>>    /usr/lib
>>  - java.naming.factory.initial = org.apache.naming.java.javaURLContextFactory
>>  - java.naming.factory.url.pkgs = org.apache.naming
>>  - java.runtime.name = OpenJDK Runtime Environment
>>  - java.runtime.version = 1.6.0_22-b22
>>  - java.specification.name = Java Platform API Specification
>>  - java.specification.vendor = Sun Microsystems Inc.
>>  - java.specification.version = 1.6
>>  - java.util.logging.config.file = /usr/share/tomcat6/conf/logging.properties
>>  - java.util.logging.manager = org.apache.juli.ClassLoaderLogManager
>>  - java.vendor = Sun Microsystems Inc.
>>  - java.vendor.url = http://java.sun.com/
>>  - java.vendor.url.bug = http://java.sun.com/cgi-bin/bugreport.cgi
>>  - java.version = 1.6.0_22
>>  - java.vm.info = mixed mode
>>  - java.vm.name = OpenJDK 64-Bit Server VM
>>  - java.vm.specification.name = Java Virtual Machine Specification
>>  - java.vm.specification.vendor = Sun Microsystems Inc.
>>  - java.vm.specification.version = 1.0
>>  - java.vm.vendor = Sun Microsystems Inc.
>>  - java.vm.version = 20.0-b11
>>  - javax.sql.DataSource.Factory = org.apache.commons.dbcp.BasicDataSourceFactory
>>  - line.separator =
>>  - os.arch = amd64
>>  - os.name = Linux
>>  - os.version = 2.6.32-220.13.1.el6.x86_64
>>  - package.access = sun.,org.apache.catalina.,org.apache.coyote.,
>>    org.apache.tomcat.,org.apache.jasper.,sun.beans.
>>  - package.definition = sun.,java.,org.apache.catalina.,org.apache.coyote.,
>>    org.apache.tomcat.,org.apache.jasper.
>>  - path.separator = :
>>  - server.loader =
>>  - shared.loader =
>>  - sun.arch.data.model = 64
>>  - sun.boot.class.path =
>>    /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/resources.jar
>>    /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/rt.jar
>>    /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/sunrsasign.jar
>>    /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/jsse.jar
>>    /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/jce.jar
>>    /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/charsets.jar
>>    /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/netx.jar
>>    /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/plugin.jar
>>    /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/rhino.jar
>>    /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/modules/jdk.boot.jar
>>    /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/classes
>>  - sun.boot.library.path = /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64
>>  - sun.cpu.endian = little
>>  - sun.cpu.isalist =
>>  - sun.io.unicode.encoding = UnicodeLittle
>>  - sun.java.command = org.apache.catalina.startup.Bootstrap start
>>  - sun.java.launcher = SUN_STANDARD
>>  - sun.jnu.encoding = UTF-8
>>  - sun.management.compiler = HotSpot 64-Bit Tiered Compilers
>>  - sun.os.patch.level = unknown
>>  - tomcat.util.buf.StringCache.byte.enabled = true
>>  - user.country = US
>>  - user.dir = /usr/share/tomcat6
>>  - user.home = /usr/share/tomcat6
>>  - user.language = en
>>  - user.name = tomcat
>>  - user.timezone = Europe/Ljubljana
>>
>>
>>
>>
>> As far as I can see from the JIRA issue, I already have that patch (as
>> mentioned, I have a trunk version from May 12). Any ideas?
>>
>> Many thanks!
>>
>>
>>
>> On Wed, Jun 6, 2012 at 2:49 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>>
>>> It could be related to https://issues.apache.org/jira/browse/LUCENE-2975.
>>> At least the exception comes from the same function.
>>>
>>>
>>> "Caused by: java.io.IOException: Invalid vInt detected (too many bits)
>>>  at org.apache.lucene.store.DataInput.readVInt(DataInput.java:112)"
>>>
>>> What hardware and Java version are you running?
>>>
>>> -- Jack Krupansky
>>>
>>> -----Original Message----- From: Rok Rejc
>>> Sent: Wednesday, June 06, 2012 3:45 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Exception when optimizing index
>>>
>>>
>>> Hi all,
>>>
>>> I have a Solr installation (version 4.0 from trunk, 1st May 2012).
>>>
>>> After I imported 99,831,145 documents I ran the optimization and got an
>>> exception:
>>>
>>> <response><lst name="responseHeader"><int name="status">500</int><int
>>> name="QTime">281615</int></lst><lst name="error"><str name="msg">background
>>> merge hit exception: _8x(4.0):C202059 _e0(4.0):C192649 _3r(4.0):C205785
>>> _1s(4.0):C203526 _4w(4.0):C199793 _7f(4.0):C193108 _dy(4.0):C185814
>>> _7d(4.0):C190364 _c5(4.0):C187881 _8u(4.0):C185001 _r(4.0):C183475
>>> _1r(4.0):C185622 _2s(4.0):C174349 _3s(4.0):C171683 _7h(4.0):C170618
>>> _fj(4.0):C179232 _2t(4.0):C161907 _fi(4.0):C168713 _1q(4.0):C165402
>>> _2r(4.0):C152995 _e1(4.0):C146080 _f4(4.0):C155072 _af(4.0):C149113
>>> _dx(4.0):C147298 _3t(4.0):C150806 _q(4.0):C146874 _4v(4.0):C146324
>>> _fc(4.0):C141426 _al(4.0):C125361 _64(4.0):C119208 into _ft
>>> [maxNumSegments=1]</str><str name="trace">java.io.IOException: background
>>> merge hit exception: _8x(4.0):C202059 _e0(4.0):C192649 _3r(4.0):C205785
>>> _1s(4.0):C203526 _4w(4.0):C199793 _7f(4.0):C193108 _dy(4.0):C185814
>>> _7d(4.0):C190364 _c5(4.0):C187881 _8u(4.0):C185001 _r(4.0):C183475
>>> _1r(4.0):C185622 _2s(4.0):C174349 _3s(4.0):C171683 _7h(4.0):C170618
>>> _fj(4.0):C179232 _2t(4.0):C161907 _fi(4.0):C168713 _1q(4.0):C165402
>>> _2r(4.0):C152995 _e1(4.0):C146080 _f4(4.0):C155072 _af(4.0):C149113
>>> _dx(4.0):C147298 _3t(4.0):C150806 _q(4.0):C146874 _4v(4.0):C146324
>>> _fc(4.0):C141426 _al(4.0):C125361 _64(4.0):C119208 into _ft
>>> [maxNumSegments=1]
>>>  at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1475)
>>>  at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1412)
>>>  at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:385)
>>>  at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
>>>  at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
>>>  at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:783)
>>>  at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
>>>  at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:155)
>>>  at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
>>>  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
>>>  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>>  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
>>>  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
>>>  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
>>>  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>>  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>>>  at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:865)
>>>  at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
>>>  at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1556)
>>>  at java.lang.Thread.run(Thread.java:679)
>>> Caused by: java.io.IOException: Invalid vInt detected (too many bits)
>>>  at org.apache.lucene.store.DataInput.readVInt(DataInput.java:112)
>>>  at org.apache.lucene.codecs.lucene40.Lucene40PostingsReader$AllDocsSegmentDocsEnum.nextUnreadDoc(Lucene40PostingsReader.java:557)
>>>  at org.apache.lucene.codecs.lucene40.Lucene40PostingsReader$SegmentDocsEnumBase.refill(Lucene40PostingsReader.java:408)
>>>  at org.apache.lucene.codecs.lucene40.Lucene40PostingsReader$AllDocsSegmentDocsEnum.nextDoc(Lucene40PostingsReader.java:508)
>>>  at org.apache.lucene.codecs.MappingMultiDocsEnum.nextDoc(MappingMultiDocsEnum.java:85)
>>>  at org.apache.lucene.codecs.PostingsConsumer.merge(PostingsConsumer.java:65)
>>>  at org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:82)
>>>  at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:54)
>>>  at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:356)
>>>  at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:115)
>>>  at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3382)
>>>  at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3004)
>>>  at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:382)
>>>  at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:451)
>>> </str><int name="code">500</int></lst></response>
>>>
>>> What could be wrong? The exception is reproducible. Is it fixed in later
>>> versions?
>>>
>>> Many thanks...
>>>
>>>
>>
>
