It's good to know that the situation is reproducible.

Maybe you could do a couple of smaller tests, such as running CheckIndex after loading only 10%, 25%, and 50% of the data to see if the problem occurs with less data or is dependent on a much higher document count.
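
To make the partial-load tests concrete, the stopping points can be worked out up front. This is only a sketch: it assumes the total of 99831145 documents mentioned in the original report, and the class and method names are made up for illustration.

```java
final class LoadCheckpoints {
    /** Number of documents that make up the given whole-number percentage of the total. */
    static long docsAtPercent(long totalDocs, int percent) {
        // long arithmetic: totalDocs * percent would overflow an int for totals this large
        return totalDocs * percent / 100;
    }

    public static void main(String[] args) {
        long totalDocs = 99831145L; // assumption: total taken from the original report
        for (int percent : new int[] {10, 25, 50}) {
            System.out.println(percent + "% -> stop loading after "
                    + docsAtPercent(totalDocs, percent)
                    + " documents, commit, then run CheckIndex");
        }
    }
}
```

If the smallest load already shows broken segments, the problem probably does not depend on a very high document count.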

And also check for any exceptions or even warnings in the logs before running CheckIndex.

How many documents do you believe were added to the index before you ran CheckIndex this latest time? Can you run a query of *:* and see whether its count agrees?
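
A *:* query with rows=0 (for example /select?q=*:*&rows=0 on whatever host and core you use; that path is an assumption about your setup) returns the total in the numFound attribute of the result element. As a rough sketch, the count could be pulled out of the XML response and compared with the expected number like this. The sample response and the number in it are hand-written placeholders, not output from a real index, and the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumFoundCheck {
    // Matches the numFound attribute of the <result> element in a Solr XML response.
    private static final Pattern NUM_FOUND = Pattern.compile("numFound=\"(\\d+)\"");

    /** Extracts the numFound value from a Solr XML response body. */
    static long parseNumFound(String xml) {
        Matcher m = NUM_FOUND.matcher(xml);
        if (!m.find()) {
            throw new IllegalArgumentException("no numFound attribute in response");
        }
        return Long.parseLong(m.group(1));
    }

    /** True if the index reports exactly the number of documents that were loaded. */
    static boolean countsAgree(String xml, long expectedDocs) {
        return parseNumFound(xml) == expectedDocs;
    }

    public static void main(String[] args) {
        // Placeholder response, written by hand for illustration only.
        String sample = "<response><result name=\"response\" numFound=\"12345\" start=\"0\"/></response>";
        System.out.println("numFound = " + parseNumFound(sample));
        System.out.println("agrees with 12345: " + countsAgree(sample, 12345L));
    }
}
```

A mismatch between this count and the number of documents sent would suggest that something went wrong during loading, before CheckIndex was ever run.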

-- Jack Krupansky

-----Original Message-----
From: Rok Rejc
Sent: Tuesday, June 12, 2012 1:40 AM
To: solr-user@lucene.apache.org
Subject: Re: Exception when optimizing index

Just as an addon:

I have deleted the whole index directory and loaded the data from scratch.
After the data was loaded (and I committed it) I ran CheckIndex again.
Again, there were a bunch of broken segments.

I will try with the latest trunk to see if the problem still exists.

Regards,
Rok


On Mon, Jun 11, 2012 at 8:32 AM, Rok Rejc <rokrej...@gmail.com> wrote:

Hi all,

I have run CheckIndex. It seems that the index is corrupted. I've got
plenty of exceptions like:

test: terms, freq, prox...ERROR: java.lang.ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException
        at org.apache.lucene.store.ByteArrayDataInput.readBytes(ByteArrayDataInput.java:181)
        at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.nextLeaf(BlockTreeTermsReader.java:2414)
        at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.next(BlockTreeTermsReader.java:2400)
        at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next(BlockTreeTermsReader.java:2074)
        at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:771)
        at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1164)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:602)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1748)


and

  test: terms, freq, prox...ERROR: java.lang.RuntimeException: term [6f 70 65 72 61 63 69 6a 61]: doc 105407 <= lastDoc 105407
java.lang.RuntimeException: term [6f 70 65 72 61 63 69 6a 61]: doc 105407 <= lastDoc 105407
        at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:858)
        at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1164)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:602)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1748)
    test: stored fields.......OK [723321 total field count; avg 3 fields per doc]
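
The term in the RuntimeException above is printed as raw bytes. Assuming the field's terms are UTF-8 encoded, a small sketch like the following turns them back into readable text, which can help to find the affected documents. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class TermBytesDecoder {
    /** Decodes a space-separated list of hex bytes, as printed by CheckIndex, as UTF-8 text. */
    static String decode(String hexBytes) {
        String[] parts = hexBytes.trim().split("\\s+");
        byte[] bytes = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            bytes[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Bytes copied from the CheckIndex error message.
        System.out.println(decode("6f 70 65 72 61 63 69 6a 61")); // prints "operacija"
    }
}
```

The "doc 105407 <= lastDoc 105407" part of the message says that the same document number appeared twice in a row in the postings of this term, whereas document numbers in a postings list are expected to increase strictly.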



The final warning was:

WARNING: 154 broken segments (containing 48127608 documents) detected
WARNING: would write new segments file, and 48127608 documents would be lost, if -fix were specified


As I mentioned, I ran the optimization after the initial import (no further
adds or deletions were made).
For the import I create CSV files and load them through the CSV upload
with multiple threads.

The index is otherwise queryable.

Any ideas what I should do next? Is this a bug in Lucene?

Many thanks...

Rok

On Thu, Jun 7, 2012 at 5:05 PM, Jack Krupansky <j...@basetechnology.com> wrote:

Is the index otherwise usable for queries? And is it only the optimize
that is failing?

I suppose it is possible that the index could be corrupted, but it is
also possible that there is a bug in Lucene.

I would suggest running Lucene "CheckIndex" next. See what it has to say.

See:
https://builds.apache.org/job/Lucene-trunk/javadoc/core/org/apache/lucene/index/CheckIndex.html#main(java.lang.String[])


-- Jack Krupansky

-----Original Message-----
From: Rok Rejc
Sent: Thursday, June 07, 2012 5:50 AM
To: solr-user@lucene.apache.org
Subject: Re: Exception when optimizing index


Hi Jack,

It's a virtual machine running on VMware vSphere 5 Enterprise Plus.
The machine has 30 GB vRAM, an 8-core vCPU at 3.0 GHz, and 2 TB SATA RAID-10 over iSCSI.
The operating system is CentOS 6.2 64-bit.

Here is the Java info:


 - catalina.base: /usr/share/tomcat6
 - catalina.home: /usr/share/tomcat6
 - catalina.useNaming: true
 - common.loader: ${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar
 - file.encoding: UTF-8
 - file.encoding.pkg: sun.io
 - file.separator: /
 - java.awt.graphicsenv: sun.awt.X11GraphicsEnvironment
 - java.awt.printerjob: sun.print.PSPrinterJob
 - java.class.path:
   /usr/share/tomcat6/bin/bootstrap.jar
   /usr/share/tomcat6/bin/tomcat-juli.jar
   /usr/share/java/commons-daemon.jar
 - java.class.version: 50.0
 - java.endorsed.dirs:
 - java.ext.dirs:
   /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/ext
   /usr/java/packages/lib/ext
 - java.home: /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre
 - java.io.tmpdir: /var/cache/tomcat6/temp
 - java.library.path:
   /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/server
   /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64
   /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/../lib/amd64
   /usr/java/packages/lib/amd64
   /usr/lib64
   /lib64
   /lib
   /usr/lib
 - java.naming.factory.initial: org.apache.naming.java.javaURLContextFactory
 - java.naming.factory.url.pkgs: org.apache.naming
 - java.runtime.name: OpenJDK Runtime Environment
 - java.runtime.version: 1.6.0_22-b22
 - java.specification.name: Java Platform API Specification
 - java.specification.vendor: Sun Microsystems Inc.
 - java.specification.version: 1.6
 - java.util.logging.config.file: /usr/share/tomcat6/conf/logging.properties
 - java.util.logging.manager: org.apache.juli.ClassLoaderLogManager
 - java.vendor: Sun Microsystems Inc.
 - java.vendor.url: http://java.sun.com/
 - java.vendor.url.bug: http://java.sun.com/cgi-bin/bugreport.cgi
 - java.version: 1.6.0_22
 - java.vm.info: mixed mode
 - java.vm.name: OpenJDK 64-Bit Server VM
 - java.vm.specification.name: Java Virtual Machine Specification
 - java.vm.specification.vendor: Sun Microsystems Inc.
 - java.vm.specification.version: 1.0
 - java.vm.vendor: Sun Microsystems Inc.
 - java.vm.version: 20.0-b11
 - javax.sql.DataSource.Factory: org.apache.commons.dbcp.BasicDataSourceFactory
 - line.separator:
 - os.arch: amd64
 - os.name: Linux
 - os.version: 2.6.32-220.13.1.el6.x86_64
 - package.access: sun.,org.apache.catalina.,org.apache.coyote.,org.apache.tomcat.,org.apache.jasper.,sun.beans.
 - package.definition: sun.,java.,org.apache.catalina.,org.apache.coyote.,org.apache.tomcat.,org.apache.jasper.
 - path.separator: :
 - server.loader:
 - shared.loader:
 - sun.arch.data.model: 64
 - sun.boot.class.path:
   /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/resources.jar
   /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/rt.jar
   /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/sunrsasign.jar
   /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/jsse.jar
   /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/jce.jar
   /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/charsets.jar
   /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/netx.jar
   /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/plugin.jar
   /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/rhino.jar
   /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/modules/jdk.boot.jar
   /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/classes
 - sun.boot.library.path: /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64
 - sun.cpu.endian: little
 - sun.cpu.isalist:
 - sun.io.unicode.encoding: UnicodeLittle
 - sun.java.command: org.apache.catalina.startup.Bootstrap start
 - sun.java.launcher: SUN_STANDARD
 - sun.jnu.encoding: UTF-8
 - sun.management.compiler: HotSpot 64-Bit Tiered Compilers
 - sun.os.patch.level: unknown
 - tomcat.util.buf.StringCache.byte.enabled: true
 - user.country: US
 - user.dir: /usr/share/tomcat6
 - user.home: /usr/share/tomcat6
 - user.language: en
 - user.name: tomcat
 - user.timezone: Europe/Ljubljana




As far as I can see from the JIRA issue, I already have the attached patch
(as mentioned, I have a trunk version from May 12). Any ideas?

Many thanks!



On Wed, Jun 6, 2012 at 2:49 PM, Jack Krupansky <j...@basetechnology.com> wrote:

It could be related to https://issues.apache.org/jira/browse/LUCENE-2975.
At least the exception comes from the same function.


"Caused by: java.io.IOException: Invalid vInt detected (too many bits)
 at org.apache.lucene.store.DataInput.readVInt(DataInput.java:112)"
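
For background on that message: a vInt is a variable-length integer with seven payload bits per byte, where a set high bit means another byte follows. A 32-bit value needs at most five bytes, and the fifth byte can only contribute its low four bits, so anything more cannot be a valid vInt and usually means the reader is looking at corrupt or misaligned data. The following is a simplified sketch of that check for illustration only; it is not the actual Lucene DataInput source, and the class name is made up.

```java
import java.io.IOException;

final class VIntSketch {
    /** Decodes one variable-length int from the start of buf (7 payload bits per byte). */
    static int readVInt(byte[] buf) throws IOException {
        int value = 0;
        for (int i = 0; i < 5; i++) {
            byte b = buf[i];
            if (i == 4) {
                // Fifth byte: only the low 4 bits still fit into a 32-bit int.
                if ((b & 0xF0) != 0) {
                    throw new IOException("Invalid vInt detected (too many bits)");
                }
                return value | ((b & 0x0F) << 28);
            }
            value |= (b & 0x7F) << (7 * i);
            if (b >= 0) { // high bit clear: this was the last byte
                return value;
            }
        }
        throw new AssertionError("unreachable");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readVInt(new byte[] {(byte) 0xAC, 0x02})); // prints 300
        try {
            // Five bytes whose last byte uses more than its low 4 bits.
            readVInt(new byte[] {-1, -1, -1, -1, 0x7F});
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The sketch does no bounds checking, so a truncated buffer fails with an ArrayIndexOutOfBoundsException instead.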

What hardware and Java version are you running?

-- Jack Krupansky

-----Original Message-----
From: Rok Rejc
Sent: Wednesday, June 06, 2012 3:45 AM
To: solr-user@lucene.apache.org
Subject: Exception when optimizing index


Hi all,

I have a Solr installation (version 4.0 from trunk, 1st May 2012).

After I imported the documents (99831145 documents) I ran the
optimization and got an exception:

<response><lst name="responseHeader"><int name="status">500</int><int name="QTime">281615</int></lst>
<lst name="error"><str name="msg">background merge hit exception: _8x(4.0):C202059 _e0(4.0):C192649 _3r(4.0):C205785
_1s(4.0):C203526 _4w(4.0):C199793 _7f(4.0):C193108 _dy(4.0):C185814
_7d(4.0):C190364 _c5(4.0):C187881 _8u(4.0):C185001 _r(4.0):C183475
_1r(4.0):C185622 _2s(4.0):C174349 _3s(4.0):C171683 _7h(4.0):C170618
_fj(4.0):C179232 _2t(4.0):C161907 _fi(4.0):C168713 _1q(4.0):C165402
_2r(4.0):C152995 _e1(4.0):C146080 _f4(4.0):C155072 _af(4.0):C149113
_dx(4.0):C147298 _3t(4.0):C150806 _q(4.0):C146874 _4v(4.0):C146324
_fc(4.0):C141426 _al(4.0):C125361 _64(4.0):C119208 into _ft [maxNumSegments=1]</str>
<str name="trace">java.io.IOException: background merge hit exception: _8x(4.0):C202059 _e0(4.0):C192649 _3r(4.0):C205785
_1s(4.0):C203526 _4w(4.0):C199793 _7f(4.0):C193108 _dy(4.0):C185814
_7d(4.0):C190364 _c5(4.0):C187881 _8u(4.0):C185001 _r(4.0):C183475
_1r(4.0):C185622 _2s(4.0):C174349 _3s(4.0):C171683 _7h(4.0):C170618
_fj(4.0):C179232 _2t(4.0):C161907 _fi(4.0):C168713 _1q(4.0):C165402
_2r(4.0):C152995 _e1(4.0):C146080 _f4(4.0):C155072 _af(4.0):C149113
_dx(4.0):C147298 _3t(4.0):C150806 _q(4.0):C146874 _4v(4.0):C146324
_fc(4.0):C141426 _al(4.0):C125361 _64(4.0):C119208 into _ft [maxNumSegments=1]
 at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1475)
 at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1412)
 at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:385)
 at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
 at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
 at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:783)
 at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
 at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:155)
 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
 at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
 at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:865)
 at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
 at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1556)
 at java.lang.Thread.run(Thread.java:679)
Caused by: java.io.IOException: Invalid vInt detected (too many bits)
 at org.apache.lucene.store.DataInput.readVInt(DataInput.java:112)
 at org.apache.lucene.codecs.lucene40.Lucene40PostingsReader$AllDocsSegmentDocsEnum.nextUnreadDoc(Lucene40PostingsReader.java:557)
 at org.apache.lucene.codecs.lucene40.Lucene40PostingsReader$SegmentDocsEnumBase.refill(Lucene40PostingsReader.java:408)
 at org.apache.lucene.codecs.lucene40.Lucene40PostingsReader$AllDocsSegmentDocsEnum.nextDoc(Lucene40PostingsReader.java:508)
 at org.apache.lucene.codecs.MappingMultiDocsEnum.nextDoc(MappingMultiDocsEnum.java:85)
 at org.apache.lucene.codecs.PostingsConsumer.merge(PostingsConsumer.java:65)
 at org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:82)
 at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:54)
 at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:356)
 at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:115)
 at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3382)
 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3004)
 at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:382)
 at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:451)
</str><int name="code">500</int></lst></response>

What could be wrong? The exception is reproducible. Has it been fixed in
later versions?

Many thanks...




