Michael,
To answer your questions:
I completely deleted the index each time before retesting.
and the java command as shown by "ps" does show -Xbatch.
The program is running on:
> uname -a
Linux lab8.betech.virginia.edu 2.6.18-53.1.14.el5 #1 SMP Tue Feb 19
07:18:21 EST 2008 i686 i686 i386 GNU/Linux
> more /etc/redhat-release
Red Hat Enterprise Linux Server release 5.1 (Tikanga)
after downgrading from the originally reported version of java:
Java(TM) SE Runtime Environment (build 1.6.0_05-b13)
to this one:
> java -version
java version "1.6.0_02"
Java(TM) SE Runtime Environment (build 1.6.0_02-b05)
Java HotSpot(TM) Server VM (build 1.6.0_02-b05, mixed mode)
the indexing run sucessfully completed processing all 112 record
chunks! Yea!
(with -Xbatch on the command line, I didn't try with the 1.6.0_02 java
without -Xbatch)
However, I am still seeing a different problem which is what caused me
to upgrade to Lucene version 2.3.1 and start experiencing the
CorruptIndexException.
Basically we have a set of 112 files dumped from our OPAC in a binary
Marc record format, each of which contains about 35000 records. In
addition to those files we have a set of daily updates, consisting of
new records that have been added, and edits for existing records, as
well as a separate file listing the ids of records to be deleted.
After creating the initial index, I have a script loop through all of
the update files, adding in all of the new records and updates, and then
processing all of that day's deletes. Typically at some point in
processing the updates, an auto-commit will be triggered. Eventually
for one of these auto-commits (not the same one every time) the commit
will never finish. The behavior I see is that it will write out
information about doing a commit (as shown below) and then seeming do
nothing ever after, although the CPU % as reported by "ps" for the
process sits around 90 to 100 % and stays there for days. While the
program is sitting there doing this, no changes are made to the files in
the index. So its really not clear what it is doing.
If you have any ideas about this other problem, I would appreciate ant
insight you have.
Adding record 10993: u4386758
Adding record 10994: u4386760
Adding record 10995: u4386767
Adding record 10996: u4386768
Adding record 10997: u4386812
Adding record 10998: u4386816
Adding record 10999: u4386850
Adding record 11000: u4386883
Adding record 11001: u4387066
Adding record 11002: u4387074
Adding record 11003: u4387764
Apr 20, 2008 1:12:18 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
Apr 20, 2008 1:12:18 PM org.apache.solr.update.DirectUpdateHandler2
doDeletions
INFO: DirectUpdateHandler2 deleting and removing dups for 11003 ids
Apr 20, 2008 1:12:32 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening [EMAIL PROTECTED] DirectUpdateHandler2
Apr 20, 2008 1:12:36 PM org.apache.solr.update.DirectUpdateHandler2
doDeletions
INFO: DirectUpdateHandler2 docs deleted=11003
Apr 20, 2008 1:12:36 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening [EMAIL PROTECTED] main
Apr 20, 2008 1:12:37 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 20, 2008 1:12:37 PM org.apache.solr.core.SolrCore registerSearcher
INFO: Registered new searcher [EMAIL PROTECTED] main
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher close
INFO: Closing [EMAIL PROTECTED] main
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Michael McCandless wrote:
Hmmm, not good.
One other thing to try would be -Xint, which turns off hotspot
compilation entirely. On that last case that also prevented the issue.
Did you cleanly rebuild your index when you retested? And you're
really certain your JRE is running with -Xbatch? (You should be able
to use "ps" to look in the process listing and verify).
Which OS are you on?
Mike
Robert Haschart wrote:
Mike,
You are right it does sound exactly like that situation. The java
version is:
java version "1.6.0_05"
Java(TM) SE Runtime Environment (build 1.6.0_05-b13)
Java HotSpot(TM) Server VM (build 10.0-b19, mixed mode)
Which seems to be the same as the one giving the other poster problems.
I tried running with the -Xbatch flag on for the indexing runs, but
after 23 chunks of Marc records, the exact same problem occurred.
I'm going to try downgrading java versions to see whether that helps.
-Robert Haschart
Michael McCandless wrote:
Which exact version of the JRE are you using? Can you try running
java with -Xbatch (forces up front compilation)?
Your situation sounds very similar to this one:
http://lucene.markmail.org/message/awkkunr7j24nh4qj
Mike
On Apr 17, 2008, at 10:57 AM, Robert Haschart wrote:
Greetings all,
We are using Solr to index Marc records to create a better, more
user friendly library catalog here at the University of
Virginia. To do this I have written a program starting from the
VuFind Java importer written by Wayne Graham (from the College of
William & Mary). After making extensive changes to accomodate
our different indexing scheme the program was working great.
Subsequently I upgraded to lucene version 2.3.1 to realize the
faster indexing performance, and although it is faster, is also
seems somewhat unreliable. A half a dozen times I have
completely wiped out the existing index and started afresh,
building an new index. Each time at some point in the indexing
run I receive the Error message:
Exception in thread "Thread-10"
org.apache.lucene.index.MergePolicy $MergeException:
org.apache.lucene.index.CorruptIndexException:
doc counts differ for segment _3gh: fieldsReader shows 15999
but segmentInfo shows 16000
at org.apache.lucene.index.ConcurrentMergeScheduler
$MergeThread.run(ConcurrentMergeScheduler.java:271)
Caused by: org.apache.lucene.index.CorruptIndexException: doc
counts differ for segment _3gh: fieldsReader shows 15999 but
segmentInfo shows 16000
at org.apache.lucene.index.SegmentReader.initialize
(SegmentReader.java:313)
at org.apache.lucene.index.SegmentReader.get
(SegmentReader.java:262)
at org.apache.lucene.index.SegmentReader.get
(SegmentReader.java:221)
at org.apache.lucene.index.IndexWriter.mergeMiddle
(IndexWriter.java:3099)
at org.apache.lucene.index.IndexWriter.merge
(IndexWriter.java:2834)
at org.apache.lucene.index.ConcurrentMergeScheduler
$MergeThread.run(ConcurrentMergeScheduler.java:240)
This occurs a dozen or so times within the set of Marc records it
is processing, and when the next segment starts processing I
receive this message:
Loading properties from import.properties
Apr 16, 2008 7:03:51 AM org.apache.solr.core.Config setInstanceDir
INFO: Solr home set to '/usr/local/projects/solr/'
Apr 16, 2008 7:03:51 AM org.apache.solr.core.SolrConfig initConfig
INFO: Loaded SolrConfig: solrconfig.xml
Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore <init>
INFO: Opening new SolrCore at /usr/local/projects/solr/, dataDir=/
usr/local/projects/solr/data
Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema readConfig
INFO: Reading Solr Schema
Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema readConfig
INFO: Schema name=solr_int
Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema readConfig
INFO: default search field is text
Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema readConfig
INFO: query parser default operator is AND
Apr 16, 2008 7:03:52 AM org.apache.solr.schema.IndexSchema readConfig
INFO: unique key field: id
Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore parseListener
INFO: Searching for listeners: //[EMAIL PROTECTED]"firstSearcher"]
Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore parseListener
INFO: Searching for listeners: //[EMAIL PROTECTED]"newSearcher"]
Apr 16, 2008 7:03:52 AM org.apache.solr.core.SolrCore initWriters
INFO: adding queryResponseWriter
xslt=org.apache.solr.request.XSLTResponseWriter
Apr 16, 2008 7:03:52 AM
org.apache.solr.request.XSLTResponseWriter init
INFO: xsltCacheLifetimeSeconds=5
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers
initHandlersFromConfig
INFO: adding requestHandler: standard=solr.StandardRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers
initHandlersFromConfig
INFO: adding requestHandler: partitioned=solr.DisMaxRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers
initHandlersFromConfig
INFO: adding requestHandler: dismax=solr.DisMaxRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers
initHandlersFromConfig
INFO: adding requestHandler: catalog=solr.DisMaxRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers
initHandlersFromConfig
INFO: adding requestHandler: music=solr.DisMaxRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers
initHandlersFromConfig
INFO: adding requestHandler:
semester_at_sea=solr.DisMaxRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers
initHandlersFromConfig
INFO: adding lazy requestHandler:
spellchecker=solr.SpellCheckerRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers
initHandlersFromConfig
INFO: adding requestHandler: /update=solr.XmlUpdateRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers
initHandlersFromConfig
INFO: adding lazy requestHandler: /update/csv=solr.CSVRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers
initHandlersFromConfig
INFO: adding requestHandler: /admin/
luke=org.apache.solr.handler.admin.LukeRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers
initHandlersFromConfig
INFO: adding requestHandler: /admin/
system=org.apache.solr.handler.admin.SystemInfoHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers
initHandlersFromConfig
INFO: adding requestHandler: /admin/
plugins=org.apache.solr.handler.admin.PluginInfoHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers
initHandlersFromConfig
INFO: adding requestHandler: /admin/
threads=org.apache.solr.handler.admin.ThreadDumpHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers
initHandlersFromConfig
INFO: adding requestHandler: /admin/
properties=org.apache.solr.handler.admin.PropertiesRequestHandler
Apr 16, 2008 7:03:52 AM org.apache.solr.core.RequestHandlers
initHandlersFromConfig
INFO: adding requestHandler: /debug/dump=solr.DumpRequestHandler
Exception in thread "main" java.lang.RuntimeException:
org.apache.lucene.index.CorruptIndexException: doc counts differ
for segment
_3gh: fieldsReader shows 15999 but segmentInfo shows 16000
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:
433)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:216)
at MarcImporter.<init>(Unknown Source)
at MarcImporter.main(Unknown Source)
Caused by: org.apache.lucene.index.CorruptIndexException: doc
counts differ for segment _3gh: fieldsReader shows 15999 but
segmentIn
fo shows 16000
at org.apache.lucene.index.SegmentReader.initialize
(SegmentReader.java:313)
at org.apache.lucene.index.SegmentReader.get
(SegmentReader.java:262)
at org.apache.lucene.index.SegmentReader.get
(SegmentReader.java:197)
at org.apache.lucene.index.MultiSegmentReader.<init>
(MultiSegmentReader.java:55)
at org.apache.lucene.index.DirectoryIndexReader$1.doBody
(DirectoryIndexReader.java:75)
at org.apache.lucene.index.SegmentInfos
$FindSegmentsFile.run (SegmentInfos.java:636)
at org.apache.lucene.index.DirectoryIndexReader.open
(DirectoryIndexReader.java:63)
at org.apache.lucene.index.IndexReader.open
(IndexReader.java: 209)
at org.apache.lucene.index.IndexReader.open
(IndexReader.java: 173)
at org.apache.solr.search.SolrIndexSearcher.<init>
(SolrIndexSearcher.java:87)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:
424)
... 3 more
Does anybody have suggestions as to how to track this problem down?
Thanks in Advance,
Robert Haschart