Michael,
Following up on this most recent post. I remembered that the initial records were translated into utf-8 prior to indexing, whereas the update records are in the marc-8 encoding internally, and the program is written to translate them on the fly as they are read in, before indexing them. I just tried pre-translating them, and the entire set of updates ran. So at this point it looks like the problem is in my marc-8 to utf-8 translation code. I'll look into this possibility further.
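
As a first sanity check on the translated output, a strict UTF-8 decode can flag records where the translation left raw high bytes behind. This is only a minimal sketch: the class and method names (Utf8Check, isValidUtf8) are hypothetical and not part of the indexing program, and the check only detects output that is not well-formed UTF-8, not characters that were mapped to the wrong code point.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper for illustration; not part of the indexing program.
class Utf8Check {

    // Returns true only if the bytes decode as strict, well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                // Report malformed input instead of silently substituting U+FFFD.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "café" correctly encoded as UTF-8.
        byte[] good = "caf\u00e9".getBytes(Charset.forName("UTF-8"));
        // A high byte followed directly by ASCII, as untranslated data might look.
        byte[] bad = {(byte) 0xE2, (byte) 'e'};

        System.out.println(isValidUtf8(good)); // prints true
        System.out.println(isValidUtf8(bad));  // prints false
    }
}
```

Running each translated record's bytes through a check like this before indexing would at least show whether the failing records are the ones with malformed output.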

   Thanks again for your help on my earlier problem.
   -Robert Haschart

Robert Haschart wrote:

Michael,

To answer your questions: I completely deleted the index each time before retesting, and the java command as shown by "ps" does show -Xbatch.
The program is running on:
> uname -a
Linux lab8.betech.virginia.edu 2.6.18-53.1.14.el5 #1 SMP Tue Feb 19 07:18:21 EST 2008 i686 i686 i386 GNU/Linux
> more /etc/redhat-release
Red Hat Enterprise Linux Server release 5.1 (Tikanga)

after downgrading from the originally reported version of java: Java(TM) SE Runtime Environment (build 1.6.0_05-b13)
to this one:
> java -version
java version "1.6.0_02"
Java(TM) SE Runtime Environment (build 1.6.0_02-b05)
Java HotSpot(TM) Server VM (build 1.6.0_02-b05, mixed mode)

the indexing run successfully completed processing all 112 record chunks! Yea! (That was with -Xbatch on the command line; I didn't try the 1.6.0_02 java without -Xbatch.)


However, I am still seeing a different problem, which is what caused me to upgrade to Lucene version 2.3.1 and start experiencing the CorruptIndexException.

Basically we have a set of 112 files dumped from our OPAC in a binary Marc record format, each of which contains about 35000 records. In addition to those files we have a set of daily updates, consisting of new records that have been added, and edits for existing records, as well as a separate file listing the ids of records to be deleted.

After creating the initial index, I have a script loop through all of the update files, adding in all of the new records and updates, and then processing all of that day's deletes. Typically at some point in processing the updates, an auto-commit will be triggered. Eventually for one of these auto-commits (not the same one every time) the commit will never finish. The behavior I see is that it will write out information about doing a commit (as shown below) and then seemingly do nothing from then on, although the CPU % as reported by "ps" for the process sits around 90 to 100 % and stays there for days. While the program is sitting there doing this, no changes are made to the files in the index. So it's really not clear what it is doing. If you have any ideas about this other problem, I would appreciate any insight you have.

Adding record 10993: u4386758
Adding record 10994: u4386760
Adding record 10995: u4386767
Adding record 10996: u4386768
Adding record 10997: u4386812
Adding record 10998: u4386816
Adding record 10999: u4386850
Adding record 11000: u4386883
Adding record 11001: u4387066
Adding record 11002: u4387074
Adding record 11003: u4387764
Apr 20, 2008 1:12:18 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
Apr 20, 2008 1:12:18 PM org.apache.solr.update.DirectUpdateHandler2 doDeletions
INFO: DirectUpdateHandler2 deleting and removing dups for 11003 ids
Apr 20, 2008 1:12:32 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening [EMAIL PROTECTED] DirectUpdateHandler2
Apr 20, 2008 1:12:36 PM org.apache.solr.update.DirectUpdateHandler2 doDeletions
INFO: DirectUpdateHandler2 docs deleted=11003
Apr 20, 2008 1:12:36 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening [EMAIL PROTECTED] main
Apr 20, 2008 1:12:37 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 20, 2008 1:12:37 PM org.apache.solr.core.SolrCore registerSearcher
INFO: Registered new searcher [EMAIL PROTECTED] main
Apr 20, 2008 1:12:37 PM org.apache.solr.search.SolrIndexSearcher close
INFO: Closing [EMAIL PROTECTED] main
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
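
One way to narrow down what the process is doing while it sits at 90 to 100 % CPU is to capture a thread dump, for example by sending the JVM a QUIT signal (kill -QUIT with the process id) or by running the JDK's jstack tool against it. The same information can also be collected from inside the program. The sketch below uses only the standard Thread.getAllStackTraces() call; the class name StackDumper is hypothetical, and how it would be triggered from the indexing program (a timer, a signal, a watchdog thread) is left open.

```java
import java.util.Map;

// Hypothetical diagnostic helper; not part of Solr, Lucene, or the indexing program.
class StackDumper {

    // Builds a text dump of every live thread's name, state, and stack frames.
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            sb.append('"').append(t.getName()).append('"')
              .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print the dump for the current JVM.
        System.out.print(dumpAllThreads());
    }
}
```

Taking two or three dumps a few seconds apart and comparing which frames stay at the top of the RUNNABLE threads should show whether the time is being spent in the commit, in searcher warming, or somewhere else.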




