Re: Spell checker index rebuild

Doug Steigerwald Thu, 17 Jan 2008 11:02:25 -0800

Seemed to be able to fix the below problem with the following patch in lucene-2.2. Going to try the lucene 2.3 branch.


Index: contrib/spellchecker/src/java/org/apache/lucene/search/spell/SpellChecker.java
===================================================================
--- contrib/spellchecker/src/java/org/apache/lucene/search/spell/SpellChecker.java (revision 612882)
+++ contrib/spellchecker/src/java/org/apache/lucene/search/spell/SpellChecker.java (working copy)
@@ -285,7 +285,7 @@
    */
   public void clearIndex() throws IOException {
     IndexReader.unlock(spellIndex);
-    IndexWriter writer = new IndexWriter(spellIndex, null, true);
+    IndexWriter writer = new IndexWriter(spellIndex, null, false);
     writer.close();
   }

Now the IndexWriter won't create a new index every time you rebuild the spellchecker index. Didn't seem to have any issues with the small index I have.

Only issue I have now is that with a large index (not that large, 49k documents) I keep getting errors like the one below when initially building an index (and on every rebuild after that). This is with and without the patch above.

SEVERE: java.io.FileNotFoundException: /home/dsteiger/local/solr/cores/dsteiger/data/spell/_66.fnm (No such file or directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
        at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
        at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
        at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:531)
        at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:440)
        at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:204)
        at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:169)
        at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:155)
        at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:1970)
        at org.apache.lucene.index.IndexWriter.flushRamSegments(IndexWriter.java:1741)
        at org.apache.lucene.index.IndexWriter.flushRamSegments(IndexWriter.java:1733)
        at org.apache.lucene.index.IndexWriter.maybeFlushRamSegments(IndexWriter.java:1727)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1004)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:983)

Any ideas?

doug

Doug Steigerwald wrote:

It's in the index.  Can see it with a query: q=word:blackjack

And in luke:
<lst name="topTerms">
    <int name="blackjack">29</int>

The actual index data seems to disappear.

First rebuild:
$ ls  spell/
_2.cfs  segments.gen  segments_i

Second rebuild:
$ ls spell
segments_2z  segments.gen
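
The two listings above differ in whether anything other than segments metadata is left in the directory. A minimal sketch of that check in plain Java, assuming the directory layout shown above; the class name SpellIndexCheck and the method dataFiles are hypothetical, and treating "segments*" and "*.lock" as non-data files is an assumption rather than something taken from Lucene's API:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class SpellIndexCheck {

    /**
     * Lists the files in dir that look like actual index data.
     * Assumption: anything named "segments*" (segments_N, segments.gen)
     * or "*.lock" is metadata, everything else (e.g. _2.cfs) is data.
     */
    static List<String> dataFiles(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                String name = p.getFileName().toString();
                if (!name.startsWith("segments") && !name.endsWith(".lock")) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the relative "spell" directory used in the listings above.
        Path dir = Paths.get(args.length > 0 ? args[0] : "spell");
        List<String> files = dataFiles(dir);
        if (files.isEmpty()) {
            System.out.println("No index data files in " + dir);
        } else {
            System.out.println("Index data files in " + dir + ": " + files);
        }
    }
}
```

Run after each rebuild, this would report a data file for the first listing and none for the second.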

doug

Otis Gospodnetic wrote:
Do you trust the spellchecker 100% (not looking at its source now)? I'd peek at the index with Luke (Luke I trust :)) and see if that term is really there first.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Doug Steigerwald <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, January 16, 2008 2:56:35 PM
Subject: Spell checker index rebuild

Having another weird spell checker index issue.  Starting off from a
clean index and spell check index, I'll index everything in example/exampledocs. On the first rebuild of the spellchecker index, the query below says the word 'blackjack' exists in the
spellchecker index.  Great, no problems.

Rebuild it again and the word 'blackjack' does not exist any more.
http://localhost:8983/solr/core0/select?q=blackjack&qt=spellchecker&cmd=rebuild
Any ideas?  This is with a Solr trunk build from yesterday.

doug
