I seem to have fixed the problem below with the following patch against Lucene 2.2. Going to try the
Lucene 2.3 branch next.
Index: contrib/spellchecker/src/java/org/apache/lucene/search/spell/SpellChecker.java
===================================================================
--- contrib/spellchecker/src/java/org/apache/lucene/search/spell/SpellChecker.java (revision 612882)
+++ contrib/spellchecker/src/java/org/apache/lucene/search/spell/SpellChecker.java (working copy)
@@ -285,7 +285,7 @@
    */
   public void clearIndex() throws IOException {
     IndexReader.unlock(spellIndex);
-    IndexWriter writer = new IndexWriter(spellIndex, null, true);
+    IndexWriter writer = new IndexWriter(spellIndex, null, false);
     writer.close();
   }
Now the IndexWriter won't create a new index every time you rebuild the spellchecker index. I didn't
see any issues with my small index.
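Two things follow from the patch. With create=false, clearIndex() just opens and closes the existing index, so it no longer empties it. And in Lucene 2.x, opening an IndexWriter with create=false on a directory that does not yet hold an index fails with a FileNotFoundException (no segments file is found), so a first-time build still needs create=true. A common Lucene 2.x idiom is to pass `!IndexReader.indexExists(dir)` as the create flag. The JDK-only sketch below approximates that existence check by looking for a `segments` or `segments_N` file; the class and method names are hypothetical, and the file-name rule is an assumption about how Lucene 2.x recognises an index, not Lucene's own code.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper that approximates Lucene 2.x IndexReader.indexExists(dir).
 * Assumption: an index exists iff a commit file named "segments" (old name)
 * or "segments_N" is present. segments.gen on its own does not count.
 */
public class SpellIndexCreateFlag {

    /** True if the file name looks like a Lucene commit point. */
    static boolean isSegmentsFile(String name) {
        return name.equals("segments") || name.startsWith("segments_");
    }

    /** True if the directory exists and holds at least one commit file. */
    public static boolean indexExists(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path p : files) {
                if (isSegmentsFile(p.getFileName().toString())) {
                    return true;
                }
            }
        }
        return false;
    }

    /** The value one would pass as IndexWriter's "create" argument. */
    public static boolean createFlag(Path dir) throws IOException {
        return !indexExists(dir);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path; pass the real spell index directory as args[0].
        Path dir = Paths.get(args.length > 0 ? args[0] : "data/spell");
        System.out.println("create=" + createFlag(dir));
    }
}
```

Against the first-rebuild listing later in this thread (_2.cfs, segments.gen, segments_i) this would report create=false; against an empty or missing directory it reports create=true.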
The only issue I have now is with a large index (not that large, 49k documents): I keep getting
errors like the one below when initially building the spellchecker index, and on every rebuild after
that. This happens both with and without the patch above.
SEVERE: java.io.FileNotFoundException: /home/dsteiger/local/solr/cores/dsteiger/data/spell/_66.fnm (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
    at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
    at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:531)
    at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:440)
    at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:204)
    at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:169)
    at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:155)
    at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:1970)
    at org.apache.lucene.index.IndexWriter.flushRamSegments(IndexWriter.java:1741)
    at org.apache.lucene.index.IndexWriter.flushRamSegments(IndexWriter.java:1733)
    at org.apache.lucene.index.IndexWriter.maybeFlushRamSegments(IndexWriter.java:1727)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1004)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:983)
Any ideas?
doug
Doug Steigerwald wrote:
It's in the index. Can see it with a query: q=word:blackjack
And in Luke:
<lst name="topTerms">
<int name="blackjack">29</int>
The actual index data seems to disappear.
First rebuild:
$ ls spell/
_2.cfs segments.gen segments_i
Second rebuild:
$ ls spell
segments_2z segments.gen
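To capture the two `ls spell/` snapshots above automatically between rebuilds, a small JDK-only check could list the spell index directory and flag the state seen after the second rebuild: commit files (segments_N, segments.gen) with no per-segment data files (_N.cfs, _N.fnm, ...) beside them. The class name is hypothetical, and the rule that segment data files start with an underscore is an assumption based on the file names shown in this thread. A freshly created index that legitimately holds zero documents would also be flagged.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic: does the spell index have commit files but no segment data? */
public class SpellIndexSnapshot {

    /** Sorted file names in the directory, similar to `ls`. */
    public static List<String> list(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path p : files) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Assumption: per-segment files start with an underscore, e.g. _2.cfs or _66.fnm. */
    static boolean isSegmentData(String name) {
        return name.startsWith("_");
    }

    /** True when segments_N / segments.gen are present but no segment data files are. */
    public static boolean looksEmptied(List<String> names) {
        boolean hasCommit = false;
        boolean hasData = false;
        for (String n : names) {
            if (n.startsWith("segments")) {
                hasCommit = true;
            }
            if (isSegmentData(n)) {
                hasData = true;
            }
        }
        return hasCommit && !hasData;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path; pass the real spell index directory as args[0].
        Path dir = Paths.get(args.length > 0 ? args[0] : "data/spell");
        List<String> names = list(dir);
        System.out.println(names + (looksEmptied(names) ? "  <- no segment data" : ""));
    }
}
```

Run once after each rebuild: the first listing above (_2.cfs, segments.gen, segments_i) is not flagged, while the second (segments_2z, segments.gen) is.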
doug
Otis Gospodnetic wrote:
Do you trust the spellchecker 100%? (I'm not looking at its source now.)
I'd peek at the index with Luke (Luke I trust :)) and see if that term
is really there first.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
From: Doug Steigerwald <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, January 16, 2008 2:56:35 PM
Subject: Spell checker index rebuild
Having another weird spellchecker index issue. Starting off from a clean index and a clean
spellchecker index, I index everything in example/exampledocs. After the first rebuild of the
spellchecker index using the query below, the spellchecker says the word 'blackjack' exists in its
index. Great, no problems.
Rebuild it again and the word 'blackjack' does not exist any more.
http://localhost:8983/solr/core0/select?q=blackjack&qt=spellchecker&cmd=rebuild
Any ideas? This is with a Solr trunk build from yesterday.
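The reproduction above can be scripted so the two rebuilds are issued back to back. The sketch below builds the rebuild URL from this message, requests it a configurable number of times with java.net.HttpURLConnection, and prints each raw response so the spellchecker's answer for 'blackjack' can be compared across rebuilds. It assumes Solr is running at the host, port, and core shown in the URL; the class name is hypothetical, and it deliberately does not parse the response, since the spellchecker handler's output format is not shown in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

/** Hypothetical reproduction helper: rebuild the spellchecker index N times and dump responses. */
public class SpellRebuildRepro {

    /** Builds e.g. http://localhost:8983/solr/core0/select?q=blackjack&qt=spellchecker&cmd=rebuild */
    public static String rebuildUrl(String base, String core, String word)
            throws UnsupportedEncodingException {
        return base + "/" + core + "/select?q=" + URLEncoder.encode(word, "UTF-8")
                + "&qt=spellchecker&cmd=rebuild";
    }

    /** Performs a GET and returns the response body decoded as UTF-8. */
    static String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), "UTF-8");
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed local Solr instance and core name, taken from the URL in this message.
        String url = rebuildUrl("http://localhost:8983/solr", "core0", "blackjack");
        int rebuilds = args.length > 0 ? Integer.parseInt(args[0]) : 2;
        for (int i = 1; i <= rebuilds; i++) {
            System.out.println("--- rebuild " + i + " ---");
            System.out.println(get(url));
        }
    }
}
```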
doug