Greetings!
I'm attempting to use a file-based spell checker. My sourceLocation is
/usr/share/dict/linux.words, and my spellcheckIndexDir is set to
./data/spFile. buildOnStartup is set to true, and I see nothing in
solr.log to suggest a problem or error. However, my ./data/spFile/
directory contains only two files: a 71-byte segments_2 and a
zero-byte write.lock. For a source dictionary of 480,000 words, I was
expecting a good deal more in the ./data/spFile directory. Something
doesn't seem right with this.
Moreover, I ran a query on the word Fenbers, which isn't listed in the
linux.words file, although several similar words are. The results I
got back were odd; the suggestions included the following:
fenber
f en be r
f e nb er
f en b er
f e n be r
f en b e r
f e nb e r
f e n b er
f e n b e r
But I expected suggestions such as fenders, embers, and fenberry. I
also ran a query on Mark (which IS listed in linux.words) and got back
two suggestions in a similar format. I experimented with settings such
as changing the fieldType from text_en to string and the
characterEncoding from UTF-8 to ASCII, but none of these changes
produced different results.
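
For reference, the settings described above correspond roughly to a
spellchecker definition like the one below. This is only a sketch
reconstructed from the values mentioned in this message, not a paste of
the actual solrconfig.xml: the component name "spellcheck", the
spellchecker name "file", and the classname are assumptions, and the
real file may contain other spellchecker entries that are not shown.

```xml
<!-- Sketch only: names marked "assumed" are not taken from the real config -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent"> <!-- assumed name -->
  <lst name="spellchecker">
    <str name="name">file</str>                                  <!-- assumed name -->
    <str name="classname">solr.FileBasedSpellChecker</str>       <!-- assumed class -->
    <str name="sourceLocation">/usr/share/dict/linux.words</str> <!-- from the post -->
    <str name="characterEncoding">UTF-8</str>                    <!-- also tried ASCII -->
    <str name="fieldType">text_en</str>                          <!-- also tried string -->
    <str name="spellcheckIndexDir">./data/spFile</str>           <!-- from the post -->
    <str name="buildOnStartup">true</str>                        <!-- from the post -->
  </lst>
</searchComponent>
```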
Can anyone offer suggestions as to what I'm doing wrong? I've been
struggling with this for more than 40 hours now! I'm surprised my
persistence has lasted this long!
Thanks,
Mark