Greetings!

I'm trying to set up a file-based spell checker. My sourceLocation is /usr/share/dict/linux.words, and my spellcheckIndexDir is ./data/spFile. buildOnStartup is set to true, and nothing in solr.log suggests any problem or error. However, the ./data/spFile/ directory contains only two files: segments_2, which is just 71 bytes, and a zero-byte write.lock. For a source dictionary of 480,000 words, I expected a good deal more in that directory, so something doesn't seem right.
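For anyone who wants to reproduce the comparison between the dictionary size and the index directory contents, here is a minimal Python sketch. It only inspects the file system and does not talk to Solr. The two paths are the ones from my configuration; the function names count_words and list_index_files are just illustrative helpers, and the sketch assumes the dictionary has one word per line.

```python
import os


def count_words(dict_path):
    """Count non-empty lines in a dictionary file (assumes one word per line)."""
    with open(dict_path, encoding="utf-8", errors="replace") as f:
        return sum(1 for line in f if line.strip())


def list_index_files(index_dir):
    """Return (filename, size_in_bytes) pairs for regular files in index_dir, sorted by name."""
    entries = []
    for name in sorted(os.listdir(index_dir)):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            entries.append((name, os.path.getsize(path)))
    return entries


if __name__ == "__main__":
    # Paths taken from the post; adjust them for your own install.
    dict_path = "/usr/share/dict/linux.words"
    index_dir = "./data/spFile"

    if os.path.isfile(dict_path):
        print(f"{dict_path}: {count_words(dict_path)} words")
    else:
        print(f"{dict_path}: not found")

    if os.path.isdir(index_dir):
        for name, size in list_index_files(index_dir):
            print(f"{name}\t{size} bytes")
    else:
        print(f"{index_dir}: not found")
```

If the word count is large but the directory holds only a tiny segments file and a lock file, that matches what I am seeing: the index directory was created, but little or nothing appears to have been written into it.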

Moreover, I ran a query on the word Fenbers, which isn't in linux.words, although several similar words are. The results were odd; the suggestions included the following:
fenber
f en be r
f e nb er
f en b er
f e n be r
f en b e r
f e nb e r
f e n b er
f e n b e r

I expected suggestions like fenders, embers, and fenberry instead. I also ran a query on Mark (which IS listed in linux.words) and got back two suggestions in a similar format. I tried changing settings such as the fieldType (from text_en to string) and the characterEncoding (from UTF-8 to ASCII), but nothing produced different results.

Can anyone offer suggestions as to what I'm doing wrong? I've been struggling with this for more than 40 hours now! I'm surprised my persistence has lasted this long!

Thanks,
Mark
