I'm in the process of incorporating Solr spellchecking in our product. For that, I've created a new field:

    <field name="spell" type="spell" indexed="true" stored="true" required="false" multiValued="false"/>
    <copyField source="name" dest="spell" maxChars="30000" />

And in the fieldType definitions:

    <fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>

Then I feed the names of products into the corresponding core. They can contain a lot of words (examples):

    door lock rear left
    Door brake, door in front + rear fitting

However, the names get pretty long, and in the source data they have been truncated. This sometimes leaves parts of words at the end:

    The water pump can evacuate some coo

I have created a spellcheck component, feeding off the `spell` field defined earlier.

Now for the problem. Sometimes, when I look up a slightly misspelled word, I get results I do not expect. Example request:

    http://solr.url:8983/solr/en/spell?q=coole

This is (part of) the response:

    <str name="word">cooler</str><int name="freq">21</int>
    <str name="word">coo le</str><int name="freq">2</int>
    <str name="word">cable</str><int name="freq">334</int>
    <str name="word">co o le</str><int name="freq">4</int>
    [...]

As you can see, the misspelled `coole` should have been `cooler`, and that is the first suggestion. However, the second and fourth suggestions baffle me. After a bit of research, I found that they are multiple indexed terms joined together. As I described above, `coo` was part of a name that was truncated. I found `co` the same way, and the source data contains a small number of `o` characters on their own (product number names).

Now, my questions are:

- Why is Solr suggesting multiple words pasted together when spellchecking a single word?
- Is there a way to prevent Solr from pasting together word parts to form suggestions?
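
For context, a spellcheck component that reads from the `spell` field is usually declared in `solrconfig.xml` along the following lines. This is a minimal sketch and not the exact configuration in use: the component name `spellcheck`, the dictionary name `default`, and the choice of `solr.DirectSolrSpellChecker` are assumptions made for illustration. Only the `spell` field and the `/spell` handler path are taken from the description above.

```xml
<!-- Hypothetical sketch; names other than "spell" and "/spell" are assumed. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- Analyze the query with the same whitespace-only field type -->
  <str name="queryAnalyzerFieldType">spell</str>

  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- Suggestions are built from the terms indexed in this field -->
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="spellcheck">on</str>
    <!-- Every dictionary listed here contributes to the suggestions -->
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

When comparing against a real setup, the things to check are how many `spellchecker` entries the component defines and which `spellcheck.dictionary` values the handler enables, since each enabled dictionary adds its own suggestions to the response.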