I have indexed the product catalog of an electronics e-commerce site. These are two typical documents from my collection:

"docs": [
  {
    "prezzo_vendita_d": 39.9,
    "codice_produttore_s": "DK00150020",
    "codice_s": "5.BAT.27407",
    "descrizione": "BATTERIA GO PRO HERO ",
    "barcode_interno_s": "185323000958",
    "categoria": "Batterie",
    "prezzo_acquisto_d": 16.12,
    "marchio": "GO PRO",
    "data_aggiornamento_dt": "2012-06-21T00:00:00Z",
    "id": "27407",
    "_version_": 1491274123542790100
  },
  {
    "codice_produttore_s": "DK0052043",
    "codice_s": "05.SP.42760",
    "id": "42760",
    "marchio": "SP GADGETS",
    "barcode_interno_s": "4028017520430",
    "prezzo_acquisto_d": 34.4,
    "data_aggiornamento_dt": "2014-11-04T00:00:00Z",
    "descrizione": "SP POS CASE GOPRO OLIVE LARGE",
    "prezzo_vendita_d": 59.95,
    "_version_": 1491274406746390500
  }
  ...]

I want my spellchecker to suggest "go pro" to users searching for "gopro" (without whitespace). I also want users searching for "go pro" to find "gopro" products.

Here is the relevant part of my configuration:

*schema.xml*

<field name="marchio" type="string" indexed="true" stored="true"/>
<field name="categoria" type="string" indexed="true" stored="true"/>
<field name="fornitore" type="string" indexed="true" stored="true"/>
<field name="descrizione" type="string" indexed="true" stored="true"/>

<field name="catch_all_original" type="text_general" indexed="true" stored="false" multiValued="true" />
<field name="catch_all" type="text_it" indexed="true" stored="false" multiValued="true" />

<copyField source="marchio" dest="catch_all" />
<copyField source="categoria" dest="catch_all" />
<copyField source="descrizione" dest="catch_all" />
<copyField source="fornitore" dest="catch_all" />

<copyField source="marchio" dest="catch_all_original" />
<copyField source="categoria" dest="catch_all_original" />
<copyField source="descrizione" dest="catch_all_original" />
<copyField source="fornitore" dest="catch_all_original" />

...
<fieldType name="text_it" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
    <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_it.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_it.txt" format="snowball" />
    <filter class="solr.ItalianLightStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
    <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_it.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_it.txt" format="snowball" />
    <filter class="solr.ItalianLightStemFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>

*solrconfig.xml*

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">catch_all</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.maxCollations">3</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

...

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_general</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">catch_all_original</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">catch_all_original</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
    <int name="minBreakLength">3</int>
  </lst>
</searchComponent>

*Is the spellchecker the right solution, or is this a case for something else, like the "More Like This" feature?*

Thank you

--
View this message in context: http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172.html
Sent from the Solr - User mailing list archive at Nabble.com.
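
To reproduce the two cases described above ("gopro" and "go pro"), both queries can be sent to the /select handler with the "default" and "wordbreak" dictionaries requested explicitly. The following is only a minimal sketch that builds the request URLs. The host, port and the core name `catalog` are assumptions (placeholders, not taken from the configuration above), and the function name `build_spellcheck_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed location of the Solr core; host, port and core name are placeholders.
SOLR_SELECT_URL = "http://localhost:8983/solr/catalog/select"


def build_spellcheck_url(user_query, base_url=SOLR_SELECT_URL):
    """Return a /select URL that asks both spellcheck dictionaries for suggestions."""
    params = [
        ("q", user_query),
        ("wt", "json"),
        ("spellcheck", "true"),
        # Repeating spellcheck.dictionary requests more than one dictionary.
        ("spellcheck.dictionary", "default"),
        ("spellcheck.dictionary", "wordbreak"),
        ("spellcheck.collate", "true"),
    ]
    # urlencode accepts a list of pairs, so repeated keys are preserved in order.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # The two cases from the question: the joined and the split spelling.
    for query in ("gopro", "go pro"):
        print(build_spellcheck_url(query))
```

Opening the printed URLs in a browser (or fetching them with any HTTP client) shows whether the spellcheck section of the response contains a suggestion or collation for each spelling.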