But what is your generic problem, then? You are probably not looking for "andthe" kind of tokens.
However, a shingle filter plus a regex that removes whitespace can give you "anytwo wordstogether smooshed" tokens in the index.

Regards,
   Alex

On Fri, Aug 3, 2018, 7:19 AM Clemens Wyss DEV, <clemens...@mysign.ch> wrote:

> Hi Markus,
> thanks for the quick answer.
>
> "sound stage" was just an example. We are looking for a generic solution
> ...
>
> Is it "ok" to apply an NGramFilter for query-analyzing?
> <analyzer type="query">
>   <tokenizer class="solr.WhitespaceTokenizerFactory" />
>   <filter class="solr.LowerCaseFilterFactory" />
>   <filter class="solr.NGramFilterFactory" minGramSize="3"
>           maxGramSize="15" />
> </analyzer>
>
> I guess (besides the performance impact) this reduces search results
> accuracy?
>
> -Clemens
>
> -----Original Message-----
> From: Markus Jelsma <markus.jel...@openindex.io>
> Sent: Friday, August 3, 2018 12:43
> To: solr-user@lucene.apache.org
> Subject: RE: indexing two words, searching single word
>
> Hello,
>
> If your case is English, you could use synonyms to work around the problem
> of the few compound words of the language. However, should you be dealing
> with a Germanic compound language, the HyphenationCompoundWordTokenFilter
> [1] or DictionaryCompoundWordTokenFilter are a better choice. The former is
> much more flexible but has its drawbacks.
>
> Regards,
> Markus
>
> [1]
> https://lucene.apache.org/core/7_4_0/analyzers-common/org/apache/lucene/analysis/compound/HyphenationCompoundWordTokenFilterFactory.html
>
> > -----Original message-----
> > From: Clemens Wyss DEV <clemens...@mysign.ch>
> > Sent: Friday 3rd August 2018 12:22
> > To: solr-user@lucene.apache.org
> > Subject: indexing two words, searching single word
> >
> > Sounds like a rather simple issue:
> > if I index "sound stage" and search for "soundstage" I get no hits
> >
> > What am I doing wrong
> > a) when indexing
> > b) when searching
> > ?
> >
> > Thx in advance
> > - Clemens
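
For reference, the shingle-plus-regex idea from the top of this reply could look roughly like the field type below. This is only a sketch under assumptions: the field type name "text_shingle_joined" is made up for illustration, the shingle size of 2 is a guess at what is needed, and the query analyzer is left plain so that a query for "soundstage" stays a single token.

```xml
<!-- Hypothetical field type; the name and the shingle sizes are assumptions. -->
<fieldType name="text_shingle_joined" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- Keep the single words and also emit two-word shingles such as "sound stage". -->
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="2"
            outputUnigrams="true" />
    <!-- Strip the whitespace inside each shingle, so "sound stage" becomes "soundstage". -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement=""
            replace="all" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
```

With something like this, indexing "sound stage" should leave "sound", "stage" and "soundstage" in the index, so a search for "soundstage" can match. The cost is that every adjacent word pair gets joined, which is where the "andthe" kind of tokens come from, and the index grows accordingly.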