> Because you probably are not looking for "andthe" kind of tokens
(Unfortunately) I guess I am, as we don't know what people enter...
> a shingle plus regex to remove whitespace
Sounds interesting. What would that filter chain look like? Would that be a type="index" analyzer?
I guess we could shingle after stop-word filtering, and I guess maxShingleSize="2" would suffice.

-----Original Message-----
From: Alexandre Rafalovitch <arafa...@gmail.com>
Sent: Friday, 3 August 2018 13:33
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: indexing two words, searching single word

But what is your generic problem, then? Because you probably are not looking for "andthe" kind of tokens. However, a shingle plus regex to remove whitespace can give you "anytwo wordstogether smooshed" tokens in the index.

Regards,
   Alex

On Fri, Aug 3, 2018, 7:19 AM Clemens Wyss DEV, <clemens...@mysign.ch> wrote:

> Hi Markus,
> thanks for the quick answer.
>
> "sound stage" was just an example. We are looking for a generic
> solution ...
>
> Is it "ok" to apply an NGramFilter for query-analyzing?
> <analyzer type="query">
>   <tokenizer class="solr.WhitespaceTokenizerFactory" />
>   <filter class="solr.LowerCaseFilterFactory" />
>   <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15" />
> </analyzer>
>
> I guess (besides the performance impact) this reduces search-result
> accuracy?
>
> -Clemens
>
> -----Original Message-----
> From: Markus Jelsma <markus.jel...@openindex.io>
> Sent: Friday, 3 August 2018 12:43
> To: solr-user@lucene.apache.org
> Subject: RE: indexing two words, searching single word
>
> Hello,
>
> If your case is English you could use synonyms to work around the
> problem of the few compound words of the language. However, if you
> are dealing with a Germanic compound language, the
> HyphenationCompoundWordTokenFilter [1] or
> DictionaryCompoundWordTokenFilter are a better choice. The
> former is much more flexible but has its drawbacks.
>
> Regards,
> Markus
>
> https://lucene.apache.org/core/7_4_0/analyzers-common/org/apache/lucene/analysis/compound/HyphenationCompoundWordTokenFilterFactory.html
>
> > -----Original message-----
> > From: Clemens Wyss DEV <clemens...@mysign.ch>
> > Sent: Friday 3rd August 2018 12:22
> > To: solr-user@lucene.apache.org
> > Subject: indexing two words, searching single word
> >
> > Sounds like a rather simple issue:
> > if I index "sound stage" and search for "soundstage" I get no hits
> >
> > What am I doing wrong
> > a) when indexing
> > b) when searching
> > ?
> >
> > Thx in advance
> > - Clemens
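
For reference, the "shingle plus regex to remove whitespace" idea raised in this thread could look roughly like the field type below. This is only a sketch under assumptions: the field type name text_joined_shingles and the stopwords.txt file are made up for illustration, and the stop filter placement follows the "shingle after stop-word filtering" suggestion rather than anything confirmed by the list. It has not been tested against a particular Solr version.

```xml
<!-- Sketch only: the name "text_joined_shingles" and the file
     stopwords.txt are assumptions, not taken from the thread. -->
<fieldType name="text_joined_shingles" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- Stop words are removed before shingling, as proposed in the thread.
         ShingleFilter puts a filler token ("_" by default) where a stop
         word was dropped, so shingles across such a gap contain it. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <!-- Emit word pairs such as "sound stage" and keep the single words too. -->
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="2"
            outputUnigrams="true" />
    <!-- Strip the separator so "sound stage" is indexed as "soundstage". -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement="" replace="all" />
  </analyzer>
  <analyzer type="query">
    <!-- No shingling at query time: the query term is matched as typed. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
```

With a chain like this, indexing "sound stage" should yield the tokens "sound", "soundstage" and "stage", so a query for "soundstage" can match. The reverse case (indexing "soundstage" and searching for "sound stage") is not covered by this sketch, and every adjacent word pair is indexed, including "andthe"-style tokens when the words involved are not in the stop list.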