And, as you suggested, use the stop-word filter before the shingles...

On Fri, Aug 3, 2018 at 8:10 AM, Clemens Wyss DEV <clemens...@mysign.ch> wrote:
> <analyzer type="index">
>   <tokenizer class="solr.WhitespaceTokenizerFactory" />
>   <filter class="solr.LowerCaseFilterFactory" />
>   <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>           outputUnigrams="true" tokenSeparator=""/> <!-- here we go! -->
> </analyzer>
>
> seems to "work"
>
> -----Original Message-----
> From: Clemens Wyss DEV <clemens...@mysign.ch>
> Sent: Friday, 3 August 2018 13:46
> To: solr-user@lucene.apache.org
> Subject: RE: indexing two words, searching single word
>
> > Because you probably are not looking for "andthe" kind of tokens
> (unfortunately) I guess I am, as we don't know what people enter...
>
> > a shingle plus regex to remove whitespace
> sounds interesting. What would that filter chain look like? Would that be
> a type="index" analyzer?
> I guess we could shingle after stop-word filtering, and I guess
> maxShingleSize="2" would suffice.
>
> -----Original Message-----
> From: Alexandre Rafalovitch <arafa...@gmail.com>
> Sent: Friday, 3 August 2018 13:33
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: indexing two words, searching single word
>
> But what is your generic problem, then? Because you probably are not
> looking for "andthe" kind of tokens.
>
> However, a shingle plus regex to remove whitespace can give you "anytwo
> wordstogether smooshed" tokens in the index.
>
> Regards,
>   Alex
>
> On Fri, Aug 3, 2018, 7:19 AM Clemens Wyss DEV, <clemens...@mysign.ch>
> wrote:
>
> > Hi Markus,
> > thanks for the quick answer.
> >
> > "sound stage" was just an example. We are looking for a generic
> > solution ...
> >
> > Is it "ok" to apply an NGramFilter for query-analyzing?
> > <analyzer type="query">
> >   <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >   <filter class="solr.LowerCaseFilterFactory" />
> >   <filter class="solr.NGramFilterFactory" minGramSize="3"
> >           maxGramSize="15" />
> > </analyzer>
> >
> > I guess (besides the performance impact) this reduces the accuracy of
> > the search results?
> >
> > -Clemens
> >
> > -----Original Message-----
> > From: Markus Jelsma <markus.jel...@openindex.io>
> > Sent: Friday, 3 August 2018 12:43
> > To: solr-user@lucene.apache.org
> > Subject: RE: indexing two words, searching single word
> >
> > Hello,
> >
> > If your case is English, you could use synonyms to work around the
> > problem of the few compound words of the language. However, if you are
> > dealing with a Germanic compounding language, the
> > HyphenationCompoundWordTokenFilter [1] or
> > DictionaryCompoundWordTokenFilter is a better choice. The
> > former is much more flexible but has its drawbacks.
> >
> > Regards,
> > Markus
> >
> > [1]
> > https://lucene.apache.org/core/7_4_0/analyzers-common/org/apache/lucene/analysis/compound/HyphenationCompoundWordTokenFilterFactory.html
> >
> > > -----Original message-----
> > > From: Clemens Wyss DEV <clemens...@mysign.ch>
> > > Sent: Friday 3rd August 2018 12:22
> > > To: solr-user@lucene.apache.org
> > > Subject: indexing two words, searching single word
> > >
> > > Sounds like a rather simple issue:
> > > if I index "sound stage" and search for "soundstage" I get no hits
> > >
> > > What am I doing wrong
> > > a) when indexing
> > > b) when searching
> > > ?
> > >
> > > Thx in advance
> > > - Clemens
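
For reference, the idea discussed in this thread (stop-word filtering first, then shingling with an empty separator) could be sketched as the field type below. This is only a sketch under stated assumptions, not a configuration anyone in the thread posted: the field type name `text_shingled`, the `solr.StopFilterFactory` entry, the `stopwords.txt` file name, and the plain query analyzer are all added for illustration. The tokenizer, lowercase filter, and shingle settings are the ones quoted above.

```xml
<!-- Sketch only: "text_shingled" and "stopwords.txt" are illustrative assumptions -->
<fieldType name="text_shingled" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- Assumption: remove stop words before shingling, as suggested in the thread -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <!-- Settings from the thread: "sound stage" also yields the token "soundstage" -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
            outputUnigrams="true" tokenSeparator="" />
  </analyzer>
  <!-- Assumption: no shingles or n-grams at query time, so "soundstage" stays one token -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
```

One thing to check with such a chain: a removed stop word leaves a position gap, and the shingle filter fills gaps with a filler token ("_" by default, adjustable via the `fillerToken` attribute), so shingles next to a removed stop word may contain an underscore. The Analysis screen in the Solr admin UI shows the tokens that are actually produced.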