And as you suggested, apply the stop-word filter before the shingles...

On Fri, Aug 3, 2018 at 8:10 AM, Clemens Wyss DEV <[email protected]> wrote:

> <analyzer type="index">
>   <tokenizer class="solr.WhitespaceTokenizerFactory" />
>   <filter class="solr.LowerCaseFilterFactory" />
>   <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>           outputUnigrams="true" tokenSeparator=""/> <!-- here we go! -->
> </analyzer>
>
> seems to "work"
>
> -----Original Message-----
> From: Clemens Wyss DEV <[email protected]>
> Sent: Friday, August 3, 2018 13:46
> To: [email protected]
> Subject: RE: indexing two words, searching single word
>
> > Because you probably are not looking for "andthe" kind of tokens
>
> (unfortunately) I guess I am, as we don't know what people enter...
>
> > a shingle plus regex to remove whitespace
>
> sounds interesting. What would that filter chain look like? Would that be
> a type="index" analyzer?
> I guess we could shingle after stop-word filtering, and I guess
> maxShingleSize="2" would suffice.
>
> -----Original Message-----
> From: Alexandre Rafalovitch <[email protected]>
> Sent: Friday, August 3, 2018 13:33
> To: solr-user <[email protected]>
> Subject: Re: indexing two words, searching single word
>
> But what is your generic problem, then? Because you probably are not
> looking for "andthe" kind of tokens.
>
> However, a shingle plus regex to remove whitespace can give you "anytwo
> wordstogether smooshed" tokens in the index.
>
> Regards,
>     Alex
>
> On Fri, Aug 3, 2018, 7:19 AM Clemens Wyss DEV, <[email protected]>
> wrote:
>
> > Hi Markus,
> > thanks for the quick answer.
> >
> > "sound stage" was just an example. We are looking for a generic
> > solution ...
> >
> > Is it "ok" to apply an NGramFilter for query analysis?
> > <analyzer type="query">
> >   <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >   <filter class="solr.LowerCaseFilterFactory" />
> >   <filter class="solr.NGramFilterFactory" minGramSize="3"
> >           maxGramSize="15" />
> > </analyzer>
> >
> > I guess (besides the performance impact) this reduces search result
> > accuracy?
> >
> > -Clemens
> >
> > -----Original Message-----
> > From: Markus Jelsma <[email protected]>
> > Sent: Friday, August 3, 2018 12:43
> > To: [email protected]
> > Subject: RE: indexing two words, searching single word
> >
> > Hello,
> >
> > If your case is English, you could use synonyms to work around the
> > problem of the few compound words of the language. However, if you
> > are dealing with a Germanic compound language, the
> > HyphenationCompoundWordTokenFilter [1] or
> > DictionaryCompoundWordTokenFilter is a better choice. The
> > former is much more flexible but has its drawbacks.
> >
> > Regards,
> > Markus
> >
> > [1]
> > https://lucene.apache.org/core/7_4_0/analyzers-common/org/apache/lucene/analysis/compound/HyphenationCompoundWordTokenFilterFactory.html
> >
> > -----Original message-----
> > > From: Clemens Wyss DEV <[email protected]>
> > > Sent: Friday 3rd August 2018 12:22
> > > To: [email protected]
> > > Subject: indexing two words, searching single word
> > >
> > > Sounds like a rather simple issue:
> > > if I index "sound stage" and search for "soundstage" I get no hits
> > >
> > > What am I doing wrong
> > > a) when indexing
> > > b) when searching
> > > ?
> > >
> > > Thx in advance
> > > - Clemens
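
For reference, the "stop words before shingles" ordering mentioned at the top of this thread could be sketched roughly as below. This is only a sketch: it reuses the index analyzer quoted above and adds a StopFilterFactory in front of the shingle filter. The stopwords.txt file name is an assumption, not something taken from this thread.

```xml
<!-- Sketch only: stopwords.txt is an assumed file name -->
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory" />
  <filter class="solr.LowerCaseFilterFactory" />
  <!-- Remove stop words first so they are not glued into shingles -->
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
  <!-- Emit single words plus adjacent pairs joined without a separator,
       e.g. "sound stage" -> sound, soundstage, stage -->
  <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
          outputUnigrams="true" tokenSeparator="" />
</analyzer>
```

One thing worth checking on the Analysis screen: where a stop word was removed, the shingle filter may put a filler token (by default "_") into the pair. As far as I know ShingleFilterFactory accepts a fillerToken attribute to change that, but please verify against your Solr version.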
