Re: indexing two words, searching single word

Clemens Wyss DEV Fri, 03 Aug 2018 04:45:56 -0700

>Because you probably are not looking for "andthe" kind of tokens
(unfortunately) I guess I am, as we don't know what people enter...


> a shingle plus regex to remove whitespace
sounds interesting. What would that filter chain look like? Would that be a 
type="index" analyzer?
I guess we could shingle after stop-word filtering, and I guess 
maxShingleSize="2" would suffice.
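
For illustration, the chain discussed above might look roughly like the following. This is an untested sketch, not a confirmed configuration: the field type name "text_shingled" and the file "stopwords.txt" are placeholders, and the query analyzer is an assumption. Note that when stop words are removed before shingling, ShingleFilterFactory inserts a filler token ("_" by default) at the gap, so the fillerToken attribute may need adjusting.

```xml
<!-- Hypothetical field type; the name and stopwords.txt are placeholders. -->
<fieldType name="text_shingled" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <!-- Emit single words plus two-word shingles, e.g. "sound", "stage", "sound stage". -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
    <!-- Strip whitespace inside shingles, so "sound stage" becomes "soundstage". -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement="" replace="all"/>
  </analyzer>
  <!-- Assumed query side: no shingling, so a query for "soundstage" stays one token. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a chain like this, indexing "sound stage" should produce the tokens "sound", "stage" and "soundstage", so both the two-word and the one-word query can match. The Analysis screen in the Solr admin UI is a quick way to check what the chain actually emits.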

-----Original Message-----
From: Alexandre Rafalovitch <[email protected]> 
Sent: Friday, 3 August 2018 13:33
To: solr-user <[email protected]>
Subject: Re: indexing two words, searching single word

But what is your generic problem, then? Because you probably are not looking for 
"andthe" kind of tokens.

However, a shingle filter plus a regex to remove whitespace can give you "anytwo 
wordstogether smooshed" tokens in the index.

Regards,
     Alex


On Fri, Aug 3, 2018, 7:19 AM Clemens Wyss DEV, <[email protected]> wrote:

> Hi Markus,
> thanks for the quick answer.
>
> "sound stage" was just an example. We are looking for a generic 
> solution ...
>
> Is it "ok" to apply an NGramFilter for query analysis?
> <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory" />
>         <filter class="solr.LowerCaseFilterFactory" />
>         <filter class="solr.NGramFilterFactory" minGramSize="3"
> maxGramSize="15" />
> </analyzer>
>
> I guess (besides the performance impact) this reduces the accuracy of 
> the search results?
>
> -Clemens
>
> -----Original Message-----
> From: Markus Jelsma <[email protected]>
> Sent: Friday, 3 August 2018 12:43
> To: [email protected]
> Subject: RE: indexing two words, searching single word
>
> Hello,
>
> If your case is English, you could use synonyms to work around the 
> problem of the few compound words in the language. However, if you 
> are dealing with a Germanic compounding language, the 
> HyphenationCompoundWordTokenFilter
> [1] or DictionaryCompoundWordTokenFilter is a better choice. The 
> former is much more flexible but has its drawbacks.
>
> Regards,
> Markus
>
>
> [1] https://lucene.apache.org/core/7_4_0/analyzers-common/org/apache/lucene/analysis/compound/HyphenationCompoundWordTokenFilterFactory.html
>
>
>
> -----Original message-----
> > From:Clemens Wyss DEV <[email protected]>
> > Sent: Friday 3rd August 2018 12:22
> > To: [email protected]
> > Subject: indexing two words, searching single word
> >
> > Sounds like a rather simple issue:
> > if I index "sound stage" and search for "soundstage" I get no hits
> >
> > What am I doing wrong
> > a) when indexing
> > b) when searching
> > ?
> >
> > Thx in advance
> > - Clemens
> >
>
