Re: Use case for the Shingle Filter

Ryan Yacyshyn Mon, 06 Mar 2017 00:58:27 -0800

The query parser will split on whitespace, so I'm not sure how I can use the
shingle filter in my queries, or what the use cases for it are. For example,
if my fieldType looks like this:


<fieldType name="text_general" class="solr.TextField"
positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" tokenSeparator=""/>
  </analyzer>
</fieldType>

and I have a document that has "my babysitter is terrific" in the content_t
field, a query such as:

http://localhost:8983/solr/collection_name/select?q={!lucene}content_t:(the baby sitter was here)

won't return the document. I was hoping I'd get tokens like "the
thebaby baby babysitter sitter sitterwas ..." when querying.
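
A rough Python sketch of the difference between the tokens hoped for and the
tokens produced when the parser splits on whitespace first. This is not Solr
code: `shingle` and `analyze` are hypothetical helpers that only imitate a
two-word shingle filter with an empty separator, and the sketch assumes the
filter keeps unigrams and builds shingles of exactly two tokens.

```python
def shingle(tokens, max_size=2, separator="", output_unigrams=True):
    """Emit each token followed by the shingles that start at that token."""
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for size in range(2, max_size + 1):
            if i + size <= len(tokens):
                out.append(separator.join(tokens[i:i + size]))
    return out


def analyze(text):
    """Hypothetical stand-in for the query analyzer: split, lowercase, shingle."""
    return shingle(text.lower().split())


query = "the baby sitter was here"

# Whole query string analyzed in one pass: shingles span adjacent words.
whole = analyze(query)

# Query parser splits on whitespace first: each word is analyzed on its own,
# so the shingle step never sees two tokens together.
split_first = [tok for word in query.split() for tok in analyze(word)]

print(whole)
print(split_first)
```

In the first case "babysitter" is among the tokens and could match the indexed
document; in the second case only the single words survive, which would
explain why the query returns nothing.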





On Sun, 5 Mar 2017 at 23:59 Ryan Josal <r...@josal.com> wrote:

> I thought new versions of solr didn't split on whitespace at the query
> parser anymore, so this should work?
>
> That being said, I think I remember it having a problem coming after a
> synonym filter.  IIRC, if your input is "Foo Bar" and you have a synonym
> "foo <=> baz" you would get foobaz bazbar instead of foobar and bazbar.  I
> wrote a custom shingler to account for that.
>
> Ryan
>
> On Sun, Mar 5, 2017 at 02:48 Markus Jelsma <markus.jel...@openindex.io>
> wrote:
>
> > Hello - we use it for text classification and online near-duplicate
> > document detection/filtering. Using shingles means you want to consider
> > order in the text. It is analogous to using bigrams and trigrams when
> > doing language detection: you cannot distinguish between Danish and
> > Norwegian solely on single characters.
> >
> > Markus
> >
> >
> >
> > -----Original message-----
> > > From:Ryan Yacyshyn <ryan.yacys...@gmail.com>
> > > Sent: Sunday 5th March 2017 5:57
> > > To: solr-user@lucene.apache.org
> > > Subject: Use case for the Shingle Filter
> > >
> > > Hi everyone,
> > >
> > > I was thinking of using the Shingle Filter to help solve an issue I'm
> > > facing. I can see this working in the analysis panel in the Solr admin,
> > > but not when I make my queries.
> > >
> > > I found out it's because the query parser splits the tokens on
> > > whitespace before passing them along.
> > >
> > > This made me wonder what a practical use case for the shingle filter
> > > might be.
> > >
> > > Any enlightenment on this would be much appreciated!
> > >
> > > Thanks,
> > > Ryan
> > >
> >
>
