The query parser will split on whitespace. I'm not sure how I can use the shingle filter at query time, or what the use cases for it are. For example, if my fieldType looks like this:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" tokenSeparator=""/>
  </analyzer>
</fieldType>

and I have a document that has "my babysitter is terrific" in the content_t field, a query such as:

http://localhost:8983/solr/collection_name/select?q={!lucene}content_t:(the baby sitter was here)

won't return the document. I was hoping I'd get tokens like "the thebaby baby babysitter sitter sitterwas ..." when querying.

On Sun, 5 Mar 2017 at 23:59 Ryan Josal <r...@josal.com> wrote:

> I thought new versions of solr didn't split on whitespace at the query
> parser anymore, so this should work?
>
> That being said, I think I remember it having a problem coming after a
> synonym filter. IIRC, if your input is "Foo Bar" and you have a synonym
> "foo <=> baz" you would get foobaz bazbar instead of foobar and bazbar. I
> wrote a custom shingler to account for that.
>
> Ryan
>
> On Sun, Mar 5, 2017 at 02:48 Markus Jelsma <markus.jel...@openindex.io>
> wrote:
>
> > Hello - we use it for text classification and online near-duplicate
> > document detection/filtering. Using shingles means you want to consider
> > order in the text. It is analogous to using bigrams and trigrams when doing
> > language detection, you cannot distinguish between Danish and Norwegian
> > solely on single characters.
> >
> > Markus
> >
> > -----Original message-----
> > > From:Ryan Yacyshyn <ryan.yacys...@gmail.com>
> > > Sent: Sunday 5th March 2017 5:57
> > > To: solr-user@lucene.apache.org
> > > Subject: Use case for the Shingle Filter
> > >
> > > Hi everyone,
> > >
> > > I was thinking of using the Shingle Filter to help solve an issue I'm
> > > facing. I can see this working in the analysis panel in the Solr admin, but
> > > not when I make my queries.
> > >
> > > I find out it's because of the query parser splitting up the tokens on
> > > white space before passing them along.
> > >
> > > This made me wonder what a practical use case can be, for using the shingle
> > > filter?
> > >
> > > Any enlightenment on this would be much appreciated!
> > >
> > > Thanks,
> > > Ryan
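
To make the behaviour discussed in this thread concrete, here is a minimal Python sketch of word shingling. It is not Solr's implementation: the `shingles` function and its parameters are hypothetical names chosen for illustration, a plain whitespace split stands in for the StandardTokenizer, and the settings (bigrams only, unigrams kept, empty separator) are assumed to mirror the query analyzer above. It contrasts shingling the whole query string with shingling each whitespace-separated chunk on its own, which is roughly what happens when the query parser splits first.

```python
def shingles(tokens, size=2, separator="", output_unigrams=True):
    """Return unigrams interleaved with word n-grams of the given size.

    Illustrative only; the parameter names are assumptions, not Solr's API.
    """
    out = []
    for i, tok in enumerate(tokens):
        if output_unigrams:
            out.append(tok)
        # Emit a shingle only when enough tokens remain to the right.
        if i + size <= len(tokens):
            out.append(separator.join(tokens[i:i + size]))
    return out


query = "the baby sitter was here"

# Whole query analyzed at once: adjacent words can be joined.
whole = shingles(query.lower().split())

# Query split on whitespace first: each chunk is analyzed alone,
# so the shingler never sees two adjacent words.
split_first = [t for chunk in query.lower().split() for t in shingles([chunk])]

# Tokens indexed for the document (the index analyzer has no shingle filter).
indexed = "my babysitter is terrific".lower().split()

print(whole)
print(split_first)
print(set(whole) & set(indexed))        # tokens that could match
print(set(split_first) & set(indexed))  # nothing to match on
```

In this sketch only the whole-string analysis produces "babysitter", which is the token the document contains; once the input is split on whitespace, each chunk passes through unchanged and no token matches.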