You are tokenising with:
"<tokenizer class="solr.WhitespaceTokenizerFactory"/>"
Be careful: in your index analyzer the lowercase token filter comes before
the tokenizer. Best practice is to apply char filters first, then the
tokenizer, and finally the chain of token filters.
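For example, your index analyzer with the elements in the conventional order might look like this (a sketch based on the config you posted, not tested against your schema):

```xml
<!-- Sketch: tokenizer first, then token filters, matching the query analyzer -->
<analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="RefinedSoundex"
            inject="true"/>
</analyzer>
```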

Cheers

2015-06-26 13:27 GMT+01:00 Mike Thomsen <mikerthom...@gmail.com>:

> I tried creating a simplified new text field type that only did lower
> casing, and exact phrase matching worked this time. I'm not sure what the
> problem was; perhaps it was a case of copy-paste gone bad, because I could
> have sworn that I tried exact phrase matching against a simple text field
> with bad results. Thanks for the help. In case anyone sees this and wonders
> what the field I created looks like, here it is (with phonetic matching):
>
> <fieldType name="phonetics" class="solr.TextField"
> positionIncrementGap="100" multiValued="true">
>     <analyzer type="index">
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.PhoneticFilterFactory" encoder="RefinedSoundex"
> inject="true"/>
>     </analyzer>
>     <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.PhoneticFilterFactory" encoder="RefinedSoundex"
> inject="true"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>     </analyzer>
> </fieldType>
>
> On Fri, Jun 26, 2015 at 7:24 AM, Jack Krupansky <jack.krupan...@gmail.com>
> wrote:
>
> > Lucene, the underlying search engine library, imposes this 32K limit for
> > individual terms. Use tokenized text instead.
> >
> > -- Jack Krupansky
> >
> > On Thu, Jun 25, 2015 at 8:36 PM, Mike Thomsen <mikerthom...@gmail.com>
> > wrote:
> >
> > > I need to be able to do exact phrase searching on some documents that
> > > are a few hundred KB when treated as a single block of text. I'm on
> > > 4.10.4 and it complains when I try to put something larger than 32 KB
> > > in using a text field with the keyword tokenizer. Is there any way I
> > > can index, say, a 500 KB block of text like this?
> > >
> > > Thanks,
> > >
> > > Mike
> > >
> >
>
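Following Jack's advice above, a field type for the large documents that
tokenizes the text (rather than treating it as one huge keyword term, which
hits Lucene's ~32K per-term limit) might be sketched like this. The field and
type names are illustrative, not from the thread; exact phrase queries still
work on such a field because position data is kept per token:

```xml
<!-- Sketch: each token becomes its own term, so no single term
     exceeds Lucene's per-term size limit -->
<fieldType name="large_text" class="solr.TextField"
           positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
<field name="body" type="large_text" indexed="true" stored="true"/>
```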



-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
