I tried creating a simplified new text field type that only did
lowercasing, and exact phrase matching worked this time. I'm not sure what
the problem was. Perhaps it was a case of copypasta gone bad, because I
could have sworn I had tried exact phrase matching against a simple text
field with bad results. Thanks for the help. In case anyone sees this and
wonders what the field I created looks like, here it is (with phonetic
matching):

<fieldType name="phonetics" class="solr.TextField"
positionIncrementGap="100" multiValued="true">
    <analyzer type="index">
        <filter class="solr.LowerCaseFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.PhoneticFilterFactory" encoder="RefinedSoundex"
inject="true"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PhoneticFilterFactory" encoder="RefinedSoundex"
inject="true"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>
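
For comparison, the simplified lowercase-only field type I mentioned above
(the one where exact phrase matching worked) was along these lines. I'm
reconstructing it here, so the type name is just illustrative:

<fieldType name="text_lower" class="solr.TextField"
    positionIncrementGap="100">
    <analyzer>
        <!-- split on whitespace, then lowercase; nothing else -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

Against a field of that type, a phrase query like q=body:"some exact
phrase" (body being a hypothetical field of that type) behaved as expected,
and because each whitespace-delimited token is indexed as its own term,
even very large documents stay under the 32K per-term limit Jack mentions
below.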

On Fri, Jun 26, 2015 at 7:24 AM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> Lucene, the underlying search engine library, imposes this 32K limit for
> individual terms. Use tokenized text instead.
>
> -- Jack Krupansky
>
> On Thu, Jun 25, 2015 at 8:36 PM, Mike Thomsen <mikerthom...@gmail.com>
> wrote:
>
> > I need to be able to do exact phrase searching on some documents that
> > are a few hundred kb when treated as a single block of text. I'm on
> > 4.10.4 and it complains when I try to put something larger than 32kb
> > in using a textfield with the keyword tokenizer as the tokenizer. Is
> > there any way I can index, say, a 500kb block of text like this?
> >
> > Thanks,
> >
> > Mike
> >
>
