I tried creating a simplified new text field type that only did lowercasing, and exact phrase matching worked this time. I'm not sure what the problem was; perhaps it was a case of copypasta gone bad, because I could have sworn that I tried exact phrase matching against a simple text field with bad results. Thanks for the help. In case anyone sees this and wonders what the field I created looks like, here it is (with phonetic matching):
<fieldType name="phonetics" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="RefinedSoundex" inject="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="RefinedSoundex" inject="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

(Note: the tokenizer has to come first in the index analyzer; filters run after it.)

On Fri, Jun 26, 2015 at 7:24 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
> Lucene, the underlying search engine library, imposes this 32K limit for
> individual terms. Use tokenized text instead.
>
> -- Jack Krupansky
>
> On Thu, Jun 25, 2015 at 8:36 PM, Mike Thomsen <mikerthom...@gmail.com> wrote:
> > I need to be able to do exact phrase searching on some documents that
> > are a few hundred kb when treated as a single block of text. I'm on
> > 4.10.4 and it complains when I try to put something larger than 32kb
> > in using a textfield with the keyword tokenizer as the tokenizer. Is
> > there any way I can index, say, a 500kb block of text like this?
> >
> > Thanks,
> >
> > Mike
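
P.S. In case it helps anyone else who hits the 32K term error Jack mentioned above, here's a minimal sketch of the tokenized-text approach without the phonetic filters, assuming a schema.xml-style config (the text_exact and body names are just for illustration). Since the whitespace tokenizer emits one term per word, no single term comes anywhere near Lucene's per-term limit even on a 500kb document, and a quoted query still gives exact phrase matching (punctuation attached to tokens aside).

<!-- Hypothetical field type for exact phrase matching on large text.
     No keyword tokenizer, so individual terms stay small; phrase
     queries recover the "exact phrase" behavior via term positions. -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using it -->
<field name="body" type="text_exact" indexed="true" stored="true"/>

A quoted phrase query such as q=body:"a few hundred kb" then matches the exact word sequence without ever indexing the whole document as a single term.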