What that error is telling you is that you have an unanalyzed term that is, well, huge (i.e. > 32K). Is your "content" field by chance a "string" type? A "string" field indexes the entire value as a single term. It's very rare that a term > 32K is actually useful. You can't search on it except with, say, wildcards, there's no stemming, etc. So the first question is whether the "content" field is appropriately defined in your schema for your use case.
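
If you want a quick check of whether a given document really carries such a term, something like the following Python sketch may help. It is only an approximation: the function name `oversized_terms` and the whitespace split are my own assumptions for illustration, and a real Solr analyzer may tokenize differently.

```python
# Lucene's per-term limit in UTF-8 bytes; the 32766 figure is taken from
# the error message reported in this thread.
MAX_TERM_BYTES = 32766


def oversized_terms(value, tokenized=True, limit=MAX_TERM_BYTES):
    """Return (byte_length, preview) pairs for terms longer than `limit`.

    tokenized=False models a "string" field, where the whole value is one term.
    tokenized=True approximates a text field by splitting on whitespace
    (an assumption; the analyzer in your schema may split differently).
    """
    terms = value.split() if tokenized else [value]
    found = []
    for term in terms:
        size = len(term.encode("utf-8"))  # the limit applies to bytes, not characters
        if size > limit:
            found.append((size, term[:40]))  # keep only a short preview of the term
    return found


if __name__ == "__main__":
    # Hypothetical document: ordinary words followed by one unbroken 40000-character run.
    doc = "some ordinary words " + "A" * 40000
    for size, preview in oversized_terms(doc):
        print(size, preview)
```

Calling it with `tokenized=False` shows what happens when "content" is a "string" type: any value over the limit is reported as a single immense term, however ordinary the text inside it.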
If your content field is some kind of text-based field (i.e. solr.TextField), then the second issue may be that you just have wonky data coming in, say a base-64 encoded image or something scraped from somewhere. In that case you need to NOT index it. You can try LengthFilterFactory, see: https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory.

The term-length limit is a fundamental limitation enforced at the Lucene layer, so if that doesn't work, the only real solution is "don't do that". You'll have to intercept the doc and omit that data, perhaps write a custom update processor to throw out huge fields or the like.

Best,
Erick

On Fri, Aug 5, 2016 at 10:59 AM, Musshorn, Kris T CTR USARMY RDECOM ARL (US) <kris.t.musshorn....@mail.mil> wrote:
> CLASSIFICATION: UNCLASSIFIED
>
> I am trying to index from nutch 1.12 to SOLR 6.1.0.
> Got this error.
> java.lang.Exception:
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
> from server at http://localhost:8983/solr/ARLInside: Exception writing
> document id
> https://emcstage.arl.army.mil/inside/fellows/corner/research.vol.3.2/index.cfm
> to the index; possible analysis error: Document contains at least one
> immense term in field="content" (whose UTF8 encoding is longer than the max
> length 32766
>
> How to correct?
>
> Thanks,
> Kris
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
> Kris T. Musshorn
> FileMaker Developer - Contractor - Catapult Technology Inc.
> US Army Research Lab
> Aberdeen Proving Ground
> Application Management & Development Branch
> 410-278-7251
> kris.t.musshorn....@mail.mil
> ~~~~~~~~~~~~~~~~~~~~~~~~~~