What that error is telling you is that you have an unanalyzed term
that is, well, huge (i.e. > 32K). Is your "content" field by chance a
"string" type? It's very rare that a term > 32K is actually useful.
You can't search on it except with, say, wildcards, there's no stemming,
etc. So the first question is whether the "content" field is
appropriately defined in your schema for your use case.

If your content field is some kind of text-based field (i.e.
solr.TextField), then the second issue may be that you just have wonky
data coming in, say a base-64 encoded image or something scraped from
somewhere. In that case you need to NOT index it. You can try
LengthFilterFactory, see:
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory.
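For what it's worth, a LengthFilterFactory setup might look roughly like
the fragment below. Treat it as a sketch only: the fieldType name
"text_capped", the choice of tokenizer and the min/max values are
placeholders I made up, not something taken from your schema. Note that
the filter counts characters, not UTF-8 bytes, so pick a max well under
the 32766-byte limit.

```xml
<!-- Hypothetical field type: name, tokenizer and limits are assumptions -->
<fieldType name="text_capped" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Drop any token shorter than min or longer than max characters -->
    <filter class="solr.LengthFilterFactory" min="1" max="255"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="content" type="text_capped" indexed="true" stored="true"/>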

This is a fundamental limitation enforced at the Lucene layer, so if
that doesn't work, the only real solution is "don't do that". You'll
have to intercept the doc and omit that data, perhaps write a custom
update processor to throw out huge fields or the like.
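
If you do end up intercepting the doc, the check itself is small. Below is
a minimal, self-contained sketch of the logic only, in plain Java with no
Solr classes. The class and method names (HugeFieldGuard, dropHugeFields,
hasImmenseToken) are made up for illustration, and the document is modeled
as a simple Map, which is an assumption. In a real custom update processor
the same test would presumably live in the processAdd method of an
UpdateRequestProcessor subclass.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr: shows only the size checks.
final class HugeFieldGuard {
    // Max UTF-8 length of a single term, as reported in the error message.
    static final int MAX_TERM_BYTES = 32766;

    // Length of the value once encoded as UTF-8.
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    // For unanalyzed ("string") fields the whole value is one term, so
    // return a copy of the doc without any field whose value is too big.
    static Map<String, String> dropHugeFields(Map<String, String> doc) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : doc.entrySet()) {
            if (utf8Length(field.getValue()) <= MAX_TERM_BYTES) {
                kept.put(field.getKey(), field.getValue());
            }
        }
        return kept;
    }

    // For text fields, approximate tokenization by splitting on whitespace
    // and report whether any single token is over the limit.
    static boolean hasImmenseToken(String text) {
        for (String token : text.split("\\s+")) {
            if (utf8Length(token) > MAX_TERM_BYTES) {
                return true;
            }
        }
        return false;
    }
}
```

The whitespace split is only a rough stand-in for whatever your analysis
chain really does, so use it to spot suspicious docs rather than as an
exact predictor of what Lucene will reject.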

Best,
Erick


On Fri, Aug 5, 2016 at 10:59 AM, Musshorn, Kris T CTR USARMY RDECOM
ARL (US) <kris.t.musshorn....@mail.mil> wrote:
> CLASSIFICATION: UNCLASSIFIED
>
> I am trying to index from nutch 1.12 to SOLR 6.1.0.
> Got this error.
> java.lang.Exception: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
> from server at http://localhost:8983/solr/ARLInside: Exception writing 
> document id 
> https://emcstage.arl.army.mil/inside/fellows/corner/research.vol.3.2/index.cfm
>  to the index; possible analysis error: Document contains at least one 
> immense term in field="content" (whose UTF8 encoding is longer than the max 
> length 32766
>
> How to correct?
>
> Thanks,
> Kris
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
> Kris T. Musshorn
> FileMaker Developer - Contractor - Catapult Technology Inc.
> US Army Research Lab
> Aberdeen Proving Ground
> Application Management & Development Branch
> 410-278-7251
> kris.t.musshorn....@mail.mil
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
>
> CLASSIFICATION: UNCLASSIFIED
