Hi, I would like to check, is the string bytes must be at most 32766 characters in length?
I'm trying to do a copyField of my rich-text documents content to a field with fieldType=string to try out my getting distinct result for content, as there are several documents with the exact same content, and we only want to list one of them during searching. However, I get the following errors in some of the documents when I tried to index them with the copyField. Some of my documents are quite large in size, and there is a possibility that it exceed 32766 characters. Is there any other ways to overcome this problem? org.apache.solr.common.SolrException: Exception writing document id collection1_polymer100 to the index; possible analysis error. at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:167) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955) at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:706) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:104) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:207) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:122) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:127) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:235) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:497) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="signature" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[32, 60, 112, 62, 60, 98, 114, 62, 32, 32, 32, 60, 98, 114, 62, 56, 48, 56, 32, 72, 97, 110, 100, 98, 111, 111, 107, 32, 111, 102]...', original message: bytes can be at most 32766 in length; got 49960 at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:670) at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344) at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300) at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:458) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1363) at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:239) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:163) ... 38 more Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 49960 at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284) at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:154) at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:660) ... 45 more Regards, Edwin