Lucene only supports 2^31-1 documents in an index, so Solr can only support 2^31-1 documents in a single shard.

I think it's a bug that Lucene doesn't throw an exception when more than that number of documents have been inserted. Instead, you get this error when Solr tries to read such an overstuffed index.
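For reference, the number this limit applies to is maxDoc (live plus deleted documents), which the Luke request handler reports, e.g. curl 'http://localhost:8983/solr/collection1/admin/luke?wt=json&numTerms=0'. A rough sketch of the headroom arithmetic (the host and core name are assumptions taken from the sample script below):

```python
# Sketch only: how many documents still fit in one core/shard before the
# composite-reader limit is hit. A negative result means the index is over.
LUCENE_MAX_DOCS = 2 ** 31 - 1  # 2147483647, Java's Integer.MAX_VALUE


def docs_remaining(max_doc):
    """Headroom left in a single core/shard, given its maxDoc."""
    return LUCENE_MAX_DOCS - max_doc


# The sample script's target of 22 * 10**8 docs overshoots by ~52.5 million:
print(docs_remaining(22 * 10 ** 8))  # -52516353
```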

-- Jack Krupansky

-----Original Message----- From: [Tech Fun]山崎
Sent: Tuesday, May 6, 2014 8:54 PM
To: solr-user@lucene.apache.org
Subject: Too many documents Exception

Hello everybody,

With Solr 4.3.1 (and 4.7.1), once Num Docs + Deleted Docs exceeds
2147483647 (Integer.MAX_VALUE), the core fails to open with:

Caused by: java.lang.IllegalArgumentException: Too many documents, composite IndexReaders cannot exceed 2147483647

It seems similar to the issue in this earlier, unresolved thread:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201307.mbox/browser

How can I fix this? Or is this a hard limit in the Solr/Lucene specification?


Log:

ERROR org.apache.solr.core.CoreContainer – Unable to create core: collection1
org.apache.solr.common.SolrException: Error opening new searcher
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
   at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
   at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
   at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
   at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
   at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1438)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1550)
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:796)
   ... 13 more
Caused by: org.apache.solr.common.SolrException: Error opening Reader
   at org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:172)
   at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:183)
   at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:179)
   at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1414)
   ... 15 more
Caused by: java.lang.IllegalArgumentException: Too many documents, composite IndexReaders cannot exceed 2147483647
   at org.apache.lucene.index.BaseCompositeReader.<init>(BaseCompositeReader.java:77)
   at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:368)
   at org.apache.lucene.index.StandardDirectoryReader.<init>(StandardDirectoryReader.java:42)
   at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:71)
   at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
   at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
   at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:88)
   at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
   at org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:169)
   ... 18 more
ERROR org.apache.solr.core.CoreContainer – null:org.apache.solr.common.SolrException: Unable to create core: collection1
   at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
   at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
   at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
   at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
   at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
   ... 10 more
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
   at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1438)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1550)
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:796)
   ... 13 more
Caused by: org.apache.solr.common.SolrException: Error opening Reader
   at org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:172)
   at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:183)
   at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:179)
   at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1414)
   ... 15 more
Caused by: java.lang.IllegalArgumentException: Too many documents, composite IndexReaders cannot exceed 2147483647
   at org.apache.lucene.index.BaseCompositeReader.<init>(BaseCompositeReader.java:77)
   at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:368)
   at org.apache.lucene.index.StandardDirectoryReader.<init>(StandardDirectoryReader.java:42)
   at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:71)
   at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
   at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
   at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:88)
   at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
   at org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:169)
   ... 18 more


sample solrconfig.xml

<?xml version="1.0" encoding="UTF-8" ?>
<config>
 <luceneMatchVersion>LUCENE_43</luceneMatchVersion>

 <lib dir="/opt/solr/dist" regex="solr-cell-\d.*\.jar" />
 <lib dir="/opt/solr/contrib/extraction/lib" regex=".*\.jar" />

 <lib dir="/opt/solr/dist" regex="solr-clustering-\d.*\.jar" />
 <lib dir="/opt/solr/contrib/clustering/lib" regex=".*\.jar" />

 <lib dir="/opt/solr/dist" regex="solr-langid-\d.*\.jar" />
 <lib dir="/opt/solr/contrib/langid/lib" regex=".*\.jar" />

 <lib dir="/opt/solr/dist" regex="solr-velocity-\d.*\.jar" />
 <lib dir="/opt/solr/contrib/velocity/lib" regex=".*\.jar" />

 <dataDir>${solr.data.dir:}</dataDir>

 <directoryFactory name="DirectoryFactory"
                   class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

 <codecFactory class="solr.SchemaCodecFactory"/>

 <indexConfig>
   <ramBufferSizeMB>256</ramBufferSizeMB>
   <lockType>${solr.lock.type:native}</lockType>
 </indexConfig>

 <jmx />

 <updateHandler class="solr.DirectUpdateHandler2">
   <updateLog>
     <str name="dir">${solr.ulog.dir:}</str>
   </updateLog>
   <autoCommit>
     <maxDocs>10000</maxDocs>
     <maxTime>60000</maxTime>
     <openSearcher>false</openSearcher>
   </autoCommit>
   <autoSoftCommit>
     <maxDocs>10</maxDocs>
     <maxTime>1000</maxTime>
   </autoSoftCommit>
 </updateHandler>

 <query>
   <maxBooleanClauses>1024</maxBooleanClauses>
   <filterCache class="solr.FastLRUCache"
                size="16384"
                initialSize="4096"
                autowarmCount="1024"/>
   <queryResultCache class="solr.FastLRUCache"
                    size="16384"
                    initialSize="4096"
                    autowarmCount="1024"/>
   <documentCache class="solr.FastLRUCache"
                  size="16384"
                  initialSize="4096"
                  autowarmCount="1024"/>
   <enableLazyFieldLoading>true</enableLazyFieldLoading>
   <queryResultWindowSize>20</queryResultWindowSize>
   <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
   <useColdSearcher>false</useColdSearcher>
   <maxWarmingSearchers>2</maxWarmingSearchers>
 </query>

 <requestDispatcher handleSelect="false" >
   <requestParsers enableRemoteStreaming="true"
                   multipartUploadLimitInKB="2048000"
                   formdataUploadLimitInKB="2048"/>
   <httpCaching never304="true" />
 </requestDispatcher>

 <requestHandler name="/select" class="solr.SearchHandler">
   <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="df">text</str>
   </lst>
 </requestHandler>

 <requestHandler name="/update" class="solr.UpdateRequestHandler">
 </requestHandler>

 <requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler">
   <lst name="defaults">
     <str name="stream.contentType">application/json</str>
   </lst>
 </requestHandler>

 <requestHandler name="/admin/" class="solr.admin.AdminHandlers" />

 <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
   <lst name="invariants">
     <str name="q">solrpingquery</str>
   </lst>
   <lst name="defaults">
     <str name="echoParams">all</str>
   </lst>
 </requestHandler>

 <queryResponseWriter name="json" class="solr.JSONResponseWriter">
   <str name="content-type">text/plain; charset=UTF-8</str>
 </queryResponseWriter>
</config>


sample schema.xml

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="twitter" version="1.5">
 <!-- types -->
 <types>
   <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
   <fieldType name="long" class="solr.TrieLongField"
              precisionStep="0" positionIncrementGap="0"/>
   <fieldType name="tlong" class="solr.TrieLongField"
              precisionStep="8" positionIncrementGap="0"/>
   <fieldType name="tdate" class="solr.TrieDateField"
              precisionStep="6" positionIncrementGap="0"/>
   <fieldType name="text_cjk" class="solr.TextField"
              positionIncrementGap="100">
     <analyzer>
       <charFilter class="solr.MappingCharFilterFactory"/>
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.CJKWidthFilterFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.CJKBigramFilterFactory" outputUnigrams="true"/>
     </analyzer>
   </fieldType>
 </types>

 <!-- fields -->
 <fields>
   <field name="key" type="string" indexed="true" stored="true" required="true" />
   <field name="status_id" type="tlong" indexed="true" stored="true" required="true"/>
   <field name="text" type="text_cjk" indexed="true" stored="true" required="true"/>
   <field name="from_user_id_str" type="string" indexed="true" stored="true" required="true"/>
   <field name="created_at" type="tdate" indexed="true" stored="true" required="true"/>
   <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
 </fields>
 <uniqueKey>key</uniqueKey>
 <defaultSearchField>text</defaultSearchField>
 <solrQueryParser defaultOperator="AND"/>
</schema>



sample data-loading source code (Python)

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import datetime
# use https://github.com/toastdriven/pysolr
from pysolr import Solr


def main():
   s_time = datetime.datetime.utcnow()
   print 'start.: ({})'.format(str(s_time))

   solr = Solr('http://localhost:8983/solr/collection1', timeout=60)

   docs = []

   max_range = 22 * (10 ** 8)  # exceeds Java Integer.MAX_VALUE (2147483647)
   for x in xrange(1, max_range):
       docs.append(
           {
               'key': '{}'.format(x),
               'status_id': x,
               'text': '{} 番目の記事'.format(x).decode('utf-8'),
               'from_user_id_str': '1',
               'created_at': '2014-05-01T20:06:53Z',
           }
       )

       if x % (10 ** 4) == 0:
           solr.add(docs)
           solr.commit()
           docs = []

           e_time = datetime.datetime.utcnow()

           print '{} end.: ({})'.format(x, str(e_time - s_time))

   solr.add(docs)
   solr.commit()
   docs = []

   e_time = datetime.datetime.utcnow()

   print 'end.: ({})'.format(str(e_time - s_time))

if __name__ == '__main__':
   main()
