Tom,

I think I see where the problem may be -- it looks like another > 2B terms
bug in Lucene (we are using an int instead of a long in the
TermInfoAndOrd class inside TermInfosReader.java), present only in
3.1.
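
To illustrate the failure mode, here is a minimal sketch (not the actual Lucene code). The class name, the method names, and the sample values 2,500,000,000 and 128 are made up for illustration. It shows how narrowing a long term position to an int before dividing by the index interval can yield a negative array index, while dividing first and narrowing the quotient does not:

```java
// Hypothetical demo class; not part of Lucene.
final class TermOrdOverflowDemo {

  // Mimics the pre-patch path: the long position is narrowed to int before dividing.
  static int truncatedIndexPos(long position, int totalIndexInterval) {
    int termOrd = (int) position;  // can wrap to a negative value past Integer.MAX_VALUE
    return termOrd / totalIndexInterval;
  }

  // Mimics the patched path: divide as a long, then narrow only the quotient.
  static int fixedIndexPos(long position, int totalIndexInterval) {
    return (int) (position / totalIndexInterval);
  }

  public static void main(String[] args) {
    long position = 2_500_000_000L;  // illustrative term ordinal above 2^31 - 1
    int interval = 128;              // illustrative index interval
    System.out.println(truncatedIndexPos(position, interval));
    System.out.println(fixedIndexPos(position, interval));
  }
}
```

With these inputs the first method returns a negative number, which is the kind of value that would show up as an ArrayIndexOutOfBoundsException when used as an array index, and the second returns 19531250.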

I'm also mad that Test2BTerms fails to catch this!!  I will go fix
that test and confirm it sees this bug.

Can you build from source?  If so, try this patch:

Index: lucene/src/java/org/apache/lucene/index/TermInfosReader.java
===================================================================
--- lucene/src/java/org/apache/lucene/index/TermInfosReader.java        (revision 1089906)
+++ lucene/src/java/org/apache/lucene/index/TermInfosReader.java        (working copy)
@@ -46,8 +46,8 @@

   // Just adds term's ord to TermInfo
   private final static class TermInfoAndOrd extends TermInfo {
-    final int termOrd;
-    public TermInfoAndOrd(TermInfo ti, int termOrd) {
+    final long termOrd;
+    public TermInfoAndOrd(TermInfo ti, long termOrd) {
       super(ti);
       this.termOrd = termOrd;
     }
@@ -245,7 +245,7 @@
             // wipe out the cache when they iterate over a large numbers
             // of terms in order
             if (tiOrd == null) {
-              termsCache.put(cacheKey, new TermInfoAndOrd(ti, (int) enumerator.position));
+              termsCache.put(cacheKey, new TermInfoAndOrd(ti, enumerator.position));
             } else {
               assert sameTermInfo(ti, tiOrd, enumerator);
               assert (int) enumerator.position == tiOrd.termOrd;
@@ -262,7 +262,7 @@
     // random-access: must seek
     final int indexPos;
     if (tiOrd != null) {
-      indexPos = tiOrd.termOrd / totalIndexInterval;
+      indexPos = (int) (tiOrd.termOrd / totalIndexInterval);
     } else {
       // Must do binary search:
       indexPos = getIndexOffset(term);
@@ -274,7 +274,7 @@
     if (enumerator.term() != null && term.compareTo(enumerator.term()) == 0) {
       ti = enumerator.termInfo();
       if (tiOrd == null) {
-        termsCache.put(cacheKey, new TermInfoAndOrd(ti, (int) enumerator.position));
+        termsCache.put(cacheKey, new TermInfoAndOrd(ti, enumerator.position));
       } else {
         assert sameTermInfo(ti, tiOrd, enumerator);
         assert (int) enumerator.position == tiOrd.termOrd;

Mike

http://blog.mikemccandless.com

On Fri, Apr 8, 2011 at 4:53 PM, Burton-West, Tom <tburt...@umich.edu> wrote:
> The query below results in an array out of bounds exception:
> select/?q=solr&version=2.2&start=0&rows=0&facet=true&facet.field=topicStr
>
> Here is the exception:
>  Exception during facet.field of topicStr:java.lang.ArrayIndexOutOfBoundsException: -1931149
>        at org.apache.lucene.index.TermInfosReader.seekEnum(TermInfosReader.java:201)
>
> We are using a dev version of Solr/Lucene:
>
> Solr Specification Version: 3.0.0.2010.11.19.16.00.54
> Solr Implementation Version: 3.1-SNAPSHOT 1036094 - root - 2010-11-19 16:00:54
> Lucene Specification Version: 3.1-SNAPSHOT
> Lucene Implementation Version: 3.1-SNAPSHOT 1036094 - 2010-11-19 16:01:10
>
> Just before the exception we see this entry in our tomcat logs:
>
> Apr 8, 2011 2:01:58 PM org.apache.solr.request.UnInvertedField uninvert
> INFO: UnInverted multi-valued field {field=topicStr,memSize=7675174,tindexSize=289102,time=2577,phase1=2537,nTerms=498975,bigTerms=0,termInstances=1368694,uses=0}
> Apr 8, 2011 2:01:58 PM org.apache.solr.core.SolrCore execute
>
> Is this a known bug?  Can anyone provide a clue as to how we can determine 
> what the problem is?
>
> Tom Burton-West
>
>
> Appended below is the exception stack trace:
>
> SEVERE: Exception during facet.field of topicStr:java.lang.ArrayIndexOutOfBoundsException: -1931149
>        at org.apache.lucene.index.TermInfosReader.seekEnum(TermInfosReader.java:201)
>        at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:271)
>        at org.apache.lucene.index.TermInfosReader.terms(TermInfosReader.java:338)
>        at org.apache.lucene.index.SegmentReader.terms(SegmentReader.java:928)
>        at org.apache.lucene.index.DirectoryReader$MultiTermEnum.<init>(DirectoryReader.java:1055)
>        at org.apache.lucene.index.DirectoryReader.terms(DirectoryReader.java:659)
>        at org.apache.solr.search.SolrIndexReader.terms(SolrIndexReader.java:302)
>        at org.apache.solr.request.NumberedTermEnum.skipTo(UnInvertedField.java:1018)
>        at org.apache.solr.request.UnInvertedField.getTermText(UnInvertedField.java:838)
>        at org.apache.solr.request.UnInvertedField.getCounts(UnInvertedField.java:617)
>        at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:279)
>        at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:312)
>        at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:174)
>        at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
>        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
>        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1354)
>
>
