Thomas Hecker created LUCENE-9755:
-------------------------------------

             Summary: Index Segment without DocValues May Cause Search to Fail
                 Key: LUCENE-9755
                 URL: https://issues.apache.org/jira/browse/LUCENE-9755
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/search
    Affects Versions: 8.3.1, 8.x, 8.8
            Reporter: Thomas Hecker
         Attachments: DocValuesTest.java

Not sure if this can be considered a bug, but it is certainly a caveat that may 
slip through testing due to its nature.

Consider the following scenario:
 * all documents in the index have a field "numfield" indexed as IntPoint
 * in addition, SOME of those documents are also indexed with a 
SortedNumericDocValuesField using the same "numfield" name

The documents without DocValues are never matched by any query that 
involves sorting, so we save some space by omitting the DocValues for those 
documents.

This works perfectly fine, unless
 * the index contains a segment that only contains documents without the 
DocValues

In this case, running a query that sorts by "numfield" will throw the following 
exception:
{noformat}
java.lang.IllegalStateException: unexpected docvalues type NONE for field 'numfield' (expected one of [SORTED_NUMERIC, NUMERIC]). Re-index with correct docvalues type.
   at org.apache.lucene.index.DocValues.checkField(DocValues.java:317)
   at org.apache.lucene.index.DocValues.getSortedNumeric(DocValues.java:389)
   at org.apache.lucene.search.SortedNumericSortField$3.getNumericDocValues(SortedNumericSortField.java:159)
   at org.apache.lucene.search.FieldComparator$NumericComparator.doSetNextReader(FieldComparator.java:155)
{noformat}
I have attached a minimal example program (DocValuesTest.java) that demonstrates the issue. It will
 * create an index with two documents, each having "numfield" indexed
 * add a DocValuesField "numfield" only for the first document
 * force the two documents into separate index segments
 * run a query that matches only the first document and sorts by "numfield"

This results in the aforementioned exception.
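
To illustrate why only the segment layout matters, here is a small self-contained model in plain Java. It is a sketch, not Lucene code: the class SegmentDocValuesModel, its Segment type and lookupSortValues are hypothetical names. It assumes, based on the stack trace above and on the different-field-name workaround mentioned at the end of this report, that the check is made per segment, and that a segment which has never seen the field at all is treated as simply having no values.

```java
import java.util.HashMap;
import java.util.Map;

public class SegmentDocValuesModel {
    public enum DocValuesType { NONE, NUMERIC, SORTED_NUMERIC }

    /** Hypothetical stand-in for one segment: field name -> doc values type recorded in it. */
    public static final class Segment {
        private final Map<String, DocValuesType> fieldInfos = new HashMap<>();

        /** Records one document's field; the type stays NONE until some document adds doc values. */
        public void addField(String name, DocValuesType type) {
            fieldInfos.merge(name, type, (old, cur) -> old == DocValuesType.NONE ? cur : old);
        }
    }

    /**
     * Simplified per-segment lookup (an assumption, not Lucene's implementation).
     * Returns true if the segment has sort values, false if the field is unknown
     * to the segment, and throws if the field exists but has no doc values.
     */
    public static boolean lookupSortValues(Segment segment, String field) {
        DocValuesType actual = segment.fieldInfos.get(field);
        if (actual == null) {
            return false; // field never seen in this segment: no values, no error
        }
        if (actual == DocValuesType.NONE) {
            throw new IllegalStateException(
                "unexpected docvalues type NONE for field '" + field + "'");
        }
        return true;
    }

    public static void main(String[] args) {
        // Segment holding both kinds of documents: the field has a doc values type.
        Segment mixed = new Segment();
        mixed.addField("numfield", DocValuesType.NONE);           // IntPoint only
        mixed.addField("numfield", DocValuesType.SORTED_NUMERIC); // IntPoint + doc values
        System.out.println("mixed segment: " + lookupSortValues(mixed, "numfield"));

        // Segment holding only documents without doc values: the lookup throws.
        Segment pointsOnly = new Segment();
        pointsOnly.addField("numfield", DocValuesType.NONE);
        try {
            lookupSortValues(pointsOnly, "numfield");
        } catch (IllegalStateException e) {
            System.out.println("points-only segment: " + e.getMessage());
        }

        // A field name the segment has never seen does not throw.
        System.out.println("unknown field: " + lookupSortValues(pointsOnly, "numfield_sort"));
    }
}
```

In this model the mixed segment is fine because at least one of its documents carries doc values, while the points-only segment knows the field name but has no doc values type for it.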

When removing the following lines from the code:
{code:java}
if (i==docCount/2) {
  iw.commit();
}
{code}
both documents are added to the same segment. With only that single index 
segment, the query works fine.

Tested with Lucene 8.3.1 and 8.8.0.

Like I said, this may not be considered a bug. But it has slipped through our 
testing because the existence of such a DocValues-free segment is such a rare 
and short-lived event.

We can avoid this issue in the future by using a different field name for the 
DocValuesField. But for our production systems we have to patch 
DocValues.checkField() to suppress the IllegalStateException as reindexing is 
not an option right now.
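
For reference, the different-field-name approach might look like the following sketch. This is not the attached DocValuesTest.java. It assumes lucene-core 8.x on the classpath, and the class name SeparateSortFieldSketch and the field name "numfield_sort" are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.SortedNumericDocValuesField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.SortedNumericSortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class SeparateSortFieldSketch {
    static final String POINT_FIELD = "numfield";
    static final String SORT_FIELD = "numfield_sort"; // hypothetical name

    /** Indexes two documents into separate segments and returns the number of sorted hits. */
    public static int searchSorted() {
        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter iw = new IndexWriter(dir, new IndexWriterConfig())) {
                // First document: point plus doc values under a different field name.
                Document withDocValues = new Document();
                withDocValues.add(new IntPoint(POINT_FIELD, 1));
                withDocValues.add(new SortedNumericDocValuesField(SORT_FIELD, 1));
                iw.addDocument(withDocValues);
                iw.commit(); // flush, so the next document lands in a new segment

                // Second document: point only, no doc values at all.
                Document withoutDocValues = new Document();
                withoutDocValues.add(new IntPoint(POINT_FIELD, 2));
                iw.addDocument(withoutDocValues);
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                // Sort on the doc values field, query on the point field.
                Sort sort = new Sort(new SortedNumericSortField(SORT_FIELD, SortField.Type.INT));
                TopDocs hits = searcher.search(IntPoint.newExactQuery(POINT_FIELD, 1), 10, sort);
                return hits.scoreDocs.length;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("hits: " + searchSorted());
    }
}
```

The second segment never sees "numfield_sort", so as far as I can tell it is treated as having no values for the sort field rather than triggering the check.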


