[ 
https://issues.apache.org/jira/browse/LUCENE-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17283264#comment-17283264
 ] 

Mayya Sharipova commented on LUCENE-9755:
-----------------------------------------

{quote}
Consider the following scenario:
 * all documents in the index have a field "numfield" indexed as IntPoint
 * in addition, SOME of those documents are also indexed with a 
SortedNumericDocValuesField using the same "numfield" name
{quote}
[~tomhecker], I am working on LUCENE-9334, which will ensure that this never 
happens: if a document has "numfield" indexed as an IntPoint, it must also 
have "numfield" indexed as a SortedNumericDocValuesField. In other words, the 
data structures used for a given field will be consistent across all the 
documents of an index.

But this will only apply from version 9.0 onward; your point is still valid 
for 8.x.
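As a rough illustration of the kind of per-field consistency rule described above, here is a hypothetical toy model in plain Java. The class `FieldSchemaRegistry` and its `Structure` enum are made up for this sketch; this is not the LUCENE-9334 implementation. The first document that uses a field name fixes which data structures that field is indexed with, and a later document that deviates is rejected when it is added rather than causing a failure at search time.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of a per-field schema rule; not Lucene's implementation.
final class FieldSchemaRegistry {
    /** The data structures a field can be indexed with (simplified). */
    enum Structure { POINTS, SORTED_NUMERIC_DOC_VALUES }

    // The first document that uses a field name fixes its set of structures.
    private final Map<String, Set<Structure>> schemas = new HashMap<>();

    /**
     * Registers the structures a document uses for {@code field}; throws if they
     * differ from what earlier documents used for the same field name.
     */
    void addField(String field, Set<Structure> structures) {
        Set<Structure> copy = EnumSet.noneOf(Structure.class);
        copy.addAll(structures);
        Set<Structure> existing = schemas.putIfAbsent(field, copy);
        if (existing != null && !existing.equals(copy)) {
            throw new IllegalArgumentException("field '" + field + "' was first indexed with "
                    + existing + " but this document uses " + copy);
        }
    }

    public static void main(String[] args) {
        FieldSchemaRegistry registry = new FieldSchemaRegistry();
        // First document: "numfield" as both a point and sorted numeric doc values.
        registry.addField("numfield", EnumSet.of(Structure.POINTS, Structure.SORTED_NUMERIC_DOC_VALUES));
        try {
            // Second document: points only, which this rule rejects.
            registry.addField("numfield", EnumSet.of(Structure.POINTS));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under such a rule, the scenario from the issue (only SOME documents carrying doc values for "numfield") is refused at indexing time, so a DocValues-free segment for that field can never be created.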


> Index Segment without DocValues May Cause Search to Fail
> --------------------------------------------------------
>
>                 Key: LUCENE-9755
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9755
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 8.x, 8.3.1, 8.8
>            Reporter: Thomas Hecker
>            Priority: Minor
>              Labels: docValues, sorting
>         Attachments: DocValuesTest.java
>
>
> Not sure if this can be considered a bug, but it is certainly a caveat that 
> may slip through testing due to its nature.
> Consider the following scenario:
>  * all documents in the index have a field "numfield" indexed as IntPoint
>  * in addition, SOME of those documents are also indexed with a 
> SortedNumericDocValuesField using the same "numfield" name
> The documents without the DocValues are never matched by queries that 
> involve sorting, so we save some space by omitting the DocValues for those 
> documents.
> This works perfectly fine unless:
>  * the index contains a segment that contains only documents without the 
> DocValues
> In this case, running a query that sorts by "numfield" will throw the 
> following exception:
> {noformat}
> java.lang.IllegalStateException: unexpected docvalues type NONE for field 'numfield' (expected one of [SORTED_NUMERIC, NUMERIC]). Re-index with correct docvalues type.
>    at org.apache.lucene.index.DocValues.checkField(DocValues.java:317)
>    at org.apache.lucene.index.DocValues.getSortedNumeric(DocValues.java:389)
>    at org.apache.lucene.search.SortedNumericSortField$3.getNumericDocValues(SortedNumericSortField.java:159)
>    at org.apache.lucene.search.FieldComparator$NumericComparator.doSetNextReader(FieldComparator.java:155)
> {noformat}
> I have included a minimal example program that demonstrates the issue. It 
> will:
>  * create an index with two documents, each having "numfield" indexed
>  * add a DocValuesField "numfield" only for the first document
>  * force the two documents into separate index segments
>  * run a query that matches only the first document and sorts by "numfield"
> This results in the aforementioned exception.
> When the following lines are removed from the code:
> {code:java}
> // commit halfway through so that the documents end up in separate segments
> if (i == docCount / 2) {
>   iw.commit();
> }
> {code}
> both documents are added to the same segment. When the code is re-run so 
> that it creates a single index segment, the query works fine.
> Tested with Lucene 8.3.1 and 8.8.0.
> Like I said, this may not be considered a bug. But it slipped through our 
> testing because such a DocValues-free segment is a rare and short-lived 
> event.
> We can avoid this issue in the future by using a different field name for the 
> DocValuesField. But for our production systems we have to patch 
> DocValues.checkField() to suppress the IllegalStateException, as reindexing is 
> not an option right now.
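The failure mode in the report can be sketched with a simplified, hypothetical model in plain Java. `SegmentDocValuesModel`, `Segment`, `checkSortable` and `sortAcross` are illustrative names, not Lucene APIs. The model assumes, as the report and stack trace suggest, that each segment keeps its own per-field metadata, that a field present in a segment without any doc values is recorded with type NONE, and that a sorted search prepares every segment, including segments with no matching documents.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of per-segment doc values metadata; not Lucene code.
final class SegmentDocValuesModel {
    enum DocValuesType { NONE, SORTED_NUMERIC }

    /** A segment only records the fields seen in the documents written to it. */
    static final class Segment {
        final Map<String, DocValuesType> fieldInfos = new HashMap<>();

        void addDocument(String field, boolean withDocValues) {
            DocValuesType type = withDocValues ? DocValuesType.SORTED_NUMERIC : DocValuesType.NONE;
            // If any document in this segment has doc values, the field gets that type.
            fieldInfos.merge(field, type, (old, added) -> old == DocValuesType.NONE ? added : old);
        }
    }

    /**
     * Roughly follows the reported checkField() behaviour: a field missing from the
     * segment is treated as having no values, but a field that exists with type NONE
     * is an error.
     */
    static void checkSortable(Segment segment, String field) {
        DocValuesType type = segment.fieldInfos.get(field);
        if (type == DocValuesType.NONE) {
            throw new IllegalStateException("unexpected docvalues type NONE for field '" + field + "'");
        }
    }

    /** A sorted search prepares every segment, including ones with no matching docs. */
    static void sortAcross(List<Segment> segments, String field) {
        for (Segment segment : segments) {
            checkSortable(segment, field);
        }
    }

    public static void main(String[] args) {
        // One segment holding both documents: the field's type is SORTED_NUMERIC.
        Segment combined = new Segment();
        combined.addDocument("numfield", true);
        combined.addDocument("numfield", false);
        sortAcross(Arrays.asList(combined), "numfield"); // succeeds

        // Two segments, the second without any doc values for "numfield".
        Segment first = new Segment();
        first.addDocument("numfield", true);
        Segment second = new Segment();
        second.addDocument("numfield", false);
        try {
            sortAcross(Arrays.asList(first, second), "numfield");
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

The same model shows why the reporter's alternative of a separate field name for the doc values avoids the problem: a segment whose documents carry no doc values would then have no entry at all for the doc values field, which `checkSortable` treats as "no values" rather than as an error.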



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
