[ https://issues.apache.org/jira/browse/LUCENE-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17283264#comment-17283264 ]
Mayya Sharipova commented on LUCENE-9755:
-----------------------------------------

{quote}
Consider the following scenario:
* all documents in the index have a field "numfield" indexed as IntPoint
* in addition, SOME of those documents are also indexed with a SortedNumericDocValuesField using the same "numfield" name
{quote}

[~tomhecker], I am working on LUCENE-9334, which will ensure that this never happens. That is, if a document has "numfield" indexed as IntPoint, it must also have "numfield" indexed as SortedNumericDocValuesField. In other words, the data structures used for a given field will be consistent across all the documents of an index. But this will only apply from version 9.0. Your point is still valid for 8.x.

> Index Segment without DocValues May Cause Search to Fail
> --------------------------------------------------------
>
>                 Key: LUCENE-9755
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9755
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 8.x, 8.3.1, 8.8
>            Reporter: Thomas Hecker
>            Priority: Minor
>              Labels: docValues, sorting
>         Attachments: DocValuesTest.java
>
>
> Not sure if this can be considered a bug, but it is certainly a caveat that
> may slip through testing due to its nature.
> Consider the following scenario:
> * all documents in the index have a field "numfield" indexed as IntPoint
> * in addition, SOME of those documents are also indexed with a
> SortedNumericDocValuesField using the same "numfield" name
> The documents without the DocValues cannot be matched from any queries that
> involve sorting, so we save some space by omitting the DocValues for those
> documents.
> This works perfectly fine, unless
> * the index contains a segment that only contains documents without the
> DocValues
> In this case, running a query that sorts by "numfield" will throw the
> following exception:
> {noformat}
> java.lang.IllegalStateException: unexpected docvalues type NONE for field 'numfield' (expected one of [SORTED_NUMERIC, NUMERIC]). Re-index with correct docvalues type.
>     at org.apache.lucene.index.DocValues.checkField(DocValues.java:317)
>     at org.apache.lucene.index.DocValues.getSortedNumeric(DocValues.java:389)
>     at org.apache.lucene.search.SortedNumericSortField$3.getNumericDocValues(SortedNumericSortField.java:159)
>     at org.apache.lucene.search.FieldComparator$NumericComparator.doSetNextReader(FieldComparator.java:155)
> {noformat}
> I have included a minimal example program that demonstrates the issue. This
> will
> * create an index with two documents, each having "numfield" indexed
> * add a DocValuesField "numfield" only for the first document
> * force the two documents into separate index segments
> * run a query that matches only the first document and sorts by "numfield"
> This results in the aforementioned exception.
> When removing the following lines from the code:
> {code:java}
> if (i==docCount/2) {
>     iw.commit();
> }
> {code}
> both documents get added to the same segment. When re-running the code
> with a single index segment, the query works fine.
> Tested with Lucene 8.3.1 and 8.8.0.
> Like I said, this may not be considered a bug. But it has slipped through our
> testing because the existence of such a DocValues-free segment is such a rare
> and short-lived event.
> We can avoid this issue in the future by using a different field name for the
> DocValuesField. But for our production systems we have to patch
> DocValues.checkField() to suppress the IllegalStateException as reindexing is
> not an option right now.
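
The segment-level behaviour described in the report can be illustrated with a toy model in plain Java. This is a sketch under stated assumptions, not Lucene code: the class SegmentDocValuesModel, its Segment type, and the methods buildIndex, checkField and sortSucceeds are hypothetical names invented for illustration. The model assumes that each segment keeps its own per-field metadata, and that the sort-time check rejects a field that the segment knows about but whose docvalues type is NONE, as the exception message suggests.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (not Lucene code) of per-segment field metadata and the sort-time check. */
public class SegmentDocValuesModel {

    enum DocValuesType { NONE, SORTED_NUMERIC }

    /** A segment records, per field name, which doc-values type its documents carried. */
    static final class Segment {
        final Map<String, DocValuesType> fieldInfos = new HashMap<>();

        /** Adds one document: the field is always indexed, doc values are optional. */
        void addDocument(String field, boolean withDocValues) {
            DocValuesType seen = withDocValues ? DocValuesType.SORTED_NUMERIC : DocValuesType.NONE;
            // Once any document in the segment carries doc values, the field keeps that type.
            fieldInfos.merge(field, seen, (old, now) -> old == DocValuesType.NONE ? now : old);
        }
    }

    /** Assumed check: a field known to the segment but without doc values is rejected. */
    static void checkField(Segment segment, String field) {
        DocValuesType actual = segment.fieldInfos.get(field);
        if (actual == DocValuesType.NONE) {
            throw new IllegalStateException(
                "unexpected docvalues type NONE for field '" + field + "'");
        }
    }

    /** In this model the check runs for every segment, whether or not it has matches. */
    static boolean sortSucceeds(List<Segment> segments, String field) {
        try {
            for (Segment s : segments) {
                checkField(s, field);
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    /** Builds a two-document index; only the first document has doc values. */
    static List<Segment> buildIndex(boolean commitBetweenDocs) {
        List<Segment> segments = new ArrayList<>();
        Segment current = new Segment();
        segments.add(current);
        current.addDocument("numfield", true);
        if (commitBetweenDocs) {
            // Stands in for the iw.commit() call that splits the documents into two segments.
            current = new Segment();
            segments.add(current);
        }
        current.addDocument("numfield", false);
        return segments;
    }

    public static void main(String[] args) {
        System.out.println("one segment:  " + sortSucceeds(buildIndex(false), "numfield"));
        System.out.println("two segments: " + sortSucceeds(buildIndex(true), "numfield"));
    }
}
```

In this model the single-segment index sorts successfully because the first document's doc values give "numfield" a SORTED_NUMERIC type for the whole segment. The two-segment index fails because the second segment knows the field only without doc values. The real 8.x code paths may differ in detail.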