Brain2000 opened a new issue, #12082:
URL: https://github.com/apache/lucene/issues/12082

   ### Description
   
   It looks like there's a problem in the TopFieldCollector.java where it calls 
"compareBottom" without calling "setBottom" first.
   I believe this is an issue if getLeafComparer( ) is called more than once, 
as the new class will no longer have the previous bottom slot value.
   
   I found a workaround is to rescope the variable storing the bottom slot set 
from calling "setBottom( )", to outside of the LeafFieldComparator class to 
make it persistent (just like the setTopValue( ) is called outside of the 
LeafFieldComparator class).
   
   Rescoping the bottom slot should be safe unless multiple threads are being 
run at the same time and need their own version of bottomSlot for the 
LeafFieldComparator class.
   
   The documentation here 
     
https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/LeafFieldComparator.html
   states that _"[setBottom] will always be called before compareBottom(int)"_
   
   So one might think the LeafFieldComparator would be used like this:
   
   ```
   //get new leaf comparator
   LeafFieldComparator lc = getLeafComparator( );
   //setBottom before compareBottom
   lc.setBottom(xxx); 
   lc.compareBottom(yyy);
   
   //get new leaf comparator
   lc = getLeafComparator( );
   //setBottom before compareBottom
   lc.setBottom(xxx2);
   lc.compareBottom(yyy2);
   ```
   
   However, it is being called in this fashion:
   
   ```
   //get new leaf comparator
   LeafFieldComparator lc = getLeafComparator( );
   //setBottom before compareBottom
   lc.setBottom(xxx); 
   lc.compareBottom(yyy);
   
   //get new leaf comparator
   LeafFieldComparator lc = getLeafComparator( );
   //compareBottom... hopefully the slot is correct...
   lc.compareBottom(yyy);
   ```
   
   As you can see, the 2nd set of code does not treat the LeafComparator as if 
each instance has it's own bottomSlot variable. I believe the purpose of this 
would be to allow multiple threads to call this simultaneously, which would 
require each one to have it's own bottomSlot, otherwise it would not be thread 
safe.
   
   Also, if setBottom was meant to be outside of the scope of LeafComparator, 
then why was the time taken to put setTop outside and setBottom inside? This 
seems to have been done very deliberately according to the documentation.
   
   Here is a sample code snippet:
   ```
   import org.apache.lucene.index.LeafReaderContext;
   import org.apache.lucene.search.FieldComparator;
   import org.apache.lucene.search.LeafFieldComparator;
   import org.apache.lucene.search.Scorable;
   
   public final class MySimpleStringComparator extends FieldComparator<String> {
        private String[] values;
        private final String field;
        private String topValue;
   
        //SCOPING bottomSlot HERE WILL CAUSE ITEMS TO RETURN PROPERLY
        //private int bottomSlot;
   
        public MySimpleStringComparator(int numHits, String field) {
                this.values = new String[numHits];
                this.field = field;
        }
   
        @Override
        public int compare(int slot1, int slot2) {
                return this.compareValues(this.values[slot1], 
this.values[slot2]);
        }
   
        @Override
        public LeafFieldComparator getLeafComparator(LeafReaderContext lrc) 
throws IOException {
                final LeafReaderContext frc = lrc;
   
                return new LeafFieldComparator() {
                        //SCOPING bottomSlot HERE WILL CAUSE ITEMS ON A SUBSET 
TO RANDOMLY DISAPPEAR
                        private int bottomSlot;
   
                        public void setScorer(Scorable arg0) throws IOException 
{
                        }
   
                        public void setBottom(int slot) throws IOException {
                                bottomSlot = slot;
                        }
   
                        public void copy(int slot, int doc) throws IOException {
                                values[slot] = 
frc.reader().document(doc).get(field);
                        }
   
                        public int compareTop(int doc) throws IOException {
                                return compareValues(topValue, 
frc.reader().document(doc).get(field));
                        }
   
                        public int compareBottom(int doc) throws IOException {
                                return compareValues(values[bottomSlot], 
frc.reader().document(doc).get(field));
                        }
                };
        }
   
        @Override
        public void setTopValue(String val) {
                this.topValue = val;
        }
   
        @Override
        public String value(int slot) {
                return this.values[slot];
        }
   }
   ```
   
   Here is a code snippet that runs a search/sort for a single page that 
utilizes the MySimpleStringComparator above.
   ```
   ComplexPhraseQueryParser cpqp = new ComplexPhraseQueryParser("somefield", 
analyzer);
   Query query = cpqp.parse("somevalue");
   
   pageSize = 10;
   pageNum = 1;
   requestedRecords = pageSize * pageNum;
   startOffset = (pageNum - 1) * pageSize;
   
   FieldComparatorSource fsc = new FieldComparatorSource() {
        @Override
        public FieldComparator<String> newComparator(String fieldname, int 
numhits, int sortPos, boolean reversed) {
                return new MySimpleStringComparator(numhits, fieldname);
        }
   };
   
   Sort sort = new Sort(new SortField("firstname", fsc, false));
   IndexSearcher searcher = new IndexSearcher(reader);
   TopFieldCollector tfcollector = TopFieldCollector.create(sort, 
requestedRecords, Integer.MAX_VALUE);
   searcher.search(query, tfcollector);
   ScoreDoc[] hits = tfcollector.topDocs(startOffset, pageSize).scoreDocs;
   ```
   
   ### Version and environment details
   
   This is running on Windows Server 2019 using Tomcat 8.5 with Lucene 8.11.2 
(also reproduceable in 8.6.0)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to