tang-hi commented on PR #12255: URL: https://github.com/apache/lucene/pull/12255#issuecomment-1553253776
> I tried running luceneutil before/after this change using this command:
>
> ```
> comp = competition.Competition()
>
> index = comp.newIndex('baseline', sourceData,
>                       vectorFile=constants.GLOVE_VECTOR_DOCS_FILE,
>                       vectorDimension=100,
>                       vectorEncoding='FLOAT32')
>
> comp.competitor('baseline', 'baseline',
>                 vectorDict=constants.GLOVE_WORD_VECTORS_FILE,
>                 index = index, concurrentSearches = concurrentSearches)
>
> comp.competitor('candidate', 'candidate',
>                 vectorDict=constants.GLOVE_WORD_VECTORS_FILE,
>                 index = index, concurrentSearches = concurrentSearches)
>
> comp.benchmark("baseline_vs_candidate")
> ```
>
> and I get this error:
>
> ```
>   File "src/python/vector-test.py", line 65, in <module>
>     comp.benchmark("baseline_vs_candidate")
>   File "/local/home/sokolovm/workspace/lbench/luceneutil/src/python/competition.py", line 510, in benchmark
>     searchBench.run(id, base, challenger,
>   File "/local/home/sokolovm/workspace/lbench/luceneutil/src/python/searchBench.py", line 196, in run
>     raise RuntimeError('errors occurred: %s' % str(cmpDiffs))
> RuntimeError: errors occurred: ([], ["query=KnnFloatVectorQuery:vector[0.0223385,...][100] filter=None sort=None groupField=None hitCount=100: hit 51 has wrong field/score value ([994765], '0.9567487') vs ([824922], '0.9567554')", "query=KnnFloatVectorQuery:vector[-0.061654933,...][100] filter=None sort=None groupField=None hitCount=100: hit 16 has wrong field/score value ([813187], '0.87024546') vs ([134050], '0.8707979')", "query=KnnFloatVectorQuery:vector[-0.111742884,...][100] filter=None sort=None groupField=None hitCount=100: hit 27 has wrong field/score value ([724125], '0.8874463') vs ([817731], '0.88757277')"], 1.0)
> ```
>
> maybe it's expected that we changed the results? I think this is what Mike M ran into with the nightly benchmarks

@msokolov I tried to run luceneutil with the command you provided, but it throws this exception:

````
Exception in thread "main" java.lang.IllegalArgumentException: facetDim Date was not indexed
	at perf.TaskParser$TaskBuilder.parseFacets(TaskParser.java:289)
	at perf.TaskParser$TaskBuilder.buildQueryTask(TaskParser.java:154)
	at perf.TaskParser$TaskBuilder.build(TaskParser.java:147)
	at perf.TaskParser.parseOneTask(TaskParser.java:108)
	at perf.LocalTaskSource.loadTasks(LocalTaskSource.java:169)
	at perf.LocalTaskSource.<init>(LocalTaskSource.java:48)
	at perf.SearchPerfTest._main(SearchPerfTest.java:543)
	at perf.SearchPerfTest.main(SearchPerfTest.java:133)
````

My vector-test.py looks like this:

````python
#!/usr/bin/env python

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import competition
import sys
import constants

# simple example that runs benchmark with WIKI_MEDIUM source and task files
# Baseline here is ../lucene_baseline versus ../lucene_candidate
if __name__ == '__main__':
  #sourceData = competition.sourceData('wikivector1m')
  #sourceData = competition.sourceData('wikivector10k')
  sourceData = competition.sourceData('wikimedium10k')

  comp = competition.Competition(verifyScores=False)

  index = comp.newIndex('baseline', sourceData,
                        vectorFile=constants.GLOVE_VECTOR_DOCS_FILE,
                        vectorDimension=100,
                        vectorEncoding='FLOAT32')

  # Warning -- Do not break the order of arguments
  # TODO -- Fix the following by using argparser
  concurrentSearches = True

  # create a competitor named baseline with sources in the ../trunk folder
  comp.competitor('baseline', 'baseline',
                  vectorDict=constants.GLOVE_WORD_VECTORS_FILE,
                  index=index, concurrentSearches=concurrentSearches)

  comp.competitor('candidate', 'candidate',
                  vectorDict=constants.GLOVE_WORD_VECTORS_FILE,
                  index=index, concurrentSearches=concurrentSearches)
  # use a different index
  # create a competitor named my_modified_version with sources in the ../patch folder
  # note that we haven't specified an index here, luceneutil will automatically use the
  # index from the base competitor for searching, while the codec that is used for
  # running this competitor is taken from this competitor.

  # start the benchmark - this can take long depending on your index and machines
  comp.benchmark("baseline_vs_candidate")
````

Could you tell me how I can fix that?
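
In case it helps narrow things down: my guess from the stack trace is that the `wikimedium10k` tasks file contains facet tasks referencing a `Date` facet dimension, while `newIndex` above doesn't configure any facet fields. Below is a rough sketch of what I imagine the index setup might need; the `facets` argument and the `('taxonomy:Date', 'Date')` tuple are assumptions on my part (modeled on luceneutil's `localrun.py` example), so the exact form may well be wrong:

````python
# Hypothetical sketch, not verified: assumes comp.newIndex() accepts a
# 'facets' argument (as in luceneutil's localrun.py example) and that the
# tasks file only needs the 'Date' taxonomy dimension.
index = comp.newIndex('baseline', sourceData,
                      facets=(('taxonomy:Date', 'Date'),),
                      vectorFile=constants.GLOVE_VECTOR_DOCS_FILE,
                      vectorDimension=100,
                      vectorEncoding='FLOAT32')
````

Alternatively, switching back to one of the commented-out `wikivector*` sources might avoid the facet tasks entirely, but I'm not sure which setup you intended.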