[ https://issues.apache.org/jira/browse/LUCENE-9715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17278903#comment-17278903 ]
Michael Sokolov edited comment on LUCENE-9715 at 2/4/21, 3:23 PM:
------------------------------------------------------------------

So the fix for this should be pretty easy, and I will test luceneutil locally to double-check that it fixes the reported issue, but it made me think it would be good to be able to unit test such overflows without actually needing to create a 2GB index. Is there a clean way to do this already? I see that we have a MockIndexInputWrapper/MockDirectoryWrapper where we could potentially introduce some kind of mock offset that introduces 2GB of used "address space" in the IndexInput. Does that seem useful?

was (Author: sokolov):
So the fix for this should be pretty easy, and I will test luceneutil locally do double-check that it fixes the reported issue, but it made me think it would be good to be able to unit test such overflows without actually needing to create a 2GB index. Is there a clean way to do this already? I see that we have a MockIndexInputWrapper/MockDirectoryWrapper where we could potentially introduce some kind of mock offset that introduces 2GB of used "address space" in the IndexInput. Does that seem useful?

> EOF error in VectorValues in Lucene nightly benchmarks
> ------------------------------------------------------
>
> Key: LUCENE-9715
> URL: https://issues.apache.org/jira/browse/LUCENE-9715
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: trunk
> Environment: OS: Arch Linux
> Java versions:
> {code:java}
> openjdk version "11.0.10" 2021-01-19
> OpenJDK Runtime Environment (build 11.0.10+8)
> OpenJDK 64-Bit Server VM (build 11.0.10+8, mixed mode){code}
>
> Reporter: Anton Hägerstrand
> Priority: Major
> Attachments: benchrun.py
>
>
> Hi!
> When running the nightly benchmarks, I can consistently reproduce an EOF exception in the VectorValues code:
> {code:java}
> TASK LEN=150000
> Task repeat count 1000
> Tasks file /home/anton/dev/lucene-bench-home/util/tasks/wikinightly.tasks
> Num task per cat 5
> EXC: <vector:knn:<golf>[-0.07267512,...]>
> Exception in thread "Thread-2" java.lang.RuntimeException: java.lang.RuntimeException: java.io.EOFException: seek past EOF: MMapIndexInput(path="/home/anton/dev/lucene-bench-home/indices/lucene_bench_2021-01-25_f670131_medium_1thread/index/_32.vec") [slice=vector-data]
>     at perf.TaskThreads$TaskThread.run(TaskThreads.java:105)
> Caused by: java.lang.RuntimeException: java.io.EOFException: seek past EOF: MMapIndexInput(path="/home/anton/dev/lucene-bench-home/indices/lucene_bench_2021-01-25_f670131_medium_1thread/index/_32.vec") [slice=vector-data]
>     at perf.SearchTask.go(SearchTask.java:322)
>     at perf.TaskThreads$TaskThread.run(TaskThreads.java:91)
> Caused by: java.io.EOFException: seek past EOF: MMapIndexInput(path="/home/anton/dev/lucene-bench-home/indices/lucene_bench_2021-01-25_f670131_medium_1thread/index/_32.vec") [slice=vector-data]
>     at org.apache.lucene.store.ByteBufferIndexInput.seek(ByteBufferIndexInput.java:255)
>     at org.apache.lucene.store.ByteBufferIndexInput$MultiBufferImpl.seek(ByteBufferIndexInput.java:575)
>     at org.apache.lucene.codecs.lucene90.Lucene90VectorReader$OffHeapVectorValues.vectorValue(Lucene90VectorReader.java:432)
>     at org.apache.lucene.util.hnsw.HnswGraph.search(HnswGraph.java:118)
>     at org.apache.lucene.codecs.lucene90.Lucene90VectorReader$OffHeapVectorValues.search(Lucene90VectorReader.java:409)
>     at perf.KnnQuery$KnnWeight.scorer(KnnQuery.java:88)
>     at org.apache.lucene.search.Weight.bulkScorer(Weight.java:166)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:743)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:533)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:664)
>     at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:510)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:520)
>     at perf.SearchTask.go(SearchTask.java:263)
>     ... 1 more
> EXC: <vector:knn:<many geografia>[0.02625591,...]>{code}
> I have tried this both on eb24e95731b9f865b95b821c1745264fdc58119, which was head of master/trunk about a week ago, and on f670131cbccf42fdde378ee47f9b01977ebbd147 from [https://github.com/apache/lucene-solr/pull/2239].
> Command was, in the lucene benchmark setup:
> {code:java}
> run: java -server -Xms8g -Xmx8g -XX:+FlightRecorder -XX:StartFlightRecording=name=Default,filename=/home/anton/dev/lucene-bench-home/jfr/lucene_bench_2021-01-31_f670131_medium_1thread_search.jfr,settings=profile -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -classpath /home/anton/dev/lucene-bench-home/trunk/lucene/core/build/libs/lucene-core-9.0.0-SNAPSHOT.jar:/home/anton/dev/lucene-bench-home/trunk/lucene/core/build/classes/java/test:/home/anton/dev/lucene-bench-home/trunk/lucene/sandbox/build/classes/java/main:/home/anton/dev/lucene-bench-home/trunk/lucene/misc/build/classes/java/main:/home/anton/dev/lucene-bench-home/trunk/lucene/facet/build/classes/java/main:/home/anton/dev/lucene-bench-home/trunk/lucene/analysis/common/build/classes/java/main:/home/anton/dev/lucene-bench-home/trunk/lucene/analysis/icu/build/classes/java/main:/home/anton/dev/lucene-bench-home/trunk/lucene/queryparser/build/classes/java/main:/home/anton/dev/lucene-bench-home/trunk/lucene/grouping/build/classes/java/main:/home/anton/dev/lucene-bench-home/trunk/lucene/suggest/build/classes/java/main:/home/anton/dev/lucene-bench-home/trunk/lucene/highlighter/build/classes/java/main:/home/anton/dev/lucene-bench-home/trunk/lucene/codecs/build/classes/java/main:/home/anton/dev/lucene-bench-home/trunk/lucene/queries/build/classes/java/main:/home/anton/.gradle/caches/modules-2/files-2.1/com.carrotsearch/hppc/0.8.2/ccb3ef933ead6b5d766fa571582ddb9b447e48c4/hppc-0.8.2.jar:/home/anton/dev/lucene-bench-home/util/lib/HdrHistogram.jar:/home/anton/dev/lucene-bench-home/util/build perf.SearchPerfTest -dirImpl MMapDirectory -indexPath /home/anton/dev/lucene-bench-home/indices/lucene_bench_2021-01-31_f670131_medium_1thread -facets taxonomy:Date;Date -facets taxonomy:Month;Month -facets taxonomy:DayOfYear;DayOfYear -facets sortedset:Month;Month -facets sortedset:DayOfYear;DayOfYear -analyzer StandardAnalyzerNoStopWords -taskSource /home/anton/dev/lucene-bench-home/util/tasks/wikinightly.tasks -searchThreadCount 2 -taskRepeatCount 1000 -field body -tasksPerCat 5 -staticSeed -8035476 -seed -8826252 -similarity BM25Similarity -commit multi -hiliteImpl FastVectorHighlighter -log /home/anton/dev/lucene-bench-home/logs/jfrtest.jfrtest.0 -topN 10 -printHeap -pk -vectorDict /home/anton/dev/lucene-bench-home/data/glove.6B.100d.txt
> {code}
> Which should be the same as what the nightly benchmarks run.
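
For illustration of the kind of overflow the comment refers to: the thread does not show the faulty expression, so the following is only a minimal sketch, assuming a byte offset computed as vector ordinal times bytes per vector in 32-bit arithmetic. The class and method names are hypothetical and are not Lucene APIs. The 100-dimension float vector size is taken from the glove.6B.100d.txt file named in the command; the ordinal is an arbitrary value chosen so that the product exceeds 2GB.

```java
// Hypothetical sketch, not Lucene code: shows how an int product wraps past 2GB.
class OffsetOverflowSketch {

    // Assumed buggy form: the multiplication happens in int and wraps
    // before the result is widened to long.
    public static long offsetWithIntMath(int ord, int byteSize) {
        return ord * byteSize;
    }

    // Widening one operand first makes the whole multiplication 64-bit.
    public static long offsetWithLongMath(int ord, int byteSize) {
        return (long) ord * byteSize;
    }

    public static void main(String[] args) {
        int byteSize = 100 * Float.BYTES; // 100 floats per vector = 400 bytes
        int ord = 6_000_000;              // arbitrary ordinal; 6,000,000 * 400 > Integer.MAX_VALUE

        System.out.println(offsetWithIntMath(ord, byteSize));  // -1894967296 (wrapped)
        System.out.println(offsetWithLongMath(ord, byteSize)); // 2400000000
    }
}
```

Keeping the offset arithmetic in a small pure function like this is one possible way to exercise values past the 2GB boundary in a unit test without building a large index. It is a narrower check than the mock-offset idea raised in the comment, which would cover the IndexInput path as well.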