[GitHub] [lucene] zf853109035 opened a new issue, #12441: IndexSearcher.doc(int docID, Set fieldsToLoad) method is so slow?

2023-07-15 Thread via GitHub


zf853109035 opened a new issue, #12441:
URL: https://github.com/apache/lucene/issues/12441

   ### Description
   
   I created a file-related index containing ten 1 MB files. When I did not store the 
file content, running IndexSearcher.doc(int docID, Set fieldsToLoad) ten times 
took about 30 ms. When I stored the file content, the same ten calls took about 
150 ms to 200 ms, even when fieldsToLoad did not contain the content field. 
How can I reduce this latency? Why is it slow when fieldsToLoad does not contain 
the content field?
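The question of why leaving the content field out of fieldsToLoad does not help can be explored with a toy model. This is an assumption of the sketch, not Lucene's actual stored-fields format (which has its own chunking and compression modes): several documents are serialized into one block and compressed together with `java.util.zip.Deflater`. The class `ToyStoredFieldsBlock` and its methods are hypothetical. Reading only the small title of one document still requires inflating the whole block, so the cost grows with the stored content size regardless of which fields are requested.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Toy model only, NOT Lucene's stored-fields format: every document is written as
// [titleLength][title][contentLength][content], and all documents of the block are
// compressed together with a single Deflater.
final class ToyStoredFieldsBlock {
  private final byte[] compressed;
  private final int uncompressedLength;
  private final int[] docOffsets; // where each document starts in the uncompressed block
  private long bytesInflated = 0; // total bytes decompressed so far, to make the cost visible

  ToyStoredFieldsBlock(String[] titles, byte[][] contents) {
    ByteArrayOutputStream raw = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(raw);
    docOffsets = new int[titles.length];
    try {
      for (int doc = 0; doc < titles.length; doc++) {
        docOffsets[doc] = out.size();
        byte[] title = titles[doc].getBytes(StandardCharsets.UTF_8);
        out.writeInt(title.length);
        out.write(title);
        out.writeInt(contents[doc].length);
        out.write(contents[doc]);
      }
      out.flush();
    } catch (IOException e) {
      throw new UncheckedIOException(e); // cannot happen with an in-memory stream
    }
    byte[] block = raw.toByteArray();
    uncompressedLength = block.length;

    // Compress the whole block as one unit.
    Deflater deflater = new Deflater(Deflater.BEST_SPEED);
    deflater.setInput(block);
    deflater.finish();
    ByteArrayOutputStream packed = new ByteArrayOutputStream();
    byte[] buffer = new byte[8192];
    while (!deflater.finished()) {
      int n = deflater.deflate(buffer);
      packed.write(buffer, 0, n);
    }
    deflater.end();
    compressed = packed.toByteArray();
  }

  /** Returns only the title of docID, yet has to inflate the entire block first. */
  String loadTitle(int docID) {
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    byte[] block = new byte[uncompressedLength];
    int n = 0;
    try {
      while (n < block.length && !inflater.finished()) {
        n += inflater.inflate(block, n, block.length - n);
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
    bytesInflated += n;
    // Jump to the requested document and read just its title.
    ByteBuffer in = ByteBuffer.wrap(block, docOffsets[docID], block.length - docOffsets[docID]);
    byte[] title = new byte[in.getInt()];
    in.get(title);
    return new String(title, StandardCharsets.UTF_8);
  }

  long bytesInflated() {
    return bytesInflated;
  }
}
```

In this model, calling `loadTitle(1)` on a block of three documents whose contents are 100,000 bytes each returns the one-character title but decompresses more than 300,000 bytes along the way.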


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mkhludnev commented on issue #12441: IndexSearcher.doc(int docID, Set fieldsToLoad) method is so slow?

2023-07-15 Thread via GitHub


mkhludnev commented on issue #12441:
URL: https://github.com/apache/lucene/issues/12441#issuecomment-1636702962

   This is by design: the whole block of stored documents has to be decompressed 
and iterated through, even if only some fields are requested. Doc values (e.g. 
binary doc values) might provide the kind of per-field selectivity you need.
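The selectivity doc values can offer comes from storing each field in its own column instead of interleaving all fields of a document in one compressed block. The toy sketch below illustrates only that column-per-field idea; it is not Lucene's doc-values encoding, and `ToyColumnStore` with its methods is hypothetical. Reading the title column for one document never touches the bytes of the content column.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy column store: one list of values per field, indexed by docID.
// Illustrative only, not how Lucene encodes doc values on disk.
final class ToyColumnStore {
  private final Map<String, List<byte[]>> columns = new HashMap<>();
  private long bytesRead = 0; // bytes touched by get(), to make the cost visible

  /** Appends a value to the field's column; docIDs are assigned in append order. */
  void append(String field, byte[] value) {
    columns.computeIfAbsent(field, f -> new ArrayList<>()).add(value);
  }

  /** Reads one field of one document; other columns are never read. */
  byte[] get(String field, int docID) {
    byte[] value = columns.get(field).get(docID);
    bytesRead += value.length;
    return value;
  }

  long bytesRead() {
    return bytesRead;
  }
}
```

In Lucene itself, the corresponding field type would be something like `BinaryDocValuesField`, read per segment through `LeafReader.getBinaryDocValues`; whether that is fast enough for a given workload is something to measure rather than assume.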





[GitHub] [lucene] mkhludnev closed issue #12441: IndexSearcher.doc(int docID, Set fieldsToLoad) method is so slow?

2023-07-15 Thread via GitHub


mkhludnev closed issue #12441: IndexSearcher.doc(int docID, Set 
fieldsToLoad) method is so slow?
URL: https://github.com/apache/lucene/issues/12441





[GitHub] [lucene] ChrisHegarty commented on pull request #12417: forutil add vectorized and scalar code

2023-07-15 Thread via GitHub


ChrisHegarty commented on PR #12417:
URL: https://github.com/apache/lucene/pull/12417#issuecomment-1636713748

   Apologies for my tardy and terse interaction here. I've been otherwise 
preoccupied. I hope to spend time on this soon.





[GitHub] [lucene] uschindler commented on pull request #12417: forutil add vectorized and scalar code

2023-07-15 Thread via GitHub


uschindler commented on PR #12417:
URL: https://github.com/apache/lucene/pull/12417#issuecomment-1636714292

   > Note that these benchmarks were running with jdk19 (not 20), so it's 
possible we'd see something different with 20?
   
   Lucene enables and compiles the vectorized code only for JDK 20 and 21; on 
JDK 19 it is not enabled.
   
   Be sure to also show the message that `VectorizationProvider` logs on startup!
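The rule stated above (vectorized code enabled only on JDK 20 and 21) can be expressed as a small version check. This is a hypothetical sketch for illustration, not the actual logic of Lucene's `VectorizationProvider`, which, as far as we understand, also takes other conditions into account, such as whether the `jdk.incubator.vector` module is available. The class name and constants are assumptions.

```java
// Hypothetical gate mirroring the "JDK 20 and 21 only" rule; not Lucene code.
final class ToyVectorizationGate {
  // Assumed feature-release range for which vectorized code would be compiled.
  static final int MIN_FEATURE = 20;
  static final int MAX_FEATURE = 21;

  /** True if a vectorized code path would be eligible on this feature release. */
  static boolean isEligible(int featureVersion) {
    return featureVersion >= MIN_FEATURE && featureVersion <= MAX_FEATURE;
  }

  /** Checks the running JVM, where feature() is e.g. 19 for JDK 19. */
  static boolean isEligibleOnCurrentJvm() {
    return isEligible(Runtime.version().feature());
  }
}
```

Logging `Runtime.version()` next to such a check in benchmark output makes it obvious which path a run could have taken, which is the same motivation as reporting the startup message.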





[GitHub] [lucene] stefanvodita opened a new pull request, #12442: Assert IdxOrDvQuery subqueries and document useful fields

2023-07-15 Thread via GitHub


stefanvodita opened a new pull request, #12442:
URL: https://github.com/apache/lucene/pull/12442

   This is a follow-up to #12426. We introduce assertions in 
`TestIndexOrDocValuesQuery` that the two wrapped queries behave the same way, 
and we document fields that produce both indexed structures and doc values, 
which are good candidates for use with `IndexOrDocValuesQuery`.
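The kind of assertion described above, checking that an index-based and a doc-values-based evaluation of the same query match the same documents, can be illustrated with a toy range filter evaluated two ways: once through a sorted (value, docID) list searched with binary search, and once by scanning each document's value. This is a stdlib-only sketch; `ToyRangeEquivalence` and its methods are hypothetical and it does not use Lucene's query or test-framework APIs.

```java
import java.util.Arrays;
import java.util.BitSet;

// Toy equivalence check between an "index" path and a "doc values" path.
final class ToyRangeEquivalence {
  /** Doc-values style: check each document's value one by one. */
  static BitSet scanDocValues(long[] values, long lo, long hi) {
    BitSet matches = new BitSet(values.length);
    for (int doc = 0; doc < values.length; doc++) {
      if (values[doc] >= lo && values[doc] <= hi) matches.set(doc);
    }
    return matches;
  }

  /** Index style: build a sorted (value, docID) list, then binary-search the range start. */
  static BitSet searchIndex(long[] values, long lo, long hi) {
    long[][] pairs = new long[values.length][];
    for (int doc = 0; doc < values.length; doc++) pairs[doc] = new long[] {values[doc], doc};
    Arrays.sort(pairs, (a, b) -> Long.compare(a[0], b[0]));
    BitSet matches = new BitSet(values.length);
    for (int i = lowerBound(pairs, lo); i < pairs.length && pairs[i][0] <= hi; i++) {
      matches.set((int) pairs[i][1]);
    }
    return matches;
  }

  /** First position whose value is >= key. */
  private static int lowerBound(long[][] pairs, long key) {
    int lo = 0, hi = pairs.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (pairs[mid][0] < key) lo = mid + 1; else hi = mid;
    }
    return lo;
  }

  /** The assertion: both strategies must agree on the matching documents. */
  static void assertSameMatches(long[] values, long lo, long hi) {
    BitSet index = searchIndex(values, lo, hi);
    BitSet docValues = scanDocValues(values, lo, hi);
    if (!index.equals(docValues)) {
      throw new AssertionError("index " + index + " != doc values " + docValues);
    }
  }
}
```

For values `{5, 1, 9, 3, 7, 3}` and the range [3, 7], both paths match documents `{0, 3, 4, 5}`.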





[GitHub] [lucene] stefanvodita commented on a diff in pull request #12442: Assert IdxOrDvQuery subqueries and document useful fields

2023-07-15 Thread via GitHub


stefanvodita commented on code in PR #12442:
URL: https://github.com/apache/lucene/pull/12442#discussion_r1264364625


##
lucene/test-framework/src/java/org/apache/lucene/tests/search/QueryUtils.java:
##
@@ -675,7 +675,14 @@ public static void checkBulkScorerSkipTo(Random r, Query query, IndexSearcher se
 query = searcher.rewrite(query);
 Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE, 1);
 for (LeafReaderContext context : searcher.getIndexReader().leaves()) {
-  final Scorer scorer = weight.scorer(context);
+  final Scorer scorer;
+  if (weight.scorerSupplier(context) != null) {
+// For IndexOrDocValuesQuey, the bulk scorer will use the indexed structure query
+// and the scorer with a lead cost of 0 will use the doc values query.
+scorer = weight.scorerSupplier(context).get(0);

Review Comment:
   I had some doubts about whether we should use a lead cost of 0 across the 
board, but no tests seem to rely on it.
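The code comment in the diff says that a lead cost of 0 makes the scorer use the doc values query. A minimal, hypothetical model of such a cost-based choice is sketched below: the index path is preferred unless its cost is large relative to the lead cost. The class name, the enum, and the penalty factor of 8 are assumptions of this sketch, not a description of `IndexOrDocValuesQuery` internals.

```java
// Hypothetical cost-based selection between an index path and a doc-values path.
final class ToyIndexOrDocValuesChoice {
  enum Strategy { INDEX, DOC_VALUES }

  // Assumed penalty: doc values must check every candidate document, so the
  // index is preferred unless it is much more expensive than the lead clause.
  static final long DOC_VALUES_PENALTY = 8;

  /**
   * @param indexCost estimated number of docs the index structure would visit
   * @param leadCost estimated number of docs the leading clause will iterate
   */
  static Strategy choose(long indexCost, long leadCost) {
    long threshold = indexCost / DOC_VALUES_PENALTY;
    return threshold <= leadCost ? Strategy.INDEX : Strategy.DOC_VALUES;
  }
}
```

In this toy model, a lead cost of 0 selects doc values only when `indexCost` is at least 8; for very cheap index queries the index path is still chosen, so a test relying on a lead cost of 0 would need data that makes the index query non-trivial.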






[GitHub] [lucene] stefanvodita commented on pull request #12426: Introduce VerifyingQuery

2023-07-15 Thread via GitHub


stefanvodita commented on PR #12426:
URL: https://github.com/apache/lucene/pull/12426#issuecomment-1636716527

   Thank you for the suggestions for `IndexOrDocValuesQuery`! I’ve opened a 
separate [PR](https://github.com/apache/lucene/pull/12442) to address them. Let 
me know if it matches what you had in mind.





[GitHub] [lucene] rmuir commented on pull request #12417: forutil add vectorized and scalar code

2023-07-15 Thread via GitHub


rmuir commented on PR #12417:
URL: https://github.com/apache/lucene/pull/12417#issuecomment-1636842807

   Please, let's not use this integer vectorization when `hasFastIntegerVectors` 
is false. Otherwise we can see a 30x or so slowdown on virtual machines without 
properly plumbed AVX.
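The request above amounts to gating the vectorized path on a capability flag and falling back to scalar code otherwise. A minimal sketch of that dispatch pattern follows, assuming the flag is simply passed in as a boolean. The `IntSummer` interface and its implementations are hypothetical, and the "vectorized" variant is only a manually unrolled loop standing in for Vector API code, so the example runs on any JDK.

```java
// Hypothetical dispatch: choose the fast path only when the platform supports it.
interface IntSummer {
  long sum(int[] values);
}

/** Plain scalar loop: safe on every platform. */
final class ScalarIntSummer implements IntSummer {
  public long sum(int[] values) {
    long total = 0;
    for (int v : values) total += v;
    return total;
  }
}

/** Stand-in for a vectorized implementation (4-way unrolled, no Vector API). */
final class UnrolledIntSummer implements IntSummer {
  public long sum(int[] values) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i = 0;
    for (; i + 3 < values.length; i += 4) {
      s0 += values[i];
      s1 += values[i + 1];
      s2 += values[i + 2];
      s3 += values[i + 3];
    }
    for (; i < values.length; i++) s0 += values[i]; // tail elements
    return s0 + s1 + s2 + s3;
  }
}

final class IntSummers {
  /** Use the "vectorized" path only when fast integer vectors are reported. */
  static IntSummer select(boolean hasFastIntegerVectors) {
    return hasFastIntegerVectors ? new UnrolledIntSummer() : new ScalarIntSummer();
  }
}
```

Both implementations must return identical results, so the flag only affects speed; this is what makes it safe to fall back to scalar code on machines where the vector path would be slow.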


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org