[GitHub] [lucene] gf2121 commented on pull request #541: LUCENE-10315: Speed up BKD leaf block ids codec by a 512 ints ForUtil

2022-01-21 Thread GitBox


gf2121 commented on pull request #541:
URL: https://github.com/apache/lucene/pull/541#issuecomment-1018339065


   Hi @iverase! Sorry to disturb you again, but I cannot reproduce the error with 
`IndexAndSearchShapes` in luceneutil either. (I ran the script with the params 
`-polyRussia -intersects -reindex`.)
   
   Could you tell me which params you were using and post the latest script 
code here? Thanks a lot!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] comdotwang162 commented on a change in pull request #601: LUCENE-10375: Write merged vectors to file before building graph

2022-01-21 Thread GitBox


comdotwang162 commented on a change in pull request #601:
URL: https://github.com/apache/lucene/pull/601#discussion_r789345508



##
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsWriter.java
##
@@ -110,26 +113,17 @@
   @Override
   public void writeField(FieldInfo fieldInfo, KnnVectorsReader knnVectorsReader)
       throws IOException {
+    writeVectorDataPadding();
+    long vectorDataOffset = vectorData.getFilePointer();
+
     VectorValues vectors = knnVectorsReader.getVectorValues(fieldInfo.name);
-    long pos = vectorData.getFilePointer();
-    // write floats aligned at 4 bytes. This will not survive CFS, but it shows a small benefit when
-    // CFS is not used, eg for larger indexes
-    long padding = (4 - (pos & 0x3)) & 0x3;
-    long vectorDataOffset = pos + padding;
-    for (int i = 0; i < padding; i++) {
-      vectorData.writeByte((byte) 0);
-    }
     // TODO - use a better data structure; a bitset? DocsWithFieldSet is p.p. in o.a.l.index
-    int[] docIds = new int[vectors.size()];
-    int count = 0;
-    for (int docV = vectors.nextDoc(); docV != NO_MORE_DOCS; docV = vectors.nextDoc(), count++) {
-      // write vector
-      writeVectorValue(vectors);
-      docIds[count] = docV;
-    }
+    int[] docIds = writeVectorData(vectorData, vectors);

Review comment:
   Do we really need docIds?







[GitHub] [lucene] msokolov commented on a change in pull request #601: LUCENE-10375: Write merged vectors to file before building graph

2022-01-21 Thread GitBox


msokolov commented on a change in pull request #601:
URL: https://github.com/apache/lucene/pull/601#discussion_r789638938



##
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsWriter.java
##
@@ -110,26 +113,17 @@
   @Override
   public void writeField(FieldInfo fieldInfo, KnnVectorsReader knnVectorsReader)
       throws IOException {
+    writeVectorDataPadding();
+    long vectorDataOffset = vectorData.getFilePointer();
+
     VectorValues vectors = knnVectorsReader.getVectorValues(fieldInfo.name);
-    long pos = vectorData.getFilePointer();
-    // write floats aligned at 4 bytes. This will not survive CFS, but it shows a small benefit when
-    // CFS is not used, eg for larger indexes
-    long padding = (4 - (pos & 0x3)) & 0x3;
-    long vectorDataOffset = pos + padding;
-    for (int i = 0; i < padding; i++) {
-      vectorData.writeByte((byte) 0);
-    }
     // TODO - use a better data structure; a bitset? DocsWithFieldSet is p.p. in o.a.l.index
-    int[] docIds = new int[vectors.size()];
-    int count = 0;
-    for (int docV = vectors.nextDoc(); docV != NO_MORE_DOCS; docV = vectors.nextDoc(), count++) {
-      // write vector
-      writeVectorValue(vectors);
-      docIds[count] = docV;
-    }
+    int[] docIds = writeVectorData(vectorData, vectors);

Review comment:
   We need to know which documents have a value in case the data is sparse 
(not populated for every doc). We could probably use a bitset instead.
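
   To illustrate the bitset idea mentioned above, here is a minimal, 
self-contained sketch. It uses `java.util.BitSet` rather than any Lucene class, 
and the class and method names (`SparseDocsSketch`, `docsWithValue`, 
`toDocIds`) are hypothetical and not part of the PR. The assumption is that doc 
IDs arrive in increasing order and are smaller than `maxDoc`.

```java
import java.util.Arrays;
import java.util.BitSet;

public class SparseDocsSketch {

  /** Marks every doc ID that has a vector. Assumes each doc ID is < maxDoc. */
  public static BitSet docsWithValue(int[] docsInOrder, int maxDoc) {
    BitSet bits = new BitSet(maxDoc);
    for (int doc : docsInOrder) {
      bits.set(doc);
    }
    return bits;
  }

  /** Recovers the ordinal -> doc ID mapping by walking the set bits in order. */
  public static int[] toDocIds(BitSet bits) {
    int[] docIds = new int[bits.cardinality()];
    int ord = 0;
    for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
      docIds[ord++] = doc;
    }
    return docIds;
  }

  public static void main(String[] args) {
    // Sparse field: only docs 1, 4 and 7 out of 10 carry a vector.
    BitSet bits = docsWithValue(new int[] {1, 4, 7}, 10);
    System.out.println(Arrays.toString(toDocIds(bits)));
  }
}
```

   The trade-off in this sketch: an `int[]` costs 4 bytes per document that 
has a value, while a bitset costs roughly `maxDoc / 8` bytes regardless of 
sparsity, so the bitset is smaller once more than about 1 in 32 documents has 
a value.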







[GitHub] [lucene] iverase commented on pull request #541: LUCENE-10315: Speed up BKD leaf block ids codec by a 512 ints ForUtil

2022-01-21 Thread GitBox


iverase commented on pull request #541:
URL: https://github.com/apache/lucene/pull/541#issuecomment-1018492442


   Have you used the data from here: http://home.apache.org/~ivera/osmdata.wkt.gz?





[GitHub] [lucene] gf2121 commented on pull request #541: LUCENE-10315: Speed up BKD leaf block ids codec by a 512 ints ForUtil

2022-01-21 Thread GitBox


gf2121 commented on pull request #541:
URL: https://github.com/apache/lucene/pull/541#issuecomment-1018673077


   @iverase Yes, I've put the file under `DATA_LOCATION`.
   
   ```
   ➜  points ls -lh
   total 10971488
   -rw-r--r--@ 1 gf  staff    23M 12 15 13:36 cleveland.poly.txt.gz
   -rw-r--r--  1 gf  staff   1.9G 12 15 13:42 latlon.subsetPlusAllLondon.txt
   -rw-r--r--@ 1 gf  staff   938K 12 15 13:36 london.boroughs.poly.txt.gz
   -rw-r--r--  1 gf  staff   3.3G  1 21 00:36 osmdata.wkt
   -rw-r--r--@ 1 gf  staff    62K 12 15 13:36 russia.poly.txt.gz
   ```





[jira] [Commented] (LUCENE-10050) Remove DrillSideways#search(DrillDownQuery,Collector) in favor of DrillSideways#search(DrillDownQuery,CollectorManager)

2022-01-21 Thread Gautam Worah (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480286#comment-17480286
 ] 

Gautam Worah commented on LUCENE-10050:
---

I'm working on this issue right now. A PR will be ready soon.

> Remove DrillSideways#search(DrillDownQuery,Collector) in favor of 
> DrillSideways#search(DrillDownQuery,CollectorManager)
> ---
>
> Key: LUCENE-10050
> URL: https://issues.apache.org/jira/browse/LUCENE-10050
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Greg Miller
> Priority: Minor
>
> With similar motivation to LUCENE-10002, we should consider doing away with 
> the ability to directly provide a Collector to DrillSideways in favor of 
> always accepting a CollectorManager. Just like with IndexSearcher, it's 
> trappy that you can provide an Executor when setting up DrillSideways and 
> then not leverage it by directly providing a single Collector.
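
The trap described in the issue can be sketched with toy types. This is not 
Lucene code: `ToyCollector`, `ToyCollectorManager`, `search` and `countMatches` 
are hypothetical names that only mirror the general shape of the 
collector-per-slice pattern. A manager can hand each slice its own collector 
and merge the results afterwards, which is what lets an executor do useful 
work; a single shared collector gives the executor nothing to parallelize.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CollectorManagerSketch {

  /** Toy stand-in for a collector; NOT Lucene's Collector interface. */
  interface ToyCollector {
    void collect(int doc);
  }

  /** Toy stand-in for a manager: one collector per slice, merged at the end. */
  interface ToyCollectorManager<C extends ToyCollector, T> {
    C newCollector();

    T reduce(List<C> collectors);
  }

  static final class CountingCollector implements ToyCollector {
    int count; // only written by the task that owns this collector

    @Override
    public void collect(int doc) {
      count++;
    }
  }

  /** Runs each slice on the executor with its own collector, then reduces. */
  static <C extends ToyCollector, T> T search(
      List<int[]> slices, ToyCollectorManager<C, T> manager, ExecutorService executor)
      throws Exception {
    List<C> collectors = new ArrayList<>();
    List<Future<?>> futures = new ArrayList<>();
    for (int[] slice : slices) {
      C collector = manager.newCollector();
      collectors.add(collector);
      futures.add(executor.submit(() -> {
        for (int doc : slice) {
          collector.collect(doc);
        }
      }));
    }
    for (Future<?> future : futures) {
      future.get(); // wait for the slice; its writes are visible afterwards
    }
    return manager.reduce(collectors);
  }

  /** Counts all docs across the slices using a pool of the given size. */
  public static int countMatches(List<int[]> slices, int threads) {
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    try {
      return search(slices, new ToyCollectorManager<CountingCollector, Integer>() {
        @Override
        public CountingCollector newCollector() {
          return new CountingCollector();
        }

        @Override
        public Integer reduce(List<CountingCollector> collectors) {
          int total = 0;
          for (CountingCollector c : collectors) {
            total += c.count;
          }
          return total;
        }
      }, executor);
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(countMatches(List.of(new int[] {0, 1, 2}, new int[] {3, 4}), 2));
  }
}
```

Because each task owns its collector, no synchronization is needed inside 
`collect`; the only coordination point in this sketch is the final `reduce`.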



--
This message was sent by Atlassian Jira
(v8.20.1#820001)




[GitHub] [lucene] jtibshirani commented on pull request #617: LUCENE-10375: Write vectors to file in flush

2022-01-21 Thread GitBox


jtibshirani commented on pull request #617:
URL: https://github.com/apache/lucene/pull/617#issuecomment-1018943960


   Ah right, that makes sense. Somehow I thought there'd be significant 
overhead from decoding vectors from the on-disk format, but I guess that's not 
true.
   
   Anyways, thanks for taking a look. I plan to merge in the next day if there 
aren't more comments.





[GitHub] [lucene] jpountz commented on a change in pull request #617: LUCENE-10375: Write vectors to file in flush

2022-01-21 Thread GitBox


jpountz commented on a change in pull request #617:
URL: https://github.com/apache/lucene/pull/617#discussion_r790115140



##
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsWriter.java
##
@@ -114,79 +113,15 @@
   public void writeField(FieldInfo fieldInfo, KnnVectorsReader knnVectorsReader)
       throws IOException {
     long vectorDataOffset = vectorData.alignFilePointer(Float.BYTES);
-
     VectorValues vectors = knnVectorsReader.getVectorValues(fieldInfo.name);
-    // TODO - use a better data structure; a bitset? DocsWithFieldSet is p.p. in o.a.l.index

Review comment:
   nit: can you retain that TODO?



