[GitHub] [lucene] mikemccand commented on pull request #12415: Optimize disjunction counts.

2023-07-21 Thread via GitHub


mikemccand commented on PR #12415:
URL: https://github.com/apache/lucene/pull/12415#issuecomment-1645462761

   > counting title OR 12 is now 80% faster compared to main.
   
   Wow!  I'll try to review soon.  Thanks @jpountz!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gashutos commented on issue #12448: [Performance] sort query improvement for sequential ordered data [e.g. timestamp field sort in log data]

2023-07-21 Thread via GitHub


gashutos commented on issue #12448:
URL: https://github.com/apache/lucene/issues/12448#issuecomment-1645589833

   @backslasht The overhead of `Index Sort` is very high. I ingested above 36 
million documents with/without @timestamp indexsort and different is minimum 
20% plus. Refer below numbers.
   
   Without Index sort on @timestamp
   ```
   | Min Throughput | 
index-append |   182029 | docs/s |
   |Mean Throughput | 
index-append |   197051 | docs/s |
   |  Median Throughput | 
index-append |   195665 | docs/s |
   | Max Throughput | 
index-append |   210468 | docs/s |
   |50th percentile latency | 
index-append |  165.423 | ms |
   |90th percentile latency | 
index-append |  231.446 | ms |
   |99th percentile latency | 
index-append |  908.578 | ms |
   |  99.9th percentile latency | 
index-append |  8934.86 | ms |
   | 99.99th percentile latency | 
index-append |  10348.3 | ms |
   |   100th percentile latency | 
index-append |  10806.6 | ms |
   ```
   
   With index sort on @timestamp on ascending order.
   ```
   | Min Throughput | 
index-append |   141237 | docs/s |
   |Mean Throughput | 
index-append |   149861 | docs/s |
   |  Median Throughput | 
index-append |   146907 | docs/s |
   | Max Throughput | 
index-append |   167086 | docs/s |
   |50th percentile latency | 
index-append |  210.367 | ms |
   |90th percentile latency | 
index-append |  315.659 | ms |
   |99th percentile latency | 
index-append |  1458.79 | ms |
   |  99.9th percentile latency | 
index-append |  10476.3 | ms |
   | 99.99th percentile latency | 
index-append |  10963.3 | ms |
   |   100th percentile latency | 
index-append |  10994.6 | ms |
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] heemin32 commented on pull request #12287: Fix a bug in ShapeTestUtil

2023-07-21 Thread via GitHub


heemin32 commented on PR #12287:
URL: https://github.com/apache/lucene/pull/12287#issuecomment-1646099677

   > Nice find! I wonder if we could put a test in place that would have 
reliably detected this issue.
   
   This is a code for test actually. If we are writing a test using this 
method, we will detect the issue and that is how I found it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #12436: Move max vector dims limit to Codec

2023-07-21 Thread via GitHub


mayya-sharipova commented on code in PR #12436:
URL: https://github.com/apache/lucene/pull/12436#discussion_r1270997853


##
lucene/core/src/java/org/apache/lucene/index/IndexingChain.java:
##
@@ -621,6 +621,12 @@ private void initializeFieldInfo(PerField pf) throws 
IOException {
   final Sort indexSort = indexWriterConfig.getIndexSort();
   validateIndexSortDVType(indexSort, pf.fieldName, s.docValuesType);
 }
+if (s.vectorDimension != 0) {
+  validateMaxVectorDimension(
+  pf.fieldName,
+  s.vectorDimension,
+  indexWriterConfig.getCodec().knnVectorsFormat().getMaxDimensions());
+}

Review Comment:
   @jpountz Thanks for your feedback, you are completely right, I missed about 
`PerFieldKnnVectorsFormat`.
   
   In 21ebd51dfc5cf34e3741365f1069d0bfc98afb69, I add field name to 
`getMaxDimensions` method.



##
lucene/core/src/java/org/apache/lucene/index/IndexingChain.java:
##
@@ -621,6 +621,12 @@ private void initializeFieldInfo(PerField pf) throws 
IOException {
   final Sort indexSort = indexWriterConfig.getIndexSort();
   validateIndexSortDVType(indexSort, pf.fieldName, s.docValuesType);
 }
+if (s.vectorDimension != 0) {
+  validateMaxVectorDimension(
+  pf.fieldName,
+  s.vectorDimension,
+  indexWriterConfig.getCodec().knnVectorsFormat().getMaxDimensions());
+}

Review Comment:
   @jpountz Thanks for your feedback, you are completely right, I missed about 
`PerFieldKnnVectorsFormat`.
   
   In 21ebd51dfc5cf34e3741365f1069d0bfc98afb69, I've added a  field name to 
`getMaxDimensions` method.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] stefanvodita commented on pull request #12287: Fix a bug in ShapeTestUtil

2023-07-21 Thread via GitHub


stefanvodita commented on PR #12287:
URL: https://github.com/apache/lucene/pull/12287#issuecomment-1646142364

   I should have been clearer. I'm assuming the existing tests that call 
`nextPolygon()` are passing before and after the fix, which warrants some 
suspicion. Could we devise a test that would use these utility methods and fail 
without this fix? It sounds like you were already writing a test that would 
serve this purpose. Let me know if I've got it wrong.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #12436: Move max vector dims limit to Codec

2023-07-21 Thread via GitHub


mayya-sharipova commented on code in PR #12436:
URL: https://github.com/apache/lucene/pull/12436#discussion_r1270997853


##
lucene/core/src/java/org/apache/lucene/index/IndexingChain.java:
##
@@ -621,6 +621,12 @@ private void initializeFieldInfo(PerField pf) throws 
IOException {
   final Sort indexSort = indexWriterConfig.getIndexSort();
   validateIndexSortDVType(indexSort, pf.fieldName, s.docValuesType);
 }
+if (s.vectorDimension != 0) {
+  validateMaxVectorDimension(
+  pf.fieldName,
+  s.vectorDimension,
+  indexWriterConfig.getCodec().knnVectorsFormat().getMaxDimensions());
+}

Review Comment:
   @jpountz Thanks for your feedback, you are completely right, I missed about 
`PerFieldKnnVectorsFormat`.
   
   In 21ebd51dfc5cf34e3741365f1069d0bfc98afb69, I've added a  field name to the 
`getMaxDimensions` method.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] heemin32 commented on pull request #12287: Fix a bug in ShapeTestUtil

2023-07-21 Thread via GitHub


heemin32 commented on PR #12287:
URL: https://github.com/apache/lucene/pull/12287#issuecomment-1646175009

   > I should have been clearer. I'm assuming the existing tests that call 
`nextPolygon()` are passing before and after the fix, which warrants some 
suspicion. Could we devise a test that would use these utility methods and fail 
without this fix? It sounds like you were already writing a test that would 
serve this purpose. Let me know if I've got it wrong.
   
   Got your point. Actually, I am using this utility methods in other 
dependency repos, OpenSearch. There, the test fails without this fix. Let me 
see if I can add a test using this utility method in lucene.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jbellis commented on pull request #12421: Concurrent hnsw graph and builder, take two

2023-07-21 Thread via GitHub


jbellis commented on PR #12421:
URL: https://github.com/apache/lucene/pull/12421#issuecomment-1646504742

   Parallel code with one thread will build the same graph as serial code.
   (This is validated by unit tests.)  With multiple threads, parallel code
   errs on the side of potentially adding "extra" connections.  So recall
   should be equal or greater and that is what I see with the 1M SIFT dataset.
   
   Serial code: 92.0% recall over 5 runs
   Parallel code built with 8 threads: 92.4% over 5 runs
   
   On Mon, Jul 10, 2023 at 3:54 PM Mayya Sharipova ***@***.***>
   wrote:
   
   > Great work!
   > Have you compared the recall of the parallel graph with the serially built
   > graph (for example using ann-benchmarks)?
   >
   > —
   > Reply to this email directly, view it on GitHub
   > , or
   > unsubscribe
   > 

   > .
   > You are receiving this because you authored the thread.Message ID:
   > ***@***.***>
   >
   
   
   -- 
   Jonathan Ellis
   co-founder, http://www.datastax.com
   @spyced
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org