Re: [PR] Reduce dotProductBody to less than the maximum bytecode size of a hot method to be inlined (325) [lucene]

2024-12-02 Thread via GitHub
rmuir commented on PR #14031: URL: https://github.com/apache/lucene/pull/14031#issuecomment-2513295791 I applied and tested the same approach with the other 2 functions too. cosine was already underweight: it is only unrolled twice due to complexity of the mathematical formula, but it keeps

Re: [PR] Terminate automaton when it can match all suffixes, and match suffixes directly. [lucene]

2024-12-02 Thread via GitHub
github-actions[bot] commented on PR #13072: URL: https://github.com/apache/lucene/pull/13072#issuecomment-2513257967 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Avoid reload block when seeking backward in SegmentTermsEnum. [lucene]

2024-12-02 Thread via GitHub
github-actions[bot] commented on PR #13253: URL: https://github.com/apache/lucene/pull/13253#issuecomment-2513256091 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Parse escaped brackets and spaces in range queries [lucene]

2024-12-02 Thread via GitHub
github-actions[bot] commented on PR #13887: URL: https://github.com/apache/lucene/pull/13887#issuecomment-2513255223 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-12-02 Thread via GitHub
github-actions[bot] commented on PR #13984: URL: https://github.com/apache/lucene/pull/13984#issuecomment-2513254955 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Reduce dotProductBody to less than the maximum bytecode size of a hot method to be inlined (325) [lucene]

2024-12-02 Thread via GitHub
ChrisHegarty commented on PR #14031: URL: https://github.com/apache/lucene/pull/14031#issuecomment-2513179696 @rmuir nice!!! wanna push that to the branch? Then I’ll do some more benchmark runs tomorrow too. -- This is an automated message from the Apache Git Service. To respond to the m

Re: [PR] Speed up PostingsEnum when reading positions. [lucene]

2024-12-02 Thread via GitHub
jpountz merged PR #14032: URL: https://github.com/apache/lucene/pull/14032

Re: [PR] LUCENE-4056: Japanese Tokenizer (Kuromoji) cannot build UniDic dictionary [lucene]

2024-12-02 Thread via GitHub
azagniotov commented on PR #12517: URL: https://github.com/apache/lucene/pull/12517#issuecomment-2513057933 @mocobeta @johtani Hello! I wanted to touch base in the context of the current PR. I am observing an interesting issue when creating a tokenizer using: - Setting `discardCompou

Re: [PR] Reduce dotProductBody to less than the maximum bytecode size of a hot method to be inlined (325) [lucene]

2024-12-02 Thread via GitHub
rmuir commented on PR #14031: URL: https://github.com/apache/lucene/pull/14031#issuecomment-2512993292 We can iterate on the last patch and save a few more bytes (302b) if we just pull it out into a static final constant instead, too: ``` --- a/lucene/core/src/java21/org/apache/lucene/in

Re: [PR] Speed up PostingsEnum when reading positions. [lucene]

2024-12-02 Thread via GitHub
jpountz commented on PR #14032: URL: https://github.com/apache/lucene/pull/14032#issuecomment-2512979187
Task | QPS baseline | StdDev | QPS my_modified_version | StdDev | Pct diff | p-value
IntNRQ | 110.78 | (11.4%)

[PR] Speed up PostingsEnum when reading positions. [lucene]

2024-12-02 Thread via GitHub
jpountz opened a new pull request, #14032: URL: https://github.com/apache/lucene/pull/14032 This PR changes the following: - As much work as possible is moved from `nextDoc()`/`advance()` to `nextPosition()`. This helps pay the overhead of reading positions only when all query terms agr
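The PR description is cut off above, but the general idea of deferring position bookkeeping from `nextDoc()`/`advance()` to `nextPosition()` can be sketched with a toy iterator. Everything in it is hypothetical: `LazyPositions`, its flat `posStream` layout and the `skipOps` counter are illustrative names, not Lucene's `PostingsEnum` API or on-disk format. Moving to the next document only records how many positions are still unread; the actual skip happens once, inside `nextPosition()`, so documents rejected before their positions are needed cost no position work.

```java
/**
 * Hypothetical toy iterator (NOT Lucene's PostingsEnum) showing deferred
 * position handling: positions of all docs sit back to back in one flat
 * stream, and skipping past unread positions happens in nextPosition().
 */
final class LazyPositions {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs;      // sorted doc IDs
    private final int[] freqs;     // freqs[i] = number of positions of docs[i]
    private final int[] posStream; // all positions, doc after doc
    private int docIdx = -1;       // index of the current doc
    private int posPointer = 0;    // next read offset into posStream
    private int pendingSkip = 0;   // unread positions of earlier docs
    private int posLeftInDoc = 0;  // unread positions of the current doc
    int skipOps = 0;               // number of deferred skips actually performed

    LazyPositions(int[] docs, int[] freqs, int[] posStream) {
        this.docs = docs;
        this.freqs = freqs;
        this.posStream = posStream;
    }

    int nextDoc() {
        // Cheap bookkeeping only: whatever the caller did not read becomes pending.
        pendingSkip += posLeftInDoc;
        docIdx++;
        if (docIdx >= docs.length) {
            docIdx = docs.length; // stay exhausted
            posLeftInDoc = 0;
            return NO_MORE_DOCS;
        }
        posLeftInDoc = freqs[docIdx];
        return docs[docIdx];
    }

    int advance(int target) {
        int doc;
        do {
            doc = nextDoc();
        } while (doc < target);
        return doc;
    }

    int freq() {
        return freqs[docIdx];
    }

    int nextPosition() {
        if (posLeftInDoc == 0) {
            throw new IllegalStateException("no positions left for current doc");
        }
        // Pay for skipped documents only now, once, when positions are really needed.
        if (pendingSkip > 0) {
            posPointer += pendingSkip;
            pendingSkip = 0;
            skipOps++;
        }
        posLeftInDoc--;
        return posStream[posPointer++];
    }
}
```

For example, with docs `{1, 3, 7}`, `advance(7)` steps over two documents without touching `posStream`; the first `nextPosition()` then jumps over their three positions in a single step.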

Re: [PR] Grammar and typo fixes [lucene]

2024-12-02 Thread via GitHub
msokolov commented on code in PR #14019: URL: https://github.com/apache/lucene/pull/14019#discussion_r1865909044 ## lucene/core/src/java/org/apache/lucene/index/IndexCommit.java: ## @@ -25,9 +25,9 @@ * Expert: represents a single commit into an index as seen by the {@link Ind

Re: [PR] Reduce dotProductBody to less than the maximum bytecode size of a hot method to be inlined (325) [lucene]

2024-12-02 Thread via GitHub
rmuir commented on PR #14031: URL: https://github.com/apache/lucene/pull/14031#issuecomment-2512878266 Good here too. We can also save another 5 bytes with something like this. It seems to help me a tiny bit according to JMH too. Not sure if it makes the code harder or easier to r

Re: [PR] Grammar and typo fixes [lucene]

2024-12-02 Thread via GitHub
msokolov commented on code in PR #14019: URL: https://github.com/apache/lucene/pull/14019#discussion_r1865914609 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -141,7 +141,7 @@ * soon as a new commit is done. Creating your own policy can allow you to

[PR] Reduce dotProductBody to less than the maximum bytecode size of a hot method to be inlined (325) [lucene]

2024-12-02 Thread via GitHub
ChrisHegarty opened a new pull request, #14031: URL: https://github.com/apache/lucene/pull/14031 This commit reduces the Panama `dotProductBody` implementation to less than the maximum bytecode size of a hot method to be inlined (325). Previously: ` org.apache.lucene.internal.vectori
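The 325 in the title appears to match HotSpot's default `-XX:FreqInlineSize` on common platforms: the largest bytecode size of a frequently executed method that the JIT will inline. Running with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` reports methods over that limit as "hot method too big", and `javap -c` shows a method's bytecode offsets. The Panama `dotProductBody` uses the Vector API; the sketch below is only a hypothetical scalar analog (`DotProductSketch` is an illustrative name, not Lucene code) showing why unrolling grows a method: every extra accumulator adds more bytecode.

```java
/**
 * Hypothetical scalar analog of an unrolled dot-product loop. The real
 * Panama code uses the Vector API; this only shows the loop structure.
 */
final class DotProductSketch {
    static float dotProduct(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
        // Four accumulators: the four multiply-adds in one iteration are
        // independent of each other, so they can overlap in the CPU pipeline,
        // at the cost of a larger method body.
        float acc0 = 0f, acc1 = 0f, acc2 = 0f, acc3 = 0f;
        int upperBound = a.length & ~3; // largest multiple of 4 <= length
        int i = 0;
        for (; i < upperBound; i += 4) {
            acc0 += a[i] * b[i];
            acc1 += a[i + 1] * b[i + 1];
            acc2 += a[i + 2] * b[i + 2];
            acc3 += a[i + 3] * b[i + 3];
        }
        float sum = (acc0 + acc1) + (acc2 + acc3);
        // Scalar tail for the remaining 0..3 elements.
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

This size-versus-parallelism trade-off is the same one discussed in the thread: cosine is only unrolled twice because its formula is more complex, and `dotProductBody` is trimmed to stay under the inlining limit.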

Re: [I] Fuzzy phrase query incorrectly transformed to term query (without fuzziness) [lucene]

2024-12-02 Thread via GitHub
mfolnovic closed issue #14030: Fuzzy phrase query incorrectly transformed to term query (without fuzziness) URL: https://github.com/apache/lucene/issues/14030

Re: [I] Fuzzy phrase query incorrectly transformed to term query (without fuzziness) [lucene]

2024-12-02 Thread via GitHub
mfolnovic commented on issue #14030: URL: https://github.com/apache/lucene/issues/14030#issuecomment-251127 Sorry, I misunderstood what `A:12345~1` (Levenshtein distance from "12345") vs. `A:"12345"~1` (number of words separating matched words) means.
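In the classic query syntax, `~N` after a single term requests a fuzzy match within N edits, while `~N` after a quoted phrase is the phrase slop. The edit-distance side can be illustrated with a plain Levenshtein sketch (`EditDistanceSketch` is a hypothetical helper, not a Lucene class). One assumption to flag: Lucene's `FuzzyQuery` by default also counts a transposition of two adjacent characters as a single edit, which plain Levenshtein below does not. The code also compares UTF-16 `char`s, which is fine for ASCII terms like these.

```java
/**
 * Plain Levenshtein distance (insertions, deletions, substitutions), the
 * notion behind a fuzzy term like A:12345~1 ("at most 1 edit").
 */
final class EditDistanceSketch {
    static int levenshtein(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        // Row 0: distance from the empty prefix of s to each prefix of t.
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                // Cheapest of: insertion, deletion, substitution (or match).
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()]; // last computed row ends up in prev
    }
}
```

So `12345~1` matches terms such as `12346` or `1234` (distance 1), whereas `"12345"~1` is about how far apart the words of a phrase may be and has nothing to do with spelling.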

[I] Fuzzy phrase query incorrectly transformed to term query (without fuzziness) [lucene]

2024-12-02 Thread via GitHub
mfolnovic opened a new issue, #14030: URL: https://github.com/apache/lucene/issues/14030 ### Description Hello, I'm new to Lucene, so I apologize if this is expected behaviour. I've noticed that the current implementation of `PhraseQuery#rewrite` does not take into account fuzzin

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2024-12-02 Thread via GitHub
weizijun commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-2511299239 Hi all, has there been any recent progress on DiskANN for Lucene? We found that in the RAG scenario, the document data volume is very large and all of it is stored in memory, which

Re: [I] Upgrade to OpenNLP 2.5.x [lucene]

2024-12-02 Thread via GitHub
mawiesne commented on issue #14029: URL: https://github.com/apache/lucene/issues/14029#issuecomment-2511189960 FYI @cpoerschke

[I] Upgrade to OpenNLP 2.5.x [lucene]

2024-12-02 Thread via GitHub
mawiesne opened a new issue, #14029: URL: https://github.com/apache/lucene/issues/14029 ### Description Apache OpenNLP 2.5.0 has been released. This [version](https://opennlp.apache.org/news/release-250.html) contains new implementations of TokenNameFinder et al., that are Thread-Saf

[I] Optimizing use the QueryCache [lucene]

2024-12-02 Thread via GitHub
kkewwei opened a new issue, #14028: URL: https://github.com/apache/lucene/issues/14028 ### Description In my use case, I discovered that the utilization of `QueryCache` (with a capacity of 3GB and only 50MB used) is extremely low. Most of the queries are as follows: ``` POST

Re: [PR] Fix changelog for GITHUB#14011 [lucene]

2024-12-02 Thread via GitHub
viliam-durina commented on PR #14018: URL: https://github.com/apache/lucene/pull/14018#issuecomment-2511051103 > @viliam-durina would you also like to attempt the backport of this and the related commit to 10.1? Generally this is a matter of cherry-picking the change onto the branch, in th

Re: [PR] Grammar and typo fixes [lucene]

2024-12-02 Thread via GitHub
viliam-durina commented on PR #14019: URL: https://github.com/apache/lucene/pull/14019#issuecomment-2511016616 Thanks for the review :+1: I reverted all the disputed changes; I agree we should not get into unproductive discussions. I added a few new ones in a separate commit

Re: [PR] Make SegmentInfos#readCommit(Directory, String, int) public [lucene]

2024-12-02 Thread via GitHub
javanna merged PR #14027: URL: https://github.com/apache/lucene/pull/14027

Re: [PR] Better encapsulate locking logic in HnswGraphBuilder [lucene]

2024-12-02 Thread via GitHub
viliam-durina commented on PR #14016: URL: https://github.com/apache/lucene/pull/14016#issuecomment-2510926970 Our use case is to speed up indexing of larger segments. We want to build fewer segments, so it makes sense to build them on multiple cores. We build the segments directly, not bui