[GitHub] [lucene] wjp719 commented on a diff in pull request #687: LUCENE-10425:speed up IndexSortSortedNumericDocValuesRangeQuery#BoundedDocSetIdIterator construction using bkd binary search

2022-09-21 Thread GitBox
wjp719 commented on code in PR #687: URL: https://github.com/apache/lucene/pull/687#discussion_r976173487 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java: ## @@ -214,12 +220,166 @@ public int count(LeafReaderContext cont
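The PR title describes finding the bounds of a range on an index-sorted field by binary search instead of a linear scan. Below is a minimal sketch of that idea. It assumes a plain `long[]` of per-document values in docID order stands in for the BKD tree, and the class and method names are hypothetical, not Lucene APIs. Because the index sort makes the values monotonic in docID, two binary searches give the half-open doc range. The `descending` flag mirrors the comparisons for a reverse sort.

```java
import java.util.function.LongPredicate;

// Hypothetical sketch: values[doc] is each doc's sort value, monotonic in docID
// because the segment is index-sorted on that field.
final class SortedRangeBounds {
    private SortedRangeBounds() {}

    // First index whose value satisfies pred, assuming pred is false for a
    // prefix of the array and true for the rest (a monotone predicate).
    static int firstMatch(long[] values, LongPredicate pred) {
        int lo = 0, hi = values.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (pred.test(values[mid])) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    // Returns {minDoc, maxDocExclusive} for docs with lower <= value <= upper.
    static int[] docRange(long[] values, long lower, long upper, boolean descending) {
        int min;
        int max;
        if (!descending) {
            min = firstMatch(values, v -> v >= lower);
            max = firstMatch(values, v -> v > upper);
        } else {
            min = firstMatch(values, v -> v <= upper);
            max = firstMatch(values, v -> v < lower);
        }
        // Clamp so an empty range (e.g. lower > upper) yields min == max.
        return new int[] {min, Math.max(min, max)};
    }
}
```

The resulting range is exactly what a bounded doc ID iterator needs: every doc in `[min, max)` matches, so no per-document value check is required inside it.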

[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

2022-09-21 Thread GitBox
uschindler commented on PR #912: URL: https://github.com/apache/lucene/pull/912#issuecomment-1253365297 JDK 19 was released, and I am working on the Toolchain support for compiling the MR-JAR. At the moment, the commented-out code does not yet work, as AdoptOpenJDK / Temurin did not

[GitHub] [lucene] javanna commented on a diff in pull request #11793: Prevent PointValues from returning null for ghost fields

2022-09-21 Thread GitBox
javanna commented on code in PR #11793: URL: https://github.com/apache/lucene/pull/11793#discussion_r976257408 ## lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java: ## @@ -104,28 +104,28 @@ public abstract class NumericLeafComparator implements Le

[GitHub] [lucene] shaie merged pull request #11775: Minor refactoring and cleanup to taxonomy index code

2022-09-21 Thread GitBox
shaie merged PR #11775: URL: https://github.com/apache/lucene/pull/11775 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apach

[GitHub] [lucene] shaie opened a new pull request, #11798: Minor refactoring and cleanup to taxonomy index code

2022-09-21 Thread GitBox
shaie opened a new pull request, #11798: URL: https://github.com/apache/lucene/pull/11798 ### Description

[GitHub] [lucene] thongnt99 opened a new issue, #11799: Indexing method for learned sparse retrieval

2022-09-21 Thread GitBox
thongnt99 opened a new issue, #11799: URL: https://github.com/apache/lucene/issues/11799 ### Description Recent learned sparse retrieval methods ([Splade](https://github.com/naver/splade), [uniCOIL](https://github.com/castorini/pyserini/blob/master/docs/experiments-unicoil.md)) were

[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

2022-09-21 Thread GitBox
uschindler commented on PR #912: URL: https://github.com/apache/lucene/pull/912#issuecomment-1253646592 Current output: ``` Starting a Gradle Daemon (subsequent builds will be faster) Directory 'C:\Users\Uwe Schindler\.gradle\daemon\7.3.3\(custom paths)' (system property 'org.gr

[GitHub] [lucene-solr] itygh commented on pull request #2670: Backport a few upgrades to branch_8_11

2022-09-21 Thread GitBox
itygh commented on PR #2670: URL: https://github.com/apache/lucene-solr/pull/2670#issuecomment-1253686367 This is an automatic vacation reply from QQ Mail. Hello, I am currently on vacation and cannot reply to your email personally. I will get back to you as soon as possible after the vacation ends.

[GitHub] [lucene-solr] janhoy commented on pull request #2670: Backport a few upgrades to branch_8_11

2022-09-21 Thread GitBox
janhoy commented on PR #2670: URL: https://github.com/apache/lucene-solr/pull/2670#issuecomment-1253712939 Most of these should be safe as they are pure bugfix version upgrades. I see one `.java` class touched, which is also safe. Can you try to highlight which part of this PR is the

[GitHub] [lucene-solr] risdenk commented on pull request #2670: Backport a few upgrades to branch_8_11

2022-09-21 Thread GitBox
risdenk commented on PR #2670: URL: https://github.com/apache/lucene-solr/pull/2670#issuecomment-1253717437 > Can you try to highlight which part of this PR is the most "risky"? I suppose it would be the 5 new jars/parsers pulled in by Tika? Or the addition of new Calcite deps we have not h

[GitHub] [lucene] mocobeta commented on issue #11799: Indexing method for learned sparse retrieval

2022-09-21 Thread GitBox
mocobeta commented on issue #11799: URL: https://github.com/apache/lucene/issues/11799#issuecomment-1253724856 In general I'm +1 for supporting learned sparse retrieval, though I think it would not be as trivial as it looks. For a start, perhaps we could utilize terms' payloads to t
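As a rough illustration of the payload idea, here is a sketch that stores each term's learned weight as a raw 4-byte float payload. The helper names are hypothetical. Lucene's analysis module has a `PayloadHelper` class for similar encoding, and its exact signatures should be checked against the version in use.

```java
import java.nio.ByteBuffer;

// Hypothetical helpers: one learned weight per term, stored as a 4-byte payload.
final class WeightPayloads {
    private WeightPayloads() {}

    // Encodes the weight as 4 big-endian bytes.
    static byte[] encode(float weight) {
        return ByteBuffer.allocate(Float.BYTES).putFloat(weight).array();
    }

    // Decodes a weight previously written by encode, starting at offset.
    static float decode(byte[] payload, int offset) {
        return ByteBuffer.wrap(payload, offset, Float.BYTES).getFloat();
    }
}
```

Reading payloads at query time means visiting positions for every matching term, which is one reason this route may be slower than encoding the weight as a term frequency.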

[GitHub] [lucene] gcbaptista opened a new issue, #11800: INVALID_SYNTAX_CANNOT_PARSE for at sign (@)

2022-09-21 Thread GitBox
gcbaptista opened a new issue, #11800: URL: https://github.com/apache/lucene/issues/11800 ### Description Since release `9.1.0`, Lucene's SyntaxParser has been unable to parse `@` in a query, throwing a syntax error (`INVALID_SYNTAX_CANNOT_PARSE`). Version `9.0.0` is the last I

[GitHub] [lucene] kotman12 commented on pull request #11734: Fix repeating token sentence boundary bug

2022-09-21 Thread GitBox
kotman12 commented on PR #11734: URL: https://github.com/apache/lucene/pull/11734#issuecomment-1253843509 > Hi @kotman12 . Sorry for the delay. I'm not that familiar with this part of the codebase but I think I see what's happening and how you managed to fix it. Looks good to me. It'd be go

[GitHub] [lucene-solr] risdenk commented on pull request #2670: Backport a few upgrades to branch_8_11

2022-09-21 Thread GitBox
risdenk commented on PR #2670: URL: https://github.com/apache/lucene-solr/pull/2670#issuecomment-1253920621 Well I'm struggling to run tests on my M1 Mac. They don't like JDK 8 on M1. I'll have to test this separately on another computer or VM.

[GitHub] [lucene] rmuir commented on issue #11799: Indexing method for learned sparse retrieval

2022-09-21 Thread GitBox
rmuir commented on issue #11799: URL: https://github.com/apache/lucene/issues/11799#issuecomment-1253945883 You can use `TermFrequencyAttribute` in the analysis chain to set the frequency directly.
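`TermFrequencyAttribute` takes an integer frequency, so learned float weights have to be quantized first. Below is a minimal sketch of that step. It assumes a fixed scale factor and drops terms that round to zero; both choices are assumptions, and `SparseWeights` is hypothetical. The `term|freq` output is intended for a token filter that parses a frequency suffix into `TermFrequencyAttribute`, such as `DelimitedTermFrequencyTokenFilter` in analysis-common. Verify that filter's default delimiter for the Lucene version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helpers turning learned sparse weights into integer frequencies.
final class SparseWeights {
    private SparseWeights() {}

    // Scales each weight and rounds to an int; terms that round to 0 are
    // dropped because a term frequency must be at least 1.
    static Map<String, Integer> quantize(Map<String, Float> weights, float scale) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (Map.Entry<String, Float> e : weights.entrySet()) {
            int f = Math.round(e.getValue() * scale);
            if (f > 0) {
                freqs.put(e.getKey(), f);
            }
        }
        return freqs;
    }

    // Renders space-separated "term|freq" tokens, preserving map order.
    static String toDelimited(Map<String, Integer> freqs) {
        StringJoiner out = new StringJoiner(" ");
        for (Map.Entry<String, Integer> e : freqs.entrySet()) {
            out.add(e.getKey() + "|" + e.getValue());
        }
        return out.toString();
    }
}
```

The scale factor trades precision against index size: larger scales keep more distinctions between weights but produce larger frequencies.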

[GitHub] [lucene-solr] janhoy commented on pull request #2670: Backport a few upgrades to branch_8_11

2022-09-21 Thread GitBox
janhoy commented on PR #2670: URL: https://github.com/apache/lucene-solr/pull/2670#issuecomment-1253988574 Should there have been an optional GitHub Action to run all tests in a PR? Something you could activate on demand?

[GitHub] [lucene-solr] risdenk commented on pull request #2670: Backport a few upgrades to branch_8_11

2022-09-21 Thread GitBox
risdenk commented on PR #2670: URL: https://github.com/apache/lucene-solr/pull/2670#issuecomment-1253995175 > Should there have been an optional GitHub Action to run all tests in a PR? Something you could activate on demand? eh, it's only an issue with 8.11 and I have ways to do it just

[GitHub] [lucene] rmuir commented on issue #11800: INVALID_SYNTAX_CANNOT_PARSE for at sign (@)

2022-09-21 Thread GitBox
rmuir commented on issue #11800: URL: https://github.com/apache/lucene/issues/11800#issuecomment-1254008994 not a bug, but related to new features added to the parser. see the associated message in `MIGRATE.txt`: ``` ## Minor syntactical changes in StandardQueryParser (Lucene 9.1)

[GitHub] [lucene] dweiss commented on issue #11800: INVALID_SYNTAX_CANNOT_PARSE for at sign (@)

2022-09-21 Thread GitBox
dweiss commented on issue #11800: URL: https://github.com/apache/lucene/issues/11800#issuecomment-1254016674 Also, please note that you can quote the at sign (`@`) in terms; this will behave like before. I don't think it's a bug, sorry it caused you trouble, but the new functionality is worth i
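Below is a minimal sketch of the quoting workaround for raw user input. It assumes the input never intends to use the new `@` operator and that the affected tokens carry no field prefixes. `quoteAtTokens` is a hypothetical helper, not a Lucene API. It also collapses runs of whitespace between tokens.

```java
import java.util.StringJoiner;

// Hypothetical pre-processing for user input sent to StandardQueryParser.
final class AtSignQuoting {
    private AtSignQuoting() {}

    // Wraps every whitespace-delimited token that contains '@' in double
    // quotes, backslash-escaping any backslashes and quotes inside it first.
    static String quoteAtTokens(String query) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : query.trim().split("\\s+")) {
            if (token.indexOf('@') >= 0) {
                String escaped = token.replace("\\", "\\\\").replace("\"", "\\\"");
                out.add("\"" + escaped + "\"");
            } else {
                out.add(token);
            }
        }
        return out.toString();
    }
}
```

A quoted token is still analyzed, so the resulting query depends on the analyzer. Check it against a few representative inputs before relying on it.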

[GitHub] [lucene] dweiss commented on pull request #11734: Fix repeating token sentence boundary bug

2022-09-21 Thread GitBox
dweiss commented on PR #11734: URL: https://github.com/apache/lucene/pull/11734#issuecomment-1254019222 > I am starting to suspect that this package is not really used a lot because otherwise one would expect these bugs to have been caught sooner given the configuration is in the documentat

[GitHub] [lucene] gautamworah96 commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-21 Thread GitBox
gautamworah96 commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r975969527 ## lucene/core/src/java/org/apache/lucene/store/ByteTrackingIndexOutput.java: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o
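At its core, a byte-tracking output is a wrapper that counts bytes as they pass through. Lucene's `IndexOutput` has its own API (`writeByte`/`writeBytes`), so this sketch uses `java.io.FilterOutputStream` as a stand-in. `ByteCountingOutputStream` is hypothetical.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical wrapper counting every byte forwarded to the underlying stream.
final class ByteCountingOutputStream extends FilterOutputStream {
    private long bytesWritten;

    ByteCountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        bytesWritten++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Forward the whole slice at once instead of FilterOutputStream's
        // default byte-at-a-time loop; write(byte[]) delegates here.
        out.write(b, off, len);
        bytesWritten += len;
    }

    long getBytesWritten() {
        return bytesWritten;
    }
}
```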

[GitHub] [lucene] shaie merged pull request #11798: Minor refactoring and cleanup to taxonomy index code

2022-09-21 Thread GitBox
shaie merged PR #11798: URL: https://github.com/apache/lucene/pull/11798

[GitHub] [lucene] jtibshirani commented on issue #11799: Indexing method for learned sparse retrieval

2022-09-21 Thread GitBox
jtibshirani commented on issue #11799: URL: https://github.com/apache/lucene/issues/11799#issuecomment-1254070746 +1 from me too, it'd be great to think through how to support this. Could you explain how the query side would look? Are the queries also sparse vectors with custom impacts?

[GitHub] [lucene] gsmiller commented on pull request #11797: DrillSideways uses advance instead of next when multiple dims miss

2022-09-21 Thread GitBox
gsmiller commented on PR #11797: URL: https://github.com/apache/lucene/pull/11797#issuecomment-1254074997 Cancelling this out as I've realized we can do even better. I'll post a new PR with a few more optimizations baked in.

[GitHub] [lucene] gsmiller closed pull request #11797: DrillSideways uses advance instead of next when multiple dims miss

2022-09-21 Thread GitBox
gsmiller closed pull request #11797: DrillSideways uses advance instead of next when multiple dims miss URL: https://github.com/apache/lucene/pull/11797

[GitHub] [lucene] stevenschlansker commented on issue #8553: Add AccessController.doPrivileged around all calls of Class#getResource() and Class#getResourceAsStream() [LUCENE-7502]

2022-09-21 Thread GitBox
stevenschlansker commented on issue #8553: URL: https://github.com/apache/lucene/issues/8553#issuecomment-1254096210 AccessController is now deprecated for removal, as is the security manager. Is this issue still relevant?

[GitHub] [lucene] stevenschlansker commented on issue #6534: Classloader issues when running Lucene under a java SecurityManager [LUCENE-5471]

2022-09-21 Thread GitBox
stevenschlansker commented on issue #6534: URL: https://github.com/apache/lucene/issues/6534#issuecomment-1254097488 SecurityManager is now deprecated for removal, so this issue might no longer be relevant going forward.

[GitHub] [lucene] stevenschlansker opened a new issue, #11801: Remove usage of SecurityManager and AccessController

2022-09-21 Thread GitBox
stevenschlansker opened a new issue, #11801: URL: https://github.com/apache/lucene/issues/11801 ### Description Java is removing the SecurityManager and AccessController. Running Lucene build under Java 17 emits a lot of warnings: ``` WARNING: A command line option has

[GitHub] [lucene] msokolov commented on issue #11799: Indexing method for learned sparse retrieval

2022-09-21 Thread GitBox
msokolov commented on issue #11799: URL: https://github.com/apache/lucene/issues/11799#issuecomment-1254100366 Using `TermFrequencyAttribute` to customize the term frequencies, you can then create a Query in the normal way and compute BM25 using `b==0`; then I think you will directly control
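This worked sketch shows why `b == 0` helps, using the classic BM25 term-score formula; the names are hypothetical. With `b = 0` the document-length normalization collapses to `k1`, so two documents with the same term frequency score identically regardless of length. The score still saturates in `tf` through `k1`, so it is monotonic, but not linear, in the stored frequency. Recent Lucene versions of `BM25Similarity` omit the constant `(k1 + 1)` factor, which rescales scores without changing ranking.

```java
// Hypothetical helper implementing the classic BM25 per-term score.
final class Bm25 {
    private Bm25() {}

    // idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * docLen / avgDocLen))
    static double score(double idf, double tf, double k1, double b,
                        double docLen, double avgDocLen) {
        double norm = k1 * (1 - b + b * docLen / avgDocLen);
        return idf * tf * (k1 + 1) / (tf + norm);
    }
}
```

A large `k1` pushes the curve toward linear in `tf`, which is closer to using the learned weight directly. `BM25Similarity` accepts both `k1` and `b` in its constructor.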

[GitHub] [lucene] rmuir commented on issue #11801: Remove usage of SecurityManager and AccessController

2022-09-21 Thread GitBox
rmuir commented on issue #11801: URL: https://github.com/apache/lucene/issues/11801#issuecomment-1254102841 We use it to sandbox our tests, so we shouldn't remove it without replacement. Otherwise tests might interfere with each other which is not fun to debug. Additionally as a libr

[GitHub] [lucene] rmuir commented on issue #11801: Remove usage of SecurityManager and AccessController

2022-09-21 Thread GitBox
rmuir commented on issue #11801: URL: https://github.com/apache/lucene/issues/11801#issuecomment-1254114559 For the tests I have a couple of ideas: * use forbidden-apis more aggressively to statically prevent tests from doing stuff we don't want. Actually more powerful for our use-case in a

[GitHub] [lucene] thongnt99 commented on issue #11799: Indexing method for learned sparse retrieval

2022-09-21 Thread GitBox
thongnt99 commented on issue #11799: URL: https://github.com/apache/lucene/issues/11799#issuecomment-1254119695 @jtibshirani The query side is the same as the document side: a dictionary of terms and weights. To make it compatible with Lucene, people just repeat the terms with their frequen

[GitHub] [lucene] stevenschlansker commented on issue #11801: Remove usage of SecurityManager and AccessController

2022-09-21 Thread GitBox
stevenschlansker commented on issue #11801: URL: https://github.com/apache/lucene/issues/11801#issuecomment-1254120085 > for the situation of being a library and needing to support apps that still rely on securitymanager, I don't see any immediate fix. because the only way to know the secur

[GitHub] [lucene] gautamworah96 commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-21 Thread GitBox
gautamworah96 commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r976900966 ## lucene/core/src/java/org/apache/lucene/store/WriteAmplificationTrackingDirectoryWrapper.java: ## @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Found
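One common definition of the write amplification factor divides the total bytes written (flushes plus merges) by the bytes flushed for newly indexed data. Below is a sketch of that bookkeeping. It assumes the wrapper can tell flush writes from merge writes, for example from the `IOContext` passed when an output is created. The class is hypothetical, and its definition may differ from the one the PR settles on.

```java
// Hypothetical bookkeeping for a write amplification factor (WAF).
final class WriteAmplificationTracker {
    private long flushedBytes; // bytes written when flushing new segments
    private long mergedBytes;  // bytes rewritten when merging existing segments

    void recordFlush(long bytes) {
        flushedBytes += bytes;
    }

    void recordMerge(long bytes) {
        mergedBytes += bytes;
    }

    // WAF = (flushed + merged) / flushed; NaN until something has been flushed.
    double writeAmplificationFactor() {
        if (flushedBytes == 0) {
            return Double.NaN;
        }
        return (double) (flushedBytes + mergedBytes) / flushedBytes;
    }
}
```

With 200 bytes flushed and 600 bytes rewritten by merges, every flushed byte ends up written four times.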

[GitHub] [lucene] rmuir commented on issue #11801: Remove usage of SecurityManager and AccessController

2022-09-21 Thread GitBox
rmuir commented on issue #11801: URL: https://github.com/apache/lucene/issues/11801#issuecomment-1254139967 I'm not worried, according to the JEP: https://openjdk.org/jeps/411 ``` In feature releases after Java 18, we will degrade other Security Manager APIs so that they remain in plac

[GitHub] [lucene] kotman12 opened a new pull request, #11802: fix sentence iteration in opennlp package

2022-09-21 Thread GitBox
kotman12 opened a new pull request, #11802: URL: https://github.com/apache/lucene/pull/11802 Fix sentence boundary detection bug in the case of repeating tokens (e.g. when using an OpenNLP analysis chain in conjunction with a KeywordRepeatFilter) by keeping track of the sentence index and looking
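The PR description is truncated, so the sketch below is not its implementation. It only illustrates one way to make sentence assignment robust to repeated tokens: derive the sentence from each token's start offset rather than from a running token count. The original token and its keyword-repeated copy share offsets, so they land in the same sentence. `SentenceLookup` is hypothetical and assumes strictly ascending, exclusive sentence end offsets.

```java
import java.util.Arrays;

// Hypothetical lookup from a token's start offset to its sentence index.
final class SentenceLookup {
    private SentenceLookup() {}

    // sentenceEnds[i] is the exclusive end offset of sentence i (strictly ascending).
    // Offsets at or past the last end return sentenceEnds.length.
    static int sentenceIndex(int[] sentenceEnds, int startOffset) {
        int i = Arrays.binarySearch(sentenceEnds, startOffset);
        // An exact hit means the token starts where sentence i ends, i.e. in sentence i + 1;
        // otherwise the insertion point is the first sentence ending after startOffset.
        return i >= 0 ? i + 1 : -i - 1;
    }
}
```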

[GitHub] [lucene] kotman12 commented on pull request #11734: Fix repeating token sentence boundary bug

2022-09-21 Thread GitBox
kotman12 commented on PR #11734: URL: https://github.com/apache/lucene/pull/11734#issuecomment-1254253617 @dweiss I updated CHANGES.txt but blew up this PR and messed up the history in the process. If you prefer, this is a more concise PR with the relevant changes patched in -> https://githu

[GitHub] [lucene] mdmarshmallow commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-21 Thread GitBox
mdmarshmallow commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r977019915 ## lucene/core/src/java/org/apache/lucene/store/ByteTrackingIndexOutput.java: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [lucene] gsmiller opened a new pull request, #11803: DrillSideways optimizations

2022-09-21 Thread GitBox
gsmiller opened a new pull request, #11803: URL: https://github.com/apache/lucene/pull/11803 ### Description This change makes use of `advance` instead of `next` where possible and splits out 1st and 2nd phase checking to avoid match confirmation when unnecessary. Note that I
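A leapfrog intersection shows why `advance` can beat `next`: each iterator jumps straight to the other's current candidate instead of stepping through every doc. The sketch below works over sorted `int[]` arrays. Its classes are hypothetical; they mirror the `nextDoc`/`advance`/`docID` contract of Lucene's `DocIdSetIterator` but are not the DrillSideways code, and two-phase confirmation is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical doc-id iterator over a strictly ascending int[].
final class ArrayDocIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;
    private int idx = -1; // -1 means "not positioned yet"

    ArrayDocIterator(int[] docs) {
        this.docs = docs;
    }

    int docID() {
        return idx < 0 ? -1 : current();
    }

    int nextDoc() {
        if (idx < docs.length) {
            idx++;
        }
        return current();
    }

    // Jumps to the first doc >= target by binary search over the remaining docs.
    int advance(int target) {
        if (idx >= docs.length) {
            return NO_MORE_DOCS;
        }
        int pos = Arrays.binarySearch(docs, idx + 1, docs.length, target);
        idx = pos >= 0 ? pos : -pos - 1;
        return current();
    }

    private int current() {
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }
}

final class LeapfrogIntersection {
    private LeapfrogIntersection() {}

    static List<Integer> intersect(ArrayDocIterator lead, ArrayDocIterator other) {
        List<Integer> matches = new ArrayList<>();
        int doc = lead.nextDoc();
        while (doc != ArrayDocIterator.NO_MORE_DOCS) {
            // Bring the other iterator to the lead's candidate, or just past it.
            int otherDoc = other.docID() < doc ? other.advance(doc) : other.docID();
            if (otherDoc == doc) {
                matches.add(doc);
                doc = lead.nextDoc();
            } else {
                // otherDoc > doc: skip the lead straight there instead of calling nextDoc repeatedly.
                doc = lead.advance(otherDoc);
            }
        }
        return matches;
    }
}
```

For example, intersecting `{1, 3, 5, 7, 9, 11}` with `{2, 3, 8, 9, 10, 11, 20}` skips the lead from 5 directly to 9 in a single `advance` call.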

[GitHub] [lucene] wjp719 commented on pull request #687: LUCENE-10425:speed up IndexSortSortedNumericDocValuesRangeQuery#BoundedDocSetIdIterator construction using bkd binary search

2022-09-21 Thread GitBox
wjp719 commented on PR #687: URL: https://github.com/apache/lucene/pull/687#issuecomment-1254416217 > Thanks, this looks good to me! Can you add a CHANGES entry with your name under 9.5? Thanks a lot, I have added the CHANGES entry. Note that this PR has a limitation: only index s

[GitHub] [lucene-solr] risdenk commented on pull request #2670: Backport a few upgrades to branch_8_11

2022-09-21 Thread GitBox
risdenk commented on PR #2670: URL: https://github.com/apache/lucene-solr/pull/2670#issuecomment-1254471773 Took a few runs but got a pass: ``` BUILD SUCCESSFUL Total time: 75 minutes 41 seconds ``` none of the failures reproduced when run independently, so I don't thi

[GitHub] [lucene-solr] risdenk merged pull request #2670: Backport a few upgrades to branch_8_11

2022-09-21 Thread GitBox
risdenk merged PR #2670: URL: https://github.com/apache/lucene-solr/pull/2670

[GitHub] [lucene] jpountz commented on pull request #11793: Prevent PointValues from returning null for ghost fields

2022-09-21 Thread GitBox
jpountz commented on PR #11793: URL: https://github.com/apache/lucene/pull/11793#issuecomment-1254593758 Test failures suggest CheckIndex needs to have its expectations adjusted.

[GitHub] [lucene] jpountz commented on pull request #687: LUCENE-10425:speed up IndexSortSortedNumericDocValuesRangeQuery#BoundedDocSetIdIterator construction using bkd binary search

2022-09-21 Thread GitBox
jpountz commented on PR #687: URL: https://github.com/apache/lucene/pull/687#issuecomment-1254600722 I was wondering about descending sorts too! Do we actually need to make this configurable on BKD trees? I would rather not add this option and make the binary search logic a bit more complex

[GitHub] [lucene] jpountz merged pull request #687: LUCENE-10425:speed up IndexSortSortedNumericDocValuesRangeQuery#BoundedDocSetIdIterator construction using bkd binary search

2022-09-21 Thread GitBox
jpountz merged PR #687: URL: https://github.com/apache/lucene/pull/687