[GitHub] [lucene] uschindler commented on a diff in pull request #867: LUCENE-10558: expose stream-based Kuromoji resource constructors
uschindler commented on code in PR #867:
URL: https://github.com/apache/lucene/pull/867#discussion_r865606003

## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UnknownDictionary.java: ##
@@ -52,18 +52,26 @@ private UnknownDictionary() throws IOException {
         () -> getClassResource(DICT_FILENAME_SUFFIX));
   }

-  private UnknownDictionary(
-      IOSupplier targetMapResource,
-      IOSupplier posResource,
-      IOSupplier dictResource)
+  /**
+   * Create a {@link UnknownDictionary} from an external resource path.
+   *
+   * @param targetMap supplier for stream containing target map
+   * @param posDict supplier for stream containing POS dictionary
+   * @param dict supplier for stream containing dictionary entries
+   * @throws IOException if a stream could not be read
+   */
+  public UnknownDictionary(
+      IOSupplier targetMap,
+      IOSupplier posDict,
+      IOSupplier dict)

Review Comment:
I think he did this because "Resource" is a bit strange, as it is no longer classpath based.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler opened a new pull request, #868: LUCENE-10558: Implement URL ctor to support classpath usage in Kuromoji dictionaries
uschindler opened a new pull request, #868:
URL: https://github.com/apache/lucene/pull/868

See https://issues.apache.org/jira/browse/LUCENE-10558

This is against the 9.x branch, but can be forward-ported to main.

TODO: This still needs Nori support.
[jira] [Commented] (LUCENE-10558) Expose IOSupplier constructors in Kuromoji (and Nori?)
[ https://issues.apache.org/jira/browse/LUCENE-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532101#comment-17532101 ]

Uwe Schindler commented on LUCENE-10558:

Here is my preferred variant: https://github.com/apache/lucene/pull/868

See also the test for how to use it. Just replace the Path ctors with the URL ctors and use ClassLoader#getResource() or Class#getResource().

> Expose IOSupplier constructors in Kuromoji (and Nori?)
> ------------------------------------------------------
>
> Key: LUCENE-10558
> URL: https://issues.apache.org/jira/browse/LUCENE-10558
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael Sokolov
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>
> When we refactored the constructors for these resource objects used by the
> kuromoji JapaneseTokenizer, we (inadvertently, I expect) changed the
> behavior for consumers that were supplying these resources on the classpath.
> In that case, we silently replaced the custom resources with the Lucene
> built-in ones. I think we cannot support the old API because of Java Module
> system restrictions, but we didn't provide any usable replacement or notice
> either.
>
> This issue is for exposing the new (private) constructors that accept
> streams, and adding a notice to Migration.md to point users at them, since
> they can be used with resource streams loaded from the classpath by the
> caller.

--
This message was sent by Atlassian Jira (v8.20.7#820007)
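The issue text above describes custom resources being silently replaced by built-in ones. Since `Class#getResource(String)` returns `null` for a missing resource, a caller resolving URLs for the new ctors may want to fail loudly instead. The following is only a minimal sketch using the JDK alone; the class `RequiredResources` and its `require` method are hypothetical names and are not part of Lucene or of either PR.

```java
import java.io.FileNotFoundException;
import java.net.URL;

/** Hypothetical helper (not part of Lucene): resolve a resource or fail loudly. */
final class RequiredResources {
  private RequiredResources() {}

  /**
   * Looks up {@code name} relative to the package of {@code anchor}.
   * Throws instead of returning null, so a missing custom dictionary
   * cannot be silently replaced by some default.
   */
  static URL require(Class<?> anchor, String name) throws FileNotFoundException {
    URL url = anchor.getResource(name);
    if (url == null) {
      throw new FileNotFoundException(
          "Resource not found: " + name + " (relative to " + anchor.getName() + ")");
    }
    return url;
  }
}
```

A URL obtained this way could then be passed to one of the URL constructors proposed in PR #868.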
[GitHub] [lucene] uschindler commented on pull request #867: LUCENE-10558: expose stream-based Kuromoji resource constructors
uschindler commented on PR #867:
URL: https://github.com/apache/lucene/pull/867#issuecomment-1118251208

I would not make the IOSupplier ctors available; they are internal only (IOSupplier is a class which is marked as subject to change). Because we have `java.nio.file.Path` ctors as the replacement for the deprecated one, we need one taking `java.net.URL` for resource usage. See #868 for an implementation.
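The layering described here (public `Path` and `URL` ctors delegating to an internal stream-supplier ctor) can be sketched with the JDK alone. This is an illustration under assumptions, not the Lucene code: `StreamSource` is a hypothetical stand-in for the internal `IOSupplier`, and `CostTable` is an invented class that only stores the bytes it reads.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for an internal stream supplier; not Lucene's IOSupplier. */
@FunctionalInterface
interface StreamSource {
  InputStream open() throws IOException;
}

/** Hypothetical resource holder that illustrates the constructor layering. */
final class CostTable {
  private final byte[] data;

  /** Entry point for files on disk. */
  public CostTable(Path file) throws IOException {
    this(() -> Files.newInputStream(file));
  }

  /** Entry point for resources that the caller has already resolved to a URL. */
  public CostTable(URL url) throws IOException {
    this(() -> url.openStream());
  }

  /** Internal ctor: the supplier type stays private and can change freely. */
  private CostTable(StreamSource source) throws IOException {
    this.data = readAll(source);
  }

  private static byte[] readAll(StreamSource source) throws IOException {
    try (InputStream in = source.open()) {
      return in.readAllBytes();
    }
  }

  int size() {
    return data.length;
  }
}
```

Keeping the supplier-based ctor private means only `Path` and `URL`, both stable JDK types, appear in the public signature.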
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath usage in Kuromoji dictionaries
uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118253739

In case you ask: this works with both classpath and module usage. The caller-sensitive parts are `Class#getResource(String)`, `ClassLoader#getResource(String)`, and `Module#getResource(String)`. The returned URL is free to use anywhere, so it separates concerns just like the `IOSupplier` or `Path`.
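To illustrate the lookups named above, here is a small JDK-only sketch of the two lookup styles that return a `URL`. The class `ResourceLookupDemo` and its method names are hypothetical. The point is that the lookup happens in the caller's code, while the resulting URL can be handed to any consumer.

```java
import java.net.URL;

/** Hypothetical helper contrasting the two lookup styles; names are illustrative. */
final class ResourceLookupDemo {
  private ResourceLookupDemo() {}

  /** Class#getResource resolves a relative name against the package of the class. */
  static URL viaClass(Class<?> anchor, String relativeName) {
    return anchor.getResource(relativeName);
  }

  /** ClassLoader#getResource expects a '/'-separated name from the root, without a leading slash. */
  static URL viaLoader(ClassLoader loader, String absoluteName) {
    return loader.getResource(absoluteName);
  }
}
```

For example, `viaClass(Object.class, "Object.class")` and `viaLoader(ClassLoader.getSystemClassLoader(), "java/lang/Object.class")` both address the same class file. Either method returns `null` if nothing is found.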
[jira] [Commented] (LUCENE-10558) Expose IOSupplier constructors in Kuromoji (and Nori?)
[ https://issues.apache.org/jira/browse/LUCENE-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532117#comment-17532117 ]

Uwe Schindler commented on LUCENE-10558:

The latest PR also fixes Nori with the same fix.

> Expose IOSupplier constructors in Kuromoji (and Nori?)
> ------------------------------------------------------
>
> Key: LUCENE-10558
> URL: https://issues.apache.org/jira/browse/LUCENE-10558
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael Sokolov
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
[jira] [Updated] (LUCENE-10558) Add URL constructors as complement to Path ctors in Kuromoji and Nori
[ https://issues.apache.org/jira/browse/LUCENE-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-10558:

Summary: Add URL constructors as complement to Path ctors in Kuromoji and Nori (was: Expose IOSupplier constructors in Kuromoji (and Nori?))

> Add URL constructors as complement to Path ctors in Kuromoji and Nori
> ---------------------------------------------------------------------
>
> Key: LUCENE-10558
> URL: https://issues.apache.org/jira/browse/LUCENE-10558
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael Sokolov
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
[jira] [Assigned] (LUCENE-10558) Add URL constructors for classpath/module usage as complement to Path ctors in Kuromoji and Nori
[ https://issues.apache.org/jira/browse/LUCENE-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler reassigned LUCENE-10558:

Assignee: Uwe Schindler

> Add URL constructors for classpath/module usage as complement to Path ctors in Kuromoji and Nori
> -------------------------------------------------------------------------------------------------
>
> Key: LUCENE-10558
> URL: https://issues.apache.org/jira/browse/LUCENE-10558
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael Sokolov
> Assignee: Uwe Schindler
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
[jira] [Updated] (LUCENE-10558) Add URL constructors for classpath/module usage as complement to Path ctors in Kuromoji and Nori
[ https://issues.apache.org/jira/browse/LUCENE-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-10558:

Summary: Add URL constructors for classpath/module usage as complement to Path ctors in Kuromoji and Nori (was: Add URL constructors as complement to Path ctors in Kuromoji and Nori)

> Add URL constructors for classpath/module usage as complement to Path ctors in Kuromoji and Nori
> -------------------------------------------------------------------------------------------------
>
> Key: LUCENE-10558
> URL: https://issues.apache.org/jira/browse/LUCENE-10558
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael Sokolov
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath usage in Kuromoji and Nori dictionaries
sonatype-lift[bot] commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865638824

## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/ConnectionCosts.java: ##
@@ -63,6 +65,17 @@ public ConnectionCosts(Path connectionCostsFile) throws IOException {
     this(() -> Files.newInputStream(connectionCostsFile));
   }

+  /**
+   * Create a {@link ConnectionCosts} from an external resource URL (e.g. from Classpath with {@link
+   * ClassLoader#getResource(String)}).
+   *
+   * @param connectionCostsUrl where to load connection costs resource
+   * @throws IOException if resource was not found or broken
+   */
+  public ConnectionCosts(URL connectionCostsUrl) throws IOException {
+    this(() -> connectionCostsUrl.openStream());

Review Comment:
*[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):* This web server request could be used by an attacker to expose internal services and filesystem.

(at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with `help` or `ignore`)

## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }

+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),
+        () -> posDictUrl.openStream(),

Review Comment:
*[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):* This web server request could be used by an attacker to expose internal services and filesystem.

(at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with `help` or `ignore`)

## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }

+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),
+        () -> posDictUrl.openStream(),
+        () -> dictUrl.openStream(),

Review Comment:
*[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):* This web server request could be used by an attacker to expose internal services and filesystem.

(at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with `help` or `ignore`)
[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
sonatype-lift[bot] commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865639082

## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }

+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),

Review Comment:
*[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):* This web server request could be used by an attacker to expose internal services and filesystem.

(at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with `help` or `ignore`)

## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UnknownDictionary.java: ##
@@ -64,6 +66,20 @@ public UnknownDictionary(Path targetMapFile, Path posDictFile, Path dictFile) th
         () -> Files.newInputStream(dictFile));
   }

+  /**
+   * Create a {@link UnknownDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @throws IOException if resource was not found or broken
+   */
+  public UnknownDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl) throws IOException {
+    super(
+        () -> targetMapUrl.openStream(), () -> posDictUrl.openStream(), () -> dictUrl.openStream());

Review Comment:
*[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):* This web server request could be used by an attacker to expose internal services and filesystem.

(at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with `help` or `ignore`)

## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }

+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),
+        () -> posDictUrl.openStream(),
+        () -> dictUrl.openStream(),
+        () -> fstUrl.openStream());

Review Comment:
*[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):* This web server request could be used by an attacker to expose internal services and filesystem.
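The findings above concern `URL#openStream()` on caller-supplied URLs. If a caller ever derived such a URL from untrusted input (an assumption; the thread does not describe such a caller), one possible precaution is to accept only protocols that denote local resources before opening the stream. The sketch below is not something PR #868 does. The class `LocalUrlGuard` and its allow-list are hypothetical and deliberately incomplete.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Set;

/** Hypothetical guard (not part of Lucene): only open URLs that look local. */
final class LocalUrlGuard {
  /** Protocols assumed to be local; this allow-list is an illustrative assumption. */
  private static final Set<String> DIRECT_LOCAL = Set.of("file", "jrt");

  private LocalUrlGuard() {}

  /** Returns true for file:/jrt: URLs and for jar: URLs that wrap a file: URL. */
  static boolean isLocal(URL url) {
    String protocol = url.getProtocol();
    if (DIRECT_LOCAL.contains(protocol)) {
      return true;
    }
    // A jar: URL embeds another URL, which may itself be remote (jar:http://...).
    return "jar".equals(protocol) && url.getFile().startsWith("file:");
  }

  /** Opens the stream only if the URL passes the local check. */
  static InputStream openLocal(URL url) throws IOException {
    if (!isLocal(url)) {
      throw new IOException("Refusing non-local URL protocol: " + url.getProtocol());
    }
    return url.openStream();
  }
}
```

URLs returned by `Class#getResource(String)` or `ClassLoader#getResource(String)` on a local classpath typically use the `file`, `jar`, or `jrt` protocols, so a check like this would not normally get in the way of the classpath use case discussed in this thread.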
[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
sonatype-lift[bot] commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865650503

## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/ConnectionCosts.java: ##
@@ -63,6 +65,17 @@ public ConnectionCosts(Path connectionCostsFile) throws IOException {
     this(() -> Files.newInputStream(connectionCostsFile));
   }

+  /**
+   * Create a {@link ConnectionCosts} from an external resource URL (e.g. from Classpath with {@link
+   * ClassLoader#getResource(String)}).
+   *
+   * @param connectionCostsUrl where to load connection costs resource
+   * @throws IOException if resource was not found or broken
+   */
+  public ConnectionCosts(URL connectionCostsUrl) throws IOException {
+    this(() -> connectionCostsUrl.openStream());

Review Comment:
I've recorded this as ignored for this pull request. If you change your mind, just comment `@sonatype-lift unignore`.

## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }

+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),
+        () -> posDictUrl.openStream(),

Review Comment:
I've recorded this as ignored for this pull request. If you change your mind, just comment `@sonatype-lift unignore`.
[GitHub] [lucene] uschindler commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865650474

## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/ConnectionCosts.java: ##
@@ -63,6 +65,17 @@ public ConnectionCosts(Path connectionCostsFile) throws IOException {
     this(() -> Files.newInputStream(connectionCostsFile));
   }

+  /**
+   * Create a {@link ConnectionCosts} from an external resource URL (e.g. from Classpath with {@link
+   * ClassLoader#getResource(String)}).
+   *
+   * @param connectionCostsUrl where to load connection costs resource
+   * @throws IOException if resource was not found or broken
+   */
+  public ConnectionCosts(URL connectionCostsUrl) throws IOException {
+    this(() -> connectionCostsUrl.openStream());

Review Comment:
@sonatype-lift ignore

## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }

+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),
+        () -> posDictUrl.openStream(),

Review Comment:
@sonatype-lift ignore
[GitHub] [lucene] uschindler commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865650662

## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }

+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),
+        () -> posDictUrl.openStream(),
+        () -> dictUrl.openStream(),

Review Comment:
@sonatype-lift ignore

## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }

+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),

Review Comment:
@sonatype-lift ignore
[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
sonatype-lift[bot] commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865650684

## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }

+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),
+        () -> posDictUrl.openStream(),
+        () -> dictUrl.openStream(),

Review Comment:
I've recorded this as ignored for this pull request. If you change your mind, just comment `@sonatype-lift unignore`.

## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }

+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),

Review Comment:
I've recorded this as ignored for this pull request. If you change your mind, just comment `@sonatype-lift unignore`.
[GitHub] [lucene] uschindler commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on code in PR #868: URL: https://github.com/apache/lucene/pull/868#discussion_r865650877 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UnknownDictionary.java: ## @@ -64,6 +66,20 @@ public UnknownDictionary(Path targetMapFile, Path posDictFile, Path dictFile) th () -> Files.newInputStream(dictFile)); } + /** + * Create a {@link UnknownDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @throws IOException if resource was not found or broken + */ + public UnknownDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl) throws IOException { +super( +() -> targetMapUrl.openStream(), () -> posDictUrl.openStream(), () -> dictUrl.openStream()); Review Comment: @sonatype-lift ignore ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ## @@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } + /** + * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). 
+ * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @param fstUrl where to load encoded FST data resource + * @throws IOException if resource was not found or broken + */ + public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl) + throws IOException { +this( +() -> targetMapUrl.openStream(), +() -> posDictUrl.openStream(), +() -> dictUrl.openStream(), +() -> fstUrl.openStream()); Review Comment: @sonatype-lift ignore -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
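For readers following the thread, the delegation used by the URL constructors quoted above can be sketched outside Lucene. This is only a minimal sketch under stated assumptions, not the PR's code: `StreamSupplier` and `Resource` are hypothetical stand-ins (Lucene's own type is `IOSupplier`), and a temporary file plays the role of a dictionary resource. It illustrates that the URL constructor merely wraps `url.openStream()` in a supplier, so the stream is opened only when the stream-based constructor calls `get()`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class UrlSupplierSketch {

  /** Hypothetical stand-in for an IOException-throwing supplier such as Lucene's IOSupplier. */
  @FunctionalInterface
  interface StreamSupplier {
    InputStream get() throws IOException;
  }

  /** Hypothetical resource holder (not a Lucene class). */
  static final class Resource {
    final byte[] data;

    /** Stream-based constructor: opens the stream here and closes it after reading. */
    Resource(StreamSupplier supplier) throws IOException {
      try (InputStream in = supplier.get()) {
        data = in.readAllBytes(); // requires Java 9 or newer
      }
    }

    /** URL-based constructor: only wraps the URL; nothing is opened until get() is called. */
    Resource(URL url) throws IOException {
      this(() -> url.openStream());
    }
  }

  /** Writes a temp file, loads it back through the URL constructor, and returns its content. */
  public static String demo() {
    try {
      Path tmp = Files.createTempFile("dict", ".dat");
      try {
        Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
        URL url = tmp.toUri().toURL();
        return new String(new Resource(url).data, StandardCharsets.UTF_8);
      } finally {
        Files.deleteIfExists(tmp);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

The same shape works for a classpath resource: pass the URL returned by `ClassLoader#getResource(String)` instead of the temp-file URL.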
[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
sonatype-lift[bot] commented on code in PR #868: URL: https://github.com/apache/lucene/pull/868#discussion_r865650926 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UnknownDictionary.java: ## @@ -64,6 +66,20 @@ public UnknownDictionary(Path targetMapFile, Path posDictFile, Path dictFile) th () -> Files.newInputStream(dictFile)); } + /** + * Create a {@link UnknownDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @throws IOException if resource was not found or broken + */ + public UnknownDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl) throws IOException { +super( +() -> targetMapUrl.openStream(), () -> posDictUrl.openStream(), () -> dictUrl.openStream()); Review Comment: I've recorded this as ignored for this pull request. If you change your mind, just comment `@sonatype-lift unignore`. ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ## @@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } + /** + * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). 
+ * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @param fstUrl where to load encoded FST data resource + * @throws IOException if resource was not found or broken + */ + public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl) + throws IOException { +this( +() -> targetMapUrl.openStream(), +() -> posDictUrl.openStream(), +() -> dictUrl.openStream(), +() -> fstUrl.openStream()); Review Comment: I've recorded this as ignored for this pull request. If you change your mind, just comment `@sonatype-lift unignore`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118281755 Can someone disable @sonatype-lift? It makes no sense for Lucene, as we are not a web server.
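As context for the SSRF warnings discussed in this thread: URLs obtained from `Class#getResource(String)` or `ClassLoader#getResource(String)` normally use local schemes rather than `http`. The sketch below is not part of the PR and not something Lucene does; it assumes a caller who wants an explicit check before handing a URL to one of the new constructors. The class name `ClasspathUrlSketch` and the allow-list of schemes are hypothetical and chosen for illustration.

```java
import java.net.URL;
import java.util.Set;

public class ClasspathUrlSketch {

  /** Assumed allow-list of schemes that resolve locally; adjust to your deployment. */
  private static final Set<String> LOCAL_PROTOCOLS = Set.of("file", "jar", "jrt");

  /** Returns true if the given URL scheme is in the assumed local allow-list. */
  public static boolean isLocalProtocol(String protocol) {
    return LOCAL_PROTOCOLS.contains(protocol);
  }

  /** Convenience overload that checks the scheme of a URL. */
  public static boolean isLocal(URL url) {
    return isLocalProtocol(url.getProtocol());
  }

  /** Resolves a resource relative to the anchor class; fails fast if it is missing. */
  public static URL resolve(Class<?> anchor, String name) {
    URL url = anchor.getResource(name);
    if (url == null) {
      throw new IllegalArgumentException("resource not found: " + name);
    }
    return url;
  }

  public static void main(String[] args) {
    // A JDK class file is used here only because it is always present at runtime.
    URL url = resolve(String.class, "String.class");
    System.out.println(url.getProtocol() + " local=" + isLocal(url));
  }
}
```

Whether such a check is worthwhile depends on where the URLs come from; for URLs produced by the application's own class loader it is likely redundant.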
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118282233 @sonatype-lift silence ignore shutup 👎
[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
sonatype-lift[bot] commented on code in PR #868: URL: https://github.com/apache/lucene/pull/868#discussion_r865656370 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ## @@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } + /** + * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @param fstUrl where to load encoded FST data resource + * @throws IOException if resource was not found or broken + */ + public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl) + throws IOException { +this( +() -> targetMapUrl.openStream(), +() -> posDictUrl.openStream(), Review Comment: *[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):* This web server request could be used by an attacker to expose internal services and filesystem. (at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with `help` or `ignore`)
## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ## @@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } + /** + * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @param fstUrl where to load encoded FST data resource + * @throws IOException if resource was not found or broken + */ + public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl) + throws IOException { +this( +() -> targetMapUrl.openStream(), +() -> posDictUrl.openStream(), +() -> dictUrl.openStream(), +() -> fstUrl.openStream()); Review Comment: *[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):* This web server request could be used by an attacker to expose internal services and filesystem. (at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with `help` or `ignore`)
## lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary.java: ## @@ -89,6 +92,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } + /** + * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @param fstUrl where to load encoded FST data resource + * @throws IOException if resource was not found or broken + */ + public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl) + throws IOException { +this( +() -> targetMapUrl.openStream(), +() -> posDictUrl.openStream(), Review Comment: *[URLCONNECTION_SSRF_FD](http
[GitHub] [lucene] uschindler commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on code in PR #868: URL: https://github.com/apache/lucene/pull/868#discussion_r865656501 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ## @@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } + /** + * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @param fstUrl where to load encoded FST data resource + * @throws IOException if resource was not found or broken + */ + public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl) + throws IOException { +this( +() -> targetMapUrl.openStream(), +() -> posDictUrl.openStream(), Review Comment: @sonatype-lift ignore -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
sonatype-lift[bot] commented on code in PR #868: URL: https://github.com/apache/lucene/pull/868#discussion_r865656677 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ## @@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } + /** + * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @param fstUrl where to load encoded FST data resource + * @throws IOException if resource was not found or broken + */ + public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl) + throws IOException { +this( +() -> targetMapUrl.openStream(), +() -> posDictUrl.openStream(), +() -> dictUrl.openStream(), Review Comment: *[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):* This web server request could be used by an attacker to expose internal services and filesystem. (at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with `help` or `ignore`)
## lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary.java: ## @@ -89,6 +92,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } + /** + * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @param fstUrl where to load encoded FST data resource + * @throws IOException if resource was not found or broken + */ + public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl) + throws IOException { +this( +() -> targetMapUrl.openStream(), +() -> posDictUrl.openStream(), +() -> dictUrl.openStream(), +() -> fstUrl.openStream()); Review Comment: *[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):* This web server request could be used by an attacker to expose internal services and filesystem. (at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with `help` or `ignore`)
## lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary.java: ## @@ -89,6 +92,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } + /** + * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @param fstUrl where to load encoded FST data resource + * @throws IOException if resource was not found or broken + */ + public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl) + throws IOException { +this( +() -> targetMapUrl.openStream(), +() -> posDictUrl.openStream(), Review Comment:
[GitHub] [lucene] uschindler commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on code in PR #868: URL: https://github.com/apache/lucene/pull/868#discussion_r865656735 ## lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary.java: ## @@ -89,6 +92,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } + /** + * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @param fstUrl where to load encoded FST data resource + * @throws IOException if resource was not found or broken + */ + public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl) + throws IOException { +this( +() -> targetMapUrl.openStream(), +() -> posDictUrl.openStream(), Review Comment: @sonatype-lift ignore ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UnknownDictionary.java: ## @@ -64,6 +66,20 @@ public UnknownDictionary(Path targetMapFile, Path posDictFile, Path dictFile) th () -> Files.newInputStream(dictFile)); } + /** + * Create a {@link UnknownDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @throws IOException if resource was not found or broken + */ + public UnknownDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl) throws IOException { +super( +() -> targetMapUrl.openStream(), () -> posDictUrl.openStream(), () -> dictUrl.openStream()); Review Comment: @sonatype-lift ignore -- This is an automated message from the Apache Git Service. 
[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
sonatype-lift[bot] commented on code in PR #868: URL: https://github.com/apache/lucene/pull/868#discussion_r865656822 ## lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary.java: ## @@ -89,6 +92,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } + /** + * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @param fstUrl where to load encoded FST data resource + * @throws IOException if resource was not found or broken + */ + public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl) + throws IOException { +this( +() -> targetMapUrl.openStream(), +() -> posDictUrl.openStream(), +() -> dictUrl.openStream(), Review Comment: *[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):* This web server request could be used by an attacker to expose internal services and filesystem. (at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with `help` or `ignore`)
## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/ConnectionCosts.java: ## @@ -63,6 +65,17 @@ public ConnectionCosts(Path connectionCostsFile) throws IOException { this(() -> Files.newInputStream(connectionCostsFile)); } + /** + * Create a {@link ConnectionCosts} from an external resource URL (e.g. from Classpath with {@link + * ClassLoader#getResource(String)}). + * + * @param connectionCostsUrl where to load connection costs resource + * @throws IOException if resource was not found or broken + */ + public ConnectionCosts(URL connectionCostsUrl) throws IOException { +this(() -> connectionCostsUrl.openStream()); Review Comment: *[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):* This web server request could be used by an attacker to expose internal services and filesystem. (at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with `help` or `ignore`)
## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UnknownDictionary.java: ## @@ -64,6 +66,20 @@ public UnknownDictionary(Path targetMapFile, Path posDictFile, Path dictFile) th () -> Files.newInputStream(dictFile)); } + /** + * Create a {@link UnknownDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @throws IOException if resource was not found or broken + */ + public UnknownDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl) throws IOException { +super( +() -> targetMapUrl.openStream(), () -> posDictUrl.openStream(), () -> dictUrl.openStream()); Review Comment: I've recorded this as ignored for this pull request. If you change your mind, just comment `@sonatype-lift unignore`. ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ## @@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } +
[GitHub] [lucene] uschindler commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on code in PR #868: URL: https://github.com/apache/lucene/pull/868#discussion_r865656911 ## lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary.java: ## @@ -89,6 +92,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } + /** + * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @param fstUrl where to load encoded FST data resource + * @throws IOException if resource was not found or broken + */ + public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl) + throws IOException { +this( +() -> targetMapUrl.openStream(), +() -> posDictUrl.openStream(), +() -> dictUrl.openStream(), +() -> fstUrl.openStream()); Review Comment: @sonatype-lift ignore ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/ConnectionCosts.java: ## @@ -63,6 +65,17 @@ public ConnectionCosts(Path connectionCostsFile) throws IOException { this(() -> Files.newInputStream(connectionCostsFile)); } + /** + * Create a {@link ConnectionCosts} from an external resource URL (e.g. from Classpath with {@link + * ClassLoader#getResource(String)}). + * + * @param connectionCostsUrl where to load connection costs resource + * @throws IOException if resource was not found or broken + */ + public ConnectionCosts(URL connectionCostsUrl) throws IOException { +this(() -> connectionCostsUrl.openStream()); Review Comment: @sonatype-lift ignore -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
sonatype-lift[bot] commented on code in PR #868: URL: https://github.com/apache/lucene/pull/868#discussion_r865657065 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/ConnectionCosts.java: ## @@ -63,6 +65,17 @@ public ConnectionCosts(Path connectionCostsFile) throws IOException { this(() -> Files.newInputStream(connectionCostsFile)); } + /** + * Create a {@link ConnectionCosts} from an external resource URL (e.g. from Classpath with {@link + * ClassLoader#getResource(String)}). + * + * @param connectionCostsUrl where to load connection costs resource + * @throws IOException if resource was not found or broken + */ + public ConnectionCosts(URL connectionCostsUrl) throws IOException { +this(() -> connectionCostsUrl.openStream()); Review Comment: I've recorded this as ignored for this pull request. If you change your mind, just comment `@sonatype-lift unignore`. ## lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/ConnectionCosts.java: ## @@ -63,6 +66,17 @@ public ConnectionCosts(Path connectionCostsFile) throws IOException { this(() -> Files.newInputStream(connectionCostsFile)); } + /** + * Create a {@link ConnectionCosts} from an external resource URL (e.g. from Classpath with {@link + * ClassLoader#getResource(String)}). + * + * @param connectionCostsUrl where to load connection costs resource + * @throws IOException if resource was not found or broken + */ + public ConnectionCosts(URL connectionCostsUrl) throws IOException { +this(() -> connectionCostsUrl.openStream()); Review Comment: I've recorded this as ignored for this pull request. If you change your mind, just comment `@sonatype-lift unignore`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [lucene] uschindler commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on code in PR #868: URL: https://github.com/apache/lucene/pull/868#discussion_r865657121 ## lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/ConnectionCosts.java: ## @@ -63,6 +66,17 @@ public ConnectionCosts(Path connectionCostsFile) throws IOException { this(() -> Files.newInputStream(connectionCostsFile)); } + /** + * Create a {@link ConnectionCosts} from an external resource URL (e.g. from Classpath with {@link + * ClassLoader#getResource(String)}). + * + * @param connectionCostsUrl where to load connection costs resource + * @throws IOException if resource was not found or broken + */ + public ConnectionCosts(URL connectionCostsUrl) throws IOException { +this(() -> connectionCostsUrl.openStream()); Review Comment: @sonatype-lift ignore ## lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/UnknownDictionary.java: ## @@ -62,6 +65,20 @@ public UnknownDictionary(Path targetMapFile, Path posDictFile, Path dictFile) th () -> Files.newInputStream(dictFile)); } + /** + * Create a {@link UnknownDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @throws IOException if resource was not found or broken + */ + public UnknownDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl) throws IOException { +super( +() -> targetMapUrl.openStream(), () -> posDictUrl.openStream(), () -> dictUrl.openStream()); Review Comment: @sonatype-lift ignore -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
sonatype-lift[bot] commented on code in PR #868: URL: https://github.com/apache/lucene/pull/868#discussion_r865657342 ## lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/UnknownDictionary.java: ## @@ -62,6 +65,20 @@ public UnknownDictionary(Path targetMapFile, Path posDictFile, Path dictFile) th () -> Files.newInputStream(dictFile)); } + /** + * Create a {@link UnknownDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @throws IOException if resource was not found or broken + */ + public UnknownDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl) throws IOException { +super( +() -> targetMapUrl.openStream(), () -> posDictUrl.openStream(), () -> dictUrl.openStream()); Review Comment: I've recorded this as ignored for this pull request. If you change your mind, just comment `@sonatype-lift unignore`. ## lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary.java: ## @@ -89,6 +92,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } + /** + * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @param fstUrl where to load encoded FST data resource + * @throws IOException if resource was not found or broken + */ + public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl) + throws IOException { +this( +() -> targetMapUrl.openStream(), Review Comment: I've recorded this as ignored for this pull request. 
If you change your mind, just comment `@sonatype-lift unignore`.
[GitHub] [lucene] uschindler commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on code in PR #868: URL: https://github.com/apache/lucene/pull/868#discussion_r865657460 ## lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary.java: ## @@ -89,6 +92,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } + /** + * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @param fstUrl where to load encoded FST data resource + * @throws IOException if resource was not found or broken + */ + public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl) + throws IOException { +this( +() -> targetMapUrl.openStream(), Review Comment: @sonatype-lift ignore ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ## @@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } + /** + * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @param fstUrl where to load encoded FST data resource + * @throws IOException if resource was not found or broken + */ + public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl) + throws IOException { +this( +() -> targetMapUrl.openStream(), Review Comment: @sonatype-lift ignore -- This is an automated message from the Apache Git Service. 
[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
sonatype-lift[bot] commented on code in PR #868: URL: https://github.com/apache/lucene/pull/868#discussion_r865657636 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ## @@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } + /** + * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @param fstUrl where to load encoded FST data resource + * @throws IOException if resource was not found or broken + */ + public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl) + throws IOException { +this( +() -> targetMapUrl.openStream(), Review Comment: I've recorded this as ignored for this pull request. If you change your mind, just comment `@sonatype-lift unignore`. ## lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary.java: ## @@ -89,6 +92,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } + /** + * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). 
+ * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @param fstUrl where to load encoded FST data resource + * @throws IOException if resource was not found or broken + */ + public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl) + throws IOException { +this( +() -> targetMapUrl.openStream(), +() -> posDictUrl.openStream(), +() -> dictUrl.openStream(), Review Comment: I've recorded this as ignored for this pull request. If you change your mind, just comment `@sonatype-lift unignore`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on code in PR #868: URL: https://github.com/apache/lucene/pull/868#discussion_r865657741 ## lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary.java: ## @@ -89,6 +92,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } + /** + * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @param fstUrl where to load encoded FST data resource + * @throws IOException if resource was not found or broken + */ + public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl) + throws IOException { +this( +() -> targetMapUrl.openStream(), +() -> posDictUrl.openStream(), +() -> dictUrl.openStream(), Review Comment: @sonatype-lift ignore -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on code in PR #868: URL: https://github.com/apache/lucene/pull/868#discussion_r865658630 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ## @@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } + /** + * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @param fstUrl where to load encoded FST data resource + * @throws IOException if resource was not found or broken + */ + public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl) + throws IOException { +this( +() -> targetMapUrl.openStream(), +() -> posDictUrl.openStream(), +() -> dictUrl.openStream(), Review Comment: @sonatype-lift ignore -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
sonatype-lift[bot] commented on code in PR #868: URL: https://github.com/apache/lucene/pull/868#discussion_r865658659 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java: ## @@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile, () -> Files.newInputStream(fstFile)); } + /** + * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with + * {@link ClassLoader#getResource(String)}). + * + * @param targetMapUrl where to load target map resource + * @param posDictUrl where to load POS dictionary resource + * @param dictUrl where to load dictionary entries resource + * @param fstUrl where to load encoded FST data resource + * @throws IOException if resource was not found or broken + */ + public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl) + throws IOException { +this( +() -> targetMapUrl.openStream(), +() -> posDictUrl.openStream(), +() -> dictUrl.openStream(), Review Comment: I've recorded this as ignored for this pull request. If you change your mind, just comment `@sonatype-lift unignore`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118309403 @mocobeta should we maybe add a note to the deprecated javadocs that the deprecated ctor no longer works with classpath resources and always loads the default one? Or should we change it to behave like before?
[GitHub] [lucene] mocobeta commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
mocobeta commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118313342 Thank you @uschindler, I am fine with this!
> This is against 9.x branch, but can be forward ported to main.
I'm inclined to add these URL-based constructors only to 9.x as a temporary remedy; I'm afraid that people will abuse this (like opening TCP sockets to load resources from another host...).
[GitHub] [lucene] mocobeta commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
mocobeta commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118316633
> should we maybe add a note to the deprecated javadocs that the deprecated ctor no longer works with classpath resources and always loads the default one?
> Or should we change it to behave like before?
Actually, I haven't used these constructors myself. When needed, I simply regenerate the resources and rebuild the kuromoji jar. Maybe @msokolov would know what we should do about it?
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118320435
> I'm inclined to add these URL-based constructors only to 9.x as a temporary remedy; I'm afraid that people will abuse this (like opening TCP sockets to load resources from another host...).
If you supply Path-based ctors, the URL-based ones are also needed to support loading resources from *anywhere*. The problem that @msokolov hit is that you can "officially" only support Path-based resources, which are not usable with modules or the classpath. That's no different in main.
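As a rough illustration of the point above, the following sketch shows how a Path-based and a URL-based constructor can both delegate to one stream-supplier constructor, mirroring the shape of the diff in this PR. `StreamSupplier`, `CostsSketch`, and `CostsSketchDemo` are hypothetical names standing in for Lucene's `IOSupplier` and dictionary classes; they are not the real API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for an IOSupplier-like interface, defined so the sketch compiles alone. */
interface StreamSupplier {
  InputStream get() throws IOException;
}

/** Hypothetical dictionary-like class; NOT the real ConnectionCosts. */
final class CostsSketch {
  final byte[] data;

  // The only constructor that actually opens and reads the stream.
  private CostsSketch(StreamSupplier resource) throws IOException {
    try (InputStream in = resource.get()) {
      this.data = in.readAllBytes();
    }
  }

  /** Load from a file on disk. */
  CostsSketch(Path file) throws IOException {
    this(() -> Files.newInputStream(file));
  }

  /** Load from any URL, e.g. one returned by ClassLoader#getResource(String). */
  CostsSketch(URL url) throws IOException {
    this(() -> url.openStream());
  }
}

final class CostsSketchDemo {
  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("costs", ".bin");
    try {
      Files.write(tmp, new byte[] {1, 2, 3});
      CostsSketch fromPath = new CostsSketch(tmp);
      // A file URL is used here only so the demo needs no classpath resource.
      CostsSketch fromUrl = new CostsSketch(tmp.toUri().toURL());
      System.out.println(fromPath.data.length == fromUrl.data.length);
    } finally {
      Files.deleteIfExists(tmp);
    }
  }
}
```

A classpath resource would be passed the same way, for example `new CostsSketch(SomeClass.class.getResource("costs.bin"))`, where both names are placeholders. `getResource` returns `null` when the resource is missing, so a caller would want to check for that before constructing.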
[GitHub] [lucene] mocobeta commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
mocobeta commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118326675 (As for sonatype-lift bot, it repeatedly reports the same warnings per push; I once tried to silence it and gave up... I'd agree with disabling it for lucene.)
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118328012 But yes, one workaround for Mike is to rebuild the JAR files. An alternative that always works (outside the module system) is to place the resources in a separate JAR file added to the classpath BEFORE the main kuromoji JAR. Nevertheless, this is NOT a temporary thing: We must either add URL-based ctors or we must remove all specialized ctors and make the IOSupplier one public. Allowing only Path (files) for customization is not a well-designed API.
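The second option mentioned above, a single public supplier-based constructor, might look roughly like this. `ResourceSupplier`, `DictionarySketch`, and `DictionarySketchDemo` are hypothetical stand-ins, not Lucene classes; the sketch only shows that callers would then decide where the bytes come from.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical supplier type; stands in for an IOSupplier-like interface. */
interface ResourceSupplier {
  InputStream get() throws IOException;
}

/** Hypothetical dictionary-like class with one public, source-agnostic constructor. */
final class DictionarySketch {
  final int size;

  public DictionarySketch(ResourceSupplier resource) throws IOException {
    // The stream is opened lazily by the supplier and always closed here.
    try (InputStream in = resource.get()) {
      this.size = in.readAllBytes().length;
    }
  }
}

final class DictionarySketchDemo {
  public static void main(String[] args) throws IOException {
    // An in-memory source; a file or URL stream could be supplied the same way.
    DictionarySketch d =
        new DictionarySketch(() -> new ByteArrayInputStream(new byte[] {1, 2, 3}));
    System.out.println(d.size);
  }
}
```

With this shape, no Path- or URL-specific overloads are needed, at the cost of callers writing the lambda themselves.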
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118329537
> (As for sonatype-lift bot, it repeatedly reports the same warnings per push; I once tried to silence it and gave up... I'd agree with disabling it for lucene.)
Especially as those warnings are complete nonsense. If you read the description, we are not doing anything mentioned there. We get a URL from outside code. Filtering or ensuring that the URL is not coming from untrusted sources is not our responsibility. One could also pass a Path object with /etc/passwd.
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118332065 See [TikaInputStream](https://tika.apache.org/1.28.2/api/org/apache/tika/io/TikaInputStream.html#get-java.net.URL-org.apache.tika.metadata.Metadata-) for an example that allows URL next to Path (and many others).
[GitHub] [lucene] mocobeta commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
mocobeta commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118340545 Personally, I don't think we should allow users to load resources from anywhere... It's not the sensible way to load the dictionary resources as far as I know. If you need external resources that are not on locally accessible paths, you should simply download or install them beforehand, then load them from the file - this is the convention I've seen so far. However, it's okay with me if we should support them as a general-purpose API.
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118346475
> Personally, I don't think we should allow users to load resources from anywhere... It's not the sensible way to load the dictionary resources as far as I know. If you need external resources that are not on locally accessible paths, you should simply download or install them beforehand, then load them from the file - this is the convention I've seen so far.
>
> However, it's okay with me if we should support them as a general-purpose API.
It is also common to have them in JAR files. And if we want them to be supported in Solr, we would need to support URL, too. SolrResourceLoader also returns URL.
[GitHub] [lucene] romseygeek merged pull request #860: LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty.
romseygeek merged PR #860: URL: https://github.com/apache/lucene/pull/860
[jira] [Commented] (LUCENE-10553) WANDScorer's handling of 0 and +Infty is backwards
[ https://issues.apache.org/jira/browse/LUCENE-10553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532157#comment-17532157 ] ASF subversion and git services commented on LUCENE-10553: -- Commit 26301898b20e30f484d653ab6415d460d011b099 in lucene's branch refs/heads/main from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=26301898b20 ] LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. (#860) The computation of the scaling factor has special cases for these two values, but the current logic is backwards. > WANDScorer's handling of 0 and +Infty is backwards > -- > > Key: LUCENE-10553 > URL: https://issues.apache.org/jira/browse/LUCENE-10553 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > WANDScorer has special logic to deal with corner cases when the sum of the > maximum scores of sub queries is either 0 or +Infty, but both code and tests > have backwards logic regarding this special case, doing +1 instead of -1 and > vice-versa. > This leads to a failed assertion in the case when the sum of the scores of > the sub queries overflows, which typically happens if one of the clauses has > a default implementation that returns MAX_VALUE if it cannot reason about max > scores. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
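To make the two special cases concrete, here is a minimal sketch of a scaling-factor computation of the kind the issue describes. It is not copied from the commit: the class name, the 24-bit constant, and in particular the direction of the two adjustments are assumptions, derived from the observation that a larger score needs a smaller power-of-two factor to land in the same integer range.

```java
/** Hypothetical illustration only; not the actual WANDScorer code. */
final class ScalingSketch {
  // A float has 24 significant bits (23 stored plus the implicit leading one).
  private static final int FLOAT_MANTISSA_BITS = 24;

  /**
   * For positive finite f, returns k such that f * 2^k lies in [2^23, 2^24).
   * Zero and +Infinity get factors just outside the range reachable by finite values.
   */
  static int scalingFactor(float f) {
    if (f < 0 || Float.isNaN(f)) {
      throw new IllegalArgumentException("Scores must be non-negative, got " + f);
    }
    if (f == 0) {
      // Smaller than every positive float, so it needs a larger factor.
      return scalingFactor(Float.MIN_VALUE) + 1;
    }
    if (Float.isInfinite(f)) {
      // Larger than every finite float, so it needs a smaller factor.
      return scalingFactor(Float.MAX_VALUE) - 1;
    }
    // Widening to double turns subnormal floats into normal doubles,
    // so getExponent reports their true binary exponent.
    double d = f;
    return FLOAT_MANTISSA_BITS - 1 - Math.getExponent(d);
  }
}
```

Swapping the `+ 1` and `- 1` would break the monotonic relationship between score and factor, which is one way the "backwards" behaviour described in the issue could arise.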
[GitHub] [lucene] mocobeta commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
mocobeta commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118353023 Thanks for explaining, then I don't have an objection to forward porting; maybe we'll need it to make it possible to load the resources from another jar.
[GitHub] [lucene] romseygeek opened a new pull request, #869: LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues
romseygeek opened a new pull request, #869: URL: https://github.com/apache/lucene/pull/869 When this was refactored previously, we moved a public static method from DocValuesFieldExistsQuery to the package-private DocValuesIterator class. This makes the method available again by moving it instead to the public DocValues utility class.
[GitHub] [lucene] romseygeek commented on pull request #869: LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues
romseygeek commented on PR #869: URL: https://github.com/apache/lucene/pull/869#issuecomment-1118356099 When I backport I'll add a deprecated forwarding method to DocValuesFieldExistsQuery again, to make it a bit more obvious on how to migrate.
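The deprecated forwarding method mentioned above could be sketched as follows. `OldQuerySketch`, `NewHomeSketch`, and `describeIterator` are hypothetical placeholders, not the actual Lucene classes or method; the sketch only shows the delegation pattern.

```java
/** Hypothetical new, public home of the helper. */
final class NewHomeSketch {
  static String describeIterator(String field) {
    return "iterator over doc values of field '" + field + "'";
  }
}

/** Hypothetical old location that keeps existing callers compiling. */
final class OldQuerySketch {
  /**
   * @deprecated Use {@link NewHomeSketch#describeIterator(String)} instead.
   */
  @Deprecated
  static String describeIterator(String field) {
    // Pure forwarding: no logic lives here any more.
    return NewHomeSketch.describeIterator(field);
  }
}
```

Existing call sites keep working and get a deprecation warning that points at the replacement.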
[GitHub] [lucene] romseygeek commented on pull request #869: LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues
romseygeek commented on PR #869: URL: https://github.com/apache/lucene/pull/869#issuecomment-1118357618 Having said that, looking at the existing MIGRATE instructions, maybe the most sensible thing is to have this method on FieldExistsQuery directly? Then existing code that references the deprecated class continues to work in 9x, and it's a fairly simple search-and-replace for upgrading?
[jira] [Commented] (LUCENE-10553) WANDScorer's handling of 0 and +Infty is backwards
[ https://issues.apache.org/jira/browse/LUCENE-10553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532161#comment-17532161 ] ASF subversion and git services commented on LUCENE-10553: -- Commit efa5d6f4d4354ae87a1e6144dc70aeb52b98bfd2 in lucene's branch refs/heads/branch_9x from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=efa5d6f4d43 ] LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. (#860) The computation of the scaling factor has special cases for these two values, but the current logic is backwards. > WANDScorer's handling of 0 and +Infty is backwards > -- > > Key: LUCENE-10553 > URL: https://issues.apache.org/jira/browse/LUCENE-10553 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > WANDScorer has special logic to deal with corner cases when the sum of the > maximum scores of sub queries is either 0 or +Infty, but both code and tests > have backwards logic regarding this special case, doing +1 instead of -1 and > vice-versa. > This leads to a failed assertion in the case when the sum of the scores of > the sub queries overflows, which typically happens if one of the clauses has > a default implementation that returns MAX_VALUE if it cannot reason about max > scores. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118408717 OK. I opened this PR against 9.x because it makes it easier to add the changes in deprecation messages. When forward porting just tell the merge/cherrypick on main to "use theirs".
[GitHub] [lucene] rmuir commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
rmuir commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118416936 what is going on here? why are we allowing such stuff?
[GitHub] [lucene] rmuir commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
rmuir commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118418901

> Personally, I don't think we should allow users to load resources from anywhere... It's not the sensible way to load the dictionary resources as far as I know. If you need external resources that are not on locally accessible paths, you should simply download or install them beforehand, then load them from the file - this is the convention I've seen so far.

+1, this is how I feel. I don't think we should be supporting Path/URL APIs. Sorry, this is really wrong. E.g. for ConnectionCosts, the only ctor that we need is `ConnectionCosts()`. Load from jar.
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118422000

In short: Some people like Mike Sokolov at Amazon want to load custom ConnectionCosts and TermInfoDicts and Unk dicts from custom resources. This was working previously with file system and with classloader. The classloading API uses URL as resource, while files use Path objects. So IMHO:
- either remove both "Path and URL" ctors (my preference)
- or add both (like done here to support Solr, Elasticsearch and Amazon to load dictionaries from custom JAR files)
- or just add `IOSupplier` ctor, but that's even worse as this is an internal class and should not be in public APIs.

The question I also ask @msokolov: Why the hell does anybody want to modify the dictionaries that are in a highly proprietary format? To generate those files, you need to generate them with Gradle anyway (the FST, the compiler for connection costs). So anybody can just compile and package their own JAR file.

So I agree with Robert here: Why do we need to make it customizable? There's no added value in providing some proprietary, non-standardized file formats to the ctor as external resource. This was a bug already in early versions of Kuromoji.
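The three options listed above differ only in how the caller names the resource. A minimal sketch, assuming a dictionary class that ultimately just reads bytes from a stream: `DictionarySketch`, `StreamSupplier`, `fromPath` and `fromUrl` are hypothetical names for illustration, and `StreamSupplier` is a local stand-in for a supplier that may throw `IOException`, not Lucene's own class.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for a dictionary class; all names are illustrative only. */
final class DictionarySketch {
  /** Local stand-in for a stream supplier that may throw IOException. */
  @FunctionalInterface
  interface StreamSupplier {
    InputStream get() throws IOException;
  }

  private final byte[] data;

  /** File-system variant: the Path only names the resource. */
  static DictionarySketch fromPath(Path file) throws IOException {
    return new DictionarySketch(() -> Files.newInputStream(file));
  }

  /** Classpath/module variant: the URL is what getResource() lookups hand back. */
  static DictionarySketch fromUrl(URL url) throws IOException {
    return new DictionarySketch(url::openStream);
  }

  /** Both public variants funnel into one private stream-based constructor. */
  private DictionarySketch(StreamSupplier supplier) throws IOException {
    try (InputStream in = supplier.get()) {
      this.data = in.readAllBytes();
    }
  }

  int size() {
    return data.length;
  }

  /** Loads the same temporary file through both entry points and returns both sizes. */
  static int[] demo() {
    try {
      Path tmp = Files.createTempFile("dict", ".dat");
      try {
        Files.write(tmp, new byte[] {1, 2, 3});
        return new int[] {fromPath(tmp).size(), fromUrl(tmp.toUri().toURL()).size()};
      } finally {
        Files.deleteIfExists(tmp);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    int[] sizes = demo();
    System.out.println("via Path: " + sizes[0] + " bytes, via URL: " + sizes[1] + " bytes");
  }
}
```

Keeping the supplier private is one way to read the third bullet: the stream abstraction stays an implementation detail while `Path` and `URL` remain the only public handles.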
[GitHub] [lucene] rmuir commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
rmuir commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118425532 yes, please, let's remove any File, Path, URL, whatever ctors. The code is open-source if amazon wants to build a custom crazy jar. We can't make all the APIs complicated and unusable for such stuff.
[GitHub] [lucene] msokolov commented on a diff in pull request #867: LUCENE-10558: expose stream-based Kuromoji resource constructors
msokolov commented on code in PR #867: URL: https://github.com/apache/lucene/pull/867#discussion_r865801802

## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UnknownDictionary.java:
##
@@ -52,18 +52,26 @@ private UnknownDictionary() throws IOException {
         () -> getClassResource(DICT_FILENAME_SUFFIX));
   }

-  private UnknownDictionary(
-      IOSupplier targetMapResource,
-      IOSupplier posResource,
-      IOSupplier dictResource)
+  /**
+   * Create a {@link UnknownDictionary} from an external resource path.
+   *
+   * @param targetMap supplier for stream containing target map
+   * @param posDict supplier for stream containing POS dictionary
+   * @param dict supplier for stream containing dictionary entries
+   * @throws IOException if a stream could not be read
+   */
+  public UnknownDictionary(
+      IOSupplier targetMap,
+      IOSupplier posDict,
+      IOSupplier dict)

Review Comment:
it's another day, I can no longer confirm nor deny, but Uwe's explanation makes sense to me :) If we keep this change, I'd be fine with the resource naming too, although it does have that classpath connotation?
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118442012 Let's give @msokolov a chance to comment.
[GitHub] [lucene] msokolov commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
msokolov commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118443078

> In case you ask: This works both with classpath and module usage. The caller-sensitive parts are `Class#getResource(String)`, `ClassLoader#getResource(String)`, and `Module#getResource(String)`. The returned URL is free to use anywhere so it separates concerns like the `IOSupplier` or `Path`.

I was going to ask about this :) Didn't know about Module#getResource ... Anyway this approach seems sound. Although it's a little less general than the Stream-based approach, it does handle all the known use cases cleanly.
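The separation of concerns described in the quote can be sketched as follows: the lookup is the caller-sensitive step, and the returned URL is then an ordinary handle that any code can open. This sketch uses only `Class#getResource(String)` and `URL#openStream()`; `ResourceUrlSketch` and `readMagic` are hypothetical names, and the JDK's own `Object.class` file is used as the resource purely so that the example needs no extra files.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;

/** Hypothetical helper showing the lookup step and the read step separately. */
final class ResourceUrlSketch {
  /** Resolves {@code name} relative to {@code anchor} and returns the first four bytes as an int. */
  static int readMagic(Class<?> anchor, String name) {
    // Step 1: lookup through the class; returns null when the resource is not found.
    URL url = anchor.getResource(name);
    if (url == null) {
      throw new IllegalArgumentException("resource not found: " + name);
    }
    // Step 2: from here on the URL is a plain handle, independent of who looked it up.
    try (DataInputStream in = new DataInputStream(url.openStream())) {
      return in.readInt();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Every class file starts with the magic number 0xCAFEBABE.
    System.out.printf("0x%08X%n", readMagic(Object.class, "Object.class"));
  }
}
```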
[GitHub] [lucene] rmuir commented on pull request #867: LUCENE-10558: expose stream-based Kuromoji resource constructors
rmuir commented on PR #867: URL: https://github.com/apache/lucene/pull/867#issuecomment-1118443273 I don't think we should do this, same reasons as stated on #868. These things should be loaded from jar as singletons and that's it.
[GitHub] [lucene] msokolov commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
msokolov commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118449412 Oh, I missed Robert's objections. Sorry, I don't understand the problem here. The way Kuromoji works, it uses a language model that is trained from a corpus of text to do tokenization. We just want to use a different model trained on a different set of text. I'm not clear why that is seen as a bug. It's not a new file format; it's different contents using the existing file format. The format is not proprietary, it was promoted by Mecab I think, which is the tool used to train the dictionary, and is open-source.
[GitHub] [lucene] rmuir commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
rmuir commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118450647

> Oh, I missed Robert's objections. Sorry, I don't understand the problem here. The way Kuromoji works, it uses a language model that is trained from a corpus of text to do tokenization. We just want to use a different model trained on a different set of text.

Use the gradle build to make a jar then.
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118457174

> The format is not proprietary, it was promoted by Mecab I think, which is the tool used to train the dictionary, and is open-source.

It is proprietary because the FST in the TokenInfoDict is a lucene specific version. The format may change and due to changes in the algorithm we may have a different format. We have no version numbers in the file format.
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118462741 I will leave that PR open for discussion. I just implemented the minimal approach + test to make the API compatible to classpath. We can still allow to use the IOSuppliers, but then we must remove the Path ctor, too. And we need a PR for both Nori and Kuromoji.
[jira] [Comment Edited] (LUCENE-10151) Add timeout support to IndexSearcher
[ https://issues.apache.org/jira/browse/LUCENE-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532223#comment-17532223 ]

Deepika Sharma edited comment on LUCENE-10151 at 5/5/22 12:16 PM:
--
I am exploring adding timeout support to the {{IndexSearcher}} by using {{ExitableDirectoryReader}}. However, one issue with {{ExitableDirectoryReader}} is that it enforces timeout checking at the time of instantiating {{BulkScorer}} and doesn't actually enforce it once you start iterating postings/impacts. This is being discussed in LUCENE-10544.

I want to ask if there are any suggestions on alternative ways to approach this problem that I should consider?

was (Author: JIRAUSER288832):
I am exploring adding timeout support to the {{IndexSearcher}} by using {{ExitableDirectoryReader}}. However, one issue with {{ExitableDirectoryReader}} is that it enforces timeout checking at the time of instantiating {{BulkScorer}} and doesn't actually enforce it once you start iterating postings/impacts. This is being discussed in [LUCENE-10544|https://issues.apache.org/jira/browse/LUCENE-10544].

I want to ask if there are any suggestions on alternative ways to approach this problem that I should consider?

> Add timeout support to IndexSearcher
> --
>
> Key: LUCENE-10151
> URL: https://issues.apache.org/jira/browse/LUCENE-10151
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Reporter: Greg Miller
> Priority: Minor
>
> I'd like to explore adding optional "timeout" capabilities to {{IndexSearcher}}. This would enable users to (optionally) specify a maximum time budget for search execution. If the search "times out", partial results would be available.
> This idea originated on the dev list (thanks [~jpountz] for the suggestion).
> Thread for reference:
> [http://mail-archives.apache.org/mod_mbox/lucene-dev/202110.mbox/%3CCAL8PwkZdNGmYJopPjeXYK%3DF7rvLkWon91UEXVxMM4MeeJ3UHxQ%40mail.gmail.com%3E]
>
> A couple things to watch out for with this change:
> # We want to make sure it's robust to a two-phase query evaluation scenario where the "approximate" step matches a large number of candidates but the "confirmation" step matches very few (or none). This is a particularly tricky case.
> # We want to make sure the {{TotalHits#Relation}} reported by {{TopDocs}} is {{GREATER_THAN_OR_EQUAL_TO}} if the query times out
> # We want to make sure it plays nice with the {{LRUCache}} since it iterates the query to pre-populate a {{BitSet}} when caching. That step shouldn't be allowed to overrun the timeout. The proper way to handle this probably needs some thought.
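The first two points on the watch list can be sketched without any Lucene classes. This is a minimal illustration under stated assumptions, not a proposed implementation: `TimeoutSketch`, `Result`, `count`, and the `checkInterval` parameter are hypothetical, and the local `Relation` enum only mirrors the two names used by {{TotalHits#Relation}}. The point it shows is that the deadline is tested per candidate visited rather than per confirmed hit, so a query whose approximation matches many documents but whose confirmation matches few still stops on time.

```java
import java.util.function.BooleanSupplier;
import java.util.function.IntPredicate;

/** Hypothetical sketch of deadline checks inside the iteration loop; not Lucene code. */
final class TimeoutSketch {
  /** Local mirror of the two relation names mentioned in the issue. */
  enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

  /** Partial or complete hit count together with how to interpret it. */
  static final class Result {
    final int hits;
    final Relation relation;

    Result(int hits, Relation relation) {
      this.hits = hits;
      this.relation = relation;
    }
  }

  /**
   * Counts candidates accepted by {@code confirm}, polling {@code timedOut}
   * every {@code checkInterval} candidates (must be >= 1).
   */
  static Result count(
      int[] candidates, IntPredicate confirm, BooleanSupplier timedOut, int checkInterval) {
    int hits = 0;
    for (int i = 0; i < candidates.length; i++) {
      // Poll on candidates visited, not on hits, so that a selective
      // confirmation step cannot starve the timeout check.
      if (i % checkInterval == 0 && timedOut.getAsBoolean()) {
        // Remaining candidates were never examined: the count is a lower bound.
        return new Result(hits, Relation.GREATER_THAN_OR_EQUAL_TO);
      }
      if (confirm.test(candidates[i])) {
        hits++;
      }
    }
    return new Result(hits, Relation.EQUAL_TO);
  }

  public static void main(String[] args) {
    long deadline = System.nanoTime() + 1_000_000L; // assumed budget of 1 ms
    int[] docs = new int[1_000_000];
    Result r = count(docs, doc -> false, () -> System.nanoTime() > deadline, 1024);
    System.out.println(r.hits + " hits, relation " + r.relation);
  }
}
```

The third point (cache pre-population) is not covered here; it would need the same kind of check inside whatever loop fills the cached bit set.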
[GitHub] [lucene] rmuir commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
rmuir commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118480829 Uwe explains it clearly: such APIs are impossible to support correctly and we don't need to add backwards-compatibility or anything else. Files such as `ConnectionCosts.dat` are not standardized or anything like that; these are internal details. If we want to make a PR to save 1 byte compressing this file a bit better, we should be able to merge it without hesitation or worrying about back compat or any other insanity. We can't and shouldn't support ctors taking binary versions of this stuff.
[GitHub] [lucene] rmuir commented on pull request #869: LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues
rmuir commented on PR #869: URL: https://github.com/apache/lucene/pull/869#issuecomment-1118511195 Why in the world are we moving a method to the DocValues API that is only used by 3 call sites? Please, let's make it package private somewhere else.
[GitHub] [lucene] romseygeek commented on pull request #869: LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues
romseygeek commented on PR #869: URL: https://github.com/apache/lucene/pull/869#issuecomment-1118516530

> Please, let's make it package private somewhere else.

It already is package private, but it was public before, and we use it in elasticsearch code. I'm happy to put it elsewhere (on FieldExistsQuery maybe?) but I don't think we can just remove public methods.
[GitHub] [lucene] rmuir commented on pull request #869: LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues
rmuir commented on PR #869: URL: https://github.com/apache/lucene/pull/869#issuecomment-1118526833

> It already is package private, but it was public before, and we use it in elasticsearch code. I'm happy to put it elsewhere (on FieldExistsQuery maybe?) but I don't think we can just remove public methods

Sure we can. Being used by elasticsearch doesn't mean it's a requirement to be public. Sorry, this is just a bit of a pain point as the most recent 2 pull requests in lucene are API changes just like this: for elasticsearch and amazon respectively.
[GitHub] [lucene] msokolov commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
msokolov commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118535867

> Use the gradle build to make a jar then.

Is the idea that we would fork analysis/kuromoji package? That would be sad, but maybe you meant something else? Uwe mentioned some kind of classpath-loading approach, but I think that would depend on the classpath order, which is really fragile and not reliable in my experience. Still I may be missing something. If we are going to remove a feature that was supported in 9.0 (and before), can we please clearly document how we can support the same use case going forward?
[GitHub] [lucene] romseygeek commented on pull request #869: LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues
romseygeek commented on PR #869: URL: https://github.com/apache/lucene/pull/869#issuecomment-1118537137 I'd argue that it's a revert of an API change - it's a public method in 9.1 and currently we're removing it in 9.2 with no CHANGES entry or information about how to migrate. And the fact that we're using it in ES suggests that there may be other users of it as well. If we really think it shouldn't be a public method, fine, but we should at least have some information on how consumers who are using it at the moment should handle upgrades?
[GitHub] [lucene] msokolov commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
msokolov commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118552882 Also - I don't really buy the idea that we can't support binary file formats - the entire index is filled with binary files. In this case we provide tools for generating these files, so users are free to regenerate them from source when Lucene version changes. There's no need to backwards-compatibly support old formats.
[GitHub] [lucene] rmuir commented on pull request #869: LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues
rmuir commented on PR #869: URL: https://github.com/apache/lucene/pull/869#issuecomment-1118557005 I'm not so opposed to the method being public somewhere, I'm more questioning the need to put it in `DocValues` api. This is what grabbed my attention. Would love to keep this API simple and minimal and without exotic stuff. Today the methods it uses are type-safe and here we are adding a relatively "untyped" method to get a generic iterator over any DV type. If you look at the other methods in the file, it really doesn't fit.
[GitHub] [lucene] mocobeta commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
mocobeta commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118561015 I have complex feelings about it and I understand both opinions... Current APIs in 9.x to load custom resources are not perfect (or bad, I dare to say), meanwhile "customizable/switchable dictionary" is a general idea and advanced users would often need it. We still don't have good APIs to support such advanced users - skilled developers who possibly could contribute to Lucene - personally, I'd like to continue discussions of how to improve our current APIs, instead of simply discarding them.
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118585168

> Also - I don't really buy the idea that we can't support binary file formats - the entire index is filled with binary files. In this case we provide tools for generating these files, so users are free to regenerate them from source when Lucene version changes. There's no need to backwards-compatibly support old formats.

This is still odd, because we have not much error handling in those file formats, because the code was written to load it from the JAR file, so it is basically more or less a dump of the FST and ConnectionCosts. Sure you can regenerate them, but what is the issue in then also calling `gradlew jar`? I think that's the main issue here: You need Lucene's source code anyway to build the dictionaries, you have to put the source files somewhere, so you are actually forking Lucene at that point.

If we really want to support external dictionaries we should refactor the API so you can load just one combined (CFS/ZIP-like) file that you can easily drop anywhere. This file would encode some version number in it, and if you load a file that's not using the current version it bails out.

What I would propose:
- Add a gradle task that builds a dictionary package and that should be the same for Nori and Kuromoji, just different input files
- Have the same factory class and exact same implementation for both dictionaries (I think @mocobeta is working on this). So a user should be able to load a single (zip-like) file and pass it to analyzer/tokenizer and it will automatically be Nori or Kuromoji, no matter what. The API is then very simple: `MorphologicalModel#load(aSingleFileNameOrURLOrInputStream)`
- The default Tokenizers shipped in Lucene have no custom ctors, so JapaneseTokenizer behind the scenes loads a single Japanese dictionary file from classpath. Anybody wanting to load any other file will use a generic tokenizer impl. The Japanese one shipped with Lucene uses its default dictionary.

Maybe we could also put the tokenizer in its separate JAR file (for both Japan and Korea) and ship the default dictionaries as separate JAR files on Maven Central. The main disaster is the number of files, which also makes it very error-prone.
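The idea of a single file that carries a version number and is rejected on mismatch can be sketched with plain `java.io` streams. Everything named here is an assumption for illustration: `VersionedModelSketch`, the `MAGIC` and `VERSION` constants, and the header layout (magic, version, payload length) are hypothetical and do not describe any existing Kuromoji or Nori file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical single-file model container with a version check; layout is illustrative only. */
final class VersionedModelSketch {
  static final int MAGIC = 0x4D4F5250; // arbitrary constant identifying the file type
  static final int VERSION = 1; // the only version this reader accepts

  /** Serializes header (magic, version, length) followed by the payload. */
  static byte[] write(byte[] payload) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(MAGIC);
      out.writeInt(VERSION);
      out.writeInt(payload.length);
      out.write(payload);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Reads the payload, bailing out when the magic or the version does not match. */
  static byte[] load(InputStream stream) throws IOException {
    DataInputStream in = new DataInputStream(stream);
    if (in.readInt() != MAGIC) {
      throw new IOException("not a model file");
    }
    int version = in.readInt();
    if (version != VERSION) {
      throw new IOException("unsupported model version " + version + ", expected " + VERSION);
    }
    byte[] payload = new byte[in.readInt()];
    in.readFully(payload);
    return payload;
  }

  /** Convenience wrapper: true when {@code file} loads cleanly, false when it is rejected. */
  static boolean accepts(byte[] file) {
    try {
      load(new ByteArrayInputStream(file));
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    byte[] file = write(new byte[] {1, 2, 3});
    System.out.println("current version accepted: " + accepts(file));
    file[7] = 2; // low byte of the big-endian version field
    System.out.println("other version accepted: " + accepts(file));
  }
}
```

With a check like this, a format change only requires bumping the version constant; stale files regenerated by nobody fail fast instead of being misread.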
[GitHub] [lucene] romseygeek commented on pull request #869: LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues
romseygeek commented on PR #869: URL: https://github.com/apache/lucene/pull/869#issuecomment-1118588920 That I agree with! I'll update and put it on FieldExistsQuery - it was on DocValuesFieldExistsQuery before, which has been deprecated but now extends FEQ, and so it should make the transition a lot easier.
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118606210 To come back to the current issue: I see no problem in this PR - it does not make it worse, just better. Path and URI are just holders of a resource. Path is used for the filesystem, URL is returned by the getResource() methods. I would like URI more, but Java's classloading uses URL, also with the module system, so it is the only "correct" way to refer to a resource in a class or module loader. I know, Robert does not like some details of the URL class, but they don't hit us here. The new API is definitely better than the old deprecated one, which was also broken from the beginning, leading to the confusion that @msokolov has seen. We should mark the Path and URL APIs with `@lucene.internal` and warn users that we provide no guarantees when they use them.
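To illustrate the point that `Path` and `URL` are both just holders of a resource, here is a minimal sketch in which both entry points reduce to "something that can open an `InputStream`". The class `ResourceHolders` and the interface `StreamSupplier` are hypothetical stand-ins written for this example; `StreamSupplier` only plays the role that an `IOSupplier` of streams plays in the discussion and is not Lucene's class.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: Path and URL overloads that both delegate to a stream supplier. */
final class ResourceHolders {
  /** Local stand-in for a supplier that may throw IOException. */
  @FunctionalInterface
  interface StreamSupplier {
    InputStream get() throws IOException;
  }

  /** File system case: the Path holds the location of the resource. */
  static byte[] load(Path file) throws IOException {
    return readAll(() -> Files.newInputStream(file));
  }

  /** Classpath/module case: the URL is what getResource() returned. */
  static byte[] load(URL resource) throws IOException {
    return readAll(resource::openStream);
  }

  /** Shared implementation: open the stream, read everything, close it. */
  private static byte[] readAll(StreamSupplier supplier) throws IOException {
    try (InputStream in = supplier.get()) {
      return in.readAllBytes();
    }
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("dict", ".dat");
    try {
      Files.write(tmp, new byte[] {4, 5, 6});
      System.out.println("via Path: " + load(tmp).length + " bytes");
      System.out.println("via URL:  " + load(tmp.toUri().toURL()).length + " bytes");
    } finally {
      Files.delete(tmp);
    }
  }
}
```

With real classpath resources, the URL passed to the second overload would typically come from `SomeClass.class.getResource("name")`.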
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118611460 The alternative would be - as said before - remove Path ctors and use IOSupplier only. But that's worse (maybe it prevents people from doing this, haha). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10550) Add getAllChildren functionality to facets
[ https://issues.apache.org/jira/browse/LUCENE-10550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Miller updated LUCENE-10550: - Component/s: modules/facet > Add getAllChildren functionality to facets > -- > > Key: LUCENE-10550 > URL: https://issues.apache.org/jira/browse/LUCENE-10550 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/facet >Reporter: Yuting Gan >Priority: Minor > > Currently Lucene does not support returning range counts sorted by label > values, but there are use cases demanding this feature. For example, a user > specifies ranges (e.g., [0, 10], [10, 20]) and wants to get range counts > without changing the range order. Today we can only call getTopChildren to > populate range counts, but it would return ranges sorted by counts (e.g., > [10, 20] 100, [0, 10] 50) instead of range values. > Lucene has a API, getAllChildrenSortByValue, that returns numeric values with > counts sorted by label values, please see > [LUCENE-7927|https://issues.apache.org/jira/browse/LUCENE-7927] for details. > Therefore, it would be nice that we can also have a similar API to support > range counts. The proposed getAllChildren API is to return value/range counts > sorted by label values instead of counts. > This proposal was inspired from the discussions with [~gsmiller] when I was > working on the LUCENE-10538 [PR|https://github.com/apache/lucene/pull/843], > and we believe users would benefit from adding this API to Facets. > Hope I can get some feedback from the community since this proposal would > require changes to the getTopChildren API in RangeFacetCounts. Thanks! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-10550) Add getAllChildren functionality to facets
[ https://issues.apache.org/jira/browse/LUCENE-10550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532277#comment-17532277 ] Greg Miller edited comment on LUCENE-10550 at 5/5/22 2:37 PM: -- I'm also +1 on this but with a minor suggestion. {quote}The proposed getAllChildren API is to return value/range counts sorted by label values instead of counts. {quote} I wonder if we should "sort" at all for this functionality? If we're returning all children for a specified path, the caller can just as easily sort by whatever criteria they want (or maybe none at all), so sorting within the implementation might be wasteful. Also, for range faceting, the user is providing a list of ranges they care about up-front in a specific order. I would actually propose we retain that order instead of sorting by the range "values" in some way. This is what range faceting currently implements (somewhat confusingly) behind the {{getTopChildren}} API. The order of those ranges might have some meaning to the caller, so it might be best to retain it. What do you think?
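The suggestion above (return all children in the order the ranges were supplied, and leave any sorting to the caller) can be sketched without any Lucene classes. Everything here is hypothetical: `RangeOrderSketch`, `allChildren` and `sortedByCount` are names made up for the example, and the label/count pairs reuse the numbers from the issue description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: keep the caller's range order; sorting is an optional caller-side step. */
final class RangeOrderSketch {
  /** Returns every (label, count) pair in the order the ranges were supplied; no sorting. */
  static List<Map.Entry<String, Integer>> allChildren(LinkedHashMap<String, Integer> countsInUserOrder) {
    return new ArrayList<>(countsInUserOrder.entrySet());
  }

  /** What a caller could do on top if it wants the results ordered by descending count. */
  static List<Map.Entry<String, Integer>> sortedByCount(List<Map.Entry<String, Integer>> children) {
    List<Map.Entry<String, Integer>> copy = new ArrayList<>(children);
    copy.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
    return copy;
  }

  public static void main(String[] args) {
    LinkedHashMap<String, Integer> counts = new LinkedHashMap<>();
    counts.put("[0, 10]", 50);
    counts.put("[10, 20]", 100);
    System.out.println("user order:  " + allChildren(counts));
    System.out.println("by count:    " + sortedByCount(allChildren(counts)));
  }
}
```

Keeping the unsorted result as the primitive means the implementation does no work that a caller might throw away, which is the "wasteful" concern raised in the comment.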
[GitHub] [lucene] rmuir commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
rmuir commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118635459 > I know, Robert does not like some details of the URL class, but they don't hit us here. https://twitter.com/tnurkiewicz/status/1519643900423688192 Sorry, couldn't resist -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10538) TopN is not being used in getTopChildren()
[ https://issues.apache.org/jira/browse/LUCENE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Miller updated LUCENE-10538: - Component/s: modules/facet > TopN is not being used in getTopChildren() > -- > > Key: LUCENE-10538 > URL: https://issues.apache.org/jira/browse/LUCENE-10538 > Project: Lucene - Core > Issue Type: Bug > Components: modules/facet >Reporter: Yuting Gan >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > > When looking at the overridden implementation getTopChildren(int topN, String > dim, String... path) in RangeFacetCounts, I found that the topN parameter is > not being used in the code, and the unit tests did not test this function > properly. I will create a PR to fix this, and will look into other overridden > implementations and see if they have the same issue. Please let me know if > there is any question. Thanks! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mayya-sharipova commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
mayya-sharipova commented on PR #792: URL: https://github.com/apache/lucene/pull/792#issuecomment-1118635984 @LuXugang Do you want to make this PR against `apache:vectors-disi-direct` branch, so that we can merge it into this branch. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
mocobeta commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118638144 > I see no problem in this PR - it does not make it worse, just better. Thanks @uschindler, I totally agree with that. I am +1 to this PR and keeping IOSupplier-based ones private. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] LuXugang commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
LuXugang commented on PR #792: URL: https://github.com/apache/lucene/pull/792#issuecomment-1118642667 Yes, thanks @mayya-sharipova, then I will continue with the format changes.
[GitHub] [lucene] mayya-sharipova commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
mayya-sharipova commented on PR #792: URL: https://github.com/apache/lucene/pull/792#issuecomment-1118648748 @LuXugang Please change the base of this PR to `apache:vectors-disi-direct` yourself, and after that I can merge this PR to this branch. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
mocobeta commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118651872 > I would like URI more, The same is true for me. We'd need URI, not URL... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] LuXugang commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
LuXugang commented on PR #792: URL: https://github.com/apache/lucene/pull/792#issuecomment-1118656979 > @LuXugang Please change the base of this PR to `apache:vectors-disi-direct` yourself, and after that I can merge this PR to this branch. @mayya-sharipova done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118665915 > > I would like URI more, > > The same is true for me. We'd need URI, not URL... The difference is that URI is just an identifier, but to get an InputStream you need to convert it to URL anyway. How about the following: we want to provide the URL-taking ctors only for the resource case, so let's add a check like `if (!url.getProtocol().equalsIgnoreCase("jar")) throw new IllegalArgumentException("This method can only be called with classpath based resources, for file system resources use Path. Networking is not supported.");` Although I see no risk here with equals(). That's well known and not a source of problems here. I agree with you, but the URL class would be fine if it did not have this horrible bug (no idea why it is not fixed).
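A self-contained version of the check proposed above might look like the sketch below. It reads the protocol with `getProtocol()`, which is the accessor `java.net.URL` provides (`getScheme()` exists on `java.net.URI`). The class and method names are hypothetical, and accepting only `jar` is an assumption carried over from the comment: resources loaded from an exploded class directory or from the module runtime image have other protocols (`file`, `jrt`), so a real check may need to be wider.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

/** Hypothetical sketch of a guard that only lets jar-based resource URLs through. */
final class ClasspathUrlCheck {
  /** Returns the URL unchanged if its protocol is "jar", otherwise throws. */
  static URL requireJarUrl(URL url) {
    if (!"jar".equalsIgnoreCase(url.getProtocol())) {
      throw new IllegalArgumentException(
          "This method can only be called with classpath based resources, "
              + "for file system resources use Path. Networking is not supported. Got: "
              + url.getProtocol());
    }
    return url;
  }

  public static void main(String[] args) throws MalformedURLException {
    // Building a URL does not open any connection, so these paths need not exist.
    URL jarUrl = URI.create("jar:file:/tmp/dict.jar!/dict/unk.dat").toURL();
    System.out.println("accepted: " + requireJarUrl(jarUrl));
    try {
      requireJarUrl(URI.create("http://example.org/unk.dat").toURL());
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The guard never calls `URL.equals()` or `hashCode()`, so the name-resolution behaviour of those methods does not come into play.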
[GitHub] [lucene] mocobeta commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
mocobeta commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118816930 Ah sorry for my vague comment, I know the difference between URI and URL; and I understand there is no problem in using URL here. I think we can ship this with 9.2? As an engineer who belongs to a company (the scale of it is very different though), I think I understand Mike's position and argument. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118841012 Sure, I would proceed with this plan and merge to 9.x and main. If we have better ideas (like proposed earlier) for main we can change it there. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] risdenk commented on pull request #2651: SOLR-16110 Using Schema/Config API breaks the File-Upload of Config Set File
risdenk commented on PR #2651: URL: https://github.com/apache/lucene-solr/pull/2651#issuecomment-1118841603 superseded by https://github.com/apache/solr/pull/831
[GitHub] [lucene-solr] risdenk closed pull request #2651: SOLR-16110 Using Schema/Config API breaks the File-Upload of Config Set File
risdenk closed pull request #2651: SOLR-16110 Using Schema/Config API breaks the File-Upload of Config Set File URL: https://github.com/apache/lucene-solr/pull/2651 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118843405 I still don't know if we can fix the deprecated ctors to handle the CLASSPATH resource name correctly. In my original LUCENE-10335 change it was still working (we had a test for it), but it seems to have been broken after the change to `IOSupplier`. Let me look and try to reintroduce the behaviour that the deprecated ctor had in Lucene 9.0 and earlier. I am out of office now, maybe later this evening!
[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries
uschindler commented on PR #868: URL: https://github.com/apache/lucene/pull/868#issuecomment-1118848400 I think I can fix the old ctor so it works again. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta commented on pull request #540: LUCENE-10312: Add PersianStemmer
mocobeta commented on PR #540: URL: https://github.com/apache/lucene/pull/540#issuecomment-1118861297 I'm sorry for the late response. I just kicked the CI - I'll take a look. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10502) Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
[ https://issues.apache.org/jira/browse/LUCENE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532447#comment-17532447 ] ASF subversion and git services commented on LUCENE-10502: -- Commit b3867da5443f58c554a3fd8391d3c98e0b2b7790 in lucene's branch refs/heads/vectors-disi-direct from Lu Xugang [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=b3867da5443 ] LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc (#792) * LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc * TestBackwardsCompatibility was temporarily removed for skipping test * dense case and empty case do not need to store ordToMap mapping * fix * invert if condition * add subclass of dense sparse and empty * spotless * remove the ord variable * move `getOffHeapVectorValues` to `OffHeapVectorValues` class as a static method and rename it as `load` * move the getAcceptOrds method to OffHeapVectorValues * move OffHeapVectorValues to its own class * keep OffHeapVectorValues and all its subclasses in one place * make `OffHeapVectorValues`'s subclasses private except `DenseOffHeapVectorValues` * add some short comments > Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle > ordToDoc > > > Key: LUCENE-10502 > URL: https://issues.apache.org/jira/browse/LUCENE-10502 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 9.1 >Reporter: Lu Xugang >Priority: Major > Time Spent: 9h > Remaining Estimate: 0h > > Since at search phase, vector's all docs of all fields will be fully loaded > into memory, could we use IndexedDISI to store docIds and > DirectMonotonicWriter/Reader to handle ordToDoc mapping? -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
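The commit notes above mention that the dense and empty cases do not need a stored ord-to-doc mapping and that there are separate dense, sparse and empty subclasses. The sketch below shows why, using plain in-memory arrays. It is an assumption-laden illustration only: the names are hypothetical, and an `int[]` stands in for the on-disk structures (IndexedDISI, DirectMonotonicWriter/Reader) that the actual change uses.

```java
/** Hypothetical in-memory sketch of the dense / sparse / empty ord-to-doc cases. */
final class OrdToDocSketch {
  /** Maps a vector ordinal (0..size-1) to the document id holding that vector. */
  interface OrdToDoc {
    int size();

    int ordToDoc(int ord);
  }

  /** Picks the cheapest representation for the given sorted doc ids. */
  static OrdToDoc of(int[] docsWithVectors, int maxDoc) {
    if (docsWithVectors.length == 0) {
      return empty();
    }
    if (docsWithVectors.length == maxDoc) {
      return dense(maxDoc); // every doc has a vector, so ord == doc and nothing is stored
    }
    return sparse(docsWithVectors);
  }

  static OrdToDoc empty() {
    return new OrdToDoc() {
      public int size() {
        return 0;
      }

      public int ordToDoc(int ord) {
        throw new IndexOutOfBoundsException("no vectors");
      }
    };
  }

  static OrdToDoc dense(int maxDoc) {
    return new OrdToDoc() {
      public int size() {
        return maxDoc;
      }

      public int ordToDoc(int ord) {
        return ord; // identity mapping
      }
    };
  }

  static OrdToDoc sparse(int[] docs) {
    // Doc ids must be strictly increasing; that is what makes a monotonic encoding possible.
    for (int i = 1; i < docs.length; i++) {
      if (docs[i] <= docs[i - 1]) {
        throw new IllegalArgumentException("doc ids must be strictly increasing");
      }
    }
    int[] copy = docs.clone();
    return new OrdToDoc() {
      public int size() {
        return copy.length;
      }

      public int ordToDoc(int ord) {
        return copy[ord];
      }
    };
  }
}
```

For example, `OrdToDocSketch.of(new int[] {2, 5, 9}, 10).ordToDoc(1)` looks up the stored array, while `OrdToDocSketch.of(new int[] {0, 1, 2}, 3)` needs no array at all.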
[GitHub] [lucene] mayya-sharipova commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
mayya-sharipova commented on PR #792: URL: https://github.com/apache/lucene/pull/792#issuecomment-1118864081 @LuXugang Thanks, feel free to create a follow-up format PR against `apache:vectors-disi-direct` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mayya-sharipova merged pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
mayya-sharipova merged PR #792: URL: https://github.com/apache/lucene/pull/792 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org