[GitHub] [lucene] uschindler commented on a diff in pull request #867: LUCENE-10558: expose stream-based Kuromoji resource constructors

2022-05-05 Thread GitBox


uschindler commented on code in PR #867:
URL: https://github.com/apache/lucene/pull/867#discussion_r865606003


##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UnknownDictionary.java:
##
@@ -52,18 +52,26 @@ private UnknownDictionary() throws IOException {
 () -> getClassResource(DICT_FILENAME_SUFFIX));
   }
 
-  private UnknownDictionary(
-  IOSupplier targetMapResource,
-  IOSupplier posResource,
-  IOSupplier dictResource)
+  /**
+   * Create a {@link UnknownDictionary} from an external resource path.
+   *
+   * @param targetMap supplier for stream containing target map
+   * @param posDict supplier for stream containing POS dictionary
+   * @param dict supplier for stream containing dictionary entries
+   * @throws IOException if a stream could not be read
+   */
+  public UnknownDictionary(
+  IOSupplier targetMap,
+  IOSupplier posDict,
+  IOSupplier dict)

Review Comment:
   I think he did this because "Resource" is a bit strange, as it is no longer 
classpath based.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler opened a new pull request, #868: LUCENE-10558: Implement URL ctor to support classpath usage in Kuromoji dictionaries

2022-05-05 Thread GitBox


uschindler opened a new pull request, #868:
URL: https://github.com/apache/lucene/pull/868

   see https://issues.apache.org/jira/browse/LUCENE-10558
   
   This is against the 9.x branch, but can be forward-ported to main.
   
   TODO: This still needs Nori support.





[jira] [Commented] (LUCENE-10558) Expose IOSupplier constructors in Kuromoji (and Nori?)

2022-05-05 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532101#comment-17532101
 ] 

Uwe Schindler commented on LUCENE-10558:


Here is my preferred variant: https://github.com/apache/lucene/pull/868

See also the test for how to use it. Just replace the Path ctors with the URL 
ctors and obtain the URLs via ClassLoader#getResource() or Class#getResource().

> Expose IOSupplier constructors in Kuromoji (and Nori?)
> ---
>
> Key: LUCENE-10558
> URL: https://issues.apache.org/jira/browse/LUCENE-10558
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When we refactored the constructors for  these resource objects used by the 
> kuromoji JapaneseTokenizer,  we (inadvertently, I expect) changed the 
> behavior for consumers that were supplying these resources on the classpath. 
> In that case, we silently replaced the custom resources with the Lucene 
> built-in ones.  I think we cannot support the old API because of Java Module 
> system restrictions, but we didn't provide any usable replacement or notice 
> either.
>  
> This issue is for exposing the new (private) constructors that accept 
> streams, and adding a notice to Migration.md to point users at them, since 
> they can be used with resources streams loaded from the classpath by the 
> caller.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #867: LUCENE-10558: expose stream-based Kuromoji resource constructors

2022-05-05 Thread GitBox


uschindler commented on PR #867:
URL: https://github.com/apache/lucene/pull/867#issuecomment-1118251208

   I would not make the IOSupplier ctors available; they are internal only 
(IOSupplier is marked as subject to change).
   
   Because we have `java.nio.file.Path` ctors as the replacement for the 
deprecated ones, we need one taking `java.net.URL` for resource usage.
   
   See #868 for an implementation.
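
To make the shape of that proposal concrete, here is a minimal sketch that runs on the JDK alone. `DemoResource` and `StreamSupplier` are hypothetical stand-ins invented for this illustration; they are not the Lucene classes, and the real constructors may differ in detail. The point is only that the supplier-based constructor can stay private while a `Path` and a `URL` constructor both delegate to it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for a dictionary resource class (not Lucene code). */
final class DemoResource {
  /** Hypothetical local supplier type that is allowed to throw IOException. */
  @FunctionalInterface
  interface StreamSupplier {
    InputStream get() throws IOException;
  }

  private final String content;

  /** Public entry point for files on disk. */
  DemoResource(Path file) throws IOException {
    this(() -> Files.newInputStream(file));
  }

  /** Public entry point for resources addressed by URL (classpath, module, jar). */
  DemoResource(URL url) throws IOException {
    this(() -> url.openStream());
  }

  /** Internal constructor: the supplier type never appears in the public API. */
  private DemoResource(StreamSupplier supplier) throws IOException {
    String text;
    try (InputStream in = supplier.get()) {
      text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
    this.content = text;
  }

  String content() {
    return content;
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("demo", ".txt");
    Files.writeString(tmp, "connection costs");
    // Both constructors read the same bytes; only the addressing differs.
    System.out.println(new DemoResource(tmp).content());
    System.out.println(new DemoResource(tmp.toUri().toURL()).content());
  }
}
```

With this layout the supplier type can change later without breaking callers, which is the concern raised above about exposing `IOSupplier` directly.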





[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath usage in Kuromoji dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118253739

   In case you ask: This works both with classpath and module usage. The 
caller-sensitive parts are `Class#getResource(String)`, 
`ClassLoader#getResource(String)`, and `Module#getResourceAsStream(String)`. A 
URL returned by the first two is free to use anywhere, so it separates concerns 
just as `IOSupplier` or `Path` does.
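
The separation of concerns can be sketched with JDK classes only. In the example below, `ResourceUrlDemo`, `resolve`, `readAll`, and the resource name `custom.dat` are hypothetical names chosen for illustration; a `URLClassLoader` over a temporary directory stands in for the application's classpath. The caller resolves the name against a loader it controls, and the consumer only ever sees a `URL`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical demo class; none of these names come from Lucene. */
final class ResourceUrlDemo {
  /** Caller side: resolve a resource name; getResource returns null when it is missing. */
  static URL resolve(ClassLoader loader, String name) throws IOException {
    URL url = loader.getResource(name);
    if (url == null) {
      throw new IOException("resource not found: " + name);
    }
    return url;
  }

  /** Consumer side: knows nothing about class loaders, only how to read a URL. */
  static String readAll(URL url) throws IOException {
    try (InputStream in = url.openStream()) {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws IOException {
    // A temporary directory plays the role of a classpath entry.
    Path dir = Files.createTempDirectory("dict");
    Files.writeString(dir.resolve("custom.dat"), "custom entries");
    try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
      System.out.println(readAll(resolve(loader, "custom.dat")));
    }
  }
}
```

The explicit null check is a choice made for this sketch: it turns a missing resource into an `IOException` at the call site instead of a later `NullPointerException` inside the consumer.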





[jira] [Commented] (LUCENE-10558) Expose IOSupplier constructors in Kuromoji (and Nori?)

2022-05-05 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532117#comment-17532117
 ] 

Uwe Schindler commented on LUCENE-10558:


The latest PR also fixes Nori with the same fix.

> Expose IOSupplier constructors in Kuromoji (and Nori?)
> ---
>
> Key: LUCENE-10558
> URL: https://issues.apache.org/jira/browse/LUCENE-10558
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h






[jira] [Updated] (LUCENE-10558) Add URL constructors as complement to Path ctors in Kuromoji and Nori

2022-05-05 Thread Uwe Schindler (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-10558:
---
Summary: Add URL constructors as complement to Path ctors in Kuromoji and 
Nori  (was: Expose IOSupplier constructors in Kuromoji (and Nori?))

> Add URL constructors as complement to Path ctors in Kuromoji and Nori
> -
>
> Key: LUCENE-10558
> URL: https://issues.apache.org/jira/browse/LUCENE-10558
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h






[jira] [Assigned] (LUCENE-10558) Add URL constructors for classpath/module usage as complement to Path ctors in Kuromoji and Nori

2022-05-05 Thread Uwe Schindler (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-10558:
--

Assignee: Uwe Schindler

> Add URL constructors for classpath/module usage as complement to Path ctors 
> in Kuromoji and Nori
> 
>
> Key: LUCENE-10558
> URL: https://issues.apache.org/jira/browse/LUCENE-10558
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Assignee: Uwe Schindler
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h






[jira] [Updated] (LUCENE-10558) Add URL constructors for classpath/module usage as complement to Path ctors in Kuromoji and Nori

2022-05-05 Thread Uwe Schindler (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-10558:
---
Summary: Add URL constructors for classpath/module usage as complement to 
Path ctors in Kuromoji and Nori  (was: Add URL constructors as complement to 
Path ctors in Kuromoji and Nori)

> Add URL constructors for classpath/module usage as complement to Path ctors 
> in Kuromoji and Nori
> 
>
> Key: LUCENE-10558
> URL: https://issues.apache.org/jira/browse/LUCENE-10558
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h






[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


sonatype-lift[bot] commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865638824


##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/ConnectionCosts.java:
##
@@ -63,6 +65,17 @@ public ConnectionCosts(Path connectionCostsFile) throws IOException {
     this(() -> Files.newInputStream(connectionCostsFile));
   }
 
+  /**
+   * Create a {@link ConnectionCosts} from an external resource URL (e.g. from Classpath with {@link
+   * ClassLoader#getResource(String)}).
+   *
+   * @param connectionCostsUrl where to load connection costs resource
+   * @throws IOException if resource was not found or broken
+   */
+  public ConnectionCosts(URL connectionCostsUrl) throws IOException {
+    this(() -> connectionCostsUrl.openStream());

Review Comment:
   
*[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):*
  This web server request could be used by an attacker to expose internal 
services and filesystem.
   
   (at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with 
`help` or `ignore`)
   



##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),
+        () -> posDictUrl.openStream(),

Review Comment:
   
*[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):*
  This web server request could be used by an attacker to expose internal 
services and filesystem.
   
   (at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with 
`help` or `ignore`)
   



##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile,
 () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, 
URL fstUrl)
+  throws IOException {
+this(
+() -> targetMapUrl.openStream(),
+() -> posDictUrl.openStream(),
+() -> dictUrl.openStream(),

Review Comment:
   
*[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):*
  This web server request could be used by an attacker to expose internal 
services and filesystem.
   
   (at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with 
`help` or `ignore`)
   

[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


sonatype-lift[bot] commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865639082


##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile,
 () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, 
URL fstUrl)
+  throws IOException {
+this(
+() -> targetMapUrl.openStream(),

Review Comment:
   
*[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):*
  This web server request could be used by an attacker to expose internal 
services and filesystem.
   
   (at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with 
`help` or `ignore`)
   



##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UnknownDictionary.java:
##
@@ -64,6 +66,20 @@ public UnknownDictionary(Path targetMapFile, Path posDictFile, Path dictFile) th
         () -> Files.newInputStream(dictFile));
   }
 
+  /**
+   * Create a {@link UnknownDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @throws IOException if resource was not found or broken
+   */
+  public UnknownDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl) throws IOException {
+    super(
+        () -> targetMapUrl.openStream(), () -> posDictUrl.openStream(), () -> dictUrl.openStream());

Review Comment:
   
*[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):*
  This web server request could be used by an attacker to expose internal 
services and filesystem.
   
   (at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with 
`help` or `ignore`)
   



##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile,
 () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, 
URL fstUrl)
+  throws IOException {
+this(
+() -> targetMapUrl.openStream(),
+() -> posDictUrl.openStream(),
+() -> dictUrl.openStream(),
+() -> fstUrl.openStream());

Review Comment:
   
*[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):*
  This web server request could be 

[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


sonatype-lift[bot] commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865650503


##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/ConnectionCosts.java:
##
@@ -63,6 +65,17 @@ public ConnectionCosts(Path connectionCostsFile) throws 
IOException {
 this(() -> Files.newInputStream(connectionCostsFile));
   }
 
+  /**
+   * Create a {@link ConnectionCosts} from an external resource URL (e.g. from 
Classpath with {@link
+   * ClassLoader#getResource(String)}).
+   *
+   * @param connectionCostsUrl where to load connection costs resource
+   * @throws IOException if resource was not found or broken
+   */
+  public ConnectionCosts(URL connectionCostsUrl) throws IOException {
+this(() -> connectionCostsUrl.openStream());

Review Comment:
   I've recorded this as ignored for this pull request. If you change your 
mind, just comment `@sonatype-lift unignore`.



##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile,
 () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, 
URL fstUrl)
+  throws IOException {
+this(
+() -> targetMapUrl.openStream(),
+() -> posDictUrl.openStream(),

Review Comment:
   I've recorded this as ignored for this pull request. If you change your 
mind, just comment `@sonatype-lift unignore`.






[GitHub] [lucene] uschindler commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865650474


##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/ConnectionCosts.java:
##
@@ -63,6 +65,17 @@ public ConnectionCosts(Path connectionCostsFile) throws 
IOException {
 this(() -> Files.newInputStream(connectionCostsFile));
   }
 
+  /**
+   * Create a {@link ConnectionCosts} from an external resource URL (e.g. from 
Classpath with {@link
+   * ClassLoader#getResource(String)}).
+   *
+   * @param connectionCostsUrl where to load connection costs resource
+   * @throws IOException if resource was not found or broken
+   */
+  public ConnectionCosts(URL connectionCostsUrl) throws IOException {
+this(() -> connectionCostsUrl.openStream());

Review Comment:
   @sonatype-lift ignore



##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile,
 () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, 
URL fstUrl)
+  throws IOException {
+this(
+() -> targetMapUrl.openStream(),
+() -> posDictUrl.openStream(),

Review Comment:
   @sonatype-lift ignore






[GitHub] [lucene] uschindler commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865650662


##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile,
 () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, 
URL fstUrl)
+  throws IOException {
+this(
+() -> targetMapUrl.openStream(),
+() -> posDictUrl.openStream(),
+() -> dictUrl.openStream(),

Review Comment:
   @sonatype-lift ignore



##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile,
 () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, 
URL fstUrl)
+  throws IOException {
+this(
+() -> targetMapUrl.openStream(),

Review Comment:
   @sonatype-lift ignore






[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


sonatype-lift[bot] commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865650684


##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile,
 () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, 
URL fstUrl)
+  throws IOException {
+this(
+() -> targetMapUrl.openStream(),
+() -> posDictUrl.openStream(),
+() -> dictUrl.openStream(),

Review Comment:
   I've recorded this as ignored for this pull request. If you change your 
mind, just comment `@sonatype-lift unignore`.



##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile,
 () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, 
URL fstUrl)
+  throws IOException {
+this(
+() -> targetMapUrl.openStream(),

Review Comment:
   I've recorded this as ignored for this pull request. If you change your 
mind, just comment `@sonatype-lift unignore`.






[GitHub] [lucene] uschindler commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865650877


##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UnknownDictionary.java:
##
@@ -64,6 +66,20 @@ public UnknownDictionary(Path targetMapFile, Path posDictFile, Path dictFile) th
         () -> Files.newInputStream(dictFile));
   }
 
+  /**
+   * Create a {@link UnknownDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @throws IOException if resource was not found or broken
+   */
+  public UnknownDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl) throws IOException {
+    super(
+        () -> targetMapUrl.openStream(), () -> posDictUrl.openStream(), () -> dictUrl.openStream());

Review Comment:
   @sonatype-lift ignore



##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),
+        () -> posDictUrl.openStream(),
+        () -> dictUrl.openStream(),
+        () -> fstUrl.openStream());

Review Comment:
   @sonatype-lift ignore
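
The hunk above delegates from the URL constructor to the stream-supplier constructor, so no stream is opened until the delegate asks for it. A minimal, self-contained sketch of that delegation pattern follows. `LazyResourceDemo`, `StreamSupplier`, and `roundTrip` are hypothetical names used only for illustration; they are not Lucene classes, and the real dictionaries use Lucene's own `IOSupplier` and their own parsing logic.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for a dictionary class; not part of Lucene. */
class LazyResourceDemo {

  /** Hypothetical local supplier type whose get() may throw IOException. */
  @FunctionalInterface
  interface StreamSupplier {
    InputStream get() throws IOException;
  }

  private final byte[] data;

  /** URL constructor: only wraps the URL in a lambda; the delegate opens the stream. */
  LazyResourceDemo(URL resourceUrl) throws IOException {
    this(() -> resourceUrl.openStream());
  }

  /** Supplier constructor: opens the stream, reads it fully, and closes it. */
  LazyResourceDemo(StreamSupplier resource) throws IOException {
    try (InputStream in = resource.get()) {
      this.data = in.readAllBytes();
    }
  }

  byte[] data() {
    return data.clone();
  }

  /** Writes text to a temporary file and loads it back through the URL constructor. */
  static String roundTrip(String text) throws IOException {
    Path tmp = Files.createTempFile("lazy-resource", ".bin");
    try {
      Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
      LazyResourceDemo demo = new LazyResourceDemo(tmp.toUri().toURL());
      return new String(demo.data(), StandardCharsets.UTF_8);
    } finally {
      Files.deleteIfExists(tmp);
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(roundTrip("hello dictionary"));
  }
}
```

Because the lambda is the only thing the URL constructor creates, opening and closing the stream stays in one place, the supplier-based constructor.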






[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


sonatype-lift[bot] commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865650926


##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UnknownDictionary.java:
##
@@ -64,6 +66,20 @@ public UnknownDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile) th
 () -> Files.newInputStream(dictFile));
   }
 
+  /**
+   * Create a {@link UnknownDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @throws IOException if resource was not found or broken
+   */
+  public UnknownDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl) 
throws IOException {
+super(
+() -> targetMapUrl.openStream(), () -> posDictUrl.openStream(), () -> 
dictUrl.openStream());

Review Comment:
   I've recorded this as ignored for this pull request. If you change your 
mind, just comment `@sonatype-lift unignore`.



##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile,
 () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, 
URL fstUrl)
+  throws IOException {
+this(
+() -> targetMapUrl.openStream(),
+() -> posDictUrl.openStream(),
+() -> dictUrl.openStream(),
+() -> fstUrl.openStream());

Review Comment:
   I've recorded this as ignored for this pull request. If you change your 
mind, just comment `@sonatype-lift unignore`.






[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118281755

   Can someone disable @sonatype-lift? It makes no sense for Lucene, as we are 
not a web server.





[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118282233

   @sonatype-lift silence ignore shutup 👎 





[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


sonatype-lift[bot] commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865656370


##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile,
 () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, 
URL fstUrl)
+  throws IOException {
+this(
+() -> targetMapUrl.openStream(),
+() -> posDictUrl.openStream(),

Review Comment:
   
*[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):*
  This web server request could be used by an attacker to expose internal 
services and filesystem.
   
   (at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with 
`help` or `ignore`)
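
The finding above is about URLs that an attacker can influence. For a caller that does accept dictionary URLs from untrusted input, one possible guard is to check the URL scheme before handing it to a constructor. The sketch below is an assumption for illustration only: `ResourceUrlGuard` and `isLocal` are hypothetical names, the allowlist is a guess at what "local" means, and nothing like this is part of the pull request.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Locale;

/** Hypothetical helper; not part of Lucene or of this pull request. */
class ResourceUrlGuard {

  /** Returns true for file: and jrt: URLs, and for jar: URLs whose nested URL is file:. */
  static boolean isLocal(URL url) {
    String scheme = url.getProtocol().toLowerCase(Locale.ROOT);
    switch (scheme) {
      case "file":
      case "jrt":
        return true;
      case "jar":
        // A jar: URL embeds another URL before "!/", for example jar:file:/a.jar!/x.
        // The nested URL may be remote, so its scheme is inspected as well.
        return url.getPath().toLowerCase(Locale.ROOT).startsWith("file:");
      default:
        return false;
    }
  }

  public static void main(String[] args) throws MalformedURLException {
    System.out.println(isLocal(URI.create("file:///tmp/dict.dat").toURL()));
    System.out.println(isLocal(URI.create("http://example.com/dict.dat").toURL()));
  }
}
```

This check is deliberately coarse and does not cover every case (for example, a `file:` URL that names a remote host). URLs returned by `ClassLoader#getResource(String)` come from the class path or module path rather than from user input, which is the usage the Javadoc in the hunk describes.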
   



##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile,
 () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, 
URL fstUrl)
+  throws IOException {
+this(
+() -> targetMapUrl.openStream(),
+() -> posDictUrl.openStream(),
+() -> dictUrl.openStream(),
+() -> fstUrl.openStream());

Review Comment:
   
*[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):*
  This web server request could be used by an attacker to expose internal 
services and filesystem.
   
   (at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with 
`help` or `ignore`)
   



##
lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary.java:
##
@@ -89,6 +92,25 @@ public TokenInfoDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile,
 () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, 
URL fstUrl)
+  throws IOException {
+this(
+() -> targetMapUrl.openStream(),
+() -> posDictUrl.openStream(),

Review Comment:
   
*[URLCONNECTION_SSRF_FD](http

[GitHub] [lucene] uschindler commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865656501


##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile,
 () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, 
URL fstUrl)
+  throws IOException {
+this(
+() -> targetMapUrl.openStream(),
+() -> posDictUrl.openStream(),

Review Comment:
   @sonatype-lift ignore






[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


sonatype-lift[bot] commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865656677


##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile,
 () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, 
URL fstUrl)
+  throws IOException {
+this(
+() -> targetMapUrl.openStream(),
+() -> posDictUrl.openStream(),
+() -> dictUrl.openStream(),

Review Comment:
   
*[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):*
  This web server request could be used by an attacker to expose internal 
services and filesystem.
   
   (at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with 
`help` or `ignore`)
   



##
lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary.java:
##
@@ -89,6 +92,25 @@ public TokenInfoDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile,
 () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, 
URL fstUrl)
+  throws IOException {
+this(
+() -> targetMapUrl.openStream(),
+() -> posDictUrl.openStream(),
+() -> dictUrl.openStream(),
+() -> fstUrl.openStream());

Review Comment:
   
*[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):*
  This web server request could be used by an attacker to expose internal 
services and filesystem.
   
   (at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with 
`help` or `ignore`)
   



##
lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary.java:
##
@@ -89,6 +92,25 @@ public TokenInfoDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile,
 () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, 
URL fstUrl)
+  throws IOException {
+this(
+() -> targetMapUrl.openStream(),
+() -> posDictUrl.openStream(),

Review Comment:

[GitHub] [lucene] uschindler commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865656735


##
lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary.java:
##
@@ -89,6 +92,25 @@ public TokenInfoDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile,
 () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, 
URL fstUrl)
+  throws IOException {
+this(
+() -> targetMapUrl.openStream(),
+() -> posDictUrl.openStream(),

Review Comment:
   @sonatype-lift ignore



##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UnknownDictionary.java:
##
@@ -64,6 +66,20 @@ public UnknownDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile) th
 () -> Files.newInputStream(dictFile));
   }
 
+  /**
+   * Create a {@link UnknownDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @throws IOException if resource was not found or broken
+   */
+  public UnknownDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl) 
throws IOException {
+super(
+() -> targetMapUrl.openStream(), () -> posDictUrl.openStream(), () -> 
dictUrl.openStream());

Review Comment:
   @sonatype-lift ignore






[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


sonatype-lift[bot] commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865656822


##
lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary.java:
##
@@ -89,6 +92,25 @@ public TokenInfoDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile,
 () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, 
URL fstUrl)
+  throws IOException {
+this(
+() -> targetMapUrl.openStream(),
+() -> posDictUrl.openStream(),
+() -> dictUrl.openStream(),

Review Comment:
   
*[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):*
  This web server request could be used by an attacker to expose internal 
services and filesystem.
   
   (at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with 
`help` or `ignore`)
   



##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/ConnectionCosts.java:
##
@@ -63,6 +65,17 @@ public ConnectionCosts(Path connectionCostsFile) throws 
IOException {
 this(() -> Files.newInputStream(connectionCostsFile));
   }
 
+  /**
+   * Create a {@link ConnectionCosts} from an external resource URL (e.g. from 
Classpath with {@link
+   * ClassLoader#getResource(String)}).
+   *
+   * @param connectionCostsUrl where to load connection costs resource
+   * @throws IOException if resource was not found or broken
+   */
+  public ConnectionCosts(URL connectionCostsUrl) throws IOException {
+this(() -> connectionCostsUrl.openStream());

Review Comment:
   
*[URLCONNECTION_SSRF_FD](https://find-sec-bugs.github.io/bugs.htm#URLCONNECTION_SSRF_FD):*
  This web server request could be used by an attacker to expose internal 
services and filesystem.
   
   (at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with 
`help` or `ignore`)
   



##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UnknownDictionary.java:
##
@@ -64,6 +66,20 @@ public UnknownDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile) th
 () -> Files.newInputStream(dictFile));
   }
 
+  /**
+   * Create a {@link UnknownDictionary} from an external resource URL (e.g. 
from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @throws IOException if resource was not found or broken
+   */
+  public UnknownDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl) 
throws IOException {
+super(
+() -> targetMapUrl.openStream(), () -> posDictUrl.openStream(), () -> 
dictUrl.openStream());

Review Comment:
   I've recorded this as ignored for this pull request. If you change your 
mind, just comment `@sonatype-lift unignore`.



##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path 
posDictFile, Path dictFile,
 () -> Files.newInputStream(fstFile));
   }
 
+  

[GitHub] [lucene] uschindler commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865656911


##
lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary.java:
##
@@ -89,6 +92,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),
+        () -> posDictUrl.openStream(),
+        () -> dictUrl.openStream(),
+        () -> fstUrl.openStream());

Review Comment:
   @sonatype-lift ignore



##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/ConnectionCosts.java:
##
@@ -63,6 +65,17 @@ public ConnectionCosts(Path connectionCostsFile) throws IOException {
     this(() -> Files.newInputStream(connectionCostsFile));
   }
 
+  /**
+   * Create a {@link ConnectionCosts} from an external resource URL (e.g. from Classpath with {@link
+   * ClassLoader#getResource(String)}).
+   *
+   * @param connectionCostsUrl where to load connection costs resource
+   * @throws IOException if resource was not found or broken
+   */
+  public ConnectionCosts(URL connectionCostsUrl) throws IOException {
+    this(() -> connectionCostsUrl.openStream());

Review Comment:
   @sonatype-lift ignore
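
The Javadoc in the hunk above names `ClassLoader#getResource(String)` as the expected source of the URL. A small sketch of that lookup, under stated assumptions, is shown below: `ClasspathUrlDemo` and `readMagic` are hypothetical names, and the resource `java/lang/Object.class` is used only because every JDK ships it. The lookup returns `null` when nothing is found, so the sketch checks for that before opening the stream.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

/** Hypothetical demo class; not part of Lucene. */
class ClasspathUrlDemo {

  /** Locates a resource through the system class loader and reads its first four bytes. */
  static int readMagic(String resourceName) throws IOException {
    URL url = ClassLoader.getSystemResource(resourceName);
    if (url == null) {
      // The lookup signals "not found" with null rather than with an exception.
      throw new IOException("resource not found: " + resourceName);
    }
    try (InputStream in = url.openStream()) {
      // readInt() reads four bytes in big-endian order.
      return new DataInputStream(in).readInt();
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.printf("%08X%n", readMagic("java/lang/Object.class"));
  }
}
```

A caller would pass a URL obtained this way straight to one of the new URL constructors; the explicit `null` check matters because calling `openStream()` on a missing resource would fail with a `NullPointerException` rather than an `IOException`.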






[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


sonatype-lift[bot] commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865657065


##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/ConnectionCosts.java:
##
@@ -63,6 +65,17 @@ public ConnectionCosts(Path connectionCostsFile) throws 
IOException {
 this(() -> Files.newInputStream(connectionCostsFile));
   }
 
+  /**
+   * Create a {@link ConnectionCosts} from an external resource URL (e.g. from 
Classpath with {@link
+   * ClassLoader#getResource(String)}).
+   *
+   * @param connectionCostsUrl where to load connection costs resource
+   * @throws IOException if resource was not found or broken
+   */
+  public ConnectionCosts(URL connectionCostsUrl) throws IOException {
+this(() -> connectionCostsUrl.openStream());

Review Comment:
   I've recorded this as ignored for this pull request. If you change your 
mind, just comment `@sonatype-lift unignore`.



##
lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/ConnectionCosts.java:
##
@@ -63,6 +66,17 @@ public ConnectionCosts(Path connectionCostsFile) throws IOException {
     this(() -> Files.newInputStream(connectionCostsFile));
   }
 
+  /**
+   * Create a {@link ConnectionCosts} from an external resource URL (e.g. from Classpath with {@link
+   * ClassLoader#getResource(String)}).
+   *
+   * @param connectionCostsUrl where to load connection costs resource
+   * @throws IOException if resource was not found or broken
+   */
+  public ConnectionCosts(URL connectionCostsUrl) throws IOException {
+    this(() -> connectionCostsUrl.openStream());

Review Comment:
   I've recorded this as ignored for this pull request. If you change your 
mind, just comment `@sonatype-lift unignore`.






[GitHub] [lucene] uschindler commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865657121


##
lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/ConnectionCosts.java:
##
@@ -63,6 +66,17 @@ public ConnectionCosts(Path connectionCostsFile) throws IOException {
     this(() -> Files.newInputStream(connectionCostsFile));
   }
 
+  /**
+   * Create a {@link ConnectionCosts} from an external resource URL (e.g. from Classpath with {@link
+   * ClassLoader#getResource(String)}).
+   *
+   * @param connectionCostsUrl where to load connection costs resource
+   * @throws IOException if resource was not found or broken
+   */
+  public ConnectionCosts(URL connectionCostsUrl) throws IOException {
+    this(() -> connectionCostsUrl.openStream());

Review Comment:
   @sonatype-lift ignore



##
lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/UnknownDictionary.java:
##
@@ -62,6 +65,20 @@ public UnknownDictionary(Path targetMapFile, Path posDictFile, Path dictFile) th
         () -> Files.newInputStream(dictFile));
   }
 
+  /**
+   * Create a {@link UnknownDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @throws IOException if resource was not found or broken
+   */
+  public UnknownDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl) throws IOException {
+    super(
+        () -> targetMapUrl.openStream(), () -> posDictUrl.openStream(), () -> dictUrl.openStream());

Review Comment:
   @sonatype-lift ignore






[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


sonatype-lift[bot] commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865657342


##
lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/UnknownDictionary.java:
##
@@ -62,6 +65,20 @@ public UnknownDictionary(Path targetMapFile, Path posDictFile, Path dictFile) th
         () -> Files.newInputStream(dictFile));
   }
 
+  /**
+   * Create a {@link UnknownDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @throws IOException if resource was not found or broken
+   */
+  public UnknownDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl) throws IOException {
+    super(
+        () -> targetMapUrl.openStream(), () -> posDictUrl.openStream(), () -> dictUrl.openStream());

Review Comment:
   I've recorded this as ignored for this pull request. If you change your 
mind, just comment `@sonatype-lift unignore`.



##
lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary.java:
##
@@ -89,6 +92,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),

Review Comment:
   I've recorded this as ignored for this pull request. If you change your 
mind, just comment `@sonatype-lift unignore`.






[GitHub] [lucene] uschindler commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865657460


##
lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary.java:
##
@@ -89,6 +92,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),

Review Comment:
   @sonatype-lift ignore



##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),

Review Comment:
   @sonatype-lift ignore






[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


sonatype-lift[bot] commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865657636


##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),

Review Comment:
   I've recorded this as ignored for this pull request. If you change your 
mind, just comment `@sonatype-lift unignore`.



##
lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary.java:
##
@@ -89,6 +92,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),
+        () -> posDictUrl.openStream(),
+        () -> dictUrl.openStream(),

Review Comment:
   I've recorded this as ignored for this pull request. If you change your 
mind, just comment `@sonatype-lift unignore`.






[GitHub] [lucene] uschindler commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865657741


##
lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary.java:
##
@@ -89,6 +92,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),
+        () -> posDictUrl.openStream(),
+        () -> dictUrl.openStream(),

Review Comment:
   @sonatype-lift ignore






[GitHub] [lucene] uschindler commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865658630


##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),
+        () -> posDictUrl.openStream(),
+        () -> dictUrl.openStream(),

Review Comment:
   @sonatype-lift ignore






[GitHub] [lucene] sonatype-lift[bot] commented on a diff in pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


sonatype-lift[bot] commented on code in PR #868:
URL: https://github.com/apache/lucene/pull/868#discussion_r865658659


##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoDictionary.java:
##
@@ -82,6 +84,25 @@ public TokenInfoDictionary(Path targetMapFile, Path posDictFile, Path dictFile,
         () -> Files.newInputStream(fstFile));
   }
 
+  /**
+   * Create a {@link TokenInfoDictionary} from an external resource URL (e.g. from Classpath with
+   * {@link ClassLoader#getResource(String)}).
+   *
+   * @param targetMapUrl where to load target map resource
+   * @param posDictUrl where to load POS dictionary resource
+   * @param dictUrl where to load dictionary entries resource
+   * @param fstUrl where to load encoded FST data resource
+   * @throws IOException if resource was not found or broken
+   */
+  public TokenInfoDictionary(URL targetMapUrl, URL posDictUrl, URL dictUrl, URL fstUrl)
+      throws IOException {
+    this(
+        () -> targetMapUrl.openStream(),
+        () -> posDictUrl.openStream(),
+        () -> dictUrl.openStream(),

Review Comment:
   I've recorded this as ignored for this pull request. If you change your 
mind, just comment `@sonatype-lift unignore`.






[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118309403

   @mocobeta should we maybe add a note to the deprecated ctor's javadocs that 
it no longer works with classpath resources and always loads the default one?
   
   Or should we change it to behave like before?





[GitHub] [lucene] mocobeta commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


mocobeta commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118313342

   Thank you @uschindler, I am fine with this!
   
   > This is against 9.x branch, but can be forward ported to main.
   
   I'm inclined to add these URL-based constructors only to 9.x as a 
temporary remedy; I'm afraid that people will abuse this (like opening TCP 
sockets to load resources from another host...).





[GitHub] [lucene] mocobeta commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


mocobeta commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118316633

   > should we maybe add a note to the deprecated ctor's javadocs that it no 
longer works with classpath resources and always loads the default one?
   > Or should we change it to behave like before?
   
   Actually, I haven't used these constructors myself. When needed, I 
simply regenerate the resources and rebuild the kuromoji jar.
   Maybe @msokolov would know what we should do about it?





[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118320435

   > I'm inclined to add these URL-based constructors only to 9.x as a 
temporary remedy; I'm afraid that people will abuse this (like opening TCP 
sockets to load resources from another host...).
   
   If you supply Path-based ctors, the URL-based ones are also needed to 
support loading resources from *anywhere*. The problem that @msokolov hit is 
that you can "officially" only use Path-based resources, which are not usable 
with modules or the classpath. That's no different in main.





[GitHub] [lucene] mocobeta commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


mocobeta commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118326675

   (As for sonatype-lift bot, it repeatedly reports the same warnings per push; 
I once tried to silence it and gave up... I'd agree with disabling it for 
lucene.)





[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118328012

   But yes, one workaround for Mike is to rebuild the JAR files. An alternative 
that always works (outside the module system) is to place the resources in a 
separate JAR file added to the classpath BEFORE the main kuromoji JAR.
   
   Nevertheless, this is NOT a temporary thing: we must either add URL-based 
ctors or remove all specialized ctors and make the IOSupplier one public. 
Allowing only Path (files) for customization is not a well-designed API.
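
   The design point above (one general stream-supplier constructor with thin Path and URL conveniences on top) can be sketched roughly as follows. This is a minimal illustration, not Lucene's code: `StreamSupplier` and `SketchDictionary` are hypothetical stand-ins for `IOSupplier` and the dictionary classes, and `InputStream#readAllBytes` assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for a supplier that may throw IOException. */
@FunctionalInterface
interface StreamSupplier {
  InputStream get() throws IOException;
}

/** Hypothetical dictionary-like class; it only keeps the raw bytes it loaded. */
final class SketchDictionary {
  private final byte[] data;

  /** Convenience constructor for a file on disk. */
  SketchDictionary(Path file) throws IOException {
    this(() -> Files.newInputStream(file));
  }

  /** Convenience constructor for a URL, e.g. one returned by ClassLoader#getResource. */
  SketchDictionary(URL url) throws IOException {
    this(() -> url.openStream());
  }

  /** General constructor that both convenience constructors delegate to. */
  SketchDictionary(StreamSupplier resource) throws IOException {
    byte[] bytes;
    try (InputStream in = resource.get()) {
      bytes = in.readAllBytes();
    }
    this.data = bytes;
  }

  int size() {
    return data.length;
  }

  /** Demo helper: writes three bytes to a temp file and loads them through both constructors. */
  static boolean pathAndUrlAgree() {
    try {
      Path tmp = Files.createTempFile("sketch-dict", ".dat");
      tmp.toFile().deleteOnExit();
      Files.write(tmp, new byte[] {1, 2, 3});
      return new SketchDictionary(tmp).size() == 3
          && new SketchDictionary(tmp.toUri().toURL()).size() == 3;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

   With this shape, the Path and URL constructors are one-liners, and making the supplier-based constructor public would cover any other source of bytes.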





[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118329537

   > (As for sonatype-lift bot, it repeatedly reports the same warnings per 
push; I once tried to silence it and gave up... I'd agree with disabling it for 
lucene.)
   
   Especially as those warnings are complete nonsense. If you read the 
description, we are not doing anything mentioned there. We get a URL from 
outside code. Filtering or ensuring that the URL does not come from untrusted 
sources is not our responsibility. One could also pass a Path object pointing 
to /etc/passwd.
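
   If an application does want to restrict where dictionary URLs may point, that check would live in the calling code, before the URL is handed to the library. A minimal sketch, assuming a hypothetical allow-list of schemes (`DictionaryUrlPolicy` and its list are illustrative and not part of this PR):

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical caller-side policy; nothing like this exists in the PR. */
final class DictionaryUrlPolicy {
  /** Assumed allow-list: the schemes this imaginary application is willing to accept. */
  private static final Set<String> ACCEPTED_SCHEMES =
      new HashSet<>(Arrays.asList("file", "jrt"));

  /** Returns true if the URL's scheme is on the allow-list. */
  static boolean isAccepted(URL url) {
    return ACCEPTED_SCHEMES.contains(url.getProtocol().toLowerCase(Locale.ROOT));
  }

  /** Convenience overload for the demo; rejects strings that are not valid URLs. */
  static boolean isAccepted(String spec) {
    try {
      return isAccepted(URI.create(spec).toURL());
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException(spec, e);
    }
  }
}
```

   Note that `jar:` URLs wrap another URL, so a policy that wants to accept them would also have to inspect the inner URL; that is left out here.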





[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118332065

   See 
[TikaInputStream](https://tika.apache.org/1.28.2/api/org/apache/tika/io/TikaInputStream.html#get-java.net.URL-org.apache.tika.metadata.Metadata-)
 for an example that allows URL next to Path (and many others).





[GitHub] [lucene] mocobeta commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


mocobeta commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118340545

   Personally, I don't think we should allow users to load resources from 
anywhere... It's not a sensible way to load the dictionary resources as far as 
I know. If you need external resources that are not on locally accessible 
paths, you should simply download or install them beforehand, then load them 
from the file - this is the convention I've seen so far.
   
   However, it's okay with me if we should support them as a general-purpose 
API.





[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118346475

   > Personally, I don't think we should allow users to load resources from 
anywhere... It's not a sensible way to load the dictionary resources as far as 
I know. If you need external resources that are not on locally accessible 
paths, you should simply download or install them beforehand, then load them 
from the file - this is the convention I've seen so far.
   > 
   > However, it's okay with me if we should support them as a general-purpose 
API.
   
   It is also common to have them in JAR files. And if we want them to be 
supported in Solr, we would need to support URL, too. 
SolrResourceLoader also returns URL.
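
   To illustrate why a URL is the natural handle for a resource packaged in a JAR, here is a self-contained sketch that builds a small JAR, looks a resource up through a class loader and reads it via `URL#openStream()`. The entry name `dict/sample.txt` and the class `JarResourceDemo` are made up for the example; `InputStream#readAllBytes` assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

/** Hypothetical demo class: reads a resource that only exists inside a JAR file. */
final class JarResourceDemo {
  static String readFromJar() {
    try {
      // Build a throwaway JAR containing one text resource.
      Path jar = Files.createTempFile("sketch-dict", ".jar");
      jar.toFile().deleteOnExit();
      try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
        out.putNextEntry(new JarEntry("dict/sample.txt"));
        out.write("hello".getBytes(StandardCharsets.UTF_8));
        out.closeEntry();
      }
      // The class loader hands back a jar: URL; there is no plain file path for the entry.
      try (URLClassLoader loader = new URLClassLoader(new URL[] {jar.toUri().toURL()}, null)) {
        URL resource = loader.getResource("dict/sample.txt");
        try (InputStream in = resource.openStream()) {
          return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

   A URL obtained this way could be passed straight to a URL-based constructor such as the ones added in this PR.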





[GitHub] [lucene] romseygeek merged pull request #860: LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty.

2022-05-05 Thread GitBox


romseygeek merged PR #860:
URL: https://github.com/apache/lucene/pull/860





[jira] [Commented] (LUCENE-10553) WANDScorer's handling of 0 and +Infty is backwards

2022-05-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532157#comment-17532157
 ] 

ASF subversion and git services commented on LUCENE-10553:
--

Commit 26301898b20e30f484d653ab6415d460d011b099 in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=26301898b20 ]

LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. (#860)

The computation of the scaling factor has special cases for these two values,
but the current logic is backwards.

> WANDScorer's handling of 0 and +Infty is backwards
> --
>
> Key: LUCENE-10553
> URL: https://issues.apache.org/jira/browse/LUCENE-10553
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> WANDScorer has special logic to deal with corner cases when the sum of the 
> maximum scores of sub queries is either 0 or +Infty, but both code and tests 
> have backwards logic regarding this special case, doing +1 instead of -1 and 
> vice-versa.
> This leads to a failed assertion in the case when the sum of the scores of 
> the sub queries overflows, which typically happens if one of the clauses has 
> a default implementation that returns MAX_VALUE if it cannot reason about max 
> scores.
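
The overflow mentioned in the description is easy to reproduce in isolation. The sketch below is not WANDScorer's code; `MaxScoreSumDemo` is a hypothetical, simplified stand-in that sums per-clause maximum scores in float arithmetic to show how the two corner cases (0 and +Infinity) arise.

```java
/** Hypothetical demo class: shows how a sum of maximum scores reaches 0 or +Infinity. */
final class MaxScoreSumDemo {
  /** Sums per-clause maximum scores in float arithmetic. */
  static float sumMaxScores(float... maxScores) {
    float sum = 0f;
    for (float score : maxScores) {
      sum += score;
    }
    return sum;
  }

  public static void main(String[] args) {
    // Ordinary clauses: a finite, positive sum.
    System.out.println(sumMaxScores(1.5f, 2.0f)); // prints 3.5
    // Two clauses that cannot reason about their max score and return MAX_VALUE.
    System.out.println(sumMaxScores(Float.MAX_VALUE, Float.MAX_VALUE)); // prints Infinity
    // Clauses that can never score: the sum is exactly zero.
    System.out.println(sumMaxScores(0f, 0f)); // prints 0.0
  }
}
```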



--
This message was sent by Atlassian Jira
(v8.20.7#820007)




[GitHub] [lucene] mocobeta commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


mocobeta commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118353023

   Thanks for explaining, then I don't have an objection to forward porting; 
maybe we'll need it to make it possible to load the resources from another jar.





[GitHub] [lucene] romseygeek opened a new pull request, #869: LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues

2022-05-05 Thread GitBox


romseygeek opened a new pull request, #869:
URL: https://github.com/apache/lucene/pull/869

   When this was refactored previously, we moved a public static method from
   DocValuesFieldExistsQuery to the package-private DocValuesIterator class.
   This makes the method available again by moving it instead to the public
   DocValues utility class.





[GitHub] [lucene] romseygeek commented on pull request #869: LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues

2022-05-05 Thread GitBox


romseygeek commented on PR #869:
URL: https://github.com/apache/lucene/pull/869#issuecomment-1118356099

   When I backport I'll add a deprecated forwarding method to 
DocValuesFieldExistsQuery again, to make it a bit more obvious how to 
migrate.





[GitHub] [lucene] romseygeek commented on pull request #869: LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues

2022-05-05 Thread GitBox


romseygeek commented on PR #869:
URL: https://github.com/apache/lucene/pull/869#issuecomment-1118357618

   Having said that, looking at the existing MIGRATE instructions, maybe the 
most sensible thing is to have this method on FieldExistsQuery directly? Then 
existing code that references the deprecated class continues to work in 9.x, and 
it's a fairly simple search-and-replace for upgrading?
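
   The two migration options mentioned in this thread can be sketched roughly 
as follows. This is only an illustration under stated assumptions, not the 
actual Lucene code: `NewQuery`, `OldQuery`, `LegacyHelper` and `iteratorName` 
are hypothetical stand-ins, and the inheritance variant only applies if the 
deprecated class extends the new one.

```java
final class ForwardingSketch {
  /** Hypothetical stand-in for the class that becomes the new home of the static helper. */
  static class NewQuery {
    static String iteratorName(String field) {
      return "iterator(" + field + ")";
    }
  }

  /** Option 1: a deprecated subclass. Static methods of NewQuery stay reachable through it. */
  @Deprecated
  static class OldQuery extends NewQuery {}

  /** Option 2: an unrelated deprecated class with an explicit forwarding method. */
  @Deprecated
  static class LegacyHelper {
    @Deprecated
    static String iteratorName(String field) {
      // Forward to the new location so both entry points behave identically.
      return NewQuery.iteratorName(field);
    }
  }

  @SuppressWarnings("deprecation")
  public static void main(String[] args) {
    // All three calls resolve to the same implementation.
    System.out.println(OldQuery.iteratorName("price"));
    System.out.println(LegacyHelper.iteratorName("price"));
    System.out.println(NewQuery.iteratorName("price"));
  }
}
```

   With either option, callers of the old name keep compiling (with a 
deprecation warning), and upgrading is a search-and-replace of the class name.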





[jira] [Commented] (LUCENE-10553) WANDScorer's handling of 0 and +Infty is backwards

2022-05-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532161#comment-17532161
 ] 

ASF subversion and git services commented on LUCENE-10553:
--

Commit efa5d6f4d4354ae87a1e6144dc70aeb52b98bfd2 in lucene's branch 
refs/heads/branch_9x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=efa5d6f4d43 ]

LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty. (#860)

The computation of the scaling factor has special cases for these two values,
but the current logic is backwards.
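
To illustrate why the direction of the +1/-1 matters, here is a minimal 
sketch. It assumes the scaling factor is derived from the negated binary 
exponent of the score, so that it decreases as the score grows; the class 
name, the constant 15 and the exact formula are assumptions made for this 
example and are not taken from the commit.

```java
final class ScalingFactorSketch {
  /**
   * Returns a scaling factor that decreases as f grows. Because 0 is smaller than
   * Float.MIN_VALUE, it must map to a larger factor (+1); because +Infty is larger
   * than Float.MAX_VALUE, it must map to a smaller factor (-1).
   */
  static int scalingFactor(float f) {
    if (f < 0 || Float.isNaN(f)) {
      throw new IllegalArgumentException("Scores must be positive or zero, got " + f);
    }
    if (f == 0) {
      return scalingFactor(Float.MIN_VALUE) + 1;
    }
    if (Float.isInfinite(f)) {
      return scalingFactor(Float.MAX_VALUE) - 1;
    }
    // Every finite non-zero float is a normal double, so the exponent is well defined.
    return 15 - Math.getExponent((double) f) - 1;
  }

  public static void main(String[] args) {
    System.out.println(scalingFactor(0f));
    System.out.println(scalingFactor(Float.MIN_VALUE));
    System.out.println(scalingFactor(Float.MAX_VALUE));
    System.out.println(scalingFactor(Float.POSITIVE_INFINITY));
  }
}
```

With the signs swapped, 0 would get a smaller factor than `Float.MIN_VALUE` 
and +Infty a larger one than `Float.MAX_VALUE`, which breaks the ordering 
that the rest of the computation relies on.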




[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118408717

   OK. I opened this PR against 9.x because it makes it easier to add the 
changes in deprecation messages. When forward porting just tell the 
merge/cherrypick on main to "use theirs".





[GitHub] [lucene] rmuir commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


rmuir commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118416936

   what is going on here? why are we allowing such stuff?





[GitHub] [lucene] rmuir commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


rmuir commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118418901

   > Personally, I don't think we should allow users to load resources from 
anywhere... It's not the sensible way to load the dictionary resources as far as 
I know. If you need external resources that are not on locally accessible paths, 
you should simply download or install them beforehand, then load them from the 
file - this is the convention I've seen so far.
   
   +1, this is how I feel. I don't think we should be supporting Path/URL APIs. 
Sorry, this is really wrong. E.g. for ConnectionCosts, the only ctor that we 
need is `ConnectionCosts()`. Load from jar.





[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118422000

   In short: Some people, like Mike Sokolov at Amazon, want to load custom 
ConnectionCosts, TokenInfoDicts and Unk dicts from custom resources. This was 
working previously with the file system and with the classloader.
   
   The classloading API uses URL as resource, while files use Path 
objects.
   
   So IMHO:
   - either remove both "Path and URL" ctors (my preference)
   - or add both (like done here to support Solr, Elasticsearch and Amazon to 
load dictionaries from custom JAR files)
   - or just add an `IOSupplier` ctor, but that is even worse as this is 
an internal class and should not be in public APIs.
   
   The question I also ask @msokolov: Why the hell does anybody want to 
modify the dictionaries that are in a highly proprietary format? To generate those 
files, you need to generate them with Gradle anyway (the FST, the compiler for 
connection costs). So anybody can just compile and package their own JAR file.
   
   So I agree with Robert here: Why do we need to make it customizable? There's 
no added value in providing some proprietary, non-standardized file formats to 
the ctor as an external resource. This was a bug already in early versions of 
Kuromoji.





[GitHub] [lucene] rmuir commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


rmuir commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118425532

   yes, please, let's remove any File, Path, URL, whatever ctors. The code is 
open-source if Amazon wants to build a custom crazy jar. We can't make all the 
APIs complicated and unusable for such stuff.





[GitHub] [lucene] msokolov commented on a diff in pull request #867: LUCENE-10558: expose stream-based Kuromoji resource constructors

2022-05-05 Thread GitBox


msokolov commented on code in PR #867:
URL: https://github.com/apache/lucene/pull/867#discussion_r865801802


##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UnknownDictionary.java:
##
@@ -52,18 +52,26 @@ private UnknownDictionary() throws IOException {
 () -> getClassResource(DICT_FILENAME_SUFFIX));
   }
 
-  private UnknownDictionary(
-  IOSupplier targetMapResource,
-  IOSupplier posResource,
-  IOSupplier dictResource)
+  /**
+   * Create a {@link UnknownDictionary} from an external resource path.
+   *
+   * @param targetMap supplier for stream containing target map
+   * @param posDict supplier for stream containing POS dictionary
+   * @param dict supplier for stream containing dictionary entries
+   * @throws IOException if a stream could not be read
+   */
+  public UnknownDictionary(
+  IOSupplier targetMap,
+  IOSupplier posDict,
+  IOSupplier dict)

Review Comment:
   it's another day, I can no longer confirm nor deny, but Uwe's explanation 
makes sense to me :) If we keep this change, I'd be fine with the resource 
naming too, although it does have that classpath connotation? 






[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118442012

   Let's give @msokolov a chance to comment.





[GitHub] [lucene] msokolov commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


msokolov commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118443078

   > In case you ask: This works both with classpath and module usage. The 
caller-sensitive parts are `Class#getResource(String)`, 
`ClassLoader#getResource(String)`, and `Module#getResource(String)`. The 
returned URL is free to use anywhere so it separates concerns like the 
`IOSupplier` or `Path`.
   
   I was going to ask about this :) Didn't know about Module#getResource ... 
Anyway this approach seems sound. Although it's a little less general than the 
Stream-based approach, it does handle all the known use cases cleanly.
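
   To make the comparison between the `Path`, URL and stream-supplier 
approaches concrete, here is a minimal sketch using only the JDK. 
`StreamSupplier`, `fromPath`, `fromUrl` and `readAll` are hypothetical names 
invented for this example (they stand in for an `IOSupplier`-like type) and 
are not the API proposed in either PR.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ResourceSupplierSketch {
  /** Hypothetical stand-in for a supplier of streams that may throw IOException. */
  @FunctionalInterface
  interface StreamSupplier {
    InputStream get() throws IOException;
  }

  /** File-system case: the caller hands over a Path. */
  static StreamSupplier fromPath(Path path) {
    return () -> Files.newInputStream(path);
  }

  /** Classpath/module case: the caller resolves the resource itself and hands over the URL. */
  static StreamSupplier fromUrl(URL url) {
    return url::openStream;
  }

  static byte[] readAll(StreamSupplier supplier) throws IOException {
    try (InputStream in = supplier.get()) {
      return in.readAllBytes();
    }
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("dict", ".dat");
    try {
      Files.write(tmp, "demo".getBytes(StandardCharsets.UTF_8));
      // In real code the URL would typically come from SomeClass.class.getResource(name).
      URL url = tmp.toUri().toURL();
      System.out.println(new String(readAll(fromPath(tmp)), StandardCharsets.UTF_8));
      System.out.println(new String(readAll(fromUrl(url)), StandardCharsets.UTF_8));
    } finally {
      Files.deleteIfExists(tmp);
    }
  }
}
```

   Both inputs reduce to the same stream supplier internally, which is why a 
URL ctor covers the classpath and module cases without exposing the supplier 
type in the public API.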





[GitHub] [lucene] rmuir commented on pull request #867: LUCENE-10558: expose stream-based Kuromoji resource constructors

2022-05-05 Thread GitBox


rmuir commented on PR #867:
URL: https://github.com/apache/lucene/pull/867#issuecomment-1118443273

   I don't think we should do this, for the same reasons as stated on #868.
   
   These things should be loaded from jar as singletons and that's it.





[GitHub] [lucene] msokolov commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


msokolov commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118449412

   Oh, I missed Robert's objections. Sorry, I don't understand the problem 
here. The way Kuromoji works, it uses a language model that is trained from a 
corpus of text to do tokenization. We just want to use a different model 
trained on a different set of text. I'm not clear why that is seen as a bug.  
It's not a new file format; it's different contents using the existing file 
format. The format is not proprietary, it was promoted by Mecab I think, which 
is the tool used to train the dictionary, and is open-source.





[GitHub] [lucene] rmuir commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


rmuir commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118450647

   > Oh, I missed Robert's objections. Sorry, I don't understand the problem 
here. The way Kuromoji works, it uses a language model that is trained from a 
corpus of text to do tokenization. We just want to use a different model 
trained on a different set of text.
   
   Use the gradle build to make a jar then.





[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118457174

   > The format is not proprietary, it was promoted by Mecab I think, which is 
the tool used to train the dictionary, and is open-source.
   
   It is proprietary because the FST in the TokenInfoDict is a Lucene-specific 
version. The format may change, and due to changes in the algorithm we may have 
a different format. We have no version numbers in the file format.





[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118462741

   I will leave that PR open for discussion. I just implemented the minimal 
approach + test to make the API compatible with classpath usage.
   
   We can still allow using the IOSuppliers, but then we must remove the Path 
ctor, too. And we need a PR for both Nori and Kuromoji.





[jira] [Comment Edited] (LUCENE-10151) Add timeout support to IndexSearcher

2022-05-05 Thread Deepika Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532223#comment-17532223
 ] 

Deepika Sharma edited comment on LUCENE-10151 at 5/5/22 12:16 PM:
--

I am exploring adding timeout support to the {{IndexSearcher}} by using 
{{ExitableDirectoryReader}}. However, one issue with 
{{ExitableDirectoryReader}} is that it enforces timeout checking at the time of 
instantiating {{BulkScorer}} and doesn't actually enforce it once you start 
iterating postings/impacts. This is being discussed in LUCENE-10544.
Are there any suggestions on alternative ways to approach this 
problem that I should consider?


was (Author: JIRAUSER288832):
I am exploring adding timeout support to the {{IndexSearcher}} by using 
{{ExitableDirectoryReader}}. However, one issue with 
{{ExitableDirectoryReader}} is that it enforces timeout checking at the time of 
instantiating {{BulkScorer}} and doesn't actually enforce it once you start 
iterating postings/impacts. This is being discussed in 
[LUCENE-10544|https://issues.apache.org/jira/browse/LUCENE-10544]
I want to ask if there are any suggestions on alternative ways to approach this 
problem that I should consider?

> Add timeout support to IndexSearcher
> 
>
> Key: LUCENE-10151
> URL: https://issues.apache.org/jira/browse/LUCENE-10151
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Greg Miller
>Priority: Minor
>
> I'd like to explore adding optional "timeout" capabilities to 
> {{IndexSearcher}}. This would enable users to (optionally) specify a maximum 
> time budget for search execution. If the search "times out", partial results 
> would be available.
> This idea originated on the dev list (thanks [~jpountz] for the suggestion). 
> Thread for reference: 
> [http://mail-archives.apache.org/mod_mbox/lucene-dev/202110.mbox/%3CCAL8PwkZdNGmYJopPjeXYK%3DF7rvLkWon91UEXVxMM4MeeJ3UHxQ%40mail.gmail.com%3E]
>  
> A couple things to watch out for with this change:
>  # We want to make sure it's robust to a two-phase query evaluation scenario 
> where the "approximate" step matches a large number of candidates but the 
> "confirmation" step matches very few (or none). This is a particularly tricky 
> case.
>  # We want to make sure the {{TotalHits#Relation}} reported by {{TopDocs}} is 
> {{GREATER_THAN_OR_EQUAL_TO}} if the query times out
>  # We want to make sure it plays nice with the {{LRUCache}} since it iterates 
> the query to pre-populate a {{BitSet}} when caching. That step shouldn't be 
> allowed to overrun the timeout. The proper way to handle this probably needs 
> some thought.
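
The requirements listed above can be sketched in plain Java, independent of 
Lucene. This is only an illustration under stated assumptions: 
`TimeoutSearchSketch`, `Result`, `Relation` and `search` are hypothetical 
names, doc ids are simulated as a range of ints, and the clock is injected so 
that the behaviour is deterministic. The deadline is checked for every 
candidate from the "approximate" step, before the "confirmation" step runs, 
so a query that confirms very few candidates still times out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.LongSupplier;

final class TimeoutSearchSketch {
  enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

  /** Partial or complete hits plus a flag saying whether the count is exact. */
  static final class Result {
    final List<Integer> hits;
    final Relation relation;

    Result(List<Integer> hits, Relation relation) {
      this.hits = hits;
      this.relation = relation;
    }
  }

  /**
   * Visits candidate docs 0..maxDoc-1, keeping those accepted by the confirm predicate.
   * Stops early and returns the hits collected so far once the clock reaches the deadline.
   */
  static Result search(int maxDoc, IntPredicate confirm, LongSupplier clock, long deadline) {
    List<Integer> hits = new ArrayList<>();
    for (int doc = 0; doc < maxDoc; doc++) {
      // Check per candidate, not per confirmed hit, so sparse matches cannot hide a timeout.
      if (clock.getAsLong() >= deadline) {
        return new Result(hits, Relation.GREATER_THAN_OR_EQUAL_TO);
      }
      if (confirm.test(doc)) {
        hits.add(doc);
      }
    }
    return new Result(hits, Relation.EQUAL_TO);
  }

  public static void main(String[] args) {
    long deadline = System.nanoTime() + 1_000_000L; // 1 ms budget
    Result r = search(1_000_000, doc -> doc % 2 == 0, System::nanoTime, deadline);
    System.out.println(r.hits.size() + " hits, relation " + r.relation);
  }
}
```

A real implementation would likely check the clock only every N documents to 
keep the overhead low, and would need a separate answer for the query cache 
case, which this sketch does not cover.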






[jira] [Commented] (LUCENE-10151) Add timeout support to IndexSearcher

2022-05-05 Thread Deepika Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532223#comment-17532223
 ] 

Deepika Sharma commented on LUCENE-10151:
-

I am exploring adding timeout support to the {{IndexSearcher}} by using 
{{ExitableDirectoryReader}}. However, one issue with 
{{ExitableDirectoryReader}} is that it enforces timeout checking at the time of 
instantiating {{BulkScorer}} and doesn't actually enforce it once you start 
iterating postings/impacts. This is being discussed in 
[LUCENE-10544|https://issues.apache.org/jira/browse/LUCENE-10544]
I want to ask if there are any suggestions on alternative ways to approach this 
problem that I should consider?




[GitHub] [lucene] rmuir commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


rmuir commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118480829

   Uwe explains it clearly: such APIs are impossible to support correctly and 
we don't need to add backwards-compatibility or anything else. 
   
   Files such as `ConnectionCosts.dat` are not standardized or anything 
like that; these are internal details. If we want to make a PR to save 1 byte 
compressing this file a bit better, we should be able to merge it without 
hesitation or worrying about back compat or any other insanity. 
   
   We can't and shouldn't support ctors taking binary versions of this stuff.





[GitHub] [lucene] rmuir commented on pull request #869: LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues

2022-05-05 Thread GitBox


rmuir commented on PR #869:
URL: https://github.com/apache/lucene/pull/869#issuecomment-1118511195

   Why in the world are we moving a method to the DocValues API that is only used 
by 3 call sites? Please, let's make it package private somewhere else.





[GitHub] [lucene] romseygeek commented on pull request #869: LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues

2022-05-05 Thread GitBox


romseygeek commented on PR #869:
URL: https://github.com/apache/lucene/pull/869#issuecomment-1118516530

   > Please, let's make it package private somewhere else.
   
   It already is package private, but it was public before, and we use it in 
Elasticsearch code.  I'm happy to put it elsewhere (on FieldExistsQuery maybe?) 
but I don't think we can just remove public methods.





[GitHub] [lucene] rmuir commented on pull request #869: LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues

2022-05-05 Thread GitBox


rmuir commented on PR #869:
URL: https://github.com/apache/lucene/pull/869#issuecomment-1118526833

   > It already is package private, but it was public before, and we use it in 
elasticsearch code. I'm happy to put it elsewhere (on FieldExistsQuery maybe?) 
but I don't think we can just remove public methods
   
   Sure we can. Being used by Elasticsearch doesn't mean it's a requirement to be 
public. Sorry, this is just a bit of a pain point as the most recent 2 pull 
requests in Lucene are API changes just like this: for Elasticsearch and Amazon 
respectively.





[GitHub] [lucene] msokolov commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


msokolov commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118535867

   > Use the gradle build to make a jar then.
   
   Is the idea that we would fork analysis/kuromoji package? That would be sad, 
but maybe you meant something else? Uwe mentioned some kind of 
classpath-loading approach, but I think that would depend on the classpath 
order, which is really fragile and not reliable in my experience. Still I may 
be missing something. If we are going to remove a feature that was supported in 
9.0 (and before), can we please clearly document how we can support the same 
use case going forward?





[GitHub] [lucene] romseygeek commented on pull request #869: LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues

2022-05-05 Thread GitBox


romseygeek commented on PR #869:
URL: https://github.com/apache/lucene/pull/869#issuecomment-1118537137

   I'd argue that it's a revert of an API change - it's a public method in 9.1 
and currently we're removing it in 9.2 with no CHANGES entry or information 
about how to migrate.  And the fact that we're using it in ES suggests that 
there may be other users of it as well.  If we really think it shouldn't be a 
public method, fine, but we should at least have some information on how 
consumers who are using it at the moment should handle upgrades?





[GitHub] [lucene] msokolov commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


msokolov commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118552882

   Also - I don't really buy the idea that we can't support binary file formats 
- the entire index is filled with binary files. In this case we provide tools 
for generating these files, so users are free to regenerate them from source 
when Lucene version changes. There's no need to backwards-compatibly support 
old formats.





[GitHub] [lucene] rmuir commented on pull request #869: LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues

2022-05-05 Thread GitBox


rmuir commented on PR #869:
URL: https://github.com/apache/lucene/pull/869#issuecomment-1118557005

   I'm not so opposed to the method being public somewhere, I'm more 
questioning the need to put it in `DocValues` api. This is what grabbed my 
attention. Would love to keep this API simple and minimal and without exotic 
stuff. Today the methods it uses are type-safe and here we are adding a 
relatively "untyped" method to get a generic iterator over any DV type. If you 
look at the other methods in the file, it really doesn't fit.





[GitHub] [lucene] mocobeta commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


mocobeta commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118561015

   I have complex feelings about it and I understand both opinions...
   
   Current APIs in 9.x to load custom resources are not perfect (or bad, I dare 
to say), meanwhile "customizable/switchable dictionary" is a general idea and 
advanced users would often need it. We still don't have good APIs to support 
such advanced users - skilled developers who possibly could contribute to 
Lucene - personally, I'd like to continue discussions of how to improve our 
current APIs, instead of simply discarding them.





[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118585168

   > Also - I don't really buy the idea that we can't support binary file 
formats - the entire index is filled with binary files. In this case we provide 
tools for generating these files, so users are free to regenerate them from 
source when Lucene version changes. There's no need to backwards-compatibly 
support old formats.
   
   This is still odd, because we do not have much error handling in those file 
formats: the code was written to load them from the JAR file, so each one is 
basically more or less a dump of the FST and ConnectionCosts. Sure, you can 
regenerate them, but what is the issue with then also calling `gradlew jar`? I 
think that's the main issue here: you need Lucene's source code anyway to build 
the dictionaries, and you have to put the source files somewhere, so you are 
actually forking Lucene at that point.
   
   If we really want to support external dictionaries, we should refactor the 
API so you can load just one combined (CFS/ZIP-like) file that you can easily 
drop anywhere. This file would encode a version number, and if you load a file 
that does not use the current version, it bails out.
   
   What I would propose:
   - Add a gradle task that builds a dictionary package and that should be the 
same for Nori and Kuromoji, just different input files
   - Have the same factory class and exact same implementation for both 
dictionaries (I think @mocobeta is working on this). So a user should be able 
to load a single (zip-like) file and pass it to analyzer/tokenizer and it will 
automatically be Nori or Kuromoji, no matter what. The API is then very simple: 
`MorphologicalModel#load(aSingleFileNameOrURLOrInputStream)`
   - The default Tokenizers shipped in Lucene have no custom ctors, so 
JapaneseTokenizer behind the scenes loads a single Japanese dictionary file 
from classpath. Anybody wanting to load any other file will use a generic 
tokenizer impl. The Japanese one shipped with Lucene uses its default 
dictionary. Maybe we could also put the tokenizer in its separate JAR file (for 
both Japanese and Korean) and ship the default dictionaries as separate JAR 
files on Maven Central.
   
   The main disaster is the number of files, which also makes it very 
error-prone.
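
   A minimal sketch of the "single file with an encoded version number" idea 
above, assuming a made-up header layout (magic, version, payload length). The 
class name, both constants and the layout are hypothetical and are not part of 
Lucene; only the bail-out-on-version-mismatch behaviour comes from the proposal.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch: a single-file dictionary package that carries a format version. */
final class DictionaryPackageSketch {
  // Both constants are invented for this example; they are not Lucene values.
  static final int MAGIC = 0x44494354; // ASCII "DICT"
  static final int CURRENT_VERSION = 1;

  /** Writes the header (magic, version, payload length) followed by an opaque payload. */
  static byte[] write(int version, byte[] payload) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(MAGIC);
      out.writeInt(version);
      out.writeInt(payload.length);
      out.write(payload);
    }
    return bytes.toByteArray();
  }

  /** Reads the payload, bailing out if the magic or the version does not match. */
  static byte[] load(InputStream stream) throws IOException {
    DataInputStream in = new DataInputStream(stream);
    if (in.readInt() != MAGIC) {
      throw new IOException("Not a dictionary package");
    }
    int version = in.readInt();
    if (version != CURRENT_VERSION) {
      throw new IOException(
          "Unsupported dictionary version " + version + ", expected " + CURRENT_VERSION);
    }
    int length = in.readInt();
    if (length < 0) {
      throw new IOException("Corrupt dictionary package: negative payload length");
    }
    byte[] payload = new byte[length];
    in.readFully(payload); // throws EOFException if the file is truncated
    return payload;
  }
}
```

   With a header like this, a dictionary built for another format version fails 
fast with an IOException instead of being misread as a raw dump.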





[GitHub] [lucene] romseygeek commented on pull request #869: LUCENE-10436: Reinstate public getdocValuesdocIdSetIterator method on DocValues

2022-05-05 Thread GitBox


romseygeek commented on PR #869:
URL: https://github.com/apache/lucene/pull/869#issuecomment-1118588920

   That I agree with!  I'll update and put it on FieldExistsQuery - it was on 
DocValuesFieldExistsQuery before, which has been deprecated but now extends 
FEQ, and so it should make the transition a lot easier.





[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118606210

   To come back to the current issue: I see no problem in this PR - it does not 
make it worse, just better. Path and URL are just holders of a resource. Path 
is used for the filesystem, URL is returned by getResource() methods. I would 
like URI more, but Java's classloading uses URL, also with the module system, 
so it is the only "correct" way to refer to a resource in a class or module 
loader. I know, Robert does not like some details of the URL class, but they 
don't hit us here. The new API is definitely better than the old deprecated 
one, which was also broken from the beginning, leading to the confusion that 
@msokolov has seen.
   
   We should mark the Path and URL APIs with `@lucene.internal` and warn users 
that we provide no guarantees when they use them.
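
   To illustrate "Path and URL are just holders of a resource", here is a small 
sketch in which both end up as a supplier of an InputStream. The 
`StreamSupplier` interface is a local stand-in written for this example 
(similar in spirit to an IOSupplier), and the class and method names are 
hypothetical; only `Files.newInputStream` and `URL.openStream` are real JDK 
calls.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: Path and URL both reduce to "something that can open a stream". */
final class ResourceHolderSketch {
  /** Local stand-in for an IOSupplier-style interface, declared here to stay self-contained. */
  @FunctionalInterface
  interface StreamSupplier {
    InputStream get() throws IOException;
  }

  /** File system case: the Path is opened lazily when the supplier is called. */
  static StreamSupplier fromPath(Path path) {
    return () -> Files.newInputStream(path);
  }

  /** Classpath/module case: the URL as returned by getResource() is opened lazily. */
  static StreamSupplier fromUrl(URL url) {
    return url::openStream;
  }

  /** Reads the whole resource and closes the stream afterwards. */
  static byte[] readAll(StreamSupplier supplier) throws IOException {
    try (InputStream in = supplier.get()) {
      return in.readAllBytes();
    }
  }
}
```

   In this shape the public constructors can stay typed (Path or URL) while the 
loading code only ever sees the supplier.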





[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118611460

   The alternative would be - as said before - remove Path ctors and use 
IOSupplier only. But that's worse (maybe it prevents people from doing this, 
haha).





[jira] [Updated] (LUCENE-10550) Add getAllChildren functionality to facets

2022-05-05 Thread Greg Miller (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Miller updated LUCENE-10550:
-
Component/s: modules/facet

> Add getAllChildren functionality to facets
> --
>
> Key: LUCENE-10550
> URL: https://issues.apache.org/jira/browse/LUCENE-10550
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/facet
>Reporter: Yuting Gan
>Priority: Minor
>
> Currently Lucene does not support returning range counts sorted by label 
> values, but there are use cases demanding this feature. For example, a user 
> specifies ranges (e.g., [0, 10], [10, 20]) and wants to get range counts 
> without changing the range order. Today we can only call getTopChildren to 
> populate range counts, but it would return ranges sorted by counts (e.g., 
> [10, 20] 100, [0, 10] 50) instead of range values. 
> Lucene has an API, getAllChildrenSortByValue, that returns numeric values with 
> counts sorted by label values; please see 
> [LUCENE-7927|https://issues.apache.org/jira/browse/LUCENE-7927] for details. 
> Therefore, it would be nice if we could also have a similar API to support 
> range counts. The proposed getAllChildren API is to return value/range counts 
> sorted by label values instead of counts. 
> This proposal was inspired by the discussions with [~gsmiller] when I was 
> working on the LUCENE-10538 [PR|https://github.com/apache/lucene/pull/843], 
> and we believe users would benefit from adding this API to Facets. 
> I hope to get some feedback from the community, since this proposal would 
> require changes to the getTopChildren API in RangeFacetCounts. Thanks!






[jira] [Comment Edited] (LUCENE-10550) Add getAllChildren functionality to facets

2022-05-05 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532277#comment-17532277
 ] 

Greg Miller edited comment on LUCENE-10550 at 5/5/22 2:37 PM:
--

I'm also +1 on this but with a minor suggestion.

{quote}The proposed getAllChildren API is to return value/range counts sorted 
by label values instead of counts. {quote}

I wonder if we should "sort" at all for this functionality? If we're returning 
all children for a specified path, the caller can just as easily sort by 
whatever criteria they want (or maybe none at all), so sorting within the 
implementation might be wasteful. Also, for range faceting, the user is 
providing a list of ranges they care about up-front in a specific order. I 
would actually propose we retain that order instead of sorting by the range 
"values" in some way. This is what range faceting currently implements 
(somewhat confusingly) behind the {{getTopChildren}} API. The order of those 
ranges might have some meaning to the caller, so it might be best to retain it. 
What do you think?


was (Author: gsmiller):
I'm also +1 on this but with a minor suggestion.

> The proposed getAllChildren API is to return value/range counts sorted by 
> label values instead of counts. 

I wonder if we should "sort" at all for this functionality? If we're returning 
all children for a specified path, the caller can just as easily sort by 
whatever criteria they want (or maybe none at all), so sorting within the 
implementation might be wasteful. Also, for range faceting, the user is 
providing a list of ranges they care about up-front in a specific order. I 
would actually propose we retain that order instead of sorting by the range 
"values" in some way. This is what range faceting currently implements 
(somewhat confusingly) behind the {{getTopChildren}} API. The order of those 
ranges might have some meaning to the caller, so it might be best to retain it. 
What do you think?




[jira] [Commented] (LUCENE-10550) Add getAllChildren functionality to facets

2022-05-05 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532277#comment-17532277
 ] 

Greg Miller commented on LUCENE-10550:
--

I'm also +1 on this but with a minor suggestion.

> The proposed getAllChildren API is to return value/range counts sorted by 
> label values instead of counts. 

I wonder if we should "sort" at all for this functionality? If we're returning 
all children for a specified path, the caller can just as easily sort by 
whatever criteria they want (or maybe none at all), so sorting within the 
implementation might be wasteful. Also, for range faceting, the user is 
providing a list of ranges they care about up-front in a specific order. I 
would actually propose we retain that order instead of sorting by the range 
"values" in some way. This is what range faceting currently implements 
(somewhat confusingly) behind the {{getTopChildren}} API. The order of those 
ranges might have some meaning to the caller, so it might be best to retain it. 
What do you think?
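
The difference between the two orderings discussed above can be sketched with 
plain collections. This is not the Lucene facets API: the class and method 
names below are hypothetical, and the counts reuse the example from the issue 
description ([0, 10] with 50 hits, [10, 20] with 100 hits).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: "all children in caller order" versus "top children by count". */
final class RangeCountOrderSketch {
  /** Returns every range in the order the caller supplied it, without sorting. */
  static List<Map.Entry<String, Integer>> allChildren(LinkedHashMap<String, Integer> counts) {
    return new ArrayList<>(counts.entrySet());
  }

  /** Returns at most topN ranges, highest count first. */
  static List<Map.Entry<String, Integer>> topChildren(
      int topN, LinkedHashMap<String, Integer> counts) {
    List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
    sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
    return sorted.subList(0, Math.min(topN, sorted.size()));
  }
}
```

With the ranges inserted as [0, 10] then [10, 20], `allChildren` keeps that 
order, while `topChildren(1, counts)` returns only [10, 20]. A caller who wants 
a different order can sort the `allChildren` result themselves.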




[GitHub] [lucene] rmuir commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


rmuir commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118635459

   > I know, Robert does not like some details of the URL class, but they don't 
hit us here.
   
   https://twitter.com/tnurkiewicz/status/1519643900423688192
   
   Sorry, couldn't resist





[jira] [Updated] (LUCENE-10538) TopN is not being used in getTopChildren()

2022-05-05 Thread Greg Miller (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Miller updated LUCENE-10538:
-
Component/s: modules/facet

> TopN is not being used in getTopChildren()
> --
>
> Key: LUCENE-10538
> URL: https://issues.apache.org/jira/browse/LUCENE-10538
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/facet
>Reporter: Yuting Gan
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When looking at the overridden implementation getTopChildren(int topN, String 
> dim, String... path) in RangeFacetCounts, I found that the topN parameter is 
> not being used in the code, and the unit tests did not test this function 
> properly. I will create a PR to fix this, and will look into other overridden 
> implementations and see if they have the same issue. Please let me know if 
> there are any questions. Thanks!






[GitHub] [lucene] mayya-sharipova commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-05 Thread GitBox


mayya-sharipova commented on PR #792:
URL: https://github.com/apache/lucene/pull/792#issuecomment-1118635984

   @LuXugang Do you want to make this PR against the 
`apache:vectors-disi-direct` branch, so that we can merge it into that branch?





[GitHub] [lucene] mocobeta commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


mocobeta commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118638144

   > I see no problem in this PR - it does not make it worse, just better.
   
   Thanks @uschindler, I totally agree with that.
   
   I am +1 to this PR and keeping IOSupplier-based ones private.





[GitHub] [lucene] LuXugang commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-05 Thread GitBox


LuXugang commented on PR #792:
URL: https://github.com/apache/lucene/pull/792#issuecomment-1118642667

   Yes, thanks @mayya-sharipova, then I will continue with the format changes.





[GitHub] [lucene] mayya-sharipova commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-05 Thread GitBox


mayya-sharipova commented on PR #792:
URL: https://github.com/apache/lucene/pull/792#issuecomment-1118648748

   @LuXugang Please change the base of this PR to `apache:vectors-disi-direct` 
yourself, and after that I can merge this PR to this branch. 





[GitHub] [lucene] mocobeta commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


mocobeta commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118651872

   > I would like URI more,
   
   The same is true for me. We'd need URI, not URL...





[GitHub] [lucene] LuXugang commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-05 Thread GitBox


LuXugang commented on PR #792:
URL: https://github.com/apache/lucene/pull/792#issuecomment-1118656979

   > @LuXugang Please change the base of this PR to 
`apache:vectors-disi-direct` yourself, and after that I can merge this PR to 
this branch.
   
   @mayya-sharipova done.





[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118665915

   > > I would like URI more,
   > 
   > The same is true for me. We'd need URI, not URL...
   
   The difference is that URI is just an identifier, but to get an 
InputStream you need to convert it to URL anyway.
   
   How about the following: We want to provide the URL-taking ctors only for 
the resource case, so let's add a check like `if 
(!url.getProtocol().equalsIgnoreCase("jar")) throw new 
IllegalArgumentException("This method can only be called with classpath based 
resources, for file system resources use Path. Networking is not supported.");`
   
   Although I see no risk here with equals(). That's well known and not a 
source of problems here. I agree with you, but the URL class would be fine if 
it did not have this horrible bug (no idea why it is not fixed).
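
   The check proposed above could look roughly like the following sketch. The 
class and method names are hypothetical. It uses `URL.getProtocol()`, which is 
the accessor `java.net.URL` provides (`getScheme()` exists on `URI`). One 
caveat to keep in mind: resources loaded from an exploded class directory come 
back as `file:` URLs and JDK runtime-image resources as `jrt:` URLs, so a strict 
`jar` check would reject those.

```java
import java.net.URL;

/** Hypothetical sketch of the proposed "classpath resources only" guard. */
final class ClasspathUrlCheckSketch {
  /** Returns the URL unchanged if its protocol is "jar", otherwise throws. */
  static URL requireJarUrl(URL url) {
    if (!"jar".equalsIgnoreCase(url.getProtocol())) {
      throw new IllegalArgumentException(
          "This method can only be called with classpath based resources, "
              + "for file system resources use Path. Networking is not supported.");
    }
    return url;
  }
}
```

   Comparing only the protocol string avoids calling `URL.equals()` or 
`URL.hashCode()`, which may perform host name resolution.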





[GitHub] [lucene] mocobeta commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


mocobeta commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118816930

   Ah sorry for my vague comment, I know the difference between URI and URL; 
and I understand there is no problem in using URL here.
   
   I think we can ship this with 9.2? As an engineer who belongs to a company 
(the scale of it is very different though), I think I understand Mike's 
position and argument.





[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118841012

   Sure, I would proceed with this plan and merge to 9.x and main. If we have 
better ideas (like proposed earlier) for main we can change it there.





[GitHub] [lucene-solr] risdenk commented on pull request #2651: SOLR-16110 Using Schema/Config API breaks the File-Upload of Config Set File

2022-05-05 Thread GitBox


risdenk commented on PR #2651:
URL: https://github.com/apache/lucene-solr/pull/2651#issuecomment-1118841603

   Superseded by https://github.com/apache/solr/pull/831





[GitHub] [lucene-solr] risdenk closed pull request #2651: SOLR-16110 Using Schema/Config API breaks the File-Upload of Config Set File

2022-05-05 Thread GitBox


risdenk closed pull request #2651: SOLR-16110 Using Schema/Config API breaks 
the File-Upload of Config Set File
URL: https://github.com/apache/lucene-solr/pull/2651





[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118843405

   I still don't know if we can fix the deprecated ctors to handle the 
CLASSPATH resource name correctly. In my original LUCENE-10335 change it was 
still working (we had a test for it), but it seems to have been broken after 
the change to `IOSupplier`.
   
   Let me look and try to reintroduce the Lucene 9.0 and previous behaviour of 
the deprecated ctor. I am out of office now, maybe later this evening!





[GitHub] [lucene] uschindler commented on pull request #868: LUCENE-10558: Implement URL ctor to support classpath/module usage in Kuromoji and Nori dictionaries

2022-05-05 Thread GitBox


uschindler commented on PR #868:
URL: https://github.com/apache/lucene/pull/868#issuecomment-1118848400

   I think I can fix the old ctor so it works again.





[GitHub] [lucene] mocobeta commented on pull request #540: LUCENE-10312: Add PersianStemmer

2022-05-05 Thread GitBox


mocobeta commented on PR #540:
URL: https://github.com/apache/lucene/pull/540#issuecomment-1118861297

   I'm sorry for the late response. I just kicked the CI - I'll take a look. 





[jira] [Commented] (LUCENE-10502) Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532447#comment-17532447
 ] 

ASF subversion and git services commented on LUCENE-10502:
--

Commit b3867da5443f58c554a3fd8391d3c98e0b2b7790 in lucene's branch 
refs/heads/vectors-disi-direct from Lu Xugang
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=b3867da5443 ]

LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader 
to handle ordToDoc (#792)

* LUCENE-10502: Use IndexedDISI to store docIds and 
DirectMonotonicWriter/Reader to handle ordToDoc

* TestBackwardsCompatibility was temporarily removed for skipping test

* dense case and empty case do not need to store ordToMap mapping

* fix

* invert if condition

* add subclass of dense sparse and empty

* spotless

* remove the ord variable

* move `getOffHeapVectorValues` to `OffHeapVectorValues` class as a static 
method and rename it as `load`

* move the getAcceptOrds method to OffHeapVectorValues

* move OffHeapVectorValues to its own class

* keep OffHeapVectorValues and all its subclasses in one place

* make `OffHeapVectorValues`'s subclasses private except 
`DenseOffHeapVectorValues`

* add some short comments

> Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle 
> ordToDoc 
> 
>
> Key: LUCENE-10502
> URL: https://issues.apache.org/jira/browse/LUCENE-10502
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 9.1
>Reporter: Lu Xugang
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> Since, at search time, the doc IDs of the vectors of all fields are fully 
> loaded into memory, could we use IndexedDISI to store docIds and 
> DirectMonotonicWriter/Reader to handle the ordToDoc mapping?
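
The dense, sparse and empty cases mentioned in the commit message above can be illustrated with a minimal sketch. It is not Lucene's implementation: `OrdToDocSketch`, `OrdToDoc`, `Empty`, `Dense`, `Sparse` and `create` are hypothetical names, a plain `int[]` stands in for whatever on-disk structure the real code uses, and "dense" is assumed to mean that every document in the segment has a vector. Under that assumption the ordinal equals the doc ID, so only the sparse case needs to store a mapping.

```java
import java.util.Arrays;

// Hypothetical sketch: these names are illustrative and are not Lucene's classes.
final class OrdToDocSketch {

  /** Maps a vector ordinal (0..size-1) to the ID of the document that owns the vector. */
  interface OrdToDoc {
    int size();

    int docId(int ord);
  }

  /** Empty case: no document has a vector, so nothing is stored. */
  static final class Empty implements OrdToDoc {
    public int size() {
      return 0;
    }

    public int docId(int ord) {
      throw new IndexOutOfBoundsException("no vectors");
    }
  }

  /** Dense case (assumed: every document has a vector): ord == docId, nothing is stored. */
  static final class Dense implements OrdToDoc {
    private final int size;

    Dense(int size) {
      this.size = size;
    }

    public int size() {
      return size;
    }

    public int docId(int ord) {
      if (ord < 0 || ord >= size) {
        throw new IndexOutOfBoundsException("ord=" + ord);
      }
      return ord;
    }
  }

  /** Sparse case: an explicit, strictly increasing ord -> docId table is required. */
  static final class Sparse implements OrdToDoc {
    private final int[] docIds;

    Sparse(int[] sortedDocIds) {
      this.docIds = Arrays.copyOf(sortedDocIds, sortedDocIds.length);
    }

    public int size() {
      return docIds.length;
    }

    public int docId(int ord) {
      return docIds[ord];
    }
  }

  /**
   * Picks a representation. Assumes sortedDocIds is strictly increasing and every
   * value lies in [0, maxDoc), so length == maxDoc implies the IDs are 0..maxDoc-1.
   */
  static OrdToDoc create(int[] sortedDocIds, int maxDoc) {
    if (sortedDocIds.length == 0) {
      return new Empty();
    }
    if (sortedDocIds.length == maxDoc) {
      return new Dense(maxDoc);
    }
    return new Sparse(sortedDocIds);
  }

  public static void main(String[] args) {
    OrdToDoc sparse = create(new int[] {1, 4}, 6);
    System.out.println("ord 1 -> doc " + sparse.docId(1));
  }
}
```

Because the sparse table is monotonically increasing, it is the kind of sequence a monotonic encoder can compress well, which is presumably why the issue proposes DirectMonotonicWriter/Reader for it.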



--
This message was sent by Atlassian Jira
(v8.20.7#820007)




[GitHub] [lucene] mayya-sharipova commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-05 Thread GitBox


mayya-sharipova commented on PR #792:
URL: https://github.com/apache/lucene/pull/792#issuecomment-1118864081

   @LuXugang Thanks, feel free to create a follow-up format PR against 
`apache:vectors-disi-direct`





[GitHub] [lucene] mayya-sharipova merged pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-05 Thread GitBox


mayya-sharipova merged PR #792:
URL: https://github.com/apache/lucene/pull/792




