[GitHub] [lucene] dweiss merged pull request #209: LUCENE-10021: Upgrade HPPC to 0.9.0.

2021-07-13 Thread GitBox


dweiss merged pull request #209:
URL: https://github.com/apache/lucene/pull/209


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10021) Upgrade HPPC to 0.9.0

2021-07-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379864#comment-17379864
 ] 

ASF subversion and git services commented on LUCENE-10021:
--

Commit caa822ff38ab1b1e48b930aff28d5bd18c6eea93 in lucene's branch 
refs/heads/main from Patrick Zhai
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=caa822f ]

LUCENE-10021: Upgrade HPPC to 0.9.0. Replace usage of ...ScatterMap to 
...HashMap (#209)



> Upgrade HPPC to 0.9.0
> -
>
> Key: LUCENE-10021
> URL: https://issues.apache.org/jira/browse/LUCENE-10021
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Haoyu Zhai
>Priority: Trivial
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HPPC 0.9.0 is out and we should probably upgrade.
> {{...ScatterMap}} was deprecated in 0.9.0 and I think we're still using 
> it in a few places, so we should probably measure the performance impact, if 
> there is any. (According to the [release 
> notes|https://github.com/carrotsearch/hppc/releases] there shouldn't be any.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Resolved] (LUCENE-10021) Upgrade HPPC to 0.9.0

2021-07-13 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-10021.
--
Fix Version/s: main (9.0)
   Resolution: Fixed




[GitHub] [lucene] msokolov commented on pull request #207: LUCENE-9855: Rename nn search vector format

2021-07-13 Thread GitBox


msokolov commented on pull request #207:
URL: https://github.com/apache/lucene/pull/207#issuecomment-879141625


   Re: `VectorValues`; I think we changed it to avoid possible confusion w/term 
vectors, but perhaps we are agreed that it is distinguished enough already. I'm 
fine keeping as is. re: Nn vs Knn :shrug: 





[jira] [Commented] (LUCENE-10023) Multi-token post-analysis DocValues

2021-07-13 Thread Michael Gibney (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380092#comment-17380092
 ] 

Michael Gibney commented on LUCENE-10023:
-

{quote}this sentence is a bit misleading and means that we don't support this 
aggregation on fields that enable only doc values, ie. we require the field to 
be indexed to have access to term frequencies.
{quote}

Ah, ok! For "significant_terms" it looks like the "subset" (foreground set) 
count is calculated via docValues API, but the field must be indexed in order 
to calculate "superset" (background set) count, via one of:
# accessing static doc freq (for terms with no backgroundFilter), or
# calculating the intersection of backgroundFilter with each candidate bucket 
value (either via FilterableTermsEnum or BooleanQuery).

In any case, iiuc this approach is problematic for "full text" mainly because 
"full text" fields tend to be high-cardinality. Put another way: 
"significant_terms" over a hypothetical "full text" field with post-analysis 
DocValues enabled would be no less performant than over a DocValues-enabled 
keyword field of equivalent cardinality (or perhaps _slightly_ less performant 
due to higher mean per-term docFreq). This is not a revolutionary observation 
... but it's relevant because an entirely DocValues-driven method of 
calculating "relatedness"/"significant_terms" (as is the case now in Solr) 
should scale well enough wrt field cardinality that full-domain 
"significant_terms" would become viable over "full text" fields. In this 
context, there is a practical reason to prefer multi-token post-analysis 
DocValues for "full text" fields, as opposed to a restricted-domain, 
term-vectors-based approach.

I'm mainly mentioning this because I agree that in the _absence_ of a purely 
DocValues-driven approach to calculating "relatedness"/"significant_terms", the 
practical argument in favor of multi-token post-analysis DocValues for 
"significant_terms" over full text would indeed be weak; so it's worth noting 
that such a purely DocValues-driven approach has in fact been implemented.
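
A minimal sketch of the foreground ("subset") and background ("superset") counting discussed above, assuming plain Java collections stand in for a per-document, DocValues-like term lookup. The class name, method names, and the simple frequency ratio used as a score are hypothetical and are not the Elasticsearch or Solr implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only: per-document term sets stand in for DocValues. */
public class SignificantTermsSketch {

  /** Counts, for each term, how many of the given documents contain it. */
  static Map<String, Integer> docCounts(List<Set<String>> docs) {
    Map<String, Integer> counts = new HashMap<>();
    for (Set<String> terms : docs) {
      for (String term : terms) {
        counts.merge(term, 1, Integer::sum);
      }
    }
    return counts;
  }

  /**
   * Ratio of the term's foreground frequency to its background frequency.
   * Values above 1.0 mean the term is over-represented in the foreground set.
   * This simple ratio is an assumption; real scoring heuristics differ.
   */
  static double score(int fgCount, int fgSize, int bgCount, int bgSize) {
    if (fgSize == 0 || bgSize == 0 || bgCount == 0) {
      return 0.0;
    }
    double fgFreq = (double) fgCount / fgSize;
    double bgFreq = (double) bgCount / bgSize;
    return fgFreq / bgFreq;
  }

  public static void main(String[] args) {
    List<Set<String>> background = List.of(
        Set.of("whale", "sea"), Set.of("sea", "ship"),
        Set.of("ship", "port"), Set.of("whale", "harpoon"));
    // Foreground: the subset of documents matching some query.
    List<Set<String>> foreground = List.of(background.get(0), background.get(3));

    Map<String, Integer> fg = docCounts(foreground);
    Map<String, Integer> bg = docCounts(background);
    for (Map.Entry<String, Integer> e : fg.entrySet()) {
      double s = score(e.getValue(), foreground.size(), bg.get(e.getKey()), background.size());
      System.out.println(e.getKey() + " -> " + s);
    }
  }
}
```

Both counts here come from the same forward (per-document) structure, which is the point of the purely DocValues-driven approach: no indexed doc freq or filter intersection is consulted for the background count.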

> Multi-token post-analysis DocValues
> ---
>
> Key: LUCENE-10023
> URL: https://issues.apache.org/jira/browse/LUCENE-10023
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael Gibney
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The single-token case for post-analysis DocValues is accounted for by 
> {{Analyzer.normalize(...)}} (and formerly {{MultiTermAwareComponent}}); but 
> there are cases where it would be desirable to have post-analysis DocValues 
> based on multi-token fields.
> The main use cases that I can think of are variants of faceting/terms 
> aggregation. I understand that this could be viewed as "trappy" for the naive 
> "Moby Dick word cloud" case; but:
> # I think this can be supported fairly cleanly in Lucene
> # Explicit user configuration of this option would help prevent people 
> shooting themselves in the foot
> # The current situation is arguably "trappy" as well; it just offloads the 
> trappiness onto Lucene-external workarounds for systems/users that want to 
> support this kind of behavior
> # Integrating this functionality directly in Lucene would afford consistency 
> guarantees that present opportunities for future optimizations (e.g., shared 
> Terms dictionary between indexed terms and DocValues).
> This issue proposes adding support for multi-token post-analysis DocValues 
> directly to {{IndexingChain}}. The initial proposal involves extending the 
> API to include {{IndexableFieldType.tokenDocValuesType()}} (in addition to 
> existing {{IndexableFieldType.docValuesType()}}).






[jira] [Commented] (LUCENE-9177) ICUNormalizer2CharFilter worst case is very slow

2021-07-13 Thread Michael Gibney (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380108#comment-17380108
 ] 

Michael Gibney commented on LUCENE-9177:


[~jim.ferenczi], [~rcmuir]: wondering if either of you have had a chance to 
look at the [associated PR|https://github.com/apache/lucene/pull/199]? It's a 
pretty manageable-sized change, and I think it directly addresses the concern 
raised in this issue. (fwiw, I beasted the {{TestICUNormalizer2CharFilter}} 
suite several hundred times and encountered no problems).

> ICUNormalizer2CharFilter worst case is very slow
> 
>
> Key: LUCENE-9177
> URL: https://issues.apache.org/jira/browse/LUCENE-9177
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
> Attachments: LUCENE-9177-benchmark-test.patch, 
> LUCENE-9177_LUCENE-8972.patch, lucene.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ICUNormalizer2CharFilter is fast most of the time, but we've had some reports 
> in Elasticsearch that some unrealistic data can slow down the process very 
> significantly. For instance, an input that consists of characters to normalize 
> with no normalization-inert character in between can take up to several 
> seconds to process a few hundred kilobytes on my machine. While the input 
> is not realistic, this worst case can slow down indexing considerably when 
> dealing with uncleaned data.
> I attached a small test that reproduces the slow processing using a stream 
> that contains many repetitions of the character `℃` and no 
> normalization-inert character. I am not surprised that the processing is 
> slower than usual, but several seconds to process seems a lot. Adding 
> normalization-inert characters makes the processing a lot faster, so I 
> wonder if we can improve the process to split the input more eagerly?
>  
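
The attached test is not reproduced in this thread. As a rough illustration of the input shape it describes, the sketch below builds a long run of `℃` (U+2103) and normalizes it. It assumes the JDK's `java.text.Normalizer` with NFKC as a stand-in; it does not use `ICUNormalizer2CharFilter`, so it shows what the input looks like rather than reproducing the slowdown. The class and method names are hypothetical.

```java
import java.text.Normalizer;

/** Illustration only: uses the JDK normalizer as a stand-in for ICU. */
public class WorstCaseInputSketch {

  /** Builds a string of {@code count} copies of U+2103 (DEGREE CELSIUS). */
  static String repeatedDegreeCelsius(int count) {
    return "\u2103".repeat(count);
  }

  /** Normalizes the whole input in one call using NFKC. */
  static String normalize(String input) {
    return Normalizer.normalize(input, Normalizer.Form.NFKC);
  }

  public static void main(String[] args) {
    String input = repeatedDegreeCelsius(100_000);
    long start = System.nanoTime();
    String output = normalize(input);
    long micros = (System.nanoTime() - start) / 1_000;
    // Each U+2103 expands to two chars (U+00B0 followed by 'C').
    System.out.println(input.length() + " chars in, " + output.length()
        + " chars out, " + micros + " us");
  }
}
```

Every character in this input needs normalizing and none of them is normalization-inert, which is the condition the report identifies as the worst case.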






[jira] [Commented] (LUCENE-9177) ICUNormalizer2CharFilter worst case is very slow

2021-07-13 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380111#comment-17380111
 ] 

Robert Muir commented on LUCENE-9177:
-

i haven't had a chance to test it out, I didn't have any plan yet given that 
the randomized test is completely disabled: 
https://github.com/apache/lucene/blob/main/lucene/analysis/icu/src/test/org/apache/lucene/analysis/icu/TestICUNormalizer2CharFilter.java#L226-L227




[jira] [Commented] (LUCENE-9177) ICUNormalizer2CharFilter worst case is very slow

2021-07-13 Thread Michael Gibney (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380123#comment-17380123
 ] 

Michael Gibney commented on LUCENE-9177:


Interesting; some of the randomized tests were still enabled, but I confess I 
was relying on existing, enabled tests to catch regressions and had not 
considered the disabled test you pointed out. That said, I just re-enabled that 
test locally and am beasting without encountering any problems -- neither on 
current main branch, nor with the patch for LUCENE-9177 applied. I wonder 
whether LUCENE-5595 might have been fixed incidentally by some more general fix 
to CharFilter offset correction?




[jira] [Commented] (LUCENE-9177) ICUNormalizer2CharFilter worst case is very slow

2021-07-13 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380178#comment-17380178
 ] 

Robert Muir commented on LUCENE-9177:
-

it may be the case. it shouldn't hold up your change really, sorry i've just 
been busy. I need to study the issue, but it sounds like the previous 
implementation did incremental normalization inefficiently, and the PR fixes 
this? there are more "safepoints" than just inert characters.




[jira] [Commented] (LUCENE-9177) ICUNormalizer2CharFilter worst case is very slow

2021-07-13 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380238#comment-17380238
 ] 

Robert Muir commented on LUCENE-9177:
-

I re-enabled this test on top of your branch. I am beasting it (inefficiently: 
bash script) with this test re-enabled:
{noformat}
#!/usr/bin/env bash
set -ex
while true; do
  ./gradlew -p lucene/analysis/icu -Dtests.nightly=true -Dtest.multiplier=10 test
done
{noformat}

I'll give it a little time to run. I'm not sure if we should re-enable the test 
for this issue. Nobody ever debugged to the bottom of why it failed. In the 
past we have found bugs in ICU with our random tests... an ICU upgrade may have 
fixed the issue (or dodged it via changes to unicode).

At the same time this component is super-hairy and needs some serious testing :)

Honestly, I didn't understand this charfilter's logic before, but I will try 
to review this PR. For sure, we shouldn't be looking for inert 
characters.




[jira] [Commented] (LUCENE-5595) TestICUNormalizer2CharFilter test failure

2021-07-13 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380239#comment-17380239
 ] 

Robert Muir commented on LUCENE-5595:
-

I'd like to re-enable this test. I will open a PR. If jenkins gives us a new 
seed, we can re-open it and try to drill down.

> TestICUNormalizer2CharFilter test failure
> -
>
> Key: LUCENE-5595
> URL: https://issues.apache.org/jira/browse/LUCENE-5595
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>Priority: Major
>
> Seems it does the offsets differently with a spoonfed reader.
> seed for 4.x:
>  ant test  -Dtestcase=TestICUNormalizer2CharFilter 
> -Dtests.method=testRandomStrings -Dtests.seed=19423CE8988D3E11 
> -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=en 
> -Dtests.timezone=America/Bahia_Banderas -Dtests.file.encoding=UTF-8






[jira] [Updated] (LUCENE-9177) ICUNormalizer2CharFilter worst case is very slow

2021-07-13 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-9177:

Fix Version/s: 8.10
   main (9.0)




[GitHub] [lucene] rmuir merged pull request #199: LUCENE-9177: ICUNormalizer2CharFilter streaming no longer depends on presence of normalization-inert characters

2021-07-13 Thread GitBox


rmuir merged pull request #199:
URL: https://github.com/apache/lucene/pull/199


   





[jira] [Commented] (LUCENE-9177) ICUNormalizer2CharFilter worst case is very slow

2021-07-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380241#comment-17380241
 ] 

ASF subversion and git services commented on LUCENE-9177:
-

Commit c3482c99ffd9b30acb423e63760ebc7baab9dd26 in lucene's branch 
refs/heads/main from Michael Gibney
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c3482c9 ]

LUCENE-9177: ICUNormalizer2CharFilter streaming no longer depends on presence 
of normalization-inert characters (#199)

Normalization-inert characters need not be required as boundaries
for incremental processing. It is sufficient to check `hasBoundaryAfter`
and `hasBoundaryBefore`, substantially improving worst-case performance.
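
The idea in the commit message can be sketched roughly as follows. In ICU4J, `hasBoundaryBefore` and `hasBoundaryAfter` are methods on `Normalizer2`; the sketch below does not use ICU and is not the code from the PR. It assumes the JDK's `java.text.Normalizer` (NFKC) and a hypothetical, deliberately crude predicate in place of the real boundary check, only to show how input can be normalized chunk by chunk without waiting for an inert character.

```java
import java.text.Normalizer;
import java.util.function.IntPredicate;

/** Hypothetical sketch: incremental normalization split at boundary code points. */
public class BoundaryChunkingSketch {

  /**
   * Crude stand-in for a "has boundary before" check: treats every code point
   * that is not a combining mark as a safe place to start a new chunk.
   * This is NOT equivalent to ICU's check (for example, it is wrong for
   * conjoining Hangul jamo) and is only good enough for the demo input.
   */
  static final IntPredicate NOT_A_MARK = cp -> {
    int type = Character.getType(cp);
    return type != Character.NON_SPACING_MARK
        && type != Character.COMBINING_SPACING_MARK
        && type != Character.ENCLOSING_MARK;
  };

  /** Normalizes input chunk by chunk, starting a new chunk at each boundary. */
  static String normalizeIncrementally(String input, IntPredicate hasBoundaryBefore) {
    StringBuilder out = new StringBuilder();
    int chunkStart = 0;
    int i = 0;
    while (i < input.length()) {
      int cp = input.codePointAt(i);
      if (i > chunkStart && hasBoundaryBefore.test(cp)) {
        // Everything before this code point can be normalized independently.
        out.append(Normalizer.normalize(input.substring(chunkStart, i), Normalizer.Form.NFKC));
        chunkStart = i;
      }
      i += Character.charCount(cp);
    }
    out.append(Normalizer.normalize(input.substring(chunkStart), Normalizer.Form.NFKC));
    return out.toString();
  }

  public static void main(String[] args) {
    // Three U+2103 characters followed by 'e' + combining acute accent.
    String input = "\u2103\u2103\u2103e\u0301";
    String chunked = normalizeIncrementally(input, NOT_A_MARK);
    String whole = Normalizer.normalize(input, Normalizer.Form.NFKC);
    System.out.println(chunked.equals(whole));
  }
}
```

With a boundary check like this, a run of `℃` characters is flushed one character at a time instead of accumulating until an inert character appears, while the combining accent stays attached to the `e` before it.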




[jira] [Commented] (LUCENE-9177) ICUNormalizer2CharFilter worst case is very slow

2021-07-13 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380243#comment-17380243
 ] 

Robert Muir commented on LUCENE-9177:
-

Thanks [~mgibney] a lot for taking care of this! I'm backporting this fix to 
8.10 due to the performance trap (doing some more testing first). 

For the LUCENE-5595 test, let's discuss that over there. I will open a PR.




[jira] [Commented] (LUCENE-9177) ICUNormalizer2CharFilter worst case is very slow

2021-07-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380248#comment-17380248
 ] 

ASF subversion and git services commented on LUCENE-9177:
-

Commit 4c95d3ef597dd12bbcfa0153f516539fca0a8e69 in lucene-solr's branch 
refs/heads/branch_8x from Michael Gibney
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4c95d3e ]

LUCENE-9177: ICUNormalizer2CharFilter streaming no longer depends on presence 
of normalization-inert characters (#199)

Normalization-inert characters need not be required as boundaries
for incremental processing. It is sufficient to check `hasBoundaryAfter`
and `hasBoundaryBefore`, substantially improving worst-case performance.





[jira] [Resolved] (LUCENE-9177) ICUNormalizer2CharFilter worst case is very slow

2021-07-13 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-9177.
-
Resolution: Fixed




[jira] [Commented] (LUCENE-9177) ICUNormalizer2CharFilter worst case is very slow

2021-07-13 Thread Michael Gibney (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380254#comment-17380254
 ] 

Michael Gibney commented on LUCENE-9177:


Thanks [~rcmuir]!

> ICUNormalizer2CharFilter worst case is very slow
> 
>
> Key: LUCENE-9177
> URL: https://issues.apache.org/jira/browse/LUCENE-9177
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
> Fix For: main (9.0), 8.10
>
> Attachments: LUCENE-9177-benchmark-test.patch, 
> LUCENE-9177_LUCENE-8972.patch, lucene.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ICUNormalizer2CharFilter is fast most of the time, but we've had some reports 
> in Elasticsearch that some unrealistic data can slow down the process very 
> significantly. For instance, an input that consists of characters to normalize 
> with no normalization-inert character in between can take up to several 
> seconds to process a few hundred kilobytes on my machine. While the input 
> is not realistic, this worst case can slow down indexing considerably when 
> dealing with uncleaned data.
> I attached a small test that reproduces the slow processing using a stream 
> that contains many repetitions of the character `℃` and no 
> normalization-inert character. I am not surprised that the processing is 
> slower than usual, but several seconds to process seems like a lot. Adding 
> normalization-inert characters makes the processing much faster, so I 
> wonder if we can improve the process to split the input more eagerly?
>  






[jira] [Commented] (LUCENE-5595) TestICUNormalizer2CharFilter test failure

2021-07-13 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380256#comment-17380256
 ] 

Robert Muir commented on LUCENE-5595:
-

One bogus thing about the existing test is that it tries to do stuff with 
{{Normalizer2.getInstance(null, "nfkc", Normalizer2.Mode.DECOMPOSE)}}.
I'm surprised it doesn't get an IAE for this; it makes no sense.

Also, it's not great to test different modes all in the same method anyway.
I am looking into splitting this into NFC, NFKC, NFKC_CF, NFD, and NFKD tests.
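
As a rough illustration of the one-check-per-form idea, the sketch below runs a separate check for each normalization form, so a failure names the form that tripped. It is not the Lucene test: it uses the JDK's `java.text.Normalizer` rather than ICU's `Normalizer2`, it omits NFKC_CF because the JDK has no equivalent, and the class, method names, and sample input are hypothetical.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

public class PerFormChecks {
  // U+00C5 (precomposed A with ring above) followed by U+FB01 (the "fi" ligature).
  static final String INPUT = "\u00C5\uFB01";

  static String normalize(Form form) {
    return Normalizer.normalize(INPUT, form);
  }

  // One method per form, so a failure points at a single form.
  static void checkNFC()  { expect(Form.NFC,  "\u00C5\uFB01"); }  // already composed, ligature kept
  static void checkNFD()  { expect(Form.NFD,  "A\u030A\uFB01"); } // canonical decomposition only
  static void checkNFKC() { expect(Form.NFKC, "\u00C5fi"); }      // ligature expanded, A-ring composed
  static void checkNFKD() { expect(Form.NFKD, "A\u030Afi"); }     // ligature expanded, A-ring decomposed

  static void expect(Form form, String expected) {
    String actual = normalize(form);
    if (!actual.equals(expected)) {
      throw new AssertionError(form + ": unexpected result " + actual);
    }
  }

  public static void main(String[] args) {
    checkNFC();
    checkNFD();
    checkNFKC();
    checkNFKD();
    System.out.println("all forms behaved as expected");
  }
}
```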

> TestICUNormalizer2CharFilter test failure
> -
>
> Key: LUCENE-5595
> URL: https://issues.apache.org/jira/browse/LUCENE-5595
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>Priority: Major
>
> Seems it does the offsets differently with a spoonfed reader.
> seed for 4.x:
>  ant test  -Dtestcase=TestICUNormalizer2CharFilter 
> -Dtests.method=testRandomStrings -Dtests.seed=19423CE8988D3E11 
> -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=en 
> -Dtests.timezone=America/Bahia_Banderas -Dtests.file.encoding=UTF-8






[jira] [Commented] (LUCENE-5595) TestICUNormalizer2CharFilter test failure

2021-07-13 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380259#comment-17380259
 ] 

Robert Muir commented on LUCENE-5595:
-

Sorry, the previous NFKD test is fine. I misread it as NFKC+decompose. 
Anyway, this is one more argument for splitting the testing into separate 
methods, so that if Jenkins trips, we might have hints as to the problem. Still 
testing locally, and then I'll make a PR.

> TestICUNormalizer2CharFilter test failure
> -
>
> Key: LUCENE-5595
> URL: https://issues.apache.org/jira/browse/LUCENE-5595
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>Priority: Major
>
> Seems it does the offsets differently with a spoonfed reader.
> seed for 4.x:
>  ant test  -Dtestcase=TestICUNormalizer2CharFilter 
> -Dtests.method=testRandomStrings -Dtests.seed=19423CE8988D3E11 
> -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=en 
> -Dtests.timezone=America/Bahia_Banderas -Dtests.file.encoding=UTF-8






[GitHub] [lucene] rmuir opened a new pull request #211: LUCENE-5595: re-enable TestICUNormalizer2CharFilter random test, splitting by mode

2021-07-13 Thread GitBox


rmuir opened a new pull request #211:
URL: https://github.com/apache/lucene/pull/211


   Re-enable the randomized testing here, but with a separate test for each
   mode rather than all in one method. It gives better testing and also
   easier-to-debug testing.





[GitHub] [lucene] rmuir commented on pull request #211: LUCENE-5595: re-enable TestICUNormalizer2CharFilter random test, splitting by mode

2021-07-13 Thread GitBox


rmuir commented on pull request #211:
URL: https://github.com/apache/lucene/pull/211#issuecomment-879535535


   cc: @magibney 
   
   This is the basic random test that we've had disabled for years. Honestly, 
the original bugs could have been in ICU itself; I'm not sure. Maybe the new 
tests will fail! But I think it is much better for us to enable it on the `main` 
branch with the new Gradle build, with tests corresponding to different 
normalization modes. Maybe we stand a better chance of fixing any failures this way.





[GitHub] [lucene] magibney commented on pull request #211: LUCENE-5595: re-enable TestICUNormalizer2CharFilter random test, splitting by mode

2021-07-13 Thread GitBox


magibney commented on pull request #211:
URL: https://github.com/apache/lucene/pull/211#issuecomment-879538390


   LGTM; makes sense to re-enable and add separate tests for different 
normalization forms.





[GitHub] [lucene] rmuir commented on pull request #211: LUCENE-5595: re-enable TestICUNormalizer2CharFilter random test, splitting by mode

2021-07-13 Thread GitBox


rmuir commented on pull request #211:
URL: https://github.com/apache/lucene/pull/211#issuecomment-879540243


   I'm running my inefficient beasting script: a shell script loop, Gradle 
daemons disabled, all lucene/analysis/icu tests with nightly and multiplier. 
I'll let it run for a while before we try Jenkins; I don't want to just make 
builds flaky.





[GitHub] [lucene] rmuir commented on pull request #211: LUCENE-5595: re-enable TestICUNormalizer2CharFilter random test, splitting by mode

2021-07-13 Thread GitBox


rmuir commented on pull request #211:
URL: https://github.com/apache/lucene/pull/211#issuecomment-879551134


   100 successful runs in beasting with nightly and a 10x multiplier: I think 
we are OK. We can always open an issue if Jenkins trips.





[GitHub] [lucene] rmuir merged pull request #211: LUCENE-5595: re-enable TestICUNormalizer2CharFilter random test, splitting by mode

2021-07-13 Thread GitBox


rmuir merged pull request #211:
URL: https://github.com/apache/lucene/pull/211


   





[jira] [Commented] (LUCENE-5595) TestICUNormalizer2CharFilter test failure

2021-07-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380268#comment-17380268
 ] 

ASF subversion and git services commented on LUCENE-5595:
-

Commit 5cf142f972db9a658d768ba3eac42c29916545aa in lucene's branch 
refs/heads/main from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=5cf142f ]

LUCENE-5595: re-enable TestICUNormalizer2CharFilter random test, splitting by 
mode (#211)

Re-enable the randomized testing here, but with a separate test for each
mode rather than all in one method. It gives better testing and also
easier-to-debug testing.

> TestICUNormalizer2CharFilter test failure
> -
>
> Key: LUCENE-5595
> URL: https://issues.apache.org/jira/browse/LUCENE-5595
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Seems it does the offsets differently with a spoonfed reader.
> seed for 4.x:
>  ant test  -Dtestcase=TestICUNormalizer2CharFilter 
> -Dtests.method=testRandomStrings -Dtests.seed=19423CE8988D3E11 
> -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=en 
> -Dtests.timezone=America/Bahia_Banderas -Dtests.file.encoding=UTF-8






[jira] [Resolved] (LUCENE-5595) TestICUNormalizer2CharFilter test failure

2021-07-13 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5595.
-
Fix Version/s: main (9.0)
   Resolution: Fixed

Marking as fixed. Actually, all we did was crank up the testing for this issue, 
but the underlying library has been upgraded a few times since the original 
issue was opened.

For now, random tests are enabled. If they trip, please open an issue.

> TestICUNormalizer2CharFilter test failure
> -
>
> Key: LUCENE-5595
> URL: https://issues.apache.org/jira/browse/LUCENE-5595
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>Priority: Major
> Fix For: main (9.0)
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Seems it does the offsets differently with a spoonfed reader.
> seed for 4.x:
>  ant test  -Dtestcase=TestICUNormalizer2CharFilter 
> -Dtests.method=testRandomStrings -Dtests.seed=19423CE8988D3E11 
> -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=en 
> -Dtests.timezone=America/Bahia_Banderas -Dtests.file.encoding=UTF-8






[jira] [Created] (LUCENE-10024) Catch NoSuchFileException when trying to open an index directory which does not exist

2021-07-13 Thread Michael Wechner (Jira)
Michael Wechner created LUCENE-10024:


    Summary: Catch NoSuchFileException when trying to open an index
             directory which does not exist
        Key: LUCENE-10024
        URL: https://issues.apache.org/jira/browse/LUCENE-10024
    Project: Lucene - Core
 Issue Type: Improvement
 Components: luke
   Reporter: Michael Wechner


When trying to open an index, one can select previously opened index directories 
from the "Index Path" dropdown (dialog: "Choose index directory path").

If such a previously opened index directory path has been deleted, but one 
selects it from the dropdown, then the error message should state that this 
directory does not exist.
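
A minimal sketch of the requested behaviour, using only `java.nio.file`. It is not Luke's actual code: the class name, the `tryOpen` helper, and the message wording are hypothetical, and listing the directory merely stands in for whatever Luke does when it opens an index.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OpenIndexSketch {
  /** Tries to list the directory and returns a user-facing status message. */
  static String tryOpen(Path indexPath) {
    try (DirectoryStream<Path> files = Files.newDirectoryStream(indexPath)) {
      int count = 0;
      for (Path ignored : files) {
        count++;
      }
      return "Opened " + indexPath + " (" + count + " entries)";
    } catch (NoSuchFileException e) {
      // The specific case from the issue: the path was deleted after it was remembered.
      return "Index directory does not exist: " + indexPath;
    } catch (IOException e) {
      // Any other I/O problem keeps a generic message.
      return "Failed to open index directory " + indexPath + ": " + e;
    }
  }

  public static void main(String[] args) {
    Path missing = Paths.get(System.getProperty("java.io.tmpdir"), "no-such-index-" + System.nanoTime());
    System.out.println(tryOpen(missing));
  }
}
```

Catching `NoSuchFileException` before the broader `IOException` is what lets the dialog show a specific message for the deleted-directory case.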






[jira] [Updated] (LUCENE-10024) Catch NoSuchFileException when trying to open an index directory which does not exist

2021-07-13 Thread Michael Wechner (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Wechner updated LUCENE-10024:
-
Description: 
When trying to open an index, one can select previously opened index directories 
from the "Index Path" dropdown (dialog: "Choose index directory path").

If such a previously opened index directory path has been deleted in the 
meantime, but one selects it from the dropdown, then the error message should 
state that this directory does not exist.

As an alternative, Luke might be able to check the existence of the previously 
opened index directories before displaying them in the dropdown.

  was:
When trying to open an index one can select from the dropdown "Index Path" 
(Dialog: "Choose index directory path") previously opened index directories.

If such a previously opened index directory path has been deleted, but one 
selects it from the dropdown, then the error message should tell that this 
directory does not exist.


> Catch NoSuchFileException when trying to open an index directory which does 
> not exist
> -
>
> Key: LUCENE-10024
> URL: https://issues.apache.org/jira/browse/LUCENE-10024
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: luke
>Reporter: Michael Wechner
>Priority: Minor
>
> When trying to open an index, one can select previously opened index directories 
> from the "Index Path" dropdown (dialog: "Choose index directory path").
> If such a previously opened index directory path has been deleted in the 
> meantime, but one selects it from the dropdown, then the error message should 
> state that this directory does not exist.
> As an alternative, Luke might be able to check the existence of the previously 
> opened index directories before displaying them in the dropdown.
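
The alternative mentioned at the end of the description could look roughly like the sketch below. It is an assumption-laden illustration, not Luke's code: the class name, the `existingOnly` helper, and the representation of the history as a list of path strings are all hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class IndexHistoryFilter {
  /** Keeps only the remembered paths that still point at an existing directory. */
  static List<String> existingOnly(List<String> history) {
    return history.stream()
        .filter(p -> Files.isDirectory(Paths.get(p)))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    String tmp = System.getProperty("java.io.tmpdir");
    // One path that should exist and one that should not.
    List<String> history = Arrays.asList(tmp, Paths.get(tmp, "deleted-index-" + System.nanoTime()).toString());
    System.out.println(existingOnly(history));
  }
}
```

A check like this is inherently racy, since a directory can still disappear between populating the dropdown and opening the index, so a clear error message for the missing-directory case remains useful either way.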






[jira] [Updated] (LUCENE-10024) Catch NoSuchFileException when trying to open an index directory which does not exist

2021-07-13 Thread Michael Wechner (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Wechner updated LUCENE-10024:
-
Attachment: proposed-patch.txt

> Catch NoSuchFileException when trying to open an index directory which does 
> not exist
> -
>
> Key: LUCENE-10024
> URL: https://issues.apache.org/jira/browse/LUCENE-10024
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: luke
>Reporter: Michael Wechner
>Priority: Minor
> Attachments: proposed-patch.txt
>
>
> When trying to open an index, one can select previously opened index directories 
> from the "Index Path" dropdown (dialog: "Choose index directory path").
> If such a previously opened index directory path has been deleted in the 
> meantime, but one selects it from the dropdown, then the error message should 
> state that this directory does not exist.
> As an alternative, Luke might be able to check the existence of the previously 
> opened index directories before displaying them in the dropdown.


