[GitHub] [lucene] uschindler commented on a diff in pull request #817: improve spotless error to suggest running 'gradlew tidy'

2022-04-19 Thread GitBox


uschindler commented on code in PR #817:
URL: https://github.com/apache/lucene/pull/817#discussion_r852697178


##
gradle/validation/spotless.gradle:
##
@@ -111,3 +111,9 @@ configure(project(":lucene").subprojects) { prj ->
 v.dependsOn ":checkJdkInternalsExportedToGradle"
   }
 }
+
+gradle.taskGraph.afterTask { Task task, TaskState state ->
+  if (task.name == 'spotlessJavaCheck' && state.failure) {
+    throw new GradleException("\n***\n*PLEASE RUN ./gradle tidy!*\n***");

Review Comment:
   "gradlew tidy", with "w".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10520) HTMLStripCharFilter fails on '>' or '<' characters in attribute values

2022-04-19 Thread Alex Alishevskikh (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Alishevskikh updated LUCENE-10520:
---
Description: 
If HTML input contains attributes with '<' or '>' characters in their values, 
HTMLStripCharFilter produces unexpected results.

See the attached unit test for example.

These characters are valid in attribute values, per the [HTML5 specification 
|https://html.spec.whatwg.org/#syntax-attribute-value]. The [W3C 
validator|https://validator.w3.org/nu/#textarea] does not have issues with the 
test HTML.  


 

  was:
If HTML input contains attributes with '<' or '>' characters in their values, 
HTMLCharStripFilter produces unexpected results.

See the attached unit test for example.

These characters are valid in attribute values, as by the [HTML5 specification 
|https://html.spec.whatwg.org/#syntax-attribute-value]. The [W3C 
validator|https://validator.w3.org/nu/#textarea] does not have issues with the 
test HTML.  


 

 Labels: HTMLStripCharFilter  (was: HTMLCharStripFilter)
Summary: HTMLStripCharFilter fails on '>' or '<' characters in 
attribute values   (was: HTMLCharStripFilter fails on '>' or '<' characters in 
attribute values )

> HTMLStripCharFilter fails on '>' or '<' characters in attribute values 
> ---
>
> Key: LUCENE-10520
> URL: https://issues.apache.org/jira/browse/LUCENE-10520
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Affects Versions: 9.1
>Reporter: Alex Alishevskikh
>Priority: Major
>  Labels: HTMLStripCharFilter
> Fix For: 9.1
>
> Attachments: HTMLStripCharFilterTest.java
>
>
> If HTML input contains attributes with '<' or '>' characters in their values, 
> HTMLStripCharFilter produces unexpected results.
> See the attached unit test for example.
> These characters are valid in attribute values, per the [HTML5 
> specification |https://html.spec.whatwg.org/#syntax-attribute-value]. The 
> [W3C validator|https://validator.w3.org/nu/#textarea] does not have issues 
> with the test HTML.  
>  
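
To make the reported failure mode concrete, here is a minimal Java sketch. It is an illustration under stated assumptions and not HTMLStripCharFilter's actual implementation: the class `TagStripSketch` and both methods are hypothetical. It contrasts a scanner that ends a tag at the first '>' with one that tracks quoted attribute values.

```java
// Hypothetical illustration only; this is NOT Lucene's HTMLStripCharFilter.
final class TagStripSketch {

  /** Naive: treats the first '>' after '<' as the end of the tag. */
  static String stripNaive(String html) {
    StringBuilder out = new StringBuilder();
    boolean inTag = false;
    for (int i = 0; i < html.length(); i++) {
      char c = html.charAt(i);
      if (!inTag && c == '<') {
        inTag = true;
      } else if (inTag && c == '>') {
        inTag = false;
      } else if (!inTag) {
        out.append(c);
      }
    }
    return out.toString();
  }

  /** Quote-aware: '<' and '>' inside a quoted attribute value do not end the tag. */
  static String stripQuoteAware(String html) {
    StringBuilder out = new StringBuilder();
    boolean inTag = false;
    char quote = 0; // 0 means "not inside a quoted attribute value"
    for (int i = 0; i < html.length(); i++) {
      char c = html.charAt(i);
      if (!inTag) {
        if (c == '<') {
          inTag = true;
        } else {
          out.append(c);
        }
      } else if (quote != 0) {
        if (c == quote) {
          quote = 0; // closing quote of the attribute value
        }
      } else if (c == '"' || c == '\'') {
        quote = c; // opening quote of an attribute value
      } else if (c == '>') {
        inTag = false;
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    String html = "<a title=\"x > y\">link</a> text";
    System.out.println(stripNaive(html));
    System.out.println(stripQuoteAware(html));
  }
}
```

With the sample input, the naive variant leaks part of the attribute value into the output, while the quote-aware variant yields only the text content.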



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] LuXugang commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-04-19 Thread GitBox


LuXugang commented on PR #792:
URL: https://github.com/apache/lucene/pull/792#issuecomment-1102297367

   Thanks @jtibshirani @mayya-sharipova. Indeed, only the dense case was covered in 
[luceneutil](https://github.com/mikemccand/luceneutil), so I wrote a 
[demo](https://github.com/LuXugang/Lucene-7.5.0/commit/b69ae6c70665878f95115a6a49715c84c760b4c6)
 to run a sparse-case test.
   
   vector source: 
   - 3 dimensions
   - 7 vectors within 100k documents in one segment
   - do `KnnVectorQuery`
   
   NumberOfDocumentsToFind | baseline(search)ms | candidate(search)ms
   -- | -- | --
   10 | 3 | 3
   1000 | 7 | 7
   1 | 24 | 30
   2 | 45 | 48
   5 | 108 | 117
   
   
   
   FORMAT | baseline(indexSize) | candidate(indexSize)
   -- | -- | --
   vec | 781K | 806K
   vem | 278K | 18K
   vex | 4.6M | 4.6M
   
   
   
   
   





[GitHub] [lucene] LuXugang commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-04-19 Thread GitBox


LuXugang commented on PR #792:
URL: https://github.com/apache/lucene/pull/792#issuecomment-1102336772

   Result of dense case in Luceneutil's benchmark by running `python 
src/python/localrun.py -source wikivector10k`:
   
   LowTermVector      1493.52  (9.1%)  1457.88 (11.5%)  -2.4% ( -21% -   20%) 0.468
   AndHighLowVector   1248.97  (9.3%)  1251.44  (9.0%)   0.2% ( -16% -   20%) 0.945
   MedTermVector      1407.52  (9.9%)  1414.02  (9.9%)   0.5% ( -17% -   22%) 0.883
   AndHighMedVector   1422.91 (11.6%)  1444.62  (9.6%)   1.5% ( -17% -   25%) 0.649
   AndHighHighVector  1441.78  (8.7%)  1468.59  (8.0%)   1.9% ( -13% -   20%) 0.480
   PKLookup             55.55 (22.3%)    56.87 (23.3%)   2.4% ( -35% -   61%) 0.741
   HighTermVector     1349.45 (11.1%)  1400.94  (9.7%)   3.8% ( -15% -   27%) 0.249
   





[jira] [Comment Edited] (LUCENE-10517) Improve performance of SortedSetDV faceting by iterating on class types

2022-04-19 Thread Chris Hegarty (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522838#comment-17522838
 ] 

Chris Hegarty edited comment on LUCENE-10517 at 4/19/22 10:38 AM:
--

With my M1 I get the following luceneutil benchmark results.

Hardware Overview:

Chip: Apple M1
Total Number of Cores: 8 (4 performance and 4 efficiency)
Memory: 16 GB
{code:java}
Task                      QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff              p-value
LowPhrase                       148.35  (2.1%)                   143.66  (2.6%)  -3.2% (  -7% -    1%) 0.000
MedIntervalsOrdered             197.27  (3.7%)                   191.24  (5.7%)  -3.1% ( -12% -    6%) 0.044
HighIntervalsOrdered             11.55  (2.6%)                    11.33  (3.5%)  -1.9% (  -7% -    4%) 0.055
AndHighMed                      447.74  (2.1%)                   441.26  (2.4%)  -1.4% (  -5% -    3%) 0.042
HighTerm                       2397.60  (4.0%)                  2367.10  (2.4%)  -1.3% (  -7% -    5%) 0.223
LowTerm                        3939.37  (2.7%)                  3890.14  (2.3%)  -1.2% (  -6% -    3%) 0.111
OrHighNotHigh                  1917.21  (2.8%)                  1893.94  (3.2%)  -1.2% (  -6% -    4%) 0.198
HighPhrase                       32.93  (1.9%)                    32.55  (1.1%)  -1.2% (  -4% -    1%) 0.022
PKLookup                        340.11  (4.5%)                   336.69  (4.3%)  -1.0% (  -9% -    8%) 0.471
TermDTSort                      145.39  (4.1%)                   144.09  (2.3%)  -0.9% (  -7% -    5%) 0.394
HighSpanNear                     10.38  (3.7%)                    10.32  (1.9%)  -0.6% (  -5% -    5%) 0.531
MedSpanNear                     206.69  (2.8%)                   205.70  (1.5%)  -0.5% (  -4% -    3%) 0.500
Fuzzy2                           91.75  (2.5%)                    91.41  (1.4%)  -0.4% (  -4% -    3%) 0.562
OrHighNotMed                   1975.22  (3.5%)                  1968.91  (2.7%)  -0.3% (  -6% -    6%) 0.744
OrHighMed                        66.62  (3.9%)                    66.45  (4.8%)  -0.3% (  -8% -    8%) 0.850
LowSloppyPhrase                  62.60  (2.1%)                    62.44  (2.5%)  -0.3% (  -4% -    4%) 0.726
OrHighNotLow                   1876.16  (2.5%)                  1871.56  (2.4%)  -0.2% (  -5% -    4%) 0.756
OrHighHigh                       55.70  (3.9%)                    55.64  (4.9%)  -0.1% (  -8% -    9%) 0.940
Fuzzy1                          100.97  (2.2%)                   100.88  (2.1%)  -0.1% (  -4% -    4%) 0.898
LowIntervalsOrdered              42.24  (0.7%)                    42.21  (1.0%)  -0.1% (  -1% -    1%) 0.766
MedPhrase                       923.85  (1.3%)                   923.14  (1.6%)  -0.1% (  -2% -    2%) 0.867
OrNotHighMed                   1427.45  (2.0%)                  1428.11  (2.5%)   0.0% (  -4% -    4%) 0.949
Respell                          82.74  (2.6%)                    82.81  (1.9%)   0.1% (  -4% -    4%) 0.903
LowSpanNear                     373.63  (2.6%)                   373.97  (1.6%)   0.1% (  -4% -    4%) 0.893
HighTermDayOfYearSort           199.64  (1.7%)                   199.83  (2.5%)   0.1% (  -4% -    4%) 0.887
OrNotHighHigh                  1523.02  (2.2%)                  1526.12  (2.0%)   0.2% (  -3% -    4%) 0.759
AndHighMedDayTaxoFacets         185.23  (0.9%)                   185.79  (1.4%)   0.3% (  -1% -    2%) 0.416
MedTerm                        3016.98  (3.4%)                  3026.53  (3.2%)   0.3% (  -6% -    7%) 0.761
OrNotHighLow                   1867.65  (2.5%)                  1876.63  (2.4%)   0.5% (  -4% -    5%) 0.535
AndHighLow                     1571.61  (3.1%)                  1579.86  (2.6%)   0.5% (  -5% -    6%) 0.564
OrHighLow                      1485.93  (3.7%)                  1494.56  (2.5%)   0.6% (  -5% -    7%) 0.559
AndHighHigh                      80.42  (2.8%)                    81.06  (1.7%)   0.8% (  -3% -    5%) 0.273
HighSloppyPhrase                 50.68  (4.0%)                    51.14  (4.7%)   0.9% (  -7% -    9%) 0.506
MedSloppyPhrase                  40.76  (2.6%)                    41.13  (3.6%)   0.9% (  -5% -    7%) 0.356
Wildcard                        123.13  (7.3%)                   124.34  (6.5%)   1.0% ( -11% -   15%) 0.654
AndHighHighDayTaxoFacets         17.77  (2.8%)                    17.95  (2.7%)   1.0% (  -4% -    6%) 0.256
MedTermDayTaxoFacets             46.83  (2.6%)                    47.38  (1.8%)   1.2% (  -3% -    5%) 0.097
HighTermMonthSort               193.35  (1.5%)                   195.77  (5.4%)   1.2% (  -5% -    8%) 0.320
IntNRQ                           69.13 (17.2%)                    70.81 (16.2%)   2.4% ( -26% -   43%) 0.646
            H

[GitHub] [lucene] rmuir commented on a diff in pull request #817: improve spotless error to suggest running 'gradlew tidy'

2022-04-19 Thread GitBox


rmuir commented on code in PR #817:
URL: https://github.com/apache/lucene/pull/817#discussion_r852889508


##
gradle/validation/spotless.gradle:
##
@@ -111,3 +111,9 @@ configure(project(":lucene").subprojects) { prj ->
 v.dependsOn ":checkJdkInternalsExportedToGradle"
   }
 }
+
+gradle.taskGraph.afterTask { Task task, TaskState state ->
+  if (task.name == 'spotlessJavaCheck' && state.failure) {
+    throw new GradleException("\n***\n*PLEASE RUN ./gradle tidy!*\n***");

Review Comment:
   OOPS






[jira] [Commented] (LUCENE-10521) Tests in windows are failing for the new testAlwaysRefreshDirectoryTaxonomyReader test

2022-04-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524255#comment-17524255
 ] 

ASF subversion and git services commented on LUCENE-10521:
--

Commit fb76d0b104ef843790848531cf14707e2059e079 in lucene's branch 
refs/heads/main from Michael McCandless
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=fb76d0b104e ]

LUCENE-10482, LUCENE-10521: hrmph, put the @Ignore in the right place


> Tests in windows are failing for the new 
> testAlwaysRefreshDirectoryTaxonomyReader test
> --
>
> Key: LUCENE-10521
> URL: https://issues.apache.org/jira/browse/LUCENE-10521
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/facet
> Environment: Windows 10
>Reporter: Gautam Worah
>Priority: Minor
>
> Build: [https://jenkins.thetaphi.de/job/Lucene-main-Windows/10725/] is 
> failing.
>  
> Specifically, the loop which checks if any files still remain to be deleted 
> is not ending.
> We have added an exception to the main test class to not run the test on 
> WindowsFS (not sure if this is related).
>  
> ```
> SEVERE: 1 thread leaked from SUITE scope at 
> org.apache.lucene.facet.taxonomy.directory.TestAlwaysRefreshDirectoryTaxonomyReader:
>  1) Thread[id=19, 
> name=TEST-TestAlwaysRefreshDirectoryTaxonomyReader.testAlwaysRefreshDirectoryTaxonomyReader-seed#[F46E42CB7F2B6959],
>  state=RUNNABLE, group=TGRP-TestAlwaysRefreshDirectoryTaxonomyReader] at 
> java.base@18/sun.nio.fs.WindowsNativeDispatcher.GetFileAttributesEx0(Native 
> Method) at 
> java.base@18/sun.nio.fs.WindowsNativeDispatcher.GetFileAttributesEx(WindowsNativeDispatcher.java:390)
>  at 
> java.base@18/sun.nio.fs.WindowsFileAttributes.get(WindowsFileAttributes.java:307)
>  at 
> java.base@18/sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:251)
>  at 
> java.base@18/sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105)
>  at 
> app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
>  at 
> app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
>  at 
> app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
>  at 
> app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
>  at 
> app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
>  at java.base@18/java.nio.file.Files.delete(Files.java:1152) at 
> app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.privateDeleteFile(FSDirectory.java:344)
>  at 
> app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.deletePendingFiles(FSDirectory.java:325)
>  at 
> app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.getPendingDeletions(FSDirectory.java:410)
>  at 
> app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FilterDirectory.getPendingDeletions(FilterDirectory.java:121)
>  at 
> app//org.apache.lucene.facet.taxonomy.directory.TestAlwaysRefreshDirectoryTaxonomyReader.testAlwaysRefreshDirectoryTaxonomyReader(TestAlwaysRefreshDirectoryTaxonomyReader.java:97)
> ```
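
The hang described above comes from a loop that polls for pending deletions and never sees them clear. One generic way to keep such a loop from hanging a test suite is to bound the number of polls. The following is a minimal sketch of that idea, not the code in the failing test: `BoundedWaitSketch` and `pollUntil` are hypothetical names, and the actual change referenced in this thread was to mark the test with `@Ignore`.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration; not part of Lucene's test framework.
final class BoundedWaitSketch {

  /**
   * Polls the condition up to maxAttempts times.
   * Returns true as soon as the condition holds, or false if it never did.
   */
  static boolean pollUntil(BooleanSupplier condition, int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      if (condition.getAsBoolean()) {
        return true;
      }
      Thread.onSpinWait(); // hint to the runtime that this is a busy-wait loop
    }
    return false; // caller decides whether to fail the test or skip it
  }
}
```

A caller could pass a condition such as "no pending deletions remain" and fail with a clear message when `pollUntil` returns false, instead of leaking a running thread.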






[jira] [Commented] (LUCENE-10482) Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide

2022-04-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524254#comment-17524254
 ] 

ASF subversion and git services commented on LUCENE-10482:
--

Commit fb76d0b104ef843790848531cf14707e2059e079 in lucene's branch 
refs/heads/main from Michael McCandless
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=fb76d0b104e ]

LUCENE-10482, LUCENE-10521: hrmph, put the @Ignore in the right place


> Allow users to create their own DirectoryTaxonomyReaders with empty 
> taxoArrays instead of letting the taxoEpoch decide
> --
>
> Key: LUCENE-10482
> URL: https://issues.apache.org/jira/browse/LUCENE-10482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 9.1
>Reporter: Gautam Worah
>Priority: Minor
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> I was experimenting with the taxonomy index and {{DirectoryTaxonomyReaders}} 
> in my day job where we were trying to replace the index underneath a reader 
> asynchronously and then call the {{doOpenIfChanged}} call on it.
> It turns out that the taxonomy index uses its own index based counter (the 
> {{{}taxonomyIndexEpoch{}}}) to determine if the index was opened in write 
> mode after the last time it was written and if not, it directly tries to 
> reuse the previous {{taxoArrays}} it had created. This logic fails in a 
> scenario where both the old and new index were opened just once but the index 
> itself is completely different in both the cases.
> In such a case, it would be good to give the user the flexibility to inform 
> the DTR to recreate its {{{}taxoArrays{}}}, {{ordinalCache}} and 
> {{{}categoryCache{}}} (not refreshing these arrays causes it to fail in 
> various ways). Luckily, such a constructor already exists! But it is private 
> today! The idea here is to allow subclasses of DTR to use this constructor.
> Curious to see what other folks think about this idea. 
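
The scenario in the description can be modelled with a small, self-contained Java sketch. This is a toy model under stated assumptions, not Lucene's `DirectoryTaxonomyReader`: the classes `ToyIndex`, `ToyTaxoReader` and `AlwaysRefreshToyReader` are hypothetical, and a plain list stands in for the `taxoArrays` and caches.

```java
import java.util.List;

// Toy model only; these are NOT Lucene's taxonomy classes.
final class ToyIndex {
  final long epoch;              // stands in for the taxonomy index epoch counter
  final List<String> categories; // stands in for the on-disk taxonomy

  ToyIndex(long epoch, List<String> categories) {
    this.epoch = epoch;
    this.categories = categories;
  }
}

class ToyTaxoReader {
  final long epoch;
  final List<String> cachedCategories; // stands in for taxoArrays and the caches

  ToyTaxoReader(ToyIndex index) {
    this(index, null);
  }

  // Analogue of the constructor the issue proposes to open up to subclasses:
  // passing null for the cache forces a rebuild from the index.
  protected ToyTaxoReader(ToyIndex index, List<String> reusableCache) {
    this.epoch = index.epoch;
    this.cachedCategories =
        reusableCache != null ? reusableCache : List.copyOf(index.categories);
  }

  /** Epoch-only check: reuses the old cache whenever the epochs match. */
  ToyTaxoReader reopen(ToyIndex current) {
    return new ToyTaxoReader(current, current.epoch == epoch ? cachedCategories : null);
  }
}

/** Subclass that always rebuilds its cache, regardless of the epoch. */
final class AlwaysRefreshToyReader extends ToyTaxoReader {
  AlwaysRefreshToyReader(ToyIndex index) {
    super(index, null);
  }

  @Override
  ToyTaxoReader reopen(ToyIndex current) {
    return new AlwaysRefreshToyReader(current);
  }
}
```

If two unrelated indexes both report the same epoch, `ToyTaxoReader.reopen` keeps serving the old categories, whereas the subclass rebuilds them from the replacement index.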






[jira] [Commented] (LUCENE-10482) Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide

2022-04-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524256#comment-17524256
 ] 

ASF subversion and git services commented on LUCENE-10482:
--

Commit 2fa3a36899f4560ffb593449d6778307aa232e35 in lucene's branch 
refs/heads/branch_9x from Michael McCandless
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=2fa3a36899f ]

LUCENE-10482, LUCENE-10521: hrmph, put the @Ignore in the right place








[jira] [Commented] (LUCENE-10521) Tests in windows are failing for the new testAlwaysRefreshDirectoryTaxonomyReader test

2022-04-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524257#comment-17524257
 ] 

ASF subversion and git services commented on LUCENE-10521:
--

Commit 2fa3a36899f4560ffb593449d6778307aa232e35 in lucene's branch 
refs/heads/branch_9x from Michael McCandless
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=2fa3a36899f ]

LUCENE-10482, LUCENE-10521: hrmph, put the @Ignore in the right place








[jira] [Commented] (LUCENE-10521) Tests in windows are failing for the new testAlwaysRefreshDirectoryTaxonomyReader test

2022-04-19 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524258#comment-17524258
 ] 

Michael McCandless commented on LUCENE-10521:
-

Phew!  This time I put the {{@Ignore}} in the right place, I think :)  I 
confirmed now when I run that one test case, it indeed says skipped.  Hopefully 
[~uschindler]'s awesome Windows Jenkins builds are OK again.  Sorry for all the 
flailing :)







[jira] [Commented] (LUCENE-10517) Improve performance of SortedSetDV faceting by iterating on class types

2022-04-19 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524262#comment-17524262
 ] 

Michael McCandless commented on LUCENE-10517:
-

This is a very impressive performance jump for the "pure browse" faceting case! 
 I'll review the PR soon.   Thanks [~ChrisHegarty]!

> Improve performance of SortedSetDV faceting by iterating on class types
> ---
>
> Key: LUCENE-10517
> URL: https://issues.apache.org/jira/browse/LUCENE-10517
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 9.1
>Reporter: Chris Hegarty
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> While analysing various profiles, [@grcevski|https://github.com/grcevski] and 
> I came across this potential improvement.
> SortedSetDV faceting (and friends), can improve performance within tight 
> loops by using invokevirtual (rather than invokeinterface). The C2 JIT 
> compiler can produce slightly more optimal code in this case, and since these 
> loops are very hot, the impact can be significant (in the order of 10-30%).
> This issue is in some ways similar to, and builds upon, prior optimisations 
> in this area, like say LUCENE-5300 or more recently LUCENE-5309
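
The "iterating on class types" idea in the description can be sketched as follows. This is an assumption-laden illustration, not the code in the PR: `OrdSource`, `ArrayOrdSource` and `CountingSketch` are hypothetical stand-ins for the doc-values types. The point is that the hot loop is written against a concrete class, so the call inside it compiles to invokevirtual rather than invokeinterface.

```java
// Hypothetical types for illustration; not Lucene's SortedSetDocValues API.
interface OrdSource {
  int NO_MORE = -1;

  /** Returns the next ordinal, or NO_MORE when exhausted. */
  int next();
}

final class ArrayOrdSource implements OrdSource {
  private final int[] ords;
  private int pos;

  ArrayOrdSource(int[] ords) {
    this.ords = ords;
  }

  @Override
  public int next() {
    return pos < ords.length ? ords[pos++] : NO_MORE;
  }
}

final class CountingSketch {

  /** Dispatches once on the concrete class, then runs a loop typed to that class. */
  static void count(OrdSource source, int[] counts) {
    if (source instanceof ArrayOrdSource) {
      countArray((ArrayOrdSource) source, counts);
    } else {
      countGeneric(source, counts);
    }
  }

  // Receiver has a class type: source.next() compiles to invokevirtual.
  private static void countArray(ArrayOrdSource source, int[] counts) {
    for (int ord = source.next(); ord != OrdSource.NO_MORE; ord = source.next()) {
      counts[ord]++;
    }
  }

  // Receiver has an interface type: source.next() compiles to invokeinterface.
  private static void countGeneric(OrdSource source, int[] counts) {
    for (int ord = source.next(); ord != OrdSource.NO_MORE; ord = source.next()) {
      counts[ord]++;
    }
  }
}
```

Both loops produce the same counts; whether the typed loop is measurably faster depends on the JIT and the workload, which is what the benchmarks in this thread are meant to establish.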






[GitHub] [lucene] ssigut commented on pull request #439: LUCENE-8739: custom codec providing Zstandard compression/decompression

2022-04-19 Thread GitBox


ssigut commented on PR #439:
URL: https://github.com/apache/lucene/pull/439#issuecomment-1102537548

   Is this PR going to be merged? What release is this planned for?





[GitHub] [lucene] rmuir opened a new pull request, #818: Fix incorrect docs in README.md: it must be java 17 exactly, java 18 does not work

2022-04-19 Thread GitBox


rmuir opened a new pull request, #818:
URL: https://github.com/apache/lucene/pull/818

   These instructions tell the user to install 17 (or greater), then run 
`./gradlew`. This will not actually work if they install something greater than 
java 17.
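
   As a rough illustration of the "exactly 17" requirement stated above, a check could look like the following sketch. The class name and constant are hypothetical and this is not how the Lucene build itself validates the JDK; it only uses the standard `Runtime.version().feature()` call.

```java
// Hypothetical sketch; the Lucene build performs its own JDK validation.
final class JavaVersionCheckSketch {
  // Assumption taken from the PR description: the build needs exactly Java 17.
  static final int REQUIRED_FEATURE = 17;

  /** Returns true only for the exact required feature release. */
  static boolean isSupported(int featureVersion) {
    return featureVersion == REQUIRED_FEATURE;
  }

  public static void main(String[] args) {
    int running = Runtime.version().feature(); // e.g. 17 for any 17.x.y runtime
    if (!isSupported(running)) {
      System.err.println("Java " + REQUIRED_FEATURE + " is required, but found Java " + running);
    } else {
      System.out.println("Running on a supported Java version: " + running);
    }
  }
}
```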





[jira] [Commented] (LUCENE-10517) Improve performance of SortedSetDV faceting by iterating on class types

2022-04-19 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524277#comment-17524277
 ] 

Adrien Grand commented on LUCENE-10517:
---

Very impressive indeed. This makes me wonder if {{DefaultBulkScorer#scoreAll}} 
would benefit from a similar change. This is the function that iterates over 
all the matches of a query to pass them to the collector.

> Improve performance of SortedSetDV faceting by iterating on class types
> ---
>
> Key: LUCENE-10517
> URL: https://issues.apache.org/jira/browse/LUCENE-10517
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 9.1
>Reporter: Chris Hegarty
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> While analysing various profiles, [@grcevski|https://github.com/grcevski] and 
> I came across this potential improvement.
> SortedSetDV faceting (and friends) can improve performance within tight 
> loops by using invokevirtual (rather than invokeinterface). The C2 JIT 
> compiler can produce slightly more optimal code in this case, and since these 
> loops are very hot, the impact can be significant (in the order of 10-30%).
> This issue is in some ways similar to, and builds upon, prior optimisations 
> in this area, such as LUCENE-5300 or, more recently, LUCENE-5309.
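
A minimal sketch of the idea described above, with hypothetical types (OrdIterator, ArrayOrdIterator and OrdCounter are illustrations, not Lucene classes): the hot loop is duplicated per concrete class so that the call site is typed against a final class rather than an interface. Whether this pays off in practice depends on the JIT and the workload; the 10-30% figure is the issue's claim, not something this sketch demonstrates.

```java
// Hypothetical types for illustration only; these are not Lucene classes.
interface OrdIterator {
  int NO_MORE_ORDS = -1;

  /** Returns the next ordinal, or NO_MORE_ORDS when exhausted. */
  int nextOrd();
}

final class ArrayOrdIterator implements OrdIterator {
  private final int[] ords;
  private int pos;

  ArrayOrdIterator(int[] ords) {
    this.ords = ords;
  }

  @Override
  public int nextOrd() {
    return pos < ords.length ? ords[pos++] : NO_MORE_ORDS;
  }
}

final class OrdCounter {
  /** Counts how often each ordinal in [0, numOrds) is produced by the iterator. */
  static int[] count(OrdIterator it, int numOrds) {
    int[] counts = new int[numOrds];
    if (it instanceof ArrayOrdIterator) {
      // The reference is typed as a final class, so javac emits invokevirtual
      // for nextOrd() in this loop.
      ArrayOrdIterator concrete = (ArrayOrdIterator) it;
      for (int ord = concrete.nextOrd();
          ord != OrdIterator.NO_MORE_ORDS;
          ord = concrete.nextOrd()) {
        counts[ord]++;
      }
    } else {
      // Generic fallback: the reference is typed as the interface, so javac
      // emits invokeinterface for nextOrd() here.
      for (int ord = it.nextOrd(); ord != OrdIterator.NO_MORE_ORDS; ord = it.nextOrd()) {
        counts[ord]++;
      }
    }
    return counts;
  }
}
```

Both branches compute the same result; only the static type at the call site differs, which is the property the issue relies on.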






[GitHub] [lucene] jpountz commented on pull request #439: LUCENE-8739: custom codec providing Zstandard compression/decompression

2022-04-19 Thread GitBox


jpountz commented on PR #439:
URL: https://github.com/apache/lucene/pull/439#issuecomment-1102557832

   See the discussion on 
[LUCENE-8739](https://issues.apache.org/jira/browse/LUCENE-8739); this PR is 
unlikely to get merged.





[GitHub] [lucene] rmuir opened a new pull request, #819: fail clearly on too-new JDK

2022-04-19 Thread GitBox


rmuir opened a new pull request, #819:
URL: https://github.com/apache/lucene/pull/819

   Gradle gives a very confusing error here; let's make it absolutely clear.
   ![Screen_Shot_2022-04-19_at_08 28 
26](https://user-images.githubusercontent.com/504194/164003748-e0b26827-c38d-4cf8-9e91-a48dfa9d5928.png)
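
   A minimal sketch of this kind of fail-fast check in plain Java, assuming the build requires exactly Java 17 as stated in PR #818. The class and method names are hypothetical, and the actual change in this PR lives in the Gradle build scripts rather than in code like this.

```java
// Hypothetical helper, not part of the Lucene build: reports a clear error
// when the running JDK's feature version differs from the expected one.
final class JdkVersionCheck {
  // Assumption taken from PR #818: the build needs exactly this version.
  static final int EXPECTED_FEATURE_VERSION = 17;

  /** Returns null if the version is acceptable, otherwise a readable error message. */
  static String check(int actualFeatureVersion) {
    if (actualFeatureVersion == EXPECTED_FEATURE_VERSION) {
      return null;
    }
    return "Unsupported JDK "
        + actualFeatureVersion
        + ": this build requires exactly JDK "
        + EXPECTED_FEATURE_VERSION
        + ".";
  }

  public static void main(String[] args) {
    // Runtime.version().feature() is available since Java 10.
    String error = check(Runtime.version().feature());
    if (error != null) {
      throw new IllegalStateException(error);
    }
    System.out.println("JDK version OK");
  }
}
```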
   





[GitHub] [lucene] rmuir commented on pull request #818: Fix incorrect docs in README.md: it must be java 17 exactly, java 18 does not work

2022-04-19 Thread GitBox


rmuir commented on PR #818:
URL: https://github.com/apache/lucene/pull/818#issuecomment-1102585523

   > I wish we could fix `./gradlew` to detect you are using an unsupported JDK 
version and say so (exit with error/exception with a clear message).
   
   https://github.com/apache/lucene/pull/819





[GitHub] [lucene] mocobeta commented on pull request #808: LUCENE-10513: Run `gradlew tidy` first

2022-04-19 Thread GitBox


mocobeta commented on PR #808:
URL: https://github.com/apache/lucene/pull/808#issuecomment-1102598934

   Oh, I just learned that Wikipedia has an article on Shoshin (初心). 
It's a common noun in Japanese (and also in Chinese I think) so there are 
no corresponding articles in those languages; I can't really explain it, but 
it's interesting to me that the word is capitalized like a proper noun...





[GitHub] [lucene] cpoerschke opened a new pull request, #820: Remove outdated comment in UnifiedHighlighter.get(Formatter|Scorer) javadoc.

2022-04-19 Thread GitBox


cpoerschke opened a new pull request, #820:
URL: https://github.com/apache/lucene/pull/820

   No JIRA ticket required for this change, in my opinion.





[GitHub] [lucene] mikemccand commented on pull request #808: LUCENE-10513: Run `gradlew tidy` first

2022-04-19 Thread GitBox


mikemccand commented on PR #808:
URL: https://github.com/apache/lucene/pull/808#issuecomment-1102642347

   > Oh, I just learned that Wikipedia has an article on Shoshin (初心). 
It's a common noun in Japanese (and also in Chinese I think) so there are 
no corresponding articles in those languages; I can't really explain it, but 
it's interesting to me that the word is capitalized like a proper noun...
   
   Oh thanks for explaining @mocobeta!  I have been capitalizing it ever since 
I learned it but I will try to stop.  Maybe I can just use 初心 going forwards.  
Thanks!
   
   shoshin (初心) also reminds me of this helpful graph (from WaitButWhy's [The 
Thinking Ladder](https://waitbutwhy.com/2019/09/thinking-ladder.html)):
   
   
![wbw-conviction-knowledge](https://user-images.githubusercontent.com/796508/164011667-0a4d86cf-d933-4695-8af1-d0420062ee5e.png)
   
   





[GitHub] [lucene] jpountz merged pull request #794: LUCENE-10153: Improve accuracy of scaled scores in WANDScorer.

2022-04-19 Thread GitBox


jpountz merged PR #794:
URL: https://github.com/apache/lucene/pull/794





[jira] [Commented] (LUCENE-10153) More speedups for operations on byte[] via VarHandles

2022-04-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524302#comment-17524302
 ] 

ASF subversion and git services commented on LUCENE-10153:
--

Commit d9e37f31230f595dce668e86a2f151d3aa4c4176 in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d9e37f31230 ]

LUCENE-10153: Improve accuracy of scaled scores in WANDScorer. (#794)



> More speedups for operations on byte[] via VarHandles
> -
>
> Key: LUCENE-10153
> URL: https://issues.apache.org/jira/browse/LUCENE-10153
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 9.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> LUCENE-10145 leveraged VarHandles to speed up unsigned comparisons of byte[4] 
> or byte[8]. But we could do more, such as speeding up the computation of 
> common prefix lengths.
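
The two operations mentioned in the description above can be sketched as follows for byte[8] inputs. This is an illustration under stated assumptions, not the code from LUCENE-10145 or LUCENE-10153: the class and method names are hypothetical, and only `MethodHandles.byteArrayViewVarHandle`, `Long.compareUnsigned` and `Long.numberOfLeadingZeros` are real JDK calls.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only.
final class VarHandleBytes {
  // Views 8 consecutive bytes of a byte[] as one big-endian long.
  private static final VarHandle BE_LONG =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.BIG_ENDIAN);

  /** Unsigned lexicographic comparison of two byte[8] values with a single long compare. */
  static int compareUnsigned8(byte[] a, byte[] b) {
    long x = (long) BE_LONG.get(a, 0);
    long y = (long) BE_LONG.get(b, 0);
    return Long.compareUnsigned(x, y);
  }

  /** Number of leading bytes (0..8) that two byte[8] values have in common. */
  static int commonPrefixLength8(byte[] a, byte[] b) {
    long x = (long) BE_LONG.get(a, 0);
    long y = (long) BE_LONG.get(b, 0);
    // With a big-endian view, the first differing byte holds the highest set
    // bit of the XOR; numberOfLeadingZeros(0) is 64, which yields 8.
    return Long.numberOfLeadingZeros(x ^ y) >>> 3;
  }
}
```

The big-endian view matters: it makes numeric order on the long agree with byte-by-byte lexicographic order on the array.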



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-10523) facilitate UnifiedHighlighter extension w.r.t. FieldHighlighter

2022-04-19 Thread Christine Poerschke (Jira)
Christine Poerschke created LUCENE-10523:


 Summary: facilitate UnifiedHighlighter extension w.r.t. 
FieldHighlighter
 Key: LUCENE-10523
 URL: https://issues.apache.org/jira/browse/LUCENE-10523
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Christine Poerschke
Assignee: Christine Poerschke


If the {{UnifiedHighlighter}} had a protected {{newFieldHighlighter}} method 
then less {{getFieldHighlighter}} code would need to be duplicated if one 
wanted to use a custom {{FieldHighlighter}}.

Proposed change: pull-request-link-to-follow

A possible usage scenario:
 * e.g. via Solr's {{HTMLStripFieldUpdateProcessorFactory}} any HTML markup 
could be stripped at document ingestion time but this may not suit all use cases
 * e.g. via Solr's {{hl.encoder=html}} parameter any HTML markup could be 
escaped at document search time when returning highlighting snippets but this 
may not suit all use cases
 * extension illustration: link-to-follow
 ** i.e. at document search time remove any HTML markup prior to highlight 
snippet extraction
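
The proposed extension point can be sketched with simplified stand-in classes. All class names and signatures below are hypothetical and do not match the real UnifiedHighlighter or FieldHighlighter API; the sketch only shows the shape of a protected factory method that lets a subclass swap the highlighter without copying the surrounding setup code.

```java
// Simplified stand-ins for illustration; not the real Lucene classes or signatures.
class SketchFieldHighlighter {
  final String field;

  SketchFieldHighlighter(String field) {
    this.field = field;
  }

  String highlight(String content) {
    return field + ": " + content;
  }
}

class SketchUnifiedHighlighter {
  // Shared setup stays in the base class, so subclasses need not duplicate it.
  final SketchFieldHighlighter getFieldHighlighter(String field) {
    String normalized = field.trim();
    return newFieldHighlighter(normalized);
  }

  // The extension point: subclasses override only the construction step.
  protected SketchFieldHighlighter newFieldHighlighter(String field) {
    return new SketchFieldHighlighter(field);
  }
}

class HtmlStrippingHighlighter extends SketchUnifiedHighlighter {
  @Override
  protected SketchFieldHighlighter newFieldHighlighter(String field) {
    return new SketchFieldHighlighter(field) {
      @Override
      String highlight(String content) {
        // Naive tag removal for illustration only; this is not a real HTML parser.
        return super.highlight(content.replaceAll("<[^>]*>", ""));
      }
    };
  }
}
```

With this shape, `new HtmlStrippingHighlighter().getFieldHighlighter(" body ")` reuses the base-class setup and only the construction step differs.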






[GitHub] [lucene] cpoerschke opened a new pull request, #821: LUCENE-10523: factor out UnifiedHighlighter.newFieldHighlighter() method

2022-04-19 Thread GitBox


cpoerschke opened a new pull request, #821:
URL: https://github.com/apache/lucene/pull/821

   https://issues.apache.org/jira/browse/LUCENE-10523





[jira] [Commented] (LUCENE-10153) More speedups for operations on byte[] via VarHandles

2022-04-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524304#comment-17524304
 ] 

ASF subversion and git services commented on LUCENE-10153:
--

Commit db0e712cad3a23e58a66c7c3aa1a9a8b0e217823 in lucene's branch 
refs/heads/branch_9x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=db0e712cad3 ]

LUCENE-10153: Improve accuracy of scaled scores in WANDScorer. (#794)



> More speedups for operations on byte[] via VarHandles
> -
>
> Key: LUCENE-10153
> URL: https://issues.apache.org/jira/browse/LUCENE-10153
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 9.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> LUCENE-10145 leveraged VarHandles to speed up unsigned comparisons of byte[4] 
> or byte[8]. But we could do more, such as speeding up the computation of 
> common prefix lengths.






[jira] [Updated] (LUCENE-10523) facilitate UnifiedHighlighter extension w.r.t. FieldHighlighter

2022-04-19 Thread Christine Poerschke (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated LUCENE-10523:
-
Description: 
If the {{UnifiedHighlighter}} had a protected {{newFieldHighlighter}} method 
then less {{getFieldHighlighter}} code would need to be duplicated if one 
wanted to use a custom {{FieldHighlighter}}.

Proposed change: https://github.com/apache/lucene/pull/821

A possible usage scenario:
 * e.g. via Solr's {{HTMLStripFieldUpdateProcessorFactory}} any HTML markup 
could be stripped at document ingestion time but this may not suit all use cases
 * e.g. via Solr's {{hl.encoder=html}} parameter any HTML markup could be 
escaped at document search time when returning highlighting snippets but this 
may not suit all use cases
 * extension illustration: https://github.com/apache/solr/pull/811
 ** i.e. at document search time remove any HTML markup prior to highlight 
snippet extraction

  was:
If the {{UnifiedHighlighter}} had a protected {{newFieldHighlighter}} method 
then less {{getFieldHighlighter}} code would need to be duplicated if one 
wanted to use a custom {{FieldHighlighter}}.

Proposed change: pull-request-link-to-follow

A possible usage scenario:
 * e.g. via Solr's {{HTMLStripFieldUpdateProcessorFactory}} any HTML markup 
could be stripped at document ingestion time but this may not suit all use cases
 * e.g. via Solr's {{hl.encoder=html}} parameter any HTML markup could be 
escaped at document search time when returning highlighting snippets but this 
may not suit all use cases
 * extension illustration: link-to-follow
 ** i.e. at document search time remove any HTML markup prior to highlight 
snippet extraction


> facilitate UnifiedHighlighter extension w.r.t. FieldHighlighter
> ---
>
> Key: LUCENE-10523
> URL: https://issues.apache.org/jira/browse/LUCENE-10523
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> If the {{UnifiedHighlighter}} had a protected {{newFieldHighlighter}} method 
> then less {{getFieldHighlighter}} code would need to be duplicated if one 
> wanted to use a custom {{FieldHighlighter}}.
> Proposed change: https://github.com/apache/lucene/pull/821
> A possible usage scenario:
>  * e.g. via Solr's {{HTMLStripFieldUpdateProcessorFactory}} any HTML markup 
> could be stripped at document ingestion time but this may not suit all use 
> cases
>  * e.g. via Solr's {{hl.encoder=html}} parameter any HTML markup could be 
> escaped at document search time when returning highlighting snippets but this 
> may not suit all use cases
>  * extension illustration: https://github.com/apache/solr/pull/811
>  ** i.e. at document search time remove any HTML markup prior to highlight 
> snippet extraction






[GitHub] [lucene] jpountz merged pull request #799: LUCENE-10506: change visibility of ProfilerCollector#deriveCollectorName to protected

2022-04-19 Thread GitBox


jpountz merged PR #799:
URL: https://github.com/apache/lucene/pull/799





[jira] [Commented] (LUCENE-10506) ProfilerCollector to support customizing how name is derived

2022-04-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524310#comment-17524310
 ] 

ASF subversion and git services commented on LUCENE-10506:
--

Commit 972663cc1de4c273df99e3ed9dcf7a5c0d44065a in lucene's branch 
refs/heads/branch_9x from Luca Cavanna
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=972663cc1de ]

LUCENE-10506: change visibility of ProfilerCollector#deriveCollectorName to 
protected (#799)

This allows subclasses to extend how the inner collector name is derived.

> ProfilerCollector to support customizing how name is derived
> 
>
> Key: LUCENE-10506
> URL: https://issues.apache.org/jira/browse/LUCENE-10506
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/sandbox
>Reporter: Luca Cavanna
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ProfilerCollector (part of the sandbox) has a private method called 
> deriveCollectorName that extracts the class simple name from the provided 
> collector and sets it as the name of the collector which becomes part of the 
> profile results later.
> While the default behaviour is reasonable, there are cases where it would be 
> useful to extend this logic, and perhaps not use class names, or enhance that 
> with more context that the collectors could provide. This could be achieved 
> by making the deriveCollectorName method protected.
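
A rough illustration of what the visibility change enables, using a simplified stand-in class. SketchProfilerCollector and ContextualProfilerCollector are hypothetical and do not reflect the real sandbox ProfilerCollector constructor or fields; only the method name deriveCollectorName comes from the issue.

```java
// Simplified stand-in for illustration; not the sandbox ProfilerCollector API.
class SketchProfilerCollector {
  private final String name;

  SketchProfilerCollector(Object collector) {
    // Called from the constructor, so overrides must not rely on subclass fields.
    this.name = deriveCollectorName(collector);
  }

  // Default behaviour described in the issue: use the class simple name.
  protected String deriveCollectorName(Object collector) {
    return collector.getClass().getSimpleName();
  }

  String getName() {
    return name;
  }
}

// Hypothetical subclass that enriches the default name with extra context.
class ContextualProfilerCollector extends SketchProfilerCollector {
  ContextualProfilerCollector(Object collector) {
    super(collector);
  }

  @Override
  protected String deriveCollectorName(Object collector) {
    return super.deriveCollectorName(collector) + "[" + collector + "]";
  }
}
```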






[jira] [Commented] (LUCENE-10506) ProfilerCollector to support customizing how name is derived

2022-04-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524309#comment-17524309
 ] 

ASF subversion and git services commented on LUCENE-10506:
--

Commit 866bb86a1c97590a4f42934afe05a78f66f10c92 in lucene's branch 
refs/heads/main from Luca Cavanna
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=866bb86a1c9 ]

LUCENE-10506: change visibility of ProfilerCollector#deriveCollectorName to 
protected (#799)

This allows subclasses to extend how the inner collector name is derived.

> ProfilerCollector to support customizing how name is derived
> 
>
> Key: LUCENE-10506
> URL: https://issues.apache.org/jira/browse/LUCENE-10506
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/sandbox
>Reporter: Luca Cavanna
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ProfilerCollector (part of the sandbox) has a private method called 
> deriveCollectorName that extracts the class simple name from the provided 
> collector and sets it as the name of the collector which becomes part of the 
> profile results later.
> While the default behaviour is reasonable, there are cases where it would be 
> useful to extend this logic, and perhaps not use class names, or enhance that 
> with more context that the collectors could provide. This could be achieved 
> by making the deriveCollectorName method protected.






[jira] [Resolved] (LUCENE-10506) ProfilerCollector to support customizing how name is derived

2022-04-19 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-10506.
---
Fix Version/s: 9.2
   Resolution: Fixed

> ProfilerCollector to support customizing how name is derived
> 
>
> Key: LUCENE-10506
> URL: https://issues.apache.org/jira/browse/LUCENE-10506
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/sandbox
>Reporter: Luca Cavanna
>Priority: Minor
> Fix For: 9.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ProfilerCollector (part of the sandbox) has a private method called 
> deriveCollectorName that extracts the class simple name from the provided 
> collector and sets it as the name of the collector which becomes part of the 
> profile results later.
> While the default behaviour is reasonable, there are cases where it would be 
> useful to extend this logic, and perhaps not use class names, or enhance that 
> with more context that the collectors could provide. This could be achieved 
> by making the deriveCollectorName method protected.






[jira] [Resolved] (LUCENE-10503) Preserve more significant bits of scores in WANDScorer

2022-04-19 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-10503.
---
Fix Version/s: 9.2
   Resolution: Fixed

I mixed up the JIRA number in the commit message and the notification went to 
LUCENE-10153.

> Preserve more significant bits of scores in WANDScorer
> --
>
> Key: LUCENE-10503
> URL: https://issues.apache.org/jira/browse/LUCENE-10503
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 9.2
>
>
> WANDScorer operates on longs to avoid accuracy issues with floating-point 
> numbers. The current process loses more accuracy bits than it could, and 
> making it better could help skip in a few more situations.
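
To make the description above concrete, here is a minimal sketch of scaling float scores to longs with a power-of-two factor. It is not the WANDScorer implementation: the class, the method names and the `bits` parameter are assumptions, and it only handles positive, finite, normal floats. Upper bounds are rounded up and lower bounds rounded down so that comparisons on the scaled values stay conservative.

```java
// Hypothetical helper for illustration; not the actual WANDScorer code.
final class ScaledScores {
  /**
   * Picks an exponent so that maxScore * 2^factor lands in [2^(bits-1), 2^bits).
   * Assumes maxScore is a positive, finite, normal float.
   */
  static int scalingFactor(float maxScore, int bits) {
    // Math.getExponent returns e such that 2^e <= maxScore < 2^(e+1).
    return bits - 1 - Math.getExponent(maxScore);
  }

  /** Scales an upper bound, rounding up so it never under-estimates. */
  static long scaleUp(float score, int factor) {
    return (long) Math.ceil(Math.scalb((double) score, factor));
  }

  /** Scales a lower bound, rounding down so it never over-estimates. */
  static long scaleDown(float score, int factor) {
    return (long) Math.floor(Math.scalb((double) score, factor));
  }
}
```

The more bits the scaled value keeps, the smaller the gap between `scaleUp` and `scaleDown` for the same score, which is the accuracy the issue is concerned with.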






[GitHub] [lucene] mocobeta commented on pull request #808: LUCENE-10513: Run `gradlew tidy` first

2022-04-19 Thread GitBox


mocobeta commented on PR #808:
URL: https://github.com/apache/lucene/pull/808#issuecomment-1102669627

   I didn't know that figure; it's very simple and helpful, thanks! I'll read 
the article over the next holidays.
   
   >  I have been capitalizing it ever since I learned it but I will try to 
stop.
   
   It looks like the word Shoshin may already have a special or cultural 
meaning in English, so capitalization may be needed to express the nuance that 
the original Japanese word doesn't have? :)





[jira] [Commented] (LUCENE-10153) More speedups for operations on byte[] via VarHandles

2022-04-19 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524314#comment-17524314
 ] 

Adrien Grand commented on LUCENE-10153:
---

Sorry for the noise, I pushed a commit that had the wrong JIRA number attached 
to it.

> More speedups for operations on byte[] via VarHandles
> -
>
> Key: LUCENE-10153
> URL: https://issues.apache.org/jira/browse/LUCENE-10153
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 9.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> LUCENE-10145 leveraged VarHandles to speed up unsigned comparisons of byte[4] 
> or byte[8]. But we could do more, such as speeding up the computation of 
> common prefix lengths.






[jira] [Commented] (LUCENE-10503) Preserve more significant bits of scores in WANDScorer

2022-04-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524317#comment-17524317
 ] 

ASF subversion and git services commented on LUCENE-10503:
--

Commit 15ecf3c27f97a109e53f9bdcccb0db34c3a30379 in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=15ecf3c27f9 ]

LUCENE-10503: Fix JIRA number in CHANGES.


> Preserve more significant bits of scores in WANDScorer
> --
>
> Key: LUCENE-10503
> URL: https://issues.apache.org/jira/browse/LUCENE-10503
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 9.2
>
>
> WANDScorer operates on longs to avoid accuracy issues with floating-point 
> numbers. The current process loses more accuracy bits than it could, and 
> making it better could help skip in a few more situations.






[jira] [Commented] (LUCENE-10503) Preserve more significant bits of scores in WANDScorer

2022-04-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524316#comment-17524316
 ] 

ASF subversion and git services commented on LUCENE-10503:
--

Commit 241406123384a81c230dfb51b2225aa329823196 in lucene's branch 
refs/heads/branch_9x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=24140612338 ]

LUCENE-10503: Fix JIRA number in CHANGES.


> Preserve more significant bits of scores in WANDScorer
> --
>
> Key: LUCENE-10503
> URL: https://issues.apache.org/jira/browse/LUCENE-10503
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 9.2
>
>
> WANDScorer operates on longs to avoid accuracy issues with floating-point 
> numbers. The current process loses more accuracy bits than it could, and 
> making it better could help skip in a few more situations.






[jira] [Assigned] (LUCENE-9848) Correctly sort HNSW graph neighbors when applying diversity criterion

2022-04-19 Thread Mayya Sharipova (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayya Sharipova reassigned LUCENE-9848:
---

Assignee: Mayya Sharipova

> Correctly sort HNSW graph neighbors when applying diversity criterion 
> --
>
> Key: LUCENE-9848
> URL: https://issues.apache.org/jira/browse/LUCENE-9848
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Assignee: Mayya Sharipova
>Priority: Major
>
> When indexing a new document in an HNSW graph, we first find its nearest 
> maxConn neighbors (using HNSW search), and then link the new document to these 
> neighbors in the graph. These neighbors are filtered using a diversity test. 
> The neighbors are added one by one, from most similar to least. Each new 
> neighbor is checked against all prior (better) neighbors, and if it is more 
> similar to that neighbor than it is to the target document, it is rejected as 
> insufficiently diverse.
> When we applied this diversity criterion (rather than simply picking the k 
> nearest neighbors), we saw substantial improvements in recall / latency ROC 
> curves across several data sets, and it is part of the reference 
> implementation, too (where we got it). I believe the impact on indexing 
> performance was relatively small; this is a good thing to do. Even though it 
> is n^2 at its heart, n remains reasonable because it is bounded by the 
> maximum graph fanout parameter, {{maxConn}}. 
> Something funny happens when we reach the maximum fanout though. While a new 
> document is being linked to its new neighbors, the neighbors are reciprocally 
> linked to the new document, until their maximum fanout is reached. At that 
> point, the diversity criterion is reapplied to select the neighbors to keep. 
> Basically every neighbor is re-checked against every earlier (better) 
> neighbor to verify the diversity criterion.  This is needed because we 
> haven't really maintained the diversity property while adding these 
> reciprocal links – the initial neighbors are checked for diversity, which 
> often leads to fewer than {{maxConn}} of them being added. Then the new 
> documents get linked in without checking, until {{maxConn}} is reached, and 
> then diversity is checked again. This is kind of weird, but seems to work.
> But the really strange thing is that when we reject non-diverse documents (in 
> HnswGraphBuilder.diversityUpdate), the neighbors are no longer sorted in 
> nearness order. I did some rough checks to see if better graphs would result 
> from re-sorting (so that when there are non-diverse neighbors, we always 
> prefer to drop the worse-scoring one), but it didn't seem to matter all that 
> much. But how can that be?
> At any rate this code is funky and hard to understand, and it would probably 
> benefit from a second look to see if we can either improve indexing 
> performance or improve search performance (by producing better graphs during 
> indexing).
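
The diversity test described above can be sketched as follows. This is a simplified illustration, not HnswGraphBuilder: the class and method names are hypothetical, similarity is assumed to be negative squared Euclidean distance, and candidates are plain float[] vectors rather than graph node ids.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not the Lucene HNSW implementation.
final class DiverseNeighbors {
  /** Similarity as negative squared Euclidean distance: larger means closer. */
  static float similarity(float[] a, float[] b) {
    float sum = 0;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return -sum;
  }

  /** Selects up to maxConn candidates, most similar first, skipping non-diverse ones. */
  static List<float[]> select(float[] target, List<float[]> candidates, int maxConn) {
    List<float[]> sorted = new ArrayList<>(candidates);
    // Sort from most similar to least similar to the target.
    sorted.sort(Comparator.comparingDouble((float[] c) -> -similarity(c, target)));

    List<float[]> selected = new ArrayList<>();
    for (float[] candidate : sorted) {
      if (selected.size() >= maxConn) {
        break;
      }
      float toTarget = similarity(candidate, target);
      boolean diverse = true;
      for (float[] neighbor : selected) {
        // Reject a candidate that is closer to an already selected (better)
        // neighbor than it is to the target.
        if (similarity(candidate, neighbor) > toTarget) {
          diverse = false;
          break;
        }
      }
      if (diverse) {
        selected.add(candidate);
      }
    }
    return selected;
  }
}
```

Each candidate is compared against at most maxConn previously selected neighbors, which is why the quadratic cost stays bounded by the fanout parameter.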






[jira] [Commented] (LUCENE-10518) FieldInfos consistency check can refuse to open Lucene 8 index

2022-04-19 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524408#comment-17524408
 ] 

Adrien Grand commented on LUCENE-10518:
---

I'm unsure of the value of the consistency checks on 8.x indices. My gut 
feeling is that either users have created indices with consistent fields until 
now, and they'll keep their indices consistent after 9.0, or they have created 
indices with inconsistent fields and this consistency check is making upgrades 
harder without helping much. I wonder if we should just disable the check on 
8.x indices based on the index created version?
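
A minimal sketch of that idea, assuming the major version that created the index is available as an int when the index is opened. The class, constant, and method names are hypothetical and only illustrate the gating logic, not how Lucene would implement it.

```java
/** Sketch only: gate the field-infos consistency check on the index created version. */
public class ConsistencyGateSketch {

  /** First major version that enforces consistent field infos while indexing (per this issue). */
  static final int FIRST_STRICT_MAJOR = 9;

  /** Returns true when the strict check should run for an index created by this major version. */
  static boolean shouldCheckConsistency(int indexCreatedVersionMajor) {
    return indexCreatedVersionMajor >= FIRST_STRICT_MAJOR;
  }
}
```

With this gate an index created on 8.x opens without the check, while an index created on 9.x keeps it.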

> FieldInfos consistency check can refuse to open Lucene 8 index
> --
>
> Key: LUCENE-10518
> URL: https://issues.apache.org/jira/browse/LUCENE-10518
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 8.10.1
> Reporter: Nhat Nguyen
> Priority: Major
>
> A field-infos consistency check introduced in Lucene 9 (LUCENE-9334) can 
> refuse to open a Lucene 8 index. Lucene 8 can create a partial FieldInfo if 
> hitting a non-aborting exception (for example [term is too 
> long|https://github.com/apache/lucene-solr/blob/6a6484ba396927727b16e5061384d3cd80d616b2/lucene/core/src/java/org/apache/lucene/index/DefaultIndexingChain.java#L944])
>  during processing fields of a document. We don't have this problem in Lucene 
> 9 as we process fields in two phases with the [first 
> phase|https://github.com/apache/lucene/blob/10ebc099c846c7d96f4ff5f9b7853df850fa8442/lucene/core/src/java/org/apache/lucene/index/IndexingChain.java#L589-L614]
>  processing only FieldInfos. 
> The issue can be reproduced with this snippet.
> {code:java}
> public void testWriteIndexOn8x() throws Exception {
>   FieldType KeywordField = new FieldType();
>   KeywordField.setTokenized(false);
>   KeywordField.setOmitNorms(true);
>   KeywordField.setIndexOptions(IndexOptions.DOCS);
>   KeywordField.freeze();
>   try (Directory dir = newDirectory()) {
>     IndexWriterConfig config = new IndexWriterConfig();
>     config.setCommitOnClose(false);
>     config.setMergePolicy(NoMergePolicy.INSTANCE);
>     try (IndexWriter writer = new IndexWriter(dir, config)) {
>       // first segment
>       writer.addDocument(new Document()); // an empty doc
>       Document d1 = new Document();
>       byte[] chars = new byte[IndexWriter.MAX_STORED_STRING_LENGTH + 1];
>       Arrays.fill(chars, (byte) 'a');
>       d1.add(new Field("field", new BytesRef(chars), KeywordField));
>       d1.add(new BinaryDocValuesField("field", new BytesRef(chars)));
>       // the over-long term fails with a non-aborting exception
>       expectThrows(IllegalArgumentException.class, () -> writer.addDocument(d1));
>       writer.flush();
>       // second segment
>       Document d2 = new Document();
>       d2.add(new Field("field", new BytesRef("hello world"), KeywordField));
>       d2.add(new SortedDocValuesField("field", new BytesRef("hello world")));
>       writer.addDocument(d2);
>       writer.flush();
>       writer.commit();
>       // Check for doc values types consistency
>       Map<String, DocValuesType> docValuesTypes = new HashMap<>();
>       try (DirectoryReader reader = DirectoryReader.open(dir)) {
>         for (LeafReaderContext leaf : reader.leaves()) {
>           for (FieldInfo fi : leaf.reader().getFieldInfos()) {
>             DocValuesType current = docValuesTypes.putIfAbsent(fi.name, fi.getDocValuesType());
>             if (current != null && current != fi.getDocValuesType()) {
>               fail("cannot change DocValues type from " + current + " to "
>                   + fi.getDocValuesType() + " for field \"" + fi.name + "\"");
>             }
>           }
>         }
>       }
>     }
>   }
> }
> {code}
> I would like to propose to:
> - Backport the two-phase fields processing from Lucene9 to Lucene8. The patch 
> should be small and contained.
> - Introduce an option in Lucene9 to skip checking field-infos consistency 
> (i.e., behave like Lucene 8 when the option is enabled).
> /cc [~mayya] and [~jpountz]






[jira] [Commented] (LUCENE-10518) FieldInfos consistency check can refuse to open Lucene 8 index

2022-04-19 Thread Nhat Nguyen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524409#comment-17524409
 ] 

Nhat Nguyen commented on LUCENE-10518:
--

+1 to disable consistency checks for 8.x indices. [~mayya] WDYT?




[jira] [Commented] (LUCENE-10521) Tests in windows are failing for the new testAlwaysRefreshDirectoryTaxonomyReader test

2022-04-19 Thread Gautam Worah (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524471#comment-17524471
 ] 

Gautam Worah commented on LUCENE-10521:
---

Tests are passing now. Latest main build: 
https://jenkins.thetaphi.de/job/Lucene-main-Windows/10728/

> Tests in windows are failing for the new 
> testAlwaysRefreshDirectoryTaxonomyReader test
> --
>
> Key: LUCENE-10521
> URL: https://issues.apache.org/jira/browse/LUCENE-10521
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/facet
> Environment: Windows 10
> Reporter: Gautam Worah
> Priority: Minor
>
> Build: [https://jenkins.thetaphi.de/job/Lucene-main-Windows/10725/] is 
> failing.
>  
> Specifically, the loop which checks if any files still remain to be deleted 
> is not ending.
> We have added an exception to the main test class to not run the test on 
> WindowsFS (not sure if this is related).
>  
> ```
> SEVERE: 1 thread leaked from SUITE scope at org.apache.lucene.facet.taxonomy.directory.TestAlwaysRefreshDirectoryTaxonomyReader:
>    1) Thread[id=19, name=TEST-TestAlwaysRefreshDirectoryTaxonomyReader.testAlwaysRefreshDirectoryTaxonomyReader-seed#[F46E42CB7F2B6959], state=RUNNABLE, group=TGRP-TestAlwaysRefreshDirectoryTaxonomyReader]
>         at java.base@18/sun.nio.fs.WindowsNativeDispatcher.GetFileAttributesEx0(Native Method)
>         at java.base@18/sun.nio.fs.WindowsNativeDispatcher.GetFileAttributesEx(WindowsNativeDispatcher.java:390)
>         at java.base@18/sun.nio.fs.WindowsFileAttributes.get(WindowsFileAttributes.java:307)
>         at java.base@18/sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:251)
>         at java.base@18/sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105)
>         at app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
>         at app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
>         at app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
>         at app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
>         at app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
>         at java.base@18/java.nio.file.Files.delete(Files.java:1152)
>         at app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.privateDeleteFile(FSDirectory.java:344)
>         at app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.deletePendingFiles(FSDirectory.java:325)
>         at app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.getPendingDeletions(FSDirectory.java:410)
>         at app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FilterDirectory.getPendingDeletions(FilterDirectory.java:121)
>         at app//org.apache.lucene.facet.taxonomy.directory.TestAlwaysRefreshDirectoryTaxonomyReader.testAlwaysRefreshDirectoryTaxonomyReader(TestAlwaysRefreshDirectoryTaxonomyReader.java:97)
> ```






[GitHub] [lucene] dweiss commented on pull request #819: fail clearly on too-new JDK

2022-04-19 Thread GitBox


dweiss commented on PR #819:
URL: https://github.com/apache/lucene/pull/819#issuecomment-1102943639

   Yep. I wonder if we could do it in one script (the downloader?) to avoid 
running java so many times but overall I think it's better than before. :)
   
   This also reminds me that the various java versions (minimum required for 
Lucene, minimum required for gradle) are scattered around in oh-so-many places. 
I wonder how we could somehow centralize this information. I don't have any 
clean ideas though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on pull request #818: Fix incorrect docs in README.md: it must be java 17 exactly, java 18 does not work

2022-04-19 Thread GitBox


dweiss commented on PR #818:
URL: https://github.com/apache/lucene/pull/818#issuecomment-1102946084

   > I wish we could fix ./gradlew to detect you are using an unsupported JDK 
version and say so (exit with error/exception with a clear message).
   
   I really don't understand why gradle doesn't just check it up front... Maybe 
they hope things will just run with future versions out of the box (I don't see 
how it's possible, given all the bytecode manipulation magic, eh).





[GitHub] [lucene] dweiss commented on pull request #817: improve spotless error to suggest running 'gradlew tidy'

2022-04-19 Thread GitBox


dweiss commented on PR #817:
URL: https://github.com/apache/lucene/pull/817#issuecomment-1102953991

   This attaches to each and every spotless task (and would print a message for 
all of them). 
   
   Maybe it'd be better to create a single finalizer task (at the root level), 
collect all the spotless tasks in the graph and add finalizedBy pointing at 
that single task (it'd still need to check the status of those tasks it 
finalizes). Then if you have multiple failures, it'd print the message just 
once. It could even fail on its own - then the error would be the last one 
reported... But then, maybe it's overdoing things.





[GitHub] [lucene] dweiss commented on pull request #807: LUCENE-10512: Grammar: Remove incidents of "the the" in comments.

2022-04-19 Thread GitBox


dweiss commented on PR #807:
URL: https://github.com/apache/lucene/pull/807#issuecomment-1102961503

   I don't know, Mike... Gradle doesn't seem like a tool that you can ever make 
dead-simple (like ant). I like what Robert added but with hacks like that a 
question always pops to my mind of what happens in gradle, internally, if you 
throw an exception from such a block - think throwing an exception from java's 
try-finally that obscures the original cause...
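
For reference, the plain-Java effect alluded to here looks like the sketch below. It says nothing about what Gradle does internally; the class and method names are hypothetical, and the messages are placeholders.

```java
/** Shows how an exception thrown from a finally block hides the original failure. */
public class FinallyMaskingSketch {

  /** Hypothetical helper that always fails with a hint message. */
  static void raiseHint() {
    throw new RuntimeException("hint: run the formatter");
  }

  /** The finally block throws, so the caller never sees the original exception. */
  static void masked() {
    try {
      throw new IllegalStateException("original failure");
    } finally {
      raiseHint();
    }
  }

  /** Catches the original and attaches it to the hint as a suppressed exception. */
  static void preserved() {
    try {
      throw new IllegalStateException("original failure");
    } catch (IllegalStateException original) {
      RuntimeException hint = new RuntimeException("hint: run the formatter");
      hint.addSuppressed(original);
      throw hint;
    }
  }
}
```

A caller of `masked()` only ever sees the hint; a caller of `preserved()` sees the hint and can still reach the original through `getSuppressed()`.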





[GitHub] [lucene] rmuir commented on pull request #819: fail clearly on too-new JDK

2022-04-19 Thread GitBox


rmuir commented on PR #819:
URL: https://github.com/apache/lucene/pull/819#issuecomment-1102995607

   > Yep. I wonder if we could do it in one script (the downloader?) to avoid 
running java so many times but overall I think it's better than before. :)
   
   I agree with this (as someone who disables the daemon and runs the commands 
every time). The only reason I specified it as a different command was due to 
the fact that if the downloader fails, the error is "trapped" and an additional 
version-related error (that you need at least java 11) is printed... IMO that's 
a bit confusing, but I get it.
   
   > 
   > This also reminds me that the various java versions (minimum required for 
Lucene, minimum required for gradle) are scattered around in oh-so-many places. 
I wonder how we could somehow centralize this information. I don't have any 
clean ideas though.
   
   Maybe it could be in a .properties file? Not gradle, not groovy, a real 
actual .properties file that we can read with java.util.Properties too?
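
A rough sketch of the .properties idea, reading the file content with `java.util.Properties`. The key names `minJava` and `maxJava`, the class name, and the inline file content are all hypothetical; the 17/17 values just mirror the "java 17 exactly" constraint from PR #818.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch: keep the supported Java versions in one .properties source and check against it. */
public class JavaVersionSketch {

  /** Parses properties text; a real build would read it from a file instead. */
  static Properties load(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  /** Returns null if the feature version is supported, otherwise a clear error message. */
  static String check(Properties props, int feature) {
    int min = Integer.parseInt(props.getProperty("minJava"));
    int max = Integer.parseInt(props.getProperty("maxJava"));
    if (feature < min || feature > max) {
      return "Unsupported Java " + feature + ": need " + min + " to " + max;
    }
    return null;
  }

  public static void main(String[] args) {
    Properties props = load("minJava=17\nmaxJava=17\n");
    String error = check(props, Runtime.version().feature());
    System.out.println(error == null ? "Java version OK" : error);
  }
}
```

Since the format is plain key=value text, the same file could also be parsed by a wrapper script without starting a JVM.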
   





[jira] [Commented] (LUCENE-10518) FieldInfos consistency check can refuse to open Lucene 8 index

2022-04-19 Thread Mayya Sharipova (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524548#comment-17524548
 ] 

Mayya Sharipova commented on LUCENE-10518:
--

[~dnhatn] [~jpountz]  Thanks for your suggestions. +1 on the idea as well. Do 
we want to disable consistency check only on segments that were created in 8.x 
and even of 9.x segments of 8.x index?




[GitHub] [lucene] rmuir commented on pull request #817: improve spotless error to suggest running 'gradlew tidy'

2022-04-19 Thread GitBox


rmuir commented on PR #817:
URL: https://github.com/apache/lucene/pull/817#issuecomment-1102998198

   > Maybe it'd be better to create a single finalizer task (at the root 
level), collect all the spotless tasks in the graph and add finalizedBy 
pointing at that single task (it'd still need to check the status of those 
tasks it finalizes). Then if you have multiple failures, it'd print the message 
just once. It could even fail on its own - then the error would be the last one 
reported... But then, maybe it's overdoing things.
   
   This is beyond my area of gradle kung fu, but I'll leave the PR open in case 
anyone else understands it. I just basically hacked until I was able to get 
something printed in the case spotless fails. If there is a way to "try/catch" 
its error, that would be fine too. I do think, despite its confusing errors, 
that it is best to print whatever spotless says. It is just that we want to 
"amend" the output (loudly) when it fails, with a simple tip to make it easier.





[jira] [Comment Edited] (LUCENE-10518) FieldInfos consistency check can refuse to open Lucene 8 index

2022-04-19 Thread Mayya Sharipova (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524548#comment-17524548
 ] 

Mayya Sharipova edited comment on LUCENE-10518 at 4/19/22 7:15 PM:
---

[~dnhatn] [~jpountz]  Thanks for your suggestions. +1 on the idea as well. Do 
we want to disable consistency check only on segments that were created in 8.x  
or on all segments  even of 9.x segments of 8.x index?


was (Author: mayya):
[~dnhatn] [~jpountz]  Thanks for your suggestions. +1 on the idea as well. Do 
we want to disable consistency check only on segments that were created in 8.x 
and even of 9.x segments of 8.x index?




[jira] [Comment Edited] (LUCENE-10518) FieldInfos consistency check can refuse to open Lucene 8 index

2022-04-19 Thread Mayya Sharipova (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524548#comment-17524548
 ] 

Mayya Sharipova edited comment on LUCENE-10518 at 4/19/22 7:16 PM:
---

[~dnhatn] [~jpountz]  Thanks for your suggestions. +1 on the idea as well. Do 
we want to disable consistency check only on segments that were created in 8.x? 
 I guess for new 9.x segments of the 8.x index, consistency checks will be 
enforced during indexing. 


was (Author: mayya):
[~dnhatn] [~jpountz]  Thanks for your suggestions. +1 on the idea as well. Do 
we want to disable consistency check only on segments that were created in 8.x  
or on all segments  even of 9.x segments of 8.x index?




[jira] [Comment Edited] (LUCENE-10518) FieldInfos consistency check can refuse to open Lucene 8 index

2022-04-19 Thread Mayya Sharipova (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524548#comment-17524548
 ] 

Mayya Sharipova edited comment on LUCENE-10518 at 4/19/22 7:17 PM:
---

[~dnhatn] [~jpountz]  Thanks for your suggestions. +1 on the idea as well.

Do we want to disable consistency check only on segments that were created in 
8.x?  I guess for new 9.x segments of the 8.x index, consistency checks will be 
enforced during indexing. 


was (Author: mayya):
[~dnhatn] [~jpountz]  Thanks for your suggestions. +1 on the idea as well. Do 
we want to disable consistency check only on segments that were created in 8.x? 
 I guess for new 9.x segments of the 8.x index, consistency checks will be 
enforced during indexing. 

> FieldInfos consistency check can refuse to open Lucene 8 index
> --
>
> Key: LUCENE-10518
> URL: https://issues.apache.org/jira/browse/LUCENE-10518
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 8.10.1
>Reporter: Nhat Nguyen
>Priority: Major
>
> A field-infos consistency check introduced in Lucene 9 (LUCENE-9334) can 
> refuse to open a Lucene 8 index. Lucene 8 can create a partial FieldInfo if 
> it hits a non-aborting exception (for example [term is too 
> long|https://github.com/apache/lucene-solr/blob/6a6484ba396927727b16e5061384d3cd80d616b2/lucene/core/src/java/org/apache/lucene/index/DefaultIndexingChain.java#L944])
>  while processing the fields of a document. Lucene 9 does not have this 
> problem because it processes fields in two phases, with the [first 
> phase|https://github.com/apache/lucene/blob/10ebc099c846c7d96f4ff5f9b7853df850fa8442/lucene/core/src/java/org/apache/lucene/index/IndexingChain.java#L589-L614]
>  processing only FieldInfos. 
> The issue can be reproduced with this snippet:
> {code:java}
> public void testWriteIndexOn8x() throws Exception {
>   FieldType KeywordField = new FieldType();
>   KeywordField.setTokenized(false);
>   KeywordField.setOmitNorms(true);
>   KeywordField.setIndexOptions(IndexOptions.DOCS);
>   KeywordField.freeze();
>   try (Directory dir = newDirectory()) {
>     IndexWriterConfig config = new IndexWriterConfig();
>     config.setCommitOnClose(false);
>     config.setMergePolicy(NoMergePolicy.INSTANCE);
>     try (IndexWriter writer = new IndexWriter(dir, config)) {
>       // first segment
>       writer.addDocument(new Document()); // an empty doc
>       Document d1 = new Document();
>       byte[] chars = new byte[IndexWriter.MAX_STORED_STRING_LENGTH + 1];
>       Arrays.fill(chars, (byte) 'a');
>       d1.add(new Field("field", new BytesRef(chars), KeywordField));
>       d1.add(new BinaryDocValuesField("field", new BytesRef(chars)));
>       expectThrows(IllegalArgumentException.class, () -> writer.addDocument(d1));
>       writer.flush();
>       // second segment
>       Document d2 = new Document();
>       d2.add(new Field("field", new BytesRef("hello world"), KeywordField));
>       d2.add(new SortedDocValuesField("field", new BytesRef("hello world")));
>       writer.addDocument(d2);
>       writer.flush();
>       writer.commit();
>       // Check for doc values types consistency
>       Map<String, DocValuesType> docValuesTypes = new HashMap<>();
>       try (DirectoryReader reader = DirectoryReader.open(dir)) {
>         for (LeafReaderContext leaf : reader.leaves()) {
>           for (FieldInfo fi : leaf.reader().getFieldInfos()) {
>             DocValuesType current =
>                 docValuesTypes.putIfAbsent(fi.name, fi.getDocValuesType());
>             if (current != null && current != fi.getDocValuesType()) {
>               fail("cannot change DocValues type from " + current + " to "
>                   + fi.getDocValuesType() + " for field \"" + fi.name + "\"");
>             }
>           }
>         }
>       }
>     }
>   }
> }
> {code}
> I would like to propose that we:
> - Backport the two-phase fields processing from Lucene 9 to Lucene 8. The 
> patch should be small and contained.
> - Introduce an option in Lucene 9 to skip checking field-infos consistency 
> (i.e., behave like Lucene 8 when the option is enabled).
> /cc [~mayya] and [~jpountz]






[GitHub] [lucene] dweiss commented on pull request #819: fail clearly on too-new JDK

2022-04-19 Thread GitBox


dweiss commented on PR #819:
URL: https://github.com/apache/lucene/pull/819#issuecomment-1103018456

   > Maybe it could be in a .properties file? Not gradle, not groovy, a real 
actual .properties file that we can read with java.util.Properties too?
   
   I was thinking about something like this too; it's convenient. There are a 
number of places referencing version numbers - I can't even remember them 
all. A commit list referencing LUCENE-10283 is a good place to start... :)
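
A minimal sketch of the single-source idea, assuming hypothetical keys `java.min` and `java.max` (neither the keys nor the values come from the PR). It only shows that a plain key-value file can be parsed with `java.util.Properties`, outside of gradle or groovy:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class JavaVersionProps {
  /**
   * Parses .properties-formatted text and returns {min, max}.
   * The key names are hypothetical and chosen only for this sketch.
   */
  static int[] readRange(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    int min = Integer.parseInt(props.getProperty("java.min").trim());
    int max = Integer.parseInt(props.getProperty("java.max").trim());
    return new int[] {min, max};
  }

  public static void main(String[] args) {
    // Illustrative values only; a real build would load them from a file.
    int[] range = readRange("java.min=17\njava.max=17\n");
    System.out.println("min=" + range[0] + " max=" + range[1]);
  }
}
```

The same file could be read by shell scripts or CI configuration, since the format is just `key=value` lines.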


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on pull request #817: improve spotless error to suggest running 'gradlew tidy'

2022-04-19 Thread GitBox


dweiss commented on PR #817:
URL: https://github.com/apache/lucene/pull/817#issuecomment-1103022163

   You can commit this or leave it open for a day or two. I'm catching up 
with work after a short holiday but maybe I can take a stab at this as a 
breather.





[GitHub] [lucene] rmuir commented on pull request #807: LUCENE-10512: Grammar: Remove incidents of "the the" in comments.

2022-04-19 Thread GitBox


rmuir commented on PR #807:
URL: https://github.com/apache/lucene/pull/807#issuecomment-1103079108

   > I don't know, Mike... Gradle doesn't seem like a tool that you can ever 
make dead-simple (like ant). I like what Robert added but with hacks like that 
a question always pops to my mind of what happens in gradle, internally, if you 
throw an exception from such a block - think throwing an exception from java's 
try-finally that obscures the original cause...
   
   I first tried simply "printing stuff" (not throwing an exception). In that 
case you see my "print" before the actual spotless exception text. So I changed 
it to `throw new GradleException` only because that prints my text after the 
spotless exception. If the concern is throwing the exception, maybe we could 
just print stuff. If we added more ASCII art around it, it could still work :)





[jira] [Commented] (LUCENE-10518) FieldInfos consistency check can refuse to open Lucene 8 index

2022-04-19 Thread Nhat Nguyen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524582#comment-17524582
 ] 

Nhat Nguyen commented on LUCENE-10518:
--

[~mayya] That's correct. We only relax the strictness of the consistency 
check when opening an existing Lucene 8.x index with IndexWriter. The same 
enforcement will be applied to new segments.




[jira] [Created] (LUCENE-10524) Augment CONTRIBUTING.md guide with instructions on how/when to benchmark

2022-04-19 Thread Gautam Worah (Jira)
Gautam Worah created LUCENE-10524:
-

 Summary: Augment CONTRIBUTING.md guide with instructions on 
how/when to benchmark
 Key: LUCENE-10524
 URL: https://issues.apache.org/jira/browse/LUCENE-10524
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Gautam Worah


This came up when I was trying to think about improving the experience for new 
contributors.

Today, new contributors are usually unaware of where the luceneutil benchmarks 
are and when/how to run them. Committers usually end up pointing contributors to 
the benchmarks package when they make performance-impacting changes, and then 
they run the benchmarks.

Adding benchmark details to the Lucene repo will also make them more accessible 
to other researchers who want to experiment with or benchmark their own custom 
task implementation with Java Lucene.

What does the community think?






[jira] [Created] (LUCENE-10525) Improve WindowsFS emulation to catch directory names with : in them (which is not allowed)

2022-04-19 Thread Gautam Worah (Jira)
Gautam Worah created LUCENE-10525:
-

 Summary: Improve WindowsFS emulation to catch directory names with 
: in them (which is not allowed) 
 Key: LUCENE-10525
 URL: https://issues.apache.org/jira/browse/LUCENE-10525
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Gautam Worah


In PR [https://github.com/apache/lucene/pull/762] we missed the case where a 
tempDir name was using `:` in the dir name. This test was passing in Linux and 
macOS environments but ended up failing on Windows build systems.

We ended up pushing a fix to not use `:` in the names.






[jira] [Updated] (LUCENE-10525) Improve WindowsFS emulation to catch directory names with : in them (which is not allowed)

2022-04-19 Thread Gautam Worah (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Worah updated LUCENE-10525:
--
Description: 
In PR [https://github.com/apache/lucene/pull/762] we missed the case where a 
tempDir name was using `:` in the dir name. This test was passing in Linux and 
macOS environments but ended up failing on Windows build systems.

We ended up pushing a fix to not use `:` in the names.

Open to other ideas as well!

  was:
In PR ([https://github.com/apache/lucene/pull/762)] we missed the case where a 
tempDir name was using `:` in the dir name. This test was passing in Linux, 
MacOS environments but ended up failing in Windows build systems.

We ended up pushing a fix to not use `:` in the names.


> Improve WindowsFS emulation to catch directory names with : in them (which is 
> not allowed) 
> ---
>
> Key: LUCENE-10525
> URL: https://issues.apache.org/jira/browse/LUCENE-10525
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Gautam Worah
>Priority: Minor
>
> In PR [https://github.com/apache/lucene/pull/762] we missed the case where 
> a tempDir name was using `:` in the dir name. This test was passing in Linux 
> and macOS environments but ended up failing on Windows build systems.
> We ended up pushing a fix to not use `:` in the names.
> Open to other ideas as well!






[GitHub] [lucene] rmuir commented on pull request #817: improve spotless error to suggest running 'gradlew tidy'

2022-04-19 Thread GitBox


rmuir commented on PR #817:
URL: https://github.com/apache/lucene/pull/817#issuecomment-1103262627

   No reason to rush it in; take some time to think about it. I do think the 
change is worth the trouble though, since it reduces friction for new developers.





[GitHub] [lucene] rmuir commented on pull request #819: fail clearly on too-new JDK

2022-04-19 Thread GitBox


rmuir commented on PR #819:
URL: https://github.com/apache/lucene/pull/819#issuecomment-1103272545

   > I agree with this (as someone who disables the daemon and runs the 
commands every time). The only reason I specified it as a different command was 
due to the fact that if the downloader fails, the error is "trapped" and an 
additional version-related error (that you need at least java 11) is printed... 
IMO that's a bit confusing, but I get it.
   
   We can clean up the exit status so that wrapperdownloader uses something 
other than `1` when it fails. Status `1` is also what old `java` uses when it 
doesn't recognize the `--source` parameter. Then we can move the version check 
into wrapperdownloader and only say "please make sure you're using at least 
java 11" when it is relevant (java itself fails). And it is safe to only apply 
it to exit status `1` because it is only relevant to already-released older 
jvms. I will look into this.
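
A rough sketch of the exit-status idea, not the actual wrapperdownloader code: the class name, the status value `2`, and the bounds 11 and 17 are assumptions picked for illustration (the comment above only asks for something other than `1`).

```java
final class VersionGate {
  // Hypothetical status: anything other than 1, which an old `java` returns
  // when it does not recognize the `--source` option.
  static final int WRONG_VERSION_STATUS = 2;

  /** Returns 0 if the feature version lies within [min, max], else the distinct status. */
  static int statusFor(int feature, int min, int max) {
    return (feature < min || feature > max) ? WRONG_VERSION_STATUS : 0;
  }

  public static void main(String[] args) {
    // Runtime.version().feature() is available from Java 10 onwards.
    int feature = Runtime.version().feature();
    int status = statusFor(feature, 11, 17);
    System.out.println("java " + feature + " -> exit status " + status);
    // A real launcher would call System.exit(status) here.
  }
}
```

A wrapper script could then print the "make sure you're using at least java 11" hint only for status `1`, which is the case where `java` itself failed to start.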
   





[GitHub] [lucene] rmuir commented on pull request #819: fail clearly on too-new JDK

2022-04-19 Thread GitBox


rmuir commented on PR #819:
URL: https://github.com/apache/lucene/pull/819#issuecomment-1103286863

   OK, @dweiss can you take another look when you get a chance?
   
   java 8:
   ```
   $ ./gradlew check
   Unrecognized option: --source
   Error: Could not create the Java Virtual Machine.
   Error: A fatal exception has occurred. Program will exit.
   ERROR: Something went wrong. Make sure you're using Java 11 or later.
   ```
   
   java 18:
   ```
   $ ./gradlew check
   ERROR: java version must not be newer than 17 (unsupported by gradle), your version: 18
   ```
   
   But there's still more to do. If you use java 11 it will still fail, just 
differently, based on our actual gradle logic.
   
   Let's say the user has java 9. It isn't good to tell the user to upgrade to 
java 11 or later (say they pick java 11); they download it, install it, only 
for it to fail and say they need 17 or later. Then the user downloads and 
installs the latest (say 18), only for it to fail yet one more time and tell 
them they need exactly 17... this is like leading them through a maze. So I 
want to clean up the messaging some more still...
   
   java 11:
   ```
   $ ./gradlew check
   To honour the JVM settings for this build a single-use Daemon process will be forked. See https://docs.gradle.org/7.2/userguide/gradle_daemon.html#sec:disabling_the_daemon.
   Daemon will be stopped at the end of the build
   
   FAILURE: Build failed with an exception.
   
   * Where:
   Script '/home/rmuir/workspace/lucene/gradle/validation/check-environment.gradle' line: 36
   
   * What went wrong:
   A problem occurred evaluating script.
   > At least Java 17 is required, you are running Java 11 [OpenJDK 64-Bit Server VM 11.0.13+8]
   
   * Try:
   Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
   
   * Get more help at https://help.gradle.org
   
   BUILD FAILED in 4s
   ```





[GitHub] [lucene] rmuir commented on pull request #819: fail clearly on too-new JDK

2022-04-19 Thread GitBox


rmuir commented on PR #819:
URL: https://github.com/apache/lucene/pull/819#issuecomment-1103301886

   OK, I cleaned up the messaging here to emit better messages so we don't lead 
users through a maze of downloading and retrying. I didn't touch the gradle 
checks. And yeah, it would be great to consolidate to a properties file (min 
and max versions?) so there are fewer places to change when we bump it; we 
could read the properties from this checker at least.
   
   java 18:
   ```
   $ ./gradlew check
   ERROR: java version be exactly 17, your version: 18
   ```
   
   java 9:
   ```
   $ ./gradlew check
   Unrecognized option: --source
   Error: Could not create the Java Virtual Machine.
   Error: A fatal exception has occurred. Program will exit.
   ERROR: Something went wrong. Make sure you're using Java 17.
   ```
   
   java 11:
   ```
   $ ./gradlew check
   ERROR: java version be exactly 17, your version: 11
   ```





[jira] [Commented] (LUCENE-10525) Improve WindowsFS emulation to catch directory names with : in them (which is not allowed)

2022-04-19 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524644#comment-17524644
 ] 

Robert Muir commented on LUCENE-10525:
--

Good idea. There are quite a few banned characters and "names" we could look 
for: https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file. 




[jira] [Commented] (LUCENE-10525) Improve WindowsFS emulation to catch directory names with : in them (which is not allowed)

2022-04-19 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524647#comment-17524647
 ] 

Robert Muir commented on LUCENE-10525:
--

So the "real" windowsfilesystem fails in the methods like Path.resolve(), 
throwing exceptions that look like this:
{noformat}
java.nio.file.InvalidPathException: Illegal char <:> at index 13: 2022-04-15T20:35:33.995886500Z-001
{noformat}

I think we can simulate it really well by doing exactly the same thing. But it 
is some work, and these APIs are not exactly fun to wrestle with, and we have 
to refactor these mock filesystems for it to work cleanly. Here's my suggestion 
of a plan: First we have to fix all the places in the code currently calling 
"new FilterPath()" directly (e.g. make this class package private or 
something). We can add a method like {{wrap(Path)}} to filesystemprovider that 
does the same thing, so there's only a single place doing the wrapping. Then 
windowsfs can override this new {{wrap(Path)}} to return 'new WindowsPath', 
where WindowsPath is a new class that extends FilterPath, but overrides all the 
resolve() methods with the additional checks.




[jira] [Commented] (LUCENE-10524) Augment CONTRIBUTING.md guide with instructions on how/when to benchmark

2022-04-19 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524662#comment-17524662
 ] 

Tomoko Uchida commented on LUCENE-10524:


I think it would be great to have good documentation about benchmarking; to me 
it is actually a must-have for this project.
I'm not sure how large it will be, but how about having a dedicated help 
document (say, `gradlew helpBenchmark`) and linking to it from CONTRIBUTING.md? 

> Augment CONTRIBUTING.md guide with instructions on how/when to benchmark
> 
>
> Key: LUCENE-10524
> URL: https://issues.apache.org/jira/browse/LUCENE-10524
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Gautam Worah
>Priority: Minor
>
> This came up when I was trying to think about improving the experience for 
> new contributors.
> Today, new contributors are usually unaware of where luceneutil benchmarks 
> are and when/how to run them. Committers usually end up pointing contributors 
> to the benchmarks package when they make perf impacting changes and then they 
> run the benchmarks.
>  
> Adding benchmark details to the Lucene repo will also make them more 
> accessible to other researchers who want to experiment/benchmark their own 
> custom task implementation with Java Lucene.
>  
> What does the community think?
>  






[jira] [Created] (LUCENE-10526) add single method to mockfile to wrap a Path

2022-04-19 Thread Robert Muir (Jira)
Robert Muir created LUCENE-10526:


 Summary: add single method to mockfile to wrap a Path
 Key: LUCENE-10526
 URL: https://issues.apache.org/jira/browse/LUCENE-10526
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Robert Muir


Currently, mockfilesystems wrap a path with "new FilterPath". But this 
"wrapping" logic is scattered everywhere in the code (and tests!). And it is 
hardcoded to FilterPath (subclassing is not possible).

This makes it impossible for a mock filesystem to extend FilterPath with some 
custom logic (example: check for special windows reserved characters).

I don't think code/tests should be calling "new FilterPath" everywhere; this is 
also just messy. Instead they should ask the mockfilesystem's provider to wrap 
the path: {{provider.wrapPath(path, filesystem)}}.

This way, WindowsFS can then override wrapPath() with a subclass that looks for 
special characters.

This issue is just for the API refactoring/cleanup. Additional 
Windows-simulation can happen on the parent issue.






[GitHub] [lucene] rmuir opened a new pull request, #822: LUCENE-10526: add single method to mockfile to wrap a Path

2022-04-19 Thread GitBox


rmuir opened a new pull request, #822:
URL: https://github.com/apache/lucene/pull/822

   Currently "new FilterPath" is called from everywhere, making it impossible 
for a mockfilesystem to use a custom subclass.
   
   See JIRA for full description.
   
   The use case here is to e.g. allow WindowsFS to wrap with a custom subclass 
that checks for special characters that windows doesn't allow. But today it 
can't do that since there is code everywhere doing the wrapping.
   
   This PR adds a single method `wrapPath()` to FilterFileSystemProvider to do 
this wrapping, and refactors everything to use it.
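
To illustrate the kind of check such a subclass could run in its resolve() methods, here is a standalone sketch. The class `ReservedNameCheck` is hypothetical and is not part of Lucene's mock filesystem API; the reserved set is an assumption taken from Microsoft's file naming rules rather than from this PR.

```java
import java.nio.file.InvalidPathException;

final class ReservedNameCheck {
  // Characters Windows rejects in a file or directory name; control
  // characters below 32 are rejected as well.
  private static final String RESERVED = "<>:\"/\\|?*";

  /** Throws InvalidPathException at the first reserved character of a single path component. */
  static void check(String name) {
    for (int i = 0; i < name.length(); i++) {
      char c = name.charAt(i);
      if (c < 32 || RESERVED.indexOf(c) >= 0) {
        throw new InvalidPathException(name, "Illegal char <" + c + ">", i);
      }
    }
  }

  public static void main(String[] args) {
    check("index-001"); // a valid name passes silently
    try {
      check("2022-04-15T20:35:33Z-001"); // a timestamp with ':' is rejected
    } catch (InvalidPathException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Throwing `InvalidPathException` mirrors what the real Windows filesystem does, so a test that builds a directory name from a timestamp would fail the same way on every platform.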
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10525) Improve WindowsFS emulation to catch directory names with : in them (which is not allowed)

2022-04-19 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524664#comment-17524664
 ] 

Robert Muir commented on LUCENE-10525:
--

I added a PR with a stab at the refactoring piece: 
https://github.com/apache/lucene/pull/822

I think after this change, you can override {{wrapPath()}} in WindowsFS to 
return a subclass of FilterPath (e.g. WindowsPath) that adds special character 
checks/name checks to all of its resolve() methods.




[GitHub] [lucene] mocobeta commented on pull request #819: fail clearly on too-new JDK

2022-04-19 Thread GitBox


mocobeta commented on PR #819:
URL: https://github.com/apache/lucene/pull/819#issuecomment-1103423089

   > Maybe it could be in a .properties file? Not gradle, not groovy, a real 
actual .properties file that we can read with java.util.Properties too?
   
   +1 to a single source of source/target Java version(s). A simple key-value 
format may be easily used from the outside world of java/gradle - github 
actions scripts or the smoke tester, and so on.

