[GitHub] [lucene] uschindler commented on a diff in pull request #817: improve spotless error to suggest running 'gradlew tidy'
uschindler commented on code in PR #817:
URL: https://github.com/apache/lucene/pull/817#discussion_r852697178

## gradle/validation/spotless.gradle: ##
@@ -111,3 +111,9 @@ configure(project(":lucene").subprojects) { prj ->
 v.dependsOn ":checkJdkInternalsExportedToGradle"
 }
 }
+
+gradle.taskGraph.afterTask { Task task, TaskState state ->
+  if (task.name == 'spotlessJavaCheck' && state.failure) {
+throw new GradleException("\n***\n*PLEASE RUN ./gradle tidy!*\n***");

Review Comment:
"gradlew tidy", with "w".

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10520) HTMLStripCharFilter fails on '>' or '<' characters in attribute values
[ https://issues.apache.org/jira/browse/LUCENE-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Alishevskikh updated LUCENE-10520:
---
Description:
If HTML input contains attributes with '<' or '>' characters in their values, HTMLStripCharFilter produces unexpected results. See the attached unit test for example.

These characters are valid in attribute values, as per the [HTML5 specification |https://html.spec.whatwg.org/#syntax-attribute-value]. The [W3C validator|https://validator.w3.org/nu/#textarea] does not have issues with the test HTML.

was:
If HTML input contains attributes with '<' or '>' characters in their values, HTMLCharStripFilter produces unexpected results. See the attached unit test for example.

These characters are valid in attribute values, as per the [HTML5 specification |https://html.spec.whatwg.org/#syntax-attribute-value]. The [W3C validator|https://validator.w3.org/nu/#textarea] does not have issues with the test HTML.

Labels: HTMLStripCharFilter (was: HTMLCharStripFilter)
Summary: HTMLStripCharFilter fails on '>' or '<' characters in attribute values (was: HTMLCharStripFilter fails on '>' or '<' characters in attribute values )

> HTMLStripCharFilter fails on '>' or '<' characters in attribute values
> ---
>
> Key: LUCENE-10520
> URL: https://issues.apache.org/jira/browse/LUCENE-10520
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/analysis
> Affects Versions: 9.1
> Reporter: Alex Alishevskikh
> Priority: Major
> Labels: HTMLStripCharFilter
> Fix For: 9.1
>
> Attachments: HTMLStripCharFilterTest.java
>
>
> If HTML input contains attributes with '<' or '>' characters in their values, HTMLStripCharFilter produces unexpected results.
> See the attached unit test for example.
> These characters are valid in attribute values, as per the [HTML5 specification |https://html.spec.whatwg.org/#syntax-attribute-value]. The [W3C validator|https://validator.w3.org/nu/#textarea] does not have issues with the test HTML.

--
This message was sent by Atlassian Jira (v8.20.1#820001)
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
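The behaviour the report asks for can be illustrated with a small standalone sketch. This is not the HTMLStripCharFilter implementation and it does not use Lucene: `QuoteAwareTagStripper` and `strip` are hypothetical names, and the sketch assumes input made only of plain tags with optionally quoted attribute values (no comments, scripts, entities, or stray '<' in text). It shows one way to treat '<' and '>' inside a quoted attribute value as data rather than as tag delimiters.

```java
final class QuoteAwareTagStripper {
    /** Removes tags, treating '<' and '>' inside quoted attribute values as data. */
    static String strip(String html) {
        StringBuilder out = new StringBuilder();
        boolean inTag = false;
        char quote = 0; // active quote character while inside an attribute value, 0 if none
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (!inTag) {
                if (c == '<') {
                    inTag = true; // start of a tag: stop copying characters
                } else {
                    out.append(c); // ordinary text outside any tag
                }
            } else if (quote != 0) {
                if (c == quote) {
                    quote = 0; // matching quote closes the attribute value
                }
                // any other character, including '<' and '>', is part of the value
            } else if (c == '"' || c == '\'') {
                quote = c; // an attribute value starts
            } else if (c == '>') {
                inTag = false; // an unquoted '>' ends the tag
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("<a title=\"x > y\" href='#'>link</a> text"));
    }
}
```

The only state beyond "inside a tag" is the currently open quote character, which is what keeps a '>' in `title="x > y"` from ending the tag early.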
[GitHub] [lucene] LuXugang commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
LuXugang commented on PR #792:
URL: https://github.com/apache/lucene/pull/792#issuecomment-1102297367

Thanks @jtibshirani @mayya-sharipova. Indeed, only the dense case was covered in [luceneutil](https://github.com/mikemccand/luceneutil), so I wrote a [demo](https://github.com/LuXugang/Lucene-7.5.0/commit/b69ae6c70665878f95115a6a49715c84c760b4c6) to run a sparse case test.

vector source:
- 3 dimensions
- 7 vectors within 100k documents in one segment
- do `KnnVectorQuery`

NumberOfDocumentsToFind | baseline(search)ms | candidate(search)ms
-- | -- | --
10 | 3 | 3
1000 | 7 | 7
1 | 24 | 30
2 | 45 | 48
5 | 108 | 117

FORMAT | baseline(indexSize) | candidate(indexSize)
-- | -- | --
vec | 781K | 806K
vem | 278K | 18K
vex | 4.6M | 4.6M
[GitHub] [lucene] LuXugang commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
LuXugang commented on PR #792:
URL: https://github.com/apache/lucene/pull/792#issuecomment-1102336772

Result of dense case in Luceneutil's benchmark by running `python src/python/localrun.py -source wikivector10k`:

LowTermVector 1493.52 (9.1%) 1457.88 (11.5%) -2.4% ( -21% - 20%) 0.468
AndHighLowVector 1248.97 (9.3%) 1251.44 (9.0%) 0.2% ( -16% - 20%) 0.945
MedTermVector 1407.52 (9.9%) 1414.02 (9.9%) 0.5% ( -17% - 22%) 0.883
AndHighMedVector 1422.91 (11.6%) 1444.62 (9.6%) 1.5% ( -17% - 25%) 0.649
AndHighHighVector 1441.78 (8.7%) 1468.59 (8.0%) 1.9% ( -13% - 20%) 0.480
PKLookup 55.55 (22.3%) 56.87 (23.3%) 2.4% ( -35% - 61%) 0.741
HighTermVector 1349.45 (11.1%) 1400.94 (9.7%) 3.8% ( -15% - 27%) 0.249
[jira] [Comment Edited] (LUCENE-10517) Improve performance of SortedSetDV faceting by iterating on class types
[ https://issues.apache.org/jira/browse/LUCENE-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522838#comment-17522838 ]

Chris Hegarty edited comment on LUCENE-10517 at 4/19/22 10:38 AM:
--
With my M1 I get the following luceneutil benchmark results.

Hardware Overview:
Chip: Apple M1
Total Number of Cores: 8 (4 performance and 4 efficiency)
Memory: 16 GB

{code:java}
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
LowPhrase 148.35 (2.1%) 143.66 (2.6%) -3.2% ( -7% - 1%) 0.000
MedIntervalsOrdered 197.27 (3.7%) 191.24 (5.7%) -3.1% ( -12% - 6%) 0.044
HighIntervalsOrdered 11.55 (2.6%) 11.33 (3.5%) -1.9% ( -7% - 4%) 0.055
AndHighMed 447.74 (2.1%) 441.26 (2.4%) -1.4% ( -5% - 3%) 0.042
HighTerm 2397.60 (4.0%) 2367.10 (2.4%) -1.3% ( -7% - 5%) 0.223
LowTerm 3939.37 (2.7%) 3890.14 (2.3%) -1.2% ( -6% - 3%) 0.111
OrHighNotHigh 1917.21 (2.8%) 1893.94 (3.2%) -1.2% ( -6% - 4%) 0.198
HighPhrase 32.93 (1.9%) 32.55 (1.1%) -1.2% ( -4% - 1%) 0.022
PKLookup 340.11 (4.5%) 336.69 (4.3%) -1.0% ( -9% - 8%) 0.471
TermDTSort 145.39 (4.1%) 144.09 (2.3%) -0.9% ( -7% - 5%) 0.394
HighSpanNear 10.38 (3.7%) 10.32 (1.9%) -0.6% ( -5% - 5%) 0.531
MedSpanNear 206.69 (2.8%) 205.70 (1.5%) -0.5% ( -4% - 3%) 0.500
Fuzzy2 91.75 (2.5%) 91.41 (1.4%) -0.4% ( -4% - 3%) 0.562
OrHighNotMed 1975.22 (3.5%) 1968.91 (2.7%) -0.3% ( -6% - 6%) 0.744
OrHighMed 66.62 (3.9%) 66.45 (4.8%) -0.3% ( -8% - 8%) 0.850
LowSloppyPhrase 62.60 (2.1%) 62.44 (2.5%) -0.3% ( -4% - 4%) 0.726
OrHighNotLow 1876.16 (2.5%) 1871.56 (2.4%) -0.2% ( -5% - 4%) 0.756
OrHighHigh 55.70 (3.9%) 55.64 (4.9%) -0.1% ( -8% - 9%) 0.940
Fuzzy1 100.97 (2.2%) 100.88 (2.1%) -0.1% ( -4% - 4%) 0.898
LowIntervalsOrdered 42.24 (0.7%) 42.21 (1.0%) -0.1% ( -1% - 1%) 0.766
MedPhrase 923.85 (1.3%) 923.14 (1.6%) -0.1% ( -2% - 2%) 0.867
OrNotHighMed 1427.45 (2.0%) 1428.11 (2.5%) 0.0% ( -4% - 4%) 0.949
Respell 82.74 (2.6%) 82.81 (1.9%) 0.1% ( -4% - 4%) 0.903
LowSpanNear 373.63 (2.6%) 373.97 (1.6%) 0.1% ( -4% - 4%) 0.893
HighTermDayOfYearSort 199.64 (1.7%) 199.83 (2.5%) 0.1% ( -4% - 4%) 0.887
OrNotHighHigh 1523.02 (2.2%) 1526.12 (2.0%) 0.2% ( -3% - 4%) 0.759
AndHighMedDayTaxoFacets 185.23 (0.9%) 185.79 (1.4%) 0.3% ( -1% - 2%) 0.416
MedTerm 3016.98 (3.4%) 3026.53 (3.2%) 0.3% ( -6% - 7%) 0.761
OrNotHighLow 1867.65 (2.5%) 1876.63 (2.4%) 0.5% ( -4% - 5%) 0.535
AndHighLow 1571.61 (3.1%) 1579.86 (2.6%) 0.5% ( -5% - 6%) 0.564
OrHighLow 1485.93 (3.7%) 1494.56 (2.5%) 0.6% ( -5% - 7%) 0.559
AndHighHigh 80.42 (2.8%) 81.06 (1.7%) 0.8% ( -3% - 5%) 0.273
HighSloppyPhrase 50.68 (4.0%) 51.14 (4.7%) 0.9% ( -7% - 9%) 0.506
MedSloppyPhrase 40.76 (2.6%) 41.13 (3.6%) 0.9% ( -5% - 7%) 0.356
Wildcard 123.13 (7.3%) 124.34 (6.5%) 1.0% ( -11% - 15%) 0.654
AndHighHighDayTaxoFacets 17.77 (2.8%) 17.95 (2.7%) 1.0% ( -4% - 6%) 0.256
MedTermDayTaxoFacets 46.83 (2.6%) 47.38 (1.8%) 1.2% ( -3% - 5%) 0.097
HighTermMonthSort 193.35 (1.5%) 195.77 (5.4%) 1.2% ( -5% - 8%) 0.320
IntNRQ 69.13 (17.2%) 70.81 (16.2%) 2.4% ( -26% - 43%) 0.646
H
[GitHub] [lucene] rmuir commented on a diff in pull request #817: improve spotless error to suggest running 'gradlew tidy'
rmuir commented on code in PR #817:
URL: https://github.com/apache/lucene/pull/817#discussion_r852889508

## gradle/validation/spotless.gradle: ##
@@ -111,3 +111,9 @@ configure(project(":lucene").subprojects) { prj ->
 v.dependsOn ":checkJdkInternalsExportedToGradle"
 }
 }
+
+gradle.taskGraph.afterTask { Task task, TaskState state ->
+  if (task.name == 'spotlessJavaCheck' && state.failure) {
+throw new GradleException("\n***\n*PLEASE RUN ./gradle tidy!*\n***");

Review Comment:
OOPS
[jira] [Commented] (LUCENE-10521) Tests in windows are failing for the new testAlwaysRefreshDirectoryTaxonomyReader test
[ https://issues.apache.org/jira/browse/LUCENE-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524255#comment-17524255 ]

ASF subversion and git services commented on LUCENE-10521:
--
Commit fb76d0b104ef843790848531cf14707e2059e079 in lucene's branch refs/heads/main from Michael McCandless
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=fb76d0b104e ]

LUCENE-10482, LUCENE-10521: hrmph, put the @Ignore in the right place

> Tests in windows are failing for the new testAlwaysRefreshDirectoryTaxonomyReader test
> --
>
> Key: LUCENE-10521
> URL: https://issues.apache.org/jira/browse/LUCENE-10521
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/facet
> Environment: Windows 10
> Reporter: Gautam Worah
> Priority: Minor
>
> Build: [https://jenkins.thetaphi.de/job/Lucene-main-Windows/10725/] is failing.
>
> Specifically, the loop which checks if any files still remain to be deleted is not ending.
> We have added an exception to the main test class to not run the test on WindowsFS (not sure if this is related).
>
> ```
> SEVERE: 1 thread leaked from SUITE scope at org.apache.lucene.facet.taxonomy.directory.TestAlwaysRefreshDirectoryTaxonomyReader:
> 1) Thread[id=19, name=TEST-TestAlwaysRefreshDirectoryTaxonomyReader.testAlwaysRefreshDirectoryTaxonomyReader-seed#[F46E42CB7F2B6959], state=RUNNABLE, group=TGRP-TestAlwaysRefreshDirectoryTaxonomyReader]
> at java.base@18/sun.nio.fs.WindowsNativeDispatcher.GetFileAttributesEx0(Native Method)
> at java.base@18/sun.nio.fs.WindowsNativeDispatcher.GetFileAttributesEx(WindowsNativeDispatcher.java:390)
> at java.base@18/sun.nio.fs.WindowsFileAttributes.get(WindowsFileAttributes.java:307)
> at java.base@18/sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:251)
> at java.base@18/sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105)
> at app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
> at app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
> at app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
> at app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
> at app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130)
> at java.base@18/java.nio.file.Files.delete(Files.java:1152)
> at app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.privateDeleteFile(FSDirectory.java:344)
> at app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.deletePendingFiles(FSDirectory.java:325)
> at app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.getPendingDeletions(FSDirectory.java:410)
> at app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FilterDirectory.getPendingDeletions(FilterDirectory.java:121)
> at app//org.apache.lucene.facet.taxonomy.directory.TestAlwaysRefreshDirectoryTaxonomyReader.testAlwaysRefreshDirectoryTaxonomyReader(TestAlwaysRefreshDirectoryTaxonomyReader.java:97)
> ```
[jira] [Commented] (LUCENE-10482) Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide
[ https://issues.apache.org/jira/browse/LUCENE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524254#comment-17524254 ]

ASF subversion and git services commented on LUCENE-10482:
--
Commit fb76d0b104ef843790848531cf14707e2059e079 in lucene's branch refs/heads/main from Michael McCandless
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=fb76d0b104e ]

LUCENE-10482, LUCENE-10521: hrmph, put the @Ignore in the right place

> Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide
> --
>
> Key: LUCENE-10482
> URL: https://issues.apache.org/jira/browse/LUCENE-10482
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Affects Versions: 9.1
> Reporter: Gautam Worah
> Priority: Minor
> Time Spent: 8.5h
> Remaining Estimate: 0h
>
> I was experimenting with the taxonomy index and {{DirectoryTaxonomyReaders}} in my day job where we were trying to replace the index underneath a reader asynchronously and then call the {{doOpenIfChanged}} call on it.
> It turns out that the taxonomy index uses its own index based counter (the {{taxonomyIndexEpoch}}) to determine if the index was opened in write mode after the last time it was written and if not, it directly tries to reuse the previous {{taxoArrays}} it had created. This logic fails in a scenario where both the old and new index were opened just once but the index itself is completely different in both the cases.
> In such a case, it would be good to give the user the flexibility to inform the DTR to recreate its {{taxoArrays}}, {{ordinalCache}} and {{categoryCache}} (not refreshing these arrays causes it to fail in various ways). Luckily, such a constructor already exists! But it is private today! The idea here is to allow subclasses of DTR to use this constructor.
> Curious to see what other folks think about this idea.
[jira] [Commented] (LUCENE-10482) Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide
[ https://issues.apache.org/jira/browse/LUCENE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524256#comment-17524256 ]

ASF subversion and git services commented on LUCENE-10482:
--
Commit 2fa3a36899f4560ffb593449d6778307aa232e35 in lucene's branch refs/heads/branch_9x from Michael McCandless
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=2fa3a36899f ]

LUCENE-10482, LUCENE-10521: hrmph, put the @Ignore in the right place
[jira] [Commented] (LUCENE-10521) Tests in windows are failing for the new testAlwaysRefreshDirectoryTaxonomyReader test
[ https://issues.apache.org/jira/browse/LUCENE-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524257#comment-17524257 ]

ASF subversion and git services commented on LUCENE-10521:
--
Commit 2fa3a36899f4560ffb593449d6778307aa232e35 in lucene's branch refs/heads/branch_9x from Michael McCandless
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=2fa3a36899f ]

LUCENE-10482, LUCENE-10521: hrmph, put the @Ignore in the right place
[jira] [Commented] (LUCENE-10521) Tests in windows are failing for the new testAlwaysRefreshDirectoryTaxonomyReader test
[ https://issues.apache.org/jira/browse/LUCENE-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524258#comment-17524258 ]

Michael McCandless commented on LUCENE-10521:
-
Phew! This time I put the {{@Ignore}} in the right place, I think :) I confirmed now when I run that one test case, it indeed says skipped.

Hopefully [~uschindler]'s awesome Windows Jenkins builds are OK again. Sorry for all the flailing :)
[jira] [Commented] (LUCENE-10517) Improve performance of SortedSetDV faceting by iterating on class types
[ https://issues.apache.org/jira/browse/LUCENE-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524262#comment-17524262 ]

Michael McCandless commented on LUCENE-10517:
-
This is a very impressive performance jump for the "pure browse" faceting case! I'll review the PR soon. Thanks [~ChrisHegarty]!

> Improve performance of SortedSetDV faceting by iterating on class types
> ---
>
> Key: LUCENE-10517
> URL: https://issues.apache.org/jira/browse/LUCENE-10517
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Affects Versions: 9.1
> Reporter: Chris Hegarty
> Priority: Minor
> Time Spent: 40m
> Remaining Estimate: 0h
>
> While analysing various profiles, [@grcevski|https://github.com/grcevski] and I came across this potential improvement.
> SortedSetDV faceting (and friends) can improve performance within tight loops by using invokevirtual (rather than invokeinterface). The C2 JIT compiler can produce slightly more optimal code in this case, and since these loops are very hot, the impact can be significant (in the order of 10-30%).
> This issue is in some ways similar to, and builds upon, prior optimisations in this area, like say LUCENE-5300 or more recently LUCENE-5309
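The "iterating on class types" idea in the issue description can be sketched roughly as follows. This is an illustration only, not the code from the PR: `OrdSource`, `ArrayOrdSource` and `OrdCounter` are hypothetical stand-ins for the doc-values types. The hot loop is duplicated so that, when the runtime type is a known concrete class, the call site is typed against that class (javac emits `invokevirtual`) instead of against the interface (`invokeinterface`). Whether this is measurably faster depends on the JIT and the workload; the sketch only demonstrates the code shape.

```java
interface OrdSource {
    int NO_MORE_ORDS = -1;

    /** Returns the next ordinal, or NO_MORE_ORDS when exhausted. */
    int nextOrd();
}

final class ArrayOrdSource implements OrdSource {
    private final int[] ords;
    private int pos;

    ArrayOrdSource(int[] ords) {
        this.ords = ords;
    }

    @Override
    public int nextOrd() {
        return pos < ords.length ? ords[pos++] : NO_MORE_ORDS;
    }
}

final class OrdCounter {
    /** Counts how often each ordinal in [0, numOrds) occurs in the source. */
    static int[] count(OrdSource source, int numOrds) {
        int[] counts = new int[numOrds];
        if (source instanceof ArrayOrdSource) {
            // Static type is a concrete final class: the call compiles to invokevirtual.
            ArrayOrdSource concrete = (ArrayOrdSource) source;
            for (int ord = concrete.nextOrd(); ord != OrdSource.NO_MORE_ORDS; ord = concrete.nextOrd()) {
                counts[ord]++;
            }
        } else {
            // Fallback: static type is the interface, so the call compiles to invokeinterface.
            for (int ord = source.nextOrd(); ord != OrdSource.NO_MORE_ORDS; ord = source.nextOrd()) {
                counts[ord]++;
            }
        }
        return counts;
    }
}
```

Both branches compute the same result; the duplication exists only to give the JIT a call site with a concrete receiver type in the common case.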
[GitHub] [lucene] ssigut commented on pull request #439: LUCENE-8739: custom codec providing Zstandard compression/decompression
ssigut commented on PR #439:
URL: https://github.com/apache/lucene/pull/439#issuecomment-1102537548

Is this PR going to be merged? What release is this planned for?
[GitHub] [lucene] rmuir opened a new pull request, #818: Fix incorrect docs in README.md: it must be java 17 exactly, java 18 does not work
rmuir opened a new pull request, #818:
URL: https://github.com/apache/lucene/pull/818

These instructions tell the user to install 17 (or greater), then run `./gradlew`. This will not actually work if they install something greater than java 17.
[jira] [Commented] (LUCENE-10517) Improve performance of SortedSetDV faceting by iterating on class types
[ https://issues.apache.org/jira/browse/LUCENE-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524277#comment-17524277 ]

Adrien Grand commented on LUCENE-10517:
---
Very impressive indeed. This makes me wonder if {{DefaultBulkScorer#scoreAll}} would benefit from a similar change. This is the function that iterates over all the matches of a query to pass them to the collector.
[GitHub] [lucene] jpountz commented on pull request #439: LUCENE-8739: custom codec providing Zstandard compression/decompression
jpountz commented on PR #439:
URL: https://github.com/apache/lucene/pull/439#issuecomment-1102557832

See the discussion on [LUCENE-8739](https://issues.apache.org/jira/browse/LUCENE-8739); this PR is unlikely to get merged.
[GitHub] [lucene] rmuir opened a new pull request, #819: fail clearly on too-new JDK
rmuir opened a new pull request, #819:
URL: https://github.com/apache/lucene/pull/819

Gradle will give a very confusing error, let's make it absolutely clear.
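As a rough illustration of failing clearly on an unsupported JDK, here is a minimal Java sketch. It is not the Gradle change from PR #819: `JdkVersionGuard` and `describeProblem` are hypothetical names, and the bounds of 17 are an assumption taken from the title of PR #818 ("it must be java 17 exactly"). The only platform API used is `Runtime.version().feature()`, available since Java 10.

```java
final class JdkVersionGuard {
    /** Returns null if the feature version is supported, otherwise an explicit message. */
    static String describeProblem(int feature, int min, int max) {
        if (feature < min) {
            return "JDK " + feature + " is too old; JDK " + min + " is required.";
        }
        if (feature > max) {
            return "JDK " + feature + " is too new; this build supports up to JDK " + max + ".";
        }
        return null;
    }

    public static void main(String[] args) {
        // Feature release number of the running JVM, e.g. 17 for JDK 17.0.2.
        int feature = Runtime.version().feature();
        // Assumed bounds for illustration: exactly JDK 17.
        String problem = describeProblem(feature, 17, 17);
        if (problem != null) {
            System.err.println(problem);
            System.exit(1);
        }
        System.out.println("JDK " + feature + " is supported.");
    }
}
```

Keeping the check in a pure function makes the message easy to test without depending on which JDK runs the test.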
[GitHub] [lucene] rmuir commented on pull request #818: Fix incorrect docs in README.md: it must be java 17 exactly, java 18 does not work
rmuir commented on PR #818: URL: https://github.com/apache/lucene/pull/818#issuecomment-1102585523 > I wish we could fix `./gradlew` to detect you are using an unsupported JDK version and say so (exit with error/exception with a clear message). https://github.com/apache/lucene/pull/819
[GitHub] [lucene] mocobeta commented on pull request #808: LUCENE-10513: Run `gradlew tidy` first
mocobeta commented on PR #808: URL: https://github.com/apache/lucene/pull/808#issuecomment-1102598934 oh, I learned that wikipedia has an article on Shoshin (初心) for the first time. It's a common noun in Japanese (and also in Chinese I think) so there are no corresponding articles in those languages; I can't really explain it, but it's interesting to me the word is capitalized like proper nouns...
[GitHub] [lucene] cpoerschke opened a new pull request, #820: Remove outdated comment in UnifiedHighlighter.get(Formatter|Scorer) javadoc.
cpoerschke opened a new pull request, #820: URL: https://github.com/apache/lucene/pull/820 No JIRA ticket required for this change, in my opinion.
[GitHub] [lucene] mikemccand commented on pull request #808: LUCENE-10513: Run `gradlew tidy` first
mikemccand commented on PR #808: URL: https://github.com/apache/lucene/pull/808#issuecomment-1102642347 > oh, I learned that wikipedia has an article on Shoshin (初心) for the first time. It's a common noun in Japanese (and also in Chinese I think) so there are no corresponding articles in those languages; I can't really explain it, but it's interesting to me the word is capitalized like proper nouns... Oh thanks for explaining @mocobeta! I have been capitalizing it ever since I learned it but I will try to stop. Maybe I can just use 初心 going forwards. Thanks! shoshin (初心) also reminds me of this helpful graph (from WaitButWhy's [The Thinking Ladder](https://waitbutwhy.com/2019/09/thinking-ladder.html)).
[GitHub] [lucene] jpountz merged pull request #794: LUCENE-10153: Improve accuracy of scaled scores in WANDScorer.
jpountz merged PR #794: URL: https://github.com/apache/lucene/pull/794
[jira] [Commented] (LUCENE-10153) More speedups for operations on byte[] via VarHandles
[ https://issues.apache.org/jira/browse/LUCENE-10153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524302#comment-17524302 ] ASF subversion and git services commented on LUCENE-10153: -- Commit d9e37f31230f595dce668e86a2f151d3aa4c4176 in lucene's branch refs/heads/main from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d9e37f31230 ] LUCENE-10153: Improve accuracy of scaled scores in WANDScorer. (#794) > More speedups for operations on byte[] via VarHandles > - > > Key: LUCENE-10153 > URL: https://issues.apache.org/jira/browse/LUCENE-10153 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 9.0 > > Time Spent: 2h > Remaining Estimate: 0h > > LUCENE-10145 leveraged VarHandles to speed up unsigned comparisons of byte[4] > or byte[8]. But we could do more, such as speeding up the computation of > common prefix lengths.
[jira] [Created] (LUCENE-10523) facilitate UnifiedHighlighter extension w.r.t. FieldHighlighter
Christine Poerschke created LUCENE-10523: Summary: facilitate UnifiedHighlighter extension w.r.t. FieldHighlighter Key: LUCENE-10523 URL: https://issues.apache.org/jira/browse/LUCENE-10523 Project: Lucene - Core Issue Type: Wish Reporter: Christine Poerschke Assignee: Christine Poerschke If the {{UnifiedHighlighter}} had a protected {{newFieldHighlighter}} method then less {{getFieldHighlighter}} code would need to be duplicated if one wanted to use a custom {{FieldHighlighter}}. Proposed change: pull-request-link-to-follow A possible usage scenario: * e.g. via Solr's {{HTMLStripFieldUpdateProcessorFactory}} any HTML markup could be stripped at document ingestion time but this may not suit all use cases * e.g. via Solr's {{hl.encoder=html}} parameter any HTML markup could be escaped at document search time when returning highlighting snippets but this may not suit all use cases * extension illustration: link-to-follow ** i.e. at document search time remove any HTML markup prior to highlight snippet extraction
[GitHub] [lucene] cpoerschke opened a new pull request, #821: LUCENE-10523: factor out UnifiedHighlighter.newFieldHighlighter() method
cpoerschke opened a new pull request, #821: URL: https://github.com/apache/lucene/pull/821 https://issues.apache.org/jira/browse/LUCENE-10523
[jira] [Commented] (LUCENE-10153) More speedups for operations on byte[] via VarHandles
[ https://issues.apache.org/jira/browse/LUCENE-10153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524304#comment-17524304 ] ASF subversion and git services commented on LUCENE-10153: -- Commit db0e712cad3a23e58a66c7c3aa1a9a8b0e217823 in lucene's branch refs/heads/branch_9x from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=db0e712cad3 ] LUCENE-10153: Improve accuracy of scaled scores in WANDScorer. (#794)
[jira] [Updated] (LUCENE-10523) facilitate UnifiedHighlighter extension w.r.t. FieldHighlighter
[ https://issues.apache.org/jira/browse/LUCENE-10523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christine Poerschke updated LUCENE-10523: - Description: If the {{UnifiedHighlighter}} had a protected {{newFieldHighlighter}} method then less {{getFieldHighlighter}} code would need to be duplicated if one wanted to use a custom {{FieldHighlighter}}. Proposed change: https://github.com/apache/lucene/pull/821 A possible usage scenario: * e.g. via Solr's {{HTMLStripFieldUpdateProcessorFactory}} any HTML markup could be stripped at document ingestion time but this may not suit all use cases * e.g. via Solr's {{hl.encoder=html}} parameter any HTML markup could be escaped at document search time when returning highlighting snippets but this may not suit all use cases * extension illustration: https://github.com/apache/solr/pull/811 ** i.e. at document search time remove any HTML markup prior to highlight snippet extraction was: If the {{UnifiedHighlighter}} had a protected {{newFieldHighlighter}} method then less {{getFieldHighlighter}} code would need to be duplicated if one wanted to use a custom {{FieldHighlighter}}. Proposed change: pull-request-link-to-follow A possible usage scenario: * e.g. via Solr's {{HTMLStripFieldUpdateProcessorFactory}} any HTML markup could be stripped at document ingestion time but this may not suit all use cases * e.g. via Solr's {{hl.encoder=html}} parameter any HTML markup could be escaped at document search time when returning highlighting snippets but this may not suit all use cases * extension illustration: link-to-follow ** i.e. at document search time remove any HTML markup prior to highlight snippet extraction
[GitHub] [lucene] jpountz merged pull request #799: LUCENE-10506: change visibility of ProfilerCollector#deriveCollectorName to protected
jpountz merged PR #799: URL: https://github.com/apache/lucene/pull/799
[jira] [Commented] (LUCENE-10506) ProfilerCollector to support customizing how name is derived
[ https://issues.apache.org/jira/browse/LUCENE-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524310#comment-17524310 ] ASF subversion and git services commented on LUCENE-10506: -- Commit 972663cc1de4c273df99e3ed9dcf7a5c0d44065a in lucene's branch refs/heads/branch_9x from Luca Cavanna [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=972663cc1de ] LUCENE-10506: change visibility of ProfilerCollector#deriveCollectorName to protected (#799) This allows subclasses to extend how the inner collector name is derived. > ProfilerCollector to support customizing how name is derived > > > Key: LUCENE-10506 > URL: https://issues.apache.org/jira/browse/LUCENE-10506 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/sandbox >Reporter: Luca Cavanna >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > ProfilerCollector (part of the sandbox) has a private method called > deriveCollectorName that extracts the class simple name from the provided > collector and sets it as the name of the collector which becomes part of the > profile results later. > While the default behaviour is reasonable, there are cases where it would be > useful to extend this logic, and perhaps not use class names, or enhance that > with more context that the collectors could provide. This could be achieved > by making the deriveCollectorName method protected.
[jira] [Commented] (LUCENE-10506) ProfilerCollector to support customizing how name is derived
[ https://issues.apache.org/jira/browse/LUCENE-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524309#comment-17524309 ] ASF subversion and git services commented on LUCENE-10506: -- Commit 866bb86a1c97590a4f42934afe05a78f66f10c92 in lucene's branch refs/heads/main from Luca Cavanna [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=866bb86a1c9 ] LUCENE-10506: change visibility of ProfilerCollector#deriveCollectorName to protected (#799) This allows subclasses to extend how the inner collector name is derived.
[jira] [Resolved] (LUCENE-10506) ProfilerCollector to support customizing how name is derived
[ https://issues.apache.org/jira/browse/LUCENE-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-10506. --- Fix Version/s: 9.2 Resolution: Fixed
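The extension point described in LUCENE-10506 can be illustrated with a small stand-alone sketch. The classes below (`ProfilingWrapper`, `ContextualWrapper`, `deriveName`) are hypothetical stand-ins written for this illustration; they are not Lucene's `ProfilerCollector` and do not reproduce its constructor or fields. The sketch only shows the pattern: a protected name-derivation hook that defaults to the wrapped object's simple class name and that a subclass can override to add context.

```java
public class DeriveNameSketch {
  /** Hypothetical stand-in for a profiling wrapper; not Lucene's ProfilerCollector. */
  static class ProfilingWrapper {
    private final String name;

    ProfilingWrapper(Object wrapped) {
      // Calling an overridable method from a constructor is safe here only
      // because the overrides in this sketch do not touch subclass state.
      this.name = deriveName(wrapped);
    }

    /** Default behaviour: use the simple class name of the wrapped object. */
    protected String deriveName(Object wrapped) {
      return wrapped.getClass().getSimpleName();
    }

    String name() {
      return name;
    }
  }

  /** Subclass that enhances the derived name with extra (made-up) context. */
  static class ContextualWrapper extends ProfilingWrapper {
    ContextualWrapper(Object wrapped) {
      super(wrapped);
    }

    @Override
    protected String deriveName(Object wrapped) {
      return "custom:" + super.deriveName(wrapped);
    }
  }

  public static void main(String[] args) {
    System.out.println(new ProfilingWrapper(new StringBuilder()).name());
    System.out.println(new ContextualWrapper(new StringBuilder()).name());
  }
}
```

With a private hook, the subclass above would have to duplicate the wrapper; making the hook protected is what allows the override.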
[jira] [Resolved] (LUCENE-10503) Preserve more significant bits of scores in WANDScorer
[ https://issues.apache.org/jira/browse/LUCENE-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-10503. --- Fix Version/s: 9.2 Resolution: Fixed I mixed up the JIRA number in the commit message and the notification went to LUCENE-10153. > Preserve more significant bits of scores in WANDScorer > -- > > Key: LUCENE-10503 > URL: https://issues.apache.org/jira/browse/LUCENE-10503 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 9.2 > > > WANDScorer operates on longs to avoid accuracy issues with floating-point > numbers. The current process loses more accuracy bits than it could, and > making it better could help skip in a few more situations.
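To make "operates on longs" in the LUCENE-10503 description concrete, here is a rough sketch of one way to scale float scores to longs. It is an assumption-laden illustration, not the code in `WANDScorer`: the names `scalingFactor`, `scaleUp` and `scaleDown` and the choice of 24 significant bits are made up for this example. The idea shown is to pick a power-of-two factor from the exponent of the largest score, round upper bounds up and lower bounds down, so that comparisons on the scaled longs never wrongly skip a competitive hit.

```java
public class ScaledScoreSketch {
  /** Assumed number of significant bits to keep (a float mantissa has 24). */
  static final int BITS = 24;

  /**
   * Power-of-two exponent such that maxScore * 2^factor lies in [2^(BITS-1), 2^BITS).
   * Assumes maxScore is positive, finite and not subnormal.
   */
  static int scalingFactor(float maxScore) {
    return BITS - 1 - Math.getExponent(maxScore);
  }

  /** Scales an upper bound (e.g. a maximum score), rounding up. */
  static long scaleUp(float score, int factor) {
    return (long) Math.ceil(Math.scalb((double) score, factor));
  }

  /** Scales a lower bound (e.g. a minimum competitive score), rounding down. */
  static long scaleDown(float score, int factor) {
    return (long) Math.floor(Math.scalb((double) score, factor));
  }

  public static void main(String[] args) {
    int factor = scalingFactor(3.0f);
    System.out.println(factor);
    System.out.println(scaleUp(3.0f, factor));
    System.out.println(scaleUp(0.1f, factor) + " >= " + scaleDown(0.1f, factor));
  }
}
```

Rounding in opposite directions is the conservative choice: a scaled maximum is never smaller than the true value, and a scaled minimum is never larger.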
[GitHub] [lucene] mocobeta commented on pull request #808: LUCENE-10513: Run `gradlew tidy` first
mocobeta commented on PR #808: URL: https://github.com/apache/lucene/pull/808#issuecomment-1102669627 I didn't know the figure, it's very simple and helpful, thanks! I'll read the article next holidays. > I have been capitalizing it ever since I learned it but I will try to stop. It looks like the word Shoshin may already have a special or cultural meaning in English, so capitalization may be needed to express the nuance that the original Japanese word doesn't have? :)
[jira] [Commented] (LUCENE-10153) More speedups for operations on byte[] via VarHandles
[ https://issues.apache.org/jira/browse/LUCENE-10153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524314#comment-17524314 ] Adrien Grand commented on LUCENE-10153: --- Sorry for the noise, I pushed a commit that had the wrong JIRA number attached to it.
[jira] [Commented] (LUCENE-10503) Preserve more significant bits of scores in WANDScorer
[ https://issues.apache.org/jira/browse/LUCENE-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524317#comment-17524317 ] ASF subversion and git services commented on LUCENE-10503: -- Commit 15ecf3c27f97a109e53f9bdcccb0db34c3a30379 in lucene's branch refs/heads/main from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=15ecf3c27f9 ] LUCENE-10503: Fix JIRA number in CHANGES.
[jira] [Commented] (LUCENE-10503) Preserve more significant bits of scores in WANDScorer
[ https://issues.apache.org/jira/browse/LUCENE-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524316#comment-17524316 ] ASF subversion and git services commented on LUCENE-10503: -- Commit 241406123384a81c230dfb51b2225aa329823196 in lucene's branch refs/heads/branch_9x from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=24140612338 ] LUCENE-10503: Fix JIRA number in CHANGES.
[jira] [Assigned] (LUCENE-9848) Correctly sort HNSW graph neighbors when applying diversity criterion
[ https://issues.apache.org/jira/browse/LUCENE-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayya Sharipova reassigned LUCENE-9848:
---
Assignee: Mayya Sharipova

> Correctly sort HNSW graph neighbors when applying diversity criterion
> --
>
> Key: LUCENE-9848
> URL: https://issues.apache.org/jira/browse/LUCENE-9848
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael Sokolov
> Assignee: Mayya Sharipova
> Priority: Major
>
> When indexing a new document in an HNSW graph, we first find its nearest maxConn neighbors (using HNSW search), and then link the new document to these neighbors in the graph. These neighbors are filtered using a diversity test. The neighbors are added one by one, from most similar to least. Each new neighbor is checked against all prior (better) neighbors, and if it is more similar to that neighbor than it is to the target document, it is rejected as insufficiently diverse.
>
> When we applied this diversity criterion (rather than simply picking the k nearest neighbors), we saw substantial improvements in recall / latency ROC curves across several data sets, and it is part of the reference implementation, too (where we got it). I believe the impact on indexing performance was relatively small; this is a good thing to do: even though it is n^2 at its heart, n remains reasonable because it is bounded by the maximum graph fanout parameter, {{maxConn}}.
>
> Something funny happens when we reach the maximum fanout though. While a new document is being linked to its new neighbors, the neighbors are reciprocally linked to the new document, until their maximum fanout is reached. At that point, the diversity criterion is reapplied to select the neighbors to keep. Basically every neighbor is re-checked against every earlier (better) neighbor to verify the diversity criterion. This is needed because we haven't really maintained the diversity property while adding these reciprocal links: the initial neighbors are checked for diversity, which often leads to fewer than {{maxConn}} of them being added. Then the new documents get linked in without checking, until {{maxConn}} is reached, and then diversity is checked again. This is kind of weird, but seems to work.
>
> But the really strange thing is that when we reject non-diverse documents (in HnswGraphBuilder.diversityUpdate), the neighbors are no longer sorted in nearness order. I did some rough checks to see if better graphs would result from re-sorting (so that when there are non-diverse neighbors, we always prefer to drop the worse-scoring one), but it didn't seem to matter all that much. But how can that be?
>
> At any rate this code is funky and hard to understand, and it would probably benefit from a second look to see if we can either improve indexing performance or improve search performance (by producing better graphs during indexing).
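The diversity test described in LUCENE-9848 can be sketched in a few lines. This is a simplified illustration and not `HnswGraphBuilder`: `selectDiverse` and `similarity` are hypothetical names, vectors are plain `double[]` arrays, and similarity is assumed to be negative squared Euclidean distance. Candidates are visited from most to least similar to the target, and a candidate is rejected when it is more similar to an already kept neighbor than it is to the target.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DiversitySketch {
  /** Assumed similarity: negative squared Euclidean distance (higher is more similar). */
  static double similarity(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return -sum;
  }

  /** Keeps at most maxConn candidates that pass the diversity test, best first. */
  static List<double[]> selectDiverse(double[] target, List<double[]> candidates, int maxConn) {
    List<double[]> sorted = new ArrayList<>(candidates);
    // Most similar to the target first.
    sorted.sort(Comparator.comparingDouble((double[] c) -> -similarity(target, c)));
    List<double[]> kept = new ArrayList<>();
    for (double[] candidate : sorted) {
      if (kept.size() >= maxConn) {
        break;
      }
      double toTarget = similarity(target, candidate);
      boolean diverse = true;
      for (double[] neighbor : kept) {
        // Reject if closer to an earlier (better) neighbor than to the target.
        if (similarity(candidate, neighbor) > toTarget) {
          diverse = false;
          break;
        }
      }
      if (diverse) {
        kept.add(candidate);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    double[] target = {0, 0};
    List<double[]> candidates =
        List.of(new double[] {1, 0}, new double[] {1.1, 0.1}, new double[] {0, 2}, new double[] {-3, 0});
    // The second candidate sits next to the first one and is dropped as non-diverse.
    System.out.println(selectDiverse(target, candidates, 3).size());
  }
}
```

The inner loop is what makes the check quadratic; as the description notes, the cost stays reasonable because the kept list never grows beyond `maxConn`.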
[jira] [Commented] (LUCENE-10518) FieldInfos consistency check can refuse to open Lucene 8 index
[ https://issues.apache.org/jira/browse/LUCENE-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524408#comment-17524408 ]

Adrien Grand commented on LUCENE-10518:
---
I'm unsure of the value of the consistency checks on 8.x indices. My gut feeling is that either users have created indices with consistent fields until now, and they'll keep their indices consistent after 9.0, or they have created indices with inconsistent fields and this consistency check is making upgrades harder without helping much. I wonder if we should just disable the check on 8.x indices based on the index created version?

> FieldInfos consistency check can refuse to open Lucene 8 index
> --
>
> Key: LUCENE-10518
> URL: https://issues.apache.org/jira/browse/LUCENE-10518
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 8.10.1
> Reporter: Nhat Nguyen
> Priority: Major
>
> A field-infos consistency check introduced in Lucene 9 (LUCENE-9334) can refuse to open a Lucene 8 index. Lucene 8 can create a partial FieldInfo if hitting a non-aborting exception (for example [term is too long|https://github.com/apache/lucene-solr/blob/6a6484ba396927727b16e5061384d3cd80d616b2/lucene/core/src/java/org/apache/lucene/index/DefaultIndexingChain.java#L944]) during processing fields of a document. We don't have this problem in Lucene 9 as we process fields in two phases with the [first phase|https://github.com/apache/lucene/blob/10ebc099c846c7d96f4ff5f9b7853df850fa8442/lucene/core/src/java/org/apache/lucene/index/IndexingChain.java#L589-L614] processing only FieldInfos.
>
> The issue can be reproduced with this snippet.
> {code:java}
> public void testWriteIndexOn8x() throws Exception {
>   FieldType KeywordField = new FieldType();
>   KeywordField.setTokenized(false);
>   KeywordField.setOmitNorms(true);
>   KeywordField.setIndexOptions(IndexOptions.DOCS);
>   KeywordField.freeze();
>   try (Directory dir = newDirectory()) {
>     IndexWriterConfig config = new IndexWriterConfig();
>     config.setCommitOnClose(false);
>     config.setMergePolicy(NoMergePolicy.INSTANCE);
>     try (IndexWriter writer = new IndexWriter(dir, config)) {
>       // first segment
>       writer.addDocument(new Document()); // an empty doc
>       Document d1 = new Document();
>       byte[] chars = new byte[IndexWriter.MAX_STORED_STRING_LENGTH + 1];
>       Arrays.fill(chars, (byte) 'a');
>       d1.add(new Field("field", new BytesRef(chars), KeywordField));
>       d1.add(new BinaryDocValuesField("field", new BytesRef(chars)));
>       expectThrows(IllegalArgumentException.class, () -> writer.addDocument(d1));
>       writer.flush();
>       // second segment
>       Document d2 = new Document();
>       d2.add(new Field("field", new BytesRef("hello world"), KeywordField));
>       d2.add(new SortedDocValuesField("field", new BytesRef("hello world")));
>       writer.addDocument(d2);
>       writer.flush();
>       writer.commit();
>       // Check for doc values types consistency
>       Map<String, DocValuesType> docValuesTypes = new HashMap<>();
>       try (DirectoryReader reader = DirectoryReader.open(dir)) {
>         for (LeafReaderContext leaf : reader.leaves()) {
>           for (FieldInfo fi : leaf.reader().getFieldInfos()) {
>             DocValuesType current = docValuesTypes.putIfAbsent(fi.name, fi.getDocValuesType());
>             if (current != null && current != fi.getDocValuesType()) {
>               fail("cannot change DocValues type from " + current + " to " + fi.getDocValuesType() + " for field \"" + fi.name + "\"");
>             }
>           }
>         }
>       }
>     }
>   }
> }
> {code}
> I would like to propose to:
> - Backport the two-phase fields processing from Lucene9 to Lucene8. The patch should be small and contained.
> - Introduce an option in Lucene9 to skip checking field-infos consistency (i.e., behave like Lucene 8 when the option is enabled).
>
> /cc [~mayya] and [~jpountz]
[jira] [Commented] (LUCENE-10518) FieldInfos consistency check can refuse to open Lucene 8 index
[ https://issues.apache.org/jira/browse/LUCENE-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524409#comment-17524409 ] Nhat Nguyen commented on LUCENE-10518: -- +1 to disable consistency checks for 8.x indices. [~mayya] WDYT?
[jira] [Commented] (LUCENE-10521) Tests in windows are failing for the new testAlwaysRefreshDirectoryTaxonomyReader test
[ https://issues.apache.org/jira/browse/LUCENE-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524471#comment-17524471 ] Gautam Worah commented on LUCENE-10521: --- Tests are passing now. Latest main build: https://jenkins.thetaphi.de/job/Lucene-main-Windows/10728/ > Tests in windows are failing for the new > testAlwaysRefreshDirectoryTaxonomyReader test > -- > > Key: LUCENE-10521 > URL: https://issues.apache.org/jira/browse/LUCENE-10521 > Project: Lucene - Core > Issue Type: Bug > Components: modules/facet > Environment: Windows 10 >Reporter: Gautam Worah >Priority: Minor > > Build: [https://jenkins.thetaphi.de/job/Lucene-main-Windows/10725/] is > failing. > > Specifically, the loop which checks if any files still remain to be deleted > is not ending. > We have added an exception to the main test class to not run the test on > WindowsFS (not sure if this is related). > > ``` > SEVERE: 1 thread leaked from SUITE scope at > org.apache.lucene.facet.taxonomy.directory.TestAlwaysRefreshDirectoryTaxonomyReader: > 1) Thread[id=19, > name=TEST-TestAlwaysRefreshDirectoryTaxonomyReader.testAlwaysRefreshDirectoryTaxonomyReader-seed#[F46E42CB7F2B6959], > state=RUNNABLE, group=TGRP-TestAlwaysRefreshDirectoryTaxonomyReader] at > java.base@18/sun.nio.fs.WindowsNativeDispatcher.GetFileAttributesEx0(Native > Method) at > java.base@18/sun.nio.fs.WindowsNativeDispatcher.GetFileAttributesEx(WindowsNativeDispatcher.java:390) > at > java.base@18/sun.nio.fs.WindowsFileAttributes.get(WindowsFileAttributes.java:307) > at > java.base@18/sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:251) > at > java.base@18/sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105) > at > app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130) > at > 
app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130) > at > app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130) > at > app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130) > at > app/org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.mockfile.FilterFileSystemProvider.delete(FilterFileSystemProvider.java:130) > at java.base@18/java.nio.file.Files.delete(Files.java:1152) at > app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.privateDeleteFile(FSDirectory.java:344) > at > app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.deletePendingFiles(FSDirectory.java:325) > at > app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FSDirectory.getPendingDeletions(FSDirectory.java:410) > at > app/org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.store.FilterDirectory.getPendingDeletions(FilterDirectory.java:121) > at > app//org.apache.lucene.facet.taxonomy.directory.TestAlwaysRefreshDirectoryTaxonomyReader.testAlwaysRefreshDirectoryTaxonomyReader(TestAlwaysRefreshDirectoryTaxonomyReader.java:97) > ``` -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on pull request #819: fail clearly on too-new JDK
dweiss commented on PR #819: URL: https://github.com/apache/lucene/pull/819#issuecomment-1102943639 Yep. I wonder if we could do it in one script (the downloader?) to avoid running java so many times but overall I think it's better than before. :) This also reminds me that the various java versions (minimum required for Lucene, minimum required for gradle) are scattered around in oh-so-many places. I wonder how we could somehow centralize this information. I don't have any clean ideas though.
[GitHub] [lucene] dweiss commented on pull request #818: Fix incorrect docs in README.md: it must be java 17 exactly, java 18 does not work
dweiss commented on PR #818: URL: https://github.com/apache/lucene/pull/818#issuecomment-1102946084
> I wish we could fix ./gradlew to detect you are using an unsupported JDK version and say so (exit with error/exception with a clear message).
I really don't understand why gradle doesn't just check it up front... Maybe they hope things will just run with future versions out of the box (I don't see how it's possible, given all the bytecode manipulation magic, eh).
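The up-front check wished for in the quoted comment could look roughly like the sketch below. It is only an illustration in plain Java: the class name `JdkVersionCheck` and the bounds `MIN_FEATURE`/`MAX_FEATURE` are assumptions made for this example, not code or values taken from the Lucene build or from Gradle.

```java
/** A minimal sketch of a "fail clearly" JDK version check; names and bounds are hypothetical. */
final class JdkVersionCheck {
  // Placeholder bounds for illustration only; not read from any Lucene build file.
  static final int MIN_FEATURE = 17;
  static final int MAX_FEATURE = 17;

  /** Returns null when the feature version is within [min, max], otherwise a readable error. */
  static String check(int feature, int min, int max) {
    if (feature < min) {
      return "Java " + feature + " is too old; this build needs Java " + min + " or newer.";
    }
    if (feature > max) {
      return "Java " + feature + " is too new; this build supports at most Java " + max + ".";
    }
    return null;
  }

  public static void main(String[] args) {
    // Runtime.version().feature() returns the major release number (available since Java 10).
    String error = check(Runtime.version().feature(), MIN_FEATURE, MAX_FEATURE);
    if (error != null) {
      System.err.println(error);
      System.exit(1);
    }
  }
}
```

Keeping the comparison in a pure function (`check`) separates the message wording from the exit handling, so a wrapper script would only need to look at the exit code.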
[GitHub] [lucene] dweiss commented on pull request #817: improve spotless error to suggest running 'gradlew tidy'
dweiss commented on PR #817: URL: https://github.com/apache/lucene/pull/817#issuecomment-1102953991 This attaches to each and every spotless task (and would print a message for all of them). Maybe it'd be better to create a single finalizer task (at the root level), collect all the spotless tasks in the graph and add finalizedBy pointing at that single task (it'd still need to check the status of those tasks it finalizes). Then if you have multiple failures, it'd print the message just once. It could even fail on its own - then the error would be the last one reported... But then, maybe it's overdoing things.
[GitHub] [lucene] dweiss commented on pull request #807: LUCENE-10512: Grammar: Remove incidents of "the the" in comments.
dweiss commented on PR #807: URL: https://github.com/apache/lucene/pull/807#issuecomment-1102961503 I don't know, Mike... Gradle doesn't seem like a tool that you can ever make dead-simple (like ant). I like what Robert added but with hacks like that a question always pops into my mind: what happens in gradle, internally, if you throw an exception from such a block? Think of throwing an exception from java's try-finally that obscures the original cause...
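The try-finally analogy in the comment above can be reproduced in plain Java. The sketch below says nothing about what Gradle does internally; it only shows the Java behaviour being alluded to. The class `FinallyMasking` and its method names are made up for this illustration.

```java
/** Illustrates how an exception thrown from finally hides the original one. Names are hypothetical. */
final class FinallyMasking {
  static void fail(String message) {
    throw new IllegalStateException(message);
  }

  /** The exception thrown inside finally replaces the in-flight one; the original is lost. */
  static void masked() {
    try {
      fail("original failure");
    } finally {
      fail("hint added in finally");
    }
  }

  /** Attaching the hint as a suppressed exception keeps the original cause visible. */
  static void preserved() {
    try {
      fail("original failure");
    } catch (IllegalStateException e) {
      e.addSuppressed(new IllegalStateException("hint added as suppressed"));
      throw e;
    }
  }

  /** Runs the action and returns whatever RuntimeException it threw, or null if none. */
  static RuntimeException capture(Runnable action) {
    try {
      action.run();
      return null;
    } catch (RuntimeException e) {
      return e;
    }
  }

  public static void main(String[] args) {
    // Only the message from the finally block survives here.
    System.out.println(capture(FinallyMasking::masked).getMessage());
    // Here the original message survives and the hint travels along as a suppressed exception.
    RuntimeException kept = capture(FinallyMasking::preserved);
    System.out.println(kept.getMessage() + " / " + kept.getSuppressed()[0].getMessage());
  }
}
```

Whether a Gradle `afterTask` hook behaves like `masked()` or reports both failures is exactly the open question raised in the comment; the sketch does not answer it.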
[GitHub] [lucene] rmuir commented on pull request #819: fail clearly on too-new JDK
rmuir commented on PR #819: URL: https://github.com/apache/lucene/pull/819#issuecomment-1102995607
> Yep. I wonder if we could do it in one script (the downloader?) to avoid running java so many times but overall I think it's better than before. :)
I agree with this (as someone who disables the daemon and runs the commands every time). The only reason I specified it as a different command was due to the fact that if the downloader fails, the error is "trapped" and an additional version-related error (that you need at least java 11) is printed... IMO that's a bit confusing, but I get it.
> This also reminds me that the various java versions (minimum required for Lucene, minimum required for gradle) are scattered around in oh-so-many places. I wonder how we could somehow centralize this information. I don't have any clean ideas though.
Maybe it could be in a .properties file? Not gradle, not groovy, a real actual .properties file that we can read with java.util.Properties too?
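A rough sketch of the .properties idea, assuming a file with one key per required version. The key names (`minimum.java.lucene`, `minimum.java.gradle`), the sample values, and the class `BuildVersions` are invented for this illustration; nothing here reflects an existing file in the repository.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch of reading centralized version numbers from .properties text; all names are hypothetical. */
final class BuildVersions {
  /** Parses .properties-formatted text into a Properties object. */
  static Properties parse(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      // A StringReader should not fail, but Properties.load declares IOException.
      throw new UncheckedIOException(e);
    }
    return props;
  }

  /** Returns the integer stored under key, failing loudly if the key is missing. */
  static int requiredInt(Properties props, String key) {
    String value = props.getProperty(key);
    if (value == null) {
      throw new IllegalArgumentException("Missing property: " + key);
    }
    return Integer.parseInt(value.trim());
  }

  public static void main(String[] args) {
    // Sample content with made-up keys and values; a real build would load a file instead.
    String sample = "minimum.java.lucene=17\nminimum.java.gradle=11\n";
    Properties props = parse(sample);
    System.out.println("lucene needs Java " + requiredInt(props, "minimum.java.lucene"));
    System.out.println("gradle needs Java " + requiredInt(props, "minimum.java.gradle"));
  }
}
```

Because the format is plain key=value text, the same file could also be read from Groovy build scripts or grepped from a shell wrapper, which is the appeal of the suggestion.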
[jira] [Commented] (LUCENE-10518) FieldInfos consistency check can refuse to open Lucene 8 index
[ https://issues.apache.org/jira/browse/LUCENE-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524548#comment-17524548 ] Mayya Sharipova commented on LUCENE-10518: -- [~dnhatn] [~jpountz] Thanks for your suggestions. +1 on the idea as well. Do we want to disable consistency check only on segments that were created in 8.x and even of 9.x segments of 8.x index?
[GitHub] [lucene] rmuir commented on pull request #817: improve spotless error to suggest running 'gradlew tidy'
rmuir commented on PR #817: URL: https://github.com/apache/lucene/pull/817#issuecomment-1102998198
> Maybe it'd be better to create a single finalizer task (at the root level), collect all the spotless tasks in the graph and add finalizedBy pointing at that single task (it'd still need to check the status of those tasks it finalizes). Then if you have multiple failures, it'd print the message just once. It could even fail on its own - then the error would be the last one reported... But then, maybe it's overdoing things.
This is beyond my area of gradle kung fu, but I'll leave the PR open in case anyone else understands it. I just basically hacked until I was able to get something printed in the case spotless fails. If there is a way to "try/catch" its error, that would be fine too. I do think, despite its confusing errors, that it is best to print whatever spotless says. It is just that we want to "amend" the output (loudly) when it fails, with a simple tip to make it easier.
[jira] [Comment Edited] (LUCENE-10518) FieldInfos consistency check can refuse to open Lucene 8 index
[ https://issues.apache.org/jira/browse/LUCENE-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524548#comment-17524548 ] Mayya Sharipova edited comment on LUCENE-10518 at 4/19/22 7:15 PM: --- [~dnhatn] [~jpountz] Thanks for your suggestions. +1 on the idea as well. Do we want to disable consistency check only on segments that were created in 8.x or on all segments even of 9.x segments of 8.x index? was (Author: mayya): [~dnhatn] [~jpountz] Thanks for your suggestions. +1 on the idea as well. Do we want to disable consistency check only on segments that were created in 8.x and even of 9.x segments of 8.x index?
[jira] [Comment Edited] (LUCENE-10518) FieldInfos consistency check can refuse to open Lucene 8 index
[ https://issues.apache.org/jira/browse/LUCENE-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524548#comment-17524548 ] Mayya Sharipova edited comment on LUCENE-10518 at 4/19/22 7:16 PM: --- [~dnhatn] [~jpountz] Thanks for your suggestions. +1 on the idea as well. Do we want to disable consistency check only on segments that were created in 8.x? I guess for new 9.x segments of the 8.x index, consistency checks will be enforced during indexing. was (Author: mayya): [~dnhatn] [~jpountz] Thanks for your suggestions. +1 on the idea as well. Do we want to disable consistency check only on segments that were created in 8.x or on all segments even of 9.x segments of 8.x index?
[jira] [Comment Edited] (LUCENE-10518) FieldInfos consistency check can refuse to open Lucene 8 index
[ https://issues.apache.org/jira/browse/LUCENE-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524548#comment-17524548 ] Mayya Sharipova edited comment on LUCENE-10518 at 4/19/22 7:17 PM: --- [~dnhatn] [~jpountz] Thanks for your suggestions. +1 on the idea as well. Do we want to disable consistency check only on segments that were created in 8.x? I guess for new 9.x segments of the 8.x index, consistency checks will be enforced during indexing. was (Author: mayya): [~dnhatn] [~jpountz] Thanks for your suggestions. +1 on the idea as well. Do we want to disable consistency check only on segments that were created in 8.x? I guess for new 9.x segments of the 8.x index, consistency checks will be enforced during indexing.
[GitHub] [lucene] dweiss commented on pull request #819: fail clearly on too-new JDK
dweiss commented on PR #819: URL: https://github.com/apache/lucene/pull/819#issuecomment-1103018456
> Maybe it could be in a .properties file? Not gradle, not groovy, a real actual .properties file that we can read with java.util.Properties too?
I was thinking about something like this too, it's convenient. There are a number of those places referencing version numbers - I can't even remember them all. A commit list referencing LUCENE-10283 is a good place to start... :)
[GitHub] [lucene] dweiss commented on pull request #817: improve spotless error to suggest running 'gradlew tidy'
dweiss commented on PR #817: URL: https://github.com/apache/lucene/pull/817#issuecomment-1103022163 You can commit this in or leave this open for a day or two. I'm catching up with work after a short holiday but maybe I can take a stab at this as a breather.
[GitHub] [lucene] rmuir commented on pull request #807: LUCENE-10512: Grammar: Remove incidents of "the the" in comments.
rmuir commented on PR #807: URL: https://github.com/apache/lucene/pull/807#issuecomment-1103079108
> I don't know, Mike... Gradle doesn't seem like a tool that you can ever make dead-simple (like ant). I like what Robert added but with hacks like that a question always pops to my mind of what happens in gradle, internally, if you throw an exception from such a block - think throwing an exception from java's try-finally that obscures the original cause...
I first tried simply "printing stuff" (not throwing an exception). In that case you see my "print" before the actual spotless exception text. So I changed it to `throw new GradleException` only because it would print my text after the spotless exception. If the concern is throwing the exception, maybe we could just print stuff. If we added more ascii art around it, it could still work :)
[jira] [Commented] (LUCENE-10518) FieldInfos consistency check can refuse to open Lucene 8 index
[ https://issues.apache.org/jira/browse/LUCENE-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524582#comment-17524582 ] Nhat Nguyen commented on LUCENE-10518: -- [~mayya] That's correct. We only reduce the strict level of the consistency check when opening an existing Lucene 8x with IndexWriter. The same enforce will be applied to new segments. > FieldInfos consistency check can refuse to open Lucene 8 index > -- > > Key: LUCENE-10518 > URL: https://issues.apache.org/jira/browse/LUCENE-10518 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 8.10.1 >Reporter: Nhat Nguyen >Priority: Major > > A field-infos consistency check introduced in Lucene 9 (LUCENE-9334) can > refuse to open a Lucene 8 index. Lucene 8 can create a partial FieldInfo if > hitting a non-aborting exception (for example [term is too > long|https://github.com/apache/lucene-solr/blob/6a6484ba396927727b16e5061384d3cd80d616b2/lucene/core/src/java/org/apache/lucene/index/DefaultIndexingChain.java#L944]) > during processing fields of a document. We don't have this problem in Lucene > 9 as we process fields in two phases with the [first > phase|https://github.com/apache/lucene/blob/10ebc099c846c7d96f4ff5f9b7853df850fa8442/lucene/core/src/java/org/apache/lucene/index/IndexingChain.java#L589-L614] > processing only FieldInfos. > The issue can be reproduced with this snippet. 
> {code:java}
> public void testWriteIndexOn8x() throws Exception {
>   FieldType KeywordField = new FieldType();
>   KeywordField.setTokenized(false);
>   KeywordField.setOmitNorms(true);
>   KeywordField.setIndexOptions(IndexOptions.DOCS);
>   KeywordField.freeze();
>   try (Directory dir = newDirectory()) {
>     IndexWriterConfig config = new IndexWriterConfig();
>     config.setCommitOnClose(false);
>     config.setMergePolicy(NoMergePolicy.INSTANCE);
>     try (IndexWriter writer = new IndexWriter(dir, config)) {
>       // first segment
>       writer.addDocument(new Document()); // an empty doc
>       Document d1 = new Document();
>       byte[] chars = new byte[IndexWriter.MAX_STORED_STRING_LENGTH + 1];
>       Arrays.fill(chars, (byte) 'a');
>       d1.add(new Field("field", new BytesRef(chars), KeywordField));
>       d1.add(new BinaryDocValuesField("field", new BytesRef(chars)));
>       expectThrows(IllegalArgumentException.class, () -> writer.addDocument(d1));
>       writer.flush();
>       // second segment
>       Document d2 = new Document();
>       d2.add(new Field("field", new BytesRef("hello world"), KeywordField));
>       d2.add(new SortedDocValuesField("field", new BytesRef("hello world")));
>       writer.addDocument(d2);
>       writer.flush();
>       writer.commit();
>       // Check for doc values types consistency
>       Map<String, DocValuesType> docValuesTypes = new HashMap<>();
>       try (DirectoryReader reader = DirectoryReader.open(dir)) {
>         for (LeafReaderContext leaf : reader.leaves()) {
>           for (FieldInfo fi : leaf.reader().getFieldInfos()) {
>             DocValuesType current = docValuesTypes.putIfAbsent(fi.name, fi.getDocValuesType());
>             if (current != null && current != fi.getDocValuesType()) {
>               fail("cannot change DocValues type from " + current + " to " + fi.getDocValuesType() + " for field \"" + fi.name + "\"");
>             }
>           }
>         }
>       }
>     }
>   }
> }
> {code}
> I would like to propose to:
> - Backport the two-phase fields processing from Lucene9 to Lucene8. The patch should be small and contained.
> - Introduce an option in Lucene9 to skip checking field-infos consistency > (i.e., behave like Lucene 8 when the option is enabled). > /cc [~mayya] and [~jpountz] -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10524) Augment CONTRIBUTING.md guide with instructions on how/when to benchmark
Gautam Worah created LUCENE-10524: - Summary: Augment CONTRIBUTING.md guide with instructions on how/when to benchmark Key: LUCENE-10524 URL: https://issues.apache.org/jira/browse/LUCENE-10524 Project: Lucene - Core Issue Type: Wish Reporter: Gautam Worah This came up when I was trying to think about improving the experience for new contributors. Today, new contributors are usually unaware of where luceneutil benchmarks are and when/how to run them. Committers usually end up pointing contributors to the benchmarks package when they make perf impacting changes and then they run the benchmarks. Adding benchmark details to the Lucene repo will also make them more accessible to other researchers who want to experiment/benchmark their own custom task implementation with Java Lucene. What does the community think? -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10525) Improve WindowsFS emulation to catch directory names with : in them (which is not allowed)
Gautam Worah created LUCENE-10525: - Summary: Improve WindowsFS emulation to catch directory names with : in them (which is not allowed) Key: LUCENE-10525 URL: https://issues.apache.org/jira/browse/LUCENE-10525 Project: Lucene - Core Issue Type: Improvement Reporter: Gautam Worah In PR [https://github.com/apache/lucene/pull/762] we missed the case where a tempDir name used `:` in the dir name. The test passed in Linux and macOS environments but ended up failing on Windows build systems. We ended up pushing a fix to not use `:` in the names. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10525) Improve WindowsFS emulation to catch directory names with : in them (which is not allowed)
[ https://issues.apache.org/jira/browse/LUCENE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Worah updated LUCENE-10525: -- Description: In PR ([https://github.com/apache/lucene/pull/762)] we missed the case where a tempDir name was using `:` in the dir name. This test was passing in Linux, MacOS environments but ended up failing in Windows build systems. We ended up pushing a fix to not use `:` in the names. Open to other ideas as well! was: In PR ([https://github.com/apache/lucene/pull/762)] we missed the case where a tempDir name was using `:` in the dir name. This test was passing in Linux, MacOS environments but ended up failing in Windows build systems. We ended up pushing a fix to not use `:` in the names. > Improve WindowsFS emulation to catch directory names with : in them (which is > not allowed) > --- > > Key: LUCENE-10525 > URL: https://issues.apache.org/jira/browse/LUCENE-10525 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Gautam Worah >Priority: Minor > > In PR ([https://github.com/apache/lucene/pull/762)] we missed the case where > a tempDir name was using `:` in the dir name. This test was passing in Linux, > MacOS environments but ended up failing in Windows build systems. > We ended up pushing a fix to not use `:` in the names. > Open to other ideas as well! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #817: improve spotless error to suggest running 'gradlew tidy'
rmuir commented on PR #817: URL: https://github.com/apache/lucene/pull/817#issuecomment-1103262627 no reason to rush it in, take some time to think about it. i do think the change is worth the trouble though, reduce friction for new developers. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #819: fail clearly on too-new JDK
rmuir commented on PR #819: URL: https://github.com/apache/lucene/pull/819#issuecomment-1103272545 > I agree with this (as someone who disables the daemon and runs the commands every time). The only reason I specified it as a different command was due to the fact that if the downloader fails, the error is "trapped" and an additional version-related error (that you need at least java 11) is printed... IMO that's a bit confusing, but I get it. we can clean up exit status so that wrapperdownloader uses something other than `1` when it fails. that's also the same status old `java` uses when it doesn't recognize `--source` parameter. then we can move the version check into wrapperdownloader and only say "please make sure you're using at least java 11" when it is relevant (java itself fails). and it is safe to only apply it to exit status `1` because it is only relevant to already-released older jvms. I will look into this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #819: fail clearly on too-new JDK
rmuir commented on PR #819: URL: https://github.com/apache/lucene/pull/819#issuecomment-1103286863

OK, @dweiss can you take another look when you get a chance?

java 8:
```
$ ./gradlew check
Unrecognized option: --source
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
ERROR: Something went wrong. Make sure you're using Java 11 or later.
```

java 18:
```
$ ./gradlew check
ERROR: java version must not be newer than 17 (unsupported by gradle), your version: 18
```

But there's still more to do. If you use java 11 it will still fail, just differently, based on our actual gradle logic.

Let's say the user has java 9. It isn't good to tell a user to upgrade to java 11 or later (say they pick java 11): they download it, install it, only for it to fail and say they need 17 or later. Then the user downloads and installs the latest (say 18), only for it to fail yet one more time and tell them they need exactly 17... this is like leading them through a maze. So I want to clean up the messaging some more still...

java 11:
```
$ ./gradlew check
To honour the JVM settings for this build a single-use Daemon process will be forked. See https://docs.gradle.org/7.2/userguide/gradle_daemon.html#sec:disabling_the_daemon.
Daemon will be stopped at the end of the build

FAILURE: Build failed with an exception.

* Where:
Script '/home/rmuir/workspace/lucene/gradle/validation/check-environment.gradle' line: 36

* What went wrong:
A problem occurred evaluating script.
> At least Java 17 is required, you are running Java 11 [OpenJDK 64-Bit Server VM 11.0.13+8]

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

* Get more help at https://help.gradle.org

BUILD FAILED in 4s
```

-- This is an automated message from the Apache Git Service.
[GitHub] [lucene] rmuir commented on pull request #819: fail clearly on too-new JDK
rmuir commented on PR #819: URL: https://github.com/apache/lucene/pull/819#issuecomment-1103301886

OK, I cleaned up the messaging here to emit better messages so we don't lead users through a maze of downloading and retrying. I didn't touch the gradle checks. And yeah, it would be great to consolidate to a properties file (min and max versions?) so there are fewer places to change when we bump it; we could read the properties from this checker at least.

java 18:
```
$ ./gradlew check
ERROR: java version be exactly 17, your version: 18
```

java 9:
```
$ ./gradlew check
Unrecognized option: --source
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
ERROR: Something went wrong. Make sure you're using Java 17.
```

java 11:
```
$ ./gradlew check
ERROR: java version be exactly 17, your version: 11
```

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
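[editor's note] A rough sketch of the kind of check discussed in the comment above, not the actual wrapper downloader code from the PR. The class name, the message wording, and the exit status `2` are assumptions made for illustration; the thread only says the status should be something other than `1`, since `1` is what an old `java` returns when it does not recognize `--source`.

```java
// Hypothetical sketch; names, message text and exit status are assumptions.
final class JavaVersionCheck {
  // Assumed single supported feature version, per the "exactly 17" messages in the thread.
  static final int REQUIRED_FEATURE_VERSION = 17;

  /** Returns an error message, or null when the feature version is acceptable. */
  static String check(int featureVersion) {
    if (featureVersion != REQUIRED_FEATURE_VERSION) {
      return "ERROR: java version must be exactly "
          + REQUIRED_FEATURE_VERSION
          + ", your version: "
          + featureVersion;
    }
    return null;
  }

  public static void main(String[] args) {
    // Runtime.version().feature() exists on Java 10+; older JVMs fail earlier on --source.
    String error = check(Runtime.version().feature());
    if (error != null) {
      System.err.println(error);
      // Use a status other than 1 so the calling script can tell this failure
      // apart from an old JVM rejecting the --source option.
      System.exit(2);
    }
  }
}
```

With a distinct exit status, the launcher script only needs to print the generic "make sure you're using Java 17" hint when the status is `1`, i.e. when java itself failed.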
[jira] [Commented] (LUCENE-10525) Improve WindowsFS emulation to catch directory names with : in them (which is not allowed)
[ https://issues.apache.org/jira/browse/LUCENE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524644#comment-17524644 ] Robert Muir commented on LUCENE-10525: -- good idea. there are quite a few banned characters and "names" we could look for: https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file. > Improve WindowsFS emulation to catch directory names with : in them (which is > not allowed) > --- > > Key: LUCENE-10525 > URL: https://issues.apache.org/jira/browse/LUCENE-10525 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Gautam Worah >Priority: Minor > > In PR ([https://github.com/apache/lucene/pull/762)] we missed the case where > a tempDir name was using `:` in the dir name. This test was passing in Linux, > MacOS environments but ended up failing in Windows build systems. > We ended up pushing a fix to not use `:` in the names. > Open to other ideas as well! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10525) Improve WindowsFS emulation to catch directory names with : in them (which is not allowed)
[ https://issues.apache.org/jira/browse/LUCENE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524647#comment-17524647 ] Robert Muir commented on LUCENE-10525: -- So the "real" windowsfilesystem fails in the methods like Path.resolve(), throwing exceptions that look like this: {noformat} java.nio.file.InvalidPathException: Illegal char <:> at index 13: 2022-04-15T20:35:33.995886500Z-001 {noformat} I think we can really simulate it well, by doing exactly the same thing. But it is some work, and these APIs are not exactly fun to wrestle with. and we have to refactor these mock filesystems for it to work cleanly. Here's my suggestion of a plan: First we have to fix all the places in the code currently calling "new FilterPath()" directly (e.g. make this class package private or something). We can add a method like {{wrap(Path)}} to filesystemprovider that does the same thing, so there's only a single place doing the wrapping. then windowsfs can override this new {{wrap(Path)}} to return 'new WindowsPath', where WindowsPath is a new class that extends FilterPath, but overrides all the resolve() methods with the additional checks. > Improve WindowsFS emulation to catch directory names with : in them (which is > not allowed) > --- > > Key: LUCENE-10525 > URL: https://issues.apache.org/jira/browse/LUCENE-10525 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Gautam Worah >Priority: Minor > > In PR ([https://github.com/apache/lucene/pull/762)] we missed the case where > a tempDir name was using `:` in the dir name. This test was passing in Linux, > MacOS environments but ended up failing in Windows build systems. > We ended up pushing a fix to not use `:` in the names. > Open to other ideas as well! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
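[editor's note] A minimal sketch of the "additional checks" mentioned in the comment above, assuming they are applied to a single path element (one file or directory name) before it is resolved. The class and method names are hypothetical, and the character and device-name lists follow the Microsoft naming page linked earlier in this thread but are not exhaustive (for example, trailing dots and spaces are not checked).

```java
import java.nio.file.InvalidPathException;
import java.util.Locale;

// Hypothetical helper, not part of Lucene's mockfile package.
final class WindowsNameCheck {
  // Reserved characters for a single name element: < > : " / \ | ? *
  private static final String RESERVED_CHARS = "<>:\"/\\|?*";

  /** Throws InvalidPathException if the name element would be rejected on Windows. */
  static void checkName(String name) {
    for (int i = 0; i < name.length(); i++) {
      char c = name.charAt(i);
      // Control characters (0-31) are reserved as well.
      if (c < 32 || RESERVED_CHARS.indexOf(c) >= 0) {
        throw new InvalidPathException(name, "Illegal char <" + c + ">", i);
      }
    }
    // Reserved device names are rejected with or without an extension.
    int dot = name.indexOf('.');
    String base = (dot < 0 ? name : name.substring(0, dot)).toUpperCase(Locale.ROOT);
    if (base.matches("CON|PRN|AUX|NUL|(COM|LPT)[1-9]")) {
      throw new InvalidPathException(name, "Reserved device name");
    }
  }
}
```

Throwing `InvalidPathException` with the offending index mirrors the shape of the real exception quoted above; a FilterPath subclass could call a check like this from its resolve() overrides.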
[jira] [Commented] (LUCENE-10524) Augment CONTRIBUTING.md guide with instructions on how/when to benchmark
[ https://issues.apache.org/jira/browse/LUCENE-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524662#comment-17524662 ] Tomoko Uchida commented on LUCENE-10524: I think it is great to have good documentation about benchmarking, actually, it is a kind of must-have stuff for this project to me. I'm not sure how the volume will be, but how about having a dedicated help document (say, `gradlew helpBenchmark`) and link to it from CONTRIBUTING.md? > Augment CONTRIBUTING.md guide with instructions on how/when to benchmark > > > Key: LUCENE-10524 > URL: https://issues.apache.org/jira/browse/LUCENE-10524 > Project: Lucene - Core > Issue Type: Wish >Reporter: Gautam Worah >Priority: Minor > > This came up when I was trying to think about improving the experience for > new contributors. > Today, new contributors are usually unaware of where luceneutil benchmarks > are and when/how to run them. Committers usually end up pointing contributors > to the benchmarks package when they make perf impacting changes and then they > run the benchmarks. > > Adding benchmark details to the Lucene repo will also make them more > accessible to other researchers who want to experiment/benchmark their own > custom task implementation with Java Lucene. > > What does the community think? > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10526) add single method to mockfile to wrap a Path
Robert Muir created LUCENE-10526: Summary: add single method to mockfile to wrap a Path Key: LUCENE-10526 URL: https://issues.apache.org/jira/browse/LUCENE-10526 Project: Lucene - Core Issue Type: Sub-task Reporter: Robert Muir Currently, mockfilesystems wrap a path with "new FilterPath". but this "wrapping" logic is scattered everywhere in the code (and tests!). And it is hardcoded at filterpath (subclassing is not possible). This makes it impossible for a mock filesystem to extend FilterPath with some custom logic (example: check for special windows reserved characters). I don't think code/tests should be calling "new FilterPath" everywhere, this is also just messy. Instead they should ask the mockfilesystem's provider to wrap the path: {{provider.wrapPath(path, filesystem)}}. This way, WindowsFS can then override wrapPath() with a subclass that looks for special characters. This issue is just for the API refactoring/cleanup. Additional Windows-simulation can happen on the parent issue. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
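[editor's note] A simplified sketch of the refactoring idea described in the issue above: one overridable wrapping method on the provider, so a Windows-like mock can return its own path subclass. The `Sketch*` classes are hypothetical stand-ins and do not implement `java.nio.file.Path` or match the real `wrapPath(path, filesystem)` signature; only the shape of the pattern is shown.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

// Hypothetical stand-in for a filter provider: the single place that wraps paths.
class SketchProvider {
  SketchPath wrapPath(Path path) {
    return new SketchPath(path, this);
  }
}

// Hypothetical stand-in for FilterPath: delegates, and re-wraps results via the provider.
class SketchPath {
  final Path delegate;
  final SketchProvider provider;

  SketchPath(Path delegate, SketchProvider provider) {
    this.delegate = delegate;
    this.provider = provider;
  }

  SketchPath resolve(String other) {
    // Never calls "new SketchPath" directly, so subclasses survive resolve().
    return provider.wrapPath(delegate.resolve(other));
  }
}

// A Windows-like provider overrides the wrapping point to return its own subclass.
class WindowsSketchProvider extends SketchProvider {
  @Override
  SketchPath wrapPath(Path path) {
    return new WindowsSketchPath(path, this);
  }
}

class WindowsSketchPath extends SketchPath {
  WindowsSketchPath(Path delegate, SketchProvider provider) {
    super(delegate, provider);
  }

  @Override
  SketchPath resolve(String other) {
    // Only ':' is checked here; a fuller check would cover other reserved characters.
    int idx = other.indexOf(':');
    if (idx >= 0) {
      throw new InvalidPathException(other, "Illegal char <:>", idx);
    }
    return super.resolve(other);
  }
}
```

Because every wrap goes through the provider, a path resolved from a `WindowsSketchPath` is again a `WindowsSketchPath`, which is the property the issue says is impossible while callers construct the wrapper themselves.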
[GitHub] [lucene] rmuir opened a new pull request, #822: LUCENE-10526: add single method to mockfile to wrap a Path
rmuir opened a new pull request, #822: URL: https://github.com/apache/lucene/pull/822 Currently "new FilterPath" is called from everywhere, making it impossible for a mockfilesystem to use a custom subclass. See JIRA for full description. The use case here is to e.g. allow WindowsFS to wrap with a custom subclass that checks for special characters that windows doesn't allow. But today it can't do that since there is code everywhere doing the wrapping. This PR adds a single method `wrapPath()` to FilterFileSystemProvider to do this wrapping, and refactors everything to use it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10525) Improve WindowsFS emulation to catch directory names with : in them (which is not allowed)
[ https://issues.apache.org/jira/browse/LUCENE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524664#comment-17524664 ] Robert Muir commented on LUCENE-10525: -- I added a PR with a stab at the refactoring piece: https://github.com/apache/lucene/pull/822 I think after this change, you can override {{wrapPath()}} in WindowsFS to return a subclass of FilterPath (e.g. WindowsPath) that adds special character checks/name checks to all of its resolve() methods. > Improve WindowsFS emulation to catch directory names with : in them (which is > not allowed) > --- > > Key: LUCENE-10525 > URL: https://issues.apache.org/jira/browse/LUCENE-10525 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Gautam Worah >Priority: Minor > > In PR ([https://github.com/apache/lucene/pull/762)] we missed the case where > a tempDir name was using `:` in the dir name. This test was passing in Linux, > MacOS environments but ended up failing in Windows build systems. > We ended up pushing a fix to not use `:` in the names. > Open to other ideas as well! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta commented on pull request #819: fail clearly on too-new JDK
mocobeta commented on PR #819: URL: https://github.com/apache/lucene/pull/819#issuecomment-1103423089 > Maybe it could be in a .properties file? Not gradle, not groovy, a real actual .properties file that we can read with java.io.Properties too? +1 to a single source of source/target Java version(s). A simple key-value format may be easily used from the outside world of java/gradle - github actions scripts or the smoke tester, and so on. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
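[editor's note] A small sketch of reading min/max Java versions from a plain key-value properties file, as suggested in the comment above. The standard class is `java.util.Properties`. The key names `java.min` and `java.max` and the class name are assumptions made for illustration; no such file or keys are confirmed by the thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical reader; key names are illustrative only.
final class JavaVersionProps {
  /** Parses the text of a properties file and returns {min, max} feature versions. */
  static int[] readMinMax(String propertiesText) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propertiesText));
    } catch (IOException e) {
      // A StringReader should not fail, but load() declares IOException.
      throw new UncheckedIOException(e);
    }
    int min = Integer.parseInt(props.getProperty("java.min").trim());
    int max = Integer.parseInt(props.getProperty("java.max").trim());
    return new int[] {min, max};
  }
}
```

The same two-line file could be parsed by shell scripts or CI workflows with a simple `key=value` split, which is the appeal of keeping it out of gradle and groovy.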