[GitHub] [lucene] iverase merged pull request #478: LUCENE-10264: Clone index input when creating a PointTree in SimpleTextBKDReader
iverase merged pull request #478: URL: https://github.com/apache/lucene/pull/478 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9820) Separate logic for reading the BKD index from logic to intersecting it.
[ https://issues.apache.org/jira/browse/LUCENE-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450249#comment-17450249 ] ASF subversion and git services commented on LUCENE-9820: - Commit 634c22c527ef72b1d400bb8284cff6b9971766c1 in lucene's branch refs/heads/main from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=634c22c ] LUCENE-10264: Clone index input when creating a PointTree in SimpleTextBKDReader (#478) Fixes a race condition introduced in LUCENE-9820. > Separate logic for reading the BKD index from logic to intersecting it. > --- > > Key: LUCENE-9820 > URL: https://issues.apache.org/jira/browse/LUCENE-9820 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Major > Fix For: 9.1 > > Time Spent: 12h 20m > Remaining Estimate: 0h > > Currently the class BKDReader contains all the logic for traversing the KD > tree and the logic to read the actual index. This makes it difficult to develop > new visiting strategies, for example LUCENE-9619, where it is proposed to > move Points from a visitor API to a cursor-style API. > The first step is to isolate the logic that reads the index from the logic that > visits the tree. Another benefit of doing this is that it will help > evolve the index, for example by moving the current index format to a backwards > codec without moving the visiting logic. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
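The commit above fixes the race by cloning the index input whenever a PointTree is created. The sketch below illustrates why that matters, under stated assumptions: `Input` is a hypothetical stand-in for a stateful, cloneable reader such as Lucene's `IndexInput`, and the two readers are interleaved deterministically instead of running on real threads. When both readers share one instance, the second `seek` moves the first reader's position; when each reader owns a clone, the positions stay independent.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a stateful, cloneable reader such as IndexInput. */
class Input implements Cloneable {
  private final int[] data; // shared, read-only content
  private int pos; // per-instance file pointer

  Input(int[] data) {
    this.data = data;
  }

  void seek(int position) {
    pos = position;
  }

  int readInt() {
    return data[pos++];
  }

  @Override
  public Input clone() {
    try {
      return (Input) super.clone(); // shallow copy: same data, independent position
    } catch (CloneNotSupportedException e) {
      throw new AssertionError(e); // cannot happen, Input implements Cloneable
    }
  }
}

final class CloneDemo {
  static final int[] DATA = {10, 11, 12, 13, 20, 21, 22, 23};

  /** Two readers interleave their reads: "search" wants ints 0..1, "merge" wants ints 4..5. */
  static int[] interleavedRead(boolean cloneInputs) {
    Input shared = new Input(DATA);
    Input search = cloneInputs ? shared.clone() : shared;
    Input merge = cloneInputs ? shared.clone() : shared;
    search.seek(0);
    merge.seek(4); // without cloning, this also moves the search reader
    int[] out = new int[4];
    for (int i = 0; i < 2; i++) {
      out[i] = search.readInt(); // out[0..1]: what the search reader saw
      out[2 + i] = merge.readInt(); // out[2..3]: what the merge reader saw
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(interleavedRead(false))); // [20, 22, 21, 23]
    System.out.println(Arrays.toString(interleavedRead(true))); // [10, 11, 20, 21]
  }
}
```

With a shared instance the search reader ends up with `[20, 22]` instead of `[10, 11]`, which is the kind of corrupted read that surfaces as the assertion in `parseInt` in the stack trace of LUCENE-10264.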
[jira] [Commented] (LUCENE-10264) Test errors in SimpleTextBKDReader
[ https://issues.apache.org/jira/browse/LUCENE-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450248#comment-17450248 ] ASF subversion and git services commented on LUCENE-10264: -- Commit 634c22c527ef72b1d400bb8284cff6b9971766c1 in lucene's branch refs/heads/main from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=634c22c ] LUCENE-10264: Clone index input when creating a PointTree in SimpleTextBKDReader (#478) Fixes a race condition introduced in LUCENE-9820. > Test errors in SimpleTextBKDReader > -- > > Key: LUCENE-10264 > URL: https://issues.apache.org/jira/browse/LUCENE-10264 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > I noticed a couple of errors in CI regarding the SimpleTextBKDReader that > were introduced by LUCENE-9820. I had a look, and the problem is indeed that we > do not clone the index input when creating a PointTree; therefore, if > two threads access the same PointValues instance (e.g. a search > request and a background merge), we run into trouble.
> Reproduce with:
> {noformat}
> ./gradlew test --tests TestSimpleTextPointsFormat.testWithExceptions -Dtests.seed=56F6BF03D7871A6D -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ses -Dtests.timezone=Asia/Ho_Chi_Minh -Dtests.asserts=true -Dtests.file.encoding=UTF-8
> {noformat}
> Error:
> {noformat}
> Stack Trace:
> java.lang.AssertionError
>   at __randomizedtesting.SeedInfo.seed([56F6BF03D7871A6D:F4A5237F58095597]:0)
>   at org.apache.lucene.codecs.simpletext.SimpleTextBKDReader$SimpleTextPointTree.parseInt(SimpleTextBKDReader.java:387)
>   at org.apache.lucene.codecs.simpletext.SimpleTextBKDReader$SimpleTextPointTree.readDocIDs(SimpleTextBKDReader.java:374)
>   at org.apache.lucene.codecs.simpletext.SimpleTextBKDReader$SimpleTextPointTree.visitDocValues(SimpleTextBKDReader.java:345)
>   at org.apache.lucene.codecs.simpletext.SimpleTextBKDReader$SimpleTextPointTree.visitDocValues(SimpleTextBKDReader.java:362)
>   at org.apache.lucene.codecs.PointsWriter$1$1$1.visitDocValues(PointsWriter.java:142)
>   at org.apache.lucene.codecs.simpletext.SimpleTextPointsWriter.writeField(SimpleTextPointsWriter.java:95)
>   at org.apache.lucene.codecs.PointsWriter.mergeOneField(PointsWriter.java:57)
>   at org.apache.lucene.codecs.PointsWriter.merge(PointsWriter.java:231)
>   at org.apache.lucene.index.SegmentMerger.mergePoints(SegmentMerger.java:184)
>   at org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:291)
>   at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:144)
>   at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3190)
>   at org.apache.lucene.index.RandomIndexWriter.addIndexes(RandomIndexWriter.java:320)
>   at org.apache.lucene.index.BasePointsFormatTestCase.switchIndex(BasePointsFormatTestCase.java:1118)
>   at org.apache.lucene.index.BasePointsFormatTestCase.verify(BasePointsFormatTestCase.java:779)
>   at org.apache.lucene.index.BasePointsFormatTestCase.testWithExceptions(BasePointsFormatTestCase.java:247)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
>   at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
>   at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
>   at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
>   at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
>   at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>   at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>   at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>   at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at com.carrotsearch.randomiz
[jira] [Commented] (LUCENE-9820) Separate logic for reading the BKD index from logic to intersecting it.
[ https://issues.apache.org/jira/browse/LUCENE-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450251#comment-17450251 ] ASF subversion and git services commented on LUCENE-9820: - Commit 62084d7138808887783199b4256fc3eee794355e in lucene's branch refs/heads/branch_9x from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=62084d7 ] LUCENE-10264: Clone index input when creating a PointTree in SimpleTextBKDReader (#478) Fixes a race condition introduced in LUCENE-9820.
[jira] [Commented] (LUCENE-10264) Test errors in SimpleTextBKDReader
[ https://issues.apache.org/jira/browse/LUCENE-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450250#comment-17450250 ] ASF subversion and git services commented on LUCENE-10264: -- Commit 62084d7138808887783199b4256fc3eee794355e in lucene's branch refs/heads/branch_9x from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=62084d7 ] LUCENE-10264: Clone index input when creating a PointTree in SimpleTextBKDReader (#478) Fixes a race condition introduced in LUCENE-9820.
[jira] [Commented] (LUCENE-10267) Gradle does not write module version attribute for modules with zero dependencies
[ https://issues.apache.org/jira/browse/LUCENE-10267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450257#comment-17450257 ] Jerome Prinet commented on LUCENE-10267: Hi David and thanks for raising that! I'm taking it internally and I'll keep you posted. > Gradle does not write module version attribute for modules with zero > dependencies > - > > Key: LUCENE-10267 > URL: https://issues.apache.org/jira/browse/LUCENE-10267 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Minor > Attachments: mod-version-repro.zip
[jira] [Commented] (LUCENE-10267) Gradle does not write module version attribute for modules with zero dependencies
[ https://issues.apache.org/jira/browse/LUCENE-10267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450258#comment-17450258 ] Dawid Weiss commented on LUCENE-10267: -- Thanks [~JeromeP]!
[jira] [Resolved] (LUCENE-10264) Test errors in SimpleTextBKDReader
[ https://issues.apache.org/jira/browse/LUCENE-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-10264. --- Assignee: Ignacio Vera Resolution: Fixed I haven't added an entry to CHANGES.txt because the bug only affected unreleased code.
[jira] [Commented] (LUCENE-9619) Move Points from a visitor API to a cursor-style API?
[ https://issues.apache.org/jira/browse/LUCENE-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450277#comment-17450277 ] Ignacio Vera commented on LUCENE-9619: -- In LUCENE-9820 we took the first step toward moving the API, but the methods #visitDocIDs and #visitDocValues still take an IntersectVisitor as input. Here I am proposing to introduce two functional interfaces, {{DocIdsVisitor}} and {{DocValuesVisitor}}, and to use them as the input for those methods, so the API would look like:
{code:java}
/**
 * Basic operations to read the KD-tree.
 *
 * @lucene.experimental
 */
public interface PointTree extends Cloneable {

  /** Clone, the current node becomes the root of the new tree. */
  PointTree clone();

  /**
   * Move to the first child node and return {@code true} upon success. Returns {@code false} for
   * leaf nodes and {@code true} otherwise.
   */
  boolean moveToChild() throws IOException;

  /**
   * Move to the next sibling node and return {@code true} upon success. Returns {@code false} if
   * the current node has no more siblings.
   */
  boolean moveToSibling() throws IOException;

  /**
   * Move to the parent node and return {@code true} upon success. Returns {@code false} for the
   * root node and {@code true} otherwise.
   */
  boolean moveToParent() throws IOException;

  /** Return the minimum packed value of the current node. */
  byte[] getMinPackedValue();

  /** Return the maximum packed value of the current node. */
  byte[] getMaxPackedValue();

  /** Return the number of points below the current node. */
  long size();

  /** Visit all the docs below the current node. */
  void visitDocIDs(DocIdsVisitor docIdsVisitor) throws IOException;

  /** Visit all the docs and values below the current node. */
  default void visitDocValues(DocValuesVisitor docValuesVisitor) throws IOException {
    visitDocValues((min, max) -> Relation.CELL_CROSSES_QUERY, docID -> {}, docValuesVisitor);
  }

  /**
   * Similar to {@link #visitDocValues(DocValuesVisitor)} but in this case it allows adding a
   * filter that works like {@link IntersectVisitor#compare(byte[], byte[])}.
   */
  void visitDocValues(
      BiFunction<byte[], byte[], Relation> compare,
      DocIdsVisitor docIdsVisitor,
      DocValuesVisitor docValuesVisitor)
      throws IOException;
}

/**
 * Collects all documents below a tree node by calling {@link
 * PointTree#visitDocIDs(DocIdsVisitor)}
 */
@FunctionalInterface
public interface DocIdsVisitor {
  /** Called for all documents below a tree node. */
  void visit(int docID) throws IOException;
}

/**
 * Collects all documents and values below a tree node by calling {@link
 * PointTree#visitDocValues(DocValuesVisitor)}
 */
@FunctionalInterface
public interface DocValuesVisitor {
  /** Called for all documents and values below a tree node. */
  void visit(int docID, byte[] packedValue) throws IOException;

  /**
   * Similar to {@link DocValuesVisitor#visit(int, byte[])} but in this case the packedValue can
   * have more than one docID associated with it. The provided iterator should not escape the
   * scope of this method so that implementations of PointValues are free to reuse it.
   */
  default void visit(DocIdSetIterator iterator, byte[] packedValue) throws IOException {
    int docID;
    while ((docID = iterator.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
      visit(docID, packedValue);
    }
  }
}

/**
 * We recurse the {@link PointTree}, using a provided instance of this to guide the recursion.
 *
 * @lucene.experimental
 */
public interface IntersectVisitor extends DocValuesVisitor, DocIdsVisitor {
  /**
   * Called for non-leaf cells to test how the cell relates to the query, to determine how to
   * further recurse down the tree.
   *
   * <ul>
   *   <li>{@link Relation#CELL_OUTSIDE_QUERY}: Stop recursing down the current branch of the
   *       tree.
   *   <li>{@link Relation#CELL_INSIDE_QUERY}: All nodes below the current node are visited using
   *       the underlying {@link DocIdsVisitor}. The consumer should generally blindly accept the
   *       docID.
   *   <li>{@link Relation#CELL_CROSSES_QUERY}: Keep recursing down the current branch of the
   *       tree. If the current node is a leaf, visit all docs and values using the underlying
   *       {@link DocValuesVisitor}. The consumer should scrutinize the packedValue to decide
   *       whether to accept it.
   * </ul>
   */
  Relation compare(byte[] minPackedValue, byte[] maxPackedValue);

  /** Notifies the caller that this many documents are about to be visited */
  default void grow(int count) {}
}
{code}
Any thoughts? > Move Points from a visitor API to a cursor-style API? > - > > Key: LUCENE-9619 > URL: https://issues.apache.org/jira/browse/LUCENE-9619 > Project: Lucene - Core > Issue Type:
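To illustrate how a caller might drive a cursor-style API like the proposed PointTree, here is a minimal sketch of a range query that walks a tree with `moveToChild`, `moveToSibling` and `moveToParent`, pruning with a compare function in the way the `IntersectVisitor` Javadoc describes for `CELL_OUTSIDE_QUERY`, `CELL_INSIDE_QUERY` and `CELL_CROSSES_QUERY`. Everything here is hypothetical and simplified: `MiniPointTree` is a one-dimensional in-memory tree with `int` values instead of packed `byte[]` values, and `Rel` and `DocValueConsumer` only mirror `Relation` and the proposed `DocValuesVisitor`. It is not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.IntConsumer;

enum Rel { CELL_INSIDE_QUERY, CELL_OUTSIDE_QUERY, CELL_CROSSES_QUERY } // mirrors Relation
// Hypothetical counterpart of the proposed DocValuesVisitor, with int values instead of byte[].
interface DocValueConsumer { void visit(int docID, int value); }

/** Hypothetical 1-D node: leaves hold (docID, value) pairs, inner nodes hold children. */
final class Node {
  final Node[] kids;
  final int[] docs, values;
  int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
  Node parent;
  int index; // position among the parent's children

  Node(int[] docs, int[] values) { // leaf
    this.kids = new Node[0];
    this.docs = docs;
    this.values = values;
    for (int v : values) { min = Math.min(min, v); max = Math.max(max, v); }
  }
  Node(Node... kids) { // inner node
    this.kids = kids;
    this.docs = new int[0];
    this.values = new int[0];
    for (int i = 0; i < kids.length; i++) {
      kids[i].parent = this;
      kids[i].index = i;
      min = Math.min(min, kids[i].min);
      max = Math.max(max, kids[i].max);
    }
  }
}

/** Hypothetical cursor with the navigation methods of the proposed PointTree. */
final class MiniPointTree {
  private Node current;
  MiniPointTree(Node root) { current = root; }

  boolean moveToChild() {
    if (current.kids.length == 0) return false; // leaf
    current = current.kids[0];
    return true;
  }
  boolean moveToSibling() {
    Node p = current.parent;
    if (p == null || current.index + 1 == p.kids.length) return false; // last child or root
    current = p.kids[current.index + 1];
    return true;
  }
  boolean moveToParent() {
    if (current.parent == null) return false; // root
    current = current.parent;
    return true;
  }
  int min() { return current.min; }
  int max() { return current.max; }

  /** Both visit methods walk the subtree below the current node without moving the cursor. */
  void visitDocIDs(IntConsumer v) { walk(current, (doc, value) -> v.accept(doc)); }
  void visitDocValues(DocValueConsumer v) { walk(current, v); }
  private static void walk(Node n, DocValueConsumer v) {
    for (int i = 0; i < n.docs.length; i++) v.visit(n.docs[i], n.values[i]);
    for (Node k : n.kids) walk(k, v);
  }
}

final class RangeSearch {
  /** Prune, bulk-collect, or descend, depending on how the current cell relates to the query. */
  static void intersect(MiniPointTree tree, BiFunction<Integer, Integer, Rel> compare,
      IntConsumer inside, DocValueConsumer crosses) {
    switch (compare.apply(tree.min(), tree.max())) {
      case CELL_OUTSIDE_QUERY:
        return; // skip the whole branch
      case CELL_INSIDE_QUERY:
        tree.visitDocIDs(inside); // every doc matches, values need not be checked
        return;
      default: // CELL_CROSSES_QUERY
        if (tree.moveToChild()) {
          do { intersect(tree, compare, inside, crosses); } while (tree.moveToSibling());
          tree.moveToParent(); // restore the cursor for the caller
        } else {
          tree.visitDocValues(crosses); // leaf: check every value
        }
    }
  }

  static List<Integer> range(Node root, int lo, int hi) { // docs with value in [lo, hi]
    List<Integer> hits = new ArrayList<>();
    BiFunction<Integer, Integer, Rel> compare = (min, max) ->
        max < lo || min > hi ? Rel.CELL_OUTSIDE_QUERY
            : min >= lo && max <= hi ? Rel.CELL_INSIDE_QUERY : Rel.CELL_CROSSES_QUERY;
    intersect(new MiniPointTree(root), compare, hits::add,
        (doc, value) -> { if (value >= lo && value <= hi) hits.add(doc); });
    return hits;
  }
}
```

For leaves holding values `{1, 2, 3}`, `{5, 6}` and `{8, 9}`, `range(root, 2, 6)` inspects values only in the first (crossing) leaf, collects the second (inside) leaf through `visitDocIDs` alone, and never touches the third (outside) leaf. Passing plain lambdas for the inside and crossing cases is the kind of call site the two proposed functional interfaces would allow.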
[jira] [Created] (LUCENE-10269) Add the ability to read KD trees from right to left
Ignacio Vera created LUCENE-10269: - Summary: Add the ability to read KD trees from right to left Key: LUCENE-10269 URL: https://issues.apache.org/jira/browse/LUCENE-10269 Project: Lucene - Core Issue Type: Improvement Reporter: Ignacio Vera In LUCENE-9820 we exposed a programmatic API to navigate Lucene KD trees. Currently it is only possible to navigate those trees from left to right, via the methods #moveToChild and #moveToSibling. In LUCENE-10262 we improved the KD tree to remove the constraint of always reading the tree forward, which makes it possible to introduce an API to read the tree from right to left. This would allow, for example, getting the maximum value for a dimension stored in a KD tree that contains deleted documents. The idea would be something like:
{code:java}
/**
 * Move to the first child node and return {@code true} upon success. Returns {@code false} for
 * leaf nodes and {@code true} otherwise.
 */
boolean moveToFirstChild() throws IOException;

/**
 * Move to the next sibling node and return {@code true} upon success. Returns {@code false} if
 * the current node is the last child.
 */
boolean moveToNextSibling() throws IOException;

/**
 * Move to the last child node and return {@code true} upon success. Returns {@code false} for
 * leaf nodes and {@code true} otherwise.
 */
boolean moveToLastChild() throws IOException;

/**
 * Move to the previous sibling node and return {@code true} upon success. Returns {@code false}
 * if the current node is the first child.
 */
boolean moveToPreviousSibling() throws IOException;
{code}
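A minimal sketch of how the proposed `moveToLastChild` and `moveToPreviousSibling` could find the maximum value held by a live document, assuming a one-dimensional tree whose leaves are ordered by value and whose values are sorted within each leaf. `KdNode`, `ReverseCursor` and `MaxLiveValue` are hypothetical, and live documents are modeled as an `IntPredicate` rather than Lucene's `Bits`. The search descends through last children first and only backs up to a previous sibling when a whole subtree has no live document, so when the largest values are not deleted it reads a single leaf.

```java
import java.util.OptionalInt;
import java.util.function.IntPredicate;

/** Hypothetical 1-D node; leaves hold (docID, value) pairs sorted by value. */
final class KdNode {
  final KdNode[] kids;
  final int[] docs, values;
  KdNode parent;
  int index; // position among the parent's children

  KdNode(int[] docs, int[] values) { // leaf
    this.kids = new KdNode[0];
    this.docs = docs;
    this.values = values;
  }

  KdNode(KdNode... kids) { // inner node, children ordered by value
    this.kids = kids;
    this.docs = new int[0];
    this.values = new int[0];
    for (int i = 0; i < kids.length; i++) {
      kids[i].parent = this;
      kids[i].index = i;
    }
  }
}

/** Hypothetical cursor exposing the right-to-left moves proposed in this issue. */
final class ReverseCursor {
  private KdNode current;

  ReverseCursor(KdNode root) { current = root; }

  KdNode node() { return current; }

  boolean moveToLastChild() {
    if (current.kids.length == 0) return false; // leaf
    current = current.kids[current.kids.length - 1];
    return true;
  }

  boolean moveToPreviousSibling() {
    if (current.parent == null || current.index == 0) return false; // root or first child
    current = current.parent.kids[current.index - 1];
    return true;
  }

  boolean moveToParent() {
    if (current.parent == null) return false; // root
    current = current.parent;
    return true;
  }
}

final class MaxLiveValue {
  /** Returns the largest value held by a live doc, or empty if every doc is deleted. */
  static OptionalInt find(KdNode root, IntPredicate liveDocs) {
    return search(new ReverseCursor(root), liveDocs);
  }

  private static OptionalInt search(ReverseCursor cursor, IntPredicate liveDocs) {
    if (cursor.moveToLastChild()) {
      OptionalInt result;
      do {
        result = search(cursor, liveDocs); // rightmost subtrees hold the largest values
      } while (!result.isPresent() && cursor.moveToPreviousSibling());
      cursor.moveToParent(); // restore the cursor for the caller
      return result;
    }
    KdNode leaf = cursor.node(); // leaf: scan values from largest to smallest
    for (int i = leaf.values.length - 1; i >= 0; i--) {
      if (liveDocs.test(leaf.docs[i])) return OptionalInt.of(leaf.values[i]);
    }
    return OptionalInt.empty();
  }
}
```

With only `moveToChild`/`moveToSibling`, the same answer would require visiting leaves from the left and remembering the last live value seen, which is the cost the right-to-left API avoids.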
[jira] [Commented] (LUCENE-10266) Move nearest-neighbor search on points to core?
[ https://issues.apache.org/jira/browse/LUCENE-10266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450296#comment-17450296 ] Ignacio Vera commented on LUCENE-10266: --- +1 > Move nearest-neighbor search on points to core? > --- > > Key: LUCENE-10266 > URL: https://issues.apache.org/jira/browse/LUCENE-10266 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > > Now that the Points' public API supports running nearest-neighbor > search, should we move it to core via helper methods on {{LatLonPoint}} and > {{XYPoint}}?
[GitHub] [lucene] rmuir opened a new pull request #485: LUCENE-10010: don't determinize in CompiledAutomaton/RunAutomaton
rmuir opened a new pull request #485: URL: https://github.com/apache/lucene/pull/485 Instead, require that incoming automata are determinized by the caller, throwing an exception if they are not. This paves the way for NFA execution in the future: if you pass an NFA to AutomatonQuery, we should use the NFA algorithm on it. No need for lots of booleans or enums. The idea is that we clean this one up and fold it into the main LUCENE-10010 PR, to keep the APIs simple. But we could also merge it independently first.
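The contract described in this PR can be sketched as a precondition check: the runner refuses non-deterministic input and leaves determinization to the caller. `SimpleAutomaton` and `DfaRunner` are hypothetical, use single-character labels without epsilon transitions, and are not Lucene's `Automaton`, `CompiledAutomaton` or `RunAutomaton` classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical automaton over single-character labels, without epsilon transitions. */
final class SimpleAutomaton {
  final int start;
  final Set<Integer> accept = new HashSet<>();
  final Map<Integer, List<int[]>> transitions = new HashMap<>(); // state -> {label, dest} pairs

  SimpleAutomaton(int start) {
    this.start = start;
  }

  SimpleAutomaton addTransition(int from, char label, int to) {
    transitions.computeIfAbsent(from, k -> new ArrayList<>()).add(new int[] {label, to});
    return this;
  }

  SimpleAutomaton addAccept(int state) {
    accept.add(state);
    return this;
  }

  /** Deterministic iff no state has two outgoing transitions with the same label. */
  boolean isDeterministic() {
    for (List<int[]> out : transitions.values()) {
      Set<Integer> labels = new HashSet<>();
      for (int[] t : out) {
        if (!labels.add(t[0])) return false;
      }
    }
    return true;
  }
}

/** Hypothetical runner that rejects non-deterministic input instead of determinizing it. */
final class DfaRunner {
  private final SimpleAutomaton dfa;

  DfaRunner(SimpleAutomaton automaton) {
    if (!automaton.isDeterministic()) {
      throw new IllegalArgumentException("automaton must be determinized by the caller");
    }
    this.dfa = automaton;
  }

  /** Follows the single matching transition for each character; no match means rejection. */
  boolean run(String input) {
    int state = dfa.start;
    for (char c : input.toCharArray()) {
      int next = -1; // assumes state numbers are non-negative
      for (int[] t : dfa.transitions.getOrDefault(state, List.of())) {
        if (t[0] == c) {
          next = t[1];
          break;
        }
      }
      if (next == -1) return false;
      state = next;
    }
    return dfa.accept.contains(state);
  }
}
```

The design point is that the expensive step never happens implicitly: a caller holding an NFA either determinizes it explicitly or, once an NFA runner exists, hands it to that path instead.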
[GitHub] [lucene] rmuir commented on pull request #225: LUCENE-10010 Introduce NFARunAutomaton to run NFA directly
rmuir commented on pull request #225: URL: https://github.com/apache/lucene/pull/225#issuecomment-981447584 I made a quick prototype of what I mean for the API: https://github.com/apache/lucene/pull/485 The idea is that AutomatonQuery shouldn't be determinizing; let's push this to the caller. If they pass it a DFA, it uses the DFA algorithm. If they pass it an NFA, it can use the NFA algorithm (in my branch it currently throws an exception instead of slowly determinizing; that is the change).
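Running an NFA directly, instead of determinizing it up front, can be sketched by tracking the set of states reachable after each input character. The `Nfa` class below is hypothetical (single-character labels, no epsilon transitions) and is not the `NFARunAutomaton` from PR #225; it only shows the general technique.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical NFA over single-character labels, without epsilon transitions. */
final class Nfa {
  private final int start;
  private final Set<Integer> accept = new HashSet<>();
  // state -> label -> every destination reachable on that label (possibly several)
  private final Map<Integer, Map<Character, List<Integer>>> delta = new HashMap<>();

  Nfa(int start) {
    this.start = start;
  }

  Nfa addTransition(int from, char label, int to) {
    delta.computeIfAbsent(from, k -> new HashMap<>())
        .computeIfAbsent(label, k -> new ArrayList<>())
        .add(to);
    return this;
  }

  Nfa addAccept(int state) {
    accept.add(state);
    return this;
  }

  /** Runs the NFA without determinizing it: keep the set of states reachable so far. */
  boolean run(String input) {
    Set<Integer> active = Set.of(start);
    for (char c : input.toCharArray()) {
      Set<Integer> next = new HashSet<>();
      for (int s : active) {
        next.addAll(delta.getOrDefault(s, Map.of()).getOrDefault(c, List.of()));
      }
      if (next.isEmpty()) return false; // no path survives this character
      active = next;
    }
    for (int s : active) {
      if (accept.contains(s)) return true;
    }
    return false;
  }
}
```

For example, with transitions `0 -a-> 0`, `0 -b-> 0`, `0 -a-> 1`, `1 -b-> 2` and accept state `2` (strings ending in "ab"), `run("aab")` returns true and `run("aba")` returns false. Each character costs work proportional to the number of active states, trading the up-front and potentially exponential subset construction for per-character work.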
[GitHub] [lucene] iverase merged pull request #428: LUCENE-9538: Detect polygon self-intersections in the Tessellator
iverase merged pull request #428: URL: https://github.com/apache/lucene/pull/428
[jira] [Commented] (LUCENE-9538) Tessellator should provide a better error message for self-intersecting shapes
[ https://issues.apache.org/jira/browse/LUCENE-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450310#comment-17450310 ] ASF subversion and git services commented on LUCENE-9538: - Commit 78c8d7b7ea6aca2202c5eeffcc19e837279721c6 in lucene's branch refs/heads/main from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=78c8d7b ] LUCENE-9538: Detect polygon self-intersections in the Tessellator (#428) Detect self-intersections so it can provide a more meaningful error to the users. > Tessellator should provide a better error message for self-intersecting shapes > -- > > Key: LUCENE-9538 > URL: https://issues.apache.org/jira/browse/LUCENE-9538 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Self-intersecting shapes cannot be tessellated and currently throw a generic error > like: > > > {code:java} > Unable to Tessellate shape...{code} > > For self-intersecting shapes we can do better and give a more > useful message by detecting the position of the self-intersection and providing that > information to the user.
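A minimal sketch of detecting a self-intersection and reporting where it happens, using the standard orientation (cross-product) test on every pair of non-adjacent edges. This is an O(n²) illustration, not the Tessellator's actual algorithm; `SelfIntersection` is hypothetical, the ring lists each vertex once (the closing edge is implicit), and collinear overlaps are ignored for brevity.

```java
/** Hypothetical checker; the ring lists each vertex once and the closing edge is implicit. */
final class SelfIntersection {
  /** Cross product (b - a) x (c - a): positive for a left turn, negative for a right turn. */
  private static double orient(double ax, double ay, double bx, double by, double cx, double cy) {
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
  }

  /**
   * Returns the first point where two non-adjacent edges properly cross, or null if there is
   * none. Touching endpoints and collinear overlaps are ignored to keep the sketch short.
   */
  static double[] find(double[] xs, double[] ys) {
    int n = xs.length;
    for (int i = 0; i < n; i++) {
      int i2 = (i + 1) % n;
      for (int j = i + 1; j < n; j++) {
        int j2 = (j + 1) % n;
        if (j == i2 || j2 == i) continue; // adjacent edges always share a vertex
        double d1 = orient(xs[i], ys[i], xs[i2], ys[i2], xs[j], ys[j]);
        double d2 = orient(xs[i], ys[i], xs[i2], ys[i2], xs[j2], ys[j2]);
        double d3 = orient(xs[j], ys[j], xs[j2], ys[j2], xs[i], ys[i]);
        double d4 = orient(xs[j], ys[j], xs[j2], ys[j2], xs[i2], ys[i2]);
        // Proper crossing: each edge's endpoints lie strictly on opposite sides of the other.
        if (d1 * d2 < 0 && d3 * d4 < 0) {
          double t = d3 / (d3 - d4); // fraction along edge i where it meets edge j
          return new double[] {xs[i] + t * (xs[i2] - xs[i]), ys[i] + t * (ys[i2] - ys[i])};
        }
      }
    }
    return null;
  }

  /** Fails with the crossing position instead of a generic "Unable to Tessellate shape" error. */
  static void check(double[] xs, double[] ys) {
    double[] p = find(xs, ys);
    if (p != null) {
      throw new IllegalArgumentException("Polygon self-intersects at x=" + p[0] + ", y=" + p[1]);
    }
  }
}
```

For the bow-tie `(0,0), (2,2), (2,0), (0,2)`, `find` returns `{1.0, 1.0}` and `check` reports `x=1.0, y=1.0`; for the unit square `find` returns `null`.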
[jira] [Commented] (LUCENE-9538) Tessellator should provide a better error message for self-intersecting shapes
[ https://issues.apache.org/jira/browse/LUCENE-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450311#comment-17450311 ] ASF subversion and git services commented on LUCENE-9538: - Commit 70243ea81151335183773944607164bb1c2b4ece in lucene's branch refs/heads/branch_9x from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=70243ea ] LUCENE-9538: Detect polygon self-intersections in the Tessellator (#428) Detect self-intersections so it can provide a more meaningful error to the users.
[jira] [Resolved] (LUCENE-9538) Tessellator should provide a better error message for self-intersecting shapes
[ https://issues.apache.org/jira/browse/LUCENE-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-9538. -- Fix Version/s: 9.1 Assignee: Ignacio Vera Resolution: Fixed
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450327#comment-17450327 ] Uwe Schindler commented on LUCENE-10255: Hi, I would like to bring in one more thing to investigate: the current Lucene modules as of 9.0 are named without full reverse domain names. We should check whether other ASF projects have a "standard" for naming modules. I don't like that the Maven group:artifact name is totally different from the module name. IMHO the Lucene modules should be named with the prefix "org.apache.lucene." instead of plain "lucene.X". The log4j module already uses this pattern, and we should coordinate on that. Maybe ASF has a standard already; I'd ask the "Apache Commons" project how they plan to handle it. Changing the current module names is not a problem, because except for Luke we don't expose the modules in our documentation. As said before, I am in favor of naming the modules "groupid.artifactid" based on the Maven coordinates (joined with "." in between). Thoughts? > Fully embrace the java module system > > > Key: LUCENE-10255 > URL: https://issues.apache.org/jira/browse/LUCENE-10255 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Dawid Weiss >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png > > Time Spent: 3h 40m > Remaining Estimate: 0h > > I've experimented a bit with moving the code to the JMS. It is > _surprisingly difficult_... A PoC that almost passes all checks is here: > https://github.com/dweiss/lucene/tree/jms > Here are my conclusions so far: > * The JMS and Gradle add a lot of complexity (this applies to any > higher-level tooling, including IDEs, I think). For starters, modules have to > be JARs. The effect of this is that what was previously a set of directories > from dependencies now has to be a JAR.
What was previously an incremental > update of a single .class file now ripples throughout the build, recreating > module JARs (ZIPs!)... I didn't realize it at first, but it's a costly thing > to do. I'm not even sure how IDEs handle this issue. > * A Java module contains metadata (such as the module version or main class) > that is completely detached from any source file. These things live in the > class bytecode of the compiled module-info; interestingly, there is no > source-level way to specify them: these class attributes are injected by the > 'jar' tool. Gradle has some fancy on-the-fly ASM conversion filter that > injects them. > * Dependencies between modules will effectively live in two places: in Gradle > build files and in module-info files. And they can go out of sync, although > that's probably easy to catch (since javac would complain about missing classes > during compilation, even if they're on the module path). > * Probably the biggest challenge (not covered in the PoC) is with our custom > javadoc and ecj linter tasks: they see the module-info.java and can't cope > with it. At the same time, there is no easy way to exclude that one > particular file: ecj would have to accept a full set of sources (the command > argument limit will be a problem), and javac can accept a full set of java > sources (via an external file) but then it doesn't copy doc-files properly anymore > (this is probably easier to fix). > * There are differences at runtime that are hard to anticipate: for example, > resource lookups via the class loader no longer work (I fixed this in Luke). > After poking a bit and trying it out, I have to say I have mixed feelings > about moving to the JMS. On the one hand, many things are great: the module > path, module descriptors and access modes. On the other hand, the tooling > tricks required to make it all work make you shiver.
> If anybody wants to play with or improve things on that experimental branch (I > converted Luke to a full module; it works), please be my guest. I have to > sit on this and think about whether it's something I really like or not.
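The module version and main class live only in the compiled `module-info.class`, as the description explains. This minimal sketch uses the JDK's `java.lang.module.ModuleDescriptor` API to show both as descriptor properties that have no counterpart in `module-info.java` syntax. On the command line, these are the attributes the `jar` tool's `--module-version` and `--main-class` options record. The module, package and class names are hypothetical.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Optional;

final class ModuleMetadataSketch {

  // Builds in memory the kind of descriptor that `jar --module-version ... --main-class ...`
  // records in module-info.class; neither attribute comes from a source file here.
  static ModuleDescriptor describe(String moduleName, String version, String mainClass) {
    return ModuleDescriptor.newModule(moduleName)
        .version(version)     // parsed into a ModuleDescriptor.Version
        .mainClass(mainClass) // also adds the main class's package to the module
        .build();
  }

  // Version of the module a class was loaded from. Classes on the classpath belong to the
  // unnamed module, which has no descriptor at all.
  static Optional<String> rawVersionOf(Class<?> clazz) {
    ModuleDescriptor descriptor = clazz.getModule().getDescriptor();
    return descriptor == null ? Optional.empty() : descriptor.rawVersion();
  }
}
```

`describe("example.app", "9.0.0", "org.example.app.Main").toNameAndVersion()` returns `"example.app@9.0.0"`. `rawVersionOf(...)` returns an empty `Optional` in two cases: for classes on the classpath, and for named modules whose descriptor carries no version attribute.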
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450359#comment-17450359 ] Dawid Weiss commented on LUCENE-10255: -- I did that intentionally. I hate those long prefixes. They make life much more complicated and I don't think there's a risk of running into a conflict with anything existing... java -m org.apache.lucene.core sounds way less attractive than just java -m lucene.core.
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450361#comment-17450361 ] Dawid Weiss commented on LUCENE-10255: -- Think of it this way: Java's internal modules don't have the domain prefix either; they rely on the uniqueness of the first part (jdk., java.). I think this is sufficient. No need to be paranoid.
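The point about the JDK's own naming can be checked on a running JVM. This minimal sketch (the class name is hypothetical) groups the boot-layer modules by the part of their name before the first dot. On a typical JDK, the keys are short prefixes such as `java` and `jdk` rather than reverse domain names. The exact set depends on the JDK build and on how the program is launched.

```java
import java.util.Map;
import java.util.TreeMap;

final class BootModulePrefixes {

  // Counts the boot-layer modules of the running JVM by the part of their name before the first '.'.
  static Map<String, Integer> countByFirstSegment() {
    Map<String, Integer> counts = new TreeMap<>();
    for (Module module : ModuleLayer.boot().modules()) {
      String name = module.getName();
      int dot = name.indexOf('.');
      String firstSegment = dot < 0 ? name : name.substring(0, dot);
      counts.merge(firstSegment, 1, Integer::sum);
    }
    return counts;
  }
}
```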
[jira] [Commented] (LUCENE-10267) Gradle does not write module version attribute for modules with zero dependencies
[ https://issues.apache.org/jira/browse/LUCENE-10267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450365#comment-17450365 ] Dawid Weiss commented on LUCENE-10267: -- I see this is a known issue: it was marked as a duplicate of https://github.com/gradle/gradle/issues/17484 > Gradle does not write module version attribute for modules with zero > dependencies > - > > Key: LUCENE-10267 > URL: https://issues.apache.org/jira/browse/LUCENE-10267 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Minor > Attachments: mod-version-repro.zip > >
[GitHub] [lucene] romseygeek commented on pull request #477: LUCENE-10263: Implement Weight.count() on NormsFieldExistsQuery
romseygeek commented on pull request #477: URL: https://github.com/apache/lucene/pull/477#issuecomment-981531806 Have updated; the test is now docCount == maxDoc, which works even in the case that we have deleted docs.
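The `docCount == maxDoc` condition can be written out as plain arithmetic. This is a hedged sketch, not the code in #477, and it assumes the following:
- `docCount` is the number of documents with at least one term in the field, as reported by `Terms.getDocCount()`; it includes deleted documents.
- `maxDoc` counts all documents, deleted ones included.
- `numDocs` counts live documents only.
- The `-1` result follows the `Weight.count()` convention for "cannot be computed cheaply".

```java
final class NormsExistCountSketch {

  /**
   * Hypothetical helper: the number of live docs matched by a "field has norms" query,
   * or -1 if it cannot be derived from index statistics alone.
   */
  static int countFromStats(int docCount, int maxDoc, int numDocs) {
    if (docCount == maxDoc) {
      // Every document, deleted or not, has at least one term in the field and therefore a norm,
      // so every live document matches, whatever the number of deletions.
      return numDocs;
    }
    // Some documents have no terms, but they may still carry a norm (e.g. an empty value),
    // so the statistics alone cannot give an exact count.
    return -1;
  }
}
```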
[jira] [Commented] (LUCENE-10267) Gradle does not write module version attribute for modules with zero dependencies
[ https://issues.apache.org/jira/browse/LUCENE-10267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450447#comment-17450447 ] Jerome Prinet commented on LUCENE-10267: Yes, it was already raised but not scoped yet. Your submission will help to give it more weight.
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450472#comment-17450472 ] Uwe Schindler commented on LUCENE-10255: bq. java -m org.apache.lucene.core sounds way less attractive than just java -m lucene.core This is IMHO not an argument for shorter names: if your own project uses the module system, then you have a module-info.java too, so you can start it without hassle and without specifying any extra options. I would ask around whether there's a standard already; I would really like to see consistent module names. The "java" and "jdk" prefixes are a different case, because Java never had any module names before, whereas Maven has (or had) package prefixes. There have already been discussions about this on the JDK mailing list with Maven people, but I have to find them. I think Maven also uses the artifact coordinates for module names, but I am not 100% sure. Maybe [~rfscholte] has some more information about which community standards have evolved. -1 to using "lucene" as the module name prefix, +1 to using "org.apache.lucene" as the prefix.
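Uwe's argument is that a modular application names its dependencies in its own `module-info.java`. The Lucene module name is therefore written there once rather than typed on the command line. Here is a minimal fragment for a hypothetical application module. `com.example.search` is made up, and the `requires` target shows the prefixed name under discussion, not the plain `lucene.core` naming used as of 9.0.

```java
// module-info.java of a hypothetical application module
module com.example.search {
  // Under the current 9.0 naming this would read: requires lucene.core;
  requires org.apache.lucene.core;
}
```

If the application's own descriptor records a main class, the application can be started with something like `java -p <module path> -m com.example.search`. The Lucene module name then never appears on the command line.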
[GitHub] [lucene] xaviersanchez commented on pull request #461: LUCENE-10248: Spanish Plural Stemmer
xaviersanchez commented on pull request #461: URL: https://github.com/apache/lucene/pull/461#issuecomment-981735988 > Hi @xaviersanchez, this contribution looks great. > > I'll do another pass on review and give some time for others to review as well. > > I did a little investigation at a glance, and I think it is confusing that the current `SpanishMinimalStemmer` is doing aggressive conversions such as `ñ -> n`. I think, as a followup issue, we should `@Deprecated` the `SpanishMinimalStemmer` and point users to this one instead? > > `SpanishMinimalStemmer` is not a typical "upstream" algorithm, with academic papers/study from Snowball or Savoy, and there doesn't seem to be any reason to keep it anymore, except for a legacy index. So we could keep it around for another major release or so but not forever, IMO. Thanks @rmuir for the comment! Yes, I agree we could deprecate SpanishMinimalStemmer and point users to this implementation, since it can cover the same use cases. We implemented this a while ago, so before contributing our code we analyzed the behavior of the different Spanish stemmers to check that we could provide some added value. From our analysis, SpanishMinimalStemmer has some issues and does some quite aggressive text normalization.
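The `ñ -> n` folding is aggressive because Spanish has word pairs that differ only in that letter. *Año* (year) and *ano* (anus) are different words, and folding makes their plurals collide. The sketch below is hypothetical and deliberately simplified. It is neither `SpanishMinimalStemmer` nor the new plural stemmer, just a naive "drop a final s after a vowel" rule applied with and without a folding step.

```java
final class EnyeFoldingDemo {

  // Naive, hypothetical plural rule: drop a trailing 's' that follows a vowel.
  static String stripPlural(String word) {
    int n = word.length();
    if (n > 3 && word.charAt(n - 1) == 's' && "aeiou".indexOf(word.charAt(n - 2)) >= 0) {
      return word.substring(0, n - 1);
    }
    return word;
  }

  // Aggressive normalization of the kind discussed above: fold 'ñ' (U+00F1) to 'n'.
  static String foldEnye(String word) {
    return word.replace('\u00f1', 'n');
  }
}
```

Without folding, "años" and "anos" stem to "año" and "ano". With folding, both become "ano".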
[GitHub] [lucene] rmuir commented on pull request #477: LUCENE-10263: Implement Weight.count() on NormsFieldExistsQuery
rmuir commented on pull request #477: URL: https://github.com/apache/lucene/pull/477#issuecomment-981738439 Let's fix the CHANGES entry now that it works with deleted documents. I'm sad the optimization couldn't work because of a crazy corner case, which raises the question: why does the user care about corner cases of norms? Shouldn't that be an implementation detail? E.g., should we deprecate this `NormsFieldExistsQuery` and create a `TokensExistsQuery` in its place, with both this optimization and the docCount-based optimization (when there are no deleted docs)? It would be faster, so I'd love to know the use case where the user actually cares about low-level stuff like norms.
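The docCount-based shortcut for the proposed (not yet existing) `TokensExistsQuery` can be sketched as plain arithmetic. The sketch assumes the following:
- `docCount` is the number of documents with at least one term in the field, including deleted documents.
- `maxDoc` counts all documents.
- `numDocs` counts live documents.

Such a query matches exactly "has at least one token", so `docCount` is the answer whenever there are no deletions. With deletions, only the case where every document has the field is safe. A result of `-1` means "fall back to iterating". The class name is hypothetical.

```java
final class TokensExistCountSketch {

  /** Hypothetical count logic for a query matching docs with at least one token in a field. */
  static int countFromStats(int docCount, int maxDoc, int numDocs) {
    if (numDocs == maxDoc) {
      // No deletions: docCount is exactly the number of live docs with at least one token.
      return docCount;
    }
    if (docCount == maxDoc) {
      // Every doc has a token, so every live doc matches.
      return numDocs;
    }
    // Deletions plus partial coverage: the statistics include deleted docs, so fall back.
    return -1;
  }
}
```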
[GitHub] [lucene] romseygeek commented on pull request #477: LUCENE-10263: Implement Weight.count() on NormsFieldExistsQuery
romseygeek commented on pull request #477: URL: https://github.com/apache/lucene/pull/477#issuecomment-981755138 For a `TokensExistsQuery`, is the idea that the query part would work the same as for norms, but we just filter out docs with a norm of 0?
[GitHub] [lucene] rmuir commented on pull request #477: LUCENE-10263: Implement Weight.count() on NormsFieldExistsQuery
rmuir commented on pull request #477: URL: https://github.com/apache/lucene/pull/477#issuecomment-981761039 > For a `TokensExistsQuery`, is the idea that the query part would work the same as norms, we just filter out docs with a norm of 0? Yeah, at first at least. It sounds like we need a zero-check, because apparently we put a norm in there when there are no tokens (which seems absolutely insane to me). Maybe we can fix that for a future index version and then remove the zero-check.
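The zero-check amounts to matching only documents whose norm is present and non-zero. This plain-Java sketch works over a hypothetical in-memory model: one nullable norm per document ID, where `null` means "no norm". In Lucene itself, norms are read per segment through `LeafReader.getNormValues(field)`, which returns a `NumericDocValues` iterator. That API is deliberately not used here so that the sketch stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

final class TokensExistFilterSketch {

  /**
   * Hypothetical model: norms[doc] is null when the document has no norm for the field.
   * Returns the doc IDs a "tokens exist" query would match.
   */
  static List<Integer> matchingDocs(Long[] norms) {
    List<Integer> matches = new ArrayList<>();
    for (int doc = 0; doc < norms.length; doc++) {
      Long norm = norms[doc];
      // Per the discussion, a norm of 0 can be present even though the field produced
      // no tokens, so such documents must not match.
      if (norm != null && norm != 0L) {
        matches.add(doc);
      }
    }
    return matches;
  }
}
```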
[GitHub] [lucene] rmuir commented on pull request #477: LUCENE-10263: Implement Weight.count() on NormsFieldExistsQuery
rmuir commented on pull request #477: URL: https://github.com/apache/lucene/pull/477#issuecomment-981765930 Personally, I really feel that if someone wants the empty string to be considered "indexed" in cases like this, they should use KeywordTokenizer/StringField and actually index that empty string. We've certainly suffered lots of pain to support indexing that damn thing; we might as well lean on it for such cases and keep Lucene fast.
[GitHub] [lucene] iverase opened a new pull request #486: LUCENE-9619: Remove IntersectVisitor from PointsTree API
iverase opened a new pull request #486: URL: https://github.com/apache/lucene/pull/486 Introduces two functional interfaces, `DocValuesVisitor` and `DocIdsVisitor`, that are used in the PointTree API instead of the IntersectVisitor. IntersectVisitor now extends those interfaces.
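The shape described in #486 can be sketched roughly as follows. The sketch assumes the two interfaces mirror the existing `IntersectVisitor` callbacks `visit(int docID)` and `visit(int docID, byte[] packedValue)`; the exact signatures in the PR may differ. The types are nested in a hypothetical `VisitorSketch` holder, and `ToyLeaf` is a hypothetical stand-in for a `PointTree` leaf. Each interface has a single abstract method, so callers that only need doc IDs can pass a lambda, while a full `IntersectVisitor` still satisfies both.

```java
import java.util.ArrayList;
import java.util.List;

final class VisitorSketch {

  @FunctionalInterface
  interface DocIdsVisitor {
    void visit(int docID);
  }

  @FunctionalInterface
  interface DocValuesVisitor {
    void visit(int docID, byte[] packedValue);
  }

  /** The full visitor can still be passed anywhere either functional interface is expected. */
  interface IntersectVisitor extends DocIdsVisitor, DocValuesVisitor {}

  /** Hypothetical stand-in for a PointTree leaf holding (docID, packedValue) pairs. */
  static final class ToyLeaf {
    private final int[] docIds;
    private final byte[][] values;

    ToyLeaf(int[] docIds, byte[][] values) {
      this.docIds = docIds;
      this.values = values;
    }

    void visitDocIDs(DocIdsVisitor visitor) {
      for (int docId : docIds) {
        visitor.visit(docId);
      }
    }

    void visitDocValues(DocValuesVisitor visitor) {
      for (int i = 0; i < docIds.length; i++) {
        visitor.visit(docIds[i], values[i]);
      }
    }
  }

  // Collects doc IDs with a lambda instead of implementing a whole IntersectVisitor.
  static List<Integer> collectDocIds(ToyLeaf leaf) {
    List<Integer> docs = new ArrayList<>();
    leaf.visitDocIDs(docId -> docs.add(docId));
    return docs;
  }
}
```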
[GitHub] [lucene] iverase commented on a change in pull request #486: LUCENE-9619: Remove IntersectVisitor from PointsTree API
iverase commented on a change in pull request #486: URL: https://github.com/apache/lucene/pull/486#discussion_r758504244
## File path: lucene/core/src/java/org/apache/lucene/index/PointValues.java
## @@ -323,10 +355,18 @@ default void grow(int count) {}
    */
   public final void intersect(IntersectVisitor visitor) throws IOException {
     final PointTree pointTree = getPointTree();
-    intersect(visitor, pointTree);
+    intersect(wrapIntersectVisitor(visitor), pointTree);
     assert pointTree.moveToParent() == false;
   }
+  /**
+   * Adds the possibility of wrapping a provided {@link IntersectVisitor} in {@link
+   * #intersect(IntersectVisitor)}.
+   */
+  protected IntersectVisitor wrapIntersectVisitor(IntersectVisitor visitor) throws IOException {
+    return visitor;
+  }
Review comment: I added this entry point in order to wrap the IntersectVisitor with an AssertingIntersectVisitor during testing. I don't really like it, but the only other option is to make the intersect method non-final, which I didn't like either.
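The trade-off in this review comment is a common pattern: a `protected` hook called from a `final` template method, instead of making `intersect` itself non-final. Here is a self-contained sketch with hypothetical names (`Values`, `Visitor`, `ArrayValues`, `AssertingArrayValues`), not the real `PointValues` classes. Production code gets the identity wrapper. A test subclass decorates the visitor with a check; the "0 <= docID < maxDoc" check is just an illustrative invariant.

```java
final class WrapHookSketch {

  interface Visitor {
    void visit(int docID);
  }

  abstract static class Values {
    // Final entry point: subclasses cannot change the traversal, only the visitor it receives.
    public final void intersect(Visitor visitor) {
      doIntersect(wrapVisitor(visitor));
    }

    // Hook with an identity default; a test subclass may decorate the visitor.
    protected Visitor wrapVisitor(Visitor visitor) {
      return visitor;
    }

    protected abstract void doIntersect(Visitor visitor);
  }

  /** Hypothetical production implementation over a fixed list of doc IDs. */
  static class ArrayValues extends Values {
    private final int[] docIds;

    ArrayValues(int... docIds) {
      this.docIds = docIds;
    }

    @Override
    protected void doIntersect(Visitor visitor) {
      for (int docId : docIds) {
        visitor.visit(docId);
      }
    }
  }

  /** Test-only subclass: wraps the visitor with an example check (0 <= docID < maxDoc). */
  static final class AssertingArrayValues extends ArrayValues {
    private final int maxDoc;

    AssertingArrayValues(int maxDoc, int... docIds) {
      super(docIds);
      this.maxDoc = maxDoc;
    }

    @Override
    protected Visitor wrapVisitor(Visitor visitor) {
      return docID -> {
        if (docID < 0 || docID >= maxDoc) {
          throw new IllegalStateException("docID out of range: " + docID);
        }
        visitor.visit(docID);
      };
    }
  }
}
```

The hook keeps the traversal itself sealed while giving tests exactly one place to intervene. This is the same reason the PR avoids simply making `intersect` overridable.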
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450554#comment-17450554 ] Dawid Weiss commented on LUCENE-10255: -- Sorry, but I remain unconvinced that typing "org.apache." a million times in various contexts gains you or me anything. Sure, Maven coordinates are an example where this sort of makes sense (because all of the bazillion artifacts live under the same namespace tree). The module system is different, though: there will be no name conflicts there if you shorten the module name to just "lucene". I don't see any gain in prefixing it with anything. On the contrary, adding a prefix is a nuisance if the 'lucene' prefix is sufficiently unique to guarantee no conflicts with anything else. Even in the Maven namespace some people opt for shorter prefixes (including various Apache Commons libraries) [1]. [1] https://repo1.maven.org/maven2/commons-net/commons-net/
[GitHub] [lucene] romseygeek commented on pull request #477: LUCENE-10263: Implement Weight.count() on NormsFieldExistsQuery
romseygeek commented on pull request #477: URL: https://github.com/apache/lucene/pull/477#issuecomment-981775748 One disadvantage of renaming it is that it really does require norms to work; it might be a bit surprising to have a 'TokensExistsQuery' that you run against a field with norms disabled and it doesn't return anything. Or maybe it could throw an exception if the field in question doesn't have norms. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #477: LUCENE-10263: Implement Weight.count() on NormsFieldExistsQuery
rmuir commented on pull request #477: URL: https://github.com/apache/lucene/pull/477#issuecomment-981811821 > One disadvantage of renaming it is that it really does require norms to work; it might be a bit surprising to have a 'TokensExistsQuery' that you run against a field with norms disabled and it doesn't return anything. Or maybe it could throw an exception if the field in question doesn't have norms. +1 to an exception and documenting the restriction. It is crazy that the existing NormsFieldExistsQuery doesn't throw an exception today when FieldInfo.omitNorms is set, and instead silently returns `0`! This is clearly an error, like running a phrase query against a field indexed without positions. I personally think a new name would describe more accurately what the query does (clarifying the semantics so that it can be made faster), and would make more sense to users. We could even document that if you want to count empty strings, you should index empty strings as tokens. I suspect almost nobody cares about this previous empty-string crap; it seems overthought and now hurts our performance because of the way the current query is named/defined.
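The semantic difference being debated can be modeled without Lucene at all. In the sketch below, `FieldExistsModel` and its methods are hypothetical names, not Lucene API: each document either lacks the field or carries it with some number of tokens. Norms-based existence matches every document where the field was indexed, including empty strings that produced zero tokens (as the discussion implies, such documents still carry a norm); a tokens-based check matches only documents with at least one token. The exception on omitted norms models the behaviour proposed above, not what NormsFieldExistsQuery does today.

```java
// Hypothetical model of one segment's values for a single field; not Lucene API.
final class FieldExistsModel {

    // tokenCounts[i] is the number of tokens document i produced for the field,
    // or -1 if document i does not contain the field at all.
    private final int[] tokenCounts;
    private final boolean omitsNorms;

    FieldExistsModel(boolean omitsNorms, int... tokenCounts) {
        this.omitsNorms = omitsNorms;
        this.tokenCounts = tokenCounts.clone();
    }

    // Norms-based existence: every document that indexed the field matches,
    // even if its value (e.g. "") produced zero tokens.
    int countNormsExist() {
        if (omitsNorms) {
            // Proposed behaviour: fail loudly instead of silently returning 0.
            throw new IllegalStateException("field was indexed without norms");
        }
        int count = 0;
        for (int tokens : tokenCounts) {
            if (tokens >= 0) {
                count++;
            }
        }
        return count;
    }

    // Token-based existence: only documents with at least one indexed token match.
    int countTokensExist() {
        int count = 0;
        for (int tokens : tokenCounts) {
            if (tokens > 0) {
                count++;
            }
        }
        return count;
    }
}
```

With values `3, 0, -1, 1` the two counts are 3 and 2; the gap is the single empty-string document, which is exactly what a renamed "tokens exist" query would stop promising to match.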
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450583#comment-17450583 ] Uwe Schindler commented on LUCENE-10255: Hi, please have a look at Maven Central and the good work done by [~sormu...@gmx.de]: this table lists module names and their artifact names, extracted by a script from Maven Central: https://github.com/sormuras/modules/blob/main/doc/Top1000-2020.txt.md (see also the repo: https://github.com/sormuras/modules) Looking at the Top 1000, you will see that all module names that can be found on Maven Central use the package names / coordinate names. If you read the JLS, it recommends "simple names" for packages and modules only for small projects without wide outreach. Here is his conclusion and recommendation: https://sormuras.github.io/blog/2019-08-04-maven-coordinates-and-java-module-names.html So I'll just repeat myself: module names should really be unique. I don't care about 9.0, because it's not officially announced, but when we enable the module system we should use unique names. Or should we also rename all packages in Lucene's source code and strip off "org.apache."?
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450589#comment-17450589 ] Dawid Weiss commented on LUCENE-10255: -- You don't understand me, Uwe. I agree on Maven Central coordinates. I don't agree on full prefixes for module naming. I think "lucene." is unique enough. This is a subjective opinion and I'm really not convinced otherwise. If you want to push for the full prefix, I'll live with it, but I don't agree that it is necessary or useful, or that it solves anything.
[jira] [Comment Edited] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450583#comment-17450583 ] Uwe Schindler edited comment on LUCENE-10255 at 11/29/21, 4:50 PM: --- Hi, please have a look at Maven Central and the good work done by [~sormu...@gmx.de]: this table lists module names and their artifact names, extracted by a script from Maven Central: https://github.com/sormuras/modules/blob/main/doc/Top1000-2020.txt.md (see also the repo: https://github.com/sormuras/modules) Looking at the Top 1000, you will see that all module names that can be found on Maven Central use the package names / coordinate names. If you read the JLS, it recommends "simple names" for packages and modules only for small projects without wide outreach. Here is his conclusion and recommendation: https://sormuras.github.io/blog/2019-08-04-maven-coordinates-and-java-module-names.html So I'll just repeat myself: module names should really be unique. I don't care about 9.0, because it's not officially announced, but when we enable the module system we should use unique names. Or should we also rename all packages in Lucene's source code and strip off "org.apache."?
[jira] [Comment Edited] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450583#comment-17450583 ] Uwe Schindler edited comment on LUCENE-10255 at 11/29/21, 4:59 PM: --- Hi, please have a look at Maven Central and the good work done by [~sor]: this table lists module names and their artifact names, extracted by a script from Maven Central: https://github.com/sormuras/modules/blob/main/doc/Top1000-2020.txt.md (see also the repo: https://github.com/sormuras/modules) Looking at the Top 1000, you will see that all module names that can be found on Maven Central use the package names / coordinate names. If you read the JLS, it recommends "simple names" for packages and modules only for small projects without wide outreach. Here is his conclusion and recommendation: https://sormuras.github.io/blog/2019-08-04-maven-coordinates-and-java-module-names.html So I'll just repeat myself: module names should really be unique. I don't care about 9.0, because it's not officially announced, but when we enable the module system we should use unique names. Or should we also rename all packages in Lucene's source code and strip off "org.apache."?
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450597#comment-17450597 ] Uwe Schindler commented on LUCENE-10255: We can disagree, for sure, but I'd like to get some more opinions. This MUST be a community decision. I gave my well-educated opinion and invite everybody to read this blog post: https://sormuras.github.io/blog/2019-08-04-maven-coordinates-and-java-module-names.html; [~sor] explains very well what a module name should look like. The module names inside java/jdk are short, but the same is true of their package names. There is also the statement: the package names in every module *should* start with the module name (this is not always fully possible, but a good rule is that the module name and the package names should share a common prefix).
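The rule of thumb above (package names start with the module name, or at least share a common prefix with it) is easy to check mechanically. A minimal sketch, assuming the module name and the package names are already known; `PackagePrefixCheck` is a hypothetical helper, and the candidate module names in the tests are illustrations rather than agreed Lucene names.

```java
final class PackagePrefixCheck {

    // True if pkg is the module name itself or nested under it, compared on whole
    // dot-separated segments ("org.example.corex" is NOT under "org.example.core").
    static boolean isUnderModule(String moduleName, String pkg) {
        return pkg.equals(moduleName) || pkg.startsWith(moduleName + ".");
    }

    // Longest common prefix made of whole segments, e.g.
    // ("org.apache.lucene.core", "org.apache.lucene.index") -> "org.apache.lucene".
    // An empty result means the module name and the package share no prefix at all.
    static String commonPrefix(String moduleName, String pkg) {
        String[] a = moduleName.split("\\.");
        String[] b = pkg.split("\\.");
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < Math.min(a.length, b.length) && a[i].equals(b[i]); i++) {
            if (prefix.length() > 0) {
                prefix.append('.');
            }
            prefix.append(a[i]);
        }
        return prefix.toString();
    }
}
```

For a real package such as `org.apache.lucene.index`, a full-prefix candidate like `org.apache.lucene.core` still shares `org.apache.lucene`, while `lucene.core` shares nothing; that is the practical difference between the two naming positions in this thread.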
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450604#comment-17450604 ] Dawid Weiss commented on LUCENE-10255: -- Sure, Uwe. I think I expressed my personal opinion. :) Some of our current module names cannot be converted to JMS module names as-is (anything with a dash). If you want consistency, then the first step would be to rename those modules in the repo.
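The dash problem follows from the rule that a module name must be a legal Java qualified name, so a project or artifact name like `lucene-core` cannot be used verbatim. One way to test a candidate is to pass it to `java.lang.module.ModuleDescriptor.newModule`, which rejects illegal names with an `IllegalArgumentException`. The helper name below is hypothetical and the candidate names in the tests are illustrations only.

```java
import java.lang.module.ModuleDescriptor;

final class ModuleNameCheck {

    // Returns true if the JDK accepts the string as a module name.
    // ModuleDescriptor.newModule throws IllegalArgumentException for illegal names,
    // such as names containing '-', which cannot appear in a Java identifier.
    static boolean isLegalModuleName(String name) {
        try {
            ModuleDescriptor.newModule(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```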
[GitHub] [lucene] rmuir commented on pull request #477: LUCENE-10263: Implement Weight.count() on NormsFieldExistsQuery
rmuir commented on pull request #477: URL: https://github.com/apache/lucene/pull/477#issuecomment-981837493 And btw, I'm not suggesting we do all this crap as part of this PR; the current PR looks fine to me (the optimization it uses is safe).
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450607#comment-17450607 ] Uwe Schindler commented on LUCENE-10255: From the list posted before, there is also an example made by you: "com.carrotsearch.hppc" is the module name of the Maven artifact "com.carrotsearch:hppc". This was exactly my proposal for Lucene, too: https://github.com/carrotsearch/hppc/blob/29ab369adac23a76acae1d08529654b2c2dc59e5/gradle/java/compiler.gradle#L24
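The hppc convention (module name derived from the Maven coordinates, `com.carrotsearch:hppc` → `com.carrotsearch.hppc`) can be written as a tiny function. This is a purely illustrative sketch built on a hypothetical rule the thread does not settle: join `groupId` and `artifactId`, drop an artifact prefix that merely repeats the last `groupId` segment, and turn the remaining dashes into dots because dashes are illegal in module names.

```java
final class CoordinatesToModuleName {

    // Hypothetical rule, for illustration only:
    //   1. split "groupId:artifactId";
    //   2. drop an artifact prefix equal to the last groupId segment
    //      (org.apache.lucene:lucene-core -> core);
    //   3. replace remaining dashes with dots and join with the groupId.
    static String moduleName(String coordinates) {
        int colon = coordinates.indexOf(':');
        if (colon <= 0 || colon == coordinates.length() - 1
                || coordinates.indexOf(':', colon + 1) >= 0) {
            throw new IllegalArgumentException("expected groupId:artifactId, got " + coordinates);
        }
        String groupId = coordinates.substring(0, colon);
        String artifactId = coordinates.substring(colon + 1);

        String lastGroupSegment = groupId.substring(groupId.lastIndexOf('.') + 1);
        if (artifactId.startsWith(lastGroupSegment + "-")) {
            artifactId = artifactId.substring(lastGroupSegment.length() + 1);
        }
        return groupId + "." + artifactId.replace('-', '.');
    }
}
```

Under this rule `com.carrotsearch:hppc` maps to `com.carrotsearch.hppc` and `org.apache.lucene:lucene-analysis-common` to `org.apache.lucene.analysis.common`; other prefix-stripping choices are equally possible.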
[jira] [Comment Edited] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450607#comment-17450607 ]

Uwe Schindler edited comment on LUCENE-10255 at 11/29/21, 5:14 PM:

From the list posted before, there is also an example made by you: "com.carrotsearch.hppc" is the module name of the Maven artifact "com.carrotsearch:hppc". This was exactly my proposal for Lucene, too: https://github.com/carrotsearch/hppc/blob/29ab369adac23a76acae1d08529654b2c2dc59e5/gradle/java/compiler.gradle#L24

was (Author: thetaphi):
From the list posted before there is also an example made you: "com.carrotsearch.hppc" is module name of the maven artifact "com.carrotsearch:hppc". This was exactly also my proposal for Lucene: https://github.com/carrotsearch/hppc/blob/29ab369adac23a76acae1d08529654b2c2dc59e5/gradle/java/compiler.gradle#L24
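For context on the hppc example: when a JAR's manifest declares no `Automatic-Module-Name`, the JDK falls back to deriving a name from the JAR file name. For hppc, that would give plain `hppc` rather than `com.carrotsearch.hppc`. Below is a simplified sketch of that fallback rule as described in the `java.lang.module.ModuleFinder.of` javadoc. The class and method names are hypothetical, and the real JDK additionally validates that the result is a legal module name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified sketch of the JDK's file-name fallback for automatic module names
// (used only when the manifest has no Automatic-Module-Name attribute).
class AutomaticModuleNames {

    // A hyphen followed by digits that start a version, e.g. "-9.0.0" or "-2".
    private static final Pattern VERSION_SUFFIX = Pattern.compile("-(\\d+(\\.|$))");

    static String deriveFromFileName(String jarFileName) {
        String name = jarFileName;
        if (name.endsWith(".jar")) {
            name = name.substring(0, name.length() - ".jar".length());
        }
        // Keep only the part before the first version-looking suffix.
        Matcher m = VERSION_SUFFIX.matcher(name);
        if (m.find()) {
            name = name.substring(0, m.start());
        }
        // Non-alphanumerics become dots; repeated dots collapse; edge dots are trimmed.
        return name.replaceAll("[^A-Za-z0-9]", ".")
                   .replaceAll("\\.{2,}", ".")
                   .replaceAll("^\\.|\\.$", "");
    }

    public static void main(String[] args) {
        System.out.println(deriveFromFileName("hppc-0.9.0.jar"));        // hppc
        System.out.println(deriveFromFileName("lucene-core-9.0.0.jar")); // lucene.core
    }
}
```

Because this derived name depends on how the artifact file happens to be named, libraries such as hppc declare the name explicitly instead.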
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450613#comment-17450613 ]

Dawid Weiss commented on LUCENE-10255:

Mistakes of the youth... I remain unconvinced.
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450620#comment-17450620 ]

Uwe Schindler commented on LUCENE-10255:

Apache TIKA also uses module names according to the spec: https://github.com/apache/tika/blob/9d29536228860860549d89a052673d47c2af75ca/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-xmp-commons/pom.xml#L48

I just figured out that you already added Automatic Module names to the 9.0 release, which are not even hardcoded, but derived through regular expressions/search-and-replace. This was done completely without any announcement, so we have a Lucene release with broken names going out soon. I am glad that nobody cares about modules at the moment...
[jira] [Comment Edited] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450620#comment-17450620 ]

Uwe Schindler edited comment on LUCENE-10255 at 11/29/21, 5:41 PM:

Apache TIKA also uses module names according to the spec: https://github.com/apache/tika/blob/9d29536228860860549d89a052673d47c2af75ca/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-xmp-commons/pom.xml#L48

I just figured out that you already added Automatic Module names to the 9.0 release, which are not even hardcoded, but derived through regular expressions/search-and-replace from the internal gradle project path. This was done completely without any announcement, so we have a Lucene release with broken names going out soon. I am glad that nobody cares about modules at the moment...

was (Author: thetaphi):
Apache TIKA also uses module names according to the spec: https://github.com/apache/tika/blob/9d29536228860860549d89a052673d47c2af75ca/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-xmp-commons/pom.xml#L48 I just figured out that you already added Automatic Module names to the 9.0 release, which are not even hardcoded, but derived through regular expressions/search replace. This was done completely without any announcement, so we have a Lucene release with broken names going out soon. I am glad that nobody takes care about modules at the moment...
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450624#comment-17450624 ]

Dawid Weiss commented on LUCENE-10255:

Uwe... This was done with an announcement on the pull request and the issue. And it's also literally everywhere in the scripts you've reviewed ("-m lucene.luke"). If you really care so much about it and wish to change it to a full prefix, we can still do it; 9.0 is not out yet.
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450625#comment-17450625 ]

Uwe Schindler commented on LUCENE-10255:

bq. I just figured out that you already added Automatic Module names to the 9.0 release, which are not even hardcoded, but derived through regular expressions/search-replace from the internal gradle project path.

This is even riskier: if we decide to remove the top-level ":lucene" Gradle folder, the module name changes and nobody will notice! Everything that is relevant to the source code of downstream users should be explicitly declared (either in module-info.java or in the manifest).
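To illustrate the risk described above, here is a hypothetical sketch of a module name derived from a Gradle project path by plain search-and-replace. This is NOT Lucene's actual build logic; the class name and the exact replacement rule are assumptions. It only shows how such a derived name silently follows any change to the project layout.

```java
// Hypothetical sketch, NOT Lucene's actual build logic: a module name derived
// from a Gradle project path by plain search-and-replace.
class ProjectPathModuleNames {

    static String deriveFromProjectPath(String projectPath) {
        // Strip the leading ':' and turn the remaining separators into dots.
        return projectPath.replaceFirst("^:", "").replace(':', '.');
    }

    public static void main(String[] args) {
        // Current layout: ":lucene:analysis:common" -> "lucene.analysis.common"
        System.out.println(deriveFromProjectPath(":lucene:analysis:common"));
        // Same code after dropping the top-level ":lucene" folder -> "analysis.common"
        System.out.println(deriveFromProjectPath(":analysis:common"));
    }
}
```

Nothing in this derivation fails when the path changes. A name declared explicitly in module-info.java or in the manifest, as suggested above, does not depend on the build layout at all.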
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450627#comment-17450627 ]

Uwe Schindler commented on LUCENE-10255:

bq. This was done with an announcement on the pull request

Changes this important should have been announced with a post on the mailing list!
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450628#comment-17450628 ]

Dawid Weiss commented on LUCENE-10255:

https://issues.apache.org/jira/browse/LUCENE-10234

> These are so important changes that it should have been a post on mailing
> list!

Sure. It wasn't a change, though; it was the introduction of something that wasn't there at all before.
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450630#comment-17450630 ]

Robert Muir commented on LUCENE-10255:
--

{quote}
Sorry but I remain unconvinced that typing a million times "org.apache." in various contexts wins you or me anything.
{quote}

Sorry for the silly question, but I'm trying to understand why you'd need to type it a million times. I too dislike the verbosity of java, but it is my understanding that you might only add it to a module-info.java, like once?

I think as far as specifying stuff on the commandline, it's not a problem, as lucene isn't a commandline application but instead an API. The one app we really ship (luke) has a sh/bat to make it easy.

But because it is an API, I do care that it's easy for users to consume it with the module system. And that also includes making it easy to consume things like analyzers via SPI providers if they are using the module system. I just don't know what that looks like yet (due to my unfamiliarity with the module system), but I'd love to visually see the tradeoffs between say 'lucene.analysis.common' and 'org.apache.lucene.analysis.common' from an "API user" perspective.

> Fully embrace the java module system
>
> Key: LUCENE-10255
> URL: https://issues.apache.org/jira/browse/LUCENE-10255
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Dawid Weiss
> Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> I've experimented a bit trying to move the code to the JMS. It is _surprisingly difficult_... A PoC that almost passes all checks is here:
> https://github.com/dweiss/lucene/tree/jms
> Here are my conclusions so far:
> * The JMS and gradle add a lot of complexity (this applies to any higher-level tooling, including IDEs, I think). For starters, modules have to be JARs. The effect of this is that what was previously a set of directories from dependencies now has to be a JAR. What was previously an incremental update of a single .class file now ripples throughout the build recreating module JARs (ZIPs!)... I didn't realize it at first, but it's a costly thing to do. I'm not even sure how IDEs handle this issue.
> * A Java module contains metadata (such as the module version or main class) that is completely detached from any source file. These things live in the class bytecode of the compiled module-info; interestingly, there is no source-level way to specify it - these class attributes are injected by the 'jar' tool. Gradle has some fancy on-the-fly asm conversion filter that injects it.
> * Dependencies between modules will effectively live in two places: in gradle build files and in module-info files. And they can go out of sync, although it's probably easy to catch (since javac would complain about missing classes during compilation, even if they're in module path).
> * Probably the biggest challenge (not covered in the PoC) is with our custom javadoc and ecj linter tasks - they see the module-info.java and can't cope with it. At the same time, there is no easy way to exclude that one particular file: ecj would have to accept a full set of sources (command argument limit will be a problem), javac can accept a full set of java sources (external file) but then it doesn't copy doc-files properly anymore (this is probably easier to fix).
> * There are differences at runtime that are hard to anticipate - for example resource lookups via class loader no longer work (I fixed this in Luke).
> After poking a bit and trying it out I have to say I have mixed feelings about moving to the JMS. On the one hand, many things are great - the module path, module descriptors and access modes. On the other hand, the tooling tricks required to make it all work make you shiver.
> If anybody wants to play with/improve things on that experimental branch (I converted Luke to a full module - it works), please be my guest. I have to sit on this and think whether it's something I really like or not.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
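The second bullet of the description says a module's version and main class cannot be written in module-info.java and are injected later by the `jar` tool (or by Gradle). As a minimal sketch, assuming JDK 11 or later, `java.lang.module.ModuleDescriptor.Builder` shows that these are simply extra attributes of the same descriptor that holds the `requires` clauses. The module and class names below are illustrative assumptions, not taken from the PoC branch.

```java
import java.lang.module.ModuleDescriptor;

// Sketch only: the module name and `requires` come from module-info.java, while the
// version and main class are descriptor attributes normally added at packaging time
// (for example via `jar --module-version ... --main-class ...`).
final class ModuleMetadataSketch {

    // Builds a descriptor for a Luke-like module; all names here are illustrative assumptions.
    static ModuleDescriptor lukeDescriptor(String version) {
        return ModuleDescriptor.newModule("org.apache.lucene.luke")
                .requires("org.apache.lucene.core")                       // source-level (module-info.java)
                .version(version)                                         // packaging-time attribute
                .mainClass("org.apache.lucene.luke.app.desktop.LukeMain") // packaging-time attribute
                .build();                                                 // java.base is required implicitly
    }

    public static void main(String[] args) {
        ModuleDescriptor d = lukeDescriptor("9.0.0");
        System.out.println(d.toNameAndVersion());           // org.apache.lucene.luke@9.0.0
        System.out.println(d.mainClass().orElse("<none>")); // org.apache.lucene.luke.app.desktop.LukeMain
        // Set iteration order is unspecified; expect org.apache.lucene.core and java.base.
        d.requires().forEach(r -> System.out.println("requires " + r.name()));
    }
}
```

Keeping the packaging-time attributes out of the source is what forces tooling (the `jar` tool or a build plugin) to patch them into the compiled module-info.class.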
[jira] [Reopened] (LUCENE-10234) Add automatic module name to JAR manifests.
[ https://issues.apache.org/jira/browse/LUCENE-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss reopened LUCENE-10234:
--

Uwe seems to be really devoted to changing the automatic module name to the full-prefix convention, so I'm reopening this issue. This will require changes to the build system and the scripts that launch Luke.

[~jpountz] - please cancel the current release candidate; we will have to respin.

> Add automatic module name to JAR manifests.
> ---
>
> Key: LUCENE-10234
> URL: https://issues.apache.org/jira/browse/LUCENE-10234
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Trivial
> Fix For: 9.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> This is the first step to make Lucene a proper fit for the java module system. I chose a shorthand "lucene.[x]" module name convention, without the "org.apache" prefix.
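LUCENE-10234 adds an `Automatic-Module-Name` entry to each JAR manifest. The sketch below uses only JDK APIs (`java.util.jar` and `java.lang.module.ModuleFinder`) to write a throwaway JAR with such an entry and ask the module system what it derives. The file name and module name are illustrative assumptions. Per the `ModuleFinder.of` documentation, the name comes from the manifest attribute while the version is still parsed from the JAR file name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Sketch: what the module system derives from a plain JAR carrying Automatic-Module-Name.
final class AutomaticModuleNameSketch {

    // Writes a class-less JAR into a temp directory and returns the descriptor derived from it.
    static ModuleDescriptor deriveFrom(String jarFileName, String automaticModuleName) {
        try {
            Manifest manifest = new Manifest();
            Attributes attrs = manifest.getMainAttributes();
            // Manifest-Version is required: without it, Manifest.write() emits no main attributes.
            attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
            attrs.putValue("Automatic-Module-Name", automaticModuleName);

            Path jar = Files.createTempDirectory("amn").resolve(jarFileName);
            try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar), manifest)) {
                // No class entries needed; the constructor already wrote META-INF/MANIFEST.MF.
            }
            // Name comes from the manifest attribute; the version is parsed from the file name.
            return ModuleFinder.of(jar).findAll().iterator().next().descriptor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ModuleDescriptor d = deriveFrom("lucene-core-9.0.0.jar", "org.apache.lucene.core");
        // prints: org.apache.lucene.core@9.0.0 automatic=true
        System.out.println(d.toNameAndVersion() + " automatic=" + d.isAutomatic());
    }
}
```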
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450635#comment-17450635 ]

Dawid Weiss commented on LUCENE-10255:
--

I already typed "-m lucene.luke" what seems like half a million times while debugging stuff around the JMS and gradle bugs. So I'm almost there.

Listen... I really don't like the full prefix but I really could care less about it if you all want to stick with the full domain name - let's just fix it, respin the release candidate and be done with it. I did announce the shorthand version on LUCENE-10234; perhaps I should have written an all-caps announcement but I didn't, sorry. Let's do it the way you like it, I really don't care THAT MUCH. I only care a little.
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450639#comment-17450639 ]

Robert Muir commented on LUCENE-10255:
--

My comment was a genuine question, as I don't yet understand how annoying this name will be to API users. I don't yet have any opinion on the color of the bikeshed :)
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450647#comment-17450647 ]

Dawid Weiss commented on LUCENE-10255:
--

It'll be a different prefix in module-info.java "requires xyz" statements and in command-line invocations of Luke. Also, it'll list the Lucene module as "lucene.core@version" instead of "org.apache.lucene@version". I'll provide a PR to go back to the full prefix - Uwe seems to be really determined that this is the right way (tm) of doing it.
[jira] [Comment Edited] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450647#comment-17450647 ]

Dawid Weiss edited comment on LUCENE-10255 at 11/29/21, 6:15 PM:
-

It'll be a different prefix in module-info.java "requires xyz" statements and in command-line invocations of Luke. Also, it'll list the Lucene module as "lucene.core@version" instead of "org.apache.lucene.core@version". I'll provide a PR to go back to the full prefix - Uwe seems to be really determined that this is the right way (tm) of doing it.
[jira] [Updated] (LUCENE-10234) Add automatic module name to JAR manifests.
[ https://issues.apache.org/jira/browse/LUCENE-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated LUCENE-10234:
-
Description: This is the first step to make Lucene a proper fit for the java module system. -I chose a shorthand "lucene.[x]" module name convention, without the "org.apache" prefix.-
(was: This is the first step to make Lucene a proper fit for the java module system. I chose a shorthand "lucene.[x]" module name convention, without the "org.apache" prefix.)
[GitHub] [lucene] uschindler commented on a change in pull request #487: LUCENE-10234: Change module prefix to org.apache.*
uschindler commented on a change in pull request #487:
URL: https://github.com/apache/lucene/pull/487#discussion_r758637522

##
File path: gradle/java/jar-manifest.gradle
##
@@ -66,7 +66,7 @@ subprojects {
 "X-Build-JDK" : "${System.properties['java.version']} (${System.properties['java.vendor']} ${System.properties['java.vm.version']})",
 "X-Build-OS": "${System.properties['os.name']} ${System.properties['os.arch']} ${System.properties['os.version']}",
- "Automatic-Module-Name" : "${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}"
+ "Automatic-Module-Name" : "org.apache.${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}"

Review comment:
Can we do this like here, based on the Maven group? https://github.com/apache/lucene/blob/main/gradle/maven/publications-maven.gradle#L59-L60

Of course, we would need to strip the ":lucene" from the project path.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
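The manifest expression in the diff derives the module name from the Gradle project path: it drops the leading ':', turns the remaining ':' separators into '.', and replaces '-' with '_' (presumably because hyphens are not legal in module names). Below is a small Java sketch of the same string logic, plus a hypothetical version of the Maven-group idea that assumes the group is "org.apache.lucene" and that project paths start with ":lucene:". The linked publications-maven.gradle lines are not reproduced here, so `fromMavenGroup` is not their actual content, and the project paths are illustrative.

```java
import java.lang.module.ModuleDescriptor;

// Sketch of the two naming strategies from the review; this is not the Gradle build itself.
final class ModuleNameDerivation {

    // Mirrors the diff: prefix "org.apache.", drop the first ':', then ':' -> '.' and '-' -> '_'.
    static String fromProjectPath(String projectPath) {
        return "org.apache." + projectPath.replaceFirst(":", "").replace(':', '.').replace("-", "_");
    }

    // Hypothetical take on the suggestion: Maven group + project path minus its ":lucene" prefix.
    static String fromMavenGroup(String group, String projectPath) {
        String rest = projectPath.startsWith(":lucene:")
                ? projectPath.substring(":lucene".length()) // keeps the ':' that follows
                : projectPath;
        return group + rest.replace(':', '.').replace("-", "_");
    }

    // The strict descriptor builder rejects illegal names, such as ones containing '-'.
    static boolean isLegalModuleName(String name) {
        try {
            ModuleDescriptor.newModule(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] paths = {":lucene:core", ":lucene:analysis:common", ":lucene:spatial-extras"};
        for (String path : paths) {
            System.out.println(path + " -> " + fromProjectPath(path)
                    + " | " + fromMavenGroup("org.apache.lucene", path));
        }
    }
}
```

Both strategies yield the same names for paths under ":lucene:"; the Maven-group variant just avoids hard-coding the "org.apache." literal in the manifest block.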
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450680#comment-17450680 ]

Uwe Schindler commented on LUCENE-10255:

bq. Sorry for the silly question, but I'm trying to understand why you'd need to type it a million times. I too dislike the verbosity of java, but it is my understanding that you might only add it to a module-info.java, like once?

Exactly. And for our API users it is not understandable why you must write {{requires lucene.core}} in the module-info.java but {{import org.apache.lucene.xyz.*;}} in all java files. This is inconsistent. And there is the risk of clashes (although Lucene is very special, we will then see other third-party modules also naming their modules like "lucene.foobar.xy", although they have nothing in common with Apache).

We are an Apache project, so our package names, module names and maven artifact names should have the "org.apache.lucene" prefix. This allows consuming them in the way everybody knows: in java files for imports, and when defining your dependencies in Maven or the requires directives in Java modules.

bq. But because it is an API, I do care that it's easy for users to consume it with the module system. And that also includes making it easy to consume things like analyzers via SPI providers if they are using the module system.

Yes, and this will also work with the module system. I tested it after adding correct "uses SPIBaseClass" statements to lucene-core's module-info.java. Theoretically, in addition we can hide everything from analyzers-common except the SPI
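Uwe's argument is that, with full-prefix names, the `requires` directives in a consumer's module-info.java line up with the packages it imports, and that SPI lookups keep working once lucene-core declares `uses` for its service base classes. The sketch below assembles, via `java.lang.module.ModuleDescriptor.Builder`, descriptors equivalent to the module-info.java files shown in its comment. The consumer module `com.example.search` and the use of `org.apache.lucene.codecs.Codec` as the service type are illustrative assumptions, not Lucene's actual module descriptors.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch: descriptors equivalent to these (hypothetical) module-info.java files:
//
//   module com.example.search {            // an API user's module
//     requires org.apache.lucene.core;
//     requires org.apache.lucene.analysis.common;
//   }
//
//   module org.apache.lucene.core {        // only the SPI part is shown
//     uses org.apache.lucene.codecs.Codec;
//   }
final class ModuleInfoSketch {

    static ModuleDescriptor consumer() {
        return ModuleDescriptor.newModule("com.example.search")
                .requires("org.apache.lucene.core")
                .requires("org.apache.lucene.analysis.common")
                .build();
    }

    static ModuleDescriptor core() {
        return ModuleDescriptor.newModule("org.apache.lucene.core")
                .uses("org.apache.lucene.codecs.Codec") // the module calling ServiceLoader declares `uses`
                .build();
    }

    // Declared dependencies, excluding the implicitly added java.base.
    static Set<String> declaredRequires(ModuleDescriptor d) {
        return d.requires().stream()
                .map(ModuleDescriptor.Requires::name)
                .filter(n -> !n.equals("java.base"))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(declaredRequires(consumer()));
        System.out.println(core().uses());
    }
}
```

With the full prefix, the consumer writes `org.apache.lucene...` in both its `requires` lines and its imports, which is the consistency argument above.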
[jira] [Comment Edited] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450680#comment-17450680 ]

Uwe Schindler edited comment on LUCENE-10255 at 11/29/21, 7:02 PM:
---

bq. Sorry for the silly question, but I'm trying to understand why you'd need to type it a million times. I too dislike the verbosity of java, but it is my understanding that you might only add it to a module-info.java, like once?

Exactly. And for our API users it is not understandable why you must write "{{requires lucene.core;}}" in the module-info.java but "{{import org.apache.lucene.xyz.*;}}" in all java files. This is inconsistent! And there is the risk of clashes (although Lucene is very special, we will then see other third-party modules also naming their modules like "lucene.foobar.xy", although they have nothing in common with Apache).

We are an Apache project, so our package names, module names and maven artifact names should have the "org.apache.lucene" prefix. This allows consuming them in the way everybody knows: in java files for imports, and when defining your dependencies in Maven or the requires directives in Java modules.

bq. But because it is an API, I do care that it's easy for users to consume it with the module system. And that also includes making it easy to consume things like analyzers via SPI providers if they are using the module system.

Yes, and this will also work with the module system. I tested it after adding correct "uses SPIBaseClass" statements to lucene-core's module-info.java. Theoretically, in addition we can hide everything from analyzers-common except the SPI
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450690#comment-17450690 ]

Dawid Weiss commented on LUCENE-10255:
--

I accept your arguments, even if I disagree with them, Uwe. I provided a PR to change it already.

> Fully embrace the java module system
>
> Key: LUCENE-10255
> URL: https://issues.apache.org/jira/browse/LUCENE-10255
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Dawid Weiss
> Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> I've experimented a bit trying to move the code to the JMS. It is _surprisingly difficult_... A PoC that almost passes all checks is here:
> https://github.com/dweiss/lucene/tree/jms
> Here are my conclusions so far:
> * The JMS and gradle add a lot of complexity (this applies to any higher-level tooling, including IDEs, I think). For starters, modules have to be JARs. The effect of this is that what was previously a set of directories from dependencies now has to be a JAR. What was previously an incremental update of a single .class file now ripples throughout the build recreating module JARs (ZIPs!)... I didn't realize it at first, but it's a costly thing to do. I'm not even sure how IDEs handle this issue.
> * A Java module contains metadata (such as the module version or main class) that is completely detached from any source file. These things live in a class bytecode of the compiled module-info; interestingly, there is no source-level way to specify it - these class attributes are injected by the 'jar' tool. Gradle has some fancy on-the-fly asm conversion filter that injects it.
> * Dependencies between modules will effectively live in two places: in gradle build files and in module-info files. And they can go out of sync, although it's probably easy to catch (since javac would complain about missing classes during compilation, even if they're in module path).
> * Probably the biggest challenge (not covered in the PoC) are with our custom javadoc and ecj linter tasks - they see the module-info.java and can't cope with it. At the same time, there is no easy way to exclude that one particular file: ecj would have to accept a full set of sources (command argument limit will be a problem), javac can accept a full set of java sources (external file) but then it doesn't copy doc-files properly anymore (this is probably easier to fix).
> * There are differences at runtime that are hard to anticipate - for example resource lookups via class loader no longer work (I fixed this in Luke).
>
> After poking a bit and trying it out I have to say I have mixed feelings about moving to the JMS. On the one hand, many things are great - the module path, module descriptors and access modes. On the other hand, the tooling tricks required to make it all work make you shiver.
> If anybody wants to play/ improve things on that experimental branch (I converted Luke to a full module - it works), please be my guest. I have to sit on this and think whether it's something I really like or not.
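The point that module metadata lives outside any source file can be seen from the JDK's own API: `java.lang.module.ModuleDescriptor` exposes the version and main class that end up in the compiled `module-info.class` (the `jar` tool can record them via its `--module-version` and `--main-class` options). The following is a minimal sketch, assuming JDK 11 or newer; the module name `org.example.demo` and the class `org.example.demo.Main` are hypothetical placeholders, not Lucene code.

```java
import java.lang.module.ModuleDescriptor;

public class ModuleMetadataSketch {

    // Builds a descriptor in memory. In a real build these values become
    // attributes of the compiled module-info.class; module-info.java itself
    // has no syntax for the version or the main class.
    static ModuleDescriptor describe(String name, String version, String mainClass) {
        return ModuleDescriptor.newModule(name)
                .version(version)     // parsed as a ModuleDescriptor.Version
                .mainClass(mainClass) // the main class's package is added to the module
                .build();             // "requires java.base" is added implicitly
    }

    public static void main(String[] args) {
        ModuleDescriptor d = describe("org.example.demo", "1.0.0-SNAPSHOT", "org.example.demo.Main");
        System.out.println(d.name() + " " + d.rawVersion().orElse("?") + " " + d.mainClass().orElse("?"));

        // Classes loaded from the class path live in an unnamed module, which has no descriptor.
        Module self = ModuleMetadataSketch.class.getModule();
        System.out.println("named: " + self.isNamed() + ", descriptor: " + self.getDescriptor());
    }
}
```

The builder is used here only to make the metadata visible; in a Gradle build the same fields would be written into the JAR by the packaging step, which is why they are easy to lose track of.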
[GitHub] [lucene] dweiss commented on a change in pull request #487: LUCENE-10234: Change module prefix to org.apache.*
dweiss commented on a change in pull request #487:
URL: https://github.com/apache/lucene/pull/487#discussion_r758670507

## File path: gradle/java/jar-manifest.gradle
##
@@ -66,7 +66,7 @@ subprojects {
     "X-Build-JDK" : "${System.properties['java.version']} (${System.properties['java.vendor']} ${System.properties['java.vm.version']})",
     "X-Build-OS": "${System.properties['os.name']} ${System.properties['os.arch']} ${System.properties['os.version']}",
-    "Automatic-Module-Name" : "${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}"
+    "Automatic-Module-Name" : "org.apache.${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}"

Review comment:
The problem may be in the order in which all these properties are set (and when) - this is the dark pit I wouldn't want to go into... Manifest attributes should be resolved lazily, but I've had mixed results with this. I'll see if it works.
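The `${-> ...}` form in the diff is a Groovy GString that interpolates a zero-argument closure, which is evaluated when the string is rendered rather than when it is defined. That is what makes lazy resolution of the manifest attribute possible. Below is a rough Java analogue, assuming a hypothetical `FakeProject` whose path is assigned only later during configuration; it illustrates the timing issue and is not Gradle's API.

```java
import java.util.function.Supplier;

public class LazyAttributeSketch {

    // Hypothetical stand-in for a build project whose path is only known after configuration.
    static final class FakeProject {
        String path = "<unset>";
    }

    // Rough analogue of Groovy's "${-> expr}": the expression is re-evaluated on every toString().
    static final class LazyString {
        private final Supplier<String> supplier;

        LazyString(Supplier<String> supplier) {
            this.supplier = supplier;
        }

        @Override
        public String toString() {
            return supplier.get();
        }
    }

    public static void main(String[] args) {
        FakeProject project = new FakeProject();
        String eager = "org.apache." + project.path;                          // evaluated immediately
        LazyString lazy = new LazyString(() -> "org.apache." + project.path); // evaluated on demand
        project.path = "lucene.core";                                         // configured later
        System.out.println(eager); // prints org.apache.<unset>
        System.out.println(lazy);  // prints org.apache.lucene.core
    }
}
```

The eager value freezes whatever the property held at definition time, while the lazy value picks up later changes. That ordering sensitivity is the "dark pit" the comment refers to.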
[GitHub] [lucene] jtibshirani commented on a change in pull request #416: LUCENE-10054 Make HnswGraph hierarchical
jtibshirani commented on a change in pull request #416:
URL: https://github.com/apache/lucene/pull/416#discussion_r758678766

## File path: lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java
##
@@ -56,31 +59,50 @@ public final class HnswGraph extends KnnGraphValues {
 
   private final int maxConn;
+  private int numLevels; // the current number of levels in the graph
+  private int entryNode; // the current graph entry node on the top level
 
-  // Each entry lists the top maxConn neighbors of a node. The nodes correspond to vectors added to
-  // HnswBuilder, and the
-  // node values are the ordinals of those vectors.
-  private final List<NeighborArray> graph;
+  // Nodes by level expressed as the level 0's nodes' ordinals.
+  // As level 0 contains all nodes, nodesByLevel.get(0) is null.
+  private final List<int[]> nodesByLevel;
+
+  // graph is a list of graph levels.
+  // Each level is represented as List<NeighborArray> – nodes' connections on this level.
+  // Each entry in the list has the top maxConn neighbors of a node. The nodes correspond to vectors
+  // added to HnswBuilder, and the node values are the ordinals of those vectors.
+  // Thus, on all levels, neighbors expressed as the level 0's nodes' ordinals.
+  private final List<List<NeighborArray>> graph;
 
   // KnnGraphValues iterator members
   private int upto;
   private NeighborArray cur;
 
-  HnswGraph(int maxConn) {
-    graph = new ArrayList<>();
-    // Typically with diversity criteria we see nodes not fully occupied; average fanout seems to be
-    // about 1/2 maxConn. There is some indexing time penalty for under-allocating, but saves RAM
-    graph.add(new NeighborArray(Math.max(32, maxConn / 4)));
+  HnswGraph(int maxConn, int levelOfFirstNode) {
     this.maxConn = maxConn;
+    this.numLevels = levelOfFirstNode + 1;
+    this.graph = new ArrayList<>(numLevels);
+    this.entryNode = 0;
+    for (int i = 0; i < numLevels; i++) {
+      graph.add(new ArrayList<>());
+      // Typically with diversity criteria we see nodes not fully occupied;
+      // average fanout seems to be about 1/2 maxConn.
+      // There is some indexing time penalty for under-allocating, but saves RAM
+      graph.get(i).add(new NeighborArray(Math.max(32, maxConn / 4)));
+    }
+
+    this.nodesByLevel = new ArrayList<>(numLevels);
+    nodesByLevel.add(null); // we don't need this for 0th level, as it contains all nodes
+    for (int l = 1; l < numLevels; l++) {
+      nodesByLevel.add(new int[] {0});
+    }
   }
 
   /**
-   * Searches for the nearest neighbors of a query vector.
+   * Searches HNSW graph for the nearest neighbors of a query vector.
    *
    * @param query search query vector
    * @param topK the number of nodes to be returned
-   * @param numSeed the size of the queue maintained while searching, and controls the number of
-   * random entry points to sample
+   * @param numSeed the size of the queue maintained while searching

Review comment:
It works for me to have a separate discussion. Maybe at least in this PR we can rename this to `numCandidates`, since the 'seed' naming no longer makes sense?

As context, I still think it makes sense to remove the `numCandidates` vs. `k` distinction in `HnswGraph`. The public signature `KnnVectorsReader#search` does not include a notion of "num candidates", so users have no way to even use this distinction. I'd be in favor of removing it from `HnswGraph`, then having a follow-up discussion about whether the vector search APIs should handle `numCandidates` vs `k`.
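The layout described by the new comments can be sketched in plain Java: level 0 implicitly contains every node, each higher level keeps a sorted array of level-0 ordinals (`nodesByLevel`), and neighbor lists on every level store level-0 ordinals. This is a simplified, hypothetical illustration: it uses `List<Integer>` instead of Lucene's `NeighborArray`, ignores `maxConn`, diversity and the entry node, and the binary-search lookup is an assumption about how a node's position on a level could be found, not necessarily what `HnswGraph` does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LayeredGraphSketch {
    // nodesByLevel.get(0) is null: level 0 implicitly holds every node 0..size-1.
    // For level > 0 it is a sorted array of the level-0 ordinals present on that level.
    private final List<int[]> nodesByLevel = new ArrayList<>();
    // graph.get(level).get(i) = neighbors of the i-th node on that level, as level-0 ordinals.
    private final List<List<List<Integer>>> graph = new ArrayList<>();
    private int size;

    /** Creates a graph whose first node (ordinal 0) lives on levels 0..levelOfFirstNode. */
    public LayeredGraphSketch(int levelOfFirstNode) {
        for (int l = 0; l <= levelOfFirstNode; l++) {
            graph.add(new ArrayList<>());
            graph.get(l).add(new ArrayList<>());
            nodesByLevel.add(l == 0 ? null : new int[] {0});
        }
        size = 1;
    }

    public int numLevels() {
        return graph.size();
    }

    /** Adds the next node (ordinal == current size) to levels 0..level and returns its ordinal. */
    public int addNode(int level) {
        int node = size++;
        while (numLevels() <= level) { // open a new, initially empty top level
            graph.add(new ArrayList<>());
            nodesByLevel.add(new int[0]);
        }
        for (int l = 0; l <= level; l++) {
            graph.get(l).add(new ArrayList<>());
            if (l > 0) {
                int[] old = nodesByLevel.get(l);
                int[] grown = Arrays.copyOf(old, old.length + 1);
                grown[old.length] = node; // ordinals arrive in increasing order, so this stays sorted
                nodesByLevel.set(l, grown);
            }
        }
        return node;
    }

    /** Position of a level-0 ordinal on the given level, or a negative value if it is absent. */
    public int indexOnLevel(int level, int node) {
        return level == 0 ? node : Arrays.binarySearch(nodesByLevel.get(level), node);
    }

    /** Adds an undirected edge; both nodes must already exist on the level. */
    public void connect(int level, int a, int b) {
        neighbors(level, a).add(b);
        neighbors(level, b).add(a);
    }

    public List<Integer> neighbors(int level, int node) {
        return graph.get(level).get(indexOnLevel(level, node));
    }

    public static void main(String[] args) {
        LayeredGraphSketch g = new LayeredGraphSketch(1); // node 0 on levels 0 and 1
        g.addNode(0);                                     // node 1 on level 0 only
        g.addNode(2);                                     // node 2 on levels 0..2
        g.connect(1, 0, 2);
        System.out.println(Arrays.toString(g.nodesByLevel.get(1))); // [0, 2]
        System.out.println(g.neighbors(1, 2));                      // [0]
    }
}
```

Growing the per-level arrays by copying keeps the sketch short but costs O(n) per insert; a real implementation would amortize this.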
[GitHub] [lucene-solr] madrob merged pull request #2615: SOLR-14412 NPE in MetricsHistoryHandler
madrob merged pull request #2615:
URL: https://github.com/apache/lucene-solr/pull/2615
[GitHub] [lucene] dweiss commented on a change in pull request #487: LUCENE-10234: Change module prefix to org.apache.*
dweiss commented on a change in pull request #487:
URL: https://github.com/apache/lucene/pull/487#discussion_r758689212

## File path: gradle/java/jar-manifest.gradle

Review comment:
Ok, take a look now. Try: gradlew showModuleNames
[GitHub] [lucene] dweiss commented on a change in pull request #487: LUCENE-10234: Change module prefix to org.apache.*
dweiss commented on a change in pull request #487:
URL: https://github.com/apache/lucene/pull/487#discussion_r758691210

## File path: gradle/java/jar-manifest.gradle

Review comment:
There is a functional change piggybacked in the last commit - javadoc and source JARs no longer receive an automatic module name. I consider it a fix of something that wasn't right (these JARs are not modules).
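One way to confirm which JARs carry the attribute is to read it back from the manifest. Below is a small sketch using `java.util.jar.Manifest`; the manifest texts are made up for illustration, and for a real file you would typically obtain the manifest via `JarFile.getManifest()`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ManifestCheckSketch {
    static final Attributes.Name AUTOMATIC_MODULE_NAME = new Attributes.Name("Automatic-Module-Name");

    // Returns the Automatic-Module-Name main attribute, or null if the manifest has none.
    static String automaticModuleName(String manifestText) {
        try {
            Manifest manifest = new Manifest(
                    new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
            return manifest.getMainAttributes().getValue(AUTOMATIC_MODULE_NAME);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String libraryJar = "Manifest-Version: 1.0\nAutomatic-Module-Name: org.apache.lucene.core\n\n";
        String javadocJar = "Manifest-Version: 1.0\n\n";
        System.out.println(automaticModuleName(libraryJar)); // org.apache.lucene.core
        System.out.println(automaticModuleName(javadocJar)); // null
    }
}
```

The same check applies later in the thread: once a JAR ships a real `module-info.class`, its manifest should no longer need this attribute.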
[GitHub] [lucene] uschindler commented on a change in pull request #487: LUCENE-10234: Change module prefix to org.apache.*
uschindler commented on a change in pull request #487:
URL: https://github.com/apache/lucene/pull/487#discussion_r758697734

## File path: gradle/java/jar-manifest.gradle

Review comment:
Will check soon. I am not yet sure what the local name should look like.
[GitHub] [lucene] dweiss commented on a change in pull request #487: LUCENE-10234: Change module prefix to org.apache.*
dweiss commented on a change in pull request #487:
URL: https://github.com/apache/lucene/pull/487#discussion_r758700461

## File path: gradle/java/jar-manifest.gradle

Review comment:
Sure. Let me know (or commit the changes to this PR).
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450707#comment-17450707 ]

Christian Stein commented on LUCENE-10255:
--

FWIW, I agree with Uwe on the naming topic and want to add "prior art" samples from other `org.apache.*` projects that already ship Java modules whose names start with `org.apache.`: Derby, Felix, POI, Tomcat, and Wicket.
https://github.com/sormuras/modules/blob/be524907f29f60c7895b3cde62850a1937969ad7/com.github.sormuras.modules/com/github/sormuras/modules/modules.properties#L2480-L2551
[GitHub] [lucene] uschindler commented on a change in pull request #487: LUCENE-10234: Change module prefix to org.apache.*
uschindler commented on a change in pull request #487:
URL: https://github.com/apache/lucene/pull/487#discussion_r758715684

## File path: gradle/java/jar-manifest.gradle

Review comment:
Hi, so it looks like this:

```
> Task :showModuleNames
lucene-benchmark-10.0.0-SNAPSHOT.jar -> org.apache.lucene.benchmark
lucene-backward-codecs-10.0.0-SNAPSHOT.jar -> org.apache.lucene.backward_codecs
lucene-classification-10.0.0-SNAPSHOT.jar -> org.apache.lucene.classification
lucene-codecs-10.0.0-SNAPSHOT.jar -> org.apache.lucene.codecs
lucene-core-10.0.0-SNAPSHOT.jar -> org.apache.lucene.core
lucene-demo-10.0.0-SNAPSHOT.jar -> org.apache.lucene.demo
lucene-expressions-10.0.0-SNAPSHOT.jar -> org.apache.lucene.expressions
lucene-facet-10.0.0-SNAPSHOT.jar -> org.apache.lucene.facet
lucene-grouping-10.0.0-SNAPSHOT.jar -> org.apache.lucene.grouping
lucene-highlighter-10.0.0-SNAPSHOT.jar -> org.apache.lucene.highlighter
lucene-join-10.0.0-SNAPSHOT.jar -> org.apache.lucene.join
lucene-luke-10.0.0-SNAPSHOT.jar -> org.apache.lucene.luke
lucene-memory-10.0.0-SNAPSHOT.jar -> org.apache.lucene.memory
lucene-misc-10.0.0-SNAPSHOT.jar -> org.apache.lucene.misc
lucene-monitor-10.0.0-SNAPSHOT.jar -> org.apache.lucene.monitor
lucene-queries-10.0.0-SNAPSHOT.jar -> org.apache.lucene.queries
lucene-queryparser-10.0.0-SNAPSHOT.jar -> org.apache.lucene.queryparser
lucene-replicator-10.0.0-SNAPSHOT.jar -> org.apache.lucene.replicator
lucene-sandbox-10.0.0-SNAPSHOT.jar -> org.apache.lucene.sandbox
lucene-spatial-extras-10.0.0-SNAPSHOT.jar -> org.apache.lucene.spatial_extras
lucene-spatial3d-10.0.0-SNAPSHOT.jar -> org.apache.lucene.spatial3d
lucene-suggest-10.0.0-SNAPSHOT.jar -> org.apache.lucene.suggest
lucene-test-framework-10.0.0-SNAPSHOT.jar -> org.apache.lucene.test_framework
lucene-analysis-common-10.0.0-SNAPSHOT.jar -> org.apache.lucene.common
lucene-analysis-icu-10.0.0-SNAPSHOT.jar -> org.apache.lucene.icu
lucene-analysis-kuromoji-10.0.0-SNAPSHOT.jar -> org.apache.lucene.kuromoji
lucene-analysis-morfologik-10.0.0-SNAPSHOT.jar -> org.apache.lucene.morfologik
lucene-analysis-nori-10.0.0-SNAPSHOT.jar -> org.apache.lucene.nori
lucene-analysis-opennlp-10.0.0-SNAPSHOT.jar -> org.apache.lucene.opennlp
lucene-analysis-phonetic-10.0.0-SNAPSHOT.jar -> org.apache.lucene.phonetic
lucene-analysis-smartcn-10.0.0-SNAPSHOT.jar -> org.apache.lucene.smartcn
lucene-analysis-stempel-10.0.0-SNAPSHOT.jar -> org.apache.lucene.stempel
```

I liked the previous names more, because now "analysis" is missing. @rmuir, do you have any suggestions? Maybe we should do something like this:

```
"${-> project.group.toString() + "." + project.path.replaceFirst(":lucene:", "").replace(':', '.').replace("-", "_")}"
```

So maybe we should ask others on the mailing list.
[GitHub] [lucene] uschindler commented on a change in pull request #487: LUCENE-10234: Change module prefix to org.apache.*
uschindler commented on a change in pull request #487:
URL: https://github.com/apache/lucene/pull/487#discussion_r758717236

## File path: gradle/java/jar-manifest.gradle

Review comment:
Anyway, thanks for fixing the bug with the wrong manifest on non-library JARs (javadocs). One thing to keep in mind: when we add module-info.java, we need to remove the attribute, otherwise we have a duplicate.
[GitHub] [lucene] uschindler commented on a change in pull request #487: LUCENE-10234: Change module prefix to org.apache.*
uschindler commented on a change in pull request #487:
URL: https://github.com/apache/lucene/pull/487#discussion_r758718326

## File path: gradle/java/jar-manifest.gradle

Review comment:
With my change it would look like this:

```
> Task :showModuleNames
lucene-benchmark-10.0.0-SNAPSHOT.jar -> org.apache.lucene.benchmark
lucene-backward-codecs-10.0.0-SNAPSHOT.jar -> org.apache.lucene.backward_codecs
lucene-classification-10.0.0-SNAPSHOT.jar -> org.apache.lucene.classification
lucene-codecs-10.0.0-SNAPSHOT.jar -> org.apache.lucene.codecs
lucene-core-10.0.0-SNAPSHOT.jar -> org.apache.lucene.core
lucene-demo-10.0.0-SNAPSHOT.jar -> org.apache.lucene.demo
lucene-expressions-10.0.0-SNAPSHOT.jar -> org.apache.lucene.expressions
lucene-facet-10.0.0-SNAPSHOT.jar -> org.apache.lucene.facet
lucene-grouping-10.0.0-SNAPSHOT.jar -> org.apache.lucene.grouping
lucene-highlighter-10.0.0-SNAPSHOT.jar -> org.apache.lucene.highlighter
lucene-join-10.0.0-SNAPSHOT.jar -> org.apache.lucene.join
lucene-luke-10.0.0-SNAPSHOT.jar -> org.apache.lucene.luke
lucene-memory-10.0.0-SNAPSHOT.jar -> org.apache.lucene.memory
lucene-misc-10.0.0-SNAPSHOT.jar -> org.apache.lucene.misc
lucene-monitor-10.0.0-SNAPSHOT.jar -> org.apache.lucene.monitor
lucene-queries-10.0.0-SNAPSHOT.jar -> org.apache.lucene.queries
lucene-queryparser-10.0.0-SNAPSHOT.jar -> org.apache.lucene.queryparser
lucene-replicator-10.0.0-SNAPSHOT.jar -> org.apache.lucene.replicator
lucene-sandbox-10.0.0-SNAPSHOT.jar -> org.apache.lucene.sandbox
lucene-spatial-extras-10.0.0-SNAPSHOT.jar -> org.apache.lucene.spatial_extras
lucene-spatial3d-10.0.0-SNAPSHOT.jar -> org.apache.lucene.spatial3d
lucene-suggest-10.0.0-SNAPSHOT.jar -> org.apache.lucene.suggest
lucene-test-framework-10.0.0-SNAPSHOT.jar -> org.apache.lucene.test_framework
lucene-analysis-common-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.common
lucene-analysis-icu-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.icu
lucene-analysis-kuromoji-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.kuromoji
lucene-analysis-morfologik-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.morfologik
lucene-analysis-nori-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.nori
lucene-analysis-opennlp-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.opennlp
lucene-analysis-phonetic-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.phonetic
lucene-analysis-smartcn-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.smartcn
lucene-analysis-stempel-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.stempel
```
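The proposed expression is a plain string rewrite of the Gradle project path. The Java sketch below mirrors it, assuming (not confirmed in this thread) that the project group is `org.apache.lucene` and that project paths look like `:lucene:core` or `:lucene:analysis:common`; under those assumptions it reproduces names from the listing above. It also shows why `-` is turned into `_`: a module name must be a dot-separated sequence of Java identifiers, which `javax.lang.model.SourceVersion.isName` checks.

```java
import javax.lang.model.SourceVersion;

public class ModuleNameSketch {

    // Mirrors the proposed Groovy expression:
    //   project.group + "." + project.path.replaceFirst(":lucene:", "").replace(':', '.').replace("-", "_")
    // replaceFirst takes a regular expression; ":lucene:" contains no metacharacters.
    static String moduleName(String group, String projectPath) {
        return group + "." + projectPath.replaceFirst(":lucene:", "").replace(':', '.').replace("-", "_");
    }

    // A module name must be a dot-separated sequence of Java identifiers (no '-', no keywords).
    static boolean isLegalModuleName(String name) {
        return SourceVersion.isName(name);
    }

    public static void main(String[] args) {
        String group = "org.apache.lucene"; // assumed project group
        String[] paths = {":lucene:core", ":lucene:backward-codecs", ":lucene:analysis:common"}; // assumed paths
        for (String path : paths) {
            String name = moduleName(group, path);
            System.out.println(path + " -> " + name + " (legal: " + isLegalModuleName(name) + ")");
        }
        // Without the '-' to '_' replacement the name would be rejected:
        System.out.println(isLegalModuleName("org.apache.lucene.backward-codecs")); // false
    }
}
```

Because the path separator `:` maps to `.`, nested projects such as the analysis modules keep their `analysis` segment, which is what the thread was after.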
[GitHub] [lucene] rmuir commented on a change in pull request #487: LUCENE-10234: Change module prefix to org.apache.*
rmuir commented on a change in pull request #487:
URL: https://github.com/apache/lucene/pull/487#discussion_r758721225

## File path: gradle/java/jar-manifest.gradle

Review comment:
Hmm, for sure I'm unhappy about "analysis" missing, because `org.apache.lucene.common` seems pretty ambiguous. But I'd be happy with `org.apache.lucene.analysis_common`, which is what I think your suggestion would create?

But +1 to iterate here a little bit more (can we get "analysis" in the name some way or another), and then ping the mailing list with a printout just like the one you showed here. If anyone has a strong opinion, they will have had a chance to voice it.
[GitHub] [lucene] rmuir commented on a change in pull request #487: LUCENE-10234: Change module prefix to org.apache.*
rmuir commented on a change in pull request #487:
URL: https://github.com/apache/lucene/pull/487#discussion_r758723885

## File path: gradle/java/jar-manifest.gradle

Review comment:
That also works great for me (in a way, I prefer it over the underscores for the analysis modules, but I didn't want to suggest special-casing it in the gradle build).
[GitHub] [lucene] uschindler commented on a change in pull request #487: LUCENE-10234: Change module prefix to org.apache.*
uschindler commented on a change in pull request #487: URL: https://github.com/apache/lucene/pull/487#discussion_r758723981 ## File path: gradle/java/jar-manifest.gradle ## @@ -66,7 +66,7 @@ subprojects { "X-Build-JDK" : "${System.properties['java.version']} (${System.properties['java.vendor']} ${System.properties['java.vm.version']})", "X-Build-OS": "${System.properties['os.name']} ${System.properties['os.arch']} ${System.properties['os.version']}", - "Automatic-Module-Name" : "${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}" + "Automatic-Module-Name" : "org.apache.${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}" Review comment: We might have overlapped: I copy-pasted my idea. Thanks @dweiss for the printout option. This also helps when migrating to module-info.java.
[GitHub] [lucene] rmuir commented on a change in pull request #487: LUCENE-10234: Change module prefix to org.apache.*
rmuir commented on a change in pull request #487: URL: https://github.com/apache/lucene/pull/487#discussion_r758728664 ## File path: gradle/java/jar-manifest.gradle ## @@ -66,7 +66,7 @@ subprojects { "X-Build-JDK" : "${System.properties['java.version']} (${System.properties['java.vendor']} ${System.properties['java.vm.version']})", "X-Build-OS": "${System.properties['os.name']} ${System.properties['os.arch']} ${System.properties['os.version']}", - "Automatic-Module-Name" : "${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}" + "Automatic-Module-Name" : "org.apache.${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}" Review comment: By the way (sorry, it's off-topic), I just realized that in 9.0 we changed the Maven names of all the analyzers from `lucene-analyzers-xxx` to `lucene-analysis-xxx`. Do we call this out somewhere prominently, or can we add a note here if we don't? I would anticipate lots of questions from anyone who upgrades, because they may not expect to have to adjust this in their build. Sorry, I only just thought of it; trying to reduce respins :)
[GitHub] [lucene] uschindler commented on a change in pull request #487: LUCENE-10234: Change module prefix to org.apache.*
uschindler commented on a change in pull request #487: URL: https://github.com/apache/lucene/pull/487#discussion_r758729874 ## File path: gradle/java/jar-manifest.gradle ## @@ -66,7 +66,7 @@ subprojects { "X-Build-JDK" : "${System.properties['java.version']} (${System.properties['java.vendor']} ${System.properties['java.vm.version']})", "X-Build-OS": "${System.properties['os.name']} ${System.properties['os.arch']} ${System.properties['os.version']}", - "Automatic-Module-Name" : "${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}" + "Automatic-Module-Name" : "org.apache.${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}" Review comment: Yes, it was changed to conform to the source directory names. The "analyzers" name was wrong. But I agree, we should add a note to the MIGRATE.md file! Thanks @rmuir!
[GitHub] [lucene] dweiss commented on a change in pull request #487: LUCENE-10234: Change module prefix to org.apache.*
dweiss commented on a change in pull request #487: URL: https://github.com/apache/lucene/pull/487#discussion_r758732884 ## File path: gradle/java/jar-manifest.gradle ## @@ -66,7 +66,7 @@ subprojects { "X-Build-JDK" : "${System.properties['java.version']} (${System.properties['java.vendor']} ${System.properties['java.vm.version']})", "X-Build-OS": "${System.properties['os.name']} ${System.properties['os.arch']} ${System.properties['os.version']}", - "Automatic-Module-Name" : "${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}" + "Automatic-Module-Name" : "org.apache.${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}" Review comment: Artifact rename is mentioned in migration:
```
## Rename of binary artifacts from '**-analyzers-**' to '**-analysis-**' (LUCENE-9562)
```
So... what should I do about module names then?... I'm not sure what the outcome of the discussion is. Perhaps we should do this: tweak the names in a way you like and commit (or provide a change suggestion)?
[GitHub] [lucene] uschindler commented on a change in pull request #487: LUCENE-10234: Change module prefix to org.apache.*
uschindler commented on a change in pull request #487: URL: https://github.com/apache/lucene/pull/487#discussion_r758733113 ## File path: gradle/java/jar-manifest.gradle ## @@ -66,7 +66,7 @@ subprojects { "X-Build-JDK" : "${System.properties['java.version']} (${System.properties['java.vendor']} ${System.properties['java.vm.version']})", "X-Build-OS": "${System.properties['os.name']} ${System.properties['os.arch']} ${System.properties['os.version']}", - "Automatic-Module-Name" : "${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}" + "Automatic-Module-Name" : "org.apache.${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}" Review comment: I rewrote it a bit. If nobody objects, I'd replace the line in Dawid's code with:
```
manifestAttrs["Automatic-Module-Name"] = "${-> project.path.replaceFirst(/^:lucene/, project.group as String).replace(':', '.').replace('-', '_')}"
```
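For illustration, here is a minimal Java sketch of the two Groovy expressions discussed in this thread, written with the same `java.lang.String` methods that Groovy calls (Groovy's slashy string `/^:lucene/` is just the regex `^:lucene`). The Gradle project paths (`:lucene:core`, `:lucene:analysis:common`, `:lucene:spatial-extras`) and a `project.group` of `org.apache.lucene` are assumptions; under them the two expressions agree for these paths, and Uwe's variant simply takes the prefix from the group instead of hard-coding `org.apache.`.

```java
import java.util.List;

public class ModuleNameSketch {

  // Port of the expression in the PR diff: drop the first ':' of the project
  // path, turn the remaining ':' separators into '.', and '-' into '_'.
  static String fromDiff(String projectPath) {
    return "org.apache."
        + projectPath.replaceFirst(":", "").replace(':', '.').replace("-", "_");
  }

  // Port of Uwe's variant: replace the leading ":lucene" with the project group.
  // Note: the group is used as a regex replacement string, so a '$' or '\'
  // in it would need Matcher.quoteReplacement.
  static String fromGroup(String projectPath, String group) {
    return projectPath.replaceFirst("^:lucene", group).replace(':', '.').replace('-', '_');
  }

  public static void main(String[] args) {
    String group = "org.apache.lucene"; // assumed value of project.group
    // Assumed Gradle project paths, for illustration only.
    List<String> paths =
        List.of(":lucene:core", ":lucene:analysis:common", ":lucene:spatial-extras");
    for (String path : paths) {
      System.out.println(
          path + " -> " + fromGroup(path, group) + " (diff version: " + fromDiff(path) + ")");
    }
  }
}
```

The `spatial-extras` path shows why the `-` to `_` replacement is there: it yields `org.apache.lucene.spatial_extras`, matching the name in the `showModuleNames` listing.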
[GitHub] [lucene] rmuir commented on a change in pull request #487: LUCENE-10234: Change module prefix to org.apache.*
rmuir commented on a change in pull request #487: URL: https://github.com/apache/lucene/pull/487#discussion_r758737615 ## File path: gradle/java/jar-manifest.gradle ## @@ -66,7 +66,7 @@ subprojects { "X-Build-JDK" : "${System.properties['java.version']} (${System.properties['java.vendor']} ${System.properties['java.vm.version']})", "X-Build-OS": "${System.properties['os.name']} ${System.properties['os.arch']} ${System.properties['os.version']}", - "Automatic-Module-Name" : "${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}" + "Automatic-Module-Name" : "org.apache.${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}" Review comment: > Artifact rename is mentioned in migration: > > ## Rename of binary artifacts from '**-analyzers-**' to '**-analysis-**' (LUCENE-9562) OK I will make a separate PR with my suggestions. Sorry for creating noise on the issue, but Uwe's questions here had me snooping around maven doing inspections, and that's why I noticed it. I'd like to move this up "higher" in the file and just add a simple list of old/new maven coordinates for each affected jar. I think it would be a bit more verbose but useful to almost anyone upgrading the library.
[GitHub] [lucene] dweiss commented on a change in pull request #487: LUCENE-10234: Change module prefix to org.apache.*
dweiss commented on a change in pull request #487: URL: https://github.com/apache/lucene/pull/487#discussion_r758742625 ## File path: gradle/java/jar-manifest.gradle ## @@ -66,7 +66,7 @@ subprojects { "X-Build-JDK" : "${System.properties['java.version']} (${System.properties['java.vendor']} ${System.properties['java.vm.version']})", "X-Build-OS": "${System.properties['os.name']} ${System.properties['os.arch']} ${System.properties['os.version']}", - "Automatic-Module-Name" : "${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}" + "Automatic-Module-Name" : "org.apache.${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}" Review comment: Sure, please do. It must have been at the top at some point... The structure of this file was never too clear to me - whether it's prioritized or just listed chronologically.
[GitHub] [lucene] uschindler commented on a change in pull request #487: LUCENE-10234: Change module prefix to org.apache.*
uschindler commented on a change in pull request #487: URL: https://github.com/apache/lucene/pull/487#discussion_r758744591 ## File path: gradle/java/jar-manifest.gradle ## @@ -66,7 +66,7 @@ subprojects { "X-Build-JDK" : "${System.properties['java.version']} (${System.properties['java.vendor']} ${System.properties['java.vm.version']})", "X-Build-OS": "${System.properties['os.name']} ${System.properties['os.arch']} ${System.properties['os.version']}", - "Automatic-Module-Name" : "${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}" + "Automatic-Module-Name" : "org.apache.${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}" Review comment: I committed my change. I will post the output on the mailing list to inform the others. Nevertheless, we should still not make the module system support public for the 9.0 release; this may lead to too many questions. Once we have real module-info files and have tested everything, we can make it public. With my complaint I just wanted to make sure that at least the module names follow community standards and Oracle's suggestions. I know @dweiss does not agree, but let's present this to the committers on the ML.
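Whatever naming convention is chosen, each name must at least be a legal Java module name, that is, dot-separated Java identifiers, which is why the build replaces `-` with `_`. One quick way to check a candidate is `java.lang.module.ModuleDescriptor.newModule(String)`, which throws `IllegalArgumentException` for an illegal name. The sketch below assumes Java 9 or later; the class and method names are made up for illustration.

```java
import java.lang.module.ModuleDescriptor;

public class ModuleNameCheck {

  // Returns true if the JDK accepts the string as a legal module name.
  // ModuleDescriptor.newModule(String) rejects illegal names with an
  // IllegalArgumentException.
  static boolean isLegalModuleName(String name) {
    try {
      ModuleDescriptor.newModule(name);
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String[] candidates = {
      "org.apache.lucene.analysis.common",
      "org.apache.lucene.spatial_extras",
      "org.apache.lucene.spatial-extras", // '-' is not a legal identifier character
      "org.apache.lucene.spatial3d"
    };
    for (String name : candidates) {
      System.out.println(name + " legal? " + isLegalModuleName(name));
    }
  }
}
```

This only checks legality; whether a name follows the reverse-domain convention (`org.apache.lucene.*`) is a separate policy choice, which is what the discussion above is about.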
[GitHub] [lucene] dweiss commented on a change in pull request #487: LUCENE-10234: Change module prefix to org.apache.*
dweiss commented on a change in pull request #487: URL: https://github.com/apache/lucene/pull/487#discussion_r758746111 ## File path: gradle/java/jar-manifest.gradle ## @@ -66,7 +66,7 @@ subprojects { "X-Build-JDK" : "${System.properties['java.version']} (${System.properties['java.vendor']} ${System.properties['java.vm.version']})", "X-Build-OS": "${System.properties['os.name']} ${System.properties['os.arch']} ${System.properties['os.version']}", - "Automatic-Module-Name" : "${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}" + "Automatic-Module-Name" : "org.apache.${-> project.path.replaceFirst(":", "").replace(':', '.').replace("-", "_")}" Review comment: That's fine, Uwe - I'll live with it. As for the module system - I think it can be mentioned that it is preliminary support but at least module names will remain the same. It's better than nothing...
[GitHub] [lucene] rmuir opened a new pull request #488: Improve MIGRATE.md around analyzers artifacts.
rmuir opened a new pull request #488: URL: https://github.com/apache/lucene/pull/488 Move this to the very top of MIGRATE, the user needs to first be able to pull in the artifacts, before doing anything else like trying to compile, deal with renamed classes, etc. Add a table of each package that got moved, with explicit old and new names. Hopefully it helps search engines and users. @jpountz I'd like to backport this to 9.0 if possible, since we are respinning for module names anyway. It is low risk.
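As a small sketch of the kind of old/new coordinate table described in this PR, the snippet below renders it as Markdown. It assumes the `org.apache.lucene` group and the nine analysis suffixes that appear as `lucene-analysis-*` jars in the `showModuleNames` output on LUCENE-10255; the class name and column headers are illustrative, not what was actually committed to MIGRATE.md.

```java
import java.util.List;

public class ArtifactRenameTable {

  // Suffixes of the analysis artifacts (assumed list, mirroring the
  // lucene-analysis-* jars in the showModuleNames output).
  static final List<String> SUFFIXES = List.of(
      "common", "icu", "kuromoji", "morfologik", "nori",
      "opennlp", "phonetic", "smartcn", "stempel");

  // Renders a Markdown table mapping old (lucene-analyzers-*) to new
  // (lucene-analysis-*) Maven coordinates.
  static String renderTable(List<String> suffixes) {
    StringBuilder sb = new StringBuilder();
    sb.append("| Old Maven coordinates | New Maven coordinates |\n");
    sb.append("|---|---|\n");
    for (String s : suffixes) {
      sb.append("| `org.apache.lucene:lucene-analyzers-").append(s).append("` ")
          .append("| `org.apache.lucene:lucene-analysis-").append(s).append("` |\n");
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.print(renderTable(SUFFIXES));
  }
}
```

Spelling out full coordinates (rather than a `lucene-analyzers-xxx` pattern) is what makes each row searchable, which is the point the PR description makes about search engines.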
[GitHub] [lucene] uschindler commented on pull request #488: Improve MIGRATE.md around analyzers artifacts.
uschindler commented on pull request #488: URL: https://github.com/apache/lucene/pull/488#issuecomment-982046022 A separate note: the new name also better matches what the modules really do. They are not only "analyzers"; they are components for the "analysis" of text while indexing/searching with Lucene. So the new name is better. Maybe add this to the explanation.
[GitHub] [lucene] uschindler commented on pull request #488: Improve MIGRATE.md around analyzers artifacts.
uschindler commented on pull request #488: URL: https://github.com/apache/lucene/pull/488#issuecomment-982048976 Table looks much better now. Thanks!
[GitHub] [lucene] uschindler commented on pull request #487: LUCENE-10234: Change module prefix to org.apache.*
uschindler commented on pull request #487: URL: https://github.com/apache/lucene/pull/487#issuecomment-982055622 I added a change to the CHANGES.txt file to explain that the automatic module names are a preparation for full module system support. This should be added, because we have not fully tested that everything works well with automatic module names. Also I added a note that the module names should not be considered "stable". OK?
[GitHub] [lucene] rmuir commented on pull request #488: Improve MIGRATE.md around analyzers artifacts.
rmuir commented on pull request #488: URL: https://github.com/apache/lucene/pull/488#issuecomment-982055841 Sorry for all the commits, I wanted to try to make this easy to read and prominent, and linked from the README too for more visibility. I realize the `MIGRATE.md` is an unstructured list, but there's an advantage to listing some stuff at the top (esp. if it is likely to impact most users that upgrade). I don't want to hold up the 9.0 release, but maybe for the next one we can improve it to be better, some ideas: * avoid usage of abbreviations in our MIGRATE notes such as `o.a.l.a.util.TokenizerFactory` * avoid lists of from-to stuff and use tables (like this one as an example) * structure the high-impacting stuff such as package and class renames at the top.
[GitHub] [lucene] uschindler commented on pull request #488: Improve MIGRATE.md around analyzers artifacts.
uschindler commented on pull request #488: URL: https://github.com/apache/lucene/pull/488#issuecomment-982058390 Originally I made MIGRATE.md a Markdown file to have all formatting possibilities. The unordered list was just a "quick conversion" from the old format introduced in Lucene 4.0. The Markdown converter used by `gradlew documentation` accepts all Markdown and also expands LUCENE/SOLR issue numbers (making them clickable): https://github.com/apache/lucene/blob/main/gradle/documentation/markdown.gradle
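The issue-number expansion can be pictured with the following hypothetical sketch. It is not the code in `markdown.gradle` (which may well work on the parsed Markdown rather than on raw text); it simply assumes keys of the form `LUCENE-<digits>` or `SOLR-<digits>` and links each one to `https://issues.apache.org/jira/browse/<key>`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IssueLinker {

  // Hypothetical pattern: a LUCENE- or SOLR- key followed by digits.
  private static final Pattern ISSUE_KEY = Pattern.compile("\\b(?:LUCENE|SOLR)-\\d+\\b");

  // Wraps each issue key in a Markdown link to its JIRA page.
  static String linkIssues(String markdown) {
    Matcher m = ISSUE_KEY.matcher(markdown);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      String key = m.group();
      String link = "[" + key + "](https://issues.apache.org/jira/browse/" + key + ")";
      // quoteReplacement so that '$' or '\' in the link is taken literally
      m.appendReplacement(out, Matcher.quoteReplacement(link));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(linkIssues("Rename of binary artifacts (LUCENE-9562)"));
  }
}
```

A plain-text regex like this would double-wrap keys that are already inside a link, which is one reason a real converter would more likely hook into the Markdown parser instead.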
[GitHub] [lucene] rmuir commented on pull request #488: Improve MIGRATE.md around analyzers artifacts.
rmuir commented on pull request #488: URL: https://github.com/apache/lucene/pull/488#issuecomment-982061761 > A separate note: The new name is also more conforming what the modules really do: They are not only "analyzers", those are components for "analysis" of text while indexing/searching lucene. So new name is better. Maybe add this to explanation. I'd rather not mix justification or reasoning for changes into this file; I think it adds noise. Most users will be annoyed with us regardless :) I think this file should just be simple hints on how to fix your code? We already list the JIRA issue for each change, in case an interested party wants to drill down into the background discussion of why changes were made, or get more detailed information.
[GitHub] [lucene] uschindler commented on pull request #488: Improve MIGRATE.md around analyzers artifacts.
uschindler commented on pull request #488: URL: https://github.com/apache/lucene/pull/488#issuecomment-982064591 This was just meant as a replacement for this text: "and are now consistent with repository module 'analysis'". This does not sound like an acceptable explanation to an annoyed user, so my idea was to just say: "better name because it does more than providing analyzers". But all fine, it was just my 2ct.
[GitHub] [lucene] rmuir merged pull request #488: Improve MIGRATE.md around analyzers artifacts.
rmuir merged pull request #488: URL: https://github.com/apache/lucene/pull/488
[GitHub] [lucene] uschindler commented on pull request #488: Improve MIGRATE.md around analyzers artifacts.
uschindler commented on pull request #488: URL: https://github.com/apache/lucene/pull/488#issuecomment-982072123 Hi Robert, I built the documentation with "gradlew documentation" and noticed that tables were not enabled in the Markdown converter, so the table currently doesn't render well. If you don't mind, I will add a change to fix the converter. So please wait a bit with merging!
[GitHub] [lucene] rmuir opened a new pull request #489: support tables in generated html documentation
rmuir opened a new pull request #489: URL: https://github.com/apache/lucene/pull/489 https://github.com/apache/lucene/pull/488 added a table of analyzer artifact changes. Unfortunately, it looks like crap in the generated HTML unless we bring in the tables extension.
[GitHub] [lucene] uschindler commented on pull request #489: support tables in generated html documentation
uschindler commented on pull request #489: URL: https://github.com/apache/lucene/pull/489#issuecomment-982080248 I created the same PR but did not yet commit it. Looks identical here, I just reformatted the long line with the extensions. +1 to commit.
[GitHub] [lucene] uschindler commented on pull request #489: support tables in generated html documentation
uschindler commented on pull request #489: URL: https://github.com/apache/lucene/pull/489#issuecomment-982083351 +1
[GitHub] [lucene] rmuir merged pull request #489: support tables in generated html documentation
rmuir merged pull request #489: URL: https://github.com/apache/lucene/pull/489
[jira] [Commented] (LUCENE-10255) Fully embrace the java module system
[ https://issues.apache.org/jira/browse/LUCENE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450762#comment-17450762 ] Uwe Schindler commented on LUCENE-10255:

Thanks [~sor] for confirmation. We now have the following names generated from the gradle build:

{noformat}
> Task :showModuleNames
lucene-benchmark-10.0.0-SNAPSHOT.jar -> org.apache.lucene.benchmark
lucene-backward-codecs-10.0.0-SNAPSHOT.jar -> org.apache.lucene.backward_codecs
lucene-classification-10.0.0-SNAPSHOT.jar -> org.apache.lucene.classification
lucene-codecs-10.0.0-SNAPSHOT.jar -> org.apache.lucene.codecs
lucene-core-10.0.0-SNAPSHOT.jar -> org.apache.lucene.core
lucene-demo-10.0.0-SNAPSHOT.jar -> org.apache.lucene.demo
lucene-expressions-10.0.0-SNAPSHOT.jar -> org.apache.lucene.expressions
lucene-facet-10.0.0-SNAPSHOT.jar -> org.apache.lucene.facet
lucene-grouping-10.0.0-SNAPSHOT.jar -> org.apache.lucene.grouping
lucene-highlighter-10.0.0-SNAPSHOT.jar -> org.apache.lucene.highlighter
lucene-join-10.0.0-SNAPSHOT.jar -> org.apache.lucene.join
lucene-luke-10.0.0-SNAPSHOT.jar -> org.apache.lucene.luke
lucene-memory-10.0.0-SNAPSHOT.jar -> org.apache.lucene.memory
lucene-misc-10.0.0-SNAPSHOT.jar -> org.apache.lucene.misc
lucene-monitor-10.0.0-SNAPSHOT.jar -> org.apache.lucene.monitor
lucene-queries-10.0.0-SNAPSHOT.jar -> org.apache.lucene.queries
lucene-queryparser-10.0.0-SNAPSHOT.jar -> org.apache.lucene.queryparser
lucene-replicator-10.0.0-SNAPSHOT.jar -> org.apache.lucene.replicator
lucene-sandbox-10.0.0-SNAPSHOT.jar -> org.apache.lucene.sandbox
lucene-spatial-extras-10.0.0-SNAPSHOT.jar -> org.apache.lucene.spatial_extras
lucene-spatial3d-10.0.0-SNAPSHOT.jar -> org.apache.lucene.spatial3d
lucene-suggest-10.0.0-SNAPSHOT.jar -> org.apache.lucene.suggest
lucene-test-framework-10.0.0-SNAPSHOT.jar -> org.apache.lucene.test_framework
lucene-analysis-common-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.common
lucene-analysis-icu-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.icu
lucene-analysis-kuromoji-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.kuromoji
lucene-analysis-morfologik-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.morfologik
lucene-analysis-nori-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.nori
lucene-analysis-opennlp-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.opennlp
lucene-analysis-phonetic-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.phonetic
lucene-analysis-smartcn-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.smartcn
lucene-analysis-stempel-10.0.0-SNAPSHOT.jar -> org.apache.lucene.analysis.stempel
{noformat}

(see https://github.com/apache/lucene/pull/487)

At the moment these are automatic module names, but this issue is about fully modularizing Lucene.

> Fully embrace the java module system
>
> Key: LUCENE-10255
> URL: https://issues.apache.org/jira/browse/LUCENE-10255
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Dawid Weiss
> Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> I've experimented a bit trying to move the code to the JMS. It is _surprisingly difficult_... A PoC that almost passes all checks is here:
> https://github.com/dweiss/lucene/tree/jms
> Here are my conclusions so far:
> * The JMS and gradle add a lot of complexity (this applies to any higher-level tooling, including IDEs, I think). For starters, modules have to be JARs. The effect of this is that what was previously a set of directories from dependencies now has to be a JAR. What was previously an incremental update of a single .class file now ripples throughout the build recreating module JARs (ZIPs!)... I didn't realize it at first, but it's a costly thing to do. I'm not even sure how IDEs handle this issue.
> * A Java module contains metadata (such as the module version or main class) that is completely detached from any source file. These things live in a class bytecode of the compiled module-info; interestingly, there is no source-level way to specify it - these class attributes are injected by the 'jar' tool. Gradle has some fancy on-the-fly asm conversion filter that injects it.
> * Dependencies between modules will effectively live in two places: in gradle build files and in module-info files. And they can go out of sync, although it's probably easy to
[GitHub] [lucene] zhaih commented on pull request #225: LUCENE-10010 Introduce NFARunAutomaton to run NFA directly
zhaih commented on pull request #225: URL: https://github.com/apache/lucene/pull/225#issuecomment-982125853 Thanks @rmuir, I'll run a benchmark to confirm this PR still does not introduce a regression after the recent changes. I like the approach you proposed in #485; it would be nice if we could get rid of `determinizeWorkLimit` in some of the classes, since it previously existed everywhere. One reason for carrying an enum and the `determinizeWorkLimit` together is that we might want to use that `determinizeWorkLimit` to limit the number of states the NFA can cache as well. But that feature is not implemented yet and could be done in other ways. I think we can try to get that pushed first, and then I can rebase this one afterwards.
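zhaih mentions possibly reusing the work limit to bound how many states the NFA runner caches. As a rough, generic illustration (this is not Lucene's `NFARunAutomaton` implementation or API; the class, its methods, and the `maxCachedStates` knob are hypothetical), the sketch below runs an NFA without epsilon transitions by subset simulation and lazily caches the resulting DFA transitions, stopping caching of new DFA states once a cap is reached.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class LazyNfaRunner {
  private final Map<Integer, Map<Character, Set<Integer>>> transitions = new HashMap<>();
  private final Set<Integer> acceptStates = new HashSet<>();
  private final int startState;
  // Hypothetical cap on cached DFA states; not an existing Lucene parameter.
  private final int maxCachedStates;
  // Lazily built DFA: set of NFA states -> (label -> next set of NFA states).
  private final Map<Set<Integer>, Map<Character, Set<Integer>>> cache = new HashMap<>();

  LazyNfaRunner(int startState, int maxCachedStates) {
    this.startState = startState;
    this.maxCachedStates = maxCachedStates;
  }

  void addTransition(int from, char label, int to) {
    transitions.computeIfAbsent(from, k -> new HashMap<>())
        .computeIfAbsent(label, k -> new HashSet<>())
        .add(to);
  }

  void addAccept(int state) {
    acceptStates.add(state);
  }

  int cachedStateCount() {
    return cache.size();
  }

  // One subset-simulation step, reusing a cached transition when available.
  private Set<Integer> step(Set<Integer> current, char label) {
    Map<Character, Set<Integer>> row = cache.get(current);
    if (row != null && row.containsKey(label)) {
      return row.get(label);
    }
    Set<Integer> next = new HashSet<>();
    for (int s : current) {
      Map<Character, Set<Integer>> out = transitions.get(s);
      if (out != null && out.containsKey(label)) {
        next.addAll(out.get(label));
      }
    }
    Set<Integer> frozen = Set.copyOf(next); // immutable, so safe as a map key
    if (row != null) {
      row.put(label, frozen);
    } else if (cache.size() < maxCachedStates) {
      Map<Character, Set<Integer>> newRow = new HashMap<>();
      newRow.put(label, frozen);
      cache.put(current, newRow);
    } // else: over the cap, so this transition is recomputed next time
    return frozen;
  }

  boolean run(String input) {
    Set<Integer> current = Set.of(startState);
    for (int i = 0; i < input.length() && !current.isEmpty(); i++) {
      current = step(current, input.charAt(i));
    }
    for (int s : current) {
      if (acceptStates.contains(s)) {
        return true;
      }
    }
    return false;
  }

  // Example NFA over {a, b}: accepts strings whose second-to-last symbol is 'a'.
  static LazyNfaRunner secondToLastIsA(int maxCachedStates) {
    LazyNfaRunner nfa = new LazyNfaRunner(0, maxCachedStates);
    nfa.addTransition(0, 'a', 0);
    nfa.addTransition(0, 'b', 0);
    nfa.addTransition(0, 'a', 1);
    nfa.addTransition(1, 'a', 2);
    nfa.addTransition(1, 'b', 2);
    nfa.addAccept(2);
    return nfa;
  }

  public static void main(String[] args) {
    LazyNfaRunner nfa = secondToLastIsA(1);
    System.out.println(nfa.run("ab") + " " + nfa.run("ba"));
  }
}
```

The cap only limits memoization: past it, transitions are recomputed from the NFA on every step, so answers stay the same and only speed changes. With a cap of 1, only the start state's row ends up cached.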