[GitHub] [lucene] mocobeta commented on a diff in pull request #854: LUCENE-10545: allow to link github pr from changes
mocobeta commented on code in PR #854:
URL: https://github.com/apache/lucene/pull/854#discussion_r862349332

## lucene/CHANGES.txt:

@@ -175,6 +175,8 @@ Other

 * LUCENE-10541: Test-framework: limit the default length of MockTokenizer tokens
   to 255. (Robert Muir, Uwe Schindler, Tomoko Uchida, Dawid Weiss)

+* GITHUB#854: Allow to link to GitHub pull request from CHANGES (Tomoko Uchida)

Review Comment:
```suggestion
* GITHUB#854: Allow to link to GitHub pull request from CHANGES. (Tomoko Uchida, Jan Høydahl)
```
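The feature under discussion makes the CHANGES-to-HTML tooling turn GITHUB#NNN references into pull-request links, the same way LUCENE-NNNN keys already link to Jira. The actual script is not shown in this thread; the following is a minimal, hypothetical Java sketch of the implied rewrite rule (the class and method names are illustrative, and the real tool may emit a different URL form):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChangesLinker {
  // GITHUB#NNN is the new reference form discussed in this PR.
  private static final Pattern GITHUB_REF = Pattern.compile("GITHUB#(\\d+)");

  static String linkGithubRefs(String changesLine) {
    Matcher m = GITHUB_REF.matcher(changesLine);
    // $1 is the captured PR number.
    return m.replaceAll("<a href=\"https://github.com/apache/lucene/pull/$1\">GITHUB#$1</a>");
  }

  public static void main(String[] args) {
    System.out.println(linkGithubRefs(
        "* GITHUB#854: Allow to link to GitHub pull request from CHANGES."));
  }
}
```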
[GitHub] [lucene] mocobeta commented on pull request #854: LUCENE-10545: allow to link github pr from changes
mocobeta commented on PR #854:
URL: https://github.com/apache/lucene/pull/854#issuecomment-1113979801

Thanks everyone for your feedback. I don't think this is such a big deal, though; I'll wait for a few more days before merging, as I wrote on the mailing list.
[jira] [Commented] (LUCENE-10551) LowercaseAsciiCompression should return false when it's unable to compress
[ https://issues.apache.org/jira/browse/LUCENE-10551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530419#comment-17530419 ]

Michael McCandless commented on LUCENE-10551:
---------------------------------------------

Yeah, this is definitely no good. The exception includes the exact term {{LowercaseAsciiCompression}} was trying to compress – I'll see if that exact string repros the exception. Thanks for reporting [~irislpx]!

> LowercaseAsciiCompression should return false when it's unable to compress
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-10551
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10551
>             Project: Lucene - Core
>          Issue Type: Bug
>         Environment: Lucene version 8.11.1
>            Reporter: Peixin Li
>            Priority: Major
>
> {code:java}
> Failed to commit..
> java.lang.IllegalStateException: 10 <> 5 cion1cion_desarrollociones_oraclecionesnaturacionesnatura2tedppsa-integrationdemotiontion cloud gen2tion instance - dev1tion instance - testtion-devbtion-instancetion-prdtion-promerication-qation064533tion535217tion697401tion761348tion892818tion_matrationcauto_simmonsintgic_testtioncloudprodictioncloudservicetiongateway10tioninstance-jtsundatamartprd??o
>     at org.apache.lucene.util.compress.LowercaseAsciiCompression.compress(LowercaseAsciiCompression.java:115)
>     at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlock(BlockTreeTermsWriter.java:834)
>     at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:628)
>     at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.pushTerm(BlockTreeTermsWriter.java:947)
>     at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:912)
>     at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318)
>     at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:170)
>     at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:120)
>     at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:267)
>     at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:350)
>     at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:476)
>     at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:656)
>     at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3364)
>     at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3770)
>     at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3728)
> {code}
> {code:java}
> key=och-live--WorkResource.renewAssignmentToken.ResourceTime[namespace=workflow, resourceGroup=workflow-service-overlay]{availabilityDomain=iad-ad-1, domainId=och-live, host=workflow-service-overlay-01341.node.ad1.us-ashburn-1})
> java.lang.IllegalStateException: 29 <> 16 analytics-platform-test/koala/cluster-tool:1.0-20220310151438.492,mesh_istio_examples-bookinfo-details-v1:1.16.2mesh_istio_examples-bookinfo-reviews-v3:1.16.2oce-clamav:1.0.219oce-tesseract:1.0.7oce-traefik:2.5.1oci-opensearch:1.2.4.8.103oda-digital-assistant-control-plane-train-pool-workflow-v6:22.02.14oke-coresvcs-k8s-dns-dnsmasq-nanny-amd64@sha256:41aa9160ceeaf712369ddb660d02e5ec06d1679965e6930351967c8cf5ed62d4oke-coresvcs-k8s-dns-kube-dns-amd64@sha256:2cf34b04106974952996c6ef1313f165ce65b4ad68a3051f51b1b8f91ba5f838oke-coresvcs-k8s-dns-sidecar-amd64@sha256:8a82c7288725cb4de9c7cd8d5a78279208e379f35751539b406077f9a3163dcdoke-coresvcs-node-problem-detector@sha256:9d54df11804a862c54276648702a45a6a0027a9d930a86becd69c34cc84bf510oke-coresvcs-oke-fluentd-lumberjack@sha256:5f3f10b187eb804ce4e84bc3672de1cf318c0f793f00dac01cd7da8beea8f269oke-etcd-operator@sha256:4353a2e5ef02bb0f6b046a8d6219b1af359a2c1141c358ff110e395f29d0bfc8oke-oke-hyperkube-amd64@sha256:3c734f46099400507f938090eb9a874338fa25cde425ac9409df4c885759752foke-public-busybox@sha256:4cee1979ba0bf7db9fc5d28fb7b798ca69ae95a47c5fecf46327720df4ff352doke-public-coredns@sha256:86f8cfc74497f04e181ab2e1d26d2fd8bd46c4b33ce24b55620efcdfcb214670oke-public-coredns@sha256:8cd974302f1f6108f6f31312f8181ae723b514e2022089cdcc3db10666c49228oke-public-etcd@sha256:b751e459bc2a8f079f6730dd8462671b253c7c8b0d0eb47c67888d5091c6bb77oke-public-etcd@sha256:d6a76200a6e9103681bc2cf7fefbcada0dd9372d52cf8964178d846b89959d14oke-public-etcd@sha256:fa056479342b45479ac74c58176ddad43687d5fc295375d705808f9dfb48439aoke-public-kube-proxy@sha256:93b2da69d03413671606e22294c59a69fe404088a5f6e74d6394a8641fdb899boke-public-tiller@sha256:c2eb6e580123622e1bc0ff3becae3a3a71
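For reference, the most direct repro of the kind Michael describes feeds the reported term bytes straight into the compressor. This is only a sketch, assuming the Lucene 8.x utility API `LowercaseAsciiCompression.compress(byte[] in, int len, byte[] tmp, DataOutput out)`; it is not the patch attached to the issue:

```java
import java.nio.charset.StandardCharsets;
import org.apache.lucene.store.ByteBuffersDataOutput;
import org.apache.lucene.util.compress.LowercaseAsciiCompression;

public class Lucene10551Repro {
  public static void main(String[] args) throws Exception {
    // Abbreviated here; paste the full term string from the stack trace above.
    String term = "cion1cion_desarrollociones_oraclecionesnatura";
    byte[] in = term.getBytes(StandardCharsets.UTF_8);
    byte[] tmp = new byte[in.length];
    ByteBuffersDataOutput out = new ByteBuffersDataOutput();
    // compress() returns false when the input cannot be compressed; the issue
    // title argues it should do that rather than fail an internal consistency
    // check with IllegalStateException.
    boolean compressed = LowercaseAsciiCompression.compress(in, in.length, tmp, out);
    System.out.println("compressed=" + compressed);
  }
}
```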
[GitHub] [lucene] LuXugang commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
LuXugang commented on PR #792:
URL: https://github.com/apache/lucene/pull/792#issuecomment-1114014473

> I've left a couple of comments, but I am also wondering what is the reason you deleted `TestBackwardsCompatibility.java` in this PR?

Hi @mayya-sharipova, the `9.1.0-cfs` and `9.1.0-nocfs` back-compat indices were added in https://github.com/apache/lucene/commit/04127ed9fc6972bb3d6ab5aed86c3512b2971aba, which is based on `Lucene91HnswVectorsFormat`, and the changes in this PR still modify `Lucene91HnswVectorsFormat`, so `TestBackwardsCompatibility` would not pass. After the modified `Lucene91HnswVectorsXXX` classes are renamed to `Lucene92HnswVectorsXXX`, the deletion of `TestBackwardsCompatibility` can be reverted.
[GitHub] [lucene] LuXugang commented on a diff in pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
LuXugang commented on code in PR #792:
URL: https://github.com/apache/lucene/pull/792#discussion_r862373384

## lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java:

@@ -258,14 +257,18 @@ public TopDocs search(String field, float[] target, int k, Bits acceptDocs, int
   }

   private OffHeapVectorValues getOffHeapVectorValues(FieldEntry fieldEntry) throws IOException {
-    IndexInput bytesSlice =
-        vectorData.slice("vector-data", fieldEntry.vectorDataOffset, fieldEntry.vectorDataLength);
-    return new OffHeapVectorValues(
-        fieldEntry.dimension, fieldEntry.size(), fieldEntry.ordToDoc, bytesSlice);
+    if (fieldEntry.docsWithFieldOffset == -2) {
+      return OffHeapVectorValues.emptyOffHeapVectorValues(fieldEntry.dimension);
+    } else {
+      IndexInput bytesSlice =
+          vectorData.slice("vector-data", fieldEntry.vectorDataOffset, fieldEntry.vectorDataLength);
+      return new OffHeapVectorValues(
+          fieldEntry.dimension, fieldEntry.size(), fieldEntry, vectorData, bytesSlice);

Review Comment:
`OffHeapVectorValues` is also constructed in `Lucene91HnswVectorsWriter#writeField(...)`, where no `FieldEntry` exists, so I had to do it this way to keep the changes minimal.
[GitHub] [lucene] LuXugang commented on a diff in pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
LuXugang commented on code in PR #792:
URL: https://github.com/apache/lucene/pull/792#discussion_r862374612

## lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java:

@@ -320,13 +323,19 @@ private static class FieldEntry {
     final int numLevels;
     final int dimension;
     private final int size;
-    final int[] ordToDoc;
-    private final IntUnaryOperator ordToDocOperator;
     final int[][] nodesByLevel;
     // for each level the start offsets in vectorIndex file from where to read neighbours
     final long[] graphOffsetsByLevel;
-
-    FieldEntry(DataInput input, VectorSimilarityFunction similarityFunction) throws IOException {
+    final long docsWithFieldOffset;
+    final long docsWithFieldLength;
+    final short jumpTableEntryCount;
+    final byte denseRankPower;
+    long addressesOffset;

Review Comment:
> Do we need to write the ordToDoc mapping for the dense case, where there is a 1-1 mapping between ord and doc? Maybe we can skip it in this case?

Since after this change only the sparse case writes this metadata, could we make these new variables `final` and set them all to `null`, like:
```
// sparse
if (docsWithFieldOffset != -1 && docsWithFieldOffset != -2) {
  addressesOffset = input.readLong();
  blockShift = input.readVInt();
  meta = DirectMonotonicReader.loadMeta(input, size, blockShift);
  addressesLength = input.readLong();
} else {
  addressesOffset = null;
  blockShift = null;
  meta = null;
  addressesLength = null;
}
```
which seems a little ugly. Could you give some suggestions?
[jira] [Updated] (LUCENE-10551) LowercaseAsciiCompression should return false when it's unable to compress
[ https://issues.apache.org/jira/browse/LUCENE-10551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-10551:
----------------------------------------
    Attachment: LUCENE-10551-test.patch
        Status: Open  (was: Open)

Hmm, I wrote a simple test case for each of the reported strings here, but they do not fail. Maybe this test is invoking the API slightly differently than {{blocktree}}'s suffix compression?
[jira] [Commented] (LUCENE-10551) LowercaseAsciiCompression should return false when it's unable to compress
[ https://issues.apache.org/jira/browse/LUCENE-10551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530423#comment-17530423 ]

Michael McCandless commented on LUCENE-10551:
---------------------------------------------

I think at a minimum we should fix the exception message to not expect/require that the incoming {{byte[]}} is really {{UTF-8}} – we should change the {{.toUTF8String()}} to {{.toString()}}, which will render the bytes accurately, in hex I think.
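To make the suggested change concrete: `BytesRef.utf8ToString()` assumes the bytes are well-formed UTF-8, while `BytesRef.toString()` renders the raw bytes as hex and is safe for arbitrary term suffixes. A small illustration (the byte values are made up):

```java
import org.apache.lucene.util.BytesRef;

public class BytesRefRendering {
  public static void main(String[] args) {
    // 0xC3 followed by 0x28 is not valid UTF-8.
    BytesRef term = new BytesRef(new byte[] {(byte) 0xc3, (byte) 0x28, (byte) 0x61});
    // Hex rendering never depends on the encoding, which is why the comment
    // above suggests it for the exception message:
    System.out.println(term.toString());     // prints something like [c3 28 61]
    // Decoding arbitrary bytes as UTF-8 can mangle the term or fail outright:
    System.out.println(term.utf8ToString());
  }
}
```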
[GitHub] [lucene] mikemccand opened a new pull request, #858: LUCENE-10551: try to improve testing of LowercaseAsciiCompression
mikemccand opened a new pull request, #858:
URL: https://github.com/apache/lucene/pull/858

# Description

Just improving testing based on the user-reported `IllegalStateException`, but the new tests seem to pass in my few runs ... maybe Jenkins CI builds will uncover an interesting failing seed?
[GitHub] [lucene] mikemccand commented on pull request #858: LUCENE-10551: try to improve testing of LowercaseAsciiCompression
mikemccand commented on PR #858:
URL: https://github.com/apache/lucene/pull/858#issuecomment-1114025936

And to improve testing we should add a higher-level test indexing "difficult" terms through IndexWriter/blocktree, so that the invocations in the test (via blocktree and not the direct low-level API) match what we saw in LUCENE-10551.
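A sketch of what such a higher-level test might look like; this assumes Lucene 9.x, where `StandardAnalyzer` lives in core, and the field name and term values are illustrative. Flushing through `IndexWriter` exercises `BlockTreeTermsWriter`, which is where `LowercaseAsciiCompression` is invoked on suffix bytes:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class DifficultTermsIndexing {
  public static void main(String[] args) throws Exception {
    // Mostly-lowercase ASCII with occasional digits and punctuation, shaped
    // like the terms in the bug report (values here are illustrative):
    String[] difficultTerms = {
      "tion instance - dev1", "tion-prd", "tion064533", "tioncloudservice"
    };
    try (Directory dir = new ByteBuffersDirectory();
        IndexWriter writer =
            new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      for (String term : difficultTerms) {
        Document doc = new Document();
        // StringField indexes the value as a single, un-analyzed term.
        doc.add(new StringField("field", term, Field.Store.NO));
        writer.addDocument(doc);
      }
      writer.commit(); // flush runs BlockTreeTermsWriter over the terms
    }
  }
}
```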
[GitHub] [lucene] rmuir commented on a diff in pull request #858: LUCENE-10551: try to improve testing of LowercaseAsciiCompression
rmuir commented on code in PR #858:
URL: https://github.com/apache/lucene/pull/858#discussion_r862383315

## lucene/core/src/test/org/apache/lucene/util/compress/TestLowercaseAsciiCompression.java:

@@ -118,4 +134,12 @@ public void testRandom() throws IOException {
       doTestCompress(bytes, len);
     }
   }
+
+  public void testAsciiCompressionRandom2() throws IOException {
+    for (int iter = 0; iter < 100; ++iter) {

Review Comment:
Thanks for doing this! Maybe run this many iterations with Nightly, but use a smaller amount otherwise? The test is slow as it is:
```
> Task :lucene:core:test
:lucene:core:test (SUCCESS): 5456 test(s), 245 skipped
> Task :lucene:core:wipeTaskTemp
The slowest tests (exceeding 500 ms) during this run:
  64.55s TestLowercaseAsciiCompression.testAsciiCompressionRandom2 (:lucene:core)
```
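Lucene's test framework already has a switch for this: `LuceneTestCase` exposes a static `TEST_NIGHTLY` flag. A sketch of the suggested tweak, inside `TestLowercaseAsciiCompression` (a `LuceneTestCase` subclass, so `TEST_NIGHTLY` is in scope); the iteration counts are illustrative:

```java
public void testAsciiCompressionRandom2() throws IOException {
  // Run the expensive number of iterations only on nightly builds.
  int iters = TEST_NIGHTLY ? 100 : 10;
  for (int iter = 0; iter < iters; ++iter) {
    // ... same random-bytes generation and doTestCompress(bytes, len) as above ...
  }
}
```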
[jira] [Commented] (LUCENE-10216) Add concurrency to addIndexes(CodecReader…) API
[ https://issues.apache.org/jira/browse/LUCENE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530476#comment-17530476 ]

Vigya Sharma commented on LUCENE-10216:
---------------------------------------

I think the PR is ready for review, with existing tests passing and added tests for the new changes. {{OneMerge}} distribution is now provided by a new {{findMerges(CodecReader[])}} API in {{MergePolicy}}, and executed by {{MergeScheduler}} threads.

I've also modified the {{MockRandomMergePolicy}} to randomly pick a highly concurrent (one segment per reader) {{findMerges(...)}} implementation 50% of the time, and confirmed manually that tests pass in both scenarios, i.e. with this new impl. as well as with the default impl. being picked (thanks Michael McCandless for the suggestion).

> Add concurrency to addIndexes(CodecReader…) API
> -----------------------------------------------
>
>                 Key: LUCENE-10216
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10216
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Vigya Sharma
>            Priority: Major
>          Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> I work at Amazon Product Search, and we use Lucene to power search for the e-commerce platform. I'm working on a project that involves applying metadata+ETL transforms and indexing documents on n different _indexing_ boxes, combining them into a single index on a separate _reducer_ box, and making it available for queries on m different _search_ boxes (replicas). Segments are asynchronously copied from indexers to reducers to searchers as they become available for the next layer to consume.
>
> I am using the addIndexes API to combine multiple indexes into one on the reducer boxes. Since we also have taxonomy data, we need to remap facet field ordinals, which means I need to use the {{addIndexes(CodecReader…)}} version of this API. The API leverages {{SegmentMerger.merge()}} to create segments with new ordinal values while also merging all provided segments in the process.
>
> _This is however a blocking call that runs in a single thread._ Until we have written segments with new ordinal values, we cannot copy them to searcher boxes, which increases the time to make documents available for search.
>
> I was playing around with the API by creating multiple concurrent merges, each with only a single reader, creating a concurrently running 1:1 conversion from old segments to new ones (with new ordinal values). We follow this up with non-blocking background merges. This lets us copy the segments to searchers and replicas as soon as they are available, and later replace them with merged segments as background jobs complete. On the Amazon dataset I profiled, this gave us around 2.5 to 3x improvement in addIndexes() time. Each call was given about 5 readers to add on average.
>
> This might be useful to add to Lucene. We could create another {{addIndexes()}} API with a {{boolean}} flag for concurrency, that internally submits multiple merge jobs (each with a single reader) to the {{ConcurrentMergeScheduler}}, and waits for them to complete before returning.
>
> While this is doable from outside Lucene by using your own thread pool, starting multiple addIndexes() calls and waiting for them to complete, I felt it needs some understanding of what addIndexes does, why you need to wait on the merge, and why it makes sense to pass a single reader in the addIndexes API. Out-of-box support in Lucene could simplify this for folks with a similar use case.
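The exact API was still being settled in the PR at this point, so the following is only a hypothetical sketch of what a one-`OneMerge`-per-reader `findMerges(CodecReader...)` override could look like; the `OneMerge(CodecReader)` constructor and the `FilterMergePolicy` wiring are assumptions, not the merged design:

```java
import java.io.IOException;
import org.apache.lucene.index.CodecReader;
import org.apache.lucene.index.FilterMergePolicy;
import org.apache.lucene.index.MergePolicy;

// Hypothetical: wraps each incoming reader in its own OneMerge so the
// MergeScheduler can run the addIndexes(CodecReader...) conversions in parallel.
public class OneMergePerReaderPolicy extends FilterMergePolicy {
  public OneMergePerReaderPolicy(MergePolicy in) {
    super(in);
  }

  @Override
  public MergeSpecification findMerges(CodecReader... readers) throws IOException {
    MergeSpecification spec = new MergeSpecification();
    for (CodecReader reader : readers) {
      spec.add(new OneMerge(reader)); // assumed OneMerge(CodecReader) overload
    }
    return spec;
  }
}
```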