[GitHub] [lucene] javanna commented on a diff in pull request #12019: Clean up vector backward-codecs
javanna commented on code in PR #12019: URL: https://github.com/apache/lucene/pull/12019#discussion_r1052136107 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene94/Lucene94RWCodec.java: ## @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.backward_codecs.lucene94; + +import org.apache.lucene.codecs.KnnVectorsFormat; +import org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat; + +/** Implements the Lucene 9.4 index format for backwards compat testing */ +public class Lucene94RWCodec extends Lucene94Codec { Review Comment: I am not super familiar with backward codecs. Question: what is the advantage of having two variants of the lucene 94 codec ? I was looking for a similar pattern in other previous versions but I did not find much. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on a diff in pull request #12019: Clean up vector backward-codecs
benwtrent commented on code in PR #12019: URL: https://github.com/apache/lucene/pull/12019#discussion_r1052196141 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene94/Lucene94RWCodec.java: ## @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.backward_codecs.lucene94; + +import org.apache.lucene.codecs.KnnVectorsFormat; +import org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat; + +/** Implements the Lucene 9.4 index format for backwards compat testing */ +public class Lucene94RWCodec extends Lucene94Codec { Review Comment: The idea here is that we have a ReadWrite codec as I have removed the ability to write in the old one. This is for testing. There are other iterations of a change like this for Lucene92RWCodec, etc. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] vsop-479 commented on pull request #11888: [Fix] Binary search the entries when all suffixes have the same length in a leaf block.
vsop-479 commented on PR #11888: URL: https://github.com/apache/lucene/pull/11888#issuecomment-1357671269 @jpountz Is there any progress on this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #11946: add similarity threshold for hnsw
rmuir commented on PR #11946: URL: https://github.com/apache/lucene/pull/11946#issuecomment-1357737816 > * I agree that this kind of thing is valuable in KNN. KNN is unique when compared to sparse retrieval as you always retrieve K results (unless using a restrictive filter). In some cases, the K retrieved can be irrelevant, especially in the case when a filter is used. That said, it seems better fit outside of Lucene. this isn't any different than BM25 search. Nothing special about KNN. Still no justification to filter by score / scores as percentages. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on a diff in pull request #12019: Clean up vector backward-codecs
rmuir commented on code in PR #12019: URL: https://github.com/apache/lucene/pull/12019#discussion_r1052301297 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene94/Lucene94RWCodec.java: ## @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.backward_codecs.lucene94; + +import org.apache.lucene.codecs.KnnVectorsFormat; +import org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat; + +/** Implements the Lucene 9.4 index format for backwards compat testing */ +public class Lucene94RWCodec extends Lucene94Codec { Review Comment: That's correct, it isn't two variants, it is just that we don't want to provide users ability to "write" new indexes with an ancient codec, only be able to read them. It would be bad (and some kind of error/misconfiguration/bug) if users could do that (e.g. set an ancient codec via Codec.setDefault and create new segments with an ancient format). At the same time, we want our tests to be able to "write" in the old format for testing purposes, so we know we can read all the corner cases. the `.zip`ped up indexes stored in the test suite are very basic, so we wouldn't have good testing with these alone. So that's why you see these RWCodec variants, with the write support etc factored out into test-only code -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] epotyom closed pull request #12025: Issue #11582 Update Faceting user guide
epotyom closed pull request #12025: Issue #11582 Update Faceting user guide URL: https://github.com/apache/lucene/pull/12025 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] epotyom commented on pull request #12025: Issue #11582 Update Faceting user guide
epotyom commented on PR #12025: URL: https://github.com/apache/lucene/pull/12025#issuecomment-1357980091 > -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] epotyom commented on pull request #12025: Issue #11582 Update Faceting user guide
epotyom commented on PR #12025: URL: https://github.com/apache/lucene/pull/12025#issuecomment-1358036804 > From maintenance perspective, I think it would be better to bump the minimum java requirement (e.g. to 19) than to only process the `@snippet` on certain newer JDKs, and produce different-looking docs on 17/18. > > A big problem is that github actions etc are using java 17 (the minimum version), so nothing would really be testing the snippets across code changes and they could get broken easily. Thank you for the reply! Do you think it worth changing the minimum requirement just for this change? I can also look into implementing some basic `@snippet` support in java 17 as a [javadoc Taglet](https://docs.oracle.com/javase/8/docs/technotes/guides/javadoc/taglet/overview.html) and we can use it until there are more reasons to change the requirements, what do you think? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12025: Issue #11582 Update Faceting user guide
rmuir commented on PR #12025: URL: https://github.com/apache/lucene/pull/12025#issuecomment-1358049032 I'm not familiar enough with the new snippet support to know. I've only glanced at the JEP. It seemed interesting to me for these cases, but at the same time I don't understand the level of compile-time safety that they have. At the same time, for demo module we already have a method for compile-time safety of examples that doesn't rely upon this new `@snippet`. See IndexFiles/SearchFiles where we simply include the source code in the javadocs: https://lucene.apache.org/core/9_4_2/demo/index.html https://lucene.apache.org/core/9_4_2/demo/src-html/org/apache/lucene/demo/IndexFiles.html I'd say, if anything the `@snippet` might be something to defer to later. it would make it easier for me to integrate this change. Currently I wouldn't be happy about turning off ECJ checks on javadocs, passing hardcoded org.apache.lucene.demo stuff to every lucene module, generating radically different docs depending on JDK version, etc. It makes things a lot more complicated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org