[GitHub] [lucene] javanna commented on a diff in pull request #12019: Clean up vector backward-codecs

2022-12-19 Thread GitBox


javanna commented on code in PR #12019:
URL: https://github.com/apache/lucene/pull/12019#discussion_r1052136107


##
lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene94/Lucene94RWCodec.java:
##
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.backward_codecs.lucene94;
+
+import org.apache.lucene.codecs.KnnVectorsFormat;
+import org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat;
+
+/** Implements the Lucene 9.4 index format for backwards compat testing */
+public class Lucene94RWCodec extends Lucene94Codec {

Review Comment:
   I am not super familiar with backward codecs. Question: what is the 
advantage of having two variants of the lucene 94 codec ? I was looking for a 
similar pattern in other previous versions but I did not find much.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent commented on a diff in pull request #12019: Clean up vector backward-codecs

2022-12-19 Thread GitBox


benwtrent commented on code in PR #12019:
URL: https://github.com/apache/lucene/pull/12019#discussion_r1052196141


##
lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene94/Lucene94RWCodec.java:
##
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.backward_codecs.lucene94;
+
+import org.apache.lucene.codecs.KnnVectorsFormat;
+import org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat;
+
+/** Implements the Lucene 9.4 index format for backwards compat testing */
+public class Lucene94RWCodec extends Lucene94Codec {

Review Comment:
   The idea here is that we have a ReadWrite codec as I have removed the 
ability to write in the old one. This is for testing. 
   
   There are other iterations of a change like this for Lucene92RWCodec, etc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] vsop-479 commented on pull request #11888: [Fix] Binary search the entries when all suffixes have the same length in a leaf block.

2022-12-19 Thread GitBox


vsop-479 commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-1357671269

   @jpountz Is there any progress on this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #11946: add similarity threshold for hnsw

2022-12-19 Thread GitBox


rmuir commented on PR #11946:
URL: https://github.com/apache/lucene/pull/11946#issuecomment-1357737816

   > * I agree that this kind of thing is valuable in KNN. KNN is unique when 
compared to sparse retrieval as you always retrieve K results (unless using a 
restrictive filter). In some cases, the K retrieved can be irrelevant, 
especially in the case when a filter is used. That said, it seems better fit 
outside of Lucene.
   
   this isn't any different than BM25 search.
   
   Nothing special about KNN. 
   
   Still no justification to filter by score / scores as percentages.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on a diff in pull request #12019: Clean up vector backward-codecs

2022-12-19 Thread GitBox


rmuir commented on code in PR #12019:
URL: https://github.com/apache/lucene/pull/12019#discussion_r1052301297


##
lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene94/Lucene94RWCodec.java:
##
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.backward_codecs.lucene94;
+
+import org.apache.lucene.codecs.KnnVectorsFormat;
+import org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat;
+
+/** Implements the Lucene 9.4 index format for backwards compat testing */
+public class Lucene94RWCodec extends Lucene94Codec {

Review Comment:
   That's correct, it isn't two variants, it is just that we don't want to 
provide users ability to "write" new indexes with an ancient codec, only be 
able to read them. It would be bad (and some kind of 
error/misconfiguration/bug) if users could do that (e.g. set an ancient codec 
via Codec.setDefault and create new segments with an ancient format). 
   
   At the same time, we want our tests to be able to "write" in the old format 
for testing purposes, so we know we can read all the corner cases. the 
`.zip`ped up indexes stored in the test suite are very basic, so we wouldn't 
have good testing with these alone.
   
   So that's why you see these RWCodec variants, with the write support etc 
factored out into test-only code



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] epotyom closed pull request #12025: Issue #11582 Update Faceting user guide

2022-12-19 Thread GitBox


epotyom closed pull request #12025: Issue #11582 Update Faceting user guide
URL: https://github.com/apache/lucene/pull/12025


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] epotyom commented on pull request #12025: Issue #11582 Update Faceting user guide

2022-12-19 Thread GitBox


epotyom commented on PR #12025:
URL: https://github.com/apache/lucene/pull/12025#issuecomment-1357980091

   > 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] epotyom commented on pull request #12025: Issue #11582 Update Faceting user guide

2022-12-19 Thread GitBox


epotyom commented on PR #12025:
URL: https://github.com/apache/lucene/pull/12025#issuecomment-1358036804

   > From maintenance perspective, I think it would be better to bump the 
minimum java requirement (e.g. to 19) than to only process the `@snippet` on 
certain newer JDKs, and produce different-looking docs on 17/18.
   > 
   > A big problem is that github actions etc are using java 17 (the minimum 
version), so nothing would really be testing the snippets across code changes 
and they could get broken easily.
   
   Thank you for the reply! Do you think it worth changing the minimum 
requirement just for this change?
   
   I can also look into implementing some basic `@snippet` support in java 17 
as a [javadoc 
Taglet](https://docs.oracle.com/javase/8/docs/technotes/guides/javadoc/taglet/overview.html)
 and we can use it until there are more reasons to change the requirements, 
what do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12025: Issue #11582 Update Faceting user guide

2022-12-19 Thread GitBox


rmuir commented on PR #12025:
URL: https://github.com/apache/lucene/pull/12025#issuecomment-1358049032

   I'm not familiar enough with the new snippet support to know. I've only 
glanced at the JEP. 
   
   It seemed interesting to me for these cases, but at the same time I don't 
understand the level of compile-time safety that they have.
   
   At the same time, for demo module we already have a method for compile-time 
safety of examples that doesn't rely upon this new `@snippet`. See 
IndexFiles/SearchFiles where we simply include the source code in the javadocs:
   
   https://lucene.apache.org/core/9_4_2/demo/index.html
   
https://lucene.apache.org/core/9_4_2/demo/src-html/org/apache/lucene/demo/IndexFiles.html
   
   I'd say, if anything the `@snippet` might be something to defer to later. it 
would make it easier for me to integrate this change. Currently I wouldn't be 
happy about turning off ECJ checks on javadocs, passing hardcoded 
org.apache.lucene.demo stuff to every lucene module, generating radically 
different docs depending on JDK version, etc. It makes things a lot more 
complicated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org