[GitHub] [lucene] fcofdez commented on a diff in pull request #11997: Add IntField, LongField, FloatField and DoubleField

2022-12-07 Thread GitBox


fcofdez commented on code in PR #11997:
URL: https://github.com/apache/lucene/pull/11997#discussion_r1041891742


##
lucene/core/src/java/org/apache/lucene/document/DoubleField.java:
##
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.document;
+
+import org.apache.lucene.index.DocValuesType;
+import org.apache.lucene.index.PointValues;
+import org.apache.lucene.search.IndexOrDocValuesQuery;
+import org.apache.lucene.search.PointRangeQuery;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.NumericUtils;
+
+/**
+ * Field that stores a per-document double value for scoring, 
sorting or value
+ * retrieval and index the field for fast range filters. If you also need to 
store the value, you
+ * should add a separate {@link StoredField} instance. If you need more 
fine-grained control you can
+ * use {@link DoublePoint} and {@link DoubleDocValuesField}.
+ *
+ * This field defines static factory methods for creating common queries:
+ *
+ * 
+ *   {@link #newExactQuery(String, double)} for matching an exact 1D point.
+ *   {@link #newRangeQuery(String, double, double)} for matching a 1D 
range.
+ * 
+ *
+ * @see PointValues
+ */
+public final class DoubleField extends Field {
+  /**
+   * Creates a new DoubleField, indexing the provided point and storing it as 
a DocValue
+   *
+   * @param name field name
+   * @param value the double value
+   * @param sorted configure the field to support multiple DocValues
+   * @throws IllegalArgumentException if the field name or value is null.
+   */
+  public DoubleField(String name, double value, boolean sorted) {

Review Comment:
   That makes sense and I find `multivalued` less confusing than `sorted`. I 
used that name to be aligned with the `DocValuesType`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #11999: Add support for stored fields to MemoryIndex

2022-12-07 Thread GitBox


uschindler commented on PR #11999:
URL: https://github.com/apache/lucene/pull/11999#issuecomment-1340615081

   Hi,
   I have no idea, what the reason for this issue/PR is (can I have a bit more 
information, is there an issue, too?), because MemoryIndex is mostly used for 
the old highlighting. I am fine to add stored fields to the MemoryIndex, but 
from my perspective the whole encoding looks useless: Why do we need to encode 
the stored contents at all into a byte[]? MemoryIndex only has one document and 
like terms, why not simply use a `Map` as datastore. `Object` 
can be any type here. To me it looks like an overhead to encode values on 
indexing and decode them, although the encoded values are never exposed.
   Merging MemoryIndex into another conventional index is no issue, as 
MemoryIndexReader does not implement CodecReader at all, so Bulk Merging won't 
work and merging will use the LeafReader interface.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on pull request #11995: draft pr

2022-12-07 Thread GitBox


jpountz commented on PR #11995:
URL: https://github.com/apache/lucene/pull/11995#issuecomment-1340620512

   > Thanks for your reply, does that mean I can add a new custom codec in 
Lucene?
   
   I was more thinking you could maintain a codec with your own stored fields 
format in your own codebase and take advantage of the fact that codecs are 
pluggable in Lucene.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] romseygeek commented on pull request #11999: Add support for stored fields to MemoryIndex

2022-12-07 Thread GitBox


romseygeek commented on PR #11999:
URL: https://github.com/apache/lucene/pull/11999#issuecomment-1340643868

   MemoryIndex is used by the monitor module and by the ES percolator as well, 
and that's where we may have issues - there are some slow queries in ES that 
use stored fields to search non-indexed documents, and they won't work when run 
against a MemoryIndex.
   
   We need to store the types of the data so that we can call the correct 
methods on StoredFieldsVisitor.  But you're right, it might be easier to store 
as a Map and then use instanceof checks to get the types - will give that a go 
and see if it works!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #11999: Add support for stored fields to MemoryIndex

2022-12-07 Thread GitBox


rmuir commented on PR #11999:
URL: https://github.com/apache/lucene/pull/11999#issuecomment-1340950877

   > there are some slow queries in ES that use stored fields to search 
non-indexed documents, and they won't work when run against a MemoryIndex.
   
   Sorry, this seems like its the problem you should fix. not adding stored 
fields to memoryindex!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #11999: Add support for stored fields to MemoryIndex

2022-12-07 Thread GitBox


rmuir commented on PR #11999:
URL: https://github.com/apache/lucene/pull/11999#issuecomment-1340958535

   i hate to be that guy, but -1, this isn't a real use-case. we all know you 
shouldnt search on stored fields. shame on you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] romseygeek commented on pull request #11999: Add support for stored fields to MemoryIndex

2022-12-07 Thread GitBox


romseygeek commented on PR #11999:
URL: https://github.com/apache/lucene/pull/11999#issuecomment-1340992346

   > you shouldnt search on stored fields
   
   It works surprisingly well as a confirmation phase in a two-phase query!
   
   The fact that memory indexes don't support stored fields also makes 
highlighting in monitor/percolator more annoying.  And with Uwe's suggested 
refactor it's a very minimal change.  It's also I think the only LeafReader 
functionality apart from Vectors that MemoryIndex doesn't support? It just 
seems a weird omission.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #11999: Add support for stored fields to MemoryIndex

2022-12-07 Thread GitBox


rmuir commented on PR #11999:
URL: https://github.com/apache/lucene/pull/11999#issuecomment-1340995971

   i'm sticking with my -1. i will also keep close watch on all stored fields 
issues now, to try to make sure it isn't for "searching abuse cases" like this.
   
   I think, maybe we should use stackwalker and throw exception if someone 
retrieves storedfields from a query subclass.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] romseygeek commented on pull request #11999: Add support for stored fields to MemoryIndex

2022-12-07 Thread GitBox


romseygeek commented on PR #11999:
URL: https://github.com/apache/lucene/pull/11999#issuecomment-1341120986

   To clarify, is your objection to adding support for `document()` to the only 
`LeafReader` implementation that doesn't have it, or is it that `document()` is 
a method on `LeafReader` in the first place?  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #11999: Add support for stored fields to MemoryIndex

2022-12-07 Thread GitBox


rmuir commented on PR #11999:
URL: https://github.com/apache/lucene/pull/11999#issuecomment-1341154505

   it doesn't need stored fields because stored fields aren't for searching on. 
PERIOD.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] romseygeek commented on pull request #11999: Add support for stored fields to MemoryIndex

2022-12-07 Thread GitBox


romseygeek commented on PR #11999:
URL: https://github.com/apache/lucene/pull/11999#issuecomment-1341167622

   You don't have an objection to them being used for highlighting though, 
right? In the elasticsearch percolator we have a bunch of workarounds to fetch 
stored fields from elsewhere when highlighting because we can't get them from a 
MemoryIndex.  Which is a pain.
   
   I understand your objection to what we're doing with stored fields in 
elasticsearch (although I disagree with it!).  What I don't understand is your 
technical objection to this specific change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #11999: Add support for stored fields to MemoryIndex

2022-12-07 Thread GitBox


uschindler commented on code in PR #11999:
URL: https://github.com/apache/lucene/pull/11999#discussion_r1042400407


##
lucene/memory/src/java/org/apache/lucene/index/memory/MemoryIndex.java:
##
@@ -1797,9 +1825,34 @@ public int maxDoc() {
 }
 
 @Override
-public void document(int docID, StoredFieldVisitor visitor) {
+public void document(int docID, StoredFieldVisitor visitor) throws 
IOException {
   if (DEBUG) System.err.println("MemoryIndexReader.document");
-  // no-op: there are no stored fields
+  for (Info info : fields.values()) {
+StoredFieldVisitor.Status status = visitor.needsField(info.fieldInfo);
+if (status == StoredFieldVisitor.Status.STOP) {
+  return;
+}
+if (status == StoredFieldVisitor.Status.NO) {
+  continue;
+}
+if (info.storedValues != null) {
+  for (Object value : info.storedValues) {
+if (value instanceof BytesRef bytes) {

Review Comment:
   ❤️



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #11999: Add support for stored fields to MemoryIndex

2022-12-07 Thread GitBox


rmuir commented on PR #11999:
URL: https://github.com/apache/lucene/pull/11999#issuecomment-1341220984

   If you want to search on a field, you index it. This is lucene 101.
   
   I have to push back on the idea that its ok to search on the stored fields: 
it isn't! I will be watching closely now that I know this is happening. The 
stored fields are intended for returning the top-N summary results, and code 
around them should be optimized just for that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] romseygeek commented on pull request #11999: Add support for stored fields to MemoryIndex

2022-12-07 Thread GitBox


romseygeek commented on PR #11999:
URL: https://github.com/apache/lucene/pull/11999#issuecomment-1341240534

   > The stored fields are intended for returning the top-N summary results, 
and code around them should be optimized just for that.
   
   And with this change you can now use a MemoryIndex to do that.  I know what 
you're pushing back against, but I don't think this specific change makes any 
difference there?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #11999: Add support for stored fields to MemoryIndex

2022-12-07 Thread GitBox


rmuir commented on PR #11999:
URL: https://github.com/apache/lucene/pull/11999#issuecomment-1341244642

   It isn't needed, because the memoryindex only really needs the postings to 
match what the query is doing. The highlighter has been working for years with 
memoryindex, without storedfields support, because searches don't use the 
stored fields.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #11999: Add support for stored fields to MemoryIndex

2022-12-07 Thread GitBox


rmuir commented on PR #11999:
URL: https://github.com/apache/lucene/pull/11999#issuecomment-1341245393

   and there's no way in hell you can convince me that we should allow it. we 
should absolutely not support such nonsense in lucene.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gvaysman-at-github opened a new issue, #12000: Lucene-facet leaves ThreadLocal that creates a memory leak

2022-12-07 Thread GitBox


gvaysman-at-github opened a new issue, #12000:
URL: https://github.com/apache/lucene/issues/12000

   ### Description
   
   When the web application that uses Lucene Facets is shut down (say, from 
Tomcat Manager app), ThreadLocal from UTF8TaxonomyWriterCache 
(the field bytes) is never cleaned up.
   
   As a result, not only Tomcat complains of possible thread leak, the heap 
dump confirms this (below).
   
   The code examination (including the latest lucene-facet 9.4.2) shows no 
bytes.remove() anywhere. Just clearing the BytesRefBuilder internal array is 
not sufficient.
   
   Tomcat outputs to the console the following:
   
  SEVERE [Thread-213] 
org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks 
The web application [mh] created a ThreadLocal with key of type 
[org.apache.lucene.facet.taxonomy.writercache.UTF8TaxonomyWriterCache$1] (value 
[org.apache.lucene.facet.taxonomy.writercache.UTF8TaxonomyWriterCache$1@4c6f99ab])
 and a value of type [org.apache.lucene.util.BytesRefBuilder] (value [Unknown]) 
but failed to remove it when the web application was stopped. Threads are going 
to be renewed over time to try and avoid a probable memory leak.
   
   Heap dump shows that the web app class loader, with all statics, etc. 
(unrelated to Lucene), cannot be released because of this leak
   
    
   
![LuceneThreadLocalLeak](https://user-images.githubusercontent.com/120044372/206240629-1b9619af-1d3a-4ed1-846c-ec0815ce600b.png)
   
   
   
   
   ### Version and environment details
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on issue #12000: Lucene-facet leaves ThreadLocal that creates a memory leak

2022-12-07 Thread GitBox


rmuir commented on issue #12000:
URL: https://github.com/apache/lucene/issues/12000#issuecomment-1341271839

   Thanks @gvaysman-at-github for reporting this. I was unaware of this 
threadlocal, indeed it looks problematic just from the description in this 
cache:
   
   ```
   /** A "cache" that never frees memory, and stores labels in a BytesRefHash 
(utf-8 encoding). */
   public final class UTF8TaxonomyWriterCache implements TaxonomyWriterCache, 
Accountable {
 private final ThreadLocal bytes =
 new ThreadLocal() {
   @Override
   protected BytesRefBuilder initialValue() {
 return new BytesRefBuilder();
   }
 };
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on issue #12000: Lucene-facet leaves ThreadLocal that creates a memory leak

2022-12-07 Thread GitBox


rmuir commented on issue #12000:
URL: https://github.com/apache/lucene/issues/12000#issuecomment-1341280462

   The UTF8TaxonomyWriterCache does have a `public void close() {}` but I'm not 
sure when/if it gets invoked as I'm not familiar with this cache and will need 
to look more into it. But one possibility to fix this cache is to replace the 
ThreadLocal with CloseableThreadLocal, and close it out in `close()`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] romseygeek commented on pull request #11999: Add support for stored fields to MemoryIndex

2022-12-07 Thread GitBox


romseygeek commented on PR #11999:
URL: https://github.com/apache/lucene/pull/11999#issuecomment-1341283678

   The elasticsearch highlighter for the percolator has been working for years 
with a really annoying workaround, because MemoryIndex is the only LeafReader 
implementation in lucene that doesn't support the `document` method, and that 
this small PR will allow me to remove.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on issue #12000: Lucene-facet leaves ThreadLocal that creates a memory leak

2022-12-07 Thread GitBox


rmuir commented on issue #12000:
URL: https://github.com/apache/lucene/issues/12000#issuecomment-1341291174

   A possible workaround for now might be to configure `LruTaxonomyWriterCache` 
instead of the default `UTF8TaxonomyWriterCache`. The LRU-based implementation 
doesn't use any threadlocals. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #11999: Add support for stored fields to MemoryIndex

2022-12-07 Thread GitBox


rmuir commented on PR #11999:
URL: https://github.com/apache/lucene/pull/11999#issuecomment-1341303231

   > The elasticsearch highlighter for the percolator has been working for 
years with a really annoying workaround, because MemoryIndex is the only 
LeafReader implementation in lucene that doesn't support the `document` method, 
and that this small PR will allow me to remove.
   
   Thanks @romseygeek this is an actual valid use-case (as opposed to 
searchiing on stored fields which is NOT). please proceed. Though I will 
mention, its not true about only LeafReader without `document`, trust me, there 
are others, i know (look in IndexingChain.java if you need an example).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] msokolov commented on a diff in pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-12-07 Thread GitBox


msokolov commented on code in PR #11860:
URL: https://github.com/apache/lucene/pull/11860#discussion_r1042658193


##
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene94/ExpandingVectorValues.java:
##
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.backward_codecs.lucene94;
+
+import java.io.IOException;
+import org.apache.lucene.index.FilterVectorValues;
+import org.apache.lucene.index.VectorValues;
+import org.apache.lucene.util.BytesRef;
+
+/** reads from byte-encoded data */
+public class ExpandingVectorValues extends FilterVectorValues {
+
+  private final float[] value;
+
+  /**
+   * Constructs ExpandingVectorValues with passed byte encoded VectorValues 
iterator
+   *
+   * @param in the wrapped values
+   */
+  protected ExpandingVectorValues(VectorValues in) {

Review Comment:
   @rmuir I ... hear your frustration, and. I'm sorry if I mocked you; I also 
get frustrated.
   
   >  I raised this on the original PR and it was just totally **ignored**
   
   I do see there was no comment directly addressing yours, back in that other 
issue, but I think the substance *was* addressed, at least in part. We're *not* 
doing any quantization (that was originally an idea that got floated, and then 
discarded). I guess this ExpandingVectorValues is doing the opposite of 
quantization (no information loss)? And there was some reasonable discussion 
here about how to eliminate even that. I guess Ben is planning a more radical 
refactor, which ... I guess I would prefer to see separately, but @benwtrent 
you're doing the work, if you want to combine, we can review a larger change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] msokolov commented on a diff in pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-12-07 Thread GitBox


msokolov commented on code in PR #11860:
URL: https://github.com/apache/lucene/pull/11860#discussion_r1042658193


##
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene94/ExpandingVectorValues.java:
##
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.backward_codecs.lucene94;
+
+import java.io.IOException;
+import org.apache.lucene.index.FilterVectorValues;
+import org.apache.lucene.index.VectorValues;
+import org.apache.lucene.util.BytesRef;
+
+/** reads from byte-encoded data */
+public class ExpandingVectorValues extends FilterVectorValues {
+
+  private final float[] value;
+
+  /**
+   * Constructs ExpandingVectorValues with passed byte encoded VectorValues 
iterator
+   *
+   * @param in the wrapped values
+   */
+  protected ExpandingVectorValues(VectorValues in) {

Review Comment:
   @rmuir I ... hear your frustration, and. I'm sorry if I mocked you; I also 
get frustrated.
   
   >  I raised this on the original PR and it was just totally **ignored**
   
   I do see there was no comment directly addressing yours, back in that other 
issue, but I think the substance *was* addressed, at least in part. We're *not* 
doing any quantization (that was originally an idea that got floated, and then 
discarded). I guess this ExpandingVectorValues is doing the opposite of 
quantization (no information loss)? [ deleted some stuff that is no longer 
relevant since this has been pushed ]



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent commented on a diff in pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-12-07 Thread GitBox


benwtrent commented on code in PR #11860:
URL: https://github.com/apache/lucene/pull/11860#discussion_r1042673843


##
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene94/ExpandingVectorValues.java:
##
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.backward_codecs.lucene94;
+
+import java.io.IOException;
+import org.apache.lucene.index.FilterVectorValues;
+import org.apache.lucene.index.VectorValues;
+import org.apache.lucene.util.BytesRef;
+
+/** reads from byte-encoded data */
+public class ExpandingVectorValues extends FilterVectorValues {
+
+  private final float[] value;
+
+  /**
+   * Constructs ExpandingVectorValues with passed byte encoded VectorValues 
iterator
+   *
+   * @param in the wrapped values
+   */
+  protected ExpandingVectorValues(VectorValues in) {

Review Comment:
   FYI, I am working on a refactor. It's fairly big, but separates out Byte and 
Float vector fields, values, searches, etc. This way we don't overload 
`binaryValue` to handle byte vectors and the API is clearer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on pull request #11972: Generalize range query optimization on sorted indexes to descending sorts.

2022-12-07 Thread GitBox


jpountz commented on PR #11972:
URL: https://github.com/apache/lucene/pull/11972#issuecomment-1342208682

   It is a bit tricky indeed, thanks for having a look @gsmiller !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz merged pull request #11972: Generalize range query optimization on sorted indexes to descending sorts.

2022-12-07 Thread GitBox


jpountz merged PR #11972:
URL: https://github.com/apache/lucene/pull/11972


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz merged pull request #11964: Make RandomAccessVectorValues an implementation detail of HNSW implementations rather than a proper API.

2022-12-07 Thread GitBox


jpountz merged PR #11964:
URL: https://github.com/apache/lucene/pull/11964


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz closed issue #10623: How should we expose VectorValues.RandomAccess? [LUCENE-9583]

2022-12-07 Thread GitBox


jpountz closed issue #10623: How should we expose VectorValues.RandomAccess? 
[LUCENE-9583]
URL: https://github.com/apache/lucene/issues/10623


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org