[GitHub] [lucene] fcofdez commented on a diff in pull request #11997: Add IntField, LongField, FloatField and DoubleField
fcofdez commented on code in PR #11997: URL: https://github.com/apache/lucene/pull/11997#discussion_r1041891742 ## lucene/core/src/java/org/apache/lucene/document/DoubleField.java: ## @@ -0,0 +1,138 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.document; + +import org.apache.lucene.index.DocValuesType; +import org.apache.lucene.index.PointValues; +import org.apache.lucene.search.IndexOrDocValuesQuery; +import org.apache.lucene.search.PointRangeQuery; +import org.apache.lucene.search.Query; +import org.apache.lucene.util.BytesRef; +import org.apache.lucene.util.NumericUtils; + +/** + * Field that stores a per-document double value for scoring, sorting or value + * retrieval and index the field for fast range filters. If you also need to store the value, you + * should add a separate {@link StoredField} instance. If you need more fine-grained control you can + * use {@link DoublePoint} and {@link DoubleDocValuesField}. + * + * This field defines static factory methods for creating common queries: + * + * + * {@link #newExactQuery(String, double)} for matching an exact 1D point. + * {@link #newRangeQuery(String, double, double)} for matching a 1D range. + * + * + * @see PointValues + */ +public final class DoubleField extends Field { + /** + * Creates a new DoubleField, indexing the provided point and storing it as a DocValue + * + * @param name field name + * @param value the double value + * @param sorted configure the field to support multiple DocValues + * @throws IllegalArgumentException if the field name or value is null. + */ + public DoubleField(String name, double value, boolean sorted) { Review Comment: That makes sense and I find `multivalued` less confusing than `sorted`. I used that name to be aligned with the `DocValuesType`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #11999: Add support for stored fields to MemoryIndex
uschindler commented on PR #11999: URL: https://github.com/apache/lucene/pull/11999#issuecomment-1340615081 Hi, I have no idea, what the reason for this issue/PR is (can I have a bit more information, is there an issue, too?), because MemoryIndex is mostly used for the old highlighting. I am fine to add stored fields to the MemoryIndex, but from my perspective the whole encoding looks useless: Why do we need to encode the stored contents at all into a byte[]? MemoryIndex only has one document and like terms, why not simply use a `Map` as datastore. `Object` can be any type here. To me it looks like an overhead to encode values on indexing and decode them, although the encoded values are never exposed. Merging MemoryIndex into another conventional index is no issue, as MemoryIndexReader does not implement CodecReader at all, so Bulk Merging won't work and merging will use the LeafReader interface. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #11995: draft pr
jpountz commented on PR #11995: URL: https://github.com/apache/lucene/pull/11995#issuecomment-1340620512 > Thanks for your reply, does that mean I can add a new custom codec in Lucene? I was more thinking you could maintain a codec with your own stored fields format in your own codebase and take advantage of the fact that codecs are pluggable in Lucene. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] romseygeek commented on pull request #11999: Add support for stored fields to MemoryIndex
romseygeek commented on PR #11999: URL: https://github.com/apache/lucene/pull/11999#issuecomment-1340643868 MemoryIndex is used by the monitor module and by the ES percolator as well, and that's where we may have issues - there are some slow queries in ES that use stored fields to search non-indexed documents, and they won't work when run against a MemoryIndex. We need to store the types of the data so that we can call the correct methods on StoredFieldsVisitor. But you're right, it might be easier to store as a Map and then use instanceof checks to get the types - will give that a go and see if it works! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #11999: Add support for stored fields to MemoryIndex
rmuir commented on PR #11999: URL: https://github.com/apache/lucene/pull/11999#issuecomment-1340950877 > there are some slow queries in ES that use stored fields to search non-indexed documents, and they won't work when run against a MemoryIndex. Sorry, this seems like its the problem you should fix. not adding stored fields to memoryindex! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #11999: Add support for stored fields to MemoryIndex
rmuir commented on PR #11999: URL: https://github.com/apache/lucene/pull/11999#issuecomment-1340958535 i hate to be that guy, but -1, this isn't a real use-case. we all know you shouldnt search on stored fields. shame on you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] romseygeek commented on pull request #11999: Add support for stored fields to MemoryIndex
romseygeek commented on PR #11999: URL: https://github.com/apache/lucene/pull/11999#issuecomment-1340992346 > you shouldnt search on stored fields It works surprisingly well as a confirmation phase in a two-phase query! The fact that memory indexes don't support stored fields also makes highlighting in monitor/percolator more annoying. And with Uwe's suggested refactor it's a very minimal change. It's also I think the only LeafReader functionality apart from Vectors that MemoryIndex doesn't support? It just seems a weird omission. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #11999: Add support for stored fields to MemoryIndex
rmuir commented on PR #11999: URL: https://github.com/apache/lucene/pull/11999#issuecomment-1340995971 i'm sticking with my -1. i will also keep close watch on all stored fields issues now, to try to make sure it isn't for "searching abuse cases" like this. I think, maybe we should use stackwalker and throw exception if someone retrieves storedfields from a query subclass. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] romseygeek commented on pull request #11999: Add support for stored fields to MemoryIndex
romseygeek commented on PR #11999: URL: https://github.com/apache/lucene/pull/11999#issuecomment-1341120986 To clarify, is your objection to adding support for `document()` to the only `LeafReader` implementation that doesn't have it, or is it that `document()` is a method on `LeafReader` in the first place? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #11999: Add support for stored fields to MemoryIndex
rmuir commented on PR #11999: URL: https://github.com/apache/lucene/pull/11999#issuecomment-1341154505 it doesn't need stored fields because stored fields aren't for searching on. PERIOD. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] romseygeek commented on pull request #11999: Add support for stored fields to MemoryIndex
romseygeek commented on PR #11999: URL: https://github.com/apache/lucene/pull/11999#issuecomment-1341167622 You don't have an objection to them being used for highlighting though, right? In the elasticsearch percolator we have a bunch of workarounds to fetch stored fields from elsewhere when highlighting because we can't get them from a MemoryIndex. Which is a pain. I understand your objection to what we're doing with stored fields in elasticsearch (although I disagree with it!). What I don't understand is your technical objection to this specific change. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #11999: Add support for stored fields to MemoryIndex
uschindler commented on code in PR #11999: URL: https://github.com/apache/lucene/pull/11999#discussion_r1042400407 ## lucene/memory/src/java/org/apache/lucene/index/memory/MemoryIndex.java: ## @@ -1797,9 +1825,34 @@ public int maxDoc() { } @Override -public void document(int docID, StoredFieldVisitor visitor) { +public void document(int docID, StoredFieldVisitor visitor) throws IOException { if (DEBUG) System.err.println("MemoryIndexReader.document"); - // no-op: there are no stored fields + for (Info info : fields.values()) { +StoredFieldVisitor.Status status = visitor.needsField(info.fieldInfo); +if (status == StoredFieldVisitor.Status.STOP) { + return; +} +if (status == StoredFieldVisitor.Status.NO) { + continue; +} +if (info.storedValues != null) { + for (Object value : info.storedValues) { +if (value instanceof BytesRef bytes) { Review Comment: ❤️ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #11999: Add support for stored fields to MemoryIndex
rmuir commented on PR #11999: URL: https://github.com/apache/lucene/pull/11999#issuecomment-1341220984 If you want to search on a field, you index it. This is lucene 101. I have to push back on the idea that its ok to search on the stored fields: it isn't! I will be watching closely now that I know this is happening. The stored fields are intended for returning the top-N summary results, and code around them should be optimized just for that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] romseygeek commented on pull request #11999: Add support for stored fields to MemoryIndex
romseygeek commented on PR #11999: URL: https://github.com/apache/lucene/pull/11999#issuecomment-1341240534 > The stored fields are intended for returning the top-N summary results, and code around them should be optimized just for that. And with this change you can now use a MemoryIndex to do that. I know what you're pushing back against, but I don't think this specific change makes any difference there? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #11999: Add support for stored fields to MemoryIndex
rmuir commented on PR #11999: URL: https://github.com/apache/lucene/pull/11999#issuecomment-1341244642 It isn't needed, because the memoryindex only really needs the postings to match what the query is doing. The highlighter has been working for years with memoryindex, without storedfields support, because searches don't use the stored fields. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #11999: Add support for stored fields to MemoryIndex
rmuir commented on PR #11999: URL: https://github.com/apache/lucene/pull/11999#issuecomment-1341245393 and there's no way in hell you can convince me that we should allow it. we should absolutely not support such nonsense in lucene. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gvaysman-at-github opened a new issue, #12000: Lucene-facet leaves ThreadLocal that creates a memory leak
gvaysman-at-github opened a new issue, #12000: URL: https://github.com/apache/lucene/issues/12000 ### Description When the web application that uses Lucene Facets is shut down (say, from Tomcat Manager app), ThreadLocal from UTF8TaxonomyWriterCache (the field bytes) is never cleaned up. As a result, not only Tomcat complains of possible thread leak, the heap dump confirms this (below). The code examination (including the latest lucene-facet 9.4.2) shows no bytes.remove() anywhere. Just clearing the BytesRefBuilder internal array is not sufficient. Tomcat outputs to the console the following: SEVERE [Thread-213] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [mh] created a ThreadLocal with key of type [org.apache.lucene.facet.taxonomy.writercache.UTF8TaxonomyWriterCache$1] (value [org.apache.lucene.facet.taxonomy.writercache.UTF8TaxonomyWriterCache$1@4c6f99ab]) and a value of type [org.apache.lucene.util.BytesRefBuilder] (value [Unknown]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak. Heap dump shows that the web app class loader, with all statics, etc. (unrelated to Lucene), cannot be released because of this leak  ### Version and environment details _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on issue #12000: Lucene-facet leaves ThreadLocal that creates a memory leak
rmuir commented on issue #12000: URL: https://github.com/apache/lucene/issues/12000#issuecomment-1341271839 Thanks @gvaysman-at-github for reporting this. I was unaware of this threadlocal, indeed it looks problematic just from the description in this cache: ``` /** A "cache" that never frees memory, and stores labels in a BytesRefHash (utf-8 encoding). */ public final class UTF8TaxonomyWriterCache implements TaxonomyWriterCache, Accountable { private final ThreadLocal bytes = new ThreadLocal() { @Override protected BytesRefBuilder initialValue() { return new BytesRefBuilder(); } }; ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on issue #12000: Lucene-facet leaves ThreadLocal that creates a memory leak
rmuir commented on issue #12000: URL: https://github.com/apache/lucene/issues/12000#issuecomment-1341280462 The UTF8TaxonomyWriterCache does have a `public void close() {}` but I'm not sure when/if it gets invoked as I'm not familiar with this cache and will need to look more into it. But one possibility to fix this cache is to replace the ThreadLocal with CloseableThreadLocal, and close it out in `close()`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] romseygeek commented on pull request #11999: Add support for stored fields to MemoryIndex
romseygeek commented on PR #11999: URL: https://github.com/apache/lucene/pull/11999#issuecomment-1341283678 The elasticsearch highlighter for the percolator has been working for years with a really annoying workaround, because MemoryIndex is the only LeafReader implementation in lucene that doesn't support the `document` method, and that this small PR will allow me to remove. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on issue #12000: Lucene-facet leaves ThreadLocal that creates a memory leak
rmuir commented on issue #12000: URL: https://github.com/apache/lucene/issues/12000#issuecomment-1341291174 A possible workaround for now might be to configure `LruTaxonomyWriterCache` instead of the default `UTF8TaxonomyWriterCache`. The LRU-based implementation doesn't use any threadlocals. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #11999: Add support for stored fields to MemoryIndex
rmuir commented on PR #11999: URL: https://github.com/apache/lucene/pull/11999#issuecomment-1341303231 > The elasticsearch highlighter for the percolator has been working for years with a really annoying workaround, because MemoryIndex is the only LeafReader implementation in lucene that doesn't support the `document` method, and that this small PR will allow me to remove. Thanks @romseygeek this is an actual valid use-case (as opposed to searchiing on stored fields which is NOT). please proceed. Though I will mention, its not true about only LeafReader without `document`, trust me, there are others, i know (look in IndexingChain.java if you need an example). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on a diff in pull request #11860: GITHUB-11830 Better optimize storage for vector connections
msokolov commented on code in PR #11860: URL: https://github.com/apache/lucene/pull/11860#discussion_r1042658193 ## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene94/ExpandingVectorValues.java: ## @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.lucene.backward_codecs.lucene94; + +import java.io.IOException; +import org.apache.lucene.index.FilterVectorValues; +import org.apache.lucene.index.VectorValues; +import org.apache.lucene.util.BytesRef; + +/** reads from byte-encoded data */ +public class ExpandingVectorValues extends FilterVectorValues { + + private final float[] value; + + /** + * Constructs ExpandingVectorValues with passed byte encoded VectorValues iterator + * + * @param in the wrapped values + */ + protected ExpandingVectorValues(VectorValues in) { Review Comment: @rmuir I ... hear your frustration, and. I'm sorry if I mocked you; I also get frustrated. > I raised this on the original PR and it was just totally **ignored** I do see there was no comment directly addressing yours, back in that other issue, but I think the substance *was* addressed, at least in part. We're *not* doing any quantization (that was originally an idea that got floated, and then discarded). I guess this ExpandingVectorValues is doing the opposite of quantization (no information loss)? And there was some reasonable discussion here about how to eliminate even that. I guess Ben is planning a more radical refactor, which ... I guess I would prefer to see separately, but @benwtrent you're doing the work, if you want to combine, we can review a larger change. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on a diff in pull request #11860: GITHUB-11830 Better optimize storage for vector connections
msokolov commented on code in PR #11860: URL: https://github.com/apache/lucene/pull/11860#discussion_r1042658193 ## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene94/ExpandingVectorValues.java: ## @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.lucene.backward_codecs.lucene94; + +import java.io.IOException; +import org.apache.lucene.index.FilterVectorValues; +import org.apache.lucene.index.VectorValues; +import org.apache.lucene.util.BytesRef; + +/** reads from byte-encoded data */ +public class ExpandingVectorValues extends FilterVectorValues { + + private final float[] value; + + /** + * Constructs ExpandingVectorValues with passed byte encoded VectorValues iterator + * + * @param in the wrapped values + */ + protected ExpandingVectorValues(VectorValues in) { Review Comment: @rmuir I ... hear your frustration, and. I'm sorry if I mocked you; I also get frustrated. > I raised this on the original PR and it was just totally **ignored** I do see there was no comment directly addressing yours, back in that other issue, but I think the substance *was* addressed, at least in part. We're *not* doing any quantization (that was originally an idea that got floated, and then discarded). I guess this ExpandingVectorValues is doing the opposite of quantization (no information loss)? [ deleted some stuff that is no longer relevant since this has been pushed ] -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on a diff in pull request #11860: GITHUB-11830 Better optimize storage for vector connections
benwtrent commented on code in PR #11860: URL: https://github.com/apache/lucene/pull/11860#discussion_r1042673843 ## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene94/ExpandingVectorValues.java: ## @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.lucene.backward_codecs.lucene94; + +import java.io.IOException; +import org.apache.lucene.index.FilterVectorValues; +import org.apache.lucene.index.VectorValues; +import org.apache.lucene.util.BytesRef; + +/** reads from byte-encoded data */ +public class ExpandingVectorValues extends FilterVectorValues { + + private final float[] value; + + /** + * Constructs ExpandingVectorValues with passed byte encoded VectorValues iterator + * + * @param in the wrapped values + */ + protected ExpandingVectorValues(VectorValues in) { Review Comment: FYI, I am working on a refactor. It's fairly big, but separates out Byte and Float vector fields, values, searches, etc. This way we don't overload `binaryValue` to handle byte vectors and the API is clearer. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #11972: Generalize range query optimization on sorted indexes to descending sorts.
jpountz commented on PR #11972: URL: https://github.com/apache/lucene/pull/11972#issuecomment-1342208682 It is a bit tricky indeed, thanks for having a look @gsmiller ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz merged pull request #11972: Generalize range query optimization on sorted indexes to descending sorts.
jpountz merged PR #11972: URL: https://github.com/apache/lucene/pull/11972 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz merged pull request #11964: Make RandomAccessVectorValues an implementation detail of HNSW implementations rather than a proper API.
jpountz merged PR #11964: URL: https://github.com/apache/lucene/pull/11964 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz closed issue #10623: How should we expose VectorValues.RandomAccess? [LUCENE-9583]
jpountz closed issue #10623: How should we expose VectorValues.RandomAccess? [LUCENE-9583] URL: https://github.com/apache/lucene/issues/10623 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org