[https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401172#comment-17401172]
Mayya Sharipova edited comment on LUCENE-10054 at 8/20/21, 7:39 PM:
--------------------------------------------------------------------
Current .vem index file structure:
{code:java}
+-------------+--------+----------+----------+----------+----------+------+
| FieldNumber | SimFun | VDOffset | VDLength | VIOffset | VILength | dims |
+-------------+--------+----------+----------+----------+----------+------+
+------+--------+--------------+
| size | docIds | graphOffsets |
+------+--------+--------------+
{code}
* FieldNumber: the number of the field
* SimFun: the ordinal of the similarity function
* VDOffset: the offset in the vector data file (.vec file) where the original
vector values are stored
* VDLength: the length of the vector data for this field
* VIOffset: the offset in the vector index file (.vex file) where the nodes'
connections are stored
* VILength: the length of the vector index for this field
* dims: the number of dimensions of the vector field
* size: the total number of documents with this vector field
* docIds: the ids of the documents with this vector field
* graphOffsets: for each document's vector, the offset in the .vex file where
its connections are stored
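To make the current layout concrete, here is a minimal Java sketch of one field's entry, written and read back in the order shown in the diagram. It assumes fixed-width big-endian ints and longs via java.io.DataOutputStream/DataInputStream; the actual Lucene90 format may encode these values differently (for example with variable-length ints), and the class name CurrentVemEntry is hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical in-memory mirror of one field's entry in the current .vem layout.
// Assumes fixed-width encoding; not the real Lucene90 reader/writer.
final class CurrentVemEntry {
  int fieldNumber;
  int simFun;              // ordinal of the similarity function
  long vdOffset, vdLength; // slice of the vector data file (.vec)
  long viOffset, viLength; // slice of the vector index file (.vex)
  int dims;
  int[] docIds;            // "size" is written as docIds.length
  long[] graphOffsets;     // one .vex offset per vector, parallel to docIds

  void write(DataOutputStream out) throws IOException {
    out.writeInt(fieldNumber);
    out.writeInt(simFun);
    out.writeLong(vdOffset);
    out.writeLong(vdLength);
    out.writeLong(viOffset);
    out.writeLong(viLength);
    out.writeInt(dims);
    out.writeInt(docIds.length); // size
    for (int doc : docIds) out.writeInt(doc);
    for (long off : graphOffsets) out.writeLong(off);
  }

  static CurrentVemEntry read(DataInputStream in) throws IOException {
    CurrentVemEntry e = new CurrentVemEntry();
    e.fieldNumber = in.readInt();
    e.simFun = in.readInt();
    e.vdOffset = in.readLong();
    e.vdLength = in.readLong();
    e.viOffset = in.readLong();
    e.viLength = in.readLong();
    e.dims = in.readInt();
    int size = in.readInt();
    e.docIds = new int[size];
    for (int i = 0; i < size; i++) e.docIds[i] = in.readInt();
    e.graphOffsets = new long[size];
    for (int i = 0; i < size; i++) e.graphOffsets[i] = in.readLong();
    return e;
  }
}
```

In this sketch graphOffsets is parallel to docIds, so a vector's connections in .vex are located by its ordinal, i.e. its position in docIds.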
Proposed .vem index file structure:
{code:java}
+-------------+--------+----------+----------+----------+----------+------+
| FieldNumber | SimFun | VDOffset | VDLength | VIOffset | VILength | dims |
+-------------+--------+----------+----------+----------+----------+------+
+-------------+------------+-----+--------------+--------+
| LevelsCount | SizeLevel0 | ... | SizeLevelmax | docIds |
+-------------+------------+-----+--------------+--------+
+----+-------------+-----+---------------+
| ep | NodesLevel1 | ... | NodesLevelmax |
+----+-------------+-----+---------------+
+--------------------+-----+----------------------+
| graphOffsetsLevel0 | ... | graphOffsetsLevelmax |
+--------------------+-----+----------------------+
{code}
* LevelsCount: the number of levels
* SizeLevel0, ..., SizeLevelmax: the number of nodes on each level
* ep: the entry point of the graph on the top level, as a node ordinal
* NodesLevel1, ..., NodesLevelmax: for each level from 1 to max, the list of
level-0 ordinals of the nodes contained in that level. Nodes on level 0 do not
need to be stored, as this level contains all nodes.
* graphOffsetsLevel0, ..., graphOffsetsLevelmax: graph offsets for the
corresponding levels, from 0 to max
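The proposed level metadata can be sketched the same way. This is a hypothetical Java mirror (ProposedVemLevels is not a Lucene class) of the part that follows dims, under the same fixed-width encoding assumption; it additionally assumes that SizeLevel0 equals the number of docIds and that each NodesLevel list is sorted. The connectionsOffset helper illustrates why level 0 needs no node list: there a node's ordinal is directly its index into graphOffsetsLevel0, while on higher levels the index is found by binary search in that level's node list.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical mirror of the level-related part of a proposed .vem entry.
// Assumes at least one level, fixed-width ints/longs, and sorted node lists.
final class ProposedVemLevels {
  int[] levelSizes;      // levelSizes[l] = SizeLevel<l>; LevelsCount = levelSizes.length
  int[] docIds;          // one per node on level 0 (which holds all nodes)
  int ep;                // entry point on the top level, as a level-0 ordinal
  int[][] nodesOnLevel;  // [l] for l >= 1: sorted level-0 ordinals; [0] is unused (null)
  long[][] graphOffsets; // graphOffsets[l][i]: .vex offset of the i-th node on level l

  void write(DataOutputStream out) throws IOException {
    out.writeInt(levelSizes.length); // LevelsCount
    for (int s : levelSizes) out.writeInt(s);
    for (int d : docIds) out.writeInt(d);
    out.writeInt(ep);
    for (int l = 1; l < levelSizes.length; l++) {
      for (int ord : nodesOnLevel[l]) out.writeInt(ord);
    }
    for (long[] level : graphOffsets) {
      for (long off : level) out.writeLong(off);
    }
  }

  static ProposedVemLevels read(DataInputStream in) throws IOException {
    ProposedVemLevels m = new ProposedVemLevels();
    int levelsCount = in.readInt();
    m.levelSizes = new int[levelsCount];
    for (int l = 0; l < levelsCount; l++) m.levelSizes[l] = in.readInt();
    m.docIds = new int[m.levelSizes[0]];
    for (int i = 0; i < m.docIds.length; i++) m.docIds[i] = in.readInt();
    m.ep = in.readInt();
    m.nodesOnLevel = new int[levelsCount][];
    for (int l = 1; l < levelsCount; l++) {
      m.nodesOnLevel[l] = new int[m.levelSizes[l]];
      for (int i = 0; i < m.levelSizes[l]; i++) m.nodesOnLevel[l][i] = in.readInt();
    }
    m.graphOffsets = new long[levelsCount][];
    for (int l = 0; l < levelsCount; l++) {
      m.graphOffsets[l] = new long[m.levelSizes[l]];
      for (int i = 0; i < m.levelSizes[l]; i++) m.graphOffsets[l][i] = in.readLong();
    }
    return m;
  }

  // .vex offset of node `ord`'s connections on `level`, or -1 if the node is not on that level.
  long connectionsOffset(int level, int ord) {
    int idx = level == 0 ? ord : Arrays.binarySearch(nodesOnLevel[level], ord);
    return idx < 0 ? -1 : graphOffsets[level][idx];
  }
}
```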
was (Author: mayya):
Proposed .vem index file structure:
{code:java}
+-------------+--------+----------+----------+----------+----------+------+
| FieldNumber | SimFun | VDOffset | VDLength | VIOffset | VILength | dims |
+-------------+--------+----------+----------+----------+----------+------+
+-------------+------------+-----+--------------+--------+
| LevelsCount | SizeLevel0 | ... | SizeLevelmax | docIds |
+-------------+------------+-----+--------------+--------+
+----+-------------+-----+---------------+
| ep | NodesLevel1 | ... | NodesLevelmax |
+----+-------------+-----+---------------+
+--------------------+-----+----------------------+
| graphOffsetsLevel0 | ... | graphOffsetsLevelmax |
+--------------------+-----+----------------------+
{code}
LevelsCount - number of levels
SizeLevel0, ..., SizeLevelmax - number of nodes on each level
ep - entry point of the graph on the top level, as a node ordinal
NodesLevel1, ..., NodesLevelmax - list of nodes on each level from 1 to max; it
is not necessary to store nodes on level 0, as this level contains all nodes.
graphOffsetsLevel0, ..., graphOffsetsLevelmax - graph offsets for the
corresponding levels from 0 to max
> Handle hierarchy in HNSW graph
> ------------------------------
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Mayya Sharipova
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Currently the HNSW graph is represented as a single-layer graph.
> We would like to extend it to handle hierarchy as per
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>
>
> TODO tasks:
> - add multiple layers in the HnswGraph class
> - modify the format in Lucene90HnswVectorsWriter and
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
> - run benchmarks
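For the last two TODO items, the usual HNSW search descends the hierarchy: it starts at the entry point (ep) on the top level, greedily moves to closer neighbors on that level, and uses the node it ends on as the starting point for the level below. The following Java sketch is a simplified illustration of that idea, not Lucene's implementation: it keeps a single candidate on every level (the original HNSW algorithm searches level 0 with a wider candidate list), uses squared Euclidean distance, and represents the graph as a hypothetical List of per-level neighbor maps, where index 0 is level 0.

```java
import java.util.List;
import java.util.Map;

// Hypothetical layered graph: levels.get(l).get(ord) = neighbor ordinals of ord on level l.
final class LayeredGreedySearch {
  static float squaredDistance(float[] a, float[] b) {
    float sum = 0;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  // Greedy descent from the top-level entry point down to level 0.
  static int search(float[] query, float[][] vectors, List<Map<Integer, int[]>> levels, int ep) {
    int current = ep;
    float currentDist = squaredDistance(query, vectors[current]);
    for (int level = levels.size() - 1; level >= 0; level--) {
      boolean improved = true;
      // Move to any closer neighbor until no neighbor on this level is closer.
      while (improved) {
        improved = false;
        for (int nb : levels.get(level).getOrDefault(current, new int[0])) {
          float d = squaredDistance(query, vectors[nb]);
          if (d < currentDist) {
            current = nb;
            currentDist = d;
            improved = true;
          }
        }
      }
      // The node found here becomes the entry point for the next level down.
    }
    return current;
  }
}
```

Each upper level is sparser, so the descent takes long jumps near the top and refines locally on level 0, which is why the proposed format stores ep and per-level node lists.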
--
This message was sent by Atlassian Jira
(v8.3.4#803005)