GitHub user vamossagar12 created a discussion: Some questions on LanceDB 
Lakehouse Integration

I was looking at the LanceDB [lakehouse integration](https://fluss.apache.org/docs/streaming-lakehouse/integrate-data-lakes/lance/#vector-embedding-example)
and have a few questions about it.

1) I noticed that when creating a table in Fluss, the vector index type is not
configurable. As I understand it, LanceDB supports [3 types of indexes](https://docs.lancedb.com/indexing/vector-index#index-tuning),
so I was wondering whether it should be possible to configure this via Fluss.
Being able to choose the index type would let users create indexes suited to
different use cases and embedding types. LanceDB has 2 separate steps: creating
a table, and creating an index on top of it. Is the expectation that users
create only the table via Fluss and build the index separately? That could be
a source of friction for users.
2) A similar question arises for hybrid search, which LanceDB also supports.
That might need a different set of indexes for FTS, as described
[here](https://docs.lancedb.com/search/full-text-search#tokenize-table-data).
Is Fluss also looking at enabling hybrid search via the indexes it sets up?
3) Another question is about [multivector support](https://docs.lancedb.com/search/multivector-search#multivector-support).
I think LanceDB expects such vectors to be stored as lists of lists, but I
believe that data type is not supported in Fluss. Maybe Fluss could model it by
having users create a table with <id> and the vectors spread across different
columns or rows, with Fluss internally converting that to the above format. I
think this is also an important area, so a Fluss-integrated solution could be
useful.
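
To make the mapping idea in point 3 concrete, here is a rough sketch in plain Python. It is not Fluss or LanceDB code: the function name `group_multivectors`, the `(id, vector)` row layout, and the `vectors` field name are all hypothetical, chosen only to illustrate turning flat rows into one list of lists per id.

```python
from collections import defaultdict


def group_multivectors(rows):
    """Group flat (id, vector) rows into one record per id.

    Hypothetical helper: each output record holds all vectors for an id
    as a list of lists, in the order the rows were seen.
    """
    grouped = defaultdict(list)
    for row_id, vector in rows:
        grouped[row_id].append(list(vector))
    # Dicts preserve insertion order, so ids come out in first-seen order.
    return [{"id": row_id, "vectors": vectors} for row_id, vectors in grouped.items()]


# Flat rows, as a user might write them to a table with an id and a vector column.
flat_rows = [
    ("doc1", [0.1, 0.2]),
    ("doc1", [0.3, 0.4]),
    ("doc2", [0.5, 0.6]),
]

records = group_multivectors(flat_rows)
```

In this sketch `doc1` ends up with two vectors and `doc2` with one. A real integration would also have to decide how ordering and vector dimensions are validated, which is left out here.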

Let me know your thoughts!

GitHub link: https://github.com/apache/fluss/discussions/2877
