GitHub user vamossagar12 created a discussion: Some questions on LanceDB Lakehouse Integration
I was looking at the LanceDB [lakehouse integration](https://fluss.apache.org/docs/streaming-lakehouse/integrate-data-lakes/lance/#vector-embedding-example) and had some questions about it.

1) I noticed that when creating a table in Fluss, the vector index type is not configurable. Based on my understanding, LanceDB supports [3 types of indexes](https://docs.lancedb.com/indexing/vector-index#index-tuning), so I was wondering whether it should be possible to configure this via Fluss. I think a configurable index type would be useful, since it lets users create indexes that cater to multiple use cases and embedding types. LanceDB has 2 separate steps for creating a table and creating an index on top of it. Is the expectation that users would create only the table via Fluss and have the index created separately? That could be a source of friction for users.

2) A similar question arises for hybrid search, which is also supported by LanceDB. That might need a different set of indexes to be configured for FTS, as described [here](https://docs.lancedb.com/search/full-text-search#tokenize-table-data). Does Fluss also plan to enable hybrid search via the indexes it sets up?

3) Another question is about [multivector support](https://docs.lancedb.com/search/multivector-search#multivector-support). I think LanceDB expects such vectors to be stored as lists of lists, but I believe that is a data type not supported in Fluss. Maybe Fluss could model it as users creating a table with <id> and either different columns or different rows holding the vectors, with Fluss internally converting that to the above format. I think this is also an important area, so a Fluss-integrated solution could be useful.

Let me know your thoughts!

GitHub link: https://github.com/apache/fluss/discussions/2877
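To make the mapping in question 3 concrete, here is a minimal sketch in plain Python of what I have in mind. It makes no Fluss or LanceDB calls; the flat `(id, position, vector)` row shape and the `to_multivector` helper are hypothetical names used only for illustration, not anything either project provides.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical flat row shape: (id, position, vector), one vector per row.
Row = Tuple[str, int, List[float]]


def to_multivector(rows: Iterable[Row]) -> Dict[str, List[List[float]]]:
    """Group flat rows into one list of vectors (a list of lists) per id."""
    grouped = defaultdict(list)
    for doc_id, position, vector in rows:
        grouped[doc_id].append((position, vector))
    # Sort by position so the vector order within each id is deterministic.
    return {
        doc_id: [vec for _, vec in sorted(items, key=lambda item: item[0])]
        for doc_id, items in grouped.items()
    }


rows = [
    ("doc-1", 1, [0.3, 0.4]),
    ("doc-1", 0, [0.1, 0.2]),
    ("doc-2", 0, [0.5, 0.6]),
]
multivectors = to_multivector(rows)
```

With these sample rows, `doc-1` ends up with two vectors ordered by position and `doc-2` with one. Whether such a grouping would belong in the tiering path or elsewhere is exactly the open question.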
