[
https://issues.apache.org/jira/browse/SOLR-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948963#comment-16948963
]
Matt Davis edited comment on SOLR-12890 at 10/10/19 9:52 PM:
-------------------------------------------------------------
A few years ago I did an implementation in LuMongo (now Zulia.io), which is
Lucene-based. It used superbit to create fields for each bit and then ran a
minimum-should-match query based on the requested similarity (or higher). For an
index of 30 million docs with 300-dimensional word vectors projected into 1000
bits, a query was taking around 30 seconds, so I figured there was probably a
better way, but I will note the approach here. There were a lot of other fields
in the index as well.
[https://github.com/zuliaio/zuliasearch/blob/master/zulia-server/src/main/java/io/zulia/server/index/ZuliaIndex.java#L347]
[https://github.com/lumongo/lumongo/issues/116]
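
The per-bit fields and the minimum-should-match threshold described above can be sketched roughly as follows. This is a minimal illustration, not the code from ZuliaIndex.java: it uses plain random hyperplanes rather than superbit's orthogonalized ones, all function names are hypothetical, and the rule mapping a cosine similarity to a bit count (expected agreement of 1 - angle/pi) is an assumption about how such a threshold could be derived.

```python
import math
import random


def make_hyperplanes(dim, bits, seed=0):
    """Draw `bits` random Gaussian hyperplane normals of dimension `dim`."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def signature_terms(vector, hyperplanes):
    """Return one term per bit (e.g. 'b3_1') so each bit can be indexed as a token."""
    terms = []
    for i, plane in enumerate(hyperplanes):
        dot = sum(v * p for v, p in zip(vector, plane))
        terms.append(f"b{i}_{1 if dot >= 0 else 0}")
    return terms


def min_should_match(bits, cosine_similarity):
    """Expected number of agreeing bits for two vectors at this cosine similarity.

    Assumption: each random hyperplane separates two vectors with probability
    angle / pi, so the expected agreement is bits * (1 - angle / pi).
    """
    angle = math.acos(max(-1.0, min(1.0, cosine_similarity)))
    return math.floor(bits * (1.0 - angle / math.pi))


def matches(query_terms, doc_terms, minimum):
    """Emulate a min-should-match boolean query by counting shared bit terms."""
    return len(set(query_terms) & set(doc_terms)) >= minimum
```

With 1000 bits, a requested similarity of 0.0 gives a threshold of 500 matching bits. Because that value is only the expected agreement, a real query would probably need some slack below it to avoid missing true matches.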
was (Author: mdavis95):
A few years ago I did an implementation in LuMongo (now Zulia.io), which is
Lucene-based. It used superbit to create fields for each bit and then ran a
minimum-should-match query based on the requested similarity (or higher). For an
index of 30 million docs with 300-dimensional word vectors projected into 1000
bits, a query was taking around 30 seconds, so I figured there was probably a
better way, but I will note the approach here. There were a lot of other fields
in the index as well.
[https://github.com/zuliaio/zuliasearch/blob/master/zulia-server/src/main/java/io/zulia/server/index/ZuliaIndex.java#L347]
> Vector Search in Solr (Umbrella Issue)
> --------------------------------------
>
> Key: SOLR-12890
> URL: https://issues.apache.org/jira/browse/SOLR-12890
> Project: Solr
> Issue Type: New Feature
> Reporter: mosh
> Priority: Major
>
> We have recently come across a need to index documents containing vectors
> using Solr, and have even worked on a small POC. We used a URP to calculate
> the LSH (we chose to use the superbit algorithm, but the code is designed so
> that the algorithm can be easily changed), and stored the vector in
> either sparse or dense form, in a binary field.
> Perhaps an addition of an LSH URP in conjunction with a query parser that
> uses the same properties to calculate LSH (or maybe ktree, or some other
> algorithm altogether) should be considered as a Solr feature?