[
https://issues.apache.org/jira/browse/SOLR-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948916#comment-16948916
]
Joel Bernstein edited comment on SOLR-12890 at 11/19/19 6:53 PM:
-----------------------------------------------------------------
h1. Rough survey of some available approaches:
h2. 1) Vector Scoring using Streaming Expressions (works now):
*Docs:* (paste each into "Documents" pane in Solr Admin UI as type:"json")
{code:java}
curl -X POST -H "Content-Type: application/json" \
http://localhost:8983/solr/food_collection/update?commit=true --data-binary '
[
{"id": "1", "name_s":"donut","vector_fs":[5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0]},
{"id": "2", "name_s":"apple juice","vector_fs":[1.0,5.0,0.0,0.0,0.0,4.0,4.0,3.0]},
{"id": "3", "name_s":"cappuccino","vector_fs":[0.0,5.0,3.0,0.0,4.0,1.0,2.0,3.0]},
{"id": "4", "name_s":"cheese pizza","vector_fs":[5.0,0.0,4.0,4.0,0.0,1.0,5.0,2.0]},
{"id": "5", "name_s":"green tea","vector_fs":[0.0,5.0,0.0,0.0,2.0,1.0,1.0,5.0]},
{"id": "6", "name_s":"latte","vector_fs":[0.0,5.0,4.0,0.0,4.0,1.0,3.0,3.0]},
{"id": "7", "name_s":"soda","vector_fs":[0.0,5.0,0.0,0.0,3.0,5.0,5.0,0.0]},
{"id": "8", "name_s":"cheese bread sticks","vector_fs":[5.0,0.0,4.0,5.0,0.0,1.0,4.0,2.0]},
{"id": "9", "name_s":"water","vector_fs":[0.0,5.0,0.0,0.0,0.0,0.0,0.0,5.0]},
{"id": "10", "name_s":"cinnamon bread sticks","vector_fs":[5.0,0.0,1.0,5.0,0.0,3.0,4.0,2.0]}
]'
{code}
*Streaming Expression:*
{code:java}
sort(
  select(
    search(food_collection,
           q="*:*",
           fl="id,vector_fs",
           sort="id asc",
           rows=3),
    cosineSimilarity(vector_fs, array(5.1,0.0,1.0,5.0,0.0,4.0,5.0,1.0)) as sim,
    id),
  by="sim desc")
{code}
*Response:*
{code:java}
{
"result-set": {
"docs": [
{ "sim": 0.99996111, "id": "1" },
{ "sim": 0.98590279, "id": "10" },
{ "sim": 0.55566643, "id": "2" },
{ "EOF": true, "RESPONSE_TIME": 10 }
]
}
}{code}
*Benefits*:
1) Works now out of the box
*Drawbacks*:
1) Have to switch searches to using Streaming Expressions, which may not be
practical in some use cases.
2) Solr doesn't have multi-dimensional point field support yet (SOLR-11077),
so you can only store one vector per field per document.
3) Requires traversing all vectors and scoring them. Needs some sort of KNN
option (possibly this could be done with another inner streaming expression
using a hash function?)
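For context, the per-document math behind the cosineSimilarity evaluator used above is just a normalized dot product. A minimal plain-Java sketch (illustrative only, not Solr's actual implementation):
{code:java}
// Illustrative cosine similarity, equivalent in spirit to what the
// cosineSimilarity stream evaluator computes per document (not Solr's code).
public final class CosineSimilarityExample {
  public static double cosine(double[] a, double[] b) {
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot   += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    double[] donut = {5.0, 0.0, 1.0, 5.0, 0.0, 4.0, 5.0, 1.0};
    double[] query = {5.1, 0.0, 1.0, 5.0, 0.0, 4.0, 5.0, 1.0};
    System.out.println(cosine(donut, query)); // ~0.99996, matching "sim" for id 1 above
  }
}
{code}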
h2. 2) Available Solr Vector Search Plugin (works now):
[https://github.com/saaay71/solr-vector-scoring]
Note: I recently reached out to Ali (the author of this plugin) and asked him
to add an ASL 2.0 license, which he has now done, so we can pull in this code
as needed.
*Docs:*
{code:java}
curl -X POST -H "Content-Type: application/json" \
http://localhost:8983/solr/{your-collection-name}/update?commit=true \
--data-binary '
[
{"name":"example 0", "vector":"0|1.55 1|3.53 2|2.3 3|0.7 4|3.44 5|2.33 "},
{"name":"example 1", "vector":"0|3.54 1|0.4 2|4.16 3|4.88 4|4.28 5|4.25 "},
{"name":"example 2", "vector":"0|1.11 1|0.6 2|1.47 3|1.99 4|2.91 5|1.01 "},
{"name":"example 3", "vector":"0|0.06 1|4.73 2|0.29 3|1.27 4|0.69 5|3.9 "},
{"name":"example 4", "vector":"0|4.01 1|3.69 2|2 3|4.36 4|1.09 5|0.1 "},
{"name":"example 5", "vector":"0|0.64 1|3.95 2|1.03 3|1.65 4|0.99 5|0.09 "}
]'
{code}
*Request:*
{code:java}
http://localhost:8983/solr/{your-collection-name}/query?fl=name,score,vector&q={!vp f=vector vector="0.1,4.75,0.3,1.2,0.7,4.0"}
{code}
*Response:*
{code:java}
{
"responseHeader":{
"status":0,
"QTime":1,
"params":{
"q":"{!myqp f=vector vector=\"0.1,4.75,0.3,1.2,0.7,4.0\"}",
"fl":"name,score,vector"}},
"response":{"numFound":6,"start":0,"maxScore":0.99984086,"docs":[
{
"name":["example 3"],
"vector":["0|0.06 1|4.73 2|0.29 3|1.27 4|0.69 5|3.9 "],
"score":0.99984086},
{
"name":["example 0"],
"vector":["0|1.55 1|3.53 2|2.3 3|0.7 4|3.44 5|2.33 "],
"score":0.7693964},
{
"name":["example 5"],
"vector":["0|0.64 1|3.95 2|1.03 3|1.65 4|0.99 5|0.09 "],
"score":0.76322395},
{
"name":["example 4"],
"vector":["0|4.01 1|3.69 2|2 3|4.36 4|1.09 5|0.1 "],
"score":0.5328145},
{
"name":["example 1"],
"vector":["0|3.54 1|0.4 2|4.16 3|4.88 4|4.28 5|4.25 "],
"score":0.48513117},
{
"name":["example 2"],
"vector":["0|1.11 1|0.6 2|1.47 3|1.99 4|2.91 5|1.01 "],
"score":0.44909418}]
}}
{code}
*Benefits:*
1) Works now (when you install the plugin)
2) Can be used in regular searches (via the Search Handler) without having to
switch to streaming expressions
*Drawbacks*:
1) Slow implementation. It uses payloads to store the values within the
vector, and traversing those is very expensive (see the payload-traversal
sketch after this list). If we were going to follow this approach, at a
minimum we should switch from using payloads to overriding term frequencies
for a speedup.
2) Only supports one vector per field per document (I think... haven't tried
with a multi-valued text field, but it looks like the payload scoring logic
expects only a single value per dimension. Might be possible to modify this...)
3) Requires traversing all vectors and scoring them. Needs some sort of KNN
option to not be slow at scale.
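To make drawback #1 concrete, here is a rough sketch of what payload-based vector traversal looks like at the Lucene level (illustrative only; the field name and term layout are assumptions, not the plugin's exact code). Every dimension of every candidate document costs a position advance plus a payload decode, which is why this gets expensive:
{code:java}
import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;

public final class PayloadVectorScan {
  /**
   * Walks every term of a delimited-payload "vector" field and decodes each
   * per-position payload as a float. Real scoring would accumulate a dot
   * product per (doc, dimension); the point here is the per-value overhead.
   */
  static void scan(LeafReader reader) throws Exception {
    Terms terms = reader.terms("vector");              // hypothetical field name
    if (terms == null) return;
    TermsEnum termsEnum = terms.iterator();
    while (termsEnum.next() != null) {                 // one term per dimension index
      PostingsEnum postings = termsEnum.postings(null, PostingsEnum.PAYLOADS);
      int doc;
      while ((doc = postings.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
        int freq = postings.freq();
        for (int i = 0; i < freq; i++) {
          postings.nextPosition();                     // must advance to read the payload
          BytesRef payload = postings.getPayload();
          if (payload == null) continue;
          float value = PayloadHelper.decodeFloat(payload.bytes, payload.offset);
          // ... accumulate value into this doc's similarity score ...
        }
      }
    }
  }
}
{code}
Either of the fixes suggested above (overriding term frequencies, or docvalues) avoids this per-position payload decode for every vector value.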
h2. 3) Available Solr Vector Search Plugin with LSH Hashing (works now)
[https://github.com/moshebla/solr-vector-scoring]
[~moshebla], who created this JIRA issue, forked option #2 above and added an
LSH implementation so that a KNN filter can be applied before the vector
scoring, greatly improving the speed of the vector search / scoring.
*Docs:*
{code:java}
curl -X POST -H "Content-Type: application/json" \
"http://localhost:8983/solr/{your-collection-name}/update?update.chain=LSH&commit=true" \
--data-binary '
[
{"id":"1", "vector":"1.55,3.53,2.3,0.7,3.44,2.33"},
{"id":"2", "vector":"3.54,0.4,4.16,4.88,4.28,4.25"}
]'
{code}
*Request:*
{code:java}
http://localhost:8983/solr/{your-collection-name}/query?q={!vp f=vector vector="1.55,3.53,2.3,0.7,3.44,2.33" lsh="true" reRankDocs="5"}&fl=id,score,vector,_vector_,_lsh_hash_
{code}
*Response:*
{code:java}
{
"responseHeader":{
"status":0,
"QTime":8,
"params":{
"q":"{!vp f=vector vector=\"1.55,3.53,2.3,0.7,3.44,2.33\" lsh=\"true\"
reRankDocs=\"5\"}",
"fl":"id, score, vector, _vector_, _lsh_hash_",
"wt":"xml"}},
"response":{"numFound":1,"start":0,"maxScore":36.65736,"docs":[
{
"id": "1",
"vector":"1.55,3.53,2.3,0.7,3.44,2.33",
"_vector_":"/z/GZmZAYeuFQBMzMz8zMzNAXCj2QBUeuA==",
"_lsh_hash_":["0_8",
"1_35",
"2_7",
"3_10",
"4_2",
"5_35",
"6_16",
"7_30",
"8_27",
"9_12",
"10_7",
"11_32",
"12_48",
"13_36",
"14_10",
"15_7",
"16_42",
"17_5",
"18_3",
"19_2",
"20_1",
"21_0",
"22_24",
"23_18",
"24_42",
"25_31",
"26_35",
"27_8",
"28_1",
"29_24",
"30_47",
"31_14",
"32_22",
"33_39",
"34_0",
"35_34",
"36_34",
"37_39",
"38_27",
"39_27",
"40_45",
"41_10",
"42_21",
"43_34",
"44_41",
"45_9",
"46_31",
"47_0",
"48_4",
"49_43"],
"score":36.65736}
]
}
}
{code}
*Benefits:*
1) Works now (when you install the plugin)
2) Can be used in regular searches (via the Search Handler) without having to
switch to streaming expressions
3) Provides a KNN implementation that prevents every document's vector from
having to be scored
*Drawbacks:*
1) Slow. Even though the KNN filter avoids unnecessarily scoring all vectors,
using payloads is still inefficient here. If we were going to follow this
approach, at a minimum we should switch from using payloads to overriding term
frequencies for a speedup.
2) Only supports one vector per field per document (same as the original it was
forked from)
3) Doesn't currently have an ASL 2.0 license on it for reuse. [~moshebla] -
since Ali has added a license to the original repo now, can you please pull
that into your repo so that your changes are also covered under ASL 2.0? Thanks!
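For readers unfamiliar with LSH: the _lsh_hash_ tokens in the response above are essentially "hash-index_bucket" pairs, so candidate documents can be pre-selected with an ordinary term query before exact vector scoring. A generic random-projection sketch of the idea (illustrative only; the plugin reportedly uses a superbit LSH implementation per the issue description, and the class and parameter names here are made up):
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative random-projection LSH; not the plugin's actual (superbit) code. */
public final class LshSketch {
  private final double[][] projections;   // one random direction per hash function
  private final double binWidth;

  public LshSketch(int numHashes, int dims, double binWidth, long seed) {
    Random rnd = new Random(seed);
    this.projections = new double[numHashes][dims];
    this.binWidth = binWidth;
    for (double[] p : projections) {
      for (int d = 0; d < dims; d++) p[d] = rnd.nextGaussian();
    }
  }

  /** Emits tokens shaped like "hashIndex_bucket", similar to the _lsh_hash_ values above. */
  public List<String> hash(double[] vector) {
    List<String> tokens = new ArrayList<>();
    for (int i = 0; i < projections.length; i++) {
      double dot = 0.0;
      for (int d = 0; d < vector.length; d++) dot += projections[i][d] * vector[d];
      long bucket = (long) Math.floor(dot / binWidth);
      tokens.add(i + "_" + bucket);
    }
    return tokens;
  }
}
{code}
Roughly speaking, documents sharing bucket tokens with the query vector form the candidate set (bounded by reRankDocs), and only those candidates get the exact vector score.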
h2. 4) Port over the Elasticsearch implementation
Elasticsearch recently implemented sparse and dense vector fields, though they
chose to release the feature under their proprietary Elastic license instead of
an open source license. HOWEVER, when they were originally implementing this
feature they were intending for it to be open source, and only later decided to
change the license to be proprietary, so most of the feature was built under
the ASL 2.0 license before they restricted it. This means that we can port over
any part of their implementation that existed prior to this commit under the
ASL 2.0 license:
[https://github.com/elastic/elasticsearch/commit/952ddf247a2df8ade64ae067c1904436fd7a2ba8]
*Benefits*:
1) Encodes vectors into BinaryDocValues, so certainly more efficient than the
payload-based Solr plugin approaches above (see the rough encode/decode sketch
after the drawbacks below)
*Drawbacks:*
1) Doesn't work with Solr yet, so would have to do more work to port it over.
2) Can't copy over future improvements, since the feature is no longer open
source (only the early versions of it were)
3) Only supports one vector per field per document currently (though this
shouldn't be too hard to change)
4) Doesn't appear to provide a quantized representation for KNN filtering
prior to scoring, so likely slow at scale.
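As a rough illustration of the BinaryDocValues approach (not Elastic's actual wire format; the field name and layout here are assumptions), packing a dense float vector into a binary docvalues field and decoding it at scoring time looks roughly like this:
{code:java}
import java.nio.ByteBuffer;
import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.util.BytesRef;

public final class DenseVectorCodec {
  /** Pack a float vector into bytes for a BinaryDocValuesField (4 bytes per dimension). */
  public static BytesRef encode(float[] vector) {
    ByteBuffer buf = ByteBuffer.allocate(vector.length * Float.BYTES);
    for (float v : vector) {
      buf.putFloat(v);
    }
    return new BytesRef(buf.array());
  }

  /** Decode at scoring time; one contiguous read per document, no payload walking. */
  public static float[] decode(BytesRef bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes.bytes, bytes.offset, bytes.length);
    float[] vector = new float[bytes.length / Float.BYTES];
    for (int i = 0; i < vector.length; i++) {
      vector[i] = buf.getFloat();
    }
    return vector;
  }

  public static Document index(float[] vector) {
    Document doc = new Document();
    doc.add(new BinaryDocValuesField("vector_bdv", encode(vector)));  // hypothetical field
    return doc;
  }
}
{code}
Compared to the payload approach, the whole vector comes back in one contiguous per-document read, which is where the efficiency claim above comes from.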
h2. 5) Port over Open Distro for Stretchysearch implementation
[https://github.com/opendistro-for-elasticsearch/k-NN]
Open Distro for "Stretchysearch" (not using the original project name because
Elastic is suing them for trademark infringement for doing so) created their
own ASL 2.0 plugin for their Open Distro distribution which implements vector
search with KNN support. On the surface, this looks like it may be an enhanced
version of the Elastic implementation in #4 above, but it appears to still be a
work in progress, so I'm not sure if it is production ready yet (haven't tried
it).
*Benefits:*
1) Provides a quantized representation of the vectors for KNN, so should still
be fast at scale with lots of docs
2) They appear to be making codec-level changes, which implies this approach
may end up being far more efficient than any of the others mentioned above
over time.
*Drawbacks:*
1) Developed against "Stretchysearch" code base, so will take extra effort to
port to Solr.
2) Looks like a work in progress, so likely not yet ready for production use or
porting.
3) Only supports one vector per field per document currently.
h2. 6) Others please contribute ideas!
There are lots of ways to approach this problem, and we have some really smart
people in this community. All ideas are welcome - possibly we should even
implement a few different approaches.
> Vector Search in Solr (Umbrella Issue)
> --------------------------------------
>
> Key: SOLR-12890
> URL: https://issues.apache.org/jira/browse/SOLR-12890
> Project: Solr
> Issue Type: New Feature
> Reporter: mosh
> Priority: Major
>
> We have recently come across a need to index documents containing vectors
> using Solr, and have even worked on a small POC. We used a URP to calculate
> the LSH (we chose to use the superbit algorithm, but the code is designed in
> a way that the algorithm picked can be easily changed), and stored the vector
> in either sparse or dense form, in a binary field.
> Perhaps an addition of an LSH URP, in conjunction with a query parser that
> uses the same properties to calculate the LSH (or maybe ktree, or some other
> algorithm altogether), should be considered as a Solr feature?