Jason Gerlowski created SOLR-14190:
--------------------------------------

             Summary: Add multi-shard support to TaggerRequestHandler
                 Key: SOLR-14190
                 URL: https://issues.apache.org/jira/browse/SOLR-14190
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
          Components: query
    Affects Versions: master (9.0)
            Reporter: Jason Gerlowski


As documented in the ref-guide, the Tagger Handler currently only works on 
single-shard collections.

Users attempting to invoke {{/tag}} on a multi-shard collection will get 
results that only represent the tags from one of the shards.  This is pretty 
easy to reproduce with the tagger tutorial in the 
[docs|https://lucene.apache.org/solr/guide/8_2/the-tagger-handler.html#tutorial-with-geonames].
  If the geonames collection is created with multiple shards (e.g. {{bin/solr 
create -c geonames -shards 2}}), then the tags returned by the API vary based 
on which shard ends up being used.  Repeating the same request returns 
different results:

{code}
➜  solr git:(master) ✗ curl -X POST 'http://localhost:8983/solr/geonames2/tag?overlaps=NO_SUB&tagsLimit=5000&fl=id,name,countrycode&wt=json&indent=on' -H 'Content-Type:text/plain' -d 'Hello New York City'
{
  "responseHeader":{...},
  "tagsCount":2,
  "tags":[[
      "startOffset",10,
      "endOffset",14,
      "ids",["4098776",
        "4562407"]],
    [
      "startOffset",15,
      "endOffset",19,
      "ids",["8347868"]]],
  "response":{"numFound":3,"start":0,"docs":[
      {"id":"8347868", "name":["City"], "countrycode":["AU"]},
      {"id":"4098776", "name":["York"], "countrycode":["US"]},
      {"id":"4562407", "name":["York"], "countrycode":["US"]}]
  }}
➜  solr git:(master) ✗ curl -X POST 'http://localhost:8983/solr/geonames2/tag?overlaps=NO_SUB&tagsLimit=5000&fl=id,name,countrycode&wt=json&indent=on' -H 'Content-Type:text/plain' -d 'Hello New York City'
{
  "responseHeader":{...},
  "tagsCount":1,
  "tags":[[
      "startOffset",6,
      "endOffset",19,
      "ids",["5128581"]]],
  "response":{"numFound":1,"start":0,"docs":[
      {"id":"5128581", "name":["New York City"], "countrycode":["US"]}]
  }}
{code}
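
For anyone who wants to check this without eyeballing curl output, here is a rough Python sketch that sends the same text several times and counts the distinct tag sets that come back. The URL and parameters are copied from the curl commands above; the function names ({{tag_signature}}, {{fetch_tags}}, {{distinct_results}}) are made up for this illustration and are not part of Solr.

```python
import json
import urllib.request

# Assumed endpoint, copied from the curl commands above; adjust for your setup.
TAG_URL = ("http://localhost:8983/solr/geonames2/tag"
           "?overlaps=NO_SUB&tagsLimit=5000&fl=id,name,countrycode&wt=json")


def tag_signature(response):
    """Reduce a /tag JSON response to a hashable, order-independent summary.

    Each tag arrives as a flat list such as
    ["startOffset", 10, "endOffset", 14, "ids", ["4098776", "4562407"]].
    """
    signature = set()
    for flat in response.get("tags", []):
        tag = dict(zip(flat[0::2], flat[1::2]))  # pair alternating keys and values
        signature.add(
            (tag["startOffset"], tag["endOffset"], tuple(sorted(tag["ids"])))
        )
    return frozenset(signature)


def fetch_tags(text, url=TAG_URL):
    """POST plain text to the tagger and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def distinct_results(text, attempts=10, fetch=fetch_tags):
    """Send the same text several times and return the distinct tag sets seen."""
    return {tag_signature(fetch(text)) for _ in range(attempts)}
```

Against a running multi-shard collection, {{len(distinct_results("Hello New York City"))}} coming back greater than 1 would indicate the inconsistency described here; a single-shard collection should always give 1.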

Nothing inherent to {{/tag}} prevents it from handling multi-shard requests; it 
just wasn't a priority when the initial implementation went in.  We should add 
distributed support to this request handler.
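
To make the merge problem concrete, here is a small Python sketch of one way per-shard tag lists could be combined. It is only an illustration under assumptions, not a proposed design for the handler: it assumes each shard returns tags in the flat format shown in the output above, that ids for the same span can simply be unioned, and that {{overlaps=NO_SUB}} means dropping any tag that lies completely within another tag. The function name {{merge_shard_tags}} is hypothetical, and merging the {{response}} docs is left out.

```python
def merge_shard_tags(shard_responses):
    """Union tags from several shard responses, then apply a NO_SUB-style filter.

    Each response is a parsed /tag JSON body whose "tags" entry is a list of
    flat lists like ["startOffset", 6, "endOffset", 19, "ids", ["5128581"]].
    """
    merged = {}  # (startOffset, endOffset) -> set of ids seen on any shard
    for response in shard_responses:
        for flat in response.get("tags", []):
            tag = dict(zip(flat[0::2], flat[1::2]))  # pair keys with values
            span = (tag["startOffset"], tag["endOffset"])
            merged.setdefault(span, set()).update(tag["ids"])

    def is_sub_tag(span):
        # True when some other span fully covers this one.
        return any(
            other != span and other[0] <= span[0] and span[1] <= other[1]
            for other in merged
        )

    return [
        {"startOffset": start, "endOffset": end, "ids": sorted(ids)}
        for (start, end), ids in sorted(merged.items())
        if not is_sub_tag((start, end))
    ]
```

Fed the two responses shown above, this keeps only the "New York City" tag (offsets 6 to 19), since the "York" and "City" tags both fall inside it.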



