I've been playing around with highlighting snippets and there are
times when I'd like to know if the snippet returned contains the
entire value of the original field or if the snippet is a truncated
version of the original field.

For example, when I search for "xms" below I can tell that the snippet
returned is truncated:

- original: CORSAIR  XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered
DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail
- snippet: CORSAIR  <em>XMS</em> 2GB (2 x 1GB) 184-Pin DDR SDRAM
Unbuffered DDR 400 (PC 3200) Dual Channel Kit System

Obviously, I can compare the original to the snippet (stripping out
the <em> tags) to see if they are the same, but does Solr natively
support returning a boolean if the values are equal? I couldn't find
anything at https://wiki.apache.org/solr/HighlightingParameters

Maybe the boolean would say "truncated=true" or something.
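
For reference, the client-side comparison described above might look
roughly like this in Python. It is only a sketch: the function names
are made up for illustration, and it assumes the highlighter is using
the default <em> and </em> markers (not custom hl.simple.pre /
hl.simple.post values) and is not HTML-encoding the field text.

```python
import re

# Assumption: Solr's default highlight markers, <em> and </em>.
EM_TAG = re.compile(r"</?em>")


def strip_highlight_tags(snippet):
    """Remove the <em>/</em> markers so the snippet can be compared as plain text."""
    return EM_TAG.sub("", snippet)


def is_truncated(original, snippet):
    """Return True when the snippet, minus markers, differs from the stored value."""
    return strip_highlight_tags(snippet) != original


original = ("CORSAIR  XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered "
            "DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail")
snippet = ("CORSAIR  <em>XMS</em> 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered "
           "DDR 400 (PC 3200) Dual Channel Kit System")
print(is_truncated(original, snippet))
```

A field whose stored value itself contains a literal <em> would fool
this check, which is part of why a native flag would be nicer.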

Here's the example:

$ curl 'http://localhost:8983/solr/collection1/select?wt=json&indent=true&hl=true&hl.fl=*&q=xms'
{
  "responseHeader":{
    "status":0,
    "QTime":2,
    "params":{
      "indent":"true",
      "q":"xms",
      "hl.fl":"*",
      "wt":"json",
      "hl":"true"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"TWINX2048-3200PRO",
        "name":"CORSAIR  XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail",
        "manu":"Corsair Microsystems Inc.",
        "manu_id_s":"corsair",
        "cat":["electronics",
          "memory"],
        "features":["CAS latency 2,\t2-3-3-6 timing, 2.75v, unbuffered, heat-spreader"],
        "price":185.0,
        "price_c":"185,USD",
        "popularity":5,
        "inStock":true,
        "store":"37.7752,-122.4232",
        "manufacturedate_dt":"2006-02-13T15:26:37Z",
        "payloads":"electronics|6.0 memory|3.0",
        "_version_":1506070286991097856}]
  },
  "highlighting":{
    "TWINX2048-3200PRO":{
      "name":["CORSAIR  <em>XMS</em> 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System"]}}}

My use case is to display an ellipsis (...) only when the snippet is
truncated: https://github.com/IQSS/dataverse/issues/537
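
To make the use case concrete, here is a rough Python sketch of how
the ellipsis could be added on the client side today. The helper name
and the "..." marker are hypothetical, and it again assumes the
default <em> and </em> markers. It locates the plain snippet text
inside the original value and adds an ellipsis on whichever side has
text left over.

```python
import re

# Assumption: Solr's default highlight markers, <em> and </em>.
EM_TAG = re.compile(r"</?em>")


def with_ellipses(original, snippet, marker="..."):
    """Add `marker` to each side of the snippet where the original has more text."""
    plain = EM_TAG.sub("", snippet)
    start = original.find(plain)
    if start == -1:
        # The snippet text was not found verbatim (for example because of
        # encoding differences), so return it unchanged rather than guess.
        return snippet
    prefix = marker if start > 0 else ""
    suffix = marker if start + len(plain) < len(original) else ""
    return prefix + snippet + suffix
```

For the "xms" example above, the snippet starts at the beginning of
the name but stops before " Memory - Retail", so only a trailing
ellipsis would be added.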

Thanks,

Phil

-- 
Philip Durbin
Software Developer for http://dataverse.org
http://www.iq.harvard.edu/people/philip-durbin
