How to index/search without whitespace but highlight with whitespace?

Travis Thu, 11 Jun 2015 11:00:19 -0700

Hey everyone!

I'm trying to set up a Solr instance on some free-text clinical data.
This data has a lot of whitespace formatting; for example, I might have a
document that contains unstructured bulleted lists or section titles.


For example,

blah blah blah...
MEDICATIONS:
* Xanax
* Phenobritrol

DIAGNOSIS:
blah blah blah...

When indexing (and thus querying) this document, I use a text field with
tokenization, stemming, etc.; let's call it "text".

Unfortunately, when I try to print highlighted results, the newlines and
whitespace are not preserved. In an attempt to get around this, I created a
second field in the index, called "raw_text", that stores the full content
of each document as a string, thus preserving the whitespace.

If I set up the search page to search on the text field, but highlight on
the raw_text field, then the highlighted matches don't always line up. Is
there a way to somehow project the stemmed matches from the text field
onto the raw_text field when displaying highlighting?
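
For concreteness, the setup described above (search one field, highlight
another) can be sketched as a set of request parameters. This is only a
minimal sketch, not the actual configuration in use: the helper name
build_highlight_query is hypothetical, the field names are the ones from
this message, and it assumes Solr's standard q, df, hl, hl.fl and wt
parameters.

```python
from urllib.parse import urlencode


def build_highlight_query(user_query, search_field="text",
                          highlight_field="raw_text"):
    """Build a Solr query string that searches one field and highlights another.

    Hypothetical helper for illustration; the field names are assumptions
    taken from the description above.
    """
    params = {
        "q": user_query,           # the user's search terms
        "df": search_field,        # default field to search (analyzed, stemmed)
        "hl": "true",              # turn highlighting on
        "hl.fl": highlight_field,  # field whose stored value is highlighted
        "wt": "json",              # response format
    }
    return urlencode(params)


if __name__ == "__main__":
    # Print the query string that would be appended to a select handler URL.
    print(build_highlight_query("xanax"))
```

The resulting string would be appended to the select handler URL of
whatever core holds the documents; the host and core name are left out
here because they are not given above.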

Thank you for your time,
Travis
