The highlighter's boundary scanner supports sentence boundaries, as per:
https://cwiki.apache.org/confluence/display/solr/Highlighting
So, the word in context should - if I remember correctly - give you
the sentence that word is in even if the field has longer text.
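
A minimal sketch of such a request, built in Python so that the parameters are easy to see. The core name `mycore`, the field name `content`, and the helper `build_highlight_url` are made up for illustration. It assumes Solr 6.4 or later, where `hl.method=unified` is available; on older versions the sentence scanner goes through the FastVectorHighlighter with a breakIterator boundary scanner configured in solrconfig.xml, so please check against the highlighting page above.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, keyword, field="content"):
    """Build a Solr select URL that asks for sentence-sized highlights.

    base_url and field are placeholders; adjust them to your own core/schema.
    """
    params = {
        "q": f"{field}:{keyword}",
        "hl": "true",              # turn highlighting on
        "hl.fl": field,            # field to highlight
        "hl.method": "unified",    # assumption: Solr 6.4+ (Unified Highlighter)
        "hl.bs.type": "SENTENCE",  # break snippets on sentence boundaries
        "hl.snippets": "3",        # return up to three snippets per document
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical local core; nothing is sent over the network here.
url = build_highlight_url("http://localhost:8983/solr/mycore", "tika")
print(url)
```

The snippets come back in the "highlighting" section of the response, keyed by document id, so the stored field itself does not need to be returned.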
Regards,
Alex.
http://www.solr-start.com/ - Reso
lides/TextProcessingAndAdvancedSearch_tallison_MITRE_201510_final_abbrev.pdf
slide 23ff.
-----Original Message-----
From: ankur [mailto:ankur.sancheti.netw...@gmail.com]
Sent: Thursday, April 13, 2017 12:08 PM
To: solr-user@lucene.apache.org
Subject: Re: keyword-in-content for PDF document
Thanks Alex. Yes, I am using TIKA. So, to some extent it preserves the text
flow.
There is something interesting in your reply: "Or you could try using
highlighter to return only the sentence."
I didn't understand that bit. How do we use the Highlighter to return the
sentence?
To make sure, I want
With great difficulty. PDF does not usually preserve the text flow;
instead, it uses absolute positioning for text fragments. Extraction will
try to approximate the right thing, but it is an approximation. And if
you have two columns, it is harder again. Some documents may have an
accessibility layer,