Beautiful! Thank you all - that is exactly what I needed to be sure where I stood on this before going into a meeting today.
On Tue, Nov 29, 2016 at 11:03 AM, Kevin Risden <compuwizard...@gmail.com> wrote:
> For #2 you might be able to get away with the following:
>
> https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component
>
> The Term Vector component can return offsets and positions. Not sure how
> useful they would be to you, but at least it's a starting point. I'm
> assuming this requires only termVectors and termPositions and won't
> require stored to be true.
>
> Kevin Risden
>
> On Tue, Nov 29, 2016 at 12:00 PM, Kevin Risden <compuwizard...@gmail.com>
> wrote:
>
> > For #3 specifically, I've always found this page useful:
> >
> > https://cwiki.apache.org/confluence/display/solr/Field+Properties+by+Use+Case
> >
> > It lists out what properties are necessary on each field based on a use
> > case.
> >
> > Kevin Risden
> >
> > On Tue, Nov 29, 2016 at 11:49 AM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> (1) Not that I have readily at hand. And to make it
> >> worse, there's the UnifiedHighlighter coming out soon...
> >>
> >> I don't think there's a good way for (2).
> >>
> >> For (3), at least, yes. The reason is simple. For analyzed text,
> >> the only thing in the index is what's made it through the
> >> analysis chains. So stopwords are missing. Stemming
> >> has been done. You could even have put a phonetic filter
> >> in there and have terms like ARDT KNTR which would
> >> be...er...not very useful to show the end user, so the original
> >> text must be available.
> >>
> >> Not much help...
> >> Erick
> >>
> >> On Tue, Nov 29, 2016 at 8:43 AM, John Bickerstaff
> >> <j...@johnbickerstaff.com> wrote:
> >> > All,
> >> >
> >> > One of the questions I've been asked to answer / prove out is around
> >> > the question of highlighting query matches in responses.
> >> >
> >> > BTW - One assumption I'm making is that highlighting is basically a
> >> > function of storing offsets for terms / tokens at index time. If that's
> >> > not right, I'd be grateful for pointers in the right direction.
> >> >
> >> > My underlying need is to get highlighting on search term matches for
> >> > returned documents. I need to choose between doing this in Solr and
> >> > using an external document store, so I'm interested in whether Solr
> >> > can provide the doc store with the information necessary to identify
> >> > which section(s) of the doc to highlight in a query response...
> >> >
> >> > A few questions:
> >> >
> >> > 1. This page doesn't say a lot about how things work - is there
> >> > somewhere with more information on dealing with offsets and
> >> > highlighting? On offsets and how they're handled?
> >> > https://cwiki.apache.org/confluence/display/solr/Highlighting
> >> >
> >> > 2. Can I return offset information with a query response or is that
> >> > internal only? If yes, can I return offset info if I have NOT stored
> >> > the data in Solr but indexed only?
> >> >
> >> > (Explanation: Currently my project is considering indexing only and
> >> > storing the entire text elsewhere -- using Solr to return only doc IDs
> >> > for searches. If Solr could also return offsets, these could be used
> >> > in processing the text stored elsewhere to provide highlighting.)
> >> >
> >> > 3. Do I assume correctly that in order for Solr highlighting to work
> >> > correctly, the text MUST also be stored in Solr (i.e. not indexed
> >> > only, but stored=true)?
> >> >
> >> > Many thanks...
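Kevin's suggestion for #2 (asking the Term Vector Component for offsets) might look something like the request below. This is a minimal sketch under assumptions, not a tested recipe: the collection name `docs` and the field name `body` are hypothetical, it assumes a `/tvrh` handler wired to the TermVectorComponent (the name used in Solr's example solrconfig.xml), and it assumes the field was indexed with termVectors, termPositions and termOffsets enabled, since offsets only come back if they were recorded at index time. It only builds the URL; it does not contact Solr.

```python
from urllib.parse import urlencode


def term_vector_url(base_url: str, collection: str, query: str,
                    field: str, rows: int = 10) -> str:
    """Build a Term Vector Component request URL.

    Assumptions: `collection` and `field` are hypothetical names, and a
    /tvrh handler exists in solrconfig.xml (as in Solr's example configs).
    """
    params = {
        "q": query,
        "fl": "id",              # return only doc IDs; the full text lives elsewhere
        "rows": rows,
        "tv": "true",            # turn on the TermVectorComponent
        "tv.fl": field,          # which field's term vectors to return
        "tv.offsets": "true",    # include start/end character offsets
        "tv.positions": "true",  # include token positions
        "wt": "json",
    }
    # rstrip("/") so a trailing slash on base_url doesn't produce "//"
    return f"{base_url.rstrip('/')}/{collection}/tvrh?{urlencode(params)}"
```

For example, `term_vector_url("http://localhost:8983/solr", "docs", "body:highlighting", "body")` yields a `/solr/docs/tvrh?...` URL with `tv.offsets=true`. The offsets in the response would then be applied to the copy of the text held in the external store.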
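Erick's point about analysis, and John's idea of highlighting text kept in an external store, can be illustrated with a toy sketch. This is not Solr's analyzer or highlighter: the stopword list and the trailing-"s" "stemmer" are stand-ins chosen only for illustration. It shows that the indexed terms ("offset", "highlight") are not what you'd display, while the character offsets recorded at index time still point into the original text. Whatever renders the highlight therefore needs that original text, whether from a stored Solr field or from the external store, and it has to be exactly the text that was indexed or the offsets will drift.

```python
import re

# Illustrative stopword list; a real analysis chain would be configured in the schema.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in"}


def analyze(text):
    """Toy analysis chain: tokenize, lowercase, drop stopwords, strip a trailing 's'.

    Returns (term, start, end) tuples. start/end are character offsets into the
    ORIGINAL text, analogous to what term vectors with offsets record.
    """
    tokens = []
    for m in re.finditer(r"\w+", text):
        term = m.group().lower()
        if term in STOPWORDS:
            continue  # stopwords never reach the index
        if len(term) > 3 and term.endswith("s"):
            term = term[:-1]  # crude stand-in for a real stemmer
        tokens.append((term, m.start(), m.end()))
    return tokens


def highlight(original, tokens, query_terms, pre="<em>", post="</em>"):
    """Wrap spans of the original text whose analyzed term matches the query."""
    out, last = [], 0
    for term, start, end in tokens:  # tokens are in document order
        if term in query_terms:
            out.append(original[last:start])
            out.append(pre + original[start:end] + post)
            last = end
    out.append(original[last:])
    return "".join(out)
```

Here `analyze("The Offsets of Highlights")` keeps only `offset` and `highlight` as terms, yet highlighting a query for "highlights" against the original string gives `The Offsets of <em>Highlights</em>`. One caveat, as far as I know: Lucene offsets count Java chars (UTF-16 code units), so text containing characters outside the Basic Multilingual Plane would need converting before slicing it in a language that indexes by code point.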