A colleague came to me with a problem that intrigued me.  I can see
partly how to solve it with Solr, but I'm looking for insight into
solving the last step.

The problem:

1) Start from a set of text transcriptions of videos where there is a
timestamp associated with each word.

2) Index into Solr with analysis including stemming, so that a user
can search for videos based on keywords.

3) When the user clicks into a single video in the search result,
retrieve from the corresponding doc in Solr the timestamps of all
words matching the keyword(s) (including stemming).

So, obviously #1 and #2 are easy.  As part of #2, it seems one could
use the DelimitedPayloadTokenFilterFactory to index the timestamp as
a payload for each word.  I don't want the payload to influence the
score, and my understanding is that by default it will not.
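To make the indexing idea concrete, here is a minimal sketch of how the
field value might be prepared before sending it to Solr.  It assumes the
field uses a whitespace tokenizer ahead of the payload filter, the
default "|" delimiter, and a float encoder, with timestamps in seconds.
The function name and sample data are made up for illustration.

```python
def to_payload_text(words, delimiter="|"):
    """Join (word, seconds) pairs into 'word|seconds' tokens.

    Assumes the Solr field splits on whitespace and that the payload
    filter is configured with the same delimiter.
    """
    parts = []
    for word, seconds in words:
        # A word containing the delimiter or whitespace would be split
        # in the wrong place at analysis time, so reject it up front.
        if delimiter in word or any(ch.isspace() for ch in word):
            raise ValueError(f"cannot encode word: {word!r}")
        parts.append(f"{word}{delimiter}{seconds:.2f}")
    return " ".join(parts)


# Hypothetical transcript fragment: (word, start time in seconds).
transcript = [("solr", 0.0), ("indexes", 0.45), ("videos", 1.2)]
field_value = to_payload_text(transcript)
print(field_value)  # solr|0.00 indexes|0.45 videos|1.20
```

Stemming would still need to come after the payload filter in the
analysis chain, so that the stemmer sees only the word and not the
"|timestamp" suffix.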

Ok, so now for the harder part.  For #3 it would seem I need something
roughly like the highlighter - to return each matching word and the
payload which is the timestamp.

I'm not seeing any existing request handler or component that would do
this.  Is there an easy way to retrieve the indexed words (or analyzed
tokens) and their payload?
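To illustrate the output I'm after, here is a rough sketch in plain
Python of the matching logic, run over one document's tokens.  It does
not use any Solr API.  The toy_stem function is a hypothetical stand-in
for whatever stemmer the field uses, and the token list is invented
sample data.

```python
def toy_stem(token):
    """Hypothetical stand-in for the index-time stemmer (NOT Solr's)."""
    token = token.lower()
    for suffix in ("ing", "es", "s"):
        # Strip a suffix only if a stem of at least 3 letters remains.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def matching_timestamps(tokens, keywords):
    """Return (original word, timestamp) for tokens whose stem matches."""
    wanted = {toy_stem(k) for k in keywords}
    return [(word, ts) for word, ts in tokens if toy_stem(word) in wanted]


# Invented sample: (word, start time in seconds) for a single video.
tokens = [
    ("Indexing", 0.0),
    ("videos", 0.8),
    ("with", 1.1),
    ("Solr", 1.4),
    ("indexes", 2.0),
]
hits = matching_timestamps(tokens, ["index", "video"])
print(hits)  # [('Indexing', 0.0), ('videos', 0.8), ('indexes', 2.0)]
```

The question is how to get the equivalent result from Solr itself, with
the real analysis chain and the stored payloads, rather than
re-analyzing the transcript on the client.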

Thanks,

-Peter


--
Peter M. Wolanin, Ph.D.      : Momentum Specialist,  Acquia, Inc.
peter.wola...@acquia.com : 781-313-8322

"Get a free, hosted Drupal 7 site: http://www.drupalgardens.com"
