After some more research, it seems that I may be able to use payloads to 
store the time codes with the words, though this appears to require some 
custom Java code.  I found this post useful:

http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/

(thanks, Grant!)
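
For illustration, here is a minimal sketch of how the raw "start stop word" 
lines could be rewritten as delimited "word|start" tokens before upload.  It 
assumes the field would be analyzed with something like Solr's 
DelimitedPayloadTokenFilterFactory (encoder="integer", default "|" delimiter), 
so that the start time ends up as each term's payload.  The class and method 
names are hypothetical, only the start time is kept, and reading the payload 
back at query time would still need the custom Java code mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Solr or Lucene.
class TimecodePayloads {

    // Turns one raw line "startMs stopMs word" into "word|startMs".
    // Assumption: the start time alone is enough to position the player.
    static String toDelimitedToken(String rawLine) {
        String[] parts = rawLine.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "expected 'start stop word' but got: " + rawLine);
        }
        int startMs = Integer.parseInt(parts[0]); // validates the time code
        String word = parts[2];
        return word + "|" + startMs;
    }

    // Joins all tokens into one whitespace-separated field value,
    // skipping blank lines.
    static String toFieldValue(List<String> rawLines) {
        List<String> tokens = new ArrayList<String>();
        for (String line : rawLines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            tokens.add(toDelimitedToken(line));
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
            "6183 6288 in",
            "6288 6868 physics",
            "7186 7342 we",
            "7342 8013 explore");
        // prints: in|6183 physics|6288 we|7186 explore|7342
        System.out.println(toFieldValue(raw));
    }
}
```

The resulting string would be sent as the value of the transcript field; a 
whitespace tokenizer ahead of the payload filter keeps each "word|start" pair 
together until the filter splits it.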

On Jun 16, 2010, at 9:50 PM, Peter Wilkins pwilk...@mit.edu wrote:

> I have lecture transcripts with start and stop times for each word.  The time 
> codes allow us to search the transcripts, and show the part of the lecture 
> video that contains the search results.  I want to structure the index so that 
> I can search the transcripts for phrases, and have the search results contain 
> a portion of the transcript containing the query terms, as well as metadata 
> identifying the transcript, video, and time codes that will allow me to 
> position the video player at the correct point for playback.  
> 
> Here's what the raw input data looks like (time codes are in milliseconds):
> 
> ...
> 6183 6288 in
> 6288 6868 physics
> 7186 7342 we
> 7342 8013 explore
> 9091 9181 the
> 9181 9461 very
> 9461 9956 small
> 10741 10862 to
> 10862 10946 the
> 10946 11226 very
> 11226 11686 large
> ..
> 
> 
> Can someone offer some guidance as to how I can structure the upload data to 
> perform this magic?  I want to believe that someone with more Solr/Lucene 
> knowledge than I can see their way through this problem.
> 
> thank you,
> Peter
