After some more research, it seems that I might be able to use payloads to store the timecodes with the words, though this would appear to require some custom Java code. I found this post useful:
http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/ (thanks, Grant!)

On Jun 16, 2010, at 9:50 PM, Peter Wilkins pwilk...@mit.edu wrote:

> I have lecture transcripts with start and stop times for each word. The time
> codes allow us to search the transcripts, and show the part of the lecture
> video that contain the search results. I want to structure the index so that
> I can search the transcripts for phrases, and have the search results contain
> a portion of the transcript containing the query terms, as well as metadata
> identifying the transcript, video, and time codes that will allow me to
> position the video player at the correct point for playback.
>
> Here's what the raw input data looks like (time codes are in milliseconds):
>
> ...
> 6183 6288 in
> 6288 6868 physics
> 7186 7342 we
> 7342 8013 explore
> 9091 9181 the
> 9181 9461 very
> 9461 9956 small
> 10741 10862 to
> 10862 10946 the
> 10946 11226 very
> 11226 11686 large
> ..
>
>
> Can someone offer some guidance as to how I can structure the upload data to
> perform this magic? I want to believe that someone with more Solr/Lucene
> knowledge than I can see their way through this problem.
>
> thank you,
> Peter
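
To make the payload idea at the top of this message a bit more concrete, here is a minimal sketch of the preprocessing side only, in plain Java with no Lucene or Solr dependency. It parses the raw "start stop word" lines and emits each word as `word|start`, on the assumption that the transcript field is analyzed with a `|`-delimited payload filter along the lines of the one described in the linked post. The class and method names (`TimecodePayloads`, `TimedWord`, `parse`, `encode`, `decode`, `toDelimited`) are made up for illustration. The `encode`/`decode` pair shows one possible way to pack both start and stop times into a single 8-byte payload, which is the part that would likely need custom code on the indexing side.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene or Solr.
final class TimecodePayloads {

    /** One transcript word with its start/stop times in milliseconds. */
    static final class TimedWord {
        final int startMs;
        final int stopMs;
        final String word;

        TimedWord(int startMs, int stopMs, String word) {
            this.startMs = startMs;
            this.stopMs = stopMs;
            this.word = word;
        }
    }

    /** Parses lines of the form "start stop word"; anything else is skipped. */
    static List<TimedWord> parse(String raw) {
        List<TimedWord> words = new ArrayList<>();
        for (String line : raw.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 3) {
                continue; // blank lines and elisions such as "..."
            }
            try {
                words.add(new TimedWord(
                        Integer.parseInt(parts[0]),
                        Integer.parseInt(parts[1]),
                        parts[2]));
            } catch (NumberFormatException e) {
                // Not a timecode line; ignore it.
            }
        }
        return words;
    }

    /** Packs start and stop times into 8 bytes (two big-endian ints). */
    static byte[] encode(int startMs, int stopMs) {
        return ByteBuffer.allocate(8).putInt(startMs).putInt(stopMs).array();
    }

    /** Reverses encode(): returns {startMs, stopMs}. */
    static int[] decode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        return new int[] { buf.getInt(), buf.getInt() };
    }

    /** Builds "word|start word|start ..." for a delimited-payload field. */
    static String toDelimited(List<TimedWord> words) {
        StringBuilder sb = new StringBuilder();
        for (TimedWord w : words) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(w.word).append('|').append(w.startMs);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "6183 6288 in\n6288 6868 physics\n";
        System.out.println(toDelimited(parse(raw)));
    }
}
```

Run against the first two lines of the sample data, `main` prints `in|6183 physics|6288`. Only the start time is carried in this delimited form; keeping the stop time as well would mean either a second field or a custom payload encoding like the `encode` method above.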