I have lecture transcripts with start and stop times for each word. The time codes let us search the transcripts and show the part of the lecture video that contains the search results. I want to structure the index so that I can search the transcripts for phrases and get back results that contain a portion of the transcript around the query terms, plus metadata identifying the transcript, the video, and the time codes, so that I can position the video player at the correct point for playback.
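To make the desired behaviour concrete, here is a minimal in-memory sketch in Python. It is not Solr and not a proposed solution; it only illustrates the kind of result I am after. It assumes each word arrives as a whitespace-separated (start, stop, word) triple with times in milliseconds, and the names `TimedWord`, `parse_triples`, and `find_phrase` are made up for illustration.

```python
from typing import List, NamedTuple, Optional


class TimedWord(NamedTuple):
    start_ms: int
    stop_ms: int
    text: str


def parse_triples(raw: str) -> List[TimedWord]:
    """Parse whitespace-separated 'start stop word' triples (assumed layout)."""
    tokens = raw.split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected a sequence of 'start stop word' triples")
    return [
        TimedWord(int(tokens[i]), int(tokens[i + 1]), tokens[i + 2])
        for i in range(0, len(tokens), 3)
    ]


def find_phrase(words: List[TimedWord], phrase: str, context: int = 2) -> Optional[dict]:
    """Return a snippet and time codes for the first exact phrase match, or None."""
    terms = phrase.lower().split()
    if not terms:
        return None
    n = len(terms)
    for i in range(len(words) - n + 1):
        if [w.text.lower() for w in words[i:i + n]] == terms:
            lo = max(0, i - context)               # a few words of leading context
            hi = min(len(words), i + n + context)  # and of trailing context
            return {
                "snippet": " ".join(w.text for w in words[lo:hi]),
                "start_ms": words[i].start_ms,         # where playback should begin
                "stop_ms": words[i + n - 1].stop_ms,   # end of the matched phrase
            }
    return None


RAW = (
    "6183 6288 in 6288 6868 physics 7186 7342 we 7342 8013 explore "
    "9091 9181 the 9181 9461 very 9461 9956 small"
)
hit = find_phrase(parse_triples(RAW), "very small")
print(hit)
```

A real result would also need to carry the transcript and video identifiers, and the linear scan above would of course be replaced by the index lookup.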
Here's what the raw input data looks like. Each entry is a start time, a stop time, and a word; time codes are in milliseconds:

...
6183 6288 in
6288 6868 physics
7186 7342 we
7342 8013 explore
9091 9181 the
9181 9461 very
9461 9956 small
10741 10862 to
10862 10946 the
10946 11226 very
11226 11686 large
...

Can someone offer some guidance on how to structure the upload data to make this work? I'm hoping that someone with more Solr/Lucene knowledge than I have can see a way through this problem.

Thank you,
Peter