Chapter seems too broad and line seems too narrow -- have you thought about paragraph level? Something like:
docID, book fields (title, author, publisher, etc.), chapter fields (#, title, pages, etc.), section fields (title, #, etc.), sub-sectionN fields, paragraph text, lines

Seems like line #'s would only be useful for display, so just store the lines the paragraph covers.

On Tue, Apr 23, 2013 at 7:51 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> If you can represent your books in XML, then MarkLogic could do the job very
> cleanly. It isn't free, but it is very good.
>
> wunder
>
> On Apr 23, 2013, at 6:47 PM, Jason Funk wrote:
>
>> Is there a better tool than Solr to use for my situation?
>>
>>
>> On Apr 23, 2013, at 5:04 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>>
>>> There is no simple, obvious, and direct approach, right out of the box.
>>> Sure, you can highlight passages of raw text, right out of the box, but
>>> that won't give you chapters, pages, and line numbers. To do all of that,
>>> you would have to either:
>>>
>>> 1. Add chapter, page, and line number as part of the payload for each word.
>>> And add some custom document transformers to access the information.
>>> or
>>> 2. Index each line as a separate Solr document, with fields for book,
>>> chapter, page, and line number.
>>>
>>> -- Jack Krupansky
>>>
>>> -----Original Message----- From: Jason Funk
>>> Sent: Tuesday, April 23, 2013 5:02 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Book text with chapter line number
>>>
>>> Hello.
>>>
>>> I'm trying to figure out if Solr is going to work for a new project that I
>>> am wanting to build. At its heart it's a book text searching application.
>>> Each book is broken into chapters and each chapter is broken into lines. I
>>> want to be able to search these books and return relevant sections of the
>>> book and display the results with chapter and line number. I'm not sure how
>>> I would structure my data so that it's efficient and functional. I could
>>> simply treat each line of text as a document which would provide some of
>>> the functionality but what if the search query spanned two lines? Then it
>>> seems the passage the user was searching for wouldn't be returned. I could
>>> treat each book as a document and use highlighting to find the context but
>>> that seems to limit weighting/results for best matches as well as
>>> difficulty in finding chapter/line numbers. What is the best way to do
>>> this with Solr?
>>>
>>> Is there a better tool to use to solve my problem?
>>
>
> --
> Walter Underwood
> wun...@wunderwood.org
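
For what it's worth, the paragraph-level layout suggested at the top of this message could be sketched roughly as below. This is only a plain-Python illustration of building the documents before indexing, not Solr API code. The function name paragraphs_to_docs, the field names (id, book_title, chapter, text, line_start, line_end), and the rule that a blank line ends a paragraph are all assumptions made for the example.

```python
from typing import Dict, List


def paragraphs_to_docs(book_id: str, title: str, chapter: int,
                       lines: List[str]) -> List[Dict[str, object]]:
    """Group one chapter's lines into paragraph-level documents.

    Assumption: a blank line ends a paragraph. Line numbers are 1-based
    and count every line in the chapter, blank ones included.
    """
    docs: List[Dict[str, object]] = []
    current: List[str] = []   # lines of the paragraph being collected
    start = 0                 # line number where that paragraph began

    def flush(end: int) -> None:
        # Emit the collected paragraph (if any) as one document.
        if current:
            docs.append({
                "id": f"{book_id}-ch{chapter}-p{len(docs) + 1}",
                "book_title": title,
                "chapter": chapter,
                "text": " ".join(current),
                "line_start": start,
                "line_end": end,
            })
            current.clear()

    for number, line in enumerate(lines, start=1):
        if line.strip():
            if not current:
                start = number
            current.append(line.strip())
        else:
            flush(number - 1)
    flush(len(lines))
    return docs


if __name__ == "__main__":
    chapter_lines = [
        "The first paragraph starts here and",
        "continues on a second line.",
        "",
        "The second paragraph is short.",
    ]
    for doc in paragraphs_to_docs("book1", "Example Book", 1, chapter_lines):
        print(doc)
```

Because each paragraph's lines are joined into a single text field, a phrase that wraps across a line break still falls inside one document, while line_start and line_end are carried along purely for display.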