It seems that the normal use case is line=document, with an exception
for matches that span two lines.

The edge case could be solved either by indexing additional 'two-line'
documents with a lower boost, or by adding a 'context' field that holds
the line before/after where applicable (e.g. within the same paragraph).
There might also be some trick around using the highlighter to figure out
whether the match came from the 'line' field or from the 'context' field.
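
To make the 'context' variant concrete, here is a rough Python sketch of
how the per-line documents could be built before sending them to Solr.
The field names (id, book, chapter, line_no, line, context) and the
helper name build_line_docs are made up for illustration; they are not
an existing schema or API.

```python
from typing import Dict, List


def build_line_docs(book: str, chapter: int,
                    paragraphs: List[List[str]]) -> List[Dict]:
    """Turn paragraphs (each a list of lines) into one document per line.

    All field names here are hypothetical, chosen only for this sketch.
    """
    docs = []
    line_no = 0
    for para in paragraphs:
        for i, text in enumerate(para):
            line_no += 1
            # Neighbouring lines are taken only from the same paragraph.
            before = para[i - 1] if i > 0 else ""
            after = para[i + 1] if i + 1 < len(para) else ""
            context = " ".join(part for part in (before, text, after) if part)
            docs.append({
                "id": f"{book}-{chapter}-{line_no}",
                "book": book,
                "chapter": chapter,
                "line_no": line_no,
                "line": text,        # exact line, weighted higher at query time
                "context": context,  # line plus neighbours, weighted lower
            })
    return docs


# Example input: one two-line paragraph followed by a one-line paragraph.
docs = build_line_docs("moby", 1, [["Call me", "Ishmael."], ["Some years ago"]])
```

At query time, something like edismax with qf=line^2 context would
probably do for the weighting (the boost value is a guess and would
need tuning). A phrase that spans two lines then still matches through
'context', just with a lower score than a match in 'line'.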

I also like the payload idea, though there does not seem to be much
information around on using it.
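
For what it is worth, one way to get payloads in at index time is the
delimited form, where each token carries its payload after a delimiter
(word|payload). The sketch below only prepares such text in Python; it
assumes the field type is set up with Solr's delimited payload token
filter and an integer encoder, and the helper name to_payload_text is
hypothetical. Reading the payloads back at query time would still need
custom code.

```python
from typing import List


def to_payload_text(lines: List[str], delimiter: str = "|") -> str:
    """Attach the 1-based line number to every whitespace-separated word.

    Assumes the words themselves never contain the delimiter character.
    """
    tokens = []
    for line_no, text in enumerate(lines, start=1):
        for word in text.split():
            tokens.append(f"{word}{delimiter}{line_no}")
    return " ".join(tokens)


# Example: two lines of one paragraph become a single payload-tagged string.
payload_text = to_payload_text(["Call me", "Ishmael."])
```

With this, a paragraph can stay a single document while each word still
remembers which line it came from.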

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Apr 24, 2013 at 10:28 AM, Paul Libbrecht <p...@hoplahup.net> wrote:
> It's easy to then store a map of "term position" to line-number and 
> page-number along with each paragraph, or?
>
> Paul
>
>
> On 24 avr. 2013, at 16:24, Timothy Potter wrote:
>
>> Chapter seems too broad and line seems too narrow -- have you thought
>> about paragraph level? Something like:
>>
>> docID, book fields (title, author, publisher, etc), chapter fields (#,
>> title, pages, etc), section fields (title, #, etc), sub-sectionN
>> fields, paragraph text, lines
>>
>> Seems like line #'s would only be useful for display so just store the
>> lines the paragraph covers.
>>
>>
>>
>> On Tue, Apr 23, 2013 at 7:51 PM, Walter Underwood <wun...@wunderwood.org> 
>> wrote:
>>> If you can represent your books in XML, then MarkLogic could do the job 
>>> very cleanly. It isn't free, but it is very good.
>>>
>>> wunder
>>>
>>> On Apr 23, 2013, at 6:47 PM, Jason Funk wrote:
>>>
>>>> Is there a better tool than Solr to use for my situation?
>>>>
>>>>
>>>> On Apr 23, 2013, at 5:04 PM, Jack Krupansky <j...@basetechnology.com> 
>>>> wrote:
>>>>
>>>>> There is no simple, obvious, and direct approach, right out of the box. 
>>>>> Sure, you can highlight passages of raw text, right out of the box, but 
>>>>> that won't give you chapters, pages, and line numbers. To do all of that, 
>>>>> you would have to either:
>>>>>
>>>>> 1. Add chapter, page, and line number as part of the payload for each 
>>>>> word. And add some custom document transformers to access the information.
>>>>> or
>>>>> 2. Index each line as a separate Solr document, with fields for book, 
>>>>> chapter, page, and line number.
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> -----Original Message----- From: Jason Funk
>>>>> Sent: Tuesday, April 23, 2013 5:02 PM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Book text with chapter line number
>>>>>
>>>>> Hello.
>>>>>
>>>>> I'm trying to figure out if Solr is going to work for a new project that 
>>>>> I am wanting to build. At its heart it's a book text searching 
>>>>> application. Each book is broken into chapters and each chapter is broken 
>>>>> into lines. I want to be able to search these books and return relevant 
>>>>> sections of the book and display the results with chapter and line 
>>>>> number. I'm not sure how I would structure my data so that it's efficient 
>>>>> and functional. I could simply treat each line of text as a document 
>>>>> which would provide some of the functionality but what if the search 
>>>>> query spanned two lines? Then it seems the passage the user was searching 
>>>>> for wouldn't be returned. I could treat each book as a document and use 
>>>>> highlighting to find the context but that seems to limit 
>>>>> weighting/results for best matches as well as difficulty in finding 
>>>>> chapter/line numbers. What is the best way to do this with Solr?
>>>>>
>>>>> Is there a better tool to use to solve my problem?
>>>>
>>>
>>> --
>>> Walter Underwood
>>> wun...@wunderwood.org
>>>
>>>
>>>
>
