Thanks for the response. If I understand correctly, your suggestion is to
keep the existing indexing scheme, where every page of a document is a row
in the SOLR database, change the "content" field to store-only, and add
another index-only field (e.g. document_content) into which I put the whole
content of the document. This is a good idea, but I am also using the
Highlighter, and I think it won't work because it requires the highlighted
field to be stored=true. My problem would be solved if there were a way to
search in the index-only field, where the whole document is indexed, but
get the highlights/context of the match from the existing page field.

Originally my idea was to keep the data in the existing format (1 page = 1
record) and somehow search within results grouped by document, or within
some kind of union of a document's pages. Is this possible?
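
To make the two ideas concrete, here is a rough sketch of the kind of
request I have in mind. It is Python purely for illustration (my real client
is .NET). The host, the field names document_content, document_id and
page_number, and the helper function names are all assumptions, not my
actual schema. q, fl, hl, hl.fl, hl.requireFieldMatch, group and group.field
are standard Solr request parameters, but I have not verified that this
combination returns page-level highlights the way I want.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port and core to the real setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_params(user_query, rows=10):
    """Match on the whole-document field, highlight from the stored page field."""
    return {
        # Search the (assumed) index-only field that holds the full document text.
        "q": f"document_content:({user_query})",
        # Return only identifying fields; document_id and page_number are assumed names.
        "fl": "id,document_id,page_number",
        # Ask for highlights from the stored per-page field instead of the queried one.
        "hl": "true",
        "hl.fl": "content",
        "hl.requireFieldMatch": "false",
        # Collapse the matching pages into one group per document.
        "group": "true",
        "group.field": "document_id",
        "rows": rows,
        "wt": "json",
    }


def build_url(params):
    """Encode the parameters into a select URL."""
    return SOLR_SELECT_URL + "?" + urlencode(params)


params = build_params("lucene AND apache")
print(build_url(params))
```

If this behaved as hoped, each group would be one document, and the
highlighting section would contain snippets only for the pages whose stored
content field actually holds the search terms.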


On Thu, Aug 29, 2013 at 4:45 PM, Alexandre Rafalovitch
<arafa...@gmail.com> wrote:

> Assuming you want both pages to match you need the text to be present on
> both pages. Do you actually return/store text of the page in Solr? If so,
> you can have that 'page' field store-only and have another field which is
> index-only and into which you put all your matching logic. So, that
> index-only field can contain the page plus another line/paragraph/page on
> each side.
>
> Regards,
>    Alex.
>
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>
>
> On Thu, Aug 29, 2013 at 2:49 PM, Alexandre Rafalovitch
> <arafa...@gmail.com> wrote:
>
> > So, if the match spans pages 4 and 5, what do you want returned? Page 4,
> > page 5, or both?
> >
> > Regards,
> >      Alex
> > On 28 Aug 2013 06:55, "Атанас Атанасов" <atanaso...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> My name is Atanas Atanasov. I have been using SOLR 1.4/3.5/4.3 for a
> >> year and a half and I'm really satisfied with what it provides.
> >> Searching and indexing are extremely fast, and it is easy to work with.
> >> However I ran into a small problem and I can't figure it out.
> >> I'm using SOLR to store the content/text of different types of
> >> documents (.pdf, .txt, .doc, etc.).
> >> The whole document content represents a SOLR record (all the text
> >> from all pages of the document).
> >> schema.xml is in SOLR_Document_Level folder of attached .zip file.
> >> This worked absolutely fine, but I wanted to see the exact page or
> >> pages of a document where the search matches occur.
> >>
> >> I redesigned it so that every page of a document is a row in the SOLR
> >> database (schema.xml is in SOLR_Page_Level folder of attached .zip
> >> file). This works well, but it resulted in the following problem.
> >> Example: I search for (lucene AND apache). If both words are on the
> >> same page, I get a hit and a result is returned. However, if the words
> >> are on different pages of a document, no results are found.
> >> My goal is to find out the exact page of a document where the match is.
> >> Dynamic fields would solve this problem but there are very big documents
> >> with many pages so I don't think this is a solution.
> >> Can you help me with some ideas on how to make it work?
> >>
> >> Just for information. I am using SOLR as a REST service hosted in Apache
> >> and a .NET application to work with it.
> >> If you have questions please feel free to ask.
> >>
> >> Thanks in advance and Best Regards,
> >> Atanas Atanasov
> >>
> >>
>
