Nested documents and block join MAY work, but I'm not so sure that Nutch would be able to send the data in the structure that Solr and Lucene expect. You may have to write some sort of custom connector between Nutch and Solr to do that. Normally, the output of Nutch is simply a stream of flat documents.
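Such a connector would mainly need to reshape each flat record into a parent document with nested children. Below is a minimal Python sketch, assuming (hypothetically) that the parser emits each page's comments as parallel multi-valued fields `comment_text`, `comment_author` and `comment_date`. The output uses the `_childDocuments_` key that Solr's JSON update format uses for nested documents. The `_s`/`_t`/`_dt` field suffixes assume the dynamic fields of the Solr 4.x example schema, and block join also relies on the `_root_` field that schema defines. Treat all field names here as illustrative, not as what Nutch actually produces.

```python
import json
from typing import Any


def to_nested_solr_doc(flat: dict[str, Any]) -> dict[str, Any]:
    """Turn one flat crawler record into a Solr parent doc with comment children.

    Hypothetical input: comments arrive as three parallel lists,
    comment_text, comment_author and comment_date (ISO-8601 UTC strings).
    """
    texts = flat.get("comment_text", [])
    authors = flat.get("comment_author", [])
    dates = flat.get("comment_date", [])
    if not (len(texts) == len(authors) == len(dates)):
        raise ValueError("comment fields must be parallel lists of equal length")

    parent_id = flat["url"]
    children = [
        {
            # Child ids must be unique across the whole index, so derive them
            # from the parent id plus the comment's position on the page.
            "id": f"{parent_id}#comment-{i}",
            "type_s": "comment",
            "text_t": text,
            "author_s": author,
            "date_dt": date,
        }
        for i, (text, author, date) in enumerate(zip(texts, authors, dates))
    ]
    return {
        "id": parent_id,
        "type_s": "page",
        "title_t": flat.get("title", ""),
        "content_t": flat.get("content", ""),
        # Nested documents in Solr's JSON update format go under this key;
        # the parent and its children are then indexed together as one block.
        "_childDocuments_": children,
    }


# Usage with a made-up page record:
page = {
    "url": "http://example.com/news/1",
    "title": "Some news",
    "content": "Article body",
    "comment_text": ["Great article", "I disagree"],
    "comment_author": ["alice", "bob"],
    "comment_date": ["2014-06-30T10:00:00Z", "2014-07-02T08:15:00Z"],
}
print(json.dumps([to_nested_solr_doc(page)], indent=2))
```

The design choice is to keep the reshaping outside Solr: the crawler stays flat, and only this small step knows how parents and children relate.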

-- Jack Krupansky

-----Original Message-----
From: Ali Nazemian
Sent: Wednesday, August 6, 2014 9:35 AM
To: solr-user@lucene.apache.org
Subject: Re: indexing comments with Apache Solr

Dear Alexandre,
Hi,
Thank you very much. I think nested documents are what I need. Do you have
more information about how I can define such a thing in the Solr schema? The
blog post you mentioned was all about retrieving nested docs.
Best regards.


On Wed, Aug 6, 2014 at 5:16 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

You can index comments as child records. The structure of the Solr
document should be able to incorporate both parent and child fields,
and you need to index them all together. Then just search for the
JOIN syntax for nested documents (block join). Also, the latest Solr (4.9)
has some extra functionality that allows you to find all parent pages
and then expand the matching child records.

E.g.: http://heliosearch.org/expand-block-join/ seems relevant
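To make the block join part concrete, here is a minimal Python sketch of the query strings involved. It assumes the comments were indexed as child documents with a `type_s` field that is `page` on parents and `comment` on children, plus hypothetical `author_s` and `date_dt` fields. The `{!parent which=...}` and `{!child of=...}` block join query parsers exist in Solr 4.5 and later; the field names, core name and URL are illustrative assumptions only. Because child documents are ordinary documents in the index, comment-only queries (such as the date range from the original question) can also hit them directly.

```python
from urllib.parse import urlencode

# Hypothetical filter matching every parent (page) document. Block join
# requires it to match all parents and none of the children.
PARENT_FILTER = "type_s:page"
# Hypothetical endpoint; "collection1" is the Solr 4.x example core name.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def _quote(value: str) -> str:
    """Wrap a value as a Lucene phrase, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def comments_between(start: str, end: str) -> str:
    """Comments posted in [start, end]; dates in Solr's ISO-8601 UTC form."""
    return f"type_s:comment AND date_dt:[{start} TO {end}]"


def comments_by(author: str) -> str:
    """All comments posted by one (exactly matching) author."""
    return f"type_s:comment AND author_s:{_quote(author)}"


def pages_with_comments_matching(child_query: str) -> str:
    """Block join parent query: parents that have a child matching child_query."""
    return f'{{!parent which="{PARENT_FILTER}"}}{child_query}'


def comments_of_pages_matching(parent_query: str) -> str:
    """Block join child query: children of parents matching parent_query."""
    return f'{{!child of="{PARENT_FILTER}"}}{parent_query}'


def select_url(q: str) -> str:
    """Build a select URL with the query properly URL-encoded."""
    return SOLR_SELECT + "?" + urlencode({"q": q, "wt": "json"})


# Usage: pages that received a comment between 24 June and 24 July 2014.
june_july = comments_between("2014-06-24T00:00:00Z", "2014-07-24T23:59:59Z")
print(select_url(pages_with_comments_matching(june_july)))
```

Keeping the parent filter in one constant matters: if it ever matched a child document, the block join results would be wrong.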

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Wed, Aug 6, 2014 at 11:18 AM, Ali Nazemian <alinazem...@gmail.com>
wrote:
> Dear Gora,
> I think you misunderstood my problem. Actually, I used Nutch for crawling
> websites, and my problem is on the indexing side, not the crawling side.
> Suppose a page is fetched and parsed by Nutch, and all comments, along with
> their dates and sources, are identified during parsing. Now what can I do
> to index these comments? What should the document granularity be?
> Best regards.
>
>
> On Wed, Aug 6, 2014 at 1:29 PM, Gora Mohanty <g...@mimirtech.com> wrote:
>
>> On 6 August 2014 14:13, Ali Nazemian <alinazem...@gmail.com> wrote:
>> >
>> > Dear all,
>> > Hi,
>> > I was wondering how I can manage to index comments in Solr. Suppose I am
>> > going to index a web page that has news content and some comments
>> > posted by people at the end of the page. How can I index these
>> > comments in Solr? Consider the fact that I am going to do some analysis
>> > on these comments. For example, I want enough query flexibility to
>> > retrieve all comments posted between 24 June 2014 and 24 July 2014,
>> > or all the comments posted by a specific person. Therefore, defining
>> > these comments as a multi-valued field would not be the solution,
>> > since in that case such query flexibility is not feasible. So what is
>> > your suggestion about document granularity in this case? Can I treat
>> > each of these comments as a new document inside the main document
>> > (a tree-based structure)? I think this is a common case when indexing
>> > web pages these days, so probably I am not the only one thinking about
>> > this situation. Please share your thoughts, and perhaps your
>> > experiences, with me. Thank you very much.
>>
>> Parsing a web page and breaking it up into parts for indexing into
>> different fields is out of the scope of Solr. You might want to look at
>> Apache Nutch, which can index into Solr, and/or other web crawlers/scrapers.
>>
>> Regards,
>> Gora
>>
>
>
>
> --
> A.Nazemian




--
A.Nazemian
