Dear Gora,

I think you misunderstood my problem. I already use Nutch to crawl the websites; my problem is on the indexing side, not the crawling side. Suppose a page has been fetched and parsed by Nutch, and the parsing has identified every comment along with its date and source. How should I index these comments? What should the document granularity be?

Best regards.

On Wed, Aug 6, 2014 at 1:29 PM, Gora Mohanty <g...@mimirtech.com> wrote:

> On 6 August 2014 14:13, Ali Nazemian <alinazem...@gmail.com> wrote:
> >
> > Dear all,
> > Hi,
> > I was wondering how I can manage to index comments in Solr. Suppose I am
> > going to index a web page that contains a news article and some comments
> > posted by people at the end of the page. How can I index these comments
> > in Solr? Consider that I am going to do some analysis on these comments.
> > For example, I want enough query flexibility to retrieve all comments
> > posted between 24 June 2014 and 24 July 2014, or all comments posted by
> > a specific person. Defining these comments as a multi-valued field would
> > therefore not be a solution, since such queries are not feasible in that
> > case. So what is your suggestion about document granularity here? Can I
> > treat these comments as new documents inside the main document (a
> > tree-based structure)? What is your suggestion for this case? I think it
> > is a common case when indexing web pages these days, so I am probably
> > not the only one thinking about this situation. Please share your
> > thoughts, and perhaps your experiences with this situation, with me.
> > Thank you very much.
>
> Parsing a web page, and breaking parts up for indexing into different
> fields, is out of the scope of Solr. You might want to look at Apache
> Nutch, which can index into Solr, and/or other web crawlers/scrapers.
>
> Regards,
> Gora

--
A.Nazemian