Hi Markus,

I am sorry for not being clear. What I meant to say is this:
Suppose a URL, say www.somehost.com/gifts/greetingcard.html (which in turn contains links to a.html, b.html, c.html and d.html), is injected via seed.txt. After the whole process I was expecting a bunch of other pages crawled from this seed URL. However, at the end all I see is the content of this one page, www.somehost.com/gifts/greetingcard.html, and none of the pages linked from it (a.html, b.html, c.html, d.html). The crawl covers only the URLs listed in seed.txt and does not proceed any further from there.

So I am a bit confused: why is it not crawling the linked pages (a.html, b.html, c.html and d.html)? I get the feeling that I am missing something that the author of the blog post (http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/) assumed everyone would know.

Thanks,
Abi

On Wed, Feb 9, 2011 at 7:09 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:

> The parsed data is only sent to the Solr index if you tell a segment to be
> indexed: solrindex <crawldb> <linkdb> <segment>
>
> If you did this only once, after injecting and the subsequent
> fetch, parse, update, index sequence, then you will, of course, only see
> those URLs. If you don't index a segment after it has been parsed, you need
> to do it later on.
>
> On Wednesday 09 February 2011 04:29:44 .: Abhishek :. wrote:
> > Hi all,
> >
> > I am a newbie to Nutch and Solr. Well, relatively much newer to Solr than
> > Nutch :)
> >
> > I have been using Nutch for the past two weeks, and I wanted to know if I
> > can query or search my Nutch crawls on the fly (before the crawl
> > completes). I am asking because the websites I am crawling are really huge
> > and it takes around 3-4 days for a crawl to complete. I want to analyze
> > some quick results while the Nutch crawler is still crawling the URLs.
> > Someone suggested to me that Solr would make this possible.
> >
> > I followed the steps in
> > http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ for this. With
> > this process, only the injected URLs show up in the Solr search. I know I
> > did something really foolish and the crawl never happened; I feel I am
> > missing some information here. I think somewhere in the process there
> > should be a crawl happening and I missed it.
> >
> > I just wanted to see if someone could help by pointing out where I went
> > wrong in the process. Forgive my foolishness and thanks for your
> > patience.
> >
> > Cheers,
> > Abi
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
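
The symptom described in this thread (only the seed URLs showing up in Solr) is what one would expect if the generate/fetch/parse/updatedb sequence was run a single time: the links found on the seed page are added to the crawldb by updatedb, but they are only fetched in a later round. A minimal Python sketch of the command sequence for several rounds follows. It only builds the command strings and does not run them. It assumes Nutch 1.x command names, a solrindex that takes the Solr URL as its first argument, and a separate parse step (skip it if your fetcher is configured to parse while fetching). The paths `crawl`, `urls`, the Solr URL and the helper name `crawl_commands` are hypothetical.

```python
from typing import List

NUTCH = "bin/nutch"  # assumed location of the Nutch 1.x launcher script


def crawl_commands(rounds: int,
                   crawl_dir: str = "crawl",
                   seed_dir: str = "urls",
                   solr_url: str = "http://localhost:8983/solr/") -> List[str]:
    """Build the shell commands for `rounds` generate/fetch/parse/updatedb cycles.

    Round 1 fetches the injected seeds; each later round fetches the links
    that the previous round's updatedb added to the crawldb.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")

    crawldb = f"{crawl_dir}/crawldb"
    linkdb = f"{crawl_dir}/linkdb"
    segments = f"{crawl_dir}/segments"
    # Assumption: segment directories are timestamp-named, so the newest one
    # sorts last. The shell resolves this substitution when the command runs.
    newest = f"$(ls -d {segments}/2* | tail -1)"

    cmds = [f"{NUTCH} inject {crawldb} {seed_dir}"]
    for _ in range(rounds):
        cmds += [
            f"{NUTCH} generate {crawldb} {segments}",
            f"{NUTCH} fetch {newest}",
            f"{NUTCH} parse {newest}",  # omit if the fetcher already parses
            f"{NUTCH} updatedb {crawldb} {newest}",
        ]
    # Index every segment once all rounds are done.
    cmds.append(f"{NUTCH} invertlinks {linkdb} -dir {segments}")
    cmds.append(f"{NUTCH} solrindex {solr_url} {crawldb} {linkdb} {segments}/*")
    return cmds


if __name__ == "__main__":
    for command in crawl_commands(rounds=2):
        print(command)
```

With `rounds=2`, the second cycle is the one that would pick up a.html, b.html, c.html and d.html in Abi's example. To look at partial results during a long crawl, the solrindex command could instead be issued after each round for the segment that was just parsed, which matches Markus's point that a segment only reaches Solr when it is explicitly indexed.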