Hi Markus,

 Sorry for not being clear. What I meant to say was this:

 Suppose a URL, say www.somehost.com/gifts/greetingcard.html (which in
turn contains links to a.html, b.html, c.html and d.html), is listed in
seed.txt and injected. After the whole process I was expecting a bunch of
other pages crawled from this seed URL. However, at the end of it all I see
only the contents of that one page, www.somehost.com/gifts/greetingcard.html,
and none of the pages linked from it (here a.html, b.html, c.html and
d.html).

 The crawling happens only for the URLs listed in seed.txt and does not
proceed further from there, so I am a bit confused. Why is it not crawling
the linked pages (a.html, b.html, c.html and d.html)? I get the feeling that
I am missing something that the author of the blog post
(http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/) assumed
everyone would know.

Thanks,
Abi


On Wed, Feb 9, 2011 at 7:09 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:

> The parsed data is only sent to the Solr index if you tell a segment to be
> indexed: solrindex <crawldb> <linkdb> <segment>
>
> If you did this only once, after injecting and then the subsequent
> fetch, parse, update, index sequence, then you, of course, only see those
> URLs.
> If you don't index a segment after it has been parsed, you need to do it
> later on.
>
> On Wednesday 09 February 2011 04:29:44 .: Abhishek :. wrote:
> > Hi all,
> >
> >  I am a newbie to Nutch and Solr. Well, relatively much newer to Solr than
> > Nutch :)
> >
> >  I have been using Nutch for the past two weeks, and I wanted to know if
> > I can query or search my Nutch crawls on the fly (before they complete). I
> > am asking this because the websites I am crawling are really huge and it
> > takes around 3-4 days for a crawl to complete. I want to analyze some
> > quick results while the Nutch crawler is still crawling the URLs. Someone
> > suggested to me that Solr would make this possible.
> >
> >  I followed the steps in
> > http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ for this.
> > With this process, I see that only the injected URLs show up in the Solr
> > search. I know I did something really foolish and the crawl never
> > happened; I feel I am missing some information here. I think somewhere in
> > the process there should be a crawl happening and I missed it.
> >
> >  I just wanted to see if someone could help me by pointing this out and
> > showing where I went wrong in the process. Forgive my foolishness and
> > thanks for your patience.
> >
> > Cheers,
> > Abi
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>
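
The sequence described in the quoted reply (inject once, then repeat generate, fetch, parse, update and index) can be pictured with a small toy model. This is only a sketch under stated assumptions: the link graph, the `crawl` function and the "fetched"/"unfetched" labels are made up for illustration and are not Nutch APIs or commands. It tries to show why a single round would index only the seed page, while linked pages such as a.html only appear after a further round.

```python
# Toy model of a round-based crawl. LINKS and crawl() are hypothetical
# names used for illustration; they are not part of Nutch.
LINKS = {
    "greetingcard.html": ["a.html", "b.html", "c.html", "d.html"],
    "a.html": [],
    "b.html": [],
    "c.html": [],
    "d.html": [],
}


def crawl(seeds, rounds):
    """Return the URLs that end up indexed after the given number of rounds."""
    # "inject": the crawl database starts with only the seed URLs.
    crawldb = {url: "unfetched" for url in seeds}
    indexed = []
    for _ in range(rounds):
        # "generate": pick every URL that has not been fetched yet.
        segment = [url for url, state in crawldb.items() if state == "unfetched"]
        if not segment:
            break  # nothing left to fetch
        for url in segment:
            # "fetch" and "parse": mark the page done and read its outlinks.
            crawldb[url] = "fetched"
            for outlink in LINKS.get(url, []):
                # "update": newly discovered links enter the database as
                # unfetched, so only a later round can fetch them.
                crawldb.setdefault(outlink, "unfetched")
        # "index": only the pages in this round's segment are indexed.
        indexed.extend(segment)
    return indexed


if __name__ == "__main__":
    print(crawl(["greetingcard.html"], rounds=1))
    print(crawl(["greetingcard.html"], rounds=2))
```

In this model, one round yields only the seed page, and the linked pages are picked up in the second round, which matches the behaviour described above if the fetch, parse, update, index sequence was run just once.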
