Pardon me if I am taking too much of your time. Could you please highlight a few advantages of Solr's caching and maintenance over Nutch's?
Some musing: I have used Nutch before, and one thing I observed was that if I delete the crawl folder while Nutch is running, users can still search and get proper results. It seems Nutch caches all the indexes in memory when it starts. I don't understand how that is feasible when the crawl is on the order of 10 GB, whereas RAM plus swap is only a few GB. How is Solr's caching better than this?

On 6/7/07, Ian Holsman <[EMAIL PROTECTED]> wrote:
Manoharam Reddy wrote:
> Thanks for your quick response.
>
> This brings me to another question. As far as I know Nutch can take
> care of crawling as well as indexing. Then why go through the hassle
> of crawling through Nutch and integrating it into Solr?

I found Solr's caching and maintenance easier to use than Nutch's. But that's just me.

>
> Another question I have, Solr provides the search results in XML
> format, any ready made tools to convert them directly to web pages for
> visitors to see?

yep.. it's called XSLT. Most modern browsers can do the transform on the client side. Otherwise there are some server-side tools (Cocoon, I think, does this) to do the transform on the server before sending it out.

--Ian

>
> On 6/7/07, Ian Holsman <[EMAIL PROTECTED]> wrote:
>> Hi Manoharam.
>>
>> we use nutch to do the crawl, and have used sami's patch of nutch
>> (http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html)
>> to have it integrate with Solr. It works quite well for our needs.
>>
>> If you are concerned with the speed, Solr also has a CSV upload
>> facility, which you might be able to use to upload the data into solr
>> that way, but we haven't found the HTTP Post speed to be an issue for
>> us.
>>
>> Regards
>> Ian
>>
>>
>> Manoharam Reddy wrote:
>> > I have just begun using Solr. I see that we have to insert documents
>> > by posting XMLs to solr/update
>> >
>> > I would like to know how Solr is used as a search engine in
>> > enterprises. How do you do the crawling of your intranet and passing
>> > the information as XML to solr/update. Isn't this going to be slow? To
>> > put all content in the index via a HTTP POST request requiring network
>> > sockets to be opened?
>> >
>> > Isn't there any direct way to do the same thing without resorting
>> > to HTTP?
>> >
>>
>
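
For reference, the "posting XMLs to solr/update" step discussed in the thread can be sketched roughly as below. This is only an illustration, not how anyone in the thread actually does it: the function names `build_add_xml` and `post_update`, the field names, and the URL `http://localhost:8983/solr/update` are assumptions made for the example. It puts several documents into one `<add>` message, so a single HTTP POST carries a whole batch rather than one document per request.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build a Solr <add> update message (as bytes) from a list of field dicts.

    Each dict becomes one <doc>; each key/value becomes <field name="key">value</field>.
    """
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, <, > on output
    return ET.tostring(add, encoding="utf-8")


def post_update(body, url="http://localhost:8983/solr/update"):
    """POST an update message to Solr. The URL is an assumed example value."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical documents; the field names must match your own schema.
    batch = [
        {"id": "page-1", "title": "Intranet home"},
        {"id": "page-2", "title": "HR & payroll"},
    ]
    print(build_add_xml(batch).decode("utf-8"))
    # With a Solr instance running, the batch would then be sent and committed:
    # post_update(build_add_xml(batch))
    # post_update(b"<commit/>")
```

Running the script only prints the XML; the two `post_update` calls are left commented out because they need a live Solr instance. Whether batching like this is fast enough for a given crawl size is something to measure rather than assume.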