Re: how to crawl when Solr is search engine?

Manoharam Reddy Thu, 07 Jun 2007 01:05:14 -0700

Pardon me if I am taking too much of your time.

It would be really great if you could highlight a few advantages of
Solr's caching and maintenance over Nutch's.


Some musings:
(I have used Nutch before, and one thing I observed was that if I
delete the crawl folder while Nutch is running, users can still search
and get proper results. It seems Nutch caches all the indexes in
memory when it starts. I don't understand how that is feasible
when the crawl is on the order of 10 GB, whereas RAM + swap is
only a few GB.)
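One plausible explanation for this behaviour, offered as an assumption rather than anything confirmed in this thread: the searcher keeps its index files open, and on Unix-like systems deleting a file only removes its directory entry, so handles that are already open can still read the data. The index does not have to fit in RAM; the OS reads the parts it needs from disk. A minimal Python sketch of the deletion effect, assuming a POSIX system (on Windows, `os.remove` on an open file fails):

```python
import os
import tempfile
from typing import Tuple


def read_after_delete(payload: bytes) -> Tuple[bytes, bool]:
    """Write payload to a temp file, open it, delete it, then read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)          # stands in for writing an index file
    finally:
        os.close(fd)
    with open(path, "rb") as handle:   # stands in for the searcher's open file
        os.remove(path)                # stands in for deleting the crawl folder
        gone = not os.path.exists(path)
        data = handle.read()           # still readable through the open handle
    return data, gone


if __name__ == "__main__":
    print(read_after_delete(b"index segment"))  # on Linux/macOS: (b'index segment', True)
```

Once the last open handle is closed, the space is released and the data is gone for good.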

How is Solr caching better than this?

On 6/7/07, Ian Holsman <[EMAIL PROTECTED]> wrote:

Manoharam Reddy wrote:
> Thanks for your quick response.
>
> This brings me to another question. As far as I know Nutch can take
> care of crawling as well as indexing. Then why go through the hassle
> of crawling through Nutch and integrating it into Solr?

I found Solr's caching and maintenance easier to use than Nutch's. But
that's just me.
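For comparison, Solr's caches (filterCache, queryResultCache and documentCache, declared in solrconfig.xml) use an LRU class (solr.LRUCache) in the example configuration, and can be autowarmed when a new searcher opens after a commit. The sketch below only illustrates LRU eviction in Python; it is not Solr's code, and the `compute` callback stands in for a hypothetical expensive search.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: Hashable, compute: Callable[[Hashable], Any]) -> Any:
        if key in self._entries:
            self._entries.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = compute(key)                   # e.g. run the (hypothetical) search
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the least recently used entry
        return value


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    for q in ["solr", "nutch", "solr", "lucene", "nutch"]:
        cache.get_or_compute(q, lambda query: [f"doc for {query}"])
    print(cache.hits, cache.misses)  # prints: 1 4
```

Repeated queries are answered from memory, while the cache size stays bounded no matter how large the index is.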

>
> Another question: Solr returns search results in XML format. Are there
> any ready-made tools to convert them directly into web pages for
> visitors to see?

Yep, it's called XSLT. Most modern browsers can do the transform on the
client side.
Otherwise there are some server-side tools (Cocoon, I think, does this) to
do the transform on the server before sending the page out.
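Python's standard library has no XSLT processor, so as a stand-in for the stylesheet approach described above, here is a minimal ElementTree sketch that turns a Solr XML response into an HTML fragment on the server. It assumes the standard `<response><result><doc>` layout and a stored field named `title`, which is a hypothetical field name; in a real deployment an XSLT stylesheet would normally do this job.

```python
import xml.etree.ElementTree as ET
from html import escape


def solr_xml_to_html(xml_text: str, field: str = "title") -> str:
    """Render one stored field of each result document as an HTML list item."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    if result is None:
        return "<p>No results.</p>"
    items = []
    for doc in result.findall("doc"):
        # Stored fields are typed children, e.g. <str name="title">...</str>
        value = next((el.text or "" for el in doc if el.get("name") == field), "")
        items.append(f"<li>{escape(value)}</li>")
    found = result.get("numFound", "0")
    return f"<p>{escape(found)} result(s)</p>\n<ul>\n" + "\n".join(items) + "\n</ul>"


SAMPLE = """<response>
<lst name="responseHeader"><int name="status">0</int></lst>
<result name="response" numFound="2" start="0">
<doc><str name="id">1</str><str name="title">Intranet home</str></doc>
<doc><str name="id">2</str><str name="title">HR &amp; payroll</str></doc>
</result>
</response>"""

if __name__ == "__main__":
    print(solr_xml_to_html(SAMPLE))
```

Escaping the field text matters: the `&amp;` in the sample is decoded by the XML parser and must be re-escaped before it goes into HTML.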

--Ian
>
> On 6/7/07, Ian Holsman <[EMAIL PROTECTED]> wrote:
>> Hi Manoharam.
>>
>> We use Nutch to do the crawl, and have used Sami's patch of Nutch
>> (http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html)
>> to have it integrate with Solr. It works quite well for our needs.
>>
>> If you are concerned about speed, Solr also has a CSV upload
>> facility, which you might be able to use to load the data into Solr
>> instead, but we haven't found HTTP POST speed to be an issue for
>> us.
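A rough sketch of the CSV route in Python. The URL `http://localhost:8983/solr/update/csv` assumes the example Solr install with the CSV update handler enabled, and the field names are hypothetical; check your solrconfig.xml and schema before relying on either.

```python
import csv
import io
import urllib.request
from typing import Dict, Iterable, List

# Assumption: example Solr install with the CSV update handler mapped here.
SOLR_CSV_URL = "http://localhost:8983/solr/update/csv"


def build_csv(rows: Iterable[Dict[str, str]], fields: List[str]) -> str:
    """Serialize rows as CSV; the header line names the Solr fields."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields)
    for row in rows:
        writer.writerow([row.get(f, "") for f in fields])
    return buf.getvalue()


def post_csv(body: str, url: str = SOLR_CSV_URL) -> str:
    """POST the whole CSV in one request and return Solr's raw response."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Usage against a running server would be `post_csv(build_csv(rows, ["id", "title"]))`; the documents still need a commit (for example posting `<commit/>` to `/solr/update`) before they become searchable.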
>>
>> Regards
>> Ian
>>
>>
>> Manoharam Reddy wrote:
>> > I have just begun using Solr. I see that we have to insert documents
>> > by posting XML to solr/update.
>> >
>> > I would like to know how Solr is used as a search engine in
>> > enterprises. How do you crawl your intranet and pass the
>> > information as XML to solr/update? Isn't this going to be slow,
>> > putting all content into the index via HTTP POST requests that
>> > require network sockets to be opened?
>> >
>> > Isn't there any direct way to do the same thing without resorting
>> > to HTTP?
>> >
>>
>>
>
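On the question of speed: one common way to cut per-request HTTP overhead is to batch many documents into a single `<add>` message and commit once at the end, rather than sending one request per document. A minimal Python sketch, assuming the example endpoint `http://localhost:8983/solr/update` and hypothetical field names `id` and `title`:

```python
import urllib.request
from typing import Dict, Iterable
from xml.sax.saxutils import escape, quoteattr

# Assumption: example Solr install listening on the default port.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs: Iterable[Dict[str, str]]) -> str:
    """Batch many documents into one <add> message, escaping names and values."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append(f"<field name={quoteattr(name)}>{escape(str(value))}</field>")
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def post_xml(body: str, url: str = SOLR_UPDATE_URL) -> str:
    """POST an update message and return Solr's raw response."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Against a running server, `post_xml(build_add_xml(docs))` sends the whole batch in one request, and a final `post_xml("<commit/>")` makes the documents searchable.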
