Have you looked at Apache Nutch [1]? It is a distributed web crawl and
search system, based on Lucene/Solr and Hadoop.
[1] http://nutch.apache.org/
--
Renaud Delbru
On 19/11/10 16:52, Bing Li wrote:
Hi, all,
I am working on a distributed search system. Right now I have only one server.
It has t
Make sure you are not going to "reinvent the wheel" here ;). A lot has
been done around the problem of distributed search engines.
This thread might be useful for you: http://search-hadoop.com/m/ARlbS1MiTNY
Alex Baranau
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - H
On Sat, Nov 20, 2010 at 12:39 AM, Bing Li wrote:
> Hi, Gora,
>
> No, I really wonder if Solr is based on Hadoop?
As far as I know, no, it isn't.
> Hadoop is efficient when used for search engines since it suits the
> write-once-read-many model. After reading your emails, it looks like
Hi, Gora,
No, I really wonder if Solr is based on Hadoop?
Hadoop is efficient when used for search engines since it suits the
write-once-read-many model. After reading your emails, it looks like Solr's
distributed file system does the same thing. Both of them are good for
searching large
On Sat, Nov 20, 2010 at 12:05 AM, Bing Li wrote:
> Dear Erick,
>
> Thanks so much for your help! I am new to Solr, so I have no idea about the
> version.
The solr/admin/registry.jsp URL on your local Solr installation should show
you the version at the top.
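
As a rough sketch, the same page can be fetched from a script. The host, port
(8983) and `/solr` context path below are assumptions for a default local
install, and the helper names are made up for illustration. The page layout
differs between versions, so the sketch only builds the URL and downloads the
page, leaving you to read the version off the top.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumed base URL of a default local Solr install; adjust to your setup.
DEFAULT_BASE = "http://localhost:8983/solr/"


def registry_url(base=DEFAULT_BASE):
    """Return the admin registry page URL for the given Solr base URL."""
    if not base.endswith("/"):
        base += "/"  # urljoin would otherwise drop the last path segment
    return urljoin(base, "admin/registry.jsp")


def fetch_registry(base=DEFAULT_BASE, timeout=5):
    """Download the registry page as text (needs a running Solr instance)."""
    with urlopen(registry_url(base), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `fetch_registry()` against a running instance returns the page text;
`registry_url()` alone is enough to check which address would be requested.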
> But I wonder what are the difference
Dear Erick,
Thanks so much for your help! I am new to Solr, so I have no idea about the
version.
But I wonder what the differences between Solr and Hadoop are. It seems that
Solr already does what Hadoop promises.
Best,
Bing
On Sat, Nov 20, 2010 at 2:28 AM, Erick Erickson wrote:
> You
You haven't said what version of Solr you're using, but you're
asking about replication, which is built-in.
See: http://wiki.apache.org/solr/SolrReplication
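
For example, the replication handler described on that page can be queried
over HTTP to check the index version on the master or the replication status
on a slave. This is a minimal sketch, assuming the handler is registered at
`/replication` as in the wiki examples. The host names, port, and helper
function are hypothetical.

```python
from urllib.parse import urlencode


def replication_url(base, command, **params):
    """Build a replication handler URL such as .../replication?command=details.

    `base` is the Solr base URL (an assumption, e.g. http://master:8983/solr).
    `command` is a handler command, e.g. "indexversion" or "details".
    """
    query = {"command": command}
    query.update(params)  # extra request parameters, e.g. wt="json"
    return base.rstrip("/") + "/replication?" + urlencode(query)


# Hypothetical hosts: ask the master for its latest replicable index version,
# and ask a slave for its replication details.
master_check = replication_url("http://master:8983/solr", "indexversion")
slave_check = replication_url("http://slave:8983/solr/", "details", wt="json")
```

Opening either URL in a browser or with `urllib.request.urlopen` returns the
handler's response. Comparing the version reported by master and slave is one
way to confirm that a slave has caught up.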
And no, your slave doesn't block while the update is happening,
and it automatically switches to the updated index upon
successful replicatio
Hi, all,
I am working on a distributed search system. Right now I have only one server.
It has to crawl pages from the Web, generate indexes locally, and respond to
users' queries. I think this is too much work for it to run smoothly.
I plan to use at least two servers. The jobs to crawl pages and genera