Re: Using Solr with Hadoop ....

2008-11-29 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Sat, Nov 29, 2008 at 7:26 PM, Jon Baer <[EMAIL PROTECTED]> wrote: > HadoopEntityProcessor for the DIH? Reading data from Hadoop with DIH could be really cool. There are a few very useful ones which are badly needed. The most useful one would be a TikaEntityProcessor. But I do not see it solving th

Re: Using Solr with Hadoop ....

2008-11-29 Thread Jon Baer
HadoopEntityProcessor for the DIH? I've wondered about this, as they make HadoopCluster LiveCDs and EC2 has images, but the best way to make use of them is always a challenge. - Jon On Nov 29, 2008, at 3:34 AM, Erik Hatcher wrote: On Nov 28, 2008, at 8:38 PM, Yonik Seeley wrote: Or, it would b

Re: Using Solr with Hadoop ....

2008-11-29 Thread Erik Hatcher
On Nov 28, 2008, at 8:38 PM, Yonik Seeley wrote: Or, it would be relatively trivial to write a Lucene program to merge the indexes. FYI, such a tool exists in Lucene's API already: Erik

RE: Using Solr with Hadoop ....

2008-11-28 Thread souravm
ay be worthwhile where I use Solr/Lucene's indexing power and Hadoop's parallel processing capability. Regards, Sourav -----Original Message----- From: Yonik Seeley [mailto:[EMAIL PROTECTED]] Sent: Friday, November 28, 2008 7:08 PM To: solr-user@lucene.apache.org Subject: Re: Using S

Re: Using Solr with Hadoop ....

2008-11-28 Thread Yonik Seeley
Ah sorry, I had misread your original post. 3-6M docs per hour can be challenging. Using the CSV loader, I've indexed 4000 docs per second (14M per hour) on a 2.6GHz Athlon, but they were relatively simple and small docs. On Fri, Nov 28, 2008 at 9:54 PM, souravm <[EMAIL PROTECTED]> wrote: > There
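
The throughput figures quoted in this message can be checked with a little arithmetic. The sketch below is illustrative only: the function names are made up for this example, and the only inputs are the numbers stated in the thread (4000 docs per second measured with the CSV loader, and a target of 3-6M docs per hour). Keep in mind that the measured rate was for relatively simple and small docs, so it is not a guarantee for other workloads.

```python
SECONDS_PER_HOUR = 3600


def docs_per_hour(docs_per_second):
    """Convert a sustained per-second indexing rate to docs per hour."""
    return docs_per_second * SECONDS_PER_HOUR


def required_docs_per_second(docs_per_hour_target):
    """Convert an hourly indexing target to the per-second rate it implies."""
    return docs_per_hour_target / SECONDS_PER_HOUR


if __name__ == "__main__":
    # Rate reported in the thread for the CSV loader (small, simple docs).
    measured = docs_per_hour(4000)
    print(f"4000 docs/s is {measured:,} docs/hour")

    # Target range mentioned in the thread: 3-6M docs per hour.
    for target in (3_000_000, 6_000_000):
        rate = required_docs_per_second(target)
        print(f"{target:,} docs/hour needs about {rate:.0f} docs/s")
```

Under these numbers, 4000 docs per second works out to 14.4M docs per hour, which the message rounds to 14M, and the upper end of the 3-6M target needs well under half of that rate.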

RE: Using Solr with Hadoop ....

2008-11-28 Thread souravm
ginal Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Friday, November 28, 2008 5:38 PM To: solr-user@lucene.apache.org Subject: Re: Using Solr with Hadoop The indexing rate you need to achieve should be equal to the rate that new documents are produced

Re: Using Solr with Hadoop ....

2008-11-28 Thread Yonik Seeley
Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley > Sent: Friday, November 28, 2008 1:58 PM > To: solr-user@lucene.apache.org > Subject: Re: Using Solr with Hadoop > > While future Solr-hadoop integration is a definite possibil

RE: Using Solr with Hadoop ....

2008-11-28 Thread souravm
he servers. Regards, Sourav -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Yonik Seeley Sent: Friday, November 28, 2008 1:58 PM To: solr-user@lucene.apache.org Subject: Re: Using Solr with Hadoop While future Solr-hadoop integration is a definite p

Re: Using Solr with Hadoop ....

2008-11-28 Thread Yonik Seeley
s not an option as my target doc size > per hr itself can be very huge (3-6M). So I am considering using HDFS and > MapReduce to do the indexing job within time. > > In that regard I have following queries regarding using Solr with Hadoop. > > 1. After creating the index using Had

Using Solr with Hadoop ....

2008-11-28 Thread souravm
using HDFS and MapReduce to do the indexing job within time. In that regard I have following queries regarding using Solr with Hadoop. 1. After creating the index using Hadoop whether storing them for query purpose again in HDFS would mean additional performance overhead (compared to storing