Re: Using Solr with Hadoop ....

2008-11-29 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Sat, Nov 29, 2008 at 7:26 PM, Jon Baer <[EMAIL PROTECTED]> wrote: > HadoopEntityProcessor for the DIH? Reading data from Hadoop with DIH could be really cool. There are a few very useful ones that are badly needed. The most useful one would be a TikaEntityProcessor. But I do not see it solving th

Re: Using Solr with Hadoop ....

2008-11-29 Thread Jon Baer
HadoopEntityProcessor for the DIH? I've wondered about this, as they make HadoopCluster LiveCDs and EC2 has images, but the best way to make use of them is always a challenge. - Jon On Nov 29, 2008, at 3:34 AM, Erik Hatcher wrote: On Nov 28, 2008, at 8:38 PM, Yonik Seeley wrote: Or, it would b

Re: Using Solr with Hadoop ....

2008-11-29 Thread Erik Hatcher
On Nov 28, 2008, at 8:38 PM, Yonik Seeley wrote: > Or, it would be relatively trivial to write a Lucene program to merge the indexes. FYI, such a tool exists in Lucene's API already: Erik

RE: Using Solr with Hadoop ....

2008-11-28 Thread souravm
ay be worthwhile where I use Solr/Lucene's indexing power and Hadoop's parallel processing capability. Regards, Sourav -Original Message- From: Yonik Seeley [mailto:[EMAIL PROTECTED] Sent: Friday, November 28, 2008 7:08 PM To: solr-user@lucene.apache.org Subject: Re: Using S

Re: Using Solr with Hadoop ....

2008-11-28 Thread Yonik Seeley
Ah sorry, I had misread your original post. 3-6M docs per hour can be challenging. Using the CSV loader, I've indexed 4000 docs per second (14M per hour) on a 2.6GHz Athlon, but they were relatively simple and small docs. On Fri, Nov 28, 2008 at 9:54 PM, souravm <[EMAIL PROTECTED]> wrote: > There
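
As a rough check on the figures in this message, the sketch below converts between per-second and per-hour indexing rates. The function names are illustrative only and are not part of Solr; the inputs are the 4000 docs per second and 3-6M docs per hour figures quoted in the thread.

```python
SECONDS_PER_HOUR = 3600


def docs_per_hour(docs_per_second: float) -> float:
    """Convert a sustained per-second indexing rate to docs per hour."""
    return docs_per_second * SECONDS_PER_HOUR


def required_docs_per_second(docs_per_hour_target: float) -> float:
    """Per-second rate needed to keep up with an hourly document volume."""
    return docs_per_hour_target / SECONDS_PER_HOUR


if __name__ == "__main__":
    # Rate reported for the CSV loader on simple, small docs.
    print(f"4000 docs/s = {docs_per_hour(4000) / 1e6:.1f}M docs/hour")

    # Target range from the original question: 3-6M docs per hour.
    for target in (3_000_000, 6_000_000):
        rate = required_docs_per_second(target)
        print(f"{target / 1e6:.0f}M docs/hour needs about {rate:.0f} docs/s")
```

On these numbers, the reported CSV loader rate is above the sustained rate the 3-6M per hour target requires, though larger or more complex documents would index more slowly.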

RE: Using Solr with Hadoop ....

2008-11-28 Thread souravm
ginal Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Friday, November 28, 2008 5:38 PM To: solr-user@lucene.apache.org Subject: Re: Using Solr with Hadoop The indexing rate you need to achieve should be equal to the rate that new documents are produced

Re: Using Solr with Hadoop ....

2008-11-28 Thread Yonik Seeley
Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley > Sent: Friday, November 28, 2008 1:58 PM > To: solr-user@lucene.apache.org > Subject: Re: Using Solr with Hadoop > > While future Solr-hadoop integration is a definite possibil

RE: Using Solr with Hadoop ....

2008-11-28 Thread souravm
he servers. Regards, Sourav -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Friday, November 28, 2008 1:58 PM To: solr-user@lucene.apache.org Subject: Re: Using Solr with Hadoop While future Solr-hadoop integration is a definite p

Re: Using Solr with Hadoop ....

2008-11-28 Thread Yonik Seeley
While future Solr-hadoop integration is a definite possibility (and will enable other cool stuff), it doesn't necessarily seem needed for the problem you are trying to solve. > indexing them in parallel is not an option as my target doc size per hr > itself can be very huge (3-6M) I'm not sure I