Re: dataimporthandler large dataset

2011-08-12 Thread Shawn Heisey
On 8/12/2011 3:32 PM, Eric Myers wrote: Recently started looking into Solr to solve a problem created before my time. We have a dataset consisting of 390,000,000+ records that had a search written for it using a simple query. The problem is that the dataset needs additional indices to keep oper

Re: dataimporthandler large dataset

2011-08-12 Thread Kyle Lee
We have a 200,000,000 record index with 14 fields, and we can re-index the entire data set in about five hours. One thing to note is that the DataImportHandler uses one thread per entity by default. If you have a multicore box, you can drastically speed indexing by specifying a threadcount of n+1, w

dataimporthandler large dataset

2011-08-12 Thread Eric Myers
Recently started looking into Solr to solve a problem created before my time. We have a dataset consisting of 390,000,000+ records that had a search written for it using a simple query. The problem is that the dataset needs additional indices to keep operating. The DBA says no go, too large a da

Re: Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Shalin Shekhar Mangar
On Sat, Dec 13, 2008 at 11:45 AM, Kay Kay wrote: > True - Currently, playing around with MySQL. But I was trying to > understand more about how the Statement object is getting created (in the > case of a platform / vendor specific query like this). Are we going through > JPA internally in Solr

Re: Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Kay Kay
Shalin Shekhar Mangar wrote: On Sat, Dec 13, 2008 at 11:03 AM, Kay Kay wrote: Thanks Shalin for the clarification. The case about Lucene taking more time to index the Document when compared to DataImportHandler creating the input is definitely intuitive. But just curious about the underly

Re: Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Shalin Shekhar Mangar
On Sat, Dec 13, 2008 at 11:03 AM, Kay Kay wrote: > Thanks Shalin for the clarification. > > The case about Lucene taking more time to index the Document when compared > to DataImportHandler creating the input is definitely intuitive. > > But just curious about the underlying architecture on which

Re: Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Kay Kay
Thanks Shalin for the clarification. The case about Lucene taking more time to index the Document when compared to DataImportHandler creating the input is definitely intuitive. But just curious about the underlying architecture on which the test was being run. Was this performed on a multi-co

Re: Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Shalin Shekhar Mangar
On Sat, Dec 13, 2008 at 4:51 AM, Kay Kay wrote: > Thanks Bryan. > > That clarifies a lot. > > But even with streaming - retrieving one document at a time and adding to > the IndexWriter seems to be making it more serializable. > We have experimented with making DataImportHandler multi-threaded in

Re: Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Kay Kay
: Bryan Talbot Subject: Re: Solr - DataImportHandler - Large Dataset results ? To: solr-user@lucene.apache.org Date: Friday, December 12, 2008, 5:26 PM It only supports streaming if properly enabled which is completely lame: http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation

Re: Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Bryan Talbot
Subject: Re: Solr - DataImportHandler - Large Dataset results ? To: solr-user@lucene.apache.org Date: Friday, December 12, 2008, 9:41 PM DataImportHandler is designed to stream rows one by one to create Solr documents. As long as your database driver supports streaming, you should be fine. Which

Re: Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Kay Kay
Mangar Subject: Re: Solr - DataImportHandler - Large Dataset results ? To: solr-user@lucene.apache.org Date: Friday, December 12, 2008, 9:41 PM DataImportHandler is designed to stream rows one by one to create Solr documents. As long as your database driver supports streaming, you should be fine

Re: Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Shalin Shekhar Mangar
DataImportHandler is designed to stream rows one by one to create Solr documents. As long as your database driver supports streaming, you should be fine. Which database are you using? On Sat, Dec 13, 2008 at 2:20 AM, Kay Kay wrote: > As per the example in the wiki - > http://wiki.apache.org/solr

Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Kay Kay
As per the example in the wiki - http://wiki.apache.org/solr/DataImportHandler - I am seeing the following fragment. .. My scaled-down application looks very similar along these lines but where my resultset is s