Re: Importing large datasets

2010-06-07 Thread Alexey Serba
What's the relation between the items and item_descriptions tables? I.e. is there only one item_descriptions record for every id? If it is 1-1, then you can merge all your data into a single database and use the following query HTH, Alex On Thu, Jun 3, 2010 at 6:34 AM, Blargy wrote: > > > Erik Hatcher-4

Re: Importing large datasets

2010-06-03 Thread Grant Ingersoll
On Jun 2, 2010, at 10:30 PM, Blargy wrote: > What's more efficient: a batch size of 1000 or -1 for MySQL? Is this why it's so slow, because I am using 2 different datasources? > By batch size, I meant the number of docs sent from the client to Solr. MySQL Batch Size is broken. The only thing th

Re: Importing large datasets

2010-06-03 Thread Erik Hatcher
Frankly, if you can create a script that'll turn your data into valid CSV, that might be the easiest, quickest way to ingest your data. Pragmatic, at least. Avoids the complexity of DIH, allows you to script the export from your DB in the most efficient manner you can, and so on. Solr's
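
As a rough illustration of the export-script idea in the message above, here is a minimal Python sketch that streams rows from a database to CSV. Everything specific in it is an assumption: the `items` table, its `id`, `title` and `description` columns, and the in-memory SQLite database standing in for the real MySQL source are hypothetical and do not come from the thread.

```python
import csv
import io
import sqlite3


def export_items_to_csv(conn, out):
    """Stream every row of the (hypothetical) items table to `out` as CSV."""
    cursor = conn.execute("SELECT id, title, description FROM items ORDER BY id")
    writer = csv.writer(out)
    # Header row: take the column names from the cursor itself.
    writer.writerow([col[0] for col in cursor.description])
    count = 0
    for row in cursor:
        # csv.writer quotes commas, double quotes and embedded newlines for us.
        writer.writerow(row)
        count += 1
    return count


def demo():
    # In-memory SQLite stands in for the real database (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, description TEXT)"
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [(1, "Widget", "Small, blue"), (2, "Gadget", 'Says "hi"\non two lines')],
    )
    buf = io.StringIO()
    n = export_items_to_csv(conn, buf)
    return n, buf.getvalue()
```

Iterating over the cursor keeps memory use flat regardless of table size, and leaving the quoting to the `csv` module avoids hand-rolled escaping of large free-text descriptions.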

Re: Importing large datasets

2010-06-02 Thread David Stuart
On 3 Jun 2010, at 03:51, Blargy wrote: Would dumping the databases to a local file help at all? I would suspect not, especially with the size of your data. But it would be good to know how long that takes, i.e. if you create a SQL script that just pulls that data out, how long does that take?

Re: Importing large datasets

2010-06-02 Thread David Stuart
wrote: From: Grant Ingersoll Subject: Re: Importing large datasets To: solr-user@lucene.apache.org Date: Wednesday, June 2, 2010, 3:42 AM On Jun 1, 2010, at 9:54 PM, Blargy wrote: We have around 5 million items in our index and each item has a description located on a separate physical

Re: Importing large datasets

2010-06-02 Thread David Stuart
w.yert.com/film.php --- On Wed, 6/2/10, Andrzej Bialecki wrote: From: Andrzej Bialecki Subject: Re: Importing large datasets To: solr-user@lucene.apache.org Date: Wednesday, June 2, 2010, 4:52 AM On 2010-06-02 13:12, Grant Ingersoll wrote: On Jun 2, 2010, at 6:53 AM, Andrzej Bialecki wrote:

Re: Importing large datasets

2010-06-02 Thread Blargy
Would dumping the databases to a local file help at all?

Re: Importing large datasets

2010-06-02 Thread Blargy
Erik Hatcher-4 wrote: > > One thing that might help indexing speed - create a *single* SQL query > to grab all the data you need without using DIH's sub-entities, at > least the non-cached ones. > > Erik > > On Jun 2, 2010, at 12:21 PM, Blargy wrote: > >> >> >> As a data point, I ro

Re: Importing large datasets

2010-06-02 Thread Blargy
Lance Norskog-2 wrote: > > Wait! You're fetching records from one database and then doing lookups > against another DB? That makes this a completely different problem. > > The DIH does not to my knowledge have the ability to "pool" these > queries. That is, it will not build a batch of 1000 key

Re: Importing large datasets

2010-06-02 Thread Dennis Gearon
http://www.yert.com/film.php --- On Wed, 6/2/10, David Stuart wrote: > From: David Stuart > Subject: Re: Importing large datasets > To: "solr-user@lucene.apache.org" > Date: Wednesday, June 2, 2010, 12:00 PM > How long does it take to do a grab of > all the data via

Re: Importing large datasets

2010-06-02 Thread Dennis Gearon
gh at http://www.yert.com/film.php --- On Wed, 6/2/10, Andrzej Bialecki wrote: > From: Andrzej Bialecki > Subject: Re: Importing large datasets > To: solr-user@lucene.apache.org > Date: Wednesday, June 2, 2010, 4:52 AM > On 2010-06-02 13:12, Grant Ingersoll > wrote: > > >

Re: Importing large datasets

2010-06-02 Thread Dennis Gearon
e: > From: Grant Ingersoll > Subject: Re: Importing large datasets > To: solr-user@lucene.apache.org > Date: Wednesday, June 2, 2010, 3:42 AM > > On Jun 1, 2010, at 9:54 PM, Blargy wrote: > > > > > We have around 5 million items in our index and each > it

Re: Importing large datasets

2010-06-02 Thread Lance Norskog
Wait! You're fetching records from one database and then doing lookups against another DB? That makes this a completely different problem. The DIH does not to my knowledge have the ability to "pool" these queries. That is, it will not build a batch of 1000 keys from datasource1 and then do a query
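
The message above is cut off in this archive, so the following is only one reading of the "pooling" idea: a script outside DIH can collect keys from the first datasource and look them up in the second datasource a batch at a time, rather than issuing one query per item. This Python sketch assumes a hypothetical `item_descriptions(id, description)` table and uses two in-memory SQLite databases as stand-ins for the two physical databases.

```python
import sqlite3


def chunked(seq, size):
    """Yield consecutive slices of `seq` with at most `size` elements each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def fetch_descriptions(desc_conn, item_ids, batch_size=1000):
    """Look up descriptions with one IN (...) query per batch of keys."""
    descriptions = {}
    for batch in chunked(list(item_ids), batch_size):
        # One "?" placeholder per key; the keys are bound, never interpolated.
        placeholders = ",".join("?" for _ in batch)
        sql = (
            "SELECT id, description FROM item_descriptions "
            f"WHERE id IN ({placeholders})"
        )
        for item_id, text in desc_conn.execute(sql, batch):
            descriptions[item_id] = text
    return descriptions


def demo():
    # Two separate in-memory databases stand in for the two datasources.
    items_conn = sqlite3.connect(":memory:")
    desc_conn = sqlite3.connect(":memory:")
    items_conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
    desc_conn.execute(
        "CREATE TABLE item_descriptions (id INTEGER PRIMARY KEY, description TEXT)"
    )
    items_conn.executemany(
        "INSERT INTO items VALUES (?, ?)", [(i, f"item {i}") for i in range(1, 6)]
    )
    desc_conn.executemany(
        "INSERT INTO item_descriptions VALUES (?, ?)",
        [(i, f"description {i}") for i in range(1, 6)],
    )
    ids = [row[0] for row in items_conn.execute("SELECT id FROM items ORDER BY id")]
    # A tiny batch size here just exercises the batching path.
    return fetch_descriptions(desc_conn, ids, batch_size=2)
```

With 5 million items and batches of 1000, this turns roughly 5 million single-row lookups into about 5000 queries. Databases cap the number of bound parameters per statement, so the batch size may need tuning for the driver in use.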

Re: Importing large datasets

2010-06-02 Thread David Stuart
How long does it take to do a grab of all the data via SQL? I found that denormalizing the data into a lookup table meant I was able to index about 300k rows of similar data size, with DIH regex splitting on some fields, in about 8 minutes. I know it's not quite the scale, but with batching...

Re: Importing large datasets

2010-06-02 Thread Blargy
> One thing that might help indexing speed - create a *single* SQL query > to grab all the data you need without using DIH's sub-entities, at > least the non-cached ones. > Not sure how much that would help. As I mentioned, without the item description import the full process takes 4 h

Re: Importing large datasets

2010-06-02 Thread Erik Hatcher
One thing that might help indexing speed - create a *single* SQL query to grab all the data you need without using DIH's sub-entities, at least the non-cached ones. Erik On Jun 2, 2010, at 12:21 PM, Blargy wrote: As a data point, I routinely see clients index 5M items on normal
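
To make the single-query suggestion concrete, here is a hedged Python sketch in which one JOIN replaces a per-item sub-query. The schema is assumed, not taken from the thread: `items(id, title)` and `item_descriptions(id, description)` in a 1-1 relation, living in the same database (an in-memory SQLite database here). It does not by itself address the case where the descriptions sit on a separate physical server.

```python
import sqlite3

# One statement returns a complete row per item, so no per-item lookups are needed.
SINGLE_QUERY = """
SELECT i.id AS id, i.title AS title, d.description AS description
FROM items AS i
LEFT JOIN item_descriptions AS d ON d.id = i.id
ORDER BY i.id
"""


def load_documents(conn):
    """Return one dict per item, with description set to None when it is missing."""
    cursor = conn.execute(SINGLE_QUERY)
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in cursor]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute(
        "CREATE TABLE item_descriptions (id INTEGER PRIMARY KEY, description TEXT)"
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [(1, "Widget"), (2, "Gadget"), (3, "Gizmo")],
    )
    # Item 3 deliberately has no description, to show the LEFT JOIN behaviour.
    conn.executemany(
        "INSERT INTO item_descriptions VALUES (?, ?)",
        [(1, "Small, blue"), (2, "Large, red")],
    )
    return load_documents(conn)
```

The LEFT JOIN keeps items that have no description, which is usually what an indexer wants; an inner join would silently drop them.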

Re: Importing large datasets

2010-06-02 Thread Blargy
As a data point, I routinely see clients index 5M items on normal hardware in approx. 1 hour (give or take 30 minutes). Also wanted to add that our main entity (item) consists of 5 sub-entities (i.e., joins). 2 of those 5 are fairly small, so I am using CachedSqlEntityProcessor for them, but the ot

Re: Importing large datasets

2010-06-02 Thread Blargy
Andrzej Bialecki wrote: > > On 2010-06-02 12:42, Grant Ingersoll wrote: >> >> On Jun 1, 2010, at 9:54 PM, Blargy wrote: >> >>> >>> We have around 5 million items in our index and each item has a >>> description >>> located on a separate physical database. These item descriptions vary in >>> si

Re: Importing large datasets

2010-06-02 Thread Blargy
As a data point, I routinely see clients index 5M items on normal > hardware in approx. 1 hour (give or take 30 minutes). Our master Solr machine is running 64-bit RHEL 5.4 on a dedicated machine with 4 cores and 16 GB of RAM, so I think we are good on the hardware. Our DB is MySQL version 5.0.67 (exa

Re: Importing large datasets

2010-06-02 Thread Andrzej Bialecki
On 2010-06-02 13:12, Grant Ingersoll wrote: > > On Jun 2, 2010, at 6:53 AM, Andrzej Bialecki wrote: > >> On 2010-06-02 12:42, Grant Ingersoll wrote: >>> >>> On Jun 1, 2010, at 9:54 PM, Blargy wrote: >>> We have around 5 million items in our index and each item has a description loc

Re: Importing large datasets

2010-06-02 Thread Grant Ingersoll
On Jun 2, 2010, at 6:53 AM, Andrzej Bialecki wrote: > On 2010-06-02 12:42, Grant Ingersoll wrote: >> >> On Jun 1, 2010, at 9:54 PM, Blargy wrote: >> >>> >>> We have around 5 million items in our index and each item has a description >>> located on a separate physical database. These item descr

Re: Importing large datasets

2010-06-02 Thread Andrzej Bialecki
On 2010-06-02 12:42, Grant Ingersoll wrote: > > On Jun 1, 2010, at 9:54 PM, Blargy wrote: > >> >> We have around 5 million items in our index and each item has a description >> located on a separate physical database. These item descriptions vary in >> size and for the most part are quite large.

Re: Importing large datasets

2010-06-02 Thread Grant Ingersoll
On Jun 1, 2010, at 9:54 PM, Blargy wrote: > > We have around 5 million items in our index and each item has a description > located on a separate physical database. These item descriptions vary in > size and for the most part are quite large. Currently we are only indexing > items and not their