What's the relation between the items and item_descriptions tables? I.e., is
there only one item_descriptions record for every item id?
If it's 1-1, then you can merge all your data into a single database and use
a single query.
HTH,
Alex
On Thu, Jun 3, 2010 at 6:34 AM, Blargy wrote:
> What's more efficient, a batch size of 1000 or -1 for MySQL? Is this why it's
> so slow, because I am using 2 different datasources?
By batch size, I meant the number of docs sent from the client to Solr. MySQL's
batchSize handling is broken; the only thing that works is -1, which makes the
driver stream results row by row.
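[Editor's sketch, not from the thread: a minimal data-config.xml showing the
batchSize="-1" setting under discussion. The driver class is the standard
MySQL JDBC driver of that era; URL, credentials, and table names are
placeholders.]

    <dataConfig>
      <!-- batchSize="-1" makes DIH pass Integer.MIN_VALUE as the JDBC
           fetch size, which the MySQL driver interprets as "stream rows
           one at a time" instead of buffering the whole result set -->
      <dataSource name="db"
                  type="JdbcDataSource"
                  driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost:3306/shop"
                  user="solr"
                  password="SECRET"
                  batchSize="-1"/>
      <document>
        <entity name="item" dataSource="db"
                query="SELECT id, name FROM items"/>
      </document>
    </dataConfig>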
Frankly, if you can create a script that'll turn your data into valid
CSV, that might be the easiest, quickest way to ingest your data.
Pragmatic, at least. It avoids the complexity of DIH, allows you to
script the export from your DB in the most efficient manner you can,
and so on. Solr's CSV import is quite fast.
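[Editor's sketch, not from the thread: one way the CSV route could look,
assuming MySQL's SELECT ... INTO OUTFILE and the /update/csv handler that
shipped with Solr 1.4; table, column, and file names are made up.]

    -- export; runs on the MySQL server, target directory must be writable
    SELECT i.id, i.name, d.description
    INTO OUTFILE '/tmp/items.csv'
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    FROM items i
    JOIN item_descriptions d ON d.item_id = i.id;

    # ingest: stream the file into Solr's CSV handler
    curl 'http://localhost:8983/solr/update/csv?commit=true&header=false&fieldnames=id,name,description' \
         --data-binary @/tmp/items.csv \
         -H 'Content-type: text/plain; charset=utf-8'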
On 3 Jun 2010, at 03:51, Blargy wrote:
> Would dumping the databases to a local file help at all?
I would suspect not, especially with the size of your data. But it would
be good to know how long that takes, i.e., if you create a SQL script that
just pulls that data out, how long does it take?
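[Editor's sketch, not from the thread: a quick way to time the raw extraction
on its own, independent of Solr; --quick makes the client stream rows rather
than buffer them, and all names are placeholders.]

    time mysql --quick -u solr -pSECRET shop \
         -e "SELECT i.id, i.name, d.description
             FROM items i JOIN item_descriptions d ON d.item_id = i.id" \
         > /dev/null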
Wait! You're fetching records from one database and then doing lookups
against another DB? That makes this a completely different problem.
The DIH does not, to my knowledge, have the ability to "pool" these
queries. That is, it will not build a batch of 1000 keys from
datasource1 and then do a single query against datasource2 for all of them.
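[Editor's sketch, not from the thread: the cross-datasource sub-entity shape
Lance is describing. Written this way, DIH runs the inner query once per
outer row, i.e. millions of single-key lookups; names are placeholders.]

    <document>
      <entity name="item" dataSource="db1"
              query="SELECT id, name FROM items">
        <!-- executed once for every item row, against the second DB -->
        <entity name="description" dataSource="db2"
                query="SELECT description FROM item_descriptions
                       WHERE item_id = '${item.id}'"/>
      </entity>
    </document>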
How long does it take to do a grab of all the data via SQL? I found that
denormalizing the data into a lookup table meant I was able to index
about 300k rows of similar data size with DIH, with regex splitting on
some fields, in about 8 mins. I know it's not quite the same scale, but with
batching...
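[Editor's sketch, not from the thread: one way to build such a denormalized
lookup table ahead of the import, which also collapses the two datasources
into one; table and column names are assumptions.]

    CREATE TABLE item_import AS
      SELECT i.id, i.name, d.description
      FROM items i
      JOIN item_descriptions d ON d.item_id = i.id;

    -- then point DIH (or the CSV export) at item_import instead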
> One thing that might help indexing speed - create a *single* SQL query
> to grab all the data you need without using DIH's sub-entities, at
> least the non-cached ones.
Not sure how much that would help. As I mentioned, without the item
description import the full process takes 4 hours.
One thing that might help indexing speed - create a *single* SQL query
to grab all the data you need without using DIH's sub-entities, at
least the non-cached ones.
Erik
As a data point, I routinely see clients index 5M items on normal hardware
in approx. 1 hour (give or take 30 minutes).
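[Editor's sketch, not from the thread: what the "single SQL query" advice
might look like in data-config.xml, with the join pushed down into the
database instead of DIH sub-entities; all names are assumptions.]

    <entity name="item" dataSource="db"
            query="SELECT i.id, i.name, d.description
                   FROM items i
                   JOIN item_descriptions d ON d.item_id = i.id">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="description" name="description"/>
    </entity>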
Also wanted to add that our main entity (item) consists of 5 sub-entities
(i.e., joins). 2 of those 5 are fairly small, so I am using
CachedSqlEntityProcessor for them, but the other three are not cached.
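[Editor's sketch, not from the thread: the CachedSqlEntityProcessor usage
being described, in the Solr 1.4-era syntax where the "where" attribute maps
a cached column to the outer entity's field; names are made up.]

    <entity name="item" dataSource="db" query="SELECT id, name FROM items">
      <!-- small table: fetched once up front, then joined in memory -->
      <entity name="category"
              processor="CachedSqlEntityProcessor"
              query="SELECT item_id, category FROM item_categories"
              where="item_id=item.id"/>
    </entity>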
> As a data point, I routinely see clients index 5M items on normal
> hardware in approx. 1 hour (give or take 30 minutes).
Our master Solr machine is running 64-bit RHEL 5.4 on a dedicated machine with
4 cores and 16G RAM, so I think we are good on the hardware. Our DB is MySQL
version 5.0.67.
On Jun 1, 2010, at 9:54 PM, Blargy wrote:
>
> We have around 5 million items in our index and each item has a description
> located on a separate physical database. These item descriptions vary in
> size and for the most part are quite large. Currently we are only indexing
> items and not their descriptions.