I've been using Solr for a while now, indexing 2-4 million records
using the DIH to pull data from MySQL, which has been working great.
For a new project, I need to index about 20M records (30 fields) and I
have been running into issues with MySQL disconnects, right around
15M. I've tried several r
; the data.
>
> We've run into some troubles for the first 2 attempts, but setting
> batchSize="-1" for the dataSource resolved the issues.
>
> Do you need a lot of complex joins to import the data from mysql?
>
>
>
> -robert
>
>
>
>
> On 4/2
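For reference, a minimal sketch of where the batchSize="-1" setting
mentioned above goes in a DIH data-config.xml. The driver class, URL,
credentials, entity name, query and field names are placeholders chosen
for illustration, not taken from this thread:

```xml
<dataConfig>
  <!-- batchSize="-1" is the setting discussed above; everything else
       in this dataSource element is a placeholder -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="solr"
              password="secret"
              batchSize="-1"/>
  <document>
    <!-- hypothetical entity: one SELECT feeding one Solr document per row -->
    <entity name="record" query="SELECT id, title FROM records">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>
```

As far as I know, with the MySQL JDBC driver batchSize="-1" makes DIH
ask for a streaming result set, so rows are read as they arrive instead
of being buffered in memory first.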
Thanks for the e-mail. I probably should have provided more details,
but I was more interested in making sure I was approaching the problem
correctly (using DIH, with one big SELECT statement for millions of
rows) instead of solving this specific problem. Here's a partial
stacktrace from this speci
erational difference between a newly-rebuilt index
> and one that's been optimized. If you don't delete/update, there's not
> much reason to optimize either.
>
> I'll leave the DIH to others..
>
> Best
> Erick
>
> On Thu, Apr 21, 2011 at 8:09 PM, Scot
In DataImportHandler, is it possible to use the prior maximum value of
the PrimaryKey in the delta query, as opposed to (or in addition to)
using "dataimporter.last_index_time"? We already have Created_On and
Updated_On fields, but we've only indexed the Updated_On fields. I was
hoping for somethin
I experienced the same issue. With Solr 1.x, I was copying out the
'example' directory to make my solr installation. However, for the
Solr 3.x distributions, the DataImportHandler class lives in "dist",
a directory at the same level as "example", not a directory
within it.
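
A sketch of one way to make the jar visible to Solr, assuming the stock
layout where the core's conf directory sits under example/solr. The
relative path and the jar-name regex are assumptions and will likely
need adjusting to your install:

```xml
<!-- solrconfig.xml: load the DataImportHandler jars from the sibling
     "dist" directory (path and regex are assumptions) -->
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-.*\.jar" />

<!-- register the handler; data-config.xml is a placeholder file name -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```
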
You'll either want t
Title pretty much says it all; I've configured the DIH in 3.1.0, and
it works great, except the delta-imports always run from the time of
the last full-import, not the last delta-import. After a delta-import,
dataimport.properties is completely untouched. The documentation
implies that the delta-impor
s
> net_write_timeout, so it kills the connection.
> {quote}
>
> I was thinking about some hackish solution to paginate results
>
>
>
>
> Or something along those lines (you'd need to calculate the offset in
> the pages query)
>
> But unfortunately MySQL does not provi