Alex, thanks for your response. I suspect you're right about
autoCommit; I ended up solving the problem by simply moving the entire
Solr install, untouched, to a significantly larger instance (EC2
m1.small to m1.large). I think it is appropriately sized now for the
quantity and intensity of queries.
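
For reference, Solr's autoCommit is configured in solrconfig.xml; a
minimal sketch with illustrative thresholds (the values below are
assumptions, not taken from this thread):

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- commit periodically during a long import so uncommitted
         documents don't accumulate in memory; tune to taste -->
    <autoCommit>
      <maxDocs>10000</maxDocs>  <!-- commit every 10,000 documents -->
      <maxTime>60000</maxTime>  <!-- or every 60 seconds, whichever comes first -->
    </autoCommit>
  </updateHandler>
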
I am running into this problem as well, but only sporadically, and only
in my 3.1 test environment, not in 1.4.1 production. I may have narrowed
things down; I am now interested in learning whether this is a problem
with the MySQL connector or with DIH.

On 4/21/2011 6:09 PM, Scott Bigelow wrote:
> Thanks for the e-mail. ...
>
> Caused by: java.io.EOFException: Can not read response from server.
> Expected to read 4 bytes, read 0 bytes before connection was
> unexpectedly lost.
>         at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2539)
>         at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2989)

Thank you everyone for your help. I ended up getting the index to work
using the exact same config file on a (substantially) larger instance.

On Fri, Apr 22, 2011 at 5:46 AM, Erick Erickson wrote:
> A custom indexer, so that's a fairly common practice? So when you are
> dealing with these large indexes, do you try not to fully rebuild them
> when you can? It's not a nightly thing, but something to do in case of
> a disaster? Is there a difference in the performance of an index that
> was built all ...

Can you post the data-config.xml? Probably you didn't use batchSize.

Sent from my iPhone

On Apr 21, 2011, at 5:09 PM, Scott Bigelow wrote:
> Thanks for the e-mail. ...
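
The batchSize suggestion refers to the DIH JDBC data source. A minimal
data-config.xml sketch (driver class, URL, and credentials below are
placeholders, not details from this thread); batchSize="-1" tells the
MySQL driver to stream rows rather than buffer the entire result set:

  <dataConfig>
    <dataSource type="JdbcDataSource"
                driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost:3306/mydb"
                user="solr" password="secret"
                batchSize="-1"/>
    <!-- batchSize="-1" maps to fetchSize=Integer.MIN_VALUE, which is
         how Connector/J enables row-by-row streaming -->
    <!-- <document>/<entity> definitions follow -->
  </dataConfig>
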
Thanks for the e-mail. I probably should have provided more details,
but I was more interested in making sure I was approaching the problem
correctly (using DIH, with one big SELECT statement for millions of
rows) instead of solving this specific problem. Here's a partial
stacktrace from this specific ...
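
The "one big SELECT" approach corresponds to a single root entity in
DIH. A sketch under that assumption (table and column names here are
invented for illustration):

  <document>
    <entity name="record"
            query="SELECT id, title, body FROM records">
      <!-- one row per Solr document; the real schema has ~30 fields -->
      <field column="id"    name="id"/>
      <field column="title" name="title"/>
      <field column="body"  name="body"/>
    </entity>
  </document>
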
: For a new project, I need to index about 20M records (30 fields) and I
: have been running into issues with MySQL disconnects, right around
: 15M. I've tried several remedies I've found on blogs, changing ...

If you can provide some concrete error/log messages and the details of
how you are configuring ...

Thanks for your response!
I think the issue is that the records are being returned TOO fast from
MySQL. I can dump them to CSV in about 30 minutes, but building the
Solr index takes hours on the system I'm using. I may just need to use
a more powerful Solr instance so it doesn't leave MySQL hanging.
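
The "returned too fast" symptom fits how streaming result sets fail:
with batchSize="-1" the driver streams rows, and if the consumer (here,
Solr indexing) reads slowly, MySQL can drop the connection once its
net_write_timeout expires, producing exactly the EOFException above.
One possible mitigation, assuming MySQL Connector/J 5.1+, is to raise
the streaming timeout in the JDBC URL (the value below is illustrative):

  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb?netTimeoutForStreamingResults=3600"
              batchSize="-1"/>
  <!-- netTimeoutForStreamingResults (seconds) sets the server-side
       net_write_timeout for the duration of a streaming result set -->
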
We're indexing around 10M records from a MySQL database into
a single Solr core.
The DataImportHandler needs to join 3 sub-entities to denormalize
the data.
We ran into some trouble on the first 2 attempts, but setting
batchSize="-1" for the dataSource resolved the issues.
Do you need a lo...
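
Joining sub-entities in DIH looks roughly like the nested-entity sketch
below (table and field names are invented; the real config joined three
sub-entities):

  <entity name="item" query="SELECT id, name FROM item">
    <field column="name" name="name"/>
    <!-- each sub-entity query runs once per parent row, keyed by ${item.id} -->
    <entity name="feature"
            query="SELECT description FROM feature WHERE item_id='${item.id}'">
      <field column="description" name="features"/>
    </entity>
  </entity>
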
I've been using Solr for a while now, indexing 2-4 million records
using the DIH to pull data from MySQL, which has been working great.
For a new project, I need to index about 20M records (30 fields) and I
have been running into issues with MySQL disconnects, right around
15M. I've tried several remedies I've found on blogs, changing ...