Lance Norskog-2 wrote:
>
> Wait! You're fetching records from one database and then doing lookups
> against another DB? That makes this a completely different problem.
>
> The DIH does not to my knowledge have the ability to "pool" these
> queries. That is, it will not build a batch of 1000 keys from
> datasource1 and then do a query against datasource2 with:
>     select foo where key_field IN (key1, key2,... key1000);
>
> This is the efficient way to do what you want. You'll have to write
> your own client to do this.
>
> On Wed, Jun 2, 2010 at 12:00 PM, David Stuart
> <david.stu...@progressivealliance.co.uk> wrote:
>> How long does it take to do a grab of all the data via SQL? I found by
>> denormalizing the data into a lookup table meant that I was able to index
>> about 300k rows of similar data size with dih regex spilting on some
>> fields
>> in about 8mins I know it's not quite the scale bit with batching...
>>
>> David Stuar
>>
>> On 2 Jun 2010, at 17:58, Blargy <zman...@hotmail.com> wrote:
>>
>>>
>>>> One thing that might help indexing speed - create a *single* SQL query
>>>> to grab all the data you need without using DIH's sub-entities, at
>>>> least the non-cached ones.
>>>>
>>>
>>> Not sure how much that would help. As I mentioned that without the item
>>> description import the full process takes 4 hours which is bearable.
>>> However
>>> once I started to import the item description which is located on a
>>> separate
>>> machine/database the import process exploded to over 24 hours.
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/Importing-large-datasets-tp863447p865324.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
> --
> Lance Norskog
> goks...@gmail.com
>
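
For reference, the batched lookup Lance describes above (collect a batch of keys from datasource1, then issue one `IN (...)` query against datasource2) could look roughly like the sketch below in a standalone client. This is only an illustration under assumptions: the table and column names (`item`, `item_description`, `item_id`, `description`) are hypothetical, SQLite in-memory databases stand in for the two MySQL servers so the example runs on its own, and posting the joined rows to Solr is left out.

```python
import sqlite3
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def fetch_descriptions(lookup_conn, keys):
    """Fetch the descriptions for a whole batch of keys in one query."""
    # One placeholder per key; the values are bound, never interpolated.
    placeholders = ",".join("?" for _ in keys)
    sql = (
        "SELECT item_id, description FROM item_description "
        f"WHERE item_id IN ({placeholders})"
    )
    return dict(lookup_conn.execute(sql, keys).fetchall())


def joined_rows(source_conn, lookup_conn, batch_size=1000):
    """Stream rows from the first database, enriching them batch by batch."""
    # Note: some databases cap the number of bound parameters per statement,
    # so batch_size may need to be lowered accordingly.
    cursor = source_conn.execute("SELECT id, name FROM item ORDER BY id")
    for batch in chunked(cursor, batch_size):
        descriptions = fetch_descriptions(lookup_conn, [row[0] for row in batch])
        for item_id, name in batch:
            yield {
                "id": item_id,
                "name": name,
                # None when the second database has no matching row.
                "description": descriptions.get(item_id),
            }


def demo():
    """Build two tiny stand-in databases and join them in batches of 2."""
    source = sqlite3.connect(":memory:")
    lookup = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
    source.executemany(
        "INSERT INTO item VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")]
    )
    lookup.execute(
        "CREATE TABLE item_description "
        "(item_id INTEGER PRIMARY KEY, description TEXT)"
    )
    lookup.executemany(
        "INSERT INTO item_description VALUES (?, ?)",
        [(1, "first"), (3, "third")],
    )
    return list(joined_rows(source, lookup, batch_size=2))


if __name__ == "__main__":
    for doc in demo():
        print(doc)
```

With this shape the second database sees one query per batch rather than one per row, which is the round-trip saving the quoted message is pointing at.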

Which is more efficient for MySQL: a batch size of 1000 or -1?

Is the import so slow because I am using two different datasources? If I were using just one datasource, should I still be seeing "Creating a connection for entity ...." for each sub-entity in the document, or should it be using a single connection?

--
View this message in context:
http://lucene.472066.n3.nabble.com/Importing-large-datasets-tp863447p866499.html
Sent from the Solr - User mailing list archive at Nabble.com.