Forgot to add... sometimes really hammering at the SQL query in DIH can be
fruitful: can you write one huge, monster query that's faster than the
sub-queries?
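As a rough illustration of what that might look like, here is a minimal
data-config.xml sketch that collapses the nested entities into a single
JOIN; the table and column names are made up, so adapt them to your schema:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/yourdb" user="..." password="..."/>
  <document>
    <!-- One flat query with JOINs in place of nested child entities,
         so DIH makes a single pass over one result set instead of
         issuing a lookup query per row. -->
    <entity name="transaction" query="
        SELECT t.id, t.amount, v.name AS vendor, p.name AS product
        FROM transactions t
        JOIN vendors  v ON v.uid = t.vendor_uid
        JOIN products p ON p.uid = t.product_uid">
      <field column="id"      name="id"/>
      <field column="amount"  name="amount"/>
      <field column="vendor"  name="vendor"/>
      <field column="product" name="product"/>
    </entity>
  </document>
</dataConfig>

If a one-to-many child table makes the flat JOIN produce duplicate rows,
something like MySQL's GROUP_CONCAT (or a sub-select) can fold the children
back into a single multivalued field.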
I've also seen people run processes on the DB that move all the data into
a temporary place, making use of all the nifty stuff you can do there, and
then use DIH on _that_. Or on a view.

All that said, I generally prefer using SolrJ if DIH doesn't do the job
after a day or two of fiddling; it gives more control.

Good luck!
Erick

On Thu, May 26, 2016 at 11:02 AM, John Blythe <j...@curvolabs.com> wrote:
> oo gotcha. cool, will make sure to check it out and bounce any related
> questions through here.
>
> thanks!
>
> best,
>
> --
> *John Blythe*
> Product Manager & Lead Developer
>
> 251.605.3071 | j...@curvolabs.com
> www.curvolabs.com
>
> 58 Adams Ave
> Evansville, IN 47713
>
> On Thu, May 26, 2016 at 1:45 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> Solr commits aren't the issue, I'd guess. All the time is
>> probably being spent getting the data from MySQL.
>>
>> I've had some luck writing to Solr from a DB through a
>> SolrJ program; here's a place to get started:
>> searchhub.org/2012/02/14/indexing-with-solrj/
>> You can peel out the Tika bits pretty easily, I should think.
>>
>> One technique I've used is to cache some of the DB tables
>> in Java's memory to keep from having to do the secondary
>> lookup(s). This only really works if the "secondary table" is
>> small enough to fit in Java's memory, of course. You can do
>> some creative things with caching partial tables if you can
>> sort appropriately.
>>
>> Best,
>> Erick
>>
>> On Thu, May 26, 2016 at 9:01 AM, John Blythe <j...@curvolabs.com> wrote:
>> > hi all,
>> >
>> > i've got layered entities in my solr import. it's calling on some
>> > transactional data from a MySQL instance. there are two fields that
>> > are used to then look up other information from other tables via their
>> > related UIDs, one of which has its own child entity with yet another
>> > select statement to grab up more data.
>> >
>> > it fetches at about 120/s but processes at ~50-60/s. we currently only
>> > have close to 500k records, but it's growing quickly, and it's becoming
>> > increasingly painful to make modifications due to the reimport that
>> > then needs to occur.
>> >
>> > i feel like i'd seen some threads regarding commits of new data,
>> > master/slave, or solrcloud/sharding that could help in some ways
>> > related to this, but as of yet i can't scrounge them up with my
>> > searches (ironic :p).
>> >
>> > can someone help by pointing me to some good material related to this
>> > sort of thing?
>> >
>> > thanks-
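To make the SolrJ route Erick describes above concrete, here is a minimal
sketch combining the JDBC read, the in-memory cache for the small secondary
table, and batched adds with one commit at the end. The JDBC URL, core name,
and table/column names are hypothetical, and it assumes SolrJ 5.x, where
new HttpSolrClient(url) is still available:

// A minimal sketch of the SolrJ-plus-cache approach described above.
// The JDBC URL, core name, and table/column names are hypothetical.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class DbIndexer {
  public static void main(String[] args) throws Exception {
    try (Connection db = DriverManager.getConnection(
             "jdbc:mysql://localhost/yourdb", "user", "pass");
         HttpSolrClient solr =
             new HttpSolrClient("http://localhost:8983/solr/yourcore")) {

      // Cache the small secondary table once up front, instead of
      // issuing one lookup query per row the way a DIH sub-entity does.
      Map<String, String> vendorNames = new HashMap<>();
      try (ResultSet rs = db.createStatement()
               .executeQuery("SELECT uid, name FROM vendors")) {
        while (rs.next()) vendorNames.put(rs.getString(1), rs.getString(2));
      }

      // Walk the main table and build documents, sending them in
      // batches and committing once at the very end.
      List<SolrInputDocument> batch = new ArrayList<>();
      try (ResultSet rs = db.createStatement().executeQuery(
               "SELECT id, amount, vendor_uid FROM transactions")) {
        while (rs.next()) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", rs.getString("id"));
          doc.addField("amount", rs.getBigDecimal("amount"));
          doc.addField("vendor", vendorNames.get(rs.getString("vendor_uid")));
          batch.add(doc);
          if (batch.size() >= 1000) {
            solr.add(batch);
            batch.clear();
          }
        }
      }
      if (!batch.isEmpty()) solr.add(batch);
      solr.commit();
    }
  }
}

Sending documents in batches and committing once at the end avoids paying
a network round trip per document, which is usually where a naive SolrJ
loop loses its time.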