Do these settings apply to DIH? The example linked seems to refer to updateHandler, but I am not sure how/whether that affects DIH.
Regards,
   Alex.
P.S. I was also having OOMs on large DIH imports.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)


On Mon, Oct 15, 2012 at 5:15 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> About your second point. Try committing more often with openSearcher
> set to false.
> There's a bit here:
> http://wiki.apache.org/solr/SolrConfigXml
>
> <autoCommit>
>   <maxDocs>10000</maxDocs> <!-- maximum uncommitted docs before
>     autocommit triggered -->
>   <maxTime>15000</maxTime> <!-- maximum time (in ms) after adding
>     a doc before an autocommit is triggered -->
>   <openSearcher>false</openSearcher> <!-- SOLR 4.0. Optionally
>     don't open a searcher on hard commit. This is useful to minimize the
>     size of transaction logs that keep track of uncommitted updates. -->
> </autoCommit>
>
>
> That should keep the size of the transaction log down to reasonable levels...
>
> Best
> Erick
>
> On Sun, Oct 14, 2012 at 4:11 PM, Shawn Heisey <s...@elyograg.org> wrote:
>> Please see my other thread called "Testing Solr4 - reference thread" for
>> general information about my config layout. If more specific information is
>> required, please let me know.
>>
>> So far I cannot get a solr.war built without slf4j bindings to work right.
>> There does not seem to be any centrally configured directory I can use for
>> the slf4j and log4j jars. I am hesitant to use a lib entry in
>> solrconfig.xml, because I actually have three distinct solrconfig.xml files
>> and each server has 16 cores that symlink to those files. I can have each
>> instanceDir contain a symlink to a more central lib directory, but I don't
>> want each core to have its own copy of those jars loaded into memory unless
>> it's the only way to make it work. If anyone knows how to make this work
>> properly, let me know. If the instanceDir symlink option is the only way, I
>> will probably file an issue in Jira.
>>
>> If the updateLog is turned on (I did add _version_ to my schema), doing a
>> full reindex (using DIH) leads to "out of memory" exceptions, and the
>> transaction log takes up the same amount of disk space (in a single log
>> file) as the partially built index. Based on the index progress before it
>> died, performance is terrible -- about one third the pace of Solr 3.5.0,
>> perhaps less.
>>
>> After I turned off updateLog, performance went way up and it was able to
>> complete without error. I think it is actually faster than it was under
>> 3.5.0 with the exact same DIH config, as long as updateLog is turned off. I
>> haven't done enough testing to file an issue yet. Are there ways to split
>> the transaction log into multiple files and control how much disk space the
>> log uses? Can I do anything to increase performance?
>>
>> For relative paths, instanceDir is relative to solr.home, dataDir is
>> relative to instanceDir, and if you are using symlinks for solrconfig.xml,
>> xinclude directives are relative to the symlink location, not the real file
>> location. These seem like reasonable defaults to me. Is this what I should
>> expect for the future, or should I be filing an issue?
>>
>> Thanks,
>> Shawn
>>
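For anyone following the thread, here is a rough sketch of how the two settings discussed here might sit together in a Solr 4.x solrconfig.xml: the <updateLog> element that Shawn turned on and off, and the <autoCommit> block Erick quoted, both inside <updateHandler>. This is an illustrative fragment assembled from the settings quoted above, not a copy of any shipped config. The ${solr.ulog.dir:} property and the solr.DirectUpdateHandler2 class are assumptions based on my recollection of the stock 4.x example config, so check them against your own file.

```xml
<!-- Illustrative sketch only; values taken from Erick's example above. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Transaction log. Removing this element is what "turning off
       updateLog" refers to in this thread. Per Shawn's note, the schema
       needs a _version_ field while it is enabled.
       (Assumed: dir defaults to the core's data directory when the
       solr.ulog.dir property is unset.) -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- Hard commit after 10000 uncommitted docs or 15000 ms, whichever
       comes first, without opening a new searcher. Per Erick, frequent
       hard commits keep the transaction log from growing unbounded. -->
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

Whether a DIH full-import actually triggers these autoCommits is exactly the open question raised at the top of this message, so treat the fragment as a starting point for testing rather than a confirmed fix.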