Do these settings apply to DIH? The example linked seems to refer to
updateHandler, but I am not sure how/whether that affects DIH.

Regards,
  Alex.
P.S. I was also getting OOMs on large DIH imports.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Mon, Oct 15, 2012 at 5:15 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> About your second point: try committing more often with openSearcher
> set to false. There's a bit about it here:
> http://wiki.apache.org/solr/SolrConfigXml
>
>     <autoCommit>
>       <!-- maximum uncommitted docs before autocommit is triggered -->
>       <maxDocs>10000</maxDocs>
>       <!-- maximum time (in ms) after adding a doc before an autocommit is triggered -->
>       <maxTime>15000</maxTime>
>       <!-- Solr 4.0. Optionally don't open a searcher on hard commit. This is
>            useful to minimize the size of transaction logs that keep track of
>            uncommitted updates. -->
>       <openSearcher>false</openSearcher>
>     </autoCommit>
>
>
> That should keep the size of the transaction log down to reasonable levels...
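>
> As a rough sketch of where that block lives (assuming the layout of the
> stock Solr 4.0 example solrconfig.xml; the values are only illustrative
> and the updateLog dir setting is an assumption taken from that example),
> both autoCommit and updateLog sit inside updateHandler:
>
>     <updateHandler class="solr.DirectUpdateHandler2">
>       <!-- transaction log; the dir property is assumed from the example config -->
>       <updateLog>
>         <str name="dir">${solr.ulog.dir:}</str>
>       </updateLog>
>       <!-- hard commit every 15 s without opening a new searcher -->
>       <autoCommit>
>         <maxTime>15000</maxTime>
>         <openSearcher>false</openSearcher>
>       </autoCommit>
>     </updateHandler>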
>
> Best
> Erick
>
> On Sun, Oct 14, 2012 at 4:11 PM, Shawn Heisey <s...@elyograg.org> wrote:
>> Please see my other thread called "Testing Solr4 - reference thread" for
>> general information about my config layout. If more specific information is
>> required, please let me know.
>>
>> So far I cannot get a solr.war built without slf4j bindings to work right.
>> There does not seem to be any centrally configured directory I can use for
>> the slf4j and log4j jars.  I am hesitant to use a lib entry in
>> solrconfig.xml, because I actually have three distinct solrconfig.xml files
>> and each server has 16 cores that symlink to those files.  I can have each
>> instanceDir contain a symlink to a more central lib directory, but I don't
>> want each core to have its own copy of those jars loaded into memory unless
>> it's the only way to make it work.  If anyone knows how to make this work
>> properly, let me know.  If the instanceDir symlink option is the only way, I
>> will probably file an issue in Jira.
>>
>> If the updateLog is turned on (I did add _version_ to my schema), doing a
>> full reindex (using DIH) leads to "out of memory" exceptions, and the
>> transaction log takes up the same amount of disk space (in a single log
>> file) as the partially built index. Based on the index progress before it
>> died, performance is terrible -- about one third the pace of Solr 3.5.0,
>> perhaps less.
>>
>> After I turned off updateLog, performance went way up and it was able to
>> complete without error.  I think it is actually faster than it was under
>> 3.5.0 with the exact same DIH config, as long as updateLog is turned off.  I
>> haven't done enough testing to file an issue yet.  Are there ways to split
>> the transaction log into multiple files and control how much disk space the
>> log uses?  Can I do anything to increase performance?
>>
>> For relative paths, instanceDir is relative to solr.home, dataDir is
>> relative to instanceDir, and if you are using symlinks for solrconfig.xml,
>> xinclude directives are relative to the symlink location, not the real file
>> location.  These seem like reasonable defaults to me.  Is this what I should
>> expect for the future, or should I be filing an issue?
>>
>> Thanks,
>> Shawn
>>
