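About your second point: try committing more often, with openSearcher
set to false. Each hard commit closes the current transaction log file
and starts a new one, so the log no longer grows as a single huge file
for the whole reindex.
There's a bit about the setting here:
http://wiki.apache.org/solr/SolrConfigXml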
<autoCommit>
  <!-- maximum uncommitted docs before autocommit triggered -->
  <maxDocs>10000</maxDocs>
  <!-- maximum time (in MS) after adding a doc before an autocommit
       is triggered -->
  <maxTime>15000</maxTime>
  <!-- SOLR 4.0. Optionally don't open a searcher on hard commit.
       This is useful to minimize the size of transaction logs that
       keep track of uncommitted updates. -->
  <openSearcher>false</openSearcher>
</autoCommit>
That should keep the size of the transaction log down to reasonable levels...
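If you would rather trigger the hard commits yourself while the import
runs, instead of (or in addition to) autoCommit, something like the
following might do. It is only a minimal sketch: the host, port, and
core name are made-up placeholders, and it assumes the core's /update
handler is at the default path and accepts the commit and openSearcher
request parameters.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_commit_url(solr_base, core):
    """Build the /update URL for a hard commit that does not open a new searcher."""
    params = urlencode({"commit": "true", "openSearcher": "false"})
    return f"{solr_base.rstrip('/')}/{core}/update?{params}"


def hard_commit(solr_base, core, timeout=30):
    """Send the commit request to Solr and return the HTTP status code."""
    with urlopen(build_commit_url(solr_base, core), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical base URL and core name; substitute your own.
    print(build_commit_url("http://localhost:8983/solr", "core0"))
```

You could call hard_commit() from a cron job or a small loop while DIH
is running. Since no new searcher is opened, the newly indexed documents
do not become visible to queries until a later commit that does open one.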
Best
Erick
On Sun, Oct 14, 2012 at 4:11 PM, Shawn Heisey <[email protected]> wrote:
> Please see my other thread called "Testing Solr4 - reference thread" for
> general information about my config layout. If more specific information is
> required, please let me know.
>
> So far I cannot get a solr.war built without slf4j bindings to work right.
> There does not seem to be any centrally configured directory I can use for
> the slf4j and log4j jars. I am hesitant to use a lib entry in
> solrconfig.xml, because I actually have three distinct solrconfig.xml files
> and each server has 16 cores that symlink to those files. I can have each
> instanceDir contain a symlink to a more central lib directory, but I don't
> want each core to have its own copy of those jars loaded into memory unless
> it's the only way to make it work. If anyone knows how to make this work
> properly, let me know. If the instanceDir symlink option is the only way, I
> will probably file an issue in Jira.
>
> If the updateLog is turned on (I did add _version_ to my schema), doing a
> full reindex (using DIH) leads to "out of memory" exceptions, and the
> transaction log takes up the same amount of disk space (in a single log
> file) as the partially built index. Based on the index progress before it
> died, performance is terrible -- about one third the pace of Solr 3.5.0,
> perhaps less.
>
> After I turned off updateLog, performance went way up and it was able to
> complete without error. I think it is actually faster than it was under
> 3.5.0 with the exact same DIH config, as long as updateLog is turned off. I
> haven't done enough testing to file an issue yet. Are there ways to split
> the transaction log into multiple files and control how much disk space the
> log uses? Can I do anything to increase performance?
>
> For relative paths, instanceDir is relative to solr.home, dataDir is
> relative to instanceDir, and if you are using symlinks for solrconfig.xml,
> xinclude directives are relative to the symlink location, not the real file
> location. These seem like reasonable defaults to me. Is this what I should
> expect for the future, or should I be filing an issue?
>
> Thanks,
> Shawn
>