Thanks Erick. For the record, we are using 1.4.1 and SolrJ.

On 31 October 2010 01:54, Erick Erickson <erickerick...@gmail.com> wrote:
> What version of Solr are you using?
>
> About committing: I'd just let the Solr defaults handle that. You configure
> this in the autocommit section of solrconfig.xml. I'm pretty sure this gets
> triggered even if you're using SolrJ.
>
> That said, it's probably wise to issue a commit after all your data is
> indexed too, just to flush any remaining documents since the last autocommit.
>
> Optimize should not be issued until you're all done, if at all. If
> you're not deleting (or updating) documents, don't bother to optimize
> unless the number of files in your index directory gets really large.
> Recent Solr code almost removes the need to optimize unless you
> delete documents, but I confess I don't know the revision number
> "recent" refers to, perhaps only trunk...
>
> HTH,
> Erick
>
> On Thu, Oct 28, 2010 at 9:56 AM, Savvas-Andreas Moysidis <
> savvas.andreas.moysi...@googlemail.com> wrote:
>
> > Hello,
> >
> > We currently index our data through a SQL-DIH setup, but because our model
> > (and therefore SQL query) is becoming complex, we need to index our data
> > programmatically. As we didn't have to deal with commit/optimise before,
> > we are now wondering whether there is an optimal approach to that. Is
> > there a batch size after which we should fire a commit, or should we
> > execute a commit after indexing all of our data? What about optimise?
> >
> > Our document corpus is > 4m documents, and through DIH the resulting
> > index is around 1.5G.
> >
> > We have searched previous posts but couldn't find a definite answer. Any
> > input much appreciated!
> >
> > Regards,
> > -- Savvas
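For reference, the "autocommit section of solrconfig.xml" Erick mentions looks roughly like the fragment below. The thresholds are illustrative values, not recommendations; tune them for your own indexing load:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit automatically once this many docs are pending -->
    <maxDocs>10000</maxDocs>
    <!-- ...or once this many milliseconds have passed since the first
         uncommitted add, whichever comes first -->
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>
```

With either threshold set, Solr commits on its own regardless of whether the documents arrived via DIH or SolrJ, which is why Erick suggests not micromanaging commits from the client.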
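To make the batching question concrete, here is a minimal sketch of the pattern Erick describes: buffer documents, send each full batch, push the remainder at the end, and issue exactly one commit afterwards. Note this is a self-contained illustration, not SolrJ code: `Batcher`, its constructor parameters, and the flush callback are all hypothetical; with SolrJ the flush action would be `solrServer.add(batch)` and the final step `solrServer.commit()` (and, per the advice above, `optimize()` rarely if ever).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical generic batcher. It accumulates items and hands each full
// batch to a flush action; with SolrJ that action would wrap
// solrServer.add(batchOfSolrInputDocuments).
public class Batcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> flushAction;
    private final List<T> buffer = new ArrayList<>();

    public Batcher(int batchSize, Consumer<List<T>> flushAction) {
        this.batchSize = batchSize;
        this.flushAction = flushAction;
    }

    // Add one document; send the buffer when it reaches batchSize.
    public void add(T doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Send whatever is buffered (called once more after the last add,
    // before the single final commit).
    public void flush() {
        if (!buffer.isEmpty()) {
            flushAction.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        List<List<String>> sentBatches = new ArrayList<>();
        Batcher<String> batcher = new Batcher<>(1000, sentBatches::add);

        for (int i = 0; i < 4500; i++) {
            batcher.add("doc-" + i);
        }
        batcher.flush(); // remainder of 500 docs
        // With SolrJ: solrServer.commit() exactly once here.

        System.out.println(sentBatches.size() + " batches sent");
    }
}
```

The exact batch size (1000 here) is arbitrary; the point is that autocommit plus one final explicit commit removes the need to commit per batch.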