Long blog post on commits and the state of updates here:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

HDFS works fine with Solr; there's even an HdfsDirectoryFactory for your
index. It has its own performance characteristics and tuning parameters,
so there'll be something of a learning curve.
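
On the shutdown question: if you'd rather not depend on when autoCommit
last fired, one option is to send an explicit hard commit before stopping
Tomcat. Below is a minimal sketch, not a definitive recipe. The base URL
and the collection name "collection1" are placeholders I'm assuming, and
build_commit_url / send_commit are hypothetical helper names. The only
Solr-specific parts are the update handler's commit=true and
softCommit=true request parameters.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_commit_url(base_url, collection, soft=False):
    """Build the update-handler URL for a hard commit (default) or a soft commit."""
    # commit=true asks for a hard commit; softCommit=true asks for a soft one.
    params = {"softCommit": "true"} if soft else {"commit": "true"}
    params["wt"] = "json"
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


def send_commit(base_url, collection, soft=False, timeout=30):
    """Send the commit request over HTTP and return the HTTP status code."""
    with urlopen(build_commit_url(base_url, collection, soft), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Assumed placeholders: adjust host, port, and collection to your setup.
    print(build_commit_url("http://localhost:8983/solr", "collection1"))
    print(build_commit_url("http://localhost:8983/solr", "collection1", soft=True))
```

Calling send_commit("http://localhost:8983/solr", "collection1") from a
shutdown script would issue the hard commit; since the commit comes from a
client, it is propagated to the other nodes as described further down.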

Best
Erick


On Sat, Nov 23, 2013 at 4:14 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

> Thanks again for such a detailed description.
> In our use case we're going to store shard data on HDFS so they all have
> access to a shared location; it would be great to put such a file in one
> place in that case :)
> Do you think that using HDFS as storage is bad for performance?
> Last question: if I soft commit and then have to shut down my Tomcat, will
> the data be committed to disk, or do I have to manually force a commit
> before shutting down?
>
> Best,
> Flavio
>
> On Sat, Nov 23, 2013 at 2:01 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> > About <1>: well, at a high level you're right, of course.
> > Having the EFF stuff in a single place seems more elegant. But
> > then ugly details crop up. I.e. "one place" implies that you'd have
> > to fetch them over the network, potentially a very expensive
> > operation every time there was a commit. Is this really a good
> > tradeoff? With high network latency, this could be a performance
> > killer. But I suspect that the real reason is that nobody has found
> > a compelling use-case for this kind of thing. Until and unless
> > someone does, and is willing to make a patch, it'll be theory :).
> >
> > bq:  modifications also sent to replicas
> > with this kind of commits
> >
> > brief review:
> >
> > Update process:
> > 1> Update goes to a node.
> > 2> Node forwards to the relevant shard leader(s).
> > 3> Leader forwards to its replicas.
> > 4> Replicas respond to their leader.
> > 5> Leader responds to originating node.
> > 6> Originating node responds to caller.
> >
> > At this point all the replicas for your entire cluster have the
> > update. This is entirely independent of commits. Whenever a
> > commit is issued the documents currently pending on a node
> > are committed and made visible to a searcher.
> >
> > If one is relying on solrconfig settings, then the commit happens
> > a little bit out of sync. Let's say that the commit (hard with
> > openSearcher=true or soft) is set to 60 seconds. Each node may
> > have a different commit time, depending upon when it was started.
> > So there may be a slight difference in when documents are visible.
> > You'll probably never notice.
> >
> > If you issue commits from a client, then the commit is propagated
> > to all nodes in the cluster.
> >
> > HTH,
> > Erick
> >
> >
> > On Fri, Nov 22, 2013 at 7:23 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
> >
> > > On Fri, Nov 22, 2013 at 2:21 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> > >
> > > > 1> I'm not quite sure I understand. External File Fields are keyed
> > > > by the unique id of the doc. So every shard _must_ have the
> > > > EFF available for at least the documents in that shard. At first
> > > > glance this doesn't look simple. Perhaps a bit more explanation of
> > > > what you're using EFF for?
> > > >
> > > Thanks Erick for the reply. I use EFF for boosting results by
> > > popularity. So I was right, I should put the popularity file in every
> > > shard's data dir, right? But why not keep that file in just one place
> > > (obviously the file should be reachable by all SolrCloud nodes) and
> > > allow external fields to live outside the data dir?
> > >
> > > >
> > > > 2> Let's be sure we're talking about the same thing here. In Solr,
> > > > a "commit" is the command that makes documents visible, often
> > > > controlled by the autoCommit and autoSoftCommit settings in
> > > > solrconfig.xml. You will not be able to issue 100 commits/second.
> > > >
> > > > If you're using "commit" to mean adding a document to the index,
> > > > then 100/s should be no problem. I regularly see many times that
> > > > ingestion rate. The documents won't be visible to search until
> > > > you do a commit however.
> > > >
> > > Yeah, now it is clearer. One more question: soft commits are not a
> > > problem for my client, but are the modifications also sent to replicas
> > > with this kind of commit?
> > >
> > > >
> > > > Best
> > > > Erick
> > > >
> > > >
> > > > On Fri, Nov 22, 2013 at 4:44 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
> > > >
> > > > > Hi to all,
> > > > > we're migrating from Solr 3.x to Solr 4.x to use SolrCloud and I
> > > > > have two big doubts:
> > > > >
> > > > > 1) External fields. When I compute such a file, do I have to copy it
> > > > > into the data directory of every shard? The external field boosts the
> > > > > results of queries to a specific collection, so for me it doesn't
> > > > > make sense to put it in every shard's data dir; it should be
> > > > > something related to the collection itself.
> > > > > Am I wrong or missing something? Is there a simple way to upload the
> > > > > popularity file (for the external field) at once to all shards?
> > > > >
> > > > > 2) My index requires frequent commits (i.e. sometimes up to 100/s).
> > > > > How should I manage this? Do I have to use soft commits? Is there a
> > > > > simple configuration/code snippet for using them? Is it true that
> > > > > external fields affect performance on commit?
> > > > >
> > > > > Best,
> > > > > Flavio
> > > > >
> > > >
> > >
> >
>
