Long blog post on commits and the state of updates here: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
HDFS is perfectly fine with Solr, there's even an HdfsDirectoryFactory
for your index. It has its own performance characteristics/tuning
parameters, so there'll be something of a learning curve.

Best
Erick


On Sat, Nov 23, 2013 at 4:14 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

> Thanks again for such a detailed description.
> In our use case we're going to save shard data on HDFS so they all have
> access to a shared location, it would be great to put such a file in one
> place in that case :)
> Do you think that using HDFS as storage is bad for performance?
> Last question: if I softCommit and I have to shut down my Tomcat, will data
> be committed to disk or do I have to manually force a commit before shutting
> down?
>
> Best,
> Flavio
>
> On Sat, Nov 23, 2013 at 2:01 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> > About <1>. Well, at a high level you're right, of course.
> > Having the EFF stuff in a single place seems more elegant. But
> > then ugly details crop up. I.e. "one place" implies that you'd have
> > to fetch them over the network, potentially a very expensive
> > operation every time there was a commit. Is this really a good
> > tradeoff? With high network latency, this could be a performance
> > killer. But I suspect that the real reason is that nobody has found
> > a compelling use-case for this kind of thing. Until and unless
> > someone does, and is willing to make a patch, it'll be theory :).
> >
> > bq: modifications also sent to replicas
> > with this kind of commit
> >
> > Brief review:
> >
> > Update process:
> > 1> Update goes to a node.
> > 2> Node forwards to all leaders.
> > 3> Leader forwards to replicas.
> > 4> Replicas respond to their leader.
> > 5> Leader responds to originating node.
> > 6> Originating node responds to caller.
> >
> > At this point all the replicas for your entire cluster have the
> > update. This is entirely independent of commits.
> > Whenever a commit is issued the documents currently pending on a node
> > are committed and made visible to a searcher.
> >
> > If one is relying on solrconfig settings, then the commit happens
> > a little bit out of sync across nodes. Let's say that the commit (hard
> > with openSearcher=true or soft) is set to 60 seconds. Each node may
> > have a different commit time, depending upon when it was started.
> > So there may be a slight difference in when documents are visible.
> > You'll probably never notice.
> >
> > If you issue commits from a client, then the commit is propagated
> > to all nodes in the cluster.
> >
> > HTH,
> > Erick
> >
> >
> > On Fri, Nov 22, 2013 at 7:23 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
> >
> > > On Fri, Nov 22, 2013 at 2:21 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> > >
> > > > 1> I'm not quite sure I understand. External File Fields are keyed
> > > > by the unique id of the doc. So every shard _must_ have the
> > > > EFF available for at least the documents in that shard. At first glance
> > > > this doesn't look simple. Perhaps a bit more explanation of what
> > > > you're using EFF for?
> > > >
> > > Thanks Erick for the reply, I use EFF for boosting results by popularity.
> > > So I was right, I should put popularity in every shard's data dir, right? But
> > > why not keep that file in just one place (obviously the file should be
> > > reachable by all SolrCloud nodes...) and allow external fields to be
> > > outside the data dir?
> > >
> > > >
> > > > 2> Let's be sure we're talking about the same thing here. In Solr,
> > > > a "commit" is the command that makes documents visible, often
> > > > controlled by the autoCommit and autoSoftCommit settings in
> > > > solrconfig.xml. You will not be able to issue 100 commits/second.
> > > >
> > > > If you're using "commit" to mean adding a document to the index,
> > > > then 100/s should be no problem.
> > > > I regularly see many times that
> > > > ingestion rate. The documents won't be visible to search until
> > > > you do a commit, however.
> > > >
> > > Yeah, now it is clearer. One more question: soft committing is not a
> > > problem for my client, but are the modifications also sent to replicas
> > > with this kind of commit?
> > >
> > > >
> > > > Best
> > > > Erick
> > > >
> > > >
> > > > On Fri, Nov 22, 2013 at 4:44 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
> > > >
> > > > > Hi to all,
> > > > > we're migrating from Solr 3.x to Solr 4.x to use SolrCloud and I have two
> > > > > big doubts:
> > > > >
> > > > > 1) External fields. When I compute such a file, do I have to copy it into
> > > > > the data directory of every shard? The external field boosts the results of
> > > > > queries to a specific collection, so for me it doesn't make sense to put it
> > > > > in every shard's data dir; it should be something related to the collection
> > > > > itself.
> > > > > Am I wrong or missing something? Is there a simple way to upload the
> > > > > popularity file (for the external field) at once to all shards?
> > > > >
> > > > > 2) My index requires frequent commits (i.e. sometimes up to 100/s). How
> > > > > do I have to manage this? Do I have to use soft commits? Any simple
> > > > > configuration/code snippet to use them? Is it true that external fields
> > > > > affect performance on commit?
> > > > >
> > > > > Best,
> > > > > Flavio
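
On the request in the original message for a simple snippet for soft commits, here is a minimal sketch, not taken from the thread. It only builds the URL for Solr's update handler with the softCommit or commitWithin request parameter. The host, port, and collection name are hypothetical placeholders, and the parameter names are given as I understand them for Solr 4.x, so check them against your version. Since the thread notes that 100 explicit commits/second will not work, letting Solr batch visibility (commitWithin here, or autoSoftCommit in solrconfig.xml) is probably the safer choice than committing on every request.

```python
from urllib.parse import urlencode


def update_url(base_url, collection, soft_commit=False, commit_within_ms=None):
    """Build a URL for Solr's update handler.

    base_url and collection are hypothetical placeholders; softCommit and
    commitWithin are assumed to be the update handler's request parameters.
    """
    params = {}
    if soft_commit:
        # Ask for a soft commit once this update request has been processed.
        params["softCommit"] = "true"
    if commit_within_ms is not None:
        # Ask Solr to commit the documents within this many milliseconds.
        params["commitWithin"] = str(commit_within_ms)
    url = f"{base_url.rstrip('/')}/{collection}/update"
    return f"{url}?{urlencode(params)}" if params else url


if __name__ == "__main__":
    # Hypothetical local node and collection name, for illustration only.
    print(update_url("http://localhost:8983/solr", "collection1", soft_commit=True))
    print(update_url("http://localhost:8983/solr", "collection1", commit_within_ms=1000))
```

The resulting URL would be the target of an HTTP POST carrying the documents to add. No request is sent by the sketch itself.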