Thanks again for such a detailed description. In our use case we're going to store the shard data on HDFS so that all nodes have access to a shared location; it would be great to keep such a file in one place in that case :) Do you think that using HDFS as storage is bad for performance? Last question: if I soft commit and then have to shut down my Tomcat, will the data be committed to disk, or do I have to manually force a commit before shutting down?
Best,
Flavio

On Sat, Nov 23, 2013 at 2:01 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> about <1>. Well, at a high level you're right, of course.
> Having the EFF stuff in a single place seems more elegant. But
> then ugly details crop up. I.e. "one place" implies that you'd have
> to fetch them over the network, potentially a very expensive
> operation every time there was a commit. Is this really a good
> tradeoff? With high network latency, this could be a performance
> killer. But I suspect that the real reason is that nobody has found
> a compelling use-case for this kind of thing. Until and unless
> someone does, and is willing to make a patch, it'll be theory :).
>
> bq: modifications also sent to replicas with this kind of commits
>
> brief review:
>
> Update process:
> 1> Update goes to a node.
> 2> node forwards to all leaders
> 3> leader forward to replicas
> 4> replicas respond to their leader.
> 5> leader responds to originating node.
> 6> originating node responds to caller.
>
> At this point all the replicas for your entire cluster have the
> update. This is entirely independent of commits. Whenever a
> commit is issued the documents currently pending on a node
> are committed and made visible to a searcher.
>
> If one is relying on solrconfig settings, then the commit happens
> a little bit out of synch. Let's say that the commit (hard with
> opensearcher=true or soft) is set to 60 seconds. Each node may
> have a different commit time, depending upon when it was started.
> So there may be a slight difference in when documents are visible.
> You'll probably never notice.
>
> If you issue commits from a client, then the commit is propagated
> to all nodes in the cluster.
>
> HTH,
> Erick
>
>
> On Fri, Nov 22, 2013 at 7:23 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>
> > On Fri, Nov 22, 2013 at 2:21 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > > 1> I'm not quite sure I understand. External File Fields are keyed
> > > by the unique id of the doc. So every shard _must_ have the
> > > eff available for at least the documents in that shard. At first glance
> > > this doesn't look simple. Perhaps a bit more explanation of what
> > > you're using EFF for?
> > >
> > Thanks Erick for the reply, I use EFF for boosting results by popularity.
> > So I was right, I should put popularity in every shard data dir..right? But
> > why not keeping that file in just one place (obviously the file should be
> > reachable by all solrcloud nodes...) and allow external fields to be
> > outside data dir?
> >
> > >
> > > 2> Let's be sure we're talking about the same thing here. In Solr,
> > > a "commit" is the command that makes documents visible, often
> > > controlled by the autoCommit and autoSoftCommit settings in
> > > solrconfig.xml. You will not be able to issue 100 commits/second.
> > >
> > > If you're using "commit" to mean adding a document to the index,
> > > then 100/s should be no problem. I regularly see many times that
> > > ingestion rate. The documents won't be visible to search until
> > > you do a commit however.
> > >
> > Yeah, now it is more clear. Still a question: for my client is not a
> > problem to soft commit but, are the modifications also sent to replicas
> > with this kind of commits?
> >
> > >
> > > Best
> > > Erick
> > >
> > >
> > > On Fri, Nov 22, 2013 at 4:44 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
> > >
> > > > Hi to all,
> > > > we're migrating from solr 3.x to solr 4.x to use Solrcloud and I have two
> > > > big doubts:
> > > >
> > > > 1) External fields. When I compute such a file do I have to copy it in the
> > > > data directory of shards..? The external fields boosts the results of the
> > > > query to a specific collection, for me it doesn't make sense to put it in
> > > > all shard's data dir, it should be something related to the collection
> > > > itself.
> > > > Am I wrong or missing something? Is there a simple way to upload the
> > > > popularity file (for the external field) at one in all shards?
> > > >
> > > > 2) My index requires frequently commits (i.e. sometimes up to 100/s). How
> > > > do I have to manage this? Do I have to use soft commits..? Any simple
> > > > configuration/code snippet to use them? Is it true that external fields
> > > > affect performance on commit?
> > > >
> > > > Best,
> > > > Flavio
> > > >
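
For reference, on the "commits from a client" point discussed in this thread: a minimal sketch of how a client might build the request URLs for an explicit soft or hard commit against Solr's update handler. The `commit=true` and `softCommit=true` request parameters are standard Solr update parameters; the base URL, port, and collection name below are assumptions for illustration, and `commit_url` is a hypothetical helper, not part of any Solr client library. It only builds the URL; sending it (for instance as a last step before shutting the container down) is left to whatever HTTP client is in use.

```python
from urllib.parse import urlencode


def commit_url(base_url, collection, soft=False):
    """Build the update-handler URL for an explicit commit.

    base_url and collection are placeholders (assumptions); adjust them
    to the actual deployment. soft=True asks for a soft commit,
    otherwise a hard commit is requested.
    """
    # softCommit=true -> soft commit; commit=true -> hard commit
    params = {"softCommit": "true"} if soft else {"commit": "true"}
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed Tomcat-hosted Solr on port 8080 with a collection named "collection1".
    base = "http://localhost:8080/solr"
    print(commit_url(base, "collection1", soft=True))  # soft commit request
    print(commit_url(base, "collection1"))             # hard commit request
```

Issuing the hard-commit URL once before a planned shutdown is one conservative way to avoid depending on what happens to soft-committed documents; whether it is strictly required is the open question above.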