Thanks again for such a detailed description.
In our use case we're going to store the shard data on HDFS, so that all nodes
have access to a shared location; it would be great to put such a file in one
place in that case :)
Do you think that using HDFS as storage is bad for performance?
Last question: if I soft commit and then have to shut down my Tomcat, will the
data be committed to disk, or do I have to manually force a commit before
shutting down?

Best,
Flavio

On Sat, Nov 23, 2013 at 2:01 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> About <1>: well, at a high level you're right, of course.
> Having the EFF stuff in a single place seems more elegant. But
> then ugly details crop up. I.e. "one place" implies that you'd have
> to fetch them over the network, potentially a very expensive
> operation every time there was a commit. Is this really a good
> tradeoff? With high network latency, this could be a performance
> killer. But I suspect that the real reason is that nobody has found
> a compelling use-case for this kind of thing. Until and unless
> someone does, and is willing to make a patch, it'll be theory :).
>
> bq:  modifications also sent to replicas
> with this kind of commits
>
> brief review:
>
> Update process:
> 1> Update goes to a node.
> 2> node forwards to all leaders
> 3> leaders forward to replicas
> 4> replicas respond to their leader.
> 5> leader responds to originating node.
> 6> originating node responds to caller.
>
> At this point all the replicas for your entire cluster have the
> update. This is entirely independent of commits. Whenever a
> commit is issued the documents currently pending on a node
> are committed and made visible to a searcher.
>
> If one is relying on solrconfig settings, then the commit happens
> a little bit out of sync. Let's say that the commit (hard with
> openSearcher=true or soft) is set to 60 seconds. Each node may
> have a different commit time, depending upon when it was started.
> So there may be a slight difference in when documents are visible.
> You'll probably never notice.
>
> If you issue commits from a client, then the commit is propagated
> to all nodes in the cluster.
>
> HTH,
> Erick
>
>
> On Fri, Nov 22, 2013 at 7:23 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>
> > On Fri, Nov 22, 2013 at 2:21 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > > 1> I'm not quite sure I understand. External File Fields are keyed
> > > by the unique id of the doc. So every shard _must_ have the
> > > EFF available for at least the documents in that shard. At first glance
> > > this doesn't look simple. Perhaps a bit more explanation of what
> > > you're using EFF for?
> > >
> > Thanks Erick for the reply, I use EFF for boosting results by popularity.
> > So I was right, I should put popularity in every shard data dir... right? But
> > why not keep that file in just one place (obviously the file should be
> > reachable by all SolrCloud nodes...) and allow external fields to be
> > outside the data dir?
> >
> > >
> > > 2> Let's be sure we're talking about the same thing here. In Solr,
> > > a "commit" is the command that makes documents visible, often
> > > controlled by the autoCommit and autoSoftCommit settings in
> > > solrconfig.xml. You will not be able to issue 100 commits/second.
> > >
> > > If you're using "commit" to mean adding a document to the index,
> > > then 100/s should be no problem. I regularly see many times that
> > > ingestion rate. The documents won't be visible to search until
> > > you do a commit however.
> > >
> > Yeah, now it is clearer. Still one question: soft committing is not a
> > problem for my client, but are the modifications also sent to replicas
> > with this kind of commit?
> >
> > >
> > > Best
> > > Erick
> > >
> > >
> > > On Fri, Nov 22, 2013 at 4:44 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
> > >
> > > > Hi all,
> > > > we're migrating from Solr 3.x to Solr 4.x to use SolrCloud and I have two
> > > > big doubts:
> > > >
> > > > 1) External fields. When I compute such a file, do I have to copy it into
> > > > the data directory of every shard? The external field boosts the results
> > > > of queries to a specific collection, so for me it doesn't make sense to
> > > > put it in every shard's data dir; it should be something related to the
> > > > collection itself.
> > > > Am I wrong or missing something? Is there a simple way to upload the
> > > > popularity file (for the external field) at once to all shards?
> > > >
> > > > 2) My index requires frequent commits (i.e. sometimes up to 100/s). How
> > > > do I have to manage this? Do I have to use soft commits? Any simple
> > > > configuration/code snippet to use them? Is it true that external fields
> > > > affect performance on commit?
> > > >
> > > > Best,
> > > > Flavio
> > > >
> > >
> >
>
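
A note on the update process Erick reviews in the quoted message: the point that every replica has the update before any commit happens can be sketched with a toy model. Everything here is hypothetical (the Replica and Shard classes, the route function and its byte-sum hash are for illustration only, not Solr code and not Solr's real routing); it only mirrors steps 1> to 6> and the difference between pending and visible documents.

```python
class Replica:
    """Toy node: holds documents it has received and documents it can search."""

    def __init__(self, name):
        self.name = name
        self.pending = []  # received, not yet committed
        self.visible = []  # committed, visible to a searcher

    def receive(self, doc_id):
        self.pending.append(doc_id)

    def commit(self):
        # A commit makes the currently pending documents visible.
        self.visible.extend(self.pending)
        self.pending.clear()

    def search(self):
        return list(self.visible)


class Shard:
    """Toy shard: one leader plus a number of replicas."""

    def __init__(self, name, replica_count):
        self.leader = Replica(name + "-leader")
        self.replicas = [Replica(f"{name}-r{i}") for i in range(replica_count)]

    def nodes(self):
        return [self.leader] + self.replicas

    def update(self, doc_id):
        # The leader takes the update and forwards it to its replicas.
        self.leader.receive(doc_id)
        for replica in self.replicas:
            replica.receive(doc_id)


def route(shards, doc_id):
    # Stand-in for real routing: a deterministic byte sum, an assumption
    # made for this sketch only.
    return shards[sum(doc_id.encode("utf-8")) % len(shards)]
```

In this model, route(shards, "doc42").update("doc42") leaves the document pending on the leader and on every replica of the owning shard, and search() stays empty on each node until that node's own commit() runs.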

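The remark about auto-commit timers being slightly out of sync can be worked through with a little arithmetic. This sketch assumes, as a simplification of the quoted description, that each node's timer fires every 60 seconds counted from the moment the node started; first_visible_at is a hypothetical helper, and Solr's real scheduling may differ.

```python
import math


def first_visible_at(doc_arrival, node_start, interval=60.0):
    """Return the time of the first timer tick at or after doc_arrival.

    Ticks are assumed to happen at node_start + k * interval for k = 1, 2, ...
    All times are in seconds on a shared clock.
    """
    ticks_needed = max(1, math.ceil((doc_arrival - node_start) / interval))
    return node_start + ticks_needed * interval


# Two nodes started 25 seconds apart receive the same document at t = 100.
node_a = first_visible_at(100, node_start=0)   # 120.0
node_b = first_visible_at(100, node_start=25)  # 145.0
skew = node_b - node_a                         # 25.0
```

Under these assumptions the same document becomes searchable 25 seconds later on the second node, which is the kind of small difference Erick describes.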