Re: What do you use for solr's logging analysis?

2013-08-11 Thread Shreejay Nair
There are a lot of tools out there with varying degrees of functionality
(and ease of setup). We also have multiple Solr servers in production (both
cloud and single nodes), and we have decided to use http://loggly.com/. We
will probably be setting it up for all our servers in the next few weeks.

There are plenty of other such log analysis tools. It all depends on your
particular use case.

--Shreejay



On Sunday, August 11, 2013, adfel70 wrote:

> Hi
> I'm looking for a tool that could help me perform Solr logging analysis.
> I use SolrCloud on multiple servers, so the tool should be able to collect
> logs from multiple servers.
>
> Is there any tool you use and can advise on?
>
> Thanks
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/What-do-you-use-for-solr-s-logging-analysis-tp4083809.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


-- 
Shreejay Nair
Sent from my mobile device. Please excuse brevity and typos.


Re: commit vs soft-commit

2013-08-11 Thread Shreejay Nair
Yes, a new searcher is opened with every soft commit. It's still considered
faster because it does not write to disk, which is a slow I/O operation that
can take a lot more time.
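
For reference, here's a minimal sketch of how the two commit types are
typically configured inside the <updateHandler> section of solrconfig.xml
(the interval values below are illustrative, not recommendations):

  <!-- hard commit: flushes and fsyncs index files to disk;
       openSearcher=false keeps it cheap by not opening a searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- soft commit: opens a new searcher for visibility, but no fsync -->
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>

The cost that does remain with a soft commit is opening and warming the new
searcher, so very frequent soft commits can still hurt if your caches use
large autowarm counts.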

On Sunday, August 11, 2013, tamanjit.bin...@yahoo.co.in wrote:

> Hi,
> Some confusion in my head.
> http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22
> says that:
> "A soft commit is much faster since it only makes index changes visible and
> does not fsync index files or write a new index descriptor."
>
> So this means that even with every soft commit a new searcher opens, right?
> If it does, isn't it still very heavy?
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/commit-vs-soft-commit-tp4083817.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


-- 
Shreejay Nair
Sent from my mobile device. Please excuse brevity and typos.


Re: SolrCloud setup - any advice?

2013-09-19 Thread Shreejay Nair
Hi Neil,

Although you haven't mentioned it, just wanted to confirm - do you have
soft commits enabled?

Also, what version of Solr are you using for the SolrCloud setup? 4.0.0 had
lots of memory and ZooKeeper-related issues. What's the warmup time for your
caches? Have you tried disabling the caches?

Is this a static index, or are documents added continuously?

The answers to these questions might help us pinpoint the issue...
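
If you want to rule the caches out, here's a rough sketch of what that might
look like in solrconfig.xml (the values are illustrative; setting the sizes
and autowarm counts to 0, or commenting the sections out entirely, should
effectively disable the caches and remove their warmup work):

  <!-- illustrative: shrink or disable caches to rule them out -->
  <filterCache class="solr.FastLRUCache"
               size="0" initialSize="0" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache"
                    size="0" initialSize="0" autowarmCount="0"/>
  <documentCache class="solr.LRUCache"
                 size="0" initialSize="0"/>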

On Thursday, September 19, 2013, Neil Prosser wrote:

> Apologies for the giant email. Hopefully it makes sense.
>
> We've been trying out SolrCloud to solve some scalability issues with our
> current setup and have run into problems. I'd like to describe our current
> setup, our queries and the sort of load we see and am hoping someone might
> be able to spot the massive flaw in the way I've been trying to set things
> up.
>
> We currently run Solr 4.0.0 with old-style master/slave replication. We
> have five slaves, each running Centos with 96GB of RAM, 24 cores and with
> 48GB assigned to the JVM heap. Disks aren't crazy fast (i.e. not SSDs) but
> aren't slow either. Our GC parameters aren't particularly exciting, just
> -XX:+UseConcMarkSweepGC. Java version is 1.7.0_11.
>
> Our index size ranges between 144GB and 200GB (when we optimise it back
> down, since we've had bad experiences with large cores). We've got just
> over 37M documents; some are smallish but most range between 1000 and 6000
> bytes. We regularly update documents, so large portions of the index will be
> touched, leading to a maxDocs value of around 43M.
>
> Query load ranges between 400req/s to 800req/s across the five slaves
> throughout the day, increasing and decreasing gradually over a period of
> hours, rather than bursting.
>
> Most of our documents have upwards of twenty fields. We use different
> fields to store territory-variant values (we have around 30 territories)
> and also boost based on the values in some of these fields (the integer ones).
>
> So an average query can do a range filter by two of the territory-variant
> fields, filter by a non-territory-variant field. Facet by a field or two
> (which may be territory-variant). Bring back the values of 60 fields. Boost
> query on field values of a non-territory-variant field. Boost by values of
> two territory-variant fields. Dismax query on up to 20 fields (with boosts) and
> phrase boost on those fields too. They're pretty big queries. We don't do
> any index-time boosting. We try to keep things dynamic so we can alter our
> boosts on-the-fly.
>
> Another common query is to list documents with a given set of IDs and
> select documents with a common reference and order them by one of their
> fields.
>
> Auto-commit every 30 minutes. Replication polls every 30 minutes.
>
> Document cache:
>   * initialSize - 32768
>   * size - 32768
>
> Filter cache:
>   * autowarmCount - 128
>   * initialSize - 8192
>   * size - 8192
>
> Query result cache:
>   * autowarmCount - 128
>   * initialSize - 8192
>   * size - 8192
>
> After a replicated core has finished downloading (probably while it's
> warming), we see requests which usually take around 100ms taking over 5s. GC
> logs show concurrent mode failure.
>
> I was wondering whether anyone can help with sizing the boxes required to
> split this index down into shards for use with SolrCloud and roughly how
> much memory we should be assigning to the JVM. Everything I've read
> suggests that running with a 48GB heap is way too high but every attempt
> I've made to reduce the cache sizes seems to wind up causing out-of-memory
> problems. Even dropping all cache sizes by 50% and reducing the heap by 50%
> caused problems.
>
> I've already tried SolrCloud with 10 shards (around 3.7M documents per
> shard, each with one replica) and kept the cache sizes low:
>
> Document cache:
>   * initialSize - 1024
>   * size - 1024
>
> Filter cache:
>   * autowarmCount - 128
>   * initialSize - 512
>   * size - 512
>
> Query result cache:
>   * autowarmCount - 32
>   * initialSize - 128
>   * size - 128
>
> Even when running on six machines in AWS with SSDs, 24GB heap (out of 60GB
> memory), and four shards on two boxes and three on the rest, I still see
> concurrent mode failure. This looks like it's causing ZooKeeper to mark the
> node as down and things begin to struggle.
>
> Is concurrent mode failure just something that will inevitably happen, or is
> it avoidable by lowering CMSInitiatingOccupancyFraction?
>
> If anyone has anything that might shove me in the right direction I'd be
> very grateful. I'm wondering whether our set-up will just never work and
> maybe we're expecting too much.
>
> Many thanks,
>
> Neil
>


Request to be added to ContributorsGroup

2013-05-13 Thread Shreejay Nair
Hello Wiki Admins,

Could you please add me to the ContributorsGroup?

I have been using Solr for a few years now and I would like to contribute
back by adding more information to the wiki Pages.

Wiki User Name : Shreejay

--Shreejay


Re: replication without automated polling, just manual trigger?

2013-05-15 Thread Shreejay Nair
You can disable polling so that the slave never polls the master (in Solr
4.3 you can disable it from the Admin interface). You can then trigger a
replication using the HTTP API (
http://wiki.apache.org/solr/SolrReplication#HTTP_API ) or, again, use the
Admin interface to trigger a manual replication.
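
For reference, a rough sketch of what the slave's replication handler in
solrconfig.xml might look like with no pollInterval set, so that it only
replicates on demand (host, port, and core name below are placeholders):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master_host:8983/solr/core1/replication</str>
      <!-- no pollInterval: the slave should never poll on its own -->
    </lst>
  </requestHandler>

A manual replication can then be triggered with
http://slave_host:8983/solr/core1/replication?command=fetchindex and polling
can be paused or resumed at runtime with command=disablepoll and
command=enablepoll.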



On Wed, May 15, 2013 at 12:47 PM, Jonathan Rochkind wrote:

> I want to set up Solr replication between a master and slave, where no
> automatic polling every X minutes happens, instead the slave only
> replicates on command. [1]
>
> So the basic question is: What's the best way to do that? But I'll provide
> what I've been doing etc., for anyone interested.
>
> Until recently, my application was running on Solr 1.4.  I had a setup that
> was working to accomplish this in Solr 1.4, but as I work on moving it to
> Solr 4.3, it's unclear to me if it can/will work the same way.
>
> In Solr 1.4, on the slave, I supplied a masterUrl but did NOT supply any
> pollInterval at all. I did NOT set "enable" to "false" on the slave,
> because I think that would have prevented even manual replication.
>
> This seemed to result in the slave never polling, although I'm not sure if
> that was just an accident of Solr implementation or not.  Can anyone say if
> the same thing would happen in Solr 4.3?  If I look at the admin screen for
> my slave set up this way in Solr 4.3, it does say "polling enabled", but I
> realize that doesn't necessarily mean any polling will take place, since
> I've set no pollInterval.
>
> In Solr 1.4 under this setup, I could go to the slave's admin/replication,
> and there was a "replicate now" button that I could use for manually
> triggered replication.  This button seems to no longer be there in 4.3
> replication admin screen, although I suppose I could still, somewhat less
> conveniently, issue a `replication?command=fetchindex` to the slave, to
> manually trigger a replication?
>
> Thanks for any advice or ideas.
>
> [1]: Why, you ask?  The master is actually my 'indexing' server. Due to
> business needs, indexing only happens in bulk/mass indexing, and only
> happens periodically -- sometimes nightly, sometimes less. So I index on
> master, at a periodic schedule, and then when indexing is complete and
> verified, tell slave to replicate.  I don't want slave accidentally
> replicating in the middle of the bulk indexing process either, when the
> index might be in an unfinished state.
>