Help me understand these newrelic graphs
Here are some screenshots of our SolrCloud cluster via New Relic: http://postimg.org/gallery/2hyzyeyc/ We currently have a 5-node cluster, and all indexing is done on separate machines and shipped over. Our machines are running on SSDs with 18G of RAM (index size is 8G). We only have 1 shard at the moment, with replicas on all 5 machines. I'm guessing that's a bit of a waste? How come when we do our bulk updating the response time actually decreases? I would think the load would be higher, therefore response time should be higher. Any way I can decrease the response time? Thanks
Re: Help me understand these newrelic graphs
Ahh.. it's including the add operation. That makes sense then. A bit silly on NR's part that they don't break it down. Otis, our index is only 8G so I don't consider that big by any means, but our queries can get a bit complex with a bit of faceting. Do you still think it makes sense to shard? How easy would this be to get working?

On Thu, Mar 13, 2014 at 4:02 PM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote:
> Hi,
>
> I think NR has support for breaking by handler, no? Just checked - no. Only webapp controller, but that doesn't apply to Solr.
>
> SPM should be more helpful when it comes to monitoring Solr - you can filter by host, handler, collection/core, etc. -- you can see the demo - https://apps.sematext.com/demo - though this is plain Solr, not SolrCloud.
>
> If your index is big or queries are complex, shard it and parallelize search.
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
> On Thu, Mar 13, 2014 at 6:17 PM, ralph tice wrote:
> > I think your response time is including the average response for an add operation, which generally returns very quickly and due to sheer number are averaging out the response time of your queries. New Relic should break out requests based on which handler they're hitting, but they don't seem to.
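The averaging effect ralph describes can be sketched numerically. A minimal illustration (the request counts and latencies below are hypothetical, not taken from the New Relic graphs):

```python
def mean_response_ms(requests):
    """requests: list of (count, avg_ms) pairs, one per request type.
    Returns the blended mean New Relic would report across all of them."""
    total = sum(count for count, _ in requests)
    return sum(count * ms for count, ms in requests) / total

# Normal traffic: queries only, ~100 ms each.
normal = mean_response_ms([(1_000, 100.0)])

# During a bulk update: the same queries plus many cheap adds (~2 ms each).
# The adds dominate the request count, so the overall mean drops even
# though query latency itself did not improve.
bulk = mean_response_ms([(1_000, 100.0), (9_000, 2.0)])

print(normal)  # 100.0
print(bulk)    # 11.8
```

This is why the graph shows response time *decreasing* during bulk indexing: the per-handler query latency is unchanged, but the all-requests average is pulled down by the flood of fast add operations.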
Re: Help me understand these newrelic graphs
If that is the case, what would help?

On Thu, Mar 13, 2014 at 8:46 PM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote:
> It really depends, hard to give a definitive instruction without more pieces of info. E.g. if your CPUs are all maxed out and you already have a high number of concurrent queries, then sharding may not be of any help at all.
>
> Otis
Re: Help me understand these newrelic graphs
Here is a screenshot of the host information: http://postimg.org/image/vub5ihxix/ As you can see we have 24-core CPUs and the load is only at 5-7.5.
Re: Help me understand these newrelic graphs
Otis, I want to get those spikes down lower if possible. As mentioned in the above posts, the 25ms timing you are seeing is not really accurate, because that's the average response time for ALL requests, including the bulk add operations, which are generally super fast. Our true response time is around 100ms.

On Fri, Mar 14, 2014 at 10:54 AM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote:
> Are you trying to bring that 24.9 ms response time down?
> Looks like there is room for more aggressive sharding there, yes.
>
> Otis
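For background on what sharding buys you here: SolrCloud's default compositeId router splits the 32-bit hash ring evenly and assigns each document to a shard by hashing its uniqueKey, so queries fan out and run in parallel across shards. A toy stand-in for the idea (CRC32 instead of Solr's actual MurmurHash3, purely illustrative):

```python
import zlib

def shard_for(doc_id, num_shards):
    """Toy hash router: map a document id onto one of num_shards
    evenly sized hash ranges (the idea behind compositeId routing)."""
    h = zlib.crc32(doc_id.encode()) & 0xFFFFFFFF
    return h * num_shards // (1 << 32)

# With a decent hash, documents spread roughly evenly, so each shard
# holds ~1/N of the index and each query only scans its own slice.
counts = [0, 0]
for i in range(10_000):
    counts[shard_for("doc-%d" % i, 2)] += 1
print(counts)  # roughly even split across the 2 shards
```

The win is that complex queries (heavy faceting included) are executed per shard and merged, at the cost of an extra distributed-merge step; whether that nets out positive depends on where the CPUs stand, as Otis notes above.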
Solr Cloud collection keep going down?
We have 2 collections with 1 shard each, replicated over 5 servers in the cluster. We see a lot of flapping (down or recovering) on one of the collections. When this happens, the other collection hosted on the same machine is still marked as active. When this happens it takes a fairly long time (~30 minutes) for the collection to come back online, if at all. I find that it's usually more reliable to completely shut down Solr on the affected machine and bring it back up with its core disabled. We then re-enable the core when it's marked as active.

A few questions:

1) What is the healthcheck in SolrCloud? Put another way, what is failing that marks one collection as down but the other on the same machine as up?

2) Why does recovery take forever when a node goes down, even if it's only down for 30 seconds? Our index is only 7-8G and we are running on SSDs.

3) What can be done to diagnose and fix this problem?
Re: Solr Cloud collection keep going down?
iter.write(OutputStreamWriter.java:207)
at org.apache.solr.util.FastWriter.flush(FastWriter.java:141)
at org.apache.solr.util.FastWriter.write(FastWriter.java:55)
at org.apache.solr.response.RubyWriter.writeStr(RubyResponseWriter.java:87)
at org.apache.solr.response.JSONWriter.writeNamedListAsFlat(JSONResponseWriter.java:285)
at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:301)
at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:188)
at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:183)
at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:299)
at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:188)
at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:183)
at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:299)
at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:188)
at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:183)
at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:299)
at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:95)
at org.apache.solr.response.RubyResponseWriter.write(RubyResponseWriter.java:37)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:768)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:440)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
at org.eclipse.jetty.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:375)
at org.eclipse.jetty.io.bio.StreamEndPoint.flush(StreamEndPoint.java:164)
at org.eclipse.jetty.io.bio.StreamEndPoint.flush(StreamEndPoint.java:182)
at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:838)
... 51 more
,code=500}
Re: Solr Cloud collection keep going down?
Shawn,

Thanks for pointing me in the right direction. After consulting the above document I *think* that the problem may be too large of a heap, which may be affecting GC collection and hence causing ZK timeouts.

We have around 20G of memory on these machines, with a heap min/max of 6G and 10G respectively (-Xms6G -Xmx10G). The rest was set aside for disk cache. Why did we choose 6-10? No other reason than we wanted to allot enough for disk cache, and then everything else was thrown at Solr. Does this sound about right?

I took some screenshots from VisualVM and our New Relic reporting, as well as some relevant portions of our solrconfig.xml. Any thoughts/comments would be greatly appreciated.

http://postimg.org/gallery/4t73sdks/1fc10f9c/

Thanks

On Sat, Mar 22, 2014 at 2:26 PM, Shawn Heisey wrote:
> Unless you are actually using the ping request handler, the healthcheck config will not matter. Or were you referring to something else?
>
> Referencing the logs you included in your reply: The EofException errors happen because your client code times out and disconnects before the request it made has completed. That is most likely just a symptom that has nothing at all to do with the problem.
>
> Read the following wiki page. What I'm going to say below will reference information you can find there:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Relevant side note: The default zookeeper client timeout is 15 seconds. A typical zookeeper config defines tickTime as 2 seconds, and the timeout cannot be configured to be more than 20 times the tickTime, which means it cannot go beyond 40 seconds. The default timeout value of 15 seconds is usually more than enough, unless you are having performance problems.
>
> If you are not actually taking Solr instances down, then the fact that you are seeing the log replay messages indicates to me that something is taking so much time that the connection to Zookeeper times out. When it finally responds, it will attempt to recover the index, which means first it will replay the transaction log and then it might replicate the index from the shard leader.
>
> Replaying the transaction log is likely the reason it takes so long to recover. The wiki page I linked above has a "slow startup" section that explains how to fix this.
>
> There is some kind of underlying problem that is causing the zookeeper connection to time out. It is most likely garbage collection pauses or insufficient RAM to cache the index, possibly both.
>
> You did not indicate how much total RAM you have or how big your Java heap is. As the wiki page mentions in the SSD section, SSD is not a substitute for having enough RAM to cache a significant percentage of your index.
>
> Thanks,
> Shawn
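A quick back-of-the-envelope check of the memory budget described in this thread (the 1G OS overhead figure is a guess; the rest are the numbers posted above):

```python
# Per-node memory budget, using the figures mentioned in this thread.
total_ram_gb = 20
max_heap_gb = 10      # -Xmx10G
index_size_gb = 8     # on-disk index
os_overhead_gb = 1    # OS + other processes (assumed)

# Whatever the heap doesn't claim is available as OS page cache.
page_cache_gb = total_ram_gb - max_heap_gb - os_overhead_gb
print(page_cache_gb)                     # 9
print(page_cache_gb >= index_size_gb)    # True: the whole index fits in cache
```

By this arithmetic the index can be fully cached even at the 10G heap ceiling, which points back at GC pauses on the oversized heap (rather than cache starvation) as the likelier cause of the ZooKeeper timeouts, consistent with Shawn's diagnosis.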
Question on highlighting edgegrams
In 3.5.0 we have the following. If we searched for "c" with highlighting enabled we would get back results such as: cdat crocdile cool beans But in the latest Solr (4.7) we get the full words highlighted back. Did something change between these versions with regards to highlighting? Thanks
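For context on why a one-letter query matches at all: an EdgeNGramFilter (the field type quoted later in this thread shows maxGramSize="30"; the minGramSize here is assumed to be 1) indexes every leading prefix of each token. A rough sketch of the index-side token stream, not Lucene's actual implementation:

```python
def edge_ngrams(token, min_gram=1, max_gram=30):
    """Emit the leading prefixes of a token, as Lucene's
    EdgeNGramFilter does on the index side."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

print(edge_ngrams("cdat"))  # ['c', 'cd', 'cda', 'cdat']
```

The behavioral difference is believed to lie in the character offsets attached to each gram: 3.x gave each gram its own narrow offsets (so only the matched prefix was wrapped in highlight tags), while 4.x gives every gram the offsets of the whole original token, so the highlighter wraps the full word. That matches the symptom described here, though it's worth confirming against the Lucene changelog.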
Re: Question on highlighting edgegrams
Bump

On Mon, Mar 24, 2014 at 3:00 PM, Software Dev wrote:
> In 3.5.0 we have the following (field type XML partially eaten by the list archive):
>
> positionIncrementGap="100">
> maxGramSize="30"/>
>
> If we searched for "c" with highlighting enabled we would get back results such as:
>
> cdat
> crocdile
> cool beans
>
> But in the latest Solr (4.7) we get the full words highlighted back. Did something change between these versions with regards to highlighting?
>
> Thanks
Replication (Solr Cloud)
I see that by default in SolrCloud my collections are replicating. Should this be disabled in SolrCloud, as this is already handled by it?

From the documentation:

"The Replication screen shows you the current replication state for the named core you have specified. In Solr, replication is for the index only. SolrCloud has supplanted much of this functionality, but if you are still using index replication, you can use this screen to see the replication state:"

I just want to make sure before I disable it that if we send an update to one server, the document will be correctly replicated across all nodes. Thanks
Re: Replication (Solr Cloud)
Thanks for the reply. I'll make sure NOT to disable it.
Re: Solr Cloud collection keep going down?
Can anyone else chime in? Thanks
Re: Replication (Solr Cloud)
One other question: if I optimize a collection on one node, does this get replicated to all others when finished?
Re: Replication (Solr Cloud)
Ehh.. found out the hard way. I optimized the collection on 1 machine and when it was completed it replicated to the others and took my cluster down. Shitty
Re: Replication (Solr Cloud)
So it's generally a bad idea to optimize, I gather?

On Tue, Mar 25, 2014 at 11:16 AM, Shawn Heisey wrote:
> It doesn't get replicated -- each core in the collection will be optimized. In older versions it might have done them all at once, but I believe that newer versions only do one core at a time.
>
> Doing an optimize on a Solr core results in a LOT of I/O. If your Solr install is having performance issues, that will push it over the edge. When SolrCloud ends up with a performance problem in one place, they tend to multiply and cause MORE problems. It can get bad enough that the whole cluster goes down because it's trying to do a recovery on every node. For that reason, it's extremely important that you have enough system resources available across your cloud (RAM in particular) to avoid performance issues.
>
> Thanks,
> Shawn
Re: Replication (Solr Cloud)
"In older versions it might have done them all at once, but I believe that newer versions only do one core at a time."

It looks like it did it all at once, and I'm on the latest (4.7).
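A rough sense of the scale of I/O behind Shawn's warning: an optimize merges the whole index down to one segment, so each core reads roughly its full index size and writes it back out. Using the figures from this thread (and treating "read + write the whole index" as a crude approximation of a full merge):

```python
index_size_gb = 8   # per-core index size mentioned in this thread
replicas = 5        # one replica per node

# A full optimize rewrites the index: read every segment, write one
# merged segment -- roughly index_size in and index_size out per core.
io_per_node_gb = 2 * index_size_gb
cluster_io_gb = io_per_node_gb * replicas

print(io_per_node_gb)  # 16
print(cluster_io_gb)   # 80
```

If all five replicas optimize at once (as reported above on 4.7), that is on the order of 80G of disk traffic hitting the cluster simultaneously while it is still serving queries, which fits the "took my cluster down" outcome.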
Re: Question on highlighting edgegrams
Same problem here: http://lucene.472066.n3.nabble.com/Solr-4-x-EdgeNGramFilterFactory-and-highlighting-td4114748.html On Tue, Mar 25, 2014 at 9:39 AM, Software Dev wrote: > Bump > > On Mon, Mar 24, 2014 at 3:00 PM, Software Dev > wrote: >> In 3.5.0 we have the following. >> >> > positionIncrementGap="100"> >> >> >> >> > maxGramSize="30"/> >> >> >> >> >> >> >> >> If we searched for "c" with highlighting enabled we would get back >> results such as: >> >> cdat >> crocdile >> cool beans >> >> But in the latest Solr (4.7) we get the full words highlighted back. >> Did something change from these versions with regards to highlighting? >> >> Thanks
What contributes to disk IO?
What are the main contributing factors for Solr Cloud generating a lot of disk IO? A lot of reads? Writes? Insufficient RAM? I would think if there was enough disk cache available for the whole index there would be little to no disk IO.
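A quick back-of-envelope check of the "enough disk cache" reasoning above, using the numbers mentioned elsewhere in this thread (18G RAM, 12G JVM heap, 8G index) — the figures are illustrative, not a measurement:

```python
# Back-of-envelope check of whether the OS page cache can hold the index.
# RAM left after the JVM heap is roughly what the OS can use to cache the
# index files; if the index is bigger than that, reads must hit disk.

def cache_headroom_gb(total_ram_gb, jvm_heap_gb, index_size_gb):
    """Return page-cache room left after the heap, minus the index size.
    Negative means the index cannot be fully cached."""
    available_for_cache = total_ram_gb - jvm_heap_gb
    return available_for_cache - index_size_gb

headroom = cache_headroom_gb(total_ram_gb=18, jvm_heap_gb=12, index_size_gb=8)
print(headroom)  # -2: with a 12G heap, an 8G index cannot be fully cached
```

By this rough accounting, a large heap can starve the page cache even on a machine with far more RAM than index.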
Re: Question on highlighting edgegrams
Is this a known bug? On Tue, Mar 25, 2014 at 1:12 PM, Software Dev wrote: > Same problem here: > http://lucene.472066.n3.nabble.com/Solr-4-x-EdgeNGramFilterFactory-and-highlighting-td4114748.html
What are my options?
We have a collection named "items". These are simply products that we sell. A large part of our scoring involves boosting on certain metrics for each product (amount sold, total GMS, ratings, etc). Some of these metrics are actually split across multiple tables. We are currently re-indexing the complete document anytime any of these values changes. I'm wondering if there is a better way? Some ideas: 1) Partial update the document. Is this even possible? 2) Add a parent-child relationship on Item and its metrics? 3) Dump all metrics to a file and use that as it changes throughout the day? I forgot the actual component that does it. Either way, can it handle multiple values? 4) Something else? I appreciate any feedback. Thanks
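On option 1: Solr has supported atomic ("partial") updates since 4.0, provided all fields are stored and the update log is enabled. A minimal sketch of the JSON payload such an update would carry — the field names (`amount_sold`, `rating`) and doc id are hypothetical:

```python
import json

# Sketch of a Solr atomic update: only the metric fields are rewritten,
# using the "set" modifier. The resulting JSON would be POSTed to the
# collection's /update handler. Field names here are made up.

def atomic_update(doc_id, metrics):
    """Build an update body that overwrites only the given metric fields."""
    doc = {"id": doc_id}
    for field, value in metrics.items():
        doc[field] = {"set": value}  # "set" replaces; "inc" would increment
    return json.dumps([doc])

payload = atomic_update("sku-123", {"amount_sold": 42, "rating": 4.5})
print(payload)
```

Note that under the hood Solr still re-indexes the whole document from stored fields, so this saves client work and data shipping, not indexing cost.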
Re: Question on highlighting edgegrams
Certainly I am not the only user experiencing this? On Wed, Mar 26, 2014 at 1:11 PM, Software Dev wrote: > Is this a known bug?
Re: Question on highlighting edgegrams
Shalin, I am running 4.7 and seeing this behavior :( On Thu, Mar 27, 2014 at 10:36 PM, Shalin Shekhar Mangar wrote: > Yes, there are known bugs with EdgeNGram filters. I think they are fixed in > 4.4 > > See https://issues.apache.org/jira/browse/LUCENE-3907 > > On Fri, Mar 28, 2014 at 10:17 AM, Software Dev > wrote: >> Certainly I am not the only user experiencing this? > > -- > Regards, > Shalin Shekhar Mangar.
Highlighting bug with edgegrams
In 3.5.0 we have the following. If we searched for "c" with highlighting enabled we would get back results such as: cdat crocdile cool beans But in the latest Solr (4.7.1) we get the full words highlighted back. Did something change from these versions with regards to highlighting? Thanks Found an old post but no info: http://lucene.472066.n3.nabble.com/Solr-4-x-EdgeNGramFilterFactory-and-highlighting-td4114748.html
Re: Sharding and replicas (Solr Cloud)
Sorry about the confusion. I meant I created my config via the ZkCLI and then I wanted to create my core via the CollectionsAPI. I *think* I have it working but was wondering why there are a crazy amount of core names under the admin "Core Selector"? When I create X amount of shards via the bootstrap command I think it only creates 1 core. Am I missing something? On Thu, Nov 7, 2013 at 1:06 PM, Shawn Heisey wrote: > On 11/7/2013 1:58 PM, Mark wrote: > >> If I create my collection via the ZkCLI (https://cwiki.apache.org/ >> confluence/display/solr/Command+Line+Utilities) how do I configure the >> number of shards and replicas? >> > > I was not aware that you could create collections with zkcli. I did not > think that was possible. Use the collections API: > > http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_ > Collections_API > > Thanks, > Shawn > >
Re: Sharding and replicas (Solr Cloud)
I too want to be in control of everything that is created. Here is what I'm trying to do. 1) Start up a cluster of 5 Solr Instances 2) Import the configuration to Zookeeper 3) Manually create a collection via the collections api with number of shards and replication factor Now there are some issues with step 3. After creating the collection and reloading the GUI, I always see: - *collection1:* org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException: Could not find configName for collection collection1 found:null until I restart the cluster. Is there a way around this? Also, after creating the collection it creates a directory in $SOLR_HOME/home. So in this example it created ${SOLR_HOME}/collection1_shard1_replica1 and ${SOLR_HOME}/collection1_shard1_replica2. What happens when I rename both of these to the same name in the core admin? On Thu, Nov 7, 2013 at 3:15 PM, Shawn Heisey wrote: > If you create it with numShards=1 and replicationFactor=2, you'll end up > with a total of 2 cores across all your Solr instances. For my simple > cloud install, these are the numbers that I'm using. One shard, a total of > two copies. > > If you create it with the numbers given on the wiki page, numShards=3 and > replicationFactor=4, there would be a total of 12 cores created across all > your servers. The maxShardsPerNode parameter defaults to 1, which means > that only 1 core per instance (SolrCloud node) is allowed for that > collection. 
If there aren't enough Solr instances for the numbers you have > entered, the creation will fail. > > I don't know any details about what the bootstrap_conf parameter actually > does when it creates collections. I've never used it - I want to be in > control of the configs and collections that get created. > > Thanks, > Shawn > >
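Shawn's arithmetic above can be written out as a small check — a sketch only, mirroring the numShards × replicationFactor rule and the maxShardsPerNode cap he describes:

```python
# A collection creates numShards * replicationFactor cores in total, and
# maxShardsPerNode (default 1) caps how many may land on each node, so the
# node count must be large enough or the creation fails.

def cores_needed(num_shards, replication_factor):
    return num_shards * replication_factor

def creation_ok(num_shards, replication_factor, num_nodes, max_shards_per_node=1):
    return cores_needed(num_shards, replication_factor) <= num_nodes * max_shards_per_node

print(cores_needed(1, 2))    # 2  (Shawn's simple cloud: one shard, two copies)
print(cores_needed(3, 4))    # 12 (the wiki example)
print(creation_ok(3, 4, 5))  # False: 12 cores won't fit on 5 nodes at 1 each
```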
Solr Cloud Bulk Indexing Questions
We are testing our shiny new Solr Cloud architecture but we are experiencing some issues when doing bulk indexing. We have 5 solr cloud machines running and 3 indexing machines (separate from the cloud servers). The indexing machines pull ids off a queue, then they index and ship over a document via a CloudSolrServer. It appears that the indexers are too fast, because the load (particularly disk io) on the solr cloud machines spikes through the roof, making the entire cluster unusable. It's kind of odd because the total index size is not even large, ie < 10GB. Are there any optimizations/enhancements I could try to help alleviate these problems? I should note that for the above collection we only have 1 shard that's replicated across all machines, so all machines have the full index. Would we benefit from switching to a ConcurrentUpdateSolrServer where all updates get sent to 1 machine and 1 machine only? We could then remove this machine from the cluster that handles user requests. Thanks for any input.
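One knob the indexers themselves control is pacing: sending fixed-size batches with a pause between them instead of a firehose of single adds. A sketch of that idea — the `send` callback stands in for a real `CloudSolrServer.add(batch)` call, and the batch size and delay are made-up tuning values:

```python
import time

# Throttled bulk indexing sketch: group documents into batches and pause
# between batches so the cluster gets room to flush and merge. Tune
# batch_size and delay_s against observed disk IO on the Solr nodes.

def batched(ids, batch_size):
    for i in range(0, len(ids), batch_size):
        yield ids[i:i + batch_size]

def index_all(ids, send, batch_size=500, delay_s=0.5):
    for batch in batched(ids, batch_size):
        send(batch)          # one request per batch, not per document
        time.sleep(delay_s)  # back off between batches

sent = []
index_all(list(range(1200)), sent.append, batch_size=500, delay_s=0)
print([len(b) for b in sent])  # [500, 500, 200]
```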
Re: Solr Cloud Bulk Indexing Questions
We have a soft commit every 5 seconds and hard commit every 30. As far as docs/second, I would guess around 200/sec, which doesn't seem that high. On Mon, Jan 20, 2014 at 2:26 PM, Erick Erickson wrote: > Questions: How often do you commit your updates? What is your > indexing rate in docs/second? > > In a SolrCloud setup, you should be using a CloudSolrServer. If the > server is having trouble keeping up with updates, switching to CUSS > probably wouldn't help. > > So I suspect there's something not optimal about your setup that's > the culprit. > > Best, > Erick
Re: Solr Cloud Bulk Indexing Questions
We also noticed that disk IO shoots up to 100% on 1 of the nodes. Do all updates get sent to one machine or something? On Mon, Jan 20, 2014 at 2:42 PM, Software Dev wrote: > We have a soft commit every 5 seconds and hard commit every 30. As > far as docs/second, I would guess around 200/sec which doesn't seem that > high.
Re: Solr Cloud Bulk Indexing Questions
4.6.0 On Mon, Jan 20, 2014 at 2:47 PM, Mark Miller wrote: > What version are you running? > > - Mark
Removing a node from Solr Cloud
What is the process for completely removing a node from Solr Cloud? We recently removed one but it's still showing up as "Gone" in the Cloud admin. Thanks
Setting leaderVoteWait for auto discovered cores
How is this accomplished? We currently have an empty solr.xml (auto-discovery) so I'm not sure where to put this value?
Re: Removing a node from Solr Cloud
Thanks. Any way to accomplish this if the machine crashed (ie, can't unload it from the admin)? On Tue, Jan 21, 2014 at 11:25 AM, Anshum Gupta wrote: > You could unload the cores. This optionally also deletes the data and > instance directory. > Look at http://wiki.apache.org/solr/CoreAdmin#UNLOAD. > > -- > Anshum Gupta > http://www.anshumgupta.net
Re: Solr Cloud Bulk Indexing Questions
Any other suggestions? On Mon, Jan 20, 2014 at 2:49 PM, Software Dev wrote: > 4.6.0
Re: Solr Cloud Bulk Indexing Questions
A suggestion would be to hard commit much less often, ie every 10 minutes, and see if there is a change.
- Will try this

How much system RAM ? JVM Heap ? Enough space in RAM for system disk cache ?
- We have 18G of ram, 12 dedicated to Solr, but as of right now the total index size is only 5GB

What is the size of your documents ? A few KB, MB, ... ?
- Under 1MB

Ah, and what about network IO ? Could that be a limiting factor ?
- Again, total index size is only 5GB so I don't know if this would be a problem

On Wed, Jan 22, 2014 at 12:26 AM, Andre Bois-Crettez wrote: > 1 node having more load should be the leader (because of the extra work > of receiving and distributing updates, but my experiences show only a > bit more CPU usage, and no difference in disk IO). > > A suggestion would be to hard commit much less often, ie every 10 > minutes, and see if there is a change. > How much system RAM ? JVM Heap ? Enough space in RAM for system disk cache > ? > What is the size of your documents ? A few KB, MB, ... ? > Ah, and what about network IO ? Could that be a limiting factor ? > > André > > -- > André Bois-Crettez > Software Architect > Search Developer > http://www.kelkoo.com/
Re: Solr Cloud Bulk Indexing Questions
Thanks for the suggestions. After reading that document I feel even more confused though, because I always thought that hard commits should be less frequent than soft commits. Is there any way to configure autoCommit, softCommit values on a per request basis? The majority of the time we have a small flow of updates coming in and we would like to see them ASAP. However we occasionally need to do some bulk indexing (once a week or less) and the need to see those updates right away isn't as critical. I would say 95% of the time we are in "Index-Light Query-Light/Heavy" mode and the other 5% is "Index-Heavy Query-Light/Heavy" mode. Thanks On Wed, Jan 22, 2014 at 5:33 PM, Erick Erickson wrote: > When you're doing hard commits, is it with openSearcher = true or > false? It should probably be false... > > Here's a rundown of the soft/hard commit consequences: > > http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ > > I suspect (but, of course, can't prove) that you're over-committing > and hitting segment > merges without meaning to... > > FWIW, > Erick
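On the per-request question above: Solr does accept a `commitWithin` parameter on individual update requests, which sets a visibility deadline per request rather than relying only on global autoCommit. A sketch of how a client might pick the window — the host, collection name, and window sizes are illustrative:

```python
from urllib.parse import urlencode

# Per-request commit control via commitWithin (milliseconds): trickle
# updates get a short visibility window, weekly bulk loads a long one,
# without touching the global autoCommit settings. Endpoint is made up.

def update_url(base, bulk=False):
    window_ms = 600000 if bulk else 5000  # 10 min for bulk, 5 s otherwise
    return base + "/update?" + urlencode({"commitWithin": window_ms})

print(update_url("http://solr1:8983/solr/items"))
# http://solr1:8983/solr/items/update?commitWithin=5000
```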
Re: Solr Cloud Bulk Indexing Questions
Also, any suggestions on debugging? What should I look for and how? Thanks
Re: Solr Cloud Bulk Indexing Questions
Does maxWriteMBPerSec apply to NRTCachingDirectoryFactory? I only see maxMergeSizeMB and maxCachedMB as configuration values. On Thu, Jan 23, 2014 at 11:05 AM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote: > Hi, > > Have you tried maxWriteMBPerSec? > > http://search-lucene.com/?q=maxWriteMBPerSec&fc_project=Solr > > Otis > -- > Performance Monitoring * Log Analytics * Search Analytics > Solr & Elasticsearch Support * http://sematext.com/
SolrCloudServer questions
Can someone clarify what the following options are: - updatesToLeaders - shutdownLBHttpSolrServer - parallelUpdates Also, I remember in older versions of Solr there was a more compact, efficient format used between SolrJ and Solr. Does this still exist in the latest version of Solr? If so, is it the default? Thanks
Disabling Commit/Auto-Commit (SolrCloud)
Is there a way to disable commit/hard-commit at runtime? For example, we usually have our hard-commit and soft-commit intervals set really low, but when we do bulk indexing we would like to disable them to increase performance. If there isn't an easy way of doing this, would simply pushing a new solrconfig to SolrCloud work?
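One way this is commonly handled — assuming the solrconfig.xml uses property substitution — is to parameterize the commit intervals so a bulk load can override them with a system property and a collection reload, rather than editing the file each time (the property names here are conventional, not required):

```xml
<!-- solrconfig.xml: commit intervals read from system properties, with the
     usual low defaults. Before a bulk load, start nodes with e.g.
     -Dsolr.autoCommit.maxTime=600000 to commit far less often; a value
     <= 0 typically disables the auto-commit tracker entirely. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:30000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
  </autoSoftCommit>
</updateHandler>
```

Pushing a modified solrconfig.xml to ZooKeeper and reloading the collection also works; the property approach just avoids re-uploading the config for each bulk run.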
Re: SolrCloudServer questions
Which, if any, of these settings would be beneficial when bulk uploading? On Fri, Jan 31, 2014 at 11:05 AM, Mark Miller wrote: > > On Jan 31, 2014, at 1:56 PM, Greg Walters > wrote: > > > I'm assuming you mean CloudSolrServer here. If I'm wrong please ignore > my response. > > > >> -updatesToLeaders > > > > Only send documents to shard leaders while indexing. This saves > cross-talk between slaves and leaders, which results in more efficient > document routing. > > Right, but recently this has less of an effect because CloudSolrServer can > now hash documents and directly send them to the right place. This option > has become more historical. Just make sure you set the correct id field on > the CloudSolrServer instance for this hashing to work (I think it defaults > to "id"). > > >> shutdownLBHttpSolrServer > > > > CloudSolrServer uses an LBHttpSolrServer behind the scenes to distribute > requests (that aren't updates directly to leaders). Where did you find > this? I don't see this in the javadoc anywhere, but it is a boolean in the > CloudSolrServer class. It looks like when you create a new CloudSolrServer > and pass it your own LBHttpSolrServer, the boolean gets set to false and the > CloudSolrServer won't shut down the LBHttpSolrServer when it gets shut down. > > >> parallelUpdates > > > > The javadocs don't have any description for this one, but I checked out > the code for CloudSolrServer and if parallelUpdates is set it looks like it > executes update statements to multiple shards at the same time. > > Right, we should def add some javadoc, but this sends updates to shards in > parallel rather than with a single thread. Can really increase update > speed. Still not as powerful as using CloudSolrServer from multiple > threads, but a nice improvement nonetheless. > > - Mark > > http://about.me/markrmiller > > > I'm no dev but I can read, so please excuse any errors on my part.
> > Thanks, > > Greg > > On Jan 31, 2014, at 11:40 AM, Software Dev > wrote: > > > >> Can someone clarify what the following options are: > >> > >> - updatesToLeaders > >> - shutdownLBHttpSolrServer > >> - parallelUpdates > >> > >> Also, I remember in older versions of Solr there was a more compact, > >> efficient format used between SolrJ and Solr. Does this still > >> exist in the latest version of Solr? If so, is it the default? > >> > >> Thanks
Re: SolrCloudServer questions
Our use case is we have 3 indexing machines pulling off a Kafka queue, and they are all sending individual updates. On Fri, Jan 31, 2014 at 12:54 PM, Mark Miller wrote: > Just make sure parallel updates is set to true. > > If you want to load even faster, you can use the bulk add methods, or if > you need more fine-grained responses, use the single add from multiple > threads (though bulk add can also be done via multiple threads if you > really want to try and push the max). > > - Mark > > http://about.me/markrmiller
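Mark's advice above (bulk add rather than individual updates) can be sketched without any Solr dependencies: the batching logic below is plain Java, and the actual SolrJ call is shown only as a comment since it needs the solrj jar on the classpath. The class name `BulkIndexer` and the batch size are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkIndexer {
    // Split a document list into fixed-size batches so each request to Solr
    // carries many documents instead of one. Batch size is a tuning knob.
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            batches.add(docs.subList(i, Math.min(i + batchSize, docs.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 10; i++) ids.add("doc" + i);

        for (List<String> batch : partition(ids, 4)) {
            // With SolrJ, each batch would become SolrInputDocuments and be
            // sent in one round trip, e.g.:
            //   cloudSolrServer.add(docsForBatch);  // the bulk add method
            // Running this loop from several threads pushes throughput
            // further, as suggested in the thread.
            System.out.println("batch of " + batch.size());
        }
        // prints: batch of 4, batch of 4, batch of 2
    }
}
```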
Re: SolrCloudServer questions
Also, if we are seeing a huge CPU spike on the leader when doing a bulk index, would changing any of the options help? On Sat, Feb 1, 2014 at 2:59 PM, Software Dev wrote: > Our use case is we have 3 indexing machines pulling off a Kafka queue, and > they are all sending individual updates. > > On Fri, Jan 31, 2014 at 12:54 PM, Mark Miller wrote: >> Just make sure parallel updates is set to true. >> >> If you want to load even faster, you can use the bulk add methods, or if >> you need more fine-grained responses, use the single add from multiple >> threads (though bulk add can also be done via multiple threads if you >> really want to try and push the max). >> >> - Mark >> >> http://about.me/markrmiller
How does Solr parse schema.xml?
Can anyone point me in the right direction? I'm trying to duplicate the functionality of the analysis request handler so we can wrap a service around it to return the terms given a string of text. We would like to read the same schema.xml file to configure the analyzer, tokenizer, etc., but I can't seem to find the class that actually does the parsing of that file. Thanks