problem with replication/solrcloud - getting 'missing required field' during update intermittently (SOLR-6251)
Issue was closed in Jira requesting it be discussed here first. Looking for any diagnostic assistance on this issue with 4.8.0, since it is intermittent and occurs without warning.

Setup is two nodes, with an external ZK ensemble. Nodes are accessed round-robin on EC2 behind an ELB. Schema has: ... omitNorms="true" /> ...

Most of the updates are working without issue, but randomly we'll get the above failure, even though searches before and after the update clearly indicate that the document had the timestamp field in it. The error occurs when the second node does its distrib operation against the first node. Diagnostic details are all in the Jira issue. Can provide more as needed, but would appreciate any suggestions on what to try or how to diagnose this, other than just throwing thousands of requests at it in round-robin between the two instances to see if the issue can be reproduced.

-- Nathan

--------
Nathan Neulinger                       nn...@neulinger.org
Neulinger Consulting                   (573) 612-1412
Re: problem with replication/solrcloud - getting 'missing required field' during update intermittently (SOLR-6251)
FYI. We finally tracked down the problem (at least 99.9% sure at this point), and it was staring me in the face the whole time; I just never noticed:

[{"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","channel": {"add": "preet"},"channel": {"add": "adam"}}]

Look at the JSON... It's trying to add two "channel" elements in the same object. Should have been:

[{"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","channel": {"add": "preet"}},
 {"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","channel": {"add": "adam"}}]

I half wonder how it chose to interpret that particular chunk of JSON, but either way, I think the origin of our issue is resolved. From what I'm reading, duplicate object keys are discouraged in JSON (the spec only says names "should" be unique), so what happens with them is parser-dependent. I'm guessing that Solr doesn't reject the duplicate key, and its parser just creates something weird in that situation, like a new request for a whole new document.

-- Nathan

On 07/15/2014 07:19 PM, Nathan Neulinger wrote:
> Issue was closed in Jira requesting it be discussed here first. Looking
> for any diagnostic assistance on this issue with 4.8.0 since it is
> intermittent and occurs without warning. [...]
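For what it's worth, most JSON parsers quietly accept duplicate keys and keep only one of the values, which is why nothing flagged the malformed update on the client side. A quick illustration in Python (this demonstrates generic parser behavior, not Solr's own parser, which may interpret the duplicate differently):

```python
import json

# Duplicate "channel" keys in one object: syntactically accepted,
# but Python silently keeps only the last value.
doc = '{"id": "doc1", "channel": {"add": "preet"}, "channel": {"add": "adam"}}'
parsed = json.loads(doc)
print(parsed)  # {'id': 'doc1', 'channel': {'add': 'adam'}}

# The unambiguous form: one update object per change.
docs = '[{"id": "doc1", "channel": {"add": "preet"}},' \
       ' {"id": "doc1", "channel": {"add": "adam"}}]'
print(len(json.loads(docs)))  # 2
```

So a client-side sanity check that merely round-trips the payload through a standard JSON library would not have caught this; the first "channel" value simply disappears.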
/solr/admin/ping causing exceptions in log?
Recently deployed haproxy in front of my Solr instances, and now seeing a large number of exceptions in the logs... Example below. I can pound the server with requests against /solr/admin/ping via curl with no obvious issue, but the haproxy checks appear to be aggravating something.

Solr 4.8.0 w/ SolrCloud, 2 nodes, 3 ZK, Linux x86_64.

It seems like when the issue occurs, I get a set of the errors all in a burst (below), never just one.

Suggestions?

-- Nathan

2014-07-26 23:04:36,506 ERROR qtp1532385072-4864 [g.apache.solr.servlet.SolrDispatchFilter] - null:org.eclipse.jetty.io.EofException
        at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:914)
        at org.eclipse.jetty.http.AbstractGenerator.flush(AbstractGenerator.java:443)
        at org.eclipse.jetty.server.HttpOutput.flush(HttpOutput.java:100)
        at org.eclipse.jetty.server.AbstractHttpConnection$Output.flush(AbstractHttpConnection.java:1094)
        at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:297)
        at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
        at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
        at org.apache.solr.util.FastWriter.flush(FastWriter.java:137)
        at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:763)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:431)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:339)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:368)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
        at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Connection reset
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at org.eclipse.jetty.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:375)
        at org.eclipse.jetty.io.bio.StreamEndPoint.flush(StreamEndPoint.java:164)
        at org.eclipse.jetty.io.bio.StreamEndPoint.flush(StreamEndPoint.java:194)
        at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:838)
        ... 36 more
2014-07-26 23:04:36,513 ERROR qtp1532385072-4864 [g.apache.solr.servlet.SolrDispatchFilter] - null:org.eclipse.jetty.io.EofException
Re: /solr/admin/ping causing exceptions in log?
Tried changing to use /solr/admin/cores instead as a test - still see the same issue, though much less frequent.

-- Nathan

On Sat, Jul 26, 2014 at 6:15 PM, Nathan Neulinger wrote:
> Recently deployed haproxy in front of my solr instances, and seeing a
> large number of exceptions in the logs now... Example below. I can pound
> the server with requests against /solr/admin/ping via curl, with no obvious
> issue, but the haproxy checks appear to be aggravating something.
>
> Solr 4.8.0 w/ solr cloud, 2 nodes, 3 zk, linux x86_64
>
> It seems like when the issue occurs, I get a set of the errors all in a
> burst (below), never just one.
>
> Suggestions?
Re: /solr/admin/ping causing exceptions in log?
Cool. That's likely exactly it - since I don't have one set, it's using the check interval, and occasionally that must just be too short. Thank you!

-- Nathan

> I assume that this is the httpchk config to make sure that the server is
> operational. If so, you need to increase the "timeout check" value, because
> it is too small. The ping request is taking longer to run than you have
> allowed in the timeout. Here's part of my haproxy config: ...
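For readers hitting the same symptom: a backend stanza with an explicit health-check timeout might look like the following. This is a hypothetical sketch (hostnames, ports, and interval values are made up for illustration), not the config from the message above, which the list archive trimmed:

```
backend solr
    mode http
    # Health check against Solr's ping handler
    option httpchk GET /solr/admin/ping
    # Allow checks up to 5s instead of inheriting the "inter" interval
    timeout check 5s
    server solr1 10.0.0.11:8983 check inter 2s
    server solr2 10.0.0.12:8983 check inter 2s
```

Without "timeout check", haproxy bounds each check by the check interval, so a slow ping can be cut off even though the server is healthy.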
Re: /solr/admin/ping causing exceptions in log?
Unfortunately, it doesn't look like this clears the symptom. The ping is responding almost instantly every time. I've tried setting a 15 second timeout on the check, with no change in occurrences of the error.

Looking at a packet capture on the server side, there is a clear distinction between working and failing/error-triggering connections. In a "working" case, I see two packets immediately back to back (one with the header, and next a continuation with content) with no ack in between, followed by ack, rst+ack, rst. In the failing request, I see the GET request, acked, then the HTTP/1.1 200 OK response from Solr, a single ack, and then an almost instantaneous reset sent by the client.

I'm only seeing this on traffic to/from haproxy checks. If I do a simple:

    while true; do curl -s http://host:8983/solr/admin/ping; done

from the same box, that flood runs with generally 10-20ms request times and zero errors.

-- Nathan

On 07/27/2014 07:12 PM, Nathan Neulinger wrote:
> Cool. That's likely exactly it, since I don't have one set, it's using
> the check interval, and occasionally must just be too short. Thank you!

Attachments:
solr-working.cap (application/vnd.tcpdump.pcap)
solr-cutoff2.cap (application/vnd.tcpdump.pcap)
Re: /solr/admin/ping causing exceptions in log?
Either way, looks like this is not a Solr issue, but rather haproxy. Thanks.

-- Nathan

On 07/27/2014 08:23 PM, Nathan Neulinger wrote:
> Unfortunately, doesn't look like this clears the symptom. The ping is
> responding almost instantly every time. I've tried setting a 15 second
> timeout on the check, with no change in occurrences of the error.
> [...]
> I'm only seeing this on traffic to/from haproxy checks.
Re: /solr/admin/ping causing exceptions in log?
Thing is - I wouldn't expect any of the default options mentioned to change the behavior intermittently. I.e., it's working for 95% of the health check requests; it's just the intermittent ones that seem to be cut off... I'm inquiring with the haproxy devs, since it appears that at least one other person on #haproxy is seeing the same behavior. It doesn't appear to be specific to Solr.

-- Nathan

On 07/27/2014 10:44 PM, Shawn Heisey wrote:
> On 7/27/2014 7:23 PM, Nathan Neulinger wrote:
>> Unfortunately, doesn't look like this clears the symptom. The ping is
>> responding almost instantly every time. I've tried setting a 15 second
>> timeout on the check, with no change in occurrences of the error.
>> [...]
>
> I won't claim to understand what's going on here, but it might be a
> matter of the haproxy options.
> Here are the options I'm using in the "defaults" section of the config:
>
>     defaults
>         log             global
>         mode            http
>         option          httplog
>         option          dontlognull
>         option          redispatch
>         option          abortonclose
>         option          http-server-close
>         option          http-pretend-keepalive
>         retries         1
>         maxconn         1024
>         timeout connect 1s
>         timeout client  5s
>         timeout server  30s
>
> One bit of information I came across when I first started setting haproxy
> up for Solr is that servlet containers like Jetty and Tomcat require the
> "http-pretend-keepalive" option to work properly. Are you using this
> option?
>
> Thanks,
> Shawn
What is the "right" way to bring a failed SolrCloud node back online?
I have an environment where new collections are being added frequently (isolated per customer), and any backup is virtually guaranteed to be missing some of them. As it stands, bringing up the restored/out-of-date instance results in those collections being stuck in 'Recovering' state, because the cores don't exist on the resulting server. This can also be extended to the case of restoring a completely blank instance.

Is there any way to tell SolrCloud "try recreating any missing cores for this collection based on where you know they should be located"? Or do I need to actually determine a list of cores (..._shardX_replicaY) and trigger the core creates myself, at which point I gather that it will start recovery for each of them?

-- Nathan
Replica not consistent after update request?
How can we issue an update request and be certain that all of the replicas in the SolrCloud cluster are up to date?

I found this post: http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/79886 which seems to indicate that all replicas for a shard must finish/succeed before it returns to the client that the operation succeeded - but we've been seeing behavior lately (until we configured automatic soft commits) where the replicas were almost always "not current", i.e. the replicas were missing documents/etc.

Is this something wrong with our cloud setup/replication, or am I misinterpreting the way that updates in a cloud deployment are supposed to function? If it's a problem with our cloud setup, do you have any suggestions on diagnostics? Alternatively, are we perhaps just using it wrong?

-- Nathan
Re: Replica not consistent after update request?
Wow, the detail in that jira issue makes my brain hurt... Great to see it's got a quick answer/fix! Thank you!

-- Nathan

On 01/24/2014 09:43 PM, Joel Bernstein wrote:
> If you're on Solr 4.6 then this is likely the issue:
> https://issues.apache.org/jira/browse/SOLR-4260. The issue is resolved
> for Solr 4.6.1 which should be out next week.
>
> Joel Bernstein
> Search Engineer at Heliosearch
Re: Replica not consistent after update request?
It's 4.6.0. Pair of servers with an external 3-node ZK ensemble.

SOLR-4260 looks like a very promising answer. Will check it out as soon as 4.6.1 is released. May also check out the nightly builds, since this is still just development/prototype usage.

-- Nathan

On 01/24/2014 09:45 PM, Anshum Gupta wrote:
> Hi Nathan,
>
> It'd be great to have more information about your setup. Solr version?
> Depending upon your version, you might want to also look at:
> https://issues.apache.org/jira/browse/SOLR-4260 (which is now fixed).
Re: Replica not consistent after update request?
Ok, so our issue sounds like a combination of not having soft commits properly configured, combined with SOLR-4260. Thanks everyone!

On 01/24/2014 11:04 PM, Erick Erickson wrote:
> Right. The updates are guaranteed to be on the replicas and in their
> transaction logs. That doesn't mean they're searchable, however. For a
> document to be found in a search there must be a commit, either soft, or
> hard with openSearcher=true. Here's a post that outlines all this. If you
> still have discrepancies after commits, that's a problem.
>
> Best,
> Erick
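For anyone finding this thread later, the soft-commit configuration Erick refers to lives in solrconfig.xml. A sketch of the relevant elements (the interval values here are illustrative, not a recommendation; tune them for your latency and durability needs):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes the tlog to the index on disk;
       openSearcher=false keeps it from being a visibility event. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: makes recent updates visible to searches
       without the cost of a full hard commit. -->
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With no soft (or opening) commit configured, replicas can hold an update durably in their transaction logs while searches against them still miss it, which matches the "not current" behavior described above.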
Re: What is the "right" way to bring a failed SolrCloud node back online?
Thanks, yeah, I did just that - and sent the script in on SOLR-5665 if anyone wants a copy. The script is trivial, but you're welcome to stick it in contrib or something if it's at all useful to anyone.

-- Nathan

On 01/26/2014 08:28 AM, Mark Miller wrote:
> We are working on a new mode (which should become the default) where
> ZooKeeper will be treated as the truth for a cluster. This mode will be
> able to handle situations like this - if the cluster state says a core
> should exist on a node and it doesn't, it will be created on startup.
>
> The way things work currently is this kind of hybrid situation where the
> truth is partly in ZooKeeper and partly on each node. This is not ideal
> at all. I think this new mode is very important, and it will be coming
> shortly.
>
> Until then, I'd recommend writing this logic externally as you suggest
> (I've seen it done before).
>
> - Mark
> http://about.me/markrmiller
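The external logic Mark describes amounts to calling the CoreAdmin CREATE action for each replica that the cluster state says should live on the restored node; in SolrCloud, a core created with its collection and shard specified registers itself and recovers from the shard leader. A hedged sketch (host, collection, shard, and core names are all placeholders, and the script only prints the command rather than running it):

```shell
#!/bin/sh
# Placeholders for the restored node and one replica it should host.
HOST="localhost:8983"
COLLECTION="customer_a"
SHARD="shard1"
CORE="${COLLECTION}_${SHARD}_replica2"

# Print the CoreAdmin CREATE call; run one of these per missing core.
echo "curl 'http://${HOST}/solr/admin/cores?action=CREATE&name=${CORE}&collection=${COLLECTION}&shard=${SHARD}'"
```

In practice you would first diff clusterstate.json against the cores present on disk to build the list of missing `..._shardX_replicaY` names.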
Why does CLUSTERSTATUS return different information than the web cloud view?
In particular, a replica being 'active' vs. 'gone'. The web UI is clearly showing the given replicas as being in "Gone" state when I shut down a server, yet CLUSTERSTATUS says that each replica has state: "active". Is there any way to ask it for status that will reflect that the replica is gone? This is with 4.8.0.

-- Nathan
Re: Why does CLUSTERSTATUS return different information than the web cloud view?
Is there a way to query the 'live node' state without sending a query to every node myself? I.e., to get the same data that is used for that cloud status screen?

-- Nathan

On 08/23/2014 06:39 PM, Mark Miller wrote:
> The state is actually a combo of the state in clusterstate and the live
> nodes. If the live node is not there, it's gone regardless of the last
> state it published.
>
> - Mark
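Mark's combination rule is easy to apply client-side. A sketch in Python: the helper function is hypothetical, but its inputs mirror the shape of the CLUSTERSTATUS response (each replica carries "state" and "node_name", and the cluster section lists "live_nodes"); fetch that JSON from any one node and post-process it like this:

```python
def effective_replica_state(replica, live_nodes):
    """A replica's published state only counts if its node is live;
    if the node is absent from live_nodes, the replica is gone."""
    if replica["node_name"] not in live_nodes:
        return "gone"
    return replica["state"]

# Example inputs shaped like a CLUSTERSTATUS response (hypothetical values).
live_nodes = ["10.0.0.11:8983_solr"]
up = {"state": "active", "node_name": "10.0.0.11:8983_solr"}
down = {"state": "active", "node_name": "10.0.0.12:8983_solr"}

print(effective_replica_state(up, live_nodes))    # active
print(effective_replica_state(down, live_nodes))  # gone
```

This reproduces what the cloud status screen shows: the "active" published by a now-dead node is overridden by its absence from the live nodes list.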