How to restore an index from a backup over HTTP

2014-08-15 Thread Greg Solovyev
Hello, I am looking for advice on implementing the following backup/restore 
scenario. 
We are using Solr to index email, and each mailbox has its own Collection. We do 
not store emails in Solr: the emails are stored on disk in a blob store, the 
metadata is stored in a database, and Solr is used only for full-text search. The 
scenario is restoring a mailbox from a backup. The backup of a mailbox contains 
the blobs and the metadata in a SQL file. We can also pull the Lucene index files 
from Solr using ReplicationHandler, the same way Solr's SnapPuller does it on a 
slave server. We already have a restore utility that restores blobs and metadata, 
but we are working on a mechanism to back up and restore the Solr index in a way 
that allows us to package each mailbox into a separate backup folder/archive. 
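
For reference, the pull side is plain HTTP against the core's /replication
handler. A minimal sketch in Java (host, core name, and paths are placeholders;
error handling omitted):

import java.io.*;
import java.net.URL;

// Minimal sketch of pulling index files the way SnapPuller does. A real
// client would first call command=indexversion to get the current
// generation, then command=filelist to enumerate the files belonging to it.
public class IndexPuller {
    static final String CORE = "http://solr1:8983/solr/mailbox123";

    static void fetchFile(String fileName, long generation, File destDir) throws IOException {
        URL url = new URL(CORE + "/replication?command=filecontent"
                + "&file=" + fileName + "&generation=" + generation + "&wt=filestream");
        try (InputStream in = url.openStream();
             OutputStream out = new FileOutputStream(new File(destDir, fileName))) {
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) > 0; ) {
                // NOTE: the body is Solr's internal packetized "filestream"
                // format, not the raw file; a real client must unwrap it the
                // way SnapPuller does.
                out.write(buf, 0, n);
            }
        }
    }
}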

An obvious first idea for restoring is to drop the index files into a new 
folder on one of the existing Solr servers and make it pick up the new 
collection - that's simple. However, this approach has two downsides: (1) it 
requires that SSH access be set up between the machine where the 
backup-and-restore script is running and the Solr server, and (2) if Solr is 
running in SolrCloud mode, this approach bypasses ZooKeeper, and we would have 
to pick the Solr instance for this new Collection without ZooKeeper. 

Another idea is to not include index files in backups and re-index mail upon 
restoring it. This isn't a good idea at all when restoring large mailboxes. 

What I want to achieve is being able to send the backed-up index to Solr 
(either standalone or with ZooKeeper) in a way similar to creating a new 
Collection, i.e., create a new Collection and upload an existing index directly 
into it. I've looked through the Solr code and so far I have not found a 
handler that would allow this scenario. So, the last idea is to implement a 
special handler for this case, perhaps extending CoreAdminHandler. 
ReplicationHandler together with SnapPuller does pretty much what I need, 
except that the action has to be initiated by the receiving Solr server, and I 
need to initiate it externally. I.e., instead of having a Solr slave download 
an index from a Solr master, I need to feed the index to the master, and 
ideally this would work the same way in standalone and SolrCloud modes. 
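
To make the idea concrete, here is a hypothetical skeleton of such a handler.
None of this exists in Solr today (the class name and wiring are made up), and
a real implementation would also need to close and reopen the core's
IndexWriter and cooperate with ZooKeeper in cloud mode:

import java.io.*;
import org.apache.solr.common.util.ContentStream;
import org.apache.solr.handler.RequestHandlerBase;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;

// Hypothetical handler that accepts uploaded index files and drops them into
// the receiving core's data directory. Purely illustrative.
public class RestoreIndexHandler extends RequestHandlerBase {

    @Override
    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
        File indexDir = new File(req.getCore().getDataDir(), "index");
        // Assumes the client POSTs each segment file as a content stream
        // whose name carries the file name.
        for (ContentStream stream : req.getContentStreams()) {
            File dest = new File(indexDir, stream.getName());
            try (InputStream in = stream.getStream();
                 OutputStream out = new FileOutputStream(dest)) {
                byte[] buf = new byte[8192];
                for (int n; (n = in.read(buf)) > 0; ) {
                    out.write(buf, 0, n);
                }
            }
        }
        rsp.add("status", "ok");
    }

    @Override
    public String getDescription() { return "Upload index files into a core (sketch)"; }

    @Override
    public String getSource() { return "n/a"; }
}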

What are your thoughts and ideas on the subject? 

Thanks, 
Greg 


Re: How to restore an index from a backup over HTTP

2014-08-16 Thread Greg Solovyev
Thanks Shawn, this is a pretty cool idea. Adding the handler seems pretty 
straightforward, but the main concern I have is the internal data format that 
ReplicationHandler and SnapPuller use. This new handler, as well as the code 
that I've already written to download the index files from Solr, will depend on 
that format. Unfortunately, this format is not documented and is not abstracted 
by SolrJ, so I wonder what I can do to make sure it does not change on us 
without notice.

Thanks,
Greg

- Original Message -
From: "Shawn Heisey" 
To: solr-user@lucene.apache.org
Sent: Friday, August 15, 2014 7:31:19 PM
Subject: Re: How to restore an index from a backup over HTTP

On 8/15/2014 5:51 AM, Greg Solovyev wrote:
> What I want to achieve is being able to send the backed up index to Solr 
> (either standalone or with ZooKeeper) in a way similar to creating a new 
> Collection. I.e. create a new collection and upload an existing index directly 
> into that Collection. I've looked through Solr code and so far I have not 
> found a handler that would allow this scenario. So, the last idea is to 
> implement a special handler for this case, perhaps extending 
> CoreAdminHandler. ReplicationHandler together with SnapPuller do pretty much 
> what I need to do, except that the action has to be initiated by the 
> receiving Solr server and I need to initiate the action externally. I.e., 
> instead of having Solr slave download an index from Solr master, I need to 
> feed the index to Solr master and ideally this would work the same way in 
> standalone and SolrCloud modes. 

I have not made any attempt to verify what I'm stating below.  It may
not work.

What I think I would *try* is setting up a standalone Solr (no cloud) on
the backup server.  Use scripted index/config copies and Solr start/stop
actions to get the index up and running on a known core in the
standalone Solr.  Then use the replication handler's HTTP API to
replicate the index from that standalone server to each of the replicas
in your cluster.

https://wiki.apache.org/solr/SolrReplication#HTTP_API
https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler
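
For illustration, the replication step is a single HTTP call per replica. A
minimal sketch in Java, with placeholder host and core names (depending on the
Solr version, masterUrl may need a trailing /replication path):

import java.io.InputStream;
import java.net.URL;
import java.net.URLEncoder;

// Sketch: tell one replica to pull its index from the standalone restore core.
public class FetchIndexOnce {
    public static void main(String[] args) throws Exception {
        String replica = "http://cloudnode1:8983/solr/mailbox123_shard1_replica1";
        String source  = "http://backuphost:8983/solr/restore_core";
        URL url = new URL(replica + "/replication?command=fetchindex&masterUrl="
                + URLEncoder.encode(source, "UTF-8"));
        try (InputStream in = url.openStream()) {
            // fetchindex returns immediately; poll command=details for progress
        }
    }
}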

One thing that I do not know is whether SolrCloud itself might interfere
with these actions, or whether it might automatically take care of
additional replicas if you replicate to the shard leader.  If SolrCloud
*would* interfere, then this idea might need special support in
SolrCloud, perhaps as an extension to the Collections API.  If it won't
interfere, then the use-case would need to be documented (on the user
wiki at a minimum) so that committers will be aware of it and preserve
the capability in future versions.  An extension to the Collections API
might be a good idea either way -- I've seen a number of questions about
capability that falls under this basic heading.

Thanks,
Shawn


Re: How to restore an index from a backup over HTTP

2014-08-18 Thread Greg Solovyev
Thanks Jeff, I'd be interested in taking a look at the code for this tool. My 
github ID is grishick.

Thanks,
Greg

- Original Message -
From: "Jeff Wartes" 
To: solr-user@lucene.apache.org
Sent: Monday, August 18, 2014 9:49:28 PM
Subject: Re: How to restore an index from a backup over HTTP

I'm able to do cross-solrcloud-cluster index copy using nothing more than
careful use of the "fetchindex" replication handler command.

I'm using this as a build/deployment tool, so I manually create a
collection in two clusters, index into one, test, and then ask the other
cluster to fetchindex from it on each shard/replica.

Some caveats:
  1. It seems like fetchindex may silently decline if it thinks the index
it has is newer.
  2. I'm not doing this on an index that's currently receiving updates.
  3. SolrCloud replication doesn't come into this flow, even if you
fetchindex on a leader. (although once you're done, updates should get
replicated normally)
  4. Both collections must be created with the same number of shards and
sharding mechanism. (although replication factor can vary)

I've got a tool for automating this that I'd like to push to github at
some point, let me know if you're interested.
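
For what it's worth, the orchestration boils down to something like the sketch
below. It assumes the usual <collection>_shardN_replicaM core naming (an
assumption, not a guarantee), and because of caveat 1 you should verify the
copy afterwards via command=indexversion:

import java.net.URL;
import java.net.URLEncoder;

// Sketch of the cross-cluster copy: issue fetchindex on each target core,
// pointing masterUrl at the matching shard in the source cluster.
public class CrossClusterCopy {
    public static void main(String[] args) throws Exception {
        int numShards = 2; // must match in both clusters (caveat 4)
        for (int shard = 1; shard <= numShards; shard++) {
            String source = "http://src:8983/solr/mycoll_shard" + shard + "_replica1";
            String target = "http://dst:8983/solr/mycoll_shard" + shard + "_replica1";
            URL url = new URL(target + "/replication?command=fetchindex&masterUrl="
                    + URLEncoder.encode(source, "UTF-8"));
            url.openStream().close();
            // Because of caveat 1, compare /replication?command=indexversion
            // on source and target afterwards to confirm the fetch happened.
        }
    }
}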







Re: How to restore an index from a backup over HTTP

2014-08-18 Thread Greg Solovyev
Shawn, the format that I am referencing is "filestream", which starts with 2 
bytes carrying the file size, then 4 bytes carrying a checksum (optional), and 
then the actual bits of the file.
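
Purely for illustration, a reader for that layout as described would look like
the sketch below. The field widths come straight from the description above and
are not verified; SnapPuller's source remains the only authoritative reference
for the actual format:

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative reader for the layout described above: 2-byte size,
// optional 4-byte checksum, then the file bytes. Field widths are assumptions.
public class FileStreamReader {
    static byte[] readFile(InputStream raw, boolean hasChecksum) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int size = in.readUnsignedShort();   // 2-byte size field
        if (hasChecksum) {
            int checksum = in.readInt();     // 4-byte checksum; a real reader would verify it
        }
        byte[] data = new byte[size];
        in.readFully(data);                  // the actual file bytes
        return data;
    }
}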

Thanks,
Greg

- Original Message -
From: "Shawn Heisey" 
To: solr-user@lucene.apache.org
Sent: Sunday, August 17, 2014 12:28:12 AM
Subject: Re: How to restore an index from a backup over HTTP

On 8/16/2014 4:03 AM, Greg Solovyev wrote:
> Thanks Shawn, this is a pretty cool idea. Adding the handler seems pretty 
> straight forward, but the main concern I have is the internal data format 
> that ReplicationHandler and SnapPuller use. This new handler as well as the 
> code that I've already written to download the index files from Solr will 
> depend on that format. Unfortunately, this format is not documented and is 
> not abstracted by SolrJ, so I wonder what I can do to make sure it does not 
> change on us without notice.

I am not really sure what format you're referencing here, but I'm about
99% sure the format *over the wire* is javabin.  When the javabin format
changed between 1.4.1 and 3.1.0, replication between those versions
became impossible.

Historical: The Solr version made a huge leap after the Solr and Lucene
development was merged -- it was synchronized with the Lucene version.
There are no 1.5, 2.x, or 3.0 versions of Solr.

https://issues.apache.org/jira/browse/SOLR-2204

Thanks,
Shawn


Re: How to restore an index from a backup over HTTP

2014-09-04 Thread Greg Solovyev
Thanks Jeff!

Thanks,
Greg

- Original Message -
From: "Jeff Wartes" 
To: solr-user@lucene.apache.org
Sent: Wednesday, August 20, 2014 10:36:07 AM
Subject: Re: How to restore an index from a backup over HTTP

Here’s the repo:
https://github.com/whitepages/solrcloud_manager


Comments/Issues/Patches welcome.




Re: Mongo DB Users

2014-09-15 Thread Greg Solovyev
Remove me from this thread please

Thanks,
Greg

- Original Message -
From: "Jack Krupansky" 
To: solr-user@lucene.apache.org
Sent: Monday, September 15, 2014 10:44:00 AM
Subject: Re: Mongo DB Users

> >Waiting for a positive response!

-1

-- Jack Krupansky

-Original Message- 
From: Rakesh Varna
Sent: Monday, September 15, 2014 10:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Mongo DB Users

Remove

Regards,
Rakesh Varna


On Mon, Sep 15, 2014 at 9:29 AM, Ed Smiley  wrote:

> Remove
>
> On 9/15/14, 8:35 AM, "Aaron Susan"  wrote:
>
> >Hi,
> >
> >I am here to inform you that we are having a contact list of *Mongo DB
> >Users *would you be interested in it?
> >
> >Data Field's Consist Of: Name, Job Title, Verified Phone Number, Verified
> >Email Address, Company Name & Address Employee Size, Revenue size, SIC
> >Code, Industry Type etc.,
> >
> >We also provide other technology users as well depends on your
> >requirement.
> >
> >For Example:
> >
> >
> >*Red Hat *
> >
> >*Terra data *
> >
> >*Net-app *
> >
> >*NuoDB*
> >
> >*MongoHQ ** and many more*
> >
> >
> >We also provide IT Decision Makers, Sales and Marketing Decision Makers,
> >C-level Titles and other titles as per your requirement.
> >
> >Please review and let me know your interest if you are looking for above
> >mentioned users list or other contacts list for your campaigns.
> >
> >Waiting for a positive response!
> >
> >Thanks
> >
> >*Aaron Susan*
> >Data Specialist
> >
> >If you are not the right person, feel free to forward this email to the
> >right person in your organization. To opt out response Remove
>
>


Consul instead of ZooKeeper anyone?

2014-10-31 Thread Greg Solovyev
I am investigating a project to make SolrCloud run on Consul instead of 
ZooKeeper. So far, my research has revealed no such efforts, but I wanted to 
check with this list to make sure I am not going to be reinventing the wheel. 
Has anyone attempted using Consul instead of ZK to coordinate SolrCloud nodes? 

Thanks, 
Greg 


Re: Consul instead of ZooKeeper anyone?

2014-11-03 Thread Greg Solovyev
Thanks Erick, 
after looking further into Solr's source code, I see that it is married to the 
ZK libraries, and it won't be possible to extend the existing code without 
diverging from the trunk. At the same time, I don't see any reason for the lack 
of abstraction in the cloud-related code of Solr and SolrJ. As far as I can see, 
Consul provides everything that SolrCloud needs, so if the cloud code used more 
abstraction, the ZK bindings could be substituted with another library. I am 
willing to implement this functionality and the abstraction, but at the same 
time, I don't want to maintain my own branch of Solr because of this 
integration. Do you think it would be possible to add an abstraction layer to 
the Solr source code in the near future? 
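
To illustrate the kind of abstraction I mean, something like the hypothetical
interface below. None of these names exist in Solr; this is purely a sketch of
the surface area SolrCloud actually needs from a coordination service:

import java.util.List;

// Hypothetical coordination interface: if SolrCloud went through something
// like this instead of calling ZK classes directly, a Consul-backed
// implementation could be swapped in. All names here are made up.
public interface ClusterCoordinator {
    byte[] getData(String path) throws Exception;                     // read cluster state / configs
    void setData(String path, byte[] data) throws Exception;          // publish state
    void createEphemeral(String path, byte[] data) throws Exception;  // live-node registration
    List<String> listChildren(String path) throws Exception;          // enumerate nodes / replicas
    void watch(String path, Runnable onChange);                       // change notifications
    boolean tryAcquireLeader(String path);                            // leader election primitive
}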

I think Consul has all the features that SolrCloud needs, and what's especially 
attractive about Consul is that its memory footprint is 100X smaller than ZK's. 
Mainly though, we are considering Consul as the main service locator for a bunch 
of other moving parts within Zimbra, so being able to avoid deploying ZK just 
for SolrCloud would save a bunch of $$ for large customers.

Thanks,
Greg

- Original Message -
From: "Erick Erickson" 
To: solr-user@lucene.apache.org
Sent: Friday, October 31, 2014 5:15:09 PM
Subject: Re: Consul instead of ZooKeeper anyone?

Not that I know of, but look before you leap. I took a quick look at
Consul and it really doesn't look like any kind of drop-in replacement.
Also, the Zookeeper usage in SolrCloud isn't really pluggable
AFAIK, so there'll be lots of places in the Solr code that need to be
reworked etc., especially in the realm of collections and sharding.

The Collections API will be challenging to port over I think.

Not to mention SolrJ and CloudSolrServer for clients who want to interact
with SolrCloud through Java.

Not saying it won't work, I just suspect that getting it done would be
a big job, and thereafter keeping those changes in sync with the
changing SolrCloud code base would chew up a lot of time. So if
I were putting my Product Manager hat on I'd ask "is the benefit
worth the effort?".

All that said, go for it if you've a mind to!

Best,
Erick

On Fri, Oct 31, 2014 at 4:08 PM, Greg Solovyev  wrote:
> I am investigating a project to make SolrCloud run on Consul instead of 
> ZooKeeper. So far, my research revealed no such efforts, but I wanted to 
> check with this list to make sure I am not going to be reinventing the wheel. 
> Has anyone attempted using Consul instead of ZK to coordinate SolrCloud 
> nodes?
>
> Thanks,
> Greg


Re: Consul instead of ZooKeeper anyone?

2014-11-04 Thread Greg Solovyev
Thanks for the answers Erick. I can see that this is a significant effort, and I 
am certainly not asking the community to undertake this work. I was actually 
going to take a stab at it myself. Regarding the $$ savings from not requiring 
ZK: my assumption is that ZK in production demands a dedicated host and requires 
2GB RAM per instance, while Consul runs on less than 100MB RAM per instance. So, 
for ISPs, BSPs, and large enterprise deployments, the savings would come from 
reduced resource requirements. 

Thanks,
Greg

- Original Message -
From: "Erick Erickson" 
To: solr-user@lucene.apache.org
Sent: Monday, November 3, 2014 3:25:25 PM
Subject: Re: Consul instead of ZooKeeper anyone?

bq:  Do you think it would be possible to add an abstraction layer to
Solr source code in near future?

I strongly doubt it. As you've already noted, this is a large amount
of work. Without some super-compelling advantage I just don't see the
interest.

bq:  to avoid deploying ZK just for SolrCloud would save a bunch of $$
for large customers

How so? It's free.

Making this change would, IMO, require a compelling story to generate
much enthusiasm. So far I haven't seen that story, and Jürgen and
Walter raise valid points that haven't been addressed. I suspect
you're significantly underestimating the effort to get this stable in
the SolrCloud world as well.

I don't really want to be such a wet blanket, but you're asking about
a very significant amount of work from a bunch of people, all of whom
have lots of things on their plate. So without a _very_ good reason, I
think it's unlikely to generate much interest.

Best,
Erick



Re: CloudSolrServer, concurrency and too many connections

2014-12-10 Thread Greg Solovyev
I am seeing the same problem with 4.10.2 and 4.9.0. CloudSolrServer keeps 
opening connections to ZK and never closes them. Eventually (very soon) ZK runs 
out of connections and stops accepting new ones. 

Thanks,
Greg

- Original Message -
From: "JoeSmith" 
To: "solr-user" 
Sent: Sunday, December 7, 2014 8:11:50 PM
Subject: Re: CloudSolrServer, concurrency and too many connections

I've upgraded to 4.10.2 on the client side.  Still seeing this connection
problem when connecting to the Zookeeper port.  If I connect directly to
SolrServer, the connections do not increase.  But when connecting to
Zookeeper, the connections increase up to 60 and then start to fail.  I
understand Zookeeper is configured to fail after 60 connections to prevent
a DOS attack, but I don't see why we keep adding new connections (up to
60).  Does the client-side Zookeeper code also use HttpClient
ConnectionPooling for its Connection Pool?  Below is the Exception that
shows up in the log file when this happens.  When we execute queries we are
using the _route_ parameter; could this explain anything?

o.a.zookeeper.ClientCnxn - Session 0x0 for server
aweqca3utmtc10.cloud..com/10.22.10.107:9983, unexpected error, closing
socket connection and attempting reconnect

java.io.IOException: Connection reset by peer

at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.7.0_55]

at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
~[na:1.7.0_55]

at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
~[na:1.7.0_55]

at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.7.0_55]

at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
~[na:1.7.0_55]

at
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
~[zookeeper-3.4.6.jar:3.4.6-1569965]

at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
~[zookeeper-3.4.6.jar:3.4.6-1569965]

at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
~[zookeeper-3.4.6.jar:3.4.6-1569965]


Will try to get the server code upgraded to 4.10.2.



On Sat, Dec 6, 2014 at 3:52 PM, Shawn Heisey  wrote:

> On 12/6/2014 12:09 PM, JoeSmith wrote:
> > We are currently using CloudSolrServer, but it looks like this class is
> not
> > thread-safe (setDefaultCollection). Should this instance be initialized
> > once (at startup) and then re-used (in all threads) until shutdown when
> the
> > process terminates?  Or should it re-instantiated for each request?
> >
> > Currently, we are trying to use CloudSolrServer as a singleton, but it
> > looks like the connections to the host are not being closed and under
> load
> > we start getting failures.  and In the Zookeeper logs we see this error:
> >
> >> WARN  - 2014-12-04 10:09:14.364;
> >> org.apache.zookeeper.server.NIOServerCnxnFactory; Too many connections
> from
> >> /11.22.33.44 - max is 60
> >
> > netstat (on the Zookeeper host) shows that the connections are not being
> > closed. What is the 'correct' way to fix this? Apologies if I have missed
> > any documentation that explains; pointers would be helpful.
>
> All SolrServer implementations in SolrJ, including CloudSolrServer, are
> supposed to be threadsafe.  If it turns out they're not actually
> threadsafe, then we treat that as a bug.  The discussion to determine
> that it's a bug takes place on this mailing list, and once we determine
> that, the next step is to file an issue in Jira.
>
> The general way to use SolrJ is to initialize the server instance at the
> beginning and re-use it for all client communication to Solr.  With
> CloudSolrServer, you normally only need a single server instance to talk
> to the entire cloud, because you can set the "collection" parameter on
> each request to indicate which collection to work on.  If you only have
> a handful of collections, you might want to use multiple instances and
> use setDefaultCollection  to specify the collection.  With
> HttpSolrServer, an instance is required for each core, because the core
> name is in the initialization URL.
>
> I've not looked at the code, but I can't imagine that the client ever
> needs to make more than one connection to each server in the zookeeper
> ensemble.  Here's a list of the open connections on one of my zookeeper
> servers for my SolrCloud 4.2.1 install:
>
> java    21800 root   21u  IPv6   2836983  0t0  TCP
> 10.8.0.151:50178->10.8.0.152:2888 (ESTABLISHED)
> java    21800 root   22u  IPv6   2661097  0t0  TCP
> 10.8.0.151:3888->10.8.0.152:34116 (ESTABLISHED)
> java    21800 root   26u  IPv6   28065088  0t0  TCP
> 10.8.0.151:2181->10.8.0.141:52583 (ESTABLISHED)
> java    21800 root   27u  IPv6   23967470  0t0  TCP
> 10.8.0.151:2181->10.8.0.152:49436 (ESTABLISHED)
> java    21800 root   28r  IPv6   23969636  0t0  TCP
> 10.8.0.151:2181->10.8.0.151:57290 (ESTABLISHED)
> jav

Re: CloudSolrServer, concurrency and too many connections

2014-12-10 Thread Greg Solovyev
I am seeing this problem with Java 1.8.0_25-b17 on Ubuntu 14.04.1 LTS, ZK 3.4.6, 
and Solr 4.10.2.

Thanks,
Greg

- Original Message -
From: "JoeSmith" 
To: "solr-user" 
Sent: Monday, December 8, 2014 6:19:08 PM
Subject: Re: CloudSolrServer, concurrency and too many connections

Thanks, Shawn.  I updated to 7u72 and was not able to reproduce the
problem. That was good.  But just to be sure about this, I backed back down
to 7u55 and again was not able to reproduce.  So at least for now, this has
gone away even if the reason is inconclusive.


On Mon, Dec 8, 2014 at 7:37 AM, JoeSmith  wrote:

> We will need to update to 7u52, we are using 7u55.  On the client side,
> this happens with zookeeper 3.4.6 and 4.10.2 solrj.  And we will need to
> update both on the server side.   What kind of config/setup information
> would you need to see if we do still have an issue after these updates?
>
> On Mon, Dec 8, 2014 at 12:40 AM, Shawn Heisey  wrote:
>
>> On 12/7/2014 9:11 PM, JoeSmith wrote:
>> > i've upgraded to 4.10.2 on the client-side.  Still seeing this
>> connection
>> > problem when connecting to the Zookeeper port.  If I connect directly to
>> > SolrServer, the connections do not increase.  But when connecting to
>> > Zookeeper, the connections increase up to 60 and then start to fail.  I
>> > understand Zookeeper is configured to fail after 60 connections to
>> prevent
>> > a DOS attack, but I dont see why we keep adding new connections (up to
>> > 60).  Does the client-side Zookeeper code also use HttpClient
>> > ConnectionPooling for its Connection Pool?  Below is the Exception that
>> > shows up in the log file when this happens.  When we execute queries we
>> are
>> > using the _route_ parameter, could this explain anything?
>>
>> The docs say that Zookeeper uses NIO communication directly by default,
>> so there's no layer like HttpClient.  I don't think it uses pooling ...
>> it does everything over a single TCP connection that doesn't normally
>> disconnect until the program exits.
>>
>> Basically, the Zookeeper authors built their own networking layer that
>> uses TCP directly.  You have the option of using Netty instead:
>>
>>
>> http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#Communication+using+the+Netty+framework
>>
>> Are you running version 3.4.6 for your zookeeper servers?  That's the
>> version of ZK client code you'll find in Solr 4.10.x, and the
>> recommended version for both the server and your SolrJ program.
>>
>> The most likely reasons for the connection problems you are seeing are:
>>
>> 1) A bug in the networking layer of your JVM.
>> 1a) The latest Oracle Java 7 (currently 7u72) is highly recommended.
>> 2) A bug or misconfig in the OS TCP stack, or possibly its firewall.
>> 3) A bug or misconfig in zookeeper.
>>
>> I can't rule out the fourth possibility, but so far I think it's unlikely:
>>
>> 4) A bug in SolrJ that has not yet been reported or fixed.
>>
>> Thanks,
>> Shawn
>>
>>
>


Re: CloudSolrServer, concurrency and too many connections

2014-12-10 Thread Greg Solovyev
This was a user error. My code was re-instantiating CloudSolrServer for each 
request and never calling CloudSolrServer::shutdown(). 
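
For anyone hitting the same symptom, the fix is to hold one instance for the
life of the process, roughly like this sketch (the zkHost string is a
placeholder):

import org.apache.solr.client.solrj.impl.CloudSolrServer;

// One CloudSolrServer (and thus one ZK connection) for the whole process,
// shut down exactly once on exit, instead of a new instance per request.
public class SolrClientHolder {
    private static final CloudSolrServer SERVER =
            new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");

    public static CloudSolrServer get() {
        return SERVER; // threadsafe; reuse across requests and threads
    }

    public static void close() {
        SERVER.shutdown(); // releases the ZK connection
    }
}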

Thanks,
Greg



Re: Team please help

2018-04-30 Thread Greg Solovyev
Sujeet, what do you mean by migrating? E.g., are you moving your data from
Cloudera CDH to Azure HDI? Are you migrating your application code written on
top of Cloudera CDH to run on top of Azure HDI? As far as I know, Azure HDI
does not include Solr, so if your application on top of Cloudera CDH is
using Solr, it won't run on HDI.
Greg

On Sat, Apr 28, 2018 at 5:45 PM Sujeet Singh 
wrote:

> Adding Dev
>
>
>
> *From:* Sujeet Singh
> *Sent:* Sunday, April 29, 2018 12:14 AM
> *To:* 'solr-user@lucene.apache.org'
> *Subject:* Team please help
>
>
>
> Team, I am facing an issue right now. I am working to migrate from
> Cloudera to Azure HDI. Now Cloudera has a Solr implementation using the
> jar below:
>
> search-mr-1.0.0-cdh5.7.0-job.jar
> org.apache.solr.hadoop.MapReduceIndexerTool
>
>
>
> While looking into all the options I found “solr-map-reduce-4.9.0.jar” and
> tried using it with the class “org.apache.solr.hadoop.MapReduceIndexerTool”. I
> tried adding the lib details in solrconfig.xml but it did not work. I am
> getting the error
>
> “Caused by: java.lang.ClassNotFoundException:
> org.apache.solr.morphlines.solr.DocumentLoader”
>
>
>
> Please let me know the right way to use MapReduceIndexerTool class.
>
>
>
> Regards,
> --
>
> *Sujeet Singh* | Sr. Software Analyst | cloudmoyo | *E.*
> sujeet.si...@cloudmoyo.com | *M.* +91 9860586055
>
> www.cloudmoyo.com
>
>
>