Range Facet queries for date ranges with non-constant gaps

2015-07-13 Thread JoeSmith
I am trying to do a range facet query on date ranges.  The query below
executes and returns results (almost) as desired for 60DAY buckets.



http://localhost:8983/solr/mykeyspace2.user_data/select?wt=json&fq:id=7465033&q=*:*&rows=0&indent=true&facet=on&facet.range=login_event&facet.range.gap=%2B60DAY&facet.range.start=NOW/YEAR&facet.range.end=NOW/MONTH%2B1MONTH&facet.range.min=1



Some of the buckets return with a count of ‘0’ even though
facet.range.min is set to ‘1’.  That is not the primary issue though.
What I would like to get back are buckets with unevenly spaced gaps.  For
example, counts for the last 7 days, last 30 days, last 90 days.


What would be the best way to accomplish this?  And is there something
wrong with my facet.range.min usage?


Re: Range Facet queries for date ranges with non-constant gaps

2015-07-13 Thread JoeSmith
Are there any examples/documentation for IntervalFaceting using dates that
I could refer to?

On Mon, Jul 13, 2015 at 6:36 PM, Chris Hostetter 
wrote:

>
> : Some of the buckets return with a count of ‘0’ in the bucket even though
> : the facet.range.min is set to ‘1’.  That is not the primary issue
>
> facet.range.min has never been a supported (or documented) param -- you
> are most likely trying to use "facet.mincount" (which can be specified
> per field as a top level f.my_field_name.facet.mincount, or as a
> localparam, ex: facet.range={!facet.mincount=1}my_field_name)
>
> : though. What I would like to get back are buckets of unevenly spaced
> : gaps.  For example, counts for the last 7 days, last 30 days, last 90
> : days.
>
> what you are describing is exactly what the "Interval Faceting" feature
> provides...
>
>
> https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-IntervalFaceting
>
>
> -Hoss
> http://www.lucidworks.com/
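
To make the two suggestions above concrete, here is a minimal sketch in Python that builds the request parameters. It assumes the core name and the login_event field from the original query, computes explicit UTC timestamps for the 7/30/90-day windows, and uses the per-field f.<field>.facet.mincount form mentioned above. The helper names are hypothetical and this is only an illustration, not a tested recipe for any particular Solr version.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Timestamp layout used by the date values elsewhere in this thread.
SOLR_DATE = "%Y-%m-%dT%H:%M:%SZ"


def interval_facet_url(base_url, field, days_back, now=None):
    """Build a select URL with one facet interval per look-back window.

    base_url, field and days_back are caller-supplied; nothing here talks
    to Solr, it only assembles the query string.
    """
    now = now or datetime.now(timezone.utc)
    end = now.strftime(SOLR_DATE)
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.interval", field),
    ]
    for days in days_back:
        start = (now - timedelta(days=days)).strftime(SOLR_DATE)
        # One closed interval [start,end] per window: last 7, 30, 90 days...
        params.append(("facet.interval.set", f"[{start},{end}]"))
    return f"{base_url}?{urlencode(params)}"


def range_facet_params(field, start, end, gap, mincount=1):
    """Range-facet parameters using facet.mincount instead of facet.range.min."""
    return [
        ("facet", "true"),
        ("facet.range", field),
        ("facet.range.start", start),
        ("facet.range.end", end),
        ("facet.range.gap", gap),
        # Per-field form of facet.mincount, as described in the reply above.
        (f"f.{field}.facet.mincount", str(mincount)),
    ]
```

For example, interval_facet_url("http://localhost:8983/solr/mykeyspace2.user_data/select", "login_event", [7, 30, 90]) yields a URL with three facet.interval.set parameters that all end at the current time.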


Re: Range Facet queries for date ranges with non-constant gaps

2015-07-18 Thread JoeSmith
Thank you.  That helped.

On Tue, Jul 14, 2015 at 5:02 PM, Chris Hostetter 
wrote:

>
> : Are there any examples/documentation for IntervalFaceting using dates that
> : I could refer to?
>
> You just specify the interval set start & end as properly formatted date
> values.  This example shows some range faceting and interval faceting on
> the same field of the "bin/solr -e techproducts" example.
>
>
> http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.interval.set=[2006-01-01T00:00:00Z,2007-01-01T00:00:00Z]&facet.interval.set=[2005-01-01T00:00:00Z,2006-01-01T00:00:00Z]&facet.interval.set=[2005-01-01T00:00:00Z,2007-01-01T00:00:00Z]&facet.interval=manufacturedate_dt&facet.range=manufacturedate_dt&facet.range.start=2005-01-01T00:00:00Z&facet.range.end=2007-01-01T00:00:00Z&facet.range.gap=%2B2MONTHS
>
>
>
> -Hoss
> http://www.lucidworks.com/
>
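
The example URL above is long enough that it can be hard to see which parameters belong to which feature. The short Python sketch below only decodes that URL and groups its parameters into the interval-facet and range-facet halves. It does not contact Solr, and the function name split_facet_params is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# The techproducts example URL from the reply above, split for readability.
EXAMPLE_URL = (
    "http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true"
    "&facet.interval.set=[2006-01-01T00:00:00Z,2007-01-01T00:00:00Z]"
    "&facet.interval.set=[2005-01-01T00:00:00Z,2006-01-01T00:00:00Z]"
    "&facet.interval.set=[2005-01-01T00:00:00Z,2007-01-01T00:00:00Z]"
    "&facet.interval=manufacturedate_dt"
    "&facet.range=manufacturedate_dt"
    "&facet.range.start=2005-01-01T00:00:00Z"
    "&facet.range.end=2007-01-01T00:00:00Z"
    "&facet.range.gap=%2B2MONTHS"
)


def split_facet_params(url):
    """Return (interval_params, range_params) decoded from a select URL."""
    params = parse_qs(urlsplit(url).query)
    interval = {k: v for k, v in params.items() if k.startswith("facet.interval")}
    ranges = {k: v for k, v in params.items() if k.startswith("facet.range")}
    return interval, ranges


if __name__ == "__main__":
    interval, ranges = split_facet_params(EXAMPLE_URL)
    for name, values in {**interval, **ranges}.items():
        print(name, values)
```

Seen this way, the interval half is one facet.interval field plus three arbitrary [start,end] sets (two one-year windows and one two-year window), while the range half is a single start/end/gap triple that can only produce evenly spaced buckets.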


CloudSolrServer, concurrency and too many connections

2014-12-06 Thread JoeSmith
We are using Solrj 10.10.0 to connect to a Zookeeper Solr host.  What is
the correct pattern for making concurrent requests to the Zookeeper host?

We are currently using CloudSolrServer, but it looks like this class is not
thread-safe (setDefaultCollection). Should this instance be initialized
once (at startup) and then re-used (in all threads) until shutdown when the
process terminates?  Or should it be re-instantiated for each request?

Currently, we are trying to use CloudSolrServer as a singleton, but it
looks like the connections to the host are not being closed and under load
we start getting failures.  In the Zookeeper logs we see this error:

> WARN  - 2014-12-04 10:09:14.364;
> org.apache.zookeeper.server.NIOServerCnxnFactory; Too many connections from
> /11.22.33.44 - max is 60
>

netstat (on the Zookeeper host) shows that the connections are not being
closed. What is the 'correct' way to fix this?  Apologies if I have missed
any documentation that explains this; pointers would be helpful.

Thanks,


Re: CloudSolrServer, concurrency and too many connections

2014-12-07 Thread JoeSmith
I've upgraded to 4.10.2 on the client-side.  Still seeing this connection
problem when connecting to the Zookeeper port.  If I connect directly to
SolrServer, the connections do not increase.  But when connecting to
Zookeeper, the connections increase up to 60 and then start to fail.  I
understand Zookeeper is configured to fail after 60 connections to prevent
a DOS attack, but I don't see why we keep adding new connections (up to
60).  Does the client-side Zookeeper code also use HttpClient
ConnectionPooling for its Connection Pool?  Below is the Exception that
shows up in the log file when this happens.  When we execute queries we are
using the _route_ parameter; could this explain anything?

o.a.zookeeper.ClientCnxn - Session 0x0 for server
aweqca3utmtc10.cloud..com/10.22.10.107:9983, unexpected error, closing
socket connection and attempting reconnect

java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.7.0_55]
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.7.0_55]
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.7.0_55]
    at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.7.0_55]
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) ~[na:1.7.0_55]
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) ~[zookeeper-3.4.6.jar:3.4.6-1569965]


Will try to get the server code upgraded to 4.10.2.



On Sat, Dec 6, 2014 at 3:52 PM, Shawn Heisey  wrote:

> On 12/6/2014 12:09 PM, JoeSmith wrote:
> > We are currently using CloudSolrServer, but it looks like this class is not
> > thread-safe (setDefaultCollection). Should this instance be initialized
> > once (at startup) and then re-used (in all threads) until shutdown when the
> > process terminates?  Or should it be re-instantiated for each request?
> >
> > Currently, we are trying to use CloudSolrServer as a singleton, but it
> > looks like the connections to the host are not being closed and under load
> > we start getting failures.  In the Zookeeper logs we see this error:
> >
> >> WARN  - 2014-12-04 10:09:14.364;
> >> org.apache.zookeeper.server.NIOServerCnxnFactory; Too many connections from
> >> /11.22.33.44 - max is 60
> >
> > netstat (on the Zookeeper host) shows that the connections are not being
> > closed. What is the 'correct' way to fix this?  Apologies if I have missed
> > any documentation that explains this; pointers would be helpful.
>
> All SolrServer implementations in SolrJ, including CloudSolrServer, are
> supposed to be threadsafe.  If it turns out they're not actually
> threadsafe, then we treat that as a bug.  The discussion to determine
> that it's a bug takes place on this mailing list, and once we determine
> that, the next step is to file an issue in Jira.
>
> The general way to use SolrJ is to initialize the server instance at the
> beginning and re-use it for all client communication to Solr.  With
> CloudSolrServer, you normally only need a single server instance to talk
> to the entire cloud, because you can set the "collection" parameter on
> each request to indicate which collection to work on.  If you only have
> a handful of collections, you might want to use multiple instances and
> use setDefaultCollection  to specify the collection.  With
> HttpSolrServer, an instance is required for each core, because the core
> name is in the initialization URL.
>
> I've not looked at the code, but I can't imagine that the client ever
> needs to make more than one connection to each server in the zookeeper
> ensemble.  Here's a list of the open connections on one of my zookeeper
> servers for my SolrCloud 4.2.1 install:
>
> java    21800 root   21u  IPv6    2836983  0t0  TCP 10.8.0.151:50178->10.8.0.152:2888 (ESTABLISHED)
> java    21800 root   22u  IPv6    2661097  0t0  TCP 10.8.0.151:3888->10.8.0.152:34116 (ESTABLISHED)
> java    21800 root   26u  IPv6   28065088  0t0  TCP 10.8.0.151:2181->10.8.0.141:52583 (ESTABLISHED)
> java    21800 root   27u  IPv6   23967470  0t0  TCP 10.8.0.151:2181->10.8.0.152:49436 (ESTABLISHED)
> java    21800 root   28r  IPv6   23969636  0t0  TCP 10.8.0.151:2181->10.8.0.151:57290 (ESTABLISHED)
> java    21800 root   29r  IPv6   23969951  0t0  TCP 10.8.0.151:3888->10.8.0.153:54
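
A listing like the one above can be summarized per client host, which is the quantity the "Too many connections from ... max is 60" warning is about (as far as I know that limit is ZooKeeper's per-IP maxClientCnxns setting). The Python sketch below counts established connections to the client port by remote address. It uses only the five complete lines of the quoted output, assumes client port 2181 as in that output, and the function names are hypothetical.

```python
import re
from collections import Counter

# The five complete lines from the lsof output quoted above
# (the truncated last line is left out).
LSOF_SAMPLE = """\
java    21800 root   21u  IPv6    2836983  0t0  TCP 10.8.0.151:50178->10.8.0.152:2888 (ESTABLISHED)
java    21800 root   22u  IPv6    2661097  0t0  TCP 10.8.0.151:3888->10.8.0.152:34116 (ESTABLISHED)
java    21800 root   26u  IPv6   28065088  0t0  TCP 10.8.0.151:2181->10.8.0.141:52583 (ESTABLISHED)
java    21800 root   27u  IPv6   23967470  0t0  TCP 10.8.0.151:2181->10.8.0.152:49436 (ESTABLISHED)
java    21800 root   28r  IPv6   23969636  0t0  TCP 10.8.0.151:2181->10.8.0.151:57290 (ESTABLISHED)
"""

# local_host:local_port->remote_host:remote_port (ESTABLISHED)
CONN = re.compile(r"TCP (\S+):(\d+)->(\S+):(\d+) \(ESTABLISHED\)")


def client_connections(lsof_text, client_port=2181):
    """Count established connections to client_port, keyed by remote host."""
    counts = Counter()
    for line in lsof_text.splitlines():
        match = CONN.search(line)
        if match and int(match.group(2)) == client_port:
            counts[match.group(3)] += 1
    return counts


def hosts_at_limit(counts, limit=60):
    """Hosts whose connection count has reached the per-host limit."""
    return {host: n for host, n in counts.items() if n >= limit}
```

On the sample, each of the three remote hosts holds exactly one connection to port 2181, which matches the expectation that a client needs only one connection per ZooKeeper server. A host approaching 60 in the same summary would be the one to investigate.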

Re: CloudSolrServer, concurrency and too many connections

2014-12-08 Thread JoeSmith
We will need to update to 7u72; we are using 7u55.  On the client side,
this happens with Zookeeper 3.4.6 and SolrJ 4.10.2.  And we will need to
update both on the server side.   What kind of config/setup information
would you need to see if we do still have an issue after these updates?

On Mon, Dec 8, 2014 at 12:40 AM, Shawn Heisey  wrote:

> On 12/7/2014 9:11 PM, JoeSmith wrote:
> > I've upgraded to 4.10.2 on the client-side.  Still seeing this connection
> > problem when connecting to the Zookeeper port.  If I connect directly to
> > SolrServer, the connections do not increase.  But when connecting to
> > Zookeeper, the connections increase up to 60 and then start to fail.  I
> > understand Zookeeper is configured to fail after 60 connections to prevent
> > a DOS attack, but I don't see why we keep adding new connections (up to
> > 60).  Does the client-side Zookeeper code also use HttpClient
> > ConnectionPooling for its Connection Pool?  Below is the Exception that
> > shows up in the log file when this happens.  When we execute queries we are
> > using the _route_ parameter; could this explain anything?
>
> The docs say that Zookeeper uses NIO communication directly by default,
> so there's no layer like HttpClient.  I don't think it uses pooling ...
> it does everything over a single TCP connection that doesn't normally
> disconnect until the program exits.
>
> Basically, the Zookeeper authors built their own networking layer that
> uses TCP directly.  You have the option of using Netty instead:
>
>
> http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#Communication+using+the+Netty+framework
>
> Are you running version 3.4.6 for your zookeeper servers?  That's the
> version of ZK client code you'll find in Solr 4.10.x, and the
> recommended version for both the server and your SolrJ program.
>
> The most likely reasons for the connection problems you are seeing are:
>
> 1) A bug in the networking layer of your JVM.
> 1a) The latest Oracle Java 7 (currently 7u72) is highly recommended.
> 2) A bug or misconfig in the OS TCP stack, or possibly its firewall.
> 3) A bug or misconfig in zookeeper.
>
> I can't rule out the fourth possibility, but so far I think it's unlikely:
>
> 4) A bug in SolrJ that has not yet been reported or fixed.
>
> Thanks,
> Shawn
>
>
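
The usage pattern recommended earlier in this thread (create one client at startup, share it across threads, pick the collection per request) combined with the point above that each client holds one long-lived ZooKeeper connection can be sketched as follows. This is Python with a hypothetical stand-in class, not SolrJ: StubCloudClient, get_client and the "zk1:2181" address are all invented for illustration, and the stub only counts how many times a "connection" would have been opened.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StubCloudClient:
    """Hypothetical stand-in for a cloud-aware Solr client (not SolrJ)."""

    connections_opened = 0
    _count_lock = threading.Lock()

    def __init__(self, zk_host):
        self.zk_host = zk_host
        # Pretend each instance opens one long-lived ZooKeeper connection.
        with StubCloudClient._count_lock:
            StubCloudClient.connections_opened += 1

    def query(self, collection, q):
        # The collection is passed per request, so one instance serves all.
        return f"{collection}:{q}"


_client = None
_client_lock = threading.Lock()


def get_client(zk_host="zk1:2181"):
    """Create the shared client on first use, then always return it."""
    global _client
    if _client is None:
        with _client_lock:
            if _client is None:  # re-check once the lock is held
                _client = StubCloudClient(zk_host)
    return _client


def handle_request(i):
    """One simulated request; alternates between two collections."""
    return get_client().query(f"collection{i % 2}", "*:*")


def run_demo(requests=100, workers=8):
    """Issue requests from several threads through the shared client."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, range(requests)))
```

After run_demo() the stub reports a single opened connection no matter how many threads or requests were used. Constructing a new client inside handle_request instead would open one per request, which is one (unconfirmed) way a per-host connection limit could be reached.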


Re: CloudSolrServer, concurrency and too many connections

2014-12-08 Thread JoeSmith
Thanks, Shawn.  I updated to 7u72 and was not able to reproduce the
problem. That was good.  But just to be sure about this, I went back down
to 7u55 and again was not able to reproduce it.  So at least for now, this
has gone away even if the reason is inconclusive.


On Mon, Dec 8, 2014 at 7:37 AM, JoeSmith  wrote:

> We will need to update to 7u72, we are using 7u55.  On the client side,
> this happens with zookeeper 3.4.6 and 4.10.2 solrj.  And we will need to
> update both on the server side.   What kind of config/setup information
> would you need to see if we do still have an issue after these updates?
>