Re: Streaming Expression joins not returning all results

2016-05-16 Thread Ryan Cutter
h+Worker+Collections > > As I work on the documentation I'll revalidate the performance numbers I > was seeing when I did the performance testing several months ago. > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Mon, May 16, 2016 at 10:51 AM, Ryan Cutter

Re: Streaming Expression joins not returning all results

2016-05-16 Thread Ryan Cutter
ly take some time to complete though as you are > sorting > >> and exporting 30,000,000 docs from a single node. > >> > >> 2) Then try running the same *:* search() against the /export handler in > >> parallel() gradually increasing the number of worker
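To make step 2 concrete, here is a hedged sketch of wrapping an /export search in parallel() — the worker collection name (workerCollection), the zkHost address, the worker count, and the sort field are illustrative assumptions, not values from the thread:

    parallel(workerCollection,
             search(triple, q=*:*, fl="triple_id,subject_id",
                    sort="subject_id asc", qt="/export",
                    partitionKeys="subject_id"),    -- partitionKeys is required so each worker gets a disjoint slice
             workers="4",
             zkHost="localhost:9983",
             sort="subject_id asc")

Each worker pulls only its partition of the stream, so export throughput should scale roughly with the worker count until the shard replicas saturate.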

Re: Streaming Expression joins not returning all results

2016-05-14 Thread Ryan Cutter
ers, you could export 52,000,000 docs per second. With 40 > shards, 5 replicas and 40 workers you could export 130,000,000 docs per > second. > > So with large clusters you could do very large distributed joins with > sub-second performance. > > > > > Joel Bernstein >
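The truncated figures are internally consistent if you assume a per-replica export rate of roughly 650,000 docs per second — an inference, not a number stated in the thread — with throughput multiplying across shards and replicas once enough workers are pulling in parallel:

    total rate ≈ per-replica rate × shards × replicas
    650,000 × 40 × 2 ≈  52,000,000 docs/sec   (40 shards, 2 replicas — the first config is cut off; this is one assignment that fits)
    650,000 × 40 × 5 ≈ 130,000,000 docs/sec   (40 shards, 5 replicas, as quoted)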

Re: Streaming Expression joins not returning all results

2016-05-13 Thread Ryan Cutter
oming in 6.1: > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62693238 > > > > > > Joel Bernstein > > > http://joelsolr.blogspot.com/ > > > > > > On Fri, May 13, 2016 at 5:57 PM, Joel Bernste

Re: Streaming Expression joins not returning all results

2016-05-13 Thread Ryan Cutter
so keep in mind that the /export handler requires > > that sort fields and fl fields have docValues set. > > > > Joel Bernstein > > http://joelsolr.blogspot.com/ > > > > On Fri, May 13, 2016 at 5:36 PM, Ryan Cutter > wrote: > > > >> Question #1:
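Since the /export handler refuses fields without docValues, the fix lives in the schema. A minimal sketch, assuming string-typed IDs (the actual field types aren't shown in the thread):

    <field name="triple_id"  type="string" indexed="true" stored="true" docValues="true"/>
    <field name="subject_id" type="string" indexed="true" stored="true" docValues="true"/>

Note that docValues cannot be toggled on an existing field without reindexing the collection.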

Streaming Expression joins not returning all results

2016-05-13 Thread Ryan Cutter
Question #1: triple_type collection has a few hundred docs and triple has 25M docs. When I search for a particular subject_id in triple which I know has 14 results and do not pass in a 'rows' param, it returns 0 results: innerJoin( search(triple, q=subject_id:1656521, fl="triple_id,subject_id
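As the replies above suggest, search() defaults to the /select handler, which honors rows (10 by default) rather than streaming the full result set; a join over a 25M-doc collection generally needs /export. A hedged sketch of the inner search, with the sort field assumed:

    search(triple, q=subject_id:1656521, fl="triple_id,subject_id",
           sort="subject_id asc", qt="/export")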

Re: Streaming expressions join operations

2016-05-12 Thread Ryan Cutter
ming from the wrapper exception > I > > believe. > > > > Joel Bernstein > > http://joelsolr.blogspot.com/ > > > > On Tue, May 10, 2016 at 12:30 AM, Ryan Cutter > > wrote: > > > >> Yes, the people collection has the personId and pets has ownerId, as

Re: Streaming expressions join operations

2016-05-09 Thread Ryan Cutter
Joel Bernstein > http://joelsolr.blogspot.com/ > > On Mon, May 9, 2016 at 10:43 PM, Ryan Cutter wrote: > > > Thanks Joel, I added the personId and ownerId fields before ingesting a > > little data. I made them to be stored=true/multiValued=false/longs (and > > strings, la
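A sketch of field definitions matching that description — long-typed, single-valued, stored. The docValues attribute is an addition on my part, since the /export-based streaming path requires it:

    <field name="personId" type="long" indexed="true" stored="true" multiValued="false" docValues="true"/>
    <field name="ownerId"  type="long" indexed="true" stored="true" multiValued="false" docValues="true"/>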

Re: Streaming expressions join operations

2016-05-09 Thread Ryan Cutter
> http://joelsolr.blogspot.com/ > > On Mon, May 9, 2016 at 9:22 PM, Ryan Cutter wrote: > > > Hello, I'm checking out the cool stream join operations in Solr 6.0 but > > can't seem to get the example listed on the wiki to work:

Streaming expressions join operations

2016-05-09 Thread Ryan Cutter
Hello, I'm checking out the cool stream join operations in Solr 6.0 but can't seem to get the example listed on the wiki to work: https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions#StreamingExpressions-innerJoin innerJoin( search(people, q=*:*, fl="personId,name", sort="personId
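The expression is cut off above; reconstructed from the snippet and the personId/ownerId fields mentioned upthread, the wiki example reads roughly:

    innerJoin(
      search(people, q=*:*, fl="personId,name",   sort="personId asc"),
      search(pets,   q=*:*, fl="ownerId,petName", sort="ownerId asc"),
      on="personId=ownerId"
    )

innerJoin requires both streams to arrive sorted on the join key, which is why each search sorts on its respective ID field.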

Re: SolrCloud delete by query performance

2015-05-20 Thread Ryan Cutter
Shawn, thank you very much for that explanation. It helps a lot. Cheers, Ryan On Wed, May 20, 2015 at 5:07 PM, Shawn Heisey wrote: > On 5/20/2015 5:57 PM, Ryan Cutter wrote: > > GC is operating the way I think it should but I am lacking memory. I am > > just surprised beca

Re: SolrCloud delete by query performance

2015-05-20 Thread Ryan Cutter
? And if there isn't much slack memory laying around to begin with, there's a bunch of contention/swap? Thanks Shawn! On Wed, May 20, 2015 at 4:50 PM, Shawn Heisey wrote: > On 5/20/2015 5:41 PM, Ryan Cutter wrote: > > I have a collection with 1 billion documents and I want to del

SolrCloud delete by query performance

2015-05-20 Thread Ryan Cutter
I have a collection with 1 billion documents and I want to delete 500 of them. The collection has a dozen shards and a couple replicas. Using Solr 4.4. Sent the delete query via HTTP: http://hostname:8983/solr/my_collection/update?stream.body=<delete><query>source:foo</query></delete> Took a couple minutes and several repli
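For reference, the same delete-by-query expressed as a full request — the XML body is the standard update format; the commit flag is an assumption about how it was run:

    curl 'http://hostname:8983/solr/my_collection/update?commit=true' \
         -H 'Content-Type: text/xml' \
         --data-binary '<delete><query>source:foo</query></delete>'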

Re: Overseer cannot talk to ZK

2014-09-29 Thread Ryan Cutter
Sorry, I believe this can be disregarded. There were changes made to system time that likely caused this state. Apologies, Ryan On Mon, Sep 29, 2014 at 8:24 AM, Ryan Cutter wrote: > Solr 4.7.2 went down during a period of little activity. Wondering if > anyone has an idea about what's

Overseer cannot talk to ZK

2014-09-29 Thread Ryan Cutter
Solr 4.7.2 went down during a period of little activity. Wondering if anyone has an idea about what's going on, thanks! INFO - 2014-09-26 15:35:00.152; org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher fired on path: null state: Disconnected type None then eventually:

Re: Index a time/date range

2014-07-31 Thread Ryan Cutter
> > On Wed, Jul 30, 2014 at 7:29 PM, Jost Baron wrote: > >> Hi Ryan, >> >> On 07/31/2014 01:26 AM, Ryan Cutter wrote: >> > Is there a way to index time or date ranges? That

Index a time/date range

2014-07-30 Thread Ryan Cutter
Is there a way to index time or date ranges? That is, assume 2 docs: #1: date = 2014-01-01 #2: date = 2014-02-01 through 2014-05-01 Would there be a way to index #2's date as a single field and have all the search options you usually get with time/date? One strategy could be to index the start
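The strategy the snippet starts to describe — indexing start and end as separate date fields — supports "contains this instant" and overlap queries with plain range syntax. A sketch, using hypothetical field names start_date/end_date:

    start_date:[* TO 2014-03-15T00:00:00Z] AND end_date:[2014-03-15T00:00:00Z TO *]

This matches every doc whose range covers 2014-03-15; doc #1 would simply be indexed with start_date equal to end_date. (Later Solr releases, 5.0+, added solr.DateRangeField, which indexes a range like [2014-02-01 TO 2014-05-01] as a single field.)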

Re: Branch/Java questions re: contributing code

2014-01-06 Thread Ryan Cutter
Thanks, everything worked fine after these pointers and I was able to generate a patch properly. Cheers, Ryan On Mon, Jan 6, 2014 at 7:31 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > On Mon, Jan 6, 2014 at 8:54 PM, Ryan Cutter wrote: > > 1. Should we be using Ja

Branch/Java questions re: contributing code

2014-01-06 Thread Ryan Cutter
1. Should we be using Java 6 or 7? The docs say 1.6 (http://wiki.apache.org/solr/HowToContribute) but running 'ant test' on trunk/ yields: /lucene/common-build.xml:328: Minimum supported Java version is 1.7. I don't get that error with branch_4x/, which leads to my next question. 2. Should

Re: Heap size and Solr 4.3

2013-12-17 Thread Ryan Cutter
Marcello, Can you quantify what you're seeing? Did you send the JVM any args (Xmx, Xms, etc)? Thanks, Ryan On Mon, Dec 16, 2013 at 1:01 AM, Marcello Lorenzi wrote: > Hi All, > we have deployed a new Solr 4.3 instance (2 nodes with SolrCloud) in our > production environment, but this morning on

Re: Solr hardware memory question

2013-12-10 Thread Ryan Cutter
Shawn's right that if you're going to scale this big you'd be very well served to spend time getting the index as small as possible. In my experience if your searches require real-time random access reads (that is, the entire index needs to be fast), you don't want to wait for HDD disk reads. Get

Re: Constantly increasing time of full data import

2013-12-02 Thread Ryan Cutter
Michal, I don't have much experience with DIH so I'll leave that to someone else but I would suggest you profile Solr during imports. That might show you where the bottleneck is. Generally, it's reasonable to think Solr updates will get slower the larger the indexes get and the more load you put

Re: How to remove a Solr Node and its cores from a cluster SolrCloud and from collection

2013-11-29 Thread Ryan Cutter
For people who run into this situation in the future: I had the exact same problem Sebastien had while using 4.4.0 (1 of my 6 nodes died). We rebuilt a host to take its place but gave it the same hostname instead of making a new one. It was configured the same way with the same config files but w

Re: Unit of dimension for solr field

2013-11-11 Thread Ryan Cutter
I think Upayavira's suggestion of writing a filter factory fits what you're asking for. However, the other end of cleverness is to simply use solr.TrieIntField and store everything in MB. So for 1TB you'd write 1048576 (1024 × 1024). A range query for 256MB to 1GB would be field:[256 TO 1024]. Conversion from
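A minimal sketch of that second approach — the field type and query syntax are standard; the field name size_mb is hypothetical:

    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8"/>
    <field name="size_mb" type="tint" indexed="true" stored="true"/>

A query of size_mb:[256 TO 1024] then covers 256MB through 1GB, with precisionStep trading index size for range-query speed.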