We likely have the same laptop :-)
There must be something weird with my schema or usage but even if I had 10x
the throughput I have now, throwing around that many docs for a single join
isn't conducive to desired latency, concurrent requests, network bandwidth,
etc. I feel like I'm not using the
So, with that setup you're getting around 150,000 docs per second
throughput. On my laptop with a similar query I was able to stream around
650,000 docs per second. I have an SSD and 16 Gigs of RAM. Also I did lots
of experimenting with different numbers of workers and tested after warming
the part
Thanks for all this info, Joel. I found that if I artificially limit the
triples stream to 3M and use the /export handler with only 2 workers, I can
get results in about 20 seconds and Solr doesn't tip over. That seems to be
the best config for this local/single-instance setup.
It's also clear I'm not using st
One other thing to keep in mind is how the partitioning is done when you
add the partitionKeys.
Partitioning is done using the HashQParserPlugin, which builds a filter for
each worker. Under the covers this is using the normal filter query
mechanism. So after the filters are built and cached they are e
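As a rough mental model only (plain Python, not Solr's actual HashQParserPlugin hash function), partitioning assigns each doc to exactly one worker by hashing the partition key modulo the number of workers, so the per-worker filters are disjoint and together cover every doc:

```python
# Toy model of worker partitioning: each doc lands in exactly one
# worker's bucket based on hash(partition key) % num_workers.
# This is NOT Solr's real hash; it only illustrates why the worker
# filters are disjoint and together cover the whole result set.

def partition(keys, num_workers):
    buckets = [[] for _ in range(num_workers)]
    for key in keys:
        buckets[hash(key) % num_workers].append(key)
    return buckets

buckets = partition(range(1000), 8)
assert sum(len(b) for b in buckets) == 1000          # every doc assigned
assert all(len(set(b)) == len(b) for b in buckets)   # no doc sent twice
```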
Ah, you also used 4 shards. That means with 8 workers there were 32
concurrent queries against the /select handler each requesting 100,000
rows. That's a really heavy load!
You can still try out the approach from my last email on the 4 shards
setup, as you add workers gradually you'll gradually ra
Hi Ryan,
The rows=100000 on the /select handler is likely going to cause problems
with 8 workers. This is calling the /select handler with 8 concurrent
workers, each retrieving 100,000 rows. The /select handler bogs down as the
number of rows increases. So using the rows parameter with the /select
Hello, I'm running Solr on my laptop with -Xmx8g and gave each collection 4
shards and 2 replicas.
Even grabbing 100k triple documents (like the following) takes 20
seconds to complete and is prone to falling over. I could try this in a
proper cluster with multiple hosts and more sharding, etc. I
Also the hashJoin is going to read the entire entity table into memory. If
that's a large index that could be using lots of memory.
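For reference, a hashJoin of that shape reads its hashed= stream fully into memory before streaming the other side. A hedged sketch of what it might look like (the entity collection name and all of its fields here are illustrative, not taken from the actual schema):

```
hashJoin(
  search(triple, q="*:*", fl="triple_id,subject_id", sort="subject_id asc", qt="/export"),
  hashed=search(entity, q="*:*", fl="entity_id,entity_name", sort="entity_id asc", qt="/export"),
  on="subject_id=entity_id"
)
```

If the entity index is large, swapping this for a sorted innerJoin keeps memory flat, at the cost of requiring both streams to be sorted on the join key.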
25 million docs should be ok to /export from one node, as long as you have
enough memory to load the docValues for the fields for sorting and
exporting.
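Concretely, /export can only sort on and export fields that have docValues enabled, so each field named in fl and sort needs something like this in schema.xml (field types here are assumptions):

```xml
<field name="triple_id"  type="string" indexed="true" stored="true" docValues="true"/>
<field name="subject_id" type="string" indexed="true" stored="true" docValues="true"/>
```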
Breaking dow
Thanks very much for the advice. Yes, I'm running in a very basic single
shard environment. I thought that 25M docs was small enough to not require
anything special but I will try scaling like you suggest and let you know
what happens.
Cheers, Ryan
On Fri, May 13, 2016 at 4:53 PM, Joel Bernstein wrote:
I would try breaking down the second query to see when the problems occur.
1) Start with just a single *:* search from one of the collections.
2) Then test the innerJoin. The innerJoin won't take much memory as it's a
streaming merge join.
3) Then try the full thing.
If you're running a large joi
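The first two steps might look like the following expressions (collection and field names are taken from elsewhere in the thread; the sort fields and the on key are placeholders, since the real join key isn't shown in these excerpts):

```
search(triple, q="*:*", fl="triple_id,subject_id", sort="subject_id asc", qt="/export")

innerJoin(
  search(triple,      q="*:*", fl="triple_id,subject_id", sort="subject_id asc",     qt="/export"),
  search(triple_type, q="*:*", fl="triple_type_id",       sort="triple_type_id asc", qt="/export"),
  on="subject_id=triple_type_id"
)
```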
qt="/export" immediately fixed the query in Question #1. Sorry for missing
that in the docs!
The second query (with /export) crashes the server, so I was going to look
at parallelization if you think that's a good idea. It also seems unwise
to join into 26M docs, so maybe I can reconfigure the
A couple of other things:
1) Your innerJoin can be parallelized across workers to improve performance.
Take a look at the docs on the parallel function for the details.
2) It looks like you might be doing graph operations with joins. You might
want to take a look at the gatherNodes function coming in 6.1.
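A hedged sketch of the parallel wrapping (workerCollection, zkHost, and the join/partition keys are placeholders; the important part is that partitionKeys matches the join key on each side so matching tuples land on the same worker):

```
parallel(workerCollection,
  innerJoin(
    search(triple,      q="*:*", fl="triple_id,subject_id", sort="subject_id asc",
           qt="/export", partitionKeys="subject_id"),
    search(triple_type, q="*:*", fl="triple_type_id", sort="triple_type_id asc",
           qt="/export", partitionKeys="triple_type_id"),
    on="subject_id=triple_type_id"
  ),
  workers="2",
  zkHost="localhost:9983",
  sort="subject_id asc"
)
```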
When doing things that require all the results (like joins) you need to
specify the /export handler in the search function.
qt="/export"
The search function defaults to the /select handler, which is designed to
return the top N results. The /export handler always returns all results
that match the query.
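Applied to the search from Question #1, that would look something like the following (the sort clause here is an assumption; /export requires an explicit sort on docValues fields):

```
search(triple, q="subject_id:1656521", fl="triple_id,subject_id",
       sort="subject_id asc", qt="/export")
```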
Question #1:
triple_type collection has a few hundred docs and triple has 25M docs.
When I search for a particular subject_id in triple, which I know has 14
results, and do not pass in a 'rows' param, it returns 0 results:
innerJoin(
search(triple, q=subject_id:1656521, fl="triple_id,subject_id
14 matches