Re: Realtime get not always returning existing data

2018-09-28 Thread sgaron cse
Hey Shawn, because this is a test deployment replica is set to 1 so as far as I understand, data will not be replicated for this core. Basically we have two SOLR instances running on the same box. One on port 8983, the other on port 8984. We have 9 cores on this SOLR cloud deployment, 5 of which o

Re: Solr Streaming Queries Performance Issues [v7.2.1]

2018-09-28 Thread Toke Eskildsen
On Thu, 2018-09-27 at 15:52 -0700, RAUNAK AGRAWAL wrote: > But for last few days, we are observing now that streaming facet > response is slower than json facets. Also we have increased the > number of documents in collection (30%). Export performance goes down when segment size goes way up, so I

Re: solr and diversification

2018-09-28 Thread Tim Allison
If you haven’t already, might want to check out maximal marginal relevance...original paper: Carbonell and Goldstein. On Thu, Sep 27, 2018 at 7:29 PM Joel Bernstein wrote: > Yeah, I think your plan sounds fine. > > Do you have a specific use case for diversity of results. I've been > wondering i

Re: matches missing highlight information

2018-09-28 Thread Kudrettin Güleryüz
Hi Edwin, I do not have any modifications in solrconfig.xml for highlighting. Here is the query: http://test-51:8983/solr/mycollection/select?hl.fl=bodync&hl.simple.post=%25highlightpost%25&hl.simple.pre=%25highlightpre%25&hl=on&q=bodync:g12312 Kudret On Fri, Sep 28, 2018 at 2:09 AM Zheng Lin E

Re: Solr Streaming Queries Performance Issues [v7.2.1]

2018-09-28 Thread RAUNAK AGRAWAL
Hey Guys, This is the sample query I am making: curl http://localhost:8983/solr/collection_name/stream -d 'expr=facet(collection_name,q="id:953",bucketSorts="week desc",buckets="week",bucketSizeLimit=200,sum(sales),sum(amount),sum(days))' Also in my collection, I have almost 10 Billion documen
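For readers who want to reproduce the request above from a script, here is a minimal sketch that assembles the same facet() expression and form-encodes it as the POST body. The helper name build_facet_expr is hypothetical, and the sketch only builds the payload; sending it to the /stream endpoint is left out.

```python
from urllib.parse import urlencode


def build_facet_expr(collection, query, bucket, sum_fields, limit=200):
    """Assemble a facet() streaming expression shaped like the one in the message."""
    # One sum(<field>) metric per requested field, comma separated.
    metrics = ",".join(f"sum({field})" for field in sum_fields)
    return (
        f'facet({collection},q="{query}",bucketSorts="{bucket} desc",'
        f'buckets="{bucket}",bucketSizeLimit={limit},{metrics})'
    )


expr = build_facet_expr("collection_name", "id:953", "week", ["sales", "amount", "days"])

# Form-encoded body, equivalent to curl's -d 'expr=...' argument.
body = urlencode({"expr": expr})

print(expr)
print(body)
```

Building the string in one place makes it easier to vary the bucket field or metrics when comparing timings against other facet implementations.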

Re: Realtime get not always returning existing data

2018-09-28 Thread Shawn Heisey
On 9/28/2018 6:09 AM, sgaron cse wrote: because this is a test deployment replica is set to 1 so as far as I understand, data will not be replicated for this core. Basically we have two SOLR instances running on the same box. One on port 8983, the other on port 8984. We have 9 cores on this SOLR

Re: Realtime get not always returning existing data

2018-09-28 Thread Erick Erickson
I've set up a test program on a local machine; we'll see if I can reproduce. Here's the setup: 1> created a 2-shard, leader(primary) only collection 2> added 1M simple docs to it (ids 0-999,999) and some text 3> re-added 100_000 docs with a random id between 0 - 999,999 (inclusive) to ensure t
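A rough sketch of how the ids for step 3 and the matching realtime-get URLs might be generated. This is not Erick's test program: the fixed seed, the rtg_url helper, and the collection name test_collection are assumptions made for illustration, and nothing here contacts Solr.

```python
import random
from urllib.parse import urlencode

NUM_DOCS = 1_000_000   # ids 0-999,999, as in step 2
NUM_READDS = 100_000   # number of docs re-added in step 3

# Fixed seed is an assumption, used only so the run is repeatable.
rng = random.Random(42)

# randint is inclusive on both ends, matching "0 - 999,999 (inclusive)".
readd_ids = [rng.randint(0, NUM_DOCS - 1) for _ in range(NUM_READDS)]


def rtg_url(base, collection, doc_id):
    """Build a realtime-get URL (/get?id=...) for one document id."""
    return f"{base}/solr/{collection}/get?" + urlencode({"id": doc_id})


print(rtg_url("http://localhost:8983", "test_collection", readd_ids[0]))
```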

Re: Solr Streaming Queries Performance Issues [v7.2.1]

2018-09-28 Thread Erick Erickson
It Depends (tm). The behavior changed with Solr 7.5. Here are all the gory details: https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/ and for 7.5+ https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/ Best, Erick On Fri, Sep 28, 2018 at 10:

Re: Solr Streaming Queries Performance Issues [v7.2.1]

2018-09-28 Thread RAUNAK AGRAWAL
Thanks a lot Erick for the documentation. I will go through it and get back to you in case of any queries. Regards, Raunak On Fri, Sep 28, 2018 at 11:09 AM Erick Erickson wrote: > It Depends (tm). The behavior changed with Solr 7.5. Here are all the > gory details: > > > https://lucidworks.com/

Re: Solr Streaming Queries Performance Issues [v7.2.1]

2018-09-28 Thread Toke Eskildsen
RAUNAK AGRAWAL wrote: > curl http://localhost:8983/solr/collection_name/stream -d > 'expr=facet(collection_name,q="id:953",bucketSorts="week > desc",buckets="week",bucketSizeLimit=200,sum(sales), > sum(amount),sum(days))' Stats on numeric fields then. > Also in my collection, I have almost 10

Re: Solr Streaming Queries Performance Issues [v7.2.1]

2018-09-28 Thread RAUNAK AGRAWAL
Thanks a lot, Toke. I will get back to you soon regarding the patch update after discussing it with the team. Thanks & Regards On Fri, Sep 28, 2018 at 11:30 AM Toke Eskildsen wrote: > RAUNAK AGRAWAL wrote: > > > curl http://localhost:8983/solr/collection_name/stream -d > > 'expr=facet(collecti

Re: Solr Streaming Queries Performance Issues [v7.2.1]

2018-09-28 Thread Joel Bernstein
The facet expression is currently not as expressive as the JSON facet API. So for very demanding use cases you can create a more highly tuned JSON facet API call. The good news is we are working on this, and also working on other expressions that can be wrapped around the facet expression to implement
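As a point of comparison, the streaming facet() query from earlier in this thread could be written as a JSON facet API request roughly like the sketch below. The facet name weeks and the metric names sum_sales, sum_amount, and sum_days are made up for the example, and the assumption is that the body would be POSTed to the collection's /query handler; only the payload is built here.

```python
import json

# Terms facet on "week" with three sum() aggregations per bucket.
request = {
    "query": "id:953",
    "limit": 0,  # no documents needed, only facet buckets
    "facet": {
        "weeks": {
            "type": "terms",
            "field": "week",
            "limit": 200,
            "sort": "index desc",  # order buckets by week value, descending
            "facet": {
                "sum_sales": "sum(sales)",
                "sum_amount": "sum(amount)",
                "sum_days": "sum(days)",
            },
        }
    },
}

payload = json.dumps(request, indent=2)
print(payload)
```

The nested "facet" block is where further tuning would go, which is the kind of flexibility the message above refers to.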

Re: solr and diversification

2018-09-28 Thread Joel Bernstein
Interesting, I had not heard of MMR. Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Sep 28, 2018 at 10:43 AM Tim Allison wrote: > If you haven’t already, might want to check out maximal marginal > relevance...original paper: Carbonell and Goldstein. > > On Thu, Sep 27, 2018 at 7:29 PM J

Re: Realtime get not always returning existing data

2018-09-28 Thread Erick Erickson
Well, I flipped indexing on and after another 7 million queries, no fails. No reason to stop just yet, but not encouraging so far... On Fri, Sep 28, 2018, 10:58 Erick Erickson wrote: > I've set up a test program on a local machine, we'll see if I can reproduce > here's the setup: > > 1> created

Re: Solr Streaming Queries Performance Issues [v7.2.1]

2018-09-28 Thread RAUNAK AGRAWAL
Thank you Joel. Looking forward to the latest version of solr. Thanks On Fri, Sep 28, 2018 at 12:22 PM Joel Bernstein wrote: > The facet expression is currently not as expressive as the JSON facet API. > So for very demanding use cases you can create more highly tuned JSON facet > API call. > >

Re: Auto recovery of a failed Solr Cloud Node?

2018-09-28 Thread Christopher Schultz
Shawn, On 9/27/18 10:00, Shawn Heisey wrote: > On 9/27/2018 7:24 AM, Kimber, Mike wrote: >> I'm trying to determine if there is any health check available >> to determine the above and then if the issue happens then an >> automated mechanism in Solr

Re: Exporting data using solr streaming expressions is randomly not working

2018-09-28 Thread Gaini Rajeshwar
Hi Joel, Yes, it worked fine after setting *useFilterForSortedQuery* to false. Does SOLR-8291 address this issue? As per SOLR-8291, it happens based on number of segments. But in my case, q=*:* always works irrespective of number of segments. Queries in

Re: Auto recovery of a failed Solr Cloud Node?

2018-09-28 Thread Shawn Heisey
On 9/28/2018 4:18 PM, Christopher Schultz wrote: I thought someone recently mentioned (but I cannot find a reference, sorry) that Solr would automatically restart if an OutOfMemoryError was encountered. Is that only for single-node Solr (i.e. non-cloud/ZK)? On non-Windows systems, Solr include

Re: Realtime get not always returning existing data

2018-09-28 Thread sgaron cse
@Shawn We're running two instances on one machine for two reasons: 1. The box has plenty of resources (48 cores / 256GB ram) and since I was reading that it's not recommended to use more than 31GB of heap in SOLR we figured 96 GB for keeping index data in OS cache + 31 GB of heap per instance was a g