Re: cursorMark and shards? (6.6.2)

2020-02-10 Thread Walter Underwood
sort="id asc"

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Feb 10, 2020, at 9:50 PM, Tim Casey wrote:
>
> Walter,
>
> When you do the query, what is the sort of the results?
>
> tim
>
> On Mon, Feb 10, 2020 at 8:44 PM Walter Underwood
> wrote
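The sort Walter quotes is the key requirement: cursorMark pagination needs a sort that ends on the uniqueKey field (here `id asc`) so the cursor is stable across requests. A minimal sketch of the cursor loop in Python, with a hypothetical `fetch_page` callable standing in for the actual HTTP request to Solr's `/select` handler:

```python
def cursor_iterate(fetch_page):
    """Iterate over all documents using Solr's cursorMark protocol.

    fetch_page(cursor) must return a dict shaped like Solr's JSON
    response: {"response": {"docs": [...]}, "nextCursorMark": "..."}.
    The query behind it must sort on the uniqueKey (e.g. sort=id asc).
    """
    cursor = "*"  # the initial cursorMark value is always "*"
    while True:
        body = fetch_page(cursor)
        for doc in body["response"]["docs"]:
            yield doc
        next_cursor = body["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor means no more docs
            return
        cursor = next_cursor
```

The loop terminates when Solr returns the same cursorMark it was given, which is the documented end-of-results signal.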

Re: Solr Analyzer : Filter to drop tokens based on some logic which needs access to adjacent tokens

2020-02-10 Thread Emir Arnautović
Hi Pratik,

Shingle filter should do that.

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

> On 10 Feb 2020, at 18:57, Pratik Patel wrote:
>
> Thanks for the reply Emir.
>
> I will be exploring the opti
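For readers unfamiliar with what Emir is suggesting: the shingle filter concatenates runs of adjacent tokens into n-grams, which is what gives a filter visibility into neighboring tokens. A rough Python illustration of the concept only (this is not Solr code; in Solr it is configured via `ShingleFilterFactory` in the field type, and by default that filter also emits the original unigrams):

```python
def shingles(tokens, min_size=2, max_size=2, sep=" "):
    """Emit token n-grams the way Lucene's ShingleFilter does,
    conceptually: every run of min_size..max_size adjacent tokens,
    joined by sep."""
    out = []
    for i in range(len(tokens)):
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                out.append(sep.join(tokens[i:i + n]))
    return out
```

For example, `shingles(["please", "divide", "this"])` yields the bigrams `["please divide", "divide this"]`.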

Re: cursorMark and shards? (6.6.2)

2020-02-10 Thread Tim Casey
Walter,

When you do the query, what is the sort of the results?

tim

On Mon, Feb 10, 2020 at 8:44 PM Walter Underwood wrote:

> I’ll back up a bit, since it is sort of an X/Y problem.
>
> I have an index with four shards and 17 million documents. I want to dump
> all the docs in JSON, label each

Re: cursorMark and shards? (6.6.2)

2020-02-10 Thread Walter Underwood
I’ll back up a bit, since it is sort of an X/Y problem. I have an index with four shards and 17 million documents. I want to dump all the docs in JSON, label each one with a classifier, then load them back in with the labels. This is a one-time (or rare) bootstrap of the classified data. This w

Re: cursorMark and shards? (6.6.2)

2020-02-10 Thread Michael Gibney
Possibly worth mentioning, although it might not be appropriate for your use case: if the fields you're interested in are configured with docValues, you could use streaming expressions (or directly handle thread-per-shard connections to the /export handler) and get everything in a single shot witho
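Michael's suggestion boils down to sending a streaming expression at the collection's `/stream` endpoint and letting it read from `/export`. A hedged sketch of what such a request looks like, built in Python — the collection and field names (`mycoll`, `label_s`) are made up, and note that `/export` requires docValues on every field in `fl` and `sort`:

```python
from urllib.parse import urlencode

# Sketch only: collection and field names are hypothetical. The /export
# handler streams every matching doc in sorted order, without cursor
# round-trips, but every field in fl and sort must have docValues.
expr = 'search(mycoll, q="*:*", fl="id,label_s", sort="id asc", qt="/export")'
url = "http://localhost:8983/solr/mycoll/stream?" + urlencode({"expr": expr})
```

The appeal for a bulk dump is that this is a single streamed response per shard rather than many paged requests.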

Re: cursorMark and shards? (6.6.2)

2020-02-10 Thread Erick Erickson
Any field that’s unique per doc would do, but yeah, that’s usually an ID. Hmmm, I don’t see why separate queries for 0-f are necessary if you’re firing at individual replicas. Each replica should have multiple UUIDs that start with 0-f. Unless I misunderstand and you’re just firing off, say, 16
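The 16-way split Erick describes (one query per leading hex digit of the UUID) can be generated mechanically. A small sketch, assuming the uniqueKey field is named `id`:

```python
def hex_partition_queries(field="id"):
    """One wildcard query per leading hex digit, covering the whole
    UUID key space so the 16 dumps can run in parallel."""
    return [f"{field}:{d:x}*" for d in range(16)]
```

Each of the 16 queries can then drive its own parallel cursorMark dump, since no document matches more than one prefix.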

Re: cursorMark and shards? (6.6.2)

2020-02-10 Thread Walter Underwood
> On Feb 10, 2020, at 2:24 PM, Walter Underwood wrote:
>
> Not sure if range queries work on a UUID field, ...

A search for id:0* took 260 ms, so it looks like they work just fine. I’ll try separate queries for 0-f.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org

Re: cursorMark and shards? (6.6.2)

2020-02-10 Thread Walter Underwood
I’ll give that a shot. Not sure if range queries work on a UUID field, but I have thought of segmenting the ID space and running parallel queries on those. Right now it is sucking over 1.6 million docs per hour, so that is bearable. Making it 4X or 16X faster would be nice, though.

wunder
Wal
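At the quoted rate, the full dump takes on the order of ten hours, which is what makes the parallel split attractive. A quick back-of-envelope:

```python
docs = 17_000_000          # index size from the thread
rate_per_hour = 1_600_000  # observed single-threaded dump rate

hours = docs / rate_per_hour  # ~10.6 hours single-threaded
hours_4x = hours / 4          # ~2.7 hours with 4 parallel dumps (one per shard)
hours_16x = hours / 16        # under an hour with 16 hex-prefix partitions
```

This assumes the parallel dumps scale linearly, which holds only until the cluster or the client becomes the bottleneck.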

Re: cursorMark and shards? (6.6.2)

2020-02-10 Thread Erick Erickson
Not sure whether cursorMark respects distrib=false, although I can easily see there being “complications” here. Hmmm, whenever I try to use distrib=false, I usually fire the query at the specific replica rather than use the shards parameter. IDK whether that’ll make any difference. https://nod
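Concretely, Erick's pattern means addressing a core directly instead of the collection. A sketch of such a URL in Python (the core name here is hypothetical):

```python
# Hypothetical core name; with distrib=false only the core you address
# answers the query, instead of a distributed scatter/gather across shards.
core = "mycoll_shard1_replica_n1"
url = (f"http://localhost:8983/solr/{core}/select"
       f"?q=*:*&distrib=false&sort=id+asc&cursorMark=*&rows=500")
```

Whether cursorMark behaves sensibly in this non-distributed mode is exactly the open question in this thread, so treat this as the pattern to test, not a confirmed recipe.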

Split Shard - HDFS Index - Solr 7.6.0

2020-02-10 Thread Joe Obernberger
Hi All - Getting this error when trying to split a shard. HDFS has space available, but it looks like it is using the local disk storage value instead of available HDFS disk space. Is there a workaround? Thanks!

{
    "responseHeader": {
        "status": 0,
        "QTime": 6
    },
    "Op

cursorMark and shards? (6.6.2)

2020-02-10 Thread Walter Underwood
I tried to get fancy and dump our content with one thread per shard, but it did distributed search anyway. I specified the shard using the “shards” param and set distrib=false. Is this a bug or expected behavior in 6.6.2? I did not see it mentioned in the docs. It is working fine with a single

Re: Solr Analyzer : Filter to drop tokens based on some logic which needs access to adjacent tokens

2020-02-10 Thread Pratik Patel
Thanks for the reply Emir. I will be exploring the option of creating a custom filter. It's good to know that we can consume more than one token from the previous filter and emit a different number of tokens. Do you know of any existing filter in Solr which does something similar? It would be greatly h

Solr 8.2 replicas use only 1 CPU at 100% every solr.autoCommit.maxTime minutes

2020-02-10 Thread Vangelis Katsikaros
Hi all

We run Solr 8.2.0
* with Amazon Corretto 11.0.5.10.1 SDK (java arguments shown in [1]),
* on Ubuntu 18.04,
* on AWS EC2 m5.2xlarge with 8 CPUs and 32GB of RAM,
* with -Xmx16g [1].

We have migrated from Solr 3.5 and in big core (16GB) replicas we have started to suffer degraded service. The r

Re: SORLCLOUD

2020-02-10 Thread Erick Erickson
You’ve misconfigured the startup. Although looking at the script help it is a little confusing. The -z parameter should be the _ensemble_. Pointing each Solr instance three times at the same ZK instance is not at all what you need to do. You should start them up with the “-z” parameter set to so

SORLCLOUD

2020-02-10 Thread noman
I have created three different SolrCloud instances running on three different ports, with an external ZooKeeper ensemble of 3 instances linked to them, and when I load the data into one SolrCloud instance, it is successfully accessible from all three SolrCloud instances. E.g: Zookeeper ./zkServer start zoo.cfg

Re: Using MM efficiently to get right number of results

2020-02-10 Thread Erick Erickson
There isn’t really an “industry standard”, since the reasons someone wants this kind of behavior vary from situation to situation. That said, Solr has RerankQParserPlugin that’s designed for this. Best, Erick > On Feb 10, 2020, at 4:23 AM, Nitin Arora wrote: > > I am looking for an efficient w
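To make the re-ranking idea concrete: run a permissive main query (so you rarely get zero results) and let a stricter query re-score only the top N hits. A hedged sketch of the request parameters in Python — the query strings are made up for illustration; the `rq` syntax is Solr's rerank query parser:

```python
# Hypothetical queries; rq re-scores only the top reRankDocs hits of q,
# blending the rerank score in with reRankWeight.
params = {
    "q": "ipod mini",                 # permissive main query (e.g. low mm)
    "rq": "{!rerank reRankQuery=$rqq reRankDocs=200 reRankWeight=2}",
    "rqq": '"ipod mini"~2',           # stricter query used for re-ranking
}
```

This way recall is governed by `q` while precision at the top is governed by `rqq`, which addresses the zero-results-versus-noise trade-off in the original question.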

Using MM efficiently to get right number of results

2020-02-10 Thread Nitin Arora
I am looking for an efficient way for setting the MM(minimum should match) parameter for my solr search queries. As we go from MM=100% to MM=0%, we move from lots of zero result queries on one hand to too many irrelevant results (which may then get boosted by other factors) on the other. I can thin
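One common client-side pattern between the two extremes is to issue the query at a strict mm first and relax it only when too few results come back. A sketch with a stubbed search function (hypothetical; real code would re-issue the request to Solr with a different `mm` value on the eDisMax parser):

```python
def search_with_backoff(search, mm_steps=("100%", "75%", "50%"), min_hits=10):
    """Try the strictest mm first; relax only while results are too few.

    search(mm) is a callable returning the hit list for that mm value.
    Returns (mm_used, hits) for the first step with enough hits, or the
    last step's results if none qualify.
    """
    hits = []
    for mm in mm_steps:
        hits = search(mm)
        if len(hits) >= min_hits:
            return mm, hits
    return mm_steps[-1], hits
```

The cost is an extra round-trip per relaxation step, so in practice the step list is kept short.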