Re: Shingles behavior

2020-05-20 Thread Radu Gheorghe
Hi Alex, long time no see :) I tried with sow, and that basically invalidates query-time shingles (it only matches mona OR lisa OR smile). I'm using shingles at both index and query time as a substitute for pf2 and pf3: the more shingles I match, the more relevant the document. Also, higher order

Re: Need help on handling large size of index.

2020-05-20 Thread Shawn Heisey
On 5/20/2020 11:43 AM, Modassar Ather wrote: Can you please help me with following few questions? - What is the ideal index size per shard? We have no way of knowing that. A size that works well for one index use case may not work well for another, even if the index size in both cases i

MIGRATE without split.key?

2020-05-20 Thread slly
Hello everyone, I want to migrate data from one collection to another with MIGRATE API, but if this parameter split.key is not specified, it cannot be executed. Why can't we remove this limitation? Is there a better way to migrate data? Thanks.

Solr Atomic update change value and field name

2020-05-20 Thread Hup Chen
I am new to Solr. I tried to do an atomic update using a .json file. $SOLR/bin/post not only changed the field values, but the field name also became "fieldname.set"; for instance, "price" became "price.set". Updating by curl through the /update handler was working well but since I have several millions of

Re: Need help on handling large size of index.

2020-05-20 Thread Phill Campbell
In my world your index size is common. Optimal Index size: Depends on what you are optimizing for. Query Speed? Hardware utilization? Optimizing the index is something I never do. We live with about 28% deletes. You should check your configuration for your merge policy. I run 120 shards, and I

Re: This IndexSchema is not mutable. Solr 7.3.1

2020-05-20 Thread Shawn Heisey
On 5/20/2020 4:30 PM, Vincenzo D'Amore wrote: another update. I think I found the problem. This error is generated when I have defined add-schema-fields in the updateRequestProcessorChain. In other words you can have ClassicIndexSchemaFactory but (and it makes sense) add-schema-fields has to be remo

Re: This IndexSchema is not mutable. Solr 7.3.1

2020-05-20 Thread Vincenzo D'Amore
Hi all, another update. I think I found the problem. This error is generated when I have defined add-schema-fields in the updateRequestProcessorChain. In other words you can have ClassicIndexSchemaFactory but (and it makes sense) add-schema-fields has to be removed from the updateRequestProcessorChain:

Re: when to use docvalue

2020-05-20 Thread Revas
Thanks, Erick. It's just that when we enable both index=true and docValues=true, it increases the index time by at least 2x for a full re-index. On Wed, May 20, 2020 at 2:30 PM Erick Erickson wrote: > Revas: > > Facet queries are just queries that are constrained by the total result > set of your > prima

Re: Shingles behavior

2020-05-20 Thread Alexandre Rafalovitch
Did you try it with 'sow' parameter both ways? I am not sure I fully understand the question, especially with shingling on both passes rather than just indexing one. But at least it is something to try and is one of the difference areas between Solr and ES. Regards, Alex. On Tue, 19 May 2020 a

Re: This IndexSchema is not mutable. Solr 7.3.1

2020-05-20 Thread Vincenzo D'Amore
Hi Erick, thanks for the prompt support, I'm sure all the fields are defined (after all they are all strings and only 6). It seems that you cannot use CSV with ClassicIndexSchemaFactory. On Wed, May 20, 2020 at 8:20 PM Erick Erickson wrote: > It’s the _schema_ that’s not mutable. Which implies

Re: What is the logical order of applying sorts in SOLR?

2020-05-20 Thread Alexandre Rafalovitch
If you use sort, you are basically ignoring relevancy (unless you put that into sort). Which you seem to know as your example uses FQ. Do you see performance drop on non-clustered or clustered Solr? Because, I would not be surprised if, for clustered node, all the results need to be brought into o

Re: Large query size in Solr 8.3.0

2020-05-20 Thread Alexandre Rafalovitch
Does this actually work? This individual ID matching feels like a very fragile attempt at enforcing the sort order and may represent an architectural issue. Maybe you need to do some joins or graph walking instead. Or, more likely, you would benefit from over-fetching and just sorting on the ids on the

json faceting - Terms faceting and EnumField

2020-05-20 Thread Ponnuswamy, Poornima (GE Healthcare)
Hello, We have solr 6.6 version. Below is the field and field type that is defined in solr schema. Below is the configuration for the enum servicerequestcorrective servicerequestplanned servicerequestinstallationandupgrade

Re: when to use docvalue

2020-05-20 Thread Erick Erickson
Revas: Facet queries are just queries that are constrained by the total result set of your primary query, so the answer to that would be the same as speeding up regular queries. As far as range facets are concerned, I believe they _do_ use docValues, after all they have to answer the exact same

Re: This IndexSchema is not mutable. Solr 7.3.1

2020-05-20 Thread Erick Erickson
It’s the _schema_ that’s not mutable. Which implies you have field guessing turned _off_ I’d take a look at the solr log, the error might be more informative. But at a guess, you need to define the fields you’re importing, namely id, name, surname, gender, eyeColor and hairColor in your schema.

Haystack is Back! Not just one - but three search conferences

2020-05-20 Thread Charlie Hull
Hi all, So there's no Haystack in Charlottesville this year - but we've done our very best to bring you some of the talks and training we planned online - find out more at https://opensourceconnections.com/blog/2020/05/18/haystack-is-back-go-virtual-for-relevant-search-talks-workshops-discussi

This IndexSchema is not mutable. Solr 7.3.1

2020-05-20 Thread Vincenzo D'Amore
Hi all, I'm trying to import a csv file in solr id,name,surname,gender,eyeColor,hairColor 1,pippo,pluto,male,brown,brown I'm using this command curl ' http://localhost:8983/solr/videoid/update?commit=true&header=true&fieldnames=id,name,surname,gender,eyeColor,hairColor&separator=,' -H "Content-

Need help on handling large size of index.

2020-05-20 Thread Modassar Ather
Hi, Currently we have an index of size 3.5 TB. The index is distributed across 12 shards under two cores. The index size on each shard is almost equal. We do a delta indexing every week and optimise the index. The server configuration is as follows. - Solr Version : 6.5.1 - AWS insta

Use cases for the graph streams

2020-05-20 Thread Nightingale, Jonathan A (US)
This is kind of a broad question, but I was playing with the graph streams and was having trouble making the tools work for what I wanted to do. I'm wondering if the use case for the graph streams really supports standard graph queries you might use with Gremlin or the like? I ask because right no

Re: when to use docvalue

2020-05-20 Thread Rahul Goswami
Erick, Thanks for that explanation. I have a follow-up question on that. I find the scenario of stored=true and docValues=true to be tricky at times... I would like to know when each of these scenarios is preferred over the other two for primitive datatypes: 1) stored=true and docValues=false 2) stor

Re: when to use docvalue

2020-05-20 Thread Revas
Erick, Can you also explain how to optimize facet queries and range facets, as they don't use docValues and contribute to higher response time? On Tue, May 19, 2020 at 5:55 PM Erick Erickson wrote: > They are _absolutely_ able to be used together. Background: > > “In the bad old days”, there was no

Query takes more time in Solr 8.5.1 compare to 6.1.0 version

2020-05-20 Thread jay harkhani
Hello, I recently upgraded Solr from 6.1.0 to 8.5.1 and came across an issue. A query that has many ids (around 3000) and applies grouping takes more time to execute. In Solr 6.1.0 it takes 677ms and in Solr 8.5.1 it takes 26090ms. While take reading we have same solr schema and sam

Re: Different indexing times for two different collections with different data sizes

2020-05-20 Thread Erick Erickson
The easy question first. There is an absolute limit of 2B docs per shard. Internally, Lucene assigns an integer internal document ID that overflows after 2B. That includes deleted docs, so your “maxDoc” on the admin page is the limit. Practically, as you are finding, you run into performance iss

MIGRATE without split.key?

2020-05-20 Thread YangLiu
Hello everyone, I want to migrate data from one collection to another with MIGRATE API, but if this parameter split.key is not specified, it cannot be executed. Why can't we remove this limitation? Is there a better way to migrate data? Thanks.

Different indexing times for two different collections with different data sizes

2020-05-20 Thread Kommu, Vinodh K.
Hi, Recently we noticed that one of our largest collections (shards = 6; replication factor = 3), which holds up to 1TB of data & nearly 3.2 billion docs, is taking longer to index than it used to. To see the indexing time difference, we created another collection using largest

Re: REINDEXCOLLECTION not working on an alias

2020-05-20 Thread Bjarke Buur Mortensen
OK, that makes sense. Looking forward to that fix, thanks for the reply. On Tue, 19 May 2020 at 17:21, Joel Bernstein wrote: > I believe the issue is that under the covers this feature is using the > "topic" streaming expressions which it was just reported doesn't work with > aliases. This is