Re: How to split index more than 2GB in size

2018-06-20 Thread Michael Kuhlmann
Hi Sushant, while this is true in general, it won't hold here. If you split your index, searching on each split shard might be a bit faster, but you'll increase overall search time much more because Solr needs to send your search queries to all shards and then combine the results. So instead of having

Re: Delete By Query issue followed by Delete By Id Issues

2018-06-20 Thread Shawn Heisey
On 6/20/2018 3:46 PM, sujatha sankaran wrote: > Thanks,Shawn. Very useful information. > > Please find below the log details:- Is your collection using the implicit router?  You didn't say.  If it is, then I think you may not be able to use deleteById.  This is indeed a bug, one that has been re
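Shawn's point about the implicit router can be illustrated with a hedged sketch (collection, id, and shard names are placeholders, not from the thread). With the implicit router Solr cannot derive the target shard from the document id, so a plain deleteById may not reach the right shard; passing `_route_` explicitly is one commonly suggested workaround:

```shell
# Plain delete-by-id against a SolrCloud collection:
curl -X POST 'http://localhost:8983/solr/mycollection/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '{"delete": {"id": "doc1"}}'

# With the implicit router, adding the _route_ parameter tells Solr which
# shard holds the document (shard name is illustrative):
curl -X POST 'http://localhost:8983/solr/mycollection/update?commit=true&_route_=shard1' \
  -H 'Content-Type: application/json' \
  -d '{"delete": {"id": "doc1"}}'
```

Both calls require a running SolrCloud cluster, so treat this as a sketch of the API shape rather than a verified fix for the bug discussed here.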

Re: Solr Upgrade DateField to TrieDateField

2018-06-20 Thread Shawn Heisey
On 6/20/2018 12:35 PM, Yunee Lee wrote: > I have two questions. > > 1. solr index on version 4.6.0 and there are multiple date fields as the type > DateField in schema.xml > When I upgraded to version 5.2.1 with new data type Trie* for integer, float, > string and date. > Only date fields are not

Re: Remove schema.xml in favor of managed-schema

2018-06-20 Thread Walter Underwood
I strongly prefer the classic config files approach. Our config files are checked into version control. We update on the fly by uploading new files to Zookeeper, then reloading the collection. No restart needed. Pushing changes to prod is straightforward. Check out the tested files, load them i
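The workflow Walter describes can be sketched as follows; the config name, configset path, collection name, and ZooKeeper address are placeholders:

```shell
# Upload the version-controlled, tested configset to ZooKeeper:
bin/solr zk upconfig -n myconfig -d /path/to/configset -z zk1:2181

# Reload the collection so running cores pick up the new config
# without a restart:
curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection'
```

This assumes a running SolrCloud cluster and the `bin/solr zk` tooling available in modern Solr releases; it is a sketch of the push-to-prod flow, not a drop-in script.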

Re: Streaming expressions and fetch()

2018-06-20 Thread Dariusz Wojtas
I have filed a JIRA issue: SOLR-12505 Best regards, Darek On Mon, Jun 18, 2018 at 11:08 PM, Dariusz Wojtas wrote: > Hi, > I think this might give some clue. > I tried to reproduce the issue with a collection called testCloud. > > fetch(testCloud1, > search(testCloud1, q="*:*", fq="type:n

Re: Delete By Query issue followed by Delete By Id Issues

2018-06-20 Thread sujatha sankaran
Thanks,Shawn. Very useful information. Please find below the log details:- 2018-06-20 17:19:06.661 ERROR (updateExecutor-2-thread-8226-processing-crm_v2_01_shard3_replica1 x:crm_v2_01_shard3_replica2 r:core_node4 n:masked:8983_solr s:shard3 c:crm_v2_01) [c:crm_v2_01 s:shard3 r:core_node4 x:cr

Re: MoreLikeThis in Solr 7.3.1

2018-06-20 Thread Monique Monteiro
Hi Anshum, Thanks! By using Zookeeper CLI I managed to update the configs. On Tue, Jun 19, 2018 at 6:29 PM Anshum Gupta wrote: > That explains it :) > > I assume you did make those changes on disk and did not upload the updated > configset to zookeeper. > > SolrCloud instances use the configset

Solr Upgrade DateField to TrieDateField

2018-06-20 Thread Yunee Lee
Hi, I have two questions. 1. solr index on version 4.6.0 and there are multiple date fields as the type DateField in schema.xml When I upgraded to version 5.2.1 with new data type Trie* for integer, float, string and date. Only date fields are not upgraded properly with the following erro

Re: How to split index more than 2GB in size

2018-06-20 Thread Sushant Vengurlekar
Thank you for the detailed response Erick. Very much appreciated. The reason I am looking into splitting the index into two is because it’s much faster to search across a smaller index than a larger one. On Wed, Jun 20, 2018 at 10:46 AM Erick Erickson wrote: > You still haven't answered _why_ you

Applying streaming expression as a filter in graph traversal expression (gatherNodes)

2018-06-20 Thread Pratik Patel
We can limit the scope of graph traversal by applying some filter along the way as follows. gatherNodes(emails, walk="john...@apache.org->from", fq="body:(solr rocks)", gather="to") Is it possible to replace "body:(solr rocks)" by some streaming expression lik
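One pattern sometimes used to get the effect Pratik asks about is to feed `gatherNodes` from an inner `search()` expression instead of an `fq`, so the filtering happens in the source stream. A hedged sketch (collection, fields, and query are illustrative, not confirmed as the answer to this thread):

```shell
# gatherNodes can take a stream expression as its source; here an inner
# search() supplies the starting node set, replacing the fq filter:
curl --data-urlencode 'expr=gatherNodes(emails,
    search(emails, q="body:(solr rocks)", fl="from", sort="from asc", qt="/export"),
    walk="from->from",
    gather="to")' \
  'http://localhost:8983/solr/emails/stream'
```

This requires a SolrCloud collection with the `/stream` and `/export` handlers enabled, and docValues on the walked fields.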

Re: How to split index more than 2GB in size

2018-06-20 Thread Erick Erickson
You still haven't answered _why_ you think splitting even a 20G index is desirable. We regularly see 200G+ indexes per replica in the field, so what's the point? Have you measured different setups to see if it's a good idea? A 200G index needs some beefy hardware, admittedly. If you have adequat

Re: How to split index more than 2GB in size

2018-06-20 Thread Sushant Vengurlekar
The index size is small because this is my local development copy. The production index is more than 20GB. So I am working on getting the index split and replicated on different nodes. Our current instance on prod is single instance solr 6 which we are working on moving towards solrcloud 7 On Wed

Re: How to split index more than 2GB in size

2018-06-20 Thread Erick Erickson
Use the indexupgrader tool or optimize your index before using splitshard. Since this is a small index (< 5G), optimizing will not create an overly-large segment, so that pitfall is avoided. You haven't yet explained why you think splitting the index would be beneficial. Splitting an index this s
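The two options Erick mentions can be sketched as follows; paths, jar versions, and the collection name are placeholders:

```shell
# Option 1: run Lucene's IndexUpgrader on the index directory (Solr must
# be stopped; backward-codecs is needed to read the old segment format):
java -cp lucene-core-7.3.1.jar:lucene-backward-codecs-7.3.1.jar \
  org.apache.lucene.index.IndexUpgrader /var/solr/data/mycore/data/index

# Option 2: optimize (forceMerge) through Solr, which rewrites all
# segments in the current format, before calling SPLITSHARD:
curl 'http://localhost:8983/solr/mycollection/update?optimize=true'
```

Either route rewrites old-format segments so SPLITSHARD no longer trips over them; on a small index like this one the optimize is cheap, as noted above.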

Re: How to split index more than 2GB in size

2018-06-20 Thread Sushant Vengurlekar
How can I resolve this error? On Wed, Jun 20, 2018 at 9:11 AM, Alexandre Rafalovitch wrote: > This seems more related to an old index upgraded to latest Solr rather than > the split itself. > > Regards, > Alex > > On Wed, Jun 20, 2018, 12:07 PM Sushant Vengurlekar, < > svengurle...@curvolabs

Re: How to split index more than 2GB in size

2018-06-20 Thread Sushant Vengurlekar
My old solr instance was 6.6.3 and the current solrcloud I am building is 7.3.1. Are there any issues there? On Wed, Jun 20, 2018 at 9:11 AM, Alexandre Rafalovitch wrote: > This seems more related to an old index upgraded to latest Solr rather than > the split itself. > > Regards, > Alex > >

Re: How to split index more than 2GB in size

2018-06-20 Thread Alexandre Rafalovitch
This seems more related to an old index upgraded to latest Solr rather than the split itself. Regards, Alex On Wed, Jun 20, 2018, 12:07 PM Sushant Vengurlekar, < svengurle...@curvolabs.com> wrote: > Thanks for the reply Alessandro! Appreciate it. > > Below is the full request and the error r

Re: How to split index more than 2GB in size

2018-06-20 Thread Sushant Vengurlekar
Thanks for the reply Alessandro! Appreciate it. Below is the full request and the error received curl 'http://localhost:8081/solr/admin/collections?action=SPLITSHARD&collection=dev-transactions&shard=shard1' { "responseHeader":{ "status":500, "QTime":7920}, "success":{ "so
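As a side note on the call itself: SPLITSHARD can run for a long time, and a variant sometimes recommended is to run it asynchronously so the HTTP request cannot time out. A hedged sketch using the same collection and shard as above (the request id is arbitrary):

```shell
# Submit the split as an async operation:
curl 'http://localhost:8081/solr/admin/collections?action=SPLITSHARD&collection=dev-transactions&shard=shard1&async=split-1'

# Poll for completion afterwards:
curl 'http://localhost:8081/solr/admin/collections?action=REQUESTSTATUS&requestid=split-1'
```

This does not address the 500 error itself, which the thread traces to the old index format rather than to the split call.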

Re: some solr replicas down

2018-06-20 Thread Shawn Heisey
On 6/20/2018 6:39 AM, Satya Marivada wrote: Yes, there are some other errors that there is a javabin character 2 expected and is returning 60 which is "<" . This happens when the response is an error.  Error responses are sent in HTML format (so they render properly when viewed in a browser),

Re: Drive Change for Solr Setup

2018-06-20 Thread Shawn Heisey
On 6/20/2018 5:03 AM, Srinivas Muppu (US) wrote: Hi Solr Team,My Solr project installation setup and instances(including clustered solr, zk services and indexing jobs schedulers) is available in Windows 'E:\ ' drive in production environment. As business needs to remove the E:\ drive, going forwa

Re: Solrcloud doesn't like relative path

2018-06-20 Thread Shawn Heisey
On 6/19/2018 5:47 PM, Sushant Vengurlekar wrote: Based on your suggestion I moved the helpers to be under configsets/conf so my new folder structure looks -configsets - conf helpers synonyms_vendors.txt - collection1 -conf

Re: Delete By Query issue followed by Delete By Id Issues

2018-06-20 Thread Shawn Heisey
On 6/15/2018 3:14 PM, sujatha sankaran wrote: We were initially having an issue with DBQ and heavy batch updates which used to result in many missing updates. After reading many mails in mailing list which mentions that DBQ and batch update do not work well together, we switched to DBI. But we

Indexing part of Binary Documents and not the entire contents

2018-06-20 Thread neotorand
Hi List, I have a specific requirement where I need to index the below things: the metadata of any document, and some parts from the document that match some keywords that I configure. The first part I am able to achieve through ERH or FileListEntityProcessor. I am struggling on the second part. I am looking for

Re: some solr replicas down

2018-06-20 Thread Chris Ulicny
Having time drift longer than the TTL would definitely cause these types of problems. In our case, the clusters are time-synchronized and the error is still encountered periodically. On Wed, Jun 20, 2018 at 10:07 AM Erick Erickson wrote: > We've seen this exact issue when the times reported by

Re: tlogs not deleting

2018-06-20 Thread Susheel Kumar
Not in my knowledge. Please double check or wait for some time, but after DISABLEBUFFER on the source your logs should start rolling; it's the exact same issue I faced with 6.6, which was resolved by DISABLEBUFFER. On Tue, Jun 19, 2018 at 1:39 PM, Brian Yee wrote: > Does anyone have any additi
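The DISABLEBUFFER call Susheel refers to is part of the CDCR API; a hedged sketch, with the collection name as a placeholder, run against the cluster whose tlogs are accumulating:

```shell
# Disable the CDCR update-log buffer so old tlogs can be purged:
curl 'http://localhost:8983/solr/mycollection/cdcr?action=DISABLEBUFFER'

# Check the current buffer and process state:
curl 'http://localhost:8983/solr/mycollection/cdcr?action=STATUS'
```

This assumes CDCR is configured on the collection; with buffering enabled, Solr retains every tlog for possible replay, which is the usual cause of tlogs never being deleted.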

RE: 7.2.1 looking for ??????

2018-06-20 Thread Markus Jelsma
Ah, I completely forgot the question mark is a single-character wildcard! Yes, yes, the word lengths are due to stemming. Can't believe I didn't think of it, but thanks for clearing up the fog! Markus -Original message- > From:Erick Erickson > Sent: Wednesday 20th June 2018 16:15 > To:

Re: 7.2.1 looking for ??????

2018-06-20 Thread Erick Erickson
You're confusing query parsing with the analysis chain processing. Before the query gets to the WDFF, the query _parser_ decides it's a wildcard query so it never gets passed through WDFF. If you escaped all the question marks, then you'd get what you expect. If it weren't so, imagine what would h
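Erick's suggestion can be sketched concretely; the field and collection names are illustrative. Backslash-escaping each `?` makes the parser treat them as literal characters instead of single-character wildcards, so the term then flows through the analysis chain (where WordDelimiterFilter can remove it):

```shell
# Escaped: literal question marks, analyzed normally, cheap query.
# Unescaped, body:?????? would be a wildcard query that bypasses WDFF.
curl 'http://localhost:8983/solr/mycollection/select' \
  --data-urlencode 'q=body:\?\?\?\?\?\?'
```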

Re: some solr replicas down

2018-06-20 Thread Erick Erickson
We've seen this exact issue when various machines report different wall-clock times, so getting these times coordinated is definitely the very first thing I'd do. It's particularly annoying because if the clocks are drifting apart gradually, your setup can be running fine for d

Drive Change for Solr Setup

2018-06-20 Thread Srinivas Muppu (US)
Hi Solr Team,My Solr project installation setup and instances(including clustered solr, zk services and indexing jobs schedulers) is available in Windows 'E:\ ' drive in production environment. As business needs to remove the E:\ drive, going forward D:\ drive will be used and operational.Is ther

Re: some solr replicas down

2018-06-20 Thread Satya Marivada
Chris, you are spot on with the timestamps. The date command returns different times on these VMs, and they are not in sync with ntp. ntpstat returns a difference of about 8-10 seconds across the 4 VMs, and that would have caused these synchronization issues and marked the replicas as down. This just happened
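The checks described above can be sketched as follows; exact tooling depends on the distro (ntpd-based hosts shown here, chrony-based hosts would use `chronyc tracking` instead):

```shell
# Run on each VM to compare wall-clock times and NTP sync state:
date                 # quick visual comparison across hosts
ntpstat              # reports whether the clock is synchronised to NTP
ntpq -p              # lists peers and the current offset in milliseconds

# Force an immediate resync if drift is found (ntpd must be stopped first):
sudo ntpdate pool.ntp.org
```

Keeping drift well under any ZooKeeper session/TTL window avoids replicas being spuriously marked down, as this thread concludes.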

7.2.1 looking for ??????

2018-06-20 Thread Markus Jelsma
Hello, on the monitoring I spotted a query that took over twenty seconds, instead of the usual 200 ms. It turned out to be someone looking for question marks. I couldn't believe that would be a costly query; they should be removed by WordDelimiterFilter, and that is the case, that query

Solr - SpellCheckComponent Threshold parameter value not being honored.

2018-06-20 Thread ruby
I'm using the SpellCheckComponent to build term suggestions. It's working in most cases, but for a few words I am not seeing suggestions. There are around 14652 documents indexed in total. Out of them, 856 documents start with the word "feq". When we search for "feq" we get results back, but spellcheck does not re
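A hedged way to probe the behaviour described above; the handler name `/spell` and the collection name mirror the stock example configuration and may differ in the actual solrconfig.xml. When a term like "feq" already returns results, whether suggestions still appear is typically governed by parameters such as `spellcheck.maxResultsForSuggest` and the dictionary's build threshold:

```shell
# Query the spellcheck handler directly with extended results to see
# what the component returns for the problem term:
curl 'http://localhost:8983/solr/mycollection/spell' \
  --data-urlencode 'q=feq' \
  --data-urlencode 'spellcheck=true' \
  --data-urlencode 'spellcheck.extendedResults=true'
```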

Re: Logging Every document to particular core

2018-06-20 Thread govind nitk
Thanks a lot for your inputs Alessandro and Mikhail. @Alessandro, I tried with the transaction log, but it was a bit more work to get around (as it gets rolled over). The hack I did was to use a proxy in between, and now I have more control. Regards, Govind On Thu, Jun 14, 2018 at 7:32 PM Mikhail Khludne

Re: How to split index more than 2GB in size

2018-06-20 Thread Alessandro Benedetti
Hi, in the first place, why do you want to split 2 GB indexes? Nowadays that is a fairly small index. Secondly, what you reported is incomplete; I would expect a Caused By section in the stacktrace. These are generic recommendations; always spend time scrupulously analysing the problem you had. - So

Re: Solr 6.5 autosuggest suggests misspelt words and unwanted words

2018-06-20 Thread Alessandro Benedetti
Hi, you should curate your data, that is fundamental to have a healthy search solution, but let's see what you can do anyway: 1) curate a dictionary of such bad words and then configure analysis to skip them 2) Have you tried different dictionary implementations? I would assume that each single