in-place updates solr 5.5.2
Are in-place updates available in solr 5.5.2? I find atomic updates in the doc
https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-5.5.pdf,
which redirects me to the page
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-AtomicUpdates.

On that page, for in-place updates, it says:

  the _version_ field is also a non-indexed, non-stored single valued
  docValues field

When I try this with solr 5.5.2 I get an error message:

  org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
  Unable to use updateLog: _version_ field must exist in schema, using
  indexed="true" or docValues="true", stored="true" and multiValued="false"
  (_version_ is not stored)

What I'm looking for is a way to update one field of a doc without erasing
the non-stored fields. Is this possible in solr 5.5.2?

best regards,
Elisabeth
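For reference, a minimal schema.xml sketch of the _version_ definition the
error message above asks for. The field name and constraints come from the
error itself; the type name "long" is an assumption and should match whatever
long-valued fieldType the schema defines:

  <!-- schema.xml: what the 5.5 error message requires of _version_;
       "long" is assumed, use your schema's long type -->
  <field name="_version_" type="long" indexed="true" stored="true"
         multiValued="false"/>

Per that error, 5.5 insists on stored="true" whether the field is indexed or
uses docValues, which is exactly what rules out the non-stored variant the
online page describes.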
High CPU utilization on Upgrading to Solr Version 6.3
Hi,

We upgraded our production Solr to version 6.3 from version 4.3.2 about a
week ago. We had our Dev / QA / staging on the same version (6.3) before
finally releasing the application leveraging Solr 6.3. We did our functional
and load testing on these boxes; however, when we released it to production
along with the same application (using SolrJ to query Solr), we ran into
severe CPU issues. Just to add, we're on Master - Slave, where the master has
its index on NRTCachingDirectory and the slaves on RAMDirectory.

As soon as we placed the slaves under the load balancer, even under NO LOAD
conditions, the slave went from a load of 4 -> 10 -> 16 -> 100 in 12 mins.

I suspected this was caused by replication, but it was never-ending, so
before it crashed we de-provisioned it and brought it down. I'm not sure what
could possibly cause it.

I looked into the caches, where documentCache, filterCache and
queryResultCache are set to the defaults of 1024 and 100 documents.

I tried observing the GC activity in GCViewer too, which doesn't really show
anything alarming (as far as I can tell) - a sampler at
https://pastebin.com/cnuATYrS

Can anyone possibly tell me the reasons? Thanks a lot in advance.

Atita
Re: 6.6 cloud starting to eat CPU after 8+ hours
I've looked into stacktrace. I see that one thread triggers commit via update
command. And it's blocked on searcher warming. The really odd thing is total
state = BLOCKED. Can you check that there is spare heap space available
during the indexing peak? And also that there is free RAM (after heap
allocation)? Can it happen that the warming query is unnecessarily heavy?
Also, explicit commits might cause issues; consider the best practice of
auto-commit with openSearcher=false, and soft commit when necessary.

On Mon, Jul 24, 2017 at 4:35 PM, Markus Jelsma wrote:
> Alright, after adding a field and full cluster restart, the cluster is
> going nuts once again and this time almost immediately after the restart.
>
> I have now restarted all but one so there is some room to compare, or so i
> thought. Now, the node i didn't restart also drops CPU-usage. This seems to
> correspond to another incident some time ago where all nodes went crazy
> over an extended period, but calmed down after a few were restarted. So it
> could be a problem of inter-node communication.
>
> The index is one segment at this moment but some documents are being
> indexed. Some queries are executed but not very much. Attaching the stack
> anyway.
>
> -Original message-
> > From:Mikhail Khludnev
> > Sent: Wednesday 19th July 2017 14:41
> > To: solr-user
> > Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours
> >
> > You can get a stack from kill -3 / jstack, even from solradmin. Overall,
> > this behavior looks like a typical heavy merge kicking off from time to
> > time.
> >
> > On Wed, Jul 19, 2017 at 3:31 PM, Markus Jelsma wrote:
> > > Hello,
> > >
> > > No i cannot expose the stack, VisualVM samples won't show it to me.
> > >
> > > I am not sure if they're about to sync all the time, but every 15
> > > minutes some documents are indexed (3 - 4k). For some reason, index
> > > time does increase with latency / CPU usage.
> > >
> > > This situation runs fine for many hours, then it will slowly start to
> > > go bad, until nodes are restarted (or index size decreased).
> > >
> > > Thanks,
> > > Markus
> > >
> > > -Original message-
> > > > From:Mikhail Khludnev
> > > > Sent: Wednesday 19th July 2017 14:18
> > > > To: solr-user
> > > > Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours
> > > >
> > > > > The real distinction between busy and calm nodes is that busy
> > > > > nodes all have
> > > > > o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms()
> > > > > as second to fillBuffer(), what are they doing?
> > > >
> > > > Can you expose the stack deeper?
> > > > Can they start to sync shards due to some reason?
> > > >
> > > > On Wed, Jul 19, 2017 at 12:35 PM, Markus Jelsma wrote:
> > > > > Hello,
> > > > >
> > > > > Another peculiarity here: our six node (2 shards / 3 replicas)
> > > > > cluster is going crazy after a good part of the day has passed.
> > > > > It starts eating CPU for no good reason and its latency goes up.
> > > > > Grafana graphs show the problem really well.
> > > > >
> > > > > After restarting 2/6 nodes, there is also quite a distinction in
> > > > > the VisualVM monitor views, and the VisualVM CPU sampler reports
> > > > > (sorted on self time (CPU)). The busy nodes are deeply red in
> > > > > o.a.h.impl.io.AbstractSessionInputBuffer.fillBuffer (as usual),
> > > > > the restarted nodes are not.
> > > > >
> > > > > The real distinction between busy and calm nodes is that busy
> > > > > nodes all have
> > > > > o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms()
> > > > > as second to fillBuffer(), what are they doing?! Why? The calm
> > > > > nodes don't show this at all. Busy nodes all have o.a.l.codec
> > > > > stuff on top, restarted nodes don't.
> > > > >
> > > > > So, actually, i don't have a clue! Any, any ideas?
> > > > >
> > > > > Thanks,
> > > > > Markus
> > > > >
> > > > > Each replica is underpowered but performing really well after
> > > > > restart (and JVM warmup): 4 CPUs, 900M heap, 8 GB RAM, maxDoc 2.8
> > > > > million, index size 18 GB.
> > > >
> > > > --
> > > > Sincerely yours
> > > > Mikhail Khludnev
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev

--
Sincerely yours
Mikhail Khludnev
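To make Mikhail's commit advice concrete, a solrconfig.xml sketch of the
auto-commit pattern he describes; the intervals here are placeholder
assumptions, not values recommended anywhere in this thread:

  <!-- solrconfig.xml (updateHandler section): hard commit flushes to disk
       without opening a searcher; soft commit controls visibility -->
  <autoCommit>
    <maxTime>60000</maxTime>        <!-- assumed: 60s -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>300000</maxTime>       <!-- assumed: 5 min -->
  </autoSoftCommit>

With this in place, clients should stop issuing explicit commits, so searcher
warming happens on the soft-commit schedule rather than on every update.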
Issue with delta import
Hi,

I'm trying to integrate Solr and Cassandra. I'm facing a problem with delta
import. Every 10 minutes I'm running a deltaQuery via a cron job. If there
are any changes in the data since the last index time, it should fetch only
that data (as far as I know); however, it keeps fetching the whole data set
irrespective of changes.

My problem:
https://stackoverflow.com/questions/45304803/deltaimport-fetches-all-the-data

Looking forward to hearing from you.

Thanks,
Bhargava Ravali Koganti
Re: Issue with delta import
Can you please try ${dih.last_index_time} instead of
${dataimporter.last_index_time}?

On Wed, Jul 26, 2017 at 2:33 PM, bhargava ravali koganti wrote:
> Hi,
>
> I'm trying to integrate Solr and Cassandra. I'm facing a problem with
> delta import. [...]

--
Thanks,
Sujay P Bawaskar
M:+91-77091 53669
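For anyone landing on this thread later, a data-config.xml sketch showing
where that variable goes. The entity, table and column names here are made-up
examples (and written SQL-style; adapt for the Cassandra data source) -- only
the ${dih.last_index_time} / ${dih.delta.*} usage is the point:

  <entity name="item" pk="id"
          query="SELECT * FROM item"
          deltaQuery="SELECT id FROM item
                      WHERE last_modified &gt; '${dih.last_index_time}'"
          deltaImportQuery="SELECT * FROM item WHERE id = '${dih.delta.id}'"/>

If deltaQuery still returns every row, check that the timestamp column is
really being compared against the stored last index time, and that DIH can
write its dataimport.properties file, since that is where last_index_time
is persisted between runs.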
Re: in-place updates solr 5.5.2
The in-place update section you referenced was added in Solr 6.5. On p. 224
of the PDF for 5.5, note it says there are only two available approaches, and
the section on in-place updates you see online isn't mentioned. I looked into
the history of the online page, and the section on in-place updates was added
for Solr 6.5, when SOLR-5944 was released.

So, unfortunately, unless someone else has a better option for pre-6.5, I
believe it was not possible in 5.5.2.

Cassandra

On Wed, Jul 26, 2017 at 2:30 AM, elisabeth benoit wrote:
> Are in-place updates available in solr 5.5.2? I find atomic updates in the
> doc
> https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-5.5.pdf
> [...]
>
> What I'm looking for is a way to update one field of a doc without erasing
> the non stored fields. Is this possible in solr 5.5.2?
>
> best regards,
> Elisabeth
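So for pre-6.5 readers, atomic updates are the only documented route, with
the caveat Elisabeth ran into: every field to be preserved (other than
copyField destinations) must be stored. A sketch in the XML update format,
with hypothetical field names:

  <!-- POST to /update: set one field, leave the others intact
       ("id" and "price" are example names) -->
  <add>
    <doc>
      <field name="id">doc1</field>
      <field name="price" update="set">19.95</field>
    </doc>
  </add>

Under the hood Solr re-reads the other fields from their stored values and
rewrites the whole document, which is exactly why non-stored fields get lost.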
Re: Copy field a source of copy field
I get your point: the second KeepWordFilter is not keeping anything because
the token it gets is "hey you", while the word it is supposed to keep is
"hey". Which clearly does not work. The KeepWordFilter just considers each
row a single token (I may be wrong, I didn't check the code, I am just
assuming based on your observations).

If you want, you can put a WordDelimiterFilter between the two
KeepWordFilters. Configure the WordDelimiterFilter to split on space (I need
to double check, but it should be possible).

OR

You simply do as Erick suggested, and you just keep the genera in the genus
field. But as Erick mentioned, you may have problems with entity recognition.

---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
Re: in-place updates solr 5.5.2
Thanks a lot for your answer

2017-07-26 16:35 GMT+02:00 Cassandra Targett:
> The in-place update section you referenced was added in Solr 6.5. [...]
>
> So, unfortunately, unless someone else has a better option for
> pre-6.5, I believe it was not possible in 5.5.2.
>
> Cassandra
WordDelimiterFilterFactory with Wildcards
I have several fieldtypes that use the WordDelimiterFilterFactory.

We have a fieldtype for CAS numbers, which look like 1234-12-1: numbers
separated by hyphens. Users often leave out the hyphens and either use spaces
or just string the numbers together.

The WDF seemed like a great solution, especially as it gave partial matches.
However, a query like 1234-12-* fails: the analyzer tool shows the wildcard
getting stripped off.

Is there any way to preserve the wildcard in the query analyzer when using
the WordDelimiterFilterFactory?
Re: WordDelimiterFilterFactory with Wildcards
1. What tokenizer are you using?
2. Do you have the preserveOriginal="1" flag set in your filter?
3. Which version of solr are you using?

On Wed, Jul 26, 2017 at 10:48 AM, Webster Homer wrote:
> I have several fieldtypes that use the WordDelimiterFilterFactory.
> [...]
> Is there any way to preserve the wildcard in the query analyzer when
> using the WordDelimiterFilterFactory?

--
Saurabh Sethi
Principal Engineer I | Engineering
Re: WordDelimiterFilterFactory with Wildcards
1. KeywordTokenizer - we want to treat the entire field as a single term to
parse
2. preserveOriginal="0" - thought about changing this to 1
3. 6.2.2

This is the fieldtype:

  <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="0" splitOnCaseChange="0"
              splitOnNumerics="1" generateNumberParts="0"
              catenateWords="0" catenateNumbers="1" catenateAll="0"
              preserveOriginal="0" stemEnglishPossessive="0"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="..."
              ignoreCase="true" expand="true"
              tokenizerFactory="solr.KeywordTokenizerFactory"/>
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="^.*([^- 0-9*]+).*$" replacement="" replace="all"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="0" splitOnCaseChange="0"
              splitOnNumerics="1" generateNumberParts="0"
              catenateWords="0" catenateNumbers="1" catenateAll="0"
              preserveOriginal="0" stemEnglishPossessive="0"/>
    </analyzer>
  </fieldType>

On Wed, Jul 26, 2017 at 12:56 PM, Saurabh Sethi wrote:
> 1. What tokenizer are you using?
> 2. Do you have preserveOriginal="1" flag set in your filter?
> 3. Which version of solr are you using?
> [...]
Re: WordDelimiterFilterFactory with Wildcards
My guess is PatternReplaceFilterFactory is most likely the cause. Also, based
on your query, you might want to set preserveOriginal=1.

You can take one filter out at a time and see which one is altering the
query.

On Wed, Jul 26, 2017 at 11:13 AM, Webster Homer wrote:
> 1. KeywordTokenizer - we want to treat the entire field as a single term
> to parse
> 2. preserveOriginal = "0" Thought about changing this to 1
> 3. 6.2.2
>
> This is the fieldtype [...]

--
Saurabh Sethi
Principal Engineer I | Engineering
Re: WordDelimiterFilterFactory with Wildcards
Checked the PatternReplace - it's OK. Can't use preserveOriginal since it
preserves the hyphens too, which I don't want. It would be best if it didn't
touch the * at all.

On Wed, Jul 26, 2017 at 1:30 PM, Saurabh Sethi wrote:
> My guess is PatternReplaceFilterFactory is most likely the cause.
> Also, based on your query, you might want to set preserveOriginal=1
>
> You can take one filter out at a time and see which one is altering the
> query.
> [...]
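One avenue that might be worth testing here (an assumption on my part, not
something verified against this schema): wildcard terms are analyzed through
the fieldtype's "multiterm" analyzer rather than the full query chain, so an
explicit multiterm analyzer can normalize the hyphens while leaving the *
alone:

  <!-- sketch: applied only to wildcard/prefix/fuzzy terms -->
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- strip hyphens so 1234-12-* becomes 123412* -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="-" replacement="" replace="all"/>
  </analyzer>

That should let 1234-12-* match the catenated 1234121 term the index analyzer
produces, without WDF ever seeing the wildcard.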
Re: Unable to create core [collection] Caused by: null
Hi Lucas,

It would be super useful if you provided more information with the question.
A few things you might want to include are: the version of Solr, how you
started it, the stack trace from the log, etc.

-Anshum

> On Jul 25, 2017, at 4:21 PM, Lucas Pelegrino wrote:
>
> Hey guys.
>
> Trying to make solr work here, but I'm getting this error from this
> command:
>
> $ ./solr create -c products -d /Users/lucaswxp/reduza-solr/products/conf/
>
> Error CREATEing SolrCore 'products': Unable to create core [products]
> Caused by: null
>
> I'm posting my solrconfig.xml, schema.xml and data-config.xml here:
> https://pastebin.com/fnYK9pSJ
>
> The debug log from solr: https://pastebin.com/kVLMvBwZ
>
> Not sure what to do, the error isn't very descriptive.
shared file system other than HDFS?
Hi,

Just wondering if anyone has deployed SolrCloud on a shared filesystem other
than HDFS. I'd appreciate it if they could share a bit about the setup: OS,
file system, replication and backup strategies, etc.

Thanks

--
Daniel Huang
BNY Mellon Innovation Center – Silicon Valley
3495 Deer Creek Road, Palo Alto, CA 94304
Re: shared file system other than HDFS?
Hello, Daniel

You might find this useful: https://issues.apache.org/jira/browse/SOLR-9952.

On Wed, Jul 26, 2017 at 10:46 PM, Huang, Daniel wrote:
> Hi,
>
> Just wondering if anyone has deployed SolrCloud on a shared filesystem
> other than HDFS. Appreciate if they can share a bit about the setup, OS,
> file system, replication and backup strategies, etc.
>
> Thanks

--
Sincerely yours
Mikhail Khludnev
RE: 6.6 cloud starting to eat CPU after 8+ hours
Hello Mikhail,

Spot on, there is indeed not enough heap when our nodes are in this crazy
state. When the nodes are happy, the average heap consumption is 50 to 60
percent; at peak, when indexing, there is in general enough heap for the
warming to run smoothly. I probably forgot to mention that high CPU is, in
our case, also high heap consumption when the nodes act mad. The stack trace
you saw was from a crazy node while documents were indexed, so the blocking
you mention makes sense.

I still believe this is not a heap issue but something odd in Solr that in
(INSERT SITUATION) eats unreasonable amounts of heap. Remember, the bad node
went back to a normal 50-60 percent heap consumption and normal CPU time
after _other_ nodes got restarted. All nodes were bad and went normal after
restart. Bad nodes that were not restarted then suddenly revived and also
went back to normal. This is very odd behavior.

Observing this, i am inclined to think that Solr's inter-node communication
can get into a weird state: eating heap, eating CPU, going bad. Using CPU or
heap sampling it is very hard, for me at least, to quickly spot something
bad, so i am clueless.

What do you think? How can a bad node become normal just by restarting
another bad node? Puzzling..

Thanks,
Markus

-Original message-
> From:Mikhail Khludnev
> Sent: Wednesday 26th July 2017 10:50
> To: solr-user
> Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours
>
> I've looked into stacktrace.
> I see that one thread triggers commit via update command. And it's
> blocked on searcher warming. The really odd thing is total state =
> BLOCKED. Can you check that there is a spare heap space available during
> indexing peak? And also that there is free RAM (after heap allocation)?
> Can it happen that warming query is unnecessary heavy? Also, explicit
> commits might cause issues, consider the best practice with auto-commit
> and openSearcher=false and soft commit when necessary.
> [...]
Re: High CPU utilization on Upgrading to Solr Version 6.3
On 7/26/2017 1:49 AM, Atita Arora wrote:
> We did our functional and load testing on these boxes , however when we
> released it to production along with the same application (using SolrJ to
> query Solr) , we ran into severe CPU issues.
> Just to add we're on Master - Slave where master has index on
> NRTCachingDirectory and Slave on RAMDirectory.
>
> As soon as we placed the slaves under load balancer , under NO LOAD
> condition as well , the slave went from a load of 4 -> 10 -> 16 -> 100 in
> 12 mins.
> [...]
> I tried observing the GC activity on GCViewer too , which doesn't really
> show something alarming (as in what I feel) - a sampler at
> https://pastebin.com/cnuATYrS

What OS is Solr running on? I'm only asking because some additional
information I'm after has different gathering methods depending on the OS.

Other questions: Is there only one Solr process per machine, or more than
one? How many total documents are managed by one machine? How big is all the
index data managed by one machine? What is the max heap on each Solr process?

FYI, RAMDirectory is not the preferred way of running Solr or Lucene. If you
have enough memory to hold the entire index, it's better to let the OS handle
keeping that information in memory, rather than having Lucene and Java do it.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

NRTCachingDirectoryFactory uses MMap by default as its delegate
implementation, so your master is fine.

I would be interested in getting a copy of Solr's gc log from a system with
high CPU to look at.

Thanks,
Shawn
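If you do drop RAMDirectory on the slaves, the change is a one-liner in
solrconfig.xml; a sketch of the stock form Shawn's advice points at
(NRTCachingDirectoryFactory delegates to MMap on 64-bit systems, as the
linked article explains):

  <!-- solrconfig.xml: let the OS page cache hold the index -->
  <directoryFactory name="DirectoryFactory"
                    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

The hot parts of the index then live in the OS page cache rather than being
double-counted inside the Java heap.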
precedence for configurations in solrconfig.xml file
Hi,

If I have a configoverlay.json file with the below content:

  {"props":{"updateHandler":{"autoCommit":{
    "maxTime":5,
    "maxDocs":1}}}}

and I also have JVM properties set on the Solr instance as:

  -Dsolr.autoCommit.maxtime=2 -Dsolr.autoCommit.maxDocs=10

I would like to know the order of precedence in which these configurations
are applied.

Regards
Suresh
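For context: a -D system property only has an effect where solrconfig.xml
references it through property substitution, along these lines (a sketch; the
15000 fallback is the usual stock default, and note that substitution is
case-sensitive, so solr.autoCommit.maxtime and solr.autoCommit.maxTime are
different properties):

  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  </autoCommit>

The overlay written by the Config API (configoverlay.json) is then applied on
top of whatever solrconfig.xml resolved to, so where both define a value, the
overlay wins.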
Re: WordDelimiterFilterFactory with Wildcards
The Admin/Analysis page is useful here. It'll show you what each bit of your
query analysis chain does and may well point you to the part of the chain
that's the problem.

Best,
Erick

On Wed, Jul 26, 2017 at 11:33 AM, Webster Homer wrote:
> checked the Pattern Replace it's OK. Can't use the preserve original since
> it preserves the hyphens too, which I don't want. It would be best if it
> didn't touch the * at all
> [...]