Question regarding Upgrading to SolrCloud
Hello Guys,

As of now we are running Solr 3.4 with a Master-Slave configuration. We are planning to upgrade it to the latest version (6.6 or 7). Questions I have before upgrading:

1. Since we do not have a lot of data, is it required to move to SolrCloud, or can we continue using Master-Slave?
2. Will support for Master-Slave still be there in future releases, or do you plan to remove it?
3. Can we configure master-slave replication in SolrCloud, and if yes, do we need ZooKeeper as well?

Thanks,
Gopesh Sharma
Re: FilterCache size should reduce as index grows?
On Wed, 2017-10-04 at 21:42 -0700, S G wrote:
> The bit-vectors in filterCache are as long as the maximum number of
> documents in a core. If there are a billion docs per core, every bit
> vector will have a billion bits making its size as 10^9 / 8 = 128 mb

The tricky part here is that there are sparse (aka few hits) entries that take up less space. The 1 bit/hit is the worst case.

This is both good and bad. The good part is of course that it saves memory. The bad part is that it often means that people set the filterCache size to a high number and that it works well, right until a series of filters with many hits.

It seems that the memory limit option maxSizeMB was added in Solr 5.2:
https://issues.apache.org/jira/browse/SOLR-7372
I am not sure if it works with all caches in Solr, but in my world it is way better to define the caches by memory instead of count.

> With such a big cache-value per entry, the default value of 128
> values will become 128x128mb = 16gb and would not be very good for
> a system running below 32 gb of memory.

Sure. The default values are just that. For an index with 1M documents and a lot of different filters, 128 would probably be too low. If someone were to create a well-researched set of config files for different scenarios, it would be a welcome addition to our shared knowledge pool.

> If such a use-case is anticipated, either the JVM's max memory be
> increased to beyond 40 gb or the filterCache size be reduced to 32.

Best solution: Use maxSizeMB (if it works)
Second best solution: Reduce to 32 or less
Third best, but often used, solution: Hope that most of the entries are sparse and will remain so

- Toke Eskildsen, Royal Danish Library
Re: Complexphrase treats wildcards differently than other query parsers
Hi Bjarke,
It is not the multiterm handling that causes the query parser to skip the analysis chain, but the wildcard: the majority of query parsers do not analyse the query string if there are wildcards.
HTH
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

> On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen wrote:
>
> Hi list,
>
> I'm trying to search for the term funktionsnedsättning*
> In my analyzer chain I use a MappingCharFilterFactory to change ä to a.
> So I would expect that funktionsnedsättning* would translate to funktionsnedsattning*.
>
> If I use e.g. the lucene query parser, this is indeed what happens:
> ...debugQuery=on&defType=lucene&q=funktionsneds%C3%A4ttning* gives me
> "rawquerystring":"funktionsnedsättning*", "querystring": "funktionsnedsättning*", "parsedquery":"content_ol:funktionsnedsattning*"
> and 15 documents returned.
>
> Trying the same with complexphrase gives me:
> ...debugQuery=on&defType=complexphrase&q=funktionsneds%C3%A4ttning* gives me
> "rawquerystring":"funktionsnedsättning*", "querystring": "funktionsnedsättning*", "parsedquery":"content_ol:funktionsnedsättning*"
> and 0 documents. Notice how ä has not been changed to a.
>
> How can this be? Is complexphrase somehow skipping the analysis chain for multiterms, even though components and in particular MappingCharFilterFactory are Multi-term aware?
>
> Are there any configuration gotchas that I'm not aware of?
>
> Thanks for the help,
> Bjarke Buur Mortensen
> Senior Software Engineer, Eluence A/S
Re: Question regarding Upgrading to SolrCloud
Hi Sharma,
Please see inline answers.
Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

> On 5 Oct 2017, at 09:00, Gopesh Sharma wrote:
>
> Hello Guys,
>
> As of now we are running Solr 3.4 with Master Slave Configuration. We are planning to upgrade it to the lastest version (6.6 or 7). Questions I have before upgrading
>
> 1. Since we do not have a lot of data, is it required to move to SolrCloud or continue using it Master Slave

It is not required to move to SolrCloud if you are ok with MS limitations. The main drivers to move to SC are:
- data volume that requires sharding
- NRT requirements that cannot be met with the MS model
- FT requirements - with MS you have a SPOF, the master node, that can prevent updates; but if your NRT requirements are not strict and you can tolerate longer periods without updates, this can be ignored

> 2. Is the support for Master Slave will be there in the future release or do you plan to remove it.

SolrCloud also uses replication as a backup mechanism, so it is there to stay.

> 3. Can we configure master-slave replication in Solr Cloud, if yes then do we need zookeeper as well.

SolrCloud requires ZK - it is where it keeps the cluster state. As mentioned above, SolrCloud has replication handlers enabled, so you can have some hybrid model, but it will not make your system simpler.

>
> Thanks,
> Gopesh Sharma
>
Re: Complexphrase treats wildcards differently than other query parsers
Well, according to
https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
multiterm means
- wildcard
- range
- prefix
so that is the way I'm using the word. That same article explains how analysis will be performed with wildcards if the analyzers are multi-term aware.
Furthermore, both lucene and dismax do the correct analysis, so I don't think you are right in your statement about the majority of QPs skipping analysis for wildcards.

So I'm still confused as to why complexphrase does things differently.

Thanks,
/Bjarke

2017-10-05 10:16 GMT+02:00 Emir Arnautović :

> Hi Bjarke,
> It is not multiterm that is causing query parser to skip analysis chain but wildcard. The majority of query parsers do not analyse query string if there are wildcards.
>
> HTH
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
> > On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen wrote:
> >
> > Hi list,
> >
> > I'm trying to search for the term funktionsnedsättning*
> > In my analyzer chain I use a MappingCharFilterFactory to change ä to a.
> > So I would expect that funktionsnedsättning* would translate to funktionsnedsattning*.
> >
> > If I use e.g. the lucene query parser, this is indeed what happens:
> > ...debugQuery=on&defType=lucene&q=funktionsneds%C3%A4ttning* gives me
> > "rawquerystring":"funktionsnedsättning*", "querystring": "funktionsnedsättning*", "parsedquery":"content_ol:funktionsnedsattning*"
> > and 15 documents returned.
> >
> > Trying the same with complexphrase gives me:
> > ...debugQuery=on&defType=complexphrase&q=funktionsneds%C3%A4ttning* gives me
> > "rawquerystring":"funktionsnedsättning*", "querystring": "funktionsnedsättning*", "parsedquery":"content_ol:funktionsnedsättning*"
> > and 0 documents. Notice how ä has not been changed to a.
> >
> > How can this be? Is complexphrase somehow skipping the analysis chain for multiterms, even though components and in particular MappingCharFilterFactory are Multi-term aware?
> >
> > Are there any configuration gotchas that I'm not aware of?
> >
> > Thanks for the help,
> > Bjarke Buur Mortensen
> > Senior Software Engineer, Eluence A/S
>
Re: Complexphrase treats wildcards differently than other query parsers
Hi Bjarke, You are right - I jumped into wrong/old conclusion as the simplest answer to your question. I guess looking at the code could give you an answer. Thanks, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 5 Oct 2017, at 10:44, Bjarke Buur Mortensen wrote: > > Well, according to > https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/ > multiterm means > > wildcard > range > prefix > > so it is that way i'm using the word. That same article explains how > analysis will be performed with wildcards if the analyzers are multi-term > aware. > Furthermore, both lucene and dismax do the correct analysis, so I don't > think you are right in your statement about the majority of QPs skipping > analysis for wildcards. > > So I'm still confused as to why complexphrase does things differently. > > Thanks, > /Bjarke > > 2017-10-05 10:16 GMT+02:00 Emir Arnautović : > >> Hi Bjarke, >> It is not multiterm that is causing query parser to skip analysis chain >> but wildcard. The majority of query parsers do not analyse query string if >> there are wildcards. >> >> HTH >> Emir >> -- >> Monitoring - Log Management - Alerting - Anomaly Detection >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/ >> >> >> >>> On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen >> wrote: >>> >>> Hi list, >>> >>> I'm trying to search for the term funktionsnedsättning* >>> In my analyzer chain I use a MappingCharFilterFactory to change ä to a. >>> So I would expect that funktionsnedsättning* would translate to >>> funktionsnedsattning*. >>> >>> If I use e.g. the lucene query parser, this is indeed what happens: >>> ...debugQuery=on&defType=lucene&q=funktionsneds%C3%A4ttning* gives me >>> "rawquerystring":"funktionsnedsättning*", "querystring": >>> "funktionsnedsättning*", "parsedquery":"content_ol: >> funktionsnedsattning*" >>> and 15 documents returned. >>> >>> Trying the same with complexphrase gives me: >>> ...debugQuery=on&defType=complexphrase&q=funktionsneds%C3%A4ttning* >> gives me >>> "rawquerystring":"funktionsnedsättning*", "querystring": >>> "funktionsnedsättning*", "parsedquery":"content_ol: >> funktionsnedsättning*" >>> and 0 documents. Notice how ä has not been changed to a. >>> >>> How can this be? Is complexphrase somehow skipping the analysis chain for >>> multiterms, even though components and in particular >>> MappingCharFilterFactory are Multi-term aware >>> >>> Are there any configuration gotchas that I'm not aware of? >>> >>> Thanks for the help, >>> Bjarke Buur Mortensen >>> Senior Software Engineer, Eluence A/S >> >>
tf function query
Hi, According to https://lucene.apache.org/solr/guide/6_6/function-queries.html#FunctionQueries-AvailableFunctions tf(field, term) requires a term as a second parameter. Is there a possibility to pass in an entire input query (multiterm and boolean) to the function? The context here is that we don't use edismax parser to apply multifield boosts, but instead use a custom ranking function. Would appreciate any thoughts, Dmitry -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: https://semanticanalyzer.info
Re: Complexphrase treats wildcards differently than other query parsers
2017-10-05 11:29 GMT+02:00 Emir Arnautović : > Hi Bjarke, > You are right - I jumped into wrong/old conclusion as the simplest answer > to your question. No problem :-) I guess looking at the code could give you an answer. > This is what I would like to avoid out of fear that my head would explode ;-) > > Thanks, > Emir > -- > Monitoring - Log Management - Alerting - Anomaly Detection > Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > > > > > On 5 Oct 2017, at 10:44, Bjarke Buur Mortensen > wrote: > > > > Well, according to > > https://lucidworks.com/2011/11/29/whats-with-lowercasing- > wildcard-multiterm-queries-in-solr/ > > multiterm means > > > > wildcard > > range > > prefix > > > > so it is that way i'm using the word. That same article explains how > > analysis will be performed with wildcards if the analyzers are multi-term > > aware. > > Furthermore, both lucene and dismax do the correct analysis, so I don't > > think you are right in your statement about the majority of QPs skipping > > analysis for wildcards. > > > > So I'm still confused as to why complexphrase does things differently. > > > > Thanks, > > /Bjarke > > > > 2017-10-05 10:16 GMT+02:00 Emir Arnautović >: > > > >> Hi Bjarke, > >> It is not multiterm that is causing query parser to skip analysis chain > >> but wildcard. The majority of query parsers do not analyse query string > if > >> there are wildcards. > >> > >> HTH > >> Emir > >> -- > >> Monitoring - Log Management - Alerting - Anomaly Detection > >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > >> > >> > >> > >>> On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen > >> wrote: > >>> > >>> Hi list, > >>> > >>> I'm trying to search for the term funktionsnedsättning* > >>> In my analyzer chain I use a MappingCharFilterFactory to change ä to a. > >>> So I would expect that funktionsnedsättning* would translate to > >>> funktionsnedsattning*. > >>> > >>> If I use e.g. the lucene query parser, this is indeed what happens: > >>> ...debugQuery=on&defType=lucene&q=funktionsneds%C3%A4ttning* gives me > >>> "rawquerystring":"funktionsnedsättning*", "querystring": > >>> "funktionsnedsättning*", "parsedquery":"content_ol: > >> funktionsnedsattning*" > >>> and 15 documents returned. > >>> > >>> Trying the same with complexphrase gives me: > >>> ...debugQuery=on&defType=complexphrase&q=funktionsneds%C3%A4ttning* > >> gives me > >>> "rawquerystring":"funktionsnedsättning*", "querystring": > >>> "funktionsnedsättning*", "parsedquery":"content_ol: > >> funktionsnedsättning*" > >>> and 0 documents. Notice how ä has not been changed to a. > >>> > >>> How can this be? Is complexphrase somehow skipping the analysis chain > for > >>> multiterms, even though components and in particular > >>> MappingCharFilterFactory are Multi-term aware > >>> > >>> Are there any configuration gotchas that I'm not aware of? > >>> > >>> Thanks for the help, > >>> Bjarke Buur Mortensen > >>> Senior Software Engineer, Eluence A/S > >> > >> > >
RE: tf function query
I am afraid this is not possible, since getting frequencies for phrases is not possible, unless the phrases are created as tokens (i.e. using n-grams or shingles) and indexed. If someone has a solution for this, then I am interested as well. /JZ -Original Message- From: Dmitry Kan [mailto:solrexp...@gmail.com] Sent: Thursday, October 5, 2017 12:15 PM To: solr-user@lucene.apache.org Subject: tf function query Hi, According to https://lucene.apache.org/solr/guide/6_6/function-queries.html#FunctionQueries-AvailableFunctions tf(field, term) requires a term as a second parameter. Is there a possibility to pass in an entire input query (multiterm and boolean) to the function? The context here is that we don't use edismax parser to apply multifield boosts, but instead use a custom ranking function. Would appreciate any thoughts, Dmitry -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: https://semanticanalyzer.info
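If you do go the shingle route, a field type along these lines would index two-word phrases as single tokens that tf() can then count. This is only a sketch - the type name is invented, and tokenSeparator="_" is used so the shingle can be handed to tf() as a single token:

  <fieldType name="text_shingles" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- emit 2-word shingles in addition to the single terms -->
      <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="2"
              outputUnigrams="true" tokenSeparator="_"/>
    </analyzer>
  </fieldType>

A ranking function could then ask for something like tf(my_shingle_field,'apache_solr') for the phrase "apache solr". It still will not take an arbitrary boolean query, though.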
Re: tf function query
How about the query() function? Just be clever about the query you specify ;) > On Oct 5, 2017, at 06:14, Dmitry Kan wrote: > > Hi, > > According to > https://lucene.apache.org/solr/guide/6_6/function-queries.html#FunctionQueries-AvailableFunctions > > tf(field, term) requires a term as a second parameter. Is there a > possibility to pass in an entire input query (multiterm and boolean) to the > function? > > The context here is that we don't use edismax parser to apply multifield > boosts, but instead use a custom ranking function. > > Would appreciate any thoughts, > > Dmitry > > -- > Dmitry Kan > Luke Toolbox: http://github.com/DmitryKey/luke > Blog: http://dmitrykan.blogspot.com > Twitter: http://twitter.com/dmitrykan > SemanticAnalyzer: https://semanticanalyzer.info
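For example (just a sketch - field names, weights and the $qq parameter are invented), query() can pull the score of an arbitrary subquery into a function via parameter dereferencing:

  ...&fl=id,score,qscore:query($qq,0.0)&qq={!edismax qf='title^2 body'}solr cloud

query(subquery, default) returns that subquery's score for each document (or the default when it doesn't match), so its output can be fed into whatever custom ranking function is already in use.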
Re: Solr boost function taking precedence over relevance boosting
I would try to use an additive boost and the ^= boost operator:
- name_property:( test^=2 ) will assign a fixed score of 2 if the match happens (it is a constant score query)
- additive boost will be 0
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
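A sketch of how the two could be combined, assuming edismax is in play (field names are invented; bq is additive, unlike the multiplicative boost parameter):

  q=test&defType=edismax&qf=description&bq=name_property:(test^=2)

Any document matching name_property then gets a constant 2.0 added on top of its relevance score.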
RE: Complexphrase treats wildcards differently than other query parsers
What version of Solr are you using? I thought this had been fixed fairly recently, but I can't quickly find the JIRA. Let me take a look. Best, Tim This was one of my initial reasons for my SpanQueryParser LUCENE-5205[1] and [2], which handles analysis of multiterms even in phrases. [1] https://github.com/tballison/lucene-addons/tree/master/lucene-5205 [2] https://mvnrepository.com/artifact/org.tallison.lucene/lucene-5205/6.6-0.1 -Original Message- From: Bjarke Buur Mortensen [mailto:morten...@eluence.com] Sent: Thursday, October 5, 2017 6:28 AM To: solr-user@lucene.apache.org Subject: Re: Complexphrase treats wildcards differently than other query parsers 2017-10-05 11:29 GMT+02:00 Emir Arnautović : > Hi Bjarke, > You are right - I jumped into wrong/old conclusion as the simplest > answer to your question. No problem :-) I guess looking at the code could give you an answer. > This is what I would like to avoid out of fear that my head would explode ;-) > > Thanks, > Emir > -- > Monitoring - Log Management - Alerting - Anomaly Detection Solr & > Elasticsearch Consulting Support Training - http://sematext.com/ > > > > > On 5 Oct 2017, at 10:44, Bjarke Buur Mortensen > > > wrote: > > > > Well, according to > > https://lucidworks.com/2011/11/29/whats-with-lowercasing- > wildcard-multiterm-queries-in-solr/ > > multiterm means > > > > wildcard > > range > > prefix > > > > so it is that way i'm using the word. That same article explains how > > analysis will be performed with wildcards if the analyzers are > > multi-term aware. > > Furthermore, both lucene and dismax do the correct analysis, so I > > don't think you are right in your statement about the majority of > > QPs skipping analysis for wildcards. > > > > So I'm still confused as to why complexphrase does things differently. > > > > Thanks, > > /Bjarke > > > > 2017-10-05 10:16 GMT+02:00 Emir Arnautović > > >: > > > >> Hi Bjarke, > >> It is not multiterm that is causing query parser to skip analysis > >> chain but wildcard. The majority of query parsers do not analyse > >> query string > if > >> there are wildcards. > >> > >> HTH > >> Emir > >> -- > >> Monitoring - Log Management - Alerting - Anomaly Detection Solr & > >> Elasticsearch Consulting Support Training - http://sematext.com/ > >> > >> > >> > >>> On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen > >>> > >> wrote: > >>> > >>> Hi list, > >>> > >>> I'm trying to search for the term funktionsnedsättning* In my > >>> analyzer chain I use a MappingCharFilterFactory to change ä to a. > >>> So I would expect that funktionsnedsättning* would translate to > >>> funktionsnedsattning*. > >>> > >>> If I use e.g. the lucene query parser, this is indeed what happens: > >>> ...debugQuery=on&defType=lucene&q=funktionsneds%C3%A4ttning* gives > >>> me "rawquerystring":"funktionsnedsättning*", "querystring": > >>> "funktionsnedsättning*", "parsedquery":"content_ol: > >> funktionsnedsattning*" > >>> and 15 documents returned. > >>> > >>> Trying the same with complexphrase gives me: > >>> ...debugQuery=on&defType=complexphrase&q=funktionsneds%C3%A4ttning > >>> * > >> gives me > >>> "rawquerystring":"funktionsnedsättning*", "querystring": > >>> "funktionsnedsättning*", "parsedquery":"content_ol: > >> funktionsnedsättning*" > >>> and 0 documents. Notice how ä has not been changed to a. > >>> > >>> How can this be? 
Is complexphrase somehow skipping the analysis > >>> chain > for > >>> multiterms, even though components and in particular > >>> MappingCharFilterFactory are Multi-term aware > >>> > >>> Are there any configuration gotchas that I'm not aware of? > >>> > >>> Thanks for the help, > >>> Bjarke Buur Mortensen > >>> Senior Software Engineer, Eluence A/S > >> > >> > >
RE: Complexphrase treats wildcards differently than other query parsers
There's every chance that I'm missing something at the Solr level, but it _looks_ at the Lucene level, like ComplexPhraseQueryParser is still not applying analysis to multiterms. When I call this on 7.0.0: QueryParser qp = new ComplexPhraseQueryParser(defaultFieldName, analyzer); return qp.parse(qString); where the analyzer is a mock "uppercase vowel" analyzer[1] and the qString is; "the* quick~" the* quick~ the quick I get this: "the* quick~" name:the* name:quick~2 name:thE name:qUIck [1] https://github.com/tballison/lucene-addons/blob/master/lucene-5205/src/test/java/org/apache/lucene/queryparser/spans/TestAdvancedAnalyzers.java#L117 -Original Message- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Thursday, October 5, 2017 8:02 AM To: solr-user@lucene.apache.org Subject: RE: Complexphrase treats wildcards differently than other query parsers What version of Solr are you using? I thought this had been fixed fairly recently, but I can't quickly find the JIRA. Let me take a look. Best, Tim This was one of my initial reasons for my SpanQueryParser LUCENE-5205[1] and [2], which handles analysis of multiterms even in phrases. [1] https://github.com/tballison/lucene-addons/tree/master/lucene-5205 [2] https://mvnrepository.com/artifact/org.tallison.lucene/lucene-5205/6.6-0.1 -Original Message- From: Bjarke Buur Mortensen [mailto:morten...@eluence.com] Sent: Thursday, October 5, 2017 6:28 AM To: solr-user@lucene.apache.org Subject: Re: Complexphrase treats wildcards differently than other query parsers 2017-10-05 11:29 GMT+02:00 Emir Arnautović : > Hi Bjarke, > You are right - I jumped into wrong/old conclusion as the simplest > answer to your question. No problem :-) I guess looking at the code could give you an answer. > This is what I would like to avoid out of fear that my head would explode ;-) > > Thanks, > Emir > -- > Monitoring - Log Management - Alerting - Anomaly Detection Solr & > Elasticsearch Consulting Support Training - http://sematext.com/ > > > > > On 5 Oct 2017, at 10:44, Bjarke Buur Mortensen > > > wrote: > > > > Well, according to > > https://lucidworks.com/2011/11/29/whats-with-lowercasing- > wildcard-multiterm-queries-in-solr/ > > multiterm means > > > > wildcard > > range > > prefix > > > > so it is that way i'm using the word. That same article explains how > > analysis will be performed with wildcards if the analyzers are > > multi-term aware. > > Furthermore, both lucene and dismax do the correct analysis, so I > > don't think you are right in your statement about the majority of > > QPs skipping analysis for wildcards. > > > > So I'm still confused as to why complexphrase does things differently. > > > > Thanks, > > /Bjarke > > > > 2017-10-05 10:16 GMT+02:00 Emir Arnautović > > >: > > > >> Hi Bjarke, > >> It is not multiterm that is causing query parser to skip analysis > >> chain but wildcard. The majority of query parsers do not analyse > >> query string > if > >> there are wildcards. > >> > >> HTH > >> Emir > >> -- > >> Monitoring - Log Management - Alerting - Anomaly Detection Solr & > >> Elasticsearch Consulting Support Training - http://sematext.com/ > >> > >> > >> > >>> On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen > >>> > >> wrote: > >>> > >>> Hi list, > >>> > >>> I'm trying to search for the term funktionsnedsättning* In my > >>> analyzer chain I use a MappingCharFilterFactory to change ä to a. > >>> So I would expect that funktionsnedsättning* would translate to > >>> funktionsnedsattning*. > >>> > >>> If I use e.g. 
the lucene query parser, this is indeed what happens: > >>> ...debugQuery=on&defType=lucene&q=funktionsneds%C3%A4ttning* gives > >>> me "rawquerystring":"funktionsnedsättning*", "querystring": > >>> "funktionsnedsättning*", "parsedquery":"content_ol: > >> funktionsnedsattning*" > >>> and 15 documents returned. > >>> > >>> Trying the same with complexphrase gives me: > >>> ...debugQuery=on&defType=complexphrase&q=funktionsneds%C3%A4ttning > >>> * > >> gives me > >>> "rawquerystring":"funktionsnedsättning*", "querystring": > >>> "funktionsnedsättning*", "parsedquery":"content_ol: > >> funktionsnedsättning*" > >>> and 0 documents. Notice how ä has not been changed to a. > >>> > >>> How can this be? Is complexphrase somehow skipping the analysis > >>> chain > for > >>> multiterms, even though components and in particular > >>> MappingCharFilterFactory are Multi-term aware > >>> > >>> Are there any configuration gotchas that I'm not aware of? > >>> > >>> Thanks for the help, > >>> Bjarke Buur Mortensen > >>> Senior Software Engineer, Eluence A/S > >> > >> > >
Re: Complexphrase treats wildcards differently than other query parsers
Thanks Tim, that might be what I'm experiencing. I'm actually quite certain of it :-) Do you remember any reason that multi term analysis is not happening in ComplexPhraseQueryParser? I'm on 6.6.1, so latest on the 6.x branch. 2017-10-05 14:34 GMT+02:00 Allison, Timothy B. : > There's every chance that I'm missing something at the Solr level, but it > _looks_ at the Lucene level, like ComplexPhraseQueryParser is still not > applying analysis to multiterms. > > When I call this on 7.0.0: >QueryParser qp = new ComplexPhraseQueryParser(defaultFieldName, > analyzer); > return qp.parse(qString); > > where the analyzer is a mock "uppercase vowel" analyzer[1] and the > qString is; > > "the* quick~" the* quick~ the quick > > I get this: > "the* quick~" name:the* name:quick~2 name:thE name:qUIck > > > [1] https://github.com/tballison/lucene-addons/blob/master/ > lucene-5205/src/test/java/org/apache/lucene/queryparser/ > spans/TestAdvancedAnalyzers.java#L117 > > -Original Message- > From: Allison, Timothy B. [mailto:talli...@mitre.org] > Sent: Thursday, October 5, 2017 8:02 AM > To: solr-user@lucene.apache.org > Subject: RE: Complexphrase treats wildcards differently than other query > parsers > > What version of Solr are you using? > > I thought this had been fixed fairly recently, but I can't quickly find > the JIRA. Let me take a look. > > Best, > > Tim > > This was one of my initial reasons for my SpanQueryParser LUCENE-5205[1] > and [2], which handles analysis of multiterms even in phrases. > > [1] https://github.com/tballison/lucene-addons/tree/master/lucene-5205 > [2] https://mvnrepository.com/artifact/org.tallison.lucene/ > lucene-5205/6.6-0.1 > > -Original Message- > From: Bjarke Buur Mortensen [mailto:morten...@eluence.com] > Sent: Thursday, October 5, 2017 6:28 AM > To: solr-user@lucene.apache.org > Subject: Re: Complexphrase treats wildcards differently than other query > parsers > > 2017-10-05 11:29 GMT+02:00 Emir Arnautović : > > > Hi Bjarke, > > You are right - I jumped into wrong/old conclusion as the simplest > > answer to your question. > > > No problem :-) > > I guess looking at the code could give you an answer. > > > > This is what I would like to avoid out of fear that my head would explode > ;-) > > > > > > Thanks, > > Emir > > -- > > Monitoring - Log Management - Alerting - Anomaly Detection Solr & > > Elasticsearch Consulting Support Training - http://sematext.com/ > > > > > > > > > On 5 Oct 2017, at 10:44, Bjarke Buur Mortensen > > > > > wrote: > > > > > > Well, according to > > > https://lucidworks.com/2011/11/29/whats-with-lowercasing- > > wildcard-multiterm-queries-in-solr/ > > > multiterm means > > > > > > wildcard > > > range > > > prefix > > > > > > so it is that way i'm using the word. That same article explains how > > > analysis will be performed with wildcards if the analyzers are > > > multi-term aware. > > > Furthermore, both lucene and dismax do the correct analysis, so I > > > don't think you are right in your statement about the majority of > > > QPs skipping analysis for wildcards. > > > > > > So I'm still confused as to why complexphrase does things differently. > > > > > > Thanks, > > > /Bjarke > > > > > > 2017-10-05 10:16 GMT+02:00 Emir Arnautović > > > > >: > > > > > >> Hi Bjarke, > > >> It is not multiterm that is causing query parser to skip analysis > > >> chain but wildcard. The majority of query parsers do not analyse > > >> query string > > if > > >> there are wildcards. 
> > >> > > >> HTH > > >> Emir > > >> -- > > >> Monitoring - Log Management - Alerting - Anomaly Detection Solr & > > >> Elasticsearch Consulting Support Training - http://sematext.com/ > > >> > > >> > > >> > > >>> On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen > > >>> > > >> wrote: > > >>> > > >>> Hi list, > > >>> > > >>> I'm trying to search for the term funktionsnedsättning* In my > > >>> analyzer chain I use a MappingCharFilterFactory to change ä to a. > > >>> So I would expect that funktionsnedsättning* would translate to > > >>> funktionsnedsattning*. > > >>> > > >>> If I use e.g. the lucene query parser, this is indeed what happens: > > >>> ...debugQuery=on&defType=lucene&q=funktionsneds%C3%A4ttning* gives > > >>> me "rawquerystring":"funktionsnedsättning*", "querystring": > > >>> "funktionsnedsättning*", "parsedquery":"content_ol: > > >> funktionsnedsattning*" > > >>> and 15 documents returned. > > >>> > > >>> Trying the same with complexphrase gives me: > > >>> ...debugQuery=on&defType=complexphrase&q=funktionsneds%C3%A4ttning > > >>> * > > >> gives me > > >>> "rawquerystring":"funktionsnedsättning*", "querystring": > > >>> "funktionsnedsättning*", "parsedquery":"content_ol: > > >> funktionsnedsättning*" > > >>> and 0 documents. Notice how ä has not been changed to a. > > >>> > > >>> How can this be? Is complexphrase somehow skipping the analysis > > >>> chain > > for > > >>> multiterms, even though components and in particular > > >>> MappingCh
Re: tf function query
What would you expect as output? tf(field, "a OR b AND c NOT d"). I'm not sure what term frequency would even mean in that situation.

tf is a pretty simple function; it expects a single term, and there's no way I know of to do what you're asking.

Best,
Erick

On Thu, Oct 5, 2017 at 3:14 AM, Dmitry Kan wrote:
> Hi,
>
> According to
> https://lucene.apache.org/solr/guide/6_6/function-queries.html#FunctionQueries-AvailableFunctions
>
> tf(field, term) requires a term as a second parameter. Is there a possibility to pass in an entire input query (multiterm and boolean) to the function?
>
> The context here is that we don't use edismax parser to apply multifield boosts, but instead use a custom ranking function.
>
> Would appreciate any thoughts,
>
> Dmitry
>
> --
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: https://semanticanalyzer.info
Solrcloud replication not working
Hi,

We are using Solr 6.4.2 with a SolrCloud setup. We have two Solr instances in the cluster. This SolrCloud is running on Ubuntu. The problem is that replication is not happening between these two Solr instances: sometimes it replicates 10% of the content and sometimes not at all.

In the ZooKeeper ensemble we have three ZooKeeper instances running on a different box.

Thanks.

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Question regarding Upgrading to SolrCloud
Gopesh: There is brand new functionality in Solr 7, see: SOLR-10233, the "PULL" replica type which is a hybrid SolrCloud replica that uses master/slave type replication. You should find this in the reference guide, the 7.0 ref guide should be published soon. Meanwhile, that JIRA will let you know. Also see .../solr/CHANGES.txt. As Emir says, though, it would require ZooKeeper. Really, though, once you move to SolrCloud (if you do) I'd stick with the standard NRT replica type unless I had reason to use one of the other two, (TLOG and PULL) as they're for pretty special situations. All that said, if you're happy with master/slave there's no compelling reason to go to SolrCloud, especially for smaller installations. Best, Erick On Wed, Oct 4, 2017 at 11:46 PM, Gopesh Sharma wrote: > Hello Guys, > > As of now we are running Solr 3.4 with Master Slave Configuration. We are > planning to upgrade it to the lastest version (6.6 or 7). Questions I have > before upgrading > > > 1. Since we do not have a lot of data, is it required to move to SolrCloud > or continue using it Master Slave > 2. Is the support for Master Slave will be there in the future release or > do you plan to remove it. > 3. Can we configure master-slave replication in Solr Cloud, if yes then do > we need zookeeper as well. > > Thanks, > Gopesh Sharma
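If you do go to 7.x and want to try it, the Collections API exposes the new replica types at collection-creation time and via ADDREPLICA. Something along these lines should work (collection/config names are placeholders - double-check the parameter names against the 7.0 ref guide):

  http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&nrtReplicas=1&pullReplicas=1&collection.configName=myconfig

  http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&type=pull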
Re: Solrcloud replication not working
We need a lot more data to say anything useful, please read: https://wiki.apache.org/solr/UsingMailingLists What do you see in your Solr logs? What have you tried to do to diagnose this? Do you have enough disk space? Best, Erick On Thu, Oct 5, 2017 at 6:56 AM, solr2020 wrote: > Hi, > > We are using Solr 6.4.2 & SolrCloud setup. We have two solr instances in the > solr cluster.This solrcloud running in ubuntu OS. The problem is replication > is not happening between these two solr instances. sometimes it replicate > 10% of the content and sometimes not. > > In Zookeeper ensemble we have three zookeeper instances running in a > different box. > > thanks. > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: FilterCache size should reduce as index grows?
The other thing I'd point out is that if your hit ratio is low, you might as well disable it entirely. Finally, if you have any a-priori knowledge that certain fq clauses are very unlikely to be re-used, add {!cache=false}. If you also add cost=101, then the fq clause will only be evaluated for docs that need it, especially if you turn caching off. See: http://yonik.com/advanced-filter-caching-in-solr/ Best, Erick On Thu, Oct 5, 2017 at 12:20 AM, Toke Eskildsen wrote: > On Wed, 2017-10-04 at 21:42 -0700, S G wrote: >> The bit-vectors in filterCache are as long as the maximum number of >> documents in a core. If there are a billion docs per core, every bit >> vector will have a billion bits making its size as 10 9 / 8 = 128 mb > > The tricky part here is there are sparse (aka few hits) entries that > takes up less space. The 1 bit/hit is worst case. > > This is both good and bad. The good part is of course that it saves > memory. The bad part is that it often means that people set the > filterCache size to a high number and that it works well, right until > a series of filters with many hits. > > It seems that the memory limit option maxSizeMB was added in Solr 5.2: > https://issues.apache.org/jira/browse/SOLR-7372 > I am not sure if it works with all caches in Solr, but in my world it > is way better to define the caches by memory instead of count. > >> With such a big cache-value per entry, the default value of 128 >> values in will become 128x128mb = 16gb and would not be very good for >> a system running below 32 gb of memory. > > Sure. The default values are just that. For an index with 1M documents > and a lot of different filters, 128 would probably be too low. > > If someone were to create a well-researched set of config files for > different scenarios, it would be a welcome addition to our shared > knowledge pool. > >> If such a use-case is anticipated, either the JVM's max memory be >> increased to beyond 40 gb or the filterCache size be reduced to 32. > > Best solution: Use maxSizeMB (if it works) > Second best solution: Reduce to 32 or less > Third best, but often used, solution: Hope that most of the entries are > sparse and will remain so > > - Toke Eskildsen, Royal Danish Library >
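For example (field and function names are invented, just to show the syntax from that post):

  fq={!cache=false}acl:group42
  fq={!frange l=10 cache=false cost=150}mul(popularity,price)

The first simply keeps a one-off filter out of the filterCache. The second, because frange implements the PostFilter interface and the cost is >= 100, is only evaluated against documents that already match the main query and the other filters.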
Re: Solrcloud replication not working
Thanks. We don't see any error message (or any message at all) in the logs. And we have enough disk space.

We are running Solr as the root user on the Ubuntu box, but the ZooKeeper process is running as the zookeeper user. Will that cause the problem?

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Error adding replica after a delete replica
A colleague of mine was testing how solrcloud replica recovery works. We have had a lot of issues with replicas going into recovery mode, replicas down and in recovery-failed states. So to test, he deleted a healthy replica in one of our development environments. First the delete operation timed out, but the replica appears to be gone. However, addReplica always fails with this error:

Error CREATEing SolrCore 'sial-content-citations_shard1_replica1': Unable to create core [sial-content-citations_shard1_replica1]
Caused by: Lock held by this virtual machine: /var/solr/data/sial-content-citations_shard1_replica1/data/index/write.lock

This cloud has 4 nodes. The collection has two shards with two replicas per shard. They are all hosted in a Google Cloud environment.

So if the delete deleted the replica, why would it then hold a lock? We want to understand this.

We are using Solr 6.2.0

--

This message and any attachment are confidential and may be privileged or otherwise protected from disclosure. If you are not the intended recipient, you must not copy this message or attachment or disclose the contents to any other person. If you have received this transmission in error, please notify the sender immediately and delete the message and any attachment from your system. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept liability for any omissions or errors in this message which may arise as a result of E-Mail-transmission or for damages resulting from any unauthorized changes of the content of this message and any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not guarantee that this message is free of viruses and does not accept liability for any damages caused by any virus transmitted therewith.

Click http://www.emdgroup.com/disclaimer to access the German, French, Spanish and Portuguese versions of this disclaimer.
Re: FilterCache size should reduce as index grows?
On Thu, Oct 5, 2017 at 10:07 AM, Erick Erickson wrote:
> The other thing I'd point out is that if your hit ratio is low, you might as well disable it entirely.

I'd normally recommend against turning it off entirely, except in *very* custom cases. Even if the user doesn't reuse filter queries, Solr itself can reuse them internally in many different ways. One way is 2-phase distributed search, for example. Another is big terms in UIF faceting. Some of these things were designed with the presence of a filter cache in mind.

-Yonik
Re: FilterCache size should reduce as index grows?
On Thu, Oct 5, 2017 at 3:20 AM, Toke Eskildsen wrote: > On Wed, 2017-10-04 at 21:42 -0700, S G wrote: > > It seems that the memory limit option maxSizeMB was added in Solr 5.2: > https://issues.apache.org/jira/browse/SOLR-7372 > I am not sure if it works with all caches in Solr, but in my world it > is way better to define the caches by memory instead of count. Yes, that will work with the filterCache, but one needs to change the cache type as well (maxSizeMB is only an option on LRUCache, and filterCache uses FastLRUCache in the default solrconfig.xml) -Yonik
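For reference, a memory-bounded filterCache would then look something like this in solrconfig.xml - the sizes are only illustrations, and note that SOLR-7372 itself calls the attribute maxRamMB, so double-check the exact name for your version:

  <filterCache class="solr.LRUCache"
               size="512"
               initialSize="512"
               autowarmCount="32"
               maxRamMB="256"/>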
Recommendations for number of open files?
We have begun to see errors around too many open files on one of our solrcloud nodes. One replica tries to open >8000 files. This replica tries to start up and then fails because the open-file limit is exceeded as it tries to recover.

Our solrclouds have 12 distinct collections. I would think that the number of open files would depend upon the number of collections as well as the number of files per index, etc...

Our current setting is 8192 open files per process.

What values are recommended? Is there a normal number of open files?

What would lead to there being lots of open files?

--

This message and any attachment are confidential and may be privileged or otherwise protected from disclosure. If you are not the intended recipient, you must not copy this message or attachment or disclose the contents to any other person. If you have received this transmission in error, please notify the sender immediately and delete the message and any attachment from your system. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept liability for any omissions or errors in this message which may arise as a result of E-Mail-transmission or for damages resulting from any unauthorized changes of the content of this message and any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not guarantee that this message is free of viruses and does not accept liability for any damages caused by any virus transmitted therewith.

Click http://www.emdgroup.com/disclaimer to access the German, French, Spanish and Portuguese versions of this disclaimer.
Re: Recommendations for number of open files?
Well, Lucene keeps an open file handle for _every_ file in _every_ index directory. So, for instance, let's say a replica has 10 segments. Each segment is 10-15 individual files. So that's 100-150 file handles right there. And indexes can have many segments.

Check to see if "cfs" extensions are in your indexing directory, that's "compound file system" and if present will reduce the number of file handles needed.

A second thing you might be able to do is increase the maximum segment size by setting maxMergedSegmentMB in your solrconfig file for TieredMergePolicy (see the sketch at the end of this message); eventually that'll merge segments into fewer, but that'll take a while.

As to your question, we usually recommend setting the file limit to "unlimited". You do have to monitor it however, at some point there's a lot of bookkeeping.

One replica trying to open > 8,000 files seems very odd though. Is it a massive index? The default max segment size is 5G, so you could have a gazillion small segments, in which case you might want to split that shard up and move the sub-shards to some other machine.

Best,
Erick

On Thu, Oct 5, 2017 at 10:02 AM, Webster Homer wrote:
> We have begun to see errors around too many open files on one of our solrcloud nodes. One replica tries to open >8000 files. This replica tries to startup and then fails the open files are exceeded upon startup as it tries to recover.
>
> Our solrclouds have 12 distinct collections. I would think that the number of open files would depend upon the number of collections as well as numbers of files per index etc...
>
> Our current setting is 8192 open files per process.
>
> What values are recommended? is there a normal number of open files?
>
> What would lead to there being lots of open files?
>
> --
>
> This message and any attachment are confidential and may be privileged or otherwise protected from disclosure. If you are not the intended recipient, you must not copy this message or attachment or disclose the contents to any other person. If you have received this transmission in error, please notify the sender immediately and delete the message and any attachment from your system. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept liability for any omissions or errors in this message which may arise as a result of E-Mail-transmission or for damages resulting from any unauthorized changes of the content of this message and any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not guarantee that this message is free of viruses and does not accept liability for any damages caused by any virus transmitted therewith.
>
> Click http://www.emdgroup.com/disclaimer to access the German, French, Spanish and Portuguese versions of this disclaimer.
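For reference, the merge-policy setting mentioned above goes into solrconfig.xml roughly like this - the 10000 value (i.e. roughly 10 GB segments) is only an illustration, not a recommendation:

  <indexConfig>
    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <double name="maxMergedSegmentMB">10000</double>
    </mergePolicyFactory>
  </indexConfig>

And the per-process file limit is usually raised in /etc/security/limits.conf for whatever user runs Solr, e.g. (assuming the user is "solr"):

  solr  soft  nofile  65535
  solr  hard  nofile  65535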
RE: Complexphrase treats wildcards differently than other query parsers
Prob the usual reasons...no one has submitted a patch yet, or could be a regression after LUCENE-7355. See also: https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201407.mbox/%3c1d06a081892adf4589bd83ee24b9dc3025971...@imcmbx02.mitre.org%3E I'll take a look. -Original Message- From: Bjarke Buur Mortensen [mailto:morten...@eluence.com] Sent: Thursday, October 5, 2017 8:52 AM To: solr-user@lucene.apache.org Subject: Re: Complexphrase treats wildcards differently than other query parsers Thanks Tim, that might be what I'm experiencing. I'm actually quite certain of it :-) Do you remember any reason that multi term analysis is not happening in ComplexPhraseQueryParser? I'm on 6.6.1, so latest on the 6.x branch. 2017-10-05 14:34 GMT+02:00 Allison, Timothy B. : > There's every chance that I'm missing something at the Solr level, but > it _looks_ at the Lucene level, like ComplexPhraseQueryParser is still > not applying analysis to multiterms. > > When I call this on 7.0.0: >QueryParser qp = new ComplexPhraseQueryParser(defaultFieldName, > analyzer); > return qp.parse(qString); > > where the analyzer is a mock "uppercase vowel" analyzer[1] and the > qString is; > > "the* quick~" the* quick~ the quick > > I get this: > "the* quick~" name:the* name:quick~2 name:thE name:qUIck > > > [1] https://github.com/tballison/lucene-addons/blob/master/ > lucene-5205/src/test/java/org/apache/lucene/queryparser/ > spans/TestAdvancedAnalyzers.java#L117 > > -Original Message- > From: Allison, Timothy B. [mailto:talli...@mitre.org] > Sent: Thursday, October 5, 2017 8:02 AM > To: solr-user@lucene.apache.org > Subject: RE: Complexphrase treats wildcards differently than other > query parsers > > What version of Solr are you using? > > I thought this had been fixed fairly recently, but I can't quickly > find the JIRA. Let me take a look. > > Best, > > Tim > > This was one of my initial reasons for my SpanQueryParser > LUCENE-5205[1] and [2], which handles analysis of multiterms even in phrases. > > [1] https://github.com/tballison/lucene-addons/tree/master/lucene-5205 > [2] https://mvnrepository.com/artifact/org.tallison.lucene/ > lucene-5205/6.6-0.1 > > -Original Message- > From: Bjarke Buur Mortensen [mailto:morten...@eluence.com] > Sent: Thursday, October 5, 2017 6:28 AM > To: solr-user@lucene.apache.org > Subject: Re: Complexphrase treats wildcards differently than other > query parsers > > 2017-10-05 11:29 GMT+02:00 Emir Arnautović : > > > Hi Bjarke, > > You are right - I jumped into wrong/old conclusion as the simplest > > answer to your question. > > > No problem :-) > > I guess looking at the code could give you an answer. > > > > This is what I would like to avoid out of fear that my head would > explode > ;-) > > > > > > Thanks, > > Emir > > -- > > Monitoring - Log Management - Alerting - Anomaly Detection Solr & > > Elasticsearch Consulting Support Training - http://sematext.com/ > > > > > > > > > On 5 Oct 2017, at 10:44, Bjarke Buur Mortensen > > > > > wrote: > > > > > > Well, according to > > > https://lucidworks.com/2011/11/29/whats-with-lowercasing- > > wildcard-multiterm-queries-in-solr/ > > > multiterm means > > > > > > wildcard > > > range > > > prefix > > > > > > so it is that way i'm using the word. That same article explains > > > how analysis will be performed with wildcards if the analyzers are > > > multi-term aware. 
> > > Furthermore, both lucene and dismax do the correct analysis, so I > > > don't think you are right in your statement about the majority of > > > QPs skipping analysis for wildcards. > > > > > > So I'm still confused as to why complexphrase does things differently. > > > > > > Thanks, > > > /Bjarke > > > > > > 2017-10-05 10:16 GMT+02:00 Emir Arnautović > > > > >: > > > > > >> Hi Bjarke, > > >> It is not multiterm that is causing query parser to skip analysis > > >> chain but wildcard. The majority of query parsers do not analyse > > >> query string > > if > > >> there are wildcards. > > >> > > >> HTH > > >> Emir > > >> -- > > >> Monitoring - Log Management - Alerting - Anomaly Detection Solr & > > >> Elasticsearch Consulting Support Training - http://sematext.com/ > > >> > > >> > > >> > > >>> On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen > > >>> > > >> wrote: > > >>> > > >>> Hi list, > > >>> > > >>> I'm trying to search for the term funktionsnedsättning* In my > > >>> analyzer chain I use a MappingCharFilterFactory to change ä to a. > > >>> So I would expect that funktionsnedsättning* would translate to > > >>> funktionsnedsattning*. > > >>> > > >>> If I use e.g. the lucene query parser, this is indeed what happens: > > >>> ...debugQuery=on&defType=lucene&q=funktionsneds%C3%A4ttning* > > >>> gives me "rawquerystring":"funktionsnedsättning*", "querystring": > > >>> "funktionsnedsättning*", "parsedquery":"content_ol: > > >> funktionsnedsattning*" > > >>> and 15 documents returned. > > >>> > > >>> Trying
Re: Jenkins setup for continuous build
: I have some custom code in solr (which is not of good quality for : contributing back) so I need to setup my own continuous build solution. I : tried jenkins and was hoping that ant build (ant clean compile) in Execute : Shell textbox will work, but I am stuck at this ivy-fail error: : : To work around it, I also added another step in the 'Execute Shell' (ant : ivy-bootstrap), which succeeds but 'ant clean compile' still fails with the : following error. I guess that I am not alone in doing this so there should : be some standard work around for this. The ivy bootstraping is really designed to to be for developers to setup their ~/.ant/lib directory -- IIRC most of the jenkins build servers out there don't use it as part of their job, they instead of install ivy once when setting up the jenkins server (in the home dir of the jenkins user) I suspect the error you are running into may have to do with directory permissions of your jenkins server not letting the job write to the jenkins home dir? or some other path/permissions incompatibility. You could consider following the instruction in the ivy-fail warning to have ivy-bootstrap put the ivy jar files in a custom path inside hte workspace of your job, and then use "-lib" to point at that directory when running solr tests. Alternatively, my preference for setting up jenkins jobs these days is to use docker, and let all the per-job activity (inlcuding the git co of lucene and the ivy bootstraping) happen inside the docker container. For example: this is a set of scripts/configs i use for an "ondemand" jenkins job i have, that let's me checkout arbitrary branches/commits of lucene-solr, apply arbitrary patches, and the nrun arbitrary build commands (ie: ant test) using arbitrary JDK versions -- all configured at build time with build params... https://github.com/hossman/solr-jenkins-docker-tester : : ivy-fail: : [echo] : [echo] This build requires Ivy and Ivy could not be found in : your ant classpath. : [echo] : [echo] (Due to classpath issues and the recursive nature of : the Lucene/Solr : [echo] build system, a local copy of Ivy can not be used an : loaded dynamically : [echo] by the build.xml) : [echo] : [echo] You can either manually install a copy of Ivy 2.3.0 : in your ant classpath: : [echo]http://ant.apache.org/manual/install.html#optionalTasks : [echo] : [echo] Or this build file can do it for you by running the : Ivy Bootstrap target: : [echo]ant ivy-bootstrap : [echo] : [echo] Either way you will only have to install Ivy one time. : [echo] : [echo] 'ant ivy-bootstrap' will install a copy of Ivy into : your Ant User Library: : [echo]/home/jenkins/.ant/lib : [echo] : [echo] If you would prefer, you can have it installed into : an alternative : [echo] directory using the : "-Divy_install_path=/some/path/you/choose" option, : [echo] but you will have to specify this path every time you : build Lucene/Solr : [echo] in the future... : [echo]ant ivy-bootstrap -Divy_install_path=/some/path/you/choose : [echo]... : [echo]ant -lib /some/path/you/choose clean compile : [echo]... : [echo]ant -lib /some/path/you/choose clean compile : [echo] : [echo] If you have already run ivy-bootstrap, and still get : this message, please : [echo] try using the "--noconfig" option when running ant, : or editing your global : [echo] ant config to allow the user lib to be loaded. See : the wiki for more details: : [echo] : http://wiki.apache.org/lucene-java/DeveloperTips#Problems_with_Ivy.3F : [echo] : -Hoss http://www.lucidworks.com/
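In other words, the "Execute Shell" step can bootstrap Ivy into the job's own workspace and point ant at it, per the instructions in that ivy-fail message (the directory name here is just an example):

  ant ivy-bootstrap -Divy_install_path=$WORKSPACE/ivylib
  ant -lib $WORKSPACE/ivylib clean compile

That keeps everything inside the Jenkins workspace instead of relying on the jenkins user's ~/.ant/lib being writable.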
Re: Recommendations for number of open files?
The issue is on one of our QA collections which means I don't have access to the systems to see. I have to go through the admins it does have ".cfs" files in the index. However, it turns out that the replica in question has 8007 tlog files. This solrcloud is a target cloud for cdcr. The replica dies during recovery, I guess it tries to read all those files to apply them? How does a cdcr target know when it can delete a tlog? The source collection has 83 tlog files. Just to be clear, you suggest a per process open file limit of unlimited? Thanks On Thu, Oct 5, 2017 at 12:23 PM, Erick Erickson wrote: > Well, Lucene keeps an open file handle for _every_ file in _every_ > index directory. So, for instance, let's say a replica has 10 > segments. Each segment is 10-15 individual files. So that's 100-150 > file handles right there. And indexes can have many segments. > > Check to see if "cfs" extensions are in your indexing directory, > that's "compound file system" and if present will reduce the number of > file handles needed. > > A second thing you might be able to do is increase the maximum segment > size by setting maxMergedSegmentMB in your solrconfig file for > TieredMergePolicy, something like > 1 > eventually that'll merge segments into fewer, but that'll take a while. > > As to your question, we usually recommend to set the file limit to > "unlimited". You do have to monitor it however, at some point there's > a lot of bookkeeping. > > one replica trying to open > 8,000 files seems very odd though. Is it > a massive index? The default max segment size is 5G, so you could have > a gazillion small segments in which case you might want to split that > shard up and move the sub-shards to some other machine. > > Best, > Erick > > On Thu, Oct 5, 2017 at 10:02 AM, Webster Homer > wrote: > > We have begun to see errors around too many open files on one of our > > solrcloud nodes. One replica tries to open >8000 files. This replica > tries > > to startup and then fails the open files are exceeded upon startup as it > > tries to recover. > > > > > > Our solrclouds have 12 distinct collections. I would think that the > number > > of open files would depend upon the number of collections as well as > > numbers of files per index etc... > > > > Our current setting is 8192 open files per process. > > > > What values are recommended? is there a normal number of open files? > > > > What would lead to there being lots of open files? > > > > -- > > > > > > This message and any attachment are confidential and may be privileged or > > otherwise protected from disclosure. If you are not the intended > recipient, > > you must not copy this message or attachment or disclose the contents to > > any other person. If you have received this transmission in error, please > > notify the sender immediately and delete the message and any attachment > > from your system. Merck KGaA, Darmstadt, Germany and any of its > > subsidiaries do not accept liability for any omissions or errors in this > > message which may arise as a result of E-Mail-transmission or for damages > > resulting from any unauthorized changes of the content of this message > and > > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its > > subsidiaries do not guarantee that this message is free of viruses and > does > > not accept liability for any damages caused by any virus transmitted > > therewith. 
> > > > Click http://www.emdgroup.com/disclaimer to access the German, French, > > Spanish and Portuguese versions of this disclaimer. > -- This message and any attachment are confidential and may be privileged or otherwise protected from disclosure. If you are not the intended recipient, you must not copy this message or attachment or disclose the contents to any other person. If you have received this transmission in error, please notify the sender immediately and delete the message and any attachment from your system. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept liability for any omissions or errors in this message which may arise as a result of E-Mail-transmission or for damages resulting from any unauthorized changes of the content of this message and any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not guarantee that this message is free of viruses and does not accept liability for any damages caused by any virus transmitted therewith. Click http://www.emdgroup.com/disclaimer to access the German, French, Spanish and Portuguese versions of this disclaimer.
Re: Recommendations for number of open files?
I wouldn't call it massive. The index is ~9 million documents. So not too big, the documents themselves are pretty small

On Thu, Oct 5, 2017 at 12:23 PM, Erick Erickson wrote:
> one replica trying to open > 8,000 files seems very odd though. Is it
> a massive index? The default max segment size is 5G, so you could have
> a gazillion small segments in which case you might want to split that
> shard up and move the sub-shards to some other machine.
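For reference, the maxMergedSegmentMB setting Erick mentions normally lives on the merge policy in the indexConfig section of solrconfig.xml; a sketch for Solr 6.x/7.x, with illustrative values (5000 MB is the stock default):

  <indexConfig>
    <!-- cap on the size of a merged segment; raising it lets merges produce fewer, larger segments -->
    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
      <int name="maxMergedSegmentMB">5000</int>
    </mergePolicyFactory>
  </indexConfig>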
Re: Question regarding Upgrading to SolrCloud
The 7.0 Ref Guide was released Monday. An overview of the new replica types is available online here: https://lucene.apache.org/solr/guide/7_0/shards-and-indexing-data-in-solrcloud.html#types-of-replicas. The replica type is specified when you either create the collection or add a replica. On Thu, Oct 5, 2017 at 9:01 AM, Erick Erickson wrote: > Gopesh: > > There is brand new functionality in Solr 7, see: SOLR-10233, the > "PULL" replica type which is a hybrid SolrCloud replica that uses > master/slave type replication. You should find this in the reference > guide, the 7.0 ref guide should be published soon. Meanwhile, that > JIRA will let you know. Also see .../solr/CHANGES.txt. As Emir says, > though, it would require ZooKeeper. > > Really, though, once you move to SolrCloud (if you do) I'd stick with > the standard NRT replica type unless I had reason to use one of the > other two, (TLOG and PULL) as they're for pretty special situations. > > All that said, if you're happy with master/slave there's no compelling > reason to go to SolrCloud, especially for smaller installations. > > Best, > Erick > > On Wed, Oct 4, 2017 at 11:46 PM, Gopesh Sharma > wrote: >> Hello Guys, >> >> As of now we are running Solr 3.4 with Master Slave Configuration. We are >> planning to upgrade it to the lastest version (6.6 or 7). Questions I have >> before upgrading >> >> >> 1. Since we do not have a lot of data, is it required to move to >> SolrCloud or continue using it Master Slave >> 2. Is the support for Master Slave will be there in the future release or >> do you plan to remove it. >> 3. Can we configure master-slave replication in Solr Cloud, if yes then >> do we need zookeeper as well. >> >> Thanks, >> Gopesh Sharma
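For what it's worth, the replica type is picked when the collection is created or when a replica is added, via the Collections API; a sketch against a hypothetical collection name (hosts and counts illustrative):

  # all-NRT (the default), 2 shards x 2 replicas
  http://localhost:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=2&replicationFactor=2

  # master/slave-like layout: one TLOG (indexing) replica plus PULL replicas that only copy the index
  http://localhost:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=2&tlogReplicas=1&pullReplicas=2

  # add another PULL replica to shard1 later
  http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&type=pull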
RE: Complexphrase treats wildcards differently than other query parsers
After some more digging, I'm wrong even at the Lucene level. When I use the CustomAnalyzer and make my UC vowel mock filter MultitermAware, I get this with Lucene in trunk: "the* quick~" name:thE* name:qUIck~2 name:thE name:qUIck So, there's room for improvement with phrases, but the regular multiterms should be ok. Still no answer for you... 2017-10-05 14:34 GMT+02:00 Allison, Timothy B. : > There's every chance that I'm missing something at the Solr level, but > it _looks_ at the Lucene level, like ComplexPhraseQueryParser is still > not applying analysis to multiterms. > > When I call this on 7.0.0: >QueryParser qp = new ComplexPhraseQueryParser(defaultFieldName, > analyzer); > return qp.parse(qString); > > where the analyzer is a mock "uppercase vowel" analyzer[1] and the > qString is; > > "the* quick~" the* quick~ the quick > > I get this: > "the* quick~" name:the* name:quick~2 name:thE name:qUIck
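A self-contained way to poke at this at the Lucene level is to build the analyzer with CustomAnalyzer and hand it to ComplexPhraseQueryParser. This is only a sketch: it uses lowercase + ASCIIFolding as a stand-in for the mock "uppercase vowel" analyzer and for MappingCharFilterFactory (both stand-ins are multi-term aware), and the field name "content_ol" is taken from the earlier mail:

  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
  import org.apache.lucene.analysis.custom.CustomAnalyzer;
  import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilterFactory;
  import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
  import org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser;

  public class ComplexPhraseWildcardCheck {
    public static void main(String[] args) throws Exception {
      // lowercase and ASCII-folding are both multi-term aware filters
      Analyzer analyzer = CustomAnalyzer.builder()
          .withTokenizer(StandardTokenizerFactory.class)
          .addTokenFilter(LowerCaseFilterFactory.class)
          .addTokenFilter(ASCIIFoldingFilterFactory.class)
          .build();

      ComplexPhraseQueryParser qp = new ComplexPhraseQueryParser("content_ol", analyzer);
      // if wildcard terms were analyzed, the parsed query would show 'a' instead of 'ä'
      System.out.println(qp.parse("funktionsnedsättning*"));
      System.out.println(qp.parse("\"funktionsnedsättning* quick~\""));
    }
  }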
Re: Recommendations for number of open files?
Interestingly, many of these tlog files (5428 out of 8007) have 0 length!? What would cause that? As I stated, this is a cdcr target collection.

On Thu, Oct 5, 2017 at 1:19 PM, Webster Homer wrote:
> I wouldn't call it massive. The index is ~9 million documents. So not too
> big, the documents themselves are pretty small
Re: Recommendations for number of open files?
OK, never mind about the file handle limits, let's deal with the tlogs. Although unlimited is a good thing.

Do you have buffering disabled on the target cluster?

Best
Erick

On Thu, Oct 5, 2017 at 11:19 AM, Webster Homer wrote:
> I wouldn't call it massive. The index is ~9 million documents. So not too
> big, the documents themselves are pretty small
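For completeness, the buffer state on a target collection can be inspected and switched at runtime through the CDCR API; a sketch with placeholder host and collection names:

  # check CDCR status, including the current buffer state, on the target collection
  http://<target-host>:8983/solr/<collection>/cdcr?action=STATUS

  # disable buffering so old tlogs become eligible for cleanup
  http://<target-host>:8983/solr/<collection>/cdcr?action=DISABLEBUFFER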
Re: Recommendations for number of open files?
buffering is disabled. Indeed we disable it everywhere as all it seems to do is leave tlogs around forever.

Autocommit is set to 60 seconds.

The source cdcr request handler looks like this. The first target is the problematic one:

{"requestHandler":{"/cdcr":{
    "name":"/cdcr",
    "class":"solr.CdcrRequestHandler",
    "replica":[
      {
        "zkHost":"ae1a-ecomqa-mzk01:2181,ae1a-ecomqa-mzk02:2181,ae1a-ecomqa-mzk03:2181/solr",
        "source":"sial-content-citations",
        "target":"sial-content-citations"},
      {
        "zkHost":"uc1b-ecomqa-mzk01:2181,uc1b-ecomqa-mzk02:2181,uc1b-ecomqa-mzk03:2181/solr",
        "source":"sial-content-citations",
        "target":"sial-content-citations"}],
    "replicator":{
      "threadPoolSize":2,
      "schedule":1000,
      "batchSize":250},
    "updateLogSynchronizer":{"schedule":6

The target looks like:

"requestHandler":{"/cdcr":{
    "name":"/cdcr",
    "class":"solr.CdcrRequestHandler",
    "buffer":{"defaultState":"disabled"}}

These are all in our QA environment

On Thu, Oct 5, 2017 at 2:43 PM, Erick Erickson wrote:
> OK, never mind about the file handle limits, let's deal with the
> tlogs. Although unlimited is a good thing.
>
> Do you have buffering disabled on the target cluster?
>
> Best
> Erick
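For reference, the 60-second autocommit mentioned above would typically be the hard autoCommit in solrconfig.xml, which is also what closes out the current tlog and starts a new one; a minimal sketch (openSearcher=false is an assumption, not taken from the config above):

  <!-- inside <updateHandler>: hard commit every 60s, rolling the tlog over -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>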
Re: Recommendations for number of open files?
It seems that there was a networking error just prior to the creation of the 0 length files: The files from Sep 27 are all written at 17:56. There was minor packet loss (1 out of 10 packets per 60 second interval) just prior to that time.

On Thu, Oct 5, 2017 at 3:11 PM, Webster Homer wrote:
> buffering is disabled. Indeed we disable it everywhere as all it seems to
> do is leave tlogs around forever.
>
> Autocommit is set to 60 seconds.
Re: Rescoring from 0 - full
Hi, Your answers have helped me a lot. I've managed to use the LTRQParserPlugin and it does what I need. Full control over scoring with its re-ranking functionality. I define my custom features and may pass custom params to them using the "efi.*" syntax. Is there something similar to define weights in the model that uses these features? Can I have a single model, but pass feature weights in each request? How do I pass my custom weights with each request in the example below?

{
  "store" : "myFeaturesStore",
  "name" : "myModel",
  "class" : "org.apache.solr.ltr.model.LinearModel",
  "features" : [
    { "name" : "scorePersonalId" },
    { "name" : "originalScore" }
  ],
  "params" : {
    "weights" : {
      "scorePersonalId" : 0.9,
      "originalScore" : 0.1
    }
  }
}

I am using SOLR 6.6, soon switching to 7.0

Best regards,
Dariusz Wojtas

On Thu, Sep 21, 2017 at 5:18 PM, Erick Erickson wrote:
> Sure, you can take full control of the scoring, just write a custom
> similarity.
>
> What's not at all clear is why you want to. RerankQParserPlugin will
> re-rank the to N documents by pushing them through a different query,
> can you make that work?
>
> Best,
> Erick
>
> On Thu, Sep 21, 2017 at 4:20 AM, Diego Ceccarelli (BLOOMBERG/ LONDON)
> wrote:
> > Hi Dariusz,
> > If you use *:* you'll rerank only the top N random documents, as Emir
> > said, that will not produce interesting results probably.
> > If you want to replace the original score, you can take a look at the
> > learning to rank module [1], that would allow you to reassign a
> > new score to the top N documents returned by your query and then reorder
> > them based on that (ignoring the original score, if you want).
> >
> > Cheers,
> > Diego
> >
> > [1] https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank
> >
> > From: solr-user@lucene.apache.org At: 09/21/17 08:49:13
> > To: solr-user@lucene.apache.org
> > Subject: Re: Rescoring from 0 - full
> >
> > Hi Dariusz,
> > You could use fq for filtering (can disable caching to avoid polluting
> > filter cache) and q=*:*. That way you'll get score=1 for all doc and can
> > rerank. The issue with this approach is that you rerank top N and without
> > score they wouldn't be ordered so it is no-go.
> > What you could do (did not try) in rescoring divide by score (not sure
> > if can access calculated but could calculate) to eliminate score.
> >
> > HTH,
> > Emir
> >
> >> On 20 Sep 2017, at 21:38, Dariusz Wojtas wrote:
> >>
> >> Hi,
> >> When I use boosting fuctionality, it is always about adding or
> >> multiplicating the score calculated in the 'q' param.
> >> I mau use function queries inside 'q', but this may hit performance on
> >> calling multiple nested functions.
> >> I thaught that 'rerank' could help, but it is still about changing the
> >> original score, not full calculation.
> >>
> >> How can take full control on score in rerank? Is it possible?
> >>
> >> Best regards,
> >> Dariusz Wojtas
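For what it's worth, the efi.* values are supplied per request inside the rq parameter; a sketch using the model and feature names from the JSON above (query text and efi value are illustrative):

  q=some+query&rq={!ltr model=myModel reRankDocs=100 efi.personalId=1234}&fl=id,score,[features]

As far as I know, the weights of a LinearModel are part of the uploaded model definition rather than something efi can override, so per-request weights would mean uploading several model variants (or writing a custom LTRScoringModel); someone closer to the LTR module may know a better way.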
Solr not preserving milliseconds precision for zero milliseconds
Hello Everyone,

Say I have a document like the one below:

{
  "id":"test",
  "startTime":"2013-02-10T18:36:07.000Z"
}

I add this document to the Solr index using the admin UI and the "update" request handler. It gets added successfully, but when I retrieve the document back by "id" I get the following:

{
  "id":"test",
  "startTime":"2013-02-10T18:36:07Z",
  "_version_":1580456021738913792
}

As you can see, the milliseconds precision in the date field "startTime" is lost. Precision is preserved for non-zero milliseconds but it is being lost for zero values. The field type of the "startTime" field has docValues="true" and precisionStep="0".

Does anyone know how I can preserve milliseconds even if they are zero? Or is it not possible at all?

Thanks,
Pratik
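For context, a date field with those attributes would typically be declared along these lines in a 6.x schema (the field type name and the TrieDateField class are assumptions on my part; only docValues and precisionStep come from the message above):

  <fieldType name="tdate" class="solr.TrieDateField" docValues="true" precisionStep="0"/>
  <field name="startTime" type="tdate" indexed="true" stored="true"/>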
Re: Solr test runs: test skipping logic
: I am seeing that in different test runs (e.g., by executing 'ant test' on
: the root folder in 'lucene-solr') a different subset of tests are skipped.
: Where can I find more about it? I am trying to create parity between test
: successes before and after my changes and this is causing confusion.

The test randomization logic creates an arbitrary "master seed" that is assigned by ant. This master seed is then used to generate some randomized default properties for the forked JVMs (default timezone, default Locale, default charset, etc.). Each test class run in a forked JVM then gets its own Random seed (generated from the master seed as well), which the Solr test-framework uses to randomize some more things that are specific to the Solr test-framework.

In some cases, tests have @Assume or assumeThat(...) logic if we know that certain tests are completely incompatible with certain randomized aspects of the environment -- for example: some tests won't bother to run if the randomized Locale uses "tr", because of external third-party dependencies that break with this Locale (due to uppercase/lowercase behavior).

This is most likely the reason you are seeing a different "set" of tests run at different times. But if you want true parity between test runs, use the same master seed -- which is printed at the beginning of every "ant test" run, as well as any time a test fails, and can be overridden on the ant command line for future runs. Run "ant test-help" for the specifics.

-Hoss
http://www.lucidworks.com/
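Concretely, pinning the master seed looks something like this (the seed value and test class here are just placeholders):

  # run the whole suite with a fixed master seed
  ant test -Dtests.seed=DEADBEEF

  # re-run a single test class under the same seed (e.g. one printed by a failing run)
  ant test -Dtests.seed=DEADBEEF -Dtestcase=TestRecovery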
Re: Solr not preserving milliseconds precision for zero milliseconds
: > "startTime":"2013-02-10T18:36:07.000Z" ... : handler. It gets added successfully but when I retrieve this document back : using "id" I get following. ... : > "startTime":"2013-02-10T18:36:07Z", ... : As you can see, the milliseconds precision in date field "startTime" is : lost. Precision is preserved for non-zero milliseconds but it's being lost : for zero values. The field type of "startTime" field is as follows. ... : Does anyone know how I can preserve milliseconds even if its zero? Or is it : not possible at all? ms precision is being preserved -- but as you mentioned, the fractional seconds you indexed are "0" therefore they are not needed/preserved when writing the response to maintain ms precision. This is the correct formatting as specified in the specification for the time format that Solr follows... https://lucene.apache.org/solr/guide/working-with-dates.html https://www.w3.org/TR/xmlschema-2/#dateTime >>> 3.2.7.2 Canonical representation >>> ... >>> The fractional second string, if present, must not end in '0'; -Hoss http://www.lucidworks.com/
Re: mm is not working if you have same term multiple times in query
: I'm using Solr 6.6.0 i have set mm as 100% but when i have the repeated
: search term then mm param is not honoured
: I have 2 docs in index
: Doc1-
: name=lock
: Doc 2-
: name=lock lock
:
: Now when i'm quering the solr with query
: *http://localhost:8983/solr/test2/select?defType=dismax&qf=name&indent=on&mm=100%25&q=lock%20lock&wt=json
: then it is returning both results but it should return only Doc 2 as no of
: frequency is 2 in query while doc1 has frequency of 1 (lock term frequency).

There's a couple of misconceptions here...

first off: "mm" is a property of the "BooleanQuery" object that contains multiple SHOULD clauses -- it has nothing to do with the "frequency" of any clause/term -- if your BooleanQuery contains 2 SHOULD clauses, then mm=2 will require that both clauses match. If the 2 clauses are *identical* then BooleanQuery will actually optimize away one instance and reduce mm to 1.

second: even if BooleanQuery didn't have that optimization -- which was the case until ~6.x -- then your original query would *still* match Doc#1, because each clause (aka sub-query) would be evaluated independently. The BooleanQuery would ask clause #1 "do you match doc#1?" and it would say "yes" -- then the BooleanQuery would ask clause #2 "do you match doc#1?" and it would also say "yes", and so the BooleanQuery would say "i've reached the minimum number of SHOULD clauses i was configured to require for a match, so doc#1 is a match".

If you have a special case situation of wanting to require that a term occurs at least X times -- the only way i can think of off the top of my head to do that would be using the termfreq() function. something like...

q={!frange l=2}termfreq(text,'lock')

https://lucene.apache.org/solr/guide/function-queries.html#termfreq-function
https://lucene.apache.org/solr/guide/other-parsers.html#function-range-query-parser

But i caution that while this might work in the specific example you gave, it's not really a drop-in replacement for how you _thought_ mm should work, so a lot of things you might be trying to do with dismax+mm aren't going to have any sort of corollary here. In general i'm curious as to your broader picture goal, and if there isn't some better solution...

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing with "X", you are assuming "Y" will help you, and you are asking about "Y" without giving more details about the "X" so that we can understand the full issue. Perhaps the best solution doesn't involve "Y" at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341

-Hoss
http://www.lucidworks.com/
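Applied to the example in the question (field "name", term "lock", required frequency 2), the function-range approach would be something like the request below; a sketch only, not a general substitute for mm, and the local-params portion should be URL-encoded in practice:

  http://localhost:8983/solr/test2/select?q={!frange l=2}termfreq(name,'lock')&fl=id,name&wt=json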
solr and machine learning - recommendations?
Now that I have got a big hunk of documents indexed with Solr, I am looking to see whether I can try some machine learning tools to try and extract bibliographic references out of the documents. Anyone got some recommendations about which kits might be good to play with for something like this?
Re: FilterCache size should reduce as index grows?
So for large indexes, there is a chance that a filterCache size of 128 can cause bad GC. And for smaller indexes, it would really not matter that much because, well, the index size is small and probably the whole of it is in the OS cache anyway. So perhaps a default of 64 would be a much saner choice to get the best of both worlds?

On Thu, Oct 5, 2017 at 7:23 AM, Yonik Seeley wrote:
> On Thu, Oct 5, 2017 at 3:20 AM, Toke Eskildsen wrote:
> > On Wed, 2017-10-04 at 21:42 -0700, S G wrote:
> >
> > It seems that the memory limit option maxSizeMB was added in Solr 5.2:
> > https://issues.apache.org/jira/browse/SOLR-7372
> > I am not sure if it works with all caches in Solr, but in my world it
> > is way better to define the caches by memory instead of count.
>
> Yes, that will work with the filterCache, but one needs to change the
> cache type as well (maxSizeMB is only an option on LRUCache, and
> filterCache uses FastLRUCache in the default solrconfig.xml)
>
> -Yonik
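For reference, a memory-bounded filterCache along the lines Yonik describes would look roughly like this in solrconfig.xml (the sizes here are illustrative, not recommendations):

  <!-- LRUCache honours maxSizeMB; FastLRUCache (the stock filterCache class) does not -->
  <filterCache class="solr.LRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0"
               maxSizeMB="64"/>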