Document field as an input for subquery params which contains whitespaces
Hello,

I would like to query data depending on a value in a document, but it only works when the value contains no whitespace.

q=fqdn:"cn=user 1,ou=users"&fl=fqdn,orgnav:[subquery]&orgnav.q={!dismax qf=org_fqdn_hierarchy v=$row.org_fqdn_hierarchy}&orgnav.fq=type:org

The first document looks like

{
  fqdn:"cn=user 1,ou=users",
  org_fqdn_hierarchy:"ou=sales,cn=dep 1"
}

The field org_fqdn_hierarchy is tokenized with PathHierarchyTokenizerFactory, so that all hierarchically related documents can be retrieved.

I tried the following requests, but the orgnav subquery stays empty :(

q=fqdn:"cn=user 1,ou=users"&fl=fqdn,orgnav:[subquery]&orgnav.q={!dismax qf=org_fqdn_hierachy v='$row.org_fqdn_hierachy'}&orgnav.fq=type:org

q=fqdn:"cn=user 1,ou=users"&fl=fqdn,orgnav:[subquery]&orgnav.q={!dismax qf=org_fqdn_hierachy v="$row.org_fqdn_hierachy"}&orgnav.fq=type:org

Is there a trick or parameter to escape it?

Best regards
Johannes
Re: Document field as an input for subquery params which contains whitespaces
SOLVED. The whitespace was not the problem after all. I wanted to see the content of org_fqdn_hierarchy in the response, so I changed the field definition from stored="false" to stored="true". Now my subquery returns the desired results.

Best regards
Johannes
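For reference, a minimal sketch of the working setup; the schema line is illustrative (the field type name is an assumption), the request is the one from the original mail:

  <field name="org_fqdn_hierarchy" type="path_hierarchy" indexed="true" stored="true"/>

  q=fqdn:"cn=user 1,ou=users"
    &fl=fqdn,orgnav:[subquery]
    &orgnav.q={!dismax qf=org_fqdn_hierarchy v=$row.org_fqdn_hierarchy}
    &orgnav.fq=type:org

The [subquery] document transformer substitutes $row.org_fqdn_hierarchy from the parent document as it is returned, so the field has to be retrievable (stored) for the parameter to be filled in at all.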
Permutations of entries in a multivalued field
Hello all,

we are facing the following problem: we use a multivalued string field that contains entries of the kind A/B/C/, where A, B, C are terms. We are now looking for a simple way to also find all permutations of A/B/C, e.g. B/A/C.

As a workaround we added a new field that contains all entries alphabetically sorted and guarantee the same sorting on the user side. However, since this is limited in some ways, is there a simple way to either index the data such that exactly A/B/C and all its permutations are found (using e.g. type=text is not an option, since a term could occur in a different entry of the multivalued field), or to trigger an alphabetical sorting of incoming queries?

Thanks a lot for your feedback, best regards

Johannes
Re: Permutations of entries in a multivalued field
Thanks a lot for these useful hints.

Best, Johannes

On 18.12.2015 20:59, Allison, Timothy B. wrote:

Duh, didn't realize you could set inOrder in Solr. Y, that's the better solution.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, December 18, 2015 2:27 PM
To: solr-user
Subject: Re: Permutations of entries in a multivalued field

The other thing to check is the ComplexPhraseQueryParser, see:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
It uses the Span queries to build up the query...

Best,
Erick

On Fri, Dec 18, 2015 at 11:23 AM, Allison, Timothy B. wrote:

Hi Johannes,

I suspect that Scott's answer would be more efficient than the following, and I may be misunderstanding the problem! This type of search is supported at the Lucene level by a SpanNearQuery with inOrder set to false. So, how do you get a SpanQuery in Solr? You might want to look at the SurroundQueryParser, and I have an alternate (LUCENE-5205/SOLR-5410) here: https://github.com/tballison/lucene-addons.

If you do find an appropriate parser, make sure that your position increment gap is > 0 on your text field definition, and then you'd never incorrectly get a hit across field entries of:
[0] A B
[1] C

Best,
Tim
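To make the ComplexPhrase suggestion concrete, an unordered "phrase" over the three terms would look roughly like this (the field name is made up, and this assumes a Solr version that ships the ComplexPhrase parser):

  q={!complexphrase inOrder=false}entries:"A B C"~2

With inOrder=false the underlying span query accepts the terms in any order (a small slop allows the out-of-order arrangements), and the positionIncrementGap > 0 advice from Tim's mail still applies so that matches cannot cross different values of the multivalued field.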
optimize cache-hit-ratio of filter- and query-result-cache
Hi,

some of my Solr indices have a low cache hit ratio.

1. Does sorting the parts of a single filter query have an impact on the filter-cache and query-result-cache hit ratio?
   Example: fq=field1:(2 OR 3 OR 1) vs. fq=field1:(1 OR 2 OR 3) -> if 1, 2, 3 are randomly ordered

2. Does sorting the parts of the query have an impact on the query-result-cache hit ratio?
   Example: "q=abc&fq=field1:abc&sort=field1 desc&fq=field2:xyz&sort=field2 asc" vs. "q=abc&fq=field1:abc&fq=field2:xyz&sort=field1 desc&sort=field2 asc" -> if the query parts are randomly ordered

Thanks!

Johannes
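One way to avoid cache misses caused by randomly ordered clauses is to canonicalize the filter string on the client before sending it. A minimal sketch (the helper name is made up; in the Solr/Lucene versions of that era, filters that differ only in clause order are generally cached as separate entries):

import java.util.Arrays;
import java.util.stream.Collectors;

public class FilterQueryUtil {
    // Builds field:(v1 OR v2 OR ...) with the values in a fixed (sorted) order,
    // so that logically identical filters always produce the same fq string
    // and therefore hit the same filterCache entry.
    public static String orFilter(String field, String... values) {
        return field + ":(" + Arrays.stream(values)
                                    .sorted()
                                    .collect(Collectors.joining(" OR ")) + ")";
    }
}

// Example: orFilter("field1", "2", "3", "1")  ->  "field1:(1 OR 2 OR 3)"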
Re: optimize cache-hit-ratio of filter- and query-result-cache
Thanks. The statements on http://wiki.apache.org/solr/SolrCaching#showItems are not explicit enough to answer my question.
sort by given order
Hi,

I want to sort my documents by a given order. The order is defined by a list of ids. My current solution is:

list of ids: 15, 5, 1, 10, 3

query: q=*:*&fq=(id:((15) OR (5) OR (1) OR (10) OR (3)))&sort=query($idqsort) desc,id asc&idqsort=id:((15^5) OR (5^4) OR (1^3) OR (10^2) OR (3^1))&start=0&rows=5

Do you know another solution for sorting by a list of ids?

Thanks!

Johannes
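A small helper that generates the two parameters above from an arbitrary id list, a sketch only (class and method names are made up), in case it is useful:

import java.util.Arrays;
import java.util.List;

public class OrderedIdQuery {
    // fq restricting the result to the given ids
    public static String idFilter(List<String> ids) {
        return "id:(" + String.join(" OR ", ids) + ")";
    }

    // boosted query for sort=query($idqsort) desc: the first id gets the
    // highest boost, the last one the lowest
    public static String idSortQuery(List<String> ids) {
        StringBuilder sb = new StringBuilder("id:(");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) sb.append(" OR ");
            sb.append(ids.get(i)).append('^').append(ids.size() - i);
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("15", "5", "1", "10", "3");
        System.out.println(idFilter(ids));    // id:(15 OR 5 OR 1 OR 10 OR 3)
        System.out.println(idSortQuery(ids)); // id:(15^5 OR 5^4 OR 1^3 OR 10^2 OR 3^1)
    }
}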
Custom Query Implementation?
Hi,

I am entirely new to the world of Solr programming and I have the following questions:

In addition to our regular searches we need to implement a specialised form of range search and ranking. What I mean by this is that users can search for one or more numeric ranges like "17:85,205:303" etc. (these are range-begin/range-end pairs). A small percentage of our records, maybe less than 10%, will have similar ranges, again one or more, stored in a Solr field. We need to apply a custom scoring function and filter the matches, too. (Not all ranges match, and scores will typically differ greatly.) Where are all the places where we have to insert code?

Also, any tips on how to develop and debug this? I am using the Linux command line and Emacs. I am linking against Solr by using "javac -cp solr-core-4.2.1.jar:. my_code.java". It is probably not relevant, but I might mention it anyway: we are using Solr as a part of VuFind.

I'd be grateful for any suggestions. Thank you!

--Johannes

--
Dr. Johannes Ruscheinski
Universitätsbibliothek Tübingen - IT-Abteilung -
Wilhelmstr. 32, 72074 Tübingen
Tel: +49 7071 29-72820  FAX: +49 7071 29-5069
Email: johannes.ruschein...@uni-tuebingen.de
Custom Scoring Question
Hi,

I am entirely new to the world of Solr programming and I have the following questions:

In addition to our regular searches we need to implement a specialised form of range search and ranking. We have implemented a CustomScoreQuery and a CustomScoreProvider. I now have a few questions:

1) Where and how do we let Solr know that it should use this? (I presume that will be some XML config file.)
2) How do we "tag" our special queries to switch to the custom implementation?

Furthermore, only a small subset of our data will have the database field relevant to this type of query set. A problem that I can see is that we want Solr to prefilter, or suppress, any records that have no data in this field and, if the field is non-empty, to call a function provided by us to let it know whether to include said record in the result set or not.

Also, any tips on how to develop and debug this? I am using the Linux command line and Emacs. I am linking against Solr by using "javac -cp solr-core-4.2.1.jar:. my_code.java". It is probably not relevant, but I might mention it anyway: we are using Solr as a part of VuFind.

I'd be grateful for any suggestions.

--Johannes
Re: Antwort: Custom Scoring Question
Hi Stephan,

On 29/04/15 14:37, Stephan Schubert wrote:
> Hi Johannes,
>
> did you have a look on Solr edismax and function queries?
> https://cwiki.apache.org/confluence/display/solr/Function+Queries

Just read it.

> If I got you right, for the case you just want to ignore fields which have
> not a value set on a specific field you can filter them out with a filter
> query.

Yes, that is a part of our problem.

> Example:
>
> fieldname: mycustomfield
>
> filterquery to ignore docs with mycustomfield not set: +mycustomfield:*

That seems really useful to us and solves one part of our problem, thanks. We still need to figure out how to invoke the custom scorer that we wrote in Java. Also, we would like the search to invoke another custom function that filters out results that are not relevant to a given query.

--Johannes
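For the remaining question of how to invoke the custom scorer, one common wiring (a sketch only; the plugin class and parser name below are hypothetical, not something confirmed in this thread) is to wrap the CustomScoreQuery in a custom QParserPlugin, register it in solrconfig.xml, and select it per query via local params:

  <!-- solrconfig.xml: register the (hypothetical) parser -->
  <queryParser name="rangescore" class="com.example.RangeScoreQParserPlugin"/>

  Query time:
  q={!rangescore ranges="17:85,205:303"}...
  fq=mycustomfield:*        (the filter from Stephan's reply, so only documents
                             that actually have the field are scored)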
Limit Results By Score?
Hi,

We have implemented a custom scoring function and also need to limit the results by score. How could we go about that? Alternatively, can we suppress the results early using some kind of custom filter?

--Johannes
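One approach that is often suggested for this (not something confirmed in this thread, and it assumes that absolute score values are meaningful for your custom function) is to filter on the score of the main query with the frange parser:

  q=<your query>
  fq={!frange l=5.0}query($q)

This keeps only documents whose score for $q is at least 5.0; since Lucene scores are not normalized, the cut-off value usually has to be tuned per query type.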
Multilingual Solr
Hi all,

we are currently in search of a solution for switching between different languages in the query results while keeping the possibility to perform a search in several languages in parallel. The overall aim would be a constant field name and an additional Solr parameter "lang=XX_YY" that allows the results to be returned in the chosen language while searches are applied to all languages. Setting up several cores to obtain a generic field name is not an option.

Does anyone know of a clean way to achieve this, particularly routing content indexed to a generic field (e.g. title) to a "background field" (e.g. title_en, title_fr) on the fly and retrieving it from there depending on the language chosen?

Background: So far, we have investigated the multi-language field approach offered by Trey Grainger in the code examples for "Solr in Action" (https://github.com/treygrainger/solr-in-action.git, chapter 14), an extension to the ordinary TextField that allows a generic field name to be used; the language is encoded at the beginning of the field content, and appropriate index and query analyzers are associated with dummy fields in schema.xml. If there is a way to store data in these dummy fields and additionally the lang parameter is added, we might be done.

Thanks a lot, best regards

Johannes
Re: Multilingual Solr
Hi Alessandro, hi Alexandre,

Thanks a lot for your reply and your considerations and hints.

We use a web front end that comes bundled with Solr. It currently uses a single-core approach. We would like to stick to the original setup as closely as possible to avoid administrative overhead and to not prevent the possible use of several cores in a different context in the future. This is the reason why we would like to hide the language fields completely from the front end, apart from specifying an additional language parameter. Language detection at indexing time is currently not an issue for us, as we get the input in a standardized format and can thus determine the language beforehand.

https://github.com/treygrainger/solr-in-action/blob/master/example-docs/ch14/cores/multi-language-field/conf/schema.xml shows an example of how the multiText field type makes use of language-specific field types to specify the analyzers that are being used. The core issue for us (pun intended ;-)) is to find out whether it is possible to extend this approach to only return the selected language(s), i.e. to transparently add something like nested documents.

Best regards

Johannes

On 06.06.2016 10:10, Alessandro Benedetti wrote:

Hi Johannes,

nothing out of the box unfortunately, but it could be a nice idea and contribution. If having a multi-core setup is not an option (out of curiosity, can I ask why?) you could proceed in this way:

1) You define in the schema N field variations per field you are interested in, where N is the number of languages you can support. Given for example the text field, you define:
   text     - field not indexed, only stored
   text_en  - indexed
   text_fr  - indexed
   text_it  - indexed
   ...

2) At indexing time you can develop a custom UpdateRequestProcessor that will identify the language (Solr internal libraries offer support for that) and address the correct text field to index the content. If you want to index translations as well, you need to rely on some third-party libraries to do that.

3) At query time you can address all the fields you want in parallel, with the edismax query parser for example.

4) For rendering the results, I don't have it exactly clear; do you want to:
   a) translate the document content into the language you want: you could develop a custom DocTransformer that takes the language as input and translates, but I don't see that much benefit in that.
   b) return only the documents that originally were of that language: this case is easy, you add an fq at query time to filter only the documents of the language you want (at indexing time you identify the language).
   c) return the original content of the document: this is quite easy, you can store the generic "text" field and always return that.

Let us know for further discussion,
Cheers
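A minimal sketch of what steps 1) and 3) could look like in practice (field and type names are only illustrative):

  <!-- schema.xml -->
  <field name="title"    type="string"  indexed="false" stored="true"/>
  <field name="title_en" type="text_en" indexed="true"  stored="false"/>
  <field name="title_fr" type="text_fr" indexed="true"  stored="false"/>

  <!-- query across all languages, return the generic stored field -->
  q=some query&defType=edismax&qf=title_en title_fr&fl=title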
Sharding vs single index vs separate collection
Hi,

I have a SolrCloud setup with document routing (implicit routing with a router field). As the index is about documents with a publication date, I route by publication year, since in my case most search queries will have a year specified.

Now, what would be the best strategy, performance-wise (i.e. for a huge amount of queries to be processed), for search queries without any year specified?

1 - Is it enough to specify that these queries should go over all routes (i.e. route=year1, year2, ..., yearN)?
2 - Would it be better to add a separate node with a separate index that is not routed (but maybe sharded/split)? If so, how should I deal with such a separate index? Is it possible to add it to my existing SolrCloud? Would it go into a separate collection?

Thanks for your advice.

Johannes
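For what it's worth, with implicit routing the shard selection at query time usually looks something like the sketch below (shard names are placeholders); a query that names no shards is simply fanned out to all of them:

  All shards (no year given):
    q=title:foo

  Only the 2016 and 2017 shards:
    q=title:foo&shards=2016,2017     (logical shard names)
    q=title:foo&_route_=2016         (alternatively, via the routing value)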
SolrCloud indexing -- 2 collections, 2 indexes, sharing the same nodes possible?
I have a working SolrCloud setup with 38 nodes, with a collection spanning these nodes with 2 shards per node, replication factor 2, and a router field.

Now I got some new data for indexing which has the same structure and size as the existing index in the described collection. However, although it has the same structure, the new data to be indexed should not be mixed with the old data.

Do I have to create another 38 new nodes and a new collection and index the new data there, or is there a better / more efficient way to use the existing nodes? Is it possible for the 2 collections to share the 38 nodes without the indexes being mixed?

Thanks for your help.

Johannes
Re: SolrCloud indexing -- 2 collections, 2 indexes, sharing the same nodes possible?
Thank you, Susheel, for the quick response.

So, that means that when I create a new collection, its shards will be newly created on each node, right? Thus, if I have two collections with numShards=38, maxShardsPerNode=2 and replicationFactor=2 on my 38 nodes, then this would result in each node hosting 4 shards (two from each collection).

If this is correct, I have two follow-up questions:

1) As regards naming of the shards: Is using the same naming for the shards o.k. in this constellation? I.e. does it create trouble to have e.g. "Shard001", "Shard002", etc. in collection1 and "Shard001", "Shard002", etc. as well in collection2?

2) Performance: In my current single-collection setup, I have 2 shards per node. After creating the second collection, there will be 4 shards per node. Do I have to edit the RAM-per-node value (raise the -m parameter when starting the node)? In my case, I am quite sure that the collections will never be queried simultaneously. So will the "running but idle" collection slow me down?

Johannes

-----Original Message-----
From: Susheel Kumar [mailto:susheel2...@gmail.com]
Sent: Wednesday, 30 August 2017 17:36
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud indexing -- 2 collections, 2 indexes, sharing the same nodes possible?

Yes, absolutely. You can create as many collections as you need (like you would create tables in the relational world).
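For reference, creating the second collection on the existing nodes is a single Collections API call; the names below are placeholders (and with the implicit router you would pass router.name=implicit plus an explicit shards list instead of numShards):

  http://anynode:8983/solr/admin/collections?action=CREATE
      &name=collection2
      &numShards=38
      &replicationFactor=2
      &maxShardsPerNode=2
      &collection.configName=myconfig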
What does the replication factor parameter in collections api do?
Hi,

I am still quite new to Solr. I have the following setup: a SolrCloud setup with 38 nodes, maxShardsPerNode=2, implicit routing with a routing field, and replicationFactor=2.

Now I want to add replicas. This works fine by first increasing maxShardsPerNode to a higher number and then adding replicas. So far, so good. I can confirm the changes to the maxShardsPerNode parameter and the added replicas in the Admin UI. However, the Solr Admin UI is still showing me a replication factor of 2. I am a little confused about what the replicationFactor parameter actually does in my case:

1) What does that mean? Does Solr make use of all replicas I have, or only of two?
2) Do I need to increase the replicationFactor value as well to really have more replicas available and usable? If so, do I need to restart/reload the collection, newly upload configs to ZooKeeper, or anything alike?
3) Or is replicationFactor just a parameter that is needed for the first start of SolrCloud and can be ignored afterwards?

Thank you very much for your help,

All the best,
Johannes
AW: What does the replication factor parameter in collections api do?
Ok. Thank you for your quick reply. Though I still feel a little uneasy: why is it possible then to alter replicationFactor via MODIFYCOLLECTION in the Collections API? What would be the use case for this parameter at all then?

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, 12 April 2017 19:36
To: solr-user
Subject: Re: What does the replication factor parameter in collections api do?

Really <3>. replicationFactor is used to set up your collection initially; you have to be able to change your topology afterwards, so it's ignored thereafter. Once your replica is added, it's automatically made use of by the collection.
Re: AW: What does the replication factor parameter in collections api do?
Thank you all very much for your answers. That definitely explains it.

All the best,
Johannes

> On 13.04.2017 at 17:03, Erick Erickson wrote:
>
> bq: Why is it possible then to alter replicationFactor via
> MODIFYCOLLECTION in the collections API
>
> Because MODIFYCOLLECTION just changes properties in the collection
> definition generically, and replicationFactor just happens to be one.
> IOW there's no overarching reason.
>
> It would be extra work to dis-allow that one case and possibly
> introduce errors without changing any functionality, so nobody was
> willing to put in the effort.
>
> Best,
> Erick
>
>> On Thu, Apr 13, 2017 at 5:48 AM, Shawn Heisey wrote:
>>> On 4/13/2017 3:22 AM, Johannes Knaus wrote:
>>> Ok. Thank you for your quick reply. Though I still feel a little
>>> uneasy. Why is it possible then to alter replicationFactor via
>>> MODIFYCOLLECTION in the collections API? What would be the use case
>>> for this parameter at all then?
>>
>> If you use a very specific storage method for your indexes -- HDFS --
>> then replicationFactor has meaning beyond initial collection creation,
>> in conjunction with the "autoAddReplicas" feature.
>>
>> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS#RunningSolronHDFS-AutomaticallyAddReplicasinSolrCloud
>>
>> If you are NOT utilizing the very specific HDFS storage engine, then
>> everything you were told applies. With standard storage mechanisms,
>> replicationFactor has zero meaning after initial collection creation,
>> and changing the value will have no effect.
>>
>> Thanks,
>> Shawn
changed query behavior
Hi,

I have updated my Solr instance from 4.5.1 to 4.7.1. Now my Solr query is failing some tests.

Query: q=*:*&fq=(title:((T&E)))&debug=true

Before the update (debug output):
  rawquerystring: *:*
  querystring: *:*
  parsedquery: MatchAllDocsQuery(*:*)
  parsedquery_toString: *:*
  QParser: LuceneQParser
  filter_queries: (title:((T&E)))
  parsed_filter_queries: +title:t&e +title:t +title:e
  ...

After the update:
  rawquerystring: *:*
  querystring: *:*
  parsedquery: MatchAllDocsQuery(*:*)
  parsedquery_toString: *:*
  QParser: LuceneQParser
  filter_queries: (title:((T&E)))
  parsed_filter_queries: +((title:t&e title:t)/no_coord) +title:e
  ...

Before the update the query delivered only one result; now it delivers three results. Do you have any idea why the parsed filter query is "+((title:t&e title:t)/no_coord) +title:e" instead of "+title:t&e +title:t +title:e"?

"title"-field definition (the XML markup did not survive the mail formatting; the surviving attributes are listed):
  fieldType: positionIncrementGap="100" omitNorms="true"
  index analyzer:
    char filter: mapping="mapping.txt"
    WordDelimiterFilter: generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="1" preserveOriginal="1" stemEnglishPossessive="0"
  query analyzer:
    char filter: mapping="mapping.txt"
    synonym filter: synonyms="synonyms.txt" ignoreCase="true" expand="false"
    WordDelimiterFilter: generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="1"

The default query operator is AND.

Thanks!
Johannes
Bug within the solr query parser (version 4.7.1)
Hi,

I have updated my Solr instance from 4.5.1 to 4.7.1. Now the parsed query does not seem to be correct.

Query: q=*:*&fq=title:T&E&debug=true

Before the update the parsed filter query is "+title:t&e +title:t +title:e". After the update the parsed filter query is "+((title:t&e title:t)/no_coord) +title:e". It seems like a bug within the query parser. I have also validated the filter query with the analysis component; the result there was "+title:t&e +title:t +title:e". The behavior is the same for all special characters that split words into two parts.

I use the following WordDelimiterFilter on the query side:
  generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="1"

Thanks.
Johannes

Additional information:

Debug before the update:
  rawquerystring: *:*
  querystring: *:*
  parsedquery: MatchAllDocsQuery(*:*)
  parsedquery_toString: *:*
  QParser: LuceneQParser
  filter_queries: (title:((T&E)))
  parsed_filter_queries: +title:t&e +title:t +title:e
  ...

Debug after the update:
  rawquerystring: *:*
  querystring: *:*
  parsedquery: MatchAllDocsQuery(*:*)
  parsedquery_toString: *:*
  QParser: LuceneQParser
  filter_queries: (title:((T&E)))
  parsed_filter_queries: +((title:t&e title:t)/no_coord) +title:e
  ...

"title"-field definition (the XML markup did not survive the mail formatting; the surviving attributes are listed):
  fieldType: positionIncrementGap="100" omitNorms="true"
  index analyzer: char filter mapping="mapping.txt"; WordDelimiterFilter generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="1" preserveOriginal="1" stemEnglishPossessive="0"
  query analyzer: char filter mapping="mapping.txt"; synonym filter synonyms="synonyms.txt" ignoreCase="true" expand="false"; WordDelimiterFilter generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="1"
FilteredQuery with TermsFilter swallowing results after upgrade to solr 4.8
Hi,

I am in the process of upgrading an extension I made to QueryComponent from Solr 3.4 to Solr 4.8. I am wrapping the query into a FilteredQuery together with a TermsFilter that encapsulates a lot of terms for up to two fields (potentially tens of thousands, only two in my simple test case). My extension worked fine in Solr 3.4 and I have used it for years.

After upgrading to Solr 4.8 and compiling the extension against the new source (the TermsFilter API changed a little bit in how you pass in the terms), I am no longer getting any records back when running a query. The same query not involving the filter returns the expected results. A semantically equivalent query using a lot of OR clauses in the fq query parameter works fine, but is about 10 times slower, so I would really like to get the TermsFilter to work.

I printed out the Query in Solr 3.4 and in Solr 4.8 and they differ (unfortunately I do not know how to read these lines):

solr 3.4:
filtered(+(bedbathbeyond:portlandia | title_pst:portlandia^1.5 | license_plate:portlandia | title_tst:portlandia^2.0 | description_pst:portlandia^0.8 | description_tst:portlandia | phone_number:portlandia | reference_number:portlandia))->org.apache.lucene.search.TermsFilter@66c21442

solr 4.8:
filtered(+(+(reference_number:portlandia | title_tst:portlandia^2.0 | license_plate:portlandia | phone_number:portlandia | bedbathbeyond:portlandia | title_pst:portlandia^1.5 | description_tst:portlandia | description_pst:portlandia^0.8)))->property_group_id:984678480 property_id:984678954

(The query info for the latter line was:
INFO: [property_test] webapp=/solr path=/select params={fl=*+score&start=0&q=portlandia&qf=title_pst^1.5+title_tst^2+description_pst^0.8+description_tst+bedbathbeyond+phone_number+reference_number+license_plate&properties=984678954&wt=ruby&groups=984678480&fq=type:Property&fq=visibility_s:visible&rows=25&defType=edismax} hits=0 status=0 QTime=5 )

I attached a copy of my source code and marked the changes I made to the code of QueryComponent with comments - maybe there is something obviously wrong. Any help or pointers are appreciated; also please let me know if I should rather write to the dev list than the users list.

Thanks,
Johannes
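A rough sketch of how a TermsFilter is typically built against the Lucene 4.8 API (the complete term list is passed up front); whether this matches the extension's actual code is an assumption, and note that for Trie-based numeric fields the indexed terms are not the plain strings, which is a common cause of unexpectedly empty results:

import java.util.Arrays;

import org.apache.lucene.index.Term;
import org.apache.lucene.queries.TermsFilter;
import org.apache.lucene.search.FilteredQuery;
import org.apache.lucene.search.Query;

public class TermsFilterSketch {
    // Terms for different fields inside one TermsFilter are OR'ed together.
    public static Query restrict(Query mainQuery) {
        TermsFilter filter = new TermsFilter(Arrays.asList(
                new Term("property_group_id", "984678480"),
                new Term("property_id", "984678954")));
        return new FilteredQuery(mainQuery, filter);
    }
}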
default query operator ignored by edismax query parser
Hi,

I have defined the following edismax request handler defaults (the XML markup did not survive the mail formatting; parameter names for the unlabeled values are inferred):

  <lst name="defaults">
    <str name="mm">100%</str>
    <str name="defType">edismax</str>
    <float name="tie">0.01</float>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
    <str name="q.op">AND</str>
    <str name="qf">field1^2.0 field2</str>
    <int name="rows">10</int>
    <str name="fl">*</str>
  </lst>

My search query looks like: q=(word1 word2) OR (word3 word4)

Since I specified AND as the default query operator, the query should match documents by ((word1 AND word2) OR (word3 AND word4)), but it matches documents by ((word1 OR word2) OR (word3 OR word4)). Could anyone explain this behaviour?

Thanks!
Johannes

P.S. The query q=(word1 word2) matches all documents by (word1 AND word2).
Re: default query operator ignored by edismax query parser
Thanks Shawn! In this case I will use operators everywhere.

Johannes

On 25.06.2014 15:09, Shawn Heisey wrote:

On 6/25/2014 1:05 AM, Johannes Siegert wrote:
> My search query looks like: q=(word1 word2) OR (word3 word4)
>
> Since I specified AND as default query operator, the query should match
> documents by ((word1 AND word2) OR (word3 AND word4)) but the query
> matches documents by ((word1 OR word2) OR (word3 OR word4)).
>
> Could anyone explain the behaviour?

I believe that you are running into this bug:
https://issues.apache.org/jira/browse/SOLR-2649

It's a very old bug, coming up on three years. The workaround is to not use boolean operators at all, or to use operators EVERYWHERE so that your intent is explicitly described. It is not much of a workaround, but it does work.

Thanks,
Shawn
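Applied to the query above, the "operators everywhere" workaround simply means spelling out every implicit AND:

  q=(word1 AND word2) OR (word3 AND word4)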
wrong docFreq while executing query based on uniqueKey-field
Hi,

My Solr index (version 4.7.2) has an id field which is declared as the uniqueKey:

  <uniqueKey>id</uniqueKey>

The index is updated once per hour. I use the following query to retrieve some documents: "q=id:2^2 id:1^1". I would expect document(2) to always come before document(1), but after many index updates document(1) is ranked before document(2). With debug=true I could see the problem: document(1) has docFreq=2, while document(2) has docFreq=1.

How can the docFreq of the uniqueKey field be higher than 1? Could anyone explain this behavior to me?

Thanks!
Johannes
NGramTokenizer influence to length normalization?
Hi,

does the NGramTokenizer have an influence on length normalization?

Thanks.
Johannes
high memory usage with small data set
Hi,

we are using Apache Solr Cloud in a production environment. When the maximum heap space is reached, the Solr access time slows down because of the garbage collector running for a small amount of time.

We use the following configuration:

- Apache Tomcat as webserver to run the Solr web application
- 13 indices with about 150 entries (300 MB)
- 5 servers with one replica per index (5 GB max heap space)
- All indices have the following caches:
  - maximum document-cache size is 4096 entries; all other indices have between 64 and 1536 entries
  - maximum query-cache size is 1024 entries; all other indices have between 64 and 768
  - maximum filter-cache size is 1536 entries; all other indices have between 64 and 1024
- the directory-factory implementation is NRTCachingDirectoryFactory
- the index is updated once per hour (no auto commit)
- ca. 5000 requests per hour per server
- large filter queries (up to 15000 bytes and 1500 boolean operations)
- many facet queries (30%)

Behaviour: We started with 512 MB heap space. Over several days the heap-space usage grew until the 5 GB were reached. At this moment the described problem occurs. From this time on the heap-space usage is between 50 and 90 percent. No OutOfMemoryException occurs.

Questions:

1. Why does Solr use 5 GB of RAM with this small amount of data?
2. Which impact do the large filter queries have on RAM usage?

Thanks!

Johannes Siegert
Re: high memory usage with small data set
Hi Erick,

thanks for your reply.

What exactly do you mean by "Do your used entries in your caches increase in parallel?"? I update the indices every hour and commit the changes, so a new searcher with empty or autowarmed caches should be created and the old one should be removed.

Johannes

On 30.01.2014 15:08, Erick Erickson wrote:

Do your used entries in your caches increase in parallel? This would be the case if you aren't updating your index and would explain it.

BTW, take a look at your cache statistics (from the admin page) and look at the cache hit ratios. If they are very small (and my guess is that with 1,500 boolean operations, you aren't getting significant re-use) then you're just wasting space, try the cache=false option.

Also, how are you measuring memory? It's sometimes confusing that virtual memory can be included, see:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best,
Erick
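For reference, the cache=false option Erick mentions is a local parameter on the filter query itself, e.g. (field name illustrative):

  fq={!cache=false}field1:(... the 1500 OR'ed clauses ...)

This evaluates the filter without storing it in the filterCache, which avoids filling the cache with entries that are unlikely to be reused.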
solr-query with NOT and OR operator
Hi,

my Solr request contains the following filter query: fq=((-(field1:value1)))+OR+(field2:value2).

I expect Solr to deliver documents matching ((-(field1:value1))) and documents matching (field2:value2), but Solr delivers only documents that are the result of (field2:value2). I do receive several documents if I request only ((-(field1:value1))).

Thanks!
Johannes
Re: solr-query with NOT and OR operator
Hi Jack,

thanks! fq=((*:* -(field1:value1)))+OR+(field2:value2) is the solution.

Johannes

On 11.02.2014 17:22, Jack Krupansky wrote:

With so many parentheses in there, I wonder what you are really trying to do. Try expressing your query in simple English first so that we can understand your goal.

But generally, a purely negative nested query must have a *:* term to apply the exclusion against:

fq=((*:* -(field1:value1)))+OR+(field2:value2)

-- Jack Krupansky

--
Johannes Siegert, Marktjagd GmbH
Replication of a corrupt master index
Hi, If I have a master/slave setup and the master index gets corrupted, will the slaves realize they should not replicate from the master anymore, since the master does not have a newer index version? I'm using Solr version 4.2.1. Regards, Johannes
AW: Replication of a corrupt master index
Thanks for your response, Erick.

Do you think it is possible to corrupt an index merely with HTTP requests? I've been using the aforementioned m/s setup for years now and have never seen a master failure. I'm trying to think of scenarios where this setup (1 master, 4 slaves) might have a total outage. The master runs on an h/a cluster.

Regards,
Johannes

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, 2 December 2014 15:54
To: solr-user@lucene.apache.org
Subject: Re: Replication of a corrupt master index

No. The master is the master and will always stay the master unless you change it. This is one of the reasons I really like to keep the original source around in case I ever have this problem.

Best,
Erick
looking for working example defType=term
Hi,

can anyone provide a working example (solrconfig.xml, schema.xml) using the TermQParserPlugin? I always get a NullPointerException on startup:

8920 [searcherExecutor-4-thread-1] ERROR org.apache.solr.core.SolrCore - java.lang.NullPointerException
    at org.apache.solr.search.TermQParserPlugin$1.parse(TermQParserPlugin.java:55)
    at org.apache.solr.search.QParser.getQuery(QParser.java:142)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:142)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:187)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
    at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:64)
    at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1693)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

solrconfig.xml (request handler defaults; the XML markup did not survive the mail formatting):
  echoParams = explicit
  defType = term
  rows = 10
  df = id

Thanks,
Johannes
FW: looking for working example defType=term
Well, I couldn't get it to work, but maybe that's because I'm not a Solr expert. What I'm trying to do is: I have an index with only one indexed field. This field is an id, so I don't want the standard query parser to try to break it up into tokens.

On the client side I use SolrJ like this:

SolrQuery solrQuery = new SolrQuery().setQuery("");
QueryResponse queryResponse = getSolrServer().query(solrQuery);

I'd like to configure the TermQParserPlugin on the server side to minimize my queries.

Johannes

-----Original Message-----
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: Monday, 12 August 2013 17:10
To: Johannes Elsinghorst
Subject: Re: looking for working example defType=term

How are you using the term query parser? The term query parser requires a field to be specified. I use it this way:

    q=*:*&fq={!term f=category}electronics

The "term" query parser would never make sense as a defType query parser, I don't think (you have to set the field through local params).

    Erik
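Following Erik's hint, a minimal SolrJ sketch of the same idea (field name and id value are placeholders); the term parser is applied per filter query via local params rather than as defType:

import org.apache.solr.client.solrj.SolrQuery;

public class TermFilterQueryExample {
    public static SolrQuery byId(String id) {
        SolrQuery q = new SolrQuery("*:*");
        // {!term f=id} bypasses query parsing, so the raw id needs no escaping
        q.addFilterQuery("{!term f=id}" + id);
        return q;
    }
}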
Re: Multi CPU Cores
Did you try to submit multiple search requests in parallel? The apache ab tool is a great tool to simulate simultaneous load (using -n and -c).

Johannes

On Oct 15, 2011, at 7:32 PM, Rob Brown wrote:

> Hi,
>
> I'm running Solr on a machine with 16 CPU cores, yet watching "top"
> shows that java is only apparently using 1 and maxing it out.
>
> Is there anything that can be done to take advantage of more CPU cores?
>
> Solr 3.4 under Tomcat
>
> [root@solr01 ~]# java -version
> java version "1.6.0_20"
> OpenJDK Runtime Environment (IcedTea6 1.9.8) (rhel-1.22.1.9.8.el5_6-x86_64)
> OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
>
> top - 14:36:18 up 22 days, 21:54, 4 users, load average: 1.89, 1.24, 1.08
> Tasks: 317 total, 1 running, 315 sleeping, 0 stopped, 1 zombie
> Cpu0  :  0.0%us, 0.0%sy, 0.0%ni, 99.6%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu1  :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu2  :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu3  :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu4  :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu5  :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu6  : 99.6%us, 0.4%sy, 0.0%ni,  0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu7  :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu8  :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu9  :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu10 :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu11 :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu12 :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu13 :  0.7%us, 0.0%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu14 :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu15 :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem:  132088928k total, 23760584k used, 108328344k free, 318228k buffers
> Swap:  25920868k total, 0k used, 25920868k free, 18371128k cached
>
>   PID USER   PR NI  VIRT  RES  SHR S  %CPU %MEM    TIME+ COMMAND
>  4466 tomcat 20  0 31.2g 4.0g 171m S 101.0  3.2  2909:38 java
>  6495 root   15  0 42416 3892 1740 S   0.4  0.0   9:34.71 openvpn
> 11456 root   16  0 12892 1312  836 R   0.4  0.0   0:00.08 top
>     1 root   15  0 10368  632  536 S   0.0  0.0   0:04.69 init
Re: Multi CPU Cores
Try using -useParallelGc as a VM option.

Johannes

On Oct 16, 2011, at 7:51 AM, Ken Krugler wrote:

> On Oct 16, 2011, at 1:44pm, Rob Brown wrote:
>
>> Looks like I checked the load during a quiet period, ab -n 1 -c 1000
>> saw a decent 40% load on each core.
>>
>> Still a little confused as to why 1 core stays at 100% constantly - even
>> during the quiet periods?
>
> Could be background GC, depending on what you've got your JVM configured to use.
>
> Though that shouldn't stay at 100% for very long.
>
> -- Ken
Re: Multi CPU Cores
We use the following in production:

java -server -XX:+UseParallelGC -XX:+AggressiveOpts -XX:+DisableExplicitGC -Xms3G -Xmx40G -Djetty.port=<port> -Dsolr.solr.home=<solr-home> -jar start.jar

More information:
http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html

Johannes
Re: Multi CPU Cores
Yes, same thing. This was for the Jetty servlet container, not Tomcat. I would refer to the Tomcat documentation on how to modify/configure the Java runtime environment (JRE) arguments for your running instance.

Johannes

On Oct 17, 2011, at 4:01 AM, Robert Brown wrote:

> Where exactly do you set this up? We're running Solr 3.4 under Tomcat, OpenJDK 1.6.0.20
>
> btw, is the JRE just a different name for the VM? Apologies for such a newbie Java question.
Re: Hierarchical faceting in UI
Another way is to store the original hierarchy in an SQL database (in the form: id, parent_id, name, level) and, in the Lucene index, store the complete hierarchy from the root level down to the leaf node for each document in one field, using the ids of the SQL database, e.g. "1 13 32 42 23 12". That way you can get documents at any level of the hierarchy. You can use the SQL database to dynamically expand the tree by building facet queries that fetch the document collections of child nodes.

Johannes

2012/1/23 :

> On Mon, 23 Jan 2012 14:33:00 -0800 (PST), Yuhao wrote:
>> Programmatically, something like this might work: for each facet field,
>> add another hidden field that identifies its parent. Then, program
>> additional logic in the UI to show only the facet terms at the currently
>> selected level. For example, if one filters on "cat:electronics", the new
>> UI logic would apply the additional filter "cat_parent:electronics". Can
>> this be done? Would it be a lot of work? Is there a better way?
>
> Yes. This is how I do it. It's not a lot of work: simply represent your
> hierarchy as parent/child relations in the document fields and in your UI
> drill down by issuing new faceted searches. Use the current facet (tree
> level) as the parent: in the next query. It's much easier than other
> suggestions for this. Not in my opinion, there isn't a better way. This is
> the simplest to implement and understand.
>
>> By the way, Flamenco (another faceted browser) has built-in support for
>> hierarchies, and it has worked well for my data in this aspect (but less
>> well than Solr in others). I'm looking for the same kind of hierarchical
>> UI feature in Solr.
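A small sketch of the query side of this approach (field name and ids are made up): if each document stores its full ancestor path in a multivalued hierarchy_ids field, everything below node 13 is one filter away, and faceting on the same field shows how the results distribute over its descendants:

  q=*:*&fq=hierarchy_ids:13&facet=true&facet.field=hierarchy_ids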
AW: Preferred query notation for alternative field values
Thanks for the hint. You are right: both queries are identical after parsing.

>>> -----Original Message-----
>>> From: Upayavira [mailto:u...@odoko.co.uk]
>>> Sent: Wednesday, 28 November 2012 12:04
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Preferred query notation for alternative field values
>>>
>>> Use debugQuery=true to see the format of the parsed query.
>>>
>>> Solr will parse the query that you provide into Lucene Query objects, which are
>>> then used to execute the query. The parsed query info provided by
>>> debugQuery=true is basically these Query objects converted back into a string
>>> representation, showing exactly what the query was parsed into.
>>>
>>> I bet you they are both parsed to more or less the same thing, and thus no real
>>> impact on query time.
>>>
>>> Upayavira
>>>
>>> On Wed, Nov 28, 2012, at 10:54 AM, Charra, Johannes wrote:
>>> > Hi all,
>>> >
>>> > Is there any reason to prefer a query
>>> >
>>> > field:value1 OR field:value2 OR field:value3 OR field:value4
>>> >
>>> > over
>>> >
>>> > field:(value1 OR value2 OR value3 OR value4)
>>> >
>>> > in terms of performance? From what I perceive, there is no difference,
>>> > so I'd prefer the second query for readability reasons.
>>> >
>>> > Regards,
>>> > Johannes
Index-time synonyms and trailing wildcard issue
Hi,

I use Solr 3.6.0 with a synonym filter as the last filter at index time, using a list of stemmed terms. When I do a wildcard search that matches a part of an entry on the synonym list, the synonyms found are used by Solr to generate the search results. I am trying to disable that behaviour, but with no success.

Example:

Stemmed synonyms: apfelsin, orang
Search term: apfel*
Matches: Apfelkuchen, Apfelsaft, Apfelsine... (good, I want these matches)
         Orange (bad, I don't want this match)

My questions are:

- Why does the synonym filter react to a wildcard query? It is not a multiterm-aware component (see http://lucene.apache.org/solr/api-3_6_1/org/apache/solr/analysis/MultiTermAwareComponent.html).
- How can I disable this behaviour, so that "Orange" is no longer returned by the query for "apfel*"?

Regards,
Johannes
Re: Index-time synonyms and trailing wildcard issue
Hello Jack,

Thanks for your answer. It helped me gain a deeper understanding of what happens at index time, and I found a solution myself: it seems that putting the synonym filter in both filter chains (index and query), setting expand="false", and putting the desired synonym first in the row does the trick.

Synonyms line (reversed order!): orange, apfelsine

All documents containing "apfelsine" are now mapped to "orange", so there are no more documents containing "apfelsine" that would match a wildcard query for "apfel*". ("Apfelsine" is a true synonym for "Orange" in German, meaning "Chinese apple". "Apfel" = apple, which shouldn't match oranges.)

Problem solved, thanks again for the help!

Johannes Rodenwald

----- Original Message -----
From: "Jack Krupansky"
To: solr-user@lucene.apache.org
Sent: Wednesday, 13 February 2013 17:17:40
Subject: Re: Index-time synonyms and trailing wildcard issue

By doing synonyms at index time, you cause "apfelsin" to be added to documents that contain only "orang", so of course documents that previously only contained "orang" will now match for "apfelsin" or any term query that matches "apfelsin", such as a wildcard. At query time, Lucene cannot tell whether your original document contained "apfelsin" or if "apfelsin" was added when the document was indexed due to an index-time synonym.

Solution: Either disable index-time synonyms, or have a parallel field (via copyField) that does not have the index-time synonyms.

But... perhaps you should clarify what you really intend to happen with these pseudo-synonyms.

-- Jack Krupansky
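A sketch of the resulting configuration (the analyzer chains are abbreviated; the surrounding tokenizer and filters are not shown in the thread):

  # synonyms.txt -- canonical term first; expand="false" maps the rest onto it
  orange, apfelsine

  <!-- schema.xml: the same synonym filter in both chains -->
  <analyzer type="index">
    ...
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    ...
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>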
Re: Solr Grouping and empty fields
Hi Oussama, If you have only a few distinct, unchanging values in the field that you group upon, you could implement a FilterQuery (query parameter "fq") and add it to the query, allowing all valid values, but not an empty field. For example: fq=my_grouping_string_field:( value_a OR value_b OR value_c OR value_d ) If you use SOLR 4.x, you should be able to group upon an integer field, allowing a range filter (I still work with 3.6, which can only group on string fields, so I didn't test this one): fq=my_grouping_integer_field:[1 TO *] -- Johannes Rodenwald - Ursprüngliche Mail - Von: "Oussama Jilal" An: solr-user@lucene.apache.org Gesendet: Freitag, 22. Februar 2013 12:32:13 Betreff: Solr Grouping and empty fields Hi, I need to group some results in solr based on a field, but I don't want documents having that field empty to be grouped together, does anyone know how to achieve that ? -- Oussama Jilal
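A sketch of how such a filter would combine with the grouping request itself (field and values are illustrative):

q=*:*&group=true&group.field=my_grouping_string_field&fq=my_grouping_string_field:(value_a OR value_b OR value_c OR value_d)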
Update Solr Schema To Store Field
Hi, I am running apache-solr-3.1.0 and would like to change a field attribute from stored="false" to stored="true". I have several hundred cores that have been indexed without storing the field, which is fine, as I would only like to retrieve the value for new data that I plan to index with the updated schema. My question is whether this change affects the query behavior for the existing indexed documents which were loaded with stored="false". Thanks a lot, Johannes
Re: summing facets on a specific field
you can use the StatsComponent http://wiki.apache.org/solr/StatsComponent with stats=true&stats.price=category&stats.facet=category and pull the sum fields from the resulting stats facets. Johannes 2012/2/5 Paul Kapla : > Hi everyone, > I'm pretty new to solr and I'm not sure if this can even be done. Is there > a way to sum a specific field per each item in a facet. For example, you > have an ecommerce site that has the following documents: > > id,category,name,price > 1,books,'solr book', $10.00 > 2,books,'lucene in action', $12.00 > 3.video, 'cool video', $20.00 > > so instead of getting (when faceting on category) > books(2) > video(1) > > I'd like to get: > books ($22) > video ($20) > > Is this something that can be even done? Any feedback would be much > appreciated. -- Dipl.-Ing.(FH) Johannes Goll 211 Curry Ford Lane Gaithersburg, Maryland 20878 USA
Re: summing facets on a specific field
I meant stats=true&stats.field=price&stats.facet=category 2012/2/6 Johannes Goll : > you can use the StatsComponent > > http://wiki.apache.org/solr/StatsComponent > > with stats=true&stats.price=category&stats.facet=category > > and pull the sum fields from the resulting stats facets. > > Johannes > > 2012/2/5 Paul Kapla : >> Hi everyone, >> I'm pretty new to solr and I'm not sure if this can even be done. Is there >> a way to sum a specific field per each item in a facet. For example, you >> have an ecommerce site that has the following documents: >> >> id,category,name,price >> 1,books,'solr book', $10.00 >> 2,books,'lucene in action', $12.00 >> 3.video, 'cool video', $20.00 >> >> so instead of getting (when faceting on category) >> books(2) >> video(1) >> >> I'd like to get: >> books ($22) >> video ($20) >> >> Is this something that can be even done? Any feedback would be much >> appreciated. > > > > -- > Dipl.-Ing.(FH) > Johannes Goll > 211 Curry Ford Lane > Gaithersburg, Maryland 20878 > USA -- Dipl.-Ing.(FH) Johannes Goll 211 Curry Ford Lane Gaithersburg, Maryland 20878 USA
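Put together with the example above, the request and the relevant part of the response would look roughly like this (the response snippet is abbreviated and purely illustrative):

q=*:*&stats=true&stats.field=price&stats.facet=category

<lst name="stats_fields">
  <lst name="price">
    ...
    <lst name="facets">
      <lst name="category">
        <lst name="books">
          <double name="sum">22.0</double>
          ...
        </lst>
        <lst name="video">
          <double name="sum">20.0</double>
          ...
        </lst>
      </lst>
    </lst>
  </lst>
</lst>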
Re: UI
yes, I am using this library and it works perfectly so far. If something does not work you can just modify it http://code.google.com/p/solr-php-client/ Johannes 2012/5/21 Tolga : > Hi, > > Can you recommend a good PHP UI to search? Is SolrPHPClient good?
Solr 1.4.1 stats component count not matching facet count for multi valued field
Hi, I have a facet field called option which may be multi-valued and a weight field which is single-valued. When I use the Solr 1.4.1 stats component with a facet field, i.e. q=*:*&version=2.2&stats=true&stats.field=weight&stats.facet=option, I get conflicting results for the stats count (result: 1) when compared with the faceting counts obtained by q=*:*&version=2.2&facet=true&facet.field=option. I would expect the same count for either method. This happens if multiple values are stored in the option field. It seems that for multiple values only the last entered value is considered in the stats component? What am I doing wrong here? Thanks, Johannes
solrconfig luceneMatchVersion 2.9.3
Hi, our index files have been created using Lucene 2.9.3 and Solr 1.4.1. I am trying to use a patched version of the current trunk (Solr 1.5.0?). The patched version works fine with newly generated index data but not with our existing data: after adjusting the solrconfig.xml - I added the line <luceneMatchVersion>LUCENE_40</luceneMatchVersion> and also tried LUCENE_30 - I am getting the following exception: "java.lang.RuntimeException: org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported in file '_q.fdx': 1 (needs to be between 2 and 2)" When I try to change it to LUCENE_29 or 2.9 or 2.9.3 I am getting "SEVERE: org.apache.solr.common.SolrException: Invalid luceneMatchVersion '2.9', valid values are: [LUCENE_30, LUCENE_31, LUCENE_40, LUCENE_CURRENT] or a string in format 'V.V'" Do you know a way to make this work with Lucene version 2.9.3? Thanks, Johannes
Re: solrconfig luceneMatchVersion 2.9.3
according to http://www.mail-archive.com/solr-user@lucene.apache.org/msg40491.html there is no more trunk support for 2.9 indexes. So I tried the suggested solution to execute an optimize to convert a 2.9.3 index to a 3.x index. However, when I tried to optimize a 2.9.3 index using the Solr 4.0 trunk version with luceneMatchVersion set to LUCENE_30 in the solrconfig.xml, I am getting SimplePostTool: POSTing file optimize.xml SimplePostTool: FATAL: Solr returned an error: Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. - java.lang.RuntimeException: org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported in file '_0.fdx': 1 (needs to be between 2 and 2). This version of Lucene only supports indexes created with release 3.0 and later. Is there any other mechanism for converting index files to 3.x? 2011/1/6 Johannes Goll > Hi, > > our index files have been created using Lucene 2.9.3 and solr 1.4.1. > > I am trying to use a patched version of the current trunk (solr 1.5.0 ? ). > The patched version works fine with newly generated index data but > not with our existing data: > > After adjusting the solrconfig.xml - I added the line > > LUCENE_40 > > also tried > > LUCENE_30 > > I am getting the following exception > > "java.lang.RuntimeException: > org.apache.lucene.index.IndexFormatTooOldException: > Format version is not supported in file '_q.fdx': 1 (needs to be between 2 > and 2)" > > When I try to change it to > > LUCENE_29 > > or > > 2.9 > > or > > 2.9.3 > > I am getting > > "SEVERE: org.apache.solr.common.SolrException: Invalid luceneMatchVersion > '2.9', valid values are: [LUCENE_30, LUCENE_31, LUCENE_40, LUCENE_CURRENT] > or a string in format 'V.V'" > > Do you know a way to make this work with Lucene version 2.9.3 ? > > Thanks, > Johannes > -- Johannes Goll 211 Curry Ford Lane Gaithersburg, Maryland 20878
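For context, the optimize that SimplePostTool is posting above usually boils down to a bare optimize command; a minimal sketch (the URL assumes the default example setup):

optimize.xml:
<optimize/>

java -Durl=http://localhost:8983/solr/update -jar post.jar optimize.xml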
Re: Tuning StatsComponent
What field type do you recommend for a float stats.field for optimal Solr 1.4.1 StatsComponent performance? float, pfloat or tfloat? Do you recommend indexing the field? 2011/1/12 stockii > > my field Type is "double" maybe "sint" is better ? but i need double ... > =( > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Tuning-StatsComponent-tp2225809p2241903.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Adding weightage to the facets count
Hi Siva, try using the Solr Stats Component http://wiki.apache.org/solr/StatsComponent similar to select/?&q=*:*&stats=true&stats.field={your-weight-field}&stats.facet={your-facet-field} and get the sum field from the response. You may need to resort the weighted facet counts to get a descending list of facet counts. Note, there is a bug for using the Stats Component with multi-valued facet fields. For details see https://issues.apache.org/jira/browse/SOLR-1782 Johannes 2011/1/24 Chris Hostetter > > : prod1 has tag called “Light Weight” with weightage 20, > : prod2 has tag called “Light Weight” with weightage 100, > : > : If i get facet for “Light Weight” , i will get Light Weight (2) , > : here i need to consider the weightage in to account, and the result will > be > : Light Weight (120) > : > : How can we achieve this?Any ideas are really helpful. > > > It's not really possible with Solr out of the box. Faceting is fast and > efficient in Solr because it's all done using set intersections (and most > of the sets can be kept in ram very compactly and reused). For what you > are describing you'd need to no only assocaited a weighted payload with > every TermPosition, but also factor that weight in when doing the > faceting, which means efficient set operations are now out the window. > > If you know java it would be probably be possible to write a custom > SolrPlugin (a SearchComponent) to do this type of faceting in special > cases (assuming you indexed in a particular way) but i'm not sure off hte > top of my head how well it would scale -- the basic algo i'm thinking of > is (after indexing each facet term wit ha weight payload) to iterate over > the DocSet of all matching documents in parallel with an iteration over > a TermPositions, skipping ahead to only the docs that match the query, and > recording the sum of the payloads for each term. > > Hmmm... > > except TermPositions iterates over >> tuples, > so you would have to iterate over every term, and for every term then loop > over all matching docs ... like i said, not sure how efficient it would > wind up being. > > You might be happier all arround if you just do some sampling -- store the > tag+weight pairs so thta htey cna be retireved with each doc, and then > when you get your top facet constraints back, look at the first page of > results, and figure out what the sun "weight" is for each of those > constraints based solely on the page#1 results. > > i've had happy users using a similar appraoch in the past. > > -Hoss -- Johannes Goll 211 Curry Ford Lane Gaithersburg, Maryland 20878
Re: solr upgrade question
Hi Alexander, I posted the same question a few months ago. The only solution that came up was to regenerate the index files using the new version. How exactly did you do this with Luke 1.0.1? Would you mind sharing some of that magic? Best, Johannes 2011/3/31 Alexander Aristov > Didn't get any responses. > > But I tried luke 1.0.1 and it did the magic. I run optimization and after > that solr got up. > > Best Regards > Alexander Aristov > > > On 30 March 2011 15:47, Alexander Aristov >wrote: > > > People > > > > Is were way to upgrade existsing index from solr 1.4 to solr 4(trunk). > When > > I configured solr 4 and launched it complained about incorrect lucence > file > > version (3 instead of old 2) > > > > Are there any procedures to convert index? > > > > > > Best Regards > > Alexander Aristov > > >
apache-solr-3.1 slow stats component queries
Hi, thank you for making the new apache-solr-3.1 available. I have installed the version from http://apache.tradebit.com/pub//lucene/solr/3.1.0/ and am running into very slow stats component queries (~1 minute) for fetching the computed sum of the stats field.

url: ?q=*:*&start=0&rows=0&stats=true&stats.field=weight
QTime: 52825 ms
#documents: 78,359,699
total RAM: 256G
vm arguments: -server -Xmx40G

The stats.field specification is as follows: <field name="weight" ... stored="false" required="true" multiValued="false" default="1"/>. Filter queries that narrow down the #docs help to reduce it - QTime seems to be proportional to the number of docs being returned by a filter query. Is there any way to improve the performance of such stats queries? Caching only helped to improve the filter query performance, but if larger subsets are being returned, QTime increases unacceptably. Since I only need the sum and not the STD or sumsOfSquares/Min/Max, I have created a custom 3.1 version that returns only the sum. But this only slightly improved the performance. Of course I could somehow cache the larger sum queries on the client side, but I want to do this only as a last resort. Thank you very much in advance for any ideas/suggestions. Johannes
Re: apache-solr-3.1 slow stats component queries
any ideas why in this case the stats summaries are so slow ? Thank you very much in advance for any ideas/suggestions. Johannes 2011/4/5 Johannes Goll > Hi, > > thank you for making the new apache-solr-3.1 available. > > I have installed the version from > > http://apache.tradebit.com/pub//lucene/solr/3.1.0/ > > and am running into very slow stats component queries (~ 1 minute) > for fetching the computed sum of the stats field > > url: ?q=*:*&start=0&rows=0&stats=true&stats.field=weight > > 52825 > > #documents: 78,359,699 > total RAM: 256G > vm arguments: -server -xmx40G > > the stats.field specification is as follows: > stored="false" required="true" multiValued="false" > default="1"/> > > filter queries that narrow down the #docs help to reduce it - > QTime seems to be proportional to the number of docs being returned > by a filter query. > > Is there any way to improve the performance of such stats queries ? > Caching only helped to improve the filter query performance but if > larger subsets are being returned, QTime increases unacceptably. > > Since I only need the sum and not the STD or sumsOfSquares/Min/Max, > I have created a custom 3.1 version that does only return the sum. But this > only slightly improved the performance. Of course I could somehow cache > the larger sum queries on the client side but I want to do this only as a > last resort. > > Thank you very much in advance for any ideas/suggestions. > > Johannes > > -- Johannes Goll 211 Curry Ford Lane Gaithersburg, Maryland 20878
Re: apache-solr-3.1 slow stats component queries
Hi, I bench-marked the slow stats queries (6 point estimate) using the same hardware on an index of size 104M. We use a Solr/Lucene 3.1-mod which returns only the sum and count for statistics component results. Solr/Lucene is run on jetty. The relationship between query time and set of found documents is linear when using the stats component (R^2 0.99). I guess this is expected as the application needs to scan/sum-up the stat field for all matching documents? Are there any plans for caching stat results for a certain stat field along with the documents that match a filter query ? Any other ideas that could help to improve this (hardware/software configuration) ? Even for a subset of 10M entries, the stat search takes on the order of 10 seconds. Thanks in advance. Johannes 2011/4/18 Johannes Goll > any ideas why in this case the stats summaries are so slow ? Thank you > very much in advance for any ideas/suggestions. Johannes > > > 2011/4/5 Johannes Goll > >> Hi, >> >> thank you for making the new apache-solr-3.1 available. >> >> I have installed the version from >> >> http://apache.tradebit.com/pub//lucene/solr/3.1.0/ >> >> and am running into very slow stats component queries (~ 1 minute) >> for fetching the computed sum of the stats field >> >> url: ?q=*:*&start=0&rows=0&stats=true&stats.field=weight >> >> 52825 >> >> #documents: 78,359,699 >> total RAM: 256G >> vm arguments: -server -xmx40G >> >> the stats.field specification is as follows: >> > stored="false" required="true" multiValued="false" >> default="1"/> >> >> filter queries that narrow down the #docs help to reduce it - >> QTime seems to be proportional to the number of docs being returned >> by a filter query. >> >> Is there any way to improve the performance of such stats queries ? >> Caching only helped to improve the filter query performance but if >> larger subsets are being returned, QTime increases unacceptably. >> >> Since I only need the sum and not the STD or sumsOfSquares/Min/Max, >> I have created a custom 3.1 version that does only return the sum. But >> this >> only slightly improved the performance. Of course I could somehow cache >> the larger sum queries on the client side but I want to do this only as a >> last resort. >> >> Thank you very much in advance for any ideas/suggestions. >> >> Johannes >> >> > > > -- > Johannes Goll > 211 Curry Ford Lane > Gaithersburg, Maryland 20878 >
Re: Huge performance drop in distributed search w/ shards on the same server/container
Hi Fred, we are having similar issues scaling Solr 3.1 distributed searches on a single box with 18 cores. We use the StatsComponent, which seems to be mainly CPU bound. Using distributed searches resulted in a 9-fold decrease in response time. However, Jetty 6.1.2X (shipped with Solr 3.1) sporadically throws Socket connect exceptions when executing distributed searches. Our next step is to switch from Jetty to Tomcat. Did you find a solution for improving the CPU utilization and requests per second for your system? Johannes 2011/5/26 pravesh > Do you really require multi-shards? Single core/shard will do for even > millions of documents and the search will be faster than searching on > multi-shards. > > Consider multi-shard when you cannot scale-up on a single > shard/machine(e.g, > CPU,RAM etc. becomes major block). > > Also read through the SOLR distributed search wiki to check on all tuning > up > required at application server(Tomcat) end, like maxHTTP request settings. > For a single request in a multi-shard setup internal HTTP requests are made > through all queried shards, so, make sure you set this parameter higher. > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Huge-performance-drop-in-distributed-search-w-shards-on-the-same-server-container-tp2938421p2988464.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Huge performance drop in distributed search w/ shards on the same server/container
I increased the maximum POST size and headerBufferSize to 10MB; lowThreads to 50, maxThreads to 10 and lowResourceMaxIdleTime=15000. We tried Tomcat 6 using the following Connector settings: I am getting the same exception as for Jetty SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset This seems to point towards a Solr-specific issue (solrj.SolrServerException during individual shard searches). I monitored the CPU utilization executing sequential distributed searches and noticed that in the beginning all CPUs are used for a short period of time (multiple lines for shard searches are shown in the log with isShard=true arguments), then all CPUs except one become idle and the request is being processed by this one CPU for the longest period of time. I also noticed in the logs that while most of the individual shard searches (isShard=true) have low QTimes (5-10), a minority have extreme QTimes (104402-105126). All shards are fairly similar in size and content (1.2 M documents) and the StatsComponent is being used [stats=true&stats.field=weight&stats.facet=library_id]. Here library_id equals the shard/core name. Is there an internal timeout for gathering shard results or some other fixed resource limitation? Johannes 2011/6/13 Yonik Seeley > On Sun, Jun 12, 2011 at 9:10 PM, Johannes Goll > wrote: > > However, sporadically, Jetty 6.1.2X (shipped with Solr 3.1.) > > sporadically throws Socket connect exceptions when executing distributed > > searches. > > Are you using the exact jetty.xml that shipped with the solr example > server, > or did you make any modifications? > > -Yonik > http://www.lucidimagination.com >
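Purely as an illustration of the kind of connector tuning being discussed here (all values are assumptions, not the settings actually used in this thread), a Tomcat 6 HTTP connector in server.xml could be configured along these lines:

<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="10000"
           acceptCount="100"
           connectionTimeout="60000"
           maxHttpHeaderSize="10485760"
           maxPostSize="10485760"
           enableLookups="false" />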
refiltering search results
Hello, I'm trying to develop a search component to filter the search results again with current data, so that the user only sees results he is permitted to see. Can someone give me a hint where to start and how to do this? Is a Search Component the right place to do this? Regards Johannes
Antwort: Re: refiltering search results
The main idea is to filter results as much as possible with Solr and then check this result again. To do this I have to read some information from some fields of the documents in the result. At the moment I am trying to do this in the process method of a Search Component, but I don't even know how to get access to the search results or the index fields of the documents. I have thought of ResponseBuilder.getResults(), but after I have the DocListAndSet object I get stuck. I know the time of the search will increase, but security has priority. Regards, Johannes Von: Alexandre Rafalovitch An: solr-user@lucene.apache.org Datum: 28.08.2012 16:48 Betreff: Re: refiltering search results I think there was a JOIN example (for version 4) somewhere with the permission restrictions. Or, if you have very broad categories, you can use different search handlers with restriction queries baked in. These might be enough. Otherwise, you have to send the list of IDs back and forth and it could be expensive. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Aug 28, 2012 at 9:28 AM, wrote: > Hello, > > Im trying to develop a search component to filter the search results agein > with current data so that the user only sess results he is permitted to > see. > > Can someone give me a hint where to start and how to do this? Is a Search > Component the right place to do this? > > Regards > Johannes
Antwort: Re: Antwort: Re: refiltering search results
Thanks for the answer. My next question is: how can I filter the result, or how can I replace the old ResponseBuilder result with a new one?

Von: Ahmet Arslan An: solr-user@lucene.apache.org Datum: 29.08.2012 10:50 Betreff: Re: Antwort: Re: refiltering search results

--- On Wed, 8/29/12, johannes.schwendin...@blum.com wrote: > From: johannes.schwendin...@blum.com > Subject: Antwort: Re: refiltering search results > To: solr-user@lucene.apache.org > Date: Wednesday, August 29, 2012, 8:22 AM > The main idea is to filter results as > much as possible with solr an then > check this result again. > To do this I have to read some information from some fields > of the > documents in the result. > At the moment I am trying to do this in the process method > of a Search > Component. But I even dont know > how to get access to the search results or the index Fields > of the > documents. > I have thought of ResponseBuilder.getResults() but after I > have the > DocListandSet Object I get stuck.

You can read information from some fields using DocListandSet with org.apache.solr.util.SolrPluginUtils#docListToSolrDocumentList method.
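For reference, a rough sketch of the reading side this thread is about, assuming Solr 3.6-era APIs (the class name, field name and permission logic are illustrative placeholders); it only shows how to get from the ResponseBuilder to stored field values, not how to swap in a new result list:

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.DocListAndSet;
import org.apache.solr.util.SolrPluginUtils;

public class PermissionFilterComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // nothing to do before the main query runs
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // results of the main query, filled in by the QueryComponent
        DocListAndSet results = rb.getResults();
        if (results == null || results.docList == null) {
            return;
        }

        // only load the fields needed for the permission check
        Set<String> fields = new HashSet<String>();
        fields.add("acl_field"); // illustrative field name

        // turn the internal DocList into stored documents
        SolrDocumentList docs = SolrPluginUtils.docListToSolrDocumentList(
                results.docList, rb.req.getSearcher(), fields, null);

        for (SolrDocument doc : docs) {
            Object acl = doc.getFieldValue("acl_field");
            // check "acl" against the external permission source here
        }
    }

    @Override
    public String getDescription() {
        return "Re-checks search results against external permissions";
    }

    // required by the SolrInfoMBean contract in Solr 3.x
    public String getSource() {
        return "";
    }

    public String getSourceId() {
        return "";
    }

    public String getVersion() {
        return "";
    }
}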
LateBinding
Hello, Has anyone ever implemented the security feature called late binding? I am trying this, but I am very new to Solr and would be very glad to get some hints on this. Regards, Johannes
Query during a query
Hi list, I want to get distinct data from a single Solr field whenever a search query is started by a user. How can I do this? Regards, Johannes
Antwort: Re: Query during a query
Thanks for the answer, but I want to know how I can do a separate query before the main query. And I only want this data in my program. The user won't see it. I need the values from one field to get some information from an external source while the main query is executed. pravesh schrieb am 31.08.2012 07:42:48: > Von: > > pravesh > > An: > > solr-user@lucene.apache.org > > Datum: > > 31.08.2012 07:43 > > Betreff: > > Re: Query during a query > > Did you checked SOLR Field Collapsing/Grouping. > http://wiki.apache.org/solr/FieldCollapsing > http://wiki.apache.org/solr/FieldCollapsing > If this is what you are looking for. > > > Thanx > Pravesh > > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/ > Query-during-a-query-tp4004624p4004631.html > Sent from the Solr - User mailing list archive at Nabble.com.
Antwort: Re: Antwort: Re: Query during a query
The problem is that I don't know how to do this. :P My sequence: the user enters his search words. This is sent to Solr. There I need to make another query first to get metadata from the index. With this metadata I have to connect to an external source to get some information about the user. With this information and the first search words I then query the Solr index to get the search result. I hope it's clear now where my problem is and what I want to do. Regards, Johannes Von: "Jack Krupansky" An: Datum: 31.08.2012 15:03 Betreff: Re: Antwort: Re: Query during a query So, just do another query before doing the main query. What's the problem? Be more specific. Walk us through the sequence of processing that you need. -- Jack Krupansky -Original Message- From: johannes.schwendin...@blum.com Sent: Friday, August 31, 2012 1:52 AM To: solr-user@lucene.apache.org Subject: Antwort: Re: Query during a query Thanks for the answer, but I want to know how I can do a seperate query before the main query. And I only want this data in my programm. The user won't see it. I need the values from one field to get some information from an external source while the main query is executed. pravesh schrieb am 31.08.2012 07:42:48: > Von: > > pravesh > > An: > > solr-user@lucene.apache.org > > Datum: > > 31.08.2012 07:43 > > Betreff: > > Re: Query during a query > > Did you checked SOLR Field Collapsing/Grouping. > http://wiki.apache.org/solr/FieldCollapsing > http://wiki.apache.org/solr/FieldCollapsing > If this is what you are looking for. > > > Thanx > Pravesh > > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/ > Query-during-a-query-tp4004624p4004631.html > Sent from the Solr - User mailing list archive at Nabble.com.
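A rough sketch of that sequence with SolrJ (server URL, field names and the placeholder filter logic are assumptions for illustration): a first query collects the metadata field, the external source would then be consulted, and the main query is issued with an additional filter.

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class TwoStepSearch {

    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        String userQuery = args.length > 0 ? args[0] : "*:*";

        // step 1: fetch only the metadata field for the user's search words
        SolrQuery metaQuery = new SolrQuery(userQuery);
        metaQuery.setFields("metadata_field"); // illustrative field name
        metaQuery.setRows(50);
        QueryResponse metaResponse = solr.query(metaQuery);

        List<String> metadataValues = new ArrayList<String>();
        for (SolrDocument doc : metaResponse.getResults()) {
            metadataValues.add(String.valueOf(doc.getFieldValue("metadata_field")));
        }

        // here the external source would be consulted with metadataValues to
        // decide what the user may see; as a placeholder, an OR filter is
        // simply built from the collected values
        StringBuilder fq = new StringBuilder("metadata_field:(");
        for (int i = 0; i < metadataValues.size(); i++) {
            if (i > 0) fq.append(" OR ");
            fq.append('"').append(metadataValues.get(i)).append('"');
        }
        fq.append(')');

        // step 2: the main query, restricted by the derived filter
        SolrQuery mainQuery = new SolrQuery(userQuery);
        if (!metadataValues.isEmpty()) {
            mainQuery.addFilterQuery(fq.toString());
        }
        QueryResponse mainResponse = solr.query(mainQuery);
        System.out.println(mainResponse.getResults().getNumFound() + " hits");
    }
}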
Solr Cell Questions
Hi, I'm currently experimenting with Solr Cell to index files to Solr. During this, some questions came up. 1. Is it possible (and wise) to connect to Solr Cell with multiple threads at the same time to index several documents at the same time? This question came up because my program takes about 6 hours to index around 35000 docs. (No production environment, only the example Solr and a little desktop machine, but I think it's very slow, and I know Solr isn't the bottleneck (yet).) 2. If 1 is possible, how many threads should do this and how much memory does Solr need? I've tried it but I ran into an out of memory exception. Thanks in advance. Best Regards Johannes
Antwort: Re: Solr Cell Questions
Thank you Erick for your response, I've already tried what you've suggested and got some out of memory exceptions. Because of this I like the solution with Solr Cell, where I can send the file directly to Solr via a stream and don't collect the files in my memory. And another question that came to my mind: how many documents per minute or second can I put into Solr? Say XML format and from 100kb to 100MB. Is there a number, or is it too dependent on hardware and settings? Best Johannes Erick Erickson schrieb am 25.09.2012 00:22:26: > Von: > > Erick Erickson > > An: > > solr-user@lucene.apache.org > > Datum: > > 25.09.2012 00:23 > > Betreff: > > Re: Solr Cell Questions > > If you're concerned about throughput, consider moving all the > SolrCell (Tika) processing off the server. SolrCell is way cool > for showing what can be done, but its downside is you're > moving all the processing of the structured documents to the > same machine doing the indexing. Pretty soon, especially > with significant size files, you're spending all your CPU cycles > parsing the files... > > Happens there's a blog about this: > http://searchhub.org/dev/2012/02/14/indexing-with-solrj/ > > By moving the indexing to N clients, you can increase > throughput until you make Solr work hard to do the indexing > > Best > Erick > > On Mon, Sep 24, 2012 at 10:04 AM, wrote: > > Hi, > > > > Im currently experimenting with Solr Cell to index files to Solr. During > > this some questions came up. > > > > 1. Is it possible (and wise) to connect to Solr Cell with multiple Threads > > at the same time to index several documents at the same time? > > This question came up because my prrogramm takes about 6hours to index > > round 35000 docs. (no production environment, only example solr and a > > little desktop machine but I think its very slow, and I know solr isn't > > the bottleneck (yet)) > > > > 2. If 1 is possible, how many Threads should do this and how many memory > > Solr needs? I've tried it but i run into an out of memory exception. > > > > Thanks in advantage > > > > Best Regards > > Johannes
Antwort: Re: Re: Solr Cell Questions
The difference with solr cell is, that i'am sending every single document to solr cell and don't collect them until i have a couple of them in my memory. Using mainly the code form here: http://wiki.apache.org/solr/ExtractingRequestHandler#SolrJ Erick Erickson schrieb am 25.09.2012 15:47:34: > Von: > > Erick Erickson > > An: > > solr-user@lucene.apache.org > > Datum: > > 25.09.2012 15:48 > > Betreff: > > Re: Re: Solr Cell Questions > > bq: how many documents per minute, second, what ever can i put into solr > > Too many variables to say. I've seen several thousand truly simple > docs/sec. But since you're doing the Tika processing that's probably > going to be your limiting factor. And it'll be many fewer... > > I don't understand your OOM issue when running Tika on the client. Or, > rather, why you think using SolrCell makes this different. SolrCell also > uses Tika. So my suspicion it that your client-side process simply isn't > allocating much memory to the JVM, did you try bumping the memory > on your client? > > Best > Erick > > On Tue, Sep 25, 2012 at 5:23 AM, wrote: > > Thank you Erick for your respone, > > > > I've already tried what you've suggested and got some out of memory > > exceptions. Because of this i like the solution with solr Cell where i can > > send the file directly to solr via stream and don't collect them in my > > memory. > > > > And another question that came to my mind, how many documents per minute, > > second, what ever can i put into solr. Say XML format and from 100kb to > > 100MB. > > Is there a number or is it to dependent from hardware and settings? > > > > > > Best > > Johannes > > > > Erick Erickson schrieb am 25.09.2012 00:22:26: > > > >> Von: > >> > >> Erick Erickson > >> > >> An: > >> > >> solr-user@lucene.apache.org > >> > >> Datum: > >> > >> 25.09.2012 00:23 > >> > >> Betreff: > >> > >> Re: Solr Cell Questions > >> > >> If you're concerned about throughput, consider moving all the > >> SolrCell (Tika) processing off the server. SolrCell is way cool > >> for showing what can be done, but its downside is you're > >> moving all the processing of the structured documents to the > >> same machine doing the indexing. Pretty soon, especially > >> with significant size files, you're spending all your CPU cycles > >> parsing the files... > >> > >> Happens there's a blog about this: > >> http://searchhub.org/dev/2012/02/14/indexing-with-solrj/ > >> > >> By moving the indexing to N clients, you can increase > >> throughput until you make Solr work hard to do the indexing > >> > >> Best > >> Erick > >> > >> On Mon, Sep 24, 2012 at 10:04 AM, > > wrote: > >> > Hi, > >> > > >> > Im currently experimenting with Solr Cell to index files to Solr. > > During > >> > this some questions came up. > >> > > >> > 1. Is it possible (and wise) to connect to Solr Cell with multiple > > Threads > >> > at the same time to index several documents at the same time? > >> > This question came up because my prrogramm takes about 6hours to index > >> > round 35000 docs. (no production environment, only example solr and a > >> > little desktop machine but I think its very slow, and I know solr > > isn't > >> > the bottleneck (yet)) > >> > > >> > 2. If 1 is possible, how many Threads should do this and how many > > memory > >> > Solr needs? I've tried it but i run into an out of memory exception. > >> > > >> > Thanks in advantage > >> > > >> > Best Regards > >> > Johannes
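For reference, a minimal sketch of streaming a single file to the ExtractingRequestHandler with SolrJ, along the lines of the wiki example mentioned above (URL, file and field names are illustrative; this assumes a 3.x-era SolrJ - newer versions also expect a content-type argument on addFile):

import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class SolrCellIndexer {

    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("document.pdf"));        // file streamed to Solr Cell / Tika
        req.setParam("literal.id", "doc1");           // unique key for the document
        req.setParam("uprefix", "attr_");             // prefix for unmapped metadata fields
        req.setParam("fmap.content", "attr_content"); // map the extracted body text
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

        solr.request(req);
    }
}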
Antwort: RE: Group.query
I think what you need is facetting, or is this another thing? http://searchhub.org/dev/2009/09/02/faceted-search-with-solr/ Peter Kirk schrieb am 26.09.2012 12:18:32: > Von: > > Peter Kirk > > An: > > "solr-user@lucene.apache.org" > > Datum: > > 26.09.2012 12:19 > > Betreff: > > RE: Group.query > > Thanks. Yes I can do this - but doesn't it mean I need to execute a > query per group? > > What I really want to do (and I'm sorry I'm not so good at > explaining) is to execute one query for products, and receive > results grouped by the groups - but where a particular product may > be found in several groups. > > For example, I'd like to execute a query for all products which > match "bucket". > There are several products which are "buckets", each of which can > belong to several groups. > Would it be possible to generate a query which would return the > groups, each with a list of the buckets? > > Example result, with 3 groups, and several products (which may occur > in several groups). > > Children_sand_toys > Castle bucket > Plain bucket > > Boys_toys > Castle bucket > Truck bucket > > Girls_toys > Castle bucket > Large Pony bucket > > Thanks, > Peter > > -Original Message- > From: Ingar Hov [mailto:ingar@gmail.com] > Sent: 26. september 2012 11:57 > To: solr-user@lucene.apache.org > Subject: Re: Group.query > > I hope I understood the question, if so this may be a solution: > > Why don't you make the field group for product multiple? > > Example: > > multiValued="true"/> > > If the product is a member of group1 and group2, just add both for > the product document so that each product has an array of group. > Then you can easily get all products for group1 by doing query: group:group1 > > Regards, > Ingar > > > > On Wed, Sep 26, 2012 at 10:48 AM, Peter Kirk wrote: > > Thanks. Yes, the only solution I could think of was to execute > several queries. > > I would like it to be a single query if at all possible. If anyone > has ideas I could look into that would be great. > > Thanks, > > Peter > > > > > > -Original Message- > > From: Aditya [mailto:findbestopensou...@gmail.com] > > Sent: 26. september 2012 10:41 > > To: solr-user@lucene.apache.org > > Subject: Re: Group.query > > > > Hi > > > > You are doing AND search, so you are getting results prod1 and > prod2. I guess, you should query only for group1 and another query for group2. > > > > Regards > > Aditya > > www.findbestopensource.com > > > > > > > > On Wed, Sep 26, 2012 at 12:26 PM, Peter Kirk wrote: > > > >> Hi > >> > >> I have "products" which belong to one or more "groups". > >> Products are documents in Solr, while the groups are fields (eg. > >> group_1_bool:true). > >> > >> For example: > >> > >> Prod1 => group1, group2 > >> Prod2 => group1, group2 > >> Prod3 => group1 > >> Prod4 => group2 > >> > >> I would like to execute a query which results in the groups with > >> their products. That is, the result should be something like: > >> > >> Group1 => Prod1, Prod2, Prod3 > >> Group2 => Prod1, Prod2, Prod4 > >> > >> How can I do this? > >> > >> I've been looking at group.query, but I don't think this is what I want. > >> > >> For example, "q=*:*&group.query=group_1_bool:true+AND+group_2_bool:true" > >> Results in 1 group called "group_1_bool:true AND group_2_bool:true", > >> which contains prod1 and prod2. > >> > >> > >> Thanks, > >> Peter > >> > >> > > > >
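A sketch of what the suggestions in this thread could look like in practice (field name and values are illustrative): a multi-valued string field that lists every group a product belongs to, a faceted query that returns the groups for all matching products, and a follow-up filter to list the products of one group.

<field name="group" type="string" indexed="true" stored="true" multiValued="true"/>

q=bucket&facet=true&facet.field=group&facet.mincount=1

q=bucket&fq=group:"Children_sand_toys"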
System collection - lazy loading mechanism not working for custom UpdateProcessors?
Hi all, I'm facing an issue regarding custom code inside a .system collection and starting up a Solr Cloud cluster. I thought, as stated in the documentation, that when using the .system collection custom code is lazy loaded, because it can happen that a collection that uses custom code is initialized before the .system collection is up and running. I did all the necessary configuration, and while debugging I can see that the custom code is wrapped via a PluginBag$LazyPluginHolder. So far it seems good, but I still get exceptions when starting the Solr Cloud with the following errors: SolrException: Blob loading failed: .no active replica available for .system collection... In my case I'm using custom code for a couple of UpdateProcessors. So it seems that this lazy mechanism is not working well for UpdateProcessors. Inside the class LazyPluginHolder the comment says: "A class that loads plugins Lazily. When the get() method is invoked the Plugin is initialized and returned." When a core is initialized and you have a custom UpdateProcessor, the get method is invoked directly and the lazy loading mechanism tries to get the custom class from the MemClassLoader, but in most scenarios the .system collection is not up yet and the above exception is thrown... So maybe for UpdateProcessors the core-initialization routine is not implemented optimally for the lazy loading mechanism? Please let me know if it helps to share my configuration! Many thanks, Johannes
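For context, a rough sketch of the blob store / runtimeLib wiring this refers to (jar, collection and class names are illustrative, and the exact Config API command names should be double-checked against the Solr version in use):

# upload the jar with the custom UpdateProcessorFactory to the .system collection (blob store)
curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @custom-processors.jar "http://localhost:8983/solr/.system/blob/custom-processors"

# make the blob available as a runtime lib in the collection that uses it
curl "http://localhost:8983/solr/mycollection/config" -H 'Content-type:application/json' -d '{"add-runtimelib": {"name": "custom-processors", "version": 1}}'

# declare the processor so that it is loaded lazily from the blob store
curl "http://localhost:8983/solr/mycollection/config" -H 'Content-type:application/json' -d '{"add-updateprocessor": {"name": "myCustomProcessor", "class": "com.example.MyUpdateProcessorFactory", "runtimeLib": true}}'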
AW: System collection - lazy loading mechanism not working for custom UpdateProcessors?
Maybe I have found a more accurate example setup to reproduce the error. By default the .system collection is created with 1 shard and 1 replica. In this setup, everything works as expected and no matter how often I try to restart the Solr Cloud, the error "SolrException: Blob loading failed: .no active replica available for .system collection" is never thrown... But once I started to add one more replica to the .system collection, things start to break! With this setup, I'm not able to start the Solr Cloud server without any error. Sometimes one or two collections are Active, but most of the time all collections are permanently marked as Down... Are there any restrictions on how to set up the .system collection? Johannes -Ursprüngliche Nachricht- Von: Johannes Brucher [mailto:johannes.bruc...@shi-gmbh.com] Gesendet: Mittwoch, 25. April 2018 10:57 An: solr-user@lucene.apache.org Betreff: System collection - lazy loading mechanism not working for custom UpdateProcessors? Hi all, I'm facing an issue regarding custom code inside a .system-collection and starting up a Solr Cloud cluster. I thought, like its stated in the documentation, that in case using the .system collection custom code is lazy loaded, because it can happen that a collection that uses custom code is initialized before the system collection is up and running. I did all the necessary configuration and while debugging, I can see that the custom code is wrapped via a PluginBag$LazyPluginHolder. So far its seems good, but I still get Exceptions when starting the Solr Cloud with the following errors: SolrException: Blob loading failed: .no active replica available for .system collection... In my case I'm using custom code for a couple of UpdateProcessors. So it seems, that this lazy mechanism is not working well for UpdateProcessors. Inside the calzz LazyPluginHolder the comment says: "A class that loads plugins Lazily. When the get() method is invoked the Plugin is initialized and returned." When a core is initialized and you have a custom UpdateProcessor, the get-method is invoked directly and the lazy loading mechanism tries to get the custom class from the MemClassLoader, but in most scenarios the system collection is not up and the above Exception is thrown... So maybe it's the case that for UpdateProcessors while initializing a core, the routine is not implemented optimal for the lazy loading mechanism? Pls let me know if it helps sharing my configuration! Many thanks, Johannes
AW: System collection - lazy loading mechanism not working for custom UpdateProcessors?
Ty Shawn, I’m trying to use JustPaste.it to share my screenshots… Hi all, maybe I have found a more accurate example constellation to reproduce the error. By default the .system-collection is created with 1 shard and 1 replica if you using just one node. In this constellation, everything works as expected and no matter how often I try to restart the Solr Cloud, the error "SolrException: Blob loading failed: .no active replica available for .system collection" is never thrown... https://justpaste.it/685gf But once I started to add one more replica to the .system collection things are messing up! With this setup, I'm not able to start the Solr Cloud server without any error: https://justpaste.it/4t66c Sometimes one or two collections are Active but most of the time all collections are permanently marked as Down… Here are the Exceptions I’m constantly getting: https://justpaste.it/5ziem Are there any restrictions how to setup the .system collection? Johannes -Ursprüngliche Nachricht- Von: Johannes Brucher [mailto:johannes.bruc...@shi-gmbh.com] Gesendet: Mittwoch, 25. April 2018 10:57 An: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org> Betreff: System collection - lazy loading mechanism not working for custom UpdateProcessors? Hi all, I'm facing an issue regarding custom code inside a .system-collection and starting up a Solr Cloud cluster. I thought, like its stated in the documentation, that in case using the .system collection custom code is lazy loaded, because it can happen that a collection that uses custom code is initialized before the system collection is up and running. I did all the necessary configuration and while debugging, I can see that the custom code is wrapped via a PluginBag$LazyPluginHolder. So far its seems good, but I still get Exceptions when starting the Solr Cloud with the following errors: SolrException: Blob loading failed: .no active replica available for .system collection... In my case I'm using custom code for a couple of UpdateProcessors. So it seems, that this lazy mechanism is not working well for UpdateProcessors. Inside the calzz LazyPluginHolder the comment says: "A class that loads plugins Lazily. When the get() method is invoked the Plugin is initialized and returned." When a core is initialized and you have a custom UpdateProcessor, the get-method is invoked directly and the lazy loading mechanism tries to get the custom class from the MemClassLoader, but in most scenarios the system collection is not up and the above Exception is thrown... So maybe it’s the case that for UpdateProcessors while initializing a core, the routine is not implemented optimal for the lazy loading mechanism? Pls let me know if it helps sharing my configuration! Many thanks, Johannes
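For reference, the extra replica in a setup like this would typically be added with a Collections API call along these lines (node name is illustrative):

http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=.system&shard=shard1&node=localhost:8984_solr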
gzip compression solr 8.4.1
Hi, we want to use gzip compression between our application and the Solr server.

We use a standalone Solr server version 8.4.1 and the prepackaged Jetty as application server.

We have enabled the Jetty gzip module by adding these two files:

{path_to_solr}/server/modules/gzip.mod (see below the question)
{path_to_solr}/server/etc/jetty-gzip.xml (see below the question)

Within the application we use a HttpSolrServer that is configured with allowCompression=true.

After we had released our application, we saw the number of connections in the TCP state CLOSE_WAIT rise until the application was not able to open new connections.

After a long debugging session we think the problem is that the "Content-Length" header returned by Jetty is sometimes wrong when gzip compression is enabled.

The SolrJ client uses a ContentLengthInputStream, which uses the "Content-Length" header to detect whether all data was received. But the InputStream cannot be fully consumed, because the value of the "Content-Length" header is higher than the actual content length.

Usually the method PoolingHttpClientConnectionManager.releaseConnection is called after the InputStream has been fully consumed. This frees the connection to be reused or to be closed by the application.

Due to the incorrect "Content-Length" header, the PoolingHttpClientConnectionManager.releaseConnection method is never called and the connection stays active. After the connection timeout of Jetty is reached, it closes the connection from the server side and the TCP state switches into CLOSE_WAIT. The client never closes the connection, and so the number of connections in use rises.

Currently we are trying to configure the Jetty gzip module to return no "Content-Length" header when gzip compression was used. We hope that in this case another InputStream implementation is used that uses the NULL terminator to see when the InputStream was fully consumed.

Do you have any experience with this problem or any suggestions for us?

Thanks,

Johannes

gzip.mod

-

DO NOT EDIT - See: https://www.eclipse.org/jetty/documentation/current/startup-modules.html

[description]
Enable GzipHandler for dynamic gzip compression for the entire server.

[tags]
handler

[depend]
server

[xml]
etc/jetty-gzip.xml

[ini-template]
## Minimum content length after which gzip is enabled
jetty.gzip.minGzipSize=2048

## Check whether a file with *.gz extension exists
jetty.gzip.checkGzExists=false

## Gzip compression level (-1 for default)
jetty.gzip.compressionLevel=-1

## User agents for which gzip is disabled
jetty.gzip.excludedUserAgent=.*MSIE.6\.0.*

-

jetty-gzip.xml

-

<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN" "http://www.eclipse.org/jetty/configure_9_3.dtd">
<Configure id="Server" class="org.eclipse.jetty.server.Server">
  ...
  <New id="GzipHandler" class="org.eclipse.jetty.server.handler.gzip.GzipHandler">
    <Set name="minGzipSize"><Property name="jetty.gzip.minGzipSize" deprecated="gzip.minGzipSize" default="2048"/></Set>
    <Set name="checkGzExists"><Property name="jetty.gzip.checkGzExists" deprecated="gzip.checkGzExists" default="false"/></Set>
    <Set name="compressionLevel"><Property name="jetty.gzip.compressionLevel" deprecated="gzip.compressionLevel" default="-1"/></Set>
    ...
  </New>
  ...
</Configure>

-
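For reference, the client-side switch mentioned above looks roughly like this with SolrJ 8.x, where the old HttpSolrServer class is now HttpSolrClient (the URL is illustrative):

import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class CompressedSolrClient {

    public static void main(String[] args) {
        // asks Solr/Jetty for gzip-compressed responses via the Accept-Encoding header
        HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection")
                .allowCompression(true)
                .build();
        System.out.println("Base URL: " + solr.getBaseURL());
    }
}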
Re: gzip compression solr 8.4.1
Hi, We did further tests to see where the problem exactly is. These are our outcomes: The content-length is calculated correctly, a quick test with curl showed this. The problem is that the stream with the gzip data is not fully consumed and afterwards not closed. Using the debugger with a breakpoint at org/apache/solr/common/util/Utils.java:575 shows that it won't enter the function readFully((entity.getContent()) most likely due to how the gzip stream content is wrapped and extracted beforehand. On line org/apache/solr/common/util/Utils.java:582 the consumeQuietly(entity) should close the stream but does not because of a silent exception. This seems to be the same as it is described in https://issues.apache.org/jira/browse/SOLR-14457 We saw that the problem happened also with correct GZIP responses from jetty. Not only with non-GZIP as described within the jira issue. Best, Johannes Am Do., 23. Apr. 2020 um 09:55 Uhr schrieb Johannes Siegert < johannes.sieg...@offerista.com>: > Hi, > > we want to use gzip-compression between our application and the solr > server. > > We use a standalone solr server version 8.4.1 and the prepackaged jetty as > application server. > > We have enabled the jetty gzip module by adding these two files: > > {path_to_solr}/server/modules/gzip.mod (see below the question) > {path_to_solr}/server/etc/jetty-gzip.xml (see below the question) > > Within the application we use a HttpSolrServer that is configured with > allowCompression=true. > > After we had released our application we saw that the number of > connections within the TCP-state CLOSE_WAIT rising up until the application > was not able to open new connections. > > > After a long debugging session we think the problem is that the header > "Content-Length" that is returned by the jetty is sometimes wrong when > gzip-compression is enabled. > > The solrj client uses a ContentLengthInputStream, that uses the header > "Content-Lenght" to detect if all data was received. But the InputStream > can not be fully consumed because the value of the header "Content-Lenght" > is higher than the actual content-length. > > Usually the method PoolingHttpClientConnectionManager.releaseConnection is > called after the InputStream was fully consumed. This give the connection > free to be reused or to be closed by the application. > > Due to the incorrect header "Content-Length" the > PoolingHttpClientConnectionManager.releaseConnection method is never called > and the connection stays active. After the connection-timeout of the jetty > is reached, it closes the connection from the server-side and the TCP-state > switches into CLOSE_WAIT. The client never closes the connection and so the > number of connections in use rises up. > > > Currently we try to configure the jetty gzip module to return no > "Content-Length" if gzip-compression was used. We hope that in this case > another InputStream implementation is used that uses the NULL-terminator to > see when the InputStream was fully consumed. > > Do you have any experiences with this problem or any suggestions for us? > > Thanks, > > Johannes > > > gzip.mod > > - > > DO NOT EDIT - See: > https://www.eclipse.org/jetty/documentation/current/startup-modules.html > > [description] > Enable GzipHandler for dynamic gzip compression > for the entire server. 
> > [tags] > handler > > [depend] > server > > [xml] > etc/jetty-gzip.xml > > [ini-template] > ## Minimum content length after which gzip is enabled > jetty.gzip.minGzipSize=2048 > > ## Check whether a file with *.gz extension exists > jetty.gzip.checkGzExists=false > > ## Gzip compression level (-1 for default) > jetty.gzip.compressionLevel=-1 > > ## User agents for which gzip is disabled > jetty.gzip.excludedUserAgent=.*MSIE.6\.0.* > > - > > jetty-gzip.xml > > - > > > http://www.eclipse.org/jetty/configure_9_3.dtd";> > > > > > > > > > > > > class="org.eclipse.jetty.server.handler.gzip.GzipHandler"> > > deprecated="gzip.minGzipSize" default="2048" /> > > > deprecated="gzip.checkGzExists" default="false" /> > > > deprecated="gzip.compressionLevel" default="-1" /> > >
ManagedFilter for stemming
Hi, we are using the SnowballPorterFilter to stem our tokens for serveral languages. Now we want to update the list of protected words over the Solr-API. As I can see, there are only solutions for SynonymFilter and the StopwordFilter with ManagedSynonymFilter and ManagedStopFilter. Do you know any solution for my problem? Thanks, Johannes