Re: Performance testing on SOLR cloud
Hi Aswath,

It is not common to test only QPS unless the index is static most of the time. Usually you have to test and tune the worst-case scenario: the maximum expected indexing rate plus queries. You can get more QPS by reducing query latency or by increasing the number of replicas. You manage latency by tuning Solr/JVM/queries and/or by sharding the index. First tune the index without replication, and once you are sure it is the best a single index can provide, introduce replicas to achieve the required throughput.

The hard part is tuning Solr. You can do it without specialized tools, but tools help a lot. One such tool is Sematext's SPM - https://sematext.com/spm/index.html - where you can see all the Solr/JVM/OS metrics needed to tune Solr. It also provides a QPS graph.

With an index your size, unless the documents are really big, you can start without sharding. After tuning, if you are not satisfied with query latency, you can try splitting into two shards.

Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

On 17.11.2015 23:45, Aswath Srinivasan (TMS) wrote:
Hi fellow developers,

Please share your experience on how you did performance testing on SOLR. What I'm trying to do is have SOLR cloud on 3 Linux servers with 16 GB RAM and index a total of 2.2 million documents. Yet to decide how many shards and replicas to have (any hint on this is welcome too; basically 'only' performance testing, so suggest the number of shards and replicas if you can). Ultimately, I'm trying to find the QPS that this SOLR cloud setup can handle.

To summarize,
1. Find the QPS that my solr cloud setup can support
2. Using 5.3.1 version with external zookeeper
3. 3 Linux servers with 16 GB RAM and index a total of 2.2 million documents
4. Yet to decide number of shards and replicas
5. Not using any custom search application (performance testing for SOLR and not for Search portal)

Thank you
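A minimal single-threaded SolrJ sketch for getting a first rough QPS/latency number against such a cluster; the ZooKeeper addresses and collection name are placeholders, and a real test would drive many concurrent threads with representative queries (e.g. via JMeter) rather than this loop:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class SimpleQpsProbe {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble and collection name - adjust to your setup.
        try (CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181")) {
            client.setDefaultCollection("mycollection");
            int queries = 1000;
            long start = System.nanoTime();
            for (int i = 0; i < queries; i++) {
                // Use a realistic mix of queries here; *:* alone mostly measures caches.
                client.query(new SolrQuery("*:*"));
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("%d queries in %.1fs = %.1f QPS (single thread)%n",
                queries, seconds, queries / seconds);
        }
    }
}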
Re: Security Problems
Not sure I quite understand. You're saying that the cost for the UI is not large, but then suggesting we protect just one resource (/admin/security-check)? Why couldn't we create the permission called 'admin-ui' and protect everything under /admin/ui/ for example? Along with the root HTML link too. Upayavira On Wed, Nov 18, 2015, at 07:46 AM, Noble Paul wrote: > The authentication plugin is not expensive if you are talking in the > context of admin UI. After all it is used not like 100s of requests > per second. > > The simplest solution would be > > provide a well known permission name called "admin-ui" > > ensure that every admin page load makes a call to some resource say > "/admin/security-check" > > Then we can just protect that . > > The only concern thatI have is the false sense of security it would > give to the user > > But, that is a different point altogether > > On Wed, Nov 11, 2015 at 1:52 AM, Upayavira wrote: > > Is the authentication plugin that expensive? > > > > I can help by minifying the UI down to a smaller number of CSS/JS/etc > > files :-) > > > > It may be overkill, but it would also give better experience. And isn't > > that what most applications do? Check authentication tokens on every > > request? > > > > Upayavira > > > > On Tue, Nov 10, 2015, at 07:33 PM, Anshum Gupta wrote: > >> The reason why we bypass that is so that we don't hit the authentication > >> plugin for every request that comes in for static content. I think we > >> could > >> call the authentication plugin for that but that'd be an overkill. Better > >> experience ? yes > >> > >> On Tue, Nov 10, 2015 at 11:24 AM, Upayavira wrote: > >> > >> > Noble, > >> > > >> > I get that a UI which is open source does not benefit from ACL control - > >> > we're not giving away anything that isn't public (other than perhaps > >> > info that could be used to identify the version of Solr, or even the > >> > fact that it *is* solr). > >> > > >> > However, from a user experience point of view, requiring credentials to > >> > see the UI would be more conventional, and therefore lead to less > >> > confusion. Is it possible for us to protect the UI static files, only > >> > for the sake of user experience, rather than security? > >> > > >> > Upayavira > >> > > >> > On Tue, Nov 10, 2015, at 12:01 PM, Noble Paul wrote: > >> > > The admin UI is a bunch of static pages . We don't let the ACL control > >> > > static content > >> > > > >> > > you must blacklist all the core/collection apis and it is pretty much > >> > > useless for anyone to access the admin UI (w/o the credentials , of > >> > > course) > >> > > > >> > > On Tue, Nov 10, 2015 at 7:08 AM, 马柏樟 wrote: > >> > > > Hi, > >> > > > > >> > > > After I configure Authentication with Basic Authentication Plugin and > >> > Authorization with Rule-Based Authorization Plugin, How can I prevent the > >> > strangers from visiting my solr by browser? For example, if the stranger > >> > visit the http://(my host):8983, the browser will pop up a window and > >> > says "the server http://(my host):8983 requires a username and > >> > password" > >> > > > >> > > > >> > > > >> > > -- > >> > > - > >> > > Noble Paul > >> > > >> > >> > >> > >> -- > >> Anshum Gupta > > > > -- > - > Noble Paul
Re: search for documents where all words of field present in the query
Hi Jim,

I think you could do some magic with function queries.
https://cwiki.apache.org/confluence/display/solr/Function+Queries

Index the number of unique words in the product title, e.g.
title = john smith
length = 2

and return products if the number of matching terms equals the number of words in the title.

Perhaps there is a better way, but something like below should work in theory.

termfreq(title,'john')
termfreq(title,'smith')

fq={!frange l=0 u=0} sub(length, sum(termfreq(title,'john'), termfreq(title,'smith')))

Ahmet

On Tuesday, November 17, 2015 4:31 PM, superjim wrote:

How would I form a query where all of the words in a field must be present in the query (but possibly more)? For example, if I have the following words in a text field: "John Smith"

A query for "John" should return no results
A query for "Smith" should return no results
A query for "John Smith" should return that one result
A query for "banana John Smith purple monkey dishwasher" should return that one result

--
View this message in context: http://lucene.472066.n3.nabble.com/search-for-documents-where-all-words-of-field-present-in-the-query-tp4240564.html
Sent from the Solr - User mailing list archive at Nabble.com.
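A sketch of how that suggestion might look from SolrJ; the core URL and the pre-computed "length" field are assumptions, and as noted above it only works in theory (repeated words would throw the count off):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class AllTitleWordsMustMatch {
    public static void main(String[] args) throws Exception {
        // Assumes a core "products" and a numeric field "length" holding the title word count.
        try (HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/products")) {
            SolrQuery q = new SolrQuery("title:(john smith)");
            // Keep only documents where every word of the title matched a query term.
            q.addFilterQuery("{!frange l=0 u=0}sub(length,"
                + "sum(termfreq(title,'john'),termfreq(title,'smith')))");
            System.out.println(client.query(q).getResults().getNumFound());
        }
    }
}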
Upgrading from 4.x to 5.x
Hi! I'm a very inexperienced user with Solr. I've been using Solr to provide indexes for my Dovecot IMAP server. Using version 3.x, and later 4.x, I have been able to do so without too much of a challenge. However, version 5.x has certainly changed quite a bit and I'm very uncertain how to proceed. I currently have a working 4.10.3 installation, using the "example" server provided with the Solr distribution package, and a schema.xml optimized for Dovecot. I haven't found anything on migrating from 4 to 5 - at least anything I actually understood. Can you point me in the right direction? -- Daniel
Re: CloudSolrClient Connect To Zookeeper with ACL Protected files
At the moment it seems that it's only settable via System properties - see https://cwiki.apache.org/confluence/display/solr/ZooKeeper+Access+Control. But it would be nice to do this programmatically as well, maybe worth opening a JIRA ticket? Alan Woodward www.flax.co.uk On 17 Nov 2015, at 16:44, Kevin Lee wrote: > Does anyone know if it is possible to set the ACL credentials in > CloudSolrClient needed to access a protected resource in Zookeeper? > > Thanks! > >> On Nov 13, 2015, at 1:20 PM, Kevin Lee wrote: >> >> Hi, >> >> Is there a way to use CloudSolrClient and connect to a Zookeeper instance >> where ACL is enabled and resources/files like /live_nodes, etc are ACL >> protected? Couldn’t find a way to set the ACL credentials. >> >> Thanks, >> Kevin >
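For reference, the system-property route looks roughly like this from SolrJ. The provider class and property names are the ones documented on that page - verify them against your version - and the credentials and ZooKeeper addresses are placeholders:

import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class ZkAclClientExample {
    public static void main(String[] args) throws Exception {
        // These must be set before the first ZooKeeper connection is made;
        // the same values can be passed as -D JVM arguments instead.
        System.setProperty("zkCredentialsProvider",
            "org.apache.solr.common.cloud.VMParamsSingleSetCredentialsDigestZkCredentialsProvider");
        System.setProperty("zkDigestUsername", "admin-user");
        System.setProperty("zkDigestPassword", "admin-password");

        try (CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181/solr")) {
            client.setDefaultCollection("collection1");
            client.connect();
            // If the credentials are accepted, reading cluster state works as usual.
            System.out.println(client.getZkStateReader().getClusterState().getLiveNodes());
        }
    }
}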
Re: Solr Search: Access Control / Role based security
On 18/11/2015 07:55, Noble Paul wrote: I haven't evaluated manifoldCF for this . However , my preference would be to have a generic mechanism in built into Solr to restrict user access to certain docs based on some field values. Relying on external tools make life complex for users who do not like it. Our strategy is * Provide a pluggable framework so that custom external solutions can be plugged in * Provide a standard implementation which does not depend upon any external solutions any suggestions are welcome Hi, We're working on an external JOIN as part of the BioSolr project: basically this lets you filter result sets with an external query (which could be an authentication system of some kind). There's a patch at https://issues.apache.org/jira/browse/SOLR-7341 and the author, Tom Winch, is working on a blog post to explain it further - it'll hopefully be up on http://www.flax.co.uk/blog within the week. Cheers Charlie PS If anyone fancies a trip to Cambridge UK this February we're running a free 'search for bioinformatics' event http://www.ebi.ac.uk/pdbe/about/events/open-source-search-bioinformatics On Wed, Nov 11, 2015 at 12:07 AM, Susheel Kumar wrote: Thanks everyone for the suggestions. Hi Noble - Were there any thoughts made on utilizing Apache ManifoldCF while developing Authentication/Authorization plugins or anything to add there. Thanks, Susheel On Tue, Nov 10, 2015 at 5:01 AM, Alessandro Benedetti wrote: I've been working for a while with Apache ManifoldCF and Enterprise Search in Solr ( with Document level security) . Basically you can add a couple of extra fields , for example : allow_token : containing all the tokens that can view the document deny_token : containing all the tokens that are denied to view the document Apache ManifoldCF provides an integration that add an additional layer, and is able to combine different data sources permission schemes. The Authority Service endpoint will take in input the user name and return all the allow_token values and deny_token. At this point you can append the related filter queries to your queries and be sure that the user will only see what is supposed to see. It's basically an extension of the strategy you were proposing, role based. Of course keep protected your endpoints and avoid users to put custom fq, or all your document security model would be useless :) Cheers On 9 November 2015 at 21:52, Scott Stults < sstu...@opensourceconnections.com wrote: Susheel, This is perfectly fine for simple use-cases and has the benefit that the filterCache will help things stay nice and speedy. Apache ManifoldCF goes a bit further and ties back to your authentication and authorization mechanism: http://manifoldcf.apache.org/release/trunk/en_US/concepts.html#ManifoldCF+security+model k/r, Scott On Thu, Nov 5, 2015 at 2:26 PM, Susheel Kumar wrote: Hi, I have seen couple of use cases / need where we want to restrict result of search based on role of a user. For e.g. - if user role is admin, any document from the search result will be returned - if user role is manager, only documents intended for managers will be returned - if user role is worker, only documents intended for workers will be returned Typical practise is to tag the documents with the roles (using a multi-valued field) during indexing and then during search append filter query to restrict result based on roles. Wondering if there is any other better way out there and if this common requirement should be added as a Solr feature/plugin. 
The current security plugins are more towards making Solr apis/resources secure not towards securing/controlling data during search. https://cwiki.apache.org/confluence/display/solr/Authentication+and+Authorization+Plugins Please share your thoughts. Thanks, Susheel -- Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti "Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry?" William Blake - Songs of Experience -1794 England -- Charlie Hull Flax - Open Source Enterprise Search tel/fax: +44 (0)8700 118334 mobile: +44 (0)7767 825828 web: www.flax.co.uk
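A sketch of the fq-appending strategy described in this thread, assuming allow_token/deny_token fields and that the token lists come from your own authority service (e.g. ManifoldCF's authority endpoint); class and method names are invented for illustration:

import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;

public class SecurityFilterBuilder {
    // Appends security filters server-side, so end users never control the fq themselves.
    public static SolrQuery secure(SolrQuery query, List<String> allowTokens, List<String> denyTokens) {
        if (!allowTokens.isEmpty()) {
            // Token values are assumed to be simple terms; escape or quote them in real code.
            query.addFilterQuery("allow_token:(" + String.join(" OR ", allowTokens) + ")");
        }
        for (String deny : denyTokens) {
            query.addFilterQuery("-deny_token:" + deny);
        }
        return query;
    }
}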
Configure it on server
Hi, Can you help me out how I can configure it on a server? It was configured on one of our servers but I am unable to replicate it. Can you please help. Thanks, Prateek This message and the information contained herein is proprietary and confidential and subject to the Amdocs policy statement, you may review at http://www.amdocs.com/email_disclaimer.asp
Re: Configure it on server
Hi Prateek,

Your question is a little ambiguous. Could you please describe more precisely what you want to configure on the server, and what your requirement and problem are? That will make it easier to understand your problem.

With Regards
Aman Tandon

On Wed, Nov 18, 2015 at 4:29 PM, Prateek Sharma wrote:
> Hi,
>
> Can you help me out how I can configure it on a server?
> It was configured on one of our servers but I am unable to replicate it.
>
> Can you please help.
>
> Thanks,
> Prateek
>
> This message and the information contained herein is proprietary and
> confidential and subject to the Amdocs policy statement,
> you may review at http://www.amdocs.com/email_disclaimer.asp
Synchronization Problems
Hi,

I have encountered some problems with solr-5.3.1. After I initialized the SolrCloud cluster and set up BasicAuthPlugin and RuleBasedAuthorizationPlugin, something went wrong with my SolrCloud: synchronization no longer works as usual. The server logs are as follows:

master log
Invalid key PKIAuthenticationPlugin

silver log
Error while trying to recover:org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://172.16.200.35:8983/solr/t: Expected MIME type application/octet-stream but got text/html. RecoveryStrategy

What can I do next?

Thanks,
Regards
Re: Upgrading from 4.x to 5.x
Hi

You could try this:

Instead of example/, use the server/ folder (it has Jetty in it)
Start Solr using bin/solr start script instead of java -jar start.jar …
Leave your solrconfig and schema as is to keep back-compat with 4.x. You may need to remove use of 3.x classes that were deprecated in 4.x

https://cwiki.apache.org/confluence/display/solr/Major+Changes+from+Solr+4+to+Solr+5

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 18 Nov 2015, at 10:10, Daniel Miller wrote:
>
> Hi!
>
> I'm a very inexperienced user with Solr. I've been using Solr to provide
> indexes for my Dovecot IMAP server. Using version 3.x, and later 4.x, I have
> been able to do so without too much of a challenge. However, version 5.x has
> certainly changed quite a bit and I'm very uncertain how to proceed.
>
> I currently have a working 4.10.3 installation, using the "example" server
> provided with the Solr distribution package, and a schema.xml optimized for
> Dovecot. I haven't found anything on migrating from 4 to 5 - at least
> anything I actually understood. Can you point me in the right direction?
>
> --
> Daniel
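For example, the 5.x start commands from the steps above look like this (the port and solr home path are illustrative):

bin/solr start -p 8983 -s /path/to/solr/home
bin/solr status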
Re: Security Problems
As of now the admin-ui calls are not protected. The static calls are served by jetty and it bypasses the authentication mechanism completely. If the admin UI relies on some API call which is served by Solr. The other option is to revamp the framework to take care of admin UI (static content) as well. This would be cleaner solution On Wed, Nov 18, 2015 at 2:32 PM, Upayavira wrote: > Not sure I quite understand. > > You're saying that the cost for the UI is not large, but then suggesting > we protect just one resource (/admin/security-check)? > > Why couldn't we create the permission called 'admin-ui' and protect > everything under /admin/ui/ for example? Along with the root HTML link > too. > > Upayavira > > On Wed, Nov 18, 2015, at 07:46 AM, Noble Paul wrote: >> The authentication plugin is not expensive if you are talking in the >> context of admin UI. After all it is used not like 100s of requests >> per second. >> >> The simplest solution would be >> >> provide a well known permission name called "admin-ui" >> >> ensure that every admin page load makes a call to some resource say >> "/admin/security-check" >> >> Then we can just protect that . >> >> The only concern thatI have is the false sense of security it would >> give to the user >> >> But, that is a different point altogether >> >> On Wed, Nov 11, 2015 at 1:52 AM, Upayavira wrote: >> > Is the authentication plugin that expensive? >> > >> > I can help by minifying the UI down to a smaller number of CSS/JS/etc >> > files :-) >> > >> > It may be overkill, but it would also give better experience. And isn't >> > that what most applications do? Check authentication tokens on every >> > request? >> > >> > Upayavira >> > >> > On Tue, Nov 10, 2015, at 07:33 PM, Anshum Gupta wrote: >> >> The reason why we bypass that is so that we don't hit the authentication >> >> plugin for every request that comes in for static content. I think we >> >> could >> >> call the authentication plugin for that but that'd be an overkill. Better >> >> experience ? yes >> >> >> >> On Tue, Nov 10, 2015 at 11:24 AM, Upayavira wrote: >> >> >> >> > Noble, >> >> > >> >> > I get that a UI which is open source does not benefit from ACL control - >> >> > we're not giving away anything that isn't public (other than perhaps >> >> > info that could be used to identify the version of Solr, or even the >> >> > fact that it *is* solr). >> >> > >> >> > However, from a user experience point of view, requiring credentials to >> >> > see the UI would be more conventional, and therefore lead to less >> >> > confusion. Is it possible for us to protect the UI static files, only >> >> > for the sake of user experience, rather than security? >> >> > >> >> > Upayavira >> >> > >> >> > On Tue, Nov 10, 2015, at 12:01 PM, Noble Paul wrote: >> >> > > The admin UI is a bunch of static pages . We don't let the ACL control >> >> > > static content >> >> > > >> >> > > you must blacklist all the core/collection apis and it is pretty much >> >> > > useless for anyone to access the admin UI (w/o the credentials , of >> >> > > course) >> >> > > >> >> > > On Tue, Nov 10, 2015 at 7:08 AM, 马柏樟 wrote: >> >> > > > Hi, >> >> > > > >> >> > > > After I configure Authentication with Basic Authentication Plugin >> >> > > > and >> >> > Authorization with Rule-Based Authorization Plugin, How can I prevent >> >> > the >> >> > strangers from visiting my solr by browser? 
For example, if the stranger >> >> > visit the http://(my host):8983, the browser will pop up a window and >> >> > says "the server http://(my host):8983 requires a username and >> >> > password" >> >> > > >> >> > > >> >> > > >> >> > > -- >> >> > > - >> >> > > Noble Paul >> >> > >> >> >> >> >> >> >> >> -- >> >> Anshum Gupta >> >> >> >> -- >> - >> Noble Paul -- - Noble Paul
Re: search for documents where all words of field present in the query
Assuming this is the only, specific kind of search you want, what about using shingles of tokens at query time and keyword tokenizer at indexing time ? Ideally you don't tokenise at indexing time. At query time you build your shingles ( apparently you need not only adiacent token shingles, so play a little bit with it and possibly customise it) . If you give us more information, maybe we can design a better solution. Cheers On 18 November 2015 at 09:02, Ahmet Arslan wrote: > > > Hi Jim, > > I think you could do some magic with function queries. > https://cwiki.apache.org/confluence/display/solr/Function+Queries > > > Index number of unique words in the product title e.g. > title = john smith > length = 2 > > return products if the number of matching terms equals to the number of > words in the title. > > Perhaps there is a better way but something like below should work in > theory. > > termfreq(title,'john') > termfreq(title,'smith') > > fq={!frange l=0 u=0} sub(length, sum(termfreq(title,'smith'), > termfreq(title,'smith'))) > Ahmet > > > On Tuesday, November 17, 2015 4:31 PM, superjim wrote: > > > > How would I form a query where all of the words in a field must be present > in > the query (but possibly more). For example, if I have the following words > in > a text field: "John Smith" > > A query for "John" should return no results > > A query for "Smith" should return no results > > A query for "John Smith" should return that one result > > A query for "banana John Smith purple monkey dishwasher" should return that > one result > > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/search-for-documents-where-all-words-of-field-present-in-the-query-tp4240564.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti "Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry?" William Blake - Songs of Experience -1794 England
Re: Security Problems
I tried out BasicAuthPlugin today. Surprised that not admin UI is protected. But even more surprised that only /select seems to be protected for not logged in users. I can create collections and /update documents without being prompted for pw. My security.json is https://gist.github.com/janhoy/d18854c75461816fb947 -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 18. nov. 2015 kl. 14.54 skrev Noble Paul : > > As of now the admin-ui calls are not protected. The static calls are > served by jetty and it bypasses the authentication mechanism > completely. If the admin UI relies on some API call which is served by > Solr. > The other option is to revamp the framework to take care of admin UI > (static content) as well. This would be cleaner solution > > > On Wed, Nov 18, 2015 at 2:32 PM, Upayavira wrote: >> Not sure I quite understand. >> >> You're saying that the cost for the UI is not large, but then suggesting >> we protect just one resource (/admin/security-check)? >> >> Why couldn't we create the permission called 'admin-ui' and protect >> everything under /admin/ui/ for example? Along with the root HTML link >> too. >> >> Upayavira >> >> On Wed, Nov 18, 2015, at 07:46 AM, Noble Paul wrote: >>> The authentication plugin is not expensive if you are talking in the >>> context of admin UI. After all it is used not like 100s of requests >>> per second. >>> >>> The simplest solution would be >>> >>> provide a well known permission name called "admin-ui" >>> >>> ensure that every admin page load makes a call to some resource say >>> "/admin/security-check" >>> >>> Then we can just protect that . >>> >>> The only concern thatI have is the false sense of security it would >>> give to the user >>> >>> But, that is a different point altogether >>> >>> On Wed, Nov 11, 2015 at 1:52 AM, Upayavira wrote: Is the authentication plugin that expensive? I can help by minifying the UI down to a smaller number of CSS/JS/etc files :-) It may be overkill, but it would also give better experience. And isn't that what most applications do? Check authentication tokens on every request? Upayavira On Tue, Nov 10, 2015, at 07:33 PM, Anshum Gupta wrote: > The reason why we bypass that is so that we don't hit the authentication > plugin for every request that comes in for static content. I think we > could > call the authentication plugin for that but that'd be an overkill. Better > experience ? yes > > On Tue, Nov 10, 2015 at 11:24 AM, Upayavira wrote: > >> Noble, >> >> I get that a UI which is open source does not benefit from ACL control - >> we're not giving away anything that isn't public (other than perhaps >> info that could be used to identify the version of Solr, or even the >> fact that it *is* solr). >> >> However, from a user experience point of view, requiring credentials to >> see the UI would be more conventional, and therefore lead to less >> confusion. Is it possible for us to protect the UI static files, only >> for the sake of user experience, rather than security? >> >> Upayavira >> >> On Tue, Nov 10, 2015, at 12:01 PM, Noble Paul wrote: >>> The admin UI is a bunch of static pages . 
We don't let the ACL control >>> static content >>> >>> you must blacklist all the core/collection apis and it is pretty much >>> useless for anyone to access the admin UI (w/o the credentials , of >>> course) >>> >>> On Tue, Nov 10, 2015 at 7:08 AM, 马柏樟 wrote: Hi, After I configure Authentication with Basic Authentication Plugin and >> Authorization with Rule-Based Authorization Plugin, How can I prevent the >> strangers from visiting my solr by browser? For example, if the stranger >> visit the http://(my host):8983, the browser will pop up a window and >> says "the server http://(my host):8983 requires a username and >> password" >>> >>> >>> >>> -- >>> - >>> Noble Paul >> > > > > -- > Anshum Gupta >>> >>> >>> >>> -- >>> - >>> Noble Paul > > > > -- > - > Noble Paul
Re: add and then delete same document before commit,
On Solr 4.10.3 I'm noting a different (desired) behaviour:

1) add document x
2) delete document x
3) commit

Document x doesn't get indexed. The question now is: can I count on this behaviour or is it just incidental?

2014-11-05 22:21 GMT+01:00 Matteo Grolla :
> Perfectly clear,
> thanks a lot!
>
> On 05 Nov 2014, at 13:48, Jack Krupansky wrote:
>
> > Document x doesn't exist - in terms of visibility - until the commit, so
> > the delete will no-op since a query of Lucene will not "see" the
> > uncommitted new document.
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: Matteo Grolla
> > Sent: Wednesday, November 5, 2014 4:47 AM
> > To: solr-user@lucene.apache.org
> > Subject: add and then delete same document before commit,
> >
> > Can anyone tell me the behavior of solr (and if it's consistent) when I
> > do what follows:
> > 1) add document x
> > 2) delete document x
> > 3) commit
> >
> > I've tried with solr 4.5.0 and document x gets indexed
> >
> > Matteo
Re: add and then delete same document before commit,
On 11/18/2015 8:21 AM, Matteo Grolla wrote: > On Solr 4.10.3 I'm noting a different (desired) behaviour > > 1) add document x > 2) delete document x > 3) commit > > document x doesn't get indexed. If the last operation for document X is to delete it, then it will be gone after the commit and not searchable. Order of operations is critical, and it's important to realize that Solr is not transactional. With a relational database like MySQL, updates made by one client can be logically separate from updates made by another client. Solr (Lucene) does not have that logical separation. When a commit happens, no matter where the commit comes from, changes made by ALL clients before that commit will become visible. Thanks, Shawn
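A small SolrJ illustration of that ordering (the core URL is a placeholder):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class AddThenDeleteBeforeCommit {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1")) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "x");

            client.add(doc);        // 1) add document x (buffered, not yet visible)
            client.deleteById("x"); // 2) delete document x
            client.commit();        // 3) commit - the delete was the last operation for x

            long found = client.query(new SolrQuery("id:x")).getResults().getNumFound();
            System.out.println("documents found: " + found); // expected: 0
        }
    }
}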
Re: Security Problems
Everything requires explicit rules, if you wish to protect "/update/*" create a permission with name "update" and assign a role for the same. If you don't have an explicit rule, those paths are accessible by all On Wed, Nov 18, 2015 at 8:10 PM, Jan Høydahl wrote: > I tried out BasicAuthPlugin today. > Surprised that not admin UI is protected. > But even more surprised that only /select seems to be protected for not > logged in users. > I can create collections and /update documents without being prompted for pw. > > My security.json is https://gist.github.com/janhoy/d18854c75461816fb947 > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > >> 18. nov. 2015 kl. 14.54 skrev Noble Paul : >> >> As of now the admin-ui calls are not protected. The static calls are >> served by jetty and it bypasses the authentication mechanism >> completely. If the admin UI relies on some API call which is served by >> Solr. >> The other option is to revamp the framework to take care of admin UI >> (static content) as well. This would be cleaner solution >> >> >> On Wed, Nov 18, 2015 at 2:32 PM, Upayavira wrote: >>> Not sure I quite understand. >>> >>> You're saying that the cost for the UI is not large, but then suggesting >>> we protect just one resource (/admin/security-check)? >>> >>> Why couldn't we create the permission called 'admin-ui' and protect >>> everything under /admin/ui/ for example? Along with the root HTML link >>> too. >>> >>> Upayavira >>> >>> On Wed, Nov 18, 2015, at 07:46 AM, Noble Paul wrote: The authentication plugin is not expensive if you are talking in the context of admin UI. After all it is used not like 100s of requests per second. The simplest solution would be provide a well known permission name called "admin-ui" ensure that every admin page load makes a call to some resource say "/admin/security-check" Then we can just protect that . The only concern thatI have is the false sense of security it would give to the user But, that is a different point altogether On Wed, Nov 11, 2015 at 1:52 AM, Upayavira wrote: > Is the authentication plugin that expensive? > > I can help by minifying the UI down to a smaller number of CSS/JS/etc > files :-) > > It may be overkill, but it would also give better experience. And isn't > that what most applications do? Check authentication tokens on every > request? > > Upayavira > > On Tue, Nov 10, 2015, at 07:33 PM, Anshum Gupta wrote: >> The reason why we bypass that is so that we don't hit the authentication >> plugin for every request that comes in for static content. I think we >> could >> call the authentication plugin for that but that'd be an overkill. Better >> experience ? yes >> >> On Tue, Nov 10, 2015 at 11:24 AM, Upayavira wrote: >> >>> Noble, >>> >>> I get that a UI which is open source does not benefit from ACL control - >>> we're not giving away anything that isn't public (other than perhaps >>> info that could be used to identify the version of Solr, or even the >>> fact that it *is* solr). >>> >>> However, from a user experience point of view, requiring credentials to >>> see the UI would be more conventional, and therefore lead to less >>> confusion. Is it possible for us to protect the UI static files, only >>> for the sake of user experience, rather than security? >>> >>> Upayavira >>> >>> On Tue, Nov 10, 2015, at 12:01 PM, Noble Paul wrote: The admin UI is a bunch of static pages . 
We don't let the ACL control static content you must blacklist all the core/collection apis and it is pretty much useless for anyone to access the admin UI (w/o the credentials , of course) On Tue, Nov 10, 2015 at 7:08 AM, 马柏樟 wrote: > Hi, > > After I configure Authentication with Basic Authentication Plugin and >>> Authorization with Rule-Based Authorization Plugin, How can I prevent >>> the >>> strangers from visiting my solr by browser? For example, if the stranger >>> visit the http://(my host):8983, the browser will pop up a window and >>> says "the server http://(my host):8983 requires a username and >>> password" -- - Noble Paul >>> >> >> >> >> -- >> Anshum Gupta -- - Noble Paul >> >> >> >> -- >> - >> Noble Paul > -- - Noble Paul
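For example, with the Rule-Based Authorization Plugin the update paths can be locked down by POSTing something like the following to /solr/admin/authorization; the role and user names here are only examples:

{
  "set-permission": {"name": "update", "role": "indexer"},
  "set-user-role": {"solruser": ["read", "indexer"]}
}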
Re: add and then delete same document before commit,
Thanks Shawn, I'm aware that solr isn't transactional and I don't need this property: a single application is indexing. With solr 4.6 I was noting a different behaviour, with 4.10 I'm observing the desired one. I'd like to know If I can count on this behaviour to be maintained by successive solr version. 2015-11-18 16:51 GMT+01:00 Shawn Heisey : > On 11/18/2015 8:21 AM, Matteo Grolla wrote: > > On Solr 4.10.3 I'm noting a different (desired) behaviour > > > > 1) add document x > > 2) delete document x > > 3) commit > > > > document x doesn't get indexed. > > If the last operation for document X is to delete it, then it will be > gone after the commit and not searchable. > > Order of operations is critical, and it's important to realize that Solr > is not transactional. With a relational database like MySQL, updates > made by one client can be logically separate from updates made by > another client. Solr (Lucene) does not have that logical separation. > When a commit happens, no matter where the commit comes from, changes > made by ALL clients before that commit will become visible. > > Thanks, > Shawn > >
Re: CloudSolrClient Connect To Zookeeper with ACL Protected files
Thanks Alan! That works! I was looking for a programatic way to do it, but this will work for now as it doesn’t seem to be supported. - Kevin > On Nov 18, 2015, at 1:24 AM, Alan Woodward wrote: > > At the moment it seems that it's only settable via System properties - see > https://cwiki.apache.org/confluence/display/solr/ZooKeeper+Access+Control. > But it would be nice to do this programmatically as well, maybe worth opening > a JIRA ticket? > > Alan Woodward > www.flax.co.uk > > > On 17 Nov 2015, at 16:44, Kevin Lee wrote: > >> Does anyone know if it is possible to set the ACL credentials in >> CloudSolrClient needed to access a protected resource in Zookeeper? >> >> Thanks! >> >>> On Nov 13, 2015, at 1:20 PM, Kevin Lee wrote: >>> >>> Hi, >>> >>> Is there a way to use CloudSolrClient and connect to a Zookeeper instance >>> where ACL is enabled and resources/files like /live_nodes, etc are ACL >>> protected? Couldn’t find a way to set the ACL credentials. >>> >>> Thanks, >>> Kevin >> >
Re: add and then delete same document before commit,
Then that was probably a bug in 4.6. There's a lot of work that's been done since then, and distributed updates that are mixed like this are particularly "interesting". So you should be able to count on this. One other possibility: Is it possible that this was a false failure in 4.6 and a commit happened between the original insert and the delete? Just askin'... Best, Erick On Wed, Nov 18, 2015 at 8:21 AM, Matteo Grolla wrote: > Thanks Shawn, >I'm aware that solr isn't transactional and I don't need this property: > a single application is indexing. > With solr 4.6 I was noting a different behaviour, with 4.10 I'm observing > the desired one. > I'd like to know If I can count on this behaviour to be maintained by > successive solr version. > > 2015-11-18 16:51 GMT+01:00 Shawn Heisey : > >> On 11/18/2015 8:21 AM, Matteo Grolla wrote: >> > On Solr 4.10.3 I'm noting a different (desired) behaviour >> > >> > 1) add document x >> > 2) delete document x >> > 3) commit >> > >> > document x doesn't get indexed. >> >> If the last operation for document X is to delete it, then it will be >> gone after the commit and not searchable. >> >> Order of operations is critical, and it's important to realize that Solr >> is not transactional. With a relational database like MySQL, updates >> made by one client can be logically separate from updates made by >> another client. Solr (Lucene) does not have that logical separation. >> When a commit happens, no matter where the commit comes from, changes >> made by ALL clients before that commit will become visible. >> >> Thanks, >> Shawn >> >>
Re: add and then delete same document before commit,
Thanks Erik, I observed the wrong behaviour on 4.6 in a controlled environment with a very simple test case, so It's was probably a bug (or I was drunk ;-) ) Really thanks again!!! 2015-11-18 17:40 GMT+01:00 Erick Erickson : > Then that was probably a bug in 4.6. There's a lot > of work that's been done since then, and distributed > updates that are mixed like this are particularly > "interesting". > > So you should be able to count on this. > > One other possibility: Is it possible that this was a false > failure in 4.6 and a commit happened between the original > insert and the delete? Just askin'... > > Best, > Erick > > On Wed, Nov 18, 2015 at 8:21 AM, Matteo Grolla > wrote: > > Thanks Shawn, > >I'm aware that solr isn't transactional and I don't need this > property: > > a single application is indexing. > > With solr 4.6 I was noting a different behaviour, with 4.10 I'm observing > > the desired one. > > I'd like to know If I can count on this behaviour to be maintained by > > successive solr version. > > > > 2015-11-18 16:51 GMT+01:00 Shawn Heisey : > > > >> On 11/18/2015 8:21 AM, Matteo Grolla wrote: > >> > On Solr 4.10.3 I'm noting a different (desired) behaviour > >> > > >> > 1) add document x > >> > 2) delete document x > >> > 3) commit > >> > > >> > document x doesn't get indexed. > >> > >> If the last operation for document X is to delete it, then it will be > >> gone after the commit and not searchable. > >> > >> Order of operations is critical, and it's important to realize that Solr > >> is not transactional. With a relational database like MySQL, updates > >> made by one client can be logically separate from updates made by > >> another client. Solr (Lucene) does not have that logical separation. > >> When a commit happens, no matter where the commit comes from, changes > >> made by ALL clients before that commit will become visible. > >> > >> Thanks, > >> Shawn > >> > >> >
Re: Error in log after upgrading Solr
On 11/17/2015 12:42 AM, Shawn Heisey wrote: > I have upgraded from 5.2.1 to a 5.3.2 snapshot -- the lucene_solr_5_3 > branch plus the patch for SOLR-6188. > > I'm getting errors in my log every time I make a commit on a core. > > 2015-11-16 20:28:11.554 ERROR > (searcherExecutor-82-thread-1-processing-x:sparkinclive) [ > x:sparkinclive] o.a.s.c.SolrCore Previous SolrRequestInfo was not > closed! > req=waitSearcher=true&commit=true&wt=javabin&version=2&softCommit=true > 2015-11-16 20:28:11.554 ERROR > (searcherExecutor-82-thread-1-processing-x:sparkinclive) [ > x:sparkinclive] o.a.s.c.SolrCore prev == info : false > 2015-11-16 20:28:11.554 INFO > (searcherExecutor-82-thread-1-processing-x:sparkinclive) [ > x:sparkinclive] o.a.s.c.S.Request [sparkinclive] webapp=null path=null > params={sort=post_date+desc&event=newSearcher&q=*:*&distrib=false&qt=/lbcheck&rows=1} > hits=459866 status=0 QTime=0 > 2015-11-16 20:28:11.554 INFO > (searcherExecutor-82-thread-1-processing-x:sparkinclive) [ > x:sparkinclive] o.a.s.c.SolrCore QuerySenderListener done. These errors persist after a complete index rebuild. I haven't done any *extensive* checks, but so far the index seems to work correctly. Do I need to be concerned about this? Thanks, Shawn
unsubscribe me.
please unsubscribe me. Regards, YP
Re: unsubscribe me.
You should probably send an email to solr-user-unsubscr...@lucene.apache.org Reference links http://lucene.apache.org/solr/resources.html#community https://wiki.apache.org/solr/Unsubscribing%20from%20mailing%20lists On Wed, Nov 18, 2015 at 1:04 PM, Pramod wrote: > please unsubscribe me. > > Regards, > YP >
Re: Limiting number of parallel queries per user
Just an update: my problem turned out to be that in the search-component, I decremented the entry for the user running a query in the first call to finishStage, and didn't realize that most of the query processing and time occurs only in later stages. Because the entry was decremented so quickly, the logs made it seem like Solr is running the request serially, when it was just that I was sending queries more slowly than my SearchComponent was processing them. The search component now works and does limit the amount of queries a user runs in parallel (which is a benefit in our specific case). -- View this message in context: http://lucene.472066.n3.nabble.com/Limiting-number-of-parallel-queries-per-user-tp4240566p4240851.html Sent from the Solr - User mailing list archive at Nabble.com.
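For anyone hitting the same thing, a stripped-down sketch of the per-user limiting idea itself (not the poster's actual component); the Solr-specific part - which SearchComponent hook acquires a slot and which one releases it - is exactly where the mistake described above crept in:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

public class PerUserQueryLimiter {
    private final ConcurrentHashMap<String, Semaphore> slots = new ConcurrentHashMap<>();
    private final int maxParallel;

    public PerUserQueryLimiter(int maxParallel) {
        this.maxParallel = maxParallel;
    }

    /** Call when a user's query really starts; returns false if the user is over the limit. */
    public boolean acquire(String user) {
        return slots.computeIfAbsent(user, u -> new Semaphore(maxParallel)).tryAcquire();
    }

    /** Call only once the query has fully finished (the last stage), not once per stage. */
    public void release(String user) {
        Semaphore s = slots.get(user);
        if (s != null) {
            s.release();
        }
    }
}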
Implementing security.json is breaking ADDREPLICA
Implementing security.json is breaking ADDREPLICA I have been able to reproduce this issue with minimal changes from an out-of-the-box Zookeeper (3.4.6) and Solr (5.3.1): loading configsets/basic_configs/conf into Zookeeper, creating the security.json listed below, creating two nodes (one with a core named xmpl and one without any core)- I can provide details if helpful. The security.json is as follows: { "authentication":{ "class":"solr.BasicAuthPlugin", "credentials":{ "solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c=", "solruser":"VgZX1TAMNHT2IJikoGdKtxQdXc+MbNwfqzf89YqcLEE= 37pPWQ9v4gciIKHuTmFmN0Rv66rnlMOFEWfEy9qjJfY="}, "":{"v":9}}, "authorization":{ "class":"solr.RuleBasedAuthorizationPlugin", "user-role":{ "solr":[ "admin", "read", "xmpladmin", "xmplgen", "xmplsel"], "solruser":[ "read", "xmplgen", "xmplsel"]}, "permissions":[ { "name":"security-edit", "role":"admin"}, { "name":"xmpl_admin", "collection":"xmpl", "path":"/admin/*", "role":"xmpladmin"}, { "name":"xmpl_sel", "collection":"xmpl", "path":"/select/*", "role":null}, { "name":"xmpl_gen", "collection":"xmpl", "path":"/*", "role":"xmplgen"}], "":{"v":42}}} When I then execute admin/collections?action=ADDREPLICA, I get errors such as the following in the solr.log of the node which was created without a core. INFO - 2015-11-17 21:03:54.157; [c:xmpl s:shard1 r:core_node2 x:xmpl_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Starting Replication Recovery. INFO - 2015-11-17 21:03:54.158; [c:xmpl s:shard1 r:core_node2 x:xmpl_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Begin buffering updates. INFO - 2015-11-17 21:03:54.158; [c:xmpl s:shard1 r:core_node2 x:xmpl_shard1_replica1] org.apache.solr.update.UpdateLog; Starting to buffer updates. FSUpdateLog{state=ACTIVE, tlog=null} INFO - 2015-11-17 21:03:54.159; [c:xmpl s:shard1 r:core_node2 x:xmpl_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Attempting to replicate from http://{IP-address-redacted}:4565/solr/xmpl/. ERROR - 2015-11-17 21:03:54.166; [c:xmpl s:shard1 r:core_node2 x:xmpl_shard1_replica1] org.apache.solr.common.SolrException; Error while trying to recover:org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://{IP-address-redacted}:4565/solr/xmpl: Expected mime type application/octet-stream but got text/html. Error 401 Unauthorized request, Response code: 401 HTTP ERROR 401 Problem accessing /solr/xmpl/update. 
Reason: Unauthorized request, Response code: 401Powered by Jetty:// at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:528) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:234) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:226) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:152) at org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:207) at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:147) at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227) INFO - 2015-11-17 21:03:54.166; [c:xmpl s:shard1 r:core_node2 x:xmpl_shard1_replica1] org.apache.solr.update.UpdateLog; Dropping buffered updates FSUpdateLog{state=BUFFERING, tlog=null} ERROR - 2015-11-17 21:03:54.166; [c:xmpl s:shard1 r:core_node2 x:xmpl_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Recovery failed - trying again... (2) INFO - 2015-11-17 21:03:54.166; [c:xmpl s:shard1 r:core_node2 x:xmpl_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Wait 8.0 seconds before trying to recover again (3) And (after modifying Logging Levels), the solr.log of the node which already had a core gets errors such as the following: 2015-11-17 21:03:50.743 DEBUG (qtp59559151-87) [ ] o.e.j.s.Server REQUEST GET /solr/tpl/cloud.html on HttpChannelOverHttp@37cf94f4{r=1,c=false,a=DISPATCHED,uri=/solr/tpl/cloud.html} 2015-11-17 21:03:50.744 DEBUG (qtp59559151-87) [ ] o.e.j.s.Server RESPONSE /solr/tpl/cloud.html 200 handled=true 2015-11-17 21:03:50.802 DEBUG (qtp59559151-91) [ ] o.e.j.s.Server REQUEST GET /solr/zookeeper on HttpChannelOverHttp@37cf94f4{r=2,c=false,a=DISPATCHED,uri=/solr/zookeeper} 2015-11-17 21:03:50.803 INFO (qtp59559151-91) [ ] o.a.s.s.HttpSolrCall userPrincipal: [null] type: [UNKNOWN], collections: [], Path: [/zookeeper] 2015-11-17 21:03:50.831 DEBUG (qtp5955
Re: Security Problems
I'm very happy for the admin UI to be served another way - i.e. not direct from Jetty, if that makes the task of securing it easier. Perhaps a request handler specifically for UI resources which would make it possible to secure it all in a more straight-forward way? Upayavira On Wed, Nov 18, 2015, at 01:54 PM, Noble Paul wrote: > As of now the admin-ui calls are not protected. The static calls are > served by jetty and it bypasses the authentication mechanism > completely. If the admin UI relies on some API call which is served by > Solr. > The other option is to revamp the framework to take care of admin UI > (static content) as well. This would be cleaner solution > > > On Wed, Nov 18, 2015 at 2:32 PM, Upayavira wrote: > > Not sure I quite understand. > > > > You're saying that the cost for the UI is not large, but then suggesting > > we protect just one resource (/admin/security-check)? > > > > Why couldn't we create the permission called 'admin-ui' and protect > > everything under /admin/ui/ for example? Along with the root HTML link > > too. > > > > Upayavira > > > > On Wed, Nov 18, 2015, at 07:46 AM, Noble Paul wrote: > >> The authentication plugin is not expensive if you are talking in the > >> context of admin UI. After all it is used not like 100s of requests > >> per second. > >> > >> The simplest solution would be > >> > >> provide a well known permission name called "admin-ui" > >> > >> ensure that every admin page load makes a call to some resource say > >> "/admin/security-check" > >> > >> Then we can just protect that . > >> > >> The only concern thatI have is the false sense of security it would > >> give to the user > >> > >> But, that is a different point altogether > >> > >> On Wed, Nov 11, 2015 at 1:52 AM, Upayavira wrote: > >> > Is the authentication plugin that expensive? > >> > > >> > I can help by minifying the UI down to a smaller number of CSS/JS/etc > >> > files :-) > >> > > >> > It may be overkill, but it would also give better experience. And isn't > >> > that what most applications do? Check authentication tokens on every > >> > request? > >> > > >> > Upayavira > >> > > >> > On Tue, Nov 10, 2015, at 07:33 PM, Anshum Gupta wrote: > >> >> The reason why we bypass that is so that we don't hit the authentication > >> >> plugin for every request that comes in for static content. I think we > >> >> could > >> >> call the authentication plugin for that but that'd be an overkill. > >> >> Better > >> >> experience ? yes > >> >> > >> >> On Tue, Nov 10, 2015 at 11:24 AM, Upayavira wrote: > >> >> > >> >> > Noble, > >> >> > > >> >> > I get that a UI which is open source does not benefit from ACL > >> >> > control - > >> >> > we're not giving away anything that isn't public (other than perhaps > >> >> > info that could be used to identify the version of Solr, or even the > >> >> > fact that it *is* solr). > >> >> > > >> >> > However, from a user experience point of view, requiring credentials > >> >> > to > >> >> > see the UI would be more conventional, and therefore lead to less > >> >> > confusion. Is it possible for us to protect the UI static files, only > >> >> > for the sake of user experience, rather than security? > >> >> > > >> >> > Upayavira > >> >> > > >> >> > On Tue, Nov 10, 2015, at 12:01 PM, Noble Paul wrote: > >> >> > > The admin UI is a bunch of static pages . 
We don't let the ACL > >> >> > > control > >> >> > > static content > >> >> > > > >> >> > > you must blacklist all the core/collection apis and it is pretty > >> >> > > much > >> >> > > useless for anyone to access the admin UI (w/o the credentials , of > >> >> > > course) > >> >> > > > >> >> > > On Tue, Nov 10, 2015 at 7:08 AM, 马柏樟 wrote: > >> >> > > > Hi, > >> >> > > > > >> >> > > > After I configure Authentication with Basic Authentication Plugin > >> >> > > > and > >> >> > Authorization with Rule-Based Authorization Plugin, How can I prevent > >> >> > the > >> >> > strangers from visiting my solr by browser? For example, if the > >> >> > stranger > >> >> > visit the http://(my host):8983, the browser will pop up a window and > >> >> > says "the server http://(my host):8983 requires a username and > >> >> > password" > >> >> > > > >> >> > > > >> >> > > > >> >> > > -- > >> >> > > - > >> >> > > Noble Paul > >> >> > > >> >> > >> >> > >> >> > >> >> -- > >> >> Anshum Gupta > >> > >> > >> > >> -- > >> - > >> Noble Paul > > > > -- > - > Noble Paul
Boost non stemmed keywords (KStem filter)
Hi, I am using KStem factory for stemming. This stemmer converts 'france to french', 'chinese to china' etc.. I am good with this stemming but I am trying to boost the results that contain the original term compared to the stemmed terms. Is this possible? Thanks, Learner -- View this message in context: http://lucene.472066.n3.nabble.com/Boost-non-stemmed-keywords-KStem-filter-tp4240880.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Error in log after upgrading Solr
: > I'm getting errors in my log every time I make a commit on a core. Do you have any custom plugins? what is the definition of the /lbcheck handler? : > 2015-11-16 20:28:11.554 ERROR : > (searcherExecutor-82-thread-1-processing-x:sparkinclive) [ : > x:sparkinclive] o.a.s.c.SolrCore Previous SolrRequestInfo was not : > closed! : > req=waitSearcher=true&commit=true&wt=javabin&version=2&softCommit=true : > 2015-11-16 20:28:11.554 ERROR : > (searcherExecutor-82-thread-1-processing-x:sparkinclive) [ : > x:sparkinclive] o.a.s.c.SolrCore prev == info : false Those log messages ("Previous SolrRequestInfo was not..." are a sanity check designed to help catch plugins that aren't cleaning up the thread local state tracked in SolrRequestInfo (see SolrRequestInfo.setRequestInfo). speculating here Perhaps waitSearcher=true combined with QuerySenderListener is an exception that's triggering the ERROR in a totally expected situation? ie: the thread is processing the request that triggered the "commit" and in that thread QuerySenderListener fires off some local solr requests? Perhaps LocalSolrQueryRequest should be stashing/restoring SolrRequestInfo state? or perhaps SolrCore.execute and/or SolrRequestInfo should do this when it sees a LocalSolrQueryRequest ? Shawn: If my speculations are correct, this should be fairly trivial to reproduce with a small generic config -- can you file an jira w/ steps to reproduce? -Hoss http://www.lucidworks.com/
RE: Boost non stemmed keywords (KStem filter)
Hi - easiest approach is to use KeywordRepeatFilter and RemoveDuplicatesTokenFilter. This creates a slightly higher IDF for unstemmed words which might be just enough in your case. We found it not to be enough, so we also attach payloads to signify stemmed words amongst others. This allows you to decrease score for stemmed words at query time via your similarity impl. M. -Original message- > From:bbarani > Sent: Wednesday 18th November 2015 22:07 > To: solr-user@lucene.apache.org > Subject: Boost non stemmed keywords (KStem filter) > > Hi, > > I am using KStem factory for stemming. This stemmer converts 'france to > french', 'chinese to china' etc.. I am good with this stemming but I am > trying to boost the results that contain the original term compared to the > stemmed terms. Is this possible? > > Thanks, > Learner > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Boost-non-stemmed-keywords-KStem-filter-tp4240880.html > Sent from the Solr - User mailing list archive at Nabble.com. >
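A sketch of such an analysis chain in schema.xml; the field type name is arbitrary, and the key point is KeywordRepeatFilterFactory before the stemmer and RemoveDuplicatesTokenFilterFactory after it:

<fieldType name="text_kstem_keep_original" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit each token twice: once marked as a keyword (left unstemmed), once for stemming -->
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
    <!-- if stemming didn't change the token, drop the duplicate -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>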
Re: CloudSolrCloud - Commit returns but not all data is visible (occasionally)
If you see "WARNING: too many searchers on deck" or something like that in the logs, that could cause this behavior and would indicate you are opening searchers faster than Solr can keep up. - Mark On Tue, Nov 17, 2015 at 2:05 PM Erick Erickson wrote: > That's what was behind my earlier comment about perhaps > the call is timing out, thus the commit call is returning > _before_ the actual searcher is opened. But the call > coming back is not a return from commit, but from Jetty > even though the commit hasn't really returned. > > Just a guess however. > > Best, > Erick > > On Tue, Nov 17, 2015 at 12:11 AM, adfel70 wrote: > > Thanks Eric, > > I'll try to play with the autowarm config. > > > > But I have a more direct question - why does the commit return without > > waiting till the searchers are fully refreshed? > > > > Could it be that the parameter waitSearcher=true doesn't really work? > > or maybe I don't understand something here... > > > > Thanks, > > > > > > > > > > -- > > View this message in context: > http://lucene.472066.n3.nabble.com/CloudSolrCloud-Commit-returns-but-not-all-data-is-visible-occasionally-tp4240368p4240518.html > > Sent from the Solr - User mailing list archive at Nabble.com. > -- - Mark about.me/markrmiller
Re: Error in log after upgrading Solr
On 11/18/2015 2:20 PM, Chris Hostetter wrote:
> : > I'm getting errors in my log every time I make a commit on a core.
>
> Do you have any custom plugins?
> what is the definition of the /lbcheck handler?

I have one simple update processor in use that I wrote myself, and we have a third-party plugin that we are using. One of my indexes does not use either of these, but has them in the configuration so they can be used later, so I removed those components from the config on those cores. The problem still happened on commits in those cores.

Then I commented the firstSearcher and newSearcher listeners from all the configs on the server, and the error stopped appearing in the log, even on cores still using the custom plugins. This is the config I removed:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">1</str>
      <str name="sort">post_date desc</str>
      <str name="qt">/lbcheck</str>
    </lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">1</str>
      <str name="sort">post_date desc</str>
      <str name="qt">/lbcheck</str>
    </lst>
  </arr>
</listener>

> Shawn: If my speculations are correct, this should be fairly trivial to
> reproduce with a small generic config -- can you file an jira w/ steps to
> reproduce?

I do see the same error message in a failure email from Uwe's Jenkins server a few weeks ago. I'll see if I can put together a minimal configuration to reproduce.

Thanks,
Shawn
RE: CloudSolrCloud - Commit returns but not all data is visible (occasionally)
Hi - i sometimes see the too many searcher warning to since some 5.x version. The warning cloud has no autoCommit and there is only a single process ever sending a commit, only once every 10-15 minutes orso. The cores are quite small, commits finish quickly and new docs are quickly searchable. I've ignored the warning so far, since it makes no sense and the problem is not really there. -Original message- > From:Mark Miller > Sent: Wednesday 18th November 2015 23:24 > To: solr-user > Subject: Re: CloudSolrCloud - Commit returns but not all data is visible > (occasionally) > > If you see "WARNING: too many searchers on deck" or something like that in > the logs, that could cause this behavior and would indicate you are opening > searchers faster than Solr can keep up. > > - Mark > > On Tue, Nov 17, 2015 at 2:05 PM Erick Erickson > wrote: > > > That's what was behind my earlier comment about perhaps > > the call is timing out, thus the commit call is returning > > _before_ the actual searcher is opened. But the call > > coming back is not a return from commit, but from Jetty > > even though the commit hasn't really returned. > > > > Just a guess however. > > > > Best, > > Erick > > > > On Tue, Nov 17, 2015 at 12:11 AM, adfel70 wrote: > > > Thanks Eric, > > > I'll try to play with the autowarm config. > > > > > > But I have a more direct question - why does the commit return without > > > waiting till the searchers are fully refreshed? > > > > > > Could it be that the parameter waitSearcher=true doesn't really work? > > > or maybe I don't understand something here... > > > > > > Thanks, > > > > > > > > > > > > > > > -- > > > View this message in context: > > http://lucene.472066.n3.nabble.com/CloudSolrCloud-Commit-returns-but-not-all-data-is-visible-occasionally-tp4240368p4240518.html > > > Sent from the Solr - User mailing list archive at Nabble.com. > > > -- > - Mark > about.me/markrmiller >
adding document with nested document require to set id
I'm trying to add a document with nested objects but don't want the id to be generated automatically. When I add a document without nesting it's OK, but if I add _childDocuments_ there is an error: [doc=null] missing required field: id

--
View this message in context: http://lucene.472066.n3.nabble.com/adding-document-with-nested-document-require-to-set-id-tp4240908.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: CloudSolrCloud - Commit returns but not all data is visible (occasionally)
On 11/17/2015 1:11 AM, adfel70 wrote:
> Could it be that the parameter waitSearcher=true doesn't really work?
> or maybe I don't understand something here...

I am just guessing with this, but I think this is likely how it works:

I believe that if maxWarmingSearchers is exceeded, a commit call will return more quickly than usual, and Solr will not attempt to open a new searcher with that commit, because the threshold has been exceeded. Basically, when there are too many searchers warming at once, new ones cannot be created, which means that Solr cannot make the visibility guarantees it usually makes.

CloudSolrServer should be identical in function to CloudSolrClient, so I don't think you have to worry about it being deprecated right now. You'll want to switch before 6.0.

Thanks,
Shawn
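The threshold referred to above is the maxWarmingSearchers setting in solrconfig.xml, e.g.:

<!-- With this many searchers already warming, a commit that would open yet
     another searcher logs an error instead of piling up more warming work. -->
<maxWarmingSearchers>2</maxWarmingSearchers>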
Re: adding document with nested document require to set id
If you have id listed as a required field (which I believe you need anyway), what do you actually get when you add a document without nesting? What does the document echo back? Because if you are getting a document back without an id field when it is declared required in the schema, that would be a problem of its own.

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/

On 18 November 2015 at 17:35, CrazyDiamond wrote:
> i'm trying to add document with the nested objects but don't want id to be
> generated automatically.
> When i add document without nesting it's ok. But if i add _childDocuments_
> there is an error [doc=null] missing required field: id
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/adding-document-with-nested-document-require-to-set-id-tp4240908.html
> Sent from the Solr - User mailing list archive at Nabble.com.
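For what it's worth, one common workaround (not necessarily what the original poster wants) is to stop relying on automatic id generation for nested adds and set explicit ids on both the parent and every child document, since id-generating update processors may not be applied to _childDocuments_. Here is a minimal SolrJ sketch of that approach; the core URL, field names, and id values are placeholders and the API shown is the 5.x-era HttpSolrClient constructor.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class NestedDocWithIds {
    public static void main(String[] args) throws Exception {
        // Placeholder core URL and field names.
        try (SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore")) {
            SolrInputDocument parent = new SolrInputDocument();
            parent.addField("id", "parent-1");            // explicit id on the parent
            parent.addField("type_s", "parent");

            SolrInputDocument child = new SolrInputDocument();
            child.addField("id", "parent-1-child-1");     // each child gets its own explicit id
            child.addField("type_s", "child");

            parent.addChildDocument(child);               // sent as _childDocuments_ on the wire
            client.add(parent);
            client.commit();
        }
    }
}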
Problem with Synchronization
Hi,

I encountered some problems with solr-5.3.1. After I initialized SolrCloud and set up the BasicAuthPlugin and RuleBasedAuthorizationPlugin, something went wrong with my cloud: synchronization no longer works as usual. The server logs are as follows:

master log:
Invalid key (PKIAuthenticationPlugin)

silver log:
Error while trying to recover: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://172.16.200.35:8983/solr/t: Expected MIME type application/octet-stream but got text/html. (RecoveryStrategy)

What can I do next?

Thanks,
Regards
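The "Expected MIME type application/octet-stream but got text/html" part usually means the receiving node answered with an HTML error page (typically a 401) instead of a javabin response. The "Invalid key" from PKIAuthenticationPlugin concerns inter-node authentication and cannot be fixed from client code, but for completeness, here is a minimal SolrJ sketch (5.3+ API; the ZooKeeper address, collection name, and credentials are placeholders) of the client-side piece: supplying Basic Auth credentials on each request once BasicAuthPlugin is enabled.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;

public class BasicAuthQuery {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble, collection name, and credentials.
        try (CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181")) {
            QueryRequest req = new QueryRequest(new SolrQuery("*:*"));
            // Without credentials a secured node replies with an HTML 401 page,
            // which SolrJ reports as a MIME type mismatch like the one above.
            req.setBasicAuthCredentials("solruser", "password");
            QueryResponse rsp = req.process(client, "t");
            System.out.println("numFound: " + rsp.getResults().getNumFound());
        }
    }
}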
Re: CloudSolrCloud - Commit returns but not all data is visible (occasionally)
bq: I sometimes see the "too many searchers" warning too, since some 5.x
version. The cloud that logs the warning has no autoCommit and there is only
a single process ever sending a commit, only once every 10-15 minutes or so

This is very surprising unless your autowarming is taking 10-15 minutes, which is almost assuredly impossible given your description. So my theory is that "something" is sending commits far more often than you think. I'd take a look at the Solr logs; you should see messages when commits happen. The logs should also tell you how long autowarming takes. For that matter, so will the plugins/stats page.

Something's definitely fishy; I cannot reconcile occasional "too many searchers" messages with commits that infrequent.

On Wed, Nov 18, 2015 at 3:11 PM, Shawn Heisey wrote:
> On 11/17/2015 1:11 AM, adfel70 wrote:
>>
>> Could it be that the parameter waitSearcher=true doesn't really work?
>> or maybe I don't understand something here...
>
> I am just guessing with this, but I think this is likely how it works:
>
> I believe that if maxWarmingSearchers is exceeded, a commit call will return
> more quickly than usual, and Solr will not attempt to open a new searcher
> with that commit, because the threshold has been exceeded.
>
> Basically, when there are too many searchers warming at once, new ones
> cannot be created, which means that Solr cannot make the visibility
> guarantees it usually makes.
>
> CloudSolrServer should be identical in function to CloudSolrClient, I don't
> think you have to worry about it being deprecated for right now. You'll
> want to switch before 6.0.
>
> Thanks,
> Shawn
Shards and Replicas
I am looking for some good articles/guidance on how to determine the number of shards and replicas for an index.

Thanks
Re: Shards and Replicas
On 11/18/2015 9:02 PM, Troy Edwards wrote:
> I am looking for some good articles/guidance on how to determine number of
> shards and replicas for an index?

The long version:
https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

The short version: There's no quick formula for figuring out how much hardware you need and how to divide your index onto that hardware. There are too many variables involved. Building a prototype (or ideally a full-scale environment) is the only reliable way to figure it out.

Those of us who have been doing this for a long time can make educated guesses if we are presented with the right pieces of information, but frequently users will not know some of that information until the system is put into production and actually handles real queries.

The only general advice I have is this: It's probably going to cost more than you think it will.

Thanks,
Shawn
Re: Shards and Replicas
1. No more than 100 million documents per shard.

2. Number of replicas to meet your query load and to allow for the possibility that a replica might go down. 2 or 3, maybe 4.

3. Do a proof-of-concept implementation to validate how many documents per shard will still query well. But be aware that a query against the sharded version will be slower than against a single-shard implementation.

-- Jack Krupansky

On Wed, Nov 18, 2015 at 11:02 PM, Troy Edwards wrote:
> I am looking for some good articles/guidance on how to determine number of
> shards and replicas for an index?
>
> Thanks
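In case it helps to see where those two numbers end up, here is a minimal SolrJ sketch (5.x-era setter API; the ZooKeeper address, collection name, config name, and counts are placeholders to be replaced with whatever the proof-of-concept testing settles on) that creates a collection with an explicit shard count and replication factor.

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateSizedCollection {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble and names.
        try (CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181")) {
            CollectionAdminRequest.Create create = new CollectionAdminRequest.Create();
            create.setCollectionName("poc_collection");
            create.setConfigName("basic_configs");
            create.setNumShards(2);         // shard count is fixed at creation (short of SPLITSHARD)
            create.setReplicationFactor(3); // replicas can also be added later with ADDREPLICA
            create.process(client);
        }
    }
}

In 6.x and later SolrJ the same request is built with the CollectionAdminRequest.createCollection(name, configName, numShards, numReplicas) factory method, if I remember correctly.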
Re: Implementing security.json is breaking ADDREPLICA
Hi Craig,

Just to be sure that you're using the feature as it should be used, can you outline what it is that you're trying to do here? There are a few things that aren't clear to me, e.g. I see permissions for the /admin handler for a particular collection. What kind of permissions are you trying to set up?

Solr uses its internal PKI-based mechanism for inter-shard communication, so you shouldn't really be hitting this. Can you check your logs and tell me if there are any other exceptions while bringing the node up, etc.? Something from PKI itself.

About restricting the UI, there's another thread in parallel that's been discussing exactly that. The thing with the current UI implementation is that it bypasses all of this, primarily because most of that content is static. I am not saying we should be able to put it behind the authentication layer, just that it's not currently supported through this plugin.

On Wed, Nov 18, 2015 at 11:20 AM, Oakley, Craig (NIH/NLM/NCBI) [C] <craig.oak...@nih.gov> wrote:
> Implementing security.json is breaking ADDREPLICA
>
> I have been able to reproduce this issue with minimal changes from an
> out-of-the-box Zookeeper (3.4.6) and Solr (5.3.1): loading
> configsets/basic_configs/conf into Zookeeper, creating the security.json
> listed below, and creating two nodes (one with a core named xmpl and one
> without any core) - I can provide details if helpful.
>
> The security.json is as follows:
>
> {
>   "authentication":{
>     "class":"solr.BasicAuthPlugin",
>     "credentials":{
>       "solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c=",
>       "solruser":"VgZX1TAMNHT2IJikoGdKtxQdXc+MbNwfqzf89YqcLEE= 37pPWQ9v4gciIKHuTmFmN0Rv66rnlMOFEWfEy9qjJfY="},
>     "":{"v":9}},
>   "authorization":{
>     "class":"solr.RuleBasedAuthorizationPlugin",
>     "user-role":{
>       "solr":["admin","read","xmpladmin","xmplgen","xmplsel"],
>       "solruser":["read","xmplgen","xmplsel"]},
>     "permissions":[
>       {"name":"security-edit",
>        "role":"admin"},
>       {"name":"xmpl_admin",
>        "collection":"xmpl",
>        "path":"/admin/*",
>        "role":"xmpladmin"},
>       {"name":"xmpl_sel",
>        "collection":"xmpl",
>        "path":"/select/*",
>        "role":null},
>       {"name":"xmpl_gen",
>        "collection":"xmpl",
>        "path":"/*",
>        "role":"xmplgen"}],
>     "":{"v":42}}}
>
> When I then execute admin/collections?action=ADDREPLICA, I get errors such
> as the following in the solr.log of the node which was created without a
> core:
>
> INFO  - 2015-11-17 21:03:54.157; [c:xmpl s:shard1 r:core_node2 x:xmpl_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Starting Replication Recovery.
> INFO  - 2015-11-17 21:03:54.158; [c:xmpl s:shard1 r:core_node2 x:xmpl_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Begin buffering updates.
> INFO  - 2015-11-17 21:03:54.158; [c:xmpl s:shard1 r:core_node2 x:xmpl_shard1_replica1] org.apache.solr.update.UpdateLog; Starting to buffer updates. FSUpdateLog{state=ACTIVE, tlog=null}
> INFO  - 2015-11-17 21:03:54.159; [c:xmpl s:shard1 r:core_node2 x:xmpl_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Attempting to replicate from http://{IP-address-redacted}:4565/solr/xmpl/.
> ERROR - 2015-11-17 21:03:54.166; [c:xmpl s:shard1 r:core_node2 x:xmpl_shard1_replica1] org.apache.solr.common.SolrException; Error while trying to recover: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://{IP-address-redacted}:4565/solr/xmpl: Expected mime type application/octet-stream but got text/html.
>
>   Error 401 Unauthorized request, Response code: 401
>
>   HTTP ERROR 401
>   Problem accessing /solr/xmpl/update. Reason:
>     Unauthorized request, Response code: 401
>   Powered by Jetty://
>
>         at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:528)
>         at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:234)
>         at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:226)
>         at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
>         at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:152)
>         at org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:207)
>         at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:147)
>         at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
>         at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
>
> INFO  - 2015-11-17 21:03:54.166; [c:xmpl s:shard1 r:core_node2 x:xmpl_shard1_replica1] org.apache.