Re: Performance testing on SOLR cloud

2015-11-18 Thread Emir Arnautovic

Hi Aswath,
It is not common to test only QPS unless the index is static most of the 
time. Usually you have to test and tune the worst-case scenario: the maximum 
expected indexing rate plus queries. You can get more QPS by reducing query 
latency or by increasing the number of replicas. You manage latency by 
tuning Solr/JVM/queries and/or by sharding the index. First tune the index 
without replication, and once you are sure it is the best a single index can 
provide, introduce replicas to achieve the required throughput.
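[Editor's note: the throughput advice above can be exercised with any load tool. The following is a minimal sketch only, not a real benchmark; the query function is a stub standing in for an HTTP request against Solr's /select handler.]

```python
import time
from concurrent.futures import ThreadPoolExecutor

def measure_qps(query_fn, num_queries=200, concurrency=8):
    """Run query_fn num_queries times across worker threads, return QPS."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Drain the iterator so all queries actually complete
        list(pool.map(lambda _: query_fn(), range(num_queries)))
    elapsed = time.perf_counter() - start
    return num_queries / elapsed

# Stand-in for a real Solr query; in a real test replace this with an
# HTTP GET against http://host:8983/solr/<collection>/select?q=...
def fake_query():
    time.sleep(0.005)  # simulate ~5 ms query latency

qps = measure_qps(fake_query)
print(round(qps))
```

Running the same measurement while a separate process indexes at the maximum expected rate gives the worst-case QPS Emir describes.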


The hard part is tuning Solr. You can do it without specialized tools, but 
tools help a lot. One such tool is Sematext's SPM 
(https://sematext.com/spm/index.html), where you can see all the 
Solr/JVM/OS metrics needed to tune Solr. It also provides a QPS graph.


With an index your size, unless documents are really big, you can start 
without sharding. After tuning, if you are not satisfied with query latency, 
you can try splitting into two shards.


Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On 17.11.2015 23:45, Aswath Srinivasan (TMS) wrote:

Hi fellow developers,

Please share your experience on how you did performance testing on SOLR. What 
I'm trying to do is have SOLR cloud on 3 Linux servers with 16 GB RAM and index 
a total of 2.2 million documents. Yet to decide how many shards and replicas to 
have (any hint on this is welcome too; basically 'only' performance testing, so 
suggest the number of shards and replicas if you can). Ultimately, I'm trying 
to find the QPS that this SOLR cloud setup can handle.

To summarize,

1.   Find the QPS that my Solr cloud setup can support

2.   Using 5.3.1 version with external zookeeper

3.   3 Linux servers with 16 GB RAM and index a total of 2.2 million documents

4.   Yet to decide number of shards and replicas

5.   Not using any custom search application (performance testing for SOLR and 
not for Search portal)

Thank you





Re: Security Problems

2015-11-18 Thread Upayavira
Not sure I quite understand.

You're saying that the cost for the UI is not large, but then suggesting
we protect just one resource (/admin/security-check)?

Why couldn't we create the permission called 'admin-ui' and protect
everything under /admin/ui/ for example? Along with the root HTML link
too.

Upayavira

On Wed, Nov 18, 2015, at 07:46 AM, Noble Paul wrote:
> The authentication plugin is not expensive if you are talking in the
> context of admin UI. After all it is used not like 100s of requests
> per second.
> 
> The simplest solution would be
> 
> provide a well known permission name called "admin-ui"
> 
> ensure that every admin page load makes a call to some resource say
> "/admin/security-check"
> 
> Then we can just protect that .
> 
> The only concern that I have is the false sense of security it would
> give to the user
> 
> But, that is a different point altogether
> 
> On Wed, Nov 11, 2015 at 1:52 AM, Upayavira  wrote:
> > Is the authentication plugin that expensive?
> >
> > I can help by minifying the UI down to a smaller number of CSS/JS/etc
> > files :-)
> >
> > It may be overkill, but it would also give better experience. And isn't
> > that what most applications do? Check authentication tokens on every
> > request?
> >
> > Upayavira
> >
> > On Tue, Nov 10, 2015, at 07:33 PM, Anshum Gupta wrote:
> >> The reason why we bypass that is so that we don't hit the authentication
> >> plugin for every request that comes in for static content. I think we
> >> could
> >> call the authentication plugin for that but that'd be an overkill. Better
> >> experience ? yes
> >>
> >> On Tue, Nov 10, 2015 at 11:24 AM, Upayavira  wrote:
> >>
> >> > Noble,
> >> >
> >> > I get that a UI which is open source does not benefit from ACL control -
> >> > we're not giving away anything that isn't public (other than perhaps
> >> > info that could be used to identify the version of Solr, or even the
> >> > fact that it *is* solr).
> >> >
> >> > However, from a user experience point of view, requiring credentials to
> >> > see the UI would be more conventional, and therefore lead to less
> >> > confusion. Is it possible for us to protect the UI static files, only
> >> > for the sake of user experience, rather than security?
> >> >
> >> > Upayavira
> >> >
> >> > On Tue, Nov 10, 2015, at 12:01 PM, Noble Paul wrote:
> >> > > The admin UI is a bunch of static pages . We don't let the ACL control
> >> > > static content
> >> > >
> >> > > you must blacklist all the core/collection apis and it is pretty much
> >> > > useless for anyone to access the admin UI (w/o the credentials , of
> >> > > course)
> >> > >
> >> > > On Tue, Nov 10, 2015 at 7:08 AM, 马柏樟  wrote:
> >> > > > Hi,
> >> > > >
> >> > > > After I configure Authentication with Basic Authentication Plugin and
> >> > Authorization with Rule-Based Authorization Plugin, How can I prevent the
> >> > strangers from visiting my solr by browser? For example, if the stranger
> >> > visit the http://(my host):8983, the browser will pop up a window and
> >> > says "the server http://(my host):8983 requires a username and
> >> > password"
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > -
> >> > > Noble Paul
> >> >
> >>
> >>
> >>
> >> --
> >> Anshum Gupta
> 
> 
> 
> -- 
> -
> Noble Paul


Re: search for documents where all words of field present in the query

2015-11-18 Thread Ahmet Arslan


Hi Jim,

I think you could do some magic with function queries.
https://cwiki.apache.org/confluence/display/solr/Function+Queries


Index number of unique words in the product title e.g.
title = john smith
length = 2

Return products if the number of matching terms equals the number of words 
in the title.

Perhaps there is a better way, but something like the following should work in 
theory:

termfreq(title,'john')
termfreq(title,'smith')

fq={!frange l=0 u=0} sub(length, sum(termfreq(title,'john'),
termfreq(title,'smith')))
Ahmet
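[Editor's note: the frange filter above can be generated for an arbitrary term list. A sketch with a hypothetical helper; the `title` and `length` field names follow the example, and it assumes the title contains unique words, since termfreq counts occurrences.]

```python
def all_title_words_fq(field, length_field, terms):
    """Build the {!frange} filter: match docs where every word of `field`
    (whose word count is stored in `length_field`) appears in `terms`."""
    tfs = ", ".join("termfreq(%s,'%s')" % (field, t) for t in terms)
    return "{!frange l=0 u=0}sub(%s, sum(%s))" % (length_field, tfs)

fq = all_title_words_fq("title", "length", ["john", "smith"])
print(fq)
# → {!frange l=0 u=0}sub(length, sum(termfreq(title,'john'), termfreq(title,'smith')))
```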


On Tuesday, November 17, 2015 4:31 PM, superjim  wrote:



How would I form a query where all of the words in a field must be present in
the query (but possibly more). For example, if I have the following words in
a text field: "John Smith"

A query for "John" should return no results

A query for "Smith" should return no results

A query for "John Smith" should return that one result

A query for "banana John Smith purple monkey dishwasher" should return that
one result





--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-for-documents-where-all-words-of-field-present-in-the-query-tp4240564.html
Sent from the Solr - User mailing list archive at Nabble.com.


Upgrading from 4.x to 5.x

2015-11-18 Thread Daniel Miller

Hi!

I'm a very inexperienced user with Solr.  I've been using Solr to 
provide indexes for my Dovecot IMAP server.  Using version 3.x, and 
later 4.x, I have been able to do so without too much of a challenge.  
However, version 5.x has certainly changed quite a bit and I'm very 
uncertain how to proceed.


I currently have a working 4.10.3 installation, using the "example" 
server provided with the Solr distribution package, and a schema.xml 
optimized for Dovecot.  I haven't found anything on migrating from 4 to 
5 - at least anything I actually understood.  Can you point me in the 
right direction?


--
Daniel

Re: CloudSolrClient Connect To Zookeeper with ACL Protected files

2015-11-18 Thread Alan Woodward
At the moment it seems that it's only settable via System properties - see 
https://cwiki.apache.org/confluence/display/solr/ZooKeeper+Access+Control.  But 
it would be nice to do this programmatically as well, maybe worth opening a 
JIRA ticket?
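[Editor's note: for reference, a sketch of the system-property approach from that page as it would look in solr.in.sh. The class and property names are taken from the ZooKeeper Access Control documentation, so verify them against your Solr version; the same -D flags would need to be passed to the JVM running CloudSolrClient.]

```shell
# Sketch only; names per the ZooKeeper Access Control page.
SOLR_ZK_CREDS_AND_ACLS="-DzkACLProvider=org.apache.solr.common.cloud.VMParamsAllAndReadonlyDigestZkACLProvider \
 -DzkCredentialsProvider=org.apache.solr.common.cloud.VMParamsSingleSetCredentialsDigestZkCredentialsProvider \
 -DzkDigestUsername=admin-user -DzkDigestPassword=CHANGEME \
 -DzkDigestReadonlyUsername=readonly-user -DzkDigestReadonlyPassword=CHANGEME"
SOLR_OPTS="$SOLR_OPTS $SOLR_ZK_CREDS_AND_ACLS"
```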

Alan Woodward
www.flax.co.uk


On 17 Nov 2015, at 16:44, Kevin Lee wrote:

> Does anyone know if it is possible to set the ACL credentials in 
> CloudSolrClient needed to access a protected resource in Zookeeper?
> 
> Thanks!
> 
>> On Nov 13, 2015, at 1:20 PM, Kevin Lee  wrote:
>> 
>> Hi,
>> 
>> Is there a way to use CloudSolrClient and connect to a Zookeeper instance 
>> where ACL is enabled and resources/files like /live_nodes, etc are ACL 
>> protected?  Couldn’t find a way to set the ACL credentials.
>> 
>> Thanks,
>> Kevin
> 



Re: Solr Search: Access Control / Role based security

2015-11-18 Thread Charlie Hull

On 18/11/2015 07:55, Noble Paul wrote:

I haven't evaluated ManifoldCF for this.
However, my preference would be to have a generic mechanism built
into Solr to restrict user access to certain docs based on some field
values. Relying on external tools makes life complex for users who do
not want them.

Our strategy is

* Provide a pluggable framework so that custom external solutions can
be plugged in
* Provide a standard implementation which does not depend upon any
external solutions

any suggestions are welcome


Hi,

We're working on an external JOIN as part of the BioSolr project: 
basically this lets you filter result sets with an external query (which 
could be an authentication system of some kind). There's a patch at 
https://issues.apache.org/jira/browse/SOLR-7341 and the author, Tom 
Winch, is working on a blog post to explain it further - it'll hopefully 
be up on http://www.flax.co.uk/blog within the week.


Cheers

Charlie

PS If anyone fancies a trip to Cambridge UK this February we're running 
a free 'search for bioinformatics' event 
http://www.ebi.ac.uk/pdbe/about/events/open-source-search-bioinformatics



On Wed, Nov 11, 2015 at 12:07 AM, Susheel Kumar  wrote:

Thanks everyone for the suggestions.

Hi Noble - Were there any thoughts made on utilizing Apache ManifoldCF
while developing Authentication/Authorization plugins or anything to add
there.

Thanks,
Susheel

On Tue, Nov 10, 2015 at 5:01 AM, Alessandro Benedetti 
wrote:



I've been working for a while with Apache ManifoldCF and Enterprise Search
in Solr ( with Document level security) .
Basically you can add a couple of extra fields , for example :

allow_token : containing all the tokens that can view the document
deny_token : containing all the tokens that are denied to view the document

Apache ManifoldCF provides an integration that adds an additional layer and
is able to combine the permission schemes of different data sources.
The Authority Service endpoint takes a user name as input and returns
all the allow_token and deny_token values.
At this point you can append the related filter queries to your queries and
be sure that the user will only see what they are supposed to see.

It's basically an extension of the role-based strategy you were proposing.
Of course, keep your endpoints protected and prevent users from supplying
custom fq parameters, or your whole document security model would be useless :)
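[Editor's note: a sketch of how the appended filter queries might be built server-side from the Authority Service response. The helper is hypothetical; the allow_token/deny_token field names follow the example above.]

```python
def acl_filter_queries(allow_tokens, deny_tokens):
    """Build fq clauses for ManifoldCF-style document-level security:
    the doc must carry one of the user's allow tokens and none of the
    deny tokens."""
    fqs = []
    if allow_tokens:
        fqs.append("allow_token:(%s)" % " OR ".join(allow_tokens))
    if deny_tokens:
        fqs.append("-deny_token:(%s)" % " OR ".join(deny_tokens))
    return fqs

print(acl_filter_queries(["dept_eng", "all_users"], ["blocked_ui"]))
# → ['allow_token:(dept_eng OR all_users)', '-deny_token:(blocked_ui)']
```

These clauses would be appended by the trusted search application, never accepted from the client, for the reason given above.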

Cheers


On 9 November 2015 at 21:52, Scott Stults <
sstu...@opensourceconnections.com

wrote:



Susheel,

This is perfectly fine for simple use-cases and has the benefit that the
filterCache will help things stay nice and speedy. Apache ManifoldCF

goes a

bit further and ties back to your authentication and authorization
mechanism:




http://manifoldcf.apache.org/release/trunk/en_US/concepts.html#ManifoldCF+security+model



k/r,
Scott

On Thu, Nov 5, 2015 at 2:26 PM, Susheel Kumar 
wrote:


Hi,

I have seen couple of use cases / need where we want to restrict result

of

search based on role of a user.  For e.g.

- if user role is admin, any document from the search result will be
returned
- if user role is manager, only documents intended for managers will be
returned
- if user role is worker, only documents intended for workers will be
returned

Typical practise is to tag the documents with the roles (using a
multi-valued field) during indexing and then during search append

filter

query to restrict result based on roles.

Wondering if there is any other better way out there and if this common
requirement should be added as a Solr feature/plugin.

The current security plugins are more towards making Solr

apis/resources

secure not towards securing/controlling data during search.





https://cwiki.apache.org/confluence/display/solr/Authentication+and+Authorization+Plugins



Please share your thoughts.

Thanks,
Susheel





--
Scott Stults | Founder & Solutions Architect | OpenSource Connections,

LLC

| 434.409.2780
http://www.opensourceconnections.com





--
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England








--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Configure it on server

2015-11-18 Thread Prateek Sharma
Hi,

Can you help me out how I can configure it on a server?
It was configured on one of our servers but I am unable to replicate it.

Can you please help.

Thanks,
Prateek

This message and the information contained herein is proprietary and 
confidential and subject to the Amdocs policy statement,
you may review at http://www.amdocs.com/email_disclaimer.asp


Re: Configure it on server

2015-11-18 Thread Aman Tandon
Hi Prateek,

Your question is a little ambiguous. Could you please describe more
precisely what you want to configure on the server, and what your requirement
and problem are? That will make your problem easier to understand.

With Regards
Aman Tandon

On Wed, Nov 18, 2015 at 4:29 PM, Prateek Sharma 
wrote:

> Hi,
>
> Can you help me out how I can configure it on a server?
> It was configured on one of our servers but I am unable to replicate it.
>
> Can you please help.
>
> Thanks,
> Prateek
>
> This message and the information contained herein is proprietary and
> confidential and subject to the Amdocs policy statement,
> you may review at http://www.amdocs.com/email_disclaimer.asp
>


Synchronization Problems

2015-11-18 Thread 马柏樟
Hi, I have encountered some problems with solr-5.3.1. After I initialized the 
solrcloud and set up BasicAuthPlugin and RuleBasedAuthorizationPlugin, 
something went wrong with my solrcloud. It can't synchronize (recover) as usual. 
The server logs are as follows:
master log
Invalid key PKIAuthenticationPlugin
silver log
Error while trying to 
recover:org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
Error from server at http://172.16.200.35:8983/solr/t: Expected MIME type 
application/octet-stream but got text/html.  RecoveryStrategy


What can I do next?


Thanks,
Regards

Re: Upgrading from 4.x to 5.x

2015-11-18 Thread Jan Høydahl
Hi

You could try this:

- Instead of example/, use the server/ folder (it has Jetty in it)
- Start Solr using the bin/solr start script instead of java -jar start.jar …
- Leave your solrconfig and schema as-is to keep back-compat with 4.x
- You may need to remove use of 3.x classes that were deprecated in 4.x

https://cwiki.apache.org/confluence/display/solr/Major+Changes+from+Solr+4+to+Solr+5

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 18. nov. 2015 kl. 10.10 skrev Daniel Miller :
> 
> Hi!
> 
> I'm a very inexperienced user with Solr.  I've been using Solr to provide 
> indexes for my Dovecot IMAP server.  Using version 3.x, and later 4.x, I have 
> been able to do so without too much of a challenge.  However, version 5.x has 
> certainly changed quite a bit and I'm very uncertain how to proceed.
> 
> I currently have a working 4.10.3 installation, using the "example" server 
> provided with the Solr distribution package, and a schema.xml optimized for 
> Dovecot.  I haven't found anything on migrating from 4 to 5 - at least 
> anything I actually understood.  Can you point me in the right direction?
> 
> --
> Daniel



Re: Security Problems

2015-11-18 Thread Noble Paul
As of now the admin-ui calls are not protected. The static calls are
served by Jetty, and they bypass the authentication mechanism
completely. One option is to have the admin UI rely on some API call
that is served by Solr, which could then be protected.
The other option is to revamp the framework to take care of the admin UI
(static content) as well. That would be the cleaner solution.


On Wed, Nov 18, 2015 at 2:32 PM, Upayavira  wrote:
> Not sure I quite understand.
>
> You're saying that the cost for the UI is not large, but then suggesting
> we protect just one resource (/admin/security-check)?
>
> Why couldn't we create the permission called 'admin-ui' and protect
> everything under /admin/ui/ for example? Along with the root HTML link
> too.
>
> Upayavira
>
> On Wed, Nov 18, 2015, at 07:46 AM, Noble Paul wrote:
>> The authentication plugin is not expensive if you are talking in the
>> context of admin UI. After all it is used not like 100s of requests
>> per second.
>>
>> The simplest solution would be
>>
>> provide a well known permission name called "admin-ui"
>>
>> ensure that every admin page load makes a call to some resource say
>> "/admin/security-check"
>>
>> Then we can just protect that .
>>
>> The only concern thatI have is the false sense of security it would
>> give to the user
>>
>> But, that is a different point altogether
>>
>> On Wed, Nov 11, 2015 at 1:52 AM, Upayavira  wrote:
>> > Is the authentication plugin that expensive?
>> >
>> > I can help by minifying the UI down to a smaller number of CSS/JS/etc
>> > files :-)
>> >
>> > It may be overkill, but it would also give better experience. And isn't
>> > that what most applications do? Check authentication tokens on every
>> > request?
>> >
>> > Upayavira
>> >
>> > On Tue, Nov 10, 2015, at 07:33 PM, Anshum Gupta wrote:
>> >> The reason why we bypass that is so that we don't hit the authentication
>> >> plugin for every request that comes in for static content. I think we
>> >> could
>> >> call the authentication plugin for that but that'd be an overkill. Better
>> >> experience ? yes
>> >>
>> >> On Tue, Nov 10, 2015 at 11:24 AM, Upayavira  wrote:
>> >>
>> >> > Noble,
>> >> >
>> >> > I get that a UI which is open source does not benefit from ACL control -
>> >> > we're not giving away anything that isn't public (other than perhaps
>> >> > info that could be used to identify the version of Solr, or even the
>> >> > fact that it *is* solr).
>> >> >
>> >> > However, from a user experience point of view, requiring credentials to
>> >> > see the UI would be more conventional, and therefore lead to less
>> >> > confusion. Is it possible for us to protect the UI static files, only
>> >> > for the sake of user experience, rather than security?
>> >> >
>> >> > Upayavira
>> >> >
>> >> > On Tue, Nov 10, 2015, at 12:01 PM, Noble Paul wrote:
>> >> > > The admin UI is a bunch of static pages . We don't let the ACL control
>> >> > > static content
>> >> > >
>> >> > > you must blacklist all the core/collection apis and it is pretty much
>> >> > > useless for anyone to access the admin UI (w/o the credentials , of
>> >> > > course)
>> >> > >
>> >> > > On Tue, Nov 10, 2015 at 7:08 AM, 马柏樟  wrote:
>> >> > > > Hi,
>> >> > > >
>> >> > > > After I configure Authentication with Basic Authentication Plugin 
>> >> > > > and
>> >> > Authorization with Rule-Based Authorization Plugin, How can I prevent 
>> >> > the
>> >> > strangers from visiting my solr by browser? For example, if the stranger
>> >> > visit the http://(my host):8983, the browser will pop up a window and
>> >> > says "the server http://(my host):8983 requires a username and
>> >> > password"
>> >> > >
>> >> > >
>> >> > >
>> >> > > --
>> >> > > -
>> >> > > Noble Paul
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Anshum Gupta
>>
>>
>>
>> --
>> -
>> Noble Paul



-- 
-
Noble Paul


Re: search for documents where all words of field present in the query

2015-11-18 Thread Alessandro Benedetti
Assuming this is the only, specific kind of search you want, what about
using shingles of tokens at query time and the keyword tokenizer at indexing
time?

Ideally you don't tokenize at indexing time.
At query time you build your shingles (apparently you need not only
adjacent-token shingles, so play a little with it and possibly
customize it).

If you give us more information, maybe we can design a better solution.

Cheers
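[Editor's note: a sketch of the field type being described, assuming standard Lucene/Solr analysis factories. Note that ShingleFilterFactory only produces adjacent-token shingles, which is exactly the caveat mentioned above.]

```xml
<!-- Sketch only: index the whole title as one token; shingle the query
     side so a multi-word query can produce a token equal to the full title. -->
<fieldType name="title_all_words" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2"
            maxShingleSize="5" outputUnigrams="true" tokenSeparator=" "/>
  </analyzer>
</fieldType>
```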


On 18 November 2015 at 09:02, Ahmet Arslan 
wrote:

>
>
> Hi Jim,
>
> I think you could do some magic with function queries.
> https://cwiki.apache.org/confluence/display/solr/Function+Queries
>
>
> Index number of unique words in the product title e.g.
> title = john smith
> length = 2
>
> return products if the number of matching terms equals to the number of
> words in the title.
>
> Perhaps there is a better way but something like below should work in
> theory.
>
> termfreq(title,'john')
> termfreq(title,'smith')
>
> fq={!frange l=0 u=0} sub(length, sum(termfreq(title,'john'),
> termfreq(title,'smith')))
> Ahmet
>
>
> On Tuesday, November 17, 2015 4:31 PM, superjim  wrote:
>
>
>
> How would I form a query where all of the words in a field must be present
> in
> the query (but possibly more). For example, if I have the following words
> in
> a text field: "John Smith"
>
> A query for "John" should return no results
>
> A query for "Smith" should return no results
>
> A query for "John Smith" should return that one result
>
> A query for "banana John Smith purple monkey dishwasher" should return that
> one result
>
>
>
>
>
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Security Problems

2015-11-18 Thread Jan Høydahl
I tried out the BasicAuthPlugin today.
Surprised that the admin UI is not protected.
But even more surprised that only /select seems to be protected for users who 
are not logged in.
I can create collections and /update documents without being prompted for a 
password.

My security.json is https://gist.github.com/janhoy/d18854c75461816fb947

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 18. nov. 2015 kl. 14.54 skrev Noble Paul :
> 
> As of now the admin-ui calls are not protected. The static calls are
> served by jetty and it bypasses the authentication mechanism
> completely. If the admin UI relies on some API call which is served by
> Solr.
> The other option is to revamp the framework to take care of admin UI
> (static content) as well. This would be cleaner solution
> 
> 
> On Wed, Nov 18, 2015 at 2:32 PM, Upayavira  wrote:
>> Not sure I quite understand.
>> 
>> You're saying that the cost for the UI is not large, but then suggesting
>> we protect just one resource (/admin/security-check)?
>> 
>> Why couldn't we create the permission called 'admin-ui' and protect
>> everything under /admin/ui/ for example? Along with the root HTML link
>> too.
>> 
>> Upayavira
>> 
>> On Wed, Nov 18, 2015, at 07:46 AM, Noble Paul wrote:
>>> The authentication plugin is not expensive if you are talking in the
>>> context of admin UI. After all it is used not like 100s of requests
>>> per second.
>>> 
>>> The simplest solution would be
>>> 
>>> provide a well known permission name called "admin-ui"
>>> 
>>> ensure that every admin page load makes a call to some resource say
>>> "/admin/security-check"
>>> 
>>> Then we can just protect that .
>>> 
>>> The only concern thatI have is the false sense of security it would
>>> give to the user
>>> 
>>> But, that is a different point altogether
>>> 
>>> On Wed, Nov 11, 2015 at 1:52 AM, Upayavira  wrote:
 Is the authentication plugin that expensive?
 
 I can help by minifying the UI down to a smaller number of CSS/JS/etc
 files :-)
 
 It may be overkill, but it would also give better experience. And isn't
 that what most applications do? Check authentication tokens on every
 request?
 
 Upayavira
 
 On Tue, Nov 10, 2015, at 07:33 PM, Anshum Gupta wrote:
> The reason why we bypass that is so that we don't hit the authentication
> plugin for every request that comes in for static content. I think we
> could
> call the authentication plugin for that but that'd be an overkill. Better
> experience ? yes
> 
> On Tue, Nov 10, 2015 at 11:24 AM, Upayavira  wrote:
> 
>> Noble,
>> 
>> I get that a UI which is open source does not benefit from ACL control -
>> we're not giving away anything that isn't public (other than perhaps
>> info that could be used to identify the version of Solr, or even the
>> fact that it *is* solr).
>> 
>> However, from a user experience point of view, requiring credentials to
>> see the UI would be more conventional, and therefore lead to less
>> confusion. Is it possible for us to protect the UI static files, only
>> for the sake of user experience, rather than security?
>> 
>> Upayavira
>> 
>> On Tue, Nov 10, 2015, at 12:01 PM, Noble Paul wrote:
>>> The admin UI is a bunch of static pages . We don't let the ACL control
>>> static content
>>> 
>>> you must blacklist all the core/collection apis and it is pretty much
>>> useless for anyone to access the admin UI (w/o the credentials , of
>>> course)
>>> 
>>> On Tue, Nov 10, 2015 at 7:08 AM, 马柏樟  wrote:
 Hi,
 
 After I configure Authentication with Basic Authentication Plugin and
>> Authorization with Rule-Based Authorization Plugin, How can I prevent the
>> strangers from visiting my solr by browser? For example, if the stranger
>> visit the http://(my host):8983, the browser will pop up a window and
>> says "the server http://(my host):8983 requires a username and
>> password"
>>> 
>>> 
>>> 
>>> --
>>> -
>>> Noble Paul
>> 
> 
> 
> 
> --
> Anshum Gupta
>>> 
>>> 
>>> 
>>> --
>>> -
>>> Noble Paul
> 
> 
> 
> -- 
> -
> Noble Paul



Re: add and then delete same document before commit,

2015-11-18 Thread Matteo Grolla
On Solr 4.10.3 I'm noting a different (desired) behaviour

1) add document x
2) delete document x
3) commit

document x doesn't get indexed.
The question now is: Can I count on this behaviour or is it just incidental?

2014-11-05 22:21 GMT+01:00 Matteo Grolla :

> Perfectly clear,
> thanks a lot!
>
> Il giorno 05/nov/2014, alle ore 13:48, Jack Krupansky ha scritto:
>
> > Document x doesn't exist - in terms of visibility - until the commit, so
> the delete will no-op since a query of Lucene will not "see" the
> uncommitted new document.
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: Matteo Grolla
> > Sent: Wednesday, November 5, 2014 4:47 AM
> > To: solr-user@lucene.apache.org
> > Subject: add and then delete same document before commit,
> >
> > Can anyone tell me the behavior of solr (and if it's consistent) when I
> do what follows:
> > 1) add document x
> > 2) delete document x
> > 3) commit
> >
> > I've tried with solr 4.5.0 and document x gets indexed
> >
> > Matteo
>
>


Re: add and then delete same document before commit,

2015-11-18 Thread Shawn Heisey
On 11/18/2015 8:21 AM, Matteo Grolla wrote:
> On Solr 4.10.3 I'm noting a different (desired) behaviour
> 
> 1) add document x
> 2) delete document x
> 3) commit
> 
> document x doesn't get indexed.

If the last operation for document X is to delete it, then it will be
gone after the commit and not searchable.

Order of operations is critical, and it's important to realize that Solr
is not transactional.  With a relational database like MySQL, updates
made by one client can be logically separate from updates made by
another client.  Solr (Lucene) does not have that logical separation.
When a commit happens, no matter where the commit comes from, changes
made by ALL clients before that commit will become visible.
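[Editor's note: the semantics Shawn describes can be illustrated with a toy in-memory model. This is not Solr code, just the "one shared pending buffer; commit makes the net effect from all clients visible" rule.]

```python
class ToyIndex:
    """Toy model of Solr's pre-commit semantics: updates from all clients
    accumulate in one pending buffer; commit applies them in order."""
    def __init__(self):
        self.visible = {}   # doc id -> doc, what searches can see
        self.pending = []   # ordered (op, doc_id, doc) tuples from ALL clients

    def add(self, doc_id, doc):
        self.pending.append(("add", doc_id, doc))

    def delete(self, doc_id):
        self.pending.append(("delete", doc_id, None))

    def commit(self):
        for op, doc_id, doc in self.pending:
            if op == "add":
                self.visible[doc_id] = doc
            else:
                self.visible.pop(doc_id, None)
        self.pending = []

idx = ToyIndex()
idx.add("x", {"title": "doc x"})
idx.delete("x")            # delete after add, before commit
idx.commit()
print("x" in idx.visible)  # False: the last operation for x wins
```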

Thanks,
Shawn



Re: Security Problems

2015-11-18 Thread Noble Paul
Everything requires explicit rules. If you wish to protect "/update/*",
create a permission with the name "update" and assign a role to it.
If you don't have an explicit rule, those paths are accessible by all.
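[Editor's note: a sketch of what such a security.json authorization section might look like. Permission names and structure follow the Solr reference documentation for the Rule-Based Authorization Plugin; verify against your Solr version.]

```json
{
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [
      {"name": "update", "role": "indexer"},
      {"name": "read",   "role": "*"}
    ],
    "user-role": {"solradmin": ["indexer"]}
  }
}
```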

On Wed, Nov 18, 2015 at 8:10 PM, Jan Høydahl  wrote:
> I tried out BasicAuthPlugin today.
> Surprised that not admin UI is protected.
> But even more surprised that only /select seems to be protected for not 
> logged in users.
> I can create collections and /update documents without being prompted for pw.
>
> My security.json is https://gist.github.com/janhoy/d18854c75461816fb947
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
>> 18. nov. 2015 kl. 14.54 skrev Noble Paul :
>>
>> As of now the admin-ui calls are not protected. The static calls are
>> served by jetty and it bypasses the authentication mechanism
>> completely. If the admin UI relies on some API call which is served by
>> Solr.
>> The other option is to revamp the framework to take care of admin UI
>> (static content) as well. This would be cleaner solution
>>
>>
>> On Wed, Nov 18, 2015 at 2:32 PM, Upayavira  wrote:
>>> Not sure I quite understand.
>>>
>>> You're saying that the cost for the UI is not large, but then suggesting
>>> we protect just one resource (/admin/security-check)?
>>>
>>> Why couldn't we create the permission called 'admin-ui' and protect
>>> everything under /admin/ui/ for example? Along with the root HTML link
>>> too.
>>>
>>> Upayavira
>>>
>>> On Wed, Nov 18, 2015, at 07:46 AM, Noble Paul wrote:
 The authentication plugin is not expensive if you are talking in the
 context of admin UI. After all it is used not like 100s of requests
 per second.

 The simplest solution would be

 provide a well known permission name called "admin-ui"

 ensure that every admin page load makes a call to some resource say
 "/admin/security-check"

 Then we can just protect that .

 The only concern thatI have is the false sense of security it would
 give to the user

 But, that is a different point altogether

 On Wed, Nov 11, 2015 at 1:52 AM, Upayavira  wrote:
> Is the authentication plugin that expensive?
>
> I can help by minifying the UI down to a smaller number of CSS/JS/etc
> files :-)
>
> It may be overkill, but it would also give better experience. And isn't
> that what most applications do? Check authentication tokens on every
> request?
>
> Upayavira
>
> On Tue, Nov 10, 2015, at 07:33 PM, Anshum Gupta wrote:
>> The reason why we bypass that is so that we don't hit the authentication
>> plugin for every request that comes in for static content. I think we
>> could
>> call the authentication plugin for that but that'd be an overkill. Better
>> experience ? yes
>>
>> On Tue, Nov 10, 2015 at 11:24 AM, Upayavira  wrote:
>>
>>> Noble,
>>>
>>> I get that a UI which is open source does not benefit from ACL control -
>>> we're not giving away anything that isn't public (other than perhaps
>>> info that could be used to identify the version of Solr, or even the
>>> fact that it *is* solr).
>>>
>>> However, from a user experience point of view, requiring credentials to
>>> see the UI would be more conventional, and therefore lead to less
>>> confusion. Is it possible for us to protect the UI static files, only
>>> for the sake of user experience, rather than security?
>>>
>>> Upayavira
>>>
>>> On Tue, Nov 10, 2015, at 12:01 PM, Noble Paul wrote:
 The admin UI is a bunch of static pages . We don't let the ACL control
 static content

 you must blacklist all the core/collection apis and it is pretty much
 useless for anyone to access the admin UI (w/o the credentials , of
 course)

 On Tue, Nov 10, 2015 at 7:08 AM, 马柏樟  wrote:
> Hi,
>
> After I configure Authentication with Basic Authentication Plugin and
>>> Authorization with Rule-Based Authorization Plugin, How can I prevent 
>>> the
>>> strangers from visiting my solr by browser? For example, if the stranger
>>> visit the http://(my host):8983, the browser will pop up a window and
>>> says "the server http://(my host):8983 requires a username and
>>> password"



 --
 -
 Noble Paul
>>>
>>
>>
>>
>> --
>> Anshum Gupta



 --
 -
 Noble Paul
>>
>>
>>
>> --
>> -
>> Noble Paul
>



-- 
-
Noble Paul


Re: add and then delete same document before commit,

2015-11-18 Thread Matteo Grolla
Thanks Shawn,
   I'm aware that Solr isn't transactional, and I don't need this property:
a single application is indexing.
With Solr 4.6 I was seeing a different behaviour; with 4.10 I'm observing
the desired one.
I'd like to know if I can count on this behaviour being maintained in
successive Solr versions.

2015-11-18 16:51 GMT+01:00 Shawn Heisey :

> On 11/18/2015 8:21 AM, Matteo Grolla wrote:
> > On Solr 4.10.3 I'm noting a different (desired) behaviour
> >
> > 1) add document x
> > 2) delete document x
> > 3) commit
> >
> > document x doesn't get indexed.
>
> If the last operation for document X is to delete it, then it will be
> gone after the commit and not searchable.
>
> Order of operations is critical, and it's important to realize that Solr
> is not transactional.  With a relational database like MySQL, updates
> made by one client can be logically separate from updates made by
> another client.  Solr (Lucene) does not have that logical separation.
> When a commit happens, no matter where the commit comes from, changes
> made by ALL clients before that commit will become visible.
>
> Thanks,
> Shawn
>
>
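Shawn's point about ordering can be illustrated with a toy in-memory model (plain Python, not Solr code): pending operations from all clients replay in arrival order at commit time, so a trailing delete wins over an earlier add.

```python
# Toy model of non-transactional commit semantics: uncommitted operations
# from ALL clients apply in arrival order, and a single commit makes the
# net result visible to every searcher.
class ToyIndex:
    def __init__(self):
        self.visible = {}   # what searchers see after a commit
        self.pending = []   # (op, doc_id, doc) in arrival order

    def add(self, doc_id, doc):
        self.pending.append(("add", doc_id, doc))

    def delete(self, doc_id):
        self.pending.append(("delete", doc_id, None))

    def commit(self):
        # Replay every buffered operation, regardless of which client sent it.
        for op, doc_id, doc in self.pending:
            if op == "add":
                self.visible[doc_id] = doc
            else:
                self.visible.pop(doc_id, None)
        self.pending.clear()

idx = ToyIndex()
idx.add("x", {"title": "doc x"})   # 1) add document x (uncommitted)
idx.delete("x")                    # 2) delete document x (uncommitted)
idx.commit()                       # 3) commit
assert "x" not in idx.visible      # the delete, being last, wins
```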


Re: CloudSolrClient Connect To Zookeeper with ACL Protected files

2015-11-18 Thread Kevin Lee
Thanks Alan!

That works!  I was looking for a programmatic way to do it, but this will work 
for now as it doesn’t seem to be supported.

- Kevin

> On Nov 18, 2015, at 1:24 AM, Alan Woodward  wrote:
> 
> At the moment it seems that it's only settable via System properties - see 
> https://cwiki.apache.org/confluence/display/solr/ZooKeeper+Access+Control.  
> But it would be nice to do this programmatically as well, maybe worth opening 
> a JIRA ticket?
> 
> Alan Woodward
> www.flax.co.uk
> 
> 
> On 17 Nov 2015, at 16:44, Kevin Lee wrote:
> 
>> Does anyone know if it is possible to set the ACL credentials in 
>> CloudSolrClient needed to access a protected resource in Zookeeper?
>> 
>> Thanks!
>> 
>>> On Nov 13, 2015, at 1:20 PM, Kevin Lee  wrote:
>>> 
>>> Hi,
>>> 
>>> Is there a way to use CloudSolrClient and connect to a Zookeeper instance 
>>> where ACL is enabled and resources/files like /live_nodes, etc are ACL 
>>> protected?  Couldn’t find a way to set the ACL credentials.
>>> 
>>> Thanks,
>>> Kevin
>> 
> 



Re: add and then delete same document before commit,

2015-11-18 Thread Erick Erickson
Then that was probably a bug in 4.6. There's a lot
of work that's been done since then, and distributed
updates that are mixed like this are particularly
"interesting".

So you should be able to count on this.

One other possibility: Is it possible that this was a false
failure in 4.6 and a commit happened between the original
insert and the delete? Just askin'...

Best,
Erick

On Wed, Nov 18, 2015 at 8:21 AM, Matteo Grolla  wrote:
> Thanks Shawn,
>I'm aware that solr isn't transactional and I don't need this property:
> a single application is indexing.
> With solr 4.6 I was noting a different behaviour, with 4.10 I'm observing
> the desired one.
> I'd like to know If I can count on this behaviour to be maintained by
> successive solr version.
>
> 2015-11-18 16:51 GMT+01:00 Shawn Heisey :
>
>> On 11/18/2015 8:21 AM, Matteo Grolla wrote:
>> > On Solr 4.10.3 I'm noting a different (desired) behaviour
>> >
>> > 1) add document x
>> > 2) delete document x
>> > 3) commit
>> >
>> > document x doesn't get indexed.
>>
>> If the last operation for document X is to delete it, then it will be
>> gone after the commit and not searchable.
>>
>> Order of operations is critical, and it's important to realize that Solr
>> is not transactional.  With a relational database like MySQL, updates
>> made by one client can be logically separate from updates made by
>> another client.  Solr (Lucene) does not have that logical separation.
>> When a commit happens, no matter where the commit comes from, changes
>> made by ALL clients before that commit will become visible.
>>
>> Thanks,
>> Shawn
>>
>>


Re: add and then delete same document before commit,

2015-11-18 Thread Matteo Grolla
Thanks Erick,
 I observed the wrong behaviour on 4.6 in a controlled environment with
a very simple test case, so it was probably a bug (or I was drunk ;-) ).
Really, thanks again!!!

2015-11-18 17:40 GMT+01:00 Erick Erickson :

> Then that was probably a bug in 4.6. There's a lot
> of work that's been done since then, and distributed
> updates that are mixed like this are particularly
> "interesting".
>
> So you should be able to count on this.
>
> One other possibility: Is it possible that this was a false
> failure in 4.6 and a commit happened between the original
> insert and the delete? Just askin'...
>
> Best,
> Erick
>
> On Wed, Nov 18, 2015 at 8:21 AM, Matteo Grolla 
> wrote:
> > Thanks Shawn,
> >I'm aware that solr isn't transactional and I don't need this
> property:
> > a single application is indexing.
> > With solr 4.6 I was noting a different behaviour, with 4.10 I'm observing
> > the desired one.
> > I'd like to know If I can count on this behaviour to be maintained by
> > successive solr version.
> >
> > 2015-11-18 16:51 GMT+01:00 Shawn Heisey :
> >
> >> On 11/18/2015 8:21 AM, Matteo Grolla wrote:
> >> > On Solr 4.10.3 I'm noting a different (desired) behaviour
> >> >
> >> > 1) add document x
> >> > 2) delete document x
> >> > 3) commit
> >> >
> >> > document x doesn't get indexed.
> >>
> >> If the last operation for document X is to delete it, then it will be
> >> gone after the commit and not searchable.
> >>
> >> Order of operations is critical, and it's important to realize that Solr
> >> is not transactional.  With a relational database like MySQL, updates
> >> made by one client can be logically separate from updates made by
> >> another client.  Solr (Lucene) does not have that logical separation.
> >> When a commit happens, no matter where the commit comes from, changes
> >> made by ALL clients before that commit will become visible.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>


Re: Error in log after upgrading Solr

2015-11-18 Thread Shawn Heisey
On 11/17/2015 12:42 AM, Shawn Heisey wrote:
> I have upgraded from 5.2.1 to a 5.3.2 snapshot -- the lucene_solr_5_3
> branch plus the patch for SOLR-6188.
>
> I'm getting errors in my log every time I make a commit on a core.
>
> 2015-11-16 20:28:11.554 ERROR
> (searcherExecutor-82-thread-1-processing-x:sparkinclive) [  
> x:sparkinclive] o.a.s.c.SolrCore Previous SolrRequestInfo was not
> closed! 
> req=waitSearcher=true&commit=true&wt=javabin&version=2&softCommit=true
> 2015-11-16 20:28:11.554 ERROR
> (searcherExecutor-82-thread-1-processing-x:sparkinclive) [  
> x:sparkinclive] o.a.s.c.SolrCore prev == info : false
> 2015-11-16 20:28:11.554 INFO 
> (searcherExecutor-82-thread-1-processing-x:sparkinclive) [  
> x:sparkinclive] o.a.s.c.S.Request [sparkinclive] webapp=null path=null
> params={sort=post_date+desc&event=newSearcher&q=*:*&distrib=false&qt=/lbcheck&rows=1}
> hits=459866 status=0 QTime=0
> 2015-11-16 20:28:11.554 INFO 
> (searcherExecutor-82-thread-1-processing-x:sparkinclive) [  
> x:sparkinclive] o.a.s.c.SolrCore QuerySenderListener done.

These errors persist after a complete index rebuild.  I haven't done any
*extensive* checks, but so far the index seems to work correctly.  Do I
need to be concerned about this?

Thanks,
Shawn



unsubscribe me.

2015-11-18 Thread Pramod

please unsubscribe me.

Regards,
YP


Re: unsubscribe me.

2015-11-18 Thread davidphilip cherian
You should probably send an email to solr-user-unsubscr...@lucene.apache.org


Reference links

http://lucene.apache.org/solr/resources.html#community

https://wiki.apache.org/solr/Unsubscribing%20from%20mailing%20lists

On Wed, Nov 18, 2015 at 1:04 PM, Pramod  wrote:

> please unsubscribe me.
>
> Regards,
> YP
>


Re: Limiting number of parallel queries per user

2015-11-18 Thread deansg
Just an update: my problem turned out to be that, in the search component, I
decremented the entry for the user running a query in the first call to
finishStage, not realizing that most of the query processing time
occurs only in later stages.
Because the entry was decremented so quickly, the logs made it seem like
Solr was running the requests serially, when in fact I was just sending
queries more slowly than my SearchComponent was processing them.
The search component now works and does limit the number of queries a user
runs in parallel (which is a benefit in our specific case).
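For anyone implementing something similar, the counting logic itself can be sketched outside Solr (class and method names here are mine, not the actual SearchComponent API) — the key point being that release must happen only after all stages have truly finished:

```python
# Generic per-user parallel-query limiter. The lesson from the thread above:
# decrement (release) only when the query has fully finished, not after an
# early distributed stage, or the count lies about actual concurrency.
import threading
from collections import defaultdict

class PerUserLimiter:
    def __init__(self, max_parallel):
        self.max_parallel = max_parallel
        self.lock = threading.Lock()
        self.running = defaultdict(int)   # user -> queries in flight

    def try_acquire(self, user):
        with self.lock:
            if self.running[user] >= self.max_parallel:
                return False              # reject: user is at the limit
            self.running[user] += 1
            return True

    def release(self, user):              # call only after ALL stages finish
        with self.lock:
            self.running[user] -= 1

limiter = PerUserLimiter(max_parallel=2)
assert limiter.try_acquire("alice")
assert limiter.try_acquire("alice")
assert not limiter.try_acquire("alice")   # third parallel query rejected
limiter.release("alice")
assert limiter.try_acquire("alice")       # slot freed, accepted again
```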



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Limiting-number-of-parallel-queries-per-user-tp4240566p4240851.html
Sent from the Solr - User mailing list archive at Nabble.com.


Implementing security.json is breaking ADDREPLICA

2015-11-18 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
Implementing security.json is breaking ADDREPLICA

I have been able to reproduce this issue with minimal changes from an 
out-of-the-box Zookeeper (3.4.6) and Solr (5.3.1): loading 
configsets/basic_configs/conf into Zookeeper, creating the security.json listed 
below, creating two nodes (one with a core named xmpl and one without any 
core)- I can provide details if helpful.

The security.json is as follows:

{
  "authentication":{
    "class":"solr.BasicAuthPlugin",
    "credentials":{
  "solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= 
Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c=",
  "solruser":"VgZX1TAMNHT2IJikoGdKtxQdXc+MbNwfqzf89YqcLEE= 
37pPWQ9v4gciIKHuTmFmN0Rv66rnlMOFEWfEy9qjJfY="},
    "":{"v":9}},
  "authorization":{
    "class":"solr.RuleBasedAuthorizationPlugin",
    "user-role":{
  "solr":[
    "admin",
    "read",
    "xmpladmin",
    "xmplgen",
    "xmplsel"],
  "solruser":[
    "read",
    "xmplgen",
    "xmplsel"]},
    "permissions":[
  {
    "name":"security-edit",
    "role":"admin"},
  {
    "name":"xmpl_admin",
    "collection":"xmpl",
    "path":"/admin/*",
    "role":"xmpladmin"},
  {
    "name":"xmpl_sel",
    "collection":"xmpl",
    "path":"/select/*",
    "role":null},
  {
    "name":"xmpl_gen",
    "collection":"xmpl",
    "path":"/*",
    "role":"xmplgen"}],
    "":{"v":42}}}





When I then execute admin/collections?action=ADDREPLICA, I get errors such as 
the following in the solr.log of the node which was created without a core.

INFO  - 2015-11-17 21:03:54.157; [c:xmpl s:shard1 r:core_node2 
x:xmpl_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Starting 
Replication Recovery.
INFO  - 2015-11-17 21:03:54.158; [c:xmpl s:shard1 r:core_node2 
x:xmpl_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Begin buffering 
updates.
INFO  - 2015-11-17 21:03:54.158; [c:xmpl s:shard1 r:core_node2 
x:xmpl_shard1_replica1] org.apache.solr.update.UpdateLog; Starting to buffer 
updates. FSUpdateLog{state=ACTIVE, tlog=null}
INFO  - 2015-11-17 21:03:54.159; [c:xmpl s:shard1 r:core_node2 
x:xmpl_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Attempting to 
replicate from http://{IP-address-redacted}:4565/solr/xmpl/.
ERROR - 2015-11-17 21:03:54.166; [c:xmpl s:shard1 r:core_node2 
x:xmpl_shard1_replica1] org.apache.solr.common.SolrException; Error while 
trying to 
recover:org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
Error from server at http://{IP-address-redacted}:4565/solr/xmpl: Expected mime 
type application/octet-stream but got text/html. 


Error 401 Unauthorized request, Response code: 401

HTTP ERROR 401
Problem accessing /solr/xmpl/update. Reason:
    Unauthorized request, Response code: 401
Powered by Jetty://




    at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:528)
    at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:234)
    at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:226)
    at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
    at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:152)
    at 
org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:207)
    at 
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:147)
    at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)

INFO  - 2015-11-17 21:03:54.166; [c:xmpl s:shard1 r:core_node2 
x:xmpl_shard1_replica1] org.apache.solr.update.UpdateLog; Dropping buffered 
updates FSUpdateLog{state=BUFFERING, tlog=null}
ERROR - 2015-11-17 21:03:54.166; [c:xmpl s:shard1 r:core_node2 
x:xmpl_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Recovery failed 
- trying again... (2)
INFO  - 2015-11-17 21:03:54.166; [c:xmpl s:shard1 r:core_node2 
x:xmpl_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Wait 8.0 
seconds before trying to recover again (3)



And (after modifying Logging Levels), the solr.log of the node which already 
had a core gets errors such as the following:

2015-11-17 21:03:50.743 DEBUG (qtp59559151-87) [   ] o.e.j.s.Server REQUEST GET 
/solr/tpl/cloud.html on 
HttpChannelOverHttp@37cf94f4{r=1,c=false,a=DISPATCHED,uri=/solr/tpl/cloud.html}
2015-11-17 21:03:50.744 DEBUG (qtp59559151-87) [   ] o.e.j.s.Server RESPONSE 
/solr/tpl/cloud.html  200 handled=true
2015-11-17 21:03:50.802 DEBUG (qtp59559151-91) [   ] o.e.j.s.Server REQUEST GET 
/solr/zookeeper on 
HttpChannelOverHttp@37cf94f4{r=2,c=false,a=DISPATCHED,uri=/solr/zookeeper}
2015-11-17 21:03:50.803 INFO  (qtp59559151-91) [   ] o.a.s.s.HttpSolrCall 
userPrincipal: [null] type: [UNKNOWN], collections: [], Path: [/zookeeper]
2015-11-17 21:03:50.831 DEBUG (qtp5955

Re: Security Problems

2015-11-18 Thread Upayavira
I'm very happy for the admin UI to be served another way - i.e. not
direct from Jetty, if that makes the task of securing it easier.

Perhaps a request handler specifically for UI resources would make
it possible to secure it all in a more straightforward way?

Upayavira

On Wed, Nov 18, 2015, at 01:54 PM, Noble Paul wrote:
> As of now the admin-ui calls are not protected. The static calls are
> served by jetty and it bypasses the authentication mechanism
> completely. If the admin UI relies on some API call which is served by
> Solr.
> The other option is to revamp the framework to take care of admin UI
> (static content) as well. This would be cleaner solution
> 
> 
> On Wed, Nov 18, 2015 at 2:32 PM, Upayavira  wrote:
> > Not sure I quite understand.
> >
> > You're saying that the cost for the UI is not large, but then suggesting
> > we protect just one resource (/admin/security-check)?
> >
> > Why couldn't we create the permission called 'admin-ui' and protect
> > everything under /admin/ui/ for example? Along with the root HTML link
> > too.
> >
> > Upayavira
> >
> > On Wed, Nov 18, 2015, at 07:46 AM, Noble Paul wrote:
> >> The authentication plugin is not expensive if you are talking in the
> >> context of admin UI. After all it is used not like 100s of requests
> >> per second.
> >>
> >> The simplest solution would be
> >>
> >> provide a well known permission name called "admin-ui"
> >>
> >> ensure that every admin page load makes a call to some resource say
> >> "/admin/security-check"
> >>
> >> Then we can just protect that .
> >>
> >> The only concern that I have is the false sense of security it would
> >> give to the user
> >>
> >> But, that is a different point altogether
> >>
> >> On Wed, Nov 11, 2015 at 1:52 AM, Upayavira  wrote:
> >> > Is the authentication plugin that expensive?
> >> >
> >> > I can help by minifying the UI down to a smaller number of CSS/JS/etc
> >> > files :-)
> >> >
> >> > It may be overkill, but it would also give better experience. And isn't
> >> > that what most applications do? Check authentication tokens on every
> >> > request?
> >> >
> >> > Upayavira
> >> >
> >> > On Tue, Nov 10, 2015, at 07:33 PM, Anshum Gupta wrote:
> >> >> The reason why we bypass that is so that we don't hit the authentication
> >> >> plugin for every request that comes in for static content. I think we
> >> >> could
> >> >> call the authentication plugin for that but that'd be an overkill. 
> >> >> Better
> >> >> experience ? yes
> >> >>
> >> >> On Tue, Nov 10, 2015 at 11:24 AM, Upayavira  wrote:
> >> >>
> >> >> > Noble,
> >> >> >
> >> >> > I get that a UI which is open source does not benefit from ACL 
> >> >> > control -
> >> >> > we're not giving away anything that isn't public (other than perhaps
> >> >> > info that could be used to identify the version of Solr, or even the
> >> >> > fact that it *is* solr).
> >> >> >
> >> >> > However, from a user experience point of view, requiring credentials 
> >> >> > to
> >> >> > see the UI would be more conventional, and therefore lead to less
> >> >> > confusion. Is it possible for us to protect the UI static files, only
> >> >> > for the sake of user experience, rather than security?
> >> >> >
> >> >> > Upayavira
> >> >> >
> >> >> > On Tue, Nov 10, 2015, at 12:01 PM, Noble Paul wrote:
> >> >> > > The admin UI is a bunch of static pages . We don't let the ACL 
> >> >> > > control
> >> >> > > static content
> >> >> > >
> >> >> > > you must blacklist all the core/collection apis and it is pretty 
> >> >> > > much
> >> >> > > useless for anyone to access the admin UI (w/o the credentials , of
> >> >> > > course)
> >> >> > >
> >> >> > > On Tue, Nov 10, 2015 at 7:08 AM, 马柏樟  wrote:
> >> >> > > > Hi,
> >> >> > > >
> >> >> > > > After I configure Authentication with Basic Authentication Plugin 
> >> >> > > > and
> >> >> > Authorization with Rule-Based Authorization Plugin, How can I prevent 
> >> >> > the
> >> >> > strangers from visiting my solr by browser? For example, if the 
> >> >> > stranger
> >> >> > visit the http://(my host):8983, the browser will pop up a window and
> >> >> > says "the server http://(my host):8983 requires a username and
> >> >> > password"
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > --
> >> >> > > -
> >> >> > > Noble Paul
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Anshum Gupta
> >>
> >>
> >>
> >> --
> >> -
> >> Noble Paul
> 
> 
> 
> -- 
> -
> Noble Paul


Boost non stemmed keywords (KStem filter)

2015-11-18 Thread bbarani
Hi,

I am using the KStem factory for stemming. This stemmer converts 'france' to
'french', 'chinese' to 'china', etc. I am fine with this stemming, but I am
trying to boost results that contain the original term over results that
only match the stemmed term. Is this possible?

Thanks,
Learner




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boost-non-stemmed-keywords-KStem-filter-tp4240880.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Error in log after upgrading Solr

2015-11-18 Thread Chris Hostetter

: > I'm getting errors in my log every time I make a commit on a core.

Do you have any custom plugins? 
what is the definition of the /lbcheck handler?

: > 2015-11-16 20:28:11.554 ERROR
: > (searcherExecutor-82-thread-1-processing-x:sparkinclive) [  
: > x:sparkinclive] o.a.s.c.SolrCore Previous SolrRequestInfo was not
: > closed! 
: > req=waitSearcher=true&commit=true&wt=javabin&version=2&softCommit=true
: > 2015-11-16 20:28:11.554 ERROR
: > (searcherExecutor-82-thread-1-processing-x:sparkinclive) [  
: > x:sparkinclive] o.a.s.c.SolrCore prev == info : false

Those log messages ("Previous SolrRequestInfo was not..." are a sanity 
check designed to help catch plugins that aren't cleaning up the thread 
local state tracked in SolrRequestInfo (see 
SolrRequestInfo.setRequestInfo).

speculating here

Perhaps waitSearcher=true combined with QuerySenderListener is an 
exception that's triggering the ERROR in a totally expected situation?  
ie: the thread is processing the request that triggered the "commit" and 
in that thread QuerySenderListener fires off some local solr requests?


Perhaps LocalSolrQueryRequest should be stashing/restoring SolrRequestInfo 
state?

or perhaps SolrCore.execute and/or SolrRequestInfo should do this when it 
sees a LocalSolrQueryRequest ?


Shawn: If my speculations are correct, this should be fairly trivial to 
reproduce with a small generic config -- can you file an jira w/ steps to 
reproduce?



-Hoss
http://www.lucidworks.com/


RE: Boost non stemmed keywords (KStem filter)

2015-11-18 Thread Markus Jelsma
Hi - the easiest approach is to use KeywordRepeatFilter and 
RemoveDuplicatesTokenFilter. This creates a slightly higher IDF for unstemmed 
words, which might be just enough in your case. We found it not to be enough, so 
we also attach payloads to mark stemmed words, among other things. This allows 
you to decrease the score for stemmed words at query time via your similarity 
implementation.

M.
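A minimal fieldType sketch of that approach (the type name is made up; the filter factories are standard Solr ones). KeywordRepeatFilterFactory emits each token twice, marking one copy with the keyword attribute so the stemmer leaves it untouched; RemoveDuplicatesTokenFilterFactory then drops the pair when stemming changed nothing:

```xml
<!-- Hypothetical schema.xml fragment, not a drop-in config. -->
<fieldType name="text_kstem_repeat" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- duplicate each token; the keyword-marked copy skips stemming -->
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
    <!-- collapse the pair when the stemmer produced no change -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```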

 
 
-Original message-
> From:bbarani 
> Sent: Wednesday 18th November 2015 22:07
> To: solr-user@lucene.apache.org
> Subject: Boost non stemmed keywords (KStem filter)
> 
> Hi,
> 
> I am using KStem factory for stemming. This stemmer converts 'france to
> french', 'chinese to china' etc.. I am good with this stemming but I am
> trying to boost the results that contain the original term compared to the
> stemmed terms. Is this possible?
> 
> Thanks,
> Learner
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Boost-non-stemmed-keywords-KStem-filter-tp4240880.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


Re: CloudSolrCloud - Commit returns but not all data is visible (occasionally)

2015-11-18 Thread Mark Miller
If you see "WARNING: too many searchers on deck" or something like that in
the logs, that could cause this behavior and would indicate you are opening
searchers faster than Solr can keep up.

- Mark

On Tue, Nov 17, 2015 at 2:05 PM Erick Erickson 
wrote:

> That's what was behind my earlier comment about perhaps
> the call is timing out, thus the commit call is returning
> _before_ the actual searcher is opened. But the call
> coming back is not a return from commit, but from Jetty
> even though the commit hasn't really returned.
>
> Just a guess however.
>
> Best,
> Erick
>
> On Tue, Nov 17, 2015 at 12:11 AM, adfel70  wrote:
> > Thanks Eric,
> > I'll try to play with the autowarm config.
> >
> > But I have a more direct question - why does the commit return without
> > waiting till the searchers are fully refreshed?
> >
> > Could it be that the parameter waitSearcher=true doesn't really work?
> > or maybe I don't understand something here...
> >
> > Thanks,
> >
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/CloudSolrCloud-Commit-returns-but-not-all-data-is-visible-occasionally-tp4240368p4240518.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
- Mark
about.me/markrmiller


Re: Error in log after upgrading Solr

2015-11-18 Thread Shawn Heisey
On 11/18/2015 2:20 PM, Chris Hostetter wrote:
> : > I'm getting errors in my log every time I make a commit on a core.
>
> Do you have any custom plugins? 
> what is the definition of the /lbcheck handler?

I have one simple update processor in use that I wrote myself, and we
have a third-party plugin that we are using.  One of my indexes does not
use either of these, but has them in the configuration so they can be
used later, so I removed those components from the config on those
cores.  The problem still happened on commits in those cores.  Then I
commented out the firstSearcher and newSearcher listeners in all the
solrconfig.xml files on the server, and the error stopped appearing in the
log, even on cores still using the custom plugins.  This is the config I
removed:

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="rows">1</str>
        <str name="sort">post_date desc</str>
        <str name="qt">/lbcheck</str>
      </lst>
    </arr>
  </listener>

  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="rows">1</str>
        <str name="sort">post_date desc</str>
        <str name="qt">/lbcheck</str>
      </lst>
    </arr>
  </listener>

> Shawn: If my speculations are correct, this should be fairly trivial to 
> reproduce with a small generic config -- can you file an jira w/ steps to 
> reproduce?

I do see the same error message in a failure email from Uwe's Jenkins
server a few weeks ago.  I'll see if I can put together a minimal
configuration to reproduce.

Thanks,
Shawn



RE: CloudSolrCloud - Commit returns but not all data is visible (occasionally)

2015-11-18 Thread Markus Jelsma
Hi - I sometimes see the too-many-searchers warning too, since some 5.x version. 
The cloud showing the warning has no autoCommit and there is only a single 
process ever sending a commit, only once every 10-15 minutes or so. The cores 
are quite small, commits finish quickly and new docs are quickly searchable. 
I've ignored the warning so far, since it makes no sense and the problem is not 
really there.
 
-Original message-
> From:Mark Miller 
> Sent: Wednesday 18th November 2015 23:24
> To: solr-user 
> Subject: Re: CloudSolrCloud - Commit returns but not all data is visible 
> (occasionally)
> 
> If you see "WARNING: too many searchers on deck" or something like that in
> the logs, that could cause this behavior and would indicate you are opening
> searchers faster than Solr can keep up.
> 
> - Mark
> 
> On Tue, Nov 17, 2015 at 2:05 PM Erick Erickson 
> wrote:
> 
> > That's what was behind my earlier comment about perhaps
> > the call is timing out, thus the commit call is returning
> > _before_ the actual searcher is opened. But the call
> > coming back is not a return from commit, but from Jetty
> > even though the commit hasn't really returned.
> >
> > Just a guess however.
> >
> > Best,
> > Erick
> >
> > On Tue, Nov 17, 2015 at 12:11 AM, adfel70  wrote:
> > > Thanks Eric,
> > > I'll try to play with the autowarm config.
> > >
> > > But I have a more direct question - why does the commit return without
> > > waiting till the searchers are fully refreshed?
> > >
> > > Could it be that the parameter waitSearcher=true doesn't really work?
> > > or maybe I don't understand something here...
> > >
> > > Thanks,
> > >
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > http://lucene.472066.n3.nabble.com/CloudSolrCloud-Commit-returns-but-not-all-data-is-visible-occasionally-tp4240368p4240518.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> -- 
> - Mark
> about.me/markrmiller
> 


adding document with nested document require to set id

2015-11-18 Thread CrazyDiamond
I'm trying to add a document with nested objects, without setting ids
myself (I want them generated automatically).
When I add a document without nesting it's OK, but if I add _childDocuments_
there is an error: [doc=null] missing required field: id
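One likely explanation (an assumption, not confirmed in this thread): automatic id generation via an update processor such as UUIDUpdateProcessorFactory is typically applied only to the top-level document, so each child document has to carry its own explicit, unique id. A sketch of such a payload (field names and id scheme are hypothetical):

```json
{
  "id": "parent-1",
  "type_s": "parent",
  "_childDocuments_": [
    { "id": "parent-1!child-1", "type_s": "child" },
    { "id": "parent-1!child-2", "type_s": "child" }
  ]
}
```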



--
View this message in context: 
http://lucene.472066.n3.nabble.com/adding-document-with-nested-document-require-to-set-id-tp4240908.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: CloudSolrCloud - Commit returns but not all data is visible (occasionally)

2015-11-18 Thread Shawn Heisey

On 11/17/2015 1:11 AM, adfel70 wrote:

Could it be that the parameter waitSearcher=true doesn't really work?
or maybe I don't understand something here...


I am just guessing with this, but I think this is likely how it works:

I believe that if maxWarmingSearchers is exceeded, a commit call will 
return more quickly than usual, and Solr will not attempt to open a new 
searcher with that commit, because the threshold has been exceeded.


Basically, when there are too many searchers warming at once, new ones 
cannot be created, which means that Solr cannot make the visibility 
guarantees it usually makes.
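For reference, the relevant knobs live in solrconfig.xml. A sketch with illustrative values (not recommendations): keeping openSearcher=false on hard autoCommit avoids opening a searcher on every flush, which reduces pressure on the maxWarmingSearchers limit.

```xml
<!-- solrconfig.xml sketch; values are illustrative only. -->
<maxWarmingSearchers>2</maxWarmingSearchers>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>           <!-- hard commit every 60s -->
    <openSearcher>false</openSearcher> <!-- flush without a new searcher -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>15000</maxTime>           <!-- visibility at most every 15s -->
  </autoSoftCommit>
</updateHandler>
```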


CloudSolrServer should be identical in function to CloudSolrClient, I 
don't think you have to worry about it being deprecated for right now.  
You'll want to switch before 6.0.


Thanks,
Shawn



Re: adding document with nested document require to set id

2015-11-18 Thread Alexandre Rafalovitch
If you have id listed as a required field (which I believe you need to
anyway), what do you actually get when you add a document without
nesting? What does the document echo back?

Because if you are getting a document back without id field when it is
declared required in the schema, that would be a problem of its own.

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 18 November 2015 at 17:35, CrazyDiamond  wrote:
> i'm trying to add document with the nested objects but don't want id to be
> generated automatically.
> When i add document without nesting it's ok.But if i add  _childDocuments_
> there is an error [doc=null] missing required field: id
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/adding-document-with-nested-document-require-to-set-id-tp4240908.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Problem with Synchronization

2015-11-18 Thread Byzen Ma
Hi, I encountered some problems with Solr 5.3.1. After I initialized the
SolrCloud and set up BasicAuthPlugin and RuleBasedAuthorizationPlugin,
something went wrong with my SolrCloud: it can't synchronize as usual.
The server logs are as follows:

master log:

Invalid key PKIAuthenticationPlugin

slave log:

Error while trying to
recover:org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException
: Error from server at http://172.16.200.35:8983/solr/t: Expected MIME type
application/octet-stream but got text/html.  RecoveryStrategy

What can I do next?

Thanks,

Regards



Re: CloudSolrCloud - Commit returns but not all data is visible (occasionally)

2015-11-18 Thread Erick Erickson
bq: Hi - i sometimes see the too many searcher warning to since some
5.x version. The warning cloud has no autoCommit and there is only a
single process ever sending a commit, only once every 10-15 minutes
orso

This is very surprising unless your autowarming is taking 10-15
minutes, almost assuredly impossible given your description. So my
theory is that "something" is sending commits far more often than you
think. I'd take a look at the Solr logs, you should see messages when
commits happen. The logs should also tell you how long autowarming
takes. For that matter so will the plugins/state page.

Something's definitely fishy, I cannot reconcile you getting
occasional messages about too many searchers and that rare a commit.

On Wed, Nov 18, 2015 at 3:11 PM, Shawn Heisey  wrote:
> On 11/17/2015 1:11 AM, adfel70
>>
>> Could it be that the parameter waitSearcher=true doesn't really work?
>> or maybe I don't understand something here...
>
>
> I am just guessing with this, but I think this is likely how it works:
>
> I believe that if maxWarmingSearchers is exceeded, a commit call will return
> more quickly than usual, and Solr will not attempt to open a new searcher
> with that commit, because the threshold has been exceeded.
>
> Basically, when there are too many searchers warming at once, new ones
> cannot be created, which means that Solr cannot make the visibility
> guarantees it usually makes.
>
> CloudSolrServer should be identical in function to CloudSolrClient, I don't
> think you have to worry about it being deprecated for right now.  You'll
> want to switch before 6.0.
>
> Thanks,
> Shawn
>


Shards and Replicas

2015-11-18 Thread Troy Edwards
I am looking for some good articles/guidance on how to determine number of
shards and replicas for an index?

Thanks


Re: Shards and Replicas

2015-11-18 Thread Shawn Heisey
On 11/18/2015 9:02 PM, Troy Edwards wrote:
> I am looking for some good articles/guidance on how to determine number of
> shards and replicas for an index?

The long version:

https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

The short version:

There's no quick formula for figuring out how much hardware you need and
how to divide your index onto that hardware.  There are too many
variables involved.  Building a prototype (or ideally a full-scale
environment) is the only reliable way to figure it out.

Those of us who have been doing this for a long time can make educated
guesses if we are presented with the right pieces of information, but
frequently users will not know some of that information until the system
is put into production and actually handles real queries.

The only general advice I have is this:  It's probably going to cost
more than you think it will.

Thanks,
Shawn



Re: Shards and Replicas

2015-11-18 Thread Jack Krupansky
1. No more than 100 million documents per shard.
2. Number of replicas to meet your query load and to allow for the
possibility that a replica might go down. 2 or 3, maybe 4.
3. Proof-of-concept implementation to validate how many documents per shard
will still query well. But be aware that a query against the sharded
version will be slower than against a single-shard
implementation.
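A back-of-envelope sketch of those rules of thumb as a hypothetical helper.
The 100M-docs-per-shard cap and the per-replica QPS figure are illustrative
heuristics to be validated with a proof of concept, not guarantees:

```python
import math

def sketch_cluster(num_docs, docs_per_shard_cap=100_000_000,
                   qps_target=100, qps_per_replica=50, min_replicas=2):
    """Rough shard/replica sizing per the rules of thumb above.
    Validate with a proof-of-concept before trusting these numbers."""
    shards = max(1, math.ceil(num_docs / docs_per_shard_cap))
    # Enough replicas for the query load, but never fewer than two,
    # so one replica going down does not take the shard offline.
    replicas = max(min_replicas, math.ceil(qps_target / qps_per_replica))
    return shards, replicas

# 2.2M documents fits comfortably in one shard under the 100M cap.
print(sketch_cluster(2_200_000))    # (1, 2)
print(sketch_cluster(250_000_000))  # (3, 2)
```

The qps_per_replica input is exactly the kind of number you only learn by
measuring a prototype under realistic queries, which is Shawn's point above.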

-- Jack Krupansky

On Wed, Nov 18, 2015 at 11:02 PM, Troy Edwards 
wrote:

> I am looking for some good articles/guidance on how to determine number of
> shards and replicas for an index?
>
> Thanks
>


Re: Implementing security.json is breaking ADDREPLICA

2015-11-18 Thread Anshum Gupta
Hi Craig,

Just to be sure that you're using the feature as it should be used, can you
outline what it is that you're trying to do here? There are a few things
that aren't clear to me, e.g. I see permissions for the /admin handler
for a particular collection.

What kinds of permissions are you trying to set up?

Solr uses its internal PKI-based mechanism for inter-shard communication,
so you shouldn't really be hitting this. Can you check your logs and
tell me if there are any other exceptions you see while bringing the node
up, etc.? Something from PKI itself.

About restricting the UI, there's another thread in parallel that's been
discussing exactly that. The thing with the current UI implementation is
that it bypasses all of this, primarily because most of that content is
static. I am not saying we should be able to put it behind the
authentication layer, but just that it's not currently supported through
this plugin.

On Wed, Nov 18, 2015 at 11:20 AM, Oakley, Craig (NIH/NLM/NCBI) [C] <
craig.oak...@nih.gov> wrote:

> Implementing security.json is breaking ADDREPLICA
>
> I have been able to reproduce this issue with minimal changes from an
> out-of-the-box Zookeeper (3.4.6) and Solr (5.3.1): loading
> configsets/basic_configs/conf into Zookeeper, creating the security.json
> listed below, creating two nodes (one with a core named xmpl and one
> without any core)- I can provide details if helpful.
>
> The security.json is as follows:
>
> {
>   "authentication":{
> "class":"solr.BasicAuthPlugin",
> "credentials":{
>   "solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0=
> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c=",
>   "solruser":"VgZX1TAMNHT2IJikoGdKtxQdXc+MbNwfqzf89YqcLEE=
> 37pPWQ9v4gciIKHuTmFmN0Rv66rnlMOFEWfEy9qjJfY="},
> "":{"v":9}},
>   "authorization":{
> "class":"solr.RuleBasedAuthorizationPlugin",
> "user-role":{
>   "solr":[
> "admin",
> "read",
> "xmpladmin",
> "xmplgen",
> "xmplsel"],
>   "solruser":[
> "read",
> "xmplgen",
> "xmplsel"]},
> "permissions":[
>   {
> "name":"security-edit",
> "role":"admin"},
>   {
> "name":"xmpl_admin",
> "collection":"xmpl",
> "path":"/admin/*",
> "role":"xmpladmin"},
>   {
> "name":"xmpl_sel",
> "collection":"xmpl",
> "path":"/select/*",
> "role":null},
>   {
> "name":"xmpl_gen",
> "collection":"xmpl",
> "path":"/*",
> "role":"xmplgen"}],
> "":{"v":42}}}
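As an aside on the format above: each credentials value is two base64
strings separated by a space, a password hash followed by a per-user salt.
A sketch of generating such a pair, assuming the double-SHA-256 scheme
(hash = sha256(sha256(salt + password))) used by Solr's
Sha256AuthenticationProvider; verify against your Solr version before
relying on it:

```python
import base64
import hashlib
import os

def solr_credential(password, salt=None):
    """Build a 'hash salt' credentials string (both parts base64).
    Assumes hash = sha256(sha256(salt + password)); check your Solr
    version's Sha256AuthenticationProvider before relying on this."""
    if salt is None:
        salt = os.urandom(32)  # random per-user salt
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    digest = hashlib.sha256(digest).digest()  # second SHA-256 round
    return "{} {}".format(base64.b64encode(digest).decode(),
                          base64.b64encode(salt).decode())

# Prints a 'hash salt' pair in the same shape as the entries above.
print(solr_credential("SolrRocks"))
```

The "solr" entry in the security.json above is the well-known default
solr/SolrRocks pair, which is why it should be changed before production.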
>
>
>
>
>
> When I then execute admin/collections?action=ADDREPLICA, I get errors such
> as the following in the solr.log of the node which was created without a
> core.
>
> INFO  - 2015-11-17 21:03:54.157; [c:xmpl s:shard1 r:core_node2
> x:xmpl_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Starting
> Replication Recovery.
> INFO  - 2015-11-17 21:03:54.158; [c:xmpl s:shard1 r:core_node2
> x:xmpl_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Begin
> buffering updates.
> INFO  - 2015-11-17 21:03:54.158; [c:xmpl s:shard1 r:core_node2
> x:xmpl_shard1_replica1] org.apache.solr.update.UpdateLog; Starting to
> buffer updates. FSUpdateLog{state=ACTIVE, tlog=null}
> INFO  - 2015-11-17 21:03:54.159; [c:xmpl s:shard1 r:core_node2
> x:xmpl_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Attempting
> to replicate from http://{IP-address-redacted}:4565/solr/xmpl/.
> ERROR - 2015-11-17 21:03:54.166; [c:xmpl s:shard1 r:core_node2
> x:xmpl_shard1_replica1] org.apache.solr.common.SolrException; Error while
> trying to
> recover:org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> Error from server at http://{IP-address-redacted}:4565/solr/xmpl:
> Expected mime type application/octet-stream but got text/html.
>
> Error 401 Unauthorized request, Response code: 401
>
> HTTP ERROR 401
> Problem accessing /solr/xmpl/update. Reason:
> Unauthorized request, Response code: 401
> Powered by Jetty://
>
>
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:528)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:234)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:226)
> at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
> at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:152)
> at
> org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:207)
> at
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:147)
> at
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
> at
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
>
> INFO  - 2015-11-17 21:03:54.166; [c:xmpl s:shard1 r:core_node2
> x:xmpl_shard1_replica1] org.apache.