edismax/boost: certain documents should be last

2011-10-28 Thread Paul
(I am using solr 3.4 and edismax.)

In my index, I have a multivalued field named "genre". One of the
values this field can have is "Citation". I would like documents that
have a genre field of Citation to always be at the bottom of the
search results.

I've been experimenting, but I can't seem to figure out the syntax of
the search I need. Here is the search that seems most logical to me
(newlines added here for readability):

q=%2bcontent%3Anotes+genre%3ACitation^0.01
&start=0
&rows=3
&fl=genre+title
&version=2.2
&defType=edismax

I get the same results whether I include "genre%3ACitation^0.01" or not.

Just to see if my names were correct, I put a minus sign before
"genre" and it did, in fact, stop returning all the documents
containing Citation.

What am I doing wrong?

Here are the results from the above query:


  
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="fl">genre title</str>
      <str name="start">0</str>
      <str name="q">+content:notes genre:Citation^0.01</str>
      <str name="rows">3</str>
      <str name="version">2.2</str>
      <str name="defType">edismax</str>
    </lst>
  </lst>
  <result name="response" numFound="..." start="0">
    <doc>
      <arr name="genre"><str>Citation</str><str>Fiction</str></arr>
      <str name="title">Notes on novelists With some other notes</str>
    </doc>
    <doc>
      <arr name="genre"><str>Citation</str></arr>
      <str name="title">Novel notes</str>
    </doc>
    <doc>
      <arr name="genre"><str>Citation</str></arr>
      <str name="title">Knock about notes</str>
    </doc>
  </result>
</response>


Re: edismax/boost: certain documents should be last

2011-10-31 Thread Paul
Thanks Erik. They don't need to absolutely always be the bottom-most
-- just not near the top. But that sounds like an easy way to do it,
especially since it is a lot easier to reindex now than it used to be.

I would like to know why my query had no effect, though. There's
obviously something I don't get about queries.

On Mon, Oct 31, 2011 at 10:08 AM, Erik Hatcher  wrote:
> Paul (*bows* to the NINES!) -
>
> If you literally want Citations always at the bottom regardless of other 
> relevancy, then perhaps consider indexing boolean top_sort as true for 
> everything Citations and false otherwise, then use &sort=top_sort asc,score 
> desc (or do you need to desc top_sort?  true then false or false then true?)
>
> Then you can have Citations literally at the bottom (and within that sorted 
> in score order) and likewise with non-Citations at the top and sorted score 
> order within that.  Other tricks still risk having Citations mixed in should 
> relevancy score be high enough.
>
> The moral of this story is: if you want to hard sort by something, then make 
> a sort field that does it how you like rather than trying to get relevancy 
> scoring to do it for you.
>
>        Erik
>
>

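Erik's compound-sort suggestion amounts to: order by the boolean flag first, then by score within each group. Its effect can be sketched with in-memory documents in Ruby (field names and scores here are illustrative, not from a real schema):

```ruby
# Each doc carries a citation flag (standing in for the indexed boolean
# sort field) and a relevancy score. Sorting by the flag first, then by
# descending score, pushes every Citation doc below every non-Citation
# doc regardless of how high its score is.
docs = [
  { title: 'Novel notes',       citation: true,  score: 1.49 },
  { title: 'Plain notes',       citation: false, score: 0.20 },
  { title: 'Knock about notes', citation: true,  score: 0.90 },
  { title: 'Other notes',       citation: false, score: 1.10 }
]

sorted = docs.sort_by { |d| [d[:citation] ? 1 : 0, -d[:score]] }
titles = sorted.map { |d| d[:title] }
# Non-citations first (by score), then citations (by score).
```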

Re: edismax/boost: certain documents should be last

2011-10-31 Thread Paul
I had been experimenting with bq.

I switched to boost like you suggested, and get the following error
from solr: "can not use FieldCache on multivalued field: genre"

But that sounds like the solution I'd want, if it worked, since it's
more flexible than having to reindex.

On Mon, Oct 31, 2011 at 10:41 AM, Erik Hatcher  wrote:
> Paul - look at debugQuery=true output to see why scores end up the way they 
> do.  Use the explainOther to home in on a specific document to get its 
> explanation.  The math'll tell you why it's working the way it is.  It's more 
> than just likely that some other scoring factors are overweighting things.
>
> Also, now that I think about it, you'd be better off leveraging edismax and 
> the boost parameter.  Don't mess with your main q(uery), use 
> boost=genre:Citation^0.01 or something like that.  boost params (not bq!) are 
> multiplied into the score, not added.  Maybe that'll be more to your liking?
>
>        Erik
>
>


Re: edismax/boost: certain documents should be last

2011-10-31 Thread Paul
I studied the results with debugQuery, and I understand how the search
is working. The two scores for the two terms are added together, so
specifying a boost less than one still adds to the score. For
instance, in the first result, content:notes has a score of 1.4892359
and genre:Citation^0.01 has a score of 0.0045761107. Those two numbers
are added together to get the total score.
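Using the two scores quoted above, the arithmetic makes the problem obvious; the multiplication case mirrors what edismax's boost parameter would do (a sketch of the score combination, not Solr's actual scoring code):

```ruby
content_score  = 1.4892359     # score of content:notes for the first doc
citation_score = 0.0045761107  # contribution of genre:Citation^0.01

# Optional clauses in a BooleanQuery are summed, so even a boost far
# below 1.0 still adds a small positive amount -- it can never demote.
additive = content_score + citation_score

# A multiplicative boost (edismax's boost param) scales instead, which
# is what actually pushes a matching doc down.
multiplicative = content_score * 0.01
```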

I tried putting a negative boost number in, but that isn't legal.

Is there a way to express "genre does not equal Citation" as an
optional parameter that I can boost?



Re: edismax/boost: certain documents should be last

2011-10-31 Thread Paul
(Sorry for so many messages in a row...)

For the record, I figured out something that will work, although it is
somewhat inelegant. My q parameter is now:

(+content:notes -genre:Citation)^20 (+content:notes genre:Citation)^0.01

Can I improve on that?
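The effect of that two-clause query can be sketched numerically (toy numbers and a simple additive clause model, not real Lucene scoring): a non-Citation doc matches both clauses and is dominated by the ^20 clause, while a Citation doc is excluded from the first clause and only gets the ^0.01 one.

```ruby
# Toy model of: (+content:notes -genre:Citation)^20 (+content:notes genre:Citation)^0.01
def workaround_score(content_score, citation_score, is_citation)
  if is_citation
    # the first clause excludes Citation docs, so only the second matches
    0.01 * (content_score + citation_score)
  else
    # both clauses match; the ^20 clause dominates
    20 * content_score + 0.01 * content_score
  end
end

non_citation = workaround_score(1.0, 0.0, false)   # ~20.01
citation     = workaround_score(1.5, 0.005, true)  # ~0.015
# Even a Citation doc with a much higher raw content score lands far below.
```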



limiting the total number of documents matched

2010-07-14 Thread Paul
I'd like to limit the total number of documents that are returned for
a search, particularly when the sort order is not based on relevancy.

In other words, if the user searches for a very common term, they
might get tens of thousands of hits, and if they sort by "title", then
very high relevancy documents will be interspersed with very low
relevancy documents. I'd like to set a limit to the 1000 most relevant
documents, then sort those by title.

Is there a way to do this?

I guess I could always retrieve the top 1000 documents and sort them
in the client, but that seems particularly inefficient. I can't find
any other way to do this, though.
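The client-side fallback just described — take the top N by score, then re-sort that slice by title — is only a few lines in Ruby (a sketch over in-memory docs; the real cost concern is shipping N documents to the client):

```ruby
def top_n_by_score_sorted_by_title(docs, n)
  docs.sort_by { |d| -d[:score] }   # relevancy order first
      .first(n)                     # cap the candidate set
      .sort_by { |d| d[:title] }    # then present alphabetically
end

docs = [
  { title: 'Zebra', score: 0.9 },
  { title: 'Apple', score: 0.1 },
  { title: 'Mango', score: 0.8 }
]
top_n_by_score_sorted_by_title(docs, 2).map { |d| d[:title] }
# top 2 by score are Zebra and Mango; alphabetized: Mango, Zebra
```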

Thanks,
Paul


Re: limiting the total number of documents matched

2010-07-14 Thread Paul
I was hoping for a way to do this purely by configuration and making
the correct GET requests, but if there is a way to do it by creating a
custom Request Handler, I suppose I could plunge into that. Would that
yield the best results, and would that be particularly difficult?

On Wed, Jul 14, 2010 at 4:37 PM, Nagelberg, Kallin
 wrote:
> So you want to take the top 1000 sorted by score, then sort those by another 
> field. It's a strange case, and I can't think of a clean way to accomplish 
> it. You could do it in two queries, where the first is by score and you only 
> request your IDs to keep it snappy, then do a second query against the IDs 
> and sort by your other field. 1000 seems like a lot for that approach, but 
> who knows until you try it on your data.
>
> -Kallin Nagelberg
>
>


Re: limiting the total number of documents matched

2010-07-14 Thread Paul
I thought of another way to do it, but I still have one thing I don't
know how to do. I could do the search without sorting for the 50th
page, then look at the relevancy score on the first item on that page,
then repeat the search, but add score > that relevancy as a parameter.
Is it possible to do a search with "score:[5 to *]"? It didn't work in
my first attempt.



autocomplete: case-insensitive and middle word

2010-08-17 Thread Paul
I have a couple questions about implementing an autocomplete function
in solr. Here's my scenario:

I have a name field that usually contains two or three names. For
instance, let's suppose it contains:

John Alfred Smith
Alfred Johnson
John Quincy Adams
Fred Jones

I'd like to have the autocomplete be case insensitive and match any of
the names, preferably just at the beginning.

In other words, if the user types "alf", I want

John Alfred Smith
Alfred Johnson

if the user types "fre", I want

Fred Jones

but not:
John Alfred Smith
Alfred Johnson

I can get the matches using the text_lu analyzer, but the hints that
are returned are lower case, and only one name.

If I use the "string" analyzer, I get the entire name like I want it,
but the user must match the case, that is, must type "Alf", and it
only matches the first name, not the middle name.

How can I get the matches of the "text_lu" analyzer, but get the hints
like the "string" analyzer?
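The desired behavior — case-insensitive prefix match against any word of the name, returning the original-cased full name as the hint — can be stated precisely in a few lines of Ruby (a client-side illustration of the matching rule, not how Solr implements it):

```ruby
NAMES = ['John Alfred Smith', 'Alfred Johnson', 'John Quincy Adams', 'Fred Jones']

def autocomplete(prefix)
  p = prefix.downcase
  # match at the beginning of any name part, case-insensitively,
  # but return the full, original-cased name as the hint
  NAMES.select { |n| n.split.any? { |word| word.downcase.start_with?(p) } }
end

autocomplete('alf')  # => ["John Alfred Smith", "Alfred Johnson"]
autocomplete('fre')  # => ["Fred Jones"]  ("Alfred" is not a word start)
```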

Thanks,
Paul


Re: autocomplete: case-insensitive and middle word

2010-08-18 Thread Paul
Here's my solution. I'm posting it in case it is radically wrong; I
hope someone can help straighten me out. It seems to work fine, and
seems fast enough.

In schema.xml:

[the field and fieldType definitions were stripped by the archive; they declare the new multivalued ac_name field used below]
Then I restarted solr to pick up the changes.
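The schema listing above did not survive the archive. One plausible shape for it, reconstructed from the surrounding description (lowercased per-word values with prefix matching) — an assumption for illustration, not the original listing — is:

```xml
<!-- Hypothetical reconstruction: each ac_name value is a single word
     (the client splits the name), lowercased, with edge n-grams on the
     index side so that a bare prefix query like ac_name:alf matches. -->
<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="ac_name" type="autocomplete" indexed="true" stored="true"
       multiValued="true"/>
```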

I then ran a script which reads each document out of the current
index, and adds the new field:

for each doc in my solr index:
    doc['ac_name'] = doc['name'].split(' ')
    write the record back out

Then, using rsolr, I make the following query:

response = @solr.select(:params => {
  :q     => "ac_name:#{prefix}",
  :start => 0,
  :rows  => 500,
  :fl    => "name"
})
matches = []
docs = response['response']['docs']
docs.each { |doc| matches.push(doc['name']) }

"matches" is now an array of the values I want to display.


What is a "409 version conflict" error [solr 4.1]?

2013-01-28 Thread Paul
I've got a process that is replacing about 180K documents that are all
similar (I'm actually just adding one field to each of them). This is
mostly working fine, but occasionally (perhaps 100 times), I get this error:

409 Conflict
Error:
{'responseHeader'=>{'status'=>409,'QTime'=>1},'error'=>{'msg'=>'version
conflict for lib://X expected=1425089660734930944
actual=1425439751468482560','code'=>409}}

Why does that happen a few times? How can I prevent that from happening?

Is there any other info I can supply that would help the diagnosis?

Thanks!


Re: What is a "409 version conflict" error [solr 4.1]?

2013-01-28 Thread Paul
Ok, I did get a little more information about it from here:
http://yonik.com/solr/optimistic-concurrency/ but I really don't know why
the version number is conflicting. I'm running the only process that is
changing documents, and my process is to read the document, add a field,
and write the document. The records were all created at the same time with
similar data, and are all being updated for the first time, again with
similar data.

I did happen to stumble on the atomic updates feature, which is new to me,
and is probably a better way to do what I need.
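Optimistic concurrency — the mechanism behind the 409 — can be modeled in a few lines: every document carries a version, an update must present the version it read, and the write is rejected if the document changed in between. This is a conceptual sketch with an in-memory store, not Solr's implementation:

```ruby
class ConflictError < StandardError; end

# In-memory stand-in for an index that does optimistic locking.
class Store
  def initialize; @docs = {}; end

  def put(id, fields)
    @docs[id] = { version: 1, fields: fields }
  end

  def get(id); @docs[id]; end

  # Reject the write if the stored version no longer matches the one
  # the caller read -- this is the 409 case.
  def update(id, expected_version, fields)
    doc = @docs[id]
    raise ConflictError if doc[:version] != expected_version
    @docs[id] = { version: doc[:version] + 1, fields: fields }
  end
end

store = Store.new
store.put('lib://X', { 'title' => 'old' })

seen = store.get('lib://X')[:version]              # read: version 1
store.update('lib://X', seen, { 'title' => 'a' })  # first write succeeds
begin
  # a second write still presenting the stale version -> conflict
  store.update('lib://X', seen, { 'title' => 'b' })
  conflicted = false
rescue ConflictError
  conflicted = true
end
```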




Re: compare two shards.

2013-02-14 Thread Paul
I do a brute-force regression test where I read all the documents from
shard 1 and compare them to documents in shard 2. I had to have all the
fields stored to do that, but in my case that doesn't change the size of
the index much.

So, in other words, I do a search for a page's worth of documents sorted by
the same thing and compare them, then get the next page and do the same.



On Tue, Feb 12, 2013 at 4:20 AM, stockii  wrote:

> hello.
>
> i want to compare two shards to each other, because these shards should
> have the same index, but this isn't so =(
> so i want to find the documents that are missing in one of my two
> shards.
>
> my ideas:
> - a distributed shard request on my nodes, firing a facet search on my
> unique field. but the result of the facet component isn't reversible =(
>
> - grouping. but it's not working correctly, i think: no groups of the same
> uniquekey in the result set.
>
>
> does anyone have better ideas?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/compare-two-shards-tp4039887.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: How to handle to run testcases in ruby code for solr

2012-02-18 Thread Paul
Are you asking how to test your own code, that happens to have a solr
query somewhere in the middle of it? I've done this two ways:

1) You can mock the solr call by detecting that you are in test mode
and just return the right answer. That will be fast.

2) Or you set up a second core with the name "test", and initialize it
for each test. That will give you confidence that your queries are
formed correctly.

Since your test data is generally really small, I've found that using
the second method performs well enough.

I use a global in my app that contains the name of the core, and I set
that in a before filter in application_controller depending whether
I'm in test mode or not.

As part of the test set up I delete all documents. The "delete *:*"
call is really fast. Then I commit a dozen documents or so. With that
little data that is fast, too.

I wrap all calls to solr in a single model so there is one point in my
app that calls rsolr. I can override that class to mock it out if the
solr result is not the focus of the test, and do the above work if the
solr result is the focus of the test.
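The single-wrapper-model approach makes the mock trivial: the test swaps in a double that exposes the same interface and returns canned documents. A minimal sketch (class and method names here are made up for illustration, not from the original app):

```ruby
# One point of contact with Solr for the whole app.
class SearchModel
  def initialize(client)
    @client = client   # e.g. an rsolr connection in production
  end

  def titles_matching(query)
    response = @client.select(:params => { :q => query })
    response['response']['docs'].map { |d| d['title'] }
  end
end

# Test double standing in for the rsolr client: same interface,
# canned response, no running Solr needed.
class FakeSolrClient
  def select(_opts)
    { 'response' => { 'docs' => [{ 'title' => 'Novel notes' }] } }
  end
end

model = SearchModel.new(FakeSolrClient.new)
model.titles_matching('content:notes')  # canned result from the fake
```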

>
> On Feb 17, 2012, at 07:12 , solr wrote:
>
>> Hi all,
>> I am writing a Rails application using the solr_ruby gem to access Solr.
>> Can anybody suggest how to handle test cases for Solr code and connections
>> in functional testing?
>>


searching top matches of each facet

2012-02-29 Thread Paul
Let's say that I have a facet named 'subject' that contains one of:
physics, chemistry, psychology, mathematics, etc

I'd like to do a search for the top 5 documents in each category. I
can do this with a separate search for each facet, but it seems like
there would a way to combine the search. Is there a way?

That is, if the user searches for "my search", I can now search for it
with the facet of "physics" and rows=5, then do a separate search with
the facet of "chemistry", etc...

Can I do that in one search to decrease the load on the server? Or,
when I do the first search, will the results be cached, so that the
rest of the searches are pretty cheap?
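The shape of the grouped result being asked for — top 5 per subject in one pass — can be pictured with a client-side equivalent (an illustration of the result shape, assuming Ruby 2.4+ for transform_values, not how Solr computes it):

```ruby
docs = [
  { subject: 'physics',   title: 'P1', score: 0.9 },
  { subject: 'physics',   title: 'P2', score: 0.5 },
  { subject: 'chemistry', title: 'C1', score: 0.8 },
  { subject: 'physics',   title: 'P3', score: 0.7 },
  { subject: 'chemistry', title: 'C2', score: 0.6 }
]

# Bucket by subject, then keep the 5 best-scoring docs in each bucket.
top_per_subject = docs
  .group_by { |d| d[:subject] }
  .transform_values { |ds| ds.sort_by { |d| -d[:score] }.first(5) }
```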


Re: searching top matches of each facet

2012-03-01 Thread Paul
Perfect! Thanks!

On Wed, Feb 29, 2012 at 3:29 PM, Emmanuel Espina
 wrote:
> I think that what you want is FieldCollapsing:
>
> http://wiki.apache.org/solr/FieldCollapsing
>
> For example
> &q=my search&group=true&group.field=subject&group.limit=5
>
> Test it to see if that is what you want.
>
> Thanks
> Emmanuel
>
>


Re: score from two cores

2010-12-03 Thread Paul
On Fri, Dec 3, 2010 at 4:47 PM, Erick Erickson  wrote:
> But why do you have two cores in the first place? Is it really necessary or
> is it just
> making things more complex?

I don't know why the OP wants two cores, but I ran into this same
problem and had to abandon using a second core. My use case is: I have
lots of slowing-changing documents, and a few often-changing
documents. Those classes of documents are updated by different people
using different processes. I wanted to split them into separate cores
so that:

1) The large core wouldn't change except deliberately so there would
be less chance of a bug creeping in. Also, that core is the same on
different servers, so they could be replicated.

2) The small core would update and optimize quickly and the data in it
is different on different servers.

The problem is that the search results should return relevancy as if
there were only one core.


Re: Improving Solr performance

2011-01-10 Thread Paul
> I see from your other messages that these indexes all live on the same 
> machine.
> You're almost certainly I/O bound, because you don't have enough memory for 
> the
> OS to cache your index files.  With 100GB of total index size, you'll get best
> results with between 64GB and 128GB of total RAM.

Is that a general rule of thumb? That it is best to have about the
same amount of RAM as the size of your index?

So, with a 5GB index, I should have between 4GB and 8GB of RAM
dedicated to solr?


verifying that an index contains ONLY utf-8

2011-01-12 Thread Paul
We've created an index from a number of different documents that are
supplied by third parties. We want the index to only contain UTF-8
encoded characters. I have a couple questions about this:

1) Is there any way to be sure during indexing (by setting something
in the solr configuration?) that the documents that we index will
always be stored in utf-8? Can solr convert documents that need
converting on the fly, or can solr reject documents containing illegal
characters?

2) Is there a way to scan the existing index to find any string
containing non-utf8 characters? Or is there another way that I can
discover if any crept into my index?


Re: verifying that an index contains ONLY utf-8

2011-01-13 Thread Paul
Thanks for all the responses.

CharsetDetector does look promising. Unfortunately, we aren't allowed
to keep the original of much of our data, so the solr index is the
only place it exists (to us). I do have a java app that "reindexes",
i.e., reads all documents out of one index, does some transform on
them, then writes them to a second index. So I already have a place
where I see all the data in the index stream by. I wanted to make sure
there wasn't some built in way of doing what I need.

I know that it is possible to fool the algorithm, but I'll see if the
string is a possible utf-8 string first and not change that. Then I
won't be introducing more errors and maybe I can detect a large
percentage of the non-utf-8 strings.

On Thu, Jan 13, 2011 at 4:36 PM, Robert Muir  wrote:
> it does: 
> http://icu-project.org/apiref/icu4j/com/ibm/icu/text/CharsetDetector.html
> this takes a sample of the file and makes a guess.
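Since the reindexing app described above already streams every document by, a minimal client-side check of this kind can be sketched in Python (the function names are illustrative, not part of any Solr API):

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def scrub(raw: bytes) -> str:
    """Keep valid UTF-8 untouched; otherwise fall back to a lenient decode.

    A real pass might call a charset detector (e.g. ICU's CharsetDetector)
    here instead of blindly assuming Latin-1.
    """
    if is_valid_utf8(raw):
        return raw.decode("utf-8")
    return raw.decode("latin-1")

print(is_valid_utf8("caf\u00e9".encode("utf-8")))  # True
print(is_valid_utf8(b"caf\xe9"))  # False: a bare 0xE9 byte is not valid UTF-8
```

This catches only encoding-level damage; as the thread notes, text that happens to be valid UTF-8 but was decoded from the wrong charset cannot be detected this way.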


last item in results page is always the same

2011-02-16 Thread Paul
(I'm using solr 1.4)

I'm doing a test of my index, so I'm reading out every document in
batches of 500. The query is (I added newlines here to make it
readable):

http://localhost:8983/solr/archive_ECCO/select/
?q=archive%3AECCO
&fl=uri
&version=2.2
&start=0
&rows=500
&indent=on
&sort=uri%20asc

It turns out, in this case, the query should match every document. The
response shows numFound="182413".

If I scan the returned values, they appear sorted properly except the
last one. In other words, the uris that are returned on the first page
are:

100100
100200
etc...
0006601600
0006601700
1723200600

That 499th value is returned as the 499th value on every page. That
is, if I call it with &start=500, then most of the entries look right,
but that last value will still be 1723200600, and the true 499th value
is never returned.

1723200600 should have been returned as the 181,499th item.

Is this a known solr bug or is there something subtle going on?

Thanks,
Paul


Re: last item in results page is always the same

2011-02-17 Thread Paul
Thanks, going to update now. This is a system that is currently
deployed. Should I just update to 1.4.1, or should I go straight to
3.0? Does 1.4 => 3.0 require reindexing?

On Wed, Feb 16, 2011 at 5:37 PM, Yonik Seeley
 wrote:
> On Wed, Feb 16, 2011 at 5:08 PM, Paul  wrote:
>> Is this a known solr bug or is there something subtle going on?
>
> Yes, I think it's the following bug, fixed in 1.4.1:
>
> * SOLR-1777: fieldTypes with sortMissingLast=true or sortMissingFirst=true can
>  result in incorrectly sorted results.
>
> -Yonik
> http://lucidimagination.com
>


Search failing for matched text in large field

2011-03-23 Thread Paul
I'm using solr 1.4.1.

I have a document that has a pretty big field. If I search for a
phrase that occurs near the start of that field, it works fine. If I
search for a phrase that appears even a little ways into the field, it
doesn't find it. Is there some limit to how far into a field solr will
search?

Here's the way I'm doing the search. All I'm changing is the text I'm
searching on to make it succeed or fail:

http://localhost:8983/solr/my_core/select/?q=%22search+phrase%22&hl=on&hl.fl=text

Or, if it is not related to how large the document is, what else could
it possibly be related to? Could there be some character in that field
that is stopping the search?


Re: Search failing for matched text in large field

2011-03-23 Thread Paul
Ah, no, I'll try that now.

What is the disadvantage of setting that to a really large number?

I do want the search to work for every word I give to solr. Otherwise
I wouldn't have indexed it to begin with.

On Wed, Mar 23, 2011 at 11:15 AM, Sascha Szott  wrote:
> Hi Paul,
>
> did you increase the value of the maxFieldLength parameter in your
> solrconfig.xml?
>
> -Sascha
>
> On 23.03.2011 17:05, Paul wrote:
>>
>> I'm using solr 1.4.1.
>>
>> I have a document that has a pretty big field. If I search for a
>> phrase that occurs near the start of that field, it works fine. If I
>> search for a phrase that appears even a little ways into the field, it
>> doesn't find it. Is there some limit to how far into a field solr will
>> search?
>>
>> Here's the way I'm doing the search. All I'm changing is the text I'm
>> searching on to make it succeed or fail:
>>
>>
>> http://localhost:8983/solr/my_core/select/?q=%22search+phrase%22&hl=on&hl.fl=text
>>
>> Or, if it is not related to how large the document is, what else could
>> it possibly be related to? Could there be some character in that field
>> that is stopping the search?
>


Re: Search failing for matched text in large field

2011-03-23 Thread Paul
I increased maxFieldLength and reindexed a small number of documents.
That worked -- I got the correct results. In 3 minutes!

I assume that if I reindex all my documents that all searches will
become even slower. Is there any way to get all the results in a way
that is quick enough that my user won't get bored waiting? Is there
some optimization of this coming in solr 3.0?

On Wed, Mar 23, 2011 at 12:15 PM, Sascha Szott  wrote:
> Hi Paul,
>
> did you increase the value of the maxFieldLength parameter in your
> solrconfig.xml?
>
> -Sascha
>
> On 23.03.2011 17:05, Paul wrote:
>>
>> I'm using solr 1.4.1.
>>
>> I have a document that has a pretty big field. If I search for a
>> phrase that occurs near the start of that field, it works fine. If I
>> search for a phrase that appears even a little ways into the field, it
>> doesn't find it. Is there some limit to how far into a field solr will
>> search?
>>
>> Here's the way I'm doing the search. All I'm changing is the text I'm
>> searching on to make it succeed or fail:
>>
>>
>> http://localhost:8983/solr/my_core/select/?q=%22search+phrase%22&hl=on&hl.fl=text
>>
>> Or, if it is not related to how large the document is, what else could
>> it possibly be related to? Could there be some character in that field
>> that is stopping the search?
>


Re: Search failing for matched text in large field

2011-03-23 Thread Paul
I looked into the search that I'm doing a little closer and it seems
like the highlighting is slowing it down. If I do the query without
requesting highlighting it is fast. (BTW, I also have faceting and
pagination in my query. Faceting doesn't seem to change the response
time much, adding &rows= and &start= does, but not prohibitively.)

The field in question needs to be stored=true, because it is needed
for highlighting.

I'm thinking of doing this in two searches: first without highlighting
and put a progress spinner next to each result, then do an ajax call
to repeat the search with highlighting that can take its time to
finish.

(I, too, have seen random really long response times that seem to be
related to not enough RAM, but this isn't the problem because the
results here are repeatable.)
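The two-phase idea above amounts to sending the same query twice with only the highlighting parameters differing; a rough sketch (parameter names as used elsewhere in this thread, field name assumed):

```python
# Phase 1: fast query for first paint, no highlighting.
# Phase 2: identical query repeated with highlighting turned on,
# fetched asynchronously to fill in the snippets later.
base = {"q": '"search phrase"', "start": 0, "rows": 30}

fast_params = dict(base)                                # no hl -> quick
slow_params = dict(base, hl="on", **{"hl.fl": "text"})  # hl added -> slower

# Only the highlighting keys differ between the two requests.
print(sorted(set(slow_params) - set(fast_params)))  # ['hl', 'hl.fl']
```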

On Wed, Mar 23, 2011 at 2:30 PM, Sascha Szott  wrote:
> On 23.03.2011 18:52, Paul wrote:
>>
>> I increased maxFieldLength and reindexed a small number of documents.
>> That worked -- I got the correct results. In 3 minutes!
>
> Did you mark the field in question as stored = false?
>
> -Sascha
>
>>
>> I assume that if I reindex all my documents that all searches will
>> become even slower. Is there any way to get all the results in a way
>> that is quick enough that my user won't get bored waiting? Is there
>> some optimization of this coming in solr 3.0?
>>
>> On Wed, Mar 23, 2011 at 12:15 PM, Sascha Szott  wrote:
>>>
>>> Hi Paul,
>>>
>>> did you increase the value of the maxFieldLength parameter in your
>>> solrconfig.xml?
>>>
>>> -Sascha
>>>
>>> On 23.03.2011 17:05, Paul wrote:
>>>>
>>>> I'm using solr 1.4.1.
>>>>
>>>> I have a document that has a pretty big field. If I search for a
>>>> phrase that occurs near the start of that field, it works fine. If I
>>>> search for a phrase that appears even a little ways into the field, it
>>>> doesn't find it. Is there some limit to how far into a field solr will
>>>> search?
>>>>
>>>> Here's the way I'm doing the search. All I'm changing is the text I'm
>>>> searching on to make it succeed or fail:
>>>>
>>>>
>>>>
>>>> http://localhost:8983/solr/my_core/select/?q=%22search+phrase%22&hl=on&hl.fl=text
>>>>
>>>> Or, if it is not related to how large the document is, what else could
>>>> it possibly be related to? Could there be some character in that field
>>>> that is stopping the search?
>>>
>


Best practice for rotating solr logs

2011-03-31 Thread Paul
I'm about to set up log rotation using logrotate, but I have a
question about how to do it.

The general examples imply that one should include the following in the script:

postrotate
/sbin/service solr restart
endscript

but it seems to me that any requests that come in during that restart
process are going to return errors. The other way to do it is to use

copytruncate

but that will cause any requests that come in during the time that the
file is being truncated to not appear in the log.

How do you set up your logrotate file?

Thanks,
Paul
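For reference, the copytruncate variant weighed above might look like the following stanza (the log path and rotation schedule here are illustrative guesses, not from this thread):

```
/var/log/solr/solr.log {
    weekly
    rotate 8
    compress
    missingok
    copytruncate
}
```

copytruncate copies the live file and then truncates it in place, so Solr never needs a restart; the trade-off is that any lines written between the copy and the truncate are lost.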


ConcurrentLRUCache$Stats error

2011-04-05 Thread Paul
I'm using solr 1.4.1 and just noticed a bunch of these errors in the
solr.log file:

SEVERE: java.util.concurrent.ExecutionException:
java.lang.NoSuchMethodError:
org.apache.solr.common.util.ConcurrentLRUCache$Stats.add(Lorg/apache/solr/common/util/ConcurrentLRUCache$Stats;)V

They appear to happen after a commit, and as far as I can tell,
everything is working fine -- that's why I didn't notice these errors
earlier.

What is this telling me?


Searching for escaped characters

2011-04-28 Thread Paul
I'm trying to create a test to make sure that character sequences like
"&egrave;" are successfully converted to their equivalent utf
character (that is, in this case, "è").

So, I'd like to search my solr index using the equivalent of the
following regular expression:

&\w{1,6};

To find any escaped sequences that might have slipped through.

Is this possible? I have indexed these fields with text_lu, which
looks like this:

    [fieldType definition for "text_lu" stripped by the mail archive]


Thanks,
Paul
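A client-side sketch of that regular-expression check (Python, standard library only; useful for scanning field values pulled back out of the index, since the query parsers in Solr 3.4 don't support regex matching directly):

```python
import html
import re

# Matches leftover escape sequences such as "&egrave;" or "&amp;".
ENTITY = re.compile(r"&\w{1,6};")

def leftover_entities(text: str) -> list[str]:
    """Return every entity-like escape still present in a field value."""
    return ENTITY.findall(text)

sample = "Voil&agrave; an unconverted field"
print(leftover_entities(sample))  # ['&agrave;']
print(html.unescape(sample))      # 'Voilà an unconverted field'
```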


applying FastVectorHighlighter truncation patch to solr 3.1

2011-05-17 Thread Paul
I'm having this issue with solr 3.1:

https://issues.apache.org/jira/browse/LUCENE-1824

It looks like there is a patch offered, but I can't figure out how to apply it.

What is the easiest way for me to get this fix? I'm just using the
example solr with changed conf xml files. Is there a file somewhere I
can just drop in?


auto suggestion with text_en field

2011-08-26 Thread Paul
Sorry if this has been asked before, but I couldn't seem to find it...

I've got a fairly simple index, and I'm searching on a field of type
text_en, and the results are good: I search for "computer" and I get
back hits for "computer", "computation", "computational", "computing".

I also want to create an auto suggestion drop down, so I did a query
using the field as a facet, and I get back a good, but literal, set of
suggestions. For instance, one of the suggestions is "comput", which
does actually match what I want it to, but it is ugly, since it isn't
actually a word.

As I'm thinking about it, I'm not sure what word I would like it to
return in this situation, so I'm asking how others have handled this
situation. Is it illogical to have auto complete on a text_en field?
Do I have to pick one or the other?

Thanks,


Setting up two cores in solr.xml for Solr 4.0

2012-09-04 Thread Paul
I'm trying to set up two cores that share everything except their
data. (This is for testing: I want to create a parallel index that is
used when running my testing scripts.) I thought that would be
straightforward, and according to the documentation, I thought the
following would work:

<cores adminPath="/admin/cores">
  <core name="MYCORE" instanceDir="MYCORE" />
  <core name="MYCORE_test" instanceDir="MYCORE">
    <property name="dataDir" value="MYCORE_test" />
  </core>
</cores>

I thought that would create a directory structure like this:

solr
  MYCORE
conf
data
  index
MYCORE_test
  index

But it looks like both of the cores are sharing the same index and the
MYCORE_test directory is not created. In addition, I get the following
in the log file:

INFO: [MYCORE_test] Opening new SolrCore at solr/MYCORE/,
dataDir=solr/MYCORE/data/
...
WARNING: New index directory detected: old=null new=solr/MYCORE/data/index/

What am I not understanding?


Re: Setting up two cores in solr.xml for Solr 4.0

2012-09-04 Thread Paul
By trial and error, I found that you evidently need to put that
property inline, so this version works:

<cores adminPath="/admin/cores">
  <core name="MYCORE" instanceDir="MYCORE" />
  <core name="MYCORE_test" instanceDir="MYCORE" dataDir="MYCORE_test" />
</cores>

Is the documentation here in error? http://wiki.apache.org/solr/CoreAdmin

On Tue, Sep 4, 2012 at 2:50 PM, Paul  wrote:
> I'm trying to set up two cores that share everything except their
> data. (This is for testing: I want to create a parallel index that is
> used when running my testing scripts.) I thought that would be
> straightforward, and according to the documentation, I thought the
> following would work:
>
> <cores adminPath="/admin/cores">
>   <core name="MYCORE" instanceDir="MYCORE" />
>   <core name="MYCORE_test" instanceDir="MYCORE">
>     <property name="dataDir" value="MYCORE_test" />
>   </core>
> </cores>
>
> I thought that would create a directory structure like this:
>
> solr
>   MYCORE
> conf
> data
>   index
> MYCORE_test
>   index
>
> But it looks like both of the cores are sharing the same index and the
> MYCORE_test directory is not created. In addition, I get the following
> in the log file:
>
> INFO: [MYCORE_test] Opening new SolrCore at solr/MYCORE/,
> dataDir=solr/MYCORE/data/
> ...
> WARNING: New index directory detected: old=null new=solr/MYCORE/data/index/
>
> What am I not understanding?


Re: Setting up two cores in solr.xml for Solr 4.0

2012-09-05 Thread Paul
I don't think I changed my solrconfig.xml file from the default that
was provided in the example folder for solr 4.0.

On Tue, Sep 4, 2012 at 3:40 PM, Chris Hostetter
 wrote:
>
> :   <core name="MYCORE_test" instanceDir="MYCORE" dataDir="MYCORE_test" />
>
> I'm pretty sure what you have above tells solr that core MYCORE_test
> should use the instanceDir MYCORE but ignore the <dataDir> in that
> solrconfig.xml and use the one you specified.
>
> This on the other hand...
>
> : >   <core name="MYCORE_test" instanceDir="MYCORE">
> : >     <property name="dataDir" value="MYCORE_test" />
> : >   </core>
>
> ...tells solr that the MYCORE_test SolrCore should use the instanceDir
> MYCORE, and when parsing that solrconfig.xml file it should set the
> variable ${dataDir} to be "MYCORE_test" -- but if your solrconfig.xml file
> does not ever refer to the ${dataDir} variable, it won't have any effect.
>
> so the question becomes -- what does your solrconfig.xml look like?
>
>
> -Hoss
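To make Hoss's last point concrete: the per-core ${dataDir} property only does something if solrconfig.xml actually dereferences it, e.g. with a line along these lines (the fallback value after the colon is illustrative):

```xml
<dataDir>${dataDir:data}</dataDir>
```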


Still see document after delete with commit in solr 4.0

2012-09-05 Thread Paul
I've recently upgraded to solr 4.0 from solr 3.5 and I think my delete
statement used to work, but now it doesn't seem to be deleting. I've
been experimenting around, and it seems like this should be the URL
for deleting the document with the uri of "network_24".

In a browser, I first go here:

http://localhost:8983/solr/MYCORE/update?stream.body=%3Cdelete%3E%3Cquery%3Euri%3Anetwork_24%3C%2Fquery%3E%3C%2Fdelete%3E&commit=true

I get this response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">5</int>
  </lst>
</response>

And this is in the log file:

(timestamp) org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start 
commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
(timestamp) org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@646dd60e main
(timestamp) org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
(timestamp) org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@646dd60e
main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)}
(timestamp) org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
S(timestamp) org.apache.solr.core.SolrCore registerSearcher
INFO: [MYCORE] Registered new searcher Searcher@646dd60e
main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)}
(timestamp) org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [MYCORE] webapp=/solr path=/update
params={commit=true&stream.body=<delete><query>uri:network_24</query></delete>}
{deleteByQuery=uri:network_24,commit=} 0 5

But if I then go to this URL:

http://localhost:8983/solr/MYCORE/select?q=uri%3Anetwork_24&wt=xml

I get this response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="wt">xml</str>
      <str name="q">uri:network_24</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="...">network24</str>
      <str name="uri">network_24</str>
    </doc>
  </result>
</response>

Why didn't that document disappear?
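For reference, the delete URL above is just the XML delete command URL-encoded into stream.body; a small Python sketch of how it is assembled (host and core name copied from the message, helper name invented):

```python
from urllib.parse import urlencode

def delete_by_query_url(base: str, query: str) -> str:
    """Build an /update URL carrying <delete><query>...</query></delete>."""
    body = "<delete><query>%s</query></delete>" % query
    return base + "/update?" + urlencode({"stream.body": body, "commit": "true"})

url = delete_by_query_url("http://localhost:8983/solr/MYCORE", "uri:network_24")
print(url)
```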


Re: Still see document after delete with commit in solr 4.0

2012-09-05 Thread Paul
That was exactly it. I added the following line to schema.xml and it now works.

<field name="_version_" type="long" indexed="true" stored="true"/>


On Wed, Sep 5, 2012 at 10:13 AM, Jack Krupansky  wrote:
> Check to make sure that you are not stumbling into SOLR-3432: "deleteByQuery
> silently ignored if updateLog is enabled, but {{_version_}} field does not
> exist in schema".
>
> See:
> https://issues.apache.org/jira/browse/SOLR-3432
>
> -- Jack Krupansky
>
> -Original Message- From: Paul
> Sent: Wednesday, September 05, 2012 10:05 AM
> To: solr-user
> Subject: Still see document after delete with commit in solr 4.0
>
>
> I've recently upgraded to solr 4.0 from solr 3.5 and I think my delete
> statement used to work, but now it doesn't seem to be deleting. I've
> been experimenting around, and it seems like this should be the URL
> for deleting the document with the uri of "network_24".
>
> In a browser, I first go here:
>
> http://localhost:8983/solr/MYCORE/update?stream.body=%3Cdelete%3E%3Cquery%3Euri%3Anetwork_24%3C%2Fquery%3E%3C%2Fdelete%3E&commit=true
>
> I get this response:
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">5</int>
>   </lst>
> </response>
>
> And this is in the log file:
>
> (timestamp) org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: start
> commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
> (timestamp) org.apache.solr.search.SolrIndexSearcher <init>
> INFO: Opening Searcher@646dd60e main
> (timestamp) org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: end_commit_flush
> (timestamp) org.apache.solr.core.QuerySenderListener newSearcher
> INFO: QuerySenderListener sending requests to Searcher@646dd60e
> main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)}
> (timestamp) org.apache.solr.core.QuerySenderListener newSearcher
> INFO: QuerySenderListener done.
> S(timestamp) org.apache.solr.core.SolrCore registerSearcher
> INFO: [MYCORE] Registered new searcher Searcher@646dd60e
> main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)}
> (timestamp) org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: [MYCORE] webapp=/solr path=/update
> params={commit=true&stream.body=<delete><query>uri:network_24</query></delete>}
> {deleteByQuery=uri:network_24,commit=} 0 5
>
> But if I then go to this URL:
>
> http://localhost:8983/solr/MYCORE/select?q=uri%3Anetwork_24&wt=xml
>
> I get this response:
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">1</int>
>     <lst name="params">
>       <str name="wt">xml</str>
>       <str name="q">uri:network_24</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="1" start="0">
>     <doc>
>       <str name="...">network24</str>
>       <str name="uri">network_24</str>
>     </doc>
>   </result>
> </response>
>
> Why didn't that document disappear?


Re: Still see document after delete with commit in solr 4.0

2012-09-05 Thread Paul
Actually, I didn't technically "upgrade". I downloaded the new
version, grabbed the example, and pasted in the fields from my schema
into the new one. So the only two files I changed from the example are
schema.xml and solr.xml.

Then I reindexed everything from scratch so there was no old index
involved, either.

On Wed, Sep 5, 2012 at 2:42 PM, Chris Hostetter
 wrote:
>
> : That was exactly it. I added the following line to schema.xml and it
> : now works.
> :
> : <field name="_version_" type="long" indexed="true" stored="true"/>
>
> Just to be clear: how exactly did you "upgraded to solr 4.0 from solr 3.5"
> -- did you throw out your old solrconfig.xml and use the example
> solrconfig.xml from 4.0, but keep your 3.5 schema.xml?  Do you in fact
> have an <updateLog/> in your solrconfig.xml?
>
> (if so: then this is all known as part of SOLR-3432, and won't affect any
> users of 4.0-final -- but i want to be absolutely sure there isn't some
> other edge case of this bug)
>
>
> -Hoss


facet by "in the past" and "in the future"

2012-10-18 Thread Paul
I have some documents that contain a date field. I'd like to set up a
facet that groups the dates in two buckets: 1) before today, 2) today
and in the future.

It seems like I should be using range faceting, but I don't see how to
set up the parameters. Is there another way to get what I want?

The way my user interface will look is:

Status
--
[x] Open (26)
[ ] Closed (127)

Where "open" will be all the documents that don't have a date in the past.

Thanks!


Re: facet by "in the past" and "in the future"

2012-10-18 Thread Paul
That is perfect! Thanks. I never would have stumbled onto that.

On Thu, Oct 18, 2012 at 5:40 PM, Michael Ryan  wrote:
> This should do it:
> facet=true&facet.query=yourDateField:([* TO 
> NOW/DAY-1MILLI])&facet.query=yourDateField:([NOW/DAY TO *])
>
> -Michael
>
> -Original Message-
> From: Paul [mailto:p...@nines.org]
> Sent: Thursday, October 18, 2012 5:28 PM
> To: solr-user@lucene.apache.org
> Subject: facet by "in the past" and "in the future"
>
> I have some documents that contain a date field. I'd like to set up a
> facet that groups the dates in two buckets: 1) before today, 2) today
> and in the future.
>
> It seems like I should be using range faceting, but I don't see how to
> set up the parameters. Is there another way to get what I want?
>
> The way my user interface will look is:
>
> Status
> --
> [x] Open (26)
> [ ] Closed (127)
>
> Where "open" will be all the documents that don't have a date in the past.
>
> Thanks!
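Spelled out as request parameters, Michael's two-bucket suggestion looks like this (the field name due_date is a placeholder for "yourDateField"):

```python
from urllib.parse import urlencode

params = [
    ("q", "*:*"),
    ("facet", "true"),
    # Everything strictly before today -> the "Closed" bucket.
    ("facet.query", "due_date:[* TO NOW/DAY-1MILLI]"),
    # Today and the future -> the "Open" bucket.
    ("facet.query", "due_date:[NOW/DAY TO *]"),
]
print(urlencode(params))
```

NOW/DAY is Solr date math for "midnight today", so the two ranges partition the timeline with no gap and no overlap.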


Re: If you could have one feature in Solr...

2010-02-24 Thread Paul
Limit the number of results when the results are sorted.

In other words, if the results are sorted by name and there are 10,000
results, then there will be items of low relevancy mixed in with the
results and it is hard for the user to find the relevant ones. If I
could say, "give me no more than 200 results, sorted by name", then I
want the most relevant 200 results. (It would be ok to be
approximately 200. If there are documents that are the same relevancy,
then a few more than 200 would be acceptable.)
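Done client-side with today's Solr, the requested behaviour amounts to: fetch the N most relevant hits, then re-sort just that subset. A sketch with invented data:

```python
def top_n_sorted_by_name(hits, n=200):
    """hits: list of (score, name) pairs.

    Keep only the n highest-scoring hits, then order that subset by name,
    so low-relevancy documents never dilute the sorted list.
    """
    best = sorted(hits, key=lambda h: h[0], reverse=True)[:n]
    return sorted(best, key=lambda h: h[1])

hits = [(0.9, "zebra"), (0.1, "apple"), (0.8, "mango"), (0.05, "aardvark")]
# With n=3 the low-relevancy "aardvark" is dropped before the name sort.
print(top_n_sorted_by_name(hits, n=3))  # [(0.1, 'apple'), (0.8, 'mango'), (0.9, 'zebra')]
```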

On Wed, Feb 24, 2010 at 8:42 AM, Grant Ingersoll  wrote:
> What would it be?
>


Diagnosing solr timeout

2010-06-09 Thread Paul
Hi all,

In my app, it seems like solr has become slower over time. The index
has grown a bit, and there are probably a few more people using the
site, but the changes are not drastic.

I notice that when a solr search is made, the amount of cpu and ram
spike precipitously.

I notice in the solr log, a bunch of entries in the same second that end in:

status=0 QTime=212
status=0 QTime=96
status=0 QTime=44
status=0 QTime=276
status=0 QTime=8552
status=0 QTime=16
status=0 QTime=20
status=0 QTime=56

and then:

status=0 QTime=315919
status=0 QTime=325071

My questions: How do I figure out what to fix? Do I need to start java
with more memory? How do I tell what is the correct amount of memory
to use? Is there something particularly inefficient about something
else in my configuration, or the way I'm formulating the solr request,
and how would I narrow down what it could be? I can't tell, but it
seems like it happens after solr has been running unattended for a
little while. Should I have a cron job that restarts solr every day?
Could the solr process be starved by something else on the server
(although -- the only other thing that is particularly running is
apache/passenger/rails app)?

In other words, I'm at a total loss about how to fix this.

Thanks!

P.S. In case this helps, here's the exact log entry for the first item
that failed:

Jun 9, 2010 1:02:52 PM org.apache.solr.core.SolrCore execute
INFO: [resources] webapp=/solr path=/select
params={hl.fragsize=600&facet.missing=true&facet=false&facet.mincount=1&ids=http://pm.nlx.com/xtf/view?docId%3Dshelley/shelley.04.xml;chunk.id%3Ddiv.ww.shelleyworks.v4.44,http://pm.nlx.com/xtf/view?docId%3Dshelley/shelley.06.xml;chunk.id%3Ddiv.ww.shelleyworks.v6.67,http://pm.nlx.com/xtf/view?docId%3Dtennyson_c/tennyson_c.02.xml;chunk.id%3Ddiv.tennyson.v2.1115,http://pm.nlx.com/xtf/view?docId%3Dmarx/marx.39.xml;chunk.id%3Ddiv.marx.engels.39.325,http://pm.nlx.com/xtf/view?docId%3Dshelley_j/shelley_j.01.xml;chunk.id%3Ddiv.ww.shelley.journals.v1.80,http://pm.nlx.com/xtf/view?docId%3Deliot/eliot.01.xml;chunk.id%3Ddiv.eliot.novels.bede.116,http://pm.nlx.com/xtf/view?docId%3Deliot/eliot.01.xml;chunk.id%3Ddiv.eliot.novels.bede.115,http://pm.nlx.com/xtf/view?docId%3Deliot/eliot.01.xml;chunk.id%3Ddiv.eliot.novels.bede.75,http://pm.nlx.com/xtf/view?docId%3Deliot/eliot.01.xml;chunk.id%3Ddiv.eliot.novels.bede.76,http://pm.nlx.com/xtf/view?docId%3Demerson/emerson.05.xml;chunk.id%3Dralph.waldo.v5.d083,http://pm.nlx.com/xtf/view?docId%3Dshelley/shelley.04.xml;chunk.id%3Ddiv.ww.shelleyworks.v4.31,http://pm.nlx.com/xtf/view?docId%3Dshelley_j/shelley_j.01.xml;chunk.id%3Ddiv.ww.shelley.journals.v1.88,http://pm.nlx.com/xtf/view?docId%3Deliot/eliot.03.xml;chunk.id%3Ddiv.eliot.romola.48&facet.limit=-1&hl.fl=text&hl.maxAnalyzedChars=512000&wt=javabin&hl=true&rows=30&version=1&fl=uri,archive,date_label,genre,source,image,thumbnail,title,alternative,url,role_ART,role_AUT,role_EDT,role_PBL,role_TRL,role_EGR,role_ETR,role_CRE,freeculture,is_ocr,federation,has_full_text,source_xml,uri&start=0&q=(*:*+AND+(life)+AND+(death)+AND+(of)+AND+(jason)+AND+federation:NINES)+OR+(*:*+AND+(life)+AND+(death)+AND+(of)+AND+(jason)+AND+federation:NINES+-genre:Citation)^5&facet.field=genre&facet.field=archive&facet.field=freeculture&facet.field=has_full_text&facet.field=federation&isShard=true&fq=year:"1882"}
status=0 QTime=315919


Re: Diagnosing solr timeout

2010-06-09 Thread Paul
>Have you looked at the garbage collector statistics? I've experienced this 
>kind of issues in the past
and I was getting huge spikes when the GC was doing its job.

I haven't, and I'm not sure what a good way to monitor this is. The
problem occurs maybe once a week on a server. Should I run jstat the
whole time and redirect the output to a log file? Is there another way
to get that info?

Also, I was suspecting GC myself. So, if it is the problem, what do I
do about it? It seems like increasing RAM might make the problem worse
because it would wait longer to GC, then it would have more to do.


Enabling SSL on SOLR breaks my SQL Server connection

2019-05-23 Thread Paul
Hi,

I have enabled HTTPS on my SOLR server and it works fine over HTTPS for
interaction with SOLR via the browser such as for data queries and
management actions.

However, I now get an error when attempting to retrieve data from the SQL
server for Indexing. The JDBC connection string has the parameters to manage
SQL connections that are encrypted which has been setup and works fine when
SSL is not specified for SOLR. When enabling SSL for SOLR client connections
how do I enable it just for clients making requests into SOLR and not change
any of the outgoing stuff which is already using encrypted comms, ie to SQL
Server.

The error message I get is below:

...
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: SELECT * from MYVIEW Processing Document # 1
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:327)
at
org.apache.solr.handler.dataimport.JdbcDataSource.createResultSetIterator(JdbcDataSource.java:288)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:283)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:52)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
at
org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
at java.lang.Thread.run(Unknown Source)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The driver could
not establish a secure connection to SQL Server by using Secure Sockets
Layer (SSL) encryption. Error: "sun.security.validator.ValidatorException:
PKIX path building failed:
sun.security.provider.certpath.SunCertPathBuilderException: unable to find
valid certification path to requested target".
ClientConnectionId:bb3e9ce0-8d93-4514-98ed-f19938b91e96
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:2826)
at com.microsoft.sqlserver.jdbc.TDSChannel.enableSSL(IOBuffer.java:1829)
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2391)
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:2042)
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1889)
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:1120)
at
com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:700)
at
org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:192)
at
org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:172)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:528)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:317)
... 14 more
Caused by: javax.net.ssl.SSLHandshakeException:
sun.security.validator.ValidatorException: PKIX path building failed:
sun.security.provider.certpath.SunCertPathBuilderException: unable to find
valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Unknown Source)
at sun.security.ssl.SSLSocketImpl.fatal(Unknown Source)
at sun.security.ssl.Handshaker.fatalSE(Unknown Source)
at sun.security.ssl.Handshaker.fatalSE(Unknown Source)
at sun.security.ssl.ClientHandshaker.serverCertificate(Unknown Source)
at sun.security.ssl.ClientHandshaker.processMessage(Unknown Source)
at sun.security.ssl.Handshaker.processLoop(Unknown Source)
at sun.security.ssl.Handshaker.process_record(Unknown Source)
at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(Unknown Source)
at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
at com.microsoft.sqlse

Re: Enabling SSL on SOLR breaks my SQL Server connection

2019-05-23 Thread Paul
Thanks for the reply Shawn.

What I was asking is whether there is an option to exclude the comms to SQL
from SOLR managed encryption as the JDBC driver manages the connection and
SOLR is acting as the Client in this instance and is already using encrypted
comms via the connection string parameters.

Cheers
Paul


On 5/23/2019 5:45 AM, Paul wrote: 
> unable to find 
> valid certification path to requested target 

This seems to be the root of your problem with the connection to SQL server. 

If I have all the context right, Java is saying it can't validate the 
certificate returned by the SQL server. 

This page: 

https://docs.microsoft.com/en-us/sql/connect/jdbc/connecting-with-ssl-encryption?view=sql-server-2017

Talks about a "trustServerCertificate" property you can set to "true" in the 
JDBC URL that will cause Microsoft's JDBC driver to NOT validate the 
server certificate. 

Alternatively, if the SQL server is sending all the necessary chain 
certificates, you could place the root cert for the CA that issued the 
SQL Server certificate in the Java keystore that you're using for SSL on 
Solr, that would probably also fix it -- because then the SQL cert would 
validate. 

Thanks, 
Shawn 



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Enabling SSL on SOLR breaks my SQL Server connection

2019-05-24 Thread Paul



Ta - it works if I set trustServerCertificate=true so for now that will do for
test.
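
For reference, a sketch of the resulting JDBC connection string (host and
database names are placeholders; `encrypt` and `trustServerCertificate` are
the driver's property names):

```
jdbc:sqlserver://dbhost:1433;databaseName=mydb;encrypt=true;trustServerCertificate=true
```

Once a proper trust store is in place, trustServerCertificate should go back
to false so the server certificate is actually validated.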





Re: Enabling SSL on SOLR breaks my SQL Server connection

2019-05-31 Thread Paul


SOLVED: Now implemented with a bespoke trust store set up for SOLR ... 







Configure mutual TLS 1.2 to secure SOLR

2019-06-07 Thread Paul
Hi,

Can someone please outline how to use mutual TLS 1.2 with SOLR. Or, point me
at docs/tutorials/other where I can read up further on this (version
currently onsite is SOLR 7.6).

Thanks
Paul
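
A sketch of the solr.in.sh settings involved (paths and passwords are
placeholders); on top of an existing keystore/truststore setup,
SOLR_SSL_NEED_CLIENT_AUTH=true is what makes the TLS mutual by requiring
client certificates:

```
SOLR_SSL_KEY_STORE=/path/to/solr-ssl.keystore.jks
SOLR_SSL_KEY_STORE_PASSWORD=secret
SOLR_SSL_TRUST_STORE=/path/to/solr-ssl.truststore.jks
SOLR_SSL_TRUST_STORE_PASSWORD=secret
SOLR_SSL_NEED_CLIENT_AUTH=true
SOLR_SSL_WANT_CLIENT_AUTH=false
```

Clients then have to present a certificate signed by something in the trust
store; limiting the accepted protocol versions to TLS 1.2 only is a separate
JVM/Jetty-level setting.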





Basic Authentication in Standalone Configuration ?

2019-06-10 Thread Paul
Hi,

I am not sure if Basic Authentication is possible in SOLR standalone
configuration (version 7.6). I have a working SOLR installation using SSL.
When following the docs I add options into solr.in.cmd, as in:

SOLR_AUTH_TYPE="basic"
SOLR_AUTHENTICATION_OPTS="-Dbasicauth=solr:SolrRocks"

When I go to start SOLR I get:

'SOLR_AUTH_TYPE' is not recognized as an internal or external command,
operable program or batch file.
'SOLR_AUTHENTICATION_OPTS' is not recognized as an internal or external
command, operable program or batch file.

This is as per
https://www.apache.si/lucene/solr/ref-guide/apache-solr-ref-guide-7.7.pdf
and in there it refers to '*If you are using SolrCloud*, you must upload
security.json to ZooKeeper. You can use this example command, ensuring that
the ZooKeeper port is correct '.

I am not using SolrCloud   








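
The assignments above are solr.in.sh (bash) syntax; solr.in.cmd on Windows
needs the batch `set` form, which is why cmd.exe reports them as unrecognized
commands. A sketch:

```
REM solr.in.cmd (Windows batch syntax; "SolrRocks" is the stock example password)
set SOLR_AUTH_TYPE=basic
set SOLR_AUTHENTICATION_OPTS="-Dbasicauth=solr:SolrRocks"
```

Also worth noting: in standalone mode the security.json file goes directly in
SOLR_HOME on disk; only SolrCloud uploads it to ZooKeeper.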


Re: Security Problems

2015-12-16 Thread Noble Paul
I don't think this behavior is intuitive. It is very easy to misunderstand.

I would rather just add a flag to "authentication" plugin section
which says "blockUnauthenticated" : true

which means all unauthenticated requests must be blocked.




On Tue, Dec 15, 2015 at 7:09 PM, Jan Høydahl  wrote:
> Yes, that’s why I believe it should be:
> 1) if only authentication is enabled, all users must authenticate and all 
> authenticated users can do anything.
> 2) if authz is enabled, then all users must still authenticate, and can by 
> default do nothing at all, unless assigned proper roles
> 3) if a user is assigned the default “read” rule, and a collection adds a 
> custom “/myselect” handler, that one is unavailable until the user gets it 
> assigned
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
>> 14. des. 2015 kl. 14.15 skrev Noble Paul :
>>
>> ". If all paths were closed by default, forgetting to configure a path
>> would not result in a security breach like today."
>>
>> But it will still mean that unauthorized users are able to access,
>> like guest being able to post to "/update". Just authenticating is not
>> enough without proper authorization
>>
>> On Mon, Dec 14, 2015 at 3:59 PM, Jan Høydahl  wrote:
>>>> 1) "read" should cover all the paths
>>>
>>> This is very fragile. If all paths were closed by default, forgetting to 
>>> configure a path would not result in a security breach like today.
>>>
>>> /Jan
>>
>>
>>
>> --
>> -
>> Noble Paul
>



-- 
-
Noble Paul


Re: Security Problems

2015-12-16 Thread Noble Paul
I have opened https://issues.apache.org/jira/browse/SOLR-8429

On Wed, Dec 16, 2015 at 9:32 PM, Noble Paul  wrote:
> I don't think this behavior is intuitive. It is very easy to misunderstand.
>
> I would rather just add a flag to "authentication" plugin section
> which says "blockUnauthenticated" : true
>
> which means all unauthenticated requests must be blocked.
>
>
>
>
> On Tue, Dec 15, 2015 at 7:09 PM, Jan Høydahl  wrote:
>> Yes, that’s why I believe it should be:
>> 1) if only authentication is enabled, all users must authenticate and all 
>> authenticated users can do anything.
>> 2) if authz is enabled, then all users must still authenticate, and can by 
>> default do nothing at all, unless assigned proper roles
>> 3) if a user is assigned the default “read” rule, and a collection adds a 
>> custom “/myselect” handler, that one is unavailable until the user gets it 
>> assigned
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>>> 14. des. 2015 kl. 14.15 skrev Noble Paul :
>>>
>>> ". If all paths were closed by default, forgetting to configure a path
>>> would not result in a security breach like today."
>>>
>>> But it will still mean that unauthorized users are able to access,
>>> like guest being able to post to "/update". Just authenticating is not
>>> enough without proper authorization
>>>
>>> On Mon, Dec 14, 2015 at 3:59 PM, Jan Høydahl  wrote:
>>>>> 1) "read" should cover all the paths
>>>>
>>>> This is very fragile. If all paths were closed by default, forgetting to 
>>>> configure a path would not result in a security breach like today.
>>>>
>>>> /Jan
>>>
>>>
>>>
>>> --
>>> -
>>> Noble Paul
>>
>
>
>
> --
> -
> Noble Paul



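
The flag proposed here landed (via SOLR-8429) under the name "blockUnknown".
A sketch of a security.json using it, following the ref guide's stock example
(the credentials hash shown is the published one for the solr/SolrRocks
pair):

```json
{
  "authentication": {
    "class": "solr.BasicAuthPlugin",
    "blockUnknown": true,
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": { "solr": "admin" },
    "permissions": [ { "name": "security-edit", "role": "admin" } ]
  }
}
```

With blockUnknown set to true, every request must authenticate;
authorization rules then decide what each role may do.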


Re: API accessible without authentication even though Basic Auth Plugin is enabled

2015-12-22 Thread Noble Paul
A 5.3.2 release is coming up which will back port the fixes introduced in
5.4
On Dec 17, 2015 10:25 PM, "tine-2"  wrote:

> Noble Paul നോബിള്‍  नोब्ळ् wrote
> > It works as designed.
> >
> > Protect the read path [...]
>
> Works as described in 5.4.0, didn't work in 5.3.1; see
> https://issues.apache.org/jira/browse/SOLR-8408
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/API-accessible-without-authentication-even-though-Basic-Auth-Plugin-is-enabled-tp4244940p4246099.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr server not starting

2016-01-07 Thread Paul Hoffman
On Wed, Jan 06, 2016 at 05:11:06PM +0100, agonn Qurdina wrote:
> Hi,
> 
> I am using Solr server with Echoprint service 
> (https://github.com/echonest/echoprint-server). The first time I started
>  it everything worked perfectly. This is the way I started it:
> 
> java -Dsolr.solr.home=/home/echoprint-server/solr/solr/solr/ 
> -Djava.awt.headless=true -Xmx2048m -Xms2048m -jar start.jar
> 
> Then I stopped it and I cannot start it anymore as it gets stuck at the 3rd 
> row of execution:
> 
> 2016-01-06 11:04:19.030::INFO:  Logging to STDERR via 
> org.mortbay.log.StdErrLog
> 2016-01-06 11:04:19.165::INFO:  jetty-6.1.3
> 2016-01-06 11:04:19.231::INFO:  Extract 
> jar:file:/home/echoprint-server/solr/solr/webapps/solr.war!/ to 
> /tmp/Jetty_0_0_0_0_8502_solr.war__solr__-rnc92a/webapp
> 
> It does not continue to execute anymore. I check if it is running in the
>  processes list and it turns out it is NOT. Please help me to solve this
>  problem!
> 
> Best regards,
> 
> Agon

This could be a permissions problem -- for example, perhaps you started 
it as root the first time and are now attempting to start it as some 
other user.

Paul.

-- 
Paul Hoffman 
Systems Librarian
Fenway Libraries Online
c/o Wentworth Institute of Technology
550 Huntington Ave.
Boston, MA 02115
(617) 442-2384 (FLO main number)


Re: URI is too long

2016-01-31 Thread Paul Libbrecht
How about using POST?

paul

> Salman Ansari <mailto:salman.rah...@gmail.com>
> 31 January 2016 at 15:20
> Hi,
>
> I am building a long query containing multiple ORs between query terms. I
> started to receive the following exception:
>
> The remote server returned an error: (414) Request-URI Too Long. Any idea
> what is the limit of the URL in Solr? Moreover, as a solution I was
> thinking of chunking the query into multiple requests but I was wondering
> if anyone has a better approach?
>
> Regards,
> Salman
>
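
A sketch of the POST approach (the core name "mycore" is hypothetical; the
curl call itself is commented out since it needs a running Solr). Solr's
/select handler accepts the same parameters as a form-encoded POST body,
which sidesteps the container's URL length limit (Jetty's default request
header limit is about 8 KB):

```shell
# Build a long query; imagine thousands of OR'd terms here.
Q='id:(101 OR 102 OR 103)'
echo "q=$Q&rows=10"
# Send it as a POST body instead of in the URL:
# curl 'http://localhost:8983/solr/mycore/select' \
#      --data-urlencode "q=$Q" --data-urlencode 'rows=10'
```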



Logging request times

2016-02-10 Thread McCallick, Paul
We’re trying to fine tune our query and ingestion performance and would like to 
get more metrics out of SOLR around this.  We are capturing the standard logs 
as well as the jetty request logs.  The standard logs get us QTime, which is 
not a good indication of how long the actual request took to process.  The 
Jetty request logs only show requests between nodes.  I can’t seem to find the 
client requests in there.

I’d like to start tracking:

  *   each request to index a document (or batch of documents) and the time it 
took.
  *   Each request to execute a query and the time it took.

Thanks,

Paul McCallick
Sr Manager Information Technology
eCommerce Foundation



Re: Custom auth plugin not loaded in SolrCloud

2016-02-11 Thread Noble Paul
Yes, the runtime lib feature cannot be used for loading container-level
plugins yet. Eventually it should be. You can open a ticket.

On Mon, Jan 4, 2016 at 1:07 AM, tine-2  wrote:
> Hi,
>
> are there any news on this? Was anyone able to get it to work?
>
> Cheers,
>
> tine
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Custom-auth-plugin-not-loaded-in-SolrCloud-tp4245670p4248340.html
> Sent from the Solr - User mailing list archive at Nabble.com.





Re: Logging request times

2016-02-13 Thread McCallick, Paul
I stand corrected.  The Jetty request logs do indeed contain ALL of the 
traffic, both from other nodes and from query requests.

For the record, it is valuable to capture the time at the client AND from the 
server to track latency or compression issues.




On 2/11/16, 8:13 AM, "Shawn Heisey"  wrote:

>On 2/10/2016 10:33 AM, McCallick, Paul wrote:
>> We’re trying to fine tune our query and ingestion performance and would like 
>> to get more metrics out of SOLR around this.  We are capturing the standard 
>> logs as well as the jetty request logs.  The standard logs get us QTime, 
>> which is not a good indication of how long the actual request took to 
>> process.  The Jetty request logs only show requests between nodes.  I can’t 
>> seem to find the client requests in there.
>>
>> I’d like to start tracking:
>>
>>   *   each request to index a document (or batch of documents) and the time 
>> it took.
>>   *   Each request to execute a query and the time it took.
>
>The Jetty request log will usually include the IP address of the client
>making the request.  If IP addresses are included in your log and you
>aren't seeing anything from your client address(es), perhaps those
>requests are being sent to another node.
>
>Logging elapsed time is also something that the clients can do.  If the
>client is using SolrJ, every response object has a "getElapsedTime"
>method (and also "getQTime") that would allow the client program to log
>the elapsed time without doing its own calculation.  Or the client
>program could calculate the elapsed time using whatever facilities are
>available in the relevant language.
>
>Thanks,
>Shawn
>


Adding nodes

2016-02-13 Thread McCallick, Paul
I’d like to verify the following:

 - When creating a new collection, SOLRCloud will use all available nodes for 
the collection, adding cores to each.  This assumes that you do not specify a 
replicationFactor.

 - When adding new nodes to the cluster AFTER the collection is created, one 
must use the core admin api to add the node to the collection.

I would really like to see the second case behave more like the first.  If I 
add a node to the cluster, it is automatically used as a replica for existing 
clusters without my having to do so.  This would really simplify things.







Re: Highlight brings the content from the first pages of pdf

2016-02-14 Thread Paul Libbrecht
This looks like the stored content is shortened. Can it be?
Can you see that inside the docs?

paul

> Evert R. <mailto:evert.ra...@gmail.com>
> 14 February 2016 at 11:26
> Hi There,
>
> I have a situation where started a techproducts, without any modification,
> post a pdf file. When searching as:
>
> q=text:search_word
> hl=true
> hl.fl=content
>
> It show the highlight accordingly! =)
>
> BUT... *if the "search_word" is after the first pages* in my pdf file,
> such
> as page 15...
>
> It simply *does not show* *the HIGHLIGHT*...
>
> Does anyone has faced this situation before?
>
>
> Thanks!
>
>
> *--Evert*
>



Re: Adding nodes

2016-02-14 Thread McCallick, Paul
Then what is the suggested way to add a new node to a collection via the apis?  
I  am specifically thinking of autoscale scenarios where a node has gone down 
or more nodes are needed to handle load. 

Note that the ADDREPLICA endpoint requires a shard name, which puts the onus of 
how to scale out on the user. This can be challenging in an autoscale scenario. 

Thanks,
Paul

> On Feb 14, 2016, at 12:25 AM, Shawn Heisey  wrote:
> 
>> On 2/13/2016 6:01 PM, McCallick, Paul wrote:
>> - When creating a new collection, SOLRCloud will use all available nodes for 
>> the collection, adding cores to each.  This assumes that you do not specify 
>> a replicationFactor.
> 
> The number of nodes that will be used is numShards multiplied by
> replicationFactor.  The default value for replicationFactor is 1.  If
> you do not specify numShards, there is no default -- the CREATE call
> will fail.  The value of maxShardsPerNode can also affect the overall
> result.
> 
>> - When adding new nodes to the cluster AFTER the collection is created, one 
>> must use the core admin api to add the node to the collection.
> 
> Using the CoreAdmin API is strongly discouraged when running SolrCloud. 
> It works, but it is an expert API when in cloud mode, and can cause
> serious problems if not used correctly.  Instead, use the Collections
> API.  It can handle all normal maintenance needs.
> 
>> I would really like to see the second case behave more like the first.  If I 
>> add a node to the cluster, it is automatically used as a replica for 
>> existing clusters without my having to do so.  This would really simplify 
>> things.
> 
> I've added a FAQ entry to address why this is a bad idea.
> 
> https://wiki.apache.org/solr/FAQ#Why_doesn.27t_SolrCloud_automatically_create_replicas_when_I_add_nodes.3F
> 
> Thanks,
> Shawn
> 


Re: Adding nodes

2016-02-14 Thread McCallick, Paul
Hi all,


This doesn’t really answer the following question:

What is the suggested way to add a new node to a collection via the
apis?  I  am specifically thinking of autoscale scenarios where a node has
gone down or more nodes are needed to handle load.


The coreadmin api makes this easy.  The collections api (ADDREPLICA), makes 
this very difficult.


On 2/14/16, 8:19 AM, "Susheel Kumar"  wrote:

>Hi Paul,
>
>Shawn is referring to use Collections API
>https://cwiki.apache.org/confluence/display/solr/Collections+API  than Core
>Admin API https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API
>for SolrCloud.
>
>Hope that clarifies and you mentioned about ADDREPLICA which is the
>collections API, so you are on right track.
>
>Thanks,
>Susheel
>
>
>
>On Sun, Feb 14, 2016 at 10:51 AM, McCallick, Paul <
>paul.e.mccall...@nordstrom.com> wrote:
>
>> Then what is the suggested way to add a new node to a collection via the
>> apis?  I  am specifically thinking of autoscale scenarios where a node has
>> gone down or more nodes are needed to handle load.
>>
>> Note that the ADDREPLICA endpoint requires a shard name, which puts the
>> onus of how to scale out on the user. This can be challenging in an
>> autoscale scenario.
>>
>> Thanks,
>> Paul
>>
>> > On Feb 14, 2016, at 12:25 AM, Shawn Heisey  wrote:
>> >
>> >> On 2/13/2016 6:01 PM, McCallick, Paul wrote:
>> >> - When creating a new collection, SOLRCloud will use all available
>> nodes for the collection, adding cores to each.  This assumes that you do
>> not specify a replicationFactor.
>> >
>> > The number of nodes that will be used is numShards multiplied by
>> > replicationFactor.  The default value for replicationFactor is 1.  If
>> > you do not specify numShards, there is no default -- the CREATE call
>> > will fail.  The value of maxShardsPerNode can also affect the overall
>> > result.
>> >
>> >> - When adding new nodes to the cluster AFTER the collection is created,
>> one must use the core admin api to add the node to the collection.
>> >
>> > Using the CoreAdmin API is strongly discouraged when running SolrCloud.
>> > It works, but it is an expert API when in cloud mode, and can cause
>> > serious problems if not used correctly.  Instead, use the Collections
>> > API.  It can handle all normal maintenance needs.
>> >
>> >> I would really like to see the second case behave more like the first.
>> If I add a node to the cluster, it is automatically used as a replica for
>> existing clusters without my having to do so.  This would really simplify
>> things.
>> >
>> > I've added a FAQ entry to address why this is a bad idea.
>> >
>> >
>> https://wiki.apache.org/solr/FAQ#Why_doesn.27t_SolrCloud_automatically_create_replicas_when_I_add_nodes.3F
>> >
>> > Thanks,
>> > Shawn
>> >
>>


Re: Adding nodes

2016-02-14 Thread McCallick, Paul
These are excellent questions and give me a good sense of why you suggest using 
the collections api.

In our case we have 8 shards of product data with a even distribution of data 
per shard, no hot spots. We have very different load at different points in the 
year (cyber monday), and we tend to have very little traffic at night. I'm 
thinking of two use cases:

1) we are seeing increased latency due to load and want to add 8 more replicas 
to handle the query volume.  Once the volume subsides, we'd remove the nodes. 

2) we lose a node due to some unexpected failure (ec2 tends to do this). We 
want auto scaling to detect the failure and add a node to replace the failed 
one. 

In both cases the core api makes it easy. It adds nodes to the shards evenly. 
Otherwise we have to write a fairly involved script that is subject to race 
conditions to determine which shard to add nodes to. 

Let me know if I'm making dangerous or uninformed assumptions, as I'm new to 
solr. 

Thanks,
Paul

> On Feb 14, 2016, at 10:35 AM, Susheel Kumar  wrote:
> 
> Hi Paul,
> 
> 
> For auto-scaling, it depends on how you plan to design it and what/how
> you want to scale. Which scenario do you think makes the coreadmin API
> easier to use for a sharded SolrCloud environment?
> 
> Isn't it that in a sharded environment (assume 3 shards A, B & C), if
> shard B is carrying a higher load, you would want to add a replica for
> shard B to distribute the load; or, if a particular shard's replica goes
> down, you would want to add another replica back for that shard, in
> which case ADDREPLICA requires a shard name?
> 
> Can you describe your scenario / provide more detail?
> 
> Thanks,
> Susheel
> 
> 
> 
> On Sun, Feb 14, 2016 at 11:51 AM, McCallick, Paul <
> paul.e.mccall...@nordstrom.com> wrote:
> 
>> Hi all,
>> 
>> 
>> This doesn’t really answer the following question:
>> 
>> What is the suggested way to add a new node to a collection via the
>> apis?  I  am specifically thinking of autoscale scenarios where a node has
>> gone down or more nodes are needed to handle load.
>> 
>> 
>> The coreadmin api makes this easy.  The collections api (ADDREPLICA),
>> makes this very difficult.
>> 
>> 
>>> On 2/14/16, 8:19 AM, "Susheel Kumar"  wrote:
>>> 
>>> Hi Paul,
>>> 
>>> Shawn is referring to use Collections API
>>> https://cwiki.apache.org/confluence/display/solr/Collections+API  than
>> Core
>>> Admin API https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API
>>> for SolrCloud.
>>> 
>>> Hope that clarifies and you mentioned about ADDREPLICA which is the
>>> collections API, so you are on right track.
>>> 
>>> Thanks,
>>> Susheel
>>> 
>>> 
>>> 
>>> On Sun, Feb 14, 2016 at 10:51 AM, McCallick, Paul <
>>> paul.e.mccall...@nordstrom.com> wrote:
>>> 
>>>> Then what is the suggested way to add a new node to a collection via the
>>>> apis?  I  am specifically thinking of autoscale scenarios where a node
>> has
>>>> gone down or more nodes are needed to handle load.
>>>> 
>>>> Note that the ADDREPLICA endpoint requires a shard name, which puts the
>>>> onus of how to scale out on the user. This can be challenging in an
>>>> autoscale scenario.
>>>> 
>>>> Thanks,
>>>> Paul
>>>> 
>>>>> On Feb 14, 2016, at 12:25 AM, Shawn Heisey 
>> wrote:
>>>>> 
>>>>>> On 2/13/2016 6:01 PM, McCallick, Paul wrote:
>>>>>> - When creating a new collection, SOLRCloud will use all available
>>>> nodes for the collection, adding cores to each.  This assumes that you
>> do
>>>> not specify a replicationFactor.
>>>>> 
>>>>> The number of nodes that will be used is numShards multiplied by
>>>>> replicationFactor.  The default value for replicationFactor is 1.  If
>>>>> you do not specify numShards, there is no default -- the CREATE call
>>>>> will fail.  The value of maxShardsPerNode can also affect the overall
>>>>> result.
>>>>> 
>>>>>> - When adding new nodes to the cluster AFTER the collection is
>> created,
>>>> one must use the core admin api to add the node to the collection.
>>>>> 
>>>>> Using the CoreAdmin API is strongly discouraged when running
>> SolrCloud.
>>>>> It works, but it is an expert API when in cloud mode, and can cause
>>>>> serious problems if not used correctly.  Instead, use the Collections
>>>>> API.  It can handle all normal maintenance needs.
>>>>> 
>>>>>> I would really like to see the second case behave more like the
>> first.
>>>> If I add a node to the cluster, it is automatically used as a replica
>> for
>>>> existing clusters without my having to do so.  This would really
>> simplify
>>>> things.
>>>>> 
>>>>> I've added a FAQ entry to address why this is a bad idea.
>> https://wiki.apache.org/solr/FAQ#Why_doesn.27t_SolrCloud_automatically_create_replicas_when_I_add_nodes.3F
>>>>> 
>>>>> Thanks,
>>>>> Shawn
>> 
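
For reference, a sketch of the Collections API call under discussion
(collection and shard names are hypothetical; the request is commented out
since it needs a live SolrCloud cluster):

```shell
# ADDREPLICA requires naming the shard to grow; picking the shard is
# left to the caller, which is the pain point raised above.
URL='http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=products&shard=shard1'
echo "$URL"
# curl "$URL"
```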


Re: Need to move on SOlr cloud (help required)

2016-02-16 Thread Paul Borgermans
On 16 February 2016 at 06:09, Midas A  wrote:

> Susheel,
>
> Is there any client available in PHP for SolrCloud which maintains the
> same cluster awareness?
>
>
No, there is none. I recommend HAProxy for non-SolrJ clients and for
load-balancing SolrCloud; HAProxy also makes it easy to do rolling
updates of your SolrCloud nodes.

Hth
Paul


>
> On Tue, Feb 16, 2016 at 7:31 AM, Susheel Kumar 
> wrote:
>
> > In SolrJ, you would use CloudSolrClient which interacts with Zookeeper
> > (which maintains Cluster State). See CloudSolrClient API. So that's how
> > SolrJ would know which node is down or not.
>


Solr won't start -- java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration

2016-03-15 Thread Paul Hoffman
I've been running Solr successfully until this morning, when I stopped 
it to pick up a change in my schema, and now it won't start up again.  
I've whittled the problem down to this:

----
# cd /home/paul/proj/blacklight/jetty

# java -jar start.jar -Djetty.port=8983 -Dsolr.solr.home=$PWD/solr
WARNING: System properties and/or JVM args set.  Consider using --dry-run or 
--exec
java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.eclipse.jetty.start.Main.invokeMain(Main.java:440)
at org.eclipse.jetty.start.Main.start(Main.java:615)
at org.eclipse.jetty.start.Main.main(Main.java:96)
ClassNotFound: org.eclipse.jetty.xml.XmlConfiguration

Usage: java -jar start.jar [options] [properties] [configs]
   java -jar start.jar --help  # for more information

# readlink -e $(which java)
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java

# uname -srvmpio
Linux 3.16.0-57-generic #77~14.04.1-Ubuntu SMP Thu Dec 17 23:20:00 UTC 2015 
x86_64 x86_64 x86_64 GNU/Linux

# env | fgrep JAVA
[no output]


I only have one JVM installed -- openjdk-8-jre-headless.  Judging from 
the file timestamps within /usr/lib/jvm, the package hasn't been updated 
since last August at the latest; the server has only been up for 62 
days.

Just in case it matters, I was running Solr successfully under 
Blacklight's jetty wrapper, and the command line above is what it uses 
(or claims to use).

Does anyone have any idea what might be causing this problem?

Thanks in advance,

Paul.



Re: Solr won't start -- java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration

2016-03-15 Thread Paul Hoffman
On Tue, Mar 15, 2016 at 01:46:32PM -0600, Shawn Heisey wrote:
> On 3/15/2016 1:34 PM, Paul Hoffman wrote:
> > I've been running Solr successfully until this morning, when I stopped 
> > it to pick up a change in my schema, and now it won't start up again.  
> > I've whittled the problem down to this:
> >
> > 
> > # cd /home/paul/proj/blacklight/jetty
> >
> > # java -jar start.jar -Djetty.port=8983 -Dsolr.solr.home=$PWD/solr
> > WARNING: System properties and/or JVM args set.  Consider using --dry-run 
> > or --exec
> > java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration
> > at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> > at org.eclipse.jetty.start.Main.invokeMain(Main.java:440)
> > at org.eclipse.jetty.start.Main.start(Main.java:615)
> > at org.eclipse.jetty.start.Main.main(Main.java:96)
> > ClassNotFound: org.eclipse.jetty.xml.XmlConfiguration
> 
> There are no Solr classes in that stacktrace.  The class that can't be
> found is a Jetty class.  I think the problem here is in Jetty, not
> Solr.  It probably can't find a jar with a name like one of these:
> 
> jetty-xml-8.1.14.v20131031.jar
> jetty-xml-9.2.13.v20150730.jar
> 
> What version of Solr?  I'm assuming it's not 5.x, since the command used
> to start those versions is very different, and Solr would probably not
> be located within a blacklight folder.

Thanks, Shawn.  Which version indeed -- I have a mishmash of cruft lying 
around from earlier attempts to get Solr and Blacklight running, so I 
don't want to assume anything.  I found the log file that shows me 
stopping and starting Solr today:


# ls -ltr $(find $(locate log | egrep 'solr|jetty') -type f -mtime -1) | head 
-n5
find: `/home/paul/proj/blacklight/jetty/logs/solr.log': No such file or 
directory
-rw-rw-r-- 1 paul paul 2885083 Mar 15 11:38 
/home/paul/proj/blacklight/jetty/logs/solr_log_20160315_1152
-rw-r--r-- 1 root root5088 Mar 15 11:49 
/home/paul/proj/solr-5.3.1/server/logs/solr_log_20160315_1150
-rw-r--r-- 1 root root   26701 Mar 15 11:49 
/home/paul/proj/solr-5.3.1/server/logs/solr_gc_log_20160315_1150
-rw-rw-r-- 1 paul paul5086 Mar 15 11:51 
/home/paul/proj/solr-5.3.1/server/logs/solr_log_20160315_1546
-rw-rw-r-- 1 paul paul   23537 Mar 15 11:51 
/home/paul/proj/solr-5.3.1/server/logs/solr_gc_log_20160315_1546

# LOGFILE=/home/paul/proj/blacklight/jetty/logs/solr_log_20160315_1152

# egrep -nw 'stopped|org.eclipse.jetty.server.Server;' $LOGFILE | tail
5852:INFO  - 2016-01-08 16:16:32.222; org.eclipse.jetty.server.Server; 
jetty-8.1.10.v20130312
6128:INFO  - 2016-01-13 13:01:58.338; org.eclipse.jetty.server.Server; 
jetty-8.1.10.v20130312
6281:INFO  - 2016-01-14 08:41:03.025; org.eclipse.jetty.server.Server; 
jetty-8.1.10.v20130312
7792:INFO  - 2016-02-08 11:57:41.131; org.eclipse.jetty.server.Server; 
jetty-8.1.10.v20130312
7957:INFO  - 2016-02-08 12:01:48.361; org.eclipse.jetty.server.Server; 
jetty-8.1.10.v20130312
8174:INFO  - 2016-02-08 15:03:18.641; org.eclipse.jetty.server.Server; 
jetty-8.1.10.v20130312
8773:INFO  - 2016-02-10 12:05:25.639; org.eclipse.jetty.server.Server; 
jetty-8.1.10.v20130312
12244:INFO  - 2016-03-15 11:38:16.810; org.eclipse.jetty.server.Server; 
Graceful shutdown SocketConnector@0.0.0.0:8983
12245:INFO  - 2016-03-15 11:38:16.814; org.eclipse.jetty.server.Server; 
Graceful shutdown 
o.e.j.w.WebAppContext{/solr,file:/home/paul/proj/blacklight/jetty/solr-webapp/webapp/},/home/paul/proj/blacklight/jetty/webapps/solr.war
12262:INFO  - 2016-03-15 11:38:18.473; 
org.eclipse.jetty.server.handler.ContextHandler; stopped 
o.e.j.w.WebAppContext{/solr,file:/home/paul/proj/blacklight/jetty/solr-webapp/webapp/},/home/paul/proj/blacklight/jetty/webapps/solr.war


It looks like the last time it was last restarted was on February 10 
(line 8773).  The log file doesn't show the Solr version directly, but 
maybe the first lines will help: 


# sed -n 8773,8795p $LOGFILE
INFO  - 2016-02-10 12:05:25.639; org.eclipse.jetty.server.Server; 
jetty-8.1.10.v20130312
INFO  - 2016-02-10 12:05:25.703; 
org.eclipse.jetty.deploy.providers.ScanningAppProvider; Deployment monitor 
/home/paul/proj/blacklight/jetty/contexts at interval 0
INFO  - 2016-02-10 12:05:25.714; org.eclipse.jetty.deploy.DeploymentManager; 
Deployable added: /home/paul/proj/blacklight/jetty/

Re: Solr won't start -- java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration

2016-03-18 Thread Paul Hoffman
On Tue, Mar 15, 2016 at 07:58:21PM -0600, Shawn Heisey wrote:
> On 3/15/2016 2:56 PM, Paul Hoffman wrote:
> >> It sure looks like I started Solr from my blacklight project dir.
> >>
> >> Any ideas?  Thanks,
> >>
> 
> You may need to get some help from the blacklight project.  I've got
> absolutely no idea what sort of integration they may have done with
> Solr, what they may have changed, or how they've arranged the filesystem.
> 
> Regarding the Jetty problem, in the directory where the "start.jar" that
> you are running lives, there should be a lib directory, with various
> jetty jars.  The jetty-xml jar should be one of them.  Here's a listing
> of Jetty's lib directory from a Solr 4.9.1 install that I've got.  I
> have upgraded to a newer version of Jetty:
> 
> root@bigindy5:/opt/solr4# ls -al lib
> total 1496
> drwxr-xr-x  3 solr solr   4096 Aug 31  2015 .
> drwxr-xr-x 13 solr solr   4096 Aug 31  2015 ..
> drwxr-xr-x  2 solr solr   4096 Aug 31  2015 ext
> -rw-r--r--  1 solr solr  21162 Aug 31  2015
> jetty-continuation-8.1.14.v20131031.jar
> -rw-r--r--  1 solr solr  61908 Aug 31  2015
> jetty-deploy-8.1.14.v20131031.jar
> -rw-r--r--  1 solr solr  96122 Aug 31  2015 jetty-http-8.1.14.v20131031.jar
> -rw-r--r--  1 solr solr 104219 Aug 31  2015 jetty-io-8.1.14.v20131031.jar
> -rw-r--r--  1 solr solr  24770 Aug 31  2015 jetty-jmx-8.1.14.v20131031.jar
> -rw-r--r--  1 solr solr  89923 Aug 31  2015
> jetty-security-8.1.14.v20131031.jar
> -rw-r--r--  1 solr solr 357704 Aug 31  2015
> jetty-server-8.1.14.v20131031.jar
> -rw-r--r--  1 solr solr 101714 Aug 31  2015
> jetty-servlet-8.1.14.v20131031.jar
> -rw-r--r--  1 solr solr 287680 Aug 31  2015 jetty-util-8.1.14.v20131031.jar
> -rw-r--r--  1 solr solr 110096 Aug 31  2015
> jetty-webapp-8.1.14.v20131031.jar
> -rw-r--r--  1 solr solr  39065 Aug 31  2015 jetty-xml-8.1.14.v20131031.jar
> -rw-r--r--  1 solr solr 200387 Aug 31  2015 servlet-api-3.0.jar
> 
> The Jetty included with blacklight may contain more jars than this.  The
> Solr jetty install is stripped down so it's very lean.
> 
> Thanks,
> Shawn
> 

That's exactly what I have now -- I got them from the blacklight-jetty 
repo -- except the versions are slightly different 
(jetty-*-8.1.10.v20130312.jar instead of jetty-*-8.1.14.v20131031.jar).

My assumption now is that I was running a significantly older version of 
Solr -- I was getting some deprecation warnings and an error that prevented 
loading my Blacklight core.  However, using the new jetty jars and 
making some adjustments in my schema.xml has got the Solr end of things 
working again, so I'll take any further questions to the Blacklight 
list.

Thanks again for your help,

Paul.



Re: Indexing using CSV

2016-03-21 Thread Paul Hoffman
On Sun, Mar 20, 2016 at 06:11:32PM -0700, Jay Potharaju wrote:
> Hi,
> I am trying to index some data using csv files. The data contains
> description column, which can include quotes, comma, LF/CR & other special
> characters.
> 
> I have it working but run into an issue with the following error
> 
> line=5,can't read line: 5 values={NO LINES AVAILABLE}.
> 
> What is the best way to debug this issue and secondly how do other people
> handle indexing data using csv data.

I would concentrate first on getting the CSV reader working verifiably, 
which might be the hardest part -- CSV is not a file format, it's a 
hodgepodge.
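As a starting point for debugging, Solr's CSV update handler lets you declare the dialect explicitly instead of relying on defaults. A sketch (the collection name "mycoll" and file name "data.csv" are hypothetical):

```shell
# Tell Solr's CSV handler exactly which dialect to expect, instead of
# hoping the defaults match how the file was written:
#   separator=%2C    -> fields are comma-separated
#   encapsulator=%22 -> values may be wrapped in double quotes
#                       (needed for values containing commas or newlines)
curl 'http://localhost:8983/solr/mycoll/update/csv?commit=true&separator=%2C&encapsulator=%22' \
     --data-binary @data.csv \
     -H 'Content-Type: application/csv'
```

If a particular line still fails, cutting the file down around the reported line number usually exposes an unbalanced or bare quote inside a field.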

Paul.



Delete by query using JSON?

2016-03-22 Thread Paul Hoffman
I've been struggling to find the right syntax for deleting by query 
using JSON, where the query includes an fq parameter.

I know how to delete *all* documents, but how would I delete only 
documents with field doctype = "cres"?  I have tried the following along 
with a number of variations, all to no avail:

$ curl -s -d @- 'http://localhost:8983/solr/blacklight-core/update?wt=json'
<http://localhost:8983/solr/blacklight-core/select?q=&fq=doctype%3Acres&wt=json&fl=id>

It seems like such a simple thing, but I haven't found any examples that 
use an fq.  Could someone post an example?

Thanks in advance,

Paul.



Re: Delete by query using JSON?

2016-03-23 Thread Paul Hoffman
On Tue, Mar 22, 2016 at 04:27:03PM -0700, Walter Underwood wrote:
> “Why do you care?” might not be the best way to say it, but it is 
> essential to understand the difference between selection (filtering) 
> and ranking.
> 
> As Solr params:
> 
> * q is ranking and filtering
> * fq is filtering only
> * bq is ranking only

Thanks, that is a very useful and concise synopsis.

> When deleting documents, ordering does not matter, which is why we ask 
> why you care about the ordering.
> 
> If the response is familiar to you, imagine how the questions sound to 
> people who have been working in search for twenty years. But even when 
> we are snippy, we still try to help.
> 
> Many, many times, the question is wrong. The most common difficulty on 
> this list is an “XY problem”, where the poster has problem X and has 
> assumed solution Y, which is not the right solution. But they ask 
> about Y. So we will tell people that their approach is wrong, because 
> that is the most helpful thing we can do.

Alex's response didn't seem snippy to me at all, and I agree 
wholeheartedly about the wrong-question problem -- in my case, not only 
was I asking the wrong question, but I shouldn't even have had to ask 
the (right) question at all!

Thanks again, everyone.

Paul.



Re: Delete by query using JSON?

2016-03-23 Thread Paul Hoffman
On Tue, Mar 22, 2016 at 10:25:06PM -0400, Jack Krupansky wrote:
> See the correct syntax example here:
> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-SendingJSONUpdateCommands
> 
> Your query is fine.

Thanks; I thought the query was wrong, but the example you pointed 
me to clued me in to the real problem: I had neglected to specify 
Content-Type: application/json (d'oh!).
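For anyone landing here later, the full working command looks like this (core name as used earlier in the thread):

```shell
# Delete every document whose doctype field is "cres".
# The Content-Type header is the part that is easy to forget; without
# it Solr will not parse the body as a JSON update command.
curl 'http://localhost:8983/solr/blacklight-core/update?commit=true' \
     -H 'Content-Type: application/json' \
     -d '{"delete": {"query": "doctype:cres"}}'
```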

Paul.



Re: Regarding JSON indexing in SOLR 4.10

2016-03-30 Thread Paul Hoffman
On Tue, Mar 29, 2016 at 11:30:06PM -0700, Aditya Desai wrote:
> I am running SOLR 4.10 on port 8984 by changing the default port in
> etc/jetty.xml. I am now trying to index all my JSON files to Solr running
> on 8984. The following is the command
> 
> curl 'http://localhost:8984/solr/update?commit=true' --data-binary *.json
> -H 'Content-type:application/json'

There are actually two problems. First, the shell expands --data-binary 
*.json to --data-binary foo.json bar.json baz.json, so curl takes only 
the first name as data and tries to treat bar.json and baz.json as URLs. 
Second, without a leading "@", --data-binary sends the literal string 
"foo.json" as the request body rather than the file's contents.

Try this instead (note the "@"):

for file in *.json; do
    curl 'http://localhost:8984/solr/update?commit=true' \
        --data-binary @"$file" \
        -H 'Content-type:application/json'
done

Paul.



Re: Basic auth

2015-07-22 Thread Noble Paul
Solr 5.3 is coming with proper basic auth support


https://issues.apache.org/jira/browse/SOLR-7692

On Wed, Jul 22, 2015 at 5:28 PM, Peter Sturge  wrote:
> if you're using Jetty you can use the standard realms mechanism for Basic
> Auth, and it works the same on Windows or UNIX. There's plenty of docs on
> the Jetty site about getting this working, although it does vary somewhat
> depending on the version of Jetty you're running (N.B. I would suggest
> using Jetty 9, and not 8, as 8 is missing some key authentication classes).
> If, when you execute a search query to your Solr instance you get a
> username and password popup, then Jetty's auth is setup. If you don't then
> something's wrong in the Jetty config.
>
> it's worth noting that if you're doing distributed searches Basic Auth on
> its own will not work for you. This is because Solr sends distributed
> requests to remote instances on behalf of the user, and it has no knowledge
> of the web container's auth mechanics. We got 'round this by customizing
> Solr to receive credentials and use them for authentication to remote
> instances - SOLR-1861 is an old implementation for a previous release, and
> there has been some significant refactoring of SearchHandler since then,
> but the concept works well for distributed queries.
>
> Thanks,
> Peter
>
>
>
> On Wed, Jul 22, 2015 at 11:18 AM, O. Klein  wrote:
>
>> Steven White wrote
>> > Thanks for updating the wiki page.  However, my issue remains, I cannot
>> > get
>> > Basic auth working.  Has anyone got it working, on Windows?
>>
>> Doesn't work for me on Linux either.
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Basic-auth-tp4218053p4218519.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>



-- 
-
Noble Paul


Re: Basic auth

2015-07-27 Thread Noble Paul
Q. Do you know when it would be released?
5.3 will be released in another 3-4 weeks.

Q. Are there any requirements that ZK authentication must be in place as well?
No.

bq. Providing my own security.json + class/implementation to verify
user/pass should work today with 5.2, right?

Yes. But if you modify your credentials or anything in that JSON, you
will have to restart all your nodes.

Q.SOLR-7274 pluggable security is already in 5.2 (my requirement is to
provide user/pass in a secure manner, not as argument on cmd or from
(our unsecured) ZK but from a configuration restful service,

I'm not clear what your question is. Basic Auth is a well-known
standard; we are just implementing that standard. We store all
credentials & permissions in ZK. That means it is only as secure as
your ZK: as long as nobody can write to ZK, your system is safe.

On Wed, Jul 22, 2015 at 11:10 PM, Fadi Mohsen  wrote:
> Hi, I have some questions regarding basic auth and proper support in 5.3:
>
> do you know when it would be released?
>
> Are there any requirements that ZK authentication must be in place as well?
>
> Do we store the user/pass in ZK?
>
> SOLR-7274 pluggable security is already in 5.2 (my requirement is to provide 
> user/pass in a secure manner, not as argument on cmd or from (our unsecured) 
> ZK but from a configuration restful service,
> I'm not sure 5.3 release would fit above requirement, can you reflect on this?
>
> Providing my own security.json + class/implementation to verify user/pass 
> should work today with 5.2, right?
>
> Thanks
> Fadi
>
>> On 22 Jul 2015, at 14:33, Noble Paul  wrote:
>>
>> Solr 5.3 is coming with proper basic auth support
>>
>>
>> https://issues.apache.org/jira/browse/SOLR-7692
>>
>>> On Wed, Jul 22, 2015 at 5:28 PM, Peter Sturge  
>>> wrote:
>>> if you're using Jetty you can use the standard realms mechanism for Basic
>>> Auth, and it works the same on Windows or UNIX. There's plenty of docs on
>>> the Jetty site about getting this working, although it does vary somewhat
>>> depending on the version of Jetty you're running (N.B. I would suggest
>>> using Jetty 9, and not 8, as 8 is missing some key authentication classes).
>>> If, when you execute a search query to your Solr instance you get a
>>> username and password popup, then Jetty's auth is setup. If you don't then
>>> something's wrong in the Jetty config.
>>>
>>> it's worth noting that if you're doing distributed searches Basic Auth on
>>> its own will not work for you. This is because Solr sends distributed
>>> requests to remote instances on behalf of the user, and it has no knowledge
>>> of the web container's auth mechanics. We got 'round this by customizing
>>> Solr to receive credentials and use them for authentication to remote
>>> instances - SOLR-1861 is an old implementation for a previous release, and
>>> there has been some significant refactoring of SearchHandler since then,
>>> but the concept works well for distributed queries.
>>>
>>> Thanks,
>>> Peter
>>>
>>>
>>>
>>>> On Wed, Jul 22, 2015 at 11:18 AM, O. Klein  wrote:
>>>>
>>>> Steven White wrote
>>>>> Thanks for updating the wiki page.  However, my issue remains, I cannot
>>>>> get
>>>>> Basic auth working.  Has anyone got it working, on Windows?
>>>>
>>>> Doesn't work for me on Linux either.
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://lucene.472066.n3.nabble.com/Basic-auth-tp4218053p4218519.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>>
>> --
>> -
>> Noble Paul



-- 
-
Noble Paul


Re: Basic auth

2015-07-30 Thread Noble Paul
"Although I'm not sure why you took this approach instead of
supporting  simple built-in basic auth and let us configure security
the "old/easy" way"

Going with Jetty basic auth is not useful in a large enough cluster.
Where do you store the credentials, and how would you propagate them
across the cluster? When you use Solr, you need a Solr-like way of
managing that. The other problem is inter-node communication: how do
you pass credentials along in that case?

"I'm guessing it has to do with future requirement of field/doc level security"

Actually, that is an orthogonal requirement.

"I hope you can get rid of the war file soon and start promoting Solr
as a set of libraries so one can easily embed/extend Solr"

That is not what we have in mind. We want Solr to be a server which
controls every aspect of its own running. We should have the choice of
getting rid of Jetty or whatever else and moving to a new system. We only
guarantee that the interface/protocol will remain constant.



On Tue, Jul 28, 2015 at 2:19 AM, Fadi Mohsen  wrote:
> Thank you, I tested providing my implementation of authentication in 
> security.json, uploaded the file to ZK (just considering authentication), started 
> the nodes, and it worked like a charm.
>
> That required of course turning off Jetty basic auth.
>
> Although I'm not sure why you took this approach instead of supporting  
> simple built-in basic auth and let us configure security the "old/easy" way.
>
> I'm guessing it has to do with future requirement of field/doc level security.
>
> I hope you can get rid of the war file soon and start promoting Solr as a set 
> of libraries so one can easily embed/extend Solr, since some (especially me) 
> might consider command-line ZK operations not that "continuous 
> delivery/automate everything/production" friendly.
>
> It's easy today to spin up a jetty and wire / point out resource classes or 
> wire up CXF alongside to get things playing, but I'm probably missing out on 
> other things since I see many mails usually in consensus of not embedding and 
> rather want people to consider Solr as a stand-alone service, not sure why!
> I'm probably getting out of context here.
>
> Regards
>
>> On 27 Jul 2015, at 13:17, Noble Paul  wrote:
>>
>> Q.do you know when it would be released?
>> 5.3 will be released in another 3-4 weeks .
>>
>> Q.Are there any requirements of ZK authentication must be there as well?
>> NO
>>
>> bq.Providing my own security.json + class/implementation to verify
>> user/pass should work today with 5.2, right?
>>
>> Yes. But, if you modify your credentials or anything in that JSON, you
>> will have to restart all your nodes .
>>
>> Q.SOLR-7274 pluggable security is already in 5.2 (my requirement is to
>> provide user/pass in a secure manner, not as argument on cmd or from
>> (our unsecured) ZK but from a configuration restful service,
>>
>> I'm not clear what your question is. Basic Auth is a well-known
>> standard. We are just implementing that standard. We store all
>> credentials & permissions in ZK . That means it is only as secure as
>> your ZK . As long as nobody can write to ZK, your system is safe
>>
>>> On Wed, Jul 22, 2015 at 11:10 PM, Fadi Mohsen  wrote:
>>> Hi, I have some questions regarding basic auth and proper support in 5.3:
>>>
>>> do you know when it would be released?
>>>
>>> Are there any requirements of ZK authentication must be there as well?
>>>
>>> Do we store the user/pass in ZK?
>>>
>>> SOLR-7274 pluggable security is already in 5.2 (my requirement is to 
>>> provide user/pass in a secure manner, not as argument on cmd or from (our 
>>> unsecured) ZK but from a configuration restful service,
>>> I'm not sure 5.3 release would fit above requirement, can you reflect on 
>>> this?
>>>
>>> Providing my own security.json + class/implementation to verify user/pass 
>>> should work today with 5.2, right?
>>>
>>> Thanks
>>> Fadi
>>>
>>>> On 22 Jul 2015, at 14:33, Noble Paul  wrote:
>>>>
>>>> Solr 5.3 is coming with proper basic auth support
>>>>
>>>>
>>>> https://issues.apache.org/jira/browse/SOLR-7692
>>>>
>>>>> On Wed, Jul 22, 2015 at 5:28 PM, Peter Sturge  
>>>>> wrote:
>>>>> if you're using Jetty you can use the standard realms mechanism for Basic
>>>>> Auth, and it works the same on Windows or UNIX. There's plenty of docs on
>>>>> the Jetty sit

pre-loaded function-query?

2015-08-18 Thread Paul Libbrecht

Hello Solr experts,

I'm writing a "query expansion" QueryComponent which takes web-app
parameters (e.g. profile information) and turns them into a solr query.
Thus far I've used lucene TermQuery-ies with success.

Now, I would like to use something a bit more elaborate. Either I write
it with quite a lot of term-queries or I use a function query. But how
can I create a functionQuery that I can:
- re-use between the queries,
- enter using a somewhat practical method

Range queries seem doable, but I do not find how to do the same with
function queries.
Is there an example somewhere?

thanks

Paul





Re: pre-loaded function-query?

2015-08-18 Thread Paul Libbrecht
Doug Turnbull wrote:
> I'm not sure if you mean organizing function queries under the hood in a
> query component or externally.
>
> Externally, I've always followed John Berryman's great advice for working
> with Solr when dealing with complex/reusable function queries and boosts
> http://opensourceconnections.com/blog/2013/11/22/parameterizing-and-organizing-solr-boosts/
Very very cute indeed.

However, I think I need it internally.
You're making me doubt.
Do I understand properly that this boost parameter is just operating a
power overall on the query?
My current expansion expands from the
   user-query
to the
   +user-query favouring-query-depending-other-params overall-favoring-query
(where the overall-favoring-query could be computed as a function).
With the boost parameter, i'd do:
   (+user-query favouring-query-depending-other-params)^boost-function

Not exactly the same, right?
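For illustration, the two shapes can be written as edismax parameters (the core name, field names, and the recency function below are hypothetical, not from the thread):

```shell
# Additive: bq adds an optional clause whose score is summed with the
# main query's score -- like "+user-query favouring-query".
curl 'http://localhost:8983/solr/mycore/select?defType=edismax' \
     --data-urlencode 'q=notes' \
     --data-urlencode 'bq=genre:Fiction^2'

# Multiplicative: boost multiplies each document's whole score by a
# function query, here favouring recently modified documents.
curl 'http://localhost:8983/solr/mycore/select?defType=edismax' \
     --data-urlencode 'q=notes' \
     --data-urlencode 'boost=recip(ms(NOW,last_modified),3.16e-11,1,1)'
```

So an additive favouring clause and a multiplicative boost function are indeed different things; which one fits depends on whether the preference should nudge the score or scale it.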

thanks

Paul



Re: User Authentication

2015-08-24 Thread Noble Paul
did you manage to look at the reference guide?
https://cwiki.apache.org/confluence/display/solr/Securing+Solr

On Mon, Aug 24, 2015 at 9:23 PM, LeZotte, Tom
 wrote:
> Alex
> I got a super secret release of Solr 5.3.1, wasn't supposed to say anything.
>
> Yes I’m running 5.2.1, I will check out the release notes for 5.3.
>
> Was looking for three types of user authentication, I guess.
> 1. the Admin Console
> 2. User auth for each Core ( and select and update) on a server.
> 3. HTML interface access (example: 
> ajax-solr<https://github.com/evolvingweb/ajax-solr>)
>
> Thanks
>
> Tom LeZotte
> Health I.T. - Senior Product Developer
> (p) 615-875-8830
>
>
>
>
>
>
> On Aug 24, 2015, at 10:05 AM, Alexandre Rafalovitch  wrote:
>
> Thanks for the email from the future. It is good to start to prepare
> for 5.3.1 now that 5.3 is nearly out.
>
> Joking aside (and assuming Solr 5.2.1), what exactly are you trying to
> achieve? Solr should not actually be exposed to the users directly. It
> should be hiding in a backend only visible to your middleware. If you
> are looking for a HTML interface that talks directly to Solr after
> authentication, that's not the right way to set it up.
>
> That said, some security features are being rolled out and you should
> definitely check the release notes for the 5.3.
>
> Regards,
>   Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 24 August 2015 at 10:01, LeZotte, Tom  wrote:
> Hi Solr Community
>
> I have been trying to add user authentication to our Solr 5.3.1 RedHat 
> install. I’ve found some examples on user authentication on the Jetty side. 
> But they have failed.
>
> Does any one have a step by step example on authentication for the admin 
> screen? And a core?
>
>
> Thanks
>
> Tom LeZotte
> Health I.T. - Senior Product Developer
> (p) 615-875-8830
>
>
>
>
>
>
>



-- 
-
Noble Paul


Re: SOLR 5.3

2015-08-24 Thread Noble Paul
The release is underway, incorporating some corrections suggested by
others. Expect an announcement over the next few hours.

On Sun, Aug 23, 2015 at 6:44 PM, Arcadius Ahouansou
 wrote:
> Solr-5.3 has been available for download from
> http://mirror.catn.com/pub/apache/lucene/solr/5.3.0/
>
> The redirection on the web site will probably be fixed before we get the
> official announcement.
>
> Arcadius.
>
> On 23 August 2015 at 09:00, William Bell  wrote:
>
>> At lucene.apache.org/solr it says SOLR 5.3 is there, but when I click on
>> downloads it shows Solr 5.2.1... ??
>>
>> "APACHE SOLR™ 5.3.0Solr is the popular, blazing-fast, open source
>> enterprise search platform built on Apache Lucene™."
>>
>> --
>> Bill Bell
>> billnb...@gmail.com
>> cell 720-256-8076
>>
>
>
>
> --
> Arcadius Ahouansou
> Menelic Ltd | Information is Power
> M: 07908761999
> W: www.menelic.com
> ---



-- 
-
Noble Paul


Re: User Authentication

2015-08-24 Thread Noble Paul
no.
Most of it is in Solr 5.3

On Tue, Aug 25, 2015 at 12:48 AM, Steven White  wrote:
> Hi Noble,
>
> Is everything in the link you provided applicable to Solr 5.2.1?
>
> Thanks
>
> Steve
>
> On Mon, Aug 24, 2015 at 2:20 PM, Noble Paul  wrote:
>
>> did you manage to look at the reference guide?
>> https://cwiki.apache.org/confluence/display/solr/Securing+Solr
>>
>> On Mon, Aug 24, 2015 at 9:23 PM, LeZotte, Tom
>>  wrote:
>> > Alex
>> > I got a super secret release of Solr 5.3.1, wasn't supposed to say
>> anything.
>> >
>> > Yes I’m running 5.2.1, I will check out the release notes for 5.3.
>> >
>> > Was looking for three types of user authentication, I guess.
>> > 1. the Admin Console
>> > 2. User auth for each Core ( and select and update) on a server.
>> > 3. HTML interface access (example: ajax-solr<
>> https://github.com/evolvingweb/ajax-solr>)
>> >
>> > Thanks
>> >
>> > Tom LeZotte
>> > Health I.T. - Senior Product Developer
>> > (p) 615-875-8830
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Aug 24, 2015, at 10:05 AM, Alexandre Rafalovitch  wrote:
>> >
>> > Thanks for the email from the future. It is good to start to prepare
>> > for 5.3.1 now that 5.3 is nearly out.
>> >
>> > Joking aside (and assuming Solr 5.2.1), what exactly are you trying to
>> > achieve? Solr should not actually be exposed to the users directly. It
>> > should be hiding in a backend only visible to your middleware. If you
>> > are looking for a HTML interface that talks directly to Solr after
>> > authentication, that's not the right way to set it up.
>> >
>> > That said, some security features are being rolled out and you should
>> > definitely check the release notes for the 5.3.
>> >
>> > Regards,
>> >   Alex.
>> > 
>> > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> > http://www.solr-start.com/
>> >
>> >
>> > On 24 August 2015 at 10:01, LeZotte, Tom 
>> wrote:
>> > Hi Solr Community
>> >
>> > I have been trying to add user authentication to our Solr 5.3.1 RedHat
>> install. I’ve found some examples on user authentication on the Jetty side.
>> But they have failed.
>> >
>> > Does any one have a step by step example on authentication for the admin
>> screen? And a core?
>> >
>> >
>> > Thanks
>> >
>> > Tom LeZotte
>> > Health I.T. - Senior Product Developer
>> > (p) 615-875-8830
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>>
>>
>> --
>> -
>> Noble Paul
>>



-- 
-
Noble Paul


[ANNOUNCE] Apache Solr 5.3.0 released

2015-08-24 Thread Noble Paul
Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search. Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

Solr 5.3.0 is available for immediate download at:
http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Solr 5.3 Release Highlights:

In addition to many other improvements in the security framework, Solr
now includes an AuthenticationPlugin implementing HTTP Basic Auth that
stores credentials securely in ZooKeeper. This is a simple way to
require a username and password for anyone accessing Solr’s admin
screen or APIs.
A built-in AuthorizationPlugin that provides fine-grained control over
implementing ACLs for various resources, with permission rules which
are stored in ZooKeeper.
The JSON Facet API can now change the domain for facet commands,
essentially doing a block join and moving from parents to children, or
children to parents before calculating the facet data.
Major improvements in performance of the new Facet Module / JSON Facet API.
Query and Range Facets under Pivot Facets. Just like the JSON Facet
API, pivot facets can now nest other facet types such as range and
query facets.
More Like This Query Parser options. The MoreLikeThis QParser now
supports all options provided by the MLT Handler. The query parser is
much more versatile than the handler as it works in cloud mode as well
as anywhere a normal query can be specified.
Added Schema API support in SolrJ
Added Scoring mode for query-time join and block join.
Added Smile response format
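As a quick taste of the new Basic Auth plugin: once enabled, the security APIs accept HTTP Basic credentials (solr/SolrRocks is the documented bootstrap pair; the user added below is purely illustrative):

```shell
# Without credentials, a protected API answers 401.
curl 'http://localhost:8983/solr/admin/authentication'

# With -u, the same call succeeds; here we add another user
# ("tom"/"TomIsCool" is just an illustrative pair).
curl -u solr:SolrRocks 'http://localhost:8983/solr/admin/authentication' \
     -H 'Content-Type: application/json' \
     -d '{"set-user": {"tom": "TomIsCool"}}'
```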

For upgrading from 5.2, please look at the "Upgrading from Solr 5.2"
section in the change log.

Detailed change log:
http://lucene.apache.org/solr/5_3_0/changes/Changes.html

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

-- 
-
Noble Paul
www.lucidworks.com


Re: [ANNOUNCE] Apache Solr 5.2.0 released

2015-08-24 Thread Noble Paul
Sorry, screwed up the title.

On Tue, Aug 25, 2015 at 8:30 AM, Noble Paul  wrote:
> Solr is the popular, blazing fast, open source NoSQL search platform
> from the Apache Lucene project. Its major features include powerful
> full-text search, hit highlighting, faceted search, dynamic
> clustering, database integration, rich document (e.g., Word, PDF)
> handling, and geospatial search. Solr is highly scalable, providing
> fault tolerant distributed search and indexing, and powers the search
> and navigation features of many of the world's largest internet sites.
>
> Solr 5.3.0 is available for immediate download at:
> http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
>
> Solr 5.3 Release Highlights:
>
> In addition to many other improvements in the security framework, Solr
> now includes an AuthenticationPlugin implementing HTTP Basic Auth that
> stores credentials securely in ZooKeeper. This is a simple way to
> require a username and password for anyone accessing Solr’s admin
> screen or APIs.
> A built-in AuthorizationPlugin that provides fine-grained control over
> implementing ACLs for various resources, with permission rules which
> are stored in ZooKeeper.
> The JSON Facet API can now change the domain for facet commands,
> essentially doing a block join and moving from parents to children, or
> children to parents before calculating the facet data.
> Major improvements in performance of the new Facet Module / JSON Facet API.
> Query and Range Facets under Pivot Facets. Just like the JSON Facet
> API, pivot facets can now nest other facet types such as range and
> query facets.
> More Like This Query Parser options. The MoreLikeThis QParser now
> supports all options provided by the MLT Handler. The query parser is
> much more versatile than the handler as it works in cloud mode as well
> as anywhere a normal query can be specified.
> Added Schema API support in SolrJ
> Added Scoring mode for query-time join and block join.
> Added Smile response format
>
> For upgrading from 5.2, please look at the "Upgrading from Solr 5.2"
> section in the change log.
>
> Detailed change log:
> http://lucene.apache.org/solr/5_3_0/changes/Changes.html
>
> Please report any feedback to the mailing lists
> (http://lucene.apache.org/solr/discussion.html)
>
>
> --
> -
> Noble Paul
> www.lucidworks.com



-- 
-
Noble Paul




Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-08-31 Thread Noble Paul
The Admin UI is not protected by any of these permissions; it asks for
a password only when you try to perform a protected operation.

I'll investigate the restart problem and report my findings.

On Tue, Sep 1, 2015 at 3:10 AM, Kevin Lee  wrote:
> Anyone else running into any issues trying to get the authentication and 
> authorization plugins in 5.3 working?
>
>> On Aug 29, 2015, at 2:30 AM, Kevin Lee  wrote:
>>
>> Hi,
>>
>> I’m trying to use the new basic auth plugin for Solr 5.3 and it doesn’t seem 
>> to be working quite right.  Not sure if I’m missing steps or there is a bug. 
>>  I am able to get it to protect access to a URL under a collection, but am 
>> unable to get it to secure access to the Admin UI.  In addition, after 
>> stopping the Solr and Zookeeper instances, the security.json is still in 
>> Zookeeper, however Solr is allowing access to everything again like the 
>> security configuration isn’t in place.
>>
>> Contents of security.json taken from wiki page, but edited to produce valid 
>> JSON.  Had to move comma after 3rd from last “}” up to just after the last 
>> “]”.
>>
>> {
>> "authentication":{
>>   "class":"solr.BasicAuthPlugin",
>>   "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= 
>> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
>> },
>> "authorization":{
>>   "class":"solr.RuleBasedAuthorizationPlugin",
>>   "permissions":[{"name":"security-edit",
>>  "role":"admin"}],
>>   "user-role":{"solr":"admin"}
>> }}
>>
>> Here are the steps I followed:
>>
>> Upload security.json to zookeeper
>> ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile 
>> /security.json ~/solr/security.json
>>
>> Use zkCli.sh from Zookeeper to ensure the security.json is in Zookeeper at 
>> /security.json.  It is there and looks like what was originally uploaded.
>>
>> Start Solr Instances
>>
>> Attempt to create a permission, however get the following error:
>> {
>>  "responseHeader":{
>>"status":400,
>>"QTime":0},
>>  "error":{
>>"msg":"No authorization plugin configured",
>>"code":400}}
>>
>> Upload security.json again.
>> ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile 
>> /security.json ~/solr/security.json
>>
>> Issue the following to try to create the permission again and this time it’s 
>> successful.
>> // Create a permission for mysearch endpoint
>> curl --user solr:SolrRocks -H 'Content-type:application/json' -d 
>> '{"set-permission": {"name":"mycollection-search","collection": 
>> "mycollection","path":"/mysearch","role": "search-user"}}' 
>> http://localhost:8983/solr/admin/authorization
>>
>>{
>>  "responseHeader":{
>>"status":0,
>>"QTime":7}}
>>
>> Issue the following commands to add users
>> curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication 
>> -H 'Content-type:application/json' -d '{"set-user": {"admin" : "password" }}'
>> curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication 
>> -H 'Content-type:application/json' -d '{"set-user": {"user" : "password" }}'
>>
>> Issue the following command to add permission to users
>> curl -u solr:SolrRocks -H 'Content-type:application/json' -d '{ 
>> "set-user-role" : {"admin": ["search-user", "admin"]}}' 
>> http://localhost:8983/solr/admin/authorization
>> curl -u solr:SolrRocks -H 'Content-type:application/json' -d '{ 
>> "set-user-role" : {"user": ["search-user"]}}' 
>> http://localhost:8983/solr/admin/authorization
>>
>> After executing the above, access to /mysearch is protected until I restart 
>> the Solr and Zookeeper instances.  However, the admin UI is never protected 
>> like the Wiki page says it should be once activated.
>>
>> https://cwiki.apache.org/confluence/display/solr/Rule-Based+Authorization+Plugin
>>  
>> <https://cwiki.apache.org/confluence/display/solr/Rule-Based+Authorization+Plugin>
>>
>> Why does the authentication and authorization plugin not stay activated 
>> after restart and why is the Admin UI never protected?  Am I missing any 
>> steps?
>>
>> Thanks,
>> Kevin



-- 
-
Noble Paul


Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-01 Thread Noble Paul
I'm investigating why a restart or a first-time start does not read the
security.json.

On Tue, Sep 1, 2015 at 1:00 PM, Noble Paul  wrote:
> I removed that statement
>
> "If activating the authorization plugin doesn't protect the admin ui,
> how does one protect access to it?"
>
> One does not need to protect the admin UI. You only need to protect
> the relevant API calls . I mean it's OK to not protect the CSS and
> HTML stuff.  But if you perform an action to create a core or do a
> query through admin UI , it automatically will prompt you for
> credentials (if those APIs are protected)
>
> On Tue, Sep 1, 2015 at 12:41 PM, Kevin Lee  wrote:
>> Thanks for the clarification!
>>
>> So is the wiki page incorrect at
>> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin 
>> which says that the admin ui will require authentication once the 
>> authorization plugin is activated?
>>
>> "An authorization plugin is also available to configure Solr with 
>> permissions to perform various activities in the system. Once activated, 
>> access to the Solr Admin UI and all requests will need to be authenticated 
>> and users will be required to have the proper authorization for all 
>> requests, including using the Admin UI and making any API calls."
>>
>> If activating the authorization plugin doesn't protect the admin ui, how 
>> does one protect access to it?
>>
>> Also, the issue I'm having is not just at restart.  According to the docs 
>> security.json should be uploaded to Zookeeper before starting any of the 
>> Solr instances.  However, I tried to upload security.json before starting 
>> any of the Solr instances, but it would not pick up the security config 
>> until after the Solr instances are already running and then uploading the 
>> security.json again.  I can see in the logs at startup that the Solr 
>> instances don't see any plugin enabled even though security.json is already 
>> in zookeeper and then after they are started and the security.json is 
>> uploaded again I see it reconfigure to use the plugin.
>>
>> Thanks,
>> Kevin
>>
>>> On Aug 31, 2015, at 11:22 PM, Noble Paul  wrote:
>>>
>>> Admin UI is not protected by any of these permissions. Only if you try
>>> to perform a protected operation , it asks for a password.
>>>
>>> I'll investigate the restart problem and report my  findings
>>>
>>>> On Tue, Sep 1, 2015 at 3:10 AM, Kevin Lee  
>>>> wrote:
>>>> Anyone else running into any issues trying to get the authentication and 
>>>> authorization plugins in 5.3 working?
>>>>
>>>>> On Aug 29, 2015, at 2:30 AM, Kevin Lee  wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I’m trying to use the new basic auth plugin for Solr 5.3 and it doesn’t 
>>>>> seem to be working quite right.  Not sure if I’m missing steps or there 
>>>>> is a bug.  I am able to get it to protect access to a URL under a 
>>>>> collection, but am unable to get it to secure access to the Admin UI.  In 
>>>>> addition, after stopping the Solr and Zookeeper instances, the 
>>>>> security.json is still in Zookeeper, however Solr is allowing access to 
>>>>> everything again like the security configuration isn’t in place.
>>>>>
>>>>> Contents of security.json taken from wiki page, but edited to produce 
>>>>> valid JSON.  Had to move comma after 3rd from last “}” up to just after 
>>>>> the last “]”.
>>>>>
>>>>> {
>>>>> "authentication":{
>>>>> "class":"solr.BasicAuthPlugin",
>>>>> "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= 
>>>>> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
>>>>> },
>>>>> "authorization":{
>>>>> "class":"solr.RuleBasedAuthorizationPlugin",
>>>>> "permissions":[{"name":"security-edit",
>>>>>"role":"admin"}],
>>>>> "user-role":{"solr":"admin"}
>>>>> }}
>>>>>
>>>>> Here are the steps I followed:
>>>>>
>>>>> Upload security.json to zookeeper
>>>>> ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile 
>>>>> /se

Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-01 Thread Noble Paul
I removed that statement

"If activating the authorization plugin doesn't protect the admin ui,
how does one protect access to it?"

One does not need to protect the admin UI itself; only the relevant API
calls need protection. It's OK to leave the CSS and HTML unprotected, but
if you perform an action such as creating a core or running a query through
the admin UI, it will automatically prompt you for credentials (if those
APIs are protected)
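
This behaviour can be checked from the command line. A sketch, assuming a
default local install and the solr:SolrRocks credentials used earlier in the
thread (the exact response codes depend on which permissions are active):

```shell
# Static admin UI assets are served without credentials
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8983/solr/

# A protected operation (covered by security-edit) should be rejected
# without credentials...
curl -s -o /dev/null -w '%{http_code}\n' \
  -H 'Content-type:application/json' \
  -d '{"set-user-role": {"solr": ["admin"]}}' \
  http://localhost:8983/solr/admin/authorization

# ...and accepted once Basic credentials are supplied
curl -s -o /dev/null -w '%{http_code}\n' -u solr:SolrRocks \
  -H 'Content-type:application/json' \
  -d '{"set-user-role": {"solr": ["admin"]}}' \
  http://localhost:8983/solr/admin/authorization
```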

On Tue, Sep 1, 2015 at 12:41 PM, Kevin Lee  wrote:
> Thanks for the clarification!
>
> So is the wiki page incorrect at
> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin 
> which says that the admin ui will require authentication once the 
> authorization plugin is activated?
>
> "An authorization plugin is also available to configure Solr with permissions 
> to perform various activities in the system. Once activated, access to the 
> Solr Admin UI and all requests will need to be authenticated and users will 
> be required to have the proper authorization for all requests, including 
> using the Admin UI and making any API calls."
>
> If activating the authorization plugin doesn't protect the admin ui, how does 
> one protect access to it?
>
> Also, the issue I'm having is not just at restart.  According to the docs 
> security.json should be uploaded to Zookeeper before starting any of the Solr 
> instances.  However, I tried to upload security.json before starting any of 
> the Solr instances, but it would not pick up the security config until after 
> the Solr instances are already running and then uploading the security.json 
> again.  I can see in the logs at startup that the Solr instances don't see 
> any plugin enabled even though security.json is already in zookeeper and then 
> after they are started and the security.json is uploaded again I see it 
> reconfigure to use the plugin.
>
> Thanks,
> Kevin
>
>> On Aug 31, 2015, at 11:22 PM, Noble Paul  wrote:
>>
>> Admin UI is not protected by any of these permissions. Only if you try
>> to perform a protected operation , it asks for a password.
>>
>> I'll investigate the restart problem and report my  findings
>>
>>> On Tue, Sep 1, 2015 at 3:10 AM, Kevin Lee  wrote:
>>> Anyone else running into any issues trying to get the authentication and 
>>> authorization plugins in 5.3 working?
>>>
>>>> On Aug 29, 2015, at 2:30 AM, Kevin Lee  wrote:
>>>>
>>>> Hi,
>>>>
>>>> I’m trying to use the new basic auth plugin for Solr 5.3 and it doesn’t 
>>>> seem to be working quite right.  Not sure if I’m missing steps or there is 
>>>> a bug.  I am able to get it to protect access to a URL under a collection, 
>>>> but am unable to get it to secure access to the Admin UI.  In addition, 
>>>> after stopping the Solr and Zookeeper instances, the security.json is 
>>>> still in Zookeeper, however Solr is allowing access to everything again 
>>>> like the security configuration isn’t in place.
>>>>
>>>> Contents of security.json taken from wiki page, but edited to produce 
>>>> valid JSON.  Had to move comma after 3rd from last “}” up to just after 
>>>> the last “]”.
>>>>
>>>> {
>>>> "authentication":{
>>>> "class":"solr.BasicAuthPlugin",
>>>> "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= 
>>>> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
>>>> },
>>>> "authorization":{
>>>> "class":"solr.RuleBasedAuthorizationPlugin",
>>>> "permissions":[{"name":"security-edit",
>>>>"role":"admin"}],
>>>> "user-role":{"solr":"admin"}
>>>> }}
>>>>
>>>> Here are the steps I followed:
>>>>
>>>> Upload security.json to zookeeper
>>>> ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile 
>>>> /security.json ~/solr/security.json
>>>>
>>>> Use zkCli.sh from Zookeeper to ensure the security.json is in Zookeeper at 
>>>> /security.json.  It is there and looks like what was originally uploaded.
>>>>
>>>> Start Solr Instances
>>>>
>>>> Attempt to create a permission, however get the following error:
>>>> {
>>>> "responseHeader":{
>>>>  "status":400,
>>>>  &

Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-01 Thread Noble Paul
Looks like there is a bug there: on start/restart the security.json is not
loaded. I shall open a ticket.

https://issues.apache.org/jira/browse/SOLR-8000

On Tue, Sep 1, 2015 at 1:01 PM, Noble Paul  wrote:
> I'm investigating why restarts or first time start does not read the
> security.json
>
> On Tue, Sep 1, 2015 at 1:00 PM, Noble Paul  wrote:
>> I removed that statement
>>
>> "If activating the authorization plugin doesn't protect the admin ui,
>> how does one protect access to it?"
>>
>> One does not need to protect the admin UI. You only need to protect
>> the relevant API calls . I mean it's OK to not protect the CSS and
>> HTML stuff.  But if you perform an action to create a core or do a
>> query through admin UI , it automatically will prompt you for
>> credentials (if those APIs are protected)
>>
>> On Tue, Sep 1, 2015 at 12:41 PM, Kevin Lee  wrote:
>>> Thanks for the clarification!
>>>
>>> So is the wiki page incorrect at
>>> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin
>>>  which says that the admin ui will require authentication once the 
>>> authorization plugin is activated?
>>>
>>> "An authorization plugin is also available to configure Solr with 
>>> permissions to perform various activities in the system. Once activated, 
>>> access to the Solr Admin UI and all requests will need to be authenticated 
>>> and users will be required to have the proper authorization for all 
>>> requests, including using the Admin UI and making any API calls."
>>>
>>> If activating the authorization plugin doesn't protect the admin ui, how 
>>> does one protect access to it?
>>>
>>> Also, the issue I'm having is not just at restart.  According to the docs 
>>> security.json should be uploaded to Zookeeper before starting any of the 
>>> Solr instances.  However, I tried to upload security.json before starting 
>>> any of the Solr instances, but it would not pick up the security config 
>>> until after the Solr instances are already running and then uploading the 
>>> security.json again.  I can see in the logs at startup that the Solr 
>>> instances don't see any plugin enabled even though security.json is already 
>>> in zookeeper and then after they are started and the security.json is 
>>> uploaded again I see it reconfigure to use the plugin.
>>>
>>> Thanks,
>>> Kevin
>>>
>>>> On Aug 31, 2015, at 11:22 PM, Noble Paul  wrote:
>>>>
>>>> Admin UI is not protected by any of these permissions. Only if you try
>>>> to perform a protected operation , it asks for a password.
>>>>
>>>> I'll investigate the restart problem and report my  findings
>>>>
>>>>> On Tue, Sep 1, 2015 at 3:10 AM, Kevin Lee  
>>>>> wrote:
>>>>> Anyone else running into any issues trying to get the authentication and 
>>>>> authorization plugins in 5.3 working?
>>>>>
>>>>>> On Aug 29, 2015, at 2:30 AM, Kevin Lee  wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I’m trying to use the new basic auth plugin for Solr 5.3 and it doesn’t 
>>>>>> seem to be working quite right.  Not sure if I’m missing steps or there 
>>>>>> is a bug.  I am able to get it to protect access to a URL under a 
>>>>>> collection, but am unable to get it to secure access to the Admin UI.  
>>>>>> In addition, after stopping the Solr and Zookeeper instances, the 
>>>>>> security.json is still in Zookeeper, however Solr is allowing access to 
>>>>>> everything again like the security configuration isn’t in place.
>>>>>>
>>>>>> Contents of security.json taken from wiki page, but edited to produce 
>>>>>> valid JSON.  Had to move comma after 3rd from last “}” up to just after 
>>>>>> the last “]”.
>>>>>>
>>>>>> {
>>>>>> "authentication":{
>>>>>> "class":"solr.BasicAuthPlugin",
>>>>>> "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= 
>>>>>> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
>>>>>> },
>>>>>> "authorization":{
>>>>>> "class":"solr.RuleBasedAuthorizationPlug

Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-01 Thread Noble Paul
" However, after uploading the new security.json and restarting the
web browser,"

The browser remembers your login, so it is unlikely to prompt for the
credentials again.

Why don't you try the RELOAD operation from the command line (curl)?
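
As a concrete sketch of that check (the collection name and the admin
credentials are the placeholders used elsewhere in this thread):

```shell
# Without credentials, RELOAD should be rejected once
# collection-admin-edit is enforced
curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=inventory'

# With the admin user's Basic credentials it should succeed
curl -u admin:password \
  'http://localhost:8983/solr/admin/collections?action=RELOAD&name=inventory'
```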

On Tue, Sep 1, 2015 at 10:31 PM, Kevin Lee  wrote:
> The restart issues aside, I’m trying to lockdown usage of the Collections 
> API, but that also does not seem to be working either.
>
> Here is my security.json.  I’m using the “collection-admin-edit” permission 
> and assigning it to the “adminRole”.  However, after uploading the new 
> security.json and restarting the web browser, it doesn’t seem to be requiring 
> credentials when calling the RELOAD action on the Collections API.  The only 
> thing that seems to work is the custom permission “browse” which is requiring 
> authentication before allowing me to pull up the page.  Am I using the 
> permissions correctly for the RuleBasedAuthorizationPlugin?
>
> {
>   "authentication":{
>     "class":"solr.BasicAuthPlugin",
>     "credentials": {
>       "admin": " ",
>       "user": " "
>     }
>   },
>   "authorization":{
>     "class":"solr.RuleBasedAuthorizationPlugin",
>     "permissions": [
>       {
>         "name":"security-edit",
>         "role":"adminRole"
>       },
>       {
>         "name":"collection-admin-edit",
>         "role":"adminRole"
>       },
>       {
>         "name":"browse",
>         "collection": "inventory",
>         "path": "/browse",
>         "role":"browseRole"
>       }
>     ],
>     "user-role": {
>       "admin": [
>         "adminRole",
>         "browseRole"
>       ],
>       "user": [
>         "browseRole"
>       ]
>     }
>   }
> }
>
> Also tried adding the permission using the Authorization API, but no effect, 
> still isn’t protecting the Collections API from being invoked without a 
> username password.  I do see in the Solr logs that it sees the updates 
> because it outputs the messages “Updating /security.json …”, “Security node 
> changed”, “Initializing authorization plugin: 
> solr.RuleBasedAuthorizationPlugin” and “Authentication plugin class obtained 
> from ZK: solr.BasicAuthPlugin”.
>
> Thanks,
> Kevin
>
>> On Sep 1, 2015, at 12:31 AM, Noble Paul  wrote:
>>
>> I'm investigating why restarts or first time start does not read the
>> security.json
>>
>> On Tue, Sep 1, 2015 at 1:00 PM, Noble Paul  wrote:
>>> I removed that statement
>>>
>>> "If activating the authorization plugin doesn't protect the admin ui,
>>> how does one protect access to it?"
>>>
>>> One does not need to protect the admin UI. You only need to protect
>>> the relevant API calls . I mean it's OK to not protect the CSS and
>>> HTML stuff.  But if you perform an action to create a core or do a
>>> query through admin UI , it automatically will prompt you for
>>> credentials (if those APIs are protected)
>>>
>>> On Tue, Sep 1, 2015 at 12:41 PM, Kevin Lee  
>>> wrote:
>>>> Thanks for the clarification!
>>>>
>>>> So is the wiki page incorrect at
>>>> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin
>>>>  which says that the admin ui will require authentication once the 
>>>> authorization plugin is activated?
>>>>
>>>> "An authorization plugin is also available to configure Solr with 
>>>> permissions to perform various activities in the system. Once activated, 
>>>> access to the Solr Admin UI and all requests will need to be authenticated 
>>>> and users will be required to have the proper authorization for all 
>>>> requests, including using the Admin UI and making any A

Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-02 Thread Noble Paul
I opened a ticket for the same:
https://issues.apache.org/jira/browse/SOLR-8004

On Wed, Sep 2, 2015 at 1:36 PM, Kevin Lee  wrote:
> I’ve found that completely exiting Chrome or Firefox and opening it back up 
> re-prompts for credentials when they are required.  It was re-prompting with 
> the /browse path where authentication was working each time I completely 
> exited and started the browser again, however it won’t re-prompt unless you 
> exit completely and close all running instances so I closed all instances 
> each time to test.
>
> However, to make sure I ran it via the command line via curl as suggested and 
> it still does not give any authentication error when trying to issue the 
> command via curl.  I get a success response from all the Solr instances that 
> the reload was successful.
>
> Not sure why the pre-canned permissions aren’t working, but the one to the 
> request handler at the /browse path is.
>
>
>> On Sep 1, 2015, at 11:03 PM, Noble Paul  wrote:
>>
>> " However, after uploading the new security.json and restarting the
>> web browser,"
>>
>> The browser remembers your login , So it is unlikely to prompt for the
>> credentials again.
>>
>> Why don't you try the RELOAD operation using command line (curl) ?
>>
>> On Tue, Sep 1, 2015 at 10:31 PM, Kevin Lee  wrote:
>>> The restart issues aside, I’m trying to lockdown usage of the Collections 
>>> API, but that also does not seem to be working either.
>>>
>>> Here is my security.json.  I’m using the “collection-admin-edit” permission 
>>> and assigning it to the “adminRole”.  However, after uploading the new 
>>> security.json and restarting the web browser, it doesn’t seem to be 
>>> requiring credentials when calling the RELOAD action on the Collections 
>>> API.  The only thing that seems to work is the custom permission “browse” 
>>> which is requiring authentication before allowing me to pull up the page.  
>>> Am I using the permissions correctly for the RuleBasedAuthorizationPlugin?
>>>
>>> {
>>>"authentication":{
>>>   "class":"solr.BasicAuthPlugin",
>>>   "credentials": {
>>>"admin": " ",
>>>"user": " "
>>>}
>>>},
>>>"authorization":{
>>>   "class":"solr.RuleBasedAuthorizationPlugin",
>>>   "permissions": [
>>>{
>>>"name":"security-edit",
>>>"role":"adminRole"
>>>},
>>>{
>>>"name":"collection-admin-edit",
>>>"role":"adminRole"
>>>},
>>>{
>>>"name":"browse",
>>>"collection": "inventory",
>>>"path": "/browse",
>>>"role":"browseRole"
>>>}
>>>],
>>>   "user-role": {
>>>"admin": [
>>>"adminRole",
>>>"browseRole"
>>>],
>>>"user": [
>>>    "browseRole"
>>>]
>>>}
>>>}
>>> }
>>>
>>> Also tried adding the permission using the Authorization API, but no 
>>> effect, still isn’t protecting the Collections API from being invoked 
>>> without a username password.  I do see in the Solr logs that it sees the 
>>> updates because it outputs the messages “Updating /security.json …”, 
>>> “Security node changed”, “Initializing authorization plugin: 
>>> solr.RuleBasedAuthorizationPlugin” and “Authentication plugin class 
>>> obtained from ZK: solr.BasicAuthPlugin”.
>>>
>>> Thanks,
>>> Kevin
>>>
>>>> On Sep 1, 2015, at 12:31 AM, Noble Paul  wrote:
>>>>
>>>> I'm investigating why restarts

Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-03 Thread Noble Paul
Both of these are committed. If you could test with the latest 5.3 branch,
it would be helpful.

On Wed, Sep 2, 2015 at 5:11 PM, Noble Paul  wrote:
> I opened a ticket for the same
>  https://issues.apache.org/jira/browse/SOLR-8004
>
> On Wed, Sep 2, 2015 at 1:36 PM, Kevin Lee  wrote:
>> I’ve found that completely exiting Chrome or Firefox and opening it back up 
>> re-prompts for credentials when they are required.  It was re-prompting with 
>> the /browse path where authentication was working each time I completely 
>> exited and started the browser again, however it won’t re-prompt unless you 
>> exit completely and close all running instances so I closed all instances 
>> each time to test.
>>
>> However, to make sure I ran it via the command line via curl as suggested 
>> and it still does not give any authentication error when trying to issue the 
>> command via curl.  I get a success response from all the Solr instances that 
>> the reload was successful.
>>
>> Not sure why the pre-canned permissions aren’t working, but the one to the 
>> request handler at the /browse path is.
>>
>>
>>> On Sep 1, 2015, at 11:03 PM, Noble Paul  wrote:
>>>
>>> " However, after uploading the new security.json and restarting the
>>> web browser,"
>>>
>>> The browser remembers your login , So it is unlikely to prompt for the
>>> credentials again.
>>>
>>> Why don't you try the RELOAD operation using command line (curl) ?
>>>
>>> On Tue, Sep 1, 2015 at 10:31 PM, Kevin Lee  
>>> wrote:
>>>> The restart issues aside, I’m trying to lockdown usage of the Collections 
>>>> API, but that also does not seem to be working either.
>>>>
>>>> Here is my security.json.  I’m using the “collection-admin-edit” 
>>>> permission and assigning it to the “adminRole”.  However, after uploading 
>>>> the new security.json and restarting the web browser, it doesn’t seem to 
>>>> be requiring credentials when calling the RELOAD action on the Collections 
>>>> API.  The only thing that seems to work is the custom permission “browse” 
>>>> which is requiring authentication before allowing me to pull up the page.  
>>>> Am I using the permissions correctly for the RuleBasedAuthorizationPlugin?
>>>>
>>>> {
>>>>"authentication":{
>>>>   "class":"solr.BasicAuthPlugin",
>>>>   "credentials": {
>>>>"admin": " ",
>>>>"user": " "
>>>>}
>>>>},
>>>>"authorization":{
>>>>   "class":"solr.RuleBasedAuthorizationPlugin",
>>>>   "permissions": [
>>>>{
>>>>"name":"security-edit",
>>>>"role":"adminRole"
>>>>},
>>>>{
>>>>"name":"collection-admin-edit",
>>>>"role":"adminRole"
>>>>},
>>>>{
>>>>"name":"browse",
>>>>"collection": "inventory",
>>>>"path": "/browse",
>>>>"role":"browseRole"
>>>>}
>>>>    ],
>>>>   "user-role": {
>>>>"admin": [
>>>>    "adminRole",
>>>>"browseRole"
>>>>],
>>>>"user": [
>>>>"browseRole"
>>>>]
>>>>}
>>>>}
>>>> }
>>>>
>>>> Also tried adding the permission using the Authorization API, but no 
>>>> effect, still isn’t protecting the Collections API from being invoked 
>>>> without a username password.  I do see in the Solr logs that it sees the 
>&

Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-04 Thread Noble Paul
There are no download links for the 5.3.x branch until we do a bug-fix release.

If you wish to download the trunk nightly (which is not the same as 5.3.0),
check here:
https://builds.apache.org/job/Solr-Artifacts-trunk/lastSuccessfulBuild/artifact/solr/package/

If you wish to get binaries for the 5.3 branch, you will have to build them
yourself (you will need to install svn and ant).

Here are the steps

svn checkout 
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_5_3/
cd lucene_solr_5_3/solr
ant server



On Fri, Sep 4, 2015 at 4:11 PM, davidphilip cherian
 wrote:
> Hi Kevin/Noble,
>
> What is the download link for the latest build? What are the steps to
> compile, test, and use it?
> We also have a use case for this feature in Solr, so we wanted to test it;
> the above info would help a lot to get started.
>
> Thanks.
>
>
> On Fri, Sep 4, 2015 at 1:45 PM, Kevin Lee  wrote:
>
>> Thanks, I downloaded the source and compiled it and replaced the jar file
>> in the dist and solr-webapp’s WEB-INF/lib directory.  It does seem to be
>> protecting the Collections API reload command now as long as I upload the
>> security.json after startup of the Solr instances.  If I shutdown and bring
>> the instances back up, the security is no longer in place and I have to
>> upload the security.json again for it to take effect.
>>
>> - Kevin
>>
>> > On Sep 3, 2015, at 10:29 PM, Noble Paul  wrote:
>> >
>> > Both these are committed. If you could test with the latest 5.3 branch
>> > it would be helpful
>> >
>> > On Wed, Sep 2, 2015 at 5:11 PM, Noble Paul  wrote:
>> >> I opened a ticket for the same
>> >> https://issues.apache.org/jira/browse/SOLR-8004
>> >>
>> >> On Wed, Sep 2, 2015 at 1:36 PM, Kevin Lee 
>> wrote:
>> >>> I’ve found that completely exiting Chrome or Firefox and opening it
>> back up re-prompts for credentials when they are required.  It was
>> re-prompting with the /browse path where authentication was working each
>> time I completely exited and started the browser again, however it won’t
>> re-prompt unless you exit completely and close all running instances so I
>> closed all instances each time to test.
>> >>>
>> >>> However, to make sure I ran it via the command line via curl as
>> suggested and it still does not give any authentication error when trying
>> to issue the command via curl.  I get a success response from all the Solr
>> instances that the reload was successful.
>> >>>
>> >>> Not sure why the pre-canned permissions aren’t working, but the one to
>> the request handler at the /browse path is.
>> >>>
>> >>>
>> >>>> On Sep 1, 2015, at 11:03 PM, Noble Paul  wrote:
>> >>>>
>> >>>> " However, after uploading the new security.json and restarting the
>> >>>> web browser,"
>> >>>>
>> >>>> The browser remembers your login , So it is unlikely to prompt for the
>> >>>> credentials again.
>> >>>>
>> >>>> Why don't you try the RELOAD operation using command line (curl) ?
>> >>>>
>> >>>> On Tue, Sep 1, 2015 at 10:31 PM, Kevin Lee 
>> wrote:
>> >>>>> The restart issues aside, I’m trying to lockdown usage of the
>> Collections API, but that also does not seem to be working either.
>> >>>>>
>> >>>>> Here is my security.json.  I’m using the “collection-admin-edit”
>> permission and assigning it to the “adminRole”.  However, after uploading
>> the new security.json and restarting the web browser, it doesn’t seem to be
>> requiring credentials when calling the RELOAD action on the Collections
>> API.  The only thing that seems to work is the custom permission “browse”
>> which is requiring authentication before allowing me to pull up the page.
>> Am I using the permissions correctly for the RuleBasedAuthorizationPlugin?
>> >>>>>
>> >>>>> {
>> >>>>>   "authentication":{
>> >>>>>  "class":"solr.BasicAuthPlugin",
>> >>>>>  "credentials": {
>> >>>>>   "admin": " ",
>> >>>>>   "user": " "
>> >>>>>   }
>> >>>>>   },
>> >>>>>   "authorization":{
>> >>>>

Re: Strange interpretation of invalid ISO date strings

2015-09-06 Thread Paul Libbrecht
Just a word of warning: ISO 8601, the date format standard, is quite big, to
say the least, so I expect very few implementations to be complete.

I survived one such interoperability issue with Safari on iOS 6. While it (and
JavaScript, I think) claims ISO 8601 support, the implementation was not
complete, and fine-grained hunting led us to that discovery. We did open an
issue at Apple, but changing our own side was much faster. Overall, this cost
us several months of development...

I wish there were a tinier standard.

Paul


-- fat fingered on my z10 --
  Original message
From: Shawn Heisey
Sent: Monday, 7 September 2015 02:05
To: solr-user@lucene.apache.org
Reply-To: solr-user@lucene.apache.org
Subject: Strange interpretation of invalid ISO date strings

Here's some debug info from a query our code was generating:

"querystring": "post_date:[2015-09-0124T00:00:00Z TO
2015-09-0224T00:00:00Z]",
"parsedquery": "post_date:[145169280 TO 146033280]",

The "24" came from the part of our code that handles the hour; it was being
appended incorrectly. We have since fixed the problem, but are somewhat
confused that we did not get an error.

When I decode the millisecond timestamps in the parsed query, I get
these dates:

Sat, 02 Jan 2016 00:00:00 GMT
Mon, 11 Apr 2016 00:00:00 GMT

Should this be considered a bug? I would have expected Solr to throw an
exception related to an invalidly formatted date, not assume that we
meant the 124th and 224th day of the month and calculate it
accordingly. Would I be right in thinking that this problem is not
actually in Solr code, that we are using code from either Java itself or
a third party for ISO date parsing?

The index where this problem was noticed is Solr 4.9.1 running with
Oracle JDK8u45 on Linux. I confirmed that the same thing happens if I
use Solr 5.2.1 running with Oracle JDK 8u60 on Windows.

Thanks,
Shawn
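
The decoded dates are consistent with lenient day-of-month arithmetic, as
java.util.Calendar does when leniency is enabled: day "0124" of September is
treated as September 1st plus 123 extra days, and "0224" as plus 223 days.
The same arithmetic can be reproduced with GNU date:

```shell
# September 1st plus the "extra" days implied by the bogus day-of-month
date -u -d '2015-09-01 + 123 days' +%F   # 2016-01-02 (Sat, 02 Jan 2016)
date -u -d '2015-09-01 + 223 days' +%F   # 2016-04-11 (Mon, 11 Apr 2016)
```

Both results match the timestamps Shawn decoded from the parsed query.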



Re: How to secure Admin UI with Basic Auth in Solr 5.3.x

2015-09-10 Thread Noble Paul
Check this https://cwiki.apache.org/confluence/display/solr/Securing+Solr

There are a couple of bugs in 5.3.0, and a bug-fix release is coming up
over the next few days.

We don't provide any specific means to restrict access to the admin UI
itself. However, we let users specify fine-grained ACLs on various
operations such as collection-admin-edit, read, etc.
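
For a plain basic-auth setup, the only non-obvious step is producing the
credentials entry for security.json. A sketch with openssl, under the
assumption (worth verifying against Solr's Sha256AuthenticationProvider
source) that the stored value is base64(sha256(sha256(salt || password)))
followed by base64(salt):

```shell
# assumption: credentials format is
#   base64(sha256(sha256(raw_salt || password))) " " base64(salt)
PASSWORD='SolrRocks'

# random 32-byte salt, base64-encoded for storage
SALT_B64=$(head -c 32 /dev/urandom | base64 -w0)

# hash the raw salt followed by the password, then hash that digest again
HASH_B64=$({ printf '%s' "$SALT_B64" | base64 -d; printf '%s' "$PASSWORD"; } \
  | openssl dgst -sha256 -binary \
  | openssl dgst -sha256 -binary \
  | base64 -w0)

# paste this into the "credentials" section of security.json
echo "\"solr\":\"$HASH_B64 $SALT_B64\""
```

If Solr rejects credentials generated this way, the hashing order is the
first thing to double-check against the known solr:SolrRocks pair from the
wiki example.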

On Wed, Sep 9, 2015 at 2:35 PM, Merlin Morgenstern
 wrote:
> I just installed solr cloud 5.3.x and found that the way to secure the admin
> UI has changed. Apparently there is a new plugin which does role-based
> authentication, and all the info on how to secure the admin UI found on the
> net is outdated.
>
> I do not need role-based authentication; I simply want to add basic
> authentication to the Admin UI.
>
> How do I configure solr cloud 5.3.x in order to restrict access to the
> Admin UI via Basic Authentication?
>
> Thank you for any help



-- 
-----
Noble Paul


Re: How to secure Admin UI with Basic Auth in Solr 5.3.x

2015-09-11 Thread Noble Paul
There were some bugs in the 5.3.0 release, and 5.3.1 is in the
process of being released.

Try out option #2 with the RC here:

https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.3.1-RC1-rev1702389/solr/



On Fri, Sep 11, 2015 at 5:16 PM, Merlin Morgenstern
 wrote:
> OK, I downgraded to solr 5.2.x
>
> Unfortunately, still no luck. I followed two approaches:
>
> 1. Secure it the old-fashioned way, as described here:
> http://stackoverflow.com/questions/28043957/how-to-set-apache-solr-admin-password
>
> 2. Use the Basic Authentication Plugin, as described here:
> http://lucidworks.com/blog/securing-solr-basic-auth-permission-rules/
>
> Both approaches left me with unsolved problems.
>
> Following option 1, I was able to secure the Admin UI with basic
> authentication, but I was no longer able to access my application, despite
> the fact that it was working on solr 3.x with the same type of
> authentication procedure and credentials.
>
> Following option 2, I was stuck right after uploading the security.json
> file to the zookeeper ensemble. The documented check, curl
> http://localhost:8983/solr/admin/authentication, responded with 404 Not
> Found, and then solr could not connect to zookeeper. I had to remove that
> file from zookeeper and restart all solr nodes.
>
> Could someone please show me how to secure the Admin UI and password-protect
> solr cloud? I have a perfectly running system with solr 3.x and one core,
> and taking it into production on solr cloud 5.2.x now seems to be blocked
> by simple authorization problems.
>
> Thank you in advance for any help.
>
>
>
> 2015-09-10 20:42 GMT+02:00 Noble Paul :
>
>> Check this https://cwiki.apache.org/confluence/display/solr/Securing+Solr
>>
>> There a couple of bugs in 5.3.o and a bug fix release is coming up
>> over the next few days.
>>
>> We don't provide any specific means to restrict access to admin UI
>> itself. However we let users specify fine grained ACLs on various
>> operations such collection-admin-edit, read etc
>>
>> On Wed, Sep 9, 2015 at 2:35 PM, Merlin Morgenstern
>>  wrote:
>> > I just installed solr cloud 5.3.x and found that the way to secure the
>> > admin UI has changed. Apparently there is a new plugin which does
>> > role-based authentication, and all info on how to secure the admin UI
>> > found on the net is outdated.
>> >
>> > I do not need role-based authentication but simply want to put basic
>> > authentication on the Admin UI.
>> >
>> > How do I configure solr cloud 5.3.x in order to restrict access to the
>> > Admin UI via Basic Authentication?
>> >
>> > Thank you for any help
>>
>>
>>
>> --
>> -
>> Noble Paul
>>



-- 
-
Noble Paul
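[For anyone following along, a minimal security.json for option #2 might look like the sketch below. It uses the well-known solr/SolrRocks example credentials from the "Securing Solr" reference page (the value is a salted SHA-256 hash plus salt, not the cleartext password); the role and permission names shown are just the stock example, so replace everything before real use.]

```json
{
  "authentication": {
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": { "solr": "admin" },
    "permissions": [ { "name": "security-edit", "role": "admin" } ]
  }
}
```

[It would be uploaded to the ZooKeeper ensemble with something like `server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd putfile /security.json security.json`, then the Solr nodes restarted.]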


Re: Solr authentication - Error 401 Unauthorized

2015-09-13 Thread Noble Paul
It is not that solr is over-protected; it is just that the clients
(SolrJ as well as bin/solr) are not provided with basic auth
capabilities.

 I have opened a ticket to track this
https://issues.apache.org/jira/browse/SOLR-8048

On Sat, Sep 12, 2015 at 7:14 PM, Dan Davis  wrote:
> Noble,
>
> You should also look at this if it is intended to be more than an internal
> API.   Using the minor protections I added to test SOLR-8000, I was able to
> reproduce a problem very like this:
>
> bin/solr healthcheck -z localhost:2181 -c mycollection
>
> Since Solr /select is protected...
>
> On Sat, Sep 12, 2015 at 9:40 AM, Dan Davis  wrote:
>
>> It seems that you have secured Solr so thoroughly that you cannot now run
>> bin/solr status!
>>
>> bin/solr has no arguments yet for providing a username/password; as
>> mostly a user like you, I'm not sure of the roadmap.
>>
>> I think you should relax those restrictions a bit and try again.
>>
>> On Fri, Sep 11, 2015 at 5:06 AM, Merlin Morgenstern <
>> merlin.morgenst...@gmail.com> wrote:
>>
>>> I have secured solr cloud via basic authentication.
>>>
>>> Now I am having difficulties creating cores and getting status
>>> information.
>>> Solr keeps telling me that the request is unauthorized. However, I have
>>> access to the admin UI after login.
>>>
>>> How do I configure solr to use the basic authentication credentials?
>>>
>>> This is the error message:
>>>
>>> /opt/solr-5.3.0/bin/solr status
>>>
>>> Found 1 Solr nodes:
>>>
>>> Solr process 31114 running on port 8983
>>>
>>> ERROR: Failed to get system information from http://localhost:8983/solr
>>> due
>>> to: org.apache.http.client.ClientProtocolException: Expected JSON response
>>>
>>> Error 401 Unauthorized
>>>
>>> HTTP ERROR 401
>>>
>>> Problem accessing /solr/admin/info/system. Reason:
>>>
>>>     Unauthorized
>>>
>>> Powered by Jetty://
>>>
>>>
>>
>>



-- 
-
Noble Paul
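[Until SOLR-8048 lands, a client that needs to hit a protected handler can simply attach the Basic auth header itself. A minimal sketch in Python, assuming the stock solr/SolrRocks example credentials and the default localhost:8983 address; the request object is only constructed here, not sent, since that needs a running secured node.]

```python
import base64
import urllib.request

# Assumed credentials: the solr/SolrRocks example pair from the docs.
user, password = "solr", "SolrRocks"

# Basic auth is just base64("user:password") behind a "Basic " prefix.
token = base64.b64encode(f"{user}:{password}".encode("ascii")).decode("ascii")
auth_header = f"Basic {token}"

# The same request bin/solr status fails on, with the header attached.
req = urllib.request.Request(
    "http://localhost:8983/solr/admin/info/system?wt=json",
    headers={"Authorization": auth_header},
)

print(auth_header)
```

[Sending `req` with `urllib.request.urlopen(req)` should then return JSON instead of the Jetty 401 page.]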


Re: Ideas

2015-09-21 Thread Paul Libbrecht
Writing a query component would be pretty easy, no?
It would throw an exception if crazy numbers are requested...

I can provide a simple example of a maven project for a query component.

Paul


William Bell wrote:
> We have some Denial of service attacks on our web site. SOLR threads are
> going crazy.
>
> Basically someone is hitting start=15 + and rows=20. The start is crazy
> large.
>
> And then they jump around. start=15 then start=213030 etc.
>
> Any ideas for how to stop this besides blocking these IPs?
>
> Sometimes it is Google doing it even though these search results are set
> with No-index and No-Follow on these pages.
>
> Thoughts? Ideas?
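[The guard itself is tiny. A sketch of the check in Python, just to show the logic; in Solr proper it would live in a custom Java SearchComponent (or a reverse proxy in front of Solr), and the MAX_START/MAX_ROWS limits are made-up numbers to tune for your own deepest legitimate page.]

```python
# Hypothetical ceilings; tune to your site's real pagination depth.
MAX_START = 10_000
MAX_ROWS = 100

def validate_paging(params):
    """Raise ValueError if start/rows look like a deep-paging abuse probe."""
    start = int(params.get("start", 0))
    rows = int(params.get("rows", 10))
    if start < 0 or rows < 0:
        raise ValueError("start and rows must be non-negative")
    if start > MAX_START:
        raise ValueError(f"start={start} exceeds limit {MAX_START}")
    if rows > MAX_ROWS:
        raise ValueError(f"rows={rows} exceeds limit {MAX_ROWS}")
    return params

validate_paging({"start": "0", "rows": "20"})           # accepted
try:
    validate_paging({"start": "213030", "rows": "20"})  # rejected
except ValueError as e:
    print(e)
```

[Rejecting early like this keeps Solr from allocating the huge priority queues that crazy start values force, which is what makes the threads go crazy in the first place.]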



[ANNOUNCE] Apache Lucene 5.3.1 released

2015-09-24 Thread Noble Paul
24 September 2015, Apache Solr™ 5.3.1 available


The Lucene PMC is pleased to announce the release of Apache Solr 5.3.1


Solr is the popular, blazing fast, open source NoSQL search platform

from the Apache Lucene project. Its major features include powerful

full-text search, hit highlighting, faceted search, dynamic

clustering, database integration, rich document (e.g., Word, PDF)

handling, and geospatial search. Solr is highly scalable, providing

fault tolerant distributed search and indexing, and powers the search

and navigation features of many of the world's largest internet sites.


This release contains various bug fixes and optimizations since the
5.3.0 release. The release is available for immediate download at:


  http://lucene.apache.org/solr/mirrors-solr-latest-redir.html


Please read CHANGES.txt for a full list of new features and changes:


  https://lucene.apache.org/solr/5_3_1/changes/Changes.html


Solr 5.3.1 includes these bug fixes.


 * security.json is not loaded on server start

 * RuleBasedAuthorization plugin does not respect the
collection-admin-edit permission

 * Fix VelocityResponseWriter template encoding issue. Templates must
be UTF-8 encoded

 * SimplePostTool (also bin/post) -filetypes "*" now works properly in
'web' mode

 * example/files update-script.js to be Java 7 and 8 compatible.

 * SolrJ could not make requests to handlers with '/admin/' prefix

 * Use of timeAllowed can cause incomplete filters to be cached and
incorrect results to be returned on subsequent requests

 * VelocityResponseWriter's $resource.get(key,baseName,locale) to use
specified locale.

 * Fix the exclusion filter so that collections that start with js,
css, img, tpl can be accessed.

 * Resolve XSS issue in Admin UI stats page


Known issues:

 * On Windows, the bin/solr.cmd script fails to start correctly when using
a relative path with the -s parameter. Use an absolute path as a workaround.
https://issues.apache.org/jira/browse/SOLR-8073


See the CHANGES.txt file included with the release for a full list of
changes and further details.


Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)


Note: The Apache Software Foundation uses an extensive mirroring
network for distributing releases. It is possible that the mirror you
are using may not have replicated the release yet. If that is the
case, please try another mirror. This also goes for Maven access.

Noble Paul
on behalf of Lucene PMC


Re: [ANNOUNCE] Apache Lucene 5.3.1 released

2015-09-24 Thread Noble Paul
Wrong title

On Thu, Sep 24, 2015 at 10:55 PM, Noble Paul  wrote:
> [...]



-- 
-
Noble Paul


[ANNOUNCE] Apache Solr 5.3.1 released

2015-09-24 Thread Noble Paul
24 September 2015, Apache Solr™ 5.3.1 available


The Lucene PMC is pleased to announce the release of Apache Solr 5.3.1


Solr is the popular, blazing fast, open source NoSQL search platform

from the Apache Lucene project. Its major features include powerful

full-text search, hit highlighting, faceted search, dynamic

clustering, database integration, rich document (e.g., Word, PDF)

handling, and geospatial search. Solr is highly scalable, providing

fault tolerant distributed search and indexing, and powers the search

and navigation features of many of the world's largest internet sites.


This release contains various bug fixes and optimizations since the
5.3.0 release. The release is available for immediate download at:


  http://lucene.apache.org/solr/mirrors-solr-latest-redir.html


Please read CHANGES.txt for a full list of new features and changes:


  https://lucene.apache.org/solr/5_3_1/changes/Changes.html


Solr 5.3.1 includes these bug fixes.


 * security.json is not loaded on server start

 * RuleBasedAuthorization plugin does not respect the
collection-admin-edit permission

 * Fix VelocityResponseWriter template encoding issue. Templates must
be UTF-8 encoded

 * SimplePostTool (also bin/post) -filetypes "*" now works properly in
'web' mode

 * example/files update-script.js to be Java 7 and 8 compatible.

 * SolrJ could not make requests to handlers with '/admin/' prefix

 * Use of timeAllowed can cause incomplete filters to be cached and
incorrect results to be returned on subsequent requests

 * VelocityResponseWriter's $resource.get(key,baseName,locale) to use
specified locale.

 * Fix the exclusion filter so that collections that start with js,
css, img, tpl can be accessed.

 * Resolve XSS issue in Admin UI stats page


Known issues:

 * On Windows, the bin/solr.cmd script fails to start correctly when using
a relative path with the -s parameter. Use an absolute path as a workaround.
https://issues.apache.org/jira/browse/SOLR-8073


See the CHANGES.txt file included with the release for a full list of
changes and further details.

Noble Paul
on behalf of Lucene PMC


Re: Instant Page Previews

2015-10-08 Thread Paul Libbrecht
This is a very nice start Charlie,

I'd warn a bit, however, about the value of such previews: automated
previews of web pages can be quite far from what users remember a page
looking like. In particular, tool pages typically show a quite "empty"
or "initial" state in such automatic previewers.

For i2geo.net, I searched for such a solution (a tick longer than 6
years ago!) and failed to find a successful one. Instead, we built a
signed applet (yes, this is old) with which users could screenshot
previews. To my taste, this allows a far, far better feeling, but of
course it requires a community approach.

Maybe both are needed if there's an infinite budget...

Paul

> Charlie Hull <mailto:char...@flax.co.uk>
> 8 October 2015 09:48
>
> Hi Lewin,
>
> We built this feature for another search engine (based on Xapian,
> which I doubt many people have heard of) a long while ago. It's
> standalone and open source though so should be applicable:
> https://github.com/flaxsearch/flaxcode/tree/master/flax_basic/libs/previewgen
>
> It uses a headless version of Open Office under the hood to generate
> thumbnail previews for various common file types, plus some
> ImageMagick for PDF, all wrapped up in Python. Bear in mind this is 6
> years old so some updating might be required!
>
> Cheers
>
> Charlie
>
>
> Lewin Joy (TMS) <mailto:lewin_...@toyota.com>
> 7 October 2015 19:49
> Hi,
>
> Is there anyway we can implement instant page previews in solr?
> Just saw that Google Search Appliance has this out of the box.
> Just like what google.com had previously. We need to display the
> content of the result record when hovering over the link.
>
> Thanks,
> Lewin
>
>
>
>



Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Paul Libbrecht

I believe that very many installations of solr actually need a query
expansion such as the one you describe below, with each textual field
indexed in multiple forms (string, straight
(whitespace/ideograms), stemmed, phonetic).

Thanks to edismax, I think, you would do the following expansion:
- 2.0 for string match (same field only, complete value)
- 1.8 for straight match phrase (same field only, using a slop)
- 1.5 for straight match in bag of words
- 1.3 for stemmed match in bag of words
- 1.1 for phonetic match in bag of words
I think you can do that with edismax: I'm fairly sure about the
parameter distribution, just not sure about the pf usage; this might
need two straight fields, which is quite cheap.

As others indicated, having intelligence to recognize the terms (e.g.
Kate should be in name) or some user indication to do so can make
things more precise, but this is rarely done.
Please note that this is just a suggestion. In particular, parameters
really need some testing and adjustment, I think.

Paul
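[Spelled out as concrete edismax parameters, the expansion above might look like the following. The field names are hypothetical: a string copy, a whitespace-tokenized copy, a stemmed copy, and a phonetic copy of the same source field, with pf/ps supplying the slopped phrase match.]

```
defType=edismax
qf=name_str^2.0 name_ws^1.5 name_stem^1.3 name_phon^1.1
pf=name_ws^1.8
ps=2
```

[Here ps is the phrase slop applied to pf; the 2.0 exact-string boost and the 1.8 phrase boost correspond to the first two bullets above, and the remaining qf weights cover the bag-of-words matches.]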


> Erick Erickson <mailto:erickerick...@gmail.com>
> 1 November 2015 07:40
> Yeah, that's actually a tough one. You have no control over what the
> user types,
> you have to try to guess what they meant.
>
> To do that right, you really have to have some meta-data besides what
> the user
> typed in, i.e. recognize "kate" and "winslet" are proper names and
> "movies" is
> something else and break up the query appropriately behind the scenes.
>
> edismax might help here. You could copyField for everything into a
> bag_of_words field then boost the name field quite high relative to the
> bag_of_words field. That way, and _assuming_ that the bag_of_words
> field had all three words, then the user at least gets something.
>
> You can also do some tricks with edismax and the "pf" parameters. That
> option automatically takes the input and makes a phrase out of it against
> the field, so you get better scores for, say, the name field if it
> contains
> the phrase "kate winslet". doesn't help with the kate winslet movies
> though.
>
> On Sat, Oct 31, 2015 at 11:11 PM, Daniel Valdivia
> Daniel Valdivia <mailto:h...@danielvaldivia.com>
> 1 November 2015 07:11
> Perhaps
>
> q=name:("Kate AND Winslet")
>
> q=name:("Kate Winslet")
>
> Sent from my iPhone
>
> Yangrui Guo <mailto:guoyang...@gmail.com>
> 1 November 2015 06:21
> Thanks for the reply. Putting name: before the terms did the trick. I
> just wanted to generalize the search query because users might be
> interested in querying Kate Winslet herself or her movies. If user enter
> query string "Kate Winslet movie", the query q=name:(Kate AND Winslet AND
> movie) will return nothing.
>
> Yangrui Guo
>
> On Saturday, October 31, 2015, Erick Erickson 
>
> Erick Erickson <mailto:erickerick...@gmail.com>
> 1 November 2015 05:27
> There are a couple of anomalies here.
>
> 1> kate AND winslet
> What does the query look like if you add &debug=true to the statement
> and look at the "parsed_query" section of the return? My guess is you
> typed "q=name:kate AND winslet" which parses as "q=name:kate AND
> default_search_field:winslet" and are getting matches you don't
> expect. You need something like "q=name:(kate AND winslet)" or
> "q=name:kate AND name:winslet". Note that if you're using eDIsmax it's
> more complicated, but that should still honor the intent.
>
> 2> I have no idea why searching for "Kate Winslet" in quotes returns
> anything, I wouldn't expect it to unless you mean you type in "q=kate
> winslet" which is searching against your default field, not the name
> field.
>
> Best,
> Erick
> Yangrui Guo <mailto:guoyang...@gmail.com>
> 1 November 2015 04:52
> Hi, today I found an interesting aspect of solr. I imported IMDB data
> into solr. The IMDB puts the last name before the first name in its
> person's name field, e.g. "Winslet, Kate". When I search "Winslet Kate"
> with quotation marks I get the exact result. However, if I search "Kate
> Winslet" or Kate AND Winslet, solr seems to return all results
> containing either Kate or Winslet, which is similar to "Winslet
> Kate"~99. From a user's perspective I certainly want solr to treat Kate
> Winslet the same as Winslet Kate. Is there any way to make solr score
> higher for terms in the same field?
>
> Yangrui
>



Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Paul Libbrecht
Alexandre,

I guess you are talking about that post:
 
http://lucidworks.com/blog/2015/06/06/query-autofiltering-extended-language-logic-search/

I think it is very often impossible to solve properly.

Words such as "direction" have very many meanings and would come in
different fields.
In IMDB, words such as the names of persons would appear in at least
different roles; similarly, an actor's role's name is likely to match
the family name of some person...

Paul



> As others indicated, having intelligence to recognize the terms (e.g.
> Kate should be in name) or some user indication to do so can make
> things more precise, but this is rarely done.
> Alexandre Rafalovitch <mailto:arafa...@gmail.com>
> 1 November 2015 13:07
> Which is what I believe Ted Sullivan is working on and presented at
> the latest Lucene/Solr Revolution. His presentation does not seem to
> be up, but he was writing about it on:
> http://lucidworks.com/blog/author/tedsullivan/

> Erick Erickson <mailto:erickerick...@gmail.com>
> 1 November 2015 07:40
> Yeah, that's actually a tough one. You have no control over what the
> user types,
> you have to try to guess what they meant.


