edismax/boost: certain documents should be last
(I am using solr 3.4 and edismax.) In my index, I have a multivalued field named "genre". One of the values this field can have is "Citation". I would like documents that have a genre field of Citation to always be at the bottom of the search results. I've been experimenting, but I can't seem to figure out the syntax of the search I need. Here is the search that seems most logical to me (newlines added here for readability):

    q=%2bcontent%3Anotes+genre%3ACitation^0.01
    &start=0
    &rows=3
    &fl=genre+title
    &version=2.2
    &defType=edismax

I get the same results whether I include "genre%3ACitation^0.01" or not. Just to see if my names were correct, I put a minus sign before "genre" and it did, in fact, stop returning all the documents containing Citation. What am I doing wrong? Here are the results from the above query, in summary:

    status=0  QTime=1
    params: fl=genre title, start=0, q=+content:notes genre:Citation^0.01,
            rows=3, version=2.2, defType=edismax
    docs:
      genre: Citation, Fiction   title: Notes on novelists With some other notes
      genre: Citation            title: Novel notes
      genre: Citation            title: Knock about notes
Re: edismax/boost: certain documents should be last
Thanks Erik. They don't need to absolutely always be the bottom-most -- just not near the top. But that sounds like an easy way to do it, especially since it is a lot easier to reindex now than it used to be. I would like to know why my query had no effect, though. There's obviously something I don't get about queries.

On Mon, Oct 31, 2011 at 10:08 AM, Erik Hatcher wrote:
> Paul (*bows* to the NINES!) -
>
> If you literally want Citations always at the bottom regardless of other
> relevancy, then perhaps consider indexing a boolean top_sort as true for
> everything Citation and false otherwise, then use &sort=top_sort asc,score desc
> (or do you need top_sort desc? true then false, or false then true?)
>
> Then you can have Citations literally at the bottom (and within that, sorted
> in score order) and likewise non-Citations at the top, sorted in score order
> within that. Other tricks still risk having Citations mixed in should the
> relevancy score be high enough.
>
> The moral of this story is: if you want to hard sort by something, then make
> a sort field that does it how you like rather than trying to get relevancy
> scoring to do it for you.
>
> Erik
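Erik's sort-field suggestion could be sketched like this; the field name top_sort and the type name boolean are assumptions about the schema, not taken from the original thread:

```xml
<!-- schema.xml: a dedicated sort field (names are illustrative) -->
<field name="top_sort" type="boolean" indexed="true" stored="false"/>
<!-- index top_sort=true for Citation documents and false otherwise,
     then query with:  &sort=top_sort asc,score desc
     so non-Citation (false) docs sort first, by score within each group -->
```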
Re: edismax/boost: certain documents should be last
I had been experimenting with bq. I switched to boost like you suggested, and get the following error from solr: "can not use FieldCache on multivalued field: genre". But that sounds like the solution I'd want, if it worked, since it's more flexible than having to reindex.

On Mon, Oct 31, 2011 at 10:41 AM, Erik Hatcher wrote:
> Paul - look at the debugQuery=true output to see why scores end up the way
> they do. Use explainOther to home in on a specific document to get its
> explanation. The math'll tell you why it's working the way it is. It's more
> than likely that some other scoring factors are overweighting things.
>
> Also, now that I think about it, you'd be better off leveraging edismax and
> the boost parameter. Don't mess with your main q(uery); use
> boost=genre:Citation^0.01 or something like that. boost params (not bq!) are
> multiplied into the score, not added. Maybe that'll be more to your liking?
>
> Erik
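One possible way around the FieldCache error (a sketch, not verified against Solr 3.4): the boost parameter takes a function query, and a bare reference to a multivalued field is what triggers the FieldCache error, but the query() function wraps a subquery without using the FieldCache. Something like:

```text
boost=query($citpenalty,1)&citpenalty=genre:Citation^0.01
```

This would multiply a matching document's score by the subquery's score (which is itself a relevancy score, not exactly 0.01) and a non-matching document's score by the default of 1.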
Re: edismax/boost: certain documents should be last
I studied the results with debugQuery, and I understand how the search is working. The two scores for the two terms are added together, so specifying a boost less than one still adds to the score. For instance, in the first result, content:notes has a score of 1.4892359 and genre:Citation^0.01 has a score of 0.0045761107. Those two numbers are added together to get the total score.

I tried putting a negative boost number in, but that isn't legal.

Is there a way to express "genre does not equal Citation" as an optional parameter that I can boost?
Re: edismax/boost: certain documents should be last
(Sorry for so many messages in a row...)

For the record, I figured out something that will work, although it is somewhat inelegant. My q parameter is now:

    (+content:notes -genre:Citation)^20 (+content:notes genre:Citation)^0.01

Can I improve on that?
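The workaround query above can also be generated programmatically, which keeps the magic boost numbers in one place. A small Ruby sketch (the field and term names are from the thread; the helper itself is hypothetical):

```ruby
# Build an edismax q that boosts non-Citation matches high and Citation
# matches very low, so Citation documents sink toward the bottom.
def sink_genre_query(base_clause, genre, high = 20, low = 0.01)
  "(+#{base_clause} -genre:#{genre})^#{high} (+#{base_clause} genre:#{genre})^#{low}"
end

q = sink_genre_query("content:notes", "Citation")
# q is the same string used in the message above
```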
limiting the total number of documents matched
I'd like to limit the total number of documents that are returned for a search, particularly when the sort order is not based on relevancy. In other words, if the user searches for a very common term, they might get tens of thousands of hits, and if they sort by "title", then very high relevancy documents will be interspersed with very low relevancy documents. I'd like to set a limit to the 1000 most relevant documents, then sort those by title. Is there a way to do this? I guess I could always retrieve the top 1000 documents and sort them in the client, but that seems particularly inefficient. I can't find any other way to do this, though. Thanks, Paul
Re: limiting the total number of documents matched
I was hoping for a way to do this purely by configuration and making the correct GET requests, but if there is a way to do it by creating a custom Request Handler, I suppose I could plunge into that. Would that yield the best results, and would that be particularly difficult?

On Wed, Jul 14, 2010 at 4:37 PM, Nagelberg, Kallin wrote:
> So you want to take the top 1000 sorted by score, then sort those by another
> field. It's a strange case, and I can't think of a clean way to accomplish
> it. You could do it in two queries, where the first is by score and you only
> request your IDs to keep it snappy, then do a second query against the IDs
> and sort by your other field. 1000 seems like a lot for that approach, but
> who knows until you try it on your data.
>
> -Kallin Nagelberg
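Kallin's two-query idea could be sketched like this in Ruby; the rsolr call is shown only as a comment, and the field names (id, title) are assumptions:

```ruby
# Pass 1: fetch just the IDs of the top N docs by relevancy, e.g.
#   @solr.select(:params => { :q => user_query, :fl => "id", :rows => 1000 })

# Pass 2: restrict to those IDs and sort by the other field.
def id_filter(ids)
  "id:(" + ids.join(" OR ") + ")"
end

params = {
  :q    => "*:*",
  :fq   => id_filter(["doc1", "doc2", "doc3"]),
  :sort => "title asc",
  :rows => 1000
}
```

A 1000-term OR clause is clunky, as Kallin notes, but it keeps all the sorting on the server.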
Re: limiting the total number of documents matched
I thought of another way to do it, but I still have one thing I don't know how to do. I could do the search without sorting for the 50th page, then look at the relevancy score of the first item on that page, then repeat the search with "score greater than that relevancy" added as a parameter.

Is it possible to do a search with "score:[5 TO *]"? It didn't work in my first attempt.
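For the record, score is not a real field, so a range query like score:[5 TO *] will not parse. However, the frange query parser (available as of Solr 1.4) can filter on the value of a function, including the score of a subquery. A hedged sketch:

```text
q=my+common+term&fq={!frange l=5}query($q)
```

This keeps only documents whose score for the main query is at least 5, which could then be combined with sort=title asc. Absolute score thresholds are fragile, though, since scores vary from query to query.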
autocomplete: case-insensitive and middle word
I have a couple questions about implementing an autocomplete function in solr. Here's my scenario: I have a name field that usually contains two or three names. For instance, let's suppose it contains:

    John Alfred Smith
    Alfred Johnson
    John Quincy Adams
    Fred Jones

I'd like the autocomplete to be case-insensitive and match any of the names, preferably just at the beginning. In other words, if the user types "alf", I want:

    John Alfred Smith
    Alfred Johnson

If the user types "fre", I want:

    Fred Jones

but not:

    John Alfred Smith
    Alfred Johnson

I can get the matches using the text_lu analyzer, but the hints that are returned are lower case, and only one name. If I use the "string" analyzer, I get the entire name like I want it, but the user must match the case (that is, must type "Alf"), and it only matches the first name, not the middle name. How can I get the matches of the "text_lu" analyzer, but get the hints like the "string" analyzer? Thanks, Paul
Re: autocomplete: case-insensitive and middle word
Here's my solution. I'm posting it in case it is radically wrong; I hope someone can help straighten me out. It seems to work fine, and seems fast enough. In schema.xml: Then I restarted solr to pick up the changes. I then ran a script which reads each document out of the current index and adds the new field:

    for each doc in my solr index:
        doc['ac_name'] = doc['name'].split(' ')

and writes the record back out. Then, using rsolr, I make the following query:

    response = @solr.select(:params => {
      :q     => "ac_name:#{prefix}",
      :start => 0,
      :rows  => 500,
      :fl    => "name"
    })
    matches = []
    docs = response['response']['docs']
    docs.each { |doc| matches.push(doc['name']) }

"matches" is now an array of the values I want to display.
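The schema.xml snippet did not survive the archive, so here is a plausible reconstruction (an assumption, since the original is gone): a multivalued field whose type keyword-tokenizes and lowercases each name:

```xml
<!-- A guess at the stripped schema.xml snippet -->
<fieldType name="ac_string" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="ac_name" type="ac_string" indexed="true" stored="true"
       multiValued="true"/>
```

With this exact type, prefix matching would need a trailing wildcard (q=ac_name:alf*); since the query above uses no wildcard, the original type may instead have applied an EdgeNGramFilterFactory at index time, which is a common way to make bare prefix terms match.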
What is a "409 version conflict" error [solr 4.1]?
I've got a process that is replacing about 180K documents that are all similar (I'm actually just adding one field to each of them). This is mostly working fine, but occasionally (perhaps 100 times), I get this error: 409 Conflict Error: {'responseHeader'=>{'status'=>409,'QTime'=>1},'error'=>{'msg'=>'version conflict for lib://X expected=1425089660734930944 actual=1425439751468482560','code'=>409}} Why does that happen a few times? How can I prevent that from happening? Is there any other info I can supply that would help the diagnosis? Thanks!
Re: What is a "409 version conflict" error [solr 4.1]?
Ok, I did get a little more information about it from here: http://yonik.com/solr/optimistic-concurrency/ but I really don't know why the version number is conflicting. I'm running the only process that is changing documents, and my process is to read the document, add a field, and write the document. The records were all created at the same time with similar data, and are all being updated for the first time, again with similar data.

I did happen to stumble on the atomic updates feature, which is new to me, and is probably a better way to do what I need.
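For anyone hitting the same error, the linked article describes the _version_ semantics that produce the 409:

```text
_version_ > 1   update succeeds only if the doc's current version matches exactly
_version_ = 1   the doc must exist (any version)
_version_ < 0   the doc must NOT exist
_version_ = 0   no version check (last write wins)
```

So a read-modify-write loop that sends back the _version_ it read will get a 409 whenever anything else (even an earlier write of its own that was re-read from a stale view) changed the document in between. Sending _version_=0, or simply omitting the field, disables the check at the cost of losing the safety it provides; atomic updates, as noted above, are the cleaner fix.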
Re: compare two shards.
I do a brute-force regression test where I read all the documents from shard 1 and compare them to the documents in shard 2. I had to have all the fields stored to do that, but in my case that doesn't change the size of the index much. So, in other words, I do a search for a page's worth of documents sorted by the same thing and compare them, then get the next page and do the same.

On Tue, Feb 12, 2013 at 4:20 AM, stockii wrote:
> hello.
>
> i want to compare two shards with each other, because these shards should
> have the same index -- but they don't =( so i want to find the documents
> that are missing from one of my two shards.
>
> my ideas:
> - a distributed shard request on my nodes, firing a facet search on my
> unique-field. but the facet component's result isn't reversible =(
> - grouping. but it's not working correctly, i think: no groups of the same
> uniquekey in the result set.
>
> does anyone have better ideas?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/compare-two-shards-tp4039887.html
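The page-by-page walk described above can be sketched like this; the fetch lambdas stand in for real sorted, paged solr queries and are assumptions:

```ruby
# Walk two identically-sorted document streams a page at a time and
# collect IDs that appear in one shard but not the other.
# fetch.call(start, rows) stands in for a sorted, paged solr query.
def diff_shards(fetch_a, fetch_b, rows = 500)
  missing = []
  start = 0
  loop do
    page_a = fetch_a.call(start, rows)
    page_b = fetch_b.call(start, rows)
    break if page_a.empty? && page_b.empty?
    ids_a = page_a.map { |d| d["id"] }
    ids_b = page_b.map { |d| d["id"] }
    missing += (ids_a - ids_b) + (ids_b - ids_a)
    start += rows
  end
  missing
end

# Demo with in-memory "shards":
a_docs = [{ "id" => "1" }, { "id" => "2" }, { "id" => "3" }]
b_docs = [{ "id" => "1" }, { "id" => "3" }]
fetch_a = lambda { |s, r| a_docs[s, r] || [] }
fetch_b = lambda { |s, r| b_docs[s, r] || [] }
missing = diff_shards(fetch_a, fetch_b, 500)
```

One caveat: when the shards differ, offset-based pages shift against each other, so this naive version can misreport near page boundaries; paging by unique-key range instead of start offset avoids that.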
Re: How to handle to run testcases in ruby code for solr
Are you asking how to test your own code, that happens to have a solr query somewhere in the middle of it? I've done this two ways:

1) You can mock the solr call by detecting that you are in test mode and just return the right answer. That will be fast.

2) Or you set up a second core with the name "test", and initialize it for each test. That will give you confidence that your queries are formed correctly.

Since your test data is generally really small, I've found that using the second method performs well enough. I use a global in my app that contains the name of the core, and I set that in a before filter in application_controller depending on whether I'm in test mode or not. As part of the test setup I delete all documents. The "delete *:*" call is really fast. Then I commit a dozen documents or so. With that little data, that is fast too.

I wrap all calls to solr in a single model so there is one point in my app that calls rsolr. I can override that class to mock it out if the solr result is not the focus of the test, and do the above work if the solr result is the focus of the test.

> On Feb 17, 2012, at 07:12 , solr wrote:
>
>> Hi all,
>> I am writing a rails application using the solr_ruby gem to access solr.
>> Can anybody suggest how to handle test cases for solr code and connections
>> in functional testing?
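The single-model wrapper described above might look like this; the class and method names are illustrative, not from any real codebase:

```ruby
# A single point of contact with solr: production code talks to this class,
# and tests can swap in a mock that returns canned documents.
class SearchModel
  def initialize(client)
    @client = client # e.g. an RSolr connection in production
  end

  def titles_matching(query)
    response = @client.select(:params => { :q => query, :fl => "title" })
    response["response"]["docs"].map { |d| d["title"] }
  end
end

# In a functional test, a tiny mock stands in for solr:
class MockSolr
  def select(_opts)
    { "response" => { "docs" => [{ "title" => "Novel notes" }] } }
  end
end
```

Tests whose focus is not the solr result construct `SearchModel.new(MockSolr.new)`; tests that do care use a real client pointed at the "test" core.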
searching top matches of each facet
Let's say that I have a facet named 'subject' that contains one of: physics, chemistry, psychology, mathematics, etc. I'd like to do a search for the top 5 documents in each category. I can do this with a separate search for each facet value, but it seems like there would be a way to combine the searches. Is there a way?

That is, if the user searches for "my search", I can now search for it with the facet of "physics" and rows=5, then do a separate search with the facet of "chemistry", etc. Can I do that in one search to decrease the load on the server? Or, when I do the first search, will the results be cached, so that the rest of the searches are pretty cheap?
Re: searching top matches of each facet
Perfect! Thanks!

On Wed, Feb 29, 2012 at 3:29 PM, Emmanuel Espina wrote:
> I think that what you want is FieldCollapsing:
>
> http://wiki.apache.org/solr/FieldCollapsing
>
> For example:
> &q=my search&group=true&group.field=subject&group.limit=5
>
> Test it to see if that is what you want.
>
> Thanks
> Emmanuel
Re: score from two cores
On Fri, Dec 3, 2010 at 4:47 PM, Erick Erickson wrote:
> But why do you have two cores in the first place? Is it really necessary or
> is it just making things more complex?

I don't know why the OP wants two cores, but I ran into this same problem and had to abandon using a second core. My use case is: I have lots of slowly-changing documents, and a few often-changing documents. Those classes of documents are updated by different people using different processes. I wanted to split them into separate cores so that:

1) The large core wouldn't change except deliberately, so there would be less chance of a bug creeping in. Also, that core is the same on different servers, so it could be replicated.

2) The small core would update and optimize quickly, and the data in it is different on different servers.

The problem is that the search results should return relevancy as if there were only one core.
Re: Improving Solr performance
> I see from your other messages that these indexes all live on the same > machine. > You're almost certainly I/O bound, because you don't have enough memory for > the > OS to cache your index files. With 100GB of total index size, you'll get best > results with between 64GB and 128GB of total RAM. Is that a general rule of thumb? That it is best to have about the same amount of RAM as the size of your index? So, with a 5GB index, I should have between 4GB and 8GB of RAM dedicated to solr?
verifying that an index contains ONLY utf-8
We've created an index from a number of different documents that are supplied by third parties. We want the index to only contain UTF-8 encoded characters. I have a couple questions about this: 1) Is there any way to be sure during indexing (by setting something in the solr configuration?) that the documents that we index will always be stored in utf-8? Can solr convert documents that need converting on the fly, or can solr reject documents containing illegal characters? 2) Is there a way to scan the existing index to find any string containing non-utf8 characters? Or is there another way that I can discover if any crept into my index?
Re: verifying that an index contains ONLY utf-8
Thanks for all the responses. CharsetDetector does look promising. Unfortunately, we aren't allowed to keep the original of much of our data, so the solr index is the only place it exists (to us). I do have a java app that "reindexes", i.e., reads all documents out of one index, does some transform on them, then writes them to a second index. So I already have a place where all the data in the index streams by. I wanted to make sure there wasn't some built-in way of doing what I need.

I know that it is possible to fool the algorithm, but I'll first check whether the string is possibly valid UTF-8 and, if so, not change it. That way I won't be introducing more errors, and maybe I can detect a large percentage of the non-UTF-8 strings.

On Thu, Jan 13, 2011 at 4:36 PM, Robert Muir wrote:
> it does:
> http://icu-project.org/apiref/icu4j/com/ibm/icu/text/CharsetDetector.html
> this takes a sample of the file and makes a guess.
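The "is this already valid UTF-8" check is cheap to do before any charset guessing. A sketch in Ruby (rather than the java app mentioned above, so the helper name is an invention):

```ruby
# Returns true if the raw bytes form a well-formed UTF-8 sequence.
# Only strings failing this test need to go through a detector like
# ICU's CharsetDetector.
def valid_utf8?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

valid_utf8?("caf\xC3\xA9".b)  # well-formed UTF-8 bytes for "café"
valid_utf8?("caf\xE9".b)      # a Latin-1 e-acute byte: not valid UTF-8
```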
last item in results page is always the same
(I'm using solr 1.4.) I'm doing a test of my index, so I'm reading out every document in batches of 500. The query is (I added newlines here to make it readable):

    http://localhost:8983/solr/archive_ECCO/select/
    ?q=archive%3AECCO
    &fl=uri
    &version=2.2
    &start=0
    &rows=500
    &indent=on
    &sort=uri%20asc

It turns out, in this case, the query should match every document. The response shows numFound="182413". If I scan the returned values, they appear sorted properly except the last one. In other words, the uris returned on the first page are:

    100100
    100200
    etc...
    0006601600
    0006601700
    1723200600

That 499th value is returned as the 499th value on every page. That is, if I call it with &start=500, then most of the entries look right, but that last value will still be 1723200600, and the true 499th value is never returned. 1723200600 should have been returned as the 181,499th item.

Is this a known solr bug or is there something subtle going on?

Thanks, Paul
Re: last item in results page is always the same
Thanks, going to update now. This is a system that is currently deployed. Should I just update to 1.4.1, or should I go straight to 3.0? Does 1.4 => 3.0 require reindexing? On Wed, Feb 16, 2011 at 5:37 PM, Yonik Seeley wrote: > On Wed, Feb 16, 2011 at 5:08 PM, Paul wrote: >> Is this a known solr bug or is there something subtle going on? > > Yes, I think it's the following bug, fixed in 1.4.1: > > * SOLR-1777: fieldTypes with sortMissingLast=true or sortMissingFirst=true can > result in incorrectly sorted results. > > -Yonik > http://lucidimagination.com >
Search failing for matched text in large field
I'm using solr 1.4.1. I have a document that has a pretty big field. If I search for a phrase that occurs near the start of that field, it works fine. If I search for a phrase that appears even a little ways into the field, it doesn't find it. Is there some limit to how far into a field solr will search? Here's the way I'm doing the search. All I'm changing is the text I'm searching on to make it succeed or fail: http://localhost:8983/solr/my_core/select/?q=%22search+phrase%22&hl=on&hl.fl=text Or, if it is not related to how large the document is, what else could it possibly be related to? Could there be some character in that field that is stopping the search?
Re: Search failing for matched text in large field
Ah, no, I'll try that now. What is the disadvantage of setting that to a really large number? I do want the search to work for every word I give to solr. Otherwise I wouldn't have indexed it to begin with. On Wed, Mar 23, 2011 at 11:15 AM, Sascha Szott wrote: > Hi Paul, > > did you increase the value of the maxFieldLength parameter in your > solrconfig.xml? > > -Sascha > > On 23.03.2011 17:05, Paul wrote: >> >> I'm using solr 1.4.1. >> >> I have a document that has a pretty big field. If I search for a >> phrase that occurs near the start of that field, it works fine. If I >> search for a phrase that appears even a little ways into the field, it >> doesn't find it. Is there some limit to how far into a field solr will >> search? >> >> Here's the way I'm doing the search. All I'm changing is the text I'm >> searching on to make it succeed or fail: >> >> >> http://localhost:8983/solr/my_core/select/?q=%22search+phrase%22&hl=on&hl.fl=text >> >> Or, if it is not related to how large the document is, what else could >> it possibly be related to? Could there be some character in that field >> that is stopping the search? >
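For reference, the cap being discussed lives in solrconfig.xml; in Solr 1.4 it defaults to 10,000 tokens per field, and tokens past the cap are silently dropped at index time — which matches the symptom (phrases near the start of the field are found, later ones are not). A sketch of the change, assuming the stock config layout:

```xml
<!-- solrconfig.xml (Solr 1.4.x), inside the index settings section:
     raise the per-field token cap from the default of 10000.
     Integer.MAX_VALUE effectively removes the limit. -->
<maxFieldLength>2147483647</maxFieldLength>
```

The main costs of a huge value are index size and memory at index time for genuinely enormous fields; as the thread goes on to discover, highlighting over very large stored fields is a separate expense.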
Re: Search failing for matched text in large field
I increased maxFieldLength and reindexed a small number of documents. That worked -- I got the correct results. In 3 minutes! I assume that if I reindex all my documents that all searches will become even slower. Is there any way to get all the results in a way that is quick enough that my user won't get bored waiting? Is there some optimization of this coming in solr 3.0? On Wed, Mar 23, 2011 at 12:15 PM, Sascha Szott wrote: > Hi Paul, > > did you increase the value of the maxFieldLength parameter in your > solrconfig.xml? > > -Sascha > > On 23.03.2011 17:05, Paul wrote: >> >> I'm using solr 1.4.1. >> >> I have a document that has a pretty big field. If I search for a >> phrase that occurs near the start of that field, it works fine. If I >> search for a phrase that appears even a little ways into the field, it >> doesn't find it. Is there some limit to how far into a field solr will >> search? >> >> Here's the way I'm doing the search. All I'm changing is the text I'm >> searching on to make it succeed or fail: >> >> >> http://localhost:8983/solr/my_core/select/?q=%22search+phrase%22&hl=on&hl.fl=text >> >> Or, if it is not related to how large the document is, what else could >> it possibly be related to? Could there be some character in that field >> that is stopping the search? >
Re: Search failing for matched text in large field
I looked into the search that I'm doing a little closer and it seems like the highlighting is slowing it down. If I do the query without requesting highlighting it is fast. (BTW, I also have faceting and pagination in my query. Faceting doesn't seem to change the response time much, adding &rows= and &start= does, but not prohibitively.) The field in question needs to be stored=true, because it is needed for highlighting. I'm thinking of doing this in two searches: first without highlighting and put a progress spinner next to each result, then do an ajax call to repeat the search with highlighting that can take its time to finish. (I, too, have seen random really long response times that seem to be related to not enough RAM, but this isn't the problem because the results here are repeatable.) On Wed, Mar 23, 2011 at 2:30 PM, Sascha Szott wrote: > On 23.03.2011 18:52, Paul wrote: >> >> I increased maxFieldLength and reindexed a small number of documents. >> That worked -- I got the correct results. In 3 minutes! > > Did you mark the field in question as stored = false? > > -Sascha > >> >> I assume that if I reindex all my documents that all searches will >> become even slower. Is there any way to get all the results in a way >> that is quick enough that my user won't get bored waiting? Is there >> some optimization of this coming in solr 3.0? >> >> On Wed, Mar 23, 2011 at 12:15 PM, Sascha Szott wrote: >>> >>> Hi Paul, >>> >>> did you increase the value of the maxFieldLength parameter in your >>> solrconfig.xml? >>> >>> -Sascha >>> >>> On 23.03.2011 17:05, Paul wrote: >>>> >>>> I'm using solr 1.4.1. >>>> >>>> I have a document that has a pretty big field. If I search for a >>>> phrase that occurs near the start of that field, it works fine. If I >>>> search for a phrase that appears even a little ways into the field, it >>>> doesn't find it. Is there some limit to how far into a field solr will >>>> search? >>>> >>>> Here's the way I'm doing the search. 
All I'm changing is the text I'm >>>> searching on to make it succeed or fail: >>>> >>>> >>>> >>>> http://localhost:8983/solr/my_core/select/?q=%22search+phrase%22&hl=on&hl.fl=text >>>> >>>> Or, if it is not related to how large the document is, what else could >>>> it possibly be related to? Could there be some character in that field >>>> that is stopping the search? >>> >
Best practice for rotating solr logs
I'm about to set up log rotation using logrotate, but I have a question about how to do it. The general examples imply that one should include the following in the script: postrotate /sbin/service solr restart endscript but it seems to me that any requests that come in during that restart process are going to return errors. The other way to do it is to use copytruncate but that will cause any requests that come in during the time that the file is being truncated to not appear in the log. How do you set up your logrotate file? Thanks, Paul
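A middle path is copytruncate combined with delaycompress: you accept that a handful of log lines written between the copy and the truncate may be lost, but Solr never restarts and never serves errors. A sketch — the log path is an assumption:

```
# /etc/logrotate.d/solr  (log file path is a placeholder)
/var/log/solr/*.log {
    daily
    rotate 14
    missingok
    notifempty
    compress
    delaycompress
    copytruncate
}
```

delaycompress matters with copytruncate because the just-rotated file may still be receiving buffered writes; compressing it one cycle later avoids corrupting those.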
ConcurrentLRUCache$Stats error
I'm using solr 1.4.1 and just noticed a bunch of these errors in the solr.log file: SEVERE: java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.solr.common.util.ConcurrentLRUCache$Stats.add(Lorg/apache/solr/common/util/ConcurrentLRUCache$Stats;)V They appear to happen after a commit, and as far as I can tell, everything is working fine -- that's why I didn't notice these errors earlier. What is this telling me?
Searching for escaped characters
I'm trying to create a test to make sure that character sequences like "&egrave;" are successfully converted to their equivalent UTF-8 character (that is, in this case, "è"). So I'd like to search my solr index using the equivalent of the following regular expression, to find any escaped sequences that might have slipped through: &\w{1,6}; Is this possible? I have indexed these fields with text_lu, which looks like this: Thanks, Paul
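As far as I know, the Solr query parsers of that era (1.4/3.x) have no regular-expression support, so the practical route is to pull back the stored field values and scan them client-side. A sketch in Python of the check the regex above describes:

```python
import re

# Matches things like "&egrave;" or "&amp;" that should have been
# expanded to a single character before indexing.
ENTITY_RE = re.compile(r"&\w{1,6};")


def unresolved_entities(text: str) -> list:
    """Return every substring that still looks like a character reference."""
    return ENTITY_RE.findall(text)


assert unresolved_entities("tr&egrave;s bien") == ["&egrave;"]
assert unresolved_entities("très bien") == []
```

Note this will also flag a legitimate literal ampersand-word-semicolon in the text, so a human pass over the matches is still worthwhile.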
applying FastVectorHighlighter truncation patch to solr 3.1
I'm having this issue with solr 3.1: https://issues.apache.org/jira/browse/LUCENE-1824 It looks like there is a patch offered, but I can't figure out how to apply it. What is the easiest way for me to get this fix? I'm just using the example solr with changed conf xml files. Is there a file somewhere I can just drop in?
auto suggestion with text_en field
Sorry if this has been asked before, but I couldn't seem to find it... I've got a fairly simple index, and I'm searching on a field of type text_en, and the results are good: I search for "computer" and I get back hits for "computer", "computation", "computational", "computing". I also want to create an auto suggestion drop down, so I did a query using the field as a facet, and I get back a good, but literal, set of suggestions. For instance, one of the suggestions is "comput", which does actually match what I want it to, but it is ugly, since it isn't actually a word. As I'm thinking about it, I'm not sure what word I would like it to return in this situation, so I'm asking how others have handled this situation. Is it illogical to have auto complete on a text_en field? Do I have to pick one or the other? Thanks,
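One common answer is to facet for suggestions on a lightly analyzed copy of the field rather than the stemmed text_en one — tokenized and lowercased, but not stemmed — so the facet values come back as real words. A schema.xml sketch with hypothetical field names:

```xml
<!-- schema.xml sketch (field names are hypothetical). text_general in
     the stock schema tokenizes and lowercases but does not stem, so
     facet values are whole words ("computer", "computing") rather
     than stems ("comput"). -->
<field name="content"         type="text_en"      indexed="true" stored="true"/>
<field name="content_suggest" type="text_general" indexed="true" stored="false"/>
<copyField source="content" dest="content_suggest"/>
```

Search still runs against the stemmed field; only the facet.field powering the drop-down points at content_suggest, so you keep stemmed recall without exposing stems to the user.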
Setting up two cores in solr.xml for Solr 4.0
I'm trying to set up two cores that share everything except their data. (This is for testing: I want to create a parallel index that is used when running my testing scripts.) I thought that would be straightforward, and according to the documentation, I thought the following would work: I thought that would create a directory structure like this:

solr
  MYCORE
    conf
    data
      index
  MYCORE_test
    index

But it looks like both of the cores are sharing the same index and the MYCORE_test directory is not created. In addition, I get the following in the log file:

INFO: [MYCORE_test] Opening new SolrCore at solr/MYCORE/, dataDir=solr/MYCORE/data/
...
WARNING: New index directory detected: old=null new=solr/MYCORE/data/index/

What am I not understanding?
Re: Setting up two cores in solr.xml for Solr 4.0
By trial and error, I found that you evidently need to put that property inline, so this version works: Is the documentation here in error? http://wiki.apache.org/solr/CoreAdmin On Tue, Sep 4, 2012 at 2:50 PM, Paul wrote: > I'm trying to set up two cores that share everything except their > data. (This is for testing: I want to create a parallel index that is > used when running my testing scripts.) I thought that would be > straightforward, and according to the documentation, I thought the > following would work: > > > > > > > > > I thought that would create a directory structure like this: > > solr > MYCORE > conf > data > index > MYCORE_test > index > > But it looks like both of the cores are sharing the same index and the > MYCORE_test directory is not created. In addition, I get the following > in the log file: > > INFO: [MYCORE_test] Opening new SolrCore at solr/MYCORE/, > dataDir=solr/MYCORE/data/ > ... > WARNING: New index directory detected: old=null new=solr/MYCORE/data/index/ > > What am I not understanding?
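For anyone hitting the same thing (the archive stripped the XML from the post above), here is a sketch of the working inline form. It is hedged: a relative dataDir is resolved against the core's instanceDir, so the path below is one guess at reproducing the intended layout.

```xml
<!-- solr.xml sketch: dataDir as an attribute on <core>, not a nested
     <property> element. Relative dataDir paths resolve against the
     core's instanceDir. -->
<cores adminPath="/admin/cores">
  <core name="MYCORE"      instanceDir="MYCORE"/>
  <core name="MYCORE_test" instanceDir="MYCORE" dataDir="../MYCORE_test"/>
</cores>
```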
Re: Setting up two cores in solr.xml for Solr 4.0
I don't think I changed my solrconfig.xml file from the default that was provided in the example folder for solr 4.0. On Tue, Sep 4, 2012 at 3:40 PM, Chris Hostetter wrote: > > : > > I'm pretty sure what you have above tells solr that core MYCORE_test > should use the instanceDir MYCORE but ignore the in that > solrconfig.xml and use the one you specified. > > This on the other hand... > > : > > : > > : > > > ...tells solr that the MYCORE_test SolrCore should use the instanceDir > MYCORE, and when parsing that solrconfig.xml file it should set the > variable ${dataDir} to be "MYCORE_test" -- but if your solrconfig.xml file > does not ever refer to the ${dataDir} variable, it wouldn't have any effect. > > so the question becomes -- what does your solrconfig.xml look like? > > > -Hoss
Still see document after delete with commit in solr 4.0
I've recently upgraded to solr 4.0 from solr 3.5 and I think my delete statement used to work, but now it doesn't seem to be deleting. I've been experimenting around, and it seems like this should be the URL for deleting the document with the uri of "network_24". In a browser, I first go here: http://localhost:8983/solr/MYCORE/update?stream.body=%3Cdelete%3E%3Cquery%3Euri%3Anetwork_24%3C%2Fquery%3E%3C%2Fdelete%3E&commit=true I get this response: 0 5 And this is in the log file: (timestamp) org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false} (timestamp) org.apache.solr.search.SolrIndexSearcher INFO: Opening Searcher@646dd60e main (timestamp) org.apache.solr.update.DirectUpdateHandler2 commit INFO: end_commit_flush (timestamp) org.apache.solr.core.QuerySenderListener newSearcher INFO: QuerySenderListener sending requests to Searcher@646dd60e main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)} (timestamp) org.apache.solr.core.QuerySenderListener newSearcher INFO: QuerySenderListener done. S(timestamp) org.apache.solr.core.SolrCore registerSearcher INFO: [MYCORE] Registered new searcher Searcher@646dd60e main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)} (timestamp) org.apache.solr.update.processor.LogUpdateProcessor finish INFO: [MYCORE] webapp=/solr path=/update params={commit=true&stream.body=uri:network_24} {deleteByQuery=uri:network_24,commit=} 0 5 But if I then go to this URL: http://localhost:8983/solr/MYCORE/select?q=uri%3Anetwork_24&wt=xml I get this response: 0 1 xml uri:network_24 network24 network_24 Why didn't that document disappear?
Re: Still see document after delete with commit in solr 4.0
That was exactly it. I added the following line to schema.xml and it now works. On Wed, Sep 5, 2012 at 10:13 AM, Jack Krupansky wrote: > Check to make sure that you are not stumbling into SOLR-3432: "deleteByQuery > silently ignored if updateLog is enabled, but {{_version_}} field does not > exist in schema". > > See: > https://issues.apache.org/jira/browse/SOLR-3432 > > -- Jack Krupansky > > -Original Message- From: Paul > Sent: Wednesday, September 05, 2012 10:05 AM > To: solr-user > Subject: Still see document after delete with commit in solr 4.0 > > > I've recently upgraded to solr 4.0 from solr 3.5 and I think my delete > statement used to work, but now it doesn't seem to be deleting. I've > been experimenting around, and it seems like this should be the URL > for deleting the document with the uri of "network_24". > > In a browser, I first go here: > > http://localhost:8983/solr/MYCORE/update?stream.body=%3Cdelete%3E%3Cquery%3Euri%3Anetwork_24%3C%2Fquery%3E%3C%2Fdelete%3E&commit=true > > I get this response: > > > >0 >5 > > > > And this is in the log file: > > (timestamp) org.apache.solr.update.DirectUpdateHandler2 commit > INFO: start > commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false} > (timestamp) org.apache.solr.search.SolrIndexSearcher > INFO: Opening Searcher@646dd60e main > (timestamp) org.apache.solr.update.DirectUpdateHandler2 commit > INFO: end_commit_flush > (timestamp) org.apache.solr.core.QuerySenderListener newSearcher > INFO: QuerySenderListener sending requests to Searcher@646dd60e > main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)} > (timestamp) org.apache.solr.core.QuerySenderListener newSearcher > INFO: QuerySenderListener done. 
> S(timestamp) org.apache.solr.core.SolrCore registerSearcher > INFO: [MYCORE] Registered new searcher Searcher@646dd60e > main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)} > (timestamp) org.apache.solr.update.processor.LogUpdateProcessor finish > INFO: [MYCORE] webapp=/solr path=/update > params={commit=true&stream.body=uri:network_24} > {deleteByQuery=uri:network_24,commit=} 0 5 > > But if I then go to this URL: > > http://localhost:8983/solr/MYCORE/select?q=uri%3Anetwork_24&wt=xml > > I get this response: > > > >0 >1 > > xml > uri:network_24 > > > > > network24 > network_24 > > > > > Why didn't that document disappear?
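The archive stripped the schema line itself, but from SOLR-3432 the field being added is _version_, which the update log requires for deleteByQuery to work. The usual Solr 4.0 definition looks like this (check the stock example schema for the authoritative form):

```xml
<!-- schema.xml: required whenever <updateLog/> is enabled (SOLR-3432) -->
<field name="_version_" type="long" indexed="true" stored="true"/>
```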
Re: Still see document after delete with commit in solr 4.0
Actually, I didn't technically "upgrade". I downloaded the new version, grabbed the example, and pasted in the fields from my schema into the new one. So the only two files I changed from the example are schema.xml and solr.xml. Then I reindexed everything from scratch so there was no old index involved, either. On Wed, Sep 5, 2012 at 2:42 PM, Chris Hostetter wrote: > > : That was exactly it. I added the following line to schema.xml and it now > works. > : > : > > Just to be clear: how exactly did you "upgraded to solr 4.0 from solr 3.5" > -- did you throw out your old solrconfig.xml and use the example > solrconfig.xml from 4.0, but keep your 3.5 schema.xml? Do you in fact > have an in your solrconfig.xml? > > (if so: then this is all known as part of SOLR-3432, and won't affect any > users of 4.0-final -- but i want to be absolutely sure there isn't some > other edge case of this bug) > > > -Hoss
facet by "in the past" and "in the future"
I have some documents that contain a date field. I'd like to set up a facet that groups the dates in two buckets: 1) before today, 2) today and in the future. It seems like I should be using range faceting, but I don't see how to set up the parameters. Is there another way to get what I want? The way my user interface will look is: Status -- [x] Open (26) [ ] Closed (127) Where "open" will be all the documents that don't have a date in the past. Thanks!
Re: facet by "in the past" and "in the future"
That is perfect! Thanks. I never would have stumbled onto that. On Thu, Oct 18, 2012 at 5:40 PM, Michael Ryan wrote: > This should do it: > facet=true&facet.query=yourDateField:([* TO > NOW/DAY-1MILLI])&facet.query=yourDateField:([NOW/DAY TO *]) > > -Michael > > -Original Message- > From: Paul [mailto:p...@nines.org] > Sent: Thursday, October 18, 2012 5:28 PM > To: solr-user@lucene.apache.org > Subject: facet by "in the past" and "in the future" > > I have some documents that contain a date field. I'd like to set up a > facet that groups the dates in two buckets: 1) before today, 2) today > and in the future. > > It seems like I should be using range faceting, but I don't see how to > set up the parameters. Is there another way to get what I want? > > The way my user interface will look is: > > Status > -- > [x] Open (26) > [ ] Closed (127) > > Where "open" will be all the documents that don't have a date in the past. > > Thanks!
Re: If you could have one feature in Solr...
Limit the number of results when the results are sorted. In other words, if the results are sorted by name and there are 10,000 results, then there will be items of low relevancy mixed in with the results and it is hard for the user to find the relevant ones. If I could say, "give me no more than 200 results, sorted by name", then I want the most relevant 200 results. (It would be ok to be approximately 200. If there are documents that are the same relevancy, then a few more than 200 would be acceptable.) On Wed, Feb 24, 2010 at 8:42 AM, Grant Ingersoll wrote: > What would it be? >
Diagnosing solr timeout
Hi all, In my app, it seems like solr has become slower over time. The index has grown a bit, and there are probably a few more people using the site, but the changes are not drastic. I notice that when a solr search is made, CPU and RAM usage spikes precipitously. I notice in the solr log a bunch of entries in the same second that end in:

status=0 QTime=212
status=0 QTime=96
status=0 QTime=44
status=0 QTime=276
status=0 QTime=8552
status=0 QTime=16
status=0 QTime=20
status=0 QTime=56

and then:

status=0 QTime=315919
status=0 QTime=325071

My questions: How do I figure out what to fix? Do I need to start java with more memory? How do I tell what is the correct amount of memory to use? Is there something particularly inefficient about something else in my configuration, or the way I'm formulating the solr request, and how would I narrow down what it could be? I can't tell, but it seems like it happens after solr has been running unattended for a little while. Should I have a cron job that restarts solr every day? Could the solr process be starved by something else on the server (although -- the only other thing that is particularly running is an apache/passenger/rails app)? In other words, I'm at a total loss about how to fix this. Thanks! P.S.
In case this helps, here's the exact log entry for the first item that failed: Jun 9, 2010 1:02:52 PM org.apache.solr.core.SolrCore execute INFO: [resources] webapp=/solr path=/select params={hl.fragsize=600&facet.missing=true&facet=false&facet.mincount=1&ids=http://pm.nlx.com/xtf/view?docId%3Dshelley/shelley.04.xml;chunk.id%3Ddiv.ww.shelleyworks.v4.44,http://pm.nlx.com/xtf/view?docId%3Dshelley/shelley.06.xml;chunk.id%3Ddiv.ww.shelleyworks.v6.67,http://pm.nlx.com/xtf/view?docId%3Dtennyson_c/tennyson_c.02.xml;chunk.id%3Ddiv.tennyson.v2.1115,http://pm.nlx.com/xtf/view?docId%3Dmarx/marx.39.xml;chunk.id%3Ddiv.marx.engels.39.325,http://pm.nlx.com/xtf/view?docId%3Dshelley_j/shelley_j.01.xml;chunk.id%3Ddiv.ww.shelley.journals.v1.80,http://pm.nlx.com/xtf/view?docId%3Deliot/eliot.01.xml;chunk.id%3Ddiv.eliot.novels.bede.116,http://pm.nlx.com/xtf/view?docId%3Deliot/eliot.01.xml;chunk.id%3Ddiv.eliot.novels.bede.115,http://pm.nlx.com/xtf/view?docId%3Deliot/eliot.01.xml;chunk.id%3Ddiv.eliot.novels.bede.75,http://pm.nlx.com/xtf/view?docId%3Deliot/eliot.01.xml;chunk.id%3Ddiv.eliot.novels.bede.76,http://pm.nlx.com/xtf/view?docId%3Demerson/emerson.05.xml;chunk.id%3Dralph.waldo.v5.d083,http://pm.nlx.com/xtf/view?docId%3Dshelley/shelley.04.xml;chunk.id%3Ddiv.ww.shelleyworks.v4.31,http://pm.nlx.com/xtf/view?docId%3Dshelley_j/shelley_j.01.xml;chunk.id%3Ddiv.ww.shelley.journals.v1.88,http://pm.nlx.com/xtf/view?docId%3Deliot/eliot.03.xml;chunk.id%3Ddiv.eliot.romola.48&facet.limit=-1&hl.fl=text&hl.maxAnalyzedChars=512000&wt=javabin&hl=true&rows=30&version=1&fl=uri,archive,date_label,genre,source,image,thumbnail,title,alternative,url,role_ART,role_AUT,role_EDT,role_PBL,role_TRL,role_EGR,role_ETR,role_CRE,freeculture,is_ocr,federation,has_full_text,source_xml,uri&start=0&q=(*:*+AND+(life)+AND+(death)+AND+(of)+AND+(jason)+AND+federation:NINES)+OR+(*:*+AND+(life)+AND+(death)+AND+(of)+AND+(jason)+AND+federation:NINES+-genre:Citation)^5&facet.field=genre&facet.field=archive&facet.field=freecultur
e&facet.field=has_full_text&facet.field=federation&isShard=true&fq=year:"1882"} status=0 QTime=315919
Re: Diagnosing solr timeout
>Have you looked at the garbage collector statistics? I've experienced this >kind of issues in the past and I was getting huge spikes when the GC was doing its job. I haven't, and I'm not sure what a good way to monitor this is. The problem occurs maybe once a week on a server. Should I run jstat the whole time and redirect the output to a log file? Is there another way to get that info? Also, I was suspecting GC myself. So, if it is the problem, what do I do about it? It seems like increasing RAM might make the problem worse because it would wait longer to GC, then it would have more to do.
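Rather than leaving jstat running for a week, a lower-friction option on JVMs of that era (Java 6/7, which matches Solr 1.4) is to start Solr with GC logging flags and read the log after the next incident. A sketch — the log path is an assumption, and Java 9+ replaced these flags with -Xlog:gc:

```
# Added to the java invocation that starts Solr (HotSpot 6/7 syntax)
JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps -Xloggc:/var/log/solr/gc.log"
```

Long pauses in that log lining up with the QTime spikes would confirm the GC theory. The usual responses are a heap sized to the working set rather than "as big as possible" (exactly because an oversized heap means rarer but longer collections), and on those JVMs, trying the concurrent (CMS) collector.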
Enabling SSL on SOLR breaks my SQL Server connection
Hi, I have enabled HTTPS on my SOLR server and it works fine over HTTPS for interaction with SOLR via the browser such as for data queries and management actions. However, I now get an error when attempting to retrieve data from the SQL server for Indexing. The JDBC connection string has the parameters to manage SQL connections that are encrypted which has been setup and works fine when SSL is not specified for SOLR. When enabling SSL for SOLR client connections how do I enable it just for clients making requests into SOLR and not change any of the outgoing stuff which is already using encrypted comms, ie to SQL Server. The error message I get is below: ... org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT * from MYVIEW Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:327) at org.apache.solr.handler.dataimport.JdbcDataSource.createResultSetIterator(JdbcDataSource.java:288) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:283) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:52) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424) at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483) at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466) at java.lang.Thread.run(Unknown Source) Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The driver could not establish a secure connection to SQL Server by using Secure Sockets Layer (SSL) encryption. Error: "sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target". ClientConnectionId:bb3e9ce0-8d93-4514-98ed-f19938b91e96 at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:2826) at com.microsoft.sqlserver.jdbc.TDSChannel.enableSSL(IOBuffer.java:1829) at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2391) at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:2042) at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1889) at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:1120) at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:700) at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:192) at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:172) at org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:528) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:317) ... 
14 more Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at sun.security.ssl.Alerts.getSSLException(Unknown Source) at sun.security.ssl.SSLSocketImpl.fatal(Unknown Source) at sun.security.ssl.Handshaker.fatalSE(Unknown Source) at sun.security.ssl.Handshaker.fatalSE(Unknown Source) at sun.security.ssl.ClientHandshaker.serverCertificate(Unknown Source) at sun.security.ssl.ClientHandshaker.processMessage(Unknown Source) at sun.security.ssl.Handshaker.processLoop(Unknown Source) at sun.security.ssl.Handshaker.process_record(Unknown Source) at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source) at sun.security.ssl.SSLSocketImpl.performInitialHandshake(Unknown Source) at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source) at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source) at com.microsoft.sqlse
Re: Enabling SSL on SOLR breaks my SQL Server connection
Thanks for the reply Shawn. What I was asking is whether there is an option to exclude the comms to SQL from SOLR managed encryption as the JDBC driver manages the connection and SOLR is acting as the Client in this instance and is already using encrypted comms via the connection string parameters. Cheers Paul On 5/23/2019 5:45 AM, Paul wrote: > unable to find > valid certification path to requested target This seems to be the root of your problem with the connection to SQL server. If I have all the context right, Java is saying it can't validate the certificate returned by the SQL server. This page: https://docs.microsoft.com/en-us/sql/connect/jdbc/connecting-with-ssl-encryption?view=sql-server-2017 Talks about a "trustCertificate" property you can set to "true" in the JDBC URL that will cause Microsoft's JDBC driver to NOT validate the server certificate. Alternatively, if the SQL server is sending all the necessary chain certificates, you could place the root cert for the CA that issued the SQL Server certificate in the Java keystore that you're using for SSL on Solr, that would probably also fix it -- because then the SQL cert would validate. Thanks, Shawn -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
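For reference, in Microsoft's JDBC driver the property Shawn describes is spelled trustServerCertificate. A data-config.xml sketch — host, database, and credentials are placeholders, and leaving trustServerCertificate=false while importing the CA certificate into the trust store is the production-grade version:

```xml
<!-- data-config.xml sketch (hypothetical names). With
     trustServerCertificate=true the driver still encrypts but skips
     validating the SQL Server certificate chain. -->
<dataSource type="JdbcDataSource"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://dbhost:1433;databaseName=mydb;encrypt=true;trustServerCertificate=true"
            user="solr_reader"
            password="changeme"/>
```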
Re: Enabling SSL on SOLR breaks my SQL Server connection
Ta - it works if I set trustCertificate=true so for now that will do for test.
Re: Enabling SSL on SOLR breaks my SQL Server connection
SOLVED: Now implemented with a bespoke trust store set up for SOLR ...
Configure mutual TLS 1.2 to secure SOLR
Hi, Can someone please outline how to use mutual TLS 1.2 with SOLR. Or, point me at docs/tutorials/other where I can read up further on this (version currently onsite is SOLR 7.6). Thanks Paul
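Not a full answer, but the Solr 7.x knobs live in solr.in.sh (solr.in.cmd on Windows): enable SSL, point at a keystore plus a truststore containing the CAs that issue your client certificates, and require client authentication. A sketch — paths and passwords are placeholders, and pinning the protocol to TLS 1.2 specifically is done at the JVM/Jetty layer rather than by these variables:

```
# solr.in.sh sketch (Solr 7.x) -- placeholder paths and passwords
SOLR_SSL_ENABLED=true
SOLR_SSL_KEY_STORE=/opt/solr/etc/solr-keystore.p12
SOLR_SSL_KEY_STORE_PASSWORD=secret
SOLR_SSL_TRUST_STORE=/opt/solr/etc/solr-truststore.p12
SOLR_SSL_TRUST_STORE_PASSWORD=secret
# Mutual TLS: require (not merely accept) a client certificate
SOLR_SSL_NEED_CLIENT_AUTH=true
SOLR_SSL_WANT_CLIENT_AUTH=false
```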
Basic Authentication in Standalone Configuration ?
Hi, I am not sure if Basic Authentication is possible in SOLR standalone configuration (version 7.6). I have a working SOLR installation using SSL. When following the docs I add options into solr.in.cmd, as in: SOLR_AUTH_TYPE="basic" SOLR_AUTHENTICATION_OPTS="-Dbasicauth=solr:SolrRocks" When I go to start SOLR I get: 'SOLR_AUTH_TYPE' is not recognized as an internal or external command, operable program or batch file. 'SOLR_AUTHENTICATION_OPTS' is not recognized as an internal or external command, operable program or batch file. This is as per https://www.apache.si/lucene/solr/ref-guide/apache-solr-ref-guide-7.7.pdf and in there it refers to '*If you are using SolrCloud*, you must upload security.json to ZooKeeper. You can use this example command, ensuring that the ZooKeeper port is correct '. I am not using SolrCloud
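The "'SOLR_AUTH_TYPE' is not recognized" error is the tell: those two lines use the solr.in.sh (bash) assignment syntax, but solr.in.cmd is a Windows batch file, so they need the cmd "set" form. A sketch:

```
REM solr.in.cmd (Windows batch syntax -- use "set", no quotes)
set SOLR_AUTH_TYPE=basic
set SOLR_AUTHENTICATION_OPTS=-Dbasicauth=solr:SolrRocks
```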
Re: Security Problems
I don't think this behavior is intuitive. It is very easy to misunderstand. I would rather just add a flag to the "authentication" plugin section which says "blockUnauthenticated": true, which means all unauthenticated requests must be blocked. On Tue, Dec 15, 2015 at 7:09 PM, Jan Høydahl wrote: > Yes, that’s why I believe it should be: > 1) if only authentication is enabled, all users must authenticate and all > authenticated users can do anything. > 2) if authz is enabled, then all users must still authenticate, and can by > default do nothing at all, unless assigned proper roles > 3) if a user is assigned the default “read” rule, and a collection adds a > custom “/myselect” handler, that one is unavailable until the user gets it > assigned > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > >> 14. des. 2015 kl. 14.15 skrev Noble Paul : >> >> ". If all paths were closed by default, forgetting to configure a path >> would not result in a security breach like today." >> >> But it will still mean that unauthorized users are able to access, >> like guest being able to post to "/update". Just authenticating is not >> enough without proper authorization >> >> On Mon, Dec 14, 2015 at 3:59 PM, Jan Høydahl wrote: >>>> 1) "read" should cover all the paths >>> >>> This is very fragile. If all paths were closed by default, forgetting to >>> configure a path would not result in a security breach like today. >>> >>> /Jan >> >> >> >> -- >> - >> Noble Paul > -- - Noble Paul
Re: Security Problems
I have opened https://issues.apache.org/jira/browse/SOLR-8429 On Wed, Dec 16, 2015 at 9:32 PM, Noble Paul wrote: > I don't this behavior is intuitive. It is very easy to misunderstand > > I would rather just add a flag to "authentication" plugin section > which says "blockUnauthenticated" : true > > which means all unauthenticated requests must be blocked. > > > > > On Tue, Dec 15, 2015 at 7:09 PM, Jan Høydahl wrote: >> Yes, that’s why I believe it should be: >> 1) if only authentication is enabled, all users must authenticate and all >> authenticated users can do anything. >> 2) if authz is enabled, then all users must still authenticate, and can by >> default do nothing at all, unless assigned proper roles >> 3) if a user is assigned the default “read” rule, and a collection adds a >> custom “/myselect” handler, that one is unavailable until the user gets it >> assigned >> >> -- >> Jan Høydahl, search solution architect >> Cominvent AS - www.cominvent.com >> >>> 14. des. 2015 kl. 14.15 skrev Noble Paul : >>> >>> ". If all paths were closed by default, forgetting to configure a path >>> would not result in a security breach like today." >>> >>> But it will still mean that unauthorized users are able to access, >>> like guest being able to post to "/update". Just authenticating is not >>> enough without proper authorization >>> >>> On Mon, Dec 14, 2015 at 3:59 PM, Jan Høydahl wrote: >>>>> 1) "read" should cover all the paths >>>> >>>> This is very fragile. If all paths were closed by default, forgetting to >>>> configure a path would not result in a security breach like today. >>>> >>>> /Jan >>> >>> >>> >>> -- >>> - >>> Noble Paul >> > > > > -- > - > Noble Paul -- - Noble Paul
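For anyone landing on this thread later: if I recall correctly, the flag proposed here shipped via SOLR-8429 under the name blockUnknown on the BasicAuthPlugin. A minimal security.json sketch (the credentials line is the stock solr/SolrRocks example from the ref guide, not a value you should keep):

```json
{
  "authentication": {
    "class": "solr.BasicAuthPlugin",
    "blockUnknown": true,
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    }
  }
}
```

With the flag absent or false, unauthenticated requests still reach any path that has no authorization rule, which is exactly the surprising default being discussed above.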
Re: API accessible without authentication even though Basic Auth Plugin is enabled
A 5.3.2 release is coming up which will back port the fixes introduced in 5.4 On Dec 17, 2015 10:25 PM, "tine-2" wrote: > Noble Paul നോബിള് नोब्ळ् wrote > > It works as designed. > > > > Protect the read path [...] > > Works like described in 5.4.0, didn't work in 5.3.1, s. > https://issues.apache.org/jira/browse/SOLR-8408 > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/API-accessible-without-authentication-even-though-Basic-Auth-Plugin-is-enabled-tp4244940p4246099.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Solr server not starting
On Wed, Jan 06, 2016 at 05:11:06PM +0100, agonn Qurdina wrote: > Hi, > > I am using Solr server with Echoprint service > (https://github.com/echonest/echoprint-server). The first time I started > it everything worked perfectly. This is the way I started it: > > java -Dsolr.solr.home=/home/echoprint-server/solr/solr/solr/ > -Djava.awt.headless=true -Xmx2048m -Xms2048m -jar start.jar > > Then I stopped it and I cannot start it anymore as it gets stuck at the 3rd > row of execution: > > 2016-01-06 11:04:19.030::INFO: Logging to STDERR via > org.mortbay.log.StdErrLog > 2016-01-06 11:04:19.165::INFO: jetty-6.1.3 > 2016-01-06 11:04:19.231::INFO: Extract > jar:file:/home/echoprint-server/solr/solr/webapps/solr.war!/ to > /tmp/Jetty_0_0_0_0_8502_solr.war__solr__-rnc92a/webapp > > It does not continue to execute anymore. I check if it is running in the > processes list and it turns out it is NOT. Please help me to solve this > problem! > > Best regards, > > Agon This could be a permissions problem -- for example, perhaps you started it as root the first time and are now attempting to start it as some other user. Paul. -- Paul Hoffman Systems Librarian Fenway Libraries Online c/o Wentworth Institute of Technology 550 Huntington Ave. Boston, MA 02115 (617) 442-2384 (FLO main number)
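Picking up on the permissions theory: the hang is at the war-extraction step, and Jetty extracts into a work directory under /tmp. If that directory was created by root on the first start, a later start as a normal user can fail silently. A rough check, sketched with a stand-in directory (substitute the real /tmp/Jetty_0_0_0_0_8502_solr.war... path from the log):

```shell
# Stand-in for the real Jetty extraction dir from the log output.
WORK=/tmp/jetty_work_check
mkdir -p "$WORK"

# Who owns it, and can the current user write to it?
printf 'owner=%s current_user=%s\n' "$(stat -c '%U' "$WORK")" "$(id -un)"
if [ -w "$WORK" ]; then
    echo "writable"
else
    echo "not writable -- remove the stale dir (as root) and restart Solr"
fi
```

If the owner doesn't match the user now starting Solr, deleting the stale directory and restarting usually clears it.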
Re: URI is too long
How about using POST? paul > Salman Ansari <mailto:salman.rah...@gmail.com> > 31 January 2016 at 15:20 > Hi, > > I am building a long query containing multiple ORs between query terms. I > started to receive the following exception: > > The remote server returned an error: (414) Request-URI Too Long. Any idea > what is the limit of the URL in Solr? Moreover, as a solution I was > thinking of chunking the query into multiple requests but I was wondering > if anyone has a better approach? > > Regards, > Salman >
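To expand on Paul's suggestion: the select handler accepts the same parameters as a form-encoded POST body, which sidesteps the container's URL limit (Jetty's request-line/header buffer is around 8 KB by default, if I remember right — that is what produces the 414). A sketch with a placeholder core and query; curl switches to POST automatically when a data option is used:

```shell
curl 'http://localhost:8983/solr/collection1/select' \
     --data-urlencode 'q=id:(1 OR 2 OR 3)' \
     --data-urlencode 'rows=100' \
     --data-urlencode 'wt=json'
```

--data-urlencode also takes care of escaping, so the long OR list can be passed through as-is rather than hand-encoded into the URL.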
Logging request times
We’re trying to fine tune our query and ingestion performance and would like to get more metrics out of SOLR around this. We are capturing the standard logs as well as the jetty request logs. The standard logs get us QTime, which is not a good indication of how long the actual request took to process. The Jetty request logs only show requests between nodes. I can’t seem to find the client requests in there. I’d like to start tracking: * each request to index a document (or batch of documents) and the time it took. * Each request to execute a query and the time it took. Thanks, Paul McCallick Sr Manager Information Technology eCommerce Foundation
Re: Custom auth plugin not loaded in SolrCloud
yes, runtime lib cannot be used for loading container level plugins yet. Eventually they must. You can open a ticket On Mon, Jan 4, 2016 at 1:07 AM, tine-2 wrote: > Hi, > > are there any news on this? Was anyone able to get it to work? > > Cheers, > > tine > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Custom-auth-plugin-not-loaded-in-SolrCloud-tp4245670p4248340.html > Sent from the Solr - User mailing list archive at Nabble.com. -- ----- Noble Paul
Re: Logging request times
I stand corrected. The Jetty request logs do indeed contain ALL of the traffic, both from other nodes and from query requests. For the record, it is valuable to capture the time at the client AND from the server to track latency or compression issues. On 2/11/16, 8:13 AM, "Shawn Heisey" wrote: >On 2/10/2016 10:33 AM, McCallick, Paul wrote: >> We’re trying to fine tune our query and ingestion performance and would like >> to get more metrics out of SOLR around this. We are capturing the standard >> logs as well as the jetty request logs. The standard logs get us QTime, >> which is not a good indication of how long the actual request took to >> process. The Jetty request logs only show requests between nodes. I can’t >> seem to find the client requests in there. >> >> I’d like to start tracking: >> >> * each request to index a document (or batch of documents) and the time >> it took. >> * Each request to execute a query and the time it took. > >The Jetty request log will usually include the IP address of the client >making the request. If IP addresses are included in your log and you >aren't seeing anything from your client address(es), perhaps those >requests are being sent to another node. > >Logging elapsed time is also something that the clients can do. If the >client is using SolrJ, every response object has a "getElapsedTime" >method (and also "getQTime") that would allow the client program to log >the elapsed time without doing its own calculation. Or the client >program could calculate the elapsed time using whatever facilities are >available in the relevant language. > >Thanks, >Shawn >
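On capturing time at the client without SolrJ: curl's --write-out timers are a cheap way to log client-side elapsed time per request, which can then be compared against the QTime in the response body. A sketch (URL and core name are placeholders):

```shell
curl -s -o /dev/null \
     -w 'http=%{http_code} total=%{time_total}s connect=%{time_connect}s\n' \
     'http://localhost:8983/solr/collection1/select?q=*:*&rows=10'
```

The gap between time_total here and QTime in the response is roughly the network, serialization, and queueing overhead that QTime alone hides.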
Adding nodes
I’d like to verify the following: - When creating a new collection, SOLRCloud will use all available nodes for the collection, adding cores to each. This assumes that you do not specify a replicationFactor. - When adding new nodes to the cluster AFTER the collection is created, one must use the core admin api to add the node to the collection. I would really like to see the second case behave more like the first. If I add a node to the cluster, it is automatically used as a replica for existing clusters without my having to do so. This would really simplify things. Paul McCallick Sr Manager Information Technology eCommerce Foundation
Re: Highlight brings the content from the first pages of pdf
This looks like the stored content is shortened. Can it be? Can you see that inside the docs? paul > Evert R. <mailto:evert.ra...@gmail.com> > 14 February 2016 at 11:26 > Hi There, > > I have a situation where started a techproducts, without any modification, > post a pdf file. When searching as: > > q=text:search_word > hl=true > hl.fl=content > > It show the highlight accordingly! =) > > BUT... *if the "search_word" is after the first pages* in my pdf file, > such > as page 15... > > It simply *does not show* *the HIGHLIGHT*... > > Does anyone has faced this situation before? > > > Thanks! > > > *--Evert* >
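Worth checking the stored field as Paul says, but another likely culprit: the standard highlighter only analyzes the first 51,200 characters of a field by default (the hl.maxAnalyzedChars parameter), so a term on page 15 of a long PDF can match the query yet produce no snippet. Raising the limit is a quick test — a sketch of the same request with the extra parameter:

```
q=text:search_word
hl=true
hl.fl=content
hl.maxAnalyzedChars=500000
```

If that makes the highlight appear, the content was there all along and only the highlighter's scan window was too small; if not, the stored-content-truncation theory is the one to chase.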
Re: Adding nodes
Then what is the suggested way to add a new node to a collection via the apis? I am specifically thinking of autoscale scenarios where a node has gone down or more nodes are needed to handle load. Note that the ADDREPLICA endpoint requires a shard name, which puts the onus of how to scale out on the user. This can be challenging in an autoscale scenario. Thanks, Paul > On Feb 14, 2016, at 12:25 AM, Shawn Heisey wrote: > >> On 2/13/2016 6:01 PM, McCallick, Paul wrote: >> - When creating a new collection, SOLRCloud will use all available nodes for >> the collection, adding cores to each. This assumes that you do not specify >> a replicationFactor. > > The number of nodes that will be used is numShards multipled by > replicationFactor. The default value for replicationFactor is 1. If > you do not specify numShards, there is no default -- the CREATE call > will fail. The value of maxShardsPerNode can also affect the overall > result. > >> - When adding new nodes to the cluster AFTER the collection is created, one >> must use the core admin api to add the node to the collection. > > Using the CoreAdmin API is strongly discouraged when running SolrCloud. > It works, but it is an expert API when in cloud mode, and can cause > serious problems if not used correctly. Instead, use the Collections > API. It can handle all normal maintenance needs. > >> I would really like to see the second case behave more like the first. If I >> add a node to the cluster, it is automatically used as a replica for >> existing clusters without my having to do so. This would really simplify >> things. > > I've added a FAQ entry to address why this is a bad idea. > > https://wiki.apache.org/solr/FAQ#Why_doesn.27t_SolrCloud_automatically_create_replicas_when_I_add_nodes.3F > > Thanks, > Shawn >
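For completeness, the Collections API call under discussion looks like this — collection, shard, and node names are placeholders; as far as I can tell the node parameter is optional, in which case Solr picks a node itself:

```shell
curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=products&shard=shard1&node=newnode:8983_solr'
```

The matching DELETEREPLICA action covers the scale-down side when the extra capacity is no longer needed.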
Re: Adding nodes
Hi all, This doesn’t really answer the following question: What is the suggested way to add a new node to a collection via the apis? I am specifically thinking of autoscale scenarios where a node has gone down or more nodes are needed to handle load. The coreadmin api makes this easy. The collections api (ADDREPLICA), makes this very difficult. On 2/14/16, 8:19 AM, "Susheel Kumar" wrote: >Hi Paul, > >Shawn is referring to use Collections API >https://cwiki.apache.org/confluence/display/solr/Collections+API than Core >Admin API https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API >for SolrCloud. > >Hope that clarifies and you mentioned about ADDREPLICA which is the >collections API, so you are on right track. > >Thanks, >Susheel > > > >On Sun, Feb 14, 2016 at 10:51 AM, McCallick, Paul < >paul.e.mccall...@nordstrom.com> wrote: > >> Then what is the suggested way to add a new node to a collection via the >> apis? I am specifically thinking of autoscale scenarios where a node has >> gone down or more nodes are needed to handle load. >> >> Note that the ADDREPLICA endpoint requires a shard name, which puts the >> onus of how to scale out on the user. This can be challenging in an >> autoscale scenario. >> >> Thanks, >> Paul >> >> > On Feb 14, 2016, at 12:25 AM, Shawn Heisey wrote: >> > >> >> On 2/13/2016 6:01 PM, McCallick, Paul wrote: >> >> - When creating a new collection, SOLRCloud will use all available >> nodes for the collection, adding cores to each. This assumes that you do >> not specify a replicationFactor. >> > >> > The number of nodes that will be used is numShards multipled by >> > replicationFactor. The default value for replicationFactor is 1. If >> > you do not specify numShards, there is no default -- the CREATE call >> > will fail. The value of maxShardsPerNode can also affect the overall >> > result. 
>> > >> >> - When adding new nodes to the cluster AFTER the collection is created, >> one must use the core admin api to add the node to the collection. >> > >> > Using the CoreAdmin API is strongly discouraged when running SolrCloud. >> > It works, but it is an expert API when in cloud mode, and can cause >> > serious problems if not used correctly. Instead, use the Collections >> > API. It can handle all normal maintenance needs. >> > >> >> I would really like to see the second case behave more like the first. >> If I add a node to the cluster, it is automatically used as a replica for >> existing clusters without my having to do so. This would really simplify >> things. >> > >> > I've added a FAQ entry to address why this is a bad idea. >> > >> > >> https://wiki.apache.org/solr/FAQ#Why_doesn.27t_SolrCloud_automatically_create_replicas_when_I_add_nodes.3F >> > >> > Thanks, >> > Shawn >> > >>
Re: Adding nodes
These are excellent questions and give me a good sense of why you suggest using the collections api. In our case we have 8 shards of product data with a even distribution of data per shard, no hot spots. We have very different load at different points in the year (cyber monday), and we tend to have very little traffic at night. I'm thinking of two use cases: 1) we are seeing increased latency due to load and want to add 8 more replicas to handle the query volume. Once the volume subsides, we'd remove the nodes. 2) we lose a node due to some unexpected failure (ec2 tends to do this). We want auto scaling to detect the failure and add a node to replace the failed one. In both cases the core api makes it easy. It adds nodes to the shards evenly. Otherwise we have to write a fairly involved script that is subject to race conditions to determine which shard to add nodes to. Let me know if I'm making dangerous or uninformed assumptions, as I'm new to solr. Thanks, Paul > On Feb 14, 2016, at 10:35 AM, Susheel Kumar wrote: > > Hi Pual, > > > For Auto-scaling, it depends on how you are thinking to design and what/how > do you want to scale. Which scenario you think makes coreadmin API easy to > use for a sharded SolrCloud environment? > > Isn't if in a sharded environment (assume 3 shards A,B & C) and shard B has > having higher or more load, then you want to add Replica for shard B to > distribute the load or if a particular shard replica goes down then you > want to add another Replica back for the shard in which case ADDREPLICA > requires a shard name? > > Can you describe your scenario / provide more detail? > > Thanks, > Susheel > > > > On Sun, Feb 14, 2016 at 11:51 AM, McCallick, Paul < > paul.e.mccall...@nordstrom.com> wrote: > >> Hi all, >> >> >> This doesn’t really answer the following question: >> >> What is the suggested way to add a new node to a collection via the >> apis? 
I am specifically thinking of autoscale scenarios where a node has >> gone down or more nodes are needed to handle load. >> >> >> The coreadmin api makes this easy. The collections api (ADDREPLICA), >> makes this very difficult. >> >> >>> On 2/14/16, 8:19 AM, "Susheel Kumar" wrote: >>> >>> Hi Paul, >>> >>> Shawn is referring to use Collections API >>> https://cwiki.apache.org/confluence/display/solr/Collections+API than >> Core >>> Admin API https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API >>> for SolrCloud. >>> >>> Hope that clarifies and you mentioned about ADDREPLICA which is the >>> collections API, so you are on right track. >>> >>> Thanks, >>> Susheel >>> >>> >>> >>> On Sun, Feb 14, 2016 at 10:51 AM, McCallick, Paul < >>> paul.e.mccall...@nordstrom.com> wrote: >>> >>>> Then what is the suggested way to add a new node to a collection via the >>>> apis? I am specifically thinking of autoscale scenarios where a node >> has >>>> gone down or more nodes are needed to handle load. >>>> >>>> Note that the ADDREPLICA endpoint requires a shard name, which puts the >>>> onus of how to scale out on the user. This can be challenging in an >>>> autoscale scenario. >>>> >>>> Thanks, >>>> Paul >>>> >>>>> On Feb 14, 2016, at 12:25 AM, Shawn Heisey >> wrote: >>>>> >>>>>> On 2/13/2016 6:01 PM, McCallick, Paul wrote: >>>>>> - When creating a new collection, SOLRCloud will use all available >>>> nodes for the collection, adding cores to each. This assumes that you >> do >>>> not specify a replicationFactor. >>>>> >>>>> The number of nodes that will be used is numShards multipled by >>>>> replicationFactor. The default value for replicationFactor is 1. If >>>>> you do not specify numShards, there is no default -- the CREATE call >>>>> will fail. The value of maxShardsPerNode can also affect the overall >>>>> result. 
>>>>> >>>>>> - When adding new nodes to the cluster AFTER the collection is >> created, >>>> one must use the core admin api to add the node to the collection. >>>>> >>>>> Using the CoreAdmin API is strongly discouraged when running >> SolrCloud. >>>>> It works, but it is an expert API when in cloud mode, and can cause >>>>> serious problems if not used correctly. Instead, use the Collections >>>>> API. It can handle all normal maintenance needs. >>>>> >>>>>> I would really like to see the second case behave more like the >> first. >>>> If I add a node to the cluster, it is automatically used as a replica >> for >>>> existing clusters without my having to do so. This would really >> simplify >>>> things. >>>>> >>>>> I've added a FAQ entry to address why this is a bad idea. >> https://wiki.apache.org/solr/FAQ#Why_doesn.27t_SolrCloud_automatically_create_replicas_when_I_add_nodes.3F >>>>> >>>>> Thanks, >>>>> Shawn >>
Re: Need to move on SOlr cloud (help required)
On 16 February 2016 at 06:09, Midas A wrote: > Susheel, > > Is there any client available in php for solr cloud which maintain the same > ?? > > No there is none. I recommend HAProxy for Non SolrJ clients and loadbalancing SolrCloud. HAProxy makes it also easy to do rolling updates of your SolrCloud nodes Hth Paul > > On Tue, Feb 16, 2016 at 7:31 AM, Susheel Kumar > wrote: > > > In SolrJ, you would use CloudSolrClient which interacts with Zookeeper > > (which maintains Cluster State). See CloudSolrClient API. So that's how > > SolrJ would know which node is down or not. >
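A minimal haproxy.cfg sketch for the setup Paul describes — addresses and the health-check path are placeholders to adapt:

```
frontend solr_in
    bind *:8983
    mode http
    default_backend solr_nodes

backend solr_nodes
    mode http
    balance roundrobin
    option httpchk GET /solr/admin/info/system
    server solr1 10.0.0.1:8983 check
    server solr2 10.0.0.2:8983 check
```

The health check is what makes rolling updates easy: stop a node, HAProxy fails its check and routes around it, upgrade, restart, and it rejoins the rotation.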
Solr won't start -- java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration
I've been running Solr successfully until this morning, when I stopped it to pick up a change in my schema, and now it won't start up again. I've whittled the problem down to this: ---- # cd /home/paul/proj/blacklight/jetty # java -jar start.jar -Djetty.port=8983 -Dsolr.solr.home=$PWD/solr WARNING: System properties and/or JVM args set. Consider using --dry-run or --exec java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.eclipse.jetty.start.Main.invokeMain(Main.java:440) at org.eclipse.jetty.start.Main.start(Main.java:615) at org.eclipse.jetty.start.Main.main(Main.java:96) ClassNotFound: org.eclipse.jetty.xml.XmlConfiguration Usage: java -jar start.jar [options] [properties] [configs] java -jar start.jar --help # for more information # readlink -e $(which java) /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java # uname -srvmpio Linux 3.16.0-57-generic #77~14.04.1-Ubuntu SMP Thu Dec 17 23:20:00 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux # env | fgrep JAVA [no output] I only have one JVM installed -- openjdk-8-jre-headless. Judging from the file timestamps within /usr/lib/jvm, the package hasn't been updated since last August at the latest; the server has only been up for 62 days. Just in case it matters, I was running Solr successfully under Blacklight's jetty wrapper, and the command line above is what it uses (or claims to use). Does anyone have any idea what might be causing this problem? Thanks in advance, Paul. -- Paul Hoffman Systems Librarian Fenway Libraries Online c/o Wentworth Institute of Technology 550 Huntington Ave. Boston, MA 02115 (617) 442-2384 (FLO main number)
Re: Solr won't start -- java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration
On Tue, Mar 15, 2016 at 01:46:32PM -0600, Shawn Heisey wrote: > On 3/15/2016 1:34 PM, Paul Hoffman wrote: > > I've been running Solr successfully until this morning, when I stopped > > it to pick up a change in my schema, and now it won't start up again. > > I've whittled the problem down to this: > > > > > > # cd /home/paul/proj/blacklight/jetty > > > > # java -jar start.jar -Djetty.port=8983 -Dsolr.solr.home=$PWD/solr > > WARNING: System properties and/or JVM args set. Consider using --dry-run > > or --exec > > java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration > > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > > at org.eclipse.jetty.start.Main.invokeMain(Main.java:440) > > at org.eclipse.jetty.start.Main.start(Main.java:615) > > at org.eclipse.jetty.start.Main.main(Main.java:96) > > ClassNotFound: org.eclipse.jetty.xml.XmlConfiguration > > There are no Solr classes in that stacktrace. The class that can't be > found is a Jetty class. I think the problem here is in Jetty, not > Solr. It probably can't find a jar with a name like one of these: > > jetty-xml-8.1.14.v20131031.jar > jetty-xml-9.2.13.v20150730.jar > > What version of Solr? I'm assuming it's not 5.x, since the command used > to start those versions is very different, and Solr would probably not > be located within a blacklight folder. Thanks, Shawn. Which version indeed -- I have a mishmash of cruft lying around from earlier attempts to get Solr and Blacklight running, so I don't want to assume anything. 
I found the log file that shows me stopping and starting Solr today: # ls -ltr $(find $(locate log | egrep 'solr|jetty') -type f -mtime -1) | head -n5 find: `/home/paul/proj/blacklight/jetty/logs/solr.log': No such file or directory -rw-rw-r-- 1 paul paul 2885083 Mar 15 11:38 /home/paul/proj/blacklight/jetty/logs/solr_log_20160315_1152 -rw-r--r-- 1 root root5088 Mar 15 11:49 /home/paul/proj/solr-5.3.1/server/logs/solr_log_20160315_1150 -rw-r--r-- 1 root root 26701 Mar 15 11:49 /home/paul/proj/solr-5.3.1/server/logs/solr_gc_log_20160315_1150 -rw-rw-r-- 1 paul paul5086 Mar 15 11:51 /home/paul/proj/solr-5.3.1/server/logs/solr_log_20160315_1546 -rw-rw-r-- 1 paul paul 23537 Mar 15 11:51 /home/paul/proj/solr-5.3.1/server/logs/solr_gc_log_20160315_1546 # LOGFILE=/home/paul/proj/blacklight/jetty/logs/solr_log_20160315_1152 # egrep -nw 'stopped|org.eclipse.jetty.server.Server;' $LOGFILE | tail 5852:INFO - 2016-01-08 16:16:32.222; org.eclipse.jetty.server.Server; jetty-8.1.10.v20130312 6128:INFO - 2016-01-13 13:01:58.338; org.eclipse.jetty.server.Server; jetty-8.1.10.v20130312 6281:INFO - 2016-01-14 08:41:03.025; org.eclipse.jetty.server.Server; jetty-8.1.10.v20130312 7792:INFO - 2016-02-08 11:57:41.131; org.eclipse.jetty.server.Server; jetty-8.1.10.v20130312 7957:INFO - 2016-02-08 12:01:48.361; org.eclipse.jetty.server.Server; jetty-8.1.10.v20130312 8174:INFO - 2016-02-08 15:03:18.641; org.eclipse.jetty.server.Server; jetty-8.1.10.v20130312 8773:INFO - 2016-02-10 12:05:25.639; org.eclipse.jetty.server.Server; jetty-8.1.10.v20130312 12244:INFO - 2016-03-15 11:38:16.810; org.eclipse.jetty.server.Server; Graceful shutdown SocketConnector@0.0.0.0:8983 12245:INFO - 2016-03-15 11:38:16.814; org.eclipse.jetty.server.Server; Graceful shutdown o.e.j.w.WebAppContext{/solr,file:/home/paul/proj/blacklight/jetty/solr-webapp/webapp/},/home/paul/proj/blacklight/jetty/webapps/solr.war 12262:INFO - 2016-03-15 11:38:18.473; org.eclipse.jetty.server.handler.ContextHandler; stopped 
o.e.j.w.WebAppContext{/solr,file:/home/paul/proj/blacklight/jetty/solr-webapp/webapp/},/home/paul/proj/blacklight/jetty/webapps/solr.war It looks like the last time it was last restarted was on February 10 (line 8773). The log file doesn't show the Solr version directly, but maybe the first lines will help: # sed -n 8773,8795p $LOGFILE INFO - 2016-02-10 12:05:25.639; org.eclipse.jetty.server.Server; jetty-8.1.10.v20130312 INFO - 2016-02-10 12:05:25.703; org.eclipse.jetty.deploy.providers.ScanningAppProvider; Deployment monitor /home/paul/proj/blacklight/jetty/contexts at interval 0 INFO - 2016-02-10 12:05:25.714; org.eclipse.jetty.deploy.DeploymentManager; Deployable added: /home/paul/proj/blacklight/jetty/
Re: Solr won't start -- java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration
On Tue, Mar 15, 2016 at 07:58:21PM -0600, Shawn Heisey wrote: > On 3/15/2016 2:56 PM, Paul Hoffman wrote: > >> It sure looks like I started Solr from my blacklight project dir. > >> > >> Any ideas? Thanks, > >> > > You may need to get some help from the blacklight project. I've got > absolutely no idea what sort of integration they may have done with > Solr, what they may have changed, or how they've arranged the filesystem. > > Regarding the Jetty problem, in the directory where the "start.jar" that > you are running lives, there should be a lib directory, with various > jetty jars. The jetty-xml jar should be one of them. Here's a listing > of Jetty's lib directory from a Solr 4.9.1 install that I've got. I > have upgraded to a newer version of Jetty: > > root@bigindy5:/opt/solr4# ls -al lib > total 1496 > drwxr-xr-x 3 solr solr 4096 Aug 31 2015 . > drwxr-xr-x 13 solr solr 4096 Aug 31 2015 .. > drwxr-xr-x 2 solr solr 4096 Aug 31 2015 ext > -rw-r--r-- 1 solr solr 21162 Aug 31 2015 > jetty-continuation-8.1.14.v20131031.jar > -rw-r--r-- 1 solr solr 61908 Aug 31 2015 > jetty-deploy-8.1.14.v20131031.jar > -rw-r--r-- 1 solr solr 96122 Aug 31 2015 jetty-http-8.1.14.v20131031.jar > -rw-r--r-- 1 solr solr 104219 Aug 31 2015 jetty-io-8.1.14.v20131031.jar > -rw-r--r-- 1 solr solr 24770 Aug 31 2015 jetty-jmx-8.1.14.v20131031.jar > -rw-r--r-- 1 solr solr 89923 Aug 31 2015 > jetty-security-8.1.14.v20131031.jar > -rw-r--r-- 1 solr solr 357704 Aug 31 2015 > jetty-server-8.1.14.v20131031.jar > -rw-r--r-- 1 solr solr 101714 Aug 31 2015 > jetty-servlet-8.1.14.v20131031.jar > -rw-r--r-- 1 solr solr 287680 Aug 31 2015 jetty-util-8.1.14.v20131031.jar > -rw-r--r-- 1 solr solr 110096 Aug 31 2015 > jetty-webapp-8.1.14.v20131031.jar > -rw-r--r-- 1 solr solr 39065 Aug 31 2015 jetty-xml-8.1.14.v20131031.jar > -rw-r--r-- 1 solr solr 200387 Aug 31 2015 servlet-api-3.0.jar > > The Jetty included with blacklight may contain more jars than this. 
The > Solr jetty install is stripped down so it's very lean. > > Thanks, > Shawn > That's exactly what I have now -- I got them from the blacklight-jetty repo -- except the versions are slightly different (jetty-*-8.1.10.v20130312.jar instead of jetty-*-8.1.14.v20131031.jar). My assumption now is that I was running a significantly older version of Solr -- I was getting some deprecation warnings and an error that prevented loading my Blacklight core. However, using the new jetty jars and making some adjustments in my schema.xml has got the Solr end of things working again, so I'll take any further questions to the Blacklight list. Thanks again for your help, Paul. -- Paul Hoffman Systems Librarian Fenway Libraries Online c/o Wentworth Institute of Technology 550 Huntington Ave. Boston, MA 02115 (617) 442-2384 (FLO main number)
Re: Indexing using CSV
On Sun, Mar 20, 2016 at 06:11:32PM -0700, Jay Potharaju wrote: > Hi, > I am trying to index some data using csv files. The data contains > description column, which can include quotes, comma, LF/CR & other special > characters. > > I have it working but run into an issue with the following error > > line=5,can't read line: 5 values={NO LINES AVAILABLE}. > > What is the best way to debug this issue and secondly how do other people > handle indexing data using csv data. I would concentrate first on getting the CSV reader working verifiably, which might be the hardest part -- CSV is not a file format, it's a hodgepodge. Paul. -- Paul Hoffman Systems Librarian Fenway Libraries Online c/o Wentworth Institute of Technology 550 Huntington Ave. Boston, MA 02115 (617) 442-2384 (FLO main number)
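A rough first pass at the kind of breakage Solr's CSV loader reports: lines with an odd number of double quotes usually mean an unescaped quote or a quoted field broken across records. The check below flags them — note a legitimate multi-line quoted field will also be flagged, so treat it as a hint, not a verdict. (Solr's CSV handler additionally lets you declare separator and encapsulator explicitly if your dialect differs from the defaults.)

```shell
# Write a small sample with one malformed row.
cat > /tmp/sample.csv <<'EOF'
id,description
1,"a description, with a comma"
2,"an unterminated quote
EOF

# With '"' as awk's field separator, an even field count means an odd
# number of quotes on the line, i.e. an unbalanced quote.
awk -F'"' 'NF % 2 == 0 { print "suspect line " NR ": " $0 }' /tmp/sample.csv
# prints: suspect line 3: 2,"an unterminated quote
```

Running this over the real file narrows "can't read line: 5" down to the exact quoting problem before anything is posted to Solr.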
Delete by query using JSON?
I've been struggling to find the right syntax for deleting by query using JSON, where the query includes an fq parameter. I know how to delete *all* documents, but how would I delete only documents with field doctype = "cres"? I have tried the following along with a number of variations, all to no avail: $ curl -s -d @- 'http://localhost:8983/solr/blacklight-core/update?wt=json' <http://localhost:8983/solr/blacklight-core/select?q=&fq=doctype%3Acres&wt=json&fl=id' It seems like such a simple thing, but I haven't found any examples that use an fq. Could someone post an example? Thanks in advance, Paul. -- Paul Hoffman Systems Librarian Fenway Libraries Online c/o Wentworth Institute of Technology 550 Huntington Ave. Boston, MA 02115 (617) 442-2384 (FLO main number)
Re: Delete by query using JSON?
On Tue, Mar 22, 2016 at 04:27:03PM -0700, Walter Underwood wrote: > “Why do you care?” might not be the best way to say it, but it is > essential to understand the difference between selection (filtering) > and ranking. > > As Solr params: > > * q is ranking and filtering > * fq is filtering only > * bq is ranking only Thanks, that is a very useful and concise synopsis. > When deleting documents, ordering does not matter, which is why we ask > why you care about the ordering. > > If the response is familiar to you, imagine how the questions sound to > people who have been working in search for twenty years. But even when > we are snippy, we still try to help. > > Many, many times, the question is wrong. The most common difficulty on > this list is an “XY problem”, where the poster has problem X and has > assumed solution Y, which is not the right solution. But they ask > about Y. So we will tell people that their approach is wrong, because > that is the most helpful thing we can do. Alex's response didn't seem snippy to me at all, and I agree wholeheartedly about the wrong-question problem -- in my case, not only was I asking the wrong question, but I shouldn't even have had to ask the (right) question at all! Thanks again, everyone. Paul. -- Paul Hoffman Systems Librarian Fenway Libraries Online c/o Wentworth Institute of Technology 550 Huntington Ave. Boston, MA 02115 (617) 442-2384 (FLO main number)
Re: Delete by query using JSON?
On Tue, Mar 22, 2016 at 10:25:06PM -0400, Jack Krupansky wrote: > See the correct syntax example here: > https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-SendingJSONUpdateCommands > > Your query is fine. Thanks; I thought the query was wrong, but the example you pointed me to clued me in to the real problem: I had neglected to specify Content-Type: application.json (d'oh!). Paul. -- Paul Hoffman Systems Librarian Fenway Libraries Online c/o Wentworth Institute of Technology 550 Huntington Ave. Boston, MA 02115 (617) 442-2384 (FLO main number)
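Putting the resolved pieces of this thread together — the delete-by-query JSON body plus the Content-Type header whose absence was the actual problem. The core name is from the original message; the json.tool step is just a local sanity check before sending:

```shell
BODY='{"delete":{"query":"doctype:cres"}}'

# Validate the JSON locally before sending (requires python3).
echo "$BODY" | python3 -m json.tool

# Against a running Solr (commented out here since it needs a live server;
# note the Content-Type header):
# curl -s -H 'Content-Type: application/json' \
#      'http://localhost:8983/solr/blacklight-core/update?commit=true' \
#      --data-binary "$BODY"
```

There is no fq here on purpose: for a delete, filtering and ranking are the same thing, so the filter clause simply folds into the one query.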
Re: Regarding JSON indexing in SOLR 4.10
On Tue, Mar 29, 2016 at 11:30:06PM -0700, Aditya Desai wrote: > I am running SOLR 4.10 on port 8984 by changing the default port in > etc/jetty.xml. I am now trying to index all my JSON files to Solr running > on 8984. The following is the command > > curl 'http://localhost:8984/solr/update?commit=true' --data-binary *.json > -H 'Content-type:application/json' The wildcard is the problem; your shell is expanding --data-binary *.json to --data-binary foo.json bar.json baz.json and curl doesn't know how to download bar.json and baz.json. Try this instead: for file in *.json; do curl 'http://localhost:8984/solr/update?commit=true' --data-binary "$file" -H 'Content-type:application/json' done Paul. -- Paul Hoffman Systems Librarian Fenway Libraries Online c/o Wentworth Institute of Technology 550 Huntington Ave. Boston, MA 02115 (617) 442-2384 (FLO main number)
Re: Basic auth
Solr 5.3 is coming with proper basic auth support https://issues.apache.org/jira/browse/SOLR-7692 On Wed, Jul 22, 2015 at 5:28 PM, Peter Sturge wrote: > if you're using Jetty you can use the standard realms mechanism for Basic > Auth, and it works the same on Windows or UNIX. There's plenty of docs on > the Jetty site about getting this working, although it does vary somewhat > depending on the version of Jetty you're running (N.B. I would suggest > using Jetty 9, and not 8, as 8 is missing some key authentication classes). > If, when you execute a search query to your Solr instance you get a > username and password popup, then Jetty's auth is setup. If you don't then > something's wrong in the Jetty config. > > it's worth noting that if you're doing distributed searches Basic Auth on > its own will not work for you. This is because Solr sends distributed > requests to remote instances on behalf of the user, and it has no knowledge > of the web container's auth mechanics. We got 'round this by customizing > Solr to receive credentials and use them for authentication to remote > instances - SOLR-1861 is an old implementation for a previous release, and > there has been some significant refactoring of SearchHandler since then, > but the concept works well for distributed queries. > > Thanks, > Peter > > > > On Wed, Jul 22, 2015 at 11:18 AM, O. Klein wrote: > >> Steven White wrote >> > Thanks for updating the wiki page. However, my issue remains, I cannot >> > get >> > Basic auth working. Has anyone got it working, on Windows? >> >> Doesn't work for me on Linux either. >> >> >> >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/Basic-auth-tp4218053p4218519.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> -- - Noble Paul
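For the Jetty-realm route Peter describes, the moving parts are roughly a realm properties file plus a login service registered in jetty.xml. The sketch below is only an outline: the paths, realm name, role, and credentials are assumptions, and as noted above the exact wiring differs between Jetty versions.

```
# etc/realm.properties -- one "user: password,role" entry per line
admin: s3cret,search-role

<!-- etc/jetty.xml (Jetty 9): register a HashLoginService -->
<Call name="addBean">
  <Arg>
    <New class="org.eclipse.jetty.security.HashLoginService">
      <Set name="name">SolrRealm</Set>
      <Set name="config"><SystemProperty name="jetty.home"/>/etc/realm.properties</Set>
    </New>
  </Arg>
</Call>
```

A matching security-constraint and login-config section in the webapp's web.xml is what actually ties the realm to Solr's URL paths.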
Re: Basic auth
Q.do you know when it would be released? 5.3 will be released in another 3-4 weeks . Q.Are there any requirements of ZK authentication must be there as well? NO bq.Providing my own security.json + class/implementation to verify user/pass should work today with 5.2, right? Yes. But, if you modify your credentials or anything in that JSON, you will have to restart all your nodes . Q.SOLR-7274 pluggable security is already in 5.2 (my requirement is to provide user/pass in a secure manner, not as argument on cmd or from (our unsecured) ZK but from a configuration restful service, I'm not clear what your question is. Basic Auth is a well-known standard. We are just implementing that standard. We store all credentials & permissions in ZK . That means it is only as secure as your ZK . As long as nobody can write to ZK, your system is safe On Wed, Jul 22, 2015 at 11:10 PM, Fadi Mohsen wrote: > Hi, I have some questions regarding basic auth and proper support in 5.3: > > do you know when it would be released? > > Are there any requirements of ZK authentication must be there as well? > > Do we store the user/pass in ZK? > > SOLR-7274 pluggable security is already in 5.2 (my requirement is to provide > user/pass in a secure manner, not as argument on cmd or from (our unsecured) > ZK but from a configuration restful service, > I'm not sure 5.3 release would fit above requirement, can you reflect on this? > > Providing my own security.json + class/implementation to verify user/pass > should work today with 5.2, right? > > Thanks > Fadi > >> On 22 Jul 2015, at 14:33, Noble Paul wrote: >> >> Solr 5.3 is coming with proper basic auth support >> >> >> https://issues.apache.org/jira/browse/SOLR-7692 >> >>> On Wed, Jul 22, 2015 at 5:28 PM, Peter Sturge >>> wrote: >>> if you're using Jetty you can use the standard realms mechanism for Basic >>> Auth, and it works the same on Windows or UNIX. 
There's plenty of docs on >>> the Jetty site about getting this working, although it does vary somewhat >>> depending on the version of Jetty you're running (N.B. I would suggest >>> using Jetty 9, and not 8, as 8 is missing some key authentication classes). >>> If, when you execute a search query to your Solr instance you get a >>> username and password popup, then Jetty's auth is setup. If you don't then >>> something's wrong in the Jetty config. >>> >>> it's worth noting that if you're doing distributed searches Basic Auth on >>> its own will not work for you. This is because Solr sends distributed >>> requests to remote instances on behalf of the user, and it has no knowledge >>> of the web container's auth mechanics. We got 'round this by customizing >>> Solr to receive credentials and use them for authentication to remote >>> instances - SOLR-1861 is an old implementation for a previous release, and >>> there has been some significant refactoring of SearchHandler since then, >>> but the concept works well for distributed queries. >>> >>> Thanks, >>> Peter >>> >>> >>> >>>> On Wed, Jul 22, 2015 at 11:18 AM, O. Klein wrote: >>>> >>>> Steven White wrote >>>>> Thanks for updating the wiki page. However, my issue remains, I cannot >>>>> get >>>>> Basic auth working. Has anyone got it working, on Windows? >>>> >>>> Doesn't work for me on Linux either. >>>> >>>> >>>> >>>> -- >>>> View this message in context: >>>> http://lucene.472066.n3.nabble.com/Basic-auth-tp4218053p4218519.html >>>> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> >> -- >> - >> Noble Paul -- - Noble Paul
Re: Basic auth
"Although I'm not sure why you took this approach instead of supporting simple built-in basic auth and let us configure security the "old/easy" way" Going with Jetty basic auth is not useful in a large enough cluster. Where do you store the credentials, and how would you propagate them across the cluster? When you use Solr, you need a Solr-like way of managing that. The other problem is inter-node communication. How do you pass credentials along in that case? "I'm guessing it has to do with future requirement of field/doc level security" Actually that is an orthogonal requirement "I hope you can get rid of the war file soon and start promoting Solr as a set of libraries so one can easily embed/extend Solr" That is not what we have in mind. We want Solr to be a server which controls every aspect of its running. We should have the choice of getting rid of Jetty or whatever and moving to a new system. We only guarantee the interface/protocol to remain constant. On Tue, Jul 28, 2015 at 2:19 AM, Fadi Mohsen wrote: > Thank you, I tested providing my implementation of authentication in > security.json, uploaded file to ZK (just considering authentication), started > nodes and it worked like a charm. > > That required of course turning off Jetty basic auth. > > Although I'm not sure why you took this approach instead of supporting > simple built-in basic auth and let us configure security the "old/easy" way. > > I'm guessing it has to do with future requirement of field/doc level security. > > I hope you can get rid of the war file soon and start promoting Solr as a set > of libraries so one can easily embed/extend Solr, since some (especially me) > might consider command line ZK operations are not that "continuous > delivery/automate everything/production" friendly.
> > It's easy today to spin up a jetty and wire / point out resource classes or > wire up CXF alongside to get things playing, but I'm probably missing out of > other things since I see many mails usually in consensus of not embedding and > rather want people to consider Solr as a stand-alone service, not sure why! > I'm probably getting out of context here. > > Regards > >> On 27 Jul 2015, at 13:17, Noble Paul wrote: >> >> Q.do you know when it would be released? >> 5.3 will be released in another 3-4 weeks . >> >> Q.Are there any requirements of ZK authentication must be there as well? >> NO >> >> bq.Providing my own security.json + class/implementation to verify >> user/pass should work today with 5.2, right? >> >> Yes. But, if you modify your credentials or anything in that JSON, you >> will have to restart all your nodes . >> >> Q.SOLR-7274 pluggable security is already in 5.2 (my requirement is to >> provide user/pass in a secure manner, not as argument on cmd or from >> (our unsecured) ZK but from a configuration restful service, >> >> I'm not clear what your question is. Basic Auth is a well-known >> standard. We are just implementing that standard. We store all >> credentials & permissions in ZK . That means it is only as secure as >> your ZK . As long as nobody can write to ZK, your system is safe >> >>> On Wed, Jul 22, 2015 at 11:10 PM, Fadi Mohsen wrote: >>> Hi, I have some questions regarding basic auth and proper support in 5.3: >>> >>> do you know when it would be released? >>> >>> Are there any requirements of ZK authentication must be there as well? >>> >>> Do we store the user/pass in ZK? >>> >>> SOLR-7274 pluggable security is already in 5.2 (my requirement is to >>> provide user/pass in a secure manner, not as argument on cmd or from (our >>> unsecured) ZK but from a configuration restful service, >>> I'm not sure 5.3 release would fit above requirement, can you reflect on >>> this? 
>>> >>> Providing my own security.json + class/implementation to verify user/pass >>> should work today with 5.2, right? >>> >>> Thanks >>> Fadi >>> >>>> On 22 Jul 2015, at 14:33, Noble Paul wrote: >>>> >>>> Solr 5.3 is coming with proper basic auth support >>>> >>>> >>>> https://issues.apache.org/jira/browse/SOLR-7692 >>>> >>>>> On Wed, Jul 22, 2015 at 5:28 PM, Peter Sturge >>>>> wrote: >>>>> if you're using Jetty you can use the standard realms mechanism for Basic >>>>> Auth, and it works the same on Windows or UNIX. There's plenty of docs on >>>>> the Jetty sit
pre-loaded function-query?
Hello Solr experts, I'm writing a "query expansion" QueryComponent which takes web-app parameters (e.g. profile information) and turns them into a solr query. Thus far I've used lucene TermQuery-ies with success. Now, I would like to use something a bit more elaborate. Either I write it with quite a lot of term-queries or I use a function query. But how can I create a functionQuery that I can: - re-use between the queries, - enter using a somewhat practical method Seems like range-queries could be done but I do not find how I can do it with function-queries. Is there an example somewhere? thanks Paul
Re: pre-loaded function-query?
Doug Turnbull wrote: > I'm not sure if you mean organizing function queries under the hood in a > query component or externally. > > Externally, I've always followed John Berryman's great advice for working > with Solr when dealing with complex/reusable function queries and boosts > http://opensourceconnections.com/blog/2013/11/22/parameterizing-and-organizing-solr-boosts/ Very very cute indeed. However, I think I need it internally. You're making me doubt. Do I understand properly that this boost parameter is just operating a power overall on the query? My current expansion expands from the user-query to the +user-query favouring-query-depending-other-params overall-favoring-query (where the overall-favoring-query could be computed as a function). With the boost parameter, I'd do: (+user-query favouring-query-depending-other-params)^boost-function Not exactly the same, is it? thanks Paul
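For comparison, the two shapes might be rendered as request parameters like this. Note that edismax's boost parameter multiplies the query score by the function value rather than raising it to a power. The recip(...) recency function is a stock Solr function-query example; all field names are invented, and the script only prints the URLs.

```shell
#!/bin/sh
# Sketch: additive expansion vs. edismax's multiplicative boost.
# Field names and the boost function are illustrative only.
base='http://localhost:8983/solr/mycore/select?defType=edismax'
# 1) additive: required user query plus optional favouring clauses
echo "$base&q=%2Buser_query favouring_clause"
# 2) multiplicative: boost= multiplies the whole query's score by a
#    function query (a product, not a power), e.g. a recency function
echo "$base&q=user_query&boost=recip(ms(NOW,pub_date),3.16e-11,1,1)"
```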
Re: User Authentication
did you manage to look at the reference guide? https://cwiki.apache.org/confluence/display/solr/Securing+Solr On Mon, Aug 24, 2015 at 9:23 PM, LeZotte, Tom wrote: > Alex > I got a super secret release of Solr 5.3.1, wasn’t supposed to say anything. > > Yes I’m running 5.2.1, I will check out the release notes for 5.3. > > Was looking for three types of user authentication, I guess. > 1. the Admin Console > 2. User auth for each Core ( and select and update) on a server. > 3. HTML interface access (example: > ajax-solr<https://github.com/evolvingweb/ajax-solr>) > > Thanks > > Tom LeZotte > Health I.T. - Senior Product Developer > (p) 615-875-8830 > > > > > > > On Aug 24, 2015, at 10:05 AM, Alexandre Rafalovitch > mailto:arafa...@gmail.com>> wrote: > > Thanks for the email from the future. It is good to start to prepare > for 5.3.1 now that 5.3 is nearly out. > > Joking aside (and assuming Solr 5.2.1), what exactly are you trying to > achieve? Solr should not actually be exposed to the users directly. It > should be hiding in a backend only visible to your middleware. If you > are looking for an HTML interface that talks directly to Solr after > authentication, that's not the right way to set it up. > > That said, some security features are being rolled out and you should > definitely check the release notes for the 5.3. > > Regards, > Alex. > > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: > http://www.solr-start.com/ > > > On 24 August 2015 at 10:01, LeZotte, Tom wrote: > Hi Solr Community > > I have been trying to add user authentication to our Solr 5.3.1 RedHat > install. I’ve found some examples on user authentication on the Jetty side. > But they have failed. > > Does anyone have a step-by-step example of authentication for the admin > screen? And a core? > > > Thanks > > Tom LeZotte > Health I.T. - Senior Product Developer > (p) 615-875-8830 > > > > > > > -- - Noble Paul
Re: SOLR 5.3
The release is underway. Incorporating some corrections suggested by others. Expect an announcement over the next few hours. On Sun, Aug 23, 2015 at 6:44 PM, Arcadius Ahouansou wrote: > Solr-5.3 has been available for download from > http://mirror.catn.com/pub/apache/lucene/solr/5.3.0/ > > The redirection on the web site will probably be fixed before we get the > official announcement. > > Arcadius. > > On 23 August 2015 at 09:00, William Bell wrote: > >> At lucene.apache.org/solr it says SOLR 5.3 is there, but when I click on >> downloads it shows Solr 5.2.1... ?? >> >> "APACHE SOLR™ 5.3.0 Solr is the popular, blazing-fast, open source >> enterprise search platform built on Apache Lucene™." >> >> -- >> Bill Bell >> billnb...@gmail.com >> cell 720-256-8076 >> > > > > -- > Arcadius Ahouansou > Menelic Ltd | Information is Power > M: 07908761999 > W: www.menelic.com > --- -- - Noble Paul
Re: User Authentication
no. Most of it is in Solr 5.3 On Tue, Aug 25, 2015 at 12:48 AM, Steven White wrote: > Hi Noble, > > Is everything in the link you provided applicable to Solr 5.2.1? > > Thanks > > Steve > > On Mon, Aug 24, 2015 at 2:20 PM, Noble Paul wrote: > >> did you manage to look at the reference guide? >> https://cwiki.apache.org/confluence/display/solr/Securing+Solr >> >> On Mon, Aug 24, 2015 at 9:23 PM, LeZotte, Tom >> wrote: >> > Alex >> > I got a super secret release of Solr 5.3.1, wasn’t suppose to say >> anything. >> > >> > Yes I’m running 5.2.1, I will check out the release notes for 5.3. >> > >> > Was looking for three types of user authentication, I guess. >> > 1. the Admin Console >> > 2. User auth for each Core ( and select and update) on a server. >> > 3. HTML interface access (example: ajax-solr< >> https://github.com/evolvingweb/ajax-solr>) >> > >> > Thanks >> > >> > Tom LeZotte >> > Health I.T. - Senior Product Developer >> > (p) 615-875-8830 >> > >> > >> > >> > >> > >> > >> > On Aug 24, 2015, at 10:05 AM, Alexandre Rafalovitch > <mailto:arafa...@gmail.com>> wrote: >> > >> > Thanks for the email from the future. It is good to start to prepare >> > for 5.3.1 now that 5.3 is nearly out. >> > >> > Joking aside (and assuming Solr 5.2.1), what exactly are you trying to >> > achieve? Solr should not actually be exposed to the users directly. It >> > should be hiding in a backend only visible to your middleware. If you >> > are looking for a HTML interface that talks directly to Solr after >> > authentication, that's not the right way to set it up. >> > >> > That said, some security features are being rolled out and you should >> > definitely check the release notes for the 5.3. >> > >> > Regards, >> > Alex. 
>> > >> > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: >> > http://www.solr-start.com/ >> > >> > >> > On 24 August 2015 at 10:01, LeZotte, Tom >> wrote: >> > Hi Solr Community >> > >> > I have been trying to add user authentication to our Solr 5.3.1 RedHat >> install. I’ve found some examples on user authentication on the Jetty side. >> But they have failed. >> > >> > Does any one have a step by step example on authentication for the admin >> screen? And a core? >> > >> > >> > Thanks >> > >> > Tom LeZotte >> > Health I.T. - Senior Product Developer >> > (p) 615-875-8830 >> > >> > >> > >> > >> > >> > >> > >> >> >> >> -- >> - >> Noble Paul >> -- - Noble Paul
[ANNOUNCE] Apache Solr 5.3.0 released
Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites. Solr 5.3.0 is available for immediate download at: http://lucene.apache.org/solr/mirrors-solr-latest-redir.html Solr 5.3 Release Highlights: In addition to many other improvements in the security framework, Solr now includes an AuthenticationPlugin implementing HTTP Basic Auth that stores credentials securely in ZooKeeper. This is a simple way to require a username and password for anyone accessing Solr’s admin screen or APIs. A built-in AuthorizationPlugin that provides fine-grained control over implementing ACLs for various resources with permission rules which are stored in ZooKeeper. The JSON Facet API can now change the domain for facet commands, essentially doing a block join and moving from parents to children, or children to parents before calculating the facet data. Major improvements in performance of the new Facet Module / JSON Facet API. Query and Range Facets under Pivot Facets. Just like the JSON Facet API, pivot facets can now nest other facet types such as range and query facets. More Like This Query Parser options. The MoreLikeThis QParser now supports all options provided by the MLT Handler. The query parser is much more versatile than the handler as it works in cloud mode as well as anywhere a normal query can be specified. Added Schema API support in SolrJ Added Scoring mode for query-time join and block join. Added Smile response format For upgrading from 5.2, please look at the "Upgrading from Solr 5.2" section in the change log.
Detailed change log: http://lucene.apache.org/solr/5_3_0/changes/Changes.html Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html) -- - Noble Paul www.lucidworks.com
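As a sketch of the domain-changing JSON facets mentioned in the highlights, a request along the following lines moves from parent documents to their children before faceting. The collection name, field names, and parent filter are invented, and the command is printed rather than sent.

```shell
#!/bin/sh
# Sketch of the JSON Facet API 'domain' switch from the highlights;
# 'mycoll', 'child_type', and 'doc_type:parent' are placeholders.
body='{
  "query": "*:*",
  "facet": {
    "child_types": {
      "type": "terms",
      "field": "child_type",
      "domain": { "blockChildren": "doc_type:parent" }
    }
  }
}'
printf "curl 'http://localhost:8983/solr/mycoll/query' -d '%s'\n" "$body"
```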
Re: [ANNOUNCE] Apache Solr 5.2.0 released
sorry , screwed up the title On Tue, Aug 25, 2015 at 8:30 AM, Noble Paul wrote: > Solr is the popular, blazing fast, open source NoSQL search platform > from the Apache Lucene project. Its major features include powerful > full-text search, hit highlighting, faceted search, dynamic > clustering, database integration, rich document (e.g., Word, PDF) > handling, and geospatial search. Solr is highly scalable, providing > fault tolerant distributed search and indexing, and powers the search > and navigation features of many of the world's largest internet sites. > > Solr 5.3.0 is available for immediate download at: > http://lucene.apache.org/solr/mirrors-solr-latest-redir.html > > Solr 5.3 Release Highlights: > > In addition to many other improvements in the security framework, Solr > now includes an AuthenticationPlugin implementing HTTP Basic Auth that > stores credentials securely in ZooKeeper. This is a simple way to > require a username and password for anyone accessing Solr’s admin > screen or APIs. > In built AuthorizationPlugin that provides fine grained control over > implementing ACLs for various resources with permisssion rules which > are stored in ZooKeeper. > The JSON Facet API can now change the domain for facet commands, > essentially doing a block join and moving from parents to children, or > children to parents before calculating the facet data. > Major improvements in performance of the new Facet Module / JSON Facet API. > Query and Range Facets under Pivot Facets. Just like the JSON Facet > API, pivot facets can how nest other facet types such as range and > query facets. > More Like This Query Parser options. The MoreLikeThis QParser now > supports all options provided by the MLT Handler. The query parser is > much more versatile than the handler as it works in cloud mode as well > as anywhere a normal query can be specified. > Added Schema API support in SolrJ > Added Scoring mode for query-time join and block join. 
> Added Smile response format > > For upgrading from 5.2, please look at the "Upgrading from Solr 5.2" > section in the change log. > > Detailed change log: > http://lucene.apache.org/solr/5_3_0/changes/Changes.html > > Please report any feedback to the mailing lists > (http://lucene.apache.org/solr/discussion.html) > > > -- > - > Noble Paul > www.lucidworks.com -- - Noble Paul
[ANNOUNCE] Apache Solr 5.2.0 released
Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites. Solr 5.3.0 is available for immediate download at: http://lucene.apache.org/solr/mirrors-solr-latest-redir.html Solr 5.3 Release Highlights: In addition to many other improvements in the security framework, Solr now includes an AuthenticationPlugin implementing HTTP Basic Auth that stores credentials securely in ZooKeeper. This is a simple way to require a username and password for anyone accessing Solr’s admin screen or APIs. In built AuthorizationPlugin that provides fine grained control over implementing ACLs for various resources with permisssion rules which are stored in ZooKeeper. The JSON Facet API can now change the domain for facet commands, essentially doing a block join and moving from parents to children, or children to parents before calculating the facet data. Major improvements in performance of the new Facet Module / JSON Facet API. Query and Range Facets under Pivot Facets. Just like the JSON Facet API, pivot facets can how nest other facet types such as range and query facets. More Like This Query Parser options. The MoreLikeThis QParser now supports all options provided by the MLT Handler. The query parser is much more versatile than the handler as it works in cloud mode as well as anywhere a normal query can be specified. Added Schema API support in SolrJ Added Scoring mode for query-time join and block join. Added Smile response format For upgrading from 5.2, please look at the "Upgrading from Solr 5.2" section in the change log. 
Detailed change log: http://lucene.apache.org/solr/5_3_0/changes/Changes.html Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html) -- - Noble Paul www.lucidworks.com
Re: Issue Using Solr 5.3 Authentication and Authorization Plugins
Admin UI is not protected by any of these permissions. It only asks for a password when you try to perform a protected operation. I'll investigate the restart problem and report my findings On Tue, Sep 1, 2015 at 3:10 AM, Kevin Lee wrote: > Anyone else running into any issues trying to get the authentication and > authorization plugins in 5.3 working? > >> On Aug 29, 2015, at 2:30 AM, Kevin Lee wrote: >> >> Hi, >> >> I’m trying to use the new basic auth plugin for Solr 5.3 and it doesn’t seem >> to be working quite right. Not sure if I’m missing steps or there is a bug. >> I am able to get it to protect access to a URL under a collection, but am >> unable to get it to secure access to the Admin UI. In addition, after >> stopping the Solr and Zookeeper instances, the security.json is still in >> Zookeeper, however Solr is allowing access to everything again like the >> security configuration isn’t in place. >> >> Contents of security.json taken from wiki page, but edited to produce valid >> JSON. Had to move comma after 3rd from last “}” up to just after the last >> “]”. >> >> { >> "authentication":{ >> "class":"solr.BasicAuthPlugin", >> "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= >> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="} >> }, >> "authorization":{ >> "class":"solr.RuleBasedAuthorizationPlugin", >> "permissions":[{"name":"security-edit", >> "role":"admin"}], >> "user-role":{"solr":"admin"} >> }} >> >> Here are the steps I followed: >> >> Upload security.json to zookeeper >> ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile >> /security.json ~/solr/security.json >> >> Use zkCli.sh from Zookeeper to ensure the security.json is in Zookeeper at >> /security.json. It is there and looks like what was originally uploaded.
>> >> Start Solr Instances >> >> Attempt to create a permission, however get the following error: >> { >> "responseHeader":{ >>"status":400, >>"QTime":0}, >> "error":{ >>"msg":"No authorization plugin configured", >>"code":400}} >> >> Upload security.json again. >> ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile >> /security.json ~/solr/security.json >> >> Issue the following to try to create the permission again and this time it’s >> successful. >> // Create a permission for mysearch endpoint >>curl --user solr:SolrRocks -H 'Content-type:application/json' -d >> '{"set-permission": {"name":"mycollection-search","collection": >> "mycollection","path":"/mysearch","role": "search-user"}}' >> http://localhost:8983/solr/admin/authorization >> >>{ >> "responseHeader":{ >>"status":0, >>"QTime":7}} >> >> Issue the following commands to add users >> curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication >> -H 'Content-type:application/json' -d '{"set-user": {"admin" : "password" }}' >> curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication >> -H 'Content-type:application/json' -d '{"set-user": {"user" : "password" }}' >> >> Issue the following command to add permission to users >> curl -u solr:SolrRocks -H 'Content-type:application/json' -d '{ >> "set-user-role" : {"admin": ["search-user", "admin"]}}' >> http://localhost:8983/solr/admin/authorization >> curl -u solr:SolrRocks -H 'Content-type:application/json' -d '{ >> "set-user-role" : {"user": ["search-user"]}}' >> http://localhost:8983/solr/admin/authorization >> >> After executing the above, access to /mysearch is protected until I restart >> the Solr and Zookeeper instances. However, the admin UI is never protected >> like the Wiki page says it should be once activated.
>> >> https://cwiki.apache.org/confluence/display/solr/Rule-Based+Authorization+Plugin >> >> <https://cwiki.apache.org/confluence/display/solr/Rule-Based+Authorization+Plugin> >> >> Why does the authentication and authorization plugin not stay activated >> after restart and why is the Admin UI never protected? Am I missing any >> steps? >> >> Thanks, >> Kevin -- - Noble Paul
Re: Issue Using Solr 5.3 Authentication and Authorization Plugins
I'm investigating why restarts or first time start does not read the security.json On Tue, Sep 1, 2015 at 1:00 PM, Noble Paul wrote: > I removed that statement > > "If activating the authorization plugin doesn't protect the admin ui, > how does one protect access to it?" > > One does not need to protect the admin UI. You only need to protect > the relevant API calls . I mean it's OK to not protect the CSS and > HTML stuff. But if you perform an action to create a core or do a > query through admin UI , it automatically will prompt you for > credentials (if those APIs are protected) > > On Tue, Sep 1, 2015 at 12:41 PM, Kevin Lee wrote: >> Thanks for the clarification! >> >> So is the wiki page incorrect at >> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin >> which says that the admin ui will require authentication once the >> authorization plugin is activated? >> >> "An authorization plugin is also available to configure Solr with >> permissions to perform various activities in the system. Once activated, >> access to the Solr Admin UI and all requests will need to be authenticated >> and users will be required to have the proper authorization for all >> requests, including using the Admin UI and making any API calls." >> >> If activating the authorization plugin doesn't protect the admin ui, how >> does one protect access to it? >> >> Also, the issue I'm having is not just at restart. According to the docs >> security.json should be uploaded to Zookeeper before starting any of the >> Solr instances. However, I tried to upload security.json before starting >> any of the Solr instances, but it would not pick up the security config >> until after the Solr instances are already running and then uploading the >> security.json again. 
I can see in the logs at startup that the Solr >> instances don't see any plugin enabled even though security.json is already >> in zookeeper and then after they are started and the security.json is >> uploaded again I see it reconfigure to use the plugin. >> >> Thanks, >> Kevin >> >>> On Aug 31, 2015, at 11:22 PM, Noble Paul wrote: >>> >>> Admin UI is not protected by any of these permissions. Only if you try >>> to perform a protected operation , it asks for a password. >>> >>> I'll investigate the restart problem and report my findings >>> >>>> On Tue, Sep 1, 2015 at 3:10 AM, Kevin Lee >>>> wrote: >>>> Anyone else running into any issues trying to get the authentication and >>>> authorization plugins in 5.3 working? >>>> >>>>> On Aug 29, 2015, at 2:30 AM, Kevin Lee wrote: >>>>> >>>>> Hi, >>>>> >>>>> I’m trying to use the new basic auth plugin for Solr 5.3 and it doesn’t >>>>> seem to be working quite right. Not sure if I’m missing steps or there >>>>> is a bug. I am able to get it to protect access to a URL under a >>>>> collection, but am unable to get it to secure access to the Admin UI. In >>>>> addition, after stopping the Solr and Zookeeper instances, the >>>>> security.json is still in Zookeeper, however Solr is allowing access to >>>>> everything again like the security configuration isn’t in place. >>>>> >>>>> Contents of security.json taken from wiki page, but edited to produce >>>>> valid JSON. Had to move comma after 3rd from last “}” up to just after >>>>> the last “]”. 
>>>>> >>>>> { >>>>> "authentication":{ >>>>> "class":"solr.BasicAuthPlugin", >>>>> "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= >>>>> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="} >>>>> }, >>>>> "authorization":{ >>>>> "class":"solr.RuleBasedAuthorizationPlugin", >>>>> "permissions":[{"name":"security-edit", >>>>>"role":"admin"}], >>>>> "user-role":{"solr":"admin"} >>>>> }} >>>>> >>>>> Here are the steps I followed: >>>>> >>>>> Upload security.json to zookeeper >>>>> ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile >>>>> /se
Re: Issue Using Solr 5.3 Authentication and Authorization Plugins
I removed that statement "If activating the authorization plugin doesn't protect the admin ui, how does one protect access to it?" One does not need to protect the admin UI. You only need to protect the relevant API calls . I mean it's OK to not protect the CSS and HTML stuff. But if you perform an action to create a core or do a query through admin UI , it automatically will prompt you for credentials (if those APIs are protected) On Tue, Sep 1, 2015 at 12:41 PM, Kevin Lee wrote: > Thanks for the clarification! > > So is the wiki page incorrect at > https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin > which says that the admin ui will require authentication once the > authorization plugin is activated? > > "An authorization plugin is also available to configure Solr with permissions > to perform various activities in the system. Once activated, access to the > Solr Admin UI and all requests will need to be authenticated and users will > be required to have the proper authorization for all requests, including > using the Admin UI and making any API calls." > > If activating the authorization plugin doesn't protect the admin ui, how does > one protect access to it? > > Also, the issue I'm having is not just at restart. According to the docs > security.json should be uploaded to Zookeeper before starting any of the Solr > instances. However, I tried to upload security.json before starting any of > the Solr instances, but it would not pick up the security config until after > the Solr instances are already running and then uploading the security.json > again. I can see in the logs at startup that the Solr instances don't see > any plugin enabled even though security.json is already in zookeeper and then > after they are started and the security.json is uploaded again I see it > reconfigure to use the plugin. > > Thanks, > Kevin > >> On Aug 31, 2015, at 11:22 PM, Noble Paul wrote: >> >> Admin UI is not protected by any of these permissions. 
Only if you try >> to perform a protected operation , it asks for a password. >> >> I'll investigate the restart problem and report my findings >> >>> On Tue, Sep 1, 2015 at 3:10 AM, Kevin Lee wrote: >>> Anyone else running into any issues trying to get the authentication and >>> authorization plugins in 5.3 working? >>> >>>> On Aug 29, 2015, at 2:30 AM, Kevin Lee wrote: >>>> >>>> Hi, >>>> >>>> I’m trying to use the new basic auth plugin for Solr 5.3 and it doesn’t >>>> seem to be working quite right. Not sure if I’m missing steps or there is >>>> a bug. I am able to get it to protect access to a URL under a collection, >>>> but am unable to get it to secure access to the Admin UI. In addition, >>>> after stopping the Solr and Zookeeper instances, the security.json is >>>> still in Zookeeper, however Solr is allowing access to everything again >>>> like the security configuration isn’t in place. >>>> >>>> Contents of security.json taken from wiki page, but edited to produce >>>> valid JSON. Had to move comma after 3rd from last “}” up to just after >>>> the last “]”. >>>> >>>> { >>>> "authentication":{ >>>> "class":"solr.BasicAuthPlugin", >>>> "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= >>>> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="} >>>> }, >>>> "authorization":{ >>>> "class":"solr.RuleBasedAuthorizationPlugin", >>>> "permissions":[{"name":"security-edit", >>>>"role":"admin"}], >>>> "user-role":{"solr":"admin"} >>>> }} >>>> >>>> Here are the steps I followed: >>>> >>>> Upload security.json to zookeeper >>>> ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile >>>> /security.json ~/solr/security.json >>>> >>>> Use zkCli.sh from Zookeeper to ensure the security.json is in Zookeeper at >>>> /security.json. It is there and looks like what was originally uploaded. >>>> >>>> Start Solr Instances >>>> >>>> Attempt to create a permission, however get the following error: >>>> { >>>> "responseHeader":{ >>>> "status":400, >>>> &
Re: Issue Using Solr 5.3 Authentication and Authorization Plugins
Looks like there is a bug in that . On start/restart the security.json is not loaded I shall open a ticket https://issues.apache.org/jira/browse/SOLR-8000 On Tue, Sep 1, 2015 at 1:01 PM, Noble Paul wrote: > I'm investigating why restarts or first time start does not read the > security.json > > On Tue, Sep 1, 2015 at 1:00 PM, Noble Paul wrote: >> I removed that statement >> >> "If activating the authorization plugin doesn't protect the admin ui, >> how does one protect access to it?" >> >> One does not need to protect the admin UI. You only need to protect >> the relevant API calls . I mean it's OK to not protect the CSS and >> HTML stuff. But if you perform an action to create a core or do a >> query through admin UI , it automatically will prompt you for >> credentials (if those APIs are protected) >> >> On Tue, Sep 1, 2015 at 12:41 PM, Kevin Lee wrote: >>> Thanks for the clarification! >>> >>> So is the wiki page incorrect at >>> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin >>> which says that the admin ui will require authentication once the >>> authorization plugin is activated? >>> >>> "An authorization plugin is also available to configure Solr with >>> permissions to perform various activities in the system. Once activated, >>> access to the Solr Admin UI and all requests will need to be authenticated >>> and users will be required to have the proper authorization for all >>> requests, including using the Admin UI and making any API calls." >>> >>> If activating the authorization plugin doesn't protect the admin ui, how >>> does one protect access to it? >>> >>> Also, the issue I'm having is not just at restart. According to the docs >>> security.json should be uploaded to Zookeeper before starting any of the >>> Solr instances. 
However, I tried to upload security.json before starting >>> any of the Solr instances, but it would not pick up the security config >>> until after the Solr instances are already running and then uploading the >>> security.json again. I can see in the logs at startup that the Solr >>> instances don't see any plugin enabled even though security.json is already >>> in zookeeper and then after they are started and the security.json is >>> uploaded again I see it reconfigure to use the plugin. >>> >>> Thanks, >>> Kevin >>> >>>> On Aug 31, 2015, at 11:22 PM, Noble Paul wrote: >>>> >>>> Admin UI is not protected by any of these permissions. Only if you try >>>> to perform a protected operation , it asks for a password. >>>> >>>> I'll investigate the restart problem and report my findings >>>> >>>>> On Tue, Sep 1, 2015 at 3:10 AM, Kevin Lee >>>>> wrote: >>>>> Anyone else running into any issues trying to get the authentication and >>>>> authorization plugins in 5.3 working? >>>>> >>>>>> On Aug 29, 2015, at 2:30 AM, Kevin Lee wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I’m trying to use the new basic auth plugin for Solr 5.3 and it doesn’t >>>>>> seem to be working quite right. Not sure if I’m missing steps or there >>>>>> is a bug. I am able to get it to protect access to a URL under a >>>>>> collection, but am unable to get it to secure access to the Admin UI. >>>>>> In addition, after stopping the Solr and Zookeeper instances, the >>>>>> security.json is still in Zookeeper, however Solr is allowing access to >>>>>> everything again like the security configuration isn’t in place. >>>>>> >>>>>> Contents of security.json taken from wiki page, but edited to produce >>>>>> valid JSON. Had to move comma after 3rd from last “}” up to just after >>>>>> the last “]”. 
>>>>>> >>>>>> { >>>>>> "authentication":{ >>>>>> "class":"solr.BasicAuthPlugin", >>>>>> "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= >>>>>> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="} >>>>>> }, >>>>>> "authorization":{ >>>>>> "class":"solr.RuleBasedAuthorizationPlug
Re: Issue Using Solr 5.3 Authentication and Authorization Plugins
" However, after uploading the new security.json and restarting the web browser," The browser remembers your login, so it is unlikely to prompt for the credentials again. Why don't you try the RELOAD operation from the command line (curl)? On Tue, Sep 1, 2015 at 10:31 PM, Kevin Lee wrote: > The restart issues aside, I’m trying to lock down usage of the Collections > API, but that also does not seem to be working either. > > Here is my security.json. I’m using the “collection-admin-edit” permission > and assigning it to the “adminRole”. However, after uploading the new > security.json and restarting the web browser, it doesn’t seem to be requiring > credentials when calling the RELOAD action on the Collections API. The only > thing that seems to work is the custom permission “browse”, which requires > authentication before allowing me to pull up the page. Am I using the > permissions correctly for the RuleBasedAuthorizationPlugin? > > { > "authentication":{ >"class":"solr.BasicAuthPlugin", >"credentials": { > "admin”:” ", > "user": ” " > } > }, > "authorization":{ >"class":"solr.RuleBasedAuthorizationPlugin", >"permissions": [ > { > "name":"security-edit", > "role":"adminRole" > }, > { > "name":"collection-admin-edit”, > "role":"adminRole" > }, > { > "name":"browse", > "collection": "inventory", > "path": "/browse", > "role":"browseRole" > } > ], >"user-role": { > "admin": [ > "adminRole", > "browseRole" > ], > "user": [ > "browseRole" > ] > } > } > } > > I also tried adding the permission using the Authorization API, but it had no effect; > it still isn’t protecting the Collections API from being invoked without a > username and password. I do see in the Solr logs that it sees the updates, > because it outputs the messages “Updating /security.json …”, “Security node > changed”, “Initializing authorization plugin: > solr.RuleBasedAuthorizationPlugin” and “Authentication plugin class obtained > from ZK: solr.BasicAuthPlugin”. 
> > Thanks, > Kevin > >> On Sep 1, 2015, at 12:31 AM, Noble Paul wrote: >> >> I'm investigating why restarts or first time start does not read the >> security.json >> >> On Tue, Sep 1, 2015 at 1:00 PM, Noble Paul wrote: >>> I removed that statement >>> >>> "If activating the authorization plugin doesn't protect the admin ui, >>> how does one protect access to it?" >>> >>> One does not need to protect the admin UI. You only need to protect >>> the relevant API calls . I mean it's OK to not protect the CSS and >>> HTML stuff. But if you perform an action to create a core or do a >>> query through admin UI , it automatically will prompt you for >>> credentials (if those APIs are protected) >>> >>> On Tue, Sep 1, 2015 at 12:41 PM, Kevin Lee >>> wrote: >>>> Thanks for the clarification! >>>> >>>> So is the wiki page incorrect at >>>> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin >>>> which says that the admin ui will require authentication once the >>>> authorization plugin is activated? >>>> >>>> "An authorization plugin is also available to configure Solr with >>>> permissions to perform various activities in the system. Once activated, >>>> access to the Solr Admin UI and all requests will need to be authenticated >>>> and users will be required to have the proper authorization for all >>>> requests, including using the Admin UI and making any A
Re: Issue Using Solr 5.3 Authentication and Authorization Plugins
I opened a ticket for the same https://issues.apache.org/jira/browse/SOLR-8004 On Wed, Sep 2, 2015 at 1:36 PM, Kevin Lee wrote: > I’ve found that completely exiting Chrome or Firefox and opening it back up > re-prompts for credentials when they are required. It was re-prompting with > the /browse path where authentication was working each time I completely > exited and started the browser again, however it won’t re-prompt unless you > exit completely and close all running instances so I closed all instances > each time to test. > > However, to make sure I ran it via the command line via curl as suggested and > it still does not give any authentication error when trying to issue the > command via curl. I get a success response from all the Solr instances that > the reload was successful. > > Not sure why the pre-canned permissions aren’t working, but the one to the > request handler at the /browse path is. > > >> On Sep 1, 2015, at 11:03 PM, Noble Paul wrote: >> >> " However, after uploading the new security.json and restarting the >> web browser," >> >> The browser remembers your login , So it is unlikely to prompt for the >> credentials again. >> >> Why don't you try the RELOAD operation using command line (curl) ? >> >> On Tue, Sep 1, 2015 at 10:31 PM, Kevin Lee wrote: >>> The restart issues aside, I’m trying to lockdown usage of the Collections >>> API, but that also does not seem to be working either. >>> >>> Here is my security.json. I’m using the “collection-admin-edit” permission >>> and assigning it to the “adminRole”. However, after uploading the new >>> security.json and restarting the web browser, it doesn’t seem to be >>> requiring credentials when calling the RELOAD action on the Collections >>> API. The only thing that seems to work is the custom permission “browse” >>> which is requiring authentication before allowing me to pull up the page. >>> Am I using the permissions correctly for the RuleBasedAuthorizationPlugin? 
>>> >>> { >>>"authentication":{ >>> "class":"solr.BasicAuthPlugin", >>> "credentials": { >>>"admin”:” ", >>>"user": ” " >>>} >>>}, >>>"authorization":{ >>> "class":"solr.RuleBasedAuthorizationPlugin", >>> "permissions": [ >>>{ >>>"name":"security-edit", >>>"role":"adminRole" >>>}, >>>{ >>>"name":"collection-admin-edit”, >>>"role":"adminRole" >>>}, >>>{ >>>"name":"browse", >>>"collection": "inventory", >>>"path": "/browse", >>>"role":"browseRole" >>>} >>>], >>> "user-role": { >>>"admin": [ >>>"adminRole", >>>"browseRole" >>>], >>>"user": [ >>> "browseRole" >>>] >>>} >>>} >>> } >>> >>> Also tried adding the permission using the Authorization API, but no >>> effect, still isn’t protecting the Collections API from being invoked >>> without a username password. I do see in the Solr logs that it sees the >>> updates because it outputs the messages “Updating /security.json …”, >>> “Security node changed”, “Initializing authorization plugin: >>> solr.RuleBasedAuthorizationPlugin” and “Authentication plugin class >>> obtained from ZK: solr.BasicAuthPlugin”. >>> >>> Thanks, >>> Kevin >>> >>>> On Sep 1, 2015, at 12:31 AM, Noble Paul wrote: >>>> >>>> I'm investigating why restarts
Re: Issue Using Solr 5.3 Authentication and Authorization Plugins
Both these are committed. If you could test with the latest 5.3 branch it would be helpful On Wed, Sep 2, 2015 at 5:11 PM, Noble Paul wrote: > I opened a ticket for the same > https://issues.apache.org/jira/browse/SOLR-8004 > > On Wed, Sep 2, 2015 at 1:36 PM, Kevin Lee wrote: >> I’ve found that completely exiting Chrome or Firefox and opening it back up >> re-prompts for credentials when they are required. It was re-prompting with >> the /browse path where authentication was working each time I completely >> exited and started the browser again, however it won’t re-prompt unless you >> exit completely and close all running instances so I closed all instances >> each time to test. >> >> However, to make sure I ran it via the command line via curl as suggested >> and it still does not give any authentication error when trying to issue the >> command via curl. I get a success response from all the Solr instances that >> the reload was successful. >> >> Not sure why the pre-canned permissions aren’t working, but the one to the >> request handler at the /browse path is. >> >> >>> On Sep 1, 2015, at 11:03 PM, Noble Paul wrote: >>> >>> " However, after uploading the new security.json and restarting the >>> web browser," >>> >>> The browser remembers your login , So it is unlikely to prompt for the >>> credentials again. >>> >>> Why don't you try the RELOAD operation using command line (curl) ? >>> >>> On Tue, Sep 1, 2015 at 10:31 PM, Kevin Lee >>> wrote: >>>> The restart issues aside, I’m trying to lockdown usage of the Collections >>>> API, but that also does not seem to be working either. >>>> >>>> Here is my security.json. I’m using the “collection-admin-edit” >>>> permission and assigning it to the “adminRole”. However, after uploading >>>> the new security.json and restarting the web browser, it doesn’t seem to >>>> be requiring credentials when calling the RELOAD action on the Collections >>>> API. 
The only thing that seems to work is the custom permission “browse” >>>> which is requiring authentication before allowing me to pull up the page. >>>> Am I using the permissions correctly for the RuleBasedAuthorizationPlugin? >>>> >>>> { >>>>"authentication":{ >>>> "class":"solr.BasicAuthPlugin", >>>> "credentials": { >>>>"admin”:” ", >>>>"user": ” " >>>>} >>>>}, >>>>"authorization":{ >>>> "class":"solr.RuleBasedAuthorizationPlugin", >>>> "permissions": [ >>>>{ >>>>"name":"security-edit", >>>>"role":"adminRole" >>>>}, >>>>{ >>>>"name":"collection-admin-edit”, >>>>"role":"adminRole" >>>>}, >>>>{ >>>>"name":"browse", >>>>"collection": "inventory", >>>>"path": "/browse", >>>>"role":"browseRole" >>>>} >>>> ], >>>> "user-role": { >>>>"admin": [ >>>> "adminRole", >>>>"browseRole" >>>>], >>>>"user": [ >>>>"browseRole" >>>>] >>>>} >>>>} >>>> } >>>> >>>> Also tried adding the permission using the Authorization API, but no >>>> effect, still isn’t protecting the Collections API from being invoked >>>> without a username password. I do see in the Solr logs that it sees the >&
Re: Issue Using Solr 5.3 Authentication and Authorization Plugins
There are no download links for the 5.3.x branch till we do a bug fix release. If you wish to download the trunk nightly (which is not the same as 5.3.0), check here: https://builds.apache.org/job/Solr-Artifacts-trunk/lastSuccessfulBuild/artifact/solr/package/ If you wish to get the binaries for the 5.3 branch, you will have to build them yourself (you will need to install svn and ant). Here are the steps: svn checkout http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_5_3/ cd lucene_solr_5_3/solr ant server On Fri, Sep 4, 2015 at 4:11 PM, davidphilip cherian wrote: > Hi Kevin/Noble, > > What is the download link to get the latest? What are the steps to compile > it, test it, and use it? > We also have a use case for this feature in solr. Therefore, we wanted > to test it, and the above info would help a lot to get started. > > Thanks. > > > On Fri, Sep 4, 2015 at 1:45 PM, Kevin Lee wrote: > >> Thanks, I downloaded the source, compiled it, and replaced the jar file >> in the dist and solr-webapp’s WEB-INF/lib directory. It does seem to be >> protecting the Collections API reload command now as long as I upload the >> security.json after startup of the Solr instances. If I shut down and bring >> the instances back up, the security is no longer in place and I have to >> upload the security.json again for it to take effect. >> >> - Kevin >> >> > On Sep 3, 2015, at 10:29 PM, Noble Paul wrote: >> > >> > Both these are committed. If you could test with the latest 5.3 branch >> > it would be helpful >> > >> > On Wed, Sep 2, 2015 at 5:11 PM, Noble Paul wrote: >> >> I opened a ticket for the same >> >> https://issues.apache.org/jira/browse/SOLR-8004 >> >> >> >> On Wed, Sep 2, 2015 at 1:36 PM, Kevin Lee >> wrote: >> >>> I’ve found that completely exiting Chrome or Firefox and opening it >> back up re-prompts for credentials when they are required. 
It was >> re-prompting with the /browse path where authentication was working each >> time I completely exited and started the browser again, however it won’t >> re-prompt unless you exit completely and close all running instances so I >> closed all instances each time to test. >> >>> >> >>> However, to make sure I ran it via the command line via curl as >> suggested and it still does not give any authentication error when trying >> to issue the command via curl. I get a success response from all the Solr >> instances that the reload was successful. >> >>> >> >>> Not sure why the pre-canned permissions aren’t working, but the one to >> the request handler at the /browse path is. >> >>> >> >>> >> >>>> On Sep 1, 2015, at 11:03 PM, Noble Paul wrote: >> >>>> >> >>>> " However, after uploading the new security.json and restarting the >> >>>> web browser," >> >>>> >> >>>> The browser remembers your login , So it is unlikely to prompt for the >> >>>> credentials again. >> >>>> >> >>>> Why don't you try the RELOAD operation using command line (curl) ? >> >>>> >> >>>> On Tue, Sep 1, 2015 at 10:31 PM, Kevin Lee >> wrote: >> >>>>> The restart issues aside, I’m trying to lockdown usage of the >> Collections API, but that also does not seem to be working either. >> >>>>> >> >>>>> Here is my security.json. I’m using the “collection-admin-edit” >> permission and assigning it to the “adminRole”. However, after uploading >> the new security.json and restarting the web browser, it doesn’t seem to be >> requiring credentials when calling the RELOAD action on the Collections >> API. The only thing that seems to work is the custom permission “browse” >> which is requiring authentication before allowing me to pull up the page. >> Am I using the permissions correctly for the RuleBasedAuthorizationPlugin? 
>> >>>>> >> >>>>> { >> >>>>> "authentication":{ >> >>>>> "class":"solr.BasicAuthPlugin", >> >>>>> "credentials": { >> >>>>> "admin”:” ", >> >>>>> "user": ” " >> >>>>> } >> >>>>> }, >> >>>>> "authorization":{ >> >>>>
Re: Strange interpretation of invalid ISO date strings
Just a word of warning: ISO 8601, the date format standard, is quite big, to say the least, and I thus expect very few implementations to be complete. I survived one such interoperability issue with Safari on iOS 6. While they (and JavaScript, I think) claim ISO 8601 support, it was not complete, and fine-grained hunting led us to that discovery. We did open an issue with Apple, but changing things on our side was much faster. Overall, this has cost us several months of development... I wish there were a tinier standard. Paul -- fat fingered on my z10 -- Original message From: Shawn Heisey Sent: Monday, 7 September 2015 02:05 To: solr-user@lucene.apache.org Reply-To: solr-user@lucene.apache.org Subject: Strange interpretation of invalid ISO date strings Here's some debug info from a query our code was generating: "querystring": "post_date:[2015-09-0124T00:00:00Z TO 2015-09-0224T00:00:00Z]", "parsedquery": "post_date:[145169280 TO 146033280]", The "24" comes from the part of our code that handles the hour; it was being incorrectly added. We have since fixed the problem, but are somewhat confused that we did not get an error. When I decode the millisecond timestamps in the parsed query, I get these dates: Sat, 02 Jan 2016 00:00:00 GMT Mon, 11 Apr 2016 00:00:00 GMT Should this be considered a bug? I would have expected Solr to throw an exception related to an invalidly formatted date, not to assume that we meant the 124th and 224th day of the month and calculate it accordingly. Would I be right in thinking that this problem is not actually in Solr code, and that we are using code from either Java itself or a third party for ISO date parsing? The index where this problem was noticed is Solr 4.9.1 running with Oracle JDK 8u45 on Linux. I confirmed that the same thing happens with Solr 5.2.1 running with Oracle JDK 8u60 on Windows. Thanks, Shawn
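Shawn's suspicion is right: the rollover comes from Java's date machinery, not from Solr-specific logic. `java.text.SimpleDateFormat` (and the `Calendar` underneath it) is lenient by default, so an out-of-range day-of-month is silently rolled forward. A standalone sketch, with no Solr code involved, reproduces the exact symptom from the debug output (day 124 of September 2015 rolls to 2 Jan 2016); whether Solr could simply switch to strict parsing is a separate question, but strict mode is shown for contrast:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class LenientDateDemo {
    public static void main(String[] args) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));

        // Lenient (the default): "0124" is consumed as day-of-month 124
        // and rolled forward, so 2015-09-0124 quietly becomes 2016-01-02.
        Date d = fmt.parse("2015-09-0124T00:00:00Z");
        System.out.println(d);  // Sat Jan 02 00:00:00 UTC 2016

        // Strict parsing rejects the same string instead of rolling over.
        fmt.setLenient(false);
        try {
            fmt.parse("2015-09-0124T00:00:00Z");
        } catch (ParseException expected) {
            System.out.println("strict mode rejects it: " + expected.getMessage());
        }
    }
}
```

Running this prints the same 2 Jan 2016 date Shawn decoded from the parsed query, which supports the "Java, not Solr" theory.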
Re: How to secure Admin UI with Basic Auth in Solr 5.3.x
Check this: https://cwiki.apache.org/confluence/display/solr/Securing+Solr There are a couple of bugs in 5.3.0, and a bug fix release is coming up over the next few days. We don't provide any specific means to restrict access to the admin UI itself. However, we let users specify fine-grained ACLs on various operations such as collection-admin-edit, read, etc. On Wed, Sep 9, 2015 at 2:35 PM, Merlin Morgenstern wrote: > I just installed Solr Cloud 5.3.x and found that the way to secure the admin > UI has changed. Apparently there is a new plugin which does role-based > authentication, and all info on how to secure the admin UI found on the > net is outdated. > > I do not need role-based authentication but simply want to put basic > authentication on the Admin UI. > > How do I configure Solr Cloud 5.3.x in order to restrict access to the > Admin UI via Basic Authentication? > > Thank you for any help -- ----- Noble Paul
Re: How to secure Admin UI with Basic Auth in Solr 5.3.x
There were some bugs with the 5.3.0 release, and 5.3.1 is in the process of getting released. Try out option #2 with the RC here: https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.3.1-RC1-rev1702389/solr/ On Fri, Sep 11, 2015 at 5:16 PM, Merlin Morgenstern wrote: > OK, I downgraded to solr 5.2.x > > Unfortunately still no luck. I followed two approaches: > > 1. Secure it the old-fashioned way as described here: > http://stackoverflow.com/questions/28043957/how-to-set-apache-solr-admin-password > > 2. Using the Basic Authentication Plugin as described here: > http://lucidworks.com/blog/securing-solr-basic-auth-permission-rules/ > > Both approaches left me with unsolved problems. > > While following option 1, I was able to secure the Admin UI with basic > authentication, but I was no longer able to access my application, despite the > fact that it was working on solr 3.x with the same type of authentication > procedure and credentials. > > While following option 2, I was stuck right after uploading the > security.json file to the zookeeper ensemble. Curling > http://localhost:8983/solr/admin/authentication as described responded with a 404 not > found, and then solr could not connect to zookeeper. I had to remove that > file from zookeeper and restart all solr nodes. > > Could someone please show me the way to secure the Admin UI and > password-protect solr cloud? I have a perfectly running system with solr > 3.x and one core, and now taking solr cloud 5.2.x into production > seems to be stopped by simple authorization problems. > > Thank you in advance for any help. > > > > 2015-09-10 20:42 GMT+02:00 Noble Paul : >> Check this: https://cwiki.apache.org/confluence/display/solr/Securing+Solr >> >> There are a couple of bugs in 5.3.0, and a bug fix release is coming up >> over the next few days. >> >> We don't provide any specific means to restrict access to the admin UI >> itself. However, we let users specify fine-grained ACLs on various >> operations such as collection-admin-edit, read, etc. >> >> On Wed, Sep 9, 2015 at 2:35 PM, Merlin Morgenstern >> wrote: >> > I just installed Solr Cloud 5.3.x and found that the way to secure the >> admin >> > UI has changed. Apparently there is a new plugin which does role-based >> > authentication, and all info on how to secure the admin UI found on the >> > net is outdated. >> > >> > I do not need role-based authentication but simply want to put >> basic >> > authentication on the Admin UI. >> > >> > How do I configure Solr Cloud 5.3.x in order to restrict access to the >> > Admin UI via Basic Authentication? >> > >> > Thank you for any help >> >> >> >> -- >> - >> Noble Paul >> -- - Noble Paul
Re: Solr authentication - Error 401 Unauthorized
It is not that Solr is over-protected; it is just that the clients, SolrJ as well as bin/solr, are not provided with basic auth capabilities. I have opened a ticket to track this: https://issues.apache.org/jira/browse/SOLR-8048 On Sat, Sep 12, 2015 at 7:14 PM, Dan Davis wrote: > Noble, > > You should also look at this if it is intended to be more than an internal > API. Using the minor protections I added to test SOLR-8000, I was able to > reproduce a problem very like this: > > bin/solr healthcheck -z localhost:2181 -c mycollection > > Since Solr /select is protected... > > On Sat, Sep 12, 2015 at 9:40 AM, Dan Davis wrote: > >> It seems that you have secured Solr so thoroughly that you cannot now run >> bin/solr status! >> >> bin/solr has no arguments as yet for providing a username/password - as >> someone who is mostly a user, like you, I'm not sure of the roadmap. >> >> I think you should relax those restrictions a bit and try again. >> >> On Fri, Sep 11, 2015 at 5:06 AM, Merlin Morgenstern < >> merlin.morgenst...@gmail.com> wrote: >> >>> I have secured solr cloud via basic authentication. >>> >>> Now I am having difficulties creating cores and getting status >>> information. >>> Solr keeps telling me that the request is unauthorized. However, I have >>> access to the admin UI after login. >>> >>> How do I configure solr to use the basic authentication credentials? >>> >>> This is the error message: >>> >>> /opt/solr-5.3.0/bin/solr status >>> >>> Found 1 Solr nodes: >>> >>> Solr process 31114 running on port 8983 >>> >>> ERROR: Failed to get system information from http://localhost:8983/solr >>> due >>> to: org.apache.http.client.ClientProtocolException: Expected JSON response >>> from server but received: >>> >>> >>> >>> >>> >>> Error 401 Unauthorized >>> >>> >>> >>> HTTP ERROR 401 >>> >>> Problem accessing /solr/admin/info/system. Reason: >>> >>> Unauthorized Powered by >>> Jetty:// >>> >>> >>> >>> >>> >>> >> >> -- - Noble Paul
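Until SOLR-8048 adds basic-auth support to SolrJ and bin/solr, the workaround is to attach the `Authorization` header to HTTP requests yourself. A minimal sketch of building that header value; `solr`/`SolrRocks` are the sample credentials corresponding to the wiki's example `security.json` hash, and are placeholders here:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {
    // Build the value of the HTTP "Authorization" header for Basic auth:
    // "Basic " followed by base64(user:password).
    static String basicAuth(String user, String password) {
        String token = Base64.getEncoder().encodeToString(
                (user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return "Basic " + token;
    }

    public static void main(String[] args) {
        // Attach this header to any request against a protected path,
        // e.g. /solr/admin/info/system, from your own HTTP client code.
        System.out.println(basicAuth("solr", "SolrRocks"));
    }
}
```

The same header value works from curl via `-u user:password`, which is why the command-line checks in this thread succeed once credentials are supplied.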
Re: Ideas
Writing a query component would be pretty easy, no? It would throw an exception if crazy numbers are requested... I can provide a simple example of a Maven project for a query component. Paul

William Bell wrote: > We have some denial-of-service attacks on our web site. SOLR threads are > going crazy. > > Basically someone is hitting start=15 + and rows=20. The start is crazy > large. > > And then they jump around. start=15 then start=213030 etc. > > Any ideas for how to stop this besides blocking these IPs? > > Sometimes it is Google doing it even though these search results are set > with noindex and nofollow on these pages. > > Thoughts? Ideas?
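The guard logic Paul describes is small. This is only a sketch of the check itself, with made-up limits (`MAX_START`, `MAX_ROWS` are assumptions to tune for your traffic); the Solr wiring, i.e. subclassing `SearchComponent`, putting the check in `prepare(ResponseBuilder)`, and registering the component in solrconfig.xml, is summarized in the comments rather than shown:

```java
public class PagingGuard {
    // Hypothetical limits. Deep paging forces Solr to collect and sort
    // start+rows candidates, which is what lets a crawler hitting
    // start=213030-style URLs burn CPU across threads.
    static final int MAX_START = 10_000;
    static final int MAX_ROWS  = 100;

    // In a real Solr query component this would live in
    // prepare(ResponseBuilder rb), reading start/rows from
    // rb.req.getParams() and throwing a BAD_REQUEST SolrException.
    static void check(int start, int rows) {
        if (start < 0 || start > MAX_START) {
            throw new IllegalArgumentException("start=" + start + " out of range");
        }
        if (rows < 0 || rows > MAX_ROWS) {
            throw new IllegalArgumentException("rows=" + rows + " out of range");
        }
    }

    public static void main(String[] args) {
        check(0, 20);              // normal request: passes
        try {
            check(213030, 20);     // crawler-style deep page: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejected requests fail fast with a 400-style error instead of tying up search threads, and legitimate clients that need deep result sets can use cursor-based paging instead.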
[ANNOUNCE] Apache Lucene 5.3.1 released
24 September 2015, Apache Solr™ 5.3.1 available

The Lucene PMC is pleased to announce the release of Apache Solr 5.3.1.

Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites.

This release contains various bug fixes and optimizations since the 5.3.0 release. The release is available for immediate download at:

http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Please read CHANGES.txt for a full list of new features and changes:

https://lucene.apache.org/solr/5_3_1/changes/Changes.html

Solr 5.3.1 includes these bug fixes:

* security.json is not loaded on server start
* RuleBasedAuthorization plugin does not respect the collection-admin-edit permission
* Fix VelocityResponseWriter template encoding issue. Templates must be UTF-8 encoded
* SimplePostTool (also bin/post) -filetypes "*" now works properly in 'web' mode
* example/files update-script.js to be Java 7 and 8 compatible
* SolrJ could not make requests to handlers with '/admin/' prefix
* Use of timeAllowed can cause incomplete filters to be cached and incorrect results to be returned on subsequent requests
* VelocityResponseWriter's $resource.get(key,baseName,locale) to use specified locale
* Fix the exclusion filter so that collections that start with js, css, img, tpl can be accessed
* Resolve XSS issue in Admin UI stats page

Known issues:

* On Windows, the bin/solr.cmd script fails to start correctly when using a relative path with the -s parameter. Use an absolute path as a workaround. https://issues.apache.org/jira/browse/SOLR-8073

See the CHANGES.txt file included with the release for a full list of changes and further details.

Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access.

Noble Paul
on behalf of Lucene PMC
Re: [ANNOUNCE] Apache Lucene 5.3.1 released
Wrong title

On Thu, Sep 24, 2015 at 10:55 PM, Noble Paul wrote:
> 24 September 2015, Apache Solr™ 5.3.1 available
> [...]

--
- Noble Paul
[ANNOUNCE] Apache Solr 5.3.1 released
24 September 2015, Apache Solr™ 5.3.1 available

The Lucene PMC is pleased to announce the release of Apache Solr 5.3.1.

Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites.

This release contains various bug fixes and optimizations since the 5.3.0 release. The release is available for immediate download at:

http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Please read CHANGES.txt for a full list of new features and changes:

https://lucene.apache.org/solr/5_3_1/changes/Changes.html

Solr 5.3.1 includes these bug fixes:

* security.json is not loaded on server start
* RuleBasedAuthorization plugin does not respect the collection-admin-edit permission
* Fix VelocityResponseWriter template encoding issue. Templates must be UTF-8 encoded
* SimplePostTool (also bin/post) -filetypes "*" now works properly in 'web' mode
* example/files update-script.js to be Java 7 and 8 compatible
* SolrJ could not make requests to handlers with '/admin/' prefix
* Use of timeAllowed can cause incomplete filters to be cached and incorrect results to be returned on subsequent requests
* VelocityResponseWriter's $resource.get(key,baseName,locale) to use specified locale
* Fix the exclusion filter so that collections that start with js, css, img, tpl can be accessed
* Resolve XSS issue in Admin UI stats page

Known issues:

* On Windows, the bin/solr.cmd script fails to start correctly when using a relative path with the -s parameter. Use an absolute path as a workaround: https://issues.apache.org/jira/browse/SOLR-8073

See the CHANGES.txt file included with the release for a full list of changes and further details.

Noble Paul
on behalf of Lucene PMC
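[Editor's sketch of the SOLR-8073 workaround mentioned above; the install path is illustrative, not from the report.]

```shell
REM SOLR-8073 workaround sketch (Windows cmd). The -s solr-home
REM option must be given an absolute path on Solr 5.3.1.
REM Fails with a relative path:
REM   bin\solr.cmd start -s server\solr
REM Works with an absolute path:
bin\solr.cmd start -s C:\solr-5.3.1\server\solr
```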
Re: Instant Page Previews
This is a very nice start Charlie. I'd warn a bit, however, on the value of such previews: automated previews of a web page can be quite far from what users might remember the page looking like. In particular, tool pages typically show a quite "empty" or "initial" state in such automatic previewers.

For i2geo.net, I searched for such a solution (a tick longer than 6 years ago!) and failed to find a successful one. Instead, we built in a signed applet (yes, this is old) where users could screenshot previews. To my taste, this allows a far, far better feeling, but of course, it requires a community approach.

Maybe both are needed if there's an infinite budget...

Paul

> Charlie Hull <mailto:char...@flax.co.uk>
> 8 October 2015 09:48
>
> Hi Lewin,
>
> We built this feature for another search engine (based on Xapian,
> which I doubt many people have heard of) a long while ago. It's
> standalone and open source though, so should be applicable:
> https://github.com/flaxsearch/flaxcode/tree/master/flax_basic/libs/previewgen
>
> It uses a headless version of Open Office under the hood to generate
> thumbnail previews for various common file types, plus some
> ImageMagick for PDF, all wrapped up in Python. Bear in mind this is 6
> years old so some updating might be required!
>
> Cheers
>
> Charlie
>
> Lewin Joy (TMS) <mailto:lewin_...@toyota.com>
> 7 October 2015 19:49
>
> Hi,
>
> Is there any way we can implement instant page previews in Solr?
> Just saw that Google Search Appliance has this out of the box,
> just like what google.com had previously. We need to display the
> content of the result record when hovering over the link.
>
> Thanks,
> Lewin
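[Editor's note: the ImageMagick-for-PDF approach Charlie describes can be sketched in a few lines of Python. The helper names and paths below are illustrative and not taken from the flaxcode project; it assumes ImageMagick (and Ghostscript for PDF input) is installed.]

```python
# Sketch of a PDF thumbnail generator in the spirit of the flaxcode
# previewgen tool: shell out to ImageMagick's `convert`.
import subprocess

def preview_command(src_pdf, out_png, width=150):
    """Build the ImageMagick command that renders page 1 of a PDF
    as a fixed-width PNG thumbnail (aspect ratio preserved)."""
    return [
        "convert",
        "-thumbnail", str(width),  # scale to `width` px wide
        f"{src_pdf}[0]",           # [0] selects the first page only
        out_png,
    ]

def make_preview(src_pdf, out_png, width=150):
    # Requires `convert` (ImageMagick) on the PATH.
    subprocess.run(preview_command(src_pdf, out_png, width), check=True)
```

For other file types the flaxcode tool first converts to PDF via headless Open Office, then applies the same thumbnailing step.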
Re: Kate Winslet vs Winslet Kate
I believe that very many installations of Solr actually need a query expansion such as the one you describe below, with each textual field indexed in multiple forms (string, straight (whitespace/ideograms), stemmed, phonetic). Thanks to edismax, I think, you would do the following expansion:

- 2.0 for string match (same field only, complete value)
- 1.8 for straight phrase match (same field only, using a slop)
- 1.5 for straight match in bag of words
- 1.3 for stemmed match in bag of words
- 1.1 for phonetic match in bag of words

I think you can do that with edismax: I'm sure about the parameter distribution, just not sure about the pf usage; this might need two straight fields, which is quite cheap.

As others indicated, having intelligence to recognize the terms (e.g. Kate should be in name), or some user indication to do so, can make things more precise, but this is rarely done.

Please note that this is just a suggestion. In particular, the parameters really need some testing and adjustment, I think.

Paul

> Erick Erickson <mailto:erickerick...@gmail.com>
> 1 November 2015 07:40
>
> Yeah, that's actually a tough one. You have no control over what the
> user types; you have to try to guess what they meant.
>
> To do that right, you really have to have some meta-data besides what
> the user typed in, i.e. recognize that "kate" and "winslet" are proper
> names and "movies" is something else, and break up the query
> appropriately behind the scenes.
>
> edismax might help here. You could copyField everything into a
> bag_of_words field, then boost the name field quite high relative to the
> bag_of_words field. That way, and _assuming_ that the bag_of_words
> field had all three words, the user at least gets something.
>
> You can also do some tricks with edismax and the "pf" parameters. That
> option automatically takes the input and makes a phrase out of it against
> the field, so you get better scores for, say, the name field if it
> contains the phrase "kate winslet".
> That doesn't help with the "kate winslet movies" case, though.
>
> On Sat, Oct 31, 2015 at 11:11 PM, Daniel Valdivia wrote:
>
> Daniel Valdivia <mailto:h...@danielvaldivia.com>
> 1 November 2015 07:11
>
> Perhaps
>
> q=name:("Kate AND Winslet")
>
> q=name:("Kate Winslet")
>
> Sent from my iPhone
>
> Yangrui Guo <mailto:guoyang...@gmail.com>
> 1 November 2015 06:21
>
> Thanks for the reply. Putting the name: before the terms did the work. I
> just wanted to generalize the search query, because users might be
> interested in querying Kate Winslet herself or her movies. If a user enters
> the query string "Kate Winslet movie", the query q=name:(Kate AND Winslet AND
> movie) will return nothing.
>
> Yangrui Guo
>
> On Saturday, October 31, 2015, Erick Erickson wrote:
>
> Erick Erickson <mailto:erickerick...@gmail.com>
> 1 November 2015 05:27
>
> There are a couple of anomalies here.
>
> 1> kate AND winslet
> What does the query look like if you add &debug=true to the statement
> and look at the "parsed_query" section of the return? My guess is you
> typed "q=name:kate AND winslet", which parses as "q=name:kate AND
> default_search_field:winslet", and you are getting matches you don't
> expect. You need something like "q=name:(kate AND winslet)" or
> "q=name:kate AND name:winslet". Note that if you're using edismax it's
> more complicated, but that should still honor the intent.
>
> 2> I have no idea why searching for "Kate Winslet" in quotes returns
> anything; I wouldn't expect it to unless you mean you typed "q=kate
> winslet", which searches against your default field, not the name
> field.
>
> Best,
> Erick
>
> Yangrui Guo <mailto:guoyang...@gmail.com>
> 1 November 2015 04:52
>
> Hi, today I found an interesting aspect of Solr. I imported IMDB data into
> Solr. IMDB puts the last name before the first name in its person's name
> field, e.g. "Winslet, Kate". When I search "Winslet Kate" with quotation
> marks I get the exact result. However, if I search "Kate Winslet" or
> Kate AND Winslet, Solr seems to return all results containing either Kate
> or Winslet, which is similar to "Winslet Kate"~99. From a user perspective
> I certainly want Solr to treat Kate Winslet the same as Winslet Kate. Is
> there any way to make Solr score higher for terms in the same field?
>
> Yangrui
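[Editor's note: Paul's weighting scheme could translate into edismax request parameters roughly as below. The field names (name_str, name_ws, name_stem, name_phon) are hypothetical copies of one source field with different analyzers; this is a sketch, not a tested configuration. Note that in Lucene phrase semantics a slop of 2 is enough to let two adjacent terms swap, so a sloppy phrase on the name field also addresses the "Winslet, Kate" vs "Kate Winslet" ordering problem from this thread.]

```python
# Sketch of edismax parameters implementing the suggested weighting.
# Field names are hypothetical per-analyzer copies of one field:
#   name_str (string), name_ws (straight/whitespace),
#   name_stem (stemmed), name_phon (phonetic).
from urllib.parse import urlencode

params = {
    "defType": "edismax",
    "q": "kate winslet",
    # bag-of-words matches, weighted per analysis form
    "qf": "name_str^2.0 name_ws^1.5 name_stem^1.3 name_phon^1.1",
    # phrase match on the straight field; phrase slop (ps) of 2 lets
    # "kate winslet" also match documents indexed as "Winslet, Kate"
    "pf": "name_ws^1.8",
    "ps": "2",
}

query_string = urlencode(params)
print(query_string)
```

The resulting query string can be appended to the select handler URL; the relative boost values are exactly the ones Paul lists and would still need tuning.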
Re: Kate Winslet vs Winslet Kate
Alexandre, I guess you are talking about this post: http://lucidworks.com/blog/2015/06/06/query-autofiltering-extended-language-logic-search/

I think it is very often impossible to solve properly. Words such as "direction" have very many meanings and would come in different fields. In IMDB, words such as persons' names would come in at least different roles; similarly, an actor's role name is likely to match the family name of a person...

Paul

> As others indicated, having intelligence to recognize the terms (e.g.
> Kate should be in name), or some user indication to do so, can make things
> more precise, but this is rarely done.
>
> Alexandre Rafalovitch <mailto:arafa...@gmail.com>
> 1 November 2015 13:07
>
> Which is what I believe Ted Sullivan is working on and presented at
> the latest Lucene/Solr Revolution. His presentation does not seem to
> be up, but he was writing about it on:
> http://lucidworks.com/blog/author/tedsullivan/
>
> Erick Erickson <mailto:erickerick...@gmail.com>
> 1 November 2015 07:40
>
> Yeah, that's actually a tough one. You have no control over what the
> user types; you have to try to guess what they meant.