Re: Big SolrCloud cluster with a lot of collections
Thanks for your answers. Currently I have one machine (6 cores, 148 GB RAM, 2.5 TB HDD) and I index around 60 million documents per day - the index size is around 26GB. I do have a customer-ID field today and I use it in my queries. I don't split the customers, but I get bad performance. If I make a small collection for each customer, then I know to query only those collections and I get better performance - the indexes are smaller and Solr doesn't need to keep the other customers' data in memory. I checked it and the performance is much better. I do have 1 billion documents today but I can't index them - so it is a real requirement for today to be able to index 1 billion and keep the data for 90 days. We want to grow and support more customers, so I want to understand what design I need for 10 billion per day. I will think about whether I can split the customers across clusters and merge the results myself - it is a good idea. Thanks for the advice. What is better - one powerful machine or a few smaller ones? For example: one machine with 12 cores, 256 GB RAM and 2.5 TB, or 5 machines each with 4 cores, 32 GB RAM and 0.5 TB? Thanks, Yuri

On Saturday, August 15, 2015 5:53 PM, Toke Eskildsen wrote: yura last wrote: > Hi All, I am testing a SolrCloud with many collections. The version is 5.2.1 > and I installed 3 machines – each one with 4 cores and 8 GB RAM. Then I > created collections with 3 shards and a replication factor of 2. That gives me 2 > cores per collection on each machine. I reached almost 900 collections > and then the cluster was stuck and I couldn't revive it. That mirrors what others are reporting. > As I understand, Solr has issues with many collections (thousands). If I > use many more machines - will that give me the ability to create > tens of thousands of collections, or is the limit a couple of thousand? (Caveat: I have no real-world experience with high collection counts in Solr.) Adding more machines will not really help you, as the problem with thousands of collections is not hardware power per se, but rather the coordination of them. You mention 180K collections below, and with the current Solr architecture I do not see that happening. > I want to build a cluster that will handle 10 billion documents (currently I > have 1 billion) per day and keep the data for 90 days. Are those real requirements, or something somebody hopes will come true some years down the road? Technology has a habit of catching up, and while a 900 billion document setup is a challenge today, it will probably be a lot easier in 5 years. While we are discussing this, it would help if you could also approximate the index size in bytes. How large do you expect the sum of shards for 1 billion of your documents to be? Likewise, which kinds of queries do you expect? Grouping? Faceting? All these things multiply. Anyway, your requirements are in a league where there is not much collective experience. You will definitely have to build a serious prototype or three to get a proper idea of how much power you need: the standard advice for scaling Solr does not make economic sense beyond a point. But you seem to have started that process already with your current tests. > I want to support 2000 customers, so I would like to split them into collections > and also split by days. (180,000 collections) As 180,000 collections currently seems infeasible for a single SolrCloud, you should consider alternatives: 1) If your collections are independent, then build fully independent clusters of machines.
2) Don't use collections for dividing data between your customers. Use a field with a customer-ID or something like that. > If I create big collections I will have performance issues with queries, > and also most of the queries are for a specific customer. Why would many smaller collections have better performance than fewer larger collections? > (I also have cross-customer queries) If you make independent setups, that could be solved by querying them independently and doing the merging yourself. - Toke Eskildsen
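To illustrate option 2, the query-side change is small. A minimal sketch, assuming a collection named "logs" and a field named "customer_id" (both names are made up, not from the thread): every request simply adds a filter query on the customer field.

curl 'http://localhost:8983/solr/logs/select?q=error&fq=customer_id:12345&rows=10&wt=json'

Because fq clauses are cached separately from the main query, repeated requests for the same customer hit the filter cache instead of re-evaluating the customer restriction each time.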
Re: Cache for percentiles facets
Hi, just a general question, as I was unable to find any old posts relating to stats/percentile/facets performance/cache settings. I have been using Solr since version 4.0, now using the latest v5.2.1.

What I have done:
- Increased heap memory to 30GB
- Experimented with the cache settings
- Merged segments
- Used docValues for the filter field
- Tried with a ramdrive for the index as well
- The field I calculate the percentile on is type int; there seems to be a big performance difference between int and float/decimal etc.

The database consists of multiple sets with 5 million rows; I calculate facet stats for a field, filtered by those sets. My fields are indexed, not stored. The queries are basic:

curl http://localhost:8983/solr/demo/query -d 'rows=0&fq=set_id:id_of_set&q=*:*&json.facet={by_something:{terms:{field:myfield,facet:{median_value:"percentile(myvalue_field,50)"}}}}'

As a quick fix I created a cache in redis ;) -Håvard

On Sat, Aug 15, 2015 at 10:26 PM, Erick Erickson wrote: > You have to provide a lot more info about your problem, including > what you've tried, what your data looks like, etc. > > You might review: > http://wiki.apache.org/solr/UsingMailingLists > > Best, > Erick > > On Sat, Aug 15, 2015 at 10:27 AM, Håvard Wahl Kongsgård > wrote: > > Hi, I have tried various options to speed up percentile calculation for > > facets. But the internal Solr cache only speeds up my queries from 22 to > 19 > > sec. > > > > I'm using the new json facets http://yonik.com/json-facet-api/ > > > > Any tips for caching stats? > > > > > > -Håvard >
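One thing worth double-checking in this setup: percentile() reads the numeric value of every matching document, so the field being aggregated benefits from docValues at least as much as the filter field. A minimal sketch of the schema entry, reusing the field name from the query above (whether it already has docValues is an assumption):

<field name="myvalue_field" type="int" indexed="true" stored="false" docValues="true"/>

With docValues enabled, the values are read from a column-oriented structure on disk instead of being un-inverted onto the heap on first use; the field has to be re-indexed after the change.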
Re: Big SolrCloud cluster with a lot of collections
yura last wrote: > I have one machine (6 cores, 148 GB RAM, 2.5 TB HDD) and I index > around 60 million documents per day - the index size is around 26GB. So 1 billion documents would be approximately 500GB ... and 10 billion/day kept for 90 days would be 450TB. > I do have a customer-ID field today and I use it in my queries. I don't split > the customers, but I get bad performance. If I make a small collection > for each customer, then I know to query only those collections and I > get better performance - the indexes are smaller and Solr doesn't > need to keep the other customers' data in memory. I checked it > and the performance is much better. True when the number of concurrently active customers is low. How many customers do you expect to be actively using the index at a time? If the answer is "most of them", you should make sure that your tests reflect that. If the answer is "relatively few", then your setup might scale well (if you create independent clouds to handle the many-collections problem). The first search for a customer will of course take a while. > I do have 1 billion documents today but I can't index them Why? Does it break down, take too long to index, or result in too-slow searches? Your current problems help a lot when talking about future scale. > - so it is a real requirement for today to be able to index 1 billion and > keep the data for 90 days. To be clear: would that be 1 billion indexed every 90 days, 1 billion each day for 90 days = 90 billion at any given time, or something else? > What is better - one powerful machine or a few smaller ones? For example: > one machine with 12 cores, 256 GB RAM and 2.5 TB, or 5 machines > each with 4 cores, 32 GB RAM and 0.5 TB? Depends on what you do with your data. Most of the time, IO is the bottleneck for Solr, and in those cases it is probably more bang-for-the-buck to buy machines with 256GB of RAM (or maybe the 148GB you have currently), as it minimizes the overhead per box. - Toke Eskildsen
Re: Big SolrCloud cluster with a lot of collections
I expect that the number of concurrent customers will be low. Today I have 1 machine, so I don't have the capacity for all the data. Because of that I am thinking about a new "cluster" solution. Today it is 1 billion each day for 90 days = 90 billion (around 45TB of data). So I should prefer a lot of machines with plenty of RAM and not so much HDD - right? Thanks, Yuri

On Sunday, August 16, 2015 1:33 PM, Toke Eskildsen wrote: yura last wrote: > I have one machine (6 cores, 148 GB RAM, 2.5 TB HDD) and I index > around 60 million documents per day - the index size is around 26GB. So 1 billion documents would be approximately 500GB ... and 10 billion/day kept for 90 days would be 450TB. > I do have a customer-ID field today and I use it in my queries. I don't split > the customers, but I get bad performance. If I make a small collection > for each customer, then I know to query only those collections and I > get better performance - the indexes are smaller and Solr doesn't > need to keep the other customers' data in memory. I checked it > and the performance is much better. True when the number of concurrently active customers is low. How many customers do you expect to be actively using the index at a time? If the answer is "most of them", you should make sure that your tests reflect that. If the answer is "relatively few", then your setup might scale well (if you create independent clouds to handle the many-collections problem). The first search for a customer will of course take a while. > I do have 1 billion documents today but I can't index them Why? Does it break down, take too long to index, or result in too-slow searches? Your current problems help a lot when talking about future scale. > - so it is a real requirement for today to be able to index 1 billion and > keep the data for 90 days. To be clear: would that be 1 billion indexed every 90 days, 1 billion each day for 90 days = 90 billion at any given time, or something else? > What is better - one powerful machine or a few smaller ones? For example: > one machine with 12 cores, 256 GB RAM and 2.5 TB, or 5 machines > each with 4 cores, 32 GB RAM and 0.5 TB? Depends on what you do with your data. Most of the time, IO is the bottleneck for Solr, and in those cases it is probably more bang-for-the-buck to buy machines with 256GB of RAM (or maybe the 148GB you have currently), as it minimizes the overhead per box. - Toke Eskildsen
Re: Big SolrCloud cluster with a lot of collections
yura last wrote: > I expect that the number of concurrent customers will be low. > Today I have 1 machine, so I don't have the capacity for all > the data. You aim for 90 billion documents in the first go and want to prepare for 10 times that. Your current test setup is 60M documents, which means you are off by a factor of more than 1000. You really need to test on a larger subset. > Because of that I am thinking about a new "cluster" solution. Today it is 1 billion > each day for 90 days = 90 billion (around 45TB of data). > So I should prefer a lot of machines with plenty of RAM and not so much HDD - right? We seem to be looking at non-trivial machines, so I think you should run more tests at a larger scale, taking care to emulate the request volume and the number of concurrent customers you expect. If you are lucky, swapping in the data for the active customer works well and you will be able to get by with relatively modest hardware. We have had great success with buying relatively cheap (bang-for-the-buck) machines with low memory (compared to index size) and local SSDs. With static indexes (89 out of your 90 days would be static data, if I understand correctly), one of our 256GB machines holds 6 billion documents in 20TB of index data. You might want to investigate that option. Some details at https://sbdevel.wordpress.com/net-archive-search/ - Toke Eskildsen
Re: Admin Login
Erik, After Walter's reply I started thinking along the lines you mentioned and realized the folly of doing that! Scott On 8/15/2015 9:57 PM, Erick Erickson wrote: Scott: You better not even let them access Solr directly. http://server:port/solr/admin/collections?ACTION=delete&name=collection. Try it sometime on a collection that's not important ;) But as Walter said, that'd be similar to allowing end users unrestricted access to a SQL database; that Solr URL is akin to "drop database". Or, if you've locked down the admin stuff, http://solr:port/solr/collection/update?commit=true&stream.body=<delete><query>*:*</query></delete> Best Erick On Sat, Aug 15, 2015 at 6:57 PM, Scott Derrick wrote: Walter, actually that explains it perfectly! I will move behind my apache server... thanks, Scott On 8/15/2015 6:15 PM, Walter Underwood wrote: No one runs a public-facing Solr server. Just like no one runs a public-facing MySQL server. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Aug 15, 2015, at 4:15 PM, Scott Derrick wrote: I'm somewhat puzzled there is no built-in security. I can't imagine anybody is running a public-facing Solr server with the admin page wide open? I've searched and haven't found any solutions that work out of the box. I've tried the solutions here to no avail: https://wiki.apache.org/solr/SolrSecurity and here: http://wiki.eclipse.org/Jetty/Tutorial/Realms The Solr security docs say to use the application server, and if I could run it on my Tomcat server I would already be done. But I'm told I can't do that? What solutions are people using? Scott -- Leave no stone unturned. Euripides
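For the "move behind my apache server" approach, a minimal sketch of an Apache httpd 2.4 fragment (mod_proxy and mod_auth_basic enabled; the port, paths and htpasswd file location are assumptions, and this is one common pattern rather than the only one):

# Require a login for everything under /solr, then proxy it to the Solr node
<Location "/solr">
    AuthType Basic
    AuthName "Solr"
    AuthUserFile /etc/apache2/solr.htpasswd
    Require valid-user
    ProxyPass "http://localhost:8983/solr"
    ProxyPassReverse "http://localhost:8983/solr"
</Location>

Solr itself should then listen only on localhost (or be firewalled off) so the proxy cannot be bypassed.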
Re: phonetic filter factory question
Thanks, I didn't know you could do this - I'll check it out. On Aug 15, 2015 12:54 PM, "Alexandre Rafalovitch" wrote: > From the "teaching to fish" category of advice (since I don't know the > actual answer). > > Did you try the "Analysis" screen in the Admin UI? If you check the "Verbose > output" mark, you will see all the offsets and can easily confirm the > detailed behavior for yourself. > > Regards, > Alex. > > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: > http://www.solr-start.com/ > > > On 15 August 2015 at 12:22, Jamie Johnson wrote: > > The JavaDoc says that the PhoneticFilterFactory will "inject" tokens with > > an offset of 0 into the stream. I'm assuming this means an offset of 0 > > from the token that it is analyzing, is that right? I am trying to > > collapse some of my schema. I currently have a text field that I use for > > general-purpose text and another field with the PhoneticFilterFactory > > applied for finding things that are similar phonetically, but if this > does > > inject at the current position then I could likely collapse these into a > > single field. As always, thanks in advance! > > > > -Jamie >
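For reference, collapsing the two fields hinges on inject="true", which emits the phonetic code at the same position as the original token (a position increment of 0, which is presumably what the JavaDoc's "offset of 0" refers to). A minimal sketch of such a combined field type - the type name and the rest of the analyzer chain are assumptions:

<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
  </analyzer>
</fieldType>

In the Analysis screen this shows up as two tokens sharing one position: the original term and its phonetic code, so exact and phonetic matches both hit the same field.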
Re: joins
I have exactly the same requirement > On 13-Aug-2015, at 2:12 pm, Kiran Sai Veerubhotla wrote: > > does solr support joins? > > we have a use case where two collections have to be joined, and the join has > to be on the faceted results of the two collections. is this possible?
Query term matches
Is there a way to get the list of terms that matched in a query response? I realize the q parameter is returned, but I'm looking for just the list of terms and not the operators. Scott -- To those leaning on the sustaining infinite, to-day is big with blessings. Mary Baker Eddy
Re: Query term matches
Scott Derrick wrote: > Is there a way to get the list of terms that matched in a query response? Add debug=query to your request: https://wiki.apache.org/solr/CommonQueryParameters#debug You might also want to try http://splainer.io/ - Toke Eskildsen
Re: Query term matches
with a query like q=mar* I tried debugQuery=true, but it just said:

"rawquerystring": "mar*",
"querystring": "mar*",
"parsedquery": "_text_:mar*",
"parsedquery_toString": "_text_:mar*",

I already know that! One document matches Mary; another matches Mary and martyr. I will look at splainer.io Scott

Original Message Subject: Re: Query term matches From: Toke Eskildsen To: solr-user@lucene.apache.org Date: 08/16/2015 11:39 AM Scott Derrick wrote: Is there a way to get the list of terms that matched in a query response? Add debug=query to your request: https://wiki.apache.org/solr/CommonQueryParameters#debug You might also want to try http://splainer.io/ - Toke Eskildsen -- No one can make you feel inferior without your consent. Eleanor Roosevelt
Solr Cloud Security Question
I have a solr cloud with 3 nodes. I've added password protection following the steps here: http://stackoverflow.com/questions/28043957/how-to-set-apache-solr-admin-password Now only one node is able to load the collections. The others are getting 401 Unauthorized error when loading the collections. Could anybody provide the instructions to configure security for solr cloud? Thanks, Magesh
No. of records mismatch
I did a dataimport with 'clean' set to false. The DIH status upon completion was: idle 1 6843427 6843427 0 2015-08-16 16:50:54 Indexing completed. Added/Updated: 6843427 documents. Deleted 0 documents. Whereas when I query using 'query?q=*:*&rows=0', I get the following count { "responseHeader":{ "status":0, "QTime":1, "params":{ "q":"*:*", "rows":"0"}}, "response":{"numFound":1616376,"start":0,"docs":[] }} There is a difference of 5 million records. Can anyone help me understand the behavior? The logs look fine. Thanks
Re: joins
You can do what are called "pseudo joins", which are equivalent to a nested query in SQL. You get back data from one core, based upon criteria in the other. You cannot (yet) merge the results to create a composite document. Upayavira On Sun, Aug 16, 2015, at 06:02 PM, Nagasharath wrote: > I have exactly the same requirement > > > > > On 13-Aug-2015, at 2:12 pm, Kiran Sai Veerubhotla > > wrote: > > > > does solr support joins? > > > > we have a use case where two collections have to be joined, and the join has > > to be on the faceted results of the two collections. is this possible?
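For the record, the pseudo join Upayavira describes is the {!join} query parser. A sketch with made-up collection and field names (URL-encoding omitted for readability):

curl 'http://localhost:8983/solr/collection1/select?q={!join from=parent_id to=id fromIndex=collection2}color:red'

Here color:red runs against collection2; the parent_id values of its matches are then matched against the id field of collection1, and only collection1 documents come back - no fields from collection2 are merged in, which is exactly the limitation mentioned above. Note that fromIndex must be resolvable on the node executing the query, so in SolrCloud the "from" side effectively needs to be a single-shard collection replicated alongside the other one.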
Re: No. of records mismatch
You almost certainly have a non-unique ID field. Some documents are overwritten during indexing. Try it with a clean index, and then review the number of deleted documents (updates are a delete then insert action). Deletes are calculated with maxDocs minus numDocs. Upayavira On Sun, Aug 16, 2015, at 07:18 PM, Pattabiraman, Meenakshisundaram wrote: > I did a dataimport with 'clean' set to false. > The DIH status upon completion was: > > idle > > > 1 > 6843427 > 6843427 > 0 > 2015-08-16 16:50:54 > > Indexing completed. Added/Updated: 6843427 documents. Deleted 0 > documents. > > Whereas when I query using 'query?q=*:*&rows=0', I get the following > count > { > "responseHeader":{ > "status":0, > "QTime":1, > "params":{ > "q":"*:*", > "rows":"0"}}, > "response":{"numFound":1616376,"start":0,"docs":[] > }} > > There is a difference of 5 million records. Can anyone help me understand > the behavior? The logs look fine. > Thanks
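One concrete way to review that (a sketch; the core name "mycore" is an assumption) is the CoreAdmin STATUS call:

curl 'http://localhost:8983/solr/admin/cores?action=STATUS&core=mycore&wt=json'

In the response, index.maxDoc counts all documents including deleted-but-not-yet-merged ones, index.numDocs counts live documents, and index.deletedDocs is the difference. A large deletedDocs right after an import with no explicit deletes is the signature of documents being overwritten because they shared a uniqueKey.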
Re: Query term matches
This isn't going to be easy. Why do you need to know? Especially with wildcards this'll be "challenging". For the specific docs that are returned, highlighting will tell you _some_ of them. Why only some? Because usually only the best N snippets are returned, say 3 (it's configurable). And it's still possible that four terms beginning with "mar" were in the returned doc (or N+1...). FWIW, Erick On Sun, Aug 16, 2015 at 10:48 AM, Scott Derrick wrote: > with a query like > > q=mar* > > I tried the debugQuery=true but it just said > > rawquerystring": "mar*", > "querystring": "mar*", > "parsedquery": "_text_:mar*", > "parsedquery_toString": "_text_:mar*", > > I already know that! > > one document match's Mary > another matches Mary and martyr > > I will look at splainer.io > > Scott > > > Original Message > Subject: Re: Query term matches > From: Toke Eskildsen > To: solr-user@lucene.apache.org > Date: 08/16/2015 11:39 AM > >> Scott Derrick wrote: >>> >>> Is there a way to get the list of terms that matched in a query response? >> >> >> Add debug=query to your request: >> https://wiki.apache.org/solr/CommonQueryParameters#debug >> >> You might also want to try >> http://splainer.io/ >> >> - Toke Eskildsen >> > > -- > No one can make you feel inferior without your consent. > Eleanor Roosevelt
Re: joins
Is there any chance of this feature (merging the results to create a composite document) coming out in the next release, 5.3? On Sun, Aug 16, 2015 at 2:08 PM, Upayavira wrote: > You can do what are called "pseudo joins", which are equivalent to a > nested query in SQL. You get back data from one core, based upon > criteria in the other. You cannot (yet) merge the results to create a > composite document. > > Upayavira > > On Sun, Aug 16, 2015, at 06:02 PM, Nagasharath wrote: > > I have exactly the same requirement > > > > > > > > > On 13-Aug-2015, at 2:12 pm, Kiran Sai Veerubhotla > wrote: > > > > > > does solr support joins? > > > > > > we have a use case where two collections have to be joined, and the > join has > > > to be on the faceted results of the two collections. is this possible? >
Re: joins
bq: Is there any chance of this feature (merging the results to create a composite document) coming out in the next release, 5.3? In a word, "no". And there aren't really any long-range plans either that I'm aware of. You could also explore streaming aggregation if the need here is more batch-oriented. If at all possible, Solr is much more flexible if you can de-normalize your data rather than try to make Solr work like an RDBMS. Of course it goes against the training of all DB admins, but it's often the best option. So have you explored denormalizing, and do you know it's not a viable option? Best, Erick On Sun, Aug 16, 2015 at 12:45 PM, naga sharathrayapati wrote: > Is there any chance of this feature (merging the results to create a composite > document) coming out in the next release, 5.3? > > On Sun, Aug 16, 2015 at 2:08 PM, Upayavira wrote: > >> You can do what are called "pseudo joins", which are equivalent to a >> nested query in SQL. You get back data from one core, based upon >> criteria in the other. You cannot (yet) merge the results to create a >> composite document. >> >> Upayavira >> >> On Sun, Aug 16, 2015, at 06:02 PM, Nagasharath wrote: >> > I have exactly the same requirement >> > >> > >> > >> > > On 13-Aug-2015, at 2:12 pm, Kiran Sai Veerubhotla >> wrote: >> > > >> > > does solr support joins? >> > > >> > > we have a use case where two collections have to be joined, and the >> join has >> > > to be on the faceted results of the two collections. is this possible? >>
Re: joins
https://issues.apache.org/jira/browse/SOLR-7090 I see this JIRA open in support of joins, which might solve the problem. On Sun, Aug 16, 2015 at 2:51 PM, Erick Erickson wrote: > bq: Is there any chance of this feature (merging the results to create a > composite > document) coming out in the next release, 5.3? > > In a word, "no". And there aren't really any long-range plans either that > I'm > aware of. > > You could also explore streaming aggregation if the need here is more > batch-oriented. > > If at all possible, Solr is much more flexible if you can de-normalize your > data > rather than try to make Solr work like an RDBMS. Of course it goes against > the training of all DB admins, but it's often the best option. > > So have you explored denormalizing, and do you know it's not a viable > option? > > Best, > Erick > > On Sun, Aug 16, 2015 at 12:45 PM, naga sharathrayapati > wrote: > > Is there any chance of this feature (merging the results to create a > composite > > document) coming out in the next release, 5.3? > > > > On Sun, Aug 16, 2015 at 2:08 PM, Upayavira wrote: > > > >> You can do what are called "pseudo joins", which are equivalent to a > >> nested query in SQL. You get back data from one core, based upon > >> criteria in the other. You cannot (yet) merge the results to create a > >> composite document. > >> > >> Upayavira > >> > >> On Sun, Aug 16, 2015, at 06:02 PM, Nagasharath wrote: > >> > I have exactly the same requirement > >> > > >> > > >> > > >> > > On 13-Aug-2015, at 2:12 pm, Kiran Sai Veerubhotla < > sai.sq...@gmail.com> > >> wrote: > >> > > > >> > > does solr support joins? > >> > > > >> > > we have a use case where two collections have to be joined and the > >> join has > >> > > to be on the faceted results of the two collections. is this > possible? > >> >
Re: Query term matches
I'm searching a collection of documents. When I build my results page I provide a link to each document. If the user clicks the link, I display the document with all the matched terms highlighted. I need to supply my highlighter with a list of words to highlight in the doc. I thought the highlighter might be able to return a list of hits per document, since it is highlighting a fragment. Scott On 8/16/2015 1:44 PM, Erick Erickson wrote: This isn't going to be easy. Why do you need to know? Especially with wildcards this'll be "challenging". For the specific docs that are returned, highlighting will tell you _some_ of them. Why only some? Because usually only the best N snippets are returned, say 3 (it's configurable). And it's still possible that four terms beginning with "mar" were in the returned doc (or N+1...). FWIW, Erick On Sun, Aug 16, 2015 at 10:48 AM, Scott Derrick wrote: with a query like q=mar* I tried debugQuery=true, but it just said: "rawquerystring": "mar*", "querystring": "mar*", "parsedquery": "_text_:mar*", "parsedquery_toString": "_text_:mar*", I already know that! One document matches Mary; another matches Mary and martyr. I will look at splainer.io Scott Original Message Subject: Re: Query term matches From: Toke Eskildsen To: solr-user@lucene.apache.org Date: 08/16/2015 11:39 AM Scott Derrick wrote: Is there a way to get the list of terms that matched in a query response? Add debug=query to your request: https://wiki.apache.org/solr/CommonQueryParameters#debug You might also want to try http://splainer.io/ - Toke Eskildsen -- No one can make you feel inferior without your consent. Eleanor Roosevelt
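Given that use case, one option is to let the highlighter do the per-document work instead of extracting a term list up front. A sketch of the request - the field name "text" and the sizing parameters are assumptions; hl.fragsize=0 makes the snippet span the whole field value, and hl.highlightMultiTerm=true lets wildcard queries like mar* highlight their expansions:

curl 'http://localhost:8983/solr/collection1/select?q=mar*&hl=true&hl.fl=text&hl.fragsize=0&hl.snippets=1&hl.highlightMultiTerm=true'

The <em> tags in the returned fragment then mark exactly which terms (Mary, martyr, ...) matched in that document, and the word list can be scraped from them. As Erick notes, snippet limits can still hide occurrences; fragsize=0 mitigates that for single-valued fields.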
Re: Query term matches
splainer doesn't return anything beyond what the debug parameter can. On 8/16/2015 11:39 AM, Toke Eskildsen wrote: Scott Derrick wrote: Is there a way to get the list of terms that matched in a query response? Add debug=query to your request: https://wiki.apache.org/solr/CommonQueryParameters#debug You might also want to try http://splainer.io/ - Toke Eskildsen
RE: No. of records mismatch
" You almost certainly have a non-unique ID field." Yes it is not absolutely unique but do not think it is at this 1 to 6 ratio. "Try it with a clean index, and then review the number of deleted documents (updates are a delete then insert action) " I tried on a new instance - same effect. I do not see any deletions. Is there a way to determine this from the logs to confirm that the behavior is due to non-uniqueness? This will serve as an assurance. Thanks 6843469 6843469 0 2015-08-16 21:22:24 Indexing completed. Added/Updated: 6843469 documents. Deleted 0 documents. 2015-08-16 22:31:47 Whereas '*:*' "params":{ "q":"*:*"}}, "response":{"numFound":1143108,"start":0,"docs":[ -Original Message- From: Upayavira [mailto:u...@odoko.co.uk] Sent: Sunday, August 16, 2015 3:18 PM To: solr-user@lucene.apache.org Subject: Re: No. of records mismatch You almost certainly have a non-unique ID field. Some documents are overwritten during indexing. Try it with a clean index, and then review the number of deleted documents (updates are a delete then insert action). Deletes are calculated with maxDocs minus numDocs. Upayavira On Sun, Aug 16, 2015, at 07:18 PM, Pattabiraman, Meenakshisundaram wrote: > I did a dataimport with 'clean' set to false. > The DIH status upon completion was: > > idle > > > 1 6843427 6843427 0 > 2015-08-16 16:50:54 > Indexing completed. Added/Updated: 6843427 documents. Deleted 0 > documents. > > Whereas when I query using 'query?q=*:*&rows=0', I get the following > count { > "responseHeader":{ > "status":0, > "QTime":1, > "params":{ > "q":"*:*", > "rows":"0"}}, > "response":{"numFound":1616376,"start":0,"docs":[] > }} > > There is a difference of 5 million records. Can anyone help me > understand the behavior? The logs look fine. > Thanks
xsl error
I'm using a dataimporthandler (class="org.apache.solr.handler.dataimport.DataImportHandler", config html-config.xml). I'm using the xsl attribute on all the entities, but this one is throwing an exception. This xsl is used in a production document conversion process with no problems. quote.xsl is a wrapper that calls into the real stuff, which is in buildBookQuote.xsl. The error is reported as: Caused by: javax.xml.transform.TransformerConfigurationException: solrres:/xslt/buildBookQuote.xsl: line 449: Required attribute 'test' is missing. Here is the code starting at line 449: there actually is no "parameter" "test", though there is xsl code that uses test="..." I have other xsl scripts I'm using on other entities that make such calls with no problem. Any ideas? Scott -- Leave no stone unturned. Euripides
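For what it's worth, Xalan raises exactly that message when an <xsl:when> or <xsl:if> element lacks its mandatory test attribute. Note also the solrres:/xslt/ prefix in the error: Solr resolves the stylesheet from its own conf/xslt directory, which may hold a different copy than the one used in the production conversion process. A contrived sketch of the failure mode, not Scott's actual code:

<!-- Broken: xsl:when requires a test attribute -->
<xsl:choose>
  <xsl:when>matched</xsl:when>
</xsl:choose>

<!-- Correct -->
<xsl:choose>
  <xsl:when test="@type = 'book'">matched</xsl:when>
  <xsl:otherwise>no match</xsl:otherwise>
</xsl:choose>

Comparing line 449 of conf/xslt/buildBookQuote.xsl on the Solr side against the production copy would confirm whether the two have diverged.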
Re: Solr Cloud Security Question
On 8/16/2015 12:09 PM, Tarala, Magesh wrote: > I have a solr cloud with 3 nodes. I've added password protection following > the steps here: > http://stackoverflow.com/questions/28043957/how-to-set-apache-solr-admin-password > > Now only one node is able to load the collections. The others are getting 401 > Unauthorized error when loading the collections. > > Could anybody provide the instructions to configure security for solr cloud? Authentication and SolrCloud do not work well together, unless it's client-certificate-based authentication (SSL). This is because there is currently no way to tell Solr what user/pass to use when making requests to another node. This was one of the early issues trying to solve the problem with user/pass authentication and inter-node requests: https://issues.apache.org/jira/browse/SOLR-4470 That issue is now closed as a duplicate, because Solr 5.3 will have an authentication/authorization framework, and basic authentication is one of the first things that has been implemented using that framework: https://issues.apache.org/jira/browse/SOLR-7692 The release process for 5.3 is underway now. If all goes well, the release will happen well before the end of August. Thanks, Shawn
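For reference once 5.3 lands, enabling that framework means uploading a security.json to ZooKeeper. A sketch based on the basic-auth work in SOLR-7692 - the credential string is the documented example for user "solr" with password "SolrRocks", and details may still shift before the release:

{
  "authentication": {
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": { "solr": "admin" },
    "permissions": [ { "name": "security-edit", "role": "admin" } ]
  }
}

Because inter-node requests authenticate through the same framework (a PKI-based plugin between nodes), this avoids the 401s between replicas that the jetty-level approach causes.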
RE: Solr Cloud Security Question
Thanks Shawn! We are on 4.10.4. Will consider 5.x upgrade shortly. -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Sunday, August 16, 2015 9:05 PM To: solr-user@lucene.apache.org Subject: Re: Solr Cloud Security Question On 8/16/2015 12:09 PM, Tarala, Magesh wrote: > I have a solr cloud with 3 nodes. I've added password protection > following the steps here: > http://stackoverflow.com/questions/28043957/how-to-set-apache-solr-adm > in-password > > Now only one node is able to load the collections. The others are getting 401 > Unauthorized error when loading the collections. > > Could anybody provide the instructions to configure security for solr cloud? Authentication and SolrCloud do not work well together, unless it's client-certificate-based authentication (SSL). This is because there is currently no way to tell Solr what user/pass to use when making requests to another node. This was one of the early issues trying to solve the problem with user/pass authentication and inter-node requests: https://issues.apache.org/jira/browse/SOLR-4470 That issue is now closed as a duplicate, because Solr 5.3 will have an authentication/authorization framework, and basic authentication is one of the first things that has been implemented using that framework: https://issues.apache.org/jira/browse/SOLR-7692 The release process for 5.3 is underway now. If all goes well, the release will happen well before the end of August. Thanks, Shawn
Re: No. of records mismatch
Hi, You should check whether there were deletions by navigating to the core admin page in the Solr admin UI. Example URL: http://localhost:8983/solr/#/~cores/test_shard1_replica1 - check numDocs, maxDocs and deletedDocs. If numDocs remains equal to maxDocs, then you have confirmed that there were no overwrites (as Upayavira described). HTH On Mon, Aug 17, 2015 at 4:41 AM, Pattabiraman, Meenakshisundaram < pattabiraman.meenakshisunda...@aig.com> wrote: > "You almost certainly have a non-unique ID field." > Yes, it is not absolutely unique, but I do not think it is non-unique at this > 1-to-6 ratio. > > "Try it with a clean index, and then review the number of deleted > documents (updates are a delete then insert action)." > I tried on a new instance - same effect. I do not see any deletions. Is > there a way to determine from the logs that the behavior is due to > non-uniqueness? That would serve as an assurance. > Thanks > > 6843469 > 6843469 > 0 > 2015-08-16 21:22:24 > > Indexing completed. Added/Updated: 6843469 documents. Deleted 0 documents. > > 2015-08-16 22:31:47 > > Whereas '*:*' returns: > "params":{ > "q":"*:*"}}, > "response":{"numFound":1143108,"start":0,"docs":[ > > -Original Message- > From: Upayavira [mailto:u...@odoko.co.uk] > Sent: Sunday, August 16, 2015 3:18 PM > To: solr-user@lucene.apache.org > Subject: Re: No. of records mismatch > > You almost certainly have a non-unique ID field. Some documents are > overwritten during indexing. Try it with a clean index, and then review the > number of deleted documents (updates are a delete then insert action). > Deletes are calculated with maxDocs minus numDocs. > > Upayavira > > On Sun, Aug 16, 2015, at 07:18 PM, Pattabiraman, Meenakshisundaram > wrote: > > I did a dataimport with 'clean' set to false. > > The DIH status upon completion was: > > > > idle > > > > > > 1 6843427 6843427 0 > > 2015-08-16 16:50:54 > > Indexing completed. Added/Updated: 6843427 documents. Deleted 0 > > documents. > > > > Whereas when I query using 'query?q=*:*&rows=0', I get the following > > count { > > "responseHeader":{ > > "status":0, > > "QTime":1, > > "params":{ > > "q":"*:*", > > "rows":"0"}}, > > "response":{"numFound":1616376,"start":0,"docs":[] > > }} > > > > There is a difference of 5 million records. Can anyone help me > > understand the behavior? The logs look fine. > > Thanks >