Improve indexing speed?
Hi to all,

My documents contain 65 fields, and all of the fields need to be indexed. Indexing 100 documents takes 10 seconds.
I am using Solr 7.5 (2 cloud instances) with 50 shards.
It's running on Windows with 32 GB RAM and a 15 GB Java heap.
How can I improve indexing speed?

Note: every field contains at most 20 characters. The field type is text general, case insensitive.

Thanks,
John Milton
Re: Improve indexing speed?
What have you tried? The first thing I'd try is using just 1 or 2 shards. My first guess is that you're doing a lot of GC because you have 50 shards in a single JVM (1 replica/shard?).

I regularly get several thousand Wikipedia docs/second on my macbook pro, so your numbers are way out of the norm.

Best,
Erick
Re: Improve indexing speed?
How are you indexing the documents? Are you using SolrJ or the plain REST API?
Are you sending the documents one by one or all in one request? The performance is far better if you send the 100 documents in one request. If you send them individually, are you doing any commits between them?

regards,
Hendrik
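Hendrik's point about sending all documents in one request can be sketched as follows. This is only a minimal sketch, not John's actual setup: the collection name `mycollection`, the localhost URL, the `commitWithin` value and the helper names are assumptions for illustration. It posts a JSON array of documents to Solr's `/update` handler in batches and relies on `commitWithin` rather than committing after each document.

```python
import json
import urllib.request

# Assumed values for illustration only; adjust to your own cluster.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"


def chunked(docs, batch_size):
    """Yield consecutive slices of docs with at most batch_size items each."""
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]


def build_request(batch, commit_within_ms=10000):
    """Build one POST request carrying a whole batch as a JSON array."""
    url = f"{SOLR_UPDATE_URL}?commitWithin={commit_within_ms}"
    body = json.dumps(batch).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def index_all(docs, batch_size=100):
    """Send docs batch by batch, with no explicit commit between batches."""
    for batch in chunked(docs, batch_size):
        with urllib.request.urlopen(build_request(batch)) as response:
            response.read()
```

With 100 documents and `batch_size=100`, `index_all` makes a single HTTP request instead of 100.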
Re: How to access the Solr Admin GUI
Why would you want to expose the administration GUI on the web? This is a very hazardous thing to do. Never mind that it normally also runs on 8983, and all its functionality relies on the ability to interact with API endpoints hosted on 8983.

What are you actually trying to solve?

On Dec 31, 2018 6:04 PM, "Jörn Franke" wrote:

Reverse proxy?

> On 31.12.2018 at 22:48, s...@cid.is wrote:
>
> Hi all,
>
> is there a way, better a solution, to access the Solr Admin GUI from outside the server (via public web) while the Solr port 8983 is closed by a firewall and only available inside the server via localhost?
>
> Thanks in advance
> Walter Claassen
Re: How to access the Solr Admin GUI
Yes, exposing the admin UI on the web is very dangerous. Anyone who finds it can delete all your collections.

That UI is designed for “back office” use only.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Jan 1, 2019, at 9:43 AM, Gus Heck wrote:
>
> Why would you want to expose the administration GUI on the web? This is a very hazardous thing to do. Never mind that it normally also runs on 8983, and all its functionality relies on the ability to interact with API endpoints hosted on 8983.
>
> What are you actually trying to solve?
Re: How to access the Solr Admin GUI
You can use ssh to tunnel in.

ssh -L8983:localhost:8983 use...@myremoteserver.example.com

This will only require port 22 to be exposed to the public.
Re: How to access the Solr Admin GUI
You could configure a reverse proxy to provide one or more means of authentication.

However, I agree that the reason for doing this should be clarified.
Re: How to access the Solr Admin GUI
I think a better approach to tunneling would be:

ssh -p -L :localhost:8983 use...@myremoteserver.example.com

This requires you to set up a different port () rather than use the standard port 22 (on your router and in your sshd config). I've been running something like this for about a year and have rarely if ever had it attacked. Prior to changing the port (to ), however, I was under constant hacking attacks; port 22 is too attractive to ignore.

Also, regarding my use of port : if you have the server running on several local machines (as I do), the use of the port may help prevent confusion (as to whether your browser is accessing a local Solr server, defaulted to 8983, or a remote one).

Note: you might find that the ssh connection drops out after some inactivity and needs to be restarted occasionally. That is pretty simple to do: just run the ssh line above again.

Note: I also add authorization controls to the AdminUI (and its functions).

On 1/1/19 1:02 PM, Kay Wrobel wrote:
> You can use ssh to tunnel in.
>
> ssh -L8983:localhost:8983 use...@myremoteserver.example.com
>
> This will only require port 22 to be exposed to the public.
Re: How to access the Solr Admin GUI
On 12/31/2018 2:48 PM, s...@cid.is wrote:
> is there a way, better a solution, to access the Solr Admin GUI from outside the server (via public web) while the Solr port 8983 is closed by a firewall and only available inside the server via localhost?

If you've blocked the Solr port, then you can't access Solr at all, including the admin UI. The UI is accessed through the same port as the rest of Solr.

The admin UI is a static set of resources (html, css, javascript, images, etc) that gets downloaded and runs within the browser, accessing the same API that anything else would. When you issue a query with the admin UI, it is your browser that makes the query, not the server.

If you set up a reverse proxy that blocks URL paths for the API while allowing URL paths for the admin UI, then the admin UI won't work -- because everything the admin UI displays or does is accomplished by your browser making calls to the API.

Thanks,
Shawn
Re: Removing words like "FONT-SIZE: 9pt; FONT-FAMILY: arial" from content
Although Vincenzo's and Alexandre's suggestions may be helpful in the right circumstances, there is a continuum of answers to the original question here. This continuum mostly matters if indexing and querying are likely to happen simultaneously, or if the data volume is large enough relative to the server that you wish indexing would finish faster. Otherwise maintainability, local talent and time investment concerns probably dominate, with the caveat that in many cases initial success may lead to a future with large data volumes, or one where querying and indexing do become simultaneous.

1) Vincenzo's answer would be suitable for a single field or a few small fields with a very narrow set of possible html-like tags. If the number of patterns that need to be matched is high, or the text to be matched is long, I would expect this solution to begin to negatively impact performance.

2) Alexandre's suggestion is much better in the case where there is a moderate amount of text and the input could be generalized html, but as the amount of text that needs to have html stripped grows, the performance of the server will also degrade faster than necessary with increased indexing load.

3) If the Solr Cloud you are indexing into also needs to provide good response times for queries at the same time, and you are not able to supply it with an overabundance of hardware relative to the query/indexing load, then you should consider pre-processing the documents in an external ingestion system such as JesterJ, Fusion, or a variety of other solutions out there.

As the indexing and query load goes up, the best practice is to move as much pre-processing work out of solr as possible, so that solr can continue to do what it does well and return queries quickly.

In the end, like most engineering decisions, it's a cost trade-off: which costs more, investing in setting up external processing or investing in server hardware?

If it's a small amount of data loaded batch style prior to querying, you are in a good place and any of these will work. Just do whatever is fastest/easiest to implement. If you need to support a high volume of data being loaded into solr in a timely manner, or you require minimal impact on query latency due to indexing, you want some variation of 3.

-Gus

On Sun, Dec 30, 2018 at 10:29 PM Alexandre Rafalovitch wrote:

> Specifically, a custom Update Request Processor chain can be used before indexing. Probably with HTMLStripFieldUpdateProcessorFactory.
>
> Regards,
>     Alex
>
> On Sun, Dec 30, 2018, 9:26 PM Vincenzo D'Amore wrote:
>
> > Hi,
> >
> > I think this kind of text manipulation should be done before indexing. If you have font-size and font-family in your text, very likely you're indexing html with css.
> > If I'm right, you're just entering a hell of words that should be removed from your text.
> >
> > On the other hand, if you have to do this at index time, a quick and dirty solution is using the pattern-replace filter.
> >
> > https://lucene.apache.org/solr/guide/7_5/filter-descriptions.html#pattern-replace-filter
> >
> > Ciao,
> > Vincenzo
> >
> > > On 31 Dec 2018, at 02:47, Zheng Lin Edwin Yeo wrote:
> > >
> > > Hi,
> > >
> > > I noticed that during the indexing of EML files, there are words like "*FONT-SIZE: 9pt; FONT-FAMILY: arial*" that are being indexed into the content as well.
> > >
> > > I would like to check: how can we remove those words during indexing?
> > >
> > > I am using Solr 7.5.0
> > >
> > > Regards,
> > > Edwin

--
http://www.the111shift.com
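Gus's option 3, stripping the markup before the documents ever reach Solr, could look roughly like the sketch below. It is an illustration under assumptions, not what JesterJ or Fusion do internally: it uses only Python's standard `html.parser`, and the class and function names are made up for this example. It keeps visible text and drops tags, inline `style` attributes, and the bodies of `<style>` and `<script>` elements, which is where strings like "FONT-SIZE: 9pt; FONT-FAMILY: arial" usually live.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring tags, attributes, and style/script bodies."""

    SKIPPED = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Attributes (including inline style="...") are never collected.
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        # Join fragments with a space so words in adjacent blocks do not merge,
        # then collapse the whitespace left behind by removed markup.
        return " ".join(" ".join(self._parts).split())


def strip_html(markup):
    """Return the visible text of an HTML fragment as a single line."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()
```

The cleaned string would then go into the content field of the document sent to Solr, so no analysis-time or update-processor work is needed on the server for this step.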
Debugging Solr Search results & Issues with Distributed IDF
Hi,

I am trying to debug a query to find out why one document gets a higher score than the other. The two documents are similar products. Below are the debug results I get from the Solr admin console.

"Doc1":
15.20965 = sum of:
  4.7573533 = max of:
    4.7573533 = weight(All:2x in 962) [], result of:
      4.7573533 = score(doc=962,freq=2.0 = termFreq=2.0), product of:
        3.4598935 = idf(docFreq=1346, docCount=42836)
        1.375 = tfNorm, computed from:
          2.0 = termFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
  10.452296 = max of:
    5.9166136 = weight(All:powerpoint in 962) [], result of:
      5.9166136 = score(doc=962,freq=2.0 = termFreq=2.0), product of:
        4.302992 = idf(docFreq=579, docCount=42836)
        1.375 = tfNorm, computed from:
          2.0 = termFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
    10.452296 = weight(All:"socket outlet" in 962) [], result of:
      10.452296 = score(doc=962,freq=2.0 = phraseFreq=2.0), product of:
        7.60167 = idf(), sum of:
          3.5370626 = idf(docFreq=1246, docCount=42836)
          4.064607 = idf(docFreq=735, docCount=42836)
        1.375 = tfNorm, computed from:
          2.0 = phraseFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)

"Doc15":
13.258003 = sum of:
  5.7317085 = max of:
    5.7317085 = weight(All:doubl in 2122) [], result of:
      5.7317085 = score(doc=2122,freq=2.0 = termFreq=2.0), product of:
        4.168515 = idf(docFreq=663, docCount=42874)
        1.375 = tfNorm, computed from:
          2.0 = termFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
    4.7657394 = weight(All:2x in 2122) [], result of:
      4.7657394 = score(doc=2122,freq=2.0 = termFreq=2.0), product of:
        3.4659925 = idf(docFreq=1339, docCount=42874)
        1.375 = tfNorm, computed from:
          2.0 = termFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
    5.390302 = weight(All:2g in 2122) [], result of:
      5.390302 = score(doc=2122,freq=2.0 = termFreq=2.0), product of:
        3.9202197 = idf(docFreq=850, docCount=42874)
        1.375 = tfNorm, computed from:
          2.0 = termFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
  7.526294 = max of:
    5.8597584 = weight(All:powerpoint in 2122) [], result of:
      5.8597584 = score(doc=2122,freq=2.0 = termFreq=2.0), product of:
        4.2616425 = idf(docFreq=604, docCount=42874)
        1.375 = tfNorm, computed from:
          2.0 = termFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
    7.526294 = weight(All:"socket outlet" in 2122) [], result of:
      7.526294 = score(doc=2122,freq=1.0 = phraseFreq=1.0), product of:
        7.526294 = idf(), sum of:
          3.4955401 = idf(docFreq=1300, docCount=42874)
          4.030754 = idf(docFreq=761, docCount=42874)
        1.0 = tfNorm, computed from:
          1.0 = phraseFreq=1.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)

My questions:

1. IDF: I understand from the Solr documentation that IDF is calculated separately for each shard. I have added the following stats cache config to solrconfig.xml and reloaded the collection. But even after that there is no change in the calculated IDF.

2. What are parameter b and parameter k1?

3. Why are there more parameters included for my Doc15 than for Doc1?

Is there any documentation I can refer to in order to understand the Solr score calculations in depth?

We are using Solr 6.1 in Cloud mode with 3 ZooKeepers, 3 masters and 3 replicas.

Regards,
Lavanya
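The idf and tfNorm values in the explain output above can be reproduced by hand, which also shows where k1 and b enter. The following is a minimal sketch, assuming the BM25 formulas used by Lucene's BM25Similarity in the 6.x line; the function names and the default arguments are illustrative, not Solr API. Here k1 controls how quickly repeated occurrences of a term saturate, and b controls how strongly the score is normalized by field length. With norms omitted, as in this output, b is reported as 0.0 and field length has no effect.

```python
import math


def idf(doc_freq, doc_count):
    """BM25 inverse document frequency from the stats of one index (or shard)."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def tf_norm(freq, k1=1.2, b=0.75, field_length=1.0, avg_field_length=1.0):
    """BM25 term-frequency factor; with b=0.0 the field length drops out."""
    length_factor = 1 - b + b * field_length / avg_field_length
    return freq * (k1 + 1) / (freq + k1 * length_factor)


# Doc1, term "2x": idf(docFreq=1346, docCount=42836), termFreq=2.0, b=0.0
term_idf = idf(1346, 42836)
term_tf = tf_norm(2.0, k1=1.2, b=0.0)
score = term_idf * term_tf
print(term_idf, term_tf, score)
```

The printed values agree with the 3.4598935, 1.375 and 4.7573533 shown for Doc1 up to rounding. Note that the two documents report different docCount values (42836 and 42874), so their idf values were computed from different index statistics.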
Re: ConcurrentUpdateSolrClient - notify on success/failure?
Thanks a lot for the explanation :)

-
Smart, but doesn't study... Could do it if he studied...
Re: Improve indexing speed?
On 1/1/2019 8:59 AM, John Milton wrote:
> My document contains 65 fields. All the fields needs to be indexed. But for the 100 documents takes 10 seconds for indexing.
> I am using Solr 7.5 (2 cloud instance), with 50 shards.

The best way to achieve fast indexing in Solr is to index multiple items in parallel. That is, make your indexing system multi-threaded or multi-process.

As Erick also asked ... why do you have so many shards? The only good reason I can imagine for so many shards is a need to handle billions of documents.

Thanks,
Shawn
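As a rough illustration of Shawn's advice to index in parallel, the sketch below fans batches of documents out over a thread pool. It is a sketch under assumptions: `send_batch` is a hypothetical caller-supplied function that posts one batch to Solr and returns how many documents it sent, and the worker count of 4 is an arbitrary starting point to tune, not a recommendation from the thread.

```python
from concurrent.futures import ThreadPoolExecutor


def index_in_parallel(batches, send_batch, workers=4):
    """Send every batch through send_batch using a pool of worker threads.

    send_batch is a caller-supplied (hypothetical) function that posts one
    batch to Solr and returns the number of documents it sent.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in batch order and re-raises any exception
        # from a worker when its result is consumed.
        return sum(pool.map(send_batch, batches))
```

Threads are a reasonable fit here because each worker spends most of its time waiting on the HTTP round trip to Solr rather than computing.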
Re: RuleBasedAuthorizationPlugin configuration
Hi,

I created a Jira issue: https://issues.apache.org/jira/browse/SOLR-13097

Regards.

Dominique

On Mon, Dec 31, 2018 at 11:26, Dominique Bejean wrote:

> Hi,
>
> In debugging mode, I discovered that the collection name is extracted from the request path only in SolrCloud mode, in the init() method of HttpSolrCall.java:
>
>     if (cores.isZooKeeperAware()) {
>       // init collectionList (usually one name but not when there are aliases)
>       ...
>     }
>
> So in Solr standalone mode, only authentication is fully functional, not authorization!
>
> Regards.
>
> Dominique
>
> On Sun, Dec 30, 2018 at 13:40, Dominique Bejean wrote:
>
>> Hi,
>>
>> After reading the log file more carefully, here is my understanding.
>>
>> The request
>>
>> http://2:xx@localhost:8983/solr/biblio/select?indent=on&q=*:*&wt=json
>>
>> reports this in the log:
>>
>> 2018-12-30 12:24:52.102 INFO (qtp1731656333-20) [ x:biblio] o.a.s.s.HttpSolrCall USER_REQUIRED auth header Basic Mjox context : userPrincipal: [[principal: 2]] type: [READ], collections: [], Path: [/select] path : /select params :q=*:*&indent=on&wt=json
>>
>> collections is empty, so it looks like "/select" is not collection specific, and so it is not possible to define read access by collection.
>>
>> Can someone confirm?
>>
>> Regards
>>
>> Dominique
>>
>> On Fri, Dec 21, 2018 at 10:46, Dominique Bejean wrote:
>>
>>> Hi,
>>>
>>> I am trying to configure the security.json file in order to define the following users and permissions:
>>>
>>>    - user "admin" with all permissions on all collections
>>>    - user "read" with read permissions on all collections
>>>    - user "1" with only read permissions on biblio collection
>>>    - user "2" with only read permissions on personnes collection
>>>
>>> Here is my security.json file:
>>>
>>> {
>>>   "authentication":{
>>>     "blockUnknown":true,
>>>     "class":"solr.BasicAuthPlugin",
>>>     "credentials":{
>>>       "admin":"4uwfcjV7bCqOdLF/Qn2wiTyC7zIWN6lyA1Bgp1yqZj0= 7PCh68vhIlZXg1l45kSlvGKowMg1bm/L3eSfgT5dzjs=",
>>>       "read":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk= gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo=",
>>>       "1":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk= gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo=",
>>>       "2":"azUFSo9/plsGkQGhSQuk8YXoir22pALVpP8wFkd7wlk= gft4wNAeuvz7P8bv/Jv6TK94g516/qXe9cFWe/VlhDo="},
>>>     "":{"v":0}},
>>>   "authorization":{
>>>     "class":"solr.RuleBasedAuthorizationPlugin",
>>>     "permissions":[
>>>       {
>>>         "name":"all",
>>>         "role":"admin",
>>>         "index":1},
>>>       {
>>>         "name":"read-biblio",
>>>         "path":"/select",
>>>         "role":["admin","read","r1"],
>>>         "collection":"biblio",
>>>         "index":2},
>>>       {
>>>         "name":"read-personnes",
>>>         "path":"/select",
>>>         "role":["admin","read","r2"],
>>>         "collection":"personnes",
>>>         "index":3},
>>>       {
>>>         "name":"read",
>>>         "collection":"*",
>>>         "role":["admin","read"],
>>>         "index":4}],
>>>     "user-role":{
>>>       "admin":"admin",
>>>       "read":"read",
>>>       "1":"r1",
>>>       "2":"r2"}
>>>   }
>>> }
>>>
>>> I get 403 errors for user 1 on biblio and user 2 on personnes when using the "/select" requestHandler. However, according to the r1 and r2 roles and the order of the permissions, the access should be allowed.
>>>
>>> I have duplicated the TestRuleBasedAuthorizationPlugin.java class in order to test these exact same permissions and roles. checkRules reports that access is allowed!
>>>
>>> I don't understand where the problem is. Any ideas?
>>>
>>> Regards
>>>
>>> Dominique
Re: Is there a common tool for SOLR benckmark?
Cool! Looking forward to this patch being available.

Best,
TinsWzy

On Sat, Dec 22, 2018 at 4:30 AM, Mikhail Khludnev wrote:

> I've used the patch from https://issues.apache.org/jira/browse/SOLR-2646 a while ago.
>
> On Fri, Dec 21, 2018 at 6:34 PM Dominique Bejean <dominique.bej...@eolya.fr> wrote:
>
> > Hi,
> >
> > There is the powerful JMeter, obviously, and also SolrMeter (https://github.com/tflobbe/solrmeter).
> >
> > Regards
> >
> > Dominique
> >
> > On Thu, Dec 20, 2018 at 03:17, zhenyuan wei wrote:
> >
> > > Hi all,
> > >    Is there a common tool for SOLR benchmarking? YCSB is not very suitable for SOLR. Currently, is there a good benchmark tool for SOLR?
> > >
> > > Best, TinsWzy
>
> --
> Sincerely yours
> Mikhail Khludnev
Please unsubscribe me from solr-user emails
Hi Team,

I tried the automated way to unsubscribe from solr-user emails. Could you please help me unsubscribe from these emails?

--
Regards
Gaurav Srivastava