Re: Solr cloud production set up
Our index size is huge, and with master/slave the full indexing time is almost 24 hours. The number of documents will increase in the future. So could someone please recommend the number of nodes and a configuration (RAM, CPU cores) for SolrCloud?

On Sat, 18 Jan 2020, 8:05 AM Walter Underwood wrote:
> Why do you want to change to Solr Cloud? Master/slave is a great, stable
> cluster architecture.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>> On Jan 17, 2020, at 6:19 PM, Rajdeep Sahoo wrote:
>>
>> Please reply anyone
>>
>> On Sat, 18 Jan 2020, 12:13 AM Rajdeep Sahoo <rajdeepsahoo2...@gmail.com> wrote:
>>
>>> Hi all,
>>> We are using Solr Cloud 7.7.1.
>>> In a live production environment, how many SolrCloud servers do we need?
>>> Currently we are using a master/slave setup with 16 slave servers on
>>> Solr 4.6.
>>> In SolrCloud do we need to scale up, or will 16 servers suffice?
Re: Solr cloud production set up
How big? We index 35 million documents in about 6 hours.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Re: Solr cloud production set up
I think you should do your own measurements; this is very document- and processing-specific. You can run a test with a simple setup for, say, one million documents and interpolate from that. It could also be that your ETL pipeline is the bottleneck rather than Solr. At the same time, you can simulate user queries using JMeter or a similar tool.
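The query-simulation idea above can be sketched without JMeter. This is a minimal load-test sketch, not a benchmark harness: the Solr URL and core name are assumptions, and `send` is injectable so the concurrency logic can run without a live server.

```python
# Sketch: fire concurrent "user" queries and collect per-query latencies.
# SOLR_URL and the core name are hypothetical examples.
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

SOLR_URL = "http://localhost:8983/solr/mycore/select?q="

def timed(send, query):
    """Run one query through `send` and return its latency in seconds."""
    start = time.perf_counter()
    send(query)
    return time.perf_counter() - start

def run_load(send, queries, workers=8):
    """Fire `queries` concurrently and return the observed latencies."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda q: timed(send, q), queries))

# Against a real server you would pass e.g.:
#   run_load(lambda q: urlopen(SOLR_URL + q).read(), my_queries)
```

Interpolating from the collected latencies (median, 95th percentile) at increasing worker counts gives a rough capacity curve before committing to hardware.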
Indexing HTML Metatags Nutch - SOLR
Hello,

I have been trying this for several days without success (Nutch 1.16, Solr 7.3.1). I have followed this description: https://cwiki.apache.org/confluence/display/nutch/IndexMetatags

Below I put my nutch-site.xml. I created the core following this description: https://cwiki.apache.org/confluence/display/nutch/NutchTutorial/ By the way, without the metatags everything works fine. Before creating the core I deleted the managed-schema.xml and inserted my metatag fields into schema.xml in the configsets directory of the core.

First question: after creating the core I see a managed-schema.xml file and a schema.xml.bak file in the conf directory of the core. Sorry, I am new to this, but I believe I do not want managed-schema.xml? (See the description above.)

Anyway, when I run the crawl everything is OK until the index is created. Then I end up with this error:

org.apache.solr.common.SolrException: copyField dest :'metatag.SITdescription_str' is not an explicit field and doesn't match a dynamicField.
        at org.apache.solr.schema.IndexSchema.registerCopyField(IndexSchema.java:902)
        at org.apache.solr.schema.ManagedIndexSchema.addCopyFields(ManagedIndexSchema.java:784)

There is no copyField instruction for metatag.SITdescription in managed-schema.xml. I even created a field "metatag.SITdescription_str" in managed-schema.xml, which did not help.

Can you help me please?

Best regards,
Martin

nutch-site.xml:

<property>
  <name>http.agent.name</name>
  <value>SIT_NUTCH_SPIDER</value>
</property>
<property>
  <name>db.ignore.external.links</name>
  <value>true</value>
  <description>If true, outlinks leading from a page to external hosts will be ignored. This is an effective way to limit the crawl to include only initially injected hosts, without creating complex URLFilters.</description>
</property>
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-(regex|validator)|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>Regular expression naming plugin directory names to include. Any plugin not matching this expression is excluded. By default Nutch includes plugins to crawl HTML and various other document formats via HTTP/HTTPS and indexing the crawled content into Solr. More plugins are available to support more indexing backends, to fetch ftp:// and file:// URLs, for focused crawling, and many other use cases.</description>
</property>
<property>
  <name>http.robot.rules.whitelist</name>
  <value>sitlux02.sit.de</value>
  <description>Comma separated list of hostnames or IP addresses to ignore robot rules parsing for.</description>
</property>
<property>
  <name>metatags.names</name>
  <value>SITdescription,SITkeywords,SITcategory,SITintern</value>
  <description>Names of the metatags to extract, separated by ','. Use '*' to extract all metatags. Prefixes the names with 'metatag.' in the parse-metadata. For instance to index description and keywords, you need to activate the plugin index-metadata and set the value of the parameter 'index.parse.md' to 'metatag.description,metatag.keywords'.</description>
</property>
<property>
  <name>index.parse.md</name>
  <value>metatag.SITdescription,metatag.SITkeywords,metatag.SITcategory,metatag.SITintern</value>
  <description>Comma-separated list of keys to be taken from the parse metadata to generate fields. Can be used e.g. for 'description' or 'keywords' provided that these values are generated by a parser (see parse-metatags plugin)</description>
</property>
<property>
  <name>index.metadata</name>
  <value>metatag.SITdescription,metatag.SITkeywords,metatag.SITcategory,metatag.SITintern</value>
  <description>Comma-separated list of keys to be taken from the metadata to generate fields. Can be used e.g. for 'description' or 'keywords' provided that these values are generated by a parser (see parse-metatags plugin), and property 'metatags.names'.</description>
</property>

--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
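For reference, a minimal sketch of schema.xml entries that would make the copyField destination in the error above legal. The field type names and attributes here are assumptions based on the default Solr 7 configset, not a confirmed fix for this setup:

```xml
<!-- Hypothetical additions to schema.xml (types assumed from the default configset). -->
<!-- Source field produced by Nutch's index-metadata plugin: -->
<field name="metatag.SITdescription" type="text_general" indexed="true" stored="true"/>
<!-- A *_str dynamicField makes 'metatag.SITdescription_str' a valid copyField dest: -->
<dynamicField name="*_str" type="strings" docValues="true" indexed="false" stored="false"/>
<copyField source="metatag.SITdescription" dest="metatag.SITdescription_str" maxChars="256"/>
```

Note that the copyField to `*_str` is typically generated by the default configset's add-schema-fields update processor; if schema.xml replaces that configset, the dynamicField declaration has to come along too.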
Re: Solr cloud production set up
Got your point. If we think about the infrastructure, do we need more of it for SolrCloud in comparison to master/slave?
Re: Upgrading solr to 8.2
There have been modifications in field types. I would suggest comparing the two schemas, and then you may have to reindex. Other than that, the latest version has a lighter footprint, so that should not be the cause.

On Wed, Jan 15, 2020 at 9:05 PM kshitij tyagi wrote:
> Hi,
>
> Any suggestions from anyone?
>
> Regards,
> kshitij
>
> On Tue, Jan 14, 2020 at 4:11 PM Jan Høydahl wrote:
>
>> Please don’t cross-post, this discussion belongs in solr-user only.
>>
>> Jan
>>
>>> 14. jan. 2020 kl. 22:22 skrev kshitij tyagi <kshitij.shopcl...@gmail.com>:
>>>
>>> Also trie fields have been updated to point fields; will that by any
>>> chance degrade my response time by 50 percent?
>>>
>>> On Tue, Jan 14, 2020 at 1:37 PM kshitij tyagi wrote:
>>>
>>>> Hi Team,
>>>>
>>>> I am currently upgrading my system from Solr 6.6 to Solr 8.2:
>>>>
>>>> 1. I am observing increased search time in my queries, i.e. search
>>>> response time is increasing along with CPU utilisation, although memory
>>>> looks fine. On analysing heap dumps I figured out that queries spend
>>>> most of their time in DocStreamer.java, in the method
>>>> convertLuceneDocToSolrDoc. I saw a couple of Solr JIRAs about this,
>>>> e.g. SOLR-11891 and SOLR-1265.
>>>>
>>>> Can anyone please help me out by pointing out where I need to look and
>>>> what needs to be done to bring my response time back to what it was?
>>>>
>>>> Regards,
>>>> kshitij
Re: Solr cloud production set up
Indexing is not going to be any faster in SolrCloud; it would probably be a little bit slower. The best way to speed up indexing, whether running SolrCloud or not, is to make your indexing processes run in parallel, so that multiple batches of documents are being indexed at the same time.

SolrCloud is not a magic bullet that solves all problems. It's just a different way of managing indexes that has more automation, and makes initial setup of a distributed index a lot easier. It doesn't do the job any faster than running without SolrCloud. The legacy master/slave mode is likely to be a little bit faster.

You haven't provided any of the information required for us to guess about the system requirements. And it will be a guess ... we could be completely wrong.

https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Thanks,
Shawn
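The parallel-batching advice above can be sketched roughly as follows. The update URL and collection name are assumptions; `send` is injectable so the batching logic can be exercised without a running Solr.

```python
# Sketch: index documents as parallel batches against Solr's JSON update
# handler. "mycore" and the localhost URL are hypothetical.
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

UPDATE_URL = "http://localhost:8983/solr/mycore/update?commit=false"

def batches(docs, size):
    """Split `docs` into consecutive lists of at most `size` documents."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def send_batch(batch):
    """POST one JSON batch of documents to the update handler."""
    req = Request(UPDATE_URL, data=json.dumps(batch).encode("utf-8"),
                  headers={"Content-Type": "application/json"})
    return urlopen(req).status

def index_parallel(docs, batch_size=1000, workers=4, send=send_batch):
    """Send batches concurrently; returns one result per batch."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, batches(docs, batch_size)))
```

Batch size and worker count are tuning knobs: the goal, per the advice above, is to keep the indexing node's CPU busy rather than waiting on one sequential stream.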
Re: Solr cloud production set up
Hi Shawn,

Thanks for your reply. We do parallel indexing in production.

What about search performance in SolrCloud in comparison with master/slave? And what about block-join performance in SolrCloud? Do we need more infrastructure for SolrCloud, as we would be maintaining multiple shards and replicas? Is there any correlation with the master/slave setup?
Re: Solr cloud production set up
Agreed with the above. What’s your idea of “huge”? I have roughly 600 GB in one core, plus another 250 GB x 2 in two more, on the same standalone Solr instance, and it runs more than fine.
Re: Solr cloud production set up
We have 2.3 million documents and the index size is 2.5 GB, on 16 slave nodes with 10 CPU cores and 24 GB RAM each. Still, some of the queries take 50 seconds at the Solr end (we are using Solr 4.6). The other thing is that we have 200 facet fields (on average) in a query, and 30 searchable fields. Is there any way to identify why a query takes 50 seconds? There are multiple concurrent requests.
Re: Solr cloud production set up
As I said before, SolrCloud is not a magic bullet that solves performance issues. If the index characteristics are the same (number of docs, total size), performance in SolrCloud will be nearly identical to non-cloud.

Thanks,
Shawn
Re: Solr cloud production set up
Hi Shawn,

Thanks for this info. Could you please address my query below?

We have 2.3 million documents and the index size is 2.5 GB. With this data, do we need SolrCloud?

We have 16 slave nodes, each with 10 CPU cores and 24 GB RAM. Still, some of the queries take 50 seconds at the Solr end (we are using Solr 4.6). The other thing is that we have 200 facet fields (on average) in a query, and 30 searchable fields. Is there any way to identify why a query takes 50 seconds? There are multiple concurrent requests.

And how do we optimize the search response time, as it is almost 1 minute for some requests?
Re: Solr cloud production set up
For indexing, is the master node CPU around 90%? If not, you aren’t sending requests fast enough, or your disk is slow.

For querying, 200 facet fields is HUGE. That will take a lot of Java heap memory and will be slow. Each facet field requires large in-memory arrays and sorting.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Re: Solr cloud production set up
Although we have an average of 200 facet fields in the search request, not all of them return values in every request; at most 50-60 facet fields have values. And we are using function queries — do they have a performance impact?
Re: Solr cloud production set up
Searching 30 fields and computing 200 facets is never going to be super fast. Switching to cloud will not help, and might make it slower.

Your index is pretty small to a lot of us. There are people running indexes with billions of documents that take terabytes of disk space.

As Walter mentioned, computing 200 facets is going to require a fair amount of heap memory. One *possible* problem here is that the Solr heap size is too small, so a lot of GC is required. How much of the 24GB have you assigned to the heap? Is there any software other than Solr running on these nodes?

Thanks,
Shawn
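For reference, a sketch of where the heap is typically set. The paths and the 8g value are examples only, not sizing recommendations: Solr 4.x is usually started with explicit JVM flags, while Solr 5+ reads the include script.

```shell
# Solr 5+ : bin/solr.in.sh (or /etc/default/solr.in.sh on installed services)
SOLR_HEAP="8g"            # sets both -Xms and -Xmx

# Solr 4.x example start (heap passed as JVM flags):
# java -Xms8g -Xmx8g -jar start.jar
```

Whatever the value, the GC logs are the way to confirm whether the heap is actually the problem before changing it.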
Re: Solr cloud production set up
We have assigned 16 GB out of the 24 GB to the heap. No other process is running on that node.

There are 200 facet fields in the query, but we do not get values for every facet in every search; at most 50-60 facets return values.

We are using caching — is that not going to help?
Re: Solr cloud production set up
If you’re not getting values, don’t ask for the facet. Facets are expensive as hell. Maybe you should think more about your queries than your infrastructure; SolrCloud won’t help you at all, especially if you’re asking for things you don’t need.
Re: Solr cloud production set up
Thanks for the suggestion. Is there any way to find out which operation or which query params are increasing the response time?

On Sat, 18 Jan 2020, 11:59 PM Dave wrote:

> If you're not getting values, don't ask for the facet. Facets are
> expensive as hell. Maybe you should think more about your queries than
> your infrastructure; Solr Cloud won't help you at all, especially if
> you're asking for things you don't need.
>
>> On Jan 18, 2020, at 1:25 PM, Rajdeep Sahoo wrote:
>>
>> We have assigned 16 GB out of 24 GB for the heap. No other process is
>> running on that node.
>>
>> There are 200 facet fields in the query, but we will not be getting
>> values for every facet on every search. There can be a maximum of
>> 50-60 facets for which we will get values.
>>
>> We are using caching; is it not going to help?
>>
>>> On Sat, 18 Jan 2020, 11:36 PM Shawn Heisey wrote:
>>>
>>>> On 1/18/2020 10:09 AM, Rajdeep Sahoo wrote:
>>>>
>>>> We have 2.3 million documents and the index size is 2.5 GB, with
>>>> 10 CPU cores, 24 GB RAM, and 16 slave nodes. Still, some of the
>>>> queries are taking 50 seconds at the Solr end. We are using
>>>> Solr 4.6. The other thing is we have 200 (avg) facet fields in a
>>>> query, and 30 searchable fields. Is there any way to identify why
>>>> a query takes 50 seconds? There are multiple concurrent requests.
>>>
>>> Searching 30 fields and computing 200 facets is never going to be
>>> super fast. Switching to cloud will not help, and might make it
>>> slower.
>>>
>>> Your index is pretty small to a lot of us. There are people running
>>> indexes with billions of documents that take terabytes of disk space.
>>>
>>> As Walter mentioned, computing 200 facets is going to require a fair
>>> amount of heap memory. One *possible* problem here is that the Solr
>>> heap size is too small, so a lot of GC is required. How much of the
>>> 24 GB have you assigned to the heap? Is there any software other than
>>> Solr running on these nodes?
>>>
>>> Thanks,
>>> Shawn
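Dave's advice about not asking for facets you don't need can be sketched as a query change. This is a minimal illustration; the field names and values below are hypothetical, not from the original thread:

```shell
# Instead of sending ~200 facet.field parameters on every request,
# send only the facets the current page actually displays.
PARAMS="q=shoes&facet=true"
PARAMS="${PARAMS}&facet.field=brand&facet.field=color&facet.field=size"
# facet.limit caps how many buckets come back per facet field.
PARAMS="${PARAMS}&facet.limit=50"
echo "${PARAMS}"
```

Every facet.field you drop is one less computation over the whole result set on every request.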
Re: Solr cloud production set up
Add &debug=timing to the query and it'll show you the time each component takes.

> On Jan 18, 2020, at 1:50 PM, Rajdeep Sahoo wrote:
>
> Thanks for the suggestion. Is there any way to get the info on which
> operation or which query params are increasing the response time?
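The &debug=timing suggestion above looks like this in practice. The host, port, and collection name are placeholders; adjust them to your environment:

```shell
# Build a query URL with debug=timing appended; the response will then
# include a "timing" section with per-component times (query, facet,
# highlight, debug, ...), which shows where the seconds are going.
BASE="http://localhost:8983/solr/mycollection/select"
QUERY="q=*:*&facet=true&facet.field=brand"
URL="${BASE}?${QUERY}&debug=timing"
echo "${URL}"

# With a reachable Solr node you would fetch it like:
# curl -s "${URL}"
```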
Re: Solr cloud production set up
Apart from reducing the number of facets in the query, are there any other query params, GC params, heap-space settings, or anything else we need to tweak to improve search response time?

On Sun, 19 Jan 2020, 3:15 AM Erick Erickson wrote:

> Add &debug=timing to the query and it'll show you the time each
> component takes.
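On the heap and GC side of that question, a sketch under stated assumptions: in Solr versions that ship the bin/solr script (4.10 and later), heap and GC flags are usually set in solr.in.sh; on Solr 4.6 you would pass the equivalent -Xmx and -XX flags to the java command that starts your servlet container. The values below are illustrative starting points, not recommendations for this cluster:

```shell
# solr.in.sh excerpt (values are illustrative; tune against GC logs).
# A heap large enough for facet-heavy requests, but not so large that
# a single GC pause stalls queries:
SOLR_HEAP="16g"
# G1 with a pause-time target often behaves better than the default
# collector for large heaps on recent JVMs:
GC_TUNE="-XX:+UseG1GC -XX:MaxGCPauseMillis=250"
echo "${SOLR_HEAP} ${GC_TUNE}"
```

Whatever values you pick, verify them with GC logging enabled rather than guessing; long pauses in the logs are the signal that the heap or collector settings need adjusting.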