Re: Solr cloud production set up
Our index size is huge, and with master/slave the full indexing time is almost 24 hours. The number of documents will increase in the future. So could someone please recommend the number of nodes and a configuration (RAM, CPU cores) for SolrCloud?

On Sat, 18 Jan 2020, 8:05 AM Walter Underwood wrote:
> Why do you want to change to Solr Cloud? Master/slave is a great, stable
> cluster architecture.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>> On Jan 17, 2020, at 6:19 PM, Rajdeep Sahoo wrote:
>>
>> Please reply anyone
>>
>> On Sat, 18 Jan 2020, 12:13 AM Rajdeep Sahoo <rajdeepsahoo2...@gmail.com> wrote:
>>
>>> Hi all,
>>> We are using Solr Cloud 7.7.1.
>>> In a live production environment, how many SolrCloud servers do we need?
>>> Currently we are using a master/slave setup with 16 slave servers on
>>> Solr 4.6.
>>> In SolrCloud do we need to scale up, or will 16 servers suffice?
Re: Solr cloud production set up
How big? We index 35 million documents in about 6 hours.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Re: Solr cloud production set up
I think you should do your own measurements; this is very document- and processing-specific. You can run a test with a simple setup for, say, one million documents and interpolate from that. It could also be that your ETL pipeline is the bottleneck rather than Solr. At the same time, you can simulate user queries using JMeter or a similar tool.
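The query-simulation idea above can be sketched without JMeter. This is a minimal load-test sketch, not a benchmark harness: the Solr URL and core name are assumptions, and `send` is injectable so the concurrency logic can run without a live server.

```python
# Sketch: fire concurrent "user" queries and collect per-query latencies.
# SOLR_URL and the core name are hypothetical examples.
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

SOLR_URL = "http://localhost:8983/solr/mycore/select?q="

def timed(send, query):
    """Run one query through `send` and return its latency in seconds."""
    start = time.perf_counter()
    send(query)
    return time.perf_counter() - start

def run_load(send, queries, workers=8):
    """Fire `queries` concurrently and return the observed latencies."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda q: timed(send, q), queries))

# Against a real server you would pass e.g.:
#   run_load(lambda q: urlopen(SOLR_URL + q).read(), my_queries)
```

Interpolating from the collected latencies (median, 95th percentile) at increasing worker counts gives a rough capacity curve before committing to hardware.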
Indexing HTML Metatags Nutch - SOLR
Hello,

I have been trying this for several days without success (Nutch 1.16, Solr 7.3.1). I have followed this description: https://cwiki.apache.org/confluence/display/nutch/IndexMetatags

Below I put my nutch-site.xml. I created the core following this description: https://cwiki.apache.org/confluence/display/nutch/NutchTutorial/ By the way, without the metatags everything works fine. Before creating the core I deleted the managed-schema.xml and inserted my metatag fields into schema.xml in the configsets directory of the core.

First question: after creating the core I see a managed-schema.xml file and a schema.xml.bak file in the conf directory of the core. Sorry, I am new to this, but I believe I do not want managed-schema.xml? (See the description above.)

Anyway, when I run the crawl everything is OK until the index is created. Then I end up with this error:

org.apache.solr.common.SolrException: copyField dest :'metatag.SITdescription_str' is not an explicit field and doesn't match a dynamicField.
        at org.apache.solr.schema.IndexSchema.registerCopyField(IndexSchema.java:902)
        at org.apache.solr.schema.ManagedIndexSchema.addCopyFields(ManagedIndexSchema.java:784)

There is no copyField instruction for metatag.SITdescription in managed-schema.xml. I even created a field "metatag.SITdescription_str" in managed-schema.xml, which did not help.

Can you help me please?

Best regards,
Martin

nutch-site.xml:

<property>
  <name>http.agent.name</name>
  <value>SIT_NUTCH_SPIDER</value>
</property>
<property>
  <name>db.ignore.external.links</name>
  <value>true</value>
  <description>If true, outlinks leading from a page to external hosts will be ignored. This is an effective way to limit the crawl to include only initially injected hosts, without creating complex URLFilters.</description>
</property>
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-(regex|validator)|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>Regular expression naming plugin directory names to include. Any plugin not matching this expression is excluded. By default Nutch includes plugins to crawl HTML and various other document formats via HTTP/HTTPS and indexing the crawled content into Solr. More plugins are available to support more indexing backends, to fetch ftp:// and file:// URLs, for focused crawling, and many other use cases.</description>
</property>
<property>
  <name>http.robot.rules.whitelist</name>
  <value>sitlux02.sit.de</value>
  <description>Comma separated list of hostnames or IP addresses to ignore robot rules parsing for.</description>
</property>
<property>
  <name>metatags.names</name>
  <value>SITdescription,SITkeywords,SITcategory,SITintern</value>
  <description>Names of the metatags to extract, separated by ','. Use '*' to extract all metatags. Prefixes the names with 'metatag.' in the parse-metadata. For instance to index description and keywords, you need to activate the plugin index-metadata and set the value of the parameter 'index.parse.md' to 'metatag.description,metatag.keywords'.</description>
</property>
<property>
  <name>index.parse.md</name>
  <value>metatag.SITdescription,metatag.SITkeywords,metatag.SITcategory,metatag.SITintern</value>
  <description>Comma-separated list of keys to be taken from the parse metadata to generate fields. Can be used e.g. for 'description' or 'keywords' provided that these values are generated by a parser (see parse-metatags plugin)</description>
</property>
<property>
  <name>index.metadata</name>
  <value>metatag.SITdescription,metatag.SITkeywords,metatag.SITcategory,metatag.SITintern</value>
  <description>Comma-separated list of keys to be taken from the metadata to generate fields. Can be used e.g. for 'description' or 'keywords' provided that these values are generated by a parser (see parse-metatags plugin), and property 'metatags.names'.</description>
</property>

--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
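For reference, a minimal sketch of schema.xml entries that would make the copyField destination in the error above legal. The field type names and attributes here are assumptions based on the default Solr 7 configset, not a confirmed fix for this setup:

```xml
<!-- Hypothetical additions to schema.xml (types assumed from the default configset). -->
<!-- Source field produced by Nutch's index-metadata plugin: -->
<field name="metatag.SITdescription" type="text_general" indexed="true" stored="true"/>
<!-- A *_str dynamicField makes 'metatag.SITdescription_str' a valid copyField dest: -->
<dynamicField name="*_str" type="strings" docValues="true" indexed="false" stored="false"/>
<copyField source="metatag.SITdescription" dest="metatag.SITdescription_str" maxChars="256"/>
```

Note that the copyField to `*_str` is typically generated by the default configset's add-schema-fields update processor; if schema.xml replaces that configset, the dynamicField declaration has to come along too.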
Re: Solr cloud production set up
Got your point. If we think about the infrastructure, do we need more of it for SolrCloud in comparison to master/slave?
Re: Upgrading solr to 8.2
There have been modifications in field types. I would suggest comparing the two schemas, and then you may have to reindex. Other than that, the latest version has a lighter footprint, so that should not be the cause.

On Wed, Jan 15, 2020 at 9:05 PM kshitij tyagi wrote:
> Hi,
>
> Any suggestions from anyone?
>
> Regards,
> kshitij
>
> On Tue, Jan 14, 2020 at 4:11 PM Jan Høydahl wrote:
>
>> Please don’t cross-post, this discussion belongs in solr-user only.
>>
>> Jan
>>
>>> 14. jan. 2020 kl. 22:22 skrev kshitij tyagi <kshitij.shopcl...@gmail.com>:
>>>
>>> Also trie fields have been updated to point fields; will that by any
>>> chance degrade my response time by 50 percent?
>>>
>>> On Tue, Jan 14, 2020 at 1:37 PM kshitij tyagi wrote:
>>>
>>>> Hi Team,
>>>>
>>>> I am currently upgrading my system from Solr 6.6 to Solr 8.2:
>>>>
>>>> 1. I am observing increased search time in my queries, i.e. search
>>>> response time is increasing along with CPU utilisation, although memory
>>>> looks fine. On analysing heap dumps I figured out that queries spend
>>>> most of their time in DocStreamer.java, in the method
>>>> convertLuceneDocToSolrDoc. I saw a couple of Solr JIRAs about this,
>>>> e.g. SOLR-11891 and SOLR-1265.
>>>>
>>>> Can anyone please help me out by pointing out where I need to look and
>>>> what needs to be done to bring my response time back to what it was?
>>>>
>>>> Regards,
>>>> kshitij
Re: Solr cloud production set up
Indexing is not going to be any faster in SolrCloud; it would probably be a little bit slower. The best way to speed up indexing, whether running SolrCloud or not, is to make your indexing processes run in parallel, so that multiple batches of documents are being indexed at the same time.

SolrCloud is not a magic bullet that solves all problems. It's just a different way of managing indexes that has more automation, and makes initial setup of a distributed index a lot easier. It doesn't do the job any faster than running without SolrCloud. The legacy master/slave mode is likely to be a little bit faster.

You haven't provided any of the information required for us to guess about the system requirements. And it will be a guess ... we could be completely wrong.

https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Thanks,
Shawn
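The parallel-batching advice above can be sketched roughly as follows. The update URL and collection name are assumptions; `send` is injectable so the batching logic can be exercised without a running Solr.

```python
# Sketch: index documents as parallel batches against Solr's JSON update
# handler. "mycore" and the localhost URL are hypothetical.
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

UPDATE_URL = "http://localhost:8983/solr/mycore/update?commit=false"

def batches(docs, size):
    """Split `docs` into consecutive lists of at most `size` documents."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def send_batch(batch):
    """POST one JSON batch of documents to the update handler."""
    req = Request(UPDATE_URL, data=json.dumps(batch).encode("utf-8"),
                  headers={"Content-Type": "application/json"})
    return urlopen(req).status

def index_parallel(docs, batch_size=1000, workers=4, send=send_batch):
    """Send batches concurrently; returns one result per batch."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, batches(docs, batch_size)))
```

Batch size and worker count are tuning knobs: the goal, per the advice above, is to keep the indexing node's CPU busy rather than waiting on one sequential stream.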
Re: Solr cloud production set up
Hi Shawn,

Thanks for your reply. We do parallel indexing in production.

What about search performance in SolrCloud in comparison with master/slave? And what about block-join performance in SolrCloud? Do we need more infrastructure for SolrCloud, as we would be maintaining multiple shards and replicas? Is there any correlation with the master/slave setup?
Re: Solr cloud production set up
Agreed with the above. What’s your idea of “huge”? I have roughly 600 GB in one core, plus another 250 GB x 2 in two more, on the same standalone Solr instance, and it runs more than fine.
Re: Solr cloud production set up
We have 2.3 million documents and the index size is 2.5 GB, on 16 slave nodes with 10 CPU cores and 24 GB RAM each. Still, some of the queries take 50 seconds at the Solr end (we are using Solr 4.6). The other thing is that we have 200 facet fields (on average) in a query, and 30 searchable fields. Is there any way to identify why a query takes 50 seconds? There are multiple concurrent requests.
Re: Solr cloud production set up
As I said before, SolrCloud is not a magic bullet that solves performance issues. If the index characteristics are the same (number of docs, total size), performance in SolrCloud will be nearly identical to non-cloud.

Thanks,
Shawn
Re: Solr cloud production set up
Hi Shawn,

Thanks for this info. Could you please address my query below?

We have 2.3 million documents and the index size is 2.5 GB. With this data, do we need SolrCloud?

We have 16 slave nodes, each with 10 CPU cores and 24 GB RAM. Still, some of the queries take 50 seconds at the Solr end (we are using Solr 4.6). The other thing is that we have 200 facet fields (on average) in a query, and 30 searchable fields. Is there any way to identify why a query takes 50 seconds? There are multiple concurrent requests.

And how do we optimize the search response time, as it is almost 1 minute for some requests?
Re: Solr cloud production set up
For indexing, is the master node CPU around 90%? If not, you aren’t sending requests fast enough, or your disk is slow.

For querying, 200 facet fields is HUGE. That will take a lot of Java heap memory and will be slow. Each facet field requires large in-memory arrays and sorting.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Re: Solr cloud production set up
Although we have an average of 200 facet fields in the search request, not all of them return values in every request; at most 50-60 facet fields have values. And we are using function queries — do they have a performance impact?
Re: Solr cloud production set up
Searching 30 fields and computing 200 facets is never going to be super fast. Switching to cloud will not help, and might make it slower.

Your index is pretty small to a lot of us. There are people running indexes with billions of documents that take terabytes of disk space.

As Walter mentioned, computing 200 facets is going to require a fair amount of heap memory. One *possible* problem here is that the Solr heap size is too small, so a lot of GC is required. How much of the 24GB have you assigned to the heap? Is there any software other than Solr running on these nodes?

Thanks,
Shawn
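For reference, a sketch of where the heap is typically set. The paths and the 8g value are examples only, not sizing recommendations: Solr 4.x is usually started with explicit JVM flags, while Solr 5+ reads the include script.

```shell
# Solr 5+ : bin/solr.in.sh (or /etc/default/solr.in.sh on installed services)
SOLR_HEAP="8g"            # sets both -Xms and -Xmx

# Solr 4.x example start (heap passed as JVM flags):
# java -Xms8g -Xmx8g -jar start.jar
```

Whatever the value, the GC logs are the way to confirm whether the heap is actually the problem before changing it.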
Re: Solr cloud production set up
We have assigned 16 GB out of the 24 GB to the heap. No other process is running on that node.

There are 200 facet fields in the query, but we do not get values for every facet in every search; at most 50-60 facets return values.

We are using caching — is that not going to help?
Re: Solr cloud production set up
If you’re not getting values, don’t ask for the facet. Facets are expensive as hell. Maybe you should think more about your queries than your infrastructure; SolrCloud won’t help you at all, especially if you’re asking for things you don’t need.
Re: Solr cloud production set up
Thanks for the suggestion. Is there any way to find out which operation or which query params are increasing the response time?

On Sat, 18 Jan 2020, 11:59 PM Dave wrote:

> If you're not getting values, don't ask for the facet. Facets are
> expensive as hell. Maybe you should think more about your queries than
> your infrastructure; Solr Cloud won't help you at all, especially if
> you're asking for things you don't need.
>
>> On Jan 18, 2020, at 1:25 PM, Rajdeep Sahoo wrote:
>>
>> We have assigned 16 GB out of 24 GB for the heap. No other process is
>> running on that node.
>>
>> There are 200 facet fields in the query, but we will not be getting
>> values for every facet on every search. There can be a maximum of
>> 50-60 facets for which we will get values.
>>
>> We are using caching; is it not going to help?
>>
>>> On Sat, 18 Jan 2020, 11:36 PM Shawn Heisey wrote:
>>>
>>>> On 1/18/2020 10:09 AM, Rajdeep Sahoo wrote:
>>>>
>>>> We have 2.3 million documents and the index size is 2.5 GB, with
>>>> 10 CPU cores, 24 GB RAM, and 16 slave nodes. Still, some of the
>>>> queries are taking 50 seconds at the Solr end. We are using
>>>> Solr 4.6. The other thing is we have 200 (avg) facet fields in a
>>>> query, and 30 searchable fields. Is there any way to identify why
>>>> a query takes 50 seconds? There are multiple concurrent requests.
>>>
>>> Searching 30 fields and computing 200 facets is never going to be
>>> super fast. Switching to cloud will not help, and might make it
>>> slower.
>>>
>>> Your index is pretty small to a lot of us. There are people running
>>> indexes with billions of documents that take terabytes of disk space.
>>>
>>> As Walter mentioned, computing 200 facets is going to require a fair
>>> amount of heap memory. One *possible* problem here is that the Solr
>>> heap size is too small, so a lot of GC is required. How much of the
>>> 24 GB have you assigned to the heap? Is there any software other than
>>> Solr running on these nodes?
>>>
>>> Thanks,
>>> Shawn
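Dave's advice about not asking for facets you don't need can be sketched as a query change. This is a minimal illustration; the field names and values below are hypothetical, not from the original thread:

```shell
# Instead of sending ~200 facet.field parameters on every request,
# send only the facets the current page actually displays.
PARAMS="q=shoes&facet=true"
PARAMS="${PARAMS}&facet.field=brand&facet.field=color&facet.field=size"
# facet.limit caps how many buckets come back per facet field.
PARAMS="${PARAMS}&facet.limit=50"
echo "${PARAMS}"
```

Every facet.field you drop is one less computation over the whole result set on every request.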
Re: Solr cloud production set up
Add &debug=timing to the query and it'll show you the time each component takes.

> On Jan 18, 2020, at 1:50 PM, Rajdeep Sahoo wrote:
>
> Thanks for the suggestion. Is there any way to get the info on which
> operation or which query params are increasing the response time?
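The &debug=timing suggestion above looks like this in practice. The host, port, and collection name are placeholders; adjust them to your environment:

```shell
# Build a query URL with debug=timing appended; the response will then
# include a "timing" section with per-component times (query, facet,
# highlight, debug, ...), which shows where the seconds are going.
BASE="http://localhost:8983/solr/mycollection/select"
QUERY="q=*:*&facet=true&facet.field=brand"
URL="${BASE}?${QUERY}&debug=timing"
echo "${URL}"

# With a reachable Solr node you would fetch it like:
# curl -s "${URL}"
```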
Re: Solr cloud production set up
Apart from reducing the number of facets in the query, are there any other query params, GC params, heap-space settings, or anything else we need to tweak to improve search response time?

On Sun, 19 Jan 2020, 3:15 AM Erick Erickson wrote:

> Add &debug=timing to the query and it'll show you the time each
> component takes.
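On the heap and GC side of that question, a sketch under stated assumptions: in Solr versions that ship the bin/solr script (4.10 and later), heap and GC flags are usually set in solr.in.sh; on Solr 4.6 you would pass the equivalent -Xmx and -XX flags to the java command that starts your servlet container. The values below are illustrative starting points, not recommendations for this cluster:

```shell
# solr.in.sh excerpt (values are illustrative; tune against GC logs).
# A heap large enough for facet-heavy requests, but not so large that
# a single GC pause stalls queries:
SOLR_HEAP="16g"
# G1 with a pause-time target often behaves better than the default
# collector for large heaps on recent JVMs:
GC_TUNE="-XX:+UseG1GC -XX:MaxGCPauseMillis=250"
echo "${SOLR_HEAP} ${GC_TUNE}"
```

Whatever values you pick, verify them with GC logging enabled rather than guessing; long pauses in the logs are the signal that the heap or collector settings need adjusting.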