Re: Commits (with openSearcher = true) are too slow in solr 8

2020-12-02 Thread raj.yadav
Hi everyone,

As per the suggestions in the previous post (by Erick and Shawn) we made the
following changes.

OLD
(XML settings stripped by the mail archive)

NEW
(XML settings stripped by the mail archive)

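Since the archive stripped the XML above, here is the general shape of the
solrconfig.xml commit settings such threads revolve around, as a sketch only;
the values below are illustrative assumptions, not the poster's actual
before/after settings:

<autoCommit>
  <maxTime>60000</maxTime>           <!-- hard commit every 60s to flush the tlog -->
  <openSearcher>false</openSearcher> <!-- don't open a new searcher on hard commit -->
</autoCommit>
<autoSoftCommit>
  <maxTime>600000</maxTime>          <!-- open a new searcher at most every 10 minutes -->
</autoSoftCommit>
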
*Reduced JVM heap size from 30GB to 26GB*

GC setting:
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+PerfDisableSharedMem \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=150 \
-XX:InitiatingHeapOccupancyPercent=60 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"

Solr collection details (running in SolrCloud mode):
It has 6 shards, and each shard has only one replica (which is also the
leader); the replica type is NRT.
Each shard index size: 11 GB
avg size/doc: 1.0 KB

We are running indexing on this collection:
*Indexing rate: 2.4 million per hour*

*The query rate is zero. Still, a commit with openSearcher=true is taking 25
to 28 minutes.*
Is this because of the heavy indexing? Also, commit time increases as the
number of documents in the collection grows.
This is not our production system. In the prod system, our indexing rate is
generally 5k/hour.

Is it expected to have such high commit time with the above indexing rate?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Need help to configure automated deletion of shard in solr

2020-12-02 Thread Erick Erickson
You can certainly use the TTL logic. Note: not the TimeRoutedAlias, but
the DocExpirationUpdateProcessorFactory. DocExpirationUpdateProcessorFactory
operates on each document individually, so you can mix and match
if you want.

As for knowing when a shard is empty, I suggested a method for that
in one of the earlier e-mails.

If you have a collection per customer, and assuming that a customer
has the same retention policy for all docs, then TimeRoutedAlias would
work.

Best,
Erick
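
For reference, a minimal sketch of the processor configuration Erick is
pointing at, based on the Solr Ref Guide (the chain name and TTL period here
are illustrative, and the field names are the factory's defaults):

<updateRequestProcessorChain name="add-ttl" default="true">
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <!-- how often Solr scans for and deletes expired documents -->
    <int name="autoDeletePeriodSeconds">86400</int>
    <!-- per-document TTL field, holding a value like "+30DAYS" -->
    <str name="ttlFieldName">_ttl_</str>
    <!-- computed absolute expiration timestamp -->
    <str name="expirationFieldName">_expire_at_</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>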

> On Dec 2, 2020, at 12:19 AM, Pushkar Mishra  wrote:
> 
> Hi Erick,
> It is implicit.
> I have explored the TTL approach, but due to some complications we can't use it.
> Let me explain the actual use case.
> 
> We have limited space; we can't keep storing documents for an infinite
> time. So, based on the customer's retention policy, I need to delete the
> documents. And in this process, if any shard gets empty, we need to delete
> the shard as well.
> 
> So let's say: is there a way to know when Solr completes the purging of
> deleted documents? Based on that flag we could configure shard deletion.
> 
> Thanks
> Pushkar
> 
> On Tue, Dec 1, 2020 at 9:02 PM Erick Erickson 
> wrote:
> 
>> This is still confusing. You haven’t told us what router you are using,
>> compositeId or implicit?
>> 
>> If you’re using compositeId (the default), you will never have empty shards
>> because docs get assigned to shards via a hashing algorithm that
>> distributes
>> them very evenly across all available shards. You cannot delete any
>> shard when using compositeId as your routing method.
>> 
>> If you don’t know which router you’re using, then you’re using compositeId.
>> 
>> NOTE: for the rest, “documents” means non-deleted documents. Solr will
>> take care of purging the deleted documents automatically.
>> 
>> I think you’re making this much more difficult than you need to. Assuming
>> that the total number of documents remains relatively constant, you can
>> just
>> let Solr take care of it all and not bother with trying to individually
>> manage
>> shards by using the default compositeID routing.
>> 
>> If the number of docs increases you might need to use splitshard. But it
>> sounds like the total number of “live” documents isn’t going to increase.
>> 
>> For TTL, if you have a _fixed_ TTL, i.e. the docs should always expire
>> after, say, 30 days (which it doesn’t sound like you do), you can use
>> the “Time Routed Alias” option, see:
>> https://lucene.apache.org/solr/guide/7_5/time-routed-aliases.html
>> 
>> Assuming your TTL isn’t a fixed-interval, you can configure
>> DocExpirationUpdateProcessorFactory to deal with TTL automatically.
>> 
>> And if you still think you need to handle this, you need to explain exactly
>> what problem you’re trying to solve because so far it appears that
>> you’re simply taking on way more work than you need to.
>> 
>> Best,
>> Erick
>> 
>>> On Dec 1, 2020, at 9:46 AM, Pushkar Mishra 
>> wrote:
>>> 
>>> Hi Team,
>>> As I explained the use case , can someone help me out to find out the
>>> configuration way to delete the shard here ?
>>> A quick response  will be greatly appreciated.
>>> 
>>> Regards
>>> Pushkar
>>> 
>>> 
>>> On Mon, Nov 30, 2020 at 11:32 PM Pushkar Mishra 
>>> wrote:
>>> 
 
 
 On Mon, Nov 30, 2020, 9:15 PM Pushkar Mishra 
 wrote:
 
> Hi Erick,
> First of all, thanks for your response. I will check the possibility.
> Let me explain my problem in detail:
> 
> 1. We have other use cases where we make use of a listener on
> postCommit to delete/shift/split shards. So we have the capability to
> delete shards.
> 2. The current use case is where we have to delete documents from the
> shard, and during the deletion process (it will be a scheduled process,
> maybe hourly or daily, which deletes the documents), if a shard gets
> empty (or, let's say, only nominal documents are left), then delete the
> shard. And I am exploring doing this using configuration.
> 
 3. Also, it will not be in a live shard for sure, as only those documents
 whose TTL is over get deleted. The TTL could be a month or a year.
 
 Please assist if you have any config-based ideas on this.
 
> Regards
> Pushkar
> 
> On Mon, Nov 30, 2020, 8:48 PM Erick Erickson 
> wrote:
> 
>> Are you using the implicit router? Otherwise you cannot delete a
>> shard.
>> And you won’t have any shards that have zero documents anyway.
>> 
>> It’d be a little convoluted, but you could use the Collections
>> COLSTATUS
>> API to
>> find the names of all your replicas. Then query _one_ replica of each
>> shard with something like
>> solr/collection1_shard1_replica_n1/select?q=*:*&distrib=false
>> 
>> that’ll return the number of live docs (i.e. non-deleted docs) and if
>> it’s zero
>> you can delete the shard.
>> 
>> But the implicit router requires you take complete control of where

Re: Solr8.7 - How to optimize my index?

2020-12-02 Thread Erick Erickson
expungeDeletes is unnecessary, optimize is a superset of expungeDeletes.
The key difference is commit=true. I suspect if you’d waited until your
indexing process added another doc and committed, you’d have seen
the index size drop.

Just to check, you send the command to my_core but talk about collections.
Specifying the collection is sufficient, but I’ll assume that’s a typo and
you’re really saying my_collection.

I agree with Walter like I always do, you shouldn’t be running 
optimize without some proof that it’s helping. About the only time
I think it’s reasonable is when you have a static index, unless you can
demonstrate improved performance. The optimize button was
removed precisely because it was so tempting. In much earlier
versions of Lucene, it made a demonstrable difference so was put
front and center. In more recent versions of Solr optimize doesn’t
help nearly as much so it was removed.

You say you have 38M deleted documents. How many documents total? If this is
50% of your index, that’s one thing. If it’s 5%, it’s certainly not worth
the effort. You’re rewriting 466G of index, if you’re not seeing demonstrable
performance improvements, that’s a lot of wasted effort…

See: https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/
and the linked article for what happens in pre 7.5 solr versions.

Best,
Erick
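
A quick way to answer the deleted-vs-total question is the Luke request
handler; a sketch, with the host and collection name as placeholders:

# numDocs excludes deleted documents, maxDoc includes them,
# so deleted docs = maxDoc - numDocs
curl "http://localhost:8983/solr/my_collection/admin/luke?numTerms=0&wt=json"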

> On Dec 1, 2020, at 2:31 PM, Info MatheoSoftware  
> wrote:
> 
> Hi All,
> 
> 
> 
> I found the solution, I must do :
> 
> curl ‘http://xxx:8983/solr/my_core/update?commit=true&expungeDeletes=true’
> 
> 
> 
> It works fine
> 
> 
> 
> Thanks,
> 
> Bruno
> 
> 
> 
> 
> 
> 
> 
> From: Matheo Software [mailto:i...@matheo-software.com]
> Sent: Tuesday, December 1, 2020 13:28
> To: solr-user@lucene.apache.org
> Subject: Solr8.7 - How to optimize my index?
> 
> 
> 
> Hi All,
> 
> 
> 
> With Solr5.4, I used the UI button but in Solr8.7 UI this button is missing.
> 
> 
> 
> So I decide to use the command line:
> 
> curl http://xxx:8983/solr/my_core/update?optimize=true
> 
> 
> 
> My collection my_core exists of course.
> 
> 
> 
> The answer of the command line is:
> 
> {
> 
>  "responseHeader":{
> 
>"status":0,
> 
>"QTime":18}
> 
> }
> 
> 
> 
> But nothing changed.
> 
> I still have 38M deleted docs in my collection, and the directory size
> does not change like it did with Solr 5.4.
> 
> The size of the collection stays at 466.33 GB.
> 
> 
> 
> Could you tell me how I can purge deleted docs?
> 
> 
> 
> Cordialement, Best Regards
> 
> Bruno Mannina
> 
>  www.matheo-software.com
> 
>  www.patent-pulse.com
> 
> Tél. +33 0 970 738 743
> 
> Mob. +33 0 634 421 817
> 



chaining charFilter

2020-12-02 Thread Arturas Mazeika
Hi Solr-Team,

The manual of charfilters says that one can chain them: (from
https://lucene.apache.org/solr/guide/6_6/charfilterfactories.html#CharFilterFactories-solr.MappingCharFilterFactory
):

CharFilters can be chained like Token Filters and placed in front of a
Tokenizer. CharFilters can add, change, or remove characters while
preserving the original character offsets to support features like
highlighting.

I am trying to filter out some of the chars from some fields, so I can do
an efficient and effective faceting later. I tried to chain charfilters
for that purpose:

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(.*[/\\])([^/\\]+)$"   replacement="$2"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([0-9\-]+)T([0-9\-]+)" replacement="$1 $2"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z]+"            replacement=" "/>
    <tokenizer class="..."/>
  </analyzer>
</fieldType>

<field name="..." type="..." stored="true"/>

but in schema definition I see only the last charfilter
[image: image.png]

Any clues why?

Cheers,
Arturas


Re: chaining charFilter

2020-12-02 Thread Alexandre Rafalovitch
Did you reload the core for it to notice the new schema? Or try creating a
new core from the same schema?

If it is a SolrCloud, you also have to upload the schema to the Zookeeper.

Regards,
   Alex.
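
A reload can be issued from the admin UI or via the API; for example (the
core and collection names are placeholders):

# standalone Solr: reload one core
curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=my_core"

# SolrCloud: reload the whole collection after uploading the new configset
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=my_collection"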

On Wed, 2 Dec 2020 at 09:19, Arturas Mazeika  wrote:

> Hi Solr-Team,
>
> The manual of charfilters says that one can chain them: (from
> https://lucene.apache.org/solr/guide/6_6/charfilterfactories.html#CharFilterFactories-solr.MappingCharFilterFactory
> ):
>
> CharFilters can be chained like Token Filters and placed in front of a
> Tokenizer. CharFilters can add, change, or remove characters while
> preserving the original character offsets to support features like
> highlighting.
>
> I am trying to filter out some of the chars from some fields, so I can do
> an efficient and effective faceting later. I tried to chain charfilters
> for that purpose:
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(.*[/\\])([^/\\]+)$"   replacement="$2"/>
>     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([0-9\-]+)T([0-9\-]+)" replacement="$1 $2"/>
>     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z]+"            replacement=" "/>
>     <tokenizer class="..."/>
>   </analyzer>
> </fieldType>
>
> <field name="..." type="..." stored="true"/>
>
> but in schema definition I see only the last charfilter
> [image: image.png]
>
> Any clues why?
>
> Cheers,
> Arturas
>


Re: chaining charFilter

2020-12-02 Thread Erick Erickson
Images are stripped by the mail server, so we can’t see the result.

I looked at master and the admin UI has problems, I just
raised a JIRA, see:
https://issues.apache.org/jira/browse/SOLR-15024

The _functionality_ is fine. If you go to the analysis page
and enter values, you’ll see the transformations work. Although
that screen doesn’t show the CharFilter transformations correctly,
the tokens at the end are chained.

Best,
Erick

> On Dec 2, 2020, at 9:18 AM, Arturas Mazeika  wrote:
> 
> Hi Solr-Team,
> 
> The manual of charfilters says that one can chain them: (from 
> https://lucene.apache.org/solr/guide/6_6/charfilterfactories.html#CharFilterFactories-solr.MappingCharFilterFactory):
> 
> CharFilters can be chained like Token Filters and placed in front of a 
> Tokenizer. CharFilters can add, change, or remove characters while preserving 
> the original character offsets to support features like highlighting.
> 
> I am trying to filter out some of the chars from some fields, so I can do an 
> efficient and effective faceting later. I tried to chain charfilters for 
> that purpose:
> 
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(.*[/\\])([^/\\]+)$"   replacement="$2"/>
>     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([0-9\-]+)T([0-9\-]+)" replacement="$1 $2"/>
>     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z]+"            replacement=" "/>
>     <tokenizer class="..."/>
>   </analyzer>
> </fieldType>
>
> <field name="..." type="..." stored="true"/>
> 
> but in schema definition I see only the last charfilter 
> 
> 
> Any clues why? 
> 
> Cheers,
> Arturas



Re: chaining charFilter

2020-12-02 Thread Arturas Mazeika
Hi Alex,
Hi Erick,

Thanks a lot for the prompt reply. Indeed, the functionality is completely
fine, and checking the values with the analyzer gives the expected results. I
also checked the JIRA issue; it is nicely described.

Cheers,
Arturas

On Wed, Dec 2, 2020 at 7:23 PM Erick Erickson 
wrote:

> Images are stripped by the mail server, so we can’t see the result.
>
> I looked at master and the admin UI has problems, I just
> raised a JIRA, see:
> https://issues.apache.org/jira/browse/SOLR-15024
>
> The _functionality_ is fine. If you go to the analysis page
> and enter values, you’ll see the transformations work. Although
> that screen doesn’t show the CharFilter transformations correctly,
> the tokens at the end are chained.
>
> Best,
> Erick
>
> > On Dec 2, 2020, at 9:18 AM, Arturas Mazeika  wrote:
> >
> > Hi Solr-Team,
> >
> > The manual of charfilters says that one can chain them: (from
> https://lucene.apache.org/solr/guide/6_6/charfilterfactories.html#CharFilterFactories-solr.MappingCharFilterFactory
> ):
> >
> > CharFilters can be chained like Token Filters and placed in front of a
> Tokenizer. CharFilters can add, change, or remove characters while
> preserving the original character offsets to support features like
> highlighting.
> >
> > I am trying to filter out some of the chars from some fields, so I can
> do an efficient and effective faceting later. I tried to chain charfilters
> for that purpose:
> >
> > <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
> >   <analyzer>
> >     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(.*[/\\])([^/\\]+)$"   replacement="$2"/>
> >     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([0-9\-]+)T([0-9\-]+)" replacement="$1 $2"/>
> >     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z]+"            replacement=" "/>
> >     <tokenizer class="..."/>
> >   </analyzer>
> > </fieldType>
> >
> > <field name="..." type="..." stored="true"/>
> >
> > but in schema definition I see only the last charfilter
> >
> >
> > Any clues why?
> >
> > Cheers,
> > Arturas
>
>


RE: Solr8.7 - How to optimize my index?

2020-12-02 Thread Matheo Software
Hi Erick,
Hi Walter,

Thanks for all this information,

I will seriously study the Solr article you gave me.
I thought it was important to always delete and optimize the collection.

More information concerning my collection:
Index size is about 390 GB for 130M docs (3-5 KB/doc), around 25 fields
(indexed, stored).
Every Tuesday I update around 1M docs, and every Thursday I add new docs
(around 50,000).

Many thanks !

Regards,
Bruno

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, December 2, 2020 14:07
To: solr-user@lucene.apache.org
Subject: Re: Solr8.7 - How to optimize my index?

expungeDeletes is unnecessary, optimize is a superset of expungeDeletes.
The key difference is commit=true. I suspect if you’d waited until your 
indexing process added another doc and committed, you’d have seen the index 
size drop.

Just to check, you send the command to my_core but talk about collections.
Specifying the collection is sufficient, but I’ll assume that’s a typo and 
you’re really saying my_collection.

I agree with Walter like I always do, you shouldn’t be running optimize without 
some proof that it’s helping. About the only time I think it’s reasonable is 
when you have a static index, unless you can demonstrate improved performance. 
The optimize button was removed precisely because it was so tempting. In much 
earlier versions of Lucene, it made a demonstrable difference so was put front 
and center. In more recent versions of Solr optimize doesn’t help nearly as 
much so it was removed.

You say you have 38M deleted documents. How many documents total? If this is 
50% of your index, that’s one thing. If it’s 5%, it’s certainly not worth the 
effort. You’re rewriting 466G of index, if you’re not seeing demonstrable 
performance improvements, that’s a lot of wasted effort…

See: https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/
and the linked article for what happens in pre 7.5 solr versions.

Best,
Erick

> On Dec 1, 2020, at 2:31 PM, Info MatheoSoftware  
> wrote:
>
> Hi All,
>
>
>
> I found the solution, I must do :
>
> curl ‘http://xxx:8983/solr/my_core/update?commit=true&expungeDeletes=true’
>
>
>
> It works fine
>
>
>
> Thanks,
>
> Bruno
>
>
>
>
>
>
>
> From: Matheo Software [mailto:i...@matheo-software.com] Sent: Tuesday,
> December 1, 2020 13:28 To: solr-user@lucene.apache.org Subject: Solr8.7
> - How to optimize my index?
>
>
>
> Hi All,
>
>
>
> With Solr5.4, I used the UI button but in Solr8.7 UI this button is missing.
>
>
>
> So I decide to use the command line:
>
> curl http://xxx:8983/solr/my_core/update?optimize=true
>
>
>
> My collection my_core exists of course.
>
>
>
> The answer of the command line is:
>
> {
>
>  "responseHeader":{
>
>"status":0,
>
>"QTime":18}
>
> }
>
>
>
> But nothing changed.
>
> I still have 38M deleted docs in my collection, and the directory size
> does not change like it did with Solr 5.4.
>
> The size of the collection stays at 466.33 GB.
>
>
>
> Could you tell me how I can purge deleted docs?
>
>
>
> Cordialement, Best Regards
>
> Bruno Mannina
>
>  www.matheo-software.com
>
>  www.patent-pulse.com
>
> Tél. +33 0 970 738 743
>
> Mob. +33 0 634 421 817
>





Re: Solr8.7 - How to optimize my index?

2020-12-02 Thread Dave
I’m going to go against the advice SLIGHTLY; it really depends on how you have 
things set up as far as your Solr server hosting goes. If you’re searching 
off the same Solr server you’re indexing to, yeah, don’t ever optimize; it will 
take care of itself. People much smarter than us, like Erick/Walter/Yonik, have 
spent time on this, and if they say don’t do it, don’t do it. 

 In my particular use case I do see a measured improvement from optimizing 
every three or four months. In my case a large portion, over 75% of the 
documents, each measuring around 500 KB to 3 MB, get reindexed every month, as 
the fields in the documents change every month, while documents are added 
daily as well. So when I can go from a 650 GB index to a 450 GB one once in a 
while, it makes a difference if I only have 500 GB of memory to work with on 
the searchers and can fit all the segments straight into memory. Also, I use 
the old master/slave setup, so my indexing server has no impact on the 
searching servers while it’s optimizing. Once the optimized index gets warmed 
back up in the searcher I do notice improvement in my qtimes (I like to 
think); however, I’ve been using the same integration process of occasional 
hard optimizations since 1.4, and it might just be that I like to watch the 
index inflate to three times its size and then shrivel up. Old habits die hard. 

> On Dec 2, 2020, at 10:28 PM, Matheo Software  wrote:
> 
> Hi Erick,
> Hi Walter,
> 
> Thanks for all this information,
> 
> I will seriously study the Solr article you gave me.
> I thought it was important to always delete and optimize the collection.
> 
> More information concerning my collection:
> Index size is about 390 GB for 130M docs (3-5 KB/doc), around 25 fields 
> (indexed, stored).
> Every Tuesday I update around 1M docs, and every Thursday I add new docs 
> (around 50,000). 
> 
> Many thanks !
> 
> Regards,
> Bruno
> 
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, December 2, 2020 14:07
> To: solr-user@lucene.apache.org
> Subject: Re: Solr8.7 - How to optimize my index?
> 
> expungeDeletes is unnecessary, optimize is a superset of expungeDeletes.
> The key difference is commit=true. I suspect if you’d waited until your 
> indexing process added another doc and committed, you’d have seen the index 
> size drop.
> 
> Just to check, you send the command to my_core but talk about collections.
> Specifying the collection is sufficient, but I’ll assume that’s a typo and 
> you’re really saying my_collection.
> 
> I agree with Walter like I always do, you shouldn’t be running optimize 
> without some proof that it’s helping. About the only time I think it’s 
> reasonable is when you have a static index, unless you can demonstrate 
> improved performance. The optimize button was removed precisely because it 
> was so tempting. In much earlier versions of Lucene, it made a demonstrable 
> difference so was put front and center. In more recent versions of Solr 
> optimize doesn’t help nearly as much so it was removed.
> 
> You say you have 38M deleted documents. How many documents total? If this is 
> 50% of your index, that’s one thing. If it’s 5%, it’s certainly not worth the 
> effort. You’re rewriting 466G of index, if you’re not seeing demonstrable 
> performance improvements, that’s a lot of wasted effort…
> 
> See: https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/
> and the linked article for what happens in pre 7.5 solr versions.
> 
> Best,
> Erick
> 
>> On Dec 1, 2020, at 2:31 PM, Info MatheoSoftware  
>> wrote:
>> 
>> Hi All,
>> 
>> 
>> 
>> I found the solution, I must do :
>> 
>> curl ‘http://xxx:8983/solr/my_core/update?commit=true&expungeDeletes=true’
>> 
>> 
>> 
>> It works fine
>> 
>> 
>> 
>> Thanks,
>> 
>> Bruno
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> From: Matheo Software [mailto:i...@matheo-software.com] Sent: Tuesday,
>> December 1, 2020 13:28 To: solr-user@lucene.apache.org Subject: Solr8.7
>> - How to optimize my index?
>> 
>> 
>> 
>> Hi All,
>> 
>> 
>> 
>> With Solr5.4, I used the UI button but in Solr8.7 UI this button is missing.
>> 
>> 
>> 
>> So I decide to use the command line:
>> 
>> curl http://xxx:8983/solr/my_core/update?optimize=true
>> 
>> 
>> 
>> My collection my_core exists of course.
>> 
>> 
>> 
>> The answer of the command line is:
>> 
>> {
>> 
>> "responseHeader":{
>> 
>>   "status":0,
>> 
>>   "QTime":18}
>> 
>> }
>> 
>> 
>> 
>> But nothing changed.
>> 
>> I still have 38M deleted docs in my collection, and the directory size
>> does not change like it did with Solr 5.4.
>> 
>> The size of the collection stays at 466.33 GB.
>> 
>> 
>> 
>> Could you tell me how I can purge deleted docs?
>> 
>> 
>> 
>> Cordialement, Best Regards
>> 
>> Bruno Mannina
>> 
>>  www.matheo-software.com
>> 
>>  www.patent-pulse.com
>> 
>> Tél. +33 0 970 738