Need help to configure automated deletion of shard in solr

2020-11-30 Thread Pushkar Mishra
Hi Solr team,

I am using SolrCloud (version 8.5.x). I need to find a configuration that
lets me delete a shard when the number of documents in the shard reaches
zero. Can someone help me achieve that?


It is urgent, so a quick response would be highly appreciated.

Thanks
Pushkar

-- 
Pushkar Kumar Mishra
"Reactions are always instinctive whereas responses are always well thought
of... So start responding rather than reacting in life"


Re: Solr Highlighting not working

2020-11-30 Thread Ajay Sharma
Hi All,

Bumping this query to the top.
Does anyone have any idea about it?


On Fri, Nov 27, 2020 at 11:49 AM Ajay Sharma  wrote:

> Hi Community,
>
> This is the first time I am implementing the Solr *highlighting* feature.
> I have read the concept in the Solr documentation
> Link- https://lucene.apache.org/solr/guide/8_2/highlighting.html
>
> To enable highlighting I just added &hl=true&hl.fl=* to our Solr
> query, got snippets in the Solr response, and it works fine in most
> cases.
>
> *But highlighting does not work when synonyms come into action*
>
> *Issue:*
> I am searching leopard (q=leopard) in field title (qf=title)
>
> In our synonym file, we have an entry like below
> *leopard,tenduaa,panther*
>
> and in one document id:123456, field title contains below text:
> title:"Jindal Panther TMT Bars
>
> For the query (q=leopard), I get this document (id:123456) in the Solr
> response.
> I could see that the document matched because of the synonym, and I
> confirmed it via the Solr UI analysis screen: with Analyse FieldName=title,
> Field Value (Index)="Jindal Panther TMT rebars", and Field Value
> (Query)=leopard, the index chain shows the token panther also being saved
> as leopard. But in highlighting I don't get any matched token; I get the
> response below:
>
>
>- highlighting:
>{
>   - 123456: { }
>   }
>
>
>
> I just need the matched synonym token (panther in the above case) to be
> returned in the Solr highlighting response.
> I have read and re-read the Solr documentation, searched on Google, gone
> through many articles, and even checked StackOverflow, but could not find
> a solution.
> Any help from community members will be highly appreciated.
>
> Thanks in advance.
>
>
> --
> Regards,
> Ajay Sharma
> Software Engineer, Product-Search,
> IndiaMART InterMESH Ltd
>


-- 
Thanks & Regards,
Ajay Sharma
Software Engineer, Product-Search,
IndiaMART InterMESH Ltd
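One thing worth trying for the synonym case above is to name the searched field explicitly in hl.fl (rather than relying on hl.fl=*) and to experiment with hl.method=unified, which handles token offsets differently from the default highlighter. A hedged sketch of how such a request could be assembled — the base URL, collection name, and field names are assumptions for illustration, not details from the thread:

```python
from urllib.parse import urlencode

def build_highlight_query(base_url, q, qf, hl_fields, hl_method="unified"):
    """Build a Solr select URL with explicit highlighting parameters.

    hl.method=unified is worth trying when synonym matches are not
    highlighted, and hl.fl names the searched field explicitly
    instead of using the hl.fl=* wildcard.
    """
    params = {
        "q": q,
        "defType": "edismax",
        "qf": qf,
        "hl": "true",
        "hl.fl": ",".join(hl_fields),
        "hl.method": hl_method,
    }
    return f"{base_url}/select?{urlencode(params)}"

# Hypothetical collection name and endpoint:
url = build_highlight_query("http://localhost:8983/solr/products",
                            "leopard", "title", ["title"])
```

If the unified highlighter still returns an empty snippet, comparing the analysis chains (as Ajay already did on the analysis screen) against hl.q is the next thing to check.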




write.lock file after unloading core

2020-11-30 Thread elisabeth benoit
Hello all,

We are using solr 7.3.1, with master and slave config.

When we deliver a new index we unload the core, with option delete data dir
= true, then recreate the data folder and copy the new index files into
that folder before sending solr a command to recreate the core (with the
same name).

But at the same time we have some batches indexing the core we just
unloaded non-stop, and it happens quite frequently that we get an error at
this point: the copy cannot be done, and I guess it is because of a
write.lock file created by a Solr index writer in the index directory.

Is it possible, when unloading the core, to stop/kill the index writer? I've
tried including a sleep after the unload and before recreating the index
folder; it seems to work, but I was wondering if a better solution exists.

Best regards,
Elisabeth


Re: data import handler deprecated?

2020-11-30 Thread Eric Pugh
You don’t need to abandon DIH right now….   You can just use the GitHub-hosted 
version….   The more people who use it, the better the community that will form 
around it!   It’s a bit chicken and egg: since no one is actively discussing 
it, submitting PRs, etc., it may languish.   If you use it, test it, and 
support other community folks using it, then it will continue on!



> On Nov 29, 2020, at 12:12 PM, Dmitri Maziuk  wrote:
> 
> On 11/29/2020 10:32 AM, Erick Erickson wrote:
> 
>> And I absolutely agree with Walter that the DB is often where
>> the bottleneck lies. You might be able to
>> use multiple threads and/or processes to query the
>> DB if that’s the case and you can find some kind of partition
>> key.
> 
> IME the difficult part has always been dealing with incremental updates, if 
> we were to roll our own, my vote would be for a database trigger that does a 
> POST in whichever language the DBMS likes.
> 
> But this has not been a part of our "solr 6.5 update" project until now.
> 
> Thanks everyone,
> Dima

___
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com  | 
My Free/Busy   
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 


This e-mail and all contents, including attachments, is considered to be 
Company Confidential unless explicitly stated otherwise, regardless of whether 
attachments are marked as such.



Re: Solr collapse & expand queries.

2020-11-30 Thread Joel Bernstein
Both collapse and grouping are used quite often, so I'm not sure I would
agree with a blanket preference for collapse. There is a very specific use
case where collapse performs better, and in those scenarios collapse might
be the only option that works.

The use case where collapse works better is:

1) High cardinality grouping field, like product id.
2) Larger result sets
3) The need to know the full number of groups that match the result set. In
grouping this is group.ngroups.

At a certain point grouping will become too slow under the scenario
described above. It all depends on the scale of #1 and #2 above. If you
remove group.ngroups, grouping will usually be just as fast as or faster
than collapse.

So in your testing, make sure you're testing the full data set with
representative queries, and decide if group.ngroups is needed.







Joel Bernstein
http://joelsolr.blogspot.com/


On Sat, Nov 28, 2020 at 3:42 AM Parshant Kumar
 wrote:

> Hi community,
>
> I want to implement collapse queries instead of group queries. The Solr
> documentation states that we should prefer collapse & expand queries
> over group queries. Please explain how collapse & expand queries are
> better than group queries. How can I implement them? Do I need to add
> anything in *solrconfig.xml* as well, or just make changes in the Solr
> queries like below:
>
>
> *fq={!collapse field=*field*}&expand.rows=n&expand=true*  instead of
> *group.field=*field*&group=true&group.limit=n*
>
> I have done performance testing with the above changes in the Solr
> queries and found that query times are almost the same for both collapse
> queries and group queries.
>
> Please help me how to implement it and its advantage over grouped queries.
>
> Thanks,
> Parshant Kumar.
>
> --
>
>
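To make Joel's comparison concrete, here is a hedged sketch of the two equivalent parameter sets — grouping with its optional, costly group.ngroups, versus collapse plus expand. The field name and limits are illustrative, not taken from the thread:

```python
def grouping_params(field, limit, need_ngroups=False):
    """Classic result grouping. group.ngroups is the part that gets
    expensive on high-cardinality fields and large result sets."""
    p = {"group": "true", "group.field": field, "group.limit": str(limit)}
    if need_ngroups:
        p["group.ngroups"] = "true"
    return p

def collapse_params(field, expand_rows):
    """Collapse + expand. The collapsed result set's numFound already
    reflects the number of groups, with no ngroups-style extra cost."""
    return {
        "fq": f"{{!collapse field={field}}}",
        "expand": "true",
        "expand.rows": str(expand_rows),
    }

grouped = grouping_params("product_id", 5, need_ngroups=True)
collapsed = collapse_params("product_id", 5)
```

As Joel notes, the benchmark that matters is the full data set with representative queries, with group.ngroups enabled if the application actually needs the group count.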


Re: data import handler deprecated?

2020-11-30 Thread David Smiley
Yes, absolutely to what Eric said.  We goofed on news / release highlights
on how to communicate what's happening in Solr.  From a Solr insider point
of view, we are "deprecating" because strictly speaking, the code isn't in
our codebase any longer.  From a user point of view (the audience of news /
release notes), the functionality has *moved*.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Nov 30, 2020 at 8:04 AM Eric Pugh 
wrote:

> You don’t need to abandon DIH right now….   You can just use the Github
> hosted version….   The more people who use it, the better a community it
> will form around it!It’s a bit chicken and egg, since no one is
> actively discussing it, submitting PR’s etc, it may languish.   If you use
> it, and test it, and support other community folks using it, then it will
> continue on!
>
>
>
> > On Nov 29, 2020, at 12:12 PM, Dmitri Maziuk 
> wrote:
> >
> > On 11/29/2020 10:32 AM, Erick Erickson wrote:
> >
> >> And I absolutely agree with Walter that the DB is often where
> >> the bottleneck lies. You might be able to
> >> use multiple threads and/or processes to query the
> >> DB if that’s the case and you can find some kind of partition
> >> key.
> >
> > IME the difficult part has always been dealing with incremental updates,
> if we were to roll our own, my vote would be for a database trigger that
> does a POST in whichever language the DBMS likes.
> >
> > But this has not been a part of our "solr 6.5 update" project until now.
> >
> > Thanks everyone,
> > Dima
>
> ___
> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com <
> http://www.opensourceconnections.com/> | My Free/Busy <
> http://tinyurl.com/eric-cal>
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <
> https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless of
> whether attachments are marked as such.
>
>


Re: write.lock file after unloading core

2020-11-30 Thread Erick Erickson
I’m a little confused here. Are you unloading/copying/creating the core on 
master?
I’ll assume so since I can’t really think of how doing this on one of the other
cores would make sense…..

I’m having a hard time wrapping my head around the use-case. You’re 
“delivering a new index”, which I take to mean you’re building a completely new
index somewhere else.

But you’re also updating the target index. What’s the relationship between the
index you’re “delivering” and the update sent while the core is unloaded? Are
the updates _already_ in the index you’re delivering or would you expect them
to be in the new index? Or are they just lost? Or does the indexing program
resend them after the core is created?

The unloaded core should not have any open index writers though. What I’m 
guessing is that updates are coming in before the unload is complete. Instead
of a sleep, have you tried specifying the async parameter and waiting until
REQUESTSTATUS tells you the unload is complete?

Best,
Erick

> On Nov 30, 2020, at 7:41 AM, elisabeth benoit  
> wrote:
> 
> Hello all,
> 
> We are using solr 7.3.1, with master and slave config.
> 
> When we deliver a new index we unload the core, with option delete data dir
> = true, then recreate the data folder and copy the new index files into
> that folder before sending solr a command to recreate the core (with the
> same name).
> 
> But we have, at the same time, some batches indexing non stop the core we
> just unloaded, and it happens quite frequently that we have an error at
> this point, the copy cannot be done, and I guess it is because of a
> write.lock file created by a solr index writer in the index directory.
> 
> Is it possible, when unloading the core, to stop / kill index writer? I've
> tried including a sleep after the unload and before recreation of the index
> folder, it seems to work but I was wondering if a better solution exists.
> 
> Best regards,
> Elisabeth
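Erick's suggestion — issue the UNLOAD with the async parameter and poll REQUESTSTATUS instead of sleeping — could be sketched as below. The status strings and the injectable status function are assumptions for illustration; in practice status_fn would hit the CoreAdmin REQUESTSTATUS endpoint:

```python
import time

def wait_for_async(status_fn, request_id, timeout_s=60.0, poll_s=1.0):
    """Poll until an async Solr admin request finishes.

    status_fn(request_id) is assumed to return one of "submitted",
    "running", "completed", or "failed" (the states REQUESTSTATUS
    reports). Returns the terminal state or raises on timeout.
    """
    state = "submitted"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = status_fn(request_id)
        if state in ("completed", "failed"):
            return state
        time.sleep(poll_s)
    raise TimeoutError(f"async request {request_id} still {state!r}")
```

Only after this returns "completed" for the UNLOAD would the batch job delete and recreate the data folder, which avoids guessing a sleep duration.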



Re: Need help to configure automated deletion of shard in solr

2020-11-30 Thread Erick Erickson
Are you using the implicit router? Otherwise you cannot delete a shard.
And you won’t have any shards that have zero documents anyway.

It’d be a little convoluted, but you could use the Collections COLSTATUS API to
find the names of all your replicas. Then query _one_ replica of each
shard with something like
solr/collection1_shard1_replica_n1/select?q=*:*&distrib=false

that’ll return the number of live docs (i.e. non-deleted docs) and if it’s zero
you can delete the shard.

But the implicit router requires you take complete control of where documents
go, i.e. which shard they land on.

This really sounds like an XY problem. What’s the use  case you’re trying
to support where you expect a shard’s number of live docs to drop to zero?

Best,
Erick

> On Nov 30, 2020, at 4:57 AM, Pushkar Mishra  wrote:
> 
> Hi Solr team,
> 
> I am using solr cloud.(version 8.5.x). I have a need to find out a
> configuration where I can delete a shard , when number of documents reaches
> to zero in the shard , can some one help me out to achieve that ?
> 
> 
> It is urgent , so a quick response will be highly appreciated .
> 
> Thanks
> Pushkar
> 
> -- 
> Pushkar Kumar Mishra
> "Reactions are always instinctive whereas responses are always well thought
> of... So start responding rather than reacting in life"
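A minimal sketch of the decision step Erick describes: gather numFound from one replica per shard queried with distrib=false, then pick the shards eligible for deletion. The shard names and the threshold knob are hypothetical, and, per Erick's caveat, deleting a shard only works with the implicit router:

```python
def shards_to_delete(live_doc_counts, threshold=0):
    """Select shards whose live-doc count has dropped to the threshold.

    live_doc_counts maps shard name -> numFound from a distrib=false
    query against one replica of that shard, e.g. {"shard1": 120,
    "shard2": 0}. Returns the shard names to pass to DELETESHARD
    (implicit router only).
    """
    return sorted(s for s, n in live_doc_counts.items() if n <= threshold)

candidates = shards_to_delete({"shard1": 10, "shard2": 0, "shard3": 0})
```

The threshold parameter also covers Pushkar's later "nominal documents are left" variant, though deleting a shard that still holds live documents loses them, so a non-zero threshold should be used deliberately.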



Re: Need help to configure automated deletion of shard in solr

2020-11-30 Thread Pushkar Mishra
Hi Erick,
First of all, thanks for your response. I will check the possibility.
Let me explain my problem in detail:

1. We have other use cases where we make use of a listener on
postCommit to delete/shift/split shards, so we have the capability to
delete shards.
2. In the current use case, we have to delete documents from the
shard via a scheduled process (maybe hourly or daily). If the shard gets
empty (or, let's say, only nominal documents are left), then delete the
shard. I am exploring how to do this using configuration.
Regards
Pushkar

On Mon, Nov 30, 2020, 8:48 PM Erick Erickson 
wrote:

> Are you using the implicit router? Otherwise you cannot delete a shard.
> And you won’t have any shards that have zero documents anyway.
>
> It’d be a little convoluted, but you could use the collections COLSTATUS
> Api to
> find the names of all your replicas. Then query _one_ replica of each
> shard with something like
> solr/collection1_shard1_replica_n1/q=*:*&distrib=false
>
> that’ll return the number of live docs (i.e. non-deleted docs) and if it’s
> zero
> you can delete the shard.
>
> But the implicit router requires you take complete control of where
> documents
> go, i.e. which shard they land on.
>
> This really sounds like an XY problem. What’s the use  case you’re trying
> to support where you expect a shard’s number of live docs to drop to zero?
>
> Best,
> Erick
>
> > On Nov 30, 2020, at 4:57 AM, Pushkar Mishra 
> wrote:
> >
> > Hi Solr team,
> >
> > I am using solr cloud.(version 8.5.x). I have a need to find out a
> > configuration where I can delete a shard , when number of documents
> reaches
> > to zero in the shard , can some one help me out to achieve that ?
> >
> >
> > It is urgent , so a quick response will be highly appreciated .
> >
> > Thanks
> > Pushkar
> >
> > --
> > Pushkar Kumar Mishra
> > "Reactions are always instinctive whereas responses are always well
> thought
> > of... So start responding rather than reacting in life"
>
>


Re: data import handler deprecated?

2020-11-30 Thread Dmitri Maziuk

On 11/30/2020 7:50 AM, David Smiley wrote:

Yes, absolutely to what Eric said.  We goofed on news / release highlights
on how to communicate what's happening in Solr.  From a Solr insider point
of view, we are "deprecating" because strictly speaking, the code isn't in
our codebase any longer.  From a user point of view (the audience of news /
release notes), the functionality has *moved*.


Just FYI, there is a DIH 8.7.0 jar in 
repo1.maven.org/maven2/org/apache/solr -- whereas the GitHub build is on 
8.6.0.


Dima



Re: Standard tokenizer not considering emojis as special chars in solr 8.4.1, it does in solr 5

2020-11-30 Thread Deepu
Hi All,

Any suggestions on the observation below? Can I use a char filter to
retain the old behavior of the Standard Tokenizer?

Thanks,
Deepu

On Sat, Nov 28, 2020 at 4:59 PM Deepu  wrote:

> Hi All,
>
> We are in the process of migrating from Solr 5 to Solr 8. During testing
> we observed that the Standard Tokenizer in Solr 5 treated emojis as
> special characters and removed them; apparently Solr 8 treats them as
> regular characters, so it does not remove them while indexing.
>
> We need to retain the same behavior in Solr 8: we use the whitespace
> tokenizer to index emojis and the standard tokenizer to remove them, but
> now both behave the same.
>
> Please share your suggestions.
>
>
> Thanks,
> Deepu
>
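One possible Solr 8 approach for this is a PatternReplaceCharFilterFactory placed before the StandardTokenizerFactory that strips emoji code-point ranges before tokenization. The Python sketch below emulates what such a filter would do, purely for illustration; real emoji coverage needs more Unicode blocks (flags, ZWJ sequences, skin-tone modifiers), so treat the ranges as a starting point, not a complete solution:

```python
import re

# Rough emoji ranges; intentionally incomplete (see lead-in above).
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001F5FF"   # symbols & pictographs
    "\U0001F600-\U0001F64F"   # emoticons
    "\U0001F680-\U0001F6FF"   # transport & map symbols
    "\U00002600-\U000027BF"   # misc symbols, dingbats
    "]+"
)

def strip_emojis(text):
    """What an emoji-stripping char filter in the index analysis
    chain would do to the incoming field value."""
    return EMOJI_RE.sub("", text)
```

The same pattern string would go into the char filter's `pattern` attribute in the field type definition, so the whitespace-tokenized field keeps emojis while the standard-tokenized field drops them.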


Re: Need help to configure automated deletion of shard in solr

2020-11-30 Thread Pushkar Mishra
On Mon, Nov 30, 2020, 9:15 PM Pushkar Mishra  wrote:

> Hi Erick,
> First of all thanks for your response . I will check the possibility  .
> Let me explain my problem  in detail :
>
> 1. We have other use cases where we are making use of listener on
> postCommit to delete/shift/split the shards . So we have capability to
> delete the shards .
> 2. The current use case is , where we have to delete the documents from
> the shard , and during deletion process(it will be scheduled process, may
> be hourly or daily, which will delete the documents) , if shards  gets
> empty (or may be lets  say nominal documents are left ) , then delete the
> shard.  And I am exploring to do this using configuration .
>
3. Also, it will not be a live shard for sure, as only those documents are
deleted whose TTL is over. The TTL could be a month or a year.

Please assist if you have any config-based idea for this.

> Regards
> Pushkar
>
> On Mon, Nov 30, 2020, 8:48 PM Erick Erickson 
> wrote:
>
>> Are you using the implicit router? Otherwise you cannot delete a shard.
>> And you won’t have any shards that have zero documents anyway.
>>
>> It’d be a little convoluted, but you could use the collections COLSTATUS
>> Api to
>> find the names of all your replicas. Then query _one_ replica of each
>> shard with something like
>> solr/collection1_shard1_replica_n1/q=*:*&distrib=false
>>
>> that’ll return the number of live docs (i.e. non-deleted docs) and if
>> it’s zero
>> you can delete the shard.
>>
>> But the implicit router requires you take complete control of where
>> documents
>> go, i.e. which shard they land on.
>>
>> This really sounds like an XY problem. What’s the use  case you’re trying
>> to support where you expect a shard’s number of live docs to drop to zero?
>>
>> Best,
>> Erick
>>
>> > On Nov 30, 2020, at 4:57 AM, Pushkar Mishra 
>> wrote:
>> >
>> > Hi Solr team,
>> >
>> > I am using solr cloud.(version 8.5.x). I have a need to find out a
>> > configuration where I can delete a shard , when number of documents
>> reaches
>> > to zero in the shard , can some one help me out to achieve that ?
>> >
>> >
>> > It is urgent , so a quick response will be highly appreciated .
>> >
>> > Thanks
>> > Pushkar
>> >
>> > --
>> > Pushkar Kumar Mishra
>> > "Reactions are always instinctive whereas responses are always well
>> thought
>> > of... So start responding rather than reacting in life"
>>
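For the purely configuration-driven part of the TTL requirement, Solr's DocExpirationUpdateProcessorFactory can delete expired documents on a schedule; it deletes documents, not shards, so the shard-deletion step would still need something like the postCommit listener mentioned above. A hedged solrconfig.xml sketch — the field names and period are illustrative, and the exact defaults should be checked against the ref guide:

```xml
<updateRequestProcessorChain name="add-ttl" default="true">
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <!-- How often the background delete of expired docs runs -->
    <int name="autoDeletePeriodSeconds">86400</int>
    <!-- Per-document TTL field supplied at index time, e.g. "+30DAYS" -->
    <str name="ttlFieldName">time_to_live_s</str>
    <!-- Computed expiration timestamp stored on each document -->
    <str name="expirationFieldName">expire_at_dt</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```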
>>


facet.method=smart

2020-11-30 Thread Jae Joo
Is "smart" really smarter than an explicitly defined method?

For the "enum" type, would it be faster to define facet.method=enum than
to use smart?

Jae


Shard Lock

2020-11-30 Thread sambasivarao giddaluri
Hi All,
We are getting the below exception from Solr, with 3 ZooKeeper nodes, 3 Solr
nodes, and 3 replicas. It was working fine, and we got this exception
unexpectedly.

   - *k04o95kz_shard2_replica_n10:*
     org.apache.solr.common.SolrException: org.apache.solr.common.SolrException:
     Index dir '/opt/solr/volumes/data/cores/k04o95kz_shard2_replica_n10/data/index.20201126040543992'
     of core 'k04o95kz_shard2_replica_n10' is already locked. The most likely
     cause is another Solr server (or another Solr core in this server) also
     configured to use this directory; other possible causes may be specific to
     lockType: native
   - *k04o95kz_shard3_replica_n16:*
     org.apache.solr.common.SolrException: org.apache.solr.common.SolrException:
     Index dir '/opt/solr/volumes/data/cores/k04o95kz_shard3_replica_n16/data/index.20201126040544142'
     of core 'k04o95kz_shard3_replica_n16' is already locked. The most likely
     cause is another Solr server (or another Solr core in this server) also
     configured to use this directory; other possible causes may be specific to
     lockType: native


[image: Screen Shot 2020-11-30 at 4.10.46 PM.png]

[image: Screen Shot 2020-11-30 at 4.09.29 PM.png]

Any advice

Thanks
sam


Re: Shard Lock

2020-11-30 Thread sambasivarao giddaluri
When I checked /opt/solr/volumes/data/cores/, both
k04o95kz_shard2_replica_n10 and k04o95kz_shard3_replica_n16 replicas are
not present; no idea how they got deleted.

On Mon, Nov 30, 2020 at 4:13 PM sambasivarao giddaluri <
sambasiva.giddal...@gmail.com> wrote:

> Hi All,
> We are getting below exception from Solr where 3 zk with 3 solr nodes and
> 3 replicas. It was working fine and we got this exception unexpectedly.
>
>-
> - *k04o95kz_shard2_replica_n10:* 
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
>Index dir 
> *'/opt/solr/volumes/data/cores/k04o95kz_shard2_replica_n10/data/index.20201126040543992'
>of core 'k04o95kz_shard2_replica_n10' is already locked. The most likely
>cause is another Solr server (or another solr core in this server) also
>configured to use this directory; other possible causes may be specific to
>lockType: native*
>- *k04o95kz_shard3_replica_n16: 
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
>Index dir
>
> '/opt/solr/volumes/data/cores/k04o95kz_shard3_replica_n16/data/index.20201126040544142'
>of core 'k04o95kz_shard3_replica_n16' is already locked. The most likely
>cause is another Solr server (or another solr core in this server) also
>configured to use this directory; other possible causes may be specific to
>lockType: native*
>-
>
>
> [image: Screen Shot 2020-11-30 at 4.10.46 PM.png]
>
> [image: Screen Shot 2020-11-30 at 4.09.29 PM.png]
>
> Any advice
>
> Thanks
> sam
>


Re: uploading model in Solr 6.6

2020-11-30 Thread vishal patel
Can anyone help me with my question?

Regards,
Vishal


From: vishal patel 
Sent: Friday, November 27, 2020 12:18 PM
To: solr-user@lucene.apache.org 
Subject: uploading model in Solr 6.6

Hi

What is the meaning of the weight of a feature when uploading a model for 
re-ranking?
How can we calculate the weight? Does the ranking depend on the weight?

Please give me more details about weights.
https://lucene.apache.org/solr/guide/8_1/learning-to-rank.html#uploading-a-model

Regards,
Vishal

Sent from Outlook
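Regarding the weight question: in an LTR LinearModel, the re-rank score is the weighted sum of the feature values (score = Σ weight_i × feature_i), so a larger weight makes that feature count for more in the final ranking. The weights are normally learned by training a model offline (e.g. linear regression or a ranking SVM over judged query/document pairs) rather than hand-calculated. A hedged sketch of the upload payload — the model and feature names here are made up:

```json
{
  "class": "org.apache.solr.ltr.model.LinearModel",
  "name": "myLinearModel",
  "features": [
    { "name": "originalScore" },
    { "name": "titleMatch" }
  ],
  "params": {
    "weights": {
      "originalScore": 1.0,
      "titleMatch": 2.5
    }
  }
}
```

With these weights, a document's re-rank score would be 1.0 × originalScore + 2.5 × titleMatch, so titleMatch influences the ranking two and a half times as strongly per unit of feature value.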


Can solr index replacement character

2020-11-30 Thread Eran Buchnick
Hi community,
During integration tests with a new data source I noticed a weird scenario
where the replacement character can't be searched, though it seems to be
stored. Honestly, I don't want that irrelevant data stored in my index, but
I wondered whether Solr can index the replacement character (U+FFFD �) as a
string, and if so, how to search for it?
And in general, is there any built-in char filtration?

Thanks
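U+FFFD usually means bytes were already mis-decoded somewhere upstream, so the cleanest fix is to repair the decoding before documents reach Solr; failing that, a char filter in the field's analysis chain could strip the character at index time. The Python sketch below shows how the character typically arises and what such a filter would do; it is an illustration of the failure mode, not Solr code:

```python
# Decoding Latin-1 bytes as UTF-8 with errors="replace" is one common
# way U+FFFD sneaks into "stored" text.
raw = b"caf\xe9"                                  # "café" in Latin-1
text = raw.decode("utf-8", errors="replace")      # -> "caf\ufffd"

def strip_replacement_chars(s):
    """What a char filter in the index analysis chain would do:
    drop U+FFFD before tokenization."""
    return s.replace("\ufffd", "")

cleaned = strip_replacement_chars(text)
```

If the character genuinely must be searchable instead, it would have to survive the field's analysis (e.g. a string field or KeywordTokenizer), since many tokenizers discard it as a non-token character.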