Hi,

These are really text positions. For example, I have a document:
"hello thanks for calling the support how can I help you"
And in the application I would like to search for documents that match "thanks" NEAR "support" only in the first 30 words of the document (the greeting part, for example), and not in the middle or end of the document.

Regards,
Adi

-----Original Message-----
From: Alexandre Rafalovitch <arafa...@gmail.com>
Sent: Wednesday, October 16, 2019 12:48 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Position search

So are these really text locations, or rather actually sections of the document? If the latter, can you parse out sections during indexing?

Regards,
   Alex

On Wed, Oct 16, 2019, 3:57 AM Kaminski, Adi, <adi.kamin...@verint.com> wrote:

> Hi,
> Thanks for the responses.
>
> It's a soft boundary which results from the dynamic syntax of our
> application. So it may vary between user searches: one user can
> search for some "word1" in the starting 30 words, and another can search
> for "word2" in the starting 10 words. The use case is to match some
> terms/phrases in specific document places in order to identify
> scripts/specific word occurrences.
>
> So I guess copy field won't work here.
>
> Any other suggestions/thoughts?
> Maybe some hidden position filters at the native level to limit from the
> start/end of the document?
>
> Thanks,
> Adi
>
> -----Original Message-----
> From: Tim Casey <tca...@gmail.com>
> Sent: Tuesday, October 15, 2019 11:05 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Position search
>
> If this is about a normalized query, I would put the normalization
> text into a specific field. The reason for this is you may want to
> search the overall text during any form of expansion phase of searching
> for data.
> That is, maybe you want to know the context of up to the 120th word.
> At least you have both.
> Also, you may want to note which normalized fields were truncated or
> were simply too small. This would give some guidance as to the bias of
> the normalization.
> If 95% of the fields were not truncated, there is
> a chance you are not doing good at normalizing because you have a set
> of particularly short messages. So I would expect a small set of side
> fields remarking this. This would allow you to carry the measures
> along with the data.
>
> tim
>
> On Tue, Oct 15, 2019 at 12:19 PM Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>
> > Is the 100 words a hard boundary or a soft one?
> >
> > If it is a hard one (always 100 words), the easiest is probably copy
> > field and in the (unstored) copy, trim off whatever you don't want
> > to search. Possibly using regular expressions. Of course, "what's a word"
> > is an important question here.
> >
> > Similarly, you could do that with Update Request Processors and
> > clone/process field even before it hits the schema. Then you could
> > store the extract for highlighting purposes.
> >
> > Regards,
> >    Alex.
> >
> > On Tue, 15 Oct 2019 at 02:25, Kaminski, Adi <adi.kamin...@verint.com> wrote:
> > >
> > > Hi,
> > > What's the recommended way to search in Solr (assuming 8.2 is
> > > used) for specific terms/phrases/expressions while limiting the
> > > search from position perspective.
> > > For example to search only in the first/last 100 words of the
> > > document?
> > >
> > > Is there any built-in functionality for that?
> > >
> > > Thanks in advance,
> > > Adi
> > >
> > >
> > > This electronic message may contain proprietary and confidential
> > > information of Verint Systems Inc., its affiliates and/or
> > > subsidiaries. The information is intended to be for the use of the
> > > individual(s) or entity(ies) named above. If you are not the intended
> > > recipient (or authorized to receive this e-mail for the intended
> > > recipient), you may not use, copy, disclose or distribute to anyone
> > > this message or any information contained in this message. If you
> > > have received this electronic message in error, please notify us by
> > > replying to this e-mail.
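
For readers following the thread, the matching rule being asked for ("thanks" NEAR "support", but only within the first N words, with N chosen per query) can be pinned down with a small sketch. This is plain Python and not Solr code: the function names, the regex tokenizer, and the `slop`/`limit` parameters are all assumptions made for illustration, and "what's a word" is decided here by a simple `\w+` split rather than a real analyzer. Lucene does have span queries (SpanNearQuery, SpanFirstQuery) that express this kind of constraint natively; how best to reach them from a Solr 8.2 request is not settled in this thread.

```python
import re


def tokenize(text):
    """Lowercase and split on non-word characters.

    A stand-in for a real analyzer; an assumption of this sketch.
    """
    return re.findall(r"\w+", text.lower())


def near_within_first(text, term_a, term_b, slop, limit):
    """Return True if term_a and term_b both occur within the first
    `limit` tokens of `text`, in either order, with at most `slop`
    other tokens between them."""
    tokens = tokenize(text)[:limit]  # the per-query "soft boundary"
    pos_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    return any(
        a != b and abs(a - b) - 1 <= slop
        for a in pos_a
        for b in pos_b
    )


doc = "hello thanks for calling the support how can I help you"

# "thanks" is token 1 and "support" is token 5: three tokens in between.
print(near_within_first(doc, "thanks", "support", slop=3, limit=30))  # True
# With the boundary at 5 tokens, "support" falls outside the window.
print(near_within_first(doc, "thanks", "support", slop=3, limit=5))   # False
```

Because `limit` is an argument rather than something fixed at index time, this mirrors the soft-boundary case described above, where one user searches the first 30 words and another the first 10.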