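Hi,

We have managed to resolve the issue by changing \s to \W. The likely reason is that some of the characters between the line breaks are not plain whitespace. In Java regex, \s matches only space, tab, newline, carriage return, vertical tab and form feed, so a character such as a non-breaking space is left in place. \W matches any non-word character, so it removes those characters as well (along with punctuation, since \W is not limited to whitespace).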
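The difference between the two character classes can be checked outside Solr with plain java.util.regex, the engine that, as noted elsewhere in this thread, Solr relies on. The sketch below is only an illustration under assumptions: the class WhitespaceClassDemo and the helpers replaceBreaks and show are hypothetical names, and the sample strings use a non-breaking space (U+00A0) as a stand-in for whatever character is really in the indexed content, which the thread does not identify.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; it is not part of Solr.
final class WhitespaceClassDemo {

    // Applies the two patterns from the working config in this message,
    // in the same order: runs of two or more breaks first, then single breaks.
    // charClass is the class placed after the newline, e.g. "\\s" or "\\W".
    static String replaceBreaks(String content, String charClass) {
        String twoOrMore = "(\\n" + charClass + "*){2,}";
        String oneOrMore = "(\\n" + charClass + "*){1,}";
        // quoteReplacement makes the replacement text literal in plain Java.
        String collapsed = Pattern.compile(twoOrMore)
                .matcher(content)
                .replaceAll(Matcher.quoteReplacement("<br><br>"));
        return Pattern.compile(oneOrMore)
                .matcher(collapsed)
                .replaceAll(Matcher.quoteReplacement("<br>"));
    }

    // Makes the non-breaking space (U+00A0) visible when printing.
    static String show(String s) {
        return s.replace("\u00A0", "[NBSP]");
    }

    public static void main(String[] args) {
        // Assumed sample: newlines separated by non-breaking spaces.
        String sample = "a\n\u00A0\n\u00A0\nb";

        System.out.println(show(replaceBreaks(sample, "\\s")));
        // prints a<br>[NBSP]<br>[NBSP]<br>b

        System.out.println(show(replaceBreaks(sample, "\\W")));
        // prints a<br><br>b

        // Punctuation directly after the breaks is also a non-word character.
        System.out.println(show(replaceBreaks("a\n\n(b)", "\\W")));
        // prints a<br><br>b)
    }
}
```

With these sample strings, \s leaves three separate <br> with the non-breaking spaces still between them, while \W collapses the whole run to <br><br>. The last call shows the trade-off: \W also consumes the opening parenthesis, because punctuation is a non-word character too. If that matters for the content, a narrower class that adds only the offending character to \s might be safer, but that depends on what is actually in the data.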
We have used this config, and it works.

<processor class="solr.RegexReplaceProcessorFactory">
  <str name="fieldName">content</str>
  <str name="pattern">(\n\W*){2,}</str>
  <str name="replacement"><br><br></str>
  <bool name="literalReplacement">true</bool>
</processor>
<processor class="solr.RegexReplaceProcessorFactory">
  <str name="fieldName">content</str>
  <str name="pattern">(\n\W*){1,}</str>
  <str name="replacement"><br></str>
  <bool name="literalReplacement">true</bool>
</processor>

Regards,
Edwin

On Tue, 12 Mar 2019 at 10:49, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:

> Hi,
>
> Has anyone else faced the same issue before?
> So far all the regex patterns that we tried in this thread are not able to
> resolve the issue.
>
> Regards,
> Edwin
>
> On Fri, 8 Mar 2019 at 12:17, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> wrote:
>
>> Hi Paul,
>>
>> Sorry, I realized there is an extra ']' in the pattern provided, which is
>> why there are so many <br> in the output.
>>
>> The output is exactly the same as previously (previous index result) if
>> we remove the extra ']', as shown in the configuration below.
>>
>> <processor class="solr.RegexReplaceProcessorFactory">
>>   <str name="fieldName">content</str>
>>   <str name="pattern">[ \t\x0b\f]*\r?\n</str>
>>   <str name="replacement"><br></str>
>>   <bool name="literalReplacement">true</bool>
>> </processor>
>> <processor class="solr.RegexReplaceProcessorFactory">
>>   <str name="fieldName">content</str>
>>   <str name="pattern">(<br>[ \t\x0b\f]*){3,}</str>
>>   <str name="replacement"><br><br></str>
>>   <bool name="literalReplacement">true</bool>
>> </processor>
>>
>> Regards,
>> Edwin
>>
>> On Thu, 7 Mar 2019 at 22:51, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>> wrote:
>>
>>> Hi Paul,
>>>
>>> Thanks for the reply.
>>>
>>> For the 2nd pattern, we tried
>>> <str name="pattern">(<br>[ \t\x0b\f]]*){3,}</str>, as in the
>>> configuration below:
>>>
>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>   <str name="fieldName">content</str>
>>>   <str name="pattern">[ \t\x0b\f]*\r?\n</str>
>>>   <str name="replacement"><br></str>
>>>   <bool name="literalReplacement">true</bool>
>>> </processor>
>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>   <str name="fieldName">content</str>
>>>   <str name="pattern">(<br>[ \t\x0b\f]]*){3,}</str>
>>>   <str name="replacement"><br><br></str>
>>>   <bool name="literalReplacement">true</bool>
>>> </processor>
>>>
>>> It does not reduce the runs of 3 or more <br> to 2 <br>.
>>>
>>> We end up with many <br> in the output, like the example below:
>>>
>>> http://www.concorded.com/<br><br>
>>> <br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
>>> On Tue, Dec 18, 2018
>>>
>>> Regards,
>>> Edwin
>>>
>>> On Thu, 7 Mar 2019 at 20:44, <paul.d...@ub.unibe.ch> wrote:
>>>
>>>> Hi Edwin
>>>>
>>>> I can’t understand why the pattern is not working and where the spaces
>>>> between the <br> are coming from. However, it should be possible to allow
>>>> for spaces between the <br> in the second match pattern, i.e.:
>>>>
>>>> <str name="pattern">(<br>[ \t\x0b\f]]*){3,}</str>
>>>>
>>>> /Paul
>>>>
>>>> Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for
>>>> Windows 10
>>>>
>>>> From: Zheng Lin Edwin Yeo<mailto:edwinye...@gmail.com>
>>>> Sent: Wednesday, 6 March 2019 16:28
>>>> To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
>>>> Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
>>>>
>>>> Hi Paul,
>>>>
>>>> I have tried with the first match pattern set to
>>>> <str name="pattern">[ \t\x0b\f]*\r?\n</str>, as in the configuration below:
>>>>
>>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>>   <str name="fieldName">content</str>
>>>>   <str name="pattern">[ \t\x0b\f]*\r?\n</str>
>>>>   <str name="replacement"><br></str>
>>>>   <bool name="literalReplacement">true</bool>
>>>> </processor>
>>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>>   <str name="fieldName">content</str>
>>>>   <str name="pattern">(<br>){3,}</str>
>>>>   <str name="replacement"><br><br></str>
>>>>   <bool name="literalReplacement">true</bool>
>>>> </processor>
>>>>
>>>> However, the result is still the same as before (previous index
>>>> results), with the 4 <br>.
>>>>
>>>> Regards,
>>>> Edwin
>>>>
>>>> On Wed, 6 Mar 2019 at 18:23, <paul.d...@ub.unibe.ch> wrote:
>>>>
>>>> > Hi Edwin
>>>> >
>>>> > You are correct re the 2nd pattern – my bad. Looking at the 4 <br>,
>>>> > it’s actually the sequence «<br><br> <br><br>»? So perhaps the first
>>>> > match pattern could be <str name="pattern">[ \t\x0b\f]*\r?\n</str>
>>>> >
>>>> > i.e. [space tab vertical-tab formfeed]
>>>> >
>>>> > Regards,
>>>> >
>>>> > Paul
>>>> >
>>>> > Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for
>>>> > Windows 10
>>>> >
>>>> > From: Zheng Lin Edwin Yeo<mailto:edwinye...@gmail.com>
>>>> > Sent: Wednesday, 6 March 2019 07:44
>>>> > To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
>>>> > Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
>>>> >
>>>> > Hi Paul,
>>>> >
>>>> > I have modified the second pattern to be (<br>){3,}, instead of
>>>> > (<br><br>){3,}.
This pattern of >>>> (<br><br>){3,} >>>> > will actually look for 6 or more <br> instead of 3 <br>, as we have >>>> put >>>> > the <br> two times in the pattern, which is the reason that there are >>>> more >>>> > <br> in the result, as cases where there are less than 6 <br> are not >>>> being >>>> > replaced, so we ended up having up to 5 <br> in the index. >>>> > >>>> > Modified configuration: >>>> > <processor class="solr.RegexReplaceProcessorFactory"> >>>> > <str name="fieldName">content</str> >>>> > <str name="pattern">(<br>){3,}</str> >>>> > <str name="replacement"><br><br></str> >>>> > <bool name="literalReplacement">true</bool> >>>> > </processor> >>>> > >>>> > This will bring us back to the result of the previous index content, >>>> > meaning the issue of having the 4 <br> is still there. >>>> > >>>> > Regards, >>>> > Edwin >>>> > >>>> > >>>> > >>>> > Regards, >>>> > Edwin >>>> > >>>> > On Wed, 6 Mar 2019 at 11:37, Zheng Lin Edwin Yeo < >>>> edwinye...@gmail.com> >>>> > wrote: >>>> > >>>> > > Hi Paul, >>>> > > >>>> > > Further to my previous email, which there was an extra "}" in the >>>> > > configuration, I have changed to use the below configuration based >>>> on >>>> > your >>>> > > suggestion. >>>> > > >>>> > > <processor class="solr.RegexReplaceProcessorFactory"> >>>> > > <str name="fieldName">content</str> >>>> > > <str name="pattern">[ \t]*\r?\n</str> >>>> > > <str name="replacement"><br></str> >>>> > > <bool name="literalReplacement">true</bool> >>>> > > </processor> >>>> > > <processor class="solr.RegexReplaceProcessorFactory"> >>>> > > <str name="fieldName">content</str> >>>> > > <str name="pattern">(<br><br>){3,}</str> >>>> > > <str name="replacement"><br><br></str> >>>> > > <bool name="literalReplacement">true</bool> >>>> > > </processor> >>>> > > >>>> > > However, the result that I get still has more than 2 <br>. In fact, >>>> the >>>> > > result become worse, as you can see from the comparison below. 
>>>> > > >>>> > > Example 1: The sentence that the regex pattern used to work >>>> correctly. >>>> > But >>>> > > with the latest pattern, it has now changed from 2 <br> to become 5 >>>> <br>, >>>> > > which is wrong. >>>> > > *Original content in EML file:* >>>> > > Dear Sir, >>>> > > >>>> > > >>>> > > I am terminating >>>> > > *Original content:* Dear Sir, \n\n \n \n\n I am terminating >>>> > > *Previous Index content: * Dear Sir, <br><br>I am terminating >>>> > > *Current Index content*: Dear Sir, <br><br><br><br><br> I am >>>> > terminating >>>> > > >>>> > > Example 2: The sentence that the above regex pattern is partially >>>> working >>>> > > (as you can see, instead of 2 <br>, there are 4 <br>) >>>> > > *Original content in EML file:* >>>> > > >>>> > > *exalted* >>>> > > >>>> > > *Psalm 89:17* >>>> > > >>>> > > >>>> > > 3 Choa Chu Kang Avenue 4 >>>> > > *Original content:* exalted \n \n\n Psalm 89:17 \n\n \n\n 3 >>>> Choa >>>> > > Chu Kang Avenue 4, Singapore >>>> > > *Previous Index content: *exalted <br><br>Psalm 89:17 <br><br> >>>> > > <br><br>3 Choa Chu Kang Avenue 4, Singapore >>>> > > *Current Index content*: <br><br><br> Psalm 89:17<br><br> >>>> <br><br> 3 >>>> > > Choa Chu Kang Avenue 3, Singapor4 >>>> > > >>>> > > Example 3: The sentence that the above regex pattern is partially >>>> working >>>> > > (as you can see, instead of 2 <br>, there are 4 <br>). 
For the >>>> latest >>>> > code, >>>> > > there are now 5 <br> >>>> > > *Original content in EML file:* >>>> > > >>>> > > http://www.concorded.com/ >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > On Tue, Dec 18, 2018 at 10:07 AM >>>> > > *Original content:* http://www.concorded.com/ \n\n \n\n \n >>>> \n\n \n\n >>>> > > \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n On Tue, Dec 18, >>>> 2018 at >>>> > > 10:07 AM >>>> > > *Previous Index content: *http://www.concorded.com/ <br><br> >>>> > > <br><br>On Tue, Dec 18, 2018 at 10:07 AM >>>> > > *Current Index content:* http://www.concorded.com/<br><br> >>>> <br><br><br> >>>> > > On Tue, Dec 18, 2018 at 10:07 AM >>>> > > >>>> > > >>>> > > Regards, >>>> > > Edwin >>>> > > >>>> > > On Wed, 6 Mar 2019 at 00:29, Zheng Lin Edwin Yeo < >>>> edwinye...@gmail.com> >>>> > > wrote: >>>> > > >>>> > >> Hi Paul, >>>> > >> >>>> > >> Thank you for the reply. >>>> > >> >>>> > >> I have tried to add the following configuration according to your >>>> > >> suggestion: >>>> > >> >>>> > >> <processor class="solr.RegexReplaceProcessorFactory"> >>>> > >> <str name="fieldName">content</str> >>>> > >> <str name="pattern">[ \t]*\r?\n}</str> >>>> > >> <str name="replacement"><br></str> >>>> > >> <bool name="literalReplacement">true</bool> >>>> > >> </processor> >>>> > >> >>>> > >> <processor class="solr.RegexReplaceProcessorFactory"> >>>> > >> <str name="fieldName">content</str> >>>> > >> <str name="pattern">(<br><br>){3,}</str> >>>> > >> <str name="replacement"><br><br></str> >>>> > >> <bool name="literalReplacement">true</bool> >>>> > >> </processor> >>>> > >> >>>> > >> However, none of the \n is being removed this time round. >>>> > >> Is the order and/or the pattern correct? 
>>>> > >> >>>> > >> Regards, >>>> > >> Edwin >>>> > >> >>>> > >> On Tue, 5 Mar 2019 at 19:54, <paul.d...@ub.unibe.ch> wrote: >>>> > >> >>>> > >>> Hi Edwin >>>> > >>> >>>> > >>> >>>> > >>> >>>> > >>> Try for the first pattern/replacement >>>> > >>> >>>> > >>> >>>> > >>> >>>> > >>> <str name="pattern">[ \t]*\r?\n</str> >>>> > >>> >>>> > >>> <str name="replacement"><br></str> >>>> > >>> >>>> > >>> >>>> > >>> >>>> > >>> Now all line endings and preceding whitespace characters should be >>>> > >>> changed to ‘<br>’. >>>> > >>> >>>> > >>> >>>> > >>> >>>> > >>> The second pattern replacement should replace 3 or more ‘<br>’ >>>> > sequences >>>> > >>> to 2 ‘<br>’ sequences: >>>> > >>> >>>> > >>> >>>> > >>> >>>> > >>> <str name="pattern">(<br><br>){3,}</str> >>>> > >>> >>>> > >>> <str name="replacement"><br><br></str> >>>> > >>> >>>> > >>> >>>> > >>> >>>> > >>> Hope this approach works. Sorry for not replying earlier and best >>>> > >>> regards, >>>> > >>> >>>> > >>> Paul >>>> > >>> >>>> > >>> >>>> > >>> >>>> > >>> >>>> > >>> >>>> > >>> Gesendet von Mail<https://go.microsoft.com/fwlink/?LinkId=550986> >>>> für >>>> > >>> Windows 10 >>>> > >>> >>>> > >>> >>>> > >>> >>>> > >>> Von: Zheng Lin Edwin Yeo<mailto:edwinye...@gmail.com> >>>> > >>> Gesendet: Dienstag, 5. März 2019 03:35 >>>> > >>> An: solr-user@lucene.apache.org<mailto: >>>> solr-user@lucene.apache.org> >>>> > >>> Betreff: Re: RegexReplaceProcessorFactory pattern to detect >>>> multiple \n >>>> > >>> >>>> > >>> >>>> > >>> >>>> > >>> Hi, >>>> > >>> >>>> > >>> For your info, this issue is occurring in the new Solr 7.7.1 as >>>> well. >>>> > >>> >>>> > >>> Regards, >>>> > >>> Edwin >>>> > >>> >>>> > >>> On Mon, 25 Feb 2019 at 10:28, Zheng Lin Edwin Yeo < >>>> > edwinye...@gmail.com> >>>> > >>> wrote: >>>> > >>> >>>> > >>> > Hi, >>>> > >>> > >>>> > >>> > Anyone else has other suggestions or have faced the same >>>> problem? 
>>>> > >>> > >>>> > >>> > Regards, >>>> > >>> > Edwin >>>> > >>> > >>>> > >>> > On Wed, 20 Feb 2019 at 16:58, Zheng Lin Edwin Yeo < >>>> > >>> edwinye...@gmail.com> >>>> > >>> > wrote: >>>> > >>> > >>>> > >>> >> Hi Paul, >>>> > >>> >> >>>> > >>> >> If I tried to execute the second step first, then I will only >>>> get a >>>> > >>> >> single <br> for those with 2 <br>. >>>> > >>> >> For those that we originally get 4 <br>, there will be 2 <br> >>>> with a >>>> > >>> >> space in between. >>>> > >>> >> >>>> > >>> >> This is just changing the 2 <br> to be a single <br>, since the >>>> > second >>>> > >>> >> step is to replace with a single <br>. >>>> > >>> >> But it has not solved the underlying problem yet. >>>> > >>> >> >>>> > >>> >> Regards, >>>> > >>> >> Edwin >>>> > >>> >> >>>> > >>> >> >>>> > >>> >> On Wed, 20 Feb 2019 at 16:41, <paul.d...@ub.unibe.ch> wrote: >>>> > >>> >> >>>> > >>> >>> If the second step is executed first, then you will get the >>>> > unwanted >>>> > >>> 4 >>>> > >>> >>> <br> >>>> > >>> >>> >>>> > >>> >>> >>>> > >>> >>> >>>> > >>> >>> Gesendet von Mail< >>>> https://go.microsoft.com/fwlink/?LinkId=550986> >>>> > >>> für >>>> > >>> >>> Windows 10 >>>> > >>> >>> >>>> > >>> >>> >>>> > >>> >>> >>>> > >>> >>> Von: Zheng Lin Edwin Yeo<mailto:edwinye...@gmail.com> >>>> > >>> >>> Gesendet: Mittwoch, 20. Februar 2019 09:29 >>>> > >>> >>> An: solr-user@lucene.apache.org<mailto: >>>> solr-user@lucene.apache.org >>>> > > >>>> > >>> >>> Betreff: Re: RegexReplaceProcessorFactory pattern to detect >>>> > multiple >>>> > >>> \n >>>> > >>> >>> >>>> > >>> >>> >>>> > >>> >>> >>>> > >>> >>> Hi Jörn , >>>> > >>> >>> >>>> > >>> >>> Do you mean the regex is not correct? >>>> > >>> >>> >>>> > >>> >>> We are already using two RegexReplaceProcessorFactory steps, >>>> like >>>> > >>> the one >>>> > >>> >>> shown below. The output that we get is still the same. 
>>>> > >>> >>> >>>> > >>> >>> <processor class="solr.RegexReplaceProcessorFactory"> >>>> > >>> >>> <str name="fieldName">content</str> >>>> > >>> >>> <str name="pattern">([ \t]*\r?\n){2,}</str> >>>> > >>> >>> <str name="replacement"><br><br></str> >>>> > >>> >>> <bool name="literalReplacement">true</bool> >>>> > >>> >>> <processor> >>>> > >>> >>> >>>> > >>> >>> <processor class="solr.RegexReplaceProcessorFactory"> >>>> > >>> >>> <str name="fieldName">content</str> >>>> > >>> >>> <str name="pattern">([ \t]*\r?\n){1,}</str> >>>> > >>> >>> <str name="replacement"><br></str> >>>> > >>> >>> <bool name="literalReplacement">true</bool> >>>> > >>> >>> <processor> >>>> > >>> >>> >>>> > >>> >>> Regards, >>>> > >>> >>> Edwin >>>> > >>> >>> >>>> > >>> >>> On Wed, 20 Feb 2019 at 16:03, Jörn Franke < >>>> jornfra...@gmail.com> >>>> > >>> wrote: >>>> > >>> >>> >>>> > >>> >>> > Then you need two regexprocessfactory steps >>>> > >>> >>> > >>>> > >>> >>> > > Am 20.02.2019 um 08:12 schrieb Zheng Lin Edwin Yeo < >>>> > >>> >>> edwinye...@gmail.com >>>> > >>> >>> > >: >>>> > >>> >>> > > >>>> > >>> >>> > > Hi, >>>> > >>> >>> > > >>>> > >>> >>> > > Thanks for the reply. >>>> > >>> >>> > > >>>> > >>> >>> > > Do you know of any regex online tool that works correctly >>>> for >>>> > >>> Java >>>> > >>> >>> regex? >>>> > >>> >>> > > I tried to find some, but they are not working properly. >>>> > >>> >>> > > >>>> > >>> >>> > > Yes, our plan is to replace more than one \n with >>>> <br><br>, and >>>> > >>> >>> single \n >>>> > >>> >>> > > with single <br>. >>>> > >>> >>> > > >>>> > >>> >>> > > Regards, >>>> > >>> >>> > > Edwin >>>> > >>> >>> > > >>>> > >>> >>> > >> On Wed, 20 Feb 2019 at 14:59, Jörn Franke < >>>> > jornfra...@gmail.com >>>> > >>> > >>>> > >>> >>> wrote: >>>> > >>> >>> > >> >>>> > >>> >>> > >> Solr uses Java regex matching, so i doubt there is a bug >>>> - it >>>> > >>> would >>>> > >>> >>> then >>>> > >>> >>> > >> be in the JDK. 
Try out in a regex online Tool that >>>> supports >>>> > Java >>>> > >>> >>> regex >>>> > >>> >>> > for >>>> > >>> >>> > >> your solution. >>>> > >>> >>> > >> >>>> > >>> >>> > >> I believe you want to have 2 regex process factories: >>>> > >>> >>> > >> One that deals with single \n and one that deals with >>>> more >>>> > than >>>> > >>> one >>>> > >>> >>> \n >>>> > >>> >>> > >> >>>> > >>> >>> > >>> Am 20.02.2019 um 06:17 schrieb Zheng Lin Edwin Yeo < >>>> > >>> >>> > edwinye...@gmail.com >>>> > >>> >>> > >>> : >>>> > >>> >>> > >>> >>>> > >>> >>> > >>> Hi, >>>> > >>> >>> > >>> >>>> > >>> >>> > >>> We have tried with the following pattern ([ >>>> \t]*\r?\n){2,} >>>> > and >>>> > >>> >>> > >>> configuration: >>>> > >>> >>> > >>> >>>> > >>> >>> > >>> <processor class="solr.RegexReplaceProcessorFactory"> >>>> > >>> >>> > >>> <str name="fieldName">content</str> >>>> > >>> >>> > >>> <str name="pattern">([ \t]*\r?\n){2,}</str> >>>> > >>> >>> > >>> <str name="replacement"><br><br></str> >>>> > >>> >>> > >>> <bool name="literalReplacement">true</bool> >>>> > >>> >>> > >>> </processor> >>>> > >>> >>> > >>> >>>> > >>> >>> > >>> However, the issue is still occurring. >>>> > >>> >>> > >>> >>>> > >>> >>> > >>> Anyone else is able to help? >>>> > >>> >>> > >>> >>>> > >>> >>> > >>> Regards, >>>> > >>> >>> > >>> Edwin >>>> > >>> >>> > >>> >>>> > >>> >>> > >>> On Fri, 15 Feb 2019 at 11:47, Zheng Lin Edwin Yeo < >>>> > >>> >>> > edwinye...@gmail.com> >>>> > >>> >>> > >>> wrote: >>>> > >>> >>> > >>> >>>> > >>> >>> > >>>> Hi, >>>> > >>> >>> > >>>> >>>> > >>> >>> > >>>> For your info, this issue is occurring in Solr 7.7.0 as >>>> > well. 
>>>> > >>> >>> > >>>> >>>> > >>> >>> > >>>> Regards, >>>> > >>> >>> > >>>> Edwin >>>> > >>> >>> > >>>> >>>> > >>> >>> > >>>> On Tue, 12 Feb 2019 at 00:10, Zheng Lin Edwin Yeo < >>>> > >>> >>> > edwinye...@gmail.com >>>> > >>> >>> > >>> >>>> > >>> >>> > >>>> wrote: >>>> > >>> >>> > >>>> >>>> > >>> >>> > >>>>> Hi, >>>> > >>> >>> > >>>>> >>>> > >>> >>> > >>>>> Should we report this as a bug in Solr? >>>> > >>> >>> > >>>>> >>>> > >>> >>> > >>>>> Regards, >>>> > >>> >>> > >>>>> Edwin >>>> > >>> >>> > >>>>> >>>> > >>> >>> > >>>>> On Fri, 8 Feb 2019 at 22:18, Zheng Lin Edwin Yeo < >>>> > >>> >>> > edwinye...@gmail.com >>>> > >>> >>> > >>> >>>> > >>> >>> > >>>>> wrote: >>>> > >>> >>> > >>>>> >>>> > >>> >>> > >>>>>> Hi Paul, >>>> > >>> >>> > >>>>>> >>>> > >>> >>> > >>>>>> Regarding the regex (\n\s*){2,} that we are using, >>>> when we >>>> > >>> try >>>> > >>> >>> in on >>>> > >>> >>> > >>>>>> https://regex101.com/, it is able to give us the >>>> correct >>>> > >>> >>> result for >>>> > >>> >>> > >> all >>>> > >>> >>> > >>>>>> the examples (ie: All of them will only have >>>> <br><br>, and >>>> > >>> not >>>> > >>> >>> more >>>> > >>> >>> > >> than >>>> > >>> >>> > >>>>>> that like what we are getting in Solr in our earlier >>>> > >>> examples). >>>> > >>> >>> > >>>>>> >>>> > >>> >>> > >>>>>> Could there be a possibility of a bug in Solr? >>>> > >>> >>> > >>>>>> >>>> > >>> >>> > >>>>>> Regards, >>>> > >>> >>> > >>>>>> Edwin >>>> > >>> >>> > >>>>>> >>>> > >>> >>> > >>>>>> On Fri, 8 Feb 2019 at 00:33, Zheng Lin Edwin Yeo < >>>> > >>> >>> > >> edwinye...@gmail.com> >>>> > >>> >>> > >>>>>> wrote: >>>> > >>> >>> > >>>>>> >>>> > >>> >>> > >>>>>>> Hi Paul, >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> We have tried it with the space preceeding the \n >>>> i.e. 
>>>> > <str >>>> > >>> >>> > >>>>>>> name="pattern">(\s*\n){2,}</str>, with the following >>>> > regex >>>> > >>> >>> pattern: >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> <processor >>>> class="solr.RegexReplaceProcessorFactory"> >>>> > >>> >>> > >>>>>>> <str name="fieldName">content</str> >>>> > >>> >>> > >>>>>>> <str name="pattern">(\s*\n){2,}</str> >>>> > >>> >>> > >>>>>>> <str name="replacement"><br><br></str> >>>> > >>> >>> > >>>>>>> </processor> >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> However, we are also getting the exact same results >>>> as >>>> > the >>>> > >>> >>> earlier >>>> > >>> >>> > >>>>>>> Example 1, 2 and 3. >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> As for your point 2 on perhaps in the data you have >>>> other >>>> > >>> (non >>>> > >>> >>> > >>>>>>> printing) characters than \n, we have find that >>>> there are >>>> > >>> no >>>> > >>> >>> non >>>> > >>> >>> > >> printing >>>> > >>> >>> > >>>>>>> characters. It is just next line with a space. You >>>> can >>>> > >>> refer >>>> > >>> >>> to the >>>> > >>> >>> > >>>>>>> original content in the same examples below. 
>>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> Example 1: The sentence that the above regex >>>> pattern is >>>> > >>> working >>>> > >>> >>> > >>>>>>> correctly >>>> > >>> >>> > >>>>>>> *Original content in EML file:* >>>> > >>> >>> > >>>>>>> Dear Sir, >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> I am terminating >>>> > >>> >>> > >>>>>>> *Original content:* Dear Sir, \n\n \n \n\n I am >>>> > >>> terminating >>>> > >>> >>> > >>>>>>> *Index content: * Dear Sir, <br><br>I am >>>> terminating >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> Example 2: The sentence that the above regex >>>> pattern is >>>> > >>> >>> partially >>>> > >>> >>> > >>>>>>> working (as you can see, instead of 2 <br>, there >>>> are 4 >>>> > >>> <br>) >>>> > >>> >>> > >>>>>>> *Original content in EML file:* >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> *exalted* >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> *Psalm 89:17* >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> 3 Choa Chu Kang Avenue 4 >>>> > >>> >>> > >>>>>>> *Original content:* exalted \n \n\n Psalm 89:17 >>>> \n\n >>>> > >>> >>> \n\n 3 >>>> > >>> >>> > >>>>>>> Choa Chu Kang Avenue 4, Singapore >>>> > >>> >>> > >>>>>>> *Index content: *exalted <br><br>Psalm 89:17 >>>> <br><br> >>>> > >>> >>> <br><br>3 >>>> > >>> >>> > >>>>>>> Choa Chu Kang Avenue 4, Singapore >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> Example 3: The sentence that the above regex >>>> pattern is >>>> > >>> >>> partially >>>> > >>> >>> > >>>>>>> working (as you can see, instead of 2 <br>, there >>>> are 4 >>>> > >>> <br>) >>>> > >>> >>> > >>>>>>> *Original content in EML file:* >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> http://www.concordpri.moe.edu.sg/ >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> 
> >>>>>>> On Tue, Dec 18, 2018 at 10:07 AM >>>> > >>> >>> > >>>>>>> *Original content:* >>>> http://www.concordpri.moe.edu.sg/ >>>> > >>> \n\n >>>> > >>> >>> > \n\n >>>> > >>> >>> > >> \n >>>> > >>> >>> > >>>>>>> \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n >>>> > >>> \n\n\n On >>>> > >>> >>> Tue, >>>> > >>> >>> > >> Dec 18, >>>> > >>> >>> > >>>>>>> 2018 at 10:07 AM >>>> > >>> >>> > >>>>>>> *Index content: *http://www.concordpri.moe.edu.sg/ >>>> > >>> <br><br> >>>> > >>> >>> > >>>>>>> <br><br>On Tue, Dec 18, 2018 at 10:07 AM >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> Appreciate any other ideas or suggestions that you >>>> may >>>> > >>> have. >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> Thank you. >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>> Regards, >>>> > >>> >>> > >>>>>>> Edwin >>>> > >>> >>> > >>>>>>> >>>> > >>> >>> > >>>>>>>> On Thu, 7 Feb 2019 at 22:49, < >>>> paul.d...@ub.unibe.ch> >>>> > >>> wrote: >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> Hi Edwin >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> 1. Sorry, the pattern was wrong, the space should >>>> > preceed >>>> > >>> >>> the \n >>>> > >>> >>> > >>>>>>>> i.e. <str name="pattern">(\s*\n){2,}</str> >>>> > >>> >>> > >>>>>>>> 2. Perhaps in the data you have other (non >>>> printing) >>>> > >>> >>> characters >>>> > >>> >>> > >>>>>>>> than \n? >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> Gesendet von Mail< >>>> > >>> >>> https://go.microsoft.com/fwlink/?LinkId=550986> >>>> > >>> >>> > >> für >>>> > >>> >>> > >>>>>>>> Windows 10 >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> Von: Zheng Lin Edwin Yeo<mailto: >>>> edwinye...@gmail.com> >>>> > >>> >>> > >>>>>>>> Gesendet: Donnerstag, 7. 
Februar 2019 15:23 >>>> > >>> >>> > >>>>>>>> An: solr-user@lucene.apache.org<mailto: >>>> > >>> >>> > solr-user@lucene.apache.org> >>>> > >>> >>> > >>>>>>>> Betreff: Re: RegexReplaceProcessorFactory pattern >>>> to >>>> > >>> detect >>>> > >>> >>> > >> multiple \n >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> Hi Paul, >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> We have tried this suggested regex pattern as >>>> follow: >>>> > >>> >>> > >>>>>>>> <processor >>>> class="solr.RegexReplaceProcessorFactory"> >>>> > >>> >>> > >>>>>>>> <str name="fieldName">content</str> >>>> > >>> >>> > >>>>>>>> <str name="pattern">(\n\s*){2,}</str> >>>> > >>> >>> > >>>>>>>> <str name="replacement"><br><br></str> >>>> > >>> >>> > >>>>>>>> </processor> >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> But we still have exactly the same problem of >>>> Example >>>> > 1,2 >>>> > >>> and >>>> > >>> >>> 3 >>>> > >>> >>> > >> below. >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> Example 1: The sentence that the above regex >>>> pattern is >>>> > >>> >>> working >>>> > >>> >>> > >>>>>>>> correctly >>>> > >>> >>> > >>>>>>>> *Original content:* Dear Sir, \n\n \n \n\n I am >>>> > >>> >>> terminating >>>> > >>> >>> > >>>>>>>> *Index content: * Dear Sir, <br><br>I am >>>> terminating >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> Example 2: The sentence that the above regex >>>> pattern is >>>> > >>> >>> partially >>>> > >>> >>> > >>>>>>>> working >>>> > >>> >>> > >>>>>>>> (as you can see, instead of 2 <br>, there are 4 >>>> <br>) >>>> > >>> >>> > >>>>>>>> *Original content:* exalted \n \n\n Psalm 89:17 >>>> > \n\n >>>> > >>> >>> \n\n >>>> > >>> >>> > 3 >>>> > >>> >>> > >>>>>>>> Choa >>>> > >>> >>> > >>>>>>>> Chu Kang Avenue 4, Singapore >>>> > >>> >>> > >>>>>>>> *Index content: *exalted <br><br>Psalm 89:17 >>>> <br><br> >>>> > >>> >>> > <br><br>3 >>>> > >>> >>> > >>>>>>>> Choa >>>> > >>> >>> > >>>>>>>> Chu Kang Avenue 4, 
Singapore >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> Example 3: The sentence that the above regex >>>> pattern is >>>> > >>> >>> partially >>>> > >>> >>> > >>>>>>>> working >>>> > >>> >>> > >>>>>>>> (as you can see, instead of 2 <br>, there are 4 >>>> <br>) >>>> > >>> >>> > >>>>>>>> *Original content:* >>>> http://www.concordpri.moe.edu.sg/ >>>> > >>> \n\n >>>> > >>> >>> > \n\n >>>> > >>> >>> > >>>>>>>> \n \n\n >>>> > >>> >>> > >>>>>>>> \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n >>>> \n\n\n >>>> > On >>>> > >>> >>> Tue, Dec >>>> > >>> >>> > >> 18, >>>> > >>> >>> > >>>>>>>> 2018 >>>> > >>> >>> > >>>>>>>> at 10:07 AM >>>> > >>> >>> > >>>>>>>> *Index content: *http://www.concordpri.moe.edu.sg/ >>>> > >>> <br><br> >>>> > >>> >>> > >>>>>>>> <br><br>On >>>> > >>> >>> > >>>>>>>> Tue, Dec 18, 2018 at 10:07 AM >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> Any further suggestion? >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> Thank you. >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>> Regards, >>>> > >>> >>> > >>>>>>>> Edwin >>>> > >>> >>> > >>>>>>>> >>>> > >>> >>> > >>>>>>>>> On Thu, 7 Feb 2019 at 22:20, < >>>> paul.d...@ub.unibe.ch> >>>> > >>> wrote: >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> To avoid the «\n+\s*» matching too many \n and >>>> then >>>> > >>> failing >>>> > >>> >>> on >>>> > >>> >>> > the >>>> > >>> >>> > >>>>>>>> {2,} >>>> > >>> >>> > >>>>>>>>> part you could try >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> <str name="pattern">(\n\s*){2,}</str> >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> If you also want to match CRLF then >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> <str name="pattern">(\r?\n\s*){2,}</str> >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> Gesendet von Mail< 
>>>> > >>> >>> https://go.microsoft.com/fwlink/?LinkId=550986 >>>> > >>> >>> > > >>>> > >>> >>> > >>>>>>>> für >>>> > >>> >>> > >>>>>>>>> Windows 10 >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> Von: Zheng Lin Edwin Yeo<mailto: >>>> edwinye...@gmail.com> >>>> > >>> >>> > >>>>>>>>> Gesendet: Donnerstag, 7. Februar 2019 15:10 >>>> > >>> >>> > >>>>>>>>> An: solr-user@lucene.apache.org<mailto: >>>> > >>> >>> > solr-user@lucene.apache.org >>>> > >>> >>> > >>> >>>> > >>> >>> > >>>>>>>>> Betreff: Re: RegexReplaceProcessorFactory pattern >>>> to >>>> > >>> detect >>>> > >>> >>> > >> multiple >>>> > >>> >>> > >>>>>>>> \n >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> Hi Paul, >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> Thanks for your reply. >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> When I use this pattern: >>>> > >>> >>> > >>>>>>>>> <processor >>>> class="solr.RegexReplaceProcessorFactory"> >>>> > >>> >>> > >>>>>>>>> <str name="fieldName">content</str> >>>> > >>> >>> > >>>>>>>>> <str name="pattern">(\n+\s*){2,}</str> >>>> > >>> >>> > >>>>>>>>> <str >>>> name="replacement"><br><br></str> >>>> > >>> >>> > >>>>>>>>> </processor> >>>> > >>> >>> > >>>>>>>>> >>>> > >>> >>> > >>>>>>>>> It is working for some sentence within the same >>>> content >>>> > >>> and >>>> > >>> >>> not >>>> > >>> >>> > >>>>>>>> working for >>>> > >>> >>> > >>>>>>>>> some sentences. 
> Please see below for one example that works and two that only partially work:
>
> Example 1: A sentence where the above regex pattern works correctly
> *Original content:* Dear Sir, \n\n \n \n\n I am terminating
> *Index content:* Dear Sir, <br><br>I am terminating
>
> Example 2: A sentence where the above regex pattern only partially works (as you can see, instead of 2 <br>, there are 4 <br>)
> *Original content:* exalted \n \n\n Psalm 89:17 \n\n \n\n 3 Choa Chu Kang Avenue 4, Singapore
> *Index content:* exalted <br><br>Psalm 89:17 <br><br> <br><br>3 Choa Chu Kang Avenue 4, Singapore
>
> Example 3: A sentence where the above regex pattern only partially works (as you can see, instead of 2 <br>, there are 4 <br>)
> *Original content:* http://www.concordpri.moe.edu.sg/ \n\n \n\n \n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n On Tue, Dec 18, 2018 at 10:07 AM
> *Index content:* http://www.concordpri.moe.edu.sg/ <br><br> <br><br>On Tue, Dec 18, 2018 at 10:07 AM
>
> Could you help us see what is wrong?
>
> Thank you.
>
> Regards,
> Edwin
>
>> On Thu, 7 Feb 2019 at 21:24, <paul.d...@ub.unibe.ch> wrote:
>>
>> You don’t say what happens, just that it is not working. I assume nothing is replaced? Perhaps the pattern should be
>>
>> <str name="pattern">"(\n\s*){2,}"</str>
>>
>> ??
>>
>> Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10
>>
>> From: Zheng Lin Edwin Yeo<mailto:edwinye...@gmail.com>
>> Sent: Thursday, 7 February 2019 14:08
>> To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
>> Subject: RegexReplaceProcessorFactory pattern to detect multiple \n
>>
>> Hi,
>>
>> I am trying to use the RegexReplaceProcessorFactory to replace two or more \n with any number of spaces between them (e.g. \n\n, \n \n, \n \n \n \n) with two <br>.
>>
>> I use the following regex pattern, and it works when I test it on regex101.com.
>> But it does not work when I put it inside the RegexReplaceProcessorFactory as below:
>>
>> <updateRequestProcessorChain name="removeCode">
>>   <processor class="solr.RegexReplaceProcessorFactory">
>>     <str name="fieldName">content</str>
>>     <str name="pattern">"(\\n\s*){2,}"</str>
>>     <str name="replacement"><br><br></str>
>>   </processor>
>> </updateRequestProcessorChain>
>>
>> To explain my regex pattern further: \s* tells the regex to match any whitespace that follows a \n, and {2,} tells it to match two or more occurrences of that group.
>>
>> Please let me know what is wrong and how I should do it.
>>
>> I am using Solr 7.6.0.
>>
>> Regards,
>> Edwin
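
For anyone who wants to try the patterns from this thread outside Solr, here is a minimal Java sketch using java.util.regex. It rests on two assumptions that the thread does not confirm. First, that RegexReplaceProcessorFactory passes the pattern text to the Java regex engine unchanged, so the surrounding quotes and the doubled backslash in the first config become literal characters. Second, that the stray character between the newlines in Example 2 is a non-breaking space (U+00A0); that is only a guess. The class name, field names and sample strings are made up for illustration.

```java
import java.util.regex.Pattern;

class NewlineCollapseDemo {
    // Regex text (\n\s*){2,} as it would appear in solrconfig.xml
    static final Pattern PLAIN = Pattern.compile("(\\n\\s*){2,}");
    // Regex text "(\\n\s*){2,}" from the first post: the quotes are literal,
    // and \\n means a literal backslash followed by the letter n
    static final Pattern QUOTED = Pattern.compile("\"(\\\\n\\s*){2,}\"");
    // Regex text (\n\W*){2,} from the resolution at the top of the thread
    static final Pattern NON_WORD = Pattern.compile("(\\n\\W*){2,}");

    // Replace every match of the given pattern with two <br> tags
    static String collapse(Pattern pattern, String text) {
        return pattern.matcher(text).replaceAll("<br><br>");
    }

    public static void main(String[] args) {
        // Hypothetical inputs modelled on Example 2
        String plainSpaces = "Psalm 89:17 \n\n \n\n 3 Choa Chu Kang";
        String withNbsp = "Psalm 89:17 \n\n\u00A0\n\n 3 Choa Chu Kang";

        // Ordinary spaces: the whole run collapses into one <br><br>
        System.out.println(collapse(PLAIN, plainSpaces));
        // The quoted pattern needs a literal " in the text, so nothing changes
        System.out.println(collapse(QUOTED, plainSpaces).equals(plainSpaces));
        // By default \s does not match U+00A0, so the run splits into two matches
        System.out.println(collapse(PLAIN, withNbsp));
        // \W matches any non-word character, including U+00A0
        System.out.println(collapse(NON_WORD, withNbsp));
    }
}
```

One design note: \W is broader than whitespace, so it also consumes punctuation that directly follows a newline (for example an opening bracket at the start of the next paragraph). If that matters for your content, a narrower character class may be a safer choice.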