Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-03-20 Thread Zheng Lin Edwin Yeo
you need to identify the whitespace characters that are causing the problem.
From: Zheng Lin Edwin Yeo; Sent: Wednesday, 13 March 2019 03:25:39; To: solr-user@lucene.apache.org; Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \
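One way to follow this advice is to print the code point of every invisible character in a sample field value. The sketch below is a hypothetical helper; the class name `WhitespaceDump` is illustrative and not part of Solr. It uses only `java.lang.Character` and reports the following:
- spaces and line endings;
- control characters;
- no-break spaces such as U+00A0;
- invisible format characters such as U+200B.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper (illustrative name, not a Solr class).
final class WhitespaceDump {
    // Returns "U+XXXX" for every code point that is Java whitespace, a Unicode
    // space separator (e.g. U+00A0 NO-BREAK SPACE), an ISO control character,
    // or an invisible format character (e.g. U+200B ZERO WIDTH SPACE).
    static List<String> suspiciousChars(String text) {
        List<String> found = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            boolean invisible = Character.isWhitespace(cp)
                    || Character.isSpaceChar(cp)
                    || Character.isISOControl(cp)
                    || Character.getType(cp) == Character.FORMAT;
            if (invisible) {
                found.add(String.format("U+%04X", cp));
            }
        });
        return found;
    }

    public static void main(String[] args) {
        // Sample assumed for illustration: a line break run containing a no-break space
        String sample = "exalted \n\u00A0\n\r\n Psalm";
        System.out.println(suspiciousChars(sample));
    }
}
```

Run it over the raw text extracted from one of the problem EML files. The output shows whether anything other than plain `\n` and spaces sits between the line breaks.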

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-03-13 Thread Zheng Lin Edwin Yeo
using the problem.
From: Zheng Lin Edwin Yeo; Sent: Wednesday, 13 March 2019 03:25:39; To: solr-user@lucene.apache.org; Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
Hi, We have managed to reso

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-03-12 Thread Zheng Lin Edwin Yeo
Regards, Edwin
On Thu, 7 Mar 2019 at 20:44, wrote:
Hi Edwin, I can’t understand why the pattern is not working a

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-03-11 Thread Zheng Lin Edwin Yeo
I can’t understand why the pattern is not working and where the spaces between the <br> tags are coming from. However, it should be possible to allow for spaces between the <br> tags in the second match pattern, i.e. the 2nd pattern:
(<br>[

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-03-07 Thread Zheng Lin Edwin Yeo
(<br>[ \t\x0b\f]*){3,}
/Paul
From: Zheng Li
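Below is a minimal Java sketch of the whitespace-tolerant second pattern, written as `(<br>[ \t\x0b\f]*){3,}`. Each run is replaced with two `<br>` tags; that replacement is assumed from the rest of the thread. Solr's RegexReplaceProcessorFactory takes a Java regular expression, so `java.util.regex` should behave the same way here. The sketch also shows that in Java a `]` outside a character class is matched literally. A variant written with `]]*` would therefore demand one whitespace character after every `<br>`, and it would not match `<br><br><br>`. `BrCollapse` is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the second pattern from the thread.
final class BrCollapse {
    // Three or more <br> tags, each optionally followed by horizontal
    // whitespace (space, tab, vertical tab, form feed).
    static final Pattern BR_RUN = Pattern.compile("(<br>[ \\t\\x0b\\f]*){3,}");

    // Replace every matching run with exactly two <br> tags (plain literal text).
    static String collapse(Pattern pattern, String text) {
        return pattern.matcher(text).replaceAll("<br><br>");
    }

    public static void main(String[] args) {
        // Prints: Dear Sir,<br><br>I am terminating
        System.out.println(collapse(BR_RUN, "Dear Sir,<br> <br>\t<br>I am terminating"));

        // A stray ']' after the class is a literal, so this variant needs one
        // whitespace character after every <br> and leaves the input unchanged.
        Pattern stray = Pattern.compile("(<br>[ \\t\\x0b\\f]]*){3,}");
        System.out.println(collapse(stray, "a<br><br><br>b"));
    }
}
```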

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-03-07 Thread Zheng Lin Edwin Yeo
From: Zheng Lin Edwin Yeo <edwinye...@gmail.com>; Sent: Wednesday, 6 March 2019 16:28; To: solr-user@lucene.apache.org; Subject: Re: RegexReplace

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-03-06 Thread Zheng Lin Edwin Yeo
content
(<br><br>){3,}
<br><br>
true
However, none of the \n is being removed this time round. Is the order and/or the pattern correct?
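The four values above appear to be the processor's field name, pattern, replacement and a boolean flag; their element names were lost in the archive. This hedged sketch shows two properties of the pattern that may explain why nothing changed:
- `(<br><br>){3,}` only matches `<br>` tags, so raw `\n` characters are untouched unless an earlier step has already turned them into `<br>`.
- It needs at least six adjacent tags (three pairs) with nothing between them.

The class name `PairRun` is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper reproducing the pattern and replacement from the message.
final class PairRun {
    // Three or more consecutive "<br><br>" pairs, i.e. at least six adjacent <br> tags.
    static final Pattern PAIRS = Pattern.compile("(<br><br>){3,}");

    static String apply(String text) {
        return PAIRS.matcher(text).replaceAll("<br><br>");
    }

    public static void main(String[] args) {
        System.out.println(apply("a\n\n\n\nb"));                  // raw newlines: nothing to match
        System.out.println(apply("a<br><br><br><br>b"));          // only two pairs: unchanged
        System.out.println(apply("a<br><br><br><br><br><br>b"));  // three pairs: a<br><br>b
    }
}
```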

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-03-05 Thread Zheng Lin Edwin Yeo
<br>
Now all line endings and preceding whitespace characters should be changed to ‘<br>’.
The second pattern replacement should replace 3 or more ‘<br>’ sequences
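The exact step-1 pattern is cut off in this snippet, so the sketch below assumes one: `[ \t\x0b\f\r]*\n`. That is, optional horizontal whitespace (and a carriage return) before a single `\n`. The sketch contrasts it with `\s*\n`. Java's `\s` also matches `\n`, so the greedy `\s*` swallows a whole blank-line run and yields a single `<br>`. That would leave nothing for the second pattern to count. The sample treats the `\n` in the thread's examples as real newline characters, and `LineEndingsToBr` is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the step-1 pattern is an assumption, not quoted from the thread.
final class LineEndingsToBr {
    // Optional horizontal whitespace (and \r) followed by exactly one \n:
    // every line ending becomes its own <br>.
    static final Pattern HORIZONTAL_THEN_NL = Pattern.compile("[ \\t\\x0b\\f\\r]*\\n");
    // For contrast: \s includes \n, so one greedy match covers a whole run.
    static final Pattern ANY_WS_THEN_NL = Pattern.compile("\\s*\\n");

    static String toBr(Pattern p, String text) {
        return p.matcher(text).replaceAll("<br>");
    }

    public static void main(String[] args) {
        String original = "exalted \n \n\n Psalm";
        System.out.println(toBr(HORIZONTAL_THEN_NL, original)); // exalted<br><br><br> Psalm
        System.out.println(toBr(ANY_WS_THEN_NL, original));     // exalted<br> Psalm
    }
}
```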

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-03-05 Thread Zheng Lin Edwin Yeo
<br><br>
Hope this approach works. Sorry for not replying earlier, and best regards,
Paul
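Below is a minimal sketch of the two steps run together in the order described:
- Step 1 uses an assumed pattern, `[ \t\x0b\f\r]*\n`, because the actual one is not visible in these snippets.
- Step 2 is the whitespace-tolerant `(<br>[ \t\x0b\f]*){3,}` pattern from the thread.

`TwoStepCleanup` is a hypothetical name. The sample string mirrors the "exalted ... Psalm 89:17 ... 3" example, with its `\n` taken as real newlines.

```java
import java.util.regex.Pattern;

// Hypothetical two-step cleanup mirroring the approach described in the thread.
final class TwoStepCleanup {
    // Step 1 (assumed): each line ending plus preceding horizontal whitespace -> <br>
    static final Pattern STEP1 = Pattern.compile("[ \\t\\x0b\\f\\r]*\\n");
    // Step 2: three or more <br>, optionally space-separated -> exactly two
    static final Pattern STEP2 = Pattern.compile("(<br>[ \\t\\x0b\\f]*){3,}");

    static String clean(String text) {
        String marked = STEP1.matcher(text).replaceAll("<br>");
        return STEP2.matcher(marked).replaceAll("<br><br>");
    }

    public static void main(String[] args) {
        // Prints: exalted<br><br>Psalm 89:17<br><br>3
        System.out.println(clean("exalted \n \n\n Psalm 89:17 \n\n \n\n 3"));
        // A single line break is kept as one <br>: line one<br>line two
        System.out.println(clean("line one\nline two"));
    }
}
```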

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-03-05 Thread Zheng Lin Edwin Yeo
nd best regards,
Paul
From: Zheng Lin Edwin Yeo <edwinye...@gmail.com>; Sent: Tuesday, 5 March 2019 03:35; To: solr-user@lu

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-03-04 Thread Zheng Lin Edwin Yeo
If the second step is executed first, then you will get the unwanted 4
From: Zheng
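Order matters because the second pattern only sees `<br>` tags. This sketch counts the `<br>` tags left after running the steps in each order on the "Psalm 89:17 \n\n \n\n 3" sample. It uses an assumed step-1 pattern, `[ \t\x0b\f\r]*\n`, and the class name `StepOrder` is hypothetical. The intended order leaves 2 tags. Running the `<br>` collapse first does nothing, because there are no tags yet, and step 1 then produces 4.

```java
import java.util.regex.Pattern;

// Hypothetical demonstration of why the two replacements must run in order.
final class StepOrder {
    static final Pattern STEP1 = Pattern.compile("[ \\t\\x0b\\f\\r]*\\n");        // assumed
    static final Pattern STEP2 = Pattern.compile("(<br>[ \\t\\x0b\\f]*){3,}");

    static String step1(String s) { return STEP1.matcher(s).replaceAll("<br>"); }
    static String step2(String s) { return STEP2.matcher(s).replaceAll("<br><br>"); }

    // Count non-overlapping occurrences of "<br>".
    static int countBr(String s) {
        int n = 0;
        for (int i = s.indexOf("<br>"); i >= 0; i = s.indexOf("<br>", i + 4)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "Psalm 89:17 \n\n \n\n 3";
        System.out.println(countBr(step2(step1(text)))); // intended order: 2
        System.out.println(countBr(step1(step2(text)))); // reversed order: 4
    }
}
```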

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-24 Thread Zheng Lin Edwin Yeo
Sent: Wednesday, 20 February 2019 09:29; To: solr-user@lucene.apache.org; Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
Hi Jörn

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-20 Thread Zheng Lin Edwin Yeo
From: Zheng Lin Edwin Yeo <edwinye...@gmail.com>; Sent: Wednesday, 20 February 2019 09:29; To: solr-user@lucene.apache.org; Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
Hi Jörn,

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-20 Thread Zheng Lin Edwin Yeo
content:* Dear Sir, I am terminating
Example 2: The sentence for which the above regex pattern is only partially working (as you can see, instead of 2 , there are 4 )
*Origina

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-20 Thread Zheng Lin Edwin Yeo
Sent: Wednesday, 20 February 2019 08:13; To: solr-user@lucene.apache.org; Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
Hi, Thanks for the reply.

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-20 Thread Jörn Franke
3 Choa Chu Kang Avenue 4
*Original content:* exalted \n \n\n Psalm 89:17 \n\n \n\n 3 Choa Chu Kang Avenue 4, Singapore
*Index content:* exalted Psalm 89:17 3

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-19 Thread Zheng Lin Edwin Yeo
89:17 3 Choa Chu Kang Avenue 4, Singapore
Example 3: The sentence for which the above regex pattern is only partially working (as you can see, instead of 2 , there are 4 )
*Ori

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-19 Thread Jörn Franke
nal content in EML file:*
http://www.concordpri.moe.edu.sg/
O

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-19 Thread Zheng Lin Edwin Yeo
On Tue, Dec 18, 2018 at 10:07 AM
*Original content:* http://www.concordpri.moe.edu.sg/ \n\n \n\n \n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n On Tue, Dec 18, 2018 at 10:07 AM
*Index

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-14 Thread Zheng Lin Edwin Yeo
Example 2: The sentence for which the above regex pattern is only partially working (as you can see, instead of 2 , there are 4 )
*Original content:* exalted \n \n\n Psalm 89:17 \n\n \n\n 3 Choa Chu Kang A

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-11 Thread Zheng Lin Edwin Yeo
Hi Edwin,
1. Sorry, the pattern was wrong; the space should precede the \n, i.e. (\s*\n){2,}
2. Perhaps in the data you have other (non-printing) characters than \n?
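Point 2 can be tested directly. In Java, `\s` matches only `[ \t\n\x0B\f\r]` by default. A no-break space (U+00A0), which can appear when HTML mail is converted to text, therefore splits a blank-line run into two matches. Adding the embedded flag `(?U)` (UNICODE_CHARACTER_CLASS) widens `\s` to Unicode whitespace. This hedged sketch shows one possible cause, not a diagnosis of the actual data. The class name, the `<br><br>` replacement and the sample text are illustrative assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical check of how non-ASCII whitespace affects (\s*\n){2,}.
final class NbspCheck {
    // Default Java \s covers only space, \t, \n, \x0B, \f and \r.
    static final Pattern ASCII_WS = Pattern.compile("(\\s*\\n){2,}");
    // (?U) enables UNICODE_CHARACTER_CLASS, so \s also covers U+00A0.
    static final Pattern UNICODE_WS = Pattern.compile("(?U)(\\s*\\n){2,}");

    static String collapse(Pattern p, String text) {
        return p.matcher(text).replaceAll("<br><br>");
    }

    public static void main(String[] args) {
        // Two blank lines separated by a line holding only a no-break space
        String text = "a\n\n\u00A0\n\nb";
        // Default \s: the run splits, giving a<br><br>[NBSP]<br><br>b
        System.out.println(collapse(ASCII_WS, text).replace("\u00A0", "[NBSP]"));
        // Unicode-aware \s: a<br><br>b
        System.out.println(collapse(UNICODE_WS, text));
    }
}
```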

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-08 Thread Zheng Lin Edwin Yeo
he data you have other (non-printing) characters than \n?
From: Zheng Lin Edwin Yeo <edwinye...@gmail.com>

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-07 Thread Zheng Lin Edwin Yeo
Lin Edwin Yeo <edwinye...@gmail.com>; Sent: Thursday, 7 February 2019 15:23; To: solr-user@lucene.apache.org; Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
Hi Paul, We hav

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-07 Thread Zheng Lin Edwin Yeo
From: Zheng Lin Edwin Yeo <edwinye...@gmail.com>; Sent: Thursday, 7 February 2019 15:10; To: solr-user@lucene.apache.org; Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
Hi Paul,

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-07 Thread Zheng Lin Edwin Yeo
Hi Paul,
Thanks for your reply. When I use this pattern:
content
(\n+\s*){2,}

It works for some sentences within the same content but not for others. Please see below for one that is working and another that is not working (partially working): Example
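Below is a sketch of the pattern quoted here, `(\n+\s*){2,}`, with a `<br><br>` replacement assumed from later in the thread. On plain spaces and newlines, one match spans the whole run, so each run becomes exactly two `<br>` tags. The space before the run survives because the pattern starts with `\n+`. Suppose a character that Java's default `\s` does not cover sits inside the run; a no-break space is used here as a hypothetical example. The run is then split and the output contains four `<br>` tags, which is one plausible reading of the partially working examples. The class name and sample strings are illustrative.

```java
import java.util.regex.Pattern;

// Hypothetical reproduction of the pattern from the message.
final class EdwinPattern {
    // Two or more groups of "one or more \n, then any whitespace". Because the
    // greedy \s* also matches \n, a single match normally covers a whole run.
    static final Pattern RUN = Pattern.compile("(\\n+\\s*){2,}");

    static String collapse(String text) {
        return RUN.matcher(text).replaceAll("<br><br>");
    }

    public static void main(String[] args) {
        // Prints: Psalm 89:17 <br><br>3
        System.out.println(collapse("Psalm 89:17 \n\n \n\n 3"));
        // U+00A0 is not \s by default, so the run splits:
        // Psalm 89:17 <br><br>[NBSP]<br><br>3
        System.out.println(collapse("Psalm 89:17 \n\n\u00A0\n\n 3").replace("\u00A0", "[NBSP]"));
    }
}
```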