Hi,
Thanks for the reply.
Does that mean we have to write this custom QParser ourselves?
Regards,
Edwin
On 3 February 2018 at 03:28, Chris Hostetter
wrote:
>
> : Have you managed to get the regex for this string in Chinese: 预支款管理及账务处理办法 ?
> ...
> : > An example of the string in Chinese is
: Have you managed to get the regex for this string in Chinese: 预支款管理及账务处理办法 ?
...
: > An example of the string in Chinese is 预支款管理及账务处理办法
: >
: > The number of characters is 12, but the expected length should be 36.
...
: >> > So this would likely be different from what the operati
Hi Edwin,
Unfortunately, I was not able to find a regex that would work in your case.
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> On 1 Feb 2018, at 05:42, Zheng Lin Edwin Yeo wrote:
>
> Hi,
>
> Have y
Hi,
Have you managed to get the regex for this string in Chinese: 预支款管理及账务处理办法 ?
Regards,
Edwin
On 4 January 2018 at 18:04, Zheng Lin Edwin Yeo
wrote:
> Hi Emir,
>
> An example of the string in Chinese is 预支款管理及账务处理办法
>
> The number of characters is 12, but the expected length should be 36.
>
Hi Emir,
An example of the string in Chinese is 预支款管理及账务处理办法
The number of characters is 12, but the expected length should be 36.
Regards,
Edwin
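
The two numbers above can be reproduced with a short check. This is only a sketch, assuming the expected length of 36 refers to the UTF-8 encoded size of the subject; the variable names are illustrative and not part of the thread.

```python
subject = "预支款管理及账务处理办法"

# Number of Unicode code points, i.e. what a character count reports.
char_count = len(subject)

# Number of bytes once the string is encoded as UTF-8
# (each of these CJK characters takes 3 bytes).
utf8_byte_count = len(subject.encode("utf-8"))

print(char_count, utf8_byte_count)  # prints: 12 36
```

So a limit expressed in characters and a limit expressed in bytes will disagree by a factor of three for text like this.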
On 4 January 2018 at 16:21, Emir Arnautović
wrote:
> Hi Edwin,
> I don’t have enough knowledge in eastern languages to know what is
> expected num
Hi Edwin,
I don’t have enough knowledge of eastern languages to know what number is
expected when you ask for string length. Maybe you can try some of the regex
Unicode settings and see if you get what you need: try setting the Unicode
flag with (?U), or try using regex groups and ranges. If you provide
Hi Emir,
So this would likely be different from what the operating system counts, as
the operating system may count each Chinese character as 3 to 4 bytes.
That is probably why I could not find any record with subject:/.{255,}.*/
Are there other tools that we can use to query the length for d
Hi Edwin,
I do not know, but my guess would be that each character is counted as 1 in
a regex, regardless of how many bytes it takes in the encoding used.
Regards,
Emir
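
The guess above can be illustrated with Python's re module. Note the assumption: Python's regex engine is used here only as a stand-in, and Lucene's regex engine in Solr is a separate implementation, so this shows the general character-versus-byte behaviour rather than Solr's exact matching.

```python
import re

subject = "预支款管理及账务处理办法"

# On a text string, "." consumes one character, so {n} counts characters.
matches_12_chars = re.fullmatch(r".{12}", subject) is not None

# The same 12-character subject cannot satisfy a 36-character minimum.
matches_36_chars = re.fullmatch(r".{36,}.*", subject) is not None

# On the UTF-8 encoded bytes, "." consumes one byte, so 36 matches.
matches_36_bytes = (
    re.fullmatch(rb".{36}", subject.encode("utf-8"), re.DOTALL) is not None
)

print(matches_12_chars, matches_36_chars, matches_36_bytes)
```

If Solr's regex behaves the same way, a pattern such as .{255,} would only match subjects with at least 255 characters, whatever their size in bytes.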
Thanks for the reply.
I am doing the search on existing data that has already been indexed, and
it is likely to be a one-time thing.
This subject:/.{255,}.*/ works for English characters. However, there are
Chinese characters in some of the records. The length seems to be more than
255, but it
Do that during indexing as Emir suggested. Specifically, use an
UpdateRequestProcessor chain, probably with the Clone and FieldLength
processors:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/FieldLengthUpdateProcessorFactory.html
Regards,
Alex.
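
For readers who need a byte length rather than a character length, one alternative is to compute the lengths on the client before the documents are sent for indexing. The sketch below is not the UpdateRequestProcessor approach itself: add_length_fields and the field names subject_chars and subject_utf8_bytes are hypothetical, chosen only for illustration.

```python
def add_length_fields(doc, source="subject"):
    """Return a copy of doc with character and UTF-8 byte lengths added.

    The added field names (<source>_chars, <source>_utf8_bytes) are
    hypothetical and would need matching fields in the schema.
    """
    value = doc.get(source, "")
    enriched = dict(doc)
    enriched[f"{source}_chars"] = len(value)
    enriched[f"{source}_utf8_bytes"] = len(value.encode("utf-8"))
    return enriched


docs = [
    {"id": "1", "subject": "预支款管理及账务处理办法"},
    {"id": "2", "subject": "Quarterly report"},
]
enriched_docs = [add_length_fields(d) for d in docs]

for d in enriched_docs:
    print(d["id"], d["subject_chars"], d["subject_utf8_bytes"])
```

With such a field indexed as an integer, a range query along the lines of subject_utf8_bytes:[255 TO *] could take the place of the regex.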
On 31 December
Hi Edwin,
If it is a one-time thing, you can use a regex to filter out results that are
not long enough. Something like: subject:/.{255,}.*/.
Of course, this assumes the subject field is not tokenized.
It would probably be best to index the subject length as a separate field and
include it in the query as subject_leng