Roland Szűcs <roland.sz...@booknwalk.com> wrote:
> My use case is that I have to calculate the LIX readability index for my
> documents.
[...]
> *B* = Number of periods (defined by period, colon or capital first letter)
[...]
> Does anybody have idea how to get the number of "periods"?

As the positions do not matter, you could make a copyField containing only 
punctuation, maybe extended with a replace filter so that you have dot, 
comma, colon, bang, question etc. instead of .,:!?
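
To make the idea concrete, here is a rough sketch in plain Java, outside Solr, of what the content of such a punctuation-only field could look like after the replacement. The class name, method name and token words are hypothetical, picked only for illustration. In Solr itself this would be done by the analysis chain of the copyField target, not by code like this.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PunctuationTokens {
    // Hypothetical mapping from punctuation marks to word tokens.
    private static final Map<Character, String> NAMES = new LinkedHashMap<>();
    static {
        NAMES.put('.', "dot");
        NAMES.put(',', "comma");
        NAMES.put(':', "colon");
        NAMES.put('!', "bang");
        NAMES.put('?', "question");
    }

    // Drops every character that is not a mapped punctuation mark and
    // emits one space-separated word per remaining mark, in input order.
    static String toTokens(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            String name = NAMES.get(text.charAt(i));
            if (name != null) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(name);
            }
        }
        return out.toString();
    }
}
```

With the marks turned into ordinary words, counting them becomes a matter of counting terms in that field.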

The capital first letter seems a bit strange to me - what about names? But 
anyway, you could do it with a PatternReplaceCharFilter, matching on something 
like 
([^.,:!?]\p{Space}*\p{Upper})|(^\p{Upper})
and replacing with 'capital' (the regexp above probably fails - it was just 
from memory).
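
A quick way to see what that pattern actually matches is to run it through java.util.regex, which is the regexp flavor PatternReplaceCharFilter uses. The sketch below is plain Java, not Solr configuration, and the class and method names are made up for the test.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CapitalMarker {
    // The pattern quoted above, written as a Java string literal.
    static final Pattern CAPITAL =
        Pattern.compile("([^.,:!?]\\p{Space}*\\p{Upper})|(^\\p{Upper})");

    // Counts the non-overlapping matches of the pattern in the text.
    static int countCapitals(String text) {
        Matcher m = CAPITAL.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }
}
```

Two things show up when trying it. A space is itself matched by [^.,:!?], so a capital that follows ". " is still counted: "Hello world. Then Bob left." gives three matches, not two. If the capital after a period should be skipped, adding \p{Space} to the negated class is one possible adjustment. Also, in Java \p{Upper} only covers ASCII unless the UNICODE_CHARACTER_CLASS flag is set, so \p{Lu} may be the better choice for text with accented capitals.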

- Toke Eskildsen
