On Sat, Jul 6, 2013 at 5:02 PM, Denis Papathanasiou <[email protected]> wrote:
> On Saturday, July 6, 2013 1:22:32 PM UTC-4, Lars Nilsson wrote:
>>
>> [snip]
>>
>> If that kind of splitting is really all you require,
>> (clojure.string/split my-text #"[.!?;]") or (re-seq #"[^.!?;]+"
>> my-text)
>
> Is there any way to preserve the actual punctuation? That's why I was
> looking at partition-by and group-by instead.

You could try (re-seq #"[^.!?;]+[.!?;]?" my-text) or perhaps Jim's
longer regex is better suited (I didn't look at it in depth, but it is
longer... :) )

>> For fancier stuff look into an opennlp wrapper or something like it.
>>
>> https://github.com/dakrone/clojure-opennlp
>
>
> This might be a better solution; thanks for mentioning it.

It is certainly what I would use if I were looking for decent text
parsing and were more interested in using the output than in
implementing tokenization, etc.

Lars Nilsson

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
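
For comparison, here is a small sketch of how the three expressions from this thread behave at the REPL. The sample string and the var names (my-text, split-result, and so on) are made up for illustration, and the trimming step at the end is an extra assumption about what you might want, not something discussed above.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sample input, chosen only for illustration.
(def my-text "Hello there! How are you? Fine; thanks.")

;; Splitting on the punctuation characters discards them.
(def split-result (str/split my-text #"[.!?;]"))
;; => ["Hello there" " How are you" " Fine" " thanks"]

;; Matching runs of non-punctuation gives the same pieces as a seq.
(def reseq-result (re-seq #"[^.!?;]+" my-text))
;; => ("Hello there" " How are you" " Fine" " thanks")

;; Allowing one optional trailing punctuation character keeps it
;; attached to the piece it terminates.
(def with-punct (re-seq #"[^.!?;]+[.!?;]?" my-text))
;; => ("Hello there!" " How are you?" " Fine;" " thanks.")

;; Optional extra step (an assumption): strip the leading spaces.
(def trimmed (map str/trim with-punct))
;; => ("Hello there!" "How are you?" "Fine;" "thanks.")
```

Note that this pattern keeps only a single punctuation character per piece, so a run such as "..." would lose its extra dots; whether that matters depends on the text.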
