On Sunday, July 7, 2013 6:06:06 AM UTC-4, Jim foo.bar wrote:
>
> I'm not sure I follow what you mean; both regexes posted here preserve
> the punctuation. Here is mine (ignore the names; it is in fact the same
> regex):
>
You're right; I was actually referring to the suggestions Lars had made.
>
> [snip]
>
> The same thing happens with Lars's simpler regex; just use 're-seq'
> instead of 'split'.
>
That wasn't my experience; the terminal punctuation was still dropped:
#'user/sentences
user=> (nth sentences 0)
" THE country of the ancient Mexicans, or Aztecs as they were called,
formed but a very small part of the extensive territories comprehended in
the modern republic of Mexico"
user=> (nth sentences 1)
" Its boundaries cannot be defined with certainty"
user=> (nth sentences 2)
" They were much enlarged in the latter days of the empire, when they may
be considered as reaching from about the eighteenth degree north to the
twenty-first on the Atlantic"
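For comparison, here is a minimal sketch of why 'split' loses the terminators while a 're-seq' approach can keep them: 'split' throws away whatever the delimiter regex matches, whereas 're-seq' returns the matches themselves. Lars's actual regex was snipped from this thread, so the sentence pattern and the sample string below are assumptions for illustration only.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sample text (not the file discussed in this thread).
(def sample "It rained. Did it? Yes; it did!")

;; split consumes each delimiter match, so the punctuation disappears
;; together with the whitespace that follows it.
(def by-split (str/split sample #"[.?!;]\s*"))

;; re-seq returns the matches, so a pattern matching a whole sentence
;; *including* its terminator keeps the punctuation.
;; Assumed pattern: a run of non-terminators followed by one terminator.
;; str/trim removes the leading space each match after the first picks up.
(def by-re-seq (map str/trim (re-seq #"[^.?!;]+[.?!;]" sample)))

(println by-split)   ; sentences without their terminators
(println by-re-seq)  ; sentences with their terminators
```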
Actually, I also found a way to do it with the simple approach Lars
suggested, without using the NLP package (this only works because there are
no pipe characters in the text file I'm processing):
user=> (def sentences
         (clojure.string/split
           (clojure.string/replace my-text #"([.?!;])\s" "$1|||")
           #"\|\|\|"))
#'user/sentences
user=> (nth sentences 0)
" THE country of the ancient Mexicans, or Aztecs as they were called,
formed but a very small part of the extensive territories comprehended in
the modern republic of Mexico."
user=> (nth sentences 1)
"Its boundaries cannot be defined with certainty."
user=> (nth sentences 2)
"They were much enlarged in the latter days of the empire, when they may be
considered as reaching from about the eighteenth degree north to the
twenty-first on the Atlantic;"
user=> (nth sentences 3)
"and from the fourteenth to the nineteenth, including a very narrow strip,
on the Pacific."
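A variant of the same idea that needs no "|||" placeholder is to split on whitespace that is preceded by a terminator, using a zero-width lookbehind (java.util.regex supports fixed-length lookbehind). Because the lookbehind is not part of the match, the punctuation stays attached to the preceding sentence, and pipe characters in the text no longer matter. This is only a sketch, not tested against the original text file; split-sentences and the sample strings are hypothetical, and like the version above it will still split after abbreviations such as "Mr.".

```clojure
(require '[clojure.string :as str])

(defn split-sentences
  "Hypothetical helper: splits text after . ? ! or ; followed by whitespace.
  The lookbehind (?<=...) is zero-width, so the terminator is kept and
  only the whitespace is consumed; no placeholder string is needed."
  [text]
  (str/split text #"(?<=[.?!;])\s+"))

;; Hypothetical sample containing a pipe, which the "|||" trick could not handle.
(def sample "Pipes | are fine. Really? Yes; they are!")
(def sentences (split-sentences sample))

(println sentences)
```

Using \s+ rather than a single whitespace character also absorbs double spaces or a line break after the terminator; any leading whitespace at the very start of the text is left in place, as in the REPL output above.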
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en