On Mon, Jul 19, 2010 at 9:24 AM, David Virebayre <[email protected]> wrote:
> A minor point: instead of removing the punctuation, you maybe should
> convert it to whitespace.
>
> Otherwise in texts like "there was a quick,brown fox" (notice the
> missing space after the comma) you'll have the word "quickbrown"
> instead of 2 words "quick" and "brown".

If you remove punctuation you

- run the risk of joining two valid words into one invalid word:
    "quick,brown" -> "quickbrown"
- run the risk of converting one word into a different word:
    "can't" -> "cant"
    "won't" -> "wont"

If you split at punctuation you create more semi-words:

    "can't" -> "can", "t"
    "shouldn't" -> "shouldn", "t"

It might be better to regard in-word apostrophes as letters in this case?

--
Dougal Stanton
[email protected] // http://www.dougalstanton.net
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe
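A minimal sketch of the apostrophe-as-letter idea, in case it helps. The name `tokens` and the exact rule are assumptions for illustration, not anything from the original code: an apostrophe is kept only when it has a letter on both sides, and every other non-letter character (punctuation, digits, whitespace) is turned into a space before splitting with `words`.

```haskell
import Data.Char (isAlpha)

-- | Split text into words. An apostrophe counts as a letter only when
-- it sits between two letters ("can't"); every other non-letter
-- character is treated as a word separator.
-- (Hypothetical helper, written for this example.)
tokens :: String -> [String]
tokens = words . go ' '
  where
    -- 'prev' is the input character just before the current one.
    go _ [] = []
    go prev (c:rest)
      | isAlpha c                = c   : go c rest
      | c == '\'' && inWord rest = c   : go c rest
      | otherwise                = ' ' : go c rest
      where
        -- True when the apostrophe has a letter on each side.
        inWord (next:_) = isAlpha prev && isAlpha next
        inWord []       = False

main :: IO ()
main = do
  print (tokens "there was a quick,brown fox")
  -- ["there","was","a","quick","brown","fox"]
  print (tokens "I can't, and 'they' won't.")
  -- ["I","can't","and","they","won't"]
```

Quote marks around a word ('they') are still stripped, because they never have a letter on both sides. Whether digits or hyphens should also count as word characters depends on the text being processed, so that part would need adjusting.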
