Re: Using Regex fragmenter to extract paragraphs

2008-12-15 Thread Mark Ferguson
You actually don't need to escape most characters inside a character class, so escaping the period was unnecessary. I've tried using the example regex ([-\w ,/\n\"']{20,200}), and I'm _still_ getting lots of highlighted snippets that don't match the regex (starting with a period, etc.). Has any
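The point about escaping can be checked outside the highlighter. The following is a minimal sketch using Python's `re` module as a stand-in (an assumption: the thread does not say which regex engine the fragmenter uses, although literal `.`, `!` and `?` inside a character class is standard behavior). The sample strings are made up for illustration.

```python
import re

# Inside a character class, '.', '!' and '?' are literal characters,
# so the backslash before the period changes nothing.
escaped = re.compile(r"[\.!?]")
unescaped = re.compile(r"[.!?]")

# Hypothetical sample strings, not taken from the thread.
samples = ["end.", "really?", "stop!", "no punctuation", "a*b"]
same_results = all(
    bool(escaped.search(s)) == bool(unescaped.search(s)) for s in samples
)

# The character class quoted in the message does not include '.',
# so a direct match of this pattern can never start with a period.
fragment = re.compile(r"""[-\w ,/\n\"']{20,200}""")
text = ". This sentence starts after a period, and runs on for a while. Next one"
m = fragment.search(text)
first_match = m.group(0) if m else None
```

If a direct match like this never contains a period while the highlighted snippets do, that suggests the difference lies in how the fragmenter applies the pattern rather than in the pattern itself. That is only an inference from this sketch, not something the thread confirms.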

Re: Using Regex fragmenter to extract paragraphs

2008-12-14 Thread Erick Erickson
Shouldn't you escape the question mark at the end too?
On Fri, Dec 12, 2008 at 6:22 PM, Mark Ferguson wrote:
> Someone helped me with the regex and pointed out a couple mistakes, most
> notably the extra quantifier in .*{400,600}. My new regex is this:
>
> \w.{400,600}[\.!?]
>
> Unfortunately, my

Re: Using Regex fragmenter to extract paragraphs

2008-12-12 Thread Mark Ferguson
Someone helped me with the regex and pointed out a couple mistakes, most notably the extra quantifier in .*{400,600}. My new regex is this: \w.{400,600}[\.!?] Unfortunately, my results still aren't any better. Some results start with a word character, some don't, and none seem to end with punctua
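One way to narrow the problem down is to run the pattern on its own, away from the highlighter. The sketch below uses Python's `re` module and a scaled-down version of the regex; the bounds `{10,60}` and the sample text are assumptions chosen so the example stays short, and Python stands in for whatever engine the fragmenter actually uses.

```python
import re

# Scaled-down stand-in for \w.{400,600}[\.!?]; the bounds 10 and 60 are
# illustrative only so that the sample text can stay short.
pattern = re.compile(r"\w.{10,60}[\.!?]")

# Hypothetical sample text, not taken from the thread.
text = "...One short sentence. Another sentence follows here! And more text"
match = pattern.search(text).group(0)

# Applied directly, the match starts on a word character and ends on
# punctuation. The greedy quantifier runs to the last punctuation mark
# inside the allowed length, so the match spans two sentences.
starts_on_word = re.match(r"\w", match) is not None
ends_on_punct = match[-1] in ".!?"

# '.' does not match a newline by default, so a line break inside the
# would-be fragment prevents a match unless DOTALL is enabled.
broken = "One short sentence\nends here!"
no_dotall = pattern.search(broken)
with_dotall = re.compile(r"\w.{10,60}[\.!?]", re.DOTALL).search(broken)
```

Here `match` is `"One short sentence. Another sentence follows here!"`, `no_dotall` is `None`, and `with_dotall` matches the whole of `broken`. If the regex behaves as expected in isolation like this, the unexpected fragments are more likely to come from how the fragmenter uses the pattern, or from line breaks in the source text, than from the regex syntax. Both are possibilities to check, not conclusions drawn in the thread.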

Using Regex fragmenter to extract paragraphs

2008-12-12 Thread Mark Ferguson
Hello, I am trying to use the regex fragmenter and am having a hard time getting the results I want. I am trying to get fragments that start on a word character and end on punctuation, but for some reason the fragments being returned to me seem to be very inflexible, even though I've provided a l