I have a wild and crazy text file, the head of which looks like this:

2016-07-01 02:50:35 <john> hey
2016-07-01 02:51:26 <jane> waiting for plane to Edinburgh
2016-07-01 02:51:45 <john> thinking about my boo
2016-07-01 02:52:07 <jane> nothing crappy has happened, not really
2016-07-01 02:52:20 <john> plane went by pretty fast, didn't sleep
2016-07-01 02:54:08 <jane> no idea what time it is or where I am really
2016-07-01 02:54:17 <john> just know it's london
2016-07-01 02:56:44 <jane> you are probably asleep
2016-07-01 02:58:45 <jane> I hope fish was fishy in a good eay
2016-07-01 02:58:56 <jone> 💘
2016-07-01 02:59:34 <jane> 🍑🍑🍑
2016-07-01 03:02:48 <john> British security is a little more rigorous...

It goes on like this for a while; it's a big file. I'm doing natural language processing, and I suspect the leading timestamps will make the file difficult to annotate with the coreNLP package. So I'm curious how I would shave off the dates and times, that is, make it look like:

<john> hey
<jane> waiting for plane to Edinburgh
<john> thinking about my boo
<jane> nothing crappy has happened, not really
<john> plane went by pretty fast, didn't sleep
<jane> no idea what time it is or where I am really
<john> just know it's london
<jane> you are probably asleep
<jane> I hope fish was fishy in a good eay
<jone> 💘
<jane> 🍑🍑🍑
<john> British security is a little more rigorous...

To be clear, then: I'm trying to clean a large text file, presumably by writing a regular expression, so that I end up with a new object that no longer contains the dates and times.

Michael

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
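
One possible way to do the cleaning described above, sketched in base R. It assumes every line begins with a timestamp of the form YYYY-MM-DD HH:MM:SS followed by whitespace, as in the sample shown. The object names (chat, stamp, cleaned) are made up for illustration, and the three-line vector stands in for the real file.

```r
# Hypothetical stand-in for the real file. With the file on disk, something
# like readLines("chat.txt", encoding = "UTF-8") would take its place
# (the file name is an assumption).
chat <- c(
  "2016-07-01 02:50:35 <john> hey",
  "2016-07-01 02:51:26 <jane> waiting for plane to Edinburgh",
  "2016-07-01 03:02:48 <john> British security is a little more rigorous..."
)

# Anchor at the start of the line (^), then match the date, a space,
# the time, and whatever whitespace follows the time.
stamp <- "^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}[[:space:]]+"

# sub() replaces only the first match per element; because the pattern is
# anchored, digits inside the messages themselves are left alone.
cleaned <- sub(stamp, "", chat)

print(cleaned)
```

Lines that do not start with a timestamp pass through unchanged, so it may be worth checking how many lines fail to match (for example with grepl(stamp, chat)) before handing the result to the annotator. The cleaned vector can be written back out with writeLines() if a file is needed.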