Re: [R] how to separate string from numbers in a large txt file

Michael Boulineau Sun, 19 May 2019 10:57:53 -0700

> b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"

so the ^ signals that the regex BEGINS with a number (that could be
any number, 0-9) that is only 10 characters long (then there's the
dash in there, too, with the 0-9-, which I assume enabled the regex to
grab the - that's between the numbers in the date), followed by a
single space, followed by a unit that could be any number, again, but
that is only 8 characters long this time. For that one, it will
include the colon, hence the 9:, although for that one ([0-9:]{8} ), I
don't get why the space is on the inside in that one, after the {8},
whereas the space is on the outside with the other one ^([0-9-]{10} ,
directly after the {10}. Why is that?


Then three *** [*]{3}, then the (\\w+ \\w+)", which Boris explained so
well above. I guess I still don't get why this one seemed to have
deleted the *** out of the mix, plus I still don't why it didn't
remove the *** from the first one.

2016-03-20 19:29:37 *** Jane Doe started a video chat
2016-03-20 19:30:35 *** John Doe ended a video chat
2016-04-02 12:59:36 *** Jane Doe started a video chat
2016-04-02 13:00:43 *** John Doe ended a video chat
2016-04-02 13:01:08 *** Jane Doe started a video chat
2016-04-02 13:01:41 *** John Doe ended a video chat
2016-04-02 13:03:51 *** John Doe started a video chat
2016-04-02 13:06:35 *** John Doe ended a video chat

This is a random sample from the beginning of the txt file with no
edits. The ***s were deleted, all but the first one, the one that had
the ï»¿ but that was taken out by the encoding = "UTF-8". I know that
the function was c <- gsub(b, "\\1<\\2> ", a), so it had a gsub () on
there, the point of which is to do substitution work.

Oh, I get it, I think. The \\1<\\2> in the gsub () puts the <> around
the names, so that it's consistent with the rest of the data, so that
the names in the text about that aren't enclosed in the <> are
enclosed like the rest of them. But I still don't get why or how the
gsub () replaced the *** with the <>...

This one is more straightforward.

> d <- "^([0-9-]{10}) ([0-9:]{8}) <(\\w+ \\w+)>\\s*(.+)$"

any number with - for 10 characters, followed by a space. Oh, there's
no space in this one ([0-9:]{8}), after the {8}. Hu. So, then, any
number with : for 8 characters, followed by any two words separated by
a space and enclosed in <>. And then the \\s* is followed by a single
space? Or maybe it puts space on both sides (on the side of the #s to
the left, and then the comment to the right). The (.+)$ is anything
whatsoever until the end.

Michael


On Sun, May 19, 2019 at 4:37 AM Boris Steipe <boris.ste...@utoronto.ca> wrote:
>
> Inline
>
>
>
> > On 2019-05-18, at 20:34, Michael Boulineau <michael.p.boulin...@gmail.com> 
> > wrote:
> >
> > It appears to have worked, although there were three little quirks.
> > The ; close(con); rm(con) didn't work for me; the first row of the
> > data.frame was all NAs, when all was said and done;
>
> You will get NAs for lines that can't be matched to the regular expression. 
> That's a good thing, it allows you to test whether your assumptions were 
> valid for the entire file:
>
> # number of failed strcapture()
> sum(is.na(e$date))
>
>
> > and then there
> > were still three *** on the same line where the ï»¿ was apparently
> > deleted.
>
> This is a sign that something else happened with the line that prevented the 
> regex from matching. In that case you need to investigate more. I see an 
> invalid multibyte character at the beginning of the line you posted below.
>
> >
> >> a <- readLines ("hangouts-conversation-6.txt", encoding = "UTF-8")
> >> b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"
> >> c <- gsub(b, "\\1<\\2> ", a)
> >> head (c)
> > [1] "2016-01-27 09:14:40 *** Jane Doe started a video chat"
> > [2] "2016-01-27 09:15:20 <Jane Doe>
> > https://lh3.googleusercontent.com/-_WQF5kRcnpk/Vqj7J4aK1jI/AAAAAAAAAVA/GVqutPqbSuo/s0/be8ded30-87a6-4e80-bdfa-83ed51591dbf";
>
> [...]
>
> > But, before I do anything else, I'm going to study the regex in this
> > particular code. For example, I'm still not sure why there has to the
> > second \\w+ in the (\\w+ \\w+). Little things like that.
>
> \w is the metacharacter for alphanumeric characters, \w+ designates something 
> we could call a word. Thus \w+ \w+ are two words separated by a single blank. 
> This corresponds to your example, but, as I wrote previously, you need to 
> think very carefully whether this covers all possible cases (Could there be 
> only one word? More than one blank? Could letters be separated by hyphens or 
> periods?) In most cases we could have more robustly matched everything 
> between "<" and ">" (taking care to test what happens if the message contains 
> those characters). But for the video chat lines we need to make an assumption 
> about what is name and what is not. If "started a video chat" is the only 
> possibility in such lines, you can use this information instead. If there are 
> other possibilities, you need a different strategy. In NLP there is no 
> one-approach-fits-all.
>
> To validate the structure of the names in your transcripts, you can look at
>
> patt <- " <.+?> "   # " <any string, not greedy> "
> m <- regexpr(patt, c)
> unique(regmatches(c, m))
>
>
>
> B.
>
>
>
> >
> > Michael
> >
> >
> > On Sat, May 18, 2019 at 4:30 PM Boris Steipe <boris.ste...@utoronto.ca> 
> > wrote:
> >>
> >> This works for me:
> >>
> >> # sample data
> >> c <- character()
> >> c[1] <- "2016-01-27 09:14:40 <Jane Doe> started a video chat"
> >> c[2] <- "2016-01-27 09:15:20 <Jane Doe> https://lh3.googleusercontent.com/";
> >> c[3] <- "2016-01-27 09:15:20 <Jane Doe> Hey "
> >> c[4] <- "2016-01-27 09:15:22 <John Doe>  ended a video chat"
> >> c[5] <- "2016-01-27 21:07:11 <Jane Doe>  started a video chat"
> >> c[6] <- "2016-01-27 21:26:57 <John Doe>  ended a video chat"
> >>
> >>
> >> # regex  ^(year)       (time)      <(word word)>\\s*(string)$
> >> patt <- "^([0-9-]{10}) ([0-9:]{8}) <(\\w+ \\w+)>\\s*(.+)$"
> >> proto <- data.frame(date = character(),
> >>                    time = character(),
> >>                    name = character(),
> >>                    text = character(),
> >>                    stringsAsFactors = TRUE)
> >> d <- strcapture(patt, c, proto)
> >>
> >>
> >>
> >>        date     time     name                               text
> >> 1 2016-01-27 09:14:40 Jane Doe               started a video chat
> >> 2 2016-01-27 09:15:20 Jane Doe https://lh3.googleusercontent.com/
> >> 3 2016-01-27 09:15:20 Jane Doe                               Hey
> >> 4 2016-01-27 09:15:22 John Doe                 ended a video chat
> >> 5 2016-01-27 21:07:11 Jane Doe               started a video chat
> >> 6 2016-01-27 21:26:57 John Doe                 ended a video chat
> >>
> >>
> >>
> >> B.
> >>
> >>
> >>> On 2019-05-18, at 18:32, Michael Boulineau 
> >>> <michael.p.boulin...@gmail.com> wrote:
> >>>
> >>> Going back and thinking through what Boris and William were saying
> >>> (also Ivan), I tried this:
> >>>
> >>> a <- readLines ("hangouts-conversation-6.csv.txt")
> >>> b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"
> >>> c <- gsub(b, "\\1<\\2> ", a)
> >>>> head (c)
> >>> [1] "ï»¿2016-01-27 09:14:40 *** Jane Doe started a video chat"
> >>> [2] "2016-01-27 09:15:20 <Jane Doe>
> >>> https://lh3.googleusercontent.com/-_WQF5kRcnpk/Vqj7J4aK1jI/AAAAAAAAAVA/GVqutPqbSuo/s0/be8ded30-87a6-4e80-bdfa-83ed51591dbf";
> >>> [3] "2016-01-27 09:15:20 <Jane Doe> Hey "
> >>> [4] "2016-01-27 09:15:22 <John Doe>  ended a video chat"
> >>> [5] "2016-01-27 21:07:11 <Jane Doe>  started a video chat"
> >>> [6] "2016-01-27 21:26:57 <John Doe>  ended a video chat"
> >>>
> >>> The ï»¿ is still there, since I forgot to do what Ivan had suggested, 
> >>> namely,
> >>>
> >>> a <- readLines(con <- file("hangouts-conversation-6.csv.txt", encoding
> >>> = "UTF-8")); close(con); rm(con)
> >>>
> >>> But then the new code is still turning out only NAs when I apply
> >>> strcapture (). This was what happened next:
> >>>
> >>>> d <- strcapture("^([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}
> >>> + [[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}) +(<[^>]*>) *(.*$)",
> >>> +                 c, proto=data.frame(stringsAsFactors=FALSE, When="", 
> >>> Who="",
> >>> +                                     What=""))
> >>>> head (d)
> >>> When  Who What
> >>> 1 <NA> <NA> <NA>
> >>> 2 <NA> <NA> <NA>
> >>> 3 <NA> <NA> <NA>
> >>> 4 <NA> <NA> <NA>
> >>> 5 <NA> <NA> <NA>
> >>> 6 <NA> <NA> <NA>
> >>>
> >>> I've been reading up on regular expressions, too, so this code seems
> >>> spot on. What's going wrong?
> >>>
> >>> Michael
> >>>
> >>> On Fri, May 17, 2019 at 4:28 PM Boris Steipe <boris.ste...@utoronto.ca> 
> >>> wrote:
> >>>>
> >>>> Don't start putting in extra commas and then reading this as csv. That 
> >>>> approach is broken. The correct approach is what Bill outlined: read 
> >>>> everything with readLines(), and then use a proper regular expression 
> >>>> with strcapture().
> >>>>
> >>>> You need to pre-process the object that readLines() gives you: replace 
> >>>> the contents of the videochat lines, and make it conform to the format 
> >>>> of the other lines before you process it into your data frame.
> >>>>
> >>>> Approximately something like
> >>>>
> >>>> # read the raw data
> >>>> tmp <- readLines("hangouts-conversation-6.csv.txt")
> >>>>
> >>>> # process all video chat lines
> >>>> patt <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+) "  # (year time )*** 
> >>>> (word word)
> >>>> tmp <- gsub(patt, "\\1<\\2> ", tmp)
> >>>>
> >>>> # next, use strcapture()
> >>>>
> >>>> Note that this makes the assumption that your names are always exactly 
> >>>> two words containing only letters. If that assumption is not true, more 
> >>>> though needs to go into the regex. But you can test that:
> >>>>
> >>>> patt <- " <\\w+ \\w+> "   #" <word word> "
> >>>> sum( ! grepl(patt, tmp)))
> >>>>
> >>>> ... will give the number of lines that remain in your file that do not 
> >>>> have a tag that can be interpreted as "Who"
> >>>>
> >>>> Once that is fine, use Bill's approach - or a regular expression of your 
> >>>> own design - to create your data frame.
> >>>>
> >>>> Hope this helps,
> >>>> Boris
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> On 2019-05-17, at 16:18, Michael Boulineau 
> >>>>> <michael.p.boulin...@gmail.com> wrote:
> >>>>>
> >>>>> Very interesting. I'm sure I'll be trying to get rid of the byte order
> >>>>> mark eventually. But right now, I'm more worried about getting the
> >>>>> character vector into either a csv file or data.frame; that way, I can
> >>>>> be able to work with the data neatly tabulated into four columns:
> >>>>> date, time, person, comment. I assume it's a write.csv function, but I
> >>>>> don't know what arguments to put in it. header=FALSE? fill=T?
> >>>>>
> >>>>> Micheal
> >>>>>
> >>>>> On Fri, May 17, 2019 at 1:03 PM Jeff Newmiller 
> >>>>> <jdnew...@dcn.davis.ca.us> wrote:
> >>>>>>
> >>>>>> If byte order mark is the issue then you can specify the file encoding 
> >>>>>> as "UTF-8-BOM" and it won't show up in your data any more.
> >>>>>>
> >>>>>> On May 17, 2019 12:12:17 PM PDT, William Dunlap via R-help 
> >>>>>> <r-help@r-project.org> wrote:
> >>>>>>> The pattern I gave worked for the lines that you originally showed 
> >>>>>>> from
> >>>>>>> the
> >>>>>>> data file ('a'), before you put commas into them.  If the name is
> >>>>>>> either of
> >>>>>>> the form "<name>" or "***" then the "(<[^>]*>)" needs to be changed so
> >>>>>>> something like "(<[^>]*>|[*]{3})".
> >>>>>>>
> >>>>>>> The " ï»¿" at the start of the imported data may come from the byte
> >>>>>>> order
> >>>>>>> mark that Windows apps like to put at the front of a text file in 
> >>>>>>> UTF-8
> >>>>>>> or
> >>>>>>> UTF-16 format.
> >>>>>>>
> >>>>>>> Bill Dunlap
> >>>>>>> TIBCO Software
> >>>>>>> wdunlap tibco.com
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, May 17, 2019 at 11:53 AM Michael Boulineau <
> >>>>>>> michael.p.boulin...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> This seemed to work:
> >>>>>>>>
> >>>>>>>>> a <- readLines ("hangouts-conversation-6.csv.txt")
> >>>>>>>>> b <- sub("^(.{10}) (.{8}) (<.+>) (.+$)", "\\1,\\2,\\3,\\4", a)
> >>>>>>>>> b [1:84]
> >>>>>>>>
> >>>>>>>> And the first 85 lines looks like this:
> >>>>>>>>
> >>>>>>>> [83] "2016-06-28 21:02:28 *** Jane Doe started a video chat"
> >>>>>>>> [84] "2016-06-28 21:12:43 *** John Doe ended a video chat"
> >>>>>>>>
> >>>>>>>> Then they transition to the commas:
> >>>>>>>>
> >>>>>>>>> b [84:100]
> >>>>>>>> [1] "2016-06-28 21:12:43 *** John Doe ended a video chat"
> >>>>>>>> [2] "2016-07-01,02:50:35,<John Doe>,hey"
> >>>>>>>> [3] "2016-07-01,02:51:26,<John Doe>,waiting for plane to Edinburgh"
> >>>>>>>> [4] "2016-07-01,02:51:45,<John Doe>,thinking about my boo"
> >>>>>>>>
> >>>>>>>> Even the strange bit on line 6347 was caught by this:
> >>>>>>>>
> >>>>>>>>> b [6346:6348]
> >>>>>>>> [1] "2016-10-21,10:56:29,<John Doe>,John_Doe"
> >>>>>>>> [2] "2016-10-21,10:56:37,<John Doe>,Admit#8242"
> >>>>>>>> [3] "2016-10-21,11:00:13,<Jane Doe>,Okay so you have a discussion"
> >>>>>>>>
> >>>>>>>> Perhaps most awesomely, the code catches spaces that are interposed
> >>>>>>>> into the comment itself:
> >>>>>>>>
> >>>>>>>>> b [4]
> >>>>>>>> [1] "2016-01-27,09:15:20,<Jane Doe>,Hey "
> >>>>>>>>> b [85]
> >>>>>>>> [1] "2016-07-01,02:50:35,<John Doe>,hey"
> >>>>>>>>
> >>>>>>>> Notice whether there is a space after the "hey" or not.
> >>>>>>>>
> >>>>>>>> These are the first two lines:
> >>>>>>>>
> >>>>>>>> [1] "ï»¿2016-01-27 09:14:40 *** Jane Doe started a video chat"
> >>>>>>>> [2] "2016-01-27,09:15:20,<Jane
> >>>>>>>> Doe>,
> >>>>>>>>
> >>>>>>> https://lh3.googleusercontent.com/-_WQF5kRcnpk/Vqj7J4aK1jI/AAAAAAAAAVA/GVqutPqbSuo/s0/be8ded30-87a6-4e80-bdfa-83ed51591dbf
> >>>>>>>> "
> >>>>>>>>
> >>>>>>>> So, who knows what happened with the ï»¿ at the beginning of [1]
> >>>>>>>> directly above. But notice how there are no commas in [1] but there
> >>>>>>>> appear in [2]. I don't see why really long ones like [2] directly
> >>>>>>>> above would be a problem, were they to be translated into a csv or
> >>>>>>>> data frame column.
> >>>>>>>>
> >>>>>>>> Now, with the commas in there, couldn't we write this into a csv or a
> >>>>>>>> data.frame? Some of this data will end up being garbage, I imagine.
> >>>>>>>> Like in [2] directly above. Or with [83] and [84] at the top of this
> >>>>>>>> discussion post/email. Embarrassingly, I've been trying to convert
> >>>>>>>> this into a data.frame or csv but I can't manage to. I've been using
> >>>>>>>> the write.csv function, but I don't think I've been getting the
> >>>>>>>> arguments correct.
> >>>>>>>>
> >>>>>>>> At the end of the day, I would like a data.frame and/or csv with the
> >>>>>>>> following four columns: date, time, person, comment.
> >>>>>>>>
> >>>>>>>> I tried this, too:
> >>>>>>>>
> >>>>>>>>> c <- strcapture("^([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}
> >>>>>>>> + [[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}) +(<[^>]*>) *(.*$)",
> >>>>>>>> +                 a, proto=data.frame(stringsAsFactors=FALSE,
> >>>>>>> When="",
> >>>>>>>> Who="",
> >>>>>>>> +                                     What=""))
> >>>>>>>>
> >>>>>>>> But all I got was this:
> >>>>>>>>
> >>>>>>>>> c [1:100, ]
> >>>>>>>>  When  Who What
> >>>>>>>> 1   <NA> <NA> <NA>
> >>>>>>>> 2   <NA> <NA> <NA>
> >>>>>>>> 3   <NA> <NA> <NA>
> >>>>>>>> 4   <NA> <NA> <NA>
> >>>>>>>> 5   <NA> <NA> <NA>
> >>>>>>>> 6   <NA> <NA> <NA>
> >>>>>>>>
> >>>>>>>> It seems to have caught nothing.
> >>>>>>>>
> >>>>>>>>> unique (c)
> >>>>>>>> When  Who What
> >>>>>>>> 1 <NA> <NA> <NA>
> >>>>>>>>
> >>>>>>>> But I like that it converted into columns. That's a really great
> >>>>>>>> format. With a little tweaking, it'd be a great code for this data
> >>>>>>>> set.
> >>>>>>>>
> >>>>>>>> Michael
> >>>>>>>>
> >>>>>>>> On Fri, May 17, 2019 at 8:20 AM William Dunlap via R-help
> >>>>>>>> <r-help@r-project.org> wrote:
> >>>>>>>>>
> >>>>>>>>> Consider using readLines() and strcapture() for reading such a
> >>>>>>> file.
> >>>>>>>> E.g.,
> >>>>>>>>> suppose readLines(files) produced a character vector like
> >>>>>>>>>
> >>>>>>>>> x <- c("2016-10-21 10:35:36 <Jane Doe> What's your login",
> >>>>>>>>>        "2016-10-21 10:56:29 <John Doe> John_Doe",
> >>>>>>>>>        "2016-10-21 10:56:37 <John Doe> Admit#8242",
> >>>>>>>>>        "October 23, 1819 12:34 <Jane Eyre> I am not an angel")
> >>>>>>>>>
> >>>>>>>>> Then you can make a data.frame with columns When, Who, and What by
> >>>>>>>>> supplying a pattern containing three parenthesized capture
> >>>>>>> expressions:
> >>>>>>>>>> z <- strcapture("^([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}
> >>>>>>>>> [[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}) +(<[^>]*>) *(.*$)",
> >>>>>>>>>           x, proto=data.frame(stringsAsFactors=FALSE, When="",
> >>>>>>> Who="",
> >>>>>>>>> What=""))
> >>>>>>>>>> str(z)
> >>>>>>>>> 'data.frame':   4 obs. of  3 variables:
> >>>>>>>>> $ When: chr  "2016-10-21 10:35:36" "2016-10-21 10:56:29"
> >>>>>>> "2016-10-21
> >>>>>>>>> 10:56:37" NA
> >>>>>>>>> $ Who : chr  "<Jane Doe>" "<John Doe>" "<John Doe>" NA
> >>>>>>>>> $ What: chr  "What's your login" "John_Doe" "Admit#8242" NA
> >>>>>>>>>
> >>>>>>>>> Lines that don't match the pattern result in NA's - you might make
> >>>>>>> a
> >>>>>>>> second
> >>>>>>>>> pass over the corresponding elements of x with a new pattern.
> >>>>>>>>>
> >>>>>>>>> You can convert the When column from character to time with
> >>>>>>> as.POSIXct().
> >>>>>>>>>
> >>>>>>>>> Bill Dunlap
> >>>>>>>>> TIBCO Software
> >>>>>>>>> wdunlap tibco.com
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Thu, May 16, 2019 at 8:30 PM David Winsemius
> >>>>>>> <dwinsem...@comcast.net>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 5/16/19 3:53 PM, Michael Boulineau wrote:
> >>>>>>>>>>> OK. So, I named the object test and then checked the 6347th
> >>>>>>> item
> >>>>>>>>>>>
> >>>>>>>>>>>> test <- readLines ("hangouts-conversation.txt)
> >>>>>>>>>>>> test [6347]
> >>>>>>>>>>> [1] "2016-10-21 10:56:37 <John Doe> Admit#8242"
> >>>>>>>>>>>
> >>>>>>>>>>> Perhaps where it was getting screwed up is, since the end of
> >>>>>>> this is
> >>>>>>>> a
> >>>>>>>>>>> number (8242), then, given that there's no space between the
> >>>>>>> number
> >>>>>>>>>>> and what ought to be the next row, R didn't know where to draw
> >>>>>>> the
> >>>>>>>>>>> line. Sure enough, it looks like this when I go to the original
> >>>>>>> file
> >>>>>>>>>>> and control f "#8242"
> >>>>>>>>>>>
> >>>>>>>>>>> 2016-10-21 10:35:36 <Jane Doe> What's your login
> >>>>>>>>>>> 2016-10-21 10:56:29 <John Doe> John_Doe
> >>>>>>>>>>> 2016-10-21 10:56:37 <John Doe> Admit#8242
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> An octothorpe is an end of line signifier and is interpreted as
> >>>>>>>> allowing
> >>>>>>>>>> comments. You can prevent that interpretation with suitable
> >>>>>>> choice of
> >>>>>>>>>> parameters to `read.table` or `read.csv`. I don't understand why
> >>>>>>> that
> >>>>>>>>>> should cause anu error or a failure to match that pattern.
> >>>>>>>>>>
> >>>>>>>>>>> 2016-10-21 11:00:13 <Jane Doe> Okay so you have a discussion
> >>>>>>>>>>>
> >>>>>>>>>>> Again, it doesn't look like that in the file. Gmail
> >>>>>>> automatically
> >>>>>>>>>>> formats it like that when I paste it in. More to the point, it
> >>>>>>> looks
> >>>>>>>>>>> like
> >>>>>>>>>>>
> >>>>>>>>>>> 2016-10-21 10:35:36 <Jane Doe> What's your login2016-10-21
> >>>>>>> 10:56:29
> >>>>>>>>>>> <John Doe> John_Doe2016-10-21 10:56:37 <John Doe>
> >>>>>>>> Admit#82422016-10-21
> >>>>>>>>>>> 11:00:13 <Jane Doe> Okay so you have a discussion
> >>>>>>>>>>>
> >>>>>>>>>>> Notice Admit#82422016. So there's that.
> >>>>>>>>>>>
> >>>>>>>>>>> Then I built object test2.
> >>>>>>>>>>>
> >>>>>>>>>>> test2 <- sub("^(.{10}) (.{8}) (<.+>) (.+$)", "//1,//2,//3,//4",
> >>>>>>> test)
> >>>>>>>>>>>
> >>>>>>>>>>> This worked for 84 lines, then this happened.
> >>>>>>>>>>
> >>>>>>>>>> It may have done something but as you later discovered my first
> >>>>>>> code
> >>>>>>>> for
> >>>>>>>>>> the pattern was incorrect. I had tested it (and pasted in the
> >>>>>>> results
> >>>>>>>> of
> >>>>>>>>>> the test) . The way to refer to a capture class is with
> >>>>>>> back-slashes
> >>>>>>>>>> before the numbers, not forward-slashes. Try this:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> newvec <- sub("^(.{10}) (.{8}) (<.+>) (.+$)",
> >>>>>>> "\\1,\\2,\\3,\\4",
> >>>>>>>> chrvec)
> >>>>>>>>>>> newvec
> >>>>>>>>>> [1] "2016-07-01,02:50:35,<john>,hey"
> >>>>>>>>>> [2] "2016-07-01,02:51:26,<jane>,waiting for plane to Edinburgh"
> >>>>>>>>>> [3] "2016-07-01,02:51:45,<john>,thinking about my boo"
> >>>>>>>>>> [4] "2016-07-01,02:52:07,<jane>,nothing crappy has happened,
> >>>>>>> not
> >>>>>>>> really"
> >>>>>>>>>> [5] "2016-07-01,02:52:20,<john>,plane went by pretty fast,
> >>>>>>> didn't
> >>>>>>>> sleep"
> >>>>>>>>>> [6] "2016-07-01,02:54:08,<jane>,no idea what time it is or
> >>>>>>> where I am
> >>>>>>>>>> really"
> >>>>>>>>>> [7] "2016-07-01,02:54:17,<john>,just know it's london"
> >>>>>>>>>> [8] "2016-07-01,02:56:44,<jane>,you are probably asleep"
> >>>>>>>>>> [9] "2016-07-01,02:58:45,<jane>,I hope fish was fishy in a good
> >>>>>>> eay"
> >>>>>>>>>> [10] "2016-07-01 02:58:56 <jone>"
> >>>>>>>>>> [11] "2016-07-01 02:59:34 <jane>"
> >>>>>>>>>> [12] "2016-07-01,03:02:48,<john>,British security is a little
> >>>>>>> more
> >>>>>>>>>> rigorous..."
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> I made note of the fact that the 10th and 11th lines had no
> >>>>>>> commas.
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> test2 [84]
> >>>>>>>>>>> [1] "2016-06-28 21:12:43 *** John Doe ended a video chat"
> >>>>>>>>>>
> >>>>>>>>>> That line didn't have any "<" so wasn't matched.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> You could remove all none matching lines for pattern of
> >>>>>>>>>>
> >>>>>>>>>> dates<space>times<space>"<"<name>">"<space><anything>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> with:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> chrvec <- chrvec[ grepl("^.{10} .{8} <.+> .+$)", chrvec)]
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Do read:
> >>>>>>>>>>
> >>>>>>>>>> ?read.csv
> >>>>>>>>>>
> >>>>>>>>>> ?regex
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>>
> >>>>>>>>>> David
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>> test2 [85]
> >>>>>>>>>>> [1] "//1,//2,//3,//4"
> >>>>>>>>>>>> test [85]
> >>>>>>>>>>> [1] "2016-07-01 02:50:35 <John Doe> hey"
> >>>>>>>>>>>
> >>>>>>>>>>> Notice how I toggled back and forth between test and test2
> >>>>>>> there. So,
> >>>>>>>>>>> whatever happened with the regex, it happened in the switch
> >>>>>>> from 84
> >>>>>>>> to
> >>>>>>>>>>> 85, I guess. It went on like
> >>>>>>>>>>>
> >>>>>>>>>>> [990] "//1,//2,//3,//4"
> >>>>>>>>>>> [991] "//1,//2,//3,//4"
> >>>>>>>>>>> [992] "//1,//2,//3,//4"
> >>>>>>>>>>> [993] "//1,//2,//3,//4"
> >>>>>>>>>>> [994] "//1,//2,//3,//4"
> >>>>>>>>>>> [995] "//1,//2,//3,//4"
> >>>>>>>>>>> [996] "//1,//2,//3,//4"
> >>>>>>>>>>> [997] "//1,//2,//3,//4"
> >>>>>>>>>>> [998] "//1,//2,//3,//4"
> >>>>>>>>>>> [999] "//1,//2,//3,//4"
> >>>>>>>>>>> [1000] "//1,//2,//3,//4"
> >>>>>>>>>>>
> >>>>>>>>>>> up until line 1000, then I reached max.print.
> >>>>>>>>>>
> >>>>>>>>>>> Michael
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, May 16, 2019 at 1:05 PM David Winsemius <
> >>>>>>>> dwinsem...@comcast.net>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 5/16/19 12:30 PM, Michael Boulineau wrote:
> >>>>>>>>>>>>> Thanks for this tip on etiquette, David. I will be sure and
> >>>>>>> not do
> >>>>>>>>>> that again.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I tried the read.fwf from the foreign package, with a code
> >>>>>>> like
> >>>>>>>> this:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> d <- read.fwf("hangouts-conversation.txt",
> >>>>>>>>>>>>>                widths= c(10,10,20,40),
> >>>>>>>>>>>>>
> >>>>>>> col.names=c("date","time","person","comment"),
> >>>>>>>>>>>>>                strip.white=TRUE)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> But it threw this error:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Error in scan(file = file, what = what, sep = sep, quote =
> >>>>>>> quote,
> >>>>>>>> dec
> >>>>>>>>>> = dec,  :
> >>>>>>>>>>>>>  line 6347 did not have 4 elements
> >>>>>>>>>>>>
> >>>>>>>>>>>> So what does line 6347 look like? (Use `readLines` and print
> >>>>>>> it
> >>>>>>>> out.)
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Interestingly, though, the error only happened when I
> >>>>>>> increased the
> >>>>>>>>>>>>> width size. But I had to increase the size, or else I
> >>>>>>> couldn't
> >>>>>>>> "see"
> >>>>>>>>>>>>> anything.  The comment was so small that nothing was being
> >>>>>>>> captured by
> >>>>>>>>>>>>> the size of the column. so to speak.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It seems like what's throwing me is that there's no comma
> >>>>>>> that
> >>>>>>>>>>>>> demarcates the end of the text proper. For example:
> >>>>>>>>>>>> Not sure why you thought there should be a comma. Lines
> >>>>>>> usually end
> >>>>>>>>>>>> with  <cr> and or a <lf>.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Once you have the raw text in a character vector from
> >>>>>>> `readLines`
> >>>>>>>> named,
> >>>>>>>>>>>> say, 'chrvec', then you could selectively substitute commas
> >>>>>>> for
> >>>>>>>> spaces
> >>>>>>>>>>>> with regex. (Now that you no longer desire to remove the dates
> >>>>>>> and
> >>>>>>>>>> times.)
> >>>>>>>>>>>>
> >>>>>>>>>>>> sub("^(.{10}) (.{8}) (<.+>) (.+$)", "//1,//2,//3,//4", chrvec)
> >>>>>>>>>>>>
> >>>>>>>>>>>> This will not do any replacements when the pattern is not
> >>>>>>> matched.
> >>>>>>>> See
> >>>>>>>>>>>> this test:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> newvec <- sub("^(.{10}) (.{8}) (<.+>) (.+$)",
> >>>>>>> "\\1,\\2,\\3,\\4",
> >>>>>>>>>> chrvec)
> >>>>>>>>>>>>> newvec
> >>>>>>>>>>>> [1] "2016-07-01,02:50:35,<john>,hey"
> >>>>>>>>>>>> [2] "2016-07-01,02:51:26,<jane>,waiting for plane to
> >>>>>>> Edinburgh"
> >>>>>>>>>>>> [3] "2016-07-01,02:51:45,<john>,thinking about my boo"
> >>>>>>>>>>>> [4] "2016-07-01,02:52:07,<jane>,nothing crappy has
> >>>>>>> happened, not
> >>>>>>>>>> really"
> >>>>>>>>>>>> [5] "2016-07-01,02:52:20,<john>,plane went by pretty fast,
> >>>>>>> didn't
> >>>>>>>>>> sleep"
> >>>>>>>>>>>> [6] "2016-07-01,02:54:08,<jane>,no idea what time it is or
> >>>>>>> where
> >>>>>>>> I am
> >>>>>>>>>>>> really"
> >>>>>>>>>>>> [7] "2016-07-01,02:54:17,<john>,just know it's london"
> >>>>>>>>>>>> [8] "2016-07-01,02:56:44,<jane>,you are probably asleep"
> >>>>>>>>>>>> [9] "2016-07-01,02:58:45,<jane>,I hope fish was fishy in a
> >>>>>>> good
> >>>>>>>> eay"
> >>>>>>>>>>>> [10] "2016-07-01 02:58:56 <jone>"
> >>>>>>>>>>>> [11] "2016-07-01 02:59:34 <jane>"
> >>>>>>>>>>>> [12] "2016-07-01,03:02:48,<john>,British security is a little
> >>>>>>> more
> >>>>>>>>>>>> rigorous..."
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> You should probably remove the "empty comment" lines.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>>
> >>>>>>>>>>>> David.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> 2016-07-01 15:34:30 <John Doe> Lame. We were in a
> >>>>>>>> starbucks2016-07-01
> >>>>>>>>>>>>> 15:35:02 <Jane Doe> Hmm that's interesting2016-07-01 15:35:09
> >>>>>>> <Jane
> >>>>>>>>>>>>> Doe> You must want coffees2016-07-01 15:35:25 <John Doe>
> >>>>>>> There was
> >>>>>>>>>>>>> lots of Starbucks in my day2016-07-01 15:35:47
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It was interesting, too, when I pasted the text into the
> >>>>>>> email, it
> >>>>>>>>>>>>> self-formatted into the way I wanted it to look. I had to
> >>>>>>> manually
> >>>>>>>>>>>>> make it look like it does above, since that's the way that it
> >>>>>>>> looks in
> >>>>>>>>>>>>> the txt file. I wonder if it's being organized by XML or
> >>>>>>> something.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Anyways, There's always a space between the two sideways
> >>>>>>> carrots,
> >>>>>>>> just
> >>>>>>>>>>>>> like there is right now: <John Doe> See. Space. And there's
> >>>>>>> always
> >>>>>>>> a
> >>>>>>>>>>>>> space between the data and time. Like this. 2016-07-01
> >>>>>>> 15:34:30
> >>>>>>>> See.
> >>>>>>>>>>>>> Space. But there's never a space between the end of the
> >>>>>>> comment and
> >>>>>>>>>>>>> the next date. Like this: We were in a starbucks2016-07-01
> >>>>>>> 15:35:02
> >>>>>>>>>>>>> See. starbucks and 2016 are smooshed together.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> This code is also on the table right now too.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> a <- read.table("E:/working
> >>>>>>>>>>>>> directory/-189/hangouts-conversation2.txt", quote="\"",
> >>>>>>>>>>>>> comment.char="", fill=TRUE)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>> h<-cbind(hangouts.conversation2[,1:2],hangouts.conversation2[,3:5],hangouts.conversation2[,6:9])
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> aa<-gsub("[^[:digit:]]","",h)
> >>>>>>>>>>>>> my.data.num <- as.numeric(str_extract(h, "[0-9]+"))
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Those last lines are a work in progress. I wish I could
> >>>>>>> import a
> >>>>>>>>>>>>> picture of what it looks like when it's translated into a
> >>>>>>> data
> >>>>>>>> frame.
> >>>>>>>>>>>>> The fill=TRUE helped to get the data in table that kind of
> >>>>>>> sort of
> >>>>>>>>>>>>> works, but the comments keep bleeding into the data and time
> >>>>>>>> column.
> >>>>>>>>>>>>> It's like
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2016-07-01 15:59:17 <Jane Doe> Seriously I've never been
> >>>>>>>>>>>>> over               there
> >>>>>>>>>>>>> 2016-07-01 15:59:27 <Jane Doe> It confuses me :(
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> And then, maybe, the "seriously" will be in a column all to
> >>>>>>>> itself, as
> >>>>>>>>>>>>> will be the "I've'"and the "never" etc.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I will use a regular expression if I have to, but it would be
> >>>>>>> nice
> >>>>>>>> to
> >>>>>>>>>>>>> keep the dates and times on there. Originally, I thought they
> >>>>>>> were
> >>>>>>>>>>>>> meaningless, but I've since changed my mind on that count.
> >>>>>>> The
> >>>>>>>> time of
> >>>>>>>>>>>>> day isn't so important. But, especially since, say, Gmail
> >>>>>>> itself
> >>>>>>>> knows
> >>>>>>>>>>>>> how to quickly recognize what it is, I know it can be done. I
> >>>>>>> know
> >>>>>>>>>>>>> this data has structure to it.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Michael
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, May 15, 2019 at 8:47 PM David Winsemius <
> >>>>>>>>>> dwinsem...@comcast.net> wrote:
> >>>>>>>>>>>>>> On 5/15/19 4:07 PM, Michael Boulineau wrote:
> >>>>>>>>>>>>>>> I have a wild and crazy text file, the head of which looks
> >>>>>>> like
> >>>>>>>> this:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 2016-07-01 02:50:35 <john> hey
> >>>>>>>>>>>>>>> 2016-07-01 02:51:26 <jane> waiting for plane to Edinburgh
> >>>>>>>>>>>>>>> 2016-07-01 02:51:45 <john> thinking about my boo
> >>>>>>>>>>>>>>> 2016-07-01 02:52:07 <jane> nothing crappy has happened, not
> >>>>>>>> really
> >>>>>>>>>>>>>>> 2016-07-01 02:52:20 <john> plane went by pretty fast,
> >>>>>>> didn't
> >>>>>>>> sleep
> >>>>>>>>>>>>>>> 2016-07-01 02:54:08 <jane> no idea what time it is or where
> >>>>>>> I am
> >>>>>>>>>> really
> >>>>>>>>>>>>>>> 2016-07-01 02:54:17 <john> just know it's london
> >>>>>>>>>>>>>>> 2016-07-01 02:56:44 <jane> you are probably asleep
> >>>>>>>>>>>>>>> 2016-07-01 02:58:45 <jane> I hope fish was fishy in a good
> >>>>>>> eay
> >>>>>>>>>>>>>>> 2016-07-01 02:58:56 <jone>
> >>>>>>>>>>>>>>> 2016-07-01 02:59:34 <jane>
> >>>>>>>>>>>>>>> 2016-07-01 03:02:48 <john> British security is a little
> >>>>>>> more
> >>>>>>>>>> rigorous...
> >>>>>>>>>>>>>> Looks entirely not-"crazy". Typical log file format.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Two possibilities: 1) Use `read.fwf` from pkg foreign; 2)
> >>>>>>> Use
> >>>>>>>> regex
> >>>>>>>>>>>>>> (i.e. the sub-function) to strip everything up to the "<".
> >>>>>>> Read
> >>>>>>>>>>>>>> `?regex`. Since that's not a metacharacters you could use a
> >>>>>>>> pattern
> >>>>>>>>>>>>>> ".+<" and replace with "".
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> And do read the Posting Guide. Cross-posting to
> >>>>>>> StackOverflow and
> >>>>>>>>>> Rhelp,
> >>>>>>>>>>>>>> at least within hours of each, is considered poor manners.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> David.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> It goes on for a while. It's a big file. But I feel like
> >>>>>>> it's
> >>>>>>>> going
> >>>>>>>>>> to
> >>>>>>>>>>>>>>> be difficult to annotate with the coreNLP library or
> >>>>>>> package. I'm
> >>>>>>>>>>>>>>> doing natural language processing. In other words, I'm
> >>>>>>> curious
> >>>>>>>> as to
> >>>>>>>>>>>>>>> how I would shave off the dates, that is, to make it look
> >>>>>>> like:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> <john> hey
> >>>>>>>>>>>>>>> <jane> waiting for plane to Edinburgh
> >>>>>>>>>>>>>>>  <john> thinking about my boo
> >>>>>>>>>>>>>>> <jane> nothing crappy has happened, not really
> >>>>>>>>>>>>>>> <john> plane went by pretty fast, didn't sleep
> >>>>>>>>>>>>>>> <jane> no idea what time it is or where I am really
> >>>>>>>>>>>>>>> <john> just know it's london
> >>>>>>>>>>>>>>> <jane> you are probably asleep
> >>>>>>>>>>>>>>> <jane> I hope fish was fishy in a good eay
> >>>>>>>>>>>>>>>  <jone>
> >>>>>>>>>>>>>>> <jane>
> >>>>>>>>>>>>>>> <john> British security is a little more rigorous...
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> To be clear, then, I'm trying to clean a large text file by
> >>>>>>>> writing a
> >>>>>>>>>>>>>>> regular expression? such that I create a new object with no
> >>>>>>>> numbers
> >>>>>>>>>> or
> >>>>>>>>>>>>>>> dates.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Michael
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> ______________________________________________
> >>>>>>>>>>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and
> >>>>>>> more,
> >>>>>>>> see
> >>>>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>>>>>>>>>> PLEASE do read the posting guide
> >>>>>>>>>> http://www.R-project.org/posting-guide.html
> >>>>>>>>>>>>>>> and provide commented, minimal, self-contained,
> >>>>>>> reproducible
> >>>>>>>> code.
> >>>>>>>>>>>>> ______________________________________________
> >>>>>>>>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more,
> >>>>>>> see
> >>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>>>>>>>> PLEASE do read the posting guide
> >>>>>>>>>> http://www.R-project.org/posting-guide.html
> >>>>>>>>>>>>> and provide commented, minimal, self-contained, reproducible
> >>>>>>> code.
> >>>>>>>>>>>> ______________________________________________
> >>>>>>>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more,
> >>>>>>> see
> >>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>>>>>>> PLEASE do read the posting guide
> >>>>>>>>>> http://www.R-project.org/posting-guide.html
> >>>>>>>>>>>> and provide commented, minimal, self-contained, reproducible
> >>>>>>> code.
> >>>>>>>>>>> ______________________________________________
> >>>>>>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more,
> >>>>>>> see
> >>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>>>>>> PLEASE do read the posting guide
> >>>>>>>>>> http://www.R-project.org/posting-guide.html
> >>>>>>>>>>> and provide commented, minimal, self-contained, reproducible
> >>>>>>> code.
> >>>>>>>>>>
> >>>>>>>>>> ______________________________________________
> >>>>>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>>>>> PLEASE do read the posting guide
> >>>>>>>>>> http://www.R-project.org/posting-guide.html
> >>>>>>>>>> and provide commented, minimal, self-contained, reproducible
> >>>>>>> code.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>      [[alternative HTML version deleted]]
> >>>>>>>>>
> >>>>>>>>> ______________________________________________
> >>>>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>>>> PLEASE do read the posting guide
> >>>>>>>> http://www.R-project.org/posting-guide.html
> >>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>>>>>
> >>>>>>>> ______________________________________________
> >>>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>>> PLEASE do read the posting guide
> >>>>>>>> http://www.R-project.org/posting-guide.html
> >>>>>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>>>>>
> >>>>>>>
> >>>>>>>    [[alternative HTML version deleted]]
> >>>>>>>
> >>>>>>> ______________________________________________
> >>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>> PLEASE do read the posting guide
> >>>>>>> http://www.R-project.org/posting-guide.html
> >>>>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>>>
> >>>>>> --
> >>>>>> Sent from my phone. Please excuse my brevity.
> >>>>>
> >>>>> ______________________________________________
> >>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>> PLEASE do read the posting guide 
> >>>>> http://www.R-project.org/posting-guide.html
> >>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>
> >>>
> >>> ______________________________________________
> >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide 
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to separate string from numbers in a large txt file

Reply via email to