Inline
> On 2019-05-19, at 18:11, Michael Boulineau
> wrote:
>
> For context:
>
>> In gsub(b, "\\1<\\2> ", a) the work is done by the backreferences \\1 and
>> \\2. The expression says:
>> Substitute ALL of the match with the first captured expression, then " <",
>> then the second capture
For context:
> In gsub(b, "\\1<\\2> ", a) the work is done by the backreferences \\1 and
> \\2. The expression says:
> Substitute ALL of the match with the first captured expression, then " <",
> then the second captured expression, then "> ". The rest of the line is not
> substituted and appe
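
A minimal sketch of the substitution described above, assuming one input line in the shape shown elsewhere in the thread (the name "Jane Doe" is a placeholder, and `out` is a hypothetical variable name):

```r
# One sample line; the name is a placeholder, not real data.
a <- "2016-01-27 09:14:40 *** Jane Doe started a video chat"

# Group 1: date, space, time, trailing space. Group 2: two words (the name).
b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"

# The whole match is replaced by group 1, "<", group 2, "> ".
# Whatever follows the match is left untouched and stays on the line.
out <- gsub(b, "\\1<\\2> ", a)
print(out)
```

Because the replacement ends in a space and the untouched remainder also begins with one, the result carries two spaces after the closing ">". A line that does not contain the "***" marker does not match and is returned unchanged.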
Inline ...
> On 2019-05-19, at 13:56, Michael Boulineau
> wrote:
>
>> b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"
>
> so the ^ signals that the regex BEGINS with a number (that could be
> any number, 0-9) that is only 10 characters long (then there's the
> dash in there, too, with the 0-
> b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"
so the ^ signals that the regex BEGINS with a number (that could be
any number, 0-9) that is only 10 characters long (then there's the
dash in there, too, with the 0-9-, which I assume enabled the regex to
grab the - that's between the numbers in
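
One way to check a reading of the pattern is to ask R which text each group captured. The sketch below assumes a placeholder line and uses hypothetical variable names (`m`, `stamp`, `name`, `loose`); it is an illustration, not part of the original exchange.

```r
b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"
x <- "2016-01-27 09:14:40 *** Jane Doe started a video chat"  # placeholder line

# regexec() reports the full match and each parenthesised group.
m <- regmatches(x, regexec(b, x))[[1]]
full_match <- m[1]   # everything the pattern consumed
stamp      <- m[2]   # group 1: date, space, time, trailing space
name       <- m[3]   # group 2: two "word" tokens separated by a space

# [0-9-]{10} means "ten characters, each a digit or a hyphen".
# It does not check that those ten characters form a valid date.
loose <- grepl("^[0-9-]{10}$", c("2016-01-27", "----------", "2016-01-2"))
print(m)
print(loose)
```

The last call shows that the bracket expression is a character class repeated ten times: a run of ten hyphens passes, while a nine-character date does not.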
Inline
> On 2019-05-18, at 20:34, Michael Boulineau
> wrote:
>
> It appears to have worked, although there were three little quirks.
> The ; close(con); rm(con) didn't work for me; the first row of the
> data.frame was all NAs, when all was said and done;
You will get NAs for lines that can'
It appears to have worked, although there were three little quirks.
The ; close(con); rm(con) didn't work for me; the first row of the
data.frame was all NAs, when all was said and done; and then there
were still three *** on the same line where the  was apparently
deleted.
> a <- readLines ("h
This works for me:
# sample data
c <- character()
c[1] <- "2016-01-27 09:14:40 started a video chat"
c[2] <- "2016-01-27 09:15:20 https://lh3.googleusercontent.com/"
c[3] <- "2016-01-27 09:15:20 Hey "
c[4] <- "2016-01-27 09:15:22 ended a video chat"
c[5] <- "2016-01-27 21:07:11 started a v
Going back and thinking through what Boris and William were saying
(also Ivan), I tried this:
a <- readLines ("hangouts-conversation-6.csv.txt")
b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"
c <- gsub(b, "\\1<\\2> ", a)
> head (c)
[1] "2016-01-27 09:14:40 *** Jane Doe started a video chat"
Don't start putting in extra commas and then reading this as csv. That approach
is broken. The correct approach is what Bill outlined: read everything with
readLines(), and then use a proper regular expression with strcapture().
You need to pre-process the object that readLines() gives you: rep
Very interesting. I'm sure I'll be trying to get rid of the byte order
mark eventually. But right now, I'm more worried about getting the
character vector into either a csv file or data.frame; that way, I can
be able to work with the data neatly tabulated into four columns:
date, time, person, comm
If byte order mark is the issue then you can specify the file encoding as
"UTF-8-BOM" and it won't show up in your data any more.
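
A small sketch of that suggestion, assuming a temporary file written here purely for illustration (`tmp`, `con`, and `clean` are hypothetical names). The encoding is set on the connection and the lines are then read from it:

```r
tmp <- tempfile(fileext = ".txt")

# Write a UTF-8 byte order mark (bytes EF BB BF) followed by one line of text.
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw("2016-01-27 09:14:40 hey\n")),
         tmp)

# Declare the encoding on the connection so the mark is dropped on input.
con <- file(tmp, encoding = "UTF-8-BOM")
clean <- readLines(con)
close(con)
unlink(tmp)

print(clean)
print(nchar(clean))
```

The first element then starts directly with the date, so a pattern anchored with ^ can match line 1 like any other line.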
On May 17, 2019 12:12:17 PM PDT, William Dunlap via R-help
wrote:
>The pattern I gave worked for the lines that you originally showed from
>the
>data file ('a'), bef
On Fri, 17 May 2019 11:36:22 -0700
Michael Boulineau wrote:
> So, who knows what happened with the ï»¿ at the beginning of [1]
> directly above.
perl -Mutf8 -MEncode=encode,decode -Mcharnames=:full \
-E'say charnames::viacode ord decode utf8 => encode latin1 => "ï»¿"'
# ZERO WIDTH NO-BREAK SPA
The pattern I gave worked for the lines that you originally showed from the
data file ('a'), before you put commas into them. If the name is either of
the form "<name>" or "***" then the "(<[^>]*>)" needs to be changed to
something like "(<[^>]*>|[*]{3})".
The "ï»¿" at the start of the imported data m
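
To illustrate the change to the name group, here is a hedged sketch. The surrounding parts of the pattern (the `^.{10} .{8} ` prefix) and both sample lines are assumptions made for the example; only the two name groups come from the message above.

```r
# Two placeholder lines: one with a bracketed name, one with the "***" marker.
x <- c("2016-01-27 09:15:20 <Jane Doe> Hey",
       "2016-01-27 09:14:40 *** Jane Doe started a video chat")

# Assumed prefix: ten characters, space, eight characters, space.
narrow <- "^.{10} .{8} (<[^>]*>) "           # bracketed names only
wide   <- "^.{10} .{8} (<[^>]*>|[*]{3}) "    # bracketed names or "***"

hits_narrow <- grepl(narrow, x)
hits_wide   <- grepl(wide, x)
print(hits_narrow)
print(hits_wide)
```

The alternation inside the group lets the same capture position accept either form, so lines of both kinds get past the name field.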
This seemed to work:
> a <- readLines ("hangouts-conversation-6.csv.txt")
> b <- sub("^(.{10}) (.{8}) (<.+>) (.+$)", "\\1,\\2,\\3,\\4", a)
> b [1:84]
And the first 84 lines look like this:
[83] "2016-06-28 21:02:28 *** Jane Doe started a video chat"
[84] "2016-06-28 21:12:43 *** John Doe ended
Consider using readLines() and strcapture() for reading such a file. E.g.,
suppose readLines(files) produced a character vector like
x <- c("2016-10-21 10:35:36 What's your login",
"2016-10-21 10:56:29 John_Doe",
"2016-10-21 10:56:37 Admit#8242",
"October 23, 1819
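
A minimal sketch of the readLines() plus strcapture() approach, under stated assumptions: the sample lines, the bracketed names in them, the pattern, and the column names are all made up for illustration and are not the pattern from the original message.

```r
# Placeholder input, as if returned by readLines(). The last line is not a chat line.
x <- c("2016-10-21 10:35:36 <Jane Doe> What's your login",
       "2016-10-21 10:56:29 <John Doe> John_Doe",
       "not a chat line")

# Four groups: date, time, name inside <>, and the rest of the line.
pattern <- "^([0-9-]{10}) ([0-9:]{8}) <([^>]*)> (.*)$"

# The prototype fixes the name and type of each output column.
proto <- data.frame(date = character(), time = character(),
                    person = character(), comment = character(),
                    stringsAsFactors = FALSE)

d <- strcapture(pattern, x, proto)
print(d)
```

Lines that do not match the pattern come back as a row of NAs, which makes them easy to find with something like `which(is.na(d$date))` before deciding how to handle them.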
On 5/16/19 3:53 PM, Michael Boulineau wrote:
OK. So, I named the object test and then checked the 6347th item
test <- readLines ("hangouts-conversation.txt")
test [6347]
[1] "2016-10-21 10:56:37 Admit#8242"
Perhaps where it was getting screwed up is, since the end of this is a
number (8242)
OK. So, I named the object test and then checked the 6347th item
> test <- readLines ("hangouts-conversation.txt")
> test [6347]
[1] "2016-10-21 10:56:37 Admit#8242"
Perhaps where it was getting screwed up is, since the end of this is a
number (8242), then, given that there's no space between the
On 5/16/19 12:30 PM, Michael Boulineau wrote:
Thanks for this tip on etiquette, David. I will be sure and not do that again.
I tried read.fwf from the utils package, with code like this:
d <- read.fwf("hangouts-conversation.txt",
widths= c(10,10,20,40),
Thanks for this tip on etiquette, David. I will be sure and not do that again.
I tried read.fwf from the utils package, with code like this:
d <- read.fwf("hangouts-conversation.txt",
widths= c(10,10,20,40),
col.names=c("date","time","person","comment"),
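
For comparison, a hedged sketch of a read.fwf() call that does run, using read.fwf() from the utils package that ships with R. The temporary file, the bracketed placeholder names, and the widths are assumptions chosen so that the columns line up; negative widths skip the separating spaces.

```r
tmp <- tempfile(fileext = ".txt")
writeLines(c("2016-07-01 02:50:35 <John Doe> hey",
             "2016-07-01 02:51:26 <Jane Doe> waiting for plane to Edinburgh"),
           tmp)

# 10 chars date, skip 1, 8 chars time, skip 1, 10 chars name, skip 1, up to 40 chars comment.
d <- read.fwf(tmp,
              widths = c(10, -1, 8, -1, 10, -1, 40),
              col.names = c("date", "time", "person", "comment"),
              stringsAsFactors = FALSE, comment.char = "")
unlink(tmp)
print(d)
```

This only lines up because both placeholder names are exactly ten characters wide. With names of varying length the person and comment columns would be cut at the wrong place, which is one reason a regular expression can be a better fit for this kind of file.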
On 5/15/19 4:07 PM, Michael Boulineau wrote:
I have a wild and crazy text file, the head of which looks like this:
2016-07-01 02:50:35 hey
2016-07-01 02:51:26 waiting for plane to Edinburgh
2016-07-01 02:51:45 thinking about my boo
2016-07-01 02:52:07 nothing crappy has happened, not reall
I have a wild and crazy text file, the head of which looks like this:
2016-07-01 02:50:35 hey
2016-07-01 02:51:26 waiting for plane to Edinburgh
2016-07-01 02:51:45 thinking about my boo
2016-07-01 02:52:07 nothing crappy has happened, not really
2016-07-01 02:52:20 plane went by pretty fast,