Dear David,
Thank you very much. You have solved my problem, not of importing in
the right format but the date conversion. It works for me, and if I
specify the timezone tz='GMT', I get exactly the same time of the
webpage.
Sincerely yours,
Cristina
On Tue, 11 Jul 2017 07:15:43 -0700, David Winsemius wrote:
On Jul 11, 2017, at 6:25 AM, Cristina Silva <csi...@ipma.pt> wrote:
Dear Jeff,
I am sorry, I didn't notice that it was not plain text. I hope that
it is now in the correct format. I explain the problem again, now with
more detais.
I am collecting the track positions of our research vessel from
www.marinetraffic.com. In the page, the data appear in a table:
Timestamp Source Speed (kn) Latitude (°) Longitude (°)
Course (°) Show on Map
2017-07-11 10:57 (UTC) Terr-AIS 8.6 37.36228 -9.176811
200
2017-07-11 10:54 (UTC) Terr-AIS 8.4 37.36875 -9.174048
200
2017-07-11 09:50 (UTC) Terr-AIS 8 37.51499 -9.184502
149
2017-07-11 09:08 (UTC) Terr-AIS 7.7 37.60513 -9.228263
169
2017-07-11 09:05 (UTC) Terr-AIS 8 37.61168 -9.229627
167
2017-07-11 09:03 (UTC) Terr-AIS 6.9 37.61626 -9.230456
132
2017-07-11 08:59 (UTC) Terr-AIS 3.4 37.61758 -9.237928
101
2017-07-11 08:57 (UTC) Terr-AIS 3.6 37.61808 -9.240235
116
2017-07-11 08:47 (UTC) Terr-AIS 3.3 37.62267 -9.249762
112
2017-07-11 08:44 (UTC) Terr-AIS 3.2 37.6241 -9.25348
115
2017-07-11 08:40 (UTC) Terr-AIS 3.3 37.62557 -9.257295
112
2017-07-11 08:32 (UTC) Terr-AIS 3.4 37.62903 -9.266028
111
2017-07-11 08:28 (UTC) Terr-AIS 3.3 37.63051 -9.269725
114
2017-07-11 08:25 (UTC) Terr-AIS 3.3 37.63207 -9.273642
120
2017-07-11 08:20 (UTC) Terr-AIS 3.6 37.63395 -9.278254
118
2017-07-11 08:16 (UTC) Terr-AIS 3.3 37.63586 -9.282853
116
2017-07-11 08:08 (UTC) Terr-AIS 3.2 37.6392 -9.291157
118
2017-07-11 07:38 (UTC) Sat-AIS
2017-07-11 06:58 (UTC) Terr-AIS 3 37.68459 -9.321658
159
2017-07-11 06:53 (UTC) Terr-AIS 3.3 37.68881 -9.324172
150
To get this table I had to copy manually from the webpage, paste in
a text file, delete information on figures, links, etc. and then
import to R. This works well, I read the csv file and have no problems
with date formats. But instead of 20 records, I have to import e.g.
500 records, i.e. 10 pages of 50 records each, or more. I want to
import these data in a more automated and quick procedure. I have done
this before with no problems, but probably the webpage changed some
permissions.
Here is the script used to get the data directly from the webpage
into R, for a sample of 20 records (10 per page):
library(XML)
x <- list()
for (i in 1:2)
{
x[i]<-
readHTMLTable(paste('http://www.marinetraffic.com/en/ais/index/positions/all/shipid:318358/mmsi:263601000/shipname:NORUEGA/per_page:10/page:',
i, sep=''))
}
ais <- do.call('rbind', x)
ais <- ais[,-7]
and I got the following table:
ais
Timestamp Source
Speed (kn) Latitude (°) Longitude (°) Course (°)
1 149977066014997706604 minutes ago Terr-AIS
8.6 37.36228 -9.176811 200
There's undoubtedly a way to extract the time info from this HTML
code (but I'm not the one to provide that solution.) This is the
first
entry in the webpage source when viewed right now:
<td>
<time class="toggle_time utc_time hide-me formatTime"
data-overrideAbbreviation="empty" data-time="1499780737"
data-timezone="0"><span>1499780737</span></time><time
class="toggle_time my_time hide-me formatTime"
data-overrideAbbreviation="MyT" data-time="1499780737"
data-timezone="MyT"><span>1499780737</span></time><time
class="toggle_time dif_time hide"><span>6 minutes
ago</span></time> </td>
So the POSIX-style time is givne twice (with no separator and then an
offset in minutes is calculated. If I were given the records as you
have them now I would extract the leading 10 digits and convert to
POSIXct
as.POSIXct(as.numeric(gsub("(^.{10}).+","", Timestamp)),
origin="1970-01-01")
as.POSIXct(as.numeric(gsub("(^.{10}).+","\\1", ais$Timestamp)),
origin="1970-01-01")
[1] "2017-07-11 07:09:57 PDT" "2017-07-11 06:45:37 PDT" "2017-07-11
06:40:34 PDT" "2017-07-11 06:34:46 PDT" "2017-07-11 06:32:26 PDT"
[6] "2017-07-11 06:29:34 PDT" "2017-07-11 06:26:37 PDT" "2017-07-11
06:20:24 PDT" "2017-07-11 06:17:18 PDT" "2017-07-11 06:12:37 PDT"
[11] "2017-07-11 06:08:24 PDT" "2017-07-11 06:05:04 PDT" "2017-07-11
06:03:27 PDT" "2017-07-11 05:59:37 PDT" "2017-07-11 05:54:37 PDT"
[16] "2017-07-11 05:50:48 PDT" "2017-07-11 05:44:53 PDT" "2017-07-11
05:29:17 PDT" "2017-07-11 05:26:44 PDT" "2017-07-11 04:30:08 PDT"
-- David.
2 149977048714997704876 minutes ago Terr-AIS
8.4 37.36875 -9.174048 200
3 149976661414997666141 hour, 11 minutes ago Terr-AIS
8.0 37.51499 -9.184502 149
4 149976410714997641071 hour, 53 minutes ago Terr-AIS
7.7 37.60513 -9.228263 169
5 149976392714997639271 hour, 56 minutes ago Terr-AIS
8.0 37.61168 -9.229627 167
6 149976378014997637801 hour, 58 minutes ago Terr-AIS
6.9 37.61626 -9.230456 132
7 149976354014997635402 hours, 2 minutes ago Terr-AIS
3.4 37.61758 -9.237928 101
8 149976342014997634202 hours, 4 minutes ago Terr-AIS
3.6 37.61808 -9.240235 116
9 149976286114997628612 hours, 14 minutes ago Terr-AIS
3.3 37.62267 -9.249762 112
10 149976264714997626472 hours, 17 minutes ago Terr-AIS
3.2 37.6241 -9.25348 115
11 149976243014997624302 hours, 21 minutes ago Terr-AIS
3.3 37.62557 -9.257295 112
12 149976193714997619372 hours, 29 minutes ago Terr-AIS
3.4 37.62903 -9.266028 111
13 149976172814997617282 hours, 32 minutes ago Terr-AIS
3.3 37.63051 -9.269725 114
14 149976150714997615072 hours, 36 minutes ago Terr-AIS
3.3 37.63207 -9.273642 120
15 149976124714997612472 hours, 40 minutes ago Terr-AIS
3.6 37.63395 -9.278254 118
16 149976098714997609872 hours, 45 minutes ago Terr-AIS
3.3 37.63586 -9.282853 116
17 149976051014997605102 hours, 53 minutes ago Terr-AIS
3.2 37.6392 -9.291157 118
18 149975870414997587043 hours, 23 minutes ago Sat-AIS Add to SAT
Fleet for undelayed data <NA> <NA> <NA>
19 149975631414997563144 hours, 3 minutes ago Terr-AIS
3.0 37.68459 -9.321658 159
20 149975598814997559884 hours, 8 minutes ago Terr-AIS
3.3 37.68881 -9.324172 150
The records correspond exactly to the first table. I have tried to
change the colClasses to character, but the results are always the
same. My question is: how to change the Timestamp to a POSIXlt POSIXct
format, either in the importing phase or within R?
I hope that the question is clear now.
Cristina
On Mon, 10 Jul 2017 09:18:05 -0700, Jeff Newmiller wrote:
Not reproducible. [1][2][3] If our answers don't seem to apply to
your situation, it will likely be because you did not explain your
question clearly.
Not plain text. This is a plain text mailing list, and the
best-case
scenario when you let your email program send HTML is that what you
saw is not what we see (worst case is your email is scrambled on
our
end).
Have you read the documentation for the function you are using? In
particular, what about the colClasses argument? If you don't let
readHTMLTable guess what the format is (have it read in as
character
data) then you have a fighting chance to get it right yourself,
e.g.
as.POSIXct( "2017-07-10 14:04 (UTC)", format="%Y-%m-%d %H:%M
(UTC)",
tz="UTC" )
-----
[1]
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
[2] http://adv-r.had.co.nz/Reproducibility.html
[3] https://cran.r-project.org/web/packages/reprex/index.html
--
Sent from my phone. Please excuse my brevity.
On July 10, 2017 8:31:30 AM PDT, Cristina Silva <csi...@ipma.pt>
wrote:
Hi,
I am extracting positions data from the marine traffic website.
The
table has a "Timestamp" column which, in the browser, appears with
the
format yyyy-mm-dd HH:MM (UTC), e.g. 2017-07-10 14:04 (UTC).
When I import the table, the same date "2017-07-10 14:04 (UTC)"
appears
as "1499696500149969650021 minutes ago", This is the more recent
date
and time. Older records, as e.g. "2017-07-09 17:02 (UTC)" appear
as
e.g.
"1499619726149961972621 hours, 59 minutes ago".
I don't know how to convert these data to the time formats used in
R
(POSIXct).
The script is very simple and worked before:
library(XML)
x <- readHTMLTable('url')
where the 'url' is the link to the website with the specification
of
the
vessel.
I appreciate any help.
Cristina
David Winsemius
Alameda, CA, USA
--
Cristina Silva
Divisão de Modelação e Gestão de Recursos Pesqueiros
Avenida de Brasília
1449-006 Lisboa
@: csi...@ipma.pt
#: +351 213027096
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.