Re: [R] subset English language using textcat package

2018-11-19 Thread Robert David Burbidge via R-help
Look at the help docs and examples for textcat and sapply: print(as.character(data$x[sapply(data$x, textcat)=="english"])) Note, though, that the textcat defaults classify "This book is amazing" as dutch, so you may want to read the help for textcat and change the profile db ("p") or "method". On 19/11/20

Re: [R] Help with Centroids

2018-11-14 Thread Robert David Burbidge via R-help
# construct the dataframe `TK-QUADRANT` <- c(9161,9162,9163,9164,10152,10154,10161,10163) LAT <- c(55.07496,55.07496,55.02495,55.02496,54.97496,54.92495,54.97496,54.92496) LON <- c(8.37477,8.458109,8.37477,8.45811,8.291435,8.291437,8.374774,8.374774) df <- data.frame(`TK-QUADRANT`=`TK-QUADRANT`,L

Re: [R] POS tagging generating a string

2018-11-13 Thread Robert David Burbidge via R-help
On 13/11/2018 12:31, Elahe chalabi wrote: Hi Robert, Thanks for your reply but your code returns the number of verbs in each message. What I want is a string showing verbs in each message. The output of my code (below) is: # A tibble: 4 x 2   DocumentID verbs    1 478920 has|been|

Re: [R] Help with Centroids

2018-11-13 Thread Robert David Burbidge via R-help
Hi Sasha, Your attached table did not come through, please see the posting guidelines: "No binary attachments except for PS, PDF, and some image and archive formats (others are automatically stripped off because they can contain malicious software). Files in other formats and larger ones should

Re: [R] saveRDS() and readRDS() Why? [solved, kind of]

2018-11-08 Thread Robert David Burbidge via R-help
Apologies, unserialize takes a connection, not a file, so you would need something like: # linux (not run) f <- file("rawData.rds", open="r") rawData <- unserialize(f) close(f) The help file states that readRDS will read a file created by serialize (saveRDS is a wrapper for serialize). It ap

Re: [R] saveRDS() and readRDS() Why?

2018-11-07 Thread Robert David Burbidge via R-help
Patrick, I cannot reproduce this behaviour. I'm using: Windows 8.1; R 3.5.1; RStudio 1.1.463 running in a VirtualBox on Ubuntu 18.04 with R 3.4.4; RStudio 1.1.456 The file size of rawData.rds is always 88 bytes in my example and od gives the same results on Windows and Linux. I am using a V

Re: [R] saveRDS() and readRDS() Why?

2018-11-07 Thread Robert David Burbidge via R-help
If the file sizes are the same, then presumably both contain the binary data. From the serialize function help: "As almost all systems in current use are little-endian, xdr = FALSE can be used to avoid byte-shuffling at both ends when transferring data from one little-endian machine to another

Re: [R] saveRDS() and readRDS() Why?

2018-11-07 Thread Robert David Burbidge via R-help
Hi Patrick, From the help: "save writes a single line header (typically "RDXs\n") before the serialization of a single object". If the file sizes are the same (see Eric's message), then the problem may be due to different line terminators. Try serialize and unserialize for low-level control
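
A minimal sketch of the low-level route suggested above, assuming a small numeric vector as stand-in data (the object `x` and the temporary file are illustrative, not taken from the thread). It round-trips an object through serialize and unserialize, first via a raw vector and then via a file connection opened explicitly in binary mode, which keeps line-terminator translation out of the picture.

```r
# Illustrative data only; not the rawData object from the thread.
x <- c(1.5, 2.5, 3.5)

# Round trip in memory: connection = NULL makes serialize() return a raw vector.
raw_bytes <- serialize(x, connection = NULL)
y <- unserialize(raw_bytes)

# Round trip through a file, using binary-mode connections ("wb" / "rb").
path <- tempfile(fileext = ".bin")
con <- file(path, open = "wb")
serialize(x, con)
close(con)

con <- file(path, open = "rb")
z <- unserialize(con)
close(con)

print(identical(x, y))
print(identical(x, z))
```

Comparing `file.size(path)` and the leading bytes of `raw_bytes` across machines is one way to check whether the two systems really write the same stream.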

Re: [R] POS tagging generating a string

2018-11-06 Thread Robert David Burbidge via R-help
Hi Elahe, You could modify your count_verbs function from your previous post: * use scan to extract the tokens (words) from Message * use your previous grepl expression to index the tokens that are verbs * paste the verbs together to form the entries of a new column. Here is one solution:
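
The three steps above might be sketched as follows. This is not the code posted in the thread, which is cut off here. The sample data, the helper name `extract_verbs`, and the verb pattern are all hypothetical placeholders; the real grepl expression from the earlier post should be substituted for `verb_pattern`.

```r
# Hypothetical sample data; only the column names follow the thread.
df <- data.frame(
  DocumentID = c(1, 2),
  Message = c("This report has been sent", "No verbs here"),
  stringsAsFactors = FALSE
)

# Placeholder pattern standing in for the poster's own grepl expression.
verb_pattern <- "^(has|been|is|was)$"

extract_verbs <- function(msg) {
  # Step 1: scan() splits the message into whitespace-separated tokens;
  # quote = "" stops apostrophes from being treated as quote characters.
  tokens <- scan(text = msg, what = character(), quote = "", quiet = TRUE)
  # Step 2: logical index of the tokens that match the verb pattern.
  is_verb <- grepl(verb_pattern, tokens)
  # Step 3: paste the matching tokens together, separated by "|".
  paste(tokens[is_verb], collapse = "|")
}

# One string of verbs per message, stored in a new column.
df$verbs <- vapply(df$Message, extract_verbs, character(1), USE.NAMES = FALSE)
print(df)
```

A message with no matching tokens yields an empty string rather than NA, because paste() with collapse on a zero-length vector returns "".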