[Rd] xmlParseDoc parser errors

2012-11-16 Thread bryan rasmussen
Hi,

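I have some XML files that have a processing instruction directly
after the XML declaration.

When I run

library(XML)  # xmlParseDoc() comes from the XML package
kgroup.reading <- list()
for (file in file_list) {
  # store each parsed document instead of overwriting the previous one
  kgroup.reading[[file]] <-
    xmlParseDoc(file.path("c:", "projects", "respositories", "dk", "004", file))
}

I get the error: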
file name :1: parser error : Start tag expected, '<' not found

When I remove the processing instruction and try to load it again I do
not get the parser error.

This is of course understandable because of

 [Definition: Processing instructions (PIs) allow documents to contain
instructions for applications.]
Processing Instructions
[16] PI       ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

PIs are not part of the document's character data, but MUST be passed
through to the application. The PI begins with a target (PITarget)
used to identify the application to which the instruction is directed.
The target names "XML", "xml", and so on are reserved for
standardization in this or future versions of this specification. The
XML Notation mechanism may be used for formal declaration of PI
targets. Parameter entity references MUST NOT be recognized within
processing instructions.

(from the specification). On the other hand, the specification does not
say that such targets are never allowed in any PI, given that the W3C
plans to use them for 'standardization in this or future versions of
this specification'.
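
Production [17] excludes only the exact name "xml" (in any mix of cases) from PITarget; a longer target such as xml-model still matches the grammar, and it is the "xml" prefix that the specification reserves (section 2.3 reserves names beginning with "xml"). The base-R sketch below illustrates that distinction. The helpers pi_targets() and pi_target_status() are hypothetical, not XML package functions; the regular expression is a rough approximation that does not validate the full Name production; and the sample string is made up.

```r
# Sketch: list processing-instruction targets in an XML string and classify
# them against XML 1.0 production [17] and the "xml" name reservation.
# pi_targets() and pi_target_status() are hypothetical helpers.

pi_targets <- function(xml_text) {
  # Rough approximation: "<?" followed by a run of non-space, non-"?" characters.
  hits <- regmatches(xml_text, gregexpr("<\\?[^\\s?]+", xml_text, perl = TRUE))[[1]]
  sub("^<\\?", "", hits)
}

pi_target_status <- function(targets) {
  vapply(targets, function(target) {
    lt <- tolower(target)
    if (lt == "xml") {
      "excluded by [17]"          # only the XML declaration itself starts with <?xml
    } else if (startsWith(lt, "xml")) {
      "legal, reserved prefix"    # matches [17]; the "xml" prefix is reserved
    } else {
      "legal"
    }
  }, character(1), USE.NAMES = FALSE)
}

# Hypothetical document with an xml-model PI right after the declaration.
doc <- '<?xml version="1.0"?><?xml-model href="schema.rng"?><root/>'
targets <- pi_targets(doc)        # c("xml", "xml-model")
print(pi_target_status(targets))
```

Under this reading, xml-model is a grammatically legal target that uses a reserved prefix, which fits the libxml2 warning (not error) reported later in this message.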

Unfortunately, the people who created the xml-model processing instruction

http://www.w3.org/TR/2012/NOTE-xml-model-20121009/#the-xml-model-processing-instruction

decided, I guess, that they had the right to standardize a processing
instruction name.

Is there any way to get around this problem?
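
Since removing the processing instruction makes the error go away, one possible workaround is to strip it from the text before parsing, leaving the files on disk unchanged. The base-R sketch below assumes that the xml-model PI is not needed downstream and that the files are UTF-8. strip_xml_model_pi() is a hypothetical helper, and its regular expression removes every <?xml-model ...?> PI it finds.

```r
# Sketch: read an XML file as text and drop any <?xml-model ...?> processing
# instructions before handing the text to a parser. Hypothetical helper.
strip_xml_model_pi <- function(path, encoding = "UTF-8") {
  lines <- readLines(path, warn = FALSE, encoding = encoding)
  txt <- paste(lines, collapse = "\n")
  # (?s) lets "." span newlines; the lazy ".*?" stops at the first "?>" after the target.
  gsub("(?s)<\\?xml-model\\b.*?\\?>", "", txt, perl = TRUE)
}

# Possible use with the XML package (xmlParse() accepts a string via asText = TRUE):
# library(XML)
# dir <- file.path("c:", "projects", "respositories", "dk", "004")
# kgroup.reading <- lapply(file.path(dir, file_list), function(f)
#   xmlParse(strip_xml_model_pi(f), asText = TRUE))
```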

Also, when I do the following:

> t <- ''
> xmlParseDoc(t)

I get the parser warning

:1: parser warning : xmlParsePITarget: invalid name prefix 'xml'
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] best way to extract this meaningful data from a table

2013-02-18 Thread bryan rasmussen
I have a table with a structure like the following:

lang | basic id | doc id | topics|
se  | 447157 | MD_2002_0014 |12 |

I load it with

topics <- read.table("path to file", header = TRUE, sep = "|",
                     fileEncoding = "utf-8")

In that table, the meaningful data (in this context) are the document
type, which is the text before the first underscore in doc id (for
example MD above), and topics.
However, topics can hold more than one value: multiple values are
comma-separated, and if there is no actual topic I have a 0 (although I
could also leave the column empty if I wanted).

So what I want is the best way to extract the meaningful data (the
comma-separated values in each topics column and the document type) so
that I can start producing reports such as how many documents of type X
have no topics, the median number of topics per document type, etc.

Do I have to loop through the table and build up a new table with the
info I want, or is there a smarter way to do it?
If so, what is that smarter way?
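
A vectorised approach avoids an explicit loop: sub() extracts the prefix of every doc id at once, strsplit() turns each topics cell into a character vector, and tapply() summarises by type. The sketch below assumes the columns end up named doc.id and topics after read.table() (check names(topics) and adjust the arguments otherwise) and treats "0", empty cells and NA as "no topic". summarise_topics() is a hypothetical helper, and every example row except the first is made up.

```r
# Sketch: derive document type and topic counts without looping row by row.
summarise_topics <- function(df, doc_col = "doc.id", topic_col = "topics") {
  doc_id <- trimws(as.character(df[[doc_col]]))
  # Document type = text before the first underscore ("MD_2002_0014" -> "MD").
  doc_type <- sub("_.*$", "", doc_id)

  # One character vector of topics per row; drop "", "0" and NA.
  topic_list <- lapply(
    strsplit(trimws(as.character(df[[topic_col]])), ",", fixed = TRUE),
    function(x) {
      x <- trimws(x)
      x[!is.na(x) & nzchar(x) & x != "0"]
    }
  )
  n_topics <- lengths(topic_list)

  # Per type: number of documents, documents without topics, median topics.
  docs      <- tapply(n_topics, doc_type, length)
  no_topics <- tapply(n_topics == 0, doc_type, sum)
  med       <- tapply(n_topics, doc_type, median)

  list(
    per_type = data.frame(
      doc_type      = names(docs),
      docs          = as.vector(docs),
      no_topics     = as.vector(no_topics),
      median_topics = as.vector(med),
      stringsAsFactors = FALSE
    ),
    # Long format: one row per (document, topic) pair, handy for table().
    topics_long = data.frame(
      doc_id = rep(doc_id, n_topics),
      topic  = as.character(unlist(topic_list, use.names = FALSE)),
      stringsAsFactors = FALSE
    )
  )
}

# First row as in the message; the other rows are hypothetical.
example <- data.frame(
  lang   = c("se", "se", "se", "se"),
  doc.id = c("MD_2002_0014", "MD_2003_0001", "AB_2001_0003", "AB_2004_0010"),
  topics = c("12", "3,7,12", "0", ""),
  stringsAsFactors = FALSE
)
res <- summarise_topics(example)
res$per_type
```

With the real file, adding strip.white = TRUE to the read.table() call (it only applies when sep is given) is one way to remove the spaces around the pipes before calling summarise_topics(topics); table(res$topics_long$topic) then gives topic frequencies.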

Thanks,
Bryan Rasmussen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] SWF animation method

2007-08-08 Thread bryan rasmussen
I suppose what is really wanted is a way to associate the parts of a
graph with a timeline, à la Gapminder.

Cheers,
Bryan Rasmussen

On 8/8/07, Mike Lawrence <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> Just thought I'd share something I discovered last night. I was
> interested in creating animations consisting of a series of plots and
> after finding very little in the usual sources regarding animation in
> R directly, and disliking the imagemagick method described here
> (http://tolstoy.newcastle.edu.au/R/help/05/10/13297.html), I
> discovered that if one exports the plots to a multipage pdf, it is
> relatively trivial to then use the pdf2swf command in SWFTools
> (http://www.swftools.org/download.html; mac install instructions
> here: http://9mmedia.com/blog/?p=7).
>
> pdf2swf seems to generate swf animations with a slow frame rate, but
> you can increase the framerate using 'swfcombine -r 30 --dummy
> myslow.swf -o myfast.swf', where the value passed to -r is the
> framerate.
>
> Unfortunately, this method seems to have limitations with regards to
> the number of plots it can convert. For example, on my system (17"
> macbook pro, 2.33GHz, 2GB ram, OSX 10.4.10, R 2.5.1) the maximum
> number of single point plots I can do is about 5400 (i.e. for(i in
> 1:5400) plot(runif(1),ylim=c(0,1)) ). Complexity of the plots might
> matter as well, but I only have rather convoluted examples of this.
> Also, pdf2swf throws up a lot of errors ('ERROR   Internal error:
> drawChar.render!=beginString.render'), the origin of which I know
> not, that might be slowing things down.
>
> Now, if only someone could wrap this process into a single R command
> (I'm a little too newb to do this myself I think).
>
> Mike
>
> --
> Mike Lawrence
> Graduate Student, Department of Psychology, Dalhousie University
>
> Website: http://memetic.ca
>
> Public calendar: http://icalx.com/public/informavore/Public
>
> "The road to wisdom? Well, it's plain and simple to express:
> Err and err and err again, but less and less and less."
> - Piet Hein
>
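
Mike's closing wish can be sketched as a single R function: draw every frame onto one multipage PDF with pdf(), then call SWFTools through system2(). This is a hedged sketch, not a tested tool. plots_to_swf() is a hypothetical name; it assumes pdf2swf and swfcombine are on the PATH; the pdf2swf invocation (input file, then -o output) is an assumption based on its usual usage line; and the swfcombine arguments are the ones quoted in the message above.

```r
# Sketch: render frames to a multipage PDF, then convert with SWFTools.
# plots_to_swf() is hypothetical; pdf2swf/swfcombine must be installed separately.
plots_to_swf <- function(n_frames, draw_frame, out_swf, fps = 30,
                         run = nzchar(Sys.which("pdf2swf")) &&
                               nzchar(Sys.which("swfcombine"))) {
  pdf_file <- tempfile(fileext = ".pdf")
  slow_swf <- tempfile(fileext = ".swf")

  # One page per call to draw_frame(); the device is closed even on error.
  grDevices::pdf(pdf_file, onefile = TRUE)
  tryCatch(for (i in seq_len(n_frames)) draw_frame(i),
           finally = grDevices::dev.off())

  # Command-line arguments, quoted for the shell.
  cmds <- list(
    pdf2swf    = c(shQuote(pdf_file), "-o", shQuote(slow_swf)),
    swfcombine = c("-r", as.character(fps), "--dummy", shQuote(slow_swf),
                   "-o", shQuote(out_swf))
  )

  if (run) {
    system2("pdf2swf", cmds$pdf2swf)
    system2("swfcombine", cmds$swfcombine)  # raises the frame rate, per the message
  }
  invisible(list(pdf = pdf_file, commands = cmds))
}

# Example mirroring the message's single-point plots (with fewer frames):
# plots_to_swf(100, function(i) plot(runif(1), ylim = c(0, 1)), "myfast.swf")
```

The run argument defaults to FALSE when either tool is missing, so the function still produces the multipage PDF and returns the commands it would have run.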



[Rd] example using R to aggregate data from multiple Excel files

2006-10-05 Thread bryan rasmussen
Hi,

I have a project to analyse the Web server statistics for a server on a
weekly basis for the past year, using data maintained in about 20 Excel
files per week. I need to go through these files and aggregate the
data. Obviously the Excel files are pretty simple two-column affairs (I
say obviously because otherwise why would a week's web site usage be
kept in 20 files?). I am looking for examples of going over a set of
Excel files with R, extracting the data and then aggregating it for
analysis.
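
One common pattern is to read every file into a data frame with lapply(), stack the results with rbind(), and summarise with aggregate(). The sketch below is hedged: combine_two_column_files() is hypothetical, the column names item and count are placeholders for the two columns, and the reader is passed in so it can be, for example, readxl::read_excel (an assumed package choice) for .xls/.xlsx files or read.csv for CSV exports.

```r
# Sketch: stack many two-column files into one data frame, tagging each row
# with its source file. Hypothetical helper and placeholder column names.
combine_two_column_files <- function(files, reader, col_names = c("item", "count")) {
  parts <- lapply(files, function(f) {
    d <- as.data.frame(reader(f))[, 1:2, drop = FALSE]  # keep the two data columns
    names(d) <- col_names
    d$source <- basename(f)                             # remember where each row came from
    d
  })
  do.call(rbind, parts)
}

# Possible use (the directory name and readxl are assumptions):
# files    <- list.files("weekly_stats", pattern = "\\.xlsx?$", full.names = TRUE)
# combined <- combine_two_column_files(files,
#               reader = function(f) readxl::read_excel(f, sheet = 1))
# totals   <- aggregate(count ~ item, data = combined, FUN = sum)
```

Keeping the source column makes it possible to aggregate per file or per week later, for example with aggregate(count ~ item + source, ...).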

Cheers,
Bryan Rasmussen
