Hi Romain,
I've been thinking for quite a long time about how to keep comments when
parsing R code, and I finally found a trick, inspired by one of my
friends: mask the comments in special assignments to "cheat" the R
parser.
# keep.comment: whether to keep the comments or not
# keep.blank.line: whether to preserve blank lines or not
# begin.comment and end.comment: special identifiers that mask the original
#   comments as 'begin.comment = "#[ comments ]end.comment"';
#   these marks are removed after the modified code is parsed
tidy.source <- function(source = "clipboard", keep.comment = TRUE,
    keep.blank.line = FALSE, begin.comment, end.comment, ...) {
    # parse and deparse the code
    tidy.block = function(block.text) {
        exprs = parse(text = block.text)
        n = length(exprs)
        res = character(n)
        for (i in 1:n) {
            dep = paste(deparse(exprs[i]), collapse = "\n")
            # strip the surrounding 'expression(' and ')' from the deparsed text
            res[i] = substring(dep, 12, nchar(dep) - 1)
        }
        return(res)
    }
    text.lines = readLines(source, warn = FALSE)
    if (keep.comment) {
        # random identifier used to mask comments
        identifier = function() paste(sample(LETTERS), collapse = "")
        if (missing(begin.comment))
            begin.comment = identifier()
        if (missing(end.comment))
            end.comment = identifier()
        # remove leading and trailing white spaces
        text.lines = gsub("^[[:space:]]+|[[:space:]]+$", "", text.lines)
        # make sure the identifiers do not appear in the code,
        # or the original code might be modified
        while (length(grep(sprintf("%s|%s", begin.comment, end.comment),
            text.lines))) {
            begin.comment = identifier()
            end.comment = identifier()
        }
        head.comment = substring(text.lines, 1, 1) == "#"
        # turn comment lines into assignments to cheat the R parser
        if (any(head.comment)) {
            text.lines[head.comment] = gsub("\"", "'",
                text.lines[head.comment])
            text.lines[head.comment] = sprintf("%s=\"%s%s\"",
                begin.comment, text.lines[head.comment], end.comment)
        }
        # keep blank lines?
        blank.line = text.lines == ""
        if (any(blank.line) && keep.blank.line)
            text.lines[blank.line] = sprintf("%s=\"%s\"", begin.comment,
                end.comment)
        text.tidy = tidy.block(text.lines)
        # remove the identifiers
        text.tidy = gsub(sprintf("%s = \"|%s\"", begin.comment,
            end.comment), "", text.tidy)
    } else {
        text.tidy = tidy.block(text.lines)
    }
    cat(paste(text.tidy, collapse = "\n"), "\n", ...)
    invisible(text.tidy)
}
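To see the masking trick in isolation, here is a minimal sketch; the
MASKBEGIN/MASKEND names are made up for illustration (tidy.source()
generates random identifiers instead). A comment line becomes a legal
assignment, survives parse() and deparse(), and the mask is stripped
afterwards:

```r
# a minimal sketch of the masking trick; MASKBEGIN/MASKEND are
# arbitrary illustrative identifiers, not part of tidy.source()
line <- "# a comment with \"quotes\""
# double quotes must become single quotes so the whole
# comment can live inside a string literal
masked <- sprintf("MASKBEGIN=\"%sMASKEND\"", gsub("\"", "'", line))
dep <- paste(deparse(parse(text = masked)[[1]]), collapse = "\n")
# stripping the mask recovers the comment (with single quotes)
gsub("MASKBEGIN = \"|MASKEND\"", "", dep)
```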
The above function can deal with comments that occupy whole lines, e.g.
f = tempfile()
writeLines('
# rotation of the word "Animation"
# in a loop; change the angle and color
# step by step
for (i in 1:360) {
# redraw the plot again and again
plot(1,ann=FALSE,type="n",axes=FALSE)
# rotate; use rainbow() colors
text(1,1,"Animation",srt=i,col=rainbow(360)[i],cex=7*i/360)
# pause for a while
Sys.sleep(0.01)}
', f)
Then parse the code file 'f':
tidy.source(f)
# rotation of the word 'Animation'
# in a loop; change the angle and color
# step by step
for (i in 1:360) {
# redraw the plot again and again
plot(1, ann = FALSE, type = "n", axes = FALSE)
# rotate; use rainbow() colors
text(1, 1, "Animation", srt = i, col = rainbow(360)[i], cex = 7 *
i/360)
# pause for a while
Sys.sleep(0.01)
}
Of course this function has some limitations: it does not support
inline comments or comments inside incomplete code lines.
Peter's example
f #here
( #here
a #here (possibly)
= #here
1 #this one belongs to the argument, though
) #but here as well
will be parsed as
f
(a = 1)
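The inline-comment limitation is easy to see in isolation: such a
comment shares its line with code, so the whole-line test
substring(text.lines, 1, 1) == "#" never masks it, and the parse/deparse
round trip silently drops it. A minimal sketch:

```r
# inline comments are invisible to the line-based masking above,
# and the parse()/deparse() round trip discards them
code <- "x = 1  # an inline comment"
deparse(parse(text = code)[[1]])  # the comment is gone
```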
I'm quite interested in syntax highlighting of R code and saw your
previous discussions in other posts (with Jose Quesada and others). I'd
like to do something for your package if I can be of any help.
Regards,
Yihui
--
Yihui Xie <xieyi...@gmail.com>
Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086
Mobile: +86-15810805877
Homepage: http://www.yihui.name
School of Statistics, Room 1037, Mingde Main Building,
Renmin University of China, Beijing, 100872, China
2009/3/21 <romain.franc...@dbmail.com>:
It happens in the token function in gram.c:
   c = SkipSpace();
   if (c == '#') c = SkipComment();
and then SkipComment goes like this:
static int SkipComment(void)
{
   int c;
   while ((c = xxgetc()) != '\n' && c != R_EOF) ;
   if (c == R_EOF) EndOfFile = 2;
   return c;
}
which effectively drops comments.
Would it be possible to keep the information somewhere?
The source code says this:
 * The function yylex() scans the input, breaking it into
 * tokens which are then passed to the parser. The lexical
 * analyser maintains a symbol table (in a very messy fashion).
so my question is: could we use this symbol table to keep track of, say,
COMMENT tokens?
Why would I even care about that? I'm writing a package that will
perform syntax highlighting of R source code based on the output of the
parser, and it seems a waste to drop the comments.
Also, when you print a function to the R console, you don't get the
comments, and some of them might be useful to the user.
Am I mad if I contemplate looking into this?
Romain
--
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr