Hi Romain, I've been thinking for quite a long time about how to keep comments when parsing R code, and I finally found a trick (with inspiration from one of my friends): mask the comments in special assignments to "cheat" the R parser.
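The core of the trick is easy to see in isolation: a whole-line comment is wrapped in a fake assignment, so parse() keeps it as a string constant, and the wrapper is stripped off again after deparsing. A minimal sketch (the identifiers "MASK" and "END" here are only placeholders for illustration; the real function generates random identifiers so they cannot clash with the code):

x <- c("# a comment line", "y = 1  +1")
# wrap comment lines in a fake assignment so parse() keeps them as strings
masked <- ifelse(substring(x, 1, 1) == "#",
                 sprintf('MASK="%sEND"', x), x)
tidied <- sapply(parse(text = masked),
                 function(e) paste(deparse(e), collapse = "\n"))
# strip the fake assignment to recover the original comment
gsub("MASK = \"|END\"", "", tidied)
# [1] "# a comment line" "y = 1 + 1"

The full function below builds on this idea and also takes care of blank lines and quotes inside comments: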
# keep.comment: whether to keep the comments or not
# keep.blank.line: preserve blank lines or not?
# begin.comment and end.comment: special identifiers that mark the original
# comments as 'begin.comment = "#[ comments ]end.comment"',
# and these marks will be removed after the modified code is parsed
tidy.source <- function(source = "clipboard", keep.comment = TRUE,
    keep.blank.line = FALSE, begin.comment, end.comment, ...) {
    # parse and deparse the code
    tidy.block = function(block.text) {
        exprs = parse(text = block.text)
        n = length(exprs)
        res = character(n)
        for (i in 1:n) {
            dep = paste(deparse(exprs[i]), collapse = "\n")
            res[i] = substring(dep, 12, nchar(dep) - 1)
        }
        return(res)
    }
    text.lines = readLines(source, warn = FALSE)
    if (keep.comment) {
        # identifier for comments
        identifier = function() paste(sample(LETTERS), collapse = "")
        if (missing(begin.comment)) begin.comment = identifier()
        if (missing(end.comment)) end.comment = identifier()
        # remove leading and trailing white spaces
        text.lines = gsub("^[[:space:]]+|[[:space:]]+$", "", text.lines)
        # make sure the identifiers are not in the code,
        # or the original code might be modified
        while (length(grep(sprintf("%s|%s", begin.comment, end.comment),
            text.lines))) {
            begin.comment = identifier()
            end.comment = identifier()
        }
        head.comment = substring(text.lines, 1, 1) == "#"
        # add identifiers to comment lines to cheat the R parser
        if (any(head.comment)) {
            text.lines[head.comment] = gsub("\"", "\'", text.lines[head.comment])
            text.lines[head.comment] = sprintf("%s=\"%s%s\"", begin.comment,
                text.lines[head.comment], end.comment)
        }
        # keep blank lines?
        blank.line = text.lines == ""
        if (any(blank.line) & keep.blank.line)
            text.lines[blank.line] = sprintf("%s=\"%s\"", begin.comment, end.comment)
        text.tidy = tidy.block(text.lines)
        # remove the identifiers
        text.tidy = gsub(sprintf("%s = \"|%s\"", begin.comment, end.comment),
            "", text.tidy)
    } else {
        text.tidy = tidy.block(text.lines)
    }
    cat(paste(text.tidy, collapse = "\n"), "\n", ...)
    invisible(text.tidy)
}

The above function can deal with comments that occupy whole lines, e.g.

f = tempfile()
writeLines('
# rotation of the word "Animation"
# in a loop; change the angle and color
# step by step
for (i in 1:360) {
# redraw the plot again and again
plot(1,ann=FALSE,type="n",axes=FALSE)
# rotate; use rainbow() colors
text(1,1,"Animation",srt=i,col=rainbow(360)[i],cex=7*i/360)
# pause for a while
Sys.sleep(0.01)}
', f)

Then parse the code file 'f':

> tidy.source(f)
# rotation of the word 'Animation'
# in a loop; change the angle and color
# step by step
for (i in 1:360) {
    # redraw the plot again and again
    plot(1, ann = FALSE, type = "n", axes = FALSE)
    # rotate; use rainbow() colors
    text(1, 1, "Animation", srt = i, col = rainbow(360)[i], cex = 7 * i/360)
    # pause for a while
    Sys.sleep(0.01)
}

Of course this function has some limitations: it does not support inline comments or comments inside incomplete lines of code. Peter's example

f #here
( #here
a #here (possibly)
= #here
1 #this one belongs to the argument, though
) #but here as well

will be parsed as

f(a = 1)

I'm quite interested in syntax highlighting of R code and saw your previous discussions in other posts (with Jose Quesada, etc.). I'd like to do something for your package if I could be of some help.
Regards,
Yihui
--
Yihui Xie <xieyi...@gmail.com>
Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086
Mobile: +86-15810805877
Homepage: http://www.yihui.name
School of Statistics, Room 1037, Mingde Main Building,
Renmin University of China, Beijing, 100872, China

2009/3/21 <romain.franc...@dbmail.com>:
>
> It happens in the token function in gram.c:
>
>    c = SkipSpace();
>    if (c == '#') c = SkipComment();
>
> and then SkipComment goes like that:
>
> static int SkipComment(void)
> {
>    int c;
>    while ((c = xxgetc()) != '\n' && c != R_EOF) ;
>    if (c == R_EOF) EndOfFile = 2;
>    return c;
> }
>
> which effectively drops comments.
>
> Would it be possible to keep the information somewhere?
>
> The source code says this:
>
>  * The function yylex() scans the input, breaking it into
>  * tokens which are then passed to the parser. The lexical
>  * analyser maintains a symbol table (in a very messy fashion).
>
> so my question is: could we use this symbol table to keep track of, say,
> COMMENT tokens?
>
> Why would I even care about that? I'm writing a package that will
> perform syntax highlighting of R source code based on the output of the
> parser, and it seems a waste to drop the comments.
>
> And also, when you print a function to the R console, you don't get the
> comments, and some of them might be useful to the user.
>
> Am I mad if I contemplate looking into this?
>
> Romain
>
> --
> Romain Francois
> Independent R Consultant
> +33(0) 6 28 91 30 30
> http://romainfrancois.blog.free.fr

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel