Arup wrote:
I can't import any HTML or SQL files into R..:confused:

Yeah, I'm confused, too.

What exactly is it you're trying to do? Not the technical task you asked about, but the effect you're trying to achieve? Can you give details about the exact nature of your data sources, or, better, examples?

I ask because actually importing HTML and SQL files is almost certainly the wrong approach. You almost never want to handle texts in either language directly in R.

For SQL, you usually don't have "SQL files", meaning files that literally contain SQL queries. And if you do happen to have SQL query files, you probably don't want to parse them with R. I expect what you really want is to query a database using SQL. For that, look up the DBI package on CRAN, along with a driver package for your particular database (RSQLite and RMySQL are two examples). Together these let you connect R to a database and use SQL to pull data from it as a data frame, which R can process directly.
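As a rough sketch of what querying through DBI looks like: the snippet below assumes the DBI and RSQLite packages are installed, and it uses an in-memory SQLite database only as a stand-in for whatever database you actually have. The table name "cars" is made up for the example.

```r
library(DBI)  # assumes DBI and RSQLite are installed from CRAN

# In-memory SQLite database, standing in for a real database server
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a built-in data frame into a (hypothetical) table called "cars"
dbWriteTable(con, "cars", mtcars)

# Run an SQL query; the result comes back as an ordinary data frame
heavy <- dbGetQuery(con, "SELECT mpg, cyl, wt FROM cars WHERE wt > 5")

dbDisconnect(con)
```

For a real server you would swap in the matching driver and connection details in the dbConnect() call; the dbGetQuery() part should look much the same.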

For HTML, the problem is that HTML is very difficult to parse correctly in the general case. Much of the reason is that few web pages are actually valid HTML, yet browsers quietly cope with many classes of errors. To parse such stuff in R, it's usually best to take a case-by-case approach, matching particular structures within the file so you can extract the few bits of data you want. You might want to post a snippet of the HTML here to get suggestions.
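To give an idea of the case-by-case approach, here is a minimal sketch in base R. The HTML string and the class="price" attribute are invented for illustration; your real pattern would depend on what your pages look like.

```r
# Hypothetical snippet standing in for a real page
html <- '<table><tr><td class="price">12.50</td><td class="price">7.25</td></tr></table>'

# Match only the one structure we care about, capturing the cell contents
pattern <- '<td class="price">([^<]*)</td>'

# Pull out every matching cell, then keep just the captured text
cells  <- regmatches(html, gregexpr(pattern, html))[[1]]
prices <- as.numeric(sub(pattern, "\\1", cells))
```

This is brittle by design: it only works as long as the page keeps that exact structure, which is often acceptable when you need a few values from a known source.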

If you really do have to accept arbitrary HTML, I'd suggest running it through a filter that converts it to XHTML (HTML Tidy is one such tool), then using the XML package from CRAN to load the result into R.
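A minimal sketch of the XML package side of this, assuming the package is installed. Note that this uses the package's htmlParse() function, which is built to tolerate sloppy markup, so in some cases it may let you skip the separate XHTML conversion; that is my suggestion rather than something you must do. The HTML string is made up.

```r
library(XML)  # assumes the XML package from CRAN is installed

# Hypothetical, deliberately sloppy HTML: the <td> tags are never closed
html <- "<html><body><table><tr><td>alpha<td>beta</tr></table></body></html>"

# Parse the text into a document tree
doc <- htmlParse(html, asText = TRUE)

# Use XPath to pull the text out of every table cell
cells <- xpathSApply(doc, "//td", xmlValue)
```

Once the document is parsed, XPath queries like the one above are usually easier to maintain than hand-written pattern matching.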

You might also want to look into the RCurl package, if the HTML lives on a web server. It lets you download the page directly into R instead of first saving it out to an HTML file. Then you can use the methods above to process it.
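Something along these lines, assuming RCurl is installed. The fetch_html() helper is a name I made up, and the URL in the usage comment is only an example.

```r
library(RCurl)  # assumes the RCurl package from CRAN is installed

# Hypothetical helper: fetch a page and return its HTML as one string,
# following any redirects along the way
fetch_html <- function(url) {
  getURL(url, followlocation = TRUE)
}

# Usage (needs network access; the URL is only an example):
# page <- fetch_html("https://www.r-project.org/")
# nchar(page)
```

The string you get back can be handed straight to whichever parsing approach you settle on.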

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
