Jose

As far as getting to the data, I think the best way to do this sort of thing is through a SOAP or REST interface, if the site supports one. When it doesn't (yet), one is faced with clicking through some pages. Python or Java is one way to automate that clicking. I don't know how to do that in R, but would like to know if it is possible.

But I guess I was confused about the part you want to improve. What I have works fairly smoothly, parsing JSON data (converted from a CSV file) and passing it back into R. The downside is that this approach requires more than R to be installed on the client machine. But if the object you get back is ASPX, then you either need to parse it directly, or convert it to JSON, or something else you can deal with. I suspect that will be fairly specific to a particular web site, but I don't really know enough about ASPX to be sure.
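
A minimal sketch of the CSV-to-JSON conversion step mentioned above, using only the Python standard library. The function name `csv_to_json` and the sample columns are hypothetical and chosen for illustration; this is not the script actually used in this thread.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects.

    Every value stays a string, because the csv module does not infer types.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader))


# Hypothetical sample data standing in for a downloaded CSV file.
sample = "date,value\n2012-10-01,1.5\n2012-10-02,1.7\n"
json_text = csv_to_json(sample)

# Round-trip through the JSON parser to inspect the result.
records = json.loads(json_text)
```

The JSON text could then be handed to a JSON parser on the R side, which is where conversion of the string values to numbers or dates would have to happen.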

Paul

On 12-10-30 01:12 PM, jose ramon mazaira wrote:
Thanks for your interest, Paul.
I've checked the source code of TSjson and I've seen that what it does
is to call a Python script to retrieve the data. In fact, I've already
done this with Java using the URLConnection class and sending the
requested values to fill the form.
However, I think it would be more useful to open a connection with R
and to send the requested values within R, and not through an external
program.
The application I've designed, like yours, is also page-specific
(i.e., designed for
http://cxa.gtm.idmanagedsolutions.com/finra/BondCenter/AdvancedScreener.aspx),
but I think that our applications would be more powerful if they were
able to parse the name-value pairs generated from ASPX (or of any
other dynamically generated web page) and ask the user to select the
appropriate values.
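
A minimal sketch of the idea of collecting a form's name-value pairs, written in Python with only the standard library. The class name `FormFieldParser`, the sample HTML, and the field names `issuerName` and `bondType` are assumptions made for illustration; they are not taken from the FINRA page. ASP.NET pages typically carry hidden state fields such as `__VIEWSTATE`, which is why the sketch keeps every `input` it finds.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldParser(HTMLParser):
    """Collect input name-value pairs and the options of each select."""

    def __init__(self):
        super().__init__()
        self.fields = {}     # input name -> current value
        self.choices = {}    # select name -> list of option values
        self._select = None  # name of the select being parsed, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""
        elif tag == "select" and a.get("name"):
            self._select = a["name"]
            self.choices[self._select] = []
        elif tag == "option" and self._select is not None:
            self.choices[self._select].append(a.get("value") or "")

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


# Hypothetical page fragment; a real page would be fetched over HTTP.
page = """
<form method="post" action="AdvancedScreener.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="text" name="issuerName" value="" />
  <select name="bondType">
    <option value="corporate">Corporate</option>
    <option value="treasury">Treasury</option>
  </select>
</form>
"""

parser = FormFieldParser()
parser.feed(page)

# Fill in the values a user might select, then encode them for a POST body.
parser.fields["issuerName"] = "AT&T"
parser.fields["bondType"] = "corporate"
body = urlencode(parser.fields)
```

The `choices` dictionary is what a front end could show to the user when asking for a selection. The sketch ignores `textarea` elements, radio-button grouping, and multiple forms on one page.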

2012/10/30, Paul Gilbert <pgilbert...@gmail.com>:
I think RHTMLForms works if you have a single form, but I have not been
able to see how to use it when you need to go through a sequence of
dynamically generated forms (like you can do with Python mechanize).
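
Stepping through a sequence of forms mostly comes down to keeping cookies between requests and posting each form's fields back. A rough sketch of that with the Python standard library follows (it is not mechanize itself). The helper names `make_session` and `build_form_post`, the URL, and the field name are hypothetical, and nothing is actually sent.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that remembers cookies across requests, and its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_form_post(url, fields):
    """Build (but do not send) a POST request carrying the form fields."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return urllib.request.Request(url, data=body, headers=headers)


opener, jar = make_session()

# Hypothetical first step of a multi-page sequence.
step1 = build_form_post("http://example.invalid/step1.aspx", {"issuerName": "AT&T"})

# opener.open(step1) would send the request. The response page would then be
# parsed for the next form's fields, and the same opener reused so that any
# session cookies set by the server are sent back on the following step.
```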

Paul

On 12-10-30 09:08 AM, Gabriel Becker wrote:
I haven't used it extensively myself, and can't speak to its current
state, but on quick inspection RHTMLForms seems worth a look for what you
want.

http://www.omegahat.org/RHTMLForms/

~G

On Tue, Oct 30, 2012 at 5:38 AM, Paul Gilbert <pgilbert...@gmail.com> wrote:

     I don't know of an easy way to do this in R. I've been doing
     something similar with Python scripts called from R. If anyone knows
     how to do this with just R, I would appreciate hearing too.

     Paul


     On 12-10-29 04:11 PM, jose ramon mazaira wrote:

         Hi. I'm trying to write an application to retrieve financial data
         (especially bond data) from FINRA. The web page is served
         dynamically from an ASP.NET application:


         http://cxa.gtm.idmanagedsolutions.com/finra/BondCenter/AdvancedScreener.aspx

         I'd like to know if it's possible to fill in the web page form
         dynamically from R and, after filling it (with the issuer name),
         retrieve the web page, parse the data, and convert it to
         appropriate R objects. For example, suppose I want to search
         data for AT&T bonds. I'd like to know if it's possible, within
         R, to fill the page served from:


         http://cxa.gtm.idmanagedsolutions.com/finra/BondCenter/AdvancedScreener.aspx

         select the "corporate" option and fill in the "Issuer name" field
         with AT&T, ask the page to display the results, and retrieve the
         results for each of the bonds issued by AT&T (for example:


         http://cxa.gtm.idmanagedsolutions.com/finra/BondCenter/BondDetail.aspx?ID=MDAxOTU3Qko3)

         and parse the data from the web page.

         Thanks in advance.

         ________________________________________________
         R-devel@r-project.org mailing list
         https://stat.ethz.ch/mailman/listinfo/r-devel






--
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis



