On 22/09/12 11:34 AM, Lars Noodén wrote:
On 9/22/12 6:01 PM, cr...@gtek.biz wrote:
Greetings,

I have a small book collection (~150) that I thought would be neat to
catalog by the Library of Congress catalog numbers. I have found a
LOC search form that will allow me to input the ISBN, and it will
return the information I want:

[code]http://www.loc.gov/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090[/code]

I have the list of book ISBNs in a text file, so scripting this
should be quite easy. The problem is I can't figure out how to submit
the form from the command line. I figured wget would be the best way,
but everything I try results in downloading a single line that reads
"Your form didn't include an ACTION!" So I thought I would ask here
for help. The test ISBN I am using is for The Linux Cookbook:
1886411484, QA76.76.O63S788 2001.
[snip]

If you want to screen scrape, the URI would be like this:

http://www.loc.gov/cgi-bin/zgate?ACTION=SEARCH&DBNAME=VOYAGER&ESNAME=B&MAXRECORDS=20&RECSYNTAX=1.2.840.10003.5.10&REINIT=/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090&srchtype=1,1016,2,102,3,3,4,2,5,100,6,1&SESSION_ID=4493330&TERM_1=1886411484

However, the session ID expires after only a few minutes so you will
need a fresh one.

Regards,
/Lars
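
For scripting, the URI quoted above can be assembled from its parts, since only SESSION_ID and TERM_1 change from one query to the next. This is a minimal sketch, not a tested client: the parameter names and values are copied from that URI, while the function name build_search_url and the choice to leave "/", ",", "?" and "=" unescaped (so the result matches the quoted URI character for character) are my own assumptions.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.loc.gov/cgi-bin/zgate"


def build_search_url(session_id: str, isbn: str) -> str:
    """Return a search URI shaped like the one quoted in this thread.

    build_search_url is a hypothetical helper name. Every fixed parameter
    below is copied verbatim from the quoted URI; only the session ID and
    the search term vary.
    """
    params = {
        "ACTION": "SEARCH",
        "DBNAME": "VOYAGER",
        "ESNAME": "B",
        "MAXRECORDS": "20",
        "RECSYNTAX": "1.2.840.10003.5.10",
        "REINIT": "/cgi-bin/zgate?ACTION=INIT",
        "FORM_HOST_PORT": "/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090",
        "srchtype": "1,1016,2,102,3,3,4,2,5,100,6,1",
        "SESSION_ID": session_id,
        "TERM_1": isbn,
    }
    # ASSUMPTION: keep "/", ",", "?" and "=" literal, as in the quoted URI,
    # rather than percent-encoding them.
    return BASE_URL + "?" + urlencode(params, safe="/,?=")


if __name__ == "__main__":
    print(build_search_url("4493330", "1886411484"))
```

The session ID used here is the expired one from the example, so it only illustrates the shape of the URI.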
The solution is to fetch the form with wget to get a session ID, then submit the query using that session ID. If you are running multiple queries, keep submitting them with the same session ID until one is rejected, then fetch the form again for a fresh ID and retry. With any luck, a rejection caused by an expired session will be easy to tell apart from a normal result.

You could also simply fetch the form before every query, since I doubt the (possibly) unnecessary extra form requests would cause a huge slowdown. I just think it's better to minimize traffic where possible.
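
The reuse-until-rejected approach could look roughly like the sketch below. It is only an outline under stated assumptions: I have not checked how the form actually carries the session ID, so the regular expression assumes a hidden input named SESSION_ID, and what a rejected query looks like is left to a caller-supplied test. The names extract_session_id, run_queries, get_form, submit and is_rejected are all hypothetical.

```python
import re
from typing import Callable, Iterable, Iterator, Optional, Tuple

# ASSUMPTION: the form embeds the ID as <input ... name="SESSION_ID" value="...">.
# Adjust the pattern to whatever the downloaded form really contains.
SESSION_RE = re.compile(r'name="?SESSION_ID"?\s+value="?(\d+)"?', re.IGNORECASE)


def extract_session_id(form_html: str) -> Optional[str]:
    """Return the session ID found in the form HTML, or None if absent."""
    match = SESSION_RE.search(form_html)
    return match.group(1) if match else None


def run_queries(
    isbns: Iterable[str],
    get_form: Callable[[], str],            # fetches the form page (hypothetical)
    submit: Callable[[str, str], str],      # (session_id, isbn) -> result page
    is_rejected: Callable[[str], bool],     # True if the page is a rejection
    max_retries: int = 1,
) -> Iterator[Tuple[str, str]]:
    """Yield (isbn, result_page), reusing one session ID until it is rejected."""
    session_id: Optional[str] = None
    for isbn in isbns:
        for _attempt in range(max_retries + 1):
            if session_id is None:
                session_id = extract_session_id(get_form())
                if session_id is None:
                    raise RuntimeError("no session ID found in the form")
            page = submit(session_id, isbn)
            if not is_rejected(page):
                yield isbn, page
                break
            # Rejected: drop the session so the next attempt fetches a fresh one.
            session_id = None
        else:
            raise RuntimeError(f"query for {isbn} rejected even with a fresh session")
```

In a real script, get_form and submit would wrap wget or urllib.request.urlopen. Fetching the form before every query, as in the simpler alternative, amounts to discarding the session ID after each result instead of only after a rejection.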


Archive: http://lists.debian.org/505e745d.3050...@rogers.com
