This is what libcurl says about proxy support:
Proxies
What "proxy" means according to Merriam-Webster: "a person authorized to
act for another" but also "the agency, function, or office of a deputy
who acts as a substitute for another".
Proxies are exceedingly common these days. Companies often only offer
Internet access to employees through their proxies. Network clients or
user-agents ask the proxy for documents, the proxy does the actual
request and then it returns them.
libcurl supports SOCKS and HTTP proxies. When a given URL is wanted,
libcurl will ask the proxy for it instead of trying to connect to the
actual host identified in the URL.
If you're using a SOCKS proxy, you may find that libcurl doesn't quite
support all operations through it.
For HTTP proxies: the fact that the proxy is an HTTP proxy puts certain
restrictions on what can actually happen. A requested URL that might not
be an HTTP URL will still be passed to the HTTP proxy to deliver back
to libcurl. This happens transparently, and an application may not need
to know. I say "may", because at times it is very important to
understand that all operations over an HTTP proxy use the HTTP protocol.
For example, you can't invoke your own custom FTP commands or even
proper FTP directory listings.
*Proxy Options*
To tell libcurl to use a proxy at a given port number:
    curl_easy_setopt(easyhandle, CURLOPT_PROXY, "proxy-host.com:8080");
Some proxies require user authentication before allowing a request, and
you pass that information similar to this:
    curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "user:password");
If you want to, you can specify the host name only in the CURLOPT_PROXY
option, and set the port number separately with CURLOPT_PROXYPORT.
Tell libcurl what kind of proxy it is with CURLOPT_PROXYTYPE (if you do
not, it will assume an HTTP proxy by default):
    curl_easy_setopt(easyhandle, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS4);
*Environment Variables*
libcurl automatically checks and uses a set of environment variables to
know what proxies to use for certain protocols. The names of the
variables follow an ancient de facto standard and are built up as
"[protocol]_proxy" (note the lower casing). So the variable
'http_proxy' is checked for the name of a proxy to use when the input
URL is HTTP. Following the same rule, the variable named 'ftp_proxy' is
checked for FTP URLs. Again, the proxies are always HTTP proxies; the
different names of the variables simply allow different HTTP proxies to
be used.
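The naming rule above can be sketched in a few lines of C. This is only an
illustration of the "[protocol]_proxy" convention, not libcurl's own code;
proxy_env_name is a hypothetical helper name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a libcurl function): writes the lower-cased
 * "<scheme>_proxy" variable name for a URL such as "HTTP://example.com/".
 * Returns 0 on success, -1 if the URL has no scheme or out is too small. */
int proxy_env_name(const char *url, char *out, size_t outlen)
{
    static const char suffix[] = "_proxy";
    const char *sep = strstr(url, "://");
    size_t n, i;

    if (sep == NULL || sep == url)
        return -1;
    n = (size_t)(sep - url);
    if (n + sizeof(suffix) > outlen)   /* sizeof counts the trailing NUL */
        return -1;
    for (i = 0; i < n; i++)
        out[i] = (char)tolower((unsigned char)url[i]);
    memcpy(out + n, suffix, sizeof(suffix));
    return 0;
}
```

The resulting name can then be handed to getenv() to see which proxy, if
any, is configured for that protocol.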
The proxy environment variable contents should be in the format
"[protocol://][user:passw...@]machine[:port]". The protocol:// part is
simply ignored if present (so http://proxy and bluerk://proxy will do
the same), and the optional port number specifies the port on which the
proxy operates on the host. If not specified, the internal default port
number will be used, and that is most likely *not* the one you would
like it to be.
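To make the format concrete, here is a minimal sketch of how such a string
could be split into its parts. It is not libcurl's parser. The names
proxy_spec and parse_proxy_spec are hypothetical, the credentials are
assumed to be whatever precedes the last '@', and IPv6 literals or a
trailing path are deliberately not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
struct proxy_spec {
    char user[64];      /* empty string if absent */
    char password[64];  /* empty string if absent */
    char host[256];
    long port;          /* -1 if no port was given */
};

/* Copies n bytes of src into dst and NUL-terminates; -1 if it does not fit. */
static int copy_span(char *dst, size_t dstlen, const char *src, size_t n)
{
    if (n >= dstlen)
        return -1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}

/* Splits "[protocol://][credentials@]machine[:port]". Returns 0 or -1. */
int parse_proxy_spec(const char *s, struct proxy_spec *out)
{
    const char *p, *at, *colon, *end;

    memset(out, 0, sizeof(*out));
    out->port = -1;

    /* Skip an optional "protocol://" prefix; its value is ignored. */
    p = strstr(s, "://");
    if (p != NULL)
        s = p + 3;

    /* Optional credentials end at the last '@' (assumption of this sketch). */
    at = strrchr(s, '@');
    if (at != NULL) {
        colon = memchr(s, ':', (size_t)(at - s));
        if (colon != NULL) {
            if (copy_span(out->user, sizeof out->user, s, (size_t)(colon - s)) ||
                copy_span(out->password, sizeof out->password,
                          colon + 1, (size_t)(at - colon - 1)))
                return -1;
        } else if (copy_span(out->user, sizeof out->user, s, (size_t)(at - s))) {
            return -1;
        }
        s = at + 1;
    }

    /* Optional ":port" suffix after the machine name. */
    colon = strrchr(s, ':');
    end = colon != NULL ? colon : s + strlen(s);
    if (end == s || copy_span(out->host, sizeof out->host, s, (size_t)(end - s)))
        return -1;
    if (colon != NULL) {
        char *rest;
        long port = strtol(colon + 1, &rest, 10);
        if (rest == colon + 1 || *rest != '\0' || port < 1 || port > 65535)
            return -1;
        out->port = port;
    }
    return 0;
}
```

A port of -1 in the result marks the case the text warns about: no port
was given, so the library's internal default would apply.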
There are two special environment variables. 'all_proxy' sets the proxy
for any URL when the protocol-specific variable isn't set, and
'no_proxy' defines a list of hosts that should not use a proxy even
though a variable may say so. If 'no_proxy' is a plain asterisk ("*") it
matches all hosts.
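The precedence described here can be sketched as follows. This is an
illustration, not libcurl's implementation. It assumes that 'no_proxy'
entries are separated by commas or spaces and that an entry also matches
subdomains of itself; the text above only promises a list of hosts and the
"*" wildcard. host_in_no_proxy and choose_proxy are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of n bytes; returns 1 if equal. */
static int equal_nocase(const char *a, const char *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Returns 1 if host should bypass the proxy according to no_proxy. */
int host_in_no_proxy(const char *no_proxy, const char *host)
{
    size_t hostlen = strlen(host);
    const char *p = no_proxy;

    if (no_proxy == NULL)
        return 0;
    if (strcmp(no_proxy, "*") == 0)      /* plain asterisk matches all hosts */
        return 1;

    while (*p != '\0') {
        size_t len;
        p += strspn(p, ", ");            /* skip separators (assumed) */
        len = strcspn(p, ", ");
        if (len > 0 && p[0] == '.') {    /* treat ".example.com" as "example.com" */
            p++;
            len--;
        }
        if (len > 0 && len <= hostlen) {
            const char *tail = host + (hostlen - len);
            /* Match the whole host or a suffix that starts at a dot boundary. */
            if (equal_nocase(tail, p, len) &&
                (len == hostlen || tail[-1] == '.'))
                return 1;
        }
        p += len;
    }
    return 0;
}

/* Picks the proxy string for a request; NULL means "connect directly".
 * scheme_proxy is the value of e.g. http_proxy, all_proxy the fallback. */
const char *choose_proxy(const char *scheme_proxy, const char *all_proxy,
                         const char *no_proxy, const char *host)
{
    if (host_in_no_proxy(no_proxy, host))
        return NULL;
    if (scheme_proxy != NULL && scheme_proxy[0] != '\0')
        return scheme_proxy;
    if (all_proxy != NULL && all_proxy[0] != '\0')
        return all_proxy;
    return NULL;
}
```

Passing the variable values in as arguments, rather than calling getenv()
inside, keeps the sketch easy to try with different combinations.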
To explicitly disable libcurl's checking for and using the proxy
environment variables, set the proxy name to "" - an empty string - with
CURLOPT_PROXY.
So, setting your environment variable 'http_proxy' to 'proxy-host:8080'
should work. Does it?
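If changing the environment outside the program is inconvenient, the
variable can also be set from inside the process before any parsing
starts. The sketch below uses POSIX setenv(); ensure_http_proxy is a
hypothetical helper, and whether the Xerces curl net accessor leaves
libcurl's environment lookup untouched is an assumption that has to be
verified against your build.

```c
#define _POSIX_C_SOURCE 200112L   /* for setenv() under strict C modes */
#include <stdlib.h>

/* Hypothetical helper: sets http_proxy for the current process unless the
 * user has already exported one. Returns 0 on success, -1 on failure. */
int ensure_http_proxy(const char *proxy)
{
    if (proxy == NULL || proxy[0] == '\0')
        return -1;
    return setenv("http_proxy", proxy, 0);   /* 0: keep an existing value */
}
```

It would have to be called, for example as
ensure_http_proxy("proxy-host:8080"), before the first network access the
parser triggers.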
Alberto
Dantzler, DeWayne C wrote:
I've looked at the libcurl doc, but I'm using Xerces to parse, and it is
during the parsing of the XML file that internet access occurs, since the
XML references entities that must be resolved via a URL. Since Xerces can
be configured with libcurl, my assumption is that Xerces must use it
somehow. Given Alberto's comment, "If you use --enable-netaccessor-curl
Xerces will use the APIs provided by libcurl, so if you use the same API
to setup a global proxy, you will be able to use it.", and based on the
libcurl doc, my assumption is that setting the env 'http_proxy' using
libcurl's API is how Xerces will know the correct proxy to use. Is that
right?
-----Original Message-----
From: Vitaly Prapirny
Sent: Thursday, November 05, 2009 12:18 AM
Subject: Re: What is the difference between configuring Xerces 3.0.1
with --enable-netaccessor-curl vs --enable-netaccessor-socket?
Proxy settings should be recognized by libcurl, not Xerces.
So please look at http://curl.haxx.se/libcurl/c/libcurl-tutorial.html
as I suggested in my answer to your previous message:
http://marc.info/?l=xerces-c-users&m=125541575925143&w=2
Good luck!
Vitaly
Dantzler, DeWayne C wrote:
OK, if Xerces will use the APIs provided by libcurl, then which Xerces APIs
must I use to get Xerces to recognize my proxy settings, or how does Xerces
determine the proxy settings? I've tried googling for the answer but came up
empty. I'm not sure which search terms would get a hit.
Thanks
-----Original Message-----
From: Alberto Massari
Sent: Tuesday, November 03, 2009 11:27 PM
Subject: Re: What is the difference between configuring Xerces 3.0.1
with --enable-netaccessor-curl vs --enable-netaccessor-socket?
If you use --enable-netaccessor-socket you will not be able to specify a proxy,
as the code that reads from the Internet simply works with plain TCP
sockets. If you use --enable-netaccessor-curl Xerces will use the APIs provided
by libcurl, so if you use the same API to set up a global proxy, you will be
able to use it.
Alberto
Dantzler, DeWayne C wrote:
Hello
Problem: there is a proxy between Xerces and the outside world, and I need
Xerces to perform XML validation against a schema which includes online
references to an external schema (e.g. <xs:import
namespace="http://www.w3.org/myspace"
schemaLocation="http://www.w3.org/schema.xsd"/>).
What is the difference between configuring Xerces with
--enable-netaccessor-curl vs --enable-netaccessor-socket? Basically, how does
this affect the socket behavior of Xerces, and why would I choose one over the
other?
Thanks