On Sun, Jan 22, 2012 at 11:07 PM, Myles C. Maxfield <[email protected]> wrote:
> Replies are inline. Thanks for the quick and thoughtful response!
>
> On Sat, Jan 21, 2012 at 8:56 AM, Michael Snoyman <[email protected]>
> wrote:
>>
>> Hi Myles,
>>
>> These sound like two solid features, and I'd be happy to merge in code to
>> support it. Some comments below.
>>
>> On Sat, Jan 21, 2012 at 8:38 AM, Myles C. Maxfield
>> <[email protected]> wrote:
>>>
>>> To: Michael Snoyman, author and maintainer of http-conduit
>>> CC: haskell-cafe
>>>
>>> Hello!
>>>
>>> I am interested in contributing to the http-conduit library. I've been
>>> using it for a little while and reading through its source, but have felt
>>> that it could be improved with two features:
>>>
>>> Allowing the caller to know the final URL that ultimately resulted in the
>>> HTTP Source. Because httpRaw is not exported, the caller can't even
>>> re-implement the redirect-following code themselves. Ideally, the caller
>>> would be able to know not only the final URL, but also the entire chain of
>>> URLs that led to the final request. I was thinking that it would be even
>>> cooler if the caller could be notified of these redirects as they happen in
>>> another thread. There are a couple ways to implement this that I have been
>>> thinking about:
>>>
>>> A straightforward way would be to add a [W.Ascii] to the type of
>>> Response, and getResponse can fill in this extra field. getResponse already
>>> knows about the Request so it can tell if the response should be gunzipped.
>>
>> What would be in the [W.Ascii], a list of all paths redirected to? Also,
>> I'm not sure what gunzipping has to do with here, can you clarify?
>>
>
> Yes; my idea was to make the [W.Ascii] represent the list of all URLs
> redirected to, in order.
>
> My comment about gunzipping is only tangentially related. I meant that in
> the latest version of the code on GitHub, the getResponse function already
> takes a Request as an argument.
> This means that the getResponse function
> already knows what URL its data is coming from, so modifying the getResponse
> function to return that URL is simple. (I mentioned gunzip because, as far
> as I can tell, the reason that getResponse already takes a Request is so
> that the function can tell if the request should be gunzipped.)
>>>
>>> It would be nice for the caller to be able to know in real time what URLs
>>> the request is being redirected to. A possible way to do this would be for
>>> the 'http' function to take an extra argument of type (Maybe
>>> (Control.Concurrent.Chan W.Ascii)) which httpRaw can push URLs into. If the
>>> caller doesn't want to use this variable, they can simply pass Nothing.
>>> Otherwise, the caller can create an IO thread which reads the Chan until
>>> some termination condition is met (Perhaps this will change the type of the
>>> extra argument to (Maybe (Chan (Maybe W.Ascii)))). I like this solution,
>>> though I can see how it could be considered too heavyweight.
>>
>>
>> I do think it's too heavyweight. I think if people really want lower-level
>> control of the redirects, they should turn off automatic redirect and allow
>> 3xx responses.
>
> Yeah, that totally makes more sense. As it stands, however, httpRaw isn't
> exported, so a caller has no way of knowing about each individual HTTP
> transaction. Exporting httpRaw solves the problem I'm trying to solve. If we
> export httpRaw, should we also make 'http' return the URL chain? Doing both
> is probably the best solution, IMHO.

What's the difference between calling httpRaw and calling http with
redirections turned off?

>>>
>>> Making the redirection aware of cookies. There are redirects around the
>>> web where the first URL returns a Set-Cookie header and a 3xx code which
>>> redirects to another site that expects the cookie that the first HTTP
>>> transaction set. I propose to add an (IORef to a Data.Set of Cookies) to the
>>> Manager datatype, letting the Manager act as a cookie store as well as a
>>> repository of available TCP connections. httpRaw could deal with the cookie
>>> store. Network.HTTP.Types does not declare a Cookie datatype, so I would
>>> probably be adding one. I would probably take it directly from
>>> Network.HTTP.Cookie.
>>
>> Actually, we already have the cookie package for this. I'm not sure if
>> putting the cookie store in the manager is necessarily the right approach,
>> since I can imagine wanting to have separate sessions while reusing the same
>> connections. A different approach could be adding a list of Cookies to both
>> the Request and Response.
>
> Ah, looks like you're the maintainer of that package as well! I didn't
> realize it existed. I should have, though; Yesod must need to know about
> cookies somehow.
>
> As the http-conduit package stands, the headers of the original Request can
> be set, and the headers of the last Response can be read. Because cookies
> are implemented on top of headers, the caller knows about the cookies before
> and after the redirection chain. I'm more interested in the preservation of
> cookies within the redirection chain. As discussed earlier, exposing the
> httpRaw function allows the entire redirection chain to be handled by the
> caller, which alleviates the problem.
>
> That being said, however, the simpleHttp function (and all functions built
> upon 'http' inside of http-conduit) should probably respect cookies inside
> redirection chains.
> Under the hood, Network.Browser does this by having the
> State monad keep track of these cookies (as well as the connection pool) and
> making HTTP requests mutate that State, but that's a pretty different
> architecture than Network.HTTP.Conduit.
>
> One way I can think to do this would be to let the user supply a CookieStore
> (probably implemented as a (Data.Set Web.Cookie.SetCookie)) and receive a
> (different) CookieStore from the 'http' function. That way, the caller can
> manage the CookieStores independently from the connection pool. The downside
> is that it's one more bit of ugliness the caller has to deal with. How do
> you feel about this? You probably have a better idea :-)

The only idea was to implement an extra layer of cookie-aware functions in a
separate Browser module. That's been the running assumption for a while now,
since HTTP does it, but I'm not opposed to taking a different approach.

It could be that the big mistake in all this was putting redirection at the
layer of the API that I did. Yitz Gale pointed out that in Python, they have
the low-level API and the high-level API, the latter dealing with both
redirection and cookies.

Anyway, here's one possible approach to the whole situation: `Request` could
have an extra record on it of type `Maybe (IORef (Set SetCookie))`. When
`http` is called, if the record is `Nothing`, a new value is created. Every
time a request is made, the value is updated accordingly. That way, redirects
will respect cookies for the current sessions, and if you want to keep a
longer-term session, you can keep reusing the record in different `Request`s.
We can also add some convenience functions to automatically reuse the cookie
set.

Michael

>> I'd be happy to do both of these things, but I'm hoping for your input on
>> how to go about this endeavor. Are these features even good to be pursuing?
>> Should I be going about this entirely differently?
>>
>> Thanks,
>> Myles C. Maxfield
>>
>> P.S.
>> I'm curious about the lack of Network.URI throughout
>> Network.HTTP.Conduit. Is there a particular design decision that led you to
>> use raw ascii strings?
>
>
> Because there are plenty of URIs that are valid that we don't handle at all,
> e.g., ftp.
>
> I'm a little surprised by this, since you can easily test for unhandled URIs
> because they're already parsed. Whatever; It doesn't really matter to me, I
> was just surprised by it.
>
> Michael
>
> Thanks again for the feedback! I'm hoping to make a difference :]
>
> --Myles

_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe
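
For readers following the thread, the two ideas discussed above (returning the
chain of redirected URLs, and giving `Request` a field of type
`Maybe (IORef (Set SetCookie))` that is created when it is `Nothing` and
updated on every request) might be sketched roughly as below. This is only an
illustration under stated assumptions, not http-conduit's actual code: `Url`,
`Cookie`, `CookieJar`, `Request`, `Response`, `RawHttp` and `httpFollowing`
are all hypothetical stand-ins, the simplified `Cookie` replaces
`Web.Cookie.SetCookie`, and `RawHttp` only plays the role that `httpRaw` has
in the discussion (one transaction, no redirect handling).

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.Set as Set

-- Hypothetical stand-ins, NOT the real http-conduit or cookie types.
type Url = String

data Cookie = Cookie { cookieName :: String, cookieValue :: String }
  deriving (Eq, Ord, Show)

type CookieJar = IORef (Set.Set Cookie)

data Request = Request
  { reqUrl       :: Url
  , reqCookieJar :: Maybe CookieJar  -- Nothing: a fresh jar is created per call
  }

data Response = Response
  { respStatus     :: Int
  , respLocation   :: Maybe Url      -- Location header of a 3xx response
  , respSetCookies :: [Cookie]       -- already-parsed Set-Cookie headers
  }

-- One HTTP transaction with no redirect handling (assumed to be supplied
-- by the caller). It receives the cookies that should be sent.
type RawHttp = Url -> Set.Set Cookie -> IO Response

-- Follow up to 'limit' redirects. Returns the final response, every URL
-- requested (in order), and the jar, so a caller can reuse the jar in a
-- later Request to keep a longer-term session.
httpFollowing :: RawHttp -> Int -> Request -> IO (Response, [Url], CookieJar)
httpFollowing raw limit req = do
  jar <- maybe (newIORef Set.empty) return (reqCookieJar req)
  let go n url visited = do
        cookies <- readIORef jar
        resp <- raw url cookies
        -- Simplification: cookies are only ever added. A real jar would
        -- replace by name/domain/path and honour expiry.
        modifyIORef' jar (Set.union (Set.fromList (respSetCookies resp)))
        let visited' = url : visited
        case respLocation resp of
          Just next
            | isRedirect (respStatus resp), n > 0 -> go (n - 1) next visited'
          _ -> return (resp, reverse visited', jar)
  go limit (reqUrl req) []
  where
    isRedirect s = s >= 300 && s < 400
```

Because the jar lives on the request rather than on the connection manager,
separate sessions can still share one connection pool, which is the concern
raised earlier in the thread. Passing the returned jar back in via
`reqCookieJar = Just jar` would carry the session over to the next call.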
