Re: [RT] Webdavapps with Cocoon

Stefano Mazzocchi Mon, 28 Jul 2003 02:54:26 -0700

replying to both Gianugo and Marc in the same email for brevity.

On Friday, Jul 25, 2003, at 17:08 Europe/Rome, Marc Portier wrote:

Gianugo Rabellino wrote:
Stefano Mazzocchi wrote:
<snip />

Now Cocoon, in its present incarnation, is heavily biased by the "read-only" syndrome, and this makes it IMO very hard to enter the WebDAV world. I see two serious areas where WebDAV support needs careful (re)thinking of core Cocoon patterns:
I think this applies also to more classic file-upload schemes?

Yes, it totally does. the way file-upload is handled today is just one aspect of a more general 'polishing outside-in flow of information' for cocoon.

(note I used the term "polishing" not "rethinking", see below why)

1) URI space decoupling being irreversible: while this is a *major* feature of Cocoon (and something that might help immensely when applied to a DAV environment: views on WebDAV would really kick ass, imagine presenting your XML files as virtual OpenOffice .sxw that are assembled/disassembled on the fly), the drawback is that, in most cases, it's impossible to work your way from the pipeline result to the single pieces that make that result happen. Even the simplest XSLT transformation can't be reversely applied, so now there is no way to understand how a resource should be treated in a symmetric way when requested or uploaded. Oh yes, you can
hm, do we really need to look at it as symmetric?

No, we don't. I've been thinking about this a lot and I think that symmetry is not only a holy grail, but it's the wrong grail to consider holy. Read on.

I know we are tempted to do so, but is it a must?

It is tempting, but symmetry-driven design is bad. we must understand what we want, why and what is limiting us.

Is it imposed by current webdav enabled editors?

It has already been said that webdav is the most under-hyped technology ever.

Microsoft said in the Halloween documents that they pushed for webdav to be a supercomplex specification so that opensource wouldn't be able to implement it. Greg Stein (the current ASF chairman, BTW) finished mod_dav in a few days, disturbed by those documents (if you ever meet Greg, ask him, it's a pretty funny story and he's very proud of having done that [he worked for microsoft before])

As a result of this, Microsoft moved away from webdav (probably they thought it was not complex enough) and into web services (will the SOAP/WSDL/UDDI/BPEL4WS stack be hard enough for OSS to catch up? hopefully we'll be smarter and just keep going with good old HTTP style WS).

As a result, webdav was (more or less) abandoned by the market. Subversion is the only use of webdav that goes beyond saving a file on disk thru your web folder (whose implementation sucks ass and I bet is not going to get better in the future, in favor of a SOAP-based document upload web service). Again, Greg Stein is behind the effort.

WebDAV is a very generic protocol (just like HTTP is) but people are influenced by implementations more than by the protocol design itself. For example, almost everybody on the web believes that

http://blah.com

and

http://blah.com/

are the same URL just because all web clients will call

GET / HTTP/1.0

on both requests. But they don't know that

http://blah.com/news

and

http://blah.com/news/

are two different URLs and it's the web server that (normally! but nobody ever specified this behavior anywhere!) translates the first into the second if the folder 'news' is found in the file system that mounts to that URL space.
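A minimal sketch of the point above, in Python. The helper name `request_path` is made up for illustration; it only shows that an empty path collapses to "/" on the request line while "/news" and "/news/" stay distinct.

```python
from urllib.parse import urlsplit


def request_path(url: str) -> str:
    """Return the path a client would put on the request line.

    An empty path is sent as "/", which is why the two root URLs
    look identical on the wire. Nothing else gets normalized.
    """
    path = urlsplit(url).path
    return path or "/"


root_a = request_path("http://blah.com")
root_b = request_path("http://blah.com/")
news_a = request_path("http://blah.com/news")
news_b = request_path("http://blah.com/news/")

print(root_a == root_b)  # the two root URLs end up as the same request
print(news_a == news_b)  # these two stay different unless the server redirects
```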

Note that on a real FS, everybody knows the difference between

/home/blah/news

and

/home/blah/news/

because the OS enforces type checking on these (on a POSIX file system you cannot open a directory for writing as a file, for example).

The above weakness of URL space handling is the first thing that severely hurt the WebDAV world. [note: a bug in microsoft web folders eliminates the trailing slash from the URL before sending the HTTP request, go figure! It means that nobody in microsoft ever thought about webdav-editing the root of a folder (which is normally its index, or default content in IIS terms)]

Some say (even Marc suggests) that forcing DAV to work all the actions on the same URL might be a reason for its poor success, but I disagree because that doesn't take resource views into consideration.

If I have a resource like

http://blah.com/news/

and I want to edit, I could ask for

 http://blah.com:8888/news/
 http://edit.blah.com/news/
 http://blah.com/news/?view="edit";

which are all 'orthogonal' ways of asking a different view of the same resource accessing it thru a parallel URL space (but with different behaviors)
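To make the three options concrete, here is a rough Python sketch that maps each of them to the same (view, path) pair. The function name `resolve_view`, the "read" default and the exact matching rules are assumptions for illustration only, not anything Cocoon ships.

```python
from urllib.parse import urlsplit, parse_qs


def resolve_view(url: str):
    """Return (view, path) for a URL; the path is the same in both URL spaces."""
    parts = urlsplit(url)
    # Tolerate the quoted form ?view="edit"; used in the example above.
    view_param = parse_qs(parts.query).get("view", [""])[0].strip('";')
    host = parts.hostname or ""
    if parts.port == 8888 or host.startswith("edit.") or view_param == "edit":
        return ("edit", parts.path)
    return ("read", parts.path)


by_port = resolve_view("http://blah.com:8888/news/")
by_host = resolve_view("http://edit.blah.com/news/")
by_query = resolve_view('http://blah.com/news/?view="edit";')
plain = resolve_view("http://blah.com/news/")
```

In all three cases the resource path stays "/news/"; only the view changes, which is the whole point of a parallel URL space.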

I normally prefer the virtual-host approach. something like this

   [frontend] <- [repository] <- [backend]
 http://blah.com             http://edit.blah.com

where frontend and backend are separated (the frontend might even be a static representation of the saved content, say, created by a cronned Forrest every hour or so).

The above setup removes the need to be symmetric.

(they want to put back where they got I assume?)

actually if you look at the combination of matchers/request-method-selector you wrote up it looks more like the request-method being part of the uri-request space almost?

I dislike this. the action should not be encoded in the URI space.

or put differently each request-method caters for a separate uri space? taking from there the symmetry between those spaces is something you can or cannot want to achieve?

(we're not used to look at this in this way, and I might be totally off scale here)

I would tend to prefer to have a backend with the exact same URL space as the frontend, just providing different "views" on the frontend's data for all the potential HTTP requests.

After years of tries and thinking, I believe the above is the best way of doing it.

  <match pattern="*.xls">
    <select type="request-method">
      <when test="GET">
        <generate src="{1}.xml"/>
        <transform src="xml2poi.xls"/>
        <serialize type="hssf"/>
      </when>
      <when test="PUT">
        <generate type="xls2poi"/>
        <transform src="poi2sourcewrite"/>
        <transform type="sourcewrite"/>
        <serialize type="dummyserializer"/>
      </when>
      [...]
  </match>

but this, apart from being awkward, doesn't work in general for all pipelines: think about aggregation at the very least.

Some high-end CMS (the good ones, not that stinking hyperexpensive vignette crap) implement the concept of webdav de-aggregators. But, IMHO, the complexity of implementation and configuration of those resources makes their use totally awkward.

IMO, for aggregation, one potential solution is to provide a sub-URL-space that is directly accessible from the backend (interestingly enough, this is the same concept that ReiserFS4 applied to pseudo-files)

Example, if on the frontend you have

/page

which is an aggregated resource with parts "top" "navbar" "body" the backend might do

 /page -> PUT/POST forbidden
 /page/top
 /page/navbar
 /page/body

but note that this is *NOT* something that cocoon should decide automatically, but it's something that *you* should decide in your backend sitemap for your webdav application. because another way of doing the above is simply

/page

where GET goes thru aggregation identifying the non-editable parts with special IDs, then PUT goes thru a stylesheet that filters out the non-editable elements. This is poor man's de-aggregation but it works and you decide your own.
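A rough sketch of that poor man's de-aggregation, written in Python instead of a stylesheet just to keep it short. The IDs "top" and "navbar", the element names and the function name are all made up for illustration; in a real sitemap this would be an XSLT step in the PUT pipeline.

```python
import xml.etree.ElementTree as ET

# Hypothetical IDs that the GET aggregation stamped on the non-editable parts.
NON_EDITABLE_IDS = {"top", "navbar"}


def strip_non_editable(xml_text: str) -> str:
    """Drop every element whose id marks it as non-editable; keep the rest."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.get("id") in NON_EDITABLE_IDS:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# What a client might PUT back after editing the aggregated /page.
uploaded = (
    '<page>'
    '<div id="top">logo</div>'
    '<div id="navbar">links</div>'
    '<div id="body">edited text</div>'
    '</page>'
)
stored = strip_non_editable(uploaded)
print(stored)
```

Only the "body" part survives, so that is the only thing the backend would write to the repository.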

My point is: symmetry is a holy grail, we should just come up with components and best practices to show people how to do stuff and they will build their own webdavapp.

The hard part is to let them know that webdav is nothing more than a few other actions on top of HTTP.

isn't this aggregate example just showing that some GET-URI's are to be considered as read-only? (not to be abused for a PUT that is)

In many situations, your webdavapp will forbid some actions on some resources, but this is very natural.

couldn't dav properties (PROPFIND?) provide such meta-data per GET-URI?
is any usage of those properties in any way standardised?

very few dav properties are standardized. since we don't control the client side, we cannot make assumptions on these.

2) direction: Cocoon is clearly designed for an "inside-out" type of flow in mind, while WebDAV is fully bidirectional.

this is not true anymore. with the ability to have pipelines dump their content on an outputstream if called from the flow, cocoon reached complete bidirectionality.

Design-wise it's difficult to adapt the G-T-S pattern to an incoming stream of data,

I can't see why. Admittedly, there are generators that are hardly reusable in both the in-out and out-in cases (StreamGenerator or RequestGenerator, for example) but that is not a deficiency of the pipeline design, especially now that the output stream of the pipeline is reconnectable.

when you're barely generating stuff (you're actually deserializing it) and, mostly, when you're not serializing anything but a simple response (think MKCOL, MOVE, DELETE and the like).
this stuff sounds like flow integration on a separate section of the uri-request-space?

I totally agree. i think it would be fairly easy to implement a full dav stack with flowscript and a few java components that wrap around a repository (could be as simple as a file system)
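To give an idea of how little is needed, here is a sketch of a method dispatcher wrapped around a repository. It is Python with an in-memory dict standing in for the repository, not flowscript plus Java components; the class name, the "collections end with a slash" convention and the reduced set of status codes are assumptions of the sketch, and locking and properties are left out entirely.

```python
class Repository:
    """Hypothetical in-memory repository: path -> bytes, or None for a collection."""

    def __init__(self):
        self.nodes = {"/": None}

    def handle(self, method, path, body=b"", destination=None):
        """Dispatch on the HTTP method name, as a request-method selector would."""
        handler = getattr(self, "do_" + method, None)
        if handler is None:
            return 405, b""
        return handler(path, body, destination)

    def _parent_exists(self, path):
        parent = path.rstrip("/").rsplit("/", 1)[0] + "/"
        return parent in self.nodes and self.nodes[parent] is None

    def do_GET(self, path, body, destination):
        content = self.nodes.get(path)
        return (200, content) if content is not None else (404, b"")

    def do_PUT(self, path, body, destination):
        if not self._parent_exists(path):
            return 409, b""
        created = path not in self.nodes
        self.nodes[path] = body
        return (201 if created else 204), b""

    def do_MKCOL(self, path, body, destination):
        if path in self.nodes:
            return 405, b""
        if not self._parent_exists(path):
            return 409, b""
        self.nodes[path] = None
        return 201, b""

    def do_DELETE(self, path, body, destination):
        if path not in self.nodes:
            return 404, b""
        for key in list(self.nodes):
            if key == path or (path.endswith("/") and key.startswith(path)):
                del self.nodes[key]
        return 204, b""

    def do_MOVE(self, path, body, destination):
        if path not in self.nodes:
            return 404, b""
        if self.nodes[path] is None:
            return 403, b""  # moving collections is not handled in this sketch
        if destination is None:
            return 400, b""
        if not self._parent_exists(destination):
            return 409, b""
        overwritten = destination in self.nodes
        self.nodes[destination] = self.nodes.pop(path)
        return (204 if overwritten else 201), b""


repo = Repository()
made = repo.handle("MKCOL", "/news/")
put = repo.handle("PUT", "/news/a.xml", b"<a/>")
moved = repo.handle("MOVE", "/news/a.xml", destination="/news/b.xml")
fetched = repo.handle("GET", "/news/b.xml")
gone = repo.handle("GET", "/news/a.xml")
```

The interesting part is not the storage but the dispatch: each DAV method is just one more handler, and each handler is free to do something smarter than touching a file.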

This said, I have no real solutions to that, but I'm very curious to learn more about your "extractor" concept. I think this is something needed, yes, but would that be enough?

Yes, i totally think so. once you are able to extract information from the pipeline that you need to process it, the sitemap+flow can do whatever you need, in a fully symmetrical way (if you wish to do so).

webdav has been thought of as a protocol for saving and retrieving files, but this is, again, another file-system injected syndrome of mod_dav. It
Though this makes it a tremendous tool too! The problem is that right now all the WebDAV implementations are "dumb" filesystems, where all you get is persistent storage. What I would love to see (and Cocoon would fit just perfectly) is the ability to build around the file system metaphor a whole set of components being able to react on the "filesystem" operation. In this case, a "save" (or "drag 'n drop") might
see this makes me return to the uri-binding again... if we were to do this without webdav and only with POST and file-upload stuff then the uri would be holding the 'action' that webdav carries in its method

yes, it would be possible. but the good thing about dav is that many fat clients implement it (office, openoffice, photoshop) providing a super-easy way for people to interact with something that can be seen as a repository (and maybe, on the other hand, is just a cocoon wrapping a file system and a relational database, depending on the URL presented)

mean an email sent to an administrator, or a workflow procedure being started: as easy as that, no client needed, just what we already got, networked shares and (maybe) a web browser: who needs a CMS client anymore then? Probably only CMS administrator, not users. Or (again) think about views/facets: being able to glue the Cocoon power to WebDAV might mean giving different content to each user. Graphics might see only images, and only in hi-rez: Cocoon will take care of making scaled down versions, while hiding them from the users. Possibilities are endless.
mmm, dreaming allowed... MOVE of a product.xml-file to another productline-collection results in a sql update on the foreign-key relation ?

why not. we could be as wild as doing an SVG report graph of a relational table, modify it with illustrator, save it and alter the data in the table. How about that? ;-)
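Marc's MOVE dream, for instance, could be sketched like this. It is Python with sqlite3 purely for illustration; the /lines/<productline>/<product>.xml layout, the table and the column names are invented, and the foreign key is simplified to a plain text column.

```python
import sqlite3


def move_product(conn, source, destination):
    """Turn MOVE /lines/<old>/<product>.xml -> /lines/<new>/<product>.xml into an UPDATE.

    Returns the number of rows changed.
    """
    _, _, old_line, src_file = source.split("/")
    _, _, new_line, dst_file = destination.split("/")
    if src_file != dst_file:
        raise ValueError("renaming a product is out of scope for this sketch")
    product = src_file.rsplit(".", 1)[0]
    cursor = conn.execute(
        "UPDATE product SET line = ? WHERE name = ? AND line = ?",
        (new_line, product, old_line),
    )
    conn.commit()
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT PRIMARY KEY, line TEXT)")
conn.execute("INSERT INTO product VALUES ('widget', 'garden')")

changed = move_product(conn, "/lines/garden/widget.xml", "/lines/kitchen/widget.xml")
current_line = conn.execute(
    "SELECT line FROM product WHERE name = 'widget'"
).fetchone()[0]
```

From the client's point of view this is still a drag 'n drop between two folders.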

I would love to see cocoon becoming a framework that can glue together everything on the web, from stateless publishing to stateful webdav applications and yeah, why not, once you can do webdav applications you can do SOAP applications or XML-RPC applications, they are, more or less, all XMLoverHTTP stuff.
Oh, me too, believe me! This might be the Next Big Thing (hey... wait, are we ready to be 10 years ahead of the crowd? ;-)). Now for the big question: should we leave this discussion for now, focusing on the upcoming release and take webdavification as one of the major challenges for the next generation (this alone might be a good reason for Cocoon 3.0 IMHO), or should we have some more fun on the topic here and now?

I think we should get this release out of the door ASAP, then start thinking about what's next.

I just wanted to tell you that there is a lot of thinking to do about webdav but we are in pretty good shape with what we have.

hehe, the avalanche has already started :-) managing the change into timing/planning and releases is a different aspect, they can (and should) run in parallel IMHO

the bigger challenge of being 10 years ahead is that these fast, wild, non-domesticated, associated thoughts here and now aren't mature enough to pull off anything and the discussion dries up before it started... we shouldn't add a management constraint onto that IMHO

yes, but we shouldn't put too many irons in the fire either.

--
Stefano.
