Vadim wrote:


* split the bean into a CocoonWrapper that handles configuring a
Cocoon object and processing a single request, and a CocoonBean that
handles crawling

What is the API of these new beans? Please do not forget that
CocoonBean went out of the door with the 2.1 release and people might
already be building applications with it, meaning you can't change the
CocoonBean API in a backward-incompatible way without properly
deprecating and supporting the released functionality.

But we did document that the API of the bean was unstable. Doesn't that mean we can change the API where necessary?


Ah, in this case we can. Unfortunately, the class's Javadoc does not have this indication.

How do you use Javadoc to indicate that an API is unstable?
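(As an aside: standard Javadoc has no dedicated tag for this, so one possible approach is simply a prominent note in the class comment, along these lines:)

/**
 * Command-line entry point for running Cocoon.
 *
 * <p><strong>NOTE:</strong> this API is still unstable and may change
 * in incompatible ways in future releases.</p>
 */
public class CocoonBean {
    // ...
}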


Of course we should minimise breakage as much as possible. Therefore, I'll redo what I've done so far, being more thorough about ensuring compatibility.

I'm sure I can manage the split into two classes (which I think greatly aids clarity) without breaking any interfaces.
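Roughly what I have in mind (a sketch only, with hypothetical method names; the existing CocoonBean signatures stay where they are):

// CocoonWrapper.java (hypothetical sketch)
// Owns Cocoon setup and single-request processing.
public class CocoonWrapper {
    public void initialize() throws Exception {
        // configure and create the Cocoon object
    }

    public void processURI(String uri, java.io.OutputStream out) throws Exception {
        // run one request through Cocoon and write the result to 'out'
    }

    public void dispose() {
        // release the Cocoon object
    }
}

// CocoonBean.java (hypothetical sketch)
// Keeps the published bean API and layers crawling on top of the wrapper.
public class CocoonBean extends CocoonWrapper {
    private final java.util.List targets = new java.util.ArrayList();

    public void addTarget(String uri) {
        targets.add(uri);
    }

    public void process() throws Exception {
        // crawl: process each target, follow gathered links, save the pages
    }
}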

Sounds good.

I've just committed this.


* Made the CocoonBean use a Crawler class (derived from the one in
the scratchpad Ant task)

Do you mean org.apache.cocoon.components.crawler.Crawler? I don't see
how it can be used in CocoonBean. Can you elaborate?


No. There's a scratchpad Ant task which has its own crawler. I used that.

CocoonCrawling.java? :)

Yes.


I'd like to use o.a.c.components.crawler.Crawler, but I couldn't see how to do it, because it has its own link gathering code built into it.

It's purely for crawling external sites via URL.

But it could do with the ability to use the Cocoon protocol, as that
would make it more efficient, especially now that you can request views
via the cocoon: protocol.


Next I want to:
* move the member variables of the wrapper and bean into a Context
object, so that the Bean can be used in a ThreadSafe environment.

AFAIU, CocoonBean.processURI is already thread safe. All addTarget() methods obviously are not. The addTarget() methods can easily be made threadsafe (in some sense -- a call to addTarget() in one thread does not break the bean, but it does affect a process() running in another thread) by synchronizing access to the targets collection. It can be thread safe in another sense too (calls to processTargets() in different threads are independent of each other): you just need to add a processTargets(targets) method.
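To illustrate the second sense, a minimal sketch (names assumed, not the current code), where processTargets() touches no shared state beyond a snapshot of a synchronized targets collection:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class CocoonBeanSketch {
    // Shared collection; wrapped so concurrent addTarget() calls are safe.
    private final List targets = Collections.synchronizedList(new ArrayList());

    public void addTarget(String uri) {
        targets.add(uri);
    }

    // Convenience: snapshot the shared targets and process the copy.
    public void process() throws Exception {
        processTargets(new ArrayList(targets));
    }

    // Independent per call: operates only on the list passed in,
    // so calls with different lists in different threads do not interfere.
    public void processTargets(List uris) throws Exception {
        for (Iterator i = uris.iterator(); i.hasNext();) {
            String uri = (String) i.next();
            processURI(uri, System.out);   // assumed single-request method
        }
    }

    protected void processURI(String uri, java.io.OutputStream out) throws Exception {
        // placeholder for the real single-request processing
    }
}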

All of the crawler data is in member variables that will be shared between threads. Therefore processTargets(targets) wouldn't in itself be enough.


I can add a Crawler class which encapsulates the necessary data. Then a processTargets(targets) could be threadsafe.

Agreed.

I'll make sure the crawler code is well encapsulated when I get on to that.
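Roughly, the encapsulation I have in mind looks like this (a sketch, names illustrative only): one Crawler instance per processTargets() call, holding all of the crawl state.

import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

// Hypothetical: one Crawler instance per processTargets() call, so no
// crawl state is shared between threads.
class Crawler {
    private final List pending = new LinkedList();   // URIs still to process
    private final Set visited = new HashSet();       // URIs already processed

    void addAll(List uris) {
        pending.addAll(uris);
    }

    boolean hasNext() {
        return !pending.isEmpty();
    }

    String next() {
        String uri = (String) pending.remove(0);
        visited.add(uri);
        return uri;
    }

    // Called with links gathered from a processed page.
    void addLink(String uri) {
        if (!visited.contains(uri) && !pending.contains(uri)) {
            pending.add(uri);
        }
    }
}

processTargets(targets) would then create its own Crawler, seed it with the given targets, and loop until hasNext() returns false.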


...

* Get caching working properly, and make it use ifModifiedSince() to
determine whether to save the file or not.

Must-have feature. Top priority. I hope you've seen my emails on the
persistent store subject.

I certainly did. I got your code, and downloaded and compiled the latest Excalibur Store. Unfortunately, on first tests, the CLI seems to have actually got slower. I did those tests without stepping through the code, so I've got to check out more of what's going on. I agree this is a top priority. I guess I just got a little downhearted at those results and needed a few days to recover my enthusiasm!

I've got it working now, but I've lost linkGatherer functionality and it seems slower.

I've started looking into reimplementing linkGatherer by adding, wherever I see cache.store(), something like this (in pseudo-code):

if (objectModel.containsKey(LINK_GATHERER_LIST)) {
  cache.store(key + "/gathered-links", objectModel.get(LINK_GATHERER_LIST));
}

Does that seem reasonable? Is it easy to build up the 'key + "/gathered-links" ' composite cache key?
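To make the composite-key idea concrete, here is roughly what I mean, using a plain Map as a stand-in for the real store (the actual cache key type and API may well differ, so treat the names as assumptions):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GatheredLinksCacheSketch {
    private final Map cache = new HashMap();   // stand-in for the real store

    // Store the page's gathered links next to its cached response.
    void storeGatheredLinks(String key, List gatheredLinks) {
        cache.put(key + "/gathered-links", gatheredLinks);
    }

    // Later, the crawler can ask for the links without re-generating the page.
    List getGatheredLinks(String key) {
        return (List) cache.get(key + "/gathered-links");
    }
}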

* Make Cocoon work with an external Cocoon object, again for the
sake of a  PublishingService

I don't get this. What Cocoon with which external Cocoon?

This is something that Unico talked about in relation to a publishing service running within a Cocoon servlet. Again, I'll wait until we've got an actual plan for such a service.

Ah, I see. But there, you will have to go over the wire, as Crawler does. Right?

Reading Unico's recent email, it makes sense to use FOM_Cocoon if the bean is to be used in a servlet environment. Then the bean becomes something quite simple. Would you agree?


* work out how to implement Vadim's idea for a single pipeline with
an  XMLTeePipe to generate both a link view and page view in one hit

Yep. Should increase performance and conformance!

I've spent some time trying to work out how to do this, and it seems quite complicated. Each pipeline, when built, is made up of a generator, a set of transformers and a serializer, so building a pipeline that splits into two (one half completing normally and the other going off into a separate 'link-view' pipeline) would require a specifically built Pipeline class, and would require changes to the treeprocessor to be able to build it. Am I right, or do you know of a simpler way?

You are right. Just as the current sitemap implementation adds the link gatherer automagically, the links view should in the same way be automagically assembled and attached at the branch point.

But to do it automagically would probably require significant changes to AbstractProcessingPipeline, since for automagicality you couldn't just add a special 'BranchingCachingProcessingPipeline'. Is that what you would propose?
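For reference, the tee part itself is straightforward: it only has to duplicate SAX events to two consumers. A from-scratch sketch (abbreviated, and not the actual XMLTeePipe code); the hard part remains assembling and attaching the second branch in the pipeline:

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Duplicates selected SAX events to two downstream handlers; the remaining
// ContentHandler callbacks would be delegated in the same way.
class SaxTeeSketch extends DefaultHandler {
    private final ContentHandler first;
    private final ContentHandler second;

    SaxTeeSketch(ContentHandler first, ContentHandler second) {
        this.first = first;
        this.second = second;
    }

    public void startDocument() throws SAXException {
        first.startDocument();
        second.startDocument();
    }

    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        first.startElement(uri, local, qName, atts);
        second.startElement(uri, local, qName, atts);
    }

    public void characters(char[] ch, int start, int len) throws SAXException {
        first.characters(ch, start, len);
        second.characters(ch, start, len);
    }

    public void endElement(String uri, String local, String qName) throws SAXException {
        first.endElement(uri, local, qName);
        second.endElement(uri, local, qName);
    }

    public void endDocument() throws SAXException {
        first.endDocument();
        second.endDocument();
    }

    // ...delegate the other ContentHandler methods in the same way.
}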


Regards, Upayavira



