Everything in the UI uses the database.
A full scan from disk to database takes *FAR* too long.

This was set up as a tiered effort. First step: get the real, valid, useful content off of disk and into the database in a usable form.
Second step: expand the data in the database fully.

The first step makes the content available for browsing immediately, on the order of ~55ms per file.
The second step takes an average of 3 seconds per file right now.
NOTE: Even though the database scan is short now, it is expected to take multiple minutes per artifact in the future as consumer complexity grows. (Think bytecode scanning, checksumming, indexing, cross-referencing, and GPG signature verification, to name just a few that we are aware of.)

In large repositories, this is the only way to get the content available within a 24-hour window. The same problem exists with the old technique on large repositories when scanning for new content.
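To make the tiered idea concrete, here is a minimal sketch of the two phases described above. All names here are hypothetical illustrations, not Archiva's actual API: phase 1 does a cheap pass that just records each file, and phase 2 enriches the stored records later at its own pace.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a tiered scan (not Archiva's real classes).
public class TieredScan {
    // Minimal stand-in for a database row.
    record ArtifactRecord(String path, boolean enriched) {}

    // Stand-in for the database.
    static Map<String, ArtifactRecord> db = new LinkedHashMap<>();

    // Phase 1: fast pass (~55ms/file in the numbers above) --
    // just capture the path so content is browsable immediately.
    static void repositoryScan(List<String> files) {
        for (String f : files) {
            db.put(f, new ArtifactRecord(f, false));
        }
    }

    // Phase 2: slow pass (seconds to minutes per artifact) --
    // this is where checksumming, indexing, cross-referencing,
    // signature verification, etc. would run.
    static void databaseScan() {
        db.replaceAll((path, rec) -> new ArtifactRecord(rec.path(), true));
    }

    public static void main(String[] args) {
        repositoryScan(List.of("g/a/artifact-1.0.jar", "g/a/artifact-1.0.pom"));
        System.out.println("browsable after phase 1: " + db.size() + " files");
        databaseScan();
        System.out.println("all enriched: "
                + db.values().stream().allMatch(ArtifactRecord::enriched));
    }
}
```

The key property is that phase 1 never blocks on phase 2, so a large repository becomes browsable long before the expensive consumers finish.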

- Joakim

Wendy Smoak wrote:
On 10/16/07, Joakim Erdfelt <[EMAIL PROTECTED]> wrote:
ArchivaArtifactConsumer is a consumer for dealing with artifacts in the abstract.
RepositoryContentConsumer is for files.

A file that isn't an artifact can be *.xml, *.sha1, *.md5,
maven-metadata.xml, bad content, poorly named content, etc.

Would it be better to state the phase/scan instead?

RepositoryContentConsumer becomes -> RepositoryScanConsumer
ArchivaArtifactConsumer becomes -> DatabaseScanConsumer

All artifacts _are_ repository content, are they not?  And even after
the renaming... it can't be in the database unless it's in the
repository.

I understand scanning the filesystem to update the database.  But when
and why do you "scan" the database?



--
- Joakim Erdfelt
 [EMAIL PROTECTED]
 Open Source Software (OSS) Developer
