Everything in the UI uses the database.
A full scan from disk to database takes *FAR* too long.

This was set up as a tiered effort. First step: get the real, valid, useful content off of disk and into the database in a usable form.
Second step: expand the data in the database fully.

The first step makes the content available for browsing immediately, on the order of ~55ms per file.
The second step takes an average of 3 seconds per file right now.
NOTE: Even though the database scan is short now, it is expected to take multiple minutes per artifact in the future as consumer complexity grows. (Think bytecode scanning, checksumming, indexing, cross-referencing, and GPG signature verification, to name just a few that we are aware of.)

In large repositories, this is the only way to get the content available within a 24-hour window. The same problem exists with the old technique on large repositories when scanning for new content.
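To make the tiered idea concrete, here is a minimal sketch of the two phases described above. All names here are hypothetical illustrations, not Archiva's actual API: phase 1 does a cheap pass that just records each file, and phase 2 enriches the stored records later at its own pace.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a tiered scan (not Archiva's real classes).
public class TieredScan {
    // Minimal stand-in for a database row.
    record ArtifactRecord(String path, boolean enriched) {}

    // Stand-in for the database.
    static Map<String, ArtifactRecord> db = new LinkedHashMap<>();

    // Phase 1: fast pass (~55ms/file in the numbers above) --
    // just capture the path so content is browsable immediately.
    static void repositoryScan(List<String> files) {
        for (String f : files) {
            db.put(f, new ArtifactRecord(f, false));
        }
    }

    // Phase 2: slow pass (seconds to minutes per artifact) --
    // this is where checksumming, indexing, cross-referencing,
    // signature verification, etc. would run.
    static void databaseScan() {
        db.replaceAll((path, rec) -> new ArtifactRecord(rec.path(), true));
    }

    public static void main(String[] args) {
        repositoryScan(List.of("g/a/artifact-1.0.jar", "g/a/artifact-1.0.pom"));
        System.out.println("browsable after phase 1: " + db.size() + " files");
        databaseScan();
        System.out.println("all enriched: "
                + db.values().stream().allMatch(ArtifactRecord::enriched));
    }
}
```

The key property is that phase 1 never blocks on phase 2, so a large repository becomes browsable long before the expensive consumers finish.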

- Joakim

Wendy Smoak wrote:
On 10/16/07, Joakim Erdfelt <[EMAIL PROTECTED]> wrote:
ArchivaArtifactConsumer is a consumer for dealing with artifacts in the abstract.
RepositoryContentConsumer is for files.

A file that isn't an artifact can be *.xml, *.sha1, *.md5,
maven-metadata.xml, bad content, poorly named content, etc.

Would it be better to state the phase/scan instead?

RepositoryContentConsumer becomes -> RepositoryScanConsumer
ArchivaArtifactConsumer becomes -> DatabaseScanConsumer

All artifacts _are_ repository content, are they not?  And even after
the renaming... it can't be in the database unless it's in the
repository.

I understand scanning the filesystem to update the database.  But when
and why do you "scan" the database?



--
- Joakim Erdfelt
 [EMAIL PROTECTED]
 Open Source Software (OSS) Developer
