Everything in the UI uses the database.
A full scan from disk to database takes *FAR* too long.
This was set up as a tiered effort.
First step: get the real, valid, useful content off of disk and into the
database in a usable form.
Second step: expand the data in the database fully.
The first step makes the content available for browsing immediately, on
the order of ~55 ms per file.
The second step currently takes an average of 3 seconds per file.
NOTE: Even though the database scan is short now, it is expected to
take multiple minutes per artifact in the future, as consumer complexity
starts to grow. (Think bytecode scanning / checksumming / indexing /
cross-referencing, and GPG signature confirmation, to name just a few
of the processes we are aware of.)
In large repositories, this is the only way to get the content available
within a 24-hour window.
The same problem also exists with the old technique when scanning large
repositories for new content.
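The two tiers above could be sketched roughly as below. The consumer
names match the ones discussed in this thread, but the method signatures
and the in-memory "database" are simplified assumptions for illustration,
not Archiva's real API.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Tier 1: cheap per-file work, runs during the disk scan (~55 ms/file).
interface RepositoryContentConsumer {
    void processFile(Path file);
}

// Tier 2: expensive per-record work, runs later against the database
// (checksumming, indexing, cross-referencing, signature checks, ...).
interface ArchivaArtifactConsumer {
    void processArtifact(String artifactRecord);
}

class TieredScanSketch {
    // Stand-in for the database: just the minimal records tier 1 creates.
    private final List<String> database = new ArrayList<>();

    // First step: make content browsable immediately.
    void repositoryScan(List<Path> files, RepositoryContentConsumer consumer) {
        for (Path f : files) {
            consumer.processFile(f);
            database.add(f.getFileName().toString()); // minimal record
        }
    }

    // Second step: expand each database record fully, decoupled from
    // the disk scan, so slow consumers never block browsing.
    void databaseScan(ArchivaArtifactConsumer consumer) {
        for (String record : database) {
            consumer.processArtifact(record);
        }
    }

    List<String> records() {
        return database;
    }
}
```

The point of the split is that tier-2 consumers can grow arbitrarily slow
without affecting how quickly new content shows up in the UI.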
- Joakim
Wendy Smoak wrote:
On 10/16/07, Joakim Erdfelt <[EMAIL PROTECTED]> wrote:
ArchivaArtifactConsumer is an abstract consumer that deals with artifacts.
RepositoryContentConsumer is for files.
A file that isn't an artifact can be *.xml, *.sha1, *.md5,
maven-metadata.xml, bad content, poorly named content, etc.
Would it be better to state the phase/scan instead?
RepositoryContentConsumer becomes -> RepositoryScanConsumer
ArchivaArtifactConsumer becomes -> DatabaseScanConsumer
All artifacts _are_ repository content, are they not? And even after
the renaming... it can't be in the database unless it's in the
repository.
I understand scanning the filesystem to update the database. But when
and why do you "scan" the database?
--
- Joakim Erdfelt
[EMAIL PROTECTED]
Open Source Software (OSS) Developer