Hi Jamie Lokier,

Just to clear up a few issues:

1)  All the DBs (btree/hash/Berkeley DB) only have an API for updating one
key/value at a time, AFAIK.

2) In our case, the final index merge is not updating anything, as it's
creating a new index. The disk space for hits is therefore contiguous:
regardless of which word we start from, space is allocated on a
first-come, first-served basis (i.e. it's appended), so the hits simply
land in whatever word order we choose. The buckets in the header are
random, of course, but they are fixed in the first 1MB of the index
(256,000 buckets at 32 bits each).

3) All the major indexers, Lucene (Beagle/Strigi) and Google, use index
merges, as updating a big index is slow and merging also helps remove
deleted entries and fragmentation. Without merges, no index would be
scalable.

4) We don't want to use multiple tables, and SQL DBs are not appropriate
as they store the word twice (once in the index and once in the table),
hence bloating things up.

5) The high-end Oracle RDBMS has support for clustered tables, which
allow storage of rows in key order (normal tables are appended to and
only indexes are sorted). These are not practical, as they are even more
painful to update due to massive relocation (in fact it's far quicker to
append records and then copy them to a new table in sorted order).

6) Performance problems with the existing merges disappear on XFS (they
merge in seconds as opposed to minutes on ext3). If ext4 gets similar
delayed allocation, then hopefully we will see the same there too.
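
To make points 2 and 3 a bit more concrete, here is a rough Python sketch
of a merge that writes a brand new index: a fixed bucket table up front
(256,000 buckets at 32 bits each) and hit records that are only ever
appended after it. Everything beyond those two numbers is an assumption
made up for illustration and is not the real on-disk format: the CRC32
hash, the record layout, the collision chaining and the names
merge_indexes, lookup and deleted are all hypothetical.

```python
import io
import struct
import zlib

NUM_BUCKETS = 256_000                    # figure quoted in point 2
BUCKET_SIZE = 4                          # 32 bits per bucket
HEADER_SIZE = NUM_BUCKETS * BUCKET_SIZE  # 1,024,000 bytes, roughly 1MB
RECORD_HEAD = struct.Struct("<IHI")      # assumed: prev offset, word length, hit count


def bucket_of(word):
    # Hypothetical hash; the real hash function is not described above.
    return zlib.crc32(word.encode("utf-8")) % NUM_BUCKETS


def merge_indexes(partials, out, deleted=frozenset()):
    """Merge word -> hit-list dicts into a new index written to `out`."""
    merged = {}
    for partial in partials:
        for word, hits in partial.items():
            kept = [h for h in hits if h not in deleted]  # drop deleted entries
            merged.setdefault(word, []).extend(kept)

    heads = [0] * NUM_BUCKETS            # 0 means "empty bucket"
    out.write(b"\0" * HEADER_SIZE)       # reserve the fixed header
    offsets = {}
    for word, hits in merged.items():    # any word order works
        raw = word.encode("utf-8")
        offset = out.tell()              # records are strictly appended
        bucket = bucket_of(word)
        out.write(RECORD_HEAD.pack(heads[bucket], len(raw), len(hits)))
        out.write(raw)
        out.write(struct.pack("<%dI" % len(hits), *hits))
        heads[bucket] = offset           # newest record heads the chain
        offsets[word] = offset

    out.seek(0)                          # fill in the header once, at the end
    out.write(struct.pack("<%dI" % NUM_BUCKETS, *heads))
    return offsets


def lookup(data, word):
    """Return the hits for `word` from index bytes produced above."""
    raw = word.encode("utf-8")
    offset = struct.unpack_from("<I", data, bucket_of(word) * BUCKET_SIZE)[0]
    while offset:
        prev, wlen, nhits = RECORD_HEAD.unpack_from(data, offset)
        start = offset + RECORD_HEAD.size
        if data[start:start + wlen] == raw:
            return list(struct.unpack_from("<%dI" % nhits, data, start + wlen))
        offset = prev                    # follow the collision chain
    return []
```

With something like this, merge_indexes([{"cat": [1, 2]}, {"cat": [7]}],
io.BytesIO()) writes the header once and lays the hit records down one
after another, so the only random writes are confined to the first 1MB.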

-- 
Heavy Disk I/O harms desktop responsiveness
https://bugs.launchpad.net/bugs/131094