https://bugs.kde.org/show_bug.cgi?id=354636

--- Comment #10 from Oded Arbel <o...@geek.co.il> ---
> 3. The index file is huge - about 19GB, which doesn't make a lot of sense to
> me. `balooctl indexSize` has this to say:
> 
> ----8<----
> File Size: 18.75 GiB
> Used:      948.13 MiB
> 
>            PostingDB:       2.93 GiB   316.627 %
>           PositionDB:      85.44 MiB     9.011 %
>             DocTerms:       1.39 GiB   149.920 %
>     DocFilenameTerms:     152.72 MiB    16.107 %
>        DocXattrTerms:       8.39 MiB     0.885 %
>               IdTree:      35.69 MiB     3.764 %
>           IdFileName:     175.18 MiB    18.476 %
>              DocTime:      92.85 MiB     9.793 %
>              DocData:      43.49 MiB     4.587 %
>    ContentIndexingDB:     448.00 KiB     0.046 %
>          FailedIdsDB:            0 B     0.000 %
>              MTimeDB:      26.48 MiB     2.793 %
> ----8<----
> 
> and to that I can only say "wahhh?!?!?"

After reviewing the code at https://github.com/KDE/baloo/blob/master, I'm even
more befuddled by the above numbers:

1. "Used" is `DatabaseSize.expectedSize`
2. The percentages are computed as 100 * "entry size" / "Used", so the 316%
makes sense, as that entry is larger than "Used".
3. `DatabaseSize.expectedSize` is calculated (src/engine/transaction.cpp:474)
by adding up the sizes of all of the entries listed, so it cannot be smaller
than any one of its parts unless another part is negative. That can't happen,
as the sizes are of type `size_t`, which - unless something really weird is
going on in the build server - should be unsigned long int.

There's something about page sizes, but that isn't relevant to the above
calculation, which seems to suggest that a/(a+b) > 1 where both a and b are
non-negative integers.
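
To make the contradiction concrete, here is a minimal Python sketch that redoes
the arithmetic using only the figures from the quoted `balooctl indexSize`
output. The variable names are mine, and the sizes are the rounded values as
printed, so the results are approximate.

```python
# Sizes copied from the quoted `balooctl indexSize` output, converted to MiB.
MIB_PER_GIB = 1024.0
MIB_PER_KIB = 1.0 / 1024.0

entries_mib = {
    "PostingDB": 2.93 * MIB_PER_GIB,
    "PositionDB": 85.44,
    "DocTerms": 1.39 * MIB_PER_GIB,
    "DocFilenameTerms": 152.72,
    "DocXattrTerms": 8.39,
    "IdTree": 35.69,
    "IdFileName": 175.18,
    "DocTime": 92.85,
    "DocData": 43.49,
    "ContentIndexingDB": 448.00 * MIB_PER_KIB,
    "FailedIdsDB": 0.0,
    "MTimeDB": 26.48,
}
used_mib = 948.13  # the "Used" line of the same output

# If "Used" really were the sum of the entries, these two would match.
total_mib = sum(entries_mib.values())
print(f"sum of entries: {total_mib:.2f} MiB")
print(f"reported Used:  {used_mib:.2f} MiB")

# Percentages as described in point 2: 100 * entry size / Used.
for name, size_mib in entries_mib.items():
    print(f"{name:>20}: {100 * size_mib / used_mib:8.3f} %")
```

The entries add up to roughly 4.9 GiB, more than five times the reported
"Used" value.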

BTW - here's the result of running the `mdb_stat` tool from lmdb-utils on the
baloo index:

----8<----
$ mdb_stat -af <path-to-index-db>
Freelist Status
  Tree depth: 2
  Branch pages: 1
  Leaf pages: 41
  Overflow pages: 5046
  Entries: 3253
  Free pages: 2566315
Status of Main DB
  Tree depth: 1
  Branch pages: 0
  Leaf pages: 1
  Overflow pages: 0
  Entries: 12
Status of docfilenameterms
  Tree depth: 4
  Branch pages: 315
  Leaf pages: 38726
  Overflow pages: 0
  Entries: 2104603
Status of docterms
  Tree depth: 4
  Branch pages: 633
  Leaf pages: 79407
  Overflow pages: 284028
  Entries: 2103699
Status of documentdatadb
  Tree depth: 3
  Branch pages: 90
  Leaf pages: 11012
  Overflow pages: 38
  Entries: 664790
Status of documenttimedb
  Tree depth: 3
  Branch pages: 187
  Leaf pages: 23555
  Overflow pages: 0
  Entries: 2111124
Status of docxatrrterms
  Tree depth: 3
  Branch pages: 21
  Leaf pages: 2040
  Overflow pages: 86
  Entries: 31253
Status of failediddb
  Tree depth: 0
  Branch pages: 0
  Leaf pages: 0
  Overflow pages: 0
  Entries: 0
Status of idfilename
  Tree depth: 4
  Branch pages: 363
  Leaf pages: 44411
  Overflow pages: 0
  Entries: 2120309
Status of idtree
  Tree depth: 3
  Branch pages: 52
  Leaf pages: 6960
  Overflow pages: 2118
  Entries: 223613
Status of indexingleveldb
  Tree depth: 3
  Branch pages: 3
  Leaf pages: 49
  Overflow pages: 0
  Entries: 5471
Status of mtimedb
  Tree depth: 3
  Branch pages: 42
  Leaf pages: 6719
  Overflow pages: 0
  Entries: 2111124
Status of positiondb
  Tree depth: 4
  Branch pages: 6657
  Leaf pages: 735531
  Overflow pages: 328761
  Entries: 42876611
Status of postingdb
  Tree depth: 4
  Branch pages: 6181
  Leaf pages: 657348
  Overflow pages: 105167
  Entries: 45851508
----8<----
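
For comparison with the `balooctl` figures, the page counts above can be turned
into sizes. This is a rough Python sketch: the 4096-byte page size is an
assumption (it is not shown in the output above), and the helper names are made
up for illustration.

```python
PAGE_SIZE = 4096  # assumption: the page size is not part of the output above

# (branch, leaf, overflow) page counts copied from the mdb_stat output.
page_counts = {
    "docfilenameterms": (315, 38726, 0),
    "docterms": (633, 79407, 284028),
    "documentdatadb": (90, 11012, 38),
    "documenttimedb": (187, 23555, 0),
    "docxatrrterms": (21, 2040, 86),
    "failediddb": (0, 0, 0),
    "idfilename": (363, 44411, 0),
    "idtree": (52, 6960, 2118),
    "indexingleveldb": (3, 49, 0),
    "mtimedb": (42, 6719, 0),
    "positiondb": (6657, 735531, 328761),
    "postingdb": (6181, 657348, 105167),
}
free_pages = 2566315  # "Free pages" from the Freelist Status section


def pages_to_mib(pages):
    """Convert a page count to MiB under the assumed page size."""
    return pages * PAGE_SIZE / 2**20


sizes_mib = {name: pages_to_mib(sum(c)) for name, c in page_counts.items()}

for name, size in sizes_mib.items():
    print(f"{name:>18}: {size:10.2f} MiB")

print(f"{'all databases':>18}: {sum(sizes_mib.values()):10.2f} MiB")
print(f"{'free pages':>18}: {pages_to_mib(free_pages):10.2f} MiB")
```

Under that assumption postingdb comes out at about 2.93 GiB, in line with what
`balooctl` reports, whereas positiondb comes out at about 4.09 GiB rather than
85.44 MiB.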
