Control: reassign -1 apt

On Fri, Sep 08, 2017 at 02:46:10PM +0200, Enrico Rossi wrote:
> Hi,
>
> Enrico Zini made this python3 code to test the problem outside debtags:
>
> #!/usr/bin/python3
> import apt
>
> def main():
>     cache = apt.Cache()
>     for pkg in cache:
>         cand = pkg.candidate
>         if not cand: continue
>         tags = cand.record.get("Tag", None)
>         if not tags: continue
>         print(pkg.name, tags)
>
> if __name__ == "__main__":
>     main()
>
> Running it without compressed index in apt:
>
> # time python3 cattags >/dev/null
> real    0m1.671s
> user    0m1.539s
> sys     0m0.132s
>
> with compressed index:
>
> # time python3 cattags >/dev/null
> real    1m44.238s
> user    1m33.536s
> sys     0m10.698s
>
> I'm going to re-assign the bug to python-apt, maybe they have more info
> on the issue.
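[For reference: the pkg.candidate.record property used above boils down to an
apt_pkg.PackageRecords.lookup(), which seeks to the record's offset in the
on-disk Packages file. A rough apt_pkg-level equivalent of the script (a
sketch only, not tested against this setup) would look like this:

#!/usr/bin/python3
# Sketch: same output as the apt.Cache() script above, but doing the
# per-package record lookup explicitly via apt_pkg.PackageRecords.
import apt_pkg

def main():
    apt_pkg.init()
    cache = apt_pkg.Cache(None)            # None: no progress output
    depcache = apt_pkg.DepCache(cache)
    records = apt_pkg.PackageRecords(cache)
    for pkg in cache.packages:
        cand = depcache.get_candidate_ver(pkg)
        if cand is None or not cand.file_list:
            continue
        # One lookup per package: this is the seek into the Packages file
        # that gets expensive once the file is stored compressed.
        if not records.lookup(cand.file_list[0]):
            continue
        section = apt_pkg.TagSection(records.record)
        tags = section.find("Tag", "")
        if tags:
            print(pkg.name, tags)

if __name__ == "__main__":
    main()
]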
So the problem really is that random access to lz4 compressed files does
not really exist; instead, when seeking backwards, we seek to the
beginning and then re-read until we reach the target position. When
seeking forward, we simply read and discard the bytes from the current
position to the target.

I'm not sure how many backward seeks are involved here, but if the
slowdown is substantial, probably a lot. I fear that because the tag
file uses its own buffer and reads further than needed in the compressed
stream, essentially *every* jump is a backward read, thus causing a
quadratic amount of bytes, sum(k * average_section_size, k=1...n), to be
decompressed - yes, you read that right, O(n^2).

That's of course inefficient. One way to work around this is to make
TagFile use FileFd's native buffering support somehow, instead of
maintaining its own buffer. Then FileFd could just seek forward in the
buffer and then (if needed) in the file, instead of discarding the
buffer and starting from the beginning again.

-- 
Debian Developer - deb.li/jak | jak-linux.org - free software dev
Ubuntu Core Developer
When replying, only quote what is necessary, and write each reply
directly below the part(s) it pertains to ('inline'). Thank you.
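[To make the cost concrete, here is a toy model of the seek emulation
described above. None of this is actual apt code, and all sizes are
made-up assumptions: the reader visits each section in order, but its
private buffer has always read past the next section start, so every
jump becomes a backward seek.

#!/usr/bin/python3
# Toy model (not apt code) of emulated seeking on a sequential-only
# compressed stream; all sizes below are invented for illustration.

SECTION = 2000         # assumed average size of one package section (bytes)
READAHEAD = 64 * 1024  # assumed amount the reader's private buffer over-reads
N = 60000              # assumed number of sections in the Packages file

def emulated_seek(pos, target, decompressed):
    """Forward seek: decompress and discard (target - pos) bytes.
    Backward seek: restart at offset 0 and decompress target bytes again."""
    if target >= pos:
        return target, decompressed + (target - pos)
    return target, decompressed + target

def total_decompressed():
    pos = 0
    cost = 0
    for k in range(N):
        start = k * SECTION
        # The private buffer has already read past `start`, so every jump
        # to the next section is a backward seek.
        pos, cost = emulated_seek(pos, start, cost)
        pos += READAHEAD
    return cost

if __name__ == "__main__":
    # Roughly SECTION * N**2 / 2 bytes, i.e. O(n^2) in the number of sections.
    print(total_decompressed())

With these made-up numbers the model decompresses about 3.6 TB of data
for a file that is only about 120 MB uncompressed, which is the
quadratic blow-up described above.]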