When migrating from 2.0 to 3.0 I noticed that some operations became much slower, mainly the Splitter tool. With a big-ish file it takes *a lot* more memory/CPU (JDK 8).
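For reference, this is roughly the code path I mean, a minimal sketch (the file name is a placeholder): the document is loaded from an InputStream wrapped in RandomAccessReadBuffer, as the migration guide suggests, and then split.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.List;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.io.RandomAccessReadBuffer;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

public class SplitFromStream
{
    public static void main(String[] args) throws Exception
    {
        // The constructor reads the whole stream into 4KB chunks up front.
        try (InputStream in = new FileInputStream("big.pdf"); // placeholder file name
             PDDocument doc = Loader.loadPDF(new RandomAccessReadBuffer(in)))
        {
            // Splitting appears to trigger createView(..) per page access,
            // which is where the chunk duplication described below shows up.
            List<PDDocument> pages = new Splitter().split(doc);
            for (PDDocument page : pages)
            {
                page.close();
            }
        }
    }
}
```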
I believe the culprit is RandomAccessReadBuffer when used with InputStreams. It fully reads the stream into 4KB chunks up front (not a problem in itself), but every time createView(..) is called (on every PDPage access, I think) it goes through the cloning RandomAccessReadBuffer constructor and all of its ByteBuffer chunks get duplicate()'d, which for bigger files with many pages means *tons* of wasted objects and calls, even though the underlying buffer is the same. Simplifying that, for example by reusing the parent bufferList rather than duplicating it, brings CPU/memory back to the expected levels (I don't know the implications of that, though).

From simple observations the Splitter seems to take about 4x more CPU/heap. For example, with a 100MB file of 300 pages (normal enough if you deal with scanned docs) loaded from an InputStream: 100MB = 25600 chunks of 4KB, times 300 pages = 7,680,000 objects created and GC'd in a short time, at least. With smaller files (a few pages) this isn't very noticeable, nor with RandomAccessReadBufferedFile (which is handled differently). Passing a pre-read byte[] to RandomAccessReadBuffer also works fine (minimal duplication), and RandomAccess.createBuffer(inputStream) in alpha3 was fine too, but that was removed in beta1.

Either way, I don't think the code should be copying/duplicating this much, and it could be restructured, especially since the migration guide hints at using RandomAccessReadBuffer for InputStreams. Also, for RandomAccessReadBuffer I think it would make more sense to read chunks on demand in read() rather than all at once in the constructor (faster metadata querying). Incidentally, it may be useful to increase the default chunk size (or allow users to set it) to reduce fragmentation, since the whole stream gets read anyway and PDFs smaller than 4KB aren't that common, I'd say.

I don't have a publishable example at hand, but this can easily be replicated by using PDFMergerUtility to join the same non-tiny PDF N times and then splitting the result; a sketch of that is below. Thanks.
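In case it helps, a minimal repro sketch of that idea (the file names and N are placeholders, and I'm assuming IOUtils.createMemoryOnlyStreamCache() for the 3.0 mergeDocuments(...) cache argument, so that part may need adjusting):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.io.IOUtils;
import org.apache.pdfbox.io.RandomAccessReadBuffer;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

public class SplitterRepro
{
    public static void main(String[] args) throws Exception
    {
        // 1) Build a big-ish test file by concatenating the same non-tiny PDF N times.
        PDFMergerUtility merger = new PDFMergerUtility();
        for (int i = 0; i < 50; i++) // pick N so the result ends up around 100MB
        {
            merger.addSource(new File("source.pdf")); // any non-tiny PDF
        }
        merger.setDestinationFileName("merged.pdf");
        merger.mergeDocuments(IOUtils.createMemoryOnlyStreamCache()); // assumed 3.0 cache setup

        // 2) Split it again, loading from an InputStream as in the earlier sketch.
        //    Swapping in Loader.loadPDF(new File("merged.pdf")) or a pre-read byte[]
        //    via new RandomAccessReadBuffer(bytes) shows the expected CPU/heap instead.
        try (InputStream in = new FileInputStream("merged.pdf");
             PDDocument doc = Loader.loadPDF(new RandomAccessReadBuffer(in)))
        {
            for (PDDocument page : new Splitter().split(doc))
            {
                page.close();
            }
        }
    }
}
```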

