Zeke – the issue here isn’t specific to deleting annotations; it stems from the 
PDF file format and its support for “incremental updates”.

When changes to a PDF are saved, they can simply be appended to the file as an 
incremental update section (which includes not only new or changed objects, but 
also a list of deleted objects).  The original bytes are left untouched, so a 
deleted object's data stays in the file; it is only marked as no longer in use.  
This is the most common way to save because it is faster, and most PDF 
processing tools do it by default.
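
To make the effect concrete, here is a minimal sketch of an incremental save, 
using a toy byte string in place of a real PDF. The object text is schematic, 
and the function names (makeOriginal, incrementalSave, containsText) are 
hypothetical; none of this is poppler API.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a PDF file: raw bytes only. The syntax is schematic,
// not a valid PDF, and is only meant to show where the data ends up.
std::string makeOriginal() {
    return
        "1 0 obj << /Type /Page /Annots [2 0 R] >> endobj\n"
        "2 0 obj << /Type /Annot /Contents (secret note) >> endobj\n"
        "xref ... trailer ... %%EOF\n";
}

// Incremental save: every existing byte is kept, and a new section is
// appended with the changed page object and a note that object 2 is free.
std::string incrementalSave(const std::string& original) {
    assert(!original.empty()); // an update needs an existing file to append to
    return original +
        "1 0 obj << /Type /Page /Annots [] >> endobj\n"
        "xref (object 2 marked free) ... trailer ... %%EOF\n";
}

// Helper: does the file still contain the given bytes?
bool containsText(const std::string& file, const std::string& needle) {
    return file.find(needle) != std::string::npos;
}
```

Calling containsText(incrementalSave(makeOriginal()), "secret note") returns 
true in this toy model: the page no longer points at the annotation, but its 
bytes are still in the file.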

Alternatively, software could do a “full save” or a “Save As”, where objects no 
longer in use are “garbage collected”.  Poppler does not offer this option.
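
As a rough illustration of what such a garbage-collecting save involves, the 
sketch below walks a toy object graph from the root and keeps only the objects 
that are still reachable. ToyObject, ToyDocument, reachableFrom and fullSave are 
hypothetical names for this example only; a real implementation would also have 
to renumber objects and rebuild the cross-reference table.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy object: some content plus the object numbers it refers to.
struct ToyObject {
    std::string body;
    std::vector<int> refs;
};
using ToyDocument = std::map<int, ToyObject>;

// Mark phase: collect every object reachable from the root.
std::set<int> reachableFrom(const ToyDocument& doc, int root) {
    std::set<int> seen;
    std::vector<int> stack{root};
    while (!stack.empty()) {
        int id = stack.back();
        stack.pop_back();
        auto it = doc.find(id);
        // Skip dangling references and objects already visited.
        if (it == doc.end() || !seen.insert(id).second) {
            continue;
        }
        for (int ref : it->second.refs) {
            stack.push_back(ref);
        }
    }
    return seen;
}

// Sweep phase: write out only the reachable objects.
ToyDocument fullSave(const ToyDocument& doc, int root) {
    assert(doc.count(root) == 1); // the root object must exist
    ToyDocument out;
    for (int id : reachableFrom(doc, root)) {
        out.emplace(id, doc.at(id));
    }
    return out;
}
```

If object 1 is the root and refers to page 2, while a removed annotation 3 is 
no longer referenced by anything, fullSave(doc, 1) returns a document holding 
only objects 1 and 2.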

Leonard

From: poppler <[email protected]> on behalf of Zeke 
Williams <[email protected]>
Date: Friday, September 23, 2022 at 9:20 AM
To: [email protected] <[email protected]>
Subject: [poppler] There is a flaw with poppler that needs to be fixed. Deleted 
annotations are not actually deleted. I require assistance in fixing this.


I need help with this issue in poppler, as I am not a very proficient
C++ programmer. When you delete an annotation in a viewer such as Okular
or Evince, poppler removes the part of the PDF document that displays
the annotation, but the actual contents are stored in a separate part of
the document, and that part doesn't get deleted. In other words, the
contents are still there. That is a privacy violation that should be
fixed. I believe this is the part of poppler that removes the
annotation:

bool Annots::removeAnnot(Annot *annot)
{
    auto idx = std::find(annots.begin(), annots.end(), annot);

    if (idx == annots.end()) {
        return false;
    } else {
        annot->decRefCnt();
        annots.erase(idx);
        return true;
    }
}

And from another PDF reader (PDF4QT) here is how it removes them:

void PDFDocumentBuilder::removeAnnotation(PDFObjectReference page, PDFObjectReference annotation)
{
    PDFDocumentDataLoaderDecorator loader(&m_storage);

    if (const PDFDictionary* pageDictionary = m_storage.getDictionaryFromObject(m_storage.getObjectByReference(page)))
    {
        std::vector<PDFObjectReference> annots = loader.readReferenceArrayFromDictionary(pageDictionary, "Annots");
        annots.erase(std::remove(annots.begin(), annots.end(), annotation), annots.end());

        PDFObjectFactory factory;
        factory.beginDictionary();
        factory.beginDictionaryItem("Annots");
        if (!annots.empty())
        {
            factory << annots;
        }
        else
        {
            factory << PDFObject();
        }
        factory.endDictionaryItem();
        factory.endDictionary();

        mergeTo(page, factory.takeObject());
    }

    setObject(annotation, PDFObject());
}

PDF4QT can be found here: 
https://github.com/JakubMelka/PDF4QT

What can we do to solve this? I think we should mimic how PDF4QT does
it. What do you think?
