Yes, in the case of that "stroke" PDF each line segment has been put into a separate Form Xobject which adds TREMENDOUS overhead to rendering. If you view it in Adobe Acrobat/Reader, you will see that we end up going into "progressive mode" where each line is rendered individually - cool to look at but VERY painful.
I would also like to remind you that Adobe Illustrator is _NOT_ a PDF editor. So opening up PDFs (that were not originally created by Illustrator) and modifying them is not the intent of the product and not recommend by Adobe. Leonard On 1/7/13 5:39 AM, "[email protected]" <[email protected]> wrote: > >On 06 Jan 2013, at 21:01, [email protected] wrote: >> From: Albert Astals Cid <[email protected]> >> Subject: Re: [poppler] patch for a progress callback >> >> El Divendres, 4 de gener de 2013, a les 09:16:25, Damian Stewart va >>escriure: >> Some comments: >> * The cpp facing api seems quite "un-C++-ish", i think that a class >>with a >> pure virtual progressDone() function and a Page::setProgressHandler >>method >> would make it more C++ish > >In the core Poppler source, the existing callbacks follow the same >pattern: > >void display(OutputDev *out, double hDPI, double vDPI, >... >GBool (*abortCheckCbk)(void *data) = NULL, > void *abortCheckCbkData = NULL, > GBool (*annotDisplayDecideCbk)(Annot *annot, void *user_data) > = NULL, > void *annotDisplayDecideCbkData = NULL, > >With this precedent, I would have assumed > > void (*progressCbk)(int pageNum, float progressPct, void* > user_data) >= NULL, > void *progressCbkData = NULL ); > >makes the most sense.. or? > > >> * How useful is this? I.e. i predict that there are operations that are >> really fast and others really slow, does it really help? > >With UX improvements, often 'useful' is not the adjective to use. Try >'less frustrating'. > >Yes, some operations are slow and some are fast, but having a progress >indicator that gradually increases from 0 to 1 over 2 hours is much, much >less frustrating than a blinking cursor for the same 2 hours, especially >if you don't know up front whether it's going to take 1 minute, 20 >minutes or 2 hours. > > >> * The DummyOutputDev should take the real outputdev and return the >>values of >> the real outputdev for the "querying" functions > >Good point, yes, it should probably do that. > > >> * Have you measured the performance hit? > >I have not, however: >0. Any progress indicator naturally involves reduced 'performance' >(because of GUI redraw calls, mutex grab/release, etc). However from a UX >perspective, 10 seconds with no progress indicator is way more >frustrating for the user than 10.1 seconds with a progress indicator. >1. If progressCbk==NULL in Page::displaySlice then there is performance >hit as the progress code is never called. >2. If the page has few operations then the performance hit through >precalculating operation count will be miniscule. >3. If the page has many operations then the time to perform those >operations 'for real' will dwarf the time to count operations probably by >several orders of magnitude. >4. The time taken hit to call the progressCbk per operation will depend >entirely on the third-party's implementation of the progressCbk; ie, it's >not Poppler's responsibility. > > > > >On 06/01/13 18:35, Adrian Johnson <[email protected]> wrote: > >> This is the biggest problem I see with this patch. There are PDFs with >> one huge image. In this case the progress will sit on 0% for a long time >> then jump 100% when rendering is finished. > >Granted. However it's unlikely that a pdf with embedded images will be >rendered at 300DPI, as that would mean that the embedded images would >also need to be 300DPI; and if someone is emailing you a PDF with 300DPI >images you can email them back and say 'please just send me the images'. > >Also: Have you seen how other progress indicators out there function? >This is a universal issue with progress indicators. > >> It also doesn't take into >> account XObjects and patterns which may also take a long time to render >> with no progress available. There are PDFs with a single XObject painted >> in the content stream. All the work is done in the XObject. >> >> If we merge this feature what do we do when users start filing bugs >> about these cases that don't provide useful progress feedback? > >I don't know. I'm also not sure what you're saying. So no-one should have >a progress bar, because the maintainers don't want to deal with users >complaining about the edge cases? > >>> * The DummyOutputDev should take the real outputdev and return the >>>values of >>> the real outputdev for the "querying" functions, e.g. your dummy >>>outpudev >>> returns false for useTilingPatternFill, but SplashOutputDev returns >>>true, this >>> probably means that you never get to 100% on files that have patterns, >>>no? >>> * Have you measured the performance hit? >> >> I would also like to know the performance cost of rendering to >> DummyOutputDev. > >This point was addressed above. > >> Have you got a sample PDF we can test with? It would be interesting to >> see if there is anything that can be done to speed up the rendering. > >https://dl.dropbox.com/u/16106653/pdfToImage-progress-example.zip > >Note: the PDFs in this zip will probably take a very, very long time to >render with standard desktop PDF viewers. The first one >2012_12_28_13_6_22.pdf (generated by Cairo) is reasonably well behaved. >The second one 2012_12_28_13_6_22_0,5stroke-1.pdf (generated by loading >2012_12_28_13_6_22.pdf into Adobe Illustrator and modifying the stroke >width for all of the lines) seems to have multiple embedded streams, but >is typical of the documents I was working with in making this patch. > >After compiling pdfToImage-progress.cpp, try: > ./a.out -i 2012_12_28_13_6_22.pdf -o test.png --width 12000 >(12000 pixels is the image size necessary to produce a ~100cm wide print >at 300DPI) >(This is a direct output from an openFrameworks application that draws >several thousand line segments every frame, running at 60fps for a minute >or two) > >For a poor performance example, try > ./a.out -i 2012_12_28_13_6_22_0,5stroke-1.pdf -o test.png --width 3508 >(3508 pixels wide is a DIN-A4 print at 300DPI) >(This is generated from the same data file as above, after loading it >into Adobe Illustrator, selecting all the lines and setting their stroke >width to 0.5 -- yes, there are better ways to achieve this than via a GUI >application, but this is the use case I am working from.) > > >> I don't think calling the progress callback on every operation is >> efficient or necessary. It should be sufficient to call the callback >> every n operations. I suggest making n = TotalOperations/100. > >Try with the 2012_12_28_13_6_22_0,5stroke-1.pdf example at 300dpi, as in >the paragraph above. With this one, some single operations can take >several seconds on my (2GHz Intel) machine. If n=TotalOperations/100, the >progress bar would stall on this for tens of seconds to a minute, which >is annoying. IMO the implementor of the callback function should be able >to decide how frequently they want to respond to events. > >> Instead of parsing the content twice to get the operation count you >> could use the current position in the content stream to report progress. >> You would have to use the compressed stream position >> (getBaseStream()->getLength()) since the uncompressed length of a stream >> is not stored in the pdf file. > >I initially attempted that. With the 2012_12_28_13_6_22_0,5stroke-1.pdf >from the zip linked above, it seems there are multiple streams embedded >in the PDF, as the stream position pointer keeps on jumping back to zero. >I'm assuming there are multiple streams, which means you'd need to do a >preprocessing step which involves looping through the PDF, adding up all >the stream lengths, and then actually rendering .... which is exactly >what I've done with this patch, but counting operations rather than >stream lengths. > >Cheers >Damian > >-- >damian stewart . interdisciplinary creative engineer . vienna, austria >http://damianstewart.com . twitter @damian0815 > > > > >_______________________________________________ >poppler mailing list >[email protected] >http://lists.freedesktop.org/mailman/listinfo/poppler _______________________________________________ poppler mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/poppler
