On 12/22/2011 09:44 AM, Panks wrote:



    Great, and thanks a lot for sharing your progress. For poppler
    you may like to have a look at
    http://people.freedesktop.org/~aacid/docs/qt4/ and, for an
    implementation using it,
    http://mail.kde.org/pipermail/okular-devel/2011-May/009429.html
    ( http://quickgit.kde.org/index.php?p=okular.git&a=summary ).

    For the initial skeleton, meaning the very first code to start a
    PDF importer with, I could lend a hand to get it done. We could
    start by creating a branch in our git and adding a
    calligra/filters/words/pdfimport directory, then copy over the
    Ascii filter, rename it, adapt the CMakeLists.txt, link against
    libpoppler, and write the first lines of code that use libpoppler
    to extract content from a PDF and write it into an ODT. You can
    ping me on IRC or write a mail to get started on this :-)


Hello Sebastian,

I made a few modifications to the code on my system. I created a new directory pdfimport inside calligra/filters/words, copied the import files, the CMake file and the .desktop file from the ascii directory, and renamed them for pdfimport.
This is my CMakeLists.txt - http://paste.kde.org/176486/
and this is the word_pdf_import.desktop file - http://paste.kde.org/176498/
I added the line
> add_subdirectory( pdfimport )
to the CMakeLists.txt in the calligra/filters/words directory. I then tried building without changing much in pdfimport.cpp and pdfimport.h (their code was the same as in asciiimport.cpp and asciiimport.h). The build was successful, but I didn't see any change in the filters after launching Calligra Words: the 'Open Document' window still did not show PDF documents, and there was no PDF entry in the filter drop-down list. So which changes do I need to make, and in which files, to at least make PDF files visible in the 'Open Document' dialog and have it accept them?


That all looks correct. Did you run "kbuildsycoca4" so that the new desktop file is properly picked up?

Back then it was also necessary to define the proper library name in PdfImport.cpp. So something like:

K_PLUGIN_FACTORY(PdfImportFactory, registerPlugin<PdfImport>();)
K_EXPORT_PLUGIN(PdfImportFactory("wordspdfimport", "calligrafilters"))

Not sure if that is still needed, but it certainly cannot hurt.

And, second thing: I was going through the code of asciiimport.cpp. There, the input file is passed to a QTextStream object and the appropriate codec is set on it:
    QTextStream stream(&in);
    stream.setCodec(codec);

After that, the lines are read into a QString and appended to the document:

    QString line = stream.readLine();
    bodyWriter->addTextSpan(line);


whereas with poppler there is no such straightforward way to get the text line by line, I think.

Correct. Text files are simple compared to PDF files. The latter can have formatting (bold, italic, underline, different font sizes, font colors, etc.) and even images. Our goal would be to carry all of that over, but step by step. We can start with simple things like the plain text and some basic formatting, and later move on to e.g. images.

One method I could think of was to go through the PDF pages one by one and use the
QString text(const QRectF &rect, TextLayout)
function to get the text within a rectangle into a QString. But what value of rect should I pass to the function in that case, and what other methods can I use to fetch the text out of a PDF using poppler? Please give some suggestions.


It looks as if poppler Qt is not enough for us to do anything more than extract the plain text :-(
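
For the plain-text case, here is a rough sketch of how lines could be rebuilt from positioned words. As far as I remember, the poppler Qt4 binding can report words with their bounding boxes (Poppler::Page::textList()), and its documentation describes a null QRectF passed to text() as meaning "all text on the page"; please double-check both against the docs linked above. The code below does not call poppler at all: WordBox and groupIntoLines are hypothetical stand-ins, and the "half a box height" tolerance is just an assumption that would need tuning on real documents.

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical stand-in for one positioned word as a PDF library might
// report it; this is NOT a poppler type.
struct WordBox {
    double x;      // left edge, in points
    double y;      // top edge, in points
    double height; // box height, in points
    std::string text;
};

// Vertical centre of a word box.
static double centreY(const WordBox &w)
{
    return w.y + 0.5 * w.height;
}

// Group words into text lines. A word joins the current line when its
// vertical centre is within half a box height of the line's first word
// (an assumed tolerance); otherwise it starts a new line.
std::vector<std::string> groupIntoLines(std::vector<WordBox> words)
{
    // Order the words from top to bottom.
    std::sort(words.begin(), words.end(),
              [](const WordBox &a, const WordBox &b) {
                  return centreY(a) < centreY(b);
              });

    std::vector<std::vector<WordBox> > lines;
    for (const WordBox &w : words) {
        if (!lines.empty()) {
            const WordBox &ref = lines.back().front();
            if (std::fabs(centreY(w) - centreY(ref)) <= 0.5 * ref.height) {
                lines.back().push_back(w);
                continue;
            }
        }
        lines.push_back(std::vector<WordBox>(1, w));
    }

    // Within each line, order the words left to right and join them
    // with single spaces.
    std::vector<std::string> result;
    for (std::vector<WordBox> &line : lines) {
        std::sort(line.begin(), line.end(),
                  [](const WordBox &a, const WordBox &b) { return a.x < b.x; });
        std::string joined;
        for (const WordBox &w : line) {
            if (!joined.empty())
                joined += ' ';
            joined += w.text;
        }
        result.push_back(joined);
    }
    return result;
}
```

Each resulting string could then be handed to something like bodyWriter->addTextSpan(line), the same way the Ascii filter does. Multi-column layouts and rotated text would need more care than this.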

What we would ideally like to have is something like http://cgit.freedesktop.org/poppler/poppler/tree/poppler/TextOutputDev.h . That is, our own OutputDev that, unlike the ArthurOutputDev, does not render by drawing with a QPainter but produces proper ODF instead.

poppler ships with http://cgit.freedesktop.org/poppler/poppler/tree/utils/ , which is a nice showcase of how to output to an HTML file. I guess that's a good starting point. We could first investigate what would be needed to create our own OdtOutputDevice and then just create it :-)
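
To give an idea of what the ODF-producing side of such an output device might do, here is a minimal sketch that turns formatted text runs into one ODF paragraph. It deliberately uses neither poppler's OutputDev interface nor the Calligra writer classes: TextRun, styleNameFor and paragraphToOdf are hypothetical names made up for illustration, and the style names "Bold", "Italic" and "BoldItalic" are assumptions. A real filter would also have to write the matching style definitions.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one run of text with uniform formatting,
// as an output device might collect it while walking a page.
struct TextRun {
    std::string text;
    bool bold;
    bool italic;
};

// Escape the characters that are special in XML character data.
std::string escapeXml(const std::string &in)
{
    std::string out;
    for (char c : in) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        default:  out += c; break;
        }
    }
    return out;
}

// Map a run's formatting to an assumed style name; an empty string
// means the run needs no <text:span> wrapper.
std::string styleNameFor(const TextRun &run)
{
    if (run.bold && run.italic) return "BoldItalic";
    if (run.bold) return "Bold";
    if (run.italic) return "Italic";
    return "";
}

// Serialize the runs of one paragraph as an ODF <text:p> element,
// wrapping formatted runs in <text:span text:style-name="...">.
std::string paragraphToOdf(const std::vector<TextRun> &runs)
{
    std::string xml = "<text:p>";
    for (const TextRun &run : runs) {
        const std::string style = styleNameFor(run);
        if (style.empty()) {
            xml += escapeXml(run.text);
        } else {
            xml += "<text:span text:style-name=\"" + style + "\">";
            xml += escapeXml(run.text);
            xml += "</text:span>";
        }
    }
    xml += "</text:p>";
    return xml;
}
```

For the two runs "Tom & " (plain) and "Jerry" (bold) this yields <text:p>Tom &amp; <text:span text:style-name="Bold">Jerry</text:span></text:p>. Starting from text runs keeps the first version small; images and page layout could be added later in the same step-by-step way.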

May I suggest committing early and often? It would really rock if you could create a branch for our work and commit what you have so far (it doesn't need to compile or work) with something like:

# create the branch from master
git checkout -b filter-words-pdfimport-panks master
# add your new filter
git add filters/words/pdfimport
# commit everything
git commit -a
# push the branch upstream (assuming the remote is named "origin")
git push -u origin filter-words-pdfimport-panks

Hope the above steps work. git is rather tricky sometimes, if not all the time :-/

_______________________________________________
calligra-devel mailing list
calligra-devel@kde.org
https://mail.kde.org/mailman/listinfo/calligra-devel
