On 12/22/2011 09:44 AM, Panks wrote:
Great, and many thanks for sharing your progress. For poppler
you may like to have a look at
http://people.freedesktop.org/~aacid/docs/qt4/
and, for an implementation that uses it,
http://mail.kde.org/pipermail/okular-devel/2011-May/009429.html (
http://quickgit.kde.org/index.php?p=okular.git&a=summary ).
For the initial skeleton, meaning the very first code to start a
PDF importer with, I could lend a hand to get it done. We could
start by creating a branch in our git and adding a
calligra/filters/words/pdfimport directory, then copy over the
Ascii filter, rename it, adapt the CMakeLists.txt, link against
libpoppler and write the first lines of code that use libpoppler
to extract content from a PDF and write it into an ODT. You can
ping me on IRC or write a mail to get started on this :-)
Hello Sebastian,
I made a few modifications to the code on my system. I created a new
directory pdfimport inside calligra/filters/words. I copied the import
files, the CMake file and the .desktop file from the ascii directory
and renamed them for pdfimport.
This is my CMakeLists.txt - http://paste.kde.org/176486/
and this is the word_pdf_import.desktop file - http://paste.kde.org/176498/
I added the line
> add_subdirectory( pdfimport )
to the CMakeLists.txt in the calligra/filters/words directory. I then
tried building the code without changing much in pdfimport.cpp and
pdfimport.h (the code in them was the same as in asciiimport.cpp and
asciiimport.h). The build was successful, but I didn't see any change
in the filters after launching calligrawords: the 'Open Document'
window still did not show PDF documents, and there was no PDF entry in
the filter drop-down list. So what changes do I need to make, and in
which files, to at least make PDF files visible in the 'Open
Document' dialog and have it accept them?
That all looks correct. Did you run "kbuildsycoca4" so the new desktop
file is properly picked up?
Back then it was also necessary to define the proper libname in
PdfImport.cpp, so something like:
K_PLUGIN_FACTORY(PdfImportFactory, registerPlugin<PdfImport>();)
K_EXPORT_PLUGIN(PdfImportFactory("wordspdfimport", "calligrafilters"))
Not sure if that is still needed, but it certainly cannot do any harm.
And, second thing, I was going through the code of asciiimport.cpp. In
that code the input file is passed to a QTextStream object and the
appropriate codec is set on that object.
QTextStream stream(&in);
stream.setCodec(codec);
After that, the lines are read into a QString one by one and appended
to the document:
QString line = stream.readLine();
bodyWriter->addTextSpan(line);
With poppler, I think, there is no such straightforward option to get
the text line by line.
Correct. Text files are simple compared to PDF files. The latter can
have formatting (bold, italic, underline, different font sizes, font
colors, and so on) and even images. Our target would be to carry all of
that over, but step by step. We can start with simple things like the
pure text and some basic formatting and later move on to e.g. images.
One method I could think of was to go through the PDF pages one by one
and use the
QString text(const QRectF &rect, TextLayout)
function to get the text within a rectangle into a QString. But in
this case, what value of rect should I pass to the function? And apart
from this, what other methods can I use to fetch the text out of a PDF
using poppler? Please give some suggestions.
It looks as if poppler Qt is not enough for us to do anything more
than extracting the pure plain text :-(
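To illustrate the plain-text step only: a minimal sketch, assuming the
text of one page has already been obtained as a single string with
embedded line breaks. The helper name splitIntoParagraphs and the use of
std::string are assumptions made for this example; the real filter would
work with QString and the body writer.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split the plain text of one page into lines,
// each of which could then be written out as its own paragraph.
std::vector<std::string> splitIntoParagraphs(const std::string &pageText)
{
    std::vector<std::string> paragraphs;
    std::istringstream stream(pageText);
    std::string line;
    while (std::getline(stream, line)) {
        // Strip a trailing carriage return left over from CRLF line ends.
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        // Skip empty lines so they do not become empty paragraphs.
        if (!line.empty())
            paragraphs.push_back(line);
    }
    return paragraphs;
}
```

Each returned line could then be handed to something like the
addTextSpan() call used in the Ascii filter.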
What we would ideally like to have is something like
http://cgit.freedesktop.org/poppler/poppler/tree/poppler/TextOutputDev.h
. That is, our own OutputDev which, unlike the ArthurOutputDev, does not
render by drawing with a QPainter but produces proper ODF instead.
poppler ships with
http://cgit.freedesktop.org/poppler/poppler/tree/utils/ which is a nice
showcase of how to output to an HTML file. I guess that's a good
starting point. We could first investigate what would be needed to
create our own OdtOutputDevice and then just create it :-)
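To make the idea a bit more concrete, here is a rough sketch of the
output side only. It is not poppler's OutputDev interface: the class
name OdtTextCollector, its methods and the style name "Bold" are all
hypothetical, and a real device would receive its text and font
information from poppler's callbacks.

```cpp
#include <string>

// Hypothetical collector: receives text runs and builds ODF-style
// paragraph markup instead of drawing anything.
class OdtTextCollector
{
public:
    void beginParagraph() { m_xml += "<text:p>"; }
    void endParagraph() { m_xml += "</text:p>"; }

    // Append one run of text; bold runs are wrapped in a span that
    // refers to an assumed automatic style named "Bold".
    void addText(const std::string &text, bool bold)
    {
        if (bold)
            m_xml += "<text:span text:style-name=\"Bold\">" + escape(text) + "</text:span>";
        else
            m_xml += escape(text);
    }

    const std::string &xml() const { return m_xml; }

private:
    // Escape the characters that are not allowed verbatim in XML text.
    static std::string escape(const std::string &in)
    {
        std::string out;
        for (char c : in) {
            switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            default: out += c; break;
            }
        }
        return out;
    }

    std::string m_xml;
};
```

The point of the design is that the device only records what it is told
about the text, so more formatting (italic, font size, color) could be
added run by run later on.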
May I suggest committing early and often? It would really rock if you
could create a branch for our work and commit what you have so far
(it doesn't need to compile or work) with something like:
# create a branch based on master
git checkout -b filter-words-pdfimport-panks master
# add your new filter
git add filters/words/pdfimport
# commit everything
git commit -a
# and push the branch upstream (assuming the remote is named "origin")
git push -u origin filter-words-pdfimport-panks
Hope the above steps work. git is rather tricky sometimes, if not all
the time :-/