On Thu, Dec 22, 2011 at 10:04 PM, Sebastian Sauer <m...@dipe.org> wrote:
> On 12/22/2011 09:44 AM, Panks wrote:
> > > Very great. Many thanks for sharing your progress. For poppler you
> > > may like to have a look at
> > > http://people.freedesktop.org/~aacid/docs/qt4/ and for
> > > implementations using it
> > > http://mail.kde.org/pipermail/okular-devel/2011-May/009429.html
> > > ( http://quickgit.kde.org/index.php?p=okular.git&a=summary ).
> > >
> > > For the initial skeleton, meaning the very first code to start a
> > > PDF importer with, I could lend a hand to get it done. We could
> > > start by creating a branch in our git and adding a
> > > calligra/filters/words/pdfimport directory, then copy over the
> > > ASCII filter, rename it, adapt the CMakeLists.txt, link against
> > > libpoppler, and write the first lines of code that use libpoppler
> > > to extract content from a PDF and write it into an ODT. You can
> > > ping me on IRC or write a mail to get started on this :-)
> >
> > Hello Sebastian,
> >
> > I made a few modifications to the code on my system. I created a
> > new directory pdfimport inside calligra/filters/words. I copied the
> > import files, the CMake file and the .desktop file from the ascii
> > directory and renamed them to pdfimport.
> > This is my CMakeLists.txt - http://paste.kde.org/176486/
> > and this is the word_pdf_import.desktop file -
> > http://paste.kde.org/176498/
> > I added the line
> >
> >     add_subdirectory( pdfimport )
> >
> > to the CMakeLists.txt in the calligra/filters/words directory. I
> > tried building the code after this without modifying pdfimport.cpp
> > and pdfimport.h much (the code in them was the same as in
> > asciiimport.cpp and asciiimport.h). The build was successful, but I
> > didn't see any change in the filters after launching calligrawords.
> > I mean the 'Open Document' window still wasn't showing PDF
> > documents, nor was there any PDF entry in the filter drop-down
> > list. So which changes do I need to make, and in which files, to at
> > least make PDF files visible in the 'Open Document' dialog and have
> > it accept them?
>
> Looks all correct.
> Did you run "kbuildsycoca4" so the new desktop file is properly
> picked up?
>
> Back then it was also necessary to define the proper libname in
> PdfImport.cpp. So something like:
>
>     K_PLUGIN_FACTORY(PdfImportFactory, registerPlugin<PdfImport>();)
>     K_EXPORT_PLUGIN(PdfImportFactory("wordspdfimport", "calligrafilters"))
>
> Not sure if that is needed any longer, but it certainly cannot hurt.
>
> > And, second thing: I was going through the code of asciiimport.cpp.
> > In that code the input file is passed to a QTextStream object and
> > the appropriate codec is set on it:
> >
> >     QTextStream stream(&in);
> >     stream.setCodec(codec);
> >
> > After that, the lines are appended to the document via a QString:
> >
> >     QString line = stream.readLine();
> >     bodyWriter->addTextSpan(line);
> >
> > whereas with poppler there is no such straightforward way to get
> > the text line by line, I think.
>
> Correct. Text files are simple compared to PDF files. The latter can
> have formatting (bold, italic, underline, different font sizes, font
> colors, etc.) and even images. Our target would be to take all that
> over, but step by step. We can start with simple things like the pure
> text and some basic formatting and later go on to e.g. images.
>
> > One method I could think of was to go through the PDF pages one by
> > one and use the
> >
> >     QString text(const QRectF &rect, TextLayout)
> >
> > function to get the text within a rectangle into a QString. But in
> > this case, what value of rect should I pass to the function? And
> > apart from this, what other method can I use to fetch the text out
> > of a PDF using poppler? Please give some suggestions.
>
> It looks as if poppler Qt is not enough for us to do anything more
> than extract the pure plain text :-(
>
> What we would ideally like to have is something like
> http://cgit.freedesktop.org/poppler/poppler/tree/poppler/TextOutputDev.h.
> So our own OutputDev that, unlike the ArthurOutputDev, does not render
> by drawing with a QPainter but produces proper ODF instead.
>
> poppler ships with
> http://cgit.freedesktop.org/poppler/poppler/tree/utils/ which is a
> nice showcase of how to output to an HTML file. I guess that's a good
> starting point. We could first investigate what would be needed to
> create our own OdtOutputDevice and then just create it :-)
>
> May I suggest committing early and often. That means it would really
> rock if you could create a branch for our work and commit what you
> have so far (it doesn't need to compile or work) with something like:
>
>     # create branch
>     git checkout master -b filter-words-pdfimport-panks
>     # add your new filter
>     git add filters/words/pdfimport
>     # commit everything
>     git commit -a
>     # and push the branch upstream
>     git push
>
> Hope the above steps work. git is rather tricky sometimes, if not all
> the time :-/

Hello Sebastian :-)

Sorry for the late reply. College reopens next week, so I have a few
assignments to deal with this week. Anyway, I got the skeleton working:
it now shows PDF files in the 'Open Document' window, and I have pushed
it to KDE git too. I went through the OutputDev file once, roughly.
Could you please give me some hints on what I should work on next?

Thank you,
Pankaj
UG Student | Dept. of Computer Science and Engineering
IIT Madras, Chennai, India
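
The branch steps quoted above can be tried out safely against a throwaway repository before touching the real one. The following is only a minimal sketch under stated assumptions: a local bare repository stands in for the KDE remote, and the scratch paths, the remote name `origin`, the commit identity and the placeholder file contents are all illustrative, not Calligra's actual setup. It also names the branch explicitly on push with `-u`, because a plain `git push` may not publish a newly created branch, depending on the `push.default` setting.

```shell
#!/bin/sh
# Sketch only: a throwaway bare repository stands in for the real remote.
set -e

work=$(mktemp -d)                        # scratch area (hypothetical layout)
git init -q --bare "$work/remote.git"    # stand-in for the upstream repository
git init -q "$work/calligra"
cd "$work/calligra"
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is "master"
git config user.name "Example User"      # placeholder identity for the test commits
git config user.email "user@example.org"
git remote add origin "$work/remote.git"

# An initial commit so that master exists before branching off it.
echo "placeholder" > README
git add README
git commit -q -m "Initial commit"

# Create the work branch from master.
git checkout -q -b filter-words-pdfimport-panks master

# Add the new filter directory; the file content is a placeholder.
mkdir -p filters/words/pdfimport
echo "# placeholder" > filters/words/pdfimport/CMakeLists.txt
git add filters/words/pdfimport
git commit -q -m "Add pdfimport filter skeleton"

# Publish the branch and record origin as its upstream.
git push -q -u origin filter-words-pdfimport-panks

# Confirm that the branch now exists on the remote.
git ls-remote --heads origin filter-words-pdfimport-panks
```

Once `-u` has recorded the upstream, later plain `git push` and `git pull` calls on that branch know where to go.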
_______________________________________________
calligra-devel mailing list
calligra-devel@kde.org
https://mail.kde.org/mailman/listinfo/calligra-devel