date:20120912

let us work together

2012-09-12 Thread Laura Matthews

Hi,

My name is Laura Matthews and I would really love to tell you how blogs.kde.org
can rank even better in Google.

I'm an SEO expert working at SEO Persona, and while doing
research for some of my colleagues I found your email address and
decided to contact you immediately.

If you are interested I will be happy to send the additional information
and all the details needed to make it happen.

Thanks a lot,

Laura
SEOpersona.net

___
calligra-devel mailing list
calligra-devel@kde.org
https://mail.kde.org/mailman/listinfo/calligra-devel

Extracting plain text and meta data

2012-09-12 Thread Vishesh Handa

Hey everyone

I'm currently working on improving the KDE file indexing infrastructure.

One of the areas where we are lacking is proper support for Open Document
Formats and Microsoft Document Formats. It occurred to me that maybe I
could use the Calligra libraries to do so. I even looked at the code base
(a little bit), and extracting the basic metadata is really simple
(KoDocumentInfo).

I also looked at the Calligra Converter code, which seems to be using a
print job to convert the formats. It can convert the file to a PDF, which I
can then easily parse, but that seems like a bit too much effort. Not to
mention that it's probably very slow.

So my question is - Is it possible to use Calligra to quickly extract the
plain text from the file?

Also, what kind of dependencies am I looking at? Just calligra-libs or
something else?

-- 
Vishesh Handa

PS: Please keep me CC'd. I'm not on the mailing list.

Re: Extracting plain text and meta data

2012-09-12 Thread Inge Wallin

On Wednesday, September 12, 2012 16:55:10 Vishesh Handa wrote:
> Hey everyone
> 
> I'm currently working on improving KDE File Indexing infrastructure.
> 
> One of the areas where we are lacking is proper support for Open Document
> Formats and Microsoft Document Formats. It occurred to me that maybe I
> could use the calligra libraries to do so. I even looked at the code base (
> a little bit ) and extracting the basic metadata is really simple
> (KoDocumentInfo).
> 
> I also looked at the Calligra Converter code, which seems to be using a
> print job to convert the formats. It can convert the file to a pdf, which I
> can then easily parse, but that seems like a bit too much effort. Not to
> mention that it's probably very slow.
> 
> So my question is - Is it possible to use Calligra to quickly extract the
> plain text from the file?
> 
> Also, what kind of dependencies am I looking at? Just calligra-libs or
> something else?

The short answer would be to look at the EPUB export filter. It uses a small
set of functions to traverse the XML tree of an ODT file. Right now it uses
this information to create an XHTML output file. It should be very easy to
create a plain text file instead.
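
To illustrate the traversal idea outside of Calligra, here is a minimal
sketch in Python. It is not the EPUB filter's code and does not use any
Calligra API; it only assumes that an ODT file is a zip archive whose
content.xml holds the document body, and it walks that XML tree collecting
text from paragraphs and headings. The names odt_to_text and
build_sample_odt are hypothetical helpers made up for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument namespaces used in content.xml
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

P = f"{{{TEXT_NS}}}p"
H = f"{{{TEXT_NS}}}h"
S = f"{{{TEXT_NS}}}s"
TAB = f"{{{TEXT_NS}}}tab"
LINE_BREAK = f"{{{TEXT_NS}}}line-break"
BLOCK_TAGS = {P, H}  # each of these ends with a newline in the output


def _collect(elem, out, in_block=False):
    """Depth-first walk that appends the text found under elem to out."""
    tag = elem.tag
    in_block = in_block or tag in BLOCK_TAGS
    if tag == S:
        # text:s stands for a run of spaces; text:c is the count (default 1)
        out.append(" " * int(elem.get(f"{{{TEXT_NS}}}c", "1")))
    elif tag == TAB:
        out.append("\t")
    elif tag == LINE_BREAK:
        out.append("\n")
    # Text outside paragraphs/headings is only indentation, so skip it.
    if in_block and elem.text:
        out.append(elem.text)
    for child in elem:
        _collect(child, out, in_block)
        if in_block and child.tail:
            out.append(child.tail)
    if tag in BLOCK_TAGS:
        out.append("\n")


def odt_to_text(source):
    """Return the plain text of an ODT given as a path or a file object."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    body = root.find(f"{{{OFFICE_NS}}}body")
    out = []
    if body is not None:
        _collect(body, out)
    return "".join(out)


def build_sample_odt():
    """Hypothetical helper: an in-memory archive with only content.xml.

    This is enough for odt_to_text, but it is not a complete ODT file.
    """
    content = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<office:document-content xmlns:office="{OFFICE_NS}" '
        f'xmlns:text="{TEXT_NS}">'
        "<office:body><office:text>\n"
        "<text:h>Title</text:h>\n"
        '<text:p>Hello <text:span>bold</text:span> world'
        '<text:s text:c="2"/>!</text:p>\n'
        "</office:text></office:body>"
        "</office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", content)
    buffer.seek(0)
    return buffer
```

Calling odt_to_text(build_sample_odt()) returns the heading and the
paragraph as two lines. The sketch ignores tables, lists, notes and styles,
and it does not handle the Microsoft formats at all, so an indexer would
still need something like Calligra's import filters for those.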
