Hi Chris, BTW I should have said in my mail to Peter, sorry for taking ages to get back. These emails come through in batches, so sometimes it can be days if the lists are quiet... which they are. Anyway,
On Wed, Jul 3, 2013 at 12:34 AM, <[email protected]> wrote:

> What about integrating Any23 into Tika -- which has a PDF parser,
> etc.? I'd be happy to try and help out wherever I can.

Yeah, I suppose this is the next logical step, Chris. The problem I see here, though, is with trivial structured content such as schemas, namespaces, etc., which I might add are completely useless for my purpose; I have a feeling that I am kinda beating my head against a wall. Any23 extracts structured markup such as DC, LKIFCore, hListings, etc. None of this structure is or will be available within my PDFs. This creates a problem for me: it means that I cannot use most of the built-in extraction implementations from Any23, which leaves me to code the stuff myself...

Thanks for chiming in on this one.
