If you are on Windows, try the Microsoft IFilter API. It supports current Office versions.
http://www.microsoft.com/downloads/details.aspx?FamilyId=60C92A37-719C-4077-B5C6-CAC34F4227CC&displaylang=en
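
If you are not on Windows, a rough alternative for .docx only: the file is a ZIP archive whose main text normally lives in word/document.xml, so the text can be pulled out with a few lines of Python. This is not the IFilter route and not what Tika does, just a minimal sketch using the standard library. The function name extract_docx_text is made up for illustration, and it assumes the main document part sits at the conventional path; it ignores headers, footers, footnotes and tables-specific layout.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(source):
    """Return the non-empty paragraphs of a .docx file as a list of strings.

    `source` is a file path or a binary file-like object.
    Assumption: the main part is stored at the conventional
    location "word/document.xml".
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's visible text is split across <w:t> runs; join them.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text:
            paragraphs.append(text)
    return paragraphs
```

The returned paragraphs can be joined and sent to the indexer as an ordinary text field. .xlsx needs different handling (shared strings and sheet parts), so this only covers the Word case.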

On Tue, Apr 27, 2010 at 6:08 AM, Roland Villemoes <r...@alpha-solutions.dk> wrote:
> Hi All,
>
> Does anyone have a running solution for indexing Microsoft Office documents,
> e.g. .docx, .xlsx, etc.?
>
> I can see a lot of examples using Tika for rich content extraction, but still
> nothing when it comes to newer versions of Microsoft Office.
> What libraries should I use if not Tika?
>
> Best regards
>
> Roland Villemoes
> Tel: (+45) 22 69 59 62
> E-Mail: mailto:r...@alpha-solutions.dk
>
> Alpha Solutions A/S
> Borgergade 2, 3.sal, 1300 København K
> Tel: (+45) 70 20 65 38
> Web: http://www.alpha-solutions.dk