Re: Extract footer/header text out of Word docs

Otis Gospodnetic Thu, 30 Aug 2012 06:28:20 -0700

Hi Alex,

I think you may get better help on the Tika mailing list - Solr uses Tika to
parse rich text docs and extract text from them. I don't know if Tika can
tell which text comes from a header and which from a footer...
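For .docx files (the zip-based Office Open XML format, not the legacy binary .doc format), here is a minimal sketch of one way to do the split outside of Tika. It relies on the fact that headers and footers are stored in their own package parts (word/header1.xml, word/footer1.xml, ...), separate from the body in word/document.xml. Reading each part on its own therefore keeps the three kinds of text apart. The sketch uses only the JDK, not Tika or Solr's extracting handler. The class name, the body/header/footer bucket names, and the fakeDocx/part demo helpers are hypothetical. It also ignores footnotes, endnotes, and comments.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;

public class DocxHeaderFooter {
    // Main WordprocessingML namespace; <w:p> is a paragraph, <w:t> holds run text.
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Maps a package part name to a bucket, or null if the part is ignored. */
    static String bucketFor(String partName) {
        if (partName.equals("word/document.xml")) return "body";
        if (partName.matches("word/header\\d*\\.xml")) return "header";
        if (partName.matches("word/footer\\d*\\.xml")) return "footer";
        return null;
    }

    /** Reads a .docx stream and returns body, header and footer text separately. */
    public static Map<String, String> extract(InputStream docx) throws Exception {
        Map<String, StringBuilder> text = new LinkedHashMap<>();
        for (String b : new String[] {"body", "header", "footer"}) text.put(b, new StringBuilder());
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for getNamespaceURI()/getLocalName()
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String bucket = bucketFor(entry.getName());
                if (bucket == null) continue;
                // readAllBytes() stops at the end of the current zip entry (Java 9+).
                byte[] xml = zip.readAllBytes();
                Node root = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
                appendText(root, text.get(bucket));
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        text.forEach((k, v) -> result.put(k, v.toString().trim()));
        return result;
    }

    /** Depth-first walk: concatenates <w:t> text, adds a newline after each <w:p>. */
    static void appendText(Node n, StringBuilder out) {
        boolean isW = W_NS.equals(n.getNamespaceURI());
        if (isW && "t".equals(n.getLocalName())) {
            out.append(n.getTextContent());
            return;
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) appendText(c, out);
        if (isW && "p".equals(n.getLocalName())) out.append('\n');
    }

    /** Demo helper (hypothetical): builds a tiny in-memory package from part name -> XML. */
    static byte[] fakeDocx(Map<String, String> parts) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bos)) {
            for (Map.Entry<String, String> e : parts.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bos.toByteArray();
    }

    /** Demo helper (hypothetical): one-paragraph part with the given root element. */
    static String part(String root, String text) {
        return "<w:" + root + " xmlns:w=\"" + W_NS + "\"><w:p><w:r><w:t>" + text
            + "</w:t></w:r></w:p></w:" + root + ">";
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("word/document.xml", part("document", "Body text"));
        parts.put("word/header1.xml", part("hdr", "Confidential"));
        parts.put("word/footer1.xml", part("ftr", "Page footer"));
        System.out.println(extract(new ByteArrayInputStream(fakeDocx(parts))));
    }
}
```

Running main prints {body=Body text, header=Confidential, footer=Page footer}. Each bucket could then be posted to its own Solr field. Real documents would go through extract() directly, for example via a FileInputStream.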


Otis 
----
Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 



----- Original Message -----
> From: Alex Cougarman <acoug...@bwc.org>
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Cc: 
> Sent: Thursday, August 30, 2012 9:25 AM
> Subject: Extract footer/header text out of Word docs
> 
> Hi. Is it possible to specifically extract footer/header and body text out of a
> Word document using Solr? In other words, we'd like to index/store those
> items in different Solr fields.
> 
> Also, is it possible to search on specific styles within a Word document? Can 
> these attributes be indexed? Thanks.
> 
> Sincerely,
> Alex
>
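On the styles question, here is a similar JDK-only sketch for .docx, again bypassing Tika. In word/document.xml, a paragraph may carry its style id in <w:pPr><w:pStyle w:val="..."/>. Text can therefore be grouped per style and, for example, indexed into one field per style.

The sketch rests on a few assumptions:
- The class and method names are hypothetical.
- So is the "(default)" label for paragraphs with no explicit style.
- It only looks at paragraph styles, not character (run) styles.
- Paragraphs nested inside text boxes may be counted twice, because getElementsByTagNameNS returns all descendants.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DocxStyleText {
    // Main WordprocessingML namespace.
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Finds word/document.xml inside a .docx stream and groups its text by style id. */
    public static Map<String, List<String>> byStyle(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    return byStyleFromXml(zip.readAllBytes());
                }
            }
        }
        throw new IllegalArgumentException("no word/document.xml part found");
    }

    /** Groups non-empty paragraph text by paragraph style id, in document order. */
    static Map<String, List<String>> byStyleFromXml(byte[] documentXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(documentXml));
        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList paras = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paras.getLength(); i++) {
            Element p = (Element) paras.item(i);
            // Concatenate all <w:t> runs; Word often splits one word across runs.
            StringBuilder sb = new StringBuilder();
            NodeList runs = p.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) sb.append(runs.item(j).getTextContent());
            if (sb.length() == 0) continue;
            result.computeIfAbsent(styleId(p), k -> new ArrayList<>()).add(sb.toString());
        }
        return result;
    }

    /** Returns the w:val of <w:pPr><w:pStyle>, or "(default)" when absent. */
    static String styleId(Element p) {
        for (Node c = p.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (W_NS.equals(c.getNamespaceURI()) && "pPr".equals(c.getLocalName())) {
                for (Node s = c.getFirstChild(); s != null; s = s.getNextSibling()) {
                    if (W_NS.equals(s.getNamespaceURI()) && "pStyle".equals(s.getLocalName())) {
                        return ((Element) s).getAttributeNS(W_NS, "val");
                    }
                }
            }
        }
        return "(default)";
    }
}
```

Searching on a style would then be a query against that style's field. Whether one field per style is practical depends on how many distinct styles the documents use.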
