Filip Bednárik created TIKA-1315:
------------------------------------
Summary: Basic list support in WordExtractor
Key: TIKA-1315
URL: https://issues.apache.org/jira/browse/TIKA-1315
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.6
Reporter: Filip Bednárik
Priority: Minor
Fix For: 1.6
Attachments: ListUtils.java, WordExtractor.java, WordParserTest.java
Hello guys, I am really sorry to post an issue like this, but I have no other
way of contacting you, and I don't quite understand how you manage forks and
pull requests (I don't think you do that).
In my project I needed Tika to parse numbered lists from Word .doc
documents, but Tika doesn't support that. So I looked for a solution and found
one here:
http://developerhints.blog.com/2010/08/28/finding-out-list-numbers-in-word-document-using-poi-hwpf/
I adapted this solution to Apache Tika with a few fixes and improvements.
Anyway, feel free to use any of it so that it can help people who struggle with
lists in Tika like I did.
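To give a rough idea of the kind of bookkeeping that numbered-list support
needs, here is a minimal sketch. It is not the attached ListUtils; the class
name ListNumberTracker and its method are hypothetical, and it assumes the
caller already knows each paragraph's list id and nesting level (for example
from the HWPF paragraph properties). It always produces decimal labels and
ignores custom number formats and start values.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not the attached ListUtils. */
final class ListNumberTracker {
    // Assumption: at most 9 nesting levels per list.
    private static final int MAX_LEVELS = 9;

    // list id -> current counter for each nesting level
    private final Map<Integer, int[]> counters = new HashMap<Integer, int[]>();

    /** Returns a label such as "2.1." for the next item of the given list and level. */
    String nextLabel(int listId, int level) {
        if (level < 0 || level >= MAX_LEVELS) {
            throw new IllegalArgumentException("level out of range: " + level);
        }
        int[] c = counters.get(listId);
        if (c == null) {
            c = new int[MAX_LEVELS];
            counters.put(listId, c);
        }
        c[level]++;
        // A new item at this level restarts the numbering of all deeper levels.
        for (int i = level + 1; i < MAX_LEVELS; i++) {
            c[i] = 0;
        }
        // Build the label from the counters of this level and all its parents.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i <= level; i++) {
            sb.append(c[i]).append('.');
        }
        return sb.toString();
    }
}
```

An extractor could call nextLabel(listId, level) once per list paragraph and
put the returned label in front of the paragraph text.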
Attached files are:
WordParserTest.java (updated test)
WordExtractor.java (fixed WordExtractor)
ListUtils.java (added ListUtils)