On 16/01/2014 07:42, Philippe de Rochambeau wrote:
Hello,
can anyone suggest alternatives to GATE
(http://gate.ac.uk/download/)? I would like to index place and person
names in PDFs using gazetteers (i.e., dictionaries), normalize dates
(e.g., December 1st, 2001 will be indexed as 20011201) and feed the
indexes to SOLR.
GATE is a great tool, but the search engine, Mimir, is unfortunately
not customizable (or well documented) enough for my purposes, which
are to return the found documents (PDFs) ordered by document or
entity (e.g., {Date}, {Person}) name.
Many thanks.
Philippe
Hi Philippe,
For entity extraction we often use the Stanford NLP libraries, which are
part of GATE but a lot simpler (GATE is a bit of a beast TBH). For
example, in a taxonomy editor/classifier prototype we built recently, we
use Stanford to pull out entities from classified documents as
suggestions for improving a node definition:
http://www.flax.co.uk/blog/2012/06/12/clade-a-freely-available-open-source-taxonomy-and-autoclassification-tool/
There's also an interesting European Commission funded project in this
area, a 'marketplace' for text classification & extraction apps:
https://annomarket.com/
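To make the gazetteer lookup and date normalisation from the question concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not GATE, Stanford NLP or Solr code: the gazetteer contents, the function names and the field names ("place", "person", "date") are all hypothetical, and it assumes the text has already been extracted from the PDF.

```python
import re

# Hypothetical gazetteers; real ones would be loaded from dictionary files.
PLACES = {"Paris", "London"}
PERSONS = {"Victor Hugo", "Charles Dickens"}

MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November",
               "December"]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

# Matches dates written like "December 1st, 2001" or "March 22, 1999".
DATE_RE = re.compile(
    r"\b(" + "|".join(MONTH_NAMES) + r")\s+(\d{1,2})(?:st|nd|rd|th)?,\s*(\d{4})\b"
)


def normalize_dates(text):
    """Return every matched date as a YYYYMMDD string, in order of appearance.

    No calendar validation is done (e.g. "February 31st" is not rejected).
    """
    return ["%04d%02d%02d" % (int(year), MONTHS[month], int(day))
            for month, day, year in DATE_RE.findall(text)]


def find_entities(text, gazetteer):
    """Return the gazetteer entries found in text as whole words, sorted."""
    return sorted(name for name in gazetteer
                  if re.search(r"\b" + re.escape(name) + r"\b", text))


def build_doc(doc_id, text):
    """Build a dictionary of fields that could be sent to a search index."""
    return {
        "id": doc_id,
        "place": find_entities(text, PLACES),
        "person": find_entities(text, PERSONS),
        "date": normalize_dates(text),
    }


if __name__ == "__main__":
    sample = "On December 1st, 2001 Victor Hugo visited Paris."
    print(build_doc("doc1.pdf", sample))
```

With the entities and dates stored as separate fields like this, ordering results by document or by entity becomes a matter of sorting on the corresponding field in whichever search engine holds the index.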
HTH
Cheers
Charlie
--
Charlie Hull
Flax - Open Source Enterprise Search
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.flax.co.uk