I'm not entirely clear on the breadth of what you want to do, but I
would recommend Dublin Core (DC) at the extremely lightweight end and
TEI (Text Encoding Initiative) at the very heavyweight end. You could
also come up with a mashup of DC and your own fields in RDF.
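
To make the mashup idea a bit more concrete, here is a rough sketch in
Python (standard library only) that writes one record as RDF/XML. It
uses dc:title and dc:subject for the title and keywords, and a custom
namespace for the extracted annotations. The "ex" namespace URI, the
ex:entity and ex:topic property names, and the build_record helper are
all made up for illustration; they are not part of DC or any standard.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
# Hypothetical namespace for your own fields; pick a URI you control.
EX = "http://example.org/annotations#"

# Ask ElementTree to use readable prefixes when serializing.
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)
ET.register_namespace("ex", EX)


def build_record(url, title, keywords, entities, topics):
    """Build an rdf:RDF element describing one annotated document."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": url}
    )
    # Standard Dublin Core elements for the generic metadata.
    ET.SubElement(desc, f"{{{DC}}}title").text = title
    for keyword in keywords:
        ET.SubElement(desc, f"{{{DC}}}subject").text = keyword
    # Custom (assumed) properties for the annotator output.
    for entity in entities:
        ET.SubElement(desc, f"{{{EX}}}entity").text = entity
    for topic in topics:
        ET.SubElement(desc, f"{{{EX}}}topic").text = topic
    return root


record = build_record(
    url="http://example.org/page",
    title="doc title",
    keywords=["search", "monitoring"],
    entities=["New York", "Solr"],
    topics=["information retrieval"],
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Repeating an element per value (one ex:entity per entity) keeps each
value individually addressable, which tends to be easier to index than
a single delimited string.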

Michael Della Bitta

------------------------------------------------
Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Tue, Sep 11, 2012 at 11:51 AM, Otis Gospodnetic
<otis_gospodne...@yahoo.com> wrote:
> Hello,
>
> If I'm extracting named entities, topics, key phrases/tags, etc. from 
> documents and I want to have a representation of this document, what format 
> should I use? Are there any standard or at least common formats or approaches 
> people use in such situations?
>
> For example, the most straightforward format might be something like this:
>
>
> <document>
>   <title>doc title</title>
>   <keywords>meta keywords coming from the web page</keywords>
>   <content>page meat</content>
>   <entities>named entities recognized in the document</entities>
>   <topics>topics extracted by the annotator</topics>
>   <tags>tags extracted by the annotator</tags>
>   <relations>relations extracted by the annotator</relations>
> </document>
>
> But this is a made-up format - the XML tags above are just what somebody 
> happened to pick.
>
> Are there any standard or at least common formats for this?
>
>
> Thanks,
> Otis
> ----
> Performance Monitoring - Solr - ElasticSearch - HBase - 
> http://sematext.com/spm
>
> Search Analytics - http://sematext.com/search-analytics/index.html
