Hi, I think the most common approach today is to use the schema.org vocabularies. Schema.org defines a shared set of types and properties (for example for people, organizations, places, and creative works), and it is backed by the major search engines: Google, Yahoo, and Microsoft. See http://schema.org/.
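To make this concrete, here is a minimal sketch of how the fields from the question might be mapped onto schema.org terms and serialized as JSON-LD. The mapping is an assumption for illustration only, not an official recipe: it uses the schema.org type CreativeWork with its properties name, keywords, text, mentions, and about, while the helper name to_schema_org and the sample values are hypothetical. The tags and relations fields are left out because they have no equally obvious counterpart.

```python
import json


def to_schema_org(title, keywords, content, entities, topics):
    """Build a JSON-LD style dict that describes an annotated document.

    Assumed mapping (illustrative, not prescribed by schema.org):
      title    -> name
      keywords -> keywords (comma-separated string)
      content  -> text
      entities -> mentions (list of typed things)
      topics   -> about (list of generic Things)
    """
    return {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": title,
        "keywords": ", ".join(keywords),
        "text": content,
        # Each entity is a (name, schema.org type) pair, e.g. ("Ada Lovelace", "Person").
        "mentions": [{"@type": etype, "name": name} for name, etype in entities],
        "about": [{"@type": "Thing", "name": topic} for topic in topics],
    }


# Hypothetical sample data standing in for annotator output.
doc = to_schema_org(
    title="doc title",
    keywords=["search", "monitoring"],
    content="page meat",
    entities=[("Ada Lovelace", "Person"), ("Example Corp", "Organization")],
    topics=["information extraction"],
)

print(json.dumps(doc, indent=2))
```

The same terms can also be embedded in HTML as microdata or RDFa; JSON-LD is used here only because it is easy to produce from an extraction pipeline.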
Hope it helps,
Péter

On Tue, Sep 11, 2012 at 11:51 AM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> Hello,
>
> If I'm extracting named entities, topics, key phrases/tags, etc. from
> documents and I want to have a representation of this document, what format
> should I use? Are there any standard or at least common formats or
> approaches people use in such situations?
>
> For example, the most straight forward format might be something like
> this:
>
> <document>
>   <title>doc title</title>
>   <keywords>meta keywords coming from the web page</keywords>
>   <content>page meat</content>
>   <entities>name entities recognized in the document</entities>
>   <topics>topics extracted by the annotator</topics>
>   <tags>tags extracted by the annotator</tags>
>   <relations>relations extracted by the annotator</relations>
> </document>
>
> But this is a made up format - the XML tags above are just what somebody
> happened to pick.
>
> Are there any standard or at least common formats for this?
>
> Thanks,
> Otis
> ----
> Performance Monitoring - Solr - ElasticSearch - HBase - http://sematext.com/spm
> Search Analytics - http://sematext.com/search-analytics/index.html

--
Péter Király
eXtensible Catalog
http://eXtensibleCatalog.org
http://drupal.org/project/xc