Hi,

I think the most common approach today is to use the schema.org vocabulary.
It defines types and properties for describing things such as people, places,
organizations, and creative works, and it is supported by big players
such as Google, Yahoo, and Microsoft. See http://schema.org/.
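
As a rough illustration only, here is how the fields from your made-up XML
might map onto schema.org terms when serialized as JSON-LD. The type
CreativeWork and the properties name, keywords, text, mentions, and about
are real schema.org terms; the function name to_schema_org, the sample
values, and the choice of which field goes to which property are my own
assumptions, not an established convention. Relations have no obvious
single property, so this sketch leaves them out.

```python
import json


def to_schema_org(title, keywords, content, entities, topics):
    """Hypothetical helper: map annotator output onto schema.org terms.

    entities is a list of (schema.org type, name) pairs, for example
    ("Person", "Ada Lovelace"); topics is a list of plain strings.
    """
    return {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": title,
        # schema.org's keywords property accepts comma-delimited text.
        "keywords": ", ".join(keywords),
        "text": content,
        # Assumption: recognized entities become "mentions".
        "mentions": [{"@type": etype, "name": name} for etype, name in entities],
        # Assumption: extracted topics become "about".
        "about": [{"@type": "Thing", "name": topic} for topic in topics],
    }


doc = to_schema_org(
    title="doc title",
    keywords=["search", "analytics"],
    content="page meat",
    entities=[("Person", "Ada Lovelace"), ("Organization", "Example Corp")],
    topics=["information extraction"],
)

print(json.dumps(doc, indent=2))
```

The same properties can also be embedded in the HTML page itself as
microdata or RDFa if you would rather not keep a separate file.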

Hope it helps,
Péter


On Tue, Sep 11, 2012 at 11:51 AM, Otis Gospodnetic
<otis_gospodne...@yahoo.com> wrote:
> Hello,
>
> If I'm extracting named entities, topics, key phrases/tags, etc. from
> documents and I want to have a representation of this document, what format
> should I use? Are there any standard or at least common formats or
> approaches people use in such situations?
>
> For example, the most straightforward format might be something like
> this:
>
> <document>
>   <title>doc title</title>
>   <keywords>meta keywords coming from the web page</keywords>
>   <content>page meat</content>
>   <entities>named entities recognized in the document</entities>
>   <topics>topics extracted by the annotator</topics>
>   <tags>tags extracted by the annotator</tags>
>   <relations>relations extracted by the annotator</relations>
> </document>
>
> But this is a made-up format - the XML tags above are just what somebody
> happened to pick.
>
> Are there any standard or at least common formats for this?
>
> Thanks,
> Otis
> ----
> Performance Monitoring - Solr - ElasticSearch - HBase -
> http://sematext.com/spm
>
> Search Analytics - http://sematext.com/search-analytics/index.html



-- 
Péter Király
eXtensible Catalog
http://eXtensibleCatalog.org
http://drupal.org/project/xc
