Re: Semantic document format... standards?

Jack Krupansky Tue, 11 Sep 2012 15:46:47 -0700

My standard question for such a situation: How are you expecting your users to query this data? Are they expecting simple English/natural language text, or are they expecting structured "identifiers" that can be keys into other data sources?

For example, are your "entities" simple text literal names, or might they be Dublin Core (DC) "Agent" URI identifiers?

Ditto for topics - free text vs. some SKOS vocabulary or other form of taxonomy.
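
To make the distinction concrete, here is a minimal Python sketch of an annotation that carries both a free-text label and an optional identifier. The `Annotation` class, the `query_terms` helper, and the example.org URI are all hypothetical, made up for illustration; they are not part of Dublin Core, SKOS, or Solr.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """One extracted entity or topic (hypothetical structure, not a standard)."""
    label: str                 # free text, suited to natural-language queries
    uri: Optional[str] = None  # optional identifier, usable as a key elsewhere


def query_terms(annotations):
    """Split annotations into free-text terms and identifier keys."""
    text_terms = [a.label for a in annotations]
    id_keys = [a.uri for a in annotations if a.uri is not None]
    return text_terms, id_keys


# Example values are invented; the URI is a placeholder, not a real identifier.
annotations = [
    Annotation("Apache Solr", uri="http://example.org/agent/apache-solr"),
    Annotation("search engines"),  # free-text topic with no controlled vocabulary
]
text_terms, id_keys = query_terms(annotations)
```

If users only type plain text, the labels alone may be enough; if the data must join against other sources, the identifiers become the important part.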


In other words, clue us in as to your client "requirements".

-- Jack Krupansky

-----Original Message-----
From: Otis Gospodnetic

Sent: Tuesday, September 11, 2012 11:51 AM
To: solr-user@lucene.apache.org
Subject: Semantic document format... standards?

Hello,

If I'm extracting named entities, topics, key phrases/tags, etc. from documents and I want to have a representation of this document, what format should I use? Are there any standard or at least common formats or approaches people use in such situations?


For example, the most straightforward format might be something like this:


<document>
 <title>doc title</title>
 <keywords>meta keywords coming from the web page</keywords>
 <content>page meat</content>
 <entities>named entities recognized in the document</entities>
 <topics>topics extracted by the annotator</topics>
 <tags>tags extracted by the annotator</tags>
 <relations>relations extracted by the annotator</relations>
</document>

But this is a made-up format - the XML tags above are just what somebody happened to pick.
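
For reference, a minimal Python sketch of how such an ad hoc layout could be produced with the standard library's xml.etree.ElementTree. The `build_document` helper and the field values are hypothetical; the element names simply mirror the made-up tags above and do not follow any standard.

```python
import xml.etree.ElementTree as ET


def build_document(fields):
    """Serialize a dict of field name -> text into the ad hoc <document> layout."""
    root = ET.Element("document")
    for name, text in fields.items():
        child = ET.SubElement(root, name)
        child.text = text
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(root, encoding="unicode")


# Field values below are placeholders for illustration only.
xml_text = build_document({
    "title": "doc title",
    "entities": "Example Person; Example Organization",
    "topics": "semantic annotation",
})
```

Because each field is a single flat string, anything richer (entity types, offsets, identifiers) would need nested elements or attributes, which is where an agreed format would help.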


Are there any standard or at least common formats for this?


Thanks,
Otis
----

Performance Monitoring - Solr - ElasticSearch - HBase - http://sematext.com/spm

Search Analytics - http://sematext.com/search-analytics/index.html
