Hello all,
I'm experimenting with the SKG (Semantic Knowledge Graph) features available
through the json.facet API in Solr 8.11 to discover semantic relations in
medical text pre-annotated with NER algorithms.
I store the NER annotations, annotation ids, spans, etc. in separate Solr
fields, to keep the text clean.
The first results look promising, but I found a behaviour that surprises me.
To give a bit of context, I'm looking for covid-related papers with a standard
query (q parameter).
Then I set my foreground query to be a set of keywords, ORed together, related
to mitochondria, and the background query is set to *.
The json.facet parameters then look like this:
"json.facet": {
  "gene": {
    "type": "terms",
    "field": "abstracts_gene_pubtator_annotation_ids",
    "sort": { "r1": "desc" },
    "limit": 3,
    "facet": {
      "r1": "relatedness($fore,$back)"
    }
  }
}
This should give the genes stored in abstracts_gene_pubtator_annotation_ids
that are more likely to occur in mitochondrial papers.
Running a test query gives me this surprising result:
...
"gene": {
  "buckets": [
    {
      "val": "3091",
      "count": 1,
      "rtitles1": {
        "relatedness": 0.55649,
        "foreground_popularity": 0,
        "background_popularity": 0.00018
      }
    },
...
or, for a similar query, even bigger relatedness values:
...
  "buckets": [
    {
      "val": "MESH:D028361",
      "count": 1,
      "rabstract_conclusions0": {
        "relatedness": 0.91506,
        "foreground_popularity": 5e-05,
        "background_popularity": 5e-05
      },
...
But if I recall the z-score formula

    countFG("3091") - totalFG * probBG
    ---------------------------------------
    sqrt( totalFG * (1 - probBG) * probBG )

and set countFG("3091") to 1, the numerator is 1 - totalFG * probBG, so the
relatedness should be negative (or at most 0) whenever totalFG * probBG >= 1,
while here I find a clearly positive relatedness.
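
To make the reasoning concrete, here is a small sketch that evaluates the
formula exactly as written above, with countFG = 1 and probBG taken from the
first bucket (0.00018). The totalFG values are hypothetical, since the response
does not report the foreground set size, and the function name z_score is only
for illustration. It covers the raw z-score only, not any further
normalization Solr may apply before reporting "relatedness".

```python
import math


def z_score(count_fg: int, total_fg: int, prob_bg: float) -> float:
    """z-score as written in this mail (not taken from the Solr source)."""
    expected = total_fg * prob_bg  # expected foreground count under the background rate
    std_dev = math.sqrt(total_fg * (1.0 - prob_bg) * prob_bg)
    return (count_fg - expected) / std_dev


prob_bg = 0.00018  # background_popularity of bucket "3091"

# Hypothetical foreground sizes; the sign flips once total_fg * prob_bg exceeds 1.
for total_fg in (1_000, 10_000, 100_000):
    z = z_score(1, total_fg, prob_bg)
    print(f"totalFG={total_fg:>7}  totalFG*probBG={total_fg * prob_bg:6.2f}  z={z:+.3f}")
```

With probBG = 0.00018 the break-even point is totalFG = 1 / 0.00018, roughly
5556 documents, so a positive score with count 1 is only consistent with the
formula if the foreground set is smaller than that.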
Maybe this can be controlled with min_popularity, but I don't understand how to
use it in conjunction with type=terms and
field=abstracts_gene_pubtator_annotation_ids.
Can you please tell me the correct syntax, and whether my reasoning is correct?
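
For reference, my reading of the Solr Reference Guide is that min_popularity is
an option of the extended "type": "func" form of relatedness(), placed inside
the nested facet of the terms facet. The sketch below builds such a request
body under that assumption; I have not verified it on my index. The helper name
build_gene_facet and the threshold 0.0001 are hypothetical and chosen only for
illustration.

```python
import json


def build_gene_facet(min_popularity: float) -> dict:
    """Build the 'gene' terms facet with relatedness() in its extended form.

    Assumes min_popularity is accepted alongside "type": "func" and "func".
    """
    return {
        "gene": {
            "type": "terms",
            "field": "abstracts_gene_pubtator_annotation_ids",
            "sort": {"r1": "desc"},
            "limit": 3,
            "facet": {
                "r1": {
                    "type": "func",
                    "func": "relatedness($fore,$back)",
                    # Assumed option: lower bound on both popularity values.
                    "min_popularity": min_popularity,
                }
            },
        }
    }


# Value to pass as the json.facet parameter (0.0001 is an arbitrary threshold).
print(json.dumps(build_gene_facet(0.0001), indent=2))
```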
Thank you
Danilo
Danilo Tomasoni
Fondazione The Microsoft Research - University of Trento Centre for
Computational and Systems Biology (COSBI)
Piazza Manifattura 1, 38068 Rovereto (TN), Italy
http://www.cosbi.eu