Hi Eduardo,

Unfortunately this is a commonly encountered problem, usually because data 
providers change the data GBIF harvests. In this case, the difference between 
the data sets is that one has the field "occurrenceID" set and the other 
doesn't, so the records appear to be two different records to GBIF. In an ideal 
world the older dataset would be deleted or otherwise deprecated, and only the 
newer data displayed.
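
A minimal sketch of how such duplicates might be flagged locally, assuming each record has been loaded as a dict with the Darwin Core terms `institutionCode` and `catalogNumber` plus GBIF's `datasetKey`. The matching rule and the `find_cross_dataset_duplicates` helper are illustrative assumptions, not something GBIF provides:

```python
from collections import defaultdict


def find_cross_dataset_duplicates(records):
    """Group records by (institutionCode, catalogNumber) and return only
    the groups whose records come from more than one dataset."""
    groups = defaultdict(list)
    for rec in records:
        # Normalise case and whitespace so "ny" and "NY " compare equal.
        inst = (rec.get("institutionCode") or "").strip().upper()
        cat = (rec.get("catalogNumber") or "").strip().upper()
        if not inst or not cat:
            continue  # records lacking either identifier cannot be matched
        groups[(inst, cat)].append(rec)
    return {
        key: recs
        for key, recs in groups.items()
        if len({r.get("datasetKey") for r in recs}) > 1
    }


# Hypothetical records: same specimen published in two datasets,
# only one of which carries an occurrenceID.
records = [
    {"institutionCode": "NY", "catalogNumber": "123",
     "datasetKey": "dataset-a", "occurrenceID": "urn:example:123"},
    {"institutionCode": "NY", "catalogNumber": "123",
     "datasetKey": "dataset-b"},
]
duplicates = find_cross_dataset_duplicates(records)
```

Records that land in the same group are only candidates for being the same specimen; whether the catalogue number is a reliable key depends on the publisher, so a manual check is still advisable.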

Regards

Rod

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email:  Roderic.Page at glasgow.ac.uk
Tel:  +44 141 330 4778
Skype:  rdmpage
Facebook:  http://www.facebook.com/rdmpage
LinkedIn:  http://uk.linkedin.com/in/rdmpage
Twitter:  http://twitter.com/rdmpage
Blog:  http://iphylo.blogspot.com
ORCID:  http://orcid.org/0000-0002-7101-9767
Citations:  http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
ResearchGate https://www.researchgate.net/profile/Roderic_Page


On 14 Sep 2015, at 17:25, Eduardo Dalcin <edalcin at jbrj.org> wrote:

Good points Markus, Thanks!

However, other publishers are *very* online, like this example:

"The New York Botanical Garden Herbarium (NY) - Vascular Plant Collection" 
(http://www.gbif.org/dataset/d415c253-4d61-4459-9d25-4015b9084fb0)
and the "Herbarium of The New York Botanical Garden" 
(http://www.gbif.org/dataset/7133ff0a-f762-11e1-a439-00145eb45e9a).

Same stuff, twice.

The thing is that when we search for, for instance, "Belemia fucsioides", we 
get duplicate records of the same entity:

[Screenshot: 'Occurrence Search Results', TAXON_KEY=5553637]

http://www.gbif.org/occurrence/216419815
http://www.gbif.org/occurrence/1098393958

This is very annoying and gives us a lot of work to clean up.

Cheers,

Eduardo





--------------------------------
Eduardo Dalcin <http://eduardo.dalc.in>
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin at jbrj.gov.br
Trabalho / Work: +55 21 3204 2116
--------------------------------
e-mail alternativo / alternate email: edalcin at jbrj.org
--------------------------------
Agendar reunião / Schedule a meeting: http://agendar.dalc.in

On Thu, Sep 10, 2015 at 4:50 AM, Markus Döring <mdoering at gbif.org> wrote:
Eduardo,

another difference in using periodic downloads is that you get the 
interpreted data from us (together with the original if you want it).
That already includes quite a bit of data cleaning and alignment to controlled 
vocabularies that might be painful to reproduce otherwise.
Also, publishers are *very* often offline. Especially with the long-running XML 
harvesting protocols (BioCASe, TAPIR, DiGIR), it can be a bit of a challenge to 
index them entirely.

Markus


On 09 Sep 2015, at 20:02, Eduardo Dalcin <edalcin at jbrj.org> wrote:

Thanks Alex. Food for thought.

Best,

Eduardo




On Wed, Sep 9, 2015 at 2:28 PM, Alex Thompson <godfoder at acis.ufl.edu> wrote:
I'm kind of seconding Rod here.

It might make more sense, depending on your use case and local computer 
resources, to just get a download of Plantae *AND* Brazil from GBIF 
periodically, then process that to exclude existing Brazilian datasets. You 
could then use something like Apache Hadoop / Spark to efficiently split the 
file by dataset or by institution code.

This would greatly simplify your interactions with GBIF (down to just 
periodically generating a download programmatically) and you would have an easy 
place to insert any additional data transformations you want. This is the path 
I take for my work at least - the incremental cost of a couple million more 
records is worth the reduction in complexity overall.
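
As a rough illustration of the exclude-and-split step described above, here is a sketch in plain Python rather than Hadoop / Spark. It assumes the download is a tab-delimited text file whose header row contains a `datasetKey` column; the `split_by_dataset` helper, the file names and the placeholder key are hypothetical:

```python
from pathlib import Path


def split_by_dataset(download_path, out_dir, exclude_keys=frozenset()):
    """Write one <datasetKey>.tsv file per dataset, skipping excluded
    datasets, and return the number of records written per dataset."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles = {}  # one open file per dataset; fine for hundreds of datasets
    counts = {}
    try:
        with open(download_path, encoding="utf-8", newline="") as src:
            header = src.readline()
            columns = header.rstrip("\r\n").split("\t")
            key_index = columns.index("datasetKey")  # assumed column name
            for line in src:
                fields = line.rstrip("\r\n").split("\t")
                if len(fields) <= key_index:
                    continue  # skip truncated or empty rows
                key = fields[key_index]
                if key in exclude_keys:
                    continue
                if key not in handles:
                    handles[key] = open(out_dir / f"{key}.tsv", "w",
                                        encoding="utf-8", newline="")
                    handles[key].write(header)  # repeat header in each file
                handles[key].write(line)
                counts[key] = counts.get(key, 0) + 1
    finally:
        for fh in handles.values():
            fh.close()
    return counts


# Hypothetical usage:
# counts = split_by_dataset("plantae_brazil.tsv", "by_dataset",
#                           exclude_keys={"<key of a dataset you already hold>"})
```

The returned dictionary doubles as a per-dataset record count, so the same pass also gives the local statistics. For a file with millions of rows this streams line by line and never holds the whole download in memory.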

- Alex


On 09/09/2015 12:16 PM, Eduardo Dalcin wrote:
Hi Rod,

The real purpose is to have a list of UUIDs and the "source web page" for each 
data set. Thus, one way to do it is to select those resources whose counts are 
<> 0 for PLANTAE *AND* Brazil.

I don't want to do any statistical analysis, just feed a local harvester / 
aggregator.

The problem is that, considering the reply from Jan Legind on Sep 3, we have to 
check each dataset one by one (https://goo.gl/3wysaA) to see whether it is a 
Herbarium / Preserved Specimen (Plantae) dataset or not, starting from the request 
http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN.

Does it make sense?

Thanks for your curiosity! :)

Cheers,

Eduardo




On Mon, Sep 7, 2015 at 12:33 PM, Roderic Page <Roderic.Page at glasgow.ac.uk> wrote:
Hi Eduardo,

I'm curious, is the purpose to get counts by dataset by country, or to get all 
the plant occurrences for Brazil? The latter can be obtained by downloading all 
plant occurrences in Brazil 
http://www.gbif.org/occurrence/search?TAXON_KEY=6&COUNTRY=BR (you could then 
compute the per-dataset stats locally). I realise that this isn't as convenient 
as having GBIF slice the data for you in the API.

Regards

Rod



On 4 Sep 2015, at 10:39, Eduardo Dalcin <edalcin at jbrj.org> wrote:

Hi Markus,

Yes, it's a shame I can't use country and "nub" together. Is there any hope 
for that?

Eduardo




On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring <mdoering at gbif.org> wrote:
Eduardo,

as you might have seen from my issue comment, the web service uses a different 
parameter name for taxonKey, which is a bug we need to fix at some point.
For now, please use nubKey instead, like this:

http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6

The real problem for you will be that we do not support the combination of the 
country and the taxon filter, just one of the two. So you cannot search for 
plants in Brazil I am afraid, just for datasets about Brazil and datasets with 
plant records.
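
One possible workaround, sketched under assumptions: query the service twice (once with nubKey, once with country) and intersect the dataset keys locally. This assumes each response is a JSON object mapping dataset key to record count, as in the sample quoted later in this thread; the helper names are hypothetical. The result is only a list of candidates, since a dataset can hold plant records and Brazilian records without holding any plant record from Brazil.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://api.gbif.org/v1/occurrence/counts/datasets"


def counts_url(**params):
    """Build a counts/datasets URL with a single filter, e.g. nubKey=6."""
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_counts(url):
    """Fetch and decode the JSON response (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


def candidate_datasets(by_taxon, by_country):
    """Return the dataset keys present in both responses, mapped to a
    (taxon count, country count) pair."""
    shared = by_taxon.keys() & by_country.keys()
    return {key: (by_taxon[key], by_country[key]) for key in sorted(shared)}


# Hypothetical usage (two separate requests, combined locally):
# plants = fetch_counts(counts_url(nubKey=6))
# brazil = fetch_counts(counts_url(country="BR"))
# candidates = candidate_datasets(plants, brazil)
```

Each candidate would still need to be confirmed, for example with an occurrence search restricted to that dataset.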

Markus



> On 03 Sep 2015, at 14:12, Eduardo Dalcin <edalcin at jbrj.org> wrote:
>
> Thanks Jan. I'll keep exploring and I'll be in touch, if I need.
>
> Best,
>
> Eduardo
>
>
>
>
> On Thu, Sep 3, 2015 at 4:51 AM, Jan Legind [GBIF] <jlegind at gbif.org> wrote:
> Dear Eduardo,
>
>
>
> Thanks for getting in touch with us about these issues.
>
>
>
> The first request 
> http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>  returns the number of records located in Brazil for the facets in the 
> request.
>
> The second query 
> http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>  uses the Occurrence Inventories web service 
> http://www.gbif.org/developer/occurrence#inventories which does not support 
> the basis-of-record facet in the /datasets request. I understand that it 
> would be better if the API response yielded an error message in this instance.
>
>
>
> Concerning the other issues: you are indeed right that the counts do not 
> make sense in the context of taxon key 6, which is Plantae. Actually the API 
> does not handle the taxonKey search at all, contrary to what the 
> documentation states:
>
>
>
> /occurrence/counts/datasets | GET | Counts | Lists occurrence counts for datasets that cover a given taxon or country. | country, taxonKey
>
>
>
> As you can see here, 
> http://api.gbif.org/v1/occurrence/counts/datasets?taxonKey=6 , this request 
> doesn't return anything.
>
>
>
> The GBIF developers will handle this issue in due time.
>
> You can follow the issue in our bug tracking service here: 
> http://dev.gbif.org/issues/browse/POR-2828
>
>
>
>
>
> With best regards,
>
>
>
> Jan K. Legind
>
> Data manager, GBIF Secretariat
>
>
>
>
>
> From: API-users [mailto:api-users-bounces at lists.gbif.org] On Behalf Of 
> Eduardo Dalcin
> Sent: 2 September 2015 20:06
> To: api-users at lists.gbif.org; dev at gbif.org
> Cc: João Monnerat Lanna; Natália Queiroz; Diogo Silva; Laura; Ricardo Avancini
> Subject: [API-users] Some questions from a begginer
>
>
>
> Hi folks,
>
>
>
> This is my first message to the list. So, please, be nice :)
>
>
>
> I'm working here at the Rio de Janeiro Botanical Garden, together with the 
> guys at the National Center for Flora Conservation. We are doing the risk 
> assessment of the Brazilian flora for the government. So far we have assessed 
> the risk of ca. 6,000 species, but we still have to assess ca. 35,000. Access 
> to occurrence records for Brazil is crucial, and every occurrence is important.
>
>
>
> That means that we have to put together occurrence data from different 
> sources and, after the first batch of the risk assessment, we realized that we 
> need to build our own aggregator. We are planning to do this with the 
> Lontra-harvester, with the help of the guys at the Brazilian GBIF Node.
>
>
>
> So, one of the first steps was to list the available resources to 
> understand the size of the task, and that brings me to my questions.
>
>
>
> First:
>
>
>
> The request:
>
>
>
> http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>
>
>
> returns 4,982,689 records
>
>
>
> And the request:
>
>
>
> http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>
>
>
> returns (here) 7,406,310 records
>
>
>
> Comments?
>
>
>
> Second:
>
>
>
> The request:
>
>
>
> http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>
>
>
> returns things like this:
>
>
>
> "197908d0-5565-11d8-b290-b8a03c50a862":27629
>
>
> But a search of the same dataset:
>
>
>
> http://www.gbif.org/occurrence/search?TAXON_KEY=6&DATASET_KEY=197908d0-5565-11d8-b290-b8a03c50a862
>
>
>
> returns "null" (of course, it is FishBase!)
>
>
>
> I have plenty of examples like this, highlighted in yellow here (not finished!):
>
>
>
> https://docs.google.com/spreadsheets/d/1msUjwMLoKwnXxJFzF20SeN_C65RIkGLbwaYyj459VTc/edit?usp=sharing
>
>
>
> Comments?
>
>
>
> I think those two questions are a good start. Please let me know if I'm doing 
> something wrong.
>
>
>
> Cheers,
>
>
>
> Eduardo
>


_______________________________________________
API-users mailing list
API-users at lists.gbif.org
http://lists.gbif.org/mailman/listinfo/api-users





