Good points, Markus, thanks! However, other publishers are *very* online, like this example:

"The New York Botanical Garden Herbarium (NY) - Vascular Plant Collection" (http://www.gbif.org/dataset/d415c253-4d61-4459-9d25-4015b9084fb0) and the "Herbarium of The New York Botanical Garden" (http://www.gbif.org/dataset/7133ff0a-f762-11e1-a439-00145eb45e9a). Same stuff, twice. The thing is that when we search for, for instance, "Belemia fucsioides", we get duplicate records of the same entity:

  http://www.gbif.org/occurrence/216419815
  http://www.gbif.org/occurrence/1098393958

This is very annoying and gives us a lot of work to clean up.

Cheers,

Eduardo

--------------------------------
Eduardo Dalcin <http://eduardo.dalc.in>
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin at jbrj.gov.br
Trabalho / Work: +55 21 3204 2116
--------------------------------
e-mail alternativo / alternate email: edalcin at jbrj.org
--------------------------------
Agendar reunião / Schedule a meeting: http://agendar.dalc.in

On Thu, Sep 10, 2015 at 4:50 AM, Markus Döring <mdoering at gbif.org> wrote:

> Eduardo,
>
> another difference in using downloads periodically is that you get the
> interpreted data from us (together with the original if you want to).
> That already contains quite a bit of data cleaning and aligning to
> controlled vocabularies that might be painful to reproduce otherwise.
> Also publishers are *very* often offline. Especially for the long running
> xml harvesting protocols (biocase,tapir,digir) this can be a bit of a
> challenge to index them entirely.
>
> Markus
>
>
> On 09 Sep 2015, at 20:02, Eduardo Dalcin <edalcin at jbrj.org> wrote:
>
> Thanks Alex. Food for thought.
>
> Best,
>
> Eduardo
>
> On Wed, Sep 9, 2015 at 2:28 PM, Alex Thompson <godfoder at acis.ufl.edu>
> wrote:
>
>> I'm kind of seconding Rod here.
>>
>> It might make more sense, depending on your use case and local computer
>> resources, to just get a download of Plantae *AND* Brazil from GBIF
>> periodically, then process that to exclude existing Brazilian datasets. You
>> could then use something like Apache Hadoop / Spark to efficiently split
>> the file by dataset or by institution code.
>>
>> This would greatly simplify your interactions with GBIF (down to just
>> periodically generating a download programmatically) and you would have an
>> easy place to insert any additional data transformations you want. This is
>> the path I take for my work at least - the incremental cost of a couple
>> million more records is worth the reduction in complexity overall.
>>
>> - Alex
>>
>>
>> On 09/09/2015 12:16 PM, Eduardo Dalcin wrote:
>>
>> Hi Rod,
>>
>> The real purpose is to have a list of UUIDs and the "source web page" for
>> each data set. Thus, one way to do it is to select those resources whose
>> counts are <> 0 for PLANTAE *AND* Brazil.
>>
>> I don't want to do any stats analysis, just feed a local harvester /
>> aggregator.
>>
>> The problem is that, considering the reply from Jan Legind on Sep 3, we
>> have to go through the datasets one by one (https://goo.gl/3wysaA) to see
>> whether each is a Herbarium / Preserved Specimen (Plantae) dataset or not,
>> starting from the request
>> http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>
>> Does it make sense?
>>
>> Thanks for your curiosity! :)
>>
>> Cheers,
>>
>> Eduardo
>>
>> On Mon, Sep 7, 2015 at 12:33 PM, Roderic Page <Roderic.Page at glasgow.ac.uk> wrote:
>>
>>> Hi Eduardo,
>>>
>>> I'm curious, is the purpose to get counts by dataset by country, or to
>>> get all the plant occurrences for Brazil? The latter can be obtained by
>>> downloading all plant occurrences in Brazil
>>> http://www.gbif.org/occurrence/search?TAXON_KEY=6&COUNTRY=BR (you could
>>> then compute the per-dataset stats locally). I realise that this isn't as
>>> convenient as having GBIF slice the data for you in the API.
>>>
>>> Regards
>>>
>>> Rod
>>>
>>> ---------------------------------------------------------
>>> Roderic Page
>>> Professor of Taxonomy
>>> Institute of Biodiversity, Animal Health and Comparative Medicine
>>> College of Medical, Veterinary and Life Sciences
>>> Graham Kerr Building
>>> University of Glasgow
>>> Glasgow G12 8QQ, UK
>>>
>>> Email: Roderic.Page at glasgow.ac.uk
>>> Tel: +44 141 330 4778
>>> Skype: rdmpage
>>> Facebook: http://www.facebook.com/rdmpage
>>> LinkedIn: http://uk.linkedin.com/in/rdmpage
>>> Twitter: http://twitter.com/rdmpage
>>> Blog: http://iphylo.blogspot.com
>>> ORCID: http://orcid.org/0000-0002-7101-9767
>>> Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
>>> ResearchGate: https://www.researchgate.net/profile/Roderic_Page
>>>
>>>
>>> On 4 Sep 2015, at 10:39, Eduardo Dalcin <edalcin at jbrj.org> wrote:
>>>
>>> Hi Markus,
>>>
>>> Yes, it's a shame I can't have country and "nub" together. Is there
>>> any hope for it?
>>>
>>> Eduardo
>>>
>>> On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring <mdoering at gbif.org> wrote:
>>>
>>>> Eduardo,
>>>>
>>>> as you might have seen from my issue comment the webservice uses a
>>>> different parameter name for taxonKey which is a bug we need to fix at some
>>>> point.
>>>> Please use nubKey for now to use the service like that:
>>>>
>>>> http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6
>>>>
>>>> The real problem for you will be that we do not support the combination
>>>> of the country and the taxon filter, just one of the two. So you cannot
>>>> search for plants in Brazil I am afraid, just for datasets about Brazil and
>>>> datasets with plant records.
>>>>
>>>> Markus
>>>>
>>>>
>>>> > On 03 Sep 2015, at 14:12, Eduardo Dalcin <edalcin at jbrj.org> wrote:
>>>> >
>>>> > Thanks Jan. I'll keep exploring and I'll be in touch, if I need.
>>>> >
>>>> > Best,
>>>> >
>>>> > Eduardo
>>>> >
>>>> > On Thu, Sep 3, 2015 at 4:51 AM, Jan Legind [GBIF] <jlegind at gbif.org> wrote:
>>>> > Dear Eduardo,
>>>> >
>>>> > Thanks for getting in touch with us about these issues.
>>>> >
>>>> > The first request
>>>> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>> > returns the number of records located in Brazil for the facets in the
>>>> > request.
>>>> >
>>>> > The second query
>>>> > http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>> > uses the Occurrence Inventories web service
>>>> > http://www.gbif.org/developer/occurrence#inventories which does not
>>>> > support the basis-of-record facet in the /datasets request. I understand
>>>> > that it would be better if the API response yielded an error message in
>>>> > this instance.
>>>> >
>>>> > Concerning the other issues: you are indeed right that the counts do
>>>> > not make sense in the context of taxon key 6 which is Plantae. Actually the
>>>> > API does not handle the taxonKey search at all, contrary to what the
>>>> > documentation states:
>>>> >
>>>> > /occurrence/counts/datasets | GET | Counts | Lists occurrence counts for
>>>> > datasets that cover a given taxon or country. | country, taxonKey
>>>> >
>>>> > As you can see here,
>>>> > http://api.gbif.org/v1/occurrence/counts/datasets?taxonKey=6 , this
>>>> > request doesn't return anything.
>>>> >
>>>> > The GBIF developers will handle this issue in due time.
>>>> > You can follow the issue in our bug tracking service here:
>>>> > http://dev.gbif.org/issues/browse/POR-2828
>>>> >
>>>> > With best regards,
>>>> >
>>>> > Jan K. Legind
>>>> > Data manager, GBIF Secretariat
>>>> >
>>>> >
>>>> > From: API-users [mailto:api-users-bounces at lists.gbif.org] On Behalf Of Eduardo Dalcin
>>>> > Sent: 2 September 2015 20:06
>>>> > To: api-users at lists.gbif.org; dev at gbif.org
>>>> > Cc: João Monnerat Lanna; Natália Queiroz; Diogo Silva; Laura; Ricardo Avancini
>>>> > Subject: [API-users] Some questions from a begginer
>>>> >
>>>> > Hi folks,
>>>> >
>>>> > This is my first message to the list. So, please, be nice :)
>>>> >
>>>> > I'm working here at Rio de Janeiro Botanical Garden, together with
>>>> > the guys at the National Center for Flora Conservation. We are doing the
>>>> > risk assessment of the Brazilian flora for the government. So far we have
>>>> > assessed the risk of ca. 6,000 species, but we still have to assess
>>>> > ca. 35,000. Access to occurrence records for Brazil is crucial, and every
>>>> > occurrence is important.
>>>> >
>>>> > That means that we have to put together occurrence data from
>>>> > different sources and, after the first batch of the risk assessment, we
>>>> > realized that we need to build our own aggregator. We are planning to do
>>>> > this with the Lontra-harvester, with the help of the guys at the Brazilian
>>>> > GBIF Node.
>>>> >
>>>> > So, one of the first steps was to list the available resources
>>>> > to understand the dimension of the task, and that brings me to my
>>>> > questions.
>>>> >
>>>> > First:
>>>> >
>>>> > The request:
>>>> >
>>>> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>> >
>>>> > returns 4,982,689 records
>>>> >
>>>> > And the request:
>>>> >
>>>> > http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>> >
>>>> > returns (here) 7,406,310 records
>>>> >
>>>> > Comments?
>>>> >
>>>> > Second:
>>>> >
>>>> > The request:
>>>> >
>>>> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>> >
>>>> > returns things like this:
>>>> >
>>>> > "197908d0-5565-11d8-b290-b8a03c50a862":27629
>>>> >
>>>> > But querying the same dataset:
>>>> >
>>>> > http://www.gbif.org/occurrence/search?TAXON_KEY=6&DATASET_KEY=197908d0-5565-11d8-b290-b8a03c50a862
>>>> >
>>>> > returns "null" (of course, it is FishBase!)
>>>> >
>>>> > I have plenty of examples like this, highlighted in yellow here (not
>>>> > finished!):
>>>> >
>>>> > https://docs.google.com/spreadsheets/d/1msUjwMLoKwnXxJFzF20SeN_C65RIkGLbwaYyj459VTc/edit?usp=sharing
>>>> >
>>>> > Comments?
>>>> >
>>>> > I think those two questions are a good start. Please let me know if
>>>> > I'm doing something wrong.
>>>> >
>>>> > Cheers,
>>>> >
>>>> > Eduardo
>>>> >
>>>
>>> _______________________________________________
>>> API-users mailing list
>>> API-users at lists.gbif.org
>>> http://lists.gbif.org/mailman/listinfo/api-users

-------------- next part --------------
A non-text attachment was scrubbed...
Name: FireShot Pro Screen Capture #076 - 'Occurrence Search Results' - www_gbif_org_occurrence_search_TAXON_KEY=5553637.png
Type: image/png
Size: 21327 bytes
Desc: not available
URL: <http://lists.gbif.org/pipermail/api-users/attachments/20150914/64560c76/attachment-0001.png>
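
For the dataset-listing question in the thread quoted above, a minimal sketch of one possible workaround: since the inventory service accepts the country filter or the taxon filter but not both, query each separately and intersect the dataset keys. It assumes the endpoint and the nubKey parameter behave as described in the thread (as of September 2015) and that the response is a JSON object mapping dataset UUID to count, as in the example shown there. The helper names `fetch_counts` and `candidate_datasets` are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as quoted in the thread; it may have changed since.
BASE = "http://api.gbif.org/v1/occurrence/counts/datasets"


def fetch_counts(**params):
    """Return the {datasetKey: count} map for a single filter,
    e.g. fetch_counts(country="BR") or fetch_counts(nubKey=6)."""
    with urlopen(f"{BASE}?{urlencode(params)}") as resp:
        return json.load(resp)


def candidate_datasets(by_country, by_taxon):
    """Dataset keys present in both maps with a non-zero count in each."""
    return sorted(
        key
        for key in set(by_country) & set(by_taxon)
        if by_country[key] and by_taxon[key]
    )


# Usage (needs network access, so it is left commented out):
# brazil = fetch_counts(country="BR")
# plants = fetch_counts(nubKey=6)
# print(candidate_datasets(brazil, plants))
```

The intersection only yields candidates: a dataset with plant records elsewhere and non-plant records in Brazil would still be listed, so each one still needs a check against the occurrence search.
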
