Eduardo, another advantage of periodic downloads is that you get the interpreted data from us (together with the original, if you want it). The interpreted data already includes quite a bit of cleaning and alignment to controlled vocabularies that might be painful to reproduce otherwise. Also, publishers are *very* often offline. Especially with the long-running XML harvesting protocols (BioCASe, TAPIR, DiGIR), indexing a dataset in its entirety can be a bit of a challenge.
Markus

> On 09 Sep 2015, at 20:02, Eduardo Dalcin <edalcin at jbrj.org> wrote:
>
> Thanks Alex. Food for thought.
>
> Best,
>
> Eduardo
>
> --------------------------------
> Eduardo Dalcin <http://eduardo.dalc.in>
> Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
> e-mail: edalcin at jbrj.gov.br
> Trabalho / Work: +55 21 3204 2116
> --------------------------------
> e-mail alternativo / alternate email: edalcin at jbrj.org
> --------------------------------
> Agendar reunião / Schedule a meeting: http://agendar.dalc.in
>
> On Wed, Sep 9, 2015 at 2:28 PM, Alex Thompson <godfoder at acis.ufl.edu> wrote:
> I'm kind of seconding Rod here.
>
> It might make more sense, depending on your use case and local computer
> resources, to just get a download of Plantae *AND* Brazil from GBIF
> periodically, then process that to exclude existing Brazilian datasets. You
> could then use something like Apache Hadoop / Spark to efficiently split the
> file by dataset or by institution code.
>
> This would greatly simplify your interactions with GBIF (down to just
> periodically generating a download programmatically) and you would have an
> easy place to insert any additional data transformations you want. This is
> the path I take for my work at least - the incremental cost of a couple
> million more records is worth the reduction in complexity overall.
>
> - Alex
>
> On 09/09/2015 12:16 PM, Eduardo Dalcin wrote:
>> Hi Rod,
>>
>> The real purpose is to have a list of UUIDs and the "source web page" for
>> each data set. Thus, one way to do it is to select those resources whose
>> count is <> 0 for PLANTAE *AND* Brazil.
>>
>> I don't want to do any statistical analysis, just feed a local harvester /
>> aggregator.
>>
>> The problem is that, considering the reply from Jan Legind on Sep 3, we have
>> to check the datasets one by one (https://goo.gl/3wysaA) to see whether each
>> is a Herbarium / Preserved Specimen (Plantae) dataset or not, from the request
>> http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>
>> Does it make sense?
>>
>> Thanks for your curiosity! :)
>>
>> Cheers,
>>
>> Eduardo
>>
>> On Mon, Sep 7, 2015 at 12:33 PM, Roderic Page <Roderic.Page at glasgow.ac.uk> wrote:
>> Hi Eduardo,
>>
>> I'm curious, is the purpose to get counts by dataset by country, or to get
>> all the plant occurrences for Brazil? The latter can be obtained by
>> downloading all plant occurrences in Brazil
>> http://www.gbif.org/occurrence/search?TAXON_KEY=6&COUNTRY=BR (you could
>> then compute the per-dataset stats locally).
>> I realise that this isn't as convenient as having GBIF slice the data for
>> you in the API.
>>
>> Regards
>>
>> Rod
>>
>> ---------------------------------------------------------
>> Roderic Page
>> Professor of Taxonomy
>> Institute of Biodiversity, Animal Health and Comparative Medicine
>> College of Medical, Veterinary and Life Sciences
>> Graham Kerr Building
>> University of Glasgow
>> Glasgow G12 8QQ, UK
>>
>> Email: Roderic.Page at glasgow.ac.uk
>> Tel: +44 141 330 4778
>> Skype: rdmpage
>> Facebook: http://www.facebook.com/rdmpage
>> LinkedIn: http://uk.linkedin.com/in/rdmpage
>> Twitter: http://twitter.com/rdmpage
>> Blog: http://iphylo.blogspot.com
>> ORCID: http://orcid.org/0000-0002-7101-9767
>> Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
>> ResearchGate: https://www.researchgate.net/profile/Roderic_Page
>>
>>> On 4 Sep 2015, at 10:39, Eduardo Dalcin <edalcin at jbrj.org> wrote:
>>>
>>> Hi Markus,
>>>
>>> Yes, it's a shame I can't use country and "nub" together. Is there any
>>> hope for it?
>>>
>>> Eduardo
>>>
>>> On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring <mdoering at gbif.org> wrote:
>>> Eduardo,
>>>
>>> as you might have seen from my issue comment, the web service uses a
>>> different parameter name for taxonKey, which is a bug we need to fix at
>>> some point. For now, please use nubKey to call the service, like this:
>>>
>>> http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6
>>>
>>> The real problem for you will be that we do not support the combination of
>>> the country and the taxon filter, just one of the two. So you cannot search
>>> for plants in Brazil, I am afraid, just for datasets about Brazil and
>>> datasets with plant records.
>>>
>>> Markus
>>>
>>> > On 03 Sep 2015, at 14:12, Eduardo Dalcin <edalcin at jbrj.org> wrote:
>>> >
>>> > Thanks Jan. I'll keep exploring and I'll be in touch if I need to.
>>> >
>>> > Best,
>>> >
>>> > Eduardo
>>> >
>>> > On Thu, Sep 3, 2015 at 4:51 AM, Jan Legind [GBIF] <jlegind at gbif.org> wrote:
>>> > Dear Eduardo,
>>> >
>>> > Thanks for getting in touch with us about these issues.
>>> >
>>> > The first request
>>> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>> > returns the number of records located in Brazil for the facets in the
>>> > request.
>>> >
>>> > The second query
>>> > http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>> > uses the Occurrence Inventories web service
>>> > http://www.gbif.org/developer/occurrence#inventories which does not
>>> > support the basis-of-record facet in the /datasets request. I understand
>>> > that it would be better if the API response yielded an error message in
>>> > this instance.
>>> >
>>> > Concerning the other issues,
>>> > you are indeed right that the counts do not make sense in the context
>>> > of taxon key 6, which is Plantae. Actually, the API does not handle the
>>> > taxonKey search at all, contrary to what the documentation states:
>>> >
>>> > /occurrence/counts/datasets | GET | Counts | Lists occurrence counts for datasets that cover a given taxon or country. | country, taxonKey
>>> >
>>> > As you can see here,
>>> > http://api.gbif.org/v1/occurrence/counts/datasets?taxonKey=6 , this
>>> > request doesn't return anything.
>>> >
>>> > The GBIF developers will handle this issue in due time.
>>> >
>>> > You can follow the issue in our bug tracking service here:
>>> > http://dev.gbif.org/issues/browse/POR-2828
>>> >
>>> > With best regards,
>>> >
>>> > Jan K. Legind
>>> >
>>> > Data manager, GBIF Secretariat
>>> >
>>> > From: API-users [mailto:api-users-bounces at lists.gbif.org] On Behalf Of Eduardo Dalcin
>>> > Sent: 2 September 2015 20:06
>>> > To: api-users at lists.gbif.org; dev at gbif.org
>>> > Cc: João Monnerat Lanna; Natália Queiroz; Diogo Silva; Laura; Ricardo Avancini
>>> > Subject: [API-users] Some questions from a begginer
>>> >
>>> > Hi folks,
>>> >
>>> > This is my first message to the list. So, please, be nice :)
>>> >
>>> > I'm working here at the Rio de Janeiro Botanical Garden, together with the
>>> > guys at the National Center for Flora Conservation. We are doing the risk
>>> > assessment of the Brazilian flora for the government. So far we have
>>> > assessed the risk of ca. 6.000 species, but we still have to assess ca. 35.000.
>>> > Access to occurrence records for Brazil is crucial, and every occurrence
>>> > is important.
>>> >
>>> > That means that we have to put together occurrence data from different
>>> > sources and, after the first batch of the risk assessment, we realized
>>> > that we need to build our own aggregator. We are planning to do this with
>>> > the Lontra-harvester, with the help of the guys at the Brazilian GBIF Node.
>>> >
>>> > So, one of the first steps was to list the available resources to
>>> > understand the dimension of the task, and that brings me to my questions.
>>> >
>>> > First:
>>> >
>>> > The request:
>>> >
>>> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>> >
>>> > returns 4.982.689 records
>>> >
>>> > And the request:
>>> >
>>> > http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>> >
>>> > returns (here) 7.406.310 records
>>> >
>>> > Comments?
>>> >
>>> > Second:
>>> >
>>> > The request:
>>> >
>>> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>> >
>>> > returns things like this:
>>> >
>>> > "197908d0-5565-11d8-b290-b8a03c50a862":27629
>>> >
>>> > But a query on the same dataset:
>>> >
>>> > http://www.gbif.org/occurrence/search?TAXON_KEY=6&DATASET_KEY=197908d0-5565-11d8-b290-b8a03c50a862
>>> >
>>> > returns "null" (of course, it is FishBase!)
>>> >
>>> > I have plenty of examples like this, highlighted in yellow here (not finished!):
>>> >
>>> > https://docs.google.com/spreadsheets/d/1msUjwMLoKwnXxJFzF20SeN_C65RIkGLbwaYyj459VTc/edit?usp=sharing
>>> >
>>> > Comments?
>>> >
>>> > I think those two questions are a good start. Please let me know if I'm
>>> > doing something wrong.
>>> >
>>> > Cheers,
>>> >
>>> > Eduardo
>>>
>>> _______________________________________________
>>> API-users mailing list
>>> API-users at lists.gbif.org
>>> http://lists.gbif.org/mailman/listinfo/api-users
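
The approaches discussed in this thread can be sketched in Python. This is a minimal sketch under stated assumptions, not GBIF's own tooling. The function names are hypothetical. The request body follows the occurrence download API as I understand it (an "and" predicate over TAXON_KEY and COUNTRY, SIMPLE_CSV format), and the download is assumed to be a tab-delimited file with a datasetKey column; please check both against the current GBIF documentation. The first function builds the download request for plants in Brazil, the second computes the per-dataset counts locally as Rod suggests, and the third intersects the two separate counts/datasets responses (country=BR and nubKey=6), which only gives a superset of the datasets that really hold Brazilian plant records.

```python
import csv
import io
from collections import Counter


def download_request(username, taxon_key=6, country="BR"):
    """Build the JSON body for an occurrence download request.

    The "and" predicate combines taxon and country, the combination that
    the counts/datasets web service does not support.
    """
    return {
        "creator": username,  # GBIF account name (assumption: you have one)
        "format": "SIMPLE_CSV",
        "predicate": {
            "type": "and",
            "predicates": [
                {"type": "equals", "key": "TAXON_KEY", "value": str(taxon_key)},
                {"type": "equals", "key": "COUNTRY", "value": country},
            ],
        },
    }


def records_per_dataset(lines, column="datasetKey"):
    """Count records per dataset in a tab-delimited download.

    `lines` is any iterable of text lines, e.g. an open file, so a large
    download is streamed rather than loaded into memory.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row[column] for row in reader)


def candidate_datasets(counts_by_country, counts_by_taxon):
    """Dataset keys present in both count maps.

    This is a superset: a dataset may have Brazilian records and plant
    records without having any Brazilian plant records.
    """
    return sorted(set(counts_by_country) & set(counts_by_taxon))


if __name__ == "__main__":
    # Tiny in-memory stand-in for a downloaded file.
    sample = io.StringIO("gbifID\tdatasetKey\n1\ta\n2\tb\n3\ta\n")
    print(records_per_dataset(sample))
    print(candidate_datasets({"a": 5, "b": 1}, {"b": 7, "c": 2}))
```

The body returned by download_request would be sent as JSON in an authenticated POST to the occurrence download endpoint. The sketch leaves the HTTP call out so that it runs without network access or credentials.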
