Eduardo,

Another advantage of periodic downloads is that you get the interpreted data 
from us (together with the original, if you want it).
That already includes quite a bit of data cleaning and alignment to controlled 
vocabularies that might be painful to reproduce otherwise. 
Also, publishers are *very* often offline. Especially with the long-running XML 
harvesting protocols (BioCASe, TAPIR, DiGIR), indexing them entirely can be a 
bit of a challenge.
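
For illustration, here is a minimal sketch of how a per-dataset tally could be 
computed locally from such a download. It assumes a tab-delimited file with a 
header row that includes `datasetKey` and `basisOfRecord` columns; the function 
name and the sample rows are made up for the example, and this is not GBIF code.

```python
import csv
import io
from collections import Counter


def count_by_dataset(tsv_file, basis_of_record="PRESERVED_SPECIMEN"):
    """Tally records per datasetKey for one basisOfRecord value.

    Assumes a tab-delimited file with a header row containing
    'datasetKey' and 'basisOfRecord' columns (an assumption about the
    download layout, not something stated in this thread).
    """
    # QUOTE_NONE: treat quote characters in the data as ordinary text.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        if row.get("basisOfRecord") == basis_of_record:
            counts[row["datasetKey"]] += 1
    return counts


# Made-up sample rows standing in for a Plantae + Brazil download.
sample = io.StringIO(
    "gbifID\tdatasetKey\tbasisOfRecord\n"
    "1\tdataset-a\tPRESERVED_SPECIMEN\n"
    "2\tdataset-a\tPRESERVED_SPECIMEN\n"
    "3\tdataset-b\tHUMAN_OBSERVATION\n"
    "4\tdataset-c\tPRESERVED_SPECIMEN\n"
)
counts = count_by_dataset(sample)
for dataset_key, n in counts.most_common():
    print(dataset_key, n)
```

For a real download, the same function could be given a file object opened 
with `open(path, newline="", encoding="utf-8")`. The dataset keys that come 
out with a non-zero count would be the candidates for a local harvester.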

Markus


> On 09 Sep 2015, at 20:02, Eduardo Dalcin <edalcin at jbrj.org> wrote:
> 
> Thanks Alex. Food for thought.
> 
> Best,
> 
> Eduardo
> 
> 
> 
> --------------------------------
> Eduardo Dalcin <http://eduardo.dalc.in>
> Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
> e-mail: edalcin at jbrj.gov.br
> Trabalho / Work: +55 21 3204 2116
> --------------------------------
> e-mail alternativo / alternate email: edalcin at jbrj.org
> --------------------------------
> Agendar reunião / Schedule a meeting: http://agendar.dalc.in
> 
> On Wed, Sep 9, 2015 at 2:28 PM, Alex Thompson <godfoder at acis.ufl.edu> wrote:
> I'm kind of seconding Rod here.
> 
> It might make more sense, depending on your use case and local computing 
> resources, to just get a download of Plantae *AND* Brazil from GBIF 
> periodically, then process that to exclude existing Brazilian datasets. You 
> could then use something like Apache Hadoop / Spark to efficiently split the 
> file by dataset or by institution code.
> 
> This would greatly simplify your interactions with GBIF (down to just 
> periodically generating a download programmatically), and you would have an 
> easy place to insert any additional data transformations you want. This is 
> the path I take for my work at least: the incremental cost of a couple 
> million more records is worth the overall reduction in complexity.
> 
> - Alex
> 
> 
> On 09/09/2015 12:16 PM, Eduardo Dalcin wrote:
>> Hi Rod,
>> 
>> The real purpose is to have a list of UUIDs and the "source web page" for 
>> each dataset. Thus, one way to do it is to select those resources whose 
>> count is <> 0 for PLANTAE *AND* Brazil.
>> 
>> I don't want to do any statistical analysis, just feed a local harvester / 
>> aggregator.
>> 
>> The problem is that, considering the reply from Jan Legind on Sep 3, we have 
>> to check the datasets one by one (https://goo.gl/3wysaA) to see whether each 
>> is a Herbarium / Preserved Specimen (Plantae) dataset or not, starting from 
>> the request 
>> http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>> 
>> Does it make sense?
>> 
>> Thanks for your curiosity! :)
>> 
>> Cheers,
>> 
>> Eduardo
>> 
>> 
>> 
>> On Mon, Sep 7, 2015 at 12:33 PM, Roderic Page <Roderic.Page at glasgow.ac.uk> wrote:
>> Hi Eduardo,
>> 
>> I'm curious, is the purpose to get counts by dataset by country, or to get 
>> all the plant occurrences for Brazil? The latter can be obtained by 
>> downloading all plant occurrences in Brazil 
>> http://www.gbif.org/occurrence/search?TAXON_KEY=6&COUNTRY=BR (you could 
>> then compute the per-dataset stats locally). I realise that this isn't as 
>> convenient as having GBIF slice the data for you in the API.
>> 
>> Regards
>> 
>> Rod
>> 
>> ---------------------------------------------------------
>> Roderic Page
>> Professor of Taxonomy
>> Institute of Biodiversity, Animal Health and Comparative Medicine
>> College of Medical, Veterinary and Life Sciences
>> Graham Kerr Building
>> University of Glasgow
>> Glasgow G12 8QQ, UK
>> 
>> Email:  Roderic.Page at glasgow.ac.uk
>> Tel:  +44 141 330 4778
>> Skype:  rdmpage
>> Facebook:  http://www.facebook.com/rdmpage
>> LinkedIn:  http://uk.linkedin.com/in/rdmpage
>> Twitter:  http://twitter.com/rdmpage
>> Blog:  http://iphylo.blogspot.com
>> ORCID:  http://orcid.org/0000-0002-7101-9767
>> Citations:  http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
>> ResearchGate:  https://www.researchgate.net/profile/Roderic_Page
>> 
>> 
>>> On 4 Sep 2015, at 10:39, Eduardo Dalcin <edalcin at jbrj.org> wrote:
>>> 
>>> Hi Markus,
>>> 
>>> Yes, it's a shame I can't combine country and "nub". Is there any hope 
>>> of that being supported?
>>> 
>>> Eduardo
>>> 
>>> 
>>> 
>>> On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring <mdoering at gbif.org> wrote:
>>> Eduardo,
>>> 
>>> As you might have seen from my issue comment, the webservice uses a 
>>> different parameter name for taxonKey, which is a bug we need to fix at 
>>> some point.
>>> For now, please use nubKey instead, like this:
>>> 
>>> http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6
>>> 
>>> The real problem for you will be that we do not support the combination of 
>>> the country and the taxon filter, just one of the two. So you cannot search 
>>> for plants in Brazil I am afraid, just for datasets about Brazil and 
>>> datasets with plant records.
>>> 
>>> Markus
>>> 
>>> 
>>> 
>>> > On 03 Sep 2015, at 14:12, Eduardo Dalcin <edalcin at jbrj.org> wrote:
>>> >
>>> > Thanks Jan. I'll keep exploring and I'll be in touch, if I need.
>>> >
>>> > Best,
>>> >
>>> > Eduardo
>>> >
>>> >
>>> >
>>> >
>>> > On Thu, Sep 3, 2015 at 4:51 AM, Jan Legind [GBIF] <jlegind at gbif.org> wrote:
>>> > Dear Eduardo,
>>> >
>>> >
>>> >
>>> > Thanks for getting in touch with us about these issues.
>>> >
>>> >
>>> >
>>> > The first request 
>>> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>> > returns the number of records located in Brazil for the facets in the 
>>> > request.
>>> >
>>> > The second query 
>>> > http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>> > uses the Occurrence Inventories web service 
>>> > (http://www.gbif.org/developer/occurrence#inventories), which does not 
>>> > support the basis-of-record facet in the /datasets request. I understand 
>>> > that it would be better if the API response yielded an error message in 
>>> > this instance.
>>> >
>>> >
>>> >
>>> > Concerning the other issues: you are indeed right that the counts do not 
>>> > make sense in the context of taxon key 6, which is Plantae. Actually, the 
>>> > API does not handle the taxonKey search at all, contrary to what the 
>>> > documentation states:
>>> >
>>> >
>>> >
>>> > Resource URL: /occurrence/counts/datasets
>>> > Method: GET
>>> > Response: Counts
>>> > Description: Lists occurrence counts for datasets that cover a given 
>>> > taxon or country.
>>> > Parameters: country, taxonKey
>>> >
>>> >
>>> >
>>> > As you can see here, 
>>> > http://api.gbif.org/v1/occurrence/counts/datasets?taxonKey=6 , this 
>>> > request doesn't return anything.
>>> >
>>> >
>>> >
>>> > The GBIF developers will handle this issue in due time.
>>> >
>>> > You can follow the issue in our bug tracking service here: 
>>> > http://dev.gbif.org/issues/browse/POR-2828
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > With best regards,
>>> >
>>> >
>>> >
>>> > Jan K. Legind
>>> >
>>> > Data manager, GBIF Secretariat
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > From: API-users [mailto:api-users-bounces at lists.gbif.org] On Behalf Of Eduardo Dalcin
>>> > Sent: 2 September 2015 20:06
>>> > To: api-users at lists.gbif.org; dev at gbif.org
>>> > Cc: João Monnerat Lanna; Natália Queiroz; Diogo Silva; Laura; Ricardo 
>>> > Avancini
>>> > Subject: [API-users] Some questions from a begginer
>>> >
>>> >
>>> >
>>> > Hi folks,
>>> >
>>> >
>>> >
>>> > This is my first message to the list. So, please, be nice :)
>>> >
>>> >
>>> >
>>> > I'm working here at the Rio de Janeiro Botanical Garden, together with the 
>>> > guys at the National Center for Flora Conservation. We are doing the risk 
>>> > assessment of the Brazilian flora for the government. So far we have 
>>> > assessed the risk of ca. 6,000 species, but we still have to assess ca. 
>>> > 35,000. Access to occurrence records for Brazil is crucial, and every 
>>> > occurrence is important.
>>> >
>>> > That means that we have to put together occurrence data from different 
>>> > sources and, after the first batch of the risk assessment, we realized 
>>> > that we need to build our own aggregator. We are planning to do this with 
>>> > the Lontra-harvester, with the help of the guys at the Brazilian GBIF Node.
>>> >
>>> > So, one of the first steps was to list the available resources to 
>>> > understand the scale of the task, and that brings me to my questions.
>>> >
>>> >
>>> >
>>> > First:
>>> >
>>> > The request:
>>> >
>>> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>> >
>>> > returns 4,982,689 records.
>>> >
>>> > And the request:
>>> >
>>> > http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>> >
>>> > returns (here) 7,406,310 records.
>>> >
>>> > Comments?
>>> >
>>> >
>>> >
>>> > Second:
>>> >
>>> > The request:
>>> >
>>> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>> >
>>> > returns things like this:
>>> >
>>> > "197908d0-5565-11d8-b290-b8a03c50a862":27629
>>> >
>>> > But a search on the same dataset:
>>> >
>>> > http://www.gbif.org/occurrence/search?TAXON_KEY=6&DATASET_KEY=197908d0-5565-11d8-b290-b8a03c50a862
>>> >
>>> > returns "null" (of course, it is FishBase!).
>>> >
>>> > I have plenty of examples like this, highlighted in yellow here (not 
>>> > finished!):
>>> >
>>> > https://docs.google.com/spreadsheets/d/1msUjwMLoKwnXxJFzF20SeN_C65RIkGLbwaYyj459VTc/edit?usp=sharing
>>> >
>>> > Comments?
>>> >
>>> >
>>> >
>>> > I think those two questions are a good start. Please let me know if I'm 
>>> > doing something wrong.
>>> >
>>> >
>>> >
>>> > Cheers,
>>> >
>>> >
>>> >
>>> > Eduardo
>>> 
>>> 
>>> _______________________________________________
>>> API-users mailing list
>>> API-users at lists.gbif.org
>>> http://lists.gbif.org/mailman/listinfo/api-users