Javier,

Thank you for your suggestion. I may come back to you if I need help, OK?
Thanks also for the offer!
Eduardo

--------------------------------
Eduardo Dalcin <http://eduardo.dalc.in>
Instituto de Pesquisas Jardim Botânico do Rio de Janeiro - JBRJ
e-mail: edalcin at jbrj.gov.br
Trabalho / Work: +55 21 3204 2116
--------------------------------
e-mail alternativo / alternate email: edalcin at jbrj.org
--------------------------------
Agendar reunião / Schedule a meeting: http://agendar.dalc.in

On Wed, Sep 9, 2015 at 5:23 PM, Javier Otegui <javier.otegui at gmail.com> wrote:

> Hi Eduardo (et al.),
>
> If I understand correctly, the list at https://goo.gl/3wysaA shows the
> resources with data from Brazil and you want to filter out those with
> records other than Plants, am I right? Have you considered using OpenRefine
> (http://openrefine.org/) for this task? OpenRefine has a service to fetch
> URLs built from data in other columns, which plays very well with the
> GBIF APIs. You can make the program dynamically build the API request URL
> based on the dataset UUID, and fetch and parse the JSON response, without
> having to download the data and with almost no coding. The way I would go
> here is:
>
>    1. Create a column based on the value in column A of your table,
>    to extract just the dataset UUID
>    2. Create a new column fetching the GBIF API, adding the value in the
>    previous column to a template URL:
>    http://api.gbif.org/v1/occurrence/search?TAXON_KEY=6&limit=1&DATASET_KEY=<value>
>    The "limit=1" part makes things faster by avoiding having to
>    show the default 20 records in the column
>    3. Create yet another column parsing the JSON result from the previous
>    column, extracting just the value in the field "count". The result is the
>    number of plant records in that dataset (therefore, resources such as
>    FishBase will have a value of zero)
>
> Actually, you can add as many columns as you want, with as many API calls,
> to fill the rest of the fields in your table. Using the "registry" API, you
> can get the title, external data link and the protocol (IPT, DiGIR...).
>
> Hope this helps. Let me know if you are interested in this approach and
> need more help using OpenRefine.
> Cheers!
>
> Javier Otegui
> http://www.jotegui.com
>
> On Wed, Sep 9, 2015 at 8:07 PM, Mauro Cavalcanti <maurobio at gmail.com> wrote:
>
>> Scott,
>>
>> That's my very point - that using R and rgbif should be the best path to
>> take in this case, both because of the easier access to the GBIF API
>> provided by rgbif and the HUGE data analytical capabilities of R itself. I
>> had been working on a paper discussing this in the context of conservation
>> databases (using R/rgbif and a Red-Listed group of mammals as an example),
>> but unfortunately this work has been delayed by unexpected health problems.
>> Hope it can see the light someday, however.
>>
>> Best regards,
>> On 09/09/2015 14:44, "Scott Chamberlain" <scott at ropensci.org> wrote:
>>
>>> Note that the R client rgbif does interface with the GBIF download API
>>> in addition to the search API - making it easier to deal with larger
>>> datasets. This works even if you downloaded bulk data from the GBIF GUI.
>>> Ignore this if you don't use R :)
>>>
>>> Best, S
>>>
>>> On Wed, Sep 9, 2015 at 10:35 AM Alex Thompson <godfoder at acis.ufl.edu> wrote:
>>>
>>>> I'm kind of seconding Rod here.
>>>>
>>>> It might make more sense, depending on your use case and local computer
>>>> resources, to just get a download of Plantae *AND* Brazil from GBIF
>>>> periodically, then process that to exclude existing Brazilian datasets.
>>>> You could then use something like Apache Hadoop / Spark to efficiently
>>>> split the file by dataset or by institution code.
>>>>
>>>> This would greatly simplify your interactions with GBIF (down to just
>>>> periodically generating a download programmatically) and you would have an
>>>> easy place to insert any additional data transformations you want. This is
>>>> the path I take for my work at least - the incremental cost of a couple
>>>> million more records is worth the reduction in complexity overall.
>>>>
>>>> - Alex
>>>>
>>>> On 09/09/2015 12:16 PM, Eduardo Dalcin wrote:
>>>>
>>>> Hi Rod,
>>>>
>>>> The real purpose is to have a list of UUIDs and the "source web page"
>>>> for each data set. Thus, one way to do it is to select those resources
>>>> whose count is <> 0 for PLANTAE *AND* Brazil.
>>>>
>>>> I don't want to do any stats analysis, just feed a local harvester /
>>>> aggregator.
>>>>
>>>> The problem is that, considering the reply from Jan Legind on Sep 3, we
>>>> have to check the datasets one by one (https://goo.gl/3wysaA) to see
>>>> whether each is a Herbarium / Preserved Specimen (Plantae) dataset or
>>>> not, starting from the request
>>>> http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>>
>>>> Does it make sense?
>>>>
>>>> Thanks for your curiosity! :)
>>>>
>>>> Cheers,
>>>>
>>>> Eduardo
>>>>
>>>> On Mon, Sep 7, 2015 at 12:33 PM, Roderic Page <Roderic.Page at glasgow.ac.uk> wrote:
>>>>
>>>>> Hi Eduardo,
>>>>>
>>>>> I'm curious, is the purpose to get counts by dataset by country, or to
>>>>> get all the plant occurrences for Brazil? The latter can be obtained by
>>>>> downloading all plant occurrences in Brazil
>>>>> http://www.gbif.org/occurrence/search?TAXON_KEY=6&COUNTRY=BR (you
>>>>> could then compute the per-dataset stats locally). I realise that this
>>>>> isn't as convenient as having GBIF slice the data for you in the API.
>>>>>
>>>>> Regards
>>>>>
>>>>> Rod
>>>>>
>>>>> ---------------------------------------------------------
>>>>> Roderic Page
>>>>> Professor of Taxonomy
>>>>> Institute of Biodiversity, Animal Health and Comparative Medicine
>>>>> College of Medical, Veterinary and Life Sciences
>>>>> Graham Kerr Building
>>>>> University of Glasgow
>>>>> Glasgow G12 8QQ, UK
>>>>>
>>>>> Email: Roderic.Page at glasgow.ac.uk
>>>>> Tel: +44 141 330 4778
>>>>> Skype: rdmpage
>>>>> Facebook: http://www.facebook.com/rdmpage
>>>>> LinkedIn: http://uk.linkedin.com/in/rdmpage
>>>>> Twitter: http://twitter.com/rdmpage
>>>>> Blog: http://iphylo.blogspot.com
>>>>> ORCID: http://orcid.org/0000-0002-7101-9767
>>>>> Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
>>>>> ResearchGate: https://www.researchgate.net/profile/Roderic_Page
>>>>>
>>>>> On 4 Sep 2015, at 10:39, Eduardo Dalcin <edalcin at jbrj.org> wrote:
>>>>>
>>>>> Hi Markus,
>>>>>
>>>>> Yes, it's a shame I can't have country and "nub" together. Is there
>>>>> any hope for it?
>>>>>
>>>>> Eduardo
>>>>>
>>>>> On Thu, Sep 3, 2015 at 4:29 PM, Markus Döring <mdoering at gbif.org> wrote:
>>>>>
>>>>>> Eduardo,
>>>>>>
>>>>>> as you might have seen from my issue comment, the webservice uses a
>>>>>> different parameter name for taxonKey, which is a bug we need to fix
>>>>>> at some point.
>>>>>> Please use nubKey for now to use the service, like this:
>>>>>>
>>>>>> http://api.gbif.org/v1/occurrence/counts/datasets?nubKey=6
>>>>>>
>>>>>> The real problem for you will be that we do not support the
>>>>>> combination of the country and the taxon filter, just one of the two. So
>>>>>> you cannot search for plants in Brazil I am afraid, just for datasets
>>>>>> about Brazil and datasets with plant records.
>>>>>>
>>>>>> Markus
>>>>>>
>>>>>> > On 03 Sep 2015, at 14:12, Eduardo Dalcin <edalcin at jbrj.org> wrote:
>>>>>> >
>>>>>> > Thanks Jan. I'll keep exploring and I'll be in touch if I need to.
>>>>>> >
>>>>>> > Best,
>>>>>> >
>>>>>> > Eduardo
>>>>>> >
>>>>>> > On Thu, Sep 3, 2015 at 4:51 AM, Jan Legind [GBIF] <jlegind at gbif.org> wrote:
>>>>>> > Dear Eduardo,
>>>>>> >
>>>>>> > Thanks for getting in touch with us about these issues.
>>>>>> >
>>>>>> > The first request
>>>>>> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>>>> > returns the number of records located in Brazil for the facets in the
>>>>>> > request.
>>>>>> >
>>>>>> > The second query
>>>>>> > http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>>>> > uses the Occurrence Inventories web service
>>>>>> > http://www.gbif.org/developer/occurrence#inventories which does not
>>>>>> > support the basis-of-record facet in the /datasets request. I understand
>>>>>> > that it would be better if the API response yielded an error message in
>>>>>> > this instance.
>>>>>> >
>>>>>> > Concerning the other issues: you are indeed right that the counts do
>>>>>> > not make sense in the context of taxon key 6, which is Plantae.
>>>>>> > Actually the API does not handle the taxonKey search at all, contrary to
>>>>>> > what the documentation states:
>>>>>> >
>>>>>> >   /occurrence/counts/datasets
>>>>>> >   GET
>>>>>> >   Counts
>>>>>> >   Lists occurrence counts for datasets that cover a given taxon or country.
>>>>>> >   country, taxonKey
>>>>>> >
>>>>>> > As you can see here,
>>>>>> > http://api.gbif.org/v1/occurrence/counts/datasets?taxonKey=6 , this
>>>>>> > request doesn't return anything.
>>>>>> >
>>>>>> > The GBIF developers will handle this issue in due time.
>>>>>> >
>>>>>> > You can follow the issue in our bug tracking service here:
>>>>>> > http://dev.gbif.org/issues/browse/POR-2828
>>>>>> >
>>>>>> > With best regards,
>>>>>> >
>>>>>> > Jan K. Legind
>>>>>> >
>>>>>> > Data manager, GBIF Secretariat
>>>>>> >
>>>>>> > From: API-users [mailto:api-users-bounces at lists.gbif.org] On Behalf Of Eduardo Dalcin
>>>>>> > Sent: 2 September 2015 20:06
>>>>>> > To: api-users at lists.gbif.org; dev at gbif.org
>>>>>> > Cc: João Monnerat Lanna; Natália Queiroz; Diogo Silva; Laura; Ricardo Avancini
>>>>>> > Subject: [API-users] Some questions from a begginer
>>>>>> >
>>>>>> > Hi folks,
>>>>>> >
>>>>>> > This is my first message to the list. So, please, be nice :)
>>>>>> >
>>>>>> > I'm working here at Rio de Janeiro Botanical Garden, together with
>>>>>> > the guys at the National Center for Flora Conservation. We are doing the
>>>>>> > risk assessment of the Brazilian flora for the government. We have
>>>>>> > assessed, so far, the risk of ca. 6,000 species, but we still have to
>>>>>> > assess ca. 35,000. Access to occurrence records for Brazil is crucial,
>>>>>> > and every occurrence is important.
>>>>>> >
>>>>>> > That means that we have to put together occurrence data from
>>>>>> > different sources and, after the first batch of the risk assessment, we
>>>>>> > realized that we need to build our own aggregator. We are planning to do
>>>>>> > this with the Lontra-harvester, with the help of the guys at the
>>>>>> > Brazilian GBIF Node.
>>>>>> >
>>>>>> > So, one of the first steps was to list the available resources
>>>>>> > to understand the dimension of the task, and that brings me to my
>>>>>> > questions.
>>>>>> >
>>>>>> > First:
>>>>>> >
>>>>>> > The request:
>>>>>> >
>>>>>> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>>>> >
>>>>>> > returns 4,982,689 records
>>>>>> >
>>>>>> > And the request:
>>>>>> >
>>>>>> > http://api.gbif.org/v1/occurrence/counts/datasets?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>>>> >
>>>>>> > returns (here) 7,406,310 records
>>>>>> >
>>>>>> > Comments?
>>>>>> >
>>>>>> > Second:
>>>>>> >
>>>>>> > The request:
>>>>>> >
>>>>>> > http://api.gbif.org/v1/occurrence/count?country=BR&taxonKey=6&basisOfRecord=PRESERVED_SPECIMEN
>>>>>> >
>>>>>> > returns things like this:
>>>>>> >
>>>>>> > "197908d0-5565-11d8-b290-b8a03c50a862":27629
>>>>>> >
>>>>>> > But querying the same dataset:
>>>>>> >
>>>>>> > http://www.gbif.org/occurrence/search?TAXON_KEY=6&DATASET_KEY=197908d0-5565-11d8-b290-b8a03c50a862
>>>>>> >
>>>>>> > returns "null" (of course, it is FishBase!)
>>>>>> >
>>>>>> > I have plenty of examples like this, highlighted in yellow here (not
>>>>>> > finished!):
>>>>>> >
>>>>>> > https://docs.google.com/spreadsheets/d/1msUjwMLoKwnXxJFzF20SeN_C65RIkGLbwaYyj459VTc/edit?usp=sharing
>>>>>> >
>>>>>> > Comments?
>>>>>> >
>>>>>> > I think those two questions are a good start. Please let me know if
>>>>>> > I'm doing something wrong.
>>>>>> >
>>>>>> > Cheers,
>>>>>> >
>>>>>> > Eduardo
>>>>>> >
>>>>> _______________________________________________
>>>>> API-users mailing list
>>>>> API-users at lists.gbif.org
>>>>> http://lists.gbif.org/mailman/listinfo/api-users
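For anyone who would rather script it than use OpenRefine, the three steps Javier describes (extract the dataset UUID, build the request URL from his template, read the "count" field from the JSON response) could be sketched in Python roughly as below. This is only a sketch under assumptions: the function names are made up for illustration, the cell format (a UUID appearing somewhere inside a URL or string) is a guess about column A of the spreadsheet, and the upper-case parameter spelling is copied from the template URL in the message, so it may need adjusting to whatever the API actually expects.

```python
import json
import re
import urllib.parse
import urllib.request

# Endpoint taken from the template URL quoted in the thread.
SEARCH_URL = "http://api.gbif.org/v1/occurrence/search"

# Standard 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def extract_uuid(cell):
    """Step 1: pull the dataset UUID out of a cell (assumed to contain one)."""
    match = UUID_RE.search(cell)
    return match.group(0).lower() if match else None


def build_url(dataset_uuid, taxon_key=6):
    """Step 2: fill the template URL; limit=1 keeps the response small."""
    query = urllib.parse.urlencode(
        # Parameter names spelled as in the message's template (assumption).
        {"TAXON_KEY": taxon_key, "limit": 1, "DATASET_KEY": dataset_uuid}
    )
    return f"{SEARCH_URL}?{query}"


def parse_count(json_text):
    """Step 3: read the top-level "count" field from the JSON response."""
    return json.loads(json_text)["count"]


def plant_count(cell):
    """Run all three steps for one cell; performs a network request."""
    dataset_uuid = extract_uuid(cell)
    if dataset_uuid is None:
        return None
    with urllib.request.urlopen(build_url(dataset_uuid), timeout=30) as resp:
        return parse_count(resp.read().decode("utf-8"))
```

Calling `plant_count(cell)` for each row of the spreadsheet and keeping the rows where the result is greater than zero would give the list of datasets with plant records; nothing here combines the taxon filter with the country filter, so the limitation Markus describes for the inventory service is sidestepped rather than solved.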
