Hi Nils, Thank you for sharing!
What is Phoenix about? Does it connect to the ESGF network? It's the first time I've read about it. Looks very, very interesting!

Thanks everybody for this valuable feedback.

Best wishes
Juan

On 01/06/16 10:09, Nils Hempelmann wrote:
> Hi Juan et al
>
> Thanks a lot for triggering this discussion.
> I am currently working on a Web Processing Service
> (http://birdhouse.readthedocs.io/en/latest/) including a species
> distribution model based on GBIF data (and climate model data). A
> good connection to the GBIF database is still missing, so all hints were
> quite useful!!
>
> If you want to share code:
> https://github.com/bird-house/flyingpigeon/blob/master/flyingpigeon/processes/wps_sdm.py
>
> Merci
> Nils
>
> On 31/05/2016 22:08, Juan M. Escamilla Molgora wrote:
>>
>> Hi Tim,
>>
>> Thank you, especially for the DwC-A hint.
>>
>> The cells are by default in decimal degrees (WGS84), but the
>> functions for generating them are general enough to use any
>> projection supported by GDAL via PostGIS. It could be done on the
>> fly or stored on the server side.
>>
>> I was thinking (daydreaming) of a standard way of encoding unique but
>> universal grids (similar to Geohash or Open Location Code), but
>> didn't find anything fast and ready. Maybe later :)
>>
>> I only use open source software: Python, Django, GDAL, NumPy,
>> PostGIS, Conda, Py2neo and ete2, among others.
>>
>> Currently I don't have an official release and the project is quite
>> immature and unstable, and the installation could be non-trivial.
>> I'm fixing all these issues but it will take some time; sorry for this.
>>
>> The GitHub repository is:
>>
>> https://github.com/molgor/biospytial.git
>>
>> And there's some very old documentation here:
>>
>> http://test.holobio.me/modules/gbif_taxonomy_class.html
>>
>> Please feel free to follow!
>>
>> Best wishes
>>
>> Juan
>>
>> P.s.
>> The functions for generating the grid are in:
>> biospytial/SQL_functions
>>
>> On 31/05/16 19:47, Tim Robertson wrote:
>>> Thanks Juan
>>>
>>> You're quite right - you need the DwC-A download format to get those
>>> IDs.
>>>
>>> Are the cells decimal degrees, and then partitioned into smaller
>>> units, or equal-area cells, or maybe UTM grids, or something else
>>> perhaps? I am just curious.
>>>
>>> Are you developing this as OSS? I'd like to follow progress if possible.
>>>
>>> Thanks,
>>> Tim
>>>
>>> On 31 May 2016, at 20:31, Juan M. Escamilla Molgora
>>> <j.escamillamolgora at lancaster.ac.uk> wrote:
>>>
>>>> Hi Tim,
>>>>
>>>> The grid is made by selecting a square area and dividing it into n x n
>>>> subsquares, which form a partition of the bigger square.
>>>>
>>>> Each grid is a table in PostGIS and there's a mapping between this
>>>> table and a Django model (class).
>>>>
>>>> The class constructor has the attributes: id, cell and neighbours
>>>> (next release).
>>>>
>>>> The cell is a polygon (square) and, through GeoDjango, it inherits
>>>> the properties of the osgeo module for polygons.
>>>>
>>>> I've tried to use the CSV data (downloaded as a CSV request), but I
>>>> couldn't find a way to obtain the global IDs for each taxonomic
>>>> level (idspecies, idgenus, idfamily, etc.).
>>>>
>>>> Do you know a way of obtaining these fields?
>>>>
>>>> Thank you for your email and best wishes,
>>>>
>>>> Juan
>>>>
>>>> On 31/05/16 19:03, Tim Robertson wrote:
>>>>> Hi Juan
>>>>>
>>>>> That sounds like a fun project!
>>>>>
>>>>> Can you please describe your grid / cells?
>>>>>
>>>>> Most likely your best bet will be to use the download API (as CSV
>>>>> data) and ingest that. The other APIs will likely hit limits (e.g.
>>>>> you can't page through indefinitely).
>>>>>
>>>>> Thanks,
>>>>> Tim
>>>>>
>>>>> On 31 May 2016, at 18:55, Juan M.
>>>>> Escamilla Molgora
>>>>> <j.escamillamolgora at lancaster.ac.uk> wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> Thank you very much for your valuable feedback!
>>>>>>
>>>>>> I'll explain a bit of what I'm doing, just to clarify; sorry if
>>>>>> this is spam to some.
>>>>>>
>>>>>> I want to build a model for species assemblages based on
>>>>>> co-occurrence of taxa within an arbitrary area. I'm building a 2D
>>>>>> lattice in which, for each cell, I'm collapsing the data (the
>>>>>> occurrences) into a taxonomic tree. To do this I first need to
>>>>>> obtain the data from the GBIF API and later, based on the IDs (or
>>>>>> names) of each taxonomic level (from kingdom to occurrence), build
>>>>>> a tree coupled to each cell.
>>>>>>
>>>>>> The implementation is done with PostgreSQL (PostGIS) for storing
>>>>>> the raw GBIF data and Neo4j for storing the relation
>>>>>> "is a member of the [species, genus, family, ...] [name/id]". The
>>>>>> idea is to include data from different sources, similar to the
>>>>>> project Matthew and Jennifer mentioned (which I'm very interested
>>>>>> in and would like to hear more about), and traverse the network
>>>>>> looking for significant merged information.
>>>>>>
>>>>>> One of the immediate problems I've found is importing big chunks
>>>>>> of the GBIF data into my specification. Thanks to this thread
>>>>>> I've found the tools most used by the community
>>>>>> (pygbif, rgbif, and python-dwca-reader). I was using urllib2 and
>>>>>> things like that.
>>>>>>
>>>>>> I'll be happy to share any code or ideas with anyone interested.
>>>>>>
>>>>>> Btw, I've checked the TinkerPop project, which uses the Gremlin
>>>>>> traversal language independently of the DBMS.
>>>>>>
>>>>>> Perhaps it's possible to use it with Spark and GUODA as well?
>>>>>>
>>>>>> Is GUODA working now?
>>>>>>
>>>>>> Best wishes
>>>>>>
>>>>>> Juan.
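The per-cell "collapse occurrences into a taxonomic tree" step Juan describes can be sketched in plain Python. This is a minimal sketch only: the rank field names follow GBIF's `*Key` convention for higher-taxon identifiers, and any sample records passed in are hypothetical, not real GBIF data.

```python
from collections import defaultdict

# Higher-taxon key fields as they appear in GBIF occurrence records
# (the IDs Juan was missing in the plain CSV download; the DwC-A
# download format carries them).
RANKS = ["kingdomKey", "phylumKey", "classKey", "orderKey",
         "familyKey", "genusKey", "speciesKey"]

def tree():
    """A recursively nested defaultdict, used as a taxonomic tree."""
    return defaultdict(tree)

def collapse_cell(occurrences):
    """Collapse one grid cell's occurrences into a nested taxonomic tree.

    Each kingdom -> ... -> species path is inserted once; `counts`
    records how many occurrences fell on each species key.
    """
    root = tree()
    counts = defaultdict(int)
    for occ in occurrences:
        node = root
        for rank in RANKS:
            key = occ.get(rank)
            if key is None:      # rank missing in this record: stop here
                break
            node = node[key]
        species = occ.get("speciesKey")
        if species is not None:
            counts[species] += 1
    return root, counts
```

Two occurrences sharing every rank down to genus but differing in species end up as two leaves under the same genus node, which is exactly the co-occurrence structure the Neo4j "is a member of" relation encodes.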
>>>>>>
>>>>>> On 31/05/16 17:02, Collins, Matthew wrote:
>>>>>>> Jorrit pointed out this thread to us at iDigBio. Downloading and
>>>>>>> importing data into a relational database will work great,
>>>>>>> especially if, as Jan said, you can cut the data size down to a
>>>>>>> reasonable amount.
>>>>>>>
>>>>>>> Another approach we've been working on, in a collaboration called
>>>>>>> GUODA [1], is to build an Apache Spark environment with
>>>>>>> pre-formatted data frames containing common data sets for
>>>>>>> researchers to use. This approach would offer a remote service
>>>>>>> where you could write arbitrary Spark code, probably in Jupyter
>>>>>>> notebooks, to iterate over data. Spark does a lot of cool stuff,
>>>>>>> including GraphX, which might be of interest. This is definitely
>>>>>>> pre-alpha at this point, and if anyone is interested I'd like to
>>>>>>> hear your thoughts. I'll also be at SPNHC talking about this.
>>>>>>>
>>>>>>> One thing we've found in working on this is that importing data
>>>>>>> into a structured data format isn't always easy. If you only
>>>>>>> want a few columns, it'll be fine. But getting the data typing,
>>>>>>> format standardization, and column name syntax of the whole
>>>>>>> width of an iDigBio record right requires some code. I looked to
>>>>>>> see whether EcoData Retriever [2] had a GBIF data source; they
>>>>>>> have an eBird one that you might find useful as a starting point
>>>>>>> if you wanted to use someone else's code to download and import
>>>>>>> data.
>>>>>>>
>>>>>>> For other data structures like BHL, we're kind of making stuff
>>>>>>> up, since we're packaging a relational structure and not
>>>>>>> something nearly as flat as the GBIF and DwC stuff.
>>>>>>>
>>>>>>> [1] http://guoda.bio/
>>>>>>>
>>>>>>> [2] http://www.ecodataretriever.org/
>>>>>>>
>>>>>>> Matthew Collins
>>>>>>> Technical Operations Manager
>>>>>>> Advanced Computing and Information Systems Lab, ECE
>>>>>>> University of Florida
>>>>>>> 352-392-5414
>>>>>>> ------------------------------------------------------------------------
>>>>>>> *From:* jorrit poelen <jhpoelen at xs4all.nl>
>>>>>>> *Sent:* Monday, May 30, 2016 11:16 AM
>>>>>>> *To:* Collins, Matthew; Thompson, Alexander M; Hammock, Jennifer
>>>>>>> *Subject:* Fwd: [API-users] Is there any NEO4J or graph-based
>>>>>>> driver for this API ?
>>>>>>>
>>>>>>> Hey y'all:
>>>>>>>
>>>>>>> Interesting request below on the GBIF mailing list - sounds like
>>>>>>> a perfect fit for the GUODA use cases.
>>>>>>>
>>>>>>> Would it be too early to jump onto this thread and share our
>>>>>>> efforts/vision?
>>>>>>>
>>>>>>> thx,
>>>>>>> -jorrit
>>>>>>>
>>>>>>>> Begin forwarded message:
>>>>>>>>
>>>>>>>> *From:* Jan Legind <jlegind at gbif.org>
>>>>>>>> *Subject:* Re: [API-users] Is there any NEO4J or graph-based
>>>>>>>> driver for this API ?
>>>>>>>> *Date:* May 30, 2016 at 5:48:51 AM PDT
>>>>>>>> *To:* Mauro Cavalcanti <maurobio at gmail.com>, "Juan M. Escamilla
>>>>>>>> Molgora" <j.escamillamolgora at lancaster.ac.uk>
>>>>>>>> *Cc:* "api-users at lists.gbif.org" <api-users at lists.gbif.org>
>>>>>>>>
>>>>>>>> Dear Juan,
>>>>>>>> Unfortunately we have no tool for making this kind of SQL-like
>>>>>>>> query against the portal. I am sure you are aware that the
>>>>>>>> filters in the occurrence search pages can be applied in
>>>>>>>> combination in numerous ways. The API can go even further in
>>>>>>>> this regard [1], but it is not well suited for retrieving
>>>>>>>> occurrence records, since there is a 200,000-record ceiling
>>>>>>>> making it unfit for species exceeding this number.
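The ceiling Jan mentions applies to the paged occurrence search endpoint [1], which is walked with `limit`/`offset` parameters. A paging loop can be sketched as below; `fetch` is a stand-in for a real client call (e.g. pygbif's `occurrences.search`) and is assumed to return a dict with `"results"` and `"endOfRecords"` keys, as the GBIF search API does.

```python
HARD_CEILING = 200_000  # ceiling imposed by the GBIF portal on search paging

def page_all(fetch, page_size=300):
    """Yield occurrence records page by page until the result set is
    exhausted or the 200,000-record ceiling is reached.

    `fetch(limit=..., offset=...)` must return a dict shaped like a
    GBIF occurrence-search response: {"results": [...], "endOfRecords": bool}.
    """
    offset = 0
    while offset < HARD_CEILING:
        resp = fetch(limit=page_size, offset=offset)
        for rec in resp["results"]:
            yield rec
        if resp.get("endOfRecords", True):
            break
        offset += page_size
```

For anything that might exceed the ceiling, the download API (or the programmatic downloads Jan describes next) is the right tool instead of paging.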
>>>>>>>> There are going to be updates to the pygbif package [2] in the
>>>>>>>> near future that will enable you to launch user downloads
>>>>>>>> programmatically, where a whole list of different species can be
>>>>>>>> used as a query parameter, as well as adding polygons [3].
>>>>>>>> In the meantime, Mauro's suggestion is excellent. If you can
>>>>>>>> narrow your search down until it returns a manageable download
>>>>>>>> (say, less than 100 million records), importing it into a
>>>>>>>> database should be doable. From there, you can refine using SQL
>>>>>>>> queries.
>>>>>>>> Best,
>>>>>>>> Jan K. Legind, GBIF Data Manager
>>>>>>>> [1] http://www.gbif.org/developer/occurrence#search
>>>>>>>> [2] https://github.com/sckott/pygbif
>>>>>>>> [3] https://github.com/jlegind/GBIF-downloads
>>>>>>>>
>>>>>>>> *From:* API-users [mailto:api-users-bounces at lists.gbif.org]
>>>>>>>> *On Behalf Of* Mauro Cavalcanti
>>>>>>>> *Sent:* 30. maj 2016 14:06
>>>>>>>> *To:* Juan M. Escamilla Molgora
>>>>>>>> *Cc:* api-users at lists.gbif.org
>>>>>>>> *Subject:* Re: [API-users] Is there any NEO4J or graph-based
>>>>>>>> driver for this API ?
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> One solution I have successfully adopted for this is to
>>>>>>>> download the records (either "manually" via a browser or, better
>>>>>>>> yet, with a Python script using the fine pygbif library),
>>>>>>>> store them in a MySQL or SQLite database and then perform
>>>>>>>> the relational queries. I can provide examples if you are
>>>>>>>> interested.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> 2016-05-30 8:59 GMT-03:00 Juan M. Escamilla Molgora
>>>>>>>> <j.escamillamolgora at lancaster.ac.uk>:
>>>>>>>> Hola,
>>>>>>>>
>>>>>>>> Is there any API for making relational queries on taxonomy,
>>>>>>>> location or timestamp?
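Mauro's download-then-query workflow can be sketched with the Python standard library alone. The column subset follows Darwin Core term names as they appear in GBIF CSV downloads, but the rows here are made-up illustrations, not real GBIF records, and a real download carries many more columns.

```python
import csv
import io
import sqlite3

# Hypothetical GBIF-style CSV download (illustrative rows only).
CSV_DATA = """taxonkey,species,decimallatitude,decimallongitude,year
111,Turdus merula,54.0,-2.8,2015
111,Turdus merula,54.1,-2.7,2016
222,Parus major,53.9,-2.9,2016
"""

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE occurrence (
    taxonkey INTEGER, species TEXT,
    decimallatitude REAL, decimallongitude REAL, year INTEGER)""")

# Load the CSV; SQLite's type affinity converts the numeric strings.
rows = csv.DictReader(io.StringIO(CSV_DATA))
conn.executemany(
    "INSERT INTO occurrence VALUES (:taxonkey, :species, "
    ":decimallatitude, :decimallongitude, :year)", rows)

# A relational query of the kind Juan asked about: records per species
# within a latitude band and a year range.
hits = conn.execute("""
    SELECT species, COUNT(*) FROM occurrence
    WHERE decimallatitude BETWEEN 53.5 AND 54.5
      AND year >= 2016
    GROUP BY species ORDER BY species""").fetchall()
```

The same schema scales to a full download by streaming the CSV file in chunks through `executemany`, after which taxonomy, location and time filters are ordinary SQL.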
>>>>>>>>
>>>>>>>> Thank you and best wishes
>>>>>>>>
>>>>>>>> Juan
>>>>>>>> _______________________________________________
>>>>>>>> API-users mailing list
>>>>>>>> API-users at lists.gbif.org
>>>>>>>> http://lists.gbif.org/mailman/listinfo/api-users
>>>>>>>>
>>>>>>>> --
>>>>>>>> Dr. Mauro J. Cavalcanti
>>>>>>>> E-mail: maurobio at gmail.com
>>>>>>>> Web: http://sites.google.com/site/maurobio
