Project 4: Parsing rhinoceros sightings

In this project, I'm working for a wildlife conservation group that is tracking rhinos in the African savannah. My field workers' software resources and GIS expertise are limited, but I have managed to obtain an Excel spreadsheet <https://www.e-education.psu.edu/drupal6/files/geog485py/data/RhinoObservations.xlsx> showing the positions of several rhinos over time. Each record in the spreadsheet shows the latitude/longitude coordinate of a rhino along with the rhino's name (these rhinos are well known to my field workers).
I want to write a script that will turn the readings in the spreadsheet into a vector dataset that I can place on a map. This will be a polyline dataset showing the tracks the rhinos followed over the time the data was collected.

I will deliver: a Python script that reads the data from the spreadsheet and creates, from scratch, a polyline shapefile with *n* polylines, *n* being the number of rhinos in the spreadsheet. Each polyline should represent a rhino's track chronologically from the beginning of the spreadsheet to the end of the spreadsheet. Each polyline should also have a text attribute containing the rhino's name. The shapefile should use the WGS 1984 geographic coordinate system.

*Challenges*

- The data is in a format (XLSX) that I cannot easily parse. The first step is to manually open the file in Excel and save it in a comma-delimited format that I can easily read with a script, choosing the option *CSV (comma-delimited) (*.csv)*. I did this.
- The rhinos in the spreadsheet appear in no guaranteed order, and not all the rhinos appear at the beginning of the spreadsheet. As I parse each line, I must determine which rhino the reading belongs to and update that rhino's polyline track accordingly. *I am not allowed to sort the Rhino column in Excel before I export to the CSV file. My script must be "smart" enough to work with an unsorted spreadsheet in the order that the records appear.*
- I do not immediately know how many rhinos are in the file or even what their names are. Although I could visually comb the spreadsheet for this information and hard-code each rhino's name, my script is required to handle all the rhino names programmatically. The idea is that I should be able to run this script on a different file, possibly containing more rhinos, without having to make many manual adjustments.
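The grouping step described in the challenges can be tried on its own, without arcpy. What follows is only a minimal sketch under some assumptions: the column headings X, Y and Rhino, the Observer and Comments columns, the sample rows, and the function name group_readings are all hypothetical and not taken from the real spreadsheet. It uses the standard csv module and a dictionary keyed by rhino name, so no names are hard-coded and the rows do not need to be sorted.

```python
import csv
import io


def group_readings(csv_file, name_col="Rhino", lon_col="X", lat_col="Y"):
    """Group (lon, lat) readings by rhino name, keeping file order per rhino."""
    tracks = {}
    for row in csv.DictReader(csv_file):
        name = row[name_col].strip()
        point = (float(row[lon_col]), float(row[lat_col]))
        # setdefault creates the empty track the first time a name is seen
        tracks.setdefault(name, []).append(point)
    return tracks


# Hypothetical sample data (assumed layout, not the real spreadsheet)
sample = io.StringIO(
    "Observer,X,Y,Rhino,Comments\n"
    "A,26.10,-23.20,Tulip,\n"
    "A,26.30,-23.40,Dinky,\n"
    "B,26.11,-23.21,Tulip,\n"
)
tracks = group_readings(sample)
for name, points in tracks.items():
    print(name, points)
```

Each dictionary value is then the ordered list of vertices for one polyline, so the number of keys gives *n* without inspecting the file by hand.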
Sample of my code:

import arcpy

shapefile = "C:\\...shp"
pointFilePath = "C:\\...csv"

def addVertex(lat, lon, array):
    # Append one point to the array; X is longitude, Y is latitude
    vertex = arcpy.CreateObject("Point")
    vertex.X = lon
    vertex.Y = lat
    array.add(vertex)

def addPolyline(cursor, array):
    # Write the array as one polyline feature, then empty it for reuse
    feature = cursor.newRow()
    feature.shape = array
    cursor.insertRow(feature)
    array.removeAll()

def addReading(rhinoName, lat, lon, dictionary):
    # Start a new track the first time a rhino appears, then append
    if rhinoName not in dictionary:
        dictionary[rhinoName] = []
    dictionary[rhinoName].append([lat, lon])

# Read the header line to find the columns I need
# (the "Rhino" heading is assumed from the column name in the spreadsheet)
pointFile = open(pointFilePath, "r")
headerList = pointFile.readline().strip().split(",")
lonValueIndex = headerList.index("X")
latValueIndex = headerList.index("Y")
rhinoNameIndex = headerList.index("Rhino")

# Group the readings by rhino, in the order they appear in the file
rhinoDictionary = {}
for line in pointFile.readlines():
    if not line.strip():
        continue  # skip blank lines
    segmentedLine = line.strip().split(",")
    latValue = float(segmentedLine[latValueIndex])
    lonValue = float(segmentedLine[lonValueIndex])
    addReading(segmentedLine[rhinoNameIndex], latValue, lonValue, rhinoDictionary)
pointFile.close()

# Write one polyline per rhino; the shapefile must already exist,
# and the name attribute is not written yet
cursor = arcpy.InsertCursor(shapefile)
vertexArray = arcpy.CreateObject("Array")
for rhinoName in rhinoDictionary:
    for latValue, lonValue in rhinoDictionary[rhinoName]:
        addVertex(latValue, lonValue, vertexArray)
    addPolyline(cursor, vertexArray)
del cursor
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor