Greetings Benjamin,

To begin: I do not know R.

: I'm trying to improve my python by translating R code that I
: wrote into Python.
:
: *All I am trying to do is take in a specific column in
: "uncurated" and write that whole column as output to "curated."
: It should be a pretty basic command, I'm just not clear on how to
: execute it.*

The hardest part about translation is learning how to think in a
different language. If you know any other human languages, you
probably know that you can say things in some languages that do not
translate particularly well (other than by circumlocution) into
another language. Why am I starting with this? I am starting here
because you seem quite comfortable with thinking and operating in R,
but you don't seem as comfortable yet with thinking and operating in
Python.

Naturally, that's why you are asking the Tutor list about this, so
welcome to the right place! Let's see if we can get you some help.

: As background, GSEXXXXX_full_pdata.csv has different patient
: information (such as unique patient ID's, whether the tissue used
: was tumor or normal, and other things. I'll just use the first
: two characteristics for now). Template.csv is a template we built
: that allows us to take different datasets and standardize them
: for meta-analysis. So for example, "curated$alt_sample_name"
: refers to the unique patient ID, and "curated$sample_type" refers
: to the type of tissue used.

Following your description, I have fabricated some data that looks
like this:

  patientID,title,sample_type
  V6IF0OqVu,0.5788,70
  GXj51ljB2,0.3449,88

You doubtless have more columns, and the data here are probably
nothing like yours, but consider them useful for illustrative
purposes only. (Illustrating porpoises! How did they get here? Next
thing you know we will have illuminating egrets and animating
dromedaries!)

: I've been reading about the python csv module and realized it was
: best to get some expert input to clarify some confusion on my
: part.

The csv module is very useful and quite powerful for reading data in
different ways and iterating over data sets. Supposing you know the
index of the column of interest to you... well, this is quite
trivial:

  import csv

  def main(f,field):
      for row in csv.reader(f):
          print(row[0],row[field])

  # -- lists/tuples are zero-based [0,1,2], so 2 is the third column
  #
  main(open('GSEXXXXX_full_pdata.csv'),2)

OK, but if your data files have a different number or ordering of
columns, then this can become a bit fragile. So maybe you would want
to learn how to use the csv.DictReader, which will give you the same
thing but uses the first (header) line to name the columns, so then
you could do something more like this:

  import csv

  def main(f,id,field):
      for row in csv.DictReader(f):
          print(row[id],row[field])

  main(open('GSEXXXXX_full_pdata.csv'),'patientID','sample_type')

Would you like more detail on this? Well, have a look at this nice
little summary:

  http://www.doughellmann.com/PyMOTW/csv/

Now, that really is just giving you a glimpse of the csv module.
This is not really your question. Your question was more along the
lines of 'How do I, in Python, accomplish this task that is quite
simple in R?'

You may find that list-comprehensions, generators and iterators are
all helpful in mangling the data according to your nefarious will
once you have used the csv module to load the data into a data
structure.

In point of fact, though, Python does not have the particular
feature that you seek... not in the core libraries, anyway.

The lack of this capability has bothered a few people over the
years, so there are a few different types of solutions. You have
already heard a reference to RPy (about which I know nothing):

  http://rpy.sourceforge.net/

There are, however, a few other tools that you may find quite
useful.
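
Before getting to those, here is a rough sketch of how the csv module
alone could do what you originally asked: read named columns out of
"uncurated" and write them to "curated". It is written for Python 3
and is only an illustration, not the one true way. The function name
copy_columns is my own invention, the output column names are taken
from your description of the template, and the demo runs on my
fabricated sample held in memory rather than on your real file.

```python
import csv
import io

def copy_columns(infile, outfile, id_field, field):
    """Copy two named columns from infile to outfile; return the row count."""
    reader = csv.DictReader(infile)
    # Output header names are assumed from the template description
    # (curated$alt_sample_name and curated$sample_type).
    writer = csv.DictWriter(outfile,
                            fieldnames=['alt_sample_name', 'sample_type'],
                            lineterminator='\n')
    writer.writeheader()
    count = 0
    for row in reader:
        writer.writerow({'alt_sample_name': row[id_field],
                         'sample_type': row[field]})
        count += 1
    return count

# Demo on the fabricated sample data (not real patient data).
uncurated = io.StringIO("patientID,title,sample_type\n"
                        "V6IF0OqVu,0.5788,70\n"
                        "GXj51ljB2,0.3449,88\n")
curated = io.StringIO()
copy_columns(uncurated, curated, 'patientID', 'sample_type')
print(curated.getvalue())
```

With real files you would pass open('GSEXXXXX_full_pdata.csv',
newline='') as the input and something like open('curated.csv', 'w',
newline='') as the output; 'curated.csv' is a made-up name, so
substitute your own.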

One chap wanted access to some features of R that he used all the
time along with many of the other convenient features of Python, so
he decided to implement dataframes (an R concept?) in Python. This
idea was present at the genesis of the pandas library.

  http://pandas.pydata.org/

So, how would you do this with pandas? Well, you could:

  import pandas

  def main(f,field):
      uncurated = pandas.read_csv(f)
      curated = uncurated[field]
      print(curated)

  main(open('GSEXXXXX_full_pdata.csv'),'sample_type')

Note that pandas is geared to allow you to access your data by the
'handles': the unique identifier for the row and the column name.
This will produce a tabular output of just the single column you
want. You may find that pandas affords you access to tools with
which you are already intellectually familiar.

Good luck,

-Martin

P.S. While I was writing this, you sent in some sample data that
looked tab-separated (well, anyway, not comma-separated). The csv
and pandas libraries allow for delimiter='\t' options to most
object constructor calls. So, you could do:

  csv.reader(f,delimiter='\t')

-- 
Martin A. Brown
http://linux-ip.net/