Thanks, Martin, this is really great. My main question now is how to make the transition: I need to move to Python for a project, and I need to learn how to think in Python instead of in R. The two strategies I have used so far are: a) going through the descriptions and exercises in http://www.openbookproject.net/thinkcs/python/english2e/ and b) trying to convert my R code into Python.
At a high level, do you have any other suggestions for how I could go about becoming more proficient in Python?

Thanks again to you and everyone else who responded. I am really very much obliged.

Benjamin

On Sat, May 19, 2012 at 5:32 PM, Martin A. Brown <mar...@linux-ip.net> wrote:

>
> Greetings Benjamin,
>
> To begin: I do not know R.
>
> : I'm trying to improve my python by translating R code that I
> : wrote into Python.
> :
> : *All I am trying to do is take in a specific column in
> : "uncurated" and write that whole column as output to "curated."
> : It should be a pretty basic command, I'm just not clear on how to
> : execute it.*
>
> The hardest part about translation is learning how to think in a
> different language. If you know any other human languages, you
> probably know that you can say things in some languages that do not
> translate particularly well (other than circumlocution) into another
> language. Why am I starting with this? I am starting here because
> you seem quite comfortable with thinking and operating in R, but you
> don't seem as comfortable yet with thinking and operating in Python.
>
> Naturally, that's why you are asking the Tutor list about this, so
> welcome to the right place! Let's see if we can get you some help.
>
> : As background, GSEXXXXX_full_pdata.csv has different patient
> : information (such as unique patient ID's, whether the tissue used
> : was tumor or normal, and other things). I'll just use the first
> : two characteristics for now. Template.csv is a template we built
> : that allows us to take different datasets and standardize them
> : for meta-analysis. So for example, "curated$alt_sample_name"
> : refers to the unique patient ID, and "curated$sample_type" refers
> : to the type of tissue used.
>
> I have fabricated some data after your description that looks like
> this:
>
>   patientID,title,sample_type
>   V6IF0OqVu,0.5788,70
>   GXj51ljB2,0.3449,88
>
> You doubtless have more columns, and the data here are probably
> nothing like yours, but consider it useful for illustrative purposes
> only. (Illustrating porpoises! How did they get here? Next thing
> you know we will have illuminating egrets and animating
> dromedaries!)
>
> : I've been reading about the python csv module and realized it was
> : best to get some expert input to clarify some confusion on my
> : part.
>
> The csv module is very useful and quite powerful for reading data in
> different ways and iterating over data sets. Supposing you know the
> index of the column of interest to you...well, this is quite trivial:
>
>   import csv
>   def main(f,field):
>       for row in csv.reader(f):
>           print(row[0], row[field])
>
>   # -- lists/tuples are zero-based [0,1,2], so 2 is the third column
>   #
>   main(open('GSEXXXXX_full_pdata.csv'),2)
>
> OK, but if your data files have different numbers of or ordering of
> columns, then this can become a bit fragile. So maybe you would
> want to learn how to use the csv.DictReader, which will give you the
> same thing but uses the first (header) line to name the columns, so
> then you could do something more like this:
>
>   import csv
>   def main(f,id,field):
>       for row in csv.DictReader(f):
>           print(row[id], row[field])
>
>   main(open('GSEXXXXX_full_pdata.csv'),'patientID','sample_type')
>
> Would you like more detail on this? Well, have a look at this nice
> little summary:
>
>   http://www.doughellmann.com/PyMOTW/csv/
>
> Now, that really is just giving you a glimpse of the csv module.
> This is not really your question. Your question was more along the
> lines of 'How do I, in Python, accomplish this task that is quite
> simple in R?'
>
> You may find that list-comprehensions, generators and iterators are
> all helpful in mangling the data according to your nefarious will
> once you have used the csv module to load the data into a data
> structure.
>
> In point of fact, though, Python does not have this particular
> feature that you are seeking...not in the core libraries, however.
>
> The lack of this capability has bothered a few people over the
> years, so there are a few different types of solutions. You have
> already heard a reference to RPy (about which I know nothing):
>
>   http://rpy.sourceforge.net/
>
> There are, however, a few other tools that you may find quite
> useful. One chap wanted access to some features of R that he used
> all the time along with many of the other convenient features of
> Python, so he decided to implement dataframes (an R concept?) in
> Python. This idea was present at the genesis of the pandas library.
>
>   http://pandas.pydata.org/
>
> So, how would you do this with pandas? Well, you could:
>
>   import pandas
>   def main(f,field):
>       uncurated = pandas.read_csv(f)
>       curated = uncurated[field]
>       print(curated)
>
>   main(open('GSEXXXXX_full_pdata.csv'),'sample_type')
>
> Note that pandas is geared to allow you to access your data by the
> 'handles', the unique identifier for the row and the column name.
> This will produce a tabular output of just the single column you
> want. You may find that pandas affords you access to tools with
> which you are already intellectually familiar.
>
> Good luck,
>
> -Martin
>
> P.S. While I was writing this, you sent in some sample data that
>      looked tab-separated (well, anyway, not comma-separated). The
>      csv and pandas libraries allow for delimiter='\t' options to
>      most object constructor calls. So, you could do:
>      csv.reader(f,delimiter='\t')
>
> --
> Martin A. Brown
> http://linux-ip.net/
>
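For reference, here is a minimal sketch of the task as originally stated (take one named column from "uncurated" and write that whole column to "curated"), using only the csv module and Python 3 syntax. The function name copy_column and its parameters are made up for illustration, and the rows are the fabricated sample data from Martin's message held in memory, not the real GSEXXXXX file.

```python
import csv
import io


def copy_column(src, dst, column, delimiter=","):
    """Copy one named column from the CSV stream src to the CSV stream dst.

    copy_column is a hypothetical helper, not part of the csv module.
    Returns the number of data rows written (the header is not counted).
    """
    reader = csv.DictReader(src, delimiter=delimiter)
    # DictReader takes the column names from the first (header) line.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError("no column named %r in input" % column)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    writer.writerow([column])            # header line for the output
    count = 0
    for row in reader:
        writer.writerow([row[column]])   # one value per output line
        count += 1
    return count


# Fabricated sample rows, kept in memory so the sketch runs
# without any files on disk.
uncurated = io.StringIO(
    "patientID,title,sample_type\n"
    "V6IF0OqVu,0.5788,70\n"
    "GXj51ljB2,0.3449,88\n"
)
curated = io.StringIO()
rows_written = copy_column(uncurated, curated, "sample_type")
print(curated.getvalue())
```

With real files, the same function should work on objects returned by open(); the csv documentation recommends opening them with newline='' (for example, open('GSEXXXXX_full_pdata.csv', newline='')), and tab-separated input can be handled by passing delimiter='\t'.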