[Tutor] R: Tutor Digest, Vol 125, Issue 49

2014-07-16 Thread jarod...@libero.it
Hi there!!!
I have a file  with this data
['uc002uvo.3 ', 'uc001mae.1']
['uc010dya.2 ', 'uc001kko.2']
['uc003ejx.2 ', 'uc010yfr.1']
['uc001bhk.2 ', 'uc003eib.2']
['uc001znc.2 ', 'uc001efn.2']
['uc002ycq.2 ', 'uc001vnh.2']
['uc001odf.1 ', 'uc002mwd.2']
['uc010jkn.1 ', 'uc010luk.1']
['uc003uhf.3 ', 'uc010tqd.1']
['uc002rue.3 ', 'uc001tex.2']
['uc011dtt.1 ', 'uc001lkv.1']
['uc003yyt.2 ', 'uc003mkl.2']
['uc003pkv.2 ', 'uc003ytw.2']
['uc010bhz.2 ', 'uc002kbt.1']
['uc001wnj.2 ', 'uc009wtj.1']
['uc011lyh.1 ', 'uc003jvb.2']
['uc002awj.1 ', 'uc009znm.1']
['uc010bft.2 ', 'uc002cxz.1']
['uc011mar.1 ', 'uc001lvb.1']
['uc001oxl.2 ', 'uc002lvx.1']

I want to remove everything after the dots, so I want to have a file with 
this output:

['uc002uvo ', 'uc001mae']
['uc010dya ', 'uc001kko']
...

I tried to use a regular expression, but I get strange output:

import re

with open("non_annotati.csv") as p:
    for i in p:
        lines = i.rstrip("\n").split("\t")
        mit = re.sub(r'(\.\d$)', '', lines[0])
        mit2 = re.sub(r'(\.\d$)', '', lines[1])
        print mit, mit2


uc003klv.2  uc010lxj
uc001tzy.2  uc011kzk
uc010qdj.1  uc001iku
uc004coe.2  uc002vmf
uc002dvw.2  uc004bxn
uc001dmp.2  uc001dmo
uc002rqd.2  uc010ynl
uc010cvm.1  uc002qjc
uc003ewy.3  uc003hgx
uc002ejy.2  uc003mvb
uc002fou.1  uc010ilx
uc003vhf.2  uc010qlo
uc003mix.2  uc010tdt
uc002nez.1  uc003wxe
uc011cpu.1  uc002keg
uc001ovu.2  uc011dne
uc010zfg.1  uc001jvq
uc010jlf.2  uc011azi
uc001ors.3  uc001vzx
uc010tyt.1  uc003vih
uc010fde.2  uc002xgq
uc010bit.1  uc003zle
uc010xcb.1  uc010wsg
uc011acg.1  uc009wlp
uc002bnj.2  uc004ckd


Where is the error? What is wrong with my regular expression code?


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] R: Tutor Digest, Vol 125, Issue 49

2014-07-16 Thread Danny Yoo
Hi Jarod,

Ah.  Note the extra space on the first column elements.  For example,
one of your inputs that you've split on tabs:

['uc011lyh.1 ', 'uc003jvb.2']

If you look really closely, you'll see the whitespace at the end of
"uc011lyh.1".  That's why the regex isn't matching.

Good luck!
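
To see the effect in isolation, here is a minimal sketch using the same
pattern and one of the values from the data above.  The variable names are
made up for illustration, and it is written for Python 3:

```python
import re

pattern = r'\.\d$'      # same pattern as in the original code
raw = 'uc011lyh.1 '     # first column, with its trailing space

# '$' matches only at the end of the string (or just before a final
# newline), so the trailing space keeps the pattern from matching.
unchanged = re.sub(pattern, '', raw)

# Stripping the surrounding whitespace first lets the pattern match.
fixed = re.sub(pattern, '', raw.strip())

print(repr(unchanged))  # 'uc011lyh.1 '
print(repr(fixed))      # 'uc011lyh'
```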


Re: [Tutor] R: Tutor Digest, Vol 125, Issue 49

2014-07-16 Thread Danny Yoo
By the way, those look like gene locus names.  Reminds me of the ones
I saw when I worked at arabidopsis.org.  e.g.:

   http://www.arabidopsis.org/servlets/TairObject?id=1000638674&type=gene


[Tutor] Question about style

2014-07-16 Thread Jose Amoreira

Hello
I wrote a function that, given a list of numbers, finds clusters of 
values by proximity and returns a reduced list containing the centers of 
these clusters. However, I find it rather unclear. I would appreciate 
any comments on how pythonic my function is and suggestions to improve 
its readability.

The function is:

def aglomerate(x_lst, delta=1.e-5):
    clusters = [] #list of pairs [center, number of clustered values]
    for x in x_lst:
        close_to = [abs(x - y) < delta for y,_ in clusters]
        if any(close_to):
            # x is close to a cluster
            index = close_to.index(True)
            center, n = clusters[index]
            #update the cluster center including the new value,
            #and increment dimension of cluster
            clusters[index] = (n * center + x)/(n+1), n+1
        else:
            # x does not belong to any cluster, create a new one
            clusters.append([x,1])
    # return list with centers
    return [center for center, _ in clusters]

Examples:
1. No clusters in x_lst:
In [52]: aglomerate([1., 2., 3., 4.])
Out[52]: [1.0, 2.0, 3.0, 4.0]

2. Some elements in x_lst are equal:
In [53]: aglomerate([1., 2., 1., 3.])
Out[53]: [1.0, 2.0, 3.0]

3. Some elements in x_lst should be clustered:
In [54]: aglomerate([1., 2., 1.1, 3.], delta=0.2)
Out[54]: [1.05, 2.0, 3.0]

So, the function seems to work as it should, but can it be made more 
readable?


Thanks for any help.
Ze


Re: [Tutor] R: Tutor Digest, Vol 125, Issue 49

2014-07-16 Thread Joel Goldstick
2014-07-16 4:04 GMT-04:00 jarod...@libero.it :

> Hi there!!!
> I have a file  with this data
> ['uc002uvo.3 ', 'uc001mae.1']
> ['uc010dya.2 ', 'uc001kko.2']
>


> I want to replace of the things after the dots, so I want to have  a file
> with
> this output:
>
> ['uc002uvo ', 'uc001mae']
> ['uc010dya ', 'uc001kko']
> ...
>
> I try to use regular expression but I have  a strange output
>

I'm one of those who doesn't reach for regex first when the string
manipulation is simpler without it.

>
> with open("non_annotati.csv") as p:
>     for i in p:
>         lines= i.rstrip("\n").split("\t")
>         mit = re.sub(r'(\.\d$)','',lines[0])
>         mit2 = re.sub(r'(\.\d$)','',lines[1])
>         print mit,mit2
>

mit  = lines[0].split('.')[0]
mit2 = lines[1].split('.')[0]

>>> s = 'uc003klv.2  '
>>> s.split('.')
['uc003klv', '2  ']



> uc003klv.2  uc010lxj
> uc001tzy.2  uc011kzk
> uc010qdj.1  uc001iku
> uc004coe.2  uc002vmf
> uc002dvw.2  uc004bxn
> uc001dmp.2  uc001dmo
> uc002rqd.2  uc010ynl
> uc010cvm.1  uc002qjc
> uc003ewy.3  uc003hgx
> uc002ejy.2  uc003mvb
> uc002fou.1  uc010ilx
> uc003vhf.2  uc010qlo
> uc003mix.2  uc010tdt
> uc002nez.1  uc003wxe
> uc011cpu.1  uc002keg
> uc001ovu.2  uc011dne
> uc010zfg.1  uc001jvq
> uc010jlf.2  uc011azi
> uc001ors.3  uc001vzx
> uc010tyt.1  uc003vih
> uc010fde.2  uc002xgq
> uc010bit.1  uc003zle
> uc010xcb.1  uc010wsg
> uc011acg.1  uc009wlp
> uc002bnj.2  uc004ckd
>
>
> Where is the error? what is wrong in my regular expression code?



-- 
Joel Goldstick
http://joelgoldstick.com


Re: [Tutor] R: Tutor Digest, Vol 125, Issue 49

2014-07-16 Thread Wolfgang Maier

On 16.07.2014 10:04, jarod...@libero.it wrote:

Hi there!!!
I have a file  with this data
['uc002uvo.3 ', 'uc001mae.1']
['uc010dya.2 ', 'uc001kko.2']
['uc003ejx.2 ', 'uc010yfr.1']
['uc001bhk.2 ', 'uc003eib.2']
['uc001znc.2 ', 'uc001efn.2']
['uc002ycq.2 ', 'uc001vnh.2']
['uc001odf.1 ', 'uc002mwd.2']
['uc010jkn.1 ', 'uc010luk.1']
['uc003uhf.3 ', 'uc010tqd.1']
['uc002rue.3 ', 'uc001tex.2']
['uc011dtt.1 ', 'uc001lkv.1']
['uc003yyt.2 ', 'uc003mkl.2']
['uc003pkv.2 ', 'uc003ytw.2']
['uc010bhz.2 ', 'uc002kbt.1']
['uc001wnj.2 ', 'uc009wtj.1']
['uc011lyh.1 ', 'uc003jvb.2']
['uc002awj.1 ', 'uc009znm.1']
['uc010bft.2 ', 'uc002cxz.1']
['uc011mar.1 ', 'uc001lvb.1']
['uc001oxl.2 ', 'uc002lvx.1']

I want to replace of the things after the dots, so I want to have  a file with
this output:

['uc002uvo ', 'uc001mae']
['uc010dya ', 'uc001kko']
...

I try to use regular expression but I have  a strange output

with open("non_annotati.csv") as p:
    for i in p:
        lines= i.rstrip("\n").split("\t")


lines is not the best variable name; why not use:
   gene1, gene2 = i.rstrip("\n").split("\t")


        mit = re.sub(r'(\.\d$)','',lines[0])
        mit2 = re.sub(r'(\.\d$)','',lines[1])
        print mit,mit2



While Danny has pointed out the actual reason why your code is not 
working with this specific input data, it's generally not a good idea to 
make overly specific assumptions about input formatting by specifying '\n' 
and '\t' explicitly when all you want to do is to eliminate whitespace:


>>> help(s.split)
Help on built-in function split:

split(...) method of builtins.str instance
    S.split(sep=None, maxsplit=-1) -> list of strings

    Return a list of the words in S, using sep as the
    delimiter string.  If maxsplit is given, at most maxsplit
    splits are done. If sep is not specified or is None, any
    whitespace string is a separator and empty strings are
    removed from the result.

>>> s='uc002uvo.3 \tuc001mae.1\r\n'  # Windows line breaks
>>> s.split()
['uc002uvo.3', 'uc001mae.1']

and I agree with Joel that re is overkill here. In fact, your current 
regexp will fail with two digit numbers after the dot though I don't 
know whether such names can occur in your data.
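
Putting the two points together, a minimal sketch might look like this.
The helper name strip_version and the two-digit suffix in the sample line
are invented for illustration; the pattern uses \d+ so that any number of
digits after the dot is removed:

```python
import re

def strip_version(name):
    """Remove a trailing '.<digits>' suffix, however many digits it has."""
    return re.sub(r'\.\d+$', '', name)

# Hypothetical input line: extra space, tab separator, Windows line break.
line = 'uc002uvo.3 \tuc001mae.12\r\n'

# split() without arguments splits on any run of whitespace and
# discards leading/trailing whitespace, so no rstrip() is needed.
gene1, gene2 = line.split()

result = [strip_version(gene1), strip_version(gene2)]
print(result)  # ['uc002uvo', 'uc001mae']
```

Without re, name.partition('.')[0] would give the same result for names
that contain only a single dot.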


Best,
Wolfgang



Re: [Tutor] Question about style

2014-07-16 Thread Jose Amoreira

Hi!

On 07/16/2014 10:14 PM, Wolfgang Maier wrote:


careful here: you just stored a tuple instead of a list; doesn't matter
for your current implementation, but may bite you at some point.


Oh, you're right. Silly mistake, even if harmless for the application I 
have in mind.



        else:
            # x does not belong to any cluster, create a new one
            clusters.append([x,1])
    # return list with centers
    return [center for center, _ in clusters]



Building the close_to list of Trues and Falses, then using it
to find the original element again, makes me think that you're a regular
R user? Using such a logical vector (I guess that's what you'd call it
in R) should (almost) never be required in Python.


No, I use mainly Python and Fortran (but I'm not a "real" programmer). I 
arrived at this particular function from a previous version which 
took only a single line and was quite readable, but it didn't compute the 
centers of the clusters; the first value added to a new cluster would be 
the cluster value, regardless of other added values. Then I decided that 
I wanted the centroid of each cluster and this was the result. It 
smelled bad, but I was trying to hang on to my (supposedly) smart 
one-liner...



Here's an implementation of this idea demonstrating how, in Python, you
typically use the builtin enumerate function to avoid "logical vectors"
or other kinds of index juggling:

def aglomerate(x_lst, delta=1.e-5):
    centers = []
    sizes = []
    for x in x_lst:
        for i, center in enumerate(centers):
            if abs(x - center) < delta:
                # x is close to a cluster
                #update the cluster center including the new value,
                #and increment dimension of cluster
                n = sizes[i]
                centers[i] = (n * center + x)/(n+1)
                sizes[i] = n+1
                break
        else:
            # this block is executed only when the break in the preceding
            # block wasn't reached =>
            # x does not belong to any cluster, create a new one
            centers.append(x)
            sizes.append(1)
    # return list with centers
    return centers


Thanks, Wolfgang. You were very helpful.



Re: [Tutor] Question about style

2014-07-16 Thread Alan Gauld

On 16/07/14 12:49, Jose Amoreira wrote:

Hello
I wrote a function that, given a list of numbers, finds clusters of
values by proximity and returns a reduced list containing the centers of
these clusters. However, I find it rather unclear. I would appreciate
any comments on how pythonic my function is and suggestions to improve
its readability.


Just throwing this idea in without really thinking about it...
Would itertools.groupby work?

It takes a sorted collection and groups the items found based on a key 
function. If the key function treated two items as identical when they were 
within distance X of each other, then groupby might help.


The itertools functions are generally space efficient and
therefore good for large volumes of data.
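
A rough sketch of how that might look, written for Python 3.  The function
name cluster_centers and the stateful key function are my own assumptions,
not something groupby provides.  Note also that this chains neighbours in
sorted order (a new group starts when the gap to the previous value is at
least delta), so it can give different results from the original function,
which compares each value with a running cluster center:

```python
from itertools import groupby

def cluster_centers(values, delta=1e-5):
    """Average runs of sorted values whose neighbouring gaps are < delta."""
    cluster_id = 0
    previous = None

    def key(x):
        # groupby calls the key once per item, in order, so the key can
        # remember the previous value and start a new group on a big gap.
        nonlocal cluster_id, previous
        if previous is not None and x - previous >= delta:
            cluster_id += 1
        previous = x
        return cluster_id

    centers = []
    for _, group in groupby(sorted(values), key=key):
        members = list(group)
        centers.append(sum(members) / len(members))
    return centers

centers = cluster_centers([1., 2., 1.1, 3.], delta=0.2)
print(centers)
```

For this input the groups are [1.0, 1.1], [2.0] and [3.0].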

Just a thought.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.flickr.com/photos/alangauldphotos



Re: [Tutor] Question about style

2014-07-16 Thread Wolfgang Maier

On 16.07.2014 13:49, Jose Amoreira wrote:

Hello
I wrote a function that, given a list of numbers, finds clusters of
values by proximity and returns a reduced list containing the centers of
these clusters. However, I find it rather unclear. I would appreciate
any comments on how pythonic my function is and suggestions to improve
its readability.
The function is:

def aglomerate(x_lst, delta=1.e-5):
    clusters = [] #list of pairs [center, number of clustered values]
    for x in x_lst:
        close_to = [abs(x - y) < delta for y,_ in clusters]
        if any(close_to):
            # x is close to a cluster
            index = close_to.index(True)
            center, n = clusters[index]
            #update the cluster center including the new value,
            #and increment dimension of cluster
            clusters[index] = (n * center + x)/(n+1), n+1


careful here: you just stored a tuple instead of a list; doesn't matter 
for your current implementation, but may bite you at some point.



        else:
            # x does not belong to any cluster, create a new one
            clusters.append([x,1])
    # return list with centers
    return [center for center, _ in clusters]



Building the close_to list of Trues and Falses, then using it 
to find the original element again, makes me think that you're a regular 
R user? Using such a logical vector (I guess that's what you'd call it 
in R) should (almost) never be required in Python.
Possible solutions for your case depend somewhat on what you want to be 
able to do. Currently, you are identifying all possible clusters that a 
new value may belong to, but then you are simply adding it to the first 
cluster. If that's all your code needs to do, then you could do (this 
avoids indexing altogether by exploiting the fact that lists are mutable):


def aglomerate(x_lst, delta=1.e-5):
    clusters = [] #list of pairs [center, number of clustered values]
    for x in x_lst:
        for cluster in clusters:
            # cluster is now a list of two elements
            # lists are mutable objects
            center, n = cluster # to keep things readable
            if abs(x - center) < delta:
                # x is close to a cluster
                #update the cluster center including the new value,
                #and increment dimension of cluster
                # since lists are mutable objects you can do this as an
                # in-place operation
                cluster[0] = (n * center + x)/(n+1)
                cluster[1] = n+1
                break
        else:
            # this block is executed if the break in the preceding
            # block wasn't reached =>
            # x does not belong to any cluster, create a new one
            clusters.append([x,1])
    # return list with centers
    return [center for center, _ in clusters]

Given your return value on the other hand, it looks like your clusters 
data structure is somewhat suboptimal. In fact, you would be better off 
with two separate lists for centers and sizes, in which case you could 
simply

return centers.

Here's an implementation of this idea demonstrating how, in Python, you 
typically use the builtin enumerate function to avoid "logical vectors" 
or other kinds of index juggling:


def aglomerate(x_lst, delta=1.e-5):
    centers = []
    sizes = []
    for x in x_lst:
        for i, center in enumerate(centers):
            if abs(x - center) < delta:
                # x is close to a cluster
                #update the cluster center including the new value,
                #and increment dimension of cluster
                n = sizes[i]
                centers[i] = (n * center + x)/(n+1)
                sizes[i] = n+1
                break
        else:
            # this block is executed only when the break in the preceding
            # block wasn't reached =>
            # x does not belong to any cluster, create a new one
            centers.append(x)
            sizes.append(1)
    # return list with centers
    return centers

In summary, indices in Python should be handled using enumerate and 
sometimes can be avoided completely by in-place manipulations of mutable 
objects.
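
As a small illustration of why the tuple-versus-list remark matters for
in-place updates, here is a sketch in which one cluster was stored as a
tuple by mistake.  The helper bump_sizes and the sample data are invented
for this example:

```python
def bump_sizes(clusters):
    """Increment the size field of each [center, size] pair in place.

    Returns the indices of entries that could not be updated.
    """
    failures = []
    for i, cluster in enumerate(clusters):
        try:
            cluster[1] += 1          # works for lists (mutable)
        except TypeError:
            failures.append(i)       # tuples reject item assignment
    return failures

# The second entry is a tuple, as produced by "clusters[index] = a, b".
clusters = [[1.0, 1], (2.0, 1)]
failed = bump_sizes(clusters)

print(clusters)  # [[1.0, 2], (2.0, 1)]
print(failed)    # [1]
```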


Best,
Wolfgang


[Tutor] Python open source for beginners

2014-07-16 Thread Mitesh H. Budhabhatti
I am learning Python 3 for fun and as hobby.  I am experienced in C#,
ASP.Net.  I want to gain more knowledge in Python.  Can somebody please
suggest open source projects/sites?
Thanks

Warm Regards,
Mitesh H. Budhabhatti


Re: [Tutor] Python open source for beginners

2014-07-16 Thread wesley chun
> I am learning Python 3 for fun and as hobby.  I am experienced in C#,
> ASP.Net.  I want to gain more knowledge in Python.  Can somebody
> please suggest open source projects/sites?


Greetings Mitesh, and welcome to Python!

Others will have more advice to give on specific projects, but I would
initially suggest: 1) http://python.org which is the main website for the
open source language -- it has all the docs and a beginners' tutorial, 2)
http://ironpython.net which is the website for IronPython, the open source
version of Python implemented for .NET, an area that you're familiar with,
and where you may feel more comfortable joining the Python world. Finally,
3) http://sf.net/projects/pywin32 -- this is the Python Extensions for
Windows library which allows you to create apps using the MFC library,
including COM clients.

As you start coding and running into issues/problems, feel free to drop by
and ask your questions here along with a description of what you tried,
what didn't work, and what was the output and/or stack trace that you got.

Best of luck!
--Wesley

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"A computer never does what you want... only what you tell it."
+wesley chun  : wescpy at gmail : @wescpy

Python training & consulting : http://CyberwebConsulting.com
"Core Python" books : http://CorePython.com
Python blog: http://wescpy.blogspot.com