Re: [Tutor] Issue with Code

2016-04-30 Thread Matt Ruffalo
On 2016-04-30 11:30, Olaoluwa Thomas wrote:
> I would appreciate a logical explanation for why the "else" statement in
> the 2nd script isn't working properly.
>
> I'm running Python v2.7.8 on a Windows 7 Ultimate VM via Command prompt and
> my scripts are created and edited via Notepad++ v6.7.3
>

Hi-

The problem is that you're reading 'hours' and 'rate' from the user with
'raw_input', and this function returns a string containing the
characters that the user typed. You convert these to floating point
numbers before doing any processing of the gross pay, but in your
'GrossPayv2.py', you compare the string referred to by 'hours' to the
numeric value 40.

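In Python 2, comparing a number to a non-numeric object of another type doesn't raise an error; the number simply compares as smaller, so a string always compares as greater than an integer: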

"""
Python 2.7.11+ (default, Apr 17 2016, 14:00:29)
[GCC 5.3.1 20160413] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> "60" > 40
True
>>> "20" > 40
True
"""

This unfortunate behavior is one of the things fixed in Python 3. Unless
you have a compelling reason otherwise (like a course or textbook that
you're learning from), I would recommend using Python 3 instead of 2,
since many of these "gotcha" behaviors have been fixed in the newer (but
backward-incompatible) version of the language.

Specifically:
"""
Python 3.5.1+ (default, Mar 30 2016, 22:46:26)
[GCC 5.3.1 20160330] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> "60" > 40
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: str() > int()
"""
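The fix is to compare the converted number rather than the string. Here is a minimal sketch in Python 3. Your original scripts aren't shown here, so the function name `gross_pay` and the time-and-a-half rule for hours above 40 are assumptions:

```python
def gross_pay(hours_text, rate_text, threshold=40.0, overtime_factor=1.5):
    # Convert the user's input to numbers *before* comparing or computing.
    hours = float(hours_text)
    rate = float(rate_text)
    if hours > threshold:
        # Assumed rule: time-and-a-half for hours beyond the threshold.
        return threshold * rate + (hours - threshold) * rate * overtime_factor
    return hours * rate

print(gross_pay("45", "10"))  # 475.0
print(gross_pay("20", "10"))  # 200.0
```

In the script itself you would pass it the strings returned by `input()` (`raw_input()` on Python 2).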

MMR...
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] job killed: too high numbers?

2016-09-20 Thread Matt Ruffalo
Hello-

On 2016-09-20 11:48, Gabriele Brambilla wrote:
> does it mean that my number of points is too high?

In short, yes. From your usage of the 'print' statement, you are running
the code under Python 2.x. In this version of Python, the 'range'
function creates a full list of numbers, and so you are asking 'range'
to create a list of 33.8 billion integers. Python lists are essentially
implemented in C as dense arrays of pointers to PyObject structs, so in
addition to the actual numeric values, you will need eight bytes per
value in the list (assuming a 64-bit OS and Python build). This is
already 270GB of memory just for these pointers. The numeric values
themselves need at least another 135GB even if each took only 4 bytes
(as a 32-bit integer would); in fact each Python 2 int is a separate
object of 24 bytes on a typical 64-bit build, so the real total is far
higher. Are you running this on a machine with ≥405GB memory?
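Spelled out, the lower-bound estimate is just this arithmetic. The sketch only restates the numbers used above: 33.8 billion items, 8-byte pointers, and the optimistic 4-byte value size:

```python
N_ITEMS = 33_800_000_000  # element count from the question
POINTER_BYTES = 8         # one PyObject* per list slot on a 64-bit build
VALUE_BYTES = 4           # optimistic lower bound: a bare 32-bit integer
GB = 10**9

pointer_gb = N_ITEMS * POINTER_BYTES / GB
value_gb = N_ITEMS * VALUE_BYTES / GB
print(f"pointers: {pointer_gb:.1f} GB")             # 270.4 GB
print(f"values:   {value_gb:.1f} GB")               # 135.2 GB
print(f"total:    {pointer_gb + value_gb:.1f} GB")  # 405.6 GB
```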

To solve your immediate problem, you could replace 'range' with 'xrange'
in your code, but this will probably only allow you to encounter another
problem: this loop will take a *very* long time to run, even without
doing any numerical work inside it. Unfortunately, there's no way to
suggest any algorithm/numerical analysis improvements without more
information about what you're trying to accomplish.
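For comparison, in Python 3 `range` is already lazy, like Python 2's `xrange`. A quick sketch showing that it stores only its bounds (this avoids the memory problem, not the running time):

```python
import sys

# A Python 3 range object stores only start/stop/step,
# so it never builds the 33.8 billion numbers in memory.
n_points = 33_800_000_000  # the count from the question
r = range(n_points)

print(len(r))            # 33800000000, computed from the bounds
print(r[-1])             # 33799999999, indexing works without a list
print(sys.getsizeof(r))  # a few dozen bytes, independent of n_points
```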

MMR...


Re: [Tutor] String within a string solution (newbie question)

2016-10-27 Thread Matt Ruffalo
On 10/26/2016 02:06 PM, Wish Dokta wrote:
> Hello,
>
> I am currently writing a basic program to calculate and display the size of
> folders with a drive/directory. To do this I am storing each directory in a
> dict as the key, with the value being the sum of the size of all files in
> that directories (but not directories).
>
> For example:
>
> { "C:\\docs" : 10, "C:\\docs123" : 200, "C:\\docs\\code\\snippets" : 5,
> "C:\\docs\\code" : 20, "C:\\docs\\pics" : 200, "C:\\docs\\code\\python" :
> 10  }
>
> Then to return the total size of a directory I am searching for a string in
> the key:
>
> For example:
>
> for "C:\\docs\\code" in key:
>
> Which works fine and will return "C:\\docs\\code" : 20,
> "C:\\docs\\code\\snippets" : 5, "C:\\docs\\code\\python" : 10 = (35)
>
> However it fails when I try to calculate the size of a directory such as
> "C:\\docs", as it also returns "C:\\docs123".
>
> I'd be very grateful if anyone could offer any advice on how to correct
> this.

Hello-

As you saw in your current approach, using strings for paths can be
problematic in a lot of scenarios. I've found it really useful to use a
higher-level abstraction instead, like what is provided by pathlib in
the standard library. You're obviously using Windows, and you didn't
mention your Python version, so I'll assume you're using something
current like 3.5.2 (at least 3.4 is required for the following code).
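To make the failure concrete, here is a small illustrative sketch contrasting string tests with path-component tests. It uses PureWindowsPath so it runs on any OS:

```python
from pathlib import PureWindowsPath

top = "C:\\docs"
other = "C:\\docs123"

# Plain substring/prefix tests can't tell a sibling from a child:
print(top in other)           # True
print(other.startswith(top))  # True

# Path objects compare whole components, so the sibling is excluded:
print(PureWindowsPath(top) in PureWindowsPath(other).parents)             # False
print(PureWindowsPath(top) in PureWindowsPath("C:\\docs\\code").parents)  # True
```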

You could do something like the following:

"""
from pathlib import PureWindowsPath

# From your example
sizes_str_keys = {
    "C:\\docs": 10,
    "C:\\docs123": 200,
    "C:\\docs\\code\\snippets": 5,
    "C:\\docs\\code": 20,
    "C:\\docs\\pics": 200,
    "C:\\docs\\code\\python": 10,
}

# Same dict, but with Path objects as keys, and the same sizes as values.
# You would almost definitely want to use Path in your code (and adjust
# the 'pathlib' import appropriately), but I'm on a Linux system so I had
# to use a PureWindowsPath instead.
sizes_path_keys = {PureWindowsPath(p): s for (p, s) in sizes_str_keys.items()}

def filter_paths(size_dict, top_level_directory):
    for path in size_dict:
        # Given some directory we're examining (e.g. c:\docs\code\snippets),
        # and top-level directory (e.g. c:\docs), we want to yield this
        # directory if it exactly matches (of course) or if the top-level
        # directory is a parent of what we're looking at:

        # >>> pprint(list(PureWindowsPath("C:\\docs\\code\\snippets").parents))
        # [PureWindowsPath('C:/docs/code'),
        #  PureWindowsPath('C:/docs'),
        #  PureWindowsPath('C:/')]

        # so in that case we'll find 'c:\docs' in iterating over path.parents.

        # You'll definitely want to remove the 'print' calls too:
        if path == top_level_directory or top_level_directory in path.parents:
            print('Matched', path)
            yield path
        else:
            print('No match for', path)

def compute_subdir_size(size_dict, top_level_directory):
    total_size = 0
    for dir_key in filter_paths(size_dict, top_level_directory):
        total_size += size_dict[dir_key]
    return total_size
"""

Then you could call 'compute_subdir_size' like so:

"""
>>> compute_subdir_size(sizes_path_keys, PureWindowsPath(r'c:\docs'))
Matched C:\docs\code\snippets
No match for C:\docs123
Matched C:\docs\code\python
Matched C:\docs\pics
Matched C:\docs\code
Matched C:\docs
245
>>> compute_subdir_size(sizes_path_keys, PureWindowsPath(r'c:\docs\code'))
Matched C:\docs\code\snippets
No match for C:\docs123
Matched C:\docs\code\python
No match for C:\docs\pics
Matched C:\docs\code
No match for C:\docs
35
"""
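The same computation can also be written more compactly, without the debugging prints, as a single sum over a generator expression. This is a sketch; `directory_total` is a hypothetical name, not part of pathlib:

```python
from pathlib import PureWindowsPath

def directory_total(size_dict, top):
    # Sum the size of `top` itself plus every key that has `top` as an ancestor.
    return sum(size for path, size in size_dict.items()
               if path == top or top in path.parents)

sizes = {PureWindowsPath(p): s for p, s in {
    "C:\\docs": 10, "C:\\docs123": 200, "C:\\docs\\code\\snippets": 5,
    "C:\\docs\\code": 20, "C:\\docs\\pics": 200, "C:\\docs\\code\\python": 10,
}.items()}

print(directory_total(sizes, PureWindowsPath(r"c:\docs")))       # 245
print(directory_total(sizes, PureWindowsPath(r"c:\docs\code")))  # 35
```

On Python 3.9 or later, `path.is_relative_to(top)` should express the same test in one call, since it is also true when `path == top`.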

MMR...


Re: [Tutor] Euclidean Distances between Atoms in a Molecule.

2017-04-03 Thread Matt Ruffalo
Hi Stephen-

The `scipy.spatial.distance` module (part of the SciPy package) contains
what you will need -- specifically, the `scipy.spatial.distance.pdist`
function, which takes a matrix of m observations in n-dimensional space,
and returns a condensed distance matrix as described in
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html
. This condensed distance matrix can be expanded into a full m by m
matrix with `scipy.spatial.distance.squareform` as follows:

"""
In [1]: import pandas as pd

In [2]: from io import StringIO

In [3]: s = StringIO('''
   ...:   MASS X Y Z
   ...: 0   12.011 -3.265636  0.198894  0.090858
   ...: 1   12.011 -1.307161  1.522212  1.003463
   ...: 2   12.011  1.213336  0.948208 -0.033373
   ...: 3   14.007  3.238650  1.041523  1.301322
   ...: 4   12.011 -5.954489  0.650878  0.803379
   ...: 5   12.011  5.654476  0.480066  0.013757
   ...: 6   12.011  6.372043  2.731713 -1.662411
   ...: 7   12.011  7.655753  0.168393  2.096802
   ...: 8   12.011  5.563051 -1.990203 -1.511875
   ...: 9    1.008 -2.939469 -1.327967 -1.247635
   ...: 10   1.008 -1.460475  2.993912  2.415410
   ...: 11   1.008  1.218042  0.451815 -2.057439
   ...: 12   1.008 -6.255901  2.575035  1.496984
   ...: 13   1.008 -6.560562 -0.695722  2.248982
   ...: 14   1.008 -7.152500  0.390758 -0.864115
   ...: 15   1.008  4.959548  3.061356 -3.139100
   ...: 16   1.008  8.197613  2.429073 -2.588339
   ...: 17   1.008  6.503322  4.471092 -0.543939
   ...: 18   1.008  7.845274  1.892126  3.227577
   ...: 19   1.008  9.512371 -0.273198  1.291080
   ...: 20   1.008  7.147039 -1.365346  3.393778
   ...: 21   1.008  4.191488 -1.928466 -3.057804
   ...: 22   1.008  5.061650 -3.595015 -0.302810
   ...: 23   1.008  7.402586 -2.392148 -2.374554
   ...: ''')

In [4]: d = pd.read_table(s, sep='\\s+', index_col=0)

In [5]: d.head()
Out[5]:
     MASS         X         Y         Z
0  12.011 -3.265636  0.198894  0.090858
1  12.011 -1.307161  1.522212  1.003463
2  12.011  1.213336  0.948208 -0.033373
3  14.007  3.238650  1.041523  1.301322
4  12.011 -5.954489  0.650878  0.803379

In [6]: points = d.loc[:, ['X', 'Y', 'Z']]

In [7]: import scipy.spatial.distance

In [8]: distances = scipy.spatial.distance.pdist(points)

In [9]: distances.shape
Out[9]: (276,)

In [10]: distances
Out[10]:
array([  2.53370139,   4.54291701,   6.6694065 ,   2.81813878,
 8.92487537,  10.11800281,  11.10411993,   9.23615791,
 2.05651475,   4.0588513 ,   4.97820424,   4.0700026 ,
 4.03910564,   4.0070559 ,   9.28870116,  11.98156386,
10.68116021,  11.66869152,  12.84293061,  11.03539433,
 8.36949409,   9.15928011,  11.25178722,   2.78521357,
 4.58084922,   4.73253781,   7.10844399,   8.21826934,
 9.13028167,   8.11565138,   3.98188296,   2.04523847,
 ...

In [11]: scipy.spatial.distance.squareform(distances)
Out[11]:
array([[  0.,   2.53370139,   4.54291701,   6.6694065 ,
  2.81813878,   8.92487537,  10.11800281,  11.10411993,
  9.23615791,   2.05651475,   4.0588513 ,   4.97820424,
  4.0700026 ,   4.03910564,   4.0070559 ,   9.28870116,
 11.98156386,  10.68116021,  11.66869152,  12.84293061,
 11.03539433,   8.36949409,   9.15928011,  11.25178722],
   [  2.53370139,   0.,   2.78521357,   4.58084922,
  4.73253781,   7.10844399,   8.21826934,   9.13028167,
  8.11565138,   3.98188296,   2.04523847,   4.10992956,
  5.08350537,   5.83684597,   6.2398737 ,   7.66820932,
 10.2011846 ,   8.49081803,   9.42605887,  10.9712576 ,
  9.24797787,   7.65742836,   8.27370019,  10.12881562],
 ...
"""
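If it helps when checking the port against your FORTRAN results, the same two steps can be sketched without SciPy using only the standard library (this needs Python 3.8+ for `math.dist`; the helper names are hypothetical, and for 24 atoms SciPy is still the better tool). The pair ordering mirrors pdist's condensed layout:

```python
import math

# First three atoms from the table above (X, Y, Z only).
coords = [
    (-3.265636, 0.198894, 0.090858),
    (-1.307161, 1.522212, 1.003463),
    (1.213336, 0.948208, -0.033373),
]

def condensed_distances(points):
    # Same ordering as pdist: (0,1), (0,2), ..., (0,m-1), (1,2), ...
    m = len(points)
    return [math.dist(points[i], points[j])
            for i in range(m) for j in range(i + 1, m)]

def square_form(condensed, m):
    # Expand to a symmetric m x m matrix with zeros on the diagonal.
    full = [[0.0] * m for _ in range(m)]
    k = 0
    for i in range(m):
        for j in range(i + 1, m):
            full[i][j] = full[j][i] = condensed[k]
            k += 1
    return full

d = condensed_distances(coords)
print(d)  # roughly [2.5337, 4.5429, 2.7852], matching the output above
print(square_form(d, len(coords)))
```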

MMR...

On 2017-04-02 13:41, Stephen P. Molnar wrote:
> I am trying to port a program that I wrote in FORTRAN twenty years ago
> into Python 3 and am having a hard time trying to calculate the
> Euclidean distance between each atom in the molecule and every other
> atom in the molecule.
>
> Here is a typical table of coordinates:
>
>
>   MASS X Y Z
> 0   12.011 -3.265636  0.198894  0.090858
> 1   12.011 -1.307161  1.522212  1.003463
> 2   12.011  1.213336  0.948208 -0.033373
> 3   14.007  3.238650  1.041523  1.301322
> 4   12.011 -5.954489  0.650878  0.803379
> 5   12.011  5.654476  0.480066  0.013757
> 6   12.011  6.372043  2.731713 -1.662411
> 7   12.011  7.655753  0.168393  2.096802
> 8   12.011  5.563051 -1.990203 -1.511875
> 9    1.008 -2.939469 -1.327967 -1.247635
> 10   1.008 -1.460475  2.993912  2.415410
> 11   1.008  1.218042  0.451815 -2.057439
> 12   1.008 -6.255901  2.575035  1.496984
> 13   1.008 -6.560562 -0.695722  2.248982
> 14   1.008 -7.152500  0.390758 -0.864115
> 15   1.008  4.959548  3.061356 -3.139100
> 16   1.008  8.197613  2.429073 -2.588339
> 17   1.008  6.503322  4.471092 -0.543939
> 18   1.008  7.845274  1.892126  3.227577
> 19   1.008  9.512371 -0.273198  1.291080
> 20   1.008  7.147039 -1.365346  3.393778
> 21   1

Re: [Tutor] How can I find a group of characters in a list of strings?

2018-07-26 Thread Matt Ruffalo
On 2018-07-25 20:23, Mats Wichmann wrote:
> On 07/25/2018 05:50 PM, Jim wrote:
>> Linux mint 18 and python 3.6
>>
>> I have a list of strings that contains slightly more than a million
>> items. Each item is a string of 8 capital letters like so:
>>
>> ['MIBMMCCO', 'YOWHHOY', ...]
>>
>> I need to check and see if the letters 'OFHCMLIP' are one of the items
>> in the list but there is no way to tell in what order the letters will
>> appear. So I can't just search for the string 'OFHCMLIP'. I just need to
>> locate any strings that are made up of those letters no matter their order.
>>
>> I suppose I could loop over the list and loop over each item using a
>> bunch of if statements exiting the inner loop as soon as I find a letter
>> is not in the string, but there must be a better way.
>>
>> I'd appreciate hearing about a better way to attack this.
> It's possible that the size of the biglist and the length of the key has
> enough performance impacts that a quicky (untested because I don't have
> your data) solution is unworkable for performance reasons.  But a quicky
> might be to take these two steps:
>
> 1. generate a list of the permutations of the target
> 2. check if any member of the target-permutation-list is in the biglist.
>
> Python sets are a nice way to check membership.
>
> from itertools import permutations
> permlist = [''.join(p) for p in permutations('MIBMMCCO', 8)]
>
> if not set(permlist).isdisjoint(biglist):
>     print("Found a permutation of MIBMMCCO")
>

I would *strongly* recommend against keeping a list of all permutations
of the query string. Although there are only 8! = 40320 permutations of
8 characters, anything with factorial runtime should be a last resort.

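This can be solved quite effectively by treating each string in the
list as a set of characters for query purposes and keeping a set of
those, which makes membership testing constant-time on average. Note
that the inner sets have to be frozensets, because normal sets aren't
hashable. Also note that a set discards repeated letters
(frozenset('MIBMMCCO') == frozenset('MIBCO')), so this matches on which
letters appear, not on how many times each one appears.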

For example:

"""
In [1]: strings = ['MIBMMCCO', 'YOWHHOY']

In [2]: query = 'OFHCMLIP'

In [3]: search_db = {frozenset(s) for s in strings}

In [4]: frozenset(query) in search_db
Out[4]: False

In [5]: frozenset('MMCOCBIM') in search_db # permutation of first string
Out[5]: True
"""
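If repeated letters matter (a frozenset only records which letters occur, not how many times), one alternative is to key each string by its letters in sorted order. Anagrams then share exactly the same key. A sketch under that assumption; `build_index`, `find_anagrams`, and the extra string 'MIBCOOOO' are hypothetical, for illustration:

```python
from collections import defaultdict

def build_index(strings):
    # Key each string by its letters in sorted order; anagrams share a key,
    # and unlike a frozenset the key keeps repeated letters.
    index = defaultdict(list)
    for s in strings:
        index["".join(sorted(s))].append(s)
    return index

def find_anagrams(index, query):
    # .get() avoids inserting empty entries into the defaultdict.
    return index.get("".join(sorted(query)), [])

strings = ["MIBMMCCO", "YOWHHOY", "MIBCOOOO"]
index = build_index(strings)
print(find_anagrams(index, "MMCOCBIM"))  # ['MIBMMCCO']
print(find_anagrams(index, "OFHCMLIP"))  # []
```

Building the index is a single pass over the million strings; each query afterwards is one sort of 8 letters plus a dict lookup, and it returns the matching strings themselves.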

MMR...


Re: [Tutor] Moving a conda environment to an off-line computer

2018-12-02 Thread Matt Ruffalo
Hi Henrique-

It is quite easy to transfer an Anaconda installation from one machine
to the other by copying all of the files -- I have done this repeatedly
with cluster compute environments. It is sometimes nicer to run `conda
upgrade --all` in a local VM and then `rsync` the updated Anaconda
installation between machines, since (as you mentioned) internet access
can sometimes be an issue.

It looks like you did everything correctly, and everything is "working"
as well as you would expect. As Alan mentioned, though, it looks like
the 'deepchem' package is trying to access the internet to load one of
its data sets, and this is what is failing. You could perhaps download
that data set and put it somewhere on the cluster where deepchem would
know where to look for it, to avoid having to download it, but I am
completely unfamiliar with deepchem so I can't offer any advice about
how to do that.
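If you try that route, the download step itself could look something like this sketch, run on a machine with internet access and then copied over (e.g. with rsync). It uses only the standard library; `fetch_dataset` is a hypothetical helper, the URL is the one in the traceback, and where deepchem expects to find the file is not covered here, so check deepchem's documentation for that part:

```python
import os
import urllib.request
from urllib.parse import urlparse

def fetch_dataset(url, dest_dir):
    # Download `url` into `dest_dir`, keeping the file name from the URL,
    # and skip the download if the file is already there.
    os.makedirs(dest_dir, exist_ok=True)
    name = os.path.basename(urlparse(url).path)
    dest = os.path.join(dest_dir, name)
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest

# Example (needs network access):
# fetch_dataset(
#     "http://deepchem.io.s3-website-us-west-1.amazonaws.com/datasets/qm7.mat",
#     "deepchem_data")
```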

MMR...

On 30/11/18 08:47, Henrique Castro wrote:
> Dear colleagues,
> Soon I'll start to use one of the powerful computers on my university as a 
> tool in my Ph.D. The computer does not have an internet connection and I need 
> to find a way to install a conda environment on it.
> At first I tried to install and set the conda environment that I need in a 
> computer with internet connection and taking care to copy everything in a 
> path that is similar in the off-line computer (i.e I installed everything on 
> /home/henrique/bin/anaconda3 at home and tried to copy everything to 
> /home/henrique/bin/anaconda3 in the off-line computer - with the same 
> .bashrc) but when I run conda I get an error(it works on my home computer):
>
> (deepchem) [henrique@europio qm7] $ python qm7_ANI.py
> /home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/site-packages/sklearn/ensemble/weight_boosting.py:29:
>  DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and 
> should not be imported. It will be removed in a future NumPy release.
>   from numpy.core.umath_tests import inner1d
> Traceback (most recent call last):
>   File 
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/urllib/request.py", 
> line 1318, in do_open
> encode_chunked=req.has_header('Transfer-encoding'))
>   File 
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/http/client.py", 
> line 1239, in request
> self._send_request(method, url, body, headers, encode_chunked)
>   File 
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/http/client.py", 
> line 1285, in _send_request
> self.endheaders(body, encode_chunked=encode_chunked)
>   File 
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/http/client.py", 
> line 1234, in endheaders
> self._send_output(message_body, encode_chunked=encode_chunked)
>   File 
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/http/client.py", 
> line 1026, in _send_output
> self.send(msg)
>   File 
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/http/client.py", 
> line 964, in send
> self.connect()
>   File 
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/http/client.py", 
> line 936, in connect
> (self.host,self.port), self.timeout, self.source_address)
>   File "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/socket.py", 
> line 704, in create_connection
> for res in getaddrinfo(host, port, 0, SOCK_STREAM):
>   File "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/socket.py", 
> line 745, in getaddrinfo
> for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
> socket.gaierror: [Errno -2] Name or service not known
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "qm7_ANI.py", line 15, in <module>
> featurizer='BPSymmetryFunction', split='stratified', move_mean=False)
>   File 
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/site-packages/deepchem/molnet/load_function/qm7_datasets.py",
>  line 50, in load_qm7_from_mat
> 'http://deepchem.io.s3-website-us-west-1.amazonaws.com/datasets/qm7.mat'
>   File 
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/site-packages/deepchem/utils/__init__.py",
>  line 85, in download_url
> urlretrieve(url, os.path.join(dest_dir, name))
>   File 
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/urllib/request.py", 
> line 248, in urlretrieve
> with contextlib.closing(urlopen(url, data)) as fp:
>   File 
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/urllib/request.py", 
> line 223, in urlopen
> return opener.open(url, data, timeout)
>   File 
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/urllib/request.py", 
> line 526, in open
> response = self._open(req, data)
>   File 
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/urllib/request.py", 
> line 544, in _open
> '_open', req)
>   File 
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/urllib/request.py", 
> line 504,