Re: [Tutor] Issue with Code
On 2016-04-30 11:30, Olaoluwa Thomas wrote:
> I would appreciate a logical explanation for why the "else" statement in
> the 2nd script isn't working properly.
>
> I'm running Python v2.7.8 on a Windows 7 Ultimate VM via Command prompt and
> my scripts are created and edited via Notepad++ v6.7.3
>

Hi-

The problem is that you're reading 'hours' and 'rate' from the user with
'raw_input', and this function returns a string containing the characters
that the user typed. You convert these to floating point numbers before
doing any processing of the gross pay, but in your 'GrossPayv2.py', you
compare the string referred to by 'hours' to the numeric value 40. In
Python 2, strings always compare as greater than integers:

"""
Python 2.7.11+ (default, Apr 17 2016, 14:00:29)
[GCC 5.3.1 20160413] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> "60" > 40
True
>>> "20" > 40
True
"""

This unfortunate behavior is one of the things fixed in Python 3. Unless
you have a compelling reason otherwise (like a course or textbook that
you're learning from), I would recommend using Python 3 instead of 2, since
many of these "gotcha" behaviors have been fixed in the newer (but
backward-incompatible) version of the language. Specifically:

"""
Python 3.5.1+ (default, Mar 30 2016, 22:46:26)
[GCC 5.3.1 20160330] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> "60" > 40
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: str() > int()
"""

MMR...
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
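The point about converting before comparing can be sketched roughly as follows. The original scripts are not shown in this thread, so this is only an illustration: the function name 'gross_pay' and the 1.5x overtime multiplier are assumptions, and only the comparison against 40 comes from the message above.

```python
OVERTIME_THRESHOLD = 40.0   # hours; the value compared against in the thread
OVERTIME_MULTIPLIER = 1.5   # assumption: not stated anywhere in the thread


def gross_pay(hours_text, rate_text):
    """Compute gross pay from the strings a user typed (hypothetical helper)."""
    # Convert *before* comparing, so the comparison is number vs. number
    # rather than string vs. number.
    hours = float(hours_text)
    rate = float(rate_text)
    if hours > OVERTIME_THRESHOLD:
        overtime_hours = hours - OVERTIME_THRESHOLD
        return (OVERTIME_THRESHOLD * rate
                + overtime_hours * rate * OVERTIME_MULTIPLIER)
    else:
        return hours * rate
```

With the conversion done first, an input of "20" hours reaches the 'else' branch on both Python 2 and Python 3, because the comparison no longer involves a string.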
Re: [Tutor] job killed: too high numbers?
Hello-

On 2016-09-20 11:48, Gabriele Brambilla wrote:
> does it mean that my number of points is too high?

In short, yes. From your usage of the 'print' statement, you are running
the code under Python 2.x. In this version of Python, the 'range' function
creates a full list of numbers, and so you are asking 'range' to create a
list of 33.8 billion integers. Python lists are essentially implemented in
C as dense arrays of pointers to PyObject structs, so in addition to the
actual numeric values, you will need eight bytes per value in the list
(assuming a 64-bit OS and Python build). This is already 270GB of memory
just for these pointers, in addition to the actual numeric values, which
would take at least an additional 135GB (even if each numeric value needed
only 32 bits; CPython's int objects are in fact larger than that). Are you
running this on a machine with ≥405GB memory?

To solve your immediate problem, you could replace 'range' with 'xrange' in
your code, but this will probably only allow you to encounter another
problem: this loop will take a *very* long time to run, even without doing
any numerical work inside it. Unfortunately, there's no way to suggest any
algorithm/numerical analysis improvements without more information about
what you're trying to accomplish.

MMR...
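The memory estimate above is simple arithmetic, and can be checked with a few lines. This is a sketch in Python 3 syntax (where 'range' is already lazy, like 'xrange' in Python 2); the helper name 'list_memory_gb' is made up for illustration, and the 4-byte figure is the 32-bit-per-value assumption from the estimate, not a measurement.

```python
import itertools

N_POINTS = 33800000000   # "33.8 billion" integers, as in the message above
POINTER_BYTES = 8        # one pointer per list slot on a 64-bit build
VALUE_BYTES = 4          # the 32-bit-per-value assumption used in the estimate


def list_memory_gb(n_items, bytes_per_item):
    """Decimal gigabytes needed for n_items of bytes_per_item each."""
    return n_items * bytes_per_item / 1e9


pointer_gb = list_memory_gb(N_POINTS, POINTER_BYTES)
value_gb = list_memory_gb(N_POINTS, VALUE_BYTES)
print("pointers: %.1f GB, values: %.1f GB" % (pointer_gb, value_gb))

# A lazy range never materialises the list, so creating it is cheap;
# only the items actually requested are produced.
lazy = range(N_POINTS)
first_five = list(itertools.islice(lazy, 5))
```

Creating the lazy range costs almost nothing, but iterating over all of it would still take a very long time, which is the second problem mentioned above.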
Re: [Tutor] String within a string solution (newbie question)
On 10/26/2016 02:06 PM, Wish Dokta wrote:
> Hello,
>
> I am currently writing a basic program to calculate and display the size of
> folders with a drive/directory. To do this I am storing each directory in a
> dict as the key, with the value being the sum of the size of all files in
> that directories (but not directories).
>
> For example:
>
> { "C:\\docs" : 10, "C:\\docs123" : 200, "C:\\docs\\code\\snippets" : 5,
> "C:\\docs\\code" : 20, "C:\\docs\\pics" : 200, "C:\\docs\\code\\python" :
> 10 }
>
> Then to return the total size of a directory I am searching for a string in
> the key:
>
> For example:
>
> for "C:\\docs\\code" in key:
>
> Which works fine and will return "C:\\docs\\code" : 20,
> "C:\\docs\\code\\snippets" : 5, "C:\\docs\\code\\python" : 10 = (35)
>
> However it fails when I try to calculate the size of a directory such as
> "C:\\docs", as it also returns "C:\\docs123".
>
> I'd be very grateful if anyone could offer any advice on how to correct
> this.

Hello-

As you saw in your current approach, using strings for paths can be
problematic in a lot of scenarios. I've found it really useful to use a
higher-level abstraction instead, like what is provided by pathlib in the
standard library. You're obviously using Windows, and you didn't mention
your Python version, so I'll assume you're using something current like
3.5.2 (at least 3.4 is required for the following code).

You could do something like the following:

"""
from pathlib import PureWindowsPath

# From your example
sizes_str_keys = {
    "C:\\docs": 10,
    "C:\\docs123": 200,
    "C:\\docs\\code\\snippets": 5,
    "C:\\docs\\code": 20,
    "C:\\docs\\pics": 200,
    "C:\\docs\\code\\python": 10,
}

# Same dict, but with Path objects as keys, and the same sizes as values.
# You would almost definitely want to use Path in your code (and adjust
# the 'pathlib' import appropriately), but I'm on a Linux system so I had
# to use a PureWindowsPath instead.
sizes_path_keys = {PureWindowsPath(p): s for (p, s) in sizes_str_keys.items()}

def filter_paths(size_dict, top_level_directory):
    for path in size_dict:
        # Given some directory we're examining (e.g. c:\docs\code\snippets),
        # and top-level directory (e.g. c:\docs), we want to yield this
        # directory if it exactly matches (of course) or if the top-level
        # directory is a parent of what we're looking at:
        #   >>> pprint(list(PureWindowsPath("C:\\docs\\code\\snippets").parents))
        #   [PureWindowsPath('C:/docs/code'),
        #    PureWindowsPath('C:/docs'),
        #    PureWindowsPath('C:/')]
        # so in that case we'll find 'c:\docs' in iterating over path.parents.
        # You'll definitely want to remove the 'print' calls too:
        if path == top_level_directory or top_level_directory in path.parents:
            print('Matched', path)
            yield path
        else:
            print('No match for', path)

def compute_subdir_size(size_dict, top_level_directory):
    total_size = 0
    for dir_key in filter_paths(size_dict, top_level_directory):
        total_size += size_dict[dir_key]
    return total_size
"""

Then you could call 'compute_subdir_size' like so:

"""
>>> compute_subdir_size(sizes_path_keys, PureWindowsPath(r'c:\docs'))
Matched C:\docs\code\snippets
No match for C:\docs123
Matched C:\docs\code\python
Matched C:\docs\pics
Matched C:\docs\code
Matched C:\docs
245
>>> compute_subdir_size(sizes_path_keys, PureWindowsPath(r'c:\docs\code'))
Matched C:\docs\code\snippets
No match for C:\docs123
Matched C:\docs\code\python
No match for C:\docs\pics
Matched C:\docs\code
No match for C:\docs
35
"""

MMR...
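One detail the calls above rely on: the dict keys were written with an upper-case 'C:' while the queries use a lower-case 'c:'. A short sketch of why that still matches, assuming Python 3.4 or later (the variable names here are only for illustration):

```python
from pathlib import PurePosixPath, PureWindowsPath

stored = PureWindowsPath("C:\\docs\\code")
query = PureWindowsPath(r"c:\docs")

# Windows-flavoured pure paths compare (and hash) case-insensitively,
# so 'c:\docs' and 'C:\docs' are treated as the same path.
same_despite_case = PureWindowsPath("C:\\docs") == query

# Membership in .parents is a whole-component comparison, not a substring
# test, so 'C:\docs' is a parent of 'C:\docs\code' ...
is_parent = query in stored.parents

# ... but not of 'C:\docs123', which merely starts with the same characters.
sibling_matches = query in PureWindowsPath("C:\\docs123").parents

# POSIX-flavoured paths, by contrast, are case-sensitive.
posix_equal = PurePosixPath("/Docs") == PurePosixPath("/docs")
```

This is the behaviour that lets the path-based version skip 'C:\docs123', which the substring search could not do.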
Re: [Tutor] Euclidean Distances between Atoms in a Molecule.
Hi Stephen-

The `scipy.spatial.distance` module (part of the SciPy package) contains
what you will need -- specifically, the `scipy.spatial.distance.pdist`
function, which takes a matrix of m observations in n-dimensional space,
and returns a condensed distance matrix as described in
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html
. This condensed distance matrix can be expanded into a full m by m matrix
with `scipy.spatial.distance.squareform` as follows:

"""
In [1]: import pandas as pd

In [2]: from io import StringIO

In [3]: s = StringIO('''
   ...:       MASS         X         Y         Z
   ...: 0   12.011 -3.265636  0.198894  0.090858
   ...: 1   12.011 -1.307161  1.522212  1.003463
   ...: 2   12.011  1.213336  0.948208 -0.033373
   ...: 3   14.007  3.238650  1.041523  1.301322
   ...: 4   12.011 -5.954489  0.650878  0.803379
   ...: 5   12.011  5.654476  0.480066  0.013757
   ...: 6   12.011  6.372043  2.731713 -1.662411
   ...: 7   12.011  7.655753  0.168393  2.096802
   ...: 8   12.011  5.563051 -1.990203 -1.511875
   ...: 9    1.008 -2.939469 -1.327967 -1.247635
   ...: 10   1.008 -1.460475  2.993912  2.415410
   ...: 11   1.008  1.218042  0.451815 -2.057439
   ...: 12   1.008 -6.255901  2.575035  1.496984
   ...: 13   1.008 -6.560562 -0.695722  2.248982
   ...: 14   1.008 -7.152500  0.390758 -0.864115
   ...: 15   1.008  4.959548  3.061356 -3.139100
   ...: 16   1.008  8.197613  2.429073 -2.588339
   ...: 17   1.008  6.503322  4.471092 -0.543939
   ...: 18   1.008  7.845274  1.892126  3.227577
   ...: 19   1.008  9.512371 -0.273198  1.291080
   ...: 20   1.008  7.147039 -1.365346  3.393778
   ...: 21   1.008  4.191488 -1.928466 -3.057804
   ...: 22   1.008  5.061650 -3.595015 -0.302810
   ...: 23   1.008  7.402586 -2.392148 -2.374554
   ...: ''')

In [4]: d = pd.read_table(s, sep='\\s+', index_col=0)

In [5]: d.head()
Out[5]:
     MASS         X         Y         Z
0  12.011 -3.265636  0.198894  0.090858
1  12.011 -1.307161  1.522212  1.003463
2  12.011  1.213336  0.948208 -0.033373
3  14.007  3.238650  1.041523  1.301322
4  12.011 -5.954489  0.650878  0.803379

In [6]: points = d.loc[:, ['X', 'Y', 'Z']]

In [7]: import scipy.spatial.distance

In [8]: distances = scipy.spatial.distance.pdist(points)

In [9]: distances.shape
Out[9]: (276,)

In [10]: distances
Out[10]:
array([  2.53370139,   4.54291701,   6.6694065 ,   2.81813878,
         8.92487537,  10.11800281,  11.10411993,   9.23615791,
         2.05651475,   4.0588513 ,   4.97820424,   4.0700026 ,
         4.03910564,   4.0070559 ,   9.28870116,  11.98156386,
        10.68116021,  11.66869152,  12.84293061,  11.03539433,
         8.36949409,   9.15928011,  11.25178722,   2.78521357,
         4.58084922,   4.73253781,   7.10844399,   8.21826934,
         9.13028167,   8.11565138,   3.98188296,   2.04523847,

In [11]: scipy.spatial.distance.squareform(distances)
Out[11]:
array([[  0.        ,   2.53370139,   4.54291701,   6.6694065 ,
          2.81813878,   8.92487537,  10.11800281,  11.10411993,
          9.23615791,   2.05651475,   4.0588513 ,   4.97820424,
          4.0700026 ,   4.03910564,   4.0070559 ,   9.28870116,
         11.98156386,  10.68116021,  11.66869152,  12.84293061,
         11.03539433,   8.36949409,   9.15928011,  11.25178722],
       [  2.53370139,   0.        ,   2.78521357,   4.58084922,
          4.73253781,   7.10844399,   8.21826934,   9.13028167,
          8.11565138,   3.98188296,   2.04523847,   4.10992956,
          5.08350537,   5.83684597,   6.2398737 ,   7.66820932,
         10.2011846 ,   8.49081803,   9.42605887,  10.9712576 ,
          9.24797787,   7.65742836,   8.27370019,  10.12881562],
"""

MMR...

On 2017-04-02 13:41, Stephen P. Molnar wrote:
> I am trying to port a program that I wrote in FORTRAN twenty years ago
> into Python 3 and am having a hard time trying to calculate the
> Euclidean distance between each atom in the molecule and every other
> atom in the molecule.
>
> Here is a typical table of coordinates:
>
>
>       MASS         X         Y         Z
> 0   12.011 -3.265636  0.198894  0.090858
> 1   12.011 -1.307161  1.522212  1.003463
> 2   12.011  1.213336  0.948208 -0.033373
> 3   14.007  3.238650  1.041523  1.301322
> 4   12.011 -5.954489  0.650878  0.803379
> 5   12.011  5.654476  0.480066  0.013757
> 6   12.011  6.372043  2.731713 -1.662411
> 7   12.011  7.655753  0.168393  2.096802
> 8   12.011  5.563051 -1.990203 -1.511875
> 9    1.008 -2.939469 -1.327967 -1.247635
> 10   1.008 -1.460475  2.993912  2.415410
> 11   1.008  1.218042  0.451815 -2.057439
> 12   1.008 -6.255901  2.575035  1.496984
> 13   1.008 -6.560562 -0.695722  2.248982
> 14   1.008 -7.152500  0.390758 -0.864115
> 15   1.008  4.959548  3.061356 -3.139100
> 16   1.008  8.197613  2.429073 -2.588339
> 17   1.008  6.503322  4.471092 -0.543939
> 18   1.008  7.845274  1.892126  3.227577
> 19   1.008  9.512371 -0.273198  1.291080
> 20   1.008  7.147039 -1.365346  3.393778
> 21   1
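To make the "condensed" layout in the reply more concrete, here is a rough standard-library sketch of the same idea on the first three atoms of the table. It is not SciPy's implementation; the helper names 'euclidean', 'condensed_distances' and 'to_square' are invented for illustration. The pair ordering (0,1), (0,2), ..., (1,2), ... is the ordering that the pdist documentation describes.

```python
import math
from itertools import combinations

# First three atoms (X, Y, Z) from the table in the thread
points = [
    (-3.265636, 0.198894, 0.090858),
    (-1.307161, 1.522212, 1.003463),
    (1.213336, 0.948208, -0.033373),
]


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def condensed_distances(pts):
    """One distance per unordered pair: (0,1), (0,2), ..., (1,2), ..."""
    return [euclidean(p, q) for p, q in combinations(pts, 2)]


def to_square(condensed, m):
    """Expand a condensed list into a symmetric m-by-m nested list."""
    square = [[0.0] * m for _ in range(m)]
    for (i, j), d in zip(combinations(range(m), 2), condensed):
        square[i][j] = square[j][i] = d
    return square


condensed = condensed_distances(points)
square = to_square(condensed, len(points))

# m points give m * (m - 1) / 2 pairs, which is where the shape (276,)
# for the 24 atoms comes from.
n_pairs_24_atoms = 24 * 23 // 2
```

For real work the SciPy functions are the better choice; this sketch only shows what the condensed vector contains and how it maps back onto the square matrix.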
Re: [Tutor] How can I find a group of characters in a list of strings?
On 2018-07-25 20:23, Mats Wichmann wrote:
> On 07/25/2018 05:50 PM, Jim wrote:
>> Linux mint 18 and python 3.6
>>
>> I have a list of strings that contains slightly more than a million
>> items. Each item is a string of 8 capital letters like so:
>>
>> ['MIBMMCCO', 'YOWHHOY', ...]
>>
>> I need to check and see if the letters 'OFHCMLIP' are one of the items
>> in the list but there is no way to tell in what order the letters will
>> appear. So I can't just search for the string 'OFHCMLIP'. I just need to
>> locate any strings that are made up of those letters no matter their order.
>>
>> I suppose I could loop over the list and loop over each item using a
>> bunch of if statements exiting the inner loop as soon as I find a letter
>> is not in the string, but there must be a better way.
>>
>> I'd appreciate hearing about a better way to attack this.
>
> It's possible that the size of the biglist and the length of the key has
> enough performance impacts that a quicky (untested because I don't have
> your data) solution is unworkable for performance reasons. But a quicky
> might be to take these two steps:
>
> 1. generate a list of the permutations of the target
> 2. check if any member of the target-permutation-list is in the biglist.
>
> Python sets are a nice way to check membership.
>
> from itertools import permutations
> permlist = [''.join(p) for p in permutations('MIBMMCCO', 8)]
>
> if not set(permlist).isdisjoint(biglist):
>     print("Found a permutation of MIBMMCCO")
>

I would *strongly* recommend against keeping a list of all permutations of
the query string; though there are only 8! = 40320 permutations of 8
characters, suggesting anything with factorial runtime should be done only
as a last resort.

This could pretty effectively be solved by considering each string in the
list as a set of characters for query purposes, and keeping a set of those,
making membership testing constant-time. Note that the inner sets will have
to be frozensets because normal sets aren't hashable.

One caveat: a set discards repeated letters, so this treats any two strings
with the same distinct letters as a match, even if the letter counts differ.
If the counts matter, use the sorted string as the key instead of a
frozenset.

For example:

"""
In [1]: strings = ['MIBMMCCO', 'YOWHHOY']

In [2]: query = 'OFHCMLIP'

In [3]: search_db = {frozenset(s) for s in strings}

In [4]: frozenset(query) in search_db
Out[4]: False

In [5]: frozenset('MMCOCBIM') in search_db # permutation of first string
Out[5]: True
"""

MMR...
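A variation on the frozenset idea, for the case where repeated letters should count: use the sorted string as the lookup key, so that only true rearrangements share a key. This is a sketch rather than a drop-in replacement; 'anagram_key' and 'build_index' are hypothetical names, and the sample strings are the two shown in the thread.

```python
from collections import defaultdict


def anagram_key(s):
    """Canonical form shared by every rearrangement of s (repeats kept)."""
    return ''.join(sorted(s))


def build_index(strings):
    """Map each canonical key to the original strings that produce it."""
    index = defaultdict(list)
    for s in strings:
        index[anagram_key(s)].append(s)
    return index


strings = ['MIBMMCCO', 'YOWHHOY']   # sample items from the thread
index = build_index(strings)

# A rearrangement of the first string is found ...
matches = index.get(anagram_key('MMCOCBIM'), [])

# ... while a string with the same distinct letters but different counts
# is not, although the two have equal frozensets.
no_match = index.get(anagram_key('MIBCOOOO'), [])
same_letter_set = frozenset('MIBCOOOO') == frozenset('MIBMMCCO')
```

Building the index is one pass over the list, and each lookup afterwards costs one sort of an 8-character string plus a dict lookup, so it stays fast for a million items.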
Re: [Tutor] Moving a conda environment to an off-line computer
Hi Henrique-

It is quite easy to transfer an Anaconda installation from one machine to
the other by copying all of the files -- I have done this repeatedly with
cluster compute environments. It is sometimes nicer to run
`conda upgrade --all` in a local VM and then `rsync` the updated Anaconda
installation between machines, since (as you mentioned) internet access can
sometimes be an issue.

It looks like you did everything correctly, and everything is "working" as
well as you would expect. As Alan mentioned, though, it looks like the
'deepchem' package is trying to access the internet to load one of its data
sets, and this is what is failing. You could perhaps download that data set
and put it somewhere on the cluster where deepchem would know where to look
for it, to avoid having to download it, but I am completely unfamiliar with
deepchem so I can't offer any advice about how to do that.

MMR...

On 30/11/18 08:47, Henrique Castro wrote:
> Dear colleagues,
> Soon I'll start to use one of the powerful computers on my university as a
> tool in my Ph.D. The computer does not have an internet connection and I need
> to find a way to install a conda environment on it.
> At first I tried to install and set the conda environment that I need in a
> computer with internet connection and taking care to copy everything in a
> path that is similar in the off-line computer (i.e I installed everything on
> /home/henrique/bin/anaconda3 at home and tried to copy everything to
> /home/henrique/bin/anaconda3 in the off-line computer - with the same
> .bashrc) but when I run conda I get an error(it works on my home computer):
>
> (deepchem) [henrique@europio qm7] $ python qm7_ANI.py
> /home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/site-packages/sklearn/ensemble/weight_boosting.py:29:
> DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and
> should not be imported. It will be removed in a future NumPy release.
> from numpy.core.umath_tests import inner1d
> Traceback (most recent call last):
> File
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/urllib/request.py",
> line 1318, in do_open
> encode_chunked=req.has_header('Transfer-encoding'))
> File
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/http/client.py",
> line 1239, in request
> self._send_request(method, url, body, headers, encode_chunked)
> File
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/http/client.py",
> line 1285, in _send_request
> self.endheaders(body, encode_chunked=encode_chunked)
> File
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/http/client.py",
> line 1234, in endheaders
> self._send_output(message_body, encode_chunked=encode_chunked)
> File
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/http/client.py",
> line 1026, in _send_output
> self.send(msg)
> File
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/http/client.py",
> line 964, in send
> self.connect()
> File
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/http/client.py",
> line 936, in connect
> (self.host,self.port), self.timeout, self.source_address)
> File "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/socket.py",
> line 704, in create_connection
> for res in getaddrinfo(host, port, 0, SOCK_STREAM):
> File "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/socket.py",
> line 745, in getaddrinfo
> for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
> socket.gaierror: [Errno -2] Name or service not known
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
> File "qm7_ANI.py", line 15, in <module>
> featurizer='BPSymmetryFunction', split='stratified', move_mean=False)
> File
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/site-packages/deepchem/molnet/load_function/qm7_datasets.py",
> line 50, in load_qm7_from_mat
> 'http://deepchem.io.s3-website-us-west-1.amazonaws.com/datasets/qm7.mat'
> File
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/site-packages/deepchem/utils/__init__.py",
> line 85, in download_url
> urlretrieve(url, os.path.join(dest_dir, name))
> File
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/urllib/request.py",
> line 248, in urlretrieve
> with contextlib.closing(urlopen(url, data)) as fp:
> File
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/urllib/request.py",
> line 223, in urlopen
> return opener.open(url, data, timeout)
> File
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/urllib/request.py",
> line 526, in open
> response = self._open(req, data)
> File
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/urllib/request.py",
> line 544, in _open
> '_open', req)
> File
> "/home/henrique/bin/anaconda3/envs/deepchem/lib/python3.6/urllib/request.py",
> line 504,
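The idea of placing the data set on the cluster ahead of time can be sketched generically with the standard library. This is not deepchem's API, and where deepchem actually looks for its files is not known from this thread; 'ensure_local_copy' and its 'allow_download' flag are hypothetical, and the destination path would have to be whatever location the package expects.

```python
import os
import urllib.request


def ensure_local_copy(url, dest_path, allow_download=False):
    """Return dest_path if it already exists; optionally fetch it if not.

    On an off-line machine, leave allow_download=False so that a missing
    file produces a clear message instead of a network error.
    """
    if os.path.exists(dest_path):
        return dest_path
    if not allow_download:
        raise FileNotFoundError(
            "{} is missing; copy it over from a machine with internet "
            "access (source: {})".format(dest_path, url))
    # Only reached on a machine that is allowed to use the network.
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
    urllib.request.urlretrieve(url, dest_path)
    return dest_path
```

On the connected machine one would call it with allow_download=True and the data set URL shown in the traceback, then copy the resulting file across together with the Anaconda tree.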