Lists and Sublists
We have a list of N (>2,000) keywords (of datatype string). Each of the N keywords has an associated list of names, of varying length. For example, keyword(1) will have a list of L1 names associated with it, keyword(2) will have a list of L2 names associated with it, and so on, with L1 not equal to L2, etc. All keywords and names are immutable.

Given a keyword(n), we want to get hold of the associated list of Ln names. At any time, we also want to add keywords to the list of N keywords, and to add names to any of the associated Ln lists - both of which will grow to very large sizes. The data will be read into the Python data structure(s) from disk storage.

I am struggling to work out what is the ideal Python data structure for the above. Any help would be greatly appreciated.

Dinesh
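A dict mapping each keyword string to a list of names seems like a natural fit: lookup by keyword is O(1) on average, and both the set of keywords and each name list can grow freely. Below is a minimal sketch, assuming the data sits on disk as (keyword, name) pairs in a two-column CSV file; the file layout and the sample keywords are made up for illustration.

    from collections import defaultdict
    import csv

    keyword_names = defaultdict(list)    # keyword -> list of names

    def load(path):
        # Read (keyword, name) pairs from disk; the CSV layout
        # is an assumption, not part of the original post.
        reader = csv.reader(open(path, 'rb'))
        for keyword, name in reader:
            keyword_names[keyword].append(name)

    # Given keyword(n), fetch its Ln names in O(1) average time:
    names = keyword_names['some keyword']

    # Both levels grow cheaply:
    keyword_names['brand new keyword']            # creates an empty list
    keyword_names['some keyword'].append('a new name')

If memory becomes a concern at very large sizes, the same string-key interface maps directly onto an on-disk store such as shelve or a dbm file.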
Re: To unicode or not to unicode
re: "You should never have to rely on the default encoding. You should explicitly decode and encode data." What is the best practice for 1) doing this in Python and 2) for unicode support ? I want to standardize on unicode and want to put into place best Python practice so that we don't have to worry. Thanks! Dinesh On Feb 19, 7:21 pm, Benjamin Peterson wrote: > Ron Garret flownet.com> writes: > > > > > I'm writing a little wiki that I call µWiki. That's a lowercase Greek > > mu at the beginning (it's pronounced micro-wiki). It's working, except > > that I can't actually enter the name of the wiki into the wiki itself > > because the default unicode encoding on my Python installation is > > "ascii". So I'm trying to decide on a course of action. There seem to > > be three possibilities: > > You should never have to rely on the default encoding. You should explicitly > decode and encode data. > > > > > 1. Change the code to properly support unicode. Preliminary > > investigations indicate that this is going to be a colossal pain in the > > ass. > > Properly handling unicode may be painful at first, but it will surely pay off > in > the future. -- http://mail.python.org/mailman/listinfo/python-list
Python on 64-bit Windows Vista
Does anyone have experience of working with Python and very large text files (>10 GB) on 64-bit Windows Vista?

The problem is that my Python program - to perform simple data processing on the 10 GB file - never completes and ends with an error. When I reduce the size of the file (<5 GB) the program works perfectly. So, it is not my code! This is not the first time that this has happened, and I'm wondering what it is about Python and/or 64-bit Vista that causes these inexplicable errors when processing very large text files?

Dinesh
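One thing worth ruling out before blaming the platform is whether the whole file is being pulled into memory at once: read() or readlines() on a 10 GB file can fail where line-by-line iteration will not. A minimal sketch of the streaming style, with the file name and the per-line work as placeholders:

    def process(path):
        # Iterating the file object reads one line at a time,
        # so memory use stays flat regardless of file size.
        count = 0
        f = open(path, 'rb')
        try:
            for line in f:
                count += 1        # replace with the real per-line work
        finally:
            f.close()
        return count

    print process('bigfile.txt')  # e.g. the >10 GB input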
Fast list traversal
I want to see if there is an alternative method for fast list traversal. The code is very simple:

    dict_long_lists = defaultdict(list)
    for long_list in dict_long_lists.itervalues():
        for element in long_list:
            array_a[element] = m + n + p    # m, n, p are variable numbers

The long_list's are read from a defaultdict(list) dictionary and so don't need initializing. The elements of long_list are integers and ordered (sorted before placing in the dictionary). There are >20,000 long_list's, each with a variable number of elements (>5,000). The elements of long_list are immutable (i.e. they don't change). The above code is within a def function.

I've tried set() using defaultdict(set), but the elements are not ordered. What is the fastest way to traverse these long_list's sequentially from the beginning to the end? Maybe there is another data structure that can be used instead of a list.

Dinesh
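Since m + n + p does not depend on the loop variables, the usual first step is to hoist it (and any repeated attribute lookups) out of the loops. A minimal sketch, assuming array_a, m, n, and p exist as in the post:

    from collections import defaultdict

    def fill(dict_long_lists, array_a, m, n, p):
        value = m + n + p                   # compute the invariant sum once
        store = array_a.__setitem__         # bind the method lookup once
        for long_list in dict_long_lists.itervalues():
            for element in long_list:
                store(element, value)

Whether the __setitem__ binding helps is worth measuring with the timeit module; hoisting the sum is a clear win either way.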
Re: Fast list traversal
On Nov 2, 1:00 am, Dennis Lee Bieber <[EMAIL PROTECTED]> wrote:
> On Sun, 2 Nov 2008 00:25:13 -0700 (PDT), dineshv
> <[EMAIL PROTECTED]> declaimed the following in comp.lang.python:
>
> > I want to see if there is an alternative method for fast list
> > traversal. The code is very simple:
> >
> >     dict_long_lists = defaultdict(list)
> >     for long_list in dict_long_lists.itervalues():
> >         for element in long_list:
> >             array_a[element] = m + n + p    # m, n, p are variable numbers
> >
> > The long_list's are read from a defaultdict(list) dictionary and so
> > don't need initializing. The elements of long_list are integers and
> > ordered (sorted before placing in dictionary). There are > 20,000
>
> Out of curiosity, what code is used to put the values in? The sample
> you give above is creating an empty dictionary rigged, if I understand
> the help file, to automatically give an empty list if a non-existent key
> is requested. But in your loop, there is no possibility of a
> non-existent key being requested -- .itervalues() will only traverse
> over real data (ie; keys that DO exist in the dictionary).
>
> And, if you are sorting a list "before placing in dictionary", why
> need the defaultdict()? A plain
>
>     dict[key] = presorted_list_of_integers
>
> would be sufficient.
>
> Or do you mean to imply that you are using something like:
>
>     thedefaultdict[key].append(single_value)
>     thedefaultdict[key].sort()
>
> EACH time you obtain another value from where-ever? If so, that's going
> to be the biggest time sink...
>
> > What is the fastest way to traverse these long_list's sequentially
> > from the beginning to the end? Maybe there is another data structure
> > that can be used instead of a list.
>
> So far as I know, the list IS the fastest structure available for
> sequential processing.
> --
> Wulfraed Dennis Lee Bieber KD6MOG
> [EMAIL PROTECTED] [EMAIL PROTECTED]
> HTTP://wlfraed.home.netcom.com/
> (Bestiaria Support Staff: [EMAIL PROTECTED])
> HTTP://www.bestiaria.com/

dict_long_lists is a dictionary of lists and is NOT empty.

Thank-you
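To make the reply's point concrete: appending and re-sorting on every insert does O(n log n) work each time, while appending freely and sorting each list once after loading does that work only once per list. A small sketch with made-up keys and values:

    from collections import defaultdict

    d = defaultdict(list)

    # The pattern the reply warns about (a sort per insert):
    #     d[key].append(value)
    #     d[key].sort()

    # Cheaper: append everything first...
    for key, value in [('a', 3), ('a', 1), ('b', 2)]:
        d[key].append(value)

    # ...then sort each list exactly once at the end.
    for long_list in d.itervalues():
        long_list.sort()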
Dictionary, integer compression
If you store a large number of integers (keys and values) in a dictionary, do the Python internals perform integer compression to save memory and enhance performance?

Thanks

Dinesh
Re: Dictionary, integer compression
Yes, "integer compression" as in Unary, Golomb, and there are a few other schemes. It is known that for large (integer) data sets, encoding and decoding the integers will save space (memory and/or storage) and doesn't impact performance. As the Python dictionary is a built-in (and an important data structure), I wondered if the Python internals used integer compression for the dictionary (especially as the size of the dictionary grew)? Dinesh -- http://mail.python.org/mailman/listinfo/python-list
Re: Dictionary, integer compression
Hi bearophile,

Thanks for that about Python 3. My integers range from 0 to 9,999,999 and I have loads of them. Do you think Python 3 will help?

I want to do testing on my local machine with the large numbers of integers and was wondering if I can get away with an existing Python data structure or will I have to code a compression scheme.

Dinesh
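Before writing a custom compression scheme, it may be worth checking whether the standard array module is compact enough: values from 0 to 9,999,999 fit comfortably in a 4-byte machine int, versus a full Python object per integer in a list or dict. A minimal sketch with made-up values:

    from array import array

    a = array('i', [0, 42, 9999999])   # contiguous signed machine ints
    a.append(123456)
    print a.itemsize                   # typically 4 bytes per element
    print a[2], len(a)                 # 9999999 4

The trade-off is that an array gives compact sequential storage, not keyed lookup; for key-value access the integers would still live in a dict.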
Python 2.6 for Windows 64-bit AMD
I upgraded from Python 2.5.4 to Python 2.6.2 under the Windows 64-bit AMD version, but no external libraries (e.g. pyparsing and NumPy 1.3) work. I noticed a few odd things:

i. pyparsing could not find an entry for Python 2.6.2 in the Windows Registry.
ii. Python 2.6.2 only allows per-machine installation instead of per-user and per-machine.

Can anyone shed any light on what's up with this build of Python 2.6.2?

Dinesh
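One way to check point (i) directly is to look for the CPython registry key that extension installers consult. A hedged sketch (Python 2, Windows only; the exact key a given installer reads may differ):

    import _winreg

    key_path = r'Software\Python\PythonCore\2.6\InstallPath'
    for hive in (_winreg.HKEY_CURRENT_USER, _winreg.HKEY_LOCAL_MACHINE):
        try:
            key = _winreg.OpenKey(hive, key_path)
            print _winreg.QueryValue(key, None)   # the install directory
            key.Close()
        except WindowsError:
            print 'no entry in this hive'

A per-machine-only install writes its key under HKEY_LOCAL_MACHINE, which is consistent with point (ii); an installer that looks only under HKEY_CURRENT_USER would then fail to find 2.6.2.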
