Lists and Sublists

2007-10-23 Thread dineshv
We have a list of N (>2,000) keywords (of datatype string).  Each of
the N keywords has an associated list of names, and the lists vary in
length: keyword(1) has a list of L1 names, keyword(2) has a list of L2
names, and so on, with L1 not equal to L2, etc.  All keywords and
names are immutable.

Given a keyword(n) , we want to get hold of the associated list of Ln
names.

At any time, we also want to add keywords to the list of N keywords,
and names to any of the associated Ln lists - both of which will grow
very large.

The data will be read into the Python data structure(s) from disk
storage.

I am struggling to work out the ideal Python data structure for the
above.  Any help would be greatly appreciated.
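
For illustration, a minimal sketch of the data shape in question - a
plain dict mapping each keyword string to its list of names
(names_by_keyword, add_keyword and add_name are invented names, not
existing code):

names_by_keyword = {}

def add_keyword(keyword):
    # Register a keyword with an (initially empty) list of names.
    names_by_keyword.setdefault(keyword, [])

def add_name(keyword, name):
    # Append a name to the keyword's list, creating the list if needed.
    names_by_keyword.setdefault(keyword, []).append(name)

# Given keyword(n), the associated Ln names:
names = names_by_keyword.get('some-keyword', [])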

Dinesh



Re: To unicode or not to unicode

2009-02-22 Thread dineshv
re: "You should never have to rely on the default encoding. You should
explicitly decode and encode data."

What is the best practice 1) for doing this in Python and 2) for
unicode support?

I want to standardize on unicode and want to put into place best
Python practice so that we don't have to worry.  Thanks!
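
For reference, the pattern as I understand it: decode bytes to unicode
at the boundaries and encode explicitly on output, never relying on
the default encoding (a sketch; 'in.txt' and 'out.txt' are invented
names):

import codecs

with codecs.open('in.txt', 'r', encoding='utf-8') as f:
    text = f.read()              # text is a unicode object

# ... work with unicode internally ...

with codecs.open('out.txt', 'w', encoding='utf-8') as f:
    f.write(text)                # encoded explicitly as UTF-8 on output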

Dinesh



On Feb 19, 7:21 pm, Benjamin Peterson wrote:
> Ron Garret  flownet.com> writes:
>
> > I'm writing a little wiki that I call µWiki.  That's a lowercase Greek
> > mu at the beginning (it's pronounced micro-wiki).  It's working, except
> > that I can't actually enter the name of the wiki into the wiki itself
> > because the default unicode encoding on my Python installation is
> > "ascii".  So I'm trying to decide on a course of action.  There seem to
> > be three possibilities:
>
> You should never have to rely on the default encoding. You should explicitly
> decode and encode data.
>
> > 1.  Change the code to properly support unicode.  Preliminary
> > investigations indicate that this is going to be a colossal pain in the
> > ass.
>
> Properly handling unicode may be painful at first, but it will surely
> pay off in the future.



Python on 64-bit Windows Vista

2009-02-22 Thread dineshv
Does anyone have experience of working with Python and very large text
files (>10 GB) on 64-bit Windows Vista?

The problem is that my Python program - which performs simple data
processing on the 10 GB file - never completes; it ends with an error.
When I reduce the size of the file (<5 GB) the program works
perfectly, so it is not my code!

This is not the first time that this has happened and I'm wondering
what is it about Python and/or 64-bit Vista that causes these
inexplicable errors when processing very large text files?
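
For reference, a minimal sketch of the kind of line-by-line processing
involved (process_line and 'big.txt' are invented names, not the
actual program); streaming this way keeps memory flat regardless of
file size:

def process_line(line):
    # Placeholder for the real per-line work.
    return len(line)

total = 0
with open('big.txt', 'rb') as f:
    for line in f:               # reads one buffered line at a time
        total += process_line(line)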

Dinesh


Fast list traversal

2008-11-02 Thread dineshv
I want to see if there is an alternative method for fast list
traversal.  The code is very simple:

dict_long_lists = defaultdict(list)
for long_list in dict_long_lists.itervalues():
    for element in long_list:
        array_a[element] = m + n + p    # m, n, p are variable numbers

The long_lists are read from a defaultdict(list) dictionary and so
don't need initializing.  The elements of each long_list are integers,
ordered (sorted before being placed in the dictionary).  There are
>20,000 long_lists, each with a variable number of elements (>5,000).
The elements of each long_list are immutable (i.e. they don't change).
The above code is within a def function.

I've tried set() using defaultdict(set) but the elements are not
ordered.

What is the fastest way to traverse these long_lists sequentially from
beginning to end?  Maybe there is another data structure that can be
used instead of a list.
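
(For comparison, a variant that hoists the invariant sum out of the
loop and chains the value lists - a sketch, assuming m, n and p do not
change while the loop runs:)

from itertools import chain

s = m + n + p
for element in chain.from_iterable(dict_long_lists.itervalues()):
    array_a[element] = s    # same result, without re-adding m + n + p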

Dinesh


Re: Fast list traversal

2008-11-02 Thread dineshv
On Nov 2, 1:00 am, Dennis Lee Bieber <[EMAIL PROTECTED]> wrote:
> On Sun, 2 Nov 2008 00:25:13 -0700 (PDT), dineshv
> <[EMAIL PROTECTED]> declaimed the following in comp.lang.python:
>
> > I want to see if there is an alternative method for fast list
> > traversal.  The code is very simple:
>
> > dict_long_lists = defaultdict(list)
> > for long_list in dict_long_lists.itervalues():
> >         for element in long_list:
> >                 array_a[element] = m + n + p    # m, n, p are variable numbers
>
> > The long_list's are read from a defaultdict(list) dictionary and so
> > don't need initializing.  The elements of long_list are integers and
> > ordered (sorted before placing in dictionary).  There are > 20,000
>
>         Out of curiosity, what code is used to put the values in? The sample
> you give above is creating an empty dictionary rigged, if I understand
> the help file, to automatically give an empty list if a non-existent key
> is requested. But in your loop, there is no possibility of a
> non-existent key being requested -- .itervalues() will only traverse
> over real data (i.e. keys that DO exist in the dictionary).
>
>         And, if you are sorting a list "before placing in dictionary", why
> need the defaultdict()? A plain
>
>         dict[key] = presorted_list_of_integers
>
> would be sufficient.
>
>         Or do you mean to imply that you are using something like:
>
>         thedefaultdict[key].append(single_value)
>         thedefaultdict[key].sort()
>
> EACH time you obtain another value from where-ever? If so, that's going
> to be the biggest time sink...
>
> > What is the fastest way to traverse these long_list's sequentially
> > from the beginning to the end?  Maybe there is another data structure
> > that can be used instead of a list.
>
>         So far as I know, the list IS the fastest structure available for
> sequential processing.

dict_long_lists is a dictionary of lists and is NOT empty.  Thank you.
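
A sketch of the build pattern Dennis recommends - append values as
they arrive and sort each list once at the end, rather than sorting
after every append (incoming_pairs is an invented name, with sample
data for illustration):

from collections import defaultdict

incoming_pairs = [('a', 3), ('a', 1), ('b', 2)]   # stand-in for the real feed

dict_long_lists = defaultdict(list)
for key, value in incoming_pairs:
    dict_long_lists[key].append(value)

for long_list in dict_long_lists.itervalues():
    long_list.sort()                 # one O(n log n) sort per list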


Dictionary, integer compression

2009-04-29 Thread dineshv
If you store a large number of integers (keys and values) in a
dictionary, do the Python internals perform integer compression to
save memory and enhance performance?  Thanks

Dinesh


Re: Dictionary, integer compression

2009-04-30 Thread dineshv
Yes, "integer compression" as in Unary, Golomb, and there are a few
other schemes.

It is known that for large (integer) data sets, encoding and decoding
the integers saves space (memory and/or storage) without hurting
performance.
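
As a concrete example from that family, a minimal variable-byte
(varint) codec - not Golomb coding itself, just a sketch of the idea:

def encode_varint(n):
    # Emit 7 bits per byte; the high bit marks "more bytes follow".
    out = []
    while True:
        byte = n & 0x7f
        n >>= 7
        if n:
            out.append(chr(byte | 0x80))
        else:
            out.append(chr(byte))
            return ''.join(out)

def decode_varint(s):
    # Reassemble the 7-bit groups, least significant first.
    n = shift = 0
    for ch in s:
        b = ord(ch)
        n |= (b & 0x7f) << shift
        if not b & 0x80:
            return n
        shift += 7

assert decode_varint(encode_varint(9999999)) == 9999999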

As the Python dictionary is a built-in (and an important data
structure), I wondered if the Python internals used integer
compression for the dictionary (especially as the size of the
dictionary grew)?

Dinesh


Re: Dictionary, integer compression

2009-04-30 Thread dineshv
Hi bearophile

Thanks for that about Python 3.  My integers range from 0 to 9,999,999
and I have loads of them.  Do you think Python 3 will help?

I want to do testing on my local machine with large numbers of
integers and was wondering if I can get away with an existing Python
data structure or whether I will have to code a compression scheme.
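
One existing structure worth measuring first is the stdlib array
module, which stores integers as raw machine values rather than full
Python objects (a sketch; the 'I' typecode is typically 4 bytes per
item, enough for 0..9,999,999, though the exact width is
platform-dependent):

from array import array

a = array('I')                # unsigned ints stored as raw machine values
a.extend(xrange(1000000))     # one million integers
print a.itemsize              # bytes per integer, typically 4
print a.itemsize * len(a)     # raw payload, far smaller than a list of ints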

Dinesh


Python 2.6 for Windows 64-bit AMD

2009-05-30 Thread dineshv
I upgraded from Python 2.5.4 to Python 2.6.2 under the Windows 64-bit
AMD version, but no external libraries (e.g. pyparsing and NumPy 1.3)
work.  I noticed a few odd things:

i.  pyparsing could not find an entry for Python 2.6.2 in the Windows
Registry;
ii. Python 2.6.2 only allows per-machine installation, not a choice
between per-user and per-machine.

Can anyone shed any light on what's up with this build of Python
2.6.2?
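
For reference, a sketch of how one might check for the registry entry
pyparsing looks for, using the stdlib _winreg module (the key path
shown is the conventional per-machine location; on 64-bit Windows it
may be redirected under Wow6432Node):

import _winreg

key = _winreg.OpenKey(_winreg.HKEY_LOCAL_MACHINE,
                      r'SOFTWARE\Python\PythonCore\2.6\InstallPath')
print _winreg.QueryValue(key, None)    # the install directory, if present
_winreg.CloseKey(key)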

Dinesh