date:20101221

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Stefan Behnel

[note that this has also been posted to comp.lang.python and discussed 
separately over there]


Steven D'Aprano, 20.12.2010 22:19:
> ashish makani wrote:
>> Goal : I am trying to parse a ginormous ( ~ 1gb) xml file.
>
> I sympathize with you. I wonder who thought that building a 1GB XML file
> was a good thing.
>
> Forget about using any XML parser that reads the entire file into memory.
> By the time that 1GB of text is read and parsed, you will probably have
> something about 6-8GB (estimated) in size.


The in-memory size is highly dependent on the data, specifically the 
text-to-structure ratio. If it's a lot of text content, the difference to 
the serialised tree will be small. If it's a lot of structure with tiny 
bits of text content, the in-memory size of the tree will be a lot larger.




>> I am guessing, as this happens (over the course of 20-30 mins), the tree
>> representing is being slowly built in memory, but even after 30-40 mins,
>> nothing happens.
>
> It's probably not finished. Leave it another hour or so and you'll get an
> out of memory error.


Right, if it gets into wild swapping, it can slow down almost to a halt, 
even though the XML parsing itself tends to have pretty good memory 
locality (but the ever growing in-memory tree obviously doesn't).




>> 4. I then investigated some streaming libraries, but am confused - there is
>> SAX[http://en.wikipedia.org/wiki/Simple_API_for_XML] , the iterparse
>> interface[http://effbot.org/zone/element-iterparse.htm], & several other
>> options ( minidom)
>>
>> Which one is the best for my situation ?
>
> You absolutely need to use a streaming library. element-iterparse still
> builds the tree, so that's no use to you.


Wrong. iterparse() allows you to cut branches in the tree while it's 
growing, that's exactly what it's there for.
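
A minimal sketch of the "cut branches while the tree is growing" pattern, written for Python 3 and the standard library's xml.etree.ElementTree. The function name count_records, the tag name "item" and the in-memory sample are assumptions made for illustration; they are not taken from the original poster's file.

```python
import io
import xml.etree.ElementTree as ET


def count_records(source, tag):
    """Count <tag> elements without keeping the whole tree in memory."""
    count = 0
    # Ask for "start" events too, so the root element is available
    # before any of its children have been parsed.
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1      # a real program would process elem here
            root.clear()    # drop the children parsed so far from the tree
    return count


# Hypothetical sample data standing in for the real 1GB file.
sample = io.BytesIO(b"<root><item>a</item><item>b</item><other/></root>")
print(count_records(sample, "item"))
```

For a real file, source can be a file name or an open binary file object. Clearing the root after each record keeps memory use roughly flat, on the assumption that the records of interest are reasonably small subtrees.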




> I believe you should use SAX or
> minidom, but that's about my limit of knowledge of streaming XML parsers.


"minidom" is even worse advice than SAX: SAX would at least solve the 
problem, whereas minidom wouldn't because of its intolerable memory 
requirements.
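
For comparison, a rough sketch of the SAX route using the standard library's xml.sax under Python 3. The handler class ItemHandler, the helper collect_text and the tag name "item" are hypothetical names chosen for this example. SAX builds no tree at all, so the handler has to track by hand which element it is currently inside.

```python
import io
import xml.sax


class ItemHandler(xml.sax.ContentHandler):
    """Collect the text content of every element with a given tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.texts = []
        self._buffer = None   # a list while inside a target element, else None

    def startElement(self, name, attrs):
        if name == self.tag:
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == self.tag and self._buffer is not None:
            self.texts.append("".join(self._buffer))
            self._buffer = None


def collect_text(source, tag):
    handler = ItemHandler(tag)
    xml.sax.parse(source, handler)
    return handler.texts


# Hypothetical sample data; a real run would pass a file name or file object.
sample = io.BytesIO(b"<root><item>a</item><item>b</item><other>c</other></root>")
print(collect_text(sample, "item"))
```

On a file of the size discussed here, the handler would more likely process each record as it completes rather than append everything to a list; the list is only there to keep the sketch short.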


Stefan

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 3:44 AM, Stefan Behnel  wrote:
> [note that this has also been posted to comp.lang.python and discussed
> separately over there]
>
> Steven D'Aprano, 20.12.2010 22:19:
>>
>> ashish makani wrote:
>>
>>> Goal : I am trying to parse a ginormous ( ~ 1gb) xml file.
>>
>> I sympathize with you. I wonder who thought that building a 1GB XML file
>> was a good thing.

David Mertz, Ph.D.
Comparator, Gnosis Software, Inc.
June 2003

http://gnosis.cx/publish/programming/xml_matters_29.html


that was just the first listing:

http://www.google.com/search?client=ubuntu&channel=fs&q=parsing+gigabyte+xml+python&ie=utf-8&oe=utf-8




>>
>> Forget about using any XML parser that reads the entire file into memory.
>> By the time that 1GB of text is read and parsed, you will probably have
>> something about 6-8GB (estimated) in size.
>
> The in-memory size is highly dependent on the data, specifically the
> text-to-structure ratio. If it's a lot of text content, the difference to
> the serialised tree will be small. If it's a lot of structure with tiny bits
> of text content, the in-memory size of the tree will be a lot larger.
>
>
>>> I am guessing, as this happens (over the course of 20-30 mins), the tree
>>> representing is being slowly built in memory, but even after 30-40 mins,
>>> nothing happens.
>>
>> It's probably not finished. Leave it another hour or so and you'll get an
>> out of memory error.
>
> Right, if it gets into wild swapping, it can slow down almost to a halt,
> even though the XML parsing itself tends to have pretty good memory locality
> (but the ever growing in-memory tree obviously doesn't).
>
>
>>> 4. I then investigated some streaming libraries, but am confused - there
>>> is
>>> SAX[http://en.wikipedia.org/wiki/Simple_API_for_XML] , the iterparse
>>> interface[http://effbot.org/zone/element-iterparse.htm], & several other
>>> options ( minidom)
>>>
>>> Which one is the best for my situation ?
>>
>> You absolutely need to use a streaming library. element-iterparse still
>> builds the tree, so that's no use to you.
>
> Wrong. iterparse() allows you to cut branches in the tree while it's
> growing, that's exactly what it's there for.
>
>
>> I believe you should use SAX or
>> minidom, but that's about my limit of knowledge of streaming XML parsers.
>
> With "minidom" being an advice that's even worse than SAX - SAX would at
> least solve the problem, whereas minidom wouldn't because of its intolerable
> memory requirements.
>
> Stefan
>



-- 
They're installing the breathalyzer on my email account next week.

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Stefan Behnel


Chris Fuller, 21.12.2010 03:27:
> This isn't XML, it's an abomination of XML.  Best to not treat it as XML.
> Good thing you're only after one class of tags.  Here's what I'd do.  I'll
> give a general solution, but there are two parameters / four cases that could
> make the code simpler, I'll just point them out at the end.
>
> Iterate over the file descriptor, reading in line-by-line.  This will be slow
> on a huge file, but probably not so bad if you're only doing it once.


Note that it's not unlikely that this is actually *slower* than using a 
real XML parser:


http://effbot.org/zone/celementtree.htm#benchmarks

Stefan


Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

But then again, maybe it's too much of an optimization for someone not
optimizing for others or a specific application for the hardware, or
it's not part of the standard python library, and therefore,
expendable.

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 3:52 AM, Stefan Behnel  wrote:
> Chris Fuller, 21.12.2010 03:27:
>>
>> This isn't XML, it's an abomination of XML.  Best to not treat it as XML.
>> Good thing you're only after one class of tags.  Here's what I'd do.  I'll
>> give a general solution, but there are two parameters / four cases that
>> could
>> make the code simpler, I'll just point them out at the end.
>>
>> Iterate over the file descriptor, reading in line-by-line.  This will be
>> slow
>> on a huge file, but probably not so bad if you're only doing it once.
>
> Note that it's not unlikely that this is actually *slower* than using a real
> XML parser:
>

Or a 'real' language like C or C++ maybe to increase, or in Python's
case, bypass, the interpreter?


> http://effbot.org/zone/celementtree.htm#benchmarks
>

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 3:55 AM, David Hutto  wrote:
> On Tue, Dec 21, 2010 at 3:52 AM, Stefan Behnel  wrote:
>> Chris Fuller, 21.12.2010 03:27:
>>>
>>> This isn't XML, it's an abomination of XML.  Best to not treat it as XML.
>>> Good thing you're only after one class of tags.  Here's what I'd do.  I'll
>>> give a general solution, but there are two parameters / four cases that
>>> could
>>> make the code simpler, I'll just point them out at the end.
>>>
>>> Iterate over the file descriptor, reading in line-by-line.  This will be
>>> slow
>>> on a huge file, but probably not so bad if you're only doing it once.
>>
>> Note that it's not unlikely that this is actually *slower* than using a real
>> XML parser:
>>
>
> Or a 'real' language like C or C++ maybe to increase, or in Python's
> case, bypass, the interpreter?


Which is *faster*.
>
>
>> http://effbot.org/zone/celementtree.htm#benchmarks
>>
>




Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

And from what I recall XML is intended for data transfer in respect to
HTML(from a recent brushup, nothing more), so not having used it, it
sure has been displayed as a data transfer mechanism, I remember this
from using Joomla's framework, and the xml files for menus I think.

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Stefan Behnel


David Hutto, 21.12.2010 09:49:
>> Steven D'Aprano, 20.12.2010 22:19:
>>> ashish makani wrote:
>>>> Goal : I am trying to parse a ginormous ( ~ 1gb) xml file.
>>>
>>> I sympathize with you. I wonder who thought that building a 1GB XML file
>>> was a good thing.
>
> http://gnosis.cx/publish/programming/xml_matters_29.html


Fredrik Lundh's cElementTree page has a benchmark for that, too. It's 
actually slower than cElementTree for the case he tested (which was 
basically "parsing" :)


http://effbot.org/zone/celementtree.htm#benchmarks

Stefan


Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 3:59 AM, David Hutto  wrote:
> And from what I recall XML is intended for data transfer in respect to
> HTML(from a recent brushup, nothing more),

Apologies that is browser based transfer, (not sure what more,
although I think it means any data transfer)

 so not having used it, it
> sure has been displayed as a data transfer mechanism, I remember this
> from using Joomla's framework, and the xml files for menus I think.
>

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Stefan Behnel


David Hutto, 21.12.2010 09:55:
> On Tue, Dec 21, 2010 at 3:52 AM, Stefan Behnel wrote:
>> Chris Fuller, 21.12.2010 03:27:
>>> This isn't XML, it's an abomination of XML.  Best to not treat it as XML.
>>> Good thing you're only after one class of tags.  Here's what I'd do.  I'll
>>> give a general solution, but there are two parameters / four cases that
>>> could
>>> make the code simpler, I'll just point them out at the end.
>>>
>>> Iterate over the file descriptor, reading in line-by-line.  This will be
>>> slow
>>> on a huge file, but probably not so bad if you're only doing it once.
>>
>> Note that it's not unlikely that this is actually *slower* than using a real
>> XML parser:
>
> Or a 'real' language like C or C++ maybe to increase, or in Python's
> case, bypass, the interpreter?


While this may be a little faster than Python code (although I suspect that 
benchmarking is needed to prove either way), I doubt that it's worth the 
overhead in code writing. If I can write a couple of lines of Python code 
that are easy to validate and almost as fast as C code, why would I want to 
write and debug hundreds of lines of code in C or C++, just to see that I 
need to tune my benchmark to notice the difference?


But then, people even write XML handling code in Java, where neither 
performance nor code size is a suitable argument.


Stefan


Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

.

>>>> I sympathize with you. I wonder who thought that building a 1GB XML file
>>>> was a good thing.

If it is:

XML stands for eXtensible Markup Language.

XML is designed to transport and store data.


Then what other file medium would you suggest as the tagging means.

You have a file with tags, you can't parse and store the data in any
file anymore than the next, right?

So the tags and how they are marked by any module or file extension
searcher shouldn't matter, right?

>>
>> http://gnosis.cx/publish/programming/xml_matters_29.html
>
> Fredrik Lundh's cElementTree page has a benchmark for that, too. It's
> actually slower than cElementTree for the case he tested (which was
> basically "parsing" :)
>
> http://effbot.org/zone/celementtree.htm#benchmarks
>
> Stefan
>

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 4:10 AM, Stefan Behnel  wrote:
> David Hutto, 21.12.2010 09:55:
>>
>> On Tue, Dec 21, 2010 at 3:52 AM, Stefan Behnel wrote:
>>>
>>> Chris Fuller, 21.12.2010 03:27:

>>>> This isn't XML, it's an abomination of XML.  Best to not treat it as
>>>> XML.
>>>> Good thing you're only after one class of tags.  Here's what I'd do.
>>>> I'll
>>>> give a general solution, but there are two parameters / four cases that
>>>> could
>>>> make the code simpler, I'll just point them out at the end.
>>>>
>>>> Iterate over the file descriptor, reading in line-by-line.  This will be
>>>> slow
>>>> on a huge file, but probably not so bad if you're only doing it once.
>>>
>>> Note that it's not unlikely that this is actually *slower* than using a
>>> real
>>> XML parser:
>>
>> Or a 'real' language like C or C++ maybe to increase, or in Python's
>> case, bypass, the interpreter?
>
> While this may be a little faster than Python code (although I suspect that
> benchmarking is needed to prove either way), I doubt that it's worth the
> overhead in code writing. If I can write a couple of lines of Python code
> that are easy to validate and almost as fast as C code, why would I want to
> write and debug hundreds of lines of code in C or C++, just to see that I
> need to tune my benchmark to notice the difference?

Don't get me wrong, I love the simplicity too, but if you know you
really do need it along the way, then you should start thinking ahead
of the easy, and toward the harder code for your project. Just as
every language has its place, so does Python.


>
> But then, people even write XML handling code in Java, where neither
> performance nor code size is a suitable argument.
>
> Stefan
>

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 4:17 AM, David Hutto  wrote:
> On Tue, Dec 21, 2010 at 4:10 AM, Stefan Behnel  wrote:
>> David Hutto, 21.12.2010 09:55:
>>>
>>> On Tue, Dec 21, 2010 at 3:52 AM, Stefan Behnel wrote:

>>>> Chris Fuller, 21.12.2010 03:27:
>>>>>
>>>>> This isn't XML, it's an abomination of XML.  Best to not treat it as
>>>>> XML.
>>>>> Good thing you're only after one class of tags.  Here's what I'd do.
>>>>> I'll
>>>>> give a general solution, but there are two parameters / four cases that
>>>>> could
>>>>> make the code simpler, I'll just point them out at the end.
>>>>>
>>>>> Iterate over the file descriptor, reading in line-by-line.  This will be
>>>>> slow
>>>>> on a huge file, but probably not so bad if you're only doing it once.
>>>>
>>>> Note that it's not unlikely that this is actually *slower* than using a
>>>> real
>>>> XML parser:
>>>
>>> Or a 'real' language like C or C++ maybe to increase, or in Python's
>>> case, bypass, the interpreter?
>>
>> While this may be a little faster than Python code (although I suspect that
>> benchmarking is needed to prove either way), I doubt that it's worth the
>> overhead in code writing. If I can write a couple of lines of Python code
>> that are easy to validate and almost as fast as C code, why would I want to
>> write and debug hundreds of lines of code in C or C++, just to see that I
>> need to tune my benchmark to notice the difference?
>
> Don't get me wrong, I love the simplicity too, but if you know you
> really do need it along the way, then you should start thinking ahead
> of the easy, and toward the harder code for your project. Just as
> every language has it's place, so does Python.

If I want to write a programming language, it might not be the best
idea to have a language needed for speed based on Python, I should
maybe use what it's based on, or refine my own optimizations, just to
be a little clearer about my perspective.


>
>
>>
>> But then, people even write XML handling code in Java, where neither
>> performance nor code size is a suitable argument.
>>
>> Stefan
>>

[Tutor] Hangman game.....problem putting strings in a list.....

2010-12-21 Thread Yasin Yaqoobi


global line
global index;
guessed = ["-"];
count = 0;
wrong = 0;

def guess(letter):
    global guessed
    if (letter in line):
        index = line.index(letter);
        print guessed;

        # This is the line that gives me the error don't know why?
        # guessed[index] = " " + (letter);
        # TypeError: 'str' object does not support item assignment

        guessed[index] = (letter);
        print ' '.join(guessed)
    else:
        global wrong;
        wrong += 1;


def draw(number):
    if (number == 1):
        print "O ";
    elif(number == 2):
        print "O ";
        print "| ";
    elif (number == 3):
        print "O ";
        print "   \| ";

    elif (number == 4):
        print "O  ";
        print "   \|/ ";
    elif (number == 5):
        print "O  ";
        print "   \|/ ";
        print "|  ";
    elif (number == 6):
        print "O  ";
        print "   \|/ ";
        print "|  ";
        print "   /   ";
    elif (number == 7):
        print "O  ";
        print "   \|/ ";
        print "|  ";
        print "   / \ ";
        print "Sorry you Lost! "

def doit():
    global count
    while(wrong != 7):
        a_letter = raw_input("Pick a letter --> ")
        print
        guess(a_letter);
        draw(wrong);
        print
        count += 1

def initArray():
    global guessed
    print line
    guessed =  guessed[0] * (len(line)-1)
    print "this is new list " + guessed;


while 1:
    line = file.readline();
    if (len(line) >= 5):
        initArray()
        doit();
        break
    if not line: break

file.close()
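
The TypeError is consistent with what initArray() does in the listing: guessed[0] is the string "-", so guessed[0] * (len(line)-1) produces one long str, and guessed stops being a list. Strings are immutable, which is why guessed[index] = ... then fails. A minimal sketch of one possible fix follows, written in Python 3 syntax; the helper names init_guessed and reveal and the sample word are assumptions for illustration, not part of the original program.

```python
def init_guessed(word):
    # Multiply a one-element *list*, not a string, so the result stays
    # a mutable list with one placeholder per letter.
    return ["-"] * len(word)


def reveal(guessed, word, letter):
    # Replace the placeholder at every position where the letter occurs.
    for i, ch in enumerate(word):
        if ch == letter:
            guessed[i] = letter
    return guessed


word = "python"                  # hypothetical word; the original reads one from a file
guessed = init_guessed(word)
reveal(guessed, word, "t")
print(" ".join(guessed))         # - - t - - -
```

Using enumerate() also handles words with repeated letters, which line.index(letter) alone would not, since index() only returns the first match.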


Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Stefan Behnel


Hi,

I wonder why you reply to my e-mail without replying to what I wrote in it.


David Hutto, 21.12.2010 10:12:
> .
>
>>>>> I sympathize with you. I wonder who thought that building a 1GB XML file
>>>>> was a good thing.

This was written by Steven D'Aprano.


> If it is:
>
> XML stands for eXtensible Markup Language.
>
> XML is designed to transport and store data.
>
> Then what other file medium would you suggest as the tagging means.


There are different file formats for structured and semi-structured data. 
XML certainly isn't the only one, and people have been defining specific 
formats for their specific use cases for ages, for better or worse each time.


Personally, I don't think GB-sized XML files are bad per-se. It depends on 
the use case, and it depends on what's considered a suitable solution in a 
given environment. Also note that XML tends to compress pretty well, and 
that it's sometimes faster to parse gzipped XML than uncompressed XML. So 
the serialised file size by itself isn't an argument, either.
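
As a rough illustration of parsing gzipped XML directly, the sketch below feeds a gzip stream straight into iterparse() from the Python 3 standard library, so the uncompressed document never has to exist on disk. The function name count_tags_gzipped and the sample data are hypothetical, and the sketch makes no claim about relative speed; that would need measuring on the actual data.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def count_tags_gzipped(source, tag):
    """Count <tag> elements in a gzip-compressed XML file or file object."""
    count = 0
    # gzip.open() accepts a file name or an existing binary file object.
    with gzip.open(source, "rb") as stream:
        for _, elem in ET.iterparse(stream):   # default: "end" events only
            if elem.tag == tag:
                count += 1
            elem.clear()   # discard text and children once an element is done
    return count


# Hypothetical in-memory sample standing in for a large .xml.gz file.
sample = io.BytesIO(gzip.compress(b"<root><item>a</item><item>b</item></root>"))
print(count_tags_gzipped(sample, "item"))
```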




> You have a file with tags, you can't parse and store the data in any
> file anymore than the next, right?
>
> So the tags and how they are marked by any module or file extension
> searcher shouldn't matter, right?


I don't think I can extract the intended meaning from the assembled words 
you use here.


Stefan


Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

File = string

going through string code

finding pieces of the string and marking the territory.


I don't see 'real' optimization other than rolling your own.

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Stefan Behnel


David Hutto, 21.12.2010 10:19:
> On Tue, Dec 21, 2010 at 4:17 AM, David Hutto wrote:
>> On Tue, Dec 21, 2010 at 4:10 AM, Stefan Behnel wrote:
>>>>> Note that it's not unlikely that this is actually *slower* than using a
>>>>> real XML parser:
>>>>
>>>> Or a 'real' language like C or C++ maybe to increase, or in Python's
>>>> case, bypass, the interpreter?
>>>
>>> While this may be a little faster than Python code (although I suspect that
>>> benchmarking is needed to prove either way), I doubt that it's worth the
>>> overhead in code writing. If I can write a couple of lines of Python code
>>> that are easy to validate and almost as fast as C code, why would I want to
>>> write and debug hundreds of lines of code in C or C++, just to see that I
>>> need to tune my benchmark to notice the difference?
>>
>> Don't get me wrong, I love the simplicity too, but if you know you
>> really do need it along the way, then you should start thinking ahead
>> of the easy, and toward the harder code for your project. Just as
>> every language has its place, so does Python.


Premature optimisation is the root of all evil. That totally applies when 
choosing a programming language.




> If I want to write a programming language, it might not be the best
> idea to have a language needed for speed based on Python, I should
> maybe use what it's based on, or refine my own optimizations, just to
> be a little clearer about my perspective.


Being clearer would certainly help in understanding your postings.

Stefan


Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 4:28 AM, Stefan Behnel  wrote:
> Hi,
>
> I wonder why you reply to my e-mail without replying to what I wrote in it.
>
>
> David Hutto, 21.12.2010 10:12:
>>
>> .
>>
>> I sympathize with you. I wonder who thought that building a 1GB XML
>> file
>> was a good thing.
>
> This was written by Steven D'Aprano.
>

My bad, human parsing has errors too.

>
>> If it is:
>>
>> XML stands for eXtensible Markup Language.
>>
>> XML is designed to transport and store data.
>>
>>
>> Then what other file medium would you suggest as the tagging means.
>
> There are different file formats for structured and semi-structured data.
> XML certainly isn't the only one, and people have been defining specific
> formats for their specific use cases for ages, for better or worse each
> time.

But it's all a string of coded text with only the formats that define
the markups within though.

String format + text in file(type of coding for lang)



>
> Personally, I don't think GB-sized XML files are bad per-se. It depends on
> the use case, and it depends on what's considered a suitable solution in a
> given environment. Also note that XML tends to compress pretty well, and
> that it's sometimes faster to parse gzipped XML than uncompressed XML. So
> the serialised file size by itself isn't an argument, either.

So the zipped file in compressed doesn't contain compressed tags, or
data, then why is it compressed?

>
>
>> You have a file with tags, you can't parse and store the data in any
>> file anymore than the next, right?
>>
>> So the tags and how they are marked by any module or file extension
>> searcher shouldn't matter, right?
>
The phrase:
 in a php file
 in a xml file
 in an html file.

if read in any file it's the same, as


How does the file extension make it any longer?
This is no matter how it's interpreted by any other mechanism than
just reading the text within, right?

> I don't think I can extract the intended meaning from the assembled words
> you use here.
>
> Stefan
>

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Stefan Behnel


David Hutto, 21.12.2010 10:29:
> File = string
>
> going through string code
>
> finding pieces of the string and marking the territory.
>
> I don't see 'real' optimization other than rolling your own.


Reads like a Haiku. Doesn't quite fit the verse, though.

From your behaviour, I get the impression that you are just trolling.

Stefan


Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 4:34 AM, Stefan Behnel  wrote:
> David Hutto, 21.12.2010 10:19:
>>
>> On Tue, Dec 21, 2010 at 4:17 AM, David Hutto wrote:
>>>
>>> On Tue, Dec 21, 2010 at 4:10 AM, Stefan Behnel wrote:
>>>>>>
>>>>>> Note that it's not unlikely that this is actually *slower* than using
>>>>>> a
>>>>>> real XML parser:
>>>>>
>>>>> Or a 'real' language like C or C++ maybe to increase, or in Python's
>>>>> case, bypass, the interpreter?
>>>>
>>>> While this may be a little faster than Python code (although I suspect
>>>> that
>>>> benchmarking is needed to prove either way), I doubt that it's worth the
>>>> overhead in code writing. If I can write a couple of lines of Python
>>>> code
>>>> that are easy to validate and almost as fast as C code, why would I want
>>>> to
>>>> write and debug hundreds of lines of code in C or C++, just to see that
>>>> I
>>>> need to tune my benchmark to notice the difference?
>>>
>>> Don't get me wrong, I love the simplicity too, but if you know you
>>> really do need it along the way, then you should start thinking ahead
>>> of the easy, and toward the harder code for your project. Just as
>>> every language has its place, so does Python.
>
> Premature optimisation is the root of all evil. That totally applies when
> choosing a programming language.

Not premature design, but being pre mature when selecting. Do you
utilize python for aviation? you could, but modeling would be better.
However, you'll just have to learn another language in order to
optimize the end means.

I know python has its own optimizations, but it's still interpreting
to the command line. A dog fight between two f-15's would certainly
notice the response rate when pulling out of a intersect course.

>
>
>> If I want to write a programming language, it might not be the best
>> idea to have a language needed for speed based on Python, I should
>> maybe use what it's based on, or refine my own optimizations, just to
>> be a little clearer about my perspective.
>
> Being clearer would certainly help in understanding your postings.

What's the difference between a language based on C++ Python, and C?

If I used any of the above to begin writing my own language, which of
the above would be faster(other languages aside) to begin with?

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Alan Gauld


"David Hutto"  wrote

>> And from what I recall XML is intended for data transfer in respect to
>> HTML(from a recent brushup, nothing more),
>
> Apologies that is browser based transfer,


I'm not sure what that last bit means.
XML is a self-describing data format. It is usually used for files
but can be used in data streams or in-memory strings.

Its natural competitors are TLV (Tag, Length, Value) and
CSV (Comma Separated Value) files but neither is as rich
in structure.  Alternative options include ASN.1, Edifact and
IDL but these are not self-describing(*) (although they are all
more compact and faster to parse, but only IDL is free.)


>> sure has been displayed as a data transfer mechanism,


You don't have to use it for data transfer - eg MS's use
as a document storage format in Office - but frankly if
you use XML to store large volumes of data you are mad,
a database is a much more sensible option being far more
space efficient and faster to work with.

(*)ASN.1, IDL etc all rely on a shared definition, and
often shared code library, at both sender and receiver.
The library is a compiled version of the data definition
which enables complex data structures to be read from
the file in a single chunk very efficiently.

HTH,


--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/



Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Alan Gauld



"David Hutto"  wrote

>> Note that it's not unlikely that this is actually *slower* than using a real
>> XML parser:
>
> Or a 'real' language like C or C++ maybe to increase, or in Python's
> case, bypass, the interpreter?


Most of the Python xml parsers are written in C - many use the
industry standard expat parser - so converting to C would bring
minimal speed advantage and a lot of extra work.
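
To make the expat point concrete, here is a small sketch that drives the C-implemented expat parser directly through the standard library's xml.parsers.expat module (Python 3). The function name count_start_tags and the sample document are assumptions for illustration. Only the tiny callback runs as Python code; the tokenising and well-formedness checking happen in C.

```python
import xml.parsers.expat


def count_start_tags(data, tag):
    """Count opening tags named `tag` in an XML document given as bytes."""
    count = 0

    def on_start(name, attrs):
        # Called by expat for every opening tag it encounters.
        nonlocal count
        if name == tag:
            count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(data, True)     # True marks this as the final chunk
    return count


# Hypothetical sample document.
print(count_start_tags(b"<root><item/><item/><other/></root>", "item"))
```

For a large file, parser.ParseFile(fileobj) reads from an open binary file in chunks instead of taking the whole document as one bytes object.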

Alan G. 




Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 4:43 AM, Stefan Behnel  wrote:
> David Hutto, 21.12.2010 10:29:
>>
>> File = string

A file is a string of characters encoded in its format

>>
>> going through string code

Code that goes through the file format and the encoding

>>
>> finding pieces of the string and marking the territory.

What sequence of characters qualifies for a target select

>>
>>
>> I don't see 'real' optimization other than rolling your own.

encoding + characters in file + what's where in the file




>
> Reads like a Haiku. Doesn't quite fit the verse, though.
>
> From your behaviour, I get the impression that you are just trolling.

The water always flows freely down here, Plus the traffics light.


>
> Stefan
>

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Alan Gauld



"David Hutto"  wrote

I sympathize with you. I wonder who thought that building a 1GB XML file
was a good thing.



that was just the first listing:

http://www.google.com/search?client=ubuntu&channel=fs&q=parsing+gigabyte+xml+python&ie=utf-8&oe=utf-8


Eeek! One of the listings says:


22 Jan 2009 ... Stripping Illegal Characters from XML in Python >>

... I'd be asking Python to process 6.4 gigabytes of CSV into
6.5 gigabytes of XML 1. . In fact, what happened was that
the parsing didn't work and the whole db was ...

And I thought a 1G file was extreme... Do these people stop to think that
with XML as much as 80% of their "data" is just description (ie the tags).


Alan G. 




[Tutor] Writing a programming language in Python (was: Trying to parse a HUGE(1gb) xml file in python)

2010-12-21 Thread Stefan Behnel


David Hutto, 21.12.2010 10:46:

On Tue, Dec 21, 2010 at 4:34 AM, Stefan Behnel wrote:

David Hutto, 21.12.2010 10:19:

If I want to write a programming language, it might not be the best
idea to have a language needed for speed based on Python; I should
maybe use what it's based on, or refine my own optimizations, just to
be a little clearer about my perspective.


Being clearer would certainly help in understanding your postings.


What's the difference between a language based on C++, Python, and C?

If I used any of the above to begin writing my own language, which of
the above would be faster(other languages aside) to begin with?


Certainly Python. The Cython compiler is written in Python, for example. It 
translates Python to C. It would have been a lot less fun to write it in C, 
and it would totally have made the project progress a lot slower.


PyPy and ShedSkin are other famous examples to name here, even though both 
are written in restricted versions of Python (not sure about ShedSkin, but 
if it compiles itself, it must be).


Seriously, if you want to write a programming language, the best advice I 
can give is to write it in Python.


Stefan


Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 4:46 AM, Alan Gauld  wrote:
> "David Hutto"  wrote
>
>>> And from what I recall XML is intended for data transfer in respect to
>>> HTML(from a recent brushup, nothing more),
>>
>> Apologies that is browser based transfer,
>
> I'm not sure what that last bit means.
> XML is a self-describing data format. It is usually used for files
> but can be used in data streams or in-memory strings.

I know it's self tagged, meaning you create the tags within, and that
it's used elsewhere as a form of data transfer, my previous usage with
the particular file format was browser based in usage, but I know it's
used in many other places, which is why I didn't see the meaning of
the discussion saying it was horrible to use, I just asked for any
alternative suggestions for files, since everyone 'seemed' to have a
bad view of the usage, since it seems to be the standard for user
defined tags for data transfer.

>
> Its natural competitors are TLV (Tag, Length, Value) and
> CSV (Comma Separated Values) files but neither is as rich
> in structure.

That was kind of my point, I've seen all but TLV in use, but XML is
the web standard it seems.

> Alternative options include ASN.1, Edifact and
> IDL but these are not self-describing(*) (although they are all
> more compact and faster to parse, but only IDL is free

Haven't heard of these, but formula of file, it seems to me,
is encoding + extension + text, how much can these really differ.
 On average it seems that the self defined tags of xml, would have a
bigger impact on the average usage(someone has larger tag sizes, and
more tags) than a defined file with averaged tags.

>
>>> sure has been displayed as a data transfer mechanism,
>
> You don't have to use it for data transfer - eg MS's use
> as a document storage format in Office - but frankly if
> you use XML to store large volumes of data you are mad,
> a database is a much more sensible option being far more
> space efficient and faster to work with.

If truly optimizing, I would time both, and maybe move to a different
language, or pattern if it truly mattered.

>
> (*)ASN.1, IDL etc all rely on a shared definition, and
> often shared code library, at both sender and receiver.
> The library is a compiled version of the data definition
> which enables complex data structures to be read from
> the file in a single chunk very efficiently.

This I might have to work on, but I rely on experience to quasi-trust
experience.

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 4:49 AM, Alan Gauld  wrote:
>
> "David Hutto"  wrote
>
>> > Note that it's not unlikely that this is actually *slower* than > using
>> > a real
>> > XML parser:
>>
>> Or a 'real' language like C or C++ maybe to increase, or in Python's
>> case, bypass, the interpreter?
>
> Most of the Python xml parsers are written in C - many use the
> industry standard expat parser - so converting to C would bring
> minimal speed advantage and a lot of extra work.

Somewhat of the fact that python uses C encourages me of that, but I
have still been looking into c++ to optimize, because I've used it
before, and the more languages I learn the more they feel 'similar',
but the same, if you can understand that!

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Alan Gauld



"David Hutto"  wrote


XML stands for eXtensible Markup Language.
XML is designed to transport and store data.

Then what other file medium would you suggest as the tagging means.


See my other post but there are many alternatives that are orders
of magnitude more efficient. XML is one of the most inefficient
data transport mechanisms ever invented and its main redeeming
feature is its human readability.


You have a file with tags, you can't parse and store the data in any
file anymore than the next, right?


Wrong, even CSV files are more efficient than parsing XML.
(But are very limited in their data structure)

But binary based formats like IDL and ASN.1 can be parsed
very efficiently and, because they are binary based, store
(and therefore transmit) their data much more efficiently too.

HTH,

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/



Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 4:58 AM, Alan Gauld  wrote:
>
> "David Hutto"  wrote
>
>>> I sympathize with you. I wonder who thought that building a 1GB XML file
>>> was a good thing.
>
>> that was just the first listing:
>>
>>
>> http://www.google.com/search?client=ubuntu&channel=fs&q=parsing+gigabyte+xml+python&ie=utf-8&oe=utf-8
>
> Eeek! One of the listings says:
>
>> 22 Jan 2009 ... Stripping Illegal Characters from XML in Python >>
>
> ... I'd be asking Python to process 6.4 gigabytes of CSV into
> 6.5 gigabytes of XML 1. . In fact, what happened was that
> the parsing didn't work and the whole db was ...
>
> And I thought a 1G file was extreme... Do these people stop to think that
> with XML as much as 80% of their "data" is just description (ie the tags).

That's what I'm saying above, that xml seems to be the hog in terms of
its user defined tags. Is that somewhat a confirmation of my hunch,
that it's the length of the users predefined tags that add to the
above mess, and that maybe a lessened tag system in accordance with
xml might be better, or a simple  tag  tag in the xml(other
files) with an index  to point to a and b would be better.

Re: [Tutor] decimal input problem

2010-12-21 Thread Alan Gauld



"jtl999"  wrote

when I try to multiply with a decimal number in python with the input
this is what i get

Enter first number: 1.2
Traceback (most recent call last):
 File "Timesed.py", line 18, in <module>
   numberx1 = (int)(raw_input('Enter first number: '))
ValueError: invalid literal for int() with base 10: '1.2'


You are inputting a floating point number - 1.2 and
trying to convert it to an integer. You need to convert
raw_input to a float() rather than an int()
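
A minimal sketch of that change, assuming the rest of the script simply multiplies two entered values. The parse_number() helper and the second variable numberx2 are made up for illustration, and the input is hard-coded here so the snippet runs without prompting.

```python
def parse_number(text):
    """Convert user input to a number, accepting decimals such as '1.2'."""
    return float(text)

# In Python 2 the text would come from raw_input(); in Python 3, from input().
numberx1 = parse_number("1.2")
numberx2 = parse_number("3")
product = numberx1 * numberx2
print(product)
```

float() also accepts whole numbers such as "3", so it covers both kinds of input; the result is a floating point value, so tiny rounding differences in the printed product are normal.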

HTH,


--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/



Re: [Tutor] Writing a programming language in Python (was: Trying to parse a HUGE(1gb) xml file in python)

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 5:00 AM, Stefan Behnel  wrote:
> David Hutto, 21.12.2010 10:46:
>>
>> On Tue, Dec 21, 2010 at 4:34 AM, Stefan Behnel wrote:
>>>
>>> David Hutto, 21.12.2010 10:19:

>>>> If I want to write a programming language, it might not be the best
>>>> idea to have a language needed for speed based on Python; I should
>>>> maybe use what it's based on, or refine my own optimizations, just to
>>>> be a little clearer about my perspective.
>>>
>>> Being clearer would certainly help in understanding your postings.
>>
>> What's the difference between a language based on C++ Python, and C?
>>
>> If I used any of the above to begin writing my own language, which of
>> the above would be faster(other languages aside) to begin with?
>
> Certainly Python. The Cython compiler is written in Python, for example. It
> translates Python to C. It would have been a lot less fun to write it in C,
> and it would totally have made the project progress a lot slower.
>
> PyPy and ShedSkin are other famous examples to name here, even though both
> are written in restricted versions of Python (not sure about ShedSkin, but
> if it compiles itself, it must be).
>
> Seriously, if you want to write a programming language, the best advice I
> can give is to write it in Python.

Simpler yeah, Several implementations of python ring a bell with that,
and I understand it's .pyc so it's a compiled file and ready for usage
as 'anyother'(I might be wrong on this, but sure it's the same as
converting the original py file straight back to c).

Every language has its benefits, and its devout worshippers. I love
python; I even give credit to Guido for inspiring my own language
deviation in C++. I like cross language diversity, so I do things
everywhere, and I see no one complaining all the time about any one
language, nor its advantages to them, especially if they know it and
use it.

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Stefan Behnel


Alan Gauld, 21.12.2010 10:58:

"David Hutto" wrote

http://www.google.com/search?client=ubuntu&channel=fs&q=parsing+gigabyte+xml+python&ie=utf-8&oe=utf-8


Eeek! One of the listings says:


22 Jan 2009 ... Stripping Illegal Characters from XML in Python >>

... I'd be asking Python to process 6.4 gigabytes of CSV into
6.5 gigabytes of XML 1. . In fact, what happened was that
the parsing didn't work and the whole db was ...

And I thought a 1G file was extreme... Do these people stop to think that
with XML as much as 80% of their "data" is just description (ie the tags).


As I already said, it compresses well. In run-length compressed XML files, 
the tags can easily take up a negligible amount of space compared to the 
more widely varying data content (although that also commonly tends to 
compress rather well). And depending on how fast your underlying storage 
is, decompressing and parsing the file may still be faster than parsing a 
huge uncompressed file directly. So, again, the sheer uncompressed file 
size is *not* a very interesting argument.
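
To make this concrete, here is a small sketch (written for Python 3) that compresses a repetitive XML document and then parses it straight from the decompressing stream with iterparse(). The sample records are invented, and the document is built in memory only so that the snippet is self-contained; with a real file one would open the .gz file directly.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Build a small, repetitive sample document in memory (hypothetical data).
records = "".join(
    "<record><name>item%d</name><value>%d</value></record>" % (i, i * 7)
    for i in range(1000)
)
raw = ("<records>" + records + "</records>").encode("utf-8")

# Compress it the way a .xml.gz file on disk would be compressed.
compressed = gzip.compress(raw)

# Parse straight from the decompressing stream, clearing each record
# once it has been handled so its subtree does not stay in memory.
count = 0
with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as stream:
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "record":
            count += 1
            elem.clear()

print(len(raw), len(compressed), count)
```

How much smaller the compressed form is depends entirely on the data, so the sizes printed here say nothing about any particular real file.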


Stefan


Re: [Tutor] Hangman game.....problem putting strings in a list.....

2010-12-21 Thread Alan Gauld



"Yasin Yaqoobi"  wrote

I'm confused. The error message you describe doesn't
appear to match any line in your code.

Please provide the full error printout not just a single line.
Meanwhile some comments...


global line
global index;


global is not doing anything here, it is only effective inside a function.


Try not to use global variables unless you have to. Specifically only
for data that's shared between functions, and even then it's usually
better practice to pass the values into the functions as arguments.


guessed = ["-"];
count = 0;
wrong = 0;

def guess(letter):
    global guessed



    if (letter in line):


You don't need the parens, they don't do any harm,
but they aren't needed.


        index = line.index(letter);
        print guessed;


# This is the line that gives me the error don't know why? 
guessed[index] = " " + (letter); ,TypeError: 'str' object does not 
support item assignment



        guessed[index] = (letter);


Again, you don't need the parens...
And I suspect you really want to use append() here rather
than assigning to guessed[index].


        print ' '.join(guessed)
    else:
        global wrong;
        wrong += 1;


def draw(number):...



def doit():
    global count
    while(wrong != 7):
        a_letter = raw_input("Pick a letter --> ")
        print
        guess(a_letter);
        draw(wrong);
        print
        count += 1

def initArray():
    global guessed
    print line
    guessed =  guessed[0] * (len(line)-1)
    print "this is new list " + guessed;


If you use the append() method you don't need this.


while 1:
    line = file.readline();
    if (len(line) >= 5):
        initArray()
        doit();
        break
    if not line: break

file.close()


HTH,


--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/



Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

Give me a little time to review this when it's not 5:30 in the morning
and I've been up since 9 am yesterday, and 'relearning' c++:)

But it still seems that you have coding + filetype +
charactersinfileinformat., one long string that has to be parsed by
the C functions.

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Stefan Behnel


Alan Gauld, 21.12.2010 10:46:

You don't have to use it for data transfer - eg MS's use
as a document storage format in Office - but frankly if
you use XML to store large volumes of data you are mad,
a database is a much more sensible option being far more
space efficient and faster to work with.


Even "storing large volumes of data" in XML can be perfectly ok. It depends 
on the use case. Database storage formats are not generally portable, for 
example, but they provide fast online access. Totally different use cases. 
Nothing's inherently wrong with storing large amounts of data in 
(compressed) XML for medium to long term storage or data exchange, and 
loading them back into a database to make the data quickly accessible for 
heavy processing.


Stefan


Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 5:19 AM, Stefan Behnel  wrote:
> Alan Gauld, 21.12.2010 10:58:
>>
>> "David Hutto" wrote
>>>
>>>
>>> http://www.google.com/search?client=ubuntu&channel=fs&q=parsing+gigabyte+xml+python&ie=utf-8&oe=utf-8
>>
>> Eeek! One of the listings says:
>>
>>> 22 Jan 2009 ... Stripping Illegal Characters from XML in Python >>
>>
>> ... I'd be asking Python to process 6.4 gigabytes of CSV into
>> 6.5 gigabytes of XML 1. . In fact, what happened was that
>> the parsing didn't work and the whole db was ...
>>
>> And I thought a 1G file was extreme... Do these people stop to think that
>> with XML as much as 80% of their "data" is just description (ie the tags).
>
> As I already said, it compresses well. In run-length compressed XML files,
> the tags can easily take up a negligible amount of space compared to the
> more widely varying data content (although that also commonly tends to
> compress rather well). And depending on how fast your underlying storage is,
> decompressing and parsing the file may still be faster than parsing a huge
> uncompressed file directly. So, again, the sheer uncompressed file size is
> *not* a very interesting argument.
>

However, could they (as mentioned elsewhere, and by other in another
form)mitigate the damage by using smaller tags exclusively?  And also
compressed is formatted, even for the tags, correct?

Re: [Tutor] Writing a programming language in Python

2010-12-21 Thread Stefan Behnel


David Hutto, 21.12.2010 11:16:

I understand it's .pyc so it's a compiled file and ready for usage
as 'anyother'(I might be wrong on this, but sure it's the same as
converting the original py file straight back to c).


".pyc" files have nothing to do with C. They are just compiled byte code, 
and not even portable to other Python runtimes (and sometimes not even 
between CPython versions).
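
One way to see this, sketched for a current Python 3 interpreter (module names differ from the Python 2 of this thread): compile a tiny script with py_compile and compare the first four bytes of the resulting file with the interpreter's own magic number. The file name hello.py is just a placeholder.

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "hello.py")
    pyc_path = os.path.join(tmp, "hello.pyc")
    with open(source_path, "w") as f:
        f.write("print('hello')\n")

    # Compile the source to byte code, as the interpreter does on import.
    py_compile.compile(source_path, cfile=pyc_path, doraise=True)

    with open(pyc_path, "rb") as f:
        header = f.read(4)

# The first four bytes identify the byte code format of this interpreter.
same_magic = header == importlib.util.MAGIC_NUMBER
print(header, same_magic)
```

A different CPython release may use a different magic number, in which case it recompiles from the .py source rather than trusting the cached file.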


Stefan


Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Alan Gauld



"David Hutto"  wrote


(*)ASN.1, IDL etc all rely on a shared definition, and
often shared code library, at both sender and receiver.


This I might have to work on, but I rely on experience to quasi-trust
experience.


These are all data transport formats agreed and standardised
long before XML appeared. IDL is the format used in COM calls
for example and RPC calls between processes on an OS or
across a network. It is an OpenGroup standard I believe.

ASN.1 is a binary form and used in eCommerce and telecomms
networks for many years. It is standardised by the ITU

Edifact is the data standard of EDI and is set by the UN.
It has been used for commercial trading between large corporates
for many years.

All of these standards developed when network bandwidth
was very expensive so they all major on efficiency. XML was
developed by non networks-oriented people for the ease of
writing software for the web. Bandwidth was not a primary
concern to them.

There are other formats too, because the problem of transporting
data portably between computers has been with us since the
dawn of networking. XML just happens to be the most popular
format today. But popularity doesn't necessarily mean it's good. :-)

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/



Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Alan Gauld



"David Hutto"  wrote


Somewhat of the fact that python uses C encourages me of that, but I
have still been looking into c++ to optimize, because I've used it
before, and the more languages I learn the more they feel 'similar',
but the same, if you can understand that!


Absolutely! That's why I use 3 languages in my tutor. To try and
dispel the myth that programming languages are hard to learn.
Once you can program in one language learning another
normally takes a few days or even hours. (Becoming fluent
is another matter! :-)

There are probably only about 5 or 6 basic structures to
programming languages:

Algol based (Pascal, C, Java, Python etc)
Lisp based (Lisp, Scheme, clojure etc)
Prolog based
Functional( Haskell, ML etc)
SQL based
others?...

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/



Re: [Tutor] Writing a programming language in Python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 5:25 AM, Stefan Behnel  wrote:
> David Hutto, 21.12.2010 11:16:
>>
>> I understand it's .pyc so it's a compiled file and ready for usage
>> as 'anyother'(I might be wrong on this, but sure it's the same as
>> converting the original py file straight back to c).
>
> ".pyc" files have nothing to do with C. They are just compiled byte code,
> and not even portable to other Python runtimes (and sometimes not even
> between CPython versions).

I meant immediately available upon module calls on your own pc. Like
when you get a module that has them already compiled for your
architecture.

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Alan Gauld



"David Hutto"  wrote


That's what I'm saying above, that xml seems to be the hog in terms of
its user defined tags. Is that somewhat a confirmation of my hunch,
that it's the length of the users predefined tags that add to the
above mess, and that maybe a lessened tag system in accordance with
xml might be better, or a simple  tag  tag in the xml(other
files) with an index  to point to a and b would be better.


Shorter tags reduce the data volume by a bit (and it can be a
big bit if the names are all 20 characters long!) but the inherent tag
structure, even with single char names will still often surpass the
data content.


5


8 bytes to describe an int which could be represented in
a single byte in binary (or even in CSV). Even if the int were
a 64bit binary value (8 bytes) the minimal tag structure still
consumes the same data width. Of course if the data
content is a long string then simple tags become cost
effective (think  in XHTML)...
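
The byte counts can be checked with a few lines of Python. This is only a sketch: the single-character tag name <i> is an assumption chosen for illustration, and the CSV form counts one trailing separator.

```python
import struct

value = 5

# Hypothetical single-character tag name, chosen for illustration only.
xml_form = ("<i>%d</i>" % value).encode("ascii")
csv_form = ("%d," % value).encode("ascii")   # one digit plus a separator
byte_form = struct.pack("B", value)          # unsigned 8-bit integer
int64_form = struct.pack("<q", value)        # signed 64-bit integer

sizes = {
    "xml": len(xml_form),
    "csv": len(csv_form),
    "uint8": len(byte_form),
    "int64": len(int64_form),
}
print(sizes)
```

With these assumptions the tagged form takes 8 bytes, the same as a full 64-bit binary integer, while the value itself fits in one.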

HTH,


--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/



Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 5:35 AM, Alan Gauld  wrote:
>
> "David Hutto"  wrote
>
>> Somewhat of the fact that python uses C encourages me of that, but I
>> have still been looking into c++ to optimize, because I've used it
>> before, and the more languages I learn the more they feel 'similar',
>> but the same, if you can understand that!
>
> Absolutely! That's why I use 3 languages in my tutor.

And I saw them all right there first!...for the most part


 To try and
> dispel the myth that programming languages are hard to learn.
> Once you can program in one language learning another
> normally takes a few days or even hours. (Becoming fluent
> is another matter! :-)
>
> There are probably only about 5 or 6 basic structures to
> programming languages:
>
> Algol based (Pascal, C, Java, Python etc)
> Lisp based (Lisp, Scheme, clojure etc)
> Prolog based
> Functional( Haskell, ML etc)
> SQL based
> others?...

Could probably go all day about the ones I've either looked at or loved.

>
> --
> Alan Gauld
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
>
>

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Stefan Behnel


David Hutto, 21.12.2010 11:29:

On Tue, Dec 21, 2010 at 5:19 AM, Stefan Behnel wrote:

Alan Gauld, 21.12.2010 10:58:

22 Jan 2009 ... Stripping Illegal Characters from XML in Python>>


... I'd be asking Python to process 6.4 gigabytes of CSV into
6.5 gigabytes of XML 1. . In fact, what happened was that
the parsing didn't work and the whole db was ...

And I thought a 1G file was extreme... Do these people stop to think that
with XML as much as 80% of their "data" is just description (ie the tags).


As I already said, it compresses well. In run-length compressed XML files,
the tags can easily take up a negligible amount of space compared to the
more widely varying data content (although that also commonly tends to
compress rather well). And depending on how fast your underlying storage is,
decompressing and parsing the file may still be faster than parsing a huge
uncompressed file directly. So, again, the sheer uncompressed file size is
*not* a very interesting argument.


However, could they (as mentioned elsewhere, and by other in another
form)mitigate the damage by using smaller tags exclusively?


Why should that have a (noticeable) impact on the compressed file? It's the 
inherent nature of compression to reduce redundancy, which in XML files 
usually includes the redundancy of repeated tag names (even if the 
compression is not specifically XML aware).
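
A quick experiment along these lines, as a sketch only: build the same made-up records once with long tag names and once with single-letter ones, then compare the sizes before and after generic zlib compression. The tag names, the record count and the build() helper are all invented for the test.

```python
import zlib

def build(tag_record, tag_value, n=2000):
    """Build a sample document with n records using the given tag names."""
    rows = "".join(
        "<%s><%s>%d</%s></%s>"
        % (tag_record, tag_value, i * 37 % 1009, tag_value, tag_record)
        for i in range(n)
    )
    return ("<root>" + rows + "</root>").encode("ascii")

long_doc = build("customer_order_record", "total_amount_in_cents")
short_doc = build("r", "v")

long_raw, short_raw = len(long_doc), len(short_doc)
long_zip = len(zlib.compress(long_doc))
short_zip = len(zlib.compress(short_doc))

print("raw difference:       ", long_raw - short_raw)
print("compressed difference:", long_zip - short_zip)
```

On synthetic data like this the gap between the two variants should shrink a great deal once compressed, though the exact numbers will vary with the data and the compressor.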


It's a very bad idea to use short and obfuscated tag names to reduce the 
storage size. That's like coding in assembler to reduce the size of the 
source code. Just use compression for storage, or buy a larger hard disk 
for your NAS.




And also compressed is formatted, even for the tags, correct?


The (lossless) compression doesn't change the content.

Stefan


Re: [Tutor] Hangman game.....problem putting strings in a list.....

2010-12-21 Thread Peter Otten

Yasin Yaqoobi wrote:

> # This is the line that gives me the error don't know why?
> guessed[index] = " " + (letter); ,TypeError: 'str' object does not
> support item assignment

I don't get this far because I run into

Traceback (most recent call last):
  File "hangman.py", line 69, in <module>
line = file.readline();
TypeError: descriptor 'readline' of 'file' object needs an argument

In the future please copy and paste the code you are actually running.
However:

> def initArray():
>  global guessed
>  print line
   print guessed
>  guessed =  guessed[0] * (len(line)-1)
   print guessed
>  print "this is new list " + guessed;

If you add these two print statements to your code you might find out 
yourself what's going wrong.
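
For readers following along without the original script, the effect those print statements reveal can be reproduced in isolation. This is a sketch with a made-up word standing in for the line read from the file; it runs on Python 2 and 3.

```python
line = "python\n"   # hypothetical word read from the file, newline included
guessed = ["-"]

# guessed[0] is the string "-", so this builds a longer string, not a list.
as_string = guessed[0] * (len(line) - 1)

# Multiplying the list itself keeps it a list of separate items.
as_list = guessed * (len(line) - 1)

as_list[0] = "p"            # item assignment works on a list

try:
    as_string[0] = "p"      # ... but not on a string
except TypeError as exc:
    error = str(exc)

print(repr(as_string), as_list, error)
```

Keeping guessed a list (for instance by multiplying the list rather than its first element) is one possible fix; whether it is the right one depends on the rest of the game code.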

Peter


Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Eike Welk

On Tuesday 21.12.2010 10:12:55 David Hutto wrote:
> Then what other file medium would you suggest as the tagging means.

One of those formats, that are specially designed for large amounts of data, 
is HDF5. It is intended for numerical data, but you can store text as well. 
There are multiple Python libraries for it, the most feature rich is IMHO 
PyTables. 

http://www.pytables.org/moin


Eike.

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 5:45 AM, Alan Gauld  wrote:
>
> "David Hutto"  wrote
>
>> That's what I'm saying above, that xml seems to be the hog in terms of
>> its user defined tags. Is that somewhat a confirmation of my hunch,
>> that it's the length of the users predefined tags that add to the
>> above mess, and that maybe a lessened tag system in accordance with
>> xml might be better, or a simple  tag  tag in the xml(other
>> files) with an index  to point to a and b would be better.
>
> Shorter tags reduce the data volume by a bit (and it can be a
> big bit if the names are all 20 characters long!) but the inherent tag
> structure, even with single char names will still often surpass the
> data content.
>
> 
> 5
> 


>
> 8 bytes to describe an int which could be represented in
> a single byte in binary (or even in CSV).

But that byte can't describe the tag(google hold my hand). I'll get
this eventually, but my iostream is long on content and hard on
parsing. So many languages, and technology, yet so little time.

Even if the int were
> a 64bit binary value (8 bytes) the minimal tag structure still
> consumes the same data width. Of course if the data
> content is a long string then simple tags become cost
> effective (think  in XHTML)...
>
> HTH,
>
>
> --
> Alan Gauld
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
>
>



-- 
They're installing the breathalyzer on my email account next week.

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 5:49 AM, Stefan Behnel  wrote:
> David Hutto, 21.12.2010 11:29:
>>
>> On Tue, Dec 21, 2010 at 5:19 AM, Stefan Behnel wrote:
>>>
>>> Alan Gauld, 21.12.2010 10:58:
>
> 22 Jan 2009 ... Stripping Illegal Characters from XML in Python>>

 ... I'd be asking Python to process 6.4 gigabytes of CSV into
 6.5 gigabytes of XML 1. . In fact, what happened was that
 the parsing didn't work and the whole db was ...

 And I thought a 1G file was extreme... Do these people stop to think that
 with XML as much as 80% of their "data" is just description (ie the tags).
>>>
>>> As I already said, it compresses well. In run-length compressed XML files,
>>> the tags can easily take up a negligible amount of space compared to the
>>> more widely varying data content (although that also commonly tends to
>>> compress rather well). And depending on how fast your underlying storage is,
>>> decompressing and parsing the file may still be faster than parsing a huge
>>> uncompressed file directly. So, again, the sheer uncompressed file size is
>>> *not* a very interesting argument.
>>
>> However, could they (as mentioned elsewhere, and by other in another
>> form)mitigate the damage by using smaller tags exclusively?
>
> Why should that have a (noticeable) impact on the compressed file? It's the
> inherent nature of compression to reduce redundancy, which in XML files
> usually includes the redundancy of repeated tag names (even if the
> compression is not specifically XML aware).
>
> It's a very bad idea to use short and obfuscated tag names to reduce the
> storage size.


Maybe my style is a form of bad coder example, in some areas(present
company accepted). For example, I have a dictionary that has codes
within a text file, that point to other lines for verbs, adj, nouns,
etc.
So  doesn't have to mean a it could mean  = , but would
that help in making the initial usage of  in the xml file faster,
or slower, by parsing for  then relating  to ?


> That's like coding in assembler to reduce the size of the
> source code.

Haven't gotten to assembler yet, almost there.


> Just use compression for storage, or buy a larger hard disk for
> your NAS.
>
>
>> And also compressed is formatted, even for the tags, correct?
>
> The (lossless) compression doesn't change the content.

google search later, I promise.


>
> Stefan
>
> ___
> Tutor maillist  -  tu...@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>



-- 
They're installing the breathalyzer on my email account next week.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Stefan Behnel


David Hutto, 21.12.2010 12:02:

On Tue, Dec 21, 2010 at 5:45 AM, Alan Gauld wrote:

8 bytes to describe an int which could be represented in
a single byte in binary (or even in CSV).


Well, "CSV" indicates that there's at least one separator character 
involved, so make that an asymptotic 2 bytes on average. But obviously, 
compression applies to CSV and other 'readable' formats as well.




But that byte can't describe the tag


Yep, that's an argument that Alan already presented.

Stefan


Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 6:19 AM, Stefan Behnel  wrote:
> David Hutto, 21.12.2010 12:02:
>>
>> On Tue, Dec 21, 2010 at 5:45 AM, Alan Gauld wrote:
>>>
>>> 8 bytes to describe an int which could be represented in
>>> a single byte in binary (or even in CSV).
>
> Well, "CSV" indicates that there's at least one separator character
> involved, so make that an asymptotic 2 bytes on average. But obviously,
> compression applies to CSV and other 'readable' formats as well.
>
>
>> But that byte can't describe the tag
>
> Yep, that's an argument that Alan already presented.

Didn't see that, but that would make the minimal format for parsing a
comma, or any other single character marker, and the minimal would
still be a specific marker in a file, but does not answer my question
about the assignment to another file's variable.

If file a.xml has simple tagged xml like , and file b.config has
tags that represent the a.xml(i.e. = ) as greater tags,
does this pattern optimize the process by limiting the size of the
tags to be parsed in the xml, then converting those simpler tags that
are found to the b.config values for the simple  simple format?

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 6:41 AM, David Hutto  wrote:
> On Tue, Dec 21, 2010 at 6:19 AM, Stefan Behnel  wrote:
>> David Hutto, 21.12.2010 12:02:
>>>
>>> On Tue, Dec 21, 2010 at 5:45 AM, Alan Gauld wrote:

>>>> 8 bytes to describe an int which could be represented in
>>>> a single byte in binary (or even in CSV).
>>
>> Well, "CSV" indicates that there's at least one separator character
>> involved, so make that an asymptotic 2 bytes on average. But obviously,
>> compression applies to CSV and other 'readable' formats as well.
>>
>>
>>> But that byte can't describe the tag
>>
>> Yep, that's an argument that Alan already presented.
>
> Didn't see that, but that would make the minimal format for parsing a
> comma, or any other single character marker, and the minimal would
> still be a specific marker in a file, but does not answer my question
> about the assignment to another file's variable.
>
> If file a.xml has simple tagged xml like , and file b.config has
> tags that represent the a.xml(i.e. = ) as greater tags,
> does this pattern optimize the process by limiting the size of the
> tags to be parsed in the xml, then converting those simpler tags that
> are found to the b.config values for the simple  simple format?
>

In other words I'm lazy and asking for the experiment to be performed
for me(or, more importantly, if it has been), but since I'm not new to
this, if no one has a specific case, I'll timeit when I get to it.

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Stefan Behnel


David Hutto, 21.12.2010 12:45:

If file a.xml has simple tagged xml like, and file b.config has
tags that represent the a.xml(i.e.  =) as greater tags,
does this pattern optimize the process by limiting the size of the
tags to be parsed in the xml, then converting those simpler tags that
are found to the b.config values for the simple  simple format?


In other words I'm lazy and asking for the experiment to be performed
for me(or, more importantly, if it has been), but since I'm not new to
this, if no one has a specific case, I'll timeit when I get to it.


I'm still not sure I understand what you are trying to describe here, but I 
think you want to look into the Wikipedia articles on indexing, hashing and 
compression.


http://en.wikipedia.org/wiki/Index_%28database%29
http://en.wikipedia.org/wiki/Index_%28information_technology%29
http://en.wikipedia.org/wiki/Hash_function
http://en.wikipedia.org/wiki/Data_compression

Terms like "indirection" and "mapping" also come to my mind when I try to 
make sense out of your hints.


Stefan


Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 6:59 AM, Stefan Behnel  wrote:
> David Hutto, 21.12.2010 12:45:
>>>
>>> If file a.xml has simple tagged xml like, and file b.config has
>>> tags that represent the a.xml(i.e.  =) as greater tags,
>>> does this pattern optimize the process by limiting the size of the
>>> tags to be parsed in the xml, then converting those simpler tags that
>>> are found to the b.config values for the simple  simple format?
>>
I forget to insert my tags...



>> In other words I'm lazy and asking for the experiment to be performed
>> for me(or, more importantly, if it has been), but since I'm not new to
>> this, if no one has a specific case, I'll timeit when I get to it.



>
> I'm still not sure I understand what you are trying to describe here, but I
> think you want to look into the Wikipedia articles on indexing, hashing and
> compression.

a.xml has tags with simplistic forms, like was argued above, with ,
or . b.config has variables for the simple tags in a.xml so that
 =  in b.config.

So when parsing a.xml, you parse it, then use more complex tags to
define with b.config.. I'll review the url's a little later.



>
> http://en.wikipedia.org/wiki/Index_%28database%29
> http://en.wikipedia.org/wiki/Index_%28information_technology%29
> http://en.wikipedia.org/wiki/Hash_function
> http://en.wikipedia.org/wiki/Data_compression
>
> Terms like "indirection" and "mapping" also come to my mind when I try to
> make sense out of your hints.


Terms like tags and XML also come to mind. Or parsing, or regular
expressions, or re, or find; a lot of things come to mind. My
experience is limited, but not by much, and certainly not in respect
to the scope of other languages. But thank you for the references; I'm
not so good that I can't afford to look through a bunch of coal to
find a diamond.


>
> Stefan
>



-- 
They're installing the breathalyzer on my email account next week.

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Stefan Behnel


David Hutto, 21.12.2010 13:09:

On Tue, Dec 21, 2010 at 6:59 AM, Stefan Behnel wrote:

David Hutto, 21.12.2010 12:45:


If file a.xml has simple tagged xml like, and file b.config has
tags that represent the a.xml(i.e.=) as greater tags,
does this pattern optimize the process by limiting the size of the
tags to be parsed in the xml, then converting those simpler tags that
are found to the b.config values for the simplesimple format?


In other words I'm lazy and asking for the experiment to be performed
for me(or, more importantly, if it has been), but since I'm not new to
this, if no one has a specific case, I'll timeit when I get to it.


I'm still not sure I understand what you are trying to describe here


a.xml has tags with simplistic forms, like was argued above, with,
or. b.config has variables for the simple tags in a.xml so that
  =  in b.config.

So when parsing a.xml, you parse it, then use more complex tags to
define with b.config.. I'll review the url's a little later.


Ok, I'd call that simple renaming, that's what I meant with "indirection" 
and "mapping" (basically the two concepts that computer science is all 
about ;).
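
The renaming described above can be sketched in a few lines. This is only an illustration: the tag names in the original question were lost, so the short tags ("p", "n", "a"), the descriptive names, and the TAG_NAMES dictionary standing in for "b.config" are all made-up assumptions. It uses the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping, standing in for the contents of "b.config":
# short tag used in the file -> descriptive name used by the program.
TAG_NAMES = {"p": "person", "n": "name", "a": "address"}


def expand_tags(root, mapping):
    """Rename, in place, every element whose tag appears in mapping."""
    for elem in root.iter():
        elem.tag = mapping.get(elem.tag, elem.tag)
    return root


# Hypothetical "a.xml" content with short tags.
short_xml = "<p><n>Ann</n><a>12 High St</a></p>"
root = expand_tags(ET.fromstring(short_xml), TAG_NAMES)
expanded = ET.tostring(root, encoding="unicode")
print(expanded)
```

The lookup is one dictionary access per element, so the renaming itself is cheap; whether the shorter tags make parsing measurably faster is a separate question that only a benchmark on real data can answer.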


Sure, run your own benchmarks, but don't expect anyone to be interested in 
the results. If your interest is to obfuscate the tag names, why not just 
use a binary (or less readable) format? That gives you much better 
obfuscation in the first place.


Stefan


Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Steven D'Aprano


Alan Gauld wrote:


XML is a self-describing data format. It is usually used for files
but can be used in data streams or in-memory strings.

Its natural competitors are TLV (Tag, Length, Value) and
CSV (Comma Separated Value) files but neither is as rich
in structure.  Alternative options include ASN.1, Edifact and
IDL but these are not self-describing(*) (although they are all
more compact and faster to parse, but only IDL is free.)


I would have thought that both JSON and YAML are competitors to XML, 
although of course it depends on exactly what you are using XML for. For 
example, Gnome uses XML files extensively for their poor-man's Registry, 
which is a shame as (in my opinion) simple Windows-style INI files or 
Unix/Linux style config files would be a far better and more natural choice.


Basically, people shouldn't make the mistake of thinking that because 
XML is text-based it is meant as a human-readable (let alone 
human-editable) format. It's not. It's a machine format that happens to 
be *just barely* human-readable and -editable in simple cases due to 
using ASCII text



--
Steven

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Steven D'Aprano


Stefan Behnel wrote:

David Hutto, 21.12.2010 10:29:

File = string

going through string code

finding pieces of the string and marking the territory.


I don't see 'real' optimization other than rolling your own.


Reads like a Haiku. Doesn't quite fit the verse, though.

 From your behaviour, I get the impression that you are just trolling.


Consider David's email address and mail sig:

smokefl...@gmail.com
"They're installing the breathalyzer on my email account next week."


I don't think he's trolling so much as floating around the ceiling :)




--
Steven


Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Alan Gauld



"Steven D'Aprano"  wrote


Its natural competitors are TLV (Tag, Length, Value) and
CSV (Comma Separated Value) files but neither is as rich


I would have thought that both JSON and YAML are competitors to XML,


Totally agree but I excluded those on the basis that they weren't
around when XML was invented but CSV and TLV etc were.
(I guess my reasoning is that while XML was a competitor to
JSON/YAML when they were developed - because it preceded
them - they were not competitors to XML because they postdated
it. But it depends on how you define the competition
- is it as a candidate for use now or alternatives at the time
of creation...)

But if you were looking for a choice of format today then YAML
and JSON would have to be included.

Alan g



Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Alan Gauld



"Stefan Behnel"  wrote

And I thought a 1G file was extreme... Do these people stop to think
that with XML as much as 80% of their "data" is just description (ie the
tags).


As I already said, it compresses well. In run-length compressed XML 
files, the tags can easily take up a negligible amount of space 
compared to the more widely varying data content


I understand how compression helps with the data transmission aspect.

compress rather well). And depending on how fast your underlying 
storage is, decompressing and parsing the file may still be faster 
than parsing a huge uncompressed file directly.


But I don't understand how uncompressing a file before parsing it can
be faster than parsing the original uncompressed file?

There are ways of processing xml to reduce the tag space (a bit like
tinyurl does with long urls) but then the parsing code has to know
about the tag translations too - and usually the savings are small.
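
The effect of tag length on compressed size can be checked with a small experiment. The sketch below is an assumption-laden illustration, not a benchmark: the record layout, the tag names and the row count are invented, and it uses zlib (DEFLATE) from the standard library rather than the run-length compression mentioned earlier in the thread.

```python
import random
import zlib


def build_xml(record, ident, balance, rows=2000, seed=0):
    """Build a synthetic XML document using the given tag names."""
    rng = random.Random(seed)  # fixed seed: both documents carry the same data
    parts = ["<root>"]
    for number in range(rows):
        value = rng.randint(0, 999999)
        parts.append(
            "<{r}><{i}>{n}</{i}><{b}>{v}</{b}></{r}>".format(
                r=record, i=ident, b=balance, n=number, v=value))
    parts.append("</root>")
    return "".join(parts).encode("ascii")


# Hypothetical tag names: one verbose set, one shortened set.
long_doc = build_xml("customer_record", "customer_identifier", "account_balance")
short_doc = build_xml("r", "i", "b")

for label, doc in (("long tags", long_doc), ("short tags", short_doc)):
    packed = zlib.compress(doc, 9)
    print(label, "raw:", len(doc), "compressed:", len(packed))
```

Comparing the two "raw" figures with the two "compressed" figures shows how much of the difference the compressor removes on its own, without the parsing code needing to know about any tag translation.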

Curious,

Alan G. 




Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

Establish that with the fact that initially I didn't have a reason to
be hostile, and that your comment of my kubit kaba here, and your
comment on comp.lang.python about your pystats, after our
conversation, and your reference to it not being "set in stone",
wasn't a reference to our stats argument here.

Who sets examples, Steven? Mediocre idols, or egotistical
programmers (whom I've met plenty of, hence my unnecessary hostility)?

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

Take a look at the flame wars individuals see, comments by programmers
who are sarcastic, and think of the response you might have had to the
initial questions you had , and maybe even a few paranoid delusions
you got hacked.

It's not a rewarding experience not being a college educated
individual vs someone who is a self learner who has to put up with
lots of ego because they got the degree, and I have to earn it through
damn near torture and fire.

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 9:32 AM, David Hutto  wrote:
> Take a look at the flame wars individuals see, comments by programmers
> who are sarcastic, and think of the response you might have had to the
> initial questions you had , and maybe even a few paranoid delusions
> you got hacked.
>
> It's not a rewarding experience not being a college educated
> individual vs someone who is a self learner who has to put up with
> lots of ego because they got the degree, and I have to earn it through
> damn near torture and fire.
>

I don't mind hazing, but I do mind you underestimating my potential,
or my ability

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Steven D'Aprano


David Hutto wrote:

Establish that with the fact that initially I didn't have a reason to
be hostile, and that your comment of my kubit kaba here, and your
comment on comp.lang.python about your pystats, after our
conversation, and your reference to it not being "set in stone",
wasn't a reference to our stats argument here.


I have no idea what you are talking about. I can't decipher the above 
paragraph, what's a "kubit kaba"?


I don't know why you have taken offense, or even what you have taken 
offense over.



--
Steven


Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

And furthermore, I'm not the first, nor the last to get angry and
frustrated on the internet. I'm not the first to get drunk, and type.
And I dare any employer to deny me the right to MY personal time.

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 9:36 AM, Steven D'Aprano  wrote:
> David Hutto wrote:
>>
>> Establish that with the fact that initially I didn't have a reason to
>> be hostile, and that your comment of my kubit kaba here, and your
>> comment on comp.lang.python about your pystats, after our
>> conversation, and your reference to it not being "set in stone",
>> wasn't a reference to our stats argument here.
>
> I have no idea what you are talking about. I can't decipher the above
> paragraph, what's a "kubit kaba"?

Check the 'archives', Steven; cite your own references to deny the
claim. I'm not known for being a liar.


>
> I don't know why you have taken offense, or even what you have taken offense
> over.
>
>
> --
> Steven
>



-- 
They're installing the breathalyzer on my email account next week.

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 9:40 AM, David Hutto  wrote:
> On Tue, Dec 21, 2010 at 9:36 AM, Steven D'Aprano  wrote:
>> David Hutto wrote:
>>>
>>> Establish that with the fact that initially I didn't have a reason to
>>> be hostile, and that your comment of my kubit kaba here, and your
>>> comment on comp.lang.python about your pystats, after our
>>> conversation, and your reference to it not being "set in stone",
>>> wasn't a reference to our stats argument here.
>>
>> I have no idea what you are talking about. I can't decipher the above
>> paragraph, what's a "kubit kaba"?
>
> Check the 'archives', Steven; cite your own references to deny the
> claim. I'm not known for being a liar.
>
>
>>
>> I don't know why you have taken offense, or even what you have taken offense
>> over.
>>
>>
>> --
>> Steven
>>
>
>
>
> --
> They're installing the breathalyzer on my email account next week.
>



-- 
They're installing the breathalyzer on my email account next week.

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

You and I apparently know exactly what I'm talking about...

http://code.activestate.com/lists/python-tutor/79293/

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

you got nothing of real value.

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

And a lesson of what you really are to anyone listening.

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Stefan Behnel


Alan Gauld, 21.12.2010 15:11:

"Stefan Behnel" wrote

And I thought a 1G file was extreme... Do these people stop to think that
with XML as much as 80% of their "data" is just description (ie the tags).


As I already said, it compresses well. In run-length compressed XML
files, the tags can easily take up a negligible amount of space compared
to the more widely varying data content


I understand how compression helps with the data transmission aspect.


compress rather well). And depending on how fast your underlying storage
is, decompressing and parsing the file may still be faster than parsing a
huge uncompressed file directly.


But I don't understand how uncompressing a file before parsing it can
be faster than parsing the original uncompressed file?


I didn't say "uncompressing a file *before* parsing it". I meant 
uncompressing the data *while* parsing it. Just like you have to decode it 
for parsing, it's just an additional step to decompress it before decoding. 
Depending on the performance relation between I/O speed and decompression 
speed, it can be faster to load the compressed data and decompress it into 
the parser on the fly. lxml.etree (or rather libxml2) internally does that 
for you, for example, if it detects compressed input when parsing from a file.
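
Decompressing while parsing can also be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not a description of how lxml or libxml2 do it internally: it feeds a gzip stream to xml.etree.ElementTree.iterparse, and the function name count_records, the "item" tag and the generated sample file are made up for the example.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET


def count_records(path, tag):
    """Stream a gzip-compressed XML file and count elements named tag."""
    count = 0
    with gzip.open(path, "rb") as stream:
        # The parser pulls decompressed bytes from the stream as it goes,
        # so the uncompressed document is never held as one string.
        for event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag == tag:
                count += 1
                # Free the element's text and children once it is handled;
                # the emptied element itself stays attached to its parent.
                elem.clear()
    return count


# Build a small compressed sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(suffix=".xml.gz", delete=False) as tmp:
    sample_path = tmp.name
with gzip.open(sample_path, "wb") as out:
    out.write(b"<root>" + b"<item>x</item>" * 1000 + b"</root>")

total = count_records(sample_path, "item")
os.remove(sample_path)
print(total)
```

Whether this beats parsing an uncompressed file depends on the relation between disk speed and decompression speed, as described above.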


Note that these performance differences are tricky to prove in benchmarks, 
as repeating the benchmark usually means that the file is already cached in 
memory after the first run, so the decompression overhead will dominate in 
the second run. That's not what you will see in a clean run or for huge 
files, though.


Stefan


[Tutor] REGEX EXPLANATION

2010-12-21 Thread delegbede

I have been reading regex in order to work around an assignment. 

Could anyone explain to me in plain English what the following regex expressions 
translate to?

(r'PSC(?P\d{6})(DC|VR)(\d{2})(RC|GA)(\d{3})(!?([A-Z\d]{1,})@?(.*))?',
 re.I)

(r'PSC(?P\d{6})VR(?P\d{2})RC(?P\d{3})(?P[ABCDEFGHJKMNPQRSTUVWXYZ\d]{2,})?',
 re.I)

(r'PSC(?P\d{6})VR(?P\d{2})(?P(RC|GA))(?P\d{3})!(?P[ABCDEFGHJKMNPQ]{1,})@?(?P.*)',
 re.I)

(r'PSC(?P\d{6})DC(?P\d{2})RC(?P\d{3})(?P[ABCDEFGHJKMNPQRSTUVWX\d]{2,})?',
 re.I)

(r'PSC(?P\d{6})DC(?P\d{2})(?P(RC|GA))(?P\d{3})!(?P[ABCDEFGHJK]{1,})@?(?P.*)',
 re.I)

Like reading them out in plain English. This should further help me understand 
how these signs work and then I can get along. 

Thanks. 
Sent from my BlackBerry wireless device from MTN

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 10:03 AM, Stefan Behnel  wrote:
> Alan Gauld, 21.12.2010 15:11:
>>
>> "Stefan Behnel" wrote

>>>> And I thought a 1G file was extreme... Do these people stop to think
>>>> that with XML as much as 80% of their "data" is just description (ie the
>>>> tags).
>>>
>>> As I already said, it compresses well. In run-length compressed XML
>>> files, the tags can easily take up a negligible amount of space compared
>>> to the more widely varying data content
>>
>> I understand how compression helps with the data transmission aspect.
>>
>>> compress rather well). And depending on how fast your underlying storage
>>> is, decompressing and parsing the file may still be faster than parsing a
>>> huge uncompressed file directly.
>>
>> But I don't understand how uncompressing a file before parsing it can
>> be faster than parsing the original uncompressed file?
>
> I didn't say "uncompressing a file *before* parsing it".

He didn't say utilizing code below Python either, but others will
argue the microseconds matter, and if that's YOUR standard, then keep
it for client and self.

> I meant
> uncompressing the data *while* parsing it. Just like you have to decode it
> for parsing, it's just an additional step to decompress it before decoding.
> Depending on the performance relation between I/O speed and decompression
> speed, it can be faster to load the compressed data and decompress it into
> the parser on the fly. lxml.etree (or rather libxml2) internally does that
> for you, for example, if it detects compressed input when parsing from a
> file.
>
> Note that these performance differences are tricky to prove in benchmarks,

Tricky and proven, then tell me what real time, and this is in
reference to a recent C++ discussion, is Python used in, and how could
it be utilized in, say, an aviation system to avoid a collision when
milliseconds are on the line?

> as repeating the benchmark usually means that the file is already cached in
> memory after the first run, so the decompression overhead will dominate in
> the second run. That's not what you will see in a clean run or for huge
> files, though.
>
> Stefan
>



-- 
They're installing the breathalyzer on my email account next week.

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Stefan Behnel


David Hutto, 21.12.2010 16:11:

On Tue, Dec 21, 2010 at 10:03 AM, Stefan Behnel wrote:

I meant
uncompressing the data *while* parsing it. Just like you have to decode it
for parsing, it's just an additional step to decompress it before decoding.
Depending on the performance relation between I/O speed and decompression
speed, it can be faster to load the compressed data and decompress it into
the parser on the fly. lxml.etree (or rather libxml2) internally does that
for you, for example, if it detects compressed input when parsing from a
file.

Note that these performance differences are tricky to prove in benchmarks,


Tricky and proven, then tell me what real time, and this is in
reference to a recent C++ discussion, is Python used in, and how could
it be utilized in, say, an aviation system to avoid a collision when
milliseconds are on the line?


I doubt that there are many aviation systems that send around gigabytes of 
compressed XML data milliseconds before a collision.


I even doubt that airplane collision detection is time critical anywhere 
in the milliseconds range. After all, there's a pilot who has to react to 
the collision warning, and he or she will certainly need more than a couple 
of milliseconds to react, not to mention the time that it takes for the 
airplane to adapt its flight direction. If you plan the system in a way that 
makes milliseconds count, you can just as well replace it by a 
jack-in-the-box. Oh, and that might even speed up the reaction of the pilot. ;)


So, no, if these systems ever come close to a somewhat recent state of 
technology, I wouldn't mind if they were written in Python. The CPython 
runtime is pretty predictable in its performance characteristics, after all.


Stefan


[Tutor] AttributeError: 'list' object has no attribute 'find'

2010-12-21 Thread Ben Ganzfried

Hey,

I keep getting the error above and unfortunately browsing through
google and finding similar responses has not been fruitful for me.  My
code is below and I have marked off the location of the problem in my
code.  I'm wondering the following:

1) Doesn't the read() file object method return the specified
characters from the file as a string?
2) If #1 is correct, then why is my variable "source" being viewed as
a list as opposed to a string?
3) How can I change my variable "source" so that it can use the 'find'
method?  Or alternatively, is there another method besides the 'find'
method that would do the same thing that 'find' does that would work
on my variable 'source' as it currently is?

Thanks so much,

Ben

#recursively find each instance of the tag throughout the whole document
def findOneTag (tag, source, output):
    print("tag is ", tag)
    #print("source is now ", source)
    print("output is ", output)
    #base case
    if source == "":
        return
    #recursive case
    tagIndex = source.find(tag) #(*THIS IS THE LOCATION OF THE PROBLEM**)
    print("tagIndex is ", tagIndex)
    start = source[tagIndex:].find("\t") + 1
    print("start is ", start)
    stop = source[tagIndex + start:].find("\t")
    print("stop is ", stop)
    if tagIndex !=-1:
        output.write(tag + "\t" + line[start: stop])
        print("spliced text is: ", line[start:stop])
        #recursively call findOneTag(tag, String source[stop:], output)
        findOneTag(tag, source[stop + 1:], output)

def main():
    tag = "species"
    inname = "skeletontext.txt"
    outname = "skeletontext1234567.txt"
    inname1 = open(inname, "r")
    output = open(outname, "w")
    source = inname1.readlines()
    print("source is: ", source)
    findOneTag(tag, source, output)

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Alan Gauld



"Stefan Behnel"  wrote

But I don't understand how uncompressing a file before parsing it can
be faster than parsing the original uncompressed file?


I didn't say "uncompressing a file *before* parsing it". I meant 
uncompressing the data *while* parsing it.


Ah, ok that can work, although it does add a layer of processing
to identify compressed v uncompressed data, but if I/O is the
bottleneck then it could give an advantage.

Alan g. 




Re: [Tutor] AttributeError: 'list' object has no attribute 'find'

2010-12-21 Thread Alan Gauld



"Ben Ganzfried"  wrote


1) Doesn't the read() file object method return the specified
characters from the file as a string?


Yes.

2) If #1 is correct, then why is my variable "source" being viewed as
a list as opposed to a string?


You are not using read(), you are using readlines()
which returns all of the lines in a list.

3) How can I change my variable "source" so that it can use the 'find'
method?


Use read()
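
The difference between the two calls can be seen in a short sketch. The sample file contents below are made up for illustration (the real skeletontext.txt is not shown in the thread); only read(), readlines() and str.find() are taken from the discussion.

```python
import os
import tempfile

# Build a small tab-separated sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("species\tcat\ngenus\tFelis\nspecies\tdog\n")
    path = tmp.name

with open(path) as handle:
    text = handle.read()        # one string holding the whole file
with open(path) as handle:
    lines = handle.readlines()  # a list with one string per line
os.remove(path)

print(type(text).__name__, type(lines).__name__)
print(text.find("species"))     # works: str has a find() method

# A list has no find(), but every item in it is a string that does.
hits = [number for number, line in enumerate(lines)
        if line.find("species") != -1]
print(hits)
```

So there are two ways out: call read() and search the single string, or keep readlines() and call find() on each line in a loop.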

HTH,


--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/



Re: [Tutor] AttributeError: 'list' object has no attribute 'find'

2010-12-21 Thread Luke Paireepinart

read() does return a string. I guess the better question would be... Are you 
using read()? Because in the example you sent you used readlines(), which returns a 
list of lines.

-
Sent from a mobile device with a bad e-mail client.
-

On Dec 21, 2010, at 10:07 AM, Ben Ganzfried  wrote:

> Hey,
> 
> I keep getting the error above and unfortunately browsing through
> google and finding similar responses has not been fruitful for me.  My
> code is below and I have marked off the location of the problem in my
> code.  I'm wondering the following:
> 
> 1) Doesn't the read() file object method return the specified
> characters from the file as a string?
> 2) If #1 is correct, then why is my variable "source" being viewed as
> a list as opposed to a string?
> 3) How can I change my variable "source" so that it can use the 'find'
> method?  Or alternatively, is there another method besides the 'find'
> method that would do the same thing that 'find' does that would work
> on my variable 'source' as it currently is?
> 
> Thanks so much,
> 
> Ben
> 
> #recursively find each instance of the tag throughout the whole document
> def findOneTag (tag, source, output):
>     print("tag is ", tag)
>     #print("source is now ", source)
>     print("output is ", output)
>     #base case
>     if source == "":
>         return
>     #recursive case
>     tagIndex = source.find(tag) #(*THIS IS THE LOCATION OF THE PROBLEM**)
>     print("tagIndex is ", tagIndex)
>     start = source[tagIndex:].find("\t") + 1
>     print("start is ", start)
>     stop = source[tagIndex + start:].find("\t")
>     print("stop is ", stop)
>     if tagIndex !=-1:
>         output.write(tag + "\t" + line[start: stop])
>         print("spliced text is: ", line[start:stop])
>         #recursively call findOneTag(tag, String source[stop:], output)
>         findOneTag(tag, source[stop + 1:], output)
> 
> def main():
>     tag = "species"
>     inname = "skeletontext.txt"
>     outname = "skeletontext1234567.txt"
>     inname1 = open(inname, "r")
>     output = open(outname, "w")
>     source = inname1.readlines()
>     print("source is: ", source)
>     findOneTag(tag, source, output)

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Luke Paireepinart

You're not going to win any friends here, Dave. Steven is well known on this 
list. He is sometimes abrasive but it's rarely if ever malicious. Anytime he's 
ever been rude to me it was deserved. Like how I top post from my phone. Or 
giving bad advice to newbies.

People are getting irritated because YOU are not making the effort to be 
understood, and then getting angry when we don't understand you. Have you read 
Eric Raymond's smart questions article? It may help you understand the 
mentality here.

And no, we are not hostile to people because they lack education. But we may be 
perceived as hostile to those who do not WANT education. Whatever bandying 
about and posturing you want to do is fine with me, but when it comes down to 
it, if you ask a question and I can tell you didn't make sufficient effort to 
solve it yourself (due to laziness, not ignorance or inability) I have little 
to no incentive to help you.

Also, I have a master's in C.S. My stepfather has been a programmer for 30 years 
and has no programming degree. I do not think my degree is a replacement for 
that, and I learn new things from him every day. It goes both ways. Have respect 
for yourself and your ability to learn what you need to, and we'll have the 
respect to help pick you up if you stumble.

My 2 cents.

- 
Sent from a mobile device with a bad e-mail client.
-

On Dec 21, 2010, at 8:49 AM, David Hutto  wrote:

> And a lesson of what you really are to anyone listening.

[Tutor] [OT] Bay area meetups for python &/ {linux, embedded, mobile, wireless ?}

2010-12-21 Thread ashish makani

> OT (off-topic): I moved to the Bay Area recently & am passionate about
> technology in general & linux, python, c, embedded, mobile, wireless
> stuff.
> I was wondering if any of you guys are part of some Bay Area python (or
> other tech) meetup (as in, do you guys meet up in person) for like a tech
> talk / discussion / brainstorming / hack nights?
> If yes, I would love to know more & be a part of it.


Thanks Tino for the info

Any other inputs from other folks ?

I am aware of  BayPiggies
http://www.baypiggies.net/
http://mail.python.org/mailman/listinfo/baypiggies


On Mon, Dec 20, 2010 at 9:10 PM, Tino Dai  wrote:

> Hi Ashish,
>
> Check out Noisebridge (https://www.noisebridge.net/) in SF. I think you
> will find there are like-minded tech people there. Mitch Altman
> (http://en.wikipedia.org/wiki/Mitch_Altman) is one of the
> founders/original members.
>
> -Tino
>

*"We act as though comfort and luxury were the chief requirements of life,
when all that we need to make us happy is something to be enthusiastic
about."
-- Albert Einstein*

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Walter Prins

On 21 December 2010 14:11, Alan Gauld  wrote:

> But I don't understand how uncompressing a file before parsing it can
> be faster than parsing the original uncompressed file?
>

Because of IO overhead/benefits.  It's not so much that the parsing aspect
of it is faster of course (it is what it is), it's that the total time taken
to (read+decompress+parse) is less than the time for just (read+parse) on the
uncompressed file, because the time saved by reading the smaller compressed
data from disk is more than the time it takes to decompress that data in RAM.
Generally speaking, compared to your CPU and memory, your disk is usually
going to be the bottleneck, though of course it does depend on exactly how
much data we're talking about, how fast your CPU is, etc.

In general computing this is less of an issue nowadays than perhaps a few
years ago, and the gains can be as you say small, or sometimes not so small,
depending on exactly how much data you've got, how highly compressed it's
become, how fast/efficient the decompresser is, how slow your I/O channel is
etc, but the point nevertheless stands.  Case in point, it's perhaps
interesting to note that this technique is used regularly on the web in
general: many web servers can stream their HTML content as gzip/deflate
(LZ77-based) compressed data streams, since (as above) it's often quicker to
compress, stream, decompress and parse than it is to just stream the data
direct.  (And, of course, thanks to zlib + urllib one can even use this
feature from Python should you wish to do so.)
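
A rough sketch of the zlib side of this, assuming a gzip-wrapped stream that arrives in chunks (the payload, the 1024-byte chunk size and the variable names are invented; no real network request is made here). The decompressobj consumes each chunk as it arrives, so the compressed data never needs to be held in one piece:

```python
import zlib

# Invented payload standing in for an HTML page.
payload = b"<html>" + b"<p>hello</p>" * 5000 + b"</html>"

# wbits = 16 + MAX_WBITS selects the gzip container format.
gzip_wbits = 16 + zlib.MAX_WBITS
compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, gzip_wbits)
wire_data = compressor.compress(payload) + compressor.flush()

# Decompress incrementally, as if the chunks came off a socket.
decompressor = zlib.decompressobj(gzip_wbits)
received = bytearray()
chunk_size = 1024
for offset in range(0, len(wire_data), chunk_size):
    chunk = wire_data[offset:offset + chunk_size]
    received += decompressor.decompress(chunk)
received += decompressor.flush()

print(len(wire_data), "bytes transferred for", len(payload), "bytes of content")
```

Highly repetitive markup like this compresses far better than typical content, so the ratio printed here is an upper bound rather than something to expect in practice.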

Anyway, just my $0.02!

Walter

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Walter Prins

On 21 December 2010 17:57, Alan Gauld  wrote:

>
> "Stefan Behnel"  wrote
>
>  But I don't understand how uncompressing a file before parsing it can
>>> be faster than parsing the original uncompressed file?
>>>
>>
>> I didn't say "uncompressing a file *before* parsing it". I meant
>> uncompressing the data *while* parsing it.
>>
>
> Ah, ok that can work, although it does add a layer of processing
> to identify compressed v uncompressed data, but if I/O is the
> bottleneck then it could give an advantage.
>

OK, my apologies, I see my previous response was already superseded by
later emails (which I had not read yet.)  Feel free to ignore it. :)

Walter

Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread David Hutto

On Tue, Dec 21, 2010 at 1:23 PM, Luke Paireepinart
 wrote:
> You're not going to win any friends here Dave.

Wasn't trying to.

> Steven is well known on this list.

And that means something to you only.

> He is sometimes abrasive but it's rarely if ever malicious.
> Anytime he's ever been rude to me it was deserved

And the only times I remember being rude to him, it was settled, and
then we go right back to Steven being an ass (not that I ever said I
wasn't at some points), and we've had off-list emails as well.

> . Like how I top post from my phone. Or giving bad advice to newbies.

Guilty of the same, but I wasn't trying to be malicious either, just as
you weren't trying to be.

>
> People are getting irritated because YOU are not making the effort to be
> understood

I pete and repeat when asked, and barely try to ask, when most of the
time google and the docs solve it.

> , and then getting angry when we don't understand you.

More like frustrated than angry, and I believe everyone here has had
the same feeling.

> Have you read Eric raymond's smart questions article?

Not his but others.

> It may help you understand the mentality here.
>
> And no we are not hostile to people because they lack education.

And apparently you speak for all of the programmers out there.

> But may be perceived as hostile to those who do not WANT education.
> Whatever bandying about and posturing you want to do is fine with me,
> but when it comes down to it, if you ask a question and I can tell you
> didn't make sufficient effort to solve it yourself (due to laziness,
> not ignorance or inability) I have little to no incentive to help you.
>
> Also, I have a masters in C.S. My stepfather has been a programmer for 30 
> years and has no programming degree. I do not think my degree is a 
> replacement for that, and learn new things from him every day. It goes both 
> ways. Have respect for yourself and your ability to learn what you need to, 
> and we'll have the respect to help pick you up if you stumble


I like to think that in terms of being a python self learner, I have
persevered on my own very well. Maybe I was feeling inadequate and
frustrated, and lashed out, but so has he, and I've tried to let it
go, and will again this time.
>
> My 2 cents.

Pocketed.

Re: [Tutor] Hangman game.....problem putting strings in a list.....

2010-12-21 Thread ALAN GAULD

Forwarding to tutor list, please use Reply All when replying to the group.

 
Alan Gauld
Author of the Learn To Program website
http://www.alan-g.me.uk/




- Original Message 

> full error printout. I tried debugging it and I still don't know why
> guessed[index] = (letter) gives me an error. I don't want to use the append
> method because it will put the new letter at the end of the list, which is
> very different than playing the hangman game.

OK, On reading it again I see why you are using the index approach.


> guessed =  guessed[0] *  (len(line)-1)

That should be OK although personally I'd use a list comprehension 
to initialise the list:

guessed = [None for c in line]  # use the characters in line to get corresponding Nones

> Traceback (most recent call last):
>File "/Python_Hello_World/src/HangMan/__init__.py", line 72, in  
> doit();
> File "/Python_Hello_World/src/HangMan/__init__.py",  line 56, in doit
> guess(a_letter);
>   File  "/Python_Hello_World/src/HangMan/__init__.py", line 15, in guess
>  guessed[index] = (letter);
> TypeError: 'str' object does not support  item assignment

I don't understand how you are getting this either, I don't get that error 
in similar code. Can you insert a print statement just before that line like:

print 'index: ', index, '\tletter: ', letter, '\nguessed: ', guessed

So we can see the values just before the line is executed.
Also drop the semi colon and parens. They shouldn't do any harm 
but it removes two more variables from the equation...
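
One possible cause worth checking before adding the print, sketched under the assumption that guessed starts out as ["-"] as in the quoted code (the word in line is invented): guessed[0] is the string "-", so multiplying it yields a longer string rather than a list, and strings do not support item assignment. Multiplying the list itself gives a list that can be indexed and assigned to:

```python
line = "python\n"  # hypothetical word as returned by readline()
guessed = ["-"]

as_string = guessed[0] * (len(line) - 1)  # "-" * 6 gives a str
as_list = guessed * (len(line) - 1)       # ["-"] * 6 gives a list

try:
    as_string[0] = "p"  # item assignment on a str
    string_assignment_failed = False
except TypeError:
    string_assignment_failed = True

as_list[0] = "p"  # item assignment on a list is fine
print(as_string)
print(as_list)
print(string_assignment_failed)
```

The quoted initArray() also concatenates guessed onto a string in its print line, which only works when guessed is a str, so that would be consistent with this reading.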

> I am switching from java so my python style might be  a bit unorthodox. I do
> however access my global variables in different  methods... Don't know why it
> is encouraged in java and discouraged in python  ?

I've never seen global variables being encouraged in Java. I think you may 
be thinking of class (or instance) variables which are also ok in Python
albeit more clearly identified by prefixing with self (ie 'this' in Java).


> >> guessed = ["-"];
> >> count = 0;
> >> wrong = 0;
> >>
> >> def guess(letter):
> >>     global guessed
> >
> >>     if (letter in line):
> >
> > You don't need the parens, they don't do any harm,
> > but they aren't needed.
> >
> >>         index = line.index(letter);
> >>         print guessed;
> >
> >>         # This is the line that gives me the error don't know why?
> >>         guessed[index] = " " + (letter); ,TypeError: 'str' object does not support item assignment
> >
> >>         guessed[index] = (letter);
> >
> > Again, you don't need the parens...
> > And I suspect you really want to use append() here rather
> > than assigning to guessed[index].
> >
> >>         print ' '.join(guessed)
> >>     else:
> >>         global wrong;
> >>         wrong += 1;
> >>
> >>
> >> def draw(number):...
> >
> >> def doit():
> >>     global count
> >>     while(wrong != 7):
> >>         a_letter = raw_input("Pick a letter -->  ")
> >>         print
> >>         guess(a_letter);
> >>         draw(wrong);
> >>         print
> >>         count += 1
> >>
> >> def initArray():
> >>     global guessed
> >>     print line
> >>     guessed = guessed[0] * (len(line)-1)
> >>     print "this is new list " + guessed;
> >
> > If you use the append() method you don't need this.
> >
> >> while 1:
> >>     line = file.readline();
> >>     if (len(line) >= 5):
> >>         initArray()
> >>         doit();
> >>         break
> >>     if not line: break
> >>
> >> file.close()
> > 
> > HTH,
> > 
> 
> 
> 

Re: [Tutor] REGEX EXPLANATION

2010-12-21 Thread Luke Paireepinart

Which part of the regexes do you not understand? Have you tried
figuring them out yourself first?
We don't typically give out answers here.  This is a tutor list, not a
solve your problems for you list.  We're here to teach you how to
fish, not cook for you.  So show us where you're stuck and we'll help
out.

Did you read the Python regex documentation online? I find it's quite
good.  Should give you a push in the right direction.
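
One way to start figuring such a pattern out yourself is to rewrite it with re.VERBOSE, one piece per line, and try it on a test string. The sketch below does that for the first quoted pattern. Note the assumptions: the group name "serial" is made up (the names inside (?P...) were lost from the quoted patterns), and the sample string is invented.

```python
import re

# "serial" is a hypothetical group name; the original names are not known.
pattern = re.compile(r"""
    PSC                 # the literal letters PSC
    (?P<serial>\d{6})   # exactly six digits, captured as a named group
    (DC|VR)             # either DC or VR
    (\d{2})             # exactly two digits
    (RC|GA)             # either RC or GA
    (\d{3})             # exactly three digits
    (!?([A-Z\d]{1,})@?(.*))?   # optional tail: optional "!", letters/digits, optional "@", the rest
""", re.I | re.VERBOSE)

sample = "PSC123456VR01RC042!AB@note"  # invented test string
match = pattern.match(sample)
print(match.group("serial"))
print(match.groups())
```

Changing one character of the sample at a time and watching which group stops matching is a quick way to see what each piece does.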

On Tue, Dec 21, 2010 at 9:10 AM,   wrote:
> I have been reading regex in order to work around an assignment.
>
> Could anyone explain to me in plain english what the following regex 
> expression translate to.
>
> (r'PSC(?P\d{6})(DC|VR)(\d{2})(RC|GA)(\d{3})(!?([A-Z\d]{1,})@?(.*))?',
>  re.I)
>
> (r'PSC(?P\d{6})VR(?P\d{2})RC(?P\d{3})(?P[ABCDEFGHJKMNPQRSTUVWXYZ\d]{2,})?',
>  re.I)
>
> (r'PSC(?P\d{6})VR(?P\d{2})(?P(RC|GA))(?P\d{3})!(?P[ABCDEFGHJKMNPQ]{1,})@?(?P.*)',
>  re.I)
>
> (r'PSC(?P\d{6})DC(?P\d{2})RC(?P\d{3})(?P[ABCDEFGHJKMNPQRSTUVWX\d]{2,})?',
>  re.I)
>
> (r'PSC(?P\d{6})DC(?P\d{2})(?P(RC|GA))(?P\d{3})!(?P[ABCDEFGHJK]{1,})@?(?P.*)',
>  re.I)
>
> Like reading it out in plain english. This should further help me understand 
> how these signs work and then I can get along.
>
> Thanks.
> Sent from my BlackBerry wireless device from MTN
