Re: Scanning a file
[EMAIL PROTECTED] wrote:
> I think implementing a finite state automaton would be a good (best?)
> solution. I have drawn a FSM for you (try viewing the following in
> fixed width font). Just increment the count when you reach state 5.
>
> <---|
>||
> 0 0 | 1 0 |0
> -->[1]--->[2]--->[3]--->[4]--->[5]-|
> ^ || ^ | | |
> 1| |<---| | | |1 |1
> |_|1 |_| | |
> ^ 0 | |
> |-|<-|
>
> If you don't understand FSM's, try getting a book on computational
> theory (the book by Hopcroft & Ullman is great.)
>
I already have that book. The above solution is very slow in practice. None
of the solutions presented in this thread is nearly as fast as
print file("filename", "rb").read().count("\x00\x00\x01\x00")
/David
--
http://mail.python.org/mailman/listinfo/python-list
Re: Scanning a file
Steven D'Aprano wrote:
> On Fri, 28 Oct 2005 06:22:11 -0700, [EMAIL PROTECTED] wrote:
>
>>Which is quite fast. The only problem is that the file might be huge.
>
> What *you* call huge and what *Python* calls huge may be very different
> indeed. What are you calling huge?
>
I'm not saying that it is too big for Python. I am saying that it is too
big for the systems it is going to run on. These files can be 22 MB or 5
GB or ..., depending on the situation. It might not be okay to run a
tool that claims that much memory, even if it is available.
>
>>I really have no need for reading the entire file into a string as I am
>>doing here. All I want is to count occurrences of this substring. Can I
>>somehow count occurrences in a file without reading it into a string
>>first?
>
> Magic?
>
That would be nice :)
But you misunderstand me...
> You have to read the file into memory at some stage, otherwise how can you
> see what value the bytes are?
I haven't said that I would like to scan the file without reading it. I
am just saying that the .count() functionality implemented for strings
could just as well be applied to some abstraction such as a stream (I
come from C++). In C++, the count() functionality would be separated as
much as possible from any concrete datatype (such as a string),
precisely because it is a concept that is applicable at a more abstract
level. I should be able to say "count the substring occurrences of this
stream" or "using this iterator" or something to that effect. If I could say
print file("filename", "rb").count("\x00\x00\x01\x00")
(or something like that)
instead of the original
print file("filename", "rb").read().count("\x00\x00\x01\x00")
it would be exactly what I am after. What is the conceptual difference?
The first solution should be at least as fast as the second. I have to
read and compare the characters anyway. I just don't need to store them
in a string. In essence, I should be able to use the "count occurrences"
functionality on more things, such as a file, or even better, a file
read through a buffer with a size specified by me.
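A minimal sketch of that idea, assuming Python 3 (the one-liners above are Python 2): read the file through a buffer of a chosen size and count as you go, carrying over just enough bytes to catch a match that straddles two reads. The name count_in_file and its parameters are made up for illustration; nothing like it is built into file objects.

```python
def count_in_file(path, pattern, chunk_size=1 << 20):
    """Count non-overlapping occurrences of `pattern` (bytes) in a file,
    reading at most `chunk_size` bytes at a time."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    keep = len(pattern) - 1   # longest prefix of a match that can be cut off
    count = 0
    tail = b""                # unconsumed bytes carried over from the last read
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            data = tail + chunk
            pos = 0
            while True:
                i = data.find(pattern, pos)
                if i < 0:
                    break
                count += 1
                pos = i + len(pattern)   # skip past the match, like bytes.count
            # Keep only bytes that could still start a match, and never
            # bytes that already belong to a counted match.
            tail = data[max(pos, len(data) - keep):]
    return count
```

Memory use is bounded by chunk_size plus a few bytes, and the result should agree with read().count() on the same data. Whether it is as fast is something to measure, not assume.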
>
> Here is another thought. What are you going to do with the count when you
> are done? That sounds to me like a pretty pointless result: "Hi user, the
> file XYZ has 27 occurrences of bitpattern \x00\x00\x01\x00. Would you like
> to do another file?"
>
It might sound pointless to you, but it is not pointless for my purposes :)
If you must know, the above one-liner actually counts the number of
frames in an MPEG2 file. I want to know this number for a number of
files for various reasons. I don't want it to take forever.
> If you are planning to use this count to do something, perhaps there is a
> more efficient way to combine the two steps into one -- especially
> valuable if your files really are huge.
>
Of course, but I don't need to do anything else in this case.
/David
Re: Scanning a file
No comments to this post?
/David
Re: Scanning a file
Lasse Vågsæther Karlsen wrote:
> David Rasmussen wrote:
>
>> If you must know, the above one-liner actually counts the number of
>> frames in an MPEG2 file. I want to know this number for a number of
>> files for various reasons. I don't want it to take forever.
>
> Don't you risk getting more "frames" than the file actually has? What
> if the encoded data happens to have the magic byte values for something
> else?
>
I am not too sure about the details, but I've been told by a reliable
source that 0x0100 only occurs as a "begin frame" marker, and not
anywhere else. So far, it has been true on the files I have tried it on.
/David
Re: Scanning a file
Bengt Richter wrote:
>
> Good point, but perhaps the bit pattern the OP is looking for is guaranteed
> (e.g. by some kind of HDLC-like bit or byte stuffing or escaping) not to occur
> except as frame marker (which might make sense re the problem of re-synching
> to frames in a glitched video stream).
>
Exactly.
> The OP probably knows. I imagine this thread would have gone differently if
> the title had been "How to count frames in an MPEG2 file?" and the OP had
> supplied the info about what marks a frame and whether it is guaranteed not
> to occur in the data ;-)
>
Sure, but I wanted to ask the general question :) I am new to Python and
I want to learn about the language.
/David
Re: Scanning a file
Steven D'Aprano wrote:
>
> However, there may be a simpler solution *fingers crossed* -- you are
> searching for a sub-string "\x00\x00\x01\x00", which is hex 0x100.
> Surely you don't want any old substring of "\x00\x00\x01\x00", but only
> the ones which align on word boundaries?
>
Nope, sorry. On the files I have tried this on, the pattern could occur
on any byte boundary.
/David
Re: Python's website does a great disservice to the language
Steve Holden wrote:
>
> [Thinks: wonder if it's time to release a sneak preview].
>
It is! It is!
/David
Re: Scanning a file
Steven D'Aprano wrote:
>
> 0x0100 is one of a number of unique start codes in the MPEG2
> standard. It is guaranteed to be unique in the video stream, however
> when searching for codes within the video stream, make sure you're in
> the video stream!
>
I know I am in the cases I am interested in.
> And heaven help you if you want to support MPEGs that are slightly
> broken...
>
I don't. This tool is for in-house use only, and only on MPEGs that are
generated in house too.
/David
Getting a function name from string
If I have a string that contains the name of a function, can I call it?
As in:
def someFunction():
    print "Hello"
s = "someFunction"
s() # I know this is wrong, but you get the idea...
/David
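For reference, a small sketch of two common ways to do this, written for Python 3 (so the function returns its text instead of using the Python 2 print statement). The dispatch dictionary is an illustrative name, not anything from the thread; getattr is the built-in.

```python
import math


def someFunction():
    return "Hello"


# Option 1: an explicit dispatch table. Only names listed here can be
# called, which is usually what you want when the string comes from outside.
dispatch = {"someFunction": someFunction}

s = "someFunction"
greeting = dispatch[s]()

# Option 2: functions that live in a module (or on an object) can be
# looked up by name with the built-in getattr.
sqrt = getattr(math, "sqrt")
root = sqrt(9.0)

print(greeting, root)
```

Looking the name up in globals() would also work for module-level functions, but the explicit table makes it obvious which names are callable from a string.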
Re: Python as a HTTP Client
Fuzzyman wrote:
> ``urllib2`` is the standard library module you need.
>
> I've written a guide to using it (although it's very easy - but some
> attributes of the errors it can raise aren't documented) :
>
> http://www.voidspace.org.uk/python/articles/urllib2.shtml
>
> All the best,
>
> Fuzzyman
> http://www.voidspace.org.uk/python/index.shtml
>
Excellent site!
/David
Re: Python as a HTTP Client
James Tanis wrote:
> If you haven't discovered www.python.org yet I suggest going there :P.
> You will find there the documentation you need under the conspicuous
> name library reference. Specifically the modules you'd probably most
> be interested in are urllib/urllib2/httplib depending on what you
> need. There may be other external modules which fit your task even
> better, try doing a search through the Python Package Index..
>
To both you and Frederik:
I do know about www.python.org. I do an extensive amount of googling in
general and searching at python.org before I ask questions such as this.
I did stumble upon urllib, urllib2 and httplib in the documentation, but
let me assure you, as a newbie, that finding this documentation doesn't
make one go "ah, this is what I was looking for". Specifically, I can't
see from the reference documentation whether something even smarter or
more high-level exists.
Fuzzyman's link did the trick. It also helped me (after reading his
articles) to understand the reference documentation better.
/David
Multikey Dict?
If I have a collection of dicts like:
john = {'id': 1, 'name': "John Cleese", 'year': 1939}
graham = {'id': 2, 'name': "Graham Chapman", 'year': 1941}
I could store all of them in a list. But for easy lookup, I might store
all these in a dict instead, like
people = {'1': john, '2': graham}
or maybe
people = {'John Cleese': john, 'Graham Chapman': graham}
or whatever key I might choose. Now, first of all, it seems a bit
annoying that I have to keep that redundant data in the second dict that
is already in the individual dicts within people. Secondly (and this is
my question), it is annoying that I have to choose one of several
unambiguous keys as a key.
I would like to be able to say:
people['1'].year
in some case and in other cases I want to say
people['John Cleese'].year
That is, sometimes I have the name at hand and would like to look up
data based on that. Other times, I have the ID at hand and would like to
look up data based on that instead.
Also, I would like it if I didn't have to keep the key data both in the
dict of dicts and in the dicts :)
If I could just say to Python: john and graham (and ...) are all a part
of a "superdict" and either their id or their name can be used as keys.
Can I do that somehow?
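One possible sketch of such a "superdict", in Python 3. MultiKeyDict and its methods are hypothetical names, not a standard-library class. It keeps one index over several chosen fields and stores references to the original dicts, so the records themselves are not copied. It assumes values never collide across fields (an id never equals a name), and it uses the integer id 1 as the key, not the string '1'.

```python
class MultiKeyDict:
    """Look up the same record by any of several of its fields."""

    def __init__(self, key_fields):
        self._key_fields = tuple(key_fields)
        self._index = {}

    def add(self, record):
        # Index the record under every chosen field's value.
        for field in self._key_fields:
            self._index[record[field]] = record

    def __getitem__(self, key):
        return self._index[key]


john = {'id': 1, 'name': "John Cleese", 'year': 1939}
graham = {'id': 2, 'name': "Graham Chapman", 'year': 1941}

people = MultiKeyDict(('id', 'name'))
people.add(john)
people.add(graham)

print(people[1]['year'])              # lookup by id
print(people['John Cleese']['year'])  # lookup by name
```

Lookups return the plain dicts, so the field access is people[1]['year'], not the people['1'].year form wished for above. Attribute access would need the records to be objects, not dicts.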
/David
Python Book
What is the best book for Python newbies (seasoned programmer in other
languages)?
/David
HTTP Keep-Alive with urllib2
Someone once asked about this and got no answer:
http://groups.google.dk/group/comp.lang.python/browse_frm/thread/c3cc0b8d7e9cbc2/ff11efce3b1776cf?lnk=st&q=python+http+%22keep+alive%22&rnum=84&hl=da#ff11efce3b1776cf
Maybe I am luckier :)
Does anyone know how to do Keep-Alive with urllib2, that is, how to get
a persistent HTTP connection, instead of a new connection being opened
for every request?
/David
Making a persistent HTTP connection
I use urllib2 to do some simple HTTP communication with a web server. In
one "session", I do maybe 10-15 requests. It seems that urllib2 opens a
new connection every time I do a request. Can I somehow make it use
_one_ persistent connection where I can do multiple GET->"receive data"
passes before the connection is closed?
/David
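One way this can be sketched is one level below urllib2, with the standard library's HTTP connection class (http.client in Python 3, httplib in Python 2). The example is Python 3, and the helper name fetch_all is made up for illustration. It relies on the server honouring HTTP/1.1 keep-alive. If the server closes the connection after a response, http.client quietly opens a new one for the next request, so the saving is not guaranteed.

```python
import http.client


def fetch_all(host, paths, port=80):
    """GET several paths from one host over a single connection object.

    Returns a list of (status, body) tuples in request order.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    results = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            # The body has to be read in full before the next request
            # can be sent on the same connection.
            body = response.read()
            results.append((response.status, body))
    finally:
        conn.close()
    return results
```

A call such as fetch_all("www.example.com", ["/a", "/b"]) would then issue both requests through the same connection object. The host and paths here are placeholders.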
Re: python speed
Frithiof Andreas Jensen wrote:
>
> From the speed requirement: Is that correspondence chess by any chance??
>
Regular chess at tournament time controls requires speed too. Any pure
Python chess program would lose badly to the best C/C++ programs out
there now.
I would also like to see Half Life 2 in pure Python.
/David
Re: python speed
Harald Armin Massa wrote:
> Dr. Armin Rigo has some mathematical proof, that High Level Languages
> like esp. Python are able to be faster than low level code like
> Fortran, C or assembly.
>
Faster than assembly? LOL... :)
/David
Re: python speed
Steve Holden wrote:
>>
>> Faster than assembly? LOL... :)
>>
> I don't see why this is so funny. A good C compiler with optimization
> typically produces better code than an equivalent assembly language
> program. As compilation techniques improve this gap is likely to widen.
> There's less and less reason to use assembler language with each passing
> year.
>
I've answered this question elsewhere in the thread.
/David
Re: python speed
Harald Armin Massa wrote:
>>Faster than assembly? LOL... :)
>
> why not?
Because any program generated automatically by a compiler of any kind
can always be expressed in assembly language. That writing assembler for
many processors can be really hard to do well is beside the point. We're
talking about the best-case capabilities of a language. Writing programs
in other languages can be hard as well, not to mention writing a
compiler for any language that produces "as good as best assembly" code,
that is, optimal code.
/David
Re: python speed
Peter Hansen wrote:
>>
>>> From the speed requirement: Is that correspondence chess by any chance??
>>
>> Regular chess at tournament time controls requires speed too. Any pure
>> Python chess program would lose badly to the best C/C++ programs out
>> there now.
>>
>> I would also like to see Half Life 2 in pure Python.
>
> True, but so what?
So nothing. I was just commenting on the correspondence chess comment...
with a valid observation.
> Why did you suddenly change the discussion to
> require "pure" Python?
Because that's the only meaningful thing to discuss. If we're allowed to
use Python as a thin layer above C (a very important, practical and cool
feature), then we're measuring the speed of C, not Python. What would be
the point of that? I can already answer how fast that notion of Python
is: as fast as C.
> And are you not allowed to use any of the
> performance-boosting techniques available for Python, like Pyrex or
> Psyco?
Of course.
> Why such restrictions, when these are things Python programs use
> on a daily basis: these are *part* of Python, as much as the -O switch
> on the compiler is part of C/C++.
>
Because we're then measuring the speed of C, not Python. But if you want
to convey the point that Python can be used for arbitrarily performance
"hungry" problems, you're right. Just code the "fast" parts in C or
assembly language and apply a thin layer of Python on top. Lather,
rinse, repeat.
> Okay, let's compare a "pure" Python program (if you can define it in any
> meaningful, practical way) with a "pure" Java program, running on a
> non-JIT interpreter and with optimizations turned off (because, of
> course, those optimizations are umm... somehow.. not "pure"...?).
>
Says who? And why are you comparing to Java now? Java sucks.
> Judging by the other posts in this thread, the gauntlet is down: Python
> is faster than Java.
I don't disagree with that.
> Let those who believe otherwise prove their point
> with facts, and without artificially handcuffing their opponents with
> non-real-world "purity" requirements.
>
You are the one getting theoretical and academic. I was just commenting
on the chess comment above, because it is a field that I have extensive
knowledge in.
Here is a real-world challenge to make my point obvious: Write a chess
program in Python and win more than 50% out of 100 games against a good
chess program (Crafty, Fritz, Chess Tiger etc.) playing on the same
hardware. In fact, just win one.
Now, if you write the "fast" parts in C, you obviously can solve this
problem, and easily. But then there is no reason to discuss Python's
speed ever. Just say "as fast as C" and code everything in C and make a
hookup in Python. In fact, why not just code it in C then?
/David
Re: python speed
bruno at modulix wrote:
>
> There's nothing like "pure" Python. Python depends on a lot of libs,
> most of them being coded in C++ or C (or assembly FWIW). The common
> scheme is to use Python for the logic and low-level libs for the
> critical parts.
>
I know. But if a discussion like this is to have any meaning, then we're
talking about "algorithmic" or "calculative" code with native Python
constructs, not Python as a layer on another language. In that case,
we're measuring the speed of the other language, not Python.
> And FWIW, I'd like to see any similar game in "pure" Java !-)
>
Me too :)
/David
Re: ANN: Dao Language v.0.9.6-beta is release!
Antoon Pardon wrote:
>>
>>Write shorter functions ;)
>
> This has little to do with long functions. A class can contain
> a large number of methods, which are all rather short, and your
> class will still be spread over several pages.
>
Write classes with a smaller interface ;-)
/David
Re: How to check if a string "is" an int?
Daniel Schüle wrote:
>
> others already answered, this is just an idea
>
I guess, if we want to avoid the exception paradigm for a particular
problem, we could just do something like:
def isNumber(n):
    try:
        dummy = int(n)
        return True
    except ValueError:
        return False
and use that function from wherever in the program.
/David
wxStyledTextCtrl - Dead?
I have several questions about wxStyledTextCtrl:
1) Is it still being maintained?
2) Where are the docs and tutorials?
3) Is it wxStyledTextCtrl, wx.StyledTextCtrl, StyledTextCtrl, or... ?
4) Is there an alternative?
/David
Re: how relevant is C today?
Mirco Wahab wrote:
>
> I would say, from my own experience, that you wouldn't
> use all C++ features in all C++ projects. Most people
> I know would write C programs 'camouflaged' as C++,
> that is: write clean & simple C - and use some C++
> features e.g, class bound methods for interfaces -
> but no inheritance at all (use compound objects) and
> no exceptions (handle errors 'the olden way').
>
Of course. C++ is a hybrid language by design, not only an object
oriented language, not only a language with exceptions, not only a
language with compile time metaprogramming etc. You don't have to use
all the features of C++ to make a real C++ program.
Even for writing C programs, C++ is still a better choice (in my
opinion). If you want to, you can keep things "simple", and plain C-ish,
and still benefit from better type safety etc.
In my everyday work, I am forced to use a C90 only compiler, and every
day I miss some C++ feature that wouldn't make my program any more
complex, quite the opposite. These are features like "const", no default
extern linkage, more typesafe enums etc.
You can make yourself program in a C style, but whenever you miss
something, you can always wrap that up behind an abstraction such as a
class etc., and still maintain C-like semantics.
Say I wanted an Ada-like integer type that only runs from 1 to 100. I
could make such a beast in C++, and then use it just as an ordinary int
in my C style program. I could even make this beast _be_ an ordinary int
in release builds when I was sure (yeah right) that the code was
bug-free. This gives expressiveness and precision in specification. You
let the compiler do the work for you. And you still maintain
performance. You can't do this in C at all. And there are a million more
examples.
In practice, the combination of Python and C++ covers most of what I
need to do in most situations. But I still wish that C++ offered a lot
more of those zero-overhead features that it might as well offer, that
the compiler just as well can do. It could learn from Ada in this
regard.
/David
Re: how relevant is C today?
Lawrence D'Oliveiro wrote:
> In article <[EMAIL PROTECTED]>,
> David Rasmussen <[EMAIL PROTECTED]> wrote:
>
>> In my everyday work, I am forced to use a C90 only compiler, and
>> everyday I miss some C++ feature that wouldn't make my program any more
>> complex, quite the opposite. These are features like "const", no default
>> extern linkage, more typesafe enums etc.
>
> "const" is in C89/C90.
Broken const is. C++ const is different from C90 const.
> As for the others, how about hiding a copy of GCC
> somewhere, just to use to preflight your code before actually building
> it with your compulsory broken compiler? :)
I can't do that. I compile for a special system with loads of special
libraries. The code can never compile on a stock gcc compiler. Besides,
it doesn't help me to better and more precisely express notions in my
code.
/David
Re: how relevant is C today?
Thomas Bellman wrote:
> Lawrence D'Oliveiro <[EMAIL PROTECTED]> writes:
>
>> "const" is in C89/C90.
>
> Although with slightly different semantics from in C++... For
> instance:
>
> static const int n = 5;
> double a[n];
>
> is valid C++, but not valid C.
>
There are other differences as well. In C, I can't do something like:
int f(void)
{
    return 42;
}
const int i = f();
int main()
{
    return 0;
}
/David
Re: wxStyledTextCtrl - Dead?
Dave Mandelin wrote:
> I don't know the answers to 1 and 2, but from the demo I know that the
> answer to 3 is wx.stc.StyledTextControl.
>
> As for 4, I guess it depends on what you want to do. StyledTextControl
> looked pretty scary to me, and for my application I mainly needed to
> display styled text, not edit it, so I embedded a web browser window
> and used HTML+CSS (and even a little JavaScript now). That worked quite
> nicely.
>
> If you need to edit the text as well, then I don't know. The rich edit
> control (TextCtrl with style wx.TE_RICH2) is one option, but it is not
> particularly nice to use.
>
I am trying to make a programmer's editor (and later a full IDE), and I
want things like syntax highlighting etc. I could of course roll my own
fancy editing control, but if STC could solve my problem and is flexible
enough, then I'll use that for now :)
/David
