fdups: calling for beta testers
Hi all, I am looking for beta-testers for fdups. fdups is a program to detect duplicate files on locally mounted filesystems. Files are considered equal if their content is identical, regardless of their filename. Also, fdups ignores symbolic links and is able to detect and ignore hardlinks, where available. In contrast to similar programs, fdups does not rely on md5 sums or other hash functions to detect potentially identical files. Instead, it does a direct blockwise comparison and stops reading as soon as possible, thus reducing the file reads to a minimum. fdups has been developed on Linux but should run on all platforms that support Python. fdups' homepage is at http://www.homepages.lu/pu/fdups.html, where you'll also find a link to download the tar. I am primarily interested in feedback on whether it produces correct results. But as I haven't been programming in Python for a year or so, I'd also be interested in comments on the code if you happen to look at it in detail. Your help is much appreciated. -pu -- http://mail.python.org/mailman/listinfo/python-list
Re: fdups: calling for beta testers
John Machin wrote: (1) It's actually .bz2, not .bz (2) Why annoy people with the not-widely-known bzip2 format just to save a few % of a 12KB file?? (3) Typing that on Windows command line doesn't produce a useful result (4) Haven't you heard of distutils? (1) Typo, thanks for pointing it out (2)(3) In the Linux world, it is really popular. I suppose you are a Windows user, and I haven't given that much thought. The point was not to save space, just to use the "standard" format. What would it be for Windows - zip? (4) Never used them, but that's a very valid point. I will look into it. (6) You are keeping open handles for all files of a given size -- have you actually considered the possibility of an exception like this: IOError: [Errno 24] Too many open files: 'foo509' (6) Not much I can do about this. In the beginning, all files of equal size are potentially identical. I first need to read a chunk of each, and if I want to avoid opening & closing files all the time, I need them open together. What would you suggest? Once upon a time, max 20 open files was considered as generous as 640KB of memory. Looks like Bill thinks 512 (open files, that is) is about right these days. Bill also thinks it is normal that half of service pack 2 lingers twice on a harddisk. Not sure whether he's my hero ;-) (7) Why sort? What's wrong with just two lines: ! for size, file_list in self.compfiles.iteritems(): ! self.comparefiles(size, file_list) (7) I wanted the output to be sorted by file size, instead of being random. It's psychological, but if you're chasing dups, you'd want to start with the largest ones first. If you have more than a screen full of info, it's the last lines which are the most interesting. And it will produce the same info in the same order if you run it twice on the same folders. (8) global MIN_FILESIZE,MAX_ONEBUFFER,MAX_ALLBUFFERS,BLOCKSIZE,INODES That doesn't sit very well with the 'everything must be in a class' religion seemingly espoused by the following: (8) Agreed. I'll think about that. (9) Any good reason why the "executables" don't have ".py" extensions on their names? (9) Because I am lazy and Linux doesn't care. I suppose Windows does? All in all, a very poor "out-of-the-box" experience. Bear in mind that very few Windows users would have even heard of bzip2, let alone have a bzip2.exe on their machine. They wouldn't even be able to *open* the box. As I said, I did not give Windows users much thought. I will improve this. And what is "chown" -- any relation of Perl's "chomp"? chown is a Unix command to change the owner or the group of a file. It has to do with controlling access to the file. It is not relevant on Windows. No relation to Perl's chomp. Thank you very much for your feedback. Did you actually run it on your Windows box? -pu -- http://mail.python.org/mailman/listinfo/python-list
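Concretely, the sorted variant of those two lines is only slightly longer. Here is a standalone illustration with a made-up dict (in the program it would be self.compfiles and self.comparefiles):
compfiles = {4096: ['a.iso', 'b.iso'], 512: ['x.txt', 'y.txt', 'z.txt']}
for size in sorted(compfiles.keys()):     # ascending, so the largest groups come out last
    print size, compfiles[size]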
Re: fdups: calling for beta testers
John Machin wrote: Yes. Moreover, "WinZip", the most popular archive-handler, doesn't grok bzip2. I've added a zip file. It was made in Linux with the zip command-line tool, the man pages say it's compatible with the Windows zip tools. I have also added .py extensions to the 2 programs. I did not use distutils, however, because I'm not sure it is really adapted to module-less scripts. You should consider a fall-back method to be used in this case and in the case of too many files for your 1Mb (default) buffer pool. BTW 1Mb seems tiny; desktop PCs come with 512MB standard these days, and Bill does leave a bit more than 1MB available for applications. I've added it to the TODO list. The question was rhetorical. Your irony detector must be on the fritz. :-) I always find it hard to detect irony by mail with people I do not know. Did you actually run it on your Windows box? Yes, with trepidation, after carefully reading the source. It detected some highly plausible duplicates, which I haven't verified yet. I would have been reluctant too. But I've tested it intensively, and there's strictly no statement that actually alters the file system. Thanks for your feedback! -pu -- http://mail.python.org/mailman/listinfo/python-list
Re: fdups: calling for beta testers
Serge Orlov wrote: Or use exemaker, which IMHO is the best way to handle this problem. Looks good, but I do not use Windows. -pu -- http://mail.python.org/mailman/listinfo/python-list
Re: fdups: calling for beta testers
John Machin wrote: I've tested it intensively "Famous Last Words" :-) ;-) (1) Manic s/w producing lots of files all the same size: the Borland C[++] compiler produces a debug symbol file (.tds) that's always 384KB; I have 144 of these on my HD, rarely more than 1 in the same directory. Not sure what you want me to do about it. I've decreased the minimum block size once more, to accommodate more files of the same length without increasing the total amount of memory used. (2) There appears to be a flaw in your logic such that it will find duplicates only if they are in the *SAME* directory and only when there are no other directories with two or more files of the same size. Ooops... A really stupid mistake on my side. Corrected. (3) Your fdups-check gadget doesn't work on Windows; the commands module works only on Unix but is supplied with Python on all platforms. The results might just confuse a newbie: Why not use the Python filecmp module? Done. It's also faster AND it works better. Thanks for the suggestion. Please fetch the new version from http://www.homepages.lu/pu/fdups.html. -pu -- http://mail.python.org/mailman/listinfo/python-list
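For anyone who wants to double-check a reported pair of duplicates by hand, a minimal sketch using the standard filecmp module (the paths are placeholders):
import filecmp
# shallow=False forces a byte-by-byte comparison instead of only comparing os.stat() data
identical = filecmp.cmp('/path/to/file1', '/path/to/file2', shallow=False)
print identical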
os.stat('')[stat.ST_INO] on Windows
What does the above yield on Windows? Are inodes supported on Windows NTFS, FAT, FAT32? -- http://mail.python.org/mailman/listinfo/python-list
Re: Wishful thinking : unix to windows script?
John Leslie wrote: Or does anyone have a python script which takes a standard unix command as an argument and runs the pyton/windows equivalent on windows? There's not always an equivalent command. -pu -- http://mail.python.org/mailman/listinfo/python-list
Re: Wishful thinking : unix to windows script?
Grant Edwards wrote: If you install cygwin there almost always is. If you install cygwin there's no need for what the OP describes. -pu -- http://mail.python.org/mailman/listinfo/python-list
Re: Indexing strings
Fred wrote:
I am searching for a possibility to find out what the index for a
certain letter in a string is.
My example:
for x in text:
if x == ' ':
list = text[: # There I need the index of the space the
program found during the loop...
Is there any possibility to find the index of the space???
Thanks for any help!
Fred
Use the index method, e.g.: text.index(' ').
What exactly do you want to do?
-pu
--
http://mail.python.org/mailman/listinfo/python-list
Re: Indexing strings
Fred wrote: That was exactly what I was searching for. I needed a program that chopped up a string into its words and then saves them into a list. I think I got this done... There's a function for that: text.split(). You should really have a look at the Python docs. Also, http://diveintopython.org/ and http://www.gnosis.cx/TPiP/ are great tutorials. -pu -- http://mail.python.org/mailman/listinfo/python-list
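A quick interpreter session to illustrate both (the sample string is made up):
>>> text = 'chop this string into words'
>>> text.split()
['chop', 'this', 'string', 'into', 'words']
>>> text.index(' ')
4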
Re: enum question
M.N.A.Smadi wrote: does python support a C-like enum statement where one can define a variable with a prespecified range of values? thanks m.smadi
>>> BLUE, RED, GREEN = 1, 5, 8
>>> BLUE
1
>>> RED
5
>>> GREEN
8
-- http://mail.python.org/mailman/listinfo/python-list
Re: function with a state
Xah Lee wrote:
globe=0;
def myFun():
    globe=globe+1
    return globe
The short answer is to use the global statement:
globe=0
def myFun():
    global globe
    globe=globe+1
    return globe
more elegant is:
globe=0
globe=myfun(globe)
def myFun(var):
    return var+1
and still more elegant is using classes and class attributes instead of global variables. -pu -- http://mail.python.org/mailman/listinfo/python-list
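A minimal sketch of that last, class-based variant (the class name is made up):
class Counter(object):
    def __init__(self):
        self.globe = 0
    def myFun(self):
        # the state lives on the instance, no global needed
        self.globe = self.globe + 1
        return self.globe

counter = Counter()
print counter.myFun()   # 1
print counter.myFun()   # 2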
Re: function with a state
Kent Johnson wrote: globe=0 globe=myfun(globe) def myFun(var): return var+1 This mystifies me. What is myfun()? What is var intended to be? myfun is an error ;-) should be myFun, of course. var is a parameter of the function myFun. If you call myFun with the variable globe, all references to var will be replaced by globe inside the function myFun. -pu -- http://mail.python.org/mailman/listinfo/python-list
Re: Python docs [was: function with a state]
You don't understand the "global" statement in Python, but you do understand Software industry in general? Smart... -- http://mail.python.org/mailman/listinfo/python-list
Re: [perl-python] a program to delete duplicate files
I wrote something similar, have a look at http://www.homepages.lu/pu/fdups.html. -- http://mail.python.org/mailman/listinfo/python-list
Re: [perl-python] a program to delete duplicate files
Christos TZOTZIOY Georgiou wrote: On POSIX filesystems, one has also to avoid comparing files having same (st_dev, st_inum), because you know that they are the same file. I then have a bug here - I consider all files with the same inode equal, but according to what you say I need to consider the tuple (st_dev, st_ino). I'll have to fix that for 0.13. Thanks ;-) -pu -- http://mail.python.org/mailman/listinfo/python-list
Re: [perl-python] a program to delete duplicate files
Christos TZOTZIOY Georgiou wrote: That's fast and good. Nice to hear. A minor nit-pick: `fdups.py -r .` does nothing (at least on Linux). I'll look into that. Have you found any way to test if two files on NTFS are hard linked without opening them first to get a file handle? No. And even then, I wouldn't know how to find out. -pu -- http://mail.python.org/mailman/listinfo/python-list
Re: [perl-python] a program to delete duplicate files
Christos TZOTZIOY Georgiou wrote: The relevant parts from this last page: st_dev <-> dwVolumeSerialNumber st_ino <-> (nFileIndexHigh, nFileIndexLow) I see. But if I am not mistaken, that would mean that I (1) had to detect NTFS volumes (2) use non-standard libraries to find this information (like the Python Win extensions). I am not seriously motivated to do so, but if somebody is interested to help, I am open to it. -pu -- http://mail.python.org/mailman/listinfo/python-list
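For whoever wants to pick this up, an untested sketch of what it could look like with the pywin32 package (the tuple indexes follow the BY_HANDLE_FILE_INFORMATION layout and should be treated as an assumption to verify; note that it still has to open the file, which is exactly the catch mentioned above):
import win32file

def ntfs_file_id(path):
    # the volume serial number plus the file index play the role of (st_dev, st_ino)
    handle = win32file.CreateFile(
        path, win32file.GENERIC_READ, win32file.FILE_SHARE_READ,
        None, win32file.OPEN_EXISTING, 0, None)
    try:
        info = win32file.GetFileInformationByHandle(handle)
    finally:
        handle.Close()
    # info[4] = dwVolumeSerialNumber, info[8]/info[9] = nFileIndexHigh/nFileIndexLow
    return info[4], info[8], info[9]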
Re: [perl-python] a program to delete duplicate files
David Eppstein wrote: You need do no comparisons between files. Just use a sufficiently strong hash algorithm (SHA-256 maybe?) and compare the hashes. That's not very efficient. IMO, it only makes sense in network-based operations such as rsync. -pu -- http://mail.python.org/mailman/listinfo/python-list
Re: [perl-python] a program to delete duplicate files
Christos TZOTZIOY Georgiou wrote: A minor nit-pick: `fdups.py -r .` does nothing (at least on Linux). Changed. -- http://mail.python.org/mailman/listinfo/python-list
Re: [perl-python] a program to delete duplicate files
David Eppstein wrote: Well, but the spec didn't say efficiency was the primary criterion, it said minimizing the number of comparisons was. That's exactly what my program does. More seriously, the best I can think of that doesn't use a strong slow hash would be to group files by (file size, cheap hash) then compare each file in a group with a representative of each distinct file found among earlier files in the same group -- that leads to an average of about three reads per duplicated file copy: one to hash it, and two for the comparison between it and its representative (almost all of the comparisons will turn out equal but you still need to check unless you My point is: forget hashes. If you work with hashes, you do have to read each file completely, cheap hash or not. My program normally reads *at most* 100% of the files to analyse, but usually much less. Also, I do plain comparisons which are much cheaper than hash calculations. I'm assuming of course that there are too many files and/or they're too large just to keep them all in core. I assume that file handles are sufficient to keep one open per file of the same size. This led to trouble on Windows installations, but I guess that's a parameter to change. On Linux, I never had the problem. Regarding buffer size, I use a maximum which is then split up between all open files. Anyone have any data on whether reading files and SHA-256'ing them (or whatever other cryptographic hash you think is strong enough) is I/O-bound or CPU-bound? That is, is three reads with very little CPU overhead really cheaper than one read with a strong hash? It also depends on the OS. I found that my program runs much slower on Windows, probably due to the way Linux anticipates reads and tries to reduce head movement. I guess it also depends on the number of files you expect to have duplicates of. If most of the files exist in only one copy, it's clear that the cheap hash will find them more cheaply than the expensive hash. In that case you could combine the (file size, cheap hash) filtering with the expensive hash and get only two reads per copy rather than three. Sorry, but I can still not see a point to use hashes. Maybe you'll have a look at my program and tell me where a hash could be useful? It's available at http://www.homepages.lu/pu/fdups.html. Regards, -pu -- http://mail.python.org/mailman/listinfo/python-list
Re: a program to delete duplicate files
John Machin wrote: Just look at the efficiency of processing N files of the same size S, where they differ after d bytes: [If they don't differ, d = S] PU: O(Nd) reading time, O(Nd) data comparison time [Actually (N-1)d which is important for small N and large d]. Hashing method: O(NS) reading time, O(NS) hash calc time Shouldn't you add the additional comparison time that has to be done after hash calculation? Hashes do not give a 100% guarantee. If there's a large number of identical hashes, you'd still need to read all of these files to make sure. Just to explain why I appear to be a lawyer: everybody I spoke to about this program told me to use hashes, but nobody has been able to explain why. I came up with 2 possible reasons myself: 1) it's easier to program: you don't compare several files in parallel, but process them one by one. But it's not perfect and you still need to compare afterwards. In the worst case, you end up with 3 files with identical hashes, of which 2 are identical and 1 is not. In order to find this, you'd still have to program the algorithm I use, unless you say "oh well, there's a problem with the hash, go and look yourself." 2) it's probably useful if you compare files over a network and you want to reduce bandwidth. A hash lets you do that at the cost of local CPU and disk usage, which may be OK. That was not my case. -pu -- http://mail.python.org/mailman/listinfo/python-list
Re: Adapting code to multiple platforms
Jeffrey Barish wrote: I have a small program that I would like to run on multiple platforms (at least linux and windows). My program calls helper programs that are different depending on the platform. I think I figured out a way to structure my program, but I'm wondering whether my solution is good Python programming practice. I use something like this in the setup code:
if os.name == 'posix':
    statfunction = os.lstat
else:
    statfunction = os.stat
and then further in the code:
x = statfunction(filename)
So the idea is to have your "own" function names and assign the os-specific functions once and for all in the beginning. Afterwards, your code only uses your own function names and, as long as they behave in the same way, there's no more if - else stuff. -pu -- http://mail.python.org/mailman/listinfo/python-list
Re: a program to delete duplicate files
Scott David Daniels wrote: comparisons. Using hashes, three file reads and three comparisons of hash values. Without hashes, six file reads; you must read both files to do a file comparison, so three comparisons is six files. That's provided you always compare 2 files at a time. I compare n files at a time, n being the number of files of the same size. That's quicker than hashes because I have a fair chance of finding a difference before the end of the files. Otherwise, it's like hashes without the computation and without having to have a second go to *really* compare them. -pu -- http://mail.python.org/mailman/listinfo/python-list
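In code, the idea looks roughly like this (a condensed sketch, not the actual fdups source; the helper names and block size are made up):
import os

BLOCKSIZE = 8192

def find_dups(paths):
    # files of different sizes cannot be identical, so group by size first
    by_size = {}
    for path in paths:
        by_size.setdefault(os.path.getsize(path), []).append(path)
    dups = []
    for size, group in by_size.iteritems():
        if len(group) > 1:
            dups.extend(split_group([(p, open(p, 'rb')) for p in group]))
    return dups

def split_group(group):
    # 'group' holds (path, handle) pairs whose contents are equal so far;
    # read one block per file, regroup by block content, repeat until EOF
    result, pending = [], [group]
    while pending:
        blocks = {}
        for path, handle in pending.pop():
            blocks.setdefault(handle.read(BLOCKSIZE), []).append((path, handle))
        for block, subgroup in blocks.iteritems():
            if len(subgroup) < 2:                  # unique so far: not a dup
                subgroup[0][1].close()
            elif block == '':                      # EOF reached: duplicates
                result.append([p for p, h in subgroup])
                for p, h in subgroup:
                    h.close()
            else:                                  # still equal: keep reading
                pending.append(subgroup)
    return result
Reading stops for a subgroup as soon as its files diverge or become unique, which is where the savings over hashing every file completely come from.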
Re: a program to delete duplicate files
François Pinard wrote: Identical hashes for different files? The probability of this happening should be extremely small, or else, your hash function is not a good one. We're talking about md5, sha1 or similar. They are all known not to be 100% perfect. I agree it's a rare case, but still, why settle on something "about right" when you can have "right"? I once was over-cautious about relying on hashes only, without actually comparing files. A friend convinced me, doing maths, that with a good hash function, the probability of a false match was much, much smaller than the probability of my computer returning the wrong answer, despite thorough comparisons, due to some electronic glitch or cosmic ray. So, my cautious attitude was by far, for all practical means, a waste. It was not my only argument for not using hashes. My algorithm also does fewer reads, for example. -pu -- http://mail.python.org/mailman/listinfo/python-list
Re: Can't seem to insert rows into a MySQL table
grumfish wrote:
connection = MySQLdb.connect(host="localhost", user="root", passwd="pw",
db="japanese")
cursor = connection.cursor()
cursor.execute("INSERT INTO edict (kanji, kana, meaning) VALUES (%s, %s,
%s)", ("a", "b", "c") )
connection.close()
Just a guess "in the dark" (I don't use MySQL): is "commit" implicit, or
do you have to add it yourself?
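If it is not implicit, the fix would look roughly like this (a sketch, assuming MySQLdb leaves autocommit off by default):
import MySQLdb

connection = MySQLdb.connect(host="localhost", user="root", passwd="pw",
                             db="japanese")
cursor = connection.cursor()
cursor.execute("INSERT INTO edict (kanji, kana, meaning) VALUES (%s, %s, %s)",
               ("a", "b", "c"))
connection.commit()   # make the INSERT permanent before closing
connection.close()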
-pu
--
http://mail.python.org/mailman/listinfo/python-list
Re: a program to delete duplicate files
John Machin wrote: Maybe I was wrong: lawyers are noted for irritating precision. You meant to say in your own defence: "If there are *any* number (n >= 2) of identical hashes, you'd still need to *RE*-read and *compare* ...". Right, that is what I meant. 2. As others have explained, with a decent hash function, the probability of a false positive is vanishingly small. Further, nobody in their right mind [1] would contemplate automatically deleting n-1 out of a bunch of n reportedly duplicate files without further investigation. Duplicate files are usually (in the same directory with different names or in different-but-related directories with the same names) and/or (have a plausible explanation for how they were duplicated) -- the one-in-zillion-chance false-positive should stand out as implausible. Still, if you can get it 100% right automatically, why would you bother checking manually? Why get back to arguments like "impossible", "implausible", "can't be" if you can have a simple and correct answer - yes or no? Anyway, fdups does not do anything else than report duplicates. Deleting, hardlinking or anything else might be an option depending on the context in which you use fdups, but then we'd have to discuss the context. I never assumed any context, in order to keep it as universal as possible. Different subject: maximum number of files that can be open at once. I raised this issue with you because I had painful memories of having to work around max=20 years ago on MS-DOS and was aware that this magic number was copied blindly from early Unix. I did tell you that empirically I could get 509 successful opens on Win 2000 [add 3 for stdin/out/err to get a plausible number] -- this seems high enough to me compared to the likely number of files with the same size -- but you might like to consider a fall-back detection method instead of just quitting immediately if you ran out of handles. For the time being, the additional files will be ignored, and a warning is issued. fdups does not quit, why are you saying this? A fallback solution would be to open the file before every _block_ read, and close it afterwards. In my mind, it would be a command-line option, because it's difficult to determine the number of available file handles in a multitasking environment. Not difficult to implement, but I first wanted to refactor the code so that it's a proper class that can be used in other Python programs, as you also asked. That is what I have sent you tonight. It's not that I don't care about the file handle problem, it's just that I do changes by (my own) priority. You wrote at some stage in this thread that (a) this caused problems on Windows and (b) you hadn't had any such problems on Linux. Re (a): what evidence do you have? I've had the case myself on my girlfriend's XP box. It was certainly less than 500 files of the same length. Re (b): famous last words! How long would it take you to do a test and announce the margin of safety that you have? Sorry, I do not understand what you mean by this. -pu -- http://mail.python.org/mailman/listinfo/python-list
Re: a program to delete duplicate files
John Machin wrote:
Oh yeah, "the computer said so, it must be correct". Even with your
algorithm, I would be investigating cases where files were duplicates
but there was nothing in the names or paths that suggested how that
might have come about.
Of course, but it's good to know that the computer is right, isn't it?
That leaves the human to take decisions instead of double-checking.
I beg your pardon, I was wrong. Bad memory. It's the case of running
out of the minuscule buffer pool that you allocate by default where it
panics and pulls the sys.exit(1) rip-cord.
Bufferpool is a parameter, and the default values allow for 4096 files
of the same size. It's more likely to run out of file handles than out
of buffer space, don't you think?
The pythonic way is to press ahead optimistically and recover if you
get bad news.
You're right, that's what I thought about afterwards. The current idea is
to design a second class that opens/closes/reads the files and handles
the situation independently of the main class.
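Roughly what I have in mind for that second class (a sketch only; the names and block size are illustrative):
class BlockReader(object):
    # opens the file only for the duration of one block read, so an
    # arbitrary number of same-size files can be compared with few handles
    def __init__(self, path, blocksize=8192):
        self.path = path
        self.offset = 0
        self.blocksize = blocksize
    def next_block(self):
        handle = open(self.path, 'rb')
        try:
            handle.seek(self.offset)
            block = handle.read(self.blocksize)
        finally:
            handle.close()
        self.offset = self.offset + len(block)
        return block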
I didn't "ask"; I suggested. I would never suggest a
class-for-classes-sake. You already had a singleton class; why
another? What I did suggest was that you provide a callable interface
that returned clusters of duplicates [so that people could do their own
thing instead of having to parse your file output which contains a
mixture of warning & info messages and data].
That is what I have submitted to you. Are you sure that *I* am the
lawyer here?
Re (a): what evidence do you have?
See ;-)
Interesting. Less on XP than on 2000? Maybe there's a machine-wide
limit, not a per-process limit, like the old DOS max=20. What else was
running at the time?
Nothing I started manually, but the usual bunch of local firewall, virus
scanner (not doing a complete machine check at that time).
Test:
!for k in range(1000):
!open('foo' + str(k), 'w')
I'll try that.
Announce:
"I can open A files at once on box B running os C. The most files of
the same length that I have seen is D. The ratio A/D is small enough
not to worry."
I wouldn't count on that in a multi-tasking environment, as I said. The
class I described earlier seems a cleaner approach.
Regards,
-pu
--
http://mail.python.org/mailman/listinfo/python-list
Re: a program to delete duplicate files
John Machin wrote:
Test:
!for k in range(1000):
!open('foo' + str(k), 'w')
I ran that and watched it open 2 million files and keep going strong ...
until I figured out that the files are closed by Python immediately because
there's no reference to them ;-)
Here's my code:
#!/usr/bin/env python
import os
print 'max number of file handles today is',
n = 0
h = []
try:
    while True:
        filename = 'mfh' + str(n)
        h.append((file(filename, 'w'), filename))
        n = n + 1
except:
    print n
for handle, filename in h:
    handle.close()
    os.remove(filename)
On Slackware 10.1, this yields 1021.
On WinXPSP2, this yields 509.
-pu
--
http://mail.python.org/mailman/listinfo/python-list
Re: a program to delete duplicate files
David Eppstein wrote: When I've been talking about hashes, I've been assuming very strong cryptographic hashes, good enough that you can trust equal results to really be equal without having to verify by a comparison. I am not an expert in this field. All I know is that MD5 and SHA1 can create collisions. Are there stronger algorithms that do not? And, more importantly, has it been *proved* that they do not? -pu -- http://mail.python.org/mailman/listinfo/python-list
Re: a program to delete duplicate files
David Eppstein wrote: The hard part is verifying that the files that look like duplicates really are duplicates. To do so, for a group of m files that appear to be the same, requires 2(m-1) reads through the whole files if you use a comparison based method, or m reads if you use a strong hashing method. You can't hope to cut the reads off early when using comparisons, because the files won't be different. If you read them in parallel, it's _at most_ m (m is the worst case here), not 2(m-1). In my tests, it has always been significantly less than m. -pu -- http://mail.python.org/mailman/listinfo/python-list
Re: How to create an object instance from a string??
Tian wrote: I have a string: classname = "Dog" It's easier without strings:
>>> classname = Dog
>>> classname().bark()
Arf!!!
-- http://mail.python.org/mailman/listinfo/python-list
[ann] fdups 0.15
I am happy to announce version 0.15 of fdups. Changes in this version: - ability to limit the number of file handles used

Download
========
To download, go to: http://www.homepages.lu/pu/fdups.html

What is fdups?
==============
fdups is a Python program to detect duplicate files on locally mounted filesystems. Files are considered equal if their content is identical, regardless of their filename. Also, fdups is able to detect and ignore symbolic links and hard links, where available. In contrast to similar programs, fdups does not rely on md5 sums or other hash functions to detect potentially identical files. Instead, it does a direct blockwise comparison and stops reading as soon as possible, thus reducing the file reads to a minimum. fdups results can either be processed by a unix-type filter, or directly by another python program.

Warning
=======
fdups is BETA software. It is known not to produce false positives if the filesystem is static. I am looking for additional beta-testers, as well as for somebody who would be able to implement hard-link detection on NTFS file systems. All feedback is appreciated.
-- http://mail.python.org/mailman/listinfo/python-list
Re: how to add a string to the beginning of a large binary file?
could ildg wrote: I want to add a string such as "I love you" to the beginning of a binary file, How to? and how to delete the string if I want to get the original file? You shouldn't use Python to write a virus :-) -pu -- http://mail.python.org/mailman/listinfo/python-list
Re: numbering variables
remi wrote: Hello, I have got a list like: mylist = ['item 1', 'item 2', 'item n'] and I would like to store the string 'item 1' in a variable called s_1, 'item 2' in s_2, ..., 'item i' in 's_i', ... The length of mylist is finite ;-) Any ideas? Thanks a lot. Rémi. Use a dictionary: variables['s_1'] = mylist[0], variables['s_2'] = mylist[1], and so on. -- http://mail.python.org/mailman/listinfo/python-list
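A compact way to build that dictionary (the variable names are made up):
mylist = ['item 1', 'item 2', 'item n']
variables = {}
for i, item in enumerate(mylist):
    variables['s_%d' % (i + 1)] = item
print variables['s_1']   # prints 'item 1'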
Re: Which is easier? Translating from C++ or from Java...
cjl wrote: Implementations of what I'm trying to accomplish are available (open source) in C++ and in Java. Which would be easier for me to use as a reference? I'm not looking for automated tools, just trying to gather opinions on which language is easier to understand / rewrite as python. Depends on what language you know best. But Java is certainly easier to read than C++. -pu -- http://mail.python.org/mailman/listinfo/python-list
Re: Which is easier? Translating from C++ or from Java...
[EMAIL PROTECTED] wrote: Patrick Useldinger wrote: Depends on what language you know best. But Java is certainly easier to read than C++. There's certainly some irony in those last two sentences. However, I agree with the former. It depends on which you know better, the style of those who developed each and so forth. Personally, I'd prefer C++. Not really. If you know none of the languages perfectly, you are less likely to miss something in Java than in C++ (i.e. no &, * and stuff in Java). However, if you are much more familiar with one of the two, you're less likely to miss things there. -pu -- http://mail.python.org/mailman/listinfo/python-list
Re: Which is easier? Translating from C++ or from Java...
cjl wrote: I've found a third open source implementation in pascal (delphi), and was wondering how well that would translate to python? Being old enough to have programmed in UCSD Pascal on an Apple ][ (with a language card, of course), I'd say: go for Pascal! ;-) -- http://mail.python.org/mailman/listinfo/python-list
filtering DNS proxy
Hi all,
I am looking to write a filtering DNS proxy which should
- receive DNS queries
- validate them again an ACL which looks as follows:
{ 'ip1':['name1','name2',...],
'ip2':['name1','name3'],
...
}
- if the request is valid (ie. if the sending IP address is allowed to
ask for the name resolution of 'name'), pass it on to the relevant DNS server
- if not send the requestor some kind of error message.
The expected workload is not enormous. The proxy must run on Linux.
What would be the best way to approach this problem:
- implementing it in stock Python with asyncore
- implementing it in stock Python with threads
- using Twisted
- anything else?
My first impression is that I would be most comfortable with stock
Python and threads because I am not very familiar with event-driven
programming and combining the server and client part might be more
complicated to do. Twisted seems daunting to me because of the
documentation.
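To make the stock-Python-with-threads option concrete, here is a minimal sketch of what I have in mind (the ACL contents and the upstream server address are placeholders, the name parser ignores compression pointers, and the error reply simply sets RCODE 5):
import socket
import SocketServer

ACL = {'192.168.1.10': ['www.example.com', 'mail.example.com']}   # placeholder
UPSTREAM = ('192.168.1.1', 53)                                     # placeholder

def query_name(packet):
    # pull the first question name out of a raw DNS query packet
    labels, pos = [], 12                       # the DNS header is 12 bytes
    while ord(packet[pos]) != 0:
        length = ord(packet[pos])
        labels.append(packet[pos + 1:pos + 1 + length])
        pos = pos + length + 1
    return '.'.join(labels)

class DNSProxyHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        data, sock = self.request              # for UDP: (payload, socket)
        client_ip = self.client_address[0]
        if query_name(data) in ACL.get(client_ip, []):
            relay = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            relay.settimeout(5)
            relay.sendto(data, UPSTREAM)
            answer = relay.recvfrom(4096)[0]
            relay.close()
            sock.sendto(answer, self.client_address)
        else:
            # refuse: keep the query id, set the QR bit and RCODE 5 (refused)
            sock.sendto(data[:2] + '\x81\x05' + data[4:], self.client_address)

# port 5353 for testing; the real port 53 needs root privileges
server = SocketServer.ThreadingUDPServer(('', 5353), DNSProxyHandler)
server.serve_forever()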
Any suggestion would be appreciated.
Regards,
-pu
--
http://mail.python.org/mailman/listinfo/python-list
Re: [OT] why cd ripping on Linux is so slow
alf wrote: > Hi, > > I try to rip some music CD and later convert it into mp3 for my mp3 > player, but can not get around one problem. ripping from Linux is > extremely slow like 0.5x of CD speed. > > In contrary, on M$ Windows it takes like a few minutes to have the CD ripped > and compressed into wmf yet I do not know how to get pure wavs... > > Hope I find someone here helping me with that This is really OT, and you might be better off looking in Linux forums like http://www.linuxquestions.org/. That said, it's likely that your DMA is not switched on. Ask your question in the aforementioned forums, and make sure to state which distribution you are using. -pu -- http://mail.python.org/mailman/listinfo/python-list
Re: setuid root
Tiago Simões Batista wrote: > The sysadmin already set the setuid bit on the script, but it > still fails when it tries to write to any file that only root has > write access to. The setuid bit is ignored on interpreted scripts on Linux for security reasons, so that approach cannot work. Use sudo instead, e.g. give that user a sudoers entry that allows running just this script as root. -- http://mail.python.org/mailman/listinfo/python-list
Re: Python 2.5 incompatible with Fedora Core 6 - packaging problems again
http://www.serpentine.com/blog/2006/12/22/how-to-build-safe-clean-python-25-rpms-for-fedora-core-6/ -- http://mail.python.org/mailman/listinfo/python-list
