Re: [Python-il] Perl Vs. Python on Various Points

Roman Gaufman Wed, 15 Jul 2009 00:57:15 -0700

>
> 1. Syntax as an Indicative of What the Language is Doing:
> ---------------------------------------------------------
>
> He said he didn't like Perl syntax like "push @$array_ref, $val;" because
> of the sigils. I said I happen to like it because the extra characters
> convey meaning. By looking at this code I know that $array_ref is an array
> reference, that it is being treated as an array and that one appends a
> single ("scalar") value to it. On the other if I see code like this:
>
> <<<<<
> s.add(h)
>>>>>>


It's .append, and when you see it you can assume it's an array, or look at
the context. For example:

list = ['a', 'b', 'c']
hash = {}
hash['List'] = list
print hash['List']

vs

my @list = ('a', 'b', 'c');
my %hash;
$hash{'List'} = \@list;
print "@{$hash{'List'}}\n";

I find both reading and writing the python version easier - it's all
personal taste though.
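
To make the .append point concrete, here is a minimal sketch in Python 3 syntax (the thread itself uses Python 2.6). All variable names are made up for illustration. It shows that lists grow with .append while sets use .add, and that putting a list into a dict stores a reference to the same object without any explicit reference operator:

```python
# Illustrative names only; Python 3 syntax.
items = ['a', 'b', 'c']
items.append('d')          # lists grow with .append

seen = {'a', 'b'}
seen.add('c')              # sets use .add, so s.add(h) suggests a set

table = {}
table['List'] = items      # stores a reference to the same list object

# Mutating through the dict is visible through the original name.
table['List'].append('e')
print(items)
print(table['List'] is items)
```

So the method name alone already hints at the container type, which is roughly the role the sigils play in the Perl version.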


>
> 2. Comparison Operators:
> ------------------------
>
> Later on the discussion diverted to comparison operators. Now python only
> has "==" and friends for comparison (at least as far as I know) while Perl 5
> has both ==/!=/>/etc. and eq/ne/gt/etc. The first ones are intended for
> numeric comparison and the latter ones for string comparison.
>
> I argued that by looking at code with such comparisons, I can tell what kind
> of comparison the programmer intended. Part of the
> reason for the fact that Perl 5 has both types of comparison is that it
> does not have separate data types for strings and for numbers, but this is
> not the only reason.
>
> So in Python, I have:
>
> <<<<<<<<<<<<<<<
> shlomi:~$ python
> Python 2.6.2 (r262:71600, Jul 11 2009, 07:37:11)
> [GCC 4.4.0] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> 1 == 1.0
> True
>>>> "1" == "1.0"
> False
>>>>
>>>>>>>>>>>>>>>>
>
> Whereas in Perl, I have:
>
> <<<<<<<<<<<<<<<
> shlomi:~$ re.pl
> $ 1 == 1.0
> 1
> $ "1" == "1.0"
> 1
> $ "1" eq "1.0"
>
> $ 1 eq 1.0
> 1
>>>>>>>>>>>>>>>>
>
> That's not all there is to it, however. In Python:
>
> <<<<<<<<<<<<<<<<<<<<<
> shlomi:~$ python
> Python 2.6.2 (r262:71600, Jul 11 2009, 07:37:11)
> [GCC 4.4.0] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> x = [0,1,2]
>>>> y = [2,1,0]
>>>> y.reverse
> <built-in method reverse of list object at 0xb7cc444c>
>>>> y.reverse()
>>>> x
> [0, 1, 2]
>>>> y
> [0, 1, 2]
>>>> x == y
> True
>>>>
>>>>>>>>>>>>>>>>>>>>>
>
> So Python's == does a deep comparison of complex data structures and returned
> that x and y were equivalent despite the fact that they aren't the same
> physical reference.
>
> In Perl, however:
>
> <<<<<<<<<<
> shlomi:~$ re.pl
> $ [0,1,2] eq [0,1,2]
>
> $ ([0,1,2] eq [0,1,2]) ? "True" : "False"
> False
> $ ([0,1,2] == [0,1,2]) ? "True" : "False"
> False
> $
>>>>>>>>>>>
>
> (You shouldn't use == for comparing references in Perl 5 - it's just for the
> sake of the demonstration.)
>
> Perl did a shallow comparison of the references and returned a false because
> they weren't the same reference.
>
> I should note that in Perl comparison is not necessarily O(1) because if I
> have two very long strings, then comparing them may be O(N) where N is the
> length of the strings.
>
> For deep comparison we have CPAN modules like
> http://search.cpan.org/dist/Test-Differences/ , or can use the more limited
> is_deeply() functionality of Test::More.
>
> I personally feel that it's impossible to have "one-comparison-fits-all"
> because for two pieces of data, there may be several ways that we would like
> to compare them.

If you want to know if it's the same reference, use "is":

[1,2,3] is [1,2,3]
False

From the English definition of equals, I like the way == works, and I
like not having different operators for different types or having to
use additional modules. It's a time saver for me, but I guess it's a
matter of preference.
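
A short sketch of the difference between == and "is", again in Python 3 syntax with illustrative variable names. It also shows that Python does no implicit string-to-number conversion, so the kind of comparison follows from the operand types and any conversion has to be written out:

```python
x = [1, 2, 3]
y = [1, 2, 3]
z = x                      # a second name for the same list object

same_value = (x == y)      # compares contents element by element
same_object = (x is y)     # compares identity (two separate lists here)
alias = (z is x)           # both names refer to one object

# No implicit conversion between strings and numbers:
str_vs_num = ("1" == 1)
# An explicit conversion makes the intended numeric comparison visible:
numeric = (float("1") == float("1.0"))

print(same_value, same_object, alias, str_vs_num, numeric)
```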

>
> 3. Circular References:
> -----------------------
>
> After the discussion on comparison, the conversation diverted to discussing
> circular references. My partner for the conversation was surprised to learn
> that Python has them:
>
> <<<<<<<<<<<<<<<<<<<<<<
> shlomi:~$ python
> Python 2.6.2 (r262:71600, Jul 11 2009, 07:37:11)
> [GCC 4.4.0] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> a = [0,1,2]
>>>> a[0] = a
>>>> a
> [[...], 1, 2]
>>>> a[0][1]
> 1
>>>> a[0][0][1]
> 1
>>>> a[0][0][0][1]
> 1
>>>> a[0]
> [[...], 1, 2]
>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>
> Now what happens if we try to compare two equivalent circular data structures:
>
> <<<<<<<<<<<<<<<<<<<<<<
> shlomi:~$ python
> Python 2.6.2 (r262:71600, Jul 11 2009, 07:37:11)
> [GCC 4.4.0] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> x = [0,1,2]
>>>> x[0] = x
>>>> y = [0,1,2]
>>>> y[0] = y
>>>> x
> [[...], 1, 2]
>>>> y
> [[...], 1, 2]
>>>> x == y
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> RuntimeError: maximum recursion depth exceeded in cmp
>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>
> So CPython is not very smart about it, and throws an ugly exception.
>
> Obviously x[0] = x data structures are neither too common nor useful, but
> circular references are useful for such data structures and OO patterns
> as trees with parent pointers, doubly-linked lists or graphs.
>
> Now, I also remembered that circular references with a reference count-based
> garbage collector (GC) were a common cause of memory leaks, so I decided to
> see if it still existed in Python. I wrote the following program:
>
> <<<<<<<<<<<<<<<<<<<<<<
> import random
>
> def gen_rand_string():
>    ret = ""
>    for x in range(0,100):
>        ret += str(random.randint(0,1000000))
>    return ret
>
> def leak():
>    a = [0,gen_rand_string(), 24]
>    a[0] = a
>    return a
>
> random.seed(24)
>
> while(1):
>    print leak()[2]
>>>>>>>>>>>>>>>>>>>>>>>
>
> Running it made it remain at 0.1% of memory for a long time. I asked
> the people on Freenode #python about it and they told me that "python has had
> a cycle collector since 2.0" and I was told that this cycle collector does
> not make object destruction unpredictable.
>
> perl 5 still doesn't have something like that and when I consulted #perl
> about it they said that http://xrl.us/be233e is a start towards a cycle
> collection for objects.

I've never used circular references, thanks for showing me something
new - I'll read more about them.
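
For anyone else who wants to see the cycle collector at work, here is a minimal sketch. It assumes CPython and uses Python 3 syntax; the Node class and the make_cycle helper are hypothetical names invented for this example. A weak reference lets us check whether an object is still alive without keeping it alive ourselves:

```python
import gc
import weakref


class Node:
    """A tree node with a parent pointer, which creates a reference cycle."""

    def __init__(self):
        self.parent = None
        self.children = []


def make_cycle():
    parent = Node()
    child = Node()
    parent.children.append(child)   # parent -> child
    child.parent = parent           # child -> parent (closes the cycle)
    return weakref.ref(parent)      # weak reference does not keep it alive


gc.disable()                        # switch off automatic cycle collection
ref = make_cycle()
# Reference counting alone cannot free the pair: each holds the other.
alive_before = ref() is not None

gc.collect()                        # run the cycle collector by hand
alive_after = ref() is not None
gc.enable()

print(alive_before, alive_after)
```

With plain reference counting the two nodes would stay in memory forever, which is the leak described above; the collector is what reclaims them.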

>
> 4. Hiding Code By Using .pyc's
> ------------------------------
>
> The python backend compiles the text-based Python code to bytecode, and
> caches the result in .pyc files. My partner to the conversation argued that
> he often uses these .pyc files to "hide" the source code from the people he
> distributes it to, so they will be unable to reverse engineer it.

I never said this. I said you can reverse engineer just about anything
and I don't agree with trying to hide code, but compiled bytecode is
enough to shut my employer up about source hiding.

>
> I told him that with a symbolic language such as Python, such .pyc files
> do not provide adequate "protection" as they contain the names of identifiers
> and other detailed information on the code. At one point, he even thought
> that they are compiled C code, but I told him CPython can work pretty well
> on machines without any kind of C compiler.

Jython, for example, creates compiled bytecode very similar to Java's.
CPython also creates bytecode that looks somewhat similar to compiled
C code, the only differences being that maybe more identifiers
are visible (in both the Java and CPython bytecode) and that the
code is compiled for the Python virtual machine, using instructions
supported by Python rather than those supported by your architecture
and CPU.

>
> On #python, people seemed to agree with me (I am rindolf):
>
> <<<<<<<<<<<<<<<<<<<<<<
> Jul 12 20:33:13 <rindolf> Another question: can I depend on compiled python bytecode (.pyc) for "hiding" code? Doesn't it still contain all the identifiers verbatim?
> Jul 12 20:33:28 <lvh> rindolf: No. You cannot hide code, stop trying.
> Jul 12 20:33:34 <kniht> rindolf: why are you trying to hide code?
> Jul 12 20:35:53 <DeadPanda> rindolf, check the new IEEE Security and Privacy (if you can), you can't hide Python code
> Jul 12 20:38:21 <DeadPanda> rindolf, unfortunately, there's nothing you can do. Otoh, you can probably do the bare minimum and make your boss happy.
> Jul 12 20:38:55 <kniht> rindolf: because these final users are liars and cheats? if that's the business plan, not sure what I can say
>>>>>>>>>>>>>>>>>>>>>>>
>
> Python knows the identifiers of the variables at run-time. For example:
>
> <<<<<<<<<<<<<<<<<<<<<<
> shlomi:~$ cat exec-test.py
> #!/usr/bin/env python
>
> import sys
>
> a = "I am a"
> b = "I'm b"
>
> exec(sys.stdin.readline())
> shlomi:~$ python exec-test.py
> print a
> I am a
> shlomi:~$ python exec-test.py
> print b
> I'm b
> shlomi:~$
>>>>>>>>>>>>>>>>>>>>>>>
>
> Even if you're not using exec(), eval() or friends, python still has to
> accommodate them being potentially used and as a result keeps this
> information in the bytecode. My partner said he doesn't use eval and friends
> because they are "a bad programming practice" and as a result thought he
> was safe. However, that's not the case.

Of course they agree with you, I agree with you too - however in a
business environment the technical argument is not always the one that
wins.

>
> He told his clients that Python bytecode was only marginally worse than
> Java and .NET bytecode which "are used to protect the code of highly sensitive
> IDF and US military applications - bytecode is sufficient protection."
> However, according to:

No I didn't. I gave YOU examples of IDF and US military applications
being written in Java and C# and them thinking that bytecode is
"sufficient protection" to hide code - which they do. I also agreed
with you that it isn't sufficient protection and you can reverse
engineer just about anything.

I didn't tell my "clients" anything - my employer said you can't use
Perl because it doesn't have compiled bytecode, I asked what about
Python, which does - they came back to me saying to use it.

>
> http://developers.slashdot.org/article.pl?sid=05/06/28/2319213&tid=108
>
> "java is [a] cake to reverse engineer".
>
> So it's not adequate protection, and Python is even substantially less than
> that.

Why is it substantially less than that? -- Have you compared bytecode
generated by CPython, Jython and Java?

>
> I next suggested he may opt to use obfuscators to obfuscate his code, and he
> said that:
>
> <<<<<<<<<<<<<<<<<<<<<<
> They can't sue me, they'll have to sue whoever reverse engineered the code - I
> believe it's illegal to reverse engineer commercial compiled bytecode - it
> isn't however to reverse engineer obfuscated code far as I know -- it's not a
> technical issue, it's a legal/business one.
>>>>>>>>>>>>>>>>>>>>>>>

What? - you are quoting me out of context again. You said to me
something along the lines of good luck being sued after your compiled
bytecode is reverse engineered.

>
> I don't understand the distinction between obfuscated code and bytecode
> in this case. And like I told him "Any sufficiently advanced obfuscation is
> indistinguishable from bytecode.".

Compiling to bytecode generates machine code just like compiling C code
does, except that the machine code isn't for your architecture and CPU
but for a virtual machine such as Python's, Java's or .NET's. It's a
little more than the fig-leaf protection you describe, but it is
possible to reverse engineer, sure.

>
> Talking with a different Python programmer, he told me that .pyc were
> considered adequate "protection" for them, despite the fact there are
> several .pyc disassemblers and decompilers present.

There are disassemblers and decompilers for just about all compiled
bytecode out there - they show you the virtual machine instructions
used and some reference identifiers. There are also disassemblers and
decompilers for compiled C code, that also show you the machine
instructions being used.
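
As a rough illustration of what such a disassembler shows, here is a sketch using the standard library's dis module, in Python 3 syntax. The check_license function and its "hunter2" string are invented for this example and are not from anyone's real code. It compiles a small function and inspects the resulting code object, the same kind of object that a .pyc file stores:

```python
import dis

# Hypothetical source that someone might want to "hide".
source = '''
def check_license(key):
    secret = "hunter2"
    return key == secret
'''

module_code = compile(source, "<example>", "exec")

# The function body is a nested code object among the module's constants.
func_code = next(c for c in module_code.co_consts if hasattr(c, "co_varnames"))

print(func_code.co_name)       # the function name
print(func_code.co_varnames)   # argument and local variable names
print(func_code.co_consts)     # literal constants, including the string

# Human-readable listing of the virtual machine instructions.
dis.dis(func_code)
```

The listing is virtual machine instructions rather than source text, but the function name, the variable names and the string literal are all there to read.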

>
> In short, depending on .pyc's for protecting your source code, only provides
> fig-leaf protection.

pyc is compiled bytecode - so while you can reverse engineer compiled
bytecode (as you can reverse engineer just about everything) -- you
don't have nesting depths, scope, type - all you have is numeric
codes, numeric addresses and some visible references. It's a little
more than fig-leaf protection.

>
> ------------------------------
>
> In short, I enjoyed this discussion and learned some new things about Python.
> Another thing I was happy to find out was that often my intuition and
> understanding were more correct than the knowledge of someone who's been
> programming Python intensively for 3 years.

Good for you.

>
> Regards,
>
>    Shlomi Fish
>
>
> --
> -----------------------------------------------------------------
> Shlomi Fish       http://www.shlomifish.org/
> Original Riddles - http://www.shlomifish.org/puzzles/
>
> God gave us two eyes and ten fingers so we will type five times as much as we
> read.
>
_______________________________________________
Python-il mailing list
Python-il@hamakor.org.il
http://hamakor.org.il/cgi-bin/mailman/listinfo/python-il
