urllib2 performance on windows, usb connection

2009-02-05 Thread dq
I've googled this pretty extensively and can't find anyone who's had the 
same problem, so here it is:


I wrote a console program in python to download podcasts, so speed is an 
issue.  I have 1.6 M down.  The key bit of downloading code is this:


source = urllib2.urlopen( url )
target = open( filename, 'wb' )
target.write( source.read() )

This runs great on Ubuntu.  I get DL speeds of about 1.5 Mb/s on the 
SATA HD or on a USB-connected iPod, but if I run the same program on 
Windows (with a 2 GHz Core 2 Duo and a 7200 rpm SATA drive, i.e. better 
hardware specs than the Ubuntu box), it maxes out at about 500 kb/s.  
Worse, if I DL directly to my iPod in disk mode, I'm lucky if I even hit 
100 kb/s.


So does anyone know what the deal is with this?  Why is the same code so 
much slower on Windows?  Hope someone can tell me before a holy war 
erupts :-)


--danny
--
http://mail.python.org/mailman/listinfo/python-list


Re: urllib2 performance on windows, usb connection

2009-02-06 Thread dq

Martin v. Löwis wrote:

>> So does anyone know what the deal is with this?  Why is the same code so
>> much slower on Windows?  Hope someone can tell me before a holy war
>> erupts :-)
>
> Only the holy war can give an answer here. It certainly has *nothing* to
> do with Python; Python calls the operating system functions to read from
> the network and write to the disk almost directly. So it must be the
> operating system itself that slows it down.
>
> To investigate further, you might drop the write operation, and measure
> only source.read(). If that is slower, then, for some reason, the
> network speed is bad on Windows. Maybe you have the network interfaces
> misconfigured? Maybe you are using wireless on Windows, but cable on
> Linux? Maybe you have some network filtering software running on
> Windows? Maybe it's just that Windows sucks?-)
>
> If the network read speed is fine, but writing slows down, I ask the
> same questions. Perhaps you have some virus scanner installed that
> filters all write operations? Maybe Windows sucks?
>
> Regards,
> Martin



Thanks for the ideas, Martin.  I ran a couple of experiments to find the 
culprit, by downloading the same 20 MB file from the same fast server. 
I compared:


1.  DL to HD vs. USB iPod
2.  AV on-access protection on vs. off
3.  "source.read()" only vs. "file.write( source.read() )"
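[Editor's sketch] The third comparison, reading only versus reading and writing, can be made more precise by timing the two phases separately. The code below is a minimal sketch and not the code used in the thread. It assumes Python 3 (the thread itself uses Python 2 and urllib2), the function name time_read_and_write is made up for illustration, and in-memory streams stand in for the network source and the iPod file so that it runs anywhere.

```python
import io
import time


def time_read_and_write(source, target):
    """Time source.read() and target.write() separately.

    Returns (read_seconds, write_seconds, byte_count).
    """
    t0 = time.perf_counter()
    data = source.read()      # source side only (the network, in the thread)
    t1 = time.perf_counter()
    target.write(data)        # target side only (HD or iPod, in the thread)
    target.flush()            # empties Python's buffer; the OS may still cache
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, len(data)


if __name__ == "__main__":
    # Stand-ins: replace with a real response object and an open file.
    source = io.BytesIO(b"x" * (1024 * 1024))
    target = io.BytesIO()
    read_s, write_s, nbytes = time_read_and_write(source, target)
    print("read %.4f s, write %.4f s, %d bytes" % (read_s, write_s, nbytes))
```

If the write phase dominates only when the target is on the iPod, the device path is the bottleneck rather than the network.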

The culprit is definitely the write speed on the iPod.  That is, 
everything runs plenty fast (~1 MB/s down) as long as I'm not writing 
directly to the iPod.  This is kind of odd, because if I copy the file 
over from the HD to the iPod using Windows (drag-n-drop), it takes about 
a second or two, so about 10 MB/s.


So the problem is definitely partially Windows, but it also seems that 
Python's file.write() function is not without blame.  It's the 
combination of Windows, iPod and Python's data stream that is slowing me 
down.


I'm not really sure what I can do about this.  I'll experiment a little 
more and see if there's any way around this bottleneck.  If anyone has 
run into a problem like this, I'd love to hear about it...


thanks again,
--danny


Re: urllib2 performance on windows, usb connection

2009-02-06 Thread dq

MRAB wrote:

> You could try copying the file to the iPod using the command line, or
> copying data from disk to iPod in, say, C, anything but Python. This
> would allow you to identify whether Python itself has anything to do
> with it.


Well, I think I've partially identified the problem.  target.write( 
source.read() ) runs perfectly fast, copies 20 megs in about a second, 
from HD to iPod.  However, if I run the same code in a while loop, using 
a certain block size, say target.write( source.read(4096) ), it takes 
forever (or at least I'm still timing it while I write this post).


The mismatch seems to be between urllib2's block size and the write 
speed of the iPod.  I might try to tweak this a little in the code and 
see if it has any effect.


Oh, there we go:   20 megs in 135.8 seconds.  Yeah... I might want to 
try to improve that...
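
[Editor's sketch] The timing above can be turned into a per-write figure with a little arithmetic. This is only a back-of-the-envelope calculation, assuming that "20 megs" means 20 MiB and that every write was a full 4096-byte block. Neither assumption is confirmed in the thread.

```python
# Assumption: "20 megs" is 20 MiB; the thread does not give the exact size.
total_bytes = 20 * 1024 * 1024
blocksize = 4096          # block size used in the slow loop
elapsed_s = 135.8         # time reported in the post

n_writes = total_bytes // blocksize
per_write_ms = elapsed_s / n_writes * 1000.0
throughput_kib_s = total_bytes / elapsed_s / 1024.0

print("%d writes, %.1f ms per write, %.0f KiB/s"
      % (n_writes, per_write_ms, throughput_kib_s))
```

Under those assumptions this works out to 5120 writes at roughly 26.5 ms each, or about 151 KiB/s. A cost of that size per write would be consistent with a fixed overhead on each small write to the device, although the thread does not measure that directly.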



Re: urllib2 performance on windows, usb connection

2009-02-06 Thread dq

dq wrote:

> Oh, there we go:   20 megs in 135.8 seconds.  Yeah... I might want to
> try to improve that...


After some tweaking of the block size, I managed to get the DL speed up 
to about 900 kb/s.  It's still not quite Ubuntu, but it's a good order 
of magnitude better.  The new DL code is pretty much this:


"""
blocksize = 2 ** 16  # plus or minus a power of 2
source = urllib2.urlopen( 'url://string' )
target = open( pathname, 'wb')
fullsize = float( source.info()['Content-Length'] )
DLd = 0
while DLd < fullsize:
    DLd = DLd + blocksize
    # optional:  write some DL progress info
    # somewhere, e.g. stdout
target.close()
source.close()
"""









Re: urllib2 performance on windows, usb connection

2009-02-06 Thread dq

MRAB wrote:

> How long does it take to transfer 4KB? If it can transfer 1MB/s then I'd
> say that 4KB is too small. Generally speaking, the higher the data rate,
> the larger the blocks you should be transferring at a time, IMHO.
>
> You could write a script to test the transfer speed with different block
> sizes.


Thanks MRAB, 32 or 64 KB seems to be quickest, but I'll do a more 
scientific test soon and see what turns up.
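
[Editor's sketch] A block-size test along the lines MRAB suggests might look like the following. It is a sketch under assumptions, not the script used in the thread: it assumes Python 3, the function name time_block_writes is made up, and it writes to a temporary file, so the path would need to point at the iPod volume to reproduce the case discussed here. The os.fsync() call asks the OS to push the data to the device, so that caching does not hide the cost.

```python
import os
import tempfile
import time


def time_block_writes(path, total_bytes, blocksize, buffering=-1):
    """Write total_bytes to path in blocksize chunks; return elapsed seconds.

    buffering=-1 uses Python's default buffer, which may merge small writes;
    buffering=0 sends each write straight to the OS.
    """
    block = b"\0" * blocksize
    written = 0
    start = time.perf_counter()
    with open(path, "wb", buffering) as target:
        while written < total_bytes:
            chunk = block[: total_bytes - written]  # last chunk may be short
            written += target.write(chunk)          # count bytes accepted
        target.flush()
        os.fsync(target.fileno())                   # force data to the device
    return time.perf_counter() - start


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()   # hypothetical target; use the iPod path
    os.close(fd)
    try:
        total = 4 * 1024 * 1024
        for exp in range(12, 19):   # 4 KiB up to 256 KiB
            bs = 2 ** exp
            secs = time_block_writes(path, total, bs, buffering=0)
            print("%7d bytes/block: %.3f s" % (bs, secs))
    finally:
        os.remove(path)
```

Running each size several times and taking the best result would reduce noise from other disk activity.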



Re: urllib2 performance on windows, usb connection

2009-02-06 Thread dq

MRAB wrote:

> I'd like to suggest that the block size you add to 'DLd' be the
> actual size of the returned block, just in case the read() doesn't
> return all you asked for (it might not be guaranteed, and the chances
> are that the final block will be shorter, unless 'fullsize' happens
> to be a multiple of 'blocksize').
>
> If less is returned by read() then the while-loop might finish before
> all the data has been downloaded, and if you just add 'blocksize'
> each time it might end up > 'fullsize', i.e. apparently >100%
> downloaded!


Interesting.  I'll check to see if any of the downloaded files end 
prematurely :)


btw, I forgot the most important line of the code!

"""
blocksize = 2 ** 16  # plus or minus a power of 2
source = urllib2.urlopen( 'url://string' )
target = open( pathname, 'wb')
fullsize = float( source.info()['Content-Length'] )
DLd = 0
while DLd < fullsize:
    #  +++
    target.write( source.read( blocksize ) )  # +++
    #  +++
    DLd = DLd + blocksize
    # optional:  write some DL progress info
    # somewhere, e.g. stdout
target.close()
source.close()
"""

Using that, I'm not quite sure where I can grab onto the value of how 
much was actually re
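
[Editor's sketch] One way to get at the number of bytes actually read is to keep the returned block in a variable and add its length to the running total, as MRAB suggests. The code below is a sketch and not the poster's final code. It assumes Python 3, and the names copy_in_blocks and report are made up for illustration. In the thread's setting, source would be the object returned by urllib2.urlopen() and fullsize would come from the Content-Length header.

```python
import io


def copy_in_blocks(source, target, blocksize=2 ** 16, fullsize=None,
                   report=None):
    """Copy source to target in blocks; return the bytes actually copied."""
    copied = 0
    while True:
        block = source.read(blocksize)
        if not block:            # an empty result means end of stream
            break
        target.write(block)
        copied += len(block)     # count what read() really returned
        if report is not None:
            report(copied, fullsize)   # optional progress callback
    return copied


if __name__ == "__main__":
    # Stand-ins so the sketch runs without a network connection.
    source = io.BytesIO(b"a" * 150000)
    target = io.BytesIO()
    total = copy_in_blocks(source, target, fullsize=150000)
    print("copied %d bytes" % total)
```

Looping until read() returns an empty block also removes the dependence on Content-Length, which a server is not required to send.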

Re: urllib2 performance on windows, usb connection

2009-02-07 Thread dq

MRAB wrote:

> dq wrote:
>> Using that, I'm not quite sure where I can grab onto the value of ho