Re: [Tutor] self.name is calling the __set__ method of another class

2019-04-30 Thread David L Neil

Hi Arup,


On 30/04/19 5:55 AM, Arup Rakshit wrote:

class NonBlank:
    def __init__(self, storage_name):
        self.storage_name = storage_name

    def __set__(self, instance, value):
        if not isinstance(value, str):
            raise TypeError("%r must be of type 'str'" % self.storage_name)
        elif len(value) == 0:
            raise ValueError("%r must not be empty" % self.storage_name)
        instance.__dict__[self.storage_name] = value


class Customer:
    name = NonBlank('name')
    email = NonBlank('email')

    def __init__(self, name, email, fidelity=0):
        self.name = name
        self.email = email
        self.fidelity = fidelity

    def full_email(self):
        return '{0} <{1}>'.format(self.name, self.email)


if __name__ == '__main__':
    cus = Customer('Arup', 99)

Running this code throws an error:

Traceback (most recent call last):
  File "/Users/aruprakshit/python_playground/pycon2017/decorators_and_descriptors_decoded/customer.py", line 25, in <module>
    cus = Customer('Arup', 99)
  File "/Users/aruprakshit/python_playground/pycon2017/decorators_and_descriptors_decoded/customer.py", line 18, in __init__
    self.email = email
  File "/Users/aruprakshit/python_playground/pycon2017/decorators_and_descriptors_decoded/customer.py", line 7, in __set__
    raise TypeError("%r must be of type 'str'" % self.storage_name)
TypeError: 'email' must be of type 'str'
Process terminated with an exit code of 1

Now I am not getting how the __set__() method from NonBlank is being called
inside the __init__() method. It looks like some magic is going on under the
hood. Can anyone please explain how the self.name and self.email assignments
call __set__ from NonBlank? What is the name of this concept?



Use the tools provided - follow the Traceback and interpret each step:-

>  cus = Customer('Arup', 99)

means: instantiate a Customer object, which takes us to

>  def __init__(self, name, email, fidelity=0):

where:
- name is set to a string: 'Arup'
- email is set to an *integer*: 99, and
- fidelity is set to another integer, with a default value of 0
(in the first instance)

Ignoring name, we arrive at

>  self.email = email

which *appears to be* the creation of an integer(!) within the cus 
Customer instance.


However (the "magic"): when the module was loaded into the Python 
interpreter, the class attribute email had already been defined as:


>  email = NonBlank('email')

which means that:

>  def __init__(self, storage_name):
>  self.storage_name = storage_name

made it (past tense!) an instance of NonBlank with a storage_name of 
'email' (and, importantly, with a __set__ method).


So, returning to the Trace, specifically:

>File 
"/Users/aruprakshit/python_playground/pycon2017/decorators_and_descriptors_decoded/customer.py", 
line 18, in __init__

>  self.email = email

what now happens is that the assignment finds the NonBlank object stored 
as Customer.email, hands it the value passed-in as email (ie 99), and 
invokes the method:


>  def __set__(self, instance, value):

In due course, we find that 99 is not an acceptable value:

>  if not isinstance(value, str):
>  raise TypeError("%r must be of type 'str'" % self.storage_name)


and thus:

> TypeError: 'email' must be of type 'str'
> Process terminated with an exit code of 1

Crash!
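
Incidentally, the name of the concept you asked about is the "descriptor 
protocol": because NonBlank defines __set__, it is a (data) descriptor, and 
any assignment to a same-named attribute of a Customer instance is routed 
through it. As a sketch of what the assignment effectively does (an 
illustration only, not the interpreter's actual code):

# build a bare instance, skipping __init__, purely for demonstration
cus = object.__new__(Customer)
descriptor = type(cus).__dict__['email']   # the NonBlank class attribute
descriptor.__set__(cus, 99)                # raises TypeError: not a str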


Of course it is a 'toy example' - when you could plug two 'is it a 
string?' checks straight into Customer, why not keep-it-simple and do 
just that, without piling abstraction on top of abstraction?
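
For instance, a minimal sketch of that keep-it-simple version (one 
possible shape, reusing the names from the original):

class Customer:
    def __init__(self, name, email, fidelity=0):
        # inline the two 'is it a usable string?' checks
        for field, value in (('name', name), ('email', email)):
            if not isinstance(value, str):
                raise TypeError("%r must be of type 'str'" % field)
            if len(value) == 0:
                raise ValueError("%r must not be empty" % field)
        self.name = name
        self.email = email
        self.fidelity = fidelity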


However, the author is illustrating a useful tool - should you find a 
situation where the 'checks' are much more involved or complex.



(NB in addition to, not an alternative to, the discussions Steven has 
offered)


Given previous conversations, I'm not surprised that you were mystified. 
The fact that I had to read it twice, and that the above explanation is 
NOT a 'straight line', indicates that there is probably a better 
(simpler) approach - and one which is MUCH more likely to be understood 
by our Python programming colleagues (possibly including our 'future 
selves'!)


As Steven explained, this is a complex environment where only those with 
a good understanding of the meta-abstractions would even want to play 
(IMHO). Perhaps you would be better served by actually writing some 
Python applications and, with such experience under your belt, adding 
these 'advanced knowledge' ideas at some later time, if/when needed?


Assuming use of a recent version of Python, you may like to solve this 
specific problem the same way you might in other programming languages:


<<<
typing — Support for type hints

New in version 3.5.

Note
The typing module has been included in the standard library on a 
provisional basis. New features might be added and API may change even 
between minor releases if deemed necessary by the core developers.

This module supports type hints as specified by PEP 484.
>>>
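
Hints alone are not enforced at runtime, but a static checker such as 
mypy catches the int-where-a-str-is-expected mistake before the program 
runs (though not the 'non-blank' rule). A sketch:

class Customer:
    def __init__(self, name: str, email: str, fidelity: int = 0) -> None:
        self.name = name
        self.email = email
        self.fidelity = fidelity

cus = Customer('Arup', 99)   # a type checker flags this call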

Re: [Tutor] feedparser in python

2019-04-30 Thread Alan Gauld via Tutor
On 30/04/2019 00:23, nathan tech wrote:

> The results were as follows:
> 
>      tim(a url): 2.9 seconds
> 
>      tim(the downloaded file): 1.8 seconds
> 
> 
> That tells me that roughly 1.1 seconds is network related, fair enough.

Or about 30% of the time.
Since the network element will increase with data size, as will the
parse time, it may be a near-linear relationship. Only more extensive
tests would tell.

> entire thing again, they all say use ETAG and Modified, but my feeds 
> never have them.
> 
> I've tried feeds from several sources, and none have them in the http 
> header.

Have you looked at the headers to see what they do have?
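
For example (a sketch using Python 3's urllib.request - the urllib2 
equivalent is near-identical, and the URL is just the example from this 
thread):

import urllib.request

url = "https://www.bigfinish.com/podcasts.rss"
with urllib.request.urlopen(url) as response:
    # dump whatever headers this server actually sends
    for name, value in response.getheaders():
        print(name, ":", value)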

> To that end, that is why I mentioned in the previous email about .date, 
> because that seemed the most likely, but even that failed.

Again you tell us that something failed. But don't say
how it failed. Do you mean that date did not exist?
Why did you think it would if you had already inspected
the headers?

Can you share some actual code that you used to check
these fields? And show us the actual headers you are
reading?

> 1. download a feed to the computer.
> 
> 2. Occasionally, check the website to see if the downloaded feed is out
> of date; if it is, redownload it.

Seems a good plan. You just need to identify when changes occur.

Even better would be if the sites provided a web API to access
the data programmatically, but of course few sites do that...


> I did think about using threading for this, for example:

> user sees downloaded feed data only, in the background, the program 
> checks for updates on each feed, and the user may see them gradually 
> start to update.
> 
> This would work, in that execution would not fail at any time, but it 
> seems... clunky, to me I suppose? And rather data heavy for the end 
> user, especially if, as you suggest, a feed is 10 MB in size.

Only data heavy if you download everything. If you only fetch the
headers and you have relatively few feeds, it's a good scheme.

As an alternative is there anything in the feed body that identifies
its creation date? Could you change your parsing mechanism to
parse the data as it arrives and stop if the date/time has not
changed? That minimises the download data.
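
As a rough sketch of that early-stop idea (Python 3 here; the tag name
and chunk size are assumptions about what a given feed provides):

import urllib.request

def feed_build_date(url, chunk_size=4096):
    # Read only the first chunk of the feed and look for a date tag,
    # rather than downloading the whole document.
    with urllib.request.urlopen(url) as response:
        head = response.read(chunk_size).decode("utf-8", "replace")
    start = head.find("<lastBuildDate>")
    if start == -1:
        return None                  # tag absent, or beyond this chunk
    end = head.find("</lastBuildDate>", start)
    if end == -1:
        return None
    return head[start + len("<lastBuildDate>"):end]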

> Further to that, how many threads is it safe to run?

You have a lot of I/O going on so you could run quite a few threads
without blocking issues. How many feeds do you watch? Logic
would say have one thread per feed.

But how real-time does this really need to be? Would it be
terrible if updates were, say, 1 minute late? If that's the case
a single-threaded solution may be fine (and much simpler).
I'd certainly focus on a single-threaded solution initially. Get it
working first, then think about performance tuning.
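
A single-threaded version might be little more than a polling loop, e.g.
(check_feed() here is a hypothetical placeholder for your
compare-and-redownload logic):

import time

def check_feed(url):
    # hypothetical: compare the stored copy with the server's version
    # and re-download only when it has changed
    pass

FEEDS = ["https://www.bigfinish.com/podcasts.rss"]   # example feed list

while True:
    for url in FEEDS:
        check_feed(url)
    time.sleep(60)   # updates arrive at most ~1 minute late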


-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


[Tutor] Fwd: Re: feedparser in python

2019-04-30 Thread Alan Gauld via Tutor
Sharing with the list, comments later. Busy right now.



 Forwarded Message 
Subject:Re: [Tutor] feedparser in python
Date:   Tue, 30 Apr 2019 14:14:35 +
From:   nathan tech 
To: Alan Gauld 



Hi Alan,

Thanks for your emails.

I considered what you said, and came up with a couple of possibilities,
listed below.

Before that, I wanted to clarify what I meant when I said "not working."
I kept meaning to do it, and kept forgetting.

According to the docs:

    f = feedparser.parse(url)

Will download a feed, parse it into xml, and return a dict of that feed,
which it does. There are obviously some special things going on with
that, because it allows, for instance, f.entries[0].title rather than
f["entries"][0]["title"].

Anyway.

The docs then say that feedparser will have elements of etag and
modified, which you can then pass in an update, like so:

    newfeed = feedparser.parse(url, etag=f.etag, modified=f.modified)

To that end, it would check the headers, and if the feed was not
updated, set newfeed.status to 304.


Which is great, except... my feeds never have a .etag or a .modified
anywhere.

Even f.get("etag") returns None; and while I could pass it that way, it
would mean the feed gets downloaded over and over and over again.

For an example rss feed of size 10 MB, that's 240 MB a day, and by
3 days you're over a gig.


To that end, when I said not working, I meant: nothing I passed in place
of f.etag and/or f.modified seemed to work, in that it just downloaded
the entire feed again.


Now, onto some solutions:

I considered what you said and realised that, actually, logic says all
we need to know is: is the file on the local hard drive older than the
file on the web server, right?

Which led me briefly to: would os.path.getmtime() work? Probably not,
but I am curious if there are alternatives to that.
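
As a sketch of that idea (Python 3): send the local file's modification
time as an If-Modified-Since header. A server that honours it replies
304 with no body, while one that ignores it just returns 200 and the
feed:

import os
import email.utils
import urllib.request
import urllib.error

def is_stale(url, local_path):
    # format the local copy's mtime as an HTTP date
    mtime = os.path.getmtime(local_path)
    req = urllib.request.Request(url)
    req.add_header("If-Modified-Since",
                   email.utils.formatdate(mtime, usegmt=True))
    try:
        response = urllib.request.urlopen(req)
        response.close()
        return True          # 200: server has (or claims) newer content
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return False     # not modified; local copy is current
        raise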


In any case, finally I thought: what about f.entries?

This is a list of entries in an rss feed.

Even without an updated key, which they usually have:

    date = f.entries[0].updated   # e.g. "Fri, August 20th 2009"

we could simply do:

    if downloaded_first_entry == f.entries[0]:
        # feed is up to date, so quit.


This is where I got stuck.

urllib2.urlopen(), from my calculations, seems to download the file and
then open it?

Is that correct, or is that wrong?

I wrote up this snippet:

    import urllib2
    import time

    url = "https://www.bigfinish.com/podcasts.rss"

    start_time = time.time()
    j = urllib2.urlopen(url)
    j.close()  # let's not be messy
    print time.time() - start_time

That came out at 0.8 seconds.

Perhaps that is just network connectivity?

But if we remember back to the tests run with the tim function, the
difference in time there was around 1.1 seconds.

The similarity is... worrying, is all.

If urllib2.urlopen doesn't download the file, and merely opens a link
up, as it were, then great.
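
For what it's worth: urlopen() sends the request and reads the status
line and headers up front, but the body is generally only transferred
from the socket as you read() it. A quick sketch (Python 3 syntax;
urllib2 behaves the same way):

import urllib.request

url = "https://www.bigfinish.com/podcasts.rss"
response = urllib.request.urlopen(url)   # status + headers arrive here
first_chunk = response.read(4096)        # body bytes are pulled on demand
response.close()
print(len(first_chunk))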


My theory here is to:

open the web file,

discard any data up to "<item>",

until "</item>" is reached, save the data to a list.

Convert that list using an xml parser into a dictionary, and then
compare either updated, title, or the whole thing.

If one of them says this isn't right, download the feed.

If they match, the feed on local drive is up to date.

To be fair, I could clean this up further, and simply have: until
"</item>" or "</entry>" is reached, save to a list; but that's a
refinement for later.
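
A rough sketch of that plan (Python 3; the tag names and the 8 KB read
are assumptions about a typical rss feed):

import urllib.request

def first_item(url, chunk_size=8192):
    # Keep only the first <item>...</item> block from the head of the
    # feed, for comparison with the copy stored on disk.
    with urllib.request.urlopen(url) as response:
        head = response.read(chunk_size).decode("utf-8", "replace")
    start = head.find("<item>")
    end = head.find("</item>", start)
    if start == -1 or end == -1:
        return None                  # first item not within this chunk
    return head[start:end + len("</item>")]

# if first_item(url) == stored_first_item: the local copy is up to date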


I'm looking forward to hearing your thoughts on this.

I picked up python myself over the course of a year, so am not quite
used to having back-and-forth exchanges like these yet. Especially not
with someone who knows what they're talking about. :)

Thanks

Nate

