Have do_nothing as default action for dictionary?

2017-09-03 Thread Christopher Reimer via Python-list

Greetings,

I was playing around with this piece of example code (written from memory).


def filter_text(key, value):

    def do_nothing(text):
        return text

    return {'this': call_this,
            'that': call_that,
            'what': do_nothing}[key](value)


Is there a way to refactor the code to have the inner do_nothing 
function be the default action for the dictionary?


The original code was a series of if statements. The alternatives 
include using a lambda to replace the inner function, or a try-except 
block around the dictionary lookup that returns the value on a KeyError.
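One more alternative (a sketch, not from the post) is dict.get with do_nothing as the default, so unknown keys fall through untouched. Since call_this and call_that aren't defined here, string methods stand in for them:

```python
# Sketch of the dict.get approach: unknown keys fall through to
# do_nothing. str.upper/str.lower stand in for call_this/call_that.
def filter_text(key, value):
    def do_nothing(text):
        return text

    dispatch = {
        'this': str.upper,   # stand-in for call_this
        'that': str.lower,   # stand-in for call_that
    }
    return dispatch.get(key, do_nothing)(value)

print(filter_text('this', 'abc'))     # ABC
print(filter_text('other', 'abc'))    # abc (unchanged)
```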


What's the most Pythonic, and what's the fastest?

Thank you,

Chris R.

--
https://mail.python.org/mailman/listinfo/python-list


Re: Have do_nothing as default action for dictionary?

2017-09-04 Thread Christopher Reimer via Python-list

Greetings,

After reading everyone's comments and doing a little more research, I 
re-implemented my function as a callable class.


    def __call__(self, key, value):
        if key not in self._methods:
            return value
        return self._methods[key](value)

This behaves like my previous function and solves the problem I had 
with the dictionary: the dictionary is created only once, a half dozen 
functions moved into the new class, and the old class now has less 
clutter.
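A fuller sketch of such a callable class (the filter methods are stand-ins, since the real ones aren't shown):

```python
# The dispatch dictionary is built once, in __init__, instead of on
# every call; __call__ falls back to returning the value unchanged.
class TextFilter(object):
    def __init__(self):
        self._methods = {
            'this': str.upper,   # stand-in for the real filter method
            'that': str.lower,   # stand-in for the real filter method
        }

    def __call__(self, key, value):
        if key not in self._methods:
            return value
        return self._methods[key](value)

filter_text = TextFilter()
print(filter_text('this', 'abc'))    # ABC
print(filter_text('other', 'abc'))   # abc (key absent, value unchanged)
```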


Thanks everyone!

Chris R.


Setting property for current class from property in an different class...

2017-09-06 Thread Christopher Reimer via Python-list

Greetings,

My web scraper program has a top-level class for managing the other 
classes. I went to set up a property for the top-level class that 
changes the corresponding property in a different class.


class Scraper(object):

    def __init__(self, user_id, user_name):
        self.requestor = Requestor(user_id, user_name)

    @property
    def page_start(self):
        return self.requestor.page_start

    @page_start.setter
    def page_start(self, number):
        self.requestor.page_start = number


I get the following error for @page_start.setter when I try to run the code.

    AttributeError: 'Scraper' object has no attribute 'requestor'

That's either a bug or I'm doing it wrong. Or maybe both?

Thank you,

Chris R.



Re: Setting property for current class from property in an different class...

2017-09-06 Thread Christopher Reimer via Python-list

On 9/6/2017 7:41 PM, Stefan Ram wrote:

The following code runs here: 


Your code runs, but that's not how I have my code set up. Here's the 
revised code:



class Requestor(object):

    def __init__(self, user_id, user_name):
        self._page_start = -1

    @property
    def page_start(self):
        return self._page_start

    @page_start.setter
    def page_start(self, number):
        self._page_start = number



class Scraper(object):

    def __init__(self, user_id, user_name):
        self.requestor = Requestor(user_id, user_name)

    @property
    def page_start(self):
        return self.requestor.page_start

    @page_start.setter
    def page_start(self, number):
        self.requestor.page_start = number

>>> test = Scraper(1, 1)
>>> test.page_start
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 20, in page_start
AttributeError: 'Requestor' object has no attribute 'page_start'


That's a slightly different error than what I got with my code, where 
the Scraper object didn't have the attribute.


Could it be that @property item can't call another @property item?
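For reference, a property on one class delegating to a property on another class does work in general; a minimal standalone sketch (class names hypothetical):

```python
# Outer.page_start delegates both get and set to Inner.page_start.
class Inner(object):
    def __init__(self):
        self._page_start = -1

    @property
    def page_start(self):
        return self._page_start

    @page_start.setter
    def page_start(self, number):
        self._page_start = number

class Outer(object):
    def __init__(self):
        self.inner = Inner()

    @property
    def page_start(self):
        return self.inner.page_start

    @page_start.setter
    def page_start(self, number):
        self.inner.page_start = number

outer = Outer()
outer.page_start = 5
print(outer.page_start)   # 5
```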

Thank you,

Chris R.


Re: Setting property for current class from property in an different class...

2017-09-07 Thread Christopher Reimer via Python-list

On 9/6/2017 9:26 PM, Christopher Reimer wrote:


On Sep 6, 2017, at 9:14 PM, Stefan Ram wrote:

  I can run this (your code) without an error here (Python 3.6.0),
  from a file named "Scraper1.py":

I'll check tomorrow. I recently switched from 3.5.x to 3.6.1 in the PyCharm 
IDE. It's probably FUBAR in some obscure way.


I uninstalled Python 3.6.0 (32-bit) and Python 3.6.1 (64-bit), installed 
Python 3.6.2 (64-bit), and everything now works.


Thanks,

Chris R.


Re: Differences between Class(Object) and Class(Dict) for dictionary usage?

2016-04-27 Thread Christopher Reimer via Python-list

On 4/26/2016 8:56 PM, Random832 wrote:
what exactly do you mean by property decorators? If you're just 
accessing them in a dictionary what's the benefit over having the 
values be simple attributes rather than properties?


After considering the feedback I got for sanity checking my code, I've 
decided to simplify the base class for the chess pieces (see code 
below). All the variables are stored inside a dictionary, with most 
values accessible through properties. A custom dictionary can be loaded 
through the constructor and/or saved out through the fetch_state method. 
The subclasses only have to implement the is_move_valid method, which is 
different for each type of chess piece.


Thank you,

Chris R.



class Piece(object):
    def __init__(self, color, position, state=None):
        if state is None:
            self._state = {
                'class': self.__class__.__name__,
                'color': color,
                'first_move': True,
                'initial_position': position,
                'move_count': 0,
                'name': color.title() + ' ' + self.__class__.__name__,
                'notation': color.title()[:1] + self.__class__.__name__[:1],
                'position': position
            }
        else:
            self._state = state

    @property
    def color(self):
        return self._state['color']

    def fetch_state(self):
        return self._state

    def is_move_valid(self, new_position, board_state):
        raise NotImplementedError

    @property
    def move_count(self):
        return self._state['move_count']

    @property
    def name(self):
        return self._state['name']

    @property
    def notation(self):
        return self._state['notation']

    @property
    def position(self):
        return self._state['position']

    @position.setter
    def position(self, position):
        self._state['position'] = position
        if self._state['first_move']:
            self._state['first_move'] = False
        self._state['move_count'] += 1
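A hypothetical subclass might look like the sketch below (the base class is repeated in trimmed form so the example stands alone, and the rook movement rule is only an illustration):

```python
# Trimmed Piece base class, repeated so this sketch runs standalone.
class Piece(object):
    def __init__(self, color, position, state=None):
        self._state = state or {
            'color': color,
            'first_move': True,
            'move_count': 0,
            'position': position,
        }

    @property
    def position(self):
        return self._state['position']

    @position.setter
    def position(self, position):
        self._state['position'] = position
        if self._state['first_move']:
            self._state['first_move'] = False
        self._state['move_count'] += 1

    def is_move_valid(self, new_position, board_state):
        raise NotImplementedError

class Rook(Piece):
    # Illustrative rule only: a rook moves along a rank or a file.
    def is_move_valid(self, new_position, board_state):
        row, col = self.position
        new_row, new_col = new_position
        return row == new_row or col == new_col

rook = Rook('white', (0, 0))
print(rook.is_move_valid((0, 5), None))   # True  (same rank)
print(rook.is_move_valid((3, 4), None))   # False
```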



Not x.islower() Versus x.isupper Output Results

2016-04-29 Thread Christopher Reimer via Python-list

Greetings,

I was playing around with a piece of code to remove lowercase letters 
and leave behind uppercase letters from a string when I got unexpected 
results.


>>> string = 'Whiskey Tango Foxtrot'
>>> list(filter(lambda x: not x.islower(), string))
['W', ' ', 'T', ' ', 'F']

Note the space characters in the list.

>>> list(filter(lambda x: x.isupper(), string))
['W', 'T', 'F']

Note that there are no space characters in the list.

Shouldn't the results between 'not x.islower()' and 'x.isupper()' be 
identical?
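The difference comes from uncased characters: a space returns False from both islower() and isupper(), so `not x.islower()` keeps it while `x.isupper()` drops it. A quick check:

```python
# Uncased characters (spaces, digits, punctuation) are neither
# lowercase nor uppercase, so both predicates return False for them.
for ch in ('W', 'h', ' '):
    print(repr(ch), ch.islower(), ch.isupper())
# 'W' False True
# 'h' True False
# ' ' False False
```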


The final form of this code is:

>>> list(filter(str.isupper, string))
['W', 'T', 'F']

Thank you,

Chris R.





Re: Finding sentinel text when using a thread pool...

2017-05-20 Thread Christopher Reimer via Python-list

On 5/20/2017 1:19 AM, dieter wrote:


If your (590) pages are linked together (such that you must fetch
a page to get the following one) and page fetching is the limiting
factor, then this would limit the parallelizability.


The pages are not linked together. The URL requires a page number. If I 
requested 1000 pages in sequence, the first 60% would have comments and 
the remaining 40% would have the sentinel text. As more comments are 
added over time, the dividing line between the last page with the oldest 
comments and the first page with the sentinel text shifts. Since I 
changed the code to fetch 16 pages at the same time, the run time got 
reduced by nine minutes.
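A minimal sketch of that batched fetching (fetch_page stands in for the real HTTP request; the page count, batch size, and sentinel text are made up for illustration):

```python
# Fetch numbered pages in batches of 16 until the sentinel appears.
from concurrent.futures import ThreadPoolExecutor

SENTINEL = 'Sorry, no more comments'   # assumed marker text
BATCH_SIZE = 16

def fetch_page(number):
    # Stand-in for requesting the page at this number over HTTP.
    return 'comment page %d' % number if number < 42 else SENTINEL

def fetch_all_pages():
    pages = []
    start = 0
    with ThreadPoolExecutor(max_workers=BATCH_SIZE) as pool:
        while True:
            batch = list(pool.map(fetch_page,
                                  range(start, start + BATCH_SIZE)))
            pages.extend(p for p in batch if p != SENTINEL)
            if SENTINEL in batch:
                return pages
            start += BATCH_SIZE

pages = fetch_all_pages()
print(len(pages))   # 42
```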



If processing a selected page takes a significant amount of time
(compared to the fetching), then you could use a work queue as follows:
a page is fetched and the following page determined; if a following
page is found, processing this page is put as a job into the work queue
and page processing is continued. Free tasks look for jobs in the work queue
and process them.


I'm looking into that now. The requestor class yields one page at a 
time. If I change the code to yield a list of 16 pages, I could parse 16 
pages at a time. That change would require a bit more work, but it would 
fix some problems that have been nagging me for a while about the parser 
class.


Thank you,

Chris Reimer


BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Christopher Reimer via Python-list

Greetings,

I have Python 3.6 script on Windows to scrape comment history from a 
website. It's currently set up this way:


Requestor (threads) -> list -> Parser (threads) -> queue -> CSVWriter 
(single thread)


It takes 15 minutes to process ~11,000 comments.

When I replaced the list with a queue between the Requestor and Parser 
to speed up things, BeautifulSoup stopped working.


When I changed BeautifulSoup(contents, "lxml") to 
BeautifulSoup(contents), I got the UserWarning that no parser was 
explicitly specified, along with a reference to line 80 in threading.py 
(which puts it in the RLock factory function).


When I switched back to using list between the Requestor and Parser, the 
Parser worked again.


BeautifulSoup doesn't work with a threaded input queue?

Thank you,

Chris Reimer



Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Christopher Reimer via Python-list

On 8/27/2017 11:54 AM, Peter Otten wrote:


The documentation

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup

says you can make the BeautifulSoup object from a string or file.
Can you give a few more details where the queue comes into play? A small
code sample would be ideal.


A worker thread uses a request object to get the page and puts it into 
queue as page.content (HTML).  Another worker thread gets the 
page.content from the queue to apply BeautifulSoup and nothing happens.


soup = BeautifulSoup(page_content, 'lxml')
print(soup)

No output whatsoever. If I remove 'lxml', I get the UserWarning that no 
parser was explicitly specified, and the reference to threading.py at 
line 80.


I verified that page.content that goes into and out of the queue is the 
same page.content that goes into and out of a list.


I read somewhere that BeautifulSoup may not be thread-safe. I've never 
had a problem with threads storing the output into a queue. Using a 
queue (random order) instead of a list (sequential order) to feed pages 
for the input is making it wonky.


Chris R.


Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Christopher Reimer via Python-list

On 8/27/2017 1:12 PM, MRAB wrote:

What do you mean by "queue (random order)"? A queue is sequential 
order, first-in-first-out. 


With 20 threads requesting 20 different pages, the pages don't go into 
the queue in sequential order (i.e., 0, 1, 2, ..., 17, 18, 19); they 
arrive at different times for the parser worker threads to pick up for 
processing.


Similar situation with a list but I sort the list before giving it to 
the parser, so all the items are in sequential order and fed to the 
parser one at time.


Chris R.



Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Christopher Reimer via Python-list

On 8/27/2017 1:31 PM, Peter Otten wrote:


Here's a simple example that extracts titles from generated html. It seems
to work. Does it resemble what you do?


Your example is similar to my code when I'm using a list for the input 
to the parser. You have soup_threads and write_threads, but no read_threads.


The particular website I'm scraping requires checking each page for the 
sentinel value (i.e., "Sorry, no more comments") in order to determine 
when to stop requesting pages. For my comment history that's ~750 pages 
to parse ~11,000 comments.


I have 20 read_threads requesting and putting pages into the output 
queue that is the input_queue for the parser. My soup_threads can get 
items from the queue, but BeautifulSoup doesn't do anything after that.


Chris R.


Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Christopher Reimer via Python-list

On 8/27/2017 1:50 PM, MRAB wrote:
What if you don't sort the list? I ask because it sounds like you're 
changing 2 variables (i.e. list->queue, sorted->unsorted) at the same 
time, so you can't be sure that it's the queue that's the problem.


If I'm using a list, I'm using a for loop to input items into the parser.

If I'm using a queue, I'm using worker threads to put or get items.

The item is still the same whether in a list or a queue.

Chris R.


Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Christopher Reimer via Python-list
Ah, shoot me. I had a .join() statement on the output queue but not on 
the input queue. So the threads for the input queue got terminated 
before BeautifulSoup could get started. I went down that same rabbit 
hole with CSVWriter the other day. *sigh*
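For anyone hitting the same rabbit hole, a minimal sketch of the pattern (names hypothetical; string upper-casing stands in for the BeautifulSoup parsing):

```python
# Workers drain the input queue; one None sentinel per worker signals
# shutdown; join() on the input queue waits until every item is done.
import queue
import threading

def worker(in_q, out_q):
    while True:
        item = in_q.get()
        if item is None:            # sentinel: this worker is done
            in_q.task_done()
            break
        out_q.put(item.upper())     # stand-in for the parsing step
        in_q.task_done()

in_q, out_q = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=worker, args=(in_q, out_q))
           for _ in range(4)]
for t in threads:
    t.start()

for page in ('page one', 'page two', 'page three'):
    in_q.put(page)
for _ in threads:
    in_q.put(None)                  # one sentinel per worker

in_q.join()                         # wait until every item is processed
for t in threads:
    t.join()

results = []
while not out_q.empty():
    results.append(out_q.get())
```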


Thanks for everyone's help.

Chris R.