[Tutor] Sklearn

2017-03-10 Thread Daniel Bosah
Can someone explain sklearn to me? I'm a novice at Python, and I would
like to use machine learning in my coding. But aren't there libraries like
matplotlib I can already use? Why use sklearn?
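
For context, a minimal sketch of the part scikit-learn covers that matplotlib doesn't
(matplotlib draws plots; scikit-learn fits models). The tiny dataset is made up for
illustration:

# Minimal scikit-learn sketch: fit a classifier and predict (toy data).
from sklearn.neighbors import KNeighborsClassifier

X = [[0, 0], [1, 1], [2, 2], [3, 3]]   # toy feature vectors
y = [0, 0, 1, 1]                        # toy labels

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X, y)                         # learn from the labelled examples
print(model.predict([[2.5, 2.5]]))      # -> [1]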


[Tutor] Problems with pytz module

2017-07-19 Thread Daniel Bosah
I'm learning about OOP programming in Python.
This is my code from my online course.

import datetime
import pytz


class Account:
    """Simple account class with balance"""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.transaction_list = []
        print "Account created for " + self.name

    def deposit(self, amount):
        if amount > 0:
            self.balance += amount
            self.show_balance()
            # appends transaction details to the list
            self.transaction_list.append((pytz.utc.localize(datetime.datetime.utcnow()), amount))

    def withdrawl(self, amount):
        if 0 < amount <= self.balance:
            self.balance -= amount
        else:
            print "The amount must be greater than zero and no more than your account balance"
        self.show_balance()

    def show_balance(self):
        print "Balance is {}".format(self.balance)

    def show_transactions(self):
        for date, amount in self.transaction_list:
            if amount > 0:
                tran_type = "deposited"
            else:
                tran_type = "withdrawn"
                amount *= -1  # flip the sign for display
            print "{:6} {} on {} (local time was {})".format(amount, tran_type, date, date.astimezone())


if __name__ == '__main__':
    tim = Account("Tim", 0)
    tim.show_balance()

    tim.deposit(1000)
    tim.show_balance()
    tim.withdrawl(500)
    tim.show_transactions()

    tim.show_balance()



I'm using Ubuntu Linux. My problem is that I cannot get show_transactions to
print anything to the console. I suspect that I cannot use pytz (as I'm using
Python 2.7). I'm trying to get the date and time of the transaction, as shown
on this line:

print "{:6} {} on {} (local time was {})".format(amount, tran_type, date,
date.astimezone())

but it won't print. Is there a workaround for pytz, or is there another problem
that I am missing?
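
In case it helps isolate the pytz piece: in Python 2.7, datetime.astimezone()
requires an explicit tzinfo argument (it only became optional in Python 3), so a
minimal sketch of the conversion with pytz looks like this; "America/New_York" is
just a placeholder zone:

# Python 2.7 sketch: astimezone() needs an explicit timezone here.
import datetime
import pytz

utc_now = pytz.utc.localize(datetime.datetime.utcnow())
local_zone = pytz.timezone("America/New_York")   # placeholder zone
print "UTC:   {}".format(utc_now)
print "Local: {}".format(utc_now.astimezone(local_zone))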


[Tutor] PYTZ Package not working.

2017-07-20 Thread Daniel Bosah
I'm trying to get my function to work:

    def deposit(self, amount):
        if amount > 0:
            self.balance += amount
            self.show_balance()
            # appends transaction details to the list
            self.transaction_list.append((pytz.utc.localize(datetime.datetime.utcnow()), amount))

    def show_transactions(self):
        for date, amount in self.transaction_list:
            if amount > 0:
                tran_type = "deposited"
            else:
                tran_type = "withdrawn"
                amount *= -1  # flip the sign for display
            print "{:6} {} on {} (local time was {})".format(amount, tran_type, date, date.astimezone())


But it doesn't show on the console at all. There's no error message;
show_transactions just doesn't print anything, which is weird. I think it may
be the way I installed the package, but last time I checked, it's installed in
the right place.
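
One quick way to rule out the installation theory (a small sketch, not specific
to this course's code): import pytz directly and print where it was loaded from.

# Sanity check: confirm which pytz the interpreter is actually importing.
import pytz
print pytz.__version__   # e.g. '2017.2'
print pytz.__file__      # path of the installed package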


[Tutor] Python Daemons

2017-08-01 Thread Daniel Bosah
I'm following an online tutorial about threading. This is the code I've
used so far:

import socket
import threading
from Queue import Queue  # the module is 'Queue' (capital Q) on Python 2.7


print_lock = threading.Lock()

target = 'pythonprogramming.net'

def portscan(port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.connect((target, port))   # connect() returns None, so keep the socket itself
        with print_lock:            # if successful, run the with statement
            print 'port', port, 'is open'
        s.close()                   # closes the connection
    except:
        pass                        # if it doesn't work, skip this port

def threader():
    while True:
        worker = q.get()
        portscan(worker)
        q.task_done()

q = Queue()

for x in range(30):
    # creates a thread that pulls workers from q and port-scans them
    t = threading.Thread(target=threader)
    t.daemon = True  # want it to be a daemon
    t.start()

# jobs = ports
for worker in range(1, 101):  # port zero is an invalid port
    q.put(worker)  # puts worker to work

q.join()  # waits until the queue is drained


I don't know what a daemon is, and I also don't know how to use one in
Python 2.7. Apparently it's built into Python 3, but I don't know how to use
it in Python 2.7. Any help would be appreciated.
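
For what it's worth, daemon threads are available in Python 2.7 as well; a
minimal sketch, independent of the port scanner above:

# Python 2.7 sketch: mark a thread as a daemon before start();
# daemon threads are killed automatically when the main thread exits.
import threading
import time

def ticker():
    while True:
        time.sleep(1)

t = threading.Thread(target=ticker)
t.daemon = True            # the older spelling t.setDaemon(True) also works in 2.7
t.start()
print 'main thread exiting; the daemon thread dies with it'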


[Tutor] Data Structures printing and Python Package creation

2018-02-03 Thread Daniel Bosah
I'm in a research group for school, and my first task is to learn how to
make a Python package and how to print out all kinds of data structures.
Are there resources I can be pointed to that would help me out?
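
On the printing side, one relevant piece of the standard library (a small
sketch, not tied to any particular data): pprint formats nested data
structures readably.

# Sketch: pprint handles nested dicts, lists, and tuples in one call.
from pprint import pprint

data = {'name': 'example', 'values': [1, 2, 3], 'nested': {'a': (4, 5)}}
pprint(data)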

Thanks


[Tutor] How to get all previous revision entries of a wikipedia page?

2018-02-16 Thread Daniel Bosah
Hello,

I'm doing research for a compsci group. I'm new to Python, and my task is
to use the Wikipedia API to get all the previous revision entries of a
Wikipedia page and collect them in one list.

Now, I'm totally lost on how to do this. I have never used an API before,
and I'm not sure how to use the Wikipedia API. Is there any resource anyone
can point me to that would help me do this? Not only to use the API, but
also to parse through all the previous edits?

Thanks


[Tutor] How to Load Every Revised Wikipedia Page Revision

2018-02-19 Thread Daniel Bosah
Good day,

I'm doing research for a compsci group. I have a script that is supposed to
load every revised page of a wikipedia article on FDR.

This script is supposed to, in a while loop:
  - access the Wikipedia API using the requests library
  - if 'continue' is in the response, update the query dict with the
    'continue' values and keep requesting until there are no more 'continue'
    keys (or until the API load limit is reached)
  - else, break out of the loop

Here is the code:



import requests

def GetRevisions():
    url = "https://en.wikipedia.org/w/api.php"  # the API endpoint
    query = {
        "format": "json",
        "action": "query",
        "titles": "Franklin D. Roosevelt",
        "prop": "revisions",
        "rvlimit": 500,
    }  # sets up a dictionary of the arguments of the query

    while True:  # in a while loop
        r = requests.get(url, params=query).json()  # request the url with the query parameters
        print repr(r)  # repr gets the "official" string output of an object
        if 'continue' in r:  # while in the loop, if the keyword is in r
            # update the query dict with the continue values and keep requesting
            query.update(r['continue'])
        else:
            break  # quit loop



I want to load the full content of every revision of the Wikipedia page, not
just the metadata about each revision. How can I go about that?
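
A hedged pointer rather than a verified fix: the same revisions query can return
the revision text itself when rvprop includes 'content'. Only the changed query
keys are sketched below, and the smaller rvlimit is an assumption, since the API
allows fewer revisions per request when content is included.

# Sketch: extra query keys to fetch the wikitext of each revision.
query = {
    "format": "json",
    "action": "query",
    "titles": "Franklin D. Roosevelt",
    "prop": "revisions",
    "rvprop": "ids|timestamp|content",  # 'content' adds the page text per revision
    "rvlimit": 50,                      # assumed cap for content requests
}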

Thanks


[Tutor] Matplotlib scatterplot help

2018-04-30 Thread Daniel Bosah
I have a function which returns a scatterplot of an Isomap projection, which
takes the output of a TF-IDF function that calculated TF-IDF values for
certain articles online. I used four articles, and I want to show the 4
articles in a 3D scatterplot.

Below is the function to turn my Isomap values to a 3D scatterplot :

import pandas as pd
import matplotlib.pyplot as plt
from sklearn import manifold, preprocessing
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection

def Isomap(tfidf):
    jon = pd.read_csv(tfidf)
    le = preprocessing.LabelEncoder()
    tims = jon.apply(le.fit_transform)
    iso = manifold.Isomap(n_neighbors=2, n_components=3)
    john = iso.fit_transform(tims)
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    use_colors = 'rybg'
    ax.scatter(john[:, 0], john[:, 1], john[:, 2], color=use_colors, alpha=.5)  # x, y, z coords
    plt.title('Isomap of candidates')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.savefig('isomap.png')  # save before show(), otherwise the saved figure can be blank
    plt.show()

The problem is that I usually get only one color back. And even if I get the
code to plot four colors, I'm not sure how to make those colors correspond to
the four web articles.
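
A minimal sketch of one way to tie colors to articles, assuming the TF-IDF CSV has
one row per article (an assumption about the data): plotting each row separately
with its own color and label lets the legend do the bookkeeping. The names below
are stand-ins, not the real data.

# Sketch: map one color per article by plotting each row with its own color.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection

points = np.random.rand(4, 3)                     # stand-in for the Isomap output
article_colors = ['r', 'y', 'b', 'g']             # one color per article, row order
labels = ['article 1', 'article 2', 'article 3', 'article 4']

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for row, color, label in zip(points, article_colors, labels):
    ax.scatter(row[0], row[1], row[2], c=color, label=label, alpha=0.5)
ax.legend()                                       # the legend ties colors to articles
plt.savefig('isomap_colored.png')                 # save before show()
plt.show()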

Thanks for the help in advance.


[Tutor] Figuring out selective actions in Python

2018-05-05 Thread Daniel Bosah
Hello,

I'm trying to figure out how to do "X out of every Y" things. For example, if
I want to delete 5 MB (or anything) out of every 20 MB, what would the code
look like? I'm essentially trying to apply an action to one part of each chunk
of a sequence, across the entire sequence.
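
Since the question is abstract, here is one hedged reading of the 5-of-every-20
example as a sketch; the interpretation that "delete" means "drop the first 5 MB
of each 20 MB chunk" is an assumption, and the file names are hypothetical.

# Sketch: read a stream in 20 MB chunks and keep only the last 15 MB of each.
CHUNK = 20 * 1024 * 1024
DROP = 5 * 1024 * 1024

def thin_stream(infile, outfile):
    with open(infile, 'rb') as src, open(outfile, 'wb') as dst:
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            dst.write(chunk[DROP:])   # skip the first 5 MB of each 20 MB chunk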


Thank you for your help


[Tutor] Recursion depth exceeded in python web crawler

2018-06-14 Thread Daniel Bosah
I am trying to modify code from a web crawler so it scrapes certain websites
for keywords. However, I'm trying to run the web crawler before I modify it,
and I'm running into issues.

When I ran this code -




import threading
from Queue import Queue
from spider import Spider
from domain import get_domain_name
from general import file_to_set


PROJECT_NAME = "SPIDER"
HOME_PAGE = "https://www.cracked.com/"
DOMAIN_NAME = get_domain_name(HOME_PAGE)
QUEUE_FILE = '/home/me/research/queue.txt'
CRAWLED_FILE = '/home/me/research/crawled.txt'
NUMBER_OF_THREADS = 1
# Capitalize variables and make them class variables to make them const variables

threadqueue = Queue()

Spider(PROJECT_NAME, HOME_PAGE, DOMAIN_NAME)

def crawl():
    change = file_to_set(QUEUE_FILE)
    if len(change) > 0:
        print str(len(change)) + ' links in the queue'
        create_jobs()

def create_jobs():
    for link in file_to_set(QUEUE_FILE):
        threadqueue.put(link)  # .put = put item into the queue
    threadqueue.join()
    crawl()

def create_spiders():
    for _ in range(NUMBER_OF_THREADS):  # _ because we don't act on the iterable
        # creates a thread that gets workers from q and sets them to work on portscanning
        vari = threading.Thread(target=work)
        vari.daemon = True  # makes sure that it dies when main exits
        vari.start()

# def regex():
#     for i in files_to_set(CRAWLED_FILE):
#         reg(i, LISTS)  # MAKE FUNCTION FOR REGEX; i is urls, LISTS is a list or set of keywords

def work():
    while True:
        url = threadqueue.get()  # pops item off the queue
        Spider.crawl_pages(threading.current_thread().name, url)
        threadqueue.task_done()

create_spiders()

crawl()
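
An aside on the "recursion depth exceeded" in the subject line: crawl() and
create_jobs() call each other, so a long-running crawl keeps deepening the call
stack. A loop-based sketch of the same flow, using the post's own helpers, so
this is illustrative rather than the tutorial's code:

# Sketch: the crawl/create_jobs cycle written as a loop, which does not
# grow the call stack the way the mutual recursion above does.
def crawl_loop():
    while True:
        links = file_to_set(QUEUE_FILE)
        if not links:
            break
        print str(len(links)) + ' links in the queue'
        for link in links:
            threadqueue.put(link)
        threadqueue.join()   # wait for the worker threads to drain the queue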


That used this class:

from HTMLParser import HTMLParser
from urlparse import urljoin


class LinkFinder(HTMLParser):
    def __init__(self, base_url, page_url):
        HTMLParser.__init__(self)  # HTMLParser is an old-style class in Python 2
        self.base_url = base_url
        self.page_url = page_url
        self.links = set()  # stores the links

    def error(self, message):
        pass

    def handle_starttag(self, tag, attrs):
        if tag == 'a':  # an anchor tag, i.e. a link
            for (attribute, value) in attrs:
                if attribute == 'href':  # href may be a relative url, i.e. not having www
                    url = urljoin(self.base_url, value)
                    self.links.add(url)

    def return_links(self):
        return self.links


And this spider class:



from urllib import urlopen  # connects to webpages from python
from link_finder import LinkFinder
from general import directory, text_maker, file_to_set, conversion_to_set


class Spider():
    project_name = 'Reader'
    base_url = ''
    Queue_file = ''
    crawled_file = ''
    queue = set()
    crawled = set()

    def __init__(self, project_name, base_url, domain_name):
        Spider.project_name = project_name
        Spider.base_url = base_url
        Spider.domain_name = domain_name
        Spider.Queue_file = '/home/me/research/queue.txt'
        Spider.crawled_file = '/home/me/research/crawled.txt'
        self.boot()
        self.crawl_pages('Spider 1 ', base_url)

    @staticmethod
    def boot():
        directory(Spider.project_name)
        text_maker(Spider.project_name, Spider.base_url)
        Spider.queue = file_to_set(Spider.Queue_file)
        Spider.crawled = file_to_set(Spider.crawled_file)

    @staticmethod
    def crawl_pages(thread_name, page_url):
        if page_url not in Spider.crawled:
            print thread_name + ' crawling ' + page_url
            print 'queue ' + str(len(Spider.queue)) + ' | crawled ' + str(len(Spider.crawled))
            Spider.add_links_to_queue(Spider.gather_links(page_url))
            Spider.crawled.add(page_url)
            Spider.update_files()

    @staticmethod
    def gather_links(page_url):
        html_string = ''
        try:
            response = urlopen(page_url)
            if 'text/html' in response.getheader('Content Type'):
                read = response.read()
                html_string = read.decode('utf-8')
            finder = LinkFinder(Spider.base_url, page_url)
            finder.feed(html_string)
        except:
            print 'Error: cannot crawl page'
            return set()
        return finder.return_links()

    @staticmethod
    def add_links_to_queue(links):
        for i in links:
            if i in Spider.queue:
                continue
            if i in Spider.crawled:
                continue
            # if Spider.domain_name != get_domain_name(url):
            #     continue
            Spider.queue.add(i)

    @staticmethod
    def update_files():
        conversion_to_set(Spider.queue, Spider.Queue_file)
        conversion_t

[Tutor] Parsing and collecting keywords from a webpage

2018-06-20 Thread Daniel Bosah
# coding: latin-1
from bs4 import BeautifulSoup
from urllib.request import urlopen
import re

# new point to add... make the rest of the function, then compare a list of
# monument notaries (such as blvd, road, street, etc.) to a list of words
# containing them. if contained, pass into a new set (ref notes in case)


def regex(url):

  html = urlopen(url).read()
  soup = BeautifulSoup(html, "lxml")  # why does lxml fix it?
  sets = []

  john = [u'Julia Alvarez', u'Arambilet',  u'Frank Baez',u'Josefina Baez'
u'Rei Berroa',u'Manuel del Cabral', u'Junot Díaz', u'Luis Arambilet'
u'Manuel del Cabral', u'Manuel del Cabral' u'Aída Cartagena Portalatín',
u'Roberto Cassá', 'Raquel Cepeda',u'Tulio Manuel Cestero',u'Hilma
Contreras', u'Angie Cruz', u'Judith Dupré',u'Virginia Elena Ortea',u'León
Félix Batista', u'Arturo Féliz-Camilo',u'Fabio Fiallo',u'Freddy
Ginebra',u'Cristino Gómez',u'José Luis González',u'Pedro Henríquez
Ureña',u'Federico Henríquez y Carvajal',u'Angela Hernández Nuñez',u'Juan
Isidro Jiménez Grullón',u'Rita Indiana',u'Mariano Lebrón Saviñón',u'Marcio
Veloz Maggiolo',u'Andrés L. Mateo', u'Félix Evaristo Mejía',u'Miguel D.
Mena',u'Leopoldo Minaya', u'Juan Duarte',u'Rafael Alburquerque',u'Pedro
Franco Badía ',u'Buenaventura Báez Méndez',u'Joaquín Balaguer
Ricardo',u'Ramón Emeterio Betances',u'Salvador Jorge Blanco',u'Tomás
Bobadilla',u'Juan Bosch y Gaviño',u'Francisco Alberto Caamaño
Deñó',u'Fernando Cabrera',u'Ramón Cáceres',u'Margarita Cedeño de
Fernández',u'David Collado', u'Lorraine Cortés-Vázquez',u'Adriano
Espaillat',u'Juan Pablo Duarte',u'Rafael Espinal',u'Rafael Estrella
Ureña',u'Carlos Felipe Morales', u'Leonel Fernández Reyna',u'Pedro
Florentino',u'Maximiliano Gómez',u'Máximo Gómez',u'Petronila Angélica
Gómez',u'Antonio Guzmán Fernández',u'Ulises Heureaux',u'Antonio Imbert
Barrera',u'Gregorio Luperón',u'Miguel Martinez',u'Danilo Medina',u'Hipólito
Mejía',u'Ramón Matías Mella',u'Patria Mirabal',u'Minerva Mirabal',u'María
Teresa Mirabal',u'Adolfo Alejandro Nouel',u'José Nuñez-Melo',u'José
Francisco Peña Gómez', u'Joseline Peña-Melnyk',u'Cesar A. Perales',u'Thomas
Perez',u'Donald Reid Cabral',u'Ydanis Rodríguez',u'José Antonio (Pepillo)
Salcedo',u'Pepillo',u'Roberto Salcedo, Sr.',u'Juan Sánchez
Ramírez',u'Francisco del Rosario Sánchez',u'José Santana', u'Pedro Santana
Familias',u'José Del Castillo Saviñón',u'Angel Taveras', u'Rafael Leónidas
Trujillo',u'Ramfis Trujillo',u'Francisco Urena',u'Fernando Valerio',
u'Elias Wessin y Wessin blvd']

  jake = [u'Pedro Mir',u'Domingo Moreno Jimenes',u'Mateo Morrison',u'José
Núñez de Cáceres',u'Arturo Rodríguez Fernández',u'Mu-Kien Adriana
Sang',u'Rosa Silverio',u'Alfredo Fernández Simó',u'Salomé Ureña',u'Jael
Uribe',u'Bernardo Vega',u'Julio Vega Batlle',u'Alanna Lockward',u'Delia
Weber', u'blvd'] #llist of words , only set




  paul = jake + john

  new_list = [x.encode('latin-1') for x in sorted(paul)]

  search = "(" + b"|".join(new_list).decode() + ")" + "" #re.complie needs
string as first argument, so adds string to be first argument, and joins
the strings together with john

 # print (type(search))
  pattern = re.compile(search)#compiles search to be a regex object
  reg = pattern.findall(str(soup))#calls findall on pattern, which findall
returns all non-overllapping matches of the pattern in string, returning a
list of strings

  # this loop checks whether elements are in both the regexed parsed list and
  # the keyword list; if i is in both, it is added to sets
  for i in reg:
    if i in reg and i in paul:
      sets.append(str(i))
  with open('sets.txt', 'w') as f:
    f.write(str(sets))


def regexparse(regex):
  monum = [u'road', u'blvd', u'street', u'town', u'city', u'Bernardo Vega']
  setss = []

  f = open('sets.txt', 'rt')
  f = list(f)
  for i in f:
    if i in f and i in monum:
      setss.append(i)
  # with open('regex.txt', 'w') as q:
  #     q.write(str(setss))
  #     q.close()
  print(setss)


if __name__ == '__main__':
  regexparse(regex('https://en.wikipedia.org/wiki/List_of_people_from_the_Dominican_Republic'))


What this code does is go through a webpage, using BeautifulSoup and a regex,
to compare a regexed list of words (in regex) against a list of keywords, and
then write the matches to a text file. The next function (regexparse) starts
with an empty list (setss) and reads the text file written by the previous
function. What I want to do, in a for loop, is check whether words in monum
and in the text file (from the regex function) are shared; if so, those shared
words get added to the empty list (setss) and then written to a file. (This
code is going to be added to a web crawler and will basically keep adding
words and phrases to a text file as it crawls the internet.)
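
For the "shared words" step, a hedged sketch of just the comparison: it assumes
sets.txt holds one phrase per line, which is an assumption, since the code above
writes the whole list with str().

# Sketch: keep only the phrases that appear in both the file and monum.
def shared_phrases(path, monum):
    with open(path) as f:
        file_phrases = {line.strip() for line in f}   # one phrase per line assumed
    return sorted(file_phrases & set(monum))          # set intersection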

However, every time I run the current code, I get the entire text file
(sets.txt) from the previous (regex) function, even though all I want are the
words and phrases shar

[Tutor] How would I replace the data of a Pandas Series with the values of a dictionary?

2019-07-16 Thread Daniel Bosah
Hi all,

I have a problem trying to match items in a dict and a pandas Series in
Python.

I have a dict (called city_dict) of cities and city_ids; for each city (which
is a key in the dict), a unique city_id is the value.

So for example, city_dict = {'New York': 1001, 'LA': 1002, 'Chicago': 1003}.
'New York' is a key, 1001 is its value.

Now I have a pandas Series called dfCities. This series holds a bunch of
cities, including the cities in city_dict.

My goal is to replace the cities in dfCities with the city_ids and write the
result to a brand new csv file. So if dfCities has New York in it, I want to
replace it with its value in the dictionary, so 1001.


Approaches I've tried: checking whether the keys match the cities in dfCities
in an 'if in' statement (such as "if city_dict.keys() in dfSeries") and then
doing a straight replace. That doesn't work, since the truth value of a Series
is ambiguous, so I tried .any() on the Series (.all() would require every value
in dfCities to match, and they don't all match).

Afterwards, I tried to directly match the Series against the keys using that
clarified truth value, but dict_keys are unhashable in that comparison, so I
had to convert the keys to str and see if I could compare strings (with a
stringified dfCities).

Then I realized that even if I can get an if statement to start checking
(if dfCities.str.contains(keyss).any(): ) (keyss being the stringified version
of the keys for city_dict), I don't know how to cross-check the values of
city_dict with the cities in dfCities. I have a vague notion that I should
check whether the keys of city_dict match dfCities and then replace the cities
in dfCities with the values of city_dict in a new csv output, but I don't know
how to replace the data in a Series with the values of a dict.

So I would like to ask the community what approach I can take to build that
piece of the puzzle. I feel I have most of the solution, but I'm missing
something.
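
A hedged sketch of the pandas pieces that do this kind of lookup (the Series
and file name below are stand-ins, not your actual data): Series.replace takes
a dict and swaps matching values while leaving everything else alone, and
Series.map does a strict key-to-value lookup where non-matches become NaN.

# Sketch: replace city names with ids from a dict.
import pandas as pd

city_dict = {'New York': 1001, 'LA': 1002, 'Chicago': 1003}
dfCities = pd.Series(['New York', 'Boston', 'Chicago'])    # stand-in data

ids_kept = dfCities.replace(city_dict)    # unmatched cities stay as they are
ids_strict = dfCities.map(city_dict)      # unmatched cities become NaN

ids_kept.to_csv('cities_with_ids.csv', index=False)        # hypothetical output file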

Thanks for reading and I appreciate the help.