Question about concatenation error

2005-09-07 Thread colonel
I am new to Python and I am confused about why concatenating 3 strings
isn't working properly.

Here is the code:

--
import string
import sys
import re
import urllib

linkArray = []
srcArray = []
website = sys.argv[1]

urllib.urlretrieve(website, 'getfile.txt')   

filename = "getfile.txt"
input = open(filename, 'r')  
reg1 = re.compile('href=".*"')   
reg3 = re.compile('".*?"')   
reg4 = re.compile('http')
Line = input.readline() 

while Line:
    searchstring1 = reg1.search(Line)
    if searchstring1:
        rawlink = searchstring1.group()
        link = reg3.search(rawlink).group()
        link2 = link.split('"')
        cleanlink = link2[1:2]
        fullink = reg4.search(str(cleanlink))
        if fullink:
            linkArray.append(cleanlink)
        else:
            cleanlink2 = str(website) + "/" + str(cleanlink)
            linkArray.append(cleanlink2)
    Line = input.readline()

print linkArray
---

I get this:

["http://www.slugnuts.com/['index.html']",
"http://www.slugnuts.com/['movies.html']",
"http://www.slugnuts.com/['ramblings.html']",
"http://www.slugnuts.com/['sluggies.html']",
"http://www.slugnuts.com/['movies.html']"]

instead of this:

["http://www.slugnuts.com/index.html",
"http://www.slugnuts.com/movies.html",
"http://www.slugnuts.com/ramblings.html",
"http://www.slugnuts.com/sluggies.html",
"http://www.slugnuts.com/movies.html"]

The concatenation isn't working the way I expected it to.  I suspect
that I am screwing up by mixing types, but I can't see where...
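
To make the type suspicion concrete, here is a minimal sketch of what the output above suggests is happening. It assumes cleanlink holds a one-element list rather than a string; the values are illustrative stand-ins, not taken from the real page.

```python
# Illustrative values; in the script, website comes from sys.argv[1].
website = "http://www.slugnuts.com"
cleanlink = ["index.html"]   # assumption: a one-element list, not a string

# str() on a list returns its printed form, brackets and quotes included.
as_text = str(cleanlink)
joined = website + "/" + as_text

print(as_text)   # ['index.html']
print(joined)    # http://www.slugnuts.com/['index.html']
```

If that assumption holds, the concatenation itself is fine and the brackets come from calling str() on a list.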

I would appreciate any advice or pointers.

Thanks.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Question about concatenation error

2005-09-07 Thread colonel
On Wed, 07 Sep 2005 16:34:25 GMT, colonel <[EMAIL PROTECTED]>
wrote:

>I am new to Python and I am confused about why concatenating 3 strings
>isn't working properly.
>
>Here is the code:
>
>--
>import string
>import sys
>import re
>import urllib
>
>linkArray = []
>srcArray = []
>website = sys.argv[1]
>
>urllib.urlretrieve(website, 'getfile.txt')   
>
>filename = "getfile.txt"
>input = open(filename, 'r')  
>reg1 = re.compile('href=".*"')   
>reg3 = re.compile('".*?"')   
>reg4 = re.compile('http')
>Line = input.readline() 
>
>while Line:
>    searchstring1 = reg1.search(Line)
>    if searchstring1:
>        rawlink = searchstring1.group()
>        link = reg3.search(rawlink).group()
>        link2 = link.split('"')
>        cleanlink = link2[1:2]
>        fullink = reg4.search(str(cleanlink))
>        if fullink:
>            linkArray.append(cleanlink)
>        else:
>            cleanlink2 = str(website) + "/" + str(cleanlink)
>            linkArray.append(cleanlink2)
>    Line = input.readline()
>
>print linkArray
>---
>
>I get this:
>
>["http://www.slugnuts.com/['index.html']",
>"http://www.slugnuts.com/['movies.html']",
>"http://www.slugnuts.com/['ramblings.html']",
>"http://www.slugnuts.com/['sluggies.html']",
>"http://www.slugnuts.com/['movies.html']"]
>
>instead of this:
>
>["http://www.slugnuts.com/index.html",
>"http://www.slugnuts.com/movies.html",
>"http://www.slugnuts.com/ramblings.html",
>"http://www.slugnuts.com/sluggies.html",
>"http://www.slugnuts.com/movies.html"]
>
>The concatenation isn't working the way I expected it to.  I suspect
>that I am screwing up by mixing types, but I can't see where...
>
>I would appreciate any advice or pointers.
>
>Thanks.


Okay.  It works if I change:

fullink = reg4.search(str(cleanlink))
if fullink:
    linkArray.append(cleanlink)
else:
    cleanlink2 = str(website) + "/" + str(cleanlink)

to

fullink = reg4.search(cleanlink[0])
if fullink:
    linkArray.append(cleanlink[0])
else:
    cleanlink2 = str(website) + "/" + cleanlink[0]


So can anyone tell me why "cleanlink" gets converted to a list? Is it
during the slicing?


Thanks.
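
For what it's worth, a small sketch of the slicing question. The link value below is a made-up stand-in for one regex match; the point is only that a slice of a list is always a list, while an index returns the element itself.

```python
link = '"index.html"'        # hypothetical match text, quotes included
parts = link.split('"')      # split() returns a list of strings

sliced = parts[1:2]          # slice: a new list holding one element
indexed = parts[1]           # index: the string itself

print(parts)     # ['', 'index.html', '']
print(sliced)    # ['index.html']
print(indexed)   # index.html
```

So with link2[1:2] the value is already a list before any concatenation happens, and link2[1] would give the bare string, which appears to be why cleanlink[0] works in the change above.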