>
> The attribute match should be completely contained within the square
> brackets:
>
Oh, that's a pretty gross mistake. OK, syntax is noted. I won't make
that mistake again.
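Just to cement it for myself, here's a quick stdlib-only sanity check of the predicate placement, using `xml.etree.ElementTree` (whose limited XPath subset still supports `[@class='...']`); the one-line markup is made up for illustration:

```python
import xml.etree.ElementTree as ET

# Toy markup (invented for illustration) mirroring the page structure.
html = ("<body>"
        "<div class='lesson-status-icon'></div>"
        "<a href='/lessons/intro'>Intro</a>"
        "</body>")
root = ET.fromstring(html)

# The attribute test goes *inside* the brackets: div[@class='lesson-status-icon']
divs = root.findall(".//div[@class='lesson-status-icon']")
print(len(divs))  # 1

# The broken form "//div[@class]='lesson-status-icon'" compares a node-set
# to a string instead of selecting elements, so it can't return a node list.
```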
> Additionally, notice that the anchor tags you're trying to get are *not*
> children of the divs you're selecting.
> They are *siblings*, which means they are at the same "level" in the
> markup hierarchy as the divs.
>
Aiee! Yeah, totally understood. The reason I focused on the sibling
div (lesson-status-icon) is that I couldn't find anything to "grab" onto
in the anchor tag itself, although now that I think about it, maybe something
like:
response.xpath("//a[starts-with(@href, '/lessons/')]")
would get me the list I wanted without resorting to the roundabout method
of finding the sibling tag and going one past it. Thanks for the lesson --
you've given me something to experiment and play with. Much appreciated!
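(For anyone following along: `*` isn't a wildcard inside an XPath equality test, which is why `starts-with()` is the right tool. Here's a stdlib-only sketch, with toy markup I invented, showing both the prefix filter and the "sibling and one past it" trick producing the same list; the full-XPath forms Scrapy would accept are noted in the comments.)

```python
import xml.etree.ElementTree as ET

# Toy markup (invented for illustration) echoing the page layout:
# each lesson-status-icon div is immediately followed by its <a> sibling.
html = """
<ul>
  <li><div class="lesson-status-icon"></div><a href="/lessons/intro">Intro</a></li>
  <li><div class="lesson-status-icon"></div><a href="/lessons/tones">Tones</a></li>
  <li><a href="/about">About</a></li>
</ul>
"""
root = ET.fromstring(html)

# Approach 1: filter hrefs directly. In full XPath 1.0 (Scrapy/lxml) this is
# //a[starts-with(@href, '/lessons/')]/@href; ElementTree's XPath subset lacks
# starts-with(), so the filter is done in Python instead.
by_prefix = [a.get("href") for a in root.iter("a")
             if a.get("href", "").startswith("/lessons/")]

# Approach 2: the "sibling one past it" trick, i.e. roughly what
# //div[@class='lesson-status-icon']/following-sibling::a[1]/@href does.
by_sibling = []
for parent in root.iter():
    kids = list(parent)
    for i, kid in enumerate(kids):
        if kid.tag == "div" and kid.get("class") == "lesson-status-icon":
            if i + 1 < len(kids) and kids[i + 1].tag == "a":
                by_sibling.append(kids[i + 1].get("href"))

print(by_prefix)   # ['/lessons/intro', '/lessons/tones']
print(by_sibling)  # ['/lessons/intro', '/lessons/tones']
```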
Pete
On Wednesday, January 25, 2017 at 4:29:59 PM UTC-5, Joey Espinosa wrote:
>
> First, this part is wrong:
>
> for div in response.xpath("//div[@class]='lesson-status-icon'"):
>
> The attribute match should be completely contained within the square
> brackets:
>
> for div in response.xpath("//div[@class='lesson-status-icon']"):
>
> Additionally, notice that the anchor tags you're trying to get are *not*
> children
> of the divs you're selecting. They are *siblings*, which means they are
> at the same "level" in the markup hierarchy as the divs. If you are
> insistent on selecting those divs (maybe because they're more reliably
> selectable to you?), then you can use the "following-sibling" selector:
>
> for anchor in response.xpath("//div[@class='lesson-status-icon']/following-sibling::a/@href"):
>     print anchor.extract()
>
> I can't check it right now, but give that a shot.
>
>
> On Tue, Jan 24, 2017 at 9:32 PM Peter <[email protected]>
> wrote:
>
>> Trying to scrape some URLs from this page (the stuff highlighted in
>> yellow is what I'm looking for):
>>
>> [inline screenshot: lesson list with the target links highlighted in yellow]
>>
>> I didn't quite understand the section on selectors and XPath, but this
>> was my attempt at getting those URLs:
>>
>>
>> def grab_page(self, response):
>>     for div in response.xpath("//div[@class]='lesson-status-icon'"):
>>         print( div.xpath("a[@href]").extract() )
>>         print( div.xpath("a[@href]/text()").extract() )
>>         print( div.extract() )
>>     for div in response.xpath("//div[@class]='lesson-status-icon'").xpath("/a[@href]"):
>>         print( div.text().extract() )
>>
>>
>> I'm flailing and drowning. Can someone please put me on the right path?
>> What's the right syntax to grab the URLs?
>>
>>
>> Thanks!!!
>>
>> <https://lh3.googleusercontent.com/-MeiUXL6STxs/WIgIqtFhtPI/AAAAAAAACHU/KO9L90WRgMI9xKA2azq9upycDjQYXxo4ACLcB/s1600/cpod.jpg>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "scrapy-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/scrapy-users.
>> For more options, visit https://groups.google.com/d/optout.
>>
> --
> Respectfully,
>
> *Joey Espinosa*
> Chief Technology Officer
> *Vote.org* <https://www.vote.org/>
> Phone: (305) 747-1711
>