This failing test seems to be caused by Python itself rather than
Scrapy. I suggest just changing the test to match Python's behavior.

This code calls through to w3lib.encoding.to_unicode, which just boils
down to this:

b"\xef\xbb\xbfWORD\xe3\xab".decode('utf-8', 'replace')

Running that directly gives the same results as the test:

On Python 2:
>>> b"\xef\xbb\xbfWORD\xe3\xab".decode('utf-8', 'replace')
u'\ufeffWORD\ufffd'

On Python 3:
>>> b"\xef\xbb\xbfWORD\xe3\xab".decode('utf-8', 'replace')
'\ufeffWORD�'
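The two reprs above look different only because Python 3 prints U+FFFD
literally while Python 2 escapes it. A minimal sketch for comparing the
actual code points, independent of repr() (the variable names here are
my own, not taken from Scrapy or w3lib):

```python
# Reproduce the decode outside Scrapy/w3lib, using the same bytes as above.
raw = b"\xef\xbb\xbfWORD\xe3\xab"
decoded = raw.decode("utf-8", "replace")

# List code points instead of relying on repr(), which renders U+FFFD
# differently on Python 2 and Python 3.
code_points = [hex(ord(ch)) for ch in decoded]
print(code_points)
```

On the Python 3 session above this should list U+FEFF, the four letters
of WORD, and a single U+FFFD for the truncated trailing sequence.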

This bug is keeping python3-scrapy out of testing. Can we just update
the test to accept this behavior?
