Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding "UTF8"

Davide Alberani Sat, 16 Apr 2011 08:01:48 -0700

On Wed, Apr 13, 2011 at 08:46, darklow <[email protected]> wrote:
> Maybe someone knows some fast dirty fix at least how to skip such invalid
> byte sequence strings while there are no official fix, so i can finish the
> import?
> Can we detect invalid byte characters?


Hi again,
actually my problem is that I'm unable to reproduce this bug. :-)
Using PostgreSQL and SQLObject, my run goes smoothly.

I have downloaded the 'actors.list.gz' file today, so it's possible that some
garbage was removed.

Anyway, the previously proposed solution was obviously flawed, since
the problem was on _character_ names.

So, let's edit the imdbpy2sql.py file again and change the lines around 1540
so that they become:

        movieid = CACHE_MID.addUnique(title)
        if role is not None:
            roles = filter(None, [x.strip() for x in role.split('/')])
            for role in roles:
                role = role.replace('\xec\x8c\xa0', '')  # TEMPORARY FIX
                cid = CACHE_CID.addUnique(role)
                sqldata.add((pid, movieid, cid, note, order))
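
On the more general question of detecting invalid byte sequences: the
following is only a rough sketch, not something that is in imdbpy2sql.py.
It assumes Python 3 and that the role is available as raw bytes; the helper
names is_valid_utf8 and drop_invalid_utf8 are hypothetical. It tries to
decode the bytes as UTF-8 to detect garbage, and can drop whatever does not
decode.

```python
def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


def drop_invalid_utf8(raw):
    """Return a copy of raw with every undecodable byte removed."""
    # errors='ignore' silently skips bytes that are not valid UTF-8.
    return raw.decode('utf-8', errors='ignore').encode('utf-8')


# Hypothetical usage: clean a role name before it is stored.
role = b'Caf\xe9 Owner'   # Latin-1 byte, not valid UTF-8
if not is_valid_utf8(role):
    role = drop_invalid_utf8(role)
```

Note that this silently loses data, so like the replace() above it is
better seen as a workaround than as a real fix.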

Maybe this will help... who knows? :-)

-- 
Davide Alberani <[email protected]>  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/

