On Mar 19, Michael Liu <[email protected]> wrote:
> I used imdbpy2sql to populate a local database with the IMDB data, but
> have some questions about the data.
Hi!
I take this as an opportunity to write some FAQs to put in the
documentation, since these questions have come up a lot lately. :-)
> In the titles table, the column imdb_id is empty. Am I missing a file
> I needed to download to fill this in? How can I get imdb_ids?
Q2: why are the movieIDs (and other IDs) used in the 'sql' database not
the same as the ones used on the IMDb.com site?
A2: first, a bit of nomenclature: we'll call "movieID" (or things like
"personID", for instances of the Person class) a unique identifier used
by IMDbPY to manage a single movie (or other kinds of object).
We'll call "imdbID" a unique identifier used, for the same kind
of data, by the IMDb.com site (i.e.: the 7-digit number in tt0094226,
as seen in the URL for "The Untouchables").
Using IMDbPY to access the web ('http' and 'mobile' data access
systems), movieIDs and imdbIDs are the same thing - beware that
in this case a movieID is a string, with the leading zeroes.
Unfortunately, populating a sql database with data from the plain
text data files, we don't have access to imdbIDs - since they are
not distributed at all - and so we have to make them up ourselves
(they are the 'id' column in tables like 'title' or 'name').
This means that these values are valid only for your current database:
if you update it with a newer set of plain text data files, these IDs
will surely change (and, by the way, they are integers).
It's also obvious, now, that you can't exchange IDs between the
'http' (or 'mobile') data access system and 'sql', and in the same
way you can't use imdbIDs with your local database or vice-versa.
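To make the difference between the two kinds of ID concrete, here is a
minimal sketch in plain Python (no IMDbPY required). The helper name
imdb_id_from_url is made up for this example and is not part of IMDbPY;
it only shows that the web-side ID is a string whose leading zeroes
matter, while the local 'id' values are plain integers.

```python
import re

def imdb_id_from_url(url):
    """Return the digits after 'tt' in an IMDb title URL, as a string.

    Hypothetical helper for illustration only; returns None if the URL
    doesn't look like a title page.
    """
    match = re.search(r'/title/tt(\d+)', url)
    return match.group(1) if match else None

web_id = imdb_id_from_url('http://www.imdb.com/title/tt0094226/')
# web_id is a string and keeps its leading zeroes.

# Converting it to an integer drops the zeroes, so the result no longer
# matches the form used by the web site (and it has nothing to do with
# the integer 'id' of the same movie in a local sql database).
as_int = int(web_id)
```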
Q3: using a sql database, what's the imdb_id (or something like that)
column in tables like 'title', 'name' and so on?
A3: it's used internally by IMDbPY to remember the imdbID (the one
used by the web site; when accessing the database you'll use the numeric
value of the 'id' column as movieID) of a movie, once it has stumbled
upon it. This way, if IMDbPY is asked again about the imdbID of
a movie (or person, or ...), it doesn't have to contact the web site
again. Notice that you have to access the sql database as
a user with write permission, so that the column can be updated.
As a bonus, when possible, the values of these imdbIDs are saved
between updates of the sql database (using the imdbpy2sql.py script).
Beware that it's tricky and not always possible, but the script does
its best to succeed.
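The "look it up once, then remember it" idea can be sketched as follows.
This is only an illustration under loud assumptions: the table is a toy
one created in an in-memory SQLite database (not the real schema built
by imdbpy2sql.py), and fetch_imdb_id_from_web is a stand-in that never
touches the network. It is not how IMDbPY itself is implemented.

```python
import sqlite3

web_calls = []  # records every (simulated) trip to the web site

def fetch_imdb_id_from_web(title):
    """Stand-in for the web lookup; a real one would contact the site."""
    web_calls.append(title)
    return {'The Untouchables': '0094226'}.get(title)

def get_cached_imdb_id(conn, movie_id):
    """Return the imdbID for a local movieID, storing it on first use."""
    row = conn.execute(
        'SELECT title, imdb_id FROM title WHERE id = ?', (movie_id,)
    ).fetchone()
    if row is None:
        return None  # unknown local movieID
    title, imdb_id = row
    if imdb_id is not None:
        return imdb_id  # already remembered: no web access needed
    imdb_id = fetch_imdb_id_from_web(title)
    if imdb_id is not None:
        # This UPDATE is why write permission on the database is needed.
        conn.execute(
            'UPDATE title SET imdb_id = ? WHERE id = ?', (imdb_id, movie_id)
        )
        conn.commit()
    return imdb_id

# Toy database with a single row whose imdb_id column starts out empty.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE title (id INTEGER PRIMARY KEY, title TEXT, imdb_id TEXT)'
)
conn.execute("INSERT INTO title (id, title) VALUES (1, 'The Untouchables')")

first = get_cached_imdb_id(conn, 1)   # goes to the (simulated) web
second = get_cached_imdb_id(conn, 1)  # served from the imdb_id column
```

After the first call the imdb_id column is filled in, so the second call
never reaches the stand-in web function.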
Q4: but what if I really need the imdbIDs, to use my database?
A4: no, you don't. Search for a title, get its information. Be happy!
Q5: I have a great idea: write a script to fetch all the imdbIDs from the
web site! Can't you do it?
A5: yeah, I can. But I won't. :-)
It would be somewhat easy to map every title on the web to its
imdbID, but there are still a lot of problems.
First of all, every user would end up doing it for their own copy
of the plain text data files (and this would make the imdbpy2sql.py
script painfully slow and prone to all sorts of problems).
Moreover, the imdbIDs are unique and never reused, true, but movie
titles _do_ change: to fix typos, to override working titles, to cope
with a new movie with the same title released in the same year (not
to mention cancelled or postponed movies).
Besides that, we'd have to do the same for persons, characters and
companies. Believe me: it doesn't make sense.
Work on your local database using your movieIDs (or even better:
don't worry about movieIDs and think in terms of searches and Movie
instances!) and retrieve an imdbID only in the rare circumstances
when you really need it (see the next FAQ).
Repeat after me: I DON'T NEED ALL THE imdbIDs. :-)
> Without the imdb_id, is it possible for me to generate a link to a
> given movie on IMDB?
Q6: using a sql database, how can I convert a movieID (whose value
is valid only locally) to an imdbID (the ID used by the imdb.com site)?
A6: various functions can be used to convert a movieID (or personID or
other IDs) to the imdbID used by the web site.
Example code:
    from imdb import IMDb
    # URI_TO_YOUR_SQL_DATABASE is a placeholder for your own database URI.
    ia = IMDb('sql', uri=URI_TO_YOUR_SQL_DATABASE)
    movie = ia.search_movie('The Untouchables')[0]  # a Movie instance.
    print('The movieID for The Untouchables: %s' % movie.movieID)
    print('The imdbID used by the site: %s' % ia.get_imdbMovieID(movie.movieID))
    print('Same ID, smarter function: %s' % ia.get_imdbID(movie))
It goes without saying that get_imdbMovieID has some sibling
methods: get_imdbPersonID, get_imdbCompanyID and get_imdbCharacterID.
Also notice that the get_imdbID method is smarter, and takes any kind
of instance (the other functions need a movieID, personID, ...).
Another method that will try to retrieve the imdbID is get_imdbURL,
which works like get_imdbID but returns a URL.
In case of problems, these methods will return None.
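Since these methods may return None, code that builds links should be
ready for it. Here is a small sketch, assuming you already have the
value returned by one of the methods above; imdb_title_url is a
hypothetical helper (not part of IMDbPY) and the zero-padding to 7
digits is my assumption, added so that the helper works whether the ID
arrives as a string or as an integer.

```python
def imdb_title_url(imdb_id):
    """Build the IMDb.com URL for a title, or return None if the ID is unknown.

    Hypothetical helper: imdb_id is whatever a lookup returned, i.e. a
    string such as '0094226', an integer, or None when the lookup failed.
    """
    if imdb_id is None:
        return None  # nothing to link to; let the caller decide what to show
    # Pad to 7 digits so that an integer like 94226 becomes '0094226'.
    return 'http://www.imdb.com/title/tt%s/' % str(imdb_id).zfill(7)

link = imdb_title_url('0094226')
missing = imdb_title_url(None)
```

With a live database you would pass in the result of ia.get_imdbID(movie),
or simply call get_imdbURL and check for None in the same way.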
> Also, the online IMDB is aware of which titles are adult movies, but I
> don't see any similar column in my local database. How can I determine
> whether a movie is adult or not?
Read README.adult and see imdb/parser/sql/__init__.py: searching for
a title, it tries to guess if it's an adult title.
It can't be perfect and I don't take any kind of responsibility
on this matter. ;-)
> Lastly, online IMDB seems to know which movies are and aren't
> available on Amazon and Blockbuster. Is that in the database
> somewhere?
No.
Accessing the web ('http' and 'mobile'), there are parsers for
the 'amazon reviews' page, but this information is not published
in the plain text data files.
HTH,
--
Davide Alberani <[email protected]> [GPG KeyID: 0x465BFD47]
http://www.mimante.net/
_______________________________________________
Imdbpy-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/imdbpy-help