On May 10, Gozde Ozbal <[email protected]> wrote:

> I have also made some changes about performance in MySQL
> configuration.

Good; another possible speed-up is to use imdbpy2sql.py to dump a set
of CSV files, later imported into the database.  See README.sqldb
for the details.

By the way, some of the plain text data files (complete-crew,
distributors, keywords, miscellaneous-companies and
special-effects-companies at this moment) still contain movie titles in
the old format (IMDb recently switched from "Title, The" to "The Title").

When running imdbpy2sql.py, you should pass the --fix-old-style-titles
argument so that every title is converted to the new format.
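
To make the difference between the two formats concrete, here is a
minimal sketch of the conversion. It is only an illustration, not the
code imdbpy2sql.py actually runs: the function name, the regular
expression and the short list of English articles are all assumptions
made for this example.

```python
import re

# Assumption: only these English articles are handled in this sketch.
ARTICLES = ("The", "A", "An")

# Matches "Name, Article", optionally followed by a " (year...)" suffix.
_OLD_STYLE = re.compile(
    r"^(?P<name>.+), (?P<article>%s)(?P<rest> \(.*)?$" % "|".join(ARTICLES)
)


def to_new_style(title):
    """Turn 'Title, The (1999)' into 'The Title (1999)'.

    Titles that do not end with a trailing article are returned unchanged.
    """
    match = _OLD_STYLE.match(title)
    if match is None:
        return title
    return "%s %s%s" % (
        match.group("article"),
        match.group("name"),
        match.group("rest") or "",
    )
```

With this sketch, to_new_style("Matrix, The (1999)") gives
"The Matrix (1999)", while a title already in the new format is left
untouched.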

> I am intending to use IMDbPY for my thesis work,

Cool!  It seems to be one of the more popular uses of IMDbPY these days. :-)

You may be interested in the Hollywood Informatics group,
where some IMDbPY developers and users are trying to explore
strange uses of the IMDb data. :-)
It's a new initiative, so nothing has been done yet, but it can
be a good place to share some ideas (not about IMDbPY itself:
there are already these mailing lists for help and development).

> in which I am designing and implementing ReMovender, an intelligent
> web based movie recommendation system. And I need to keep my movie
> data up to date by running some scripts from the user interface.

I'm not sure I have completely understood how it will work and
what kind of data you need to keep up to date.
Can you provide a short example of what a user will do?

> First I have thought of crawling http://www.imdb.com/nowplaying/. But
> after realizing that the movies in this page are already in the
> imdb database, I've found out that it would take so much time for
> this page to help me with obtaining the up-to-date data.

You can parse that page (or http://italian.imdb.com/Recent/ ) to get
a list of movies that are in theaters; after that, if you need complete
information about a movie, you can use IMDbPY.
Beware that the movieIDs used internally by IMDbPY to uniquely identify
a movie are not the same for the web and the SQL database: Star Trek
(2009) has movieID=0796366 on the web (and it's the same if you access
its data using the 'http' and 'mobile' data access systems of IMDbPY),
but with the 'sql' data access system it will have a different
integer ID.
That's because IMDb doesn't distribute a map from titles to
movieIDs in the plain text data files (obviously you can search
for the complete title and take the first result: 99.9% of the time
it will lead to the same movie).
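
The "search for the complete title and take the first result" trick can
be sketched roughly as follows. The helper name is made up for this
example; it only relies on the search_movie() method of an IMDbPY data
access object and on the movieID attribute of the returned movies, and
it inherits the caveat above: the first result is usually, not always,
the right movie.

```python
def first_match_movie_id(access, long_title):
    """Return the movieID of the first search result for long_title.

    'access' is any object with a search_movie() method, such as an
    IMDbPY data access instance.  None is returned when the search
    yields no result.
    """
    results = access.search_movie(long_title)
    if not results:
        return None
    return results[0].movieID


# Possible usage (needs IMDbPY installed and network access):
#   from imdb import IMDb
#   web_id = first_match_movie_id(IMDb('http'), 'Star Trek (2009)')
```

You could store the web movieID obtained this way next to the ID used
by the 'sql' data access system, building your own map for the movies
you care about.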

> So I need to be aware just when the database is updated.

The IMDb's database?
The web pages are _constantly_ updated, so you can't tell for sure
if something new was added.
The plain text data files are updated once a week (but not every file
every week).

> Do you think that IMDbPY can be helpful for this issue? Maybe,
> I can provide a user interface to the administrator of my system
> so that he/she can update the movie data properly.

IMDbPY can be useful to get easy access to information about
movies, persons, characters and companies; maybe it's what you
need, maybe not. :-)

> http://imdbpy.sourceforge.net/docs/README.sqldb.txt also mentions
> about diffs files of IMDb. But I haven't been able to find a document
> about how to use the diffs files with IMDbPY

Right now they can only be used to patch your set of plain text
data files, after which you run imdbpy2sql.py again on the patched set.

IMDbPY still can't update a database using the patches; Timo Schulz
is working on the same problem and, trust me, it's a very difficult
task.
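
If you want to trigger the full re-import from your administration
interface, something along these lines could work. It is a sketch under
a few assumptions: the data files were already patched with the diffs,
imdbpy2sql.py is executable and on the PATH, and -d / -u are the
data-directory and database-URI options (please check README.sqldb for
the exact options of your version). The function names, the directory
and the URI are placeholders.

```python
import subprocess


def imdbpy2sql_command(data_dir, db_uri, script="imdbpy2sql.py"):
    """Build the argument list for a full import of the data files.

    Assumption: -d is the directory with the plain text data files and
    -u the database URI; --fix-old-style-titles is the flag mentioned
    earlier in this message.
    """
    return [script, "-d", data_dir, "-u", db_uri, "--fix-old-style-titles"]


def reimport(data_dir, db_uri):
    """Run imdbpy2sql.py on an already patched set of data files.

    Raises subprocess.CalledProcessError if the script exits with an error.
    """
    subprocess.check_call(imdbpy2sql_command(data_dir, db_uri))


# Possible usage (placeholder path and URI):
#   reimport("/path/to/patched/data/files/", "mysql://user:password@localhost/imdb")
```

Keep in mind that this rebuilds the whole database every time, so it
is something to schedule weekly rather than run on every request.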

> I am planning to state my thanks to you and your team in the first
> page of my thesis for all the contribution you have made :)

Thank you very much, I'd be very proud of it. :-)


HTH,
-- 
Davide Alberani <[email protected]> [GPG KeyID: 0x465BFD47]
http://erlug.linux.it/~da/

_______________________________________________
Imdbpy-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/imdbpy-help
