On May 10, Gozde Ozbal <[email protected]> wrote:

> I have also made some changes about performance in MySQL
> configuration.

Good; another possible speed-up is to use imdbpy2sql.py to dump a set
of CSV files, which are later imported into the database. See
README.sqldb for the details.

By the way, some of the plain text data files (complete-crew,
distributors, keywords, miscellaneous-companies and
special-effects-companies at this moment) still contain movie titles
in the old format (IMDb recently switched from "Title, The" to
"The Title"). When running imdbpy2sql.py you should use the
--fix-old-style-titles argument, so that every title is converted to
the new format.

> I am intending to use IMDbPY for my thesis work,

Cool! It seems to be one of the more popular uses of IMDbPY, these
days. :-)
You may be interested in the Hollywood Informatics group, where some
IMDbPY developers and users are trying to explore strange uses of the
IMDb data. :-)
It's a new initiative, so nothing has been done yet, but it can be a
good place to share some ideas (not about IMDbPY itself: there are
already these mailing lists for help and development).

> in which I am designing and implementing ReMovender, an intelligent
> web based movie recommendation system. And I need to keep my movie
> data up to date by running some scripts from the user interface.

I'm not sure I have completely understood how it will work and what
kind of data you need to keep up to date. Can you provide a short
example of what a user will do?

> First I have thought of crawling http://www.imdb.com/nowplaying/. But
> after realizing that the movies in this page are already in the
> imdb database, I've found out that it would take so much time for
> this page to help me with obtaining the up-to-date data.

You can parse that page (or http://italian.imdb.com/Recent/ ) to get
a list of movies that are in theaters; after that, if you need
complete information about a movie, you can use IMDbPY.

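A minimal sketch of that second step, assuming the list of titles has
already been extracted from the page. The names lookup_titles and
main, and the sample title list, are hypothetical; IMDb, search_movie,
update and movieID are part of IMDbPY. Calling main() needs network
access and IMDbPY installed.

```python
def lookup_titles(titles, search):
    """Map each title to the first search result, or None if no match.

    `search` is any callable that takes a title and returns a list of
    results; with IMDbPY it would be the search_movie method.
    """
    found = {}
    for title in titles:
        results = search(title)
        found[title] = results[0] if results else None
    return found


def main():
    # Imported here so the helper above stays usable without IMDbPY.
    from imdb import IMDb

    ia = IMDb('http')  # the 'http' data access system
    titles = ['Star Trek (2009)']  # hypothetical list parsed from the page
    for title, movie in lookup_titles(titles, ia.search_movie).items():
        if movie is None:
            print('no match for', title)
            continue
        ia.update(movie)  # fetch the main information for this movie
        print(movie.movieID, movie['long imdb canonical title'])
```

Taking the first search result is only a heuristic, so it is worth
checking the returned title against the one you searched for.
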
Beware that the movieIDs used internally by IMDbPY to uniquely
identify a movie are not the same for the web and the SQL database:
Star Trek (2009) has movieID=0796366 on the web (and it's the same
if you access its data using the 'http' and 'mobile' data access
systems of IMDbPY), but with the 'sql' data access system it will
have a different integer ID.
That's because IMDb doesn't distribute a map from titles to movieIDs
in the plain text data files (obviously you can search for the
complete title and take the first result: 99.9% of the time it will
lead to the same movie).

> So I need to be aware just when the database is updated.

The IMDb's database? The web pages are _constantly_ updated, so you
can't tell for sure if something new was added. The plain text data
files are updated once a week (but not every file every week).

> Do you think that IMDbPY can be helpful for this issue? Maybe,
> I can provide a user interface to the administrator of my system
> so that he/she can update the movie data properly.

IMDbPY can be useful to get easy access to information about movies,
persons, characters and companies; maybe it's what you need, maybe
not. :-)

> http://imdbpy.sourceforge.net/docs/README.sqldb.txt also mentions
> about diffs files of IMDb. But I haven't been able to find a document
> about how to use the diffs files with IMDbPY

Right now they can't be used, except to patch your set of plain text
data files and then run imdbpy2sql.py again on the patched set.
IMDbPY still can't update a database using the patches; Timo Schulz
is working on the same problem and, trust me, it's a very difficult
task.

> I am planning to state my thanks to you and your team in the first
> page of my thesis for all the contribution you have made :)

Thank you very much, I'd be very proud of it.
:-)

HTH,
--
Davide Alberani <[email protected]> [GPG KeyID: 0x465BFD47]
http://erlug.linux.it/~da/

_______________________________________________
Imdbpy-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/imdbpy-help
