On Mar 22, Michael Liu <[email protected]> wrote:

> In the title table generated by imdbpy2sql, what are the meanings of
> the column titled imdb_index and phonetic_code?

They are used internally, so they're not documented.

phonetic_code (and the various *_pcode columns in other tables) is used
when you search for a given title; its value is calculated at insert
time, and is a SOUNDEX phonetic code (i.e. a representation of how a
given word/phrase sounds).
So, when handling a search, we can select the subset of titles in the
database that "sound similar" to the one we're searching for; this
subset is then ordered using a Ratcliff-Obershelp similarity metric.

You can find the layout of the database in the imdb.parser.sql.dbschema
module (abstracted: we're pretty naive and support both SQLObject and
SQLAlchemy... ;-)

An old message about Soundex/Ratcliff-Obershelp:
  http://sourceforge.net/mailarchive/message.php?msg_name=20060407152643.GB4376%40libero.it

imdb_index is what a long time ago I decided to call the "imdbIndex"
(probably not a very good name...): it's used when two movies, produced
in the same year, share the same title. It's the one you may see on the
imdb.com page after a title, inside the parentheses containing the
production year, separated by a slash. Example:
  10 Bullets (2007/I)
  10 Bullets (2007/II)

It's also used to disambiguate persons' names.

Now... a question: do you really need to understand the internals
of IMDbPY? It's perfectly legit, but IMDbPY is not only a tool to put
the plain text data files into a SQL database: it's perfectly able to
extract the information from the database, too. :-)
Are you sure that you need to directly access the database, without
using IMDbPY?

Bye!
-- 
Davide Alberani <[email protected]> [GPG KeyID: 0x465BFD47]
http://www.mimante.net/
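
To make the two-step title search described above a bit more concrete,
here is a rough sketch in Python. It is only an illustration and not
IMDbPY's actual code: soundex() below is the classic 4-character
American Soundex (IMDbPY's own variant may differ), the standard
library's difflib.SequenceMatcher stands in for the Ratcliff-Obershelp
metric (its algorithm is closely related, not identical), and the
search() helper and the sample titles are hypothetical. In the real
database the phonetic code is stored in the phonetic_code column at
insert time rather than recomputed on every search.

```python
from difflib import SequenceMatcher

# Classic American Soundex digit groups (assumption: IMDbPY's own
# variant may use a different code length or different rules).
_CODES = {}
for _digit, _letters in enumerate(
        ("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for _letter in _letters:
        _CODES[_letter] = str(_digit)


def soundex(text, length=4):
    """Return a classic Soundex code, or None if text has no A-Z letters."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    if not letters:
        return None
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "HW":  # H and W do not separate equal codes
            prev = digit
    return (code + "0" * length)[:length]


def search(query, titles, limit=5):
    """Hypothetical helper: phonetic pre-filter, then similarity ranking."""
    wanted = soundex(query)
    # Step 1: keep only titles whose phonetic code matches the query's
    # (in the database this would be a lookup on phonetic_code).
    candidates = [t for t in titles if soundex(t) == wanted]

    # Step 2: order the candidates by string similarity, best first.
    def score(title):
        return SequenceMatcher(None, query.lower(), title.lower()).ratio()

    return sorted(candidates, key=score, reverse=True)[:limit]


titles = ["10 Bullets", "Ten Bullets", "Bullitt", "Bullets",
          "Ballet Shoes", "Casablanca"]
print(search("Bullets", titles))
```

The point of the first step is that comparing short codes is cheap, so
the more expensive similarity comparison only runs on the few titles
that survive the filter.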
_______________________________________________
Imdbpy-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/imdbpy-help
