On 21 Apr 2008, at 14:40, robl wrote:

SELECT * FROM pages WHERE page_title LIKE "Queen%Elizabeth"

This would perform a case-insensitive match on Queen(anything)Elizabeth
(at least in MySQL).


...

Is there a quick way to do what I want? Are there any indexes I could
apply to improve things (I have already created the indexes specified at http://www.openlinksw.com/dataspace/kide...@openlinksw.com/weblog/ kide...@openlinksw.com's%20BLOG%20%5B127%5D/1298)
?

Or do I need to create a conventional SQL table of resource names and
then do a SQL LIKE query on those?


You might also want to check out freebase. Here's the approach I'm about to attempt, myself. Start with a reconciliation query:

http://sandbox.freebase.com/dataserver/reconciliation/?name=Queen+Elizabeth&types=%2Fpeople%2Fperson&responseType=html
  - the reconciliation service handles misspellings and other variations
  - s/html/json/ for the machine-readable version
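
For what it's worth, here is a minimal Python sketch of building that reconciliation URL. It assumes only the three query parameters visible in the URL above (name, types, responseType). The function name is made up for illustration, and nothing here has been checked against the service itself.

```python
from urllib.parse import urlencode

# Endpoint copied from the example URL above.
RECONCILIATION_ENDPOINT = "http://sandbox.freebase.com/dataserver/reconciliation/"


def reconciliation_url(name, type_id, response_type="json"):
    """Build a reconciliation query URL (hypothetical helper).

    urlencode() turns spaces into '+' and '/' into '%2F', which matches
    the encoding used in the hand-written URL above.
    """
    query = urlencode(
        {"name": name, "types": type_id, "responseType": response_type}
    )
    return RECONCILIATION_ENDPOINT + "?" + query


if __name__ == "__main__":
    print(reconciliation_url("Queen Elizabeth", "/people/person"))
```

Passing response_type="html" should give back the browser-friendly URL shown above.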

Then look at the freebase page or perform a query:

http://www.freebase.com/view/en/elizabeth_ii_of_the_united_kingdom

That page has this link:

http://en.wikipedia.org/wiki/index.html?curid=12153654

On that page, we have

<a href="http://en.wikipedia.org/wiki/Elizabeth_II_of_the_United_Kingdom">article</a>

Maybe freebase can just hand us that link instead of the curid one. I haven't gotten to that part of my code yet. I don't know how often the last path segment of the freebase URI is in sync with the Wikipedia article name, but deriving the name that way seems like it would be the least reliable option. Following freebase's designated Wikipedia link is probably more robust.
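
If the link does have to be pulled out of the curid page, something along these lines might do. It is only a sketch using Python's standard html.parser, and it assumes the anchor's text is exactly "article" as in the snippet above; the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class ArticleLinkFinder(HTMLParser):
    """Remember the href of the first <a> whose text is 'article'."""

    def __init__(self):
        super().__init__()
        self._pending_href = None   # href of the <a> we are currently inside
        self.article_href = None    # first matching href, once found

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._pending_href = dict(attrs).get("href")

    def handle_data(self, data):
        if (
            self._pending_href
            and self.article_href is None
            and data.strip() == "article"
        ):
            self.article_href = self._pending_href

    def handle_endtag(self, tag):
        if tag == "a":
            self._pending_href = None


def find_article_link(html):
    """Return the 'article' link from an HTML string, or None if absent."""
    finder = ArticleLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.article_href
```

Fetching the page is left out on purpose; find_article_link() just takes whatever HTML string you already have.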

Finally, take the wiki name, and make a dbpedia URI:

http://dbpedia.org/page/Elizabeth_II_of_the_United_Kingdom
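
That last step is simple enough to sketch in Python. This assumes the Wikipedia link has the /wiki/<Article_Name> form shown above and that the dbpedia page URI is just the http://dbpedia.org/page/ prefix plus that name; the function name is made up for illustration.

```python
from urllib.parse import urlparse

DBPEDIA_PAGE_PREFIX = "http://dbpedia.org/page/"
WIKI_PATH_PREFIX = "/wiki/"


def dbpedia_uri_from_wikipedia(wikipedia_url):
    """Map a /wiki/<Article_Name> URL to a dbpedia page URI (hypothetical helper).

    URLs with a query string (such as the ?curid=... form) are rejected,
    because they do not carry the article name in the path.
    """
    parsed = urlparse(wikipedia_url)
    if parsed.query:
        raise ValueError("URL has a query string, no article name in path")
    if not parsed.path.startswith(WIKI_PATH_PREFIX):
        raise ValueError("not a /wiki/<Article_Name> URL")
    article_name = parsed.path[len(WIKI_PATH_PREFIX):]
    if not article_name:
        raise ValueError("empty article name")
    return DBPEDIA_PAGE_PREFIX + article_name


if __name__ == "__main__":
    print(dbpedia_uri_from_wikipedia(
        "http://en.wikipedia.org/wiki/Elizabeth_II_of_the_United_Kingdom"
    ))
```

Rejecting the curid form outright is a deliberate choice here: that link has to be resolved to the named article link first.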



You probably noticed that elizabeth_ii_of_the_united_kingdom wasn't the first result for 'Queen Elizabeth' of type /people/person. I'm not sure if freebase considers that a bad result page or not. The reconciliation service is new, so now's probably a great time to tell them how important good results are to you :)
