Re: wikipedia and teaching kids search engines

2010-03-25 Mark Miller
On 03/24/2010 10:40 AM, Erik Hatcher wrote: I've got a couple of questions for the community... * What's the simplest way to get Solr up and running with a relatively richly schema'd index of a Wikipedia dump? What I'm looking for is something along these lines: java

Re: wikipedia and teaching kids search engines

2010-03-25 Jon Baer
Just throwing this out there ... I recently saw something I found pretty interesting from CMU ... http://csunplugged.org/activities The search algorithm exercise was focused on a Battleship lookup I think. - Jon On Mar 24, 2010, at 10:40 AM, Erik Hatcher wrote: > I've got a couple of quest
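The message only recalls "a Battleship lookup", so the following is a minimal sketch under an assumption: that the exercise contrasts uncovering ships one at a time (linear search) with halving a sorted row of ships on every guess (binary search). The class and method names are hypothetical; the point is the guess count, which is what the kids would be tallying.

```java
/** Hypothetical classroom sketch: how many guesses does it take to find a ship's number? */
public class BattleshipSearch {

    /** Linear search: uncover ships left to right. Returns guesses used, or -1 if absent. */
    public static int linearGuesses(int[] ships, int target) {
        for (int i = 0; i < ships.length; i++) {
            if (ships[i] == target) {
                return i + 1; // i + 1 ships have been uncovered so far
            }
        }
        return -1;
    }

    /** Binary search: ships must be sorted; each guess discards half of what is left. */
    public static int binaryGuesses(int[] sortedShips, int target) {
        int lo = 0;
        int hi = sortedShips.length - 1;
        int guesses = 0;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2; // avoids int overflow on huge arrays
            guesses++;
            if (sortedShips[mid] == target) {
                return guesses;
            } else if (sortedShips[mid] < target) {
                lo = mid + 1; // target must be to the right
            } else {
                hi = mid - 1; // target must be to the left
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] ships = new int[26]; // one ship per letter, A..Z
        for (int i = 0; i < ships.length; i++) {
            ships[i] = 100 + i * 37; // increasing, so the row is sorted
        }
        int target = ships[ships.length - 1]; // worst case for the linear game
        System.out.println("linear guesses: " + linearGuesses(ships, target));
        System.out.println("binary guesses: " + binaryGuesses(ships, target));
    }
}
```

With 26 ships the linear game can take all 26 guesses, while the sorted game never needs more than 5, which is the same intuition behind why an index beats scanning every document.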

Re: wikipedia and teaching kids search engines

2010-03-24 Grant Ingersoll
On Mar 24, 2010, at 1:53 PM, Andrzej Bialecki wrote: > On 2010-03-24 16:15, Markus Jelsma wrote: >> A bit off-topic, but how about Nutch grabbing some content and having it indexed >> in Solr? > > The problem is not with collecting and submitting the documents, the problem > is with parsing the Wik

Re: wikipedia and teaching kids search engines

2010-03-24 Chris Hostetter
: My goal is to index Wikipedia in order to demonstrate search to a class of : middle school kids that I've volunteered to teach for a couple of hours. : Which brings me to my next question... Twitter data is a little easier to ingest than the Wikipedia markup (the JSON-based streaming AP

Re: wikipedia and teaching kids search engines

2010-03-24 Andrzej Bialecki
On 2010-03-24 16:15, Markus Jelsma wrote: A bit off-topic, but how about Nutch grabbing some content and having it indexed in Solr? The problem is not with collecting and submitting the documents; the problem is with parsing the MediaWiki markup embedded in XML. WikipediaTokenizer from Lucene con
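To show why the markup is the hard part, here is a deliberately crude sketch that handles only four constructs with regular expressions: templates (`{{...}}`, innermost first), links (`[[target|label]]` and `[[target]]`), section headings (`== ... ==`) and bold/italic quotes. This is a swapped-in technique, not Lucene's WikipediaTokenizer and not a real MediaWiki parser; tables, references, nested links inside file captions and HTML tags all pass through untouched. The class name is hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical, intentionally incomplete MediaWiki-markup stripper for demo indexing. */
public class WikiMarkupStripper {

    // Innermost template only: no braces allowed inside, so nesting is peeled by looping.
    private static final Pattern TEMPLATE = Pattern.compile("\\{\\{[^{}]*\\}\\}");
    // [[Target|label]] keeps "label"; [[Target]] keeps "Target".
    private static final Pattern LINK =
        Pattern.compile("\\[\\[(?:[^\\[\\]|]*\\|)?([^\\[\\]]*)\\]\\]");
    // "== Heading ==" on its own line keeps just the heading text.
    private static final Pattern HEADING =
        Pattern.compile("(?m)^=+[ \\t]*(.*?)[ \\t]*=+[ \\t]*$");
    // '' (italic), ''' (bold), ''''' (both): single apostrophes such as "don't" survive.
    private static final Pattern EMPHASIS = Pattern.compile("'{2,5}");

    public static String strip(String markup) {
        String text = markup;
        String previous;
        do { // remove templates from the inside out until nothing changes
            previous = text;
            text = TEMPLATE.matcher(text).replaceAll("");
        } while (!text.equals(previous));
        text = LINK.matcher(text).replaceAll("$1");
        text = HEADING.matcher(text).replaceAll("$1");
        text = EMPHASIS.matcher(text).replaceAll("");
        return text.trim();
    }

    public static void main(String[] args) {
        String page = "== Overview ==\n"
            + "'''Apache Solr''' is built on [[Apache Lucene|Lucene]].{{Infobox software}}";
        System.out.println(strip(page));
    }
}
```

Every rule here is a guess about how authors write; real articles break such rules constantly, which is exactly the difficulty described in this thread.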

Re: wikipedia and teaching kids search engines

2010-03-24 Walter Underwood
This is brilliant. I love it! Is a computer game a document? How about each level, each room, each player? If you want some fancy linguistics besides stemming, try compounding or what I call "one word or two?" English loves to glom words together. schoolroom or school room? babysitter, baby-sit
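Walter's "one word or two?" idea can be demonstrated with a tiny word list. The sketch below is an assumption about one simple way to do it, not a description of any Solr or Lucene filter: a term containing a space or hyphen gets a joined variant, and a single token gets a two-word split if both halves appear in the word list. The class name, method names and the word list are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper producing "one word or two?" spelling variants of a term. */
public class CompoundVariants {

    /** Splits a token into two dictionary words ("schoolroom" -> "school room"), or returns null. */
    public static String split(String token, Set<String> dictionary) {
        for (int i = 1; i < token.length(); i++) {
            String left = token.substring(0, i);
            String right = token.substring(i);
            if (dictionary.contains(left) && dictionary.contains(right)) {
                return left + " " + right; // first (shortest left half) split wins
            }
        }
        return null;
    }

    /** Returns the term itself plus either its joined form or its two-word split. */
    public static List<String> variants(String term, Set<String> dictionary) {
        List<String> out = new ArrayList<>();
        out.add(term);
        String joined = term.replace(" ", "").replace("-", "");
        if (!joined.equals(term)) {
            out.add(joined); // "school room", "baby-sitter" -> one word
        } else {
            String twoWords = split(term, dictionary);
            if (twoWords != null) {
                out.add(twoWords); // "schoolroom" -> two words
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("school", "room", "baby", "sitter");
        System.out.println(variants("schoolroom", dict));
        System.out.println(variants("school room", dict));
        System.out.println(variants("baby-sitter", dict));
    }
}
```

Indexing (or querying) all variants lets "school room" match a page that only says "schoolroom". A real decompounder would need a far larger dictionary and a smarter choice among competing splits.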

Re: wikipedia and teaching kids search engines

2010-03-24 Erick Erickson
Erik: In a former incarnation, I thought I was going to teach 6th graders, until I found out I can't deal with 25 kids for 6 hours at a stretch for years on end. My thoughts, presented in a "feel free to ignore, but this is what I'd do" spirit. There are some random thoughts below, but here's w

Re: wikipedia and teaching kids search engines

2010-03-24 Markus Jelsma
A bit off-topic, but how about Nutch grabbing some content and having it indexed in Solr? On Wednesday 24 March 2010 16:08:43 Christopher Laux wrote: > Hi Erik, > > I'm working on Wikipedia search and use Solr. AFAIK it can't easily be > done. The Wikipedia XML dump only provides the page title and

Re: wikipedia and teaching kids search engines

2010-03-24 Christopher Laux
Hi Erik, I'm working on Wikipedia search and use Solr. AFAIK it can't easily be done. The Wikipedia XML dump only provides the page title and author in terms of data one would search for. The rest requires parsing the MediaWiki markup, for which there is no good freely available parser (still writing
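Pulling out the fields the dump does give you is the easy half. A minimal sketch, assuming the usual dump layout of `<page>` elements containing `<title>`, and a `<revision>` with `<contributor><username>` and `<text>`; it uses the JDK's built-in StAX pull parser (`javax.xml.stream`), so memory stays flat even on a very large dump if you pass a `FileInputStream` instead of a `String`. The class names are hypothetical, and the `<text>` content is left as raw MediaWiki markup, which is the hard half described above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical reader for the searchable fields of a Wikipedia XML dump. */
public class DumpPageReader {

    /** One page: title, contributor username (null for anonymous IP edits), raw markup. */
    public static final class Page {
        public final String title;
        public final String author;
        public final String markup;

        Page(String title, String author, String markup) {
            this.title = title;
            this.author = author;
            this.markup = markup;
        }
    }

    public static List<Page> readPages(String xml) {
        List<Page> pages = new ArrayList<>();
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            String title = null, author = null, markup = null;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // getLocalName ignores the dump's default XML namespace
                    switch (reader.getLocalName()) {
                        case "page":     title = null; author = null; markup = null; break;
                        case "title":    title = reader.getElementText(); break;
                        case "username": author = reader.getElementText(); break;
                        case "text":     markup = reader.getElementText(); break;
                        default:         break;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "page".equals(reader.getLocalName())) {
                    pages.add(new Page(title, author, markup)); // one document per page
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed dump XML", e);
        }
        return pages;
    }

    public static void main(String[] args) {
        String xml = "<mediawiki><page><title>Solr</title><revision>"
            + "<contributor><username>Alice</username></contributor>"
            + "<text>'''Solr''' is a [[search engine]].</text></revision></page></mediawiki>";
        for (Page p : readPages(xml)) {
            System.out.println(p.title + " by " + p.author + ": " + p.markup);
        }
    }
}
```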

Re: wikipedia and teaching kids search engines

2010-03-24 Mattmann, Chris A (388J)
Hey Erik, One thing to think about (and I'm no expert on middle school kids) would be to relate search somehow to a topic they are interested in. My 12-year-old nephew loves the NBA, so if I were to talk to him about search, I would try to relate it to, e.g., NBA.com, or understanding the differen

wikipedia and teaching kids search engines

2010-03-24 Erik Hatcher
I've got a couple of questions for the community... * What's the simplest way to get Solr up and running with a relatively richly schema'd index of a Wikipedia dump? What I'm looking for is something along these lines: java -Dsolr.solr.home=./wikipedia_solr_home -
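Whatever the one-command setup ends up looking like, each parsed page eventually has to reach Solr as an update message. A minimal sketch, assuming Solr's XML update format (`<add><doc><field name="...">...</field></doc></add>`) and hypothetical field names `id`, `title` and `text` that would have to match whatever schema.xml lives in wikipedia_solr_home. It only builds the message, with XML escaping; the resulting string would then be POSTed to the update handler. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical builder for a single-document Solr XML <add> message. */
public class SolrAddMessage {

    /** Escapes the characters that would otherwise break element or attribute content. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Emits fields in map iteration order; use a LinkedHashMap for predictable output. */
    public static String addDoc(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            xml.append("<field name=\"").append(escape(f.getKey())).append("\">")
               .append(escape(f.getValue())).append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "Apache_Solr");            // hypothetical unique key field
        doc.put("title", "Apache Solr");
        doc.put("text", "Solr is a search server built on Lucene & friends.");
        System.out.println(addDoc(doc));
    }
}
```

Building the message by hand keeps the classroom demo dependency-free; a real loader would more likely batch many `<doc>` elements per request or use a client library.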