Probably not as generated by the EnwikiDocMaker, but the WikipediaTokenizer in Lucene can pull out richer syntax which could then be Teed/Sinked to other fields: things like categories, related links, etc. Mostly, though, I was just commenting on the fact that it isn't hard to at least use it for getting docs into Solr.
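[Not part of the original thread, but a minimal sketch of what the above describes: WikipediaTokenizer tags each token with a type (e.g. "c" for category, "il" for internal link), and those typed tokens are what a TeeSinkTokenFilter could route to other fields. This assumes a recent Lucene with the lucene-analyzers-wikipedia module on the classpath; the 2009-era constructor took a Reader directly instead of using setReader.]

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
import org.apache.lucene.analysis.wikipedia.WikipediaTokenizer;

public class WikiTokenDemo {
  /** Tokenize wiki markup and return the type of each token produced. */
  public static List<String> tokenTypes(String markup) throws Exception {
    List<String> types = new ArrayList<>();
    try (WikipediaTokenizer tok = new WikipediaTokenizer()) {
      tok.setReader(new StringReader(markup));
      CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
      TypeAttribute type = tok.addAttribute(TypeAttribute.class);
      tok.reset();
      while (tok.incrementToken()) {
        // Type is e.g. "c" for a category token, "il" for an internal
        // link -- these are the tokens that could be Teed/Sinked into
        // separate category/link fields.
        types.add(type.type());
      }
      tok.end();
    }
    return types;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(tokenTypes("[[Category:Search]] See [[Apache Lucene|Lucene]]."));
  }
}
```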
-Grant
On Jul 14, 2009, at 7:38 PM, Jason Rutherglen wrote:
You think enwiki has enough data for faceting?
On Tue, Jul 14, 2009 at 2:56 PM, Grant Ingersoll<gsing...@apache.org> wrote:
At a minimum, it is trivial to use the EnWikiDocMaker and then send the doc over SolrJ...
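[A sketch of that "send the doc over SolrJ" step, not code from the thread. It uses current SolrJ/Lucene class names (HttpSolrClient, IndexableField); the 2009-era equivalents were CommonsHttpSolrServer and Fieldable. The URL and core name are placeholders.]

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexableField;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class SolrJFeeder {
  private final SolrClient client;

  public SolrJFeeder(String url) {
    // e.g. "http://localhost:8983/solr/wiki" -- placeholder core name
    this.client = new HttpSolrClient.Builder(url).build();
  }

  /** Copy every field of a doc-maker-produced Lucene Document into Solr form. */
  public static SolrInputDocument toSolrDoc(Document doc) {
    SolrInputDocument sdoc = new SolrInputDocument();
    for (IndexableField f : doc.getFields()) {
      sdoc.addField(f.name(), f.stringValue());
    }
    return sdoc;
  }

  public void send(Document doc) throws Exception {
    client.add(toSolrDoc(doc));
  }

  public void close() throws Exception {
    client.commit();
    client.close();
  }
}
```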
On Jul 14, 2009, at 4:07 PM, Mark Miller wrote:
On Tue, Jul 14, 2009 at 3:36 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
Is there a standard index, like what Lucene uses for contrib/benchmark, for executing faceted queries over? Or maybe we can randomly generate one that works in conjunction with Wikipedia? That way we can execute real-world queries against faceted data. Or we could use the Lucene/Solr mailing lists and other data (a la Lucid's faceted site) as a standard index?
I don't think there is any standard set of docs for Solr testing - there is not a real benchmark contrib - though I know more than a few of us have hacked up pieces of Lucene benchmark to work with Solr - I think I've done it twice now ;)
Would be nice to get things going. I was thinking the other day: I wonder how hard it would be to make Lucene Benchmark generic enough to accept Solr impls and Solr algs? It does a lot that would suck to duplicate.
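[To make the idea above concrete: contrib/benchmark drives runs from .alg files, so a Solr-capable version might only need Solr-backed task implementations plugged into the same algorithm syntax. The SolrAddDoc/SolrCommit task names below are hypothetical - they do not exist in contrib/benchmark.]

```
# Hypothetical .alg sketch -- SolrAddDoc and SolrCommit are made-up task
# names standing in for Solr-backed impls of the existing benchmark tasks.
content.source=org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource
docs.file=enwiki-pages-articles.xml

{ "PopulateSolr"
  { SolrAddDoc } : 100000
  SolrCommit
}
```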
--
- Mark
http://www.lucidimagination.com
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search