On Thu, Nov 7, 2013 at 11:37 AM, Jim Hu <[email protected]> wrote:

> Hi Nik,
>
> As I was reading the docs for MWSearch, I considered whether I should
> switch to CirrusSearch, so it may not be a difficult sell.  I'd even
> volunteer to try to update the documentation if you're willing to help walk
> me through it.
>
> But to show how clueless I am... I'm not sure how to check the other end,
> since I'm not clear on what it's trying to do. Here's my undoubtedly deeply
> flawed understanding of what happens (this reflects that I'm a biologist by
> training and badly self-taught on wikis and linux/unix/osx).
>
> I'm assuming that the problem is in this first step of the update script:
>
> java -cp LuceneSearch.jar org.wikimedia.lsearch.oai.IncrementalUpdater -l $@ \
>
> It's listing a bunch of update items (the ... in my first post).  I am
> guessing that it pulls info on revisions from the mysql database and
> converts them to some format that gets sent to the indexer, which I assume
> is part of apache Lucene.  From the error, it's failing to pass that
> through some socket to the indexer.  But I don't know how to see a log for
> activity on that socket.
>

You have the right idea, but by "the other side" I mean a log on the
indexer.  It is a separate Java process, probably running on the Hexamer
host that I saw in the indexer logs.  That process should have written
something to its own logs.  Hopefully.
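
If the indexer's logs turn out to be empty, one thing worth ruling out is
that nothing is listening on the indexer's port at all.  Here is a minimal
sketch of such a check in plain Java.  The class name PortCheck is my own,
and the host and port are placeholders that you would take from your own
LuceneSearch configuration; I am not assuming any particular default.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port are supplied by the caller; none are assumed here.
        if (args.length != 2) {
            System.err.println("usage: java PortCheck <host> <port>");
            return;
        }
        int port = Integer.parseInt(args[1]);
        boolean ok = canConnect(args[0], port, 2000);
        System.out.println(args[0] + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

A successful connection only shows that some process accepts connections
there, not that it is a healthy indexer, so the indexer's own log is still
the better source of information.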


> My similarly uninformed reading about CirrusSearch is that it uses
> Elasticsearch, which in turn uses Lucene.  So if the problem is between the
> IncrementalUpdater and Lucene, I might have similar issues with
> CirrusSearch.  But if CirrusSearch gives more informative errors, that
> would help!!  And maybe I should switch anyway, as it sounds like support
> for MWSearch will go away at some point.
>

Lucene is a library that can be embedded in Java applications to provide
full text searching capabilities (plus geospatial search and a few other
things).  LuceneSearch is a MediaWiki-specific application that exposes
Lucene's full text search capabilities in a way that the MWSearch
extension understands.
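
To make "full text searching" a bit more concrete, here is a toy inverted
index in plain Java.  It is not Lucene's API or its implementation, and the
class name ToyIndex is made up for illustration.  It only shows the core
idea of mapping each term to the documents that contain it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ToyIndex {
    // term -> ids of the documents that contain that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Split the text into lowercase terms and record docId under each one.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Look up a single term, ignoring case; unknown terms give an empty set.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(1, "MWSearch talks to LuceneSearch");
        index.add(2, "CirrusSearch talks to Elasticsearch");
        System.out.println(index.search("talks"));         // [1, 2]
        System.out.println(index.search("elasticsearch")); // [2]
    }
}
```

A real search library adds a great deal on top of this lookup, such as
analysis of the text and ranking of the results.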

Elasticsearch serves the same purpose for CirrusSearch as LuceneSearch
serves for MWSearch.  We like Elasticsearch because it is general purpose
and sees a ton more development than LuceneSearch.

As far as support goes, we haven't done much with LuceneSearch/MWSearch in
a while.  I work on CirrusSearch every day, as does Chad, who seems to have
replied while I was writing this email.  Elasticsearch itself has had 44
people submit code to it in the past month.  It's a healthier ecosystem,
but it might be a pain to switch.  CirrusSearch requires a very recent
version of MediaWiki, for example.

Nik
_______________________________________________
MediaWiki-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
