[ANN] SIREn, a Lucene/Solr plugin for rich JSON data search

Renaud Delbru Wed, 23 Jul 2014 04:14:35 -0700

One of the coolest features of Lucene/Solr is its ability to index nested documents using a Blockjoin approach.

While this works well for small documents and document collections, it becomes unsustainable for larger ones: Blockjoin works by splitting the original document into many documents, one per nested record.

For example, a single USPTO patent (XML format converted to JSON) will end up being over 1500 documents in the index. This has massive implications for performance and scalability.
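
To make the inflation concrete, here is a minimal sketch that counts how many index documents a one-document-per-nested-record layout would need for a given JSON record. It is only an illustration of the counting argument above, not Lucene's actual Blockjoin code; the function name count_block_documents and the sample record are hypothetical.

```python
import json


def count_block_documents(record):
    """Count index documents under a one-document-per-nested-record layout.

    Assumption for illustration: every JSON object becomes one index
    document, arrays only group records, and scalar values are plain fields.
    """
    if isinstance(record, dict):
        # One document for this record, plus documents for nested records.
        return 1 + sum(count_block_documents(v) for v in record.values())
    if isinstance(record, list):
        # An array contributes only the records it contains.
        return sum(count_block_documents(v) for v in record)
    return 0  # scalar field, no extra document


# Hypothetical, heavily simplified patent-like record.
patent = json.loads("""
{
  "title": "Example patent",
  "inventors": [{"name": "A"}, {"name": "B"}],
  "claims": [{"id": 1, "refs": [{"id": 7}]}, {"id": 2}]
}
""")

print(count_block_documents(patent))
```

Even this tiny record needs several index documents; a real patent with deeply nested claims, citations and classifications grows accordingly.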


Introducing SIREn

SIREn is an open source plugin for Solr for indexing and searching rich nested JSON data.

SIREn uses a sophisticated "tree indexing" design which ensures that the index is not artificially inflated. As a result, many types of nested queries can be up to 3x faster. Further, depending on the data, memory requirements for faceting can be up to 10x lower. As such, SIREn allows you to use Solr for larger and more complex datasets, especially for sophisticated analytics. (You can read our whitepaper to find out more [1].)

SIREn is also truly schemaless - it even allows you to change the type of a property between documents without being restricted by a defined mapping. This can be very useful for data integration scenarios where data is described in different ways in different sources.
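
As an illustration of the data integration case, the sketch below takes three hypothetical records from different sources and reports the properties whose JSON type changes between documents. These are the properties a fixed mapping would have to reject or coerce. The code does not use any SIREn or Solr API; the helper field_types and the sample data are assumptions made for the example.

```python
from collections import defaultdict


def field_types(documents):
    """Map each top-level property to the set of Python type names seen."""
    types = defaultdict(set)
    for doc in documents:
        for key, value in doc.items():
            types[key].add(type(value).__name__)
    return dict(types)


# Hypothetical records describing the same property in three different ways.
docs = [
    {"id": "a1", "year": 2014},
    {"id": "b2", "year": "2014"},
    {"id": "c3", "year": {"filed": 2013, "granted": 2014}},
]

# Properties whose type differs between documents.
conflicts = {k: sorted(v) for k, v in field_types(docs).items() if len(v) > 1}
print(conflicts)
```

Here "year" appears as a number, a string and a nested object, while "id" keeps a single type throughout.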

You only need a few minutes to download and try SIREn [2]. It comes with a detailed manual [3] and you have access to the code on GitHub [4].


We look forward to hearing your feedback.

[1] http://siren.solutions/siren/resources/whitepapers/comparing-siren-1-2-and-lucenes-blockjoin-performance-a-uspto-patent-search-scenario/

[2] http://siren.solutions/siren/downloads/
[3] http://siren.solutions/manual/preface.html
[4] https://github.com/sindicetech/siren
--
Renaud Delbru
CTO
SIREn Solutions
