Hello James,

First of all, thanks for your feedback. I will try to clarify some of your
questions below.

2013/2/20 Dyer, James <james.d...@ingramcontent.com>

> I only looked at your link super fast, but this seems like a very viable
> alternative to Solr's DIH.  DIH does the job fairly well but we've
> struggled to have developers who are willing to maintain it.  The problem,
> I think, is that DIH appeals to non-programmers who want to index their
> data without writing (much) code, hence the user base is unable to help out.
>

Indeed, one of the next steps we are considering is letting non-programmers
apply pre-made transformations easily through a web interface. In other
words, users wouldn't need to write code to do transformations like this:
http://www.zorba-xquery.com/html/demo
However, we think it's important that the option to write code stays open
without losing features. I prefer to see it as a framework that makes
programmers' lives much easier rather than as a product for
non-programmers; that's the idea.


> I saw that you can take data from NoSQL databases, from an RDBMS and
> emails.  Can it handle flat files, integrate with Tika to parse out
> different file formats, etc?  Can it join and denormalize data from a
> variety of sources and combine that into 1 flat-schema document?
>

That's exactly what we do. This is our main strength, as we can capture
data from any source. The idea is that you can write Java code to import
into our repository, or you can use one of our prebuilt importers. We
currently have importers for JSON, CSV and databases, although the database
one could be improved a bit. However, you are free to write any Java code
that imports from whatever you want.

The only possible drawback is that we need the repository. We import from
anything and put it in Cassandra, so users can capture data first and index
it later. The indexing process is separate from the capture process, so we
need somewhere to keep the data. Suppose you capture all the data you can
about users today, but only index part of it. Later, you could choose to
index the rest, because the data is already there.
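
To make the capture/index split concrete, here is a rough sketch of the
idea. Every name in it (CaptureThenIndex, Repository, store, index) is made
up for illustration and is not our actual API, and the repository is a
plain in-memory list standing in for Cassandra:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: these names do not come from the real project.
public class CaptureThenIndex {

    // Stand-in for the repository; an in-memory list, not Cassandra.
    static class Repository {
        private final List<Map<String, String>> records = new ArrayList<>();

        // Capture step: keep every field, whether or not it is indexed yet.
        void store(Map<String, String> record) {
            records.add(new LinkedHashMap<>(record));
        }

        List<Map<String, String>> all() {
            return records;
        }
    }

    // Indexing step: build flat documents holding only the chosen fields.
    static List<Map<String, String>> index(Repository repo, Set<String> fields) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (Map<String, String> record : repo.all()) {
            Map<String, String> doc = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : record.entrySet()) {
                if (fields.contains(e.getKey())) {
                    doc.put(e.getKey(), e.getValue());
                }
            }
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        Repository repo = new Repository();
        Map<String, String> user = new LinkedHashMap<>();
        user.put("name", "Ana");
        user.put("email", "ana@example.org");
        user.put("city", "Recife");
        repo.store(user);

        // Today: index only the name.
        System.out.println(index(repo, new HashSet<>(Arrays.asList("name"))));
        // Later: index more fields without capturing the data again.
        System.out.println(index(repo, new HashSet<>(Arrays.asList("name", "city"))));
    }
}
```

The point is only that index() reads from the repository, so choosing more
fields later does not require going back to the original source.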

However, if this is important, we could think about adding the option to
import data directly into Solr, without the repository in the middle.



> I could see your project as something we could use in Solr & Lucene, but
> with its appeal to more projects than just ours, possibly would not have
> such a problem attracting developers.


That's exactly the kind of feedback we want, thank you very much.

Best regards,
Marcelo
