Hi Mike, Bit late on this, but just saw it...
Using streaming to ingest has occurred to me too, but I think it's not really right for that except in fairly trivial cases. The very first big problem you will have in the example you give is that you won't be able to mark things as already ingested, so you have to read the whole result set on every run. One could eventually add enough features to fix that, but that would probably bloat streaming with features and shift its focus from processing data originating in Solr to processing data from external sources. At that point I think it's better for ingestion to be a separate system, set up in a way that can be managed. Any non-trivial ingestion process using streaming is going to be configured as a large, deeply nested streaming expression, which I fear would be very hard to read and maintain.

I did a talk a while back that went through a wishlist for document ingestion... slides here:
https://docs.google.com/presentation/d/17NhL-nfYa-d2Vx_DleXo_JC1SwiBMlfP5Zm4IEiZOYY/pub?start=false&loop=false&delayms=5000

I do presently have a case where I use streaming to create summary records for some data once it's in Solr.

-Gus

On Fri, Sep 16, 2016 at 11:52 AM, Joel Bernstein <joels...@gmail.com> wrote:

> Unfortunately there currently isn't a way to split a field. But this would
> be nice functionality to add.
>
> The approach would be to add a split operation that would be used by the
> select() function. It would look like this:
>
> select(jdbc(...), split(fieldA, delim=","), ...)
>
> This would make a good jira issue.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Sep 16, 2016 at 11:03 AM, Mike Thomsen <mikerthom...@gmail.com>
> wrote:
>
> > Read this article and thought it could be interesting as a way to do
> > ingestion:
> >
> > https://dzone.com/articles/solr-streaming-expressions-for-collection-auto-upd-1
> >
> > Example from the article:
> >
> > daemon(id="12345",
> >   runInterval="60000",
> >   update(users,
> >     batchSize=10,
> >     jdbc(connection="jdbc:mysql://localhost/users?user=root&password=solr",
> >       sql="SELECT id, name FROM users", sort="id asc",
> >       driver="com.mysql.jdbc.Driver")
> >   )
> > )
> >
> > What's the best way to handle a multivalue field using this API? Is
> > there a way to tokenize something returned in a database field?
> >
> > Thanks,
> >
> > Mike

--
http://www.the111shift.com
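
For the multivalue question in the quoted thread, one workaround that stays outside streaming expressions is to split the delimited column on the client before the documents are sent to Solr. The following is only a minimal sketch under assumptions: the `tags` column, the helper name `to_solr_doc`, and the sample rows are all hypothetical and do not come from the article or from any Solr API. It builds plain dictionaries and leaves the actual indexing call to whatever client is in use.

```python
def to_solr_doc(row, multi_fields=("tags",), delim=","):
    """Return a copy of a database row with delimited columns split into lists.

    `multi_fields` names the (hypothetical) columns that should become
    multivalued fields; every other column is passed through unchanged.
    """
    doc = dict(row)
    for field in multi_fields:
        raw = doc.get(field)
        if isinstance(raw, str):
            # Drop empty tokens so "a,,b" or "" do not produce blank values.
            doc[field] = [part.strip() for part in raw.split(delim) if part.strip()]
    return doc


# Hypothetical rows, shaped like the result of "SELECT id, name, tags FROM users".
rows = [
    {"id": 1, "name": "alice", "tags": "admin, staff"},
    {"id": 2, "name": "bob", "tags": ""},
]
docs = [to_solr_doc(r) for r in rows]
```

Each resulting dictionary can then be posted as a document whose `tags` field is declared multivalued in the schema. Doing the split on the client also gives a natural place to keep track of which rows have already been sent, which is the part the daemon example cannot do.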