Hey Jay! All I can say is "good luck with that". I do know Morphlines uses EmbeddedSolrServer to do its work. So I don't really see a good way to pluck just what you'd need for schemaless.
The MapReduceIndexerTool is carried right along with Solr, though. IIRC the Morphlines stuff is mostly the ETL process. Have you tried just running an MRIT job with a current Solr? I have no idea whether it'd work, but it seems like it "should"...

Erick

On Fri, Mar 17, 2017 at 5:51 PM, Jay Hill <jayallenh...@gmail.com> wrote:
> I've got a very difficult project to tackle. I've been tasked with using
> schemaless mode to index json files that we receive. The structure of the
> json files will always be very different as we're receiving files from
> different customers totally unrelated to one another. We are attempting to
> build a "one size fits all" approach to receiving documents from a wide
> variety of sources and then index them into Solr.
>
> We're running in Solr 5.3. The schemaless approach works well enough -
> until it doesn't. It seems to fail on type guessing and also gets confused
> indexing to different shards. If it was reliable it would be the perfect
> solution for our task. But the larger the JSON file the more likely it is
> to fail. At a certain size it just doesn't work.
>
> I've been advised by some experts and committers that schemaless is a good
> tool for prototyping, but risky to run in production, but we thought we
> would try it by doing offline indexing using the Cloudera
> MapReduceIndexerTool to build offline indexes - but still using managed
> schemas. This map reduce tool uses morphlines, which is a nifty ETL tool
> that pipes together a series of commands to transform data. For example a
> JSON or CSV file can be processed and loaded into a Solr index with a
> "readJSON" command piped to a "loadSolr" command, for a simple example.
>
> But the kite-sdk that manages the morphlines only seems to offer, as their
> latest version, solr *4.10.3*-cdh5.10.0 (their customized version of
> 4.10.3).
>
> So I can't see any way to integrate schemaless (which has dependencies
> after 4.10.3) with the morphlines.
>
> But I thought I would ask here: Anybody had ANY experience with morphlines
> to index to Solr? Any info would help me make sense of this.
>
> Cheers to all!
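
The type-guessing failures described above usually show up when the same field carries different kinds of values in different documents. One way to spot that before anything reaches Solr is to scan the incoming JSON for fields whose value type changes from one document to the next. What follows is only a minimal Python sketch of that idea: the function names, the coarse type labels, and the sample documents are all hypothetical, it only looks at top-level fields, and it does not reproduce Solr's actual field-guessing logic.

```python
import json
from collections import defaultdict


def json_type(value):
    """Return a coarse, made-up type label for a parsed JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    return "object" if isinstance(value, dict) else "array"


def find_type_conflicts(docs):
    """Return {field: set of type labels} for top-level fields seen with more than one type."""
    seen = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            if value is not None:  # nulls carry no type information
                seen[field].add(json_type(value))
    return {field: types for field, types in seen.items() if len(types) > 1}


# Hypothetical sample input: "price" arrives as an integer, a string and a float.
docs = [json.loads(s) for s in (
    '{"id": "a1", "price": 10}',
    '{"id": "b2", "price": "ten dollars"}',
    '{"id": "c3", "price": 12.5}',
)]

conflicts = find_type_conflicts(docs)
for field, types in sorted(conflicts.items()):
    print(field, sorted(types))
```

Fields reported here are the ones most likely to need an explicit field definition in the schema rather than being left to guessing.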