> On Jun 2, 2020, at 7:40 AM, Charlie Hull <char...@flax.co.uk> wrote:
>
> If it was me I'd probably build a standalone indexer script in Python that
> did the file handling, called out to a separate Tika service for extraction,
> posted to Solr.

I would do the same thing, and I would base that script on Scrapy (https://scrapy.org/). I worked on a Python-based web spider for about ten years.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
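
For anyone who wants a starting point, the indexer described in the quoted message (walk the files, send each one to a separate Tika service, post the result to Solr) might look roughly like the sketch below. It uses only the Python standard library rather than Scrapy, to keep it short. The host names, ports, the "docs" collection name, and the Solr field names (filename_s, content_txt) are all assumptions; adjust them to your own Tika Server and Solr schema.

```python
import json
import urllib.request
from pathlib import Path

# Assumed endpoints: a Tika Server on its default port and a Solr
# collection named "docs" (hypothetical). Change both for your setup.
TIKA_URL = "http://localhost:9998/tika"
SOLR_UPDATE_URL = "http://localhost:8983/solr/docs/update?commit=true"


def extract_text(path: Path, tika_url: str = TIKA_URL) -> str:
    """PUT the raw file bytes to Tika Server and return the extracted plain text."""
    req = urllib.request.Request(
        tika_url,
        data=path.read_bytes(),
        method="PUT",
        headers={"Accept": "text/plain"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")


def build_doc(path: Path, text: str) -> dict:
    """Map one file to a Solr document. Field names are assumptions about the schema."""
    return {
        "id": path.as_posix(),
        "filename_s": path.name,
        "content_txt": text.strip(),
    }


def post_to_solr(docs: list, solr_url: str = SOLR_UPDATE_URL) -> None:
    """POST a batch of documents to Solr's JSON update handler."""
    req = urllib.request.Request(
        solr_url,
        data=json.dumps(docs).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()


def index_tree(root: Path, batch_size: int = 50) -> None:
    """Walk a directory tree, extract each file via Tika, and index in batches."""
    batch = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        batch.append(build_doc(path, extract_text(path)))
        if len(batch) >= batch_size:
            post_to_solr(batch)
            batch = []
    if batch:
        post_to_solr(batch)
```

With both services running, calling index_tree(Path("/path/to/files")) would index everything under that directory. Keeping extraction in a separate Tika process means a file that crashes the parser does not take the indexer or Solr down with it. Error handling, retries, and a less frequent commit policy are left out here and would be needed for real use.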