Oops, premature send. But basically, nearly all the items below seem to be either things that the CSV handler can already do, things that UpdateRequestProcessors (URPs) can already do, or things for which a URP would be the right place to inject a plugin. E.g. http://lucene.apache.org/solr/guide/7_4/update-request-processors.html#templateupdateprocessorfactory
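To make the comparison concrete, here is a minimal Python sketch that maps several of the tool's listed options (a, b, c, d, e, g) onto request parameters of Solr's CSV update handler plus the runtime TemplateUpdateProcessorFactory. The base URL, collection name, field names, and the helpers `csv_update_params` and `csv_update_url` are hypothetical. The parameter names (`separator`, `encapsulator`, `header`, `fieldnames`, `skip`, `literal.<field>`, `f.<field>.split`, `f.<field>.separator`, `processor=template`, `template.field`) come from the Solr 7.x Reference Guide pages linked in this thread, but verify them against your Solr version. The sketch only builds the URL and sends nothing.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port, and collection for your setup.
CSV_UPDATE_URL = "http://localhost:8983/solr/mycollection/update/csv"


def csv_update_params(fieldnames, skip=(), literals=None, multivalued=None,
                      templates=None, separator=",", encapsulator='"'):
    """Return CSV-handler request parameters as a list of (name, value) pairs.

    fieldnames  -- names for the source columns, in order (item c); with
                   header=true the file's own header row is overridden,
                   not indexed (item d)
    skip        -- column names to drop instead of indexing (item g)
    literals    -- {field: value} applied to every document (item e)
    multivalued -- {field: split_char} for per-field value splitting (item a)
    templates   -- {field: "{a} {b}"} filled in by the template URP (item b)
    """
    params = [
        ("separator", separator),        # CSV delimiter (item a)
        ("encapsulator", encapsulator),  # CSV quote character (item a)
        ("header", "true"),
        ("fieldnames", ",".join(fieldnames)),
    ]
    if skip:
        params.append(("skip", ",".join(skip)))
    for field, value in (literals or {}).items():
        params.append((f"literal.{field}", value))
    for field, split_char in (multivalued or {}).items():
        params.append((f"f.{field}.split", "true"))
        params.append((f"f.{field}.separator", split_char))
    if templates:
        # Runtime URP selected per request, no solrconfig.xml change assumed.
        params.append(("processor", "template"))
        for field, template in templates.items():
            params.append(("template.field", f"{field}:{template}"))
    return params


def csv_update_url(params, base_url=CSV_UPDATE_URL):
    """Encode the parameters onto the update URL (nothing is sent)."""
    return f"{base_url}?{urlencode(params)}"
```

For example, `csv_update_url(csv_update_params(["id", "f_name", "m_name", "l_name", "notes", "tags"], skip=["notes"], literals={"source": "foo"}, multivalued={"tags": "|"}, templates={"full_name": "{f_name} {m_name} {l_name}"}))` yields a URL to which the raw CSV could be POSTed with `Content-type: application/csv`, as in the Reference Guide's curl examples.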
Not that I am saying your project has no place to exist. I am just saying that it would benefit from a higher-level explanation that clearly differentiates it from what Solr already does.

Regards,
Alex.

On 18 September 2018 at 17:16, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> Uhm, inline:
>
> On 18 September 2018 at 17:05, Dan Brown <d...@likethecolor.com> wrote:
>> 1. Thank you.
>>
>> 2. I think this is what you're looking for. You'd be able to be more specific than with bin/post. For instance:
>> a. specify the CSV delimiter, CSV quote character, and multivalued field delimiter
> http://lucene.apache.org/solr/guide/7_4/uploading-data-with-index-handlers.html
> separator - global, and field-local for multivalued fields
> encapsulator - for CSV quote characters
>
>> b. the dynamic-fields feature lets you write plugins in Java to define values (very simple example: combine field values f_name, m_name, l_name to populate a full_name field)
> UpdateRequestProcessors. Your example specifically:
>
>> c. specify field order for mapping onto SOLR fields, data types, and date formats of source data; perhaps your CSV headers/JSON keys don't cleanly map to SOLR field names
>> d. flag whether the first row of a CSV is the header and should not be indexed
>> e. use literal values - e.g., instead of having to alter the source data to have a column whose value is "foo", you can configure a field to always have the same literal value for all documents
>> f. set the number of times to retry when there is an error and the amount of time between retries (e.g., sometimes zk was not consistently responsive)
>> g. skip fields - e.g., your data have 10 columns but you only want to index columns 1, 3, 5, and 9
>> h. send soft commits after a specified number of batches
>> i. combine fields to generate the uniqueKey value
>>
>> 3. Yes, atomic updates.
>> For instance, index data using DIH, then use this index to provide additional values to fields in those documents (e.g., maybe the extra data come from a different data source like BigQuery).
>>
>> I hope this brings more clarity to this tool's features and answers all your questions. Please ask if anyone has more questions.
>>
>> Dan
>>
>>
>> On Tue, Sep 18, 2018 at 3:21 PM Christopher Schultz <ch...@christopherschultz.net> wrote:
>>
>>> Dan,
>>>
>>> On 9/18/18 2:51 PM, Dan Brown wrote:
>>> > I've been working on this for a while and it's finally in a state
>>> > where it's ready for public consumption.
>>> >
>>> > This is a command line indexer that will index CSV or JSON
>>> > documents: https://github.com/likethecolor/solr-indexer
>>> >
>>> > There are quite a few parameters/options that can be set.
>>> >
>>> > One thing to note is that it will update individual fields. That
>>> > is, unlike the Data Import Handler, it does not replace entire
>>> > documents.
>>> >
>>> > Please check it out and let me know what you think.
>>>
>>> How is this different from the bin/post tool that ships with Solr?
>>>
>>> Or is that what you meant when you said "this is unlike the Data Import Handler"?
>>>
>>> AIUI, Solr doesn't support updating a single field in a document. The document is replaced no matter how hard you try to be surgical about updating a single field.
>>>
>>> -chris
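Dan's point 3 and Chris's question both concern Solr's atomic updates. Below is a minimal Python sketch of a JSON atomic-update body that sends only the changed fields, wrapped in the `set` modifier (`add`, `remove`, and `inc` follow the same `{modifier: value}` shape). The endpoint, field names, and the helpers `atomic_update_body` and `post_update` are hypothetical. Chris's description still holds at the index level: apart from the narrow in-place case for single-valued numeric docValues fields, Solr rebuilds the whole document from its stored (or docValues) fields and reindexes it. Only the client request is field-level.

```python
import json
import urllib.request

# Hypothetical endpoint: adjust host, port, and collection for your setup.
UPDATE_URL = "http://localhost:8983/solr/mycollection/update"


def atomic_update_body(extra_values, id_field="id", modifier="set"):
    """Build a Solr atomic-update JSON body.

    extra_values -- {doc_id: {field: new_value}}, e.g. enrichment rows pulled
                    from another source after an initial DIH import.
    Each field is wrapped as {modifier: value}; fields not mentioned keep
    their stored values when Solr rebuilds the document.
    """
    docs = []
    for doc_id, fields in extra_values.items():
        doc = {id_field: doc_id}
        for field, value in fields.items():
            doc[field] = {modifier: value}
        docs.append(doc)
    return json.dumps(docs)


def post_update(body, url=UPDATE_URL):
    """POST the JSON body to Solr's update handler and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `post_update(atomic_update_body({"doc1": {"region": "EMEA"}}))` against a running collection would apply the change. A commit (hard, soft, or via `commitWithin`) is still needed before searches see it.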