Yup, thanks for the clarification. I see now that some of the items I listed in 2 are moot.
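
For anyone following the thread, here is a minimal sketch of how several of the items in 2 map onto request parameters of Solr's built-in CSV update handler (the page Alexandre links). It only builds the request URL and sends nothing. The host, the collection name `mycollection`, the field names `tags`, `notes`, and `source`, and the helper `csv_update_url` are all hypothetical, chosen for illustration.

```python
from urllib.parse import urlencode


def csv_update_url(base_url, collection, params):
    """Build an /update URL carrying CSV-loading parameters (hypothetical helper)."""
    return f"{base_url}/{collection}/update?{urlencode(params)}"


# Parameter names follow the Solr CSV handler; the values are made up.
params = {
    "separator": "|",          # 2a: CSV field delimiter
    "encapsulator": "'",       # 2a: CSV quote character
    "f.tags.split": "true",    # 2a: treat the "tags" column as multivalued
    "f.tags.separator": ";",   # 2a: delimiter inside the multivalued column
    "header": "true",          # 2d: first row is a header, not a document
    "skip": "notes",           # 2g: do not index the "notes" column
    "literal.source": "foo",   # 2e: same literal value on every document
    "commit": "true",
}

url = csv_update_url("http://localhost:8983/solr", "mycollection", params)
print(url)
```

The CSV file itself would then go in the body of a POST to that URL with a `Content-type: application/csv` header.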

On Tue, Sep 18, 2018 at 4:16 PM Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> Uhm, inline:
>
> On 18 September 2018 at 17:05, Dan Brown <d...@likethecolor.com> wrote:
> > 1. Thank you.
> >
> > 2. I think this is what you're looking for. You'd be able to be more
> > specific than with bin/post. For instance:
> > a. specify the CSV delimiter, CSV quote character, and multivalued field
> > delimiter
>
> http://lucene.apache.org/solr/guide/7_4/uploading-data-with-index-handlers.html
> separator - (global and field local for multivalued)
> encapsulator - for CSV quote characters
>
> > b. the dynamic-fields feature lets you write plugins in Java to define
> > values (very simple example: combine field values f_name, m_name, l_name
> > to populate a full_name field)
> UpdateRequestProcessors. Your example specifically:
>
> > c. specify field order for mapping onto SOLR fields, data types, date
> > formats of source data; perhaps your CSV headers/JSON keys don't cleanly
> > map to SOLR field names
> > d. flag whether the first row of a CSV is the header and should not be
> > indexed
> > e. use literal values - e.g., instead of having to alter the source data
> > to have a column whose value is "foo" you can configure a field to always
> > have the same literal value for all documents
> > f. set the number of times to retry when there is an error and the amount
> > of time between retries (e.g., sometimes zk was not consistently
> > responsive)
> > g. skip fields - e.g., your data have 10 columns but you only want to
> > index columns 1, 3, 5, and 9
> > h. send soft commits after a specified number of batches
> > i. combine fields to generate the uniqueKey value
> >
> > 3. Yes, atomic updates. For instance, index data using DIH then use this
> > index to provide additional values to fields in those documents (e.g.,
> > maybe the extra data come from a different data source like BigQuery).
> >
> > I hope this brings more clarity to this tool's features and answers all
> > your questions. Please ask questions if anyone has more.
> >
> > Dan
> >
> >
> > On Tue, Sep 18, 2018 at 3:21 PM Christopher Schultz
> > <ch...@christopherschultz.net> wrote:
> >
> >> Dan,
> >>
> >> On 9/18/18 2:51 PM, Dan Brown wrote:
> >> > I've been working on this for a while and it's finally in a state
> >> > where it's ready for public consumption.
> >> >
> >> > This is a command line indexer that will index CSV or JSON
> >> > documents: https://github.com/likethecolor/solr-indexer
> >> >
> >> > There are quite a few parameters/options that can be set.
> >> >
> >> > One thing to note is that it will update individual fields. That
> >> > is, unlike the Data Import Handler, it does not replace entire
> >> > documents.
> >> >
> >> > Please check it out and let me know what you think.
> >>
> >> How is this different from the bin/post tool that ships with Solr?
> >>
> >> Or is that what you meant when you said "this is unlike the Data Import
> >> Handler"?
> >>
> >> AIUI, Solr doesn't support updating a single field in a document. The
> >> document is replaced no matter how hard you try to be surgical about
> >> updating a single field.
> >>
> >> -chris