Thanks Alex, I will try your "not programming" :) solution. Really appreciate your time and effort.
manohar

On Sep 22, 2014, at 6:23 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> You could try - for your ideal scenario - creating an
> UpdateRequestProcessor (URP) chain, that
> includes: ParseDateFieldUpdateProcessorFactory
> https://lucene.apache.org/solr/4_10_0/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html
>
> Notice that it has been designed for dynamic field scenario, so by
> default it looks at everything and tries to make it a date. But its
> parent class has some parameters to specify specific fields to use:
> https://lucene.apache.org/solr/4_10_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html
>
> You can see an example in the schemaless config example:
> https://github.com/apache/lucene-solr/blob/lucene_solr_4_10_0/solr/example/example-schemaless/solr/collection1/conf/solrconfig.xml#L1584
>
> Just remember that when you are creating a URP chain:
> 1) You need to keep two (or three) of the update request processor in
> the chain, not just your date one. The details are here:
> https://wiki.apache.org/solr/UpdateRequestProcessor . The example
> above uses three, to deal with cloud situation
> 2) You need to refer to that chain in the request handler to make sure
> it is actually used:
> https://github.com/apache/lucene-solr/blob/lucene_solr_4_10_0/solr/example/example-schemaless/solr/collection1/conf/solrconfig.xml#L1014
>
> I THINK this should work and it would classify under configuration not
> customization and definitely not programming.
>
> Regards,
> Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 22 September 2014 16:16, Manohar Kanuri <s...@kanuri.org> wrote:
>> Hello,
>>
>> I am a non-techie who decided to download and install Solr 5.0 to parse data
>> for my community activism.
>> Got it installed and running, updated the
>> example schema and installation with a bunch of CSV data. And went back to
>> deal with the first of two fields I deferred till later - dates and location
>> data.
>>
>> The CSV data file for Jan - August 2014 is about 650mb with about 1.25
>> million records/rows. I split it into 5 pieces and changed MM/DD/YYYY
>> HH:MM:SS AM/PM to the YYYY-MM-DDTHH:MM:SSZ format required by Solr, using
>> TextWrangler. Which is what I know and a step up from trying to use Mac
>> Numbers spreadsheet which does it very easily but I will have to break it
>> into pieces smaller than 25-30mb. Random fields can get updated months after
>> the record was created so I have to find an easier way than break the CSV
>> file into smaller bits and reformat manually. Each record/row has 4 date
>> fields so potentially there are up to 5 million fields to be reformatted in 8
>> months' worth of data.
>>
>> I did a Google search (didn't see a Solr search page) on the mailing list
>> archives and the internet, but seems like my question is either too simple
>> and/or it's staring me in the face and I'm just missing it: Is there a
>> simple way to reformat the dates to Solr-style in a 650mb-1gig CSV file? Or,
>> ideally, have the dates and times automatically reformatted as the Solr
>> index gets updated with the latest data (I recall reading this was not possible).
>> Is there a widget/gadget/gizmo/script that would do this?
>>
>> thanks,
>> manohar
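
For the "script" half of the question, the reformatting described above (MM/DD/YYYY HH:MM:SS AM/PM to YYYY-MM-DDTHH:MM:SSZ) can be sketched in a few lines of Python. This is only a minimal sketch, not something from the thread: the function names and the column names in the usage note are hypothetical, it assumes the CSV has a header row, and it assumes the timestamps can be labelled as UTC by simply appending "Z", which is what the manual TextWrangler edit effectively did. It streams the file row by row, so a 650mb file should not need to be split first.

```python
import csv
from datetime import datetime

# Input layout described in the thread, e.g. "01/31/2014 02:05:09 PM".
SOURCE_FORMAT = "%m/%d/%Y %I:%M:%S %p"
# Target layout required by Solr date fields, e.g. "2014-01-31T14:05:09Z".
SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_solr_date(value):
    """Convert one timestamp string; empty values pass through unchanged."""
    value = value.strip()
    if not value:
        return value
    # Assumption: no time zone conversion is done, the "Z" is just appended.
    return datetime.strptime(value, SOURCE_FORMAT).strftime(SOLR_FORMAT)


def convert_csv(src, dst, date_columns):
    """Copy CSV rows from src to dst, rewriting only the named date columns."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:  # one row at a time, so memory use stays small
        for name in date_columns:
            row[name] = to_solr_date(row[name])
        writer.writerow(row)
```

To use it, open the input and output files with `newline=""` and call something like `convert_csv(infile, outfile, ["created_date", "closed_date"])`, substituting the real names of the four date columns. A value that does not match the expected layout raises a `ValueError`, which at least points at the bad row instead of silently indexing a wrong date.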