Hello,

I am a non-techie who decided to download and install Solr 5.0 to parse data
for my community activism. I got it installed and running, and updated the
example schema and installation with a bunch of CSV data. Then I went back to
deal with the first of the two kinds of fields I had deferred till later: dates
and location data.

The CSV data file for Jan - August 2014 is about 650 MB with about 1.25 million
records/rows. I split it into 5 pieces and changed MM/DD/YYYY HH:MM:SS AM/PM
to the YYYY-MM-DDTHH:MM:SSZ format required by Solr, using TextWrangler. That
is the tool I know, and it is a step up from the Mac Numbers spreadsheet, which
does the conversion very easily but would require breaking the file into pieces
smaller than 25-30 MB. Random fields can get updated months after the record
was created, so I have to find an easier way than breaking the CSV file into
smaller bits and reformatting manually. Each record/row has 4 date fields, so
potentially there are up to 5 million fields to be reformatted in 8 months'
worth of data.

I did a Google search (I didn't see a Solr search page) of the mailing list
archives and the internet, but it seems my question is either too simple
and/or the answer is staring me in the face and I'm just missing it: Is there
a simple way to reformat the dates to Solr-style in a 650 MB to 1 GB CSV file?
Or, ideally, to have the dates and times automatically reformatted as the Solr
index gets updated with the latest data (I recall reading this was not
possible)? Is there a widget/gadget/gizmo/script that would do this?
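
To make the question concrete, something along these lines is what I imagine
such a script might look like. It is only a rough sketch in Python using the
standard csv and datetime modules, not a tested solution. The function names
(to_solr_date, convert_csv) are made up for illustration, and it assumes the
times in the file are already in UTC, since it simply appends the Z without
converting time zones.

```python
import csv
from datetime import datetime

SOURCE_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # e.g. 01/31/2014 02:05:09 PM
SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"      # e.g. 2014-01-31T14:05:09Z


def to_solr_date(value):
    """Convert one MM/DD/YYYY HH:MM:SS AM/PM value to Solr's format.

    Empty or unparseable values are returned unchanged.
    Assumption: the time is already UTC, so no zone conversion is done.
    """
    text = value.strip()
    if not text:
        return value
    try:
        return datetime.strptime(text, SOURCE_FORMAT).strftime(SOLR_FORMAT)
    except ValueError:
        return value


def convert_csv(in_path, out_path, date_columns):
    """Copy a CSV file row by row, rewriting the named date columns."""
    with open(in_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)  # first row holds the column names
        writer.writerow(header)
        indexes = [header.index(name) for name in date_columns]
        for row in reader:
            for i in indexes:
                if i < len(row):  # tolerate short rows
                    row[i] = to_solr_date(row[i])
            writer.writerow(row)
```

It would be called as convert_csv("input.csv", "output.csv", ["created",
"closed"]), where the file names and column names are placeholders for the
real ones. Because it reads and writes one row at a time, the file should not
need to be split into smaller pieces first.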

thanks,
manohar
