On 3/27/2014 12:49 AM, scallawa wrote:
> I am using solr 4.7 and am importing data directly from a mysql database
> table using the DIH. I have a column that looks like similar to this below
> in that it has multiple values in the database.
>
> material cotton "polyester blend" rayon
>
> I would like the data to look like the following when imported.
>
> <str name="material">cotton</str>
> <str name="material">polyester blend</str>
> <str name="material">rayon</str>.
>
> In other words. If there is multiple data points for a particular column
> and the mapped field is multivalued, create multiple <str name> fields. If
> there are quotes around multiple words, treat them as one token. Is this
> possible?

Not directly, I don't think.

If the input data were simply space separated and didn't have the quoted string that includes a space, you could use the RegexTransformer in DIH and do a simple 'splitBy' on the field. If you know how to write a regex that matches only the spaces outside of the quotes, you could still use that method. I have no idea how to write that regex.

Alternatively, you can write a custom update processor for Solr that knows how to break up the input, remove the original field, and reinsert it with the multiple values. Writing a custom update processor is not very difficult if you already know how to program, but it's not trivial.

If the database actually has the multiple values as separate rows in a table rather than as one space-separated string, there are two possibilities:

1) Use nested DIH entities, which makes a query to the database for every document.

2) Use a JOIN with GROUP_CONCAT to construct a value with a delimiter other than space - something that won't ever show up in the actual data. You can then use the splitBy method that I already mentioned. You'd need to consult a database expert for help with JOIN and GROUP_CONCAT.

Thanks,
Shawn
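
As an illustration of the "regex that only matches spaces outside the quotes" idea, here is a minimal sketch in Python. It is not tested inside DIH; the function name split_material and the sample value are made up for the example, and it assumes the quotes in the column are always balanced double quotes. The pattern matches a run of spaces only when an even number of double quotes follows it, which means the space is not inside a quoted phrase. The same pattern syntax is valid in Java regex, but whether it behaves the same as a 'splitBy' value in RegexTransformer is an assumption you would need to verify.

```python
import re

# One or more spaces followed by an even number of double quotes up to the
# end of the string, i.e. spaces that are not inside a quoted phrase.
# Assumes quotes are balanced.
SPLIT_OUTSIDE_QUOTES = re.compile(r' +(?=(?:[^"]*"[^"]*")*[^"]*$)')

def split_material(value):
    """Split a column value on unquoted spaces and drop the surrounding quotes."""
    parts = SPLIT_OUTSIDE_QUOTES.split(value.strip())
    # Skip empty pieces (e.g. from an empty input) and strip the quote marks.
    return [p.strip('"') for p in parts if p]

print(split_material('cotton "polyester blend" rayon'))
# ['cotton', 'polyester blend', 'rayon']
```

Note that the split by itself leaves the quote characters on the quoted value; the sketch strips them in a second step, so a split-only approach in DIH would probably need an additional step to remove them.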