On 3/27/2014 12:49 AM, scallawa wrote:
> I am using solr 4.7 and am importing data directly from a mysql database
> table using the DIH.  I have a column that looks similar to the one below
> in that it has multiple values in the database.
> 
> material          cotton "polyester blend" rayon
> 
> I would like the data to look like the following when imported.
> 
> <str name="material">cotton</str>
> <str name="material">polyester blend</str>
> <str name="material">rayon</str>
> 
> In other words, if there are multiple data points for a particular column
> and the mapped field is multivalued, create multiple <str name> fields.  If
> there are quotes around multiple words, treat them as one token.  Is this
> possible?

In a direct manner, I do not think so.  If the input data were simply
space separated and didn't have the quoted string that includes a space,
you could use the RegexTransformer in DIH and do a simple 'splitBy' on
the field.

If you know how to write a regex that would only match the spaces
outside of the quotes, you could still use that method.  I have no idea
how to do that.
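
For what it's worth, one pattern often suggested for this is a lookahead
that only accepts a space when an even number of double quotes follows it.
The sketch below tries that idea in plain Java, outside of Solr.  The class
and method names are made up for illustration, and it assumes the quotes in
the data are always balanced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class QuotedSplit {
    // One or more spaces, matched only where the rest of the string holds an
    // even number of double quotes, i.e. the space is outside any quoted run.
    // Assumption: quotes in the input are balanced.
    static final Pattern OUTSIDE_QUOTES =
            Pattern.compile(" +(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    static List<String> split(String raw) {
        List<String> values = new ArrayList<String>();
        for (String piece : OUTSIDE_QUOTES.split(raw.trim())) {
            // The split keeps the surrounding quotes, so strip them here.
            String value = piece.replace("\"", "");
            if (!value.isEmpty()) {
                values.add(value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        for (String v : split("cotton \"polyester blend\" rayon")) {
            System.out.println(v);
        }
    }
}
```

If the same expression were used as the splitBy value in DIH, the double
quotes would have to be escaped for the XML attribute, and the quote
characters would presumably still be left in the values, so some further
cleanup step would be needed.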

Alternatively, you can write a custom update processor for Solr that
knows how to break up the input, remove the original field, and reinsert
it with the multiple values.  Custom update processors are not very
difficult if you already know how to write a program, but they are not
trivial either.
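
As a rough sketch of the logic such a processor would need, the Java below
breaks up the value, removes the original field, and reinserts it as a list.
It is not wired into Solr: a plain Map stands in for the Solr document, and
the class and method names are hypothetical.  A real processor would apply
the same steps to the document it receives in the update chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SplitFieldSketch {
    // Walk the string once, toggling an "inside quotes" flag on each double
    // quote.  Spaces outside quotes end the current token; quotes are dropped.
    static List<String> tokenize(String raw) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ' ' && !inQuotes) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // "doc" is a stand-in for the Solr document (an assumption for this sketch).
    static void splitField(Map<String, Object> doc, String field) {
        Object original = doc.remove(field);          // remove the original field
        if (original == null) {
            return;                                   // nothing to split
        }
        doc.put(field, tokenize(original.toString())); // reinsert as multiple values
    }
}
```

Calling splitField on a document whose "material" field holds
cotton "polyester blend" rayon leaves that field with the three values
cotton, polyester blend, and rayon, and does not touch the other fields.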

If the database actually stores the multiple values as rows in a table
rather than as one space-separated string, there are two possibilities:
1) Use nested DIH entities, which run a query against the database for
every document. 2) Use a JOIN with GROUP_CONCAT to build a single value
with a delimiter other than space, something that will never show up in
the actual data.  You can then use the splitBy method that I already
mentioned.

You'd need to consult a database expert for help with JOIN and GROUP_CONCAT.

Thanks,
Shawn
