Take a look at the RegexTransformer. Or, in some cases, you may need to use the raw ScriptTransformer.
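
As a rough, untested sketch of the RegexTransformer route, something along these lines might work for the cve-2002 entity. The column name "vulnerable-software-text" is hypothetical: it is not in your config, and it would need a matching tokenized text field in your schema. The regex replaces the leading "cpe:/o:" and every remaining ":" with a space, so "cpe:/o:freebsd:freebsd:1.1" would become " freebsd freebsd 1.1". I am assuming the transformer is applied to each value of the multi-valued source column.

```xml
<!-- Sketch only. "vulnerable-software-text" is a hypothetical column name;
     add a tokenized text field with that name to the schema before using it. -->
<entity name="cve-2002"
        pk="id"
        url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
        processor="XPathEntityProcessor"
        forEach="/nvd/entry"
        transformer="RegexTransformer">
  <!-- Original value, kept as is so it can still be returned unchanged. -->
  <field column="vulnerable-software"
         xpath="/nvd/entry/vulnerable-software-list/product"
         commonField="false" />
  <!-- Derived value: the "cpe:/o:" prefix and each ":" become a space. -->
  <field column="vulnerable-software-text"
         sourceColName="vulnerable-software"
         regex="^cpe:/o:|:"
         replaceWith=" " />
</entity>
```

Keeping the original column and writing the cleaned value to a second one means the stored CPE strings come back untouched while searches such as "freebsd 1.1" go against the derived field.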
See:
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler

-- Jack Krupansky

On Sat, Jan 24, 2015 at 3:49 PM, Carl Roberts <carl.roberts.zap...@gmail.com> wrote:

> Via this rss-data-config.xml file and a class that I wrote (attached) to
> download and XML file from a ZIP URL:
>
> <dataConfig>
>   <dataSource type="ZIPURLDataSource" connectionTimeout="15000"
>               readTimeout="30000"/>
>   <document>
>     <entity name="cve-2002"
>             pk="id"
>             url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
>             processor="XPathEntityProcessor"
>             forEach="/nvd/entry">
>       <field column="id" xpath="/nvd/entry/@id" commonField="false" />
>       <field column="cve" xpath="/nvd/entry/cve-id" commonField="false" />
>       <field column="cwe" xpath="/nvd/entry/cwe/@id" commonField="false" />
>       <field column="vulnerable-configuration"
>              xpath="/nvd/entry/vulnerable-configuration/logical-test/fact-ref/@name"
>              commonField="false" />
>       <field column="vulnerable-software"
>              xpath="/nvd/entry/vulnerable-software-list/product" commonField="false" />
>       <field column="published" xpath="/nvd/entry/published-datetime"
>              commonField="false" />
>       <field column="modified" xpath="/nvd/entry/last-modified-datetime"
>              commonField="false" />
>       <field column="summary" xpath="/nvd/entry/summary" commonField="false" />
>     </entity>
>     <entity name="cve-2003"
>             pk="id"
>             url="http://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2003.xml.zip"
>             processor="XPathEntityProcessor"
>             forEach="/nvd/entry">
>       <field column="id" xpath="/nvd/entry/@id" commonField="false" />
>       <field column="cve" xpath="/nvd/entry/cve-id" commonField="false" />
>       <field column="cwe" xpath="/nvd/entry/cwe/@id" commonField="false" />
>       <field column="vulnerable-configuration"
>              xpath="/nvd/entry/vulnerable-configuration/logical-test/fact-ref/@name"
>              commonField="false" />
>       <field column="vulnerable-software"
>              xpath="/nvd/entry/vulnerable-software-list/product" commonField="false" />
>       <field column="published" xpath="/nvd/entry/published-datetime"
>              commonField="false" />
>       <field column="modified" xpath="/nvd/entry/last-modified-datetime"
>              commonField="false" />
>       <field column="summary" xpath="/nvd/entry/summary" commonField="false" />
>     </entity>
>     <!--
>     <entity name="nvd-rss-update"
>             pk="link"
>             url="https://nvd.nist.gov/download/nvd-rss.xml"
>             processor="XPathEntityProcessor"
>             forEach="/RDF/item"
>             transformer="DateFormatTransformer"
>             preImportDeleteQuery="">
>       <field column="id" xpath="/RDF/item/title" commonField="true" />
>       <field column="link" xpath="/RDF/item/link" commonField="true" />
>       <field column="summary" xpath="/RDF/item/description" commonField="true" />
>       <field column="date" xpath="/RDF/item/date" commonField="true" />
>     </entity>
>     -->
>   </document>
> </dataConfig>
>
> On 1/24/15, 3:45 PM, Jack Krupansky wrote:
>
>> How are you currently importing data?
>>
>> -- Jack Krupansky
>>
>> On Sat, Jan 24, 2015 at 3:42 PM, Carl Roberts <carl.roberts.zap...@gmail.com> wrote:
>>
>>> Sorry if I was not clear. What I am asking is this:
>>>
>>> How can I parse the data during import to tokenize it by (:) and strip
>>> the cpe:/o?
>>>
>>> On 1/24/15, 3:28 PM, Alexandre Rafalovitch wrote:
>>>
>>>> You are using keywords here that seem to contradict with each other.
>>>> Or your use case is not clear.
>>>>
>>>> Specifically, you are saying you are getting stuff from a (Solr?)
>>>> query. So, the results are now outside of Solr. Then you are asking
>>>> for help to strip stuff off it. Well, it's outside of Solr, do
>>>> whatever you want with it!
>>>>
>>>> But then at the end, you say you want to search for whatever you
>>>> stripped off. So, that should be back in Solr again?
>>>>
>>>> Or are you asking something along these lines:
>>>> 1. I have a multiValued field with the following sample content... (it
>>>>    does not matter to Solr where it comes from)
>>>> 2. I wanted it returned as is, but I want to be able to find documents
>>>>    when somebody searches for X, Y, or Z
>>>> 3. What would be the best analyzer chain to be able to do so?
>>>>
>>>> Regards,
>>>> Alex.
>>>> ----
>>>> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>>>>
>>>> On 24 January 2015 at 15:04, Carl Roberts <carl.roberts.zap...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> How can I parse the data in a field that is returned from a query?
>>>>>
>>>>> Basically,
>>>>>
>>>>> I have a multi-valued field that contains values such as these that are
>>>>> returned from a query:
>>>>>
>>>>> "cpe:/o:freebsd:freebsd:1.1.5.1",
>>>>> "cpe:/o:freebsd:freebsd:2.2.3",
>>>>> "cpe:/o:freebsd:freebsd:2.2.2",
>>>>> "cpe:/o:freebsd:freebsd:2.2.5",
>>>>> "cpe:/o:freebsd:freebsd:2.2.4",
>>>>> "cpe:/o:freebsd:freebsd:2.0.5",
>>>>> "cpe:/o:freebsd:freebsd:2.2.6",
>>>>> "cpe:/o:freebsd:freebsd:2.1.6.1",
>>>>> "cpe:/o:freebsd:freebsd:2.0.1",
>>>>> "cpe:/o:freebsd:freebsd:2.2",
>>>>> "cpe:/o:freebsd:freebsd:2.0",
>>>>> "cpe:/o:openbsd:openbsd:2.3",
>>>>> "cpe:/o:freebsd:freebsd:3.0",
>>>>> "cpe:/o:freebsd:freebsd:1.1",
>>>>> "cpe:/o:freebsd:freebsd:2.1.6",
>>>>> "cpe:/o:openbsd:openbsd:2.4",
>>>>> "cpe:/o:bsdi:bsd_os:3.1",
>>>>> "cpe:/o:freebsd:freebsd:1.0",
>>>>> "cpe:/o:freebsd:freebsd:2.1.7",
>>>>> "cpe:/o:freebsd:freebsd:1.2",
>>>>> "cpe:/o:freebsd:freebsd:2.1.5",
>>>>> "cpe:/o:freebsd:freebsd:2.1.7.1"],
>>>>>
>>>>> And my problem is that I need to strip the cpe:/o part and I also need
>>>>> to tokenize words using the (:) as a separator so that I can then search
>>>>> for "freebsd 1.1" or "openbsd 2.4" or just "freebsd".
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> Joe