Re: How do you parse the data in a field that is returned from a query?

Jack Krupansky Sat, 24 Jan 2015 12:59:51 -0800

Take a look at the RegexTransformer. Or, in some cases, you may need to use
the raw ScriptTransformer.


See:
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler
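
A minimal sketch of the RegexTransformer route, assuming the quoted
rss-data-config.xml below as the starting point. The "software" column is a
hypothetical name chosen for illustration and would need a matching field in
the schema; ZIPURLDataSource is the custom class from the original config.
The regex/replaceWith pair strips the leading "cpe:/o:" from each value, so
"cpe:/o:freebsd:freebsd:2.2.3" should become "freebsd:freebsd:2.2.3".

```xml
<dataConfig>
    <!-- ZIPURLDataSource is the custom data source class from the quoted config -->
    <dataSource type="ZIPURLDataSource" connectionTimeout="15000" readTimeout="30000"/>
    <document>
        <entity name="cve-2002"
                pk="id"
                url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
                processor="XPathEntityProcessor"
                forEach="/nvd/entry"
                transformer="RegexTransformer">
            <field column="id" xpath="/nvd/entry/@id" commonField="false" />
            <!-- Raw values as extracted, e.g. cpe:/o:freebsd:freebsd:2.2.3 -->
            <field column="vulnerable-software"
                   xpath="/nvd/entry/vulnerable-software-list/product"
                   commonField="false" />
            <!-- Hypothetical extra column: the same values with the leading
                 cpe:/o: (or cpe:/a:, cpe:/h:) prefix removed -->
            <field column="software"
                   sourceColName="vulnerable-software"
                   regex="^cpe:/[a-z]:"
                   replaceWith="" />
        </entity>
    </document>
</dataConfig>
```

This only covers stripping the prefix at import time. Splitting the remainder
on ":" so that "freebsd 1.1" matches is probably better left to the analyzer
of the target field in the schema, since RegexTransformer applies splitBy
instead of regex/replaceWith when both are set on the same field.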

-- Jack Krupansky

On Sat, Jan 24, 2015 at 3:49 PM, Carl Roberts <carl.roberts.zap...@gmail.com>
wrote:

> Via this rss-data-config.xml file and a class that I wrote (attached) to
> download an XML file from a ZIP URL:
>
> <dataConfig>
>     <dataSource type="ZIPURLDataSource" connectionTimeout="15000"
> readTimeout="30000"/>
>     <document>
>         <entity name="cve-2002"
>                 pk="id"
> url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
>                 processor="XPathEntityProcessor"
>                 forEach="/nvd/entry">
>             <field column="id" xpath="/nvd/entry/@id" commonField="false"
> />
>             <field column="cve" xpath="/nvd/entry/cve-id"
> commonField="false" />
>             <field column="cwe" xpath="/nvd/entry/cwe/@id"
> commonField="false" />
>             <field column="vulnerable-configuration"
> xpath="/nvd/entry/vulnerable-configuration/logical-test/fact-ref/@name"
> commonField="false" />
>             <field column="vulnerable-software"
> xpath="/nvd/entry/vulnerable-software-list/product" commonField="false" />
>             <field column="published" xpath="/nvd/entry/published-datetime"
> commonField="false" />
>             <field column="modified" xpath="/nvd/entry/last-modified-datetime"
> commonField="false" />
>             <field column="summary" xpath="/nvd/entry/summary"
> commonField="false" />
>         </entity>
>         <entity name="cve-2003"
>                 pk="id"
> url="http://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2003.xml.zip"
>                 processor="XPathEntityProcessor"
>                 forEach="/nvd/entry">
>             <field column="id" xpath="/nvd/entry/@id" commonField="false"
> />
>             <field column="cve" xpath="/nvd/entry/cve-id"
> commonField="false" />
>             <field column="cwe" xpath="/nvd/entry/cwe/@id"
> commonField="false" />
>             <field column="vulnerable-configuration"
> xpath="/nvd/entry/vulnerable-configuration/logical-test/fact-ref/@name"
> commonField="false" />
>             <field column="vulnerable-software"
> xpath="/nvd/entry/vulnerable-software-list/product" commonField="false" />
>             <field column="published" xpath="/nvd/entry/published-datetime"
> commonField="false" />
>             <field column="modified" xpath="/nvd/entry/last-modified-datetime"
> commonField="false" />
>             <field column="summary" xpath="/nvd/entry/summary"
> commonField="false" />
>         </entity>
>         <!--
>         <entity name="nvd-rss-update"
>                 pk="link"
>                 url="https://nvd.nist.gov/download/nvd-rss.xml"
>                 processor="XPathEntityProcessor"
>                 forEach="/RDF/item"
>                 transformer="DateFormatTransformer"
>                 preImportDeleteQuery="">
>             <field column="id" xpath="/RDF/item/title" commonField="true"
> />
>             <field column="link" xpath="/RDF/item/link" commonField="true"
> />
>             <field column="summary" xpath="/RDF/item/description"
> commonField="true" />
>             <field column="date" xpath="/RDF/item/date" commonField="true"
> />
>         </entity>
>         -->
>     </document>
> </dataConfig>
>
>
> On 1/24/15, 3:45 PM, Jack Krupansky wrote:
>
>> How are you currently importing data?
>>
>> -- Jack Krupansky
>>
>> On Sat, Jan 24, 2015 at 3:42 PM, Carl Roberts <
>> carl.roberts.zap...@gmail.com> wrote:
>>
>>> Sorry if I was not clear.  What I am asking is this:
>>>
>>> How can I parse the data during import to tokenize it by (:) and strip
>>> the
>>> cpe:/o?
>>>
>>>
>>>
>>> On 1/24/15, 3:28 PM, Alexandre Rafalovitch wrote:
>>>
>>>> You are using keywords here that seem to contradict each other.
>>>> Or your use case is not clear.
>>>>
>>>> Specifically, you are saying you are getting stuff from a (Solr?)
>>>> query. So, the results are now outside of Solr. Then you are asking
>>>> for help to strip stuff off it. Well, it's outside of Solr, do
>>>> whatever you want with it!
>>>>
>>>> But then at the end, you say you want to search for whatever you
>>>> stripped off. So, that should be back in Solr again?
>>>>
>>>> Or are you asking something along these lines:
>>>> 1. I have a multiValued field with the following sample content... (it
>>>> does not matter to Solr where it comes from)
>>>> 2. I wanted it returned as is, but I want to be able to find documents
>>>> when somebody searches for X, Y, or Z
>>>> 3. What would be the best analyzer chain to be able to do so?
>>>>
>>>> Regards,
>>>>      Alex.
>>>> ----
>>>> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>>>>
>>>>
>>>> On 24 January 2015 at 15:04, Carl Roberts <
>>>> carl.roberts.zap...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> How can I parse the data in a field that is returned from a query?
>>>>>
>>>>> Basically,
>>>>>
>>>>> I have a multi-valued field that contains values such as these that are
>>>>> returned from a query:
>>>>>
>>>>>             "cpe:/o:freebsd:freebsd:1.1.5.1",
>>>>>             "cpe:/o:freebsd:freebsd:2.2.3",
>>>>>             "cpe:/o:freebsd:freebsd:2.2.2",
>>>>>             "cpe:/o:freebsd:freebsd:2.2.5",
>>>>>             "cpe:/o:freebsd:freebsd:2.2.4",
>>>>>             "cpe:/o:freebsd:freebsd:2.0.5",
>>>>>             "cpe:/o:freebsd:freebsd:2.2.6",
>>>>>             "cpe:/o:freebsd:freebsd:2.1.6.1",
>>>>>             "cpe:/o:freebsd:freebsd:2.0.1",
>>>>>             "cpe:/o:freebsd:freebsd:2.2",
>>>>>             "cpe:/o:freebsd:freebsd:2.0",
>>>>>             "cpe:/o:openbsd:openbsd:2.3",
>>>>>             "cpe:/o:freebsd:freebsd:3.0",
>>>>>             "cpe:/o:freebsd:freebsd:1.1",
>>>>>             "cpe:/o:freebsd:freebsd:2.1.6",
>>>>>             "cpe:/o:openbsd:openbsd:2.4",
>>>>>             "cpe:/o:bsdi:bsd_os:3.1",
>>>>>             "cpe:/o:freebsd:freebsd:1.0",
>>>>>             "cpe:/o:freebsd:freebsd:2.1.7",
>>>>>             "cpe:/o:freebsd:freebsd:1.2",
>>>>>             "cpe:/o:freebsd:freebsd:2.1.5",
>>>>>             "cpe:/o:freebsd:freebsd:2.1.7.1"],
>>>>>
>>>>> And my problem is that I need to strip the cpe:/o part and I also need
>>>>> to
>>>>> tokenize words using the (:) as a separator so that I can then search
>>>>> for
>>>>> "freebsd 1.1" or "openbsd 2.4" or just "freebsd".
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> Joe
>>>>>
>>>>>
>
