The unzipped XML that I am reading looks like this:
<nvd xmlns:scap-core="http://scap.nist.gov/schema/scap-core/0.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:patch="http://scap.nist.gov/schema/patch/0.1"
xmlns:vuln="http://scap.nist.gov/schema/vulnerability/0.4"
xmlns:cvss="http://scap.nist.gov/schema/cvss-v2/0.2"
xmlns:cpe-lang="http://cpe.mitre.org/language/2.0"
xmlns="http://scap.nist.gov/schema/feed/vulnerability/2.0"
pub_date="2015-01-10T05:37:05"
xsi:schemaLocation="http://scap.nist.gov/schema/patch/0.1
http://nvd.nist.gov/schema/patch_0.1.xsd
http://scap.nist.gov/schema/scap-core/0.1
http://nvd.nist.gov/schema/scap-core_0.1.xsd
http://scap.nist.gov/schema/feed/vulnerability/2.0
http://nvd.nist.gov/schema/nvd-cve-feed_2.0.xsd" nvd_xml_version="2.0">
<entry id="CVE-1999-0001">
<vuln:vulnerable-configuration id="http://nvd.nist.gov/">
<cpe-lang:logical-test operator="OR" negate="false">
<cpe-lang:fact-ref name="cpe:/o:bsdi:bsd_os:3.1"/>
<cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:1.0"/>
<cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:1.1"/>
<cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:1.1.5.1"/>
<cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:1.2"/>
<cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.0"/>
<cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.0.5"/>
<cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.1.5"/>
<cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.1.6"/>
<cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.1.6.1"/>
<cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.1.7"/>
<cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.1.7.1"/>
<cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2"/>
<cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2.3"/>
<cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2.4"/>
<cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2.5"/>
<cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2.6"/>
<cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2.8"/>
<cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:3.0"/>
<cpe-lang:fact-ref name="cpe:/o:openbsd:openbsd:2.3"/>
<cpe-lang:fact-ref name="cpe:/o:openbsd:openbsd:2.4"/>
<cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2.2"/>
<cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.0.1"/>
</cpe-lang:logical-test>
</vuln:vulnerable-configuration>
<vuln:vulnerable-software-list>
<vuln:product>cpe:/o:freebsd:freebsd:2.2.8</vuln:product>
<vuln:product>cpe:/o:freebsd:freebsd:1.1.5.1</vuln:product>
<vuln:product>cpe:/o:freebsd:freebsd:2.2.3</vuln:product>
<vuln:product>cpe:/o:freebsd:freebsd:2.2.2</vuln:product>
<vuln:product>cpe:/o:freebsd:freebsd:2.2.5</vuln:product>
<vuln:product>cpe:/o:freebsd:freebsd:2.2.4</vuln:product>
<vuln:product>cpe:/o:freebsd:freebsd:2.0.5</vuln:product>
<vuln:product>cpe:/o:freebsd:freebsd:2.2.6</vuln:product>
<vuln:product>cpe:/o:freebsd:freebsd:2.1.6.1</vuln:product>
<vuln:product>cpe:/o:freebsd:freebsd:2.0.1</vuln:product>
<vuln:product>cpe:/o:freebsd:freebsd:2.2</vuln:product>
<vuln:product>cpe:/o:freebsd:freebsd:2.0</vuln:product>
<vuln:product>cpe:/o:openbsd:openbsd:2.3</vuln:product>
<vuln:product>cpe:/o:freebsd:freebsd:3.0</vuln:product>
<vuln:product>cpe:/o:freebsd:freebsd:1.1</vuln:product>
<vuln:product>cpe:/o:freebsd:freebsd:2.1.6</vuln:product>
<vuln:product>cpe:/o:openbsd:openbsd:2.4</vuln:product>
<vuln:product>cpe:/o:bsdi:bsd_os:3.1</vuln:product>
<vuln:product>cpe:/o:freebsd:freebsd:1.0</vuln:product>
<vuln:product>cpe:/o:freebsd:freebsd:2.1.7</vuln:product>
<vuln:product>cpe:/o:freebsd:freebsd:1.2</vuln:product>
<vuln:product>cpe:/o:freebsd:freebsd:2.1.5</vuln:product>
<vuln:product>cpe:/o:freebsd:freebsd:2.1.7.1</vuln:product>
</vuln:vulnerable-software-list>
<vuln:cve-id>CVE-1999-0001</vuln:cve-id>
<vuln:published-datetime>1999-12-30T00:00:00.000-05:00</vuln:published-datetime>
<vuln:last-modified-datetime>2010-12-16T00:00:00.000-05:00</vuln:last-modified-datetime>
<vuln:cvss>
<cvss:base_metrics>
<cvss:score>5.0</cvss:score>
<cvss:access-vector>NETWORK</cvss:access-vector>
<cvss:access-complexity>LOW</cvss:access-complexity>
<cvss:authentication>NONE</cvss:authentication>
<cvss:confidentiality-impact>NONE</cvss:confidentiality-impact>
<cvss:integrity-impact>NONE</cvss:integrity-impact>
<cvss:availability-impact>PARTIAL</cvss:availability-impact>
<cvss:source>http://nvd.nist.gov</cvss:source>
<cvss:generated-on-datetime>2004-01-01T00:00:00.000-05:00</cvss:generated-on-datetime>
</cvss:base_metrics>
</vuln:cvss>
<vuln:cwe id="CWE-20"/>
<vuln:references reference_type="UNKNOWN" xml:lang="en">
<vuln:source>OSVDB</vuln:source>
<vuln:reference href="http://www.osvdb.org/5707"
xml:lang="en">5707</vuln:reference>
</vuln:references>
<vuln:references reference_type="UNKNOWN" xml:lang="en">
<vuln:source>CONFIRM</vuln:source>
<vuln:reference
href="http://www.openbsd.org/errata23.html#tcpfix"
xml:lang="en">http://www.openbsd.org/errata23.html#tcpfix</vuln:reference>
</vuln:references>
<vuln:summary>ip_input.c in BSD-derived TCP/IP implementations
allows remote attackers to cause a denial of service (crash or hang) via
crafted packets.</vuln:summary>
</entry>
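For anyone who wants to check what the feed carries before wiring up an import, here is a minimal Python sketch that is not part of the original thread. It reads the `vuln:product` values with the standard library's `xml.etree.ElementTree`. The `SAMPLE` string is an abbreviated copy of the entry above, closed with `</nvd>` so that it parses, and `products_by_entry` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the <nvd> root element of the feed.
NS = {
    "feed": "http://scap.nist.gov/schema/feed/vulnerability/2.0",
    "vuln": "http://scap.nist.gov/schema/vulnerability/0.4",
}

# Abbreviated excerpt of the entry shown above (illustrative only).
SAMPLE = """<nvd xmlns="http://scap.nist.gov/schema/feed/vulnerability/2.0"
     xmlns:vuln="http://scap.nist.gov/schema/vulnerability/0.4">
  <entry id="CVE-1999-0001">
    <vuln:vulnerable-software-list>
      <vuln:product>cpe:/o:freebsd:freebsd:2.2.8</vuln:product>
      <vuln:product>cpe:/o:openbsd:openbsd:2.3</vuln:product>
    </vuln:vulnerable-software-list>
    <vuln:cve-id>CVE-1999-0001</vuln:cve-id>
  </entry>
</nvd>"""


def products_by_entry(xml_text):
    """Map each entry id to the list of its vuln:product values."""
    root = ET.fromstring(xml_text)
    result = {}
    for entry in root.findall("feed:entry", NS):
        products = [
            p.text
            for p in entry.findall("vuln:vulnerable-software-list/vuln:product", NS)
        ]
        result[entry.get("id")] = products
    return result
```

Unlike the prefix-free XPaths in the import config quoted further down, ElementTree needs the namespace mapping spelled out, which is why `NS` is passed to every `findall` call.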
On 1/24/15, 3:49 PM, Carl Roberts wrote:
Via this rss-data-config.xml file and a class that I wrote (attached)
to download an XML file from a ZIP URL:
<dataConfig>
<dataSource type="ZIPURLDataSource" connectionTimeout="15000"
readTimeout="30000"/>
<document>
<entity name="cve-2002"
pk="id"
url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
processor="XPathEntityProcessor"
forEach="/nvd/entry">
<field column="id" xpath="/nvd/entry/@id"
commonField="false" />
<field column="cve" xpath="/nvd/entry/cve-id"
commonField="false" />
<field column="cwe" xpath="/nvd/entry/cwe/@id"
commonField="false" />
<field column="vulnerable-configuration"
xpath="/nvd/entry/vulnerable-configuration/logical-test/fact-ref/@name" commonField="false"
/>
<field column="vulnerable-software"
xpath="/nvd/entry/vulnerable-software-list/product"
commonField="false" />
<field column="published"
xpath="/nvd/entry/published-datetime" commonField="false" />
<field column="modified"
xpath="/nvd/entry/last-modified-datetime" commonField="false" />
<field column="summary" xpath="/nvd/entry/summary"
commonField="false" />
</entity>
<entity name="cve-2003"
pk="id"
url="http://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2003.xml.zip"
processor="XPathEntityProcessor"
forEach="/nvd/entry">
<field column="id" xpath="/nvd/entry/@id"
commonField="false" />
<field column="cve" xpath="/nvd/entry/cve-id"
commonField="false" />
<field column="cwe" xpath="/nvd/entry/cwe/@id"
commonField="false" />
<field column="vulnerable-configuration"
xpath="/nvd/entry/vulnerable-configuration/logical-test/fact-ref/@name" commonField="false"
/>
<field column="vulnerable-software"
xpath="/nvd/entry/vulnerable-software-list/product"
commonField="false" />
<field column="published"
xpath="/nvd/entry/published-datetime" commonField="false" />
<field column="modified"
xpath="/nvd/entry/last-modified-datetime" commonField="false" />
<field column="summary" xpath="/nvd/entry/summary"
commonField="false" />
</entity>
<!--
<entity name="nvd-rss-update"
pk="link"
url="https://nvd.nist.gov/download/nvd-rss.xml"
processor="XPathEntityProcessor"
forEach="/RDF/item"
transformer="DateFormatTransformer"
preImportDeleteQuery="">
<field column="id" xpath="/RDF/item/title"
commonField="true" />
<field column="link" xpath="/RDF/item/link"
commonField="true" />
<field column="summary" xpath="/RDF/item/description"
commonField="true" />
<field column="date" xpath="/RDF/item/date"
commonField="true" />
</entity>
-->
</document>
</dataConfig>
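The attached class is not reproduced in this thread, so the following is not that class. It is only a rough Python sketch of the general idea of pulling the XML member out of a downloaded ZIP. The helper names `first_xml_from_zip` and `make_demo_zip` are hypothetical, and the demo archive is built in memory so that the sketch runs without network access. In real use the bytes would presumably come from something like `urllib.request.urlopen(url).read()`.

```python
import io
import zipfile


def first_xml_from_zip(zip_bytes):
    """Return the bytes of the first .xml member inside a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xml"):
                return archive.read(name)
    raise ValueError("no .xml member found in archive")


def make_demo_zip():
    """Build a tiny in-memory ZIP (made-up content) to exercise the helper."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("nvdcve-demo.xml", "<nvd><entry id='CVE-0000-0000'/></nvd>")
    return buffer.getvalue()
```

Calling `first_xml_from_zip(make_demo_zip())` returns the raw XML bytes, which could then be handed to whatever parser reads the `/nvd/entry` records.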
On 1/24/15, 3:45 PM, Jack Krupansky wrote:
How are you currently importing data?
-- Jack Krupansky
On Sat, Jan 24, 2015 at 3:42 PM, Carl Roberts
<carl.roberts.zap...@gmail.com> wrote:
Sorry if I was not clear. What I am asking is this:
How can I parse the data during import to tokenize it by (:) and
strip the cpe:/o?
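To make the desired transformation concrete, here is a small Python sketch of the string handling being asked for. It is only an illustration of the intended result, not a Solr feature or anything proposed in the thread. `cpe_tokens` is a hypothetical name, and lowercasing the tokens is an added assumption.

```python
def cpe_tokens(cpe):
    """Strip the leading 'cpe:/<part>' segment and split the rest on ':'."""
    fields = cpe.split(":")
    # fields[0] is 'cpe' and fields[1] is '/o' (or '/a', '/h'); drop both.
    if len(fields) < 3 or fields[0] != "cpe" or not fields[1].startswith("/"):
        raise ValueError(f"not a CPE URI of the form shown above: {cpe!r}")
    # Lowercasing is an assumption, made so that searches are case-insensitive.
    return [f.lower() for f in fields[2:] if f]
```

For example, `cpe_tokens("cpe:/o:freebsd:freebsd:1.1.5.1")` returns `["freebsd", "freebsd", "1.1.5.1"]`.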
On 1/24/15, 3:28 PM, Alexandre Rafalovitch wrote:
You are using keywords here that seem to contradict each other.
Or your use case is not clear.
Specifically, you are saying you are getting stuff from a (Solr?)
query. So, the results are now outside of Solr. Then you are asking
for help to strip stuff off it. Well, it's outside of Solr, do
whatever you want with it!
But then at the end, you say you want to search for whatever you
stripped off. So, that should be back in Solr again?
Or are you asking something along these lines:
1. I have a multiValued field with the following sample content... (it
does not matter to Solr where it comes from)
2. I wanted it returned as is, but I want to be able to find documents
when somebody searches for X, Y, or Z
3. What would be the best analyzer chain to be able to do so?
Regards,
Alex.
----
Sign up for my Solr resources newsletter at http://www.solr-start.com/
On 24 January 2015 at 15:04, Carl Roberts
<carl.roberts.zap...@gmail.com>
wrote:
Hi,
How can I parse the data in a field that is returned from a query?
Basically, I have a multi-valued field that contains values such as
these that are returned from a query:
"cpe:/o:freebsd:freebsd:1.1.5.1",
"cpe:/o:freebsd:freebsd:2.2.3",
"cpe:/o:freebsd:freebsd:2.2.2",
"cpe:/o:freebsd:freebsd:2.2.5",
"cpe:/o:freebsd:freebsd:2.2.4",
"cpe:/o:freebsd:freebsd:2.0.5",
"cpe:/o:freebsd:freebsd:2.2.6",
"cpe:/o:freebsd:freebsd:2.1.6.1",
"cpe:/o:freebsd:freebsd:2.0.1",
"cpe:/o:freebsd:freebsd:2.2",
"cpe:/o:freebsd:freebsd:2.0",
"cpe:/o:openbsd:openbsd:2.3",
"cpe:/o:freebsd:freebsd:3.0",
"cpe:/o:freebsd:freebsd:1.1",
"cpe:/o:freebsd:freebsd:2.1.6",
"cpe:/o:openbsd:openbsd:2.4",
"cpe:/o:bsdi:bsd_os:3.1",
"cpe:/o:freebsd:freebsd:1.0",
"cpe:/o:freebsd:freebsd:2.1.7",
"cpe:/o:freebsd:freebsd:1.2",
"cpe:/o:freebsd:freebsd:2.1.5",
"cpe:/o:freebsd:freebsd:2.1.7.1"],
And my problem is that I need to strip the cpe:/o part and I also
need to tokenize words using the (:) as a separator so that I can then
search for "freebsd 1.1" or "openbsd 2.4" or just "freebsd".
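The search behaviour described here can be sketched in a few lines of Python. This is an illustration of the matching rule only, and it rests on two assumptions: every query term must appear as a whole token, and all terms must match within a single value. It does not describe how Solr itself scores or matches multi-valued fields. `tokens`, `matches`, and `FIELD` are hypothetical names, and `FIELD` holds three of the values listed above.

```python
def tokens(value):
    """Lowercased ':'-separated parts of a value, minus the 'cpe' and '/o' parts."""
    return {part.lower() for part in value.split(":")[2:] if part}


def matches(values, query):
    """True if some single value contains every whitespace-separated query term."""
    terms = query.lower().split()
    return any(all(t in tokens(v) for t in terms) for v in values)


# Three of the field values quoted above.
FIELD = [
    "cpe:/o:freebsd:freebsd:1.1",
    "cpe:/o:openbsd:openbsd:2.4",
    "cpe:/o:bsdi:bsd_os:3.1",
]
```

Under these assumptions, `matches(FIELD, "freebsd 1.1")`, `matches(FIELD, "openbsd 2.4")`, and `matches(FIELD, "freebsd")` are all true, while `matches(FIELD, "freebsd 2.4")` is false because no single value carries both terms.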
Thanks in advance.
Joe