I'm trying to write a DataImportHandler (DIH) configuration to pull page view metrics from an XML feed into our index. The import makes a single request to the feed and then indexes 0 documents. I set the log level to "finest" for the entire dataimport section, but I still can't tell what's wrong; I suspect the XPath expressions. Also, http://localhost:8080/solr/core1/admin/dataimport.jsp?handler=/dataimport returns a 404. Any suggestions on how I can debug this?

Solr version (solr-spec): 4.0.0.2012.08.06.22.50.47

The XML data:

<?xml version='1.0' encoding='UTF-8'?>
<ReportDataResponse>
<Data>
<Rows>
<Row rowKey="P#PRODUCT: BURLAP POTATO SACKS (PACK OF 12) (W4537)#N/A#550000000016196614" rowActionAvailability="0 0 0">
<Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: BURLAP POTATO SACKS (PACK OF 12) (W4537)</Value>
<Value columnId="PAGE_VIEWS" comparisonSpecifier="A">2388</Value>
</Row>
<Row rowKey="P#PRODUCT: OPAQUE PONY BEADS 6X9MM (BAG OF 850) (BE9000)#N/A#550000000021976460" rowActionAvailability="0 0 0">
<Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: OPAQUE PONY BEADS 6X9MM (BAG OF 850) (BE9000)</Value>
<Value columnId="PAGE_VIEWS" comparisonSpecifier="A">1313</Value>
</Row>
</Rows>
</Data>
</ReportDataResponse>

My DIH configuration (data-config.xml):

<dataConfig>
  <dataSource name="coremetrics"
              type="URLDataSource"
              encoding="UTF-8"
              connectionTimeout="5000"
              readTimeout="10000"/>

  <document>
    <entity name="coremetrics"
            dataSource="coremetrics"
            pk="id"
            url="https://welcome.coremetrics.com/analyticswebapp/api/1.0/report-data/contentcategory/bypage.ftl?clientId=******&amp;username=****&amp;format=XML&amp;userAuthKey=****&amp;language=en_US&amp;viewID=9475540&amp;period_a=M20110930"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/ReportDataResponse/Data/Rows/Row"
            logLevel="fine"
            transformer="RegexTransformer">

      <field column="part_code" name="id"
             xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_NAME']"
             regex="/^PRODUCT:.*\((.*?)\)$/"
             replaceWith="$1"/>
      <field column="page_views"
             xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_VIEWS']"/>
    </entity>
  </document>
</dataConfig>
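One way to test the `forEach` and field `xpath` expressions in isolation is to replay the same selection against the sample data outside Solr. The sketch below uses Python's standard `xml.etree.ElementTree`, whose limited XPath subset also supports `[@attr='value']` predicates. It is not DIH's own streaming parser (`XPathRecordReader`), so a match here suggests, but does not prove, that DIH would match as well. `SAMPLE` and `extract_rows` are hypothetical names, and the sample is the response above with the `rowKey` and `rowActionAvailability` attributes trimmed.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample response (rowKey/rowActionAvailability omitted).
SAMPLE = """<ReportDataResponse><Data><Rows>
<Row>
<Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: BURLAP POTATO SACKS (PACK OF 12) (W4537)</Value>
<Value columnId="PAGE_VIEWS" comparisonSpecifier="A">2388</Value>
</Row>
<Row>
<Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: OPAQUE PONY BEADS 6X9MM (BAG OF 850) (BE9000)</Value>
<Value columnId="PAGE_VIEWS" comparisonSpecifier="A">1313</Value>
</Row>
</Rows></Data></ReportDataResponse>"""


def extract_rows(xml_text):
    """Return (page_name, page_views) pairs, one per <Row>.

    Mirrors the entity: iterate over forEach="/ReportDataResponse/Data/Rows/Row",
    then evaluate each field's xpath with the forEach prefix stripped off.
    """
    root = ET.fromstring(xml_text)  # root element is <ReportDataResponse>
    rows = []
    # findall() is relative to the root, so the forEach path becomes ./Data/Rows/Row
    for row in root.findall("./Data/Rows/Row"):
        page_name = row.findtext("Value[@columnId='PAGE_NAME']")
        page_views = row.findtext("Value[@columnId='PAGE_VIEWS']")
        rows.append((page_name, page_views))
    return rows


for page_name, page_views in extract_rows(SAMPLE):
    print(page_name, page_views)
```

If both rows come back here, the XPath expressions themselves are a less likely culprit than what the URL actually returns to Solr.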

This little test Perl script correctly extracts the data:

use strict;
use warnings;
use XML::XPath;
use XML::XPath::XMLParser;

# cm.xml: a local copy of the XML feed
my $xp = XML::XPath->new(filename => 'cm.xml');
my $nodeset = $xp->find('/ReportDataResponse/Data/Rows/Row');
foreach my $node ($nodeset->get_nodelist) {
    my $page_name  = $node->findvalue('Value[@columnId="PAGE_NAME"]');
    my $page_views = $node->findvalue('Value[@columnId="PAGE_VIEWS"]');
    # Keep only the code in the last pair of parentheses, e.g. W4537
    $page_name =~ s/^PRODUCT:.*\((.*?)\)$/$1/;
    print "$page_name\t$page_views\n";
}
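The Perl script and the DIH config use the same pattern, but in Perl the slashes are delimiters of the `s///` operator, whereas the `regex` attribute is presumably handed to Java's `java.util.regex` as-is, where `/` is an ordinary character. The sketch below illustrates the difference with Python's `re`, which likewise has no delimiters; it assumes the transformer performs a replace-all style substitution with `replaceWith="$1"`. `part_code`, `PERL_STYLE` and `BARE` are hypothetical names.

```python
import re

# As written in data-config.xml: the slashes are literal characters here.
PERL_STYLE = r"/^PRODUCT:.*\((.*?)\)$/"
# The same pattern with the Perl-style delimiters removed.
BARE = r"^PRODUCT:.*\((.*?)\)$"


def part_code(page_name, pattern):
    """Substitute the whole match with capture group 1 (like replaceWith="$1")."""
    return re.sub(pattern, r"\1", page_name)


page_name = "PRODUCT: BURLAP POTATO SACKS (PACK OF 12) (W4537)"
print(part_code(page_name, PERL_STYLE))  # no match, so the value comes back unchanged
print(part_code(page_name, BARE))        # W4537
```

At minimum, the slashed version keeps the substitution from ever producing the part code.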

From the logs:

INFO: Loading DIH Configuration: data-config.xml
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.DataImporter loadDataConfig
INFO: Data Configuration loaded successfully
Aug 24, 2012 3:53:10 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=2
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.SimplePropertiesWriter readIndexerProperties
INFO: Read dataimport.properties
Aug 24, 2012 3:53:10 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [ssww] REMOVING ALL DOCUMENTS FROM INDEX
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.URLDataSource getData
FINE: Accessing URL: https://welcome.coremetrics.com/analyticswebapp/api/1.0/report-data/contentcategory/bypage.ftl?clientId=*****&username=***&format=XML&userAuthKey=******&language=en_US&viewID=9475540&period_a=M20110930
Aug 24, 2012 3:53:10 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
Aug 24, 2012 3:53:12 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=1
Aug 24, 2012 3:53:14 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=1
Aug 24, 2012 3:53:16 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
Aug 24, 2012 3:53:18 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
Aug 24, 2012 3:53:20 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
Aug 24, 2012 3:53:22 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
Aug 24, 2012 3:53:24 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
Aug 24, 2012 3:53:27 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
Aug 24, 2012 3:53:28 PM org.apache.solr.handler.dataimport.DocBuilder finish
INFO: Import completed successfully
Aug 24, 2012 3:53:28 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
Aug 24, 2012 3:53:28 PM org.apache.solr.core.SolrDeletionPolicy onCommit
INFO: SolrDeletionPolicy.onCommit: commits:num=2
commit{dir=/var/lib/tomcat6/solr/apache-solr-4.0.0-BETA/core1/data/index,segFN=segments_2b,generation=83,filenames=[segments_2b]
commit{dir=/var/lib/tomcat6/solr/apache-solr-4.0.0-BETA/core1/data/index,segFN=segments_2c,generation=84,filenames=[segments_2c]
Aug 24, 2012 3:53:28 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 84
Aug 24, 2012 3:53:28 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@ff33d42 main
Aug 24, 2012 3:53:28 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Aug 24, 2012 3:53:28 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@ff33d42 main{StandardDirectoryReader(segments_2c:323)}
Aug 24, 2012 3:53:28 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Aug 24, 2012 3:53:28 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [ssww] Registered new searcher Searcher@ff33d42 main{StandardDirectoryReader(segments_2c:323)}
Aug 24, 2012 3:53:28 PM org.apache.solr.handler.dataimport.SimplePropertiesWriter readIndexerProperties
INFO: Read dataimport.properties
Aug 24, 2012 3:53:28 PM org.apache.solr.handler.dataimport.SimplePropertiesWriter persist
INFO: Wrote last indexed time to dataimport.properties
Aug 24, 2012 3:53:28 PM org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:0:17.918
Aug 24, 2012 3:53:28 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [ssww] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=2 {deleteByQuery=*:*,commit=} 0 2
