On 5/6/2016 6:38 AM, Betsey Benagh wrote:
> Since it appears that using a recent version of Tika with Solr is not really
> feasible, I'm trying to run Grobid on my files, and then import the
> corresponding XML into Solr.
>
> I don't see any errors on the post:
>
> bba0124$ bin/post -c lrdtest ~/software/grobid/out/021002_1.tei.xml
> /Library/Java/JavaVirtualMachines/jdk1.8.0_71.jdk/Contents/Home/bin/java
> -classpath /Users/bba0124/software/solr-5.5.0/dist/solr-core-5.5.0.jar
> -Dauto=yes -Dc=lrdtest -Ddata=files org.apache.solr.util.SimplePostTool
> /Users/bba0124/software/grobid/out/021002_1.tei.xml
> SimplePostTool version 5.0.0
> Posting files to [base] url http://localhost:8983/solr/lrdtest/update...
> Entering auto mode. File endings considered are
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> POSTing file 021002_1.tei.xml (application/xml) to [base]
> 1 files indexed.
> COMMITting Solr index changes to
> http://localhost:8983/solr/lrdtest/update...
> Time spent: 0:00:00.027
>
> But the documents don't seem to show up in the index, either.
>
> Additionally, if I try uploading the documents using the web UI, they
> appear to upload successfully,
>
> Response:{
>   "responseHeader": {
>     "status": 0,
>     "QTime": 7
>   }
> }
>
> But aren't in the index.
>
> What am I missing?

The way that you have used bin/post assumes that the XML is in the Solr
XML update format. Is your XML file in that format, or is it something
else generated by Tika? A 'bad' XML file will not necessarily throw an
error. It might simply be ignored because it does not contain any actions
for Solr to process.

https://wiki.apache.org/solr/UpdateXmlMessages

If it's some other kind of XML data generated by Tika, then I am not sure
what you need to do in order to get the information into Solr. Perhaps it
needs to be sent through the /update/extract handler (instead of /update),
or maybe you will need to use DIH to run it through the
XPathEntityProcessor.

Thanks,
Shawn
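
For illustration only, here is a minimal sketch of what "Solr XML update format" means in practice: the /update handler expects an <add> element wrapping one or more <doc> elements, each made of <field name="..."> entries. The Python below builds such a message from a TEI file. The function name tei_to_solr_add and the field names id, title and text are assumptions, not anything from your setup, and must match whatever your schema actually defines. The XPath locations assume the standard TEI namespace and header layout.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; assumed to be what the input file declares.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_to_solr_add(root, doc_id):
    """Return a Solr <add><doc> update message built from a parsed TEI root.

    The field names (id, title, text) are hypothetical and must exist
    in the target collection's schema.
    """
    # Title from the TEI header, if present.
    title_el = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    title = "".join(title_el.itertext()).strip() if title_el is not None else ""

    # All text under <text>, with runs of whitespace collapsed.
    body_el = root.find(".//tei:text", TEI_NS)
    body = " ".join("".join(body_el.itertext()).split()) if body_el is not None else ""

    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (("id", doc_id), ("title", title), ("text", body)):
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value
    return ET.tostring(add, encoding="unicode")
```

To try it, parse the file with ET.parse(path).getroot(), write the returned string to a new .xml file, and post that file with bin/post -c lrdtest as before. Whether the documents then show up still depends on the schema accepting those fields.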