Thanks for the quick response. Here are the fields from the schema:
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="original_name" type="text" indexed="true" stored="true"/>
<field name="current" type="boolean" indexed="true" stored="true"/>
<field name="file_association" type="sint" indexed="true" stored="true"/>
<field name="uploaded_by_user" type="text" indexed="true" stored="true"/>
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

I use text as the default content field for the ERH. Here's the config of the ERH:

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="ext.map.Last-Modified">last_modified</str>
    <bool name="ext.ignore.und.fl">true</bool>
  </lst>
</requestHandler>

Here's the output of a curl request with the file:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">650</int></lst><str name="afetest.docx"><?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title/>
</head>
<body>
<div class="package-entry">
<h1>[Content_Types].xml</h1>
<p xmlns="http://www.w3.org/1999/xhtml"/>
</div>
<div class="package-entry">
<h1>_rels/.rels</h1>
<p xmlns="http://www.w3.org/1999/xhtml">&lt;?xml version="1.0" encoding="UTF-8" standalone="yes"?&gt;&#xd;
&lt;Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"&gt;&lt;Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties" Target="docProps/app.xml"/&gt;&lt;Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/&gt;&lt;Relationship Id="rId2" Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/thumbnail" Target="docProps/thumbnail.jpeg"/&gt;&lt;Relationship Id="rId3" Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties" Target="docProps/core.xml"/&gt;&lt;/Relationships&gt;</p>
</div>
<div class="package-entry">
<h1>word/_rels/document.xml.rels</h1>
<p xmlns="http://www.w3.org/1999/xhtml">&lt;?xml version="1.0" encoding="UTF-8" standalone="yes"?&gt;&#xd;
&lt;Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"&gt;&lt;Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable" Target="fontTable.xml"/&gt;&lt;Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/&gt;&lt;Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings" Target="settings.xml"/&gt;&lt;Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings" Target="webSettings.xml"/&gt;&lt;Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme" Target="theme/theme1.xml"/&gt;&lt;/Relationships&gt;</p>
</div>
<div class="package-entry">
<h1>word/document.xml</h1>
<p xmlns="http://www.w3.org/1999/xhtml">Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum</p>
</div>
<div class="package-entry">
<h1>word/theme/theme1.xml</h1>
<p xmlns="http://www.w3.org/1999/xhtml"/>
</div>
<div class="package-entry">
<h1>docProps/thumbnail.jpeg</h1>
</div>
<div class="package-entry">
<h1>word/settings.xml</h1>
<p xmlns="http://www.w3.org/1999/xhtml"/>
</div>
<div class="package-entry">
<h1>word/fontTable.xml</h1>
<p xmlns="http://www.w3.org/1999/xhtml"/>
</div>
<div class="package-entry">
<h1>word/webSettings.xml</h1>
<p xmlns="http://www.w3.org/1999/xhtml"/>
</div>
<div class="package-entry">
<h1>docProps/core.xml</h1>
<p xmlns="http://www.w3.org/1999/xhtml">Joe Doe12009-06-17T20:29:00Z2009-06-17T20:41:00Z</p>
</div>
<div class="package-entry">
<h1>word/styles.xml</h1>
<p xmlns="http://www.w3.org/1999/xhtml"/>
</div>
<div class="package-entry">
<h1>docProps/app.xml</h1>
<p xmlns="http://www.w3.org/1999/xhtml">Normal.dotm1100Microsoft Macintosh Word011false10genfalse0falsefalse12.0000</p>
</div>
</body>
</html>
</str><lst name="afetest.docx_metadata">
<arr name="stream_source_info"><str>myfile</str></arr>
<arr name="stream_name"><str>afetest.docx</str></arr>
<arr name="stream_content_type"><str>application/octet-stream</str></arr>
<arr name="Content-Type"><str>application/zip</str></arr>
<arr name="stream_size"><str>38200</str></arr>
</lst>
</response>

Query looks like:

INFO: [] webapp=/solr path=/select params={wt=standard&rows=10&start=0&explainOther=&hl.fl=&indent=on&q=text:laborum+AND+uploaded_by_user:joe&fl=*,score&qt=standard&version=2.2} hits=0 status=0 QTime=3

Please note that searching solely by "uploaded_by_user:joe" will properly return the document.

Thanks again.

-joe

Grant Ingersoll-6 wrote:
>
> Can you share your schema for the fields you are indexing, the
> configuration of the ExtractingRequestHandler and what your requests
> look like? Also, can you share what the output of the extract only
> stuff looks like?
>
> Also, can you post .doc files to the example per
> http://wiki.apache.org/solr/ExtractingRequestHandler
> ? I was able to do that and search for the doc that I entered and
> it was able to handle both .doc and .docx.
>
> -Grant
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> using Solr/Lucene:
> http://www.lucidimagination.com/search
>

--
View this message in context: http://www.nabble.com/ExtractRequestHandler---not-properly-indexing-office-docs--tp24120125p24124928.html
Sent from the Solr - User mailing list archive at Nabble.com.