Short Version: What do I need to do to successfully query for terms that are adjacent to tabs and newlines (i.e. \t, \n) in an uploaded Word document?
Long Version: I am using Solr 4.6.1. I am running an unmodified version of the example core that is started by running java -jar start.jar in the example directory. The schema.xml in use is example/solr/collection1/conf/schema.xml and is unmodified (it is the one downloaded with the distribution), so I won't post it unless someone says it is helpful.

I upload a Word document to Solr with this request:

    http://localhost:8983/solr/update/extract?literal.id=yabba&uprefix=attr_&fmap.content=attr_content&commit=true

After the upload there are hundreds of tab and newline characters (i.e. \n and \t) in the attr_content field. When a string occurs only once in the document and is adjacent to one of these characters, queries for that term are not successful.

A specific example is an uploaded Word document that after upload contains "Vorname:\t\t\tYasmin" in the attr_content field. The original document contained "Vorname:", then two tab characters, then "Yasmin" (the string "\t" does not appear in the document). The string "Yasmin" appears only in that location in the document. When I query for "Yasmin" with this request, I get no results:

    http://127.0.0.1:8983/solr/collection1/select?q=Yasmin&wt=json&indent=true

Queries for terms that are not next to a \t or a \n are successful.

What can I do so that a query for a term next to a tab or newline will be successful? Must I change the way the document is uploaded? Or change the way the search is performed?
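
For anyone trying to reproduce this, the two requests above can be rebuilt with a small Python sketch. The helper names extract_url and select_url are hypothetical, and a single base URL (localhost) is assumed for both requests. The optional field argument is only a diagnostic idea: which field an unqualified q=Yasmin searches depends on the request handler's default field, which is not shown in this post, so comparing it with a query scoped to attr_content may help narrow things down.

```python
from urllib.parse import urlencode

# Assumed base URL; the post uses localhost for the upload and 127.0.0.1 for the query.
SOLR = "http://localhost:8983/solr"


def extract_url(doc_id):
    """Build the /update/extract URL with the same parameters as in the post."""
    params = {
        "literal.id": doc_id,
        "uprefix": "attr_",
        "fmap.content": "attr_content",
        "commit": "true",
    }
    return f"{SOLR}/update/extract?{urlencode(params)}"


def select_url(term, field=None):
    """Build the /select URL; if field is given, scope the query to that field."""
    q = f"{field}:{term}" if field else term
    params = {"q": q, "wt": "json", "indent": "true"}
    return f"{SOLR}/collection1/select?{urlencode(params)}"


if __name__ == "__main__":
    print(extract_url("yabba"))
    print(select_url("Yasmin"))                  # the query from the post
    print(select_url("Yasmin", "attr_content"))  # diagnostic variant, an assumption
```

If the field-scoped variant returns the document while the unqualified one does not, the tabs and newlines are probably not the cause, and the difference lies in which field is being searched.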