bq: Is this expected behavior where it returns only a subset of the documents it has found?
No. But there is _so_ much you're leaving out here that it's impossible to say much.

bq: I've indexed a lot of documents (*.docx & *.vsd).

How? Tika? ExtractingRequestHandler? Some custom code? What fields from these docs are mapped to what fields in Solr? How are those fields analyzed?

bq: "q":"NS Finance 9.2",

This parses as default_search_field:NS Finance 9.2 (three separate terms, not a phrase), or perhaps it goes against edismax and is searched across multiple fields. Or.... Add &debug=query to see how it is actually parsed. The text won't be found at all if it is, say, a title that is mapped to a different field (and not put into a "bag of words" field by a copyField directive).

If much of that is gibberish, you have a sense of how impossible it is to say much without knowing a lot about your setup.

My point is that you cannot say "I know the text is in there" and expect anything, really. You have to be able to say "I know the text is going into field X. Field X is defined as fieldType Y. My query is parsed as Z" to know whether these docs should be found.

And that presupposes you're even able to predict that the text you "know" is in the document is being extracted. PDF extraction, for instance (I know you're not indexing PDFs, just sayin'), can be tuned to consider how much space is between letters when deciding whether to squash them together, so depending on the settings 'e r i c k' could be either 5 individual letters or one 5-letter word. And it would change if there were a little more space between the letters......

Here's a sample Solr program that uses Tika to extract text from documents; it might help you figure out what's actually happening if you're using ExtractingRequestHandler to ingest data.

Best,
Erick

On Tue, Oct 17, 2017 at 4:53 PM, Phillip Wu <phillip...@unsw.edu.au> wrote:
> Hi,
> I've indexed a lot of documents (*.docx & *.vsd).
>
> When I run a query from the website it returns only a small proportion of the
> data in the index:
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":66,
>     "params":{
>       "q":"NS Finance 9.2",
>       "fl":"id,date",
>       "start":"0",
>       "_":"1508193512223"}},
>   "response":{"numFound":2053,"start":0,"docs":[
>     ..here it returns only 9 documents of type *.doc
>   ]
>
> I know the search text occurs in some of the *.vsd files so I re-run:
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":754,
>     "params":{
>       "q":"\"NS Finance 9.2\" id:*FIN*.vsd",
>       "fl":"id,date",
>       "_":"1508193512223"}},
>   "response":{"numFound":9,"start":0,"docs":[
>     ..here it returns only 9 documents of *.vsd
>   ]
>
> Is this expected behavior where it returns only a subset of the documents it
> has found?
>
> I want all the documents that contain the query string.
> How do I tell Solr to return ALL documents containing the string?
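
P.S. For what it's worth, the &debug=query check mentioned in the reply can be sketched as below. This is only an illustration under assumptions: the base URL, the collection name "docs", and the helper names are all hypothetical and not taken from your setup. It builds a /select URL with debug=query and pulls the "parsedquery" entry out of the "debug" section of the JSON response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_debug_url(base_url, collection, query, fields="id,date"):
    """Build a /select URL that asks Solr to report how it parsed the query.

    base_url and collection are placeholders; substitute your own values.
    """
    params = {"q": query, "fl": fields, "debug": "query", "wt": "json"}
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


def parsed_query(response):
    """Return the parsed query from a decoded Solr response, or None if absent."""
    return response.get("debug", {}).get("parsedquery")


def fetch_parsed_query(base_url, collection, query):
    """Run the query against a live Solr instance (network call)."""
    with urlopen(build_debug_url(base_url, collection, query)) as resp:
        return parsed_query(json.load(resp))


if __name__ == "__main__":
    # Hypothetical host and collection name, for illustration only.
    print(build_debug_url("http://localhost:8983/solr", "docs", "NS Finance 9.2"))
```

Comparing the parsed query for the unquoted and the quoted form of the search string shows which fields are actually being searched and whether the terms are treated as a phrase.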