Hi, I am new to Solr and I need to implement a full-text search of some PDF files. The indexing part works out of the box by using bin/post. I can see search results in the admin UI given some queries, though without the matched texts and the context.
Now I am reading this post <http://www.codewrecks.com/blog/index.php/2013/05/27/hilight-matched-text-inside-documents-indexed-with-solr-plus-tika/> for the highlighting part. It was written for an older version of Solr, before the managed schema was available. Before I fully understand what it is doing, I have some questions:

1. He defined two fields:

    <field name="content" type="text_general" indexed="false" stored="true" multiValued="false"/>
    <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

   Why are two fields needed? Can I instead define a single field

    <field name="content" type="text_general" indexed="true" stored="true" multiValued="true"/>

   to capture the full text?

2. How are the fields filled? I don't see relevant information in TikaEntityProcessor's documentation <https://lucene.apache.org/solr/6_6_0/solr-dataimporthandler-extras/org/apache/solr/handler/dataimport/TikaEntityProcessor.html#fields.inherited.from.class.org.apache.solr.handler.dataimport.EntityProcessorBase>. The current text extractor should already be Tika (I can see "x_parsed_by": ["org.apache.tika.parser.DefaultParser","org.apache.tika.parser.pdf.PDFParser"] in the JSON returned by some queries). But even if I define the fields as he describes, I cannot see them as keys in the JSON search results.

3. The _text_ field seems to be a concatenation of other fields. Does it contain the full text? It does not seem to be accessible by default, though.

In short, using The Elements of Statistical Learning <http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf> as an example, how do I highlight the relevant text for the query "SVM"? And if I rename the file to "The Elements of Statistical Learning - Trevor Hastie.pdf" and post it, how do I highlight "Trevor Hastie" for the query "id:Trevor Hastie"?

Thank you.

Best regards,
Ziyuan