Hi,

The content field is not returned in search results, even though the
following field definition has been added, using curl, to the schema
resource named in 'managedSchemaResourceName'.

<field name="content" stored="true" type="text_general" indexed="true"/>

I'm using the schema from ManagedIndexSchemaFactory.

The ExtractingRequestHandler is already defined in solrconfig.xml by
default, and I'm using the ManagedIndexSchemaFactory. I added the content
field definition so that the indexed content is shown when a user runs a
query, since by default the content is not shown. I added it using curl
as follows:

$ curl -X POST -H 'Content-type:application/json' --data-binary '{
  "update-field": {
    "name": "text",
    "type": "text_general",
    "stored": true,
    "indexed": true,
    "storeOffsetsWithPositions": true
  }
}' http://localhost:8983/solr/collection1/schema
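
To confirm what the managed schema actually holds for a field after a change like the one above, the definition can be read back through the Schema API. This is only a minimal sketch: the helper names (field_url, field_flags, fetch_field) are made up for illustration, the base URL and collection name are taken from my setup, and it assumes the showDefaults parameter is available so that properties inherited from the field type are included.

```python
import json
from urllib.request import urlopen


def field_url(base_url, collection, field_name):
    """Build the Schema API URL for a single field definition."""
    # showDefaults=true (assumed available) asks Solr to include properties
    # inherited from the field type, not only the explicitly set ones.
    return (f"{base_url.rstrip('/')}/{collection}/schema/fields/"
            f"{field_name}?wt=json&showDefaults=true")


def field_flags(schema_response):
    """Extract the 'stored' and 'indexed' flags from a field response.

    A flag is None when the response does not report it.
    """
    field = schema_response.get("field", {})
    return {"stored": field.get("stored"), "indexed": field.get("indexed")}


def fetch_field(base_url, collection, field_name):
    """Fetch one field definition; needs a running Solr instance.

    urlopen raises HTTPError if Solr answers with an error status,
    for example when the field is not defined.
    """
    with urlopen(field_url(base_url, collection, field_name)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical usage against the local instance from this post.
    reply = fetch_field("http://localhost:8983/solr", "collection1", "content")
    print(field_flags(reply))
```

Running the same check for both "content" and "text" shows which of the two fields the schema really defines as stored.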

I indexed the document using the following command:
java -Dc=collection1 -Dauto=true -jar example\exampledocs\post.jar example\exampledocs\solr-word.pdf

The document is indexed successfully, and when I search for any word from
the content, the search returns the document ID and other information such
as subject, author, date, etc. However, the content of the document is not
shown.

If I don't request the content field in the fl parameter, this is what I
get:

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "indent": "true",
      "q": "*:*",
      "_": "1425362114731",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {
        "id": "C:\\Users\\GHI\\solr-5.0.0\\example\\exampledocs\\solr-word.pdf",
        "meta_save_date": [
          "2008-11-13T00:00:00Z"
        ],
        "dc_subject": [
          "solr, word, pdf"
        ],
        "subject": [
          "solr word"
        ],
        "author": [
          "Grant Ingersoll"
        ],
        "dcterms_created": [
          "2008-11-13T00:00:00Z"
        ],
        "date": [
          "2008-11-13T00:00:00Z"
        ],
        "creator": [
          "Grant Ingersoll"
        ],
        "creation_date": [
          "2008-11-13T00:00:00Z"
        ],
        "title": [
          "solr-word"
        ],
        "meta_author": [
          "Grant Ingersoll"
        ],
        "stream_content_type": [
          "application/pdf"
        ],
        "created": [
          "Thu Nov 13 13:35:51 UTC 2008"
        ],
        "stream_size": [
          21052
        ],
        "meta_keyword": [
          "solr, word, pdf"
        ],
        "cp_subject": [
          "solr word"
        ],
        "dc_format": [
          "application/pdf; version=1.3"
        ],
        "xmp_creatortool": [
          "Microsoft Word"
        ],
        "resourcename": [
          "C:\\Users\\GHI\\solr-5.0.0\\example\\exampledocs\\solr-word.pdf"
        ],
        "keywords": [
          "solr, word, pdf"
        ],
        "last_save_date": [
          "2008-11-13T00:00:00Z"
        ],
        "dc_title": [
          "solr-word"
        ],
        "dcterms_modified": [
          "2008-11-13T00:00:00Z"
        ],
        "meta_creation_date": [
          "2008-11-13T00:00:00Z"
        ],
        "dc_creator": [
          "Grant Ingersoll"
        ],
        "pdf_pdfversion": [
          1.3
        ],
        "last_modified": [
          "2008-11-13T00:00:00Z"
        ],
        "aapl_keywords": [
          "solr, word, pdf"
        ],
        "x_parsed_by": [
          "org.apache.tika.parser.DefaultParser",
          "org.apache.tika.parser.pdf.PDFParser"
        ],
        "modified": [
          "2008-11-13T00:00:00Z"
        ],
        "xmptpg_npages": [
          1
        ],
        "pdf_encrypted": [
          false
        ],
        "producer": [
          "Mac OS X 10.5.5 Quartz PDFContext"
        ],
        "content_type": [
          "application/pdf"
        ],
        "_version_": 1494155334466404300
      },
      {
        "id": "C:\\Users\\GHI\\solr-5.0.0\\example\\exampledocs\\solr-word2.pdf",
        "meta_save_date": [
          "2015-02-25T00:00:00Z"
        ],
        "author": [
          "GHI"
        ],
        "dcterms_created": [
          "2015-02-25T00:00:00Z"
        ],
        "date": [
          "2015-02-25T00:00:00Z"
        ],
        "creator": [
          "GHI"
        ],
        "creation_date": [
          "2015-02-25T00:00:00Z"
        ],
        "title": [
          "This is another test of PDF extraction in Solr"
        ],
        "meta_author": [
          "GHI"
        ],
        "stream_content_type": [
          "application/pdf"
        ],
        "created": [
          "Wed Feb 25 08:32:19 UTC 2015"
        ],
        "stream_size": [
          10345
        ],
        "dc_format": [
          "application/pdf; version=1.4"
        ],
        "xmp_creatortool": [
          "PDFCreator Version 1.3.2"
        ],
        "resourcename": [
          "C:\\Users\\GHI\\solr-5.0.0\\example\\exampledocs\\solr-word2.pdf"
        ],
        "last_save_date": [
          "2015-02-25T00:00:00Z"
        ],
        "dc_title": [
          "This is another test of PDF extraction in Solr"
        ],
        "dcterms_modified": [
          "2015-02-25T00:00:00Z"
        ],
        "meta_creation_date": [
          "2015-02-25T00:00:00Z"
        ],
        "dc_creator": [
          "GHI"
        ],
        "pdf_pdfversion": [
          1.4
        ],
        "last_modified": [
          "2015-02-25T00:00:00Z"
        ],
        "x_parsed_by": [
          "org.apache.tika.parser.DefaultParser",
          "org.apache.tika.parser.pdf.PDFParser"
        ],
        "modified": [
          "2015-02-25T00:00:00Z"
        ],
        "xmptpg_npages": [
          1
        ],
        "pdf_encrypted": [
          false
        ],
        "producer": [
          "GPL Ghostscript 9.05"
        ],
        "content_type": [
          "application/pdf"
        ],
        "_version_": 1494155342991327200
      }
    ]
  }
}

If I request the content field in the fl parameter, this is what I get:

{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "fl": "content",
      "indent": "true",
      "q": "*:*",
      "_": "1425362147661",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {},
      {}
    ]
  }
}


If I run a query like q=content:[* TO *]&fl=id,content, no documents match:

{
  "responseHeader":{
    "status":0,
    "QTime":5,
    "params":{
      "fl":"id,content",
      "q":"content:[* TO *]"}},
  "response":{"numFound":0,"start":0,"docs":[]
  }}
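
The checks above can also be scripted. The following is a rough sketch, not a definitive tool: select_url and ids_missing_field are hypothetical helper names, and the base URL and collection name are the ones from my local setup. It builds the same select request and lists the documents in a response that carry no value for a given field.

```python
from urllib.parse import urlencode


def select_url(base_url, collection, query, fields):
    """Build a /select URL with q, fl and wt=json, URL-encoding the values."""
    params = urlencode({"q": query, "fl": ",".join(fields), "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/select?{params}"


def ids_missing_field(select_response, field_name):
    """Return the ids of returned documents that lack field_name.

    The id is None for a document that was returned without an id,
    for example when fl did not include it.
    """
    docs = select_response.get("response", {}).get("docs", [])
    return [doc.get("id") for doc in docs if field_name not in doc]


if __name__ == "__main__":
    # Same range query as above: matches only documents with a content value.
    print(select_url("http://localhost:8983/solr", "collection1",
                     "content:[* TO *]", ["id", "content"]))
```

Applied to a q=*:* response with fl=id,content, ids_missing_field(response, "content") would list every document whose content was not stored.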


I'm able to get this to work in Solr 4.10.1, but it's not working in
Solr 5.0. Is there anything I need to take note of in Solr 5.0 that is
different from previous versions of Solr?


Regards,

Edwin
