Hello,

let's say that you have indexed hundreds of PDFs using the following curl 
command:

curl -Ss -X POST \
'http://mysolr:8990/solr/core0/update/extract?extractFormat=text&wt=json&literal.url=/path/to/the/pdf.pdf'

The PDF's contents are now stored in core0's "content" field.

I wonder how you create facets based on the field's contents, if you don't know 
in advance what it contains (unless you have compiled a list of 
frequently-occurring words in the PDFs, after reading them.)
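
For illustration, one possible approach is Solr's field faceting, which 
returns the most frequent indexed terms of a field together with document 
counts, so no word list is needed up front. This is only a minimal sketch: it 
assumes the "content" field is indexed and tokenized, facet_url is a 
hypothetical helper name, and the limit of 20 is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: builds a field-faceting query URL for core0.
# $1 = field to facet on, $2 = maximum number of terms to return.
facet_url() {
  base='http://mysolr:8990/solr/core0/select'
  # rows=0 suppresses the documents themselves; only facet counts come back.
  printf '%s?q=*:*&rows=0&wt=json&facet=true&facet.field=%s&facet.limit=%s&facet.mincount=1\n' \
    "$base" "$1" "$2"
}

# Print the URL that asks for the 20 most frequent terms in "content".
facet_url content 20

# To run the query against a live server (assumption: it is reachable):
#   curl -Ss "$(facet_url content 20)"
```

On a general-purpose text field the top terms are likely to be very common 
words unless the field's analyzer removes stop words, so the result may need 
filtering before it is usable as a set of facets.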

Many thanks.

Philippe

