Hello, let's say that you have indexed hundreds of PDFs using the following curl command:
curl -Ss -X POST 'http://mysolr:8990/solr/core0/update/extract?extractFormat=text&wt=json&literal.url=/path/to/the/pdf.pdf'

The PDFs' contents are now stored in core0's "content" field. I wonder how you can create facets based on that field's contents if you don't know in advance what it contains (short of reading the PDFs and compiling a list of frequently occurring words).

Many thanks.
Philippe