Hi Andras,

> I added <str name="uprefix">metadata_</str> so that all PDF metadata fields
> would be saved in Solr as "metadata_something" fields.
> The problem is that the "Category" metadata field from the PDF for some
> reason is not prefixed with "metadata_", and
> Solr will merge the "Category" field I have in the schema with the Category
> metadata from the PDF
>

This is the expected behavior, as described in http://wiki.apache.org/solr/ExtractingRequestHandler:

> uprefix=<prefix> - Prefix all fields that are not defined in the schema with
> the given prefix.

Your schema defines a "Category" field, so the PDF's Category metadata is not an undefined field. uprefix is therefore not applied to it, and the value ends up in your existing Category field. You can use the fmap parameter to redirect the Category metadata to another field.

Regards,

Juan

On Thu, Jul 7, 2011 at 10:44 AM, Andras Balogh <and...@reea.net> wrote:

> Hi,
>
> I think this is a bug, but before reporting it to the issue tracker I thought
> I would ask here first.
> The problem is that I have a PDF file which, among other metadata fields such
> as Author, CreatedDate, etc., has a metadata field Category (I can see all
> metadata fields with tika-app.jar started in GUI mode).
> My Solr schema also has a "Category" field, among other fields, and a field
> called "text" that holds the extracted text from the PDF.
> I added <str name="uprefix">metadata_</str> so that all PDF metadata fields
> would be saved in Solr as "metadata_something" fields.
> The problem is that the "Category" metadata field from the PDF for some
> reason is not prefixed with "metadata_", and
> Solr will merge the "Category" field I have in the schema with the Category
> metadata from the PDF, and I get an error like:
> "multiple values encountered for non multiValued field Category"
> I fixed this by patching tika-parsers.jar to ignore the Category metadata in
> org.apache.tika.parser.pdf.PDFParser,
> but this is not a good solution (I don't need that Category metadata, so
> it works for me).
>
> So let me know whether this should be reported as a bug or not.
>
> Regards,
> Andras.
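The fmap redirection suggested in the reply could look roughly like the following solrconfig.xml fragment. This is a minimal sketch, not a tested configuration. It assumes that the handler is registered at /update/extract, that the schema has a dynamic field such as metadata_* (which the existing uprefix setup presumably already relies on), and that lowernames is not enabled. If lowernames=true is set, the metadata name is lowercased before mapping, so the key would be fmap.category instead. The target name metadata_category is only an illustrative choice.

```xml
<!-- Sketch only: handler path and target field name are assumptions -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Prefix metadata fields that are not defined in the schema -->
    <str name="uprefix">metadata_</str>
    <!-- Redirect the PDF's Category metadata away from the schema's Category field;
         assumes a metadata_* dynamic field exists to accept it -->
    <str name="fmap.Category">metadata_category</str>
  </lst>
</requestHandler>
```

The same mapping can also be passed per request as a parameter, for example fmap.Category=metadata_category on the /update/extract URL. That may be more convenient if only some documents carry this metadata. If the value is not needed at all, it could instead be mapped to a field that the schema indexes and stores nowhere.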