Are you using the Apache Tika parser to parse the PDF files?

1) Solr supports parent-child block join, which lets you index the data of
more than one file within a single document object (if that is what you are
looking for):
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers
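
A rough sketch of what the block-join approach in point 1 could look like,
assuming the JSON update format with nested documents under
`_childDocuments_`. The field names (`doc_type`, `file_name`, `content`) and
the ids are made up for illustration and would have to match your own schema:

```python
import json


def build_block(parent_id, files):
    """Return one parent document holding one child document per file.

    `files` is a list of (file_name, extracted_text) pairs. All field
    names here are hypothetical, not taken from any real schema.
    """
    children = [
        {
            "id": f"{parent_id}-{i}",  # each child needs its own unique id
            "doc_type": "file",
            "file_name": name,
            "content": text,
        }
        for i, (name, text) in enumerate(files, start=1)
    ]
    return {
        "id": parent_id,
        "doc_type": "parent",
        "_childDocuments_": children,  # nested docs are indexed as one block
    }


block = build_block(
    "report-1",
    [("a.pdf", "text of a"), ("b.pdf", "text of b")],
)

# Body to POST to the collection's /update handler as application/json.
payload = json.dumps([block])

# Block-join query: return parents whose children match content:text.
parent_query = '{!parent which="doc_type:parent"}content:text'
```

The `which` filter has to match all parent documents and nothing else, which
is why the sketch tags parents and children with a separate `doc_type` value.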

2) If the unique key of the document you are indexing matches that of a
document already in the index, the existing document is overwritten. If you'd
like to do partial updates via curl, here are some examples:
http://yonik.com/solr/atomic-updates/
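
To illustrate the partial update in point 2, here is a minimal sketch that
builds the same kind of request you would send with curl. The URL, the
collection name `mycollection`, the id `doc1` and the `content` field are
assumptions for the example. Atomic updates also generally need the update
log enabled and the other fields stored, so check that against your setup:

```python
import json
import urllib.request


def build_atomic_update(solr_update_url, doc_id, fields):
    """Build a POST request that sets only the given fields on one document.

    Each value is wrapped in {"set": ...}, the atomic-update modifier that
    replaces a field's value while leaving the rest of the document alone.
    """
    doc = {"id": doc_id}
    for name, value in fields.items():
        doc[name] = {"set": value}
    body = json.dumps([doc]).encode("utf-8")
    return urllib.request.Request(
        solr_update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, collection, id and field name.
req = build_atomic_update(
    "http://localhost:8983/solr/mycollection/update?commit=true",
    "doc1",
    {"content": "text extracted from the PDF"},
)

# To actually send it against a running Solr:
# urllib.request.urlopen(req)
```

The equivalent curl call would POST the same JSON body to the same URL with
a `Content-type: application/json` header.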





On Thu, Mar 24, 2016 at 3:43 AM, Jay Parashar <bparas...@slb.com> wrote:

> Hi,
>
> I have couple of questions regarding indexing files (say pdf).
>
> 1)      Is there any way to index more than one file to one document with
> a unique id?
>
> One way I think is to do a “extractOnly” of all the documents and then
> index that extract separately. Is there an easier way?
>
> 2)      If my Solr document has existing fields populated and then I index
> a pdf, it seems it overwrites the document with the end result being just
> the contents of the pdf. I know we can do partial updates using SolrJ but
> is it possible to do partial updates of pdf using curl?
>
>
> Thanks
> Jay
>
