In case the exact problem was not clear to anybody: because FileUpload interprets the file data as regular form fields, Solr thinks there are no content streams in the request and throws a "missing_content_stream" exception.
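To make the distinction concrete, here is a rough, self-contained sketch of what differs on the wire. It is not SolrJ, HttpClient, or FileUpload code; the class and method names are made up for illustration. It assumes that the server-side check boils down to "a part without a filename parameter in Content-Disposition is a form field", which is how I understand FileUpload to behave.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only: these names do not exist in SolrJ,
// Commons HttpClient, or Commons FileUpload.
final class MultipartDispositionSketch {

    // Matches the filename parameter of a Content-Disposition header line.
    private static final Pattern FILENAME =
            Pattern.compile(";\\s*filename=\"([^\"]*)\"");

    // Builds the Content-Disposition header line for one multipart part.
    // Passing null for fileName mimics a part sent without the filename value.
    static String dispositionHeader(String fieldName, String fileName) {
        StringBuilder sb = new StringBuilder("Content-Disposition: form-data; name=\"")
                .append(fieldName)
                .append('"');
        if (fileName != null) {
            sb.append("; filename=\"").append(fileName).append('"');
        }
        return sb.toString();
    }

    // Assumed server-side rule: no filename parameter means "regular form field".
    static boolean isFormField(String header) {
        return !FILENAME.matcher(header).find();
    }

    public static void main(String[] args) {
        String without = dispositionHeader("report", null);
        String with = dispositionHeader("report", "report.pdf");
        System.out.println(without + " -> form field? " + isFormField(without));
        System.out.println(with + " -> form field? " + isFormField(with));
    }
}
```

Under that assumption, only the second header is treated as a file upload, so a request made up entirely of parts like the first one would reach Solr with no content streams.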
On Thu, Mar 10, 2011 at 10:59 AM, Karthik Shiraly <karthikshiral...@gmail.com> wrote:

> Hi,
>
> I'm using Solr 1.4.1.
> The scenario involves user uploading multiple files. These have content
> extracted using SolrCell, then indexed by Solr along with other information
> about the user.
>
> ContentStreamUpdateRequest seemed like the right choice for this - use
> addFile() to send file data, and use setParam() to add normal data fields.
>
> However, when I do multiple addFile() to ContentStreamUpdateRequest, I
> observed that at the server side, even the file parts of this multipart post
> are interpreted as regular form fields by the FileUpload component.
> I found that FileUpload does so because the "filename" value in
> "Content-Disposition" headers of each part are not being set.
> Digging a bit further, it seems the actual root cause is in the client side
> solrj API ... the CommonsHttpSolrServer class is not setting "filename"
> value in "Content-Disposition" header while creating multipart Part
> instances (from HttpClient framework).
>
> I solved this problem by a hack - in CommonsHttpSolrServer.request() method
> where the PartBase instances are created, I overrode
> "sendDispositionHeader()" and added "filename" value. That solved the
> problem.
>
> However, my questions are:
> 1. Am I using ContentStreamUpdateRequest wrong, or is this actually a bug?
> Should I be using something else?
>
> 2. My end goal is to map contents of each file to *separate* fields, not a
> common field. Since the regular ExtractingRequestHandler maps all content to
> just one field, I believe I've to create a custom RequestHandler (possibly
> reusing existing SolrCell classes).
> Is this approach right?
>
> Thanks
> Karthik