Hello Erick,
Thank you for your fast answer.
Maybe I didn't explain my question clearly.
I want to index many files into one index entity. I want the same behavior as
any other multivalued field, where several values can be indexed under one unique id.
So I think every ContentStreamUpdateRequest represents one index entity, doesn't
it? And with each addContentStream call I add one file to this entity.
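To make the intended behavior concrete, here is a toy sketch in plain Java (not SolrJ; the class and method names are made up for illustration): one id accumulates one value per added file, the way a multivalued field does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the behavior I want: one index entity (one unique id)
// collecting one value per attached file, like a multivalued field.
public class MultiValuedSketch {
    static final Map<String, List<String>> entity = new HashMap<>();

    static void addFile(String id, String fileName) {
        // values accumulate under the same id instead of replacing each other
        entity.computeIfAbsent(id, k -> new ArrayList<>()).add(fileName);
    }

    public static void main(String[] args) {
        addFile("entity-1", "a.pdf");
        addFile("entity-1", "b.pdf");
        addFile("entity-1", "c.pdf");
        // one entity, three file values
        System.out.println(entity.get("entity-1"));
    }
}
```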
Thank you and best regards,
Mark
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, 23 May 2013 14:11
To: solr-user@lucene.apache.org
Subject: Re: index multiple files into one index entity
I just skimmed your post, but I'm responding to the last bit.
If you have <uniqueKey> defined as "id" in schema.xml then no, you cannot
have multiple documents with the same ID.
Whenever a new doc comes in, it replaces the old doc with that ID.
You can remove the <uniqueKey> definition and do what you want, but there
are very few Solr installations with no <uniqueKey>, and it's probably a
better idea to make your ids truly unique.
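To sketch that overwrite behavior (plain Java, not actual Solr code; names are illustrative): a uniqueKey field behaves like a map keyed by id, so repeated adds with the same id leave exactly one document.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of uniqueKey semantics: an add with an existing id
// replaces the stored document rather than appending to it.
public class UniqueKeySketch {
    static final Map<String, String> index = new HashMap<>();

    static void add(String id, String doc) {
        index.put(id, doc); // same id -> the old document is gone
    }

    public static void main(String[] args) {
        add("26afa5dc", "file-1");
        add("26afa5dc", "file-2");
        add("26afa5dc", "file-3");
        // only the last add survives
        System.out.println(index.size() + " " + index.get("26afa5dc"));
    }
}
```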
Best
Erick
On Thu, May 23, 2013 at 6:14 AM, wrote:
> Hello solr team,
>
> I want to index multiple files into one Solr index entity, with the
> same id. We are using Solr 4.1.
>
>
> I tried it with the following source fragment:
>
> public void addContentSet(ContentSet contentSet) throws SearchProviderException {
>
>     ...
>
>     ContentStreamUpdateRequest csur =
>             generateCSURequest(contentSet.getIndexId(), contentSet);
>     String indexId = contentSet.getIndexId();
>
>     ConcurrentUpdateSolrServer server = serverPool.getUpdateServer(indexId);
>     server.request(csur);
>
>     ...
> }
>
> private ContentStreamUpdateRequest generateCSURequest(String indexId, ContentSet contentSet)
>         throws IOException {
>     ContentStreamUpdateRequest csur =
>             new ContentStreamUpdateRequest(confStore.getExtractUrl());
>
>     ModifiableSolrParams parameters = csur.getParams();
>     if (parameters == null) {
>         parameters = new ModifiableSolrParams();
>     }
>
>     parameters.set("literalsOverride", "false");
>
>     // maps the Tika default content attribute to the attribute named 'fulltext'
>     parameters.set("fmap.content", SearchSystemAttributeDef.FULLTEXT.getName());
>     // create an empty content stream; this seems necessary for ContentStreamUpdateRequest
>     csur.addContentStream(new ImaContentStream());
>
>     for (Content content : contentSet.getContentList()) {
>         csur.addContentStream(new ImaContentStream(content));
>         // for each content stream add additional attributes
>         parameters.add("literal." + SearchSystemAttributeDef.CONTENT_ID.getName(),
>                 content.getBinaryObjectId().toString());
>         parameters.add("literal." + SearchSystemAttributeDef.CONTENT_KEY.getName(),
>                 content.getContentKey());
>         parameters.add("literal." + SearchSystemAttributeDef.FILE_NAME.getName(),
>                 content.getContentName());
>         parameters.add("literal." + SearchSystemAttributeDef.MIME_TYPE.getName(),
>                 content.getMimeType());
>     }
>
>     parameters.set("literal.id", indexId);
>
>     // adding some other attributes
>     ...
>
>     csur.setParams(parameters);
>
>     return csur;
> }
>
> During debugging I can see that the method 'server.request(csur)' reads the
> buffer of each ImaContentStream.
> Looking at the Solr Catalina log, I can see that the attached files reach
> the Solr servlet.
>
> INFO: Releasing directory:/data/V-4-1/master0/data/index
> Apr 25, 2013 5:48:07 AM
> org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: [master0] webapp=/solr-4-1 path=/update/extract
> params={literal.searchconnectortest15_c8150e41_cc49_4a ..
> &literal.id=26afa5dc-40ad-442a-ac79-0e7880c06aa1& .
> {add=[26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910940958720),
> 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910971367424),
> 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910976610304),
> 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910983950336),
> 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910989193216),
> 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910995484672)]} 0 58
>
>
> But only the last content in the list gets indexed.
>
>
> My schema.xml has the following field definitions:
>
> required="true" />
> stored="true" multiValued="true"/>
>
> multiValued="true"/>
> multiValued="true"/>
> multiValued="true"/>
> stored="true" multiValued="true"/>
>
> stored="true" multiValued="true"/>
>
>
> I'm using the Tika ExtractingRequestHandler, which can extract text from binary files.
>
>
>
> startup="lazy"
> class="solr.extraction.ExtractingRequestHandler" >
>
> true
> ignored_
>
>
> true
>