Thanks Jack & Shawn.

As per Jack's comment, if I add update="set" to my "id" field, Solr
does not remove/replace the document:
curl "http://localhost:10000/solr/simple-collection/update?commit=true" \
-H "Content-Type: text/xml" -d '
<?xml version="1.0" encoding="UTF-8"?>
<add>
    <doc>
        <field name="id" update="set">doc1</field>
    </doc>
</add>'

http://localhost:10000/solr/simple-collection_shard1_replica1/select?q=*%3A*&wt=xml&indent=true
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">12</int>
<lst name="params">
<str name="indent">true</str>
<str name="q">*:*</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="1" start="0" maxScore="1.0">
<doc>
<str name="id">doc1</str>
<str name="name_s">Document 1</str>
<arr name="keywords_ss">
<str>A</str>
</arr>
<long name="_version_">1431508208721068032</long>
</doc>
</result>
</response>

As you can see in this example, there was no change, which is exactly
what Jack is saying. Specifying only the "id" in an update is a little
pathological, but anyone who wanted to be extra cautious could prevent
Solr from replacing an existing document by adding update="set" to
their "id" field.

I discovered this behaviour because an ORM tool I am using was
incorrectly issuing a Solr update when none of the fields were
modified, so only the "id" field was sent in the update request.
The result was unexpected: we were losing the contents of documents
in our Solr core.
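
The guard I would want on the client side looks roughly like the sketch
below. It is only an illustration: build_atomic_update is a made-up
helper name, not part of any ORM or Solr client, and it assumes
single-valued string fields. It refuses to build a request when nothing
other than the "id" has changed, and otherwise marks every changed field
with update="set" so the request is treated as an update rather than a
replacing add.

```python
import xml.etree.ElementTree as ET
from typing import Mapping, Optional


def build_atomic_update(doc_id: str, changed: Mapping[str, str]) -> Optional[str]:
    """Hypothetical helper: build an <add> payload, or None if nothing changed."""
    if not changed:
        # An id-only <add> is a plain add and would replace the stored document,
        # so the caller should send nothing at all in this case.
        return None
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    for name, value in changed.items():
        # update="set" is what turns the <add> into an update for this field.
        ET.SubElement(doc, "field", name=name, update="set").text = value
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_atomic_update("doc1", {}))                    # nothing to send
    print(build_atomic_update("doc1", {"keywords_ss": "B"}))  # same shape as step #2
```

The resulting string can be posted with the same curl command used
elsewhere in this thread.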

I have to agree with Shawn that there would be value in having an
<update/> XML element, as this is more intent-revealing. The current
behaviour is really add_or_update: what if I don't want add_or_update
semantics and want update_or_fail instead?
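
One possible way to approximate update_or_fail today, as far as I
understand Solr's optimistic concurrency support, is to send a
"_version_" value of 1 with the update, which is documented to mean "the
document must already exist". The sketch below is untested against the
setup in this thread; build_update_or_fail and post_update are
hypothetical names, and it assumes the update log and "_version_" field
are enabled as in the example configuration.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


def build_update_or_fail(doc_id: str, field: str, value: str) -> str:
    """Hypothetical helper: atomic update that should be rejected if doc_id is absent."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    # Assumption: _version_ = 1 asks Solr to require that the document exists.
    ET.SubElement(doc, "field", name="_version_").text = "1"
    ET.SubElement(doc, "field", name=field, update="set").text = value
    return ET.tostring(add, encoding="unicode")


def post_update(url: str, payload: str) -> int:
    """POST the payload and return the HTTP status code, including error codes."""
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # A version conflict is expected to surface here as an error status.
        return err.code


if __name__ == "__main__":
    print(build_update_or_fail("doc1", "keywords_ss", "B"))
```

If the assumption holds, posting this for an id that is not in the index
should come back as an error (I would expect a 409 conflict) rather
than silently creating a new document.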

Anyway, thanks for the help gentlemen.

On Fri, Apr 5, 2013 at 4:08 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> Since you don't have any "update" attribute specified, you are doing a
> simple "add" - which deletes the old document with that key and replaces it
> with the data from the "add" document.
>
> Again: It is the presence of the "update" that turns the document <add> into
> an "update", otherwise <add> simply replaces any existing document or adds a
> new document.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Curtis Beattie
> Sent: Friday, April 05, 2013 2:52 PM
> To: solr-user@lucene.apache.org
> Subject: Solr 4.2 - Unexpected behaviour when updating a document with only
> id field specified in the update
>
>
> I am experiencing some peculiar behavior when updating a document. I'm
> curious whether this is "working as intended" or whether it is a
> defect. Allow me to articulate the problem using an example (should be
> easily reproducible with the "example" configuration data).
>
> The workflow is as follows:
>
> 1) Create a document with fields: id, name_s and keywords_ss (works as
> expected).
> 2) Update the document by specifying id and replacing keywords_ss
> (works as expected).
> 3) Update the document by only specifying id (unusual behavior:
> document is "wiped")
>
>
> Step #1 - Create the document
> curl "http://localhost:10000/solr/simple-collection/update?commit=true" \
> -H "Content-Type: text/xml" -d '
> <?xml version="1.0" encoding="UTF-8"?>
> <add>
>    <doc>
>        <field name="id">doc1</field>
>        <field name="name_s">Document 1</field>
>        <field name="keywords_ss">A</field>
>    </doc>
> </add>'
>
> http://localhost:10000/solr/simple-collection_shard1_replica1/select?q=*%3A*&wt=xml&indent=true
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">14</int>
> <lst name="params">
> <str name="indent">true</str>
> <str name="q">*:*</str>
> <str name="wt">xml</str>
> </lst>
> </lst>
> <result name="response" numFound="1" start="0" maxScore="1.0">
> <doc>
> <str name="id">doc1</str>
> <str name="name_s">Document 1</str>
> <arr name="keywords_ss">
> <str>A</str>
> </arr>
> <long name="_version_">1431502565339561984</long>
> </doc>
> </result>
> </response>
>
> Step #2 - Update the document specifying id & keywords_ss
> curl "http://localhost:10000/solr/simple-collection/update?commit=true" \
> -H "Content-Type: text/xml" -d '
> <?xml version="1.0" encoding="UTF-8"?>
> <add>
>    <doc>
>        <field name="id">doc1</field>
>        <field name="keywords_ss" update="set">B</field>
>    </doc>
> </add>'
>
> http://localhost:10000/solr/simple-collection_shard1_replica1/select?q=*%3A*&wt=xml&indent=true
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">13</int>
> <lst name="params">
> <str name="indent">true</str>
> <str name="q">*:*</str>
> <str name="wt">xml</str>
> </lst>
> </lst>
> <result name="response" numFound="1" start="0" maxScore="1.0">
> <doc>
> <str name="id">doc1</str>
> <str name="name_s">Document 1</str>
> <arr name="keywords_ss">
> <str>B</str>
> </arr>
> <long name="_version_">1431502700990693376</long>
> </doc>
> </result>
> </response>
>
> Step #3 - Update the document specifying only 'id'
> curl "http://localhost:10000/solr/simple-collection/update?commit=true" \
> -H "Content-Type: text/xml" -d '
> <?xml version="1.0" encoding="UTF-8"?>
> <add>
>    <doc>
>        <field name="id">doc1</field>
>    </doc>
> </add>'
>
> http://localhost:10000/solr/simple-collection_shard1_replica1/select?q=*%3A*&wt=xml&indent=true
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">14</int>
> <lst name="params">
> <str name="indent">true</str>
> <str name="q">*:*</str>
> <str name="wt">xml</str>
> </lst>
> </lst>
> <result name="response" numFound="1" start="0" maxScore="1.0">
> <doc>
> <str name="id">doc1</str>
> <long name="_version_">1431502818264481792</long>
> </doc>
> </result>
> </response>
>
> ---
>
> Now I realize that "updating" a document and specifying only the 'id'
> is pointless, but in this circumstance Solr seems to be deleting the
> 'name_s' field. In fact, all fields except 'id' are lost. The unusual
> behaviour, in my view, is that Solr will perform an update when at
> least one field (other than 'id') is specified, but when only 'id' is
> specified it seems to be deleting and re-adding the document without
> preserving the existing data.
>
> Can someone please comment on this behaviour and indicate whether or
> not it is in fact correct or if it represents a defect?
>
> Thanks,
> --
> Curt



-- 
Curt
