Hi Grant,

Happy to.

Currently we are sending over documents by building a big XML file of
all of the fields of that document. Something like this:

    $document = new Apache_Solr_Document();
    $document->id = apachesolr_document_id($node->nid);
    $document->title = $node->title;
    $document->body = strip_tags($text);
    $document->type = $node->type;
    foreach ($categories as $cat) {
        $document->setMultiValue('category', $cat);
    }

The PHP Client library then takes all of this, and builds it into an
XML payload which we POST over to Solr.
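
To make that concrete, here is a rough sketch of the kind of payload this
turns into. It is not the library's actual code: build_add_xml() is a
hypothetical helper and the field values are made up. The only thing it
relies on is Solr's standard <add><doc><field name="..."> update format.

```php
<?php
// Illustrative only: a simplified stand-in for what the client library does,
// not its actual code. The helper name and sample values are hypothetical.
function build_add_xml(array $fields): string {
    $xml = '<add><doc>';
    foreach ($fields as $name => $values) {
        // A multi-valued field becomes one <field> element per value.
        foreach ((array) $values as $value) {
            $xml .= sprintf(
                '<field name="%s">%s</field>',
                htmlspecialchars($name, ENT_QUOTES, 'UTF-8'),
                htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8')
            );
        }
    }
    return $xml . '</doc></add>';
}

echo build_add_xml([
    'id'       => 'example/node/1',
    'title'    => 'Hello & welcome',
    'category' => ['news', 'solr'],
]), "\n";
```

The point is that multi-valued fields are easy here, since each value is
just another <field> element with the same name.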

For rich file handling, these are the instructions I see:

-----------------------------
Literals

To add in your own metadata, pass in the literal parameter along with the file:

 curl http://localhost:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text\&ext.map.div=foo_t\&ext.capture=div\&ext.boost.foo_t=3\&ext.literal.blah_i=1 \
   -F "tutori...@tutorial.pdf"

-----------------------------

So it seems we can:

a) Refactor the class to not generate XML, but rather to build post
headers for each field.  We would like to avoid this.
b) Instead, I was hoping we could send the XML payload with all the
literal fields defined (like id, type, etc.), plus the post fields
required for the file content and the field it belongs to, in one
request.

Since my understanding is that docs in Solr are immutable, there is no
option:
c) Send the file contents over, give it an ID, and then send over the
rest of the fields and merge them into that ID.
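
For what it's worth, option (a) for single-valued fields would look
something like the sketch below. build_literal_query() is a hypothetical
helper and the values are made up; the "ext.literal." prefix is the one
from the instructions quoted above. It deliberately leaves out
multi-valued fields.

```php
<?php
// Illustrative sketch of option (a), single-valued fields only. The helper
// name and sample values are hypothetical; the "ext.literal." prefix is taken
// from the instructions quoted in this message.
function build_literal_query(array $fields): string {
    $params = [];
    foreach ($fields as $name => $value) {
        $params['ext.literal.' . $name] = $value;
    }
    // http_build_query() URL-encodes the values and joins the pairs with "&".
    return http_build_query($params, '', '&');
}

echo build_literal_query(['id' => 'example/node/1', 'type' => 'story']), "\n";
```

The resulting string would be appended to the /update/extract URL, with
the file itself still sent as the multipart body.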

If the unfortunate answer is (a), then how do we deal with multi-value
fields?  I don't know how to format them given the ext.literal format
above.

Thanks for your help and awesome contributions!

-Jacob




On Fri, Dec 12, 2008 at 4:52 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>
> On Dec 10, 2008, at 10:21 PM, Jacob Singh wrote:
>
>> Hey folks,
>>
>> I'm looking at implementing ExtractingRequestHandler in the Apache_Solr_PHP
>> library, and I'm wondering what we can do about adding meta-data.
>>
>> I saw the docs, which suggest you use different post headers to pass field
>> values along with ext.literal.  Is there any way to use the XmlUpdateHandler
>> instead along with a document?  I'm not sure how this would work, perhaps it
>> would require 2 trips, perhaps the XML would be in the post "content" and
>> the file in something else?  The thing is we would need to refactor the
>> class pretty heavily in this case when indexing RichDocs and we were hoping
>> to avoid it.
>>
>
> I'm not sure I follow how the XmlUpdateHandler plays in, can you explain a 
> little more?  My PHP is weak, but maybe some code will help...
>
>
>> Thanks,
>> Jacob
>> --
>>
>> +1 510 277-0891 (o)
>> +91 9999 33 7458 (m)
>>
>> web: http://pajamadesign.com
>>
>> Skype: pajamadesign
>> Yahoo: jacobsingh
>> AIM: jacobsingh
>> gTalk: jacobsi...@gmail.com
>
>



--

+1 510 277-0891 (o)
+91 9999 33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com
