Re: Fastest way to use solrj

Tim Terlegård Wed, 27 Jan 2010 01:00:04 -0800

I have 6 fields. The text field is the biggest; it contains almost all
of the 5000 chars.


/Tim
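
As a rough sanity check on the sizes quoted below (a sketch, not a
measurement): assuming about one byte per character, 3 million documents
of 5000 chars come to about 15 GB of payload on their own, so per-field
overhead can only be a small part of either file. The per-field overhead
figures (30 bytes for XML markup, 5 bytes for the binary format) and the
class and method names are hypothetical, chosen only for illustration.

```java
// Back-of-the-envelope size estimate. All overhead numbers are assumptions.
class PayloadEstimate {

    // Raw payload in bytes, assuming one byte per character.
    static long payloadBytes(long docs, long charsPerDoc) {
        return docs * charsPerDoc;
    }

    // Payload plus a fixed, hypothetical overhead per field per document.
    static long totalBytes(long docs, long charsPerDoc, int fields, long overheadPerField) {
        return payloadBytes(docs, charsPerDoc) + docs * fields * overheadPerField;
    }

    public static void main(String[] args) {
        long docs = 3_000_000L;   // from the thread
        long chars = 5_000L;      // from the thread
        int fields = 6;           // from the thread

        long payload = payloadBytes(docs, chars);
        long xml = totalBytes(docs, chars, fields, 30L); // assumed XML overhead
        long bin = totalBytes(docs, chars, fields, 5L);  // assumed binary overhead

        System.out.println("payload bytes: " + payload);
        System.out.println("xml bytes:     " + xml);
        System.out.println("binary bytes:  " + bin);
    }
}
```

Under these assumptions both totals stay within a few percent of the raw
payload, which would fit with both files being "about 15GB".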

2010/1/27 Noble Paul നോബിള്‍  नोब्ळ् <noble.p...@corp.aol.com>:
> How many fields are there in each doc? The binary format just reduces
> overhead. It does not touch/compress the payload.
>
> 2010/1/27 Tim Terlegård <tim.terleg...@gmail.com>:
>> I have 3 million documents, each having 5000 chars. The xml file is
>> about 15GB. The binary file is also about 15GB.
>>
>> I was a bit surprised about this. It doesn't bother me much though. At
>> least it performs better.
>>
>> /Tim
>>
>> 2010/1/27 Noble Paul നോബിള്‍  नोब्ळ् <noble.p...@corp.aol.com>:
>>> If you write only a few docs you may not observe much difference in
>>> size. If you write a large number of docs you may observe a big difference.
>>>
>>> 2010/1/27 Tim Terlegård <tim.terleg...@gmail.com>:
>>>> I got the binary format to work perfectly now. Performance is better
>>>> than with xml. Thanks!
>>>>
>>>> Although, it doesn't look like a binary file is smaller in size than
>>>> an xml file?
>>>>
>>>> /Tim
>>>>
>>>> 2010/1/27 Noble Paul നോബിള്‍  नोब्ळ् <noble.p...@corp.aol.com>:
>>>>> 2010/1/21 Tim Terlegård <tim.terleg...@gmail.com>:
>>>>>> Yes, it worked! Thank you very much. But do I need to use curl or can
>>>>>> I use CommonsHttpSolrServer or StreamingUpdateSolrServer? If I can't
>>>>>> use BinaryWriter then I don't know how to do this.
>>>>> If your data is serialized using JavaBinUpdateRequestCodec, you may
>>>>> POST it using curl.
>>>>> If you are writing directly, use CommonsHttpSolrServer.
>>>>>>
>>>>>> /Tim
>>>>>>
>>>>>> 2010/1/20 Noble Paul നോബിള്‍  नोब्ळ् <noble.p...@corp.aol.com>:
>>>>>>> 2010/1/20 Tim Terlegård <tim.terleg...@gmail.com>:
>>>>>>>>>>> BinaryRequestWriter does not read from a file and post it
>>>>>>>>>>
>>>>>>>>>> Is there any other way or is this use case not supported? I tried 
>>>>>>>>>> this:
>>>>>>>>>>
>>>>>>>>>> $ curl <host>/solr/update/javabin -F stream.file=/tmp/data.bin
>>>>>>>>>> $ curl <host>/solr/update -F stream.body=' <commit />'
>>>>>>>>>>
>>>>>>>>>> Solr did read the file, because solr complained when the file wasn't
>>>>>>>>>> in the format the JavaBinUpdateRequestCodec expected. But no data is
>>>>>>>>>> added to the index for some reason.
>>>>>>>>
>>>>>>>>> how did you create the file /tmp/data.bin ? what is the format?
>>>>>>>>
>>>>>>>> I wrote this in the first email. It's in the javabin format (I think).
>>>>>>>> I did it like this (Groovy code):
>>>>>>>>
>>>>>>>>   fieldId = new NamedList()
>>>>>>>>   fieldId.add("name", "id")
>>>>>>>>   fieldId.add("val", "9-0")
>>>>>>>>   fieldId.add("boost", null)
>>>>>>>>   fieldText = new NamedList()
>>>>>>>>   fieldText.add("name", "text")
>>>>>>>>   fieldText.add("val", "Some text")
>>>>>>>>   fieldText.add("boost", null)
>>>>>>>>   fieldNull = new NamedList()
>>>>>>>>   fieldNull.add("boost", null)
>>>>>>>>   doc = [fieldNull, fieldId, fieldText]
>>>>>>>>   docs = [doc]
>>>>>>>>   root = new NamedList()
>>>>>>>>   root.add("docs", docs)
>>>>>>>>   fos = new FileOutputStream("data.bin")
>>>>>>>>   new JavaBinCodec().marshal(root, fos)
>>>>>>>>
>>>>>>>> /Tim
>>>>>>>>
>>>>>>> JavaBin is a format.
>>>>>>> Use this method: JavaBinUpdateRequestCodec#marshal(UpdateRequest
>>>>>>> updateRequest, OutputStream os)
>>>>>>>
>>>>>>> The output of this can be posted to solr and it should work
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> -----------------------------------------------------
>>>>>>> Noble Paul | Systems Architect| AOL | http://aol.com
>>>>>>>
