I have 6 fields. The text field is the biggest; it contains almost all of the 5000 chars.
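A rough sanity check of why both files end up around 15GB, given that neither format compresses the payload. This is only a sketch: it assumes about one byte per char, and the per-field overheads (40 bytes for xml, 10 for javabin) are illustrative guesses, not measurements. The class and method names are made up for this example.

```java
// Back-of-envelope size estimate for the indexing files.
// Assumptions (hypothetical, not taken from Solr): one byte per char,
// and a fixed framing overhead per field for each format.
public class SizeEstimate {
    // Raw text payload; the same in both formats because neither compresses it.
    static long payloadBytes(long docs, long charsPerDoc) {
        return docs * charsPerDoc;
    }

    // Framing overhead: markup or type/length bytes written around each field.
    static long overheadBytes(long docs, long fieldsPerDoc, long bytesPerField) {
        return docs * fieldsPerDoc * bytesPerField;
    }

    public static void main(String[] args) {
        long docs = 3_000_000L;
        long chars = 5_000L;
        long fields = 6L;

        long payload = payloadBytes(docs, chars);
        // 40 and 10 bytes per field are illustrative guesses only.
        long xmlOverhead = overheadBytes(docs, fields, 40L);
        long binOverhead = overheadBytes(docs, fields, 10L);

        System.out.printf("payload=%d xmlOverhead=%d binOverhead=%d%n",
                payload, xmlOverhead, binOverhead);
    }
}
```

With these numbers the payload alone is 15,000,000,000 bytes, while the assumed overhead is 720,000,000 bytes for xml and 180,000,000 bytes for javabin. Under those assumptions the text field dominates either way, so the two files would differ by only a few percent.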
/Tim

2010/1/27 Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>:
> how many fields are there in each doc? the binary format just reduces
> overhead. it does not touch/compress the payload
>
> 2010/1/27 Tim Terlegård <tim.terleg...@gmail.com>:
>> I have 3 millon documents, each having 5000 chars. The xml file is
>> about 15GB. The binary file is also about 15GB.
>>
>> I was a bit surprised about this. It doesn't bother me much though. At
>> least it performs better.
>>
>> /Tim
>>
>> 2010/1/27 Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>:
>>> if you write only a few docs you may not observe much difference in
>>> size. if you write large no:of docs you may observe a big difference.
>>>
>>> 2010/1/27 Tim Terlegård <tim.terleg...@gmail.com>:
>>>> I got the binary format to work perfectly now. Performance is better
>>>> than with xml. Thanks!
>>>>
>>>> Although, it doesn't look like a binary file is smaller in size than
>>>> an xml file?
>>>>
>>>> /Tim
>>>>
>>>> 2010/1/27 Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>:
>>>>> 2010/1/21 Tim Terlegård <tim.terleg...@gmail.com>:
>>>>>> Yes, it worked! Thank you very much. But do I need to use curl or can
>>>>>> I use CommonsHttpSolrServer or StreamingUpdateSolrServer? If I can't
>>>>>> use BinaryWriter then I don't know how to do this.
>>>>> if your data is serialized using JavaBinUpdateRequestCodec, you may
>>>>> POST it using curl.
>>>>> If you are writing directly , use CommonsHttpSolrServer
>>>>>>
>>>>>> /Tim
>>>>>>
>>>>>> 2010/1/20 Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>:
>>>>>>> 2010/1/20 Tim Terlegård <tim.terleg...@gmail.com>:
>>>>>>>>>>> BinaryRequestWriter does not read from a file and post it
>>>>>>>>>>
>>>>>>>>>> Is there any other way or is this use case not supported? I tried
>>>>>>>>>> this:
>>>>>>>>>>
>>>>>>>>>> $ curl <host>/solr/update/javabin -F stream.file=/tmp/data.bin
>>>>>>>>>> $ curl <host>/solr/update -F stream.body=' <commit />'
>>>>>>>>>>
>>>>>>>>>> Solr did read the file, because solr complained when the file wasn't
>>>>>>>>>> in the format the JavaBinUpdateRequestCodec expected. But no data is
>>>>>>>>>> added to the index for some reason.
>>>>>>>>
>>>>>>>>> how did you create the file /tmp/data.bin ? what is the format?
>>>>>>>>
>>>>>>>> I wrote this in the first email. It's in the javabin format (I think).
>>>>>>>> I did like this (groovy code):
>>>>>>>>
>>>>>>>> fieldId = new NamedList()
>>>>>>>> fieldId.add("name", "id")
>>>>>>>> fieldId.add("val", "9-0")
>>>>>>>> fieldId.add("boost", null)
>>>>>>>> fieldText = new NamedList()
>>>>>>>> fieldText.add("name", "text")
>>>>>>>> fieldText.add("val", "Some text")
>>>>>>>> fieldText.add("boost", null)
>>>>>>>> fieldNull = new NamedList()
>>>>>>>> fieldNull.add("boost", null)
>>>>>>>> doc = [fieldNull, fieldId, fieldText]
>>>>>>>> docs = [doc]
>>>>>>>> root = new NamedList()
>>>>>>>> root.add("docs", docs)
>>>>>>>> fos = new FileOutputStream("data.bin")
>>>>>>>> new JavaBinCodec().marshal(root, fos)
>>>>>>>>
>>>>>>>> /Tim
>>>>>>>>
>>>>>>> JavaBin is a format.
>>>>>>> use this method JavaBinUpdateRequestCodec# marshal(UpdateRequest
>>>>>>> updateRequest, OutputStream os)
>>>>>>>
>>>>>>> The output of this can be posted to solr and it should work
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> -----------------------------------------------------
>>>>>>> Noble Paul | Systems Architect| AOL | http://aol.com
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> -----------------------------------------------------
>>>>> Noble Paul | Systems Architect| AOL | http://aol.com
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> -----------------------------------------------------
>>> Noble Paul | Systems Architect| AOL | http://aol.com
>>>
>>
>
>
>
> --
> -----------------------------------------------------
> Noble Paul | Systems Architect| AOL | http://aol.com
>