On 29/01/2010, amoldavsky <[email protected]> wrote:
>
> Hi Oleg,
>
> Let me rephrase the question in better terms:
> If the server document is Y and buffer size is X, let's even assume that Y =
> kX where X < Y, is it possible that any buffer 0 < x < (k-1) will not be
> fully filled?
Yes. Remember that the TCP segments carrying the HTTP response may be
split up in transit, so any read, not just the last one, can come up short.
However, even without that, it's never safe to assume that a buffer is filled.
That's what the return value from read(buffer) is for - it tells you
how many bytes were actually read into the buffer.
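
On the earlier question of how to fill the whole buffer: a single read()
can't be forced to do it, but you can loop until the buffer is full or the
stream ends. A minimal sketch, not specific to HttpClient; readFully and the
trickle() test stream are names made up for this example:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadFullyDemo {

    /**
     * Keeps calling read() until the buffer is full or the stream ends.
     * Returns the number of bytes actually stored (0 means end of stream).
     */
    public static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break; // end of stream: the last buffer may be short
            }
            total += n;
        }
        return total;
    }

    /** Hypothetical stream that returns at most 3 bytes per read(), like a slow socket. */
    public static InputStream trickle(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        byte[] buf = new byte[8];

        // A single read() returns a short count even though more data exists.
        int single = trickle(data).read(buf, 0, buf.length);

        // Looping fills the buffer; only the final chunk is short.
        InputStream in = trickle(data);
        int first = readFully(in, buf);
        int second = readFully(in, buf);
        int third = readFully(in, buf);

        System.out.println(single + " " + first + " " + second + " " + third);
        // prints "3 8 2 0"
    }
}
```

Either way, build the String only from the bytes that were reported as read,
never from the whole array.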
> Thanks!
> -Assaf
>
>
>
> Ken Krugler wrote:
> >
> >
> > On Jan 28, 2010, at 10:09pm, amoldavsky wrote:
> >
> >>
> >> Hi Oleg,
> >> Thank you for the quick reply.
> >>
> >> So if there is a possibility that not the whole buffer is filled, how can I
> >> ensure or force HttpClient to fill the whole buffer? Should I maybe avoid
> >> Stream Readers altogether?
> >
> > If bufferSize is X, and the server document you're fetching has Y
> > bytes, then what do you mean by "force HttpClient to fill the whole
> > buffer"?
> >
> > At a minimum, you'd want
> >
> >     int bytesRead = chunkedIns.read(tmp);
> >     if (bytesRead != -1) {
> >         return new String(tmp, 0, bytesRead);
> >     }
> >
> > But that also uses the platform default encoding for the character
> > set, which often won't be correct.
> >
> > -- Ken
> >
> >>
> >> olegk wrote:
> >>>
> >>> On Wed, 2010-01-27 at 20:24 -0800, amoldavsky wrote:
> >>>> Hi
> >>>>
> >>>> I have coded a simple file downloader using HttpClient 4.0.
> >>>> It works fine but there is something wrong with the String encoding or the
> >>>> buffer stream. The problem is that there are long sequences of "NULL" (ASCII
> >>>> code 00) throughout the final file, like this:
> >>>> http://old.nabble.com/file/p27350930/httpclient_error01.jpg
> >>>> http://old.nabble.com/file/p27350930/httpclient_error02.jpg
> >>>>
> >>>> Here is the main code:
> >>>>
> >>>> public String getChunk(String url, int bufferSize) throws HTTPClientException
> >>>> {
> >>>>     if(!chunkedStarted)
> >>>>     {
> >>>>         chunkedIns = getInputStream(url);
> >>>>         chunkedStarted = true;
> >>>>     }
> >>>>
> >>>>     byte[] tmp = new byte[bufferSize];
> >>>>     try
> >>>>     {
> >>>>         if(chunkedIns.read(tmp) != -1)
> >>>>         {
> >>>
> >>> What makes you think that the entire buffer will be filled with data?
> >>>
> >>> Oleg
> >>>
> >>>
> >>>>             return new String(tmp);
> >>>>         }
> >>>>         else
> >>>>         {
> >>>>             finish();
> >>>>             return null;
> >>>>         }
> >>>>     }
> >>>>     catch(IOException e)
> >>>>     {
> >>>>         HTTPClientException e2 = new HTTPClientException(e.getMessage());
> >>>>         e2.setStackTrace(e.getStackTrace());
> >>>>         throw e2;
> >>>>     }
> >>>> }
> >>>>
> >>>> public void finish()
> >>>> {
> >>>>     // do some cleaning
> >>>> }
> >>>>
> >>>> private InputStream getInputStream(String url) throws HTTPClientException
> >>>> {
> >>>>     InputStream instream = null;
> >>>>
> >>>>     httpClient = new DefaultHttpClient();
> >>>>     httpClient.getParams().setParameter("http.useragent", AGENT_NAME);
> >>>>
> >>>>     HttpGet httpGet = new HttpGet(url);
> >>>>     HttpResponse response = null;
> >>>>
> >>>>     try
> >>>>     {
> >>>>         response = httpClient.execute(httpGet);
> >>>>         HttpEntity entity = response.getEntity();
> >>>>
> >>>>         if(entity != null)
> >>>>         {
> >>>>             instream = entity.getContent();
> >>>>         }
> >>>>     }
> >>>>     catch(ClientProtocolException e)
> >>>>     {
> >>>>         HTTPClientException e2 = new HTTPClientException(e.getMessage());
> >>>>         e2.setStackTrace(e.getStackTrace());
> >>>>         throw e2;
> >>>>     }
> >>>>     catch(IOException e)
> >>>>     {
> >>>>         HTTPClientException e2 = new HTTPClientException(e.getMessage());
> >>>>         e2.setStackTrace(e.getStackTrace());
> >>>>         throw e2;
> >>>>     }
> >>>>
> >>>>     return instream;
> >>>> }
> >>>>
> >>>> getChunk and getInputStream could basically be one method, but I need to
> >>>> split them for internal convenience; that does not change the
> >>>> functionality as a whole.
> >>>>
> >>>> It seems like either the conversion from bytes to string is a problem:
> >>>> return new String(tmp);
> >>>>
> >>>> or that the buffer is not getting filled to the end. The latter could not be
> >>>> possible because the files are ~30MB each and the buffer size is 2KB.
> >>>>
> >>>> I have attached the file, it's a CSV (shortened to ~6KB); note the long
> >>>> white space between some of the URLs. If you just remove it, the URL makes
> >>>> sense.
> >>>> http://old.nabble.com/file/p27350930/datafeed.csv
> >>>>
> >>>> Where can this white space (null) come from?
> >>>>
> >>>> Thanks!
> >>>
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: [email protected]
> >>> For additional commands, e-mail: [email protected]
> >>>
> >>>
> >>>
> >>
> >> --
> >> View this message in context:
> >> http://old.nabble.com/HttpClient-4.0-encoding-madness-tp27350930p27366928.html
> >> Sent from the HttpClient-User mailing list archive at Nabble.com.
> >>
> >>
> >>
> >
> > --------------------------------------------
> > Ken Krugler
> > +1 530-210-6378
> > http://bixolabs.com
> > e l a s t i c w e b m i n i n g
> >
> >
> >
> >
> >
> >
>
> --
>
> View this message in context:
> http://old.nabble.com/HttpClient-4.0-encoding-madness-tp27350930p27377093.html
>
> Sent from the HttpClient-User mailing list archive at Nabble.com.
>
>
>
>
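
On Ken's point about the platform default encoding: decoding each byte chunk
on its own can also split a multi-byte character across two chunks. One way
around both problems is to let an InputStreamReader with an explicit charset
do the decoding. A rough sketch only; UTF-8 is an assumption here (the real
charset should come from the response's Content-Type header), and DecodeDemo
and readAll are hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {

    /** Reads the whole stream as text with the given charset, chunkSize chars at a time. */
    public static String readAll(InputStream in, Charset charset, int chunkSize)
            throws IOException {
        Reader reader = new InputStreamReader(in, charset);
        StringBuilder out = new StringBuilder();
        char[] chunk = new char[chunkSize];
        int n;
        while ((n = reader.read(chunk)) != -1) {
            out.append(chunk, 0, n); // append only the chars actually read
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed charset for the demo; e-acute and i-diaeresis take 2 bytes each in UTF-8.
        Charset utf8 = StandardCharsets.UTF_8;
        String original = "caf\u00e9,na\u00efve";
        byte[] bytes = original.getBytes(utf8);

        String decoded = readAll(new ByteArrayInputStream(bytes), utf8, 4);
        System.out.println(decoded.equals(original)); // prints "true"
    }
}
```

The reader buffers any incomplete byte sequence until the rest arrives, so
the chunk size no longer affects the decoded text.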