Re: HttpClient 4.0 encoding madness

sebb Fri, 29 Jan 2010 03:35:43 -0800

On 29/01/2010, Ken Krugler <[email protected]> wrote:
>
>  On Jan 28, 2010, at 10:09pm, amoldavsky wrote:
>
>
> >
> > Hi Oleg,
> > Thank you for the quick reply.
> >
> > So if there is a possibility that the whole buffer is not filled, how can I
> > ensure or force HttpClient to fill the whole buffer? Should I maybe avoid
> > Stream Readers altogether?
> >
>
>  If bufferSize is X, and the server document you're fetching has Y bytes,
> then what do you mean by "force HttpClient to fill the whole buffer"?
>
>  At a minimum, you'd want
>
>  int bytesRead = chunkedIns.read(tmp);
>  if (bytesRead != -1) {
>    return new String(tmp, 0, bytesRead);
>  }
>
>  But that also uses the platform default encoding for the character set,
> which often won't be correct.
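
If the content really is text, one way to avoid both problems (the
partially filled buffer and the platform default encoding) is to let a
Reader do the decoding with an explicitly named charset. The following is
only a sketch: the class name ChunkDecoder, the method readAll, and the
choice of UTF-8 are assumptions for illustration, and the correct charset
has to come from the response itself (for example its Content-Type header).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

class ChunkDecoder {
    // Hypothetical helper: decodes the whole stream with the given charset.
    // Only the chars actually returned by each read() are appended, so
    // unused buffer slots never reach the result.
    static String readAll(InputStream in, Charset charset, int bufferSize) throws IOException {
        Reader reader = new InputStreamReader(in, charset);
        char[] buf = new char[bufferSize];
        StringBuilder out = new StringBuilder();
        int n;
        while ((n = reader.read(buf)) != -1) {
            out.append(buf, 0, n);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a response body; UTF-8 is assumed here.
        byte[] body = "caf\u00e9,1\n".getBytes("UTF-8");
        System.out.print(readAll(new ByteArrayInputStream(body), Charset.forName("UTF-8"), 4));
    }
}
```

Decoding through a Reader also copes with a multi-byte character that
happens to straddle two reads, which a per-chunk new String(...) cannot.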


However, if the user just wants to create a file with the contents of
the response, then surely there is no need to mess with encodings?
Just write the bytes to a file output stream without any conversion.
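
A minimal sketch of that approach, assuming the InputStream is the one
returned by entity.getContent() in the original code. The class name
StreamToFile, the method copy, and the path parameter are hypothetical
names used only for illustration.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamToFile {
    // Hypothetical helper: copies every byte from 'in' to the file at 'path'
    // and returns the number of bytes written. No String conversion happens,
    // so no character encoding is involved.
    static long copy(InputStream in, String path, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        OutputStream out = new FileOutputStream(path);
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                // read() may return fewer bytes than buf.length;
                // write only the n bytes that were actually read.
                out.write(buf, 0, n);
                total += n;
            }
        } finally {
            out.close();
        }
        return total;
    }
}
```

Because only the first n bytes of the buffer are written on each pass, the
unused tail of the buffer (all zeros) never ends up in the file.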

>  -- Ken
>
>
>
> >
> > olegk wrote:
> >
> > >
> > > On Wed, 2010-01-27 at 20:24 -0800, amoldavsky wrote:
> > >
> > > > Hi
> > > >
> > > > I have coded a simple file downloader using HttpClient 4.0.
> > > > It works fine, but there is something wrong with the String encoding
> > > > or the buffer stream. The problem is that there are long sequences of
> > > > "NULL" (ASCII code 00) throughout the final file, like this:
> > > >
> > > > http://old.nabble.com/file/p27350930/httpclient_error01.jpg
> > > >
> > > > http://old.nabble.com/file/p27350930/httpclient_error02.jpg
> > > >
> > > > Here is the main code:
> > > >
> > > > public String getChunk(String url, int bufferSize) throws HTTPClientException
> > > >  {
> > > >   if(!chunkedStarted)
> > > >   {
> > > >     chunkedIns = getInputStream(url);
> > > >     chunkedStarted = true;
> > > >   }
> > > >
> > > >   byte[] tmp = new byte[bufferSize];
> > > >   try
> > > >   {
> > > >     if(chunkedIns.read(tmp) != -1)
> > > >     {
> > > >
> > >
> > > What makes you think that the entire buffer will be filled with data?
> > >
> > > Oleg
> > >
> > >
> > >
> > > >       return new String(tmp);
> > > >     }
> > > >     else
> > > >     {
> > > >       finish();
> > > >       return null;
> > > >     }
> > > >   }
> > > >   catch(IOException e)
> > > >   {
> > > >     HTTPClientException e2 = new HTTPClientException(e.getMessage());
> > > >     e2.setStackTrace(e.getStackTrace());
> > > >     throw e2;
> > > >   }
> > > >  }
> > > >
> > > >  public void finish()
> > > >  {
> > > >   // do some cleaning
> > > >  }
> > > >
> > > >  private InputStream getInputStream(String url) throws HTTPClientException
> > > >  {
> > > >   InputStream instream = null;
> > > >
> > > >   httpClient = new DefaultHttpClient();
> > > >
> > > >   httpClient.getParams().setParameter("http.useragent", AGENT_NAME);
> > > >
> > > >   HttpGet httpGet = new HttpGet(url);
> > > >   HttpResponse response = null;
> > > >
> > > >   try
> > > >   {
> > > >     response = httpClient.execute(httpGet);
> > > >     HttpEntity entity = response.getEntity();
> > > >
> > > >     if(entity != null)
> > > >     {
> > > >       instream = entity.getContent();
> > > >     }
> > > >   }
> > > >   catch(ClientProtocolException e)
> > > >   {
> > > >     HTTPClientException e2 = new HTTPClientException(e.getMessage());
> > > >     e2.setStackTrace(e.getStackTrace());
> > > >     throw e2;
> > > >   }
> > > >   catch(IOException e)
> > > >   {
> > > >     HTTPClientException e2 = new HTTPClientException(e.getMessage());
> > > >     e2.setStackTrace(e.getStackTrace());
> > > >     throw e2;
> > > >   }
> > > >
> > > >   return instream;
> > > >  }
> > > >
> > > > getChunk and getInputStream could basically be one method, but I need
> > > > to split them for internal convenience; that does not change the
> > > > functionality as a whole.
> > > >
> > > > It seems like either the conversion from bytes to string is a problem:
> > > > return new String(tmp);
> > > >
> > > > or that the buffer is not getting filled to the end. The latter should
> > > > not be possible because the files are ~30MB each and the buffer size
> > > > is 2KB.
> > > >
> > > > I have attached the file, a CSV (shortened to ~6KB). Note the long
> > > > white space between some of the URLs; if you just remove it, the URL
> > > > makes sense.
> > > > http://old.nabble.com/file/p27350930/datafeed.csv
> > > >
> > > > Where can this white space (null) come from?
> > > >
> > > > Thanks!
> > > >
> > >
> > >
> > >
> > >
> ---------------------------------------------------------------------
> > > To unsubscribe, e-mail:
> [email protected]
> > > For additional commands, e-mail:
> [email protected]
> > >
> > >
> > >
> > >
> >
> > --
> > View this message in context:
> http://old.nabble.com/HttpClient-4.0-encoding-madness-tp27350930p27366928.html
> > Sent from the HttpClient-User mailing list archive at Nabble.com.
> >
> >
> >
> >
> >
>
>  --------------------------------------------
>  Ken Krugler
>  +1 530-210-6378
>  http://bixolabs.com
>  e l a s t i c   w e b   m i n i n g
>
>
>
>
>

