On 29/01/2010, amoldavsky <[email protected]> wrote:
>
>  Hi Oleg,
>
>  Let me rephrase the question in better terms:
>  If the server document is Y and buffer size is X, let's even assume that Y =
>  kX where X < Y, is it possible that any buffer 0 < x < (k-1) will not be
>  fully filled?

Remember that the TCP packets carrying an HTTP response may be split up
in transit, so data can arrive in pieces smaller than your buffer.

However, even without that, it's never safe to assume that a buffer is filled.

That's what the return value from read(buffer) is for - it tells you
how many bytes were actually read into the buffer.
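
If you really do want full buffers (except for the last one), you have to
keep calling read() until the buffer is full or the stream ends. Below is a
minimal sketch of that loop, not a definitive implementation. The class name
ReadFullyExample and the helper readFully are made up for illustration, and
a ByteArrayInputStream stands in for the stream you get from HttpClient.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyExample {

    /**
     * Hypothetical helper: reads until the buffer is full or the stream ends.
     * Returns the number of bytes actually stored in the buffer
     * (0 if the stream was already at its end).
     */
    static int readFully(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            // read() may return fewer bytes than requested, so keep asking
            int n = in.read(buffer, total, buffer.length - total);
            if (n == -1) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the HTTP response stream: 5 bytes of data, 4-byte buffer
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        byte[] buffer = new byte[4];
        int n;
        while ((n = readFully(in, buffer)) > 0) {
            // Only the first n bytes of the buffer are valid here
            System.out.println("got " + n + " bytes");
        }
    }
}
```

Even with this loop the last buffer is usually only partly filled, so you
still have to use the returned count and never the buffer length.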

>  Thanks!
>  -Assaf
>
>
>
>  Ken Krugler wrote:
>  >
>  >
>  > On Jan 28, 2010, at 10:09pm, amoldavsky wrote:
>  >
>  >>
>  >> Hi Oleg,
>  >> Thank you for the quick reply.
>  >>
>  >> So if there is a possibility that not the whole buffer is filled, how
>  >> can I ensure or force HttpClient to fill the whole buffer? Should I
>  >> maybe avoid Stream Readers altogether?
>  >
>  > If bufferSize is X, and the server document you're fetching has Y
>  > bytes, then what do you mean by "force HttpClient to fill the whole
>  > buffer"?
>  >
>  > At a minimum, you'd want
>  >
>  > int bytesRead = chunkedIns.read(tmp);
>  > if (bytesRead != -1) {
>  >     return new String(tmp, 0, bytesRead);
>  > }
>  >
>  > But that also uses the platform default encoding for the character
>  > set, which often won't be correct.
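
On the encoding point: one way around the platform default is to name the
charset explicitly and let a Reader do the decoding, which also avoids
cutting a multi-byte character in half at a buffer boundary. This is only a
sketch under assumptions. The class name CharsetDecodeExample and the helper
readAll are made up, UTF-8 is just an example (the real charset should come
from the response's Content-Type header), and a ByteArrayInputStream stands
in for the response stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class CharsetDecodeExample {

    /** Hypothetical helper: reads the whole stream as text in the given charset. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        Reader reader = new InputStreamReader(in, charset);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[2048];
        int n;
        while ((n = reader.read(buf)) != -1) {
            // Append only the n chars that were actually read
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed charset for the example; do not hard-code this for real servers
        Charset utf8 = Charset.forName("UTF-8");
        byte[] data = "caf\u00e9,http://example.com/".getBytes(utf8);
        String text = readAll(new ByteArrayInputStream(data), utf8);
        System.out.println(text);
    }
}
```

For a 30MB file you would normally process or write out each piece as it is
read instead of collecting everything in one StringBuilder.
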
>  >
>  > -- Ken
>  >
>  >>
>  >> olegk wrote:
>  >>>
>  >>> On Wed, 2010-01-27 at 20:24 -0800, amoldavsky wrote:
>  >>>> Hi
>  >>>>
>  >>>> I have coded a simple file downloader using HttpClient 4.0.
>  >>>> It works fine but there is something wrong with the String
>  >>>> encoding or the buffer stream. The problem is that there are long
>  >>>> sequences of "NULL" (ANSI code 00) throughout the final file,
>  >>>> like this:
>  >>>> http://old.nabble.com/file/p27350930/httpclient_error01.jpg
>  >>>> http://old.nabble.com/file/p27350930/httpclient_error02.jpg
>  >>>>
>  >>>> Here is the main code:
>  >>>>
>  >>>> public String getChunk(String url, int bufferSize) throws HTTPClientException
>  >>>>  {
>  >>>>    if(!chunkedStarted)
>  >>>>    {
>  >>>>      chunkedIns = getInputStream(url);
>  >>>>      chunkedStarted = true;
>  >>>>    }
>  >>>>
>  >>>>    byte[] tmp = new byte[bufferSize];
>  >>>>    try
>  >>>>    {
>  >>>>      if(chunkedIns.read(tmp) != -1)
>  >>>>      {
>  >>>
>  >>> What makes you think that the entire buffer will be filled with data?
>  >>>
>  >>> Oleg
>  >>>
>  >>>
>  >>>>        return new String(tmp);
>  >>>>      }
>  >>>>      else
>  >>>>      {
>  >>>>        finish();
>  >>>>        return null;
>  >>>>      }
>  >>>>    }
>  >>>>    catch(IOException e)
>  >>>>    {
>  >>>>      HTTPClientException e2 = new HTTPClientException(e.getMessage());
>  >>>>      e2.setStackTrace(e.getStackTrace());
>  >>>>      throw e2;
>  >>>>    }
>  >>>>  }
>  >>>>
>  >>>>  public void finish()
>  >>>>  {
>  >>>>    // do some cleaning
>  >>>>  }
>  >>>>
>  >>>>   private InputStream getInputStream(String url) throws HTTPClientException
>  >>>>  {
>  >>>>    InputStream instream = null;
>  >>>>
>  >>>>    httpClient = new DefaultHttpClient();
>  >>>>    httpClient.getParams().setParameter("http.useragent", AGENT_NAME);
>  >>>>
>  >>>>    HttpGet httpGet = new HttpGet(url);
>  >>>>    HttpResponse response = null;
>  >>>>
>  >>>>    try
>  >>>>    {
>  >>>>      response = httpClient.execute(httpGet);
>  >>>>      HttpEntity entity = response.getEntity();
>  >>>>
>  >>>>      if(entity != null)
>  >>>>      {
>  >>>>        instream = entity.getContent();
>  >>>>      }
>  >>>>    }
>  >>>>    catch(ClientProtocolException e)
>  >>>>    {
>  >>>>      HTTPClientException e2 = new HTTPClientException(e.getMessage());
>  >>>>      e2.setStackTrace(e.getStackTrace());
>  >>>>      throw e2;
>  >>>>    }
>  >>>>    catch(IOException e)
>  >>>>    {
>  >>>>      HTTPClientException e2 = new HTTPClientException(e.getMessage());
>  >>>>      e2.setStackTrace(e.getStackTrace());
>  >>>>      throw e2;
>  >>>>    }
>  >>>>
>  >>>>    return instream;
>  >>>>  }
>  >>>>
>  >>>> getChunk and getInputStream could basically be one method, but I
>  >>>> need to split them for internal convenience; that does not change
>  >>>> the functionality as a whole.
>  >>>>
>  >>>> It seems like either the conversion from bytes to string is a
>  >>>> problem:
>  >>>> return new String(tmp);
>  >>>>
>  >>>> or that the buffer is not getting filled to the end. The latter
>  >>>> does not seem possible because the files are ~30MB each and the
>  >>>> buffer size is 2KB.
>  >>>>
>  >>>> I have attached the file, it's a CSV (shortened to ~6KB). Note the
>  >>>> long white space between some of the URLs; if you just remove it,
>  >>>> the URL makes sense.
>  >>>> http://old.nabble.com/file/p27350930/datafeed.csv datafeed.csv
>  >>>>
>  >>>> Where can this white space (null) come from?
>  >>>>
>  >>>> Thanks!
>  >>>
>  >>>
>  >>>
>  >>> ---------------------------------------------------------------------
>  >>> To unsubscribe, e-mail: [email protected]
>  >>> For additional commands, e-mail: [email protected]
>  >>>
>  >>>
>  >>>
>  >>
>  >> --
>  >> View this message in context:
>  >> http://old.nabble.com/HttpClient-4.0-encoding-madness-tp27350930p27366928.html
>  >> Sent from the HttpClient-User mailing list archive at Nabble.com.
>  >>
>  >>
>  >>
>  >
>  > --------------------------------------------
>  > Ken Krugler
>  > +1 530-210-6378
>  > http://bixolabs.com
>  > e l a s t i c   w e b   m i n i n g
>  >
>  >
>  >
>  >
>  >
>  >
>
>  --
>
> View this message in context: 
> http://old.nabble.com/HttpClient-4.0-encoding-madness-tp27350930p27377093.html
>
> Sent from the HttpClient-User mailing list archive at Nabble.com.
>
>
>
>
