On 30/01/2010, amoldavsky <[email protected]> wrote:
>
> Hi,
>
> This solution worked out very well:
Glad you finally got there.
> byte[] tmp = new byte[bufferSize];
>
> int bytesRead;
> try
> {
>     if((bytesRead = chunkedIns.read(tmp)) != -1)
>     {
>         return new String(tmp, 0, bytesRead);
>     }
>     else
>     {
>         finish();
>         return null;
>     }
> }
> catch(IOException e)
> {
>     HTTPClientException e2 = new HTTPClientException(e.getMessage());
>     e2.setStackTrace(e.getStackTrace());
>     throw e2;
> }
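One caveat, following Ken's earlier point: new String(tmp, 0, bytesRead)
decodes with the platform default charset, and with a multi-byte encoding a
character can straddle two reads. A minimal sketch of one way around that,
assuming the content is UTF-8 or some other charset you already know (ideally
taken from the response's Content-Type). ChunkDecoder and nextChunk are
hypothetical names of mine, not part of HttpClient:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not an HttpClient class.
final class ChunkDecoder {
    private final Reader reader;
    private final char[] buf;

    ChunkDecoder(InputStream in, Charset charset, int bufferSize) {
        // InputStreamReader decodes with an explicit charset and holds on to
        // an incomplete multi-byte sequence until the rest of it arrives.
        this.reader = new InputStreamReader(in, charset);
        this.buf = new char[bufferSize];
    }

    /** Returns the next chunk of decoded text, or null at end of stream. */
    String nextChunk() throws IOException {
        int charsRead = reader.read(buf);
        if (charsRead == -1) {
            return null;
        }
        // Only the first charsRead chars are valid, exactly as with bytes.
        return new String(buf, 0, charsRead);
    }
}
```

You would construct it once from entity.getContent() and call nextChunk()
wherever you call read(tmp) now.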
>
>
>
> If it's not too much trouble, would anybody please explain why the buffer
> may not be 100% full when I read it? I think it all depends on how the
> implementation was done (in this case by Sun), and if Sun decided to
> implement buffering this way, I don't understand the logic behind it.
In that case you had better ask the Oracle.
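More seriously: read(byte[]) is only specified to block until some data is
available and then return however many bytes it got, up to the length of the
buffer. If you really want full buffers you have to loop yourself. A minimal
sketch (StreamUtil and readFully are hypothetical names of mine, not an
HttpClient or JDK API):

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only.
final class StreamUtil {
    /**
     * Keeps reading until buf is full or the stream ends.
     * Returns the number of bytes stored in buf, or -1 if the stream
     * was already at its end and nothing could be read.
     */
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            // Each call may deliver fewer bytes than requested.
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break; // end of stream
            }
            total += n;
        }
        return (total == 0 && buf.length > 0) ? -1 : total;
    }
}
```

Even then the last buffer of a document will usually be short, so you still
need the returned count when you build the String.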
>
> Thank you very much Oleg, Ken and sebb-2-2 for your earlier inputs!
>
>
>
>
> sebb-2-2 wrote:
> >
> > On 29/01/2010, amoldavsky <[email protected]> wrote:
> >>
> >> Hi Oleg,
> >>
> >> Let me rephrase the question in better terms:
> >> If the server document is Y bytes and the buffer size is X, and we even
> >> assume that Y = kX where X < Y, is it possible that any buffer
> >> 0 < x < (k-1) will not be fully filled?
> >
> > Remember that HTTP packets may be broken up in transit.
> >
> > However, even without that, it's never safe to assume that a buffer is
> > filled.
> >
> > That's what the return value from read(buffer) is for - it tells you
> > how many bytes were actually read into the buffer.
> >
> >> Thanks!
> >> -Assaf
> >>
> >>
> >>
> >> Ken Krugler wrote:
> >> >
> >> >
> >> > On Jan 28, 2010, at 10:09pm, amoldavsky wrote:
> >> >
> >> >>
> >> >> Hi Oleg,
> >> >> Thank you for the quick reply.
> >> >>
> >> >> So if there is a possibility that the whole buffer is not filled, how
> >> >> can I ensure or force HttpClient to fill the whole buffer? Should I
> >> >> maybe avoid stream readers altogether?
> >> >
> >> > If bufferSize is X, and the server document you're fetching has Y
> >> > bytes, then what do you mean by "force HttpClient to fill the whole
> >> > buffer"?
> >> >
> >> > At a minimum, you'd want
> >> >
> >> >     int bytesRead = chunkedIns.read(tmp);
> >> >     if (bytesRead != -1) {
> >> >         return new String(tmp, 0, bytesRead);
> >> >     }
> >> >
> >> > But that also uses the platform default encoding for the character
> >> > set, which often won't be correct.
> >> >
> >> > -- Ken
> >> >
> >> >>
> >> >> olegk wrote:
> >> >>>
> >> >>> On Wed, 2010-01-27 at 20:24 -0800, amoldavsky wrote:
> >> >>>> Hi
> >> >>>>
> >> >>>> I have coded a simple file downloader using HttpClient 4.0.
> >> >>>> It works fine, but there is something wrong with the String encoding
> >> >>>> or the buffer stream. The problem is that there are long sequences
> >> >>>> of "NULL" (ASCII code 00) throughout the final file, like this:
> >> >>>> http://old.nabble.com/file/p27350930/httpclient_error01.jpg
> >> >>>> http://old.nabble.com/file/p27350930/httpclient_error02.jpg
> >> >>>>
> >> >>>> Here is the main code:
> >> >>>>
> >> >>>> public String getChunk(String url, int bufferSize) throws HTTPClientException
> >> >>>> {
> >> >>>>     if(!chunkedStarted)
> >> >>>>     {
> >> >>>>         chunkedIns = getInputStream(url);
> >> >>>>         chunkedStarted = true;
> >> >>>>     }
> >> >>>>
> >> >>>>     byte[] tmp = new byte[bufferSize];
> >> >>>>     try
> >> >>>>     {
> >> >>>>         if(chunkedIns.read(tmp) != -1)
> >> >>>>         {
> >> >>>
> >> >>> What makes you think that the entire buffer will be filled with
> >> >>> data?
> >> >>>
> >> >>> Oleg
> >> >>>
> >> >>>
> >> >>>>             return new String(tmp);
> >> >>>>         }
> >> >>>>         else
> >> >>>>         {
> >> >>>>             finish();
> >> >>>>             return null;
> >> >>>>         }
> >> >>>>     }
> >> >>>>     catch(IOException e)
> >> >>>>     {
> >> >>>>         HTTPClientException e2 = new HTTPClientException(e.getMessage());
> >> >>>>         e2.setStackTrace(e.getStackTrace());
> >> >>>>         throw e2;
> >> >>>>     }
> >> >>>> }
> >> >>>>
> >> >>>> public void finish()
> >> >>>> {
> >> >>>>     // do some cleaning
> >> >>>> }
> >> >>>>
> >> >>>> private InputStream getInputStream(String url) throws HTTPClientException
> >> >>>> {
> >> >>>>     InputStream instream = null;
> >> >>>>
> >> >>>>     httpClient = new DefaultHttpClient();
> >> >>>>     httpClient.getParams().setParameter("http.useragent", AGENT_NAME);
> >> >>>>
> >> >>>>     HttpGet httpGet = new HttpGet(url);
> >> >>>>     HttpResponse response = null;
> >> >>>>
> >> >>>>     try
> >> >>>>     {
> >> >>>>         response = httpClient.execute(httpGet);
> >> >>>>         HttpEntity entity = response.getEntity();
> >> >>>>
> >> >>>>         if(entity != null)
> >> >>>>         {
> >> >>>>             instream = entity.getContent();
> >> >>>>         }
> >> >>>>     }
> >> >>>>     catch(ClientProtocolException e)
> >> >>>>     {
> >> >>>>         HTTPClientException e2 = new HTTPClientException(e.getMessage());
> >> >>>>         e2.setStackTrace(e.getStackTrace());
> >> >>>>         throw e2;
> >> >>>>     }
> >> >>>>     catch(IOException e)
> >> >>>>     {
> >> >>>>         HTTPClientException e2 = new HTTPClientException(e.getMessage());
> >> >>>>         e2.setStackTrace(e.getStackTrace());
> >> >>>>         throw e2;
> >> >>>>     }
> >> >>>>
> >> >>>>     return instream;
> >> >>>> }
> >> >>>>
> >> >>>> getChunk and getInputStream could basically be one method, but I
> >> >>>> need to split them for internal convenience; that does not change
> >> >>>> the functionality as a whole.
> >> >>>>
> >> >>>> It seems like either the conversion from bytes to string is a problem:
> >> >>>> return new String(tmp);
> >> >>>>
> >> >>>> or that the buffer is not getting filled to the end. The latter
> >> >>>> should not be possible because the files are ~30 MB each and the
> >> >>>> buffer size is 2 KB.
> >> >>>>
> >> >>>> I have attached the file; it's a CSV (shortened to ~6 KB). Note the
> >> >>>> long white space inside some of the URLs: if you just remove it, the
> >> >>>> URL makes sense.
> >> >>>> http://old.nabble.com/file/p27350930/datafeed.csv
> >> >>>>
> >> >>>> Where can this white space (null) come from?
> >> >>>>
> >> >>>> Thanks!
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> ---------------------------------------------------------------------
> >> >>> To unsubscribe, e-mail: [email protected]
> >> >>> For additional commands, e-mail: [email protected]
> >> >>>
> >> >>>
> >> >>>
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >> http://old.nabble.com/HttpClient-4.0-encoding-madness-tp27350930p27366928.html
> >> >> Sent from the HttpClient-User mailing list archive at Nabble.com.
> >> >>
> >> >>
> >> >>
> >> >
> >> > --------------------------------------------
> >> > Ken Krugler
> >> > +1 530-210-6378
> >> > http://bixolabs.com
> >> > e l a s t i c w e b m i n i n g
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >
> >
> >
> >
>
>
>