Thanks Erick, I have already gone through the Tika example link you shared. Please look at the code in bold. I believe the entire content is still pushed into memory via the handler object. Sorry for copying lengthy code from the Tika site.
Regards,
Neo

*Streaming the plain text in chunks*

Sometimes you want to chunk the resulting text up, perhaps to output as you go, minimising memory use, perhaps to output to HDFS files, or any other reason! With a small custom content handler, you can do that.

public List<String> parseToPlainTextChunks() throws IOException, SAXException, TikaException {
    final List<String> chunks = new ArrayList<>();
    chunks.add("");
    ContentHandlerDecorator handler = new ContentHandlerDecorator() {
        @Override
        public void characters(char[] ch, int start, int length) {
            String lastChunk = chunks.get(chunks.size() - 1);
            String thisStr = new String(ch, start, length);
            if (lastChunk.length() + length > MAXIMUM_TEXT_CHUNK_SIZE) {
                chunks.add(thisStr);
            } else {
                chunks.set(chunks.size() - 1, lastChunk + thisStr);
            }
        }
    };
    AutoDetectParser parser = new AutoDetectParser();
    Metadata metadata = new Metadata();
    try (InputStream stream = ContentHandlerExample.class.getResourceAsStream("test2.doc")) {
        *parser.parse(stream, handler, metadata);*
        return chunks;
    }
}

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
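To illustrate the memory concern: the Tika example above still accumulates every chunk in the `chunks` List, so by the time `parse()` returns, the whole document's text is in memory. If the goal is bounded memory, the handler could instead flush each chunk to a consumer (an HDFS writer, a Solr update call, etc.) the moment it fills. Below is a minimal sketch of that idea; `ChunkingHandler`, `maxChunkSize`, and `sink` are my own hypothetical names, and it extends the JDK's SAX `DefaultHandler` rather than Tika's `ContentHandlerDecorator` so it runs standalone — the same `characters()` override would slot into the Tika example.

```java
import java.util.function.Consumer;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical streaming handler: instead of collecting chunks in a List,
// it holds at most one partial chunk in memory and hands each full chunk
// to a sink callback as soon as it is complete.
public class ChunkingHandler extends DefaultHandler {
    private final int maxChunkSize;               // assumed chunk size limit
    private final Consumer<String> sink;          // e.g. an HDFS or Solr writer
    private final StringBuilder buffer = new StringBuilder();

    public ChunkingHandler(int maxChunkSize, Consumer<String> sink) {
        this.maxChunkSize = maxChunkSize;
        this.sink = sink;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);
        // Emit every complete chunk immediately; keep only the remainder.
        while (buffer.length() >= maxChunkSize) {
            sink.accept(buffer.substring(0, maxChunkSize));
            buffer.delete(0, maxChunkSize);
        }
    }

    @Override
    public void endDocument() {
        // Flush whatever partial chunk is left at end of parse.
        if (buffer.length() > 0) {
            sink.accept(buffer.toString());
            buffer.setLength(0);
        }
    }
}
```

With this shape, peak memory is roughly one chunk regardless of document size, at the cost of processing chunks as a stream rather than getting a List back.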