(In Lucene) I break the document into smaller pieces, then add each
piece to the same Document field in a loop. This works better for very
large inputs, but it will interfere with analysis details such as term
offsets, since each added value is analyzed separately.
This approach should work in your example.
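
Roughly like this (an untested sketch against the Lucene 2.2-era Field
API; the class and method names are made up, and CHUNK_SIZE is just a
size you pick -- also note a chunk boundary can split a token in two):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class ChunkedFieldExample {
    static final int CHUNK_SIZE = 1 << 20; // 1M chars per chunk; tune to taste

    public static Document buildDoc(String path) throws IOException {
        Document doc = new Document();
        BufferedReader in = new BufferedReader(new FileReader(path));
        char[] buf = new char[CHUNK_SIZE];
        int n;
        while ((n = in.read(buf)) > 0) {
            // Each chunk becomes one more value of the multi-valued
            // "Fulltext" field, so the whole file never has to live
            // in memory as a single String.
            doc.add(new Field("Fulltext", new String(buf, 0, n),
                              Field.Store.NO, Field.Index.TOKENIZED));
        }
        in.close();
        return doc;
    }
}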

In Lucene, you can also construct the field from a Reader over the file in question:
http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/document/Field.html#Field%28java.lang.String,%20java.io.Reader%29
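
A minimal sketch of that (field name and path taken from your example,
class name made up; a Reader-valued field is tokenized but cannot be
stored, and the Reader is consumed when the document is indexed):

import java.io.FileReader;
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class ReaderFieldExample {
    public static Document buildDoc(String path) throws IOException {
        // Hand Lucene a Reader so the text is pulled through analysis
        // at addDocument() time rather than held in a 2 GB String.
        Document doc = new Document();
        doc.add(new Field("Fulltext", new FileReader(path)));
        return doc;
    }
}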

I haven't looked at the source, so I am not sure if it handles very
large files in a scalable fashion...

-Glen
http://zzzoot.blogspot.com/

2010/1/5 Mark N <nipen.m...@gmail.com>:
> SolrInputDocument doc1 = new SolrInputDocument();
> doc1.addField("Fulltext", strContent);
>
> strContent is a String variable which contains the contents of a text
> file (assume the text file is located at c:\files\abc.txt).
>
> In my case abc.txt (the text files) could be very large, ~2 GB, so it is
> not always possible to read and store them in String variables before
> indexing. Can anyone suggest a better approach for indexing these huge
> text files?
>
>
>
> --
> Nipen Mark
>


