Thank you! I now use awk to preprocess the file, and it seems quite efficient. I think other scripting languages would work as well.
Returning to the original post: I would like to know whether Lucene supports substring search. As you can see, one field of my documents is a long string without any spaces, so tokenization does not help here. Suppose I want to search for the string "TARCSV" in my documents and get back the sample record below. I tried both wildcard search and fuzzy search, but neither seems to work. I am not sure whether I did everything right in the indexing and parsing stages. Does anyone have experience with substring search?

>A0B531 A0B531_METTP^|^^|^Putative uncharacterized protein^|^^|^^|^Methanosaeta thermophila PT^|^349307^|^Arch/Euryar^|^28890
MLFALALSLLILTSGSRSIELNNATVIDLAEGKAVIEQPVSGKIFNITAIARIENISVIH
NSH*TARCSV*EESFWRGVYRYRITADSPVSGILRYEAPLRGQQFISPIVLNGTVVVAIPEG
YTTGARALGIPRPEPYEIFHENRTVVVWRLERESIVEVGFYRNDAPQILGYFFVLLLAAG
IFLAAGYYSSIKKLEAMRRGLK
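To make clear what I mean by substring search, here is a minimal sketch in plain Python. It is not Lucene code and does not use any Lucene API; it only illustrates the behaviour I am after. The names `kmers`, `build_index` and `search`, the k-mer length of 6, and the second record `X00000` are all made up for the illustration. The idea is that a lookup on the whole field fails because the field is one single token, while an index over every overlapping 6-character piece of the sequence can find "TARCSV".

```python
from collections import defaultdict

K = 6  # k-mer length; an assumption, chosen to match len("TARCSV")

# The sample record's sequence (highlight markers removed), plus one
# hypothetical record that should NOT match.
records = {
    "A0B531": (
        "MLFALALSLLILTSGSRSIELNNATVIDLAEGKAVIEQPVSGKIFNITAIARIENISVIH"
        "NSHTARCSVEESFWRGVYRYRITADSPVSGILRYEAPLRGQQFISPIVLNGTVVVAIPEG"
        "YTTGARALGIPRPEPYEIFHENRTVVVWRLERESIVEVGFYRNDAPQILGYFFVLLLAAG"
        "IFLAAGYYSSIKKLEAMRRGLK"
    ),
    "X00000": "MKTAYIAKQR",  # hypothetical record for contrast
}


def kmers(sequence, k=K):
    """Yield every overlapping substring of length k."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def build_index(records, k=K):
    """Map each k-mer to the set of record ids whose sequence contains it."""
    index = defaultdict(set)
    for rec_id, sequence in records.items():
        for gram in kmers(sequence, k):
            index[gram].add(rec_id)
    return index


def search(index, records, query, k=K):
    """Return the ids of records whose sequence contains `query`."""
    if len(query) < k:
        # Query is shorter than the indexed pieces: fall back to a scan.
        return sorted(r for r, s in records.items() if query in s)
    grams = list(kmers(query, k))
    # A match must contain every k-mer of the query.
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # The k-mers could occur in a different order, so confirm each candidate.
    return sorted(r for r in candidates if query in records[r])


index = build_index(records)

# Treating the whole field as one token: the query is not equal to it.
print("TARCSV" == records["A0B531"])      # False
# Looking the query up in the k-mer index finds the sample record.
print(search(index, records, "TARCSV"))   # ['A0B531']
```

The cost of this approach is index size: every sequence of length n contributes about n entries, so the choice of k is a trade-off between index size and the shortest query that can be answered without a scan.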