Why? I want stored=false, at which point a multivalued field is just offset values in the term dictionary. I would still have to reconstruct the text from those offsets.
Or am I missing something?

Regards,
Alex

On 16/04/2014 10:59 pm, "Ramkumar R. Aiyengar" <andyetitmo...@gmail.com> wrote:
> Logically, if you tokenize and put the results in a multivalued field, you
> should be able to get all values in sequence?
> On 16 Apr 2014 16:51, "Alexandre Rafalovitch" <arafa...@gmail.com> wrote:
>
> > Hello,
> >
> > If I use very basic tokenizers, e.g. space-based with no filters, can I
> > reconstruct the text from the tokenized form?
> >
> > So, "This is a test" -> "This", "is", "a", "test" -> "This is a test"?
> >
> > I know we store enough information, but I don't know the internal API
> > well enough to know what I should be looking at for a reconstruction
> > algorithm.
> >
> > Any hints?
> >
> > The XY problem is that I want to store a large amount of very repetitive
> > text in Solr. I want the index to be as small as possible, so I
> > thought that if I just pre-tokenized it, my dictionary would be quite small.
> > And I will be reconstructing some final form anyway.
> >
> > The other option is to just use compression on the stored field, but
> > I assume that does not take cross-document efficiencies into account.
> > And it will be a read-only index after the build, so I don't care about
> > updates messing things up.
> >
> > Regards,
> > Alex
> >
> > Personal website: http://www.outerthoughts.com/
> > Current project: http://www.solr-start.com/ - Accelerating your Solr
> > proficiency
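The reconstruction discussed in this thread can be modeled outside Solr. The following is a minimal sketch in plain Python, not Lucene's API, assuming tokens are separated only by space characters and no filters are applied; `Posting`, `tokenize` and `reconstruct` are hypothetical names. Each document becomes a list of (term id, start offset, end offset) entries pointing into one term dictionary shared by all documents, so a repeated term is stored once. Reconstruction places each term at its start offset and fills the gaps with spaces, which is only correct under the spaces-only assumption (trailing spaces are lost). In Lucene itself the main index is inverted (term -> documents), so this per-document view is closer to what term vectors record when positions and offsets are enabled.

```python
from dataclasses import dataclass


@dataclass
class Posting:
    term_id: int  # index into the shared term dictionary
    start: int    # character offset where the token begins
    end: int      # character offset just past the token's last character


def tokenize(text, dictionary):
    """Split on single spaces (hypothetical 'space-based tokenizer').

    `dictionary` maps term -> id and is shared across documents,
    so each distinct term is stored only once.
    """
    postings = []
    pos = 0
    for token in text.split(" "):
        if token:  # consecutive spaces yield empty strings; skip them
            term_id = dictionary.setdefault(token, len(dictionary))
            postings.append(Posting(term_id, pos, pos + len(token)))
        pos += len(token) + 1  # +1 for the space separator
    return postings


def reconstruct(postings, dictionary):
    """Rebuild text by placing terms at their offsets.

    Gaps between tokens are ASSUMED to have been spaces; trailing
    whitespace after the last token cannot be recovered.
    """
    terms = {i: t for t, i in dictionary.items()}  # invert term -> id
    out = []
    pos = 0
    for p in sorted(postings, key=lambda p: p.start):
        out.append(" " * (p.start - pos))  # refill the gap with spaces
        out.append(terms[p.term_id])
        pos = p.end
    return "".join(out)


# Usage: two documents sharing one dictionary.
dictionary = {}
docs = ["This is a test", "This is  another test"]
encoded = [tokenize(d, dictionary) for d in docs]
assert [reconstruct(p, dictionary) for p in encoded] == docs
print(len(dictionary))  # 5 distinct terms across both documents
```

Storing `end` is redundant here, since it always equals `start` plus the term's length; it is kept only to mirror the start/end offset pairs that term vectors record.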