Re: Can I reconstruct text from tokens?

Alexandre Rafalovitch Wed, 16 Apr 2014 09:40:20 -0700

Why? I want stored=false, at which point a multivalued field is just offset
values in the dictionary. I would still have to reconstruct from offsets.
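
To make "reconstruct from offsets" concrete, here is a minimal sketch of the
idea. It assumes a hypothetical term-to-positions mapping held in a plain
Python dict; it does not use any Lucene or Solr API, and the function names
are made up for illustration.

```python
from typing import Dict, List


def build_positional_index(tokens: List[str]) -> Dict[str, List[int]]:
    """Map each distinct term to the list of positions where it occurs."""
    index: Dict[str, List[int]] = {}
    for position, term in enumerate(tokens):
        index.setdefault(term, []).append(position)
    return index


def reconstruct(index: Dict[str, List[int]], separator: str = " ") -> str:
    """Rebuild the text by sorting (position, term) pairs by position."""
    pairs = sorted(
        (position, term)
        for term, positions in index.items()
        for position in positions
    )
    return separator.join(term for _, term in pairs)


# Hypothetical input: a space-based tokenizer with no filters.
tokens = "This is a test".split(" ")
index = build_positional_index(tokens)
text = reconstruct(index)
```

This only round-trips when tokenization is lossless, i.e. a single known
separator and no filters that change or drop tokens.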


Or am I missing something?

Regards,
     Alex
On 16/04/2014 10:59 pm, "Ramkumar R. Aiyengar" <andyetitmo...@gmail.com>
wrote:

> Logically, if you tokenize and put the results in a multivalued field, you
> should be able to get all the values back in sequence?
> On 16 Apr 2014 16:51, "Alexandre Rafalovitch" <arafa...@gmail.com> wrote:
>
> > Hello,
> >
> > If I use very basic tokenizers, e.g. space-based with no filters, can I
> > reconstruct the text from the tokenized form?
> >
> > So, "This is a test" -> "This", "is", "a", "test" -> "This is a test"?
> >
> > I know we store enough information, but I don't know the internal API
> > well enough to know what I should be looking at for a reconstruction
> > algorithm.
> >
> > Any hints?
> >
> > The XY problem is that I want to store a large amount of highly
> > repetitive text in Solr. I want the index to be as small as possible, so
> > I thought that if I just pre-tokenized, my dictionary would be quite small.
> > And I will be reconstructing some final form anyway.
> >
> > The other option is to just use a compressed stored field, but
> > I assume that does not take cross-document efficiencies into account.
> > And, it will be a read-only index after build, so I don't care about
> > updates messing things up.
> >
> > Regards,
> >    Alex
> >
> > Personal website: http://www.outerthoughts.com/
> > Current project: http://www.solr-start.com/ - Accelerating your Solr
> > proficiency
> >
>
