RE: separating a column into two for different behavior. Yes, that is exactly what I was advised multiple times. However, it creates a problem when I apply it to my application. I have one core with more than 50 columns (out of 100) that need case-insensitive and partial-match search as well as faceting. Having many columns is not the real problem. The problem is that to search, sort, and facet, I need to map the relationships between each searchable column and its faceting counterpart, and that becomes a headache.
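To make that bookkeeping concrete, here is a minimal sketch of the copyField-style split in plain Java (no Solr APIs; a toy model, not how Solr stores fields). It assumes a hypothetical naming convention where each logical column `x` becomes two physical fields: `x_txt`, a lowercased copy for case-insensitive partial matching, and `x_str`, the original string for faceting. The names `searchField`, `facetField`, `index`, `matches`, and `facet` are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class CopyFieldMappingSketch {

    // Hypothetical naming convention: one logical column, two physical fields.
    static String searchField(String column) { return column + "_txt"; }
    static String facetField(String column) { return column + "_str"; }

    // Toy "indexing": produce both physical fields for every logical column.
    static Map<String, String> index(Map<String, String> doc) {
        Map<String, String> fields = new TreeMap<>();
        for (Map.Entry<String, String> e : doc.entrySet()) {
            // Searchable copy: lowercased, so matching ignores case.
            fields.put(searchField(e.getKey()), e.getValue().toLowerCase(Locale.ROOT));
            // Facet copy: the original string, unchanged.
            fields.put(facetField(e.getKey()), e.getValue());
        }
        return fields;
    }

    // Case-insensitive partial match, always against the searchable copy.
    static boolean matches(Map<String, String> fields, String column, String query) {
        String v = fields.get(searchField(column));
        return v != null && v.contains(query.toLowerCase(Locale.ROOT));
    }

    // Facet counts, always computed on the original-string copy.
    static Map<String, Integer> facet(List<Map<String, String>> docs, String column) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> fields : docs) {
            String v = fields.get(facetField(column));
            if (v != null) {
                counts.merge(v, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(index(Map.of("species", "Human, Homo sapiens")));
        docs.add(index(Map.of("species", "Mouse, Mus musculus")));

        // The same logical column is routed to different physical fields.
        System.out.println(matches(docs.get(0), "species", "homo SAPIENS")); // true
        System.out.println(facet(docs, "species"));
        // prints {Human, Homo sapiens=1, Mouse, Mus musculus=1}
    }
}
```

With 50+ such columns, every search, sort, and facet request has to go through a translation layer like `searchField`/`facetField`; that is the mapping overhead described above, and a single field type supporting both behaviors would remove it.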
RE: scalability / performance: my current index is about 1.1TB across 25 cores, and the biggest core has 435M records. I have not experienced any memory or performance issues. I am currently maintaining my own repo just for this feature. It would be nice if this were supported out of the box.

Harry

> On Jan 26, 2016, at 11:28 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> DocValues was designed to support unanalyzed types
> originally. I don't know that code, but given my respect
> for the people who wrote it, I'd be amazed if there weren't
> very good reasons this is true. I suspect your work-around
> is going to be "surprising".
>
> And have you tested your change at scale? I suspect
> searching won't scale well.
>
> bq: I need a case-insensitive search for a relatively short string
> and at the same time, I need faceting on the original string
>
> There's no reason at all to change code to do this. Just use a copyField.
> The field that's to be faceted on is a "string" type with docValues=true, and
> the searchable field is some text type with the appropriate analysis chain.
>
> This doesn't really make much difference memory wise since the indexing
> and docValues are separate in the first place. I.e. if I specify
> indexed=true and docValues=true I get _two_ sets of data indexed.
>
> Best,
> Erick
>
> On Tue, Jan 26, 2016 at 8:50 AM, Harry Yoo <hyunat...@gmail.com> wrote:
>> Hi, I have actually needed this functionality for a long time, and I made
>> an extended data type to work around it.
>>
>> In my use case, I need a case-insensitive search for a relatively short
>> string and at the same time, I need faceting on the original string. For
>> example, "Human, Homo sapiens" is the original input, and I want it to be
>> searchable by human, Human, homo sapiens, or Homo Sapiens.
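>>
>> Here is my workaround:
>>
>> import java.util.ArrayList;
>> import java.util.List;
>>
>> import org.apache.lucene.document.SortedDocValuesField;
>> import org.apache.lucene.document.SortedSetDocValuesField;
>> import org.apache.lucene.index.IndexableField;
>> import org.apache.lucene.util.BytesRef;
>> import org.apache.solr.schema.SchemaField;
>> import org.apache.solr.schema.TextField;
>>
>> // TextField that also writes the original (unanalyzed) value as docValues,
>> // so one field can be searched through its analysis chain and faceted on
>> // the original string.
>> public class TextDocValueField extends TextField {
>>
>>   @Override
>>   public List<IndexableField> createFields(SchemaField field, Object value,
>>       float boost) {
>>     if (field.hasDocValues()) {
>>       List<IndexableField> fields = new ArrayList<>();
>>       // Analyzed field used for searching.
>>       fields.add(createField(field, value, boost));
>>       // Original string stored as docValues for faceting/sorting.
>>       final BytesRef bytes = new BytesRef(value.toString());
>>       if (field.multiValued()) {
>>         fields.add(new SortedSetDocValuesField(field.getName(), bytes));
>>       } else {
>>         fields.add(new SortedDocValuesField(field.getName(), bytes));
>>       }
>>       return fields;
>>     } else {
>>       return super.createFields(field, value, boost);
>>     }
>>   }
>>
>>   @Override
>>   public void checkSchemaField(final SchemaField field) {
>>     // Intentionally empty: skip the default check, which rejects
>>     // docValues on this field type.
>>   }
>>
>>   @Override
>>   public boolean multiValuedFieldCache() {
>>     // Each value is one logical value (the whole original string) for
>>     // faceting/sorting, not a set of tokens.
>>     return false;
>>   }
>> }
>>
>> I wish this could be supported by Solr so that I don't have to maintain my
>> own repo.
>>
>> What do you think?
>>
>> Regards,
>> Harry
>>
>>> On Jan 5, 2016, at 10:51 PM, Alok Bhandari
>>> <alokomprakashbhand...@gmail.com> wrote:
>>>
>>> Thanks Markus.
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/How-to-use-DocValues-with-TextField-tp4248647p4248797.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>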