RE: separating a column into two for different behavior.

Yes, that is exactly what I was advised multiple times. However, it causes a 
problem when I apply it to my application.
I have one core in which more than 50 of its 100 columns need to be searched 
case-insensitively with partial matching, as well as faceted.
Having many columns is not the real problem. The problem is that to search, 
sort, and facet, I need to map each searchable column to its facet column, 
which has become a headache. 

RE: scalability / performance

My current index is about 1.1 TB across 25 cores; the biggest core has 435M 
records. I have not experienced any memory or performance issues. 


I am currently maintaining my own repo just for this feature. It would be nice 
if this were supported out of the box.

Harry

> On Jan 26, 2016, at 11:28 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> DocValues was originally designed to support unanalyzed types.
> I don't know that code, but given my respect for the people
> who wrote it, I'd be amazed if there weren't very good
> reasons for this. I suspect your work-around is going to be
> "surprising".
> 
> And have you tested your change at scale? I suspect
> searching won't scale well.
> 
> bq:  I need a case-insensitive search for a relatively short string
> and at the same time, I need faceting on the original string
> 
> There's no reason at all to change code to do this. Just use a copyField.
> The field that's to be faceted on is a "string" type with docValues=true, and
> the searchable field is some text type with the appropriate analysis chain.
> 
> This doesn't really make much difference memory-wise since the indexing
> and docValues are separate in the first place. I.e. if I specify
> indexed=true and docValues=true I get _two_ sets of data indexed.
> 
> Best,
> Erick
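The copyField arrangement Erick describes can be modeled in plain Java to show how one source value ends up in two places. This is a minimal in-memory sketch, not Solr or Lucene code: `CopyFieldSketch`, its methods, and the lowercase-and-split analysis are hypothetical stand-ins for a text field searched through an inverted index and a "string" copy with docValues used for faceting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative only: one source value copied into a search copy and a facet copy. */
public class CopyFieldSketch {
    // Search copy: inverted index from analyzed token to the ids of docs containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // Facet copy: the original, unanalyzed value per doc (the role docValues play).
    private final List<String> facetColumn = new ArrayList<>();

    // Hypothetical analysis chain: split on non-letters, then lowercase.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // Indexes one source value into both copies and returns its doc id.
    int add(String value) {
        int doc = facetColumn.size();
        facetColumn.add(value);
        for (String token : analyze(value)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(doc);
        }
        return doc;
    }

    // Doc ids whose analyzed copy contains every query token.
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String token : analyze(query)) {
            Set<Integer> docs = postings.getOrDefault(token, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    // Facet counts keyed by the original strings of the given docs.
    Map<String, Integer> facet(Set<Integer> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc : docs) {
            counts.merge(facetColumn.get(doc), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        CopyFieldSketch idx = new CopyFieldSketch();
        idx.add("Human, Homo sapiens");
        idx.add("Mouse, Mus musculus");
        idx.add("Human, Homo sapiens");
        System.out.println(idx.facet(idx.search("homo SAPIENS")));
    }
}
```

Searching goes through the analyzed tokens, so "homo SAPIENS" finds both human documents, while the facet counts are keyed by the untouched original string "Human, Homo sapiens".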
> 
> On Tue, Jan 26, 2016 at 8:50 AM, Harry Yoo <hyunat...@gmail.com> wrote:
>> Hi, I have actually needed this functionality for a long time, so I wrote an 
>> extended field type as a workaround.
>> 
>> In my use case, I need case-insensitive search on a relatively short 
>> string and, at the same time, faceting on the original string. For 
>> example, “Human, Homo sapiens” is the original input, and I want it to be 
>> found by human, Human, homo sapiens, or Homo Sapiens.
>> 
>> Here is my workaround:
>> 
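>> import java.util.ArrayList;
>> import java.util.List;
>> 
>> import org.apache.lucene.document.SortedDocValuesField;
>> import org.apache.lucene.document.SortedSetDocValuesField;
>> import org.apache.lucene.index.IndexableField;
>> import org.apache.lucene.util.BytesRef;
>> import org.apache.solr.schema.SchemaField;
>> import org.apache.solr.schema.TextField;
>> 
>> /**
>>  * A TextField that also writes the original (unanalyzed) value as docValues,
>>  * so the field is searched through its analysis chain but faceted and
>>  * sorted on the original string.
>>  */
>> public class TextDocValueField extends TextField {
>> 
>>   @Override
>>   public List<IndexableField> createFields(SchemaField field, Object value,
>>       float boost) {
>>     if (!field.hasDocValues()) {
>>       // No docValues requested: behave exactly like a plain TextField.
>>       return super.createFields(field, value, boost);
>>     }
>>     List<IndexableField> fields = new ArrayList<>();
>>     // Indexed/stored field, analyzed by this type's analysis chain.
>>     // createField may return null if the field is neither indexed nor stored.
>>     IndexableField indexed = createField(field, value, boost);
>>     if (indexed != null) {
>>       fields.add(indexed);
>>     }
>>     // DocValues hold the raw string, used for faceting and sorting.
>>     final BytesRef bytes = new BytesRef(value.toString());
>>     if (field.multiValued()) {
>>       fields.add(new SortedSetDocValuesField(field.getName(), bytes));
>>     } else {
>>       fields.add(new SortedDocValuesField(field.getName(), bytes));
>>     }
>>     return fields;
>>   }
>> 
>>   @Override
>>   public void checkSchemaField(final SchemaField field) {
>>     // Skip the default check, which rejects docValues on this field type.
>>   }
>> 
>>   @Override
>>   public boolean multiValuedFieldCache() {
>>     // Each value is one logical value (the whole string), not one per token.
>>     return false;
>>   }
>> }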
>> 
>> I wish Solr supported this out of the box so that I didn't have to maintain 
>> my own repo.
>> 
>> 
>> 
>> What do you think?
>> 
>> Regards,
>> Harry
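Harry's requirement is easier to see by comparing what faceting returns on an analyzed field with what it returns on the original string. A minimal plain-Java sketch, assuming a hypothetical lowercase-and-split analysis chain (not Solr's actual implementation): all four query variants from his example match, but facet counts taken from the analyzed tokens split "Human, Homo sapiens" into separate words, which is why a plain text field alone does not give facets on the original value.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: facet counts from analyzed tokens vs. from original strings. */
public class TokenVsValueFacets {
    // Hypothetical analysis chain: split on non-letters, then lowercase.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // Case-insensitive match: every query token occurs among the value's tokens.
    static boolean matches(String value, String query) {
        List<String> q = analyze(query);
        return !q.isEmpty() && analyze(value).containsAll(q);
    }

    // Faceting on the analyzed field: each distinct token in a doc counts once.
    static Map<String, Integer> facetOnTokens(List<String> values) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String v : values) {
            for (String t : new HashSet<>(analyze(v))) {
                counts.merge(t, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Faceting on the original string: one bucket per distinct input value.
    static Map<String, Integer> facetOnValues(List<String> values) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> values = List.of("Human, Homo sapiens", "Mouse, Mus musculus");
        for (String q : List.of("human", "Human", "homo sapiens", "Homo Sapiens")) {
            System.out.println(q + " -> " + matches(values.get(0), q));
        }
        System.out.println(facetOnTokens(values));
        System.out.println(facetOnValues(values));
    }
}
```

Running it prints `true` for all four queries, then `{homo=1, human=1, mouse=1, mus=1, musculus=1, sapiens=1}` for the token facets and `{Human, Homo sapiens=1, Mouse, Mus musculus=1}` for the original-value facets.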
>> 
>> 
>>> On Jan 5, 2016, at 10:51 PM, Alok Bhandari 
>>> <alokomprakashbhand...@gmail.com> wrote:
>>> 
>>> Thanks Markus.
>> 
