Re: How to use DocValues with TextField

2016-01-26 Thread Harry Yoo
Hi, I have actually needed this functionality for a long time, and I made an 
extended data type as a workaround.

In my use case, I need case-insensitive search on a relatively short string 
and, at the same time, faceting on the original string. For example, 
“Human, Homo sapiens” is an original input, and I want it to be findable by 
“human”, “Human”, “homo sapiens”, or “Homo Sapiens”.

Here is my workaround,

import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.SortedSetDocValuesField;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.util.BytesRef;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.schema.TextField;

public class TextDocValueField extends TextField {

  @Override
  public List<IndexableField> createFields(SchemaField field, Object value,
      float boost) {
    if (field.hasDocValues()) {
      List<IndexableField> fields = new ArrayList<>();
      // Index the analyzed (searchable) form as usual.
      fields.add(createField(field, value, boost));
      // Keep the raw, unanalyzed input as the docValues term for faceting/sorting.
      final BytesRef bytes = new BytesRef(value.toString());
      if (field.multiValued()) {
        fields.add(new SortedSetDocValuesField(field.getName(), bytes));
      } else {
        fields.add(new SortedDocValuesField(field.getName(), bytes));
      }
      return fields;
    } else {
      return super.createFields(field, value, boost);
    }
  }

  @Override
  public void checkSchemaField(final SchemaField field) {
    // No-op: skip the base-class check that rejects docValues on analyzed fields.
  }

  @Override
  public boolean multiValuedFieldCache() {
    return false;
  }
}
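
For completeness, wiring a class like this into the schema would look roughly 
like the sketch below; the package name, analyzer chain, and field name are 
illustrative assumptions, not from the original message:

```xml
<!-- schema.xml: custom analyzed type that also writes docValues -->
<fieldType name="text_dv" class="com.example.solr.TextDocValueField" docValues="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- lowercasing provides the case-insensitive search side -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="organism" type="text_dv" indexed="true" stored="true" docValues="true"/>
```

The analyzed terms serve search, while the docValues side keeps the original, 
unanalyzed string for faceting.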

I wish this could be supported by Solr so that I don’t have to maintain my own 
repo.



What do you think?

Regards,
Harry


> On Jan 5, 2016, at 10:51 PM, Alok Bhandari  
> wrote:
> 
> Thanks Markus.
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-use-DocValues-with-TextField-tp4248647p4248797.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to use DocValues with TextField

2016-02-18 Thread Harry Yoo
RE: separating a column into two for different behavior.

Yes, that is exactly what I have been advised multiple times. However, it 
creates a problem when I apply it to my application. I have one core in which 
more than 50 columns (out of 100) need case-insensitive and partial-match 
search as well as faceting. Having many columns is not the real problem; the 
headache is that for every column I want to search, sort, and facet on, I have 
to map the relationships between the copies.
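
For illustration, the advised copyField setup looks roughly like this per 
column (field and type names are hypothetical):

```xml
<!-- searchable, analyzed column -->
<field name="organism" type="text_general" indexed="true" stored="true"/>
<!-- unanalyzed docValues copy for faceting and sorting -->
<field name="organism_str" type="string" indexed="false" stored="false" docValues="true"/>
<copyField source="organism" dest="organism_str"/>
```

Multiplied across 50+ columns, every search, sort, and facet use has to be 
mapped to the right `*_str` twin, which is the bookkeeping described above.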

RE: scalability / performance

My current index is about 1.1 TB across 25 cores; the biggest core has 435M 
records. I have not experienced any memory or performance issues.


I am currently maintaining my own repo just for this feature. It would be nice 
if this were supported out of the box.

Harry

> On Jan 26, 2016, at 11:28 AM, Erick Erickson  wrote:
> 
> DocValues was designed to support unanalyzed types
> originally. I don't know that code, but given my respect
> for the people who wrote it, I'd be amazed if there weren't
> very good reasons this is true. I suspect your work-around
> is going to be "surprising".
> 
> And have you tested your change at scale? I suspect
> searching won't scale well.
> 
> bq:  I need a case-insensitive search for a relatively short string
> and at the same time, I need faceting on the original string
> 
> There's no reason at all to change code to do this. Just use a copyField.
> The field that's to be faceted on is a "string" type with docValues=true, and
> the searchable field is some text type with the appropriate analysis chain.
> 
> This doesn't really make much difference memory wise since the indexing
> and docValues are separate in the first place. I.e. if I specify
> indexed=true and docValues=true I get _two_ sets of data indexed.
> 
> Best,
> Erick
> 
> On Tue, Jan 26, 2016 at 8:50 AM, Harry Yoo  wrote:
>> Hi, I actually needed this functionality for a long time and I made up an 
>> extended data type to work around.
>> 
>> In my use case, I need a case-insensitive search for a relatively short 
>> string and at the same time, I need faceting on the original string. For 
>> example, “Human, Homo sapiens” is an original input, and I want it to be 
>> searched by human, Human, homo sapiens or Homo Sapiens.
>> 
>> Here is my workaround,
>> 
>> public class TextDocValueField extends TextField {
>> 
>>  @Override
>>  public List<IndexableField> createFields(SchemaField field, Object value, 
>> float boost) {
>>if (field.hasDocValues()) {
>>  List<IndexableField> fields = new ArrayList<>();
>>  fields.add(createField(field, value, boost));
>>  final BytesRef bytes = new BytesRef(value.toString());
>>  if (field.multiValued()) {
>>fields.add(new SortedSetDocValuesField(field.getName(), bytes));
>>  } else {
>>fields.add(new SortedDocValuesField(field.getName(), bytes));
>>  }
>>  return fields;
>>} else {
>> //  return Collections.singletonList(createField(field, value, boost));
>>  return super.createFields(field, value, boost);
>>}
>>  }
>> 
>>  @Override
>>  public void checkSchemaField(final SchemaField field) {
>>// do nothing
>>  }
>> 
>>  @Override
>>  public boolean multiValuedFieldCache() {
>>return false;
>>  }
>> }
>> 
>> I wish this can be supported by solr so that I don’t have to maintain my own 
>> repo.
>> 
>> 
>> 
>> What do you think?
>> 
>> Regards,
>> Harry
>> 
>> 
>>> On Jan 5, 2016, at 10:51 PM, Alok Bhandari 
>>>  wrote:
>>> 
>>> Thanks Markus.
>>> 
>>> 
>>> 
>>> --
>>> View this message in context: 
>>> http://lucene.472066.n3.nabble.com/How-to-use-DocValues-with-TextField-tp4248647p4248797.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 



Re: How to use DocValues with TextField

2016-02-18 Thread Harry Yoo
Thanks for the pointer. 

Please advise me how I can contribute.

H

> On Jan 27, 2016, at 2:16 AM, Toke Eskildsen  wrote:
> 
> Erick Erickson  wrote:
>> DocValues was designed to support unanalyzed types
>> originally. I don't know that code, but given my respect
>> for the people who wrote it, I'd be amazed if there weren't
>> very good reasons this is true. I suspect your work-around
>> is going to be "surprising".
> 
> Hoss talked about this at the last Lucene/Solr Revolution and has opened
> https://issues.apache.org/jira/browse/SOLR-8362
> 
> Harry: Maybe you could follow up on that JIRA issue?
> 
> - Toke Eskildsen



Re: serious JSON Facet bug

2015-07-23 Thread Harry Yoo
Is there a way to patch it? I am using 5.2.1 with the JSON Facet API in production.

> On Jul 16, 2015, at 1:43 PM, Yonik Seeley  wrote:
> 
> To anyone using the JSON Facet API in released Solr versions:
> I discovered a serious memory leak while doing performance benchmarks
> (see http://yonik.com/facet_performance/ for some of the early results).
> 
> Assuming you're in the evaluation / development phase of your project,
> I'd recommend using a recent developer snapshot for now:
> https://builds.apache.org/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/
> 
> The fix (and performance improvements) will also be in the next Solr
> release (5.3) of course.
> 
> -Yonik



Re: serious JSON Facet bug

2015-07-27 Thread Harry Yoo
Yes, I see the problem on my production Solr. I set 10,240 as the max, and I 
see the current size is 228,940, about 22x larger than the max.
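
For reference, the configured maximum mentioned here is the `size` attribute of 
the filterCache entry in solrconfig.xml; a typical definition with that limit 
might look like the following (the class and warm-up attributes are common 
defaults, not taken from the message):

```xml
<filterCache class="solr.FastLRUCache"
             size="10240"
             initialSize="512"
             autowarmCount="0"/>
```

When the admin Plugins screen reports a current size far above `size`, entries 
are not being evicted, which is the leak Yonik describes.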


> On Jul 23, 2015, at 8:43 PM, Yonik Seeley  wrote:
> 
> On Thu, Jul 23, 2015 at 5:00 PM, Harry Yoo  wrote:
>> Is there a way to patch? I am using 5.2.1 and using json facet in production.
> 
> First you should see if your queries tickle the bug...
> check the size of the filter cache from the admin screen (under
> plugins, filterCache)
> and see if its current size is larger than the configured maximum.
> 
> -Yonik
> 
> 
>>> On Jul 16, 2015, at 1:43 PM, Yonik Seeley  wrote:
>>> 
>>> To anyone using the JSON Facet API in released Solr versions:
>>> I discovered a serious memory leak while doing performance benchmarks
>>> (see http://yonik.com/facet_performance/ for some of the early results).
>>> 
>>> Assuming you're in the evaluation / development phase of your project,
>>> I'd recommend using a recent developer snapshot for now:
>>> https://builds.apache.org/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/
>>> 
>>> The fix (and performance improvements) will also be in the next Solr
>>> release (5.3) of course.
>>> 
>>> -Yonik
>> 



Re: Best way to facets with value preprocessing (w/ docValues)

2015-07-14 Thread Harry Yoo
I had the same issue, and here is my solution.

Basically, it is option #1 that Konstantin suggested:

import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.SortedSetDocValuesField;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.util.BytesRef;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.schema.TextField;

public class TextDocValueField extends TextField {

  @Override
  public List<IndexableField> createFields(SchemaField field, Object value,
      float boost) {
    if (field.hasDocValues()) {
      List<IndexableField> fields = new ArrayList<>();
      // Index the analyzed (searchable) form as usual.
      fields.add(createField(field, value, boost));
      // Keep the raw, unanalyzed input as the docValues term for faceting/sorting.
      final BytesRef bytes = new BytesRef(value.toString());
      if (field.multiValued()) {
        fields.add(new SortedSetDocValuesField(field.getName(), bytes));
      } else {
        fields.add(new SortedDocValuesField(field.getName(), bytes));
      }
      return fields;
    } else {
      return super.createFields(field, value, boost);
    }
  }

  @Override
  public void checkSchemaField(final SchemaField field) {
    // No-op: skip the base-class check that rejects docValues on analyzed fields.
  }

  @Override
  public boolean multiValuedFieldCache() {
    return false;
  }
}


I have had no problems so far, but I haven’t compared performance. I wish Solr 
allowed docValues on TextField.

Best,
Harry






Re: is there a way to remove deleted documents from index without optimize

2017-10-12 Thread Harry Yoo
I should have read this earlier. My project has been running since Apache Solr 
4.x, moved to 5.x, and recently migrated to 6.6.1. Do you think Solr will take 
care of the old-version indexes as well? I want to make sure my indexes are 
rewritten with the 6.x Lucene version so that they will be supported when I 
move to Solr 7.x.

Is there any best practice for managing Solr indexes?

Harry

> On Sep 22, 2015, at 8:21 PM, Walter Underwood  wrote:
> 
> Don’t do anything. Solr will automatically clean up the deleted documents for 
> you.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
>> On Sep 22, 2015, at 6:01 PM, CrazyDiamond  wrote:
>> 
>> my index is updating frequently and i need to remove unused documents from
>> index after update/reindex.
>> Optimizaion is very expensive so what should i do?
>> 
>> 
>> 
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/is-there-a-way-to-remove-deleted-documents-from-index-without-optimize-tp4230691.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 



Re: is there a way to remove deleted documents from index without optimize

2017-10-13 Thread Harry Yoo
Thanks for the clarification. 

I use

<luceneMatchVersion>${lucene.version}</luceneMatchVersion>

in solrconfig.xml and pass -Dlucene.version when I launch Solr, to keep the 
versions in sync.



> On Oct 12, 2017, at 11:01 PM, Erick Erickson  wrote:
> 
> You can use the IndexUpgradeTool that ships with each version of Solr
> (well, actually Lucene) to, well, upgrade your index. So you can use
> the IndexUpgradeTool that ships with 5x to upgrade from 4x. And the
> one that ships with 6x to upgrade from 5x. etc.
> 
> That said, none of that is necessary _if_ you
>> have the Lucene version in solrconfig.xml be the one that corresponds to 
>> your current Solr. I.e. a solrconfig for 6x should have a luceneMatchVersion 
>> of 6something.
>> you update your index enough to rewrite all segments before moving to the 
>> _next_ version. When Lucene merges a segment, it writes the new segment 
>> according to the luceneMatchVersion in solrconfig.xml. So as long as you stay 
>> on a version long enough for all segments to be merged into new segments, 
>> you don't have to worry.
> 
> Best,
> Erick
> 
> On Thu, Oct 12, 2017 at 8:29 PM, Harry Yoo  wrote:
>> I should have read this. My project has been running from apache solr 4.x, 
>> and moved to 5.x and recently migrated to 6.6.1. Do you think solr will take 
>> care of old version indexes as well? I wanted to make sure my indexes are 
>> updated with the 6.x Lucene version so that they will be supported when I move to 
>> solr 7.x
>> 
>> Is there any best practice managing solr indexes?
>> 
>> Harry
>> 
>>> On Sep 22, 2015, at 8:21 PM, Walter Underwood  wrote:
>>> 
>>> Don’t do anything. Solr will automatically clean up the deleted documents 
>>> for you.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>> 
>>>> On Sep 22, 2015, at 6:01 PM, CrazyDiamond  wrote:
>>>> 
>>>> my index is updating frequently and i need to remove unused documents from
>>>> index after update/reindex.
>>>> Optimizaion is very expensive so what should i do?
>>>> 
>>>> 
>>>> 
>>>> --
>>>> View this message in context: 
>>>> http://lucene.472066.n3.nabble.com/is-there-a-way-to-remove-deleted-documents-from-index-without-optimize-tp4230691.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>>