Re: How to use DocValues with TextField
Hi, I actually needed this functionality for a long time, and I made an extended data type as a workaround.

In my use case, I need a case-insensitive search on a relatively short string and, at the same time, faceting on the original string. For example, "Human, Homo sapiens" is an original input, and I want it to be found by searching for human, Human, homo sapiens, or Homo Sapiens.

Here is my workaround:

import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.SortedSetDocValuesField;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.util.BytesRef;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.schema.TextField;

public class TextDocValueField extends TextField {

  @Override
  public List<IndexableField> createFields(SchemaField field, Object value,
      float boost) {
    if (field.hasDocValues()) {
      List<IndexableField> fields = new ArrayList<>();
      // Keep the regular analyzed/indexed field for searching.
      fields.add(createField(field, value, boost));
      // Add a docValues field carrying the original, unanalyzed bytes
      // for faceting and sorting.
      final BytesRef bytes = new BytesRef(value.toString());
      if (field.multiValued()) {
        fields.add(new SortedSetDocValuesField(field.getName(), bytes));
      } else {
        fields.add(new SortedDocValuesField(field.getName(), bytes));
      }
      return fields;
    } else {
      // return Collections.singletonList(createField(field, value, boost));
      return super.createFields(field, value, boost);
    }
  }

  @Override
  public void checkSchemaField(final SchemaField field) {
    // Intentionally a no-op: skip the base-class check that rejects
    // docValues on text fields.
  }

  @Override
  public boolean multiValuedFieldCache() {
    return false;
  }
}

I wish this could be supported by Solr so that I don't have to maintain my own repo.

What do you think?

Regards,
Harry

> On Jan 5, 2016, at 10:51 PM, Alok Bhandari
> wrote:
>
> Thanks Markus.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-use-DocValues-with-TextField-tp4248647p4248797.html
> Sent from the Solr - User mailing list archive at Nabble.com.
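For reference, registering a custom field type like the one above in schema.xml might look like the sketch below; the class package, type name, analysis chain, and field name are hypothetical, not from the thread:

<fieldType name="text_dv" class="com.example.solr.TextDocValueField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="species" type="text_dv" indexed="true" stored="true"
       docValues="true"/>

With a setup like this, searches on species would go through the lowercasing analysis chain, while faceting would read the original string from docValues.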
Re: How to use DocValues with TextField
RE: separating a column into two for different behavior. Yes, that is exactly what I have been advised multiple times. However, it creates a problem when I apply it to my application. I have one core in which more than 50 columns (out of 100) need to be searched case-insensitively and by partial match, as well as faceted on. Having that many columns is not the real problem; it is that when I want to search, sort, and facet, I have to map the relationships between the original and copied columns, and that became a headache.

RE: scalability / performance: my current index is about 1.1TB across 25 cores, and the biggest core has 435M records. I haven't had or seen any memory or performance issues.

I am currently maintaining my own repo just for this feature. It would be nice if this were supported out of the box.

Harry

> On Jan 26, 2016, at 11:28 AM, Erick Erickson wrote:
>
> DocValues was designed to support unanalyzed types
> originally. I don't know that code, but given my respect
> for the people who wrote it I'd be amazed if there weren't
> very good reasons this is true. I suspect your work-around
> is going to be "surprising".
>
> And have you tested your change at scale? I suspect
> searching won't scale well.
>
> bq: I need a case-insensitive search for a relatively short string
> and at the same time, I need faceting on the original string
>
> There's no reason at all to change code to do this. Just use a copyField.
> The field that's to be faceted on is a "string" type with docValues=true, and
> the searchable field is some text type with the appropriate analysis chain.
>
> This doesn't really make much difference memory-wise since the indexing
> and docValues are separate in the first place. I.e. if I specify
> indexed=true and docValues=true I get _two_ sets of data indexed.
>
> Best,
> Erick
>
> On Tue, Jan 26, 2016 at 8:50 AM, Harry Yoo wrote:
>> Hi, I actually needed this functionality for a long time, and I made
>> an extended data type as a workaround.
>>
>> In my use case, I need a case-insensitive search on a relatively short
>> string and, at the same time, faceting on the original string. For
>> example, "Human, Homo sapiens" is an original input, and I want it to
>> be found by searching for human, Human, homo sapiens, or Homo Sapiens.
>>
>> Here is my workaround:
>>
>> public class TextDocValueField extends TextField {
>>
>>   @Override
>>   public List<IndexableField> createFields(SchemaField field, Object value,
>>       float boost) {
>>     if (field.hasDocValues()) {
>>       List<IndexableField> fields = new ArrayList<>();
>>       fields.add(createField(field, value, boost));
>>       final BytesRef bytes = new BytesRef(value.toString());
>>       if (field.multiValued()) {
>>         fields.add(new SortedSetDocValuesField(field.getName(), bytes));
>>       } else {
>>         fields.add(new SortedDocValuesField(field.getName(), bytes));
>>       }
>>       return fields;
>>     } else {
>>       // return Collections.singletonList(createField(field, value, boost));
>>       return super.createFields(field, value, boost);
>>     }
>>   }
>>
>>   @Override
>>   public void checkSchemaField(final SchemaField field) {
>>     // do nothing
>>   }
>>
>>   @Override
>>   public boolean multiValuedFieldCache() {
>>     return false;
>>   }
>> }
>>
>> I wish this could be supported by Solr so that I don't have to maintain
>> my own repo.
>>
>> What do you think?
>>
>> Regards,
>> Harry
>>
>>> On Jan 5, 2016, at 10:51 PM, Alok Bhandari
>>> wrote:
>>>
>>> Thanks Markus.
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/How-to-use-DocValues-with-TextField-tp4248647p4248797.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
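For reference, the copyField approach Erick describes might look like this in schema.xml (field names here are illustrative):

<!-- Analyzed field for case-insensitive searching -->
<field name="species" type="text_general" indexed="true" stored="true"/>

<!-- Unanalyzed copy for faceting on the original string -->
<field name="species_facet" type="string" indexed="false" stored="false"
       docValues="true"/>

<copyField source="species" dest="species_facet"/>

The relationship-mapping headache Harry mentions is that every searchable column then needs a parallel *_facet column, and the application has to know which one to use for each operation.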
Re: How to use DocValues with TextField
Thanks for the pointer. Please advise me on how I can contribute.

H

> On Jan 27, 2016, at 2:16 AM, Toke Eskildsen wrote:
>
> Erick Erickson wrote:
>> DocValues was designed to support unanalyzed types
>> originally. I don't know that code, but given my respect
>> for the people who wrote it I'd be amazed if there weren't
>> very good reasons this is true. I suspect your work-around
>> is going to be "surprising".
>
> Hoss talked about this at the last Lucene/Solr Revolution and has opened
> https://issues.apache.org/jira/browse/SOLR-8362
>
> Harry: Maybe you could follow up on that JIRA issue?
>
> - Toke Eskildsen
Re: serious JSON Facet bug
Is there a way to patch? I am using 5.2.1 with JSON facets in production.

> On Jul 16, 2015, at 1:43 PM, Yonik Seeley wrote:
>
> To anyone using the JSON Facet API in released Solr versions:
> I discovered a serious memory leak while doing performance benchmarks
> (see http://yonik.com/facet_performance/ for some of the early results).
>
> Assuming you're in the evaluation / development phase of your project,
> I'd recommend using a recent developer snapshot for now:
> https://builds.apache.org/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/
>
> The fix (and performance improvements) will also be in the next Solr
> release (5.3) of course.
>
> -Yonik
Re: serious JSON Facet bug
Yes, I see the problem on my production Solr. I configured a max of 10,240, and the current size is 228,940, about 22x bigger than the max.

> On Jul 23, 2015, at 8:43 PM, Yonik Seeley wrote:
>
> On Thu, Jul 23, 2015 at 5:00 PM, Harry Yoo wrote:
>> Is there a way to patch? I am using 5.2.1 with JSON facets in production.
>
> First you should see if your queries tickle the bug...
> Check the size of the filter cache from the admin screen (under
> Plugins, filterCache)
> and see if its current size is larger than the configured maximum.
>
> -Yonik
>
>>> On Jul 16, 2015, at 1:43 PM, Yonik Seeley wrote:
>>>
>>> To anyone using the JSON Facet API in released Solr versions:
>>> I discovered a serious memory leak while doing performance benchmarks
>>> (see http://yonik.com/facet_performance/ for some of the early results).
>>>
>>> Assuming you're in the evaluation / development phase of your project,
>>> I'd recommend using a recent developer snapshot for now:
>>> https://builds.apache.org/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/
>>>
>>> The fix (and performance improvements) will also be in the next Solr
>>> release (5.3) of course.
>>>
>>> -Yonik
>>
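For reference, the configured maximum Yonik refers to is the size attribute on the filterCache entry in solrconfig.xml; a typical definition (the values shown are illustrative) looks like:

<filterCache class="solr.FastLRUCache"
             size="10240"
             initialSize="512"
             autowarmCount="128"/>

A current size far above the configured size, as Harry observed, is the symptom of the leak rather than a misconfiguration.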
Re: Best way to facets with value preprocessing (w/ docValues)
I had the same issue, and here is my solution. It is basically option #1 that Konstantin suggested:

public class TextDocValueField extends TextField {

  @Override
  public List<IndexableField> createFields(SchemaField field, Object value,
      float boost) {
    if (field.hasDocValues()) {
      List<IndexableField> fields = new ArrayList<>();
      fields.add(createField(field, value, boost));
      final BytesRef bytes = new BytesRef(value.toString());
      if (field.multiValued()) {
        fields.add(new SortedSetDocValuesField(field.getName(), bytes));
      } else {
        fields.add(new SortedDocValuesField(field.getName(), bytes));
      }
      return fields;
    } else {
      // return Collections.singletonList(createField(field, value, boost));
      return super.createFields(field, value, boost);
    }
  }

  @Override
  public void checkSchemaField(final SchemaField field) {
    // Intentionally a no-op: skip the base-class check that rejects
    // docValues on text fields.
  }

  @Override
  public boolean multiValuedFieldCache() {
    return false;
  }
}

I have had no problems so far, but I haven't compared performance. I wish Solr allowed docValues on TextField.

Best,
Harry
Re: is there a way to remove deleted documents from index without optimize
I should have read this earlier. My project has been running since Apache Solr 4.x, moved to 5.x, and recently migrated to 6.6.1. Do you think Solr will take care of old-version indexes as well? I want to make sure my indexes are rewritten in the 6.x Lucene format so that they will still be supported when I move to Solr 7.x.

Is there any best practice for managing Solr indexes?

Harry

> On Sep 22, 2015, at 8:21 PM, Walter Underwood wrote:
>
> Don't do anything. Solr will automatically clean up the deleted documents for
> you.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>
>> On Sep 22, 2015, at 6:01 PM, CrazyDiamond wrote:
>>
>> My index is updated frequently, and I need to remove unused documents from
>> the index after an update/reindex.
>> Optimization is very expensive, so what should I do?
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/is-there-a-way-to-remove-deleted-documents-from-index-without-optimize-tp4230691.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
Re: is there a way to remove deleted documents from index without optimize
Thanks for the clarification. I use ${lucene.version} in solrconfig.xml and pass -Dlucene.version when I launch Solr, to keep the versions in sync.

> On Oct 12, 2017, at 11:01 PM, Erick Erickson wrote:
>
> You can use the IndexUpgradeTool that ships with each version of Solr
> (well, actually Lucene) to, well, upgrade your index. So you can use
> the IndexUpgradeTool that ships with 5x to upgrade from 4x. And the
> one that ships with 6x to upgrade from 5x. Etc.
>
> That said, none of that is necessary _if_ you
>> have the Lucene version in solrconfig.xml be the one that corresponds to
>> your current Solr. I.e. a solrconfig for 6x should have a luceneMatchVersion
>> of 6something.
>> update your index enough to rewrite all segments before moving to the
>> _next_ version. When Lucene merges a segment, it writes the new segment
>> according to the luceneMatchVersion in solrconfig.xml. So as long as you are
>> on a version long enough for all segments to be merged into new segments,
>> you don't have to worry.
>
> Best,
> Erick
>
> On Thu, Oct 12, 2017 at 8:29 PM, Harry Yoo wrote:
>> I should have read this earlier. My project has been running since Apache
>> Solr 4.x, moved to 5.x, and recently migrated to 6.6.1. Do you think Solr
>> will take care of old-version indexes as well? I want to make sure my
>> indexes are rewritten in the 6.x Lucene format so that they will still be
>> supported when I move to Solr 7.x.
>>
>> Is there any best practice for managing Solr indexes?
>>
>> Harry
>>
>>> On Sep 22, 2015, at 8:21 PM, Walter Underwood wrote:
>>>
>>> Don't do anything. Solr will automatically clean up the deleted documents
>>> for you.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/ (my blog)
>>>
>>>
>>>> On Sep 22, 2015, at 6:01 PM, CrazyDiamond wrote:
>>>>
>>>> My index is updated frequently, and I need to remove unused documents from
>>>> the index after an update/reindex.
>>>> Optimization is very expensive, so what should I do?
>>>>
>>>> --
>>>> View this message in context:
>>>> http://lucene.472066.n3.nabble.com/is-there-a-way-to-remove-deleted-documents-from-index-without-optimize-tp4230691.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
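For reference, the setup Harry describes might look like this in solrconfig.xml, using Solr's ${property:default} substitution syntax (the fallback version shown is illustrative):

<!-- Resolves to the value of -Dlucene.version passed at startup,
     falling back to 6.6.1 when the property is not set. -->
<luceneMatchVersion>${lucene.version:6.6.1}</luceneMatchVersion>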