Re: Copy field a source of copy field

2017-07-26 Thread alessandro.benedetti
I get your point: the second KeepWordFilter is not keeping anything because the token it gets is "hey you" and the word it is supposed to keep is "hey", which clearly does not work. The KeepWordFilter just considers each row a single token (I may be wrong, I didn't check the code, I am just assumi

Re: Copy field a source of copy field

2017-07-25 Thread tstusr
Heh, I also think that! We have some serious gaps in what you explained to me. First, you pointed out that there's no real need to use ShingleFilter; I tried with every tokenizer and the result is the same: the species are not caught. In the simplest scenario I've got this: PUT YOUR FAVORI

Re: Copy field a source of copy field

2017-07-20 Thread Erick Erickson
Yep, we're not communicating ;) Use the original source field for the genus, as: The difficulty here is that there might be false hits if the genera names happen to match words in the input that are not part of a genus/species pair. On Thu, Jul 20, 2017 at 9:55 AM, tstusr wrote: > Well, co

Re: Copy field a source of copy field

2017-07-20 Thread tstusr
Well, correct me if I'm wrong. Your suggestion is to use the species field as the source of the genus field. We tried with this: Where species works as described and genus just uses a KWF, like this: But now the problem is different. When we try the

Re: Copy field a source of copy field

2017-07-19 Thread Erick Erickson
OK, you'll need two fields pretty much for certain. The trick is getting _only_ genus names in the genus field. The simplest thing to do would be a straight copyField with a single keep word filter that contains a list of all the genera. That presupposes that the genera are disjoint sets from all
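
A minimal Python sketch of the "straight copyField plus one keep word filter" idea, not Solr code. It assumes the genus field receives the same raw text as the source field and then runs its own analysis (split on whitespace, lowercase, keep only listed genera). The function name `analyze_genus`, the `GENERA` list, and the sample sentence are hypothetical.

```python
def analyze_genus(raw_text, genera):
    """Model the genus field's analysis: whitespace-split, lowercase,
    then keep only tokens that appear in the genera keep-word list."""
    return [tok for tok in raw_text.lower().split() if tok in genera]


# Hypothetical keep-word list and document text, for illustration only.
GENERA = {"abies", "abarema"}
doc = "Samples of Abies durangensis and Abies flinckii were collected"

genus_tokens = analyze_genus(doc, GENERA)
print(genus_tokens)
```

Any word in the text that happens to equal a genus name is kept, whether or not a species name follows it, so this only works cleanly when the genera list does not collide with ordinary words in the documents.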

Re: Copy field a source of copy field

2017-07-19 Thread tstusr
Well, our documents consist of PDF files (between 20 and 200 pages). So we catch words from the whole file; for that we use the extract handler, which is why we have these fields: We catch species in all the PDF content (in the attr_content field). Species captured are used for ranking purposes. So, w

Re: Copy field a source of copy field

2017-07-18 Thread Erick Erickson
OK, I take it back. Keepwords handle multiple words just fine. So I have to rewind. I'm having no trouble at all applying multiple, successive keepwords filters, even when there are multiple words on a single line in the keepwords file. Your use of shingles in here is probably going to confuse thi

Re: Copy field a source of copy field

2017-07-18 Thread tstusr
Well, for me it's kind of strange because it's working only with words that have blank spaces. It seems that maybe I'm not explaining it well. My field is defined as follows: We have 2 KWF files, "species" and

Re: Copy field a source of copy field

2017-07-18 Thread Erick Erickson
Multiple keyword files work just fine for me. One issue you're having is that multi-word keepwords aren't going to do what you expect. The analysis chains work on _tokens_, and only see one at a time. Plus (apparently) the input is broken up on whitespace (the docs aren't entirely clear on this, b

Re: Copy field a source of copy field

2017-07-18 Thread tstusr
Well, I have no idea why those images displayed as they did. The correct order is: field chain analyzer, KWF-genus file, test output.

Re: Copy field a source of copy field

2017-07-18 Thread tstusr
It seems that it is just taking the last file of keep words. Now, for control purposes, I have in the genus file: And it is just takin

Re: Copy field a source of copy field

2017-07-18 Thread Erick Erickson
The code is very simple; at a quick glance it looks like it just reads the words in, and then the "accept" method just returns true or false based on whether the text file contains the token. Are you sure you reloaded your core/collection and pushed the changed schema to the right place? The admin/anal
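
The behaviour described above can be modelled in a few lines of Python. This is only a sketch of the idea, not the Lucene source: the class name `KeepWordModel` and its methods are hypothetical, and the keep words are passed in directly rather than read from a file.

```python
class KeepWordModel:
    """Toy model of a keep-word filter: a token passes only if the
    whole token is in the configured word set."""

    def __init__(self, words):
        self.words = set(words)

    def accept(self, token):
        # Exact, whole-token membership test; no partial matching.
        return token in self.words

    def filter(self, tokens):
        return [t for t in tokens if self.accept(t)]


genus_filter = KeepWordModel(["abies"])  # hypothetical keep-word list
kept = genus_filter.filter(["abies", "durangensis", "abies durangensis"])
print(kept)
```

Because the test is on the whole token, a token such as "abies durangensis" is dropped by a filter that only lists "abies".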

Re: Copy field a source of copy field

2017-07-18 Thread tstusr
Ok, I know shingling will join with "_". But that is the behaviour we want. Imagine we have these entries (contained in the species file): abarema idiopoda abutilon bakerianum Those become: abarema idiopoda abutilon bakerianum abarema_idiopoda abutilon_bakerianum But now in my genus file maybe i

Re: Copy field a source of copy field

2017-07-17 Thread Shawn Heisey
On 7/17/2017 4:26 PM, tstusr wrote: > We want to use a copy field as a source for another copy field or some kind > of post processing of a field. > As an example imagine we have on species > > abies durangensis > abies flinckii > > so, after post processing, we expect to have only > abies > > whi

Re: Copy field a source of copy field

2017-07-17 Thread Erick Erickson
In a word, "no". Copyfields are not chained together. I'm not at all sure what you're trying to accomplish with those filter chains anyway. By shingling _then_ doing the stopwords, you'll have some input like abies durangensis become abies abies_durangensis durangensis Then put that through your
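
To make the shingle-then-filter behaviour concrete, here is a small Python simulation, not Solr code. It models only two-word shingles joined with "_" with the single words kept as well; the function names `shingle` and `keep_words` and both keep-word sets are assumptions for illustration.

```python
def shingle(tokens, separator="_"):
    """Emit each token, followed by the two-word shingle that starts at it."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            out.append(tok + separator + tokens[i + 1])
    return out


def keep_words(tokens, keep):
    """Keep only tokens that exactly match an entry in the keep set."""
    return [t for t in tokens if t in keep]


shingled = shingle("abies durangensis".split())
print(shingled)  # ['abies', 'abies_durangensis', 'durangensis']

# Hypothetical keep-word sets for a species filter and a genus filter.
species = {"abies_durangensis"}
genera = {"abies"}

after_species = keep_words(shingled, species)
after_genus = keep_words(after_species, genera)
print(after_species, after_genus)
```

In this model, once the species filter has run, only the shingle "abies_durangensis" is left, so a genus filter applied afterwards in the same chain finds no bare "abies" token to keep.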