Re: Extracting important multi term phrases from the text

2018-11-20 Thread Pratik Patel
@David Sorry for the late reply. The SKG query that I am using is actually fairly basic in itself. For example,

{
  "queries": [
    "dataStoreId:\"123\"",
    "text:\"foo\""
  ],
  "compare": [
    {
      "type": "text_shingles",
      "limit": 30,
      "discover_values": true
    }
  ]
}

What I am exp

Re: Extracting important multi term phrases from the text

2018-11-16 Thread Alexandre Rafalovitch
Good catch Pratik. It is in Javadoc, but not in the reference guide: https://lucene.apache.org/core/6_3_0/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilterFactory.html . I'll try to fix that later (SOLR-12996). Regards, Alex. On Fri, 16 Nov 2018 at 10:44, Pratik Patel wrote: >

Re: Extracting important multi term phrases from the text

2018-11-16 Thread David Hastings
Thanks, I would be really curious to see your URL call if you don't mind. I am just getting started with the SKG stuff, and this conversation in particular has really helped. On Fri, Nov 16, 2018 at 10:44 AM Pratik Patel wrote: > @Markus @Walter, @Alexandre is right. The culprit was not S

Re: Extracting important multi term phrases from the text

2018-11-16 Thread Pratik Patel
@Markus @Walter, @Alexandre is right. The culprit was not the StopWord Filter, it was the ShingleFilter. I could not find the fillerToken parameter in the documentation; is it a new addition? BTW, I tried that and it works. Thanks! I still ended up using pattern replacement filter because I did not want any singl

Re: Extracting important multi term phrases from the text

2018-11-16 Thread David Hastings
Which function of the SKG are you using? significantTerms? On Thu, Nov 15, 2018 at 7:09 PM Alexandre Rafalovitch wrote: > I think the underscore actually comes from the Shingles (parameter > fillerToken). Have you tried setting it to empty string? > > Regards, >Alex. > On Thu, 15 Nov 2018 a

Re: Extracting important multi term phrases from the text

2018-11-15 Thread Alexandre Rafalovitch
I think the underscore actually comes from the Shingles (parameter fillerToken). Have you tried setting it to empty string? Regards, Alex. On Thu, 15 Nov 2018 at 17:16, Pratik Patel wrote: > > Hi Markus, > > Thanks for the reply. I tried using ShingleFilter and it seems to > be working. Howeve
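To make the suggestion above concrete, here is a minimal Python sketch of why an underscore can show up in shingles once stop words are removed. It is a simplified model and not Lucene's implementation: the function name `bigrams_with_filler` and its arguments are hypothetical, and the only thing taken from the thread is the idea that the shingle step, not the stop word step, writes a filler token (default "_") into the gap a removed word leaves behind.

```python
def bigrams_with_filler(tokens, stopwords, filler_token="_", separator=" "):
    """Simplified model (assumption, not Lucene code): a removed stop word
    leaves a position gap, and the shingle step fills that gap with
    filler_token before joining adjacent positions into bigrams."""
    # Replace each stop word by the filler so the position is preserved.
    slots = [filler_token if t in stopwords else t for t in tokens]
    # Join every pair of adjacent positions into one shingle.
    return [separator.join(pair) for pair in zip(slots, slots[1:])]


if __name__ == "__main__":
    words = ["wizard", "of", "oz"]
    print(bigrams_with_filler(words, {"of"}))                   # default filler "_"
    print(bigrams_with_filler(words, {"of"}, filler_token=""))  # empty filler
```

In this model an empty filler removes the underscore but still leaves the separator on one side of the shingle, so some further cleanup (trimming or pattern replacement) may still be wanted.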

Re: Extracting important multi term phrases from the text

2018-11-15 Thread Walter Underwood
Removing StopFilter will > introduce noise, but you could work around it with SKG. Please let us know if > it works for you. > > Regards, > Markus > > -Original message- >> From:Pratik Patel >> Sent: Thursday 15th November 2018 23:16 >> To: solr-user@lu

RE: Extracting important multi term phrases from the text

2018-11-15 Thread Markus Jelsma
works for you. Regards, Markus -Original message- > From:Pratik Patel > Sent: Thursday 15th November 2018 23:16 > To: solr-user@lucene.apache.org > Subject: Re: Extracting important multi term phrases from the text > > Hi Markus, > > Thanks for

Re: Extracting important multi term phrases from the text

2018-11-15 Thread Pratik Patel
Hi Markus, Thanks for the reply. I tried using ShingleFilter and it seems to be working. However, I am hitting an issue when it is used with StopWordFilter. StopWordFilter leaves an underscore "_" for removed words and it kind of screws up the data in the index. I tried setting enablePositionIncremen

RE: Extracting important multi term phrases from the text

2018-11-15 Thread Markus Jelsma
Hello Pratik, We would use ShingleFilter for this indeed. If you only want bigrams/shingles, don't forget to disable outputUnigrams and set both shingle size limits to 2. Regards, Markus -Original message- > From:Pratik Patel > Sent: Thursday 15th November 2018 17:00 > To: solr-user@luc
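As a rough illustration of the settings Markus describes, the following Python sketch mimics what a shingle step produces when unigram output is disabled and both size limits are 2. It is an assumption-laden model, not the Lucene filter: the function `shingles` and its parameter names are hypothetical and only loosely mirror the outputUnigrams and min/max shingle size options mentioned above.

```python
def shingles(tokens, min_size=2, max_size=2, output_unigrams=False, separator=" "):
    """Hypothetical model of word shingling: emit every run of min_size..max_size
    adjacent tokens, optionally preceded by the single token itself."""
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for n in range(min_size, max_size + 1):
            # Only emit a shingle if n tokens are available starting at i.
            if i + n <= len(tokens):
                out.append(separator.join(tokens[i:i + n]))
    return out


if __name__ == "__main__":
    words = ["machine", "learning", "model"]
    print(shingles(words))                        # bigrams only
    print(shingles(words, output_unigrams=True))  # single terms mixed in
```

With unigrams disabled and both limits at 2, only two-word phrases are emitted, which is the bigram-only output described in the message above.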