Solr PhraseQuery With Wildcard

2016-06-27 Thread Felipe Vinturini
Hi all!

First time posting! I have been struggling with a wildcard inside a
PhraseQuery on Solr v4.10.2.

My field definition is below:

Suppose the following value (in Portuguese) is indexed in the field
above:
Teste de texto; Será quebrado em espaços em branco!

Based on the analyzer chain, the tokens added to the index are (from the
Solr "Analysis" page):
etset teste ;otxet texto; odarbeuq quebrado socapse espacos !ocnarb branco!

Today, I can search, for example:
title:teste
title:(teste texto)
title:(teste de texto)
title:("teste de texto;") // (PhraseQuery) matches because of the ";" at
the end of the string

But if I try either of these phrase queries:
title:("teste de texto")
"parsedquery": "PhraseQuery(title:\"teste ? texto\")"
title:("teste de texto*")
"parsedquery": "PhraseQuery(title:\"teste ? texto*\")"
no results are returned.
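
A rough way to see why the last two queries miss is to imitate the analysis in a few lines of Python. This is only a sketch: the chain (whitespace split, lowercase, stop-word removal, ASCII folding) is inferred from the Analysis output above rather than taken from the schema, the stop-word set is an assumption limited to the words that disappear from that output, and the reversed tokens are ignored.

```python
import unicodedata

# Assumption: only the words that vanish from the Analysis output above.
STOP_WORDS = {"de", "em", "será"}


def fold_ascii(token):
    """Strip diacritics, roughly what an ASCII-folding filter does."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Return (position, token) pairs; removed stop words leave a position gap."""
    result = []
    for position, raw in enumerate(text.split()):
        token = raw.lower()
        if token in STOP_WORDS:
            continue
        result.append((position, fold_ascii(token)))
    return result


def phrase_matches(indexed, query):
    """Exact phrase match: each query token must equal the indexed token
    at the same relative position (no wildcard expansion)."""
    by_position = dict(indexed)
    query_tokens = analyze(query)
    if not query_tokens:
        return False
    first_position, first_token = query_tokens[0]
    for start, token in indexed:
        if token != first_token:
            continue
        if all(by_position.get(start + (p - first_position)) == t
               for p, t in query_tokens):
            return True
    return False


indexed = analyze("Teste de texto; Será quebrado em espaços em branco!")
print(indexed)
print(phrase_matches(indexed, "teste de texto;"))
print(phrase_matches(indexed, "teste de texto"))
print(phrase_matches(indexed, "teste de texto*"))
```

Under these assumptions the indexed token at the third position is "texto;" (semicolon included), so neither "texto" nor the literal "texto*" equals it, while "texto;" does. The position gap left by "de" corresponds to the "?" in the parsed query.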

I have read about possible solutions to this, but none of them seems to
work:
MultitermQueryAnalysis
Complex Phrase Query Parser

And I just can't understand why the query with the trailing wildcard ("*")
does not work: no results are returned.
Some comments:
- I don't have control over what is entered in the search; I would like it
to work like a "file listing", like a "glob".
- Today I can't change my tokenizer to "StandardTokenizerFactory" (which
would work in this case), because I need to search for e-mail addresses
and words containing a colon, for example.
- I tried "KeywordTokenizer", but I get the same behavior as above.
- I read about "ShingleFilterFactory", but my index would be huge, because
I need to index full texts (with more than 3 chars).
- One person on Stack Overflow pointed me to the documentation, which says
it is not possible to use a wildcard in a phrase query with the standard
query parser.
I tried the complexphrase parser: {!complexphrase}title:"teste de texto*",
but still got no results. Am I doing something wrong? Is there anything
wrong with my schema analysis?
- I could make it work using "KeywordTokenizerFactory", but only with a
"RegexpQuery": title:(/.*teste de texto.*/). Do I have other options?
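
For the "glob" behavior combined with the RegexpQuery workaround, one option is to translate the user's glob into a regexp query before sending it. The helper below is a hypothetical sketch, not part of Solr: the function name is made up, and the set of characters it escapes is an assumption based on Lucene's regular-expression syntax (plus "/", which delimits the regexp in the query string).

```python
# Assumption: characters treated as special by Lucene's regexp syntax,
# plus "/" because it delimits a regexp in the query string.
RESERVED = set('.?+*|{}[]()"\\#@&<>~/')


def glob_to_regexp_query(field, glob, contains=False):
    """Turn a shell-style glob into a field:/regexp/ query string.

    "*" becomes ".*", "?" becomes ".", every other reserved character
    is backslash-escaped so that it matches literally.
    """
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        elif ch in RESERVED:
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    pattern = "".join(parts)
    if contains:
        # Match anywhere inside the single keyword token.
        pattern = ".*" + pattern + ".*"
    return f"{field}:/{pattern}/"


print(glob_to_regexp_query("title", "teste de texto*"))
print(glob_to_regexp_query("title", "teste de texto", contains=True))
```

With contains=True the second call produces the same pattern as the query above. Leading ".*" patterns can be slow on large indexes, so this is a workaround rather than a recommendation.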

Could you please help me understand what happens, if there is a way to make
a PhraseQuery with a wildcard work and what are my options?

Please, let me know if you need further information and thanks a lot for
your attention and help!
Felipe.

PS: I have added the same question to stackoverflow:
http://stackoverflow.com/questions/38061980/solr-phrasequery-with-wildcard


Re: Solr PhraseQuery With Wildcard

2016-06-28 Thread Felipe Vinturini
Hi Erick,

Thanks for your comments! In fact, I started with Solr one month ago, so I
am still learning! =)

I understand the differences between the Solr tokenizers, but there are so
many options that it takes some time to find the one that fits our needs.

I found a solution to my problem with the configuration below:
  
And search using the Complex Phrase Query Parser, like below, now returns
the desired document:
{!complexphrase df=title}"teste de texto*"
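
When sending this query over HTTP, the braces, quotes and asterisk need URL encoding. A minimal sketch with Python's standard library follows; the host, port and core name ("mycore") are placeholders, not values from my setup, and debugQuery is switched on only to see the parsed query.

```python
from urllib.parse import parse_qs, urlencode


def complexphrase_params(field, phrase):
    """Build an encoded query string for a complexphrase search."""
    local_params = f"{{!complexphrase df={field}}}"
    return urlencode({
        "q": f'{local_params}"{phrase}"',
        "debugQuery": "true",  # shows the parsed query in the response
    })


query_string = complexphrase_params("title", "teste de texto*")
# Placeholder host and core name.
url = "http://localhost:8983/solr/mycore/select?" + query_string
print(url)

# Decoding the string again gives back the original q parameter.
print(parse_qs(query_string)["q"][0])
```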

I think that the problem with my last field setup was the
StopFilterFactory, as the Complex Phrase Query Parser documentation states:
"It is recommended not to use stopword elimination with this query parser."
[1]

I've done some tests and, so far, this setup fits my needs (queries).

As I mentioned, I am new to Solr, so I would appreciate input from you and
the Solr community: is there a better or different way to achieve the
same result, and do you see any problem with the setup above?

Thanks a lot for your help!

Regards,
Felipe.

[1]
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser




On Tue, Jun 28, 2016 at 2:22 AM, Erick Erickson wrote:

> OK, you really have to get familiar with the
> admin/analysis page. Whitespace tokenizer
> is really simple, it breaks up on whitespace. So
> punctuation is kept in the index. Which is very
> rarely what you want. Use something like
> StandardTokenizer or maybe a filter that
> removes all non-alpha-num characters (
> see one of the regex filters).
>
> ComplexPhrase should do what you want, but if
> (and only if) you've indexed stuff appropriately. So
> I'd concentrate on getting the indexing to do
> what you need, then worry about querying.
>
> KeywordTokenizer is pretty much inappropriate for
> any kind of free-text search, it doesn't break the input
> up at _all_.
>
> And you need to completely re-index all your docs when
> you change the schema. There are a _few_ cases
> where that's not necessary, but until you're very
> familiar with the nuances it's much safer just
> to re-index from scratch. It _will_ work to
>     shut down Solr
>     rm -r the_data_directory
>     restart solr
>
> That'll wipe everything out. If you're in Solr Cloud
> I'd recommend deleting and recreating the collection
> on schema change.
>
> Best,
> Erick
>

Solr Date Query "Intraday"

2016-07-22 Thread Felipe Vinturini
Hi all,

Is there a way to query Solr for a date range and, within it, restrict the
results to an "intraday" window of hours on each day? Something like:
search field "text" for the value "test", with field "date" between
20160601 AND 20160610, but only between 1PM AND 4PM on each of those days.

I know I could loop over the dates; I would just like to know if there is
another way to do it in Solr. My Solr version is 4.10.2.
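
For reference, the loop over the dates could look like the sketch below. It builds one range clause per day and joins them with OR. The field name "date" comes from the example above; the function name is made up, and treating the hours as UTC is an assumption.

```python
from datetime import date, datetime, time, timedelta

SOLR_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def intraday_filter(field, first_day, last_day, start_hour, end_hour):
    """Return one range clause per day, joined with OR."""
    clauses = []
    day = first_day
    while day <= last_day:
        start = datetime.combine(day, time(start_hour))
        end = datetime.combine(day, time(end_hour))
        clauses.append(
            f"{field}:[{start.strftime(SOLR_DATE_FORMAT)} "
            f"TO {end.strftime(SOLR_DATE_FORMAT)}]"
        )
        day += timedelta(days=1)
    return " OR ".join(clauses)


# 1PM to 4PM on each day from 2016-06-01 to 2016-06-10 (hours assumed UTC).
fq = intraday_filter("date", date(2016, 6, 1), date(2016, 6, 10), 13, 16)
print("q=text:test")
print("fq=" + fq)
```

This works but the filter grows by one clause per day, which is why I am asking whether Solr has a more direct way.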

Also, is there a "name" for this kind of query?

Thanks a lot for your attention and help.

Regards,
Felipe.


Re: Using DIH FileListEntityProcessor with SolrCloud

2016-12-05 Thread Felipe Vinturini
Hi Chris,

I've never used the DIH, but maybe the "fileName" pattern is wrong?
 fileName=".*xml"

Should be:
 fileName="*.xml"
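
One caveat worth checking before changing it: the DIH documentation describes fileName as a regular expression rather than a shell glob, in which case the two patterns behave very differently. The sketch below compares them; Python's re module stands in for Java's regex engine here, which is an assumption, and the sample file name is made up.

```python
import fnmatch
import re


def is_valid_regex(pattern):
    """True if the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def regex_accepts(pattern, file_name):
    """True if the pattern is a valid regex and matches the file name."""
    return is_valid_regex(pattern) and re.search(pattern, file_name) is not None


sample = "record.xml"  # made-up file name
print(regex_accepts(".*xml", sample))    # pattern read as a regex
print(is_valid_regex("*.xml"))           # leading "*" has nothing to repeat
print(fnmatch.fnmatch(sample, "*.xml"))  # same pattern read as a glob
```

So if the attribute is read as a regex, the original ".*xml" is the valid form and "*.xml" would only work under glob semantics.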

Regards,
Felipe.


On Mon, Dec 5, 2016 at 9:43 AM, Chris Rogers wrote:

> Hi all,
>
> Just bumping my question again, as doesn’t seem to have been picked up by
> anyone. Any help would be much appreciated.
>
> Chris
>
> On 02/12/2016, 16:36, "Chris Rogers" wrote:
>
> Hi all,
>
> A question regarding using the DIH FileListEntityProcessor with
> SolrCloud (solr 6.3.0, zookeeper 3.4.8).
>
> I get that the config in SolrCloud lives on the Zookeeper node (a
> different server from the solr nodes in my setup).
>
> With this in mind, where is the baseDir attribute in the
> FileListEntityProcessor config relative to? I’m seeing the config in the
> Solr GUI, and I’ve tried setting it as an absolute path on my Zookeeper
> server, but this doesn’t seem to work… any ideas how this should be setup?
>
> My DIH config is below:
>
> 
>   
>   
> 
>  fileName=".*xml"
> newerThan="'NOW-5YEARS'"
> recursive="true"
> rootEntity="false"
> dataSource="null"
> baseDir="/home/bodl-zoo-svc/files/">
>
>   
>
>  forEach="/TEI" url="${f.fileAbsolutePath}"
> transformer="RegexTransformer" >
>  xpath="/TEI/teiHeader/fileDesc/titleStmt/title"/>
>  xpath="/TEI/teiHeader/fileDesc/publicationStmt/publisher"/>
> 
>   
>
> 
>
>   
> 
>
>
> This same script worked as expected on a single solr node (i.e. not in
> SolrCloud mode).
>
> Thanks,
> Chris
>
> --
> Chris Rogers
> Digital Projects Manager
> Bodleian Digital Library Systems and Services
> chris.rog...@bodleian.ox.ac.uk
>
>
>