OK, you really have to get familiar with the admin/analysis page. WhitespaceTokenizer is really simple: it breaks the input up on whitespace and nothing else, so punctuation is kept in the index, which is very rarely what you want. Use something like StandardTokenizer, or maybe a filter that removes all non-alphanumeric characters (see one of the regex filters).
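
To make that concrete, here is a rough Python sketch (not Solr code) of what whitespace tokenization does to the sample sentence from the question, and what changes once non-alphanumeric characters are removed. The three-word stopword set and the helper names are assumptions for illustration only; the real chain uses lang/stopwords_pt.txt and Solr's own filters.

```python
import re
import unicodedata

# Assumption: a tiny stand-in for lang/stopwords_pt.txt
STOPWORDS = {"de", "em", "será"}


def fold(token):
    """Rough stand-in for LowerCaseFilter followed by ASCIIFoldingFilter."""
    decomposed = unicodedata.normalize("NFKD", token.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def whitespace_tokens(text):
    """Split on whitespace only, drop stopwords, then lowercase and fold."""
    return [fold(t) for t in text.split() if t.lower() not in STOPWORDS]


def stripped_tokens(text):
    """Same chain, plus removal of every non-alphanumeric character."""
    cleaned = (re.sub(r"[^0-9a-z]", "", t) for t in whitespace_tokens(text))
    return [t for t in cleaned if t]


text = "Teste de texto; Será quebrado em espaços em branco!"
print(whitespace_tokens(text))  # punctuation stays attached: 'texto;', 'branco!'
print(stripped_tokens(text))    # punctuation gone: 'texto', 'branco'
```

With whitespace-only tokenization the index holds the term "texto;" and no term "texto", which is why a query for the bare word inside a phrase finds nothing. Be aware that stripping every non-alphanumeric character would also alter e-mail addresses and words with colons, so if those must stay searchable a narrower pattern is probably the better fit.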

ComplexPhrase should do what you want, but if (and only if) you've indexed stuff appropriately. So I'd concentrate on getting the indexing to do what you need, then worry about querying. KeywordTokenizer is pretty much inappropriate for any kind of free-text search, it doesn't break the input up at _all_.

And you need to completely re-index all your docs when you change the schema. There are a _few_ cases where that's not necessary, but until you're very familiar with the nuances it's much safer just to re-index from scratch. It _will_ work to
> shut down Solr
> rm -r the_data_directory
> restart solr

That'll wipe everything out. If you're in Solr Cloud I'd recommend deleting and recreating the collection on schema change.

Best,
Erick

On Mon, Jun 27, 2016 at 2:21 PM, Felipe Vinturini <felipe.vintur...@gmail.com> wrote:
> Hi *all*!
>
> First time posting! I have been struggling with Solr v4.10.2 with a
> PhraseQuery with wildcard!
>
> My field definition is below:
> <!-- Search field -->
> <field name="title" type="text_pt_en" indexed="true" stored="true" />
> <!-- Field definition -->
> <fieldType name="text_pt_en" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <charFilter class="solr.HTMLStripCharFilterFactory" />
>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="lang/stopwords_pt.txt" format="snowball"
>             enablePositionIncrements="true" />
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <!-- <tokenizer class="solr.KeywordTokenizerFactory" /> -->
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false" />
>     <filter class="solr.ReversedWildcardFilterFactory" />
>   </analyzer>
>
>   <analyzer type="query">
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="lang/stopwords_pt.txt" format="snowball"
>             enablePositionIncrements="true" />
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <!-- <tokenizer class="solr.KeywordTokenizerFactory" /> -->
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false" />
>   </analyzer>
> </fieldType>
>
> Let's suppose I have the following value added to the index of the field
> above (portuguese):
>   Teste de texto; Será quebrado em espaços em branco!
>
> And the values added to the index, based on the analyzer chain will be
> (from Solr "Analysis"):
>   etset teste ;otxet texto; odarbeuq quebrado socapse espacos !ocnarb branco!
>
> Today, I can search, for example:
>   title:teste
>   title:(teste texto)
>   title:(teste de texto)
>   title:("teste de texto;") // (PhraseQuery) matches because of ";" in the end of the string
>
> But, if I try to search (PhraseQuery):
>   title:("teste de texto")
>     "parsedquery": "PhraseQuery(title:\"teste ? texto\")"
>   title:("teste de texto*")
>     "parsedquery": "PhraseQuery(title:\"teste ? texto*\")"
> No results are returned.
>
> I have read about possible solutions to this, but none of them seems to
> work:
>   MultitermQueryAnalysis
>   Complex Phrase Query Parser
>
> And I just can't understand why the query with the wildcard in the end: "*"
> does not work, no results are returned.
>
> Some comments:
> - I don't have control over what is entered in the search, I would like it
>   to work like a "file listing", like a "glob";
> - Today I can't change my tokenizer to: "StandardTokenizerFactory" (that in
>   this case would work), because I need to search for e-mails, words with
>   colon, for example;
> - I tried the: "KeywordTokenizer", but I have the same behavior as above;
> - I read about: "ShingleFilterFactory", but my index would be huge, because
>   I need to index full texts (with more than 30000 chars);
> - One person in stackoverflow pointed me to the documentation where it says
>   it is not possible to use a wildcard in a phrase query using the standard
>   query parser.
> - I tried to use the complexphrase: {!complexphrase}title:"teste de texto*",
>   but no results still. Am I doing something wrong? Is there anything wrong
>   with my schema analysis?
> - I could make it work using: "KeywordTokenizerFactory", but it only works
>   with "RegexpQuery": title:(/.*teste de texto.*/). Do I have other options?
>
> Could you please help me understand what happens, if there is a way to make
> a PhraseQuery with a wildcard work and what are my options?
>
> Please, let me know if you need further information and thanks a lot for
> your attention and help!
> *Felipe*.
>
> PS: I have added the same question to stackoverflow:
> http://stackoverflow.com/questions/38061980/solr-phrasequery-with-wildcard
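
The three phrase-query results reported in the quoted message can be reproduced with a toy position-based matcher. This is only a simplified Python model, not how Lucene implements PhraseQuery: the stopword set and the function names are assumptions, and the trailing "*" is deliberately treated as a literal character, mirroring the parsed queries shown in the question.

```python
import re

# Assumption: a tiny stand-in for lang/stopwords_pt.txt
STOPWORDS = {"de", "em", "será"}


def analyze(text, strip_punctuation=False):
    """Whitespace-split and lowercase; removed stopwords leave a position gap."""
    terms = []
    for position, raw in enumerate(text.split()):
        term = raw.lower()
        if term in STOPWORDS:
            continue  # the skipped position is the "?" in the parsed query
        if strip_punctuation:
            term = re.sub(r"\W", "", term)
        if term:
            terms.append((position, term))
    return terms


def phrase_matches(doc, phrase, strip_punctuation=False):
    """True if the phrase terms occur in doc at the same relative positions."""
    doc_terms = dict(analyze(doc, strip_punctuation))  # position -> term
    query = analyze(phrase, strip_punctuation)
    if not query:
        return False
    first = query[0][0]
    return any(
        all(doc_terms.get(start + pos - first) == term for pos, term in query)
        for start in doc_terms
    )


doc = "Teste de texto; Será quebrado em espaços em branco!"
print(phrase_matches(doc, "teste de texto;"))   # the ";" is part of the indexed term
print(phrase_matches(doc, "teste de texto"))    # "texto" != "texto;"
print(phrase_matches(doc, "teste de texto*"))   # "*" is literal here, so "texto*" != "texto;"
print(phrase_matches(doc, "teste de texto", strip_punctuation=True))
```

In this model the plain phrase only starts matching once the same punctuation handling is applied at both index and query time, which is the reason to sort out the analysis chain (and re-index) before experimenting with query parsers.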