Solr PhraseQuery With Wildcard
Hi *all*!

First time posting! I have been struggling with Solr v4.10.2 with a PhraseQuery with a wildcard!

My field definition is below:

Let's suppose I have the following value added to the index of the field above (Portuguese):
Teste de texto; Será quebrado em espaços em branco!

And the values added to the index, based on the analyzer chain, will be (from the Solr "Analysis" page):
etset teste ;otxet texto; odarbeuq quebrado socapse espacos !ocnarb branco!

Today, I can search, for example:
title:teste
title:(teste texto)
title:(teste de texto)
title:("teste de texto;") // (PhraseQuery) matches because of the ";" at the end of the string

But if I try to search (PhraseQuery):
title:("teste de texto")
"parsedquery": "PhraseQuery(title:\"teste ? texto\")"
title:("teste de texto*")
"parsedquery": "PhraseQuery(title:\"teste ? texto*\")"
No results are returned.

I have read about possible solutions to this, but none of them seems to work:
MultitermQueryAnalysis
Complex Phrase Query Parser

And I just can't understand why the query with the wildcard "*" at the end does not work; no results are returned.

Some comments:
- I don't have control over what is entered in the search; I would like it to work like a "file listing", like a "glob";
- Today I can't change my tokenizer to "StandardTokenizerFactory" (which in this case would work), because I need to search for e-mails and words with a colon, for example;
- I tried the "KeywordTokenizer", but I get the same behavior as above;
- I read about "ShingleFilterFactory", but my index would be huge, because I need to index full texts (with more than 3 chars);
- One person on Stack Overflow pointed me to the documentation where it says it is not possible to use a wildcard in a phrase query using the standard query parser. I tried to use complexphrase: {!complexphrase}title:"teste de texto*", but still got no results. Am I doing something wrong? Is there anything wrong with my schema analysis?
- I could make it work using "KeywordTokenizerFactory", but it only works with a "RegexpQuery": title:(/.*teste de texto.*/). Do I have other options?

Could you please help me understand what happens, whether there is a way to make a PhraseQuery with a wildcard work, and what my options are?

Please let me know if you need further information, and thanks a lot for your attention and help!

*Felipe*.

PS: I have added the same question to Stack Overflow: http://stackoverflow.com/questions/38061980/solr-phrasequery-with-wildcard
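The tokens listed in the post suggest why the phrase without the ";" finds nothing. Below is a minimal sketch in plain Python, not Solr, that imitates the chain as it appears from the Analysis output: whitespace splitting, lowercasing, accent folding and stopword removal. The stopword set is a tiny assumed subset of stopwords_pt.txt, the function names are hypothetical, and the reversed tokens (etset, ;otxet, ...) are left out.

```python
import unicodedata

# Assumed subset of the Portuguese stopword list, stored accent-folded.
STOPWORDS = {"de", "em", "sera"}


def fold_accents(token):
    """Strip diacritics, roughly what an accent-folding filter would do."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Whitespace split, lowercase, fold accents, drop stopwords."""
    tokens = []
    for raw in text.split():
        token = fold_accents(raw.lower())
        if token not in STOPWORDS:
            tokens.append(token)
    return tokens


indexed = analyze("Teste de texto; Será quebrado em espaços em branco!")
print(indexed)  # ['teste', 'texto;', 'quebrado', 'espacos', 'branco!']

# The phrase "teste de texto" needs a token that is exactly "texto",
# but splitting on whitespace keeps the punctuation attached.
print("texto" in indexed)   # False
print("texto;" in indexed)  # True
```

Under these assumptions the index holds "texto;" and never "texto", which is consistent with title:("teste de texto;") matching while title:("teste de texto") does not. The parsedquery output in the post also shows the standard parser keeping "texto*" as a literal phrase term rather than expanding the wildcard.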
Re: Solr PhraseQuery With Wildcard
Hi Erick,

Thanks for your comments! In fact, I started with Solr one month ago, so I am still learning! =)

I understand the differences between the Solr tokenizers, but there are so many options that it takes some time to find the one that fits our needs.

I found a solution to my problem with the configuration below:

And a search using the Complex Phrase Query Parser, like the one below, now returns the desired document:
{!complexphrase df=title}"teste de texto*"

I think that the problem with my last field setup was the StopFilterFactory, as the Complex Phrase Query Parser documentation states: "It is recommended not to use stopword elimination with this query parser." [1]

I've done some tests and, so far, this setup fits my needs (queries). As I commented, I am new to Solr, so I would like your input, and the Solr community's, on whether there is a better or different way to achieve the same result, or whether you see any problem with the setup above.

Thanks a lot for your help!

Regards,
Felipe.

[1] https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser

On Tue, Jun 28, 2016 at 2:22 AM, Erick Erickson wrote:
> OK, you really have to get familiar with the
> admin/analysis page. Whitespace tokenizer
> is really simple, it breaks up on whitespace. So
> punctuation is kept in the index. Which is very
> rarely what you want. Use something like
> StandardTokenizer or maybe a filter that
> removes all non-alpha-num characters (
> see one of the regex filters).
>
> ComplexPhrase should do what you want, but if
> (and only if) you've indexed stuff appropriately. So
> I'd concentrate on getting the indexing to do
> what you need, then worry about querying.
>
> KeywordTokenizer is pretty much inappropriate for
> any kind of free-text search, it doesn't break the input
> up at _all_.
>
> And you need to completely re-index all your docs when
> you change the schema.
> There are a _few_ cases
> where that's not necessary, but until you're very
> familiar with the nuances it's much safer just
> to re-index from scratch. It _will_ work to
>
> shut down Solr
> rm -r the_data_directory
> restart solr
>
> That'll wipe everything out. If you're in Solr Cloud
> I'd recommend deleting and recreating the collection
> on schema change.
>
> Best,
> Erick
>
> On Mon, Jun 27, 2016 at 2:21 PM, Felipe Vinturini wrote:
> > Hi *all*!
> >
> > First time posting! I have been struggling with Solr v4.10.2 with a
> > PhraseQuery with wildcard!
> >
> > My field definition is below:
> >
> > positionIncrementGap="100">
> > words="lang/stopwords_pt.txt" format="snowball"
> > enablePositionIncrements="true" />
> > />
> > words="lang/stopwords_pt.txt" format="snowball"
> > enablePositionIncrements="true" />
> > />
> >
> > Let's suppose I have the following value added to the index of the field
> > above (portuguese):
> > Teste de texto; Será quebrado em espaços em branco!
> >
> > And the values added to the index, based on the analyzer chain will be
> > (from Solr "Analysis"):
> > etset teste ;otxet texto; odarbeuq quebrado socapse espacos !ocnarb branco!
> >
> > Today, I can search, for example:
> > title:teste
> > title:(teste texto)
> > title:(teste de texto)
> > title:("teste de texto;") // (PhraseQuery) matches because of ";" in the
> > end of the string
> >
> > But, if I try to search (PhraseQuery):
> > title:("teste de texto")
> > "parsedquery": "PhraseQuery(title:\"teste ? texto\")"
> > title:("teste de texto*")
> > "parsedquery": "PhraseQuery(title:\"teste ? texto*\")"
> > No results are returned.
> >
> > I have read about possible solutions to this, but none of them seems to
> > work:
> > MultitermQueryAnalysis
> > Complex Phrase Query Parser
> >
> > And I just can't understand why the query with the wildcard in the end: "*"
> > does not work, no results are returned.
> >
> > Some comments:
> > - I don't have control over what is entered in the search, I would like it
> > to work like a "file listing", like a "glob";
> > - Today I can't change my tokenizer to: "StandardTokenizerFactory"
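For anyone sending the working query from this thread over HTTP, the local-params prefix and the quotes need URL-encoding. Here is a small sketch, assuming a Solr select handler at a placeholder address; the host, port and core name are made up, and the helper name is hypothetical. It does not escape quotes inside the phrase.

```python
from urllib.parse import urlencode, parse_qs

# Placeholder endpoint: host, port and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def complexphrase_url(field, phrase):
    """Build a select URL using the complexphrase parser via local params."""
    # Same shape as the query in the thread: {!complexphrase df=title}"..."
    q = '{!complexphrase df=%s}"%s"' % (field, phrase)
    return SOLR_SELECT + "?" + urlencode({"q": q, "wt": "json"})


url = complexphrase_url("title", "teste de texto*")
print(url)

# Decoding the query string again gives back the original q parameter.
decoded_q = parse_qs(url.split("?", 1)[1])["q"][0]
print(decoded_q)  # {!complexphrase df=title}"teste de texto*"
```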
Solr Date Query "Intraday"
Hi all,

Is there a way to query Solr between dates and also "intraday", that is, only between certain hours on those days? Something like: I want to search the field "text" for the value "test", with the field "date" between 20160601 AND 20160610, and only between the hours 1PM AND 4PM on those days.

I know I could loop over the dates; I would just like to know if there is another way to do it in Solr. My Solr version is 4.10.2.

Also, is there a "name" for this kind of query?

Thanks a lot for your attention and help.

Regards,
Felipe.
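For reference, the "loop over the dates" fallback mentioned in the question could look like the sketch below. It assumes that "date" is a Solr date field holding UTC timestamps (the question writes the days as 20160601, so the real field type may differ), and the helper name is hypothetical. It builds one range clause per day and joins them with OR.

```python
from datetime import date, datetime, time, timedelta

SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # Solr date syntax, times assumed to be UTC


def intraday_ranges(field, first_day, last_day, start, end):
    """Return one [start TO end] range clause per day, joined with OR."""
    clauses = []
    day = first_day
    while day <= last_day:
        lo = datetime.combine(day, start).strftime(SOLR_FORMAT)
        hi = datetime.combine(day, end).strftime(SOLR_FORMAT)
        clauses.append("%s:[%s TO %s]" % (field, lo, hi))
        day += timedelta(days=1)
    return " OR ".join(clauses)


# 1PM to 4PM on each day from 2016-06-01 to 2016-06-10.
fq = intraday_ranges("date", date(2016, 6, 1), date(2016, 6, 10),
                     time(13), time(16))
print(fq)
```

The resulting string could be sent as a filter query next to q=text:test. The clause count grows with the number of days, which is presumably why the question asks for an alternative.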
Re: Using DIH FileListEntityProcessor with SolrCloud
Hi *Chris*,

I've never used the DIH, but maybe the "*fileName*" pattern is wrong?
fileName=".*xml"
Should be:
fileName="*.xml"

Regards,
*Felipe*.

On Mon, Dec 5, 2016 at 9:43 AM, Chris Rogers wrote:
> Hi all,
>
> Just bumping my question again, as doesn’t seem to have been picked up by
> anyone. Any help would be much appreciated.
>
> Chris
>
> On 02/12/2016, 16:36, "Chris Rogers" wrote:
>
> Hi all,
>
> A question regarding using the DIH FileListEntityProcessor with
> SolrCloud (solr 6.3.0, zookeeper 3.4.8).
>
> I get that the config in SolrCloud lives on the Zookeeper node (a
> different server from the solr nodes in my setup).
>
> With this in mind, where is the baseDir attribute in the
> FileListEntityProcessor config relative to? I’m seeing the config in the
> Solr GUI, and I’ve tried setting it as an absolute path on my Zookeeper
> server, but this doesn’t seem to work… any ideas how this should be setup?
>
> My DIH config is below:
>
> fileName=".*xml"
> newerThan="'NOW-5YEARS'"
> recursive="true"
> rootEntity="false"
> dataSource="null"
> baseDir="/home/bodl-zoo-svc/files/">
>
> forEach="/TEI" url="${f.fileAbsolutePath}"
> transformer="RegexTransformer" >
> xpath="/TEI/teiHeader/fileDesc/titleStmt/title"/>
> xpath="/TEI/teiHeader/fileDesc/publicationStmt/publisher"/>
>
> This same script worked as expected on a single solr node (i.e. not in
> SolrCloud mode).
>
> Thanks,
> Chris
>
> --
> Chris Rogers
> Digital Projects Manager
> Bodleian Digital Library Systems and Services
> chris.rog...@bodleian.ox.ac.uk
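Which of the two fileName values is appropriate depends on whether the attribute is read as a regular expression or as a shell-style glob. The sketch below compares the two readings, using Python's re and fnmatch modules as stand-ins. It assumes the attribute is treated as a regular expression, as the DIH documentation describes for FileListEntityProcessor, and the file names are invented.

```python
import fnmatch
import re

names = ["a.xml", "b.XML", "notes.txt"]  # invented sample file names

# Regex reading of ".*xml": any characters followed by "xml".
regex = re.compile(r".*xml")
regex_hits = [n for n in names if regex.fullmatch(n)]
print(regex_hits)  # ['a.xml']

# Glob reading of "*.xml": anything ending in ".xml" (case-sensitive here).
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "*.xml")]
print(glob_hits)  # ['a.xml']

# The glob "*.xml" is not a valid regular expression: a leading "*"
# has nothing to repeat, so compiling it fails.
try:
    re.compile("*.xml")
    glob_compiles = True
except re.error:
    glob_compiles = False
print(glob_compiles)  # False
```

Java's regex engine also rejects a pattern that starts with "*", so under the regex reading the original ".*xml" is the form that compiles.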