Re: Range query on a substring.

Jack Krupansky Tue, 16 Jul 2013 15:44:07 -0700

Yeah, I was thinking about that.

But... will it properly order "10" as being greater than "9"? Usually, we use trie or sorted field types to ensure numeric order, but a text field doesn't have that feature.

Although I did think that maybe you could have a token filter that mapped numeric values to a fixed number of digits with leading zeros, and then they would be properly ordered. But, I don't think we have a token filter that can do that, although I imagine that a new one could be proposed.
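
To illustrate the leading-zeros idea, here is a minimal Python sketch. It is not a Solr token filter; the function name `pad_numbers` and the width of 10 digits are assumptions made only for this example. It pads every run of digits to a fixed width so that plain string comparison agrees with numeric order:

```python
import re

WIDTH = 10  # assumed maximum number of digits (hypothetical choice)

def pad_numbers(text, width=WIDTH):
    """Left-pad every run of digits in text with zeros to a fixed width."""
    return re.sub(r"\d+", lambda m: m.group(0).zfill(width), text)

# Plain string comparison orders "10" before "9"...
print("10" > "9")                            # False
# ...but the padded forms compare in numeric order.
print(pad_numbers("10") > pad_numbers("9"))  # True
print(pad_numbers("this is a text (42)"))    # this is a text (0000000042)
```

Numbers longer than the chosen width are left unpadded, so the width has to cover the largest value expected in the data.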


-- Jack Krupansky

-----Original Message----- From: Ahmet Arslan

Sent: Tuesday, July 16, 2013 6:33 PM
To: solr-user@lucene.apache.org
Subject: Re: Range query on a substring.

Hi Marcin,

Maybe you can use ComplexPhraseQueryParser (https://issues.apache.org/jira/browse/SOLR-1604). It supports ranges inside phrases.

________________________________
From: Marcin Rzewucki <mrzewu...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Wednesday, July 17, 2013 12:08 AM
Subject: Re: Range query on a substring.

Hi guys,

First of all, thanks for your response.

Jack: The data structure was created some time ago and this is a new
requirement in my project. I'm trying to find a solution. I wouldn't like
to split the multivalued field into N similar records varying in this
particular field only. That could impact performance and imply more changes
in the backend architecture as well. I'd prefer to create yet another
collection and use pseudo-joins...

Roman: Your ideas seem to be much closer to what I'm looking for. However,
the following syntax: "text (1|2|3)" does not work for me. Are you sure it
works like OR inside a regexp?
By the way: Honestly, I have one more requirement for which I would have to
extend the Solr query syntax. Basically, it should be possible to do some math
on a few fields and do a range query on the result (without indexing it,
because a combination of different fields is allowed). I'd like to spend
some time on ANTLR and the new way of parsing you mentioned. I will let you
know if it was useful for me. Thanks.

Kind regards.

On 16 July 2013 20:07, Roman Chyla <roman.ch...@gmail.com> wrote:

Well, I think this is slightly too categorical - a range query on a
substring can be thought of as a simple range query. So, for example the
following query:

"lucene 1*"

becomes behind the scenes: "lucene (10|11|12|13|14|1abcd)"

the issue there is that it is a string range, but it is a range query - it
just has to be indexed in a clever way

So, Marcin, you still have quite a few options besides the strict boolean
query model

1. have a special tokenizer chain which creates one token out of these
groups (eg. "some text prefix_1") and search for "some text prefix_*" [and
do some post-filtering if necessary]
2. another version, using regex /some text (1|2|3...)/ - you got the idea
3. construct the lucene multi-term range query automatically, in your
qparser - to produce a phrase query "lucene (10|11|12|13|14)"
4. use payloads to index your integer at the position of "some text" and
then retrieve only "some text" where the payload is in range x-y - an
example is here, look at getPayloadQuery():
https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/test/org/adsabs/lucene/BenchmarkAuthorSearch.java
- but this is a more complex situation and if you google, you will find a
better description
5. use a qparser that is able to handle nested search and analysis at the
same time - eg. your query is: field:"some text" NEAR1 field:[0 TO 10] - i
know about a parser that can handle this and i invite others to check it
out (yeah, JIRA tickets need reviewers ;-))
https://issues.apache.org/jira/browse/LUCENE-5014
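
As a rough illustration of the alternation idea in options 2 and 3, the following Python sketch expands an integer range into a group like (10|11|12|13|14) and matches it against whole field values. It uses Python's `re` module on plain strings, so it does not show how Lucene applies regexps to indexed terms; the helper names `range_alternation` and `matches_in_range` are hypothetical.

```python
import re

def range_alternation(low, high):
    """Return a regex group such as (10|11|12) covering low..high inclusive."""
    return "(" + "|".join(str(n) for n in range(low, high + 1)) + ")"

def matches_in_range(text, value, low, high):
    """True if value contains text followed by a bracketed number in [low, high]."""
    pattern = re.escape(text) + r" \(" + range_alternation(low, high) + r"\)"
    return re.search(pattern, value) is not None

print(range_alternation(10, 14))
# (10|11|12|13|14)
print(matches_in_range("this is a text", "this is a text (12)", 10, 14))
# True
print(matches_in_range("this is a text", "this is a text (120)", 10, 14))
# False
```

The alternation grows linearly with the size of the range, so this is only practical for fairly small ranges.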

there might be others i forgot, but it is certainly doable; but as Jack
points out, you may want to stop for a moment to reflect whether it is
necessary

HTH,

  roman

On Tue, Jul 16, 2013 at 8:35 AM, Jack Krupansky <j...@basetechnology.com> wrote:

> Sorry, but you are basically misusing Solr (and multivalued fields),
> trying to take a "shortcut" to avoid a proper data model.
>
> To properly use Solr, you need to put each of these multivalued field
> values in a separate Solr document, with a "text" field and a "value"
> field. Then, you can query:
>
>    text:"some text" AND value:[min-value TO max-value]
>
> Exactly how you should restructure your data model is dependent on all of
> your other requirements.
>
> You may be able to simply flatten your data.
>
> You may be able to use a simple join operation.
>
> Or, maybe you need to do a multi-step query operation if your data is
> sufficiently complex.
>
> If you want to keep your multivalued field in its current form for display
> purposes or keyword search, or exact match search, fine, but your stated
> goal is inconsistent with the semantics of Solr and Lucene.
>
> To be crystal clear, there is no such thing as "a range query on a
> substring" in Solr or Lucene.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Marcin Rzewucki
> Sent: Tuesday, July 16, 2013 5:13 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Range query on a substring.
>
>
> By multivalued I meant an array of values. For example:
> <arr name="myfield">
>  <str>text1 (X)</str>
>  <str>text2 (Y)</str>
> </arr>
>
> I'd like to avoid splitting it as you propose. I have a 2.3mn collection with
> pretty large records (a few hundred fields and more per record). Duplicating
> them would impact performance.
>
> Regards.
>
>
>
> On 16 July 2013 10:26, Oleg Burlaca <oburl...@gmail.com> wrote:
>
>  Ah, you mean something like this:
>> record:
>> Id=10, text = "this is a text N1 (X), another text N2 (Y), text N3 (Z)"
>> Id=11, text = "this is a text N1 (W), another text N2 (Q), third text (M)"
>>
>> and you need to search for: "text N1" and X < B ?
>> How big is the core? the first thing that comes to my mind, again, at
>> indexing level,
>> split the text into pieces and index it in solr like this:
>>
>> record_id | text    | value
>> 10        | text N1 | X
>> 10        | text N2 | Y
>> 10        | text N3 | Z
>>
>> does it help?
>>
>>
>>
>> On Tue, Jul 16, 2013 at 10:51 AM, Marcin Rzewucki <mrzewu...@gmail.com> wrote:
>>
>> > Hi Oleg,
>> > It's a multivalued field and it won't be easier to query when I split this
>> > field into text and numbers. I may get wrong results.
>> >
>> > Regards.
>> >
>> >
>> > On 16 July 2013 09:35, Oleg Burlaca <oburl...@gmail.com> wrote:
>> >
>> > > IMHO the number(s) should be extracted and stored in separate columns in
>> > > SOLR at indexing time.
>> > >
>> > > --
>> > > Oleg
>> > >
>> > >
>> > > On Tue, Jul 16, 2013 at 10:12 AM, Marcin Rzewucki <mrzewu...@gmail.com> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I have a problem (wonder if it is possible to solve it at all) with the
>> > > > following query. There are documents with a field which contains a text and
>> > > > a number in brackets, eg.
>> > > >
>> > > > myfield: this is a text (number)
>> > > >
>> > > > There might be some other documents with the same text but different number
>> > > > in brackets.
>> > > > I'd like to find documents with the given text say "this is a text" and
>> > > > "number" between A and B. Is it possible in Solr? Any ideas?
>> > > >
>> > > > Kind regards.
>> > > >
>> > >
>> >
>>
>>
>
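
For reference, the splitting approach discussed in the thread (one row per multivalued entry, with separate "text" and "value" fields) can be sketched in plain Python. This is only an illustration of the data model, not Solr code; the helper names `flatten` and `search` and the sample numbers are hypothetical, since the thread only uses placeholders such as X and Y.

```python
import re

# Matches entries of the form "some text (123)".
ENTRY = re.compile(r"^(?P<text>.*) \((?P<value>\d+)\)$")

def flatten(record_id, values):
    """Split each 'text (number)' entry into a (record_id, text, value) row."""
    rows = []
    for entry in values:
        m = ENTRY.match(entry)
        if m:  # entries that do not fit the pattern are skipped
            rows.append((record_id, m.group("text"), int(m.group("value"))))
    return rows

def search(rows, text, low, high):
    """Return ids of records with a row matching text and low <= value <= high."""
    return sorted({rid for rid, t, n in rows if t == text and low <= n <= high})

# Hypothetical sample data standing in for the placeholders in the thread.
rows = flatten(10, ["text N1 (5)", "text N2 (20)"]) + flatten(11, ["text N1 (15)"])
print(search(rows, "text N1", 1, 10))  # [10]
```

Because the text and the number of each entry stay in the same row, a match on "text N1" cannot be combined with a number that belongs to a different entry of the same record.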
