RE: Solr sort preferences number vs space vs character

2016-03-14 Thread Andrew Chillrud
No experience with this personally, but it seems like you are describing 
https://cwiki.apache.org/confluence/display/solr/Language+Analysis#LanguageAnalysis-UnicodeCollation

- Andy -

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Monday, March 14, 2016 10:51 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr sort preferences number vs space vs character

On 3/14/2016 12:05 AM, vkrishna wrote:
> Hey Shawn,
>
> Is there any way to use ASCII so I can get the result I want?

I do not know whether Solr has any config facility to incorporate a custom 
Lucene sorting class.  I tried to look at the Lucene code to see if I could 
figure out how/where the sorting happens, but I couldn't decipher it.

ASCII wouldn't give you the result you want, though -- it sorts numbers before 
letters, and you want them after.  You would likely need some VERY custom 
Lucene sort code ... but like I said above, I do not know if Solr has a way to 
plug that in.
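
For reference, here is a small Python sketch (not Solr or Lucene code) of what a plain ASCII / code-point ordering does. The sample values are made up for illustration; the only point is that a space sorts before digits, digits before uppercase letters, and uppercase before lowercase.

```python
# Plain code-point (ASCII) ordering, which is what Python's default
# string comparison uses. The sample values are hypothetical.
values = ["ab", "a1", "a b", "A1"]

# Space (0x20) < digits (0x30-0x39) < uppercase (0x41-0x5A) < lowercase (0x61-0x7A)
ascii_sorted = sorted(values)
print(ascii_sorted)
```

Any other ordering between those character classes would need a custom comparison rather than raw code points.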

Thanks,
Shawn



RE: Solr sort preferences number vs space vs character

2016-03-14 Thread Andrew Chillrud
Are you sorting against an untokenized field (either defined using the 'string' 
fieldType or a fieldType that is configured with KeywordTokenizerFactory)?

Solr will let you sort against a tokenized field. Not sure what happens 
internally when you do this, but the results will not be what you expect.

- Andy -

-Original Message-
From: vkrishna [mailto:vamsikrishna_t...@yahoo.com] 
Sent: Monday, March 14, 2016 1:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr sort preferences number vs space vs character

Shawn, I think you saw my required result order in my previous update (which is 
different from what I asked first): space > number > character. Sorry for the 
confusion.

Thanks,
Krishna.

On Mon, 3/14/16, Shawn Heisey-2 [via Lucene] 
 wrote:

 Subject: Re: Solr sort preferences number vs space vs character
 To: "vkrishna" 
 Date: Monday, March 14, 2016, 9:58 AM
 
 
 
 On 3/14/2016 10:28 AM, vkrishna wrote:
 > I completely forgot to mention that this kind of sorting works fine in 
 > version 1.4; we are now upgrading to 5.4. I know Solr made many changes 
 > in between, because it's been years. Do you know when, and in which version, 
 > they changed the sorting?
 
 Absolutely no idea.
 
 I would be *very* surprised to learn that numbers were sorted after letters 
 in ANY version of Solr/Lucene.  If I did see that, I think I would be looking 
 to file a bug.
 
 Thanks,
 Shawn
 
 
 
 
 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-sort-preferences-number-vs-space-vs-character-tp4263527p4263691.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Multilevel nested level support using Solr

2015-04-20 Thread Andrew Chillrud
Don't know if this is what you are looking for, but we had a similar 
requirement. In our case each folder had a unique identifier associated with it.

When generating the Solr input document our code populated 2 fields, 
parent_folder, and folder_hierarchy (multi-valued), and for a document in the 
root->foo->bar folder added:

parent_folder:
folder_hierarchy:
folder_hierarchy:
folder_hierarchy:

At search time, if you wanted to restrict your search within the folder 'bar' 
we generated a filter query for either 'parent_folder:' or 
'folder_hierarchy:' depending on whether you wanted only 
documents directly under the 'bar' folder (your case 3), or at any level 
underneath 'bar' (your case 1).

If your folders don't have unique identifiers then you could achieve something 
similar by indexing the folder paths in string fields:

parent_folder:root|foo|bar
folder_hierarchy:root|foo|bar
folder_hierarchy:root|foo
folder_hierarchy:root

and generating a fq for either 'parent_folder:root|foo|bar' or 
'folder_hierarchy:root|foo|bar'
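
A minimal Python sketch of the client-side approach described above, assuming '|' as the path separator. The helper names (folder_fields, folder_fq) are hypothetical and not part of Solr or any client library; quoting the value in the filter query is my own precaution so the query parser treats the path as a single term.

```python
def folder_fields(path, sep="|"):
    """Return the parent_folder value and folder_hierarchy values for a path."""
    parts = path.split(sep)
    # Every ancestor prefix, deepest first: root|foo|bar, root|foo, root
    hierarchy = [sep.join(parts[:i]) for i in range(len(parts), 0, -1)]
    return {"parent_folder": path, "folder_hierarchy": hierarchy}


def folder_fq(path, include_subfolders):
    """Build a filter query string restricting results to a folder."""
    # folder_hierarchy matches documents at any depth below the folder;
    # parent_folder matches only documents directly inside it.
    field = "folder_hierarchy" if include_subfolders else "parent_folder"
    return f'{field}:"{path}"'


doc_fields = folder_fields("root|foo|bar")
direct_fq = folder_fq("root|foo|bar", include_subfolders=False)
subtree_fq = folder_fq("root|foo|bar", include_subfolders=True)
print(doc_fields)
print(direct_fq)
print(subtree_fq)
```

The same two helpers would cover the unique-identifier variant if the ids were substituted for the path strings.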

If you didn't want to have to generate all the permutations for the 
folder_hierarchy field before sending the document to Solr for indexing you 
should be able to do something like:

  

  


  

  

   
  

  

In which case you could just send in the 'parent_folder' field and Solr would 
generate the folder_hierarchy field.

For cases 2 and 4 you could do something similar by adding 2 additional fields 
that just index the folder names instead of the paths.

- Andy -

-Original Message-
From: Steven White [mailto:swhite4...@gmail.com] 
Sent: Monday, April 20, 2015 9:49 AM
To: solr-user@lucene.apache.org
Subject: Re: Multilevel nested level support using Solr

Resending to see if anyone can help.  Thanks

Steve

On Fri, Apr 17, 2015 at 12:14 PM, Steven White  wrote:

> Hi folks,
>
> In my DB, my records are nested in a folder-based hierarchy:
>
> 
> 
> record_1
> record_2
> 
> record_3
> record_4
> 
> record_5
> 
> 
> 
> record_6
> record_7
> record_8
>
> You got the idea.
>
> Is there anything in Solr that will let me preserve this structure, so 
> that when I'm searching I can tell it at which level to narrow down the 
> search?  I have four search-level needs:
>
> 1) Be able to search inside only level: ...* 
> (and everything under Level_2 from this path).
>
> 2) Be able to search inside a level regardless of its path: .* 
> (no matter where  is, I want to search all records under 
> Level_2 and everything under its path).
>
> 3) Same as #1 but limit the search to within that level (nothing below 
> its level is searched).
>
> 4) Same as #2 but limit the search to within that level (nothing below 
> its level is searched).
>
> I found this:
> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments
> but it looks like it supports one level only and requires that both 
> levels be updated even if only one of the docs in the nest is updated.
>
> Thanks
>
> Steve
>


RE: Solr query which return only those docs whose all tokens are from given list

2015-05-11 Thread Andrew Chillrud
Based on his example, it sounds like Naresh not only wants the tags field to 
contain at least one of the values [T1, T2, T3] but also wants to exclude 
documents that contain a tag other than T1, T2, or T3 (Doc3 should not be 
retrieved).
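
To make the intended semantics concrete, here is a small Python sketch (plain set logic, not a Solr query) applied to the four documents from Naresh's example: a document qualifies only if it has at least one tag and every one of its tags is in the allowed list.

```python
allowed = {"T1", "T2", "T3"}

# Documents and tags exactly as given in the original question.
docs = {
    "Doc1": ["T1", "T2"],
    "Doc2": ["T1", "T3"],
    "Doc3": ["T1", "T4"],
    "Doc4": ["T1", "T2", "T3"],
}

# Keep a doc only if it has tags and all of them are in the allowed set.
matching = [name for name, tags in docs.items() if tags and set(tags) <= allowed]
print(matching)
```

Doc3 drops out because of T4, which is the behaviour a simple OR query would not give.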

If the set of possible values in the tags field is limited and known, you could 
use a NOT (or '-') clause to accomplish this. If there were 5 possible tag 
values:

tags:(( T1 OR T2 OR T3) NOT (T4 OR T5))

However this doesn't seem practical if the number of possible values is large 
or unlimited. Perhaps something could be done with range queries:

tags:(( T1 OR T2 OR T3) NOT ([* TO T1} OR {T1 TO T2} OR {T2 TO T3} OR {T3 TO *]))

however this would require whatever is constructing the query to be aware of 
the lexical ordering of the terms in the index. Maybe there are more elegant 
solutions, but I am not aware of them.
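
As a rough sketch of how the range-query idea could be generated, here is a hypothetical Python helper (only_these_terms_query is my own name, not a Solr API). It assumes the allowed terms need no escaping and that Python's code-point sort order matches the term order in the index, which I have not verified for every field type.

```python
def only_these_terms_query(field, allowed):
    """Build a query matching docs whose terms all come from `allowed`."""
    terms = sorted(set(allowed))
    if not terms:
        raise ValueError("at least one allowed term is required")
    include = " OR ".join(terms)
    # Exclusive ranges covering everything before, between and after
    # the allowed terms; any term falling in a gap disqualifies the doc.
    gaps = [f"[* TO {terms[0]}}}"]
    gaps += [f"{{{a} TO {b}}}" for a, b in zip(terms, terms[1:])]
    gaps.append(f"{{{terms[-1]} TO *]")
    return f"{field}:(({include}) NOT ({' OR '.join(gaps)}))"


range_query = only_these_terms_query("tags", ["T1", "T2", "T3"])
print(range_query)
```

Note that every adjacent pair of allowed terms needs its own gap range, so the query grows linearly with the size of the allowed list.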

- Andy -

-Original Message-
From: sujitatgt...@gmail.com [mailto:sujitatgt...@gmail.com] On Behalf Of Sujit 
Pal
Sent: Monday, May 11, 2015 10:40 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr query which return only those docs whose all tokens are from 
given list

Hi Naresh,

Couldn't you just model this as an OR query, since your requirement is at 
least one (but can be more than one), i.e.:

tags:T1 tags:T2 tags:T3

-sujit


On Mon, May 11, 2015 at 4:14 AM, Naresh Yadav  wrote:

> Hi all,
>
> Also asked this here : http://stackoverflow.com/questions/30166116
>
> For example, I have Solr docs in which the tags field is indexed:
>
> Doc1 -> tags:T1 T2
>
> Doc2 -> tags:T1 T3
>
> Doc3 -> tags:T1 T4
>
> Doc4 -> tags:T1 T2 T3
>
> Query1 : get all docs with "tags:T1 AND tags:T3" then it works and 
> will give Doc2 and Doc4
>
> Query2 : get all docs whose tags must be one of these [T1, T2, T3] 
> Expected is : Doc1, Doc2, Doc4
>
> How do I model Query2 in Solr? Please help me with this.
>


RE: Solr Exact match boost Reduce the results

2015-06-12 Thread Andrew Chillrud
If I understand you correctly you want to boost the score of documents where 
the contents of the product_name field match exactly (other than case) the 
query string.

I think what you need is for the dummy_name field to be non-tokenized (indexed 
as a single string rather than parsed into individual words). The name of the 
field type you have configured the dummy_name field to use (string_ci) would 
seem to indicate this is your intent. However, the definition of string_ci 
doesn't match the name. It is configured to use the WhitespaceTokenizerFactory 
tokenizer, which will break the contents of the field up into multiple tokens 
wherever whitespace occurs.
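
To illustrate the difference, here is a rough Python emulation (not Solr code) of the two behaviours. The function names and the sample product name are made up, and the lowercasing step is my assumption based on the "other than case" requirement.

```python
def whitespace_tokens(text):
    """Emulate whitespace tokenization plus lowercasing: one token per word."""
    return text.lower().split()


def keyword_token(text):
    """Emulate keyword tokenization plus lowercasing: the whole value is one token."""
    return [text.lower()]


name = "Red Running Shoes"   # hypothetical field value
query = "red running shoes"  # hypothetical user query

print(whitespace_tokens(name))  # three separate tokens
print(keyword_token(name))      # one token for the entire value

# With a single token, only a query equal to the whole value (ignoring case) matches.
exact_match = keyword_token(query) == keyword_token(name)
partial_match = keyword_token("red shoes") == keyword_token(name)
```

With the whitespace version, a query for just "shoes" would match one of the tokens, so the field cannot tell an exact match from a partial one.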

Try defining string_ci using the (somewhat cryptically named) 
KeywordTokenizerFactory, which will index the entire contents of the field as a 
single token. Something like:


  


  


- Andy -

-Original Message-
From: JACK [mailto:mfal...@gmail.com] 
Sent: Friday, June 12, 2015 12:54 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Exact match boost Reduce the results

As explained above, I actually have around 10 lakh (one million) records, not 5 
rows. It's not about synonyms. When I checked the FAQ page of the Solr wiki, I 
found that to get exact-match results first, you should use a copy field with a 
different configuration. That's why I did it this way. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Exact-match-boost-Reduce-the-results-tp4211352p4211434.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: [E] Re: Faceting Question(s)

2016-06-02 Thread Andrew Chillrud
It is possible to get the original facet counts for the field you are filtering 
on (we have been using this since Solr 3.6). Don't know if this can be extended 
to get the original counts for all fields however. 

This syntax is described here: 
https://cwiki.apache.org/confluence/display/solr/Faceting

Tagging and Excluding Filters

You can tag specific filters and exclude those filters when faceting. This is 
useful when doing multi-select faceting.

Consider the following example query with faceting:

q=mainquery&fq=status:public&fq=doctype:pdf&facet=true&facet.field=doctype

Because everything is already constrained by the filter doctype:pdf, the 
facet.field=doctype facet command is currently redundant and will return 0 
counts for everything except doctype:pdf.

To implement a multi-select facet for doctype, a GUI may want to still display 
the other doctype values and their associated counts, as if the doctype:pdf 
constraint had not yet been applied. For example:
=== Document Type ===
  [ ] Word (42)
  [x] PDF  (96)
  [ ] Excel (11)
  [ ] HTML (63)

To return counts for doctype values that are currently not selected, tag 
filters that directly constrain doctype, and exclude those filters when 
faceting on doctype.

q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=true&facet.field={!ex=dt}doctype

Filter exclusion is supported for all types of facets. Both the tag and ex 
local parameters may specify multiple values by separating them with commas.
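
A minimal Python sketch of assembling the tagged request above on the client side, using only the standard library. The helper name multiselect_facet_params is hypothetical; the parameter values are taken from the example in the documentation excerpt.

```python
from urllib.parse import urlencode, parse_qsl


def multiselect_facet_params(main_query, field, selected_value, tag="dt"):
    """Build request parameters that filter on a field but facet as if unfiltered."""
    return [
        ("q", main_query),
        ("fq", "status:public"),
        # Tag the filter that constrains the faceted field...
        ("fq", f"{{!tag={tag}}}{field}:{selected_value}"),
        ("facet", "true"),
        # ...and exclude that tagged filter when computing the facet counts.
        ("facet.field", f"{{!ex={tag}}}{field}"),
    ]


params = multiselect_facet_params("mainquery", "doctype", "pdf")
query_string = urlencode(params)  # percent-encodes the braces, '!' and ':'
print(query_string)
```

Passing the parameters as a list of pairs rather than a dict keeps both fq entries, since a dict would silently drop the repeated key.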

- Andy -

-Original Message-
From: Robert Brown [mailto:r...@intelcompute.com] 
Sent: Thursday, June 02, 2016 2:12 PM
To: solr-user@lucene.apache.org
Subject: Re: [E] Re: Faceting Question(s)

MaryJo, I think you've misunderstood.  The counts are different simply because 
the 2nd query contains a filter on a facet value from the 1st query - that's 
completely expected.

The issue is how to get the original facet counts (with no filters but same q) 
in the same call as also filtering by one of those facet values.

Personally I don't think it's possible, but I'd be interested to hear others' 
input, since it's a very common situation for me - I cache the first result in 
memcached and tag future queries as related to the first.

Or you could always make 2 calls back to Solr (one for the original query again, 
and one with the filters); the caches should help massively.



On 02/06/16 19:07, MaryJo Sminkey wrote:
> And you're saying the count for the second query is different than what was
> returned in the facet? You may need to check for any defaults you have set
> up in the solrconfig for the select parser, if for instance you have any
> grouping going on, but aren't doing grouping in your facet, that could
> result in the counts being off.
>
> MJ
>
>
>
>
> On Thu, Jun 2, 2016 at 2:01 PM, Jamal, Sarfaraz <
> sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>
>> Absolutely,
>>
>> Here is what it looks like:
>>
>> This brings the right counts as it should
>> http://
>> **select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&facet.field=team
>>
>> Then when I specify which team
>> http://
>> **select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&facet.field=team&fq=team:rollback
>>
>> The counts are obviously different now, as the result set is limited to
>> one team.
>>
>> Sas
>>
>> -Original Message-
>> From: MaryJo Sminkey [mailto:mjsmin...@gmail.com]
>> Sent: Thursday, June 2, 2016 1:56 PM
>> To: solr-user@lucene.apache.org
>> Subject: [E] Re: Faceting Question(s)
>>
>> Jamal - what is your q= set to? And do you have a fq for the original
>> query? I have found that if you do a wildcard search (*:*) you have to be
>> careful about other parameters you set as that can often result in the
>> numbers returned being off. In my case, my defaults had things like edismax
>> settings for phrase boosting, etc. that don't apply if there isn't a search
>> term, and once I removed those for a wildcard search I got the correct
>> numbers. So possibly your facet query itself may be set up correctly but
>> something else in the parameters and/or filters with the two queries may be
>> the cause of the difference.
>>
>> Mary Jo
>>
>>
>> On Thu, Jun 2, 2016 at 1:47 PM, Jamal, Sarfaraz <
>> sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>>
>>> Hello Everyone,
>>>
>>> I am working on implementing some basic faceting into my project.
>>>
>>> I have it working the way I want to, but I feel like there is probably
>>> a better way than the way I went about it.
>>>
>>> * I want to show a category and its count.
>>> * when someone clicks a category, it sets an fq= to that category.
>>>
>>> But now that the results are being filtered, the category counts from
>>> the original query without the filters are off.
>>>
>>> So, I have a single api call that I make with rows set to 0 and the
>>> base query without any filters, and use that to display my categories.
>>>
>>> And then I call the api again, this time to get the results. And the
>>> cate

RE: Are there issues with the use of SolrCloud / embedded Zookeeper in non-HA deployments?

2016-07-28 Thread Andrew Chillrud
Thanks Markus, Scott, and Erick, I appreciate the input.

Scott, I am not clear what you meant by "One reason is that zkServer.cmd tells 
the process that run Zookeeper by judging the DOS window title. However, 
according to what version of Windows you use and how you start DOS window, it 
could be wrong.". Can you explain further?

I looked for your post on modifying the script but was unable to find it. Can 
you provide a link?

Thanks again,
Andy

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, July 28, 2016 1:01 PM
To: solr-user
Subject: Re: Are there issues with the use of SolrCloud / embedded Zookeeper in 
non-HA deployments?

I can certainly say that external Zookeepers get _waaay_ more testing/use than 
embedded. While I don't know of any _specific_ issues, embedded is largely 
intended for ease of first use.

I think my argument would be that in the case where you have a customer 
migrating from one node to many, if you _already_ have an external ZK then that 
transition would be much easier. Even if the external ZK is just another Java 
program running on the same physical node...

FWIW,
Erick

On Thu, Jul 28, 2016 at 8:44 AM, Markus Jelsma  
wrote:
> Hello - all our production environments are deployed as a cloud, even when 
> just a single Solr instance is used. We did this for the purpose of having a 
> single method of deployment / provisioning, and because we have the 
> option to add replicas with ease if we need to.
>
> We never use embedded Zookeeper.
>
> Markus
>
>
> -Original message-
>> From:Andy C 
>> Sent: Thursday 28th July 2016 17:38
>> To: solr-user@lucene.apache.org
>> Subject: Are there issues with the use of SolrCloud / embedded Zookeeper in 
>> non-HA deployments?
>>
>> We have integrated Solr 5.3.1 into our product. During installation 
>> customers have the option of setting up a single Solr instance, or 
>> for high availability deployments, multiple Solr instances in a 
>> master/slave configuration.
>>
>> We are looking at migrating to SolrCloud for HA deployments, but are 
>> wondering if it makes sense to also use SolrCloud in non-HA deployments?
>>
>> Our thought is that this would simplify things. We could use the same 
>> approach for deploying our schema.xml and other configuration files 
>> on all systems, we could always use the SolrJ CloudSolrClient class 
>> to communicate with Solr, etc.
>>
>> Would it make sense to use the embedded Zookeeper instance in this 
>> situation? I have seen warnings that the embedded Zookeeper should not 
>> be used in production deployments, but the reason generally given is 
>> that if Solr goes down Zookeeper will also go down, which doesn't 
>> seem relevant here. Are there other reasons not to use the embedded 
>> Zookeeper?
>>
>> More generally, are there downsides to using SolrCloud with a single 
>> Zookeeper node and single Solr node?
>>
>> Would appreciate any feedback.
>>
>> Thanks,
>> Andy
>>