Re: SolrMeter is dead?

2014-05-16 Thread Tomás Fernández Löbbe
It hasn't had any improvements for a long time now (it doesn't have any
SolrCloud-related features, for example). I just added a note on the Solr wiki to
alert users about that. Feel free to ask on the solrmeter mailing list if
you have any other questions.

Tomás


On Wed, May 14, 2014 at 3:56 AM, Ahmet Arslan  wrote:

> Hi Al,
>
> http://jmeter.apache.org
>
> Ahmet
>
>
>
>
>
> On Wednesday, May 14, 2014 1:11 PM, Al Krinker 
> wrote:
> I am trying to test performance of my cluster (solr 4.8).
>
> SolrMeter looked promising... small and standalone. Plus, open source so
> that I could make tweaks if needed.
>
> However, I see that the last update date was in Oct 2012. Is it dead? Any
> better non commercial and preferably open sourced projects out there?
>
> Thanks,
> Al
>
>


Re: permissive mm value and efficient spellchecking

2014-05-16 Thread elisabeth benoit
ok, thanks a lot, I'll check that out.


2014-05-14 14:20 GMT+02:00 Markus Jelsma :

> Elisabeth, i think you are looking for SOLR-3211 that introduced
> spellcheck.collateParam.* to override e.g. dismax settings.
>
> Markus
>
> -Original message-
> From:elisabeth benoit 
> Sent:Wed 14-05-2014 14:01
> Subject:permissive mm value and efficient spellchecking
> To:solr-user@lucene.apache.org;
> Hello,
>
> I'm using solr 4.2.1.
>
> I use a very permissive value for mm, to be able to find results even if
> request contains non relevant words.
>
> At the same time, I'd like to be able to do some efficient spellcheking
> with solrdirectspellchecker.
>
> So for instance, if user searches for "rue de Chraonne Paris", where
> Chraonne is mispelled, because of my permissive mm value I get more than
> 100 000 results containing words "rue" and "Paris" ("de" is a stopword),
> which are very frequent terms in my index, but no spellcheck correction for
> Chraonne. If I set mm=3, then I get the expected spellcheck correction
> value: "rue de Charonne Paris".
>
> Is there a way to achieve my two goals in a single solr request?
>
> Thanks,
> Elisabeth
>
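
A sketch of what the SOLR-3211 override looks like in practice; the exact
handler defaults and values here are assumptions, not the actual config:

q=rue de Chraonne Paris&defType=edismax&mm=1
&spellcheck=true&spellcheck.collate=true
&spellcheck.maxCollationTries=5
&spellcheck.collateParam.mm=100%

Document matching stays permissive (mm=1), while collations are only returned
if they would still match under the stricter mm=100%.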


Re: location of the files created by zookeeper?

2014-05-16 Thread Aman Tandon
Any help here??

With Regards
Aman Tandon


On Thu, May 15, 2014 at 10:17 PM, Aman Tandon wrote:

> Hi,
>
> Can anybody tell me where the embedded zookeeper keeps your config
> files? When we specify the configName while starting SolrCloud, it
> gives that name to the directory, as guessed from the solr logs.
>
>
>
>
>
>
> 4409 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/footer.vm
> 4456 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/pagination_bottom.vm
> 4479 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/head.vm
> 4530 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/pagination_top.vm
> 4555 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/VM_global_library.vm
> 4599 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/suggest.vm
>
>
> With Regards
> Aman Tandon
>


Re: KeywordTokenizerFactory splits the string for the exclamation mark

2014-05-16 Thread Erick Erickson
Have you looked at the results after adding &debug=query? That often
gives you valuable insights into such questions. Admittedly, the debug
output can be "interesting" to get used to...

Best,
Erick
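
For example (the core name and query are assumptions), appending the
parameter to the failing request:

http://localhost:8983/solr/collection1/select?q=samplestring&qf=Exact_Word&defType=edismax&debug=query&wt=xml

The parsedquery and parsedquery_toString entries in the debug block show
exactly how the parser split and analyzed each clause.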

On Tue, May 13, 2014 at 9:11 PM, nativecoder  wrote:
> Yes that happens due to the ! mark.
>
> Also can someone please tell me the difference between searching a text in
> the following ways
>
> 1. q=Exact_Word:"samplestring"
>
> 2. q=samplestring&qf=Exact_Word
>
> 3. q="samplestring"&qf=Exact_Word
>
> I think the first and the third one are the same. Is that correct? How does
> it differ from the second one?
>
> I am trying to understand how enclosing the full term in "" resolves
> this problem. What does it tell Solr?
>
> Other than the exclamation mark, are there any other characters which tell
> specific things to Solr?
> On May 14, 2014 1:54 AM, "nativecoder [via Lucene]" <
> ml-node+s472066n4135493...@n3.nabble.com> wrote:
>
>> Also could you please tell me the difference between searching a text in
>> the following ways
>>
>> q=Exact_Word:"samplestring"
>>
>> q=samplestring&qf=Exact_Word
>>
>> I am trying to understand how enclosing the full term in "" is resolving
>> this problem ? What does it tell to solr  ?
>>
>> Other than the exclamation mark are there any other characters which tells
>> specific things to solr
>>


not getting any mails

2014-05-16 Thread Aman Tandon
Hi,

I am not getting any mails from this group. Did my subscription just get
ended? Can anybody help?

With Regards
Aman Tandon


Is there a way to change transientCacheSize dynamically without restarting Solr

2014-05-16 Thread Elran Dvir
Hi All,

Is there an API in Solr to change transientCacheSize dynamically, without the
need to restart Solr?
Are there other Solr configuration parameters that can be changed dynamically?

Thanks.


Using embedded zookeeper to make an ensemble

2014-05-16 Thread Upayavira
Hi,

I need to set up a zookeeper ensemble. I could download Zookeeper and do
it that way. I already have everything I need to run Zookeeper within a
Solr install.

Is it possible to run a three node zookeeper ensemble by starting up
three Solr nodes with Zookeeper enabled? Obviously, I'd only use these
nodes for their Zookeeper, and keep their indexes empty.

I've made some initial attempts, and whilst it looks like it might be
possible with -DzkRun and -DzkHost=, I haven't yet succeeded.

I think this could be a much easier way for people familiar with Solr to
get an ensemble up, compared to downloading the Zookeeper distribution.

Thoughts?

Upayavira
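
For reference, a sketch of the kind of startup being attempted, modeled on the
SolrCloud example commands elsewhere in this digest; the ports and the
host:port form of zkRun are assumptions to verify:

java -Djetty.port=8983 -DzkRun=localhost:9983 -DzkHost=localhost:9983,localhost:9984,localhost:9985 -jar start.jar
java -Djetty.port=8984 -DzkRun=localhost:9984 -DzkHost=localhost:9983,localhost:9984,localhost:9985 -jar start.jar
java -Djetty.port=8985 -DzkRun=localhost:9985 -DzkHost=localhost:9983,localhost:9984,localhost:9985 -jar start.jar

Each node would run its embedded ZooKeeper (by default on jetty.port+1000),
with zkHost listing all three ensemble members.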


Re: Join in solr to get data from two cores

2014-05-16 Thread Erick Erickson
Please read:

http://wiki.apache.org/solr/UsingMailingLists

and the contained link:
http://catb.org/~esr/faqs/smart-questions.html

On Tue, May 13, 2014 at 12:03 AM, Kamal Kishore
 wrote:
> No reply from anybody... seems strange?
>
>
> On Fri, May 9, 2014 at 9:47 AM, Kamal Kishore
> wrote:
>
>> Any updates guys ?
>>
>>
>> On Thu, May 8, 2014 at 2:05 PM, Kamal Kishore > > wrote:
>>
>>> Dear Team,
>>>
>>> I have two solr cores. One contains product information and the second has
>>> customer points. I am looking at a solr join to query the first (product) core
>>> and boost the results based on customer points in the second core. I am not able
>>> to frame the solr query for this.
>>>
>>> Moreover, solr is not allowing me to get data from both cores.
>>>
>>>
>>> With Regards,
>>>
>>> Kamal Kishore
>>>
>>>
>>>
>>>
>>
>


Understanding core unload in SolrCloud

2014-05-16 Thread Saumitra Srivastav
When a core is unloaded, it is unregistered from zookeeper and stops taking
requests, while retaining data on disk (with default params).

Can someone explain what happens internally and how memory, CPU and network
bandwidth will be affected if we load/unload shards frequently in SolrCloud
setup using core admin API?

-Saumitra





RE: solr 4.2.1 spellcheck strange results

2014-05-16 Thread Dyer, James
To achieve what you want, you need to specify a lightly analyzed field (no 
stemming) for spellcheck.  For instance, if your "solr.SpellCheckComponent" in 
solrconfig.xml is set up with "field" of "title_full", then try using 
"title_full_unstemmed".  Also, if you are specifying a 
"queryAnalyzerFieldType", it should be the same as your unstemmed text field.

James Dyer
Ingram Content Group
(615) 213-4311
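
A sketch of the solrconfig.xml shape being described; the names mirror the
fields quoted below but are otherwise assumptions:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_unstemmed</str>
  <lst name="spellchecker">
    <str name="name">basicSpell</str>
    <str name="field">title_full_unstemmed</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
</searchComponent>

The point is that "field" and "queryAnalyzerFieldType" both use unstemmed
analysis, so suggestions are built from and checked against whole words.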


-Original Message-
From: HL [mailto:freemail.grha...@gmail.com] 
Sent: Saturday, May 10, 2014 9:12 AM
To: solr-user@lucene.apache.org
Subject: solr 4.2.1 spellcheck strange results

Hi

I am querying the solr server spellcheck, and although the results I get back
look ok at first glance,
it seems like solr is replying back as if it made the search with the
wrong key.

So while I query the server with the word
"καρδυα",
Solr responds as if it was querying the database with the word
"καρδυ", eliminating the last char.

Ideally, Solr should properly indicate that the suggestions correspond
to "καρδυα" rather than "καρδυ".

Is there a way to make solr respond with the original search word from
the query in its response, instead of the one that is getting the hits??

Regards,
Harry



here is the complete solr response (the XML tags were lost in transit; the
surviving values are laid out below)
---
status: 0, QTime: 23
params:
  spellcheck: true
  fl: *,score
  start: 0
  q: καρδυα
  spellcheck.q: καρδυα
  qf: title_short^750 title_full_unstemmed^600 title_full^400 title^500
      title_alt^200 title_new^100 series^50 series2^30 author^300
      author_fuller^150 contents^10 topic_unstemmed^550 topic^500
      geographic^300 genre^300 allfields_unstemmed^10 fulltext_unstemmed^10
      allfields fulltext isbn issn
  spellcheck.dictionary: basicSpell
  defType: dismax
  wt: xml
  rows: 0
suggestion block: numFound=3, startOffset=0, endOffset=6
  καρδ (freq 5)
  καρδι (freq 3)
  καρυ (freq 1)
correctlySpelled: false
---

Question regarding the latest version of HeliosSearch

2014-05-16 Thread Jean-Sebastien Vachon
Hi All,

I spent some time today playing around with the subfacets and facet functions now
available in Heliosearch 0.05 and I have some concerns... They look very
promising.

I indexed 10,000 documents and built some queries to look at each feature, and
found some weird behaviour that I could not explain.

The first query I made was to find all documents having the word "java" in 
their title and then compute a facet on the field position_id with stats about 
the field job_id. Basically, I want the number of unique Job_ids for each 
position_id for all matching documents.

http://localhost:8983/solr/current/select?q=title:java&facet=on&facet.field=position_id&facet.stat=unique(job_id)&rows=1&facet.limit=10&facet.mincount=1&wt=json&indent=on&fl=job_id,position_id,super_alias_id

the response looks good except for one little thing... the mincount is not 
respected whenever I specify the facet.stat parameter. Removing it will cause 
the mincount to be respected but then I need this parameter.

Without the parameter the facet looks like this:
"facet_counts":{
"facet_queries":{},
"facet_fields":{
  "position_id":[
"265151",5,
"927284",1,
"1662380",1,
"2625553",1,
"2862455",1,
"4128904",1,
"4253203",1]},  <=== accounted for all 11 documents

And now when adding the parameter:


"facets":{

"position_id":{

  "stats":{

"unique(job_id)":11, <== again, 11 documents, which is good

"count":11},

  "buckets":[{

  "val":265151,

  "unique(job_id)":5,

  "count":5},

{

  "val":927284,

  "unique(job_id)":1,

  "count":1},

{

  "val":1662380,

  "unique(job_id)":1,

  "count":1},

{

  "val":2625553,

  "unique(job_id)":1,

  "count":1},

{

  "val":2862455,

  "unique(job_id)":1,

  "count":1},

{

  "val":4128904,

  "unique(job_id)":1,

  "count":1},

{

  "val":4253203,

  "unique(job_id)":1,

  "count":1},

{

  "val":1133,

  "unique(job_id)":0, <== what is this?

  "count":0},
 Many zero entries following...

I was wondering where the extra entries were coming from... the position_id =
1133 above is not even a match for my query (its title is "Audit Consultant").
I've also noticed a similar behaviour when using subfacets. It looks like the
number of items returned always matches the "facet.limit" parameter.
If not enough values are present for a given entry then the bucket is filled
with documents not matching the original query.

Am I doing something wrong?


Status of mail?

2014-05-16 Thread Jack Krupansky
Is the mail list working again yet??

-- Jack Krupansky

Re: SolrMeter is dead?

2014-05-16 Thread Dmitry Kan
There is also the solrjmeter tool that wraps jmeter inside:
https://github.com/romanchyla/solrjmeter
I have tried it and saw more interesting graphs.

You can also plot the solr cache stats and other metrics by querying
the /admin/mbeans?stats=true&wt=json suffix on your core/collection and
using some target visualization system of preference. I've blogged about one:

http://java.dzone.com/articles/monitoring-solr-graphite-and

HTH,

Dmitry


On Wed, May 14, 2014 at 11:22 PM, Sameer Maggon
wrote:

> Have you looked at JMeter - http://jmeter.apache.org/
>
> Thanks,
> Sameer.
> --
> http://measuredsearch.com
>
>
> On Wed, May 7, 2014 at 7:51 AM, Al Krinker  wrote:
>
> > I am trying to test performance of my cluster (solr 4.8).
> >
> > SolrMeter looked promising... small and standalone. Plus, open source so
> > that I could make tweaks if needed.
> >
> > However, I see that the last update date was in Oct 2012. Is it dead? Any
> > better non commercial and preferably open sourced projects out there?
> >
> > Thanks,
> > Al
> >
>



-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan


Solr performance: multiValued field vs separate fields

2014-05-16 Thread danny teichthal
I wonder about the performance difference between 2 indexing options: 1- a
multivalued field, 2- separate fields.

The case is as follows: each document has 100 “properties”: prop1..prop100.
The values are strings and there is no relation between different
properties. I would like to search by exact match on several properties with
known values (like ids). For example: search for all docs having
prop1=”blue” and prop6=”high”.

I can choose to build the indexes in 1 of 2 ways (see the sketch below): 1- the
trivial way – 100 separate fields, 1 for each property, multiValued=false; the
values are just the property values. 2- 1 field (named “properties”) with
multiValued=true. The field will have 100 values: value1=”prop1:blue”..
value6=”prop6:high” etc.

Is it correct to say that option 1 will have much better performance in
searching? How about indexing performance?
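
A sketch of the two schemas (the property names and values are from the
example above; the attribute choices are assumptions):

Option 1, one field per property:
  <field name="prop1" type="string" indexed="true" stored="false" multiValued="false"/>
  ...
  <field name="prop100" type="string" indexed="true" stored="false" multiValued="false"/>
  query: fq=prop1:blue&fq=prop6:high

Option 2, one multiValued field holding "propN:value" tokens:
  <field name="properties" type="string" indexed="true" stored="false" multiValued="true"/>
  query: fq=properties:prop1\:blue&fq=properties:prop6\:high

Both searches reduce to simple term lookups; the main difference is that
option 2 concentrates everything in one field's term dictionary, with each
term prefixed by its property name.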


Re: Replica active during warming

2014-05-16 Thread lboutros
Thank you Mark.

The issue : https://issues.apache.org/jira/browse/SOLR-6086

Ludovic.



-
Jouve
France.


Re: autowarming queries

2014-05-16 Thread Joel Bernstein
Are you talking about static warming queries, which you define as
newSearcher and firstSearcher events? If so, you should see all three
queries in the log. If you're still having the issue, can you post your
warming query configuration?

Joel Bernstein
Search Engineer at Heliosearch


On Wed, May 7, 2014 at 4:25 PM, Joshi, Shital  wrote:

> Hi,
>
> How many auto-warming queries are supported per collection in Solr 4.4 and
> higher? We see one out of three queries in the log when a new searcher is created.
>
> Thanks!
>
>
>
>


Index *.SH and *.SQL scripts

2014-05-16 Thread Marc
Hi,

Recently I have set up an image with SOLR. My goal is to index and extract
files on a Windows and Linux server. It is possible for me to index and
extract data from multiple file types. This is done by the SOLR CELL request
handler. See the post.jar cmd below.

java -Dauto -Drecursive -jar post.jar Y:\
SimplePostTool version 1.5
Posting files to base url localhost:8983/solr/update..
Entering auto mode. File endings considered are
xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
Entering recursive mode, max depth=999, delay=0s
0 files indexed.

Is it possible to index and extract metadata/content from file types like
.sh and .sql? If it is possible I would like to know how of course :)
Many thanks
Marc





Re: Physical Files v. Reported Index Size

2014-05-16 Thread Upayavira
Perhaps because du reports disk block usage, not total file size?



Upayavira





On Wed, May 7, 2014, at 04:34 AM, Darrell Burgan wrote:

Hello all, I’m trying to reconcile what I’m seeing in the file system
for a Solr index versus what it is reporting in the UI. Here’s what I
see in the UI for the index:


[1]https://s3-us-west-2.amazonaws.com/pa-darrell/ui.png


As shown, the index is 74.85 GB in size. However, here is what I see in
the data folder of the file system on that server:


[2]https://s3-us-west-2.amazonaws.com/pa-darrell/file-system.png


As shown, it is consuming 109 GB of space. Also note that one of the
index folders is 75 GB in size.


My question is why the difference, and whether I can remove some of
these index folders to reclaim file system space? Or is there a Solr
command to do it (is it as obvious as “Optimize”)?


If there's a manual I should RTFM about the file structure, please point
me to it. :)


Thanks!

Darrell




Darrell Burgan | Architect, Sr. Principal, PeopleAnswers

office: 214 445 2172 | mobile: 214 564 4450 | fax: 972 692
5386 | [4]darrell.bur...@infor.com | [5]http://www.infor.com


References

1. https://s3-us-west-2.amazonaws.com/pa-darrell/ui.png
2. https://s3-us-west-2.amazonaws.com/pa-darrell/file-system.png
3. http://www.infor.com/
4. mailto:darrell.bur...@infor.com
5. http://www.infor.com/


Re: URLDataSource : indexing from other Solr servers

2014-05-16 Thread helder.sepulveda
I will try with the SolrEntityProcessor,
but I'm still interested to know why it will not work with the
XPathEntityProcessor.





Unload collection in SolrCloud

2014-05-16 Thread Saumitra Srivastav
Is there a way to unload a complete collection in a SolrCloud env? I can
achieve the same by unloading all shards of the collection using the core admin
API, but is there a better/cleaner approach?

-Saumitra





Re: not getting any mails

2014-05-16 Thread Peri Stracchino
hmm, i got a message yesterday about emails sent to me from the list
bouncing. wonder if there's something odd going on with the mailing list?

Cheers
Peri (x4082)


On 10 May 2014 13:30, Aman Tandon  wrote:

> Hi,
>
> I am not getting any mails from this group. Did my subscription just get
> ended? Can anybody help?
>
> With Regards
> Aman Tandon
>


callback with active state of node

2014-05-16 Thread ronak kirit
Hello,

We are using solr ver 4.3.1 and running it in solrcloud mode. We would
like to keep some dynamic configs under the data directory of every shard
and/or replica of a collection. I would like to know: if a node is not
in active state (let's say it is in recovery or another state), and it
comes back to active state, is there any callback that can be registered for
the active state? If the component is SolrCoreAware, would that component's
inform method be called every time the node comes back to active state?

Thanks,
Ronak


Re: distrib=false is not honoring

2014-05-16 Thread Aman Tandon
Hi,

There is one more problem today: i indexed the mcat core, again copied the
same, and then started the shards (as described in the above thread —
taking my non-sharded index (mcats index), copying it to node 1 as well as
node 2, and starting the first node as before).

I noticed that there is a difference in the total count of docs, please see
these logs:

query:
A) localhost:1983

871949 [qtp27058272-19] INFO  org.apache.solr.core.SolrCore  – [mcats]
webapp=/solr path=/select
params={mm=%0a%09%0a2<-1+4<70%25%0a+&facet=true&tie=0.01&qf=%0a%09namex%0a+&distrib=false&wt=javabin&version=2&defType=edismax&rows=10&pf=%0a%09%0a+&NOW=1400148489992&shard.url=http://192.168.6.217:1983/solr/mcats/&fl=id,score&start=0&q=*:*&qs=50&isShard=true&fsv=true&ps=3}
hits=113573 status=0 QTime=1

B) localhost:8983 (running the embedded zookeeper server)



878735 [qtp27058272-15] INFO  org.apache.solr.core.SolrCore  – [mcats]
webapp=/solr path=/select
params={mm=%0a%09%0a2<-1+4<70%25%0a+&facet=true&tie=0.01&qf=%0a%09namex%0a+&distrib=false&wt=javabin&version=2&defType=edismax&rows=10&pf=%0a%09%0a+&NOW=1400148489992&shard.url=http://192.168.6.217:8983/solr/mcats/&fl=id,score&start=0&q=*:*&qs=50&isShard=true&fsv=true&ps=3}
hits=113573 status=0 QTime=1

878746 [qtp27058272-15] INFO  org.apache.solr.core.SolrCore  – [mcats]
webapp=/solr path=/select
params={mm=%0a%09%0a2<-1+4<70%25%0a+&facet=false&tie=0.01&ids=42663,26311,40545,4571,19114,26010,2716,38320,25724,29459&qf=%0a%09namex%0a+&distrib=false&wt=javabin&version=2&defType=edismax&pf=%0a%09%0a+&NOW=1400148489992&shard.url=http://192.168.6.217:8983/solr/mcats/&fl=%0a%0a%09*,+score%0a+&q=*:*&qs=50&isShard=true&ps=3}
status=0 QTime=4

878750 [qtp27058272-13] INFO  org.apache.solr.core.SolrCore  – [mcats]
webapp=/solr path=/select
params={q=*:*&defType=edismax} hits=227136 status=0 QTime=19

The sum of docs should be: 113573 + 113573 = 227146.
But it gives me back a total sum of 227136, as mentioned in the logs. Can
anybody help with what's going on here?

With Regards
Aman Tandon


On Thu, May 15, 2014 at 1:36 PM, Aman Tandon wrote:

> Thanks Jack, i am using q.alt just for testing purposes only; we use
> q=query in our general production environment, and mcat.intent is
> our request handler to add an extra number of rows and all.
>
> Here i was making some mistakes in properly explaining the situation, so i
> am sorry for that.
>
> Requirement: I want to test in my sharded environment that a unique
> document is present in a single shard, not in both.
>
> core name: mcats
> core.properties: name=mcats, so the default collection name would be mcats
> as well.
>
> And i was taking my non-sharded index (mcats index), copying it to node 1
> as well as node 2, and starting the first node as:
>
> java -DzkRun -DnumShards=2 -Dbootstrap_confdir=./solr/mcats/conf
> -Dcollection.configName=myconf -jar start.jar
>
> And the second node as:
> java -Djetty.port=1983 -DzkHost=localhost:9983 -jar start.jar
>
> So i guess it is taking the whole index as-is, because when i run the
> query
>
> http://localhost:8983/solr/mcats/select?q.alt=*:*
> it was giving me the sum of documents in both the shards, which is 2 * the
> number of docs in the mcats index.
>
> So the same document is present in both the shards, at node1:8983 and
> node2:1983.
>
> To figure this out i indexed another different doc; now when i
> queried with
> http://localhost:8983/solr/mcats/select?q.alt=id:17406780&distrib=false
> the document is present.
>
> But in another query
> http://localhost:8983/solr/mcats/select?q.alt=id:17406780&distrib=flase
> it is not found,
>
> which fulfilled my test case.
>
> So i thought i have to do the full indexing of my core mcats to validate
> my test case for each id. Please correct me if i am wrong.
>
>
> With Regards
> Aman Tandon
>
>
> On Wed, May 14, 2014 at 5:52 PM, Jack Krupansky 
> wrote:
>
>> The q.alt param specifies only the parameter to use if the q parameter is
>> missing. Could you verify whether that is really the case? Typically
>> solrconfig gives a default of "*:*" for the q parameter. Specifying a query
>> via the q.alt parameter seems like a strange approach - what is your
>> rationale?
>>
>> What is this odd "mcat.intent" query response writer type that you are
>> specifying with the qt parameter?
>>
>> -- Jack Krupansky
>>
>> -

Re: Indexing DateField timezone problem

2014-05-16 Thread Walter Underwood
After years of building world-wide search services, I disagree.

The general rule is to do everything in Unicode and UTC and to convert at the 
edges of the service. If you use local character sets or local time, you will 
pay for it.

wunder

On May 14, 2014, at 5:27 AM, "Jack Krupansky"  wrote:

> The general rule everywhere is that the default time zone is the local time 
> zone of the server processing the date. Could you verify whether your server 
> is in fact set to be "+03:00".
> 
> If your convention for your database is that the default time zone is GMT, 
> then you will have to manually add that to dates.
> 
> -- Jack Krupansky
> 
> -Original Message- From: hakanbillur
> Sent: Friday, May 9, 2014 4:38 AM
> To: solr-user@lucene.apache.org
> Subject: Indexing DateField timezone problem
> 
> 
> 
> 
> Hi,
> 
> I have a problem about indexing the UTC date format to solr from the DB. For
> example, in the DB, date: "2014-05-01 23:59:00", and the same date: "date":
> "2014-05-01T20:59:00Z" in solr.
> There is a time difference of -3 hours! (For Turkey.)
> 
> you can see about two captures on the right side.
> 
> i hope, someone can help me.
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Indexing-DateField-timezone-problem-tp4135079.html
> Sent from the Solr - User mailing list archive at Nabble.com. 

--
Walter Underwood
wun...@wunderwood.org





Re: Easiest way to install solr cloud with tomcat

2014-05-16 Thread Shawn Heisey
On 5/15/2014 2:10 AM, Aman Tandon wrote:
> @Matt, sorry, we don't want to use any other organisation's products, only
> the Apache Foundation's. Thanks anyway.
> 
> Anybody else here who can help me with the default tomcat installation
> along with solr to configure solrcloud.

Heliosearch was created by Yonik Seeley ... the original author of Solr.
 He remains the head of the Solr project within Apache.  It *is* Solr,
just with a few tweaks regarding exactly how it uses memory ... so for
some workloads, it is faster.

Thanks,
Shawn



Re: Problem when i'm trying to search something

2014-05-16 Thread Shawn Heisey
On 5/13/2014 3:12 AM, Anon15 wrote:
> Thanks for replying !
> 
> This is my Schema.xml. 

The XML is gone.  I would imagine that this is happening because you're
posting on the Nabble forum, not directly to the mailing list.  Nabble
is a two-way mirror of the list, the actual mailing list is the true
distribution point.  I would guess that Nabble either removes or parses
anything that looks like HTML before it sends to the mailing list, and
if you try to parse XML as HTML, it would likely simply disappear.

You may wish to use a paste website for including data instead.  The
Apache foundation has their own paste site at http://apaste.info for us
to use.

I did visit the Nabble version of your post, and I can see the
schema.xml there.  Your fieldType named "text" has the following line in it:

<filter class="solr.SnowballPorterFilterFactory" .../>

SnowballPorterFilterFactory is a factory class for a stemming filter.

In your schema, the fields named label, content, teaser, and path_alias
are using this fieldType.  I would imagine that the matches you are
seeing are coming from one or more of those fields.

Thanks,
Shawn



RE: Easiest way to install solr cloud with tomcat

2014-05-16 Thread Boogie Shafer
aman,

if you don't trust the tomcat bits repackaged by heliosearch, perhaps the best
step for you is to try looking at the heliosearch packaging and configs in a
test environment; you can diff out the deltas between how they set up tomcat
to work with solr and the regular distribution you might be using:

http://heliosearch.org/getting-started-with-tomcat-and-solr/




From: Aman Tandon 
Sent: Thursday, May 15, 2014 01:10
To: solr-user@lucene.apache.org
Subject: Re: Easiest way to install solr cloud with tomcat

@Matt, sorry, we don't want to use any other organisation's products, only
the Apache Foundation's. Thanks anyway.

Anybody else here who can help me with the default tomcat installation
along with solr to configure solrcloud.

With Regards
Aman Tandon


On Wed, May 14, 2014 at 8:13 AM, Matt Kuiper (Springblox) <
matt.kui...@springblox.com> wrote:

> Check out http://heliosearch.com/download.html
>
> It is a distribution of Apache Solr packaged with Tomcat.
>
> I have found it simple to use.
>
> Matt
>
> -Original Message-
> From: Aman Tandon [mailto:amantandon...@gmail.com]
> Sent: Monday, May 12, 2014 6:24 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Easiest way to install solr cloud with tomcat
>
> Can anybody help me out??
>
> With Regards
> Aman Tandon
>
>
> On Mon, May 12, 2014 at 1:24 PM, Aman Tandon  >wrote:
>
> > Hi,
> >
> > I tried to set up solr cloud with jetty which works fine. But in our
> > production environment we uses tomcat so i need to set up the solr
> > cloud with the tomcat. So please help me out to how to setup solr
> > cloud with tomcat on single machine.
> >
> > Thanks in advance.
> >
> > With Regards
> > Aman Tandon
> >
>

Re: Storing tweets For WC2014

2014-05-16 Thread Aman Tandon
I haven't tried a situation like this, but as per your requirements you can
make the schema defining all those fields required by you, like date,
location, etc. You can also configure the faceting from solrconfig.xml if
you want the same for every request.

You should give it a try by allocating 2-4GB of heap space; then you can
increase the size by testing it under heavy load.
All the hardware kind of parameters are pluggable; you have to try it
yourself. If problems arise then you should look at the solr logs; if there is
an issue related to memory then you can allocate more memory by visualizing
the GC graphs.

I am not an expert, i am just a newbie in solr, so maybe some points are not
well explained by me, but you should try experimenting with it. I guess you
have sufficient time before July ;) .

With Regards
Aman Tandon


On Fri, May 9, 2014 at 11:09 PM, Cool Techi  wrote:

> Hi,
> We have a requirement from one of our customers to provide search and
> analytics on the upcoming Soccer World cup, given the sheer volume of
> tweet's that would be generated at such an event I cannot imagine what
> would be required to store this in solr.
> It would be great if there can be some pointer's on the scale or hardware
> required, number of shards that should be created etc. Some requirement,
> All the tweets should be searchable (approximately 100million tweets/date
>  * 60 Days of event). All fields on tweets should be searchable/facet on
> numeric and date fields. Facets would be run on TwitterId's (unique users),
> tweet created on date, Location, Sentiment (some fields which we generate)
>
> If anyone has attempted anything like this it would be helpful.
> Regards,Rohit
>


Re: Replica active during warming

2014-05-16 Thread Erick Erickson
Are you passing LBHttpSolrServer to the c'tor of  CloudSolrServer or
just using it bare?

On Wed, May 14, 2014 at 12:16 AM, lboutros  wrote:
> In other words, is there a way for the LBHttpSolrServer to ignore replicas
> which are currently "cold" ?
>
> Ludovic.
>
>
>
> -
> Jouve
> France.


Why Solrj Default Format is XML Rather than Javabin?

2014-05-16 Thread Furkan KAMACI
Hi;

When I index documents via Solrj it sends the documents as XML. Solr processes
them with XMLLoader, then sends the response in the javabin format.

Why does the Solrj client not send data in the javabin format by default?

PS: I use Solr 4.5.1

Thanks;
Furkan KAMACI
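
For what it's worth, a minimal SolrJ 4.x sketch of opting into javabin on the
request side (the URL is an assumption):

import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

// By default SolrJ uses RequestWriter, which serializes update requests as XML.
HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
// BinaryRequestWriter switches outgoing updates to the javabin format.
server.setRequestWriter(new BinaryRequestWriter());

So the XML default is a client-side choice that can be overridden per server
instance.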


Re: Indexing DateField timezone problem

2014-05-16 Thread Alvaro Cabrerizo
I guess you will need to modify your extraction select in order to fix it,
using some date functions provided by the database manufacturer. For
example, in some projects using oracle as a data source i've been
using the next recipe to convert the oracle TIMESTAMP(6) datatype to fit the
solr date:

select ...
  TO_CHAR(myTable.timestamp_colum, 'YYYY-MM-DD') || 'T' ||
  TO_CHAR(myTable.timestamp_colum, 'HH24:MI:SS') || 'Z'
  ...
from myTable ...


Hope it helps


On Fri, May 9, 2014 at 10:38 AM, hakanbillur wrote:

> 
> 
>
> Hi,
>
> I have a problem about indexing the UTC date format to solr from the DB. For
> example, in the DB, date: "2014-05-01 23:59:00", and the same date: "date":
> "2014-05-01T20:59:00Z" in solr.
> There is a time difference of -3 hours! (For Turkey.)
>
> you can see about two captures on the right side.
>
> i hope, someone can help me.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Indexing-DateField-timezone-problem-tp4135079.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Difference between search strings

2014-05-16 Thread Jack Krupansky
For these specific examples, the results should be the same, but mostly 
that's because the term is a simple sequence of letters.


I have an extended discussion of characters in terms in my e-book:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

See the section "Detailed Rules for Terms" in the Query Parsers chapter, on
through the section "Unicode Characters in Terms", and even some of the
sections after that, although then we get into analysis, which is much
deeper than just the query parsing phase.


-- Jack Krupansky

-Original Message- 
From: rio

Sent: Wednesday, May 14, 2014 9:58 AM
To: solr-user@lucene.apache.org
Subject: Difference between search strings

Can someone please tell me the difference between searching a text in the
following ways

1. q=Exact_Word:"samplestring" -> What does it tell to solr  ?

2. q=samplestring&qf=Exact_Word -> What does it tell to solr  ?

3. q="samplestring"&qf=Exact_Word -> What does it tell to solr  ?

I think the first and the third one are the same.  is it correct ? How does
it differ from the second one.

I am trying to understand how enclosing the full term in "" resolves the
solr-specific special character problem. What does it tell solr? E.g. if
there is a "!" mark in the string, solr will identify it as a NOT even when
"!" is part of the string. This issue can be corrected if the full string is
enclosed in "".





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Difference-between-search-strings-tp4135576.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: KeywordTokenizerFactory splits the string for the exclamation mark

2014-05-16 Thread Shawn Heisey
> Also could you please tell me the difference between searching a text in
> the
> following ways
>
> q=Exact_Word:"samplestring"
>
> q=samplestring&qf=Exact_Word
>
> I am trying to understand how enclosing the full term in "" is resolving
> this problem ? What does it tell to solr  ?

The quotes tell Solr to do a phrase query. A phrase query must have the
same relative position increments in the index as are found in the query,
or the entire string must be an exact match for a single token in the
index. Basically, the index must have the same words as the query, next to
each other, and in the same order.

> Other than the exclamation mark are there any other characters which tells
> specific things to solr

There are a number of special characters to Solr' standard query parser.
The bottom of this page shows them all:

http://lucene.apache.org/core/4_2_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html?is-external=true

It lists && as a special character. It's not the combination of two &
characters that is special, it is each & character.

Thanks,
Shawn




Re: distrib=false is not honoring

2014-05-16 Thread Shawn Heisey
> The q.alt param specifies only the parameter to use if the q parameter is
> missing. Could you verify whether that is really the case? Typically
> solrconfig gives a default of "*:*" for the q parameter. Specifying a
> query
> via the q.alt parameter seems like a strange approach - what is your
> rationale?

As the author of a book about Solr, I'm sure you already know these
things. For everyone else:

The qt parameter only gets used if the handleSelect value on the request
dispatcher configuration is true. When that is set, it chooses a request
handler by name. This configuration value defaults to false since version
3.6, so that you can't use the /select handler to do other things, like
/update.

The q.alt parameter is applicable only to the dismax and edismax query
parsers, and specifies the query to send to the standard query parser if
the q parameter is blank or missing. It's most often defined as *:* to get
all docs. The *:* value doesn't have any special meaning to the dismax
parser.

Thanks,
Shawn
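
The two pieces of configuration being described, as a sketch of typical
solrconfig.xml defaults rather than anyone's actual file:

<requestDispatcher handleSelect="false">
  ...
</requestDispatcher>

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>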





Re: Help to Understand a Solr Query

2014-05-16 Thread Shawn Heisey
On 5/13/2014 8:56 AM, nativecoder wrote:
> Exact_Word" omitPositions="true" termVectors="false"
> omitTermFreqAndPositions="true" compressed="true" type="string_ci"
> multiValued="false" indexed="true" stored="true" required="false"
> omitNorms="true"/>
> 
>  multiValued="false" indexed="true" stored="true" required="false"
> omitNorms="true"/>
> 
>  omitNorms="true"> class="solr.KeywordTokenizerFactory"/> class="solr.LowerCaseFilterFactory"/>
> 
> 
> 
> As you can see Exact_Word has the KeywordTokenizerFactory and that should
> treat the string as it is.
> 
> Following is my responseHeader. As you can see I am searching my string only
> in the filed Exact_Word and expecting it to return the Word field and the
> score
> 
> "responseHeader":{
> "status":0,
> "QTime":14,
> "params":{
>   "explainOther":"",
>   "fl":"Word,score",
>   "debugQuery":"on",
>   "indent":"on",
>   "start":"0",
>   "q":"d!sdasdsdwasd!a...@dsadsadas.edu",
>   "qf":"Exact_Word",
>   "wt":"json",
>   "fq":"",
>   "version":"2.2",
>   "rows":"10"}},
> 
> 
> But when I enter an email with the following string,
> "d!sdasdsdwasd...@dsadsadas.edu", it splits the string in two. I was under
> the impression that KeywordTokenizerFactory would treat the string as-is.
> 
> Following is the query debug result. There you can see it has split the word
>  "parsedquery":"+((DisjunctionMaxQuery((Exact_Word:d))
> -DisjunctionMaxQuery((Exact_Word:sdasdsdwasd...@dsadsadas.edu)))~1)",
> 
> can someone please tell me why it produces this query result
> 
> If I put a string without the "!" sign, the produced query will be
> as below:
>
> "parsedquery":"+DisjunctionMaxQuery((Exact_Word:d_sdasdsdwasd_...@dsadsadas.edu))"
> This is what I expected solr to do even with the "!" mark. With the "_" mark it
> won't split the string and treats the string as it is.
> 
> I thought if the KeywordTokenizerFactory is applied then it should return
> the exact string as it is
> 
> Please help me to understand what is going wrong here 

I cannot make Solr (4.7.2) behave this way with exclamation points.  I
tried debugQuery=true, using the standard query parser with df set to
the field as well as setting the qf parameter on the dismax parser and
the edismax parser.  None of these will split the string like what shows
up in your debugQuery.

Here's a screenshot of the analysis screen for a similar fieldType with
your input data:

https://www.dropbox.com/s/0v2lbc76h9wejw1/lowercase-analysis.png

KT is the KeywordTokenizer.  ICUFF is the ICUFoldingFilter -- lowercase
on steroids.  TF is the TrimFilter.

Restating what Jack said in his reply:

http://people.apache.org/~hossman/#threadhijack

Thanks,
Shawn



Re: Solr, How to index scripts *.sh and *.SQL

2014-05-16 Thread Marc
Thanks that worked





Reloading core with param vs unloading and creating with params

2014-05-16 Thread Elran Dvir
Hi all,

I created a new patch, https://issues.apache.org/jira/browse/SOLR-6063,
enabling changes to core properties without the need to unload and re-create the core.
Considering the change in the patch,
is reloading a core with transient=true and loadOnStartup=false equivalent in
memory footprint to unloading the core and creating it with the same parameters?

Thanks.


Re: ContributorsGroup add request - Username: al.krinker

2014-05-16 Thread Stefan Matheis
Al  

i’ve added you :)

minor note aside: being listed in the contributors group in the wiki doesn't
mean you can change/commit to the lucene/solr repository automatically. but
improvements are always welcome; you can read about it on
https://wiki.apache.org/solr/HowToContribute

-Stefan  


On Thursday, May 15, 2014 at 10:19 PM, Al Krinker wrote:

> Please add me to the list of contributors. Username: al.krinker
>  
> There are some minor css tweaks that I would like to fix.
>  
> I work with Solr almost daily, so I would love to contribute to make it
> better.
>  
> Thanks,
> Al
>  
>  




RE: Spell check [or] Did you mean this with Phrase suggestion

2014-05-16 Thread Dyer, James
Have you looked at "spellcheck.collate", which re-writes the entire query with 
one or more corrected words?  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate .  There are 
several options shown at this link that controls how the "collate" feature 
works.

James Dyer
Ingram Content Group
(615) 213-4311
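
A sketch of the collate knobs at work (request params only; the values are
assumptions):

q=tamsong&spellcheck=true
&spellcheck.collate=true
&spellcheck.maxCollations=3
&spellcheck.maxCollationTries=10
&spellcheck.collateExtendedResults=true

With maxCollationTries > 0, Solr tests candidate rewrites against the index,
so any returned collation is guaranteed to produce hits.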

-Original Message-
From: vanitha venkatachalam [mailto:venkatachalam.vani...@gmail.com] 
Sent: Thursday, May 08, 2014 4:14 AM
To: solr-user@lucene.apache.org
Subject: Spell check [or] Did you mean this with Phrase suggestion

Hi,
We need a spell check component that suggests the actual full phrase, not just
words.

Say we have a list of brands: "Nike corporation", "Samsung electronics", ...

When I search for "tamsong", I'd like to get suggestions such as "samsung
electronics" (the full phrase), not just "samsung" (words).
Please help.
-- 
regards,
Vanitha


ContributorsGroup add request - Username: al.krinker

2014-05-16 Thread Al Krinker
Please add me to the list of contributors. Username: al.krinker

There are some minor css tweaks that I would like to fix.

I work with Solr almost daily, so I would love to contribute to make it
better.

Thanks,
Al


Re: Sub-Sequence token filter

2014-05-16 Thread Nitzan Shaked
Doesn't look like it. If I understand it correctly,
PathHierarchyTokenizerFactory
will only output prefixes. I support suffixes as well, plus the
ever-so-useful "unanchored" sub-sequences. Using domains again as an
example, I can use my suggestion to query "market.ebay" and find "
www.market.ebay.com" (domains completely made up for the sake of this
example).


On Fri, May 16, 2014 at 7:53 PM, Ahmet Arslan  wrote:

> Hi Nitzan,
>
> Cant you do what you described with PathHierarchyTokenizerFactory?
>
>
> http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/path/PathHierarchyTokenizerFactory.html
>
> Ahmet
>
>
>
>
>
> On Friday, May 16, 2014 5:13 PM, Nitzan Shaked 
> wrote:
> Hi list
>
> I created a small token filter which I'd gladly "contribute", but want to
> know if there's any interest in it before I go and make it pretty, add
> documentation, etc... ;)
>
> I originally created it to index domain names: I wanted to be able to
> search for "google.com" and find "www.google.com" or "ads.google.com", "
> mail.google.com", etc.
>
> What it does is split a token (in my case -- according to "."), and then
> outputs all sub-sequences. So "a,b,c,d" will output "a", "b", "c", "d",
> "a.b", "b.c", "c.d", "a.b.c", "b.c.d", and "a.b.c.d". I use it only in the
> "index" analyzer, and so am able to specify any of the generated tokens to
> find the original token.
>
> It has the following arguments:
>
> sepRegexp: regular expression that the original token will be split
> according to. (I use "[.]" for domains)
> glue: string that will be used to join sub-sequences back together (I use
> "." for domains)
> minLen: minimum generated sub-sequence length
> maxLen: maximum generated sub-sequence length (0 for unlimited, negative
> numbers for token length minus specified amount)
> anchor: "start" to only output prefixes, "end" to only output suffix, or
> "none" to output any sub-sequence
>
> So... is this useful to anyone?
>
>


Re: getting direct link to solr result.

2014-05-16 Thread Alexandre Rafalovitch
How are you getting the data into Solr?

Solr is not a storage or a database method. It's a search engine. So,
usually, you would have your filesystem with files and then you feed
those to Solr for indexing. When you found what you are looking for,
you can have the particular file delivered by whatever implementation
you chose (outside of Solr).

Regards,
   Alex
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Sun, May 11, 2014 at 2:53 AM, blach  wrote:
> Hello all!
>
> I have been using solr for a few days, but I still don't understand how
> I can get a direct link to open the document I'm looking for.
>
> I tried to do that, but the only information I can retrieve from the Json
> result from Solr is ID, Name, Modified date ...
>
> Well, I'm working on an android application, and I want the user to get a
> direct link to the file he searched for.
>
> thanks.
>
>
>


Re: Sub-Sequence token filter

2014-05-16 Thread Ahmet Arslan
Hi Nitzan,

Cant you do what you described with PathHierarchyTokenizerFactory?

http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/path/PathHierarchyTokenizerFactory.html

Ahmet





On Friday, May 16, 2014 5:13 PM, Nitzan Shaked  wrote:
Hi list

I created a small token filter which I'd gladly "contribute", but want to
know if there's any interest in it before I go and make it pretty, add
documentation, etc... ;)

I originally created it to index domain names: I wanted to be able to
search for "google.com" and find "www.google.com" or "ads.google.com", "
mail.google.com", etc.

What it does is split a token (in my case -- according to "."), and then
outputs all sub-sequences. So "a,b,c,d" will output "a", "b", "c", "d",
"a.b", "b.c", "c.d", "a.b.c", "b.c.d", and "a.b.c.d". I use it only in the
"index" analyzer, and so am able to specify any of the generated tokens to
find the original token.

It has the following arguments:

sepRegexp: regular expression that the original token will be split
according to. (I use "[.]" for domains)
glue: string that will be used to join sub-sequences back together (I use
"." for domains)
minLen: minimum generated sub-sequence length
maxLen: maximum generated sub-sequence length (0 for unlimited, negative
numbers for token length minus specified amount)
anchor: "start" to only output prefixes, "end" to only output suffix, or
"none" to output any sub-sequence

So... is this useful to anyone?
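
If it were contributed, usage would presumably look something like this (the
factory class name is hypothetical; the parameters are the ones listed above):

<fieldType name="domain" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="com.example.SubSequenceFilterFactory"
            sepRegexp="[.]" glue="." minLen="1" maxLen="0" anchor="none"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

matching the description of applying the filter only at index time.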



Re: Storing tweets For WC2014

2014-05-16 Thread Michael Della Bitta
Some of the data providers for Twitter offer a search API. Depending on
what you're doing, you might not even need to host this yourself.

My company does do search and analytics over tweets, but by the time we end
up indexing them, we've winnowed down the initial set to 10% of what we've
initially ingested, which itself is a fraction of the total set of tweets
as our data provider has let us filter for the ones that have the keywords
we want.

Our news index approaches the size of what you're talking about within an
order of magnitude (where 'news' is really an index of sentences taken from
news reports, along with metadata about the document the news came from).
Overall, we're hosting about 310 million records (give or take depending
where in the sharding cycle we're on) in a cluster of 5 AWS i2.xlarge boxes.

This setup indexes from our feeds in real time, which means there's no mass
loading. Additionally, we generally do bulk data collection across only 3
days of data, so if you're looking to do a mess of reporting against your
full set, take that into consideration.

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions | g+: plus.google.com/appinions | w: appinions.com


On Fri, May 9, 2014 at 1:39 PM, Cool Techi  wrote:

> Hi,
> We have a requirement from one of our customers to provide search and
> analytics on the upcoming Soccer World Cup. Given the sheer volume of
> tweets that would be generated at such an event, I cannot imagine what
> would be required to store this in solr.
> It would be great if there can be some pointers on the scale or hardware
> required, number of shards that should be created, etc. Some requirements:
> all the tweets should be searchable (approximately 100 million tweets/day
> * 60 days of event). All fields on tweets should be searchable, with facets
> on numeric and date fields. Facets would be run on TwitterIds (unique users),
> tweet created-on date, Location, Sentiment (some fields which we generate).
>
> If anyone has attempted anything like this it would be helpful.
> Regards,
> Rohit
>


date range queries efficiency

2014-05-16 Thread Dmitry Kan
Hi,

There was a mention, either on the solr wiki or on this list, that in order to
optimize date range queries, it is beneficial to round down the range
values.

For example, if a range query is:

DateTime:[NOW-3DAYS TO NOW]

then if the precision up to msec is not required, we can safely round that
down to a day or hour, for example:

DateTime:[NOW-3DAYS/DAY TO NOW/DAY]
DateTime:[NOW-3DAYS/HOUR TO NOW/HOUR]

What I'm wondering about is what other optimizations would make sense here
on the indexing side? Luke shows that solr stores dates as longs with
millisecond precision. So this seems to utilize the efficient Lucene
numeric range queries internally.

If we do not need msec precision on dates during search, does it make sense
to also "round" dates down during indexing? Are there any other tips and
tricks for efficient date range queries?

Thanks!

-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
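
On the indexing side, the standard schema knob is the TrieDateField
precisionStep; a sketch taken from the stock 4.x example schema (the field
name is assumed):

<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
<field name="DateTime" type="tdate" indexed="true" stored="true"/>

A non-zero precisionStep indexes extra lower-precision terms per value, which
lets the numeric range query cover a range with far fewer term comparisons,
at some index-size cost. Rounding values before indexing would additionally
shrink the number of distinct terms if millisecond precision is never needed.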


Re: Sub-Sequence token filter

2014-05-16 Thread Ahmet Arslan
Hi,

I don't have a system that searches on URLs, so I don't fully follow.
But I remember people use URLClassifyProcessorFactory.



On Friday, May 16, 2014 8:33 PM, Nitzan Shaked  wrote:
Doesn't look like it. If I understand it correctly,
PathHierarchyTokenizerFactory
will only output prefixes. I support suffixes as well, plus the
ever-so-useful "unanchored" sub-sequences. Using domains again as an
example, I can use my suggestion to query "market.ebay" and find "
www.market.ebay.com" (domains completely made up for the sake of this
example).



On Fri, May 16, 2014 at 7:53 PM, Ahmet Arslan  wrote:

> Hi Nitzan,
>
> Cant you do what you described with PathHierarchyTokenizerFactory?
>
>
> http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/path/PathHierarchyTokenizerFactory.html
>
> Ahmet
>
>
>
>
>
> On Friday, May 16, 2014 5:13 PM, Nitzan Shaked 
> wrote:
> Hi list
>
> I created a small token filter which I'd gladly "contribute", but want to
> know if there's any interest in it before I go and make it pretty, add
> documentation, etc... ;)
>
> I originally created it to index domain names: I wanted to be able to
> search for "google.com" and find "www.google.com" or "ads.google.com", "
> mail.google.com", etc.
>
> What it does is split a token (in my case -- according to "."), and then
> outputs all sub-sequences. So "a,b,c,d" will output "a", "b", "c", "d",
> "a.b", "b.c", "c.d", "a.b.c", "b.c.d", and "a.b.c.d". I use it only in the
> "index" analyzer, and so am able to specify any of the generated tokens to
> find the original token.
>
> It has the following arguments:
>
> sepRegexp: regular expression that the original token will be split
> according to. (I use "[.]" for domains)
> glue: string that will be used to join sub-sequences back together (I use
> "." for domains)
> minLen: minimum generated sub-sequence length
> maxLen: maximum generated sub-sequence length (0 for unlimited, negative
> numbers for token length minus specified amount)
> anchor: "start" to only output prefixes, "end" to only output suffix, or
> "none" to output any sub-sequence
>
> So... is this useful to anyone?
>
>



Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-16 Thread aiguofer
Jack Krupansky-2 wrote
> Typically the white space tokenizer is the best choice when the word 
> delimiter filter will be used.
> 
> -- Jack Krupansky

If we wanted to keep the StandardTokenizer (because we make use of the token
types) but wanted to use the WDFF to get combinations of words that are
split with certain characters (mainly - and /, but possibly others as well),
what is the suggested way of accomplishing this? Would we just have to
extend the JFlex file for the tokenizer and re-compile it?





Re: Difference between search strings

2014-05-16 Thread Shawn Heisey
On 5/14/2014 7:15 AM, nativecoder wrote:
> Can someone please tell me the difference between searching a text in the
> following ways
> 
> 1. q=Exact_Word:"samplestring" -> What does it tell to solr  ?
> 
> 2. q=samplestring&qf=Exact_Word -> What does it tell to solr  ?
> 
> 3. q="samplestring"&qf=Exact_Word -> What does it tell to solr  ?
>  
> I think the first and the third one are the same.  is it correct ? How does
> it differ from the second one.
> 
> I am trying to understand how enclosing the full term in "" is resolving the
> solr specific special character problem? What does it tell to solr  ? e.g If
> there is "!" mark in the string solr will identify it as a NOT, "!" is part
> of the string. This issue can be corrected if the full string is enclosed in
> a "". 

Quotes surrounding a Solr query turn it into a phrase query.  For fields
where the entire text is a single token, this becomes an exact match.
For tokenized fields, it means that term positions in the index and the
query will be compared -- so the query terms will need to be next to
each other and in that specific order in the indexed data.

Your first and third examples should parse the same, although the third
one only works with the dismax and edismax parsers.  The first one would
work correctly with the standard parser and the edismax parser, but not
the dismax parser.

Quotes will *also* eliminate the need to escape characters that would
normally require backslash escaping.  For single-token fields where
you're doing exact match, quotes will also preserve spaces in the query.
 If you need an actual quote character to be in your query, it needs to
be escaped.

As for the problem you are having with the exclamation point -- the Solr
analysis page indicates that KeywordTokenizer does *not* split on
exclamation points.  The only thing I am aware of that uses exclamation
points for splitting is explicit document routing in SolrCloud.  If the
field you are using is the uniqueKey for your index and you are running
SolrCloud, then text before an exclamation point is used for document
routing.  Note:  You should not use a solr.TextField type for your
uniqueKey field, that should be solr.StrField.  If you use
solr.StrField, then you cannot have an analysis chain with a tokenizer,
so any possible confusion about what KeywordTokenizer does would disappear.

Thanks,
Shawn
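
As a concrete sketch (the field name is from the thread; the value is
invented), these two forms are equivalent against a single-token StrField:

q=Exact_Word:"some!string here"     (quotes: no escaping needed, space preserved)
q=Exact_Word:some\!string\ here     (backslash-escaping each special character)

whereas the bare, unescaped form would be parsed into operators and separate
clauses.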



Sorting problem in Solr due to Lucene Field Cache

2014-05-16 Thread Jeongseok Son
Hello, I'm struggling with large data indexed and searched by Solr.

The schema of the documents consists of a date (YYYY-MM-DD), text (tokenized and
indexed with the Natural Language Toolkit), and several numerical fields.

Each document is small-sized, but the number of docs is very large,
around 10 million per date. The server has 32GB of memory and
I allocated around 30GB for the Solr JVM.

My Solr server has to return documents sorted by one of the numerical
fields when requested with a specific date and text (ex.
q=date:YYYY-MM-DD+text:KEYWORD). The problem is that sorting in Lucene
requires lots of FieldCache and Solr can't manage the FieldCache well. The
FieldCache gets larger as more queries are executed and is not
evicted. When the whole memory is filled with FieldCache, the Solr server
stops or generates an Out of Memory exception.

Solr cannot control Lucene field cache at all so I have a difficult time to
solve this problem. I'm considering these three ways to solve this.

1) Add more memory.
This can relieve the problem but I don't think it can completely solve it.
Anyway the memory would fill up with field cache as the server handles
search requests.
2) Separate numerical data from text data
I find Solr/Lucene isn't suitable for sorting large numerical data.
Therefore I'm thinking of storing numerical data in another DB(HBase,
MongoDB ...), then Solr server will just do some text search.
3) Switching to Elasticsearch
According to this page(
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html)
Elasticsearch can control field cache. I think ES could solve my
problem.

I'm likely to try the 2nd or 3rd way. Are these appropriate solutions? If you
have any better ideas please let me know. I've gone through too much
trouble, so it's time to make a decision. I want my choices reviewed by
many other excellent Solr users and developers, and also want to find better
solutions.
I really appreciate any help you can provide.


location of the files created by zookeeper?

2014-05-16 Thread Aman Tandon
Hi,

Can anybody tell me where the embedded ZooKeeper keeps your config
files? When we specify the configName while starting SolrCloud, it
gives that name to the directory, as guessed from the Solr logs.






4409 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/footer.vm
4456 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/pagination_bottom.vm
4479 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/head.vm
4530 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/pagination_top.vm
4555 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/VM_global_library.vm
4599 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/suggest.vm


With Regards
Aman Tandon


Re: AnalyzingInfixLookupFactory with multiple cores

2014-05-16 Thread Dmitry Kan
Hi Mike,

The core name can be accessed via: ${solr.core.name} in solrconfig.xml
(verified in a solr replication config).
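
As a rough sketch (untested; field and component names are made up), combined
with the indexPath parameter Mike mentioned:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">infix</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">suggester_infix_${solr.core.name}</str>
    <str name="field">title</str>
  </lst>
</searchComponent>

That way each core writes its suggester index to its own directory, and the
lock collisions should go away.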

HTH,
Dmitry


On Fri, May 9, 2014 at 4:07 PM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:

> It seems as if the location of the suggester dictionary directory is not
> core-specific, so when the suggester is defined for multiple cores, they
> collide: you get exceptions attempting to obtain the lock, and the
> suggestions bleed from one core to the other.   There is an (undocumented)
> "indexPath" parameter that can be used to control this, so I think I can
> work around the problem using that, but it would be a nice feature if the
> suggester index directory were relative to the core directory rather than
> the current working directory of the process.
>
> Question: is the current core directory (or even its name) available as a
> variable that gets substituted in solrconfig.xml?  I.e. ${core-name} or
> something?
>
> -Mike
>



-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan


Re: New equivalent to QueryParsing.parseQuery()?

2014-05-16 Thread Chris Hostetter

: me incorporate these config files as before. I'm (naively?) trying the
: following:
: 
: final StandardQueryParser parser = new StandardQueryParser();
: final Query luceneQuery = parser.parse(query, "text");
: luceneIndex.getIndexSearcher().search(luceneQuery, collector);
: 
: However, the behavior of the StandardQueryParser seems to be different
: enough to make some previously good queries fail, and I've not found a new

I don't think there's anything particular about StandardQueryParser that 
has changed that would make your queries fail -- however what you have 
there doesn't refer to your schema at all, so that would obviously result 
in differences.

The main reason QueryParsing.parseQuery went away is that it wasn't able to 
track any context of the request, so query-time options for things like 
the default field were a pain to deal with -- not to mention doing parser 
overrides.

The closest corollary to what you were doing before is to construct a 
LocalSolrQueryRequest and then pass that to QParser.getParser().
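
Something like this rough sketch (assuming you can get hold of a SolrCore;
getParser/getQuery throw SyntaxError, which you'd need to handle):

import java.util.Collections;
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.MapSolrParams;
import org.apache.solr.request.LocalSolrQueryRequest;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;

SolrQueryRequest req = new LocalSolrQueryRequest(core,
    new MapSolrParams(Collections.singletonMap("df", "text")));
try {
  // "lucene" is the default parser name; the request gives the parser
  // access to the schema and request-time options
  QParser parser = QParser.getParser(query, "lucene", req);
  Query luceneQuery = parser.getQuery();
  // hand luceneQuery to your IndexSearcher as before
} finally {
  req.close();
}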

But if you are really just completely bypassing Solr, and constructing 
IndexSchema and SolrIndexConfig objects yourself -- you probably don't 
have a SolrCore object, which means that approach is problematic.

Alternatively, you could try passing schema.getQueryAnalyzer() to your 
StandardQueryParser constructor -- that will give you all of the 
appropriate analyzers, but it won't help with some of the other FieldType-
specific features (like knowing when to build NumericRangeQueries for trie 
fields, when docValues are used, etc...)


Somewhere in between those two suggestions would be implementing 
SolrQueryRequest yourself with a new mock object that gives access to 
the schema but just throws UnsupportedOperationException for anything 
related to the SolrCore -- and then use that to directly construct a 
"LuceneQParser" instance and an org.apache.solr.parser.QueryParser 
instance.  (Once you are that deep into the parsing logic, the only parts 
of the SolrQueryRequest that should be consulted are the schema.)


-Hoss
http://www.lucidworks.com/


Indexing Getting Failed

2014-05-16 Thread Vineet Mishra
Hi

I have set up a default cloud cluster (4.6.0) with the built-in ZooKeeper,
running on Jetty. Indexing goes fine for the first few thousand documents,
but after some 5000 documents or so it starts giving the error below, and
indexing stops while the ZooKeeper leader election is in transition. Is
this a problem caused by the built-in ZooKeeper?

*Error Trace:*

ERROR org.apache.solr.core.SolrCore  –
org.apache.solr.common.SolrException: No registered leader was found,
collection:collection1 slice:shard2
at
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:484)
at
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:467)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:223)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:428)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at
org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:89)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:151)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:223)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:114)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
at
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:953)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:679)

Any Suggestion would be appreciated.

Thanks!


Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-16 Thread Ahmet Arslan
Hi Aiguofer,

You mean ClassicTokenizer? Because StandardTokenizer does not set token types 
(e-mail, url, etc).


I wouldn't go with the JFlex edit, mainly because of maintenance costs. It will be 
a burden to maintain a custom tokenizer.

MappingCharFilters could be used to manipulate tokenizer behavior.

Just as an example: if you don't want your tokenizer to break on hyphens, replace 
the hyphen with something that your tokenizer does not break on -- for example, an underscore.

"-" => "_"



Plus WDF can be customized too. Please see types attribute :

http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/wdftypes.txt

 
Ahmet


On Friday, May 16, 2014 6:24 PM, aiguofer  wrote:
Jack Krupansky-2 wrote

> Typically the white space tokenizer is the best choice when the word 
> delimiter filter will be used.
> 
> -- Jack Krupansky

If we wanted to keep the StandardTokenizer (because we make use of the token
types) but wanted to use the WDFF to get combinations of words that are
split with certain characters (mainly - and /, but possibly others as well),
what is the suggested way of accomplishing this? Would we just have to
extend the JFlex file for the tokenizer and re-compile it?






Solr Committer

2014-05-16 Thread Mukundaraman valakumaresan
Hi

How does one become a Solr committer? Any suggestions?

Regards
Mukund


Re: Storing tweets For WC2014

2014-05-16 Thread Alexandre Rafalovitch
That's a lot of tweets. There is an article talking about smaller
scale lessons, might be still useful:
http://ricston.com/blog/guerrilla-search-solr-run-3-million-documents-search-15month-machine/

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Sat, May 10, 2014 at 12:39 AM, Cool Techi  wrote:
> Hi,
> We have a requirement from one of our customers to provide search and
> analytics on the upcoming Soccer World Cup. Given the sheer volume of tweets
> that would be generated at such an event, I cannot imagine what would be
> required to store this in Solr.
> It would be great if there could be some pointers on the scale or hardware
> required, number of shards that should be created, etc. Some requirements:
> all the tweets should be searchable (approximately 100 million tweets/day *
> 60 days of event); all fields on tweets should be searchable, with facets on
> numeric and date fields. Facets would be run on Twitter IDs (unique users),
> tweet created-on date, location, and sentiment (some fields which we generate).
>
> If anyone has attempted anything like this, it would be helpful.
> Regards,
> Rohit
>


Re: getting direct link to solr result.

2014-05-16 Thread blach
Yes, thank you.

I got it solved by adding a literal when indexing.

Now I'm trying to implement it in my Android application. I used SolrJ,
but I found out that SolrJ is just for Java applications and it's not working
with Android.

Can you suggest a way to index a folder from my Android application?

thanks.





Re: AnalyzingInfixLookupFactory with multiple cores

2014-05-16 Thread Michael Sokolov

Thanks Dmitry!

On 05/15/2014 07:54 AM, Dmitry Kan wrote:

Hi Mike,

The core name can be accessed via: ${solr.core.name} in solrconfig.xml
(verified in a solr replication config).

HTH,
Dmitry


On Fri, May 9, 2014 at 4:07 PM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:


It seems as if the location of the suggester dictionary directory is not
core-specific, so when the suggester is defined for multiple cores, they
collide: you get exceptions attempting to obtain the lock, and the
suggestions bleed from one core to the other.   There is an (undocumented)
"indexPath" parameter that can be used to control this, so I think I can
work around the problem using that, but it would be a nice feature if the
suggester index directory were relative to the core directory rather than
the current working directory of the process.

Question: is the current core directory (or even its name) available as a
variable that gets substituted in solrconfig.xml?  I.e. ${core-name} or
something?

-Mike








Sub-Sequence token filter

2014-05-16 Thread Nitzan Shaked
Hi list

I created a small token filter which I'd gladly "contribute", but want to
know if there's any interest in it before I go and make it pretty, add
documentation, etc... ;)

I originally created it to index domain names: I wanted to be able to
search for "google.com" and find "www.google.com" or "ads.google.com", "
mail.google.com", etc.

What it does is split a token (in my case -- according to "."), and then
outputs all sub-sequences. So "a,b,c,d" will output "a", "b", "c", "d",
"a.b", "b.c", "c.d", "a.b.c", "b.c.d", and "a.b.c.d". I use it only in the
"index" analyzer, and so am able to specify any of the generated tokens to
find the original token.
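
The generation itself is simple -- roughly this (anchor="none", ignoring the
length limits):

import java.util.Arrays;
import java.util.List;

List<String> parts = Arrays.asList("www", "google", "com");
for (int start = 0; start < parts.size(); start++) {
  StringBuilder sb = new StringBuilder();
  for (int end = start; end < parts.size(); end++) {
    if (end > start) sb.append('.');
    sb.append(parts.get(end));
    // emits: www, www.google, www.google.com, google, google.com, com
    System.out.println(sb);
  }
}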

It has the following arguments:

sepRegexp: regular expression that the original token will be split
according to. (I use "[.]" for domains)
glue: string that will be used to join sub-sequences back together (I use
"." for domains)
minLen: minimum generated sub-sequence length
maxLen: maximum generated sub-sequence length (0 for unlimited, negative
numbers for token length minus specified amount)
anchor: "start" to only output prefixes, "end" to only output suffix, or
"none" to output any sub-sequence

So... is this useful to anyone?


Multiple highlight snippet for single field

2014-05-16 Thread Bijan Pourriahi
Hello all,

I am trying to return multiple snippets from a single document with a field 
which includes many (5+) instances of the word ‘andy’ in the text. For some 
reason, I can only get it to return one snippet. Any ideas?

Here’s the query and the response:
http://codejaw.com/2gwoozr

Thanks!

- Bijan



DataImportHandler atomic updates

2014-05-16 Thread Peter Pišljar
Hello,

I am trying to import data from my DB into Solr.
In the DB I have two tables:
- orders [order_id, user_id, created_at, store]
- order_items [order_id, item_id] (one-to-many relation)

I would like to import this into my Solr collection:
- orders [user_id (unique), item_id (multivalued)]

I select all the orders from my table and process them.
I select all the items for each order and process them.

Now the problem is that, without atomic updates, each order for user id
'10' will overwrite its previous orders (instead of just adding the item_ids).

So I tried to use a ScriptTransformer to set the "add" parameter for an atomic
update of the item_id field.

It's still not working correctly:
1) I don't get all the items, though there are more than just those from the last
order ... let's say the last 5 orders.
2) The first few items are saved as add:ID, the others are OK.
(Let's say that I have 20 items (1...20) for my user_id '10'; something
like this would get in the index: [ add:14, add:15, 16, 17, 18, 19, 20 ]. Note
that items 1...13 are missing, and items 14 and 15 have "add:" in front
of them.)

here is my full script: http://pastebin.com/6EnW8Heg

Please note that I simplified the description above a little bit, but everything
still applies.

I would really appreciate any kind of help.
---


Re: slow performance on simple filter

2014-05-16 Thread Erick Erickson
The first time you use any fq clause, it's evaluated pretty much as
though you'd just ANDed it into the main clause. It's only if you use
the fq clause again that the query can take advantage of the caching.

But one query does not a pattern make. Is this right after you've
started the server? Or committed? Do you have warming queries defined
to fill up low level caches?
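
If not, something along these lines in solrconfig.xml (the queries are just
examples) would exercise the fq and warm the filterCache on each new searcher:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="fq">class_name:CdnFile</str>
    </lst>
  </arr>
</listener>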

In short, you've given us almost nothing to go on. Perhaps you'd like to review:
http://wiki.apache.org/solr/UsingMailingLists

Best,
Erick

On Wed, May 14, 2014 at 1:41 PM, mizayah  wrote:
> Hey,
>
> I got a pretty big index with about 7 mln documents.
>
>
> I get a pretty slow query when I ask about a common word. Nothing changes
> when I ask by q or fq.
>
>  params={fl=score,class_name&indent=true&q=*:*&wt=xml&fq=class_name:CdnFile}
> hits=5594978 status=0 QTime=408
>
> It's not about my hardware though. I was sure Solr is blazing fast for such
> simple queries.
> Why is my query so slow? Is it because of the too-frequent word "CdnFile"?
>
>
> PS.
> I indexed only this field into Elasticsearch and I'm getting the same
> responses in about 0.08s
>
>
>
>


Re: search multiple cores

2014-05-16 Thread Erick Erickson
Really, really, _really_ consider denormalizing the data. You're
trying to use Solr
as an RDBMS. Solr is a _great_ search engine, but it's not a DB, and trying
to make it behave as one is almost always a mistake.

Using joins should really be something you try _last_.

Best,
Erick

On Tue, May 13, 2014 at 8:27 PM, Jay Potharaju  wrote:
> Hi,
> I am trying to join across multiple cores using query time join. Following
> is my setup
> 3 cores - Solr 4.7
> core1:  0.5 million documents
> core2: 4 million documents and growing. This contains the child documents
> for documents in core1.
> core3: 2 million documents and growing. Contains records from all users.
>
>  core2 contains documents that are accessible to each user based on their
> permissions. The number of documents accessible to a user range from couple
> of 1000s to 100,000.
>
> I would like to get results by combining all three cores. For each search I
> get documents from core3 and then query core1 to get parent documents &
> then core2 to get the appropriate child documents depending of user
> permissions.
>
> I'm referring to this link to join across cores:
> http://stackoverflow.com/questions/12665797/is-solr-4-0-capable-of-using-join-for-multiple-core
>
> {!join from=fromField to=toField fromIndex=fromCoreName}fromQuery
>
> This is not working for me. Can anyone suggest why it is not working. Any
> pointers on how to search across multiple cores.
>
> thanks
>
>
>
> J


Re: core.properties setup help

2014-05-16 Thread Aman Tandon
Any help here??

With Regards
Aman Tandon


On Thu, May 15, 2014 at 7:33 PM, Aman Tandon wrote:

> Hi,
>
> In my solr-4.2 we were using the two cores as described below:
>
> <solr ...>
>   <cores ... hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
>     <core name="mcat" .../>
>     <core name="cat" .../>
>   </cores>
> </solr>
>
>
> How should I set up Solr 4.7 with core.properties for the mcat and cat
> cores to use SolrCloud?
>
> With Regards
> Aman Tandon
>


Re: Sorting problem in Solr due to Lucene Field Cache

2014-05-16 Thread Joel Bernstein
Take a look at Solr's use of DocValues:
https://cwiki.apache.org/confluence/display/solr/DocValues.

There are docValues options that use less memory than the FieldCache.
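
For a sort field that would be something like this in schema.xml (field name
made up; a reindex is required after the change):

<field name="price" type="tlong" indexed="true" stored="true" docValues="true"/>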

Joel Bernstein
Search Engineer at Heliosearch


On Thu, May 15, 2014 at 6:39 AM, Jeongseok Son  wrote:

> Hello, I'm struggling with large data indexed and searched by Solr.
>
> The schema of the documents consists of date (YYYY-MM-DD), text (tokenized and
> indexed with the Natural Language Toolkit), and several numerical fields.
>
> Each document is small-sized, but the number of docs is very large,
> around 10 million per date. The server has 32GB of memory and
> I allocated around 30GB for the Solr JVM.
>
> My Solr server has to return documents sorted by one of the numerical
> fields when requested with a specific date and text (ex.
> q=date:YYYY-MM-DD+text:KEYWORD). The problem is that sorting in Lucene
> requires lots of Field Cache and Solr can't handle Field Cache well. The
> Field Cache is getting larger as more queries are executed and is not
> evicted. When the whole memory is filled with Field Cache, Solr server
> stops or generates Out of Memory exception.
>
> Solr cannot control Lucene field cache at all so I have a difficult time to
> solve this problem. I'm considering these three ways to solve this.
>
> 1) Add more memory.
> This can relieve the problem but I don't think it can completely solve it.
> Anyway the memory would fill up with field cache as the server handles
> search requests.
> 2) Separate numerical data from text data
> I find Solr/Lucene isn't suitable for sorting large numerical data.
> Therefore I'm thinking of storing numerical data in another DB(HBase,
> MongoDB ...), then Solr server will just do some text search.
> 3) Switching to Elasticsearch
> According to this page(
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html
> )
> Elasticsearch can control field cache. I think ES could solve my
> problem.
>
> I'm likely to try the 2nd or 3rd way. Are these appropriate solutions? If you
> have any better ideas please let me know. I've gone through too much
> trouble, so it's time to make a decision. I want my choices reviewed by
> many other excellent Solr users and developers and also want to find better
> solutions.
> I really appreciate any help you can provide.
>


Re: date range queries efficiency

2014-05-16 Thread Jack Krupansky
My e-book has an example of an update processor that rounds to any specified 
resolution (e.g., day, year, hour, etc.).


The performance reason applies to filter queries, to keep their uniqueness 
down -- not to random user queries, which should be fine unrounded, except that 
they can't be used for exact matches such as a year without expanding 
the date to a range covering the full interval.


-- Jack Krupansky

-Original Message- 
From: Dmitry Kan

Sent: Friday, May 9, 2014 6:41 AM
To: solr-user@lucene.apache.org
Subject: date range queries efficiency

Hi,

There was a mention either on solr wiki or on this list, that in order to
optimize the date range queries, it is beneficial to round down the range
values.

For example, if a range query is:

DateTime:[NOW-3DAYS TO NOW]

then if the precision up to msec is not required, we can safely round that
down to a day or hour, for example:

DateTime:[NOW-3DAYS/DAY TO NOW/DAY]
DateTime:[NOW-3DAYS/HOUR TO NOW/HOUR]

What I'm wondering about is what other optimizations would make sense here
on the indexing side? Luke shows that solr stores dates as longs with
millisecond precision. So this seems to utilize the efficient Lucene
numeric range queries internally.

If we do not need msec precision on dates during search, does it make sense
to also "round" dates down during indexing? Are there any other tips and
tricks for efficient date range queries?

Thanks!

--
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan 



Re: commit persistence guarantee

2014-05-16 Thread Erick Erickson
This almost always means that you're committing too often, either soft
commits or hard commits with openSearcher=true. It shouldn't have any
effect on the consistency of your index, though.

It _is_ making your Solr work harder than you want it to, so consider
increasing the commit intervals substantially. If you're indexing from
SolrJ, it's _not_ a good idea to commit except, perhaps, at the very
end of the run. Let your solrconfig settings commit for you. Super
especially if you're indexing from multiple SolrJ programs.

Best,
Erick

On Wed, May 7, 2014 at 3:02 AM, Alvaro Cabrerizo  wrote:
> Hi,
>
> Is there any guarantee that every document is persisted on disk during a
> "commit avalanche" that produces the: "ERROR org.apache.solr.core.SolrCore
>  – org.apache.solr.common.SolrException: Error opening new searcher. *exceeded
> limit of maxWarmingSearchers*=1, try again later".
>
> I've made some tests using JMeter to generate the situation and I *always*
> get all the documents *well stored*, although ~4% of requests get a 503
> response, complaining with the previous message in the log.
>
> Regards.
>
> notes:  I know about NearRealTime and the possibility of modifying the
> commit strategy in order to be more polite with Solr ;)


Re: slow performance on simple filter

2014-05-16 Thread Aman Tandon
Could you please share the solrconfig and schema here for more debugging of
the issue? You could also try adding the extra parameter
(&debugQuery=true) to your request params; then you can view the
parsed_query, the actual query as parsed by Solr.



With Regards
Aman Tandon


On Thu, May 15, 2014 at 2:11 AM, mizayah  wrote:

> Hey,
>
> I got a pretty big index with about 7 mln documents.
>
>
> I get a pretty slow query when I ask about a common word. Nothing changes
> when I ask by q or fq.
>
>
>  params={fl=score,class_name&indent=true&q=*:*&wt=xml&fq=class_name:CdnFile}
> hits=5594978 status=0 QTime=408
>
> It's not about my hardware though. I was sure Solr is blazing fast for such
> simple queries.
> Why is my query so slow? Is it because of the too-frequent word "CdnFile"?
>
>
> PS.
> I indexed only this field into Elasticsearch and I'm getting the same
> responses in about 0.08s
>
>
>
>


Re: Replica as a "leader"

2014-05-16 Thread Erick Erickson
1. Indexing 100-200 docs per second.
2. Doing Pkill -9 java to 2 replicas (not the leader) in shard 3 (while
indexing).
3. Indexing for 10-20 minutes and doing hard commit.
4. Doing Pkill -9 java to the leader and then starting one replica in shard
3 (while indexing).

I think you're in uncharted territory. By only having the leader
running, indexing docs to it, then killing it, there's no way for one
of the restarted followers to know what docs were indexed. Eventually
the follower will become the leader and the docs are just lost.
Updates are NOT stored on ZK for instance.

Why do you expect the machines to "stay in down status"? SolrCloud is
doing the best it can. How do you expect this scenario to recover?

FWIW,
Erick

On Thu, May 8, 2014 at 8:00 AM, adfel70  wrote:
> Solr & Collection Info:
> solr 4.8 , 4 shards, 3 replicas per shard, 30-40 milion docs per shard.
>
> Process:
> 1. Indexing 100-200 docs per second.
> 2. Doing Pkill -9 java to 2 replicas (not the leader) in shard 3 (while
> indexing).
> 3. Indexing for 10-20 minutes and doing hard commit.
> 4. Doing Pkill -9 java to the leader and then starting one replica in shard
> 3 (while indexing).
> 5. After 20 minutes starting another replica in shard 3 ,while indexing (not
> the leader in step 1).
>
> Results:
> 2. Only the leader is active in shard 3.
> 3. Thousands of docs were added to the leader in shard 3.
> 4. After starting the replica, its state was down, and after 10 minutes it
> became the leader in cluster state (and still down). No servers hosting
> shards for index and search requests.
> 5. After starting another replica, its state was recovering for 2-3 minutes
> and then it became active (not leader in cluster state).
> 6. Index, commit and search requests are handled by the other replicas
> (*active status, not leader!!!*).
>
>
> Expected:
> 5. To stay in down status.
> *6. Not to handle index, commit and search requests - no servers hosting
> shards!*
>
> Thanks!
>
>
>
>


Re: slow performance on simple filter

2014-05-16 Thread Jack Krupansky
Add the debugQuery=true parameter and look at the "timing" section to see 
which search component is consuming the time. Are you using faceting or 
highlighting?


7 million documents is actually a fairly small index.

-- Jack Krupansky

-Original Message- 
From: mizayah

Sent: Wednesday, May 14, 2014 4:41 PM
To: solr-user@lucene.apache.org
Subject: slow performance on simple filter

Hey,

I got a pretty big index with about 7 mln documents.


I get a pretty slow query when I ask about a common word. Nothing changes
when I ask by q or fq.

params={fl=score,class_name&indent=true&q=*:*&wt=xml&fq=class_name:CdnFile}
hits=5594978 status=0 QTime=408

It's not about my hardware though. I was sure Solr is blazing fast for such
simple queries.
Why is my query so slow? Is it because of the too-frequent word "CdnFile"?


PS.
I indexed only this field into Elasticsearch and I'm getting the same
responses in about 0.08s







Re: Solrcore.properties variable question.

2014-05-16 Thread Shawn Heisey
> Hi,
>
> We have a couple of Solr servers acting as master and slave, and each
> server has the same number of cores. We are trying to configure
> solrcore.properties so that a script is able to add cores without
> changing solrcore.properties, using a hack like this:
>
> enable.master=false
> enable.slave=true
> master_url=http://master_solr:8983/solr/${solr.core.name}
>
> Our idea is to have solr.core.name be the dynamic variable, but once
> we go to the admin UI, the master URL is not showing the last part. Is there a
> format error or something trivial I'm missing?

For the slaves, put the master url right in the solrconfig.xml file, with
the solr.core.name variable.

I know this works. I've used this exact configuration.
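
Roughly like this (the pollInterval value is just an example):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://master_solr:8983/solr/${solr.core.name}</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>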

Thanks,
Shawn






DeleteByQuery via alias

2014-05-16 Thread ku3ia
Hi all!
I'm using Solr 4.6.0. I created three collections and combined them into an
alias via the CREATEALIAS API.
I run a delete request via curl, e.g.
curl
"http://127.0.0.1:8080/solr/all/update?stream.body=%3Cdelete%3E%3Cquery%3EText:dummy%3C/query%3E%3C/delete%3E"
where "all" is an alias of three collections:
{"collection":{"all":"collection1,collection2,collection3"}}

But data was removed only from the first collection in the list.

Is it possible to remove data from all collections using the alias?
Thanks.





RE: spellcheck if docsfound below threshold

2014-05-16 Thread Dyer, James
Its "spellcheck.maxResultsForSuggest".

http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.maxResultsForSuggest
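
For example (the query itself is hypothetical):

q=some query&spellcheck=true&spellcheck.maxResultsForSuggest=5

With spellcheck.maxResultsForSuggest=5, suggestions are only returned when the
query gets 5 or fewer hits.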

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Jan Verweij - Reeleez [mailto:j...@reeleez.nl] 
Sent: Monday, May 12, 2014 2:12 AM
To: solr-user@lucene.apache.org
Subject: spellcheck if docsfound below threshold

Hi,

Is there a setting to only include spellcheck if the number of documents
found is below a certain threshold?

Or would we need to rerun the request with the spellcheck parameters based
on the docs found?

Kind regards,

Jan Verweij


Cloudera Manager install

2014-05-16 Thread Michael Della Bitta
Hi everyone,

I'm investigating migrating over to an HDFS-based Solr Cloud install.

We use Cloudera Manager here to maintain a few other clusters, so
maintaining our Solr cluster with it as well is attractive. However, just
from reading the documentation, it's not totally clear to me what
version(s) of Solr I can install and manage with Cloudera Manager. I saw in
one place in the documentation an indication that Cloudera Search uses 4.4,
but then elsewhere I see the opportunity to use custom versions, and
finally, one indication that Cloudera Manager uses the "latest version."

I'm wondering if anybody has experience with installing a fairly new
version of Solr, say 4.7 or 4.8, through Cloudera Manager.


Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


Re: Question regarding the lastest version of HeliosSearch

2014-05-16 Thread Yonik Seeley
On Thu, May 15, 2014 at 3:44 PM, Jean-Sebastien Vachon
 wrote:
> I spent some time today playing around with subfacets and facet functions 
> now available in Heliosearch 0.05 and I have some concerns... They look 
> very promising.

Thanks, glad for the feedback!

[...]
> the response looks good except for one little thing... the mincount is not 
> respected whenever I specify the facet.stat parameter. Removing it will cause 
> the mincount to be respected but then I need this parameter.

Right, the mincount parameter is not yet implemented.   Hopefully soon!

> {
>
>   "val":1133,
>
>   "unique(job_id)":0, <== what is this?
>
>   "count":0},
>  Many zero entries following...
>
> I was wondering where the extra entries were coming from... the position_id = 
> 1133 above is not even a match for my query (its title is "Audit Consultant")
> I've also noticed a similar behaviour when using subfacets. It looks like the 
> number of items returned always matches the "facet.limit" parameter.
> If not enough values are present for a given entry then the bucket is filled 
> with documents not matching the original query.

Right... straight Solr faceting will do this too (unless you have a
mincount>0).  We're just looking at terms in the field and we don't
have enough context to know if some 0's make more sense than others to
return.

-Yonik
http://heliosearch.org - facet functions, subfacets, off-heap filters&fieldcache


Re: location of the files created by zookeeper?

2014-05-16 Thread Steve McKay
The config is stored in ZooKeeper. 
/configs/myconf/velocity/pagination_bottom.vm is a ZooKeeper path, not a 
filesystem path. The data on disk is in ZK's binary format. Solr uses the ZK 
client library to talk to the embedded server and read config data.
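
If you want to inspect it, you can use ZooKeeper's own CLI against the embedded
server (which listens on the Solr port + 1000, so 9983 for the default 8983):

./zkCli.sh -server localhost:9983
ls /configs/myconf/velocity
get /configs/myconf/velocity/footer.vm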

On May 16, 2014, at 2:47 AM, Aman Tandon  wrote:

> Any help here??
> 
> With Regards
> Aman Tandon
> 
> 
> On Thu, May 15, 2014 at 10:17 PM, Aman Tandon wrote:
> 
>> Hi,
>> 
>> Can anybody tell me where the embedded ZooKeeper keeps your config
>> files? When we specify the configName while starting SolrCloud, it
>> gives that name to the directory, as guessed from the Solr logs.
>> 
>> 
>> 
>> 
>> 
>> 
>> 4409 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/footer.vm
>> 4456 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/pagination_bottom.vm
>> 4479 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/head.vm
>> 4530 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/pagination_top.vm
>> 4555 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/VM_global_library.vm
>> 4599 [main] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /configs/myconf/velocity/suggest.vm
>> 
>> 
>> With Regards
>> Aman Tandon
>> 



core.properties setup help

2014-05-16 Thread Aman Tandon
Hi,

In my solr-4.2 we were using the two cores as described below:

<solr ...>
  <cores ... hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
    <core name="mcat" .../>
    <core name="cat" .../>
  </cores>
</solr>


How should I set up Solr 4.7 with core.properties for the mcat and cat
cores to use SolrCloud?

With Regards
Aman Tandon


Re: getting direct link to solr result.

2014-05-16 Thread Alexandre Rafalovitch
This seems to be a relevant discussion:
http://stackoverflow.com/questions/9932722/android-app-solr .
Including some code links.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Sat, May 17, 2014 at 1:42 AM, blach  wrote:
> Yes, thank you.
>
> I got it solved by adding a literal when indexing.
>
> Now I'm trying to implement it in my Android application. I used SolrJ,
> but I found out that SolrJ is just for Java applications and it's not working
> with Android.
>
> Can you suggest a way to index a folder from my Android application?
>
> thanks.
>
>
>


Re: Solr + SPDY

2014-05-16 Thread harspras
Hi Markus,

SPDY does provide lower latency in the case where I have multiple requests to 
the same server/domain. It compresses the headers and reduces the number of 
connections. But since it uses TLS, I am not sure if it will be faster than HTTP/1.1. 
That is why I wanted to test SPDY with Solr for inter-shard communication. 
Currently, using HTTP for communication with Solr is slow.

Harsh

> On 17-May-2014, at 9:44 am, "Markus Jelsma-2 [via Lucene]" 
>  wrote:
> 
> Hi Harsh, 
> 
>   
> Does SPDY provide lower latency than HTTP/1.1 with KeepAlive or is it 
> encryption that you're after? 
> 
>   
> Markus 
> 
> 
>   
> -Original message- 
> From:harspras <[hidden email]> 
> Sent:Tue 13-05-2014 05:38 
> Subject:Re: Solr + SPDY 
> To:[hidden email]; 
> Hi Vinay, 
> 
> I have been trying to set up a similar environment with SPDY enabled 
> for Solr inter-shard communication. Did you manage to do 
> it? I somehow cannot use SolrCloud with SPDY enabled in Jetty. 
> 
> Regards, 
> Harsh Prasad 
> 
> 
> 





Re: Easiest way to install solr cloud with tomcat

2014-05-16 Thread Aman Tandon
@Matt, sorry, we don't want to use any product from an organisation other than
the Apache Foundation. Thanks anyway.

Is there anybody else here who can help me configure SolrCloud with a default
Tomcat installation along with Solr?

With Regards
Aman Tandon


On Wed, May 14, 2014 at 8:13 AM, Matt Kuiper (Springblox) <
matt.kui...@springblox.com> wrote:

> Check out http://heliosearch.com/download.html
>
> It is a distribution of Apache Solr packaged with Tomcat.
>
> I have found it simple to use.
>
> Matt
>
> -Original Message-
> From: Aman Tandon [mailto:amantandon...@gmail.com]
> Sent: Monday, May 12, 2014 6:24 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Easises way to insatll solr cloud with tomcat
>
> Can anybody help me out??
>
> With Regards
> Aman Tandon
>
>
> On Mon, May 12, 2014 at 1:24 PM, Aman Tandon  >wrote:
>
> > Hi,
> >
> > I tried to set up SolrCloud with Jetty, which works fine. But in our
> > production environment we use Tomcat, so I need to set up SolrCloud
> > with Tomcat. So please help me out with how to set up SolrCloud
> > with Tomcat on a single machine.
> >
> > Thanks in advance.
> >
> > With Regards
> > Aman Tandon
> >
>


autowarming queries

2014-05-16 Thread Joshi, Shital
Hi,

How many autowarming queries are supported per collection in Solr 4.4 and
higher? We see only one out of our three queries in the log when a new searcher
is created.

Thanks!





Re: solr optimize on fnm file

2014-05-16 Thread googoo
Erick,

Thanks for your update.

The problem is that this data will remain until every document in the section
is deleted.
I understand this causes optimize to double-scan the index folder in this case.
We may add some logic so that this scan is only done when the file size is
too big.

Yongtao





Re: Using embedded zookeeper to make an ensemble

2014-05-16 Thread Steve McKay
Doing this doesn't avoid the need to configure and administrate ZK. Running a 
special snowflake setup to avoid downloading a tar.gz doesn't seem like a good 
trade-off to me.

On May 15, 2014, at 3:27 PM, Upayavira  wrote:

> Hi,
> 
> I need to set up a zookeeper ensemble. I could download Zookeeper and do
> it that way. I already have everything I need to run Zookeeper within a
> Solr install.
> 
> Is it possible to run a three node zookeeper ensemble by starting up
> three Solr nodes with Zookeeper enabled? Obviously, I'd only use these
> nodes for their Zookeeper, and keep their indexes empty.
> 
> I've made some initial attempts, and whilst it looks like it might be
> possible with -DzkRun and -DzkHost=, I haven't yet succeeded.
> 
> I think this could be a much easier way for people familar with Solr to
> get an ensemble up compared to downloading the Zookeeper distribution.
> 
> Thoughts?
> 
> Upayavira



Re: date range queries efficiency

2014-05-16 Thread Alexandre Rafalovitch
I thought the date math rounding was for _caching_ the repeated
queries, not so much the speed of the query itself.

Also, if you are using TrieDateField, the precisionStep value is how
optimization is done. There is bucketing at different levels of
precision, so the range search works at the least granular level
first, etc.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Fri, May 9, 2014 at 5:41 PM, Dmitry Kan  wrote:
> Hi,
>
> There was a mention either on solr wiki or on this list, that in order to
> optimize the date range queries, it is beneficial to round down the range
> values.
>
> For example, if a range query is:
>
> DateTime:[NOW-3DAYS TO NOW]
>
> then if the precision up to msec is not required, we can safely round that
> down to a day or hour, for example:
>
> DateTime:[NOW-3DAYS/DAY TO NOW/DAY]
> DateTime:[NOW-3DAYS/HOUR TO NOW/HOUR]
>
> What I'm wondering about is what other optimizations would make sense here
> on the indexing side? Luke shows that solr stores dates as longs with
> millisecond precision. So this seems to utilize the efficient Lucene
> numeric range queries internally.
>
> If we do not need msec precision on dates during search, does it make sense
> to also "round" dates down during indexing? Are there any other tips and
> tricks for efficient date range queries?
>
> Thanks!
>
> --
> Dmitry
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan


Re: Multiple highlight snippet for single field

2014-05-16 Thread Koji Sekiguchi

Hi Bijan,

Have you tried to set hl.maxAnalyzedChars parameter to larger number?

hl.maxAnalyzedChars
http://wiki.apache.org/solr/HighlightingParameters#hl.maxAnalyzedChars

As the default value of the parameter is 51200, if the second "Andy" is
in a late paragraph of your large stored field, the highlighter doesn't
deal with the second "Andy".
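
Also note that hl.snippets defaults to 1, so to get more than one fragment per
field you need to raise it as well. A hypothetical example (field name made up):

q=andy&hl=true&hl.fl=notes&hl.snippets=10&hl.maxAnalyzedChars=1000000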

Koji
--
http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html

(2014/05/16 13:25), Bijan Pourriahi wrote:

Hello all,

I am trying to return multiple snippets from a single document with a field 
which includes many (5+) instances of the word ‘andy’ in the text. For some 
reason, I can only get it to return one snippet. Any ideas?

Here’s the query and the response:
http://codejaw.com/2gwoozr

Thanks!

- Bijan








Re: distrib=false is not honoring

2014-05-16 Thread Aman Tandon
Thanks Jack. I am using q.alt just for testing purposes; we use q=query in
our general production environment, and mcat.intent is our request handler
that adds an extra number of rows and so on.

Earlier I made some mistakes in explaining the situation properly, so I am
sorry for that.

Requirement: I want to test in my sharded environment that a unique document
is present in a single shard, not in both.

Core name: mcats
core.properties: name=mcats, so the default collection name would be mcats as
well.

I was taking my non-sharded index (the mcats index), copying it to node1
as well as node2, and starting the first node as:

java -DzkRun -DnumShards=2 -Dbootstrap_confdir=./solr/mcats/conf
-Dcollection.configName=myconf -jar start.jar

And the second node as:

java -Djetty.port=1983 -DzkHost=localhost:9983 -jar start.jar

So I guess it is taking the whole index as-is, because when I run the query

http://localhost:8983/solr/mcats/select?q.alt=*:*

it was giving me the sum of the documents in both shards, which is 2x the
number of docs in the mcats index.

So the same document is present in both shards, at node1:8983 and
node2:1983.

To figure this out I indexed another, different doc. Now when I queried with

http://localhost:8983/solr/mcats/select?q.alt=id:17406780&distrib=false

the document is present.

But with the same query against the other node,

http://localhost:1983/solr/mcats/select?q.alt=id:17406780&distrib=false

it is not found, which fulfilled my test case.

So I think I have to do a full reindex of my mcats core to validate my
test case for each id. Please correct me if I am wrong.


With Regards
Aman Tandon


On Wed, May 14, 2014 at 5:52 PM, Jack Krupansky wrote:

> The q.alt param specifies only the parameter to use if the q parameter is
> missing. Could you verify whether that is really the case? Typically
> solrconfig gives a default of "*:*" for the q parameter. Specifying a query
> via the q.alt parameter seems like a strange approach - what is your
> rationale?
>
> What is this odd "mcat.intent" query response writer type that you are
> specifying with the qt parameter?
>
> -- Jack Krupansky
>
> -Original Message- From: Aman Tandon
> Sent: Wednesday, May 14, 2014 1:35 AM
> To: solr-user@lucene.apache.org
> Subject: distrib=false is not honoring
>
>
> I am trying to use the solr cloud for solr 4.2.0 and solr 4.7.1.
> Here mcats is our collection name.
>
> *With solr 4.2*
>
> shard1: localhost:8019
> shard1: localhost:6019
>
> *With solr 4.7.1*
>
> shard1: localhost:8983
> shard1: localhost:1983
>
> With both the server i make the copy of example directory as mentioned in
> wiki, and queried over the both nodes
>
> *query 1*:
> http://localhost:8983/solr/mcats/select?q.alt=id:69763&;
> distrib=false&qt=mcat.intent
> *query 2*:
>
> http://localhost:1983/solr/mcats/select?q.alt=id:69763&;
> distrib=false&qt=mcat.intent
>
> the total number of docs becomes half, but if I search for a specific id then
> the result is the same. If distrib=false enables searching on a particular
> node, then one of these nodes should not return the result. If I
> am incorrect, please help me out to test that one record is present in one
> shard only.
>
> With Regards
> Aman Tandon
>


Re: Solr performance: multiValued filed vs separate fields

2014-05-16 Thread Shawn Heisey
On 5/15/2014 8:29 AM, danny teichthal wrote:
> I wonder about the performance difference of 2 indexing options:
> 1- a multivalued field
> 2- separate fields
> 
> The case is as follows: Each document has 100 “properties”: prop1..prop100.
> The values are strings and there is no relation between different
> properties. I would like to search by exact match on several properties by
> known values (like ids). For example: search for all docs having
> prop1=”blue” and prop6=”high”.
> 
> I can choose to build the indexes in 1 of 2 ways:
> 1- the trivial way: 100 separate fields, 1 for each property,
> multiValued=false; the values are just the property values.
> 2- 1 field (named “properties”), multiValued=true. The field will have
> 100 values: value1=”prop1:blue” .. value6=”prop6:high”, etc.
> 
> Is it correct to say that option 1 will have much better performance in
> searching? How about indexing performance?

I cannot say for absolute certain, but I do not believe there would be
any significant difference in indexing performance.  If you have
separate fields, query performance is likely to be better, because fewer
terms will need to be examined.  The index might be a little bit bigger
with multiple fields.
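
As a sketch of the two options (the schema lines and queries are illustrative):

Option 1:  <field name="prop1" type="string" indexed="true" stored="false"/>
           ... one field per property ...
           fq=prop1:blue&fq=prop6:high

Option 2:  <field name="properties" type="string" indexed="true" stored="false"
                  multiValued="true"/>
           fq=properties:"prop1:blue"&fq=properties:"prop6:high"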

Thanks,
Shawn



Re: date range queries efficiency

2014-05-16 Thread Shawn Heisey
On 5/15/2014 1:34 AM, Alexandre Rafalovitch wrote:
> I thought the date math rounding was for _caching_ the repeated
> queries, not so much the speed of the query itself.

Absolutely correct.  When NOW is used without rounding, caching is
completely ineffective.  This is because if the same query using NOW is
sent multiple times several seconds apart, every one of those queries
will be different after they are parsed and NOW is converted to an
actual timestamp.
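
For example (the field name is hypothetical):

fq=timestamp:[NOW-7DAYS TO NOW]          <- new query text every request, cache miss
fq=timestamp:[NOW/DAY-7DAYS TO NOW/DAY]  <- identical text all day, filterCache hit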

> Also, if you are using TrieDateField, precisionStep value is how
> optimization is done. There is bucketing at different level of
> precision, so the range search works at the least granular level
> first, etc.

Some nitty-gritty details of how range queries are accelerated with the
Trie data types and precisionStep are described in the Javadoc for
NumericRangeQuery:

http://lucene.apache.org/core/4_8_0/core/org/apache/lucene/search/NumericRangeQuery.html

Thanks,
Shawn



Re: retrieve all the fields in join

2014-05-16 Thread Kranti Parisa
Aman,

The option you have is to:
- write custom components like request handlers, collectors & response
writers
- first do the join, then apply the pagination
- you will get the docList in the response writer; you would need to make a
call to the second core (you could be smart and use the FQs so that you
hit the cache, and hence the second call will be fast) and fetch the
documents
- use them for building the response

Out of the box, Solr won't do this for you.

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Mon, May 12, 2014 at 7:05 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> On Sun, May 11, 2014 at 12:14 PM, Aman Tandon  >wrote:
>
> > Is it possible?
>
>
> no.
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
>  
>


Re: What is the usage of solr.NumericPayloadTokenFilterFactory

2014-05-16 Thread ienjreny
Regarding your question: "That said, are you sure you want to be using
the payload feature of Lucene?"

I don't know, because I don't know what the benefits of this filter are,
or what "payload" means here!


On Sat, May 17, 2014 at 2:45 AM, Jack Krupansky-2 [via Lucene] <
ml-node+s472066n4136467...@n3.nabble.com> wrote:

> I do have basic coverage for that filter (and all other filters) and the
> parameter values in my e-book:
>
>
> http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html
>
> That said, are you sure you want to be using the payload feature of
> Lucene?
>
> -- Jack Krupansky
>
> -Original Message-
> From: ienjreny
> Sent: Monday, May 12, 2014 12:51 PM
> To: [hidden email] 
> Subject: What is the usage of solr.NumericPayloadTokenFilterFactory
>
> Dears:
> Can anybody explain in an easy way what the benefits of
> solr.NumericPayloadTokenFilterFactory are, and what the acceptable values for
> typeMatch are?
>
> Thanks in advance
>
>
>
>
>
>





Re: Using embedded zookeeper to make an ensemble

2014-05-16 Thread Shawn Heisey
On 5/16/2014 6:43 PM, Steve McKay wrote:
> Doing this doesn't avoid the need to configure and administrate ZK. Running a 
> special snowflake setup to avoid downloading a tar.gz doesn't seem like a 
> good trade-off to me.
> 
> On May 15, 2014, at 3:27 PM, Upayavira  wrote:
> 
>> Hi,
>>
>> I need to set up a zookeeper ensemble. I could download Zookeeper and do
>> it that way. I already have everything I need to run Zookeeper within a
>> Solr install.
>>
>> Is it possible to run a three node zookeeper ensemble by starting up
>> three Solr nodes with Zookeeper enabled? Obviously, I'd only use these
>> nodes for their Zookeeper, and keep their indexes empty.
>>
>> I've made some initial attempts, and whilst it looks like it might be
>> possible with -DzkRun and -DzkHost=, I haven't yet succeeded.

Although it would be *possible* to do this, it seems like a huge amount
of overhead (jetty and Solr) just to get zookeeper running.

If you plan to actually put Solr indexes on that node, overhead would be
less of a problem, but there's another reason to have zookeeper in an
entirely separate process even if you're sharing hardware: If you're
running zookeeper as part of Solr, you can't shut down or restart Solr
without affecting the zookeeper ensemble and possibly forcing an
election.  If zookeeper is a separate process, then restarting Solr
cannot cause disruption within the ensemble.

Thanks,
Shawn



Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-16 Thread Shawn Heisey
On 5/16/2014 9:24 AM, aiguofer wrote:
> Jack Krupansky-2 wrote
>> Typically the white space tokenizer is the best choice when the word 
>> delimiter filter will be used.
>>
>> -- Jack Krupansky
> 
> If we wanted to keep the StandardTokenizer (because we make use of the token
> types) but wanted to use the WDFF to get combinations of words that are
> split with certain characters (mainly - and /, but possibly others as well),
> what is the suggested way of accomplishing this? Would we just have to
> extend the JFlex file for the tokenizer and re-compile it?

You can use the ICUTokenizer instead, and pass it a special rulefile
that makes it only break Latin characters on whitespace instead of all
the usual places.  This is exactly what I do in my index.

In the Solr source code, you can find this special rulefile at the
following path:

lucene/analysis/icu/src/test/org/apache/lucene/analysis/icu/segmentation/Latin-break-only-on-whitespace.rbbi

You would place the rule file in the same location as schema.xml, and
then use this in your fieldType:

<tokenizer class="solr.ICUTokenizerFactory"
           rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>

Note that the ICUTokenizer requires that you add contrib jars to your
Solr install -- the required jars and a README outlining which files you
need are included in the Solr download in solr/contrib/analysis-extras.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTokenizerFactory

Thanks,
Shawn