Grouping based on multiple filters/criteria
Is it possible to have multiple filters/criteria on grouping? I am trying to do something like the tickets below, and judging from their statuses, I am assuming it isn't possible?

https://issues.apache.org/jira/browse/SOLR-2553
https://issues.apache.org/jira/browse/SOLR-2526
https://issues.apache.org/jira/browse/LUCENE-3257

To make everything clear, here are the details of what I am planning to do with Solr. There is an activity feed on a site that basically works like the Facebook or LinkedIn news feed, though there is no relationship between users: it doesn't matter whether I am following someone or not; as long as their settings allow me to see their posts and they hit my search filter, I will see their posts.

The part related to grouping is tricky. Let's assume that you are able to see my posts, and I have posted 8 activities in the last hour. Those activities should appear differently from other posts, as a combined view of the posts, i.e.:

activity one
activity two
.
activity eight
single activity
single activity
activity one
activity two

So here the results should be grouped depending on their post times. On Solr 4.7.2, I am indexing activities as documents, and each document has a bunch of fields including timestamp and source_user etc. Is it possible to do this on current Solr? (In case the details are not clear, please feel free to ask for more details :) )

- Zeki ama calismiyor... Calissa yapar...
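For comparison, plain result grouping on the user field is the closest out-of-the-box feature; a minimal SolrJ sketch follows, where the core name "feed" is made up and the field names are taken from the message above. Grouping on a computed bucket such as "same user within the same hour" is exactly what the linked tickets would be needed for.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.GroupParams;

    public class FeedGrouping {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/feed"); // core name assumed
            SolrQuery q = new SolrQuery("*:*");
            q.set(GroupParams.GROUP, true);                  // group=true
            q.set(GroupParams.GROUP_FIELD, "source_user");   // one group per posting user
            q.set(GroupParams.GROUP_SORT, "timestamp desc"); // newest activity first inside a group
            q.set(GroupParams.GROUP_LIMIT, 8);               // up to 8 activities per combined entry
            QueryResponse rsp = server.query(q);
            System.out.println(rsp.getGroupResponse().getValues().get(0).getValues().size() + " groups returned");
            server.shutdown();
        }
    }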
Please help: filtering with group.limit
Dear everyone. My problem involves two query steps: 1) get the top 1 of each group (group.limit=1 AND group.sort=date desc & group.field=ABC); 2) filter the document of each group against a condition; if the document doesn't match the condition, remove it from the result list. Help me. Thank you. Hải
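For step 1, the grouping parameters map directly onto a request; a SolrJ sketch follows, where the core name and the status:active condition are made-up placeholders. Note that fq filters documents before they are grouped; there is no built-in "having"-style filter that drops whole groups afterwards, so step 2 has to be done client-side on the returned groups.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class TopPerGroup {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // core assumed
            SolrQuery q = new SolrQuery("*:*");
            q.set("group", true);
            q.set("group.field", "ABC");
            q.set("group.sort", "date desc");  // newest document wins within each group
            q.set("group.limit", 1);           // keep only the top document per group
            q.addFilterQuery("status:active"); // placeholder condition; applied BEFORE grouping
            System.out.println(server.query(q).getGroupResponse().getValues());
            server.shutdown();
        }
    }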
optimize and .nfsXXXX files
Hi, I am using Solr 3.6.2. I use NFS and my index folder is a mounted folder. When I run the command: :/solr/collection1/update?optimize=true&maxSegments=1&waitFlush=true&expungeDeletes=true in order to optimize my index, some .nfsXXXX files are created while the optimize is running. The problem I am having is that after the optimize finishes its run, the .nfs files aren't deleted. When I close the Solr process they immediately disappear. I don't want to restart the Solr process after each optimize; is there anything that can be done so that Solr gets rid of those files? Thanks,
Solr clustering component gives different results than the Carrot2 Workbench
Though I am interacting with Dawid (the creator of Carrot2) on the Carrot2 mailing list, I just wanted to post my problem to a wider audience. I am using Solr 4.7 (on both Windows and Linux) and saved my lingo-attributes.xml file from the Workbench, which I am using in Solr. Note that for testing I have just one Solr index and all the queries are fired against it. Now the clusters that I am getting are good in the Workbench (Carrot2) but pathetic in Solr. In the (Jetty) logs I can see: Loaded Solr resource: clustering/carrot2/lingo-attributes.xml, so that indicates that my attribute file is being loaded. I am really confused about what accounts for the difference in the two outputs (Workbench vs. Solr). Again, to reiterate, the data sources are the same (just one Solr index and the same queries with 100 results). This is happening on both Linux and Windows. Given below is my search component and request handler configuration: lingo org.carrot2.clustering.lingo.LingoClusteringAlgorithm 30 clustering/carrot2 true true org.carrot2.clustering.lingo.LingoClusteringAlgorithm clustering/carrot2 film_id description true false 100 clustering
Re: optimize and .nfsXXXX files
Soft commit (i.e. opening a new IndexReader in Lucene and closing the old one) should make those go away? The .nfsX files are created when a file is deleted but a local process (in this case, the current Lucene IndexReader) still has the file open. Mike McCandless http://blog.mikemccandless.com On Mon, Aug 18, 2014 at 5:20 AM, BorisG wrote: > Hi, > I am using solr 3.6.2. > I use NFS and my index folder is a mounted folder. > When I run the command: > :/solr/collection1/update?optimize=true&maxSegments=1&waitFlush=true&expungeDeletes=true > in order to optimize my index, I have some .nfsX files created while the > optimize is running. > The problem that i am having is that after optimize finishes its run the > .nfs files aren't deleted. > When I close the solr process they immediately disappear. > I don't want to restart the solr process after each optimize, is there > anything that can be done in order for solr to get rid of those files. > > Thanks, > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/optimize-and-nfs-files-tp4153473.html > Sent from the Solr - User mailing list archive at Nabble.com.
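A practical note: Solr 3.6 has no soft commits (those arrived with 4.0), but any commit that reopens the searcher has the effect Mike describes. A minimal SolrJ 3.6.x sketch, using the stock example core URL as a placeholder; the same thing over HTTP is simply an update request with commit=true:

    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class CommitToReleaseNfsFiles {
        public static void main(String[] args) throws Exception {
            // HttpSolrServer is the SolrJ client class in the 3.6.x line
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
            // waitFlush=true, waitSearcher=true: when this returns, a new searcher
            // is open and the old IndexReader has been closed, releasing its file
            // handles so NFS can reap the .nfsXXXX files.
            server.commit(true, true);
            server.shutdown();
        }
    }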
Editing http://wiki.apache.org/solr/PublicServers
Hi, My name is Istvan Kulcsar and I would like to edit this page: http://wiki.apache.org/solr/PublicServers

Here are some Solr-powered search sites:
http://www.odrportal.hu/kereso/
http://idea.unideb.hu/idealista/
http://www.jobmonitor.hu
http://www.profession.hu/
http://webicina.com/
http://www.cylex.hu/
http://kozbeszerzes.ceu.hu/

Thanks for the help. Greets, Steve
Re: Retrieving and updating large set of documents on Solr 4.7.2
Hi, Not sure if you've seen https://issues.apache.org/jira/browse/SOLR-5244 ? It's not in Solr 4.7.2, but may be a good excuse to update Solr. Otis -- Solr Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ On Mon, Aug 18, 2014 at 4:09 AM, deniz wrote: > I am trying to implement an activity feed for a website, and planning to > use > Solr for this case. As it does not have any follower/following relation, > Solr is fitting for the requirements. > > There is one point which makes me concerned about performance. So as user > A, > I may have 10K activities in the feed, and then I have updated my > preferences, so the activities that I have posted should be updated too > (imagine that I am changing my user name, so all of the activities would > have my new username). In order to update the all 10K activities, i need to > retrieve the unique document ids from Solr, then update them. Retrieving > 10K > docs at once is not a good idea, if you imagine bunch of other users are > also doing a similar change. I have checked docs and forums, using Cursors > on Solr seems ok, but still makes me think about the performance (after id > retrieval, i need to update each activity) > > Are there any other ways to handle this without Cursors? Or I should better > use another tool/backend to have something like a username - activity_id > mapping, so i can directly retrieve the ids to update? > > Regards, > > > - > Zeki ama calismiyor... Calissa yapar... > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Retrieving-and-updating-large-set-of-documents-on-Solr-4-7-2-tp4153457.html > Sent from the Solr - User mailing list archive at Nabble.com. >
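Since 4.7 already ships cursorMark support, here is a hedged sketch of the cursor plus atomic-update loop deniz describes. The core name and the username/source_user field names are assumptions taken from the question, and atomic updates additionally require an updateLog and stored fields:

    import java.util.Collections;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.common.params.CursorMarkParams;

    public class RenameUser {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/feed"); // core name assumed
            SolrQuery q = new SolrQuery("source_user:userA"); // field name assumed
            q.setRows(500);
            q.setSort(SolrQuery.SortClause.asc("id"));        // cursors need a sort ending on the uniqueKey
            String cursor = CursorMarkParams.CURSOR_MARK_START;
            while (true) {
                q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
                QueryResponse rsp = server.query(q);
                for (SolrDocument d : rsp.getResults()) {
                    SolrInputDocument update = new SolrInputDocument();
                    update.addField("id", d.getFieldValue("id"));
                    // atomic update: only the username field is rewritten
                    update.addField("username", Collections.singletonMap("set", "newName"));
                    server.add(update);
                }
                String next = rsp.getNextCursorMark();
                if (cursor.equals(next)) break; // unchanged cursor means no more pages
                cursor = next;
            }
            server.commit();
            server.shutdown();
        }
    }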
Re: How to search for phrase "IAE_UPC_0001"
Hi Guys I've been checking into this further and have deleted the index a couple of times and rebuilt it with the suggestions you've supplied. I had a bit of an epiphany last week and decided to check if the document I was searching for was actually in the index (did this by doing a *:* query to a file and grep'ing for the 'IAE_UPC_0001' string). It seems it isn't!! Not sure if it was in the original index or not, tho' I suspect not. As far as I can see anything with the reference in the form IAE_UPC_ has not been indexed while those with the reference in the form IAE-UPC- has. Not sure if that's a coincidence or not. Need to see if I can get the docs into the index and then check if the search works or not. Will see if the guys on the Nutch list can shed any light. All the best. P On 4 August 2014 17:09, Jack Krupansky wrote: > The standard tokenizer treats underscore as a valid token character, not a > delimiter. > > The word delimiter filter will treat underscore as a delimiter though. > > Make sure your query-time WDF does not have preserveOriginal="1" - but the > index-time WDF should have preserveOriginal="1". Otherwise, the query > phrase will generate an extra token which will participate in the matching > and might cause a mismatch. > > -- Jack Krupansky > > -Original Message- From: Paul Rogers > Sent: Monday, August 4, 2014 5:55 PM > > To: solr-user@lucene.apache.org > Subject: Re: How to search for phrase "IAE_UPC_0001" > > Hi Guys > > Thanks for the replies. I've had a look at the WordDelimiterFilterFactory > and the Term Info for the url field. It seems that all the terms exist and > I now understand that each url is being broken up using the delimiters > specified. But I think I'm still missing something. > > Am I correct in assuming the minus sign (-) is also a delimiter? > > If so why then does url:"IAE-UPC-0001" return a result (when the url > contains the substring IAE-UPC-0001) whereas url:"IAE_UPC_0001" doesn't > (when the url contains the substring IAE_UPC_0001)? > > Secondly if the url has indeed been broken into the terms IAE UPC and 0001 > why do all the searches suggested or tried succeed when the delimiter is a > minus sign (-) but not when the delimiter is an underscore (_), returning > zero matches? > > Finally, shouldn't the query url:"IAE UPC 0001"~1 work since all it is > looking for is the three terms? > > Many thanks for any enlightenment. > > P > > > > > On 4 August 2014 01:33, Harald Kirsch wrote: > > This all depends on how the tokenizers take your URLs apart. To quickly >> see what ended up in the index, go to a core in the UI, select Schema >> Browser, select the field containing your URLs, click on "Load Term Info". >> >> In your case, for the field holding the URL you could try to switch to a >> tokenizer that defines tokens as a sequence of alphanumeric characters, >> roughly [a-z0-9]+ plus diacritics. In particular punctuation and >> separation >> characters like dash, underscore, slash, dot and the like would never be >> part of a token, i.e. they don't make a difference. >> >> Then you can search the url parts with a phrase query ( >> https://cwiki.apache.org/confluence/display/solr/The+ >> Standard+Query+Parser#TheStandardQueryParser- >> SpecifyingTermsfortheStandardQueryParserwhich) like >> >> url:"IAE-UPC-0001" >> >> In the same way as during indexing, the dashes are removed to end up with >> three tokens, namely IAE, UPC and 0001. Further they have to be in that >> order. Naturally this will then match anything like: >> >> "IAE_UPC_0001" >> "IAE UPC 0001" >> "IAE/UPC+0001" >> "IAE\UPC\0001" >> "IAE.UPC,0001" >> >> Depending on how your URLs are structured, there is the chance for false >> positives, of course. >> >> The Really Good Thing here is, that you don't need to use wildcards. >> >> I have not yet looked at the wildcard-queries implementation in >> Solr/Lucene, but with the commercial search engines I know, they are a >> great way to lose the confidence of your users, because they just don't >> work as expected by anyone not knowing the implementation. Either they >> deliver only partial results or they kill the performance or they even go >> OOM. If Solr committers have not done something really ingenious, >> Solr/Lucene does have the same problems. >> >> Harald. >> >> >> >> >> >> >> On 31.07.2014 18:31, Paul Rogers wrote: >> >> Hi Guys >>> >>> I have a Solr application searching on data uploaded by Nutch. The >>> search >>> I wish to carry out is for a particular document reference contained >>> within >>> the "url" field, e.g. IAE-UPC-0001. >>> >>> The problem is is that the file names that comprise the url's are not >>> consistent, so a url might contain the reference as IAE-UPC-0001 or >>> IAE_UPC_0001 (ie using either the minus or underscore as the delimiter) >>> but >>> not both. >>> >>> I have created the query (in the solr admin interface): >>> >>> url:"IAE-UPC-0001" >>> >>> which works (retur
Combining a String Tag with a Numeric Value
Hello! I have some new entity data that I'm indexing which takes the form of:

String: EntityString
Float: Confidence

I want to add these to a generic "Tags" field (for faceting), but I'm not sure how to hold onto the confidence. Token Payloads seem like one method, but then I'm not sure how to extract the Payload. Alternatively I could create two fields: TagIndexed which stores just the string value and TagStored which contains a delimited String|Float. What's the right way to do this? Thanks! -D
Re: Editing http://wiki.apache.org/solr/PublicServers
Steve: Sure. What we need in order to add you to the contributor's group is your Wiki logon, though. Provide us with that and we'll add you ASAP. Best, Erick On Mon, Aug 18, 2014 at 3:14 AM, wrote: > Hy, > > My name is Istvan Kulcsar and i would like to edit this page: > http://wiki.apache.org/solr/PublicServers > > Here is some SOLR search: > http://www.odrportal.hu/kereso/ > http://idea.unideb.hu/idealista/ > http://www.jobmonitor.hu > http://www.profession.hu/ > http://webicina.com/ > http://www.cylex.hu/ > Én (14.08.13 23:14) > http://kozbeszerzes.ceu.hu/ > > Thanks for help. > > Greets, > Steve
Re: How to search for phrase "IAE_UPC_0001"
I'd pull Nutch out of the mix here as a test. Create some test docs (use the exampleDocs directory?) and go from there at least long enough to insure that Solr does what you expect if the data gets there properly. You can set this up in about 10 minutes, and test it in about 15 more. May save you endless hours. Because you're conflating two issues here: 1> whether Nutch is sending the data 2> whether Solr is indexing and searching as you expect. Some of the Solr/Lucene analysis chains do transformations that may not be what you assume, particularly things like StandardTokenizer and WordDelimiterFilterFactory. So I'd take the time to see that the values you're dealing with are behaving as you expect. The admin/analysis page will help you a _lot_ here. Best, Erick On Mon, Aug 18, 2014 at 7:16 AM, Paul Rogers wrote: > Hi Guys > > I've been checking into this further and have deleted the index a couple of > times and rebuilt it with the suggestions you've supplied. > > I had a bit of an epiphany last week and decided to check if the document I > was searching for was actually in the index (did this by doing a *.* query > to a file and grep'ing for the 'IAE_UPC_0001@ string). It seems it isn't!! > Not sure if it was in the original index or not, tho' I suspect not. > > As far as I can see anything with the reference in the form IAE_UPC_ > has not been indexed while those with the reference in the form > IAE-UPC- has. Not sure if that's a coincidence or not. > > Need to see if I can get the docs into the index and then check if the > search works or not. Will see if the guys on the Nutch list can shed any > light. > > All the best. > > P > > > On 4 August 2014 17:09, Jack Krupansky wrote: > >> The standard tokenizer treats underscore as a valid token character, not a >> delimiter. >> >> The word delimiter filter will treat underscore as a delimiter though. >> >> Make sure your query-time WDF does not have preserveOriginal="1" - but the >> index-time WDF should have preserveOriginal="1". Otherwise, the query >> phrase will generate an extra token which will participate in the matching >> and might cause a mismatch. >> >> -- Jack Krupansky >> >> -Original Message- From: Paul Rogers >> Sent: Monday, August 4, 2014 5:55 PM >> >> To: solr-user@lucene.apache.org >> Subject: Re: How to search for phrase "IAE_UPC_0001" >> >> Hi Guys >> >> Thanks for the replies. I've had a look at the WordDelimiterFilterFactory >> and the Term Info for the url field. It seems that all the terms exist and >> I now understand that each url is being broken up using the delimiters >> specified. But I think I'm still missing something. >> >> Am I correct in assuming the minus sign (-) is also a delimiter? >> >> If so why then does url:"IAE-UPC-0001" return a result (when the url >> contains the substring IAE-UPC-0001) whereas url:"IAE_UPC_0001" doesn't >> (when the url contains the substring IAE_UPC_0001)? >> >> Secondly if the url has indeed been broken into the terms IAE UPC and 0001 >> why do all the searches suggested or tried succeed when the delimiter is a >> minus sign (-) but not when the delimiter is an underscore (_), returning >> zero matches? >> >> Finally, shouldn't the query url:"IAE UPC 0001"~1 work since all it is >> looking for is the three terms? >> >> Many thanks for any enlightenment. >> >> P >> >> >> >> >> On 4 August 2014 01:33, Harald Kirsch wrote: >> >> This all depends on how the tokenizers take your URLs apart. 
To quickly >>> see what ended up in the index, go to a core in the UI, select Schema >>> Browser, select the field containing your URLs, click on "Load Term Info". >>> >>> In your case, for the field holding the URL you could try to switch to a >>> tokenizer that defines tokens as a sequence of alphanumeric characters, >>> roughly [a-z0-9]+ plus diacritics. In particular punctuation and >>> separation >>> characters like dash, underscore, slash, dot and the like would never be >>> part of a token, i.e. they don't make a difference. >>> >>> Then you can search the url parts with a phrase query ( >>> https://cwiki.apache.org/confluence/display/solr/The+ >>> Standard+Query+Parser#TheStandardQueryParser- >>> SpecifyingTermsfortheStandardQueryParserwhich) like >>> >>> url:"IAE-UPC-0001" >>> >>> In the same way as during indexing, the dashes are removed to end up with >>> three tokens, namely IAE, UPC and 0001. Further they have to be in that >>> order. Naturally this will then match anything like: >>> >>> "IAE_UPC_0001" >>> "IAE UPC 0001" >>> "IAE/UPC+0001" >>> "IAE\UPC\0001" >>> "IAE.UPC,0001" >>> >>> Depending on how your URLs are structured, there is the chance for false >>> positives, of course. >>> >>> The Really Good Thing here is, that you don't need to use wildcards. >>> >>> I have not yet looked at the wildcard-queries implementation in >>> Solr/Lucene, but with the commercial search engines I know, they are a >>> great way to loose the confidence of your u
Re: Combining a String Tag with a Numeric Value
Hmmm, there's no particular "right way". It'd be simpler to index these as two separate fields _if_ there's only one pair per document. If there are more and you index them as two multiValued fields, there's no good way at _query_ time to retain the association. The returned multiValued fields are guaranteed to be in the same order of insertion so you can display the correct pairs, but you can't use the association to score docs. Hmmm, somewhat abstract. OK say you want to associate two tag/value pairs, tag1:50 and tag2:100. Say further that you have two multiValued fields, Tags and Values, and then index tag1 and tag2 into Tags and 50 and 100 into Values. There's no good way to express "q=tags:tag1 and factor the associated value of 50 into the score". Note that the returned _values_ will be

Tags: tag1 tag2
Values: 50 100

So at that point you can see the associations. That said, if there's only _one_ such tag/value pair per document, it's easy to write a FunctionQuery (http://wiki.apache.org/solr/FunctionQuery) that does this. *** If you have many tag/value pairs, payloads are probably what you want. Here's an end-to-end example: http://searchhub.org/2014/06/13/end-to-end-payload-example-in-solr/ Best, Erick On Mon, Aug 18, 2014 at 7:32 AM, Dave Seltzer wrote: > Hello! > > I have some new entity data that I'm indexing which takes the form of: > > String: EntityString > Float: Confidence > > I want to add these to a generic "Tags" field (for faceting), but I'm not > sure how to hold onto the confidence. Token Payloads seem like one method, > but then I'm not sure how to extract the Payload. > > Alternatively I could create two fields: TagIndexed which stores just the > string value and TagStored which contains a delimited String|Float. > > What's the right way to do this? > > Thanks! > > -D
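A tiny SolrJ sketch of the parallel-multiValued-field layout Erick describes; the core name is assumed, and Tags/Values must be declared multiValued in the schema:

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class TagValuePairs {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // core assumed
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc1");
            // Two parallel multiValued fields: insertion order is preserved on
            // return, so Tags[i] pairs with Values[i] at display time, but the
            // association is not usable for scoring, as described above.
            doc.addField("Tags", "tag1");
            doc.addField("Values", 50);
            doc.addField("Tags", "tag2");
            doc.addField("Values", 100);
            server.add(doc);
            server.commit();
            server.shutdown();
        }
    }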
need help in field collapsing
Hi, I have about 15 fields in my Solr schema, but there are two fields, let's call them field1 and field2, that matter here. For most searches I feel I have a perfect schema, but for one use case it is not apt. *Problem*: I have to group by field1, and then I have to search for a particular value "a" in field1 only when "b" is not present in any instance of field2 within the respective group (same as using "having" after group by in MySQL). Is there a way to do this in Solr, or do I have to maintain a separate schema for this (which would be a very costly operation for us)? Thanks in advance
solr cloud going down repeatedly
Hi guys. I have a Solr cloud consisting of 3 ZooKeeper VMs running 3.4.5, backported from Ubuntu 14.04 LTS to 12.04 LTS. They are orchestrating 4 Solr nodes, which have 2 cores. Each core is sharded, so 1 shard is on each of the Solr nodes. Solr runs under Tomcat 7 and Ubuntu's latest OpenJDK 7. The version of Solr is 4.2.1. Each of the nodes has around 7GB of data, and the JVM is set to run an 8GB heap. All Solr nodes have 16GB RAM. A few weeks back we started having issues with this installation. Tomcat was filling up catalina.out with the following message: SEVERE: org.apache.solr.common.SolrException: no servers hosting shard: The only solution was to restart all 4 Tomcats on the 4 Solr nodes. After that, the issue would rectify itself, but would occur again approximately a week after a restart. This happened last time yesterday, and I succeeded in recording some of the stuff happening on the boxes via Zabbix and atop. Basically at 15:35 the load on the machine went berserk, jumping from around 0.5 to around 30+. Zabbix and atop didn't notice any heavy IO; all the other processes were practically idle, only the JVM (Tomcat) exploded, with CPU usage increasing from the standard ~80% to around ~750%. These are the parts of the atop recordings on one of the nodes. Note that they are 10 mins apart:

(15:28:42) CPL | avg1 0.12 | avg5 0.36 | avg15 0.38 |
(15:38:42) CPL | avg1 8.54 | avg5 3.62 | avg15 1.61 |
(15:48:42) CPL | avg1 30.14 | avg5 27.09 | avg15 14.73 |

This is the status of Tomcat at the last point (15:48:42):

28891 tomcat7 tomcat7 411 8.68s 70m14s 209.9M 204K 0K 5804K -- - S 5 704% java

I have noticed similar stuff happening around the Solr nodes. At 17:41 the on-call person decided to hard reset all the Solr nodes, and the cloud came back up running normally after that. These are the logs that I found on the first node: Aug 17, 2014 3:44:58 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: no servers hosting shard: Aug 17, 2014 3:46:12 PM org.apache.solr.cloud.OverseerCollectionProcessor run WARNING: Overseer cannot talk to ZK Aug 17, 2014 3:46:12 PM org.apache.solr.cloud.Overseer$ClusterStateUpdater amILeader WARNING: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer_elect/leader Then a bunch of: Aug 17, 2014 3:46:42 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: no servers hosting shard: until the server was rebooted.
On other nodes I can see: node2: Aug 17, 2014 3:44:58 PM org.apache.solr.cloud.RecoveryStrategy close WARNING: Stopping recovery for zkNodeName=10.100.254.103:8080_solr_myappcore=myapp Aug 17, 2014 3:44:58 PM org.apache.solr.cloud.RecoveryStrategy close WARNING: Stopping recovery for zkNodeName=10.100.254.103:8080_solr_myapp2core=myapp2 Aug 17, 2014 3:46:24 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://node1:8080/solr/myapp node4: Aug 17, 2014 3:44:06 PM org.apache.solr.cloud.RecoveryStrategy close WARNING: Stopping recovery for zkNodeName=10.100.254.105:8080_solr_myapp2core=myapp2 Aug 17, 2014 3:44:09 PM org.apache.solr.cloud.RecoveryStrategy close WARNING: Stopping recovery for zkNodeName=10.100.254.105:8080_solr_myappcore=myapp Aug 17, 2014 3:45:37 PM org.apache.solr.common.SolrException log SEVERE: There was a problem finding the leader in zk:org.apache.solr.common.SolrException: Could not get leader props My impression is that the garbage collector is at fault here. This is the cmdline of Tomcat: /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Djava.util.logging.config.file=/var/lib/tomcat7/conf/logging.properties -Djava.awt.headless=true -Xmx8192m -XX:+UseConcMarkSweepGC -DnumShards=2 -Djetty.port=8080 -DzkHost=10.215.1.96:2181,10.215.1.97:2181,10.215.1.98:2181 -javaagent:/opt/newrelic/newrelic.jar -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/usr/share/tomcat7/endorsed -classpath /usr/share/tomcat7/bin/bootstrap.jar:/usr/share/tomcat7/bin/tomcat-juli.jar -Dcatalina.base=/var/lib/tomcat7 -Dcatalina.home=/usr/share/tomcat7 -Djava.io.tmpdir=/tmp/tomcat7-tomcat7-tmp org.apache.catalina.startup.Bootstrap start So, I am using MarkSweepGC. Do you have any suggestions on how I can debug this further and potentially eliminate the issue causing the downtimes?
Re: How to restore an index from a backup over HTTP
I'm able to do cross-solrcloud-cluster index copy using nothing more than careful use of the "fetchindex" replication handler command. I'm using this as a build/deployment tool, so I manually create a collection in two clusters, index into one, test, and then ask the other cluster to fetchindex from it on each shard/replica. Some caveats:

1. It seems like fetchindex may silently decline if it thinks the index it has is newer.
2. I'm not doing this on an index that's currently receiving updates.
3. SolrCloud replication doesn't come into this flow, even if you fetchindex on a leader. (although once you're done, updates should get replicated normally)
4. Both collections must be created with the same number of shards and sharding mechanism. (although replication factor can vary)

I've got a tool for automating this that I'd like to push to github at some point, let me know if you're interested. On 8/16/14, 3:03 AM, "Greg Solovyev" wrote: >Thanks Shawn, this is a pretty cool idea. Adding the handler seems pretty >straight forward, but the main concern I have is the internal data format >that ReplicationHandler and SnapPuller use. This new handler as well as >the code that I've already written to download the index files from Solr >will depend on that format. Unfortunately, this format is not documented >and is not abstracted by SolrJ, so I wonder what I can do to make sure it >does not change on us without notice. > >Thanks, >Greg > >- Original Message - >From: "Shawn Heisey" >To: solr-user@lucene.apache.org >Sent: Friday, August 15, 2014 7:31:19 PM >Subject: Re: How to restore an index from a backup over HTTP > >On 8/15/2014 5:51 AM, Greg Solovyev wrote: >> What I want to achieve is being able to send the backed up index to >>Solr (either standalone or with ZooKeeper) in a way similar to creating >>a new Collection. I.e. create a new collection and upload an existing >>index directly into that Collection. I've looked through Solr code and >>so far I have not found a handler that would allow this scenario. So, >>the last idea is to implement a special handler for this case, perhaps >>extending CoreAdminHandler. ReplicationHandler together with SnapPuller >>do pretty much what I need to do, except that the action has to be >>initiated by the receiving Solr server and I need to initiate the action >>externally. I.e., instead of having Solr slave download an index from >>Solr master, I need to feed the index to Solr master and ideally this >>would work the same way in standalone and SolrCloud modes. > >I have not made any attempt to verify what I'm stating below. It may >not work. > >What I think I would *try* is setting up a standalone Solr (no cloud) on >the backup server. Use scripted index/config copies and Solr start/stop >actions to get the index up and running on a known core in the >standalone Solr. Then use the replication handler's HTTP API to >replicate the index from that standalone server to each of the replicas >in your cluster. > >https://wiki.apache.org/solr/SolrReplication#HTTP_API >https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexRe >plication-HTTPAPICommandsfortheReplicationHandler > >One thing that I do not know is whether SolrCloud itself might interfere >with these actions, or whether it might automatically take care of >additional replicas if you replicate to the shard leader. If SolrCloud >*would* interfere, then this idea might need special support in >SolrCloud, perhaps as an extension to the Collections API.
If it won't >interfere, then the use-case would need to be documented (on the user >wiki at a minimum) so that committers will be aware of it and preserve >the capability in future versions. An extension to the Collections API >might be a good idea either way -- I've seen a number of questions about >capability that falls under this basic heading. > >Thanks, >Shawn
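For anyone wanting to try Jeff's recipe: the command he describes is the one documented at the replication wiki links above. Issued per shard/replica, it looks roughly like the following; the hosts and core names are placeholders, and the exact masterUrl form should be checked against that wiki page, since it has varied between Solr versions:

    http://target-host:8983/solr/mycoll_shard1_replica1/replication?command=fetchindex&masterUrl=http://source-host:8983/solr/mycoll_shard1_replica1/replication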
Re: How to restore an index from a backup over HTTP
Thanks Jeff, I'd be interested in taking a look at the code for this tool. My github ID is grishick. Thanks, Greg - Original Message - From: "Jeff Wartes" To: solr-user@lucene.apache.org Sent: Monday, August 18, 2014 9:49:28 PM Subject: Re: How to restore an index from a backup over HTTP I'm able to do cross-solrcloud-cluster index copy using nothing more than careful use of the "fetchindex" replication handler command. I'm using this as a build/deployment tool, so I manually create a collection in two clusters, index into one, test, and then ask the other cluster to fetchindex from it on each shard/replica. Some caveats:

1. It seems like fetchindex may silently decline if it thinks the index it has is newer.
2. I'm not doing this on an index that's currently receiving updates.
3. SolrCloud replication doesn't come into this flow, even if you fetchindex on a leader. (although once you're done, updates should get replicated normally)
4. Both collections must be created with the same number of shards and sharding mechanism. (although replication factor can vary)

I've got a tool for automating this that I'd like to push to github at some point, let me know if you're interested. On 8/16/14, 3:03 AM, "Greg Solovyev" wrote: >Thanks Shawn, this is a pretty cool idea. Adding the handler seems pretty >straight forward, but the main concern I have is the internal data format >that ReplicationHandler and SnapPuller use. This new handler as well as >the code that I've already written to download the index files from Solr >will depend on that format. Unfortunately, this format is not documented >and is not abstracted by SolrJ, so I wonder what I can do to make sure it >does not change on us without notice. > >Thanks, >Greg > >- Original Message - >From: "Shawn Heisey" >To: solr-user@lucene.apache.org >Sent: Friday, August 15, 2014 7:31:19 PM >Subject: Re: How to restore an index from a backup over HTTP > >On 8/15/2014 5:51 AM, Greg Solovyev wrote: >> What I want to achieve is being able to send the backed up index to >>Solr (either standalone or with ZooKeeper) in a way similar to creating >>a new Collection. I.e. create a new collection and upload an existing >>index directly into that Collection. I've looked through Solr code and >>so far I have not found a handler that would allow this scenario. So, >>the last idea is to implement a special handler for this case, perhaps >>extending CoreAdminHandler. ReplicationHandler together with SnapPuller >>do pretty much what I need to do, except that the action has to be >>initiated by the receiving Solr server and I need to initiate the action >>externally. I.e., instead of having Solr slave download an index from >>Solr master, I need to feed the index to Solr master and ideally this >>would work the same way in standalone and SolrCloud modes. > >I have not made any attempt to verify what I'm stating below. It may >not work. > >What I think I would *try* is setting up a standalone Solr (no cloud) on >the backup server. Use scripted index/config copies and Solr start/stop >actions to get the index up and running on a known core in the >standalone Solr. Then use the replication handler's HTTP API to >replicate the index from that standalone server to each of the replicas >in your cluster.
> >https://wiki.apache.org/solr/SolrReplication#HTTP_API >https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexRe >plication-HTTPAPICommandsfortheReplicationHandler > >One thing that I do not know is whether SolrCloud itself might interfere >with these actions, or whether it might automatically take care of >additional replicas if you replicate to the shard leader. If SolrCloud >*would* interfere, then this idea might need special support in >SolrCloud, perhaps as an extension to the Collections API. If it won't >interfere, then the use-case would need to be documented (on the user >wiki at a minimum) so that committers will be aware of it and preserve >the capability in future versions. An extension to the Collections API >might be a good idea either way -- I've seen a number of questions about >capability that falls under this basic heading. > >Thanks, >Shawn
Re: How to restore an index from a backup over HTTP
Shawn, the format that I am referencing is "filestream", which starts with 2 bytes carrying file size, then 4 bytes carrying checksum (optional) and then the actual bits of the file. Thanks, Greg - Original Message - From: "Shawn Heisey" To: solr-user@lucene.apache.org Sent: Sunday, August 17, 2014 12:28:12 AM Subject: Re: How to restore an index from a backup over HTTP On 8/16/2014 4:03 AM, Greg Solovyev wrote: > Thanks Shawn, this is a pretty cool idea. Adding the handler seems pretty > straight forward, but the main concern I have is the internal data format > that ReplicationHandler and SnapPuller use. This new handler as well as the > code that I've already written to download the index files from Solr will > depend on that format. Unfortunately, this format is not documented and is > not abstracted by SolrJ, so I wonder what I can do to make sure it does not > change on us without notice. I am not really sure what format you're referencing here, but I'm about 99% sure the format *over the wire* is javabin. When the javabin format changed between 1.4.1 and 3.1.0, replication between those versions became impossible. Historical: The Solr version made a huge leap after the Solr and Lucene development was merged -- it was synchronized with the Lucene version. There are no 1.5, 2.x, or 3.0 versions of Solr. https://issues.apache.org/jira/browse/SOLR-2204 Thanks, Shawn
Re: solr cloud going down repeatedly
On 8/18/2014 11:30 AM, Jakov Sosic wrote: > My impression is that garbage collector is at fault here. > > This is the cmdline of tomcat: > > /usr/lib/jvm/java-7-openjdk-amd64/bin/java > -Djava.util.logging.config.file=/var/lib/tomcat7/conf/logging.properties > -Djava.awt.headless=true -Xmx8192m -XX:+UseConcMarkSweepGC > -DnumShards=2 -Djetty.port=8080 > -DzkHost=10.215.1.96:2181,10.215.1.97:2181,10.215.1.98:2181 > -javaagent:/opt/newrelic/newrelic.jar -Dcom.sun.management.jmxremote > -Dcom.sun.management.jmxremote.port=9010 > -Dcom.sun.management.jmxremote.local.only=false > -Dcom.sun.management.jmxremote.authenticate=false > -Dcom.sun.management.jmxremote.ssl=false > -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager > -Djava.endorsed.dirs=/usr/share/tomcat7/endorsed -classpath > /usr/share/tomcat7/bin/bootstrap.jar:/usr/share/tomcat7/bin/tomcat-juli.jar > -Dcatalina.base=/var/lib/tomcat7 -Dcatalina.home=/usr/share/tomcat7 > -Djava.io.tmpdir=/tmp/tomcat7-tomcat7-tmp > org.apache.catalina.startup.Bootstrap start With an 8GB heap and "UseConcMarkSweepGC" as your only GC tuning, I can pretty much guarantee that you'll see occasional GC pauses of 10-15 seconds, because I saw exactly that happening with my own setup. This is what I use now: http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning I can't claim that my problem is 100% solved, but collections that go over one second are *very* rare now, and I'm pretty sure they are all under two seconds. Thanks, Shawn
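To make that concrete, tuned startup options of the sort the linked page discusses look something like the following; the flags and values here are illustrative only, not the page's exact list, and need to be tuned per heap and workload rather than copied verbatim:

    /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Xms8192m -Xmx8192m \
      -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
      -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 \
      -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:+ParallelRefProcEnabled \
      ... (remaining Tomcat/Solr options unchanged)

The idea is that CMS alone, with default generation sizing on an 8GB heap, tends to promote too much and to start concurrent collection too late, which is what produces the long stop-the-world pauses; sizing the young generation and triggering CMS earlier shortens them.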
Need details on this query
Hi, This might be a silly question.. I came across the below query online but I couldn't really understand the bolded part. Can someone help me understand this part of the query? deviceType_:"Cell" OR deviceType_:"Prepaid" *OR (phone -data_source_name:("Catalog" OR "Device How To - Interactive" OR "Device How To - StepByStep"))* Thanks, Barani
how to generate stats based on time segments?
hi I have a dataset in Solr like:

id|time|price
1|t0|100
1|t1|10
1|t2|20
1|t3|30

What I want is: when I query Solr for time > t0, I want to get back data like t1, 100 rest, 60 (which is the sum of the price for t1, t2, t3). Is that something that can be done?
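I don't know of a single request that returns both numbers, but a two-request approximation with the StatsComponent is straightforward; a 4.x SolrJ sketch follows, with the core name assumed, the field names taken from the question, and t0 standing in for the real cutoff value:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.FieldStatsInfo;

    public class PriceAfterCutoff {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // core assumed
            // Request 1: the earliest row after the cutoff.
            SolrQuery first = new SolrQuery("*:*");
            first.addFilterQuery("time:{t0 TO *]");          // t0 = the real cutoff value
            first.setSort(SolrQuery.SortClause.asc("time"));
            first.setRows(1);
            System.out.println(server.query(first).getResults());
            // Request 2: stats=true&stats.field=price over the same range
            // returns the sum (60 for t1+t2+t3 in the example) in one pass.
            SolrQuery rest = new SolrQuery("*:*");
            rest.addFilterQuery("time:{t0 TO *]");
            rest.setRows(0);
            rest.setGetFieldStatistics("price");
            FieldStatsInfo stats = server.query(rest).getFieldStatsInfo().get("price");
            System.out.println("sum = " + stats.getSum());
            server.shutdown();
        }
    }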
faceted query with stats not working in solrj
Hi. I have a query that works just fine in the browser. It rolls up documents by the facet field and gives me stats on the stats field: http://localhost:8983/solr/corename/select?q=*:*&stats=on&stats.field=Spend&stats.facet=Supplier Posting this works just fine. However, I cannot get stats from SolrJ or the Solr admin console. From the admin console (on the Query tab) I see: can not use FieldCache on a field which is neither indexed nor has doc values: Supplier?wt=xml Both Spend and Supplier are indexed. The error must be referring to something else. In Java, I use query.addStatsFieldFacets("Spend", "Supplier"); but the stats object comes back null: response.getFieldStatsInfo() == null. Thanks so much for any suggestions. Using Solr 4.9
Re: faceted query with stats not working in solrj
On 8/18/2014 12:47 PM, tedsolr wrote: > Hi. I have a query that works just fine in the browser. It rolls up documents > by the facet field and gives me stats on the stats field: > > http://localhost:8983/solr/corename/select?q=*:*&stats=on&stats.field=Spend&stats.facet=Supplier > > Posting this works just fine. However I cannot get stats from SolrJ or the > solr admin console. From the admin console (on the Query tab) I see: > can not use FieldCache on a field which is neither indexed > nor has doc values: Supplier?wt=xml > > Both Spend and Supplier are indexed. The error must be referring to > something else. > > In Java, I use > query.addStatsFieldFacets("Spend", "Supplier"); > but the stats object comes back null. > response.getFieldStatsInfo() == null I won't claim to know how the stats stuff works, but one thing to do is make sure Solr is logging at the INFO level or finer, then look at the Solr log to see what the differences are in the actual query that Solr is receiving when you do it in the browser and when you do it with SolrJ. You will need to look at the actual log file, not the logging tab in the admin UI. When using the example included in the Solr download, the logfile is at logs/solr.log. If you're using another method for starting Solr, that may be different. Thanks, Shawn
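One way to take SolrJ's convenience methods out of the equation while comparing the two requests in solr.log is to set the raw parameters exactly as in the working browser URL; a sketch, with "corename" kept as the placeholder from the question:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class RawStatsParams {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/corename");
            SolrQuery q = new SolrQuery("*:*");
            // Mirror the URL that works in the browser, parameter for parameter.
            q.set("stats", true);
            q.set("stats.field", "Spend");
            q.set("stats.facet", "Supplier");
            QueryResponse rsp = server.query(q);
            System.out.println(rsp.getFieldStatsInfo());
            server.shutdown();
        }
    }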
Currency field type not supported for stats
Just looking for confirmation that the currency field is not supported for stats. When I use a currency field as the stats.field I get this error: http://localhost:8983/solr/corename/select?q=*:*&stats=on&stats.field=SpendAsCurrency&stats.facet=Supplier Field type currency{class=org.apache.solr.schema.CurrencyField,analyzer=org.apache.solr.schema.FieldType$DefaultAnalyzer,args={precisionStep=8, multiValued=false, currencyConfig=currency.xml, defaultCurrency=USD, class=solr.CurrencyField}} is not currently supported When I run stats on a long type it works fine. I can of course work around this by modifying my schema. So is currency not a numeric type in Solr? thanks
Re: How to search for phrase "IAE_UPC_0001"
Hi Erick Thanks for the assist. Did as you suggested (tho' I used Nutch). Cleared out solr's index and Nutch's crawl DB and then emptied all the documents out of the web server bar 10 of each type (IAE-UPC- and IAE_UPC_). Then crawled the site using Nutch. Then confirmed that all 20 docs had been uploaded and that a *:* search returned all 20 docs. Now when I do a url search on either (for example) q=url:"IAE-UPC-220" or q="IAE_UPC_0001" I get a result returned for each as expected, ie it now works as expected. So seems I now need to figure out why Nutch isn't crawling the documents. Again many thanks. P On 18 August 2014 11:22, Erick Erickson wrote: > I'd pull Nutch out of the mix here as a test. Create > some test docs (use the exampleDocs directory?) and > go from there at least long enough to insure that Solr > does what you expect if the data gets there properly. > > You can set this up in about 10 minutes, and test it > in about 15 more. May save you endless hours. > > Because you're conflating two issues here: > 1> whether Nutch is sending the data > 2> whether Solr is indexing and searching as you expect. > > Some of the Solr/Lucene analysis chains do transformations > that may not be what you assume, particularly things > like StandardTokenizer and WordDelimiterFilterFactory. > > So I'd take the time to see that the values you're dealing > with are behaving as you expect. The admin/analysis page > will help you a _lot_ here. > > Best, > Erick > > > > > On Mon, Aug 18, 2014 at 7:16 AM, Paul Rogers > wrote: > > Hi Guys > > > > I've been checking into this further and have deleted the index a couple > of > > times and rebuilt it with the suggestions you've supplied. > > > > I had a bit of an epiphany last week and decided to check if the > document I > > was searching for was actually in the index (did this by doing a *.* > query > > to a file and grep'ing for the 'IAE_UPC_0001@ string). It seems it > isn't!! > > Not sure if it was in the original index or not, tho' I suspect not. > > > > As far as I can see anything with the reference in the form IAE_UPC_ > > has not been indexed while those with the reference in the form > > IAE-UPC- has. Not sure if that's a coincidence or not. > > > > Need to see if I can get the docs into the index and then check if the > > search works or not. Will see if the guys on the Nutch list can shed any > > light. > > > > All the best. > > > > P > > > > > > On 4 August 2014 17:09, Jack Krupansky wrote: > > > >> The standard tokenizer treats underscore as a valid token character, > not a > >> delimiter. > >> > >> The word delimiter filter will treat underscore as a delimiter though. > >> > >> Make sure your query-time WDF does not have preserveOriginal="1" - but > the > >> index-time WDF should have preserveOriginal="1". Otherwise, the query > >> phrase will generate an extra token which will participate in the > matching > >> and might cause a mismatch. > >> > >> -- Jack Krupansky > >> > >> -Original Message- From: Paul Rogers > >> Sent: Monday, August 4, 2014 5:55 PM > >> > >> To: solr-user@lucene.apache.org > >> Subject: Re: How to search for phrase "IAE_UPC_0001" > >> > >> Hi Guys > >> > >> Thanks for the replies. I've had a look at the > WordDelimiterFilterFactory > >> and the Term Info for the url field. It seems that all the terms exist > and > >> I now understand that each url is being broken up using the delimiters > >> specified. But I think I'm still missing something. > >> > >> Am I correct in assuming the minus sign (-) is also a delimiter? > >> > >> If so why then does url:"IAE-UPC-0001" return a result (when the url > >> contains the substring IAE-UPC-0001) whereas url:"IAE_UPC_0001" doesn't > >> (when the url contains the substring IAE_UPC_0001)? > >> > >> Secondly if the url has indeed been broken into the terms IAE UPC and > 0001 > >> why do all the searches suggested or tried succeed when the delimiter > is a > >> minus sign (-) but not when the delimiter is an underscore (_), > returning > >> zero matches? > >> > >> Finally, shouldn't the query url:"IAE UPC 0001"~1 work since all it is > >> looking for is the three terms? > >> > >> Many thanks for any enlightenment. > >> > >> P > >> > >> > >> > >> > >> On 4 August 2014 01:33, Harald Kirsch > wrote: > >> > >> This all depends on how the tokenizers take your URLs apart. To quickly > >>> see what ended up in the index, go to a core in the UI, select Schema > >>> Browser, select the field containing your URLs, click on "Load Term > Info". > >>> > >>> In your case, for the field holding the URL you could try to switch th
> >> > >> If so why then does url:"IAE-UPC-0001" return a result (when the url > >> contains the substring IAE-UPC-0001) whereas url:"IAE_UPC_0001" doesn't > >> (when the url contains the substring IAE_UPC_0001)? > >> > >> Secondly if the url has indeed been broken into the terms IAE UPC and > 0001 > >> why do all the searches suggested or tried succeed when the delimiter > is a > >> minus sign (-) but not when the delimiter is an underscore (_), > returning > >> zero matches? > >> > >> Finally, shouldn't the query url:"IAE UPC 0001"~1 work since all it is > >> looking for is the three terms? > >> > >> Many thanks for any enlightenment. > >> > >> P > >> > >> > >> > >> > >> On 4 August 2014 01:33, Harald Kirsch > wrote: > >> > >> This all depends on how the tokenizers take your URLs apart. To quickly > >>> see what ended up in the index, go to a core in the UI, select Schema > >>> Browser, select the field containing your URLs, click on "Load Term > Info". > >>> > >>> In your case, for the field holding the URL you could try to switch to > a > >>> tokenizer that defines tokens as a sequence of alphanumeric characters, > >>> roughly [a-z0-9]+ plus diacritics. In particular punctuation and > >>> separation > >>> characters like dash, underscore, slash, dot and the like would never > be > >>> part of a token, i.e. they don't make a difference. > >>> > >>> Then you can search th
logging in solr
Hi, Currently in my component Solr is logging to catalina.out. What is the configuration needed to redirect those logs to some custom log file, e.g. Solr.log? Thanks... --Arjun
Re: Combining a String Tag with a Numeric Value
Thanks Erick, I'm not sure I need to score the documents based on the numeric value, but I am interested in being able to calculate the average (Mean) of all the numeric values for a given tag. For example, what is the average confidence of Tag1 across all documents. I'm not sure I can do that without building a FunctionQuery. -Dave On Mon, Aug 18, 2014 at 12:46 PM, Erick Erickson wrote: > Hmmm, there's no particular "right way". It'd be simpler > to index these as two separate fields _if_ there's only > one pair per document. If there are more and you index them > as two mutliValued fields, there's no good way at _query_ time > to retain the association. The returned multiValued fields are > guaranteed to be in the same order of insertion so you can > display the correct pairs, but you can't use the association > to score docs. Hmmm, somewhat abstract. OK say you want to > associate two tag/value pairs, tag1:50 and tag2:100. Say further > that you have two multiValued fields, Tags and Values and then > index tag1 and tag2 into Tags and 50 and 100 into Values. > There's no good way to express "q=tags:tag1 and factor the > associated value of 50 into the score" > > Note that the returned _values_ will be > Tags: tag1 tag2 > Values 50 100 > > So at that point you can see the associations. > > that said, if there's only _one_ such tag/value pair per document, > it's easy to write a FunctionQuery ( > http://wiki.apache.org/solr/FunctionQuery) > that does this. > > *** > > If you have many tag/value pairs, payloads are probably what you want. > Here's an end-to-end example: > > http://searchhub.org/2014/06/13/end-to-end-payload-example-in-solr/ > > Best, > Erick > > On Mon, Aug 18, 2014 at 7:32 AM, Dave Seltzer wrote: > > Hello! > > > > I have some new entity data that I'm indexing which takes the form of: > > > > String: EntityString > > Float: Confidence > > > > I want to add these to a generic "Tags" field (for faceting), but I'm not > > sure how to hold onto the confidence. Token Payloads seem like one > method, > > but then I'm not sure how to extract the Payload. > > > > Alternatively I could create two fields: TagIndexed which stores just the > > string value and TagStored which contains a delimited String|Float. > > > > What's the right way to do this? > > > > Thanks! > > > > -D
Re: logging in solr
Hi, Are you using Tomcat or Jetty? If you use the default Jetty, have a look at: http://wiki.apache.org/solr/LoggingInDefaultJettySetup Regards, Aurélien On 18/08/2014 22:43, M, Arjun (NSN - IN/Bangalore) wrote: Hi, Currently in my component Solr is logging to catalina.out. What is the configuration needed to redirect those logs to some custom logfile eg: Solr.log. Thanks... --Arjun
Re: logging in solr
Sorry, outdated link. And I suppose you use Tomcat if you are talking about catalina.out. The correct link is: http://wiki.apache.org/solr/SolrLogging#Solr_4.3_and_above On 18/08/2014 23:06, Aurélien MAZOYER wrote: Hi, Are you using tomcat or jetty? If you use the default jetty, have a look to : http://wiki.apache.org/solr/LoggingInDefaultJettySetup Regards, Aurélien On 18/08/2014 22:43, M, Arjun (NSN - IN/Bangalore) wrote: Hi, Currently in my component Solr is logging to catalina.out. What is the configuration needed to redirect those logs to some custom logfile eg: Solr.log. Thanks... --Arjun
Re: Need details on this query
OR (phone -data_source_name:("Catalog" OR "Device How To - Interactive" OR "Device How To - StepByStep")) Just an OR clause that searches for all documents that have "phone" (in the default search field, or multiple fields if it's an edismax parser). Remove from that set any documents with a data_source_name that contains any of the three phrases "Catalog", "Device How To - Interactive", or "Device How To - StepByStep", and return all those documents in the query. HTH, Erick On Mon, Aug 18, 2014 at 11:42 AM, bbarani wrote: > Hi, > > This might be a silly question.. > > I came across the below query online but I couldn't really understand the > bolded part. Can someone help me understanding this part of the query? > > deviceType_:"Cell" OR deviceType_:"Prepaid" *OR (phone > -data_source_name:("Catalog" OR "Device How To - Interactive" OR "Device How > To - StepByStep"))* > > Thanks, > Barani > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Need-details-on-this-query-tp4153606.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to search for phrase "IAE_UPC_0001"
NP, glad you're making forward progress! Erick On Mon, Aug 18, 2014 at 12:31 PM, Paul Rogers wrote: > Hi Erick > > Thanks for the assist. Did as you suggested (tho' I used Nutch). Cleared > out solr's index and Nutch's crawl DB and then emptied all the documents > out of the web server bar 10 of each type (IAE-UPC- and IAE_UPC_). > Then crawled the site using Nutch. > > Then confirmed that all 20 docs had been uploaded and that *.* search > returned all 20 docs. > > Now when I do a url search on either (for example) q=url:"IAE-UPC-220" or > q="IAE_UPC_0001" I get a result returned for each as expected, ie it now > works as expected. > > So seems I now need to figure out why Nutch isn't crawling the documents. > > Again many thanks. > > P > > > > > On 18 August 2014 11:22, Erick Erickson wrote: > >> I'd pull Nutch out of the mix here as a test. Create >> some test docs (use the exampleDocs directory?) and >> go from there at least long enough to insure that Solr >> does what you expect if the data gets there properly. >> >> You can set this up in about 10 minutes, and test it >> in about 15 more. May save you endless hours. >> >> Because you're conflating two issues here: >> 1> whether Nutch is sending the data >> 2> whether Solr is indexing and searching as you expect. >> >> Some of the Solr/Lucene analysis chains do transformations >> that may not be what you assume, particularly things >> like StandardTokenizer and WordDelimiterFilterFactory. >> >> So I'd take the time to see that the values you're dealing >> with are behaving as you expect. The admin/analysis page >> will help you a _lot_ here. >> >> Best, >> Erick >> >> >> >> >> On Mon, Aug 18, 2014 at 7:16 AM, Paul Rogers >> wrote: >> > Hi Guys >> > >> > I've been checking into this further and have deleted the index a couple >> of >> > times and rebuilt it with the suggestions you've supplied. >> > >> > I had a bit of an epiphany last week and decided to check if the >> document I >> > was searching for was actually in the index (did this by doing a *.* >> query >> > to a file and grep'ing for the 'IAE_UPC_0001@ string). It seems it >> isn't!! >> > Not sure if it was in the original index or not, tho' I suspect not. >> > >> > As far as I can see anything with the reference in the form IAE_UPC_ >> > has not been indexed while those with the reference in the form >> > IAE-UPC- has. Not sure if that's a coincidence or not. >> > >> > Need to see if I can get the docs into the index and then check if the >> > search works or not. Will see if the guys on the Nutch list can shed any >> > light. >> > >> > All the best. >> > >> > P >> > >> > >> > On 4 August 2014 17:09, Jack Krupansky wrote: >> > >> >> The standard tokenizer treats underscore as a valid token character, >> not a >> >> delimiter. >> >> >> >> The word delimiter filter will treat underscore as a delimiter though. >> >> >> >> Make sure your query-time WDF does not have preserveOriginal="1" - but >> the >> >> index-time WDF should have preserveOriginal="1". Otherwise, the query >> >> phrase will generate an extra token which will participate in the >> matching >> >> and might cause a mismatch. >> >> >> >> -- Jack Krupansky >> >> >> >> -Original Message- From: Paul Rogers >> >> Sent: Monday, August 4, 2014 5:55 PM >> >> >> >> To: solr-user@lucene.apache.org >> >> Subject: Re: How to search for phrase "IAE_UPC_0001" >> >> >> >> Hi Guys >> >> >> >> Thanks for the replies. I've had a look at the >> WordDelimiterFilterFactory >> >> and the Term Info for the url field. 
It seems that all the terms exist >> and >> >> I now understand that each url is being broken up using the delimiters >> >> specified. But I think I'm still missing something. >> >> >> >> Am I correct in assuming the minus sign (-) is also a delimiter? >> >> >> >> If so why then does url:"IAE-UPC-0001" return a result (when the url >> >> contains the substring IAE-UPC-0001) whereas url:"IAE_UPC_0001" doesn't >> >> (when the url contains the substring IAE_UPC_0001)? >> >> >> >> Secondly if the url has indeed been broken into the terms IAE UPC and >> 0001 >> >> why do all the searches suggested or tried succeed when the delimiter >> is a >> >> minus sign (-) but not when the delimiter is an underscore (_), >> returning >> >> zero matches? >> >> >> >> Finally, shouldn't the query url:"IAE UPC 0001"~1 work since all it is >> >> looking for is the three terms? >> >> >> >> Many thanks for any enlightenment. >> >> >> >> P >> >> >> >> >> >> >> >> >> >> On 4 August 2014 01:33, Harald Kirsch >> wrote: >> >> >> >> This all depends on how the tokenizers take your URLs apart. To quickly >> >>> see what ended up in the index, go to a core in the UI, select Schema >> >>> Browser, select the field containing your URLs, click on "Load Term >> Info". >> >>> >> >>> In your case, for the field holding the URL you could try to switch to >> a >> >>> tokenizer that defines tokens as a sequence of alphanumeric character
Re: Combining a String Tag with a Numeric Value
If you're doing this in a sharded environment, it may be "interesting". Good Luck! Erick On Mon, Aug 18, 2014 at 2:03 PM, Dave Seltzer wrote: > Thanks Erick, > > I'm not sure I need to score the documents based on the numeric value, but > I am interested in being able to calculate the average (Mean) of all the > numeric values for a given tag. For example, what is the average confidence > of Tag1 across all documents. > > I'm not sure I can do that without building a FunctionQuery. > > -Dave > > > On Mon, Aug 18, 2014 at 12:46 PM, Erick Erickson > wrote: > >> Hmmm, there's no particular "right way". It'd be simpler >> to index these as two separate fields _if_ there's only >> one pair per document. If there are more and you index them >> as two mutliValued fields, there's no good way at _query_ time >> to retain the association. The returned multiValued fields are >> guaranteed to be in the same order of insertion so you can >> display the correct pairs, but you can't use the association >> to score docs. Hmmm, somewhat abstract. OK say you want to >> associate two tag/value pairs, tag1:50 and tag2:100. Say further >> that you have two multiValued fields, Tags and Values and then >> index tag1 and tag2 into Tags and 50 and 100 into Values. >> There's no good way to express "q=tags:tag1 and factor the >> associated value of 50 into the score" >> >> Note that the returned _values_ will be >> Tags: tag1 tag2 >> Values 50 100 >> >> So at that point you can see the associations. >> >> that said, if there's only _one_ such tag/value pair per document, >> it's easy to write a FunctionQuery ( >> http://wiki.apache.org/solr/FunctionQuery) >> that does this. >> >> *** >> >> If you have many tag/value pairs, payloads are probably what you want. >> Here's an end-to-end example: >> >> http://searchhub.org/2014/06/13/end-to-end-payload-example-in-solr/ >> >> Best, >> Erick >> >> On Mon, Aug 18, 2014 at 7:32 AM, Dave Seltzer wrote: >> > Hello! >> > >> > I have some new entity data that I'm indexing which takes the form of: >> > >> > String: EntityString >> > Float: Confidence >> > >> > I want to add these to a generic "Tags" field (for faceting), but I'm not >> > sure how to hold onto the confidence. Token Payloads seem like one >> method, >> > but then I'm not sure how to extract the Payload. >> > >> > Alternatively I could create two fields: TagIndexed which stores just the >> > string value and TagStored which contains a delimited String|Float. >> > >> > What's the right way to do this? >> > >> > Thanks! >> > >> > -D
Re: logging in solr
On 8/18/2014 2:43 PM, M, Arjun (NSN - IN/Bangalore) wrote:
> Currently in my component Solr is logging to catalina.out. What is
> the configuration needed to redirect those logs to some custom logfile,
> e.g. Solr.log?

Solr uses the slf4j library for logging. Simply change your program to use
slf4j, and very likely the logs will go to the same place the Solr logs do.

http://www.slf4j.org/manual.html

See also the wiki page on logging jars and Solr:

http://wiki.apache.org/solr/SolrLogging

Thanks,
Shawn
[ANNOUNCE] [SECURITY] Recommendation to update Apache POI in Apache Solr 4.8.0, 4.8.1, and 4.9.0 installations
Hello Apache Solr Users,

the Apache Lucene PMC wants to make the users of Solr aware of the
following issue:

Apache Solr versions 4.8.0, 4.8.1, and 4.9.0 bundle Apache POI 3.10-beta2
with their binary release tarballs. This version (and all previous ones) of
Apache POI is vulnerable to the following issues:

= CVE-2014-3529: XML External Entity (XXE) problem in Apache POI's OpenXML parser =

Type: Information disclosure

Description: Apache POI uses Java's XML components to parse OpenXML files
produced by Microsoft Office products (DOCX, XLSX, PPTX, ...). Applications
that accept such files from end-users are vulnerable to XML External Entity
(XXE) attacks, which allow remote attackers to bypass security restrictions
and read arbitrary files via a crafted OpenXML document that provides an
XML external entity declaration in conjunction with an entity reference.

= CVE-2014-3574: XML Entity Expansion (XEE) problem in Apache POI's OpenXML parser =

Type: Denial of service

Description: Apache POI uses Java's XML components and Apache XMLBeans to
parse OpenXML files produced by Microsoft Office products (DOCX, XLSX,
PPTX, ...). Applications that accept such files from end-users are
vulnerable to XML Entity Expansion (XEE) attacks ("XML bombs"), which allow
remote attackers to consume large amounts of CPU resources.

The Apache POI PMC released a bugfix version (3.10.1) today.

Solr users are affected by these issues if they enable the "Apache Solr
Content Extraction Library (Solr Cell)" contrib module from the folder
"contrib/extraction" of the release tarball. Users of Apache Solr are
strongly advised to keep the module disabled if they don't use it.

Alternatively, users of Apache Solr 4.8.0, 4.8.1, or 4.9.0 can update the
affected libraries by replacing the vulnerable JAR files in the
distribution folder. Users of previous versions have to update their Solr
release first; patching older versions is impossible.

To replace the vulnerable JAR files, follow these steps:

- Download the Apache POI 3.10.1 binary release:
  http://poi.apache.org/download.html#POI-3.10.1
- Unzip the archive.
- Delete the following files in your "solr-4.X.X/contrib/extraction/lib"
  folder:
  # poi-3.10-beta2.jar
  # poi-ooxml-3.10-beta2.jar
  # poi-ooxml-schemas-3.10-beta2.jar
  # poi-scratchpad-3.10-beta2.jar
  # xmlbeans-2.3.0.jar
- Copy the following files from the base folder of the Apache POI
  distribution to the "solr-4.X.X/contrib/extraction/lib" folder:
  # poi-3.10.1-20140818.jar
  # poi-ooxml-3.10.1-20140818.jar
  # poi-ooxml-schemas-3.10.1-20140818.jar
  # poi-scratchpad-3.10.1-20140818.jar
- Copy "xmlbeans-2.6.0.jar" from POI's "ooxml-lib/" folder to the
  "solr-4.X.X/contrib/extraction/lib" folder.
- Verify that the "solr-4.X.X/contrib/extraction/lib" folder no longer
  contains any files with version number "3.10-beta2".
- Verify that the folder contains one xmlbeans JAR file, with version
  2.6.0.

If you just want to disable extraction of Microsoft Office documents,
delete the files above and don't replace them. "Solr Cell" will
automatically detect this and disable Microsoft Office document extraction.

Coming versions of Apache Solr will have the updated libraries bundled.

Happy Searching and Extracting,
The Apache Lucene Developers

PS: Thanks to Stefan Kopf, Mike Boufford, and Christian Schneider for
reporting these issues!

-
Uwe Schindler
uschind...@apache.org
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/
Apache Solr Wiki
Dear Solr Wiki admin,

We are using Solr for our multilingual Asian-language keyword search, as
well as for a visual-similarity search engine (via the pixolution plugin).
We would like to update the "Powered by Solr" section, as well as help add
to the knowledge base for other Solr setups.

Can you add me, username "MarkSun", as a contributor to the wiki?

Thank you!

Cheers,
Mark Sun
CTO

MotionElements Pte Ltd
190 Middle Road, #10-05 Fortune Centre
Singapore 188979
mark...@motionelements.com
www.motionelements.com
=
Asia-inspired Stock Animation | Video Footage | AE Template online marketplace
=
This message may contain confidential and/or privileged information. If
you are not the addressee or authorized to receive this for the addressee,
you must not use, copy, disclose or take any action based on this message
or any information herein. If you have received this message in error,
please advise the sender immediately by reply e-mail and delete this
message. Thank you for your cooperation.
Re: Apache Solr Wiki
Done, you should have edit rights now!

Best,
Erick

On Mon, Aug 18, 2014 at 6:01 PM, Mark Sun wrote:
> Dear Solr Wiki admin,
>
> We are using Solr for our multilingual Asian-language keyword search, as
> well as for a visual-similarity search engine (via the pixolution
> plugin). We would like to update the "Powered by Solr" section, as well
> as help add to the knowledge base for other Solr setups.
>
> Can you add me, username "MarkSun", as a contributor to the wiki?
>
> Thank you!
>
> Cheers,
> Mark Sun
> CTO
> MotionElements Pte Ltd
Apache solr sink issue
Hi,

I want to index a log file in Solr using Flume + the Apache Solr sink.
I am referring to the below-mentioned URL:

https://cwiki.apache.org/confluence/display/FLUME/How+to+Setup+Solr+Sink+for+Flume

Error from the Flume console:

2014-08-19 15:38:56,451 (concurrentUpdateScheduler-2-thread-1) [ERROR -
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.handleError(ConcurrentUpdateSolrServer.java:354)]
error
java.lang.Exception: Bad Request
request: http://xxx.xx.xx:8983/solr/update?wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:208)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)

Error from the Solr console:

473844 [qtp176433427-19] ERROR org.apache.solr.core.SolrCore -
org.apache.solr.common.SolrException: Document is missing mandatory
uniqueKey field: id

Can anyone help me with this issue and with the steps for integrating
Flume with the Solr sink?

Regards,
Jeniba Johnson

The contents of this e-mail and any attachment(s) may contain confidential
or privileged information for the intended recipient(s). Unintended
recipients are prohibited from taking action on the basis of information
in this e-mail and using or disseminating the information, and must notify
the sender and delete it from their system. L&T Infotech will not accept
responsibility or liability for the accuracy or completeness of, or the
presence of any virus or disabling code in, this e-mail.
Re: Apache solr sink issue
Do you have the "id" field defined in your schema? It is not mandatory to
have a uniqueKey field, but if you declare one then you have to provide it;
otherwise you can remove it. See the wiki page below for more details:

http://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field

Some options to generate this field if your document cannot derive one:

https://wiki.apache.org/solr/UniqueKey

On Mon, Aug 18, 2014 at 10:48 PM, Jeniba Johnson
<jeniba.john...@lntinfotech.com> wrote:
> Hi,
>
> I want to index a log file in Solr using Flume + the Apache Solr sink.
> I am referring to the below-mentioned URL:
> https://cwiki.apache.org/confluence/display/FLUME/How+to+Setup+Solr+Sink+for+Flume
>
> Error from the Solr console:
> 473844 [qtp176433427-19] ERROR org.apache.solr.core.SolrCore -
> org.apache.solr.common.SolrException: Document is missing mandatory
> uniqueKey field: id
>
> Can anyone help me with this issue and with the steps for integrating
> Flume with the Solr sink?
>
> Regards,
> Jeniba Johnson
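As an illustration (a sketch assuming the stock Solr 4.x example config,
not the poster's actual setup), here is the uniqueKey declaration the error
refers to, plus an update chain that can auto-generate ids for documents
that arrive without one:

  <!-- schema.xml: declare the uniqueKey field -->
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <uniqueKey>id</uniqueKey>

  <!-- solrconfig.xml: generate a UUID for documents missing an id -->
  <updateRequestProcessorChain name="uuid">
    <processor class="solr.UUIDUpdateProcessorFactory">
      <str name="fieldName">id</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

Note the chain only takes effect if it is attached to the update handler,
e.g. via an update.chain parameter on /update requests.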
Any recommendation for Solr Cloud version.
Hi,

I am trying to build a new SolrCloud cluster to replace our old cluster
(2 indexers + 2 searchers). The version I am using now is 4.1. Is newer
better, i.e. should I go with version 4.9.0?

Please give me any suggestions.

Thanks,
Chunki.
Exact match?
If I have a long string, how do I match on 90% of the terms to see if there
is a duplicate? If I add the field and index it, what is the best way to
require a 90% match, i.e. (# of terms matched) / (# of terms in the field)?

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
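One possible direction (a sketch, not a confirmed answer from the thread):
the dismax/edismax query parsers accept a minimum-should-match ("mm")
parameter, which can express "at least 90% of the query terms must match".
A hypothetical handler in solrconfig.xml, with an illustrative field name:

  <!-- Send the candidate duplicate's text as the query string q -->
  <requestHandler name="/dedupe" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="qf">long_string_field</str>
      <str name="mm">90%</str>
    </lst>
  </requestHandler>

Caveat: mm is measured against the terms of the query, not the terms stored
in the field, so this matches "90% of the submitted text's terms occur in
the document", which is close to but not exactly the ratio asked about.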