Re: SORTING RESULTS BASED ON RELAVANCY

2013-09-18 Thread Alexandre Rafalovitch
The default sort is by relevancy. So, if you are getting results in the wrong
order, I think they are relevant in ways different from what you expect.
Depending on the algorithm you use, there are different boosting functions.

You may need to give more details: the algorithm, how you would know whether
relevance sorting is working, etc.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Wed, Sep 18, 2013 at 1:50 PM, PAVAN  wrote:

> Hi,
>
> I am using fuzzy logic and it is giving exact results, but I need to
> sort the results based on relevancy, i.e. closer matches come first.
>
> Can anyone help with this?
>
>
> Regards,
> Pavan.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SORTING-RESULTS-BASED-ON-RELAVANCY-tp4090789.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Re-Ranking results based on DocValues with custom function.

2013-09-18 Thread Mathias Lux
Got it! Just sharing ... and maybe for inclusion in the Java
API docs of ValueSource :)

For sorting one needs to implement the method

public double doubleVal(int) of the class ValueSource

then it works like a charm.
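For archive readers: the kind of per-document computation such a doubleVal() would drive for LIRE-style features is a distance between a query histogram and a stored one. A self-contained sketch of that idea only (the class name and the L1 metric are illustrative assumptions, not LIRE's actual code):

```java
import java.util.Base64;

public class HistogramDistance {

    // L1 (city-block) distance between two byte-valued feature histograms:
    // the sort key a doubleVal() implementation would return per document.
    static double l1(byte[] a, byte[] b) {
        double d = 0;
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            d += Math.abs(a[i] - b[i]);
        }
        return d;
    }

    public static void main(String[] args) {
        // The query feature arrives base64-encoded, as in the lirefunc call below.
        byte[] q = Base64.getDecoder().decode(
                "FQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA=");
        System.out.println(l1(q, q)); // prints 0.0: identical features sort first
    }
}
```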

cheers,
  Mathias

On Tue, Sep 17, 2013 at 6:28 PM, Chris Hostetter
 wrote:
>
> : It basically allows for searching for text (which is associated to an
> : image) in an index and then getting the distance to a sample image
> : (base64 encoded byte[] array) based on one of five different low level
> : content based features stored as DocValues.
>
> very cool.
>
> : So there's one tiny question I still have ;) When I'm trying to
> : do a "sort" I'm getting
> :
> : "msg": "sort param could not be parsed as a query, and is not a field
> : that exists in the index:
> : lirefunc(cl_hi,FQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA=)",
> :
> : for the call 
> http://localhost:9000/solr/lire/select?q=*%3A*&sort=lirefunc(cl_hi%2CFQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA%3D)+asc&fl=id%2Ctitle%2Clirefunc(cl_hi%2CFQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA%3D)&wt=json&indent=true
>
> Hmmm...
>
> i think the crux of the issue is your string literal.  function parsing
> tries to make life easy for you by not requiring string literals to be
> quoted unless they conflict with other function names or field names
> etc  on top of that the sort parsing code is kind of heuristic based
> (because it has to account for functions or field names or wildcards,
> followed by other sort clauses, etc...) so in that context the special
> characters like '=' in your base64 string literal might be confusing the
> heuristics.
>
> can you try quoting the string literal and see if that works?
>
> For example, when i try using strdist with your base64 string in a sort
> param using the example configs i get the same error...
>
> http://localhost:8983/solr/select?q=*:*&sort=strdist%28name,FQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA=,jw%29+asc
>
> but if i quote the string literal it works fine...
>
> http://localhost:8983/solr/select?q=*:*&sort=strdist%28name,%27FQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA=%27,jw%29+asc
>
>
>
> -Hoss



-- 
Dr. Mathias Lux
Assistant Professor, Klagenfurt University, Austria
http://tinyurl.com/mlux-itec
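As a footnote to Hoss's quoting advice quoted above: once the base64 literal is single-quoted, the whole sort clause still has to be URL-encoded before it goes on the query string. A minimal sketch (the helper class is made up for illustration):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SortParamBuilder {

    // Build a strdist(...) sort clause with the string literal single-quoted,
    // then URL-encode the whole clause for use as the sort parameter.
    static String sortParam(String field, String literal) {
        String raw = "strdist(" + field + ",'" + literal + "',jw) asc";
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // '=' becomes %3D and the single quotes become %27, so the sort
        // parser no longer trips over the special characters.
        System.out.println(sortParam("name", "AB="));
        // prints strdist%28name%2C%27AB%3D%27%2Cjw%29+asc
    }
}
```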


Re: Stop zookeeper from batch

2013-09-18 Thread Prasi S
Yeah, but it's not yet in ZooKeeper's latest release. Is it fine to
use it?


On Wed, Sep 18, 2013 at 2:39 AM, Furkan KAMACI wrote:

> Are you looking for that:
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-1122
>
> 16 Eylül 2013 Pazartesi tarihinde Prasi S  adlı
> kullanıcı şöyle yazdı:
> > Hi,
> > We have set up SolrCloud with ZooKeeper and 2 Tomcats. We are using a
> > batch file to start ZooKeeper, upload config files and start the Tomcats.
> >
> > Now, I need to stop ZooKeeper from the batch file. How is this possible?
> >
> > I'm using Windows Server, ZooKeeper version 3.4.5.
> >
> > Please help.
> >
> > Thanks,
> > Prasi
> >
>


Re: FAcet with " " values are displayes in output

2013-09-18 Thread Upayavira
Filter them out in your query, or in your display code.

Upayavira

On Wed, Sep 18, 2013, at 06:36 AM, Prasi S wrote:
> Hi ,
> I'm using Solr 4.4 for our search. When I query for a keyword, it returns
> empty-valued facets in the response:
> [facet output garbled in the archive: an empty facet value with count 1
> alongside the normal values]
> 
> I have also tried using the facet.missing parameter, but no change. How
> can we
> handle this?
> 
> 
> Thanks,
> Prasi


Re: SORTING RESULTS BASED ON RELAVANCY

2013-09-18 Thread Gora Mohanty
On 18 September 2013 12:39, Alexandre Rafalovitch  wrote:
> The default sort is by relevancy. So, if you are getting results in the
> wrong order, I think they are relevant in ways different from what you
> expect. Depending on the algorithm you use, there are different boosting
> functions.
[...]

Also, you can get an explanation of the scoring by adding
&debugQuery=on to the Solr search URL. Please see
http://wiki.apache.org/solr/CommonQueryParameters#debugQuery

Regards,
Gora


Re: Solr SpellCheckComponent only shows results with certain fields

2013-09-18 Thread Raheel Hasan
What about this query? Try it and see if you get suggestions here:
/solr/collection1/select?q=*%3Abecaus&wt=json&indent=true&spellcheck=true


On Wed, Sep 18, 2013 at 4:02 AM, jazzy  wrote:

> I'm trying to get the Solr SpellCheckComponent working but am running into
> some issues. When I run
> .../solr/collection1/select?q=%3A&wt=json&indent=true
>
> These results are returned
>
> {
>   "responseHeader": {
> "status": 0,
> "QTime": 1,
> "params": {
>   "indent": "true",
>   "q": "*:*",
>   "_": "1379457032534",
>   "wt": "json"
> }
>   },
>   "response": {
> "numFound": 2,
> "start": 0,
> "docs": [
>   {
> "enterprise_name": "because",
> "name": "doc1",
> "enterprise_id": "100",
> "_version_": 1446463888248799200
>   },
>   {
> "enterprise_name": "what",
> "name": "RZTEST",
> "enterprise_id": "102",
> "_version_": 1446464432735518700
>   }
> ]
>   }
> }
> Those are the values that I have indexed. Now when I want to query for
> spelling I get some weird results.
>
> When I run
>
> .../solr/collection1/select?q=name%3Arxtest&wt=json&indent=true&spellcheck=true
>
> The results are accurate and I get
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":4,
> "params":{
>   "spellcheck":"true",
>   "indent":"true",
>   "q":"name:rxtest",
>   "wt":"json"}},
>   "response":{"numFound":0,"start":0,"docs":[]
>   },
>   "spellcheck":{
> "suggestions":[
>   "rxtest",{
> "numFound":1,
> "startOffset":5,
> "endOffset":11,
> "suggestion":["rztest"]}]}}
> Anytime I run a query without the name values I get 0 results back.
>
> /solr/collection1/select?q=enterprise_name%3Abecaus&wt=json&indent=true&spellcheck=true
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":5,
> "params":{
>   "spellcheck":"true",
>   "indent":"true",
>   "q":"enterprise_name:becaus",
>   "wt":"json"}},
>   "response":{"numFound":0,"start":0,"docs":[]
>   },
>   "spellcheck":{
> "suggestions":[]}}
> My guess is that there is something wrong in my schema, but everything
> looks fine.
>
> Schema.xml
>
> 
>  required="true" />
>  stored="true"/>
>
>  multiValued="true" />
>
> stored="true"/>
> stored="true" multiValued="true"/>
> stored="true" multiValued="true"/>
>
>  
>  
>
>
>  positionIncrementGap="100">
>   
> 
>  words="stopwords.txt" />
>
> 
>   
>   
> 
>  words="stopwords.txt" />
>  ignoreCase="true" expand="true"/>
> 
>   
> 
> solrconfig.xml
>
> 
>
>  
>explicit
>10
>text
>
>default
>
>   wordbreak
>
>   false
>
>   false
>
>   5
> 
>
>  
> spellcheck
>   
> 
>
> 
>
>   
>
> default
>
> solr.IndexBasedSpellChecker
>
> name
>
> ./spellchecker
>
> 0.5
>
> .0001
> true
>   
>
>   
> wordbreak
> solr.WordBreakSolrSpellChecker
> name
> true
> true
> 3
> true
>   
>
>
>   text_general
> 
>
> Any help would be appreciated.
> Thanks!
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-SpellCheckComponent-only-shows-results-with-certain-fields-tp4090727.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,
Raheel Hasan


Re: FAcet with " " values are displayes in output

2013-09-18 Thread Prasi S
How do I filter them in the query itself?

Thanks,
Prasi


On Wed, Sep 18, 2013 at 1:06 PM, Upayavira  wrote:

> Filter them out in your query, or in your display code.
>
> Upayavira
>
> On Wed, Sep 18, 2013, at 06:36 AM, Prasi S wrote:
> > Hi ,
> > Im using solr 4.4 for our search. When i query for a keyword, it returns
> > empty valued facets in the response
> >
> > [facet output garbled in the archive: an empty facet value with count 1
> > alongside the normal values]
> >
> > I have also tried using facet.missing parameter., but no change. How can
> > we
> > handle this.
> >
> >
> > Thanks,
> > Prasi
>


Re: FAcet with " " values are displayes in output

2013-09-18 Thread Erik Hatcher
This is likely because an empty value was added to the Country field for at 
least one document in that result set.  I imagine this is a data issue: you 
either need to clean up the data or avoid indexing blank values.

Erik

On Sep 18, 2013, at 1:36 AM, Prasi S  wrote:

> Hi ,
> Im using solr 4.4 for our search. When i query for a keyword, it returns
> empty valued facets in the response
> [facet output garbled in the archive: an empty facet value with count 1
> alongside the normal values]
> 
> I have also tried using facet.missing parameter., but no change. How can we
> handle this.
> 
> 
> Thanks,
> Prasi



Generating similar (related) searches a la Google

2013-09-18 Thread Mr Havercamp

I am using Apache Solr 3.6.

I have been playing around with the idea of providing a "similar" search 
in the same way Google provides a link against some results with the 
ability to search for pages similar to the current result: E.g.


related:lucene.apache.org/solr/ apache solr

One method I tried was to use MoreLikeThis on my title field to generate 
a list of results:


?q=experiment&fl=key,id,title&fq=view:item&bf=title^100 
dc.description.abstract_sm^50&mlt=true&mlt.fl=title


which gives me moreLikeThis results. If an item has a matching 
moreLikeThis result with numFound not equal to zero I can go ahead and 
link to a new query using my /mlt request handler, using a unique item 
key and the keyword to build the query:


q=key:com_jspace.item.96 AND 
experiment&fl=title&mlt.fl=title&start=0&rows=10&mlt.interestingTerms=details


This works well, providing me with paging, etc., but one downside is the 
inability to highlight results with the keyword "experiment".


It is my understanding that highlighting is not available as part of the 
mlt request handler so I'm wondering if there is another way to generate 
my search results for items related to another item? Or perhaps I'm 
approaching this all wrong.


Any direction, even "you can't do that" much appreciated.

Cheers


Hayden


Solr Cloud dataimport freezes

2013-09-18 Thread kowish.adamosh
Hi guys,

I have a problem with data import (based on SQL queries) in SolrCloud. I'm
trying to import ~500,000,000 documents and I've created 30 logical
shards on 2 physical machines. Documents are distributed by composite id.
After some time (5-10 minutes; about 400,000 documents) SolrCloud stops
indexing documents. This is because the indexing thread parks and waits on a
semaphore:
org.apache.solr.update.SolrCmdDistributor#semaphore.acquire() in method
submit.

While indexing I see jdbc calls in the stack trace, but after it parks on the
semaphore I don't see any jdbc calls (only Solr and JDK method calls).

Version of Solr: 4.4
Version of Lucene: 4.4

*With one shard and one physical machines everything is OK*
*With one shard and two physical machines (one leader, one replica)
everything is OK*

This is a really big problem for us because of the large number of documents
we have to index across shards. We have unique queries with sorting, which
leads to 1-minute response times without sharding.

Best,
Kowish



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Cloud-dataimport-freezes-tp4090812.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Cloud dataimport freezes

2013-09-18 Thread kowish.adamosh
Update:
- it works for 8 shards. 
I'm going to test it on 16 shards.

Any ideas what is going on? :-)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Cloud-dataimport-freezes-tp4090812p4090832.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem indexing windows files

2013-09-18 Thread Yossi Nachum
Thanks for your help.
I tried to look at the logs but didn't see anything in the Solr or ManifoldCF
log files.
I don't know where the Tika log file is; I downloaded the Solr 4.4 binary and
am using the example setup included there.

On Wed, Sep 18, 2013 at 12:02 AM, Furkan KAMACI wrote:

> Firstly:
>
> This may not be a Solr-related problem. Did you check the Solr log file?
> Tika may run into problems in some situations. For example,
> when parsing HTML that has a base64-encoded image it may have
> problems. If you find the correct logs you can detect it. Also take care
> with ManifoldCF; there may be some problem there too.
>
> 17 Eylül 2013 Salı tarihinde Yossi Nachum  adlı
> kullanıcı şöyle yazdı:
> > Hi,
> >
> > I am trying to index my windows pc files with manifoldcf version 1.3 and
> > solr version 4.4.
> >
> > I create output connection and repository connection and started a new
> job
> > that scan my E drive.
> >
> > Everything seems to work OK, but after a few minutes Solr stops
> > getting new files to index. I am seeing that through the Tomcat log file.
> >
> > On manifold crawler ui I see that the job is still running but after few
> > minutes I am getting the following error:
> > "Error: Repeated service interruptions - failure processing document:
> Read
> > timed out"
> >
> > I am seeing that the tomcat process constantly consumes 100% of one cpu
> > (I have two cpus) even after I get the error message from the manifoldcf
> > crawler ui.
> >
> > I check the thread dump in solr admin and saw that the following threads
> > take the most cpu/user time
> > "
> > http-8080-3 (32)
> >
> >- java.io.FileInputStream.readBytes(Native Method)
> >- java.io.FileInputStream.read(FileInputStream.java:236)
> >- java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> >- java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
> >- java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> >- org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
> >- java.io.FilterInputStream.read(FilterInputStream.java:133)
> >- org.apache.tika.io.TailStream.read(TailStream.java:117)
> >- org.apache.tika.io.TailStream.skip(TailStream.java:140)
> >-
> org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
> >- org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
> >-
> >
>  org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
> >- org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
> >-
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> >-
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> >-
> >
>  org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> >-
> >
>
>  
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
> >-
> >
>
>  
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >-
> >
>
>  
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >-
> >
>
>  
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
> >- org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
> >-
> >
>
>  
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
> >-
> >
>
>  
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
> >-
> >
>
>  
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
> >-
> >
>
>  
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> >-
> >
>
>  
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >-
> >
>
>  
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> >-
> >
>
>  
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> >-
> >
>
>  org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> >-
> >
>
>  org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> >-
> >
>
>  
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> >-
> >
>
>  org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> >-
> >
>  org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
> >-
> >
>
>  
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
> >-
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
> >- java.lang.Thread.run(Thread.java:679)
> >
> > "
> >
> > Does anyone know what I can do? How do I debug this issue? How can I
> > check which file causes Tika to work so hard?
> > I don't see anything in the log files and I

SolrCloud - Service unavailable

2013-09-18 Thread Indika Tantrigoda
Hi All,

I am using 3 Solr instances behind an Amazon ELB, with 1 shard. Serving
data via Solr works as expected; however, I noticed a 503 error popping up a
few times in the applications accessing Solr. Access to Solr goes through
the AWS ELB.

3 Zookeeper instances also run on the same instances as Solr on a separate
disk.

Solr version is 4.4.

This seems to be a sporadic issue. Has anyone else observed this kind of
behavior?

Thanks,
Indika


Installation issue with solr server

2013-09-18 Thread Chhaya Vishwakarma
Hi,

I have installed the Solr server on Ubuntu 12.04 LTS. I am able to access
http://machineip:8983/solr, but when I do curl "http://machineip:8983/solr"
it gives me a "Proxy authorization error".
What can the problem be? Is it due to corporate firewalls?
I have set proxy settings in .bashrc, /etc/apt/apt.conf and in
/etc/environment, and restarted the machine,
but it did not work.

Regards,
Chhaya Vishwakarma





Re: SORTING RESULTS BASED ON RELAVANCY

2013-09-18 Thread PAVAN
Hi alex,

 Thanks for your reply. Can you please check the following details and
suggest how I can do this? That would be very helpful.


I am passing query parameters like 

http://localhost:8080/solr/core/c=cityname&s=iphne+4&s1=iphne~0.5&s2=4~0.5

Here "s" is the main string, split into s1 and s2 for fuzzy matching.


If a user searches for "iphne 4", it first checks for an exact match; if that
is not found, I split the string into two strings s1 and s2, and add ~0.5 to
both s1 and s2.


I need the "iphone 4" result first.
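The split-and-append step described above can be sketched like this (the class and method names are illustrative only, not Solr API):

```java
public class FuzzySplit {

    // Split the raw search string on whitespace and append a fuzzy
    // similarity of ~0.5 to each term, as described in the post.
    static String fuzzyQuery(String input) {
        StringBuilder sb = new StringBuilder();
        for (String term : input.trim().split("\\s+")) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(term).append("~0.5");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fuzzyQuery("iphne 4")); // prints iphne~0.5 4~0.5
    }
}
```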


And I did the configuration in the following way...


AND
   fsw_title
   15
   {!edismax v=$s}
   
   city:All OR _query_:"{!field f=city v=$c}"
   true
   mpId
   3
   true
   
   edismax
   false
   tsw_title^15.0 tf_title^10.0 tsw_keywords^1
keywords^0.5
   fsw_title~1^50.0
   fsw_title~1^25.0
   sum(product(typeId,100),weightage)
   
OR
   fsw_title
   20
   _query_:"{!edismax qf=$qfs1 pf=$pfx pf2=$pf2x v=$s1}" AND
_query_:"{!edismax qf=$qfs2 v=$s2}"
   
   city:All OR _query_:"{!field f=city v=$c}"
   true
   mpId
   5
   true
   
   false
   fsw_title^30 tsw_title^20 tf_title^15.0
keywords^1.0
   tsw_title^15.0 tf_title^10.0 tsw_keywords^1
keywords^0.5
   fsw_title~1^100.0
   fsw_title~1^50.0
   fsw_title~1^25.0
   product(typeId,100)






--
View this message in context: 
http://lucene.472066.n3.nabble.com/SORTING-RESULTS-BASED-ON-RELAVANCY-tp4090789p4090794.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Installation issue with solr server

2013-09-18 Thread Gora Mohanty
On 18 September 2013 16:26, Chhaya Vishwakarma
 wrote:
> Hi,
>
> I have installed solr server on Ubuntu 12.04 LTS.I am able to access 
> http://machineip:8983/solr but when i do curl "http://machineip:8983/solr";
> Its giving me "Proxy authorization error"
> What can be the problem? Is it due to corporate firewalls?
> I have given proxy settings in .bashrc file ,/etc/apt/apt.conf  and in 
> /etc/environment file and restarted the machine
> But did not work.

This question is off-topic for this list. You would be better
off asking on an Ubuntu-specific list.

However, please see if this helps:
http://askubuntu.com/questions/15719/where-are-the-system-wide-proxy-server-settings

Regards,
Gora


Re: Solr Cloud dataimport freezes

2013-09-18 Thread Shawn Heisey
On 9/18/2013 3:40 AM, kowish.adamosh wrote:
> I have a problem with data import (based on database sql) in Solr Cloud. I'm
> trying to import ~500 000 000 of documents and I've created 30 logical
> shards on 2 physical machines. Documents are distributed by composite id.
> After some time (5-10 minutes; about 400 000 documents) Solr Cloud stops
> indexing documents. This is because indexing thread parks and waits on
> semaphore:
> org.apache.solr.update.SolrCmdDistributor#semaphore.acquire() in method
> submit.

There are some SolrCloud bugs that we expect will be fixed in version
4.5.  Basically what happens is that when a large number of updates are
being distributed from whichever core receives them to the appropriate
shard replicas, managing all those requests results in a deadlock.  If
everything goes well with the release, 4.5 will be out sometime within
the next two weeks.

You can always download and build the "branches/lucene_solr_4_5" code
branch from SVN if you want to try out what will become Solr 4.5:

http://wiki.apache.org/solr/HowToContribute#Getting_the_source_code

SOLR-4816 is semi-related, because it helps avoid the problem in the
first place when using CloudSolrServer in a java program.  I'm having a
hard time finding the jira issue number(s) for the underlying
problem(s), but I know some changes were committed recently specifically
for this problem.

Thanks,
Shawn
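For illustration only: the "parks and waits on semaphore" state described in this thread is ordinary java.util.concurrent.Semaphore behavior once all permits are taken; this sketch assumes nothing about Solr's actual internals.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class SemaphoreParkDemo {
    public static void main(String[] args) throws InterruptedException {
        // A semaphore with a fixed number of permits, bounding how many
        // requests may be in flight at once.
        Semaphore permits = new Semaphore(2);
        permits.acquire();
        permits.acquire();

        // All permits are taken: a plain acquire() here would park the
        // thread indefinitely, which is what the stack traces show.
        // tryAcquire demonstrates the exhaustion without blocking forever.
        boolean got = permits.tryAcquire(100, TimeUnit.MILLISECONDS);
        System.out.println(got); // prints false
    }
}
```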



Re: SolrCloud - Service unavailable

2013-09-18 Thread Shawn Heisey
On 9/18/2013 8:12 AM, Indika Tantrigoda wrote:
> I am using 3 Solr instances behind an Amazon ELB, with 1 shard. Serving
> data via Solr works as expected; however, I noticed a 503 error popping up
> a few times in the applications accessing Solr. Access to Solr goes
> through the AWS ELB.
> 
> 3 Zookeeper instances also run on the same instances as Solr on a separate
> disk.
> 
> Solr version is 4.4.
> 
> This issue seems to be a sporadic issue. Has anyone else observed this kind
> of behavior ?

What kind of session timeouts have you configured on the amazon load
balancer?  I've never used amazon services, but hopefully this is
configurable.  If the timeout is low enough, it could be just that the
request is taking longer than that to execute.  You may need to increase
that timeout.

Aside from general performance issues, one thing that can cause long
request times is stop-the-world Java garbage collections.  This can be a
sign that your heap is too small, too large, or that your garbage
collection hasn't been properly tuned.

http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems
http://wiki.apache.org/solr/SolrPerformanceProblems#How_much_heap_space_do_I_need.3F

That same wiki page has another section about the OS disk cache.  Not
having enough memory for this is the cause of a lot of performance issues:

http://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache

Thanks,
Shawn



Solrcloud - adding a node as a replica?

2013-09-18 Thread didier deshommes
Hi,
How do I add a node as a replica to a SolrCloud cluster? Here is my
situation: some time ago, I created several collections
with replicationFactor=2. Now I need to add a new replica. I thought just
starting a new node and re-using the same ZooKeeper ensemble would make it
automatically a replica, but that isn't the case. Do I need to delete and
re-create my collections with the right replicationFactor (3 in this case)?
I am using Solr 4.3.0.

Thanks,
didier


Re: Solr SpellCheckComponent only shows results with certain fields

2013-09-18 Thread jazzy
Hey,

I figured it out!

So the reason that only the name field was working is that name was the only
field configured in the solrconfig. Once I fixed that, I followed this
link to solve the rest of the problem:

SOLR suggester multiple field autocomplete
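For archive readers hitting the same thing: the fix boils down to pointing the spellchecker at the right field in solrconfig.xml. A minimal sketch, using field and dictionary names from this thread (the exact surrounding configuration will differ):

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_general</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- suggestions are only built from this field; a query against a
         field that is not configured here returns no suggestions -->
    <str name="field">enterprise_name</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```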




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-SpellCheckComponent-only-shows-results-with-certain-fields-tp4090727p4090891.html
Sent from the Solr - User mailing list archive at Nabble.com.


Memory Using In Faceted Search (UnInvertedField's)

2013-09-18 Thread anton
Hello,
 
I'm using Solr 4.3.1 for faceted search and have 4 fields used for faceting.
My question is about memory consumption.
I've set the heap size to 6 GB of RAM, but the resource monitor shows it
using much more than that: up to 10 GB, of which 4 GB is reported as
shareable memory.
I've calculated the size of the cached set of uninverted fields and it's
2 GB. I'm fine with that, and both the GC monitor and the 'fieldValueCache'
stats in Solr's 'Plugins/Stats' UI report it. But I can't understand what
memory is being reserved after fieldValueCache is filled with uninverted
fields (right in the UnInvertedField.uninvert method) and then not used (or
not released).
Is that a memory leak? Or is it something I should tune by making the
garbage collector more aggressive (GC only shows me 2.x GB in Old Space, and
I see those UnInvertedFields there in the heap dump)?
 
Some info: index size is 76 GB. I have 6 shards. Windows OS. Java 6.0.24.
 
Best regards,
Anton.


Re: FAcet with " " values are displayes in output

2013-09-18 Thread tamanjit.bin...@yahoo.co.in
Any analysis happening on the country field during indexing? If so, then the
facets are on tokens.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/FAcet-with-values-are-displayes-in-output-tp4090777p4090904.html
Sent from the Solr - User mailing list archive at Nabble.com.


DIH field defaults or re-assigning field values

2013-09-18 Thread P Williams
Hi All,

I'm using the DataImportHandler to import documents to my index.  I assign
one of my document's fields by using a sub-entity from the root to look for
a value in a file.  I've got this part working.  If the value isn't in the
file or the file doesn't exist I'd like the field to be assigned a default
value.  Is there a way to do this?

I think I'm looking for a way to re-assign the value of a field.  If this
is possible then I can assign the default value in the root entity and
overwrite it if the value is found in the sub-entity. Ideas?
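For what it's worth, one sketch of the "default in the root, override in the sub-entity" idea is a TemplateTransformer with a constant template. Everything here is a placeholder (my_field, the template value, the elided queries), and whether the sub-entity value replaces or duplicates the default depends on the field's multiValued setting, so treat this as a starting point, not a verified recipe:

```xml
<entity name="root" query="..." transformer="TemplateTransformer">
  <!-- assign the default value up front in the root entity -->
  <field column="my_field" template="my-default-value" />

  <!-- the sub-entity supplies the real value when the file has one -->
  <entity name="lookup" query="...">
    <field column="my_field" />
  </entity>
</entity>
```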

Thanks,
Tricia


Solr 4.4.0: Plugin init failure for [schema.xml] analyzer/tokenizer

2013-09-18 Thread Abhijith Jain -X (abhijjai - DIGITAL-X INC at Cisco)
Hello Experts,

I am having trouble upgrading from Solr 3.6 to Solr 4.4.0. I have placed the 
required jars in the "lib" directory. When I start the Tomcat instance it 
throws the following error. Part of the "conf/schema.xml" file is also pasted 
below.

Solr 4.4.0 works perfectly if I comment out the following lines.


Schema.xml:












Error log:


575  [coreLoadExecutor-3-thread-1] INFO  org.apache.solr.schema.IndexSchema  – 
Reading Solr Schema from schema.xml
583  [coreLoadExecutor-3-thread-2] INFO  org.apache.solr.schema.IndexSchema  – 
[nipTrendHistory] Schema name=NIP
597  [coreLoadExecutor-3-thread-1] INFO  org.apache.solr.schema.IndexSchema  – 
[nip] Schema name=NIP
649  [coreLoadExecutor-3-thread-1] ERROR org.apache.solr.core.CoreContainer  – 
Unable to create core: nip
org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] 
fieldType "delimiterPatternMultiValue": Plugin init failure for [schema.xml] 
analyzer/tokenizer: class 
com.mycomp.as.sts.nps.solr.analysis.MultiValueTokenizerFactory
at 
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:467)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:164)
at 
org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
at 
org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:619)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:657)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.solr.common.SolrException: Plugin init failure for 
[schema.xml] analyzer/tokenizer: class 
com.mycomp.as.sts.nps.solr.analysis.MultiValueTokenizerFactory
at 
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
at 
org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:362)
at 
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
at 
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
at 
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
... 16 more
Caused by: java.lang.ClassCastException: class 
com.mycomp.as.sts.nps.solr.analysis.MultiValueTokenizerFactory
at java.lang.Class.asSubclass(Class.java:3018)
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:433)
at 
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:543)
at 
org.apache.solr.schema.FieldTypePluginLoader$2.create(FieldTypePluginLoader.java:342)
at 
org.apache.solr.schema.FieldTypePluginLoader$2.create(FieldTypePluginLoader.java:335)
at 
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
... 20 more

651  [coreLoadExecutor-3-thread-1] ERROR org.apache.solr.core.CoreContainer  – 
null:org.apache.solr.common.SolrException: Unable to create core: nip
at 
org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1150)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:666)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
C

Re: Solr 4.4.0: Plugin init failure for [schema.xml] analyzer/tokenizer

2013-09-18 Thread Chris Hostetter

: I am having trouble in upgrading from Solr 3.6 to Solr 4.4.0. I have 
: placed required jars in "lib" directory. When I start the Tomcat 
: instance it throws following error. Also pasted part of 
: "conf/schema.xml" file.

which jars exactly?
where did you get the jars from?
what version of solr were your custom jars compiled against?

: Caused by: java.lang.ClassCastException: class 
com.mycomp.as.sts.nps.solr.analysis.MultiValueTokenizerFactory

According to that, your "MultiValueTokenizerFactory" can not be cast to 
the expected type -- this might be because of a classloader problem (ie: 
you may have multiple instances of the MultiValueTokenizerFactory class 
loaded that are confusing things, or multiple instances of the lucene 
TokenizerFactory, etc...), or alternatively it might be because your 
MultiValueTokenizerFactory was compiled against the wrong version of solr 
(but IIRC that should generate a different error ... off the top of my 
head i'm not certain).



-Hoss


Re: SORTING RESULTS BASED ON RELAVANCY

2013-09-18 Thread Chris Hostetter

:  Thanks for your reply... can you please check the following details and
: give me suggestions on how i can do it; then it will be more helpful to me

you need to show us some examples of your documents and the 
debugQuery=true output for those documents for us to better understand the 
behavior you are seeing.

Unless i'm missing something: FuzzyQuery defaults to using the 
"TopTermsScoringBooleanQueryRewrite" method based on the terms found in 
the index that match the fuzzy expression.  So the results of a simple 
fuzzy query should already come back based on the tf/idf scores of the 
terms.

: if a user searches for "iphne 4", first it has to check for an exact match;
: if it is not found then i am splitting the string into two strings s1 and
: s2. i am adding ~0.5 for both s1 and s2.

: i need "iphone 4" result first

Ok, but you haven't given us any indication of what you are *actually* 
getting as your first result, so we really can't even begin to guess 
why you aren't getting the results you expect.

if you are seeing identical scores for all documents, then it's possibly 
because of some of the other ways you have combined custom params to 
build a complex query.


In particular: nowhere in your example URL, or the configured defaults 
you pasted below, do you show us how you are ultimately building up the 
"q" param from the various custom params you have defined...


: and i did the configuration in the following way...
: 
: 
: AND
:fsw_title
:15
:{!edismax v=$s}
:
:city:All OR _query_:"{!field f=city v=$c}"
:true
:mpId
:3
:true
:
:edismax
:false
:tsw_title^15.0 tf_title^10.0 tsw_keywords^1
: keywords^0.5
:fsw_title~1^50.0
:fsw_title~1^25.0
:sum(product(typeId,100),weightage)
:
: OR
:fsw_title
:20
:_query_:"{!edismax qf=$qfs1 pf=$pfx pf2=$pf2x v=$s1}" AND
: _query_:"{!edismax qf=$qfs2 v=$s2}"
:
:city:All OR _query_:"{!field f=city v=$c}"
:true
:mpId
:5
:true
:
:false
:fsw_title^30 tsw_title^20 tf_title^15.0
: keywords^1.0
:tsw_title^15.0 tf_title^10.0 tsw_keywords^1
: keywords^0.5
:fsw_title~1^100.0
:fsw_title~1^50.0
:fsw_title~1^25.0
:product(typeId,100)


-Hoss


Re: Memory Using In Faceted Search (UnInvertedField's)

2013-09-18 Thread Shawn Heisey
On 9/18/2013 11:08 AM, an...@swooptalent.com wrote:
> I'm using Solr 4.3.1 for faceted search and have 4 fields used for faceting. 
> My question is about memory consumption.
> I've set up heap size to use 6Gb of RAM, but I see in resource monitor it 
> uses much more than that - up to 10Gb where 4 Gb is reported as shareable 
> memory.
> I've calculated the size of cached set of UnInverted fields and it's 2Gb - 
> I'm fine with that, both GC monitor and 'fieldValueCache' stats in 
> 'Plugins/Stats' UI for Solr report that. But I can't understand what's that 
> memory that's being reserved after filling in fieldValueCache with uninverted 
> fields (right in UnInvertedField.uninvert method) and not used  (or not 
> released).
> Is that some memory leak? Or is that something I should tune with garbage 
> collector by making it more aggressive (GC only shows me 2.x Gb in Old Space 
> and I see those UnInvertedField's there in heap dump)?

I have noticed the same thing.  I do not think there is an actual
problem, but just something strange with the operating system memory
reporting.

https://www.dropbox.com/s/zacp4n3gu8wb9ab/idxb1-top-sorted-mem.png

In the screenshot above, you can see that there is 64GiB total memory.
There is 44449012k being used by the OS disk cache and 9853824k free
memory.  If you add these two numbers up, you get a number that's
roughly 51 GiB (54302836k).

You can also see that it says Solr (4.2.1) has a resident size of 16g,
with 11g of that in shareable memory.  FYI, the max java heap is 6g,
verified by the Solr dashboard and tools like jconsole.

With these numbers, if Solr really did have a memory resident size of
16g, Solr's memory size plus the combined total of cached and free
memory would require 3g of swap, but as you can see, there is zero swap
in use.

I don't know if the reporting problem can be fixed.  It is interesting
to know that the same thing happens on both Linux and Windows.

Thanks,
Shawn
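One way to see why "resident" overstates a process's exclusive usage: memory-mapped file pages (and Solr 4.x on 64-bit JVMs maps its index files via MMapDirectory by default) are counted in the process's resident/shareable numbers even though they live in the OS page cache. A minimal Linux-only sketch, unrelated to Solr itself -- map a file, touch its pages, and watch VmRSS grow by roughly the file size:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

public class MmapRss {
    // Resident set size in kB, read from /proc (Linux only; -1 elsewhere).
    static long vmRss() {
        try {
            for (String line : Files.readAllLines(Paths.get("/proc/self/status")))
                if (line.startsWith("VmRSS:"))
                    return Long.parseLong(line.replaceAll("\\D+", ""));
        } catch (IOException e) {
            // not Linux, or /proc unavailable
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("mmap-demo", ".bin");
        f.toFile().deleteOnExit();
        byte[] chunk = new byte[1 << 20];
        try (OutputStream out = Files.newOutputStream(f)) {
            for (int i = 0; i < 64; i++) out.write(chunk);   // 64 MiB of zeros
        }
        long before = vmRss();
        try (FileChannel ch = FileChannel.open(f, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            for (int i = 0; i < buf.limit(); i += 4096) sum += buf.get(i);  // fault in every page
            System.out.println("VmRSS before=" + before + "kB after=" + vmRss()
                    + "kB (the delta is shareable page-cache memory; sum=" + sum + ")");
        }
    }
}
```

Those faulted-in pages are evictable and shared with the page cache, which is why tools report them under "shareable" and why no swap is needed despite the apparent overcommit.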



Re: Solrcloud - adding a node as a replica?

2013-09-18 Thread Furkan KAMACI
Are you looking for this:
http://lucene.472066.n3.nabble.com/SOLR-Cloud-Collection-Management-quesiotn-td4063305.html

On Wednesday, 18 September 2013, didier deshommes wrote:
> Hi,
> How do I add a node as a replica to a solrcloud cluster? Here is my
> situation: some time ago, I created several collections
> with replicationFactor=2. Now I need to add a new replica. I thought just
> starting a new node and re-using the same zookeeper instance would make it
> automatically a replica, but that isn't the case. Do I need to delete and
> re-create my collections with the right replicationFactor (3 in this case)
> again? I am using solr 4.3.0.
>
> Thanks,
> didier
>


Re: Re: Unable to getting started with SOLR

2013-09-18 Thread Furkan KAMACI
I suggest you to start from here:
http://wiki.apache.org/solr/HowToCompileSolr

On Sunday, 15 September 2013, Erick Erickson wrote:
> If you're using the default jetty container, there's no log unless
> you set it up; the content is echoed to the screen.
>
> About a zillion people have downloaded this and started it
> running without issue, so you need to give us the exact
> steps you followed.
>
> If you checked the code out from SVN, you need to build it,
> go into /solr and execute
>
> ant example dist
>
> the "dist" bit isn't strictly necessary, but it builds the jars
> that you link to if you try to develop custom plugins etc.
>
> Best,
> Erick
>
>
> On Fri, Sep 13, 2013 at 3:56 AM, Rah1x  wrote:
>
>> I have the same issue can anyone tell me if they found a solution?
>>
>>
>>
>> --
>> View this message in context:
>>
http://lucene.472066.n3.nabble.com/Unable-to-getting-started-with-SOLR-tp3497276p4089761.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>


Re: SORTING RESULTS BASED ON RELAVANCY

2013-09-18 Thread Chris Hostetter

: Unless i'm missing something: FuzzyQuery defaults to using the 
: "TopTermsScoringBooleanQueryRewrite" method based on the terms found in 
: the index that match the fuzzy expression.  So the results of a simple 
: fuzzy query should already come back based on the tf/idf scores of the 
: terms.

to give a concrete example...

using 4.4, with the example configs & sample data, this query...

http://localhost:8983/solr/select?defType=edismax&qf=features&q=blak~2&fl=score,id,features&debugQuery=true

...matches two documents with different scores.  the resulting scores are 
based on both the edit distance of the word that matches the fuzzy term 
(which during query-rewriting is used as a term boost), and the tf/idf of 
those terms...

A doc that contains "black" (edit distance 1 => boost * 0.75)...

0.39237294 = (MATCH) sum of:
  0.39237294 = (MATCH) weight(features:black^0.75 in 26) [DefaultSimilarity], 
result of:
0.39237294 = score(doc=26,freq=1.0 = termFreq=1.0), product of:
  0.83205026 = queryWeight, product of:
0.75 = boost
3.7725887 = idf(docFreq=1, maxDocs=32)
0.29406872 = queryNorm
  0.4715736 = fieldWeight in 26, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
3.7725887 = idf(docFreq=1, maxDocs=32)
0.125 = fieldNorm(doc=26)

...compared to a doc that contains "book" (edit distance 2 => boost * 0.5)...

0.22888422 = (MATCH) sum of:
  0.22888422 = (MATCH) weight(features:book^0.5 in 5) [DefaultSimilarity], 
result of:
0.22888422 = score(doc=5,freq=1.0 = termFreq=1.0), product of:
  0.5547002 = queryWeight, product of:
0.5 = boost
3.7725887 = idf(docFreq=1, maxDocs=32)
0.29406872 = queryNorm
  0.4126269 = fieldWeight in 5, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
3.7725887 = idf(docFreq=1, maxDocs=32)
0.109375 = fieldNorm(doc=5)



-Hoss
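The boost arithmetic in those explain outputs can be reproduced with a small self-contained sketch. The formula below (1 - editDistance / min(term lengths)) is a plain-Java approximation of what the 4.x fuzzy rewrite applies per rewritten term, not a quote of FuzzyTermsEnum itself, but it yields exactly the 0.75 and 0.5 boosts above:

```java
public class FuzzyBoost {
    // Plain dynamic-programming Levenshtein distance.
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int sub = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + sub);
            }
        }
        return d[a.length()][b.length()];
    }

    // Boost applied to each rewritten term: 1 - distance / min(lengths).
    static float boost(String queryTerm, String indexTerm) {
        int len = Math.min(queryTerm.length(), indexTerm.length());
        return 1.0f - (float) editDistance(queryTerm, indexTerm) / len;
    }

    public static void main(String[] args) {
        System.out.println("blak~2 vs black -> " + boost("blak", "black")); // 0.75
        System.out.println("blak~2 vs book  -> " + boost("blak", "book"));  // 0.5
    }
}
```

That per-term boost is then multiplied into the normal tf/idf weight, which is why "black" (distance 1) outscores "book" (distance 2) above.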


Re: solr performance against oracle

2013-09-18 Thread Furkan KAMACI
Martin Fowler and Sadalage have a nice book about these kinds of architectural
designs: NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot
Persistence. If you read it you will see when to use a NoSQL store, an RDBMS,
or both of them. On the other hand, I have over 50 million documents on
replicated nodes of a SolrCloud and my average response time is ~10 ms. So it
depends on your architecture, configuration and hardware specifications.

On Thursday, 12 September 2013, Chris Hostetter wrote:
>
> Setting aside the excellent responses that have already been made in this
> thread, there are fundamental discrepancies in what you are comparing in
> your respective timing tests.
>
> first off: a micro benchmark like this is virtually useless -- unless you
> really plan on only ever executing a single query in a single run of a
> java application that then terminates, trying to time a single query is
> silly -- you should do lots and lots of iterations using a large set of
> sample inputs.
>
> Second: what you are timing is vastly different between the two cases.
>
> In your Solr timing, no communication happens over the wire to the solr
> server until the call to server.query() inside your time stamps -- if you
> were doing multiple requests using the same SolrServer object, the HTTP
> connection would get re-used, but as things stand your timing includes all
> of the network overhead of connecting to the server, sending the request,
> and reading the response.
>
> in your oracle method however, the timestamps you record are only around
> the call to executeQuery(), rs.next(), and rs.getString() ... you are
> ignoring the timing necessary for the getConnection() and
> prepareStatement() methods, which may be significant as they both involve
> over the wire communication with the remote server (and it's not like
> these are one-time execute-and-forget-about-them methods ... in a real
> long lived application you'd need to manage your connections, re-open them
> if they get closed, recreate the prepared statement if your connection has
> to be re-opened, etc...)
>
> Your comparison is definitely apples and oranges.
>
>
> Lastly, as others have mentioned: 150-200ms to request a single document
> by uniqueKey from an index containing 800K docs seems ridiculously slow,
> and suggests that something is poorly configured about your solr instance
> (another apples to oranges comparison: you've got an ad-hoc solr
> installation setup on your laptop and you're benchmarking it against a
> remote oracle server running on dedicated remote hardware that has
> probably been heavily tuned/optimized for queries).
>
> You haven't provided us any details however about how your index is setup,
> or how you have configured solr, or what JVM options you are using to run
> solr, or what physical resources are available to your solr process (disk,
> jvm heap ram, os file system cache ram) so there isn't much we can offer
> in the way of advice on how to speed things up.
>
>
> FWIW:  On my laptop, using Solr 4.4 w/ the example configs and built in
> jetty (ie: "java -jar start.jar") i got a 3.4 GB max heap, and a 1.5 GB
> default heap, with plenty of physical ram left over for the os file system
> cache of an index i created containing 1,000,000 documents with 6 small
> fields containing small amounts of random terms.  I then used curl to
> execute ~4150 requests for documents by id (using simple search, not the
> /get RTG handler) and return the results using JSON.
>
> This completed in under 4.5 seconds, or ~1.0ms/request.
>
> Using the more verbose XML response format (after restarting solr to
> ensure nothing in the query result caches) only took 0.3 seconds longer on
> the total time (~1.1ms/request)
>
> $ time curl -sS 'http://localhost:8983/solr/collection1/select?q=id%3A[1-100:241]&wt=json&indent=true' > /dev/null
>
> real    0m4.471s
> user    0m0.412s
> sys     0m0.116s
> $ time curl -sS 'http://localhost:8983/solr/collection1/select?q=id%3A[1-100:241]&wt=xml&indent=true' > /dev/null
>
> real    0m4.868s
> user    0m0.376s
> sys     0m0.136s
> $ java -version
> java version "1.7.0_25"
> OpenJDK Runtime Environment (IcedTea 2.3.10) (7u25-2.3.10-1ubuntu0.12.04.2)
> OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
> $ uname -a
> Linux frisbee 3.2.0-52-generic #78-Ubuntu SMP Fri Jul 26 16:21:44 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>
>
>
>
>
>
> -Hoss
>
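The methodology points above (warm up first, reuse the connection, average over many varied inputs) can be sketched as a generic harness. The workload function here is a deliberate stand-in -- swap in a SolrJ server.query(...) or a JDBC executeQuery(...) that reuses one SolrServer/Connection across all iterations:

```java
import java.util.concurrent.ThreadLocalRandom;

public class MicroBench {
    // Stand-in for one "query"; replace with a real call that reuses
    // one client object across iterations, as the thread recommends.
    static long workload(long seed) {
        long h = seed;
        for (int i = 0; i < 1000; i++) {
            h = h * 6364136223846793005L + 1442695040888963407L;
        }
        return h;
    }

    static double avgMillis(int iterations) {
        long sink = 0;
        for (int i = 0; i < iterations / 10; i++) {
            sink += workload(i);                      // warm-up pass (JIT, caches)
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // varied inputs, so caches don't answer the same query every time
            sink += workload(ThreadLocalRandom.current().nextLong());
        }
        double avg = (System.nanoTime() - start) / 1e6 / iterations;
        if (sink == 42) System.out.println(sink);     // keep the JIT from eliding the work
        return avg;
    }

    public static void main(String[] args) {
        System.out.println("avg ms/call over 10000 calls: " + avgMillis(10000));
    }
}
```

A single timed call, by contrast, mostly measures connection setup and JIT warm-up rather than the query itself.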


Re: Querying a non-indexed field?

2013-09-18 Thread Chris Hostetter

: Subject: Re: Querying a non-indexed field?
: 
: No.  --wunder

To elaborate just a bit...

: query on a few indexed fields, getting a small # of results.  I want to 
: restrict this further based on values from non-indexed, stored fields.  
: I can obviously do this myself, but it would be nice if Solr could do 

...you could implement this in a custom SearchComponent, or custom qparser 
that would generate PostFilter compatible queries, that looked at the 
stored field values -- but it's extremely unlikely that you would ever 
convince any of the lucene/solr devs to agree to commit a general purpose 
version of this type of logic into the code base -- because in the general 
case (arbitrary unknown number of documents matching the main query) it 
would be extremely inefficient and would encourage "bad" user behavior.

-Hoss
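For readers curious what such a (deliberately discouraged) component would look like, here is a rough pseudocode sketch of applying Solr's PostFilter contract to stored values. Class and field names are hypothetical, and the per-document stored-field lookup inside collect() is exactly the per-match cost being warned about:

```text
class StoredValuePostFilter extends ExtendedQueryBase implements PostFilter:
    getCost() -> 100            # cost > 99 asks Solr to run this after normal filters
    getCache() -> false
    getFilterCollector(searcher) -> DelegatingCollector:
        collect(docId):
            doc = searcher.doc(docId, {"myStoredField"})   # stored-field read per candidate
            if predicate(doc.get("myStoredField")):
                super.collect(docId)                       # let the doc through
```

Since every candidate document pays a stored-field read, this only stays tolerable when the preceding query already matches very few documents.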


Merge problem with lucene 3 and 4 indices

2013-09-18 Thread Harry Hight
We have a process that builds small indices, and merges them into the
master, and are in the process of going from solr 3.5 to solr 4.3. So,
during this process we are going to have to merge indices built with solr 3
with ones built with solr 4.

I'm running into a problem with an index built from that process. It was
merged from a set of solr 3 indices by solr 4 code, but it wrote a solr 3
segment.

Searching on the index works fine, however, this code:
Directory[] indexes = new Directory[1];
indexes[0]  = new NIOFSDirectory(new File(dir));
writer.addIndexes(indexes);

fails in addIndexes() with
Exception in thread "main" java.io.FileNotFoundException: _6.tis
    at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:266)
    at org.apache.lucene.index.SegmentInfoPerCommit.sizeInBytes(SegmentInfoPerCommit.java:88)
    at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:2319)

If I rename the files from _0 to _6, it fails with
Exception in thread "main" java.io.FileNotFoundException: /_0.si (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:410)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:123)
    at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:80)
    at org.apache.lucene.codecs.lucene3x.Lucene3xSegmentInfoReader.read(Lucene3xSegmentInfoReader.java:103)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:301)
    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:347)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)

If I copy the _0 files to _6, the merge works fine, but I don't understand
why it's trying to find the _6 segment in the first place.

Hex dump of the segments_1 file:
000: 3fd7 6c17 0873 6567 6d65 6e74 7300   ?.l..segments...
010:     0300  0100   
020: 0102 5f30 084c 7563 656e 6533 78ff   .._0.Lucene3x...
030:   ff00       
040: 00f6 90cf 88 .

Index directory:
drwxrwxr-x 2 hhight general 4.0K Sep 18 17:19 ./
drwxrwxr-x 3 hhight general 4.0K Sep 18 13:46 ../
-rw-rw-r-- 1 hhight general  34M Sep 18 13:25 _0.fdt
-rw-rw-r-- 1 hhight general  26K Sep 18 13:25 _0.fdx
-rw-rw-r-- 1 hhight general 2.7K Sep 18 13:25 _0.fnm
-rw-rw-r-- 1 hhight general 3.9M Sep 18 13:25 _0.frq
-rw-rw-r-- 1 hhight general 195K Sep 18 13:25 _0.nrm
-rw-rw-r-- 1 hhight general 8.5M Sep 18 13:25 _0.prx
-rw-rw-r-- 1 hhight general  343 Sep 18 13:25 _0.si
-rw-rw-r-- 1 hhight general 118K Sep 18 13:25 _0.tii
-rw-rw-r-- 1 hhight general 8.5M Sep 18 13:25 _0.tis
-rw-rw-r-- 1 hhight general   29 Sep 18 13:25 _0_upgraded.si
-rw-rw-r-- 1 hhight general   69 Sep 18 13:25 segments_1
-rw-rw-r-- 1 hhight general   20 Sep 18 13:25 segments.gen


Any suggestions on the cause of this?


RE: Solr 4.4.0: Plugin init failure for [schema.xml] analyzer/tokenizer

2013-09-18 Thread Abhijith Jain -X (abhijjai - DIGITAL-X INC at Cisco)
Thanks for the reply.

* Following are the jars placed in the "tomcat/lib" dir:

annotations-api.jar, catalina-ant.jar, catalina-ha.jar, catalina.jar,
catalina-tribes.jar, el-api.jar, jasper-el.jar, jasper.jar, jasper-jdt.jar,
jcl-over-slf4j-1.6.6.jar, jsp-api.jar, jul-to-slf4j-1.6.6.jar,
log4j-1.2.16.jar, log4j.properties, lucene-analyzers-common-4.2.0.jar,
lucene-core.jar, servlet-api.jar, slf4j-api-1.6.6.jar,
slf4j-log4j12-1.6.6.jar, solr-core-1.3.0.jar,
solr-dataimporthandler-4.4.0.jar, solr-dataimporthandler-extras-4.4.0.jar,
solr-solrj-4.4.0.jar, tomcat-coyote.jar, tomcat-dbcp.jar,
tomcat-i18n-es.jar, tomcat-i18n-fr.jar, tomcat-i18n-ja.jar
(plus a "private" directory)

* Jars in "tomcat/webapps/ROOT/WEB-INF/lib/":

commons-cli-1.2.jar, commons-codec-1.7.jar, commons-configuration-1.6.jar,
commons-fileupload-1.2.1.jar, commons-io-2.1.jar, commons-lang-2.6.jar,
concurrentlinkedhashmap-lru-1.2.jar, guava-14.0.1.jar,
hadoop-annotations-2.0.5-alpha.jar, hadoop-auth-2.0.5-alpha.jar,
hadoop-common-2.0.5-alpha.jar, hadoop-hdfs-2.0.5-alpha.jar,
httpclient-4.2.3.jar, httpcore-4.2.2.jar, httpmime-4.2.3.jar,
joda-time-2.2.jar, lucene-analyzers-common-4.4.0.jar,
lucene-analyzers-kuromoji-4.4.0.jar, lucene-analyzers-phonetic-4.4.0.jar,
lucene-codecs-4.4.0.jar, lucene-core-4.4.0.jar, lucene-grouping-4.4.0.jar,
lucene-highlighter-4.4.0.jar, lucene-memory-4.4.0.jar, lucene-misc-4.4.0.jar,
lucene-queries-4.4.0.jar, lucene-queryparser-4.4.0.jar,
lucene-spatial-4.4.0.jar, lucene-suggest-4.4.0.jar, noggit-0.5.jar,
nps-solr-plugin-1.0-SNAPSHOT.jar, org.restlet-2.1.1.jar,
org.restlet.ext.servlet-2.1.1.jar, protobuf-java-2.4.0a.jar,
solr-core-4.4.0.jar, solr-dataimporthandler-4.4.0.jar, solr-solrj-4.4.0.jar,
spatial4j-0.3.jar, wstx-asl-3.2.7.jar, zookeeper-3.4.5.jar

* I downloaded the solr-4.4.0 instance from the Apache website
(http://www.apache.org/dyn/closer.cgi/lucene/solr/4.4.0). Most of the jars
are from the "dist" directory and the "example" directory.

* Custom jars are compiled for the Solr 4.4.0 version. I copied most of
the jars from the Apache website, and a few jars from www.java2s.com

Thanks
Abhi



-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Wednesday, September 18, 2013 1:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.4.0: Plugin init failure for [schema.xml] analyzer/tokenizer





: I am having trouble in upgrading from Solr 3.6 to Solr 4.4.0. I have
: placed required jars in "lib" directory. When I start the Tomcat
: instance it throws following error. Also pasted part of
: "conf/schema.xml" file.

which jars exactly?
where did you get the jars from?
what version of solr were your custom jars compiled against?

: Caused by: java.lang.ClassCastException: class
com.mycomp.as.sts.nps.solr.analysis.MultiValueTokenizerFactory

According to that, your "MultiValueTokenizerFactory" can not be cast to the
expected type -- this might be because of a classloader problem (ie: you
may have multiple instances of the MultiValueTokenizerFactory class loaded
that are confusing things, or multiple instances of the lucene
TokenizerFactory, etc...), or alternatively it might be because your
MultiValueTokenizerFactory was compiled against the wrong version of solr
(but IIRC that should generate a different error ... off the top of my
head i'm not certain).

-Hoss


Re: DIH field defaults or re-assigning field values

2013-09-18 Thread Alexandre Rafalovitch
You could also do this in an update request processor -- there is a
default-value one there. Also, I think the field definition in the schema
allows defaults.

Regards,
Alex
On 19 Sep 2013 02:20, "P Williams"  wrote:

> Hi All,
>
> I'm using the DataImportHandler to import documents to my index.  I assign
> one of my document's fields by using a sub-entity from the root to look for
> a value in a file.  I've got this part working.  If the value isn't in the
> file or the file doesn't exist I'd like the field to be assigned a default
> value.  Is there a way to do this?
>
> I think I'm looking for a way to re-assign the value of a field.  If this
> is possible then I can assign the default value in the root entity and
> overwrite it if the value is found in the sub-entity. Ideas?
>
> Thanks,
> Tricia
>
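The update-processor route mentioned above can be sketched as a solrconfig.xml fragment. The chain and field names here are made up for illustration; wire the chain in via the update handler's defaults (or DIH's update.chain parameter), and note that DefaultValueUpdateProcessorFactory only fills the field when the incoming document omits it:

```xml
<!-- solrconfig.xml (hypothetical chain/field/value names) -->
<updateRequestProcessorChain name="add-defaults">
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">my_field</str>
    <str name="value">my-default-value</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The schema-level alternative is the default="..." attribute on the field declaration in schema.xml, which applies when no value is supplied at all.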


Re: Querying a non-indexed field?

2013-09-18 Thread Otis Gospodnetic
Moreover, you may be trying to save/optimize in the wrong place. Maybe these
additional indexed fields are not so costly. Maybe you can optimize some
other part of your setup.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Sep 18, 2013 5:47 PM, "Chris Hostetter"  wrote:

>
> : Subject: Re: Querying a non-indexed field?
> :
> : No.  --wunder
>
> To elaborate just a bit...
>
> : query on a few indexed fields, getting a small # of results.  I want to
> : restrict this further based on values from non-indexed, stored fields.
> : I can obviously do this myself, but it would be nice if Solr could do
>
> ...you could implement this in a custom SearchComponent, or custom qparser
> that would generate PostFilter compatible queries, that looked at the
> stored field values -- but it's extremely unlikely that you would ever
> convince any of the lucene/solr devs to agree to commit a general purpose
> version of this type of logic into the code base -- because in the general
> case (arbitrary unknown number of documents matching the main query) it
> would be extremely inefficient and would encourage "bad" user behavior.
>
> -Hoss
>


Re: SolrCloud - Service unavailable

2013-09-18 Thread Indika Tantrigoda
Thanks Shawn, the links will be useful.

I am still not sure if it's related to a timeout, because the 503 error
is coming from Tomcat, which means the requests are going through. I can
access the Solr admin panel, and I see a message saying the core was not
initialized.

Thanks,
Indika


On 18 September 2013 21:27, Shawn Heisey  wrote:

> On 9/18/2013 8:12 AM, Indika Tantrigoda wrote:
> > I am using 3 Solr instances behind an Amazon ELB with 1 shard. Serving
> > data via Solr works as expected, however I noticed a few times a 503 error
> > was popping up from the applications accessing Solr. Accessing Solr is
> > done via the AWS ELB.
> >
> > 3 Zookeeper instances also run on the same instances as Solr on a
> separate
> > disk.
> >
> > Solr version is 4.4.
> >
> > This issue seems to be a sporadic issue. Has anyone else observed this
> kind
> > of behavior ?
>
> What kind of session timeouts have you configured on the amazon load
> balancer?  I've never used amazon services, but hopefully this is
> configurable.  If the timeout is low enough, it could be just that the
> request is taking longer than that to execute.  You may need to increase
> that timeout.
>
> Aside from general performance issues, one thing that can cause long
> request times is stop-the-world Java garbage collections.  This can be a
> sign that your heap is too small, too large, or that your garbage
> collection hasn't been properly tuned.
>
> http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems
>
> http://wiki.apache.org/solr/SolrPerformanceProblems#How_much_heap_space_do_I_need.3F
>
> That same wiki page has another section about the OS disk cache.  Not
> having enough memory for this is the cause of a lot of performance issues:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache
>
> Thanks,
> Shawn
>
>
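The GC-pause point raised above is usually tackled by capping the heap sensibly and switching to a low-pause collector. A commonly cited starting point for Solr 4.x on Java 7 -- the flag values here are purely illustrative and must be tuned to your own heap and index size, not taken as a recommendation:

```shell
# Illustrative JVM options for a servlet container running Solr
# (tune -Xms/-Xmx to your data; gc.log helps diagnose pause length).
JAVA_OPTS="-Xms4g -Xmx4g \
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log"
```

The GC log produced by the last three flags is what lets you confirm (or rule out) stop-the-world pauses as the cause of intermittent 503s behind a load balancer.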


Migrating a existing/splited shard to new node

2013-09-18 Thread Uomesh
Hi,

We have started migrating our current search from master/slave to
SolrCloud. I have a couple of questions related to expanding the nodes
dynamically. Please help.

1. What is the best way to migrate an existing shard to a new node? Is it just
creating a core on the new node manually as below, or is there another way?
http://localhost:/solr/admin/cores?action=CREATE&name=testcollection_shard1_replica1&collection=testcollection&shard=shard1&collection.configName=collection1

2. How do I create a new replica dynamically? Is it just creating a new core as
below, or is there another way?
http://localhost:/solr/admin/cores?action=CREATE&name=testcollection_shard1_replica2&collection=testcollection&shard=shard1&collection.configName=collection1

3. How do I add a brand new shard to a collection dynamically? Is it just
creating a new core with a new shard name on a new node, as below? Will
documents be distributed onto the newly created shard automatically? Or is
this not the right way, and should we use shard splitting instead?

http://localhost:/solr/admin/cores?action=CREATE&name=testcollection_shard2_replica1&collection=testcollection&shard=shard2&collection.configName=collection1

Thank you so much for help!!

-Umesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Migrating-a-existing-splited-shard-to-new-node-tp4090991.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need help with delta import

2013-09-18 Thread sureshadapa
I am using the configuration file below, and the problem is that I do not
see any Solr documents committed into the Solr core 'db'.

When I run full-import, it gives me this message:
Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
Requests: 1, Fetched: 8, Skipped: 0, Processed: 0

When I run delta-import, it gives me this message:
Requests: 0, Fetched: 0, Skipped: 0, Processed: 0

solrconfig.xml
==
4.4


db1-data-config.xml

  

schema.xml



   
   
   
   
   
 
 solrp_id


db1-data-config.xml
=















--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-help-with-delta-import-tp4025003p4090999.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: FAcet with " " values are displayes in output

2013-09-18 Thread Prasi S
No analysis is done on the facets. The facets are string fields.


On Wed, Sep 18, 2013 at 11:59 PM, tamanjit.bin...@yahoo.co.in <
tamanjit.bin...@yahoo.co.in> wrote:

> Any analysis happening on the country field during indexing? If so then
> facets are on tokens.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/FAcet-with-values-are-displayes-in-output-tp4090777p4090904.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: FAcet with " " values are displayes in output

2013-09-18 Thread Upayavira
q=country:[* TO *] will find all docs that have a value in a field.
However, it seems you have a space, which *is* a value. I think Eric is
right - track down that record and fix the data.

Upayavira

On Wed, Sep 18, 2013, at 09:23 AM, Prasi S wrote:
> How do I filter them in the query itself?
> 
> Thanks,
> Prasi
> 
> 
> On Wed, Sep 18, 2013 at 1:06 PM, Upayavira  wrote:
> 
> > Filter them out in your query, or in your display code.
> >
> > Upayavira
> >
> > On Wed, Sep 18, 2013, at 06:36 AM, Prasi S wrote:
> > > Hi ,
> > > I'm using solr 4.4 for our search. When I query for a keyword, it returns
> > > empty-valued facets in the response
> > >
> > > 
> > > 
> > > 
> > > 
> > > *1*
> > > 1
> > > 
> > > 
> > > 
> > > 
> > > 
> > >
> > > I have also tried using the facet.missing parameter, but no change. How
> > > can we handle this?
> > >
> > >
> > > Thanks,
> > > Prasi
> >