Re: Optimizing RAM

2014-03-10 Thread Toke Eskildsen
On Sun, 2014-03-09 at 19:55 +0100, abhishek jain wrote:
> I am confused should i keep two separate indexes or keep one index with two
> versions or column , i mean col1_stemmed and col2_unstemmed.

1 index with stemmed & unstemmed will be markedly smaller than 2 indexes
(one with stemmed, one with unstemmed). Furthermore, keeping the stemmed
and unstemmed in the same index allows you to search in both fields and
assign a greater weight to the unstemmed field.
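
A hedged sketch of that setup (field names follow the question's naming;
the field types and boost value are illustrative, not from the thread):

  <!-- schema.xml: two differently analyzed copies of the same source field -->
  <field name="col1_stemmed"   type="text_en"      indexed="true" stored="false"/>
  <field name="col1_unstemmed" type="text_general" indexed="true" stored="false"/>
  <copyField source="col1" dest="col1_stemmed"/>
  <copyField source="col1" dest="col1_unstemmed"/>

  <!-- query time (edismax): search both fields, weight unstemmed matches higher:
       q=foo&defType=edismax&qf=col1_unstemmed^2 col1_stemmed -->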

> I have multicore with multi shard configuration.
> My server have 32 GB RAM and stemmed index size (without content) i
> calculated as 60 GB .

What do you mean by "without content"? Is it the col2_unstemmed that you
plan to add? Or stored fields maybe?

> I want to not put too much load and I/O load on a decent server with some 5
> other replicated servers and want to use servers for other purposes also.

If you haven't already done so, use SSDs as your storage. That way you
don't have to worry much about RAM / index size ratio for performance.


- Toke Eskildsen, State and University Library, Denmark




maxClauseCount is set to 1024

2014-03-10 Thread Andreas Owen

Does this maxClauseCount apply to each field individually or to all fields
together? Is it the date fields?


when i execute a query i get this error:


[Solr XML response omitted: status 500, QTime 93; echoed params include the
query "Ein PDFchen als Dokument roles:*", wt=xml, _=1394436617394, plus a long
list of facet counts and a date facet from 2011-03-01T00:00:00Z to
2014-04-01T00:00:00Z with gap +1MONTH; the error section follows:]

maxClauseCount is set to 1024
org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
  at org.apache.lucene.search.ScoringRewrite$1.checkMaxClauseCount(ScoringRewrite.java:72)
  at org.apache.lucene.search.ScoringRewrite$ParallelArraysTermCollector.collect(ScoringRewrite.java:152)
  at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:79)
  at org.apache.lucene.search.ScoringRewrite.rewrite(ScoringRewrite.java:108)
  at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:288)
  at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:217)
  at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:99)
  at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:469)
  at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:217)
  at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:186)
  at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:199)
  at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:528)
  at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:415)
  at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:139)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:365)
  at org.eclipse.jetty.server.AbstractHttpConnection.han
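
For reference: the limit in this trace is Lucene's BooleanQuery.maxClauseCount,
which Solr sets globally from solrconfig.xml. It is checked per rewritten
BooleanQuery (here, the highlighter rewriting the multi-term roles:* query),
not summed across fields. A sketch of raising it (1024 is the default):

  <!-- solrconfig.xml, inside the <query> section -->
  <maxBooleanClauses>4096</maxBooleanClauses>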

Re: SolrCloud constantly crashes after upgrading to Solr 4.7

2014-03-10 Thread Martin de Vries

Hi,

When our server crashes, the memory fills up fast, so I think it might
be a specific query that causes our servers to crash. I think the query
won't be logged because it doesn't finish. Is there anything we can do
to see the currently running queries in the Solr server (so we can see
them just before the crash)? A debug log might be another option, but
I'm afraid our servers are too busy to find it in there.



Martin


Zookeeper will not update cluster state when garbaging

2014-03-10 Thread OSMAN Metin
Hi all,

we are using SolrCloud with this configuration:

* SolR 4.4.0

* Zookeeper 3.4.5

* one server with zookeeper + 4 solr nodes

* one server with 4 solr nodes

* only one core

* Solr instances deployed on tomcats with mod_cluster

* clients access with SolRJ trough Apache + mod_cluster

In the morning, we have massive updates (several thousand in a few minutes)
with explicit softCommit=true.
These updates are load balanced across the nodes, regardless of whether a node
is the leader or not.

When this happens, the SolrCloud admin console shows 7 nodes as recovering and
the leader as active.
We also noticed that refreshing the graph is very slow.
This situation can last 3 hours until the clusterstate refreshes.
During this phase, Zookeeper is garbage collecting heavily (I can post the
Munin GC graphs).

Here are the command line parameters of the zookeeper and solr nodes (I have
replaced some values with XXX for confidentiality reasons).

Zookeeper :

java -cp 
/var/lib/zookeeper/bin/../build/classes:/var/lib/zookeeper/bin/../build/lib/*.jar:/var/lib/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/var/lib/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/var/lib/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/var/lib/zookeeper/bin/../lib/log4j-1.2.15.jar:/var/lib/zookeeper/bin/../lib/jline-0.9.94.jar:/var/lib/zookeeper/bin/../zookeeper-3.4.5.jar:/var/lib/zookeeper/bin/../src/java/lib/*.jar:/app/zookeeper/conf:
 -Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.port=XXX -Xms384m -Xmx384m -XX:MaxPermSize=128m 
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false 
org.apache.zookeeper.server.quorum.QuorumPeerMain /app/zookeeper/conf/zoo.cfg

SolR :

/usr/lib/jvm/java/bin/java -Dsolr.data.dir=/app/solr/server/search_01/vod/data 
-Dsolr.solr.home=/app/solr/server/search_01 -DnumShards=1 
-Dbootstrap_confdir=/app/solr/server/search_01/vod/conf 
-Dcollection.configName=vod -DzkHost=XXX:2181 -Dtomcat.server.port=XXX 
-Dtomcat.http.port=XXX -Dtomcat.ajp.port=XXX 
-Dlog4j.configuration=file:///app/tomcat/server/search_01/conf/log4j.properties 
-Djboss.jvmRoute=SEARCH_02_01 -Djboss.modcluster.sendToApacheDelayInSec=10 
-Djboss.modcluster.nodetimeout=30 -Djboss.modcluster.ttl=10 -Xms2048m -Xmx2048m 
-XX:MaxPermSize=384m -Dcom.sun.management.jmxremote 
-Dcom.sun.management.jmxremote.port=XXX 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false -classpath 
:/app/tomcat/server/search_01/bin/bootstrap.jar:/app/tomcat/server/search_01/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar
 -Dcatalina.base=/app/tomcat/server/search_01 
-Dcatalina.home=/app/tomcat/server/search_01 -Djava.endorsed.dirs= 
-Djava.io.tmpdir=/app/tomcat/server/search_01/temp 
-Djava.util.logging.config.file=/app/tomcat/server/search_01/conf/log4j.properties
 -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager 
org.apache.catalina.startup.Bootstrap start

I have tried other GC strategies, max heap values, new ratio, etc. on
Zookeeper without success.
Every time Zookeeper is garbage collecting, the clusterstate is not correct.

Is this a bug with Zookeeper or SolR 4.4.0, or is it due to some misconfiguration?
I have seen somewhere that there is a timeout value between Solr and Zookeeper,
but I don't know where it is set (or what its default value is).

Any help will be appreciated.

Regards,
Metin


Re: SolrCloud setup guidance

2014-03-10 Thread Priti Solanki
As of now the index is at 136 GB.

I want to understand: can we do multiple writes on Solr? I don't have any
partitioning strategy as of now.

On the Amazon instance for Solr, disk read/write is like 5% or so. I am not
able to understand how, even though I am processing almost 300 records per
minute, the Solr server is not receiving high disk IO.

I have a question: when Solr is writing, does it update the complete index
again?

I will try to increase the RAM to 128 GB and see if some improvement comes.


On Mon, Mar 10, 2014 at 9:21 AM, Susheel Kumar <
susheel.ku...@thedigitalgroup.net> wrote:

> Not sure how fast your index will grow but first you may still want to
> consider upgrading the single machine to 128 GB to see how the performance
> is coming. Current memory 7 GB is really low.  After that you may want to
> add another node to partition the index into 2 nodes/shards (assuming you
> have some partition strategy ) that index size of 150-200 GB can fit into
> two nodes memory.
>
> Thanks,
> Susheel
>
> -Original Message-
> From: Priti Solanki [mailto:pritiatw...@gmail.com]
> Sent: Friday, March 07, 2014 12:50 AM
> To: solr-user@lucene.apache.org
> Subject: Re: SolrCloud setup guidance
>
> Thanks Susheel,
>
> But this index will keep on growing that my worry So I always have to
> increase the RAM .
>
> Can you suggest how many nodes one can think to support this bug index?
>
> Regards,
>
>
>
> On Fri, Mar 7, 2014 at 2:50 AM, Susheel Kumar <
> susheel.ku...@thedigitalgroup.net> wrote:
>
> > Setting up Solr cloud(horizontal scaling) is definitely a good idea
> > for this big index but before going to Solr cloud, are you able to
> > upgrade your single node to 128GB of memory(vertical scaling) to see the
> difference.
> >
> > Thanks,
> > Susheel
> >
> > -Original Message-
> > From: Priti Solanki [mailto:pritiatw...@gmail.com]
> > Sent: Thursday, March 06, 2014 10:51 AM
> > To: solr-user@lucene.apache.org
> > Subject: SolrCloud setup guidance
> >
> > Hello Everyone,
> >
> > I would like to take you guidance of following
> >
> > I have a single core with 124 GB of index data size. Indexing and
> > Reading both are very slow as I have 7 GB RAM to support this huge
> > data.  Almost 8 million of documents.
> >
> > Hence, we thought of going to SolrCloud so that we can accommodate
> > more upcoming data. I have data for 13 country with their millions of
> > products and we want to set up solrcloud for the same.
> >
> > I am in need of some initial thoughts about how to setup solrCloud for
> > such requirement. How to we come to know how many nodes,core I would
> > be needing to support this...
> >
> > we are thinking to host this on Amazon...Any guidance or reading links
> > ,case study will be highly appreciated.
> >
> > Regards,
> >
>


Re: Solr spatial search within the polygon

2014-03-10 Thread Javi
Hi all.

I need your help! I have read every post about Spatial in Solr because I 
need to check if a point (latitude,longitude) is inside a Polygon.

/**/
/* 1. library */
/**/

(1) I use "jts-1.13.jar" and "spatial4j-0.4.1.jar"
(I think they are the latest version)

/*/
/* 2. schema.xml */
/*/


(I omit geo="true" because it is the default)

...



(Here I don't know what it means if I add multiValued="true")
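
The schema lines above did not survive the archive; a typical Solr 4.x
definition for JTS-backed polygon support looks like the following
reconstruction (a sketch, not Javi's exact config):

  <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
             spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
             distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>

  <field name="LOCATION" type="location_rpt" indexed="true" stored="true"/>
  <!-- multiValued="true" on the field would simply allow several shapes per document -->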

/*/
/* Document contents */
/*/
I have tried 4 different contents for my documents (lat-lon refers to
Madrid, Spain):


a) As it is WKT format, I tried "longitude latitude" (x y):

<field name="LOCATION">-3.69278 40.442179</field>

b) As it is WKT format, I tried "POINT(longitude latitude)" (x y):

<field name="LOCATION">POINT(-3.69278 40.442179)</field>

and

c) I tried non-WKT format by adding a comma and using "latitude,longitude":

<field name="LOCATION">40.442179,-3.69278</field>

d) I tried non-WKT format by adding a comma and using "longitude,latitude":

<field name="LOCATION">-3.69278,40.442179</field>



/*/
/* My solr query */
/*/

a)
_Description: This POLYGON (in WKT format, so "longitude latitude") is a
triangle that covers all of Madrid, so my point should be inside it.
_Result: The query returns 0 documents (which is wrong).

http://localhost:8983/solr/pisos22/select?q=*%3A*&
fl=LOCATION&
wt=xml&
indent=true&
fq=LOCATION:"IsWithin(POLYGON((
-3.732605 40.531415,
-3.856201 40.336993,
-3.493652 40.332806,
-3.732605 40.531415
))) distErrPct=0"

b)
_Description: This POLYGON (in WKT format, so "longitude latitude") is a
rectangle outside of Madrid, so my point should not be inside it.
_Result: The query returns 0 documents (which is correct).

http://localhost:8983/solr/pisos22/select?q=*%3A*&
fl=LOCATION&
wt=xml&
indent=true&
fq=LOCATION:"IsWithin(POLYGON((
-4.0594 40.8708,
-4.0621 40.7211 ,
-3.8095 40.7127,
-3.8232 40.8687,
-4.0594 40.8708
))) distErrPct=0"

*** I also tried modifying the order of lat/lon, but I have not been able to
find a solution that makes it work.
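
For what it's worth, a point-in-polygon filter with the RPT field type is
usually written with Intersects (for indexed points it matches the same
documents as IsWithin, but is much faster), with the WKT in x y
(longitude latitude) order. A sketch using the triangle above:

  fq=LOCATION:"Intersects(POLYGON((-3.732605 40.531415, -3.856201 40.336993,
  -3.493652 40.332806, -3.732605 40.531415)))"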




Re: Solr Production Installation

2014-03-10 Thread leevduhl
Excellent, thank you.

Lee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Production-Installation-tp4122091p4122533.html
Sent from the Solr - User mailing list archive at Nabble.com.


Updated to v4.7 - Getting "Search requests cannot accept content streams"

2014-03-10 Thread leevduhl
We just upgraded our dev environment from Solr 4.6 to 4.7 and our search
"posts" are now returning a "Search requests cannot accept content streams"
error.  We did not install over top of our 4.6 install, we installed into a
new folder.

org.apache.solr.common.SolrException: Search requests cannot accept content streams
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:170)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:368)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
  at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
  at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
  at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Thread.java:744)

Any idea what may be causing this in v4.7?

Thanks
Lee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Updated-to-v4-7-Getting-Search-requests-cannot-accept-content-streams-tp4122540.html
Sent from the Solr - User mailing list archive at Nabble.com.
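
One known cause, as a hedged note: Solr 4.7 made SearchHandler reject
requests that arrive with a content stream, which typically happens when a
client POSTs a raw body instead of form-encoded parameters. A sketch of a
POST that 4.7 accepts (curl sends -d data as
application/x-www-form-urlencoded by default; URL and params are illustrative):

  curl http://localhost:8983/solr/collection1/select -d 'q=*:*&wt=json&rows=10'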


Re: Which Tokenizer to use at searching

2014-03-10 Thread abhishek jain
Hi,
As a solution, i have tried a combination of PatternTokenizerFactory and
PatternReplaceFilterFactory .

In both query and indexer i have written:
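
The config lines below did not survive the archive; based on the description,
they presumably looked something like this sketch (the pattern values are
assumptions):

  <tokenizer class="solr.PatternTokenizerFactory" pattern="\s+"/>
  <filter class="solr.PatternReplaceFilterFactory" pattern="[^\w\s]"
          replacement=" punct " replace="all"/>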




What I am trying to do is tokenize on spaces and then rewrite every
special character as " punct ".

So, A,B becomes A punct B.

But the problem is that "A punct B" is still one word and is not tokenized
further after the filter is applied.

Is there a way I can tokenize after the filter is applied? Please suggest;
I know I am missing something basic.

thanks
abhishek


On Mon, Mar 10, 2014 at 2:06 AM,  wrote:

> Hi
> Oops my bad. I actually meant
> While indexing A,B
> A and B should give result but
> "A B" should not give result.
>
> Also I will look at analyser.
>
> Thanks
> Abhishek
>
>   Original Message
> From: Erick Erickson
> Sent: Monday, 10 March 2014 01:38
> To: abhishek jain
> Subject: Re: Which Tokenizer to use at searching
>
> Then I don't see the problem. StandardTokenizer
> (see the "text_general" fieldType) should do all this
> for you automatically.
>
> Did you look at the analysis page? I really recommend it.
>
> Best,
> Erick
>
> On Sun, Mar 9, 2014 at 3:04 PM, abhishek jain
>  wrote:
> > Hi Erick,
> > Thanks for replying,
> >
> > I want to index A,B (with or without space with comma) as separate words
> and
> > also want to return results when A and B searched individually and also
> > "A,B" .
> >
> > Please let me know your views.
> > Let me know if i still havent explained correctly. I will try again.
> >
> > Thanks
> > abhishek
> >
> >
> > On Sun, Mar 9, 2014 at 11:49 PM, Erick Erickson  >
> > wrote:
> >>
> >> You've contradicted yourself, so it's hard to say. Or
> >> I'm mis-reading your messages.
> >>
> >> bq: During indexing i want to token on all punctuations, so i can use
> >> StandardTokenizer, but at search time i want to consider punctuations as
> >> part of text,
> >>
> >> and in your second message:
> >>
> >> bq: when i search for "A,B" it should return result. [for input "A,B"]
> >>
> >> If, indeed, you "... at search time i want to consider punctuations as
> >> part of text" then "A,B" should NOT match the document.
> >>
> >> The admin/analysis page is your friend, I strongly suggest you spend
> >> some time looking at the various transformations performed by
> >> the various analyzers and tokenizers.
> >>
> >> Best,
> >> Erick
> >>
> >> On Sun, Mar 9, 2014 at 1:54 PM, abhishek jain
> >>  wrote:
> >> > hi,
> >> >
> >> > Thanks for replying promptly,
> >> > an example:
> >> >
> >> > I want to index for A,B
> >> > but when i search A AND B, it should return result,
> >> > when i search for "A,B" it should return result.
> >> >
> >> > Also Ideally when i search for "A , B" (with space) it should return
> >> > result.
> >> >
> >> >
> >> > please advice
> >> > thanks
> >> > abhishek
> >> >
> >> >
> >> > On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI
> >> > wrote:
> >> >
> >> >> Hi;
> >> >>
> >> >> Firstly you have to keep in mind that if you don't index punctuation
> >> >> they
> >> >> will not be visible for search. On the other hand you can have
> >> >> different
> >> >> analyzer for index and search. You have to give more detail about
> your
> >> >> situation. What will be your tokenizer at search time,
> >> >> WhiteSpaceTokenizer?
> >> >> You can have a look at here:
> >> >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> >> >>
> >> >> If you can give some examples what you want for indexing and
> searching
> >> >> I
> >> >> can help you to combine index and search analyzer/tokenizer/token
> >> >> filters.
> >> >>
> >> >> Thanks;
> >> >> Furkan KAMACI
> >> >>
> >> >>
> >> >> 2014-03-09 18:06 GMT+02:00 abhishek jain  >:
> >> >>
> >> >> > Hi Friends,
> >> >> >
> >> >> > I am concerned on Tokenizer, my scenario is:
> >> >> >
> >> >> > During indexing i want to token on all punctuations, so i can use
> >> >> > StandardTokenizer, but at search time i want to consider
> punctuations
> >> >> > as
> >> >> > part of text,
> >> >> >
> >> >> > I dont store contents but only indexes.
> >> >> >
> >> >> > What should i use.
> >> >> >
> >> >> > Any advices ?
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Thanks and kind Regards,
> >> >> > Abhishek jain
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks and kind Regards,
> >> > Abhishek jain
> >> > +91 9971376767
> >
> >
> >
> >
> > --
> > Thanks and kind Regards,
> > Abhishek jain
> > +91 9971376767
>



-- 
Thanks and kind Regards,
Abhishek jain
+91 9971376767


Re: Which Tokenizer to use at searching

2014-03-10 Thread Shawn Heisey
On 3/10/2014 6:20 AM, abhishek jain wrote:
> <filter class="solr.PatternReplaceFilterFactory" pattern="..."
>  replacement=" punct " replace="all"/>



> Is there a way i can tokenize after application of filter, please suggest i
> know i am missing something basic.

Use PatternReplaceCharFilterFactory instead.  CharFilters are performed
before tokenizers, regardless of where they are defined in the analysis
chain.
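
A sketch of what that change could look like (the tokenizer choice and
pattern are assumptions, not from the thread):

  <analyzer>
    <!-- char filters always run before the tokenizer -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[^\w\s]" replacement=" punct "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>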

Thanks,
Shawn



How to customize Solr

2014-03-10 Thread lavesh
I want a list of users who are online and fulfill the criteria specified.

Current implementation:
I am sending the online ids (usually 20k) as POST parameters along with the
search criteria.

How I want to optimize it:
I want to change the internal code of Solr so that these 20k profiles are
fetched from within Solr.

i.e. I need to implement something like *fq=online:(1)*, which can be
interpreted by a Solr handler as an online-users search; internally it
would call MySQL and get the 20k profiles.

Help:
Please guide me on what changes to make and how to make them.
Links (if possible) for performing internal changes.

Thanx



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-customize-Solr-tp4122551.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud setup guidance

2014-03-10 Thread Furkan KAMACI
Hi Priti;

100 qps is not much, but 7 GB is too low and may be a problem for you. I
have tens of SolrCloud nodes and I send data to them via Map/Reduce from
tens of servers, and indexing speed has not been a problem for me yet.
Problems occur because of network communication, RAM, or something else.

If we talk about the ideal amount of RAM: you should have total RAM of at
least your index size, because ideally you want to cache everything. If you
also account for situations like optimize, three times your index size
would be nice. However, RAM also depends on your user characteristics;
sometimes caching only some of the documents is enough, because they are
queried many, many times.

As I mentioned, 100 qps is not much and indexing is not a bottleneck in
many situations, so I suggest remembering Occam's razor: start simple and
increase your RAM step by step. I also suggest using a tool like SolrMeter
to test your qps.

If you have any question I can help you about your infrastructure.

Thanks;
Furkan KAMACI


2014-03-10 12:41 GMT+02:00 Priti Solanki :

> As of now index is on 136 GB.
>
> I want to understand can we do multiple write on solr?  I don't have any
> partitioning strategy as of now.
>
> On Amazon instance for Solr the disk read/write a like 5% or so . I am not
> able to understand even though I am almost processing 300 records per min
> how come Solr server is not receiving high disk IO.
>
> I have a query, When solr is writing does it update complete index again ?
>
> I will try and increase the RAM to 128 and see if some improvement comes..
>
>
> On Mon, Mar 10, 2014 at 9:21 AM, Susheel Kumar <
> susheel.ku...@thedigitalgroup.net> wrote:
>
> > Not sure how fast your index will grow but first you may still want to
> > consider upgrading the single machine to 128 GB to see how the
> performance
> > is coming. Current memory 7 GB is really low.  After that you may want to
> > add another node to partition the index into 2 nodes/shards (assuming you
> > have some partition strategy ) that index size of 150-200 GB can fit into
> > two nodes memory.
> >
> > Thanks,
> > Susheel
> >
> > -Original Message-
> > From: Priti Solanki [mailto:pritiatw...@gmail.com]
> > Sent: Friday, March 07, 2014 12:50 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: SolrCloud setup guidance
> >
> > Thanks Susheel,
> >
> > But this index will keep on growing that my worry So I always have to
> > increase the RAM .
> >
> > Can you suggest how many nodes one can think to support this bug index?
> >
> > Regards,
> >
> >
> >
> > On Fri, Mar 7, 2014 at 2:50 AM, Susheel Kumar <
> > susheel.ku...@thedigitalgroup.net> wrote:
> >
> > > Setting up Solr cloud(horizontal scaling) is definitely a good idea
> > > for this big index but before going to Solr cloud, are you able to
> > > upgrade your single node to 128GB of memory(vertical scaling) to see
> the
> > difference.
> > >
> > > Thanks,
> > > Susheel
> > >
> > > -Original Message-
> > > From: Priti Solanki [mailto:pritiatw...@gmail.com]
> > > Sent: Thursday, March 06, 2014 10:51 AM
> > > To: solr-user@lucene.apache.org
> > > Subject: SolrCloud setup guidance
> > >
> > > Hello Everyone,
> > >
> > > I would like to take you guidance of following
> > >
> > > I have a single core with 124 GB of index data size. Indexing and
> > > Reading both are very slow as I have 7 GB RAM to support this huge
> > > data.  Almost 8 million of documents.
> > >
> > > Hence, we thought of going to SolrCloud so that we can accommodate
> > > more upcoming data. I have data for 13 country with their millions of
> > > products and we want to set up solrcloud for the same.
> > >
> > > I am in need of some initial thoughts about how to setup solrCloud for
> > > such requirement. How to we come to know how many nodes,core I would
> > > be needing to support this...
> > >
> > > we are thinking to host this on Amazon...Any guidance or reading links
> > > ,case study will be highly appreciated.
> > >
> > > Regards,
> > >
> >
>


The way Autocommit works in solr - Wierd

2014-03-10 Thread RadhaJayalakshmi
Hi,

Brief description of my application:
We have a Java program which reads a flat file and adds documents to Solr
using CloudSolrServer. We index every 1000 documents (bulk indexing).

And the autocommit setting of my application is:

<autoCommit>
  <maxDocs>100000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

So after every 100,000 documents are indexed, the engine should perform a
hard commit (autocommit), but openSearcher will still be false.
Once the file is fully read, we issue a commit() from the CloudSolrServer
class, which by default opens a new searcher.

Also, from the log, I can see that autocommit happens three times, and
only with the last/final autocommit is openSearcher set to true.

So, till now all looks fine and works as expected.

But I observed one strange issue during the course of indexing.
As per the documentation, the data being indexed should first get written
to the tlog. When the autocommit is performed, the data is flushed to
disk. So the size of the /index folder should have changed only three
times; the rest of the time, only the size of the /tlog folder should
have been changing.

But what actually happens is that, all the time, I see the size of the
/index folder growing in parallel with the size of the /tlog folder.
It increases to a certain limit and comes down, then increases and comes
down again.

So now the bigger doubt I have is: during a hard commit, is the data
written to both the /index and /tlog folders?

I am using Solr 4.5.1.

Could someone please clarify how hard commit works? I am assuming the
following sequence:
1. Solr reads the data and writes it to the tlog.
2. During a hard commit, it flushes the data from the tlog to the index;
if openSearcher is false, it should not open a new searcher.
3. At the end, once all the data is indexed, it opens a new searcher.

If not, please explain.

Thanks in Advance
Radha








--
View this message in context: 
http://lucene.472066.n3.nabble.com/The-way-Autocommit-works-in-solr-Wierd-tp4122558.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Zookeeper will not update cluster state when garbaging

2014-03-10 Thread Furkan KAMACI
Hi Metin;

I think the timeout value you are talking about is this one:
http://zookeeper.apache.org/doc/r3.1.2/zookeeperStarted.html However, it is
not recommended to change the Zookeeper timeout value if you do not have a
specific reason. On the other hand, how many Zookeepers do you have in your
infrastructure?
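
If it is the Solr-side setting, it is zkClientTimeout, configured in
solr.xml (sketched below in the legacy 4.x format; 15000 ms is the usual
default, and the other attributes are omitted):

  <solr persistent="true">
    <cores adminPath="/admin/cores"
           zkClientTimeout="${zkClientTimeout:15000}"/>
  </solr>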

Also, regardless of your question: if it is OK for you, could you add your
company here: https://wiki.apache.org/solr/PublicServers This may be nice
for people who wonder which companies use Solr.

Thanks;
Furkan KAMACI


2014-03-10 12:35 GMT+02:00 OSMAN Metin :

> Hi all,
>
> we are using SolrCloud with this configuration :
>
> * SolR 4.4.0
>
> * Zookeeper 3.4.5
>
> * one server with zookeeper + 4 solr nodes
>
> * one server with 4 solr nodes
>
> * only one core
>
> * Solr instances deployed on tomcats with mod_cluster
>
> * clients access with SolRJ trough Apache + mod_cluster
>
> On the morning, we have massive updates (several thousands in a few
> minute) with explicit softCommit=true.
> This updates are load balanced on each regardless a node is the leader or
> not.
>
> When this happens, the solr cloud admin console shows 7 nodes as
> recovering and the leader as active.
> We also noticed, that refreshing the graphic is very long.
> This situation can last 3 hours until the clusterstate refreshes.
> During this phase, Zookeeper is hardly garbaging (I can post the Munin gc
> graphs).
>
> Here are the command line parameters of zookeeper and solr nodes (I have
> replaced some values with XXX for confidentiality reason).
>
> Zookeeper :
>
> java -cp
> /var/lib/zookeeper/bin/../build/classes:/var/lib/zookeeper/bin/../build/lib/*.jar:/var/lib/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/var/lib/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/var/lib/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/var/lib/zookeeper/bin/../lib/log4j-1.2.15.jar:/var/lib/zookeeper/bin/../lib/jline-0.9.94.jar:/var/lib/zookeeper/bin/../zookeeper-3.4.5.jar:/var/lib/zookeeper/bin/../src/java/lib/*.jar:/app/zookeeper/conf:
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.port=XXX -Xms384m -Xmx384m
> -XX:MaxPermSize=128m -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.local.only=false
> org.apache.zookeeper.server.quorum.QuorumPeerMain
> /app/zookeeper/conf/zoo.cfg
>
> SolR :
>
> /usr/lib/jvm/java/bin/java
> -Dsolr.data.dir=/app/solr/server/search_01/vod/data
> -Dsolr.solr.home=/app/solr/server/search_01 -DnumShards=1
> -Dbootstrap_confdir=/app/solr/server/search_01/vod/conf
> -Dcollection.configName=vod -DzkHost=XXX:2181 -Dtomcat.server.port=XXX
> -Dtomcat.http.port=XXX -Dtomcat.ajp.port=XXX
> -Dlog4j.configuration=file:///app/tomcat/server/search_01/conf/log4j.properties
> -Djboss.jvmRoute=SEARCH_02_01 -Djboss.modcluster.sendToApacheDelayInSec=10
> -Djboss.modcluster.nodetimeout=30 -Djboss.modcluster.ttl=10 -Xms2048m
> -Xmx2048m -XX:MaxPermSize=384m -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.port=XXX
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false -classpath
> :/app/tomcat/server/search_01/bin/bootstrap.jar:/app/tomcat/server/search_01/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar
> -Dcatalina.base=/app/tomcat/server/search_01
> -Dcatalina.home=/app/tomcat/server/search_01 -Djava.endorsed.dirs=
> -Djava.io.tmpdir=/app/tomcat/server/search_01/temp
> -Djava.util.logging.config.file=/app/tomcat/server/search_01/conf/log4j.properties
> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> org.apache.catalina.startup.Bootstrap start
>
> I have tried other gc strategies, max heap values, new ratio, etc... on
> Zookeeper without success.
> Every time zookeeper is garbaging, the clusterstate is not correct.
>
> Is this a bug with zookeeper, SolR 4.4.0 or is it due to some
> misconfiguration ?
> I have seen somewhere that there is a timeout value between solr and
> zookeeper, but I don't know where it is set (and what is its default value).
>
> Any help will be appreciated.
>
> Regards,
> Metin
>


Re: SolrCloud setup guidance

2014-03-10 Thread Priti Solanki
Thanks Furkan,

This gives some really good understanding. We have an Amazon instance and
right now it is running on m1.large.

In Amazon we are not finding support to increase ONLY the RAM! That is our
main concern, and we are actively looking at which instance can help us
support this index size.

Do you think m2.4xlarge would help?

Also, can you share some input on speeding up the indexing process?

Regards,
Priti


On Mon, Mar 10, 2014 at 6:30 PM, Furkan KAMACI wrote:

> Hi Priti;
>
> 100 qps is not much but 7 GB is too low and it may be a problem for you. I
> have a tens of nodes of SolrCloud and I send data them via Map/Reduce via
> tens of servers. However indexing speed did not be a problem for me yet.
> Problems occurs because of network communication, RAM or something else.
>
> If we talk about the ideal size of RAM: then you should have a RAM total of
> at least your index size. Because you may want to cache everything ideally.
> However if you consider the situations as like optimize you it would be
> nice 3 times larger of your index size. However RAM is also depends on your
> user characteristics. Sometimes caching only some of documents may be
> enough because of they are queried many many times.
>
> As I mentioned previously 100 qps is not much and indexing is not a
> bottleneck at many situations I suggest to remember the occam's razor.
> Start with simple and increase your RAM step by step. On the other hand I
> suggest you to use a tool as like Solrmeter to test your qps.
>
> If you have any question I can help you about your infrastructure.
>
> Thanks;
> Furkan KAMACI
>
>
> 2014-03-10 12:41 GMT+02:00 Priti Solanki :
>
> > As of now index is on 136 GB.
> >
> > I want to understand can we do multiple write on solr?  I don't have any
> > partitioning strategy as of now.
> >
> > On Amazon instance for Solr the disk read/write a like 5% or so . I am
> not
> > able to understand even though I am almost processing 300 records per min
> > how come Solr server is not receiving high disk IO.
> >
> > I have a query, When solr is writing does it update complete index again
> ?
> >
> > I will try and increase the RAM to 128 and see if some improvement
> comes..
> >
> >
> > On Mon, Mar 10, 2014 at 9:21 AM, Susheel Kumar <
> > susheel.ku...@thedigitalgroup.net> wrote:
> >
> > > Not sure how fast your index will grow but first you may still want to
> > > consider upgrading the single machine to 128 GB to see how the
> > performance
> > > is coming. Current memory 7 GB is really low.  After that you may want
> to
> > > add another node to partition the index into 2 nodes/shards (assuming
> you
> > > have some partition strategy ) that index size of 150-200 GB can fit
> into
> > > two nodes memory.
> > >
> > > Thanks,
> > > Susheel
> > >
> > > -Original Message-
> > > From: Priti Solanki [mailto:pritiatw...@gmail.com]
> > > Sent: Friday, March 07, 2014 12:50 AM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: SolrCloud setup guidance
> > >
> > > Thanks Susheel,
> > >
> > > But this index will keep on growing that my worry So I always have to
> > > increase the RAM .
> > >
> > > Can you suggest how many nodes one can think to support this bug index?
> > >
> > > Regards,
> > >
> > >
> > >
> > > On Fri, Mar 7, 2014 at 2:50 AM, Susheel Kumar <
> > > susheel.ku...@thedigitalgroup.net> wrote:
> > >
> > > > Setting up Solr cloud(horizontal scaling) is definitely a good idea
> > > > for this big index but before going to Solr cloud, are you able to
> > > > upgrade your single node to 128GB of memory(vertical scaling) to see
> > the
> > > difference.
> > > >
> > > > Thanks,
> > > > Susheel
> > > >
> > > > -Original Message-
> > > > From: Priti Solanki [mailto:pritiatw...@gmail.com]
> > > > Sent: Thursday, March 06, 2014 10:51 AM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: SolrCloud setup guidance
> > > >
> > > > Hello Everyone,
> > > >
> > > > I would like to take you guidance of following
> > > >
> > > > I have a single core with 124 GB of index data size. Indexing and
> > > > Reading both are very slow as I have 7 GB RAM to support this huge
> > > > data.  Almost 8 million of documents.
> > > >
> > > > Hence, we thought of going to SolrCloud so that we can accommodate
> > > > more upcoming data. I have data for 13 country with their millions of
> > > > products and we want to set up solrcloud for the same.
> > > >
> > > > I am in need of some initial thoughts about how to setup solrCloud
> for
> > > > such requirement. How to we come to know how many nodes,core I would
> > > > be needing to support this...
> > > >
> > > > we are thinking to host this on Amazon...Any guidance or reading
> links
> > > > ,case study will be highly appreciated.
> > > >
> > > > Regards,
> > > >
> > >
> >
>


Curl : shell script : The requested resource is not available. update/extract !

2014-03-10 Thread Priti Solanki
Hi all,

The following throws "The requested resource is not available":


curl "
http://localhost:8080/solr/#/dev/update/extract?stream.file=/home/priti/$file&literal.id=document$i&commit=true
"

I don't understand what literal.id is. Is it mandatory? [Please share
reading links if known.]

HTTP Status 404 - /solr/#/dev/update/extract
type: Status report
message: /solr/#dev/update/extract
description: The requested resource is not available.
Apache Tomcat/7.0.42

Whats wrong?

Regards,
Priti

Re: How to customize Solr

2014-03-10 Thread Ahmet Arslan
Hi,

How many refreshes do you need? Can you live with a 3-5 minute refresh rate?

If you can afford to query MySQL for every single query, consider using a
post filter:
http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/
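
A skeleton of that post-filter approach, as a hedged sketch against the
Solr 4.x API; OnlineUsersQuery and the precomputed id set are illustrative
names, not from the article or the thread:

  import java.io.IOException;
  import java.util.Set;

  import org.apache.lucene.search.IndexSearcher;
  import org.apache.solr.search.DelegatingCollector;
  import org.apache.solr.search.ExtendedQueryBase;
  import org.apache.solr.search.PostFilter;

  public class OnlineUsersQuery extends ExtendedQueryBase implements PostFilter {

    // hypothetical: Lucene docIds of online users, resolved per request
    // (e.g. from MySQL/memcached) before the query runs
    private final Set<Integer> onlineDocIds;

    public OnlineUsersQuery(Set<Integer> onlineDocIds) {
      this.onlineDocIds = onlineDocIds;
    }

    @Override
    public boolean getCache() { return false; }  // post filters must not be cached

    @Override
    public int getCost() { return 100; }  // cost >= 100 makes Solr run it as a post filter

    @Override
    public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
      return new DelegatingCollector() {
        @Override
        public void collect(int doc) throws IOException {
          // docBase is maintained by DelegatingCollector as segments change
          if (onlineDocIds.contains(docBase + doc)) {
            super.collect(doc);  // pass matching docs down to the delegate
          }
        }
      };
    }
  }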


Ahmet



On Monday, March 10, 2014 2:56 PM, lavesh  wrote:
I want a list of users who are online and fulfill the criteria specified.

Current implementation:
I am sending the online ids (usually 20k) as POST parameters along with the
search criteria.

How I want to optimize it:
I want to change the internal code of Solr so that these 20k profiles are
fetched from within Solr.

i.e. I need to implement something like *fq=online:(1)*, which can be
interpreted by a Solr handler as an online-users search; internally it
would call MySQL and get the 20k profiles.

Help:
Please guide me on what changes to make and how to make them.
Links (if possible) for performing internal changes.

Thanx



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-customize-Solr-tp4122551.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: The way Autocommit works in solr - Wierd

2014-03-10 Thread Furkan KAMACI
Hi;

Did you read here:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
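
The short version of that article, as a config sketch (the values here are
illustrative, not from this thread):

  <autoCommit>
    <maxDocs>100000</maxDocs>
    <openSearcher>false</openSearcher> <!-- durability: flush and truncate the tlog, no new searcher -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>60000</maxTime> <!-- visibility: open a cheap new view every minute -->
  </autoSoftCommit>

Note also that the /index directory can grow between hard commits anyway:
Lucene flushes segments to disk whenever the indexing RAM buffer
(ramBufferSizeMB) fills up, independently of commits, which would explain
the size pattern described above.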

Thanks;
Furkan KAMACI


2014-03-10 15:14 GMT+02:00 RadhaJayalakshmi :

> Hi,
>
> Brief Description of my application:
> We have a java program which reads a flat file, and adds document to solr
> using cloudsolrserver.
> And we index for every 1000 documents(bulk indexing).
>
> And the Autocommit setting of my application is:
> 
> 10
> false
> 
>
> So after every 100,000 documents are indexed, engine should perform a
> HardCommit/AutoCommit. But still the OpenSearcher will be false.
> Once the file is fully read, we are issuing a commit() from the
> CloudSolrServer class. So this by default opens a new Searcher.
>
> Also, from the Log, i can see that three times, Autocommit is happenning.
> and Only with the last/final Autocommit, opensearcher is set to true.
>
> So, till now all looks fine and working as expected.
>
> But one strange issue i observed during the course of indexing.
> Now, as per the documentation, the data that is being indexed should first
> get written into tlog. When the Autocommit is performed, the data will be
> flushed to disk.
> So only at three times, there should have been size difference in the
> /index
> folder. All the time only the size of the /tlog folder should have been
> changing
>
> But actually happened is, all the time, i see the size of the /index folder
> getting increased in parallel to the size of the /tlog folder.
> Actually it is increasing to certain limit and coming down. Again
> increasing
> and coming down to a point.
>
> So Now the bigger doubt is have is, during hard commit, is the data getting
> written into both /index or /tlog folder??
>
> I am using solr 4.5.1.
>
> Some one please clear me how the hardcommit works. I am asumming the
> following sequence:
> 1. Reads the data and writes to tlog
> 2. During hardcommit, flushes the data from tlog to index. If openSearcher
> is false, should not open a new searcher
> 3. In the end, once all the datas are indexed, it should open a new
> searcher.
>
> If not please explain me..
>
> Thanks in Advance
> Radha
>
>
>
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/The-way-Autocommit-works-in-solr-Wierd-tp4122558.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr Clustering

2014-03-10 Thread Alessandro Benedetti
Any news regarding this?
I'm investigating Solr offline clustering as well (full-index clustering).

Cheers


2012-09-17 20:16 GMT+01:00 Denis Kuzmenok :

>
>
>
> Sorry for late response. To be strict, here is what i want:
>
> * I get documents all the time. Let's assume those are news (It's
> rather similar thing).
>
> * Every time i get new batch of "news" i should add them to Solr index
> and get cluster information for that document. Store this information
> in the DB (so i should know each document's cluster).
>
> * I can't wait for cluster definition service/program to launch from
> time to time, but it should define clusters on the fly.
>
> * I want to be able to get clusters only for some period of time (For
> example, I want to search for clusters only among documents that were
> loaded one month ago).
>
> * I will have tens of thousands of new documents every day and overall
> base of several millions.
>
> I'm reading "Mahout in action" now. But maybe you can point me to what i
> need.
> --- Original message ---
> From: "Chandan Tamrakar" 
> To: solr-user@lucene.apache.org
> Date: 4 September 2012, 12:30:56
> Subject: Re: Solr Clustering
>
>
>
> >
>
> yes there is a solr component if you want to cluster solr documents; check
> the following link: http://wiki.apache.org/solr/ClusteringComponent
> Carrot2 might be good if you want to cluster a few thousand documents,
> for example when a user searches solr, just cluster the search results
>
> Mahout is much more scalable and probably you need Hadoop for that
>
>
> thanks
> chandan
>
> On Tue, Sep 4, 2012 at 2:10 PM, Denis Kuzmenok  wrote:
>
> >
> >
> >  Original Message 
> > Subject: Solr Clustering
> > From: Denis Kuzmenok 
> > To: solr-user@lucene.apache.org> CC:
> >
> > Hi, all.
> > I know there is carrot2 and mahout for clustering. I want to implement
> > such thing:
> > I fetch documents and want to group them into clusters when they are
> added
> > to index (i want to filter "similar" documents for example for 1 week). i
> > need these documents quickly, so i cant rely on some postponed
> > calculations. Each document should have assigned cluster id (like group
> > similar documents into clusters and assign each document its cluster id.
> > It's something similar to news aggregators like google news. I dont need
> > to search for clusters with documents older than 1 week (for example).
> Each
> > document will have its unique id and saved into DB. But solr will have
> > cluster id field also.
> > Is it possible to implement this with solr/carrot/mahout?
>
>
>
>
> --
> Chandan Tamrakar
> *
> *
>
>


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Indexing useful N-grams and adding payloads

2014-03-10 Thread Manuel Le Normand
Hi,
I have performance and scoring problems with phrase queries:

   1. Performance - phrase queries involving frequent terms are very slow
      due to reading large positional postings lists.
   2. Scoring - I want to control the boost of phrase and entity (in
      gazetteers) matches.

Indexing all terms as bi-grams and unigrams is out of the question in my
use case, so I plan to index only the useful bi-grams. Part of this will be
achieved by the CommonGrams filter, in which I put the frequent words, but
I am thinking of going one step further and also indexing every phrase
query I have extracted from my query log, and every entity from my
gazetteers. To the latter (which are N-grams) I will also add a payload to
control the boost.
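
For the frequent-words part, the CommonGrams pair goes in the analyzer
chain; a sketch, with commonwords.txt as an assumed word list:

  <!-- index-time analyzer -->
  <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
  <!-- query-time analyzer -->
  <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt" ignoreCase="true"/>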

An example MappingCharFilter.txt would be:

# phrase query
"term1 term2 term3" => "term1_term2_term3|1"
# entity
"firstName lastName" => "firstName_lastName|2"
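
Since each mapped token carries a "|boost" suffix, a delimited-payload
filter downstream of the tokenizer would turn that suffix into an actual
payload (a sketch, not a tested setup):

  <filter class="solr.DelimitedPayloadTokenFilterFactory" delimiter="|" encoder="float"/>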

One of the issues is that I have 100k-1M (depending on frequency)
phrases/entities as above. I saw that MappingCharFilter is implemented as
an FST; still, I'm concerned that iterating over the charBuffer for long
documents might cause problems.

Has anyone faced a similar issue? Is this mapping implementation reasonable,
query-time performance wise?

Thanks in advance,
Manuel


[Clustering] Full-Index Offline cluster

2014-03-10 Thread Alessandro Benedetti
Hi guys,
I'm looking around to find out if it's possible to do full-index /
offline clustering.
My goal is full-index clustering and, for each document, having a cluster
field with the id/label of its cluster at indexing time.
Does anyone know more details regarding this kind of integration with Carrot2?

I find only the classic query time clustering approach :
https://cwiki.apache.org/confluence/display/solr/Result+Clustering

Cheers


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: [Clustering] Full-Index Offline cluster

2014-03-10 Thread Ahmet Arslan
Hi Alessandro,

Generally Apache mahout http://mahout.apache.org is recommended for offline 
clustering.

Ahmet



On Monday, March 10, 2014 4:11 PM, Alessandro Benedetti 
 wrote:
Hi guys,
I'm looking around to find out if it's possible to do full-index /
offline clustering.
My goal is full-index clustering and, for each document, having a cluster
field with the id/label of its cluster at indexing time.
Does anyone know more details regarding this kind of integration with Carrot2?

I find only the classic query time clustering approach :
https://cwiki.apache.org/confluence/display/solr/Result+Clustering

Cheers


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England



Re: SolrCloud with Tomcat

2014-03-10 Thread Vineet Mishra
Hi

Got it working!

Much thanks for your response.


On Sat, Mar 8, 2014 at 7:40 PM, Furkan KAMACI wrote:

> Hi;
>
> Could you check here:
>
> http://lucene.472066.n3.nabble.com/Error-when-creating-collection-in-Solr-4-6-td4103536.html
>
> Thanks;
> Furkan KAMACI
>
>
> 2014-03-07 9:44 GMT+02:00 Vineet Mishra :
>
> > Hi
> >
> > I am installing SolrCloud with 3 External
> > Zookeeper(localhost:2181,localhost:2182,localhost:2183) and 2
> > Tomcats(localhost:8181,localhost:8182) all available on a single
> > Machine(Just for getting started).
> > By Following these links
> >
> > http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html
> > http://wiki.apache.org/solr/SolrCloudTomcat
> >
> > I have got the Solr UI on the machine pointing to
> >
> > http://localhost:8181/solr/#/~cloud
> >
> > In the Cloud Graph View it is coming with
> >
> > mycollection
> > |
> > |_ shard1
> > |_ shard2
> >
> > But both the shards are empty and showing no cores or replica.
> >
> > Following
> >
> http://myjeeva.com/solrcloud-cluster-single-collection-deployment.htmlblog
> > ,
> > I have been successful till starting tomcat,
> > since after the section "Creating Collection, Shard(s), Replica(s) in
> > SolrCloud" I am facing the problem.
> >
> > Giving command to create replica for the shard using
> >
> > *curl
> > '
> >
> http://localhost:8181/solr/admin/cores?action=CREATE&name=shard1-replica-2&collection=mycollection&shard=shard1
> > <
> >
> http://localhost:8181/solr/admin/cores?action=CREATE&name=shard1-replica-2&collection=mycollection&shard=shard1
> > >'*
> >
> > it is giving error
> >
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">400</int><int name="QTime">137</int>
> >   </lst>
> >   <str name="msg">Error CREATEing SolrCore 'shard1-replica-2':
> >     192.168.2.183:8182_solr_shard1-replica-2 is removed</str>
> >   <int name="code">400</int>
> > </response>
> >
> > Has anybody gone through this issue?
> >
> > Regards
> >
>


Re: Curl : shell script : The requested resource is not available. update/extract !

2014-03-10 Thread Raymond Wiker
"literal.id" should contain a unique identifier for each document (assuming
that the unique identifier field in your solr schema is called "id"); see
http://wiki.apache.org/solr/ExtractingRequestHandler .

I'm guessing that the URL for the ExtractingRequestHandler is incorrect, or
maybe you haven't even configured/enabled the ExtractingRequestHandler in
solrconfig.xml. Further, from the URL you show, I'm guessing that "$file"
and "$i" are references to shell variables that have been incorrectly
quoted (for example, by enclosing a constructed URL in single quotes
instead of double quotes, if you're on a Unixoid platform).


On Mon, Mar 10, 2014 at 2:51 PM, Priti Solanki wrote:

> Hi all,
>
> Following throw "The request resource is not available"
>
>
> curl "
>
> http://localhost:8080/solr/#/dev/update/extract?stream.file=/home/priti/$file&literal.id=document$i&commit=true
> "
>
> I don't understand what is literal.id ?? Is it mandatory. [Please share
> reading links if known]
>
>  HTTP Status 404 - /solr/#/dev/update/extract size="1" noshade="noshade">type Status
> report*message*
> /solr/#dev/update/extractdescription The requested
> resource is not available.Apache
> Tomcat/7.0.42
> Whats wrong?
>
> Regards,
> Priti
>


Re: [Clustering] Full-Index Offline cluster

2014-03-10 Thread Alessandro Benedetti
Thank you, Ahmet, I already know Mahout.
What I was curious about is whether an integration already exists in Solr
for offline clustering.
Reading the wiki we can find this phrase: "While Solr contains an
extension for full-index clustering (*off-line* clustering) this
section will focus on discussing on-line clustering only."[1]
So I was wondering if any documentation exists for it :)
[1] https://cwiki.apache.org/confluence/display/solr/Result+Clustering


2014-03-10 14:15 GMT+00:00 Ahmet Arslan :

> Hi Alessandro,
>
> Generally Apache mahout http://mahout.apache.org is recommended for
> offline clustering.
>
> Ahmet
>
>
>
> On Monday, March 10, 2014 4:11 PM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
> Hi guys,
> I'm looking around to find out if it's possible to have a full-index
> /Offline cluster.
> My scope is to make a full index clustering ad for each document have the
> cluster field with the id/label of the cluster at indexing time.
> Anyone know more details regarding this kind of integration with Carrot2 ?
>
> I find only the classic query time clustering approach :
> https://cwiki.apache.org/confluence/display/solr/Result+Clustering
>
> Cheers
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
>


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


RE: Zookeeper will not update cluster state when garbaging

2014-03-10 Thread OSMAN Metin
Merhaba Furkan,

We are planning to migrate to 3 nodes in an ensemble, but for now we have only
one active zookeeper instance in production.

Actually, I thought of a param somewhere in the Solr configuration. I may be
wrong, but I thought the problem was that Solr asks or tells zookeeper to
update its state, but zookeeper cannot, as it is busy garbage collecting its
memory. Nevertheless, I will try modifying the tickTime param.

For the second point, I will ask my boss if I can add our company to your wiki.

Metin

-Original Message-
From: Furkan KAMACI [mailto:furkankam...@gmail.com]
Sent: Monday, 10 March 2014 14:26
To: solr-user@lucene.apache.org
Subject: Re: Zookeeper will not update cluster state when garbaging

Hi Metin;

I think that timeout value you are talking about is that:
http://zookeeper.apache.org/doc/r3.1.2/zookeeperStarted.html However it is not 
recommended to change timeout value of Zookeeper "if you do not have a specific 
reason". On the other hand how many Zookeepers do you have at your 
infrastructure?

Also regardless of your question: if it is OK for you could you add your 
company here: https://wiki.apache.org/solr/PublicServers This may be nice for 
the people that who wonders about which companies uses Solr.

Thanks;
Furkan KAMACI


2014-03-10 12:35 GMT+02:00 OSMAN Metin :

> Hi all,
>
> we are using SolrCloud with this configuration :
>
> * SolR 4.4.0
>
> * Zookeeper 3.4.5
>
> * one server with zookeeper + 4 solr nodes
>
> * one server with 4 solr nodes
>
> * only one core
>
> * Solr instances deployed on tomcats with mod_cluster
>
> * clients access with SolRJ trough Apache + mod_cluster
>
> On the morning, we have massive updates (several thousands in a few
> minute) with explicit softCommit=true.
> This updates are load balanced on each regardless a node is the leader 
> or not.
>
> When this happens, the solr cloud admin console shows 7 nodes as 
> recovering and the leader as active.
> We also noticed, that refreshing the graphic is very long.
> This situation can last 3 hours until the clusterstate refreshes.
> During this phase, Zookeeper is hardly garbaging (I can post the Munin 
> gc graphs).
>
> Here are the command line parameters of zookeeper and solr nodes (I 
> have replaced some values with XXX for confidentiality reason).
>
> Zookeeper :
>
> java -cp
> /var/lib/zookeeper/bin/../build/classes:/var/lib/zookeeper/bin/../build/lib/*.jar:/var/lib/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/var/lib/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/var/lib/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/var/lib/zookeeper/bin/../lib/log4j-1.2.15.jar:/var/lib/zookeeper/bin/../lib/jline-0.9.94.jar:/var/lib/zookeeper/bin/../zookeeper-3.4.5.jar:/var/lib/zookeeper/bin/../src/java/lib/*.jar:/app/zookeeper/conf:
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.port=XXX -Xms384m -Xmx384m 
> -XX:MaxPermSize=128m -Dcom.sun.management.jmxremote 
> -Dcom.sun.management.jmxremote.local.only=false
> org.apache.zookeeper.server.quorum.QuorumPeerMain
> /app/zookeeper/conf/zoo.cfg
>
> SolR :
>
> /usr/lib/jvm/java/bin/java
> -Dsolr.data.dir=/app/solr/server/search_01/vod/data
> -Dsolr.solr.home=/app/solr/server/search_01 -DnumShards=1 
> -Dbootstrap_confdir=/app/solr/server/search_01/vod/conf
> -Dcollection.configName=vod -DzkHost=XXX:2181 -Dtomcat.server.port=XXX 
> -Dtomcat.http.port=XXX -Dtomcat.ajp.port=XXX 
> -Dlog4j.configuration=file:///app/tomcat/server/search_01/conf/log4j.p
> roperties
> -Djboss.jvmRoute=SEARCH_02_01 
> -Djboss.modcluster.sendToApacheDelayInSec=10
> -Djboss.modcluster.nodetimeout=30 -Djboss.modcluster.ttl=10 -Xms2048m 
> -Xmx2048m -XX:MaxPermSize=384m -Dcom.sun.management.jmxremote 
> -Dcom.sun.management.jmxremote.port=XXX
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false -classpath 
> :/app/tomcat/server/search_01/bin/bootstrap.jar:/app/tomcat/server/sea
> rch_01/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar
> -Dcatalina.base=/app/tomcat/server/search_01
> -Dcatalina.home=/app/tomcat/server/search_01 -Djava.endorsed.dirs= 
> -Djava.io.tmpdir=/app/tomcat/server/search_01/temp
> -Djava.util.logging.config.file=/app/tomcat/server/search_01/conf/log4
> j.properties 
> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> org.apache.catalina.startup.Bootstrap start
>
> I have tried other GC strategies, max heap values, new ratio, etc. on
> Zookeeper without success.
> Every time Zookeeper is garbage collecting, the clusterstate is not correct.
>
> Is this a bug with Zookeeper or Solr 4.4.0, or is it due to some
> misconfiguration?
> I have seen somewhere that there is a timeout value between Solr and
> Zookeeper, but I don't know where it is set (and what its default value is).
>
> Any help will be appreciated.
>
> Regards,
> Metin
>


Re: Curl : shell script : The requested resource is not available. update/extract !

2014-03-10 Thread Jack Krupansky
The "#" character introduces the "fragment" portion of a URL, so 
"/dev/update/extract" is not a part of the "path" of the URL. In this case 
the URL "path" is "/solr/" and the server is simply complaining that there 
is no code registered to process that path.


Normally, the collection name (core name) follows "/solr/".
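
For example, with a core named "dev" (a sketch; adjust the core name and port 
to your setup), dropping the "#" lets the full path reach the server:

    curl "http://localhost:8080/solr/dev/update/extract?stream.file=/home/priti/$file&literal.id=document$i&commit=true"

As for literal.id: the literal.* parameters of the extract handler set fixed 
field values on the extracted document, so literal.id supplies the uniqueKey. 
It is effectively mandatory when the schema declares a required id field and 
the extracted content does not provide one.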

-- Jack Krupansky

-Original Message- 
From: Priti Solanki

Sent: Monday, March 10, 2014 9:51 AM
To: solr-user@lucene.apache.org
Subject: Curl : shell script : The requested resource is not available. 
update/extract !


Hi all,

The following throws "The requested resource is not available":


curl "
http://localhost:8080/solr/#/dev/update/extract?stream.file=/home/priti/$file&literal.id=document$i&commit=true
"

I don't understand what literal.id is. Is it mandatory? [Please share
reading links if known]

HTTP Status 404 - /solr/#/dev/update/extract

type: Status report
message: /solr/#/dev/update/extract
description: The requested resource is not available.
Apache Tomcat/7.0.42

Priti



Re: Polygon search returning "InvalidShapeException: incompatible dimension (2)... error.

2014-03-10 Thread Smiley, David W.
You need to either quote your query (after the colon, and another at the
very end), or escape any special characters, or use a different query
parser like “field”.  I prefer to use the field query parser:

{!field f=loc}Intersects(POLYGON(...
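
Applied to the field from the original post, that would look roughly like this 
(a sketch using the polygon from the question):

fq={!field f=geoloc}Intersects(POLYGON((-83.6349 42.4718, -83.5096 42.471868,
-83.5096 42.4338, -83.6349 42.4338, -83.6349 42.4718)))

With the lucene query parser, the same clause works if the whole value is 
quoted: q=geoloc:"Intersects(POLYGON((...)))"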

~ David

On 3/6/14, 10:52 AM, "leevduhl"  wrote:

>Getting the following error when attempting to run a polygon query from
>the
>Solr Admin utility: :"com.spatial4j.core.exception.InvalidShapeException:
>incompatible dimension (2) and values (Intersects).  Only 0 values
>specified",
>"code":400
>
>My query is as follows:
>q=geoloc:Intersects(POLYGON((-83.6349 42.4718, -83.5096 42.471868,
>-83.5096
>42.4338, -83.6349 42.4338, -83.6349 42.4718)))
>
>The response is as follows:
>{
>  "responseHeader":{
>"status":400,
>"QTime":2,
>"params":{
>  "debugQuery":"true",
>  "fl":"id, openhousestartdate, geoloc",
>  "sort":"openhousestartdate desc",
>  "indent":"true",
>  "q":"geoloc:Intersects(POLYGON((83.6349 42.4718, 83.5096 42.471868,
>83.5096 42.4338, 83.6349 42.4338, 83.6349 42.4718)))",
>  "wt":"json"}},
>  "error":{
>"msg":"com.spatial4j.core.exception.InvalidShapeException:
>incompatible
>dimension (2) and values (Intersects).  Only 0 values specified",
>"code":400}}
>
>My "geoloc" dimension/field is setup as follows in my Schema.xml:
>multiValued="false"/>
>
>Some sample document "geoloc" data is shown below.
>"docs": [
>  {
>"geoloc": "-82.549200,43.447400"
>  },
>  {
>"geoloc": "-82.671551,43.421797"
>  }
>]
>
>My Solr version info is as follows:
>solr-spec: 4.6.1
>solr-impl: 4.6.1 1560866 - mark - 2014-01-23 20:21:50
>lucene-spec: 4.6.1
>lucene-impl: 4.6.1 1560866 - mark - 2014-01-23 20:11:13
>
>Any info on a solution to this problem would be appreciated.
>
>Thanks
>Lee
>
>
>
>--
>View this message in context:
>http://lucene.472066.n3.nabble.com/Polygon-search-returning-InvalidShapeEx
>ception-incompatible-dimension-2-error-tp4121704.html
>Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr spatial search within the polygon

2014-03-10 Thread Smiley, David W.


On 3/10/14, 6:45 AM, "Javi"  wrote:

>Hi all.
>
>I need your help! I have read every post about Spatial in Solr because I
>need to check if a point (latitude,longitude) is inside a Polygon.
>
>/**/
>/* 1. library */
>/**/
>
>(1) I use "jts-1.13.jar" and "spatial4j-0.4.1.jar"
>(I think they are the latest version)

You should only need to add JTS; spatial4j is included in Solr.  Where
exactly did you put it?

>
>/*/
>/* 2. schema.xml */
>/*/
>class="solr.SpatialRecursivePrefixTreeFieldType"   
>spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFac
>to
>ry" distErrPct="0.025" maxDistErr="0.09" units="degrees" />
>
>(I omit geo="true" because it is the default)
>
>...
>
>
>
>(Here I don't know what it means if I add multiValued="true")

How many points might there be in this field for a given document? 0 or
1? If at most one, don't set multiValued=true; but if you expect possibly more
than one, then set it to true.

>
>/*/
>/* Document contents */
>/*/
>I have tried with 3 different content for my documents (lat-lon refers to
>Madrid, Spain):

Um…. Just to be absolutely sure, are you adding the data in Solr’s XML
format, which is this?:

XML Formatted Index Updates
<https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-XMLFormattedIndexUpdates>


The examples you give below are the *output* XML format which is not the
same as the input format.  In particular you don’t give arrays of values
to Solr; you simply give more than one field element that has the same
name.

>
>
>a) As it is WKT format, I tried "longitude latitude" (x y)
>
><arr name="LOCATION">
>  <str>-3.69278 40.442179</str>
></arr>

That should work but I don’t recommend that, as a matter of taste, if all
your data is in latitude & longitude, as opposed to projected data or any
other spatial data.

>
>b) As it is WKT format, I tried "POINT(longitude latitude)" (x y)
>
><arr name="LOCATION">
>  <str>POINT(-3.69278 40.442179)</str>
></arr>
>
>And

Again, that should work but see my comment above.

>
>c) I tried no WKT format by adding a comma and using "longitude,latitude"
>
><arr name="LOCATION">
>  <str>40.442179,-3.69278</str>
></arr>

That is *wrong*.  Remove the comma and it will then be okay.  But again,
see my earlier advice on lat & lon data.

>
>d) I tried no WKT format by adding a comma and using "latitude,longitude"
>
><arr name="LOCATION">
>  <str>-3.69278,40.442179</str>
></arr>

But that isn’t latitude then longitude of Madrid; you have it reversed.
“latitude,longitude” of Madrid is “40.442179,-3.69278”.

>
>/*/
>/* My solr query */
>/*/
>
>a) 
>_Description: This POLYGON (in WKT format, so "longitude latitude") is a
>triangle that covers Madrid entirely, so my point would be inside it.
>_Result: The query returns 0 documents (which is wrong).
> 
>http://localhost:8983/solr/pisos22/select?q=*%3A*&
>fl=LOCATION&
>wt=xml&
>indent=true&
>fq=LOCATION:"IsWithin(POLYGON((
>-3.732605 40.531415,
>-3.856201 40.336993,
>-3.493652 40.332806,
>-3.732605 40.531415
>))) distErrPct=0"
>
>b) 
>_Description: This POLYGON (in WKT format, so "longitude latitude") is a
>rectangle outside of Madrid, so my point would not be inside it.
>_Result: The query returns 0 documents (which is correct).
>
>http://localhost:8983/solr/pisos22/select?q=*%3A*&
>fl=LOCATION&
>wt=xml&
>indent=true&
>fq=LOCATION:"IsWithin(POLYGON((
>-4.0594 40.8708,
>-4.0621 40.7211 ,
>-3.8095 40.7127,
>-3.8232 40.8687,
>-4.0594 40.8708
>))) distErrPct=0"
>
>***I also tried modifying the order of lat/lon but I am not able to find
>out 
>the solution to make it work.

The “x y” order looks good.  “IsWithin” should work but if all your
indexed data is points then use “Intersects” which is much faster.

As a sanity check can you simply do a {!geofilt} query with the “pt” set
to madrid and a hundred kilometers or whatever?
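
Something like this, as a sketch (assuming the field is named LOCATION, with 
Madrid's coordinates from earlier in the thread; pt is "lat,lon" and d is in 
kilometers):

fq={!geofilt sfield=LOCATION pt=40.442179,-3.69278 d=100}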

~ David



Re: maxClauseCount is set to 1024

2014-03-10 Thread Jack Krupansky
The clause limit covers all clauses (terms) in one Lucene BooleanQuery - one 
level of a Solr query, where a parenthesized sub-query is a separate level 
and counts as a single clause in the parent query.


In this case, it appears that the wildcard is being expanded/rewritten to a 
long list of terms.


What version of Solr are you using? Newer releases of Solr should expand 
wildcards as constant score queries rather than enumerating all the terms. 
Although, it looks like the difficulty may be in highlighting - do you 
really need to highlight the roles field?
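
If you genuinely need more clauses, the limit can be raised in solrconfig.xml 
(a sketch; pick a value that fits your queries):

    <maxBooleanClauses>4096</maxBooleanClauses>

But if the expansion only happens during highlighting, dropping the roles 
field from hl.fl is the cheaper fix.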


-- Jack Krupansky

-Original Message- 
From: Andreas Owen

Sent: Monday, March 10, 2014 4:07 AM
To: solr-user@lucene.apache.org
Subject: maxClauseCount is set to 1024


Does this maxClauseCount apply to each field individually, or to all fields 
put together? Is it the date fields?



When I execute a query I get this error:


<response>
  <lst name="responseHeader"><int name="status">500</int><int name="QTime">93</int></lst>
  [query: "Ein PDFchen als Dokument", fq=roles:*, wt=xml; the result list and
  the facet counts (attachmenttype, doctype, author_s, veranstaltung_s, plus a
  facet_dates range from 2011-03-01T00:00:00Z to 2014-04-01T00:00:00Z with a
  +1MONTH gap) are too garbled in the archive to reproduce]
  <lst name="error">
    <str name="msg">maxClauseCount is set to 1024</str>
    <str name="trace">org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
  at org.apache.lucene.search.ScoringRewrite$1.checkMaxClauseCount(ScoringRewrite.java:72)
  at org.apache.lucene.search.ScoringRewrite$ParallelArraysTermCollector.collect(ScoringRewrite.java:152)
  at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:79)
  at org.apache.lucene.search.ScoringRewrite.rewrite(ScoringRewrite.java:108)
  at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:288)
  at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:217)
  at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:99)
  at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:469)
  at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:217)
  at org.apach...</str>
  </lst>
</response>

Solr to return the list of matched fields

2014-03-10 Thread heaven
Hi, I have a few text fields indexed and when searching I need to know what
field matched. For example I have fields:
{code}
full_name, site_source, tweets, rss_entries, etc
{code}
When searching I need to show results with scores per field, so a user can
see exactly which content matched the given keywords (I can't use stored
fields because of the index size).

Thank you,
Alex



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-to-return-the-list-of-matched-fields-tp4122613.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr spatial search within the polygon

2014-03-10 Thread Smiley, David W.


On 3/10/14, 12:12 PM, "Smiley, David W."  wrote:

>>
>>
>>
>>c) I tried no WKT format by adding a comma and using "longitude,latitude"
>>
>>
>>
>>  40.442179,-3.69278
>>
>>
>
>That is *wrong*.  Remove the comma and it will then be okay.  But again,
>see my earlier advise on lat & lon data.

Whoops; I mean… “-3.69 40.44” would be a valid way — X Y order.



Re: Solr spatial search within the polygon

2014-03-10 Thread javinsnc
David Smiley (@MITRE.org) wrote
> On 3/10/14, 6:45 AM, "Javi" <

> javiersangrador@

> > wrote:
> 
>>/**/
>>/* 1. library */
>>/**/
>>
>>(1) I use "jts-1.13.jar" and "spatial4j-0.4.1.jar"
>>(I think they are the latest version)
> 
> You should only need to add JTS; spatial4j is included in Solr.  Where
> exactly did you put it?
> 
> \solr-4.6\solr-webapp\webapp\WEB-INF\lib
> As the other lucene librarires. If I delete from this path, solr returns
> me an error, so I think it's in the proper place.
> 
>>
>>/*/
>>/* 2. schema.xml */
>>/*/
>>
> >
>><fieldType class="solr.SpatialRecursivePrefixTreeFieldType"
>>spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
>>distErrPct="0.025" maxDistErr="0.09" units="degrees" />
>>
>>(I omit geo="true" because it is the default)
>>
>>...
>>
>>
> 
>>
>>(Here I don't know what it means if I add multiValued="true")
> 
> How many points might there be in this field for a given document?  0 or
> 1?  Don’t set multiValued=true but if you expect possibly more than 1 then
> set it to true.
> 
> Just one point, so I omit multiValued. Thanks.
> 
>>
>>/*/
>>/* Document contents */
>>/*/
>>I have tried with 3 different content for my documents (lat-lon refers to
>>Madrid, Spain):
> 
> Um…. Just to be absolutely sure, are you adding the data in Solr’s XML
> format, which is this?:
> 
> XML Formatted Index Updates
> <https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-XMLFormattedIndexUpdates>
> 
> I am not sure if I understand you properly. I index the documents with
> Lucene, not with the update handler of Solr. Maybe here is the
> problem. Can you set the type of the field (apart from the
> <fieldType> defined in schema.xml) when indexing in Lucene? Months ago, I
> needed to index a LONG field; I did the trick with this approach and it
> worked.
> 
> One thing is how Solr retrieves the data (defined in schema.xml with its
> <fieldType>) and another thing is how Lucene indexes the field, right?
> 
> 
> The examples you give below are the *output* XML format which is not the
> same as the input format.  In particular you don’t give arrays of values
> to Solr; you simply give more than one field element that has the same
> name.
> 
>>
>>
>>a) As it is WKT format, I tried "longitude latitude" (x y)
>>
>><arr name="LOCATION">
>>  <str>-3.69278 40.442179</str>
>></arr>
> That should work but I don’t recommend that, as a matter of taste, if all
> your data is in latitude & longitude, as opposed to projected data or any
> other spatial data.
> 
> What do you recommend?
> 
> With "all your data is in latitude & longitude", do you refer that every
> doc in the index has only the field LOCATION? If the answer is yes, then
> no, there is more fields in all the documents.
> 
>>
>>b) As it is WKT format, I tried "POINT(longitude latitude)" (x y)
>>
>><arr name="LOCATION">
>>  <str>POINT(-3.69278 40.442179)</str>
>></arr>
>>
>>And
> 
> Again, that should work but see my comment above.
> 
>>
>>c) I tried no WKT format by adding a comma and using "longitude,latitude"
>>
>><arr name="LOCATION">
>>  <str>40.442179,-3.69278</str>
>></arr>
> That is *wrong*.  Remove the comma and it will then be okay.  But again,
> see my earlier advice on lat & lon data.
> 
>>
>>d) I tried no WKT format by adding a comma and using "latitude,longitude"
>>
>><arr name="LOCATION">
>>  <str>-3.69278,40.442179</str>
>></arr>
> But that isn’t latitude then longitude of Madrid; you have it reversed.
> “latitude,longitude” of Madrid is “40.442179,-3.69278”.
> 
>>
>>/*/
>>/* My solr query */
>>/*/
>>
>>a) 
>>_Description: This POLYGON (in WKT format, so "longitude latitude") is a
>>triangle that cover Madrid at all, so my point would be inside them.
>>_Result: Query return 0 documents (which is wrong).
>> 
>>http://localhost:8983/solr/pisos22/select?q=*%3A*&
>>fl=LOCATION&
>>wt=xml&
>>indent=true&
>>fq=LOCATION:"IsWithin(POLYGON((
>>-3.732605 40.531415,
>>-3.856201 40.336993,
>>-3.493652 40.332806,
>>-3.732605 40.531415
>>))) distErrPct=0"
>>
>>b) 
>>_Descripcion: This POLYGON (in WKT format, so "longitude latitude") is a
>>rectangle out of Madrid, so my point would not be inside them.
>>_Result: Query return 0 documents (which is correct).
>>
>>http://localhost:8983/solr/pisos22/select?q=*%3A*&
>>fl=LOCATION&
>>wt=xml&
>>indent=true&
>>fq=LOCATION:"IsWithin(POLYGON((
>>-4.0594 40.8708,
>>-4.0621 40.7211 ,
>>-3.8095 40.7127,
>>-3.8232 40.8687,
>>-4.0594 40.8708
>>))) distErrPct=0"
>>
>>***I also tried modifying the order of lat/lon but I am not able to find
>>out 
>>the solution to make it work.
> 
> The “x y” order looks good.  “IsWithin” should work but if all your
> indexed data is points then use “Intersects” which is much faster.
> 
> As a sanity check can you simply do a {!geofilt} query with 

Re: Solr to return the list of matched fields

2014-03-10 Thread Jack Krupansky
Take a look at the "explain" section of the results when you set the 
debugQuery=true parameter.


Also set the debug.explain.structured=true parameter to get a structured 
representation of the explain section.
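
For example (a sketch against a local core; adjust the core name and fields):

http://localhost:8983/solr/collection1/select?q=full_name:foo&debugQuery=true&debug.explain.structured=true&wt=json

The per-field score contributions then appear in the debug/explain section for 
each matching document.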


-- Jack Krupansky

-Original Message- 
From: heaven

Sent: Monday, March 10, 2014 12:40 PM
To: solr-user@lucene.apache.org
Subject: Solr to return the list of matched fields

Hi, I have a few text fields indexed and when searching I need to know what
field matched. For example I have fields:
{code}
full_name, site_source, tweets, rss_entries, etc
{code}
When searching I need to show results with scores per field, so a user can
see exactly which content matched the given keywords (I can't use stored
fields because of the index size).

Thank you,
Alex



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-to-return-the-list-of-matched-fields-tp4122613.html
Sent from the Solr - User mailing list archive at Nabble.com. 



[ANNOUNCE] Heliosearch 0.04

2014-03-10 Thread Yonik Seeley
Changes from the previous release are primarily off-heap FieldCache
support for strings as well as all numerics (the previous release
only had integer support).

Benchmarks for string fields here:
http://heliosearch.org/hs-solr-off-heap-fieldcache-performance

Try it out here: https://github.com/Heliosearch/heliosearch/releases/

-Yonik
http://heliosearch.org - native off-heap filters and fieldcache for solr


Re: Solr spatial search within the polygon

2014-03-10 Thread Smiley, David W.


On 3/10/14, 12:56 PM, "javinsnc"  wrote:
>>>
>>>/*/
>>>/* Document contents */
>>>/*/
>>>I have tried with 3 different content for my documents (lat-lon refers
>>>to
>>>Madrid, Spain):
>> 
>> Um…. Just to be absolutely sure, are you adding the data in Solr’s XML
>> format, which is this?:
>> 
>> XML Formatted Index Updates
>> <https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-XMLFormattedIndexUpdates>
>> 
>> I am not sure if I understand you properly. I index the documents with
>> Lucene, not with the update handler of Solr. Maybe here is the
>> problem. Can you set the type of the field (apart from the
>> <fieldType> defined in schema.xml) when indexing in Lucene? Months ago, I
>> needed to index a LONG field; I did the trick with this approach and it
>> worked.
>> 
>> One thing is how Solr retrieves the data (defined in schema.xml with its
>> <fieldType>) and another thing is how Lucene indexes the field, right?

This is indeed the source of the problem.

Why do you index with Lucene’s API and not Solr’s?  Solr not only has a
web-service API but it also has the SolrJ API that can embed Solr —
EmbeddedSolrServer.  I only recommend embedding Solr in limited
circumstances as it’s more flexible and usually plenty fast to communicate
with Solr normally, or to easily customize Solr to load data from a custom
file and so the indexing is all in-process.  You would do the latter with
either a custom DataImportHandler piece or a ContentStreamLoader subclass.
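
For comparison, indexing the same point through SolrJ is only a few lines (a 
sketch, assuming Solr 4.x and the core name used in this thread):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexPoint {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/pisos22");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "madrid-1");                  // hypothetical uniqueKey value
    doc.addField("LOCATION", "-3.69278 40.442179");  // "x y" (lon lat) order
    server.add(doc);
    server.commit();
  }
}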


>> 
>> 
>> The examples you give below are the *output* XML format which is not the
>> same as the input format.  In particular you don’t give arrays of values
>> to Solr; you simply give more than one field element that has the same
>> name.
>>>
>>>
>>>a) As it is WKT format, I tried "longitude latitude" (x y)
>>>
>> 
>>>
>> 
>>>  
>> 
>> -3.69278 40.442179
>> 
>>>
>> 
>>>
>> 
>> That should work but I don’t recommend that, as a matter of taste, if all
>> your data is in latitude & longitude, as opposed to projected data or any
>> other spatial data.
>> 
>> What do you recommend?
>> 
>> With "all your data is in latitude & longitude", do you refer that every
>> doc in the index has only the field LOCATION? If the answer is yes, then
>> no, there is more fields in all the documents.

I’m only talking about the spatial field.  I mean if your *spatial data*
is entirely data points where the two dimensions are latitude and
longitude on the surface of the earth (or hypothetically some other
spherical place).

~ David



Re: Solr spatial search within the polygon

2014-03-10 Thread javinsnc
David Smiley (@MITRE.org) wrote
> On 3/10/14, 12:56 PM, "javinsnc" <

> javiersangrador@

> > wrote:
> This is indeed the source of the problem.
> 
> Why do you index with Lucene’s API and not Solr’s?  Solr not only has a
> web-service API but it also has the SolrJ API that can embed Solr —
> EmbeddedSolrServer.  I only recommend embedding Solr in limited
> circumstances as it’s more flexible and usually plenty fast to communicate
> with Solr normally, or to easily customize Solr to load data from a custom
> file and so the indexing is all in-process.  You would do the latter with
> either a custom DataImportHandler piece or a ContentStreamLoader
> subclass.

Because I started indexing with Lucene and, for now, it's impossible to change
to Solr (although I know the benefits). Maybe in the future.

So do you know how I should index the field in Lucene? I need to know the
exact type for this field. I think Lucene indexes fields as String by default,
right?

Thanks in advance!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-spatial-search-within-the-polygon-tp4101147p4122640.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr spatial search within the polygon

2014-03-10 Thread David Smiley (@MITRE.org)
You're going to have to use the Lucene-spatial module directly then.  There's
SpatialExample.java to get you started.
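
The core of it looks roughly like this (a sketch in the spirit of 
SpatialExample.java, against the Lucene 4.x spatial module; the field name and 
tree depth are illustrative and must match what you query with):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.spatial.SpatialStrategy;
import org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy;
import org.apache.lucene.spatial.prefix.tree.GeohashPrefixTree;
import com.spatial4j.core.context.SpatialContext;
import com.spatial4j.core.shape.Point;

// build the strategy once and reuse it for both indexing and querying
SpatialContext ctx = SpatialContext.GEO;
SpatialStrategy strategy =
    new RecursivePrefixTreeStrategy(new GeohashPrefixTree(ctx, 11), "LOCATION");

Document doc = new Document();
Point madrid = ctx.makePoint(-3.69278, 40.442179);  // x (lon), y (lat)
for (IndexableField f : strategy.createIndexableFields(madrid)) {
  doc.add(f);
}
// optionally store the raw value so it can be retrieved
doc.add(new StoredField(strategy.getFieldName(), "-3.69278 40.442179"));
// then indexWriter.addDocument(doc) as usual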


javinsnc wrote
> 
> David Smiley (@MITRE.org) wrote
>> On 3/10/14, 12:56 PM, "javinsnc" <

>> javiersangrador@

>> > wrote:
>> This is indeed the source of the problem.
>> 
>> Why do you index with Lucene’s API and not Solr’s?  Solr not only has a
>> web-service API but it also has the SolrJ API that can embed Solr —
>> EmbeddedSolrServer.  I only recommend embedding Solr in limited
>> circumstances as it’s more flexible and usually plenty fast to
>> communicate
>> with Solr normally, or to easily customize Solr to load data from a
>> custom
>> file and so the indexing is all in-process.  You would do the latter with
>> either a custom DataImportHandler piece or a ContentStreamLoader
>> subclass.
> Because I started indexing with Lucene and, for now, it's impossible to
> change to Solr (although I know the benefits). Maybe in the future.
> 
> So do you know how I should index the field in Lucene? I need to know the
> exact type for this field. I think Lucene indexes fields as String by
> default, right?
> 
> Thanks in advance!





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-spatial-search-within-the-polygon-tp4101147p4122641.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr spatial search within the polygon

2014-03-10 Thread javinsnc
Could you please tell me where I can find this .java file?

What do you mean by "Lucene-spatial module"?

Thanks for your time, David!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-spatial-search-within-the-polygon-tp4101147p4122642.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud with Tomcat

2014-03-10 Thread Furkan KAMACI
Hi;

If you have any other problems you can ask them too.

Thanks;
Furkan KAMACI


2014-03-10 16:17 GMT+02:00 Vineet Mishra :

> Hi
>
> Got it working!
>
> Many thanks for your response.
>
>
> On Sat, Mar 8, 2014 at 7:40 PM, Furkan KAMACI  >wrote:
>
> > Hi;
> >
> > Could you check here:
> >
> >
> http://lucene.472066.n3.nabble.com/Error-when-creating-collection-in-Solr-4-6-td4103536.html
> >
> > Thanks;
> > Furkan KAMACI
> >
> >
> > 2014-03-07 9:44 GMT+02:00 Vineet Mishra :
> >
> > > Hi
> > >
> > > I am installing SolrCloud with 3 External
> > > Zookeeper(localhost:2181,localhost:2182,localhost:2183) and 2
> > > Tomcats(localhost:8181,localhost:8182) all available on a single
> > > Machine(Just for getting started).
> > > By Following these links
> > >
> > > http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html
> > > http://wiki.apache.org/solr/SolrCloudTomcat
> > >
> > > I have got the Solr UI on the machine pointing to
> > >
> > > http://localhost:8181/solr/#/~cloud
> > >
> > > In the Cloud Graph View it is coming with
> > >
> > > mycollection
> > > |
> > > |_ shard1
> > > |_ shard2
> > >
> > > But both the shards are empty and showing no cores or replica.
> > >
> > > Following
> > >
> >
> > > http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html blog,
> > > I have been successful till starting tomcat,
> > > since after the section "Creating Collection, Shard(s), Replica(s) in
> > > SolrCloud" I am facing the problem.
> > >
> > > Giving command to create replica for the shard using
> > >
> > > curl 'http://localhost:8181/solr/admin/cores?action=CREATE&name=shard1-replica-2&collection=mycollection&shard=shard1'
> > >
> > > it is giving error
> > >
> > > <response>
> > >   <lst name="responseHeader"><int name="status">400</int><int name="QTime">137</int></lst>
> > >   <lst name="error">
> > >     <str name="msg">Error CREATEing SolrCore 'shard1-replica-2':
> > >     192.168.2.183:8182_solr_shard1-replica-2 is removed</str>
> > >     <int name="code">400</int>
> > >   </lst>
> > > </response>
> > >
> > > Has anybody gone through this issue?
> > >
> > > Regards
> > >
> >
>


Re: Solr spatial search within the polygon

2014-03-10 Thread David Smiley (@MITRE.org)
Lucene has multiple modules, one of which is "spatial".  You'll see it in the
source tree checkout underneath the lucene directory.
Javadocs: http://lucene.apache.org/core/4_7_0/spatial/index.html

SpatialExample.java:
https://github.com/apache/lucene-solr/blob/trunk/lucene/spatial/src/test/org/apache/lucene/spatial/SpatialExample.java
Note: I simply went to the project on GitHub and typed the file name into
the search box and it came right up.



-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-spatial-search-within-the-polygon-tp4101147p4122645.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr spatial search within the polygon

2014-03-10 Thread javinsnc
Ok David. I give it a shot.

Thanks again!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-spatial-search-within-the-polygon-tp4101147p4122647.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: [Clustering] Full-Index Offline cluster

2014-03-10 Thread Ahmet Arslan
Hi,

That's weird. As far as I know there is no such thing. There is classification 
stuff but I haven't heard of clustering.
http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html

Maybe others (Dawid Weiss) can clarify?

Ahmet



On Monday, March 10, 2014 4:24 PM, Alessandro Benedetti 
 wrote:
Thank you, Ahmet, I already know Mahout.
What I was curious about is whether an integration for offline clustering
already exists in Solr ...
Reading the wiki we can find this phrase: "While Solr contains an
extension for full-index clustering (*off-line* clustering) this
section will focus on discussing on-line clustering only."[1]
So I was wondering if any documentation exists for that :)
[1] https://cwiki.apache.org/confluence/display/solr/Result+Clustering


2014-03-10 14:15 GMT+00:00 Ahmet Arslan :

> Hi Alessandro,
>
> Generally Apache mahout http://mahout.apache.org is recommended for
> offline clustering.
>
> Ahmet
>
>
>
> On Monday, March 10, 2014 4:11 PM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
> Hi guys,
> I'm looking around to find out if it's possible to have a full-index
> /Offline cluster.
> My goal is to run a full-index clustering and, for each document, have a
> cluster field with the id/label of the cluster at indexing time.
> Anyone know more details regarding this kind of integration with Carrot2 ?
>
> I find only the classic query time clustering approach :
> https://cwiki.apache.org/confluence/display/solr/Result+Clustering
>
> Cheers
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England

>
>


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England



Re: [Clustering] Full-Index Offline cluster

2014-03-10 Thread Stanislaw Osinski
>
> That's weird. As far as I know there is no such thing. There is
> classification stuff but I haven't heard of clustering.
>
> http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html


I think the wording on the wiki page needs some clarification -- Solr
contains an internal API interface for full index clustering, but the
interface is not yet implemented, so the only clustering mode available out
of the box is currently search results clustering (based on the Carrot2
library).

Staszek


Re: [Clustering] Full-Index Offline cluster

2014-03-10 Thread Tommaso Teofili
Hi Ahmet, Ale,

right, there's a classification module for Lucene (and therefore usable in
Solr as well), but no clustering support there.

Regards,
Tommaso


2014-03-10 19:15 GMT+01:00 Ahmet Arslan :

> Hi,
>
> That's weird. As far as I know there is no such thing. There is
> classification stuff but I haven't heard of clustering.
>
> http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html
>
> Maybe others (Dawid Weiss) can clarify?
>
> Ahmet
>
>
>
> On Monday, March 10, 2014 4:24 PM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
> Thank you, Ahmet, I already know Mahout.
> What I was curious about is whether an integration for offline clustering
> already exists in Solr ...
> Reading the wiki we can find this phrase: "While Solr contains an
> extension for full-index clustering (*off-line* clustering) this
> section will focus on discussing on-line clustering only."[1]
> So I was wondering if any documentation exists for that :)
> [1] https://cwiki.apache.org/confluence/display/solr/Result+Clustering
>
>
> 2014-03-10 14:15 GMT+00:00 Ahmet Arslan :
>
> > Hi Alessandro,
> >
> > Generally Apache mahout http://mahout.apache.org is recommended for
> > offline clustering.
> >
> > Ahmet
> >
> >
> >
> > On Monday, March 10, 2014 4:11 PM, Alessandro Benedetti <
> > benedetti.ale...@gmail.com> wrote:
> > Hi guys,
> > I'm looking around to find out if it's possible to have a full-index
> > /Offline cluster.
> > My goal is to run a full-index clustering and, for each document, have a
> > cluster field with the id/label of the cluster at indexing time.
> > Anyone know more details regarding this kind of integration with Carrot2
> ?
> >
> > I find only the classic query time clustering approach :
> > https://cwiki.apache.org/confluence/display/solr/Result+Clustering
> >
> > Cheers
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
>
> >
> >
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
>


Re: [Clustering] Full-Index Offline cluster

2014-03-10 Thread Ahmet Arslan


Hi Staszek, Tommaso,

Thanks for the clarification.

Ahmet

On Monday, March 10, 2014 8:23 PM, Tommaso Teofili  
wrote:
Hi Ahmet, Ale,

right, there's a classification module for Lucene (and therefore usable in
Solr as well), but no clustering support there.

Regards,
Tommaso



2014-03-10 19:15 GMT+01:00 Ahmet Arslan :

> Hi,
>
> That's weird. As far as I know there is no such thing. There is
> classification stuff but I haven't heard of clustering.
>
> http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html
>
> Maybe others (Dawid Weiss) can clarify?
>
> Ahmet
>
>
>
> On Monday, March 10, 2014 4:24 PM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
> Thank you, Ahmet, I already know Mahout.
> What I was curious about is whether an integration for offline clustering
> already exists in Solr ...
> Reading the wiki we can find this phrase: "While Solr contains an
> extension for full-index clustering (*off-line* clustering) this
> section will focus on discussing on-line clustering only."[1]
> So I was wondering if any documentation exists for that :)
> [1] https://cwiki.apache.org/confluence/display/solr/Result+Clustering
>
>
> 2014-03-10 14:15 GMT+00:00 Ahmet Arslan :
>
> > Hi Alessandro,
> >
> > Generally Apache mahout http://mahout.apache.org is recommended for
> > offline clustering.
> >
> > Ahmet
> >
> >
> >
> > On Monday, March 10, 2014 4:11 PM, Alessandro Benedetti <
> > benedetti.ale...@gmail.com> wrote:
> > Hi guys,
> > I'm looking around to find out if it's possible to have a full-index
> > /Offline cluster.
> > My goal is to run a full-index clustering and, for each document, have a
> > cluster field with the id/label of the cluster at indexing time.
> > Anyone know more details regarding this kind of integration with Carrot2
> ?
> >
> > I find only the classic query time clustering approach :
> > https://cwiki.apache.org/confluence/display/solr/Result+Clustering
> >
> > Cheers
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
>
> >
> >
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
>



Re: Date Range Query taking more time.

2014-03-10 Thread Vijay Kokatnur
Maybe I spoke too soon.

The second and third filter parameters, fq={!cache=false cost=50}ClientID:4 and
fq={!cache=false cost=150}StartDate:[NOW/DAY TO NOW/DAY+1YEAR] above, are
not getting executed unless I make one of them the first parameter.  And when
it's the first filter parameter, the QTime goes up to 250ms from 2ms!!

Something I have noticed - Solr always respects only the first "q" and "fq"
parameters.  The rest of the parameters are not applied at all.





On Thu, Mar 6, 2014 at 11:55 AM, Vijay Kokatnur wrote:

> That did the trick Ahmet.  The first response was around 200ms, but the
> subsequent queries were around 2-5ms.
>
> I tried this
>
> &q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
> &fq={!cache=false cost=100}Status:Booked
> &fq={!cache=false cost=50}ClientID:4
> &fq={!cache=false cost=50}[NOW/DAY TO NOW/DAY+1YEAR]
>
>
>
> On Thu, Mar 6, 2014 at 11:49 AM, Ahmet Arslan  wrote:
>
>> Hi,
>>
>> Did you try with non-cached filter quries before?
>> cached Filter queries are useful when they are re-used. How often do you
>> commit?
>>
>> I thought that we can do something if we disable cache filter queries and
>> manipulate their execution order with cost parameter.
>>
>> What happens with this :
>> &q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
>> &fq={!cache=false cost=100}Status:Booked
>> &fq={!cache=false cost=50}ClientID:4
>>
>> &fq={!cache=false cost=150}StartDate:[NOW/DAY TO NOW/DAY+1YEAR]
>>
>>
>>
>> On Thursday, March 6, 2014 9:15 PM, Vijay Kokatnur <
>> kokatnur.vi...@gmail.com> wrote:
>> Ahmet, I have tried filter queries before to fine tune query performance.
>>
>> However, whenever we use filter queries the response time goes up and
>> remains there.  With above change, the response time was consistently
>> around 4-5 secs.  We are using the default cache settings.
>>
>> Is there any settings I missed?
>>
>>
>>
>> On Thu, Mar 6, 2014 at 10:44 AM, Ahmet Arslan  wrote:
>>
>> > Hi,
>> >
>> > Since your range query has NOW in it, it won't be cached meaningfully.
>> > http://solr.pl/en/2012/03/05/use-of-cachefalse-and-cost-parameters/
>> >
>> > This is untested but can you try this?
>> >
>> > &q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
>> > &fq=Status:Booked
>> > &fq=ClientID:4
>> > &fq={!cache=false cost=150}StartDate:[NOW/DAY TO NOW/DAY+1YEAR]
>> >
>> >
>> >
>> >
>> > On Thursday, March 6, 2014 8:29 PM, Vijay Kokatnur <
>> > kokatnur.vi...@gmail.com> wrote:
>> > I am working with date range query that is not giving me faster response
>> > times.  After modifying date range construct after reading several
>> forums,
>> > response time now is around 200ms, down from 2-3secs.
>> >
>> > However, I was wondering if there still some way to improve upon it as
>> > queries without date range have around 2-10ms latency,
>> >
>> > Query : To look up upcoming booked trips for a user whenever he logs in
>> to
>> > the app-
>> >
>> > q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336 AND Status:Booked
>> > ANDClientID:4 AND  StartDate:[NOW/DAY TO NOW/DAY+1YEAR]
>> >
>> > Date configuration in Schema :
>> >
>> > 
>> > > > positionIncrementGap="0"/>
>> >
>> > Appreciate any inputs.
>> >
>> > Thanks!
>> >
>> >
>>
>>
>


Re: Partial Counts in SOLR

2014-03-10 Thread Dmitry Kan
Salman,

It looks like what you describe has been implemented at Twitter.

Presentation from the recent Lucene / Solr Revolution conference in Dublin:
http://www.youtube.com/watch?v=AguWva8P_DI


On Sat, Mar 8, 2014 at 4:16 PM, Salman Akram <
salman.ak...@northbaysolutions.net> wrote:

> The issue with timeAllowed is you never know if it will return a minimum
> amount of docs or not.
>
> I do want docs to be sorted by date, but it seems it's not possible for Solr
> to start searching from recent docs and stop after finding a certain
> no. of docs... any other tweak?
>
> Thanks
>
>
> On Saturday, March 8, 2014, Chris Hostetter 
> wrote:
>
> >
> > : Reason: In an index with millions of documents I don't want to know
> that
> > a
> > : certain query matched 1 million docs (of course it will take time to
> > : calculate that). Why don't just stop looking for more results lets say
> > : after it finds 100 docs? Possible??
> >
> > but if you care about sorting, ie: you want the top 100 documents sorted
> > by score, or sorted by date, you still have to "collect" all 1 million
> > matches in order to know what the first 100 are.
> >
> > if you really don't care about sorting, you can use the "timeAllowed"
> > option to tell the searching method to do the best job it can in an
> > (approximated) limited amount of time, and then pretend that the docs
> > collected so far represent the total number of matches...
> >
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThetimeAllowedParameter
> >
> >
> > -Hoss
> > http://www.lucidworks.com/
> >
>
>
> --
> Regards,
>
> Salman Akram
> Project Manager - Intelligize
> NorthBay Solutions
> 410-G4 Johar Town, Lahore
> Off: +92-42-35290152
>
> Cell: +92-302-8495621
>



-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan


Issue with spatial search

2014-03-10 Thread Steven Bower
I am seeing a "error" when doing a spatial search where a particular point
is showing up within a polygon, but by all methods I've tried that point is
not within the polygon..

First the point is: 41.2299,29.1345 (lat/lon)

The polygon is:

31.2719,32.283
31.2179,32.3681
31.1333,32.3407
30.9356,32.6318
31.0707,34.5196
35.2053,36.9415
37.2959,36.6339
40.8334,30.4273
41.1622,29.1421
41.6484,27.4832
47.0255,13.6342
43.9457,3.17525
37.0029,-5.7017
35.7741,-5.57719
34.801,-4.66201
33.345,10.0157
29.6745,18.9366
30.6592,29.1683
31.2719,32.283

The geo field we are using has this config:

<fieldType class="solr.SpatialRecursivePrefixTreeFieldType"
   distErrPct="0.025"
   maxDistErr="0.09"
   spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
   units="degrees"/>
The config is basically the same as the one from the docs...

The query I am issuing is this:

location:"Intersects(POLYGON((32.283 31.2719, 32.3681 31.2179, 32.3407
31.1333, 32.6318 30.9356, 34.5196 31.0707, 36.9415 35.2053, 36.6339
37.2959, 30.4273 40.8334, 29.1421 41.1622, 27.4832 41.6484, 13.6342
47.0255, 3.17525 43.9457, -5.7017 37.0029, -5.57719 35.7741, -4.66201
34.801, 10.0157 33.345, 18.9366 29.6745, 29.1683 30.6592, 32.283 31.2719)))"

and it brings back a result where the "location" field is 41.2299,29.1345

I've attached a KML with the polygon and the point and you can see from
that, visually, the point is not within the polygon. I also tried the
Google Maps API, but after playing around realized that the polygons in Maps
are drawn in Euclidean space while the map itself is a Mercator projection.
Loading the KML in Earth fixes this issue but the point still lies outside
the polygon. The distance between the edge of the polygon closest to the
point and the point itself is ~1.2 miles, which is much larger than the
1-meter accuracy given by the maxDistErr (per the docs).

Any thoughts on this?

Thanks,

Steve


solr_map_issue.kml
Description: application/vnd.google-earth.kml


Re: Issue with spatial search

2014-03-10 Thread Steven Bower
Minor edit to the KML to adjust color of polygon


On Mon, Mar 10, 2014 at 4:21 PM, Steven Bower  wrote:

> I am seeing a "error" when doing a spatial search where a particular point
> is showing up within a polygon, but by all methods I've tried that point is
> not within the polygon..
>
> First the point is: 41.2299,29.1345 (lat/lon)
>
> The polygon is:
>
> 31.2719,32.283
> 31.2179,32.3681
> 31.1333,32.3407
> 30.9356,32.6318
> 31.0707,34.5196
> 35.2053,36.9415
> 37.2959,36.6339
> 40.8334,30.4273
> 41.1622,29.1421
> 41.6484,27.4832
> 47.0255,13.6342
> 43.9457,3.17525
> 37.0029,-5.7017
> 35.7741,-5.57719
> 34.801,-4.66201
> 33.345,10.0157
> 29.6745,18.9366
> 30.6592,29.1683
> 31.2719,32.283
>
> The geo field we are using has this config:
>
> <fieldType class="solr.SpatialRecursivePrefixTreeFieldType"
>    distErrPct="0.025"
>    maxDistErr="0.09"
>    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
>    units="degrees"/>
>
> The config is basically the same as the one from the docs...
>
> The query I am issuing is this:
>
> location:"Intersects(POLYGON((32.283 31.2719, 32.3681 31.2179, 32.3407
> 31.1333, 32.6318 30.9356, 34.5196 31.0707, 36.9415 35.2053, 36.6339
> 37.2959, 30.4273 40.8334, 29.1421 41.1622, 27.4832 41.6484, 13.6342
> 47.0255, 3.17525 43.9457, -5.7017 37.0029, -5.57719 35.7741, -4.66201
> 34.801, 10.0157 33.345, 18.9366 29.6745, 29.1683 30.6592, 32.283 31.2719)))"
>
> and it brings back a result where the "location" field is 41.2299,29.1345
>
> I've attached a KML with the polygon and the point and you can see from
> that, visually, the point is not within the polygon. I also tried the
> Google Maps API, but after playing around realized that the polygons in Maps
> are drawn in Euclidean space while the map itself is a Mercator projection.
> Loading the KML in Earth fixes this issue but the point still lies outside
> the polygon. The distance between the edge of the polygon closest to the
> point and the point itself is ~1.2 miles, which is much larger than the
> 1-meter accuracy given by the maxDistErr (per the docs).
>
> Any thoughts on this?
>
> Thanks,
>
> Steve
>


solr_map_issue.kml
Description: application/vnd.google-earth.kml


Multiple "fq" parameters are not executed

2014-03-10 Thread Vijay Kokatnur
<..Spawning this as a separate thread..>

So I have a filter query with multiple "fq" parameters.  However, I have
noticed that only the first "fq" is used for filtering.  For instance, a
lookup with

...&fq=ClientID:2
&fq=HotelID:234-PPP
&fq={!cache=false}StartDate:[NOW/DAY TO *]

In the above query, results are filtered only by ClientID and not by
HotelID and StartDate.  The same thing happens with "q" query.  Does anyone
know why?


Re: Issue with spatial search

2014-03-10 Thread Steven Bower
Weirdly that same point shows up in the polygon below as well, which in the
area around the point doesn't intersect with the polygon in my first msg...

29.0454,41.2198
29.2349,41.1826
31.1107,40.9956
38.437,40.7991
41.1616,40.8988
42.1284,42.2141
40.0919,47.8482
30.4169,47.5783
26.9892,43.6459
27.2095,41.5676
29.0454,41.2198



On Mon, Mar 10, 2014 at 4:23 PM, Steven Bower  wrote:

> Minor edit to the KML to adjust color of polygon
>
>
> On Mon, Mar 10, 2014 at 4:21 PM, Steven Bower wrote:
>
>> I am seeing a "error" when doing a spatial search where a particular
>> point is showing up within a polygon, but by all methods I've tried that
>> point is not within the polygon..
>>
>> First the point is: 41.2299,29.1345 (lat/lon)
>>
>> The polygon is:
>>
>> 31.2719,32.283
>> 31.2179,32.3681
>> 31.1333,32.3407
>> 30.9356,32.6318
>> 31.0707,34.5196
>> 35.2053,36.9415
>> 37.2959,36.6339
>> 40.8334,30.4273
>> 41.1622,29.1421
>> 41.6484,27.4832
>> 47.0255,13.6342
>> 43.9457,3.17525
>> 37.0029,-5.7017
>> 35.7741,-5.57719
>> 34.801,-4.66201
>> 33.345,10.0157
>> 29.6745,18.9366
>> 30.6592,29.1683
>> 31.2719,32.283
>>
>> The geo field we are using has this config:
>>
>> <fieldType class="solr.SpatialRecursivePrefixTreeFieldType"
>>    distErrPct="0.025"
>>    maxDistErr="0.09"
>>    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
>>    units="degrees"/>
>>
>> The config is basically the same as the one from the docs...
>>
>> The query I am issuing is this:
>>
>> location:"Intersects(POLYGON((32.283 31.2719, 32.3681 31.2179, 32.3407
>> 31.1333, 32.6318 30.9356, 34.5196 31.0707, 36.9415 35.2053, 36.6339
>> 37.2959, 30.4273 40.8334, 29.1421 41.1622, 27.4832 41.6484, 13.6342
>> 47.0255, 3.17525 43.9457, -5.7017 37.0029, -5.57719 35.7741, -4.66201
>> 34.801, 10.0157 33.345, 18.9366 29.6745, 29.1683 30.6592, 32.283 31.2719)))"
>>
>> and it brings back a result where the "location" field is 41.2299,29.1345
>>
>> I've attached a KML with the polygon and the point and you can see from
>> that, visually, the point is not within the polygon. I also tried the
>> Google Maps API, but after playing around realized that the polygons in Maps
>> are drawn in Euclidean space while the map itself is a Mercator projection.
>> Loading the KML in Earth fixes this issue but the point still lies outside
>> the polygon. The distance between the edge of the polygon closest to the
>> point and the point itself is ~1.2 miles, which is much larger than the
>> 1-meter accuracy given by the maxDistErr (per the docs).
>>
>> Any thoughts on this?
>>
>> Thanks,
>>
>> Steve
>>
>
>


Re: Multiple "fq" parameters are not executed

2014-03-10 Thread Jack Krupansky
What are some example values of the HotelID and StateDate fields that are 
not getting filtered out?


Multiple fq queries will be ANDed.

-- Jack Krupansky

-Original Message- 
From: Vijay Kokatnur

Sent: Monday, March 10, 2014 4:51 PM
To: solr-user
Subject: Multiple "fq" parameters are not executed

<..Spawning this as a separate thread..>

So I have a filter query with multiple "fq" parameters.  However, I have
noticed that only the first "fq" is used for filtering.  For instance, a
lookup with

...&fq=ClientID:2
&fq=HotelID:234-PPP
&fq={!cache=false}StartDate:[NOW/DAY TO *]

In the above query, results are filtered only by ClientID and not by
HotelID and StartDate.  The same thing happens with "q" query.  Does anyone
know why? 



Re: Multiple "fq" parameters are not executed

2014-03-10 Thread Yonik Seeley
Solr has extensive filtering tests.
The first step would be to double check that you see what you think
you are seeing, and then try and create an example to reproduce it.

For example, this works fine with the "example" data, and is of the
same form as your query:
http://localhost:8983/solr/query
?q=*:*
&fq=inStock:true
&fq={!cache=false}text:solr

-Yonik
http://heliosearch.org - solve Solr GC pauses with off-heap filters
and fieldcache


On Mon, Mar 10, 2014 at 4:51 PM, Vijay Kokatnur
 wrote:
> <..Spawning this as a separate thread..>
>
> So I have a filter query with multiple "fq" parameters.  However, I have
> noticed that only the first "fq" is used for filtering.  For instance, a
> lookup with
>
> ...&fq=ClientID:2
> &fq=HotelID:234-PPP
> &fq={!cache=false}StartDate:[NOW/DAY TO *]
>
> In the above query, results are filtered only by ClientID and not by
> HotelID and StartDate.  The same thing happens with "q" query.  Does anyone
> know why?


Re: Updated to v4.7 - Getting "Search requests cannot accept content streams"

2014-03-10 Thread Shawn Heisey
On 3/10/2014 6:14 AM, leevduhl wrote:
> We just upgraded our dev environment from Solr 4.6 to 4.7 and our search
> "posts" are now returning a "Search requests cannot accept content streams"
> error.  We did not install over top of our 4.6 install, we installed into a
> new folder.
> 
> org.apache.solr.common.SolrException: Search requests cannot accept content
> streams

This message was added by SOLR-5517.  This is specifically thrown when
Solr detects that the request includes one or more content streams,
which is only supported for updates, not queries.

https://issues.apache.org/jira/browse/SOLR-5517

You may need to include a Content-Type header in your HTTP request.  If
you are already doing this, then it may be invalid, or there may be some
other problem with your request.  To figure out what it is, we'll need
more information about how your requests are constructed.

Thanks,
Shawn



Luke 4.7.0 released

2014-03-10 Thread Dmitry Kan
Hello!

Luke 4.7.0 has been released. Download it here:

https://github.com/DmitryKey/luke/releases/tag/4.7.0

Release based on pull request of Petri Kivikangas (
https://github.com/DmitryKey/luke/pull/2) Kiitos, Petri!

Tested against the solr-4.7.0 index.

1. Upgraded maven plugins.
2. Added simple Windows launch script: In Windows, Luke can now be launched
easily by executing luke.bat. Script sets MaxPermSize to 512m because Luke
was found to crash on lower settings.

Best regards,

Dmitry Kan

-- 
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan


Re: Updated to v4.7 - Getting "Search requests cannot accept content streams"

2014-03-10 Thread Yonik Seeley
I get a different error (but related to the same issue I guess) with
the following simple query:

/opt/code/heliosearch/solr$ curl -XPOST
"http://localhost:8983/solr/select?q=*:*";



<response>
  <lst name="error">
    <str name="msg">Must specify a Content-Type header with POST requests</str>
    <int name="code">415</int>
  </lst>
</response>



HTTP does not require a POST body, so it seems like the checking was a
bit too strict (i.e. if there is no post body, there should be no
requirement for a content-type).
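
As a workaround until that's relaxed, sending an explicit header (or putting 
the parameters in the body) satisfies the check:

curl -XPOST -H "Content-Type: application/x-www-form-urlencoded" \
     --data "q=*:*" "http://localhost:8983/solr/select"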

-Yonik
http://heliosearch.org - solve Solr GC pauses with off-heap filters
and fieldcache


On Mon, Mar 10, 2014 at 5:08 PM, Shawn Heisey  wrote:
> On 3/10/2014 6:14 AM, leevduhl wrote:
>> We just upgraded our dev environment from Solr 4.6 to 4.7 and our search
>> "posts" are now returning a "Search requests cannot accept content streams"
>> error.  We did not install over top of our 4.6 install, we installed into a
>> new folder.
>>
>> org.apache.solr.common.SolrException: Search requests cannot accept content
>> streams
>
> This message was added by SOLR-5517.  This is specifically thrown when
> Solr detects that the request includes one or more content streams,
> which is only supported for updates, not queries.
>
> https://issues.apache.org/jira/browse/SOLR-5517
>
> You may need to include a Content-Type header in your HTTP request.  If
> you are already doing this, then it may be invalid, or there may be some
> other problem with your request.  To figure out what it is, we'll need
> more information about how your requests are constructed.
>
> Thanks,
> Shawn
>


Re: Date Range Query taking more time.

2014-03-10 Thread Vijay Kokatnur
Thanks Erick.  The links you provided are invaluable.

Here are our commit settings.  Since we have NRT search, softCommit is set
to 1000 ms, which explains why the cache is constantly invalidated.

 
 <autoCommit>
   <maxTime>60</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

 <autoSoftCommit>
   <maxTime>1000</maxTime>
 </autoSoftCommit>


With constant cache invalidation it becomes almost impossible to get better
response times.  Is the only way to solve this to fine-tune the softCommit
settings?
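
If the NRT requirements allow it, lengthening the soft commit interval (and 
relying on autowarming) is the usual first lever -- a sketch with an 
illustrative 60-second value:

 <autoSoftCommit>
   <maxTime>60000</maxTime>
 </autoSoftCommit>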



On Fri, Mar 7, 2014 at 6:17 PM, Erick Erickson wrote:

> OK, something is not right here. What are
> your autocommit settings? What you pasted
> above looks like you're looking at a searcher that
> has _just_ opened, which would mean either
> 1> you just had a hard commit with openSearcher=false happen
> or
> 2> you just had a soft commit happen
>
> In either case, the cache is thrown out. That said, if you
> have autowarming for the cache set up you should be
> seeing some hits eventually.
>
> The top part is the _current_ searcher. The cumulative_*
> is all the cache results since the application started.
>
> A couple of blogs:
>
>
> http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/
>
> I'm going to guess that you have soft commits or hard commits
> with openSearcher=true set to a very short interval and are
> having your filter caches invalidated very frequently, and that is
> misleading you, but that's just a guess.
>
> Best,
> Erick
>
>
>
> On Thu, Mar 6, 2014 at 9:32 PM, Vijay Kokatnur 
> wrote:
> > My initial approach was to use the filter cache for static fields.  However when
> > filter query is used, every query after the first has the same response
> > time as the first.  For instance, when cache is enabled in the query
> under
> > review, response time shoots up to 4-5secs and stays there.
> >
> > We are using default filter cache settings provided with 4.5.0
> > distribution.
> >
> > Current Filter Cache stats :
> >
> > lookups:0
> > hits:0
> > hitratio:0
> > inserts:0
> > evictions:0
> > size:0
> > warmupTime:0
> > cumulative_lookups:17135
> > cumulative_hits:2465
> > cumulative_hitratio:0.14
> > cumulative_inserts:14670
> > cumulative_evictions:0
> >
> > I did not find what the cumulative_* fields mean here,
> > but it looks like nothing is being cached with fq, as the hit ratio is 0.
> >
> > Any idea what's happening?
> >
> >
> >
> > On Thu, Mar 6, 2014 at 2:41 PM, Ahmet Arslan  wrote:
> >
> >> Hoss,
> >>
> >> Thanks for the correction. I missed the /DAY part and thought it was
> >>  StartDate:[NOW TO NOW+1YEAR]
> >>
> >> Ahmet
> >>
> >>
> >> On Friday, March 7, 2014 12:33 AM, Chris Hostetter <
> >> hossman_luc...@fucit.org> wrote:
> >>
> >> : That did the trick Ahmet.  The first response was around 200ms, but
> the
> >> : subsequent queries were around 2-5ms.
> >>
> >> Are you really sure you want "cache=false" on all of those filters?
> >>
> >> While the "ClientID:4" query may be something that changes significantly
> >> enough in every query to not be useful to cache, I suspect you'd find a
> >> lot of value in going ahead and caching those Status:Booked and
> >> StartDate:[NOW/DAY TO NOW/DAY+1YEAR] clauses ... the first query to hit
> >> them might be "slower" but every query after that should be fairly fast --
> >> and if you really need them to *always* be fast, configure them as static
> >> newSearcher warming queries (or make sure you have autowarming on).
> >>
> >> It also looks like you forgot the "StartDate:" part of your range query
> >> in your last test...
> >>
> >> : &fq={!cache=false cost=50}[NOW/DAY TO NOW/DAY+1YEAR]
> >>
> >> And one final comment just to make sure it doesn't slip through the
> >> cracks
> >>
> >>
> >> : > > Since your range query has NOW in it, it won't be cached
> >> meaningfully.
> >>
> >> this is not applicable.  the use of "NOW" in a range query doesn't mean
> >> that it can't be cached -- the problem is anytime you use really precise
> >> dates (or numeric values) that *change* in every query.
> >>
> >> if your range query uses "NOW" as a lower/upper end point, then it falls
> >> into that "really precise dates" situation -- but for this user, who is
> >> specifically rounding his dates to the nearest day, that advice isn't
> >> really applicable -- the date range queries can be cached & reused for an
> >> entire day.
> >>
> >>
> >>
> >> -Hoss
> >> http://www.lucidworks.com/
> >>
> >>
>


Re: Issue with spatial search

2014-03-10 Thread Smiley, David W.
Hi Steven,

Set distErrPct to 0 in order to get non-point shapes to always be as accurate 
as maxDistErr.  Point shapes are always that accurate.  As long as you only 
index points, not other shapes (you don’t index polygons, etc.) then distErrPct 
of 0 should be fine.  In fact, perhaps a future Solr version should simply use 
0 as the default; the last time I did benchmarks it was pretty marginal impact 
of higher distErrPct.
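
(For point-only data, a sketch of that change, with the other attribute
values carried over from the config quoted below:)

<fieldType name="location" class="solr.SpatialRecursivePrefixTreeFieldType"
   distErrPct="0"
   maxDistErr="0.09"
   spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
   units="degrees"/>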

It’s a fairly different story if you are indexing non-point shapes.

~ David

From: Steven Bower <smb-apa...@alcyon.net>
Reply-To: "solr-user@lucene.apache.org"
Date: Monday, March 10, 2014 at 4:23 PM
To: "solr-user@lucene.apache.org"
Subject: Re: Issue with spatial search

Minor edit to the KML to adjust color of polygon


On Mon, Mar 10, 2014 at 4:21 PM, Steven Bower <smb-apa...@alcyon.net> wrote:
I am seeing an "error" when doing a spatial search where a particular point is
showing up within a polygon, but by all methods I've tried that point is not
within the polygon.

First the point is: 41.2299,29.1345 (lat/lon)

The polygon is:

31.2719,32.283
31.2179,32.3681
31.1333,32.3407
30.9356,32.6318
31.0707,34.5196
35.2053,36.9415
37.2959,36.6339
40.8334,30.4273
41.1622,29.1421
41.6484,27.4832
47.0255,13.6342
43.9457,3.17525
37.0029,-5.7017
35.7741,-5.57719
34.801,-4.66201
33.345,10.0157
29.6745,18.9366
30.6592,29.1683
31.2719,32.283

The geo field we are using has this config:

<fieldType name="location" class="solr.SpatialRecursivePrefixTreeFieldType"
   distErrPct="0.025"
   maxDistErr="0.09"
   spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
   units="degrees"/>

The config is basically the same as the one from the docs...

The query I am issuing is this:

location:"Intersects(POLYGON((32.283 31.2719, 32.3681 31.2179, 32.3407 31.1333, 
32.6318 30.9356, 34.5196 31.0707, 36.9415 35.2053, 36.6339 37.2959, 30.4273 
40.8334, 29.1421 41.1622, 27.4832 41.6484, 13.6342 47.0255, 3.17525 43.9457, 
-5.7017 37.0029, -5.57719 35.7741, -4.66201 34.801, 10.0157 33.345, 18.9366 
29.6745, 29.1683 30.6592, 32.283 31.2719)))"

and it brings back a result where the "location" field is 41.2299,29.1345

I've attached a KML with the polygon and the point and you can see from that,
visually, that the point is not within the polygon. I also tried the Google Maps
API, but after playing around realized that the polygons in Maps are drawn in
Euclidean space while the map itself is a Mercator projection. Loading the KML
in Earth fixes this issue, but the point still lies outside the polygon. The
distance between the edge of the polygon closest to the point and the point
itself is ~1.2 miles, which is much larger than the 1-meter accuracy given by the
maxDistErr (per the docs).

Any thoughts on this?

Thanks,

Steve



Re: Date Range Query taking more time.

2014-03-10 Thread Vijay Kokatnur
Pardon my typo.  I meant 1000ms in my last mail.

Thanks,
-Vijay


On Mon, Mar 10, 2014 at 4:22 PM, Vijay Kokatnur wrote:

> Thanks Erick.  The links you provided are invaluable.
>
> Here are our commit settings.  Since we have NRT search, softCommit is set
> to 1000s which explains why cache is constantly invalidated.
>
>  <autoCommit>
>    <maxTime>60</maxTime>
>    <openSearcher>false</openSearcher>
>  </autoCommit>
>
>  <autoSoftCommit>
>    <maxTime>1000</maxTime>
>  </autoSoftCommit>
>
>
> With constant cache invalidation it becomes almost impossible to get
> better response times.  Is the only way to solve this to fine-tune the
> softCommit settings?
>


Re: How to apply Semantic Search in Solr

2014-03-10 Thread Sujit Pal
Hi Sohan,

You would be the best person to answer your question of how to proceed :-).
From your original query term "musical events in New York" rewriting to
"musical nights at ABC place" OR "concerts events" OR "classical music
event" you would have to build into your knowledge base that "ABC place" is
a synonym for "New York", and that "musical event at New York" is a synonym
for "concerts events" and "classical music event". You can do this using
approach #1 (from the Berryman blog post) and approach #2 (my first
suggestion), but these results are not guaranteed - because your corpus may
not contain this relationship. Approach #3 (my second suggestion) involves
lots of work and possibly domain knowledge, but much cleaner relationships.
OTOH, you could get by for this one query by adding the three queries
to your synonyms.txt and enabling synonym support in Solr (as sketched below).

http://stackoverflow.com/questions/18790256/solr-synonym-not-working
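
As a sketch, that one mapping could be a single synonyms.txt line, wired in
through the stock SynonymFilterFactory (the mapping is just this thread's
example, not a general solution):

# synonyms.txt -- illustrative only
musical events in new york => musical nights at abc place, concerts events, classical music event

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>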

So how much effort you put into supporting this feature would be dictated
by how important it is to your environment - that is a question only you
can answer.

-sujit



On Sun, Mar 9, 2014 at 11:26 PM, Sohan Kalsariya
wrote:

> Thanks Sujit and all for your views about semantic search in Solr.
> But how do I proceed, I mean how do I start things off to get
> on track?
>
>
>
> On Sat, Mar 8, 2014 at 10:50 PM, Sujit Pal  wrote:
>
> > Thanks for sharing this link Sohan, it's an interesting approach. Since
> you
> > have effectively defined what you mean by Semantic Search, there are
> couple
> > other approaches I know of to do something like this:
> > 1) preprocess your documents looking for terms that co-occur in the same
> > document. The more such co-occurrences you find, the more strongly these
> > terms are related (can help with ordering related terms from most related
> > to least related). At query time expand the query to include /most/
> related
> > concepts and search.
> > 2) use an external knowledgebase such as a taxonomy that indicates
> > relationships between concepts (this is the approach we use). At query
> time
> > expand the query to include related concepts and search.
> >
> > -sujit
> >
> > On Sat, Mar 8, 2014 at 8:21 AM, Sohan Kalsariya <
> sohankalsar...@gmail.com
> > >wrote:
> >
> > > Basically, when I searched it on Google I got this result:
> > >
> > >
> > >
> >
> http://www.opensourceconnections.com/2013/08/25/semantic-search-with-solr-and-python-numpy/
> > >
> > > And I am working on this.
> > >
> > > So is this useful ?
> > >
> > >
> > > On Sat, Mar 8, 2014 at 3:11 PM, Alexandre Rafalovitch <
> > arafa...@gmail.com
> > > >wrote:
> > >
> > > > And how would it know to give you those results? Obviously, you have
> > > > some sort of magic/algorithm in your mind. Are you doing geographic
> > > > location match, category match, synonyms match?
> > > >
> > > > We can't really help with generic questions. You still need to figure
> > > > out what "semantic" means for you specifically.
> > > >
> > > > Regards,
> > > >Alex.
> > > > Personal website: http://www.outerthoughts.com/
> > > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > > > - Time is the quality of nature that keeps events from happening all
> > > > at once. Lately, it doesn't seem to be working.  (Anonymous  - via
> GTD
> > > > book)
> > > >
> > > >
> > > > On Sat, Mar 8, 2014 at 4:27 PM, Sohan Kalsariya
> > > >  wrote:
> > > > > Hello,
> > > > >
> > > > > I am working on an event listing and promotions website(
> > > > > http://allevents.in) and I want to apply semantic search on solr.
> > > > > For example, if someone search :
> > > > >
> > > > > "Musical Events in New York"
> > > > > So it would give me results such as :
> > > > >
> > > > >  * Musical Night at ABC place
> > > > >  * Concerts Events
> > > > >  * Classical Music Event
> > > > > I mean all results should be semantically related to the Search_Query; it
> > > > > should not give results based only on "tf-idf". So can you please make me
> > > > > understand how do i proceed to apply Semantic Search in Solr. (
> > > > allevents.in)
> > > > >
> > > > > --
> > > > > Regards,
> > > > > *Sohan Kalsariya*
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > > *Sohan Kalsariya*
> > >
> >
>
>
>
> --
> Regards,
> *Sohan Kalsariya*
>


Re: The way Autocommit works in solr - Wierd

2014-03-10 Thread Erick Erickson
You have to separate out a couple of things.

First, data gets written to segments _without_
the segment getting closed and _before_ you
commit. What happens is that when
ramBufferSizeMB in solrconfig.xml is exceeded,
its contents are flushed to the currently-opened
segment. The segment is _not_ closed, so
the fact that you see index size varying on disk
apart from commits is not surprising at all.
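
For reference, the knob in question lives in the <indexConfig> section of
solrconfig.xml (100 is Solr's default; the value here is only illustrative):

<indexConfig>
  <!-- flush in-memory index data to the current segment once the buffer fills -->
  <ramBufferSizeMB>100</ramBufferSizeMB>
</indexConfig>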

Second, segments are merged. During merging,
a complete copy is made of the segments being
merged. Only when the merge is successful are
the old segments deleted (well, and the searcher
holding them open is closed). This happens as
a background process.

All of which is a way of saying I think you're
being puzzled by the extra stuff going on and
can safely ignore it all...

Best,
Erick

On Mon, Mar 10, 2014 at 9:57 AM, Furkan KAMACI  wrote:
> Hi;
>
> Did you read here:
> http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Thanks;
> Furkan KAMACI
>
>
> 2014-03-10 15:14 GMT+02:00 RadhaJayalakshmi:
>
>> Hi,
>>
>> Brief Description of my application:
>> We have a Java program which reads a flat file and adds documents to Solr
>> using CloudSolrServer.
>> And we index in batches of 1000 documents (bulk indexing).
>>
>> And the Autocommit setting of my application is:
>> <autoCommit>
>>   <maxDocs>100000</maxDocs>
>>   <openSearcher>false</openSearcher>
>> </autoCommit>
>>
>> So after every 100,000 documents are indexed, the engine should perform a
>> hard commit (autocommit), but openSearcher will still be false.
>> Once the file is fully read, we are issuing a commit() from the
>> CloudSolrServer class. So this by default opens a new Searcher.
>>
>> Also, from the log, I can see that autocommit happens three times,
>> and only with the last/final autocommit is openSearcher set to true.
>>
>> So, till now all looks fine and working as expected.
>>
>> But there is one strange issue I observed during the course of indexing.
>> Now, as per the documentation, the data being indexed should first
>> get written into the tlog. When the autocommit is performed, the data will be
>> flushed to disk.
>> So the size of the /index folder should have changed only three times; the
>> rest of the time only the size of the /tlog folder should have been changing.
>>
>> But what actually happened is that, all the time, I see the size of the /index
>> folder growing in parallel with the size of the /tlog folder.
>> Actually it increases to a certain limit and comes down, then increases
>> and comes down again.
>>
>> So now the bigger doubt I have is: during a hard commit, is the data written
>> to both the /index and /tlog folders?
>>
>> I am using solr 4.5.1.
>>
>> Someone please explain to me how hard commit works. I am assuming the
>> following sequence:
>> 1. Reads the data and writes it to the tlog
>> 2. During hard commit, flushes the data from the tlog to the index. If
>> openSearcher is false, it should not open a new searcher
>> 3. In the end, once all the data is indexed, it should open a new
>> searcher.
>>
>> If not, please explain.
>>
>> Thanks in Advance
>> Radha
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/The-way-Autocommit-works-in-solr-Wierd-tp4122558.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>


Re: Multiple "fq" parameters are not executed

2014-03-10 Thread Erick Erickson
Having a couple of docs that aren't being
returned that you think should be would
help.

It's tangential, but you might get better
performance out of this when you get over
your initial problem by using something like
fq=StartDate:[NOW/DAY TO NOW/DAY+1DAY]

That'll filter on all docs with startDate of this calendar
day. It will be cached and reused until midnight.
Of course, if you have docs in your index that have a
StartDate of NOW+1MINUTE that should _not_
be shown, you have to use the form you already have,
and it's quite correct to specify cache=false.
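
Putting that together with the filters from the mail below, a full request
might look like this sketch (host and collection are placeholders; -g keeps
curl from globbing the brackets, and "+" is escaped as %2B):

curl -g "http://localhost:8983/solr/collection1/select?q=*:*&fq=ClientID:2&fq=HotelID:234-PPP&fq=StartDate:[NOW/DAY%20TO%20NOW/DAY%2B1DAY]"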

Best,
Erick

On Mon, Mar 10, 2014 at 4:51 PM, Vijay Kokatnur wrote:
> <..Spawning this as a separate thread..>
>
> So I have a filter query with multiple "fq" parameters.  However, I have
> noticed that only the first "fq" is used for filtering.  For instance, a
> lookup with
>
> ...&fq=ClientID:2
> &fq=HotelID:234-PPP
> &fq={!cache=false}StartDate:[NOW/DAY TO *]
>
> In the above query, results are filtered only by ClientID and not by
> HotelID and StartDate.  The same thing happens with "q" query.  Does anyone
> know why?


Possible small bug with AnalyzingInfixSuggester in 4.7.0?

2014-03-10 Thread Brian Bray
I'm trying to use the new InfixSuggester exposed in 4.7 and I'm getting
some errors on startup. They don't seem to cause any real problems
(my app still seems to run), but I get the following:

17:28:54.721 WARN  {coreLoadExecutor-4-thread-1} [o.a.s.core.SolrCore] :
[vpr] Solr index directory '/app/data/dir/solr/data/index' doesn't exist.
Creating new index...
17:28:55.194 ERROR {searcherExecutor-5-thread-1} [o.a.s.core.SolrCore] :
Exception in reloading suggester index for: phrase_suggester
java.io.FileNotFoundException:
/app/data/dir/solr/data/suggest_infix_dict/iwfsta.bin (No such file or
directory)
at java.io.FileInputStream.open(Native Method) ~[na:1.7.0_40]
at java.io.FileInputStream.<init>(FileInputStream.java:146)
~[na:1.7.0_40]
at
org.apache.solr.spelling.suggest.SolrSuggester.reload(SolrSuggester.java:158)
~[solr-core-4.7.0.jar:4.7.0 1570806 - simon - 2014-02-22 08:36:23]
at
org.apache.solr.handler.component.SuggestComponent$SuggesterListener.newSearcher(SuggestComponent.java:465)
~[solr-core-4.7.0.jar:4.7.0 1570806 - simon - 2014-02-22 08:36:23]
at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1695)
[solr-core-4.7.0.jar:4.7.0 1570806 - simon - 2014-02-22 08:36:23]
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
[na:1.7.0_40]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_40]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_40]
at java.lang.Thread.run(Thread.java:724) [na:1.7.0_40]

Relevant config is:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">phrase_suggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
    <str name="field">phrase</str>
    <str name="weightField">pid</str>
    <str name="indexPath">${solr.data.dir:}/suggest_infix</str>
    <str name="storeDir">${solr.data.dir:}/suggest_infix_dict</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="buildOnCommit">true</str>
    <str name="highlight">true</str>
  </lst>
  <lst name="suggester">
    <str name="name">freetext_suggester</str>
    <str name="lookupImpl">FreeTextLookupFactory</str>
    <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
    <str name="field">spell</str>
    <str name="weightField">pid</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

And I was poking through the source from
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/spelling/suggest/SolrSuggester.java
to see where the exception came from, and was wondering if in
SolrSuggester.reload(...) the FileInputStream should be opened in the
try/catch block?

Error seems to go away if I create an empty file for it to find.
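
A hedged sketch of the guard being suggested (not the actual SolrSuggester
source; the Lookup interface below is a stand-in so the sketch is
self-contained):

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class SuggesterReloadSketch {
  // Check for the stored dictionary before opening it, so a missing file
  // on first startup is logged instead of thrown.
  static void reloadIfPresent(File storeFile, SuggesterLookup lookup) {
    if (!storeFile.exists()) {
      System.out.println("No stored suggester data at " + storeFile + "; skipping reload");
      return;
    }
    try (InputStream in = new FileInputStream(storeFile)) {
      lookup.load(in);
    } catch (IOException e) {
      System.err.println("Could not reload suggester index: " + e);
    }
  }

  // Minimal stand-in for the real Lookup API so this compiles on its own.
  interface SuggesterLookup {
    void load(InputStream in) throws IOException;
  }
}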

Thanks,
Brian


Re: Filter query not working for time range

2014-03-10 Thread Darniz
Hello,
is there a fix for the NOW rounding?

Otherwise I have to get the current date and create a range query like
* TO yyyy-MM-ddThh:mm:ssZ



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Filter-query-not-working-for-time-range-tp4122441p4122723.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Filter query not working for time range

2014-03-10 Thread Erick Erickson
Where do you live? Is it possible you're getting fooled by the fact
that Solr uses UTC?

Solr doesn't distinguish between dates and times, they're all just
unix timestamps.

And, taking into account the time difference between my time zone and UTC,
it works perfectly for me.
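
For reference (the field name is hypothetical), both of the filters below
are evaluated in UTC; the second rounds to the current UTC day, so it also
caches well:

fq=StartDate:[* TO NOW]
fq=StartDate:[* TO NOW/DAY+1DAY]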

Best,
Erick

On Mon, Mar 10, 2014 at 8:20 PM, Darniz  wrote:
> Hello,
> is there a fix for the NOW rounding?
>
> Otherwise I have to get the current date and create a range query like
> * TO yyyy-MM-ddThh:mm:ssZ
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Filter-query-not-working-for-time-range-tp4122441p4122723.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Unable to get offsets using AtomicReader.termPositionsEnum(Term)

2014-03-10 Thread Jefferson French
This looks like a codec issue, but I'm not sure how to address it. I've
found that a different DocsAndPositionsEnum implementation is instantiated
in my code than in Solr's TermVectorComponent.

Mine:
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$EverythingEnum
Solr: 
org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVDocsEnum

As far as I can tell, I've only used Lucene/Solr 4.6, so I'm not sure where
the Lucene 4.1 reference comes from. I've searched through the Solr config
files and can't see where to change the codec, but shouldn't the reader use
the same codec as used when the index was created?


On Fri, Mar 7, 2014 at 1:37 PM, Jefferson French wrote:

> We have an API on top of Lucene 4.6 that I'm trying to adapt to running
> under Solr 4.6. The problem is although I'm getting the correct offsets
> when the index is created by Lucene, the same method calls always return -1
> when the index is created by Solr. In the latter case I can see the
> character offsets via Luke, and I can even get them from Solr when I access
> the /tvrh search handler, which uses the TermVectorComponent class.
>
> This is roughly how I'm reading character offsets in my Lucene code:
>
>> AtomicReader reader = ...
>> Term term = ...
>> DocsAndPositionsEnum postings = reader.termPositionsEnum(term);
>> while (postings.nextDoc() != DocsAndPositionsEnum.NO_MORE_DOCS) {
>>   for (int i = 0; i < postings.freq(); i++) {
>> System.out.println("start:" + postings.startOffset());
>> System.out.println("end:" + postings.endOffset());
>>   }
>> }
>
>
> Notice that I want the values for a single term. When run against an index
> created by Solr, the above calls to startOffset() and endOffset() return
> -1. Solr's TermVectorComponent prints the correct offsets like this
> (paraphrased):
>
> IndexReader reader = searcher.getIndexReader();
>> Terms vector = reader.getTermVector(docId, field);
>> TermsEnum termsEnum = vector.iterator(termsEnum);
>> int freq = (int) termsEnum.totalTermFreq();
>> DocsAndPositionsEnum dpEnum = null;
>> while((text = termsEnum.next()) != null) {
>>   String term = text.utf8ToString();
>>   dpEnum = termsEnum.docsAndPositions(null, dpEnum);
>>   dpEnum.nextDoc();
>>   for (int i = 0; i < freq; i++) {
>> final int pos = dpEnum.nextPosition();
>> System.out.println("start:" + dpEnum.startOffset());
>> System.out.println("end:" + dpEnum.endOffset());
>>   }
>> }
>
>
> but in this case it is getting the offsets per doc ID, rather than a
> single term, which is what I want.
>
> Could anyone tell me:
>
>1. Why I'm not able to get the offsets using my first example, and/or
>2. A better way to get the offsets for a given term?
>
> Thanks.
>
>Jeff
>
>
>
>
>
>
>
>
>


Re: Issue with spatial search

2014-03-10 Thread Steven Bower
Only points in the index. Am I correct that this won't require a reindex?

On Monday, March 10, 2014, Smiley, David W.  wrote:

> Hi Steven,
>
> Set distErrPct to 0 in order to get non-point shapes to always be as
> accurate as maxDistErr.  Point shapes are always that accurate.  As long as
> you only index points, not other shapes (you don't index polygons, etc.)
> then distErrPct of 0 should be fine.  In fact, perhaps a future Solr
> version should simply use 0 as the default; the last time I did benchmarks
> it was pretty marginal impact of higher distErrPct.
>
> It's a fairly different story if you are indexing non-point shapes.
>
> ~ David
>
> From: Steven Bower <smb-apa...@alcyon.net>
> Reply-To: "solr-user@lucene.apache.org"
> Date: Monday, March 10, 2014 at 4:23 PM
> To: "solr-user@lucene.apache.org"
> Subject: Re: Issue with spatial search
>
> Minor edit to the KML to adjust color of polygon
>
>
> On Mon, Mar 10, 2014 at 4:21 PM, Steven Bower <smb-apa...@alcyon.net> wrote:
> I am seeing an "error" when doing a spatial search where a particular point
> is showing up within a polygon, but by all methods I've tried that point is
> not within the polygon.
>
> First the point is: 41.2299,29.1345 (lat/lon)
>
> The polygon is:
>
> 31.2719,32.283
> 31.2179,32.3681
> 31.1333,32.3407
> 30.9356,32.6318
> 31.0707,34.5196
> 35.2053,36.9415
> 37.2959,36.6339
> 40.8334,30.4273
> 41.1622,29.1421
> 41.6484,27.4832
> 47.0255,13.6342
> 43.9457,3.17525
> 37.0029,-5.7017
> 35.7741,-5.57719
> 34.801,-4.66201
> 33.345,10.0157
> 29.6745,18.9366
> 30.6592,29.1683
> 31.2719,32.283
>
> The geo field we are using has this config:
>
> <fieldType name="location" class="solr.SpatialRecursivePrefixTreeFieldType"
>    distErrPct="0.025"
>    maxDistErr="0.09"
>    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
>    units="degrees"/>
>
> The config is basically the same as the one from the docs...
>
> The query I am issuing is this:
>
> location:"Intersects(POLYGON((32.283 31.2719, 32.3681 31.2179, 32.3407
> 31.1333, 32.6318 30.9356, 34.5196 31.0707, 36.9415 35.2053, 36.6339
> 37.2959, 30.4273 40.8334, 29.1421 41.1622, 27.4832 41.6484, 13.6342
> 47.0255, 3.17525 43.9457, -5.7017 37.0029, -5.57719 35.7741, -4.66201
> 34.801, 10.0157 33.345, 18.9366 29.6745, 29.1683 30.6592, 32.283 31.2719)))"
>
> and it brings back a result where the "location" field is 41.2299,29.1345
>
> I've attached a KML with the polygon and the point and you can see from
> that, visually, that the point is not within the polygon. I also tried the
> Google Maps API, but after playing around realized that the polygons in Maps
> are drawn in Euclidean space while the map itself is a Mercator projection.
> Loading the KML in Earth fixes this issue, but the point still lies outside
> the polygon. The distance between the edge of the polygon closest to the
> point and the point itself is ~1.2 miles, which is much larger than the
> 1-meter accuracy given by the maxDistErr (per the docs).
>
> Any thoughts on this?
>
> Thanks,
>
> Steve
>
>


Re: SOLR JOINS not working and not returning any data for simple query

2014-03-10 Thread William Bell
Send the queries.


On Fri, Mar 7, 2014 at 2:32 PM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions)  wrote:

> Hi All,
>
> I am facing strange behavior with the Solr server. All my joins suddenly
> stopped working after a restart. Individual collections are returning
> responses, but when I join the collections I am getting zero documents. Let
> me know if anyone has had the same type of issue.
>
>
>


-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: SOLR JOINS not working and not returning any data for simple query

2014-03-10 Thread Erick Erickson
Really, how can anyone help with this little information?
Please read:
http://wiki.apache.org/solr/UsingMailingLists

Best,
Erick

On Mon, Mar 10, 2014 at 10:03 PM, William Bell  wrote:
> Send the queries.
>
>
> On Fri, Mar 7, 2014 at 2:32 PM, EXTERNAL Taminidi Ravi (ETI,
> Automotive-Service-Solutions)  wrote:
>
>> Hi All,
>>
>> I am facing strange behavior with the Solr server. All my joins suddenly
>> stopped working after a restart. Individual collections are returning
>> responses, but when I join the collections I am getting zero documents. Let
>> me know if anyone has had the same type of issue.
>>
>>
>>
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076


Re: Issue with spatial search

2014-03-10 Thread David Smiley (@MITRE.org)
Correct, Steve. Alternatively you can also put this option in your query
after the end of the last parenthesis, as in this example from the wiki:

  fq=geo:"IsWithin(POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30)))
distErrPct=0"

~ David







-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Issue-with-spatial-search-tp4122690p4122744.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Unable to get offsets using AtomicReader.termPositionsEnum(Term)

2014-03-10 Thread Robert Muir
Hello, I think you are confused between two different index
structures, probably because of the name of the options in solr.

1. indexing term vectors: this means given a document, you can go
lookup a miniature "inverted index" just for that document. That means
each document has "term vectors" which has a term dictionary of the
terms in that one document, and optionally things like positions and
character offsets. This can be useful if you are examining *many
terms* for just a few documents. For example: the MoreLikeThis use
case. In solr this is activated with termVectors=true. To additionally
store positions/offsets information inside the term vectors its
termPositions and termOffsets, respectively.

2. indexing character offsets: this means given a term, you can get
the offset information "along with" each position that matched. So
really you can think of this as a special form of a payload. This is
useful if you are examining *many documents* for just a few terms. For
example, many highlighting use cases. In solr this is activated with
storeOffsetsWithPositions=true. It is unrelated to term vectors.
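
In schema.xml terms the two are enabled independently; a sketch (field
names hypothetical):

<!-- use case 1: per-document term vectors, with positions and offsets -->
<field name="body_tv" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

<!-- use case 2: character offsets stored in the postings lists themselves -->
<field name="body_offsets" type="text_general" indexed="true" stored="true"
       storeOffsetsWithPositions="true"/>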

Hopefully this helps.



UpdateHandler issues with SolrCloud 4.7

2014-03-10 Thread ralph tice
We have a cluster running SolrCloud 4.7 built 2/25.  10 shards with 2
replicas each (20 shards total) at about 20GB/shard.

We index around 1k-1.5k documents/second into this cluster constantly.  To
manage growth we have a scheduled job that runs every 3 hours to prune
documents based on business rules.  Lately this job has taken to failing.
 There are several facet queries before our delete queries but we're
generally deleting ~10k documents at a time.  Auto hard commits are set to
every 60 seconds and auto soft commits at 10 seconds.  Each node has enough
RAM to page cache the entire data set.  We run multiple JVMs per node to
help with GC.

When our pruning job is running, it has started to completely wedge the
UpdateHandler.  Indexing stops and takes 20-60 minutes to recover.  The
prune job encounters multiple read timeouts.

My guess is that the UpdateHandler blocks because shards go into
recovery when they can't keep up with the documents sent over
replication after hard commits.  I suspect either updates/replication or
shard size is the issue, because we have another (larger) cluster with
5GB/shard and no replication that seems to handle load better.

Some logs from 2 of the 4 -- the other nodes have similar logs to these
with SnapPuller / PeerSync on one and Connection Reset errors on the other:

Mar 10 20:13:35 solr-5e.i.jobcorp.com [Thread-29775]
org.apache.solr.cloud.RecoveryStrategy Stopping recovery for
zkNodeName=core_node25core=solr_shard10_8987
Mar 10 20:13:35 solr-5e.i.jobcorp.com [Thread-29772]
org.apache.solr.cloud.RecoveryStrategy Stopping recovery for
zkNodeName=core_node25core=solr_shard10_8987
Mar 10 21:05:45 solr-5e.i.jobcorp.com [Thread-37627]
org.apache.solr.cloud.RecoveryStrategy Stopping recovery for
zkNodeName=core_node21core=solr_shard6_8983
Mar 10 21:05:47 solr-5e.i.jobcorp.com [RecoveryThread]
org.apache.solr.update.PeerSync PeerSync: core=solr_shard6_8983 url=
http://solr-5e.i.jobcorp.com:8983/solr too many updates received since
start - startingUpdates no longer overlaps with our currentUpdates
Mar 10 21:05:47 solr-5e.i.jobcorp.com [RecoveryThread]
org.apache.solr.handler.SnapPuller File _fqp9x_Lucene41_0.tip expected to
be 495806 while it is 107332
Mar 10 21:07:33 solr-5e.i.jobcorp.com [recoveryExecutor-6-thread-4]
org.apache.solr.update.UpdateLog Starting log replay
tlog{file=/mnt/solr/data/solr_shard6_8983/tlog/tlog.0005547
refcount=2} active=true starting pos=2249142
Mar 10 21:08:06 solr-5e.i.jobcorp.com [recoveryExecutor-6-thread-4]
org.apache.solr.update.UpdateLog Log replay finished.
recoveryInfo=RecoveryInfo{adds=45282 deletes=0 deleteByQuery=252 errors=0
positionOfStart=2249142}
Mar 11 00:08:24 solr-5e.i.jobcorp.com [RecoveryThread]
org.apache.solr.update.PeerSync PeerSync: core=solr_shard8_8985 url=
http://solr-5e.i.jobcorp.com:8985/solr too many updates received since
start - startingUpdates no longer overlaps with our currentUpdates
Mar 11 00:09:20 solr-5e.i.jobcorp.com [commitScheduler-8-thread-1]
org.apache.solr.core.SolrCore [solr_shard8_8985] PERFORMANCE WARNING:
Overlapping onDeckSearchers=2
Mar 11 00:09:29 solr-5e.i.jobcorp.com [recoveryExecutor-6-thread-6]
org.apache.solr.update.UpdateLog Starting log replay
tlog{file=/mnt/solr/data/solr_shard8_8985/tlog/tlog.0005717
refcount=2} active=true starting pos=1329158
Mar 11 00:09:31 solr-5e.i.jobcorp.com [recoveryExecutor-6-thread-6]
org.apache.solr.core.SolrCore [solr_shard8_8985] PERFORMANCE WARNING:
Overlapping onDeckSearchers=2
Mar 11 00:09:50 solr-5e.i.jobcorp.com [recoveryExecutor-6-thread-6]
org.apache.solr.update.UpdateLog Log replay finished.
recoveryInfo=RecoveryInfo{adds=8069 deletes=0 deleteByQuery=14 errors=0
positionOfStart=1329158}

Different node:
Mar 11 02:36:32 solr-3d.i.jobcorp.com [updateExecutor-1-thread-74378]
org.apache.solr.update.StreamingSolrServers error
Mar 11 02:36:32 solr-3d.i.jobcorp.com #011java.net.SocketException:
Connection reset
Mar 11 02:36:32 solr-3d.i.jobcorp.com #011at
java.net.SocketInputStream.read(SocketInputStream.java:196)
Mar 11 02:36:32 solr-3d.i.jobcorp.com #011at
java.net.SocketInputStream.read(SocketInputStream.java:122)
Mar 11 02:36:32 solr-3d.i.jobcorp.com #011at
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
Mar 11 02:36:32 solr-3d.i.jobcorp.com #011at
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
Mar 11 02:36:32 solr-3d.i.jobcorp.com #011at
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
Mar 11 02:36:32 solr-3d.i.jobcorp.com #011at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
Mar 11 02:36:32 solr-3d.i.jobcorp.com #011at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
Mar 11 02:36:32 solr-3d.i.jobcorp.com #011at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:2

Re: How to customize Solr

2014-03-10 Thread ~$alpha`
The link you provided has no information about customizing.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-customize-Solr-tp4122551p4122760.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to apply Semantic Search in Solr

2014-03-10 Thread Sohan Kalsariya
Hey Sujit, thanks a lot.
But what do you think about the Berryman blog post?
Is it feasible to apply, or should I apply the synonym stuff?
Which one is good?
And the third approach you told me about seems difficult and
time-consuming for students like me, as I will have to submit this in the
next 15 days.
Please suggest something.





-- 
Regard