Re: Restrict search to subset (a list of approx 40,000 ids from an external service) of corpus

2014-11-17 Thread deviantcode
Hi,
I am not sure I understand what you mean. Could you kindly elaborate
further?
Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Restrict-search-to-subset-a-list-of-aprrox-40-000-ids-from-an-external-service-of-corpus-tp4169210p4169435.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Restrict search to subset (a list of approx 40,000 ids from an external service) of corpus

2014-11-17 Thread deviantcode
Hi Jurgen,
Thanks for the reply. There is actually a complex query, which I
oversimplified in the post as:
"q=id:(id1 OR id2 OR id3 OR id4 ... OR id4 ) AND name:*". I am not
searching for docs with those ids;
I wish to restrict my actual search query to those ids, similar to how an
'fq' works.
Perhaps a better example, in a "family" schema =>
 "Find all parent-ids from the list of 40,000 ids, (subset of all the parent
ids) that have children between ages "X" TO "Y" who attend a school in
locality "Z", where the parents themselves are bankers OR engineers AND earn
between 40-50k".

It's something like this I'm working with, and I only want to restrict the
search to the ids provided and not search the whole corpus.

Hope this clarifies it.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Restrict-search-to-subset-a-list-of-aprrox-40-000-ids-from-an-external-service-of-corpus-tp4169210p4169436.html
Sent from the Solr - User mailing list archive at Nabble.com.


unloading a solr core doesn't free any memory

2014-11-17 Thread Ofer Fort
Getting a lot of those today.
Is it all from the same site we saw last week?

OFER FORT
Head of R&D

437 Fifth Avenue 9th floor, New York, NY 10016
cell: ISR +972-54-5678339  US +1 212 738 9594 ext 34
skype: oferfort
tracx
social intelligence
www.tracx.com



Re: Restrict search to subset (a list of approx 40,000 ids from an external service) of corpus

2014-11-17 Thread Alexandre Rafalovitch
On 17 November 2014 04:47, deviantcode  wrote:
> Find all parent-ids from the list of 40,000 ids, (subset of all the parent
> ids)

And how do you calculate that subset? Is that absolutely not something
that translates into the rules that can be codified in Solr?

Just passing 40,000 ids into Solr is going to be a performance killer
all by itself. How often does that subset change? If not often, there
might be other ways to create a semi-permanent marker of some sort.

Regards,
Alex.


Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


Solr HTTP client authentication

2014-11-17 Thread Bai Shen
I am using solrj to connect to my solr server.  However I need to
authenticate against the server and can not find out how to do so using
solrj.  Is this possible or do I need to drop solrj?  I can manually create
an httpclient and set up authentication but then I can't use solrj.

Thanks.


Re: Restrict search to subset (a list of approx 40,000 ids from an external service) of corpus

2014-11-17 Thread deviantcode
Unfortunately no, the ids come back from an external service (Spark) that
performs complex aspects of the user query which, as far as I can tell,
cannot easily be done in Solr.
For example, assume that, in the "family" schema described previously, the
children docs record "weight", "height", "year" and "month" fields.
The queries to be performed are like the following:
"find all children who have been gaining weight over the last 6 months",
"find children who have recorded a 10% increase in weight over any two
consecutive years",
"find children of age X in the top 25% of height from a
particular school"

The ids that come back match such queries. Is there a way to do these in
Solr?

Thanks


On Mon, Nov 17, 2014 at 12:09 PM, Alexandre Rafalovitch [via Lucene] <
ml-node+s472066n416945...@n3.nabble.com> wrote:

> On 17 November 2014 04:47, deviantcode <[hidden email]
> > wrote:
> > Find all parent-ids from the list of 40,000 ids, (subset of all the
> parent
> > ids)
>
> And how do you calculate that subset? Is that absolutely not something
> that translates into the rules that can be codified in Solr?
>
> Just passing 40,000 ids into Solr is going to be performance killer
> all by itself. How often that subset changes? If not often, there
> might be other ways to create a semi-permanent marker of some sort.
>
> Regards,
> Alex.
>
>
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Restrict-search-to-subset-a-list-of-aprrox-40-000-ids-from-an-external-service-of-corpus-tp4169210p4169466.html
Sent from the Solr - User mailing list archive at Nabble.com.

Boosting mixed field types

2014-11-17 Thread eakarsu
I have several field types and would like to assign the correct boosts so
that I get results in the correct order.
Here is a summary of what I have:
1- Product Title - text field, Boost = 160
2- Product Description - text field, Boost = 80
3- Number of clicks - Integer field, having value [1 TO 1000], Boost = 40
4- Product Features - text field, Boost = 20
5- AmountPurchased - Float field, Boost = 10
6- Product Properties - text field, Boost = 5

The user will make a search q="foo bar" and we expect Solr to return results
based on the boost values assigned above. qf and pf let me assign boosts
for text fields easily, but I am having difficulty mixing text fields with
numeric ones. For example, I want a product with Number of clicks = 20 to be
listed higher than one with 10 clicks, after 1) and 2).

I guess Solr will reorder the search results based on the boost values of
the text fields, but I want a product with 10 clicks to rank higher than one
with 5 clicks. As a result, any product having clicks will rank higher than
products whose features include the search keywords.

I hope I have explained this correctly.

Can you please guide me on how to solve this issue?

Regards

Erol Akarsu



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boosting-mixed-fiedl-types-tp4169469.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Restrict search to subset (a list of approx 40,000 ids from an external service) of corpus

2014-11-17 Thread Erik Hatcher
&fq={!terms}... in theory ought to do the trick pretty performantly.   How's 
that work for you?

Erik
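
A minimal sketch of how such a request might be assembled, assuming the unique key field is named `id` and the main query goes in `q`. The helper name `build_terms_request` and the three sample ids are hypothetical. With tens of thousands of ids the parameters are unlikely to fit in a URL, so the sketch form-encodes them for use as a POST body.

```python
from urllib.parse import urlencode


def build_terms_request(ids, main_query, id_field="id"):
    """Form-encode a Solr request that restricts main_query to the given ids.

    The ids go into a filter query (fq) using the {!terms} query parser,
    so they limit the result set without taking part in scoring.
    """
    params = {
        "q": main_query,
        # {!terms} takes a comma-separated list of raw terms by default.
        "fq": "{!terms f=%s}%s" % (id_field, ",".join(ids)),
    }
    return urlencode(params)


# Hypothetical ids standing in for the list returned by the external service.
body = build_terms_request(["id1", "id2", "id3"], "name:*")
```

The resulting string would be sent as an `application/x-www-form-urlencoded` POST to the select handler. Whether 40,000 ids perform acceptably this way is something to measure rather than assume.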


> On Nov 17, 2014, at 08:33, deviantcode  wrote:
> 
> Unfortunately no, the ids come back from an external service (spark) that
> performs complex aspects of the user query which as far is i can tell,
> cannot be easily done in solr.
> For example assuming, from the "family" schema described previously, the
> children docs record, "weight" "height", "year" and "month" fields.
> The queries to be performed are like the following:
>"find all children who have been gaining weight over the last 6 months"
> ,
>"find children who have recorded a 10% increase in weight over any two
> consecutive years",
>"find children of age X in the top 25% percentile of height from a
> particular school"
> 
> The ids that come back match such queries. Is there a way to do these in
> solr?
> 
> Thanks
> 
> 
> On Mon, Nov 17, 2014 at 12:09 PM, Alexandre Rafalovitch [via Lucene] <
> ml-node+s472066n416945...@n3.nabble.com> wrote:
> 
>> On 17 November 2014 04:47, deviantcode <[hidden email]
>> > wrote:
>>> Find all parent-ids from the list of 40,000 ids, (subset of all the
>> parent
>>> ids)
>> 
>> And how do you calculate that subset? Is that absolutely not something
>> that translates into the rules that can be codified in Solr?
>> 
>> Just passing 40,000 ids into Solr is going to be performance killer
>> all by itself. How often that subset changes? If not often, there
>> might be other ways to create a semi-permanent marker of some sort.
>> 
>> Regards,
>>Alex.
>> 
>> 
>> Personal: http://www.outerthoughts.com/ and @arafalov
>> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
>> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>> 
>> 


Re: Boosting mixed field types

2014-11-17 Thread Ahmet Arslan
Hi,

For example, edismax has the boost parameter. It is a multiplicative boost.

boost=log(NumberOfClicks)

https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser


Ahmet
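
A rough sketch of how the `boost` function could sit next to per-field `qf` weights in one edismax request. The field names (`title`, `description`, `features`, `properties`, `NumberOfClicks`) are assumptions standing in for the real schema, and the weights are simply the ones listed in the question. Solr's `log` is base 10, so a document with exactly 1 click would get a multiplier of 0. The sketch adds 1 inside the function to avoid that, which is a choice of this example rather than something from the thread.

```python
import math
from urllib.parse import urlencode

params = {
    "defType": "edismax",
    "q": "foo bar",
    # Per-field weights for the text fields (hypothetical field names).
    "qf": "title^160 description^80 features^20 properties^5",
    # Multiplicative boost: the text score is multiplied by this function.
    "boost": "log(sum(NumberOfClicks,1))",
}
query_string = urlencode(params)


def click_multiplier(clicks):
    """Mirror of log(sum(NumberOfClicks,1)) for checking the effect offline."""
    return math.log10(clicks + 1)
```

Because the boost multiplies the whole text score, more clicks raise a document's rank, but a document with fewer clicks and a much stronger text match can still come out ahead.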



On Monday, November 17, 2014 3:38 PM, eakarsu  wrote:
I have several field types and like to assign correct boosting so that I will
get results in correct order.
Here is a summary of what I have:
1- Product Title - text field , Boost = 160
2- Product Description - text field  , Boost = 80
3-Number of clicks - Integer field, having value [1 TO 1000] , Boost = 40
4- Product Features - text field , Boost = 20
5- AmountPurchased - Float field , Boost = 10
5- Product Properties - text field , Boost = 5

User will make a search q= "foo bar" and we expect solr will return results
based on Boost values assigned above. qf and pf can help me to assign
boosting for text fields easily. But I am having difficulty to mix text
fields with numeric ones. For example, I want product with Number of clicks
= 20 should be listed higher than one with 10 clicks after 1) and 2).

I guess solr, based on search results, will re order based boost values in
text fields but I want product with number of clicks 10 will be higher than
with clicks 5. As result, any products having clicks will have higher ranks
that products that has features that includes search keywords.

I hope I have explained correctly,

Can you please guide me on how to solve this issue?

Regards

Erol Akarsu





Re: Restrict search to subset (a list of approx 40,000 ids from an external service) of corpus

2014-11-17 Thread deviantcode
Thanks Erik, I will give that a go and try to work out the number of ids I can
safely pass to {!terms}.
Also, does this confirm that Solr cannot execute queries such as the
three I listed earlier?
Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Restrict-search-to-subset-a-list-of-aprrox-40-000-ids-from-an-external-service-of-corpus-tp4169210p4169477.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can I be added to the Wiki contributors group?

2014-11-17 Thread Erick Erickson
done, thanks!

On Sun, Nov 16, 2014 at 8:28 PM, Xavier Morera  wrote:
> I mean for: https://wiki.apache.org/solr/FrontPage
>
> My username is XavierMorera
>
> Regards,
> Xavier
>


Re: Does ReRankQuery support reranking the result of a FuzzyQuery?

2014-11-17 Thread Brian Sawyer
To answer myself, looks like this was fixed as part of
https://issues.apache.org/jira/browse/SOLR-6323.

On Mon, Nov 10, 2014 at 1:50 PM, Brian Sawyer  wrote:

> Hello,
>
> We are trying to make use of the new ReRankQuery to rescore results
> according to a custom function but run into problems when our main query
> includes a FuzzyQuery.
>
> Using the example setup in Solr 4.10.2 querying:
>
> q=name:Dell~1
> &rq={!rerank reRankQuery=id:whatever}
>
> results in:
> java.lang.UnsupportedOperationException: Query name:delk~1 does not
> implement createWeight
>
> Is this a bug or is this intended?
>
> Thanks,
> Brian
>
> Full stack trace below:
>
> java.lang.UnsupportedOperationException: Query name:delk~1 does not
> implement createWeight
> at org.apache.lucene.search.Query.createWeight(Query.java:80)
> at org.apache.solr.search.ReRankQParserPlugin$ReRankWeight.<init>(ReRankQParserPlugin.java:177)
> at
> org.apache.solr.search.ReRankQParserPlugin$ReRankQuery.createWeight(ReRankQParserPlugin.java:163)
> at
> org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:684)
> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
> at
> org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:209)
> at
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1619)
> at
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1433)
> at
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:514)
> at
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:485)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:368)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
> at
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> at
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at java.lang.Thread.run(Thread.java:722)
>


Re: Boosting mixed field types

2014-11-17 Thread eakarsu
Ahmet,

Thanks

The boost parameter determines the boost value for the whole query, but I am
also assigning boosts to individual fields. I am worried about whether the
boost parameter and the individual boosts with bf and pf will rank results
properly.

Erol Akarsu



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boosting-mixed-fiedl-types-tp4169469p4169489.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr HTTP client authentication

2014-11-17 Thread Anurag Sharma
I think Solr encourages SSL rather than authentication

On Mon, Nov 17, 2014 at 6:08 PM, Bai Shen  wrote:

> I am using solrj to connect to my solr server.  However I need to
> authenticate against the server and can not find out how to do so using
> solrj.  Is this possible or do I need to drop solrj?  I can manually create
> an httpclient and set up authentication but then I can't use solrj.
>
> Thanks.
>


RE: Solr HTTP client authentication

2014-11-17 Thread Fuad Efendi
>  I can 
> manually create an httpclient and set up authentication but then I can't use 
> solrj.

Yes, correct, except that you _can_ use SolrJ with this custom HttpClient
instance (which will intercept authentication and will support cookies, SSL
or plain HTTP, Keep-Alive, etc.)

You can provide a custom HttpClient to SolrJ at construction:

final HttpSolrServer myHttpSolrServer =
    new HttpSolrServer(
        SOLR_URL_BASE + "/" + SOLR_CORE_NAME,
        myHttpClient);


Best Regards,

http://www.tokenizer.ca


-Original Message-
From: Anurag Sharma [mailto:anura...@gmail.com] 
Sent: November-17-14 11:21 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr HTTP client authentication

I think Solr encourage SSL than authentication

On Mon, Nov 17, 2014 at 6:08 PM, Bai Shen  wrote:

> I am using solrj to connect to my solr server.  However I need to 
> authenticate against the server and can not find out how to do so 
> using solrj.  Is this possible or do I need to drop solrj?  I can 
> manually create an httpclient and set up authentication but then I can't use 
> solrj.
>
> Thanks.
>



Re: Solr HTTP client authentication

2014-11-17 Thread Bai Shen
I had seen where I could pass in an HttpClient to the SolrServer.  The
problem is that the HttpClient only receives the authentication information
through the execute method using the context. See the example located here.

https://hc.apache.org/httpcomponents-client-4.3.x/tutorial/html/authentication.html

DefaultHttpClient has methods to set the authentication information but the
class is deprecated.

Thanks.

On Mon, Nov 17, 2014 at 11:35 AM, Fuad Efendi  wrote:

> >  I can
> > manually create an httpclient and set up authentication but then I can't
> use solrj.
>
> Yes; correct; except that you _can_ use solj with this custom HttpClient
> instance (which will intercept authentication, which will support cookies,
> SSL or plain HTTP, Keep-Alive, and etc.)
>
> You can provide to SolrJ custom HttpClient at construction:
>
> final HttpSolrServer myHttpSolrServer =
> new HttpSolrServer(
> SOLR_URL_BASE + "/" + SOLR_CORE_NAME,
> myHttpClient);
>
>
> Best Regards,
>
> http://www.tokenizer.ca
>
>
> -Original Message-
> From: Anurag Sharma [mailto:anura...@gmail.com]
> Sent: November-17-14 11:21 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr HTTP client authentication
>
> I think Solr encourage SSL than authentication
>
> On Mon, Nov 17, 2014 at 6:08 PM, Bai Shen  wrote:
>
> > I am using solrj to connect to my solr server.  However I need to
> > authenticate against the server and can not find out how to do so
> > using solrj.  Is this possible or do I need to drop solrj?  I can
> > manually create an httpclient and set up authentication but then I can't
> use solrj.
> >
> > Thanks.
> >
>
>


Re: Solr HTTP client authentication

2014-11-17 Thread Jürgen Wagner (DVT)
Why rely on the default http client? Why not create one with

HttpClients.custom()
    .setDefaultSocketConfig(socketConfig)
    .setDefaultRequestConfig(requestConfig)
    .setSSLSocketFactory(sslsf)
    .build();

that has the SSLConnectionSocketFactory property set up with an
SSLContext that has the trust store and key store loaded properly?

Best,
--J.

On 17.11.2014 18:41, Bai Shen wrote:
> I had seen where I could pass in an HttpClient to the SolrServer.  The
> problem is that the HttpClient only receives the authentication information
> through the execute method using the context. See the example located here.
>
> https://hc.apache.org/httpcomponents-client-4.3.x/tutorial/html/authentication.html
>
> DefaultHttpClient has methods to set the authentication information but the
> class is deprecated.
>
> Thanks.
>
> On Mon, Nov 17, 2014 at 11:35 AM, Fuad Efendi  wrote:
>
>>>  I can
>>> manually create an httpclient and set up authentication but then I can't
>> use solrj.
>>
>> Yes; correct; except that you _can_ use solj with this custom HttpClient
>> instance (which will intercept authentication, which will support cookies,
>> SSL or plain HTTP, Keep-Alive, and etc.)
>>
>> You can provide to SolrJ custom HttpClient at construction:
>>
>> final HttpSolrServer myHttpSolrServer =
>> new HttpSolrServer(
>> SOLR_URL_BASE + "/" + SOLR_CORE_NAME,
>> myHttpClient);
>>
>>
>> Best Regards,
>>
>> http://www.tokenizer.ca
>>
>>
>> -Original Message-
>> From: Anurag Sharma [mailto:anura...@gmail.com]
>> Sent: November-17-14 11:21 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr HTTP client authentication
>>
>> I think Solr encourage SSL than authentication
>>
>> On Mon, Nov 17, 2014 at 6:08 PM, Bai Shen  wrote:
>>
>>> I am using solrj to connect to my solr server.  However I need to
>>> authenticate against the server and can not find out how to do so
>>> using solrj.  Is this possible or do I need to drop solrj?  I can
>>> manually create an httpclient and set up authentication but then I can't
>> use solrj.
>>> Thanks.
>>>
>>


-- 

Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
уважением
*i.A. Jürgen Wagner*
Head of Competence Center "Intelligence"
& Senior Cloud Consultant

Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
E-Mail: juergen.wag...@devoteam.com
, URL: www.devoteam.de



Managing Board: Jürgen Hatzipantelis (CEO)
Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071




Shard splitting and HDFS

2014-11-17 Thread Joseph Obernberger
I tried to split a shard using HDFS storage, and at first I received this
error:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error
CREATEing SolrCore 'COLLECT1_shard1_0_replica1': Unable to create core
[COLLECT1_shard1_0_replica1] Caused by: Direct buffer memory
at
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:157)
at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)

-
It appears that to do a split, you need to allocate double the amount of
direct memory for the HDFS cache.  After doing that, however, I'm getting
an error about not being able to find the clustering class:
219156 [qtp1312435169-19] ERROR org.apache.solr.core.SolrCore  -
org.apache.solr.common.SolrException: Error CREATEing SolrCore
'COLLECT1_shard1_0_replica1': Unable to create core
[COLLECT1_shard1_0_replica1] Caused by: solr.clustering.ClusteringComponent
at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:613)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:199)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:188)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:258)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)

I'm assuming this is a path issue in solrconfig.xml?  I'm not sure how to
work around this.  Any ideas?

Thank you!

-Joe


Shard splitting and HDFS

2014-11-17 Thread Joseph Obernberger
If I create the directory manually on the server that I'm splitting:
COLLECT_shard1_0_replica1
Then do the shard split command, it works OK.

-Joe


Re: Hierarchical faceting

2014-11-17 Thread rashmy1
Hi Alexandre,
Yes, I've read this post and that's the 'Option1' listed in my initial post.

I'm looking to see if Solr has any built-in tokenizer that splits the tokens
and prepends the depth information. I'd like to avoid building depth
information into the field values if Solr already has something that can be
used.

Thanks!
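
For reference, the depth-prefixed form being discussed could be produced on the client before indexing, roughly as follows. The `/` separator, the `depth/path` token layout and the sample category path are assumptions of this sketch, not the output of any built-in Solr tokenizer.

```python
def depth_prefixed_tokens(path, sep="/"):
    """Split a category path into one token per level, each prefixed with its depth."""
    parts = [p for p in path.split(sep) if p]
    return [
        "%d%s%s" % (depth, sep, sep.join(parts[: depth + 1]))
        for depth in range(len(parts))
    ]


tokens = depth_prefixed_tokens("Electronics/Phones/Android")
# tokens == ["0/Electronics", "1/Electronics/Phones", "2/Electronics/Phones/Android"]
```

With tokens in this shape, the children of a node can be requested by faceting with a prefix such as `facet.prefix=1/Electronics/`.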



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Hierarchical-faceting-tp4169263p4169536.html
Sent from the Solr - User mailing list archive at Nabble.com.


More HDFS and Shard Splitting

2014-11-17 Thread Joseph Obernberger
Originally I had two shards on two machines - shard1 and shard2.
I did a SHARDSPLIT on shard1.
Now I have shard1, shard2, and shard1_0.
If I select the core (COLLECT_shard1_0_replica1) and execute a query, I get
all the docs OK, but if I specify &distrib=false, I get 0 documents.

Under HDFS - when/how will the new core start to get data?
Thank you!

-Joe


Re: Hierarchical faceting

2014-11-17 Thread Alexandre Rafalovitch
You might be able to stick in a couple of PatternReplaceFilterFactory
in a row with regular expressions to catch different levels.

Something like:



...

I did not test this; you may need to escape something or put explicit
groups in there.

Regards,
   Alex.
P.s. 
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/pattern/PatternReplaceFilterFactory.html

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 17 November 2014 15:01, rashmy1  wrote:
> Hi Alexandre,
> Yes, I've read this post and that's the 'Option1' listed in my initial post.
>
> I'm looking to see if Solr has any in-built tokenizer that splits the tokens
> and prepends with the depth information. I'd like to avoid building depth
> information into the filed values if Solr already has something that can be
> used.
>
> Thanks!
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Hierarchical-faceting-tp4169263p4169536.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Internals of Analysis and Token Matching

2014-11-17 Thread Pritesh Patel
Hi Community.

Hoping someone can help explain this ...

Once all the analysis is done on a field all the tokens to identify that
field are stored.  What else is affecting a match to the document beyond a
simple token match and frequency of terms that match?

All the searches I did produce the same tokens (verified by using the
analysis screen in the admin, and looking at the terms indexed in solr
through the schema browser for field).  But some match and some don't when
I actually do the search.  I don't know why some of the searches don't
match even though everything in the analysis tells me they have the same
tokens.  What am I missing?

*Descriptions*

*Indexed in a field*: "4048860461"

*Searches that Match*
"4048860461"
"(404)8860461"

*Searches that don't match*
"404-886-0461"
"404)8860461"
"404)886)0461"

*Field analysis*
Field analysis is pretty simple, just used the "text_en_splitting_tight"
field but added an "ngram" filter to it.  See below.

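<fieldType ... positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  <filter ... "lang/stopwords_en.txt"/>
  <filter ... generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  <filter class="solr.EnglishMinimalStemFilterFactory"/>
  <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="20"/>
</fieldType>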


Re: Internals of Analysis and Token Matching

2014-11-17 Thread Alexandre Rafalovitch
Are you trying to match phone numbers despite the
spaces/dashes/brackets? By prefix? Suffix?

If so, you may look at something more like:




And remember, if you are using ngrams, you probably want them in the
index-chain of the analyzer, but not in the query-chain. Otherwise,
you will be matching on anything that has 3 characters overlapping.
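
The overlap problem can be seen with a toy model. This is only a sketch: `ngrams` is a hand-written stand-in for an n-gram filter with the minGramSize=3 and maxGramSize=20 used in the question, the second phone number is made up, and it assumes the query-side grams are treated as alternatives.

```python
def ngrams(token, min_size=3, max_size=20):
    """Return the set of character n-grams of token (toy stand-in for an n-gram filter)."""
    grams = set()
    for n in range(min_size, min(max_size, len(token)) + 1):
        for start in range(len(token) - n + 1):
            grams.add(token[start:start + n])
    return grams


indexed = ngrams("4048860461")   # grams stored for the indexed phone number
unrelated_query = "9994041111"   # made-up number that only shares the gram "404"

# n-grams in both chains: any shared 3-character gram counts as a hit.
matches_with_query_ngrams = bool(indexed & ngrams(unrelated_query))

# n-grams in the index chain only: the whole query token must be an indexed gram.
matches_index_only = unrelated_query in indexed

# A genuine partial search still works with index-side grams alone.
partial_match = "8860461" in indexed
```

Here `matches_with_query_ngrams` is true while `matches_index_only` is false, and `partial_match` stays true, which is the behaviour the index-only setup is meant to give.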

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 17 November 2014 16:43, Pritesh Patel  wrote:
> Hi Community.
>
> Hoping someone can help explain this ...
>
> Once all the analysis is done on a field all the tokens to identify that
> field are stored.  What else is affecting a match to the document beyond a
> simple token match and frequency of terms that match?
>
> All the searches I did produce the same tokens (verified by using the
> analysis screen in the admin, and looking at the terms indexed in solr
> through the schema browser for field).  But some match and some don't when
> I actually do the search.  I don't know why some of the searches don't
> match even though everything in the analysis tells me they have the same
> tokens.  What am I missing?
>
> *Descriptions*
>
> *Indexed in a field*: "4048860461"
>
> *Searches that Match*
> "4048860461"
> "(404)8860461"
>
> *Searches that don't match*
> "404-886-0461"
> "404)8860461"
> "404)886)0461"
>
> *Field analysis*
> Field analysis is pretty simple, just used the "text_en_splitting_tight"
> field but added an "ngram" filter to it.  See below.
>
> <fieldType ... positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>   <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
>   <filter ... "lang/stopwords_en.txt"/>
>   <filter ... generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
>   <filter class="solr.LowerCaseFilterFactory"/>
>   <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>   <filter class="solr.EnglishMinimalStemFilterFactory"/>
>   <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="20"/>
> </fieldType>


Re: More HDFS and Shard Splitting

2014-11-17 Thread Erick Erickson
Tell us more about your HDFS stuff. Specifically, how
do you have your HDFSDirectoryFactory specified in
solrconfig.xml?

Cause you shouldn't have to do things like create the
directory ahead of time I don't think.

Best,
Erick

On Mon, Nov 17, 2014 at 12:17 PM, Joseph Obernberger
 wrote:
> Originally I had two shards on two machines - shard1 and shard2.
> I did a SHARDSPLIT on shard1.
> Now have shard1, shard2, and shard1_0
> If I select the core (COLLECT_shard1_0_replica1) and execute a query, I get
> all the docs OK, but if I specific &distrib=false, I get 0 documents.
>
> Under HDFS - when/how will the new core start to get data?
> Thank you!
>
> -Joe


chaos monkey

2014-11-17 Thread Arpit Agarwal
Hi All,
I have set up Solr 4.9.0 in a 3 shard configuration running on tomcat servers.
I want to run a test with chaos-monkey to ensure the availability of
the entire system.
Can someone tell me, how I can integrate chaos-monkey with Solr or
make use of Solr test-framework to do the same?

Thanks & Regards
Arpit A


Re: More HDFS and Shard Splitting

2014-11-17 Thread Joseph Obernberger
Looks like the shard split failed, and only created one additional shard.
I didn't allocate enough memory for 3x - since two additional shards needed
to be created.  I was allocating 20G for each shard, so in order to do the
split, I needed to give 60G for the direct memory access.  I've now
switched it to 10G, and run the split - that works, but I still need to
build the directories beforehand, otherwise I get the "cannot find class"
problem.

Here are my HDFS parameters:

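<bool name="solr.hdfs.blockcache.enabled">true</bool>
<int name="solr.hdfs.blockcache.slab.count">80</int>
<bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
<int name="solr.hdfs.blockcache.blocksperbank">16384</int>
<bool name="solr.hdfs.blockcache.read.enabled">true</bool>
<bool name="solr.hdfs.blockcache.write.enabled">false</bool>
<bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
<int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">64</int>
<int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">512</int>
<str name="solr.hdfs.home">hdfs://nameservice1:8020/solr6</str>
<str name="solr.hdfs.confdir">/etc/hadoop/conf.cloudera.hdfs1</str>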


I did have the slab.count set to 160 before, and just didn't have the RAM
to try this out.  The split is now running and I see the amount of space
going into the new shards is increasing.  Looks like it's going to be
overnight before it completes.
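
For anyone checking the numbers, the memory figures above appear to follow from the block cache geometry. This is only a rough sketch of the arithmetic: it assumes 8 KiB cache blocks, that 16384 in the parameter list is the blocks-per-slab setting and 80 the slab count, and that each core gets its own block cache (so the parent core plus two sub-shards need three of them during a split). The function names are made up for illustration.

```python
# Block cache sizing sketch. The 8 KiB block size and the meaning of
# 16384 (blocks per slab) are assumptions, not taken from the post.
BLOCK_SIZE_BYTES = 8 * 1024
BLOCKS_PER_SLAB = 16384


def block_cache_gib(slab_count: int) -> float:
    """Direct memory needed by one core's block cache, in GiB."""
    slab_bytes = BLOCK_SIZE_BYTES * BLOCKS_PER_SLAB  # 128 MiB per slab
    return slab_count * slab_bytes / 2**30


def split_requirement_gib(slab_count: int, cores: int = 3) -> float:
    """Memory needed while the parent core and two sub-shard cores coexist."""
    return cores * block_cache_gib(slab_count)


if __name__ == "__main__":
    for slabs in (160, 80):
        print(slabs, "slabs:", block_cache_gib(slabs), "GiB per core,",
              split_requirement_gib(slabs), "GiB during a split")
```

Under those assumptions a slab count of 160 works out to 20G per core and 60G during the split, and a slab count of 80 to 10G per core and 30G during the split.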

-Joe

On Mon, Nov 17, 2014 at 5:57 PM, Erick Erickson 
wrote:

> Tell us more about your HDFS stuff. Specifically, how
> do you have your HDFSDirectoryFactory specified in
> solrconfig.xml?
>
> Cause you shouldn't have to do things like create the
> directory ahead of time I don't think.
>
> Best,
> Erick
>
> On Mon, Nov 17, 2014 at 12:17 PM, Joseph Obernberger
>  wrote:
> > Originally I had two shards on two machines - shard1 and shard2.
> > I did a SHARDSPLIT on shard1.
> > Now have shard1, shard2, and shard1_0
> > If I select the core (COLLECT_shard1_0_replica1) and execute a query, I get
> > all the docs OK, but if I specify &distrib=false, I get 0 documents.
> >
> > Under HDFS - when/how will the new core start to get data?
> > Thank you!
> >
> > -Joe
>


Solr 5 release date ?

2014-11-17 Thread roy123
Hi,

  Does anyone know when Solr 5.0 is scheduled for release? I'm planning to
upgrade to 4.10.2, but will wait if there's a plan to roll-out 5.0 pretty
soon.

-Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-5-release-date-tp4169571.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: chaos monkey

2014-11-17 Thread Manoj Bharadwaj
Chaos Monkey is designed to work in an AWS context. It uses your AWS account
credentials and via API terminates instances in EC2. So if you have your
shards in EC2, the documentation can be followed to configure it to
terminate the instances as needed.

If you are looking specifically to terminate just Tomcat, AFAIK there isn't
a monkey in the Simian Army that does what you want. There may be other OSS
projects that do this. You can use Latency Monkey from within the Simian
Army tools to test how your servers handle latency.

It may be simple to just write a shell script to randomly kill a tomcat
instance - if you have ssh setup it will be simple to do something with
Ansible without needing a lot of tools (other than python).
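
A minimal sketch of the "randomly kill a Tomcat instance" idea, written in Python rather than shell. The host names are hypothetical, and the pkill pattern is an assumption about how Tomcat shows up in the process list on your servers. The script only prints the ssh command it would run; nothing is killed unless you uncomment the last line.

```python
import random
import shlex
import subprocess  # only needed if you uncomment the run() call below


def pick_kill_command(hosts, rng=random):
    """Pick one host at random and build an ssh command that kills Tomcat there.

    Returns (host, argv). The process pattern is an assumption; adjust it
    to match how Tomcat is started on your machines.
    """
    host = rng.choice(hosts)
    remote = "pkill -9 -f org.apache.catalina.startup.Bootstrap"
    return host, ["ssh", host, remote]


if __name__ == "__main__":
    # Hypothetical host names; replace with your own Solr/Tomcat nodes.
    host, argv = pick_kill_command(["solr1", "solr2", "solr3"])
    print("would run:", " ".join(shlex.quote(a) for a in argv))
    # subprocess.run(argv, check=False)
```

Running something like this from cron or in a loop with a random sleep, while an indexing and query load is going, gives a crude availability test for the three shards.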

On Mon, Nov 17, 2014 at 5:59 PM, Arpit Agarwal 
wrote:

> Hi All,
> I have setup Solr 4.9.0 in a 3 shard configuration running on tomcat
> servers.
> I want to run a test with chaos-monkey to ensure the availability of
> the entire system.
> Can someone tell me, how I can integrate chaos-monkey with Solr or
> make use of Solr test-framework to do the same?
>
> Thanks & Regards
> Arpit A
>


Re: Solr 5 release date ?

2014-11-17 Thread Erick Erickson
There are rumblings about sometime in December IIRC, nothing's
been committed to though.

Best,
Erick


On Mon, Nov 17, 2014 at 4:24 PM, roy123  wrote:
> Hi,
>
>   Does anyone know when Solr 5.0 is scheduled for release? I'm planning to
> upgrade to 4.10.2, but will wait if there's a plan to roll-out 5.0 pretty
> soon.
>
> -Thanks


Re: Hierarchical faceting

2014-11-17 Thread Jason Hellman
I realize you want to avoid putting depth details into the field values, but 
something has to imply the depth.  So with that in mind, here is another 
approach (assuming you are chasing down a single branch of a tree and all 
its sub-branch offshoots):

- Use dynamic fields
- Step from one level to the next with a simple increment
- Build the facet for the next level on the call
- The UI needs only know the current level

This would possibly be as so:

step_fieldname_n

With a dynamic field configuration of:

step_*

The content of the step_fieldname_n field would either be the string of the 
field value or the delimited path of the current level (as suited to taste).  
Either way, most likely a fieldType of String (or some variation thereof).

The UI would then call:

facet.field=step_fieldname_n+1

And the UI would need to be aware to carry the n+1 into the fq link verbiage:

fq=step_fieldname_n+1:facetvalue

The trick of all of this is that you must build your index with the depth of 
your hierarchy in mind to place the values into the suitable fields.  You 
could, of course, write an UpdateProcessor to accomplish this if that seems 
fitting.
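
A rough sketch of the idea in Python, to make the field layout concrete. It assumes the "delimited path" flavour of the field content, a hypothetical base field name of "category", "/" as the separator, and a step_* dynamic field in the schema. The helper names are made up, and the filter uses the {!term} query parser so the path does not need escaping.

```python
def step_fields(path, fieldname="category", sep="/"):
    """Index side: one step_<fieldname>_<n> value per level of the path."""
    parts = path.split(sep)
    return {
        f"step_{fieldname}_{n}": sep.join(parts[:n])
        for n in range(1, len(parts) + 1)
    }


def next_level_facet(fieldname, level):
    """Query side: facet.field to request when the UI is at `level`."""
    return f"step_{fieldname}_{level + 1}"


def drill_down_fq(fieldname, level, facet_value):
    """fq to attach to a facet link one level below the current `level`."""
    return f"{{!term f=step_{fieldname}_{level + 1}}}{facet_value}"


if __name__ == "__main__":
    print(step_fields("NonFic/Sci/Phy"))
    print("facet.field=" + next_level_facet("category", 1))
    print("fq=" + drill_down_fq("category", 1, "NonFic/Sci"))
```

Something along these lines could live in an indexing client or be ported to an UpdateProcessor; the UI only has to remember the current level n.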

Jason

> On Nov 17, 2014, at 12:22 PM, Alexandre Rafalovitch  
> wrote:
> 
> You might be able to stick in a couple of PatternReplaceFilterFactory
> in a row with regular expressions to catch different levels.
> 
> Something like:
> 
> <filter class="solr.PatternReplaceFilterFactory" pattern="^[^0-9][^/]+/[^/]/[^/]+$" replacement="2$0" />
> <filter class="solr.PatternReplaceFilterFactory" pattern="^[^0-9][^/]+/[^/]$" replacement="1$0" />
> ...
> 
> I did not test this, you may need to escape some things or put explicit
> groups in there.
> 
> Regards,
>   Alex.
> P.s. 
> http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/pattern/PatternReplaceFilterFactory.html
> 
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
> 
> 
> On 17 November 2014 15:01, rashmy1  wrote:
>> Hi Alexandre,
>> Yes, I've read this post and that's the 'Option1' listed in my initial post.
>> 
>> I'm looking to see if Solr has any in-built tokenizer that splits the tokens
>> and prepends with the depth information. I'd like to avoid building depth
>> information into the field values if Solr already has something that can be
>> used.
>> 
>> Thanks!
>> 
>> 
>> 
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Hierarchical-faceting-tp4169263p4169536.html
>> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Hierarchical faceting

2014-11-17 Thread Evan Pease
>I'm looking to see if Solr has any in-built tokenizer that splits the tokens
>and prepends with the depth information. I'd like to avoid building depth
>information into the field values if Solr already has something that can be
>used.

So the goal is to find out the level of the tree for each category? You
could determine this in the UI by splitting the category facet value string
by the separator.
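
A small sketch of that client-side split, in Python for brevity (a real UI would more likely do this in JavaScript). It assumes "/" as the separator and that the facet response has already been turned into a value-to-count mapping; the function names are made up for illustration.

```python
def level(value, sep="/"):
    """Depth of a path-style facet value: 'NonFic/Sci/Phy' is level 3."""
    return len(value.split(sep))


def child_facets(current, counts, sep="/"):
    """Keep only the facet values exactly one level below `current`."""
    wanted = level(current, sep) + 1
    prefix = current + sep
    return {
        value: count
        for value, count in counts.items()
        if value.startswith(prefix) and level(value, sep) == wanted
    }


if __name__ == "__main__":
    # Made-up facet counts for illustration only.
    counts = {
        "NonFic": 20,
        "NonFic/Sci": 15,
        "NonFic/Sci/Phy": 10,
        "NonFic/Sci/Phy/Astro": 6,
        "NonFic/Sci/Phy/Astro/Stars": 2,
    }
    print(child_facets("NonFic/Sci/Phy", counts))
```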

As you're aware, when you query a field indexed using
solr.PathHierarchyTokenizerFactory
you still get the full category path back as a facet value.

For example, if a user navigates to "Phy":
fq={!term f=category}NonFic/Sci/Phy

The facet values that are returned will look like this (made up counts):


  10
  
wrote:

> I realize you want to avoid putting depth details into the field values,
> but something has to imply the depth.  So with that in mind, here is
> another approach (with the assumption that you are chasing down a single
> branch of a tree (and all its subbranch offshoots)),
>
> Use dynamic fields
> Step from one level to the next with a simple increment
> Build the facet for the next level on the call
> The UI needs only know the current level
>
> This would possibly be as so:
>
> step_fieldname_n
>
> With a dynamic field configuration of:
>
> step_*
>
> The content of the step_fieldname_n field would either be the string of
> the field value or the delimited path of the current level (as suited to
> taste).  Either way, most likely a fieldType of String (or some variation
> thereof)
>
> The UI would then call:
>
> facet.field=step_fieldname_n+1
>
> And the UI would need to be aware to carry the n+1 into the fq link
> verbiage:
>
> fq=step_fieldname_n+1:facetvalue
>
> The trick of all of this is that you must build your index with the depth
> of your hierarchy in mind to place the values into the suitable fields.
> You could, of course, write an UpdateProcessor to accomplish this if that
> seems fitting.
>
> Jason
>
> > On Nov 17, 2014, at 12:22 PM, Alexandre Rafalovitch 
> wrote:
> >
> > You might be able to stick in a couple of PatternReplaceFilterFactory
> > in a row with regular expressions to catch different levels.
> >
> > Something like:
> >
> > <filter class="solr.PatternReplaceFilterFactory" pattern="^[^0-9][^/]+/[^/]/[^/]+$" replacement="2$0" />
> > <filter class="solr.PatternReplaceFilterFactory" pattern="^[^0-9][^/]+/[^/]$" replacement="1$0" />
> > ...
> >
> > I did not test this, you may need to escape some thing or put explicit
> > groups in there.
> >
> > Regards,
> >   Alex.
> > P.s.
> http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/pattern/PatternReplaceFilterFactory.html
> >
> > Personal: http://www.outerthoughts.com/ and @arafalov
> > Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> > Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
> >
> >
> > On 17 November 2014 15:01, rashmy1 wrote:
> >> Hi Alexandre,
> >> Yes, I've read this post and that's the 'Option1' listed in my initial post.
> >>
> >> I'm looking to see if Solr has any in-built tokenizer that splits the tokens
> >> and prepends with the depth information. I'd like to avoid building depth
> >> information into the field values if Solr already has something that can be
> >> used.
> >>
> >> Thanks!
>
>