Collection fix predefined hash range for data import.

2014-06-12 Thread SolrUser1543
I have a collection containing n shards. 

Now I want to create a new collection and perform a data import from the old
one to the new one.

How can I make the hash ranges of the new collection the same as those of the
old one, in order to make the data import local (on the same machine)?

I mean, if shard#3 of the old collection has range 100-200, for example, how
can I force the new shard#3 to have the same range?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collection-fix-predefined-hash-range-for-data-import-tp4141365.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem faceting

2014-06-12 Thread Thomas Egense
First of all, make sure you use docvalues for facet fields with many unique
values.

If that still does not help you can try the following.
My colleague Toke Eskildsen has made a huge improvement for faceting when the
number of results in the facets is less than 8% of the total number of
documents.
In this case we get a substantial improvement in both memory use and query
time:
See: https://plus.google.com/+TokeEskildsen/posts/7oGxWZRKJEs
We have tested it on an index with 300M documents.

From,
Thomas Egense



On Wed, Jun 11, 2014 at 5:36 PM, marcos palacios 
wrote:

> Hello everyone.
>
>
>
> I'm having problems with the performance of queries with facets; the time
> spent resolving a query is very high.
>
>
>
> The index has 10 million documents, each one with 100 fields.
>
> The server has 8 cores and 56 Gb of ram, running with jetty with this
> memory configuration: -Xms24096m -Xmx44576m
>
>
>
> When I do a query with 20 facets, the time taken is 4-5 seconds. If
> the same request is made another time, the
>
>
>
> Debug query first execution:
>
> 6037.0 name="time">265.0 name="time">5772.0
>
>
>
> Debug query, second execution:
>
> 6037.0 name="time">1.0 name="time">4872.0
>
>
>
>
>
> What can I do? Why are the facets not cached?
>
>
>
>
>
> Thank you, Marcos
>


RE: Solr search

2014-06-12 Thread Shay Sofer
Thanks for your reply.

How can I support suffix search?

Name: Hello_world
Search: *world

And I'll get hello_world as a result.

Thanks in advance.



-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Wednesday, June 11, 2014 5:47 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr search

> Hi,
>
> Any suggestion for a tokenizer / filter / other solution that supports
> the following kinds of search in Solr:
>
> Use Case              Input          Solr should return
> All Results           *              All results
> Prefix Search         Text*          All data started by Text* (prefix search)
> Exact Search          "Auto Text"    Exact match. Only Auto Text
> Partial (substring)   *Text*         All strings containing the text
>
>
> Now I'm using KeywordTokenizerFactory and WordDelimiterFilterFactory.
>
> My issue is with exact search:
> When I have a document named hello_world and I'm trying to do an exact
> search for hello, I get "hello_world" as a result (I want to get only
> documents named hello).

The WordDelimiterFilter will split on the underscore, which means that the term 
"hello" is in the index for that document. Leave that filter out if you really 
do want an exact match.

Searching for "*" by itself is not how you match all documents. It may work, 
but it is a wildcard search, which means under the covers that it's a search 
for every term in the index for that field. It's SLOW. The special shortcut *:* 
(this must be the entire query with no field name, and I'm assuming the 
standard query parser here) is what you want for all documents. In terms of 
user input, this is what you want to use when the user leaves the search box 
empty. If you're using dismax or edismax, then you would send an empty q 
parameter or leave it off entirely, and define a default q.alt parameter in 
solrconfig.xml, set to *:* for all docs.
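
For illustration, a minimal SolrJ sketch of that handling, assuming the standard query
parser and a hypothetical userInput string coming from the search box:

import org.apache.solr.client.solrj.SolrQuery;

public class QueryBuilder {
    // Sketch only: map an empty search box to the match-all query instead of a bare "*".
    public static SolrQuery build(String userInput) {
        SolrQuery query = new SolrQuery();
        if (userInput == null || userInput.trim().isEmpty()) {
            query.setQuery("*:*");   // match-all shortcut for the standard query parser
        } else {
            query.setQuery(userInput);
        }
        // With dismax/edismax, send no q at all and set q.alt=*:* in solrconfig.xml instead.
        return query;
    }
}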

Thanks,
Shawn





Email secured by Check Point


Warning message logs on startup after upgrading to 4.8.1

2014-06-12 Thread Marius Dumitru Florea
Hi guys,

After I upgraded to Solr 4.8.1 I got a few warning messages in the log
at startup:

WARN  o.a.s.c.SolrResourceLoader - Solr loaded a deprecated
plugin/analysis class [solr.ThaiWordFilterFactory]. Please consult
documentation how to replace it accordingly.

I fixed this with
https://github.com/xwiki/xwiki-platform/commit/d41580c383f40d2aa4e4f551971418536a3f3a20#diff-44d79e64e45f3b05115aebcd714bd897L1159

WARN  o.a.s.r.ManagedResource- No stored data found for
/schema/analysis/stopwords/english
WARN  o.a.s.r.ManagedResource- No stored data found for
/schema/analysis/synonyms/english

I fixed these by commenting out the managed_en field type in my
schema, see 
https://github.com/xwiki/xwiki-platform/commit/d41580c383f40d2aa4e4f551971418536a3f3a20#diff-44d79e64e45f3b05115aebcd714bd897L486

And now I'm left with:

WARN  o.a.s.r.ManagedResource- No stored data found for /rest/managed
WARN  o.a.s.r.ManagedResource- No registered observers for /rest/managed

How can I get rid of these 2?

This jira issue is related https://issues.apache.org/jira/browse/SOLR-6128 .

Thanks,
Marius


Re: span query with SHOUD semantic instead of MUST HAVE

2014-06-12 Thread wanggaohang

q.op=OR
On 2014-06-06 20:48, ?? wrote:

hi,


I have two docs,
 a) "aa bb cc" and,
 b) "aa cc bb".
The query is "aa bb". What I expected is the doc a comes first with a higher 
score than doc b because the term distance in query and that in doc a are more similar.
After google for a while I get it down with the span query q: "aa bb"~10. However, when I 
change my query into "aa bb dd"~10, the span query return nothing
hits becuase dd can not be found in any doc. So what's a solution to this 
problem?


Thanks.




What is best practice for Solr + Jetty installation

2014-06-12 Thread elmerfudd
According to the CWIKI,
the working Jetty server in the Solr example folder is optimized and is
recommended for optimal Solr performance.

1. What settings are optimized in the example folder, and how can a standalone
Jetty installation be tuned to get the same optimal performance?

2. How can the provided Jetty server be used as a template?

thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-is-best-practice-for-Solr-Jetty-installation-tp4141396.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to query for content with ACLs?

2014-06-12 Thread lalitjangra
Hi,

I have integrated Solr 4.6 with Apache ManifoldCF 1.5 to crawl SharePoint
and shared drives. Now I am able to index content from these sources, along
with ACL details, which are stored in the Solr index. Now I want to perform
search queries on the Solr index and get search results filtered by these ACLs.
E.g. if user A is searching the Solr indexes, he should only be shown those
search results on which he has permissions, say read permission. I am still
working on how to create such queries, i.e. how to put user details into Solr
queries so they are compared with the ACLs stored in Solr. Does anybody have
an idea? Any help would be appreciated.

Regards.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-query-for-content-with-ACLs-tp4141402.html
Sent from the Solr - User mailing list archive at Nabble.com.

split field on json update

2014-06-12 Thread elisabeth benoit
Hello,

Is it possible, in Solr 4.2.1, to split a multivalued field with a JSON
update, as it is possible to do with a CSV update?

with csv
/update/csv?f.address.split=true&f.address.separator=%2C&commit=true

with json (using a post)
/update/json

Thanks,
Elisabeth


Re: split field on json update

2014-06-12 Thread Alexandre Rafalovitch
There is always UpdateRequestProcessor.

Regards,
Alex
On 12/06/2014 7:05 pm, "elisabeth benoit"  wrote:

> Hello,
>
> Is it possible, in solr 4.2.1, to split a multivalued field with a json
> update as it is possible do to with a csv update?
>
> with csv
> /update/csv?f.address.split=true&f.address.separator=%2C&commit=true
>
> with json (using a post)
> /update/json
>
> Thanks,
> Elisabeth
>


Re: How to query for content with ACLs?

2014-06-12 Thread Ahmet Arslan
Hi Lalitjangra,

MCF in Action book is publicly available to anyone : 
https://manifoldcfinaction.googlecode.com/svn/trunk/pdfs/

You need to download/use mcf-solr4x-plugin to filter results. There are two 
separate options, SearchComponent and QParserPlugin.  
http://manifoldcf.apache.org/en_US/download.html#Plugin+for+Apache+Solr+4.x

However this discussion should be continued in MCF user mailing list : 
http://manifoldcf.apache.org/en_US/mail.html

Ahmet




On Thursday, June 12, 2014 2:39 PM, lalitjangra  
wrote:
Hi,I have integrated Solr 4.6 with Apache ManifoldCF 1.5 to crawl sharepoint,
& shared drives. Now i am able to index content from these sources along
with ACL details which are stored in solr index.Now i want to perform search
queries on solr index to get search results containing these ACLs. E.g. If
user A is searching into solr indexes, he should be rendered only those
search results on which he has permissions say read permissions.I am still
working on how to create such queries of putting user details into solr
queries to get compared with stored ACLs in solr?Does anybody has an idea?
Any help would be appreciated.Regards. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-query-for-content-with-ACLs-tp4141402.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to query for content with ACLs?

2014-06-12 Thread lalitjangra
Thanks Ahmet ,

I have already setup mcf-solr4x-plugin in MCF 1.5.1 and i can see ACLs
indexed into solr indexes. 

But now i assume i need to write Solr query to put a user's permission
details into in it which can be compared to ACL stored in solr. This is why
i have posted it here. Also i have posted it to MCF list.

Can anybody help?

Regards. 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-query-for-content-with-ACLs-tp4141402p4141419.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance/scaling with custom function queries

2014-06-12 Thread Robert Krüger
Thanks for the info. I will look at that.

On Wed, Jun 11, 2014 at 3:47 PM, Joel Bernstein  wrote:
> In Solr 4.9 there is a feature called RankQueries, that allows you to
> plugin your own ranking collector. So, if you wanted to write a
> ranking/sorting collector that used a thread per segment, you could cleanly
> plug it in.
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
>
> On Wed, Jun 11, 2014 at 9:39 AM, david.w.smi...@gmail.com <
> david.w.smi...@gmail.com> wrote:
>
>> On Wed, Jun 11, 2014 at 7:46 AM, Robert Krüger 
>> wrote:
>>
>> > Or will I have to set up distributed search to achieve that?
>>
>>
>> Yes — you have to shard it to achieve that.  The shards could be on the
>> same node.
>>
>> There were some discussions this year in JIRA about being able to do
>> thread-per-segment but it’s not quite there yet.  FWIW I think it would be
>> a nice option for some use-cases (like yours).
>>
>> ~ David Smiley
>> Freelance Apache Lucene/Solr Search Consultant/Developer
>> http://www.linkedin.com/in/davidwsmiley
>>



-- 
Robert Krüger
Managing Partner
Lesspain GmbH & Co. KG

www.lesspain-software.com


Schema editing in SolrCloud

2014-06-12 Thread michael.boom
I have a SolrCloud setup using Solr 4.6 with several configuration sets
and multiple collections, some sharing the same config set.
I would now like to update the schema inside a config set, adding a new
field.

1. Can I do this by directly downloading the schema file and re-uploading it
after editing, or do I have to download the whole config set and re-upload it
after editing only the schema file?

2. At
https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities I
saw that it is possible to put data inside a ZooKeeper file, but how can one
specify which config set that file should be uploaded into?

Thanks! 



-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-editing-in-SolrCloud-tp4141423.html
Sent from the Solr - User mailing list archive at Nabble.com.


Fwd: Implementing Hive query in Solr

2014-06-12 Thread Vivekanand Ittigi
Hi,

Can anyone please look into this issue. I want to implement this query in
solr.

Thanks,
Vivek

-- Forwarded message --
From: Vivekanand Ittigi 
Date: Thu, Jun 12, 2014 at 11:08 AM
Subject: Implementing Hive query in Solr
To: "solr-user@lucene.apache.org" 


Hi,

My requirement is to execute this Hive query in Solr:

select SUM(Primary_cause_vaR),collect_set(skuType),RiskType,market,
collect_set(primary_cause) from bil_tos Where skuType='Product' group by
RiskType,market;

I can implement the sum and group-by operations in Solr using the StatsComponent,
but I have no idea how to implement collect_set() in Solr.

collect_set() is used in Hive queries.
Please point me to an equivalent function for collect_set in Solr, or to links
describing how to achieve it. It would be a great help.
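
For reference, a minimal SolrJ sketch of the StatsComponent part mentioned above (field
names are taken from the Hive query; the core URL "bil_tos" is hypothetical, and there is
no direct Solr equivalent of collect_set here):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class StatsExample {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/bil_tos");
        SolrQuery q = new SolrQuery("skuType:Product");     // the WHERE clause
        q.setRows(0);                                       // only stats are needed, not docs
        q.set("stats", "true");
        q.set("stats.field", "Primary_cause_vaR");          // SUM(Primary_cause_vaR)
        q.set("stats.facet", "RiskType", "market");         // per-group stats, like GROUP BY
        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getFieldStatsInfo().get("Primary_cause_vaR").getSum());
    }
}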


Thanks,
Vivek


RE: What is best practice for Solr + Jetty installation

2014-06-12 Thread Boogie Shafer

you might want to take a look at the rpm building scripts i have here

https://github.com/boogieshafer/jetty-solr-rpm


gives an example of taking the included jetty and tweaking it in a few ways to 
make it more production ready by adding init script, configuring JMX, tuning 
logging and putting all the java options in an options file, etc





From: elmerfudd 
Sent: Thursday, June 12, 2014 04:14
To: solr-user@lucene.apache.org
Subject: What is best practice for Solr + Jetty installation

According to   CWIKI
  ",
the working Jetty server in SOLR example folder is optimized and is
recommended for optimal SOLR performance.

1. What settings are optimized in the example folder and
how a stand alone Jetty installation can be optimized to get an optimal
performance?

2. How the provided Jetty server can be used as template ?

thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-is-best-practice-for-Solr-Jetty-installation-tp4141396.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Collection fix predefined hash range for data import.

2014-06-12 Thread Erick Erickson
bq: Now I want to create a new collection and perform a data import from old to
new one.

Let's start there before considering hash ranges. Exactly how do you intend
to do this? Forget about mapping the hash ranges, how do you expect to
move the data?

And, even more important, what is the _reason_ you're trying to do this? This
might be an XY problem...

Best,
Erick

On Thu, Jun 12, 2014 at 12:22 AM, SolrUser1543  wrote:
> I have a collection containing n shards.
>
> Now I want to create a new collection and perform a data import from old to
> new one.
>
> How can I make hash ranges of new collection be the same as old one , in
> order to make data import be locally ( on the same machine ) ?
>
> I mean , if shard#3 of old collection has range 100-200 for example, so how
> can I force the new shard#3 have the same range ?
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Collection-fix-predefined-hash-range-for-data-import-tp4141365.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: split field on json update

2014-06-12 Thread elisabeth benoit
Thanks for your answer,

best regards,
Elisabeth


2014-06-12 14:07 GMT+02:00 Alexandre Rafalovitch :

> There is always UpdateRequestProcessor.
>
> Regards,
> Alex
> On 12/06/2014 7:05 pm, "elisabeth benoit" 
> wrote:
>
> > Hello,
> >
> > Is it possible, in solr 4.2.1, to split a multivalued field with a json
> > update as it is possible do to with a csv update?
> >
> > with csv
> > /update/csv?f.address.split=true&f.address.separator=%2C&commit=true
> >
> > with json (using a post)
> > /update/json
> >
> > Thanks,
> > Elisabeth
> >
>


Re: Solr search

2014-06-12 Thread Erick Erickson
Stop and back up..

It's very unusual to use KeywordTokenizer with WDDF, it's far
more common to use something like StandardTokenizer, WhitespaceTokenizer, etc.

Using keyword along with WDDF is kind of working, but probably not
doing what you
expect. Get familiar with the admin/analysis page to see.

Suffix Search is supported by ReversedWildcardFilter.

But really spend some time understanding your analysis chain before
you go there.
KeywordTokenizer is intended to be for input that you want to treat as a single
unit rather than a series of tokens.

FWIW,
Erick

On Thu, Jun 12, 2014 at 1:03 AM, Shay Sofer  wrote:
> Thanks for your reply.
>
> How can I support suffix search?
>
> Name: Hello_world
> Search: *world
>
> And I'll get hello_world as a result.
>
> Thanks in advance.
>
>
>
> -Original Message-
> From: Shawn Heisey [mailto:s...@elyograg.org]
> Sent: Wednesday, June 11, 2014 5:47 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr search
>
>> Hi,
>>
>> Any suggestion for tokenizer / filter / other solutions that support
>> search in Solr as following -
>>
>> Use Case
>>
>> Input
>>
>> Solr should return
>>
>> All Results
>>
>> *
>>
>> All results
>>
>> Prefix Search
>>
>> Text*
>>
>> All data started by Text* (Prefix search)
>>
>> Exact Search
>>
>> "Auto Text"
>>
>> Exact match. Only Auto Text
>>
>> Partial (substring)
>>
>> *Text*
>>
>> All strings contains the text
>>
>>
>> Now I'm using KeywordTokenizerFactory and WordDelimiterFilterFactory.
>>
>> My issue is with exact search:
>> When I have document named hello_world, and I'm trying to do Exact
>> Search of hello, I got "hello_world" as a result (I want to get only
>> hello named doccuments).
>
> The WordDelimeterFilter will split on the underscore, which means that the 
> term "hello" is in the index for that document. Leave that filter out if you 
> really do want an exact match.
>
> Searching for "*" by itself is not how you match all documents. It may work, 
> but it is a wildcard search, which means under the covers that it's a search 
> for every term in the index for that field. It's SLOW. The special shortcut 
> *:* (this must be the entire query with no field name, and I'm assuming the 
> standard query parser here) is what you want for all documents. In terms of 
> user input, this is what you want to use when the user leaves the search box 
> empty. If you're using dismax or edismax, then you would send an empty q 
> parameter or leave it off entirely, and define a default q.alt parameter in 
> solrconfig.xml, set to *:* for all docs.
>
> Thanks,
> Shawn
>
>
>
>
>
> Email secured by Check Point


Indexing Files Month by Month

2014-06-12 Thread Venkata krishna
Hi ,

I am using Lucene/Solr and would like to use the Data Import Handler to index
files, but there are millions of files to import, so the indexing process will
take a long time. I decided to import the files month by month, so could you
please provide a suggestion on how to import files on a month-by-month basis.








Thanks,

Venkata Krishna Tolusuri.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-Files-Month-by-Month-tp4141443.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Implementing Hive query in Solr

2014-06-12 Thread Erick Erickson
Any time I see a question like this I break out in hives (little pun there).
Solr is _not_ a replacement for Hive. Or any other SQL or SQL-like
engine. Trying to make it into one is almost always a mistake. First I'd ask
why you have to form this query.

Now, while I have very little knowledge of HIve, collect_set "removes
duplicates"
Why do you have duplicates in the first place?

Best,
Erick

On Thu, Jun 12, 2014 at 7:12 AM, Vivekanand Ittigi
 wrote:
> Hi,
>
> Can anyone please look into this issue. I want to implement this query in
> solr.
>
> Thanks,
> Vivek
>
> -- Forwarded message --
> From: Vivekanand Ittigi 
> Date: Thu, Jun 12, 2014 at 11:08 AM
> Subject: Implementing Hive query in Solr
> To: "solr-user@lucene.apache.org" 
>
>
> Hi,
>
> My requirements is to execute this query(hive) in solr:
>
> select SUM(Primary_cause_vaR),collect_set(skuType),RiskType,market,
> collect_set(primary_cause) from bil_tos Where skuType='Product' group by
> RiskType,market;
>
> I can implement sum and groupBy operations in solr using StatsComponent
> concept but i've no idea to implement collect_set() in solr.
>
> Collect_set() is used in Hive queries.
> Please provide me equivalent function for collect_set in solr or links or
> how to achieve it. It'd be a great help.
>
>
> Thanks,
> Vivek


Re: Indexing Files Month by Month

2014-06-12 Thread Erick Erickson
Partition your files into month-size folders and have DIH work on one
directory at a time

What I'd do is move away from DIH and use SolrJ. That way
1> you can take full control over what you do
2> you can offload the heavy lifting of parsing the various files
(I'm assuming here that you're indexing PDFs, Word docs, etc)
to a bunch of clients.

Here are some code samples: http://searchhub.org/2012/02/14/indexing-with-solrj/
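
For reference, a bare-bones SolrJ sketch of that approach (hypothetical field names and
core URL; the actual content extraction, e.g. via Tika, is omitted):

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class MonthlyIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        File monthDir = new File(args[0]);               // e.g. /data/2014-06, one folder per month
        File[] files = monthDir.listFiles();
        if (files == null) return;
        for (File f : files) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", f.getAbsolutePath());     // hypothetical schema fields
            doc.addField("filename", f.getName());
            // parse the file on the client and add its text to a "content" field here
            solr.add(doc);
        }
        solr.commit();
    }
}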

Or, if you really want to get wild, consider the MapReduceIndexerTool. That
requires some infrastructure though.

Best,
Erick

On Thu, Jun 12, 2014 at 7:22 AM, Venkata krishna  wrote:
> Hi ,
>
> I am using lucene solr , would like to use Data import handler for to index
> files but millions of files are there to import so indexing process will
> take more time. I decided to import files month by month,so could you please
> provide an suggestion  to import files month by month basis.
>
>
>
>
>
>
>
>
> Thanks,
>
> Venkata Krishna Tolusuri.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Indexing-Files-Month-by-Month-tp4141443.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: split field on json update

2014-06-12 Thread Jack Krupansky
You can easily write a JavaScript snippet using the stateless script update 
processor and do whatever string manipulation you want on an input value, 
and then write extracted strings to whatever field(s) you want. My e-book 
has plenty of script examples.


-- Jack Krupansky

-Original Message- 
From: elisabeth benoit

Sent: Thursday, June 12, 2014 10:18 AM
To: solr-user@lucene.apache.org
Subject: Re: split field on json update

Thanks for your answer,

best regards,
Elisabeth


2014-06-12 14:07 GMT+02:00 Alexandre Rafalovitch :


There is always UpdateRequestProcessor.

Regards,
Alex
On 12/06/2014 7:05 pm, "elisabeth benoit" 
wrote:

> Hello,
>
> Is it possible, in solr 4.2.1, to split a multivalued field with a json
> update as it is possible do to with a csv update?
>
> with csv
> /update/csv?f.address.split=true&f.address.separator=%2C&commit=true
>
> with json (using a post)
> /update/json
>
> Thanks,
> Elisabeth
>





Converting XML response of Search query into HTML.

2014-06-12 Thread Venkata krishna
Hi,

I am using Solr 4.8 and SolrJ for searching, and would like to get the response
of a search query in HTML format. For that purpose I have written this code:
private static final String urlString = "http://localhost:8983/solr";
private SolrServer solrServer;

public SolrJ() {
    if (solrServer == null) {
        solrServer = new HttpSolrServer(urlString);
    }
}

public QueryResponse getRueryResponse(String queryString) {
    SolrQuery query = new SolrQuery();
    query.setHighlight(true).setHighlightSnippets(20); // set other params as needed
    query.setParam("hl.fl", "content");
    query.setQuery(queryString);
    query.set("&wt", "xslt");
    query.set("&indent", true);
    query.set("&tr", "example.xsl");

    QueryResponse queryResponse = null;
    try {
        ((HttpSolrServer) solrServer).setParser(new XMLResponseParser());
        queryResponse = solrServer.query(query);
    } catch (SolrServerException e) {
        e.printStackTrace();
    }
    return queryResponse;
}
and in example.xsl media type is  
.

but i am getting an exception
Exception in thread "main"
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Expected mime type application/xml but got text/html.

So could you please provide any solution to resolve issue.


Thanks,

Venkata Krishna Tolusuri.


  



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Converting-XML-response-of-Search-query-into-HTML-tp4141456.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Converting XML response of Search query into HTML.

2014-06-12 Thread Ahmet Arslan
Hi,

I see that you have leftover ampersands when setting various parameters.

  query.set("&wt", "xslt");

should be 

  query.set("wt", "xslt");
       



On Thursday, June 12, 2014 6:12 PM, Venkata krishna  
wrote:



Hi,

I am using solr4.8, solrj  for to do searching, would like to get response
of search query in html format,for that purpose i have written this code, 
private static final String urlString = "http://localhost:8983/solr";;
    private SolrServer solrServer; 
    public SolrJ() {  
        if (solrServer == null) { 
            solrServer = new HttpSolrServer(urlString);  

                }
        }  

public QueryResponse getRueryResponse(String queryString) { 
        SolrQuery query = new SolrQuery();  
        query.setHighlight(true).setHighlightSnippets(20); //set other params as
needed
        query.setParam("hl.fl", "content");
        query.setQuery(queryString);
        query.set("&wt", "xslt");
        query.set("&indent",true);
        query.set("&tr", "example.xsl");


        QueryResponse queryResponse = null;
        try { 
            ((HttpSolrServer) solrServer).setParser(new XMLResponseParser());
            queryResponse = solrServer.query(query);
            } catch (SolrServerException e) {
                e.printStackTrace();  
            }   return queryResponse; 
    }
and in example.xsl media type is  
.

but i am getting an exception
Exception in thread "main"
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Expected mime type application/xml but got text/html.

So could you please provide any solution to resolve issue.


Thanks,

Venkata Krishna Tolusuri.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Converting-XML-response-of-Search-query-into-HTML-tp4141456.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Implementing Hive query in Solr

2014-06-12 Thread Joel Bernstein
You may have to implement this yourself. In Solr 4.9 you'll be able to
implement your own analytic functions in java and plug them in using the
AnalyticsQuery API. This is a new Solr API for plugging in custom
analytics.

http://heliosearch.org/solrs-new-analyticsquery-api/

Joel Bernstein
Search Engineer at Heliosearch


On Thu, Jun 12, 2014 at 10:27 AM, Erick Erickson 
wrote:

> Any time I see a question like this I break out in hives (little pun
> there).
> Solr is _not_ a replacement for Hive. Or any other SQL or SQL-like
> engine. Trying to make it into one is almost always a mistake. First I'd
> ask
> why you have to form this query.
>
> Now, while I have very little knowledge of HIve, collect_set "removes
> duplicates"
> Why do you have duplicates in the first place?
>
> Best,
> Erick
>
> On Thu, Jun 12, 2014 at 7:12 AM, Vivekanand Ittigi
>  wrote:
> > Hi,
> >
> > Can anyone please look into this issue. I want to implement this query in
> > solr.
> >
> > Thanks,
> > Vivek
> >
> > -- Forwarded message --
> > From: Vivekanand Ittigi 
> > Date: Thu, Jun 12, 2014 at 11:08 AM
> > Subject: Implementing Hive query in Solr
> > To: "solr-user@lucene.apache.org" 
> >
> >
> > Hi,
> >
> > My requirements is to execute this query(hive) in solr:
> >
> > select SUM(Primary_cause_vaR),collect_set(skuType),RiskType,market,
> > collect_set(primary_cause) from bil_tos Where skuType='Product' group by
> > RiskType,market;
> >
> > I can implement sum and groupBy operations in solr using StatsComponent
> > concept but i've no idea to implement collect_set() in solr.
> >
> > Collect_set() is used in Hive queries.
> > Please provide me equivalent function for collect_set in solr or links or
> > how to achieve it. It'd be a great help.
> >
> >
> > Thanks,
> > Vivek
>


Re: Implementing Hive query in Solr

2014-06-12 Thread Mikhail Khludnev
Hello,

I've found https://github.com/kawasima/solr-jdbc recently. Haven't checked
it so far, but the idea is fairly cool. I wonder if it can be relevant to
your challenge.


On Thu, Jun 12, 2014 at 9:38 AM, Vivekanand Ittigi 
wrote:

> Hi,
>
> My requirements is to execute this query(hive) in solr:
>
> select SUM(Primary_cause_vaR),collect_set(skuType),RiskType,market,
> collect_set(primary_cause) from bil_tos Where skuType='Product' group by
> RiskType,market;
>
> I can implement sum and groupBy operations in solr using StatsComponent
> concept but i've no idea to implement collect_set() in solr.
>
> Collect_set() is used in Hive queries.
> Please provide me equivalent function for collect_set in solr or links or
> how to achieve it. It'd be a great help.
>
>
> Thanks,
> Vivek
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: Implementing Hive query in Solr

2014-06-12 Thread Joel Bernstein
Yeah, solr-jdbc does look interesting. Has an Apache license as well.

Joel Bernstein
Search Engineer at Heliosearch


On Thu, Jun 12, 2014 at 1:18 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Hello,
>
> I've found https://github.com/kawasima/solr-jdbc recently. Haven't checked
> it so far, but the idea is fairly cool. I wonder if it can be relevant to
> your challenge.
>
>
> On Thu, Jun 12, 2014 at 9:38 AM, Vivekanand Ittigi 
> wrote:
>
> > Hi,
> >
> > My requirements is to execute this query(hive) in solr:
> >
> > select SUM(Primary_cause_vaR),collect_set(skuType),RiskType,market,
> > collect_set(primary_cause) from bil_tos Where skuType='Product' group by
> > RiskType,market;
> >
> > I can implement sum and groupBy operations in solr using StatsComponent
> > concept but i've no idea to implement collect_set() in solr.
> >
> > Collect_set() is used in Hive queries.
> > Please provide me equivalent function for collect_set in solr or links or
> > how to achieve it. It'd be a great help.
> >
> >
> > Thanks,
> > Vivek
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
>  
>


SOLR error on distributed query

2014-06-12 Thread Diego Marchi
On Thursday, June 12, 2014 11:31:07 UTC-7, Diego Marchi wrote:
>
>
> Hi all,
>
> I have a distributed environment in SOLR with 4 cores. Each core has
> approx 100m documents. We are maintaining the database of documents since
> version 2 of solr I think, so many documents do not respect the schema we
> set with the latest solr version. (Now we are running version 4.8 - the
> current schema has a uniqueid field set, while it wasn't present in the
> earlier versions. This unique field is unsurprisingly called "id" but not
> all the documents have it.)
>
> Now I'm trying to perform a distributed search on some random words
> belonging to the English dictionary. It works for most of the time but,
> from time to time, it returns back a NullPointerException I cannot track
> down:
>
>>
>> {
>> "responseHeader":{
>> "status":500,
>> "QTime":229,
>> "params":{
>> "shards":"http://x.x.x.x:p/solr/collection2";,
>> "indent":"true",
>> "start":"0",
>> "q":"content:suppleness",
>> "wt":"json",
>> "fq":"website:523",
>> "rows":"10"}},
>> "error":{
>> "trace":"java.lang.NullPointerException\n\tat
>> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:962)\n\tat
>> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:686)\n\tat
>> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:665)\n\tat
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:324)\n\tat
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat
>> org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)\n\tat
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)\n\tat
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)\n\tat
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)\n\tat
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\n\tat
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\n\tat
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)\n\tat
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\n\tat
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)\n\tat
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)\n\tat
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\n\tat
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\n\tat
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\n\tat
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\n\tat
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\n\tat
>> org.eclipse.jetty.server.Server.handle(Server.java:368)\n\tat
>> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)\n\tat
>> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\n\tat
>> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)\n\tat
>> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)\n\tat
>> org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)\n\tat
>> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)\n\tat
>> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)\n\tat
>> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)\n\tat
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)\n\tat
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)\n\tat
>> java.lang.Thread.run(Thread.java:744)\n",
>> "code":500}}
>
>
> I tried to release the lock on files ( I read somewhere it could be that
> causing the exception), I tried also to add the shards.qt param with no
> success. I was thinking it could be the fact that some of these pages are
> missing fields and have not the same structure as the others... so I
> checked how to add a field but for me it's impossible since I don't have
> all my fields stored in solr (some of them are just indexed).
>
> The funny thing is that if I query the same collection on a different term
> it works fine...
>
> This is the query I ran and failing
> http://x.x.x.x:p
> /solr/collection1/select?q=content:suppleness&fq=website:523&wt=json&indent=true&shards=http://x.x.x.x:p
> /solr/collection2,http://x.x.x.x:p/solr/collection1,http://x.x.x.x:p
> /solr/collection3,

Re: Solr relevancy tuning

2014-06-12 Thread Doug Turnbull
I realize I never responded to this thread, shame on me!

Jorge/Giovanni, Kelvin looks pretty cool -- thanks for sharing it. When we
use Quepid, we sometimes do it at places with existing relevancy test
scripts like Kelvin. Quepid/test scripts tend to satisfy different niches.
In addition to testing, Quepid is a GUI for helping you explain/investigate
and sandbox in addition to test. Sometimes this is nice for fuzzier/more
qualitative judgments, especially when you want to collaborate with
non-technical stakeholders. It's been our replacement for the "spreadsheet"
that a lot of our clients used before Quepid -- where the non-technical
folks would list

Scripts work very well for getting that pass/fail response. It's nice that
Kelvin gives you a "temperature" instead of necessarily a pass/fail; that
level of fuzziness is definitely useful.

We certainly see value in both (and will probably be doing more to
integrate Quepid with continuous integration/scripting).

Cheers,
-Doug


On Mon, May 5, 2014 at 2:47 AM, Jorge Luis Betancourt González <
jlbetanco...@uci.cu> wrote:

> One good thing about kelvin is that it's more a programmatic task, so you could
> execute the scripts after a few changes/deployments and get a general idea of
> whether the new changes have impacted the search experience; yeah sure the
> changing catalog it's still a problem but I kind of like to be able to
> execute a few commands and presto get it done. This could become a must-run
> test in the test suite of the app. I kind of do this already but testing
> from the user interface, using the test library provided by symfony2
> (framework I'm using) and the functional tests. It's not
> test-driven-search-relevancy "perse" but we ensure not to mess up with some
> basic queries we use to test the search feature.
>
> - Original Message -
> From: "Giovanni Bricconi" 
> To: "solr-user" 
> Cc: "Ahmet Arslan" 
> Sent: Friday, April 11, 2014 5:15:56 AM
> Subject: Re: Solr relevancy tuning
>
> Hello Doug
>
> I have just watched the quepid demonstration video, and I strongly agree
> with your introduction: it is very hard to involve marketing/business
> people in repeated testing sessions, and spreadsheets or other kinds of files
> are not the right tool to use.
> Currently I'm quite alone in my tuning task, and having a visual approach
> could be beneficial for me; you are giving me many good inputs!
>
> I see that kelvin (my scripted tool) and Quepid follow the same path. In
> Quepid someone quickly watches the results and applies colours to results;
> in kelvin you enter one or more queries (network cable, ethernet cable) and
> states that the result must contains ethernet in the title, or must come
> from a list of product categories.
>
> I also do diffs of results, before and after changes, to check what is
> going on; but I have to do that in a very unix-scripted way.
>
> Have you considered of placing a counter of total red/bad results in
> quepid? I use this index to have a quick overview of changes impact across
> all queries. Actually I repeat tests in production from times to time, and
> if I see the "kelvin temperature" rising (the number of errors going up) I
> know I have to check what's going on because new products maybe are having
> a bad impact on the index.
>
> I also keep counters of products with low quality images/no images at all
> or too-short listings; they are sometimes useful to understand better what will
> happen if you change some bq/fq in the application.
>
> I also see that after changes in Quepid someone has to check "gray"
> results and assign them a colour, in kelvin case sometimes the conditions
> can do a bit of magic (new product names still contains SM-G900F) but
> sometimes can introduce false errors (the new product name contains only
> Galaxy 5 and not the product code SM-G900F). So some checks are needed but
> with quepid everybody can do the check, with kelvin you have to change some
> line of a script, and not everybody is able/willing to do that.
>
> The idea of a static index is a good suggestion, I will try to have it in
> the next round of search engine improvement.
>
> Thank you Doug!
>
>
>
>
> 2014-04-09 17:48 GMT+02:00 Doug Turnbull <
> dturnb...@opensourceconnections.com>:
>
> > Hey Giovanni, nice to meet you.
> >
> > I'm the person that did the Test Driven Relevancy talk. We've got a
> product
> > Quepid (http://quepid.com) that lets you gather good/bad results for
> > queries and do a sort of test driven development against search
> relevancy.
> > Sounds similar to your existing scripted approach. Have you considered
> > keeping a static catalog for testing purposes? We had a project with a
> lot
> > of updates and date-dependent relevancy. This lets you create some test
> > scenarios against a static data set. However, one downside is you can't
> > recreate problems in production in your test setup exactly-- you have to
> > find a similar issue that reflects what you're seeing.
> >
> > Cheers,
> > -Doug
> 

ANNOUNCE: ApacheCon deadlines: CFP June 25 / Travel Assistance Jul 25

2014-06-12 Thread Chris Hostetter


(NOTE: cross-posted announcement, please confine any replies to 
general@lucene)


As you may be aware, ApacheCon will be held this year in Budapest, on
November 17-23. (See http://apachecon.eu for more info.)

### ### 1 - Call For Papers - June 25

The CFP for the conference is still open, but will end on June 25th.

If you have an idea for a Lucene/Solr related session @ ApacheCon please 
submit it.  All types of sessions are of interest to ApacheCon attendees 
-- from deep technical talks about internals, hands-on tutorials of 
specific features, general introductions for beginners, "How we did X" 
case studies about your own experiences, etc...


Please consider submitting a proposal, at
http://events.linuxfoundation.org//events/apachecon-europe/program/cfp


### ### 2 - Travel Assistance - July 25th

The Travel Assistance Committee (TAC) is happy to announce that 
applications for ApacheCon Europe 2014 will be accepted until July 25th.


Applications are welcome from individuals within the Apache community
at-large, users, developers, educators, students, Committers, and Members,
who need financial support to attend ApacheCon.

Please be aware the seats are very limited, and all applicants will be
scored on their individual merit.

More information can be found at http://www.apache.org/travel including a
link to the online application and detailed instructions for submitting.




-Hoss
http://www.lucidworks.com/



Re: SOLR error on distributed query

2014-06-12 Thread Chris Hostetter

: > set with the latest solr version. (Now we are running version 4.8 - the
: > current schema has a uniqueid field set, while it wasn't present in the
: > earlier versions. This unique field is unsurprisingly called "id" but not
: > all the documents have it.)

this is going to be the source of a lot of pain for you -- and is almost 
certainly the source of your current error -- if you don't have a 
uniqueKey field (which yes: must exist for every document) distributed 
search is not going to work -- it's such a low level expectation, that 
it's taken as a given in a lot of the code (because it is verified on 
startup) and explains why you get an NPE from mergeIds instead of a more 
"friendly" error message

: >> "trace":"java.lang.NullPointerException\n\tat
: >> 
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:962)\n\tat

: > The funny thing is that if I query the same collection on a different term
: > it works fine...

...because different terms match different documents - so the problematic 
docs w/o a uniqueKey aren't included.

: > I then tried to isolate the core where the exception happens and it seems
: > to be core 2. (this command in fact triggers the exception, while checking
: > in other cores simply gives me back that numFound=0)
...
: > but if I query directly the core (without sharding) I obtain the results
: > with no problems:

...because in non-sharded queries there is no need to mergeIds, so solr 
doesn't notice/care that some of the docs it's returning are missing the 
uniqueKey field.


-Hoss
http://www.lucidworks.com/


Re: Non-Heap OOM Error with Small Index Size

2014-06-12 Thread msoltow
We've managed to fix our issue, but just in case anyone has the same problem,
I wanted to identify our solution.

We were originally using the version of Tomcat that was packaged with CentOS
(Tomcat 6.0.24).  We tried downloading a newer version of Tomcat (7.0.52)
and running Solr there, and this fixed the problem.  We're not sure exactly
what the problem was, but that's all it took.

Hope this helps someone!

Michael



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Non-Heap-OOM-Error-with-Small-Index-Size-tp4141175p4141509.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR error on distributed query

2014-06-12 Thread Diego Marchi
Thanks Hoss for your reply...

yeah I thought so... and I don't think it's even possible to add the id
field to the document missing it, right? Also because some of the fields
are not stored and it is my understanding that it is one of the
requirements to have the update query work... right?

But still I don't understand why it gives a result with no problems for
some docs and it throws an error for others...

For example, now I re-ran the query and it works fine... Does the Elevation
component have anything to do with this?

Diego


On Thu, Jun 12, 2014 at 12:47 PM, Chris Hostetter 
wrote:

>
> : > set with the latest solr version. (Now we are running version 4.8 - the
> : > current schema has a uniqueid field set, while it wasn't present in the
> : > earlier versions. This unique field is unsurprisingly called "id" but
> not
> : > all the documents have it.)
>
> this is going to be the source of a lot of pain for you -- and is almost
> certainly the source of your current error -- if you don't have a
> uniqueKey field (which yes: must exist for every document) distributed
> search is not going to work -- it's such a low level expectation, that
> it's take nas a given in a lot of the code (because it is verified on
> startup) and explains why you get an NPE from mergeIds instead o a more
> "freindly" error message
>
> : >> "trace":"java.lang.NullPointerException\n\tat
> : >>
> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:962)\n\tat
>
> : > The funny thing is that if I query the same collection on a different
> term
> : > it works fine...
>
> ...because differnet terms match differnet documents - so the problematic
> docs w/o a uniqueKey aren't included.
>
> : > I then tried to isolate the core where the exception happens and it
> seems
> : > to be core 2. (this command in fact triggers the exception, while
> checking
> : > in other cores simply gives me back that numFound=0)
> ...
> : > but if I query directly the core (without sharding) I obtain the
> results
> : > with no problems:
>
> ...because in non-sharded queries there is no need to mergeIds, so solr
> doesn't notice/care that some of the docs it's returning are missing the
> uniqueKey field.
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: How to query for content with ACLs?

2014-06-12 Thread Jack Krupansky

Take a look at this:
http://www.slideshare.net/lucenerevolution/wright-nokia-manifoldcfeurocon-2011

Karl has an old Jira patch somewhere for doing the ACLs processing in Solr.

-- Jack Krupansky

-Original Message- 
From: lalitjangra

Sent: Thursday, June 12, 2014 9:28 AM
To: solr-user@lucene.apache.org
Subject: Re: How to query for content with ACLs?

Thanks Ahmet ,

I have already setup mcf-solr4x-plugin in MCF 1.5.1 and i can see ACLs
indexed into solr indexes.

But now i assume i need to write Solr query to put a user's permission
details into in it which can be compared to ACL stored in solr. This is why
i have posted it here. Also i have posted it to MCF list.

Can anybody help?

Regards.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-query-for-content-with-ACLs-tp4141402p4141419.html
Sent from the Solr - User mailing list archive at Nabble.com. 



faceting performance on fields with high-cardinality

2014-06-12 Thread Tang, Rebecca
Hi there,

I have a Solr index with 14+ million records.  We facet on quite a few fields 
with very high-cardinality such as author, person, organization, brand and 
document type.  Some of the records contain thousands of persons and 
organizations.  So the person and organization fields can be very large.

First I built these fields as
 
  
  
  

The performance was atrocious when faceting is turned on.  It took 10+ min to 
run any query.

Then I decided to break the values up myself and just build them into the field 
as a multi-valued field like this:
 
 

After this change, the performance improved drastically. But I can't understand 
why building these fields as a multi-valued field vs. a single-valued field with 
a semicolon tokenizer can have such a dramatic performance difference. Doesn't 
Solr tokenize the field at index time and save the values as tokens anyway? Why 
does manually breaking down the values into tokens improve faceting performance 
so much?
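
For reference, a minimal SolrJ sketch of the client-side splitting described above (the
core URL is hypothetical and the "person" field is assumed to be multiValued in schema.xml):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class MultiValuedIndexing {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        String rawPersons = "Smith, J; Jones, K; Brown, A";  // semicolon-separated source value
        for (String person : rawPersons.split(";")) {
            doc.addField("person", person.trim());           // one value per person -> multivalued field
        }
        solr.add(doc);
        solr.commit();
    }
}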

Thanks!
Rebecca Tang
Applications Developer, UCSF CKM
Legacy Tobacco Document Library
E: rebecca.t...@ucsf.edu


Re: Collection fix predefined hash range for data import.

2014-06-12 Thread SolrUser1543
The reason is the following:

I have a collection named col1, which has n shards deployed on n machines
(on each machine, one shard with one replica).

Now I want to create col2 with a new config and import data from col1 to
col2.

What I need is for the shards of col2 to be on the same machines as in
col1 (which means with the same hash ranges).

The reason is simple - I want the data to be copied locally during the data
import, not over the network between machines.

For example, if shard3 of col1 is on machine #7, then I want shard3 of col2 to
be on the same machine. Otherwise the data will be copied over the network.

But during collection creation the order of shards cannot be controlled.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collection-fix-predefined-hash-range-for-data-import-tp4141365p4141527.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Replication Issue : Incorrect Base_URL

2014-06-12 Thread pramodEbay
Hi,
I am deploying Solr in a larger web application. The standalone solr
instance works fine. The path-prefix I use is raptorslrweb. A standalone
SOLR query to my instance that works is as follows:

http://hostname:8080/raptorslrweb/solr/reviews/select?q=*%3A*&wt=json&indent=true

However, when I configure a solr cloud, I get the following error in
RecoveryStrategy:
"msg":"org.apache.solr.client.solrj.SolrServerException: Server at
http://hostname:8080/solr/reviews sent back a redirect (302).",

The reason is the base_url does not seem to honor the path-prefix.
clusterstate.json shows the following for the node:
{"reviews":{
"shards":{"shard1":{
"range":null,
"state":"active",
"parent":null,
"replicas":{
  "core_node1":{
"state":"down",
   * "base_url":"http://hostname:8080/solr",*   
"core":"reviews",
"node_name":"10.98.63.98:8080_solr"},

Can someone please tell me where I tell ZooKeeper or SolrCloud that the
base URL should be hostname:8080/raptorslrweb/solr and not
hostname:8080/solr?

Thanks,
Pramod



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Replication-Issue-Incorrect-Base-URL-tp4141537.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Master-Slave fail-over across multiple data-centers

2014-06-12 Thread Arcadius Ahouansou
Hello.

- We currently have solr 4 in master-slave mode across 2 DataCenters.

- We are planning to run the system in active-active mode, meaning that
search requests will go to Solr Slaves in both DC-A and DC-B.

- We have a highly available and cross DC database that feeds the
SolrMaster in both DC. So, both Solr Masters are being kept up-to-date.

- In order to allow all slaves in both DC to have the very same index
version, we have come up with the idea of having multiple masterUrl on each
slave, i.e masterUrl=masterUrl-A,masterUrl-B (and this is the main point of
this post)

- When both DC are available, only masterUrl-A is used for fetching the
index and the topology would look like the one shown at
https://www.dropbox.com/s/4vqdx70af5ddn69/master-slave-failover.png

- In case the worst happens and we lose DC-A,   the slaves in DC-B will get
network errors like NoRouteToHost or ConnectionTimeout.

- After few attempts, the slaves will switch to using the next url in the
masterUrl variable which would be masterUrl-B

- This should work pretty well and when DC-A becomes available, we could
issue a rest API call to reset the masterUrl or restart the master in DC-B
and slaves in DC-B should switch back to using masterUrl-A.

- I would like to gather your thought about this idea.

- If this makes sense, I could raise a Jira ticket to enable multiple
masterUrl and the fail-over principle described here.
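
For reference, the "rest API call" mentioned above can already be approximated with the
ReplicationHandler's fetchindex command, which accepts a masterUrl override for a one-off
pull - a rough sketch, with hypothetical slave/master core URLs:

import java.io.InputStream;
import java.net.URL;
import java.net.URLEncoder;

public class TriggerFetch {
    public static void main(String[] args) throws Exception {
        String slaveCore = "http://slave-host:8983/solr/core1";                // hypothetical
        String masterRepl = "http://master-a:8983/solr/core1/replication";     // hypothetical
        String url = slaveCore + "/replication?command=fetchindex&masterUrl="
                + URLEncoder.encode(masterRepl, "UTF-8");
        try (InputStream in = new URL(url).openStream()) {
            while (in.read() != -1) { /* drain the small status response */ }
        }
    }
}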

Thank you very much.

Arcadius.


Re: Converting XML response of Search query into HTML.

2014-06-12 Thread Alexandre Rafalovitch
Why are you doing your conversion on Solr side and not on SolrJ
(client) side? Seems more efficient and you can control the lifecycle
of XSLT objects better yourself.
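
For example, a minimal client-side sketch along those lines, fetching the plain XML
response (wt=xml) and applying the stylesheet locally with javax.xml.transform (the query
URL and stylesheet path are hypothetical):

import java.io.File;
import java.io.StringWriter;
import java.net.URL;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ClientSideXslt {
    public static void main(String[] args) throws Exception {
        URL solr = new URL("http://localhost:8983/solr/collection1/select?q=*:*&wt=xml");
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new File("example.xsl")));
        StringWriter html = new StringWriter();
        transformer.transform(new StreamSource(solr.openStream()), new StreamResult(html));
        System.out.println(html);   // HTML built on the client; the stylesheet can be cached and reused
    }
}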

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Thu, Jun 12, 2014 at 10:26 PM, Ahmet Arslan
 wrote:
> Hi,
>
> I see that you have ampersand left when setting various parameters.
>
>   query.set("&wt", "xslt");
>
> should be
>
>   query.set("wt", "xslt");
>
>
>
>
> On Thursday, June 12, 2014 6:12 PM, Venkata krishna  
> wrote:
>
>
>
> Hi,
>
> I am using solr4.8, solrj  for to do searching, would like to get response
> of search query in html format,for that purpose i have written this code,
> private static final String urlString = "http://localhost:8983/solr";;
> private SolrServer solrServer;
> public SolrJ() {
> if (solrServer == null) {
> solrServer = new HttpSolrServer(urlString);
>
> }
> }
>
> public QueryResponse getRueryResponse(String queryString) {
> SolrQuery query = new SolrQuery();
> query.setHighlight(true).setHighlightSnippets(20); //set other params 
> as
> needed
> query.setParam("hl.fl", "content");
> query.setQuery(queryString);
> query.set("&wt", "xslt");
> query.set("&indent",true);
> query.set("&tr", "example.xsl");
>
>
> QueryResponse queryResponse = null;
> try {
> ((HttpSolrServer) solrServer).setParser(new XMLResponseParser());
> queryResponse = solrServer.query(query);
> } catch (SolrServerException e) {
> e.printStackTrace();
> }   return queryResponse;
> }
> and in example.xsl media type is
> .
>
> but i am getting an exception
> Exception in thread "main"
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> Expected mime type application/xml but got text/html.
>
> So could you please provide any solution to resolve issue.
>
>
> Thanks,
>
> Venkata Krishna Tolusuri.
>
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Converting-XML-response-of-Search-query-into-HTML-tp4141456.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Converting XML response of Search query into HTML.

2014-06-12 Thread Erik Hatcher
Or, ahem, use VelocityResponseWriter :)

> On Jun 12, 2014, at 21:07, Alexandre Rafalovitch  wrote:
> 
> Why are you doing your conversion on Solr side and not on SolrJ
> (client) side? Seems more efficient and you can control the lifecycle
> of XSLT objects better yourself.
> 
> Regards,
>   Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr 
> proficiency
> 
> 
> On Thu, Jun 12, 2014 at 10:26 PM, Ahmet Arslan
>  wrote:
>> Hi,
>> 
>> I see that you have ampersand left when setting various parameters.
>> 
>>  query.set("&wt", "xslt");
>> 
>> should be
>> 
>>  query.set("wt", "xslt");
>> 
>> 
>> 
>> 
>> On Thursday, June 12, 2014 6:12 PM, Venkata krishna  
>> wrote:
>> 
>> 
>> 
>> Hi,
>> 
>> I am using solr4.8, solrj  for to do searching, would like to get response
>> of search query in html format,for that purpose i have written this code,
>> private static final String urlString = "http://localhost:8983/solr";;
>>private SolrServer solrServer;
>>public SolrJ() {
>>if (solrServer == null) {
>>solrServer = new HttpSolrServer(urlString);
>> 
>>}
>>}
>> 
>> public QueryResponse getRueryResponse(String queryString) {
>>SolrQuery query = new SolrQuery();
>>query.setHighlight(true).setHighlightSnippets(20); //set other params 
>> as
>> needed
>>query.setParam("hl.fl", "content");
>>query.setQuery(queryString);
>>query.set("&wt", "xslt");
>>query.set("&indent",true);
>>query.set("&tr", "example.xsl");
>> 
>> 
>>QueryResponse queryResponse = null;
>>try {
>>((HttpSolrServer) solrServer).setParser(new XMLResponseParser());
>>queryResponse = solrServer.query(query);
>>} catch (SolrServerException e) {
>>e.printStackTrace();
>>}   return queryResponse;
>>}
>> and in example.xsl media type is
>> .
>> 
>> but i am getting an exception
>> Exception in thread "main"
>> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
>> Expected mime type application/xml but got text/html.
>> 
>> So could you please provide any solution to resolve issue.
>> 
>> 
>> Thanks,
>> 
>> Venkata Krishna Tolusuri.
>> 
>> 
>> 
>> 
>> 
>> 
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Converting-XML-response-of-Search-query-into-HTML-tp4141456.html
>> Sent from the Solr - User mailing list archive at Nabble.com.


Codec - PostingsFormat - Postings/TermsConsumer - Checkpointed merged segment.

2014-06-12 Thread Aditya Tripathi
Hi,

We are trying an implementation where we use a custom PostingsFormat for
one field to write the postings directly to a third party stable storage.
The intention is partial update for this field. But for now, I want to ask
one specific problem regarding merge.

Main Issue:
*
In the indexing chain of Lucene, PerFieldPostingsFormat (which is a
consumer for FreqProxTermsWriter) is only invoked at the flush time. Which
means if you implement a custom PostingsFormat for a field, a custom
FieldsConsumer/TermsConsumer/PostingsConsumer can only do things at flush
time or merge time. We are stuck because a commit can happen without
anything to be flushed. In this sort of commit, an uncommitted merged
segment gets committed, but our customer FieldsConsumer is not aware of
this commit at all.

Is there anyway for a custom FieldConsumer of a PostingsFormat to know
about a "commit without anything to flush" event?

More Details on what are we trying:
*

We have our own implementation of merge logic, which is primarily
renumbering the docIds. And whenever a merge happens we ended up writing
the new merged state to the stable storage because of Lucene's design.
However, we got into some inconsistency with Lucene's uncommitted merged
segment. Our FieldConsumer has committed the new merged state to the stable
storage.

Some details and Points to Note:
*
Our custom FieldsConsumer is called only at the flush time by Lucene.
Because FreqProxTermsWriterPerField invokes it's consumer only at the time
of flush().

I can not give a lot of details here, but assume we store something like
(key:segment_field_term , value:postingsArray )  in the stable storage.

There are other fields as well which follow default Lucene42 Codec formats.

The problem: How to keep the segment information in the stable storage and
Lucene's uncommited merged segments, in sync.
***
In this case, the opened Lucene directory may have the old segments but our
key-value store have only the new merged state. The searches are done with
old segment readers, and since old segments are no more there in the stable
storage, searches fail.

Seeking Solution for how to solve the case where a merged segment is just
checkpointed (not yet commited) but may or may not participate in a search.
**

So our flow is something like this:
(Skipping the whole indexing chain for brevity)
a) UpdateDocument -> InvertedDocConsumer (DocProcessorPerField) ->
TermHashPerField ->FreqProxTermsWriterPerField
b) DocumentWriter.flush() -> DWPT.flush() called through commit (Do not
consider merge for now)
c) This also goes through same indexing chain, calling flush of the
consumers in the chain.
d) Finally theFreqProxTermsWriterPerField.flush() calls it's consumer which
is perFieldPostingsFormat and for this field, the custom TermsConsumer and
PostingsConsumer are used.

For merge, the flow is:

SegmentMerger.mergeTerms -> FieldsConsumer.merge -> .TermsConsumer.merge ->
PostingsConsumer.merge

***
Since merge is also flushing a new segment, it is no surprise that merge
and flush almost call the same methods of TermsConsumer and
PostingsConsumer (startTerm, startDoc etc)

But this design is a problem for us.

We want "flush" to write directly to the stable storage but "merge" should
wait when the merged segment gets committed.
And so we implement our own merge method where we write the merged state
to an in-memory structure. But we may not be able to commit this at the
same time as the uncommitted merged segment gets committed.


One solution we tried but did not work completely :
***
We do not allow Lucene's TermConsumer.merge to process anything because we
end the TermsEnum iterator before calling super.merge.
And in our custom merge method:The custom TermsConsumer creates an
in-memory merge information from the given mergeState.
We write(commit, so to say) this to the stable storage at the next flush
because we have no other signal to write this.

However, the problem with this approach is we are dependent on another
document to be added and flushed for our in-memory merge info to get
committed to stable storage.
IndexWriter's commit does not communicate with the FieldConsumers directly
but only through DocumentWriter.flush() and this goes via the indexing
chain.

However, same is not true with the merged checkpointed segment, any commit
will commit all the uncommitted segments without any flush required.

Re: Highlighting on Parent document

2014-06-12 Thread StrW_dev
Apparently it is not supported, so I will try to push it into Jira.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-on-Parent-document-tp4139784p4141579.html
Sent from the Solr - User mailing list archive at Nabble.com.