Re: Stored hierarchical data in Solr

2013-01-16 Thread Toke Eskildsen
On Tue, 2013-01-15 at 18:02 +0100, Nicholas Ding wrote:
> I'm thinking of storing a hierarchical data structure in Solr. I know I have
> to flatten the structure into a form like A_B_C, but is it possible to extend
> Solr to support hierarchical data?

You need to be more specific here. What is it you're trying to do?

If you just want to search and retrieve, you can index "A/B/C" as a
StrField. Searching for all data under A/B is done with the prefix query
"myfield:A/B/*".

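As a rough illustration of what such a prefix query matches, here is a
minimal Python sketch. It is not Solr code: the helper name under_path and
the sample paths are made up, and it only emulates prefix matching on a
flattened path string. The trailing slash matters, otherwise "A/B" would
also match "A/BC/...".

```python
def under_path(paths, prefix):
    """Return the paths that sit strictly below `prefix` in the hierarchy."""
    # Normalise to exactly one trailing slash so "A/B" does not match "A/BC/...".
    needle = prefix.rstrip("/") + "/"
    return [p for p in paths if p.startswith(needle)]


# Hypothetical flattened values, one per document.
docs = ["A/B/C", "A/B/D/E", "A/BC/F", "A/X", "A/B"]
print(under_path(docs, "A/B"))
```
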
If you want hierarchical faceting, there are different solutions,
ranging from clever indexing to patching Solr:
https://wiki.apache.org/solr/HierarchicalFaceting

Regards,
Toke Eskildsen



Re: Missing documents with ConcurrentUpdateSolrServer (vs. HttpSolrServer) ?

2013-01-16 Thread Mikhail Khludnev
Mark,
Here is the relevant issue: https://issues.apache.org/jira/browse/SOLR-3284
ConcurrentUpdateSolrServer queues updates on the SolrJ side, not on the
server side. The Solr server processes a number of updates simultaneously, so
if, e.g., your servlet container's threads are unlimited, it can potentially
lead to OOM.


On Wed, Jan 16, 2013 at 3:09 AM, Shawn Heisey  wrote:

> On 1/15/2013 2:10 PM, Mark Bennett wrote:
>
>> First off, just reporting this:
>>
>> I wound up with approx 58% fewer documents having submitted via
>> ConcurrentUpdateSolrServer.  I went back and changed the code to use
>> HttpSolrServer and had 100%
>>
>> This was a long running test, approx 12 hours, with gigabytes of data, so
>> not conveniently shared / reproducible, but I at least wanted to email around,
>> in part to get it "on the record", and second to see if anybody else has
>> seen this?  I didn't see anything in JIRA.
>>
>> I realize that Concurrent update is asynchronous and I'm giving up the
>> ability to monitor things, but since it works using the old server,
>> there's
>> nothing glaringly wrong at least.
>>
>
> You're not only giving up the ability to monitor things, you're also
> giving up the ability to detect errors.  All exceptions that get thrown by
> the internals of ConcurrentUpdateSolrServer are swallowed, your code will
> never know they happened.  The client log (slf4j with whatever binding &
> config you chose) may have such errors logged, but they are completely
> undetectable by the code.  Make sure you're actually logging someplace with
> your solrj app at a minimum level of INFO, then check that log.
>
> It might be a case of errors being silently swallowed, or it might be a
> bug.
>
> Thanks,
> Shawn
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: Is *:* the only possible search with * on the left-hand-side?

2013-01-16 Thread Upayavira
And, it would make for slow queries, as the more fields you query, the
worse performance gets.

Having said that, you can query multiple fields using the edismax query
parser, with its qf param.

Upayavira

On Wed, Jan 16, 2013, at 12:23 AM, Jack Krupansky wrote:
> Semi-hard-coded.
> 
> In QueryParserBase.java:
> 
> protected Query getWildcardQuery(String field, String termStr) throws ParseException
> {
>   if ("*".equals(field)) {
>     if ("*".equals(termStr)) return newMatchAllDocsQuery();
> 
> Otherwise, if you try *:x, "*" is an undefined field.
> 
> -- Jack Krupansky
> 
> -Original Message- 
> From: Alexandre Rafalovitch
> Sent: Tuesday, January 15, 2013 7:06 PM
> To: solr-user@lucene.apache.org
> Subject: Is *:* the only possible search with * on the left-hand-side?
> 
> Hello,
> 
> Is *:* hardcoded somewhere as a unique special pattern or is there
> actually
> a class of queries with *:'something'?
> 
> I tried searching for it, but I suspect this is not the patterns most
> tokenizers will actually index as searchable. :-)
> 
> Regards,
>Alex.
> 
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book) 
> 


Search strategy - improving search quality for short search terms such as "doll"

2013-01-16 Thread David Parks
I'm a beginner-intermediate Solr admin. I've set up the basics for our
application and it runs well.

Now it's time for me to dig in and start tuning and improving queries.

My next target is searches on simple terms such as "doll" which, in Google,
would return documents about, well, "toy dolls", because that's the most
common usage of the simple term "doll". But in my index it predominantly
returns documents about CDs with the songs "Doll Face" and "My baby doll" in
them.

I'm not directly asking how to solve this as much as I'm asking what
direction I should be looking in to learn what I need to know to tackle the
general issue myself.

Left on my own I would start looking at categorizing the CDs into a facet
called "music", reasonably doable in my dataset. Then I need to reduce the
boost-value of the entire facet/category of music unless certain pre-defined
query terms exist, such as [music, cd, song, listen, dvd, etc.].

I don't yet know how to do all of this, but after a couple more good books I
should be "dangerous".

So the question to this list:

-  Am I on the right track here?  If not, can you point me in a
direction to go?


Re: Disable term frequency for some fields in solr

2013-01-16 Thread Amit Jha
Hi,

How can I do this in Solr 4?

Amit
On Thu, Dec 6, 2012 at 1:40 PM, Markus Jelsma wrote:

> custom similarity for that field that returns 1 for


Re: Disable term frequency for some fields in solr

2013-01-16 Thread Upayavira
This involves subclassing the DefaultSimilarity class in Java and adding
that to your Solr setup. For someone versed in Java, this is
relatively straightforward. For others it is non-trivial.

Upayavira

On Wed, Jan 16, 2013, at 10:57 AM, Amit Jha wrote:
> Hi,
> 
> How can I do this in solr4.
> 
> Amit
> On Thu, Dec 6, 2012 at 1:40 PM, Markus Jelsma
> wrote:
> 
> > custom similarity for that field that returns 1 for


Re: Disable term frequency for some fields in solr

2013-01-16 Thread Amit Jha
I have done the same thing in Solr 3.6 and it works, but in Solr 3.6
field-level similarity is not available. Solr 4 has Similarity Factories, so
I don't understand how to do it in Solr 4. Which class do I need to extend to
move ahead?


On Wed, Jan 16, 2013 at 4:44 PM, Upayavira  wrote:

> For someone versed in Jav


Priorities on fields

2013-01-16 Thread Dariusz Borowski
Hi,

Is it possible to define priorities on fields?

Let's say I have a product table which has the following fields:

- id
- title
- description
- code_name

An entry could be like this:

id: 42
title: shinny new shoes
description: Shinny new shoes made in Italy
code_name: shinny-new-shoes-42-2013

Now, I would like to prioritize the fields for the search hint. I would
like to do as follows:

id: 0.0
title: 0.8
description: 0.5
code_name: 0.1

Is it possible in SOLR 3.6.1?

Dariusz


Re: Priorities on fields

2013-01-16 Thread Rafał Kuć
Hello!

What do you mean by priority? You can define an index-time or query-time
boost, which lets you specify the importance of each field.

A good page to look at is: http://wiki.apache.org/solr/SolrRelevancyCookbook
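
As one possible reading of the question, the weights could be passed as
query-time boosts through the qf parameter of the dismax parser. The Python
sketch below only builds the request parameters; the helper name build_qf and
the query term "shoes" are made up for illustration, and dropping the
zero-weight id field (rather than sending id^0.0) is an assumption.

```python
from urllib.parse import urlencode


def build_qf(boosts):
    """Format {field: weight} as a qf value, skipping zero-weight fields."""
    return " ".join(
        f"{field}^{weight}" for field, weight in boosts.items() if weight > 0
    )


# Weights taken from the original question.
boosts = {"id": 0.0, "title": 0.8, "description": 0.5, "code_name": 0.1}
params = {"defType": "dismax", "q": "shoes", "qf": build_qf(boosts)}
print(build_qf(boosts))
print(urlencode(params))  # query string to append to the select URL
```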

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> Hi,

> Is it possible to define priorities on fields?

> Lets say I have a product table which has the following fields:

> - id
> - title
> - description
> - code_name

> An entry could be like this:

> id: 42
> title: shinny new shoes
> description: Shinny new shoes made in Italy
> code_name: shinny-new-shoes-42-2013

> Now, I would like to priorities the fields for the search hint. I would
> like to do as follow:

> id: 0.0
> title: 0.8
> description: 0.5
> code_name: 0.1

> Is it possible in SOLR 3.6.1?

> Dariusz



Re: Solr exception when parsing XML

2013-01-16 Thread Andre Bois-Crettez

It is worth noting that some characters are completely forbidden in XML, such
as "chr(0)".
When dealing with external text input, some cleanup might be necessary
to avoid breaking indexing.
For example, you could replace each forbidden XML character with " ".
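
A minimal sketch of such a cleanup, in Python rather than in any particular
client language: it replaces everything outside the character ranges allowed
by XML 1.0 with a space. The function name clean_xml_text is made up for
illustration; the ranges follow the "Char" production of the XML 1.0
specification.

```python
import re

# Anything outside the XML 1.0 "Char" production: #x9 | #xA | #xD |
# [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_xml_text(text, replacement=" "):
    """Replace every character that XML 1.0 cannot represent."""
    return _INVALID_XML_CHARS.sub(replacement, text)


print(repr(clean_xml_text("abc\x00xyz")))
```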

André

On 01/15/2013 09:55 PM, Alexandre Rafalovitch wrote:

Interesting point. Looks like CDATA is more limiting than I thought:
http://en.wikipedia.org/wiki/CDATA#Issues_with_encoding . Basically, the
recommendation is to avoid CDATA and automatically encode characters such
as yours, as well as less/more and ampersand.

Regards,
Alex.

--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/


Kelkoo SAS
Société par Actions Simplifiée (simplified joint-stock company)
Share capital: € 4.168.964,30
Registered office: 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended solely for
their addressees. If you are not the intended recipient of this message,
please delete it and notify the sender.


Re: Solr exception when parsing XML

2013-01-16 Thread Andre Bois-Crettez

Forgot the link : http://en.wikipedia.org/wiki/Valid_characters_in_XML

André

On 01/16/2013 02:24 PM, Andre Bois-Crettez wrote:

Worth to note that some characters are completely forbidden in XML, such
as "chr(0)".
When dealing with external text input, some cleanup might be necessary
to avoid breaking indexation.
For example you could replace each forbidden XML character with " ".

André

On 01/15/2013 09:55 PM, Alexandre Rafalovitch wrote:

Interesting point. Looks like CDATA is more limiting than I thought:
http://en.wikipedia.org/wiki/CDATA#Issues_with_encoding . Basically, the
recommendation is to avoid CDATA and automatically encode characters such
as yours, as well as less/more and ampersand.

Regards,
 Alex.

--




Re: Solr exception when parsing XML

2013-01-16 Thread Yonik Seeley
On Tue, Jan 15, 2013 at 3:55 PM, Alexandre Rafalovitch
 wrote:
> Basically, the
> recommendation is to avoid CDATA and automatically encode characters such
> as yours, as well as less/more and ampersand.

Unfortunately that doesn't even work.  Just as a raw control character
like a 0 byte is invalid XML, so is an encoded 0 byte like &#0;
XML on its own is simply incapable of representing all unicode code
points (without some further encoding on top like base64 or whatever).
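
This can be checked with any conforming XML parser. The sketch below uses
Python's standard-library parser (not Solr's) and a made-up element name,
and compares an ordinary character reference with a character reference to
code point 0, written &#0;.

```python
import xml.etree.ElementTree as ET


def parses(xml_source):
    """Return True if the standard-library XML parser accepts the document."""
    try:
        ET.fromstring(xml_source)
        return True
    except ET.ParseError:
        return False


print(parses("<field>abc&#65;xyz</field>"))  # ordinary character reference
print(parses("<field>abc&#0;xyz</field>"))   # reference to code point 0
```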

You could always use JSON...

-Yonik
http://lucidworks.com


Re: Solr exception when parsing XML

2013-01-16 Thread Alexandre Rafalovitch
Looking at this a second time, maybe we have an X/Y problem (sp?). Why was
that symbol in there in the first place?

Was it a field separator instead of using multiple fields? Was it a
character in an encoding other than UTF-8?

My guess is that the character will not make sense to Solr during either
indexing or searching, so what's the reason for trying to get it in?

Regards,
   Alex.


On Wed, Jan 16, 2013 at 9:18 AM, Yonik Seeley  wrote:

> On Tue, Jan 15, 2013 at 3:55 PM, Alexandre Rafalovitch
>  wrote:
> > Basically, the
> > recommendation is to avoid CDATA and automatically encode characters such
> > as yours, as well as less/more and ampersand.
>
> Unfortunately that doesn't even work.  Just as a raw control character
> like a 0 byte is invalid XML, so is an encoded 0 byte like &#0;
> XML on its own is simply incapable of representing all unicode code
> points (without some further encoding on top like base64 or whatever).
>
> You could always use JSON...
>
> -Yonik
> http://lucidworks.com
>


Way to lock solr for incoming writes

2013-01-16 Thread mizayah
Is there a way to lock Solr for writes?
I don't want to use Solr's integrated backup because I'm using a Ceph cluster.

What I need is to have consistent data for a few seconds to make a backup.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Way-to-lock-solr-for-incoming-writes-tp4033873.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Way to lock solr for incoming writes

2013-01-16 Thread Per Steffensen

Well, you can stop the Solr instances :-)
If you are making the backup by copying the actual files stored by Solr, you
probably want to stop them anyway to make sure everything is consistent
and written to disk. If you don't stop the Solr instances, at least make sure
that you do a "commit" (not soft) after all incoming writes have been stopped.
If you cannot afford to stop the Solr instances, then of course you will need
to do something smarter. Maybe it is possible to just close the HTTP
endpoint in your web container (Jetty or Tomcat or whatever) for a short
while, or close the port at the OS level or ...
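
For the commit step, a rough Python sketch of issuing a hard commit over
HTTP before copying files. The base URL is a hypothetical single-core
address, the helper names are made up, and the script assumes writes have
already been stopped by other means.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def commit_url(base_url):
    """Build the update URL that asks Solr for a hard commit."""
    return base_url.rstrip("/") + "/update?" + urlencode({"commit": "true"})


def hard_commit(base_url, timeout=60):
    """Send the commit request and return the HTTP status code."""
    with urlopen(commit_url(base_url), timeout=timeout) as response:
        return response.status


# Hypothetical address; only the URL is built here, no request is sent.
print(commit_url("http://localhost:8983/solr"))
```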


Regards, Per Steffensen

On 1/16/13 4:02 PM, mizayah wrote:

Is there a way to lock solr for writes?
I don't wona use solr integrated backup because i'm using ceph claster.

What I need is to have consistent data for few seconds to make backup.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Way-to-lock-solr-for-incoming-writes-tp4033873.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Search strategy - improving search quality for short search terms such as "doll"

2013-01-16 Thread Amit Jha
It's all about the data set, by which I mean the index. If you have documents
containing "toy" and "doll", it will return them in the result set.

What I understood is that you are talking about the context of the query. For
example, if you search "books on MK Gandhi" and "books by MK Gandhi", the two
queries have different contexts.

Context-based search is, at some level, achieved by natural language
processing. You can look into that for better search.

The Solr wiki and mailing list would be great sources of learning.


Rgds
AJ

On 16-Jan-2013, at 15:10, "David Parks"  wrote:

> I'm a beginner-intermediate solr admin, I've set up the basics for our
> application and it runs well.
> 
> 
> 
> Now it's time for me to dig in and start tuning and improving queries.
> 
> 
> 
> My next target is searches on simple terms such as "doll" which, in google,
> would return documents about, well, "toy dolls", because that's the most
> common usage of the simple term "doll". But in my index it predominantly
> returns documents about CDs with the song "Doll Face", and "My baby doll" in
> them.
> 
> 
> 
> I'm not directly asking how to solve this as much as I'm asking what
> direction I should be looking in to learn what I need to know to tackle the
> general issue myself.
> 
> 
> 
> Left on my own I would start looking at categorizing the CD's into a facet
> called "music", reasonably doable in my dataset. Then I need to reduce the
> boost-value of the entire facet/category of music unless certain pre-defined
> query terms exist, such as [music, cd, song, listen, dvd,  user queries to come up with a more exhaustive list>, etc.]. 
> 
> 
> 
> I don't yet know how to do all of this, but after a couple more good books I
> should be "dangerous".
> 
> 
> 
> So the question to this list:
> 
> 
> 
> -  Am I on the right track here?  If not, can you point me in a
> direction to go?
> 
> 
> 
> 
> 


Re: Priorities on fields

2013-01-16 Thread Amit Jha
A boost query and a boost function will suffice for your purpose.

Rgds
AJ

On 16-Jan-2013, at 17:20, Dariusz Borowski  wrote:

> Hi,
> 
> Is it possible to define priorities on fields?
> 
> Lets say I have a product table which has the following fields:
> 
> - id
> - title
> - description
> - code_name
> 
> An entry could be like this:
> 
> id: 42
> title: shinny new shoes
> description: Shinny new shoes made in Italy
> code_name: shinny-new-shoes-42-2013
> 
> Now, I would like to priorities the fields for the search hint. I would
> like to do as follow:
> 
> id: 0.0
> title: 0.8
> description: 0.5
> code_name: 0.1
> 
> Is it possible in SOLR 3.6.1?
> 
> Dariusz


Re: Search strategy - improving search quality for short search terms such as "doll"

2013-01-16 Thread Alexandre Rafalovitch
Sounds like 'Doll' could be a category for you, while "Doll face" is a
title. Maybe the categories should get a higher boost in eDismax definition
over the titles?

Related, you may find the following book interesting:
http://rosenfeldmedia.com/books/searchanalytics/

Regards,
   Alex.


On Wed, Jan 16, 2013 at 4:40 AM, David Parks  wrote:

> I'm a beginner-intermediate solr admin, I've set up the basics for our
> application and it runs well.
>
>
>
> Now it's time for me to dig in and start tuning and improving queries.
>
>
>
> My next target is searches on simple terms such as "doll" which, in google,
> would return documents about, well, "toy dolls", because that's the most
> common usage of the simple term "doll". But in my index it predominantly
> returns documents about CDs with the song "Doll Face", and "My baby doll"
> in
> them.
>
>
>
> I'm not directly asking how to solve this as much as I'm asking what
> direction I should be looking in to learn what I need to know to tackle the
> general issue myself.
>
>
>
> Left on my own I would start looking at categorizing the CD's into a facet
> called "music", reasonably doable in my dataset. Then I need to reduce the
> boost-value of the entire facet/category of music unless certain
> pre-defined
> query terms exist, such as [music, cd, song, listen, dvd,  user queries to come up with a more exhaustive list>, etc.].
>
>
>
> I don't yet know how to do all of this, but after a couple more good books
> I
> should be "dangerous".
>
>
>
> So the question to this list:
>
>
>
> -  Am I on the right track here?  If not, can you point me in a
> direction to go?
>
>
>
>
>
>


group.ngroups behavior in response

2013-01-16 Thread Amit Nithian
Hi all,

I recently discovered the group.main=true/false parameter which really has
made life simple in terms of ensuring that the format coming out of Solr
for my clients (RoR app) is backwards compatible with the non-grouped
results which ensures no special "handle grouped results" logic.

The only issue though is that the numFound is the number of total matches
instead of the number of groups which can seem odd (and incorrect if you
rely on the numFound to determine whether or not to display a "next page"
link).

I created a JIRA issue, SOLR-4310, and submitted a patch for this and
wanted to get feedback to see if this is an issue that others have
encountered and if so, would this help.

Thanks
Amit


Re: retrieving latest document **only**

2013-01-16 Thread J Mohamed Zahoor
group field is timestamp… it is not multivalued.

./zahoor


On 15-Jan-2013, at 7:14 PM, Upayavira  wrote:

> Is your group field multivalued? Could docs appear in more than one
> group?
> 
> Upayavira
> 
> On Tue, Jan 15, 2013, at 01:22 PM, J Mohamed Zahoor wrote:
>> 
>> The sum of all the "count" in the groups… does not match the total no of
>> docs found.
>> 
>> ./zahoor
>> 
>> 
>> On 12-Jan-2013, at 1:27 PM, Upayavira  wrote:
>> 
>>> Not sure exactly what you mean, can you give an example?
>>> 
>>> Upayavira
>>> 
>>> On Sat, Jan 12, 2013, at 06:32 AM, J Mohamed Zahoor wrote:
>>>> Cool… it worked… But the count of all the groups and the count inside
>>>> stats component does not match…
>>>> Is that a bug?
>>>>
>>>> ./zahoor
>>>>
>>>>
>>>> On 11-Jan-2013, at 6:48 PM, Upayavira wrote:
>>>>
>>>>> could you use field collapsing? Boost by date and only show one value
>>>>> per group, and you'll have the most recent document only.
>>>>>
>>>>> Upayavira
>>>>>
>>>>> On Fri, Jan 11, 2013, at 01:10 PM, jmozah wrote:
>>>>>> one crude way is first query and pick the latest date from the result
>>>>>> then issue a query with q=timestamp:[latestDate TO latestDate]
>>>>>>
>>>>>> But i dont want to execute two queries...
>>>>>>
>>>>>> ./zahoor
>>>>>>
>>>>>> On 11-Jan-2013, at 6:37 PM, jmozah wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> What do you want?
>>>>>>>> 'the most recent ones' or '**only** the latest' ?
>>>>>>>>
>>>>>>>> Perhaps a range query "q=timestamp:[refdate TO NOW]" will match your
>>>>>>>> needs.
>>>>>>>>
>>>>>>>> Uwe
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I need **only** the latest documents...
>>>>>>> in the above query , "refdate" can vary based on the query.
>>>>>>>
>>>>>>> ./zahoor
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>



Searching for field that contains multiple values

2013-01-16 Thread Nguyen, Vincent (CDC/OD/OADS) (CTR)
Hi,

How do I find documents that have more than one value in a field?

Example:


   
blue
red
   


Vincent Vu Nguyen




400 error with boost and exists()

2013-01-16 Thread Walter Underwood
We're running Solr 3.3 and I have a function query for boosting that works with 
bq but not with boost (edismax). This is the same behavior described here:

http://stackoverflow.com/questions/12128561/why-doesnt-solr-function-query-work-with-boost-parameter

Here is the first part of the stack trace:

null java.lang.UnsupportedOperationException
 at org.apache.solr.search.function.DocValues.floatVal(DocValues.java:41)
 at org.apache.solr.search.function.BoostedQuery$CustomScorer.score(BoostedQuery.java:167)
 at org.apache.lucene.search.TopScoreDocCollector$InOrderTopScoreDocCollector.collect(TopScoreDocCollector.java:47)
 at org.apache.lucene.search.Scorer.score(Scorer.java:90)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:526)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:320)
 at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1178)
 at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1066)
 at ...

I'm passing in this function query. The first term in the product() is set as a 
default in the request handler for boost. 

"product(log(max(demand_chegg_rolling,1)),if(exists('school'='1579535');5;1))"

I did not find a Jira item that matched this. Any hints on what is going on?

wunder
--
Walter Underwood
wun...@wunderwood.org





Re: Searching for field that contains multiple values

2013-01-16 Thread Mikhail Khludnev
It has been discussed a few times: you need to implement your own Similarity,
which writes the number of tokens as a norm during indexing; then at
query time you can check the norm value per document.
You can also do it in a more straightforward way: preprocess docs to derive
a number_of_colors field, e.g. via an UpdateProcessor, and filter on this
field as usual.
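
The second option can also be done on the client before documents are sent
to Solr. The Python sketch below is an illustration under assumptions: the
field name color and the helper name add_value_count are made up, and a real
setup might do the same inside an UpdateProcessor instead.

```python
def add_value_count(doc, field, count_field):
    """Return a copy of `doc` with `count_field` set to the number of values in `field`."""
    values = doc.get(field, [])
    if not isinstance(values, (list, tuple)):
        values = [values]  # a single value is treated as a one-element list
    enriched = dict(doc)
    enriched[count_field] = len(values)
    return enriched


# Hypothetical document with a multivalued "color" field.
doc = {"id": "1", "color": ["blue", "red"]}
print(add_value_count(doc, "color", "number_of_colors"))
```

With such a field indexed as an integer, documents having more than one
value could then be selected with a range filter such as
number_of_colors:[2 TO *].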


On Wed, Jan 16, 2013 at 10:18 PM, Nguyen, Vincent (CDC/OD/OADS) (CTR) <
v...@cdc.gov> wrote:

> Hi,
>
> How do I find documents that have more than one value in a field?
>
> Example:
>
> 
>
> blue
> red
>
> 
>
> Vincent Vu Nguyen
>
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


RE: Solr exception when parsing XML

2013-01-16 Thread Zhang, Lisheng
Hi Alex,

Thanks very much for the help! I switched to the following (I am using PHP on
the client side):

createTextNode(urlencode($value))

so the CTRL character problem is avoided, but I noticed that Solr does
not perform urldecode($value), so my initial value

abc xyz

becomes

abc+xyz

I have not fully read through the Solr code for this part, but I guess it may
be a configuration issue (when using CDATA I do not have this issue)?
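
The effect can be reproduced outside Solr. In the Python sketch below,
quote_plus stands in for PHP's urlencode() (both turn a space into "+"), the
field name is made up, and the standard-library XML parser plays the role of
the consumer: XML parsing does not undo URL encoding, so the text arrives
still encoded.

```python
from urllib.parse import quote_plus
import xml.etree.ElementTree as ET

value = "abc xyz"
encoded = quote_plus(value)  # like PHP's urlencode(): space becomes "+"
xml_doc = f'<field name="text">{encoded}</field>'
received = ET.fromstring(xml_doc).text  # what an XML consumer sees
print(encoded, received)
```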

Thanks and best regards, Lisheng

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Tuesday, January 15, 2013 12:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr exception when parsing XML


Interesting point. Looks like CDATA is more limiting than I thought:
http://en.wikipedia.org/wiki/CDATA#Issues_with_encoding . Basically, the
recommendation is to avoid CDATA and automatically encode characters such
as yours, as well as less/more and ampersand.

Regards,
   Alex.


Re: Disable term frequency for some fields in solr

2013-01-16 Thread Upayavira
There's gonna be two ways to do this - for yourself or for everyone.

For yourself, you'll want to subclass
org.apache.lucene.search.similarities.DefaultSimilarity and
org.apache.solr.search.similarities.DefaultSimilarityFactory.

Alternatively, patch those two files to allow setting the TF or the IDF
via a configuration parameter, and post a patch to JIRA. I'm sure there
are other folks that would want the feature, and would hope it would be
accepted easily.

E.g. disableIDF=true or disableTF=true would make those functions just
return 1.

My thoughts anyhow.

Upayavira

On Wed, Jan 16, 2013, at 11:37 AM, Amit Jha wrote:
> Done same thing in solr3.6 and working but in sorl3.6 filed level of
> similarity is not available. And Solr4 has Similarity Factories. So I was
> not getting how do I do it on solr4. Which class do i need to extend and
> move ahead.
> 
> 
> On Wed, Jan 16, 2013 at 4:44 PM, Upayavira  wrote:
> 
> > For someone versed in Jav


Re: Query parsing VS marshalling/unmarshalling

2013-01-16 Thread balaji.gandhi
Hi, 

I am trying to do something similar:- 

Eg. 
Input: (name:John AND name:Doe) 
Output: ((firstName:John OR lastName:John) AND (firstName:Doe OR
lastName:Doe)) 

How can I extract the fields, change them and repackage the query? 

Thanks, 
Balaji



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-parsing-VS-marshalling-unmarshalling-tp3935430p4033985.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Rename fields in a query

2013-01-16 Thread balaji.gandhi
Hi, 

I am trying to do something similar:- 

Eg. 
Input: (name:John AND name:Doe) 
Output: ((firstName:John OR lastName:John) AND (firstName:Doe OR
lastName:Doe)) 

How can I extract the fields, change them and repackage the query? 

Thanks, 
Balaji



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Rename-fields-in-a-query-tp2693739p4033988.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Disable term frequency for some fields in solr

2013-01-16 Thread Markus Jelsma
I would prefer to use SchemaSimilarityFactory as a global similarity and
configure a per-field similarity, of which some use a flat TF impl. Much
simpler, and no need to patch anything; just build a custom similarity.

-Original message-
> From:Upayavira 
> Sent: Wed 16-Jan-2013 21:22
> To: solr-user@lucene.apache.org
> Subject: Re: Disable term frequency for some fields in solr
> 
> There's gonna be two ways to do this - for yourself or for everyone.
> 
> For yourself, you'll want to subclass
> org.apache.lucene.search.similarities.DefaultSimilarity and
> org.apache.solr.search.similarities.DefaultSimilarityFactory.
> 
> Alternatively, patch those two files to allow setting the TF or the IDF
> via a configuration parameter, and post a patch to JIRA. I'm sure there
> are other folks that would want the feature, and would hope it would be
> accepted easily.
> 
> E.g. disableIDF=true or disableTF=true would make those functions just
> return 1.
> 
> My thoughts anyhow.
> 
> Upayavira
> 
> On Wed, Jan 16, 2013, at 11:37 AM, Amit Jha wrote:
> > Done same thing in solr3.6 and working but in sorl3.6 filed level of
> > similarity is not available. And Solr4 has Similarity Factories. So I was
> > not getting how do I do it on solr4. Which class do i need to extend and
> > move ahead.
> > 
> > 
> > On Wed, Jan 16, 2013 at 4:44 PM, Upayavira  wrote:
> > 
> > > For someone versed in Jav
> 


RE: Solr exception when parsing XML

2013-01-16 Thread Markus Jelsma
In Apache Nutch we strip non-character code points with a simple method. Check 
the patch, the relevant part is easily ported to any language: 
https://issues.apache.org/jira/browse/NUTCH-1016

 
 
-Original message-
> From:Zhang, Lisheng 
> Sent: Wed 16-Jan-2013 20:48
> To: solr-user@lucene.apache.org
> Subject: RE: Solr exception when parsing XML
> 
> Hi Alex,
> 
> Thanks very much for helps! I switched to (I am using PHP in client side)
> 
> createTextNode(urlencode($value))
> 
> so CTRL character problem is avoided, but I noticed that somehow solr did
> not perform urldecode($value), so my initial value
> 
> abc xyz
> 
> becomes 
> 
> abc+xyz 
> 
> I have not fully read through solr code on this part, but guess maybe it
> is a configuration issue (when using CDATA I donot have this issue)?
> 
> Thanks and best regards, Lisheng
> 
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Tuesday, January 15, 2013 12:56 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr exception when parsing XML
> 
> 
> Interesting point. Looks like CDATA is more limiting than I thought:
> http://en.wikipedia.org/wiki/CDATA#Issues_with_encoding . Basically, the
> recommendation is to avoid CDATA and automatically encode characters such
> as yours, as well as less/more and ampersand.
> 
> Regards,
>Alex.
> 


Re: 400 error with boost and exists()

2013-01-16 Thread Jack Krupansky
Maybe it's the semicolons in the "if", which should be commas. Also, you're 
using some odd syntax in the "exists" value data source which expects a 
field name or a function.


-- Jack Krupansky

-Original Message- 
From: Walter Underwood

Sent: Wednesday, January 16, 2013 1:28 PM
To: solr-user@lucene.apache.org
Subject: 400 error with boost and exists()

We're running Solr 3.3 and I have a function query for boosting that works 
with bq but not with boost (edismax). This is the same behavior described 
here:


http://stackoverflow.com/questions/12128561/why-doesnt-solr-function-query-work-with-boost-parameter

Here is the first part of the stack trace:

null java.lang.UnsupportedOperationException at 
org.apache.solr.search.function.DocValues.floatVal(DocValues.java:41) at 
org.apache.solr.search.function.BoostedQuery$CustomScorer.score(BoostedQuery.java:167) 
at 
org.apache.lucene.search.TopScoreDocCollector$InOrderTopScoreDocCollector.collect(TopScoreDocCollector.java:47) 
at org.apache.lucene.search.Scorer.score(Scorer.java:90) at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:526) at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:320) at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1178) 
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1066) 
at

...

I'm passing in this function query. The first term in the product() is set 
as a default in the request handler for boost.


"product(log(max(demand_chegg_rolling,1)),if(exists('school'='1579535');5;1))"

I did not find a Jira item that matched this. Any hints on what is going on?

wunder
--
Walter Underwood
wun...@wunderwood.org




Re: 400 error with boost and exists()

2013-01-16 Thread Walter Underwood
First, that works as "bf".

I got the syntax from: 
http://lucidworks.lucidimagination.com/display/solr/Function+Queries

Various documentation has different syntax for exists().

wunder

On Jan 16, 2013, at 3:00 PM, Jack Krupansky wrote:

> Maybe it's the semicolons in the "if", which should be commas. Also, you're 
> using some odd syntax in the "exists" value data source which expects a field 
> name or a function.
> 
> -- Jack Krupansky
> 
> -Original Message- From: Walter Underwood
> Sent: Wednesday, January 16, 2013 1:28 PM
> To: solr-user@lucene.apache.org
> Subject: 400 error with boost and exists()
> 
> We're running Solr 3.3 and I have a function query for boosting that works 
> with bq but not with boost (edismax). This is the same behavior described 
> here:
> 
> http://stackoverflow.com/questions/12128561/why-doesnt-solr-function-query-work-with-boost-parameter
> 
> Here is the first part of the stack trace:
> 
> null java.lang.UnsupportedOperationException at 
> org.apache.solr.search.function.DocValues.floatVal(DocValues.java:41) at 
> org.apache.solr.search.function.BoostedQuery$CustomScorer.score(BoostedQuery.java:167)
>  at 
> org.apache.lucene.search.TopScoreDocCollector$InOrderTopScoreDocCollector.collect(TopScoreDocCollector.java:47)
>  at org.apache.lucene.search.Scorer.score(Scorer.java:90) at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:526) at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:320) at 
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1178)
>  at 
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1066)
>  at
> ...
> 
> I'm passing in this function query. The first term in the product() is set as 
> a default in the request handler for boost.
> 
> "product(log(max(demand_chegg_rolling,1)),if(exists('school'='1579535');5;1))"
> 
> I did not find a Jira item that matched this. Any hints on what is going on?
> 
> wunder
> --
> Walter Underwood
> wun...@wunderwood.org
> 
> 

--
Walter Underwood
wun...@wunderwood.org





RE: Solr exception when parsing XML

2013-01-16 Thread Zhang, Lisheng
Hi,

Thanks very much for the help! I checked the Solr source code: for XML text
inside an element, Solr does not call URLDecoder (but to pass a CTRL character,
I have to call urlencode from PHP).

So either I remove the CTRL characters on the PHP side, or I change Solr's
XMLReader slightly to call URLDecoder on the text.

Thanks and best regards, Lisheng
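
The "abc xyz" to "abc+xyz" change reported in this thread can be reproduced
outside Solr. The sketch below is only an illustration: it uses Python's
urllib.parse as a stand-in for PHP's urlencode() (an assumption, not the PHP or
Solr code path) to show that form-style encoding turns a space into "+", and
that the text only comes back if the receiver explicitly decodes it.

```python
from urllib.parse import quote_plus, unquote_plus

original = "abc xyz"

# Form-style URL encoding replaces the space with '+'.
encoded = quote_plus(original)

# An XML parser treats element text as literal characters, so without an
# explicit decode step the value that gets indexed is the encoded string.
stored_without_decode = encoded

# Only an explicit decode restores the original text.
decoded = unquote_plus(encoded)

print(encoded)
print(decoded)
```

This is why URL-encoding the element text hides the control character from the
XML parser but also changes ordinary text unless something decodes it again.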


-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Wednesday, January 16, 2013 2:41 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr exception when parsing XML


In Apache Nutch we strip non-character code points with a simple method. Check 
the patch, the relevant part is easily ported to any language: 
https://issues.apache.org/jira/browse/NUTCH-1016
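
A minimal sketch of the same idea in Python, for readers who do not want to
read the patch. It is not the Nutch code: it assumes the goal is to drop every
character outside the XML 1.0 "Char" production, and the function name
strip_invalid_xml_chars is hypothetical.

```python
import re

# Matches any character NOT allowed by the XML 1.0 "Char" production:
# tab, newline, carriage return, U+0020-U+D7FF, U+E000-U+FFFD,
# and U+10000-U+10FFFF are the only characters kept.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove characters that an XML 1.0 parser would reject."""
    return _INVALID_XML_CHARS.sub("", text)


print(repr(strip_invalid_xml_chars("abc\x01 xyz\x0b")))
```

Cleaning the value before building the XML keeps ordinary text such as spaces
untouched, so no decode step is needed on the Solr side.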

 
 
-Original message-
> From:Zhang, Lisheng 
> Sent: Wed 16-Jan-2013 20:48
> To: solr-user@lucene.apache.org
> Subject: RE: Solr exception when parsing XML
> 
> Hi Alex,
> 
> Thanks very much for the help! I switched to (I am using PHP on the client side)
> 
> createTextNode(urlencode($value))
> 
> so the CTRL character problem is avoided, but I noticed that somehow Solr did
> not perform urldecode($value), so my initial value
> 
> abc xyz
> 
> becomes 
> 
> abc+xyz 
> 
> I have not fully read through the Solr code for this part, but I guess it may
> be a configuration issue (when using CDATA I do not have this issue)?
> 
> Thanks and best regards, Lisheng
> 
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Tuesday, January 15, 2013 12:56 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr exception when parsing XML
> 
> 
> Interesting point. Looks like CDATA is more limiting than I thought:
> http://en.wikipedia.org/wiki/CDATA#Issues_with_encoding . Basically, the
> recommendation is to avoid CDATA and automatically encode characters such
> as yours, as well as less/more and ampersand.
> 
> Regards,
>Alex.
> 


Re: 400 error with boost and exists()

2013-01-16 Thread Yonik Seeley
On Wed, Jan 16, 2013 at 6:11 PM, Walter Underwood  wrote:
> I got the syntax from: 
> http://lucidworks.lucidimagination.com/display/solr/Function+Queries

Oops, I've alerted our tech writers!  It should be fixed now.

exists(field|function) returns true if a value exists for a given document.
Example use: exists(myField) will return true if myField has a value, while
exists(query(year:2012)) will return true for docs with year=2012.

So in your case, something like this should hopefully work:
if( exists(query(school:1579535)), 5, 1)

-Yonik
http://lucidworks.com
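
A small sketch of how the corrected expression could be assembled into request
parameters. The function syntax is the one suggested above (commas in if(),
exists() wrapping query(...)) and needs Solr 4.0 or later; the helper name
boost_expression is hypothetical, and the request is only built here, not sent.

```python
from urllib.parse import urlencode


def boost_expression(school_id: str) -> str:
    # Commas separate the arguments of if(); exists() takes a field or a
    # function such as query(...), not a quoted 'field'='value' pair.
    return (
        "product(log(max(demand_chegg_rolling,1)),"
        f"if(exists(query(school:{school_id})),5,1))"
    )


params = {
    "q": "*:*",
    "defType": "edismax",
    "boost": boost_expression("1579535"),
}

# urlencode escapes the parentheses, commas and colon for use in a URL.
query_string = urlencode(params)
print(query_string)
```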


Re: 400 error with boost and exists()

2013-01-16 Thread Walter Underwood
None of the variants worked. I started with that syntax for both exists() and 
if(). All gave the same stack trace. --wunder

On Jan 16, 2013, at 3:32 PM, Yonik Seeley wrote:

> On Wed, Jan 16, 2013 at 6:11 PM, Walter Underwood  
> wrote:
>> I got the syntax from: 
>> http://lucidworks.lucidimagination.com/display/solr/Function+Queries
> 
> Oops, I've alerted our tech writers!  It should be fixed now.
> 
> exists(field|function) returns true if a value exists for a given document.
> Example use: exists(myField) will return true if myField has a value, while
> exists(query(year:2012)) will return true for docs with year=2012.
> 
> So in your case, something like this should hopefully work:
> if( exists(query(school:1579535)), 5, 1)
> 
> -Yonik
> http://lucidworks.com






Re: 400 error with boost and exists()

2013-01-16 Thread Yonik Seeley
On Wed, Jan 16, 2013 at 6:35 PM, Walter Underwood  wrote:
> None of the variants worked. I started with that syntax for both exists() and 
> if(). All gave the same stack trace. --wunder

These boolean functions are new for 4.0, but it looks like you're using 3.3?

-Yonik
http://lucidworks.com


Re: 400 error with boost and exists()

2013-01-16 Thread Chris Hostetter

: None of the variants worked. I started with that syntax for both 
: exists() and if(). All gave the same stack trace. --wunder

...

: We're running Solr 3.3 and I have a function query for boosting that 
: works with bq but not

...i'm very confused.  All of the "boolean" functions (like "if()", 
and "exists()") were added in Solr 4.0...

https://wiki.apache.org/solr/FunctionQuery#Boolean_Functions

As for why it might look like it "works" in the "bq" param -- that is a 
"boost query" parsed as a query string, so it doesn't parse with the 
function syntax, and you are probably getting a very odd boolean query 
matching on docs with "if" and "exists" in the default text field.

-Hoss


Re: 400 error with boost and exists()

2013-01-16 Thread Walter Underwood
Ah, that would be it. Does 4.0 also give a stack trace if you call a function 
that doesn't exist?

I can achieve most of what I want with bq, though that has IDF, which I'd 
rather avoid here.

wunder

On Jan 16, 2013, at 3:38 PM, Yonik Seeley wrote:

> On Wed, Jan 16, 2013 at 6:35 PM, Walter Underwood  
> wrote:
>> None of the variants worked. I started with that syntax for both exists() 
>> and if(). All gave the same stack trace. --wunder
> 
> These boolean functions are new for 4.0, but it looks like you're using 3.3?
> 
> -Yonik
> http://lucidworks.com






Re: 400 error with boost and exists()

2013-01-16 Thread Yonik Seeley
On Wed, Jan 16, 2013 at 6:42 PM, Walter Underwood  wrote:
> Ah, that would be it. Does 4.0 also give a stack trace if you call a function 
> that doesn't exist?

Stack trace still appears in the logs, but the error message returned seems OK:

http://localhost:8983/solr/query?q=*:*&defType=edismax&boost=product(log(max(2,1)),if(exists('school'='1579535');5;1))

{
  "responseHeader":{
    "status":400,
    "QTime":4,
    "params":{
      "q":"*:*",
      "boost":"product(log(max(2,1)),if(exists('school'='1579535');5;1))",
      "defType":"edismax"}},
  "error":{
    "msg":"org.apache.solr.search.SyntaxError: Expected ',' at position 40 in 'product(log(max(2,1)),if(exists('school'='1579535');5;1))'",
    "code":400}}

http://localhost:8983/solr/select?q=*:*&defType=edismax&boost=product(log(max(2,1)),if(foobar('school'='1579535');5;1))

{
  "responseHeader":{
    "status":400,
    "QTime":1,
    "params":{
      "q":"*:*",
      "boost":"product(log(max(2,1)),if(foobar('school'='1579535');5;1))",
      "defType":"edismax"}},
  "error":{
    "msg":"org.apache.solr.search.SyntaxError: Unknown function foobar in FunctionQuery('product(log(max(2,1)),if(foobar('school'='1579535');5;1))', pos=32)",
    "code":400}}


-Yonik
http://lucidworks.com
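
For a client that wants the message rather than a stack trace, the error body
shown above can be read like any other JSON response. This is a sketch only:
sample_body is an abbreviated copy of the second response, and describe_error
is a hypothetical helper, not part of any Solr client library.

```python
import json

# Abbreviated copy of the 400 response shown above; the function text inside
# FunctionQuery(...) is shortened here.
sample_body = """
{
  "responseHeader": {"status": 400, "QTime": 1},
  "error": {
    "msg": "org.apache.solr.search.SyntaxError: Unknown function foobar in FunctionQuery(...)",
    "code": 400
  }
}
"""


def describe_error(body: str):
    """Return (code, msg) if the response carries an error, else None."""
    doc = json.loads(body)
    error = doc.get("error")
    if error is None:
        return None
    return error.get("code"), error.get("msg")


print(describe_error(sample_body))
```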


Re: Disable term frequency for some fields in solr

2013-01-16 Thread Amit Jha
Please correct my understanding:

Use one of the factories as the global similarity.

Then extend org.apache.lucene.search.similarities.DefaultSimilarity to create 
a custom similarity.

Then add a similarity tag to the field type definition for the required fields.

Or is there some other way to do that?

Rgds
AJ

On 17-Jan-2013, at 4:08, Markus Jelsma  wrote:

> org.apache.lucene.search.similarities.DefaultSimilarity


Re: Parsing a Lucene/Solr query and adding more clauses

2013-01-16 Thread Chris Hostetter
: I am trying to write a util which can parse a Lucene/Solr query and convert
: into an object representation to add more clauses to the query. 
: 
: Eg.
: Input: (name:John AND name:Doe)
: Output: ((firstName:John OR lastName:John) AND (firstName:John OR
: lastName:John))

edismax can support this natively, no coding required -- although it will 
build a DisjunctionMaxQuery across the "name"=>"firstName,lastName" 
expansion, not a simple BooleanQuery (but if you set the "tie" param to 
1.0 it should be equivalent, although i suspect a DisjunctionMaxQuery 
with the default tie breaker value would fit your usecase better 
anyway)...

http://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2BAC8_renaming

So try something like this...

http://localhost:8983/solr/select?defType=edismax&f.name.qf=firstName+lastName&q=%28name:John%20AND%20name:Doe%29&debugQuery=true


-Hoss
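
The same request can be built programmatically. A minimal sketch, assuming the
host, port and handler path from the URL above; aliased_query_params is a
hypothetical helper name, and nothing is sent to Solr here.

```python
from urllib.parse import urlencode


def aliased_query_params(alias, real_fields, query):
    """Build edismax params that expand `alias` into `real_fields`."""
    return {
        "defType": "edismax",
        # f.<alias>.qf lists the real fields the alias stands for.
        f"f.{alias}.qf": " ".join(real_fields),
        "q": query,
        "debugQuery": "true",
    }


params = aliased_query_params(
    "name", ["firstName", "lastName"], "(name:John AND name:Doe)"
)
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

With debugQuery=true the parsed query in the response shows how each "name"
clause was expanded.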


Re: Disable term frequency for some fields in solr

2013-01-16 Thread Chris Hostetter

: Or there is some other way to do that?

I'm late to this thread, but what was wrong with the simple suggestion of 
omitTermFreqAndPositions="true" ?


-Hoss


RE: SolrJ DirectXmlRequest

2013-01-16 Thread Chris Hostetter

: DirectXmlRequest is part of the SolrJ library, so I guess that means it 
: is not commonly used.  My use case is that I'm applying an XSLT to the 
: raw XML on the client side, instead of leaving that up to the Solr 
: master (although even if I applied the XSLT on the Solr server, I'd 

I think Otis's point was that most people don't have Solr XML files lying 
around that they send to Solr, nor do they build up XML strings in Java 
in the Solr input format (with XSLT or otherwise) ... most people using 
SolrJ build up SolrInputDocument objects and pass those to their 
SolrServer instance.

: I've done some research and I'm fairly confident that apache 
: commons-fileupload library is responsible for the temp files.  There's 

I believe you are correct ... searching for "solr fileupload temp files" 
led me to this issue which seems to have fallen by the wayside...

https://issues.apache.org/jira/browse/SOLR-1953

...if you could try that patch out and/or post your comments it would be 
helpful.

Something that seems really odd to me however is how/why your basic 
updates are even causing multipart/file-upload functionality to be used 
... a quick skim of the client code suggests that that should only happen 
if you try to send multiple ContentStreams in a single request: I can 
understand why that wouldn't typically happen for most users building up 
multiple SolrInputDocuments (they would get added to a single stream); and 
i can understand why that would typically happen for users sending 
multiple binary files to something like ExtractingRequestHandler -- but if 
you are using DirectXmlRequest in the way you described each xml file 
should be sent as a single stream in a single request and the XML should 
be sent in the raw POST body -- the commons-fileupload code shouldn't even 
come into play.  (either that, or i'm missing something, or you're using 
an older version of solr that used fileupload even if there was only a 
single content stream)


-Hoss


RE: Search strategy - improving search quality for short search terms such as "doll"

2013-01-16 Thread David Parks
Thanks for the recommendation. I'll start this book today.

In my example, "doll" is one example of a million I might only guess at, 
whereas the categories "music" and "book" tend to interfere in many places and 
seem to be a more limited set of categories to deal with.

Dave


-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Thursday, January 17, 2013 12:01 AM
To: solr-user@lucene.apache.org
Subject: Re: Search strategy - improving search quality for short search terms 
such as "doll"

Sounds like 'Doll' could be a category for you, while "Doll face" is a title. 
Maybe the categories should get a higher boost in eDismax definition over the 
titles?

Related, you may find the following book interesting:
http://rosenfeldmedia.com/books/searchanalytics/

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. 
Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Wed, Jan 16, 2013 at 4:40 AM, David Parks  wrote:

> I'm a beginner-intermediate solr admin, I've set up the basics for our 
> application and it runs well.
>
>
>
> Now it's time for me to dig in and start tuning and improving queries.
>
>
>
> My next target is searches on simple terms such as "doll" which, in 
> google, would return documents about, well, "toy dolls", because 
> that's the most common usage of the simple term "doll". But in my 
> index it predominantly returns documents about CDs with the song "Doll Face", 
> and "My baby doll" in them.
>
>
>
> I'm not directly asking how to solve this as much as I'm asking what 
> direction I should be looking in to learn what I need to know to 
> tackle the general issue myself.
>
>
>
> Left on my own I would start looking at categorizing the CD's into a 
> facet called "music", reasonably doable in my dataset. Then I need to 
> reduce the boost-value of the entire facet/category of music unless 
> certain pre-defined query terms exist, such as [music, cd, song, 
> listen, dvd,  exhaustive list>, etc.].
>
>
>
> I don't yet know how to do all of this, but after a couple more good 
> books I should be "dangerous".
>
>
>
> So the question to this list:
>
>
>
> -  Am I on the right track here?  If not, can you point me in a
> direction to go?
>
>
>
>
>
>



RE: Search strategy - improving search quality for short search terms such as "doll"

2013-01-16 Thread David Parks
My issue is more that the search term doll shows up in both documents on CDs
as well as documents about toys. But I have 10 CD documents for every toy
document, so my searches for "doll" tend to show the CDs most prominently.
But that's not the way a user thinks. If they want the CD documents they'll
search for "doll face", or "doll face song", more specific queries (which
work fine), but if they want the toy they might just search for "doll".

If I run the searches "doll" and "doll song" on google image search you'll
clearly see that google has solved this problem perfectly. "doll" returns
toy dolls, and "doll song" returns music and anime results.

I'm striving for this type of result.
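
One way to sketch the "demote music unless the query asks for it" idea. This is
an illustration under assumptions, not a tested relevancy setup: the field name
category, the value music, the boost of 10 and the term list are all
hypothetical, and the boost query uses the common (*:* -X)^N idiom for lowering
the rank of documents matching X.

```python
# Hypothetical list of terms that signal the user wants music results.
MUSIC_TERMS = {"music", "cd", "song", "listen", "dvd"}


def boost_params(user_query: str) -> dict:
    """Build edismax params, demoting music unless the query mentions it."""
    params = {"q": user_query, "defType": "edismax"}
    words = set(user_query.lower().split())
    if not words & MUSIC_TERMS:
        # Boost every document that is NOT in the music category, which
        # effectively pushes music documents down the result list.
        params["bq"] = "(*:* -category:music)^10"
    return params


print(boost_params("doll"))
print(boost_params("doll song"))
```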



-Original Message-
From: Amit Jha [mailto:shanuu@gmail.com] 
Sent: Wednesday, January 16, 2013 11:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Search strategy - improving search quality for short search
terms such as "doll"

It's all about the data set, by which I mean the index. If you have documents
containing "toy" and "doll", it will return those in the result set.

As I understand it, you are talking about the context of the query. For
example, "books on MK Gandhi" and "books by MK Gandhi" are queries with
different contexts.

Context-based search is achieved, at some level, by natural language
processing. You can look into that for better search.

The Solr wiki and mailing list would be a great source of learning.


Rgds
AJ

On 16-Jan-2013, at 15:10, "David Parks"  wrote:

> I'm a beginner-intermediate solr admin, I've set up the basics for our 
> application and it runs well.
> 
> 
> 
> Now it's time for me to dig in and start tuning and improving queries.
> 
> 
> 
> My next target is searches on simple terms such as "doll" which, in 
> google, would return documents about, well, "toy dolls", because 
> that's the most common usage of the simple term "doll". But in my 
> index it predominantly returns documents about CDs with the song "Doll 
> Face", and "My baby doll" in them.
> 
> 
> 
> I'm not directly asking how to solve this as much as I'm asking what 
> direction I should be looking in to learn what I need to know to 
> tackle the general issue myself.
> 
> 
> 
> Left on my own I would start looking at categorizing the CD's into a 
> facet called "music", reasonably doable in my dataset. Then I need to 
> reduce the boost-value of the entire facet/category of music unless 
> certain pre-defined query terms exist, such as [music, cd, song, 
> listen, dvd, , etc.].
> 
> 
> 
> I don't yet know how to do all of this, but after a couple more good 
> books I should be "dangerous".
> 
> 
> 
> So the question to this list:
> 
> 
> 
> -  Am I on the right track here?  If not, can you point me in a
> direction to go?
> 
> 
> 
> 
> 



Re: SolrCloud-Master-Slave hybrid via additional replication handler on SolrCloud nodes?

2013-01-16 Thread Mark Miller

On Jan 15, 2013, at 10:59 AM, Otis Gospodnetic  
wrote:

> Hi,
> 
> Question:
> Can one add the Solr master-like replication handler (but not call it
> /replication, yes) to SolrCloud nodes and point additional slave-like
> servers (i.e. servers that are not in the SolrCloud cluster) to that?
> 
> More info:
> I have a 4-node SolrCloud cluster and would like to add 2 more servers to
> the picture -- not as members of the cluster, but as slaves that replicate
> the collection from some SolrCloud nodes periodically.  In effect, a hybrid
> Cloud-Master-Slave setup. :)

Yeah, theoretically it should work, but I've never tested it to see if there is 
a hitch.

> 
> Reason:
> I have some heavy machine-triggered queries that I want to separate from
> lighter human-entered queries.  It looks like I cannot increase the
> replication factor for the collection because the collection was created
> via the API, not via solr.xml, so I can't easily reconfigure the
> collection.  

You could just use the CoreAdmin API to create new replicas on whatever nodes.

- Mark
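
A sketch of what such a CoreAdmin call could look like. The node URL, core
name, collection and shard names below are placeholders (assumptions), and the
helper create_replica_url is hypothetical; the URL is only built, not
requested.

```python
from urllib.parse import urlencode


def create_replica_url(node_base, core_name, collection, shard):
    """Build a CoreAdmin CREATE URL that adds a core to an existing shard."""
    params = {
        "action": "CREATE",
        "name": core_name,         # name of the new core on that node
        "collection": collection,  # existing collection to join
        "shard": shard,            # shard the new replica belongs to
    }
    return f"{node_base}/admin/cores?" + urlencode(params)


url = create_replica_url(
    "http://localhost:8983/solr",
    "collection1_shard1_replica3",
    "collection1",
    "shard1",
)
print(url)
```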



Re: Disable term frequency for some fields in solr

2013-01-16 Thread Amit Jha
It will affect phrase queries. That is why I am not using the suggested
configuration.

On Thu, Jan 17, 2013 at 7:20 AM, Chris Hostetter
wrote:

>
> : Or there is some other way to do that?
>
> I'm late to this thread, but what was wrong with the simple suggestion of
> omitTermFreqAndPositions="true" ?
>
>
> -Hoss
>


Solr commit taking too long

2013-01-16 Thread Cool Techi
Hi,

We have an index of approximately 400GB in size. Indexing 5000 documents used 
to take 20 seconds, but lately indexing is taking very long: committing the 
same number of documents takes 5-20 minutes. 

On checking the logs I can see that there are frequent merges happening, which 
I am guessing is the reason for this. How can this be improved? My 
configurations are given below,

false
30
64

regards,
Ayush
  

Large data importing getting rollback with solr

2013-01-16 Thread ashimbose
I am trying to index a large data set (not rich documents), about 5GB, but it
is not getting indexed. Small data sets index perfectly.

For large data, the import XML response is:

00  data-config.xml
full-import  busy  A command is still running...
0:9:12.738169   1810790
2013-01-17 12:50:13
Indexing failed. Rolled back all changes.
2013-01-17 12:50:30
This response format is experimental.  It is likely to change in the future.

But for small data, the import XML response is perfectly OK, as below:

00  data-config.xml
full-import  busy  A command is still running...
0:0:12.43611   382090
2013-01-17 12:56:57
Indexing completed. Added/Updated: 38209 documents. Deleted 0 documents.
This response format is experimental.  It is likely to change in the future.

For large data, the error log is as below. It is getting rolled back:

INFO: Time taken for getConnection(): 1343
Jan 17, 2013 12:36:21 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity PS_JOBCODE_HAZ_BRA with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
Jan 17, 2013 12:36:23 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 1341
Jan 17, 2013 12:36:23 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity PS_JOBCODE_HAZ_TBL with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
Jan 17, 2013 12:36:24 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 1357
Jan 17, 2013 12:36:24 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity PS_JOBCODE_LANG with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
Jan 17, 2013 12:36:26 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 1392
Jan 17, 2013 12:36:26 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity PS_JOBCODE_TBL with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
Jan 17, 2013 12:36:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 1535
Jan 17, 2013 12:36:41 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity PS_JOBCODE_TBL_ARG with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
Jan 17, 2013 12:36:43 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 1467
Jan 17, 2013 12:36:43 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity PS_JOBCODE_TBL_BRA with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
Jan 17, 2013 12:36:44 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 1373
Jan 17, 2013 12:36:44 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity PS_JOBCOMP_TMP_MC with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
Jan 17, 2013 12:36:45 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 1404
Jan 17, 2013 12:36:45 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity PS_JOBFUNCTION_LNG with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
Jan 17, 2013 12:36:47 PM org.apache.solr.core.SolrCore execute
INFO: [core1] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=0
Jan 17, 2013 12:36:47 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 1357
Jan 17, 2013 12:36:47 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity PS_JOBFUNCTION_TBL with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
Jan 17, 2013 12:36:48 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 1310
Jan 17, 2013 12:36:48 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity PS_JOB_APPROVALS with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
Jan 17, 2013 12:36:50 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 1342
Jan 17, 2013 12:36:50 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity PS_JOB_AUS with URL: jdbc:attconnect://ESMART12:2551/NAVIGATOR;DefTdpName=sampleDB
Jan 17, 2013 12:36:53 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 2979
Jan 17, 2013 12:36:54 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity PS_JOB_CTG_FRA_LNG with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdp