RE: date range tree

2013-11-13 Thread Andreas Owen
I solved it by adding a loop for years and one for quarters in which I count
the month facets.
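
A hedged SolrJ sketch of that loop approach: one facet.query per year and one
per quarter over the last_modified field from the quoted config below, so all
the tree counts come back in a single request. The host URL and year range are
illustrative, and the exclusive upper bound ("}") assumes a Solr 4.x query
parser.

// Assumed imports (SolrJ 4.x): org.apache.solr.client.solrj.SolrQuery,
// org.apache.solr.client.solrj.impl.HttpSolrServer,
// org.apache.solr.client.solrj.response.QueryResponse
HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
SolrQuery q = new SolrQuery("*:*");
q.setRows(0);
q.setFacet(true);
for (int year = 2011; year <= 2013; year++) {
  // one bucket per year
  q.addFacetQuery(String.format(
      "last_modified:[%d-01-01T00:00:00Z TO %d-01-01T00:00:00Z}", year, year + 1));
  // one bucket per quarter; the month buckets still come from the facet.date config
  for (int quarter = 0; quarter < 4; quarter++) {
    int startMonth = quarter * 3 + 1;
    String start = String.format("%d-%02d-01T00:00:00Z", year, startMonth);
    String end = (quarter == 3)
        ? String.format("%d-01-01T00:00:00Z", year + 1)
        : String.format("%d-%02d-01T00:00:00Z", year, startMonth + 3);
    q.addFacetQuery("last_modified:[" + start + " TO " + end + "}");
  }
}
QueryResponse rsp = solr.query(q);
System.out.println(rsp.getFacetQuery());  // facet query string -> count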

-Original Message-
From: Andreas Owen [mailto:a...@conx.ch] 
Sent: Montag, 11. November 2013 17:52
To: solr-user@lucene.apache.org
Subject: RE: date range tree

Has someone at least got an idea how I could do a year/month date tree?

In the Solr wiki it is mentioned that facet.date.gap=+1DAY,+2DAY,+3DAY,+10DAY
should create 4 buckets, but it doesn't work.


-Original Message-
From: Andreas Owen [mailto:a...@conx.ch]
Sent: Donnerstag, 7. November 2013 18:23
To: solr-user@lucene.apache.org
Subject: date range tree

I would like to make a facet on a date field with the following tree:

 

2013
    4. Quarter
        December
        November
        October
    3. Quarter
        September
        August
        July
    2. Quarter
        June
        May
        April
    1. Quarter
        March
        February
        January
2012
    (same as above)

 

 

So far I have this in solrconfig.xml:

 

<str name="facet.date">{!ex=last_modified,thema,inhaltstyp,doctype}last_modified</str>
<str name="facet.date.gap">+1MONTH</str>
<str name="facet.date.end">NOW/MONTH</str>
<str name="facet.date.start">NOW/MONTH-36MONTHS</str>
<str name="facet.date.other">after</str>

Can I do this in one query or do I need multiple queries? If multiple, how
would I do the second one and keep all the facet queries in the count?




Re: serialization error - BinaryResponseWriter

2013-11-13 Thread giovanni.bricc...@banzai.it
Mhhh, I run a DIH full reload every night, and the source field is a
SQL Server smallint column...


By the way I'll try cleaning the data dir of the index and reindexing

Il 12/11/13 17:13, Shawn Heisey ha scritto:

On 11/12/2013 2:37 AM, giovanni.bricc...@banzai.it wrote:

I'm getting some errors reading boolean fields; can you give me any
suggestions? In this example I only have four "false" fields:
leasing=false, FiltroNovita=false, FiltroFreeShipping=false, Outlet=false.

this is the stack trace (solr 4.2.1)

java.lang.NumberFormatException: For input string: "false"
 at
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

 at java.lang.Integer.parseInt(Integer.java:492)
 at java.lang.Integer.valueOf(Integer.java:582)
 at org.apache.solr.schema.IntField.toObject(IntField.java:89)
 at org.apache.solr.schema.IntField.toObject(IntField.java:43)
 at
org.apache.solr.response.BinaryResponseWriter$Resolver.getValue(BinaryResponseWriter.java:223)

Solr stores boolean values internally as a number - 0 or 1.  That gets
changed to true/false when displaying search results.

It sounds like what you have here is quite possibly an index which
originally had text fields with the literal string "true" or "false",
and you've changed your schema so these fields are now boolean.  When
you change your schema, you have to reindex.

http://wiki.apache.org/solr/HowToReindex

Thanks,
Shawn





Re: Multi-Tenant Setup in Single Core

2013-11-13 Thread Christian Ramseyer
On 11/12/13 5:20 PM, Shawn Heisey wrote:
> Ensure that all handler names start with a slash character, so they are
> things like "/query", "/select", and so on.  Make sure that handleSelect
> is set to false on your requestDispatcher config.  This is how Solr 4.x
> examples are set up already.
> 
> With that config, the "qt" parameter will not function and will be
> ignored -- you must use the request handler path as part of the URL --
> /solr/corename/handler.


Great, thanks. I already had it this way, but I wasn't aware of these fine
details; very helpful.

Christian
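
For reference, a hedged SolrJ 4.x sketch of the same rule from the client
side: with handleSelect=false the handler is selected by path rather than with
qt. The URL and handler name are illustrative.

// Assumed imports (SolrJ 4.x): org.apache.solr.client.solrj.SolrQuery,
// org.apache.solr.client.solrj.impl.HttpSolrServer
HttpSolrServer solr = new HttpSolrServer("http://localhost:8080/solr/corename");
SolrQuery q = new SolrQuery("*:*");
q.setRequestHandler("/query");  // leading slash: sent as the URL path, not as a qt param
System.out.println(solr.query(q).getResults().getNumFound());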




Re: Modify the querySearch to q=*:*

2013-11-13 Thread Alvaro Cabrerizo
Hi:

First of all, I have to say that I had never heard of *\* as the query to
get all the documents in an index - only *:* (maybe I'm wrong). Re-reading
"Apache Solr 4 Cookbook", "Solr 1.4 Enterprise Search Server" and "Apache
Solr 3 Enterprise Search Server", there is no trace of the query *\* as the
universal query to get every doc.

If you enable debugQuery you can see that *:* is transformed into
MatchAllDocsQuery(*:*) (Solr 1.4 and Solr 4.4), which means "give me all the
documents". The query *\* is transformed into something else: in my case,
having a default field called description defined in the schema, in Solr 1.4
I get description:*\\*, which means "give me all the documents that have the
char \ in the field description", and in Solr 4.4 I get description:**,
which also gets all the documents in the index. It would be helpful to see
how *\* is interpreted in your system (Solr 3.5 and Solr 4).

I think the best way to solve your problem is to modify the system which
launches the request to Solr, replacing *\* with *:* (if that is possible);
I don't know of a way for Solr itself to make that kind of translation. One
possible workaround, with collateral damage, is to include a
PatternReplaceCharFilterFactory (in schema.xml) within the field types you
use for search, in order to delete every \ character from the input, or even
to include an expression that transforms *\* into *:*. But including that
element in your schema means that it will always be applied during your
searches (thus if your users type a\b they will search for ab). If you want
to explore that path, I recommend you use the analysis tool included in
Solr.

Regards.
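
A minimal sketch of that first suggestion, assuming the system launching the
request is Java and that userQuery (illustrative) holds whatever the upstream
system produced:

// Rewrite the legacy match-all token before the request ever reaches Solr.
// "*\\*" is the Java string literal for the three-character query *\*.
String q = "*\\*".equals(userQuery.trim()) ? "*:*" : userQuery;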

On Wed, Nov 13, 2013 at 2:34 AM, Shawn Heisey  wrote:

> On 11/12/2013 6:03 PM, Abhijith Jain -X (abhijjai - DIGITAL-X INC at
> Cisco) wrote:
>
>> I am trying to set the query to q=*:* permanently. I tried to set q=*:*
>> in SolrConfig.xml file as follows.
>>
>> <requestHandler name="standard" class="solr.SearchHandler" default="true">
>>   <lst name="defaults">
>>     <str name="echoParams">none</str>
>>     <str name="q">*:*</str>
>>   </lst>
>> </requestHandler>
>>
>> But this didn’t help. Please advise how to change query to q=*:* in Solr
>> 4.4.
>>
>
> This configuration sets the default for the q parameter to *:*, but if the
> actual query that is sent to Solr has a q parameter, it will override that
> default.
>
> In the very unlikely situation that you don't want to ever do any query
> besides *:*, you can put that setting into the invariants section instead
> of the defaults section - but be aware that if you do that, you will never
> be able to send any other query. Normally your application decides what the
> query string should be, not Solr.
>
> I concur with Jack's recommendation that you migrate to the 4.x way of
> naming handlers.  You would need to set handleSelect to false and change
> all your search handlers so their name starts with a slash.  The one that
> is currently named "standard" would instead be named "/select" and you
> would need to remove the default="true" setting.
>
> Thanks,
> Shawn
>
>


Re: solrcloud - forward update to a shard failed

2013-11-13 Thread michael.boom
Do you do your commit from the two indexing clients or have the autocommit
set to maxDocs = 1000 ?



-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrcloud-forward-update-to-a-shard-failed-tp4100608p4100633.html
Sent from the Solr - User mailing list archive at Nabble.com.


Updating Document Score With Payload of Multivalued Field?

2013-11-13 Thread Furkan KAMACI
Here is my case;

I have a field in my schema named *elmo_field*. I want *elmo_field* to
have multiple values and multiple payloads, i.e.:

dorothy|0.46
sesame|0.37
big bird|0.19
bird|0.22

When a user searches for a keyword, e.g. *dorothy*, I want to add 0.46 to
the score. If the user searches for *big bird*, 0.19; and if the user
searches for *bird*, 0.22.

I mean, I will make a search on my index over the other fields of my Solr
schema. At the same time I will make another search (this one an exact-match
search) on *elmo_field*, and if it matches something I will increase the
score with the payloads.

How can I do that: adding something to the score from a multivalued payload
(with a nested query or not)? And do you have any other ideas for achieving
this?


Re: Updating Document Score With Payload of Multivalued Field?

2013-11-13 Thread Furkan KAMACI
PS: I use Solr 4.5.1


2013/11/13 Furkan KAMACI 

> Here is my case;
>
> I have a field at my schema named *elmo_field*. I want that *elmo_field* 
> should
> have multiple values and multiple payloads. i.e.
>
> dorothy|0.46
> sesame|0.37
> big bird|0.19
> bird|0.22
>
> When a user searches for a keyword i.e. *dorothy* I want to add 0.46 to
> score. If user searches for *big bird *0.19 and if user searches for *bird
> *0.22
>
> I mean I will make a search on my index at my other fields of solr schema.
>  And I will make another search (this one is an exact match search) at
> *elmo_field* at same time and if matches something I will increase score
> with payloads.
>
> How can I do that: adding something to score at multivalued payload (with
> a nested query or not) and do you have any other ideas to achieve that?
>
>
>
>


Re: Why do people want to deploy to Tomcat?

2013-11-13 Thread Alexandre Rafalovitch
So it sounds like either Solr is treated as a webapp, in which case it is
installed alongside the other webapps under Tomcat (for legacy/operational
reasons), and the Solr docs just need to explain how to deploy under Tomcat
while the rest of the documentation/tooling comes from the Tomcat community.

Or Solr is treated not as a webapp but as a black box, in which case it
needs to support and explain all the operational requirements (deployment,
extension, monitoring) that are currently waved away as a 'container
issue'.

Regards,
   Alex.
P.s. I also agree that the example directory layout has become very confusing
and may need to be re-thought. Probably a discussion for a different
thread, if somebody has a thought-out suggestion.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Nov 12, 2013 at 8:32 PM, Gopal Patwa  wrote:

> My case is also similar to "Sujit Pal" but we have jboss6.
>
>
> On Tue, Nov 12, 2013 at 9:47 AM, Sujit Pal  wrote:
>
> > In our case, it is because all our other applications are deployed on
> > Tomcat and ops is familiar with the deployment process. We also had
> > customizations that needed to go in, so we inserted our custom JAR into
> the
> > solr.war's WEB-INF/lib directory, so to ops the process of deploying Solr
> > was (almost, except for schema.xml or solrconfig.xml changes) identical
> to
> > any of the other apps. But I think if Solr becomes a server with clearly
> > defined extension points (such as dropping your custom JARs into lib/ and
> > custom configuration in conf/solrconfig.xml or similar like it already
> is)
> > then it will be treated as something other than a webapp and the
> > expectation that it runs on Tomcat will not apply.
> >
> > Just my $0.02...
> >
> > Sujit
> >
> >
> >
> > On Tue, Nov 12, 2013 at 9:13 AM, Siegfried Goeschl 
> > wrote:
> >
> > > Hi ALex,
> > >
> > > in my case
> > >
> > > * ignorance that Tomcat is not fully supported
> > > * Tomcat configuration and operations know-how inhouse
> > > * could migrate to Jetty but need approved change request to do so
> > >
> > > Cheers,
> > >
> > > Siegfried Goeschl
> > >
> > > On 12.11.13 04:54, Alexandre Rafalovitch wrote:
> > >
> > >> Hello,
> > >>
> > >> I keep seeing here and on Stack Overflow people trying to deploy Solr
> to
> > >> Tomcat. We don't usually ask why, just help when where we can.
> > >>
> > >> But the question happens often enough that I am curious. What is the
> > >> actual
> > >> business case. Is that because Tomcat is well known? Is it because
> other
> > >> apps are running under Tomcat and it is ops' requirement? Is it
> because
> > >> Tomcat gives something - to Solr - that Jetty does not?
> > >>
> > >> It might be useful to know. Especially, since Solr team is considering
> > >> making the server part into a black box component. What use cases will
> > >> that
> > >> break?
> > >>
> > >> So, if somebody runs Solr under Tomcat (or needed to and gave up),
> let's
> > >> use this thread to collect this knowledge.
> > >>
> > >> Regards,
> > >> Alex.
> > >> Personal website: http://www.outerthoughts.com/
> > >> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > >> - Time is the quality of nature that keeps events from happening all
> at
> > >> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> > book)
> > >>
> > >>
> >
>


(info) lucene first search performance

2013-11-13 Thread Jacky.J.Wang (mis.cnsh04.Newegg) 41361


Dear lucene


In order to test Solr search performance, I closed all the Solr caches:

[inline screenshot of the cache configuration omitted]

I inserted 10 million documents and found that the first search is very slow
(700ms) while the second search is very quick (20ms). I am sure no Solr
cache is involved.

This problem has been bothering me for a month.

Tracing the source code, I found:

[inline screenshot of the traced code omitted]

The first invocation of the readVIntBlock method is always very slow, while
the second invocation is very quick. I don't know the reason for this.



Eagerly awaiting your reply, thanks very much!!!




Re: Why do people want to deploy to Tomcat?

2013-11-13 Thread Dmitry Kan
Hi,

Reading that people have considered deploying the "example" folder is slightly
strange to me. No wonder they are confused and confuse their ops. We just
took vanilla Jetty (Jetty 9), installed solr.war on it, and configured it - no
example folders at all. It has worked nicely ever since.

The main reason for us to get away from Tomcat, which we had used
originally, was that it felt too heavy for running a Solr webapp, which
isn't using anything Tomcat-specific. In older versions (Tomcat 6) it would
leak memory and threads. We knew that Jetty was mature enough, lighter, and
used at large companies like Google. This was convincing enough to try.

We are still using Tomcat for other webapps, specifically for clustering
and load balancing between webapp instances, but that is not needed for our
Solr installation at this point.

Regards,

Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan



On Wed, Nov 13, 2013 at 1:42 PM, Alexandre Rafalovitch
wrote:

> So, it sounds like that either Solr is treated as a webapp, in which case
> it is installed with most of the webapps under Tomcat (legacy/operational
> reason). So, Solr docs just needs to explain how to deploy under Tomcat and
> the rest of document/tooling comes from Tomcat community.
>
> Or, if Solr is treated not as a webapp but as a black box, it needs to
> support and explain all the operational requirements (deployment,
> extension, monitoring) that are currently waved away as a 'container
> issue'.
>
> Regards,
>Alex.
> P.s. I also agree that example directory layout is become very confusing
> and may need to be re-thought. Probably a discussion for a different
> thread, if somebody has a thought out suggestion.
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>
>
> On Tue, Nov 12, 2013 at 8:32 PM, Gopal Patwa  wrote:
>
> > My case is also similar to "Sujit Pal" but we have jboss6.
> >
> >
> > On Tue, Nov 12, 2013 at 9:47 AM, Sujit Pal 
> wrote:
> >
> > > In our case, it is because all our other applications are deployed on
> > > Tomcat and ops is familiar with the deployment process. We also had
> > > customizations that needed to go in, so we inserted our custom JAR into
> > the
> > > solr.war's WEB-INF/lib directory, so to ops the process of deploying
> Solr
> > > was (almost, except for schema.xml or solrconfig.xml changes) identical
> > to
> > > any of the other apps. But I think if Solr becomes a server with
> clearly
> > > defined extension points (such as dropping your custom JARs into lib/
> and
> > > custom configuration in conf/solrconfig.xml or similar like it already
> > is)
> > > then it will be treated as something other than a webapp and the
> > > expectation that it runs on Tomcat will not apply.
> > >
> > > Just my $0.02...
> > >
> > > Sujit
> > >
> > >
> > >
> > > On Tue, Nov 12, 2013 at 9:13 AM, Siegfried Goeschl 
> > > wrote:
> > >
> > > > Hi ALex,
> > > >
> > > > in my case
> > > >
> > > > * ignorance that Tomcat is not fully supported
> > > > * Tomcat configuration and operations know-how inhouse
> > > > * could migrate to Jetty but need approved change request to do so
> > > >
> > > > Cheers,
> > > >
> > > > Siegfried Goeschl
> > > >
> > > > On 12.11.13 04:54, Alexandre Rafalovitch wrote:
> > > >
> > > >> Hello,
> > > >>
> > > >> I keep seeing here and on Stack Overflow people trying to deploy
> Solr
> > to
> > > >> Tomcat. We don't usually ask why, just help when where we can.
> > > >>
> > > >> But the question happens often enough that I am curious. What is the
> > > >> actual
> > > >> business case. Is that because Tomcat is well known? Is it because
> > other
> > > >> apps are running under Tomcat and it is ops' requirement? Is it
> > because
> > > >> Tomcat gives something - to Solr - that Jetty does not?
> > > >>
> > > >> It might be useful to know. Especially, since Solr team is
> considering
> > > >> making the server part into a black box component. What use cases
> will
> > > >> that
> > > >> break?
> > > >>
> > > >> So, if somebody runs Solr under Tomcat (or needed to and gave up),
> > let's
> > > >> use this thread to collect this knowledge.
> > > >>
> > > >> Regards,
> > > >> Alex.
> > > >> Personal website: http://www.outerthoughts.com/
> > > >> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > > >> - Time is the quality of nature that keeps events from happening all
> > at
> > > >> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> > > book)
> > > >>
> > > >>
> > >
> >
>


Re: distributed search is significantly slower than direct search

2013-11-13 Thread Erick Erickson
One thing you can try, and this is more diagnostic than a cure, is to return
just the id field (and ensure that lazy field loading is true). That'll tell
you whether the issue is actually fetching the document off disk and
decompressing, although frankly that's unlikely since you can get your 5,000
rows from a single machine quickly.
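
A hedged SolrJ sketch of that diagnostic, reusing the URLs quoted below;
fl=id keeps stored-field fetching and decompression out of the measurement:

// Assumed imports (SolrJ 4.x): org.apache.solr.client.solrj.SolrQuery,
// org.apache.solr.client.solrj.impl.HttpSolrServer
HttpSolrServer solr = new HttpSolrServer("http://127.0.0.1:8983/solr/template");
SolrQuery q = new SolrQuery("*:*");
q.setRows(5000);
q.setFields("id");                             // return only the uniqueKey field
q.set("shards", "127.0.0.1:8983/solr/core1");  // same routing as the slow case
System.out.println("QTime=" + solr.query(q).getQTime());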

The code you found where Solr is spending its time - is that on the
"routing" core or on the shards? I actually have a hard time understanding
how that code could take a long time; it doesn't seem right.

You are transferring 5,000 docs across the network, so it's possible that
your network is just slow, that's certainly a difference between the local
and remote case, but that's a stab in the dark.

Not much help I know,
Erick



On Wed, Nov 13, 2013 at 2:52 AM, Elran Dvir  wrote:

> Erick, Thanks for your response.
>
> We are upgrading our system using Solr.
> We need to preserve old functionality.  Our client displays 5K document
> and groups them.
>
> Is there a way to refactor code in order to improve distributed documents
> fetching?
>
> Thanks.
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, October 30, 2013 3:17 AM
> To: solr-user@lucene.apache.org
> Subject: Re: distributed search is significantly slower than direct search
>
> You can't. There will inevitably be some overhead in the distributed case.
> That said, 7 seconds is quite long.
>
> 5,000 rows is excessive, and probably where your issue is. You're having
> to go out and fetch the docs across the wire. Perhaps there is some
> batching that could be done there, I don't know whether this is one
> document per request or not.
>
> Why 5K docs?
>
> Best,
> Erick
>
>
> On Tue, Oct 29, 2013 at 2:54 AM, Elran Dvir  wrote:
>
> > Hi all,
> >
> > I am using Solr 4.4 with multi cores. One core (called template) is my
> > "routing" core.
> >
> > When I run
> > http://127.0.0.1:8983/solr/template/select?rows=5000&q=*:*&shards=127.
> > 0.0.1:8983/solr/core1,
> > it consistently takes about 7s.
> > When I run http://127.0.0.1:8983/solr/core1/select?rows=5000&q=*:*, it
> > consistently takes about 40ms.
> >
> > I profiled the distributed query.
> > This is the distributed query process (I hope the terms are accurate):
> > When solr identifies a distributed query, it sends the query to the
> > shard and get matched shard docs.
> > Then it sends another query to the shard to get the Solr documents.
> > Most time is spent in the last stage in the function "process" of
> > "QueryComponent" in:
> >
> > for (int i=0; i > int id = req.getSearcher().getFirstMatch(
> > new Term(idField.getName(),
> > idField.getType().toInternal(idArr.get(i;
> >
> > How can I make my distributed query as fast as the direct one?
> >
> > Thanks.
> >
>
>
> Email secured by Check Point
>


Re: (info) lucene first search performance

2013-11-13 Thread fbrisbart
Solr uses the MMap Directory by default.

What you see is surely a filesystem cache.
Once a file is accessed, it's memory mapped.
Restarting solr won't reset it.


On unix, you may reset this cache with 
  echo 3 > /proc/sys/vm/drop_caches


Franck Brisbart


On Wednesday, 13 November 2013 at 11:58 +, Jacky.J.Wang
(mis.cnsh04.Newegg) 41361 wrote:
>
> Dear lucene
>
> In order to test Solr search performance, I closed all the Solr caches
> [inline screenshot omitted] and inserted 10 million documents. The first
> search is very slow (700ms) while the second search is very quick (20ms),
> and I am sure no Solr cache is involved.
>
> This problem has been bothering me for a month.
>
> Tracing the source code, I found [inline screenshot omitted]: the first
> invocation of the readVIntBlock method is always very slow, while the
> second invocation is very quick. I don't know the reason for this.
>
> Eagerly awaiting your reply, thanks very much!!!
> 
>  
> 
>  
> 
> 




Re: solrcloud - forward update to a shard failed

2013-11-13 Thread Aileen
Explicit commits after writing 1000 docs in a batch from both indexing clients. 
 No auto commit.

Thanks.

> 
> -Original Message
> 
> Do you do your commit from the two indexing clients or have the autocommit 
> set to maxDocs = 1000 ?
> 
> 
> 
> -
> Thanks,
> Michael
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/solrcloud-forward-update-to-a-shard-failed-tp4100608p4100633.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: (info) lucene first search performance

2013-11-13 Thread Erick Erickson
I have to ask a different question: Why would you disable
the caches? You're trying to test worst-case times perhaps?

Because the caches are an integral part of Solr performance.
Disabling them artificially reduces your performance
numbers. So disabling them is useful for answering the question
"how bad can it get", but it's also skewing your results

FWIW,
Erick


On Wed, Nov 13, 2013 at 7:42 AM, fbrisbart wrote:

> Solr uses the MMap Directory by default.
>
> What you see is surely a filesystem cache.
> Once a file is accessed, it's memory mapped.
> Restarting solr won't reset it.
>
>
> On unix, you may reset this cache with
>   echo 3 > /proc/sys/vm/drop_caches
>
>
> Franck Brisbart
>
>
> On Wednesday, 13 November 2013 at 11:58 +, Jacky.J.Wang
> (mis.cnsh04.Newegg) 41361 wrote:
> >
> > Dear lucene
> >
> > In order to test Solr search performance, I closed all the Solr caches
> > [inline screenshot omitted] and inserted 10 million documents. The first
> > search is very slow (700ms) while the second search is very quick
> > (20ms), and I am sure no Solr cache is involved.
> >
> > This problem has been bothering me for a month.
> >
> > Tracing the source code, I found [inline screenshot omitted]: the first
> > invocation of the readVIntBlock method is always very slow, while the
> > second invocation is very quick. I don't know the reason for this.
> >
> > Eagerly awaiting your reply, thanks very much!!!
> >
> >
> >
> >
> >
> >
>
>
>


Re: Modify the querySearch to q=*:*

2013-11-13 Thread Jack Krupansky
Just in case anybody is curious what *\* would really mean: the backslash 
escapes the following character, which in this case means "don't treat the 
second asterisk as a wildcard". But the initial asterisk was not escaped, 
and the full rule is that if there is any unescaped wildcard in a term, then 
all of the escaped wildcards are treated as unescaped, since Lucene has no 
support for escaping in WildcardQuery. So any escaping of wildcards in the 
term is ignored, *\* is treated as **, and ** is redundant and matches the 
same as *. A *\* query would therefore simply match all documents that have 
a value in the default search field. In many cases this would give identical 
results to a *:* query, but in some apps it might not.


Still, it would be nice to know who originated this suggestion to use *\* 
instead of *:* - or even simply *.


-- Jack Krupansky

-Original Message- 
From: Alvaro Cabrerizo

Sent: Wednesday, November 13, 2013 4:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Modify the querySearch to q=*:*

Hi:

First of all, I have to say that I had never heard of *\* as the query to
get all the documents in an index - only *:* (maybe I'm wrong). Re-reading
"Apache Solr 4 Cookbook", "Solr 1.4 Enterprise Search Server" and "Apache
Solr 3 Enterprise Search Server", there is no trace of the query *\* as the
universal query to get every doc.

If you enable debugQuery you can see that *:* is transformed into
MatchAllDocsQuery(*:*) (Solr 1.4 and Solr 4.4), which means "give me all the
documents". The query *\* is transformed into something else: in my case,
having a default field called description defined in the schema, in Solr 1.4
I get description:*\\*, which means "give me all the documents that have the
char \ in the field description", and in Solr 4.4 I get description:**,
which also gets all the documents in the index. It would be helpful to see
how *\* is interpreted in your system (Solr 3.5 and Solr 4).

I think the best way to solve your problem is to modify the system which
launches the request to Solr, replacing *\* with *:* (if that is possible);
I don't know of a way for Solr itself to make that kind of translation. One
possible workaround, with collateral damage, is to include a
PatternReplaceCharFilterFactory (in schema.xml) within the field types you
use for search, in order to delete every \ character from the input, or even
to include an expression that transforms *\* into *:*. But including that
element in your schema means that it will always be applied during your
searches (thus if your users type a\b they will search for ab). If you want
to explore that path, I recommend you use the analysis tool included in
Solr.

Regards.
On Wed, Nov 13, 2013 at 2:34 AM, Shawn Heisey  wrote:


On 11/12/2013 6:03 PM, Abhijith Jain -X (abhijjai - DIGITAL-X INC at
Cisco) wrote:


I am trying to set the query to q=*:* permanently. I tried to set q=*:*
in SolrConfig.xml file as follows.

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">none</str>
    <str name="q">*:*</str>
  </lst>
</requestHandler>

But this didn’t help. Please advise how to change query to q=*:* in Solr
4.4.



This configuration sets the default for the q parameter to *:*, but if the
actual query that is sent to Solr has a q parameter, it will override that
default.

In the very unlikely situation that you don't want to ever do any query
besides *:*, you can put that setting into the invariants section instead
of the defaults section - but be aware that if you do that, you will never
be able to send any other query. Normally your application decides what the
query string should be, not Solr.

I concur with Jack's recommendation that you migrate to the 4.x way of
naming handlers.  You would need to set handleSelect to false and change
all your search handlers so their name starts with a slash.  The one that
is currently named "standard" would instead be named "/select" and you
would need to remove the default="true" setting.

Thanks,
Shawn






Re: solrcloud - forward update to a shard failed

2013-11-13 Thread michael.boom
I did something like that also, and I was getting some nasty problems when
one of my clients would try to commit before a commit issued by another one
had finished. It might be the same problem for you too.

Try not doing explicit commits from the indexing clients and instead set the
autocommit to 1000 docs or whichever value fits you best.
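
A related option, not necessarily what is being suggested above: SolrJ's
commitWithin lets the indexing clients drop explicit commits without touching
the autocommit config. A hedged sketch; the URL and the batch-building helper
are illustrative:

// Assumed imports (SolrJ 4.x): java.util.Collection,
// org.apache.solr.client.solrj.impl.HttpSolrServer,
// org.apache.solr.common.SolrInputDocument
HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
Collection<SolrInputDocument> docs = buildBatch();  // illustrative helper
solr.add(docs, 10000);  // ask Solr to commit within 10s; no explicit commit() calls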




-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrcloud-forward-update-to-a-shard-failed-tp4100608p4100670.html
Sent from the Solr - User mailing list archive at Nabble.com.


SOLRJ API to do similar CURL command execution

2013-11-13 Thread Anupam Bhattacharya
I am able to perform the XML atomic update properly using curl commands.
However, the moment I try to achieve the same using the SolrJ APIs I am
facing problems.

What should be the equivalent SOLRJ api code to perform similar action
using the below CURL command ?

curl "http://search1.es.dupont.com:8080/solr/core1/update" -H
"Content-Type: text/xml" --data-binary "<add><doc><field
name=\"id\">uniqueid</field><field name=\"tags\"
update=\"add\">updatefieldvalue</field></doc></add>"

I have attempted the code below, but it fails to add the field in the proper
manner, as it gets set as {add=[updatefieldvalue]}.

QueryResponse qs2 = solr.query(params2);
Map<String, List<String>> operation = new HashMap<String, List<String>>();
List<String> vals = new ArrayList<String>();
vals.add(tag);
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", (String) qs2.getResults().get(j).get("id"));
operation.put("add", vals);
doc.addField("tags", operation);

Thanks in advance for any inputs.

Regards
Anupam


Re: SOLRJ API to do similar CURL command execution

2013-11-13 Thread Anupam Bhattacharya
How can I post the whole XML string to Solr using its SolrJ API?


On Wed, Nov 13, 2013 at 6:50 PM, Anupam Bhattacharya wrote:

> I am able to perform the xml atomic update properly using curl commands.
> However the moment I try to achieve the same using the solrj APIs I am
> facing problems.
>
> What should be the equivalent SOLRJ api code to perform similar action
> using the below CURL command ?
>
> curl "http://search1.es.dupont.com:8080/solr/core1/update" -H
> "Content-Type: text/xml" --data-binary "<add><doc><field
> name=\"id\">uniqueid</field><field name=\"tags\"
> update=\"add\">updatefieldvalue</field></doc></add>"
>
> I have attempted the code below, but it fails to add the field in the proper
> manner, as it gets set as {add=[updatefieldvalue]}.
>
> QueryResponse qs2 = solr.query(params2);
> Map<String, List<String>> operation = new HashMap<String, List<String>>();
> List<String> vals = new ArrayList<String>();
> vals.add(tag);
> SolrInputDocument doc = new SolrInputDocument();
> doc.addField("id", (String) qs2.getResults().get(j).get("id"));
> operation.put("add", vals);
> doc.addField("tags", operation);
>
> Thanks in advance for any inputs.
>
> Regards
> Anupam
>



-- 
Thanks & Regards
Anupam Bhattacharya


Updating an entry in Solr

2013-11-13 Thread gohome190
Hi,
I've been researching how to update a specific field of an entry in Solr,
and it seems like the only way to do this is a delete followed by an add. Is
there a better way to do this? If I want to change one field, do I have to
store the whole entry locally, delete it from the Solr index, and then add
it again with the new field? That seems like a big missing feature if so!

Thanks
Zach



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Updating-an-entry-in-Solr-tp4100674.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Updating an entry in Solr

2013-11-13 Thread gohome190
Okay, so I've found in the Solr tutorial that if you do a POST command and
post a new entry with the same uniqueKey (in my case, id_) as an entry
already in the index, Solr will automatically replace it for you. That
seems to be what I need, right?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Updating-an-entry-in-Solr-tp4100674p4100675.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLRJ API to do similar CURL command execution

2013-11-13 Thread Koji Sekiguchi

(13/11/13 22:25), Anupam Bhattacharya wrote:

How can I post the whole XML string to SOLR using its SOLRJ API ?




The source code of SimplePostTool would be of some help:

http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/util/SimplePostTool.html

koji
--
http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
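
One hedged way to send a raw XML body through SolrJ 4.x is DirectXmlRequest;
the URL and field values below simply mirror the question:

// Assumed imports: org.apache.solr.client.solrj.impl.HttpSolrServer,
// org.apache.solr.client.solrj.request.DirectXmlRequest
HttpSolrServer solr = new HttpSolrServer("http://search1.es.dupont.com:8080/solr/core1");
String xml = "<add><doc>"
    + "<field name=\"id\">uniqueid</field>"
    + "<field name=\"tags\" update=\"add\">updatefieldvalue</field>"
    + "</doc></add>";
solr.request(new DirectXmlRequest("/update", xml));  // same body the curl command sends
solr.commit();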


Re: Updating an entry in Solr

2013-11-13 Thread primoz . skale
Yes, that's correct. You can also update a document "per field", but all 
fields need to be stored=true, because Solr (version >= 4.0) first gets 
your document from the index, creates a new document with the modified 
field, and adds it again to the index...
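
A hedged SolrJ 4.x sketch of that per-field update: the modifier map ("set",
"add", "inc") is passed as the field's value. The field names and the solr
server variable are illustrative:

// Assumed imports: java.util.HashMap, java.util.Map,
// org.apache.solr.common.SolrInputDocument
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id_", "some-unique-id");      // the uniqueKey from this thread
Map<String, Object> change = new HashMap<String, Object>();
change.put("set", "new title value");       // "set" replaces the stored value
doc.addField("title", change);
solr.add(doc);                              // solr: an existing HttpSolrServer
solr.commit();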

Primoz



From:   gohome190 
To: solr-user@lucene.apache.org
Date:   13.11.2013 14:39
Subject:Re: Updating an entry in Solr



Okay, so I've found in the solr tutorial that if you do a POST command and
post a new entry with the same uniquekey (in my case, id_) as an entry
already in the index, solr will automatically replace it for you.  That
seems to be what I need, right?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Updating-an-entry-in-Solr-tp4100674p4100675.html

Sent from the Solr - User mailing list archive at Nabble.com.



RE: Data Import Handler

2013-11-13 Thread Ramesh
James, can you elaborate on how to process driver="${dataimporter.request.driver}"
and url="${dataimporter.request.url}", and where to put these? My purpose is
to configure my DB details (url, username, password) in a properties file.

-Original Message-
From: Dyer, James [mailto:james.d...@ingramcontent.com] 
Sent: Wednesday, November 06, 2013 7:42 PM
To: solr-user@lucene.apache.org
Subject: RE: Data Import Handler

If you prepend the variable name with "dataimporter.request", you can
include variables like these as request parameters:



<dataSource driver="${dataimporter.request.driver}" url="${dataimporter.request.url}" />

/dih?driver=some.driver.class&url=jdbc:url:something

If you want to include these in solrcore.properties, you can additionally
add each property to solrconfig.xml like this:



<str name="driver">${dih.driver}</str>
<str name="url">${dih.url}</str>



Then in solrcore.properties:
 dih.driver=some.driver.class
 dih.url=jdbc:url:something

See http://wiki.apache.org/solr/SolrConfigXml?#System_property_substitution


James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Ramesh [mailto:ramesh.po...@vensaiinc.com]
Sent: Wednesday, November 06, 2013 7:25 AM
To: solr-user@lucene.apache.org
Subject: Data Import Handler

Hi Folks,

 

Can anyone suggest how I can customize the data-config.xml file?

I want to provide database details like (db_url, uname, password) from my own
properties file instead of the data-config.xml file.





Re: Updating an entry in Solr

2013-11-13 Thread Furkan KAMACI
You should read here: http://wiki.apache.org/solr/Atomic_Updates


2013/11/13 

> Yes, that's correct. You can also update document "per field" but all
> fields need to be stored=true, because Solr (version >= 4.0) first gets
> your document from the index, creates new document with modified field,
> and adds it again to the index...
>
> Primoz
>
>
>
> From:   gohome190 
> To: solr-user@lucene.apache.org
> Date:   13.11.2013 14:39
> Subject:Re: Updating an entry in Solr
>
>
>
> Okay, so I've found in the solr tutorial that if you do a POST command and
> post a new entry with the same uniquekey (in my case, id_) as an entry
> already in the index, solr will automatically replace it for you.  That
> seems to be what I need, right?
>
>
>
> --
> View this message in context:
>
> http://lucene.472066.n3.nabble.com/Updating-an-entry-in-Solr-tp4100674p4100675.html
>
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


RE: Data Import Handler

2013-11-13 Thread Dyer, James
In solrcore.properties, put:

datasource.url=jdbc:xxx:yyy
datasource.driver=com.some.driver

In solrconfig.xml, put:



...
<str name="driver">${datasource.driver}</str>
<str name="url">${datasource.url}</str>
...



In data-config.xml, put:

<dataSource driver="${dataimporter.request.driver}" url="${dataimporter.request.url}" />
Hope this works for you.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Ramesh [mailto:ramesh.po...@vensaiinc.com] 
Sent: Wednesday, November 13, 2013 9:00 AM
To: solr-user@lucene.apache.org
Subject: RE: Data Import Handler

James, can you elaborate on how to process driver="${dataimporter.request.driver}"
and url="${dataimporter.request.url}", and where to put these? My purpose is
to configure my DB details (url, username, password) in a properties file.

-Original Message-
From: Dyer, James [mailto:james.d...@ingramcontent.com] 
Sent: Wednesday, November 06, 2013 7:42 PM
To: solr-user@lucene.apache.org
Subject: RE: Data Import Handler

If you prepend the variable name with "dataimporter.request", you can
include variables like these as request parameters:



<dataSource driver="${dataimporter.request.driver}" url="${dataimporter.request.url}" />

/dih?driver=some.driver.class&url=jdbc:url:something

If you want to include these in solrcore.properties, you can additionally
add each property to solrconfig.xml like this:



<str name="driver">${dih.driver}</str>
<str name="url">${dih.url}</str>



Then in solrcore.properties:
 dih.driver=some.driver.class
 dih.url=jdbc:url:something

See http://wiki.apache.org/solr/SolrConfigXml?#System_property_substitution


James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Ramesh [mailto:ramesh.po...@vensaiinc.com]
Sent: Wednesday, November 06, 2013 7:25 AM
To: solr-user@lucene.apache.org
Subject: Data Import Handler

Hi Folks,

 

Can anyone suggest how I can customize the data-config.xml file?

I want to provide database details like (db_url, uname, password) from my own
properties file instead of the data-config.xml file.







RE: Data Import Handler

2013-11-13 Thread Ramesh
It needs to be put outside of Solr, like a customized
Mysolr_core.properties file. How do I access it?

-Original Message-
From: Dyer, James [mailto:james.d...@ingramcontent.com] 
Sent: Wednesday, November 13, 2013 8:50 PM
To: solr-user@lucene.apache.org
Subject: RE: Data Import Handler

In solrcore.properties, put:

datasource.url=jdbc:xxx:yyy
datasource.driver=com.some.driver

In solrconfig.xml, put:



...
<str name="driver">${datasource.driver}</str>
<str name="url">${datasource.url}</str>
...



In data-config.xml, put:

<dataSource driver="${dataimporter.request.driver}" url="${dataimporter.request.url}" />
Hope this works for you.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Ramesh [mailto:ramesh.po...@vensaiinc.com]
Sent: Wednesday, November 13, 2013 9:00 AM
To: solr-user@lucene.apache.org
Subject: RE: Data Import Handler

James, can you elaborate on how to process driver="${dataimporter.request.driver}"
and url="${dataimporter.request.url}", and where to put these? My purpose is
to configure my DB details (url, username, password) in a properties file.

-Original Message-
From: Dyer, James [mailto:james.d...@ingramcontent.com]
Sent: Wednesday, November 06, 2013 7:42 PM
To: solr-user@lucene.apache.org
Subject: RE: Data Import Handler

If you prepend the variable name with "dataimporter.request", you can
include variables like these as request parameters:



<dataSource driver="${dataimporter.request.driver}" url="${dataimporter.request.url}" />

/dih?driver=some.driver.class&url=jdbc:url:something

If you want to include these in solrcore.properties, you can additionally
add each property to solrconfig.xml like this:



<str name="driver">${dih.driver}</str>
<str name="url">${dih.url}</str>



Then in solrcore.properties:
 dih.driver=some.driver.class
 dih.url=jdbc:url:something

See http://wiki.apache.org/solr/SolrConfigXml?#System_property_substitution


James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Ramesh [mailto:ramesh.po...@vensaiinc.com]
Sent: Wednesday, November 06, 2013 7:25 AM
To: solr-user@lucene.apache.org
Subject: Data Import Handler

Hi Folks,

 

Can anyone suggest how I can customize the data-config.xml file?

I want to provide database details like (db_url, uname, password) from my own
properties file instead of the data-config.xml file.









Strange behavior of gap fragmenter on highlighting

2013-11-13 Thread Ing. Jorge Luis Betancourt Gonzalez
I'm seeing some strange behavior of the gap fragmenter on Solr 3.6. Right now
this is my configuration for the gap fragmenter:

  

<fragmenter name="gap" default="true" class="solr.highlight.GapFragmenter">
  <lst name="defaults">
    <int name="hl.fragsize">150</int>
  </lst>
</fragmenter>

  

This is the basic configuration; I just tweaked the fragsize parameter to get
shorter fragments. The thing is that for one particular PDF document in my
results I get a really long snippet, way over 150 characters. It gets a
little more odd: if I change the 150 value to 100, the snippet for the same
document is normal, ~100 characters. The type of the field being highlighted
is this:

[fieldType definition omitted]

Any ideas about what's happening? Or how could I debug what is really going
on?
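
One hedged way to narrow it down from the client side, assuming SolrJ is
available: override hl.fragsize per request for the problem document and
compare. If the long snippet persists, the server-side fragmenter default
isn't the variable. The field and id values are illustrative:

// Assumed imports: org.apache.solr.client.solrj.SolrQuery,
// org.apache.solr.client.solrj.impl.HttpSolrServer
HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
SolrQuery q = new SolrQuery("id:the-problem-pdf");
q.setHighlight(true);
q.set("hl.fl", "content");       // illustrative highlighted field
q.setHighlightFragsize(150);     // explicit per-request value overrides the config
System.out.println(solr.query(q).getHighlighting());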

Greetings!



Re: High disk IO during UpdateCSV

2013-11-13 Thread Utkarsh Sengar
Bumping this one again, any suggestions?


On Tue, Nov 12, 2013 at 3:58 PM, Utkarsh Sengar wrote:

> Hello,
>
> I load data from csv to solr via UpdateCSV. There are about 50M documents
> with 10 columns in each document. The index size is about 15GB and I am
> using a 3 node distributed solr cluster.
>
> While loading the data the disk IO goes to 100%. If the load balancer in
> front of Solr hits the machine which is doing the processing, then the
> request times out. But in general, requests to all the machines become
> slow. I have attached a screenshot of the disk I/O and CPU usage.
>
> Is there a fix in Solr which can possibly throttle the load, or maybe it's
> due to the MergePolicy? How can I debug Solr to get the exact cause?
>
> --
> Thanks,
> -Utkarsh
>



-- 
Thanks,
-Utkarsh


Re: High disk IO during UpdateCSV

2013-11-13 Thread Michael Della Bitta
Utkarsh,

Your screenshot didn't come through. I don't think this list allows
attachments. Maybe put it up on imgur or something?

I'm a little unclear on whether you're using Solr in Cloud mode, or with a
single master.

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Wed, Nov 13, 2013 at 11:22 AM, Utkarsh Sengar wrote:

> Bumping this one again, any suggestions?
>
>
> On Tue, Nov 12, 2013 at 3:58 PM, Utkarsh Sengar  >wrote:
>
> > Hello,
> >
> > I load data from csv to solr via UpdateCSV. There are about 50M documents
> > with 10 columns in each document. The index size is about 15GB and I am
> > using a 3 node distributed solr cluster.
> >
> > While loading the data the disk IO goes to 100%. if the load balancer in
> > front of solr hits the machine which is doing the processing then the
> > request times out. But in general, requests to all the machines become
> > slow. I have attached a screenshot of the diskI/O and CPU usage.
> >
> > Is there a fix in solr which can possibly throttle the load or maybe its
> > due to MergePolicy? How can I debug solr to get the exact cause?
> >
> > --
> > Thanks,
> > -Utkarsh
> >
>
>
>
> --
> Thanks,
> -Utkarsh
>


Re: High disk IO during UpdateCSV

2013-11-13 Thread Utkarsh Sengar
Hi Michael,

I am using solr cloud 4.5.
And update csv loads data to one of these nodes.
Attachment: http://i.imgur.com/1xmoNtt.png


Thanks,
-Utkarsh


On Wed, Nov 13, 2013 at 8:33 AM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Utkarsh,
>
> Your screenshot didn't come through. I don't think this list allows
> attachments. Maybe put it up on imgur or something?
>
> I'm a little unclear on whether you're using Solr in Cloud mode, or with a
> single master.
>
> Michael Della Bitta
>
> Applications Developer
>
> o: +1 646 532 3062  | c: +1 917 477 7906
>
> appinions inc.
>
> “The Science of Influence Marketing”
>
> 18 East 41st Street
>
> New York, NY 10017
>
> t: @appinions  | g+:
> plus.google.com/appinions<
> https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
> >
> w: appinions.com 
>
>
> On Wed, Nov 13, 2013 at 11:22 AM, Utkarsh Sengar  >wrote:
>
> > Bumping this one again, any suggestions?
> >
> >
> > On Tue, Nov 12, 2013 at 3:58 PM, Utkarsh Sengar  > >wrote:
> >
> > > Hello,
> > >
> > > I load data from csv to solr via UpdateCSV. There are about 50M
> documents
> > > with 10 columns in each document. The index size is about 15GB and I am
> > > using a 3 node distributed solr cluster.
> > >
> > > While loading the data the disk IO goes to 100%. if the load balancer
> in
> > > front of solr hits the machine which is doing the processing then the
> > > request times out. But in general, requests to all the machines become
> > > slow. I have attached a screenshot of the diskI/O and CPU usage.
> > >
> > > Is there a fix in solr which can possibly throttle the load or maybe
> its
> > > due to MergePolicy? How can I debug solr to get the exact cause?
> > >
> > > --
> > > Thanks,
> > > -Utkarsh
> > >
> >
> >
> >
> > --
> > Thanks,
> > -Utkarsh
> >
>



-- 
Thanks,
-Utkarsh


Re: High disk IO during UpdateCSV

2013-11-13 Thread Walter Underwood
Don't load 50M documents in one shot. Break it up into reasonable chunks 
(100K?) with commits at each point.
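
A hedged SolrJ 4.x sketch of that chunked approach, assuming the CSV has been
split beforehand (e.g. with split -l 100000); the paths and URL are
illustrative:

// Assumed imports: java.io.File,
// org.apache.solr.client.solrj.impl.HttpSolrServer,
// org.apache.solr.client.solrj.request.AbstractUpdateRequest,
// org.apache.solr.client.solrj.request.ContentStreamUpdateRequest
HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
for (File chunk : new File("/data/csv-chunks").listFiles()) {
  ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
  req.addFile(chunk, "text/csv; charset=utf-8");
  req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);  // commit per chunk
  solr.request(req);
}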

You will have a bottleneck somewhere, usually disk or CPU. Yours appears to be 
disk. If you get faster disks, it might become the CPU.

wunder

On Nov 13, 2013, at 8:22 AM, Utkarsh Sengar  wrote:

> Bumping this one again, any suggestions?
> 
> 
> On Tue, Nov 12, 2013 at 3:58 PM, Utkarsh Sengar wrote:
> 
>> Hello,
>> 
>> I load data from csv to solr via UpdateCSV. There are about 50M documents
>> with 10 columns in each document. The index size is about 15GB and I am
>> using a 3 node distributed solr cluster.
>> 
>> While loading the data the disk IO goes to 100%. if the load balancer in
>> front of solr hits the machine which is doing the processing then the
>> request times out. But in general, requests to all the machines become
>> slow. I have attached a screenshot of the diskI/O and CPU usage.
>> 
>> Is there a fix in solr which can possibly throttle the load or maybe its
>> due to MergePolicy? How can I debug solr to get the exact cause?
>> 
>> --
>> Thanks,
>> -Utkarsh
>> 
> 
> 
> 
> -- 
> Thanks,
> -Utkarsh

--
Walter Underwood
wun...@wunderwood.org





Re: Why do people want to deploy to Tomcat?

2013-11-13 Thread Shawn Heisey

On 11/13/2013 5:29 AM, Dmitry Kan wrote:

Reading that people have considered deploying "example" folder is slightly
strange to me. No wonder they are confused and confuse their ops.


I do use the stripped jetty included in the example, but my setup is not 
a straight copy of the example directory. I removed a lot of it and 
changed how jars get loaded.  I built my own init script from scratch, 
tailored for my setup.


I'll start a new thread with my init script and some info about how I 
installed Solr.


Thanks,
Shawn



Re: Why do people want to deploy to Tomcat?

2013-11-13 Thread Mark Miller
RE: the example folder

It’s something I’ve been pushing towards moving away from for a long time - see 
https://issues.apache.org/jira/browse/SOLR-3619 Rename 'example' dir to 
'server' and pull examples into an 'examples’ directory

Part of a push I’ve been on to own the Container level (people are now on board 
with that for 5.0), add start scripts, and other niceties that we should have 
but don’t yet.

Even our config files should move away from being an “example” and end up more 
like a default starting template. Like a database, it should be simple to 
create a collection without needing to deal with config - you want to deal with 
the config when you need to, not face it all up front every time it is time to 
create a new collection.

IMO, the name example is historical - most people already use it this way, the 
name just confuses matters.

- Mark


On Nov 13, 2013, at 12:30 PM, Shawn Heisey  wrote:

> On 11/13/2013 5:29 AM, Dmitry Kan wrote:
>> Reading that people have considered deploying "example" folder is slightly
>> strange to me. No wonder they are confused and confuse their ops.
> 
> I do use the stripped jetty included in the example, but my setup is not a 
> straight copy of the example directory. I removed a lot of it and changed how 
> jars get loaded.  I built my own init script from scratch, tailored for my 
> setup.
> 
> I'll start a new thread with my init script and some info about how I 
> installed Solr.
> 
> Thanks,
> Shawn
> 



Re: My setup - init script and other info

2013-11-13 Thread Palmer, Eric
Thank you. This will help me a lot. 

Sent from my iPhone

On Nov 13, 2013, at 10:08 AM, "Shawn Heisey"  wrote:

> In the hopes that it will help someone get Solr running in a very clean way, 
> here's an informational email.
> 
> For my Solr install on CentOS 6, I use /opt/solr4 as my installation path, 
> and /index/solr4 as my solr home.  The /index directory is a dedicated 
> filesystem, /opt is part of the root filesystem.
> 
> From the example directory, I copied cloud-scripts, contexts, etc, lib, 
> webapps, and start.jar over to /opt/solr4.  My stuff was created before 
> 4.3.0, so the resources directory didn't exist.  I was already using log4j 
> with a custom Solr build, and I put my log4j.properties file in etc instead.  
> I created a logs directory and a run directory in /opt/solr4.
> 
> My data structure in /index/solr4 is complex.  All a new user really needs to 
> know is that solr.xml goes here and dictates the rest of the structure.  
> There is a symlink at /index/solr4/lib, pointing to /opt/solr4/solrlib - so 
> that jars placed in ${solr.solr.home}/lib are actually located in the program 
> directory, not the data directory.  That makes for a much cleaner version 
> control scenario - both directories are git repositories cloned from our 
> internal git server.
> 
> Unlike the example configs, my solrconfig.xml files do not have  
> directives for loading jars.  That gets automatically handled by the jars 
> living in that symlinked lib directory.  See SOLR-4852 for caveats regarding 
> central lib directories.
> 
> https://issues.apache.org/jira/browse/SOLR-4852
> 
> If you want to run SolrCloud, you would need to install zookeeper separately 
> and put your zkHost parameter in solr.xml.  Due to a bug, putting zkHost in 
> solr.xml doesn't work properly until 4.4.0.
> 
> Here's the current state of my init script.  It's redhat-specific.  I used 
> /bin/bash (instead of /bin/sh) in the shebang because I am pretty sure that 
> there are bash-isms in it, and bash is always available on the systems that I 
> use:
> 
> http://apaste.info/9fVA
> 
> Notable features:
> * Runs Solr as an unprivileged user.
> * Has three methods for stopping Solr, tries graceful methods first.
> 1) The jetty STOPPORT/STOPKEY mechanism.
> 2) PID saved by the 'start' action.
> 3) Any program using the Solr listening port.
> * Before killing by PID, tries to make sure that the process actually is Solr.
> * Sets up remote JMX, by default without authentication or SSL.
> * Highly tuned CMS garbage collection.
> * Sets up GC logging.
> * Virtually everything is overridable via /etc/sysconfig/solr4.
> * Points at an overridable log4j config file, by default in /opt/solr4/etc.
> * Removes the existing PID file if the server is just booting up -- which it 
> knows by noting that server uptime is less than three minutes.
> 
> It shouldn't be too hard to convert this so it works on debian-derived 
> systems.  That would involve rewriting portions that use redhat init 
> routines, and probably start-stop-daemon. What I'd really like is one script 
> that will work on any system, but that will require a fair amount of work.
> 
> It's a work in progress.  It should load log4j.properties from resources 
> instead of etc. I'd like to include it in the Solr download, but without a 
> fair amount of documentation and possibly an installation script, which still 
> must be written, that won't be possible.
> 
> Feel free to ask questions about anything that doesn't seem clear. I welcome 
> ideas for improvement on both my own setup and the solr example.
> 
> Thanks,
> Shawn
> 


Atomic Update at Solrj For a Newly Added Schema Field

2013-11-13 Thread Furkan KAMACI
I use Solr 4.5.1. I indexed some documents and decided to add a new field to
my schema some time later. I want to use atomic updates for that newly added
field. I use SolrJ for indexing. However, since existing documents have no
field named as the one I've newly added, Solr does not make an atomic update
for them. I do not want to reindex my whole data set. Any ideas?


My setup - init script and other info

2013-11-13 Thread Shawn Heisey
In the hopes that it will help someone get Solr running in a very clean 
way, here's an informational email.


For my Solr install on CentOS 6, I use /opt/solr4 as my installation 
path, and /index/solr4 as my solr home.  The /index directory is a 
dedicated filesystem, /opt is part of the root filesystem.


From the example directory, I copied cloud-scripts, contexts, etc, lib, 
webapps, and start.jar over to /opt/solr4.  My stuff was created before 
4.3.0, so the resources directory didn't exist.  I was already using 
log4j with a custom Solr build, and I put my log4j.properties file in 
etc instead.  I created a logs directory and a run directory in /opt/solr4.


My data structure in /index/solr4 is complex.  All a new user really 
needs to know is that solr.xml goes here and dictates the rest of the 
structure.  There is a symlink at /index/solr4/lib, pointing to 
/opt/solr4/solrlib - so that jars placed in ${solr.solr.home}/lib are 
actually located in the program directory, not the data directory.  That 
makes for a much cleaner version control scenario - both directories are 
git repositories cloned from our internal git server.


Unlike the example configs, my solrconfig.xml files do not have  
directives for loading jars.  That gets automatically handled by the 
jars living in that symlinked lib directory.  See SOLR-4852 for caveats 
regarding central lib directories.


https://issues.apache.org/jira/browse/SOLR-4852

If you want to run SolrCloud, you would need to install zookeeper 
separately and put your zkHost parameter in solr.xml.  Due to a bug, 
putting zkHost in solr.xml doesn't work properly until 4.4.0.


Here's the current state of my init script.  It's redhat-specific.  I 
used /bin/bash (instead of /bin/sh) in the shebang because I am pretty 
sure that there are bash-isms in it, and bash is always available on the 
systems that I use:


http://apaste.info/9fVA

Notable features:
* Runs Solr as an unprivileged user.
* Has three methods for stopping Solr, tries graceful methods first.
 1) The jetty STOPPORT/STOPKEY mechanism.
 2) PID saved by the 'start' action.
 3) Any program using the Solr listening port.
* Before killing by PID, tries to make sure that the process actually is 
Solr.

* Sets up remote JMX, by default without authentication or SSL.
* Highly tuned CMS garbage collection.
* Sets up GC logging.
* Virtually everything is overridable via /etc/sysconfig/solr4.
* Points at an overridable log4j config file, by default in /opt/solr4/etc.
* Removes the existing PID file if the server is just booting up -- 
which it knows by noting that server uptime is less than three minutes.


It shouldn't be too hard to convert this so it works on debian-derived 
systems.  That would involve rewriting portions that use redhat init 
routines, and probably start-stop-daemon. What I'd really like is one 
script that will work on any system, but that will require a fair amount 
of work.


It's a work in progress.  It should load log4j.properties from resources 
instead of etc. I'd like to include it in the Solr download, but without 
a fair amount of documentation and possibly an installation script, 
which still must be written, that won't be possible.


Feel free to ask questions about anything that doesn't seem clear. I 
welcome ideas for improvement on both my own setup and the solr example.


Thanks,
Shawn



Using data-config.xml from DIH in SolrJ

2013-11-13 Thread P Williams
Hi All,

I'm building a utility (Java jar) to create SolrInputDocuments and send
them to a HttpSolrServer using the SolrJ API.  The intention is to find an
efficient way to create documents from a large directory of files (where
multiple files make one Solr document) and be sent to a remote Solr
instance for update and commit.

I've already solved the problem using the DataImportHandler (DIH) so I have
a data-config.xml that describes the templated fields and cross-walking of
the source(s) to the schema.  The original data won't always be able to be
co-located with the Solr server which is why I'm looking for another option.

I've also already solved the problem using ant and xslt to create a
temporary (and unfortunately a potentially large) document which the
UpdateHandler will accept.  I couldn't think of a solution that took
advantage of the XSLT support in the UpdateHandler because each document is
created from multiple files.  Our current dated Java based solution
significantly outperforms this solution in terms of disk and time.  I've
rejected it based on that and gone back to the drawing board.

Does anyone have any suggestions on how I might be able to reuse my DIH
configuration in the SolrJ context without re-inventing the wheel (or DIH
in this case)?  If I'm doing something ridiculous I hope you'll point that
out too.

Thanks,
Tricia


Re: collections API error

2013-11-13 Thread Mark Miller
Try Solr 4.5.1.

https://issues.apache.org/jira/browse/SOLR-5306  Extra collection creation 
parameters like collection.configName are not being respected.

- Mark

On Nov 13, 2013, at 2:24 PM, Christopher Gross  wrote:

> Running Apache Solr 4.5 on Tomcat 7.0.29, Java 1.6_30.  3 SolrCloud nodes
> running.  5 ZK nodes (v 3.4.5), one on each SolrCloud server, and on 2
> other servers.
> 
> I want to create a collection on all 3 nodes.  I only need 1 shard.  The
> config is in Zookeeper (another collection is using it)
> 
> http://solrserver:8080/solr/admin/collections?action=CREATE&name=newtest&numShards=1&replicationFactor=3&collection.configName=test
> 
> I get this error (3 times, though for a different replica #)
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error
> CREATEing SolrCore 'newtest_shard1_replica2': Unable to create core:
> newtest_shard1_replica2
> 
> The SolrCloud Admin logs give this as the root error:
> 
> Caused by: org.apache.solr.common.cloud.ZooKeeperException: Specified
> config does not exist in ZooKeeper:newtest
> 
> You can see from my call that I don't want it to be called "test" (already
> have one) but I want to make a new instance of the "test" collection.
> 
> This seems  pretty straightforward -- what am I missing?  Did the
> parameters change and the wiki not get updated?
> [
> http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API
> ]
> 
> Thanks.
> 
> -- Chris
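
P.S. Once on 4.5.1, the same CREATE call can also be issued from SolrJ; a
rough sketch using the raw request API (server URL as in your mail):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class CreateCollection {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://solrserver:8080/solr");
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("action", "CREATE");
    params.set("name", "newtest");
    params.set("numShards", 1);
    params.set("replicationFactor", 3);
    params.set("collection.configName", "test");
    QueryRequest request = new QueryRequest(params);
    // The Collections API lives under /admin/collections, not /select.
    request.setPath("/admin/collections");
    server.request(request);
  }
}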



collections API error

2013-11-13 Thread Christopher Gross
Running Apache Solr 4.5 on Tomcat 7.0.29, Java 1.6_30.  3 SolrCloud nodes
running.  5 ZK nodes (v 3.4.5), one on each SolrCloud server, and on 2
other servers.

I want to create a collection on all 3 nodes.  I only need 1 shard.  The
config is in Zookeeper (another collection is using it)

http://solrserver:8080/solr/admin/collections?action=CREATE&name=newtest&numShards=1&replicationFactor=3&collection.configName=test

I get this error (3 times, though for a different replica #)
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error
CREATEing SolrCore 'newtest_shard1_replica2': Unable to create core:
newtest_shard1_replica2

The SolrCloud Admin logs give this as the root error:

Caused by: org.apache.solr.common.cloud.ZooKeeperException: Specified
config does not exist in ZooKeeper:newtest

You can see from my call that I don't want it to be called "test" (already
have one) but I want to make a new instance of the "test" collection.

This seems pretty straightforward -- what am I missing?  Did the
parameters change and the wiki not get updated?
 [
http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API
]

Thanks.

-- Chris


field collapsing performance in sharded environment

2013-11-13 Thread David Anthony Troiano
Hello,

I'm hitting a performance issue when using field collapsing in a
distributed Solr setup and I'm wondering if others have seen it and if
anyone has an idea for a workaround.

I'm using field collapsing to deduplicate documents that have the same
near-duplicate hash value, and deduplicating at query time (as opposed to
filtering at index time) is a requirement.  I have a sharded setup with 10
cores (not SolrCloud), each having ~1000 documents.  Of the 10k docs,
most have a unique near duplicate hash value, so there are about 10k unique
values for the field that I'm grouping on.  The grouping parameters that
I'm using are:

group=true
group.field=
group.main=true
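
In SolrJ terms, the query I'm timing looks like the sketch below, where
"signature" stands in for the real grouping field name (elided above) and
the shard list is shortened:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class GroupQuery {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/core1");
    SolrQuery query = new SolrQuery("*:*");
    query.set("shards", "s1,s2,s3");        // shortened; 10 shards in reality
    query.set("group", true);
    query.set("group.field", "signature");  // stand-in for the real field name
    query.set("group.main", true);
    // group.main=true returns a flat doc list, so getResults() works as usual.
    System.out.println(server.query(query).getResults().getNumFound());
  }
}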

I'm attempting distributed queries (&shards=s1,s2,...,s10) where the only
difference is the absence or presence of these three grouping parameters
and I'm consistently seeing a marked difference in performance (as a
representative data point, 200ms latency without grouping and 1600ms with
grouping).  Interestingly, if I put all 10k docs on the same core and query
that core independently with and without grouping, I don't see much of a
latency difference, so the performance degradation seems to exist only in
the sharded setup.

Is there a known performance issue with field collapsing in a sharded setup
(perhaps it only manifests when the grouping field has many unique values), or
have other people observed this?  Any ideas for a workaround?  Note that
docs in my sharded setup can only have the same signature if they're in the
same shard, so perhaps that can be used to boost perf, though I don't see
an exposed way to do so.

A follow-on question is whether we're likely to see the same issue if /
when we move to SolrCloud.

Thanks,
Dave


How to escape special characters from SOLR response header

2013-11-13 Thread Developer
I am trying to escape special characters from the SOLR response header (to
prevent cross-site scripting).

I couldn't find any method in SolrQueryResponse to get just the SOLR
response header. 

Can someone let me know if there is a way to modify the SOLR response
header?







Re: How to escape special characters from SOLR response header

2013-11-13 Thread Erik Hatcher
I'm not quite sure what you're trying to do here; can you please elaborate with 
an example?

But, you can get the response header from a SolrQueryResponse using the 
getResponseHeader() method.
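
On the client side, SolrJ's QueryResponse exposes the same thing via
getResponseHeader(); if the goal is to avoid echoing markup, you could escape
the values before rendering them.  A rough sketch (the URL and the commons-lang
escaper are just what I'd reach for; adjust as needed):

import org.apache.commons.lang.StringEscapeUtils;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class EscapedHeader {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    QueryResponse rsp = server.query(new SolrQuery("*:*"));
    NamedList<Object> header = rsp.getResponseHeader();
    for (int i = 0; i < header.size(); i++) {
      // HTML-escape each header value (echoed params etc.) before display.
      String safe = StringEscapeUtils.escapeHtml(String.valueOf(header.getVal(i)));
      System.out.println(header.getName(i) + " = " + safe);
    }
  }
}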

Erik

On Nov 13, 2013, at 3:21 PM, Developer  wrote:

> I am trying to escape special characters from SOLR response header (to
> prevent cross site scripting).
> 
> I couldn't find any method in SolrQueryResponse to get just the SOLR
> response header. 
> 
> Can someone let me know if there is a way to modify the SOLR response
> header?
> 
> 
> 
> 
> 



Re: distributed search is significantly slower than direct search

2013-11-13 Thread Manuel Le Normand
It's surprising that such a query takes this long; I would assume that after
trying q=*:* consistently you should be getting cache hits and times should
be faster. Check in the adminUI how your query/doc caches perform.
Moreover, the query in itself just asks for the first 5000 docs that were
indexed (returning the lowest [docid]s), so it seems all this time is spent
on transfer. Out of these 7 secs, how much is spent in the method above? What
do you return by default? How big is every doc you display in your results?
It might also matter that both collections work on the same resources. Try
elaborating on your use-case.

Anyway, it seems like you made this test to see what the performance hit
would be in a distributed environment, so I'll try to explain some things we
encountered in our benchmarks, with a case that is at least similar in the
number of docs fetched.

We fetch 2000 docs on every query, running over 40 shards. This means every
shard is actually transferring 2000 docs to our frontend on every
document-match request (the first phase you were referring to). Even if lazily
loaded, reading 2000 ids (on 40 servers) and lazy-loading the fields is a
tough job. Waiting for the slowest shard to respond, then sorting the docs
and reloading (lazily or not) the top 2000 docs might take a long time.

Our times are 4-8 secs, but it's not really possible to compare cases. We've
taken a few steps that improved things along the way, steps that led to
others. These were our starters:

   1. Profile these queries from different servers and solr instances, and try
   to put your finger on which collection is working hard and why. Check
   whether you're stuck on components that are used by default but add no
   value for you.
   2. Consider eliminating the doc cache. It loads lots of (partly) lazy
   documents whose probability of secondary usage is low. There's no such
   thing as "popular docs" when requesting so many docs. You may be able to
   use that memory in a better way.
   3. Bottleneck check - server metrics such as cpu user / iowait, packets
   transferred over the network, page faults etc. are excellent for
   understanding whether the disk/network/cpu is slowing you down. Then
   upgrade the hardware in one of the shards and check if it helps by
   comparing the upgraded shard's qTime to the others.
   4. Warm up the index after committing - benchmark how queries perform
   before and after some warm-up, say a few hundred queries (from your
   previous system), in order to warm up the os cache (assuming you're
   using NRTCachingDirectoryFactory)
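
Also, as a concrete form of the fl=id diagnostic Erick suggests below,
something like this in SolrJ (host and core taken from your earlier mails):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class IdOnlyDiagnostic {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://127.0.0.1:8983/solr/template");
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(5000);
    q.setFields("id");  // fetch only the id, isolating match cost from doc transfer
    q.set("shards", "127.0.0.1:8983/solr/core1");
    long start = System.currentTimeMillis();
    server.query(q);
    System.out.println("took " + (System.currentTimeMillis() - start) + " ms");
  }
}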


Good luck,
Manu


On Wed, Nov 13, 2013 at 2:38 PM, Erick Erickson wrote:

> One thing you can try, and this is more diagnostic than a cure, is return
> just
> the id field (and insure that lazy field loading is true). That'll tell you
> whether
> the issue is actually fetching the document off disk and decompressing,
> although
> frankly that's unlikely since you can get your 5,000 rows from a single
> machine
> quickly.
>
> The code you found where Solr is spending its time, is that on the
> "routing" core
> or on the shards? I actually have a hard time understanding how that
> code could take a long time, doesn't seem right.
>
> You are transferring 5,000 docs across the network, so it's possible that
> your network is just slow, that's certainly a difference between the local
> and remote case, but that's a stab in the dark.
>
> Not much help I know,
> Erick
>
>
>
> On Wed, Nov 13, 2013 at 2:52 AM, Elran Dvir  wrote:
>
> > Erick, Thanks for your response.
> >
> > We are upgrading our system using Solr.
> > We need to preserve old functionality.  Our client displays 5K document
> > and groups them.
> >
> > Is there a way to refactor code in order to improve distributed documents
> > fetching?
> >
> > Thanks.
> >
> > -Original Message-
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: Wednesday, October 30, 2013 3:17 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: distributed search is significantly slower than direct
> search
> >
> > You can't. There will inevitably be some overhead in the distributed
> case.
> > That said, 7 seconds is quite long.
> >
> > 5,000 rows is excessive, and probably where your issue is. You're having
> > to go out and fetch the docs across the wire. Perhaps there is some
> > batching that could be done there, I don't know whether this is one
> > document per request or not.
> >
> > Why 5K docs?
> >
> > Best,
> > Erick
> >
> >
> > On Tue, Oct 29, 2013 at 2:54 AM, Elran Dvir 
> wrote:
> >
> > > Hi all,
> > >
> > > I am using Solr 4.4 with multi cores. One core (called template) is my
> > > "routing" core.
> > >
> > > When I run
> > > http://127.0.0.1:8983/solr/template/select?rows=5000&q=*:*&shards=127.
> > > 0.0.1:8983/solr/core1,
> > > it consistently takes about 7s.
> > > When I run http://127.0.0.1:8983/solr/core1/select?rows=5000&q=*:*, it
> > > consistently takes about 40ms.
> > >
> > > I profiled the di

Re: High disk IO during UpdateCSV

2013-11-13 Thread Utkarsh Sengar
Thanks guys!
I will start by splitting the file into chunks of 5M (10 chunks) and
reduce the chunk size if needed.
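
The plan is to post each chunk with SolrJ's ContentStreamUpdateRequest and
commit per chunk; roughly this (chunk file names are placeholders):

import java.io.File;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class CsvChunkLoader {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    for (int i = 0; i < 10; i++) {
      ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
      req.addFile(new File("chunk-" + i + ".csv"), "text/csv");  // placeholder names
      req.setParam("commit", "true");  // commit after each chunk
      server.request(req);
    }
  }
}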

Thanks,
-Utkarsh


On Wed, Nov 13, 2013 at 9:08 AM, Walter Underwood wrote:

> Don't load 50M documents in one shot. Break it up into reasonable chunks
> (100K?) with commits at each point.
>
> You will have a bottleneck somewhere, usually disk or CPU. Yours appears
> to be disk. If you get faster disks, it might become the CPU.
>
> wunder
>
> On Nov 13, 2013, at 8:22 AM, Utkarsh Sengar  wrote:
>
> > Bumping this one again, any suggestions?
> >
> >
> > On Tue, Nov 12, 2013 at 3:58 PM, Utkarsh Sengar wrote:
> >
> >> Hello,
> >>
> >> I load data from csv to solr via UpdateCSV. There are about 50M
> documents
> >> with 10 columns in each document. The index size is about 15GB and I am
> >> using a 3 node distributed solr cluster.
> >>
> >> While loading the data the disk IO goes to 100%. if the load balancer in
> >> front of solr hits the machine which is doing the processing then the
> >> request times out. But in general, requests to all the machines become
> >> slow. I have attached a screenshot of the diskI/O and CPU usage.
> >>
> >> Is there a fix in solr which can possibly throttle the load or maybe its
> >> due to MergePolicy? How can I debug solr to get the exact cause?
> >>
> >> --
> >> Thanks,
> >> -Utkarsh
> >>
> >
> >
> >
> > --
> > Thanks,
> > -Utkarsh
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>


-- 
Thanks,
-Utkarsh


Re: Why do people want to deploy to Tomcat?

2013-11-13 Thread Robert Muir
which example? there are so many.

On Wed, Nov 13, 2013 at 1:00 PM, Mark Miller  wrote:
> RE: the example folder
>
> It’s something I’ve been pushing towards moving away from for a long time - 
> see https://issues.apache.org/jira/browse/SOLR-3619 Rename 'example' dir to 
> 'server' and pull examples into an 'examples’ directory
>
> Part of a push I’ve been on to own the Container level (people are now on 
> board with that for 5.0), add start scripts, and other niceties that we 
> should have but don’t yet.
>
> Even our config files should move away from being an “example” and end up 
> more like a default starting template. Like a database, it should be simple 
> to create a collection without needing to deal with config - you want to deal 
> with the config when you need to, not face it all up front every time it is 
> time to create a new collection.
>
> IMO, the name example is historical - most people already use it this way, 
> the name just confuses matters.
>
> - Mark
>
>
> On Nov 13, 2013, at 12:30 PM, Shawn Heisey  wrote:
>
>> On 11/13/2013 5:29 AM, Dmitry Kan wrote:
>>> Reading that people have considered deploying "example" folder is slightly
>>> strange to me. No wonder they are confused and confuse their ops.
>>
>> I do use the stripped jetty included in the example, but my setup is not a 
>> straight copy of the example directory. I removed a lot of it and changed 
>> how jars get loaded.  I built my own init script from scratch, tailored for 
>> my setup.
>>
>> I'll start a new thread with my init script and some info about how I 
>> installed Solr.
>>
>> Thanks,
>> Shawn
>>
>


queries including time zone

2013-11-13 Thread Eric Katherman
Can anybody provide any insight about using the tz param? It doesn't seem to 
affect date math and /DAY rounding.  What format do the tz values need to be 
in?  Not finding any documentation on this.

Sample query we're using:

path=/select 
params={tz=America/Chicago&sort=id+desc&start=0&q=application_id:51b30ed9bc571bd96773f09c+AND+object_key:object_26+AND+values_field_215_date:[*+TO+NOW/DAY%2B1DAY]&wt=json&rows=25}

Thanks!
Eric

Re: queries including time zone

2013-11-13 Thread Jack Krupansky

I believe it is the TZ column from this table:
http://en.wikipedia.org/wiki/List_of_tz_database_time_zones

Yeah, it's on my TODO list for my book.

I suspect that "tz" will not affect "NOW", which is probably UTC. I suspect 
that "tz" only affects literal dates in date math.
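
One thing worth checking: Solr parameter names are case-sensitive, and the 
date-math timezone parameter is uppercase "TZ", so the lowercase "tz" in your 
query may simply be ignored. In SolrJ terms (field name taken from your query):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class TzQuery {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("values_field_215_date:[* TO NOW/DAY+1DAY]");
    q.set("TZ", "America/Chicago");  // uppercase TZ; value is a tz database id
    q.setSort("id", SolrQuery.ORDER.desc);
    q.setRows(25);
    server.query(q);
  }
}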


-- Jack Krupansky

-Original Message- 
From: Eric Katherman

Sent: Wednesday, November 13, 2013 11:38 PM
To: solr-user@lucene.apache.org
Subject: queries including time zone

Can anybody provide any insight about using the tz param? It doesn't seem to 
affect date math and /DAY rounding.  What format do the tz values need to be 
in?  Not finding any documentation on this.


Sample query we're using:

path=/select 
params={tz=America/Chicago&sort=id+desc&start=0&q=application_id:51b30ed9bc571bd96773f09c+AND+object_key:object_26+AND+values_field_215_date:[*+TO+NOW/DAY%2B1DAY]&wt=json&rows=25}


Thanks!
Eric



(info) about Lucene search performance

2013-11-13 Thread Jacky.J.Wang (mis.cnsh04.Newegg) 41361
Dear Lucene,

I have a question about Lucene search performance: the first search is very 
slow and a second search is very quick.
I use MMapDirectoryFactory in solrconfig.xml (I have already disabled all Solr 
caches in order to test raw Lucene search performance).

Calling mmap() only makes the kernel set up the virtual-to-physical address 
mapping table; no data is actually read into memory by the mapping itself.

madvise() should be used together with mmap(), but MMapDirectoryFactory has no 
madvise method.

I found a JIRA issue (LUCENE-3178); I don't know whether it can solve this 
problem.
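
Would warming the OS cache after startup by replaying some queries be a
reasonable workaround? A minimal SolrJ sketch of what I mean (the URL and
queries are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class OsCacheWarmer {
  public static void main(String[] args) throws Exception {
    // Placeholder URL; point this at the core being tested.
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    String[] warmers = {"foo", "bar", "field:baz"};  // placeholder queries
    for (String qs : warmers) {
      server.query(new SolrQuery(qs));  // faults the needed index pages into memory
    }
  }
}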






