Re: different indexes for multitenant approach

2011-06-03 Thread Chandan Tamrakar
Maybe you need the multi-core feature of Solr: you can have a single Solr
instance with separate configurations and indexes per core.

http://wiki.apache.org/solr/CoreAdmin
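
For illustration, a rough SolrJ sketch of that setup (untested; the core
name, instanceDir and URLs here are made up):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class TenantCores {
    public static void main(String[] args) throws Exception {
        // admin connection to the single Solr instance
        CommonsHttpSolrServer admin =
            new CommonsHttpSolrServer("http://localhost:8983/solr");

        // create one core per tenant; each core gets its own config and index
        CoreAdminRequest.createCore("tenant_acme", "tenant_acme", admin);

        // query a specific tenant's core directly
        CommonsHttpSolrServer tenant =
            new CommonsHttpSolrServer("http://localhost:8983/solr/tenant_acme");
        System.out.println(tenant.query(new SolrQuery("*:*"))
                                 .getResults().getNumFound());
    }
}

Grouping several tenants into one core (the hashing idea below) would then
just be a matter of which core name the application maps a company to.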



On Fri, Jun 3, 2011 at 12:04 PM, Naveen Gupta  wrote:

> Hi
>
> I want to implement a different index strategy where we keep indexes
> with respect to each tenant and maintain the indexes separately ...
>
> first level of category -- company name
>
> second level of category - company name + fields to be indexed
>
> then further categories - groups of different company names based on some
> heuristic (hashing) (if it grows further)
>
> I want to do this in the same Solr instance. Is it possible?
>
> Thanks
> Naveen
>



-- 
Chandan Tamrakar


Getting query fields in a custom SearchHandler

2011-06-03 Thread Marc SCHNEIDER
Hi all,

I wrote my own SearchHandler and therefore overrode the handleRequestBody
method.
This method takes two input parameters: a SolrQueryRequest and a
SolrQueryResponse object.
What I'd like to do is get the query fields that are used in my
request.
Of course I can use req.getParams().get("q") but it returns the complete
query (which can be very complicated). I'd like to have a simple map with
field:value.
Is there a way to get it? Or do I have to write my own parser for the "q"
parameter?

Thanks in advance,
Marc.
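
One possible approach, sketched below (untested; it assumes the standard
Lucene query classes and only handles TermQuery leaves - phrase, range and
wildcard queries would need extra cases), is to parse "q" inside
handleRequestBody with a QParser, e.g. via
QParser.getParser(req.getParams().get("q"), null, req).getQuery(), and then
walk the resulting query tree:

import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class QueryFieldCollector {
    // recursively collect field -> value pairs from the TermQuery leaves
    public static void collect(Query query, Map<String, String> fields) {
        if (query instanceof TermQuery) {
            Term t = ((TermQuery) query).getTerm();
            fields.put(t.field(), t.text());
        } else if (query instanceof BooleanQuery) {
            for (BooleanClause clause : ((BooleanQuery) query).getClauses()) {
                collect(clause.getQuery(), fields);
            }
        }
    }

    public static Map<String, String> fieldsOf(Query query) {
        Map<String, String> fields = new HashMap<String, String>();
        collect(query, fields);
        return fields;
    }
}

Note that a single field can of course appear more than once in a query, in
which case a Map<String, List<String>> would be the safer signature.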


How to search camel case words using CJKTokenizer

2011-06-03 Thread tiffany
Hi all,

I'm using the CJKTokenizerFactory tokenizer to handle text which contains both
Japanese and alphabetic words.  However, I noticed that CJKTokenizerFactory
converts alphabetic characters to lowercase, so I cannot use the
WordDelimiterFilterFactory filter with the splitOnCaseChange property for
camel-case words.

I changed to NGramTokenizerFactory (2-gram), but it only parses the first 1024
characters. Because of that, I cannot use NGramTokenizerFactory either.

I tried the following two settings and both of them seem to work fine, but I
don't know if they are good or not, or if there are other better
solutions.

1)



2)



If anyone can give me any advice, it would be nice.

Thank you.

Tiffany

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-search-camel-case-words-using-CJKTokenizer-tp3018853p3018853.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query problem in Solr

2011-06-03 Thread Kurt Sultana
@ Pravesh: It's 2 separate cores, not 2 indexes. Sorry for that.

@ Erick: Yes, I've seen this suggestion and it seems to be the only possible
solution. I'll look into it.

Thanks for your answers guys!
Kurt

On Wed, Jun 1, 2011 at 4:24 PM, Erick Erickson wrote:

> If I read this correctly, one approach is to specify an
> increment gap in a multiValued field, then search for phrases
> with a slop less than that increment gap. i.e.
> incrementGap=100 in your definition, and search for
> "apple orange"~99
>
> If this is gibberish, please post some examples and we'll
> try something else.
>
> Best
> Erick
>
> On Wed, Jun 1, 2011 at 4:21 AM, Kurt Sultana 
> wrote:
> >  Hi all,
> >
> > We're using Solr to search on a Shop index and a Product index. Currently
> a
> > Shop has a field `shop_keyword` which also contains the keywords of the
> > products assigned to it. The shop keywords are separated by a space.
> > Consequently, if there is a product which has a keyword "apple" and
> another
> > which has "orange", a search for shops having `Apple AND Orange` would
> > return the shop for these products.
> >
> > However, this is incorrect since we want that a search for shops having
> > `Apple AND Orange` returns shop(s) having products with both "apple" and
> > "orange" as keywords.
> >
> > We tried solving this problem, by making shop keywords multi-valued and
> > assigning the keywords of every product of the shop as a new value in
> shop
> > keywords. However as was confirmed in another post
> >
> http://markmail.org/thread/xce4qyzs5367yplo#query:+page:1+mid:76eerw5yqev2aanu+state:results
> ,
> > Solr does not support "all words must match in the same value of a
> > multi-valued field".
> >
> > (Hope I explained myself well)
> >
> > How can we go about this? Ideally, we shouldn't change our search
> > infrastructure dramatically.
> >
> > Thanks!
> >
> > Krt_Malta
> >
>
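
A minimal SolrJ sketch of Erick's suggestion (assuming shop_keyword is
declared multiValued with positionIncrementGap="100" in the schema; the URL
and core name are made up):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class ShopSearch {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr =
            new CommonsHttpSolrServer("http://localhost:8983/solr/shops");
        // slop of 99 stays below the positionIncrementGap of 100, so both
        // words must occur within the same value of the multiValued field
        SolrQuery q = new SolrQuery("shop_keyword:\"apple orange\"~99");
        System.out.println(solr.query(q).getResults().getNumFound());
    }
}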


Return stemmed word

2011-06-03 Thread Kurt Sultana
Hi,

We have stemming in our Solr search and we need to retrieve the word/phrase
after stemming. That is if I search for "oranges", through stemming a search
for "orange" is carried out. If I turn on debugQuery I would be able to see
this, however we'd like to access it through the result if possible.
Basically, we need this, because we pass the searched word as a parameter to
a 3rd party application which highlights the word in an online PDF reader.
Currently, if a user searches for "oranges" and a document contains
"orange", then the PDF wouldn't highlight anything since it tries to
highlight "oranges" not "orange".

Thanks all in advance,
Kurt


Re: Strategy --> Frequent updates in our application

2011-06-03 Thread pravesh
You can use DataImportHandler for your full/incremental indexing. Now, NRT
indexing needs could vary as per business requirements (I mean the delay could
be 5, 10, 15, or 30 minutes). Then it also depends on how much volume will
be indexed incrementally.
BTW, are you running a Master+Slave SOLR setup?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Strategy-Frequent-updates-in-our-application-tp3018386p3019040.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sorting

2011-06-03 Thread pravesh
BTW, why are you sorting on this field?
You could also index & store this field twice: first with its original value,
and second encoded to some unique code/hash; then index that and sort
on it.
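
A hedged sketch of that two-field approach at indexing time with SolrJ (the
field names are hypothetical, and the "encoding" here is just lowercasing
and trimming):

import org.apache.solr.common.SolrInputDocument;

public class SortFieldExample {
    public static SolrInputDocument build(String rawTitle) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("title", rawTitle);        // original value, for display
        // normalized copy used only for sorting: &sort=title_sort+asc
        doc.addField("title_sort", rawTitle.toLowerCase().trim());
        return doc;
    }
}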

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-tp3017285p3019055.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sorting algorithm

2011-06-03 Thread Richard Hodsdon
Hi Tomás

Thanks, that makes a lot of sense, and your math is sound.

It is working well. An if() function would be great, and it seems it's coming
soon.

Richard

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-algorithm-tp3014549p3019077.html
Sent from the Solr - User mailing list archive at Nabble.com.


Nullpointer Exception in Solr 4.x in DebugComponent when using wildcard in facet value

2011-06-03 Thread Stefan Moises

Hi,

in Solr 4.x (trunk version of mid-May) I have noticed a null pointer
exception if I activate debugging (debug=true) and use a wildcard to
filter by facet value, e.g.

if I have a price field

..."&debug=true&facet.field=price&fq=price[500+TO+*]"
I get

SEVERE: java.lang.RuntimeException: java.lang.NullPointerException
at 
org.apache.solr.search.QueryParsing.toString(QueryParsing.java:538)
at 
org.apache.solr.handler.component.DebugComponent.process(DebugComponent.java:77)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:239)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1298)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:465)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)

at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NullPointerException
at 
org.apache.solr.search.QueryParsing.toString(QueryParsing.java:402)
at 
org.apache.solr.search.QueryParsing.toString(QueryParsing.java:535)


This used to work in Solr 1.4 and I was wondering if it's a bug or a new 
feature and if there is a trick to get this working again?


Best regards,
Stefan




Re: Nullpointer Exception in Solr 4.x in DebugComponent when using wildcard in facet value

2011-06-03 Thread Stefan Matheis
Stefan,

I guess there is a colon missing? &fq=price:[500+TO+*] should do the trick

Regards
Stefan

On Fri, Jun 3, 2011 at 11:42 AM, Stefan Moises  wrote:
> Hi,
>
> in Solr 4.x (trunk version of mid may) I have noticed a null pointer
> exception if I activate debugging (debug=true) and use a wildcard to filter
> by facet value, e.g.
> if I have a price field
>
> ..."&debug=true&facet.field=price&fq=price[500+TO+*]"
> I get
>
> [stack trace snipped]
>
> This used to work in Solr 1.4 and I was wondering if it's a bug or a new
> feature and if there is a trick to get this working again?
>
> Best regards,
> Stefan
>
>
>


Re: Nullpointer Exception in Solr 4.x in DebugComponent when using wildcard in facet value

2011-06-03 Thread Stefan Moises

Hi Stefan,
sorry, actually there is a colon, I just forgot it in my example...
so the exception also appears for

&fq=price:[500+TO+*]

But only if debug=true... and "normal" price values work, e.g.

&fq=price:[500+TO+999]


Thanks,
Stefan

On 03.06.2011 11:46, Stefan Matheis wrote:

Stefan,

I guess there is a colon missing? &fq=price:[500+TO+*] should do the trick

Regards
Stefan

On Fri, Jun 3, 2011 at 11:42 AM, Stefan Moises  wrote:

Hi,

in Solr 4.x (trunk version of mid may) I have noticed a null pointer
exception if I activate debugging (debug=true) and use a wildcard to filter
by facet value, e.g.
if I have a price field

..."&debug=true&facet.field=price&fq=price[500+TO+*]"
I get

[stack trace snipped]

This used to work in Solr 1.4 and I was wondering if it's a bug or a new
feature and if there is a trick to get this working again?

Best regards,
Stefan







--
With best regards from Nürnberg,
Stefan Moises

***
Stefan Moises
Senior Software Developer

shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck

Tel.: 0911/25566-25
Fax:  0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
***




php library for extractrequest handler

2011-06-03 Thread Naveen Gupta
Hi

We want to post some files (rtf, doc, etc.) to the Solr server using
PHP .. one way is to post using curl.

Is there a PHP client like the Java client (Solr Cell)?

URLs would also help.

Thanks
Naveen
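
For reference, the Java client mentioned above (SolrJ posting to the
/update/extract handler, i.e. Solr Cell) looks roughly like this; a PHP
client would post the same parameters over HTTP. This is an untested sketch
and method signatures vary slightly across SolrJ versions:

import java.io.File;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractPost {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        ContentStreamUpdateRequest req =
            new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("document.rtf"));  // rtf, doc, pdf, ... via Tika
        req.setParam("literal.id", "doc1");     // supply the unique key
        req.setParam("commit", "true");
        solr.request(req);
    }
}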


Re: Return stemmed word

2011-06-03 Thread lboutros
Hi Kurt,

I think this is a bit more tricky than that.

For example, if a user searches for "oranges", the stemmer may return
"orang" which is not an existing word.

So getting stemmed words might/will not work for your highlighting purpose.
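
A rough Lucene-level illustration of this (untested; API details vary by
Lucene version): feeding "oranges" through the field's analyzer shows what
the index actually contains, which is why highlighting by the stemmed term
can fail:

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class StemDemo {
    // print the tokens an analyzer produces for the input; with an English
    // (Porter) stemming chain, "oranges" typically comes out as "orang"
    public static void printTokens(Analyzer analyzer, String text)
            throws Exception {
        TokenStream ts = analyzer.tokenStream("content", new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.println(term.toString());
        }
        ts.end();
        ts.close();
    }
}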

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Return-stemmed-word-tp3018880p3019180.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: php library for extractrequest handler

2011-06-03 Thread Gora Mohanty
On Fri, Jun 3, 2011 at 3:55 PM, Naveen Gupta  wrote:
> Hi
>
> We want to post to solr server with some of the files (rtf,doc,etc) using
> php .. one way is to post using curl

I do not normally use PHP, and have not tried it myself.
However, there is a PHP extension for Solr:
  http://wiki.apache.org/solr/SolPHP
  http://php.net/manual/en/book.solr.php

Regards,
Gora


Re: how to update database record after indexing

2011-06-03 Thread vrpar...@gmail.com
Hey Erick,

I wrote a separate process as you suggested, and achieved the task.

Thanks a lot

Vishal Parekh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-update-database-record-after-indexing-tp2874171p3019217.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: how to do offline adding/updating index

2011-06-03 Thread vrpar...@gmail.com
Thanks to all,

I did it by using multicore.

vishal parekh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-do-offline-adding-updating-index-tp2923035p3019219.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to concatenate two nodes of xml with xpathentityprocessor

2011-06-03 Thread vrpar...@gmail.com
Thanks kbootz

your suggestion works fine, 

vishal parekh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-concatenate-two-nodes-of-xml-with-xpathentityprocessor-tp2861260p3019223.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ and Range Faceting

2011-06-03 Thread Martijn v Groningen
Hi Jamie,

I don't know why range facets didn't make it into SolrJ. But I've recently
opened an issue for this:
https://issues.apache.org/jira/browse/SOLR-2523

I hope this will be committed soon. Check the patch out and see if you like
it.
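
Until the patch lands, one workaround is to set the range-facet parameters
as plain strings on the SolrQuery (an untested sketch; the field names and
values below are illustrative):

import org.apache.solr.client.solrj.SolrQuery;

public class RangeFacetQuery {
    public static SolrQuery build() {
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.set("facet.range", "price");      // raw params, since SolrJ has
        q.set("facet.range.start", "0");    // no range-facet helpers yet
        q.set("facet.range.end", "1000");
        q.set("facet.range.gap", "100");
        // a date-range filter of the form described above
        q.addFilterQuery("dateTime:[2011-01-01T00:00:00Z TO 2011-06-03T00:00:00Z]");
        return q;
    }
}

The response side would still need hand parsing of the facet_ranges section
from the raw NamedList, which is what SOLR-2523 should make unnecessary.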

Martijn

On 2 June 2011 18:22, Jamie Johnson  wrote:

> Currently the range and date faceting in SolrJ acts a bit differently than
> I
> would expect.  Specifically, range facets aren't parsed at all and date
> facets end up generating filterQueries which don't have the range, just the
> lower bound.  Is there a reason why SolrJ doesn't support these?  I have
> written some things on my end to handle these and generate filterQueries
> for
> date ranges of the form dateTime:[start TO end] and I have a function
> (which
> I copied from the date faceting) which parses the range facets, but would
> prefer not to have to maintain these myself.  Is there a plan to implement
> these?  Also is there a plan to update FacetField to not have end be a
> date,
> perhaps making it a String like start so we can support date and range
> queries?
>



-- 
Met vriendelijke groet,

Martijn van Groningen


[Visualizations] from Query Results

2011-06-03 Thread Adam Estrada
Dear Solr experts,

I am curious to learn what visualization tools are out there to help me
"visualize" my query results. I am not talking about a language specific
client per se, but something more like Carrot2, which breaks clusters into
their knowledge tree and expandable pie chart. Sorry if those aren't the
correct names for those tools ;-) Anyway, what else is out there like
Carrot2 http://project.carrot2.org/ to help me visualize Solr query results?

Thanks for your input,
Adam


Re: Strategy --> Frequent updates in our application

2011-06-03 Thread Naveen Gupta
Hi Pravesh

We don't have that setup right now .. we are thinking of doing that.

For writes we are going to have one instance, and for reads we are going to
have another...

Do you have another design in mind? Kindly share.

Thanks
Naveen

On Fri, Jun 3, 2011 at 2:50 PM, pravesh  wrote:

> You can use DataImportHandler for your full/incremental indexing. Now NRT
> indexing could vary as per business requirements (i mean delay cud be
> 5-mins
> ,10-mins,15-mins,OR, 30-mins). Then it also depends on how much volume will
> be indexed incrementally.
> BTW, r u having Master+Slave SOLR setup?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Strategy-Frequent-updates-in-our-application-tp3018386p3019040.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: php library for extractrequest handler

2011-06-03 Thread Naveen Gupta
Yes,

That one I used and it is working fine. Thanks to Nabble.

Thanks
Naveen

On Fri, Jun 3, 2011 at 4:02 PM, Gora Mohanty  wrote:

> On Fri, Jun 3, 2011 at 3:55 PM, Naveen Gupta  wrote:
> > Hi
> >
> > We want to post to solr server with some of the files (rtf,doc,etc) using
> > php .. one way is to post using curl
>
> Do not normally use PHP, and have not tried it myself.
> However, there is a PHP extension for Solr:
>  http://wiki.apache.org/solr/SolPHP
>  http://php.net/manual/en/book.solr.php
>
> Regards,
> Gora
>


Re: Strategy --> Frequent updates in our application

2011-06-03 Thread Nagendra Nagarajayya

Hi Naveen:

Solr with RankingAlgorithm supports NRT. The performance is about 262 
docs / sec. You can get more information about the performance and NRT 
from here:

http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search

You can download Solr with RankingAlgorithm from here:
http://solr-ra.tgels.com

Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.com

On 6/2/2011 8:29 PM, Naveen Gupta wrote:

Hi

We have an application where every 10 mins we index each user's docs
repository, and additionally, if some thread is added to a particular
discussion, we need to index that thread again (please note we are not
doing blind indexing each time; we have various rules to filter out which
threads are new and thus candidates for indexing, plus new ones that have
arrived).

So we are doing updates for each user's docs repository .. the performance so
far is not looking very good. In the future we are going to get hits in
volume (1,000 to 10,000 hits per min), so we are looking for a strategy to
tune Solr in order to index the data in real time.

And what about NRT: is it fine to apply in this kind of scenario? I read
that Solr NRT is not very good in performance, but I am not going to believe
it, since Solr is one of the best open-source projects .. so it is going to
have this problem sorted out in the near future .. but if any benchmark
exists, kindly share it with me ... we would like to analyze it against our
requirements.

Is there any way to add incremental indexes, as we generally find in other
search engines like Endeca etc.? I don't know much detail about Solr...
since I am a newbie, can you please tell me if there are some settings
which can keep track of incremental indexing?


Thanks
Naveen





Solr Indexing Patterns

2011-06-03 Thread Judioo
What is the "best practice" method to index the following in Solr:

I'm attempting to use solr for a book store site.

Each book will have a price but on occasions this will be discounted. The
discounted price exists for a defined time period but there may be many
discount periods. Each discount will have a brief synopsis, start and end
time.

A subset of the desired output would be as follows:

...
"response":{"numFound":1,"start":0,"docs":[
  {
"name":"The Book",
"price":"$9.99",
"discounts":[
{
 "price":"$3.00",
 "synopsis":"thanksgiving special",
 "starts":"11-24-2011",
 "ends":"11-25-2011",
},
{
 "price":"$4.00",
 "synopsis":"Canadian thanksgiving special",
 "starts":"10-10-2011",
 "ends":"10-11-2011",
},
 ]
  },
  .

A requirement is to be able to search for just discounted publications. I
think I could use date faceting for this (return publications that are
within a discount window). When a discount search is performed, no
publications that are not currently discounted will be returned.

My questions are:

   - Does Solr support this type of sub-document?

In the above example the discounts are the sub-documents. I know Solr is not
a relational DB, but I would like to store and index the above representation
in a single document if possible.

   - What is the best method to approach the above?

I can see in many examples that authors tend to denormalize to solve similar
problems. This suggests that for each discount I am required to duplicate the
book data or form a document association.
Which method would you advise?

It would be nice if Solr could return a response structured as above.

Much Thanks


Re: Strategy --> Frequent updates in our application

2011-06-03 Thread pravesh
You can go ahead with the Master/Slave setup provided by SOLR. It's trivial to
set up, and you also get SOLR's operational scripts for index syncing between
Master and Slave(s), OR the Java-based replication feature.

There is no need to re-invent another architecture :)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Strategy-Frequent-updates-in-our-application-tp3018386p3019475.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr performance tuning - disk i/o?

2011-06-03 Thread Demian Katz
Hello,

I'm trying to move a VuFind installation from an ailing physical server into a 
virtualized environment, and I'm running into performance problems.  VuFind is 
a Solr 1.4.1-based application with fairly large and complex records (many 
stored fields, many words per record).  My particular installation contains 
about a million records in the index, with a total index size around 6GB.

The virtual environment has more RAM and better CPUs than the old physical box, 
and I am satisfied that my Java environment is well-tuned.  My index is 
optimized.  Searches that hit the cache respond very well.  The problem is that 
non-cached searches are very slow - the more keywords I add, the slower they 
get, to the point of taking 6-12 seconds to come back with results on a quiet 
box and well over a minute under stress testing.  (The old box still took a 
while for equivalent searches, but it was about twice as fast as the new one).

My gut feeling is that disk access reading the index is the bottleneck here, 
but I know little about the specifics of Solr's internals, so it's entirely 
possible that my gut is wrong.  Outside testing does show that the the virtual 
environment's disk performance is not as good as the old physical server, 
especially when multiple processes are trying to access the same file 
simultaneously.

So, two basic questions:


1.) Would you agree that I'm dealing with a disk bottleneck, or are there 
some other factors I should be considering?  Any good diagnostics I should be 
looking at?

2.) If the problem is disk access, is there anything I can tune on the Solr 
side to alleviate the problems?

Thanks,
Demian


Re: java.io.IOException: The specified network name is no longer available

2011-06-03 Thread Erick Erickson
You've got to tell us more about your setup. We can only guess that you're
on a remote file system and there's a problem there, which would be a
network problem outside of Solr's purview

You might want to review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Fri, Jun 3, 2011 at 1:52 AM, Gaurav Shingala
 wrote:
>
> Hi,
>
> I am using solr 1.4.1 and at the time of updating index getting following 
> error:
>
> 2011-06-03 05:54:06,943 ERROR [org.apache.solr.core.SolrCore] 
> (http-10.38.33.146-8080-4) java.io.IOException: The specified network name is 
> no longer available
>    at java.io.RandomAccessFile.readBytes(Native Method)
>    at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
>    at 
> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(SimpleFSDirectory.java:132)
>    at 
> org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
>    at 
> org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
>    at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:78)
>    at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:64)
>    at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:129)
>    at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:160)
>    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:232)
>    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:179)
>    at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:57)
>    at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:1103)
>    at org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:981)
>    at 
> org.apache.solr.search.SolrIndexReader.termDocs(SolrIndexReader.java:320)
>    at 
> org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:640)
>    at 
> org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:545)
>    at 
> org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:581)
>    at 
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:903)
>    at 
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
>    at 
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
>    at 
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
>    at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
>    at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>    at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>    at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>    at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:274)
>    at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:242)
>    at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275)
>    at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>    at 
> org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:181)
>    at 
> org.jboss.modcluster.catalina.CatalinaContext$RequestListenerValve.event(CatalinaContext.java:285)
>    at 
> org.jboss.modcluster.catalina.CatalinaContext$RequestListenerValve.invoke(CatalinaContext.java:261)
>    at 
> org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:88)
>    at 
> org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.invoke(SecurityContextEstablishmentValve.java:100)
>    at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>    at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>    at 
> org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:158)
>    at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>    at 
> org.jboss.web.tomcat.service.request.ActiveRequestResponseCacheValve.invoke(ActiveRequestResponseCacheValve.java:53)
>    at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:362)
>    at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877)
>    at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:654)
>    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:951)
>    at java.lang.Thread.run(Thread.java:619)
>
> 2011-06-03 05:54:06,943 INFO  [org.apache.solr.core.SolrCore] 
> (http-10.38.33.146-8080-4) [project_58787] webapp=/solr path=/select 
> params={sort=revisionid_l+desc&start=0&q=type_s:IFCFileMaster+AND+modelversionid_l:(+8+7+)&wt=javabin&fq=reftable_s:IFCRELDEFINE

Re: how to make getJson parameter dynamic

2011-06-03 Thread Erick Erickson
Romi:

Please review:
http://wiki.apache.org/solr/UsingMailingLists

This is the Solr forum. jQuery questions are best directed at a
jQuery-specific forum.

Best
Erick

On Fri, Jun 3, 2011 at 2:27 AM, Romi  wrote:
> lee carroll: Sorry for this. i did this because i was not getting any
> response. anyway thanks for letting me know and now i found the solution of
> the above problem :)
> now i am facing a very strange problem related to jquery can you please help
> me out.
>
> $(document).ready(function(){
>         $("#c2").click(function(){
>            var q=getquerystring() ;
>
>
> $.getJSON("http://192.168.1.9:8983/solr/db/select/?wt=json&q="+q+"&json.wrf=?",
> function(result){
>                $.each(result.response.docs, function(i,item){
>                    alert(result.response.docs);
>                    alert(item.UID_PK);
>                });
>            });
>        });
>        });
>
>
> when I use $("#c2").click(function() then it does not enter $.getJSON(),
> and when I remove $("#c2").click(function() from the code it runs fine. Why
> is this so? Please explain, because I want to get data from a text box on
> an onclick event and then display the response.
>
>
>
> -
> Thanks & Regards
> Romi
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/how-to-make-getJson-parameter-dynamic-tp3014941p3018732.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Nullpointer Exception in Solr 4.x in DebugComponent when using wildcard in facet value

2011-06-03 Thread Erick Erickson
Hmmm, I just tried it on a trunk from a couple of days ago and it
doesn't error out.
Could you re-try with a new build?

Thanks
Erick

On Fri, Jun 3, 2011 at 5:51 AM, Stefan Moises  wrote:
> Hi Stefan,
> sorry, actually there is a colon, I just forgot it in my example...
> so the exception also appears for
>
> &fq=price:[500+TO+*]
>
> But only if debug=true... and "normal" price values work, e.g.
>
> &fq=price:[500+TO+999]
>
>
> Thanks,
> Stefan
>
> On 03.06.2011 11:46, Stefan Matheis wrote:
>>
>> Stefan,
>>
>> I guess there is a colon missing? &fq=price:[500+TO+*] should do the trick
>>
>> Regards
>> Stefan
>>
>> On Fri, Jun 3, 2011 at 11:42 AM, Stefan Moises
>>  wrote:
>>>
>>> Hi,
>>>
>>> in Solr 4.x (trunk version of mid may) I have noticed a null pointer
>>> exception if I activate debugging (debug=true) and use a wildcard to
>>> filter
>>> by facet value, e.g.
>>> if I have a price field
>>>
>>> ..."&debug=true&facet.field=price&fq=price[500+TO+*]"
>>> I get
>>>
>>> [stack trace snipped]
>>>
>>> This used to work in Solr 1.4 and I was wondering if it's a bug or a new
>>> feature and if there is a trick to get this working again?
>>>
>>> Best regards,
>>> Stefan
>>>
>>>
>>>
>>
>
> --
> With best regards from Nürnberg,
> Stefan Moises
>
> ***
> Stefan Moises
> Senior Software Developer
>
> shoptimax GmbH
> Guntherstraße 45 a
> 90461 Nürnberg
> Amtsgericht Nürnberg HRB 21703
> GF Friedrich Schreieck
>
> Tel.: 0911/25566-25
> Fax:  0911/25566-29
> moi...@shoptimax.de
> http://www.shoptimax.de
> ***
>
>
>


Re: [Visualizations] from Query Results

2011-06-03 Thread Erick Erickson
I'm not quite sure what you mean by "visualization" here. Do you
want to see the query parse tree? The results list in something other
than XML (see the /browse functionality if so). How documents are
ranked?

"Visualization" is another overloaded word ...

Best
Erick

On Fri, Jun 3, 2011 at 7:13 AM, Adam Estrada
 wrote:
> Dear Solr experts,
>
> I am curious to learn what visualization tools are out there to help me
> "visualize" my query results. I am not talking about a language specific
> client per se but something more like Carrot2 which breaks clusters in to
> their knowledge tree and expandable pie chart. Sorry if those aren't the
> correct names for those tools ;-) Anyway, what else is out there like
> Carrot2 http://project.carrot2.org/ to help me visualize Solr query results?
>
> Thanks for your input,
> Adam
>


Re: Nullpointer Exception in Solr 4.x in DebugComponent when using wildcard in facet value

2011-06-03 Thread Stefan Moises

Hi Erick

sure, thanks for looking into it! I'll let you know if it's working for 
me there, too...
(I'm using edismax btw., but I've also tested with standard and got the 
exception)


Stefan

On 03.06.2011 15:22, Erick Erickson wrote:

Hmmm, I just tried it on a trunk from a couple of days ago and it
doesn't error out.
Could you re-try with a new build?

Thanks
Erick

On Fri, Jun 3, 2011 at 5:51 AM, Stefan Moises  wrote:

Hi Stefan,
sorry, actually there is a colon, I just forgot it in my example...
so the exception also appears for

&fq=price:[500+TO+*]

But only if debug=true... and "normal" price values work, e.g.

&fq=price:[500+TO+999]


Thanks,
Stefan

On 03.06.2011 11:46, Stefan Matheis wrote:

Stefan,

I guess there is a colon missing? &fq=price:[500+TO+*] should do the trick

Regards
Stefan

On Fri, Jun 3, 2011 at 11:42 AM, Stefan Moises
  wrote:

Hi,

in Solr 4.x (trunk version of mid may) I have noticed a null pointer
exception if I activate debugging (debug=true) and use a wildcard to
filter
by facet value, e.g.
if I have a price field

..."&debug=true&facet.field=price&fq=price[500+TO+*]"
I get

[stack trace snipped]

This used to work in Solr 1.4 and I was wondering if it's a bug or a new
feature and if there is a trick to get this working again?

Best regards,
Stefan






--
With best regards from Nürnberg,
Stefan Moises

***
Stefan Moises
Senior Software Developer

shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck

Tel.: 0911/25566-25
Fax:  0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
***







--
With best regards from Nürnberg,
Stefan Moises

***
Stefan Moises
Senior Software Developer

shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck

Tel.: 0911/25566-25
Fax:  0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
***




Re: Solr Indexing Patterns

2011-06-03 Thread Erick Erickson
How often are the discounts changed? Because you can simply
re-index the book information with a multiValued "discounts" field
and get something similar to your example (&wt=json)
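
A hedged sketch of that shape with SolrJ, using parallel multiValued fields
(all field names are made up; the dates assume a Solr date field). Books
whose discount is active now could then be selected with a filter such as
fq=discount_start:[* TO NOW] AND discount_end:[NOW TO *]:

import org.apache.solr.common.SolrInputDocument;

public class BookDoc {
    public static SolrInputDocument build() {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("name", "The Book");
        doc.addField("price", 9.99);
        // one entry per discount period, kept in parallel multiValued fields
        doc.addField("discount_price", 3.00);
        doc.addField("discount_synopsis", "thanksgiving special");
        doc.addField("discount_start", "2011-11-24T00:00:00Z");
        doc.addField("discount_end", "2011-11-25T00:00:00Z");

        doc.addField("discount_price", 4.00);
        doc.addField("discount_synopsis", "Canadian thanksgiving special");
        doc.addField("discount_start", "2011-10-10T00:00:00Z");
        doc.addField("discount_end", "2011-10-11T00:00:00Z");
        return doc;
    }
}

Note the usual caveat with parallel multiValued fields: matches are per-field,
not per-position, so a filter like the one above can pair a start date from
one discount with an end date from another.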


Best
Erick

On Fri, Jun 3, 2011 at 8:38 AM, Judioo  wrote:
> What is the "best practice" method to index the following in Solr:
>
> I'm attempting to use solr for a book store site.
>
> Each book will have a price but on occasions this will be discounted. The
> discounted price exists for a defined time period but there may be many
> discount periods. Each discount will have a brief synopsis, start and end
> time.
>
> A subset of the desired output would be as follows:
>
> ...
> "response":{"numFound":1,"start":0,"docs":[
>  {
>    "name":"The Book",
>    "price":"$9.99",
>    "discounts":[
>        {
>         "price":"$3.00",
>         "synopsis":"thanksgiving special",
>         "starts":"11-24-2011",
>         "ends":"11-25-2011",
>        },
>        {
>         "price":"$4.00",
>         "synopsis":"Canadian thanksgiving special",
>         "starts":"10-10-2011",
>         "ends":"10-11-2011",
>        },
>     ]
>  },
>  .
>
> A requirement is to be able to search for just discounted publications. I
> think I could use date faceting for this ( return publications that are
> within a discount window ). When a discount search is performed no
> publications that are not currently discounted will be returned.
>
> My question are:
>
>   - Does solr support this type of sub documents
>
> In the above example the discounts are the sub documents. I know solr is not
> a relational DB but I would like to store and index the above representation
> in a single document if possible.
>
>   - what is the best method to approach the above
>
> I can see in many examples the authors tend to denormalize to solve similar
> problems. This suggest that for each discount I am required to duplicate the
> book data or form a document
> association.
> Which method would you advise?
>
> It would be nice if solr could return a response structured as above.
>
> Much Thanks
>


Re: Nullpointer Exception in Solr 4.x in DebugComponent when using wildcard in facet value

2011-06-03 Thread Yonik Seeley
This bug was introduced during the cutover from strings to BytesRef on
TermRangeQuery.
I just committed a fix.

-Yonik
http://www.lucidimagination.com

On Fri, Jun 3, 2011 at 5:42 AM, Stefan Moises  wrote:
> Hi,
>
> in Solr 4.x (trunk version of mid may) I have noticed a null pointer
> exception if I activate debugging (debug=true) and use a wildcard to filter
> by facet value, e.g.
> if I have a price field
>
> ..."&debug=true&facet.field=price&fq=price[500+TO+*]"
> I get
>
> [stack trace snipped]
>
> This used to work in Solr 1.4 and I was wondering if it's a bug or a new
> feature and if there is a trick to get this working again?
>
> Best regards,
> Stefan
>
>
>


Re: Strategy --> Frequent updates in our application

2011-06-03 Thread Erick Erickson
Do be careful how often you pull down indexes on your slaves. A too-short
polling interval can lead to some problems. Start with, say, 5 minutes and
ensure that your autowarm time (see your logs) is less than your polling
interval.

Best
Erick


On Fri, Jun 3, 2011 at 8:43 AM, pravesh  wrote:
> You can go ahead with the Master/Slave setup provided by SOLR. Its trivial to
> setup and you also get SOLR's operational scripts for index synch'ing b/w
> Master-to-Slave(s), OR the Java based replication feature.
>
> There is no need to re-invent other architecture :)
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Strategy-Frequent-updates-in-our-application-tp3018386p3019475.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr performance tuning - disk i/o?

2011-06-03 Thread Otis Gospodnetic
Demian,

* You can run iostat or vmstat and see if there is disk IO during your slow
queries and compare that to disk IO (if any) with your fast/cached queries.
* You can make sure you warm up your index well after the first and any new
searcher, so that OS and Solr caches are warmed up.
* You can look at the Solr Stats page to make sure your caches are utilized
well and adjust their settings if they are not.
* ...

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Demian Katz 
> To: "solr-user@lucene.apache.org" 
> Sent: Fri, June 3, 2011 8:44:33 AM
> Subject: Solr performance tuning - disk i/o?
> 
> [original message snipped]


Re: Solr performance tuning - disk i/o?

2011-06-03 Thread Erick Erickson
This doesn't seem right. Here's a couple of things to try:
1> attach &debugQuery=on to your long-running queries. The QTime returned
   is the time taken to search, NOT including the time to load the docs.
   That'll help pinpoint whether the problem is the search itself, or
   assembling the documents.
2> Are you autowarming? If so, be sure it's actually done before querying.
3> Measure queries after the first few, particularly if you're sorting or
   faceting.
4> What are your JVM settings? How much memory do you have?
5> is <enableLazyFieldLoading> set to true in your solrconfig.xml?
6> How many docs are you returning?


There's more, but that'll do for a start. Let us know if you gather more data
and it's still slow.
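
For item 1>, a quick SolrJ sketch for pulling that debug output
programmatically (untested; the URL and query terms are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DebugTiming {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("keyword1 keyword2 keyword3");
        q.set("debugQuery", true);
        QueryResponse rsp = solr.query(q);
        System.out.println("QTime (search only): " + rsp.getQTime() + " ms");
        System.out.println(rsp.getDebugMap());  // parse + timing details
    }
}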

Best
Erick

On Fri, Jun 3, 2011 at 8:44 AM, Demian Katz  wrote:
> [original message snipped]


Re: [Visualizations] from Query Results

2011-06-03 Thread Otis Gospodnetic
Hi Adam,

Try this:
http://lmgtfy.com/?q=search%20results%20visualizations

In practice I find that visualizations are cool and attractive looking, but 
often text is more useful because it's more direct.  But there is room for 
graphical representation of search results, sure.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Adam Estrada 
> To: solr-user@lucene.apache.org
> Sent: Fri, June 3, 2011 7:13:39 AM
> Subject: [Visualizations] from Query Results
> 
> [original message snipped]


Re: query routing with shards

2011-06-03 Thread Otis Gospodnetic
Hi Dmitry,

Yes, you could also implement your own custom SearchComponent.  In this 
component you could grab the query param, examine the query value, and based on 
that add the shards URL param with appropriate value, so that when the regular 
QueryComponent grabs stuff from the request, it has the correct shard in there 
already.
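
A rough sketch of such a component (untested; the query-to-shard mapping is
left as a hypothetical pickShard(), and the exact set of SolrInfoMBean
methods to implement varies by Solr version):

import java.io.IOException;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.params.ShardParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class ShardRoutingComponent extends SearchComponent {
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        String q = rb.req.getParams().get(CommonParams.Q);
        ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
        // set the shards param before QueryComponent reads it
        params.set(ShardParams.SHARDS, pickShard(q));
        rb.req.setParams(params);
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // no-op: QueryComponent performs the actual distributed search
    }

    // hypothetical query -> shard mapping; replace with your own logic
    private String pickShard(String q) {
        return (q != null && q.contains("tenant1"))
            ? "shard1host:8983/solr" : "shard2host:8983/solr";
    }

    public String getDescription() { return "maps queries to shards"; }
    public String getSource() { return "$URL$"; }
    public String getSourceId() { return "$Id$"; }
    public String getVersion() { return "1.0"; }
}

The component would then be registered in solrconfig.xml as a
first-components entry on the front-end Solr's request handler.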

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Dmitry Kan 
> To: solr-user@lucene.apache.org
> Sent: Fri, June 3, 2011 2:47:00 AM
> Subject: Re: query routing with shards
> 
> Hi Otis,
> 
> I merely followed on the gmail's suggestion to include other people into the
> recipients list, Yonik was the first one :) I won't do it next time.
> 
> Thanks for a rapid reply. The reason for doing this query routing is that we
> abstract the distributed SOLR from the client code for security reasons
> (that is, we don't want to expose the entire shard farm to the world, but
> only the frontend SOLR) and for better decoupling.
> 
> Is it possible to implement a plugin to SOLR that would map queries to
> shards?
> 
> We have other choices too, they'll take quite some time, that's why I
> decided to quickly ask, if I was missing something from the SOLR main
> components design and configuration.
> 
> Dmitry
> 
> On Fri, Jun 3, 2011 at 8:25 AM, Otis Gospodnetic wrote:
> 
> > Hi Dmitry (you may not want to additionally copy Yonik, he's subscribed to
> > this list, too)
> >
> > It sounds like you have the knowledge of which query maps to which shard.
> > If so, why not control/change the value of "shards" param in the request
> > to your front-end Solr (aka distributed request dispatcher) within your
> > app, which is the one calling Solr?
> >
> > Otis
> > 
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> > - Original Message 
> > > From: Dmitry Kan 
> > > To: solr-user@lucene.apache.org; yo...@lucidimagination.com
> > > Sent: Thu, June 2, 2011 7:00:53 AM
> > > Subject: query routing with shards
> > >
> > > Hello all,
> > >
> > > We have currently several pretty fat logically isolated shards with the
> > > same schema / solrconfig (indices are separate). We currently have one
> > > single front end SOLR (1.4) for the client code calls. Since a client
> > > code query usually hits only one shard, we are considering making a
> > > smart routing of queries to the shards they map to. Can you please give
> > > some pointers as to what would be an optimal way to achieve such a
> > > routing inside the front end solr? Is there a way to configure mapping
> > > inside the solrconfig?
> > >
> > > Thanks.
> > >
> > > --
> > > Regards,
> > >
> > > Dmitry Kan
> 
> 
> -- 
> Regards,
> 
> Dmitry Kan


Re: java.io.IOException: The specified network name is no longer available

2011-06-03 Thread Otis Gospodnetic
Hi,

I'm guessing your index is on some sort of network drive that got detached?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Gaurav Shingala 
> To: Apache SolrUser 
> Sent: Fri, June 3, 2011 1:52:42 AM
> Subject: java.io.IOException: The specified network name is no longer
> available
> 
> 
> Hi,
> 
> I am using Solr 1.4.1, and at the time of updating the index I am getting the
> following error:
> 
> 2011-06-03 05:54:06,943 ERROR [org.apache.solr.core.SolrCore]
> (http-10.38.33.146-8080-4) java.io.IOException: The specified network name is
> no longer available
> at java.io.RandomAccessFile.readBytes(Native Method)
> at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
> at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(SimpleFSDirectory.java:132)
> at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
> at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
> at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:78)
> at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:64)
> at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:129)
> at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:160)
> at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:232)
> at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:179)
> at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:57)
> at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:1103)
> at org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:981)
> at org.apache.solr.search.SolrIndexReader.termDocs(SolrIndexReader.java:320)
> at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:640)
> at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:545)
> at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:581)
> at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:903)
> at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
> at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
> at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:274)
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:242)
> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275)
> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:181)
> at org.jboss.modcluster.catalina.CatalinaContext$RequestListenerValve.event(CatalinaContext.java:285)
> at org.jboss.modcluster.catalina.CatalinaContext$RequestListenerValve.invoke(CatalinaContext.java:261)
> at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:88)
> at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.invoke(SecurityContextEstablishmentValve.java:100)
> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:158)
> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at org.jboss.web.tomcat.service.request.ActiveRequestResponseCacheValve.invoke(ActiveRequestResponseCacheValve.java:53)
> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:362)
> at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877)
> at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:654)
> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:951)
> at java.lang.Thread.run(Thread.java:619)
> 
> 2011-06-03 05:54:06,943 INFO [org.apache.solr.

Ignore This Test Message

2011-06-03 Thread Jasneet Sabharwal

Hey Guys

Just a test mail, please ignore this.

--
Thanx&  Regards

Jasneet Sabharwal
Software Developer
NextGen Invent Corporation



Re: Better to have lots of smaller cores or one really big core?

2011-06-03 Thread JohnRodey
Thanks Erick for the response.

So my data structure is the same, i.e. they all use the same schema.  Though
I think it makes sense for us to somehow break apart the data, for example
by the date it was indexed.  I'm just trying to get a feel for how large we
should aim to keep those (by day, by week, by month, etc...).

So it sounds like we should aim to keep them at a size that one solr server
can host to avoid serving multiple cores.

One question: there is no real difference (other than configuration) between a
server hosting its own index and one hosting a single core, is there?

Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Better-to-have-lots-of-smaller-cores-or-one-really-big-core-tp3017973p3019686.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr performance tuning - disk i/o?

2011-06-03 Thread Demian Katz
Thanks to you and Otis for the suggestions!  Some more information:

- Based on the Solr stats page, my caches seem to be working pretty well (few 
or no evictions, hit rates in the 75-80% range).
- VuFind is actually doing two Solr queries per search (one initial search 
followed by a supplemental spell check search -- I believe this is necessary 
because VuFind has two separate spelling indexes, one for shingled terms and 
one for single words).  That is probably exaggerating the problem, though based 
on searches with debugQuery on, it looks like it's always the initial search 
(rather than the supplemental spelling search) that's consuming the bulk of the 
time.
- enableLazyFieldLoading is set to true.
- I'm retrieving 20 documents per page.
- My JVM settings: -server -Xloggc:/usr/local/vufind/solr/jetty/logs/gc.log 
-Xms4096m -Xmx4096m -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:NewRatio=5

It appears that a large portion of my problem had to do with autowarming, a 
topic that I've never had a strong grasp on, though perhaps I'm finally 
learning (any recommended primer links would be welcome!).  I did have some 
autowarming settings in solrconfig.xml (an arbitrary search for a bunch of 
random keywords in the newSearcher and firstSearcher events, plus autowarmCount 
settings on all of my caches).  However, when I looked at the debugQuery 
output, I noticed that a huge amount of time was being wasted loading facets on 
the first search after restarting Solr, so I changed my newSearcher and 
firstSearcher events to this:

  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="start">0</str>
      <str name="rows">10</str>
      <str name="facet">true</str>
      <str name="facet.mincount">1</str>
      <str name="facet.field">collection</str>
      <str name="facet.field">format</str>
      <str name="facet.field">publishDate</str>
      <str name="facet.field">callnumber-first</str>
      <str name="facet.field">topic_facet</str>
      <str name="facet.field">authorStr</str>
      <str name="facet.field">language</str>
      <str name="facet.field">genre_facet</str>
      <str name="facet.field">era_facet</str>
      <str name="facet.field">geographic_facet</str>
    </lst>
  </arr>

Overall performance has now increased dramatically, and now the biggest 
bottleneck in the debug output seems to be the shingle spell checking!

Any other suggestions are welcome, since I suspect there's still room to 
squeeze more performance out of the system, and I'm still not sure I'm making 
the most of autowarming...  but this seems like a big step in the right 
direction.  Thanks again for the help!

- Demian

> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, June 03, 2011 9:41 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr performance tuning - disk i/o?
> 
> This doesn't seem right. Here's a couple of things to try:
> 1> attach &debugQuery=on to your long-running queries. The QTime
> returned
>  is the time taken to search, NOT including the time to load the
> docs. That'll
>  help pinpoint whether the problem is the search itself, or
> assembling the
>  documents.
> 2> Are you autowarming? If so, be sure it's actually done before
> querying.
> 3> Measure queries after the first few, particularly if you're sorting
> or
>  faceting.
> 4> What are your JVM settings? How much memory do you have?
> 5> is  set to true in your solrconfig.xml?
> 6> How many docs are you returning?
> 
> 
> There's more, but that'll do for a start Let us know if you gather
> more data
> and it's still slow.
> 
> Best
> Erick
> 
> On Fri, Jun 3, 2011 at 8:44 AM, Demian Katz 
> wrote:
> > Hello,
> >
> > I'm trying to move a VuFind installation from an ailing physical
> server into a virtualized environment, and I'm running into performance
> problems.  VuFind is a Solr 1.4.1-based application with fairly large
> and complex records (many stored fields, many words per record).  My
> particular installation contains about a million records in the index,
> with a total index size around 6GB.
> >
> > The virtual environment has more RAM and better CPUs than the old
> physical box, and I am satisfied that my Java environment is well-
> tuned.  My index is optimized.  Searches that hit the cache respond
> very well.  The problem is that non-cached searches are very slow - the
> more keywords I add, the slower they get, to the point of taking 6-12
> seconds to come back with results on a quiet box and well over a minute
> under stress testing.  (The old box still took a while for equivalent
> searches, but it was about twice as fast as the new one).
> >
> > My gut feeling is that disk access reading the index is the
> bottleneck here, but I know little about the specifics of Solr's
> internals, so it's entirely possible that my gut is wrong.  Outside
> testing does show that the the virtual environment's disk performance
> is not as good as the old physical server, especially when multiple
> processes are trying to access the same file simultaneously.
> >
> > So, two basic questions:
> >
> >
> > 1.)    Would you agree that I'm dealing with a disk bottleneck, or
> are there some other factors I should be considering?  Any good
> diagnostics I should be looking at?
> >
> > 2.)    If the problem is disk access, is there any

fq null pointer exception

2011-06-03 Thread dan whelan
I am noticing something strange with our recent upgrade to solr 3.1 and 
want to see if anyone has experienced anything similar.


I have a solr.StrField field named Status; the values are Enabled,
Disabled, or ''.


When I facet on that field I get:

Enabled 4409565
Disabled 29185
"" 112


The issue is when I do a filter query

This query works

select/?q=*:*&fq=Status:"Enabled"

But when I run this query I get a NPE

select/?q=*:*&fq=Status:"Disabled"


Here is part of the stack trace


Problem accessing /solr/global_accounts/select/. Reason:
null

java.lang.NullPointerException
at org.apache.solr.response.XMLWriter.writePrim(XMLWriter.java:828)
at org.apache.solr.response.XMLWriter.writeStr(XMLWriter.java:686)
at org.apache.solr.schema.StrField.write(StrField.java:49)
at org.apache.solr.schema.SchemaField.write(SchemaField.java:125)
at org.apache.solr.response.XMLWriter.writeDoc(XMLWriter.java:369)
at org.apache.solr.response.XMLWriter$3.writeDocs(XMLWriter.java:545)
at 
org.apache.solr.response.XMLWriter.writeDocuments(XMLWriter.java:482)

at org.apache.solr.response.XMLWriter.writeDocList(XMLWriter.java:519)
at org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:582)
at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131)
at 
org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35)

...


Thanks,

Dan



Solr Performance

2011-06-03 Thread Rohit
Hi,

 

We migrated to Solr a few days back, but after going live we have noticed a
performance drop, especially when we do a delta index, which we execute every
hour with around 100,000 records. We have a multi-core Solr server running on
a Linux machine with 4GB given to the JVM; it's not possible for me to upgrade
the RAM or give more memory to Solr currently.

So I was considering the option of running a master-slave config; I have
another Windows machine with 4GB RAM available on the same network. I have a
few questions regarding this:

- Is this the right path to take?

- How can I do this with minimum downtime, given the fact that our index is
huge?

- Can someone point me in the right direction for this?

 

Thanks and Regards,

Rohit



Re: Sorting

2011-06-03 Thread Clecio Varjao
Because when browsing through legislation, people want to browse in
the same order as it is actually printed in the hard-copy volumes.
It did work by using a copyField to a lowercase field.
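
(For the archives, this is essentially the "alphaOnlySort" recipe from the
example schema.xml — a sketch, with the source field name "title" made up:

<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title_sort" type="alphaOnlySort" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>

and then sort on title_sort instead of the original field.)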

On Fri, Jun 3, 2011 at 2:29 AM, pravesh  wrote:
> BTW, why r u sorting on this field?
> You could also index & store this field twice. First, in its original value,
> and then second, by encoding to some unique code/hash and index it and sort
> on that.
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Sorting-tp3017285p3019055.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Hitting the URI limit, how to get around this?

2011-06-03 Thread JohnRodey
So here's what I'm seeing: I'm running Solr 3.1.
I'm running a Java client that executes an HttpGet (I also tried HttpPost)
with a large shard list. If I remove a few shards from my current list it
returns fine; when I use my full shard list I get a "HTTP/1.1 400 Bad
Request". If I execute it in Firefox with a few shards removed it returns
fine; with the full shard list I get a blank screen returned immediately.

My URI works at around 7800 characters, but adding one more shard to it blows
up.

Any ideas? 

I've tried using SolrJ rather than HttpGet before but ran into similar
issues with even fewer shards. See
http://lucene.472066.n3.nabble.com/Long-list-of-shards-breaks-solrj-query-td2748556.html
 

My shards are added dynamically; every few hours I am adding new shards or
cores into the cluster, so I cannot have a shard list in the config files
unless I can somehow update them while the system is running.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Hitting-the-URI-limit-how-to-get-around-this-tp3017837p3020185.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Hitting the URI limit, how to get around this?

2011-06-03 Thread Ken Krugler
It sounds like you're hitting the max URL length (8K is a common default) for 
the HTTP web server that you're using to run Solr.

All of the web servers I know about let you bump this limit up via 
configuration settings.

-- Ken

On Jun 3, 2011, at 9:27am, JohnRodey wrote:

> So here's what I'm seeing: I'm running Solr 3.1
> I'm running a java client that executes a Httpget (I tried HttpPost) with a
> large shard list.  If I remove a few shards from my current list it returns
> fine, when I use my full shard list I get a "HTTP/1.1 400 Bad Request".  If
> I execute it in firefox with a few shards removed it returns fine, with the
> full shard list I get a blank screen returned immediately.
> 
> My URI works at around 7800 characters but adding one more shard to it blows
> up.
> 
> Any ideas? 
> 
> I've tried using SolrJ rather than httpget before but ran into similar
> issues but with even less shards.
> See 
> http://lucene.472066.n3.nabble.com/Long-list-of-shards-breaks-solrj-query-td2748556.html
> http://lucene.472066.n3.nabble.com/Long-list-of-shards-breaks-solrj-query-td2748556.html
>  
> 
> My shards are added dynamically, every few hours I am adding new shards or
> cores into the cluster.  so I cannot have a shard list in the config files
> unless I can somehow update them while the system is running.
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Hitting-the-URI-limit-how-to-get-around-this-tp3017837p3020185.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom data mining solutions








Re: Solr Performance

2011-06-03 Thread Otis Gospodnetic
Rohit:

Yes, run indexing on one machine (master), searches on the other (slave), and
set up replication between them. Don't optimize your index, and warm up the
searcher and caches on slaves. No downtime.
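
For reference, the stock Java-based replication (Solr 1.4+) is configured per
core in solrconfig.xml, roughly like this — host name, conf file list and poll
interval below are placeholders:

<!-- on the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- on the slave -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8080/solr/replication</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>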

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Rohit 
> To: solr-user@lucene.apache.org
> Sent: Fri, June 3, 2011 11:49:28 AM
> Subject: Solr Performance
> 
> Hi,
> 
> 
> 
> We migrated to Solr a few days back, but have now after  going live we have
> noticed a performance drop, especially when we do a delta  index, which we
> are executing every 1hours with around 100,000 records . We  have a multi
> core Solr server running on a Linux machine, with 4Gb given to  the JVM, its
> not possible for me to upgrade the ram or give more memory to  the Solr
> currently.
> 
> 
> 
> So I was considering the option of  running a master-slave config, I have
> another window machine with 4gb ram  available on the same network. I have
> two questions regarding this,
> 
> 
> 
> . Is this a right path to take  ?
> 
> . How can I do this with minimum down time,  given the fact that our
> index is huge
> 
> .  Can someone point me to the right direction for this?
> 
> 
> 
> Thanks and  Regards,
> 
> Rohit
> 
> 


Re: Solr performance tuning - disk i/o?

2011-06-03 Thread Otis Gospodnetic
Right, if you facet results, then your warmup queries should include those 
facets.  The same with sorting.  If you sort on fields A and B, then include 
warmup queries that sort on A and B.
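
E.g., one more entry in the newSearcher/firstSearcher queries array — the sort
fields A and B are placeholders:

<lst>
  <str name="q">*:*</str>
  <str name="sort">A desc, B asc</str>
  <str name="rows">10</str>
</lst>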

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Demian Katz 
> To: "solr-user@lucene.apache.org" 
> Sent: Fri, June 3, 2011 11:21:52 AM
> Subject: RE: Solr performance tuning - disk i/o?
> 
> Thanks to you and Otis for the suggestions!  Some more  information:
> 
> - Based on the Solr stats page, my caches seem to be working  pretty well 
> (few 
>or no evictions, hit rates in the 75-80% range).
> - VuFind is  actually doing two Solr queries per search (one initial search 
>followed by a  supplemental spell check search -- I believe this is necessary 
>because VuFind  has two separate spelling indexes, one for shingled terms and 
>one for single  words).  That is probably exaggerating the problem, though 
>based 
>on  searches with debugQuery on, it looks like it's always the initial search  
>(rather than the supplemental spelling search) that's consuming the bulk of 
>the  
>time.
> - enableLazyFieldLoading is set to true.
> - I'm retrieving 20  documents per page.
> - My JVM settings: -server  -Xloggc:/usr/local/vufind/solr/jetty/logs/gc.log 
>-Xms4096m -Xmx4096m  -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:NewRatio=5
> 
> It appears that a  large portion of my problem had to do with autowarming, a 
>topic that I've never  had a strong grasp on, though perhaps I'm finally 
>learning (any recommended  primer links would be welcome!).  I did have some 
>autowarming settings in  solrconfig.xml (an arbitrary search for a bunch of 
>random keywords in the  newSearcher and firstSearcher events, plus 
>autowarmCount 
>settings on all of my  caches).  However, when I looked at the debugQuery 
>output, I noticed that a  huge amount of time was being wasted loading facets 
>on 
>the first search after  restarting Solr, so I changed my newSearcher and 
>firstSearcher events to  this:
> 
>   <arr name="queries">
>     <lst>
>       <str name="q">*:*</str>
>       <str name="start">0</str>
>       <str name="rows">10</str>
>       <str name="facet">true</str>
>       <str name="facet.mincount">1</str>
>       <str name="facet.field">collection</str>
>       <str name="facet.field">format</str>
>       <str name="facet.field">publishDate</str>
>       <str name="facet.field">callnumber-first</str>
>       <str name="facet.field">topic_facet</str>
>       <str name="facet.field">authorStr</str>
>       <str name="facet.field">language</str>
>       <str name="facet.field">genre_facet</str>
>       <str name="facet.field">era_facet</str>
>       <str name="facet.field">geographic_facet</str>
>     </lst>
>   </arr>
> 
> Overall  performance has now increased dramatically, and now the biggest 
>bottleneck in  the debug output seems to be the shingle spell checking!
> 
> Any other  suggestions are welcome, since I suspect there's still room to 
>squeeze more  performance out of the system, and I'm still not sure I'm making 
>the most of  autowarming...  but this seems like a big step in the right  
>direction.  Thanks again for the help!
> 
> - Demian
> 
> >  -Original Message-
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent:  Friday, June 03, 2011 9:41 AM
> > To: solr-user@lucene.apache.org
> >  Subject: Re: Solr performance tuning - disk i/o?
> > 
> > This doesn't  seem right. Here's a couple of things to try:
> > 1> attach  &debugQuery=on to your long-running queries. The QTime
> >  returned
> >  is the time taken to search, NOT including  the time to load the
> > docs. That'll
> >  help  pinpoint whether the problem is the search itself, or
> > assembling  the
> >  documents.
> > 2> Are you autowarming? If  so, be sure it's actually done before
> > querying.
> > 3> Measure  queries after the first few, particularly if you're sorting
> >  or
> >  faceting.
> > 4> What are your JVM  settings? How much memory do you have?
> > 5> is   set to true in your solrconfig.xml?
> > 6>  How many docs are you returning?
> > 
> > 
> > There's more, but  that'll do for a start Let us know if you gather
> > more data
> >  and it's still slow.
> > 
> > Best
> > Erick
> > 
> > On  Fri, Jun 3, 2011 at 8:44 AM, Demian Katz 
> >  wrote:
> > > Hello,
> > >
> > > I'm trying to move a  VuFind installation from an ailing physical
> > server into a virtualized  environment, and I'm running into performance
> > problems.  VuFind is a  Solr 1.4.1-based application with fairly large
> > and complex records (many  stored fields, many words per record).  My
> > particular installation  contains about a million records in the index,
> > with a total index size  around 6GB.
> > >
> > > The virtual environment has more RAM and  better CPUs than the old
> > physical box, and I am satisfied that my Java  environment is well-
> > tuned.  My index is optimized.  Searches that hit  the cache respond
> > very well.  The problem is that non-cached searches  are very slow - the
> > more keywords I add, the slower they get, to the  point of taking 6-12
> > seconds to come back with results on a quiet box  and well over a minute
> > under stress testing.  (The old box still took a 

Re: fq null pointer exception

2011-06-03 Thread Otis Gospodnetic
Dan, does the problem go away if you get rid of those 112 documents with empty 
Status or replace their empty status value with, say, "Unknown"?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: dan whelan 
> To: solr-user@lucene.apache.org
> Sent: Fri, June 3, 2011 11:46:46 AM
> Subject: fq null pointer exception
> 
> I am noticing something strange with our recent upgrade to solr 3.1 and want 
> to  
>see if anyone has experienced anything similar.
> 
> I have a solr.StrField  field named Status the values are Enabled, Disabled, 
> or 
>''
> 
> When I facet  on that field it I get
> 
> Enabled 4409565
> Disabled 29185
> ""  112
> 
> 
> The issue is when I do a filter query
> 
> This query  works
> 
> select/?q=*:*&fq=Status:"Enabled"
> 
> But when I run this  query I get a NPE
> 
> select/?q=*:*&fq=Status:"Disabled"
> 
> 
> Here  is part of the stack trace
> 
> 
> Problem accessing  /solr/global_accounts/select/. Reason:
>  null
> 
> java.lang.NullPointerException
> at  org.apache.solr.response.XMLWriter.writePrim(XMLWriter.java:828)
>  at  org.apache.solr.response.XMLWriter.writeStr(XMLWriter.java:686)
>  at org.apache.solr.schema.StrField.write(StrField.java:49)
> at  org.apache.solr.schema.SchemaField.write(SchemaField.java:125)
>  at org.apache.solr.response.XMLWriter.writeDoc(XMLWriter.java:369)
>  at  org.apache.solr.response.XMLWriter$3.writeDocs(XMLWriter.java:545)
>  at  org.apache.solr.response.XMLWriter.writeDocuments(XMLWriter.java:482)
>  at  org.apache.solr.response.XMLWriter.writeDocList(XMLWriter.java:519)
>  at  org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:582)
>  at  org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131)
>  at  
>org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35)
> ...
> 
> 
> Thanks,
> 
> Dan
> 
> 


Re: query routing with shards

2011-06-03 Thread Dmitry Kan
Hi Otis,

Thanks! This sounds promising. Will this custom implementation hurt the
stability of the front-end SOLR in any way? After implementing it, can I
run some tests to verify its stability and performance?

Dmitry
On Fri, Jun 3, 2011 at 4:49 PM, Otis Gospodnetic  wrote:

> Hi Dmitry,
>
> Yes, you could also implement your own custom SearchComponent.  In this
> component you could grab the query param, examine the query value, and
> based on
> that add the shards URL param with appropriate value, so that when the
> regular
> QueryComponent grabs stuff from the request, it has the correct shard in
> there
> already.
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
> > From: Dmitry Kan 
> > To: solr-user@lucene.apache.org
>  > Sent: Fri, June 3, 2011 2:47:00 AM
> > Subject: Re: query routing with shards
> >
> > Hi Otis,
> >
> > I merely followed on the gmail's suggestion to include other  people into
> the
> > recipients list, Yonik was the first one :) I won't do it  next time.
> >
> > Thanks for a rapid reply. The reason for doing this query  routing is
> that we
> > abstract the distributed SOLR from the client code for  security reasons
> > (that is, we don't want to expose the entire shard farm to  the world,
> but
> > only the frontend SOLR) and for better decoupling.
> >
> > Is  it possible to implement a plugin to SOLR that would map queries  to
> > shards?
> >
> > We have other choices too, they'll take quite some time,  that's why I
> > decided to quickly ask, if I was missing something from the SOLR  main
> > components design and configuration.
> >
> > Dmitry
> >
> > On Fri, Jun 3,  2011 at 8:25 AM, Otis Gospodnetic <
> otis_gospodne...@yahoo.com
> > >  wrote:
> >
> > > Hi Dmitry (you may not want to additionally copy Yonik, he's
>  subscribed to
> > > this
> > > list, too)
> > >
> > >
> > > It sounds  like you have the knowledge of which query maps to which
> shard.
> > >   If
> > > so, why not control/change the value of "shards" param in the request
>  to
> > > your
> > > front-end Solr (aka distributed request dispatcher)  within your app,
> which
> > > is
> > > the one calling Solr?
> > >
> > >  Otis
> > > 
> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > Lucene  ecosystem search :: http://search-lucene.com/
> > >
> > >
> > >
> > > - Original  Message 
> > > > From: Dmitry Kan 
> > > > To: solr-user@lucene.apache.org; yo...@lucidimagination.com
> > >  > Sent: Thu, June 2, 2011 7:00:53 AM
> > > > Subject: query routing with  shards
> > > >
> > > > Hello all,
> > > >
> > > > We have  currently several pretty fat logically isolated shards  with
> the
> > >  same
> > > > schema / solrconfig (indices are separate). We currently  have  one
> single
> > > > front end SOLR (1.4) for the client code  calls. Since a client  code
> > > query
> > > > usually hits only  one shard, we are considering making a smart
>  routing
> > > of
> > >  > queries to the shards they map to. Can you please give some
>  pointers  as
> > > to
> > > > what would be an optimal way to achieve such a  routing inside  the
> front
> > > end
> > > > solr? Is there a way to  configure mapping inside the  solrconfig?
> > > >
> > > >  Thanks.
> > > >
> > > > --
> > > > Regards,
> > > >
> > >  > Dmitry Kan
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > Dmitry Kan
> >
>



-- 
Regards,

Dmitry Kan


Re: query routing with shards

2011-06-03 Thread Otis Gospodnetic
Nah, if you can quickly figure out which shard a given query maps to, then all 
this component needs to do is stick the appropriate shards param value in the 
request and let the request pass through to the other SearchComponents in the 
chain, including QueryComponent, which will know what to do with the shards
param.
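
For illustration, a bare-bones sketch of such a component against the Solr
1.4/3.x SearchComponent API — the package, class name and mapQueryToShard()
lookup are made up, and the real routing logic is app-specific:

import java.io.IOException;

import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.params.ShardParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class ShardRoutingComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // don't override a shards param the caller set explicitly
    if (rb.req.getParams().get(ShardParams.SHARDS) != null) {
      return;
    }
    String q = rb.req.getParams().get(CommonParams.Q);
    String shard = mapQueryToShard(q);
    if (shard != null) {
      ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
      params.set(ShardParams.SHARDS, shard);
      rb.req.setParams(params); // QueryComponent picks this up later
    }
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // nothing to do here; QueryComponent handles the distributed search
  }

  // your own query -> "host:port/solr" mapping goes here
  private String mapQueryToShard(String q) {
    return null;
  }

  @Override
  public String getDescription() { return "query-to-shard routing"; }
  @Override
  public String getSource() { return "$URL$"; }
  @Override
  public String getSourceId() { return "$Id$"; }
  @Override
  public String getVersion() { return "$Revision$"; }
}

Register it ahead of the standard components in the front-end Solr's
solrconfig.xml, e.g.:

<searchComponent name="shardRouter" class="com.example.ShardRoutingComponent"/>
<requestHandler name="/select" class="solr.SearchHandler">
  <arr name="first-components">
    <str>shardRouter</str>
  </arr>
</requestHandler>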

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Dmitry Kan 
> To: solr-user@lucene.apache.org
> Sent: Fri, June 3, 2011 12:56:15 PM
> Subject: Re: query routing with shards
> 
> Hi Otis,
> 
> Thanks! This sounds promising. This custom implementation, will  it hurt in
> any way the stability of the front end SOLR? After implementing  it, can I
> run some tests to verify the stability /  performance?
> 
> Dmitry
> On Fri, Jun 3, 2011 at 4:49 PM, Otis Gospodnetic   >  wrote:
> 
> > Hi Dmitry,
> >
> > Yes, you could also implement your  own custom SearchComponent.  In this
> > component you could grab the  query param, examine the query value, and
> > based on
> > that add the  shards URL param with appropriate value, so that when the
> >  regular
> > QueryComponent grabs stuff from the request, it has the correct  shard in
> > there
> > already.
> >
> > Otis
> >  
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem  search :: http://search-lucene.com/
> >
> >
> >
> > - Original  Message 
> > > From: Dmitry Kan 
> > > To: solr-user@lucene.apache.org
> >   > Sent: Fri, June 3, 2011 2:47:00 AM
> > > Subject: Re: query routing  with shards
> > >
> > > Hi Otis,
> > >
> > > I  merely followed on the gmail's suggestion to include other  people  
into
> > the
> > > recipients list, Yonik was the first one :) I  won't do it  next time.
> > >
> > > Thanks for a rapid reply.  The reason for doing this query  routing is
> > that we
> > >  abstract the distributed SOLR from the client code for  security  reasons
> > > (that is, we don't want to expose the entire shard farm  to  the world,
> > but
> > > only the frontend SOLR) and for  better decoupling.
> > >
> > > Is  it possible to implement a  plugin to SOLR that would map queries  to
> > > shards?
> >  >
> > > We have other choices too, they'll take quite some time,   that's why I
> > > decided to quickly ask, if I was missing something  from the SOLR  main
> > > components design and  configuration.
> > >
> > > Dmitry
> > >
> > > On  Fri, Jun 3,  2011 at 8:25 AM, Otis Gospodnetic <
> > otis_gospodne...@yahoo.com
> >  > >  wrote:
> > >
> > > > Hi Dmitry (you may not  want to additionally copy Yonik, he's
> >  subscribed to
> > >  > this
> > > > list, too)
> > > >
> > >  >
> > > > It sounds  like you have the knowledge of which  query maps to which
> > shard.
> > > >   If
> > > >  so, why not control/change the value of "shards" param in the  request
> >  to
> > > > your
> > > > front-end Solr  (aka distributed request dispatcher)  within your app,
> >  which
> > > > is
> > > > the one calling Solr?
> > >  >
> > > >  Otis
> > > > 
> > > >  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > >  Lucene  ecosystem search :: http://search-lucene.com/
> > > >
> > >  >
> > > >
> > > > - Original  Message  
> > > > > From: Dmitry Kan 
> > >  > > To: solr-user@lucene.apache.org; yo...@lucidimagination.com
> >  > >  > Sent: Thu, June 2, 2011 7:00:53 AM
> > > > >  Subject: query routing with  shards
> > > > >
> > >  > > Hello all,
> > > > >
> > > > > We have   currently several pretty fat logically isolated shards  with
> >  the
> > > >  same
> > > > > schema / solrconfig  (indices are separate). We currently  have  one
> > single
> >  > > > front end SOLR (1.4) for the client code  calls. Since a  client  
code
> > > > query
> > > > > usually hits  only  one shard, we are considering making a smart
> >   routing
> > > > of
> > > >  > queries to the shards  they map to. Can you please give some
> >  pointers  as
> >  > > to
> > > > > what would be an optimal way to achieve such  a  routing inside  the
> > front
> > > > end
> >  > > > solr? Is there a way to  configure mapping inside the   solrconfig?
> > > > >
> > > > >  Thanks.
> >  > > >
> > > > > --
> > > > > Regards,
> >  > > >
> > > >  > Dmitry Kan
> > > >  >
> > > >
> > >
> > >
> > >
> > >  --
> > > Regards,
> > >
> > > Dmitry Kan
> >  >
> >
> 
> 
> 
> -- 
> Regards,
> 
> Dmitry Kan
> 


Re: [Visualizations] from Query Results

2011-06-03 Thread Adam Estrada
Otis and Erick,

Believe it or not, I did Google this and didn't come up with anything all
that useful. I was at the Lucene Revolution conference last year and saw
some prezos that had some sort of graphical representation of the query
results. The one from Basic Tech especially caught my attention because it
simply showed a graph of hits over time. I can do that using jQuery or
Raphael as he suggested. I have also been playing with the Carrot2
visualization tools, which are pretty cool, and which is why I pointed them
out in my original email. I was just curious to see if there were any
"speciality" type projects out there like Carrot2 that folks in the Solr
community are using.

Adam

On Fri, Jun 3, 2011 at 9:42 AM, Otis Gospodnetic  wrote:

> Hi Adam,
>
> Try this:
> http://lmgtfy.com/?q=search%20results%20visualizations
>
> In practice I find that visualizations are cool and attractive looking, but
> often text is more useful because it's more direct.  But there is room for
> graphical representation of search results, sure.
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
> > From: Adam Estrada 
> > To: solr-user@lucene.apache.org
> > Sent: Fri, June 3, 2011 7:13:39 AM
> > Subject: [Visualizations] from Query Results
> >
> > Dear Solr experts,
> >
> > I am curious to learn what visualization tools are out  there to help me
> > "visualize" my query results. I am not talking about a  language specific
> > client per se but something more like Carrot2 which breaks  clusters in
> to
> > their knowledge tree and expandable pie chart. Sorry if those  aren't the
> > correct names for those tools ;-) Anyway, what else is out there  like
> > Carrot2 http://project.carrot2.org/ to help me visualize Solr query
>  results?
> >
> > Thanks for your input,
> > Adam
> >
>


Feature: skipping caches and info about cache use

2011-06-03 Thread Otis Gospodnetic
Hi,

Is it just me, or would others like things like:
* The ability to tell Solr (by passing some URL param?) to skip one or more of 
its caches and get data from the index
* An additional attrib in the Solr response that shows whether the query came
from the cache or not
* Maybe something else along these lines?

Or maybe some of this is already there and I just don't know about it? :)

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



Re: query routing with shards

2011-06-03 Thread Dmitry Kan
Got it, I can quickly figure the shard out, thanks a lot Otis!

Dmitry

On Fri, Jun 3, 2011 at 8:00 PM, Otis Gospodnetic  wrote:

> Nah, if you can quickly figure out which shard a given query maps to, then
> all
> this component needs to do is stick the appropriate shards param value in
> the
> request and let the request pass through to the other SearchComponents in
> the
> chain,  including QueryComponent, which will know what to do with the
> shards
> param.
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
> > From: Dmitry Kan 
> > To: solr-user@lucene.apache.org
>   > Sent: Fri, June 3, 2011 12:56:15 PM
> > Subject: Re: query routing with shards
> >
> > Hi Otis,
> >
> > Thanks! This sounds promising. This custom implementation, will  it hurt
> in
> > any way the stability of the front end SOLR? After implementing  it, can
> I
> > run some tests to verify the stability /  performance?
> >
> > Dmitry
> > On Fri, Jun 3, 2011 at 4:49 PM, Otis Gospodnetic  <
> otis_gospodne...@yahoo.com
> > >  wrote:
> >
> > > Hi Dmitry,
> > >
> > > Yes, you could also implement your  own custom SearchComponent.  In
> this
> > > component you could grab the  query param, examine the query value, and
> > > based on
> > > that add the  shards URL param with appropriate value, so that when the
> > >  regular
> > > QueryComponent grabs stuff from the request, it has the correct  shard
> in
> > > there
> > > already.
> > >
> > > Otis
> > >  
> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > Lucene ecosystem  search :: http://search-lucene.com/
> > >
> > >
> > >
> > > - Original  Message 
> > > > From: Dmitry Kan 
> > > > To: solr-user@lucene.apache.org
> > >   > Sent: Fri, June 3, 2011 2:47:00 AM
> > > > Subject: Re: query routing  with shards
> > > >
> > > > Hi Otis,
> > > >
> > > > I  merely followed on the gmail's suggestion to include other  people
> into
> > > the
> > > > recipients list, Yonik was the first one :) I  won't do it  next
> time.
> > > >
> > > > Thanks for a rapid reply.  The reason for doing this query  routing
> is
> > > that we
> > > >  abstract the distributed SOLR from the client code for  security
>  reasons
> > > > (that is, we don't want to expose the entire shard farm  to  the
> world,
> > > but
> > > > only the frontend SOLR) and for  better decoupling.
> > > >
> > > > Is  it possible to implement a  plugin to SOLR that would map queries
>  to
> > > > shards?
> > >  >
> > > > We have other choices too, they'll take quite some time,   that's why
> I
> > > > decided to quickly ask, if I was missing something  from the SOLR
>  main
> > > > components design and  configuration.
> > > >
> > > > Dmitry
> > > >
> > > > On  Fri, Jun 3,  2011 at 8:25 AM, Otis Gospodnetic <
> > > otis_gospodne...@yahoo.com
> > >  > >  wrote:
> > > >
> > > > > Hi Dmitry (you may not  want to additionally copy Yonik, he's
> > >  subscribed to
> > > >  > this
> > > > > list, too)
> > > > >
> > > >  >
> > > > > It sounds  like you have the knowledge of which  query maps to
> which
> > > shard.
> > > > >   If
> > > > >  so, why not control/change the value of "shards" param in the
>  request
> > >  to
> > > > > your
> > > > > front-end Solr  (aka distributed request dispatcher)  within your
> app,
> > >  which
> > > > > is
> > > > > the one calling Solr?
> > > >  >
> > > > >  Otis
> > > > > 
> > > > >  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > > >  Lucene  ecosystem search :: http://search-lucene.com/
> > > > >
> > > >  >
> > > > >
> > > > > - Original  Message  
> > > > > > From: Dmitry Kan 
> > > >  > > To: solr-user@lucene.apache.org; yo...@lucidimagination.com
> > >  > >  > Sent: Thu, June 2, 2011 7:00:53 AM
> > > > > >  Subject: query routing with  shards
> > > > > >
> > > >  > > Hello all,
> > > > > >
> > > > > > We have   currently several pretty fat logically isolated shards
>  with
> > >  the
> > > > >  same
> > > > > > schema / solrconfig  (indices are separate). We currently  have
>  one
> > > single
> > >  > > > front end SOLR (1.4) for the client code  calls. Since a  client
> code
> > > > > query
> > > > > > usually hits  only  one shard, we are considering making a smart
> > >   routing
> > > > > of
> > > > >  > queries to the shards  they map to. Can you please give some
> > >  pointers  as
> > >  > > to
> > > > > > what would be an optimal way to achieve such  a  routing inside
>  the
> > > front
> > > > > end
> > >  > > > solr? Is there a way to  configure mapping inside the
> solrconfig?
> > > > > >
> > > > > >  Thanks.
> > >  > > >
> > > > > > --
> > > > > > Regards,
> > >  > > >
> > > > >  > Dmitry Kan
> > > > >  >
> > > > >
> > > >
> > > >
> > > >
> > > >  --
> > > > Regards,
> > > >
> > > > Dmitry Kan
> > >  >
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > Dmitry Kan
> >
>



-- 
Regards,

Dmitry Kan


RE: Hitting the URI limit, how to get around this?

2011-06-03 Thread Colin Bennett
It sounds like you need to increase the HTTP header size.

In Tomcat the default is 4096 bytes, and to change it you need to add
maxHttpHeaderSize="" to the connector definition in server.xml.

Colin.

-Original Message-
From: Ken Krugler [mailto:kkrugler_li...@transpac.com] 
Sent: Friday, June 03, 2011 12:39 PM
To: solr-user@lucene.apache.org
Subject: Re: Hitting the URI limit, how to get around this?

It sounds like you're hitting the max URL length (8K is a common default)
for the HTTP web server that you're using to run Solr.

All of the web servers I know about let you bump this limit up via
configuration settings.

-- Ken

On Jun 3, 2011, at 9:27am, JohnRodey wrote:

> So here's what I'm seeing: I'm running Solr 3.1
> I'm running a java client that executes a Httpget (I tried HttpPost) with
a
> large shard list.  If I remove a few shards from my current list it
returns
> fine, when I use my full shard list I get a "HTTP/1.1 400 Bad Request".
If
> I execute it in firefox with a few shards removed it returns fine, with
the
> full shard list I get a blank screen returned immediately.
> 
> My URI works at around 7800 characters but adding one more shard to it
blows
> up.
> 
> Any ideas? 
> 
> I've tried using SolrJ rather than httpget before but ran into similar
> issues but with even less shards.
> See 
>
http://lucene.472066.n3.nabble.com/Long-list-of-shards-breaks-solrj-query-td
2748556.html
>
http://lucene.472066.n3.nabble.com/Long-list-of-shards-breaks-solrj-query-td
2748556.html 
> 
> My shards are added dynamically, every few hours I am adding new shards or
> cores into the cluster.  so I cannot have a shard list in the config files
> unless I can somehow update them while the system is running.
> 
> --
> View this message in context:
http://lucene.472066.n3.nabble.com/Hitting-the-URI-limit-how-to-get-around-t
his-tp3017837p3020185.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom data mining solutions











Re: fq null pointer exception

2011-06-03 Thread dan whelan

Otis, I just deleted the documents and committed and I still get that error.

Thanks,

Dan


On 6/3/11 9:43 AM, Otis Gospodnetic wrote:

Dan, does the problem go away if you get rid of those 112 documents with empty
Status or replace their empty status value with, say, "Unknown"?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 

From: dan whelan
To: solr-user@lucene.apache.org
Sent: Fri, June 3, 2011 11:46:46 AM
Subject: fq null pointer exception

I am noticing something strange with our recent upgrade to solr 3.1 and want to
see if anyone has experienced anything similar.

I have a solr.StrField  field named Status the values are Enabled, Disabled, or
''

When I facet  on that field it I get

Enabled 4409565
Disabled 29185
""  112


The issue is when I do a filter query

This query  works

select/?q=*:*&fq=Status:"Enabled"

But when I run this  query I get a NPE

select/?q=*:*&fq=Status:"Disabled"


Here  is part of the stack trace


Problem accessing  /solr/global_accounts/select/. Reason:
  null

java.lang.NullPointerException
 at  org.apache.solr.response.XMLWriter.writePrim(XMLWriter.java:828)
  at  org.apache.solr.response.XMLWriter.writeStr(XMLWriter.java:686)
  at org.apache.solr.schema.StrField.write(StrField.java:49)
 at  org.apache.solr.schema.SchemaField.write(SchemaField.java:125)
  at org.apache.solr.response.XMLWriter.writeDoc(XMLWriter.java:369)
  at  org.apache.solr.response.XMLWriter$3.writeDocs(XMLWriter.java:545)
  at  org.apache.solr.response.XMLWriter.writeDocuments(XMLWriter.java:482)
  at  org.apache.solr.response.XMLWriter.writeDocList(XMLWriter.java:519)
  at  org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:582)
  at  org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131)
  at
org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35)
...


Thanks,

Dan






Re: Strategy --> Frequent updates in our application

2011-06-03 Thread Jack Repenning
On Jun 2, 2011, at 8:29 PM, Naveen Gupta wrote:

> and what about NRT, is it fine to apply in this case of scenario

Is NRT really what's wanted here? I'm asking the experts, as I have a
situation not too different from the b.p.

It appears to me (from the dox) that NRT makes a difference in the lag between
a document being added and it being available in searches. But the BP really
sounds to me like a concern over documents-added-per-second. Does the
RankingAlgorithm form of NRT improve the docs-added-per-second performance?

My add-to-view limits aren't really threatened by Solr performance today;
something like 30 seconds is just fine. But I am feeling close enough to the
documents-per-second boundary that I'm pondering measures like master/slave.
If NRT only improves add-to-view lag, I'm not overly interested, but if it can
improve add throughput, I'm all over it ;-)

-==-
Jack Repenning
Technologist
Codesion Business Unit
CollabNet, Inc.
8000 Marina Boulevard, Suite 600
Brisbane, California 94005
office: +1 650.228.2562
twitter: http://twitter.com/jrep













Re: Hitting the URI limit, how to get around this?

2011-06-03 Thread Dmitry Kan
Hi,

Why not use HTTP POST?
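
A sketch with the SolrJ 1.4/3.x API — the URL and shard list are placeholders;
the point is that the params (including a long shards list) go in the POST
body instead of the URL:

import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.ModifiableSolrParams;

public class PostQuery {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8080/solr");
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("q", "*:*");
    // the long shard list rides in the POST body, not the URL
    params.set("shards", "host1:8080/solr,host2:8080/solr");
    QueryResponse rsp = server.query(params, SolrRequest.METHOD.POST);
    System.out.println("numFound: " + rsp.getResults().getNumFound());
  }
}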

Dmitry

On Fri, Jun 3, 2011 at 8:27 PM, Colin Bennett  wrote:

> It sounds like you need to increase the HTTP header size.
>
> In tomcat the default is 4096 bytes, and to change it you need to add
> maxHttpHeaderSize="" to the connector definition in server.xml
>
> Colin.
>
> -Original Message-
> From: Ken Krugler [mailto:kkrugler_li...@transpac.com]
> Sent: Friday, June 03, 2011 12:39 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Hitting the URI limit, how to get around this?
>
> It sounds like you're hitting the max URL length (8K is a common default)
> for the HTTP web server that you're using to run Solr.
>
> All of the web servers I know about let you bump this limit up via
> configuration settings.
>
> -- Ken
>
> On Jun 3, 2011, at 9:27am, JohnRodey wrote:
>
> > So here's what I'm seeing: I'm running Solr 3.1
> > I'm running a java client that executes a Httpget (I tried HttpPost) with
> a
> > large shard list.  If I remove a few shards from my current list it
> returns
> > fine, when I use my full shard list I get a "HTTP/1.1 400 Bad Request".
> If
> > I execute it in firefox with a few shards removed it returns fine, with
> the
> > full shard list I get a blank screen returned immediately.
> >
> > My URI works at around 7800 characters but adding one more shard to it
> blows
> > up.
> >
> > Any ideas?
> >
> > I've tried using SolrJ rather than httpget before but ran into similar
> > issues but with even less shards.
> > See
> >
>
> http://lucene.472066.n3.nabble.com/Long-list-of-shards-breaks-solrj-query-td
> 2748556.html
> >
>
> http://lucene.472066.n3.nabble.com/Long-list-of-shards-breaks-solrj-query-td
> 2748556.html
> >
> > My shards are added dynamically, every few hours I am adding new shards
> or
> > cores into the cluster.  so I cannot have a shard list in the config
> files
> > unless I can somehow update them while the system is running.
> >
> > --
> > View this message in context:
>
> http://lucene.472066.n3.nabble.com/Hitting-the-URI-limit-how-to-get-around-t
> his-tp3017837p3020185.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
> --
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> custom data mining solutions
>
>
>
>
>
>
>
>
>
>


-- 
Regards,

Dmitry Kan


How to know how many documents are indexed? Anything more elegant than parsing numFound?

2011-06-03 Thread Gabriele Kahlout
$ curl "http://192.168.34.51:8080/solr/select?q=*%3A*&rows=0"; >> resp.xml
$ xmlstarlet sel -t -v "//@numFound" resp.xml


-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: Hitting the URI limit, how to get around this?

2011-06-03 Thread JohnRodey
Yep that was my issue.

And like Ken said, on Tomcat I set maxHttpHeaderSize="65536".



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Hitting-the-URI-limit-how-to-get-around-this-tp3017837p3020774.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to know how many documents are indexed? Anything more elegant than parsing numFound?

2011-06-03 Thread Ahmet Arslan
: How to know how many documents are indexed? Anything more elegant than
: parsing numFound?
> $ curl "http://192.168.34.51:8080/solr/select?q=*%3A*&rows=0";
> >> resp.xml
> $ xmlstarlet sel -t -v "//@numFound" resp.xml

solr/admin/stats.jsp is actually XML too and contains numDocs and maxDoc
info.

I think you can get numDocs with jmx too. http://wiki.apache.org/solr/SolrJmx
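
If the Luke request handler is enabled in your solrconfig.xml, its response is
well-formed XML and carries the same counts, e.g. (host is a placeholder):

$ curl "http://localhost:8080/solr/admin/luke?numTerms=0" >> resp.xml
$ xmlstarlet sel -t -v "//int[@name='numDocs']" resp.xml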


Getting payloads in Highlighter

2011-06-03 Thread lboutros
Hi all,

I need to highlight searched words in the original text (XML) of a document.

So I'm trying to develop a new Highlighter which uses the default Highlighter
to highlight some fields, then retrieves the original text file/document
(external or internal storage) and puts the highlighted parts into it.

I'm using an additional field for the field offsets of each field in each
document. To store the offsets (and perhaps other info) I'm using payloads.
(I cannot wait for the future DocValues.)

Now my question: what is the fastest way to retrieve payloads (TermPositions?)
for a given document, a given field and a given term?

If other methods exist to do that, I'm open :)

Ludovic.



-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-payloads-in-Highlighter-tp3020885p3020885.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: fq null pointer exception

2011-06-03 Thread Otis Gospodnetic
And what happens if you add &fl=?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: dan whelan 
> To: solr-user@lucene.apache.org
> Sent: Fri, June 3, 2011 1:38:33 PM
> Subject: Re: fq null pointer exception
> 
> Otis, I just deleted the documents and committed and I still get that  error.
> 
> Thanks,
> 
> Dan
> 
> 
> On 6/3/11 9:43 AM, Otis Gospodnetic  wrote:
> > Dan, does the problem go away if you get rid of those 112  documents with 
>empty
> > Status or replace their empty status value with,  say, "Unknown"?
> >
> > Otis
> > 
> > Sematext :: http://sematext.com/ :: Solr -  Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> >
> >
> > - Original  Message 
> >> From: dan whelan
> >> To: solr-user@lucene.apache.org
> >>  Sent: Fri, June 3, 2011 11:46:46 AM
> >> Subject: fq null pointer  exception
> >>
> >> I am noticing something strange with our  recent upgrade to solr 3.1 and 
>want to
> >> see if anyone has experienced  anything similar.
> >>
> >> I have a solr.StrField  field  named Status the values are Enabled, 
>Disabled, or
> >>  ''
> >>
> >> When I facet  on that field it I  get
> >>
> >> Enabled 4409565
> >> Disabled  29185
> >> ""  112
> >>
> >>
> >> The issue is  when I do a filter query
> >>
> >> This query   works
> >>
> >>  select/?q=*:*&fq=Status:"Enabled"
> >>
> >> But when I run  this  query I get a NPE
> >>
> >>  select/?q=*:*&fq=Status:"Disabled"
> >>
> >>
> >>  Here  is part of the stack trace
> >>
> >>
> >>  Problem accessing  /solr/global_accounts/select/. Reason:
> >>null
> >>
> >>  java.lang.NullPointerException
> >>  at   org.apache.solr.response.XMLWriter.writePrim(XMLWriter.java:828)
> >>at   org.apache.solr.response.XMLWriter.writeStr(XMLWriter.java:686)
> >>at  org.apache.solr.schema.StrField.write(StrField.java:49)
> >>   at   org.apache.solr.schema.SchemaField.write(SchemaField.java:125)
> >>at  org.apache.solr.response.XMLWriter.writeDoc(XMLWriter.java:369)
> >>at   
>org.apache.solr.response.XMLWriter$3.writeDocs(XMLWriter.java:545)
> >>at   
>org.apache.solr.response.XMLWriter.writeDocuments(XMLWriter.java:482)
> >>at   
>org.apache.solr.response.XMLWriter.writeDocList(XMLWriter.java:519)
> >>at   org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:582)
> >>at   
>org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131)
> >>at
> >>  
org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35)
> >>  ...
> >>
> >>
> >> Thanks,
> >>
> >>  Dan
> >>
> >>
> 
> 


Re: Strategy --> Frequent updates in our application

2011-06-03 Thread Otis Gospodnetic
Yes, when people talk about NRT search they refer to 'add to view lag'.  In a 
typical Solr master-slave setup this is dominated by waiting for replication, 
doing the replication, and then warming up.

If your problem is indexing speed, then that's a separate story; I think
you'll find answers on http://search-lucene.com/, or if you can't find them
there we can repeat them here :)

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Jack Repenning 
> To: solr-user@lucene.apache.org
> Sent: Fri, June 3, 2011 2:10:27 PM
> Subject: Re: Strategy --> Frequent updates in our application
> 
> On Jun 2, 2011, at 8:29 PM, Naveen Gupta wrote:
> 
> > and what about NRT,  is it fine to apply in this case of scenario
> 
> Is NRT really what's wanted  here? I'm asking the experts, as I have a 
>situation  not too different from  the b.p.
> 
> It appears to me (from the dox) that NRT makes a difference in  the lag 
> between 
>a document being added and it being available in searches. But  the BP really 
>sounds to me like a concern over documents-added-per-second. Does  the 
>RankingAlgorithm form of NRT improve the docs-added-per-second  performance?
> 
> My add-to-view limits aren't really threatened by Solr  performance today; 
>something like 30 seconds is just fine. But I am feeling  close enough to the 
>documents-per-second boundary that I'm pondering measures  like master/slave. 
>If 
>NRT only improvs add-to-view lag, I'm not overly  interested, but if it can 
>improve add throughput, I'm all over it  ;-)
> 
> -==-
> Jack Repenning
> Technologist
> Codesion Business  Unit
> CollabNet, Inc.
> 8000 Marina Boulevard, Suite 600
> Brisbane,  California 94005
> office: +1 650.228.2562
> twitter: http://twitter.com/jrep
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 


Re: Getting payloads in Highlighter

2011-06-03 Thread lboutros
To clarify a bit more, I took a look at this function:

termPositions

public TermPositions termPositions()
throws IOException

Description copied from class: IndexReader
Returns an unpositioned TermPositions enumerator. 

But it returns an unpositioned enumerator. Is there a way to get a
TermPositions directly positioned on a given document, field and term?
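
(For reference, the usual seek-and-skip pattern with the Lucene 2.9/3.x APIs
looks roughly like this — a sketch; the field name, term text and docId are
placeholders:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermPositions;

void readPayloads(IndexReader reader, int docId) throws IOException {
  // termPositions(Term) seeks to the term; skipTo(docId) advances to the doc
  TermPositions tp = reader.termPositions(new Term("field", "term"));
  try {
    if (tp.skipTo(docId) && tp.doc() == docId) {
      for (int i = 0; i < tp.freq(); i++) {
        tp.nextPosition(); // must advance before reading the payload
        if (tp.isPayloadAvailable()) {
          byte[] payload = tp.getPayload(new byte[tp.getPayloadLength()], 0);
          // decode the stored offsets from payload here
        }
      }
    }
  } finally {
    tp.close();
  }
}
)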

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-payloads-in-Highlighter-tp3020885p3020922.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Getting payloads in Highlighter

2011-06-03 Thread Ahmet Arslan
> I need to highlight searched words in the original text
> (xml) of a document. 

Why don't you remove xml tags in an analyzer? You can highlight xml by doing so.
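
A sketch of what that could look like in schema.xml (the fieldType name and
token filters here are illustrative assumptions, not from this thread);
solr.HTMLStripCharFilterFactory removes HTML/XML-style tags before the
tokenizer runs, so term positions and highlighting line up with the tag-free
text:

    <fieldType name="text_noxml" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>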


Re: How to know how many documents are indexed? Anything more elegant than parsing numFound?

2011-06-03 Thread Gabriele Kahlout
$ curl --fail "http://192.168.34.51:8080/solr/admin/stats.jsp"; >> resp.xml
$ xmlstarlet sel -t -v "//@numDocs" resp.xml
*Extra content at the end of the document*
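
The "Extra content at the end of the document" message is most likely caused
by the ">>" append: resp.xml already holds the earlier select response, so
the file now contains two XML documents and xmlstarlet refuses to parse it.
A sketch with a truncating redirect instead (and an XPath adjusted on the
assumption that stats.jsp reports numDocs as a named <stat> element rather
than an attribute):

    $ curl --fail "http://192.168.34.51:8080/solr/admin/stats.jsp" > resp.xml
    $ xmlstarlet sel -t -v "//stat[@name='numDocs']" resp.xml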

On Fri, Jun 3, 2011 at 8:56 PM, Ahmet Arslan  wrote:

> : How to know how many documents are indexed? Anything more elegant than
> : parsing numFound?
> > $ curl "http://192.168.34.51:8080/solr/select?q=*%3A*&rows=0";
> > >> resp.xml
> > $ xmlstarlet sel -t -v "//@numFound" resp.xml
>
> solr/admin/stats.jsp is actually an xml too and contains numDocs and maxDoc
> info.
>
> I think you can get numDocs with jmx too.
> http://wiki.apache.org/solr/SolrJmx
>



-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: fq null pointer exception

2011-06-03 Thread dan whelan

It returned results when I added the fl param.

Strange... wonder what is going on there

Thanks,

Dan



On 6/3/11 12:17 PM, Otis Gospodnetic wrote:

And what happens if you add &fl=?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 

From: dan whelan
To: solr-user@lucene.apache.org
Sent: Fri, June 3, 2011 1:38:33 PM
Subject: Re: fq null pointer exception

Otis, I just deleted the documents and committed and I still get that error.

Thanks,

Dan


On 6/3/11 9:43 AM, Otis Gospodnetic wrote:

Dan, does the problem go away if you get rid of those 112 documents with
empty Status or replace their empty status value with, say, "Unknown"?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 

From: dan whelan
To: solr-user@lucene.apache.org
Sent: Fri, June 3, 2011 11:46:46 AM
Subject: fq null pointer exception

I am noticing something strange with our recent upgrade to solr 3.1 and
want to see if anyone has experienced anything similar.

I have a solr.StrField field named Status. The values are Enabled,
Disabled, or ''

When I facet on that field I get

Enabled 4409565
Disabled 29185
"" 112


The issue is when I do a filter query

This query works

select/?q=*:*&fq=Status:"Enabled"

But when I run this query I get a NPE

select/?q=*:*&fq=Status:"Disabled"


Here is part of the stack trace


Problem accessing /solr/global_accounts/select/. Reason:
    null

java.lang.NullPointerException
    at org.apache.solr.response.XMLWriter.writePrim(XMLWriter.java:828)
    at org.apache.solr.response.XMLWriter.writeStr(XMLWriter.java:686)
    at org.apache.solr.schema.StrField.write(StrField.java:49)
    at org.apache.solr.schema.SchemaField.write(SchemaField.java:125)
    at org.apache.solr.response.XMLWriter.writeDoc(XMLWriter.java:369)
    at org.apache.solr.response.XMLWriter$3.writeDocs(XMLWriter.java:545)
    at org.apache.solr.response.XMLWriter.writeDocuments(XMLWriter.java:482)
    at org.apache.solr.response.XMLWriter.writeDocList(XMLWriter.java:519)
    at org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:582)
    at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131)
    at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35)
...


Thanks,

Dan








Re: Solr performance tuning - disk i/o?

2011-06-03 Thread Erick Erickson
Quick impressions:

The faceting is usually best done on fields that don't have lots of unique
values for three reasons:
1> It's questionable how much use to the user to have a gazillion facets.
 In the case of a unique field per document, in fact, it's useless.
2> resource requirements go up as a function of the number of unique
 terms. This is true for faceting and sorting.
3> warmup times grow the more terms have to be read into memory.


Glancing at your warmup stuff, things like publishDate, authorStr and maybe
callnumber-first are questionable. publishDate depends on how coarse the
resolution is. If it's by day, that's not really much use. authorStr... How
many authors have more than one publication? Would this be better served by
some kind of autosuggest rather than facets? callnumber-first... I don't
really know, but if it's unique per document it's probably not something the
user would find useful as a facet.

The admin page will help you determine the number of unique terms per field,
which may guide you whether or not to continue to facet on these fields.
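
A quick way to check, assuming the default /admin/luke handler is registered
(the URL and field name are illustrative); the Luke request handler reports
per-field statistics, including the number of distinct terms:

    curl "http://localhost:8983/solr/admin/luke?fl=authorStr&numTerms=10"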

As Otis said, doing a sort on the fields during warmup will also help.

Watch your polling interval for any slaves in relation to the warmup times.
If your polling interval is shorter than the warmup times, you run a risk of
"runaway warmups".

As you've figured out, measuring responses to the first few queries doesn't
always measure what you really need...

I don't have the pages handy, but autowarming is a good topic to understand,
so you might spend some time tracking it down.

Best
Erick

On Fri, Jun 3, 2011 at 11:21 AM, Demian Katz  wrote:
> Thanks to you and Otis for the suggestions!  Some more information:
>
> - Based on the Solr stats page, my caches seem to be working pretty well (few 
> or no evictions, hit rates in the 75-80% range).
> - VuFind is actually doing two Solr queries per search (one initial search 
> followed by a supplemental spell check search -- I believe this is necessary 
> because VuFind has two separate spelling indexes, one for shingled terms and 
> one for single words).  That is probably exaggerating the problem, though 
> based on searches with debugQuery on, it looks like it's always the initial 
> search (rather than the supplemental spelling search) that's consuming the 
> bulk of the time.
> - enableLazyFieldLoading is set to true.
> - I'm retrieving 20 documents per page.
> - My JVM settings: -server -Xloggc:/usr/local/vufind/solr/jetty/logs/gc.log 
> -Xms4096m -Xmx4096m -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:NewRatio=5
>
> It appears that a large portion of my problem had to do with autowarming, a 
> topic that I've never had a strong grasp on, though perhaps I'm finally 
> learning (any recommended primer links would be welcome!).  I did have some 
> autowarming settings in solrconfig.xml (an arbitrary search for a bunch of 
> random keywords in the newSearcher and firstSearcher events, plus 
> autowarmCount settings on all of my caches).  However, when I looked at the 
> debugQuery output, I noticed that a huge amount of time was being wasted 
> loading facets on the first search after restarting Solr, so I changed my 
> newSearcher and firstSearcher events to this:
>
>      <arr name="queries">
>        <lst>
>          <str name="q">*:*</str>
>          <str name="start">0</str>
>          <str name="rows">10</str>
>          <str name="facet">true</str>
>          <str name="facet.mincount">1</str>
>          <str name="facet.field">collection</str>
>          <str name="facet.field">format</str>
>          <str name="facet.field">publishDate</str>
>          <str name="facet.field">callnumber-first</str>
>          <str name="facet.field">topic_facet</str>
>          <str name="facet.field">authorStr</str>
>          <str name="facet.field">language</str>
>          <str name="facet.field">genre_facet</str>
>          <str name="facet.field">era_facet</str>
>          <str name="facet.field">geographic_facet</str>
>        </lst>
>      </arr>
>
> Overall performance has now increased dramatically, and now the biggest 
> bottleneck in the debug output seems to be the shingle spell checking!
>
> Any other suggestions are welcome, since I suspect there's still room to 
> squeeze more performance out of the system, and I'm still not sure I'm making 
> the most of autowarming...  but this seems like a big step in the right 
> direction.  Thanks again for the help!
>
> - Demian
>
>> -Original Message-
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Friday, June 03, 2011 9:41 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr performance tuning - disk i/o?
>>
>> This doesn't seem right. Here's a couple of things to try:
>> 1> attach &debugQuery=on to your long-running queries. The QTime returned
>>    is the time taken to search, NOT including the time to load the docs.
>>    That'll help pinpoint whether the problem is the search itself, or
>>    assembling the documents.
>> 2> Are you autowarming? If so, be sure it's actually done before querying.
>> 3> Measure queries after the first few, particularly if you're sorting or
>>    faceting.
>> 4> What are your JVM settings? How much memory do you have?
>> 5> is <enableLazyFieldLoading> set to true in your solrconfig.xml?
>> 6> How many docs are you returning?
>>
>>
>> There's more, but that'll do for a start. Let us know if you gather more
>> data and it'

Re: Better to have lots of smaller cores or one really big core?

2011-06-03 Thread Erick Erickson
Nope, a core is just a self-contained index, really.

What is the point of breaking them up? If you have some kind
of rolling currency (i.e. you only want to keep the last N days/weeks/months)
then you can always delete-by-query to age-out the relevant docs.
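
A sketch of such an age-out, assuming a date field named indexed_date (both
the field name and the 30-day cutoff are illustrative):

    curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" \
      --data-binary "<delete><query>indexed_date:[* TO NOW-30DAYS]</query></delete>"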

You'll be able to fit more on one server if it's in a single core, but what the
ratio is I'm not sure.

My take would be go for the simplest, which would be a single core (index)
for administrative purposes if for no other reason, but that may well just be
personal preference...

Best
Erick

On Fri, Jun 3, 2011 at 10:10 AM, JohnRodey  wrote:
> Thanks Erick for the response.
>
> So my data structure is the same, i.e. they all use the same schema.  Though
> I think it makes sense for us to somehow break apart the data, for example
> by the date it was indexed.  I'm just trying to get a feel for how large we
> should aim to keep those (by day, by week, by month, etc...).
>
> So it sounds like we should aim to keep them at a size that one solr server
> can host to avoid serving multiple cores.
>
> One question, there is no real difference (other than configuration) from a
> server hosting its own index vs. it hosting one core, is there?
>
> Thanks!
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Better-to-have-lots-of-smaller-cores-or-one-really-big-core-tp3017973p3019686.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Getting payloads in Highlighter

2011-06-03 Thread lboutros
The original document is not indexed. Currently it is just stored, and it
could be stored in a filesystem or a database in the future.

The different parts of a document are indexed in multiple different fields
with different analyzers (stemming, multiple languages, regex, ...).

So, I don't think your solution can be applied, but if I'm wrong, could you
please explain how?

Thanks,

Ludovic.


-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-payloads-in-Highlighter-tp3020885p3021383.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Feature: skipping caches and info about cache use

2011-06-03 Thread Robert Petersen
Why, I'm just wondering?

For a case where you know the next query could not possibly be in the cache
already because it is so different from the norm?

Just for timing information, for instrumentation used in tuning (i.e. so you
can compare cached response times vs. non-cached response times)?


-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Friday, June 03, 2011 10:02 AM
To: solr-user@lucene.apache.org
Subject: Feature: skipping caches and info about cache use

Hi,

Is it just me, or would others like things like:
* The ability to tell Solr (by passing some URL param?) to skip one or more
of its caches and get data from the index
* An additional attrib in the Solr response that shows whether the query
came from the cache or not

* Maybe something else along these lines?

Or maybe some of this is already there and I just don't know about it?
:)

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



Re: Feature: skipping caches and info about cache use

2011-06-03 Thread Yonik Seeley
On Fri, Jun 3, 2011 at 1:02 PM, Otis Gospodnetic
 wrote:
> Is it just me, or would others like things like:
> * The ability to tell Solr (by passing some URL param?) to skip one or more of
> its caches and get data from the index

Yeah, we've needed this for a long time, and I believe there's a JIRA
issue open for it.
It really needs to be on a per query basis though... so a localParam
that has cache=true/false
would be ideal.
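
A sketch of how that could read once implemented (hypothetical syntax at the
time of this thread, not a released feature):

    select?q=*:*&fq={!cache=false}Status:"Disabled"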


-Yonik
http://www.lucidimagination.com


Re: fq null pointer exception

2011-06-03 Thread Yonik Seeley
Dan, this doesn't really have anything to do with your filter on the
Status field except that it causes different documents to be selected.
The root cause is a schema mismatch with your index.
A string field (or so the schema is saying it's a string field) is
returning "null" for a value, which is impossible (null values aren't
stored... they are simply missing).
This can happen when the field is actually stored as binary (as is the
case for numeric fields).  So my guess is that a field that was
previously a numeric field is now declared to be of type string by the
current schema.

You can try varying the "fl" parameter to see what field is causing
the issue, or try luke or the luke request handler for a lower-level
view of the index.
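
A sketch of that bisection, assuming a handful of stored fields (field names
beyond Status are illustrative): start from a single field and widen fl until
the NPE reappears; the last field added is the suspect.

    select/?q=*:*&fq=Status:"Disabled"&fl=id
    select/?q=*:*&fq=Status:"Disabled"&fl=id,Status
    select/?q=*:*&fq=Status:"Disabled"&fl=id,Status,created_date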

-Yonik
http://www.lucidimagination.com



On Fri, Jun 3, 2011 at 11:46 AM, dan whelan  wrote:
> I am noticing something strange with our recent upgrade to solr 3.1 and want
> to see if anyone has experienced anything similar.
>
> I have a solr.StrField field named Status. The values are Enabled, Disabled,
> or ''
>
> When I facet on that field I get
>
> Enabled 4409565
> Disabled 29185
> "" 112
>
>
> The issue is when I do a filter query
>
> This query works
>
> select/?q=*:*&fq=Status:"Enabled"
>
> But when I run this query I get a NPE
>
> select/?q=*:*&fq=Status:"Disabled"
>
>
> Here is part of the stack trace
>
>
> Problem accessing /solr/global_accounts/select/. Reason:
>    null
>
> java.lang.NullPointerException
>    at org.apache.solr.response.XMLWriter.writePrim(XMLWriter.java:828)
>    at org.apache.solr.response.XMLWriter.writeStr(XMLWriter.java:686)
>    at org.apache.solr.schema.StrField.write(StrField.java:49)
>    at org.apache.solr.schema.SchemaField.write(SchemaField.java:125)
>    at org.apache.solr.response.XMLWriter.writeDoc(XMLWriter.java:369)
>    at org.apache.solr.response.XMLWriter$3.writeDocs(XMLWriter.java:545)
>    at org.apache.solr.response.XMLWriter.writeDocuments(XMLWriter.java:482)
>    at org.apache.solr.response.XMLWriter.writeDocList(XMLWriter.java:519)
>    at org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:582)
>    at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131)
>    at
> org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35)
> ...
>
>
> Thanks,
>
> Dan
>
>


Re: fq null pointer exception

2011-06-03 Thread Otis Gospodnetic
Right, so now try adding different fields and see which one breaks it again.  
Then you know which field is a problem and you can dig deeper around that field.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: dan whelan 
> To: solr-user@lucene.apache.org
> Sent: Fri, June 3, 2011 4:34:40 PM
> Subject: Re: fq null pointer exception
> 
> It returned results when I added the fl param.
> 
> Strange... wonder what is going on there
> 
> Thanks,
> 
> Dan
> 
> 
> 
> On 6/3/11 12:17 PM, Otis Gospodnetic wrote:
> > And what happens if you add &fl=?
> >
> > Otis
> > 
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> >
> >
> > - Original Message 
> >> From: dan whelan
> >> To: solr-user@lucene.apache.org
> >> Sent: Fri, June 3, 2011 1:38:33 PM
> >> Subject: Re: fq null pointer exception
> >>
> >> Otis, I just deleted the documents and committed and I still get that
> >> error.
> >>
> >> Thanks,
> >>
> >> Dan
> >>
> >>
> >> On 6/3/11 9:43 AM, Otis Gospodnetic wrote:
> >>> Dan, does the problem go away if you get rid of those 112 documents
> >>> with empty Status or replace their empty status value with, say,
> >>> "Unknown"?
> >>>
> >>> Otis
> >>> 
> >>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> >>> Lucene ecosystem search :: http://search-lucene.com/
> >>>
> >>>
> >>>
> >>> - Original Message 
> >>>> From: dan whelan
> >>>> To: solr-user@lucene.apache.org
> >>>> Sent: Fri, June 3, 2011 11:46:46 AM
> >>>> Subject: fq null pointer exception
> >>>>
> >>>> I am noticing something strange with our recent upgrade to solr 3.1
> >>>> and want to see if anyone has experienced anything similar.
> >>>>
> >>>> I have a solr.StrField field named Status. The values are Enabled,
> >>>> Disabled, or ''
> >>>>
> >>>> When I facet on that field I get
> >>>>
> >>>> Enabled 4409565
> >>>> Disabled 29185
> >>>> "" 112
> >>>>
> >>>>
> >>>> The issue is when I do a filter query
> >>>>
> >>>> This query works
> >>>>
> >>>> select/?q=*:*&fq=Status:"Enabled"
> >>>>
> >>>> But when I run this query I get a NPE
> >>>>
> >>>> select/?q=*:*&fq=Status:"Disabled"
> >>>>
> >>>>
> >>>> Here is part of the stack trace
> >>>>
> >>>>
> >>>> Problem accessing /solr/global_accounts/select/. Reason:
> >>>>     null
> >>>>
> >>>> java.lang.NullPointerException
> >>>>     at org.apache.solr.response.XMLWriter.writePrim(XMLWriter.java:828)
> >>>>     at org.apache.solr.response.XMLWriter.writeStr(XMLWriter.java:686)
> >>>>     at org.apache.solr.schema.StrField.write(StrField.java:49)
> >>>>     at org.apache.solr.schema.SchemaField.write(SchemaField.java:125)
> >>>>     at org.apache.solr.response.XMLWriter.writeDoc(XMLWriter.java:369)
> >>>>     at org.apache.solr.response.XMLWriter$3.writeDocs(XMLWriter.java:545)
> >>>>     at org.apache.solr.response.XMLWriter.writeDocuments(XMLWriter.java:482)
> >>>>     at org.apache.solr.response.XMLWriter.writeDocList(XMLWriter.java:519)
> >>>>     at org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:582)
> >>>>     at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131)
> >>>>     at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35)
> >>>> ...
> >>>>
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Dan
> >>>>
> >>>>
> >>
> 


Re: Feature: skipping caches and info about cache use

2011-06-03 Thread Otis Gospodnetic
Robert,

Mainly so that you can tell how fast the search itself is when query or 
documents or filters are not cached.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Robert Petersen 
> To: solr-user@lucene.apache.org
> Sent: Fri, June 3, 2011 5:58:43 PM
> Subject: RE: Feature: skipping caches and info about cache use
> 
> Why, I'm just wondering?
> 
> For a case where you know the next query could not possibly be in the cache
> already because it is so different from the norm?
> 
> Just for timing information, for instrumentation used in tuning (i.e. so you
> can compare cached response times vs. non-cached response times)?
> 
> 
> -Original Message-
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
> Sent: Friday, June 03, 2011 10:02 AM
> To: solr-user@lucene.apache.org
> Subject: Feature: skipping caches and info about cache use
> 
> Hi,
> 
> Is it just me, or would others like things like:
> * The ability to tell Solr (by passing some URL param?) to skip one or more
> of its caches and get data from the index
> * An additional attrib in the Solr response that shows whether the query
> came from the cache or not
> 
> * Maybe something else along these lines?
> 
> Or maybe some of this is already there and I just don't know about it?
> :)
> 
> Thanks,
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 


Re: How to disable QueryElevationComponent

2011-06-03 Thread Otis Gospodnetic
Romi,

If you don't have a unique ID field, you can always create a UUID - see 
http://search-lucene.com/?q=uuid&fc_type=javadoc
If you don't want to use QEC, remove it from the list of components in 
solrconfig.xml
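
A minimal schema.xml sketch of the UUID route (the names are illustrative;
solr.UUIDField with default="NEW" generates a fresh value at index time, so
documents need not supply one):

    <fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
    <field name="id" type="uuid" indexed="true" stored="true" default="NEW"/>
    <uniqueKey>id</uniqueKey>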

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Romi 
> To: solr-user@lucene.apache.org
> Sent: Fri, May 27, 2011 5:36:22 AM
> Subject: How to disable QueryElevationComponent
> 
> Hi, in my indexed document I do not want a uniqueKey field, but when I do not
> give any uniqueKey in schema.xml then it shows an exception:
> org.apache.solr.common.SolrException: QueryElevationComponent requires the
> schema to have a uniqueKeyField.
> It means QueryElevationComponent requires a uniqueKey field. Then how can I
> disable this QueryElevationComponent? Please reply.
> 
> -
> Thanks &  Regards
> Romi
> --
> View this message in context: 
>http://lucene.472066.n3.nabble.com/How-to-disable-QueryElevationComponent-tp2992195p2992195.html
>
> Sent  from the Solr - User mailing list archive at Nabble.com.
> 


Re: Nutch Crawl error

2011-06-03 Thread Otis Gospodnetic
Roger, wrong list.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Roger Shah 
> To: "solr-user@lucene.apache.org" 
> Sent: Thu, May 26, 2011 3:06:15 PM
> Subject: Nutch Crawl error
> 
> I ran the command bin/nutch crawl urls -dir crawl -depth 3 >& crawl.log
> 
> When I viewed crawl.log I found some errors such as:
> 
> Can't retrieve Tika parser for mime-type application/x-shockwave-flash, and
> some other similar messages for other types such as application/xml, etc.
> 
> Do I need to download Tika for these errors to go away? Where can I download
> Tika so that it can work with Nutch? If there are instructions to install
> Tika to work with Nutch please send them to me.
> 
> Thanks,
> Roger
> 


found a bug in query parser upgrading from 1.4.1 to 3.1

2011-06-03 Thread Jason Toy
Greetings all, I found a bug today while trying to upgrade from 1.4.1 to 3.1.

In 1.4.1 I was able to insert this doc:
User
14914457UserSan
Franciscojtoyjtoylife
hacker0.05


And then I can run the query:

http://localhost:8983/solr/select?q=life&qf=description_text&defType=dismax&sort=scores:rails_f+desc

and I will get results.

If I insert the same document into solr 3.1 and run the same query I get the
error:

Problem accessing /solr/select. Reason:

undefined field scores

For some reason, solr has cut off the column name from the colon
forward, so "scores:rails_f" becomes "scores".

I can see in the lucene index that the data for scores:rails_f is in
the document. For that reason I believe the bug is in solr and not in
lucene.
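
One possible workaround, offered as an assumption rather than a confirmed
fix: avoid ':' in field names, since 3.1 appears to stop parsing the sort
field at the colon. For example, index the score under an underscore name
and sort on that:

    http://localhost:8983/solr/select?q=life&qf=description_text&defType=dismax&sort=scores_rails_f+desc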



Jason Toy
socmetrics
http://socmetrics.com
@jtoy


Re: Solr Indexing Patterns

2011-06-03 Thread Judioo
Hi,
Discounts can change daily. Also there can be a lot of them (over time and
in a given time period).

Could you give an example of what you mean by multi-valuing the field?

Thanks

On 3 June 2011 14:29, Erick Erickson  wrote:

> How often are the discounts changed? Because you can simply
> re-index the book information with a multiValued "discounts" field
> and get something similar to your example (&wt=json)
>
>
> Best
> Erick
>
> On Fri, Jun 3, 2011 at 8:38 AM, Judioo  wrote:
> > What is the "best practice" method to index the following in Solr:
> >
> > I'm attempting to use solr for a book store site.
> >
> > Each book will have a price but on occasions this will be discounted. The
> > discounted price exists for a defined time period but there may be many
> > discount periods. Each discount will have a brief synopsis, start and end
> > time.
> >
> > A subset of the desired output would be as follows:
> >
> > ...
> > "response":{"numFound":1,"start":0,"docs":[
> >  {
> >"name":"The Book",
> >"price":"$9.99",
> >"discounts":[
> >{
> > "price":"$3.00",
> > "synopsis":"thanksgiving special",
> > "starts":"11-24-2011",
> > "ends":"11-25-2011",
> >},
> >{
> > "price":"$4.00",
> > "synopsis":"Canadian thanksgiving special",
> > "starts":"10-10-2011",
> > "ends":"10-11-2011",
> >},
> > ]
> >  },
> >  .
> >
> > A requirement is to be able to search for just discounted publications. I
> > think I could use date faceting for this (return publications that are
> > within a discount window). When a discount search is performed no
> > publications that are not currently discounted will be returned.
> >
> > My question are:
> >
> >   - Does solr support this type of sub documents
> >
> > In the above example the discounts are the sub documents. I know solr is
> > not a relational DB but I would like to store and index the above
> > representation in a single document if possible.
> >
> >   - what is the best method to approach the above
> >
> > I can see in many examples the authors tend to denormalize to solve
> > similar problems. This suggests that for each discount I am required to
> > duplicate the book data or form a document association.
> > Which method would you advise?
> >
> > It would be nice if solr could return a response structured as above.
> >
> > Much Thanks
> >
>
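
For what it's worth, a common flattened sketch of this kind of modelling
(field names and types are illustrative): keep each book as one document,
put every discount window into parallel multiValued fields, and filter on
the date range to find currently discounted titles:

    <field name="discount_price"    type="tfloat" indexed="true"  stored="true" multiValued="true"/>
    <field name="discount_synopsis" type="string" indexed="false" stored="true" multiValued="true"/>
    <field name="discount_starts"   type="tdate"  indexed="true"  stored="true" multiValued="true"/>
    <field name="discount_ends"     type="tdate"  indexed="true"  stored="true" multiValued="true"/>

    ...&fq=discount_starts:[* TO NOW]&fq=discount_ends:[NOW TO *]

One caveat with parallel multiValued fields: Solr can tell you that some
start precedes NOW and some end follows it, but not that they belong to the
same discount, so a book with one past and one future discount can match
spuriously; the client has to re-associate the values from the stored fields
(or fall back to one document per discount).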