Re: How to improve this solr query?

2012-07-04 Thread Amit Nithian
Couple questions:
1) Why are you explicitly telling solr to sort by score desc,
shouldn't it do that for you? Could this be a source of performance
problems since sorting requires the loading of the field caches?
2) Of the query parameters, q1 and q2, which one is actually doing
"text" searching on your index? It looks like q1 is doing non-string
related stuff, could this be better handled in either the bf or bq
section of the edismax config? Looking at the sample though I don't
understand how q1=apartment would hit non-string fields again (but see
#3)
3) Are the "string" fields literally of string type (i.e. no analysis
on the field), or are you using "string" loosely to mean a "text" field?
pf ==> phrase fields ==> given a multi-word query, boosts documents in
which the query terms appear as a phrase in the specified fields, within
some slop (the query "hello world" may match "hello my world" depending on
this slop value). The "qf" means that given a multi-term query, each term
is matched against the specified fields (name, description, whatever text
fields you want).
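For context, the dismax/edismax knobs mentioned above (qf, pf, bq/bf) live in the request handler defaults in solrconfig.xml. A hedged sketch only; the field names and boost values below are illustrative, not taken from the original thread:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- qf: each query term is matched against these fields -->
    <str name="qf">name description</str>
    <!-- pf: boost docs where the whole query appears as a phrase -->
    <str name="pf">name^10 description^5</str>
    <!-- ps: phrase slop, how far apart the pf terms may be -->
    <str name="ps">2</str>
    <!-- bq: a boost query for non-text signals, per point #2 above -->
    <str name="bq">has_image:true^1.5</str>
  </lst>
</requestHandler>
```

Moving the non-text clauses into bq/bf this way keeps the main q focused on text relevance.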

Best
Amit

On Mon, Jul 2, 2012 at 9:35 AM, Chamnap Chhorn  wrote:
> Hi all,
>
> I'm using solr 3.5 with nested queries on a 4-core CPU server with 17 GB of RAM. The
> problem is that my query is so slow; the average response time is 12 secs
> against 13 million documents.
>
> What I am doing is to send quoted string (q2) to string fields and
> non-quoted string (q1) to other fields and combine the result together.
>
> facet=true&sort=score+desc&q2="apartment"&facet.mincount=1&q1=apartment
> &tie=0.1&q.alt=*:*&wt=json&version=2.2&rows=20&fl=uuid&facet.query=has_map:+true&facet.query=has_image:+true&facet.query=has_website:+true&start=0&q=
> _query_:+"{!dismax+qf='.'+fq='..'+v=$q1}"+OR+_query_:+"{!dismax+qf='..'+fq='...'+v=$q2}"
> &facet.field={!ex%3Ddt}sub_category_uuids&facet.field={!ex%3Ddt}location_uuid
>
> I have done a solr optimize already, but it's still slow. Any idea how to
> improve the speed? Have I done anything wrong?
>
> --
> Chhorn Chamnap
> http://chamnap.github.com/


Use of Solr as primary store for search engine

2012-07-04 Thread Amit Nithian
Hello all,

I am curious to know how people are using Solr in conjunction with
other data stores when building search engines to power web sites (say
an ecommerce site). The question I have for the group is given an
architecture where the primary (transactional) data store is MySQL
(Oracle, PostGres whatever) with periodic indexing into Solr, when
your front end issues a search query to Solr and returns results, are
there any joins with your primary Oracle/MySQL etc to help render
results?

Basically I guess my question is whether or not you store enough in
Solr so that when your front end renders the results page, it never
has to hit the database. The other option is that your search engine
only returns primary keys that your front end then uses to hit the DB
to fetch data to display to your end user.

With Solr 4.0 and Solr moving towards the NoSQL direction, I am
curious what people are doing and what application architectures with
Solr look like.

Thanks!
Amit


Re: Use of Solr as primary store for search engine

2012-07-04 Thread Paul Libbrecht
Amit,

not exactly a response to your question, but doing this with a Lucene index on
i2geo.net resulted in a considerable performance boost (reading from
stored fields instead of reading from the xwiki objects, which pull from the SQL
database). However, it meant we had to rewrite everything necessary for
the rendering, so the rendering could not reuse much code.

Paul


Le 4 juil. 2012 à 09:54, Amit Nithian a écrit :

> Hello all,
> 
> I am curious to know how people are using Solr in conjunction with
> other data stores when building search engines to power web sites (say
> an ecommerce site). The question I have for the group is given an
> architecture where the primary (transactional) data store is MySQL
> (Oracle, PostGres whatever) with periodic indexing into Solr, when
> your front end issues a search query to Solr and returns results, are
> there any joins with your primary Oracle/MySQL etc to help render
> results?
> 
> Basically I guess my question is whether or not you store enough in
> Solr so that when your front end renders the results page, it never
> has to hit the database. The other option is that your search engine
> only returns primary keys that your front end then uses to hit the DB
> to fetch data to display to your end user.
> 
> With Solr 4.0 and Solr moving towards the NoSQL direction, I am
> curious what people are doing and what application architectures with
> Solr look like.
> 
> Thanks!
> Amit



Re: Something like 'bf' or 'bq' with MoreLikeThis

2012-07-04 Thread nanshi
Thanks a lot, Amit! Please bear with me, I am a new Solr dev. Could you
please shed some light on how to use a patch? Pointing me to a wiki/doc is
fine too. Thanks a lot! :)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Something-like-bf-or-bq-with-MoreLikeThis-tp3989060p3992935.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to change tmp directory

2012-07-04 Thread Erik Fäßler
Hello all,

I came across an odd issue today when I wanted to add ca. 7M documents to my 
Solr index: I got a SolrServerException telling me "No space left on device". I 
had a look at the directory Solr (and its index) is installed in and there is 
plenty of space (~300GB).
I then noticed a file named "upload_457ee97b_1385125274b__8000_0005.tmp" 
had taken up all space of the machine's /tmp directory. The partition holding 
the /tmp directory only has around 1GB of space and this file already took 
nearly 800MB. I had a look at it and I realized that the file contained the 
data I was adding to Solr in an XML format.

Is there a possibility to change the temporary directory for this action?

I use an Iterator<SolrInputDocument> with HttpSolrServer's
add(Iterator<SolrInputDocument>) method for performance, so I can't just do
commits from time to time.

Best regards,

Erik

Solr: MLT filter by a field in matched doc

2012-07-04 Thread nanshi
MoreLikeThis can return the matched doc. My question is: can I somehow
pass in a query param to indicate that I would like to filter on a field
value of the matched doc? Is this doable? Or, if not doable, what's the
workaround? Thanks a lot!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-MLT-filter-by-a-field-in-matched-doc-tp3992945.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: "Similarity" of numbers in MoreLikeThisHandler

2012-07-04 Thread nanshi
Very well explained. However, you don't know the number (integer/float) field
value of a matched document in advance. So even supposing the Similarity field
is constructed, how do you use it in the query?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Similarity-of-numbers-in-MoreLikeThisHandler-tp486350p3992949.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Synonyms and hyphens

2012-07-04 Thread Alireza Salimi
Hi,

Does anybody know why hyphen '-' and q.op=AND causes such a big difference
between the two queries? I thought hyphens are removed by StandardTokenizer
which means theoretically the two queries should be the same!

Thanks

On Tue, Jul 3, 2012 at 4:05 PM, Alireza Salimi wrote:

> Hi,
>
> I'm not sure if anybody has experienced this behavior before or not.
> I noticed that 'hyphen' plays a very important role here.
> I used Solr's default example directory.
>
> http://localhost:8983/solr/select/?q=name:(gb-mb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND
> results in  "parsedquery":"+name:gb +name:gib +name:gigabyte
> +name:gigabytes +name:mb +name:mib +name:megabyte +name:megabytes",
>
> While searching 
> http://localhost:8984/solr/select/?q=name:(gbmb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND
> results in "parsedquery":"+(name:gb name:gib name:gigabyte
> name:gigabytes) +(name:mb name:mib name:megabyte name:megabytes)",
>
> If you look at the first query - with hyphens - you can see that the
> result of
> parsing is totally different. I know that hyphens are special characters
> in Solr,
> but there's no way that the first query returns any entry because it's
> asking for
> ALL synonyms.
>
> Am I missing something here?
>
> Thanks
>
>
> --
> Alireza Salimi
> Java EE Developer
>
>
>


-- 
Alireza Salimi
Java EE Developer


Re: how Solr/Lucene can support standard join operation

2012-07-04 Thread Mikhail Khludnev
FYI,

If denormalization doesn't work for you, check index time join
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html.
here is the scratch for query and index time support:
https://issues.apache.org/jira/browse/SOLR-3076
https://issues.apache.org/jira/browse/SOLR-3535

On Wed, Jun 27, 2012 at 3:47 PM, Lee Carroll
wrote:

> Sorry you have that link! and I did not see the question - apols
>
> index schema could look something like:
>
> id
> name
> classList -> multi value
> majorClassList -> multi value
>
> a standard query would do the equivalent of your sql
>
> again apols for not seeing the link
>
> lee c
>
>
>
> On 27 June 2012 12:37, Lee Carroll  wrote:
> > In your example de-normalising would be fine in a vast number of
> > use-cases. multi value fields are fine.
> >
> > If you really want to, see http://wiki.apache.org/solr/Join but make
> > sure you lose the default relational DBA world view first
> > and only go down that route if you need to.
> >
> >
> >
> > On 27 June 2012 12:27, Robert Yu  wrote:
> >> The ability of join operation supported as what
> http://wiki.apache.org/solr/Join says is so limited.
> >> I'm thinking how to support standard join operation in Solr/Lucene
> because not all can be de-normalized efficiently.
> >>
> >> Take 2 schemas below as an example:
> >>
> >> (1)Student
> >> sid
> >> name
> >> cid// class id
> >>
> >> (2)class
> >>
> >> cid
> >>
> >> name
> >>
> >> major
> >> In SQL, it is easy to get each student's name and class name
> where the student's name starts with 'p' and the class's major is "CS":
> >> SELECT s.name, c.name FROM student s, class c WHERE s.name LIKE
> >> 'p%' AND c.major = 'CS';
> >>
> >> How Solr/Lucene support the above query? It seems they do not.
> >>
> >> Thanks,
> >> 
> >> Robert Yu
> >> Application Service - Backend
> >> Morningstar Shenzhen Ltd.
> >> Morningstar. Illuminating investing worldwide.
> >>
> >> +86 755 3311-0223 voice
> >> +86 137-2377-0925 mobile
> >> +86 755 - fax
> >> robert...@morningstar.com
> >> 8FL, Tower A, Donghai International Center ( or East Pacific
> International Center)
> >> 7888 Shennan Road, Futian district,
> >> Shenzhen, Guangdong province, China 518040
> >>
> >> http://cn.morningstar.com
> >>
> >> This e-mail contains privileged and confidential information and is
> intended only for the use of the person(s) named above. Any dissemination,
> distribution, or duplication of this communication without prior written
> consent from Morningstar is strictly prohibited. If you have received this
> message in error, please contact the sender immediately and delete the
> materials from any computer.
> >>
>



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


Re: Solr 3.6 issue - DataImportHandler with CachedSqlEntityProcessor not importing all multi-valued fields

2012-07-04 Thread Mikhail Khludnev
It's hard to troubleshoot without debug logs. Please note that the regular
configuration for CachedSqlEntityProcessor is slightly different; see
http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
see

  where="xid=x.id"
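A minimal sketch of that pattern from the wiki page, with illustrative table names:

```xml
<entity name="x" query="SELECT * FROM x">
  <!-- the sub-entity is cached once and looked up via where="xid=x.id"
       instead of being re-queried for every parent row -->
  <entity name="y" query="SELECT * FROM y"
          processor="CachedSqlEntityProcessor"
          where="xid=x.id"/>
</entity>
```

The key difference from a plain SqlEntityProcessor sub-entity is that the lookup is expressed through the where attribute rather than a ${parent.field} placeholder in the query itself.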



On Wed, Jun 27, 2012 at 2:29 AM, ps_sra  wrote:

> Not sure if this is the right forum to post this question.  If not, please
> excuse.
>
> I'm trying to use the DataImportHandler with
> processor="CachedSqlEntityProcessor" to speed up import from an RDBMS.
> While
> processor="CachedSqlEntityProcessor" is much faster than
> processor="SqlEntityProcessor", the resulting Solr index does not contain
> multi-valued fields on sub-entities.
>
> So, for example, my db-data-config.xml has the following structure:
>
> <dataConfig>
> ..
> <document>
>   <entity name="foo"
>           processor="SqlEntityProcessor"
>           query="SELECT f.id   AS foo_id,
>                         f.name AS foo_name
>                  FROM   foo f">
>
>     <entity name="bar"
>             processor="CachedSqlEntityProcessor"
>             query="SELECT b.name AS bar_name
>                    FROM   bar b
>                    WHERE  b.id = '${foo.id}'"
>     />
>   </entity>
> </document>
> ..
> </dataConfig>
>
> where the database relationship foo:bar is 1:m.
>
> The issue is that when I import with processor="SqlEntityProcessor" ,
> everything works fine and the multi-valued field - "bar_name" has multiple
> values, while importing with processor="CachedSqlEntityProcessor" does not
> even create the "bar_name" field in the index.
>
> I've deployed Solr 3.6 on Weblogic 11g, with the patch
> https://issues.apache.org/jira/browse/SOLR-3360 applied.
>
> Any help on this issue is appreciated.
>
>
> Thanks,
> ps
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-3-6-issue-DataImportHandler-with-CachedSqlEntityProcessor-not-importing-all-multi-valued-fields-tp3991449.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


Re: Elevation togehter with grouping

2012-07-04 Thread tushar_k47
Hi,

I am facing an identical problem. Does anyone have any pointers on this ?

Regards,
Tushar

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Elevation-togehter-with-grouping-tp3916981p3992925.html
Sent from the Solr - User mailing list archive at Nabble.com.


WordDelimiterFilter removes ampersands

2012-07-04 Thread Stephen Lacy
If a user writes a query "Apples & Oranges", the word delimiter filter
factory will change this into "Apples Oranges",
which isn't very useful for me, as I'd prefer, especially when the phrase is
wrapped in quotes, that the original be preserved.
However, I still want to be able to separate Apples&Oranges into Apples &
Oranges, so preserveOriginal isn't really useful.
What I really would like is to tell WordDelimiterFilter to
treat "&" as neither alpha nor numeric, without
removing it completely.

Thanks for your help
Stephen


Get all matching terms of an OR query

2012-07-04 Thread Michael Jakl
Hi,
is there an easy way to get the matches of an OR query?

If I'm searching for "android OR google OR apple OR iphone OR -ipod",
I'd like to know which of these terms document X contains.

I've been using debugQuery and tried to extract the info from the
explain information, unfortunately this is too slow and I'm having
troubles with the stemming of the query.

Using the highlight component doesn't work either because my fields
aren't stored (would the highlighter work with stemmed texts?)

We're using Solr 3.6 in a distributed setting. I'd like to prevent
storing the texts because of space issues, but if that's the only
reasonable solution... .

Thank you,
Michael


Re: WordDelimiterFilter removes ampersands

2012-07-04 Thread Jack Krupansky

That's a perfectly reasonable request. But, WDF doesn't have such a feature.

Maybe what is needed is a distinct "ampersand filter" that runs before WDF 
and detects ampersands that are likely shorthands for "and" and expands 
them. It would also need to be able to detect "AT&T" (capital letter before 
the &) and not expand it (and you can set up a character type table for WDF 
that treats "&" as a letter). A single "&" could also be expanded to "and" - 
that could also be done with the synonym filter, but that would not help you 
with the embedded "&" of "Apples&Oranges".


Maybe a simple character filter that always expands "&" to " and " would be 
good enough for a lot of common cases, as a rough approximation.


Maybe solr.PatternReplaceCharFilterFactory could be used to accomplish that. 
Match "&" and replace with " and ".
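A hedged sketch of that suggestion in schema.xml (the type name and WDF options are illustrative; note that "&" must be escaped as &amp; inside the XML):

```xml
<fieldType name="text_and" class="solr.TextField">
  <analyzer>
    <!-- char filters run on the raw text before tokenization:
         expand every "&" to " and " -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="&amp;" replacement=" and "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this, both "Apples & Oranges" and "Apples&Oranges" should analyze to the tokens "apples", "and", "oranges", since the replacement happens before the tokenizer or WDF ever see the ampersand.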


-- Jack Krupansky

-Original Message- 
From: Stephen Lacy

Sent: Wednesday, July 04, 2012 8:16 AM
To: solr-user@lucene.apache.org
Subject: WordDelimiterFilter removes ampersands

If a user writes a query "Apples & Oranges" the word delimiter filter
factory will change this into "Apples Oranges"
Which isn't very useful for me as I'd prefer especially when the phrase is
wrapped in quotes that the original is preserved.
However I still want to be able to separate Apples&Oranges into Apples &
Oranges so preserveOriginal isn't really useful.
What I really would like to be able to do is tell WordDelimiterFilter to
treat it like it's neither alpha nor numeric, however
that doesn't mean that you remove it completely.

Thanks for your help
Stephen 



Urgent: Partial Search not Working

2012-07-04 Thread jayakeerthi s
All,

I am using apache-solr-4.0.0-ALPHA and trying to configure the Partial
search on two fields.

The value inside the search field ProdSymbl is "M1.6X0.35 9P",
and I will have to get results if I search for M1.6 or X0.35 (a partial
of the search value).


I have tried using both NGramTokenizerFactory and solr.EdgeNGramFilterFactory
in the schema.xml.

[fieldType definition stripped by the mail archive]

Fields I have configured as:

[field definitions stripped by the mail archive]

Copy field as:

[copyField definition stripped by the mail archive]



Please let me know if I am missing anything. This is kind of an urgent
requirement that needs to be addressed as soon as possible; please help.
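Since the original schema fragments did not survive the mail archive, here is a hedged sketch of one common way to set up this kind of partial matching (all field names, type names, and gram sizes are illustrative). Note that EdgeNGramFilterFactory only indexes prefixes of each token, so matching an infix like "X0.35" inside "M1.6X0.35" would need NGramFilterFactory instead:

```xml
<fieldType name="text_partial" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index prefixes of each token: "m1", "m1.", "m1.6", ... -->
    <filter class="solr.EdgeNGramFilterFactory"
            minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <!-- query side stays un-grammed so a short query matches the grams -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="prodsymbl" type="string" indexed="true" stored="true"/>
<field name="prodsymbl_partial" type="text_partial"
       indexed="true" stored="false"/>
<copyField source="prodsymbl" dest="prodsymbl_partial"/>
```

Searching prodsymbl_partial for "m1.6" would then match the indexed grams of "M1.6X0.35" without wildcards.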


Thanks in advance,

Jay


Boosting the whole documents

2012-07-04 Thread Danilak Michal
Hi,

I have the following problem.
I would like to give a boost to the whole documents as I index them. I am
sending to solr xml in the form:



But it doesn't seem to alter the search scores in any way. I would expect
that to multiply the final search score by two, am I correct?
Probably I would need to alter schema.xml, but I found only information on
how to do that for specific fields (just put omitNorms=false into the field
tag). But what should I do, if I want to boost the whole document?

Note: by boosting a whole document I mean, that if document A has search
score 10.0 and document B has search score 15.0 and I give document A the
boost 2.0, when I index it, I would expect its search score to be 20.0.
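For reference, the update XML stripped from the message above is presumably an index-time document boost, which generally looks like the sketch below (field names illustrative). A caveat worth stating: index-time boosts are folded into field norms, so they influence the score but do not multiply it exactly, and they require norms to be enabled (omitNorms="false") on the queried fields:

```xml
<add>
  <!-- boost on <doc> is an index-time boost baked into norms;
       an approximation, not an exact x2 on the final score -->
  <doc boost="2.0">
    <field name="id">A</field>
    <field name="title">some title text</field>
  </doc>
</add>
```

If an exact, predictable multiplier is needed, a query-time boost (e.g. a boost function on a stored numeric field) is usually the more controllable option.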

Thanks in advance!

Michal Danilak


Solr facet multiple constraint

2012-07-04 Thread davidbougearel
Hi,

I'm trying to do a facet search on a multi-valued field and add a filter
query on it, and it doesn't work.
Could you please help me find my mistake ?

Here is my solr query :

facet=true,sort=publishingdate desc,facet.mincount=1,q=service:1 AND
publicationstatus:LIVE,facet.field={!ex=dt}user,wt=javabin,fq={!tag=dt}user:10,version=2

Thanks in advance for answers, David. 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-facet-multiple-constraint-tp3992974.html
Sent from the Solr - User mailing list archive at Nabble.com.


fl Parameter and Wildcards for Dynamic Fields

2012-07-04 Thread Josh Harness
I'm using SOLR 3.3 and would like to know how to return a list of dynamic
fields in my search results using a wildcard with the fl parameter. I found
SOLR-2444  but this
appears to be for SOLR 4.0. Am I correct in assuming this isn't doable yet?
Please note that I don't want to query the dynamic fields, I just need them
returned in the search results. Using fl=myDynamicField_* doesn't seem to
work.

Many Thanks!

Josh


Re: leap second bug

2012-07-04 Thread Michael Tsadikov
explanation of the cause:

https://lkml.org/lkml/2012/7/1/203

On Wed, Jul 4, 2012 at 1:48 AM, Óscar Marín Miró
wrote:

> So, this was the solution, sorry to post it so late, just in case it helps
> anyone:
>
> /etc/init.d/ntp stop; date; date `date +"%m%d%H%M%C%y.%S"`; date;
> /etc/init.d/ntp start
>
> And tomcat magically switched from 100% CPU to 0.5% :)
>
> From:
>
>
> https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/_I1_OfaL7QY
>
> [from Michael McCandless help on this thread]
>
> On Sun, Jul 1, 2012 at 6:15 PM, Jack Krupansky wrote:
>
> > Interesting:
> >
> > "
> > The sequence of dates of the UTC second markers will be:
> >
> > 2012 June 30, 23h 59m 59s
> > 2012 June 30, 23h 59m 60s
> > 2012 July 1, 0h 0m 0s
> > "
> >
> > See:
> > http://wwp.greenwichmeantime.com/info/leap-second.htm
> >
> > So, there were two consecutive second "markers" which were literally
> > distinct, but numerically identical.
> >
> > What "design pattern" for timing did Linux violate? In other words, what
> > lesson should we be learning to assure that we don't have a similar
> problem
> > at an application level on a future leap second?
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: Óscar Marín Miró
> > Sent: Sunday, July 01, 2012 11:02 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: leap second bug
> >
> >
> > Thanks Michael, nice information :)
> >
> > On Sun, Jul 1, 2012 at 5:29 PM, Michael McCandless <
> > luc...@mikemccandless.com> wrote:
> >
> >  Looks like this is a low-level Linux issue ... see Shay's email to the
> >> ElasticSearch list about it:
> >>
> >>
> >> https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/_I1_OfaL7QY
> >>
> >> Also see the comments here:
> >>
> >> http://news.ycombinator.com/item?id=4182642
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >> On Sun, Jul 1, 2012 at 8:08 AM, Óscar Marín Miró
> >>  wrote:
> >> > Hello Michael, thanks for the note :)
> >> >
> >> > I'm having a similar problem since yesterday, tomcats are wild on CPU
> >> [near
> >> > 100%]. Did your solr servers did not reply to index/query requests?
> >> >
> >> > Thanks :)
> >> >
> >> > On Sun, Jul 1, 2012 at 1:22 PM, Michael Tsadikov <
> >> mich...@myheritage.com
> >> >wrote:
> >> >
> >> >> Our solr servers went into GC hell, and became non-responsive on date
> >> >> change today.
> >> >>
> >> >> Restarting tomcats did not help.
> >> >>
> >> >> Rebooting the machine did.
> >> >>
> >> >>
> >> >>
> >> http://www.wired.com/wiredenterprise/2012/07/leap-second-bug-wreaks-havoc-with-java-linux/
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Whether it's science, technology, personal experience, true love,
> >> > astrology, or gut feelings, each of us has confidence in something
> that
> >> we
> >> > will never fully comprehend.
> >> >  --Roy H. William
> >>
> >>
> >
> >
> > --
> > Whether it's science, technology, personal experience, true love,
> > astrology, or gut feelings, each of us has confidence in something that
> we
> > will never fully comprehend.
> > --Roy H. William
> >
>
>
>
> --
> Whether it's science, technology, personal experience, true love,
> astrology, or gut feelings, each of us has confidence in something that we
> will never fully comprehend.
>  --Roy H. William
>


Re: fl Parameter and Wildcards for Dynamic Fields

2012-07-04 Thread Jack Krupansky
This appears to be the case. "*" is the only wildcard supported by "fl" 
before 4.0.


-- Jack Krupansky

-Original Message- 
From: Josh Harness

Sent: Wednesday, July 04, 2012 9:08 AM
To: solr-user@lucene.apache.org
Subject: fl Parameter and Wildcards for Dynamic Fields

I'm using SOLR 3.3 and would like to know how to return a list of dynamic
fields in my search results using a wildcard with the fl parameter. I found
SOLR-2444  but this
appears to be for SOLR 4.0. Am I correct in assuming this isn't doable yet?
Please note that I don't want to query the dynamic fields, I just need them
returned in the search results. Using fl=myDynamicField_* doesn't seem to
work.

Many Thanks!

Josh 



Re: Get all matching terms of an OR query

2012-07-04 Thread Jack Krupansky
First, "OR -ipod" needs to be written as "OR (*:* -ipod)" due to an ongoing 
deficiency in Lucene query parsing, but I wonder what you really think you 
are OR'ing in that clause - all documents that don't contain "ipod"? That 
seems odd. Maybe you really want to constrain the preceding query to exclude 
ipod? That would be:


(android OR google OR apple OR iphone) -ipod

-- Jack Krupansky

-Original Message- 
From: Michael Jakl

Sent: Wednesday, July 04, 2012 8:29 AM
To: solr-user@lucene.apache.org
Subject: Get all matching terms of an OR query

Hi,
is there an easy way to get the matches of an OR query?

If I'm searching for "android OR google OR apple OR iphone OR -ipod",
I'd like to know which of these terms document X contains.

I've been using debugQuery and tried to extract the info from the
explain information, unfortunately this is too slow and I'm having
troubles with the stemming of the query.

Using the highlight component doesn't work either because my fields
aren't stored (would the highlighter work with stemmed texts?)

We're using Solr 3.6 in a distributed setting. I'd like to prevent
storing the texts because of space issues, but if that's the only
reasonable solution... .

Thank you,
Michael 



Javadocs issue on Solr web site

2012-07-04 Thread Ken Krugler
Currently all Javadoc links seem to wind up pointing at the api-4_0_0-ALPHA 
versions - is that expected?

E.g. do a Google search on StreamingUpdateSolrServer. First hit is for 
"StreamingUpdateSolrServer (Solr 3.6.0 API)"

Follow that link, and you get a 404 for page 
http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html

-- Ken

--
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr






Re: Get all matching terms of an OR query

2012-07-04 Thread Michael Jakl
Hi!

On 4 July 2012 17:01, Jack Krupansky  wrote:
> First, "OR -ipod" needs to be written as "OR (*:* -ipod)" due to an ongoing
> deficiency in Lucene query parsing, but I wonder what you really think you
> are OR'ing in that clause - all documents that don't contain "ipod"? That
> seems odd. Maybe you really want to constrain the preceding query to exclude
> ipod? That would be:
>
> (android OR google OR apple OR iphone) -ipod

Thanks, the example was ill-chosen, the -ipod part shouldn't be there.

After some more tests and research, using the debugQuery method seems
the only viable solution(?)

Cheers,
Michael


Re: Get all matching terms of an OR query

2012-07-04 Thread Jack Krupansky
You could always do a custom search component, but all the same information 
(which terms matched) is in the debugQuery. For example, 
"queryWeight(text:the)" indicates that "the" appears in the document.


What exactly is it that is too slow?

Yes, you do have to accept that explain uses analyzed terms. I would note 
that you could try to correlate the "parsedquery" with the original query 
since the parsed query will contain stemmed terms.


It would be nice to have an optional search component or query parser option 
that returned the analyzed term for each query term.


But as things stand, I would suggest that you do your own "fuzzy match" 
between the debugQuery terms and your source terms. That may not be 100% 
accurate, but probably would cover most/many cases.


-- Jack Krupansky

-Original Message- 
From: Michael Jakl

Sent: Wednesday, July 04, 2012 10:09 AM
To: solr-user@lucene.apache.org
Subject: Re: Get all matching terms of an OR query

Hi!

On 4 July 2012 17:01, Jack Krupansky  wrote:
> First, "OR -ipod" needs to be written as "OR (*:* -ipod)" due to an ongoing
> deficiency in Lucene query parsing, but I wonder what you really think you
> are OR'ing in that clause - all documents that don't contain "ipod"? That
> seems odd. Maybe you really want to constrain the preceding query to exclude
> ipod? That would be:
>
> (android OR google OR apple OR iphone) -ipod


Thanks, the example was ill-chosen, the -ipod part shouldn't be there.

After some more tests and research, using the debugQuery method seems
the only viable solution(?)

Cheers,
Michael 



Re: WordDelimiterFilter removes ampersands

2012-07-04 Thread Stephen Lacy
solr.PatternReplaceCharFilterFactory is a brilliant idea, thanks so
much :)

On Wed, Jul 4, 2012 at 2:46 PM, Jack Krupansky wrote:

> That's a perfectly reasonable request. But, WDF doesn't have such a
> feature.
>
> Maybe what is needed is a distinct "ampersand filter" that runs before WDF
> and detects ampersands that are likely shorthands for "and" and expands
> them. It would also need to be able to detect "AT&T" (capital letter before
> the &) and not expand it (and you can set up a character type table for WDF
> that treats "&" as a letter. A single "&" could also be expanded to "and" -
> that could also be done with the synonym filter, but that would not help
> you with the embedded "&" of "Apples&Oranges".
>
> Maybe a simple character filter that always expands "&" to " and " would
> be good enough for a lot of common cases, as a rough approximation.
>
> Maybe solr.PatternReplaceCharFilterFactory could be used to
> accomplish that. Match "&" and replace with " and ".
>
> -- Jack Krupansky
>
> -Original Message- From: Stephen Lacy
> Sent: Wednesday, July 04, 2012 8:16 AM
> To: solr-user@lucene.apache.org
> Subject: WordDelimiterFilter removes ampersands
>
>
> If a user writes a query "Apples & Oranges" the word delimiter filter
> factory will change this into "Apples Oranges"
> Which isn't very useful for me as I'd prefer especially when the phrase is
> wrapped in quotes that the original is preserved.
> However I still want to be able to separate Apples&Oranges into Apples &
> Oranges so preserveOriginal isn't really useful.
> What I really would like to be able to do is tell WordDelimiterFilter to
> treat it like it's neither alpha nor numeric, however
> that doesn't mean that you remove it completely.
>
> Thanks for your help
> Stephen
>


Boosting the score of the whole documents

2012-07-04 Thread Danilak Michal
Hi guys,
I have the following problem.

I would like to give a boost to the whole documents as I index them. I am
sending to solr the xml in the form:



But it doesn't seem to alter the search scores in any way. I would expect
that to multiply the final search score by two, am I correct?
Probably I would need to alter schema.xml, but I found only information on
how to do that for specific fields (just put omitNorms=false into the field
tag). But what should I do, if I want to boost the whole document?

Note: by boosting a whole document I mean, that if document A has search
score 10.0 and document B has search score 15.0 and I give document A the
boost 2.0, when I index it, I would expect its search score to be 20.0.

Thanks in advance!


Re: How to change tmp directory

2012-07-04 Thread Jack Krupansky
Solr is probably simply using Java's temp directory, which you can redefine 
by setting the java.io.tmpdir system property on the java command line or 
using a system-specific environment variable.


-- Jack Krupansky

-Original Message- 
From: Erik Fäßler

Sent: Wednesday, July 04, 2012 3:56 AM
To: solr-user@lucene.apache.org
Subject: How to change tmp directory

Hello all,

I came across an odd issue today when I wanted to add ca. 7M documents to my 
Solr index: I got a SolrServerException telling me "No space left on 
device". I had a look at the directory Solr (and its index) is installed in 
and there is plenty space (~300GB).
I then noticed a file named "upload_457ee97b_1385125274b__8000_0005.tmp" 
had taken up all space of the machine's /tmp directory. The partition 
holding the /tmp directory only has around 1GB of space and this file 
already took nearly 800MB. I had a look at it and I realized that the file 
contained the data I was adding to Solr in an XML format.


Is there a possibility to change the temporary directory for this action?

I use an Iterator<SolrInputDocument> with HttpSolrServer's
add(Iterator<SolrInputDocument>) method for performance, so I can't just do
commits from time to time.


Best regards,

Erik 



Re: difference between stored="false" and stored="true" ?

2012-07-04 Thread Jack Krupansky
1. The "useless" combination of stored=false and indexed=false is useful to 
"ignore" fields. You might have input data which has fields that you have 
decided to ignore.


2. Stored fields take up memory for documents (fields) to be returned for 
search results in the Solr query response, so fewer stored fields is better 
for performance and memory usage.
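As a sketch, the three useful combinations look like this in schema.xml — field and type names here are illustrative examples, not from the thread:

```xml
<!-- Illustrative schema.xml field entries; names/types are examples. -->
<field name="body"    type="text_general" indexed="true"  stored="false"/> <!-- searchable, not returnable -->
<field name="url"     type="string"       indexed="false" stored="true"/>  <!-- returnable, not searchable -->
<field name="ignored" type="string"       indexed="false" stored="false"/> <!-- accepted in input but ignored -->
```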


-- Jack Krupansky

-Original Message- 
From: Amit Nithian

Sent: Wednesday, July 04, 2012 12:54 AM
To: solr-user@lucene.apache.org
Subject: Re: difference between stored="false" and stored="true" ?

So couple questions on this (comment first then question):
1) I guess you can't have four combinations b/c
index=false/stored=false has no meaning?
2) If you set fewer fields to stored=true, does this reduce the memory
footprint for the document cache? Or better yet, I can store more
documents in the cache possibly increasing my cache efficiency?

I read about the lazy loading of fields which seems like a good way to
maximize the cache and gain the advantage of storing data in Solr too.

Thanks
Amit

On Sat, Jun 30, 2012 at 11:01 AM, Giovanni Gherdovich
 wrote:

Thank you François and Jack for those explanations.

Cheers,
GGhh

2012/6/30 François Schiettecatte:

Giovanni

 means the data is stored in the index and [...]



2012/6/30 Jack Krupansky:
"indexed" and "stored" are independent [...] 




Re: Synonyms and hyphens

2012-07-04 Thread Jack Krupansky
Terms with embedded special characters are treated as phrases with spaces in 
place of the special characters. So, "gb-mb" is treated as if you had 
enclosed the term in quotes.


-- Jack Krupansky
-Original Message- 
From: Alireza Salimi

Sent: Wednesday, July 04, 2012 6:50 AM
To: solr-user@lucene.apache.org
Subject: Re: Synonyms and hyphens

Hi,

Does anybody know why hyphen '-' and q.op=AND causes such a big difference
between the two queries? I thought hyphens are removed by StandardTokenizer
which means theoretically the two queries should be the same!

Thanks

On Tue, Jul 3, 2012 at 4:05 PM, Alireza Salimi 
wrote:



Hi,

I'm not sure if anybody has experienced this behavior before or not.
I noticed that 'hyphen' plays a very important role here.
I used Solr's default example directory.

http://localhost:8983/solr/select/?q=name:(gb-mb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND
results in  "parsedquery":"+name:gb +name:gib +name:gigabyte
+name:gigabytes +name:mb +name:mib +name:megabyte +name:megabytes",

While searching 
http://localhost:8984/solr/select/?q=name:(gbmb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND

results in "parsedquery":"+(name:gb name:gib name:gigabyte
name:gigabytes) +(name:mb name:mib name:megabyte name:megabytes)",

If you look at the first query - with hyphens - you can see that the
results of
parsing is totally different. I know that hyphens are special characters
in Solr,
but there's no way that the first query returns any entry because it's
asking for
ALL synonyms.

Am I missing something here?

Thanks


--
Alireza Salimi
Java EE Developer






--
Alireza Salimi
Java EE Developer 



Re: Synonyms and hyphens

2012-07-04 Thread Alireza Salimi
Wow, I didn't know that. Is there a way to disable this feature? I mean, is
it something coming from the Analyzer?

On Wed, Jul 4, 2012 at 12:26 PM, Jack Krupansky wrote:

> Terms with embedded special characters are treated as phrases with spaces
> in place of the special characters. So, "gb-mb" is treated as if you had
> enclosed the term in quotes.
>
> -- Jack Krupansky
> -Original Message- From: Alireza Salimi
> Sent: Wednesday, July 04, 2012 6:50 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Synonyms and hyphens
>
>
> Hi,
>
> Does anybody know why hyphen '-' and q.op=AND causes such a big difference
> between the two queries? I thought hyphens are removed by StandardTokenizer
> which means theoretically the two queries should be the same!
>
> Thanks
>
> On Tue, Jul 3, 2012 at 4:05 PM, Alireza Salimi *
> *wrote:
>
>  Hi,
>>
>> I'm not sure if anybody has experienced this behavior before or not.
>> I noticed that 'hyphen' plays a very important role here.
>> I used Solr's default example directory.
>>
>> http://localhost:8983/solr/select/?q=name:(gb-mb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND
>> results in  "parsedquery":"+name:gb +name:gib +name:gigabyte
>> +name:gigabytes +name:mb +name:mib +name:megabyte +name:megabytes",
>>
>> While searching http://localhost:8984/solr/select/?q=name:(gbmb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND
>> results in "parsedquery":"+(name:gb name:gib name:gigabyte
>> name:gigabytes) +(name:mb name:mib name:megabyte name:megabytes)",
>>
>> If you notice to the first query - with hyphens - you can see that the
>> results of
>> parsing is totally different. I know that hyphens are special characters
>> in Solr,
>> but there's no way that the first query returns any entry because it's
>> asking for
>> ALL synonyms.
>>
>> Am I missing something here?
>>
>> Thanks
>>
>>
>> --
>> Alireza Salimi
>> Java EE Developer
>>
>>
>>
>>
>
> --
> Alireza Salimi
> Java EE Developer
>



-- 
Alireza Salimi
Java EE Developer


Re: Boosting the score of the whole documents

2012-07-04 Thread Jack Krupansky
Make sure to review the "similarity" javadoc page to understand what any of 
these factors does to the document score.


See:
http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/Similarity.html

Sure, a document boost applies a multiplicative factor, but that is all 
relative to all of the other factors for that document and query. In other 
words, "all other things being equal", a doc-boost of 2.0 would double the 
score, but all other things are usually not equal.


Try different doc-boost values and see how the score is affected. The 
document may have such a low score that a boost of 2.0 doesn't move the 
needle relative to other documents.


I believe that the doc-boost is included within the "fieldNorm" value that 
is shown in the "explain" section if you add &debugQuery=true to your query 
request. This is explained under "norm" in the similarity javadoc.
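For reference, the norm described there folds the index-time document boost in with the field boosts and length normalization; from memory of the 3.x Similarity javadoc it is roughly:

```latex
\mathrm{norm}(t,d) = \mathit{doc.getBoost()} \cdot \mathrm{lengthNorm}(\mathit{field})
    \cdot \prod_{\text{field } f \text{ in } d \text{ named as } \mathit{field}(t)} f.\mathit{getBoost()}
```

which is why the document boost shows up inside "fieldNorm" rather than as a separate factor in the explain output.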


I did try a couple of examples with the Solr 3.6 example, such as doc 
boost=2.0, 0.2 (de-boost), 4.0, and 8.0. In my case, it took a boost of 8.0 
to move a document up.


-- Jack Krupansky

-Original Message- 
From: Danilak Michal

Sent: Wednesday, July 04, 2012 10:57 AM
To: solr-user@lucene.apache.org
Subject: Boosting the score of the whole documents

Hi guys,
I have the following problem.

I would like to give a boost to the whole documents as I index them. I am
sending to solr the xml in the form:



But it doesn't seem to alter the search scores in any way. I would expect
that to multiply the final search score by two, am I correct?
Probably I would need to alter schema.xml, but I found only information on
how to do that for specific fields (just put omitNorms=false into the field
tag). But what should I do, if I want to boost the whole document?

Note: by boosting a whole document I mean, that if document A has search
score 10.0 and document B has search score 15.0 and I give document A the
boost 2.0, when I index it, I would expect its search score to be 20.0.

Thanks in advance! 



Debugging jetty IllegalStateException errors?

2012-07-04 Thread Aaron Daubman
Greetings,

I'm wondering if anybody has experienced (and found root cause) for errors
like this. We're running Solr 3.6.0 with latest stable Jetty 7
(7.6.4.v20120524).
I know this is likely due to a client (or the server) terminating the
connection unexpectedly, but we see these fairly frequently and can't
determine what the impact is or why they are happening (who is closing
early, why?)

Any tips/tricks on troubleshooting or what to do to possibly minimize or
help prevent these from happening (we are using a fairly old python client
to programmatically access this solr instance).

---snip---
17:25:13,250 [qtp581536050-12] WARN  jetty.server.Response null - Committed
before 500 null

org.eclipse.jetty.io.EofException
at
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:952)
at
org.eclipse.jetty.http.AbstractGenerator.flush(AbstractGenerator.java:438)
at org.eclipse.jetty.server.HttpOutput.flush(HttpOutput.java:94)
at
org.eclipse.jetty.server.AbstractHttpConnection$Output.flush(AbstractHttpConnection.java:1016)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
at org.apache.solr.common.util.FastWriter.flush(FastWriter.java:115)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:353)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1332)
at
org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:77)
at
org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:247)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1332)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:477)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:225)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1031)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:406)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:965)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
at org.eclipse.jetty.server.Server.handle(Server.java:348)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:452)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:894)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:948)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:851)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:77)
at
org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:620)
at
org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:46)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:603)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:538)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.nio.channels.ClosedChannelException
at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:137)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:359)
at java.nio.channels.SocketChannel.write(SocketChannel.java:360)
at
org.eclipse.jetty.io.nio.ChannelEndPoint.gatheringFlush(ChannelEndPoint.java:371)
at
org.eclipse.jetty.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:330)
at
org.eclipse.jetty.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:330)
at
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:876)
... 37 more

17:25:13,250 [qtp581536050-12] WARN  jetty.servlet.ServletHandler null -
/solr/artists/select java.lang.IllegalStateException: Committed
at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1087)
at org.eclipse.jetty.server.Res

Re: Javadocs issue on Solr web site

2012-07-04 Thread Chris Hostetter

: Currently all Javadoc links seem to wind up pointing at the api-4_0_0-ALPHA 
versions - is that expected?

yes.  

/solr/api has always pointed at the javadocs for the most recent 
release of solr.  All that's changed now is that we host multiple copies 
of the javadocs (just like Lucene-Core has for a long time) and the 
canonical URLs make it clear which version you are looking at.

there's an open Jira to make a landing page listing all the versions that 
i'm going to try to get to later today, but you can still find the 3.6 
javadocs here...

http://lucene.apache.org/solr/api-3_6_0/

: E.g. do a Google search on StreamingUpdateSolrServer. First hit is for 
"StreamingUpdateSolrServer (Solr 3.6.0 API)"
: 
: Follow that link, and you get a 404 for page 
: 
http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html

that's to be expected:
  1) google hasn't recrawled yet so it doesn't know about the new versions in 
general
  2) that class was removed in 4.0


-Hoss


Re: Synonyms and hyphens

2012-07-04 Thread Jack Krupansky
There is one other detail that should clarify the situation. At query time, 
the query parser itself is breaking your query into space-delimited terms, 
and only calling the analyzer for each of those terms, each of which will be 
treated as if a quoted phrase. So it doesn't matter whether it is the 
standard analyzer or word delimiter filter or other filter that is breaking 
up the compound term.


And the default "query operator" only applies to the "terms" as the query 
parser parsed them, not for the sub-terms of a compound term like CD-ROM or 
gb-mb.


-- Jack Krupansky

-Original Message- 
From: Alireza Salimi

Sent: Wednesday, July 04, 2012 12:05 PM
To: solr-user@lucene.apache.org
Subject: Re: Synonyms and hyphens

Wow, I didn't know that. Is there a way to disable this feature? I mean, is
it something coming from the Analyzer?

On Wed, Jul 4, 2012 at 12:26 PM, Jack Krupansky 
wrote:



Terms with embedded special characters are treated as phrases with spaces
in place of the special characters. So, "gb-mb" is treated as if you had
enclosed the term in quotes.

-- Jack Krupansky
-Original Message- From: Alireza Salimi
Sent: Wednesday, July 04, 2012 6:50 AM
To: solr-user@lucene.apache.org
Subject: Re: Synonyms and hyphens


Hi,

Does anybody know why hyphen '-' and q.op=AND causes such a big difference
between the two queries? I thought hyphens are removed by 
StandardTokenizer

which means theoretically the two queries should be the same!

Thanks

On Tue, Jul 3, 2012 at 4:05 PM, Alireza Salimi *
*wrote:

 Hi,


I'm not sure if anybody has experienced this behavior before or not.
I noticed that 'hyphen' plays a very important role here.
I used Solr's default example directory.

http://localhost:8983/solr/select/?q=name:(gb-mb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND
results in  "parsedquery":"+name:gb +name:gib +name:gigabyte
+name:gigabytes +name:mb +name:mib +name:megabyte +name:megabytes",

While searching http://localhost:8984/solr/select/?q=name:(gbmb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND
results in "parsedquery":"+(name:gb name:gib name:gigabyte
name:gigabytes) +(name:mb name:mib name:megabyte name:megabytes)",

If you notice to the first query - with hyphens - you can see that the
results of
parsing is totally different. I know that hyphens are special characters
in Solr,
but there's no way that the first query returns any entry because it's
asking for
ALL synonyms.

Am I missing something here?

Thanks


--
Alireza Salimi
Java EE Developer






--
Alireza Salimi
Java EE Developer





--
Alireza Salimi
Java EE Developer 



Re: Synonyms and hyphens

2012-07-04 Thread Alireza Salimi
OK, so how can I prevent this behavior from happening?
As you can see the parsed query is very different in these two cases.

On Wed, Jul 4, 2012 at 1:37 PM, Jack Krupansky wrote:

> There is one other detail that should clarify the situation. At query
> time, the query parser itself is breaking your query into space-delimited
> terms, and only calling the analyzer for each of those terms, each of which
> will be treated as if a quoted phrase. So it doesn't matter whether it is
> the standard analyzer or word delimiter filter or other filter that is
> breaking up the compound term.
>
> And the default "query operator" only applies to the "terms" as the query
> parser parsed them, not for the sub-terms of a compound term like CD-ROM or
> gb-mb.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Alireza Salimi
> Sent: Wednesday, July 04, 2012 12:05 PM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Synonyms and hyphens
>
> Wow, I didn't know that. Is there a way to disable this feature? I mean, is
> it something coming from the Analyzer?
>
> On Wed, Jul 4, 2012 at 12:26 PM, Jack Krupansky *
> *wrote:
>
>  Terms with embedded special characters are treated as phrases with spaces
>> in place of the special characters. So, "gb-mb" is treated as if you had
>> enclosed the term in quotes.
>>
>> -- Jack Krupansky
>> -Original Message- From: Alireza Salimi
>> Sent: Wednesday, July 04, 2012 6:50 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Synonyms and hyphens
>>
>>
>> Hi,
>>
>> Does anybody know why hyphen '-' and q.op=AND causes such a big difference
>> between the two queries? I thought hyphens are removed by
>> StandardTokenizer
>> which means theoretically the two queries should be the same!
>>
>> Thanks
>>
>> On Tue, Jul 3, 2012 at 4:05 PM, Alireza Salimi > >*
>> *wrote:
>>
>>  Hi,
>>
>>>
>>> I'm not sure if anybody has experienced this behavior before or not.
>>> I noticed that 'hyphen' plays a very important role here.
>>> I used Solr's default example directory.
>>>
>>> http://localhost:8983/solr/select/?q=name:(gb-mb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND
>>>
>>> results in  "parsedquery":"+name:gb +name:gib +name:gigabyte
>>> +name:gigabytes +name:mb +name:mib +name:megabyte +name:megabytes",
>>>
>>> While searching http://localhost:8984/solr/select/?q=name:(gbmb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND
>>>
>>> results in "parsedquery":"+(name:gb name:gib name:gigabyte
>>> name:gigabytes) +(name:mb name:mib name:megabyte name:megabytes)",
>>>
>>> If you notice to the first query - with hyphens - you can see that the
>>> results of
>>> parsing is totally different. I know that hyphens are special characters
>>> in Solr,
>>> but there's no way that the first query returns any entry because it's
>>> asking for
>>> ALL synonyms.
>>>
>>> Am I missing something here?
>>>
>>> Thanks
>>>
>>>
>>> --
>>> Alireza Salimi
>>> Java EE Developer
>>>
>>>
>>>
>>>
>>>
>> --
>> Alireza Salimi
>> Java EE Developer
>>
>>
>
>
> --
> Alireza Salimi
> Java EE Developer
>



-- 
Alireza Salimi
Java EE Developer


Re: Boosting the score of the whole documents

2012-07-04 Thread Danilak Michal
Does any modification need to be made to the schema.xml file?
For example, to enable field boosts, one has to set omitNorms to false.
Is there some similar setting for document boosts?

On Wed, Jul 4, 2012 at 7:29 PM, Jack Krupansky wrote:

> Make sure to review the "similarity" javadoc page to understand what any
> of these factors does to the document score.
>
> See:
> http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/Similarity.html
>
> Sure, a document boost applies a multiplicative factor, but that is all
> relative to all of the other factors for that document and query. In other
> words, "all other things being equal", a doc-boost of 2.0 would double the
> score, but all other things are usually not equal.
>
> Try different doc-boost values and see how the score is affected. The
> document may have such a low score that a boost of 2.0 doesn't move the
> needle relative to other documents.
>
> I believe that the doc-boost is included within the "fieldNorm" value that
> is shown in the "explain" section if you add &debugQuery=true to your query
> request. This is explained under "norm" in the similarity javadoc.
>
> I did try a couple of examples with the Solr 3.6 example, such as doc
> boost=2.0, 0.2 (de-boost), 4.0, and 8.0. In my case, it took a boost of 8.0
> to move a document up.
>
> -- Jack Krupansky
>
> -Original Message- From: Danilak Michal
> Sent: Wednesday, July 04, 2012 10:57 AM
> To: solr-user@lucene.apache.org
> Subject: Boosting the score of the whole documents
>
>
> Hi guys,
> I have the following problem.
>
> I would like to give a boost to the whole documents as I index them. I am
> sending to solr the xml in the form:
>
> 
>
> But it doesn't seem to alter the search scores in any way. I would expect
> that to multiply the final search score by two, am I correct?
> Probably I would need to alter schema.xml, but I found only information on
> how to do that for specific fields (just put omitNorms=false into the field
> tag). But what should I do, if I want to boost the whole document?
>
> Note: by boosting a whole document I mean, that if document A has search
> score 10.0 and document B has search score 15.0 and I give document A the
> boost 2.0, when I index it, I would expect its search score to be 20.0.
>
> Thanks in advance!
>


Re: Urgent:Partial Search not Working

2012-07-04 Thread jayakeerthi s
Could anyone please reply with a solution to this?

On Wed, Jul 4, 2012 at 7:18 PM, jayakeerthi s wrote:

> All,
>
> I am using apache-solr-4.0.0-ALPHA and trying to configure the Partial
> search on two fields.
>
> The keywords being used to search are:
> The value inside the search field ProdSymbl is M1.6X0.35 9P
>
> and I will have to get the results if I search for M1.6 or X0.35 (part
> of the search value).
>
>
> I have tried using  both NGramTokenizerFactory and solr.EdgeNGramFilterFactory
>  in the schema.xml
>
> 
>   
>
>  omitNorms="false">
>   
> 
> 
> 
>  maxGramSize="15" side="front"/>
>   
>   
>
>
>
> Fields I have configured as
>
>multiValued="true"/>
> multiValued="true"/>
>
> Copy field as
>
> 
>
>
>
>
> Please let me know if I am missing anything. This is kind of an urgent
> requirement that needs to be addressed at the earliest. Please help.
>
>
> Thanks in advance,
>
> Jay
>


Re: Something like 'bf' or 'bq' with MoreLikeThis

2012-07-04 Thread Amit Nithian
No worries! What version of Solr are you using? One that you
downloaded as a tarball or one that you checked out from SVN (trunk)?
I'll take a bit of time and document steps and respond.
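Until then, trying a patch against an SVN checkout usually looks roughly like this — the directory, patch file name, and build target are illustrative placeholders, not from the thread:

```shell
# From the root of a Solr SVN checkout (names/paths are illustrative):
svn up                        # make sure the checkout is current
patch -p0 < SOLR-NNNN.patch   # apply the attached patch file
ant compile                   # rebuild Solr with the patch applied
```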

I'll review the patch to see that it fits a general case. Question for
you with MLT, are your users doing a blank search (no text) for
something or are you returning results More Like results that were
generated as a result of a user typing some text query. I may have
built this patch assuming a blank query but I can make it work (or try
to) make it work for text based queries.

Thanks
Amit

On Wed, Jul 4, 2012 at 1:37 AM, nanshi  wrote:
> Thanks a lot, Amit! Please bear with me, I am a new Solr dev; could you
> please shed some light on how to use a patch? Pointing me to a wiki/doc is
> fine too. Thanks a lot! :)
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Something-like-bf-or-bq-with-MoreLikeThis-tp3989060p3992935.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Urgent:Partial Search not Working

2012-07-04 Thread Jack Krupansky
You need to apply the edge n-gram filter only at index time, not at query 
time. So, you need to specify two analyzers for these field types, an 
"index" and a "query" analyzer. They should be roughly the same, but the 
"query" analyzer would not have the edge n-gram filter since you are 
accepting the single n-gram given by the user and then matching it against 
the full list of n-grams that are in the index.


It is unfortunate that the wiki example is misleading. Just as bad, we don't 
have an example in the example schema.


Basically, take a "text" field type that you like from the Solr example 
schema and then add the edge n-gram filter to its "index" analyzer, probably 
as the last token filter. I would note that the edge n-gram filter will 
interact with the stemming filter, but there is not much you can do other 
than try different stemmers and experiment with whether stemming should be 
before or after the edge n-gram filter. I suspect that having stemming after 
edge n-gram may be better.
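A minimal sketch of what this looks like in schema.xml — the field type name and analyzer chain here are illustrative assumptions, with the edge n-gram filter applied at index time only:

```xml
<!-- Sketch: edge n-grams at index time only; names are illustrative. -->
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <!-- no n-gram filter: the user's term matches the indexed grams -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```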


-- Jack Krupansky

-Original Message- 
From: jayakeerthi s

Sent: Wednesday, July 04, 2012 1:41 PM
To: solr-user@lucene.apache.org ; solr-user-h...@lucene.apache.org
Subject: Re: Urgent:Partial Search not Working

Could anyone please reply with a solution to this?

On Wed, Jul 4, 2012 at 7:18 PM, jayakeerthi s wrote:


All,

I am using apache-solr-4.0.0-ALPHA and trying to configure the Partial
search on two fields.

The keywords being used to search are:
The value inside the search field ProdSymbl is M1.6X0.35 9P

and I will have to get the results if I search for M1.6 or X0.35 (part
of the search value).


I have tried using  both NGramTokenizerFactory and 
solr.EdgeNGramFilterFactory

 in the schema.xml


  


  




  
  



Fields I have configured as

  
   

Copy field as


   



Please let me know if I am missing anything. This is kind of an urgent
requirement that needs to be addressed at the earliest. Please help.


Thanks in advance,

Jay





Re: Use of Solr as primary store for search engine

2012-07-04 Thread Amit Nithian
Paul,

Thanks for your response! Were you using the SQL database as an object
store to pull XWiki objects or did you have to execute several queries
to reconstruct these objects? I don't know much about them, sorry.
Also for those responding, can you provide a few basic metrics for me?
1) Number of nodes receiving queries
2) Approximate queries per second
3) Approximate latency per query

I know some of this may be sensitive depending on where you work so
reasonable ranges would be nice (i.e. sub-second isn't hugely helpful
since 50,100,200 ms have huge impacts depending on your site).

Thanks again!
Amit

On Wed, Jul 4, 2012 at 1:09 AM, Paul Libbrecht  wrote:
> Amit,
>
> not exactly a response to your question, but doing this with a Lucene index on 
> i2geo.net has resulted in a considerable performance boost (reading from 
> stored fields instead of reading from the XWiki objects, which pull from the 
> SQL database). However, it meant that we had to rewrite anything necessary 
> for the rendering, so the rendering did not re-use much code.
>
> Paul
>
>
> Le 4 juil. 2012 à 09:54, Amit Nithian a écrit :
>
>> Hello all,
>>
>> I am curious to know how people are using Solr in conjunction with
>> other data stores when building search engines to power web sites (say
>> an ecommerce site). The question I have for the group is given an
>> architecture where the primary (transactional) data store is MySQL
>> (Oracle, PostGres whatever) with periodic indexing into Solr, when
>> your front end issues a search query to Solr and returns results, are
>> there any joins with your primary Oracle/MySQL etc to help render
>> results?
>>
>> Basically I guess my question is whether or not you store enough in
>> Solr so that when your front end renders the results page, it never
>> has to hit the database. The other option is that your search engine
>> only returns primary keys that your front end then uses to hit the DB
>> to fetch data to display to your end user.
>>
>> With Solr 4.0 and Solr moving towards the NoSQL direction, I am
>> curious what people are doing and what application architectures with
>> Solr look like.
>>
>> Thanks!
>> Amit
>
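The two rendering architectures Amit contrasts can be sketched as follows — the field names, result shapes, and DB accessor are illustrative assumptions, not anything from the thread:

```python
# Sketch of the two result-rendering patterns discussed above.
# Field names and the fetch_by_ids callable are illustrative assumptions.

def render_from_solr(solr_docs):
    """Pattern 1: Solr stores every field the page needs -- no DB hit."""
    return [{"title": d["title"], "price": d["price"]} for d in solr_docs]

def render_via_db(solr_docs, fetch_by_ids):
    """Pattern 2: Solr returns only primary keys; hydrate rows from the DB."""
    ids = [d["id"] for d in solr_docs]
    rows = fetch_by_ids(ids)  # hypothetical bulk DB lookup by primary key
    return [{"title": r["title"], "price": r["price"]} for r in rows]
```

Pattern 1 trades index size and reindexing cost for lower render latency; pattern 2 keeps the index small but couples page rendering to DB availability.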


Re: Synonyms and hyphens

2012-07-04 Thread Jack Krupansky
You could pre-process your queries to convert hyphen and other special 
characters to spaces.
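A minimal sketch of that pre-processing step, done client-side before the query string is sent to Solr (the set of characters replaced is an illustrative choice):

```python
import re

def normalize_query(q: str) -> str:
    """Replace hyphens and similar joiners with spaces so the query
    parser sees separate terms instead of one compound phrase term."""
    return re.sub(r"[-_/]+", " ", q).strip()

print(normalize_query("gb-mb"))  # -> gb mb
```

After this, q.op=AND applies to `gb` and `mb` as ordinary top-level terms, matching the behavior of the un-hyphenated query in the thread.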


-- Jack Krupansky

-Original Message- 
From: Alireza Salimi

Sent: Wednesday, July 04, 2012 12:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Synonyms and hyphens

OK, so how can I prevent this behavior from happening?
As you can see the parsed query is very different in these two cases.

On Wed, Jul 4, 2012 at 1:37 PM, Jack Krupansky 
wrote:



There is one other detail that should clarify the situation. At query
time, the query parser itself is breaking your query into space-delimited
terms, and only calling the analyzer for each of those terms, each of 
which

will be treated as if a quoted phrase. So it doesn't matter whether it is
the standard analyzer or word delimiter filter or other filter that is
breaking up the compound term.

And the default "query operator" only applies to the "terms" as the query
parser parsed them, not for the sub-terms of a compound term like CD-ROM 
or

gb-mb.


-- Jack Krupansky

-Original Message- From: Alireza Salimi
Sent: Wednesday, July 04, 2012 12:05 PM

To: solr-user@lucene.apache.org
Subject: Re: Synonyms and hyphens

Wow, I didn't know that. Is there a way to disable this feature? I mean, 
is

it something coming from the Analyzer?

On Wed, Jul 4, 2012 at 12:26 PM, Jack Krupansky *
*wrote:

 Terms with embedded special characters are treated as phrases with spaces

in place of the special characters. So, "gb-mb" is treated as if you had
enclosed the term in quotes.

-- Jack Krupansky
-Original Message- From: Alireza Salimi
Sent: Wednesday, July 04, 2012 6:50 AM
To: solr-user@lucene.apache.org
Subject: Re: Synonyms and hyphens


Hi,

Does anybody know why hyphen '-' and q.op=AND causes such a big 
difference

between the two queries? I thought hyphens are removed by
StandardTokenizer
which means theoretically the two queries should be the same!

Thanks

On Tue, Jul 3, 2012 at 4:05 PM, Alireza Salimi *
*wrote:

 Hi,



I'm not sure if anybody has experienced this behavior before or not.
I noticed that 'hyphen' plays a very important role here.
I used Solr's default example directory.

http://localhost:8983/solr/select/?q=name:(gb-mb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND

results in  "parsedquery":"+name:gb +name:gib +name:gigabyte
+name:gigabytes +name:mb +name:mib +name:megabyte +name:megabytes",

While searching http://localhost:8984/solr/select/?q=name:(gbmb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND

results in "parsedquery":"+(name:gb name:gib name:gigabyte
name:gigabytes) +(name:mb name:mib name:megabyte name:megabytes)",

If you notice to the first query - with hyphens - you can see that the
results of
parsing is totally different. I know that hyphens are special characters
in Solr,
but there's no way that the first query returns any entry because it's
asking for
ALL synonyms.

Am I missing something here?

Thanks


--
Alireza Salimi
Java EE Developer






--
Alireza Salimi
Java EE Developer





--
Alireza Salimi
Java EE Developer





--
Alireza Salimi
Java EE Developer 



Re: Use of Solr as primary store for search engine

2012-07-04 Thread Paul Libbrecht

Le 4 juil. 2012 à 21:17, Amit Nithian a écrit :
> Thanks for your response! Were you using the SQL database as an object
> store to pull XWiki objects or did you have to execute several queries
> to reconstruct these objects?

The first. It's all fairly transparent.
There are "XWiki classes" and XWiki objects which are rendered; they live as 
composites of the XWiki Java objects, which are persisted by Hibernate.

> I don't know much about them sorry..
> Also for those responding, can you provide a few basic metrics for me?
> 1) Number of nodes receiving queries
> 2) Approximate queries per second
> 3) Approximate latency per query

I admire those that have this at hand.

> I know some of this may be sensitive depending on where you work so
> reasonable ranges would be nice (i.e. sub-second isn't hugely helpful
> since 50,100,200 ms have huge impacts depending on your site).

I think caching comes into play here in a very strong manner, so these measures 
are fairly difficult to establish. One Solr instance I run, in particular, ranges 
between 100 ms (uncached queries) and 9 ms (cached queries).

Paul

Re: Urgent:Partial Search not Working

2012-07-04 Thread jayakeerthi s
Hi Jack,

Many thanks for your reply...
Yes, I have tried both the NGram and EdgeNGram filter factories, still no result.
Please let me know any alternatives.

On Thu, Jul 5, 2012 at 12:42 AM, Jack Krupansky wrote:

> You need to apply the edge n-gram filter only at index time, not at query
> time. So, you need to specify two analyzers for these field types, an
> "index" and a "query" analyzer. They should be roughly the same, but the
> "query" analyzer would not have the edge n-gram filter since you are
> accepting the single n-gram given by the user and then matching it against
> the full list of n-grams that are in the index.
>
> It is unfortunate that the wiki example is misleading. Just as bad, we
> don't have an example in the example schema.
>
> Basically, take a "text" field type that you like from the Solr example
> schema and then add the edge n-gram filter to its "index" analyzer,
> probably as the last token filter. I would note that the edge n-gram filter
> will interact with the stemming filter, but there is not much you can do
> other than try different stemmers and experiment with whether stemming
> should be before or after the edge n-gram filter. I suspect that having
> stemming after edge n-gram may be better.
>
> -- Jack Krupansky
>
> -Original Message- From: jayakeerthi s
> Sent: Wednesday, July 04, 2012 1:41 PM
> To: solr-user@lucene.apache.org ; 
> solr-user-help@lucene.apache.org
> Subject: Re: Urgent:Partial Search not Working
>
>
> Could anyone please reply the solution to this
>
> On Wed, Jul 4, 2012 at 7:18 PM, jayakeerthi s wrote:
>
>  All,
>>
>> I am using apache-solr-4.0.0-ALPHA and trying to configure the Partial
>> search on two fields.
>>
>> Keywords using to search are
>> The value inside the search ProdSymbl is M1.6X0.35 9P
>>
>> and I will have to get the results if I search for M1.6 or X0.35 (partial
>> of the search value).
>>
>>
>> I have tried using  both NGramTokenizerFactory and
>> solr.EdgeNGramFilterFactory
>>  in the schema.xml
>>
>> 
>>   
>>
>> > omitNorms="false">
>>   
>> 
>> 
>> 
>> > maxGramSize="15" side="front"/>
>>   
>>   
>>
>>
>>
>> Fields I have configured as
>>
>>   > multiValued="true"/>
>>> multiValued="true"/>
>>
>> Copy field as
>>
>> 
>>
>>
>>
>>
>> Please let me know if I am missing anything; this is an urgent
>> requirement that needs to be addressed at the earliest. Please help.
>>
>>
>> Thanks in advance,
>>
>> Jay
>>
>>
>


Re: Urgent:Partial Search not Working

2012-07-04 Thread Jack Krupansky
Don't forget to test your field type analyzers on the Solr Admin "analysis" 
page. It will show you exactly how terms gets analyzed at both index and 
query time.


If something is not working, be specific about the case and exactly what is 
not as you would expect - give both the expected value and the actual 
value.


-- Jack Krupansky

-Original Message- 
From: jayakeerthi s

Sent: Wednesday, July 04, 2012 3:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Urgent:Partial Search not Working

Hi Jack,

Many thanks for your reply...
Yes, I have tried both the NGram and EdgeNGram filter factories, still no result.
Please let me know any alternatives.

On Thu, Jul 5, 2012 at 12:42 AM, Jack Krupansky 
wrote:



You need to apply the edge n-gram filter only at index time, not at query
time. So, you need to specify two analyzers for these field types, an
"index" and a "query" analyzer. They should be roughly the same, but the
"query" analyzer would not have the edge n-gram filter since you are
accepting the single n-gram given by the user and then matching it against
the full list of n-grams that are in the index.

It is unfortunate that the wiki example is misleading. Just as bad, we
don't have an example in the example schema.

Basically, take a "text" field type that you like from the Solr example
schema and then add the edge n-gram filter to its "index" analyzer,
probably as the last token filter. I would note that the edge n-gram filter
will interact with the stemming filter, but there is not much you can do
other than try different stemmers and experiment with whether stemming
should be before or after the edge n-gram filter. I suspect that having
stemming after edge n-gram may be better.
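The index-time-only placement described above can be sketched in Python. This is a toy model of the two analysis chains, not Solr code; the min/max gram sizes are assumptions matching the snippet in the original post.

```python
def edge_ngrams(token, min_size=1, max_size=15):
    # Front-edge n-grams, as EdgeNGramFilterFactory(side="front") emits them.
    return [token[:n] for n in range(min_size, min(len(token), max_size) + 1)]

def index_terms(text):
    # Index-time analysis: lowercase, whitespace-tokenize, n-gram each token.
    return {gram for tok in text.lower().split() for gram in edge_ngrams(tok)}

def matches(query, indexed_terms):
    # Query-time analysis: NO n-gram filter -- the user's partial term is
    # matched whole against the grams already stored in the index.
    return all(tok in indexed_terms for tok in query.lower().split())

terms = index_terms("M1.6X0.35 9P")
matches("M1.6", terms)   # True: "m1.6" is a front edge of "m1.6x0.35"
matches("X0.35", terms)  # False: not a prefix of any token
```

Note that with side="front" only prefixes are indexed, so a mid-token fragment like X0.35 will still not match; covering that case requires the (much more index-hungry) NGramFilterFactory instead of the edge variant.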

-- Jack Krupansky

-Original Message- From: jayakeerthi s
Sent: Wednesday, July 04, 2012 1:41 PM
To: solr-user@lucene.apache.org ; 
solr-user-help@lucene.apache.org

Subject: Re: Urgent:Partial Search not Working


Could anyone please reply the solution to this

On Wed, Jul 4, 2012 at 7:18 PM, jayakeerthi s wrote:

 All,


I am using apache-solr-4.0.0-ALPHA and trying to configure the Partial
search on two fields.

Keywords using to search are
The value inside the search ProdSymbl is M1.6X0.35 9P

and I will have to get the results if I search for M1.6 or X0.35 (partial
of the search value).


I have tried using  both NGramTokenizerFactory and
solr.EdgeNGramFilterFactory
 in the schema.xml


  


  




  
  



Fields I have configured as

  
   

Copy field as


   



Please let me know if I am missing anything; this is an urgent
requirement that needs to be addressed at the earliest. Please help.


Thanks in advance,

Jay








Re: Boosting the score of the whole documents

2012-07-04 Thread Jack Krupansky
I'm not completely sure. I wouldn't expect that document boost should 
require field norms, but glancing at the code, it seems that having 
omitNorms=true does mean that the score for a field will not get the 
document boost; in fact such a field gets a "constant score". In other 
words, the score for any field within the document will only get the 
document boost if that field does not have omitNorms=true. But as long as at 
least one field has norms, the document score should get some boost from 
the document boost. I am not sure if this is the way the code is supposed to 
work, or whether it just happens to be this way.


I would hope that some committer with detailed knowledge of "norms" and 
"similarity" will weigh in on this matter.
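For intuition, the norm computation being discussed looks roughly like this. This is a sketch of DefaultSimilarity's behavior, not the actual Lucene code; the real norm is additionally quantized to a single byte at index time, which is one reason small boosts can disappear in rounding.

```python
import math

def length_norm(num_terms):
    # DefaultSimilarity's length normalization: 1/sqrt(number of terms).
    return 1.0 / math.sqrt(num_terms)

def field_norm(doc_boost, field_boost, num_terms, omit_norms=False):
    # The index-time "norm" folds document boost, field boost, and length
    # norm into one per-field value. With omitNorms=true nothing is stored,
    # so the document boost is silently dropped for that field.
    if omit_norms:
        return 1.0
    return doc_boost * field_boost * length_norm(num_terms)

field_norm(2.0, 1.0, 1)                   # 2.0: doc boost survives in the norm
field_norm(2.0, 1.0, 1, omit_norms=True)  # 1.0: doc boost lost
```

This matches the observation above: a field with omitNorms=true contributes a constant factor, so the document boost only reaches the score through fields that still carry norms.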

-- Jack Krupansky

-Original Message- 
From: Danilak Michal

Sent: Wednesday, July 04, 2012 1:11 PM
To: solr-user@lucene.apache.org
Subject: Re: Boosting the score of the whole documents

Does any modification need to be made to the schema.xml file?
For example, to enable field boosts, one has to set omitNorms to false.
Is there some similar setting for document boosts?

On Wed, Jul 4, 2012 at 7:29 PM, Jack Krupansky 
wrote:



Make sure to review the "similarity" javadoc page to understand what any
of these factors does to the document score.

See:
http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/Similarity.html

Sure, a document boost applies a multiplicative factor, but that is all
relative to all of the other factors for that document and query. In other
words, "all other things being equal", a doc-boost of 2.0 would double the
score, but all other things are usually not equal.

Try different doc-boost values and see how the score is affected. The
document may have such a low score that a boost of 2.0 doesn't move the
needle relative to other documents.

I believe that the doc-boost is included within the "fieldNorm" value that
is shown in the "explain" section if you add &debugQuery=true to your query
request. This is explained under "norm" in the similarity javadoc.

I did try a couple of examples with the Solr 3.6 example, such as doc
boost=2.0, 0.2 (de-boost), 4.0, and 8.0. In my case, it took a boost of 8.0
to move a document up.

-- Jack Krupansky

-Original Message- From: Danilak Michal
Sent: Wednesday, July 04, 2012 10:57 AM
To: solr-user@lucene.apache.org
Subject: Boosting the score of the whole documents


Hi guys,
I have the following problem.

I would like to give a boost to the whole documents as I index them. I am
sending to solr the xml in the form:

<add><doc boost="2.0"> ... </doc></add>

But it doesn't seem to alter the search scores in any way. I would expect
that to multiply the final search score by two, am I correct?
Probably I would need to alter schema.xml, but I found only information on
how to do that for specific fields (just put omitNorms=false into the field
tag). But what should I do if I want to boost the whole document?

Note: by boosting a whole document I mean, that if document A has search
score 10.0 and document B has search score 15.0 and I give document A the
boost 2.0, when I index it, I would expect its search score to be 20.0.

Thanks in advance!





Re: Something like 'bf' or 'bq' with MoreLikeThis

2012-07-04 Thread nanshi
Amit, I am using Solr3.6 and directly imported apache-solr-3.6.0.war into
Eclipse (Indigo). I will need to directly invoke a MoreLikeThis(/mlt) call
using a unique id to get MoreLikeThis results. 

The hard part is that I need to use a float field (I cannot use mlt.fl or
mlt.fq, since it's not a string) in the matched document of the MLT response
to find MLT results - this is purely for relevance improvement. 

I found a workaround: I can use a standard query parameter
fq=Rating:[1.5 TO 2.5]; however, for run-time queries I have to extract the
rating number from the matched doc (/mlt?q=id:12345), and I don't know how to
do that at run time. If the matched rating is 2, for instance, then I can
construct [1.5 TO 2.5] to say that 2 is a value within the range from 1.5 to
2.5. I will encounter the same thing if I use a bf parameter to calculate
distance: I will still need to get the Rating value out of the matched
document.
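One way to script the two-step workaround is sketched below. This is an assumption-laden sketch: the field name Rating, the 0.5 window, and the id come from the example above - adjust them to the real schema.

```python
def rating_range_fq(rating, window=0.5, field="Rating"):
    # Bracket the matched document's rating, e.g. 2.0 -> "Rating:[1.5 TO 2.5]".
    lo, hi = rating - window, rating + window
    return f"{field}:[{lo} TO {hi}]"

# Step 1: fetch the matched doc first with fl=Rating to read its rating
# (e.g. GET /solr/select?q=id:12345&fl=Rating), then
# Step 2: issue the /mlt request with the constructed filter query:
fq = rating_range_fq(2.0)   # -> "Rating:[1.5 TO 2.5]"
```

The extra round trip to read the rating before the /mlt call is the part the post says is missing; doing it client-side like this is the simplest option short of a custom request handler.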


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Something-like-bf-or-bq-with-MoreLikeThis-tp3989060p3993079.html
Sent from the Solr - User mailing list archive at Nabble.com.


Internal Error 500 - How to diagnose?

2012-07-04 Thread Spadez
Hi,

Sorry for this post, but I'm having a hard time getting my head around this.
I installed Solr on Tomcat and it seems to work fine. I get the Solr admin
page and the "it works" page from Tomcat. 

When I try to query my solr server I get this message:

*Internal Server Error

The server encountered an internal error and was unable to complete your
request. Either the server is overloaded or there is an error in the
application.*

I had this working before, but I have changed almost everything since, so I
don't know where to start diagnosing this. Can anyone give me a bit of input
on where I should go next? Is there a log file that will give more
information? Really quite confused and stuck!

Regards,

James

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Internal-Error-500-How-to-diagnose-tp3993087.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Internal Error 500 - How to diagnose?

2012-07-04 Thread Markus Jelsma
Check your /var/log/tomcat*/. It logs to a catalina.out file unless you 
modified log4j.properties.
 
 
-Original message-
> From:Spadez 
> Sent: Thu 05-Jul-2012 00:36
> To: solr-user@lucene.apache.org
> Subject: Internal Error 500 - How to diagnose?
> 
> Hi,
> 
> Sorry for this post, but im having a hard time getting my head around this.
> I installed Solr on Tomcat and it seems to work fine. I get the solr admin
> page and the "it works" page from tomcat. 
> 
> When I try to query my solr server I get this message:
> 
> *Internal Server Error
> 
> The server encountered an internal error and was unable to complete your
> request. Either the server is overloaded or there is an error in the
> application.*
> 
> I had this working before but I have changed almost everything since so I
> dont know where to start diagnosing this. Can anyone give me a bit of input
> on where I should go next? Is there a log file that will give more
> information? Really quite confused and stuck!
> 
> Regards,
> 
> James
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Internal-Error-500-How-to-diagnose-tp3993087.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


RE: Internal Error 500 - How to diagnose?

2012-07-04 Thread Spadez
Thank you, the query seems to have got through - that's good, I guess?

*Jul 4, 2012 6:32:34 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select
params={facet=true&facet.query={!key%3Danytime}date:[*+TO+*]&facet.query={!key%3D1day}date:[NOW/DAY-1DAY+TO+NOW/DAY]&facet.query={!key%3D3days}date:[NOW/DAY-3DAYS+TO+NOW/DAY]&facet.query={!key%3D1week}date:[NOW/DAY-7DAYS+TO+NOW/DAY]&facet.query={!key%3D1month}date:[NOW/DAY-1MONTH+TO+NOW/DAY]&facet.query={!geofilt+d%3D10+key%3D10kms}&facet.query={!geofilt+d%3D30+key%3D30kms}&facet.query={!geofilt+d%3D50+key%3D50kms}&facet.query={!geofilt+d%3D100+key%3D100kms}&start=0&q=(title:(test))+OR+(description:(test))+OR+(company:(test))+OR+(location_name:(test))&sfield=latlng&pt=51.27241,0.190898&wt=python&fq={!geofilt+d%3D10}&rows=10}
hits=0 status=0 QTime=3 *

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Internal-Error-500-How-to-diagnose-tp3993087p3993089.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Internal Error 500 - How to diagnose?

2012-07-04 Thread Lance Norskog
Eclipse and IntelliJ have remote debugging for Tomcat. Sometimes it is
the only way.

On Wed, Jul 4, 2012 at 3:48 PM, Spadez  wrote:
> Thank you, the query seems to have got through, thats good i guess?
>
> *Jul 4, 2012 6:32:34 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/select
> params={facet=true&facet.query={!key%3Danytime}date:[*+TO+*]&facet.query={!key%3D1day}date:[NOW/DAY-1DAY+TO+NOW/DAY]&facet.query={!key%3D3days}date:[NOW/DAY-3DAYS+TO+NOW/DAY]&facet.query={!key%3D1week}date:[NOW/DAY-7DAYS+TO+NOW/DAY]&facet.query={!key%3D1month}date:[NOW/DAY-1MONTH+TO+NOW/DAY]&facet.query={!geofilt+d%3D10+key%3D10kms}&facet.query={!geofilt+d%3D30+key%3D30kms}&facet.query={!geofilt+d%3D50+key%3D50kms}&facet.query={!geofilt+d%3D100+key%3D100kms}&start=0&q=(title:(test))+OR+(description:(test))+OR+(company:(test))+OR+(location_name:(test))&sfield=latlng&pt=51.27241,0.190898&wt=python&fq={!geofilt+d%3D10}&rows=10}
> hits=0 status=0 QTime=3 *
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Internal-Error-500-How-to-diagnose-tp3993087p3993089.html
> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Lance Norskog
goks...@gmail.com


Re: Problem with sorting solr docs

2012-07-04 Thread Bill Bell
Would all optional fields need sortMissingLast and sortMissingFirst set 
even when not sorting on that field? Seems broken to me.

Sent from my Mobile device
720-256-8076

On Jul 3, 2012, at 6:45 AM, Shubham Srivastava 
 wrote:

> Just adding to the below--> If there is a field(say X) which is not populated 
> and in the query I am not sorting on this particular field but on another 
> field (say Y) still the result ordering would depend on X .
> 
> Infact in the below problem mentioned by Harsh making X as 
> sortMissingLast="false" sortMissingFirst="false" solved the problem while in 
> the query he was sorting on Y.  This seems a bit illogical.
> 
> Regards,
> Shubham
> 
> From: Harshvardhan Ojha [harshvardhan.o...@makemytrip.com]
> Sent: Tuesday, July 03, 2012 5:58 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Problem with sorting solr docs
> 
> Hi,
> 
> I have added  sortMissingLast="false" sortMissingFirst="false"/> to my schema.xml, although 
> I am searching on name field.
> It seems to be working fine. What is its default behavior?
> 
> Regards
> Harshvardhan Ojha
> 
> -Original Message-
> From: Rafał Kuć [mailto:r@solr.pl]
> Sent: Tuesday, July 03, 2012 5:35 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Problem with sorting solr docs
> 
> Hello!
> 
> But the latlng field is not taken into account when sorting with sort defined 
> such as in your query. You only sort on the name field and only that field. 
> You can also define Solr behavior when there is no value in the field, by 
> adding sortMissingLast="true" or sortMissingFirst="true" to your type 
> definition in the schema.xml file.
> 
> --
> Regards,
> Rafał Kuć
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
> 
>> Hi,
> 
>> Thanks for reply.
>> I want to sort my docs on name field, it is working well only if I have all 
>> fields populated well.
>> But my latlng field is optional, every doc will not have this value.
>> So those docs are not getting sorted.
> 
>> Regards
>> Harshvardhan Ojha
> 
>> -Original Message-
>> From: Rafał Kuć [mailto:r@solr.pl]
>> Sent: Tuesday, July 03, 2012 5:24 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Problem with sorting solr docs
> 
>> Hello!
> 
>> Your query suggests that you are sorting on the 'name' field instead
>> of the latlng field (sort=name +asc).
> 
>> The question is what you are trying to achieve ? Do you want to sort
>> your documents from a given geographical point ? If that's the case
>> you may want to look here:
>> http://wiki.apache.org/solr/SpatialSearch/
>> and look at the possibility of sorting on the distance from a given point.
> 
>> --
>> Regards,
>> Rafał Kuć
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
>> ElasticSearch
> 
> 
>> Hi,
>> 
>> I have 260 docs which I want to sort on a single field latlng.
>> 
>> 1
>> Amphoe Khanom
>> 1.0,1.0
>> 
>> 
>> My query is :
>> http://localhost:8080/solr/select?q=*:*&sort=name+asc
>> 
>> This query sorts all documents except those which doesn’t have latlng,
>> and I can’t keep any default value for this field.
>> My question is how can I sort all docs on latlng?
>> 
>> Regards
>> Harshvardhan Ojha  | Software Developer - Technology Development
>>|  MakeMyTrip.com, 243 SP Infocity, Udyog Vihar Phase 1, Gurgaon,
>> Haryana - 122 016, India
> 
>> What's new?: Inspire - Discover an inspiring new way to plan and book travel 
>> online.
> 
> 
>> Office Map
> 
>> Facebook
> 
>> Twitter
> 
> 
>> 
> 
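As an aside, what sortMissingLast / sortMissingFirst buy you can be sketched in Python. This is a toy model of the sort only, not Solr internals; the document values are made up from the example in this thread.

```python
def solr_sort(docs, field, missing_last=True):
    # Documents lacking the sort field are grouped at one end
    # (sortMissingLast) or the other (sortMissingFirst) instead of
    # landing in an undefined order among the sorted ones.
    present = sorted((d for d in docs if d.get(field) is not None),
                     key=lambda d: d[field])
    missing = [d for d in docs if d.get(field) is None]
    return present + missing if missing_last else missing + present

docs = [{"name": "Khanom"},                          # no latlng value
        {"name": "Amphoe", "latlng": "1.0,1.0"}]
solr_sort(docs, "latlng")   # the doc without latlng sorts last
```

Without either flag, the placement of docs missing the field is left to the index internals, which is consistent with the "illogical" ordering reported above for optional fields.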


Re: How to space between spatial search results? (Declustering)

2012-07-04 Thread David Smiley (@MITRE.org)
Hi mcb

You're looking for spatial clustering.  I answered this question yesterday
on Stack Overflow:
http://stackoverflow.com/a/11321723/92186

~ David Smiley

-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-space-between-spatial-search-results-Declustering-tp3992668p3993106.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr facet multiple constraint

2012-07-04 Thread davidbougearel
Please, can someone help me? 

We are a team waiting for a fix.
We have tried several ways to implement it without success.

Thanks for reading anyway, David.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-facet-multiple-constraint-tp3992974p3993119.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Use of Solr as primary store for search engine

2012-07-04 Thread Shawn Heisey

On 7/4/2012 1:54 AM, Amit Nithian wrote:

I am curious to know how people are using Solr in conjunction with
other data stores when building search engines to power web sites (say
an ecommerce site). The question I have for the group is given an
architecture where the primary (transactional) data store is MySQL
(Oracle, PostGres whatever) with periodic indexing into Solr, when
your front end issues a search query to Solr and returns results, are
there any joins with your primary Oracle/MySQL etc to help render
results?


We used to pull almost everything from our previous search engine. 
Shortly after we switched to Solr, we began deploying a new version of 
our website which pulls more from the original data source.  The current 
goal is to only store just enough data in Solr to render a search result 
grid (pulling thumbnails from the filesystem), but go to the database and 
the filesystem for detail pages.  We'd like to reduce the index size to 
the point where the whole thing will fit in RAM, which we hope will also 
reduce the amount of time required for a full reindex.


What I hope to gain out of upgrading to Solr 4: Use the NRT features so 
that we can index item popularity and purchase data fast enough to make 
it actually useful.


Thanks,
Shawn



Re: How to improve this solr query?

2012-07-04 Thread Chamnap Chhorn
Hi Amit,

Thanks for your response.
1. It's just that sometimes I see Solr doesn't sort by score desc, so I made
it explicit. I will have to check that again.
2. q1 and q2 are both doing the search, just on different fields. String
fields mean the value must match exactly, and Solr needs the q parameter to
be quoted. I did a nested query with the OR operator.

I'll check out the bf, pf, bq parameter more.

Thanks for the advice. :)
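For reference, the nested-query request described above can be assembled like this. The field lists (name, description, keyphrase) are placeholders, not the real config - substitute the actual qf and fq values.

```python
from urllib.parse import urlencode

params = {
    # Two nested dismax queries combined with OR, as in the original request.
    "q": '_query_:"{!dismax qf=$qf1 v=$q1}" OR _query_:"{!dismax qf=$qf2 v=$q2}"',
    "q1": "apartment",     # unquoted terms -> analyzed text fields
    "q2": '"apartment"',   # quoted phrase -> exact-match string fields
    "qf1": "name description",   # placeholder field list
    "qf2": "keyphrase",          # placeholder field list
    "rows": 20,
    "fl": "uuid",
    "wt": "json",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
```

Building the request from a dict like this also makes it easier to drop parameters one at a time (sort, facets, one of the nested queries) to find which clause dominates the 12-second response time.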

On Wed, Jul 4, 2012 at 2:28 PM, Amit Nithian  wrote:

> Couple questions:
> 1) Why are you explicitly telling solr to sort by score desc,
> shouldn't it do that for you? Could this be a source of performance
> problems since sorting requires the loading of the field caches?
> 2) Of the query parameters, q1 and q2, which one is actually doing
> "text" searching on your index? It looks like q1 is doing non-string
> related stuff, could this be better handled in either the bf or bq
> section of the edismax config? Looking at the sample though I don't
> understand how q1=apartment would hit non-string fields again (but see
> #3)
> 3) Are the "string" fields literally of string type (i.e. no analysis
> on the field) or are you saying string loosely to mean "text" field.
> pf ==> phrase fields ==> given a multiple word query, will ensure that
> the specified phrase exists in the specified fields separated by some
> slop ("hello my world" may match "hello world" depending on this slop
> value). The "qf" means that given a multi term query, each term exists
> in the specified fields (name, description whatever text fields you
> want).
>
> Best
> Amit
>
> On Mon, Jul 2, 2012 at 9:35 AM, Chamnap Chhorn 
> wrote:
> > Hi all,
> >
> > I'm using solr 3.5 with nested query on the 4 core cpu server + 17 Gb.
> The
> > problem is that my query is so slow; the average response time is 12 secs
> > against 13 millions documents.
> >
> > What I am doing is to send quoted string (q2) to string fields and
> > non-quoted string (q1) to other fields and combine the result together.
> >
> >
> facet=true&sort=score+desc&q2=*"apartment"*&facet.mincount=1&q1=*apartment*
> >
> &tie=0.1&q.alt=*:*&wt=json&version=2.2&rows=20&fl=uuid&facet.query=has_map:+true&facet.query=has_image:+true&facet.query=has_website:+true&start=0&q=
> > *
> >
> _query_:+"{!dismax+qf='.'+fq='..'+v=$q1}"+OR+_query_:+"{!dismax+qf='..'+fq='...'+v=$q2}"
> > *
> >
> &facet.field={!ex%3Ddt}sub_category_uuids&facet.field={!ex%3Ddt}location_uuid
> >
> > I have done solr optimize already, but it's still slow. Any idea how to
> > improve the speed? Am I done anything wrong?
> >
> > --
> > Chhorn Chamnap
> > http://chamnap.github.com/
>



-- 
Chhorn Chamnap
http://chamnap.github.com/


Search for abc AND *foo* return all docs for abc which do not have foo why?

2012-07-04 Thread Alok Bhandari
Hello,

If I search for abc AND *foo*, it returns all docs for abc which do not have
foo - why? I suspect that if the * is present on both sides of a word, then
that word is ignored. Is that the correct interpretation? I am using Solr 3.6
and the field uses StandardTokenizer. 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-for-abc-AND-foo-return-all-docs-for-abc-which-do-not-have-foo-why-tp3993138.html
Sent from the Solr - User mailing list archive at Nabble.com.