Security/authentication strategies

2010-04-29 Thread Andrew McCombe
Hi

I'm planning on adding some protection to our solr servers and would
like to know what others are doing in this area.

Basically I have a few solr cores running under tomcat6 and all use DIH
to populate the solr index.  This is all behind a firewall and only
accessible from certain IP addresses.  Access to Solr Admin is open to
anyone in the company and many use it for checking data is in the
index and simple analysis.  However, they can also trigger a
full-import if they are careless (one of the cores takes 6 hours to
ingest the data).

What would be the recommended way of protecting things like the DIH
functionality? HTTP Authentication via tomcat realms or are there any
other solutions?

Thanks
Andrew McCombe
iWeb Solutions


Slow Date-Range Queries

2010-04-29 Thread Jan Simon Winkelmann
Hi,

I am currently having serious performance problems with date range queries. 
What I am doing, is validating a datasets published status by a valid_from and 
a valid_till date field.

I did get a performance boost of ~ 100% by switching from a normal
solr.DateField to a solr.TrieDateField with precisionStep="8"; however, my query
still takes about 1.3 seconds.

My field definition looks like this:

<fieldType name="..." class="solr.TrieDateField" precisionStep="8" sortMissingLast="true" omitNorms="true"/>

<field name="valid_from" type="..." stored="false" required="false" />
<field name="valid_till" type="..." stored="false" required="false" />
And the query looks like this:
((valid_from:[* TO 2010-04-29T10:34:12Z]) AND (valid_till:[2010-04-29T10:34:12Z
TO *])) OR ((*:* -valid_from:[* TO *]) AND (*:* -valid_till:[* TO *]))

I use the empty checks for datasets which do not have a valid from/till range.


Is there any way to get this any faster? Would it be faster using 
unix-timestamps with int fields?

I would appreciate any insight and help on this.

regards,
Jan-Simon



Re: How are (multiple) filter queries processed?

2010-04-29 Thread Alexander Valet
Hi, thanks for your help, I figured it out myself, I guess.
All parts of an fq are always intersected, so it has no effect to put
a boolean operator inside an fq like in 

fq=+tags:(Gucci) OR -tags:(watch sunglasses)

(would be a mildly strange query anyway)

The order in which the intersections are made follows their appearance
in the query I suppose.

best regards,
Alex
 

On Tue, 2010-04-27 at 12:09 -0700, Chris Hostetter wrote:
> : i was wondering how the following query might be processed:
> : 
> : ?q=*:*&fq=+tags:(Gucci)&fq=-tags:(watch sunglasses)
> 
> they are intersected so only documents matching all of them are potential 
> matches.
> 
> : and if there is a difference to a query with only one fq parameter like
> : 
> : ?q=*:*&fq=+tags:(Gucci) -tags:(watch sunglasses)
> : 
> : I am aware of the caching implications but i am not sure how the set
> : intersections work between the results of the 'q' and one or more 'fq'
> : parameters and if it is possible to use boolean operators inside a 
> : filter query.
> 
> filter queries can use a QParser, so you can use boolean operators if the 
> QParser supports it (by default the QParser is "lucene", so "yes") ... 
> 
> i don't understand the "i am not sure how the set intersections work between 
> the results of the 'q' and one or more 'fq'" part of your question, can 
> you clarify what it is you are asking?
> 
> 
> -Hoss
> 






require synonym filter on string field

2010-04-29 Thread Ranveer Kumar
Hi,

I need to configure synonyms for exact-match searching.
The field I need to search is of type string. I tried configuring it as text,
but with text, due to the whitespace tokenizer, an exact match is not found.
My requirement is:
suppose a user searches for "solr user"; only if the exact phrase "solr user"
(or an equivalent synonym) is available should a result be returned.
My fieldType is "string" and I want to configure synonyms on the string field.

Or:
is there any other way to index the string without tokenizing it (as-is) and
configure synonyms for that field?

please help..


Solr date range problem - specific date problem

2010-04-29 Thread Hamid Vahedi
I indexed some data, including a date field, in Solr,
but when I search for a specific date I get some records (not all records),
including some records from the next day. For example:
http://localhost:8080/solr/select/?q=pubdate:[2010-03-25T00:00:00Z TO 2010-03-25T23:59:59Z]&start=0&rows=10&indent=on&sort=pubdate desc

I have 625000 records in 2010-03-25, but the above query returns
325412, including 14 records from 2010-03-26.
I also tried the query below, but did not get the right result:
http://localhost:8080/solr/select/?q=pubdate:"2010-03-25T00:00:00Z"&start=0&rows=10&indent=on&sort=pubdate desc

How do I get the right result for a specific date?

Could you please help me?

Thanks in advance
Hamid


  

Re: CDATA For All Fields?

2010-04-29 Thread Erik Hatcher

yes, that's totally fine.

On Apr 28, 2010, at 7:14 PM, Thomas Nguyen wrote:


Is there anything wrong with wrapping the text content of all fields
with CDATA, whether they are analyzed, not analyzed, indexed, not indexed,
and so on?  I have a script that creates update XML documents and it's
just simple to wrap all text content in all fields with CDATA.  From my
brief tests it does not affect the search results at all.
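
For reference, a CDATA-wrapped field in a Solr update document looks like this
(the field name and value here are made up for illustration):

  <add>
    <doc>
      <field name="title"><![CDATA[AT&T "Live" <2010>]]></field>
    </doc>
  </add>

CDATA only changes how the XML is parsed; the value Solr receives is identical
to a normally-escaped one, which is why the search results are unaffected.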





Re: No highlighting results with dismax?

2010-04-29 Thread Markus.Rietzler
we use dismax and highlighting works fine.
the only thing we had to add to the query-url was

&hl.fl=FIELD1,FIELD2

so we had to specify which fields should be used for highlighting. 

> -Original Message-
> From: fabritw [mailto:fabr...@gmail.com] 
> Sent: Wednesday, 28 April 2010 16:08
> To: solr-user@lucene.apache.org
> Subject: No highlighting results with dismax?
> 
> 
> Hi,
> 
> Can highlights be returned when using the dismax request handler? 
> 
> I read in the below post that I can use a workaround with "qf"?
> http://lucene.472066.n3.nabble.com/bug-No-highlighting-results
> -with-dismax-and-q-alt-td498132.html
> 
> Any advise is greatly appreciated.
> 
> Regards, Will
> 
> 
> 
> 
> 
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/No-highlighting-results-wit
> h-dismax-tp762570p762570.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


solr multi indexes and scoring

2010-04-29 Thread khirb7

Hello everybody,

In our application we are dealing with music. In our index we are storing
music tracks (3 million documents). Each track document has a popularity
field, which contains the number of times the track has been listened to.
The issue is that we are forced to re-index the whole 3 million documents
every day to update this field. This field is very important for us, because
we use it to modify scoring via the _val_ parameter to boost tracks with
higher popularity.
- Our tracks are stored in a database. Even when we use delta import via the
Solr DIH, it takes too much time, because the majority of the tracks'
popularity values are updated. Our index is growing every week, so re-indexing
everything is not workable at all.

- Another possible solution is to split the index into two indexes:
  1- the first one would contain the popularity and a key pointing to the
second index (FK --> PK).
  2- the second index would contain the stable fields on which we run the
search. But how can we modify the score using the popularity, which is in the
first index? Is that possible with Solr, or am I obliged to manage this in my
own code, triggering a search on the first index for each document returned
from the second index and recalculating the default score returned by that
second index?
So what is the best solution to deal with this?

Any suggestion is welcome, and thank you in advance.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-multi-indexes-and-scoring-tp764837p764837.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr multi indexes and scoring

2010-04-29 Thread Koji Sekiguchi

khirb7 wrote:

Hello every body,

In our application  we are dealing with music. In our index we are  storing
music tracks (3 million documents). We have popularity field which inside
the track document, this field  contains the number of  times  the track
have been listened.
The issue is that we are forced to ré-index the whole 3 millions documents
every day to update this field. this field is very important for us because
we use it to modify  scoring via the _val_ parameter to boost track who have
higher popularity.
  

Just using ExternalFileField may solve your problem?

http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html

Koji

--
http://www.rondhuit.com/en/
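
For reference, an ExternalFileField setup is a schema entry plus a plain text
file of key=value pairs kept alongside the index, so popularity can be
refreshed without re-indexing the documents. A rough sketch (field, type, and
key names are placeholders; check the javadoc above for the exact attributes
and file location):

  schema.xml:
    <fieldType name="popularityFile" class="solr.ExternalFileField"
               keyField="id" defVal="0" valType="float"/>
    <field name="popularity" type="popularityFile"/>

  external_popularity (one line per document, keyed by the keyField):
    track001=1542
    track002=87

The field can then be used in function queries (e.g. via _val_), though it
cannot be searched or returned like a normal stored field.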



Re: require synonym filter on string field

2010-04-29 Thread Koji Sekiguchi

Ranveer Kumar wrote:

Hi,

I require to configure synonym to exact match.
The field I need to search is string type. I tried to configure by the text
but in text, due to whitespace tokenizer exact match not found.
My requirement is :
suppose user search by "solr user" and exact "solr user" (or equivalant
synonym) are available then only return result..
my fieldType is "string" and I want to configure synonym on string field.

or
Is there any other way to index without tokenize (as it is) string and
configure synonym for that field?

please help..

  

Why don't you use KeywordTokenizer? And if you want to
treat text in synonyms.txt as string as well,  set tokenizerFactory
attribute to KeywordTokenizerFactory.

Koji

--
http://www.rondhuit.com/en/



Re: require synonym filter on string field

2010-04-29 Thread Ranveer

On 4/29/10 3:45 PM, Koji Sekiguchi wrote:

Ranveer Kumar wrote:

Hi,

I require to configure synonym to exact match.
The field I need to search is string type. I tried to configure by 
the text

but in text, due to whitespace tokenizer exact match not found.
My requirement is :
suppose user search by "solr user" and exact "solr user" (or equivalant
synonym) are available then only return result..
my fieldType is "string" and I want to configure synonym on string 
field.


or
Is there any other way to index without tokenize (as it is) string and
configure synonym for that field?

please help..


Why don't you use KeywordTokenizer? And if you want to
treat text in synonyms.txt as string as well,  set tokenizerFactory
attribute to KeywordTokenizerFactory.

Koji


Hi Koji,
thanks for the reply.
Where should I use the KeywordTokenizerFactory: on the string field or on the
text field?

I am wondering whether KeywordTokenizerFactory will work in a text field.
What I understood about KeywordTokenizerFactory is that it tokenizes
keywords: for example, 'solr user' will be tokenized to 'solr' and 'user',
because solr and user are keywords. My requirement is to index it as 'solr user'.




Re: Slow Date-Range Queries

2010-04-29 Thread Ahmet Arslan

> I am currently having serious performance problems with
> date range queries. What I am doing, is validating a
> datasets published status by a valid_from and a valid_till
> date field.
> 
> I did get a performance boost of ~ 100% by switching from a
> normal solr.DateField to a solr.TrieDateField with
> precisionStep="8", however my query still takes about 1,3
> seconds.
> 
> My field definition looks like this:
> 
> <fieldType name="..." class="solr.TrieDateField" precisionStep="8"
> sortMissingLast="true" omitNorms="true"/>
> 
> <field name="valid_from" type="..." stored="false" required="false" />
> <field name="valid_till" type="..." stored="false" required="false" />
> 
> 
> And the query looks like this:
> ((valid_from:[* TO 2010-04-29T10:34:12Z]) AND
> (valid_till:[2010-04-29T10:34:12Z TO *])) OR ((*:*
> -valid_from:[* TO *]) AND (*:* -valid_till:[* TO *]))
> 
> I use the empty checks for datasets which do not have a
> valid from/till range.
> 
> 
> Is there any way to get this any faster?

I can suggest two things. 

1-) valid_till:[* TO *] and valid_from:[* TO *] type queries can be performance 
killers. You can create a new boolean field (populated via conditional copy or 
populated client-side) that holds whether valid_from exists or not, so that 
valid_till:[* TO *] can be rewritten as valid_till_bool:true.

2-) If you are embedding these queries in the q parameter, you can move the 
clauses into (filter query) fq parameters so that they are cached. 
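
A rough sketch of suggestion 1-) (the field name valid_till_bool and its type
are made up for illustration):

  schema.xml:
    <field name="valid_till_bool" type="boolean" indexed="true" stored="false"/>

At index time the client sets valid_till_bool=true whenever valid_till is
present, so the expensive clause

  (*:* -valid_till:[* TO *])

can be rewritten as the cheap term query

  valid_till_bool:false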


  


Re: Slow Date-Range Queries

2010-04-29 Thread Jan Simon Winkelmann
> > ((valid_from:[* TO 2010-04-29T10:34:12Z]) AND
> > (valid_till:[2010-04-29T10:34:12Z TO *])) OR ((*:*
> > -valid_from:[* TO *]) AND (*:* -valid_till:[* TO *]))
> >
> > I use the empty checks for datasets which do not have a
> > valid from/till range.
> >
> >
> > Is there any way to get this any faster?
> 
> I can suggest you two things.
> 
> 1-) valid_till:[* TO *] and valid_from:[* TO *] type queries can be
> performance killer. You can create a new boolean field ( populated via
> conditional copy or populated client side) that holds the information
> whether valid_from exists or not. So that valid_till:[* TO *] can be
> rewritten as valid_till_bool:true.

That may be an idea; however, I checked what happens when I simply leave them
out. It does affect performance, but the query still takes somewhere around 1
second.
 
> 2-) If you are embedding these queries into q parameter, you can write
> your clauses into (filter query) fq parameters so that they are cached.

The problem here is, that the timestamp itself does change quite a bit and 
hence cannot be properly cached. It could be for a few seconds, but occasional 
response times of more than a second is still unacceptable for us. We need a 
solution that responds quickly ALL the time, not just most of the time.

Thanks for your ideas though :)

regards,
Jan-Simon



Re: require synonym filter on string field

2010-04-29 Thread Ahmet Arslan
> I am wondering that KeywordTokenizerFactory will work or
> not in textfield. Actually as I understood about the
> KeywordTokenizerFactory that : KeywordTokenizerFactory is
> tokenize the keyword.
>  for example : 'solr user' will tokenize to 'solr' and
> 'user' because solr and user are keyword.. My requirement is
> to index as 'solr user'
> 

you can use something like:

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true" tokenizerFactory="KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

Also: "KeywordTokenizer does no actual tokenizing, so the entire input string 
is preserved as a single token" [from example\solr\conf\schema.xml] 
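
To make this concrete, a synonyms.txt entry for whole phrases might look like
(contents invented for illustration):

  solr user, solr users, lucene user

With expand="true" a document or query containing any of these phrases matches
the others, and because tokenizerFactory="KeywordTokenizerFactory" is set on
the filter, each comma-separated entry in the file is kept as one token instead
of being split on whitespace.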


  


Re: Using QueryElevationComponent without specifying top results?

2010-04-29 Thread Oliver Beattie
Just wondering if anyone had any further thoughts on how I might do this?

On 26 April 2010 19:18, Oliver Beattie  wrote:

> Hi Grant,
>
> Thanks for getting back to me. Yes, indeed, #1 is exactly what I'm looking
> for. Results are already ranked by distance (among other things), but we
> need the ability to manually include a certain result in the set. They
> wouldn't usually match, because they fall outside the radius of the filter
> query we use. Most of the resulting score comes from function queries (we
> have a number of metrics that rank listings [price, feedback score, etc]),
> so the score from the text search doesn't have *that much* bearing on the
> outcome. So, yeah, basically, I'm looking for a way to include results that
> don't match, but have Solr calculate its score as it would if it did match
> the filter query. Sorry for being so unclear and rambling a bit, I'm
> struggling to articulate what we want in a clear manner!
>
> —Oliver
>
>
>
> On 26 April 2010 19:13, Grant Ingersoll  wrote:
>
>>
>> On Apr 26, 2010, at 7:53 AM, Oliver Beattie wrote:
>>
>> > Hi all,
>> >
>> > I'm currently writing an application that uses Solr, and we'd like to
>> use
>> > something like the QueryElevationComponent, without having to specify
>> which
>> > results appear top. For example, what we really need is a way to say
>> "for
>> > this search, include these results as part of the result set, and rank
>> them
>> > as you normally would". We're using a filter to specify which results we
>> > want included (which is distance-based), but we really want to be able
>> to
>> > explicitly include certain results in certain queries (i.e. we want to
>> > include a listing more than 5 miles away from a particular location for
>> > certain queries).
>> >
>> > Is this possible? Any help would be really appreciated :)
>>
>>
>> I'm not following the "rank them as you normally would" part.  If Solr
>> were already finding them, then they would already be ranked and showing up
>> in the results and you wouldn't need to "hardcode" them, right?  So, that
>> leaves a couple of cases:
>>
>> 1. Including results that don't match
>> 2. Elevating results that do match
>>
>> In your case, it sounds like you mostly just want #1.  And, based on the
>> context (distance search) perhaps you want those results sorted by distance?
>>  Otherwise, how else would you know where to inject the results?
>>
>> The QueryElevationComponent can include the results, although, I must
>> admit, I'm not 100% certain on what happens to injected results given
>> sorting.
>>
>>
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem using Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>
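
For anyone finding this thread later: the QueryElevationComponent is driven by
an elevate.xml file that pins specific documents to specific query strings,
roughly like this (query text and ids invented for illustration):

  <elevate>
    <query text="apartment london">
      <doc id="listing-4711" />
    </query>
  </elevate>

That forces inclusion per exact query text, which is why it covers case 1
(including non-matching results) but says nothing about giving those documents
a naturally computed score.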


Re: require synonym filter on string field

2010-04-29 Thread Koji Sekiguchi



Hi Koji,
thanks for reply.
where should I use the KeywordTokenizerFactory in string or in text 
field.


I am wondering that KeywordTokenizerFactory will work or not in 
textfield. Actually as I understood about the KeywordTokenizerFactory 
that : KeywordTokenizerFactory is tokenize the keyword.
 for example : 'solr user' will tokenize to 'solr' and 'user' because 
solr and user are keyword.. My requirement is to index as 'solr user'




KeywordTokenizer emits the entire input as a single token.
Apply KeywordTokenizerFactory to TextField and try to
see how "solr user" is tokenized via analysis.jsp (Launch admin GUI > ANALYSIS).

Koji

--
http://www.rondhuit.com/en/



Re: Security/authentication strategies

2010-04-29 Thread Peter Sturge
Hi Andrew,

Today, authentication is handled by the container (e.g. Tomcat, Jetty etc.).


There's a thread I found to be very useful on this topic here:

http://www.lucidimagination.com/search/document/d1e338dc452db2e4/how_can_i_protect_the_solr_cores

This was for Jetty, but the idea is pretty much the same for Tomcat.

HTH

Peter
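
To make the container-auth route concrete, a minimal BASIC-auth sketch for
Tomcat (role, user, and paths are placeholders; with multiple cores each
handler path needs its own url-pattern):

  web.xml:
    <security-constraint>
      <web-resource-collection>
        <web-resource-name>DIH</web-resource-name>
        <url-pattern>/core1/dataimport</url-pattern>
      </web-resource-collection>
      <auth-constraint>
        <role-name>solr-admin</role-name>
      </auth-constraint>
    </security-constraint>
    <login-config>
      <auth-method>BASIC</auth-method>
      <realm-name>Solr</realm-name>
    </login-config>

  tomcat-users.xml:
    <role rolename="solr-admin"/>
    <user username="indexer" password="changeme" roles="solr-admin"/>

This leaves read-only queries open (still limited by the IP firewall) while
requiring credentials for the import handler.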



On Thu, Apr 29, 2010 at 8:42 AM, Andrew McCombe  wrote:

> Hi
>
> I'm planning on adding some protection to our solr servers and would
> like to know what others are doing in this area.
>
> Basically I have a few solr cores running under tomcat6 and all use DIH
> to populate the solr index.  This is all behind a firewall and only
> accessible from certain IP addresses.  Access to Solr Admin is open to
> anyone in the company and many use it for checking data is in the
> index and simple analysis.  However, they can also trigger a
> full-import if they are careless (one of the cores takes 6 hours to
> ingest the data).
>
> What would be the recommended way of protecting things like the DIH
> functionality? HTTP Authentication via tomcat realms or are there any
> other solutions?
>
> Thanks
> Andrew McCombe
> iWeb Solutions
>


RE: Problem with DIH delta-import on JDBC

2010-04-29 Thread cbennett
Hi,

It looks like the deltaImportQuery needs to be changed: you are using
dataimporter.delta.id, which is not correct. You are selecting objectid in
the deltaQuery, so the deltaImportQuery should be using
dataimporter.delta.objectid.

So try this:

<entity query="select * from table"
    deltaImportQuery="select * from table where objectid='${dataimporter.delta.objectid}'"
    deltaQuery="select objectid from table where lastupdate > '${dataimporter.last_index_time}'">

Colin.

> -Original Message-
> From: safl [mailto:s...@salamin.net]
> Sent: Wednesday, April 28, 2010 3:05 PM
> To: solr-user@lucene.apache.org
> Subject: Problem with DIH delta-import on JDBC
> 
> 
> Hello,
> 
> I'm just new on the list.
> I searched a lot on the list, but I didn't find an answer to my
> question.
> 
> I'm using Solr 1.4 on Windows with an Oracle 10g database.
> I am able to do full-import without any problem, but I'm not able to
> get
> delta-import working.
> 
> I have the following in the data-config.xml:
> 
> ...
> <entity query="select * from table"
>     deltaImportQuery="select * from table where
> objectid='${dataimporter.delta.id}'"
>     deltaQuery="select objectid from table where lastupdate >
> '${dataimporter.last_index_time}'">
> 
> ...
> 
> I update some records in the table and the try to run a delta-import.
> I track the SQL queries on DB with P6Spy, and I always see a query like
> 
> select * from table where objectid=''
> 
> Of course, with such an SQL query, nothing is updated in my index.
> 
> It behave the same if I replace ${dataimporter.delta.id} by
> ${dataimporter.delta.objectid}.
> Can someone tell what is wrong with it?
> 
> Thanks a lot,
>  Florian
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Problem-with-DIH-delta-import-on-
> JDBC-tp763469p763469.html
> Sent from the Solr - User mailing list archive at Nabble.com.





Re: Problem in solr search

2010-04-29 Thread stockii

hey..

try the fq parameter !? 

...&fq=(title:A country:USA)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-in-solr-search-tp765028p765171.html
Sent from the Solr - User mailing list archive at Nabble.com.


JTeam Spatial Plugin

2010-04-29 Thread Jean-Sebastien Vachon
Hi All,

I am using JTeam's Spatial Plugin RC3 to perform spatial searches on my index 
and it works great. However, I can't seem to get it to return the computed 
distances.

My query component is run before the geoDistanceComponent and the distanceField 
is set to "distance"
Fields for lat/long are defined as well and the different tier fields are in 
the results. Increasing the radius causes the number of matches to increase, so I 
guess that my setup is working...

Here is sample query and its output (I removed some of the fields to keep it 
short):

/select?passkey=sample&q={!spatial%20lat=40.27%20long=-76.29%20radius=22%20calc=arc}title:engineer&wt=json&indent=on&fl=*,distance



{
 "responseHeader":{
  "status":0,
  "QTime":69,
  "params":{
"fl":"*,distance",
"indent":"on",
"q":"{!spatial lat=40.27 long=-76.29 radius=22 calc=arc}title:engineer",
"wt":"json"}},
 "response":{"numFound":223,"start":0,"docs":[
    {
     "title":"Electrical Engineer",
     "long":-76.3054962158203,
     "lat":40.037899017334,
     "_tier_9":-3.004,
     "_tier_10":-6.0008,
     "_tier_11":-12.0016,
     "_tier_12":-24.0031,
     "_tier_13":-47.0061,
     "_tier_14":-93.00122,
     "_tier_15":-186.00243,
     "_tier_16":-372.00485}]
 }}

This output suggests to me that everything is in place. Anyone knows how to 
fetch the computed distance? I tried adding the field 'distance' to my list of 
fields but it didn't work

Thanks


How to make documents low priority

2010-04-29 Thread Doddamani, Prakash
Hi,

 

I am using the boost factor as below:

   field1^20.0 field2^5 field3^2.5 field4^.5

where it searches first in field1, then field2, and so on.

 

Is there a way, where I can make some documents very low priority so
that they come at the end?

 

Scenario :

 



<doc>
  <field name="...">aaa</field>
  <field name="...">bbb</field>
  <field name="field5">1</field>
  <field name="...">2010-04-29T12:40:05.589Z</field>
</doc>



 

I want all the documents which have field5=1 come last and documents
which have field5=0 should come first while searching.

Any advice is greatly appreciated.

 

Thanks

Prakash



synonym filter problem for string or phrase

2010-04-29 Thread Ranveer

Hi,

I am trying to configure a synonym filter.
My requirement is:
when a user searches by a phrase like "what is solr user?", it should
be replaced with "solr user".

something like: what is solr user? => solr user

My schema for the particular field is:

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true" tokenizerFactory="KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

It seems to work fine in analysis.jsp, but not via the URL:
http://localhost:8080/solr/core0/select?q="what is solr user?"
or
http://localhost:8080/solr/core0/select?q=what is solr user?

Please guide me to achieve the desired result.
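
One thing worth ruling out (an assumption, since the query parser splits on
whitespace before the analyzer ever sees the text): the phrase needs to be
quoted and URL-encoded to reach the field analyzer as one string, e.g.

  http://localhost:8080/solr/core0/select?q=myfield:%22what%20is%20solr%20user%3F%22

where myfield stands for the actual field name; that sends the phrase through
as a single string the way analysis.jsp does.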



RE: Solr date range problem - specific date problem

2010-04-29 Thread Ankit Bhatnagar

You should do this:

http://localhost:8080/solr/select/?q=*:*&fq=pubdate:[2010-03-25T00:00:00Z%20TO%202010-03-25T23:59:59Z]


Ankit

-Original Message-
From: Hamid Vahedi [mailto:hvb...@yahoo.com] 
Sent: Thursday, April 29, 2010 5:33 AM
To: solr-user@lucene.apache.org
Subject: Solr date range problem - specific date problem

I indexed some data, including a date field, in Solr,
but when I search for a specific date I get some records (not all records),
including some records from the next day. For example:
http://localhost:8080/solr/select/?q=pubdate:[2010-03-25T00:00:00Z TO 2010-03-25T23:59:59Z]&start=0&rows=10&indent=on&sort=pubdate desc

I have 625000 records in 2010-03-25, but the above query returns
325412, including 14 records from 2010-03-26.
I also tried the query below, but did not get the right result:
http://localhost:8080/solr/select/?q=pubdate:"2010-03-25T00:00:00Z"&start=0&rows=10&indent=on&sort=pubdate desc

How do I get the right result for a specific date?

Could you please help me?

Thanks in advance
Hamid


  


RE: Problem with DIH delta-import on JDBC

2010-04-29 Thread safl

Hi,

I did a debugger session and found that the column names are case-sensitive
(at least with Oracle).
The column names are retrieved from the JDBC metadata, and I found that my
objectid is in fact OBJECTID.

So now I'm able to do an update with the following config (pay attention to
the OBJECTID):

<entity query="select * from table"
    deltaImportQuery="select * from table where objectid='${dataimporter.delta.OBJECTID}'"
    deltaQuery="select objectid from table where lastupdate > '${dataimporter.last_index_time}'">

Is there a way to be "case insensitive"?

Anyway, it works now and that's the most important thing!
:-)

Thanks to all,
 Florian



cbennett wrote:
> 
> Hi,
> 
> It looks like the deltaImportQuery needs to be changed you are using
> dataimporter.delta.id which is not correct, you are selecting objected in
> the deltaQuery, so the deltaImportQuery should be using
> dataimporter.delta.objectid
> 
> So try this:
> 
>  query="select * from table"
> deltaImportQuery="select * from table where
> objectid='${dataimporter.delta.objectid}'"
> deltaQuery="select objectid from table where lastupdate >
> '${dataimporter.last_index_time}'">
> 
> 
> Colin.
> 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-DIH-delta-import-on-JDBC-tp763469p765262.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to make documents low priority

2010-04-29 Thread Jon Baer
Does a "sort=field5+desc" on the query param not work?

- Jon

On Apr 29, 2010, at 9:32 AM, Doddamani, Prakash wrote:

> Hi,
> 
> 
> 
> I am using the boost factor as below
> 
> 
> 
>   field1^20.0 field2^5 field3^2.5 field4^.5 
> 
> 
> 
> 
> 
> Where it searches first in field1 then field1 and so on
> 
> 
> 
> Is there a way, where I can make some documents very low priority so
> that they come at the end?
> 
> 
> 
> Scenario :
> 
> 
> 
> 
> 
> aaa
> 
> bbb
> 
> 
> 
> 
> 
> 1
> 
> 
> 
> 2010-04-29T12:40:05.589Z
> 
> 
> 
> 
> 
> I want all the documents which have field5=1 come last and documents
> which have field5=0 should come first while searching.
> 
> Any advise is greatly appreciated.
> 
> 
> 
> Thanks
> 
> Prakash
> 



RE: Slow Date-Range Queries

2010-04-29 Thread Nagelberg, Kallin
You might want to look at DateMath, 
http://lucene.apache.org/solr/api/org/apache/solr/util/DateMathParser.html. I 
believe the default precision is to the millisecond, so if you can afford to round 
to the nearest second or even minute you might see some performance gains.

-Kallin Nagelberg
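
Applied to the query in question, rounding with DateMath would look something
like

  (valid_from:[* TO NOW/MINUTE] AND valid_till:[NOW/MINUTE TO *])

Since NOW/MINUTE evaluates to the same instant for a whole minute, the
identical filter string recurs across requests and can actually be cached,
unlike a fresh millisecond timestamp generated per request.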

-Original Message-
From: Jan Simon Winkelmann [mailto:winkelm...@newsfactory.de] 
Sent: Thursday, April 29, 2010 4:36 AM
To: solr-user@lucene.apache.org
Subject: Slow Date-Range Queries

Hi,

I am currently having serious performance problems with date range queries. 
What I am doing, is validating a datasets published status by a valid_from and 
a valid_till date field.

I did get a performance boost of ~ 100% by switching from a normal 
solr.DateField to a solr.TrieDateField with precisionStep="8"; however, my query 
still takes about 1.3 seconds.

My field definition looks like this:

<fieldType name="..." class="solr.TrieDateField" precisionStep="8" sortMissingLast="true" omitNorms="true"/>

<field name="valid_from" type="..." stored="false" required="false" />
<field name="valid_till" type="..." stored="false" required="false" />

And the query looks like this:
((valid_from:[* TO 2010-04-29T10:34:12Z]) AND (valid_till:[2010-04-29T10:34:12Z 
TO *])) OR ((*:* -valid_from:[* TO *]) AND (*:* -valid_till:[* TO *]))

I use the empty checks for datasets which do not have a valid from/till range.


Is there any way to get this any faster? Would it be faster using 
unix-timestamps with int fields?

I would appreciate any insight and help on this.

regards,
Jan-Simon



Relevancy Practices

2010-04-29 Thread Grant Ingersoll
I'm putting on a talk at Lucene Eurocon 
(http://lucene-eurocon.org/sessions-track1-day2.html#1) on "Practical 
Relevance" and I'm curious as to what people put in practice for testing and 
improving relevance.  I have my own inclinations, but I don't want to muddy the 
water just yet.  So, if you have a few moments, I'd love to hear responses to 
the following questions.

What worked?  
What didn't work?  
What didn't you understand about it?  
What tools did you use?  
What tools did you wish you had either for debugging relevance or "fixing" it?
How much time did you spend on it?
How did you avoid over/under tuning?
What stage of development/testing/production did you decide to do relevance 
tuning?  Was that timing planned or not?



Thanks,
Grant

Re: Problem with DIH delta-import on JDBC

2010-04-29 Thread Jon Baer
All that stuff happens in the JDBC driver associated w/ the DataSource so 
probably not unless there is something which can be set in the Oracle driver 
itself.

One thing that might have helped in this case might have been if 
readFieldNames() in the JDBCDataSource dumped its return to debug log for you.  
That might be something that can be JIRA(ed).

- Jon

On Apr 29, 2010, at 9:45 AM, safl wrote:

> 
> Hi,
> 
> I did a debugger session and found that the column names are case sensitive
> (at least with Oracle).
> The column names are retreived from the JDBC metadatas and I found that my
> objectid is in fact OBJECTID.
> 
> So now, I'm able to do an update with the following config (pay attention to
> the OBJECTID):
> 
> <entity query="select * from table"
>     deltaImportQuery="select * from table where
> objectid='${dataimporter.delta.OBJECTID}'"
>     deltaQuery="select objectid from table where lastupdate >
> '${dataimporter.last_index_time}'">
> 
> 
> 
> Is there a way to be "case insensitive" ?
> 
> Anyway, it works now and that's the most important thing!
> :-)
> 
> Thanks to all,
> Florian
> 
> 
> 
> cbennett wrote:
>> 
>> Hi,
>> 
>> It looks like the deltaImportQuery needs to be changed you are using
>> dataimporter.delta.id which is not correct, you are selecting objected in
>> the deltaQuery, so the deltaImportQuery should be using
>> dataimporter.delta.objectid
>> 
>> So try this:
>> 
>> <entity query="select * from table"
>>     deltaImportQuery="select * from table where
>> objectid='${dataimporter.delta.objectid}'"
>>     deltaQuery="select objectid from table where lastupdate >
>> '${dataimporter.last_index_time}'">
>> 
>> 
>> Colin.
>> 
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Problem-with-DIH-delta-import-on-JDBC-tp763469p765262.html
> Sent from the Solr - User mailing list archive at Nabble.com.
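
[Editor's note] The XML of the working delta-import entity was stripped by the archive. A reconstruction with placeholder entity and table names (only OBJECTID/objectid and the queries are taken from the thread) would look roughly like:

```xml
<entity name="item" pk="OBJECTID"
        query="select * from table"
        deltaImportQuery="select * from table where
                          objectid='${dataimporter.delta.OBJECTID}'"
        deltaQuery="select objectid from table where lastupdate >
                    '${dataimporter.last_index_time}'"/>
```

Note the uppercase OBJECTID in deltaImportQuery, matching the column name Oracle reports through the JDBC metadata, as discussed above.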



RE: How to make documents low priority

2010-04-29 Thread Doddamani, Prakash
Thanks Jon,

That's a very nice idea, I hadn't thought about it. But I am already using a
sort on one field,
"sort=field1+desc"

Can I have order for 2 fields something like
"sort=field1+desc&field5+desc"

Or is there something else I should do.

Thanks
Prakash

-Original Message-
From: Jon Baer [mailto:jonb...@gmail.com] 
Sent: Thursday, April 29, 2010 7:39 PM
To: solr-user@lucene.apache.org
Subject: Re: How to make documents low priority

Does a "sort=field5+desc" on the query param not work?

- Jon

On Apr 29, 2010, at 9:32 AM, Doddamani, Prakash wrote:

> Hi,
> 
> 
> 
> I am using the boost factor as below
> 
> 
> 
>   field1^20.0 field2^5 field3^2.5 field4^.5
> 
> 
> 
> 
> 
> Where it searches first in field1, then field2, and so on
> 
> 
> 
> Is there a way I can make some documents very low priority so 
> that they come at the end?
> 
> 
> 
> Scenario :
> 
> 
> 
> 
> 
> aaa
> 
> bbb
> 
> 
> 
> 
> 
> 1
> 
> 
> 
> 2010-04-29T12:40:05.589Z
> 
> 
> 
> 
> 
> I want all the documents which have field5=1 come last and documents 
> which have field5=0 should come first while searching.
> 
> Any advise is greatly appreciated.
> 
> 
> 
> Thanks
> 
> Prakash
> 



Re: How to make documents low priority

2010-04-29 Thread Koji Sekiguchi

Doddamani, Prakash wrote:

Thanks Jon,

That's a very nice idea, I hadn't thought about it. But I am already using a
sort on one field,
"sort=field1+desc"

Can I have order for 2 fields something like
"sort=field1+desc&field5+desc"

  

Yes, you can:

sort=field1+desc,field5+desc

http://wiki.apache.org/solr/CommonQueryParameters#sort
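
[Editor's note] Tying this back to the scenario earlier in the thread (field names as given there): sorting ascending on field5 puts the 0-valued documents first and the 1-valued ones last, with field1 breaking ties:

```
sort=field5+asc,field1+desc
```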

Koji

--
http://www.rondhuit.com/en/



Re: synonym filter problem for string or phrase

2010-04-29 Thread Marco Martinez
Hi Ranveer,

If you don't specify a field in the q parameter, the search will be
done against your default search field defined in the schema.xml.
Is your default field a text_sync field?

Regards,

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2010/4/29 Ranveer 

> Hi,
>
> I am trying to configure synonym filter.
> my requirement is:
> when a user searches a phrase like "what is solr user?" it should be
> replaced with "solr user".
> something like : what is solr user? => solr user
>
> My schema for particular field is:
>
>  positionIncrementGap="100">
> 
> 
> 
>
> 
> 
> 
> 
> 
>  ignoreCase="true" expand="true" tokenizerFactory="KeywordTokenizerFactory"/>
> 
> 
>
> it seems to work fine when tested via analysis.jsp, but not via the URL
> http://localhost:8080/solr/core0/select?q="what is solr user?"
> or
> http://localhost:8080/solr/core0/select?q=what is solr user?
>
> Please guide me on how to achieve the desired result.
>
>
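
[Editor's note] The schema snippet in the message above lost its XML tags in the archive. Judging from the attribute fragments that did survive (positionIncrementGap="100" and a SynonymFilterFactory with ignoreCase="true" expand="true" tokenizerFactory="KeywordTokenizerFactory"), the field type was presumably something along these lines; the tokenizer and lowercase filter below are guesses, not recovered text:

```xml
<fieldType name="text_sync" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- the SynonymFilterFactory attributes below are the ones visible in the archived message -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"
            tokenizerFactory="KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```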


Re: Using NoOpMergePolicy (Lucene 2331) from Solr

2010-04-29 Thread Koji Sekiguchi

Jason Rutherglen wrote:

Tom,

Interesting, can you post your findings after you've found them? :)

Jason

On Tue, Apr 27, 2010 at 2:33 PM, Burton-West, Tom  wrote:
  

Is it possible to use the NoOpMergePolicy ( 
https://issues.apache.org/jira/browse/LUCENE-2331   ) from Solr?

We have very large indexes and always optimize, so we are thinking about using 
a very large ramBufferSizeMB
and a NoOpMergePolicy and then running an optimize to avoid extra disk reads 
and writes.

Tom Burton-West


I've never tried it but NoMergePolicy and NoMergeScheduler
can be specified in solrconfig.xml:

<ramBufferSizeMB>1000</ramBufferSizeMB>
<mergePolicy>org.apache.lucene.index.NoMergePolicy</mergePolicy>
<mergeScheduler>org.apache.lucene.index.NoMergeScheduler</mergeScheduler>

Koji

--
http://www.rondhuit.com/en/



Re: synonym filter problem for string or phrase

2010-04-29 Thread Ranveer

On 4/29/10 8:50 PM, Marco Martinez wrote:

Hi Ranveer,

If you don't specify a field in the q parameter, the search will be
done against your default search field defined in the schema.xml.
Is your default field a text_sync field?

Regards,

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2010/4/29 Ranveer

   

Hi,

I am trying to configure synonym filter.
my requirement is:
when a user searches a phrase like "what is solr user?" it should be
replaced with "solr user".
something like : what is solr user? =>  solr user

My schema for particular field is:















it seems to work fine when tested via analysis.jsp, but not via the URL
http://localhost:8080/solr/core0/select?q="what is solr user?"
or
http://localhost:8080/solr/core0/select?q=what is solr user?

Please guide me on how to achieve the desired result.


 
   

Hi Marco,
thanks.
Yes, my default search field is text_sync.
I am getting results now, but not what I expect.
following is my synonym.txt

what is bone cancer=>bone cancer
what is bone cancer?=>bone cancer
what is of bone cancer=>bone cancer
what is symptom of bone cancer=>bone cancer
what is symptoms of bone cancer=>bone cancer

With the above I am getting results for all synonyms except the last one, "what 
is symptoms of bone cancer=>bone cancer".
I think I am not getting the expected result due to stemming. However, when I 
check the result from analysis.jsp,
it gives the expected result. I am confused.
Also, I want to know the best approach to configuring synonyms for my requirement.

thanks
with regards


Re: Slow Date-Range Queries

2010-04-29 Thread Erick Erickson
Hmmm, what does the rest of your query look like? And does adding
&debugQuery=on show anything interesting?

Best
Erick

On Thu, Apr 29, 2010 at 6:54 AM, Jan Simon Winkelmann <
winkelm...@newsfactory.de> wrote:

> > > ((valid_from:[* TO 2010-04-29T10:34:12Z]) AND
> > > (valid_till:[2010-04-29T10:34:12Z TO *])) OR ((*:*
> > > -valid_from:[* TO *]) AND (*:* -valid_till:[* TO *])))
> > >
> > > I use the empty checks for datasets which do not have a
> > > valid from/till range.
> > >
> > >
> > > Is there any way to get this any faster?
> >
> > I can suggest you two things.
> >
> > 1-) valid_till:[* TO *] and valid_from:[* TO *] type queries can be
> > performance killers. You can create a new boolean field (populated via
> > conditional copy or populated client side) that holds the information
> > whether valid_from exists or not. So that valid_till:[* TO *] can be
> > rewritten as valid_till_bool:true.
>
> That may be an idea, however I checked what happens when I simply leave
> them out. It does affect the performance but the query is still somewhere
> around 1 second.
>
> > 2-) If you are embedding these queries into q parameter, you can write
> > your clauses into (filter query) fq parameters so that they are cached.
>
> The problem here is that the timestamp itself does change quite a bit and
> hence cannot be properly cached. It could be for a few seconds, but
> occasional response times of more than a second is still unacceptable for
> us. We need a solution that responds quickly ALL the time, not just most of
> the time.
>
> Thanks for your ideas though :)
>
> regards,
> Jan-Simon
>
>
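
[Editor's note] One option the thread doesn't spell out, offered as a suggestion rather than a tested fix: if boundaries accurate to the minute are acceptable, round the timestamp on the client before building the query. The clause text then stays identical for a whole minute, so moving the two range clauses into fq parameters lets the filter cache be reused across those requests. Using the thread's timestamp as an example:

```
fq=valid_from:[* TO 2010-04-29T10:34:00Z]
fq=valid_till:[2010-04-29T10:34:00Z TO *]
```

Every request within that minute then hits the same cached filters instead of re-evaluating the ranges.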


Solr configuration to enable indexing/searching webapp log files

2010-04-29 Thread Stefan Maric

I thought I remembered seeing some information about this, but have been
unable to find it.

Does anyone know if there is a configuration / module that would allow us to
setup Solr to take in the (large) log files generated by our web/app
servers, so that we can query for things like peak time requests or most
frequently requested web page etc

Thanks
Stefan Maric



Re: Solr Cloud & Gossip Protocols

2010-04-29 Thread Jon Baer
Thanks, I'm looking at the atomic broadcast messaging protocol of ZooKeeper and 
think I have found what I was looking for ...

- Jon

On Apr 28, 2010, at 11:27 PM, Yonik Seeley wrote:

> On Wed, Apr 28, 2010 at 2:23 PM, Jon Baer  wrote:
>> From what I understand Cassandra uses a generic gossip protocol for node 
>> discovery (custom), will the Solr-Cloud have something similar?
> 
> SolrCloud uses zookeeper, so node discovery is a simple matter of
> looking there.  Nodes are responsible for registering themselves in
> zookeeper.
> 
> -Yonik
> Apache Lucene Eurocon 2010
> 18-21 May 2010 | Prague



Re: Solr configuration to enable indexing/searching webapp log files

2010-04-29 Thread Jon Baer
Good question, +1 on finding answer, my take ...

Depending on how large the log files you are talking about are, it might be better 
to do this w/ HDFS / Hadoop (and a scripting language like Pig) (or Amazon EMR)

http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873

Theoretically you could split the logs to fields, use a dataimporter and search 
/ sort w/ something like LineEntityProcessor.

http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor
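
[Editor's note] As a sketch only (the log path, entity name, and target field below are placeholders, not from the thread), a minimal data-config for that approach might look like:

```xml
<dataConfig>
  <dataSource type="FileDataSource" name="fds"/>
  <document>
    <!-- LineEntityProcessor emits each line of the file as a "rawLine" field -->
    <entity name="logline" processor="LineEntityProcessor"
            url="/var/log/webapp/access.log"
            rootEntity="true" dataSource="fds">
      <field column="rawLine" name="line_t"/>
    </entity>
  </document>
</dataConfig>
```

A transformer (e.g. RegexTransformer) would still be needed to split each raw line into timestamp/URL/status fields before queries like "most requested page" become possible.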

I've tried to use Solr as a log analytics tool (before dataimporthandler) and 
it was neither worth the disk space nor practical, but I'd love to hear otherwise.  
In general you could flush daily logs to an index but working w/ the data in 
another context if you had to seems better fit for HDFS use (I think).

- Jon

On Apr 29, 2010, at 1:46 PM, Stefan Maric wrote:

> 
> I thought i remembered seeing some information about this, but have been
> unable to find it
> 
> Does anyone know if there is a configuration / module that would allow us to
> setup Solr to take in the (large) log files generated by our web/app
> servers, so that we can query for things like peak time requests or most
> frequently requested web page etc
> 
> Thanks
> Stefan Maric
> 



Evangelism

2010-04-29 Thread Daniel Baughman
Hi I'm new to the list here,

 

I'd like to steer someone in the direction of Solr, and I see the list of
companies using solr, but none have a "power by solr" logo or anything.

 

Does anyone have any great links with evidence to majorly successful solr
projects?

 

Thanks in advance,

 

Dan B.

 



Re: Evangelism

2010-04-29 Thread Peter Wolanin
A very abbreviated list of sites using Apache Solr + Drupal here:
http://drupal.org/node/447564

-Peter

On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman  wrote:
> Hi I'm new to the list here,
>
>
>
> I'd like to steer someone in the direction of Solr, and I see the list of
> companies using solr, but none have a "power by solr" logo or anything.
>
>
>
> Does anyone have any great links with evidence to majorly successful solr
> projects?
>
>
>
> Thanks in advance,
>
>
>
> Dan B.
>
>
>
>



-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


Re: Evangelism

2010-04-29 Thread Israel Ekpo
Checkout Lucid Imagination

http://www.lucidimagination.com/About-Search

This should convince you.

On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman wrote:

> Hi I'm new to the list here,
>
>
>
> I'd like to steer someone in the direction of Solr, and I see the list of
> companies using solr, but none have a "power by solr" logo or anything.
>
>
>
> Does anyone have any great links with evidence to majorly successful solr
> projects?
>
>
>
> Thanks in advance,
>
>
>
> Dan B.
>
>
>
>


-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Evangelism

2010-04-29 Thread Israel Ekpo
Their main search page has the "Powered by Solr" logo

http://www.lucidimagination.com/search/



On Thu, Apr 29, 2010 at 2:18 PM, Israel Ekpo  wrote:

> Checkout Lucid Imagination
>
> http://www.lucidimagination.com/About-Search
>
> This should convince you.
>
>
> On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman wrote:
>
>> Hi I'm new to the list here,
>>
>>
>>
>> I'd like to steer someone in the direction of Solr, and I see the list of
>> companies using solr, but none have a "power by solr" logo or anything.
>>
>>
>>
>> Does anyone have any great links with evidence to majorly successful solr
>> projects?
>>
>>
>>
>> Thanks in advance,
>>
>>
>>
>> Dan B.
>>
>>
>>
>>
>
>
> --
> "Good Enough" is not good enough.
> To give anything less than your best is to sacrifice the gift.
> Quality First. Measure Twice. Cut Once.
> http://www.israelekpo.com/
>



-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


RE: Evangelism

2010-04-29 Thread Nagelberg, Kallin
I had a very hard time selling Solr to business folks. Most are of the mind 
that if you're not paying for something it can't be any good. That might also 
be why they refrain from posting 'powered by solr' on their website, as if it 
might show them to be cheap. They are also fearful of lack of support should 
you get hit by a bus. This might be remedied by recommending professional 
services from a company such as lucid imagination. 

I think your best bet is to create a working demo with your data and show them 
the performance. 

Cheers,
-Kallin Nagelberg



-Original Message-
From: Israel Ekpo [mailto:israele...@gmail.com] 
Sent: Thursday, April 29, 2010 2:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Evangelism

Their main search page has the "Powered by Solr" logo

http://www.lucidimagination.com/search/



On Thu, Apr 29, 2010 at 2:18 PM, Israel Ekpo  wrote:

> Checkout Lucid Imagination
>
> http://www.lucidimagination.com/About-Search
>
> This should convince you.
>
>
> On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman wrote:
>
>> Hi I'm new to the list here,
>>
>>
>>
>> I'd like to steer someone in the direction of Solr, and I see the list of
>> companies using solr, but none have a "power by solr" logo or anything.
>>
>>
>>
>> Does anyone have any great links with evidence to majorly successful solr
>> projects?
>>
>>
>>
>> Thanks in advance,
>>
>>
>>
>> Dan B.
>>
>>
>>
>>
>
>
> --
> "Good Enough" is not good enough.
> To give anything less than your best is to sacrifice the gift.
> Quality First. Measure Twice. Cut Once.
> http://www.israelekpo.com/
>



-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Evangelism

2010-04-29 Thread Israel Ekpo
A lot of high performing websites use MySQL, Oracle and Microsoft SQL Server
for data storage and other RDBMS needs without necessarily putting the
"powered by" logo on the sites.

If you need the certified version of Apache Solr, you can contact Lucid
Imagination.

Just like MySQL, Apache Solr and Apache Lucene also have commercial backing
(from Lucid Imagination) if you choose to go that route.

On Thu, Apr 29, 2010 at 2:24 PM, Nagelberg, Kallin <
knagelb...@globeandmail.com> wrote:

> I had a very hard time selling Solr to business folks. Most are of the mind
> that if you're not paying for something it can't be any good. That might
> also be why they refrain from posting 'powered by solr' on their website, as
> if it might show them to be cheap. They are also fearful of lack of support
> should you get hit by a bus. This might be remedied by recommending
> professional services from a company such as lucid imagination.
>
> I think your best bet is to create a working demo with your data and show
> them the performance.
>
> Cheers,
> -Kallin Nagelberg
>
>
>
> -Original Message-
> From: Israel Ekpo [mailto:israele...@gmail.com]
> Sent: Thursday, April 29, 2010 2:19 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Evangelism
>
> Their main search page has the "Powered by Solr" logo
>
> http://www.lucidimagination.com/search/
>
>
>
> On Thu, Apr 29, 2010 at 2:18 PM, Israel Ekpo  wrote:
>
> > Checkout Lucid Imagination
> >
> > http://www.lucidimagination.com/About-Search
> >
> > This should convince you.
> >
> >
> > On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman  >wrote:
> >
> >> Hi I'm new to the list here,
> >>
> >>
> >>
> >> I'd like to steer someone in the direction of Solr, and I see the list
> of
> >> companies using solr, but none have a "power by solr" logo or anything.
> >>
> >>
> >>
> >> Does anyone have any great links with evidence to majorly successful
> solr
> >> projects?
> >>
> >>
> >>
> >> Thanks in advance,
> >>
> >>
> >>
> >> Dan B.
> >>
> >>
> >>
> >>
> >
> >
> > --
> > "Good Enough" is not good enough.
> > To give anything less than your best is to sacrifice the gift.
> > Quality First. Measure Twice. Cut Once.
> > http://www.israelekpo.com/
> >
>
>
>
> --
> "Good Enough" is not good enough.
> To give anything less than your best is to sacrifice the gift.
> Quality First. Measure Twice. Cut Once.
> http://www.israelekpo.com/
>



-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Evangelism

2010-04-29 Thread Erick Erickson
This is a Lucene story, but may well apply... By the time I'd sent a request
for assistance
to the vendor of one of our search tools and received the reply "you didn't
give us the
right license number", I'd found Lucene, indexed part of my corpus and run
successful
searches against it. And had answers provided to me from the users list.

Paying for support provides, I believe, a false sense of security. Once you
sign up,
you're at the mercy of the vendor for many things, among them:

1> releases are far apart
2> if the company gets purchased, all sorts of interesting things happen.
Witness Microsoft buying FAST recently, then announcing they were not
doing
any more development on *nix platforms.
3> If the company does go out of business, you are stuck with binary code
you can't
 compile/run/fix understand.
4> You are at the mercy of the next release for "really gotta have it now"
changes. Unless
 you're willing to pay...er...a considerable sum to get a special fix,
which may not
 even be an option.

That said, not all open source products are great, I just happen to think
that SOLR/Lucene
is. Add to that that problems that are found are often fixed in a day or
two, a record that
no commercial package I've ever used has matched.

Here's one technique you can use to sell it to management. Get a pilot up
and running in, oh,
say three days (ok, take a week). Try the same thing with commercial package
X. Do not,
under any circumstances, be satisfied with the powerpoint presentation from
a commercial
vendor . Require working code. Then evaluate ...

Best
Erick

On Thu, Apr 29, 2010 at 2:24 PM, Nagelberg, Kallin <
knagelb...@globeandmail.com> wrote:

> I had a very hard time selling Solr to business folks. Most are of the mind
> that if you're not paying for something it can't be any good. That might
> also be why they refrain from posting 'powered by solr' on their website, as
> if it might show them to be cheap. They are also fearful of lack of support
> should you get hit by a bus. This might be remedied by recommending
> professional services from a company such as lucid imagination.
>
> I think your best bet is to create a working demo with your data and show
> them the performance.
>
> Cheers,
> -Kallin Nagelberg
>
>
>
> -Original Message-
> From: Israel Ekpo [mailto:israele...@gmail.com]
> Sent: Thursday, April 29, 2010 2:19 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Evangelism
>
> Their main search page has the "Powered by Solr" logo
>
> http://www.lucidimagination.com/search/
>
>
>
> On Thu, Apr 29, 2010 at 2:18 PM, Israel Ekpo  wrote:
>
> > Checkout Lucid Imagination
> >
> > http://www.lucidimagination.com/About-Search
> >
> > This should convince you.
> >
> >
> > On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman  >wrote:
> >
> >> Hi I'm new to the list here,
> >>
> >>
> >>
> >> I'd like to steer someone in the direction of Solr, and I see the list
> of
> >> companies using solr, but none have a "power by solr" logo or anything.
> >>
> >>
> >>
> >> Does anyone have any great links with evidence to majorly successful
> solr
> >> projects?
> >>
> >>
> >>
> >> Thanks in advance,
> >>
> >>
> >>
> >> Dan B.
> >>
> >>
> >>
> >>
> >
> >
> > --
> > "Good Enough" is not good enough.
> > To give anything less than your best is to sacrifice the gift.
> > Quality First. Measure Twice. Cut Once.
> > http://www.israelekpo.com/
> >
>
>
>
> --
> "Good Enough" is not good enough.
> To give anything less than your best is to sacrifice the gift.
> Quality First. Measure Twice. Cut Once.
> http://www.israelekpo.com/
>


RE: Evangelism

2010-04-29 Thread Jason Chaffee
Netflix search is built with Solr.  That seems like a fairly big and
recognizable company.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, April 29, 2010 11:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Evangelism

This is a Lucene story, but may well apply... By the time I'd sent a
request
for assistance
to the vendor of one of our search tools and received the reply "you
didn't
give us the
right license number", I'd found Lucene, indexed part of my corpus and
run
successful
searches against it. And had answers provided to me from the users list.

Paying for support provides, I believe, a false sense of security. Once
you
sign up,
you're at the mercy of the vendor for many things, among them:

1> releases are far apart
2> if the company gets purchased, all sorts of interesting things
happen.
Witness Microsoft buying FAST recently, then announcing they were
not
doing
any more development on *nix platforms.
3> If the company does go out of business, you are stuck with binary
code
you can't
 compile/run/fix understand.
4> You are at the mercy of the next release for "really gotta have it
now"
changes. Unless
 you're willing to pay...er...a considerable sum to get a special
fix,
which may not
 even be an option.

That said, not all open source products are great, I just happen to
think
that SOLR/Lucene
is. Add to that that problems that are found are often fixed in a day or
two, a record that
no commercial package I've ever used has matched.

Here's one technique you can use to sell it to management. Get a pilot
up
and running in, oh,
say three days (ok, take a week). Try the same thing with commercial
package
X. Do not,
under any circumstances, be satisfied with the powerpoint presentation
from
a commercial
vendor . Require working code. Then evaluate ...

Best
Erick

On Thu, Apr 29, 2010 at 2:24 PM, Nagelberg, Kallin <
knagelb...@globeandmail.com> wrote:

> I had a very hard time selling Solr to business folks. Most are of the
mind
> that if you're not paying for something it can't be any good. That
might
> also be why they refrain from posting 'powered by solr' on their
website, as
> if it might show them to be cheap. They are also fearful of lack of
support
> should you get hit by a bus. This might be remedied by recommending
> professional services from a company such as lucid imagination.
>
> I think your best bet is to create a working demo with your data and
show
> them the performance.
>
> Cheers,
> -Kallin Nagelberg
>
>
>
> -Original Message-
> From: Israel Ekpo [mailto:israele...@gmail.com]
> Sent: Thursday, April 29, 2010 2:19 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Evangelism
>
> Their main search page has the "Powered by Solr" logo
>
> http://www.lucidimagination.com/search/
>
>
>
> On Thu, Apr 29, 2010 at 2:18 PM, Israel Ekpo 
wrote:
>
> > Checkout Lucid Imagination
> >
> > http://www.lucidimagination.com/About-Search
> >
> > This should convince you.
> >
> >
> > On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman
 >wrote:
> >
> >> Hi I'm new to the list here,
> >>
> >>
> >>
> >> I'd like to steer someone in the direction of Solr, and I see the
list
> of
> >> companies using solr, but none have a "power by solr" logo or
anything.
> >>
> >>
> >>
> >> Does anyone have any great links with evidence to majorly
successful
> solr
> >> projects?
> >>
> >>
> >>
> >> Thanks in advance,
> >>
> >>
> >>
> >> Dan B.
> >>
> >>
> >>
> >>
> >
> >
> > --
> > "Good Enough" is not good enough.
> > To give anything less than your best is to sacrifice the gift.
> > Quality First. Measure Twice. Cut Once.
> > http://www.israelekpo.com/
> >
>
>
>
> --
> "Good Enough" is not good enough.
> To give anything less than your best is to sacrifice the gift.
> Quality First. Measure Twice. Cut Once.
> http://www.israelekpo.com/
>


RE: Evangelism

2010-04-29 Thread Jason Chaffee
Forgot the link.

http://www.lucidimagination.com/Community/Marketplace/Application-Showcase-Wiki/Netflix


-Original Message-
From: Jason Chaffee [mailto:jchaf...@ebates.com] 
Sent: Thursday, April 29, 2010 11:52 AM
To: solr-user@lucene.apache.org
Subject: RE: Evangelism

Netflix search is built with Solr.  That seems like a fairly big and
recognizable company.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, April 29, 2010 11:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Evangelism

This is a Lucene story, but may well apply... By the time I'd sent a
request
for assistance
to the vendor of one of our search tools and received the reply "you
didn't
give us the
right license number", I'd found Lucene, indexed part of my corpus and
run
successful
searches against it. And had answers provided to me from the users list.

Paying for support provides, I believe, a false sense of security. Once
you
sign up,
you're at the mercy of the vendor for many things, among them:

1> releases are far apart
2> if the company gets purchased, all sorts of interesting things
happen.
Witness Microsoft buying FAST recently, then announcing they were
not
doing
any more development on *nix platforms.
3> If the company does go out of business, you are stuck with binary
code
you can't
 compile/run/fix understand.
4> You are at the mercy of the next release for "really gotta have it
now"
changes. Unless
 you're willing to pay...er...a considerable sum to get a special
fix,
which may not
 even be an option.

That said, not all open source products are great, I just happen to
think
that SOLR/Lucene
is. Add to that that problems that are found are often fixed in a day or
two, a record that
no commercial package I've ever used has matched.

Here's one technique you can use to sell it to management. Get a pilot
up
and running in, oh,
say three days (ok, take a week). Try the same thing with commercial
package
X. Do not,
under any circumstances, be satisfied with the powerpoint presentation
from
a commercial
vendor . Require working code. Then evaluate ...

Best
Erick

On Thu, Apr 29, 2010 at 2:24 PM, Nagelberg, Kallin <
knagelb...@globeandmail.com> wrote:

> I had a very hard time selling Solr to business folks. Most are of the
mind
> that if you're not paying for something it can't be any good. That
might
> also be why they refrain from posting 'powered by solr' on their
website, as
> if it might show them to be cheap. They are also fearful of lack of
support
> should you get hit by a bus. This might be remedied by recommending
> professional services from a company such as lucid imagination.
>
> I think your best bet is to create a working demo with your data and
show
> them the performance.
>
> Cheers,
> -Kallin Nagelberg
>
>
>
> -Original Message-
> From: Israel Ekpo [mailto:israele...@gmail.com]
> Sent: Thursday, April 29, 2010 2:19 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Evangelism
>
> Their main search page has the "Powered by Solr" logo
>
> http://www.lucidimagination.com/search/
>
>
>
> On Thu, Apr 29, 2010 at 2:18 PM, Israel Ekpo 
wrote:
>
> > Checkout Lucid Imagination
> >
> > http://www.lucidimagination.com/About-Search
> >
> > This should convince you.
> >
> >
> > On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman
 >wrote:
> >
> >> Hi I'm new to the list here,
> >>
> >>
> >>
> >> I'd like to steer someone in the direction of Solr, and I see the
list
> of
> >> companies using solr, but none have a "power by solr" logo or
anything.
> >>
> >>
> >>
> >> Does anyone have any great links with evidence to majorly
successful
> solr
> >> projects?
> >>
> >>
> >>
> >> Thanks in advance,
> >>
> >>
> >>
> >> Dan B.
> >>
> >>
> >>
> >>
> >
> >
> > --
> > "Good Enough" is not good enough.
> > To give anything less than your best is to sacrifice the gift.
> > Quality First. Measure Twice. Cut Once.
> > http://www.israelekpo.com/
> >
>
>
>
> --
> "Good Enough" is not good enough.
> To give anything less than your best is to sacrifice the gift.
> Quality First. Measure Twice. Cut Once.
> http://www.israelekpo.com/
>


Re: Solr configuration to enable indexing/searching webapp log files

2010-04-29 Thread Jon Baer
To follow up ... it seems dumping logs into Solr is common ...

http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data

- Jon

On Apr 29, 2010, at 1:58 PM, Jon Baer wrote:

> Good question, +1 on finding answer, my take ...
> 
> Depending on how large of log files you are talking about it might be better 
> off to do this w/ HDFS / Hadoop (and a script language like Pig) (or Amazon 
> EMR)
> 
> http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873
> 
> Theoretically you could split the logs to fields, use a dataimporter and 
> search / sort w/ something like LineEntityProcessor.
> 
> http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor
> 
> I've tried to use Solr as a log analytics tool (before dataimporthandler) and 
> it was not worth the disk space or practical but I'd love to hear otherwise.  
> In general you could flush daily logs to an index but working w/ the data in 
> another context if you had to seems better fit for HDFS use (I think).
> 
> - Jon
> 
> On Apr 29, 2010, at 1:46 PM, Stefan Maric wrote:
> 
>> 
>> I thought i remembered seeing some information about this, but have been
>> unable to find it
>> 
>> Does anyone know if there is a configuration / module that would allow us to
>> setup Solr to take in the (large) log files generated by our web/app
>> servers, so that we can query for things like peak time requests or most
>> frequently requested web page etc
>> 
>> Thanks
>> Stefan Maric
>> 
> 



Re: Evangelism

2010-04-29 Thread Grant Ingersoll
Hi Daniel,

There are lots of sites running Solr ranging from very large to very small.  
Because it is open source, people aren't required to report, but there are 
several places where people have reported:

http://wiki.apache.org/solr/PublicServers
http://www.lucidimagination.com/developer/Community/Application-Showcase-Wiki
You can also see a number of case studies at: 
http://www.lucidimagination.com/solutions/documents

From those lists, you'll see recognizable names like AT&T, StubHub, CNET, Digg, 
MTV/Viacom, The Motley Fool, Disney, Netflix, etc.

Hope that helps,
Grant


On Apr 29, 2010, at 2:10 PM, Daniel Baughman wrote:

> Hi I'm new to the list here,
> 
> 
> 
> I'd like to steer someone in the direction of Solr, and I see the list of
> companies using solr, but none have a "power by solr" logo or anything.
> 
> 
> 
> Does anyone have any great links with evidence to majorly successful solr
> projects?
> 
> 
> 
> Thanks in advance,
> 
> 
> 
> Dan B.
> 
> 
> 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



RE: Evangelism

2010-04-29 Thread Daniel Baughman
ColdFusion 9 is now shipping with it, as well.

Thanks everyone for the inputs.

-Original Message-
From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll
Sent: Thursday, April 29, 2010 1:35 PM
To: solr-user@lucene.apache.org
Subject: Re: Evangelism

Hi Daniel,

There are lots of sites running Solr ranging from very large to very small.
Because it is open source, people aren't required to report, but there are
several places where people have reported:

http://wiki.apache.org/solr/PublicServers
http://www.lucidimagination.com/developer/Community/Application-Showcase-Wiki
You can also see a number of case studies at:
http://www.lucidimagination.com/solutions/documents

From those lists, you'll see recognizable names like AT&T, StubHub, CNET,
Digg, MTV/Viacom, The Motley Fool, Disney, Netflix, etc.

Hope that helps,
Grant


On Apr 29, 2010, at 2:10 PM, Daniel Baughman wrote:

> Hi I'm new to the list here,
> 
> 
> 
> I'd like to steer someone in the direction of Solr, and I see the list of
> companies using Solr, but none have a "powered by Solr" logo or anything.
> 
> 
> 
> Does anyone have any great links with evidence to majorly successful solr
> projects?
> 
> 
> 
> Thanks in advance,
> 
> 
> 
> Dan B.
> 
> 
> 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene:
http://www.lucidimagination.com/search



Re: benefits of float vs. string

2010-04-29 Thread Lance Norskog
Floats are Trie types and are stored in a compressed format. They will
search faster. They will also sort with much less space.

One thing to point out is that doing bitwise comparison on floats is
to live in a state of sin. Your string representations must parse
exactly right.

On Wed, Apr 28, 2010 at 8:22 AM, Nagelberg, Kallin
 wrote:
> Hi,
>
> Does anyone have an idea about the performance benefits of searching across 
> floats compared to strings? I have one multi-valued field that contains about 
> 3000 distinct IDs across 5 million documents. I am going to be running a lot
> of queries like q=id:102 OR id:303 OR id:305, etc. Right now it is a String
> but I am going to switch to a float, as intuitively it ought to be easier to
> filter a number than a string. I'm just curious if this should in fact bring
> a benefit, and more generally what the benefits/penalties of using numerical
> over string field types are.
>
> Thanks,
> Kallin Nagelberg
>



-- 
Lance Norskog
goks...@gmail.com
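Lance's "bitwise comparison" caveat can be illustrated outside Solr. A quick sketch (Python, purely illustrative): textually different strings can parse to the identical float bit pattern, so a float field would index them as one term, while a string field keeps them distinct.

```python
import struct

def float_bits(s: str) -> int:
    """The raw 32-bit pattern a float parsed from s would be stored as."""
    return struct.unpack("<I", struct.pack("<f", float(s)))[0]

# Textually different, numerically identical: one term as a float ...
assert float_bits("1.10") == float_bits("1.1")
assert float_bits("102") == float_bits("102.0")
# ... but two distinct terms as strings.
assert "1.10" != "1.1"
```

The flip side: queries against a float field must parse to exactly the value that was indexed, so any representation that does not parse "exactly right" silently becomes a different term (or an error).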


Re: benefits of float vs. string

2010-04-29 Thread Yonik Seeley
On Wed, Apr 28, 2010 at 11:22 AM, Nagelberg, Kallin
 wrote:
> Does anyone have an idea about the performance benefits of searching across 
> floats compared to strings? I have one multi-valued field that contains about 
> 3000 distinct IDs across 5 million documents. I am going to be running a lot
> of queries like q=id:102 OR id:303 OR id:305, etc. Right now it is a String
> but I am going to switch to a float, as intuitively it ought to be easier to
> filter a number than a string.


There won't be any difference in search speed for term queries as you
show above.
If you don't need to do sorting or range queries on that field, I'd
leave it as a String.


-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague


Re: Security/authentication strategies

2010-04-29 Thread Andrew McCombe
Thanks for this Peter.  I have managed to get this working with Tomcat.

Andrew

On 29 April 2010 12:11, Peter Sturge  wrote:
> Hi Andrew,
>
> Today, authentication is handled by the container (e.g. Tomcat, Jetty etc.).
>
>
> There's a thread I found to be very useful on this topic here:
>
> http://www.lucidimagination.com/search/document/d1e338dc452db2e4/how_can_i_protect_the_solr_cores
>
> This was for Jetty, but the idea is pretty much the same for Tomcat.
>
> HTH
>
> Peter
>
>
>
> On Thu, Apr 29, 2010 at 8:42 AM, Andrew McCombe  wrote:
>
>> Hi
>>
>> I'm planning on adding some protection to our solr servers and would
>> like to know what others are doing in this area.
>>
>> Basically I have a few solr cores running under tomcat6 and all use DIH
>> to populate the solr index.  This is all behind a firewall and only
>> accessible from certain IP addresses.  Access to Solr Admin is open to
>> anyone in the company and many use it for checking data is in the
>> index and simple analysis.  However, they can also trigger a
>> full-import if they are careless (one of the cores takes 6 hours to
>> ingest the data).
>>
>> What would be the recommended way of protecting things like the DIH
>> functionality? HTTP Authentication via tomcat realms or are there any
>> other solutions?
>>
>> Thanks
>> Andrew McCombe
>> iWeb Solutions
>>
>
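For the Tomcat-realm route, a minimal sketch of what protecting just the DIH handler could look like, assuming the handler is mapped at /dataimport and using a role name of your choosing (core names, paths and role names are illustrative, not taken from the thread):

```xml
<!-- In the deployed solr webapp's WEB-INF/web.xml -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>DataImportHandler</web-resource-name>
    <!-- servlet url-patterns don't allow a wildcard in the middle,
         so list the DIH path once per core -->
    <url-pattern>/core0/dataimport</url-pattern>
    <url-pattern>/core1/dataimport</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-admin</role-name>
  </auth-constraint>
</security-constraint>

<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr</realm-name>
</login-config>

<security-role>
  <role-name>solr-admin</role-name>
</security-role>
```

With this, the admin pages stay open to everyone while a full-import prompts for credentials defined in the container's realm (e.g. conf/tomcat-users.xml for Tomcat's default UserDatabaseRealm).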


Solr Dismax query - prefix matching

2010-04-29 Thread Belagodu, Bharath
Folks, 
Greetings.
Using dismax query parser is there a way to perform prefix match. For
example: If I have a field called 'booktitle' with the actual values as
'Code Complete', 'Coding standard 101', then I'd like to search for the
query string 'cod' and have the dismax match against both the book
titles since 'cod' is a prefix match for 'code' and 'coding'. 

Thanks,
Bharath



RE: Using NoOpMergePolicy (Lucene 2331) from Solr

2010-04-29 Thread Burton-West, Tom
Thanks Koji, 

That was the information I was looking for.  I'll be sure to post the test 
results to the list.  It may be a few weeks before we can schedule the tests 
for our test server.

Tom


>>I've never tried it but NoMergePolicy and NoMergeScheduler
>>can be specified in solrconfig.xml:

>>   <maxBufferedDocs>1000</maxBufferedDocs>
>>   <mergePolicy>org.apache.lucene.index.NoMergePolicy</mergePolicy>
>>   <mergeScheduler>org.apache.lucene.index.NoMergeScheduler</mergeScheduler>

Koji

-- 
http://www.rondhuit.com/en/



Re: StreamingUpdateSolrServer hangs

2010-04-29 Thread Yonik Seeley
On Fri, Apr 16, 2010 at 1:34 PM, Sascha Szott  wrote:
> In my case the whole application hangs and never recovers (CPU utilization
> goes down to near 0%). Interestingly, the problem reproducibly occurs only
> if SUSS is created with *more than 2* threads.

Is your application also using multiple threads when adding docs to the SUSS?
FYI, I opened https://issues.apache.org/jira/browse/SOLR-1885 to track this.

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague


Re: Relevancy Practices

2010-04-29 Thread MitchK

I think the problems one has to solve depend on the use cases one has
to deal with.
It makes a difference whether I have many documents that are very similar
but with different contexts, and have to determine which query applies to
which context with what probability for which document - or whether I have
lots of editorially managed documents with relatively clear contexts, because
they offer human-created tags etc.

I haven't had much experience with Solr (and none in a
production environment). However, the experience I do have shows that
splitting the document's context into parts as small as possible is always a
good idea.
I don't mean splitting in the sense of making the parts of a document
smaller. I mean making it easier to decide which part of a
document is more important than another.
e.g.: I run a social network and every user is able to create his or her own
blog - as a corporation I want to make them all searchable. It would be
beneficial for high-quality search if I were able to extract the
introduction and the category (maybe added by the author).

Accordingly: if this is not done by people, or not done well enough,
then I need to do it algorithmically.
e.g.:
If I have a dictionary of person names, then I could use the KeepWordFilter
to create a field I can facet *and* boost on.
Let's say the user writes about Paris Hilton, Barack Obama or any other
well-known person; then I can extract their names from the content in an
easy way - of course this could be done better, but that's not the point
here.
If I search for "Obama's speech", all documents with "Obama" could get a
boost.
The difference from a solution without this KeepWordFilter feature
would be that Solr does not know that the most important word in this query
is "Obama".

This is only a sketch of some ideas on how one can improve relevancy
with several features that Solr offers out of the box. Some of them could be
improved with external NLP tools.

My biggest problem with relevancy is that I can't work with metadata
computed on the fly, or every hour, out of the box (okay, you mentioned in the
discussion on the dev-list that it may be possible; however, I answered that
the feature you talked about is not well documented, so I don't know whether
it fits my needs or how to use it).

How to avoid over- or under-tuning?
Easily: test every change made to scoring factors against a lot of
queries. If it looks good in 9 of 10 cases in a really good way, then the 10th
case runs against a really bad query, or could be solved with a facet, or...
there are a lot of ideas for solving this. What I really want to say is:
test as much as you can and try to realize what your changes really mean
(for example, I can put a boost on the title of a document with a value of
1,000 while every other field has a boost value between 1 and 10. I am
relatively sure that this meets the needs of some queries but works
catastrophically with the rest).
It really helps to understand how Lucene's similarity works and what those
factors mean in practice for your existing data. Maybe you need to change the
similarity, because you don't want the length of a document to influence
its score.

Just some thoughts. I don't think I'm telling you much that's new; however,
if you have any questions or want to know more about this or that, please
ask.
Unfortunately I can't go to the ApacheCon, but hopefully this helps you give a
good presentation.

Kind regards
- Mitch 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Relevancy-Practices-tp765364p766456.html
Sent from the Solr - User mailing list archive at Nabble.com.
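Mitch's KeepWordFilter idea can be sketched outside Solr. A toy version (Python, with a made-up name list) of what the extracted facet/boost field would contain:

```python
def keep_words(tokens, keep):
    """Toy version of Solr's KeepWordFilterFactory: drop every token
    that is not in the keep list (case-insensitive here)."""
    keep_set = {w.lower() for w in keep}
    return [t for t in tokens if t.lower() in keep_set]

person_names = ["obama", "paris", "hilton"]  # the dictionary of person names
tokens = "what did Obama promise in his speech".split()

# Only the name survives; indexing this into a separate field gives
# something to facet and boost on.
assert keep_words(tokens, person_names) == ["Obama"]
```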


Re: Using QueryElevationComponent without specifying top results?

2010-04-29 Thread Lance Norskog
What you want is:
All results within the area and whatever results the
QueryElevateComponent adds, sorted by some relevance function.

If this is it, you can get the results, with the elevated output, and
do a second query with all of the ids, sorted by distance. This second
query would not use the filter query.

On Thu, Apr 29, 2010 at 4:04 AM, Oliver Beattie  wrote:
> Just wondering if anyone had any further thoughts on how I might do this?
>
> On 26 April 2010 19:18, Oliver Beattie  wrote:
>
>> Hi Grant,
>>
>> Thanks for getting back to me. Yes, indeed, #1 is exactly what I'm looking
>> for. Results are already ranked by distance (among other things), but we
>> need the ability to manually include a certain result in the set. They
>> wouldn't usually match, because they fall outside the radius of the filter
>> query we use. Most of the resulting score comes from function queries (we
>> have a number of metrics that rank listings [price, feedback score, etc]),
>> so the score from the text search doesn't have *that much* bearing on the
>> outcome. So, yeah, basically, I'm looking for a way to include results that
>> don't match, but have Solr calculate its score as it would if it did match
>> the filter query. Sorry for being so unclear and rambling a bit, I'm
>> struggling to articulate what we want in a clear manner!
>>
>> —Oliver
>>
>>
>>
>> On 26 April 2010 19:13, Grant Ingersoll  wrote:
>>
>>>
>>> On Apr 26, 2010, at 7:53 AM, Oliver Beattie wrote:
>>>
>>> > Hi all,
>>> >
>>> > I'm currently writing an application that uses Solr, and we'd like to
>>> use
>>> > something like the QueryElevationComponent, without having to specify
>>> which
>>> > results appear top. For example, what we really need is a way to say
>>> "for
>>> > this search, include these results as part of the result set, and rank
>>> them
>>> > as you normally would". We're using a filter to specify which results we
>>> > want included (which is distance-based), but we really want to be able
>>> to
>>> > explicitly include certain results in certain queries (i.e. we want to
>>> > include a listing more than 5 miles away from a particular location for
>>> > certain queries).
>>> >
>>> > Is this possible? Any help would be really appreciated :)
>>>
>>>
>>> I'm not following the "rank them as you normally would" part.  If Solr
>>> were already finding them, then they would already be ranked and showing up
>>> in the results and you wouldn't need to "hardcode" them, right?  So, that
>>> leaves a couple of cases:
>>>
>>> 1. Including results that don't match
>>> 2. Elevating results that do match
>>>
>>> In your case, it sounds like you mostly just want #1.  And, based on the
>>> context (distance search) perhaps you want those results sorted by distance?
>>>  Otherwise, how else would you know where to inject the results?
>>>
>>> The QueryElevationComponent can include the results, although, I must
>>> admit, I'm not 100% certain on what happens to injected results given
>>> sorting.
>>>
>>>
>>> --
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>>
>>> Search the Lucene ecosystem using Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>>
>>>
>>
>



-- 
Lance Norskog
goks...@gmail.com
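Lance's two-query pattern can be sketched in Python (the field names id/dist and the distance sort are illustrative assumptions): collect the ids from the first, filtered query plus the elevated ids, then issue a second, unfiltered query restricted to those ids and sorted by whatever distance function you use.

```python
def second_pass_params(first_pass_ids, elevated_ids, sort="dist asc"):
    """Build q/sort for the second query: every id from the first
    (filtered) result set plus the manually elevated ids, with no
    filter query, re-sorted by distance."""
    ids = list(dict.fromkeys(first_pass_ids + elevated_ids))  # dedupe, keep order
    return {"q": "id:(" + " OR ".join(ids) + ")", "sort": sort}

params = second_pass_params(["12", "7"], ["99"])
assert params["q"] == "id:(12 OR 7 OR 99)"
```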


Re: Slow Date-Range Queries

2010-04-29 Thread Lance Norskog
Do you really need the *:* stuff in the date range subqueries? That
may add to the execution time.

On Thu, Apr 29, 2010 at 9:52 AM, Erick Erickson  wrote:
> Hmmm, what does the rest of your query look like? And does adding
> &debugQuery=on show anything interesting?
>
> Best
> Erick
>
> On Thu, Apr 29, 2010 at 6:54 AM, Jan Simon Winkelmann <
> winkelm...@newsfactory.de> wrote:
>
>> > > ((valid_from:[* TO 2010-04-29T10:34:12Z]) AND
>> > > (valid_till:[2010-04-29T10:34:12Z TO *])) OR ((*:*
>> > > -valid_from:[* TO *]) AND (*:* -valid_till:[* TO *])))
>> > >
>> > > I use the empty checks for datasets which do not have a
>> > > valid from/till range.
>> > >
>> > >
>> > > Is there any way to get this any faster?
>> >
>> > I can suggest you two things.
>> >
>> > 1-) valid_till:[* TO *] and valid_from:[* TO *] type queries can be a
>> > performance killer. You can create a new boolean field (populated via
>> > conditional copy or populated client side) that holds the information
>> > whether valid_from exists or not. So that valid_till:[* TO *] can be
>> > rewritten as valid_till_bool:true.
>>
>> That may be an idea; however, I checked what happens when I simply leave
>> them out. It does affect the performance, but the query still takes somewhere
>> around 1 second.
>>
>> > 2-) If you are embedding these queries into q parameter, you can write
>> > your clauses into (filter query) fq parameters so that they are cached.
>>
>> The problem here is that the timestamp itself changes quite a bit and
>> hence cannot be properly cached. It could be for a few seconds, but
>> occasional response times of more than a second are still unacceptable for
>> us. We need a solution that responds quickly ALL the time, not just most of
>> the time.
>>
>> Thanks for your ideas though :)
>>
>> regards,
>> Jan-Simon
>>
>>
>



-- 
Lance Norskog
goks...@gmail.com
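One common workaround for the caching problem above, assuming coarser validity checks are acceptable (an assumption, not something Jan-Simon confirmed): round the timestamp down, e.g. to the hour, before building the clauses as fq parameters, so the same filter strings repeat and Solr's filter cache can serve them. A client-side sketch in Python:

```python
from datetime import datetime

def validity_fqs(now: datetime):
    """Round 'now' down to the hour and build the two range clauses as
    filter queries; identical strings within an hour hit the filter cache.
    Field names match the schema discussed in the thread."""
    t = now.replace(minute=0, second=0, microsecond=0)
    stamp = t.strftime("%Y-%m-%dT%H:%M:%SZ")
    return ["valid_from:[* TO %s]" % stamp, "valid_till:[%s TO *]" % stamp]

# Every request within the same hour produces identical, cacheable filters.
assert validity_fqs(datetime(2010, 4, 29, 10, 34, 12)) == \
       validity_fqs(datetime(2010, 4, 29, 10, 59, 59))
```

Solr's date math can do the same rounding server-side (e.g. valid_from:[* TO NOW/HOUR]), which keeps the fq string constant without any client-side code.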


Re: Solr configuration to enable indexing/searching webapp log files

2010-04-29 Thread Lance Norskog
It sounds like you want a data warehouse, not a text search engine.
Splunk and Pentaho are good things to try.

On Thu, Apr 29, 2010 at 12:03 PM, Jon Baer  wrote:
> To follow up it ... it seems dumping to Solr is common ...
>
> http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
>
> - Jon
>
> On Apr 29, 2010, at 1:58 PM, Jon Baer wrote:
>
>> Good question, +1 on finding answer, my take ...
>>
>> Depending on how large the log files you are talking about are, it might be 
>> better to do this w/ HDFS / Hadoop (and a script language like Pig) (or Amazon 
>> EMR)
>>
>> http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873
>>
>> Theoretically you could split the logs into fields, use a dataimporter and 
>> search / sort w/ something like LineEntityProcessor.
>>
>> http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor
>>
>> I've tried to use Solr as a log analytics tool (before dataimporthandler) 
>> and it was neither worth the disk space nor practical, but I'd love to hear 
>> otherwise.  In general you could flush daily logs to an index, but if you have 
>> to work with the data in another context, HDFS seems a better fit (I 
>> think).
>>
>> - Jon
>>
>> On Apr 29, 2010, at 1:46 PM, Stefan Maric wrote:
>>
>>>
>>> I thought i remembered seeing some information about this, but have been
>>> unable to find it
>>>
>>> Does anyone know if there is a configuration / module that would allow us to
>>> setup Solr to take in the (large) log files generated by our web/app
>>> servers, so that we can query for things like peak time requests or most
>>> frequently requested web page etc
>>>
>>> Thanks
>>> Stefan Maric
>>>
>>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: StreamingUpdateSolrServer hangs

2010-04-29 Thread Lance Norskog
In solrconfig.xml, there is a parameter controlling remote streaming:
   <requestDispatcher handleSelect="true" >
      <requestParsers enableRemoteStreaming="true"
       multipartUploadLimitInKB="2048000" />

1) Is this relevant with the SUSS?
2) It seems to be 'true' in the example default, which may not be a good idea.

On Thu, Apr 29, 2010 at 2:12 PM, Yonik Seeley
 wrote:
> On Fri, Apr 16, 2010 at 1:34 PM, Sascha Szott  wrote:
>> In my case the whole application hangs and never recovers (CPU utilization
>> goes down to near 0%). Interestingly, the problem reproducibly occurs only
>> if SUSS is created with *more than 2* threads.
>
> Is your application also using multiple threads when adding docs to the SUSS?
> FYI, I opened https://issues.apache.org/jira/browse/SOLR-1885 to track this.
>
> -Yonik
> Apache Lucene Eurocon 2010
> 18-21 May 2010 | Prague
>



-- 
Lance Norskog
goks...@gmail.com


Re: Evangelism

2010-04-29 Thread Ryan Grange
DollarDays.com is currently using it and we display the powered by logo 
as at least a gesture of giving back to the community.


Ryan T. Grange, IT Manager
DollarDays International, Inc.
rgra...@dollardays.com (480)922-8155 x106


On 4/29/2010 11:10 AM, Daniel Baughman wrote:

Hi I'm new to the list here,



I'd like to steer someone in the direction of Solr, and I see the list of
companies using Solr, but none have a "powered by Solr" logo or anything.



Does anyone have any great links with evidence to majorly successful solr
projects?



Thanks in advance,



Dan B.




Re: StreamingUpdateSolrServer hangs

2010-04-29 Thread Yonik Seeley
On Thu, Apr 29, 2010 at 6:04 PM, Lance Norskog  wrote:
> In solrconfig.xml, there is a parameter controlling remote streaming:
>   <requestDispatcher handleSelect="true" >
>      <requestParsers enableRemoteStreaming="true"
>       multipartUploadLimitInKB="2048000" />
>
> 1) Is this relevant with the SUSS?

No, this relates to solr pulling data from another source (via stream.url)

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague


Re: StreamingUpdateSolrServer hangs

2010-04-29 Thread Lance Norskog
What is the garbage collection status when this happens?

What are the open sockets in the OS when this happens? Run 'netstat
-an | fgrep 8983' where 8983 is the Solr incoming port number.

A side note on sockets:
SUSS uses the MultiThreadedHttpConnectionManager but never calls
MultiThreadedHttpConnectionManager.closeIdleConnections() on its
sockets. I don't know if this is a problem, but it should do this as a
matter of dotting the i's and crossing the t's.

On Thu, Apr 29, 2010 at 3:25 PM, Yonik Seeley
 wrote:
> On Thu, Apr 29, 2010 at 6:04 PM, Lance Norskog  wrote:
>> In solrconfig.xml, there is a parameter controlling remote streaming:
>>   <requestDispatcher handleSelect="true" >
>>      <requestParsers enableRemoteStreaming="true"
>>       multipartUploadLimitInKB="2048000" />
>>
>> 1) Is this relevant with the SUSS?
>
> No, this relates to solr pulling data from another source (via stream.url)
>
> -Yonik
> Apache Lucene Eurocon 2010
> 18-21 May 2010 | Prague
>



-- 
Lance Norskog
goks...@gmail.com


Re: StreamingUpdateSolrServer hangs

2010-04-29 Thread Yonik Seeley
I'm trying to reproduce now... single thread adding documents to a
multithreaded client, StreamingUpdateSolrServer(addr,32,4)

I'm currently at the 2.5 hour mark and 100M documents - no issues so far.

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague



On Thu, Apr 29, 2010 at 5:12 PM, Yonik Seeley
 wrote:
> On Fri, Apr 16, 2010 at 1:34 PM, Sascha Szott  wrote:
>> In my case the whole application hangs and never recovers (CPU utilization
>> goes down to near 0%). Interestingly, the problem reproducibly occurs only
>> if SUSS is created with *more than 2* threads.
>
> Is your application also using multiple threads when adding docs to the SUSS?
> FYI, I opened https://issues.apache.org/jira/browse/SOLR-1885 to track this.
>
> -Yonik
> Apache Lucene Eurocon 2010
> 18-21 May 2010 | Prague
>


Re: synonym filter problem for string or phrase

2010-04-29 Thread Jonty Rhods
On 4/29/10 8:50 PM, Marco Martinez wrote:

Hi Ranveer,

If you don't specify a field in the q parameter, the search will be
done against your default search field defined in the solrconfig.xml.
Is your default field a text_sync field?

Regards,

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2010/4/29 Ranveer 



Hi,

I am trying to configure the synonym filter.
My requirement is:
when a user searches with a phrase like "what is solr user?" it should be
replaced with "solr user".
Something like: what is solr user? => solr user

My schema for particular field is:


It seems to work fine when testing with analysis.jsp, but not via the URL
http://localhost:8080/solr/core0/select?q="what is solr user?"
or
http://localhost:8080/solr/core0/select?q=what is solr user?

Please guide me to achieve the desired result.






Hi Marco,
thanks.
Yes, my default search field is text_sync.
I am getting results now, but not what I expect.
following is my synonym.txt

what is bone cancer=>bone cancer
what is bone cancer?=>bone cancer
what is of bone cancer=>bone cancer
what is symptom of bone cancer=>bone cancer
what is symptoms of bone cancer=>bone cancer

In the above I am getting results for all synonyms but not the last one, "what is
symptoms of bone cancer=>bone cancer".
I think I am not getting the expected result due to stemming. However, when I
check the result with analysis.jsp,
it gives the expected result. I am confused.
Also, I want to know the best approach to configuring synonyms for my requirement.

thanks
with regards

Hi,

I am also facing the same type of problem.
I am a newbie, please help.

thanks
Jonty
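A plausible cause for the one failing rule (an assumption, since the field type definition was lost above): if a stemming filter runs before the SynonymFilter in the query analyzer, "symptoms" is already reduced to "symptom" and the multi-word key in synonyms.txt no longer matches. A toy sketch of the effect:

```python
def toy_stem(token):
    # stand-in for a real stemmer: strip a plural "s" from longer words
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

synonyms = {"what is symptoms of bone cancer": "bone cancer"}

query = "what is symptoms of bone cancer"
stemmed = " ".join(toy_stem(t) for t in query.split())

assert query in synonyms        # the raw phrase matches the rule ...
assert stemmed not in synonyms  # ... its stemmed form no longer does
```

Adding the stemmed variant as another synonyms.txt line, or moving the SynonymFilter before the stemmer in the analyzer chain, are the usual fixes. (analysis.jsp can mask this, since it shows each filter's output stage by stage.)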


ubuntu lucid package

2010-04-29 Thread pablo platt
Hi

I've installed the solr-tomcat package on Ubuntu Lucid (10.04, the latest).
It automatically installs Java and Tomcat and hopefully all other
dependencies.
I can access Tomcat at http://localhost:8080 but am not sure where to find
the Solr web admin;
http://localhost:8180 gives me nothing.

Is this package known to work? I've read that on previous ubuntu releases
the packages were broken.
Do I need to configure anything after installing the package?

Thanks


Re: ubuntu lucid package

2010-04-29 Thread Otis Gospodnetic
Pablo, Ubuntu Lucid is *brand* new :)

try:
find / -name \*solr\*
or 
locate solr.war

Or simply try http://localhost:8080/solr/admin/
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: pablo platt 
> To: solr-user@lucene.apache.org
> Sent: Thu, April 29, 2010 10:27:31 PM
> Subject: ubuntu lucid package
> 
> Hi
> 
> I've installed solr-tomcat package on ubuntu lucid (10.04 latest).
> It automatically install java and tomcat and hopefully all other
> dependencies.
> I can access tomcat at http://localhost:8080 but not sure where to find the
> solr web admin
> http://localhost:8180 gives me nothing.
> 
> Is this package known to work? I've read that on previous ubuntu releases
> the packages were broken.
> Do I need to configure anything after installing the package?
> 
> Thanks


copyField - how does it work?

2010-04-29 Thread Naga Darbha
Hi,

I have my config something like "clubbed_text" of type "text" and 
"clubbed_string" of type "string". :

BLOCK-1...



BLOCK-2...
   
   

BLOCK-3...
   
   

BLOCK-4...
   

Is the copyField valid specified in BLOCK-4?  It seems it is not populating the 
clubbed_string with the values of field_A and field_B.

Do I need to populate clubbed_string by explicitly copying field_A and field_B 
directly to it?

Please help.

regards,
Naga


RE: How to make documents low priority

2010-04-29 Thread Doddamani, Prakash
Thanks much Koji,

Let me have look on this,

Regards
Prakash 

-Original Message-
From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] 
Sent: Thursday, April 29, 2010 8:25 PM
To: solr-user@lucene.apache.org
Subject: Re: How to make documents low priority

Doddamani, Prakash wrote:
> Thanks Jon,
>
> It's a very nice idea, I didn't think about it. But I am already using 
> ordering on one other field, "sort=field1+desc"
>
> Can I have order for 2 fields something like 
> "sort=field1+desc&field5+desc"
>
>   
Yes, you can:

sort=field1+desc,field5+desc

http://wiki.apache.org/solr/CommonQueryParameters#sort

Koji

--
http://www.rondhuit.com/en/



Re: benefits of float vs. string

2010-04-29 Thread Dennis Gearon
Please explain a range query? 

tia :-)

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Thu, 4/29/10, Yonik Seeley  wrote:

> From: Yonik Seeley 
> Subject: Re: benefits of float vs. string
> To: solr-user@lucene.apache.org
> Date: Thursday, April 29, 2010, 1:01 PM
> On Wed, Apr 28, 2010 at 11:22 AM,
> Nagelberg, Kallin
> 
> wrote:
> > Does anyone have an idea about the performance
> benefits of searching across floats compared to strings? I
> have one multi-valued field that contains about 3000
> distinct IDs across 5 million documents. I am going to be
> running a lot of queries like q=id:102 OR id:303 OR id:305,
> etc. Right now it is a String but I am going to switch to a float as
> intuitively it ought to be easier to filter a number than a
> string.
> 
> 
> There won't be any difference in search speed for term
> queries as you
> show above.
> If you don't need to do sorting or range queries on that
> field, I'd
> leave it as a String.
> 
> 
> -Yonik
> Apache Lucene Eurocon 2010
> 18-21 May 2010 | Prague
> 


Elevation of part match

2010-04-29 Thread Villemos, Gert
I would like to be able to elevate documents if the query matches part of a 
string.
 
For example, I would like to elevate the document FOO in case the query 
contains the word 'archive'. So when executing the queries
 
"packet archive"
"archive failure"
"archive"
 
All leads to the document FOO being elevated to the top.
 
Playing with the elevate component, this is not the case. The component seems 
to only work with complete matches. Or?
 
Thanks,
Gert.
 




Trouble with parenthesis

2010-04-29 Thread mailing-list

Hi everybody,

We have a problem with parentheses in a Lucene/Solr request (Solr 1.4):
- {!lucene q.op=AND}( ville:"Moscou" -periodicite:"annuel") gives 254 
documents,
  with parsedquery >+ville:Moscou -periodicite:annuel< in debug mode. 
That's correct.
- {!lucene q.op=AND} (ville:"Moscou" AND NOT periodicite:"annuel") same 
results.
- {!lucene q.op=AND} (ville:"Moscou" AND (NOT periodicite:"annuel")) 
gives 0 documents

  with parsedquery>+ville:Moscou +(-periodicite:annuel)<

The 2 fields are standard string fields in the Solr schema.

Is it an issue, or the standard behaviour of the Solr query parser?

Best regards.
Gilbert Boyreau


Any way to get top 'n' queries searched from Solr?

2010-04-29 Thread Praveen Agrawal
Hi,
I need to know the top 'n' (say 100) search queries that users have tried,
i.e. the most frequently searched queries and their frequencies. Does Solr
keep this information and can it return it? If not, what options do I have?
Thanks,
Praveen