Security/authentication strategies
Hi, I'm planning on adding some protection to our Solr servers and would like to know what others are doing in this area. Basically I have a few Solr cores running under Tomcat 6, all of which use DIH to populate the Solr index. This is all behind a firewall and only accessible from certain IP addresses. Access to Solr Admin is open to anyone in the company and many use it for checking that data is in the index and for simple analysis. However, they can also trigger a full-import if they are careless (one of the cores takes 6 hours to ingest the data). What would be the recommended way of protecting things like the DIH functionality? HTTP authentication via Tomcat realms, or are there any other solutions? Thanks Andrew McCombe iWeb Solutions
Slow Date-Range Queries
Hi, I am currently having serious performance problems with date range queries. What I am doing is validating a dataset's published status by a valid_from and a valid_till date field. I did get a performance boost of ~100% by switching from a normal solr.DateField to a solr.TrieDateField with precisionStep="8", however my query still takes about 1.3 seconds. My field definition looks like this:

<fieldType name="tdate" class="solr.TrieDateField" precisionStep="8" sortMissingLast="true" omitNorms="true"/>
<field name="valid_from" type="tdate" indexed="true" stored="false" required="false" />
<field name="valid_till" type="tdate" indexed="true" stored="false" required="false" />

And the query looks like this:

((valid_from:[* TO 2010-04-29T10:34:12Z]) AND (valid_till:[2010-04-29T10:34:12Z TO *])) OR ((*:* -valid_from:[* TO *]) AND (*:* -valid_till:[* TO *]))

I use the empty checks for datasets which do not have a valid from/till range. Is there any way to get this any faster? Would it be faster using Unix timestamps in int fields? I would appreciate any insight and help on this. regards, Jan-Simon
Re: How are (multiple) filter queries processed?
Hi, thanks for your help, I figured it out myself, I guess. All parts of an fq are always intersected, so it has no effect to put a boolean operator inside an fq like in fq=+tags:(Gucci) OR -tags:(watch sunglasses) (which would be a mildly strange query anyway). The order in which the intersections are made follows their appearance in the query, I suppose. best regards, Alex

On Tue, 2010-04-27 at 12:09 -0700, Chris Hostetter wrote:
> : i was wondering how the following query might be processed:
> :
> : ?q=*:*&fq=+tags:(Gucci)&fq=-tags:(watch sunglasses)
>
> they are intersected so only documents matching all of them are potential
> matches.
>
> : and if there is a difference to a query with only one fq parameter like
> :
> : ?q=*:*&fq=+tags:(Gucci) -tags:(watch sunglasses)
> :
> : I am aware of the caching implications but i am not sure how the set
> : intersections work between the results of the 'q' and one or more 'fq'
> : parameters and if it is possible to use boolean operators inside a
> : filter query.
>
> filter queries can use a QParser, so you can use boolean operators if the
> QParser supports it (by default the QParser is "lucene", so "yes") ...
>
> i don't understand the "i am not sure how the set intersections work between
> the results of the 'q' and one or more 'fq'" part of your question, can
> you clarify what it is you are asking?
>
> -Hoss
require synonym filter on string field
Hi, I need to configure synonyms for exact matching. The field I need to search is of type string. I tried to configure it via a text field, but there, due to the whitespace tokenizer, the exact match is not found. My requirement is: suppose a user searches for "solr user" and an exact "solr user" (or an equivalent synonym) is available, then and only then return a result.. My fieldType is "string" and I want to configure synonyms on that string field. Or is there any other way to index the string without tokenizing (as-is) and configure synonyms for that field? please help..
Solr date range problem - specific date problem
I index some data including a date field in Solr, but when I search for a specific date I get some records (not all records), including some records from the next day. For example:

http://localhost:8080/solr/select/?q=pubdate:[2010-03-25T00:00:00Z TO 2010-03-25T23:59:59Z]&start=0&rows=10&indent=on&sort=pubdate desc

I have 625,000 records on 2010-03-25 but the above query returns 325,412 results, which include 14 records from 2010-03-26. I also tried the query below, but did not get the right result either:

http://localhost:8080/solr/select/?q=pubdate:"2010-03-25T00:00:00Z"&start=0&rows=10&indent=on&sort=pubdate desc

How do I get the right result for a specific date? Could you please help me? Thanks in advance Hamid
Re: CDATA For All Fields?
Yes, that's totally fine. On Apr 28, 2010, at 7:14 PM, Thomas Nguyen wrote: Is there anything wrong with wrapping the text content of all fields with CDATA, whether they are analyzed, not analyzed, indexed, not indexed, etc.? I have a script that creates update XML documents and it's just simpler to wrap the text content of all fields with CDATA. From my brief tests it does not affect the search results at all.
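To illustrate why this is safe: CDATA is purely XML-level escaping, so the XML parser hands Solr exactly the same character data either way. A minimal sketch of such an update document (field names are made up for the example):

<add>
  <doc>
    <field name="id"><![CDATA[doc-42]]></field>
    <!-- CDATA avoids having to escape & and < in the text -->
    <field name="title"><![CDATA[Fish & Chips <reviewed>]]></field>
  </doc>
</add>

Solr indexes the literal text "Fish & Chips <reviewed>" whether it arrives in CDATA or in the entity-escaped (&amp;, &lt;) form, which is why the search results do not change.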
Re: No highlighting results with dismax?
We use dismax and highlighting works fine. The only thing we had to add to the query URL was &hl.fl=FIELD1,FIELD2, i.e. we had to specify which fields should be used for highlighting.

> -----Original Message-----
> From: fabritw [mailto:fabr...@gmail.com]
> Sent: Wednesday, April 28, 2010 16:08
> To: solr-user@lucene.apache.org
> Subject: No highlighting results with dismax?
>
> Hi,
>
> Can highlights be returned when using the dismax request handler?
>
> I read in the post below that I can use a workaround with "qf"?
> http://lucene.472066.n3.nabble.com/bug-No-highlighting-results-with-dismax-and-q-alt-td498132.html
>
> Any advice is greatly appreciated.
>
> Regards, Will
solr multi indexes and scoring
Hello everybody, In our application we are dealing with music. In our index we are storing music tracks (3 million documents). We have a popularity field inside the track document; this field contains the number of times the track has been listened to. The issue is that we are forced to re-index the whole 3 million documents every day to update this field. This field is very important for us because we use it to modify scoring via the _val_ parameter to boost tracks which have higher popularity. - Our tracks are stored in a database; even if we use delta-import via the Solr DIH, it takes a lot of time because the majority of the tracks' popularity values are updated. Our index is growing every week, so re-indexing doesn't work well at all. - Another possible solution is to split the index into two indexes: 1- the first one will contain the popularity and a key pointing into the second index (FK --> PK). 2- the second index will contain the stable fields on which we run the search. But how do I modify the score using a popularity which lives inside the first index? Is that possible with Solr, or am I obliged to manage this inside my own code, triggering a search on the first index for each document returned from the second index and recalculating the default score returned from that second index? So what is the best solution to deal with this? Any suggestion is welcome, and thank you in advance.
Re: solr multi indexes and scoring
khirb7 wrote:
> Hello everybody, In our application we are dealing with music. In our index
> we are storing music tracks (3 million documents). We have a popularity
> field inside the track document; this field contains the number of times
> the track has been listened to. The issue is that we are forced to re-index
> the whole 3 million documents every day to update this field. This field is
> very important for us because we use it to modify scoring via the _val_
> parameter to boost tracks which have higher popularity.

Just using ExternalFileField may solve your problem?

http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html

Koji
--
http://www.rondhuit.com/en/
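For reference, a minimal sketch of what Koji is suggesting (the field, type and key names here are assumptions, not from the thread): ExternalFileField keeps the value outside the Lucene index, in a flat file under the index data directory, so popularity can be refreshed without re-indexing the documents.

<!-- schema.xml -->
<fieldType name="extPopularity" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="float"
           indexed="false" stored="false"/>
<field name="popularity" type="extPopularity"/>

The values live in a file named external_popularity in the index data directory, one "key=value" line per document, e.g.:

TRACK-001=12043
TRACK-002=877

The field can then be used in function queries (e.g. via _val_) to influence the score; regenerating the file and opening a new searcher (e.g. after a commit) picks up the new values.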
Re: require synonym filter on string field
Ranveer Kumar wrote:
> Hi, I need to configure synonyms for exact matching. The field I need to
> search is of type string. I tried to configure it via a text field, but
> there, due to the whitespace tokenizer, the exact match is not found. My
> requirement is: suppose a user searches for "solr user" and an exact
> "solr user" (or an equivalent synonym) is available, then and only then
> return a result.. My fieldType is "string" and I want to configure
> synonyms on that string field. Or is there any other way to index the
> string without tokenizing (as-is) and configure synonyms for that field?
> please help..

Why don't you use KeywordTokenizer? And if you want to treat the text in synonyms.txt as strings as well, set the tokenizerFactory attribute to KeywordTokenizerFactory.

Koji
--
http://www.rondhuit.com/en/
Re: require synonym filter on string field
On 4/29/10 3:45 PM, Koji Sekiguchi wrote:
> Why don't you use KeywordTokenizer? And if you want to treat the text in
> synonyms.txt as strings as well, set the tokenizerFactory attribute to
> KeywordTokenizerFactory.
>
> Koji

Hi Koji, thanks for the reply. Where should I use the KeywordTokenizerFactory: on the string or on the text field? I am wondering whether KeywordTokenizerFactory will work in a text field. Actually, what I understood about KeywordTokenizerFactory is that it tokenizes keywords, for example: 'solr user' will be tokenized to 'solr' and 'user' because solr and user are keywords. My requirement is to index it as 'solr user'.
Re: Slow Date-Range Queries
> I am currently having serious performance problems with
> date range queries. What I am doing is validating a
> dataset's published status by a valid_from and a valid_till
> date field.
>
> I did get a performance boost of ~100% by switching from a
> normal solr.DateField to a solr.TrieDateField with
> precisionStep="8", however my query still takes about 1.3
> seconds.
>
> My field definition looks like this:
>
> <fieldType name="tdate" class="solr.TrieDateField"
> precisionStep="8" sortMissingLast="true"
> omitNorms="true"/>
>
> <field name="valid_from" type="tdate" indexed="true"
> stored="false" required="false" />
> <field name="valid_till" type="tdate" indexed="true"
> stored="false" required="false" />
>
> And the query looks like this:
> ((valid_from:[* TO 2010-04-29T10:34:12Z]) AND
> (valid_till:[2010-04-29T10:34:12Z TO *])) OR ((*:*
> -valid_from:[* TO *]) AND (*:* -valid_till:[* TO *]))
>
> I use the empty checks for datasets which do not have a
> valid from/till range.
>
> Is there any way to get this any faster?

I can suggest two things.

1-) valid_till:[* TO *] and valid_from:[* TO *] type queries can be performance killers. You can create a new boolean field (populated via conditional copy or populated client-side) that holds the information whether valid_from exists or not. So valid_till:[* TO *] can be rewritten as valid_till_bool:true.

2-) If you are embedding these queries into the q parameter, you can move your clauses into (filter query) fq parameters so that they are cached.
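To make suggestion 1 concrete, a sketch of the rewritten query, assuming both valid_from_bool and valid_till_bool are maintained at index time:

(valid_from:[* TO 2010-04-29T10:34:12Z] AND valid_till:[2010-04-29T10:34:12Z TO *]) OR (valid_from_bool:false AND valid_till_bool:false)

This replaces the two negated [* TO *] clauses, which have to touch every indexed term of the field, with two cheap single-term lookups.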
Re: Slow Date-Range Queries
> > ((valid_from:[* TO 2010-04-29T10:34:12Z]) AND
> > (valid_till:[2010-04-29T10:34:12Z TO *])) OR ((*:*
> > -valid_from:[* TO *]) AND (*:* -valid_till:[* TO *]))
> >
> > I use the empty checks for datasets which do not have a
> > valid from/till range.
> >
> > Is there any way to get this any faster?
>
> 1-) valid_till:[* TO *] and valid_from:[* TO *] type queries can be
> performance killers. You can create a new boolean field (populated via
> conditional copy or populated client-side) that holds the information
> whether valid_from exists or not. So valid_till:[* TO *] can be
> rewritten as valid_till_bool:true.

That may be an idea, however I checked what happens when I simply leave them out. It does affect the performance, but the query still takes somewhere around 1 second.

> 2-) If you are embedding these queries into the q parameter, you can move
> your clauses into (filter query) fq parameters so that they are cached.

The problem here is that the timestamp itself changes quite a bit and hence cannot be properly cached. It could be cached for a few seconds, but occasional response times of more than a second are still unacceptable for us. We need a solution that responds quickly ALL the time, not just most of the time.

Thanks for your ideas though :)

regards,
Jan-Simon
Re: require synonym filter on string field
> I am wondering whether KeywordTokenizerFactory will work or
> not in a text field. Actually, what I understood about
> KeywordTokenizerFactory is that it tokenizes keywords,
> for example: 'solr user' will be tokenized to 'solr' and
> 'user' because solr and user are keywords. My requirement is
> to index it as 'solr user'
>

you can use something like this (the type name is just an example):

<fieldType name="string_syn" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

Also: "KeywordTokenizer does no actual tokenizing, so the entire input string is preserved as a single token" [from example\solr\conf\schema.xml]
Re: Using QueryElevationComponent without specifying top results?
Just wondering if anyone had any further thoughts on how I might do this? On 26 April 2010 19:18, Oliver Beattie wrote: > Hi Grant, > > Thanks for getting back to me. Yes, indeed, #1 is exactly what I'm looking > for. Results are already ranked by distance (among other things), but we > need the ability to manually include a certain result in the set. They > wouldn't usually match, because they fall outside the radius of the filter > query we use. Most of the resulting score comes from function queries (we > have a number of metrics that rank listings [price, feedback score, etc]), > so the score from the text search doesn't have *that much* bearing on the > outcome. So, yeah, basically, I'm looking for a way to include results that > don't match, but have Solr calculate its score as it would if it did match > the filter query. Sorry for being so unclear and rambling a bit, I'm > struggling to articulate what we want in a clear manner! > > —Oliver > > > > On 26 April 2010 19:13, Grant Ingersoll wrote: > >> >> On Apr 26, 2010, at 7:53 AM, Oliver Beattie wrote: >> >> > Hi all, >> > >> > I'm currently writing an application that uses Solr, and we'd like to >> use >> > something like the QueryElevationComponent, without having to specify >> which >> > results appear top. For example, what we really need is a way to say >> "for >> > this search, include these results as part of the result set, and rank >> them >> > as you normally would". We're using a filter to specify which results we >> > want included (which is distance-based), but we really want to be able >> to >> > explicitly include certain results in certain queries (i.e. we want to >> > include a listing more than 5 miles away from a particular location for >> > certain queries). >> > >> > Is this possible? Any help would be really appreciated :) >> >> >> I'm not following the "rank them as you normally would" part. If Solr >> were already finding them, then they would already be ranked and showing up >> in the results and you wouldn't need to "hardcode" them, right? So, that >> leaves a couple of cases: >> >> 1. Including results that don't match >> 2. Elevating results that do match >> >> In your case, it sounds like you mostly just want #1. And, based on the >> context (distance search) perhaps you want those results sorted by distance? >> Otherwise, how else would you know where to inject the results? >> >> The QueryElevationComponent can include the results, although, I must >> admit, I'm not 100% certain on what happens to injected results given >> sorting. >> >> >> -- >> Grant Ingersoll >> http://www.lucidimagination.com/ >> >> Search the Lucene ecosystem using Solr/Lucene: >> http://www.lucidimagination.com/search >> >> >
Re: require synonym filter on string field
> Hi Koji, thanks for the reply. Where should I use the
> KeywordTokenizerFactory: on the string or on the text field?
> I am wondering whether KeywordTokenizerFactory will work in a text field.
> Actually, what I understood about KeywordTokenizerFactory is that it
> tokenizes keywords, for example: 'solr user' will be tokenized to 'solr'
> and 'user' because solr and user are keywords. My requirement is to index
> it as 'solr user'

KeywordTokenizer emits the entire input as a single token. Apply KeywordTokenizerFactory to a TextField and try to see how "solr user" is tokenized via analysis.jsp (launch the admin GUI > ANALYSIS).

Koji
--
http://www.rondhuit.com/en/
Re: Security/authentication strategies
Hi Andrew, Today, authentication is handled by the container (e.g. Tomcat, Jetty etc.). There's a thread I found to be very useful on this topic here: http://www.lucidimagination.com/search/document/d1e338dc452db2e4/how_can_i_protect_the_solr_cores This was for Jetty, but the idea is pretty much the same for Tomcat. HTH Peter

On Thu, Apr 29, 2010 at 8:42 AM, Andrew McCombe wrote:
> Hi
>
> I'm planning on adding some protection to our Solr servers and would
> like to know what others are doing in this area.
>
> Basically I have a few Solr cores running under Tomcat 6, all of which
> use DIH to populate the Solr index. This is all behind a firewall and only
> accessible from certain IP addresses. Access to Solr Admin is open to
> anyone in the company and many use it for checking that data is in the
> index and for simple analysis. However, they can also trigger a
> full-import if they are careless (one of the cores takes 6 hours to
> ingest the data).
>
> What would be the recommended way of protecting things like the DIH
> functionality? HTTP authentication via Tomcat realms, or are there any
> other solutions?
>
> Thanks
> Andrew McCombe
> iWeb Solutions
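For the Tomcat-realm route the question mentions, a minimal sketch of what could go into the Solr webapp's web.xml; the role name, realm setup and core names here are assumptions, and DIH is assumed to be mapped at /dataimport on each core:

<security-constraint>
  <web-resource-collection>
    <web-resource-name>DataImportHandler</web-resource-name>
    <!-- servlet url-patterns are exact or prefix matches, so list each core's handler -->
    <url-pattern>/core0/dataimport</url-pattern>
    <url-pattern>/core1/dataimport</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-admin</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr</realm-name>
</login-config>
<security-role>
  <role-name>solr-admin</role-name>
</security-role>

The user/role pairs then come from whatever realm Tomcat is configured with (e.g. a MemoryRealm backed by tomcat-users.xml); everything else under /solr stays open to the rest of the company.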
RE: Problem with DIH delta-import on JDBC
Hi, It looks like the deltaImportQuery needs to be changed. You are using dataimporter.delta.id, which is not correct; you are selecting objectid in the deltaQuery, so the deltaImportQuery should be using dataimporter.delta.objectid. So try this:

<entity name="table"
        query="select * from table"
        deltaImportQuery="select * from table where objectid='${dataimporter.delta.objectid}'"
        deltaQuery="select objectid from table where lastupdate > '${dataimporter.last_index_time}'">

Colin.

> -----Original Message-----
> From: safl [mailto:s...@salamin.net]
> Sent: Wednesday, April 28, 2010 3:05 PM
> To: solr-user@lucene.apache.org
> Subject: Problem with DIH delta-import on JDBC
>
> Hello,
>
> I'm just new on the list. I searched a lot on the list, but I didn't
> find an answer to my question.
>
> I'm using Solr 1.4 on Windows with an Oracle 10g database. I am able to
> do a full-import without any problem, but I'm not able to get
> delta-import working.
>
> I have the following in the data-config.xml:
>
> ...
> <entity name="table"
>     query="select * from table"
>     deltaImportQuery="select * from table where objectid='${dataimporter.delta.id}'"
>     deltaQuery="select objectid from table where lastupdate > '${dataimporter.last_index_time}'">
> ...
>
> I update some records in the table and then try to run a delta-import.
> I traced the SQL queries on the DB with P6Spy, and I always see a query like
>
> select * from table where objectid=''
>
> Of course, with such an SQL query, nothing is updated in my index.
>
> It behaves the same if I replace ${dataimporter.delta.id} by
> ${dataimporter.delta.objectid}.
> Can someone tell me what is wrong with it?
>
> Thanks a lot,
> Florian
Re: Problem in solr search
hey.. try the fq parameter !? ...&fq=(title:A country:USA)
JTeam Spatial Plugin
Hi All, I am using JTeam's Spatial Plugin RC3 to perform spatial searches on my index and it works great. However, I can't seem to get it to return the computed distances. My query component is run before the geoDistanceComponent, and the distanceField is set to "distance". Fields for lat/long are defined as well, and the different tier fields are in the results. Increasing the radius causes the number of matches to increase, so I guess that my setup is working... Here is a sample query and its output (I removed some of the fields to keep it short):

/select?passkey=sample&q={!spatial%20lat=40.27%20long=-76.29%20radius=22%20calc=arc}title:engineer&wt=json&indent=on&fl=*,distance

{
  "responseHeader":{
    "status":0,
    "QTime":69,
    "params":{
      "fl":"*,distance",
      "indent":"on",
      "q":"{!spatial lat=40.27 long=-76.29 radius=22 calc=arc}title:engineer",
      "wt":"json"}},
  "response":{"numFound":223,"start":0,"docs":[
      {
        "title":"Electrical Engineer",
        "long":-76.3054962158203,
        "lat":40.037899017334,
        "_tier_9":-3.004,
        "_tier_10":-6.0008,
        "_tier_11":-12.0016,
        "_tier_12":-24.0031,
        "_tier_13":-47.0061,
        "_tier_14":-93.00122,
        "_tier_15":-186.00243,
        "_tier_16":-372.00485}
  ]}}

This output suggests to me that everything is in place. Does anyone know how to fetch the computed distance? I tried adding the field 'distance' to my list of fields but it didn't work. Thanks
How to make documents low priority
Hi, I am using boost factors as below:

field1^20.0 field2^5 field3^2.5 field4^.5

where it searches first in field1, then field2, and so on.

Is there a way where I can make some documents very low priority so that they come at the end?

Scenario:

<doc>
  <!-- field names other than field5 are illustrative -->
  <field name="field1">aaa</field>
  <field name="field2">bbb</field>
  <field name="field5">1</field>
  <field name="timestamp">2010-04-29T12:40:05.589Z</field>
</doc>

I want all the documents which have field5=1 to come last and documents which have field5=0 to come first while searching.

Any advice is greatly appreciated.

Thanks
Prakash
synonym filter problem for string or phrase
Hi, I am trying to configure a synonym filter. My requirement is: when a user searches by a phrase like "what is solr user?" then it should be replaced with "solr user". Something like: what is solr user? => solr user

My schema for the particular field is:

<fieldType name="text_sync" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- ... tokenizer and other filters ... -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"
            tokenizerFactory="KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

It seems to work fine when trying it via analysis.jsp, but not via URL:
http://localhost:8080/solr/core0/select?q="what is solr user?"
or
http://localhost:8080/solr/core0/select?q=what is solr user?

Please guide me to achieve the desired result.
RE: Solr date range problem - specific date problem
You should do this:

http://localhost:8080/solr/select/?q=*:*&fq=pubdate:[2010-03-25T00:00:00Z%20TO%202010-03-25T23:59:59Z]

Ankit

-----Original Message-----
From: Hamid Vahedi [mailto:hvb...@yahoo.com]
Sent: Thursday, April 29, 2010 5:33 AM
To: solr-user@lucene.apache.org
Subject: Solr date range problem - specific date problem

I index some data including a date field in Solr, but when I search for a specific date I get some records (not all records), including some records from the next day. For example:

http://localhost:8080/solr/select/?q=pubdate:[2010-03-25T00:00:00Z TO 2010-03-25T23:59:59Z]&start=0&rows=10&indent=on&sort=pubdate desc

I have 625,000 records on 2010-03-25 but the above query returns 325,412 results, which include 14 records from 2010-03-26. I also tried the query below, but did not get the right result either:

http://localhost:8080/solr/select/?q=pubdate:"2010-03-25T00:00:00Z"&start=0&rows=10&indent=on&sort=pubdate desc

How do I get the right result for a specific date? Could you please help me? Thanks in advance Hamid
RE: Problem with DIH delta-import on JDBC
Hi, I did a debugger session and found that the column names are case sensitive (at least with Oracle). The column names are retrieved from the JDBC metadata, and I found that my objectid is in fact OBJECTID. So now I'm able to do an update with the following config (pay attention to the OBJECTID):

<entity name="table"
        query="select * from table"
        deltaImportQuery="select * from table where objectid='${dataimporter.delta.OBJECTID}'"
        deltaQuery="select objectid from table where lastupdate > '${dataimporter.last_index_time}'">

Is there a way to be "case insensitive"?

Anyway, it works now and that's the most important thing! :-)

Thanks to all,
Florian

cbennett wrote:
> Hi,
>
> It looks like the deltaImportQuery needs to be changed. You are using
> dataimporter.delta.id which is not correct; you are selecting objectid in
> the deltaQuery, so the deltaImportQuery should be using
> dataimporter.delta.objectid
>
> So try this:
>
> <entity name="table"
>     query="select * from table"
>     deltaImportQuery="select * from table where objectid='${dataimporter.delta.objectid}'"
>     deltaQuery="select objectid from table where lastupdate > '${dataimporter.last_index_time}'">
>
> Colin.
Re: How to make documents low priority
Does a "sort=field5+desc" on the query param not work? - Jon On Apr 29, 2010, at 9:32 AM, Doddamani, Prakash wrote: > Hi, > > > > I am using the boost factor as below > > > > field1^20.0 field2^5 field3^2.5 field4^.5 > > > > > > Where it searches first in field1 then field1 and so on > > > > Is there a way, where I can make some documents very low priority so > that they come at the end? > > > > Scenario : > > > > > > aaa > > bbb > > > > > > 1 > > > > 2010-04-29T12:40:05.589Z > > > > > > I want all the documents which have field5=1 come last and documents > which have field5=0 should come first while searching. > > Any advise is greatly appreciated. > > > > Thanks > > Prakash >
RE: Slow Date-Range Queries
You might want to look at DateMath: http://lucene.apache.org/solr/api/org/apache/solr/util/DateMathParser.html. I believe the default precision is to the millisecond, so if you can afford to round to the nearest second or even minute you might see some performance gains.

-Kallin Nagelberg

-----Original Message-----
From: Jan Simon Winkelmann [mailto:winkelm...@newsfactory.de]
Sent: Thursday, April 29, 2010 4:36 AM
To: solr-user@lucene.apache.org
Subject: Slow Date-Range Queries

Hi, I am currently having serious performance problems with date range queries. What I am doing is validating a dataset's published status by a valid_from and a valid_till date field. I did get a performance boost of ~100% by switching from a normal solr.DateField to a solr.TrieDateField with precisionStep="8", however my query still takes about 1.3 seconds. My field definition looks like this: [...] And the query looks like this:

((valid_from:[* TO 2010-04-29T10:34:12Z]) AND (valid_till:[2010-04-29T10:34:12Z TO *])) OR ((*:* -valid_from:[* TO *]) AND (*:* -valid_till:[* TO *]))

I use the empty checks for datasets which do not have a valid from/till range. Is there any way to get this any faster? Would it be faster using Unix timestamps in int fields? I would appreciate any insight and help on this. regards, Jan-Simon
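Combining this with the earlier fq suggestion, a sketch of what rounding buys you (assuming the NOW keyword is acceptable in place of a client-supplied timestamp):

fq=valid_from:[* TO NOW/MINUTE]&fq=valid_till:[NOW/MINUTE TO *]

NOW/MINUTE rounds down to the start of the current minute, so the filter string stays byte-identical for a full minute and the filter cache can actually be hit, instead of being defeated by a millisecond-precision timestamp that changes on every request.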
Relevancy Practices
I'm putting on a talk at Lucene Eurocon (http://lucene-eurocon.org/sessions-track1-day2.html#1) on "Practical Relevance" and I'm curious as to what people put in practice for testing and improving relevance. I have my own inclinations, but I don't want to muddy the water just yet. So, if you have a few moments, I'd love to hear responses to the following questions.

What worked?
What didn't work?
What didn't you understand about it?
What tools did you use?
What tools did you wish you had, either for debugging relevance or "fixing" it?
How much time did you spend on it?
How did you avoid over/under tuning?
At what stage of development/testing/production did you decide to do relevance tuning? Was that timing planned or not?

Thanks,
Grant
Re: Problem with DIH delta-import on JDBC
All that stuff happens in the JDBC driver associated w/ the DataSource, so probably not, unless there is something that can be set in the Oracle driver itself. One thing that might have helped in this case would have been if readFieldNames() in the JdbcDataSource dumped its return value to the debug log for you. That might be something that can be JIRA(ed).

- Jon

On Apr 29, 2010, at 9:45 AM, safl wrote:
> Hi,
>
> I did a debugger session and found that the column names are case sensitive
> (at least with Oracle).
> The column names are retrieved from the JDBC metadata, and I found that my
> objectid is in fact OBJECTID.
>
> So now, I'm able to do an update with the following config (pay attention to
> the OBJECTID):
>
> <entity name="table"
>     query="select * from table"
>     deltaImportQuery="select * from table where objectid='${dataimporter.delta.OBJECTID}'"
>     deltaQuery="select objectid from table where lastupdate > '${dataimporter.last_index_time}'">
>
> Is there a way to be "case insensitive"?
>
> Anyway, it works now and that's the most important thing!
> :-)
>
> Thanks to all,
> Florian
RE: How to make documents low priority
Thanks Jon, it's a very nice idea, I hadn't thought about it. But I am already using a sort on one more field, "sort=field1+desc".

Can I have an order over 2 fields, something like "sort=field1+desc&field5+desc"? Or is there something else I should do?

Thanks
Prakash

-----Original Message-----
From: Jon Baer [mailto:jonb...@gmail.com]
Sent: Thursday, April 29, 2010 7:39 PM
To: solr-user@lucene.apache.org
Subject: Re: How to make documents low priority

Does a "sort=field5+desc" on the query param not work?

- Jon

On Apr 29, 2010, at 9:32 AM, Doddamani, Prakash wrote:
> I want all the documents which have field5=1 to come last and documents
> which have field5=0 to come first while searching.
Re: How to make documents low priority
Doddamani, Prakash wrote:
> Thanks Jon, it's a very nice idea, I hadn't thought about it. But I am
> already using a sort on one more field, "sort=field1+desc".
> Can I have an order over 2 fields, something like
> "sort=field1+desc&field5+desc"?

Yes, you can:

sort=field1+desc,field5+desc

http://wiki.apache.org/solr/CommonQueryParameters#sort

Koji
--
http://www.rondhuit.com/en/
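Following up on Koji's syntax, a sketch of the full request (the handler path and query are placeholders; note field5 is sorted ascending here so that the field5=0 documents come first, per the original requirement, and a sort field must be indexed and single-valued):

http://localhost:8080/solr/select?q=...&sort=field5+asc,score+desc

Putting field5 first in the sort overrides relevance across the two groups and lets score only break ties within each group; if you would rather just demote the field5=1 documents without a hard ordering, a boost query such as bq=field5:0^10 on the dismax handler is a common alternative.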
Re: synonym filter problem for string or phrase
Hi Ranveer, If you don't specify a field in the q parameter, the search will be done against your default search field (defaultSearchField in schema.xml); is your default field a text_sync field? Regards,

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42

2010/4/29 Ranveer
> Hi,
>
> I am trying to configure a synonym filter.
> My requirement is: when a user searches by a phrase like "what is solr
> user?" then it should be replaced with "solr user".
> Something like: what is solr user? => solr user
>
> It seems to work fine when trying it via analysis.jsp but not via URL:
> http://localhost:8080/solr/core0/select?q="what is solr user?"
> or
> http://localhost:8080/solr/core0/select?q=what is solr user?
>
> Please guide me to achieve the desired result.
Re: Using NoOpMergePolicy (Lucene 2331) from Solr
Jason Rutherglen wrote:
> Tom,
>
> Interesting, can you post your findings after you've found them? :)
>
> Jason
>
> On Tue, Apr 27, 2010 at 2:33 PM, Burton-West, Tom wrote:
>> Is it possible to use the NoOpMergePolicy
>> (https://issues.apache.org/jira/browse/LUCENE-2331) from Solr?
>> We have very large indexes and always optimize, so we are thinking about
>> using a very large ramBufferSizeMB and a NoOpMergePolicy and then running
>> an optimize to avoid extra disk reads and writes.
>>
>> Tom Burton-West

I've never tried it, but NoMergePolicy and NoMergeScheduler can be specified in solrconfig.xml, along these lines:

<mergePolicy>org.apache.lucene.index.NoMergePolicy</mergePolicy>
<mergeScheduler>org.apache.lucene.index.NoMergeScheduler</mergeScheduler>
<ramBufferSizeMB>1000</ramBufferSizeMB>

Koji
--
http://www.rondhuit.com/en/
Re: synonym filter problem for string or phrase
On 4/29/10 8:50 PM, Marco Martinez wrote:
> Hi Ranveer, If you don't specify a field in the q parameter, the search
> will be done against your default search field; is your default field a
> text_sync field?

Hi Marco, thanks. Yes, my default search field is text_sync. I am getting results now, but not as I expected. The following is my synonyms.txt:

what is bone cancer=>bone cancer
what is bone cancer?=>bone cancer
what is of bone cancer=>bone cancer
what is symptom of bone cancer=>bone cancer
what is symptoms of bone cancer=>bone cancer

With the above I am getting results for all of the synonyms except the last one, "what is symptoms of bone cancer=>bone cancer". I think it is due to stemming that I am not getting the expected result. However, when I check via analysis.jsp, it gives the expected result. I am confused..

Also, I want to know the best approach to configuring synonyms for my requirement.

thanks with regards
Re: Slow Date-Range Queries
Hmmm, what does the rest of your query look like? And does adding &debugQuery=on show anything interesting?

Best
Erick

On Thu, Apr 29, 2010 at 6:54 AM, Jan Simon Winkelmann <winkelm...@newsfactory.de> wrote:
> > > ((valid_from:[* TO 2010-04-29T10:34:12Z]) AND
> > > (valid_till:[2010-04-29T10:34:12Z TO *])) OR ((*:*
> > > -valid_from:[* TO *]) AND (*:* -valid_till:[* TO *]))
> > >
> > > I use the empty checks for datasets which do not have a
> > > valid from/till range.
> > >
> > > Is there any way to get this any faster?
> >
> > 1-) valid_till:[* TO *] and valid_from:[* TO *] type queries can be
> > performance killers. You can create a new boolean field (populated via
> > conditional copy or populated client-side) that holds the information
> > whether valid_from exists or not. So valid_till:[* TO *] can be
> > rewritten as valid_till_bool:true.
>
> That may be an idea, however I checked what happens when I simply leave
> them out. It does affect the performance but the query is still somewhere
> around 1 second.
>
> > 2-) If you are embedding these queries into the q parameter, you can move
> > your clauses into (filter query) fq parameters so that they are cached.
>
> The problem here is that the timestamp itself changes quite a bit and
> hence cannot be properly cached. It could be for a few seconds, but
> occasional response times of more than a second are still unacceptable for
> us. We need a solution that responds quickly ALL the time, not just most of
> the time.
>
> Thanks for your ideas though :)
>
> regards,
> Jan-Simon
Solr configuration to enable indexing/searching webapp log files
I thought I remembered seeing some information about this, but have been unable to find it. Does anyone know if there is a configuration / module that would allow us to set up Solr to take in the (large) log files generated by our web/app servers, so that we can query for things like peak-time requests or most frequently requested web pages etc.? Thanks Stefan Maric
Re: Solr Cloud & Gossip Protocols
Thanks, I'm looking at the atomic broadcast messaging protocol of ZooKeeper and think I have found what I was looking for...

- Jon

On Apr 28, 2010, at 11:27 PM, Yonik Seeley wrote:
> On Wed, Apr 28, 2010 at 2:23 PM, Jon Baer wrote:
>> From what I understand Cassandra uses a generic gossip protocol for node
>> discovery (custom), will Solr Cloud have something similar?
>
> SolrCloud uses ZooKeeper, so node discovery is a simple matter of
> looking there. Nodes are responsible for registering themselves in
> ZooKeeper.
>
> -Yonik
> Apache Lucene Eurocon 2010
> 18-21 May 2010 | Prague
Re: Solr configuration to enable indexing/searching webapp log files
Good question, +1 on finding an answer. My take: depending on how large the log files are, you might be better off doing this w/ HDFS / Hadoop (and a scripting language like Pig) (or Amazon EMR):

http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873

Theoretically you could split the logs into fields, use a data importer, and search / sort w/ something like LineEntityProcessor:

http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor

I've tried to use Solr as a log analytics tool (before DataImportHandler existed) and it was not worth the disk space or practical, but I'd love to hear otherwise. In general you could flush daily logs to an index, but working w/ the data in another context, if you had to, seems a better fit for HDFS (I think).

- Jon

On Apr 29, 2010, at 1:46 PM, Stefan Maric wrote:
> I thought I remembered seeing some information about this, but have been
> unable to find it.
>
> Does anyone know if there is a configuration / module that would allow us
> to set up Solr to take in the (large) log files generated by our web/app
> servers, so that we can query for things like peak-time requests or most
> frequently requested web pages etc.?
>
> Thanks
> Stefan Maric
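For the DIH route, a minimal data-config.xml sketch (the file path and field names are assumptions; LineEntityProcessor hands each line of the file to the pipeline in a field called rawLine):

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="accessLog"
            processor="LineEntityProcessor"
            url="/var/log/tomcat6/access.log"
            rootEntity="true">
      <!-- index each raw log line as-is; splitting it into real fields
           (timestamp, path, status...) would need a transformer -->
      <field column="rawLine" name="line"/>
    </entity>
  </document>
</dataConfig>

Each line becomes one Solr document; to make this useful for "peak time" style queries you would add a transformer (e.g. RegexTransformer) to break rawLine into typed fields.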
Evangelism
Hi, I'm new to the list here. I'd like to steer someone in the direction of Solr, and I see the list of companies using Solr, but none have a "powered by Solr" logo or anything. Does anyone have any great links with evidence of majorly successful Solr projects? Thanks in advance, Dan B.
Re: Evangelism
A very abbreviated list of sites using Apache Solr + Drupal here: http://drupal.org/node/447564 -Peter On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman wrote: > Hi I'm new to the list here, > > > > I'd like to steer someone in the direction of Solr, and I see the list of > companies using solr, but none have a "power by solr" logo or anything. > > > > Does anyone have any great links with evidence to majorly successful solr > projects? > > > > Thanks in advance, > > > > Dan B. > > > > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Evangelism
Checkout Lucid Imagination http://www.lucidimagination.com/About-Search This should convince you. On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman wrote: > Hi I'm new to the list here, > > > > I'd like to steer someone in the direction of Solr, and I see the list of > companies using solr, but none have a "power by solr" logo or anything. > > > > Does anyone have any great links with evidence to majorly successful solr > projects? > > > > Thanks in advance, > > > > Dan B. > > > > -- "Good Enough" is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Evangelism
Their main search page has the "Powered by Solr" logo http://www.lucidimagination.com/search/ On Thu, Apr 29, 2010 at 2:18 PM, Israel Ekpo wrote: > Checkout Lucid Imagination > > http://www.lucidimagination.com/About-Search > > This should convince you. > > > On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman wrote: > >> Hi I'm new to the list here, >> >> >> >> I'd like to steer someone in the direction of Solr, and I see the list of >> companies using solr, but none have a "power by solr" logo or anything. >> >> >> >> Does anyone have any great links with evidence to majorly successful solr >> projects? >> >> >> >> Thanks in advance, >> >> >> >> Dan B. >> >> >> >> > > > -- > "Good Enough" is not good enough. > To give anything less than your best is to sacrifice the gift. > Quality First. Measure Twice. Cut Once. > http://www.israelekpo.com/ > -- "Good Enough" is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
RE: Evangelism
I had a very hard time selling Solr to business folks. Most are of the mind that if you're not paying for something it can't be any good. That might also be why they refrain from posting 'powered by solr' on their website, as if it might show them to be cheap. They are also fearful of lack of support should you get hit by a bus. This might be remedied by recommending professional services from a company such as lucid imagination. I think your best bet is to create a working demo with your data and show them the performance. Cheers, -Kallin Nagelberg -Original Message- From: Israel Ekpo [mailto:israele...@gmail.com] Sent: Thursday, April 29, 2010 2:19 PM To: solr-user@lucene.apache.org Subject: Re: Evangelism Their main search page has the "Powered by Solr" logo http://www.lucidimagination.com/search/ On Thu, Apr 29, 2010 at 2:18 PM, Israel Ekpo wrote: > Checkout Lucid Imagination > > http://www.lucidimagination.com/About-Search > > This should convince you. > > > On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman wrote: > >> Hi I'm new to the list here, >> >> >> >> I'd like to steer someone in the direction of Solr, and I see the list of >> companies using solr, but none have a "power by solr" logo or anything. >> >> >> >> Does anyone have any great links with evidence to majorly successful solr >> projects? >> >> >> >> Thanks in advance, >> >> >> >> Dan B. >> >> >> >> > > > -- > "Good Enough" is not good enough. > To give anything less than your best is to sacrifice the gift. > Quality First. Measure Twice. Cut Once. > http://www.israelekpo.com/ > -- "Good Enough" is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Evangelism
A lot of high performing websites use MySQL, Oracle and Microsoft SQL Server for data storage and other RDBMS needs without necessarily putting the "powered by" logo on the sites. If you need the certified version of Apache Solr, you can contact Lucid Imagination. Just like MySQL, Apache Solr and Apache Lucene also have commercial backing (from Lucid Imagination) if you choose to go that route. On Thu, Apr 29, 2010 at 2:24 PM, Nagelberg, Kallin < knagelb...@globeandmail.com> wrote: > I had a very hard time selling Solr to business folks. Most are of the mind > that if you're not paying for something it can't be any good. That might > also be why they refrain from posting 'powered by solr' on their website, as > if it might show them to be cheap. They are also fearful of lack of support > should you get hit by a bus. This might be remedied by recommending > professional services from a company such as lucid imagination. > > I think your best bet is to create a working demo with your data and show > them the performance. > > Cheers, > -Kallin Nagelberg > > > > -Original Message- > From: Israel Ekpo [mailto:israele...@gmail.com] > Sent: Thursday, April 29, 2010 2:19 PM > To: solr-user@lucene.apache.org > Subject: Re: Evangelism > > Their main search page has the "Powered by Solr" logo > > http://www.lucidimagination.com/search/ > > > > On Thu, Apr 29, 2010 at 2:18 PM, Israel Ekpo wrote: > > > Checkout Lucid Imagination > > > > http://www.lucidimagination.com/About-Search > > > > This should convince you. > > > > > > On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman >wrote: > > > >> Hi I'm new to the list here, > >> > >> > >> > >> I'd like to steer someone in the direction of Solr, and I see the list > of > >> companies using solr, but none have a "power by solr" logo or anything. > >> > >> > >> > >> Does anyone have any great links with evidence to majorly successful > solr > >> projects? > >> > >> > >> > >> Thanks in advance, > >> > >> > >> > >> Dan B. > >> > >> > >> > >> > > > > > > -- > > "Good Enough" is not good enough. > > To give anything less than your best is to sacrifice the gift. > > Quality First. Measure Twice. Cut Once. > > http://www.israelekpo.com/ > > > > > > -- > "Good Enough" is not good enough. > To give anything less than your best is to sacrifice the gift. > Quality First. Measure Twice. Cut Once. > http://www.israelekpo.com/ > -- "Good Enough" is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Evangelism
This is a Lucene story, but may well apply... By the time I'd sent a request for assistance to the vendor of one of our search tools and received the reply "you didn't give us the right license number", I'd found Lucene, indexed part of my corpus and run successful searches against it. And had answers provided to me from the users list.

Paying for support provides, I believe, a false sense of security. Once you sign up, you're at the mercy of the vendor for many things, among them:
1> releases are far apart
2> if the company gets purchased, all sorts of interesting things happen. Witness Microsoft buying FAST recently, then announcing they were not doing any more development on *nix platforms.
3> If the company does go out of business, you are stuck with binary code you can't compile/run/fix/understand.
4> You are at the mercy of the next release for "really gotta have it now" changes. Unless you're willing to pay...er...a considerable sum to get a special fix, which may not even be an option.

That said, not all open source products are great, I just happen to think that SOLR/Lucene is. Add to that that problems that are found are often fixed in a day or two, a record that no commercial package I've ever used has matched.

Here's one technique you can use to sell it to management. Get a pilot up and running in, oh, say three days (ok, take a week). Try the same thing with commercial package X. Do not, under any circumstances, be satisfied with the powerpoint presentation from a commercial vendor. Require working code. Then evaluate ...

Best
Erick

On Thu, Apr 29, 2010 at 2:24 PM, Nagelberg, Kallin <knagelb...@globeandmail.com> wrote:
> I had a very hard time selling Solr to business folks. Most are of the mind
> that if you're not paying for something it can't be any good. That might
> also be why they refrain from posting 'powered by solr' on their website, as
> if it might show them to be cheap. They are also fearful of lack of support
> should you get hit by a bus. This might be remedied by recommending
> professional services from a company such as Lucid Imagination.
>
> I think your best bet is to create a working demo with your data and show
> them the performance.
>
> Cheers,
> -Kallin Nagelberg
RE: Evangelism
Netflix search is built with Solr. That seems like a fairly big and recognizable company.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, April 29, 2010 11:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Evangelism

This is a Lucene story, but may well apply... By the time I'd sent a request for assistance to the vendor of one of our search tools and received the reply "you didn't give us the right license number", I'd found Lucene, indexed part of my corpus and run successful searches against it. And had answers provided to me from the users list.

Paying for support provides, I believe, a false sense of security. Once you sign up, you're at the mercy of the vendor for many things, among them:
1> releases are far apart
2> if the company gets purchased, all sorts of interesting things happen. Witness Microsoft buying FAST recently, then announcing they were not doing any more development on *nix platforms.
3> If the company does go out of business, you are stuck with binary code you can't compile/run/fix/understand.
4> You are at the mercy of the next release for "really gotta have it now" changes. Unless you're willing to pay...er...a considerable sum to get a special fix, which may not even be an option.

That said, not all open source products are great, I just happen to think that SOLR/Lucene is. Add to that that problems that are found are often fixed in a day or two, a record that no commercial package I've ever used has matched.

Here's one technique you can use to sell it to management. Get a pilot up and running in, oh, say three days (ok, take a week). Try the same thing with commercial package X. Do not, under any circumstances, be satisfied with the powerpoint presentation from a commercial vendor. Require working code. Then evaluate ...

Best
Erick
RE: Evangelism
Forgot the link.

http://www.lucidimagination.com/Community/Marketplace/Application-Showcase-Wiki/Netflix

-----Original Message-----
From: Jason Chaffee [mailto:jchaf...@ebates.com]
Sent: Thursday, April 29, 2010 11:52 AM
To: solr-user@lucene.apache.org
Subject: RE: Evangelism

Netflix search is built with Solr. That seems like a fairly big and recognizable company.
Re: Solr configuration to enable indexing/searching webapp log files
To follow up on it ... it seems dumping logs to Solr is common ... http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data - Jon On Apr 29, 2010, at 1:58 PM, Jon Baer wrote: > Good question, +1 on finding an answer, my take ... > > Depending on how large the log files you are talking about are, you might be better > off doing this w/ HDFS / Hadoop (and a script language like Pig) (or Amazon > EMR) > > http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873 > > Theoretically you could split the logs into fields, use a dataimporter and > search / sort w/ something like LineEntityProcessor. > > http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor > > I've tried to use Solr as a log analytics tool (before dataimporthandler) and > it was not worth the disk space or practical, but I'd love to hear otherwise. > In general you could flush daily logs to an index, but working w/ the data in > another context if you had to seems a better fit for HDFS use (I think). > > - Jon > > On Apr 29, 2010, at 1:46 PM, Stefan Maric wrote: > >> >> I thought I remembered seeing some information about this, but have been >> unable to find it. >> >> Does anyone know if there is a configuration / module that would allow us to >> set up Solr to take in the (large) log files generated by our web/app >> servers, so that we can query for things like peak time requests or most >> frequently requested web page etc. >> >> Thanks >> Stefan Maric >> >
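For reference, a minimal DataImportHandler data-config sketch of the LineEntityProcessor idea Jon mentions (the log path and the raw_line field name are made up; a real setup would add a RegexTransformer to split each line into fields):

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8" />
    <document>
      <!-- LineEntityProcessor reads the file one line at a time and
           exposes each line in a column called 'rawLine' -->
      <entity name="logLine"
              processor="LineEntityProcessor"
              url="/var/log/myapp/access.log"
              rootEntity="true">
        <field column="rawLine" name="raw_line" />
      </entity>
    </document>
  </dataConfig>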
Re: Evangelism
Hi Daniel, There are lots of sites running Solr ranging from very large to very small. Because it is open source, people aren't required to report, but there are several places where people have reported: http://wiki.apache.org/solr/PublicServers http://www.lucidimagination.com/developer/Community/Application-Showcase-Wiki You can also see a number of case studies at: http://www.lucidimagination.com/solutions/documents From those lists, you'll see recognizable names like AT&T, StubHub, CNET, Digg, MTV/Viacom, The Motley Fool, Disney, Netflix, etc. Hope that helps, Grant On Apr 29, 2010, at 2:10 PM, Daniel Baughman wrote: > Hi I'm new to the list here, > > > > I'd like to steer someone in the direction of Solr, and I see the list of > companies using solr, but none have a "powered by Solr" logo or anything. > > > > Does anyone have any great links with evidence to majorly successful solr > projects? > > > > Thanks in advance, > > > > Dan B. > > > -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
RE: Evangelism
ColdFusion 9 is now shipping with it, as well. Thanks everyone for the input. -Original Message- From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll Sent: Thursday, April 29, 2010 1:35 PM To: solr-user@lucene.apache.org Subject: Re: Evangelism Hi Daniel, There are lots of sites running Solr ranging from very large to very small. Because it is open source, people aren't required to report, but there are several places where people have reported: http://wiki.apache.org/solr/PublicServers http://www.lucidimagination.com/developer/Community/Application-Showcase-Wiki You can also see a number of case studies at: http://www.lucidimagination.com/solutions/documents From those lists, you'll see recognizable names like AT&T, StubHub, CNET, Digg, MTV/Viacom, The Motley Fool, Disney, Netflix, etc. Hope that helps, Grant On Apr 29, 2010, at 2:10 PM, Daniel Baughman wrote: > Hi I'm new to the list here, > > > > I'd like to steer someone in the direction of Solr, and I see the list of > companies using solr, but none have a "powered by Solr" logo or anything. > > > > Does anyone have any great links with evidence to majorly successful solr > projects? > > > > Thanks in advance, > > > > Dan B. > > > -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
Re: benefits of float vs. string
Floats are Trie types and are stored in a compressed format. They will search faster. They will also sort with much less space. One thing to point out is that doing bitwise comparison on floats is to live in a state of sin. Your string representations must parse exactly right. On Wed, Apr 28, 2010 at 8:22 AM, Nagelberg, Kallin wrote: > Hi, > > Does anyone have an idea about the performance benefits of searching across > floats compared to strings? I have one multi-valued field that contains about > 3000 distinct IDs across 5 million documents. I am going to be doing a lot of > queries like q=id:102 OR id:303 OR id:305, etc. Right now it is a String but > I am going to switch to a float as intuitively it ought to be easier to > filter a number than a string. I'm just curious if this should in fact bring > a benefit, and more generally what the benefits/penalties of using numerical > over string field types are. > > Thanks, > Kallin Nagelberg > -- Lance Norskog goks...@gmail.com
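For reference, a minimal schema.xml sketch of the two alternatives being discussed (type and field names are illustrative, Solr 1.4 syntax):

  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  <!-- Trie-encoded float; precisionStep > 0 also speeds up range queries -->
  <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true"/>

  <field name="id_s" type="string" indexed="true" stored="false" multiValued="true"/>
  <field name="id_f" type="tfloat" indexed="true" stored="false" multiValued="true"/>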
Re: benefits of float vs. string
On Wed, Apr 28, 2010 at 11:22 AM, Nagelberg, Kallin wrote: > Does anyone have an idea about the performance benefits of searching across > floats compared to strings? I have one multi-valued field that contains about > 3000 distinct IDs across 5 million documents. I am going to be doing a lot of > queries like q=id:102 OR id:303 OR id:305, etc. Right now it is a String but > I am going to switch to a float as intuitively it ought to be easier to > filter a number than a string. There won't be any difference in search speed for term queries as you show above. If you don't need to do sorting or range queries on that field, I'd leave it as a String. -Yonik Apache Lucene Eurocon 2010 18-21 May 2010 | Prague
Re: Security/authentication strategies
Thanks for this Peter. I have managed to get this working with Tomcat. Andrew On 29 April 2010 12:11, Peter Sturge wrote: > Hi Andrew, > > Today, authentication is handled by the container (e.g. Tomcat, Jetty etc.). > > > There's a thread I found to be very useful on this topic here: > > http://www.lucidimagination.com/search/document/d1e338dc452db2e4/how_can_i_protect_the_solr_cores > > This was for Jetty, but the idea is pretty much the same for Tomcat. > > HTH > > Peter > > > > On Thu, Apr 29, 2010 at 8:42 AM, Andrew McCombe wrote: > >> Hi >> >> I'm planning on adding some protection to our solr servers and would >> like to know what others are doing in this area. >> >> Basically I have a few solr cores running under tomcat6 and all use DH >> to populate the solr index. This is all behind a firewall and only >> accessible from certain IP addresses. Access to Solr Admin is open to >> anyone in the company and many use it for checking data is in the >> index and simple analysis. However, they can also trigger a >> full-import if they are careless (one of the cores takes 6 hours to >> ingest the data). >> >> What would be the recommended way of protecting things like the DIH >> functionality? HTTP Authentication via tomcat realms or are there any >> other solutions? >> >> Thanks >> Andrew McCombe >> iWeb Solutions >> >
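For anyone setting this up, a minimal web.xml security-constraint sketch of the container-auth approach (the URL pattern and role name are hypothetical; adjust them to your handler paths and deployment):

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>DataImportHandler</web-resource-name>
      <!-- lock down only the import endpoint, leaving search open -->
      <url-pattern>/dataimport</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>solr-admin</role-name>
    </auth-constraint>
  </security-constraint>
  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>Solr</realm-name>
  </login-config>

The role is then mapped to users in the Tomcat realm (e.g. tomcat-users.xml for the default UserDatabase realm).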
Solr Dismax query - prefix matching
Folks, Greetings. Using the dismax query parser, is there a way to perform prefix matching? For example: if I have a field called 'booktitle' with the actual values 'Code Complete' and 'Coding standard 101', then I'd like to search for the query string 'cod' and have dismax match both book titles, since 'cod' is a prefix match for 'code' and 'coding'. Thanks, Bharath
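One common way to get this effect, since dismax does not do wildcard/prefix expansion, is to index edge n-grams of the title and let ordinary term matching do the work. A minimal sketch (the type name, gram sizes, and copyField target are all illustrative):

  <fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- 'coding' is indexed as c, co, cod, codi, codin, coding -->
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="booktitle_prefix" type="text_prefix" indexed="true" stored="false"/>
  <copyField source="booktitle" dest="booktitle_prefix"/>

Adding booktitle_prefix to the dismax qf (probably with a lower boost than the exact-match field) then makes 'cod' match both titles.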
RE: Using NoOpMergePolicy (Lucene 2331) from Solr
Thanks Koji, That was the information I was looking for. I'll be sure to post the test results to the list. It may be a few weeks before we can schedule the tests for our test server. Tom >>I've never tried it but NoMergePolicy and NoMergeScheduler >>can be specified in solrconfig.xml: >> 1000 >> >> Koji -- http://www.rondhuit.com/en/
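Koji's XML snippet was stripped by the list archiver (only the value 1000 survived); a hedged reconstruction of what such a solrconfig.xml fragment might look like, using the LUCENE-2331 class names, is:

  <indexDefaults>
    <!-- NoMergePolicy / NoMergeScheduler disable background segment merging -->
    <mergePolicy>org.apache.lucene.index.NoMergePolicy</mergePolicy>
    <mergeScheduler>org.apache.lucene.index.NoMergeScheduler</mergeScheduler>
    <!-- the stray '1000' was presumably a large mergeFactor, shown as an
         alternative way to all-but-disable merging -->
    <mergeFactor>1000</mergeFactor>
  </indexDefaults>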
Re: StreamingUpdateSolrServer hangs
On Fri, Apr 16, 2010 at 1:34 PM, Sascha Szott wrote: > In my case the whole application hangs and never recovers (CPU utilization > goes down to near 0%). Interestingly, the problem reproducibly occurs only > if SUSS is created with *more than 2* threads. Is your application also using multiple threads when adding docs to the SUSS? FYI, I opened https://issues.apache.org/jira/browse/SOLR-1885 to track this. -Yonik Apache Lucene Eurocon 2010 18-21 May 2010 | Prague
Re: Relevancy Practices
I think the problems one has to solve depend on the use cases one has to deal with. It makes a difference whether I have many documents that are nearly identical but with different contexts, and I have to determine which query applies to which context with what probability for which document - or whether I have lots of editorially managed documents with relatively clear contexts, because they offer human-created tags etc. I haven't had much experience with Solr (and none in a production environment). However, the experience I have had shows that splitting a document's context into parts as small as possible is always a good idea. I don't mean splitting in the sense of making the parts of a document smaller; I mean making it easier to decide which part of a document is more important than another. e.g.: I have a social network and every user is able to create his or her own blog - as a corporation I want to make them all searchable. It would be beneficial for high-quality search if I were able to extract the introduction and the category (maybe added by the author). Accordingly: if this is not done by people, or not done well enough, then I need to do it algorithmically. e.g.: if I have a dictionary of person names, then I could use the keepWordFilter to create a field I can facet *and* boost on. Let's say a user writes about Paris Hilton, Barack Obama or any other well-known person; then I can extract their names from the content in an easy way - of course this could be done better, but that's not the point here. If I search for "Obama's speech", all documents with "Obama" could get a boost. The difference without this keepWordFilter feature would be that Solr does not know that the most important word in this query is "Obama". This is only a shortcut of some ideas on how one can improve relevancy with several features that Solr offers out of the box. Some of them could be improved with external NLP tools. My biggest problem with relevancy is that I can't work with metadata computed on the fly or every hour out of the box (okay, you mentioned in the discussion on the dev-list that it may be possible; however, I answered that the feature you talked about is not well documented, so I don't know whether it fits my needs or how to use it). How to avoid over- or under-tuning? Easily: test every change I make to scoring factors against a lot of queries. If it looks really good in 9 of 10 cases, then the 10th case is running against a genuinely bad query, or could be solved with a facet, or... there are a lot of ideas for solving this. What I really want to say is: test as much as you can and try to understand what your changes really mean (for example, I can boost the title of a document with a value of 1,000 while every other field has a boost value between 1 and 10; I am relatively sure that this meets the needs of some queries but works catastrophically for the rest). It really helps to understand how Lucene's similarity works and what those factors mean in reality for your existing data. Maybe you need to change the similarity, because you don't want the length of a document to influence its score. Just some thoughts. I don't think I'm telling you much that's new; however, if you have any questions or want to know more about this or that, please ask. Unfortunately I can't go to ApacheCon, but hopefully this helps you give a good presentation. 
Kind regards - Mitch -- View this message in context: http://lucene.472066.n3.nabble.com/Relevancy-Practices-tp765364p766456.html Sent from the Solr - User mailing list archive at Nabble.com.
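A minimal schema sketch of the keep-word idea above (the dictionary file and field names are made up; note that as written it only keeps single-token names, so multi-word names like 'Paris Hilton' would need shingling or a synonym step first):

  <fieldType name="person_names" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- discard every token not present in the dictionary -->
      <filter class="solr.KeepWordFilterFactory" words="person_names.txt" ignoreCase="true"/>
    </analyzer>
  </fieldType>

  <field name="persons" type="person_names" indexed="true" stored="false" multiValued="true"/>
  <copyField source="content" dest="persons"/>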
Re: Using QueryElevationComponent without specifying top results?
What you want is: All results within the area and whatever results the QueryElevationComponent adds, sorted by some relevance function. If this is it, you can get the results, with the elevated output, and do a second query with all of the ids, sorted by distance. This second query would not use the filter query. On Thu, Apr 29, 2010 at 4:04 AM, Oliver Beattie wrote: > Just wondering if anyone had any further thoughts on how I might do this? > > On 26 April 2010 19:18, Oliver Beattie wrote: > >> Hi Grant, >> >> Thanks for getting back to me. Yes, indeed, #1 is exactly what I'm looking >> for. Results are already ranked by distance (among other things), but we >> need the ability to manually include a certain result in the set. They >> wouldn't usually match, because they fall outside the radius of the filter >> query we use. Most of the resulting score comes from function queries (we >> have a number of metrics that rank listings [price, feedback score, etc]), >> so the score from the text search doesn't have *that much* bearing on the >> outcome. So, yeah, basically, I'm looking for a way to include results that >> don't match, but have Solr calculate its score as it would if it did match >> the filter query. Sorry for being so unclear and rambling a bit, I'm >> struggling to articulate what we want in a clear manner! >> >> —Oliver >> >> >> >> On 26 April 2010 19:13, Grant Ingersoll wrote: >> >>> >>> On Apr 26, 2010, at 7:53 AM, Oliver Beattie wrote: >>> >>> > Hi all, >>> > >>> > I'm currently writing an application that uses Solr, and we'd like to >>> use >>> > something like the QueryElevationComponent, without having to specify >>> which >>> > results appear top. For example, what we really need is a way to say >>> "for >>> > this search, include these results as part of the result set, and rank >>> them >>> > as you normally would". We're using a filter to specify which results we >>> > want included (which is distance-based), but we really want to be able >>> to >>> > explicitly include certain results in certain queries (i.e. we want to >>> > include a listing more than 5 miles away from a particular location for >>> > certain queries). >>> > >>> > Is this possible? Any help would be really appreciated :) >>> >>> >>> I'm not following the "rank them as you normally would" part. If Solr >>> were already finding them, then they would already be ranked and showing up >>> in the results and you wouldn't need to "hardcode" them, right? So, that >>> leaves a couple of cases: >>> >>> 1. Including results that don't match >>> 2. Elevating results that do match >>> >>> In your case, it sounds like you mostly just want #1. And, based on the >>> context (distance search) perhaps you want those results sorted by distance? >>> Otherwise, how else would you know where to inject the results? >>> >>> The QueryElevationComponent can include the results, although, I must >>> admit, I'm not 100% certain on what happens to injected results given >>> sorting. >>> >>> >>> -- >>> Grant Ingersoll >>> http://www.lucidimagination.com/ >>> >>> Search the Lucene ecosystem using Solr/Lucene: >>> http://www.lucidimagination.com/search >>> >>> >> > -- Lance Norskog goks...@gmail.com
Re: Slow Date-Range Queries
Do you really need the *:* stuff in the date range subqueries? That may add to the execution time. On Thu, Apr 29, 2010 at 9:52 AM, Erick Erickson wrote: > Hmmm, what does the rest of your query look like? And does adding > &debugQuery=on show anything interesting? > > Best > Erick > > On Thu, Apr 29, 2010 at 6:54 AM, Jan Simon Winkelmann < > winkelm...@newsfactory.de> wrote: > >> > > ((valid_from:[* TO 2010-04-29T10:34:12Z]) AND >> > > (valid_till:[2010-04-29T10:34:12Z TO *])) OR ((*:* >> > > -valid_from:[* TO *]) AND (*:* -valid_till:[* TO *]))) >> > > >> > > I use the empty checks for datasets which do not have a >> > > valid from/till range. >> > > >> > > >> > > Is there any way to get this any faster? >> > >> > I can suggest two things. >> > >> > 1-) valid_till:[* TO *] and valid_from:[* TO *] type queries can be a >> > performance killer. You can create a new boolean field (populated via >> > conditional copy or populated client side) that holds the information >> > whether valid_from exists or not, so that valid_till:[* TO *] can be >> > rewritten as valid_till_bool:true. >> >> That may be an idea; however, I checked what happens when I simply leave >> them out. It does affect the performance, but the query still takes somewhere >> around 1 second. >> >> > 2-) If you are embedding these queries into the q parameter, you can put >> > your clauses into (filter query) fq parameters so that they are cached. >> >> The problem here is that the timestamp itself changes quite a bit and >> hence cannot be properly cached. It could be cached for a few seconds, but >> occasional response times of more than a second are still unacceptable for >> us. We need a solution that responds quickly ALL the time, not just most of >> the time. >> >> Thanks for your ideas though :) >> >> regards, >> Jan-Simon >> >> > -- Lance Norskog goks...@gmail.com
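To make suggestion 1 concrete, a sketch of the rewritten query with hypothetical boolean fields (valid_from_bool / valid_till_bool would be indexed as true whenever the corresponding date exists):

  (valid_from:[* TO 2010-04-29T10:34:12Z] AND valid_till:[2010-04-29T10:34:12Z TO *])
  OR (valid_from_bool:false AND valid_till_bool:false)

On the caching point: if second-level precision is not actually required, rounding the boundary with Solr date math (e.g. NOW/MINUTE or NOW/HOUR) makes the clause identical across requests for that interval, so moving it into an fq becomes cacheable after all.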
Re: Solr configuration to enable indexing/searching webapp log files
It sounds like you want a data warehouse, not a text search engine. Splunk and Pentaho are good things to try. On Thu, Apr 29, 2010 at 12:03 PM, Jon Baer wrote: > To follow up it ... it seems dumping to Solr is common ... > > http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data > > - Jon > > On Apr 29, 2010, at 1:58 PM, Jon Baer wrote: > >> Good question, +1 on finding answer, my take ... >> >> Depending on how large of log files you are talking about it might be better >> off to do this w/ HDFS / Hadoop (and a script language like Pig) (or Amazon >> EMR) >> >> http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873 >> >> Theoretically you could split the logs to fields, use a dataimporter and >> search / sort w/ something like LineEntityProcessor. >> >> http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor >> >> I've tried to use Solr as a log analytics tool (before dataimporthandler) >> and it was not worth the disk space or practical but I'd love to hear >> otherwise. In general you could flush daily logs to an index but working w/ >> the data in another context if you had to seems better fit for HDFS use (I >> think). >> >> - Jon >> >> On Apr 29, 2010, at 1:46 PM, Stefan Maric wrote: >> >>> >>> I thought i remembered seeing some information about this, but have been >>> unable to find it >>> >>> Does anyone know if there is a configuration / module that would allow us to >>> setup Solr to take in the (large) log files generated by our web/app >>> servers, so that we can query for things like peak time requests or most >>> frequently requested web page etc >>> >>> Thanks >>> Stefan Maric >>> >> > > -- Lance Norskog goks...@gmail.com
Re: StreamingUpdateSolrServer hangs
In solrconfig.xml, there is a parameter controlling remote streaming: <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048000" /> 1) Is this relevant with the SUSS? 2) It seems to be 'true' in the example default, which may not be a good idea. On Thu, Apr 29, 2010 at 2:12 PM, Yonik Seeley wrote: > On Fri, Apr 16, 2010 at 1:34 PM, Sascha Szott wrote: >> In my case the whole application hangs and never recovers (CPU utilization >> goes down to near 0%). Interestingly, the problem reproducibly occurs only >> if SUSS is created with *more than 2* threads. > > Is your application also using multiple threads when adding docs to the SUSS? > FYI, I opened https://issues.apache.org/jira/browse/SOLR-1885 to track this. > > -Yonik > Apache Lucene Eurocon 2010 > 18-21 May 2010 | Prague > -- Lance Norskog goks...@gmail.com
Re: Evangelism
DollarDays.com is currently using it and we display the powered-by logo as at least a gesture of giving back to the community. Ryan T. Grange, IT Manager DollarDays International, Inc. rgra...@dollardays.com (480)922-8155 x106 On 4/29/2010 11:10 AM, Daniel Baughman wrote: Hi I'm new to the list here, I'd like to steer someone in the direction of Solr, and I see the list of companies using solr, but none have a "powered by Solr" logo or anything. Does anyone have any great links with evidence to majorly successful solr projects? Thanks in advance, Dan B.
Re: StreamingUpdateSolrServer hangs
On Thu, Apr 29, 2010 at 6:04 PM, Lance Norskog wrote: > In solrconfig.xml, there is a parameter controlling remote streaming: > > <requestParsers enableRemoteStreaming="true" > multipartUploadLimitInKB="2048000" /> > > 1) Is this relevant with the SUSS? No, this relates to Solr pulling data from another source (via stream.url) -Yonik Apache Lucene Eurocon 2010 18-21 May 2010 | Prague
Re: StreamingUpdateSolrServer hangs
What is the garbage collection status when this happens? What are the open sockets in the OS when this happens? Run 'netstat -an | fgrep 8983' where 8983 is the Solr incoming port number. A side note on sockets: SUSS uses the MultiThreadedHttpConnectionManager but never calls MultiThreadedHttpConnectionManager.closeIdleConnections() on its sockets. I don't know if this is a problem, but it should do this as a matter of dotting the i's and crossing the t's. On Thu, Apr 29, 2010 at 3:25 PM, Yonik Seeley wrote: > On Thu, Apr 29, 2010 at 6:04 PM, Lance Norskog wrote: >> In solrconfig.xml, there is a parameter controlling remote streaming: >> >> >> > multipartUploadLimitInKB="2048000" /> >> >> 1) Is this relevant with the SUSS? > > No, this relates to solr pulling data from another source (via stream.url) > > -Yonik > Apache Lucene Eurocon 2010 > 18-21 May 2010 | Prague > -- Lance Norskog goks...@gmail.com
Re: StreamingUpdateSolrServer hangs
I'm trying to reproduce now... single thread adding documents to a multithreaded client, StreamingUpdateSolrServer(addr,32,4) I'm currently at the 2.5 hour mark and 100M documents - no issues so far. -Yonik Apache Lucene Eurocon 2010 18-21 May 2010 | Prague On Thu, Apr 29, 2010 at 5:12 PM, Yonik Seeley wrote: > On Fri, Apr 16, 2010 at 1:34 PM, Sascha Szott wrote: >> In my case the whole application hangs and never recovers (CPU utilization >> goes down to near 0%). Interestingly, the problem reproducibly occurs only >> if SUSS is created with *more than 2* threads. > > Is your application also using multiple threads when adding docs to the SUSS? > FYI, I opened https://issues.apache.org/jira/browse/SOLR-1885 to track this. > > -Yonik > Apache Lucene Eurocon 2010 > 18-21 May 2010 | Prague >
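For anyone else trying to reproduce it, a minimal SolrJ sketch along the lines of that test (the URL and document fields are made up; queueSize=32 and threadCount=4 match the constructor call above):

  import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class SussStress {
      public static void main(String[] args) throws Exception {
          // queueSize=32, threadCount=4, as in the test above
          StreamingUpdateSolrServer server =
              new StreamingUpdateSolrServer("http://localhost:8983/solr", 32, 4);
          for (int i = 0; i < 100000000; i++) {
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", Integer.toString(i));
              doc.addField("name", "document " + i);
              server.add(doc);   // a hang would show up as this call blocking
          }
          server.commit();
      }
  }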
Re: synonym filter problem for string or phrase
On 4/29/10 8:50 PM, Marco Martinez wrote: Hi Ranveer, If you don't specify a field in the q parameter, the search will be done against your default search field defined in solrconfig.xml. Is your default field a text_sync field? Regards, Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2010/4/29 Ranveer Hi, I am trying to configure the synonym filter. My requirement is: when a user searches a phrase like "what is solr user?", it should be replaced with "solr user". Something like: what is solr user? => solr user My schema for the particular field is: [field type definition stripped from the archived mail] It seems to work fine in analysis.jsp but not via the URL http://localhost:8080/solr/core0/select?q="what is solr user?" or http://localhost:8080/solr/core0/select?q=what is solr user? Please guide me to achieve the desired result. Hi Marco, thanks. Yes, my default search field is text_sync. I am getting results now, but not what I expect. The following is my synonym.txt: what is bone cancer=>bone cancer what is bone cancer?=>bone cancer what is of bone cancer=>bone cancer what is symptom of bone cancer=>bone cancer what is symptoms of bone cancer=>bone cancer With the above I get results for all of the synonyms except the last one, "what is symptoms of bone cancer=>bone cancer". I think I am not getting the expected result due to stemming. However, when I check the result in analysis.jsp, it gives the expected result. I am confused.. Also, I want to know the best approach to configuring synonyms for my requirement. Thanks, with regards Hi, I am also facing the same type of problem.. I am a newbie, please help. Thanks Jonty
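A note on the analysis.jsp-vs-URL discrepancy, for what it's worth: analysis.jsp feeds the analyzer the complete query string, but at search time the Lucene query parser splits an unquoted query on whitespace before the field analyzer ever runs, so a multi-word synonym rule may never see the whole phrase (a quoted phrase query does get analyzed as one string). That, or a stemmer sitting in front of the SynonymFilter (turning "symptoms" into "symptom" before the rule is checked), would produce exactly these symptoms. A minimal field-type sketch with the synonym mapping ahead of any stemming (the stemmer line is illustrative, since the actual chain was stripped from the mail):

  <fieldType name="text_sync" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- apply phrase rules before stemming so they match the raw words -->
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="false"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>

For query rewriting like this, the more robust approach is usually to map the phrase in the application layer before the query is sent, since query-time multi-word synonyms are fragile for the whitespace reason above.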
ubuntu lucid package
Hi, I've installed the solr-tomcat package on ubuntu lucid (10.04, latest). It automatically installs java and tomcat and hopefully all other dependencies. I can access tomcat at http://localhost:8080 but am not sure where to find the solr web admin; http://localhost:8180 gives me nothing. Is this package known to work? I've read that on previous ubuntu releases the packages were broken. Do I need to configure anything after installing the package? Thanks
Re: ubuntu lucid package
Pablo, Ubuntu Lucid is *brand* new :) try: find / -name \*solr\* or locate solr.war Or simply try http://localhost:8080/solr/admin/ Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: pablo platt > To: solr-user@lucene.apache.org > Sent: Thu, April 29, 2010 10:27:31 PM > Subject: ubuntu lucid package > > Hi I've installed solr-tomcat package on ubuntu lucid (10.04 > latest). It automatically install java and tomcat and hopefully all > other dependencies. I can access tomcat at > http://localhost:8080 but not sure where to find the solr > web admin > http://localhost:8180 gives me nothing. Is this package known to > work? I've read that on previous ubuntu releases the packages were > broken. Do I need to configure anything after installing the > package? Thanks
copyField - how does it work?
Hi, I have my config something like "clubbed_text" of type "text" and "clubbed_string" of type "string". : BLOCK-1... BLOCK-2... BLOCK-3... BLOCK-4... Is the copyField specified in BLOCK-4 valid? It seems it is not populating clubbed_string with the values of field_A and field_B. Do I need to populate clubbed_string by explicitly copying field_A and field_B directly to it? Please help. regards, Naga
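The schema blocks were stripped from the archived mail, but the likely answer to the last question is yes: copyField does not chain, so copying field_A and field_B into clubbed_text and then clubbed_text into clubbed_string will not get their values into clubbed_string; each source has to be copied explicitly. A sketch with the field names from the question:

  <field name="clubbed_text" type="text" indexed="true" stored="false" multiValued="true"/>
  <field name="clubbed_string" type="string" indexed="true" stored="false" multiValued="true"/>

  <!-- copyField copies from the original document values only;
       a copy of a copy is never made -->
  <copyField source="field_A" dest="clubbed_text"/>
  <copyField source="field_B" dest="clubbed_text"/>
  <copyField source="field_A" dest="clubbed_string"/>
  <copyField source="field_B" dest="clubbed_string"/>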
RE: How to make documents low priority
Thanks much Koji, let me have a look at this. Regards Prakash -Original Message- From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] Sent: Thursday, April 29, 2010 8:25 PM To: solr-user@lucene.apache.org Subject: Re: How to make documents low priority Doddamani, Prakash wrote: > Thanks Jon, > > It's a very nice idea, I didn't think about it. But I am already using an > order for one more field, "sort=field1+desc" > > Can I have an order for 2 fields, something like > "sort=field1+desc&field5+desc" > > Yes, you can: sort=field1+desc,field5+desc http://wiki.apache.org/solr/CommonQueryParameters#sort Koji -- http://www.rondhuit.com/en/
Re: benefits of float vs. string
Please explain a range query? tia :-) Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Thu, 4/29/10, Yonik Seeley wrote: > From: Yonik Seeley > Subject: Re: benefits of float vs. string > To: solr-user@lucene.apache.org > Date: Thursday, April 29, 2010, 1:01 PM > On Wed, Apr 28, 2010 at 11:22 AM, > Nagelberg, Kallin > > wrote: > > Does anyone have an idea about the performance > benefits of searching across floats compared to strings? I > have one multi-valued field that contains about 3000 > distinct IDs across 5 million documents. I am going to be a > lot of queries like q=id:102 OR id:303 OR id:305, etc. Right > now it is a String but I am going to switch to a float as > intuitively it ought to be easier to filter a number than a > string. > > > There won't be any difference in search speed for term > queries as you > show above. > If you don't need to do sorting or range queries on that > field, I'd > leave it as a String. > > > -Yonik > Apache Lucene Eurocon 2010 > 18-21 May 2010 | Prague >
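For the record: a range query matches every document whose field value falls between two endpoints, e.g. (field names are examples):

  price:[10 TO 100]         inclusive - matches 10 and 100 themselves
  name:{aardvark TO zebra}  exclusive - endpoints excluded
  timestamp:[* TO NOW]      open-ended on one side

On a string field the endpoints compare lexicographically; Trie fields exist largely to make such range queries fast, which is why the advice above is to stay with String when ranges and sorting are not needed.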
Elevation of of part match
I would like to be able to elevate documents if the query matches part of a string. For example, I would like to elevate the document FOO in case the query contains the word 'archive'. So when executing the queries "packet archive", "archive failure", "archive", all lead to the document FOO being elevated to the top. Playing with the elevate component, this is not the case. The component seems to only work with complete matches. Or? Thanks, Gert.
Trouble with parenthesis
Hi everybody, We have a problem with parentheses in a lucene/solr request (Solr 1.4): - {!lucene q.op=AND}( ville:"Moscou" -periodicite:"annuel") gives 254 documents, with parsedquery >+ville:Moscou -periodicite:annuel< in debug mode. That's correct. - {!lucene q.op=AND} (ville:"Moscou" AND NOT periodicite:"annuel") gives the same results. - {!lucene q.op=AND} (ville:"Moscou" AND (NOT periodicite:"annuel")) gives 0 documents, with parsedquery >+ville:Moscou +(-periodicite:annuel)< The 2 fields are standard string fields in the Solr schema. Is this an issue or the standard behavior of the Solr query parser? Best regards. Gilbert Boyreau
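For what it's worth, this matches the well-known pure-negative-clause behavior rather than a parser bug: a Lucene BooleanQuery whose clauses are all prohibited matches nothing, and while Solr special-cases a purely negative query at the top level, it does not do so inside nested parentheses, so (-periodicite:annuel) on its own selects no documents and the outer AND returns 0. The usual workaround is to anchor the negation against all documents, as in the date-range thread above:

  {!lucene q.op=AND} (ville:"Moscou" AND (*:* NOT periodicite:"annuel"))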
Any way to get top 'n' queries searched from Solr?
Hi, I need to know the top 'n' (say 100) search queries that users tried, i.e. the most frequently searched queries and their frequencies. Does Solr keep this information and can it return it, or else what options do I have here? Thanks, Praveen