RE: Require some advice

2010-08-12 Thread Nagelberg, Kallin
Try this, http://viewer.opencalais.com/ They have an open API for that data. With your text message of: "John Mayer Mumbai 411004 Juhu, car driver, also capable of body guard" It gives back: People: John Mayer Mumbai Positions: body guard, car driver. It's not perfect but it's not bad eithe

RE: How to 'filter' facet results

2010-07-28 Thread Nagelberg, Kallin
ManBearPig is still a threat. -Kallin Nagelberg -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Tuesday, July 27, 2010 7:44 PM To: solr-user@lucene.apache.org Subject: RE: How to 'filter' facet results > Is there a way to tell Solr to only return a specific se

RE: help with a schema design problem

2010-07-23 Thread Nagelberg, Kallin
if i understand correctly in solr it would store > > the > > > field like this: > > > > > > p_value: "Pramod" "Raj" > > > p_type: "Client" "Supplier" > > > > > > When i search > > >

RE: help with a schema design problem

2010-07-23 Thread Nagelberg, Kallin
I think you just want something like: p_value:"Pramod" AND p_type:"Supplier" no? -Kallin Nagelberg -Original Message- From: Pramod Goyal [mailto:pramod.go...@gmail.com] Sent: Friday, July 23, 2010 2:17 PM To: solr-user@lucene.apache.org Subject: help with a schema design problem Hi, L

solrj occasional timeout on commit

2010-07-23 Thread Nagelberg, Kallin
Hey, I recently moved a solr app from a testing environment into a production environment, and I'm seeing a brand new error which never occurred during testing. I'm seeing this in the solrJ-based app logs: org.apache.solr.common.SolrException: com.caucho.vfs.SocketTimeoutException: client tim

RE: faceted search with job title

2010-07-21 Thread Nagelberg, Kallin
Yeah you should definitely just setup a custom parser for each site.. should be easy to extract title using groovy's xml parsing along with tagsoup for sloppy html. If you can't find the pattern for each site leading to the job title how can you expect solr to? Humans have the advantage here :P

RE: how to eliminating scoring from a query?

2010-07-15 Thread Nagelberg, Kallin
How about: 1. Create a date field to indicate indextime. 2. Use a date filter to restrict articles to today and yesterday, such as myindexdate:"[NOW/DAY-1DAY TO NOW/DAY+1DAY]" 3. Sort on that field. -Kallin Nagelberg -Original Message- From: oferiko [mailto:ofer...@gmail.com] Sent: Th
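
A minimal sketch of how the three steps above might look as request parameters. It assumes the field is named `myindexdate` as in the message, that the range is sent as a filter query (`fq`), and that the range is written without surrounding quotes, which is how range syntax is normally written. The helper name `recent_articles_params` and its `user_query` argument are made up for illustration.

```python
from urllib.parse import urlencode


def recent_articles_params(user_query="*:*", date_field="myindexdate", rows=10):
    """Build Solr request parameters: filter to yesterday/today, sort by date.

    Assumption: date_field is a date field populated at index time.
    """
    return {
        "q": user_query,
        # Rounding to the day keeps the filter text identical all day long,
        # so the same cached filter can be reused across requests.
        "fq": f"{date_field}:[NOW/DAY-1DAY TO NOW/DAY+1DAY]",
        # Sorting on the date field replaces ordering by relevance score.
        "sort": f"{date_field} desc",
        "rows": str(rows),
    }


query_string = urlencode(recent_articles_params())
print(query_string)
```

The resulting string would be appended to the select URL of whatever Solr instance is in use.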

RE: limiting the total number of documents matched

2010-07-14 Thread Nagelberg, Kallin
So you want to take the top 1000 sorted by score, then sort those by another field. It's a strange case, and I can't think of a clean way to accomplish it. You could do it in two queries, where the first is by score and you only request your IDs to keep it snappy, then do a second query against

RE: Faceted search outofmemory

2010-06-29 Thread Nagelberg, Kallin
How much memory have you given the solr jvm? Many servlet containers have a small amount by default. -Kal -Original Message- From: olivier sallou [mailto:olivier.sal...@gmail.com] Sent: Tuesday, June 29, 2010 2:04 PM To: solr-user@lucene.apache.org Subject: Faceted search outofmemory Hi,

RE: Help patching Solr

2010-06-15 Thread Nagelberg, Kallin
I'm pretty sure you need to be running the patch against a checkout of the trunk sources, not a generated .war file. Once you've done that you can use the build scripts to make a new war. -Kallin Nagelberg -Original Message- From: Moazzam Khan [mailto:moazz...@gmail.com] Sent: Tuesday,

RE: index growing with updates

2010-06-04 Thread Nagelberg, Kallin
nks, Kallin Nagelberg -----Original Message- From: Nagelberg, Kallin Sent: Thursday, June 03, 2010 1:36 PM To: 'solr-user@lucene.apache.org' Subject: RE: index growing with updates Is there a way to trigger a purge, or under what conditions does it occur? -Kallin Nagelber

RE: general debugging techniques?

2010-06-03 Thread Nagelberg, Kallin
rn it down to like 5 documents. -Kal -Original Message- From: jim.bl...@pbwiki.com [mailto:jim.bl...@pbwiki.com] On Behalf Of Jim Blomo Sent: Thursday, June 03, 2010 2:29 PM To: solr-user@lucene.apache.org Subject: Re: general debugging techniques? On Thu, Jun 3, 2010 at 11:17 AM, Nage

RE: general debugging techniques?

2010-06-03 Thread Nagelberg, Kallin
How much memory have you given tomcat? The default is 64M which is going to be really small for 5MB documents. -Original Message- From: jim.bl...@pbwiki.com [mailto:jim.bl...@pbwiki.com] On Behalf Of Jim Blomo Sent: Thursday, June 03, 2010 2:05 PM To: solr-user@lucene.apache.org Subject:

RE: index growing with updates

2010-06-03 Thread Nagelberg, Kallin
your config is set up to replace unique keys, you're really doing a delete and an add (under the covers). It could very well be that the deleted version of the document is still in your index taking up space and will be until it is purged. HTH Erick On Thu, Jun 3, 2010 at 10:22 AM, Nage

index growing with updates

2010-06-03 Thread Nagelberg, Kallin
Hey, If I add a document to the index that already exists (same uniquekey) what is the expected behavior? I would imagine that if the document is the same then the index should not grow, but mine appears to be growing. Any ideas? Thanks, -Kallin Nagelberg

RE: Storing different entities in Solr

2010-05-28 Thread Nagelberg, Kallin
g a filter query, which is very fast and > > efficient. > > > > Bill > > > > On Fri, May 28, 2010 at 12:28 PM, Nagelberg, Kallin < > > knagelb...@globeandmail.com> wrote: > > > >> Multi-core is an option, but keep in mind if you go that ro

RE: Storing different entities in Solr

2010-05-28 Thread Nagelberg, Kallin
Multi-core is an option, but keep in mind if you go that route you will need to do two searches to correlate data between the two. -Kallin Nagelberg -Original Message- From: Robert Zotter [mailto:robertzot...@gmail.com] Sent: Friday, May 28, 2010 12:26 PM To: solr-user@lucene.apache.or

RE: Storing different entities in Solr

2010-05-28 Thread Nagelberg, Kallin
Good read here: http://mysolr.com/tips/denormalized-data-structure/ . Are consultation requests unique to each consultant? In that case you could represent the request as a Json String and store it as a multi-valued string field for each consultant, though that makes querying against requests t

RE: seemingly impossible query

2010-05-26 Thread Nagelberg, Kallin
mmit that would conflict. Hopefully someone finds this useful eventually! -Kallin Nagelberg -Original Message- From: Nagelberg, Kallin [mailto:knagelb...@globeandmail.com] Sent: Friday, May 21, 2010 4:44 PM To: 'solr-user@lucene.apache.org' Subject: RE: seemingly impossible que

RE: How real-time are Solr/Lucene queries?

2010-05-26 Thread Nagelberg, Kallin
Searching is very fast with Solr, but no way as fast as keying into a map. There is possibly disk I/O if your document isn't cached. Your situation sounds unique enough I think you're going to need to prototype to see if it meets your demands. Figure out how 'fast' is 'fast' for your application

RE: Any realtime indexing plugin available for SOLR

2010-05-26 Thread Nagelberg, Kallin
I'm afraid nothing is completely 'real-time'. Even when doing your inserts on the database there is time taken for those operations to complete. Right now I have my solr server autocommitting every 30 seconds, which is 'real-time' enough for me. You need to figure out what your threshold is, and

field collapsing on multi-valued field

2010-05-21 Thread Nagelberg, Kallin
As I understand from looking at https://issues.apache.org/jira/login.jsp?os_destination=/browse/SOLR-236 field collapsing has been disabled on multi-valued fields. Is this really necessary? Let's say I have a multi-valued field, 'my-mv-field'. I have a query like (my-mv-field:1 OR my-mv-field:5

RE: seemingly impossible query

2010-05-21 Thread Nagelberg, Kallin
y this as a requirement I think this will suffice. Cheers, Geert-Jan 2010/5/20 Nagelberg, Kallin > Yeah I need something like: > (id:1 and maxhits:1) OR (id:2 and maxhits:1).. something crazy like that.. > > I'm not sure how I can hit solr once. If I do try and do them all in one &

RE: seemingly impossible query

2010-05-20 Thread Nagelberg, Kallin
hink this will suffice. Cheers, Geert-Jan 2010/5/20 Nagelberg, Kallin > Yeah I need something like: > (id:1 and maxhits:1) OR (id:2 and maxhits:1).. something crazy like that.. > > I'm not sure how I can hit solr once. If I do try and do them all in one > big OR query the

RE: Machine utilization while indexing

2010-05-20 Thread Nagelberg, Kallin
StreamingUpdateSolrServer already has multiple threads and uses multiple connections under the covers. At least the api says ' Uses an internal MultiThreadedHttpConnectionManager to manage http connections'. The constructor allows you to specify the number of threads used, http://lucene.apache.

RE: seemingly impossible query

2010-05-20 Thread Nagelberg, Kallin
ned this field by the ids specified you are left with 1 matching doc for each id. Again it is not guaranteed that all docs returned are different. Since you didn't specify this as a requirement I think this will suffice. Cheers, Geert-Jan 2010/5/20 Nagelberg, Kallin > Yeah I need somethi

RE: seemingly impossible query

2010-05-20 Thread Nagelberg, Kallin
Yeah I need something like: (id:1 and maxhits:1) OR (id:2 and maxhits:1).. something crazy like that.. I'm not sure how I can hit solr once. If I do try and do them all in one big OR query then I'm probably not going to get a hit for each ID. I would need to request probably 1000 documents to fin

RE: seemingly impossible query

2010-05-20 Thread Nagelberg, Kallin
Thanks Darren, The problem with that is that it may not return one document per id, which is what I need. IE, I could give 100 ids in that OR query and retrieve 100 documents, all containing just 1 of the IDs. -Kallin Nagelberg -Original Message- From: dar...@ontrenet.com [mailto:dar

RE: Machine utilization while indexing

2010-05-20 Thread Nagelberg, Kallin
indexing faster than what you're doing. Currently it takes about 2 hours to index the 5m documents I'm talking about. But I still feel as if my machine is underutilized. Thijs On 20-5-2010 17:16, Nagelberg, Kallin wrote: > How about throwing a blockingqueue, > http://java.sun.com/j2se

RE: Machine utilization while indexing

2010-05-20 Thread Nagelberg, Kallin
com/film.php --- On Thu, 5/20/10, Nagelberg, Kallin wrote: > From: Nagelberg, Kallin > Subject: RE: Machine utilization while indexing > To: "'solr-user@lucene.apache.org'" > Date: Thursday, May 20, 2010, 8:16 AM > How about throwing a blockingqueue, > http://

RE: Machine utilization while indexing

2010-05-20 Thread Nagelberg, Kallin
How about throwing a blockingqueue, http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/BlockingQueue.html, between your document-creator and solrserver? Give it a size of 10,000 or something, with one thread trying to feed it, and one thread waiting for it to get near full then draini
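
A rough sketch of the bounded-queue idea, written in Python with `queue.Queue` standing in for Java's `BlockingQueue`. The message is cut off, so the draining policy here (send a batch every `batch_size` documents) is an assumption, not necessarily what was proposed. `send_batch` is a hypothetical callable that would post documents to the server; it is not a Solr or SolrJ API.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the document stream


def index_with_queue(documents, send_batch, queue_size=10_000, batch_size=1_000):
    """Feed documents through a bounded queue to a batching consumer.

    send_batch is a hypothetical callable taking a list of documents.
    """
    buffer = queue.Queue(maxsize=queue_size)

    def producer():
        for doc in documents:
            buffer.put(doc)  # blocks while the queue is full
        buffer.put(SENTINEL)

    def consumer():
        batch = []
        while True:
            item = buffer.get()  # blocks while the queue is empty
            if item is SENTINEL:
                break
            batch.append(item)
            if len(batch) >= batch_size:
                send_batch(batch)
                batch = []  # start a fresh list; the sent one is untouched
        if batch:
            send_batch(batch)  # flush the final partial batch

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

The bounded queue means a fast document creator is throttled by the sender rather than exhausting memory, and a slow creator never leaves the sender holding a half-built request open.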

seemingly impossible query

2010-05-20 Thread Nagelberg, Kallin
Hey everyone, I've recently been given a requirement that is giving me some trouble. I need to retrieve up to 100 documents, but I can't see a way to do it without making 100 different queries. My schema has a multi-valued field like 'listOfIds'. Each document has between 0 and N of these ids

RE: disable caches in real time

2010-05-19 Thread Nagelberg, Kallin
I suppose you are still losing some performance on the replicated box since it needs to use some resources to warm the cache. It would be nice if a warmed cache could be replicated from the master though perhaps that's not practical. Chris is right though: The newly updated index created by a co

RE: Challenge: Searching for variant products and get basic products in result set

2010-05-19 Thread Nagelberg, Kallin
get basic products in result set sorry, what does "sku" mean? I understand you like this: indexing base and variants, and include all attributes (for one base and its variants) in each document. I think that would work. Thanks. Nagelberg, Kallin wrote: > > I agree that pulli

RE: Challenge: Searching for variant products and get basic products in result set

2010-05-19 Thread Nagelberg, Kallin
I agree that pulling all attributes into the parent sku during indexing could work well. Define a Boolean field like 'isVirtual' to identify the non-leaf skus, and use a multi-valued field for each of the attributes. For now you can do a search like (isVirtual:true AND doorType:screen). If at a
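
One way the roll-up at indexing time could look, as a sketch only. The field names `isVirtual` and `doorType` come from the message; the function name, the `color` attribute, and the dict-based document shape are assumptions for illustration.

```python
RESERVED_FIELDS = {"id", "isVirtual"}


def build_parent_document(parent_id, variants):
    """Merge the attributes of all variant SKUs into one parent document.

    variants is a list of dicts such as {"doorType": "screen", "color": "white"}.
    Each attribute becomes a multi-valued (list) field on the parent.
    """
    doc = {"id": parent_id, "isVirtual": True}
    for variant in variants:
        for name, value in variant.items():
            if name in RESERVED_FIELDS:
                continue  # never overwrite the parent's own id or flag
            values = doc.setdefault(name, [])
            if value not in values:  # keep each distinct value once
                values.append(value)
    return doc


parent = build_parent_document(
    "door-1",
    [
        {"id": "door-1-a", "doorType": "screen", "color": "white"},
        {"id": "door-1-b", "doorType": "storm", "color": "white"},
    ],
)
print(parent)
```

A document built this way would match the search `(isVirtual:true AND doorType:screen)` quoted above, since `screen` is one of the values rolled up into the parent's `doorType` field.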

maximum recommended document cache size

2010-05-13 Thread Nagelberg, Kallin
I am trying to tune my Solr setup so that the caches are well warmed after the index is updated. My documents are quite small, usually under 10k. I currently have a document cache size of about 15,000, and am warming up 5,000 with a query after each indexing. Autocommit is set at 30 seconds, and

RE: confused by simple OR

2010-05-13 Thread Nagelberg, Kallin
Awesome that works, thanks Ahmet. -Kallin Nagelberg -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Thursday, May 13, 2010 12:24 PM To: solr-user@lucene.apache.org Subject: Re: confused by simple OR > I must be missing something very > obvious here. I have a fil

confused by simple OR

2010-05-13 Thread Nagelberg, Kallin
I must be missing something very obvious here. I have a filter query like so: (-rootdir:somevalue) I get results for that filter. However, when I OR it with another term like so, I get nothing: ((-rootdir:somevalue) OR (rootdir:somevalue AND someboolean:true)) How is this possible? Have I gone m
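
The reply in the thread is cut off in this listing, so the following is not necessarily the fix that was given there. A purely negative clause nested inside a larger boolean query has no set of documents to subtract from, so it matches nothing; only a top-level negative query gets special handling. The usual workaround is to pair the negation with a match-all clause (`*:*`). The helper names below are made up for illustration.

```python
def negate(clause):
    """Return a clause matching every document that does NOT match `clause`."""
    return f"(*:* -{clause})"


def any_of(*clauses):
    """OR together clauses that are already parenthesised."""
    return "(" + " OR ".join(clauses) + ")"


fq = any_of(
    negate("rootdir:somevalue"),
    "(rootdir:somevalue AND someboolean:true)",
)
print(fq)
```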

RE: strange behaviour when sorting, fields are missing in result

2010-05-12 Thread Nagelberg, Kallin
I'm not sure I understand how your results are truncated. They both find 21502 documents. The fact that you are sorting on '_erstelldatum' ascending and not seeing any results for that field on the first page leads me to think that you have 'sortMissingLast="false"' on that field's fieldType. In

cache control per-request

2010-05-06 Thread Nagelberg, Kallin
Hey everyone, Does anyone know if it is possible to control cache behavior on a per-request basis? I would like to be able to use the queryResultCache for certain queries, but have it bypassed for others. IE, I know at query time if there is 0 chance of a hit and would like to avoid the cache o

caching repeated OR'd terms

2010-05-06 Thread Nagelberg, Kallin
Hey everyone, I'm having some difficulty figuring out the best way to optimize for a certain query situation. My documents have a many-valued field that stores lists of IDs. All in all there are probably about 10,000 distinct IDs throughout my index. I need to be able to query and find all docu

nstein and 3S

2010-05-05 Thread Nagelberg, Kallin
Hey everyone, I'm curious if anyone has experience working with the company NStein and their Solr based search solution S3. Any comments on performance, usability, support etc. would be really appreciated. Thanks, -Kallin Nagelberg

prefixing with dismax

2010-04-30 Thread Nagelberg, Kallin
Hey, I've been using the dismax query parser so that I can pass a user created search string directly to Solr. Now I'm getting the requirement that something like 'Bo' must match 'Bob', or 'Bob Jo' must match 'Bob Jones'. I can't think of a way to make this happen with Dismax, though it's prett

RE: benefits of float vs. string

2010-04-30 Thread Nagelberg, Kallin
d 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Thu, 4/29/10, Yonik Seeley wrote: > From: Yonik Seeley > Subject: Re: benefits of float vs. string > To: solr-user@lucene.apache.org > Date: Thursday, April 29, 2010, 1:01 PM > On Wed, Apr 28

RE: Evangelism

2010-04-29 Thread Nagelberg, Kallin
I had a very hard time selling Solr to business folks. Most are of the mind that if you're not paying for something it can't be any good. That might also be why they refrain from posting 'powered by solr' on their website, as if it might show them to be cheap. They are also fearful of lack of su

RE: Slow Date-Range Queries

2010-04-29 Thread Nagelberg, Kallin
You might want to look at DateMath, http://lucene.apache.org/solr/api/org/apache/solr/util/DateMathParser.html. I believe the default precision is to the millisecond, so if you can afford to round to the nearest second or even minute you might see some performance gains. -Kallin Nagelberg -Ori

benefits of float vs. string

2010-04-28 Thread Nagelberg, Kallin
Hi, Does anyone have an idea about the performance benefits of searching across floats compared to strings? I have one multi-valued field that contains about 3000 distinct IDs across 5 million documents. I am going to be doing a lot of queries like q=id:102 OR id:303 OR id:305, etc. Right now it is a

RE: nfs vs sas in production

2010-04-28 Thread Nagelberg, Kallin
r disk I/O. See http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-search-50-volumes-5-million-volumes-and-beyond for details. Tom -Original Message----- From: Nagelberg, Kallin [mailto:knagelb...@globeandmail.com] Sent: Tuesday, April 27, 2010 4:13 PM To:

nfs vs sas in production

2010-04-27 Thread Nagelberg, Kallin
Hey, A question was raised during a meeting about our new Solr based search projects. We're getting 4 cutting edge servers each with something like 24 Gigs of ram dedicated to search. However there is some problem with the amount of SAS based storage each machine can handle, and people wonder i

RE: Benchmarking Solr

2010-04-12 Thread Nagelberg, Kallin
I have been using JMeter to perform some load testing. In your case you might like to take a look at http://jakarta.apache.org/jmeter/usermanual/component_reference.html#CSV_Data_Set_Config . This will allow you to use a random item from your query list. Regards, Kallin Nagelberg -Original

RE: index corruption / deployment strategy

2010-04-09 Thread Nagelberg, Kallin
mission critical big scale use of Solr :) On Apr 8, 2010, at 1:33 PM, Nagelberg, Kallin wrote: > I've been doing work evaluating Solr for use on a high-traffic > website for some time and things are looking positive. I have some > concerns from my higher-ups that I need to addr

index corruption / deployment strategy

2010-04-08 Thread Nagelberg, Kallin
Hi everyone, I've been doing work evaluating Solr for use on a high-traffic website for some time and things are looking positive. I have some concerns from my higher-ups that I need to address. I have suggested that we use a single index in order to keep things simple, but there are suggestions

RE: multicore embedded swap / reload etc.

2010-03-26 Thread Nagelberg, Kallin
web app and embedded Solr. You code the calls to update cores >> with the same SolrJ APIs either way. >> >> On Wed, Mar 24, 2010 at 2:19 PM, Nagelberg, Kallin >> wrote: >>> Hi, >>> >>> I've got a situation where I need to reindex a core once a

multicore embedded swap / reload etc.

2010-03-24 Thread Nagelberg, Kallin
Hi, I've got a situation where I need to reindex a core once a day. To do this I was thinking of having two cores, one 'live' and one 'staging'. The app is always serving 'live', but when the daily index happens it goes into 'staging', then staging is swapped into 'live'. I can see how to do th

RE: lowercasing for sorting

2010-03-23 Thread Nagelberg, Kallin
lues. Just out of curiosity, can you tell us anything about what the Globe and Mail is using Solr for? (assuming the question is work-related) Peter > -Original Message- > From: Nagelberg, Kallin [mailto:knagelb...@globeandmail.com] > Sent: Tuesday, March 23, 2010 1

lowercasing for sorting

2010-03-23 Thread Nagelberg, Kallin
I'm trying to perform a case-insensitive sort on a field in my index that contains values like aaa bbb AA BB And I get them sorted like: aaa bbb AA BB When I would like them: aa aaa bb bbb To do this I'm trying to set up a fieldType whose sole purpose is to lowercase a value on query and ind

RE: How to use dismax and boosting properly?

2010-02-25 Thread Nagelberg, Kallin
Try setting the boost to 0 for the fields you don't want to contribute to the score. Kallin Nagelberg -Original Message- From: Jason Chaffee [mailto:jchaf...@ebates.com] Sent: Thursday, February 25, 2010 4:03 PM To: solr-user@lucene.apache.org Subject: How to use dismax and boosting pro

stop words make dismax fail

2010-02-24 Thread Nagelberg, Kallin
I'm having a problem when users enter stopwords in their query. I'm using a dismax request handler against a field setup like:

including 'the' dismax query kills results

2010-02-18 Thread Nagelberg, Kallin
I've noticed some peculiar behavior with the dismax searchhandler. In my case I'm making the search "The British Open", and am getting 0 results. When I change it to "British Open" I get many hits. I looked at the query analyzer and it should be broken down to "british" and "open" tokens ('the'

RE: filter queries not fully filtering

2010-02-16 Thread Nagelberg, Kallin
Problem solved. I wasn't quoting the value. Since I was using names such as 'Gary Bettman' solr must have been giving all the Garys. -Original Message- From: Nagelberg, Kallin [mailto:knagelb...@globeandmail.com] Sent: Tuesday, February 16, 2010 3:22 PM To: 'solr-us
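
To illustrate the fix: an unquoted filter such as `people:Gary Bettman` is parsed as `people:Gary` plus a separate term `Bettman` searched against the default field, whereas quoting the value makes it a single phrase. The helper below is a small sketch with a made-up name; the escaping covers only backslashes and double quotes inside the value.

```python
def phrase_filter(field, value):
    """Build an fq clause that matches `value` as one quoted phrase."""
    # Escape backslashes first, then double quotes, so the phrase stays intact.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


print(phrase_filter("people", "Gary Bettman"))
```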

filter queries not fully filtering

2010-02-16 Thread Nagelberg, Kallin
Hi everyone, I am attempting to implement a faceted drill down feature with Solr. I am having problems explaining some results of the fq parameter. Let's say I have two fields, 'people' and 'category'. I do a search for 'dog' and ask to facet on the people and category fields. I am told that t

parabolic type function centered on a date

2010-02-11 Thread Nagelberg, Kallin
Hi everyone, I'm trying to enhance a more like this search I'm conducting by boosting the documents that have a date close to the original. I would like to do something like a parabolic function centered on the date (would make tuning a little more effective), though a linear function would pro

RE: ord on TrieDateField always returning max

2010-01-06 Thread Nagelberg, Kallin
x Besides using up a lot more memory, ord() isn't even going to work for a field with multiple tokens indexed per value (like tdate). I'd recommend using a function on the date value itself. http://wiki.apache.org/solr/FunctionQuery#ms -Yonik http://www.lucidimagination.com On Wed,
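
As an illustration of a function on the date value itself: a commonly used recency boost is `recip(ms(NOW,datetime),3.16e-11,1,1)`, where `recip(x,m,a,b)` evaluates to `a/(m*x+b)` and `ms` gives the age in milliseconds. The field name `datetime` is taken from the original question; the constants are a conventional choice (3.16e-11 is roughly one over the number of milliseconds in a year), not something stated in this thread. The sketch below only reproduces the arithmetic so the shape of the curve can be checked.

```python
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000  # about 3.16e10 milliseconds


def recip(x, m, a, b):
    """Same arithmetic as the recip function query: a / (m * x + b)."""
    return a / (m * x + b)


def date_boost(age_ms, m=3.16e-11, a=1.0, b=1.0):
    """Boost for a document that is age_ms milliseconds old."""
    return recip(age_ms, m, a, b)


# Parameter value as it might be passed on a dismax request.
bf = "recip(ms(NOW,datetime),3.16e-11,1,1)"

for years in (0, 1, 2, 5):
    print(years, round(date_boost(years * MS_PER_YEAR), 3))
```

With these constants a brand-new document gets a boost of 1, a one-year-old document about half that, and the value keeps decaying smoothly with age.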

ord on TrieDateField always returning max

2010-01-06 Thread Nagelberg, Kallin
Hi everyone, I've been trying to add a date based boost to my queries. I have a field like: When I look at the datetime field in the solr schema browser I can see that there are 9051 distinct dates. When I try to add the parameter to my query like: bf=ord(datetime) (on a dismax query) I alw