Try this,
http://viewer.opencalais.com/
They have an open API for that data. With your text message of:
"John Mayer Mumbai 411004 Juhu, car driver, also capable of body guard"
It gives back:
People: John Mayer Mumbai
Positions: body guard, car driver.
It's not perfect but it's not bad either.
ManBearPig is still a threat.
-Kallin Nagelberg
-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Tuesday, July 27, 2010 7:44 PM
To: solr-user@lucene.apache.org
Subject: RE: How to 'filter' facet results
> Is there a way to tell Solr to only return a specific se
If I understand correctly, in Solr it would store
> > the
> > > field like this:
> > >
> > > p_value: "Pramod" "Raj"
> > > p_type: "Client" "Supplier"
> > >
> > > When i search
> > >
I think you just want something like:
p_value:"Pramod" AND p_type:"Supplier"
no?
-Kallin Nagelberg
-Original Message-
From: Pramod Goyal [mailto:pramod.go...@gmail.com]
Sent: Friday, July 23, 2010 2:17 PM
To: solr-user@lucene.apache.org
Subject: help with a schema design problem
Hi,
Hey,
I recently moved a Solr app from a testing environment into a production
environment, and I'm seeing a brand new error which never occurred during
testing. I'm seeing this in the SolrJ-based app logs:
org.apache.solr.common.SolrException: com.caucho.vfs.SocketTimeoutException:
client tim
Yeah, you should definitely just set up a custom parser for each site. It should be
easy to extract the title using Groovy's XML parsing along with TagSoup for sloppy
HTML. If you can't find the pattern on each site leading to the job title, how
can you expect Solr to? Humans have the advantage here :P
How about:
1. Create a date field to indicate index time.
2. Use a date filter to restrict articles to today and yesterday, such as
myindexdate:[NOW/DAY-1DAY TO NOW/DAY+1DAY]
3. Sort on that field.
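A rough SolrJ sketch of steps 2 and 3 (the field name myindexdate and sorting newest-first are illustrative choices, not from the original thread):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class RecentArticles {
      // Restrict results to articles indexed today or yesterday, newest first.
      public static QueryResponse recent(SolrServer solr, String userQuery)
              throws SolrServerException {
          SolrQuery q = new SolrQuery(userQuery);
          q.addFilterQuery("myindexdate:[NOW/DAY-1DAY TO NOW/DAY+1DAY]"); // hypothetical field
          q.addSortField("myindexdate", SolrQuery.ORDER.desc);
          return solr.query(q);
      }
  }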
-Kallin Nagelberg
-Original Message-
From: oferiko [mailto:ofer...@gmail.com]
Sent: Th
So you want to take the top 1000 sorted by score, then sort those by another
field. It's a strange case, and I can't think of a clean way to accomplish it.
You could do it in two queries, where the first is by score and you only
request your IDs to keep it snappy, then do a second query against
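A rough sketch of that two-pass idea in SolrJ (the field names id and otherField are placeholders, and 1000 is just the cut-off mentioned above):

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.common.SolrDocument;

  public class TopByScoreThenSort {
      // Pass 1: fetch only the IDs of the top 1000 docs by score.
      // Pass 2: re-query those IDs, sorted by the secondary field.
      public static List<SolrDocument> query(SolrServer solr, String userQuery)
              throws SolrServerException {
          SolrQuery byScore = new SolrQuery(userQuery);
          byScore.setRows(1000);
          byScore.setFields("id"); // keep the first pass light
          List<String> ids = new ArrayList<String>();
          for (SolrDocument d : solr.query(byScore).getResults()) {
              ids.add(d.getFieldValue("id").toString());
          }
          if (ids.isEmpty()) {
              return new ArrayList<SolrDocument>();
          }

          StringBuilder idClause = new StringBuilder("id:(");
          for (int i = 0; i < ids.size(); i++) {
              if (i > 0) idClause.append(" OR ");
              idClause.append('"').append(ids.get(i)).append('"');
          }
          idClause.append(')');

          SolrQuery bySortField = new SolrQuery(idClause.toString());
          bySortField.setRows(ids.size());
          bySortField.addSortField("otherField", SolrQuery.ORDER.asc); // placeholder field
          return solr.query(bySortField).getResults();
      }
  }

Note the second query stays under the default maxBooleanClauses limit of 1024 only because the cut-off is 1000.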
How much memory have you given the Solr JVM? Many servlet containers have a small
amount by default.
-Kal
-Original Message-
From: olivier sallou [mailto:olivier.sal...@gmail.com]
Sent: Tuesday, June 29, 2010 2:04 PM
To: solr-user@lucene.apache.org
Subject: Faceted search outofmemory
Hi,
I'm pretty sure you need to be running the patch against a checkout of the
trunk sources, not a generated .war file. Once you've done that you can use the
build scripts to make a new war.
-Kallin Nagelberg
-Original Message-
From: Moazzam Khan [mailto:moazz...@gmail.com]
Sent: Tuesday,
Thanks,
Kallin Nagelberg
-Original Message-
From: Nagelberg, Kallin
Sent: Thursday, June 03, 2010 1:36 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: index growing with updates
Is there a way to trigger a purge, or under what conditions does it occur?
-Kallin Nagelberg
rn it down to like 5 documents.
-Kal
-Original Message-
From: jim.bl...@pbwiki.com [mailto:jim.bl...@pbwiki.com] On Behalf Of Jim Blomo
Sent: Thursday, June 03, 2010 2:29 PM
To: solr-user@lucene.apache.org
Subject: Re: general debugging techniques?
On Thu, Jun 3, 2010 at 11:17 AM, Nage
How much memory have you given Tomcat? The default is 64MB, which is going to be
really small for 5MB documents.
-Original Message-
From: jim.bl...@pbwiki.com [mailto:jim.bl...@pbwiki.com] On Behalf Of Jim Blomo
Sent: Thursday, June 03, 2010 2:05 PM
To: solr-user@lucene.apache.org
Subject:
If your config is set up to replace unique keys, you're really
doing a delete and an add (under the covers). It could very well be that
the deleted version of the document is still in your index taking up
space and will be until it is purged.
HTH
Erick
On Thu, Jun 3, 2010 at 10:22 AM, Nage
Hey,
If I add a document to the index that already exists (same uniquekey) what is
the expected behavior? I would imagine that if the document is the same then
the index should not grow, but mine appears to be growing. Any ideas?
Thanks,
-Kallin Nagelberg
g a filter query, which is very fast and
> > efficient.
> >
> > Bill
> >
> > On Fri, May 28, 2010 at 12:28 PM, Nagelberg, Kallin <
> > knagelb...@globeandmail.com> wrote:
> >
> >> Multi-core is an option, but keep in mind if you go that ro
Multi-core is an option, but keep in mind if you go that route you will need to
do two searches to correlate data between the two.
-Kallin Nagelberg
-Original Message-
From: Robert Zotter [mailto:robertzot...@gmail.com]
Sent: Friday, May 28, 2010 12:26 PM
To: solr-user@lucene.apache.or
Good read here: http://mysolr.com/tips/denormalized-data-structure/ .
Are consultation requests unique to each consultant? In that case you could
represent the request as a Json String and store it as a multi-valued string
field for each consultant, though that makes querying against requests
t
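A small sketch of what that could look like when building the consultant document in SolrJ (the field names id and requests, and the JSON layout, are made up for illustration):

  import org.apache.solr.common.SolrInputDocument;

  public class ConsultantDoc {
      // Each consultation request is flattened to a small JSON string and stored
      // in a multi-valued string field on the consultant's document.
      public static SolrInputDocument build(String consultantId, String... requestsAsJson) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", consultantId);
          for (String json : requestsAsJson) {
              doc.addField("requests", json); // multi-valued field (hypothetical name)
          }
          return doc;
      }
  }

  // Illustrative usage:
  // build("consultant-42", "{\"requestId\":7,\"topic\":\"tax\",\"date\":\"2010-05-28\"}");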
mmit that would conflict.
Hopefully someone finds this useful eventually!
-Kallin Nagelberg
-Original Message-
From: Nagelberg, Kallin [mailto:knagelb...@globeandmail.com]
Sent: Friday, May 21, 2010 4:44 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: seemingly impossible que
Searching is very fast with Solr, but no way as fast as keying into a map.
There is possibly disk I/O if your document isn't cached. Your situation sounds
unique enough I think you're going to need to prototype to see if it meets your
demands. Figure out how 'fast' is 'fast' for your application
I'm afraid nothing is completely 'real-time'. Even when doing your inserts on
the database there is time taken for those operations to complete. Right now I
have my Solr server autocommitting every 30 seconds, which is 'real-time' enough
for me. You need to figure out what your threshold is, and
As I understand from looking at
https://issues.apache.org/jira/browse/SOLR-236, field
collapsing has been disabled on multi-valued fields. Is this really necessary?
Let's say I have a multi-valued field, 'my-mv-field'. I have a query like
(my-mv-field:1 OR my-mv-field:5
y this as a requirement I think this will suffice.
Cheers,
Geert-Jan
2010/5/20 Nagelberg, Kallin
> Yeah I need something like:
> (id:1 and maxhits:1) OR (id:2 and maxhits:1).. something crazy like that..
>
> I'm not sure how I can hit solr once. If I do try and do them all in one
&
StreamingUpdateSolrServer already has multiple threads and uses multiple
connections under the covers. At least the API says 'Uses an internal
MultiThreadedHttpConnectionManager to manage http connections'. The constructor
allows you to specify the number of threads used,
http://lucene.apache.
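For reference, a minimal construction might look like this (the URL, queue size and thread count below are arbitrary example values, not recommendations):

  import java.net.MalformedURLException;
  import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;

  public class UpdateServerFactory {
      public static StreamingUpdateSolrServer create() throws MalformedURLException {
          return new StreamingUpdateSolrServer(
                  "http://localhost:8983/solr", // server URL (example)
                  1000,                         // internal document queue size
                  4);                           // background threads / connections
      }
  }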
ned this field by the ids specified you are left with 1 matching
doc for each id.
Again it is not guaranteed that all docs returned are different. Since you
didn't specify this as a requirement I think this will suffice.
Cheers,
Geert-Jan
2010/5/20 Nagelberg, Kallin
> Yeah I need somethi
Yeah I need something like:
(id:1 and maxhits:1) OR (id:2 and maxhits:1).. something crazy like that..
I'm not sure how I can hit solr once. If I do try and do them all in one big OR
query then I'm probably not going to get a hit for each ID. I would need to
request probably 1000 documents to fin
Thanks Darren,
The problem with that is that it may not return one document per ID, which is
what I need. I.e., I could give 100 IDs in that OR query and retrieve 100
documents that all match just one of the IDs.
-Kallin Nagelberg
-Original Message-
From: dar...@ontrenet.com [mailto:dar
indexing faster than what you're doing. Currently it takes
about 2 hours to index the 5M documents I'm talking about. But I still
feel as if my machine is underutilized.
Thijs
On 20-5-2010 17:16, Nagelberg, Kallin wrote:
> How about throwing a blockingqueue,
> http://java.sun.com/j2se
--- On Thu, 5/20/10, Nagelberg, Kallin wrote:
> From: Nagelberg, Kallin
> Subject: RE: Machine utilization while indexing
> To: "'solr-user@lucene.apache.org'"
> Date: Thursday, May 20, 2010, 8:16 AM
> How about throwing a blockingqueue,
> http://
How about throwing a BlockingQueue,
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/BlockingQueue.html,
between your document creator and the SolrServer? Give it a size of 10,000 or
something, with one thread trying to feed it, and one thread waiting for it to
get near full and then draining
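A bare-bones sketch of that producer/consumer arrangement (queue capacity and batch size are arbitrary, and the document-creation side is left out):

  import java.util.ArrayList;
  import java.util.List;
  import java.util.concurrent.ArrayBlockingQueue;
  import java.util.concurrent.BlockingQueue;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class IndexPipeline {
      private final BlockingQueue<SolrInputDocument> queue =
              new ArrayBlockingQueue<SolrInputDocument>(10000);

      // Producer thread: blocks whenever the queue is full.
      public void produce(SolrInputDocument doc) throws InterruptedException {
          queue.put(doc);
      }

      // Consumer thread: waits for documents, drains them in batches and sends
      // each batch to Solr.
      public void consume(SolrServer solr) throws Exception {
          List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
          while (!Thread.currentThread().isInterrupted()) {
              batch.add(queue.take());   // wait for at least one document
              queue.drainTo(batch, 999); // then grab whatever else is ready
              solr.add(batch);
              batch.clear();
          }
      }
  }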
Hey everyone,
I've recently been given a requirement that is giving me some trouble. I need
to retrieve up to 100 documents, but I can't see a way to do it without making
100 different queries.
My schema has a multi-valued field like 'listOfIds'. Each document has between
0 and N of these ids
I suppose you are still losing some performance on the replicated box since it
needs to use some resources to warm the cache. It would be nice if a warmed
cache could be replicated from the master though perhaps that's not practical.
Chris is right though: The newly updated index created by a co
get basic products
in result set
sorry, what does "sku" mean?
If I understand you correctly: index the base and variants, and include all
attributes (for one base and its variants) in each document. I think that
would work. Thanks.
Nagelberg, Kallin wrote:
>
> I agree that pulli
I agree that pulling all attributes into the parent sku during indexing could
work well. Define a Boolean field like 'isVirtual' to identify the non-leaf
skus, and use a multi-valued field for each of the attributes. For now you can
do a search like (isVirtual:true AND doorType:screen). If at a
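That leaf/non-leaf query from SolrJ might look something like this (isVirtual and doorType are the example field names from the paragraph above):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class SkuSearch {
      // Only the virtual (non-leaf) skus whose multi-valued doorType field
      // contains "screen".
      public static QueryResponse virtualScreenDoors(SolrServer solr)
              throws SolrServerException {
          return solr.query(new SolrQuery("isVirtual:true AND doorType:screen"));
      }
  }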
I am trying to tune my Solr setup so that the caches are well warmed after the
index is updated. My documents are quite small, usually under 10k. I currently
have a document cache size of about 15,000, and am warming up 5,000 with a
query after each indexing. Autocommit is set at 30 seconds, and
Awesome that works, thanks Ahmet.
-Kallin Nagelberg
-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Thursday, May 13, 2010 12:24 PM
To: solr-user@lucene.apache.org
Subject: Re: confused by simple OR
> I must be missing something very
> obvious here. I have a fil
I must be missing something very obvious here. I have a filter query like so:
(-rootdir:somevalue)
I get results for that filter
However, when I OR it with another term like so I get nothing:
((-rootdir:somevalue) OR (rootdir:somevalue AND someboolean:true))
How is this possible? Have I gone m
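Ahmet's fix isn't preserved in this excerpt, but the usual explanation is that a purely negative clause matches nothing on its own once it is nested inside a larger boolean query, and the standard workaround is to anchor it with *:*. A hypothetical sketch (rootdir and someboolean are the field names from the post above):

  import org.apache.solr.client.solrj.SolrQuery;

  public class RootdirFilter {
      // *:* gives the negative clause a document set to subtract from, so it
      // can participate in the OR.
      public static SolrQuery build(String userQuery) {
          SolrQuery q = new SolrQuery(userQuery);
          q.addFilterQuery("(*:* -rootdir:somevalue) OR (rootdir:somevalue AND someboolean:true)");
          return q;
      }
  }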
I'm not sure I understand how your results are truncated. They both find 21502
documents. The fact that you are sorting on '_erstelldatum' ascending and not
seeing any results for that field on the first page leads me to think that you
have 'sortMissingLast="false"' on that field's fieldType. In
Hey everyone,
Does anyone know if it is possible to control cache behavior on a per-request
basis? I would like to be able to use the queryResultCache for certain queries,
but have it bypassed for others. I.e., I know at query time if there is 0 chance
of a hit and would like to avoid the cache o
Hey everyone,
I'm having some difficulty figuring out the best way to optimize for a certain
query situation. My documents have a many-valued field that stores lists of
IDs. All in all there are probably about 10,000 distinct IDs throughout my
index. I need to be able to query and find all docu
Hey everyone,
I'm curious if anyone has experiencing working with the company NStein and
their Solr based search solution S3. Any comments on performance, usability,
support etc. would be really appreciated.
Thanks,
-Kallin Nagelberg
Hey,
I've been using the dismax query parser so that I can pass a user created
search string directly to Solr. Now I'm getting the requirement that something
like 'Bo' must match 'Bob', or 'Bob Jo' must match 'Bob Jones'. I can't think
of a way to make this happen with Dismax, though it's prett
d 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
--- On Thu, 4/29/10, Yonik Seeley wrote:
> From: Yonik Seeley
> Subject: Re: benefits of float vs. string
> To: solr-user@lucene.apache.org
> Date: Thursday, April 29, 2010, 1:01 PM
> On Wed, Apr 28
I had a very hard time selling Solr to business folks. Most are of the mind
that if you're not paying for something it can't be any good. That might also
be why they refrain from posting 'powered by solr' on their website, as if it
might show them to be cheap. They are also fearful of lack of su
You might want to look at DateMath,
http://lucene.apache.org/solr/api/org/apache/solr/util/DateMathParser.html. I
believe the default precision is to the millisecond, so if you can afford to round
to the nearest second or even minute you might see some performance gains.
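A small sketch of the difference (the field name datetime is illustrative); a rounded clause stays textually identical for a whole minute, so repeated requests can reuse cached results instead of generating a new term every millisecond:

  import org.apache.solr.client.solrj.SolrQuery;

  public class RoundedDateFilter {
      public static SolrQuery lastHour(String userQuery) {
          SolrQuery q = new SolrQuery(userQuery);
          // Millisecond precision would be: datetime:[NOW-1HOUR TO NOW]
          // Rounded to the minute:
          q.addFilterQuery("datetime:[NOW/MINUTE-1HOUR TO NOW/MINUTE]");
          return q;
      }
  }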
-Kallin Nagelberg
-Ori
Hi,
Does anyone have an idea about the performance benefits of searching across
floats compared to strings? I have one multi-valued field that contains about
3000 distinct IDs across 5 million documents. I am going to be running a lot of queries
like q=id:102 OR id:303 OR id:305, etc. Right now it is a
r disk I/O. See
http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-search-50-volumes-5-million-volumes-and-beyond
for details.
Tom
-Original Message-
From: Nagelberg, Kallin [mailto:knagelb...@globeandmail.com]
Sent: Tuesday, April 27, 2010 4:13 PM
To:
Hey,
A question was raised during a meeting about our new Solr based search
projects. We're getting 4 cutting-edge servers, each with something like 24 GB
of RAM dedicated to search. However there is some problem with the amount of
SAS based storage each machine can handle, and people wonder i
I have been using JMeter to perform some load testing. In your case you might
like to take a look at
http://jakarta.apache.org/jmeter/usermanual/component_reference.html#CSV_Data_Set_Config
. This will allow you to use a random item from your query list.
Regards,
Kallin Nagelberg
-Original
mission
critical big scale use of Solr :)
On Apr 8, 2010, at 1:33 PM, Nagelberg, Kallin wrote:
> I've been doing work evaluating Solr for use on a high-traffic
> website for sometime and things are looking positive. I have some
> concerns from my higher-ups that I need to addr
Hi everyone,
I've been doing work evaluating Solr for use on a high-traffic website for
some time and things are looking positive. I have some concerns from my
higher-ups that I need to address. I have suggested that we use a single index
in order to keep things simple, but there are suggestions
web app and embedded Solr. You code the calls to update cores
>> with the same SolrJ APIs either way.
>>
>> On Wed, Mar 24, 2010 at 2:19 PM, Nagelberg, Kallin
>> wrote:
>>> Hi,
>>>
>>> I've got a situation where I need to reindex a core once a
Hi,
I've got a situation where I need to reindex a core once a day. To do this I
was thinking of having two cores, one 'live' and one 'staging'. The app is
always serving 'live', but when the daily index happens it goes into 'staging',
then staging is swapped into 'live'. I can see how to do th
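The swap step itself can be done against the cores admin handler; a hypothetical sketch over plain HTTP (host, port and the core names live/staging are assumptions):

  import java.io.InputStream;
  import java.net.URL;

  public class CoreSwap {
      // After the daily reindex into "staging", swap it with "live" so the app
      // keeps pointing at the same core name.
      public static void swap() throws Exception {
          URL url = new URL(
              "http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=staging");
          InputStream in = url.openStream(); // fire the admin request
          in.close();
      }
  }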
lues.
Just out of curiosity, can you tell us anything about what the Globe and
Mail is using Solr for? (assuming the question is work-related)
Peter
> -Original Message-
> From: Nagelberg, Kallin [mailto:knagelb...@globeandmail.com]
> Sent: Tuesday, March 23, 2010 1
I'm trying to perform a case-insensitive sort on a field in my index that
contains values like
aaa
bbb
AA
BB
And I get them sorted like:
AA
BB
aaa
bbb
When I would like them:
AA
aaa
BB
bbb
To do this I'm trying to set up a fieldType whose sole purpose is to lowercase a
value on query and ind
Try setting the boost to 0 for the fields you don't want to contribute to the
score.
Kallin Nagelberg
-Original Message-
From: Jason Chaffee [mailto:jchaf...@ebates.com]
Sent: Thursday, February 25, 2010 4:03 PM
To: solr-user@lucene.apache.org
Subject: How to use dismax and boosting pro
I'm having a problem when users enter stopwords in their query. I'm using a
dismax request handler against a field set up like:
I've noticed some peculiar behavior with the dismax search handler.
In my case I'm making the search "The British Open", and am getting 0 results.
When I change it to "British Open" I get many hits. I looked at the query
analyzer and it should be broken down to "british" and "open" tokens ('the'
Problem solved. I wasn't quoting the value. Since I was using names such as
'Gary Bettman', Solr must have been giving me all the Garys.
-Original Message-
From: Nagelberg, Kallin [mailto:knagelb...@globeandmail.com]
Sent: Tuesday, February 16, 2010 3:22 PM
To: 'solr-us
Hi everyone,
I am attempting to implement a faceted drill down feature with Solr. I am
having problems explaining some results of the fq parameter.
Let's say I have two fields, 'people' and 'category'. I do a search for 'dog'
and ask to facet on the people and category fields.
I am told that t
Hi everyone,
I'm trying to enhance a more like this search I'm conducting by boosting the
documents that have a date close to the original. I would like to do something
like a parabolic function centered on the date (would make tuning a little more
effective), though a linear function would pro
Besides using up a lot more memory, ord() isn't even going to work for
a field with multiple tokens indexed per value (like tdate).
I'd recommend using a function on the date value itself.
http://wiki.apache.org/solr/FunctionQuery#ms
-Yonik
http://www.lucidimagination.com
On Wed,
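For what it's worth, one common way to apply that advice with dismax is a bf boost built from ms(); the constants below are the example values from the FunctionQuery wiki page (the boost falls to half at roughly one year old), and datetime is the field name from the question:

  import org.apache.solr.client.solrj.SolrQuery;

  public class RecencyBoost {
      public static SolrQuery build(String userQuery) {
          SolrQuery q = new SolrQuery(userQuery);
          q.set("defType", "dismax");
          // Newer documents get a larger additive boost than older ones.
          q.set("bf", "recip(ms(NOW,datetime),3.16e-11,1,1)");
          return q;
      }
  }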
Hi everyone,
I've been trying to add a date-based boost to my queries. I have a field like:
When I look at the datetime field in the solr schema browser I can see that
there are 9051 distinct dates.
When I try to add the parameter to my query like: bf=ord(datetime) (on a dismax
query) I alw