Hi guys,
I'm trying to use the UIMA contrib, but I got the following error:
...
INFO: [] webapp=/solr path=/select
params={clean=false&commit=true&command=status&qt=/dataimport} status=0
QTime=0
05/02/2011 10:54:53 AM
org.apache.solr.uima.processor.UIMAUpdateRequestProcessor processText
INFO: Analazying
Why not just:
q=*:*
fq={!bbox}
sfield=store
pt=49.45031,11.077721
d=40
fl=store
sort=geodist() asc
http://localhost:8983/solr/select?q=*:*&sfield=store&pt=49.45031,11.077721&d=40&fq={!bbox}&sort=geodist%28%29%20asc
That will sort, and filter up to 40km.
No need for the
fq={!func}geodist()
Heh, I'm not sure if this is valid thinking. :)
By *matching* doc distribution I meant: what proportion of your millions of
documents actually ever get matched and then how many of those make it to the
UI.
If you have 1000 queries in a day and they all end up matching only 3 of your
docs, the s
Salman,
Warming up may be useful if your caches are getting decent hit ratios. Plus,
you
are warming up the OS cache when you warm up.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
> From:
Hi Salman,
Ah, so in the end you *did* have TV enabled on one of your fields! :) (I think
this was a problem we were trying to solve a few weeks ago here)
How many docs you have in the index doesn't matter here - only N docs/fields
that you need to display on a page with N results need to be re
Hi Gustavo,
I think none of the answers I could give you would be valuable to you now,
because they would be from circa 2007 or 2008. We didn't use Solr, just Lucene.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
You can always try something like this out in the analysis.jsp page,
accessible from the Solr Admin home. Check out that page and see how it
allows you to enter text to represent what was indexed, and text for a
query. You can then see if there are matches. Very handy to see how the
various filters
If i use WordDelimiterFilterFactory during indexing and at query time,
will a search for "cls500" find "cls 500" and "cls500x"? If so, will
it find and score exact matches higher? If not, how do you get exact
matches to display first?
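For context, this is roughly what WordDelimiterFilter does to such tokens (a simplified Python sketch of the default letter/digit split, not the actual filter):

```python
import re

def word_delimiter(token):
    # Rough approximation of WordDelimiterFilter's default split on
    # letter/digit transitions (the real filter does much more).
    return re.findall(r"[A-Za-z]+|[0-9]+", token)

print(word_delimiter("cls500"))   # ['cls', '500']
print(word_delimiter("cls500x"))  # ['cls', '500', 'x']
```

With both index and query sides split this way, "cls500" can match both "cls 500" and "cls500x"; getting exact matches first usually means also indexing an unsplit copy of the field and boosting matches on it.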
You mentioned that dismax does not support wildcards, but edismax does. Not
sure if dismax would have solved your other problems, or whether you just
had to shift gears because of the wildcard issue, but you might want to have
a look at edismax.
-Jay
http://www.lucidimagination.com
On Mon, Jan 3
I just moved to a multi core solr instance a few weeks ago, and it's
been working great. I'm trying to add a 3rd core and I can't query
against it though.
I'm running 1.4.1 (and tried 1.4.0) with the spatial search plugin.
This is the section in solr.xml
I've removed the index dir and c
Well, I assume many people out there have indexes larger than 100GB, and I
don't think you'd normally have more than 32GB or 64GB of RAM!
As I mentioned, the queries are mostly phrase, proximity, wildcard, and
combinations of these.
What exactly do you mean by distribution of documents? On this
I know, so we are not really using it for regular warm-ups (in any case the
index is updated on an hourly basis). I just tried it a few times to compare
results. The issue is I am not even sure warming up is useful for such
regular updates.
On Fri, Feb 4, 2011 at 5:16 PM, Otis Gospodnetic wrote:
> Salman,
That's a good idea, Yonik. So, fields that aren't stored don't get displayed,
so
the float field in the schema never gets seen by the user. Good, I like it.
Dennis Gearon
Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a
better
idea
Hi Otis,
That's good, I finally made it work. As for Sematext, I am afraid I am too
poor to consider this solution :) (I am doing this for fun)
Thank you anyway!
2011/2/4 Otis Gospodnetic
> Hi,
>
> The main difference is that CommonGrams will take 2 adjacent words and put
> them
> together, while N
Your prices are just dollars and cents? For actual queries, you might consider
an int type rather than a float type. Multiply by a hundred to put it in the
index, then multiply your values in queries by a hundred before putting them in
the query. Same for range faceting, just divide by 100 be
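A minimal sketch of that cents-as-int scheme (the helper names are made up for illustration):

```python
def to_index_price(dollars: float) -> int:
    # Store prices as integer cents, so float truncation never eats
    # the trailing zeroes or the decimal point.
    return round(dollars * 100)

def to_display_price(cents: int) -> str:
    # Convert back for display, keeping the trailing zeroes.
    return f"{cents // 100}.{cents % 100:02d}"

print(to_index_price(19.99))                     # 1999
print(to_display_price(to_index_price(20.00)))   # 20.00
```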
Sorry for the lack of details.
It's all clear in my head.. :)
We checked out the head revision from the 3.x branch a few weeks ago
(https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/). We
picked up r1058326.
We upgraded from a previous checkout (r960098). I am using our
customi
On Fri, Feb 4, 2011 at 12:56 PM, Dennis Gearon wrote:
> Using solr 1.4.
>
> I have a price in my schema. Currently it's a tfloat. Somewhere along the way
> from php, json, solr, and back, extra zeroes are getting truncated along with
> the decimal point for even dollar amounts.
>
> So I have two q
Hi Otis... thanks for your thoughts.
>I don't think DIH can read from a triple store today. It can read from a
>RDBMS,
>RSS/Atom feeds, URLs, mail servers, maybe others...
>Maybe what you should be looking at is the ManifoldCF instead, although I don't
>think it can fetch data from triple stores
Using solr 1.4.
I have a price in my schema. Currently it's a tfloat. Somewhere along the way
from php, json, solr, and back, extra zeroes are getting truncated along with
the decimal point for even dollar amounts.
So I have two questions, neither of which seemed to be findable with google.
A/
Hi Guys,
It depends on what properties you're trying to maximize. I've done several
studies of this over the years:
http://sunset.usc.edu/~mattmann/pubs/MSST2006.pdf
http://sunset.usc.edu/~mattmann/pubs/IWICSS07.pdf
http://sunset.usc.edu/~mattmann/pubs/icse-shark08.pdf
And if you're really bore
Hi Otis,
Hello,
You have many documents, 2 billion. Could you explain how yours is set up?
Mine is defined as follows, but using Lucene.
I have 3 machines, each with 6 HDs. Each HD holds an index fragment of 10GB.
So I have 3 search servers. Each server uses the l
Hello,
I am not using Nutch.
Let me explain more about how I use Lucene.
Lucene has a class, RemoteSearchable, which a server machine uses to publish
its index.
RemoteSearchable remote = new RemoteSearchable(parallelSearcher);
Naming.rebind("//" + LocalIP + "/" + artPortMap.getNick(), remote
Try http://localhost:8080/solr/select?q=*:* or, if using Solr's
default port, http://localhost:8983/solr/select?q=*:*
On Fri, Feb 4, 2011 at 2:50 PM, Esclusa, Will
wrote:
> Hello Grijesh,
>
> The URL below returns a 404 with the following error:
>
> The requested resource (/select/) is not avail
Hello Grijesh,
The URL below returns a 404 with the following error:
The requested resource (/select/) is not available.
-Original Message-
From: Grijesh [mailto:pintu.grij...@gmail.com]
Sent: Friday, February 04, 2011 12:17 AM
To: solr-user@lucene.apache.org
Subject: RE: Index Not Ma
yes it works fine ... thanks
--
View this message in context:
http://lucene.472066.n3.nabble.com/Facet-Query-tp2422212p2424155.html
Sent from the Solr - User mailing list archive at Nabble.com.
Sending two separate queries is an approach, but I think it may affect Solr's
performance, because every new search would mean two queries to Solr. For that
reason I was thinking of doing it in a single query. I am going to implement it
with two queries now, but if anything is found usefu
Basically, Term Vectors are only on one main field, i.e. Contents. The average
size of each document would be a few KB, but there are around 130 million
documents, so what do you suggest now?
On Fri, Feb 4, 2011 at 5:24 PM, Otis Gospodnetic wrote:
> Salman,
>
> It also depends on the size of your docume
Rohan,
You can do that with Lucene's tokenizers: get the individual tokens/words and
build a HashMap whose keys are the words/tokens from the first document. You
can then tokenize the second doc and check each of its words against the HashMap.
Our Key Phrase Extractor (
http://sematext.com/produc
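Stripped of Lucene's analyzers, the idea reduces to a hash lookup; here is a naive whitespace-tokenizer sketch (a simplification: real tokenizers also handle punctuation, stemming, and so on):

```python
def shared_tokens(doc1: str, doc2: str) -> set:
    # Tokenize doc1 (naively, on whitespace) into a hash set...
    first = {tok.lower() for tok in doc1.split()}
    # ...then check each token of doc2 against that set.
    return {tok.lower() for tok in doc2.split() if tok.lower() in first}

print(shared_tokens("Solr is a search server",
                    "Lucene powers the Solr search server"))
```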
Salman,
It also depends on the size of your documents. Re-analyzing 20 fields of 500
bytes each will be a lot faster than re-analyzing 20 fields with 50 KB each.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Ori
Hi,
> Sharding is an option too but that too comes with limitations so want to
> keep that as a last resort but I think there must be other things coz 150GB
> is not too big for one drive/server with 32GB Ram.
Hmm what makes you think 32 GB is enough for your 150 GB index?
It depends on q
Salman,
I only skimmed your email, but wanted to say that this part sounds a little
suspicious:
> Our warm up script currently executes all distinct queries in our logs
> having count > 5. It was run yesterday (with all the indexing update every
It sounds like this will make warmup take a loo
Hi,
There are external tools that one can use to watch Java processes, listen for
errors, and restart processes if they die - monit, daemontools, and some
Java-specific ones.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Hi,
2 GB for ramBufferSize is probably too much and not needed, but you could
increase it from default 32 MB to something like 128 MB or even 512 MB, if you
really have that much data where that would make a difference (you mention only
49 PDF files). I'd leave mergeFactor at 10 for now. The
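In solrconfig.xml those knobs look something like this (a sketch with the suggested bump, assuming the stock Solr 1.4 indexDefaults section):

```xml
<indexDefaults>
  <!-- default is 32; 128 or even 512 MB can help for heavy indexing, 2 GB is overkill -->
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <!-- leave at the default for now -->
  <mergeFactor>10</mergeFactor>
</indexDefaults>
```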
Hi,
I'll admit I didn't read your email closely, but the first part makes me think
that ngrams, which I don't think you mentioned, might be handy for you here,
allowing for misspellings without the implementation complexity.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lu
Hi,
The main difference is that CommonGrams will take 2 adjacent words and put them
together, while NGram* stuff will take a single word and chop it up in
sequences
of one or more characters/letters.
If you are stuck with auto-complete stuff, consider
http://sematext.com/products/autocomplete
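Roughly, the two behaviors look like this (a Python sketch for illustration, not the actual Lucene filters; real CommonGrams only forms grams around common words such as stopwords):

```python
def common_grams(tokens):
    # CommonGrams-style: glue adjacent words into a single token.
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]

def ngrams(word, n=3):
    # NGram-style: chop a single word into letter sequences.
    return [word[i:i + n] for i in range(len(word) - n + 1)]

print(common_grams(["the", "quick", "fox"]))  # ['the_quick', 'quick_fox']
print(ngrams("quick"))                        # ['qui', 'uic', 'ick']
```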
Lewis,
A large maxFieldLength may not necessarily result in OOM - it depends on -Xmx
you are using, the number of concurrent documents being processed, and such.
So the first thing I'd look at would be my machine's RAM, then the -Xmx I can
afford, then based on that set maxFieldLength.
Otis
Se
Gustavo,
I haven't used RMI in 5 years, but last time I used it I remember it being
problematic - this is in the context of Lucene-based search involving some 40
different shards/servers, high query rates, and some 2 billion documents, if I
remember correctly. I remember us wanting to get away
Hi Grant,
Thanks for the tip
This seems to work:
q=*:*
fq={!func}geodist()
sfield=store
pt=49.45031,11.077721
fq={!bbox}
sfield=store
pt=49.45031,11.077721
d=40
fl=store
sort=geodist() asc
On Thu, Feb 3, 2011 at 7:46 PM, Grant Ingersoll wrote:
> Use a filter query? See the {!geofilt} stuff
Yes, I see I didn't understand that facet.query parameter.
Have you considered submitting two queries? One for results with q.op=OR, one
for faceting with q.op=AND?
-Original Message-
From: Grijesh [mailto:pintu.grij...@gmail.com]
Sent: Friday, February 4, 2011 10:42
To: solr-user@lu
Hi Lewis,
> I am very interested in DataImportHandler. I have data stored in an RDF db
> and
>wish to use this data to boost query results via Solr. I wish to keep this
>data
>stored in db as I have a web app which directly maintains this db. Is it
>possible to use a DataImportHandler to
thanks Dominique
I am on Windows... how do I do this on a Windows 7 machine? I have
NetBeans, with the SVN and Ant plugins.
regards
Mambe Churchill Nanje
237 33011349,
AfroVisioN Founder, President,CEO
http://www.afrovisiongroup.com | http://mambenanje.blogspot.com
skypeID: mambenanje
www.twit
facet.query=+water +treatment +plant will not return the city facet the poster
needs.
It will only give the count of documents matching the query
facet.query=+water +treatment +plant.
-
Thanx:
Grijesh
http://lucidimagination.com
Using a facet query like
facet.query=+water +treatment +plant
... should give a count of 0 for documents not having all three terms. This could
do the trick, if I understand how this parameter works.
Try solr's new Local Params ,may that will help for your requirement.
http://wiki.apache.org/solr/LocalParams
-
Thanx:
Grijesh
http://lucidimagination.com
Hi friends, is it possible to do faceting over score? I want results from
facets which have higher scores. Please suggest.
But I want the results exactly as the above query returns them. There is no
problem with the results it is returning.
Problem detail
I have implemented search for my company, in which a user can type any query
into the search box. Now, when a user searches "water treatment plant", the
results come
No, the facet.query and fq parameters work with any type of query. When you
search with facet.query=city:mumbai, it will return a facet count like
3
facet.query is for faceting against a particular query.
If you want results for that query, then you have to use fq=city:mumbai
-
Thanx:
Change the default operator from "OR" to "AND" by using q.op or in the schema.
-
Thanx:
Grijesh
http://lucidimagination.com