Re: deleteById without solrj?

2009-12-04 Thread Erik Hatcher
Also note that the XML that can be POSTed to /solr/update can also be sent as a content stream on the URL for a plain GET request: /solr/update?stream.body=...&commit=true Erik On Dec 3, 2009, at 3:05 PM, Tom Hill wrote: http://wiki.apache.org/solr/UpdateXmlMessages#A.22delete.2
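
For anyone who wants to try the stream.body route without solrj, here is a minimal sketch of building such a GET URL. The base URL and document id are assumptions for illustration; the delete message uses the delete-by-id form from the UpdateXmlMessages wiki page. The sketch only builds the URL and does not send it.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape


def delete_by_id_url(base_url, doc_id, commit=True):
    """Build a GET URL carrying a delete-by-id XML message in stream.body."""
    # Escape the id so characters like & or < cannot break the XML message.
    body = "<delete><id>{}</id></delete>".format(escape(str(doc_id)))
    params = [("stream.body", body)]
    if commit:
        params.append(("commit", "true"))
    # urlencode percent-encodes the XML so it survives as a query parameter.
    return base_url.rstrip("/") + "/update?" + urlencode(params)


# Assumed base URL and id, for illustration only.
url = delete_by_id_url("http://localhost:8983/solr", "doc42")
print(url)
```

The resulting URL can be fetched with any HTTP client (curl, a browser, urllib.request.urlopen).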

Debian Lenny + Apache Tomcat 5.5 + Solr 1.4

2009-12-04 Thread rajan chandi
Hi All, We've deployed 4 instances of Solr on a Debian server. It is taking only 1.5 GB of RAM on a local Ubuntu machine but it is taking 2.0 GB plus on the Debian Lenny server. Any ideas/pointers will help. Regards Rajan

Re: Issues with alphanumeric search terms

2009-12-04 Thread AHMET ARSLAN
> I have added >     class="solr.WordDelimiterFilterFactory" catenateAll="1" > /> > to both index and query but still getting same behaviour. > > Is there any other that i am missing? > Did you re-start tomcat and re-index? Why not use StandardTokenizerFactory?

Re: creating Lucene document from an external XML file.

2009-12-04 Thread Phanindra Reva
Hello, You have mentioned I can make use of the UpdateProcessor API. May I know when the flow of execution enters the UpdateRequestProcessor class? To be brief, it would be perfect for my case if it's after analysis but just before the document is added to the index. Thanks a lot. On Wed, De

Re: Issues with alphanumeric search terms

2009-12-04 Thread Erick Erickson
As Ahmet says, you need to re-index. Nothing about WordDelimiterFilterFactory alters case as far as I can tell from http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory Are you applying this in addition to the LowerCaseTokenizerFactory? In which case it's too

how to get list of unique terms for a field

2009-12-04 Thread Joel Nylund
Hi, let's say I have a field called countryName. Is there a way to get a list of all the countries for this field? I'm trying to figure out a nice way to keep my categories and the Solr results in sync; it would be nice to get these from Solr instead of the database. Thanks, Joel

Re: how to get list of unique terms for a field

2009-12-04 Thread Erik Hatcher
On Dec 4, 2009, at 8:59 AM, Joel Nylund wrote: lets say I have a field called countryName, is there a way to get a list of all the countries for this field? Trying to figure out a nice way to keep my categories and the solr results in sync, would be nice to get these from solr instead of th

Re: Debian Lenny + Apache Tomcat 5.5 + Solr 1.4

2009-12-04 Thread Yonik Seeley
Are you explicitly setting the heap sizes? If not, the JVM is deciding for itself based on what the box looks like (ram, cpus, OS, etc). Are they both the same architecture (32 bit or 64 bit?) -Yonik http://www.lucidimagination.com p.s. in general cross-posting to both solr-user and solr-dev is

Re: dismax query syntax to replace standard query

2009-12-04 Thread javaxmlsoapdev
Thanks. When I do it that way it gives me following query. params={indent=on&start=0&q=risk+test&qt=dismax&fq=statusName:(Male+OR+Female)+name:"Joe"&hl=on&rows=10&version=2.2} hits=63 status=0 QTime=54 I typed in 'Risk test' (no quote in the text) in the text field in UI. I want search to do AN

Re: latency in solr response is observed after index is updated

2009-12-04 Thread Bharath Venkatesh
Hi Kay Kay , We have commented out auto commit frequency in solrconfig.xml below is the cache configuration:- will further requests after index is updated wait for auto warming to complete ? Thanks, Bharath Kay Kay wrote: > What would be the average doc size.

edismax using bigrams instead of phrases?

2009-12-04 Thread Bill Dueber
I've started trying edismax, and have noticed that my relevancy ranking is messed up with edismax because, according to the debug output, it's using bigrams instead of phrases and inexplicably ignoring a couple of the pf fields. While the hit count isn't changing, this kills my ability to boost ex

Re: how to get list of unique terms for a field

2009-12-04 Thread Bill Dueber
Here's a pretty simple perl script. Call it as "scriptname facetindex" (or "scriptname facetindex maxnum") # #!/usr/local/bin/perl use strict; use JSON::XS; use LWP::Simple; ### CHANGE THIS TO YOUR URL!! ### my $select = 'http://solr-vufind:8026/solr/biblio/select'; # Get facet an
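
The script above is cut off in the archive, so the following is not a port of it, only a rough Python sketch of the general faceting approach to listing a field's unique values. The select URL and the countryName field are taken from this thread as examples; the parameters are the standard facet ones (facet.limit=-1 asks for all values, rows=0 skips the documents themselves). It assumes Solr's default flat JSON layout for facet counts, [term, count, term, count, ...].

```python
from urllib.parse import urlencode


def facet_terms_url(select_url, field, max_terms=-1):
    """Build a select URL that returns only facet counts for one field."""
    params = [
        ("q", "*:*"),             # match every document
        ("rows", "0"),            # no documents, only facet data
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", str(max_terms)),  # -1 means no limit
        ("facet.mincount", "1"),  # skip values no document uses
    ]
    return select_url + "?" + urlencode(params)


def pair_up(flat):
    """Turn a flat [term, count, term, count, ...] list into (term, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))


# Example URL for the countryName field discussed in this thread.
url = facet_terms_url("http://solr-vufind:8026/solr/biblio/select", "countryName")
print(url)
print(pair_up(["France", 3, "Japan", 5]))
```

After fetching the URL, the flat list would be found under facet_counts, facet_fields, countryName in the JSON response.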

RE: search on tomcat server

2009-12-04 Thread Jill Han
I went through all the links on http://wiki.apache.org/solr/#Search_and_Indexing and still have no clue as to how to proceed. 1. Do I have to do some implementation in order to get Solr to search docs on a Tomcat server? 2. If I have files, such as .doc, .docx, .pdf, .jsp, .html, etc. under Windows XP, c

Re: edismax using bigrams instead of phrases?

2009-12-04 Thread Yonik Seeley
On Fri, Dec 4, 2009 at 11:26 AM, Bill Dueber wrote: > I've started trying edismax, and have noticed that my relevancy ranking is > messed up with edismax because, according to the debug output, it's using > bigrams instead of phrases and inexplicably ignoring a couple of the pf > fields. While the

Re: edismax using bigrams instead of phrases?

2009-12-04 Thread Bill Dueber
I see that edismax already defines pf (bigrams) and pf3 (trigrams) -- how would folks think about just calling them pf / pf1 (aliases for each other?), pf2, and pf3? The pf would then behave exactly as it does in dismax. And it sounds like the solution to my single-token fields is to just move the

Re: question about schemas

2009-12-04 Thread solr-user
Lance Norskog-2 wrote: > > But, in general, this is a "shopping cart" database and Solr/Lucene may > not be the best fit for this problem. > True, every tool has strengths and weaknesses. Given how powerful Solr appears to be, I would be surprised if I was not able to handle this use case. L

Re: High add/delete rate and index fragmentation

2009-12-04 Thread Rodrigo De Castro
On Thu, Dec 3, 2009 at 3:59 PM, Lance Norskog wrote: > #2: The standard architecture is with a master that only does indexing > and one or more slaves that only handle queries. The slaves poll the > master for index updates regularly. Solr 1.4 has a built-in system for > this. > How do you achi

Solr 1.4: StringIndexOutOfBoundsException in SpellCheckComponent with HTMLStripCharFilterFactory

2009-12-04 Thread Robin Wojciki
I am running a search in Solr 1.4 and I am getting the StringIndexOutOfBoundsException pasted below. The spell check field uses HTMLStripCharFilterFactory. However, the search works fine if I do not use the HTMLStripCharFilterFactory. If I set a breakpoint at SpellCheckComponent.java: 248, the val

Re: question about schemas (and SOLR-1131?)

2009-12-04 Thread wojtekpia
Could this be solved with a multi-valued custom field type (including a custom comparator)? The OP's situation deals with multi-valuing products for each customer. If products contain strictly numeric fields then it seems like a custom field implementation (or extension of BinaryField?) *should* b

Re: HTML Stripping slower in Solr 1.4?

2009-12-04 Thread Robin Wojciki
Thanks Koji for logging the ticket. I noticed its priority is set to minor. Is there any workaround? I feel like I am being half as productive, as every iteration is taking twice as much time. Thanks Robin On Tue, Dec 1, 2009 at 11:47 AM, Koji Sekiguchi wrote: > Robin, > > Thank you for reportin

Re: High add/delete rate and index fragmentation

2009-12-04 Thread Rodrigo De Castro
On Wed, Dec 2, 2009 at 2:43 PM, Jason Rutherglen wrote: > It sounds like you're asking about near realtime search support, > I'm not sure. So here's few ideas. > > #1 How often do you need to be able to search on the latest > updates (as opposed to updates from lets say, 10 minutes ago)? > You

Grouping

2009-12-04 Thread Bruno
Is there a way to make a group by or distinct query? -- Bruno Morelli Vargas Mail: brun...@gmail.com Msn: brun...@hotmail.com Icq: 165055101 Skype: morellibmv

Re: creating Lucene document from an external XML file.

2009-12-04 Thread Otis Gospodnetic
I think you'd have to dig into Solr (Lucene actually) to inject yourself after Analysis. The UpdateRequestProcessor, as the name implies, is at the request level, so pretty high up/early on. Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message > From:

Re: Grouping

2009-12-04 Thread Otis Gospodnetic
Not out of the box. You could "group by" using SOLR-236 perhaps? Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message > From: Bruno > To: solr-user@lucene.apache.org > Sent: Fri, December 4, 2009 1:08:59 PM > Subject: Grouping > > Is there a way to ma

Re: High add/delete rate and index fragmentation

2009-12-04 Thread Otis Gospodnetic
Hello, > You are right that we would need near realtime support. The problem is not > so much about new records becoming available, but guaranteeing that deleted > records will not be returned. For this reason, our plan would be to update > and search a master index, provided that: (1) search whil

how to do auto-suggest case-insensitive match and return original case field values

2009-12-04 Thread hermida
Hi everyone, New to forum and to Solr, doing my first major project with it and enjoying it so far, great software. In my web application I want to set up auto-suggest as you type functionality which will search case-insensitively yet return the original case terms. It doesn't seem like TermsCo

Re: search on tomcat server

2009-12-04 Thread William Pierce
Have you gone through the solr tomcat wiki? http://wiki.apache.org/solr/SolrTomcat I found this very helpful when I did our solr installation on tomcat. - Bill -- From: "Jill Han" Sent: Friday, December 04, 2009 8:54 AM To: Subject: RE: search

Dumping solr requests for indexing

2009-12-04 Thread Teruhiko Kurosaka
Is there any way to dump all incoming requests to Solr into a file? My customer is seeing a strange problem of disappearing docs from index and I'd like to ask them to capture all incoming requests. Thanks. -kuro

Re: Dumping solr requests for indexing

2009-12-04 Thread Otis Gospodnetic
The solr log, as well as the servlet container log should have them all. Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message > From: Teruhiko Kurosaka > To: "solr-user@lucene.apache.org" > Sent: Fri, December 4, 2009 2:23:17 PM > Subject: Dumping solr

Best way to handle bitfields in solr...

2009-12-04 Thread William Pierce
Folks: In my db I currently have fields that represent bitmasks. Thus, for example, a value of the mask of 48 might represent an "undergraduate" (value = 16) and "graduate" (value = 32). Currently, the corresponding field in solr is a multi-valued string field called "EdLevel" which will h
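
To make the bitmask-to-multi-valued mapping described above concrete, here is a small sketch of splitting a mask into its set bits at index time. The label table is hypothetical, built only from the two values named in the post (16 and 32); the real mapping lives in the poster's database.

```python
# Hypothetical labels, using only the two values mentioned in the post.
ED_LEVEL_LABELS = {16: "undergraduate", 32: "graduate"}


def mask_to_flags(mask):
    """Return the individual power-of-two values that are set in mask."""
    flags = []
    bit = 1
    while bit <= mask:
        if mask & bit:
            flags.append(bit)
        bit <<= 1
    return flags


def mask_to_labels(mask, labels=ED_LEVEL_LABELS):
    """Map each set bit to its label, skipping bits with no known label."""
    return [labels[f] for f in mask_to_flags(mask) if f in labels]


print(mask_to_flags(48))
print(mask_to_labels(48))
```

Each returned value would become one entry of the multi-valued field, so a filter on a single level is a plain term match rather than bit arithmetic at query time.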

Re: Best way to handle bitfields in solr...

2009-12-04 Thread Otis Gospodnetic
Would http://wiki.apache.org/solr/FunctionQuery#fieldvalue help? Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message > From: William Pierce > To: solr-user@lucene.apache.org > Sent: Fri, December 4, 2009 2:43:25 PM > Subject: Best way to handle bitfie

RE: Dumping solr requests for indexing

2009-12-04 Thread Teruhiko Kurosaka
The log only gives high-level descriptions of what was done. I'd like to capture the exact XML requests with data, so that I could re-feed it to Solr to reproduce the issue my customer is encountering. -kuro > -Original Message- > From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]

Question: Write to Solr but not via http, and still store date_format

2009-12-04 Thread Peter 4U
Hi Solr team, Has anyone been able to write to Solr, keeping things like 'date_format', but indexing directly, rather than via http? I've been indexing using Lucene Java, and this works well and is very fast, except that any data indexed this way doesn't store date_format et al informat

Re: Question: Write to Solr but not via http, and still store date_format

2009-12-04 Thread Otis Gospodnetic
Are you looking for http://wiki.apache.org/solr/EmbeddedSolr ? Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message > From: Peter 4U > To: Solr > Sent: Fri, December 4, 2009 3:09:19 PM > Subject: Question: Write to Solr but not via http, and still sto

Answer: RE: Question: Write to Solr but not via http, and still store date_format

2009-12-04 Thread Peter 4U
Oops, of course the answer was staring me in the face! --> Use the EmbeddedSolrServer, rather than the CommonsHttpSolrServer. Live and learn. Live. and learn. Thanks, Peter > From: pete...@hotmail.com > To: solr-user@lucene.apache.org > Subject: Question: Write to Solr but not

Re: Dumping solr requests for indexing

2009-12-04 Thread Otis Gospodnetic
Aha! Sounds like a job for a simple, custom UpdateRequestProcessor. Actually, I think URP doesn't get access to the actual XML, but what it has access to may be enough for you: http://wiki.apache.org/solr/UpdateRequestProcessor Alternatively, unpack the war, add a custom logging servlet filter, ch

how to set multiple fq while building a query in solrj

2009-12-04 Thread javaxmlsoapdev
How do I create a query string with multiple fq params using the solrj SolrQuery API? E.g., I want to build a query as follows: http://servername:port/solr/issues/select/?q=testing&fq=statusName:(Female OR Male)&fq=name="Joe" I am using solrj client APIs to build query and using SolrQuery as follow

Re: how is score computed with hsin functionquery?

2009-12-04 Thread gdeconto
Thanks Lance, I appreciate your response. I know what a DIH is and have already written custom transformers. I just misunderstood your response to my message (I wasnt aware that we could use JS to create transformers). Anyhow, my intent is to change the tool (create a variation of hsin to sup

Re: HTML Stripping slower in Solr 1.4?

2009-12-04 Thread Yonik Seeley
Is BaseCharFilter required for the html strip filter? -Yonik http://www.lucidimagination.com On Tue, Dec 1, 2009 at 1:17 AM, Koji Sekiguchi wrote: > Robin, > > Thank you for reporting this. Performance degradation of HTML Stripper > could be in 1.4. I opened a ticket in Lucene: > > https://issu

Re: Debian Lenny + Apache Tomcat 5.5 + Solr 1.4

2009-12-04 Thread Kay Kay
What is the nature of the machines / VMs it runs on? 32-bit / 64-bit? rajan chandi wrote: Hi All, We've deployed 4 instances of Solr on a debian server. It is taking only 1.5 GB of RAM on local ubuntu machine but it is taking 2.0 GB plus on Debian Lenny server. Any ideas/pointers will help. R

RE: Dumping solr requests for indexing

2009-12-04 Thread Teruhiko Kurosaka
> Aha! > Sounds like a job for a simple, custom > UpdateRequestProcessor. Actually, I think URP doesn't get > access to the actual XML, but what it has access may be > enough for you: http://wiki.apache.org/solr/UpdateRequestProcessor I added this to solrconfig.xml but I don't see any extra o

Re: Dumping solr requests for indexing

2009-12-04 Thread Mark Miller
Teruhiko Kurosaka wrote: >> Aha! >> Sounds like a job for a simple, custom >> UpdateRequestProcessor. Actually, I think URP doesn't get >> access to the actual XML, but what it has access may be >> enough for you: http://wiki.apache.org/solr/UpdateRequestProcessor >> > > I added this to so

Re: WELCOME to solr-user@lucene.apache.org

2009-12-04 Thread khalid y
Hi, I have a problem with Solr. I'm indexing some HTML content and Solr crashes because my id field is multivalued. I found that Tika reads the HTML and extracts metadata like from my htmls but my documents already have an id set by literal.id=10. I tried to map the id from Tika by fmap.id=igno

Re: how to set multiple fq while building a query in solrj

2009-12-04 Thread Erik Hatcher
On Dec 4, 2009, at 4:21 PM, javaxmlsoapdev wrote: how do I create a query string witih multiple fq params using solrj SolrQuery API. e.g. I want to build a query as follow http://servername:port/solr/issues/select/?q=testing&fq=statusName: (Female OR Male)&fq=name="Joe" I am using solr
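
The reply above is cut off in the archive, so this is not a reconstruction of it. At the HTTP level, multiple filter queries are just the fq parameter repeated, which the sketch below shows in Python. The port is an assumption, and the second filter is written as name:"Joe" on the assumption that field:value syntax was intended. On the solrj side, SolrQuery has an addFilterQuery method that should produce the same repeated parameter.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def select_url(base, q, filter_queries):
    """Build a select URL with one fq parameter per filter query."""
    # A list of pairs (not a dict) lets the same key appear more than once.
    params = [("q", q)] + [("fq", fq) for fq in filter_queries]
    return base + "?" + urlencode(params)


# Assumed port 8983; servername is the placeholder host from the question.
url = select_url(
    "http://servername:8983/solr/issues/select",
    "testing",
    ["statusName:(Female OR Male)", 'name:"Joe"'],
)
print(url)
# Round-trip check: both filters come back as separate fq values.
print(parse_qs(urlsplit(url).query)["fq"])
```

Each fq is applied as its own filter, so the two clauses above restrict the result set independently of the main q.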

Sanity check on numeric types and which of them to use

2009-12-04 Thread Jay Hill
Looking at the example version of schema.xml there seems to be some confusion on which numeric field types are best used in different situations. What confused me was that the type of "int" is now set to a TrieIntField, but with a precisionStep of 0: ' the "tint" type is set up as a TrieIntFiel

Re: Sanity check on numeric types and which of them to use

2009-12-04 Thread Yonik Seeley
On Fri, Dec 4, 2009 at 7:38 PM, Jay Hill wrote: > 1) Is there any benefit to using the "int" type as a TrieIntField w/ > precisionStep=0 over the "pint" type for simple ints that won't be sorted or > range queried? No. But given that people could throw in a random range query and have it work co

Query time boosting with dismax

2009-12-04 Thread Girish Redekar
Hi, Is it possible to weigh specific query terms with a Dismax query parser? Is it possible to write queries of the sort ... field1:(term1)^2.0 + (term2^3.0) with dismax? Thanks, Girish Redekar http://girishredekar.net

Re: Debian Lenny + Apache Tomcat 5.5 + Solr 1.4

2009-12-04 Thread rajan chandi
We are using a 64 bit VM with a 64 bit JDK on it. It is a 2.00 GB RAM Xen instance. We're setting a max JVM heap size of 1800 MB. - Rajan On Fri, Dec 4, 2009 at 8:19 PM, Yonik Seeley wrote: > Are you explicitly setting the heap sizes? If not, the JVM is > deciding for itself based on what the

Re: Debian Lenny + Apache Tomcat 5.5 + Solr 1.4

2009-12-04 Thread rajan chandi
My local ubuntu 9.04 64 bit taking 1.5 GB is not a VM and Debian Lenny 64 bit taking 2 GB is a Xen Instance. - Rajan On Sat, Dec 5, 2009 at 10:51 AM, rajan chandi wrote: > We are using 64 bit VM with 64 bit JDK on it. > It is 2.00 GB RAM Zen instance. > > We're setting up max JVM heap size of 18

Re: Debian Lenny + Apache Tomcat 5.5 + Solr 1.4

2009-12-04 Thread rajan chandi
Local Solr doesn't look like 64 bit. ra...@rajan-desktop:~$ uname -a Linux rajan-desktop 2.6.28-16-server #55-Ubuntu SMP Tue Oct 20 20:50:00 UTC 2009 i686 GNU/Linux But the Xen Solr server does ra...@rajan-desktop:~$ uname -a Linux rajan-desktop 2.6.28-16-server #55-Ubuntu SMP Tue Oct 20 20:50:

Re: Query time boosting with dismax

2009-12-04 Thread Otis Gospodnetic
Terms no, but fields (with terms) and phrases, yes. Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message > From: Girish Redekar > To: solr-user@lucene.apache.org > Sent: Fri, December 4, 2009 11:42:16 PM > Subject: Query time boosting with dismax > >
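
As a rough illustration of the field and phrase boosting mentioned above, the sketch below builds a dismax request with per-field boosts in qf and a phrase boost in pf. The field names, boost values and base URL are hypothetical, and qt=dismax assumes a request handler of that name is configured, as in the query logged earlier in this digest.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def dismax_url(select_url, user_query, field_boosts, phrase_boosts=None):
    """Build a dismax select URL with boosts in qf (fields) and pf (phrases)."""
    # qf and pf take space-separated field^boost entries.
    qf = " ".join("{}^{}".format(f, b) for f, b in field_boosts)
    params = [("q", user_query), ("qt", "dismax"), ("qf", qf)]
    if phrase_boosts:
        pf = " ".join("{}^{}".format(f, b) for f, b in phrase_boosts)
        params.append(("pf", pf))
    return select_url + "?" + urlencode(params)


# Hypothetical fields and boosts, for illustration only.
url = dismax_url(
    "http://localhost:8983/solr/select",
    "term1 term2",
    [("field1", 2.0), ("field2", 3.0)],
    [("field1", 5.0)],
)
print(url)
print(parse_qs(urlsplit(url).query)["qf"])
```

Note that the boosts attach to fields, not to individual query terms, which matches the "terms no, fields yes" answer.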