Hello,
Is it possible to define more than one schema? I'm reading the example
schema.xml, and it seems that we can define only one schema. What if I want
to define one schema for document type A and another schema for document type B?
Thanks a lot,
Kevin
You might examine what the Apache CouchDB people have done.
It's a document-oriented DB that stores JSON-structured documents,
combines them with Lucene indexing of the documents, and exposes a
RESTful HTTP interface.
It's a stretch, and it's written in Erlang... but perhaps there is some
inspiration to be had.
onError="continue" should help.
Which version of DIH are you using? onError is a Solr 1.4 feature.
--Noble
On Thu, Jan 29, 2009 at 5:04 AM, Nathan Adams wrote:
> I am constructing documents from a JDBC datasource and a HTTP datasource
> (see data-config file below.) My problem is that I cannot know if a
> particular HTTP URL is available at index time.
The problem you are trying to solve is that you cannot use
${dataimporter.last_index_time} as-is; you need something like
${dataimporter.last_index_time} minus 3 seconds.
Am I right?
There is no straightforward way to do this.
1) You may write your own function, say 'lastIndexMinus3Secs', and
register it in data-config.xml, as sketched below.
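A hedged sketch of how such a custom function might be wired into DIH's
data-config.xml (the evaluator class and function name are hypothetical;
${dataimporter.functions.*} is the invocation convention DIH uses for its
built-ins like formatDate):

<dataConfig>
  <!-- Hypothetical evaluator returning last_index_time shifted back 3s -->
  <function name="lastIndexMinus3Secs"
            class="com.example.dih.LastIndexMinus3SecsEvaluator"/>
  <document>
    <entity name="item"
            deltaQuery="select id from item where updated_at &gt;
                        '${dataimporter.functions.lastIndexMinus3Secs()}'">
      <field column="id" name="id"/>
    </entity>
  </document>
</dataConfig>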
Duh. Four cases. For extra credit, what language is "wunder" in?
wunder
On 1/28/09 5:12 PM, "Walter Underwood" wrote:
> I've done this. There are five cases for the tokens in the search
> index:
>
> 1. Tokens that are unique after stemming (this is good).
> 2. Tokens that are common after stemming (usually trademarks,
> like LaserJet).
I've done this. There are five cases for the tokens in the search
index:
1. Tokens that are unique after stemming (this is good).
2. Tokens that are common after stemming (usually trademarks,
like LaserJet).
3. Tokens with collisions after stemming:
German "mit", "MIT" the university
Germ
I'd like to use the DataImportHandler running against a slave database that,
at any given time, may be significantly behind the master DB. This can cause
updates to be missed if you use the clock-time as the "last_index_time."
E.g., if the slave catches up to the master between two delta-imports.
I'm not entirely sure about the fine points, but consider the
filters that are available that fold all the diacritics into their
low-ASCII equivalents. Perhaps using that filter at *both* index
and search time on the English index would do the trick.
In your example, both would be 'munchen'.
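A sketch of such a field type for schema.xml (the type name is made up;
ISOLatin1AccentFilterFactory is the accent-folding filter that ships with
Solr of this vintage, and a single <analyzer> applies at both index and
query time):

<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
</fieldType>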
But do note that there's also no requirement that all documents
have the same fields. So you could consider storing a special
"meta document" that had *no* fields in common with any other
document that records whatever information you want about the
current state of the index.
Best
Erick
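A sketch of what such a meta document could look like as a Solr XML update
(field names are made up; note that if your schema declares a uniqueKey,
the meta document does have to share that one field):

<add>
  <doc>
    <field name="id">index_metadata</field>
    <field name="meta_last_rebuild">2009-01-29T00:00:00Z</field>
  </doc>
</add>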
I am constructing documents from a JDBC datasource and a HTTP datasource
(see data-config file below.) My problem is that I cannot know if a
particular HTTP URL is available at index time, so I need DIH to
continue processing even if the HTTP location returns a 404.
onError="continue" does not appear to work.
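For reference, a hedged sketch of where onError would sit in data-config.xml
once on a DIH version that supports it (the URL, xpaths, and data source
name are illustrative):

<entity name="page"
        processor="XPathEntityProcessor"
        dataSource="http"
        url="http://example.com/docs/${item.id}.xml"
        forEach="/doc"
        onError="continue">
  <field column="body" xpath="/doc/body"/>
</entity>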
org/apache/catalina/connector/Connector      java/util/WeakHashMap$Entry      399,913,269 bytes
org/apache/catalina/connector/Connector      java/lang/Object[]               197,256,078 bytes
org/apache/lucene/search/ExtendedFieldCache  java/util/WeakHashMap$Entry[]    177,893,021 bytes
Hi, bear with me as I am new to Solr.
I have a requirement in an application where I need to show a list of
results by groups.
For instance, each document in my index corresponds to a person, and each has
a family name. I have hundreds of thousands of records (persons). What I
would like to do is show the results grouped by family name.
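Assuming the goal is counts of matches per family name, a faceting request
is the usual starting point; a sketch (host and field name are illustrative):

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=family_name&facet.limit=20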
Mark Miller wrote on 01/26/2009 04:30:00 PM:
> Just a point or I missed: with such a large index (not doc size large,
> but content wise), I imagine a lot of your 16GB of RAM is being used by
> the system disk cache - which is good. Another reason you don't want to
> give too much RAM to the JVM.
Hi,
I currently have two indexes in Solr: one for the English version and one
for the German version. They use the English and German2 snowball
factories, respectively.
Right now, depending on which language the website is currently in, I query
the corresponding index.
There is a requirement, though, that content is found regardless of the language.
There is no existing internal field like that.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message -----
> From: Ian Connor
> To: solr-user@lucene.apache.org
> Sent: Wednesday, January 28, 2009 4:59:28 PM
> Subject: Re: solr as the data store
>
> I am planning with backups, the recovery will only be incremental.
I am planning that, with backups, recovery will only ever be incremental.
Is there an internal field to know when the last document hit the index, or
is it best to build your own "created_at" type field to know from when you
need to rebuild?
After the backup is restored, this field could be read and the missing
documents re-indexed from that point.
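If you do add a "created_at" field, reading back the newest value after a
restore is a single query; a sketch (field name taken from the message
above, host illustrative):

http://localhost:8983/solr/select?q=*:*&sort=created_at+desc&rows=1&fl=created_at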
Although the idea that you will need to rebuild from scratch is
unlikely, you might want to fully understand the cost of recovery if you
*do* have to.
If it's incredibly expensive (in time or money), you need to keep that in
mind.
-Todd
-----Original Message-----
From: Ian Connor [mailto:ian.con...
Alejandro,
What you really want to do is identify the language of the email, store that in
the index and apply the appropriate analyzer. At query time you really want to
know the language of the query (either by detecting it or asking the user or
...)
Otis
--
Sematext -- http://sematext.com/
Mark,
I am not aware of anyone open-sourcing such tools. But note that changing the
files without a dedicated GUI is easy (editor + scp?). What makes things more
complicated is the need to make Solr reload those files and, in some cases,
changes really require a full index rebuild.
Otis
--
Sematext --
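For the reload part, multicore setups can at least reload a core's config
over HTTP via the CoreAdmin handler; a sketch (host and core name are
illustrative):

http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0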
Yeah, I think the begin/end chars are very helpful here. But I like the
suggestion of figuring out which words really need to support leading
wildcards... although that's typically impossible to predict, since people
are free to enter whatever queries they feel like.
Otis
--
Sematext
This is perfectly fine. Of course, you lose any relational model. If you
don't have or don't need one, why not.
It used to be the case that backups of live Lucene indices were hard, so people
preferred having an RDBMS be the primary data source, the one they know how to
back up and maintain well.
One thing to keep in mind is that things like joins are impossible in
Solr, but easy in a database. So if you ever need to do stuff like run
reports, you're probably better off with a database to query on -
unless you cover your bases very well in the Solr index.
Thanks for your time!
Matt
Hi All,
Is anyone using Solr (and thus the Lucene index) as their database store?
Up to now, we have been using a database to build Solr from. However, given
that Lucene already keeps the stored data intact, and that rebuilding from
Solr to Solr can be very fast, the need for the separate database seems
questionable.
On Thu, Jan 29, 2009 at 12:39 AM, Gert Brinkmann wrote:
>
> Hello again,
>
> is there nobody who could help me with this? Or is it an FAQ and my
> questions are dumb somehow? Maybe I should try to shorten the questions: ;)
>
Quite the opposite, you are actually working with some advanced stuff :)
Hello again,
is there nobody who could help me with this? Or is it an FAQ and my
questions are dumb somehow? Maybe I should try to shorten the questions: ;)
> A) fuzzy search
>
> What can I do to speed up the fuzzy query?
> B) combine stemming, prefix and fuzzy search
>
> Is there a way to
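On question A, one knob worth mentioning: Lucene fuzzy queries take an
optional minimum-similarity value, and raising it above the default 0.5
shrinks the set of candidate terms that must be enumerated, which speeds
the query up. A sketch (field name is illustrative):

q=name:muenchen~0.8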
Well, both pages I listed are in the search results :). But I agree
that they aren't obvious to find, and that this should be improved. (The
Wiki is a community-created site which anyone can contribute to,
incidentally.)
cheers,
-Mike
On 28-Jan-09, at 1:11 AM, Jarek Zgoda wrote:
I swear I was looking for this information in the Solr wiki.
IndexMergeTool - http://wiki.apache.org/solr/MergingSolrIndexes
Sameer.
--
http://www.productification.com
On Wed, Jan 28, 2009 at 7:30 AM, Jae Joo wrote:
> Hi,
>
> Is there any way to join multiple indexes in Solr?
>
> Thanks,
>
> Jae
>
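Note that IndexMergeTool merges whole indexes into one; it is not a
relational join. A sketch of a typical invocation (jar and path names are
illustrative):

java -cp lucene-core.jar:lucene-misc.jar \
     org.apache.lucene.misc.IndexMergeTool \
     /path/to/merged-index /path/to/index1 /path/to/index2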
Does your index stay at triple size after optimization? It is normal for
Lucene to use 2x or up to 3x disk space during optimization, but it should
fall back to the normal numbers once optimization completes and unused
segments are cleaned up per the index deletion policy.
If you search for threads in the mailing list archives, you will find this
discussed before.
Hi Ryuuichi,
Thanks for your quick reply.
I checked the <useCompoundFile> setting in solrconfig.xml, and the value
is 'false'. Here is what is in our solrconfig.xml:
===
false
1000
1
2147483647
10
1000
I'm coming in late on this thread, but I want to recommend the YourKit
Profiler product. It helped me track a performance problem similar to what
you describe. I had been futzing with GC logging etc. for days before
YourKit pinpointed the issue within minutes.
http://www.yourkit.com/
(My problem
Tried that. Basically, Solr really didn't want to do the internal rewrite.
So essentially we would have to rewrite with a full redirect and then change
the SolrJ source to allow it to follow the redirect. We are going with an
external rewriter. However, the seemingly easiest way would be to just
surfer10 wrote:
I'm a little bit of a noob with Java tooling, so could you please tell me
what tools are used to apply patch SOLR-236 (field grouping)? Does it need
to be applied to current solr-1.3 (or the nightly builds of 1.4), or is it
already in the box?
Which build script compiles Solr in its distribution?
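For what it's worth, the usual routine for trying a JIRA patch against a
source checkout looks roughly like this (the patch file name is
illustrative; Solr builds with Ant, not a batch file):

cd apache-solr-1.3.0
patch -p0 < SOLR-236.patch
ant dist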
I would think that using a servlet filter to rewrite the URL should be
pretty straightforward. You could write your own or use a tool like
http://tuckey.org/urlrewrite/ and just configure that.
Using something like this, I think the upgrade procedure could be:
- install the rewrite filter to rewrite single-core URLs onto the new core paths
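A sketch of what such a rule could look like in the UrlRewriteFilter's
urlrewrite.xml (the core name "core0" is illustrative):

<rule>
  <from>^/select(.*)$</from>
  <to type="forward">/core0/select$1</to>
</rule>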
Hi,
Your problem seems to be lower-level than the Solr code. You are sending
an XML request that contains an illegal (per the XML spec) character. You
should strip these characters out of the data that you send, or turn off
the XML validation (not recommended, because of all kinds of risks).
See
http://www
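A minimal sketch of the stripping approach in Java, keeping only the
character ranges the XML 1.0 spec allows:

public static String stripInvalidXmlChars(String in) {
    StringBuilder out = new StringBuilder(in.length());
    for (int i = 0; i < in.length(); i++) {
        char c = in.charAt(i);
        // XML 1.0 valid chars: #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD
        if (c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)) {
            out.append(c);
        }
    }
    return out.toString();
}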
Hi,
Is there any way to join multiple indexes in Solr?
Thanks,
Jae
We are moving from single core to multicore. We have a few servers that we
want to migrate one at a time to ensure that each one functions. This
process is proving difficult as there is no default core to allow the
application to talk to the solr servers uniformly (ie without a core name
during c
I know that I can see the search result after the commit, and that is OK.
I could disable the queryResultCache and the problem would be fixed, but I
need the queryResultCache because my index size is big and I need good
performance.
So I am trying to find out how to fix the bug, or maybe the Solr guys can help.
On Wed, Jan 28, 2009 at 4:29 PM, Parisa wrote:
>
> I should say that we also have this problem when we commit with waitFlush =
> true and waitSearcher = true,
>
> because it again closes the old searcher and opens a new one, so it has
> a warm-up process with the queryResultCache.
>
> besides, I
I should say that we also have this problem when we commit with waitFlush =
true and waitSearcher = true,
because it again closes the old searcher and opens a new one, so it has
a warm-up process with the queryResultCache.
Besides, I need to commit with waitFlush = false and waitSearcher = false to
avoid blocking.
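For reference, a sketch of that commit call in SolrJ of this era (the URL
is illustrative):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class CommitExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // commit(waitFlush, waitSearcher): return without waiting for the
        // flush or for the new searcher's warm-up to finish.
        server.commit(false, false);
    }
}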
From my past projects, our Lucene classification corpus looked like this:
0|document text...|categoryA
1|document text...|categoryB
2|document text...|categoryA
3|document text...|categoryA
...
800|document text...|categoryC
With the faceting capabilities of Solr it is now possible to design more
Hello Qingdi,
Have you changed the <useCompoundFile> setting in solrconfig.xml?
In my experience, when using the compound-file index
(<useCompoundFile>true</useCompoundFile>),
the size of the index grows up to triple during optimization.
My understanding is that when writing a new segment in compound format,
Lucene writes the multifile format first and then packs those files into
the compound file.
I swear I was looking for this information in the Solr wiki. See for yourself
if this is accessible at all:
http://wiki.apache.org/solr/?action=fullsearch&context=180&value=highlight&fullsearch=Text
Message written on 2009-01-28 at 00:58 by Mike Klaas:
They are documented in http://w
Oh wait.. looks like Otis' suggestion of "index n-grams with begin/end
delim characters" and relying on phrase-searching to link the chains
of characters.. logically doing a better version of my previous email.
- Neal
On Wed, Jan 28, 2009 at 1:04 AM, Neal Richter wrote:
> leading wildcard searc
leading wildcard search is called grep ;-)
Ditto on the indexing reversed words suggestion.
Can you create a second field in Solr that contains /only/ the words
from the fields you care to reverse? Once you do that, you could
pre-process the query, look for leading wildcards, and address those
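A sketch of that query pre-processing in Java (the "body_rev" field,
holding each token reversed at index time, is hypothetical):

public class LeadingWildcardRewriter {
    // Rewrite "*jet" into a trailing-wildcard query on the reversed field.
    public static String rewrite(String term) {
        if (term.startsWith("*")) {
            String reversed = new StringBuilder(term.substring(1)).reverse().toString();
            return "body_rev:" + reversed + "*";
        }
        return "body:" + term;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("*jet")); // prints body_rev:tej*
    }
}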