Re: delete snapshot??

2009-02-17 Thread sunnyfr
How can I remove snapshots from time to time? With the snapcleaner script I only have the option to delete the last day. Thanks a lot, Noble, and sorry again for all these questions. Noble Paul നോബിള്‍ नोब्ळ् wrote: > > The hardlinks will prevent the unused files from getting cleaned up. > So the dis

Re: dealing with logs - feature advice based on a use case

2009-02-17 Thread Otis Gospodnetic
Marc, I don't have a Multicore setup that's itching for better logging, but I think what you are suggesting is good.  If I had a multicore setup I might want either separate logs or the option to log the core name.  Perhaps an Enhancement type JIRA entry is in order? Otis -- Sematext -- http:/

Re: delete snapshot??

2009-02-17 Thread Otis Gospodnetic
Hi, snapcleaner lets you delete snapshots by one of the following two criteria: - delete all but last N snapshots - delete all snapshots older than N days Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch From: sunnyfr To: solr-user@lucene.apa
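The two criteria above can be sketched in a few lines of Python. This is only an illustration of the selection logic, not the snapcleaner script itself. It assumes snapshot directories are named `snapshot.<yyyymmddHHMMSS>`; the function names are hypothetical.

```python
from datetime import datetime, timedelta

def parse_snapshot_time(name):
    # Assumption: names look like "snapshot.20090217103000".
    return datetime.strptime(name.split(".", 1)[1], "%Y%m%d%H%M%S")

def to_delete_keep_last(names, keep):
    """Return every snapshot except the `keep` most recent ones, oldest first."""
    ordered = sorted(names, key=parse_snapshot_time)
    return ordered[:-keep] if keep > 0 else ordered

def to_delete_older_than(names, days, now):
    """Return snapshots whose timestamp is more than `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    ordered = sorted(names, key=parse_snapshot_time)
    return [n for n in ordered if parse_snapshot_time(n) < cutoff]
```

Either function only returns the names that would be removed, so the result can be inspected before anything is deleted.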

Re: Outofmemory error for large files

2009-02-17 Thread Shalin Shekhar Mangar
On Tue, Feb 17, 2009 at 1:10 PM, Otis Gospodnetic < otis_gospodne...@yahoo.com> wrote: > Right. But I was trying to point out that a single 150MB Document is not > in fact what the o.p. wants to do. For example, if your 150MB represents, > say, a whole book, should that really be a single docume

Facet search on Multi-Valued Fields

2009-02-17 Thread Wang Guangchen
Hi all, I have been experimenting with Solr faceted search for 2 weeks, but I have hit a performance limit on facet search. My Solr index contains 4,000,000 documents. Normal searching is fairly fast, but faceted search is extremely slow. I am trying to do facet search on 3 fields (all multivalued fields) in

Re: Facet search on Multi-Valued Fields

2009-02-17 Thread Marc Sturlese
Have you tried a nightly build with the new facet algorithm (it is activated by default)? http://www.nabble.com/new-faceting-algorithm-td20674902.html Wang Guangchen wrote: > > Hi all, > I have been experimenting solr faceted search for 2 weeks. But I meet > performance limitation on face

Re: Multilanguage

2009-02-17 Thread Paul Libbrecht
I was looking for such a tool and haven't found it yet. Using StandardAnalyzer one can obtain some form of token-stream which can be used for "agnostic analysis". Clearly, then, something that matches words in a dictionary and decides on the language based on the language of the majority could

Re: Facet search on Multi-Valued Fields

2009-02-17 Thread Wang Guangchen
Nope, I am using the latest stable version of Solr, 1.3.0. Thanks for your tips. Besides this, is there anything else I should do? I am reading some previous threads about index optimization (http://www.mail-archive.com/solr-user@lucene.apache.org/msg05290.html). Will it improve the facet sea

Re: Multilanguage

2009-02-17 Thread Till Kinstler
Paul Libbrecht wrote: Clearly, then, something that matches words in a dictionary and decides on the language based on the language of the majority could do a decent job to decide the analyzer. Does such a tool exist? I once played around with http://ngramj.sourceforge.net/ for language

Re: Facet search on Multi-Valued Fields

2009-02-17 Thread Marc Sturlese
Well, doing an optimization after indexing will always improve your search speed a little bit. But with the new facet algorithm you will notice a huge improvement ... Other things to consider: index and store only the necessary fields, and use omitNorms whenever possible... there are many t

Re: Facet search on Multi-Valued Fields

2009-02-17 Thread Wang Guangchen
Thank you very much. On Tue, Feb 17, 2009 at 6:04 PM, Marc Sturlese wrote: > > Well doing an optimization after you do indexing will always improve your > search speed a little bit. But with the new facet algorithm you will note a > huge improvement ... > Other things to consider is to just index

Finding total range of dates for date faceting

2009-02-17 Thread Jacob Singh
Hi, I'm trying to write some code to build a facet list for a date field, but I don't know what the first and last available dates are. I would adjust the gap param accordingly. If there is a 10yr stretch between min(date) and max(date) I'd want to facet by year. If it is a 1 month gap, I'd wan
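A minimal sketch of the gap selection described above, assuming min(date) and max(date) are already known (for example from two queries sorted ascending and descending on the date field with rows=1). The thresholds and function names are illustrative assumptions, not something stated in the thread; the `facet.date.*` parameter names are the standard Solr date-faceting parameters.

```python
from datetime import datetime

def choose_gap(min_date, max_date):
    """Pick a Solr date-math gap from the span between the two dates."""
    span_days = (max_date - min_date).days
    if span_days > 2 * 365:   # assumption: more than ~2 years -> yearly buckets
        return "+1YEAR"
    if span_days > 60:        # assumption: more than ~2 months -> monthly buckets
        return "+1MONTH"
    return "+1DAY"

def date_facet_params(field, min_date, max_date):
    """Build request parameters for date faceting over [min_date, max_date]."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "facet": "true",
        "facet.date": field,
        "facet.date.start": min_date.strftime(fmt),
        "facet.date.end": max_date.strftime(fmt),
        "facet.date.gap": choose_gap(min_date, max_date),
    }
```

With these thresholds a ten-year span is faceted by year and a one-month span by day; the cut-off values would need tuning for a real data set.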

Re: Multilanguage

2009-02-17 Thread revathy arun
Does Apache Tika help find the language of the given document? On 2/17/09, Till Kinstler wrote: > > Paul Libbrecht wrote: > > Clearly, then, something that matches words in a dictionary and decides on >> the language based on the language of the majority could do a decent job to >> decide the

DIH full-import with clean=true fails and rollback empties index

2009-02-17 Thread Steffen B.
Hi there, I've got a pretty simple question regarding the DIH full-import command. I have a SOLR server running that has a full index with lots of documents in it. Once a day, a full-import is run, which uses the default parameters (clean=true, because it's not an incremental index). When I run a

Query regarding setTimeAllowed(Integer) and setRows(Integer)

2009-02-17 Thread Jana, Kumar Raja
Hi, I am trying to avoid queries which take a lot of server time. For this I plan to use setRows(Integer) and setTimeAllowed(Integer) methods while creating the SolrQuery. I would like to know the following: 1. I set SolrQuery.setRows(5000) Will the processing of the query stop once 5

Re: DIH full-import with clean=true fails and rollback empties index

2009-02-17 Thread Shalin Shekhar Mangar
On Tue, Feb 17, 2009 at 4:42 PM, Steffen B. wrote: > > Unfortunately, this rollback does not "refill" the index with the old data, > and neither keeps the old index from being overwritten with the new, > erroneous index. Now my question is: is there anything I can do to keep > Solr > from trashing

Re: DIH full-import with clean=true fails and rollback empties index

2009-02-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
Maybe you can try "postImportDeleteQuery" (not yet documented, SOLR-801) on a root entity. You can keep a timestamp field that stores the value of ${dataimporter.index_start_time}. Use that to remove old docs which may exist in the index before the indexing started --Noble

2 strange behaviours with DIH full-import.

2009-02-17 Thread Marc Sturlese
Hey, I have 2 problems that I think are really important and can be useful for other users: 1.) I am running 3 cores in a Solr instance. Each core contains about a million and a half docs. Once a full-import is run in a core it will free just a little bit of java memory. Once that first full-import

Re: Multilanguage

2009-02-17 Thread Otis Gospodnetic
Hi, No, Tika doesn't do LangID.  I haven't used ngramj, so I can't speak for its accuracy nor speed (but I know the code has been around for years).  Another LangID implementation is at the URL below my name. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch ___

Re: indexing Chienese langage

2009-02-17 Thread Koji Sekiguchi
CharFilter can normalize (convert) Traditional Chinese to Simplified Chinese or vice versa, if you define mapping.txt. Here is a sample of Chinese character normalization: https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG See SOLR-822 for details: http

Re: Word Locations & Search Components

2009-02-17 Thread Koji Sekiguchi
Hmm, Otis, very nice! Koji Otis Gospodnetic wrote: Hi, Wouldn't this be as easy as: - split email into "paragraphs" - for each paragraph compute signature (MD5 or something fuzzier, like in SOLR-799) - for each signature look for other emails with this signature - when you find an email with
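The steps Otis lists can be sketched outside Solr as follows. This is a rough in-memory illustration: paragraphs are split on blank lines, whitespace and case are normalized before hashing with MD5, and a plain dictionary stands in for the index lookup. All function names here are hypothetical, and the fuzzier signature from SOLR-799 is not reproduced.

```python
import hashlib
from collections import defaultdict

def paragraph_signatures(text):
    """Split text on blank lines and return one MD5 signature per paragraph."""
    signatures = []
    for paragraph in text.split("\n\n"):
        normalized = " ".join(paragraph.split()).lower()
        if normalized:
            signatures.append(hashlib.md5(normalized.encode("utf-8")).hexdigest())
    return signatures

def build_signature_index(emails):
    """Map each paragraph signature to the set of email ids containing it."""
    index = defaultdict(set)
    for email_id, text in emails.items():
        for signature in paragraph_signatures(text):
            index[signature].add(email_id)
    return index

def emails_sharing_paragraphs(email_id, emails, index):
    """Return ids of other emails that share at least one paragraph signature."""
    shared = set()
    for signature in paragraph_signatures(emails[email_id]):
        shared |= index.get(signature, set())
    shared.discard(email_id)
    return shared
```

Exact hashing only catches paragraphs that are identical after normalization; quoted text with `>` prefixes or reflowed lines would need a fuzzier signature.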

Re: Finding total range of dates for date faceting

2009-02-17 Thread Peter Wolanin
It *looks* as though Solr supports returning the results of arbitrary calculations: http://wiki.apache.org/solr/SolrQuerySyntax However, I am so far unable to get any example working except in the context of a dismax bf. It seems like one ought to be able to write a query to return the doc match

Re: DIH transformers - sect 2

2009-02-17 Thread Fergus McMenemie
>On Mon, Feb 16, 2009 at 3:22 PM, Fergus McMenemie wrote: >> >> 2) Having used TemplateTransformer to assign a value to an >> entity column that column cannot be used in other >> TemplateTransformer operations. In my project I am >> attempting to reuse "x.fileWebPath". To fix this, th

Re: snapshot created if there is no documente updated/new?

2009-02-17 Thread Bill Au
A snapshot is created every time snapshooter is invoked, even if there is no change in the index. However, since snapshots are created using hard links, no additional space is used if there are no changes to the index. It does use up one directory entry in the data directory. Bill On Mon, Feb 1

Re: snapshot as big as the index folder?

2009-02-17 Thread Bill Au
Snapshots are created using hard links. So even though it is as big as the index, it is not taking up any more space on the disk. The size of the snapshot will change as the size of the index changes. Bill On Mon, Feb 16, 2009 at 9:50 AM, sunnyfr wrote: > > It change a lot in few minute ?? is
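The hard-link behaviour Bill describes can be demonstrated with a small Python sketch. It is not the snapshooter script; it only shows that a hard-linked copy refers to the same file on disk, and that the data stays on disk until the last link is removed. The directory and file names are made up for the demonstration.

```python
import os
import tempfile

def snapshot_with_hard_links(index_dir, snapshot_dir):
    """Create snapshot_dir and hard-link every regular file from index_dir into it."""
    os.makedirs(snapshot_dir)
    for name in os.listdir(index_dir):
        source = os.path.join(index_dir, name)
        if os.path.isfile(source):
            os.link(source, os.path.join(snapshot_dir, name))

def demo():
    with tempfile.TemporaryDirectory() as root:
        index_dir = os.path.join(root, "index")
        os.makedirs(index_dir)
        segment = os.path.join(index_dir, "segment.dat")  # hypothetical file name
        with open(segment, "wb") as handle:
            handle.write(b"x" * 1024)
        snapshot_dir = os.path.join(root, "snapshot.20090217000000")
        snapshot_with_hard_links(index_dir, snapshot_dir)
        linked = os.path.join(snapshot_dir, "segment.dat")
        same_file = os.path.samefile(segment, linked)   # both names, one file
        link_count = os.stat(segment).st_nlink          # two directory entries
        os.remove(segment)                              # drop the index copy
        still_readable = os.path.getsize(linked) == 1024
        return same_file, link_count, still_readable
```

The last step also illustrates why old snapshots can keep disk space in use: a file removed from the index directory is only freed once no snapshot links to it any more.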

Re: delete snapshot??

2009-02-17 Thread Bill Au
usage: snapcleaner -D <days> | -N <num> [-d dir] [-u username] [-v]
  -D <days>  cleanup snapshots more than <days> days old
  -N <num>   keep the most recent <num> number of snapshots and cleanup up the remaining ones that are not being pulled
  -d         specify directory holding index data

Re: delete snapshot??

2009-02-17 Thread Walter Underwood
I run snapcleaner from cron. That cleans up old snapshots once each day. Here is a crontab line that runs it at 30 minutes past the hour, every hour. 30 * * * * /apps/wss/solr_home/bin/snapcleaner -N 3 wunder On 2/17/09 7:23 AM, "Bill Au" wrote: > usage: snapcleaner -D | -N [-d dir] [-u user

Re: Query regarding setTimeAllowed(Integer) and setRows(Integer)

2009-02-17 Thread Walter Underwood
Requesting 5000 rows will use a lot of server time, because it has to fetch the information for 5000 results when it makes the response. It is much more efficient to request only the results you will need, usually 10 at a time. wunder On 2/17/09 3:30 AM, "Jana, Kumar Raja" wrote: > Hi, > >
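As a small illustration of requesting results a page at a time, the sketch below builds the query string for one page using the standard `start` and `rows` parameters, with an optional `timeAllowed` value in milliseconds. The helper name and its defaults are assumptions made for this example.

```python
from urllib.parse import urlencode

def page_query(q, page, page_size=10, time_allowed_ms=None):
    """Build the query string for one page of results (page is zero-based)."""
    params = {"q": q, "start": page * page_size, "rows": page_size}
    if time_allowed_ms is not None:
        params["timeAllowed"] = time_allowed_ms
    return urlencode(params)
```

For example, `page_query("apple", 2)` asks for results 20 to 29 rather than pulling thousands of rows in one response.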

Store content out of solr

2009-02-17 Thread roberto
Hello, We are indexing information from different sources, so we would like to centralize the content so that I can retrieve it using the ID provided by Solr. Has anyone done something like this, and do you have any advice? I am thinking of storing the information in a database like MySQL. Thanks,

Re: Multilanguage

2009-02-17 Thread revathy arun
Hi Otis, But this is not freeware, right? On 2/17/09, Otis Gospodnetic wrote: > > Hi, > > No, Tika doesn't do LangID. I haven't used ngramj, so I can't speak for > its accuracy nor speed (but I know the code has been around for > years). Another LangID implementation is at the URL below my

Re: Store content out of solr

2009-02-17 Thread Peter Wolanin
Sure, we are doing essentially that with our Drupal integration module - each search result contains a link to the "real" content, which is stored in MySQL, etc, and presented via the Drupal CMS. http://drupal.org/project/apachesolr -Peter On Tue, Feb 17, 2009 at 11:57 AM, roberto wrote: > Hell

Re: Query regarding setTimeAllowed(Integer) and setRows(Integer)

2009-02-17 Thread Sean Timm
Jana, Kumar Raja wrote: 2. If I set SolrQuery.setTimeAllowed(2000) Will this kill query processing after 2 secs? (I know this question sounds silly but I just want a confirmation from the experts :) That is the idea, but only some of the code is within the timer. So, there are cases where

Re: Store content out of solr

2009-02-17 Thread Renaud Delbru
A common approach (for web search engines) is to use HBase [1] as a "Document Repository". Each document indexed inside Solr will have an entry (row, identified by the document URL) in the HBase table. This works great when you deal with a large data collection (it scales better than a SQL data

Re: Multilanguage

2009-02-17 Thread Grant Ingersoll
There are a number of options for freeware here, just do some searching on your favorite Internet search engine. TextCat is one of the more popular, as I seem to recall: http://odur.let.rug.nl/~vannoord/TextCat/ I believe Karl Wettin submitted a Lucene patch for a Language guesser: http://is

Re: Multilanguage

2009-02-17 Thread Walter Underwood
On 2/17/09 12:26 PM, "Grant Ingersoll" wrote: > If purchasing, several companies offer solutions, but I don't know > that their quality is any better than what you can get through open > source, as generally speaking, the problem is solved with a high > degree of accuracy through n-gram analysis.

making changes to solr schema

2009-02-17 Thread Jonathan Haddad
Preface: This is my first attempt at using solr. What happens if I need to do a change to a solr schema that's already in production? Can fields be added or removed? Can a type change from an integer to a float? Thanks in advance, Jon -- Jonathan Haddad http://www.rustyrazorblade.com

making changes to solr schema after deployed to production

2009-02-17 Thread Jonathan Haddad
Preface: This is my first attempt at using solr. What happens if I need to do a change to a solr schema that's already in production? Can fields be added or removed? Can a type change from an integer to a float? Thanks in advance, Jon

embedded wildcard search not working?

2009-02-17 Thread Jim Adams
This is a straightforward question, but I haven't been able to figure out what is up with my application. I seem to be able to search on trailing wildcards just fine. For example, fieldName:a* will return documents with apple, aardvark, etc. in them. But if I was to try and search on a field con

Reading Core-Specific Config File in a Row Transformer

2009-02-17 Thread wojtekpia
I'm using the DataImportHandler to load data. I created a custom row transformer, and inside of it I'm reading a configuration file. I am using the system's solr.solr.home property to figure out which directory the file should be in. That works for a single-core deployment, but not for multi-core

Re: Reading Core-Specific Config File in a Row Transformer

2009-02-17 Thread Shalin Shekhar Mangar
On Wed, Feb 18, 2009 at 5:53 AM, wojtekpia wrote: > > Is there a clean way to resolve the actual > conf directory path from within a custom row transformer so that it works > for both single-core and multi-core deployments? > You can use Context.getSolrCore().getInstanceDir() -- Regards, Shali

Re: making changes to solr schema

2009-02-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Feb 18, 2009 at 3:37 AM, Jonathan Haddad wrote: > Preface: This is my first attempt at using solr. > > What happens if I need to do a change to a solr schema that's already > in production? Can fields be added or removed? You may need a core reload or a server restart. Fields can be added a

Data Normalization in Solr.

2009-02-17 Thread Kalidoss MM
Hi, I want to store normalized data in Solr. For example, I am splitting personal information data (fname, lname, mname) into one Solr record and Address (personal, office) into another record in Solr. The IDs are different: 123212_name, 123212_add. Now, in some cases I require both personal and

RE: Query regarding setTimeAllowed(Integer) and setRows(Integer)

2009-02-17 Thread Jana, Kumar Raja
Thanks wunder for the response. So I would like to know if I were to limit the resultset from Solr to 10 and my query actually matches, say 1000 documents, will the query processing stop the moment the search finds the first 10 documents? Or will the entire search be carried out and then sorted ou

RE: Query regarding setTimeAllowed(Integer) and setRows(Integer)

2009-02-17 Thread Jana, Kumar Raja
Thanks Sean. That clears up the timer concept. Is there any other way through which I can make sure that the server time is not wasted? -Original Message- From: Sean Timm [mailto:tim...@aol.com] Sent: Wednesday, February 18, 2009 1:00 AM To: solr-user@lucene.apache.org Subject: Re: Query

Re: Data Normalization in Solr.

2009-02-17 Thread Otis Gospodnetic
Hi, There are no entity relationships in Solr and there are no joins, so the simplest thing to do in this case is to issue two requests.  You could also write a custom SearchComponent that internally does two requests and returns a single unified response. Otis -- Sematext -- http://sematext.c
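The "two requests" approach can be sketched on the client side as below. The sketch assumes the two documents have already been fetched (one per request) and that their IDs follow the `<base>_name` / `<base>_add` pattern from the original question; the field names and the helper itself are illustrative only.

```python
def merge_person(name_doc, address_doc):
    """Combine a name record and an address record that share an ID prefix."""
    base_name = name_doc["id"].rsplit("_", 1)[0]
    base_address = address_doc["id"].rsplit("_", 1)[0]
    if base_name != base_address:
        raise ValueError("documents belong to different people")
    merged = {"id": base_name}
    for doc in (name_doc, address_doc):
        # Copy every field except the per-record id.
        merged.update({key: value for key, value in doc.items() if key != "id"})
    return merged
```

A custom SearchComponent, as suggested in the reply, would do the same merge on the server instead of in the client.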

Re: embedded wildcard search not working?

2009-02-17 Thread Otis Gospodnetic
Jim, Does app*l or even a*p* work?  Perhaps "apple" gets stemmed to something that doesn't end in "e", such as "appl"? Regarding your config, you probably want to lowercase before removing stop words, so you'll want to change the order of those filters a bit.  That's not related to your wildcar