How can I remove snapshots from time to time? With the snapcleaner script I
only seem to have the option to delete the last day's snapshots.
Thanks a lot Noble, and sorry again for all these questions,
Noble Paul നോബിള് नोब्ळ् wrote:
>
> The hardlinks will prevent the unused files from getting cleaned up.
> So the dis
Marc,
I don't have a Multicore setup that's itching for better logging, but I
think what you are suggesting is good. If I had a multicore setup I might want
either separate logs or the option to log the core name. Perhaps an
Enhancement type JIRA entry is in order?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Hi,
snapcleaner lets you delete snapshots by one of the following two criteria:
- delete all but last N snapshots
- delete all snapshots older than N days
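For example (the -d path is made up; adjust it to your setup):

    snapcleaner -N 7 -d /var/solr/data   # keep only the 7 most recent snapshots
    snapcleaner -D 3 -d /var/solr/data   # delete snapshots more than 3 days old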
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
On Tue, Feb 17, 2009 at 1:10 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:
> Right. But I was trying to point out that a single 150MB Document is not
> in fact what the o.p. wants to do. For example, if your 150MB represents,
> say, a whole book, should that really be a single document?
Hi all,
I have been experimenting with Solr faceted search for 2 weeks, but I am
running into a performance limitation with facet search.
My Solr index contains 4,000,000 documents. Normal searching is fairly fast, but
faceted search is extremely slow.
I am trying to do a facet search on 3 fields (all multivalued fields) in
Have you tried a nightly build with the new facet algorithm (it is
activated by default)?
http://www.nabble.com/new-faceting-algorithm-td20674902.html
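If memory serves, the nightlies also expose a facet.method parameter, so you
can check that you are on the new code path explicitly (field name below is
made up):

    http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=category&facet.method=fc

Setting facet.method=enum should bring back the old behaviour for comparison.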
Wang Guangchen wrote:
>
> Hi all,
> I have been experimenting with Solr faceted search for 2 weeks, but I am
> running into a performance limitation with facet
I was looking for such a tool and haven't found it yet.
Using StandardAnalyzer one can obtain some form of token-stream which
can be used for "agnostic analysis".
Clearly, then, something that matches words in a dictionary and
decides on the language based on the language of the majority could
Nope, I am using the latest stable version, Solr 1.3.0.
Thanks for your tips.
Besides this, is there anything else I should do? I am reading some
previous threads about index optimization (
http://www.mail-archive.com/solr-user@lucene.apache.org/msg05290.html). Will
it improve the facet search
Paul Libbrecht wrote:
> Clearly, then, something that matches words in a dictionary and decides
> on the language based on the language of the majority could do a decent
> job to decide the analyzer.
Does such a tool exist?
I once played around with http://ngramj.sourceforge.net/ for language
Doing an optimization after indexing will always improve your
search speed a little bit. But with the new facet algorithm you will notice a
huge improvement ...
Other things to consider: index and store only the necessary fields, and
use omitNorms wherever possible... there are many t
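For reference, omitNorms is set per field (or per field type) in schema.xml;
a hypothetical facet field might look like this:

    <field name="category" type="string" indexed="true" stored="false"
           multiValued="true" omitNorms="true"/>

Norms are not needed for fields you only facet or filter on, and dropping
them saves one byte per document per field.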
Thank you very much.
On Tue, Feb 17, 2009 at 6:04 PM, Marc Sturlese wrote:
>
> Doing an optimization after indexing will always improve your
> search speed a little bit. But with the new facet algorithm you will notice a
> huge improvement ...
> Other things to consider: index
Hi,
I'm trying to write some code to build a facet list for a date field,
but I don't know what the first and last available dates are. I would
adjust the gap param accordingly. If there is a 10yr stretch between
min(date) and max(date) I'd want to facet by year. If it is a 1 month
gap, I'd wan
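One way to get the endpoints is to ask Solr for them with two cheap queries
before building the facet request (the field name "ts" is made up):

    .../select?q=*:*&rows=1&fl=ts&sort=ts+asc    (oldest document)
    .../select?q=*:*&rows=1&fl=ts&sort=ts+desc   (newest document)

Then pick a gap from the span and issue the date facet, e.g. for a ~10 year
spread (note the "+" in the gap must be URL-escaped as %2B):

    .../select?q=*:*&rows=0&facet=true&facet.date=ts
        &facet.date.start=2000-01-01T00:00:00Z
        &facet.date.end=2010-01-01T00:00:00Z
        &facet.date.gap=%2B1YEAR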
Does Apache Tika help find the language of the given document?
On 2/17/09, Till Kinstler wrote:
>
> Paul Libbrecht wrote:
>
>> Clearly, then, something that matches words in a dictionary and decides on
>> the language based on the language of the majority could do a decent job to
>> decide the
Hi there,
I've got a pretty simple question regarding the DIH full-import command.
I have a Solr server running that has a full index with lots of documents in
it. Once a day, a full-import is run, which uses the default parameters
(clean=true, because it's not an incremental index).
When I run a
Hi,
I am trying to avoid queries which take a lot of server time. For this I
plan to use the setRows(Integer) and setTimeAllowed(Integer) methods while
creating the SolrQuery. I would like to know the following:
1. If I set SolrQuery.setRows(5000), will the processing of the query
stop once 5
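For reference, here is an untested sketch of both knobs against the 1.3-era
SolrJ API (the URL and query string are made up). When the time limit is
hit, Solr marks the response as partial:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class TimeBoxedQuery {
      public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("title:apple");
        q.setRows(10);           // fetch only what you will actually display
        q.setTimeAllowed(2000);  // ask Solr to stop collecting hits after ~2 seconds
        QueryResponse rsp = server.query(q);
        // if the time limit was reached, the header carries partialResults=true
        Object partial = rsp.getResponseHeader().get("partialResults");
        System.out.println("partial=" + partial
            + ", numFound=" + rsp.getResults().getNumFound());
      }
    }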
On Tue, Feb 17, 2009 at 4:42 PM, Steffen B. wrote:
>
> Unfortunately, this rollback does not "refill" the index with the old data,
> and neither keeps the old index from being overwritten with the new,
> erroneous index. Now my question is: is there anything I can do to keep
> Solr
> from trashing
Maybe you can try "postImportDeleteQuery" (not yet documented,
SOLR-801) on a root entity.
You can keep a timestamp field holding the value of
${dataimporter.index_start_time}. Use that to remove old
docs which may have existed in the index before the indexing started
--Noble
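A rough, untested sketch of that idea (entity, column, and field names are
all made up; this assumes the SOLR-801 patch, and the value is compared as a
plain string, so import_id should be a non-tokenized string field):

    <entity name="doc" query="select * from docs"
            transformer="TemplateTransformer"
            postImportDeleteQuery='*:* -import_id:"${dataimporter.index_start_time}"'>
      <field column="import_id" template="${dataimporter.index_start_time}"/>
    </entity>

After the import, every document whose import_id does not match the current
run's start time is deleted, which removes anything the fresh import did not
touch.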
Hey, I have 2 problems that I think are really important and can be useful
for other users:
1.) I am running 3 cores in a Solr instance. Each core contains about a
million and a half docs. Once a full-import is run in a core it will free
just a little bit of Java memory. Once that first full-import
Hi,
No, Tika doesn't do LangID. I haven't used ngramj, so I can't speak for its
accuracy or speed (but I know the code has been around for years). Another
LangID implementation is at the URL below my name.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
CharFilter can normalize (convert) traditional Chinese to simplified
Chinese or vice versa,
if you define mapping.txt. Here is a sample of Chinese character
normalization:
https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
See SOLR-822 for the details:
https://issues.apache.org/jira/browse/SOLR-822
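For a concrete picture, the mapping file is just source => target pairs, and
the char filter is wired into an analyzer in schema.xml (the file and type
names below are made up, and this assumes a build with SOLR-822 applied):

    # mapping-chinese-norm.txt: traditional => simplified
    "體" => "体"
    "龍" => "龙"

    <fieldType name="text_cjk" class="solr.TextField">
      <analyzer>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-chinese-norm.txt"/>
        <tokenizer class="solr.CJKTokenizerFactory"/>
      </analyzer>
    </fieldType>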
Hmm, Otis, very nice!
Koji
Otis Gospodnetic wrote:
> Hi,
> Wouldn't this be as easy as:
> - split email into "paragraphs"
> - for each paragraph compute signature (MD5 or something fuzzier, like in
> SOLR-799)
> - for each signature look for other emails with this signature
> - when you find an email with
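The per-paragraph MD5 step of that recipe is only a few lines of Java; a
self-contained sketch (the lowercasing/whitespace normalization is just one
guess at what a "fuzzier" preprocessing might start with):

    import java.security.MessageDigest;

    public class ParagraphSignatures {

      // hex-encoded MD5 of a string
      static String md5(String s) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(s.getBytes("UTF-8"))) {
          sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
      }

      public static void main(String[] args) throws Exception {
        String email = "first paragraph ...\n\nsecond paragraph ...";
        // split on blank lines, normalize case/whitespace, sign each paragraph
        for (String para : email.split("\\n\\s*\\n")) {
          String norm = para.trim().toLowerCase().replaceAll("\\s+", " ");
          if (norm.length() > 0) {
            System.out.println(md5(norm) + "  " + norm);
          }
        }
      }
    }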
It *looks* as though Solr supports returning the results of arbitrary
calculations:
http://wiki.apache.org/solr/SolrQuerySyntax
However, I am so far unable to get any example working except in the
context of a dismax bf. It seems like one ought to be able to write a
query to return the doc match
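For what it's worth, the hook on that wiki page is the _val_ pseudo-field:
with the standard request handler the function's value becomes the score, so
you can read the computed result back via fl (popularity is a field in the
example schema; the quotes need URL-escaping as %22):

    .../select?q=_val_:"sum(popularity,1)"&fl=*,score

To mix it with a normal query: q=name:apple _val_:"log(popularity)"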
> On Mon, Feb 16, 2009 at 3:22 PM, Fergus McMenemie wrote:
>>
>> 2) Having used TemplateTransformer to assign a value to an
>> entity column, that column cannot be used in other
>> TemplateTransformer operations. In my project I am
>> attempting to reuse "x.fileWebPath". To fix this, th
A snapshot is created every time snapshooter is invoked, even if there is no
change in the index. However, since snapshots are created using hard
links, no additional space is used if there are no changes to the index. It
does use up one directory entry in the data directory.
Bill
On Mon, Feb 1
Snapshots are created using hard links. So even though it is as big as the
index, it is not taking up any more space on the disk. The size of the
snapshot will change as the size of the index changes.
Bill
On Mon, Feb 16, 2009 at 9:50 AM, sunnyfr wrote:
>
> It changes a lot in a few minutes?? Is
usage: snapcleaner -D <days> | -N <num> [-d dir] [-u username] [-v]
-D <days>   cleanup snapshots more than <days> days old
-N <num>    keep the most recent <num> snapshots and
            clean up the remaining ones that are not being pulled
-d          specify directory holding index data
I run snapcleaner from cron. That cleans up old snapshots once
each day. Here is a crontab line that runs it at 30 minutes past
the hour, every hour.
30 * * * * /apps/wss/solr_home/bin/snapcleaner -N 3
wunder
On 2/17/09 7:23 AM, "Bill Au" wrote:
> usage: snapcleaner -D <days> | -N <num> [-d dir] [-u user
Requesting 5000 rows will use a lot of server time, because
it has to fetch the information for 5000 results when it
makes the response.
It is much more efficient to request only the results you
will need, usually 10 at a time.
wunder
On 2/17/09 3:30 AM, "Jana, Kumar Raja" wrote:
> Hi,
>
>
Hello,
We are indexing information from different sources, so we would like to
centralize the content and retrieve it using the ID
provided by Solr.
Has anyone done something like this, and do you have any advice? I am
thinking of storing the information in a database like MySQL.
Thanks,
Hi Otis,
But this is not freeware, right?
On 2/17/09, Otis Gospodnetic wrote:
>
> Hi,
>
> No, Tika doesn't do LangID. I haven't used ngramj, so I can't speak for
> its accuracy or speed (but I know the code has been around for
> years). Another LangID implementation is at the URL below my name.
Sure, we are doing essentially that with our Drupal integration module
- each search result contains a link to the "real" content, which is
stored in MySQL, etc, and presented via the Drupal CMS.
http://drupal.org/project/apachesolr
-Peter
On Tue, Feb 17, 2009 at 11:57 AM, roberto wrote:
> Hell
Jana, Kumar Raja wrote:
> 2. If I set SolrQuery.setTimeAllowed(2000), will this kill query
> processing after 2 secs? (I know this question sounds silly but I just
> want a confirmation from the experts :)
That is the idea, but only some of the code is within the timer. So,
there are cases where
A common approach (for web search engines) is to use HBase [1] as a
"Document Repository". Each document indexed inside Solr will have an
entry (row, identified by the document URL) in the HBase table. This
works great when you deal with a large data collection (it scales better
than a SQL data
There are a number of options for freeware here; just do some
searching on your favorite Internet search engine.
TextCat is one of the more popular, as I seem to recall:
http://odur.let.rug.nl/~vannoord/TextCat/
I believe Karl Wettin submitted a Lucene patch for a Language guesser: http://is
On 2/17/09 12:26 PM, "Grant Ingersoll" wrote:
> If purchasing, several companies offer solutions, but I don't know
> that their quality is any better than what you can get through open
> source, as generally speaking, the problem is solved with a high
> degree of accuracy through n-gram analysis.
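To make the n-gram idea concrete, here is a tiny, self-contained toy in the
style of Cavnar & Trenkle's TextCat (character trigram profiles compared with
an "out of place" distance); real tools train the profiles on large corpora
rather than the one-line samples used here:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class NGramLangGuess {

      // frequency-ranked list of character trigrams (top 300 kept)
      static List<String> profile(String text) {
        final Map<String, Integer> freq = new HashMap<String, Integer>();
        String t = " " + text.toLowerCase().replaceAll("\\s+", " ") + " ";
        for (int i = 0; i + 3 <= t.length(); i++) {
          String g = t.substring(i, i + 3);
          Integer c = freq.get(g);
          freq.put(g, c == null ? 1 : c + 1);
        }
        List<String> grams = new ArrayList<String>(freq.keySet());
        Collections.sort(grams, new Comparator<String>() {
          public int compare(String a, String b) { return freq.get(b) - freq.get(a); }
        });
        return grams.subList(0, Math.min(300, grams.size()));
      }

      // "out of place" distance between a document profile and a language profile
      static int distance(List<String> doc, List<String> lang) {
        int d = 0;
        for (int i = 0; i < doc.size(); i++) {
          int j = lang.indexOf(doc.get(i));
          d += (j < 0) ? lang.size() : Math.abs(i - j);
        }
        return d;
      }

      public static void main(String[] args) {
        Map<String, List<String>> langs = new HashMap<String, List<String>>();
        langs.put("en", profile("the quick brown fox jumps over the lazy dog again and again"));
        langs.put("de", profile("der schnelle braune Fuchs springt immer wieder ueber den faulen Hund"));
        List<String> doc = profile("the dog sleeps in the garden");
        String best = null;
        int bestD = Integer.MAX_VALUE;
        for (Map.Entry<String, List<String>> e : langs.entrySet()) {
          int d = distance(doc, e.getValue());
          if (d < bestD) { bestD = d; best = e.getKey(); }
        }
        System.out.println("guess: " + best);
      }
    }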
Preface: This is my first attempt at using Solr.
What happens if I need to make a change to a Solr schema that's already
in production? Can fields be added or removed?
Can a type change from an integer to a float?
Thanks in advance,
Jon
--
Jonathan Haddad
http://www.rustyrazorblade.com
This is a straightforward question, but I haven't been able to figure out
what is up with my application.
I seem to be able to search on trailing wildcards just fine. For example,
fieldName:a* will return documents with apple, aardvark, etc. in them. But
if I were to try and search on a field con
I'm using the DataImportHandler to load data. I created a custom row
transformer, and inside of it I'm reading a configuration file. I am using
the system's solr.solr.home property to figure out which directory the file
should be in. That works for a single-core deployment, but not for
multi-core
On Wed, Feb 18, 2009 at 5:53 AM, wojtekpia wrote:
>
> Is there a clean way to resolve the actual
> conf directory path from within a custom row transformer so that it works
> for both single-core and multi-core deployments?
>
You can use Context.getSolrCore().getInstanceDir()
--
Regards,
Shalin
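A sketch of how that might look inside the transformer (the class, file, and
property names are all made up; untested):

    import java.io.File;
    import java.util.Map;
    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Transformer;

    // resolves its config file against the owning core, not solr.solr.home
    public class ConfigFileTransformer extends Transformer {
      @Override
      public Object transformRow(Map<String, Object> row, Context context) {
        String instanceDir = context.getSolrCore().getInstanceDir();
        File conf = new File(instanceDir, "conf/rowtransformer.properties");
        // ... load settings from conf and enrich/modify the row ...
        return row;
      }
    }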
On Wed, Feb 18, 2009 at 3:37 AM, Jonathan Haddad wrote:
> Preface: This is my first attempt at using Solr.
>
> What happens if I need to make a change to a Solr schema that's already
> in production? Can fields be added or removed?
You may need a core reload or a server restart.
Fields can be added a
Hi,
I want to store normalized data in Solr. For example, I am splitting
personal information (fname, lname, mname) into one Solr record, and addresses
(personal, office) into another record in Solr. The IDs are different:
123212_name, 123212_add.
Now, in some cases I require both personal and
Thanks wunder for the response.
So I would like to know: if I were to limit the resultset from Solr to 10
and my query actually matches, say, 1000 documents, will the query
processing stop the moment the search finds the first 10 documents? Or
will the entire search be carried out and then sorted ou
Thanks Sean. That clears up the timer concept.
Is there any other way I can make sure that server
time is not wasted?
-Original Message-
From: Sean Timm [mailto:tim...@aol.com]
Sent: Wednesday, February 18, 2009 1:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Query
Hi,
There are no entity relationships in Solr and there are no joins, so the
simplest thing to do in this case is to issue two requests. You could also
write a custom SearchComponent that internally does two requests and returns a
single unified response.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
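Using the IDs from your example, the two-request version is just a few lines
of SolrJ (a fragment; it assumes a SolrServer named "server", e.g. new
CommonsHttpSolrServer("http://localhost:8983/solr"), and that both documents
exist):

    SolrDocument person  = server.query(new SolrQuery("id:123212_name")).getResults().get(0);
    SolrDocument address = server.query(new SolrQuery("id:123212_add")).getResults().get(0);
    // stitch the two records together in application code
    Map<String, Object> merged = new HashMap<String, Object>();
    for (String f : person.getFieldNames())  merged.put(f, person.getFieldValue(f));
    for (String f : address.getFieldNames()) merged.put(f, address.getFieldValue(f));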
Jim,
Does app*l or even a*p* work? Perhaps "apple" gets stemmed to something that
doesn't end in "e", such as "appl"?
Regarding your config, you probably want to lowercase before removing stop
words, so you'll want to change the order of those filters a bit. That's not
related to your wildcard
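As a reference point, a text field type with the order Otis describes might
look like this in schema.xml (the type and file names are the stock example
ones; the stemmer line is what could turn "apple" into "appl"):

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      </analyzer>
    </fieldType>

Also note that wildcard and prefix queries are not analyzed at query time, so
the prefix you type must already match the indexed (lowercased, possibly
stemmed) form.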