Hello,
I need to know the exact count of certain terms in the documents. I
noticed that when I update a document (only one field, for testing),
the term count goes up by 1 for that specific term. For example, if I have
two documents in the index, each with tag="ccc", and I update one of the
documents, the term
Hello,
I have upgraded from 1.4 to 3.6 - it went quite smoothly, using the same
schema.xml
I have done some testing, and I have not found any problems yet. Soon
I will migrate the production system to 3.6
Any recommendations on this matter? Maybe I skipped something?
Best Regards,
C.B.
: the term count goes up by 1 for that specific term. For example, if I have
: two documents in the index, each with tag="ccc", and I update one of the
: documents, the term frequency for ccc becomes 3. When I optimize the
: index, it goes down again to the correct number (2).
http://wiki.apache.org/solr/Te
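To make the behaviour described in this thread concrete (an update is a delete followed by an add, and the term's document frequency keeps counting the deleted copy until it is merged away), here is a toy Python model. `ToyIndex` and its methods are hypothetical names for illustration only, not Solr or Lucene APIs; in a real index, background segment merges can also purge deleted documents before an explicit optimize.

```python
class ToyIndex:
    """Toy model only (not Solr/Lucene code): an update is a delete plus an add.

    Deleted documents are only flagged; their terms still count toward
    doc_freq until optimize() physically removes them.
    """

    def __init__(self):
        self.docs = []  # each entry: [doc_id, tag, deleted_flag]

    def add(self, doc_id, tag):
        self.docs.append([doc_id, tag, False])

    def update(self, doc_id, tag):
        # Flag every live copy of doc_id as deleted, then append the new version.
        for doc in self.docs:
            if doc[0] == doc_id and not doc[2]:
                doc[2] = True
        self.add(doc_id, tag)

    def doc_freq(self, tag):
        # Mimics docFreq: deleted-but-not-yet-merged documents are still counted.
        return sum(1 for _, t, _ in self.docs if t == tag)

    def live_count(self, tag):
        # What a search would actually return.
        return sum(1 for _, t, deleted in self.docs if t == tag and not deleted)

    def optimize(self):
        # Merging rewrites the index without the deleted documents.
        self.docs = [d for d in self.docs if not d[2]]


index = ToyIndex()
index.add("1", "ccc")
index.add("2", "ccc")
index.update("1", "ccc")
print(index.doc_freq("ccc"), index.live_count("ccc"))  # 3 2
index.optimize()
print(index.doc_freq("ccc"))  # 2
```

The live count stays correct throughout; only the raw term statistics lag until the deleted copies are merged out.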
Hi,
If you're using non-ASCII data with SolrJ, you might want to test that
it works properly for you. See, for example,
https://issues.apache.org/jira/browse/SOLR-3375
--
Sami Siren
On Fri, May 25, 2012 at 10:11 AM, Cam Bazz wrote:
> Hello,
>
> I have upgraded from 1.4 to 3.6 - it went quite smoo
Hello,
I have tested, but was not able to replicate the problem.
(Basically, I indexed a few documents with UTF-8 characters, then searched
for them and found them OK.)
On the issue, at 27/Apr/12 08:56:
> the fix is now committed to 3.6 branch
I just recently downloaded the 3.6 - well actually it seems I
Oh ok, I got it.
So if I update the document three times, does that mean I have 1
normal document and 2 marked for deletion?
Because the max difference was 1 - no matter how many times you update.
I think I can manage the faceting to do what I need. I guess that will
be faster than making a rea
Oh, thanks for the update! I didn't notice that Solr 3.6 has a text_de field
type. These two options... less / more aggressive. Aggressive in terms of
what?
Thank you!
> -Original Message-
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Freitag, 25. Mai 2012 03:25
> To: sol
Jack Krupansky basetechnology.com> writes:
>
> I vaguely recall some thread blocking issue with trying to parse too many
> PDF files at one time in the same JVM.
>
> Occasionally Tika (actually PDFBox) has been known to hang for some PDF
> docs.
>
> Do you have enough memory in the JVM? When
I think we are going to add some more knobs, but currently it's done like this.
Say you want 3 shards, each with 3 replicas.
Start each Solr instance with the system property -DnumShards=3, and start 9 instances.
On May 24, 2012, at 11:42 PM, Vince Wei (jianwei) wrote:
> I am using Solr 4.0.
>
> I want the numb
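A toy sketch of the resulting 3 shards x 3 replicas layout. It assumes (this is not stated in the thread) that each newly started node launched with -DnumShards=3 joins whichever shard currently has the fewest replicas; `assign_nodes` and the node names are hypothetical.

```python
def assign_nodes(num_shards, num_nodes):
    """Toy model: each new node joins the shard with the fewest replicas so far.

    Assumed assignment rule, for illustration only; ties go to the
    lowest-numbered shard.
    """
    shards = {f"shard{i}": [] for i in range(1, num_shards + 1)}
    for n in range(1, num_nodes + 1):
        # min() returns the first key with the smallest count, and dicts keep
        # insertion order, so ties resolve to the lowest-numbered shard.
        target = min(shards, key=lambda s: len(shards[s]))
        shards[target].append(f"node{n}")
    return shards


layout = assign_nodes(num_shards=3, num_nodes=9)
for shard, nodes in layout.items():
    print(shard, nodes)
```

Under that assumption, nine nodes fill the three shards round-robin, giving each shard three copies (one leader plus two replicas).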
I need to incrementally index documents (probably MS
Office/OpenOffice/PDF files) from a Git repository using Tika. I'm expecting
to run periodic pulls against the repository to find new and updated docs.
Does anyone have any experience and/or thoughts/suggestions that they'd like to
sha
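One possible shape for the periodic pull, sketched under assumptions: you record the last indexed commit, pull, run `git diff --name-status <last-commit> HEAD`, then send the changed files to Solr's extracting handler (Solr Cell / Tika) and delete removed files by id. The function below only parses that diff output; `plan_changes` and the extension list are hypothetical choices for illustration.

```python
import os

# Extensions assumed worth sending through Tika; adjust to your repository.
INDEXABLE = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
             ".odt", ".ods", ".odp", ".pdf"}


def _indexable(path):
    return os.path.splitext(path)[1].lower() in INDEXABLE


def plan_changes(name_status_output):
    """Split `git diff --name-status OLD NEW` output into paths to (re)index
    and paths to delete from the index (hypothetical helper)."""
    to_index, to_delete = [], []
    for line in name_status_output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        status = fields[0][0]  # "M", "A", "D", or "R" from e.g. "R100"
        if status == "R":
            # Rename: the old path disappears, the new path must be indexed.
            removed, added = [fields[1]], [fields[2]]
        elif status == "D":
            removed, added = [fields[1]], []
        else:  # A (added), M (modified), C (copied), T (type change), ...
            removed, added = [], [fields[-1]]
        to_delete += [p for p in removed if _indexable(p)]
        to_index += [p for p in added if _indexable(p)]
    return to_index, to_delete


sample = ("M\tdocs/spec.docx\nA\treports/q1.pdf\nD\told/notes.odt\n"
          "R100\ta/plan.pdf\tb/plan.pdf\nM\tsrc/Main.java\n")
print(plan_changes(sample))
# (['docs/spec.docx', 'reports/q1.pdf', 'b/plan.pdf'], ['old/notes.odt', 'a/plan.pdf'])
```

Paths containing unusual characters are quoted by Git in this output; `git diff -z` avoids that but needs a different split.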
Hey all,
I have another question with regards to this thread.
Does anyone know what the state is of the rollback command in 4.0, and how
it works with both replicas (i.e. distributed rollbacks) and the snapshot
isolation implemented (i.e. are timestamps reverted?)? The relevant class is
DistributedU
Greetings,
Following the directions here:
http://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/maven/README.maven
for building Lucene/Solr with Maven, what is the correct -Dversion to pass
to get-maven-poms?
This seems set up for building -SNAPSHOT; however, I would like to use
maven to
I have just noticed that Solr 3.6 still includes Jetty 6, which is no
longer maintained.
It is not just no longer developed: it actually reached End of Life on
26th January 2012
(http://dev.eclipse.org/mhonarc/lists/jetty-announce/msg00026.html), which
means no bugfixes or security patches
There is some discussion here:
https://issues.apache.org/jira/browse/SOLR-3159
-- Jack Krupansky
-Original Message-
From: Maciej Lisiewski
Sent: Friday, May 25, 2012 10:43 AM
To: solr-user@lucene.apache.org
Subject: Why is Solr still shipped with Jetty 6 / switching to Jetty 8?
I h
There is some discussion here:
https://issues.apache.org/jira/browse/SOLR-3159
I've seen it - it's one of the Jira tickets I was referring to: Jetty 8
is default for trunk now, but I have failed to find any info about using
Jetty 8 with Solr 3.6.
--
Maciej Lisiewski
I don't know the specific rules in these specific stemmers, but generally a
"less aggressive" stemming (e.g., "plural-only") of "paintings" would be
"painting", while a "more aggressive" stemming would be "paint". For some
"aggressive" stemmers the stemmed word is not even a word.
It would be
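A toy illustration of the difference described above, using made-up suffix rules (not the rules of the German stemmers shipped with Solr): the light stemmer only strips a plural "s", while the aggressive one strips longer suffixes and can produce non-words. Both function names are hypothetical.

```python
def plural_only_stem(word):
    """Light stemming: strip a plural 's' only (toy rule)."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


# Suffixes tried in order, longest first (toy list, not a real algorithm).
AGGRESSIVE_SUFFIXES = ["ings", "ing", "ers", "er", "s"]


def aggressive_stem(word):
    """Heavier stemming: strip the first matching suffix, keeping >= 3 chars."""
    for suffix in AGGRESSIVE_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["paintings", "painters", "running"]:
    print(w, plural_only_stem(w), aggressive_stem(w))
# paintings painting paint
# painters painter paint
# running running runn
```

The trade-off: aggressive stemming conflates more forms ("paintings" and "painters" both match "paint"), which raises recall but also produces stems like "runn" and can merge words with different meanings.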
Hmmm... what's going on here with email names and addresses???
My email client says "From: chris.a.mattm...@jpl.nasa.gov" for the name, but
shows an email address of "csnsha...@gmail.com". Is this message from Chris
A. Mattmann or not?!?
And in the actual email header I see this:
From: =?utf-
Let's just wait until SOLR 4.0 is out in a couple of months.
On Fri, May 25, 2012 at 9:06 AM, Maciej Lisiewski wrote:
>
>> There is some discussion here:
>> https://issues.apache.org/jira/browse/SOLR-3159
>>
>
> I've seen it - it's one of the Jira tickets I was referring to: Jetty 8 is
> default for
I tried your scenario with the Solr 3.6 example and it seemed to work fine
and suggested an accented term for me.
Some possibilities:
1) Your term had an edit distance that was too high relative to any
accented correction. Check your term and count how many characters must be
changed to ma
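To do that count mechanically, here is a minimal Levenshtein (edit distance) sketch, where each insertion, deletion, or substitution costs 1. Compare the result against whatever distance or accuracy threshold your configured spellchecker uses; `edit_distance` is a hypothetical helper, not a Solr API. Note that an accented letter counts as one substitution only if both strings use the same Unicode normalization form (precomposed characters are used below).

```python
def edit_distance(a, b):
    """Classic Levenshtein distance: insert, delete and substitute all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute ca -> cb
        prev = cur
    return prev[-1]


print(edit_distance("cafe", "caf\u00e9"))          # 1
print(edit_distance("resume", "r\u00e9sum\u00e9"))  # 2
```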
Hello all,
I am trying to understand the output of Solr explain for a one word query.
I am querying on the "ocr" field with no stemming/synonyms or stopwords.
And no query or index time boosting.
The query is "ocr:the"
The document (result below) which contains two words "The Aeroplane" gets
mo
"the encoding of the character used for alif (02BE) carries with it an
assigned property in the Unicode database of (Lm), putting it into the
category of 'Modifier_Letter'..."
Correction to what I put there: 02BC, rather. The rest of that still
holds up; the data I'm looking at regarding proper
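The Unicode properties mentioned above can be checked directly with Python's `unicodedata` module. This only shows the general categories; how a particular Solr tokenizer treats these characters depends on that tokenizer.

```python
import unicodedata

# U+02BE and U+02BC (both used for alif) versus the ASCII apostrophe.
for ch in ("\u02be", "\u02bc", "'"):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
# U+02BE Lm MODIFIER LETTER RIGHT HALF RING
# U+02BC Lm MODIFIER LETTER APOSTROPHE
# U+0027 Po APOSTROPHE
```

Because U+02BC is a letter (Lm) while U+0027 is punctuation (Po), letter-based word splitting may keep U+02BC inside a token where an ASCII apostrophe would not be.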
On 25/05/2012 20:13, Tom Burton-West wrote:
Hello all,
I am trying to understand the output of Solr explain for a one word query.
I am querying on the "ocr" field with no stemming/synonyms or stopwords.
And no query or index time boosting.
The query is "ocr:the"
The document (result below) wh
On Fri, May 25, 2012 at 2:13 PM, Tom Burton-West wrote:
> The explain (debugQuery) shows the following for fieldnorm:
> 0.625 = fieldNorm(field=ocr, doc=16624)
> What does the "doc=16624" mean?
It's the internal document id (i.e., it's debugging info and doesn't
affect scoring).
-Yonik
http://luc
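As for the 0.625 itself: assuming Lucene 3.x's DefaultSimilarity with no boosts, the length norm for a field with two tokens ("The Aeroplane") is 1/sqrt(2), about 0.707, but norms are stored in a single byte using Lucene's SmallFloat 3-bit-mantissa encoding, which rounds that value down to 0.625. The sketch below is a Python port of `SmallFloat.floatToByte315` / `byte315ToFloat` written for illustration.

```python
import math
import struct


def float_to_bits(f):
    # Reinterpret a 32-bit float as a signed int (like Java's Float.floatToRawIntBits).
    return struct.unpack(">i", struct.pack(">f", f))[0]


def bits_to_float(bits):
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def float_to_byte315(f):
    """Port of SmallFloat.floatToByte315 (3 mantissa bits, zero exponent 15)."""
    bits = float_to_bits(f)
    small = bits >> (24 - 3)
    if small <= ((63 - 15) << 3):
        return 0 if bits <= 0 else 1
    if small >= ((63 - 15) << 3) + 0x100:
        return 255  # Java returns (byte) -1, i.e. 0xFF
    return small - ((63 - 15) << 3)


def byte315_to_float(b):
    """Port of SmallFloat.byte315ToFloat."""
    if b == 0:
        return 0.0
    bits = (b & 0xFF) << (24 - 3)
    bits += (63 - 15) << 24
    return bits_to_float(bits)


length_norm = 1.0 / math.sqrt(2)  # two tokens: "The Aeroplane"
encoded = float_to_byte315(length_norm)
print(length_norm, encoded, byte315_to_float(encoded))
# 0.7071067811865475 121 0.625
```

The coarse one-byte encoding means documents of nearby lengths can end up with identical fieldNorm values.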
Hi,
I deleted some data from Solr, and since the deletion I am getting truncated
XML when I run a q=*:* query; in all other cases the queries execute fine.
The following error is shown in the log files:
May 25, 2012 7:10:36 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointer
Hello
I just wanted to ask whether queries to the Solr index are blocked during a
delta import.
I read on the wiki page that queries to Solr are not blocked during full
imports, but the page doesn't mention anything about delta imports. What
happens then?
I am currently facing a problem: my query takes very lo
Consider a db of just names. Now if I use synonym expansion at query time, I
get a set of results.
(Background: I created a class which resets idf, tf, ... all to 1, since
they don't matter to me anymore. What really matters is how closely the
query matches the given name.)
Currently I am
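For reference, the first line below is Lucene 3.x's practical scoring formula as documented for `Similarity`; the second is a sketch of what remains assuming the custom class only makes `tf` and `idf` return 1 and leaves the other factors alone.

```latex
\begin{aligned}
\mathrm{score}(q,d) &= \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \mathrm{tf}(t \in d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{boost}(t)\cdot\mathrm{norm}(t,d) \\
\text{with } \mathrm{tf}=\mathrm{idf}=1:\quad
\mathrm{score}(q,d) &= \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q \cap d} \mathrm{boost}(t)\cdot\mathrm{norm}(t,d)
\end{aligned}
```

Since queryNorm is the same for every document matching a given query, ranking is then driven by coord (the fraction of query terms matched), per-term boosts, and norm (field length normalization plus any index-time boosts).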
Another problem (just discovered this): TokenizerFactories do not get
resource handlers. So, you can't go read config or model files for
your Tokenizer. TokenFilters do, so you can use the KeywordTokenizer
(make one big term) and do your work in a TokenFilter that gets the
whole thing.
On Thu, May
I was wondering if someone could explain if the following is supported
with the current EDisMax Field Aliasing.
I have a field like person_name which exists in Solr; we also have 2
other fields named person_first_name and person_last_name. I would
like to allow queries for person_name to be alias
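For illustration, edismax field aliasing is expressed with a per-field `qf` parameter of the form `f.<alias>.qf`. The sketch below only builds such a request with the standard library; the host, port and core path are hypothetical, and whether an alias may shadow a real field of the same name (person_name here) is exactly the open question in this post, so verify it against your Solr version.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical Solr 3.6 endpoint; adjust host, port and core to your setup.
SOLR_SELECT = "http://localhost:8983/solr/select"

params = {
    "defType": "edismax",
    # The user queries the alias name...
    "q": "person_name:smith",
    # ...and the per-field qf maps that alias onto the two real fields.
    "f.person_name.qf": "person_first_name person_last_name",
}

url = SOLR_SELECT + "?" + urlencode(params)
print(url)

# Round-trip check: decoding the query string gives back the alias mapping.
decoded = parse_qs(urlsplit(url).query)
print(decoded["f.person_name.qf"])  # ['person_first_name person_last_name']
```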
: Another problem (just discovered this): TokenizerFactories do not get
: resource handlers. So, you can't go read config or model files for
: your Tokenizer. TokenFilters do, so you can use the KeywordTokenizer
TokenizerFactory subclasses can implement ResourceLoaderAware and load any
resources