terms component misleading results

2012-05-25 Thread Cam Bazz
Hello, I need to know the exact count of certain terms in the documents. I noticed that when I update a document (only one field, for testing), the term count goes up by 1 for that specific term. For example, if I have two documents in the index, each with tag="ccc", and I update one of the documents, the term

upgrade to 3.6

2012-05-25 Thread Cam Bazz
Hello, I have upgraded from 1.4 to 3.6. It went quite smoothly, using the same schema.xml. I have done some testing and have not found any problems yet. Soon I will migrate the production system to 3.6. Any recommendations on this matter? Maybe I skipped something? Best Regards, C.B.

Re: terms component misleading results

2012-05-25 Thread Chris Hostetter
: the terms count go +1 for that specific term. for example, if I have : two documents in index, each with tag="ccc" and if I update one of the : documents, the terms frequency for ccc becomes 3. when I optimize the : index, it goes down again to correct number. (2) http://wiki.apache.org/solr/Te

Re: upgrade to 3.6

2012-05-25 Thread Sami Siren
Hi, If you're using non-ASCII data with SolrJ you might want to test that it works properly for you. See for example https://issues.apache.org/jira/browse/SOLR-3375 -- Sami Siren On Fri, May 25, 2012 at 10:11 AM, Cam Bazz wrote: > Hello, > > I have upgraded from 1.4 to 3.6 - it went quite smoo

Re: upgrade to 3.6

2012-05-25 Thread Cam Bazz
Hello, I have tested, but was not able to replicate the problem. (Basically I indexed a few documents with UTF-8 chars, then searched for them, and found them OK.) On the issue, at 27/Apr/12 08:56: > the fix is now committed to 3.6 branch I just recently downloaded the 3.6 - well actually it seems I

Re: terms component misleading results

2012-05-25 Thread Cam Bazz
Oh ok, I got it. So if I update the document three times, does that mean I have 1 normal document and 2 marked for deletion? Because the max difference was 1, no matter how many times you update. I think I can manage the faceting to do what I need. I guess that will be faster than making a rea
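The behaviour discussed in this thread can be illustrated with a toy model. This is only a minimal sketch, assuming (as the replies suggest) that an update is stored as a deletion mark on the old document plus a newly added document, and that term counts are read from the postings without consulting the deletion marks until an optimize rewrites the index. `ToyIndex` and its methods are hypothetical names for illustration, not Solr or Lucene APIs.

```python
class ToyIndex:
    """Toy model of an index where an update is delete-mark + add (not Lucene code)."""

    def __init__(self):
        self.docs = []        # (doc_id, tag) entries in insertion order
        self.deleted = set()  # positions in self.docs marked as deleted

    def add(self, doc_id, tag):
        self.docs.append((doc_id, tag))

    def update(self, doc_id, tag):
        # Mark every existing copy of doc_id as deleted, then append the new version.
        for pos, (existing_id, _) in enumerate(self.docs):
            if existing_id == doc_id:
                self.deleted.add(pos)
        self.add(doc_id, tag)

    def term_count(self, tag):
        # Counts entries without checking deletion marks (the misleading number).
        return sum(1 for _, t in self.docs if t == tag)

    def live_count(self, tag):
        # Counts only entries that are not marked as deleted.
        return sum(
            1
            for pos, (_, t) in enumerate(self.docs)
            if t == tag and pos not in self.deleted
        )

    def optimize(self):
        # Physically drop deleted entries, as a full merge would.
        self.docs = [d for pos, d in enumerate(self.docs) if pos not in self.deleted]
        self.deleted = set()


index = ToyIndex()
index.add(1, "ccc")
index.add(2, "ccc")
index.update(1, "ccc")
count_after_update = index.term_count("ccc")
live_after_update = index.live_count("ccc")
index.optimize()
count_after_optimize = index.term_count("ccc")
```

In this model the raw term count is 3 after the update while only 2 documents are live, and the two numbers agree again after `optimize()`, which mirrors the numbers reported in the original question.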

RE: Wildcard-Search Solr 3.5.0

2012-05-25 Thread spring
Oh, thx for the update! I didn't notice that Solr 3.6 has a text_de field type. These two options... less / more aggressive. Aggressive in terms of what? Thank you! > -Original Message- > From: Jack Krupansky [mailto:j...@basetechnology.com] > Sent: Friday, 25 May 2012 03:25 > To: sol

Re: Solr Performance

2012-05-25 Thread chris . a . mattmann
Jack Krupansky basetechnology.com> writes: > > I vaguely recall some thread blocking issue with trying to parse too many > PDF files at one time in the same JVM. > > Occasionally Tika (actually PDFBox) has been known to hang for some PDF > docs. > > Do you have enough memory in the JVM? When

Re: how can I specify the number of replications for each shard?

2012-05-25 Thread Mark Miller
I think we are going to add some more knobs, but currently it's done like this. Say you want 3 shards, each with 3 replicas. Start each shard with the sys prop -DnumShards=3, and start 9 shards. On May 24, 2012, at 11:42 PM, Vince Wei (jianwei) wrote: > I am using Solr 4.0. > > I want the numb
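The arithmetic in this reply (numShards=3 and 9 started instances giving 3 shards with 3 replicas each) can be sketched as follows. This is a toy illustration that assumes instances are assigned to shards round-robin in start order; the function name `assign_nodes` and the node names are hypothetical, and the real assignment is performed by SolrCloud itself.

```python
def assign_nodes(node_names, num_shards):
    """Distribute nodes over shards round-robin (an assumed policy, for illustration)."""
    shards = {f"shard{i + 1}": [] for i in range(num_shards)}
    for index, node in enumerate(node_names):
        shard = f"shard{index % num_shards + 1}"
        shards[shard].append(node)
    return shards


nodes = [f"node{i + 1}" for i in range(9)]
layout = assign_nodes(nodes, num_shards=3)
for shard, members in layout.items():
    print(shard, members)
```

Under that assumption, starting fewer or more instances changes only the number of replicas per shard, not the number of shards.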

indexing documents from a git repository

2012-05-25 Thread Welty, Richard
I need to incrementally index documents (probably MS Office/OpenOffice/PDF files) from a Git repository using Tika. I'm expecting to run periodic pulls against the repository to find new and updated docs. Does anyone have any experience and/or thoughts/suggestions that they'd like to sha

Re: Solr 4.0 Distributed Concurrency Control Mechanism?

2012-05-25 Thread Nicholas Ball
Hey all, I have another question with regards to this thread. Does anyone know the state of the rollback command in 4.0 and how it works with both replicas (i.e. distributed rollbacks) and the implemented snapshot isolation (i.e. are timestamps reverted?)? The relevant class is DistributedU

Generating maven artifacts for 3.6.0 build - correct -Dversion to use?

2012-05-25 Thread Aaron Daubman
Greetings, Following the directions here: http://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/maven/README.maven for building Lucene/Solr with Maven, what is the correct -Dversion to pass in to get-maven-poms? This seems set up for building -SNAPSHOT; however, I would like to use Maven to

Why is Solr still shipped with Jetty 6 / switching to Jetty 8?

2012-05-25 Thread Maciej Lisiewski
I have just noticed that Solr 3.6 still includes Jetty 6, which is no longer maintained. It is not merely no longer developed: it has actually reached End of Life as of 26th January 2012 ( http://dev.eclipse.org/mhonarc/lists/jetty-announce/msg00026.html ), and that means no bugfixes or security patches

Re: Why is Solr still shipped with Jetty 6 / switching to Jetty 8?

2012-05-25 Thread Jack Krupansky
There is some discussion here: https://issues.apache.org/jira/browse/SOLR-3159 -- Jack Krupansky -Original Message- From: Maciej Lisiewski Sent: Friday, May 25, 2012 10:43 AM To: solr-user@lucene.apache.org Subject: Why is Solr still shipped with Jetty 6 / switching to Jetty 8? I h

Re: Why is Solr still shipped with Jetty 6 / switching to Jetty 8?

2012-05-25 Thread Maciej Lisiewski
> There is some discussion here: https://issues.apache.org/jira/browse/SOLR-3159 I've seen it - it's one of the Jira tickets I was referring to: Jetty 8 is default for trunk now, but I have failed to find any info about using Jetty 8 with Solr 3.6. -- Maciej Lisiewski

Re: Wildcard-Search Solr 3.5.0

2012-05-25 Thread Jack Krupansky
I don't know the specific rules in these specific stemmers, but generally a "less aggressive" stemming (e.g., "plural-only") of "paintings" would be "painting", while a "more aggressive" stemming would be "paint". For some "aggressive" stemmers the stemmed word is not even a word. It would be
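The distinction drawn here can be shown with two toy stemmers. This is a deliberately simplified sketch and not the rule set of any Solr or Lucene stemmer; the function names and suffix lists are made up for illustration.

```python
def plural_only_stem(word):
    """Less aggressive: strip only a trailing plural 's' (toy rule)."""
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def aggressive_stem(word):
    """More aggressive: strip the plural, then a derivational suffix (toy rule)."""
    stem = plural_only_stem(word)
    for suffix in ("ing", "ed"):
        # Keep at least three characters so very short words are left alone.
        if stem.endswith(suffix) and len(stem) - len(suffix) >= 3:
            return stem[: -len(suffix)]
    return stem


light = plural_only_stem("paintings")
heavy = aggressive_stem("paintings")
print(light, heavy)
```

With these toy rules "paintings" becomes "painting" under the plural-only stemmer and "paint" under the aggressive one, matching the example in the message. The aggressive variant conflates more word forms, which tends to raise recall at the cost of precision.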

Re: Solr Performance

2012-05-25 Thread Jack Krupansky
Hmmm... what's going on here with email names and addresses??? My email client says "From: chris.a.mattm...@jpl.nasa.gov" for the name, but shows an email address of "csnsha...@gmail.com". Is this message from Chris A. Mattmann or not?!? And in the actual email header I see this: From: =?utf-

Re: Why is Solr still shipped with Jetty 6 / switching to Jetty 8?

2012-05-25 Thread William Bell
Let's just wait until SOLR 4.0 is out in a couple months. On Fri, May 25, 2012 at 9:06 AM, Maciej Lisiewski wrote: > >> There is some discussion here: >> https://issues.apache.org/jira/browse/SOLR-3159 >> > > I've seen it - it's one of the Jira tickets I was referring to: Jetty 8 is > default for

RE: Wildcard-Search Solr 3.5.0

2012-05-25 Thread spring
> I don't know the specific rules in these specific stemmers, > but generally a > "less aggressive" stemming (e.g., "plural-only") of > "paintings" would be > "painting", while a "more aggressive" stemming would be > "paint". For some > "aggressive" stemmers the stemmed word is not even a wor

Re: Accent Characters

2012-05-25 Thread Jack Krupansky
I tried your scenario with the Solr 3.6 example and it seemed to work fine and suggested an accented term for me. Some possibilities: 1) Your term had an edit distance that was too high relative to any accented correction. Check your term and count how many characters must be changed to ma

What is the "docs" number in Solr explain query results for fieldnorm?

2012-05-25 Thread Tom Burton-West
Hello all, I am trying to understand the output of Solr explain for a one word query. I am querying on the "ocr" field with no stemming/synonyms or stopwords. And no query or index time boosting. The query is "ocr:the" The document (result below) which contains two words "The Aeroplane" gets mo

Re: [solrmarc-tech] apostrophe / ayn / alif

2012-05-25 Thread Charles Riley
"the encoding of the character used for alif (02BE) carries with it an assigned property in the Unicode database of (Lm), putting it into the category of 'Modifier_Letter'..." Correction to what I put there: 02BC, rather. The rest of that still holds up; the data I'm looking at regarding proper

Re: What is the "docs" number in Solr explain query results for fieldnorm?

2012-05-25 Thread Andrzej Bialecki
On 25/05/2012 20:13, Tom Burton-West wrote: Hello all, I am trying to understand the output of Solr explain for a one word query. I am querying on the "ocr" field with no stemming/synonyms or stopwords. And no query or index time boosting. The query is "ocr:the" The document (result below) wh

Re: What is the "docs" number in Solr explain query results for fieldnorm?

2012-05-25 Thread Yonik Seeley
On Fri, May 25, 2012 at 2:13 PM, Tom Burton-West wrote: > The explain (debugQuery) shows the following for fieldnorm: >  0.625 = fieldNorm(field=ocr, doc=16624) > What does the "doc=16624" mean? It's the internal document id (i.e. it's debugging info and doesn't affect scoring) -Yonik http://luc
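As a side note on the 0.625 value in that explain line: the sketch below shows how a two-term field such as "The Aeroplane" could end up with that fieldNorm. It assumes the default similarity with no boosts, a length norm of 1/sqrt(numTerms), and a lossy one-byte encoding with 3 mantissa bits; the two helper functions are my own Python transcription of that encoding scheme and should be treated as an assumption, not as Lucene's code.

```python
import math
import struct


def float_to_byte315(value):
    """Encode a float into one byte: 3 mantissa bits, 5 exponent bits (assumed scheme)."""
    bits = struct.unpack(">i", struct.pack(">f", value))[0]
    small = bits >> 21
    if small <= (63 - 15) << 3:
        return 0 if bits <= 0 else 1      # zero, negative, or underflow
    if small >= ((63 - 15) << 3) + 0x100:
        return 255                        # overflow: largest representable value
    return small - ((63 - 15) << 3)


def byte315_to_float(encoded):
    """Decode the one-byte representation back to a float."""
    if encoded == 0:
        return 0.0
    bits = ((encoded & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def field_norm(num_terms):
    """Length norm 1/sqrt(numTerms) after the lossy round trip (no boosts assumed)."""
    return byte315_to_float(float_to_byte315(1.0 / math.sqrt(num_terms)))


two_term_norm = field_norm(2)
print(two_term_norm)
```

Under these assumptions the exact value 1/sqrt(2), roughly 0.707, is truncated to 0.625 by the encoding, which would account for the number shown in the explain output.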

Strange Error - org.apache.solr.response.XMLWriter.writePrim(XMLWriter.java:778)

2012-05-25 Thread Rohit
Hi, I deleted some data from Solr, and since the deletion I am getting truncated XML when I run a q=*:* query; all other queries execute fine. The following error is shown in the log files: May 25, 2012 7:10:36 PM org.apache.solr.common.SolrException log SEVERE: java.lang.NullPointer

Queries to solr being blocked

2012-05-25 Thread KPK
Hello, I just wanted to ask whether queries to the Solr index are blocked during a delta import. I read on the wiki page that queries to Solr are not blocked during full imports, but the page doesn't mention anything about delta imports. What happens then? I am currently facing a problem, my query takes very lo

Solr boost relevancy

2012-05-25 Thread Gau
Consider a db of just names. Now if I use synonym expansion at query time, I get a set of results. (Background: I created a class which resets idf, tf, ... all to 1, since they don't matter to me anymore.) What really matters is how closely the query matches the given name. Currently I am

Re: Creating custom Filter / Tokenizer / Request Handler for integration of NER-Framework

2012-05-25 Thread Lance Norskog
Another problem (just discovered this): TokenizerFactories do not get resource handlers. So, you can't go read config or model files for your Tokenizer. TokenFilters do, so you can use the KeywordTokenizer (make one big term) and do your work in a TokenFilter that gets the whole thing. On Thu, May

ExtendedDisMax Field Alias Question

2012-05-25 Thread Jamie Johnson
I was wondering if someone could explain whether the following is supported with the current EDisMax field aliasing. I have a field like person_name which exists in Solr; we also have 2 other fields named person_first_name and person_last_name. I would like to allow queries for person_name to be alias

Re: Creating custom Filter / Tokenizer / Request Handler for integration of NER-Framework

2012-05-25 Thread Chris Hostetter
: Another problem (just discovered this): TokenizerFactories do not get : resource handlers. So, you can't go read config or model files for : your Tokenizer. TokenFilters do, so you can use the KeywordTokenizer TokenizerFactory subclasses can implement ResourceLoaderAware and load any resources