Re: How to manage resource out of index?

2010-07-06 Thread Rebecca Watson
hi li, i looked at doing something similar - where we only index the text but retrieve search results / highlight from files -- we ended up giving up because of the amount of customisation required in solr -- mainly because we wanted the distributed search functionality in solr which meant making

How to manage resource out of index?

2010-07-06 Thread Li Li
I used to store the full text in the Lucene index. But I found that merging the index is very slow, because when merging 2 segments it copies the fdt files into a new one. So I want to only index the full text. But when searching I need the full text for features such as highlighting and viewing the full text. I can s

Re: document level security: indexing/searching techniques

2010-07-06 Thread Glen Newton
You could implement a good solution with the underlying Lucene ParallelReader http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/index/ParallelReader.html Keep the 100 search fields - 'static' info - in one index, the permissions info in another index that gets updated when the permissi

Re: document level security: indexing/searching techniques

2010-07-06 Thread Lance Norskog
What Ken describes is called 'role-based' security. Users have roles, and security items talk about roles, not users. http://en.wikipedia.org/wiki/Role-based_access_control On Tue, Jul 6, 2010 at 3:15 PM, Peter Sturge wrote: > Yes, you don't want to hard code permissions into your index - it wil
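
A minimal sketch of the role-based idea described above, in plain Python rather than Solr. The user names, role names, document ids and the `allowed_roles` field name are all hypothetical; the point is only that documents list roles and never users.

```python
# Hypothetical data: users, roles and document ids are made up for illustration.
USER_ROLES = {
    "alice": {"engineering", "admin"},
    "bob": {"sales"},
}

DOC_ROLES = {
    "design-spec": {"engineering"},
    "price-list": {"sales", "admin"},
}


def can_read(user, doc_id):
    # Access is decided by role overlap; documents never mention users.
    return bool(USER_ROLES.get(user, set()) & DOC_ROLES.get(doc_id, set()))


def role_filter_query(user, field="allowed_roles"):
    # Builds a filter string such as 'allowed_roles:(admin OR engineering)'.
    # The field name is an assumption, not something defined in this thread.
    roles = sorted(USER_ROLES.get(user, set()))
    return f"{field}:({' OR '.join(roles)})" if roles else None


alice_filter = role_filter_query("alice")
```

With this layout, changing what a user may see means editing the user-to-role mapping, not reindexing documents.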

Re: general debugging techniques?

2010-07-06 Thread Lance Norskog
Ah! I did not notice the 'too many open files' part. This means that your mergeFactor setting is too high for what your operating system allows. The default mergeFactor is 10 (which translates into thousands of open file descriptors). You should lower this number. On Tue, Jul 6, 2010 at 1:14 PM, J

index format error because disk full

2010-07-06 Thread Li Li
The index file is ill-formatted because the disk filled up while feeding. Can I roll back to the last version? Is there any method to avoid unexpected errors when indexing? Attached is my segment_N

Re: Deleting Terms:

2010-07-06 Thread Erick Erickson
That's because deleting a document simply marks it as deleted, it doesn't really do much else with it, all that work is deferred to the optimize step as you've found. But deleted documents will NOT be found even though the admin page shows their terms still in the index. Best Erick On Tue, Jul 6
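
The behaviour described above can be pictured with a toy model. This is not Lucene's implementation; the class and method names are invented purely to show "delete only marks, optimize removes the terms".

```python
# Toy model only: NOT Lucene's implementation, just an illustration of
# "delete marks the document, optimize removes its terms".

class ToyIndex:
    def __init__(self):
        self.docs = {}        # doc id -> list of terms
        self.deleted = set()  # ids marked as deleted but not yet purged

    def add(self, doc_id, text):
        self.docs[doc_id] = text.lower().split()

    def delete(self, doc_id):
        # Deleting only flags the document; its terms stay in the index.
        if doc_id in self.docs:
            self.deleted.add(doc_id)

    def search(self, term):
        # Documents flagged as deleted are filtered out of the results.
        return sorted(d for d, terms in self.docs.items()
                      if term in terms and d not in self.deleted)

    def terms(self):
        # What an admin-style term listing would show: every stored term.
        return sorted({t for terms in self.docs.values() for t in terms})

    def optimize(self):
        # Rewriting the index drops flagged documents and their terms.
        for doc_id in self.deleted:
            del self.docs[doc_id]
        self.deleted.clear()


index = ToyIndex()
index.add(1, "red apple")
index.add(2, "green pear")
index.delete(2)

found_before = index.search("pear")  # the deleted document is not returned
terms_before = index.terms()         # but "pear" is still listed
index.optimize()
terms_after = index.terms()          # only terms of live documents remain
```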

Re: Relevancy and non-matching words

2010-07-06 Thread Erick Erickson
Underneath SOLR is Lucene. Here's a description of Lucene's scoring algorithm (follow the "Similarity" link) http://lucene.apache.org/java/2_4_0/scoring.html#Understanding%20the%20Scoring%20Formula Letters in non-matching words aren't relevant; what matters is the relationship between the number of sear
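
One factor in the linked scoring formula is field-length normalization. The sketch below assumes the `1/sqrt(numTerms)` length norm of Lucene's DefaultSimilarity and counts whitespace-separated words as terms; a real analyzer may drop stop words such as "The", and the stored norm has reduced precision, so actual scores can differ from these raw values.

```python
import math


def length_norm(num_terms):
    # Assumption: the 1/sqrt(numTerms) length norm of Lucene's DefaultSimilarity.
    return 1.0 / math.sqrt(num_terms)


# Titles taken from the question in this thread; terms are counted naively
# by splitting on whitespace, with no stop-word removal.
titles = ["The Fantasy Football Guys", "Fantasy Football Bums"]
norms = {title: length_norm(len(title.split())) for title in titles}
```

Under these assumptions the shorter title gets the larger norm, which is a property of the number of terms in the field and not of the letters in the non-matching words.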

Re: Adding new elements to index

2010-07-06 Thread Erick Erickson
first do you have a unique key defined in your schema.xml? If you do, some of those 300 rows could be replacing earlier rows. You say: " if I have 200 rows indexed from postgres and 100 rows from Oracle, the full-import process only indexes 200 documents from oracle, although it shows clearly that
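
A toy illustration of the unique-key point, not Solr code. It assumes, hypothetically, that both databases number their rows from 1, so the key values collide.

```python
# Toy illustration, not Solr code: documents are stored under their unique
# key, so a later document with the same key replaces the earlier one.

def index_rows(index, rows, key="id"):
    for row in rows:
        index[row[key]] = row  # same key -> overwrite, not append
    return index


# Hypothetical data: both sources number their rows starting at 1.
postgres_rows = [{"id": i, "source": "postgres"} for i in range(1, 201)]
oracle_rows = [{"id": i, "source": "oracle"} for i in range(1, 101)]

index = {}
index_rows(index, postgres_rows)
index_rows(index, oracle_rows)

total_docs = len(index)  # 300 rows were fed, but keys 1..100 collided
oracle_docs = sum(1 for doc in index.values() if doc["source"] == "oracle")
```

If colliding keys turn out to be the cause, one option is to make the key distinct per source, for example by prefixing it with the source name.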

Re: Wildcards queries

2010-07-06 Thread Erick Erickson
Still not enough info. Please show: 1> the field type (not field, but field type showing the analyzers for the field you're interested in). 2> example data you've indexed 3> the query you submit 4> the response from the query (especially with &debugQuery=on appended to the query). Otherwise, it's
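
A small sketch of building a query URL with `debugQuery=on` appended, as requested above. The base URL and the example query are assumptions; adjust them to your own installation.

```python
from urllib.parse import urlencode

# Assumed address of a local Solr select handler; adjust to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_debug_query_url(query, base_url=SOLR_SELECT_URL):
    # urlencode escapes characters such as ':' and '*' in the query string.
    params = {"q": query, "debugQuery": "on"}
    return base_url + "?" + urlencode(params)


# Hypothetical wildcard query, used only as an example.
url = build_debug_query_url("title:foo*")
```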

Re: Problem building Nightly Solr

2010-07-06 Thread Ken Krugler
On Jul 6, 2010, at 3:44pm, Chris Hostetter wrote: : Can you try "ant compile example"? : After Lucene/Solr merge, solr ant build needs to compile before example : target. the "compile" target is already in the dependency tree for the "example" target, so that won't change anything. At

Re: Problem building Nightly Solr

2010-07-06 Thread Chris Hostetter
: (this is particularly odd since the nightlies include all the compiled : lucene code as jars in a "lucene-libs/" directory, but the build system : doesn't seem to use that directory ... at least not when compiling solrj). https://issues.apache.org/jira/browse/SOLR-1989 -Hoss

Re: Problem building Nightly Solr

2010-07-06 Thread Chris Hostetter
: Can you try "ant compile example"? : After Lucene/Solr merge, solr ant build needs to compile before example : target. the "compile" target is already in the dependency tree for the "example" target, so that won't change anything. At the moment, the "nightly" snapshots produced by hudson only

Re: Unicode processing - Issue with CharStreamAwareWhitespaceTokenizerFactory

2010-07-06 Thread Jan Høydahl / Cominvent
The Char-filters MUST come before the Tokenizer, due to their nature of processing the character-stream and not the tokens. If you need to apply the accent normalization later in the analysis chain, either use ISOLatin1AccentFilterFactory or help with the implementation of SOLR-1978. -- Jan Hø

Re: document level security: indexing/searching techniques

2010-07-06 Thread Peter Sturge
Yes, you don't want to hard code permissions into your index - it will give you headaches. You might want to have a look at SOLR 1872: https://issues.apache.org/jira/browse/SOLR-1872 . This patch provides doc level security through an external ACL mechanism (in this case, an XML file) controlling

Re: Problem building Nightly Solr

2010-07-06 Thread Koji Sekiguchi
(10/07/07 6:25), darknovan...@gmail.com wrote: I'd like to try the new edismax feature in Solr, so I downloaded the latest nightly (apache-solr-4.0-2010-07-05_08-06-42) and tried running "ant example". It fails with a missing package error. I've pasted in the output below. I tried a nightly fro

Problem building Nightly Solr

2010-07-06 Thread DarkNovaNick
I'd like to try the new edismax feature in Solr, so I downloaded the latest nightly (apache-solr-4.0-2010-07-05_08-06-42) and tried running "ant example". It fails with a missing package error. I've pasted in the output below. I tried a nightly from a couple weeks ago, and it did the same t

Re: Solr results not updating

2010-07-06 Thread Moazzam Khan
That's exactly what it was. I forgot to commit. Thanks, Moazzam On Tue, Jul 6, 2010 at 3:29 PM, Markus Jelsma wrote: > Hi, > > > > If q=*:* doesn't show your insert, then you forgot the commit: > > http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22 > > > > Cheers, >

RE: Solr results not updating

2010-07-06 Thread Markus Jelsma
Hi, If q=*:* doesn't show your insert, then you forgot the commit: http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22 Cheers, -Original message- From: Moazzam Khan Sent: Tue 06-07-2010 22:09 To: solr-user@lucene.apache.org; Subject: Solr results no
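
A minimal sketch of the commit step, using the `<commit/>` XML update message described on the linked wiki page. The update URL is an assumption for a local installation, and the request is only built here, not sent.

```python
from urllib.request import Request

# Assumed address of a local Solr update handler; adjust to your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_commit_request(update_url=SOLR_UPDATE_URL):
    # A request that carries data is sent as a POST by urllib.
    body = b"<commit/>"
    return Request(update_url, data=body,
                   headers={"Content-Type": "text/xml; charset=utf-8"})


request = build_commit_request()
# To actually send it: urllib.request.urlopen(request)
```

After the commit has been sent, a `q=*:*` query is a quick way to check that the new document is visible.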

Re: general debugging techniques?

2010-07-06 Thread Jim Blomo
On Sat, Jul 3, 2010 at 1:10 PM, Lance Norskog wrote: > You don't need to optimize, only commit. OK, thanks for the tip, Lance. I thought the "too many open files" problem was because I wasn't optimizing/merging frequently enough. My understanding of your suggestion is that commit also does merg

Solr results not updating

2010-07-06 Thread Moazzam Khan
Hi, I just successfully inserted a document into Solr but when I search for it, it doesn't show up. Is it a cache issue or something? Is there a way to make sure it was inserted properly? And, it's there? Thanks, Moazzam

Re: using DataImport Dev Console: no errors, but no documents

2010-07-06 Thread Chris Hostetter
: It fetches 5322 rows but doesn't process any documents and doesn't : populate the index. Any suggestions would be appreciated. I don't know much about DIH, but it seems weird that both of your entities say 'rootEntity="false"' looking at the docs, that definitely doesn't seem like what you

Re: DatImportHandler and cron issue

2010-07-06 Thread Chris Hostetter
: What we are seeing is the request is dispatched to solr server,but its not : being processed. you'll have to explain what you mean by "not being processed" ? According to your logs, DIH is in fact working and logging its progress... : 2010-06-14 12:51:01,328 INFO [org.apache.solr.core.SolrC

Re: proximity question

2010-07-06 Thread Ahmet Arslan
> Will quotes do an exact match within > a proximity test? No. > If not, does anybody know how to accomplish this? It is not supported out-of-the-box. You need to plug Lucene's XmlQueryParser or SurroundQueryParser. Similar discussion: http://search-lucene.com/m/PO3iXKRuAv1/

proximity question

2010-07-06 Thread mike anderson
Will quotes do an exact match within a proximity test? For instance body:""mountain goat" grass"~10 should match: "the mountain goat went up the hill to eat grass" but should NOT match "the mountain where the goat lives is covered in grass" If not, does anybody know how to accomplish this?
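
The semantics being asked for can be pinned down with a small plain-Python check. This is not Lucene's slop calculation and not a Solr query; the function names and the "tokens between phrase and term" definition of distance are assumptions made for illustration.

```python
def phrase_positions(tokens, phrase_tokens):
    # Start indexes at which the phrase occurs as adjacent tokens.
    n = len(phrase_tokens)
    return [i for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == phrase_tokens]


def phrase_near_term(text, phrase, term, slop):
    # True if the exact phrase occurs with at most `slop` tokens between
    # it and the term, on either side.
    tokens = text.lower().split()
    phrase_tokens = phrase.lower().split()
    term_positions = [i for i, t in enumerate(tokens) if t == term.lower()]
    for start in phrase_positions(tokens, phrase_tokens):
        end = start + len(phrase_tokens) - 1
        for pos in term_positions:
            # Number of tokens strictly between the phrase and the term.
            gap = pos - end - 1 if pos > end else start - pos - 1
            if 0 <= gap <= slop:
                return True
    return False


should_match = phrase_near_term(
    "the mountain goat went up the hill to eat grass",
    "mountain goat", "grass", 10)
should_not_match = phrase_near_term(
    "the mountain where the goat lives is covered in grass",
    "mountain goat", "grass", 10)
```

The second sentence fails because "mountain" and "goat" are not adjacent, which is exactly the distinction the question is after.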

Re: Deleting Terms:

2010-07-06 Thread Kumaravel Kandasami
FYI - optimise() operations solved the issue. Kumar_/|\_ www.saisk.com ku...@saisk.com "making a profound difference with knowledge and creativity..." On Tue, Jul 6, 2010 at 11:47 AM, Kumaravel Kandasami < kumaravel.kandas...@gmail.com> wrote: > BTW, Using SOLRJ - javabin api. > > > > Kuma

Re: Deleting Terms:

2010-07-06 Thread Kumaravel Kandasami
BTW, Using SOLRJ - javabin api. Kumar_/|\_ www.saisk.com ku...@saisk.com "making a profound difference with knowledge and creativity..." On Tue, Jul 6, 2010 at 11:43 AM, Kumaravel Kandasami < kumaravel.kandas...@gmail.com> wrote: > Hi, > >How to delete the terms associated with the doc

Deleting Terms:

2010-07-06 Thread Kumaravel Kandasami
Hi, How to delete the terms associated with the document? Current scenario: We are deleting documents based on a query ('field:value'). The documents are getting deleted, however, the old terms associated to the field are displayed in the admin. How do we make SOLR re-evaluate and update

Re: document level security: indexing/searching techniques

2010-07-06 Thread Ken Krugler
On Jul 6, 2010, at 8:27am, osocurious2 wrote: Someone else was recently asking a similar question (or maybe it was you but worded differently :) ). Putting user level security at a document level seems like a recipe for pain. Solr/Lucene don't do frequent update well...and being highly

Relevancy and non-matching words

2010-07-06 Thread dbashford
Is there some sort of threshold that I can tweak which sets how many letters in non-matching words makes a result more or less relevant? Searching on title, q=fantasy football, and I get this: {"title":"The Fantasy Football Guys", "score":2.8387074}, {"title":"Fantasy Football Bums", "score":2.8

Re: document level security: indexing/searching techniques

2010-07-06 Thread osocurious2
Someone else was recently asking a similar question (or maybe it was you but worded differently :) ). Putting user level security at a document level seems like a recipe for pain. Solr/Lucene don't do frequent update well...and being highly optimized for query, I don't blame them. Is there any wa

Adding new elements to index

2010-07-06 Thread Xavier Rodriguez
Hi, I have a SOLR installed on a Tomcat application server. This solr instance has some data indexed from a postgres database. Now I need to add some entities from an Oracle database. When I run the full-import command, the documents indexed are only documents from postgres. In fact, if I have 200

Re: Wildcards queries

2010-07-06 Thread RL
Hi, a bit more information would help to identify the problem in your case. But in general these facts come to mind: - leading wildcard queries are not available in solr (without extending the QueryParser). - no text analysing will be performed on the search word when using wildcards
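
The second point above can be shown with a toy example in plain Python. The `analyze` function is a stand-in for an index-time analyzer (whitespace split plus lowercasing) and is an assumption, not the analyzer of any particular field type.

```python
from fnmatch import fnmatchcase


def analyze(text):
    # Stand-in for an index-time analyzer: split on whitespace and lowercase.
    return text.lower().split()


indexed_terms = analyze("Fantasy Football Guys")


def wildcard_search(pattern, terms):
    # The pattern is used exactly as typed: no lowercasing, no other analysis.
    return [t for t in terms if fnmatchcase(t, pattern)]


hits_as_typed = wildcard_search("Foot*", indexed_terms)    # capital F
hits_lowercased = wildcard_search("foot*", indexed_terms)  # matches index form
```

Because the indexed terms were lowercased and the wildcard pattern was not, only the lowercase pattern finds anything.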

document level security: indexing/searching techniques

2010-07-06 Thread RL
I've a question about indexing/searching techniques in relation to document level security. I'm planning a system that has, let's say, about 1 million search documents with about 100 search fields each. Most of them unstored to keep the index size low, because some of them can contain some kilobytes

Re: Wildcards queries

2010-07-06 Thread Robert Naczinski
Hi, thanks for the reply. I am an absolute beginner with Solr. As a starting point, I have taken the configuration from {solr.home}example/solr . In solrconfig.xml all the query parsers are commented out ;-( Where can I find the QueryParser? Javadoc, Wiki? Regards, Robert 2010/7/6 Mark Miller : > On

Re: solr with hadoop

2010-07-06 Thread Jason Rutherglen
> If you do distributed indexing correctly, what about updating the documents > and what about replicating them correctly? Yes, you can do you and it'll work great. On Mon, Jul 5, 2010 at 7:42 AM, MitchK wrote: > > I need to revive this discussion... > > If you do distributed indexing correctly,

Re: Wildcards queries

2010-07-06 Thread Mark Miller
On 7/6/10 8:53 AM, Robert Naczinski wrote: > Hi, > > we use in our application EmbeddedSolrServer. Great! > Everything went fine. Excellent! > Now I want use wildcards queries. Cool! > > It does not work. Bummer! > Must be adapted for the schema.xml? Not necessarily... > > Can someon

Wildcards queries

2010-07-06 Thread Robert Naczinski
Hi, we use EmbeddedSolrServer in our application. Everything went fine. Now I want to use wildcard queries. It does not work. Does the schema.xml need to be adapted? Can someone help me? In the wiki I find nothing. I need a simple example or a link. Regards, Robert

Re: Data Import Handler Rich Format Documents

2010-07-06 Thread Tod
On 6/28/2010 8:28 AM, Alexey Serba wrote: Ok, I'm trying to integrate the TikaEntityProcessor as suggested. I'm using Solr Version: 1.4.0 and getting the following error: java.lang.ClassNotFoundException: Unable to load BinURLDataSource or org.apache.solr.handler.dataimport.BinURLDataSource It

Re: Duplicate items in distributed search

2010-07-06 Thread Erik Hatcher
On Jul 4, 2010, at 5:10 PM, Andrew Clegg wrote: Mark Miller-3 wrote: On 7/4/10 12:49 PM, Andrew Clegg wrote: I thought so but thanks for clarifying. Maybe a wording change on the wiki Sounds like a good idea - go ahead and make the change if you'd like. That page seems to be marked

Re: problem with formulating a negative query

2010-07-06 Thread Sascha Szott
Hi, Chris Hostetter wrote: AND, OR, and NOT are just syntactic-sugar for modifying the MUST, MUST_NOT, and SHOULD. The default op of "OR" only affects the first clause of your query (R) because it doesn't have any modifiers -- Thanks for pointing that out! -Sascha the second clause has that