Re: Solr UIMA with KEA

2012-11-23 Thread Tommaso Teofili
The AlchemyAPI service is not mandatory (it's there just as an example and can be safely removed); you can use whatever service you want as long as it's wrapped by a UIMA AnalysisEngine and you specify its descriptor. See the following updateChain example configuration: /path/to/KEAdescritpor.x

Re: From Solr3.1 to SolrCloud

2012-11-23 Thread roySolr
Thanks Tomás for the information so far. You said: You can effectively run with only one zk instance, the problem with this is that if that instance dies, then your whole cluster will go down. When the cluster goes down, can I still send queries to the Solr instances? We have a lb that's choose a

RE: Performance improvement for solr faceting on large index

2012-11-23 Thread Pravin Agrawal
Thanks Yuval and Otis for the reply. Yuval: I tried different combinations of facet.method (fc and enum) and filterCache size, but there was not much improvement in the processing time. Otis: We plan to move this processing out of Solr in the future, but it will be a large code change at this p

Re: SolrCloud and exernal file fields

2012-11-23 Thread Martin Koch
The short answer is no; the number was chosen in an attempt to get as many cores working in parallel to complete the search faster, but I realize that there is an overhead incurred by distribution and merging the results. We've now gone to 8 shards and will be monitoring performance. /Martin On

Re: From Solr3.1 to SolrCloud

2012-11-23 Thread Tomás Fernández Löbbe
I think that's correct. Queries to the existing nodes will still work with no ZK. On Fri, Nov 23, 2012 at 7:16 AM, roySolr wrote: > Thanks Tomás for the information so far. > > You said: > You can effectively run with only one zk instance, the problem with this is > that if that instance dies,

Solr replication

2012-11-23 Thread jacques.cortes
I have 2 Solr servers: 1 master and 1 slave. The master server has a VIP mounted by heartbeat, which can switch over to the other server. What do you think of the idea of putting the same config file on both servers, with the master's replication handler URL pointing at the VIP?

SOLR4 cluster - strange CPU spike on slave

2012-11-23 Thread John Nielsen
Hi all, We are seeing a strange CPU spike on one of our solr4 servers which we are unable to explain. The spike, which only lasts for a couple of minutes, sends the disks racing. This happens a few times a day. This is what the load looks like: 2012.Nov.14 13:37:17 2.77 2012.Nov.14 13:

Re: copyField multiValued duplicates

2012-11-23 Thread Erick Erickson
Unless you stored all the original fields, I think you're stuck with re-indexing all your docs. Best, Erick On Mon, Nov 19, 2012 at 12:21 PM, Ravi Solr wrote: > Hello, > I have a couple of questions. I need an easy way to clean up a gaffe > with copyFields (close to a million docs). Is

Re: inconsistent number of results returned in solr cloud

2012-11-23 Thread Erick Erickson
Dave: I should have asked this first. What version of Solr are you using? I'm not sure whether it was fixed in BETA or not (certainly is in the 4.0 GA release). There was a problem with adding a doclist via solrj, here's one related JIRA, although it wasn't the main fix: https://issues.apache.org/j

Re: SOLR4 cluster - strange CPU spike on slave

2012-11-23 Thread Otis Gospodnetic
Strange indeed. What about query load/rates during that time? What about GC? And does cache hit rate drop? Otis -- SOLR Performance Monitoring - http://sematext.com/spm On Nov 23, 2012 2:45 AM, "John Nielsen" wrote: > Hi all, > > We are seeing a strange CPU spike on one of our solr4 servers which

Re: Apply clustering to field names?

2012-11-23 Thread Erick Erickson
Per: 1> relevancy sorting on field names: First, you have to define what that means ... Relevant to your query terms? Relevant by the count of field names in a particular document? Under any circumstances, this seems like it's heading towards some kind of analytics. Take a look at FunctionQueries,

Re: SolrCloud across datacenter

2012-11-23 Thread Erick Erickson
I'd be pretty cautious trying this, although I confess I haven't done it personally. The problem here is that each node has to be able to (and will) talk to each leader and vice-versa. And the various replicas will have to talk to every other replica. If the pipe connecting your data centers isn't ver

MailEntityProcessor in Solr 4

2012-11-23 Thread Robert Bernhardt
Hi Guys, I'm trying to use the MailEntityProcessor with Solr 4 and have a couple of questions about it. I already configured an IMAP Account to fetch mails from, so I'm able to index some mails. Questions/Issues: 1.) I set the fetchSize to 1 000 000 as the default 32k was not sufficient at all, but

Re: SynonymFilterFactory breaking WordDelimiterFilterFactory output

2012-11-23 Thread Erick Erickson
Best advice here is to look hard at admin/analysis and see. But a couple of notes: 1> it's usually unnecessary to include the exact same synonyms in both query and index time chains. Index-time is preferred. 2> putting lowercasefilter in front of worddelimiterfilter is going to break wdff _if_ yo

Re: Copying few field using copyField to non multiValued field

2012-11-23 Thread Erick Erickson
Barry: This is just an artifact of the output. If you set positionIncrementGap to 1 (or maybe 0, but I think 1) then phrase searches will work just fine across multiple entries. With the proper setting for positionIncrementGap, there is really no difference between multiValued and non-multiValued f

'CException' with message 'Solr error: "0" Status: Communication Error'

2012-11-23 Thread John Kim
I am running a batch full-indexing process, but it seems to get disconnected after the first batch. I am using the php-solr-client. I am wondering if this thread is relevant to me: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201001.mbox/%3cd8b8d0161001122041w230b34efxdc15f17a8c8c4...@mail.gmail.

matched terms are not highlighted

2012-11-23 Thread Dzmitry Petrushenka
Hi All! Has anyone ever had the following problem... If you use FastVectorHighlighter and a boundaryScanner with hl.bs.type set to SENTENCE, there are situations when not all matched terms are highlighted in fragments. I.e. 1. FastVectorHighlighter generates FieldFragList with info on what term

Re: User context based search in apache solr

2012-11-23 Thread sagarzond
Hi Erick, thanks for the reply. Our application has a product table with many fields, and we expose all of these fields during search. If we use a de-normalized structure, there will be a lot of redundant data, which may result in: 1. more memory being required. 2. d

Solr Near Realtime with denormalized Data

2012-11-23 Thread zbindigonzales
Hi there. In our company we use Apache Solr 4 to index data from the database via the DataImportHandler. The data we are indexing is a denormalized table (Patient, Visit, Study, Image). One requirement is to be near realtime. For that we use soft commits every second. The index size is about

Re: SolrCloud and exernal file fields

2012-11-23 Thread Simone Gianni
2012/11/22 Martin Koch > IMO it would be ideal if the lucene/solr community could come up with a > good way of updating fields in a document without reindexing. This could be > by linking to some external data store, or in the lucene/solr internals. If > it would make things easier, a good first

Re: SolrCloud and exernal file fields

2012-11-23 Thread Simone Gianni
Posted, see it here http://lucene.472066.n3.nabble.com/Possible-sharded-and-replicated-replacement-for-ExternalFileFields-in-SolrCloud-td4022108.html Simone 2012/11/23 Simone Gianni > 2012/11/22 Martin Koch > >> IMO it would be ideal if the lucene/solr community could come up with a >> good w

SPAN queries in solr

2012-11-23 Thread Anirudha Jadhav
What is the best way to use span queries in Solr? I see https://issues.apache.org/jira/browse/SOLR-839 which enables the XML Query parser that supports span queries. -- Anirudha P. Jadhav

Re: SPAN queries in solr

2012-11-23 Thread simon
Take a look at SOLR-2703, which was committed for 4.0. It provides a Solr wrapper for the surround query parser, which supports span queries. On Fri, Nov 23, 2012 at 3:38 PM, Anirudha Jadhav wrote: > What is the best way to use span queries in solr ? > > I see https://issues.apache.org/jira/brow

Re: SPAN queries in solr

2012-11-23 Thread Anirudha Jadhav
Can this be made to work with Solr 3.5? I will give it a try. Thanks On Nov 23, 2012, at 17:28, simon wrote: > take a look at SOLR-2703, which was committed for 4.0. It provides a Solr > wrapper for the surround query parser, which supports span queries. > > On Fri, Nov 23, 2012 at 3:38 PM

Re: SolrCloud and exernal file fields

2012-11-23 Thread Gopal Patwa
Hi, I am also very much interested in this, since we use Solr 4 with NRT where we update the index every second, but most of the time it updates only stored fields. If Solr/Lucene could provide an external datastore without re-indexing, even for stored fields only, it would be very beneficial for frequent update

Problem with Solr 3.6.1 extracting ODT content using SolrCell's ExtractingRequestHandler

2012-11-23 Thread Brett Melbourne
Hi all, I am encountering a problem where Solr 3.6.1 is not able to extract the text content from ODT (OpenDocument Text) files submitted to the ExtractingRequestHandler. I can reproduce this issue against the example schema running with Jetty. Executing a simple index request (based on the

Re: SynonymFilterFactory breaking WordDelimiterFilterFactory output

2012-11-23 Thread Yonik Seeley
Sounds like perhaps the SynonymFilter is losing the positionIncrement of 0 (which makes the first two tokens overlap)? You could perhaps verify with the analysis debugging on the admin page. -Yonik http://lucidworks.com On Tue, Nov 20, 2012 at 10:55 PM, Chris Book wrote: > Hello, I've recently u

Re: Ignore tf/idf at index time

2012-11-23 Thread Jack Krupansky
"the boosting at index time" What index-time boosting are you referring to? I mean, tf and idf are used to calculate scores at query time. There is a document boost at index time, but that is independent of tf and idf. -- Jack Krupansky -Original Message- From: flahti Sent: Frida

Re: User context based search in apache solr

2012-11-23 Thread sagarzond
Let me re-phrase. In our application, de-normalizing "will" result in: 1. more memory being required. 2. degraded search performance (CPU and response time). Let me give an example: our application has a product table with 1 million entries, and users are increasing exponentially.