Solr & JVM performance issue after 2 days

2010-12-06 Thread Hamid Vahedi
Hi, I am using multi-core Tomcat on 2 servers, with 3 languages per server. I am adding documents to Solr at up to 200 docs/sec. When the update process starts, everything is fine (update performance is at most 200 ms/doc, with about 800 MB of memory used and minimal CPU usage). After 15-17 hours it's bec

Re: only index synonyms

2010-12-06 Thread Tom Hill
Hi Lee, On Mon, Dec 6, 2010 at 10:56 PM, lee carroll wrote: > Hi Erik Nope, Erik is the other one. :-) > thanks for the reply. I only want the synonyms to be in the index > how can I achieve that ? Sorry probably missing something obvious in the > docs Exactly what he said, use the => syntax.

Re: only index synonyms

2010-12-06 Thread lee carroll
Hi Erik thanks for the reply. I only want the synonyms to be in the index how can I achieve that ? Sorry probably missing something obvious in the docs On 7 Dec 2010 01:28, "Erick Erickson" wrote: > See: > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory > > wi

how to config DataImport Scheduling

2010-12-06 Thread Hamid Vahedi
Hi, I want to configure DataImport scheduling, but I don't know how to do it. I have just created and compiled the Scheduling classes with NetBeans, and now have Scheduling.Jar. Q: how do I set it up on Tomcat or Solr? (I am using Tomcat 6 on Windows 2008.) Thanks in advance.

Re: Solr -File Based Spell Check and Read .cfs file generated

2010-12-06 Thread rajini maski
Does anyone know about this? How do I extract the dictionary generated by default? How do I read the .cfs files generated in the index folder? Awaiting reply. On Mon, Dec 6, 2010 at 7:54 PM, rajini maski wrote: > Yeah.. I wanna use this Spell-check only.. I want to create myself the > dictionary.. And

Re: Problem with DIH delta-import delete.

2010-12-06 Thread Matti Oinas
Thanks Koji. Problem seems to be that template transformer is not used when delete is performed. ... Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed ModifiedRowKey for Entity: entry rows obtained : 0 Dec 7, 2010 7:19:43 AM org.apache.solr.handler.

Re: Out of memory error

2010-12-06 Thread Fuad Efendi
Batch size "-1"??? Strange but could be a problem. Note also you can't provide parameters to default startup.sh command; you should modify setenv.sh instead --Original Message-- From: sivaprasad To: solr-user@lucene.apache.org ReplyTo: solr-user@lucene.apache.org Subject: Out of memory

Out of memory error

2010-12-06 Thread sivaprasad
Hi, when I try to import data using DIH, I get an out-of-memory error. Below is my configuration. Database: MySQL. OS: Windows. Number of documents: 15525532. In Db-config.xml I set the batch size to "-1". The Solr server is running on a Linux machine with Tomcat. I set tomcat a

Solr Newbie - need a point in the right direction

2010-12-06 Thread Mark
Hi, First time poster here - I'm not entirely sure where I need to look for this information. What I'm trying to do is extract some (presumably) structured information from non-uniform data (eg, prices from a nutch crawl) that needs to show in search queries, and I've come up against a wall. I'v

Re: Taxonomy and Faceting

2010-12-06 Thread Lance Norskog
That is correct. Solr is a search engine, not a text analysis engine. There are a few open source text analysis systems: Weka, OpenNLP, UIMA. Someone is working on integrating UIMA with Solr: https://issues.apache.org/jira/browse/SOLR-2129 But you should generally assume you will have a batch pro

Re: FastVectorHighlighter ignoring fragmenter parameter . . .

2010-12-06 Thread Koji Sekiguchi
(10/12/06 23:52), CRB wrote: Koji, Thank you for the reply. Being something of a novice with Solr, I would be grateful if you could clarify my next steps. I infer from your reply that there is no current implementation yet contributed for the FVH similar to the regex fragmenter. Thus I need

Re: only index synonyms

2010-12-06 Thread Erick Erickson
See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory with the => syntax, I think that's what you're looking for Best Erick On Mon, Dec 6, 2010 at 6:34 PM, lee carroll wrote: > Hi Can the following usecase be achieved. > > value to be analysed at index time

Re: high CPU usage and SelectChannelConnector threads used a lot

2010-12-06 Thread Kent Fitch
Hi John, sounds like this bug in NIO: http://jira.codehaus.org/browse/JETTY-937 http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6403933 I think recent versions of jetty work around this bug, or maybe try the non-NIO socket connector Kent On Tue, Dec 7, 2010 at 9:10 AM, John Russell wrote:

only index synonyms

2010-12-06 Thread lee carroll
Hi, can the following use case be achieved? Value to be analysed at index time: "this is a pretty line of text". Synonym list: pretty => scenic, text => words. Value placed in the index: "scenic words". That is to say, only the matching synonyms. Basically I want to produce a normalised set of p
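
The desired result in this post can be illustrated outside Solr. The following is a minimal Python sketch, not Solr's SynonymFilterFactory: the function name `keep_only_synonyms` and the whitespace tokenizer are assumptions made for illustration. It shows only the requested behaviour, where tokens without a mapping are dropped and matching tokens are replaced.

```python
def keep_only_synonyms(text, synonyms):
    """Return only the mapped terms for tokens found in the synonym map.

    Tokens with no mapping are dropped. Tokenization is a naive
    lower-cased whitespace split (an assumption, not Solr's analysis chain).
    """
    tokens = text.lower().split()
    return " ".join(synonyms[token] for token in tokens if token in synonyms)


# Synonym list from the post: pretty => scenic, text => words
synonyms = {"pretty": "scenic", "text": "words"}
result = keep_only_synonyms("this is a pretty line of text", synonyms)
print(result)  # scenic words
```

In Solr itself, the `=>` mapping replaces matched tokens; dropping the unmatched tokens would likely need a further filter step in the analysis chain.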

Re: DIH - rdbms to index confusion

2010-12-06 Thread kmf
I'm not understanding this response. My main table does have a one to many relationship with the other tables. What should I be anticipating/wanting for each document if I want to return to the user the values while allowing them to search on the other terms? Thanks.

high CPU usage and SelectChannelConnector threads used a lot

2010-12-06 Thread John Russell
Hi, I'm using solr and have been load testing it for around 4 days. We use the solrj client to communicate with a separate jetty based solr process on the same box. After a few days solr's CPU% is now consistently at or above 100% (multiple processors available) and the application using it is mo

Re: Syncing 'delta-import' with 'select' query

2010-12-06 Thread Juan Manuel Alvarez
Thanks for all the help! It is really appreciated. For now, I can afford the parallel requests problem, but when I put synchronous=true in the delta import, the call still returns with outdated items. Examining the log, it seems that the commit operation is being executed after the operation retur

Re: Taxonomy and Faceting

2010-12-06 Thread webdev1977
Thanks for the quick response! I was thinking more about the idea of having both structured and unstructured data coming into a system to be indexed/searched. I would like these documents to be processed by some sort of entity/keyword/semantic processing. I have a well defined taxonomy for my

Re: Taxonomy and Faceting

2010-12-06 Thread Peter Karich
I'm unsure but maybe you mean something like clustering? Then carrot^2 can do this (at index time I think): http://search.carrot2.org/stable/search?query=jetwick&view=visu (There is a plugin for solr) Or do you already know the categories of your docs. E.g. you already have a category tree and

Re: Syncing 'delta-import' with 'select' query

2010-12-06 Thread Alexey Serba
> When you say "two parallel requests from two users to single DIH > request handler", what do you mean by "request handler"? I mean DIH. > Are you > referring to the HTTP request? Would that mean that if I make the > request from different HTTP sessions it would work? No. It means that when you h

Re: Syncing 'delta-import' with 'select' query

2010-12-06 Thread Juan Manuel Alvarez
Alex: Thanks for the quick reply. When you say "two parallel requests from two users to single DIH request handler", what do you mean by "request handler"? Are you referring to the HTTP request? Would that mean that if I make the request from different HTTP sessions it would work? Cheers! Juan M.

Re: DIH - rdbms to index confusion

2010-12-06 Thread Alexey Serba
> I have a table that contains the data values I'm wanting to return when > someone makes a search.  This table has, in addition to the data values, 3 > id's (FKs) pointing to the data/info that I'm wanting the users to be able > to search on (while also returning the data values). > > The general

Re: FastVectorHighlighter ignoring fragmenter parameter . . .

2010-12-06 Thread CRB
Koji, Thank you for the reply. Being something of a novice with Solr, I would be grateful if you could clarify my next steps. I infer from your reply that there is no current implementation yet contributed for the FVH similar to the regex fragmenter. Thus I need to write my own custom exte

Using Saxon 9 as a response writer with Solr 3.1 . . ?

2010-12-06 Thread CRB
Has anyone been able to get Saxon 9 working with Solr 3.1? I was following the wiki page (http://wiki.apache.org/solr/XsltResponseWriter), placing all the saxon-*.jars in Jetty's lib/ext folder and starting with java -Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImp

Re: Question about Solr Fieldtypes, Chaining of Tokenizers

2010-12-06 Thread Matthew Hall
Yes, that's my conclusion as well Grant. As for the example output: The symposium of Tg(RX3fg+and) gene studies Should end up tokenizing to: symposium tg the rx3fg and gene studi Assuming I guessed right on the stemming. Anyhow, thanks for the confirmation guys. Matt On 12/4/2010 8:18 PM,

Re: Stored field value modification

2010-12-06 Thread Emmanuel Bégué
2010/12/6 Ahmet Arslan : > > If you are already using DIH, > http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer can do > what you want. Indeed it can. Many thanks.

DIH - rdbms to index confusion

2010-12-06 Thread kmf
I'm new to solr (and indexing in general) and am having a hard time making the transition from rdbms to indexing in terms of the DIH/data-config.xml file. I've successfully created a working index (so far) for the simple queries in my db, but I'm struggling to add a more "complex" query. When I

Dynamically filtering results based on score

2010-12-06 Thread Bryan Barkley
I've seen references to score filtering in the list archives with frange being the suggested solution, but I have a slightly different problem that I don't think frange will solve. I basically want to drop a portion of the results based on their score in relation to the other scores in the result s
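
One way to read "drop a portion of the results based on their score in relation to the other scores" is a client-side cut-off relative to the best score. The sketch below assumes that reading; the post is truncated, so the exact rule is unknown, and the function name, the `min_fraction` parameter and the sample documents are all hypothetical.

```python
def filter_by_relative_score(docs, min_fraction=0.5):
    """Keep documents whose score is at least min_fraction of the top score.

    docs is a list of dicts with a numeric "score" key, as a client might
    build from a search response that includes the score field.
    """
    if not docs:
        return []
    top_score = max(doc["score"] for doc in docs)
    return [doc for doc in docs if doc["score"] >= min_fraction * top_score]


# Hypothetical result set, already sorted by score.
docs = [
    {"id": "a", "score": 2.0},
    {"id": "b", "score": 1.2},
    {"id": "c", "score": 0.4},
]
kept_ids = [doc["id"] for doc in filter_by_relative_score(docs, min_fraction=0.5)]
print(kept_ids)
```

Because the threshold depends on the top score of each result set, it has to be applied after the results are known, which is why a fixed frange bound does not cover this case.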

Re: Index version on slave nodes

2010-12-06 Thread Xin Li
I think this is expected behavior. You have to issue the "details" command to get the real indexversion for slave machines. Thanks, Xin On Mon, Dec 6, 2010 at 11:26 AM, Markus Jelsma wrote: > Hi, > > The indexversion command in the replicationHandler on slave nodes returns 0 > for indexversion a

Re: Stored field value modification

2010-12-06 Thread Ahmet Arslan
> - I have zero control over what is stored in the database > - using the Solr XML update protocol i could probably > transform the > data before sending it > - ... but I'd much rather continue using DataImportHandler > to access > the database If you are already using DIH, http://wiki.apache.or

Re: How to get all the search results?

2010-12-06 Thread Savvas-Andreas Moysidis
Ahhh, right. In dismax, you pre-define the fields that will be searched; is that right? Is it also true that the query is parsed and all special characters escaped? On 6 December 2010 16:25, Peter Karich wrote: > for dismax just pass an empty query all q= or none at all > > > Hello, >> >>

Taxonomy and Faceting

2010-12-06 Thread webdev1977
I have been digging through the user lists for Solr and Nutch, as well as reading lots of blogs, etc. I have yet to find a clear answer (maybe there is none ) I am trying to find the best way ahead for choosing a technology that will allow the ability to use a large taxonomy for classifying stru

Re: Stored field value modification

2010-12-06 Thread Markus Jelsma
Hi, You can create a custom update request processor [1] to strip unwanted input as it is about to enter the index. [1]: http://wiki.apache.org/solr/UpdateRequestProcessor Cheers, On Monday 06 December 2010 17:36:09 Emmanuel Bégué wrote: > Hello, > > Is it possible to manipulate the value of

Stored field value modification

2010-12-06 Thread Emmanuel Bégué
Hello, Is it possible to manipulate the value of a field before it is stored? I'm indexing a database where some fields contain raw HTML, including named character entities. Using solr.HTMLStripCharFilterFactory on the index analyzer results in this HTML being correctly stripped, and named chara

Index version on slave nodes

2010-12-06 Thread Markus Jelsma
Hi, The indexversion command in the replicationHandler on slave nodes returns 0 for indexversion and generation while the details command does return the correct information. I haven't found an existing ticket on this one although https://issues.apache.org/jira/browse/SOLR-1573 has similarities

Re: How to get all the search results?

2010-12-06 Thread Peter Karich
for dismax just pass an empty query all q= or none at all Hello, shouldn't that query syntax be *:* ? Regards, -- Savvas. On 6 December 2010 16:10, Solr User wrote: Hi, First off thanks to the group for guiding me to move from default search handler to dismax. I have a question related

Re: How to get all the search results?

2010-12-06 Thread Shawn Heisey
With dismax, I didn't get any results with *:*. I did the query with these options (q is empty) and got the full rowcount: q=&rows=0&qt=dismax I have q.alt defined in my dismax handler as *:*, don't know if that is required or not. Shawn On 12/6/2010 9:17 AM, Savvas-Andreas Moysidis wrote
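
The parameters quoted in this reply can be assembled with the standard library. This is a small sketch: the base URL is an assumed local Solr address, not one taken from the thread, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def dismax_match_all_url(base_url, rows=0):
    """Build a dismax request with an empty q, as described in the reply.

    With q left empty, the handler falls back to its q.alt setting
    (the poster has q.alt defined as *:*).
    """
    params = {"q": "", "rows": rows, "qt": "dismax"}
    return f"{base_url}?{urlencode(params)}"


# Assumed base URL for illustration only.
url = dismax_match_all_url("http://localhost:8983/solr/select")
print(url)  # http://localhost:8983/solr/select?q=&rows=0&qt=dismax
```

With rows=0 the response carries only the total hit count, which matches the "full rowcount" mentioned above.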

Re: How to get all the search results?

2010-12-06 Thread Savvas-Andreas Moysidis
Hello, shouldn't that query syntax be *:* ? Regards, -- Savvas. On 6 December 2010 16:10, Solr User wrote: > Hi, > > First off thanks to the group for guiding me to move from default search > handler to dismax. > > I have a question related to getting all the search results. In the past > with

Re: Syncing 'delta-import' with 'select' query

2010-12-06 Thread Alexey Serba
Hey Juan, It seems that DataImportHandler is not the right tool for your scenario and you'd better use the Solr XML update protocol. * http://wiki.apache.org/solr/UpdateXmlMessages You can still work around your outdated GUI view problem by calling DIH synchronously, by adding synchronous=true to you
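
A synchronous DIH request of the kind suggested here could be built as follows. This is a sketch only: the handler path `/solr/dataimport` and the host are assumptions about the deployment, and the function name is hypothetical. Only `command` and `synchronous=true` come from the thread.

```python
from urllib.parse import urlencode


def dataimport_url(base_url, command="delta-import", synchronous=True):
    """Build a DataImportHandler request URL.

    When synchronous is True, synchronous=true is added so the HTTP call
    is meant to return only after the import has finished.
    """
    params = {"command": command}
    if synchronous:
        params["synchronous"] = "true"
    return f"{base_url}?{urlencode(params)}"


# Assumed handler location for illustration only.
url = dataimport_url("http://localhost:8983/solr/dataimport")
print(url)
```

A client would issue this request, wait for it to return, and only then run the select query, keeping in mind that a synchronous import can hold the connection open for a long time.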

How to get all the search results?

2010-12-06 Thread Solr User
Hi, First off thanks to the group for guiding me to move from default search handler to dismax. I have a question related to getting all the search results. In the past with the default search handler I was getting all the search results (8000) if I pass q=* as search string but with dismax I was

Re: dataimports response returns before done?

2010-12-06 Thread Alexey Serba
> After issuing a dataimport, I've noticed Solr returns a response prior to > finishing the import. Is this correct? Is there any way I can make Solr not > return until it finishes? Yes, you can add synchronous=true to your request. But be aware that it could take a long time and you can see ht

Re: Query performance very slow even after autowarming

2010-12-06 Thread Alexey Serba
* Do you use EdgeNGramFilter in the index analyzer only, or do you also use it on the query side? * What if you create an additional field first_letter (string) and put the first character/characters (multivalued?) there in your external processing code. And then during search you can filter all documents t
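
The first_letter idea could look like this in external processing code. This is a minimal sketch under assumptions: the source field name `name`, the helper name `first_letters` and the lower-casing are all hypothetical choices, and only the `first_letter` field name comes from the suggestion above.

```python
def first_letters(value, prefix_len=1):
    """Return the distinct leading characters of each word, in first-seen order.

    Values are lower-cased so that a filter on the field can be
    case-insensitive (an assumption, adjust to the schema in use).
    """
    prefixes = []
    for word in value.split():
        prefix = word[:prefix_len].lower()
        if prefix and prefix not in prefixes:
            prefixes.append(prefix)
    return prefixes


# Hypothetical document being prepared before it is sent to Solr.
doc = {"name": "Apache Solr Search"}
doc["first_letter"] = first_letters(doc["name"])
print(doc["first_letter"])  # ['a', 's']
```

Storing the prefixes in a plain string field lets the search narrow the candidate set with a cheap filter before the more expensive n-gram matching runs.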

Re: Solr -File Based Spell Check

2010-12-06 Thread rajini maski
Yeah, I want to use this spell check only. I want to create the dictionary myself and give it as input to Solr, because my indexes also contain misspelled content, so I want Solr to refer to this file and not an autogenerated one. How do I get this done? I will try the spell check as suggested by micha

Re: Solr -File Based Spell Check

2010-12-06 Thread Erick Erickson
Are you sure you want spellcheck/autosuggest? Because what you're talking about almost sounds like synonyms. Best Erick On Mon, Dec 6, 2010 at 1:37 AM, rajini maski wrote: > How does the solr file based spell check work? > > How do we need to enter data in the spelling.txt...I am not clear

Re: Solr -File Based Spell Check

2010-12-06 Thread ramzesua
Hi. As far as I know, for file-based spellcheck you need to: - configure your spellcheck search component in solrconfig.xml, for example: solr.FileBasedSpellChecker file spellings.txt UTF-8 ./spellcheckerFile - then you must get or form spell
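
For the hand-made dictionary discussed in this thread, the file can be produced by a short script. This sketch assumes the usual plain-text layout of one word per line in UTF-8; the function names are hypothetical, and `spellings.txt` is the file name quoted above.

```python
from pathlib import Path


def format_spelling_dictionary(words):
    """Return dictionary text with one unique, trimmed word per line."""
    unique_words = sorted({word.strip() for word in words if word.strip()})
    return "".join(word + "\n" for word in unique_words)


def write_spelling_dictionary(words, path="spellings.txt"):
    """Write the dictionary as UTF-8 text and return the path written."""
    target = Path(path)
    target.write_text(format_spelling_dictionary(words), encoding="utf-8")
    return target


# Example input containing a duplicate, padding and an empty entry.
dictionary_text = format_spelling_dictionary(["solr", " lucene ", "solr", ""])
print(dictionary_text, end="")
```

Calling `write_spelling_dictionary(correct_words)` with a curated word list would then give a file that contains only correctly spelled terms, independent of whatever misspellings exist in the index.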

Re: Dataimport: Could not load driver: com.mysql.jdbc.Driver

2010-12-06 Thread stockii
Maybe encoding!?