Re: Solr vs ElasticSearch

2011-05-31 Thread Jason Rutherglen
Thanks Shashi, this is oddly coincidental with another issue being put into Solr (SOLR-2193) to help solve some of the NRT issues, the timing is impeccable. At a base however Solr uses Lucene, as does ES. I think the main advantage of ES is the auto-sharding etc. I think it uses a gossip protoco

Re: Anyway to know changed documents?

2011-05-31 Thread Paul Libbrecht
I think you should look at the indextime field. There are examples in the wiki. paul On 1 June 2011 at 08:07, 京东 wrote: > Hi everyone, > If I have two servers, their indexes should be synchronized. I changed A's > index by sending document objects over HTTP. Is there any config or some plug-ins to

Anyway to know changed documents?

2011-05-31 Thread 京东
Hi everyone, If I have two servers, their indexes should be synchronized. I changed A's index by sending document objects over HTTP. Is there any config or plug-in that lets Solr know which objects have changed and push them to B? Any suggestion will be appreciated. Thanks :)

Re: Solr vs ElasticSearch

2011-05-31 Thread Fuad Efendi
Nice article... 2 ms better than 20 ms, but in another chart 50 seconds are not as good as 3 seconds... Sorry for my vision... SOLR pushed into Lucene Core huge amount of performance improvements... Sent on the TELUS Mobility network with BlackBerry -Original Message- From: Shashi Kant

Index vs. Query Time Aware Filters

2011-05-31 Thread Mike Schultz
We have very long schema files for each of our language dependent query shards. One thing that is doubling the configuration length of our main text processing field definition is that we have to repeat the exact same filter chain for query time version EXCEPT with a queryMode=true parameter. Is

Re: Solr vs ElasticSearch

2011-05-31 Thread Shashi Kant
Here is a very interesting comparison http://engineering.socialcast.com/2011/05/realtime-search-solr-vs-elasticsearch/ > -Original Message- > From: Mark > Sent: May-31-11 10:33 PM > To: solr-user@lucene.apache.org > Subject: Solr vs ElasticSearch > > I've been hearing more and more about

Re: collapse component with pivot faceting

2011-05-31 Thread Isha Garg
Hi, I am currently using Solr version 3.0. I applied the Solr field collapsing patch. Field collapsing works fine with collapse.facet=after for any facet.field, but when I try to use a facet.pivot query after collapse.facet=after it doesn't show any results. Also piv

RE: Solr vs ElasticSearch

2011-05-31 Thread Fuad Efendi
Interesting wordings: "we want real-time search, we want simple multi-tenancy, and we want a solution that is built for the cloud" And later, " built on top of Lucene." Is that possible? :) (what does that mean "real time search" anyway... and what is "cloud"?) community is growing! P.S. I neve

RE: DIH: Exception with "Too many connections"

2011-05-31 Thread tiffany
Stephan, Your advice (check the process list) gave me an important clue for my solution. I changed my database connection to the slave instead of master, so that I can use more threads. Thank you very much! *** François, My setting is the default value: max_connections = 151 max_user_connect

Re: Solr vs ElasticSearch

2011-05-31 Thread Jason Rutherglen
Mark, Nice email address. I personally have no idea, maybe ask Shay Banon to post an answer? I think it's possible to make Solr more elastic, eg, it's currently difficult to make it move cores between servers without a lot of manual labor. Jason On Tue, May 31, 2011 at 7:33 PM, Mark wrote: >

Solr vs ElasticSearch

2011-05-31 Thread Mark
I've been hearing more and more about ElasticSearch. Can anyone give me a rough overview on how these two technologies differ. What are the strengths/weaknesses of each. Why would one choose one of the other? Thanks

Odd (i.e. wrong) File Names in 3.1 distro source zip

2011-05-31 Thread Bob Sandiford
Hi, all. I just downloaded the apache-solr-3.1.0-src.gz file and unzipped it. Inside I see an apache-solr-3.1.0-src file, and I tried unzipping that. There weren't any errors, but as I look inside the apache-solr-3.1.0-src file, I see that not all the Java code (for example) ended up b

Re: copyField generates "multiple values encountered for non multiValued field"

2011-05-31 Thread Alexander Kanarsky
Alexander, I saw the same behavior in 1.4.x with non-multivalued fields when "updating" the document in the index (i.e obtaining the doc from the index, modifying some fields and then adding the document with the same id back). I do not know what causes this, but it looks like the copyField logic

Re: Obtaining query AST?

2011-05-31 Thread Renaud Delbru
Hi, have a look at the flexible query parser of lucene (contrib package) [1]. It provides a framework to easily create different parsing logic. You should be able to access the AST and to modify as you want how it can be translated into a Lucene query (look at processors and pipeline processo

Re: Obtaining query AST?

2011-05-31 Thread Darren Govoni
Ludovic, Thank you for this tip, it sounds useful. Darren On Tue, 2011-05-31 at 14:38 -0700, lboutros wrote: > Darren, > > you can even take a look at the DebugComponent, which returns the parsed > query in string form. > It uses the QueryParsing class to parse the query, you could perhaps

Re: Obtaining query AST?

2011-05-31 Thread lboutros
Darren, you can even take a look at the DebugComponent, which returns the parsed query in string form. It uses the QueryParsing class to parse the query, you could perhaps do the same. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Obtaining-qu

Re: Obtaining query AST?

2011-05-31 Thread lboutros
Hi Darren, I think that if I had to get the parsing result, I would create my own QueryComponent which would create the parser in the 'prepare' function (you can take a look at the actual QueryComponent class) and instead of resolving the query in the 'process' function, I would just parse the que

Re: What's your query result cache's stats?

2011-05-31 Thread Jack Repenning
On May 31, 2011, at 2:02 PM, Markus Jelsma wrote: > the cumulative hit > ratio of the query result cache, it's almost never higher than 50%. > > What are your stats? How do you deal with it? warmupTime : 0 cumulative_lookups : 394867 cumulative_hits : 394780 cumulative_hitratio : 0.99 cumu
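For anyone comparing numbers in this thread, the hit ratio on the stats page is just hits divided by lookups. A minimal sketch of that arithmetic, using the two counters quoted above (the function name is only illustrative, not anything from Solr):

```python
def hit_ratio(hits: int, lookups: int) -> float:
    """Fraction of cache lookups that were served from the cache."""
    if lookups == 0:
        # An unused cache has no meaningful ratio; report 0.0 rather than divide by zero.
        return 0.0
    return hits / lookups


# Counters taken from the stats quoted in the message above.
ratio = hit_ratio(394780, 394867)
print(f"{ratio:.4f}")
```

With those counters the ratio is about 0.9998, so the 0.99 on the stats page looks like a shortened display of the same value.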

RE: Solr memory consumption

2011-05-31 Thread Fuad Efendi
It could be environment-specific (specific to your "top" command implementation, OS, etc.). On CentOS I see 2986m of "virtual" memory despite -Xmx2g; you have 10g "virtual" despite -Xmx6g. Don't trust it too much... The "top" command may count OS buffers for opened files, network sockets, JVM D

What's your query result cache's stats?

2011-05-31 Thread Markus Jelsma
Hi, I've seen the stats page many times, of quite a few installations and even more servers. There's one issue that keeps bothering me: the cumulative hit ratio of the query result cache, it's almost never higher than 50%. What are your stats? How do you deal with it? In some cases i have to d

Re: Using multiple CPUs for a single document base?

2011-05-31 Thread Erick Erickson
As far as I know you're on the right track, adds are single threaded. You can have multiple threads making indexing requests from your client, but that's primarily aimed at making the I/O not be the bottleneck, at some point the actual indexing of the documents is single-threaded. It'd be tricky,

Solr memory consumption

2011-05-31 Thread Denis Kuzmenok
I run multi-core Solr with the flags -Xms3g -Xmx6g -D64, but I see this in top after 6-8 hours, and it is still rising: 17485 test214 10.0g 7.4g 9760 S 308.2 31.3 448:00.75 java -Xms3g -Xmx6g -D64 -Dsolr.solr.home=/home/test/solr/example/multicore/ -jar start.jar Are there any ways t

Re: Using multiple CPUs for a single document base?

2011-05-31 Thread Jack Repenning
On May 31, 2011, at 12:44 PM, Markus Jelsma wrote: > I haven't given it a try but perhaps opening multiple HTTP connections to the > update handler will end up in multiple threads thus better CPU utilization. My original test case had hundreds of HTTP connections (all to the same URL) doing ad

Re: Better Spellcheck

2011-05-31 Thread Otis Gospodnetic
Hi Tanner, We have something we call DYM ReSearcher that helps in situations like these, esp. with multi-word queries that Lucene/Solr spellcheckers have trouble with. See http://sematext.com/products/dym-researcher/index.html Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

RE: Searching Database

2011-05-31 Thread Roger Shah
Sorry, Markus. I was not aware I need to create a new email. -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Tuesday, May 31, 2011 3:48 PM To: solr-user@lucene.apache.org Subject: Re: Searching Database Roger, how about not hijacking another user's threa

Re: Searching Database

2011-05-31 Thread Markus Jelsma
Roger, how about not hijacking another user's thread and not hijacking your already hijacked thread twice more? Changing the e-mail subject won't change the header's contents. > How can I use SOLR (version 3.1) to search in our Microsoft SQL Server > database? I looked at the DIH example but that

Re: Searching Database

2011-05-31 Thread Gora Mohanty
On Wed, Jun 1, 2011 at 1:14 AM, Roger Shah wrote: > How can I use SOLR (version 3.1) to search in our Microsoft SQL Server > database? > I looked at the DIH example but that looks like it is for importing.  I also > looked at the following link:  http://wiki.apache.org/solr/DataImportHandler [..

Re: Searching Database

2011-05-31 Thread Otis Gospodnetic
Roger, You have to import/index into Solr before you can search it. Solr can't go into your MS SQL server and search data in there. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Ro

Re: Searching Database

2011-05-31 Thread Stefan Matheis
Roger, " .. but that looks like it is for importing .." it will remain the same - no matter how often you'll search for it :) because that is .. what solr is for. you have to import (and this means indexing, analyzing ..) the whole content that you want to search. You could either use DIH to

Re: Using multiple CPUs for a single document base?

2011-05-31 Thread Markus Jelsma
I haven't given it a try but perhaps opening multiple HTTP connections to the update handler will end up in multiple threads thus better CPU utilization. Would be nice if someone can prove it here, i'm not in `the lab` right now ;) > You say it like it's something you have control over; how wou
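The "multiple connections" idea can be sketched on the client side like this. It is only a sketch under stated assumptions: `post_batch` is a hypothetical callable that you supply to send one batch of documents to the update handler, and the batch size and worker count are arbitrary. Whether this actually raises CPU use on the Solr side is exactly the open question in this thread.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Doc = dict


def chunked(docs: Sequence[Doc], size: int) -> List[Sequence[Doc]]:
    """Split docs into consecutive batches of at most `size` items."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]


def index_concurrently(
    docs: Sequence[Doc],
    post_batch: Callable[[Sequence[Doc]], None],  # hypothetical sender, supplied by the caller
    batch_size: int = 100,
    workers: int = 4,
) -> int:
    """Send batches through `post_batch` from several threads; return the batch count."""
    batches = chunked(docs, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every batch and re-raises any exception from a worker.
        list(pool.map(post_batch, batches))
    return len(batches)
```

Each worker thread would hold its own HTTP connection inside `post_batch`, so the server sees several concurrent update requests instead of one.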

Searching Database

2011-05-31 Thread Roger Shah
How can I use SOLR (version 3.1) to search in our Microsoft SQL Server database? I looked at the DIH example but that looks like it is for importing. I also looked at the following link: http://wiki.apache.org/solr/DataImportHandler Please send me a link to any instructions to set up SOLR so

Re: Splitting fields

2011-05-31 Thread Markus Jelsma
I'd go for this option as well. The example update processor couldn't make it any easier, and it's a very flexible approach. Judging from the patch in SOLR-2105 it should still work with the current 3.2 branch. https://issues.apache.org/jira/browse/SOLR-2105 > Hi, > > Write a custom UpdateProces

Re: Using multiple CPUs for a single document base?

2011-05-31 Thread Jack Repenning
On May 31, 2011, at 12:24 PM, Jonathan Rochkind wrote: > I do all my 'adds' to a separate Solr index, and then replicate to a slave > that actually serves queries. Yes, that's a step I'm holding in reserve. Probably get there some day, as I expect always to have a very high add-to-query ratio.

Re: Using multiple CPUs for a single document base?

2011-05-31 Thread Jonathan Rochkind
You say it like it's something you have control over; how would one choose to use more than one thread when indexing? I guess maybe it depends on how you're indexing of course; I guess if you're using SolrJ it's straightforward. What if you're using the ordinary HTTP Post interface, or DIH? O

Re: Boosting fields at query time in Standard Request Handler from Solrconfig.xml

2011-05-31 Thread Jan Høydahl
Hi, You need to add edismax -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 31 May 2011, at 18:08, Vignesh Raj wrote: > Hi, > > I am developing a search engine app using Asp.Net, C# and Solrnet. I use the > standard reques

Re: Using multiple CPUs for a single document base?

2011-05-31 Thread Markus Jelsma
If you use only one thread when indexing then only one core is going to be used. > On May 31, 2011, at 11:29 AM, Jonathan Rochkind wrote: > > I kind of think you should get multi-CPU use 'for free' as a Java app > > too. > > Ah, probably experimental error? If I apply a stress load consisting onl

Re: Splitting fields

2011-05-31 Thread Jan Høydahl
Hi, Write a custom UpdateProcessor, which gives you full control of the SolrDocument prior to indexing. The best would be if you write a generic FieldSplitterProcessor which is configurable on what field to take as input, what delimiter or regex to split on and finally what fields to write the
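The splitting logic described here (input field, delimiter or regex, output fields) can be sketched independently of Solr. This is not the UpdateProcessor API, which is Java; it only illustrates the configurable behaviour on a plain dictionary, and the function and field names are made up for the example.

```python
import re
from typing import Dict, List


def split_field(doc: Dict[str, str], source: str, pattern: str,
                targets: List[str]) -> Dict[str, str]:
    """Return a copy of doc with `source` split on `pattern` into `targets`.

    Extra parts beyond the listed targets are ignored; missing parts
    simply leave the corresponding target field unset.
    """
    parts = re.split(pattern, doc.get(source, ""))
    result = dict(doc)  # leave the caller's document untouched
    for name, part in zip(targets, parts):
        result[name] = part.strip()
    return result


# Hypothetical document and field names, for illustration only.
doc = split_field({"name": "Doe; John"}, "name", r";", ["last", "first"])
```

In a real processor the same three settings (source field, pattern, target fields) would come from the processor's configuration rather than function arguments.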

Re: Obtaining query AST?

2011-05-31 Thread Jonathan Rochkind
You're going to have to parse it yourself. Or, since Solr is open source, you can take pieces of the existing query parsers (dismax or lucene), and repurpose them. But I don't _think_ (I could be wrong) there is any public API in Solr/SolrJ that will give you an AST. On 5/31/2011 3:18 PM, dar

Re: Using multiple CPUs for a single document base?

2011-05-31 Thread Jonathan Rochkind
Yep, that could be it. You certainly don't get _great_ concurrency support 'for free' in Java, concurrent programming is still tricky. Parts of Solr are surely better at it than others. The one place I'd be shocked was just with multiple concurrent queries, if those weren't helped by multi-C

Re: Obtaining query AST?

2011-05-31 Thread darren
Hi, thanks for the tip. I noticed the XML stuff, but the trouble is I am taking a query string entered by a user such as "this OR that AND (this AND that)" so I'm not sure how to go from that to a representational AST parse tree... > I believe there is a query parser that accepts queries formatted

Re: Obtaining query AST?

2011-05-31 Thread Mike Sokolov
I believe there is a query parser that accepts queries formatted in XML, allowing you to provide a parse tree to Solr; perhaps that would get you the control you're after. -Mike On 05/31/2011 02:24 PM, dar...@ontrenet.com wrote: Hi, I want to write my own query expander. It needs to obtain

Re: Using multiple CPUs for a single document base?

2011-05-31 Thread Jack Repenning
On May 31, 2011, at 11:29 AM, Jonathan Rochkind wrote: > I kind of think you should get multi-CPU use 'for free' as a Java app too. Ah, probably experimental error? If I apply a stress load consisting only of queries, I get automatic multi-core use as expected. I could see where indexing new d

Re: Using multiple CPUs for a single document base?

2011-05-31 Thread Markus Jelsma
1.3 is the schema version. It hasn't had as many upgrades. Solr 3.1 uses the 1.3 schema version, Solr 1.4.x uses the 1.2 schema version. > On May 31, 2011, at 11:16 AM, Markus Jelsma wrote: > > Are you using a < 1.4 version of Solr? > > Yeah, about those version numbers ... The tarball I install

Re: Using multiple CPUs for a single document base?

2011-05-31 Thread Jonathan Rochkind
Yeah, ignore the 'multiple cores' you are seeing in the docs there. As you discovered, that's about something else, unrelated to CPUs, and has nothing to do with what you're asking about; put it out of your mind. I kind of think you should get multi-CPU use 'for free' as a Java app too. It does fo

Re: Using multiple CPUs for a single document base?

2011-05-31 Thread Jack Repenning
On May 31, 2011, at 11:16 AM, Markus Jelsma wrote: > Are you using a < 1.4 version of Solr? Yeah, about those version numbers ... The tarball I installed claimed its version was apache-solr-3.1.0 Which sounds comfortably later than 1.4. But the examples/solr/schema.xml that comes with it cl

Obtaining query AST?

2011-05-31 Thread darren
Hi, I want to write my own query expander. It needs to obtain the AST (abstract syntax tree) of an already parsed query string, navigate to certain parts of it (words) and make logical phrases of those words by adding to the AST - where necessary. This cannot be done to the string because the que

Re: Using multiple CPUs for a single document base?

2011-05-31 Thread Markus Jelsma
Are you using a < 1.4 version of Solr? It has since been improved for multi-threaded scalability, with a lot of neat non-blocking components. It's been a long time since I saw a Solr server not taking advantage of multiple cores. Today, they take 200% up to even 1200% CPU time when viewing

Using multiple CPUs for a single document base?

2011-05-31 Thread Jack Repenning
Is there a way to allow Solr to use multiple CPUs of a single, multi-core box, to increase scale (number of documents, number of searches) of the searchbase? The CoreAdmin wiki page talks about "Multiple Cores" as essentially independent document bases with independent indexes, but with some uni

Re: how does Solr/Lucene index multi-value fields

2011-05-31 Thread Ian Holsman
Thanks Erick. Sadly, in my use case I don't think that would work. I'll go back to storing them at the story level, and hitting a DB to get related stories, I think. --I On May 31, 2011, at 12:27 PM, Erick Erickson wrote: > Hmmm, I may have misled you. Re-reading my text it > wasn't very well writ

Re: Custom Scoring relying on another server.

2011-05-31 Thread arian487
bump -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-Scoring-relying-on-another-server-tp2994546p3006873.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: how does Solr/Lucene index multi-value fields

2011-05-31 Thread Erick Erickson
Hmmm, I may have misled you. Re-reading my text, it wasn't very well written... TF/IDF calculations are, indeed, per-field. I was trying to say that there was no difference between storing all the data for an individual field as a single long string of text in a single-valued field or as several

Re: how does Solr/Lucene index multi-value fields

2011-05-31 Thread Jonathan Rochkind
On 5/31/2011 12:16 PM, Ian Holsman wrote: We have a collection of related stories. When a user searches for something, we might not want to display the story that is most relevant (according to SOLR), but according to other home-grown rules. By combining all the possibilities in one SolrDocument,

Re: Edgengram

2011-05-31 Thread Tomás Fernández Löbbe
...or also use the LowerCaseTokenizerFactory at query time for consistency, but not the edge ngram filter. 2011/5/31 Tomás Fernández Löbbe > Hi Brian, I don't know if I understand what you are trying to achieve. You > want the term query "abcdefg" to have an idf of 1 instead of 7? I think using >

Re: Edgengram

2011-05-31 Thread Tomás Fernández Löbbe
Hi Brian, I don't know if I understand what you are trying to achieve. You want the term query "abcdefg" to have an idf of 1 instead of 7? I think using the KeywordTokenizerFactory at query time should work. It would be something like:                     this way, at query time "abcde

Re: how does Solr/Lucene index multi-value fields

2011-05-31 Thread Ian Holsman
On May 31, 2011, at 12:11 PM, Erick Erickson wrote: > Can you explain the use-case a bit more here? Especially the post-query > processing and how you expect the multiple documents to help here. > we have a collection of related stories. when a user searches for something, we might not want to

Re: how does Solr/Lucene index multi-value fields

2011-05-31 Thread Erick Erickson
Can you explain the use-case a bit more here? Especially the post-query processing and how you expect the multiple documents to help here. But TF/IDF is calculated over all the values in the field. There's really no difference between a multi-valued field and storing all the data in a single field

Boosting fields at query time in Standard Request Handler from Solrconfig.xml

2011-05-31 Thread Vignesh Raj
Hi, I am developing a search engine app using Asp.Net, C# and Solrnet. I use the standard request handler. Is there a way I can boost the fields at query time from inside the solrconfig.xml file itself. Just like the "qf" field for Dismax handler. Right now am searching like "field1:value^1.5 fiel
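The per-field boosts in the question map onto the `qf` parameter of the dismax-style parsers, so the client only sends the raw user query. A minimal sketch of the request parameters, assuming the edismax parser suggested elsewhere in this thread; the field names and boost values are placeholders, and the same parameters could instead be set as handler defaults in solrconfig.xml.

```python
from typing import Dict
from urllib.parse import urlencode


def dismax_params(user_query: str, boosts: Dict[str, float]) -> Dict[str, str]:
    """Build request parameters that apply per-field boosts through qf."""
    # qf is a space-separated list of field^boost pairs.
    qf = " ".join(f"{field}^{boost}" for field, boost in boosts.items())
    return {"q": user_query, "defType": "edismax", "qf": qf}


def to_query_string(params: Dict[str, str]) -> str:
    """URL-encode the parameters for use in a select request."""
    return urlencode(params)


# Placeholder field names and boosts, for illustration only.
params = dismax_params("value", {"field1": 1.5, "field2": 1.0})
query_string = to_query_string(params)
```

This keeps the boost configuration in one place instead of repeating `field:value^boost` for every field in every query.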

Re: Edgengram

2011-05-31 Thread Brian Lamb
I believe I used that link when I initially set up the field and it worked great (and I'm still using it in other places). In this particular example however it does not appear to be practical for me. I mentioned that I have a similarity class that returns 1 for the idf and i

Re: Edgengram

2011-05-31 Thread bmdakshinamur...@gmail.com
Can you specify the analyzer you are using for your queries? Maybe you could use a KeywordAnalyzer for your queries so you don't end up matching parts of your query. http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ This should help you. On Tue,

Better Spellcheck

2011-05-31 Thread Tanner Postert
I've tried to use a spellcheck dictionary built from my own content, but my content ends up having a lot of misspelled words so the spellcheck ends up being less than effective. I could use a standard dictionary, but it may have problems with proper nouns. It also misses phrases. When someone searc

how does Solr/Lucene index multi-value fields

2011-05-31 Thread Ian Holsman
Hi. I want to store a list of documents (say each being 30-60k of text) into a single SolrDocument. (to speed up post-retrieval querying) In order to do this, I need to know if lucene calculates the TF/IDF score over the entire field or does it treat each value in the list as a unique field?

Re: Edgengram

2011-05-31 Thread Brian Lamb
In this particular case, I will be doing a Solr search based on user preferences. So I will not be depending on the user to type "abcdefg". That will be automatically generated based on user selections. The contents of the field do not contain spaces and since I am creating the search parameters, c

Re: Solr NRT

2011-05-31 Thread Nagendra Nagarajayya
Did you try using Solr with RankingAlgorithm? It supports NRT. You can index documents without a commit while searching concurrently. No changes are needed except for enabling NRT through solrconfig.xml. You can get information about the implementation from here: http://solr-ra.tgels.com/

Re: Getting and viewing a heap dump

2011-05-31 Thread Constantijn Visinescu
I was sending from the Gmail web client in plain text. SpamAssassin didn't seem to like 3 things (according to the error message I got back): - I use a free email address - My email address (before the @) ends with a number - The email had the word replication in the subject line. No idea wh

Re: how can i index data in different documents

2011-05-31 Thread Erick Erickson
isn't a tag recognized in schema.xml. Please review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Thu, May 26, 2011 at 6:46 AM, Romi wrote: > > Ensure that when you add your documents, their "type" value is > effectively set to either "table1" or table"2". > > did you mean i set

Re: Getting and viewing a heap dump

2011-05-31 Thread Erick Erickson
Constantijn: I've had better luck by sending messages as "plain text". The Spam filter on the user list sometimes acts up if you send mail in "richtext" or similar formats. Gmail has a link to change this, what client are you using? And thanks for participating! Best Erick On Tue, May 31, 2011

Re: Solr Dismax bf & bq vs. q:{boost ...}

2011-05-31 Thread Erick Erickson
First, please define what "wrong results" means, what are you expecting and what are you seeing? Second, please post the results of &debugQuery=on where we can all see it, perhaps something will pop out... Best Erick On Mon, May 30, 2011 at 12:27 PM, chazzuka wrote: > I tried to do this: > > #1

Re: collapse component with pivot faceting

2011-05-31 Thread Erick Erickson
Please provide a more detailed request. This is so general that it's hard to respond. What is the use-case you're trying to understand/implement? You might review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Mon, May 30, 2011 at 4:31 AM, Isha Garg wrote: > Hi All! > >         Ca

Re: Problem with caps and star symbol

2011-05-31 Thread Erick Erickson
I think you're tripping over the issue that wildcards aren't analyzed, they don't go through your analysis chain. So the casing matters. Try lowercasing the input and I believe you'll see more like what you expect... Best Erick On Mon, May 30, 2011 at 12:07 AM, Saumitra Chowdhury wrote: > I am s
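Since wildcard terms skip the analysis chain, the lowercasing has to happen before the query is sent. A minimal client-side sketch, assuming the field is lowercased at index time and that the query is a simple whitespace-separated list of terms (no quoted phrases); the function name is made up for the example.

```python
def lowercase_wildcard_terms(query: str) -> str:
    """Lowercase only the terms that contain a wildcard (* or ?)."""
    parts = []
    for term in query.split():
        if "*" in term or "?" in term:
            # Keep the field prefix as typed (field names are case-sensitive)
            # and lowercase only the value after the last colon.
            field, sep, value = term.rpartition(":")
            parts.append(field + sep + value.lower())
        else:
            # Non-wildcard terms go through the analyzer, and operators
            # such as AND/OR must stay uppercase.
            parts.append(term)
    return " ".join(parts)
```

For example, `Name:Sau*` becomes `Name:sau*`, while a plain term or an `AND` operator is passed through unchanged.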

Re: Match in the process of filter, not end, does it mean "not matching"?

2011-05-31 Thread Erick Erickson
Take a closer look at the results of KeywordTokenizerFactory. It won't break up the text into any tokens, the entire input is considered a single string. Are you sure this is what you intend? I'd start by removing most of your filters, understanding what's happening at each step then adding them b

Re: Documents update

2011-05-31 Thread Denis Kuzmenok
Will it be slow if there are 3-5 million key/value rows? > http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html > On Tuesday 31 May 2011 15:41:32 Denis Kuzmenok wrote: >> Flags are stored to filter results and it's pretty highloaded, it's >> working fine, but i c

WIKI alerts

2011-05-31 Thread Fuad Efendi
Anyone noticed that it doesn't work? Already 2 weeks https://issues.apache.org/jira/browse/INFRA-3667 I don't receive WIKI change notifications. I CC to 'Apache Wiki' wikidi...@apache.org Something is bad. -Fuad

Re: solr Invalid Date in Date Math String/Invalid Date String

2011-05-31 Thread Erick Erickson
Can we see the results of attaching &debugQuery=on to the query? That often points out the issue. I'd expect this form to work: [2006-12-22T00:00:00Z TO 2006-12-22T23:59:59Z] Best Erick 2011/5/27 Ellery Leung : > Thank you Mike. > > So I understand that now. But what about the other items that
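The range form above can be generated rather than typed by hand, which avoids most "Invalid Date String" mistakes (missing `T`, missing trailing `Z`). A minimal sketch that builds the one-day range for a given date, assuming UTC and second precision; the function name is made up for the example.

```python
from datetime import date, datetime, time, timezone

SOLR_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def solr_day_range(day: date) -> str:
    """Return a Solr range clause covering the whole of `day` in UTC."""
    start = datetime.combine(day, time.min, tzinfo=timezone.utc)
    end = datetime.combine(day, time(23, 59, 59), tzinfo=timezone.utc)
    return f"[{start.strftime(SOLR_DATE_FORMAT)} TO {end.strftime(SOLR_DATE_FORMAT)}]"


clause = solr_day_range(date(2006, 12, 22))
```

For 22 December 2006 this produces exactly the clause quoted in the message, ready to append after a field name and colon.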

Re: Documents update

2011-05-31 Thread Markus Jelsma
http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html On Tuesday 31 May 2011 15:41:32 Denis Kuzmenok wrote: > Flags are stored to filter results and it's pretty highloaded, it's > working fine, but i can't update index very often just to make flags > up to time =\

Re: Pivot with Stats (or Stats with Pivot)

2011-05-31 Thread Erick Erickson
Well, you can discuss it on the dev list, but given that Solr is open source, you'll have to either create your own patch or engage the community to create one. You haven't really stated why this is a good thing to have, just that you want it. So the use-case would be a big help. Please don't rai

Re: applying FastVectorHighlighter truncation patch to solr 3.1

2011-05-31 Thread Markus Jelsma
Did you try to apply the patch in Lucene's contrib? On Tuesday 17 May 2011 18:55:49 Paul wrote: > I'm having this issue with solr 3.1: > > https://issues.apache.org/jira/browse/LUCENE-1824 > > It looks like there is a patch offered, but I can't figure out how to apply > it. > > What is the easi

Re: Edgengram

2011-05-31 Thread Erick Erickson
That'll work for your case, although be aware that string types aren't analyzed at all, so case matters, as do spaces etc. What is the use-case here? If you explain it a bit there might be better answers Best Erick On Fri, May 27, 2011 at 9:17 AM, Brian Lamb wrote: > For this, I ended u

Re: Splitting fields

2011-05-31 Thread Erick Erickson
Hmmm, I wonder if a custom Transformer would help here? It can be inserted into a chain of transformers in DIH. Essentially, you subclass Transformer and implement one method (transformRow) and do anything you want. The input is a map of that is a simple representation of the Solr document. You c

Re: Documents update

2011-05-31 Thread Denis Kuzmenok
Flags are stored to filter results and it's pretty highloaded, it's working fine, but i can't update index very often just to make flags up to time =\ Where can i read about using external fields / files? > And it wouldn't work unless all the data is stored anyway. Currently there's > no w

Re: DIH render html entities

2011-05-31 Thread Erick Erickson
Convert them to what? Individual fields in your docs? Text? If the former, you might get some joy from the XpathEntityProcessor. If you want to just strip the markup and index all the content you might get some joy from the various *html* analyzers listed here: http://wiki.apache.org/solr/Analyzer

Re: Documents update

2011-05-31 Thread Erick Erickson
And it wouldn't work unless all the data is stored anyway. Currently there's no way to update a single field in a document, although there's work being done in that direction (see the "column stride" JIRA). What do you want to do with these fields? If it's to influence scoring, you could look at e

Re: How to disable QueryElevationComponent

2011-05-31 Thread Erick Erickson
Let's back up a bit. Why don't you want a ? It's usually a good idea to have one, especially if you're using DIH. Best Erick On Fri, May 27, 2011 at 2:53 AM, Romi wrote: > i removed > class="org.apache.solr.handler.component.QueryElevationComponent" > > >     string >    elevate.xml >   > > fro

Re: Facet Query

2011-05-31 Thread Erick Erickson
I'm guessing you're faceting on an analyzed field. This is usually a bad idea. What is the use-case you're trying to solve? Best Erick On Fri, May 27, 2011 at 12:51 AM, Jasneet Sabharwal wrote: > Hi > > When I do a facet query on my data, it shows me a list of all the words > present in my datab

Re: Nutch Crawl error

2011-05-31 Thread Erick Erickson
This question would be better asked on the Nutch forum rather than the Solr forum. Best Erick On Thu, May 26, 2011 at 12:06 PM, Roger Shah wrote: > I ran the command bin/nutch crawl urls -dir crawl -depth 3 >& crawl.log > > When I viewed crawl.log I found some errors such as: > > Can't retrieve

Re: Spellcheck component not returned with numeric queries

2011-05-31 Thread Markus Jelsma
File an issue: https://issues.apache.org/jira/browse/SOLR-2556 On Monday 30 May 2011 16:07:41 Markus Jelsma wrote: > Hi, > > The spell check component's output is not written when sending queries that > consist of numbers only. Clients depending on the availability of the > spellcheck output need

Re: Bulk indexing, UpdateProcessor overwriteDupes and poor IO performances

2011-05-31 Thread lee carroll
Tanguy You might have tried this already but can you set overwriteDupes to false and set the signature key to be the id. That way solr will manage updates? from the wiki http://wiki.apache.org/solr/Deduplication HTH Lee On 30 May 2011 08:32, Tanguy Moal wrote: > > Hello, > > Sorry for re-

Re: Solr NRT

2011-05-31 Thread Ionut Manta
What results did you get with this "hack"? How long does it take from when you start indexing some documents until you get a search result? Did you try NRT? On Tue, May 31, 2011 at 3:47 PM, David Hill wrote: > > Unless you cross a Solr server commit threshold your client has to post a > message for the

RE: DIH: Exception with "Too many connections"

2011-05-31 Thread Fuad Efendi
Hi, There is an existing bug in DataImportHandler described (and patched) at https://issues.apache.org/jira/browse/SOLR-2233 It is not used in a thread-safe manner, and it is not appropriately closed & reopened (why?); and new connections are opened unpredictably. It may cause "Too many connections" e

RE: Solr NRT

2011-05-31 Thread David Hill
Unless you cross a Solr server commit threshold your client has to post a message for the server content to be available for searching. Unfortunately the Solr tool that is supposed to do this apparently doesn't. I asked for community help last week and was surprised to receive no response, I t

Solr NRT

2011-05-31 Thread Ionut Manta
Hi, I have the following strange use case: Index 100 documents and make them immediately available for search. I call this "on the fly indexing". Then the index can be removed. So the size of the index is not an issue here. Is this possible with Solr? Anyone tried something similar? Thank you, I

Re: DIH: Exception with "Too many connections"

2011-05-31 Thread François Schiettecatte
Hi, You might also check the 'max_user_connections' setting if you have that set: # Maximum number of connections, and per user max_connections = 2048 max_user_connections = 2048 http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html Cheers Fran

Re: DIH: Exception with "Too many connections"

2011-05-31 Thread Stefan Matheis
Tiffany, On Tue, May 31, 2011 at 12:45 PM, tiffany wrote: > I executed the  SHOW PROCESSLIST; command. (Is it what you mean? I've never > tried it before...) Exactly this, yes :) On Tue, May 31, 2011 at 12:45 PM, tiffany wrote: > So, if the number of threads in the process list is larger than

RE: newbie question for DataImportHandler

2011-05-31 Thread Kevin Bootz
In the op it's stated that the index was deleted. I'm guessing that means the physical files, /data/ quote populate the table > with another million rows of data. > I remove the index that solr previously create. I restart solr and go > to the > data import handler development console and

Re: Getting and viewing a heap dump

2011-05-31 Thread Bernd Fehling
Hi Constantijn, yes I use Linux 64bit and thanks for the help. Bernd On 31.05.2011 12:22, Constantijn Visinescu wrote: Hi Bernd, I'm assuming Linux here, if you're running something else these instructions might differ slightly. First get a heap dump with: jmap -dump:format=b,file=/path/to

Re: DIH: Exception with "Too many connections"

2011-05-31 Thread tiffany
Thanks Stefan! I executed the SHOW PROCESSLIST; command. (Is it what you mean? I've never tried it before...) It seems that when I executed one delta-import command, several threads were inserted into the table and removed after commit. Also it looks like the number of threads are pretty much eq

Getting and viewing a heap dump

2011-05-31 Thread Constantijn Visinescu
Hi Bernd, I'm assuming Linux here, if you're running something else these instructions might differ slightly. First get a heap dump with: jmap -dump:format=b,file=/path/to/generate/heapdumpfile.hprof 1234 with 1234 being the PID (process id) of the JVM. After you get a heap dump you can analyze
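
The heap-dump step above can be scripted. This is a minimal sketch that assumes the JDK's `jmap` is on the PATH and uses its `-dump:format=b,file=<path>` option, which is the option that writes a binary heap dump. The helper name `jmap_dump_command` is hypothetical, and the PID and path are the placeholders from the message.

```python
import subprocess


def jmap_dump_command(pid, dump_path):
    """Return the argv list for a binary heap dump of the JVM with this PID."""
    return ["jmap", "-dump:format=b,file=%s" % dump_path, str(pid)]


def take_heap_dump(pid, dump_path):
    """Run jmap and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(jmap_dump_command(pid, dump_path), check=True)


# Placeholder PID and path taken from the message above.
cmd = jmap_dump_command(1234, "/path/to/generate/heapdumpfile.hprof")
```

`take_heap_dump` is not called here because it needs a running JVM. Run it as the same user that owns the JVM process.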

Re: DIH: Exception with "Too many connections"

2011-05-31 Thread tiffany
Thanks for your reply, Chandan. Here is the additional information. I'm also using the multi-core function, and I run the delta-import command in parallel to save running time. If I don't run in parallel, it works fine. Each core accesses the same database server but different sch
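
Since the imports succeed when run one at a time, one possible middle ground is to cap how many cores import at once. This is a minimal sketch under stated assumptions, not a fix confirmed in this thread: `run_import` is a hypothetical caller-supplied function that triggers the delta-import for one core and blocks until that import has finished, and `max_parallel` is an illustrative limit.

```python
from concurrent.futures import ThreadPoolExecutor


def run_delta_imports(cores, run_import, max_parallel=2):
    """Call run_import(core) for every core, at most max_parallel at a time.

    run_import is assumed to block until the import for that core is done;
    otherwise the cap limits only how fast imports are started.
    Returns a dict mapping each core name to run_import's result.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map preserves input order, so results line up with cores.
        results = list(pool.map(run_import, cores))
    return dict(zip(cores, results))
```

With `max_parallel=1` this degrades to the sequential run that is reported to work. Raising it step by step while watching the MySQL process list shows how many concurrent imports the connection limit tolerates.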

Re: DIH: Exception with "Too many connections"

2011-05-31 Thread Stefan Matheis
Tiffany, On Tue, May 31, 2011 at 11:16 AM, tiffany wrote: > Any idea to solve this problem? In addition to Chandan: check your MySQL process list and have a look at what is displayed there. Regards Stefan

Re: DIH: Exception with "Too many connections"

2011-05-31 Thread Chandan Tamrakar
Looks like you are not able to connect to the database; please see if you get a similar exception when you try to connect from other clients. On Tue, May 31, 2011 at 3:01 PM, tiffany wrote: > Hi all, > > I'm using DIH and getting the following error. > My Solr version is Solr3.1. > > = > ...

DIH: Exception with "Too many connections"

2011-05-31 Thread tiffany
Hi all, I'm using DIH and getting the following error. My Solr version is Solr3.1. = ... Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Could not create connection to database server. Attempted reconnect 3 times. Giving up. at sun.reflect.GeneratedCon
