Re: string cut-off filter?

2011-08-08 Thread Ahmet Arslan
> Yes indeed I currently use a > workaround with regex filter. > > Example for limiting to 30 characters: > pattern="(.{1,30})(.{31,})" replacement="$1" > replace="all"/> > > Just thought there might be already a filter. As an alternative you can steal TruncateTokenFilter from ElasticSearch too
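A minimal Java sketch of the regex approach discussed above, for readers who want to try it outside Solr. The class and method names are hypothetical, and the anchored pattern is a substitute for the quoted one, not the poster's configuration. Traced by hand, the quoted pattern `(.{1,30})(.{31,})` appears to match only values of 32 characters or more, and for lengths between 32 and 60 the first group gives characters back to the second, so the result can end up shorter than 30.

```java
import java.util.regex.Pattern;

final class TruncateByRegex {
    // Hypothetical helper: keeps the first maxLen chars and drops the rest.
    // Values that are already short enough do not match and stay unchanged.
    static String truncate(String value, int maxLen) {
        Pattern p = Pattern.compile("^(.{" + maxLen + "}).+$", Pattern.DOTALL);
        return p.matcher(value).replaceFirst("$1");
    }

    // Applies the pattern exactly as quoted in the thread, for comparison.
    static String quotedPattern(String value) {
        return value.replaceAll("(.{1,30})(.{31,})", "$1");
    }

    public static void main(String[] args) {
        String forty = "a".repeat(40);
        System.out.println(truncate(forty, 30).length());
        System.out.println(quotedPattern(forty).length());
    }
}
```

In a Solr schema the same idea would presumably be expressed as the pattern and replacement attributes of a pattern-replace filter; the exact element is not shown in the quoted snippet, so it is left out here.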

Re: string cut-off filter?

2011-08-08 Thread Bernd Fehling
Yes indeed I currently use a workaround with regex filter. Example for limiting to 30 characters: Just thought there might be already a filter. But as Karsten showed it is pretty easy to implement. Maybe Karsten can open an issue and add his code? Regards Bernd Am 08.08.2011 22:56, schrieb

Re: Multiple Cores on different machines?

2011-08-08 Thread Yury Kats
On 8/8/2011 11:51 PM, Satish Talim wrote: > A quick question - is it possible to have 2 cores in Solr on two different > machines? Yes

Multiple Cores on different machines?

2011-08-08 Thread Satish Talim
A quick question - is it possible to have 2 cores in Solr on two different machines? Satish

Re: how to enable MMapDirectory in solr 1.4?

2011-08-08 Thread Li Li
thank you. I will try it. On Mon, Aug 8, 2011 at 11:18 PM, Rich Cariens wrote: > We patched our 1.4.1 build with > SOLR-1969(making > MMapDirectory configurable) and realized a 64% search performance > boost on our Linux hosts. > > On Mon, Aug 8, 2

Re: Suggestions for copying fields across cores...

2011-08-08 Thread Erick Erickson
Not that I know of. Separate cores are pretty distinct to Solr, so you're probably stuck with doing it by sending the request to each core... Best Erick On Fri, Aug 5, 2011 at 5:51 PM, josh lucas wrote: > Is there a suggested way to copy data in fields to additional fields that > will only be i

Re: Same id on two shards

2011-08-08 Thread simon
I think the first one to respond is indeed the way it works, but that's only deterministic up to a point (if your small index is in the throes of a commit and everything required for a response happens to be cached on the larger shard ... who knows ?) On Mon, Aug 8, 2011 at 7:10 PM, Shawn Heisey

Re: Records skipped when using DataImportHandler

2011-08-08 Thread Erick Erickson
Spend some time in the admin/analysis page, that'll show you what part of the analysis chain is doing what to your data. It'll save you a world of headache... But at a guess WordDelimiterFilterFactory is your culprit... Best Erick On Thu, Aug 4, 2011 at 6:08 PM, anand sridhar wrote: > Ok. After

Re: MultiSearcher/ParallelSearcher - searching over multiple cores?

2011-08-08 Thread Erick Erickson
I think you'll have to make this go yourself, I don't see how to make Solr do it for you. And even if it could, the scores aren't comparable, so combining them for presentation to the user will be "interesting" Best Erick On Thu, Aug 4, 2011 at 2:27 PM, Ralf Musick wrote: > Hi Erik, > > I have s

Re: merge factor performance

2011-08-08 Thread Erick Erickson
What version of Solr are you using? And how are you sending your docs to Solr? Bumping your JVM size and bumping your RAM size to 128M also might help.. How are you sending your docs to Solr? And where are you getting them from? Are you sure that Solr is your problem or is it your data acquisitio

Re: Same id on two shards

2011-08-08 Thread Shawn Heisey
On 8/8/2011 4:07 PM, simon wrote: > Only one should be returned, but it's non-deterministic. See > http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations I had heard it was based on which one responded first. This is part of why we have a small index that contains the new

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Markus Jelsma
If you want to understand and debug the scoring you can use debugQuery=true to see how different documents score. Most of the time docs with both terms are on top of the result set unless norms are interfering. To understand you should check the Solr relevancy wiki but the Lucene docs are mu

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Jason Toy
I am trying to test out and compare different sorts and scoring. When I use dismax to search for "indie music" with: qf=all_lists_text&q="indie+music"&defType=dismax&rows=100 I see some stuff that seems "irrelevant", meaning in top results I see only 1 or 2 mentions of "indie music", but when I l

Re: Same id on two shards

2011-08-08 Thread simon
Only one should be returned, but it's non-deterministic. See http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations -Simon On Sat, Aug 6, 2011 at 6:27 AM, Pooja Verlani wrote: > Hi, > > We have a multicore solr with 6 cores. We merge the results using shards > parameter

Re: Is anobdy using lotsofcores feature in production?

2011-08-08 Thread Uomesh
Hi Shalin, Does this mean that even if I apply the patch mentioned at the link below, Solr still does not support lots of cores? https://issues.apache.org/jira/browse/SOLR-1293 Are you saying this is just a concept and the patch is not an implementation? We are planning to use lots of cores in our commerce system to

Re: Can Master push data to slave

2011-08-08 Thread simon
You could configure a PostCommit event listener on the master which would send a HTTP fetchindex request to the slave you want to carry out replication - see http://wiki.apache.org/solr/SolrReplication#HTTP_API But why do you want the master to push to the slave ? -Simon On Mon, Aug 8, 2011 at
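A rough Java sketch of the request such a listener would send. The host name, port, and class name are hypothetical, and the `/solr/replication` path assumes the default replication handler location described on the linked wiki page; a multicore setup would likely need the core name in the path.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class FetchIndexTrigger {
    // Builds the fetchindex URL for one slave (assumed default handler path).
    static URI fetchIndexUri(String slaveHost, int port) {
        return URI.create("http://" + slaveHost + ":" + port
                + "/solr/replication?command=fetchindex");
    }

    // Sends the GET request and returns the HTTP status code.
    static int trigger(URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<Void> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding());
        return response.statusCode();
    }

    public static void main(String[] args) {
        // Only prints the URL; calling trigger() needs a reachable slave.
        System.out.println(fetchIndexUri("slave1.example.com", 8983));
    }
}
```

Note that the slave still pulls the index itself; the master-side call only tells it when to start.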

Re: Can Solr with the StatsComponent analyze 20+ million files?

2011-08-08 Thread Fred Smith
Thank you Walter, Markus and Jonathan for your fast responses and help! We will be looking into CouchDB (and Hadoop if necessary) to process our data. Thanks again, Fred

Re: Example Solr Config on EC2

2011-08-08 Thread mbohlig
Matthew, Here's another resource: http://www.lucidimagination.com/blog/2010/02/01/solr-shines-through-the-cloud-lucidworks-solr-on-ec2/ Michael Bohlig Lucid Imagination - Original Message From: Matt Shields To: solr-user@lucene.apache.org Sent: Mon, August 8, 2011 2:03:20 PM Subject

Re: Can Master push data to slave

2011-08-08 Thread Markus Jelsma
Hi, > Hi > > I am using Solr 1.4. and doing a replication process where my slave is > pulling data from Master. I have 2 questions > > a. Can Master push data to slave Not in current versions. Not sure about exotic patches for this. > b. How to make sure that lock file is not created while rep

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Markus Jelsma
> Dismax queries can. But > > sort=termfreq(all_lists_text,'indie+music') > > is not using dismax. Apparently termfreq function can not? I am not > familiar with the termfreq function. It simply returns the TF of the given _term_ as it is indexed in the current document. Sorting on TF like

Re: Can Solr with the StatsComponent analyze 20+ million files?

2011-08-08 Thread Jonathan Rochkind
On 8/8/2011 5:10 PM, Markus Jelsma wrote: Will the StatsComponent in Solr do what we need with minimal configuration? Can the StatsComponent only be used on a subset of the data? For example, only look at data from certain months? If i remember correctly, it cannot. Well, if you index things p

Re: csv responsewriter and numfound

2011-08-08 Thread Yonik Seeley
On Mon, Aug 8, 2011 at 5:12 PM, Erik Hatcher wrote: > Great question.  But how would that get returned in the response? > > It is a drag that the header is lost when results are written in CSV, but > there really isn't an obvious spot for that information to be returned. I guess a comment would

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Markus Jelsma
> Are not Dismax queries able to search for phrases using the default > index (which is what I am using)? If I can already do phrase searches, I > don't understand why I would need to reindex to be able to access phrases > from a function. Executing a Lucene phrase query is not the same as term f

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Jonathan Rochkind
Dismax queries can. But sort=termfreq(all_lists_text,'indie+music') is not using dismax. Apparently termfreq function can not? I am not familiar with the termfreq function. To understand why you'd need to reindex, you might want to read up on how Lucene actually works, to get a basic understa

Re: csv responsewriter and numfound

2011-08-08 Thread Erik Hatcher
Great question. But how would that get returned in the response? It is a drag that the header is lost when results are written in CSV, but there really isn't an obvious spot for that information to be returned. Erik On Aug 4, 2011, at 01:52 , Pooja Verlani wrote: > Hi, > > Is there

Re: Example Solr Config on EC2

2011-08-08 Thread Yury Kats
On 8/8/2011 5:03 PM, Matt Shields wrote: > I'm looking for some examples of how to set up Solr on EC2. The > configuration I'm looking for would have multiple nodes for redundancy. > I've tested in-house with a single master and slave with replication > running in Tomcat on Windows Server 2003, bu

Re: Can Solr with the StatsComponent analyze 20+ million files?

2011-08-08 Thread Markus Jelsma
> Hi, > Currently we are in the process of figuring out how to deal with > millions of CSV files containing weather data(20+ million files). Each > file is about 500 bytes in size. > We want to calculate statistics on fields read from the file. For > example, the standard deviation of wind speed ac

Re: Dispatching a query to multiple different cores

2011-08-08 Thread Jonathan Rochkind
However, if you unify your schemas to do this, I'd consider whether you really want separate cores/shards in the first place. If you want to search over all of them together, what are your reasons to put them in separate Solr indexes in the first place? Ordinarily, if you want to search over

Re: Can Solr with the StatsComponent analyze 20+ million files?

2011-08-08 Thread Walter Underwood
This does not seem well matched to Solr. Solr and Lucene are optimized to show the best few matches, not every match. I'd use Hadoop for this. Or MarkLogic, if you'd like to talk about that off-list. wunder Lead Engineer, MarkLogic On Aug 8, 2011, at 1:59 PM, Fred Smith wrote: > Hi, > Current

Example Solr Config on EC2

2011-08-08 Thread Matt Shields
I'm looking for some examples of how to set up Solr on EC2. The configuration I'm looking for would have multiple nodes for redundancy. I've tested in-house with a single master and slave with replication running in Tomcat on Windows Server 2003, but even if I have multiple slaves the single maste

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Jason Toy
Are not Dismax queries able to search for phrases using the default index (which is what I am using)? If I can already do phrase searches, I don't understand why I would need to reindex to be able to access phrases from a function. On Mon, Aug 8, 2011 at 1:49 PM, Markus Jelsma wrote: > > > Aelexe

Can Solr with the StatsComponent analyze 20+ million files?

2011-08-08 Thread Fred Smith
Hi, Currently we are in the process of figuring out how to deal with millions of CSV files containing weather data (20+ million files). Each file is about 500 bytes in size. We want to calculate statistics on fields read from the file. For example, the standard deviation of wind speed across all 20+
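For a sense of scale, a statistic such as the standard deviation can be computed in a single pass without holding all values in memory. The sketch below uses Welford's online algorithm; it is an illustration outside Solr, not the StatsComponent implementation, and the class name and sample values are made up.

```java
final class RunningStats {
    private long count;
    private double mean;
    private double m2; // running sum of squared deviations from the mean

    // Folds one observation (for example one wind speed reading) into the totals.
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    long count() { return count; }

    double mean() { return mean; }

    // Standard deviation over all values seen so far (divides by n).
    double populationStdDev() {
        return count == 0 ? Double.NaN : Math.sqrt(m2 / count);
    }

    // Sample standard deviation (divides by n - 1).
    double sampleStdDev() {
        return count < 2 ? Double.NaN : Math.sqrt(m2 / (count - 1));
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (double v : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            stats.add(v);
        }
        System.out.println(stats.mean() + " " + stats.populationStdDev());
    }
}
```

Because each value is folded in once, the same loop could read the files one at a time; restricting it to certain months would only need a filter before `add`.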

Re: string cut-off filter?

2011-08-08 Thread Markus Jelsma
There is none indeed except using copyField and maxChars. Could you perhaps come up with some regex that replaces the group of chars beyond the desired limit with ''? That would fit in a pattern replace char filter. > Hi Bernd, > > I also searched for such a filter but did not f

Re: solr 3.1, not indexing entire document?

2011-08-08 Thread dhastings
that was it... thanks. obviously the document is well over 2 mgs. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-3-1-not-indexing-entire-document-tp3236719p3236773.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Dispatching a query to multiple different cores

2011-08-08 Thread Erik Hatcher
You could use Solr's distributed (shards parameter) capability to do this. However, if you've got somewhat different schemas that isn't necessarily going to work properly. Perhaps unify your schemas in order to facilitate this using Solr's distributed search feature? Erik On Aug 3, 2

Re: string cut-off filter?

2011-08-08 Thread karsten-solr
Hi Bernd, I also searched for such a filter but did not find it. Best regards Karsten P.S. I am now using this filter:
public class CutMaxLengthFilter extends TokenFilter {
    public CutMaxLengthFilter(TokenStream in) {
        this(in, DEFAULT_MAXLENGTH);
    }
    pu
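The filter above is cut off in this listing, so its body is unknown. As a standalone sketch of the core step such a filter needs, the hypothetical helper below computes the new term length for a character buffer. It is not Karsten's code and does not use any Lucene classes; avoiding a split surrogate pair is an added assumption about what a careful implementation might want.

```java
final class CutLength {
    // Returns the length the term should be cut to: at most maxLength chars,
    // stepping back by one if the cut would separate a surrogate pair.
    static int cutLength(char[] buffer, int length, int maxLength) {
        if (length <= maxLength) {
            return length; // already short enough, keep as-is
        }
        int cut = maxLength;
        if (cut > 0
                && Character.isHighSurrogate(buffer[cut - 1])
                && Character.isLowSurrogate(buffer[cut])) {
            cut--;
        }
        return cut;
    }

    public static void main(String[] args) {
        char[] term = "abcdefghij".toCharArray();
        int newLength = cutLength(term, term.length, 4);
        System.out.println(new String(term, 0, newLength));
    }
}
```

Inside a token filter, the returned value would presumably be passed to whatever call sets the term length on the token's term attribute.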

Re: PivotFaceting in solr 3.3

2011-08-08 Thread Erik Hatcher
As far as I know, there isn't a patch for pivot faceting for 3.x. It'd require extracting the code from trunk and porting it. Perhaps as easy as applying the diff from the pivot commit from trunk to the 3.x codebase? (but probably not quite that easy) Erik On Aug 3, 2011, at 00:58 ,

Re: edismax configuration

2011-08-08 Thread Mark juszczec
Got it. Thank you. I thought this was going to be much more difficult than it actually was. Mark On Mon, Aug 8, 2011 at 4:50 PM, Markus Jelsma wrote: > http://wiki.apache.org/solr/CommonQueryParameters#defType > > > Hello all > > > > Can someone direct me to a link with config info in order to

Re: edismax configuration

2011-08-08 Thread Markus Jelsma
http://wiki.apache.org/solr/CommonQueryParameters#defType > Hello all > > Can someone direct me to a link with config info in order to allow use of > the edismax QueryHandler? > > Mark

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Markus Jelsma
> Alexei, thank you, that does seem to work. > > My sort results seem to be totally wrong though, I'm not sure if it's > because of my sort function or something else. > > My query consists of: > sort=termfreq(all_lists_text,'indie+music')+desc&q=*:*&rows=100 > And I get back 4571232 hits. Tha

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Yury Kats
On 8/8/2011 4:34 PM, Jason Toy wrote: > Alexei, thank you, that does seem to work. > > My sort results seem to be totally wrong though, I'm not sure if it's because > of my sort function or something else. > > My query consists of: > sort=termfreq(all_lists_text,'indie+music')+desc&q=*:*&rows=10

Re: solr 3.1, not indexing entire document?

2011-08-08 Thread Markus Jelsma
Check your maxFieldLength setting. > hi, i have my solr field text configured as per earlier discussion: > > autoGeneratePhraseQueries="true"> > > > > > > generateWordParts="1" generateNumberParts="1" catenateWords="1" > catenateNumbers="1" catenateAll="0" splitOnCa

solr 3.1, not indexing entire document?

2011-08-08 Thread dhastings
hi, i have my solr field text configured as per earlier discussion: and for debugging purposes i am storing the text field as well, so: now when i do a search again

bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Jason Toy
Alexei, thank you, that does seem to work. My sort results seem to be totally wrong though, I'm not sure if it's because of my sort function or something else. My query consists of: sort=termfreq(all_lists_text,'indie+music')+desc&q=*:*&rows=100 And I get back 4571232 hits. All the results don't

Re: is it possible to do a sort without query?

2011-08-08 Thread Alexei Martchenko
You can use the standard query parser and pass q=*:* 2011/8/8 Jason Toy > I am trying to list some data based on a function I run, > specifically termfreq(post_text,'indie music') and I am unable to do it > without passing in data to the q parameter. Is it possible to get a sorted > list wit
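A small Java sketch of assembling such a request: match everything with `q=*:*` and let the function sort do the ordering. The class and method names are hypothetical, and the example uses the single term 'indie' because other replies in this listing point out that termfreq works on one indexed term rather than a phrase.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class SortOnlyQuery {
    // Parameters for "match all documents, order by term frequency".
    static Map<String, String> sortOnlyParams(String field, String term, int rows) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("sort", "termfreq(" + field + ",'" + term + "') desc");
        params.put("rows", Integer.toString(rows));
        return params;
    }

    // URL-encodes each key and value and joins them with '&'.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(sortOnlyParams("post_text", "indie", 100)));
    }
}
```

Encoding the parameters programmatically also makes it obvious that a `+` inside the quoted term is just an encoded space.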

Test failures on lucene_solr_3_3 and branch_3x

2011-08-08 Thread Shawn Heisey
I've got a consistent test failure on Solr source code checked out from svn. The same thing happens with 3.3 and branch_3x. I have information saved from the failures on branch_3x, which I have gotten to fail about a dozen times in a row. It fails on a test called TestSqlEntityProcessorDe

is it possible to do a sort without query?

2011-08-08 Thread Jason Toy
I am trying to list some data based on a function I run, specifically termfreq(post_text,'indie music') and I am unable to do it without passing in data to the q parameter. Is it possible to get a sorted list without searching for any terms?

edismax configuration

2011-08-08 Thread Mark juszczec
Hello all Can someone direct me to a link with config info in order to allow use of the edismax QueryHandler? Mark

Re: solr-ruby: Error undefined method `closed?' for nil:NilClass

2011-08-08 Thread Erik Hatcher
Ian - What does your solr-ruby using code look like? Solr::Connection is light-weight, so you could just construct a new one of those for each request. Are you keeping an instance around? Erik On Aug 8, 2011, at 12:03 , Ian Connor wrote: > Hi, > > I have seen some of these errors c

solr-ruby: Error undefined method `closed?' for nil:NilClass

2011-08-08 Thread Ian Connor
Hi, I have seen some of these errors come through from time to time. It looks like: /usr/lib/ruby/1.8/net/http.rb:1060:in `request'\n/usr/lib/ruby/1.8/net/http.rb:845:in `post' /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:158:in `post' /usr/lib/ruby/gems/1.8/gems/solr-ruby

Re: how to enable MMapDirectory in solr 1.4?

2011-08-08 Thread Rich Cariens
We patched our 1.4.1 build with SOLR-1969 (making MMapDirectory configurable) and realized a 64% search performance boost on our Linux hosts. On Mon, Aug 8, 2011 at 10:05 AM, Dyer, James wrote: > If you want to try MMapDirectory with Solr 1.4, then

Re: "Weighted" facet strings

2011-08-08 Thread Jonathan Rochkind
Ah wait, I forgot about dismax 'bq' parameter! That might be a way to accomplish your first and second use cases. You probably still need the separate _text_weight_X fields for your third use case. Sorry I don't have a complete recipe for you, but hopefully these tools will help get you somew

Re: "Weighted" facet strings

2011-08-08 Thread Jonathan Rochkind
One kind of hacky way to accomplish some of those tasks involves creating a lot more Solr fields. (This kind of 'de-normalization' is often the answer to how to make Solr do something). So facet fields are ordinarily not tokenized or normalized at all. But that doesn't work very well for match

matching exact/whole phrase

2011-08-08 Thread Daphna Chen-Deutsch
Hi, I'm trying to search for a specific phrase on a specific index field. The field definition is: Type definition: When trying the following query: http://localhost:8983/solr/select?q=title_string:'One Shot' I'm getting back titles that match the string 'One Shot' but also titles that only

Re: Why Slop doens't match anything?

2011-08-08 Thread Alexander Ramos Jardim
Hi Ahmet, Thanks for the help, but since I remade the index, the problem got solved. Maybe I was doing something wrong. 2011/8/3 Ahmet Arslan > > Hm... > > > > No. > > Can you paste output of &debugQuery=on for two queries? > -- Alexander Ramos Jardim

Re: Scoring using POJO/SolrJ

2011-08-08 Thread darren
score isn't a field, so what happens when you remove @Field from the score property but leave the getter/setters? On Mon, 8 Aug 2011 10:07:31 +0100, Kissue Kissue wrote: > Hi, > > I am using the SolrJ client library and using a POJO with the @Field > annotation to index documents and to retriev

PositionIncrement gap and multi-valued fields.

2011-08-08 Thread Luis Cappa Banda
Hello! I have a question about the behaviour of searching over field types that have positionIncrementGap defined. For example, suppose that: 1. We have a field called "test" defined as multi-valued and white space tokenized. 2. The index has a single document with a "test" value: TEST1
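The question is cut off here, but the general mechanism can be illustrated with a simplified model. The Java sketch below assigns token positions to the values of a multi-valued field, adding the gap between values, and checks whether an in-order two-term phrase would match for a given slop. It is a toy model with hypothetical values, not Lucene's implementation, and exact position numbering may differ between Lucene versions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PositionGapModel {
    // Whitespace-tokenizes each value; the gap is added before every value after the first.
    static Map<String, List<Integer>> positions(List<String> values, int gap) {
        Map<String, List<Integer>> out = new HashMap<>();
        int next = 0;
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                next += gap;
            }
            for (String token : values.get(i).trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    out.computeIfAbsent(token, k -> new ArrayList<>()).add(next++);
                }
            }
        }
        return out;
    }

    // In-order phrase "first second": matches when the tokens between them fit in the slop.
    static boolean inOrderPhraseMatches(Map<String, List<Integer>> pos,
                                        String first, String second, int slop) {
        for (int p1 : pos.getOrDefault(first, List.of())) {
            for (int p2 : pos.getOrDefault(second, List.of())) {
                if (p2 > p1 && p2 - p1 - 1 <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> pos =
                positions(List.of("TEST1 TEST2", "TEST3 TEST4"), 100);
        System.out.println(inOrderPhraseMatches(pos, "TEST2", "TEST3", 0));
        System.out.println(inOrderPhraseMatches(pos, "TEST2", "TEST3", 100));
    }
}
```

In this model a gap of 0 makes adjacent values behave like one continuous text, while a large gap keeps phrase and small-slop queries from matching across value boundaries.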

RE: how to enable MMapDirectory in solr 1.4?

2011-08-08 Thread Dyer, James
If you want to try MMapDirectory with Solr 1.4, then copy the class org.apache.solr.core.MMapDirectoryFactory from 3.x or Trunk, and either add it to the .war file (you can just add it under "src/java" and re-package the war), or you can put it in its own .jar file in the "lib" directory under

Multiplexing TokenFilter for multi-language?

2011-08-08 Thread cnyee
Sorry if this has already been discussed, but I have already spent a couple of days googling in vain. The problem: - documents in multiple languages (us, de, fr, es). - language is known (a team of editors determines the language manually, and users are asked to specify language option for sear

Re: strip html from data

2011-08-08 Thread Merlin Morgenstern
Unfortunately I still can't get it running. The code I am using is the following:

how to enable MMapDirectory in solr 1.4?

2011-08-08 Thread Li Li
Hi all, I read the Apache Solr 3.1 release notes today and found that MMapDirectory is now the default implementation on 64-bit systems. I am now using Solr 1.4 with a 64-bit JVM on Linux. How can I use MMapDirectory? Will it improve performance?

Scoring using POJO/SolrJ

2011-08-08 Thread Kissue Kissue
Hi, I am using the SolrJ client library and using a POJO with the @Field annotation to index documents and to retrieve documents from the index. I retrieve the documents from the index like so: List beans = response.getBeans(Item.class) Now in order to add the scores to the beans I added a field

string cut-off filter?

2011-08-08 Thread Bernd Fehling
Hi list, is there a string cut-off filter to limit the length of a KeywordTokenized string? So the string should not be dropped, only limited to a certain length. Regards Bernd

Can Master push data to slave

2011-08-08 Thread Pawan Darira
Hi, I am using Solr 1.4 and running a replication process in which my slave pulls data from the master. I have 2 questions: a. Can the master push data to the slave? b. How can I make sure that a lock file is not created during replication? Please help. Thanks, Pawan

Re: Spell Check

2011-08-08 Thread tamanjit.bin...@yahoo.co.in
Hey thanks. It worked. I have another query. Since I have made a dictionary on Keyword tokenization (as I need a dictionary of names), when I try to spellcheck, it works great on most cases. But in cases where say the correct word is :"Shivthar Ghal" and I try to search for a spelling correction

Re: cores vs indices

2011-08-08 Thread Dave Stuart
Hi Daniel, Yes there is a one-to-one relationship between Solr indices and cores. The one to many comes when you look at the relationship between cores and tomcat/jetty webapps instances. This gives you the ability to clone, add and swap cores around. See the following for core manipulation functions: h

cores vs indices

2011-08-08 Thread Daniel Schobel
Can someone provide me with a succinct definition of what a Solr core is? Is there a one-to-one relationship of cores to Solr indices or can you have multiple indices per core? Cheers, Daniel