AW: FieldCache

2010-10-25 Thread Mathias Walter
I don't think it is an XY problem. I indexed about 90 million sentences and the PAS (predicate argument structures) they consist of (which are about 500 million). Then I try to do NER (named entity recognition) by searching about 5 million entities. For each entity I need all the search results,

Re: How to index on basis of a condition?

2010-10-25 Thread Jan Høydahl / Cominvent
Do you want to use a field's content to decide whether the document should be indexed or not? You could write an UpdateProcessor for that, simply aborting the chain for the docs that don't pass your test. @Override public void processAdd(AddUpdateCommand cmd) throws IOException { SolrInputDo

Re: a bug of solr distributed search

2010-10-25 Thread Toke Eskildsen
On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote: > But it shows a problem of distributed search without common idf. > A doc will get different score in different shard. Bingo. I really don't understand why this fundamental problem with sharding isn't mentioned more often. Every time the advice "use

Seattle Scalability Meetup: Rackspace OpenStack, Karmasphere Hadoop, Wed Oct 27

2010-10-25 Thread Bradford Stephens
Link/Details: http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup/calendar/13704371/ This meetup focuses on Scalability and technologies to enable handling large amounts of data: Hadoop, HBase, distributed NoSQL databases, and more! There's not only a focus on technology, but also everything s

Re: Import From MYSQL database

2010-10-25 Thread virtas
Why don't you paste log excerpt here which is generated when you are trying to import the data. -- View this message in context: http://lucene.472066.n3.nabble.com/Import-From-MYSQL-database-tp1738753p1766375.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: a bug of solr distributed search

2010-10-25 Thread Andrzej Bialecki
On 2010-10-25 11:22, Toke Eskildsen wrote: > On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote: >> But it shows a problem of distributed search without common idf. >> A doc will get different score in different shard. > > Bingo. > > I really don't understand why this fundamental problem with sharding

Re: Solr Javascript+JSON not optimized for SEO

2010-10-25 Thread Nick Jenkin
The solution is to offer both, and provide fallback for browsers that don't support javascript (e.g. Googlebot) I would also ponder the question "how does this ajax feature help my users?". If you can't find a good answer to that, you should probably just not use ajax. (NB: "it's faster" is not a v

Re: Solr Javascript+JSON not optimized for SEO

2010-10-25 Thread PeterKerk
Offering both...that sounds to me like duplicating development efforts? Or am I overlooking something here? Nick Jenkin-2 wrote: > > NB: "it's faster" is not a valid answer! > Why is it not valid? Because it's not necessarily faster or...? And what about user experience? Instead of needing to r

Re: a bug of solr distributed search

2010-10-25 Thread Toke Eskildsen
On Mon, 2010-10-25 at 11:50 +0200, Andrzej Bialecki wrote: > * there is an exact solution to this problem, namely to make two > distributed calls instead of one (first call to collect per-shard IDFs > for given query terms, second call to submit a query rewritten with the > global IDF-s). This solu
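The two-call scheme described here can be illustrated with a small Python sketch. It is not Solr code: the shard names and document counts are made up, and the idf formula (1 + ln(numDocs / (docFreq + 1))) is a classic Lucene-style formula assumed for illustration. The first function mimics what each shard computes on its own; the second mimics a coordinator that first collects per-shard statistics and then derives one shared value.

```python
import math


def idf(doc_freq, num_docs):
    # Lucene-style idf, assumed here for illustration only.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for a single query term on two shards.
SHARDS = {
    "shard_a": {"num_docs": 1000, "doc_freq": 10},   # term is rare here
    "shard_b": {"num_docs": 1000, "doc_freq": 500},  # term is common here
}


def local_idfs(shards):
    # Single-call behaviour: every shard scores with its own statistics.
    return {name: idf(s["doc_freq"], s["num_docs"]) for name, s in shards.items()}


def global_idf(shards):
    # Call 1: collect docFreq and numDocs from every shard and sum them.
    total_docs = sum(s["num_docs"] for s in shards.values())
    total_df = sum(s["doc_freq"] for s in shards.values())
    # Call 2 would send this shared value back with the rewritten query.
    return idf(total_df, total_docs)


local = local_idfs(SHARDS)
shared = global_idf(SHARDS)
```

With local statistics the same term weighs far more on `shard_a` than on `shard_b`, so otherwise identical documents rank differently depending on where they live. With the shared value every shard applies the same weight, at the cost of the extra round trip.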

solr 1.4 suggester component

2010-10-25 Thread abhayd
Hi, I was looking into using the Solr suggester component as described in http://wiki.apache.org/solr/Suggester I have a file which has words and phrases in it. I was wondering how to make the following possible. The file has - rebate form form When I look for "form" or even "for" I would like to

Re: AW: FieldCache

2010-10-25 Thread Toke Eskildsen
On Mon, 2010-10-25 at 09:41 +0200, Mathias Walter wrote: > [...] I enabled the field cache for my ID field and another > single char field (PAS type) to get the benefit of accessing > the fields with an array. Unfortunately, the IDs are too > large to fit in memory. I gave 12 GB of RAM to each node

Re: Modelling Access Control

2010-10-25 Thread Paul Carey
Many thanks for all the responses. I now plan on benchmarking and validating both the filter query approach, and maintaining the ACL entirely outside of Solr. I'll decide from there. Paul

Re: FieldCache

2010-10-25 Thread Robert Muir
On Mon, Oct 25, 2010 at 3:41 AM, Mathias Walter wrote: > I indexed about 90 million sentences and the PAS (predicate argument > structures) they consist of (which are about 500 million). Then > I try to do NER (named entity recognition) by searching about 5 million > entities. For each entity I

Re: Integrating Carrot2/Solr Default Example

2010-10-25 Thread Grant Ingersoll
On Oct 24, 2010, at 1:45 PM, Eric Martin wrote: > Hello, > > > > Welcome to all. I am a very basic user. I have limited knowledge. I read the > documentation, I have an 'example' Solr installation working on my server. I > have Drupal 6. I have Drupal using Solr (apachesolr) as its default sea

RE: FieldCache

2010-10-25 Thread Steven A Rowe
Hi Mathias, > [...] I tried to use IndexableBinaryStringTools to re-encode my 11 byte > array. The size was increased to 7 characters (= 14 bytes) > which is still a gain of more than 50 percent compared to the UTF8 > encoding. BTW: I found no sample how to use the > IndexableBinaryStringTools cla
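The figure of 7 characters for an 11-byte array is consistent with an encoding that packs 15 payload bits into each 16-bit char and appends one extra char. That packing is an assumption made for this sketch, not something the thread states; the Python below only reproduces the arithmetic.

```python
def encoded_chars(num_bytes, bits_per_char=15, extra_chars=1):
    # Assumed layout: 15 payload bits per 16-bit char, plus one extra char.
    payload_bits = 8 * num_bytes
    full_chars = -(-payload_bits // bits_per_char)  # ceiling division
    return full_chars + extra_chars


def encoded_bytes(num_bytes):
    # Each char occupies two bytes when held as a 16-bit char in memory.
    return 2 * encoded_chars(num_bytes)


chars_for_id = encoded_chars(11)
bytes_for_id = encoded_bytes(11)
```

Under these assumptions an 11-byte ID needs 88 payload bits, which rounds up to 6 chars, and the extra char brings it to 7 chars, or 14 bytes.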

RE: FieldCache

2010-10-25 Thread Steven A Rowe
Hi Robert, On 10/25/2010 at 8:20 AM, Robert Muir wrote: > it is deprecated in trunk, because you can index binary terms (your > own byte[]) directly if you want. To do this, you need to use a custom > AttributeFactory. It's not actually deprecated yet. > See src/test/org/apache/lucene/index/Test

Re: Modelling Access Control

2010-10-25 Thread Israel Ekpo
On Mon, Oct 25, 2010 at 8:16 AM, Paul Carey wrote: > Many thanks for all the responses. I now plan on benchmarking and > validating both the filter query approach, and maintaining the ACL > entirely outside of Solr. I'll decide from there. > > Paul > Great. I am looking forward to some feedba

Re: EmbeddedSolrServer with one core and schema.xml loaded via ClassLoader, is it possible?

2010-10-25 Thread Paolo Castagna
I've found two ways which allow me to load all the config files from a jar file, however with the first solution I cannot specify the dataDir. This is the first way: System.setProperty("solr.solr.home", solrHome); CoreContainer.Initializer initializer = new CoreContainer.Initializer(); CoreCo

RE: How to index on basis of a condition?

2010-10-25 Thread Ephraim Ofir
Assuming you're talking about data that comes from a DB, I find it easiest to do this kind of logic on the DB's side (mssql example): SELECT IF(someField = someValue, desiredValue, NULL) AS desiredName from someTable If that's not possible, you can use RegexTransformer(http://wiki.apache.org/so

Re: FieldCache

2010-10-25 Thread Robert Muir
On Mon, Oct 25, 2010 at 9:00 AM, Steven A Rowe wrote: > It's not actually deprecated yet. you are right! only in my patch! > AFAICT, Test2BTerms only deals with the indexing side of this issue, and > doesn't test searching. > > LUCENE-2551 does, however, test searching.  Why hasn't this been co

ApacheCon Atlanta next week

2010-10-25 Thread Grant Ingersoll
Hi All, Just a couple of notes about ApacheCon next week for those who either are attending or are thinking of attending. 1. There will be Lucene and Solr 2 day trainings done by Erik Hatcher (Solr) and me (Lucene). It's not too late to sign up. See http://na.apachecon.com/c/acna2010/schedul

Re: solr 1.4 suggester component

2010-10-25 Thread Erick Erickson
Try here: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ For the infix-type match you're using, you might not want the "edge" version of ngram.
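The difference between the "edge" and plain n-gram variants can be sketched in a few lines of Python. This is a toy illustration of the idea, not the Solr filter implementations; the function names and the length parameters are made up for the example.

```python
def edge_ngrams(token, min_len=1, max_len=None):
    # Only substrings anchored at the start of the token (prefixes).
    max_len = len(token) if max_len is None else min(max_len, len(token))
    return [token[:n] for n in range(min_len, max_len + 1)]


def ngrams(token, min_len=1, max_len=None):
    # Substrings starting at every position, so infixes are covered too.
    max_len = len(token) if max_len is None else min(max_len, len(token))
    return [
        token[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(token) - n + 1)
    ]


prefixes = edge_ngrams("form")
all_grams = ngrams("form")
```

Here `prefixes` contains "for" but not "orm", while `all_grams` contains both. Plain n-grams therefore allow matches that start in the middle of a token, at the price of a noticeably larger index.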

How to use AND as opposed to OR as the default query operator.

2010-10-25 Thread Swapnonil Mukherjee
Hi Everybody, I simply want to use AND as the default operator in queries. When a user searches for Jennifer Lopez solr converts this to a Jennifer OR Lopez query. On the other hand I want solr to treat this query as Jennifer AND Lopez and not as Jennifer OR Lopez. In other words I want a defa

Re: How to use AND as opposed to OR as the default query operator.

2010-10-25 Thread Markus Jelsma
http://wiki.apache.org/solr/SchemaXml#Default_query_parser_operator On Monday 25 October 2010 15:41:50 Swapnonil Mukherjee wrote: > Hi Everybody, > > I simply want to use AND as the default operator in queries. When a user > searches for Jennifer Lopez solr converts this to a Jennifer OR Lopez >

DataImporter using pure solr XML

2010-10-25 Thread Dario Rigolin
Looking at DataImporter I'm not sure if it's possible to import using a standard ... xml document representing a document add operation. Generating is quite expensive in my application and I have cached all those documents into a text column into MySQL database. It will be easier for me to "push

Re: How to use AND as opposed to OR as the default query operator.

2010-10-25 Thread Pradeep Singh
Which query handler are you using? For a standard query handler you can set q.op per request or set defaultOperator in schema.xml. For a dismax handler you will have to work with min should match. On Mon, Oct 25, 2010 at 6:41 AM, Swapnonil Mukherjee < swapnonil.mukher...@gettyimages.com> wrote:
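For the per-request option, a minimal Python sketch of building such a query URL follows. The `q.op` parameter is the one named above; the base URL and the helper name are assumptions for illustration.

```python
from urllib.parse import urlencode


def select_url(base_url, query, default_operator="AND"):
    # q.op overrides the default operator for this one request only.
    params = {"q": query, "q.op": default_operator}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical Solr location; adjust host, port and core to your setup.
url = select_url("http://localhost:8983/solr", "Jennifer Lopez")
```

Setting the operator per request leaves schema.xml untouched, which is handy when only some clients want AND semantics.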

Re: How to use AND as opposed to OR as the default query operator.

2010-10-25 Thread Swapnonil Mukherjee
Hi Pradeep, I am using the standard query parser. I made the changes in schema.xml and it works. It is also good to know that this can be done on a per query basis as well. Swapnonil Mukherjee On 25-Oct-2010, at 7:48 PM, Pradeep Singh wrote: > Which query handler are you using? For a standard

Re: solr 1.4 suggester component

2010-10-25 Thread abhayd
Hi Erick, Thanks for the link. The problem is that we don't want to have another Solr core for implementing this, so I was trying the suggester component, as it allows file-based auto-suggest. It works fine; the only issue is how to get the prefix ignored. Any idea? -- View this message in context: http://lucene.47206

London open-source search social - 28th Oct - NEW VENUE

2010-10-25 Thread Richard Marr
Just a reminder that we're meeting this Thursday near St James Park/Westminster. Details on the Meetup page: http://www.meetup.com/london-search-social/ Rich -- Richard Marr

Re: OutOfMemory and auto-commit

2010-10-25 Thread Jonathan Rochkind
Yes, that's my question too. Anyone? Dennis Gearon wrote: How is this avoided? Dennis Gearon --- On Thu, 10/21/10, Lance Norskog wrote: From: Lance Norskog Subject: Re: OutOfMemory and auto-commit To: solr-user@lucene.apache.org Date: Thursday, October 21, 2010, 9:53 PM Yes. Indexin

Re: Modelling Access Control

2010-10-25 Thread Jonathan Rochkind
Dennis Gearon wrote: why use filter queries? Wouldn't reducing the set headed into the filters by putting it in the main query be faster? (A question to learn, since I do NOT know :-) No. At least as I understand it. In the best case, the filter query will be a lot faster, because filter q
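The two ways of expressing an access restriction can be compared side by side in a short Python sketch. The field name `acl_group`, its values and the base URL are hypothetical; `q` and `fq` are the standard Solr request parameters. Solr can cache the document set of an `fq` clause separately from the main query, which is why a restriction that repeats across many requests is usually a good candidate for `fq`.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/select"  # hypothetical Solr location


def restricted_in_main_query(user_query, acl_clause):
    # The restriction becomes part of q, so it is evaluated with every query.
    return BASE + "?" + urlencode([("q", f"({user_query}) AND {acl_clause}")])


def restricted_with_filter(user_query, acl_clause):
    # The restriction travels as fq and stays separate from the scored query.
    return BASE + "?" + urlencode([("q", user_query), ("fq", acl_clause)])


acl = "acl_group:(staff OR admin)"  # hypothetical field and values
main_url = restricted_in_main_query("annual report", acl)
filter_url = restricted_with_filter("annual report", acl)
```

A filter clause also does not contribute to the relevance score, so the ACL terms cannot distort the ranking of the user's actual query.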

Re: How to use AND as opposed to OR as the default query operator.

2010-10-25 Thread Jonathan Rochkind
However, for user entered queries, I suggest you take a look at dismax, a lot more suitable for user-entered queries than the standard solr-lucene query parsers. Markus Jelsma wrote: http://wiki.apache.org/solr/SchemaXml#Default_query_parser_operator On Monday 25 October 2010 15:41:50 Swapnon

Re: Modelling Access Control

2010-10-25 Thread Dennis Gearon
I'll also be interested in how that works for you. Bringing out the whole dataset not filtered for some kind of access control will mean that you will then have to do the filtering of the result set in your server side/command line program. So the speed comparison with the filter query vs the outs

Re: Modelling Access Control

2010-10-25 Thread Dennis Gearon
Thanks for that insight, a lot. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=45

Does anyone notice this site?

2010-10-25 Thread scott chu
I happened to come across this site: http://www.solr.biz/ They say they are also developing a search engine? Is there any connection to open source "Solr"?

RE: Does anyone notice this site?

2010-10-25 Thread Eric Martin
This is not legal advice. Take this as it is. Just off my head and what I know. I did not research this, but could, if Solr wants me to. From a marketing stand-point, probably. From a legal standpoint. They can do whatever they want with the name Solr so long as they maintain a distance betwee

Re: Does anyone notice this site?

2010-10-25 Thread Grant Ingersoll
On Oct 25, 2010, at 12:54 PM, scott chu wrote: > I happen to bump into this site: http://www.solr.biz/ > > They said they are also developing a search engine? Is this any connection to > open source "Solr"? No, it is not a connection and they likely should not be using the name that way, as

Re: a bug of solr distributed search

2010-10-25 Thread Andrzej Bialecki
On 2010-10-25 13:37, Toke Eskildsen wrote: > On Mon, 2010-10-25 at 11:50 +0200, Andrzej Bialecki wrote: >> * there is an exact solution to this problem, namely to make two >> distributed calls instead of one (first call to collect per-shard IDFs >> for given query terms, second call to submit a que

DIH with several Cores

2010-10-25 Thread stockiii
Hello. I have 7 cores. Each core has its own index and its own import. I want one DIH with a URL like http://host/solr/dih. Is it possible for the DIH to use different index folders? Or is it necessary that each core uses its own DIH with the solrconfig from each core? -- View this message

Re: Does anyone notice this site?

2010-10-25 Thread Peter Keegan
fwiw, our proxy server has blocked this site for malicious content. Peter On Mon, Oct 25, 2010 at 1:25 PM, Grant Ingersoll wrote: > > On Oct 25, 2010, at 12:54 PM, scott chu wrote: > > > I happen to bump into this site: http://www.solr.biz/ > > > > They said they are also developing a search eng

Re: Solr ExtractingRequestHandler with Compressed files

2010-10-25 Thread Jayendra Patil
There was this issue with the previous version of Solr, wherein only the file names from the zip used to get indexed. We had faced the same issue and ended up using the Solr trunk which has the Tika version upgraded and works fine. The Solr version 1.4.1 should also have the fix included. Try usin

RE: FieldCache

2010-10-25 Thread Mathias Walter
Hi, > On Mon, Oct 25, 2010 at 3:41 AM, Mathias Walter > wrote: > > I indexed about 90 million sentences and the PAS (predicate argument > structures) they consist of (which are about 500 million). Then > > I try to do NER (named entity recognition) by searching about 5 million > entities. For eac

Re: FieldCache

2010-10-25 Thread Robert Muir
On Mon, Oct 25, 2010 at 3:41 PM, Mathias Walter wrote: > How do I use it with Solr, i. e. how to set up a schema.xml using a custom > AttributeFactory? > at the moment there is no way to specify an AttributeFactory (AttributeFactoryFactory? heh) in the schema.xml, nor do the TokenizerFactories

command line to check if Solr is up running

2010-10-25 Thread Xin Li
As we know we can use a browser to check if Solr is running by going to http://$hostName:$portNumber/$masterName/admin, say http://localhost:8080/solr1/admin. My question is: are there any ways to check it using the command line? I used "curl http://localhost:8080" to check my Tomcat, it worked fin

Re: command line to check if Solr is up running

2010-10-25 Thread Rob Casson
you could look at the ping stuff: http://wiki.apache.org/solr/SolrConfigXml#The_Admin.2BAC8-GUI_Section cheers, rob On Mon, Oct 25, 2010 at 3:56 PM, Xin Li wrote: > As we know we can use browser to check if Solr is running by going to > http://$hostName:$portNumber/$masterName/admin, say

Re: command line to check if Solr is up running

2010-10-25 Thread Ahmet Arslan
> My questions is: are > there any ways to check it using command line? I used "curl > http://localhost:8080" to check my Tomcat, it worked > fine. However, no response if I try "curl http://localhost:8080/solr1/admin" > (even when my Solr > is running). Does anyone know any command line > alter

RE: command line to check if Solr is up running

2010-10-25 Thread Xin Li
Thanks Bob and Ahmet, "curl http://localhost:8080/solr1/admin/ping" works fine :) Xin -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Monday, October 25, 2010 4:03 PM To: solr-user@lucene.apache.org Subject: Re: command line to check if Solr is up running > M
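For scripted health checks, the same ping request can be wrapped in a few lines of Python using only the standard library. This is a sketch under assumptions: the host, port and core name mirror the example in the thread, the function names are made up, and an HTTP 200 response is treated as "up".

```python
import urllib.error
import urllib.request


def ping_url(host, port, core):
    # Same path as the curl command that worked in the thread.
    return f"http://{host}:{port}/{core}/admin/ping"


def is_up(url, timeout=5.0, opener=urllib.request.urlopen):
    # Returns True only for an HTTP 200 answer; connection problems and
    # HTTP error statuses are reported as "not up" instead of raising.
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

A monitoring script could call `is_up(ping_url("localhost", 8080, "solr1"))` and exit with a non-zero status when it returns False. The `opener` argument exists only so the check can be exercised without a running server.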

error in Solr log when adding documents?

2010-10-25 Thread Jonathan Rochkind
Anyone seen anything like this before, the error message does not give me very much information, not sure what's going on. Oct 25, 2010 4:11:02 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: ERROR adding document SolrInputDocument [lengthy serialized h

Re: DataImporter using pure solr XML

2010-10-25 Thread Ken Stanley
On Mon, Oct 25, 2010 at 10:12 AM, Dario Rigolin wrote: > Looking at DataImporter I'm not sure if it's possible to import using a > standard ... xml document representing a document add operation. > Generating is quite expensive in my application and I have > cached > all those documents into a te

replication with multicores

2010-10-25 Thread Mike Zupan
On my master for the forum core I have the following in forum/conf/solrconfig.xml startup commit optimize Then on the slave for the forum core I have the following in forum/conf/solrconfig.xml http://host.domain.com:8983/solr/

Re: DIH with several Cores

2010-10-25 Thread markwaddle
Unfortunately, what you are asking for is not possible. The DIH needs to be configured separately for each core. I have a similar situation with my Solr application. I am solving it by creating a custom index feeder that is aware of all of the cores and which documents to send to which cores. --

Re: Failing to successfully import international characters via DIH

2010-10-25 Thread virtas
As it turns out, the issue was somewhere in MySQL. Not sure exactly where, but something to do with BLOB. Now, I changed the text field from BLOB to varchar and started using mysql_real_escape_string in my PHP code, and it all started working just fine. Thanks for the help -- View this message in conte

after the slave node pull index from master, when will solr del the tmp index dir

2010-10-25 Thread Chengyang
I noticed that the slave node has some tmp Index.x dirs that were created during the index sync with the master, but they are not removed even after several days. So when will Solr delete the tmp index dir?

Keeping "qt" parameter in distributed search

2010-10-25 Thread Shawn Heisey
I have a request handler with a qt of "lbcheck" so that load balancer healthchecks, which happen every five seconds, do not skew my query statistics. I've recently modified the way I do my load balancing, which required that I add a shards parameter to my PingRequestHandler. The ping handler

Need help for solr searching case insensitive item

2010-10-25 Thread wu liu
Hi all, I just noticed a weird thing happening with my Solr search results. If I search for "ecommons", I cannot get the results for "eCommons"; if I search for "eCommons", I only get the matches for "eCommons", but not "ecommons". I cannot figure out why. Please help me
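Behaviour like this usually points to a field whose analysis keeps the original case, although the snippet does not show the schema, so that is only a guess. The effect can be reproduced with a toy inverted index in Python; the class and function names are invented for the illustration and have nothing to do with Solr internals.

```python
from collections import defaultdict


def case_sensitive(text):
    # Tokens keep their original case.
    return text.split()


def lowercasing(text):
    # Tokens are lowercased, at index time and at query time alike.
    return [token.lower() for token in text.split()]


class TinyIndex:
    def __init__(self, analyzer):
        self.analyzer = analyzer
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for token in self.analyzer(text):
            self.postings[token].add(doc_id)

    def search(self, query):
        # Documents that contain every analyzed query token.
        terms = self.analyzer(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


def build(analyzer):
    index = TinyIndex(analyzer)
    index.add(1, "eCommons repository")
    index.add(2, "ecommons archive")
    return index


strict = build(case_sensitive)
relaxed = build(lowercasing)
```

The `strict` index returns a different document for "ecommons" than for "eCommons", which mirrors the report above, while the `relaxed` index returns both documents for either spelling. The key point is that the same normalization has to run on both the indexed text and the query.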

Externalizing properties file

2010-10-25 Thread sivaprasad
Hi, I created a custom component in Solr. It uses a properties file. When I place the jar in the solr_home lib directory, the class is on the classpath, but the properties file is not. If I bundle the properties file inside the jar, the file is on the classpath. But I need to externalize t