RE: large scale indexing issues / single threaded bottleneck

2011-11-01 Thread Roman Alekseenkov
We have a rate of 2K small docs/sec which translates into 90 GB/day of index space You should be fine Roman Awasthi, Shishir wrote: > > Roman, > How frequently do you update your index? I have a need to do real time > add/delete to SOLR documents at a rate of approximately 20/min. > The total n
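
For scale, the two rates quoted in this exchange can be compared with a little arithmetic. This is only a rough sketch: it assumes "90 GB" means decimal gigabytes and that both rates are sustained around the clock, neither of which the messages state.

```python
# Back-of-the-envelope comparison of the two indexing rates in the thread.
SECONDS_PER_DAY = 24 * 60 * 60

docs_per_sec = 2_000                 # Roman's figure: 2K small docs/sec
index_bytes_per_day = 90 * 10**9     # assumption: 90 GB means 90 * 10^9 bytes

docs_per_day = docs_per_sec * SECONDS_PER_DAY
bytes_per_doc = index_bytes_per_day / docs_per_day

updates_per_min = 20                 # Shishir's figure: about 20 add/delete per minute
updates_per_day = updates_per_min * 60 * 24

print(docs_per_day)                      # documents indexed per day at 2K/sec
print(round(bytes_per_doc))              # average index bytes per document
print(docs_per_day // updates_per_day)   # how many times larger the first rate is
```

Under those assumptions the first workload is several thousand times heavier than 20 updates per minute, which is consistent with the "you should be fine" answer.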

Re: change solr url

2011-11-01 Thread Ankita Patil
I am not very clear. Could you explain a bit in detail or give an example. Ankita. On 2 November 2011 06:26, Chris Hostetter wrote: > > : Is it possible to change the url for solr admin?? > : What i want is : > : http://192.168.0.89:8983/solr/private/coreName/admin > : > : i want to add /privat

RE: large scale indexing issues / single threaded bottleneck

2011-11-01 Thread Awasthi, Shishir
Roman, How frequently do you update your index? I have a need to do real time add/delete to SOLR documents at a rate of approximately 20/min. The total number of documents are in the range of 4 million. Will there be any performance issues? Thanks, Shishir -Original Message- From: Roman A

Solr real-time update taking time

2011-11-01 Thread vijay.sampath
Hi All, I recently started working on SOLR 3.3 and would need your expertise to provide a solution. I'm working on a POC, in which I've imported 3.5 million document records using DIH. We have a source system which publishes change data capture in a XML format. The requirement is to integrate S

Re: change solr url

2011-11-01 Thread Chris Hostetter
: Is it possible to change the url for solr admin?? : What i want is : : http://192.168.0.89:8983/solr/private/coreName/admin : : i want to add /private/ before the coreName. Is that possible? If yes how? You can either do this via settings in your servlet container (to specify that the mapping

Re: Can Solr handle large text files?

2011-11-01 Thread Peter Spam
Oh by the way - what analyzer are you using for your log files? Here's what I'm trying: Thanks! Pete On Oct 31, 2011, at 9:28 PM, anand.ni...@rbs.com wrote: > Hi, > > Basically I need to index very large log files. I have modified the > Ex

Re: Can Solr handle large text files?

2011-11-01 Thread Peter Spam
Wow, 50 lines is tiny! Is that how small you need to go, to get good highlighting performance? I'm looking at documents that can be up to 800MB in size, so I've decided to split them down into 256k chunks. I'm still indexing right now - I'm curious to see how performance is when the injection

Re: Lucene queries to Solr requestHandler

2011-11-01 Thread Chris Hostetter
Grrr cut/paste mistake. This... : public class FieldQParserPlugin extends QParserPlugin { ...should have been something like... public class MyQParserPlugin extends QParserPlugin { ...to match the configuration example... -Hoss

Re: Lucene queries to Solr requestHandler

2011-11-01 Thread Chris Hostetter
: I have these queries in Lucene 2.9.4, is there a way to convert these : exactly to Solr 3.4 but using only the solrconfig.xml? I will figure out the : queries but I wanted to know if it is even possible to go from here to : having something like this: : : : ... queries : : : So the fr

Re: Newbie question

2011-11-01 Thread Chris Hostetter
: If using CommonsHttpSolrServer query() method with parameter wt=json, when : retrieving QueryResponse, how to do to get JSON result output stream ? when you are using the CommonsHttpSolrServer level of API, the client takes care of parsing the response (which is typically in an efficient bina

Re: multiple dateranges/timeslots per doc: modeling openinghours.

2011-11-01 Thread Chris Hostetter
: This would need 2*3*100 = 600 dynamicfields to cover the openinghours. You : mention this is peanuts for constructing a booleanquery, but how about : memory consumption? : I'm particularly concerned about the Lucene FieldCache getting populated for : each of the 600 fields. (Since I had some nas

Re: Uncomplete date expressions

2011-11-01 Thread Chris Hostetter
: But Solr is (intentionally) stupid about dates, and : requires the (almost) full date format. There are I'm not sure how i feel about "intentionally stupid" ... but the underlying sentiment is correct: Solr requires clients to be *VERY* explicit about dates, because that way the client is in

Re: How to use an External Database for Fields?

2011-11-01 Thread Chris Hostetter
: I don't think I'm quite getting this. Instead of going down that low, : could you make your own ResponseWriter? That has access to all : the information in the doc, and it seems like you could reach out to : the DB at that point and get your information merrily adding it to the : docs. Agreed.

Re: Replicating Large Indexes

2011-11-01 Thread Chris Hostetter
: We optimize less frequently than we used to. Down to twice a month from once a day. : : Without optimizing the search speed stays the same, however the index size increases to 70+ GB. : : Perhaps there is a different way to restrict disk usage. Consider using the "maxSegments" option on o

Re: edismax/boost: certain documents should be last

2011-11-01 Thread Chris Hostetter
: For the record, I figured out something that will work, although it is : somewhat inelegant. My q parameter is now: : : (+content:notes -genre:Citation)^20 (+content:notes genre:Citation)^0.01 : : Can I improve on that? not really (although you can probably get cleaner separation of query and

Re: Questions about Solr's security

2011-11-01 Thread Alireza Salimi
Yeah, actually our firewalls/loadbalancers can handle these issues. If they don't, then I'll use HAProxy. Thanks for all info :-) On Tue, Nov 1, 2011 at 5:42 PM, Robert Stewart wrote: > I think you can address a lot of these concerns by running some proxy in > front of SOLR, such as HAProxy. Yo

Re: Questions about Solr's security

2011-11-01 Thread Robert Stewart
I think you can address a lot of these concerns by running some proxy in front of SOLR, such as HAProxy. You should be able to limit only certain URIs (so you can prevent /select queries). HAProxy is a free software load-balancer, and it is very configurable and fairly easy to set up. On No

Re: Usage of Double quotes for single terms (camelcase) while querying

2011-11-01 Thread Chris Hostetter
: Subject: Usage of Double quotes for single terms (camelcase) while querying : References: : : <6640582f-568a-4402-8ce7-bb6d8c9fc...@mac.com> : : <51d31d0a-e7c4-4ce8-90f3-67ec5939c...@mac.com> : In-Reply-To: <51d31d0a-e7c4-4ce8-90f3-67ec5939c...@mac.com> https://people.apache.org/~hossma

Re: Questions about Solr's security

2011-11-01 Thread Alireza Salimi
sorry, I didn't explain that part. We are the developers of client codes too. Meaning that just we know the credentials to access the web container, and we won't run such queries. Right now, I'm writing a subclass of SearchHandler which changes the SolrParams to remove 'qt' parameter and limit the

Re: Questions about Solr's security

2011-11-01 Thread Erik Hatcher
SSL and auth doesn't address that /select can hit any request handler defined (/select?qt=/update&stream.body=*:*&commit=true). Be careful! But certainly knowing all the issues mentioned on this thread, it is possible to lock Solr down and make it safe to hit directly. But not out of the box
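
One way to picture the kind of front-end filtering discussed in this thread is an allowlist applied to the query string before a request is forwarded to /select. The sketch below is an assumption-laden illustration, not Solr code: `ALLOWED_PARAMS` and `sanitize_query_string` are hypothetical names, and the set of permitted parameters would have to be chosen per application.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical allowlist: only these parameters are forwarded to Solr.
ALLOWED_PARAMS = {"q", "start", "rows", "fl", "sort", "fq"}


def sanitize_query_string(query_string: str) -> str:
    """Drop every parameter that is not explicitly allowed.

    Anything outside the allowlist (for example qt, stream.body or commit)
    is silently removed before the request reaches Solr.
    """
    pairs = parse_qsl(query_string, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key in ALLOWED_PARAMS]
    return urlencode(kept)


print(sanitize_query_string("qt=/update&stream.body=x&commit=true&q=solr"))
```

An allowlist is used here rather than a denylist because a list of known-bad parameters is easy to leave incomplete.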

Re: LocalParams, bq, and highlighting

2011-11-01 Thread Demian Katz
> This is definitely an interesting case that i don't think anyone ever > really considered before. It seems like a strong argument in favor of > adding an "hl.q" param that the HighlightingComponent would use as an > override for whatever the QueryComponent thinks the highlighting query > should

Re: Questions about Solr's security

2011-11-01 Thread Alireza Salimi
I'm not sure if anybody has asked these questions before or not. Sorry if they are duplicates. The problem is that the clients (smart phones) of our Solr machines are outside the network in which solr machines are located. So, we need to somehow expose their service to the outside world. What's th

Re: Questions about Solr's security

2011-11-01 Thread Walter Underwood
I once had to deal with a severe performance problem caused by a bot that was requesting results starting at 5000. We disallowed requests over a certain number of pages in the front end to fix it. wunder On Nov 1, 2011, at 12:57 PM, Erik Hatcher wrote: > Be aware that even /select could have s
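
The front-end check described here might look roughly like the following. It is a sketch only: the limits `MAX_START` and `MAX_ROWS` and the function `check_paging` are hypothetical, and the message does not say what threshold was actually used.

```python
MAX_START = 1000   # hypothetical limit on how deep a client may page
MAX_ROWS = 100     # hypothetical limit on page size


def check_paging(start: int, rows: int) -> tuple[int, int]:
    """Return (start, rows) unchanged if acceptable, otherwise raise ValueError.

    Intended to run in the front end, before any request is sent to Solr.
    """
    if start < 0 or rows < 0:
        raise ValueError("start and rows must be non-negative")
    if start > MAX_START:
        raise ValueError(f"start={start} exceeds the allowed maximum of {MAX_START}")
    if rows > MAX_ROWS:
        raise ValueError(f"rows={rows} exceeds the allowed maximum of {MAX_ROWS}")
    return start, rows
```

Rejecting the request outright, rather than silently clamping it, matches the "disallowed" wording above and makes misbehaving clients easier to spot in logs.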

Re: Questions about Solr's security

2011-11-01 Thread Erik Hatcher
Be aware that even /select could have some harmful effects, see https://issues.apache.org/jira/browse/SOLR-2854 (addressed on trunk). Even disregarding that issue, /select is a potential gateway to any request handler defined via /select?qt=/req_handler Again, in general it's not a good idea to

Re: Questions about Solr's security

2011-11-01 Thread Alireza Salimi
What if we just expose '/select' paths - by firewalls and load balancers - and also use SSL and HTTP basic or digest access control? On Tue, Nov 1, 2011 at 2:20 PM, Chris Hostetter wrote: > > : I was wondering if it's a good idea to expose Solr to the outside world, > : so that our clients runni

Re: Selective Result Grouping

2011-11-01 Thread entdeveloper
Martijn v Groningen-2 wrote: > > When using the group.field option values must be the same otherwise > they don't get grouped together. Maybe fuzzy grouping would be nice. > Grouping videos and images based on mimetype should be easy, right? > Videos have a mimetype that start with video/ and ima

Re: Limit by score? sort by other field

2011-11-01 Thread Chris Hostetter
: Sounds like a custom sorting collector would work - one that throws away : docs with less than some minimum score, so that it only collects/sorts did you look at the example query Karsten mentioned (and also discussed in the linked thread) there is no need for a custom collector to do this, y

Re: Questions about Solr's security

2011-11-01 Thread Chris Hostetter
: I was wondering if it's a good idea to expose Solr to the outside world, : so that our clients running on smart phones will be able to use Solr. As a general rule of thumb, i would say that it is not a good idea to expose solr directly to the public internet. there are exceptions to this rule

Re: Questions about Solr's security

2011-11-01 Thread Alireza Salimi
Thanks Robert, But do you also think limiting the page size inside a request handler is a good solution for attackers? Honestly, I'm not sure if it's a good solution, that doesn't save a server from attackers at all. Do you agree with me? We are not security experts, just developers, but any sugg

Re: Find Documents with field = maxValue

2011-11-01 Thread Chris Hostetter
: What I'm looking for is to do everything in single shot in Solr. : I'm not even sure if it's possible or not. : Finding the max value and then running another query is NOT my ideal : solution. stats component to determine the max value, and a second query to search for docs containing that val

index enum

2011-11-01 Thread Radha Krishna Reddy
Hi, I have 2 issues. 1. I have an enum column in my sql table. I want to index that column. Which fieldtype should I specify in the schema.xml for enum? 2. Normally we can index one column in a table using the column header as entity name and the column data as value of the entity. Can I index 2 c

Re: Can't find resource 'solrconfig.xml'

2011-11-01 Thread Chris Hostetter
rather than mucking with system properties, i find that using JNDI is the easiest and cleanest way to configure solr home with tomcat. https://wiki.apache.org/solr/SolrTomcat#Configuring_Solr_Home_with_JNDI ...those instructions are fairly simple, and will work on both windows and linux (just

Multivalued fields question

2011-11-01 Thread Travis Low
Greetings. We're finally kicking off our little Solr project. We're indexing a paltry 25,000 records but each has MANY documents attached, so we're using Tika to parse those documents into a big long string, which we use in a call to solrj.addField("relateddoccontents", bigLongStringOfDocumentCon

Re: simple persistance layer on top of Solr

2011-11-01 Thread Mikhail Garber
This is a very good idea and I used it several times over the years with great success. As long as you understand limitations (global transactions, not being able to "update" records, ...) On Tue, Nov 1, 2011 at 8:47 AM, Memory Makers wrote: > Greetings guys, > > I have been thinking of using Solr

RE: Replicating Large Indexes

2011-11-01 Thread Jason Biggin
Thanks Erick, Will take a look at this article. Cheers, Jason -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, November 01, 2011 8:05 AM To: solr-user@lucene.apache.org Subject: Re: Replicating Large Indexes Yes, that's expected behavior. When you

Re: Questions about Solr's security

2011-11-01 Thread Robert Stewart
You would need to setup request handlers in solrconfig.xml to limit what types of queries people can send to SOLR (and define things like max page size, etc). You need to restrict people from sending update/delete commands as well. Then at the minimum, setup some proxy in front of SOLR that y

Re: simple persistance layer on top of Solr

2011-11-01 Thread Memory Makers
Well, I've done a lot of work with MySQL and content management systems -- and frankly whenever I have to integrate with Solr or do some Lucene work I am amazed at the speed -- even when I index web pages for search -- MySQL pales by comparison when data sets get large (>2 million rows) Thanks,

Re: Replicating Large Indexes

2011-11-01 Thread Robert Stewart
Do you do a lot of deletes (or 'updates' of existing documents)? Do you store lots of large fields? Maybe you can use compressed fields in that case (we never have tried it so I cannot confirm how well it works or performs). You can also turn off things like norms and vectors, etc. if you aren't

Re: Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread Erick Erickson
Start here: http://lucene.apache.org/solr/api/org/apache/solr/analysis/NGramFilterFactory.html But the idea is that you define a field with the NGramFilterFactory and it indexes, (here are bigrams) mysolrstuff as separate tokens: my ys so ol lr rs st tu uf ff. This supports the %solr% idea if you
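
The bigram idea can be illustrated outside Solr with a few lines of Python. This is a toy sketch, not the NGramFilterFactory implementation: `char_ngrams` and `contains_like` are hypothetical helpers, and the containment check ignores token positions, so it is only an approximation of a LIKE '%...%' match.

```python
def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Return every contiguous n-character window of text, left to right."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def contains_like(indexed_value: str, fragment: str, n: int = 2) -> bool:
    """Approximate LIKE '%fragment%' using n-gram tokens.

    True when every n-gram of the fragment is among the n-grams of the
    indexed value. Positions are not checked, so false positives are possible.
    """
    tokens = set(char_ngrams(indexed_value, n))
    return all(gram in tokens for gram in char_ngrams(fragment, n))


print(" ".join(char_ngrams("mysolrstuff")))
print(contains_like("mysolrstuff", "solr"))
```

The first call prints the same ten bigrams listed in the message, and the fragment "solr" matches because its bigrams (so, ol, lr) are all among them.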

Questions about Solr's security

2011-11-01 Thread Alireza Salimi
Hi, I was wondering if it's a good idea to expose Solr to the outside world, so that our clients running on smart phones will be able to use Solr. If we decide to do this, what's the security concerns about it? For example, someone suggested we should limit the number of rows requested in order

Re: Replicating Large Indexes

2011-11-01 Thread Jason Biggin
Thanks Robert. We optimize less frequently than we used to. Down to twice a month from once a day. Without optimizing the search speed stays the same, however the index size increases to 70+ GB. Perhaps there is a different way to restrict disk usage. Thanks, Jason Robert Stewart wrote:

Re: simple persistance layer on top of Solr

2011-11-01 Thread Robert Stewart
One other potentially huge consideration is how "updatable" you need documents to be. Lucene only can replace existing documents, it cannot modify existing documents directly (so an update is essentially a delete followed by an insert of a new document with the same primary key). There are per

Re: simple persistance layer on top of Solr

2011-11-01 Thread Robert Stewart
It is not a horrible idea. Lucene has a pretty reliable index now (it should not get corrupted). And you can do backups with replication. If you need ranked results (sort by relevance), and lots of free-text queries then using it makes sense. If you just need boolean search and maybe some sor

Re: simple persistance layer on top of Solr

2011-11-01 Thread Memory Makers
Well I want something beyond a key value store. I want to be able to free-text search documents I want to be able to retrieve documents based on other criteria I'm not sure how that would compare with something like MongoDB. Thanks. On Tue, Nov 1, 2011 at 11:49 AM, Walter Underwood wrote:

Re: simple persistance layer on top of Solr

2011-11-01 Thread Walter Underwood
Other than "it isn't a database"? If you want a key/value store, use one of those. If you want a full DB with transactions, use one of those. wunder On Nov 1, 2011, at 8:47 AM, Memory Makers wrote: > Greetings guys, > > I have been thinking of using Solr as a simple database due to its > bli

Re: Solr Profiling

2011-11-01 Thread Andre Parodi
I guess it could be many things. Typically an easy one to spot is if you have insufficient heap (i.e. your 16Gb) and the jvm is full gc'ing constantly and not freeing up any memory and using lots of cpu. This would make solr slow and "hangs up" as well during potentially long gc pauses. add:

simple persistance layer on top of Solr

2011-11-01 Thread Memory Makers
Greetings guys, I have been thinking of using Solr as a simple database due to its blinding speed -- actually I've used that approach in some projects with decent success. Any thoughts on that? Thanks, MM.

Re: Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread Memory Makers
Eric, NGrams could you elaborate on that ? -- haven't seen that before. Thanks. On Tue, Nov 1, 2011 at 11:06 AM, Erick Erickson wrote: > NGrams are often used in Solr for this case, but they will also add to > your index size. > > It might be worthwhile to look closely at your user requirements

Re: Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread Michael Kuhlmann
Am 01.11.2011 16:06, schrieb Erick Erickson: NGrams are often used in Solr for this case, but they will also add to your index size. It might be worthwhile to look closely at your user requirements before going ahead and supporting this functionality Best Erick My opinion. Wildcards are g

Re: Replicating Large Indexes

2011-11-01 Thread Robert Stewart
Optimization merges index to a single segment (one huge file), so entire index will be copied on replication. So you really do need 2x disk in some cases then. Do you really need to optimize? We have a pretty big total index (about 200 million docs) and we never optimize. But we do have a sh

Re: Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread Erick Erickson
NGrams are often used in Solr for this case, but they will also add to your index size. It might be worthwhile to look closely at your user requirements before going ahead and supporting this functionality Best Erick 2011/11/1 François Schiettecatte : > Kuli > > Good point about just tokeniz

Re: Replicating Large Indexes

2011-11-01 Thread Erick Erickson
Yes, that's expected behavior. When you optimize, all segments are copied over to new segment(s). Since all changed/new segments are replicated to the slave, you'll (temporarily) have twice the data on your disk. You can stop optimizing, it's often not really very useful despite its name. That s

Using Solr components for dictionary matching?

2011-11-01 Thread Nagendra Mishr
Hi all, Is there a good guide on using Solr components as a dictionary matcher? I need to do some pre-processing that involves lots of dictionary lookups and it doesn't seem right to query solr for each instance. Thanks in advance, Nagendra

Re: Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread François Schiettecatte
Kuli Good point about just tokenizing the fields :) I ran a couple of tests to double-check my understanding and you can have a wildcard operator at either or both ends of a term. Adding ReversedWildcardFilterFactory to your field analyzer will make leading wildcard searches a lot faster of co

Re: Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread Michael Kuhlmann
Hi, this is not exactly true. In Solr, you can't have the wildcard operator on both sides of the term. However, you can tokenize your fields and simply query for "Solr". This is what Solr is made for. :) -Kuli Am 01.11.2011 13:24, schrieb François Schiettecatte: Arshad Actually it is

Re: Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread François Schiettecatte
Arshad Actually it is available, you need to use the ReversedWildcardFilterFactory which I am sure you can Google for. Solr and SQL address different problem sets with some overlaps but there are significant differences between the two technologies. Actually '%Solr%' is a worst case for SQL bu

Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread arshad ansari
Hi, Is SQL Like operator feature available in Apache Solr Just like we have it in SQL. SQL example below - *Select * from Employee where employee_name like '%Solr%'* If not is it a Bug with Solr. If this feature available, please tell the examples available. Thanks! -- Best Regards, Arshad

Re: MultiValued fields and Facets...

2011-11-01 Thread Tiernan OToole
I have figured out what was wrong... The field Warehouse was not marked as indexed... It was being stored, but not indexed... It is now working as expected. Thanks. --Tiernan On Wed, Oct 26, 2011 at 1:01 PM, Tiernan OToole wrote: > Ok, so now i am getting something back, but still getting "o