Solr - Tika(?) memory leak

2012-01-13 Thread Wayne W
Hi, we're using Solr running on Tomcat with 1GB in production, and of late we've been having a huge number of OutOfMemory issues. From what I can tell, this is coming from the Tika extraction of the content. I've processed the Java dump file using a memory analyzer and it's pretty clean at

Improving Solr Spell Checker Results

2012-01-13 Thread David Radunz
Hey, Firstly I would like to thank you all for creating such a great searching platform. What I was wondering is whether it is possible to: 1. Have the spell checker take into account multiple words. For example if I search for "Sigourney Wever" it doesn't flag as a spelling issue as 'we

Re: Restricting access to shards / collections with SolrCloud

2012-01-13 Thread Jaran Nilsen
Excellent. Thank you, Mark! This will be a huge improvement for us when this functionality goes live :) Jaran On Fri, Jan 13, 2012 at 9:13 PM, Mark Miller wrote: > Here it is: https://issues.apache.org/jira/browse/SOLR-2287 > > On Fri, Jan 13, 2012 at 3:12 PM, Mark Miller > wrote: > > > > > >

Re: JSON & XML response writer issues with short & binary fields

2012-01-13 Thread Ken Krugler
On Jan 13, 2012, at 1:39pm, Yonik Seeley wrote: > -Yonik > http://www.lucidimagination.com > > > > On Fri, Jan 13, 2012 at 4:22 PM, Yonik Seeley > wrote: >> On Fri, Jan 13, 2012 at 4:04 PM, Ken Krugler >> wrote: >>> I finally got around to looking at why short field values are returned as >

Re: JSON & XML response writer issues with short & binary fields

2012-01-13 Thread Yonik Seeley
-Yonik http://www.lucidimagination.com On Fri, Jan 13, 2012 at 4:22 PM, Yonik Seeley wrote: > On Fri, Jan 13, 2012 at 4:04 PM, Ken Krugler > wrote: >> I finally got around to looking at why short field values are returned as >> "java.lang.Short:". >> >> Both XMLWriter.writeVal() and TextRespo

Re: JSON & XML response writer issues with short & binary fields

2012-01-13 Thread Yonik Seeley
On Fri, Jan 13, 2012 at 4:04 PM, Ken Krugler wrote: > I finally got around to looking at why short field values are returned as > "java.lang.Short:". > > Both XMLWriter.writeVal() and TextResponseWriter.writeVal() are missing the > check for (val instanceof Short), and thus this bit of code is u

JSON & XML response writer issues with short & binary fields

2012-01-13 Thread Ken Krugler
I finally got around to looking at why short field values are returned as "java.lang.Short:". Both XMLWriter.writeVal() and TextResponseWriter.writeVal() are missing the check for (val instanceof Short), and thus this bit of code is used: // default... for debugging only writeStr(na

Re: linking query in DIH fails with sql syntax error when specific fields contain bad data

2012-01-13 Thread Mikhail Khludnev
Hello, I'm afraid you can only vote for https://issues.apache.org/jira/browse/SOLR-1262 Regards On Fri, Jan 13, 2012 at 11:16 PM, geeky2 wrote: > > hello all, > > > some of my records contain bad data in the orb_itm_id column. > > example: > > select * from prtxtps_prt_summ where orb_itm_id like ''

Re: Restricting access to shards / collections with SolrCloud

2012-01-13 Thread Mark Miller
Here it is: https://issues.apache.org/jira/browse/SOLR-2287 On Fri, Jan 13, 2012 at 3:12 PM, Mark Miller wrote: > > > On Thu, Jan 12, 2012 at 5:13 AM, Jaran Nilsen wrote: > >> >> My questions are: >> >> 1. would it be an idea to create a separate collection for the shards that >> are restricted?

Re: Restricting access to shards / collections with SolrCloud

2012-01-13 Thread Mark Miller
On Thu, Jan 12, 2012 at 5:13 AM, Jaran Nilsen wrote: > > > My questions are: > > 1. would it be an idea to create a separate collection for the shards that > are restricted? If so, is there currently any support for specifying which > collections to search so that we could implement the solution ou

linking query in DIH fails with sql syntax error when specific fields contain bad data

2012-01-13 Thread geeky2
hello all, some of my records contain bad data in the orb_itm_id column. example: select * from prtxtps_prt_summ where orb_itm_id like '''%'; prd_gro_id spp_id orb_itm_id ds_tx rnk_no 0022 335 ' LONG. (TERMINAL ATTACH )' LONG. (TERMINAL ATTACH) 0 0042

Re: Problem with facet.fields

2012-01-13 Thread Chris Hostetter
: So multivalued URL params are not taken in account. : I'm using Jetty and Solrj with EmbeddedSolrServer implementation. : Trying it using the "normal" http version does work, so you're right : it's a problem with the client library. : : Any idea why it would refuse multivalued parameters? i do

Re: a way to marshall xml doc into a SolrInputDocument

2012-01-13 Thread Chris Hostetter
: Anyway thanks, seems I'll have to code it myself, not hard, just tedious. you could probably re-use a *lot* of what's in XMLLoader -- certainly easier than starting from scratch -- i just don't know if you'll be able to drop it in and use the API as is. -Hoss

Re: Solr core as a dispatcher

2012-01-13 Thread Chris Hostetter
: - binary fields don't work. The results come back as "@B[]", versus the actual data. : - short fields get "java.lang.Short" text prefixed on every value. these sound like they must be bugs in the javabin codec, certainly not intentional Ken: did you file jiras about these? -Hoss

Re: server stop responding in few hours due to CLOSE_WAIT

2012-01-13 Thread Mikhail Khludnev
Hello, It sounds like HTTP keep-alive (connection cache) is disabled. Here is the solution for the JDK's HTTP client. Unfortunately I have no experience with your Commons Http Client, but cm.closeIdleConnections(0L) looks very suspi

Re: FacetComponent: suppress original query

2012-01-13 Thread Chris Hostetter
: I would like to "by-pass" the maxBooleanClauses limit in such a way, that : those queries that contain boolean clauses more than maxBooleanClauses in : the number, would be automatically split into sub-queries. That part is : done. : : Now, when such a query arrives, solr throws : : org.apache

Re: faceting question

2012-01-13 Thread Christopher Gross
I do have it as a text_ws field. The field list is pretty long, and the two around it are strings. So, my bad. Thanks all! -- Chris On Fri, Jan 13, 2012 at 9:48 AM, wrote: > > What index analyzer or field settings are you using for that field? Sounds > like it might be tokenized. Maybe loo

Re: Can Apache Solr Handle TeraByte Large Data

2012-01-13 Thread Robert Stewart
Any idea how many documents your 5TB of data contains? Certain features such as faceting depend more on the total number of documents than on the actual size of the data. I have tested approx. 1 TB (100 million documents) running on a single machine (40 cores, 128 GB RAM), using distributed search across 10 shard

server stop responding in few hours due to CLOSE_WAIT

2012-01-13 Thread Jonty Rhods
Hi All, I am facing a problem with too many CLOSE_WAIT connections. My env is: Solr 3.4 on Linux RHEL 5.2. I am getting around 1 million requests per day on the application server in production. The production server communicates locally with the Solr server. I have a 5 core setup and for each core I am using separate

Re: search within specific domain

2012-01-13 Thread Erick Erickson
This is all just adding the appropriate filter query (fq) on the query you generate I think.. Something like fq=url:(nytimes.com). Of course you have to have a url field that's appropriately analyzed for this to work like you want. Best Erick On Fri, Jan 13, 2012 at 9:46 AM, remi tassing wrote:
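As a rough illustration of the fq suggestion above, here is a minimal Python sketch that builds a select URL with a filter query restricting results to one domain. The base URL, the example query term, and the field name "url" are assumptions for illustration only; the field must exist in your schema and be analyzed so that a domain match works as intended.

```python
from urllib.parse import urlencode

def build_domain_query(base_url, user_query, domain, field="url"):
    """Return a Solr select URL that restricts results to one domain via fq.

    The field name is an assumption; use whatever field holds the page URL.
    """
    params = [
        ("q", user_query),               # the user's original query
        ("fq", f"{field}:({domain})"),   # filter query, as in fq=url:(nytimes.com)
    ]
    return f"{base_url}?{urlencode(params)}"

if __name__ == "__main__":
    # Hypothetical host and path, shown only to make the sketch runnable.
    print(build_domain_query("http://localhost:8080/solr/select", "election", "nytimes.com"))
```

Keeping the domain restriction in fq rather than in q leaves the main query untouched, so the restriction filters results without being mixed into the user's query string.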

Re: Can Apache Solr Handle TeraByte Large Data

2012-01-13 Thread darren
Maybe also have a look at these links. http://www.hathitrust.org/blogs/large-scale-search/performance-5-million-volumes http://www.hathitrust.org/blogs/large-scale-search On Fri, 13 Jan 2012 15:49:06 +0100, Daniel Brügge wrote: > Hi, > > it's definitely a problem to store 5TB in Solr without u

Re: Can Apache Solr Handle TeraByte Large Data

2012-01-13 Thread Daniel Brügge
Hi, it's definitely a problem to store 5TB in Solr without using sharding. I try to split data over Solr instances, so that the index will fit in my memory on the server. I ran into trouble with a Solr instance using a 50G index. Daniel On Jan 13, 2012, at 1:08 PM, mustafozbek wrote: > I am an apache s

Re: faceting question

2012-01-13 Thread darren
What index analyzer or field settings are you using for that field? Sounds like it might be tokenized. Maybe look at alternatives that don't tokenize fields. Just a guess here though. Good luck. On Fri, 13 Jan 2012 09:04:00 -0500, Christopher Gross wrote: > My index has a multi-valued String fie

search within specific domain

2012-01-13 Thread remi tassing
Hello all, I think it's possible with Solr to search within a specific domain (like with Google). How is it done? Ref: http://support.google.com/websearch/bin/answer.py?hl=en&answer=136861&rd=1 *Search within a specific website (site:)* Google allows you to specify that your search results must come

Re: Restricting access to shards / collections with SolrCloud

2012-01-13 Thread Jaran Nilsen
Hi Erick, thanks for your response! I think we'll stick with our current solution for now, but thanks for your suggestion on using tokens. Best, Jaran On Fri, Jan 13, 2012 at 1:58 PM, Erick Erickson wrote: > The SolrCloud capabilities are pretty new to me too, but I doubt > anything like this i

Re: FacetComponent: suppress original query

2012-01-13 Thread Dmitry Kan
Hello, The problem seems to have been solved (some testing is still required). But I stumbled upon another issue, which requires explaining a bit about the use case. I would like to "by-pass" the maxBooleanClauses limit in such a way that those queries that contain boolean clauses more than maxBool

Re: Facets, Get top 10 categories

2012-01-13 Thread Dmitry Kan
If you mean that you need to group facets after the top 10 as "Others", then I'm not sure SOLR would allow you to do this without tweaking at the source code level. However, it is still possible on the client side to grab those facet counts that logically belong to the "Others" group and sum thei

Merging text nodes/blocks of the catchall field

2012-01-13 Thread JZ
Currently, I have copied numerous fields to the catchall field. When I retrieve this field, it returns me the copied text nodes/blocks, separated by a comma. I want to merge all of the content of the catchall field. This is more convenient for keyword highlighting, so more text is shown. My que

Re: faceting question

2012-01-13 Thread Manish Bafna
Can you send the schema you are using? Looks like you are using WhiteSpace / StandardAnalyzer for the tag field. On Fri, Jan 13, 2012 at 7:34 PM, Christopher Gross wrote: > My index has a multi-valued String field called "tag" that is used to > store a category/keyword for the item the record is

Re: Facets, Get top 10 categories

2012-01-13 Thread Manish Bafna
How do I mark the remaining ones as "Others"? That field is a multi-valued field, so I can't do any calculation based on the result set count. On Fri, Jan 13, 2012 at 5:44 PM, Dmitry Kan wrote: > You could do this on the client side, just read 10 first facets off the top > of the list and mark the remaining as "O

faceting question

2012-01-13 Thread Christopher Gross
My index has a multi-valued String field called "tag" that is used to store a category/keyword for the item the record is about. I made a faceted query in order to find out all the different tags that are stored in the index: http://localhost:8080/solr/select?q=*:*&facet=on&facet.field=tag&facet.

Re: Not able to see output in XML output

2012-01-13 Thread Erick Erickson
Let's back up a bit here. First, what do your Solr logs show? Anything unusual? Second, is there anything in your index at all? What does the admin/stats page show for numDocs and maxDocs? People often forget to commit after a DIH import; did you make sure you committed when the run was done? Bes

Re: Restricting access to shards / collections with SolrCloud

2012-01-13 Thread Erick Erickson
The SolrCloud capabilities are pretty new to me too, but I doubt anything like this is built in, you're probably better off with your current solution. Although one wonders if some kind of group-based permission scheme would work for you. Essentially you add an authorized_users entry to each docum

Re: Acceptable Response Time

2012-01-13 Thread Erick Erickson
There's no good answer here. But a 3GB index isn't very big, I doubt that sharding is necessary. Absolute numbers aren't much use here IMO. What you probably want to do is create a stress test for your system. Run the stress test occasionally and when the average response times start to go up, you'
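The stress-test idea above could be sketched roughly as follows in Python. This is only an illustration: run_query is a hypothetical callable that you would supply to send one query to your Solr instance, and the query list and repeat count are assumptions. The point is to track the average over repeated runs rather than rely on any absolute number.

```python
import time
from statistics import mean

def average_response_time(run_query, queries, repeats=3):
    """Time every query `repeats` times and return the mean latency in seconds.

    run_query is a hypothetical callable that executes a single query.
    """
    timings = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            run_query(query)  # execute one query; its result is ignored here
            timings.append(time.perf_counter() - start)
    return mean(timings)

if __name__ == "__main__":
    # Stand-in for a real Solr call so the sketch runs without a server.
    def fake_query(query):
        time.sleep(0.001)

    print(average_response_time(fake_query, ["solr", "tika", "facet"]))
```

Running this occasionally against a representative query set and comparing the averages over time gives a trend, which is what the advice above is after.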

Re: Facets, Get top 10 categories

2012-01-13 Thread Dmitry Kan
You could do this on the client side, just read 10 first facets off the top of the list and mark the remaining as "Others". On Fri, Jan 13, 2012 at 12:47 PM, Manish Bafna wrote: > Hi, > Is it possible to get top 10 facets and group the remaining in "Others". > > Thanks, > Manish. > -- Regards
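The client-side approach described above might look roughly like this in Python. It is a sketch under assumptions: the facet counts are taken to arrive as a flat list alternating value and count, and the function names are hypothetical. Note that for a multi-valued field the "Others" total is a sum of per-value counts, not a count of distinct documents.

```python
def pair_up(flat):
    """Turn a flat [value, count, value, count, ...] list into (value, count) pairs."""
    return list(zip(flat[::2], flat[1::2]))

def top_n_with_others(pairs, n=10, label="Others"):
    """Keep the n largest facets and fold the rest into a single bucket."""
    ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    head, tail = ranked[:n], ranked[n:]
    if tail:
        # Sum of per-value counts; a document with several values is counted several times.
        head.append((label, sum(count for _, count in tail)))
    return head

if __name__ == "__main__":
    facets = pair_up(["news", 50, "sports", 30, "tech", 20, "misc", 5])
    print(top_n_with_others(facets, n=2))
```

For the top-10 case in this thread, call top_n_with_others with the default n=10; the facet request then has to return more than ten values so that there is a tail to sum.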

Facets, Get top 10 categories

2012-01-13 Thread Manish Bafna
Hi, Is it possible to get top 10 facets and group the remaining in "Others". Thanks, Manish.

Re: can solr automatically search for different punctuation of a word

2012-01-13 Thread Chantal Ackermann
Hi Alex, for me, ICUFoldingFilterFactory works very well. It does lowercasing and removes diacritics (this is what umlauts and accented letters are called; punctuation means commas, periods, etc.). It will work for any language, not only German. And it will also handle apostrophes as in "C'est
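To make the idea of folding concrete, here is a small Python sketch that lowercases text and strips diacritics using the standard unicodedata module. It only approximates the general idea; it is not the ICU folding that ICUFoldingFilterFactory applies, and the function name fold is hypothetical.

```python
import unicodedata

def fold(text):
    """Lowercase text and drop combining marks (a rough stand-in for index-time folding)."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

if __name__ == "__main__":
    for word in ["Müller", "Café", "Ångström"]:
        print(word, "->", fold(word))
```

Applying the same folding to both the indexed text and the query is what lets a search for the unaccented spelling match the accented one.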

Re: a way to marshall xml doc into a SolrInputDocument

2012-01-13 Thread jmuguruza
Chris Hostetter-3 wrote > > but you're the first person i've ever seen ask about > serializing to Solr's XML format on the client, then parse it again, then > send the SolrInputDocument to Solr (seems like a lot of > gratuitous serialize/deserialize/serialize/etc...) > -Hoss > Yes, but I am

Sorting results within the fields

2012-01-13 Thread aronitin
I need to implement sorting of search results where sorting needs to be done based on the fields that are matched for a query and the score associated with each term in the field which is generated by application logic. e.g if there are 3 fields which are being queried and the final query after ap

Re: a way to marshall xml doc into a SolrInputDocument

2012-01-13 Thread Chris Hostetter
: Is not there a way to easily marshal that file into a SolrInputDocument? Do : I have to do the parsing myself? : : I need them in java pojo cause I want to modify some fields before indexing. : I would think that is possible with built in methods in Solr but cannot find : a way. the class tha