Re: ExtractRequestHandler, skipping errors

2013-10-18 Thread Roland Everaert
Hi, We have already configured the ExtractRequestHandler to ignore Tika exceptions, but it is Solr that complains. The customer managed to reproduce the problem. The error from solr.log follows. The file type that caused this exception was WMZ. It seems that something is missing in a Solr class. We us

Proximity search with wildcard

2013-10-18 Thread sayeed
Hi, I am new to Solr. Is it possible to do a proximity search with a wildcard in Solr? For example "comp* engage"~5.

Complex Queries in solr

2013-10-18 Thread sayeed
Hi, Is it possible to run complex queries like (consult* or advis*) NEAR(40) (fee or retainer or salary or bonus) in Solr? - Sayeed

Re: solrconfig.xml carrot2 params

2013-10-18 Thread Stanislaw Osinski
Hi, Out of curiosity -- what would you like to achieve by changing Tokenizer.documentFields? If you want to have clustering applied to more than one document field, you can provide a comma-separated list of fields in the carrot.title and/or carrot.snippet parameters. Thanks, Staszek -- Stanisla
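A minimal sketch of the parameter shape Staszek describes, in Python with only the standard library. The `/clustering` handler path and the field names `title`, `subject` and `content` are assumptions for illustration; only `carrot.title` and `carrot.snippet` come from the message above.

```python
from urllib.parse import urlencode


def clustering_params(query, title_fields, snippet_fields):
    """Build request parameters that pass several fields to the clustering component."""
    return urlencode({
        "q": query,
        "clustering": "true",
        # Both parameters accept a comma-separated list of field names.
        "carrot.title": ",".join(title_fields),
        "carrot.snippet": ",".join(snippet_fields),
    })


# Hypothetical field names; replace them with fields from your own schema.
params = clustering_params("solr", ["title", "subject"], ["content"])
print("/clustering?" + params)
```

Note that `urlencode` percent-encodes the comma as `%2C`, which the servlet container decodes back before Solr reads the parameter.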

Re: Proximity search with wildcard

2013-10-18 Thread Harshvardhan Ojha
Hi Sayeed, you can use fuzzy search. comp engage~0.2. Regards harshvardhan ojha On Fri, Oct 18, 2013 at 10:28 AM, sayeed wrote: > Hi, > I am new to solr. Is it possible to do proximity search with solr. > > For example > "comp* engage"~5. > > > > > -- > View this message in context: > http://

how to retireve content page in solr

2013-10-18 Thread javozzo
Hi, I'm new to Solr. I use Nutch 1.1 to crawl web pages and Solr to index them. My problem is: how do I retrieve the content of a document stored in Solr? For example, if I have a page http://www.prova.com/prova.html that contains the text "This is a web page" Is there a w

Re: ExtractRequestHandler, skipping errors

2013-10-18 Thread Koji Sekiguchi
Hi, I think the flag cannot ignore NoSuchMethodError. There may be something wrong here? ... I've just checked my Solr 4.5 directories and I found Tika version is 1.4. Tika 1.4 seems to use commons compress 1.5: http://svn.apache.org/viewvc/tika/tags/1.4/tika-parsers/pom.xml?view=markup But

Re: how to retireve content page in solr

2013-10-18 Thread Harshvardhan Ojha
Hi Danila, What do you mean by content information? A whole document? Metadata? do you keep it separate in some fields? Or is it about solr search queries? Regards Harshvardhan Ojha On Fri, Oct 18, 2013 at 1:09 PM, javozzo wrote: > Hi, i'm new in solr. > I use Nutch 1.1 to crawl web pages. >

Re: Debugging update request

2013-10-18 Thread Erick Erickson
@Michael: Yep, that's the bit that's addressed by the two patches I referenced. If you can try this with 4.5 (or the soon to be done 4.5.1), the problem should go away. @Chris: I think you have a different issue. A very quick glance at your stack trace doesn't really show anything outstanding. T

Re: Concurent indexing

2013-10-18 Thread Erick Erickson
Chris: OK, one of those stack traces does have the problem I referenced in the other thread. Are you sending updates to the server with SolrJ? And are you using CloudSolrServer? If you are, I'm surprised... There are the important lines: 1. - java.util.concurrent.Semaphore.acquire() @bci=5,

Re: measure result set quality

2013-10-18 Thread Erick Erickson
bq: How do you compare the quality of your search result in order to decide which schema is better? Well, that's actually a hard problem. There's the various TREC data, but that's a generic solution and most every individual application of this generic thing called "search" has its own version of

XLSB files not indexed

2013-10-18 Thread Roland Everaert
Hi, Can someone tell me whether Tika is supposed to extract data from XLSB files (the new MS Office format in binary form)? If so, it seems that Solr is not able to index them, just as it is not able to index ODF files (a JIRA is already opened for ODF https://issues.apache.org/jira/browse/SOLR-4809)

Re: ExtractRequestHandler, skipping errors

2013-10-18 Thread Roland Everaert
I will open a JIRA issue, I suppose that I just have to create an account first? Regards, Roland. On Fri, Oct 18, 2013 at 12:05 PM, Koji Sekiguchi wrote: > Hi, > > I think the flag cannot ignore NoSuchMethodError. There may be something > wrong here? > > ... I've just checked my Solr 4.5 di

Re: ExtractRequestHandler, skipping errors

2013-10-18 Thread Roland Everaert
Here is the link to the issue: https://issues.apache.org/jira/browse/SOLR-5365 Thanks for your help. Roland Everaert. On Fri, Oct 18, 2013 at 1:40 PM, Roland Everaert wrote: > I will open a JIRA issue, I suppose that I just have to create an account > first? > > > Regards, > > > Roland. > >

Re: ExtractRequestHandler, skipping errors

2013-10-18 Thread Guido Medina
Don't; commons compress 1.5 is broken. Use either 1.4.1 or a release later than 1.5. Our app stopped compressing properly after a Maven update. Guido. On 18/10/13 12:40, Roland Everaert wrote: I will open a JIRA issue, I suppose that I just have to create an account first? Regards, Roland. On Fri, Oct 18, 201

Facet performance

2013-10-18 Thread Lemke, Michael SZ/HZA-ZSW
I am working with Solr facet fields and come across a performance problem I don't understand. Consider these two queries: 1. q=word&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0 2. q=word&facet.field=CONTENT&facet=true&facet.prefix=a&fac

Re: feedback on Solr 4.x LotsOfCores feature

2013-10-18 Thread Soyez Olivier
15K cores is around 4 minutes: no network drive, just a spinning disk. One important thing, though: to simulate a cold start, or a useless Linux buffer cache, I used the following command to empty the Linux buffer cache: sync && echo 3 > /proc/sys/vm/drop_caches Then, I started Solr and I found the

Re: Proximity search with wildcard

2013-10-18 Thread sayeed
Generally in Solr, the query "Company engage"~5 returns results in which "engage" occurs within 5 words of "company". I want to get the same kind of results when the query contains a wildcard, as in "Compa* engage"~5. - Sayeed
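The two query shapes under discussion can be sketched in Python as plain string builders. The first is the standard phrase-with-slop syntax from the message above. The second uses the Surround query parser, which is not mentioned in this thread and is offered only as one possible approach: it assumes a Solr version that ships `{!surround}`, and it assumes you account for Surround not running field analyzers on the terms (so terms usually need to be lowercased by hand).

```python
def phrase_slop(phrase, slop):
    """Standard Lucene syntax: a quoted phrase followed by ~slop."""
    return '"{}"~{}'.format(phrase, slop)


def surround_near(left, right, distance, ordered=False):
    """Surround prefix syntax: <distance>w(...) is ordered, <distance>n(...) is unordered."""
    op = "w" if ordered else "n"
    return "{{!surround}}{}{}({}, {})".format(distance, op, left, right)


print(phrase_slop("Company engage", 5))
print(surround_near("compa*", "engage", 5))
```

Whether the Surround parser fits depends on the schema and analysis chain, so treat the second query as a starting point for experiments rather than a confirmed answer.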

Re: Filter cache pollution during sharded edismax queries

2013-10-18 Thread Anca Kopetz
Hi Ken, Have you managed to find out why these entries were stored into filterCache and if they have an impact on the hit ratio ? We noticed the same problem, there are entries of this type : item_+(+(title:western^10.0 | ... in our filterCache. Thanks, Anca On 07/02/2013 09:01 PM, Ken Krugle

Re: Concurent indexing

2013-10-18 Thread Chris Geeringh
Erick, yes. Using SolrJ and CloudSolrServer - both 4.6 snapshots from 13 Oct On 18 October 2013 12:17, Erick Erickson wrote: > Chris: > > OK, one of those stack traces does have the problem I referenced in the > other thread. Are you sending updates to the server with SolrJ? And are you > using

querying nested entity fields

2013-10-18 Thread sathish_ix
Hi, can someone help: is the query below possible? Schema: A product1 product2 B product12 product23 Is it possible to query like this: q=tag.category:A AND tag.category.product=product1 ??? -- View t

RE: Facet performance

2013-10-18 Thread Toke Eskildsen
Lemke, Michael SZ/HZA-ZSW [lemke...@schaeffler.com] wrote: > 1. > q=word&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0 > 2. > q=word&facet.field=CONTENT&facet=true&facet.prefix=a&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0 > T
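To make the comparison concrete, the short Python sketch below parses the two query strings quoted above and reports the parameters that differ. It only inspects the strings; it does not contact Solr or measure anything.

```python
from urllib.parse import parse_qs

QUERY_1 = ("q=word&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10"
           "&facet.mincount=1&facet.method=enum&rows=0")
QUERY_2 = ("q=word&facet.field=CONTENT&facet=true&facet.prefix=a&facet.limit=10"
           "&facet.mincount=1&facet.method=enum&rows=0")


def differing_params(a, b):
    """Return {name: (values_in_a, values_in_b)} for parameters that differ."""
    # keep_blank_values=True so that an empty facet.prefix is not silently dropped.
    pa = parse_qs(a, keep_blank_values=True)
    pb = parse_qs(b, keep_blank_values=True)
    names = sorted(set(pa) | set(pb))
    return {n: (pa.get(n), pb.get(n)) for n in names if pa.get(n) != pb.get(n)}


print(differing_params(QUERY_1, QUERY_2))
```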

Re: solrconfig.xml carrot2 params

2013-10-18 Thread youknowwho
Thanks, I'm new to the clustering libraries. I finally made this connection when I started browsing through the carrot2 source. I had pulled down a smaller MM document collection from our test environment. It was not ideal as it was mostly structured, but small. I foolishly thought I could

Re: how to retireve content page in solr

2013-10-18 Thread javozzo
Hi Harshvardhan Ojha; I'm using Nutch 1.1 and Solr 3.6.0. I mean the whole document. I am trying to create a search engine with Nutch and Solr, and I would like to obtain an interface like this: name1 http://www.prova.com/name1.html first rows of content document name2 http://www.prova.com/name2.html first rows of c

Solr timeout after reboot

2013-10-18 Thread michael.boom
I have a SolrCloud environment with 4 shards, each having a replica and a leader. The index size is about 70M docs and 60Gb, running with Jetty + Zookeeper, on 2 EC2 instances, each with 4CPUs and 15G RAM. I'm using SolrMeter for stress testing. If I restart Jetty and then try to use SolrMeter to

Fwd: Searching within list of regions with 1:1 document-region mapping

2013-10-18 Thread Sandeep Gupta
Hi, I have a Solr index of around 100 million documents, each of which is assigned a region id. The index grows at a rate of about 10 million documents per month, and the average document size is around 10KB of pure text. The total number of region ids is itself in the range of 2.5 million. I wan

RE: Facet performance

2013-10-18 Thread Lemke, Michael SZ/HZA-ZSW
Toke Eskildsen [mailto:t...@statsbiblioteket.dk] wrote: >Lemke, Michael SZ/HZA-ZSW [lemke...@schaeffler.com] wrote: >> 1. >> q=word&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0 >> 2. >> q=word&facet.field=CONTENT&facet=true&facet.prefix=a&

Re: Check if dynamic columns exists and query else ignore

2013-10-18 Thread Utkarsh Sengar
Bumping this one, any suggestions? Looks like if() and exists() are meant to solve this problem, but I am using it in a wrong way. -Utkarsh On Thu, Oct 17, 2013 at 1:16 PM, Utkarsh Sengar wrote: > I trying to do this: > > if (US_offers_i exists): >fq=US_offers_i:[1 TO *] > else: >fq=off

Issues with Language detection in Solr

2013-10-18 Thread vibhoreng04
Hi All, I am trying to detect the language of the business name field and the address field. I am using Solr's LangDetect (Google library), not Tika. It works OK in most cases, but in some it detects the wrong language. For example, the document -"OrgName": "EXPLOITS VALLEY HIGHGREENWOOD",

Re: Issues with Language detection in Solr

2013-10-18 Thread Jack Krupansky
I would say that in general you need at least 15 or 20 words in a text field for language to be detected reasonably well. Sure, sometimes it can work for 8 to 12 words, but flip a coin how reliable it will be. You haven't shown us any true text fields. I would say that language detection again
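Jack's rule of thumb can be turned into a simple pre-check, sketched here in Python. The threshold of 15 words is taken from the low end of his estimate; the function name and the idea of gating detection on word count are illustrative assumptions, not part of Solr's language detection API.

```python
MIN_WORDS = 15  # low end of the "15 or 20 words" estimate above


def enough_text_for_langdetect(*field_values, min_words=MIN_WORDS):
    """Return True if the combined fields contain at least min_words words."""
    words = sum(len(value.split()) for value in field_values if value)
    return words >= min_words


# The business name from the question is only three words long.
print(enough_text_for_langdetect("EXPLOITS VALLEY HIGHGREENWOOD"))
```

A check like this could decide whether to trust the detected language or fall back to a default for short fields such as names and addresses.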

Seeking New Moderators for solr-user@lucene

2013-10-18 Thread Chris Hostetter
It looks like it's time to inject some fresh blood into the solr-user@lucene moderation team. If you'd like to volunteer to be a moderator, please reply back to this thread and specify which email address you'd like to use as a moderator (if different from the one you use when sending the em

Re: Switching indexes

2013-10-18 Thread Christopher Gross
I was able to get the new collections working dynamically (via Collections RESTful calls). I was having some other issues with my development environment that I had to fix up to get it going. I had to upgrade to 4.5 in order for the aliases to work at all though. Not sure what the deal was with t

Re: Check if dynamic columns exists and query else ignore

2013-10-18 Thread Chris Hostetter
: I trying to do this: : : if (US_offers_i exists): :fq=US_offers_i:[1 TO *] : else: :fq=offers_count:[1 TO *] "if()" and "exists()" are functions, so you would have to explicitly use them in a function context (i.e. the {!func} parser or the {!frange} parser) and to use those nested queries i
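Since the reply is cut off here, the following Python sketch shows only one plausible way to express the pseudocode as a single filter query using `{!frange}` with the `if()` and `exists()` functions. It is not necessarily the exact form Chris suggested; the helper name is hypothetical, while the field names come from the question.

```python
def fallback_range_filter(preferred, fallback, lower):
    """Filter on `preferred` when the document has it, otherwise on `fallback`."""
    return "{{!frange l={}}}if(exists({}),{},{})".format(
        lower, preferred, preferred, fallback)


fq = fallback_range_filter("US_offers_i", "offers_count", 1)
print(fq)
```

The function range parser evaluates the `if()` expression per document and keeps documents whose value is at least the lower bound `l`.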

Re: Issues with Language detection in Solr

2013-10-18 Thread vibhoreng04
I agree with you, Jack. But please note that this filter otherwise works perfectly fine. Only in one case, where even though all the words are Latin, the language is detected as German. My question is why and how? If it works perfectly for the other docs what in this case is making

Re: Seeking New Moderators for solr-user@lucene

2013-10-18 Thread Anshum Gupta
Hey Hoss, I'd be happy to moderate. Sent from my iPhone > On 19-Oct-2013, at 0:22, Chris Hostetter wrote: > > > It looks like it's time to inject some fresh blood into the solr-user@lucene > moderation team. > > If you'd like to volunteer to be a moderator, please reply back to this > thre

Re: Questions developing custom functionquery

2013-10-18 Thread Chris Hostetter
: Field-Type: org.apache.solr.schema.TextField ... : DocTermsIndexDocValues. : Calling "getVal()" on a Do

Re: Seeking New Moderators for solr-user@lucene

2013-10-18 Thread vibhoreng04
Hi Chris, I would like to moderate and you can use the mail id vibhoren...@gmail.com for this purpose. Regards, Vibhor Jaiswal

Re: Seeking New Moderators for solr-user@lucene

2013-10-18 Thread Rafał Kuć
Hello! I can help with moderation. -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - ElasticSearch > It looks like it's time to inject some fresh blood into the > solr-user@lucene moderation team. > If you'd like to volunteer to be a moderator, please reply back to

Re: Facet performance

2013-10-18 Thread Otis Gospodnetic
DocValues is the new black http://wiki.apache.org/solr/DocValues Otis -- Solr & ElasticSearch Support -- http://sematext.com/ SOLR Performance Monitoring -- http://sematext.com/spm On Fri, Oct 18, 2013 at 12:30 PM, Lemke, Michael SZ/HZA-ZSW wrote: > Toke Eskildsen [mailto:t...@statsbiblioteke

RE: Facet performance

2013-10-18 Thread Chris Hostetter
: >> 1. q=word&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0 : >> 2. q=word&facet.field=CONTENT&facet=true&facet.prefix=a&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0 : > : >> The only difference is am empty facet.prefix in the

SOLRJ replace document

2013-10-18 Thread Brent Ryan
How do I replace a document in solr using solrj library? I keep getting this error back: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Atomic document updates are not supported unless <updateLog/> is configured I don't want to do partial updates, I just want to replace it... Thanks

Re: Check if dynamic columns exists and query else ignore

2013-10-18 Thread Utkarsh Sengar
Thanks Chris! That worked! I overengineered my query! Thanks, -Utkarsh On Fri, Oct 18, 2013 at 12:02 PM, Chris Hostetter wrote: > > : I trying to do this: > : > : if (US_offers_i exists): > :fq=US_offers_i:[1 TO *] > : else: > :fq=offers_count:[1 TO *] > > "if()" and "exist()" are funct

loading djvu xml into solr

2013-10-18 Thread Sara Amato
Does anyone have a schema they'd be willing to share for loading djvu xml into solr?

Re: loading djvu xml into solr

2013-10-18 Thread Upayavira
On Fri, Oct 18, 2013, at 10:11 PM, Sara Amato wrote: > Does anyone have a schema they'd be willing to share for loading djvu xml > into solr? I assume that djvu XML is a particular XML format? In which case, there is no schema that can do it. That's not how Solr works. You need to use the XML

Re: Solr 4.3 Startup with Multiple Cores Hangs on "Registering Core"

2013-10-18 Thread Jonatan Fournier
Hello, I still have this issue using Solr 4.4, removing firstSearcher queries did make the problem go away. Note that I'm using Tomcat 7 and that if I'm using my own Java application launching an Embedded Solr Server pointing to the same Solr configuration the server fully starts with no hang. W

Re: SOLRJ replace document

2013-10-18 Thread Jack Krupansky
To "replace" a Solr document, simply "add" it again using the same technique used to insert the original document. The "set" option for atomic update is only used when you wish to selectively update only some of the fields for a document, and that does require that the update log be enabled usin
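The difference Jack describes is visible in the shape of the update payload. The Python sketch below builds both shapes as JSON, assuming the JSON update format; `solr_id` and `title` are example field names, and the helper names are hypothetical.

```python
import json


def full_replace(doc):
    """Plain add: a document with an existing unique key replaces the old one."""
    return json.dumps([doc], sort_keys=True)


def atomic_set(key_field, key_value, changes):
    """Atomic update: each changed field is wrapped in a {"set": value} map."""
    doc = {key_field: key_value}
    doc.update({field: {"set": value} for field, value in changes.items()})
    return json.dumps([doc], sort_keys=True)


print(full_replace({"solr_id": "doc-1", "title": "New title"}))
print(atomic_set("solr_id", "doc-1", {"title": "New title"}))
```

Only the second payload contains a map as a field value, which is what marks it as an atomic update.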

Re: SOLRJ replace document

2013-10-18 Thread Brent Ryan
I wish that was the case but calling addDoc() is what's triggering that exception. On Friday, October 18, 2013, Jack Krupansky wrote: > To "replace" a Solr document, simply "add" it again using the same > technique used to insert the original document. The "set" option for atomic > update is only

Re: SOLRJ replace document

2013-10-18 Thread Shawn Heisey
On 10/18/2013 2:59 PM, Brent Ryan wrote: How do I replace a document in solr using solrj library? I keep getting this error back: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Atomic document updates are not supported unless <updateLog/> is configured I don't want to do partial upd

Re: loading djvu xml into solr

2013-10-18 Thread sara amato
Ah, thanks for the clarification - I was having a serious misunderstanding! (As you can tell I'm newly off the tutorial and blundering ahead...) On Oct 18, 2013, at 2:22 PM, Upayavira wrote: > > > On Fri, Oct 18, 2013, at 10:11 PM, Sara Amato wrote: >> Does anyone have a schema they'd be will

Re: SOLRJ replace document

2013-10-18 Thread Brent Ryan
My schema is pretty simple and has a string field called solr_id as my unique key. Once I get back to my computer I'll send some more details. Brent On Friday, October 18, 2013, Shawn Heisey wrote: > On 10/18/2013 2:59 PM, Brent Ryan wrote: > >> How do I replace a document in solr using solrj l

Re: Issues with Language detection in Solr

2013-10-18 Thread Jack Krupansky
Sorry, but Latin is not on the list of supported languages: https://code.google.com/p/language-detection/wiki/LanguageList -- Jack Krupansky -Original Message- From: vibhoreng04 Sent: Friday, October 18, 2013 3:07 PM To: solr-user@lucene.apache.org Subject: Re: Issues with Language de

Re: SOLRJ replace document

2013-10-18 Thread Shawn Heisey
On 10/18/2013 3:36 PM, Brent Ryan wrote: My schema is pretty simple and has a string field called solr_id as my unique key. Once I get back to my computer I'll send some more details. If you are trying to use a Map object as the value of a field, that is probably why it is interpreting your a
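A quick way to test Shawn's hypothesis is to scan a document for map-valued fields before sending it. The sketch below does this in Python as a stand-in for the equivalent check on a SolrJ document; the function name and the sample fields are assumptions for illustration.

```python
from collections.abc import Mapping


def map_valued_fields(doc):
    """Return the names of fields whose value is a map (dict-like object)."""
    return sorted(name for name, value in doc.items()
                  if isinstance(value, Mapping))


suspect = {"solr_id": "doc-1", "attrs": {"color": "red"}}
print(map_valued_fields(suspect))
```

Any field reported here would need to be flattened to plain values (or a list of plain values) for the add to be treated as a full replacement.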

Re: Seeking New Moderators for solr-user@lucene

2013-10-18 Thread Alexandre Rafalovitch
I'll be happy to moderate. I do it for some other lists already. Regards, Alex

Leader election fails in some point.

2013-10-18 Thread yriveiro
Hi, In this screenshot I have a shard with two replicas without leader, http://picpaste.com/qf2jdkj8.png On machine with shard green I found this exception: INFO - dat5 - 2013-10-18 22:48:04.775; org.apache.solr.handler.admin.CoreAdminHandler; Going to wait for coreNodeName: 192.168.20.106:898

Re: Solr timeout after reboot

2013-10-18 Thread Otis Gospodnetic
Michael, The servlet container controls timeouts, max threads and such. That's not a high query rate, but yes, it could be that the Solr or OS caches are cold. You will be able to see all this in SPM for Solr while you hammer your poor Solr servers :) Otis Solr & ElasticSearch Support http://sematext.co

Re: how to retireve content page in solr

2013-10-18 Thread Otis Gospodnetic
Hi, Ignore Nutch for a bit and just follow the Solr tutorial to learn about the Solr side. Should be quick. Otis Solr & ElasticSearch Support http://sematext.com/ On Oct 18, 2013 11:30 AM, "javozzo" wrote: > hi Harshvardhan Ojha; > i'm using nutch 1.1 and solr 3.6.0. > I mean whole document. I

Re: XLSB files not indexed

2013-10-18 Thread Otis Gospodnetic
Hi Roland, It looks like: Tika - yes Solr - no? Based on http://search-lucene.com/?q=xlsb ODF != XLSB though, I think... Otis -- Solr & ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Fri, Oct 18, 2013 at 7:36 AM, Roland Everaert wrote: > H

Re: SolrCloud Performance Issue

2013-10-18 Thread Otis Gospodnetic
Hi, What happens if you have just 1 shard - no distributed search, like before? SPM for Solr or any other monitoring tool that captures OS and Solr metrics should help you find the source of the problem faster. Is disk IO the same? utilization of caches? JVM version, heap, etc.? CPU usage? network

Re: SOLRJ replace document

2013-10-18 Thread Brent Ryan
So I think the issue might be related to the tech stack we're using which is SOLR within DataStax enterprise which doesn't support atomic updates. But I think it must have some sort of bug around this because it doesn't appear to work correctly for this use case when using solrj ... Anyways, I've

Re: SOLRJ replace document

2013-10-18 Thread Jason Hellman
Keep in mind that DataStax has a custom update handler, and as such isn't exactly a vanilla Solr implementation (even though in many ways it still is). Since updates are co-written to Cassandra and Solr you should always tread a bit carefully when slightly outside what they perceive to be norms

Re: SOLRJ replace document

2013-10-18 Thread Jack Krupansky
By all means please do file a support request with DataStax, either as an official support ticket or as a question on StackOverflow. But, I do think the previous answer of avoiding the use of a Map object in your document is likely to be the solution. -- Jack Krupansky -Original Message-