Re: Null Pointer Exception on DIH with MySQL

2012-08-29 Thread Aleksey Vorona
Thank you for the reply. We rebuilt Solr from source, reinstalled it, and the problem went away. As it was never reproducible on any other server, I blame some mysterious Java bytecode corruption on that server. That is an assumption I will never be able to verify, because we did not make a copy of

Sorting on multivalued fields still impossible?

2012-08-29 Thread Uwe Reh
Hi, just to be sure: is there still no way to sort by multivalued fields, e.g. "...&sort=max(datefield) desc&"? Is there no smarter option than creating additional single-valued fields just for sorting, e.g. "datefield_max" and "datefield_min"? Uwe
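The workaround Uwe mentions (extra single-valued fields kept only for sorting) could be applied on the client before a document is sent to Solr. The following is a minimal sketch, not anything from the thread: the helper name `add_sort_fields` is hypothetical, and it assumes the dates are equal-length ISO-8601 UTC strings so that plain string comparison orders them correctly.

```python
def add_sort_fields(doc, field):
    """Return a copy of doc with <field>_min and <field>_max added.

    Hypothetical helper: assumes doc[field] is a list of ISO-8601 UTC
    timestamps of equal length, which compare correctly as strings.
    """
    values = doc.get(field) or []
    enriched = dict(doc)
    if values:
        enriched[field + "_min"] = min(values)
        enriched[field + "_max"] = max(values)
    return enriched


doc = {
    "id": "1",
    "datefield": ["2012-08-29T00:00:00Z", "2011-01-15T00:00:00Z"],
}
print(add_sort_fields(doc, "datefield"))
```

With such fields in place, a request could sort on `datefield_max desc` instead of the unsupported `max(datefield) desc`.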

solrj api for partial document update

2012-08-29 Thread Yoni Amir
Is there a SolrJ API for partial document updates in Solr 4? Partial updates are described here: http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/ That article explains what the XML structure should look like. I want to use the SolrJ API, but I can't figure out whether it is supported. Thanks, Yoni -Origin

Re: AW: AW: auto completion search with solr using NGrams in SOLR

2012-08-29 Thread aniljayanti
Hi, thanks. I added "PatternReplaceFilterFactory" as shown below, but I am getting different results (not like the suggester). You suggested removing "KeywordTokenizerFactory". "PatternReplace" is a FilterFactory, so which "TokenizerFactory" do I need to use?

Re: AW: AW: auto completion search with solr using NGrams in SOLR

2012-08-29 Thread aniljayanti
Hi, thanks. I tried the changes described below, but I am getting the same result as before. suggest/?q="michael ja" --- Response : - - 0 1 - - - 10 1 8 - *m

Patch 2429 for solr1.3?

2012-08-29 Thread Sujatha Arun
Can we use the patch 2429 in solr 1.3? Regards Sujatha

Re: How do I represent a group of customer key/value pairs

2012-08-29 Thread Lance Norskog
I do not understand exactly the data modeling problem. PathHierarchyTokenizerFactory may be what you're looking for. You might have to combine this with a charfilter or some token filters to get exactly what you want. Maybe have two fields, one which only saves the leaf words and the other that onl

Re: Document Processing

2012-08-29 Thread Lance Norskog
I've seen the JSoup HTML parser library used for this. It worked really well. The Boilerpipe library may be what you want. Its schwerpunkt (*) is to separate boilerplate from wanted text in an HTML page. I don't know what fine-grained control it has. * raison d'être. There is no English word for t

Re: Solr contribs build and jar-of-jars

2012-08-29 Thread Lance Norskog
I found a couple implementations of a crazy classloader that finds jars inside a jar. I tested the 'zipfileset' feature of 'ant zip' which works well. It unpacks the outboard jar directly into the target without making a separate staging directory, and ran surprisingly fast on my laptop. So, jars-i

Re: Hierarchical faceting and filter query exclusions

2012-08-29 Thread Erick Erickson
See "Tagging and excluding filters" here: http://lucidworks.lucidimagination.com/display/solr/Faceting Best Erick On Wed, Aug 29, 2012 at 11:44 AM, Nicholas Swarr wrote: > We're using Solr 4.0 Beta, testing the hierarchical faceting support to see > if it's a good fit to facet on taxonomies.

Re: Re: Antwort: Re: refiltering search results

2012-08-29 Thread Erick Erickson
Perhaps you're making this harder than it needs to be. The preferred way of handling ACL calculations is by group. That is, you just have a multiValued field in each document that contains the groups that have permissions for that document, say G1, G2, G3. Then you just add an fq clause for the query t
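The group-based filter Erick describes above might be built on the client roughly as follows. This is only a sketch under stated assumptions: the field name `acl_groups` and the helper `acl_filter` are hypothetical, and group names are assumed to contain no characters that need query-syntax escaping.

```python
from urllib.parse import urlencode


def acl_filter(groups, field="acl_groups"):
    """Build an fq value matching documents visible to any of the groups.

    `acl_groups` is a hypothetical multiValued field name. Group names
    are assumed to be simple tokens that need no escaping.
    """
    if not groups:
        raise ValueError("user must belong to at least one group")
    return "%s:(%s)" % (field, " OR ".join(groups))


# A user in groups G1 and G3 searching for "report".
params = urlencode({"q": "report", "fq": acl_filter(["G1", "G3"])})
print(params)
```

Keeping the restriction in `fq` rather than in `q` leaves relevance scoring unaffected by the ACL clause.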

Re: Custom close to index metadata / pass commit data to writer.commit

2012-08-29 Thread Erick Erickson
You have to look at the "Resolution" entry. It's currently "unresolved", so it hasn't been committed. Best Erick On Wed, Aug 29, 2012 at 5:27 AM, Jozef Vilcek wrote: > Hi, > > I just wanted to check if someone have an idea about intentions with this > issue: > https://issues.apache.org/jira/bro

Re: Solr Shard Replicas sharing files

2012-08-29 Thread Erick Erickson
Possible, kinda maybe. But then all of the SolrCloud goodness that's there for HA/DR goes out the window because the shared index (actually the hardware it's on) becomes a single point of failure. On the other hand, you're using the word replica but not explicitly talking about SolrCloud, so I gues

Re: Frequently Updated Index and Caching

2012-08-29 Thread Erick Erickson
Hmmm, the critical thing here is not how often you change the index, it's how often you commit. Look at your Solr admin/stats page and your logs. You'll see things like "hit ratio" and "cumulative hit ratio" for, particularly, your filtercache. Whether you're getting decent hit ratios is what tell

Re: Null Pointer Exception on DIH with MySQL

2012-08-29 Thread Erick Erickson
Not much information to go on here, have you tried the DIH debugging console? See: http://wiki.apache.org/solr/DataImportHandler#interactive Best Erick On Mon, Aug 27, 2012 at 7:22 PM, Aleksey Vorona wrote: > We have Solr 3.6.1 running on Jetty (7.x) and using DIH to get data from the > MySQL da

Re: Correctly importing and producing null in search results

2012-08-29 Thread Erick Erickson
If I'm reading this right, you're kind of stuck. Solr/DIH don't have any way to reach out to your mapping file and "do the right thing". A couple of things come to mind. Use a Transformer in DIH to simply remove the field from the document you're indexing. Then the absence of the field in the r

Re: Fail to huge collection extraction

2012-08-29 Thread Erick Erickson
I really think you need to think about firing successive page requests at the index and reporting in chunks. Best Erick On Mon, Aug 27, 2012 at 2:56 PM, neosky wrote: > I am using Solr 3.5 and Jetty 8.12 > I need to pull out huge query results at a time(for example, 1 million > documents, probab
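The suggestion above, firing successive page requests rather than one huge request, comes down to stepping the standard `start` and `rows` parameters. Here is a minimal sketch of the parameter sequence only; the helper name `page_params` and the page size are assumptions, and actually sending the requests is left out.

```python
def page_params(total, rows=1000):
    """Yield start/rows parameter dicts that together cover `total` hits."""
    for start in range(0, total, rows):
        yield {"start": start, "rows": min(rows, total - start)}


# 2500 hits fetched in chunks of 1000.
pages = list(page_params(2500, rows=1000))
print(pages)
```

Each dict would be merged into the query parameters of one request, and the client would report each chunk as it arrives.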

Re: Configure logging with Solr 4 on Tomcat 7

2012-08-29 Thread Erick Erickson
Have you looked in catalina.out? Best Erick On Mon, Aug 27, 2012 at 12:43 PM, Nicholas Ding wrote: > I put a logging.properties into solr/WEB-INF/classes, but I still not see > any logs. > > On Mon, Aug 27, 2012 at 11:56 AM, Chantal Ackermann < > c.ackerm...@it-agenten.com> wrote: > >> >> Drop t

Re: SolrCloud admin UI core/stats showing commit count even without no explicit commit

2012-08-29 Thread Erick Erickson
Been busy the last couple of days, sorry it took so long to get back You have basically 2 questions: About the 80% rate. It's not quite clear. What I meant was say you have 20M docs on a server. You push it until you max out the QPS rate, say that's 100 queries/second. Now, configure your loa

Re: Injest pauses

2012-08-29 Thread Otis Gospodnetic
Hello Brad, At one point you said CPU is at 100% and there is no disk IO.  Then in a separate email I think you said this happens during RAM -> Disk flush.  Isn't there a contradiction here? A few thread dumps may tell you where things are "stuck". Also, how does your JVM look while this is ha

Re: Load Testing in Solr

2012-08-29 Thread Otis Gospodnetic
Hello, JMeter, SolrMeter, HP LoadRunner ah, there is another open-source one that I like whose name I can't recall now. Otis  Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm  - Original Message - > From: dhaivat dave > To: solr-user@lucen

Re: Injest pauses

2012-08-29 Thread Alexey Serba
Could you take jstack dump when it's happening and post it here? > Interestingly it is not pausing during every commit so at least a portion of > the time the async commit code is working. Trying to track down the case > where a wait would still be issued. > > -Original Message- > From

Re: Large XML file sizes error out parsing the file size as an Integer

2012-08-29 Thread Chris Hostetter
: Shouldn't the file size be a long? Has anybody else experienced this problem? Your problem does not appear to be any internal limitation in Solr - your problem appears to be that you have a field in your schema named "fileSize" which uses a fieldType that is a "TrieIntField" but you are atte
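The arithmetic behind this diagnosis can be checked directly: a 32-bit signed integer field tops out at 2147483647, and the reported file size of 6845266984 bytes is larger than that. The helper name below is hypothetical, and this is only a sketch of the range check.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the largest 32-bit signed integer
INT32_MIN = -2**31


def fits_in_int32(value):
    """Return True if value can be stored in a 32-bit signed integer."""
    return INT32_MIN <= value <= INT32_MAX


file_size = 6845266984  # size in bytes reported in the thread
print(fits_in_int32(file_size))
print(file_size < 2**63)  # within the range of a 64-bit signed long
```

A 64-bit field type such as TrieLongField has room for the value, whereas the int-based field does not.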

Re: Large XML file sizes error out parsing the file size as an Integer

2012-08-29 Thread Walter Underwood
Break it up. You'll need 7GB of RAM for the source, at least that much for the parsed version, at least that much for the indexes, and so on. Why try to make something work when you aren't going to do it that way in production? wunder On Aug 29, 2012, at 3:38 PM, David Martin wrote: > Folks:

Large XML file sizes error out parsing the file size as an Integer

2012-08-29 Thread David Martin
Folks: One of our files of XML entities for import is almost 7GB in size. When trying to import, we error out with the exception below. 6845266984 is the exact size of the input file in bytes. Shouldn't the file size be a long? Has anybody else experienced this problem? We plan on dividing t

Re: LateBinding

2012-08-29 Thread Chris Hostetter
: In-Reply-To: <1346241342637-4003991.p...@n3.nabble.com> : References: <1343815485386-3998559.p...@n3.nabble.com> : : <1343892838577-3998721.p...@n3.nabble.com> : : <1346155058429-4003689.p...@n3.nabble.com> : : <1346241342637-4003991.p...@n3.nabble.com> : Subject: LateBinding https://p

Re: Maximum index size on single instance of Solr

2012-08-29 Thread Michael Della Bitta
Unfortunately the answer for this can vary quite a bit based on a number of factors: 1. Whether or not fields are stored, 2. Document size, 3. Total term count, 4. Solr version etc. We have two major indexes, one for servicing online queries, and one for batch processing. Our batch index is perf

Re: Problem with copyfield wild card

2012-08-29 Thread Kiran Jayakumar
Thank you Jack. On Wed, Aug 29, 2012 at 12:10 PM, Jack Krupansky wrote: > Alas, copyField does not support full glob. Just like dynamicField, you > can only use * at the start or end of the source field name, but not both. > > -- Jack Krupansky > > -Original Message- From: Kiran Jayakuma

Re: Problem with copyfield wild card

2012-08-29 Thread Jack Krupansky
Alas, copyField does not support full glob. Just like dynamicField, you can only use * at the start or end of the source field name, but not both. -- Jack Krupansky -Original Message- From: Kiran Jayakumar Sent: Wednesday, August 29, 2012 1:41 PM To: solr-user@lucene.apache.org Subjec
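The rule Jack states (a `*` only at the start or at the end of the source name, never both) can be illustrated with a small checker. This is a sketch of the rule as described in the reply, not Solr's actual implementation; the function names and the pattern `*_Misc_*` are hypothetical.

```python
def is_supported_source_pattern(pattern):
    """True if pattern has no '*', or exactly one at the start or end."""
    stars = pattern.count("*")
    if stars == 0:
        return True
    if stars > 1:
        return False
    return pattern.startswith("*") or pattern.endswith("*")


def matches(pattern, name):
    """Match a field name against a supported prefix or suffix pattern."""
    if not is_supported_source_pattern(pattern):
        raise ValueError("unsupported pattern: %r" % pattern)
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return name == pattern


print(is_supported_source_pattern("*_Misc_*"))  # star at both ends
print(matches("Something_*", "Something_Misc_1"))
```

Under this rule, field names like `Something_Misc_1` would need a naming scheme where the shared part is a pure prefix or a pure suffix.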

RE: Cloud assigning incorrect port to shards

2012-08-29 Thread Buttler, David
I think the issue was that I didn't have a solr.xml in the solr home. I was a little confused by the example directory because there are actually 5 solr.xml files % find . -name solr.xml ./multicore/solr.xml ./example-DIH/solr/solr.xml ./exampledocs/solr.xml ./contexts/solr.xml ./solr/solr.xml

Re: Load Testing in Solr

2012-08-29 Thread Aleksey Vorona
On 12-08-29 11:44 AM, dhaivat dave wrote: Hello everyone . Can any one know any component or tool that can be used for testing the solr performance. People were recommending https://code.google.com/p/solrmeter/ earlier. -- Aleksey

Re: Ordering of fields

2012-08-29 Thread Yonik Seeley
In 4.0 you can use the def function with pseudo-fields (returning function results as doc field values) http://wiki.apache.org/solr/FunctionQuery#def fl=a,b,c:def(myfield,10) -Yonik http://lucidworks.com On Wed, Aug 29, 2012 at 2:39 PM, Rohit Harchandani wrote: > Hi all, > Is there a way to s

xinclude and relative files

2012-08-29 Thread Shawn Heisey
I found some discussion saying that starting in 3.1, files that you xinclude with a relative path from something like solrconfig.xml are relative to the location of the file with the xinclude. I use a directory structure where solrconfig.xml in the individual core directory is a symlink to ano

Problem with copyfield wild card

2012-08-29 Thread Kiran Jayakumar
Hi everyone, I have several fields like Something_Misc_1, Something_Misc_2, SomeOtherThing_Misc_1, etc. I have defined a copy field like this: It doesn't capture the misc fields. Am I missing something? Any help is much appreciated. Thanks

Solr and query abortion

2012-08-29 Thread Aleksey Vorona
Hi, we are running Solr 3.6.1 and see an issue in our load tests. Some of the queries our load test script produces result in a huge number of hits. It may go as high as 90% of all documents we have (2.5M). Those are all range queries. I see in the log that those queries take much more time to ex

Re: AW: AW: auto completion search with solr using NGrams in SOLR

2012-08-29 Thread Kiran Jayakumar
You need this for both index and query: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceFilterFactory On Wed, Aug 29, 2012 at 4:55 AM, aniljayanti wrote: > Hi, > > thanks for ur reply, > > I donot know how to remove multiple white spaces using regax in the sear

RE: Injest pauses

2012-08-29 Thread Voth, Brad (GE Corporate)
Interestingly it is not pausing during every commit so at least a portion of the time the async commit code is working. Trying to track down the case where a wait would still be issued. -Original Message- From: Voth, Brad (GE Corporate) Sent: Wednesday, August 29, 2012 12:32 PM To: sol

RE: Solr 4.0 - Join performance

2012-08-29 Thread Eric Khoury
Thanks David, will work around this issue for now, and will keep an eye out for changes to solr-3304. Good luck with the rethink. Eric. > Date: Wed, 29 Aug 2012 08:44:14 -0700 > From: dsmi...@mitre.org > To: solr-user@lucene.apache.org > Subject: Re: Solr 4.0 - Join performance > > The solr.GeoHa

RE: Injest pauses

2012-08-29 Thread Voth, Brad (GE Corporate)
Thanks, I'll continue with my testing and tracking down the block. -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Wednesday, August 29, 2012 12:28 PM To: solr-user@lucene.apache.org Subject: Re: Injest pauses On Wed, Aug 29, 2012 at 1

Re: Injest pauses

2012-08-29 Thread Yonik Seeley
On Wed, Aug 29, 2012 at 11:58 AM, Voth, Brad (GE Corporate) wrote: > Anyone know the actual status of SOLR-2565, it looks to be marked as resolved > in 4.* but I am still seeing long pauses during commits using 4.* SOLR-2565 is definitely committed - adds are no longer blocked by commits (at lea

RE: Injest pauses

2012-08-29 Thread Voth, Brad (GE Corporate)
Does anyone know the actual status of SOLR-2565? It looks to be marked as resolved in 4.*, but I am still seeing long pauses during commits using 4.*. I am currently digging through the code to see what I can find, but with Java not being my primary (or secondary) language, it is mostly slow going. -Or

Hierarchical faceting and filter query exclusions

2012-08-29 Thread Nicholas Swarr
We're using Solr 4.0 Beta, testing the hierarchical faceting support to see if it's a good fit to facet on taxonomies. One issue we've encountered is that we can't apply filter exclusions to the hierarchical facets so as to preserve facet count with multi-select. I haven't been able to locate

Re: Solr 4.0 - Join performance

2012-08-29 Thread David Smiley (@MITRE.org)
The solr.GeoHashFieldType is useless; I'd like to see it deprecated then removed. You'll need to go with unreleased code and apply patches or wait till Solr 4. ~ David On Aug 29, 2012, at 10:53 AM, Eric Khoury [via Lucene] wrote: Awesome, thanks David. In the meantime, could I potentially u

Re: how to boost given word in a field in the query parameters

2012-08-29 Thread andy
I got it, thanks for your kind reply! iorixxx wrote > >> Thanks for your reply, if I insert the clause >> category:206^100 , the search >> result will only include the results in category 206 >> ? > > It will be an optional clause, unless you have set default operator to AND > somewhere. >

RE: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

2012-08-29 Thread Alexander Cougarman
Thanks, Jack. Another solution: Use two instances of Solr on separate ports -- 3.6.1 and 4.0. Use an IF statement to send the file to the proper instance :) Sincerely, Alex Cougarman Bahá'í World Centre Haifa, Israel Office: +972-4-835-8683 Cell: +972-54-241-4742 acoug...@bwc.org -Origi

RE: Solr 4.0 - Join performance

2012-08-29 Thread Eric Khoury
Awesome, thanks David. In the meantime, could I potentially use geohash, or something similar? Geohash looks like it supports separate "lon" or "lat" range queries which would help, but it's not a multivalue field, which I need. > Date: Wed, 29 Aug 2012 07:20:42 -0700 > From: dsmi...@mitre.org

Re: how to boost given word in a field in the query parameters

2012-08-29 Thread andy
Thanks. Yes, my default operator is AND. If I use the OR operator like this: q=cell phone OR category:206^100, there will be more results than for the query q=cell phone, because documents in category 206 that don't contain the cell phone keywords may be included. This is really a puzzle for me. iori

Re: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

2012-08-29 Thread Jack Krupansky
Understood. Well, you could always manually convert old docs to a newer doc format. Or use a tool such as: http://download.cnet.com/Docx-to-Doc-Converter/3000-2079_4-75206386.html -- Jack Krupansky -Original Message- From: Alexander Cougarman Sent: Wednesday, August 29, 2012 9:59 AM T

Re: Solr 4.0 - Join performance

2012-08-29 Thread David Smiley (@MITRE.org)
Solr 4 is certainly the goal. There's a bit of a setback at the moment until some of the Lucene spatial API is re-thought. I'm working heavily on such things this week. ~ David On Aug 28, 2012, at 6:22 PM, Eric Khoury [via Lucene] wrote: David, Solr support for this will come in Solr-3304 I

RE: Injest pauses

2012-08-29 Thread Voth, Brad (GE Corporate)
Very interesting links, after much more digging yesterday this appears to be exactly what I'm seeing. I am using 4.0 beta currently for my testing. FWIW I've also pulled trunk from svn as of yesterday and experienced the same issue. From: Alexey Serba [

Re: Sharing and performance testing question.

2012-08-29 Thread Tiernan OToole
Thanks for the tips! will check out those links and see what i can find! On Wed, Aug 29, 2012 at 9:44 AM, Alexey Serba wrote: > > Any tips on load testing solr? Ideally we would like caching to not > effect > > the result as much as possible. > > 1. Siege tool > This is probably the simplest opt

RE: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

2012-08-29 Thread Alexander Cougarman
I believe these are the older Word 97 docs (*.doc) files. The problem was that Solr 3.6.1 blew up on *.MSG files when doing extractOnly=true. So we upgraded to Solr 4.0, and now run into this; if we use Tika 1.0, I'm afraid the DOC files will be fixed but the MSG files will break! Sincerely, Al

Re: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

2012-08-29 Thread Jack Krupansky
Sounds like this POI bug (SolrCell invokes Tika which invokes POI): https://issues.apache.org/bugzilla/show_bug.cgi?id=53380 Are these in fact Office 97 documents that are failing? Solr 4.0 includes Tika 1.1, while Solr 3.6.1 includes Tika 1.0. It may be possible for you to drop the old Tika 1.

Re: LateBinding

2012-08-29 Thread Alexey Serba
http://searchhub.org/dev/2012/02/22/custom-security-filtering-in-solr/ See section about PostFilter. On Wed, Aug 29, 2012 at 4:43 PM, wrote: > Hello, > > Has anyone ever implementet the security feature called late-binding? > > I am trying this but I am very new to solr and I would be very glad

Unexcpected RuntimeException when indexing with Solr 4.0 Beta

2012-08-29 Thread Alexander Cougarman
Hi. I'm using Solr 4.0 Beta (no modifications to default installation) to index, and it's blowing up on some Word docs: curl "http://localhost:8983/solr/update/extract?literal.id=doc15&commit=true" -F "myfile=@15.doc" Here's the exception. And the same files go through Solr 3.6.1 just fine.

LateBinding

2012-08-29 Thread Johannes . Schwendinger
Hello, Has anyone ever implemented the security feature called late-binding? I am trying this, but I am very new to Solr and would be very glad to get some hints on this. Regards, Johannes

Re: AW: AW: auto completion search with solr using NGrams in SOLR

2012-08-29 Thread aniljayanti
Hi, thanks for your reply. I do not know how to remove multiple white spaces using a regex in the search text. Can you share that with me? Thanks, AnilJayanti -- View this message in context: http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4003991.

Multiple Versions getting formed while replicating in solr 1.4.1

2012-08-29 Thread mechravi25
Hi, Im using solr 1.4.1 version and I have the following configuration for replication in master and slave Solrconfig.xml (master) commit startup schema.xml,stopwords.txt SolrConfig.xml (slave) http://localhost:8982/solr/

Re: how to boost given word in a field in the query parameters

2012-08-29 Thread Ahmet Arslan
> Thanks for your reply, if I insert the clause > category:206^100 , the search > result will only include the results in category 206 > ? It will be an optional clause, unless you have set default operator to AND somewhere. search results will contain all categories, but 206 will be boosted.
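The optional boost clause discussed in this reply could be appended to the user's query on the client. The sketch below only builds the request parameters; the helper `boosted_query` is hypothetical, and it assumes the default operator is OR so that the extra clause stays optional, as the reply describes.

```python
from urllib.parse import urlencode


def boosted_query(user_query, field, value, boost):
    """Append an optional, boosted field clause to the user's query.

    Assumes the default operator is OR, so the clause raises the score
    of matching documents without excluding the others.
    """
    return "%s %s:%s^%d" % (user_query, field, value, boost)


q = boosted_query("cell phone", "category", "206", 100)
print(q)
print(urlencode({"q": q, "start": 0, "rows": 10}))
```

If the default operator is AND, the same string would instead require every clause to match, which is the caveat raised in the reply.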

Antwort: Re: Antwort: Re: refiltering search results

2012-08-29 Thread Johannes . Schwendinger
From: Ahmet Arslan To: solr-user@lucene.apache.org Date: 29.08.2012 10:50 Subject: Re: Antwort: Re: refiltering search results Thanks for the answer. My next question is how I can filter the result, or how to replace the old ResponseBuilder result with a new one? --- On Wed, 8/29/12, johanne

Re: how to boost given word in a field in the query parameters

2012-08-29 Thread andy
Hi iorixxx, Thanks for your reply, if I insert the clause category:206^100 , the search result will only include the results in category 206 ? iorixxx wrote > >> category field has certain values ,for >> examples:307,503,206.. >> >> my query like >> this:q=cell+phone&version=2.2&start=0&rows=

Re: Custom close to index metadata / pass commit data to writer.commit

2012-08-29 Thread Jozef Vilcek
Hi, I just wanted to check whether anyone has an idea about the intentions for this issue: https://issues.apache.org/jira/browse/SOLR-2701 It is marked for 4.0-Alpha and there is already a Beta out there. Can anyone tell whether it is planned to be part of the 4.0 release? Best, Jozef On Sun, Jun 24, 2012 at 1:18 A

Re: how to boost given word in a field in the query parameters

2012-08-29 Thread Ahmet Arslan
> category field has certain values ,for > examples:307,503,206.. > > my query like > this:q=cell+phone&version=2.2&start=0&rows=10&indent=on > the search result will be in many categories ,for example > may be in > 206,782,307,289 > you know the the default sort which depends on the > relevance,

Re: AW: AW: auto completion search with solr using NGrams in SOLR

2012-08-29 Thread Ahmet Arslan
> Hi, > > thanks, > > I tried by adding " marks,  but still giving same > results. > > http://localhost:8080/test/suggest/?q="michael f" Looking back to your field type definition, i saw that you have defined in query analyzer. Move this into index analyzer. Restart solr, re-index and su

Re: Injest pauses

2012-08-29 Thread Alexey Serba
Hey Brad, > This leads me to believe that a single merge thread is blocking indexing from > occuring. > When this happens our producers, which distribute their updates amongst all > the shards, pile up on this shard and wait. Which version of Solr you are using? Have you tried 4.0 beta? * http

how to boost given word in a field in the query parameters

2012-08-29 Thread andy
Hi all, I am a Solr newbie, and I have run into a problem: how to *boost a given word in a field via the query parameters*. The details are as follows. My Solr schema is as follows: category field has certain values, for example: 307, 503, 206.. My query looks like this: q=cell+phone&version=2.2&start=0&

Re: Antwort: Re: refiltering search results

2012-08-29 Thread Ahmet Arslan
--- On Wed, 8/29/12, johannes.schwendin...@blum.com wrote: > From: johannes.schwendin...@blum.com > Subject: Antwort: Re: refiltering search results > To: solr-user@lucene.apache.org > Date: Wednesday, August 29, 2012, 8:22 AM > The main idea is to filter results as > much as possible with so

Re: Sharing and performance testing question.

2012-08-29 Thread Alexey Serba
> Any tips on load testing solr? Ideally we would like caching to not effect > the result as much as possible. 1. Siege tool This is probably the simplest option. You can generate urls.txt file and pass it to the tool. You should also capture server performance (CPU, memory, qps, etc) using tools
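The urls.txt file mentioned above could be generated from a list of sample queries. This is a minimal sketch with assumed names: the select URL, the sample queries and the helper `build_siege_urls` are illustrative only and do not come from the thread.

```python
from urllib.parse import urlencode


def build_siege_urls(select_url, queries, rows=10):
    """Return one request URL per query, suitable for a urls.txt file."""
    return [
        select_url + "?" + urlencode({"q": query, "rows": rows})
        for query in queries
    ]


# Hypothetical endpoint and queries; real ones would come from query logs.
lines = build_siege_urls(
    "http://localhost:8983/solr/select", ["cell phone", "solr"]
)
print("\n".join(lines))
```

Using a wide variety of distinct queries, rather than repeating a few, would reduce how much the caches influence the measured result.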

Solr Shard Replicas sharing files

2012-08-29 Thread Christian von Wendt-Jensen
Hi, I was wondering if it was possible to let all replicas of a shard share the physical Lucene files. In that way you would only need one set of files on a shared storage, and then set up as many replicas as needed without copying files around. This would make it very fast to optimize and rebal

AW: Sharing and performance testing question.

2012-08-29 Thread Markus Klose
Hi Tiernan, check whether SolrMeter fits your needs (http://code.google.com/p/solrmeter/). Best regards from Augsburg, Markus Klose SHI Elektronische Medien GmbH -Original Message- From: Tiernan OToole [mailto:lsmart...@gmail.com] Sent: Tuesday, 28 August 2012 16:52 To: solr-u