Fwd: Solr 3.5 result grouping is failing

2012-08-14 Thread chethan
Hi, I'm trying to group (field collapse) my search results on a field called "site". The schema says that it has to be indexed: *.* But when I try to query the results with group.field=site&group.limit=100, I see only 1 group of results being returned. And the group value is null. This seems to

RE: Facet sort numeric values

2012-08-14 Thread Aleksander Akerø
Oh brilliant, didn't think of it being possible to configure that way. Had made my own "untokenized" type, so I guess it would be better for me to control datatype this way. Bonus question (hehe): What if these field values also contain alphanumeric values? E.g. "Alpha, Bravo, Omega, ... " How wo

Re: RAMDirectoryFactory bug

2012-08-14 Thread Lance Norskog
I can't remember the property name, but there is a Solr Java property that tells where to hunt for the data/ directory. You might be able to work around this bug using that property. On Tue, Aug 14, 2012 at 1:34 PM, Michael Della Bitta wrote: > Hi everyone, > > It looks like I found a bug with RA

Re: Custom Jars for a config in the Solr Cloud world..

2012-08-14 Thread Chris Hostetter
: by uploading the configuration information to ZooKeeper. The newest index : needs the UIMA jars. Normally I would put them in the core's /lib : directory, but since I am only accessing my server via ZooKeeper, I : don't have that directory as an option. : : I know I could manually upload the

Duplicated facet counts in solr 4 beta: user error

2012-08-14 Thread Buttler, David
Here are my steps: 1) Download apache-solr-4.0.0-BETA 2) Untar into a directory 3) cp -r example example2 4) cp -r example exampleB 5) cp -r example example2B 6) cd example; java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun

Re: scanned pdf with solr cell

2012-08-14 Thread Jack Krupansky
Some PDFs are purely scanned documents and have only a bitmap image for each page with no text, or sometimes they do a mediocre OCR on the page images which produces a fair amount of garbage in the text. My recollection is that PDFBox is parsing the PostScript for the text layer, which may be ei

scanned pdf with solr cell

2012-08-14 Thread Ahmet Arslan
Hi All, I have a set of rich documents. Some of them are scanned PDF files. When I send a scanned PDF to the extraction request handler, the icon below appears in my Dock. http://tinypic.com/r/2mpmo7o/6 http://tinypic.com/r/28ukxhj/6 Does anyone know what this is? curl "http://localhost:8983/solr/docum

Re: Distributed Searching + unique Ids

2012-08-14 Thread Erick Erickson
Oh, and David Buttler: Please start a new thread when asking unrelated questions, aka don't hijack threads, see: http://people.apache.org/~hossman/#threadhijack Best Erick On Tue, Aug 14, 2012 at 3:55 PM, Buttler, David wrote: > I just downloaded the solr 4 beta, and was running through the tut

Re: Distributed Searching + unique Ids

2012-08-14 Thread Erick Erickson
This shouldn't be happening, but it may well be pilot error. Could you show us the queries that get submitted? Especially add &debug=query and paste the results? Best Erick On Tue, Aug 14, 2012 at 3:55 PM, Buttler, David wrote: > I just downloaded the solr 4 beta, and was running through the tu

Re: Distributed Searching + unique Ids

2012-08-14 Thread Erick Erickson
I still don't see the need to have duplicate documents here. Simply have your indexing process put the data that should be grouped on a shard on that shard. Let the rest of the objects be randomly distributed amongst the shards... Now, your front end has to know that some queries only need to go t

Re: multi-searching problem

2012-08-14 Thread Erick Erickson
You have two problems. 1> the & may or may not be interpreted as a character of significance in the XML, you'd probably need to make it an entity, as in &amp; 2> this information should NOT be in your schema.xml anyway. defType is a request handler parameter, and should be in a request handler in solrconfig.
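A minimal sketch of point 1, assuming the parameter string looked something like the hypothetical `defType=edismax&qf=title` below (the real value is not shown in this thread). Python's standard `xml.sax.saxutils.escape` turns a bare `&` into the `&amp;` entity so the string is safe to embed in an XML config file:

```python
from xml.sax.saxutils import escape

# Hypothetical parameter string; the actual value from the thread is not shown.
raw_params = "defType=edismax&qf=title"

# escape() replaces &, < and > with their XML entities.
xml_safe = escape(raw_params)
print(xml_safe)
```

This only covers the escaping; per point 2, the parameter itself would still belong in a request handler definition rather than in schema.xml.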

Re: SOLR3.6:Field Collapsing/Grouping throws OOM

2012-08-14 Thread Erick Erickson
You're putting a lot of data on a single box, then asking to group on what I presume is a string field. That's just going to eat up a _bunch_ of memory. Let's say your average file name is 16 bytes long. Each unique value will take up 58 + 32 bytes (58 bytes of overhead, I'm presuming Solr 3.X and
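A rough back-of-the-envelope version of that estimate, as a sketch only. The 58-byte overhead and the 16-character average come from the message; the 2 bytes per character is an assumption that would explain the "32", and the number of unique values is purely hypothetical, since the thread does not state it:

```python
AVG_NAME_CHARS = 16          # average file-name length quoted in the message
BYTES_PER_CHAR = 2           # assumption: accounts for the "32" in "58 + 32"
OVERHEAD_BYTES = 58          # per-value overhead quoted in the message
UNIQUE_VALUES = 100_000_000  # hypothetical count, not taken from the thread

# Cost of one unique value of the grouping field.
per_value = OVERHEAD_BYTES + AVG_NAME_CHARS * BYTES_PER_CHAR


def grouping_field_bytes(unique_values: int) -> int:
    """Estimated bytes needed to hold every unique value of the field."""
    return per_value * unique_values


total = grouping_field_bytes(UNIQUE_VALUES)
print(f"{per_value} bytes per value, about {total / 2**30:.1f} GiB in total")
```

Plugging in the real number of unique file names gives a figure to compare against the heap given to Tomcat.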

Re: Update data directory at run time

2012-08-14 Thread Walter Underwood
It will download all changed files on disk. This might be the entire index. If the download succeeds, it will switch to the new index with no downtime. Usually, search is a bit slower for a few minutes because it is starting with new caches. wunder On Aug 14, 2012, at 3:32 PM, Rohit Harchandan

Re: Update data directory at run time

2012-08-14 Thread Rohit Harchandani
Cool. Thanks. I will have a look at this. But in this case, if all the files on the master are new, will the entire index on the slave be replaced or will it add to whatever is currently present on the slave? Thanks again, Rohit On Tue, Aug 14, 2012 at 6:04 PM, Walter Underwood wrote: > Why are y

Re: SOLR 4 Alpha Out Of Mem Err

2012-08-14 Thread sausarkar
Hello Mark, Has this issue been fixed in the BETA release? - Sauvik -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-4-Alpha-Out-Of-Mem-Err-tp3995033p4001266.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Update data directory at run time

2012-08-14 Thread Walter Underwood
Why are you not using the built-in replication? That works fine. You do not need to invent anything. wunder On Aug 14, 2012, at 2:57 PM, Rohit Harchandani wrote: > Hi All, > I am new to Solr and would really appreciate some help on this issue. > I have a single core setup currently and have sep

RE: Solr 4.0 - Join performance

2012-08-14 Thread Eric Khoury
Thanks David, I'm not clear on how the X value of a range pair will help me filter on pairs of start-end times. Can you explain how that'd work? Still, seems like the ability to create subobjects in Solr is a huge feature, I'm hoping it'll eventually make it in. Eric. > From: dsmi...@mitre.org >

Update data directory at run time

2012-08-14 Thread Rohit Harchandani
Hi All, I am new to Solr and would really appreciate some help on this issue. I have a single core setup currently and have separate instances for querying and indexing. These two instances point to different data directories through symbolic links since I do not want it to affect the "live" search

RE: Distributed Searching + unique Ids

2012-08-14 Thread Buttler, David
I just downloaded the solr 4 beta, and was running through the tutorial. It seemed to me that I was getting duplicate counts in my facet fields when I had two shards and four cores running. For example, http://localhost:8983/solr/collection1/browse Reports 21 entries in the facet cat:electronic

Re: Index not loading

2012-08-14 Thread Jonatan Fournier
On Tue, Aug 14, 2012 at 10:25 AM, Erick Erickson wrote: > This is quite odd, it really sounds like you're not > actually committing. So, some questions. > > 1> What happens if you search before you shut > down your tomcat? Do you see docs then? If so, > somehow you're doing soft commits and never

Re: Custom Jars for a config in the Solr Cloud world..

2012-08-14 Thread Eric Pugh
And I can now confirm that yes, ZooKeeper blows up when I attempted to add all the UIMA and content extraction jars to my conf/ directory in ZooKeeper! A couple small jars did upload, and then it started sending back "java.io.IOException: Broken pipe" errors. So any thoughts on the best way to

Re: Solr 4.0 - Join performance

2012-08-14 Thread Smiley, David W.
This one should work for now: https://issues.apache.org/jira/browse/SOLR-3304 If you're comfortable with checking out Lucene/Solr and applying a patch, then you can do it yourself and get it working without any real coding. You'd have to use a dummy constant value for 'y' as you index rectangles

Re: Custom Jars for a config in the Solr Cloud world..

2012-08-14 Thread Jack Krupansky
Dear Eric The Brave, As per the wiki: "znodes are limited to the amount of data that they can have. ZooKeeper was designed to store coordination data: status information, configuration, location information, etc. This kind of meta-information is usually measured in kilobytes, if not bytes. ZooK

Switch from Sphinx to Solr - some basics please

2012-08-14 Thread nnikolay
Hi all, I am switching from Sphinx to Solr right now and I am looking for some basic configuration help and understanding, so I can switch my project in a few weeks. Because I have already posted on Stack Overflow, I don't want to create duplicate questions. Can you please read this post: http://stackove

RAMDirectoryFactory bug

2012-08-14 Thread Michael Della Bitta
Hi everyone, It looks like I found a bug with RAMDirectoryFactory (I know, I know...) It doesn't seem to be able to load files off the disk. Every time it starts up, it logs: WARNING: [] Solr index directory 'solr/./data/index' doesn't exist. Creating new index... Even if that filesystem path ex

RE: Solr 4.0 - Join performance

2012-08-14 Thread Eric Khoury
Thanks David, that does indeed sound like it'll help. Is there an issue number I can use to track development/availability? Eric. > From: dsmi...@mitre.org > To: solr-user@lucene.apache.org > Subject: Re: Solr 4.0 - Join performance > Date: Tue, 14 Aug 2012 20:15:27 + > > Stepping back a bi

Re: Solr 4.0 - Join performance

2012-08-14 Thread Smiley, David W.
Stepping back a bit, the reason you are using multiple cores with a join is because Solr doesn't have a multi-valued numeric range type. The spatial work I'm doing in Lucene-spatial does, and it's 2-dimensional for an x & y whereas your case calls for one dimension. It's taking a bit of time,

Custom Jars for a config in the Solr Cloud world..

2012-08-14 Thread Eric Pugh
I've got a Solr instance with a number of cores that are each configured by uploading the configuration information to ZooKeeper. The newest index needs the UIMA jars. Normally I would put them in the core's /lib directory, but since I am only accessing my server via ZooKeeper, I don't have that

Re: Are there any comparisons of Elastic Search specifically with SOLR 4?

2012-08-14 Thread Bing Hua
Most of the existing comparisons were done on Solr 3.x or earlier against ES. Now that Solr 4 has added cloud concepts similar to ES's, there are really fewer differences. In my opinion, Solr is more heavyweight and was not designed to maximize elasticity. It's not hard to decide which way to go as long

Re: Solr 4.0 - Join performance

2012-08-14 Thread Mikhail Khludnev
Eric, Unfortunately, the Solr guys are ignoring it. On Tue, Aug 14, 2012 at 7:48 PM, Eric Khoury wrote: > > Hi Mikhail, was trying to figure out if solr-3076 made it into the beta, > but since the issue is still marked as opened, I take it it didn't > yet? Thanks, Eric. > > From: mkhlud...@griddynamics.co

Re: offsets issues with multiword synonyms since LUCENE_33

2012-08-14 Thread Michael McCandless
See also SOLR-3390. Some cases have been addressed. Eg, if you match domain name system -> dns, then dns will have correct offsets spanning the full phrase "domain name system" in the input. (However: QueryParser won't work because a query for "domain name system" is pre-split on whitespace so t

Re: Index not loading

2012-08-14 Thread Jonatan Fournier
That's the commit log I'm getting when using commitWithin: Aug 14, 2012 12:53:52 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit{flags=0,version=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true} Aug 14, 2012 12:53:52 PM org.apache.

Re: offsets issues with multiword synonyms since LUCENE_33

2012-08-14 Thread Marc Sturlese
Well, an example would be: synonyms.txt: huge,big size Then I have the docs: 1- The huge fox attacks first 2- The big size fox attacks first Then if I query for huge, the highlights for each document are: 1- The huge fox attacks first 2- The big size fox attacks first The analyzer looks like this

Re: Index not loading

2012-08-14 Thread Jonatan Fournier
On Tue, Aug 14, 2012 at 11:14 AM, Jack Krupansky wrote: > If you send a dummy document using a curl command, without the commit > option, does it auto-commit and become visible in 1 minute? Sending a JSON document using curl: { "add": { "commitWithin": 6, "overwrite": false, "d

Re: offsets issues with multiword synonyms since LUCENE_33

2012-08-14 Thread Jack Krupansky
What is your specific example? There are lots of issues and "gotchas" with synonyms. Is your example exactly identical to the referenced Jira, or merely roughly similar. The exact example is needed to analyze these types of issues. And please be specific about which term in the sequence has an

Re: Select where in select

2012-08-14 Thread Chris Hostetter
: I'm trying to do a query with a select in another. I'm not 100% sure i understand your question, but i think you want take a look at the "join" feature... http://wiki.apache.org/solr/Join -Hoss

Re: Indexing thousands file on solr

2012-08-14 Thread Bing Hua
You may write a client using solrj and loop through all files in that folder. Something like, ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract"); up.addFile(new File(fileLocation), null); ModifiableSolrParams p = new ModifiableSolrParams(); p.add("literal.id", str);

offsets issues with multiword synonyms since LUCENE_33

2012-08-14 Thread Marc Sturlese
Has someone noticed this problem and solved it somehow? (without using LUCENE_33 in the solrconfig.xml) https://issues.apache.org/jira/browse/LUCENE-3668 Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/offsets-issues-with-multiword-synonyms-since-LUCENE-33

RE: Solr 4.0 - Join performance

2012-08-14 Thread Eric Khoury
Hi Mikhail, was trying to figure out if solr-3076 made it into the beta, but since the issue is still marked as opened, I take it it didn't yet? Thanks, Eric. > From: mkhlud...@griddynamics.com > Date: Fri, 3 Aug 2012 00:06:36 +0400 > Subject: Re: Solr 4.0 - Join performance > To: ekhour...@hotmai

Re: Facet sort numeric values

2012-08-14 Thread Chris Hostetter
: I'm having a problem with sorting facets. I am using the facet.sort=index : parameter and it works fine for most of the values. ... : Example, when sorting "15, 6, 23, 7, 10, 90" it sorts like this: "10, 15, : 23, 6, 7, 90", but what I wanted was "6, 7, 10, 15, 23, 90". what field type

Re: Getting Suggestions without Search Results

2012-08-14 Thread Bing Hua
Great comments. Thanks to you all. Bing -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-Suggestions-without-Search-Results-tp4000968p4001192.html Sent from the Solr - User mailing list archive at Nabble.com.

RE: Distributed Searching + unique Ids

2012-08-14 Thread Eric Khoury
Hey Erick, thanks. I was hoping to shard on a very logical boundary for my data, where most queries would only care about data on single shards, and some queries would go to all shards, but that would only work if certain common objects are duplicated across shards. Can you think of another way t

Re: Index not loading

2012-08-14 Thread Jack Krupansky
If you send a dummy document using a curl command, without the commit option, does it auto-commit and become visible in 1 minute? -- Jack Krupansky -Original Message- From: Jonatan Fournier Sent: Tuesday, August 14, 2012 11:03 AM To: solr-user@lucene.apache.org ; erickerick...@gmail.c

Re: Solr Index linear growth - Performance degradation.

2012-08-14 Thread feroz_kh
These are simple search queries, and it's multithreaded. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Index-linear-growth-Performance-degradation-tp4000934p4001184.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Index linear growth - Performance degradation.

2012-08-14 Thread feroz_kh
The queries were extracted from production log. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Index-linear-growth-Performance-degradation-tp4000934p4001182.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Index not loading

2012-08-14 Thread Jonatan Fournier
Hi Erick, On Tue, Aug 14, 2012 at 10:25 AM, Erick Erickson wrote: > This is quite odd, it really sounds like you're not > actually committing. So, some questions. > > 1> What happens if you search before you shut > down your tomcat? Do you see docs then? If so, > somehow you're doing soft commits

RE: multi-searching problem

2012-08-14 Thread Videnova, Svetlana
Ok, thank you, I'll check that. Have a nice day -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Monday, August 13, 2012 17:40 To: solr-user@lucene.apache.org Subject: RE: multi-searching problem --- On Mon, 8/13/12, Videnova, Svetlana wrote: > From: Videnova, Svetl

RE: Dataimport Handler in solr 3.6.1

2012-08-14 Thread Dyer, James
One thing I notice in your configuration...the child entity has this: cacheLookup="ent1.uid" but your parent entity doesn't have a "uid" field. Also, you have these 3 transformers: RegexTransformer,DateFormatTransformer,TemplateTransformer but none of your columns seem to make use of these.

Re: Index not loading

2012-08-14 Thread Erick Erickson
This is quite odd, it really sounds like you're not actually committing. So, some questions. 1> What happens if you search before you shut down your tomcat? Do you see docs then? If so, somehow you're doing soft commits and never doing a hard commit. 2> What happens if, as the last statement in y

Re: Distributed Searching + unique Ids

2012-08-14 Thread Erick Erickson
Don't do this. Many bits of sharding assume that a uniqueKey exists on one and only one shard. Document counts may be off. Faceting may be off. Etc. Why do you want to duplicate records across shards? What benefit is this providing? This feels like an XY problem... Best Erick On Fri, Aug 10, 2

RE: Scope bar?

2012-08-14 Thread Valentin, AJ
Sort of, but I should have mentioned I'm using a Drupal 7 module. So basically, how it's done there. My mistake for not mentioning that I'm using Solr with Drupal 7. -Original Message- From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com] Sent: Tuesday, August 14, 20

Re: Scope bar?

2012-08-14 Thread Michael Della Bitta
You could pretty easily have your scope bar be a selection from a list of canned FilterQueries that you write ahead of time. Does that make sense? Michael Della Bitta Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 www.appinions.com

Scope bar?

2012-08-14 Thread Valentin, AJ
Can anyone help me out? I've tried a number of support paths, but I'm getting nowhere. I need to add a 'Scope Bar' to my Solr search mechanism, so I can let users search within a given Scope instead of 'All'...if they wish. It would be like the Drupal.org search (www.drupal.org

Dataimport Handler in solr 3.6.1

2012-08-14 Thread mechravi25
I am indexing some data using dataimport handler files in Solr 3.6.1. I am using a nested entity in my handler file. I noticed a scenario wherein, instead of only the records that should be fetched for a document, all the records present in the table are indexed. Following is the ideal scenario how the

Facet sort numeric values

2012-08-14 Thread Aleksander Akerø
Hi, I'm having a problem with sorting facets. I am using the facet.sort=index parameter and it works fine for most of the values. But for numeric values the sorting goes a bit off. Example, when sorting "15, 6, 23, 7, 10, 90" it sorts like this: "10, 15, 23, 6, 7, 90", but what I wanted wa
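The order reported above is exactly what you get when the values are compared as strings rather than as numbers. A small illustration in plain Python (not Solr code), using the values from the message:

```python
values = ["15", "6", "23", "7", "10", "90"]

# Comparing as strings goes character by character, so "10" < "15" < "23" < "6".
as_strings = sorted(values)

# Comparing by numeric value gives the order the poster wanted.
as_numbers = sorted(values, key=int)

print(as_strings)
print(as_numbers)
```

So the likely question is whether the facet field is indexed as a string-like type or as a numeric type, which is where the replies in this thread head.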

Re: Indexing thousands file on solr

2012-08-14 Thread Jack Krupansky
The new 4.0 branch has a greatly improved post tool that handles directories for SolrCell. And it will generate the default IDs for you. See: http://wiki.apache.org/solr/ExtractingRequestHandler#SimplePostTool_.28post.jar.29 -- Jack Krupansky -Original Message- From: troya Sent: Mond

Re: SOLR3.6:Field Collapsing/Grouping throws OOM

2012-08-14 Thread Tirthankar Chatterjee
Editing the query...remove I don't know where it came from while I did copy/paste Tirthankar Chatterjee wrote: Hi, I have a beefy box with 24Gb RAM (12GB for Tomcat7 which houses SOLR3.6) 2 Processors Intel Xeon 64 bit Server, 30TB HDD. JDK 1.7.0_03 x64 bit Data Index Dir Size: 400GB

Re: Near Real Time + Facets + Hierarchical Faceting (Pivot Table) with Date Range: huge data set

2012-08-14 Thread Nagendra Nagarajayya
You should try realtime NRT available with Apache Solr 4.0 with RankingAlgorithm 1.4.4, allows faceting in realtime. RankingAlgorithm 1.4.4 also provides an age feature that allows you to retrieve the most recent changed docs in realtime, allowing you to query your huge index in ms. You can

[ANNOUNCE] Apache Solr 4.0-beta released.

2012-08-14 Thread Robert Muir
14 August 2012, Apache Solr™ 4.0-beta available The Lucene PMC is pleased to announce the release of Apache Solr 4.0-beta. Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, fa

Re: Solr Index linear growth - Performance degradation.

2012-08-14 Thread Alexey Serba
>10K queries How do you generate these queries? I.e. is this a single or multi threaded application? Can you provide full queries you send to Solr servers and solrconfig request handler configuration? Do you use function queries, grouping, faceting, etc? On Tue, Aug 14, 2012 at 10:31 AM, feroz_k

Query regarding dataimporthandler

2012-08-14 Thread ravicv
Hi, Is there any way to do intermediate commits while indexing data using the dataimport handler? I am using Solr 1.4. My problem is: sometimes, when indexing a large amount of data (about 4 GB), if a user searches while the commit is in progress after indexing, Solr is throwing

Unexpected function query sort

2012-08-14 Thread Lochschmied, Alexander
strdist() doesn't seem to work for me (using Solr 3.6.0). I use a query like this one: ?fl=mycol &start=0 &rows=20 &q=mycol:abcd OR othercol:abcd &fq=somecol:someval &sort=strdist('abcd',mycol,edit) asc I would have expected all documents with mycol == 'abcd' to be on top of the results (or at t
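One thing worth checking, offered as a hedged guess rather than a confirmed diagnosis: as far as I know, strdist() is built on Lucene's StringDistance implementations, which return a similarity between 0 and 1 where 1 means identical. If that holds for 3.6.0, sorting `asc` would put exact matches last. The sketch below is plain Python, not Solr code; the normalization (1 minus edit distance over the longer length) is my understanding of the Levenshtein variant, and the `mycol` values are hypothetical:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance using a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity: 1.0 for identical strings, 0.0 for nothing shared."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


docs = ["abcd", "abcx", "wxyz"]  # hypothetical mycol values

ascending = sorted(docs, key=lambda v: similarity("abcd", v))
descending = sorted(docs, key=lambda v: similarity("abcd", v), reverse=True)
print(ascending)
print(descending)
```

If the function does behave this way, switching the sort direction to `desc` would be the first thing to try.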

Re: Solr 4.0.0, query, default port not changeable

2012-08-14 Thread Raghav Karol
On Aug 13, 2012, at 6:15 PM, Chris Hostetter wrote: > > : We would like to use multiple jvm's to host solr cores but can not > : because the queries ignore the jetty.port settings. The following is > : the query generated using the admin interface, solr is running in jetty > : under port 808

Re: Indexing thousands file on solr

2012-08-14 Thread Gora Mohanty
On 14 August 2012 09:02, troya wrote: > Hi All, > > I have thousands of files in a folder that I want to index using Solr. > At first I had only 9 to 20 files, so I uploaded them manually into Solr > using curl. > > But now I have thousands of files; how can I index them using Solr? Should I > up

Re: Solr 4.0.0, query, default port not changeable

2012-08-14 Thread Raghav Karol
Hi Jack, Problem is fixed :-) Thanks for the pointer, yes, I tried the suggestion in the email below as well as adding -Djetty.port= in jetty's start.ini file. The problem was the shard and core information was being taken from zookeeper; I have a two host zookeeper ensemble. Deleting zooke