Re: Apache Nutch 1.5.1 + Apache Solr 4.0

2012-11-09 Thread John Whelan
Hi, a while back, I had the same 'problem'. After solving it for myself, I built and distributed a combination of Solr and Nutch into a pre-configured environment. While what I did was specific to Windows (I included Cygwin in the distribution, and a bunch of other stuff for easy administration of

Re: Solr4.0 / SolrCloud queries

2012-11-09 Thread Mark Miller
On Nov 9, 2012, at 1:20 PM, shreejay wrote: > Instead of doing an optimize, I have now changed the Merge settings by > keeping a maxBuffer = 960, a merge Factor = 40 and ConcurrentMergePolicy. Don't you mean ConcurrentMergeScheduler? Keep in mind that if you use the default TieredMergePolicy,

customize solr search/scoring for performance

2012-11-09 Thread jchen2000
Hi, we have 20 million short docs (about 60 terms, less than 1k in total bytes each) on each box, and we want to rank results based only on how many terms got matched. In particular we are only interested in the top N with the best scores (say a small number like 5). With some help from the forum users

Re: Collections limit in SolrCloud aka best to use single index, SOLR-1293

2012-11-09 Thread Mark Miller
Have you looked at your logs? I think at around 1000 collections, the clusterstate.json node will become too large for zookeeper by default. It has a default limit of 1MB per node - you should be able to raise/override that limit with a sys prop or something when starting zookeeper. I can't reme

Re: SolrZKClient changed interface

2012-11-09 Thread Mark Miller
Please file a JIRA issue for this change. - Mark On Nov 9, 2012, at 8:41 AM, Trym R. Møller wrote: > Hi > > The constructor of SolrZKClient has changed, I expect to ensure clean up of > resources. The strategy is as follows: > connManager = new ConnectionManager(...) > try { >... > } catc

Re: Error with SolrCloud

2012-11-09 Thread Mark Miller
Yeah, if you want to use a new config set when you dynamically create a new collection, you must first upload the new config set. It's pretty easy using the cloud-scripts/zkcli.sh|bat scripts. If someone likes the idea of being able to point to a new config set to upload when using the collecti

Re: 4.0 query question

2012-11-09 Thread dm_tim
I think I may have found my answer but I'd like additional validation: I believe that I can add a function to my query to get only the highest values of 'file_version' like this - _val_:"max(file_version, 1)" I seem to be getting the results I want. Does this look correct? Regards, Tim -- Vie

Re: Error with SolrCloud

2012-11-09 Thread Carlos Alexandro Becker
Hm, OK, I have just left work; next week I'll try what you suggest and give you feedback. Meanwhile, thank you very much for your help. On Fri, Nov 9, 2012 at 6:30 PM, Tomás Fernández Löbbe wrote: > I thought it was possible to upload a new configuration when creating a new > collection

Re: Error with SolrCloud

2012-11-09 Thread Tomás Fernández Löbbe
I thought it was possible to upload a new configuration when creating a new collection through the Collections API, but it looks like the CREATE action only takes: replicationFactor name collection.configName numShards I think this means that you'll have to use an existing configuration (already u
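As a rough illustration of the CREATE action described in this message, the following Python sketch builds a request URL from the four parameters listed there. The base URL, the collection name, the config set name, and the numeric values are all assumptions made for the example; only the parameter names come from the thread.

```python
from urllib.parse import urlencode


def build_create_url(base_url, name, num_shards, replication_factor, config_name):
    """Build a Collections API CREATE URL using the parameters named in the thread."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # Must refer to a config set that is already available to the cluster.
        "collection.configName": config_name,
    }
    return base_url + "/admin/collections?" + urlencode(params)


# Hypothetical host, collection name, and config set name.
create_url = build_create_url("http://localhost:8983/solr", "mycollection", 2, 1, "myconf")
print(create_url)
```

Building the query string with urlencode rather than by hand avoids stray spaces or unescaped characters in the request.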

4.0 query question

2012-11-09 Thread dm_tim
Howdy, I have a Solr query that is almost perfect: http://localhost:8080/apache-solr-4.0.0/v3_tag_core/select?q=tag%3A%22coat%22%5E4+%22coat%22+cid%3A136+&sort=score+desc&rows=10&fl=id+tag+cid+file_version+lang+score&wt=json&indent=true&debugQuery=true It's grabbing data that includes the fields:
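The request URL in this message is easier to read once its percent-encoding is undone. A minimal Python sketch, using only the standard library and the URL exactly as quoted above, decodes the parameters:

```python
from urllib.parse import parse_qs, urlsplit

# URL copied from the message above.
url = (
    "http://localhost:8080/apache-solr-4.0.0/v3_tag_core/select"
    "?q=tag%3A%22coat%22%5E4+%22coat%22+cid%3A136+"
    "&sort=score+desc&rows=10"
    "&fl=id+tag+cid+file_version+lang+score"
    "&wt=json&indent=true&debugQuery=true"
)

# parse_qs decodes percent-escapes and turns '+' into spaces.
decoded = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}

for key, value in decoded.items():
    print(f"{key} = {value!r}")
```

Decoded, the q parameter is a boosted tag clause, a bare phrase, and a cid clause, followed by a trailing space that comes from the final '+'.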

Re: Error with SolrCloud

2012-11-09 Thread Carlos Alexandro Becker
Hi, about the port, that's my mistake, I have the wrong port specified in solr.xml. But, now, I got the following error: 17:37:10,358 WARN [com.datasul.technology.webdesk.indexer.engine.IndexerSearchEngine] (http--0.0.0.0-8080-6) Fail uptading indexer synonyms/stopwords list. 17:37:10,378 INFO

Re: Error with SolrCloud

2012-11-09 Thread Tomás Fernández Löbbe
Also, JBoss AS uses Tomcat, right? You may want to look at Mark Miller's comments here: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201210.mbox/%3ccabcj++j+am6e0ghmm+hpzak5d0exrqhyxaxla6uutw1yqae...@mail.gmail.com%3E On Fri, Nov 9, 2012 at 4:30 PM, Tomás Fernández Löbbe wrote: > D

Re: Error with SolrCloud

2012-11-09 Thread Tomás Fernández Löbbe
Do you have a stacktrace of the error you are getting? When Zookeeper runs embedded (when you are using -DzkRun), it runs on [solr port]+1000. In the example Jetty, Solr runs at 8983, and so zk runs at 9983, in your case it should be using 9080. Which Solr instance is the one that can't connect to
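The port rule stated in this message (embedded ZooKeeper listens on the Solr port plus 1000) can be written down as a tiny Python sketch. The function name is hypothetical; the offset and the two example ports come from the message.

```python
def embedded_zk_port(solr_port, offset=1000):
    """Return the port embedded ZooKeeper is expected to use, per the rule in the thread."""
    return solr_port + offset


# Example Jetty setup and the JBoss setup discussed in the thread.
print(embedded_zk_port(8983))
print(embedded_zk_port(8080))
```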

Re: Error with SolrCloud

2012-11-09 Thread Carlos Alexandro Becker
Hi Tomás, thanks for your help. I changed the start command to: JAVA_OPTS="-DzkRun -DnumShards=2 -Dbootstrap_conf=true -Xmx2048m -XX:MaxPermSize=512m" ./standalone.sh Then, I tried to add a new core like this: http://localhost:8080/ecm-indexer/admin/collections?action=CREATE&name=2&numShards=2 &boot

Re: Limit the SolR acces from the web for one user-agent?

2012-11-09 Thread Michael Della Bitta
Another option is to use HTTP auth, which would involve modifying web.xml in the Solr WAR and configuring a user in your container. Unfortunately, this won't work with distributed queries. Michael Della Bitta Appinions 18 East 41st Street, 2nd Flo

Re: Splitting data into an array / lookup

2012-11-09 Thread Amit Nithian
Why not just do the join in the DB via your initial query? You'll be executing 1 query per *each* ID in your list which is expensive in your sub-entity. If you just have your query do the joins up front then each row could be a complete (or nearly complete) document? On Thu, Nov 8, 2012 at 9:31 A

Re: Error with SolrCloud

2012-11-09 Thread Tomás Fernández Löbbe
I think you have to use either bootstrap_conf=true or "bootstrap_confdir=/path/to/conf"+"collection.configName=foo" (not both at the same time). If you use the first one, Solr will upload the configuration for all the cores that you have configured (with the name of the core as name of the configur

Re: Solr4.0 / SolrCloud queries

2012-11-09 Thread shreejay
Thanks Erick. I will try optimizing after indexing everything. I was doing it after every batch since it was taking way too long to optimize (which was expected), but it was not finishing the merge into fewer segments (1 segment). Instead of doing an optimize, I have now changed the M

Re: Using AnalyzingQueryParser - Solr 4.0

2012-11-09 Thread Jack Krupansky
Maybe you just want to use the white space tokenizer - the standard tokenizer treats the at-sign as if a space. See: http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/core/WhitespaceTokenizerFactory.html Or, you could use the "classic" tokenizer which does keep ema

Re: custom request handler

2012-11-09 Thread Amit Nithian
Lee, I guess my question was if you are trying to prevent the "big bad world" from doing stuff they aren't supposed to in Solr, how are you going to prevent the big bad world from POSTing a "delete all" query? Or restrict them from hitting the admin console, looking at the schema.xml, solrconfig.x

Re: Error with SolrCloud

2012-11-09 Thread Carlos Alexandro Becker
Actually, I want to use it with multiple cores, and my app dynamically adds cores to solr. So, my solr.xml looks like this: so, my solr.home is jboss.home/solr, which is represented by the dot in instanceDir setting. My solr.home has the following files: conf/ -stopwords.txt --

Re: Error with SolrCloud

2012-11-09 Thread Tomás Fernández Löbbe
Are you sure you are pointing to the correct conf directory? sounds like you are missing the collection name in the path (maybe it should be ../solr/YOURCOLLECTIONNAME/conf?) On Fri, Nov 9, 2012 at 1:58 PM, Carlos Alexandro Becker wrote: > I started my JBoss server with the following command: >

Re: My latest solr blog post on Solr's PostFiltering

2012-11-09 Thread Amit Nithian
Oh weird. I'll post URLs on their own lines next time to clarify. Thanks guys and looking forward to any feedback! Cheers Amit On Fri, Nov 9, 2012 at 2:05 AM, Dmitry Kan wrote: > I guess the url should have been: > > > http://hokiesuns.blogspot.com/2012/11/using-solrs-postfiltering-to-collect

Re: Using AnalyzingQueryParser - Solr 4.0

2012-11-09 Thread balaji.gandhi
Hi Jack, We have an email field defined like this:- A query like [emailAddress : bob*] would match b...@bob.com, but queries

Re: SolrZKClient changed interface

2012-11-09 Thread Per Steffensen
Hi Trym, I believe one of the reasons that they started throwing RuntimeExceptions instead of UnknownHostException, TimeoutException etc. is that the method signature has changed to not have a "throws"-part. They probably do not want to deal with those checked exceptions. I'm not sure I completel

RE: Solr SpellCheck on Query Field

2012-11-09 Thread Dyer, James
What I'm saying is if you specify "spellcheck.maxCollationTries", it will run the suggested query against the index for you and only return valid re-written queries. That is, a misspelled firstname will be replaced with a valid firstname; a misspelled lastname will be replaced with a valid las
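To make the suggestion concrete, here is a hedged Python sketch of a request that turns on collation with a bounded number of verification tries. Only "spellcheck.maxCollationTries" is named in the message; the field names, the misspelled query, the value 5, and the use of "spellcheck" and "spellcheck.collate" alongside it are assumptions for illustration.

```python
from urllib.parse import urlencode

# Hypothetical query with a misspelled first name.
spellcheck_params = {
    "q": "firstname:jonh AND lastname:smith",
    "spellcheck": "true",
    "spellcheck.collate": "true",
    # Number of candidate collations Solr may test against the index.
    "spellcheck.maxCollationTries": 5,
}

query_string = urlencode(spellcheck_params)
print(query_string)
```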

RE: DIH nested entities don't work

2012-11-09 Thread Dyer, James
Here are things I would try: - You need to package the patch from SOLR-2943 in your jar as well as SOLR-2613 (to get the class DIHCachePersistCacheProperties) - You need to specify "cacheImpl", not "persistCacheImpl" - You are correct using "persistCacheName" & "persistCacheBaseDir" , contra the

Distributed Search (shards) not working with /terms request handler

2012-11-09 Thread Daniel Baur
Hi all, I am using the /terms request handler defined in the default configuration with solr 3.6.1: true terms When issuing a normal request to this request handler it is working as expected. However, when I'm trying to issue a distributed search requ

SolrZKClient changed interface

2012-11-09 Thread Trym R. Møller
Hi The constructor of SolrZKClient has changed, I expect to ensure clean up of resources. The strategy is as follows: connManager = new ConnectionManager(...) try { ... } catch (Throwable e) { connManager.close(); throw new RuntimeException(); } try { connManager.waitForConnec

Re: custom request handler

2012-11-09 Thread Lee Carroll
Hi Amit I did not do this via a servlet filter as I wanted the solr devs to be concerned with solr config and keep them out of any concerns of the container. By specifying declarative data in a request handler that would be enough to produce a service uri for an application. Or have I missed a p

Re: sort on wild card query not working in solr 3.6

2012-11-09 Thread Ahmet Arslan
Hi Doug, Retrieval Engines are not designed for deep paging (very large start parameter). https://issues.apache.org/jira/browse/SOLR-1726 And your sort syntax is wrong. &sort:id It should be &sort=id asc --- On Fri, 11/9/12, Doug Kunzman wrote: > From: Doug Kunzman > Subject: sort on wild c
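The corrected sort syntax from this reply can be sketched in Python as follows. The sort value "id asc" comes from the message; the query, start, and rows values are placeholders chosen for the example.

```python
from urllib.parse import urlencode

# sort takes the form "<field> <direction>" and is passed as sort=..., not sort:...
select_params = {
    "q": "name:abc*",   # hypothetical wildcard query
    "sort": "id asc",
    "start": 0,
    "rows": 10,
}

select_query = urlencode(select_params)
print(select_query)
```

The space between the field and the direction is encoded as '+', so the parameter arrives as sort=id+asc.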

Re: Testing Solr Cloud with ZooKeeper

2012-11-09 Thread darul
Yes ku3ia, I read your thread yesterday and it looks like we have the same issue. I hope ApacheCon is nearly finished and an expert can resolve this. Thanks again to the Solr community, Jul -- View this message in context: http://lucene.472066.n3.nabble.com/Testing-Solr-Cloud-with-ZooKeeper-tp4018900p4019271

sort on wild card query not working in solr 3.6

2012-11-09 Thread Doug Kunzman
Hi - We are using SOLR 3.6 and have noticed that when the start parameter is a very large number SOLR's performance is rather slow. After looking at our schema I was hoping to speed up SOLR performance by using a sort order since it could be on an index column. This hasn't worked. I was wonderi

Re: Patch Needed for Issue Solr-3790

2012-11-09 Thread Koji Sekiguchi
(12/11/09 19:20), mechravi25 wrote: Hi All, I'm using Solr 3.6.1. For the issue at the following URL, no patch file is provided: https://issues.apache.org/jira/browse/SOLR-3790 Can you tell me if there is a patch file for it? Also, we noticed that the below url had the c

newSearcher event

2012-11-09 Thread Dzmitry Petrushenka
Hi All! Solr provides support for newSearcher events. But those are dispatched before the new searcher becomes the current one. Is it possible to add some code that would be called whenever the new searcher starts to serve requests? Thanks,

Re: Testing Solr Cloud with ZooKeeper

2012-11-09 Thread ku3ia
Hi, I have nearly the same problems with cloud state; see http://lucene.472066.n3.nabble.com/Replicated-zookeeper-td4018984.html -- View this message in context: http://lucene.472066.n3.nabble.com/Testing-Solr-Cloud-with-ZooKeeper-tp4018900p4019264.html Sent from the Solr - User mailing list archi

Re: Testing Solr Cloud with ZooKeeper

2012-11-09 Thread darul
- Shards : 2 - ZooKeeper Cluster : 3 - One collection. Here is how I run it and my scenario case: In first console, I get first Node (first Shard) running on port 8983: In second console, I get second Node (second Shard) running on port 8984: Here I get just 2 nodes for my 2 shards runn

Patch Needed for Issue Solr-3790

2012-11-09 Thread mechravi25
Hi All, I'm using Solr 3.6.1. For the issue at the following URL, no patch file is provided: https://issues.apache.org/jira/browse/SOLR-3790 Can you tell me if there is a patch file for it? Also, we noticed that the below url had the changes that had to be done to resolve

Re: My latest solr blog post on Solr's PostFiltering

2012-11-09 Thread Dmitry Kan
I guess the url should have been: http://hokiesuns.blogspot.com/2012/11/using-solrs-postfiltering-to-collect.html i.e. without 'and' in the end of it. -- Dmitry On Fri, Nov 9, 2012 at 12:03 PM, Erick Erickson wrote: > It's always good when someone writes up their experiences! > > But when I tr

Re: My latest solr blog post on Solr's PostFiltering

2012-11-09 Thread Erick Erickson
It's always good when someone writes up their experiences! But when I try to follow that link, I get to your "Random Writings", but it tells me that the blog post doesn't exist... Erick On Thu, Nov 8, 2012 at 4:21 PM, Amit Nithian wrote: > Hey all, > > I wanted to thank those who have helped

Re: [SOLR-2549] DIH LineEntityProcessor support for delimited & fixed-width files

2012-11-09 Thread zakaria benzidalmal
Hi James, Yes, it was this parameter that made the request fail. I've edited the patch and added the new version to jira. Thank you. 2012/11/7 Dyer, James > Try specifying the "escape" parameter. This is the character your file > uses to escape delimiters occurring in the data. If this fixe

Re: Solr4.0 / SolrCloud queries

2012-11-09 Thread Erick Erickson
You really should be careful about optimizes, they're generally not needed. And optimizing is almost always wrong when done after every N documents in a batch process. Do it at the very end or not at all. optimize essentially re-writes the entire index into a single segment, so you're copying aroun

Re: NullPointerException when debugQuery=true

2012-11-09 Thread Erick Erickson
If this went away when you made your "id" field into a string type rather than analyzed then it's probably not worth a JIRA... Erick On Thu, Nov 8, 2012 at 11:39 AM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote: > Looks like a bug. If Solr 4.0, maybe this needs to be in JIRA along with

Re: Testing Solr Cloud with ZooKeeper

2012-11-09 Thread Erick Erickson
You have to have at least one node per shard running for SolrCloud to function. So when you bring down all nodes and start one, then you have some shards with no live nodes and SolrCloud goes into a wait state. Best Erick On Thu, Nov 8, 2012 at 6:17 PM, darul wrote: > Is it same issue as one d

RE: Skewed IDF in multi lingual index

2012-11-09 Thread Markus Jelsma
Robert, Tom, That's it indeed! Using maxDoc as numerator opposed to docCount yields very skewed results for an unevenly distributed multi-lingual index. We have one language dominating the other twenty so the dominating language contains no rare terms compared to the others. We're now checking
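The skew described in this message can be illustrated numerically. The Python sketch below assumes the classic Lucene formula idf = 1 + ln(N / (docFreq + 1)) and compares using maxDoc (all documents in the index) against docCount (documents that actually have the field) as N. The index sizes and document frequency are invented for the example.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Classic TF-IDF style idf: 1 + ln(num_docs / (doc_freq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical index: 1,000,000 documents overall, but only 10,000 in a
# minority language, where one term occurs in half of that language's documents.
max_doc = 1_000_000
doc_count = 10_000
doc_freq = 5_000

idf_with_max_doc = classic_idf(doc_freq, max_doc)
idf_with_doc_count = classic_idf(doc_freq, doc_count)

print(f"idf using maxDoc:   {idf_with_max_doc:.3f}")
print(f"idf using docCount: {idf_with_doc_count:.3f}")
```

With maxDoc as the numerator, a term that is very common within the minority language still looks rare, so it is weighted far more heavily than an equally common term in the dominating language.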

RE: Solr SpellCheck on Query Field

2012-11-09 Thread SolrCarinthia
Correct me if I am wrong, but wouldn't collation return alternate terms against the master dictionary field? So if I were to take a collated term and run a query for that term against a specific field (say First Name) I am not guaranteed to get back results since that term could actually have been