SOLR delete some docs when Number of csv Rows are big

2014-01-21 Thread pouya_samie
Hi all, I have a problem with uploading CSV to Solr. I'm using an application that triggers every minute and uploads and commits some CSV data to Solr. When I'm uploading 1000 docs per minute it works fine: I get all my docs in the Solr index (deleted docs: 0), but when I increase the number o

Re: SOLR delete some docs when Number of csv Rows are big

2014-01-21 Thread Alexandre Rafalovitch
Not a hundred percent match to your question, but check what you have defined as unique ID (in solr.xml) and check that you don't have any records with that ID as a duplicate. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitc

Re: SOLR delete some docs when Number of csv Rows are big

2014-01-21 Thread pouya_samie
Thank you, Alexandre. But my ID is a SQL Server unique ID and I know it can't be duplicated. By the way, I'm OK when I'm importing 500 records per minute. -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-delete-some-docs-when-Number-of-csv-Rows-are-big-tp4112404p4112411.htm

Re: Memory Usage on Windows Os while indexing

2014-01-21 Thread onetwothree
Does Solr on a Linux OS have better memory management than on a Windows OS, or can you neglect this comparison?

CloudSolrServer has thread safe issue?

2014-01-21 Thread longsan
Hi, I'm using SolrJ to do some indexing work with the CloudSolrServer class. It's strange: when I start several threads to add documents (each thread adds 10,000 documents), only 10,000 end up indexed. But if I set the number of threads to 1, everything is OK. Even if it'

Re: solr cloud + hdfs issue

2014-01-21 Thread longsan
Thanks. I think it's a good option for me.

Re: SOLR delete some docs when Number of csv Rows are big

2014-01-21 Thread Erick Erickson
This shouldn't be happening; I suspect that you really _do_ have duplicated docs. Perhaps you're running the job a second time? Or somehow your select is selecting the same row multiple times when you go to 7,000? Rather than look at deleted documents, I'd look at the total number of non-deleted d
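Alexandre's and Erick's suggestions both come down to checking the batch itself for repeated unique keys. Below is a minimal sketch in Python, standard library only, that counts repeated IDs in a CSV batch before it is posted. It assumes the file has a header row and that the uniqueKey column is called `id`; that column name is hypothetical, so substitute whatever your schema declares.

```python
import csv
import io
from collections import Counter


def find_duplicate_ids(csv_text, id_column="id"):
    """Return {id: count} for every ID that appears more than once.

    Assumption: the CSV has a header row and the unique key is stored in a
    column named ``id`` (hypothetical; use your schema's uniqueKey field).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[id_column] for row in reader)
    return {doc_id: n for doc_id, n in counts.items() if n > 1}


def expected_live_docs(csv_text, id_column="id"):
    """Documents that should survive if later rows overwrite earlier ones."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return len({row[id_column] for row in reader})


if __name__ == "__main__":
    batch = "id,title\n1,a\n2,b\n2,c\n3,d\n"
    print(find_duplicate_ids(batch))  # {'2': 2}
    print(expected_live_docs(batch))  # 3
```

If `expected_live_docs` comes out smaller than the number of rows, the "missing" documents were overwritten rather than lost, and Solr reports the replaced copies as deleted docs until their segments are merged.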

Re: CloudSolrServer has thread safe issue?

2014-01-21 Thread Erick Erickson
I suspect that each thread is indexing the _same_ 10,000 documents, and any document with the same ID will replace earlier docs with that ID. If that is true, on the admin page you should see numDocs as 10,000 and maxDoc as (number of threads) * 10,000. Best, Erick On Tue, Jan 21, 2014 at 5:37 AM
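Erick's arithmetic can be checked with a toy model of overwrite-by-uniqueKey semantics. This is a simplified Python simulation, not Solr code: it assumes no segment merges happen during the run (merges would purge deleted docs and shrink maxDoc), and it runs the "threads" one after another because the final counts do not depend on how the threads interleave.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Toy model of Solr update semantics (not Solr itself).

    Adding a document whose uniqueKey already exists marks the old copy as
    deleted and appends the new one, so max_doc keeps growing while
    num_docs only counts live documents.
    """
    live: dict = field(default_factory=dict)  # uniqueKey -> document
    max_doc: int = 0                          # live + deleted (no merges)
    deleted: int = 0

    def add(self, doc_id, doc):
        if doc_id in self.live:
            self.deleted += 1  # the old version is only marked deleted
        self.live[doc_id] = doc
        self.max_doc += 1

    @property
    def num_docs(self):
        return len(self.live)


def simulate(threads, docs_per_thread):
    index = ToyIndex()
    for _ in range(threads):
        # Every "thread" sends the same IDs 0 .. docs_per_thread - 1.
        for doc_id in range(docs_per_thread):
            index.add(doc_id, {"id": doc_id})
    return index


if __name__ == "__main__":
    idx = simulate(threads=4, docs_per_thread=10_000)
    print(idx.num_docs, idx.max_doc, idx.deleted)  # 10000 40000 30000
```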

Solr middle-ware?

2014-01-21 Thread Alexandre Rafalovitch
Hello, All the Solr documentation talks about not exposing Solr directly to the cloud. But I see people keep asking for a thin secure layer in front of Solr that they can talk to from JavaScript, perhaps with some basic extension options. Has anybody actually written one? Open source or in a community part

RE: Solr middle-ware?

2014-01-21 Thread Markus Jelsma
Hi - We use Nginx to expose the index to the internet. It comes down to putting some limitations on input parameters and rewriting queries on the fly using embedded Perl scripting. Limitations and rewrites are usually just a bunch of regular expressions, so it is not that hard. Cheers Markus
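Markus's approach (whitelisting input parameters and checking or rewriting them with regular expressions) does not depend on Nginx. Here is the same idea sketched in Python instead of Nginx with embedded Perl; the allowed parameter names, the patterns, and the forced `wt=json` are illustrative assumptions, not a recommended policy.

```python
import re
from urllib.parse import parse_qsl, urlencode

# Hypothetical policy: parameters a public client may send, each with a
# regex its value must fully match. Adjust to your own needs.
ALLOWED = {
    "q": re.compile(r"[\w\s\"'*:-]{1,200}"),
    "start": re.compile(r"\d{1,4}"),
    "rows": re.compile(r"[1-9]\d?"),  # 1..99
}
# Parameters the proxy always sets, overriding anything the client sent.
FORCED = {"wt": "json"}


def sanitize(query_string):
    """Return a rewritten query string, or raise ValueError on a bad value."""
    params = {}
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        pattern = ALLOWED.get(name)
        if pattern is None:
            continue  # drop anything not whitelisted (e.g. qt, stream.url)
        if not pattern.fullmatch(value):
            raise ValueError(f"rejected value for {name!r}: {value!r}")
        params[name] = value
    params.update(FORCED)
    return urlencode(params)


if __name__ == "__main__":
    print(sanitize("q=solr&rows=10&qt=/update"))  # q=solr&rows=10&wt=json
```

Dropping unknown parameters rather than rejecting the request keeps the proxy forgiving toward harmless client noise, while the forced parameters guarantee the response format regardless of what the client asked for.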

Re: Solr middle-ware?

2014-01-21 Thread Alexandre Rafalovitch
Hi Markus, Thanks for the quick reply. I beg to differ that anything involving 'embedded Perl scripting' is an easy suggestion for a random new/intermediate Solr developer to handle. http://xkcd.com/1171/ and all that ;-) Still, I appreciate you sharing your approach, as at least it shows one possible pa

Re: Solr middle-ware?

2014-01-21 Thread Raymond Wiker
We're using Apache with mod_auth_sspi, mod_rewrite and mod_proxy to handle authentication and (limited) parameter validation. On the inside, we have a wrapper process that builds filters for document-level security based on the user's identity/identities and groups, does some more parameter validat
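The document-level security step Raymond describes can be sketched as building an `fq` value from the user's identity and groups. This assumes each document stores the principals allowed to read it in a multi-valued string field, hypothetically named `acl`; the escaping helper backslash-escapes the usual Lucene query-syntax special characters and, in addition, spaces.

```python
def escape_term(value):
    """Backslash-escape Lucene query syntax characters (and spaces) in a term."""
    special = set('+-&|!(){}[]^"~*?:\\/ ')
    return "".join("\\" + ch if ch in special else ch for ch in value)


def security_filter(user, groups, acl_field="acl"):
    """Build an fq that keeps only docs readable by the user or one of the groups.

    Assumption: each document lists its readers in a multi-valued string
    field, hypothetically named ``acl``.
    """
    principals = [user, *groups]
    clauses = " OR ".join(escape_term(p) for p in principals)
    return f"{acl_field}:({clauses})"


if __name__ == "__main__":
    print(security_filter("alice", ["sales", "eu staff"]))
    # acl:(alice OR sales OR eu\ staff)
```

The wrapper would add this `fq` on the server side after validating the client's own parameters, so the client never controls the security filter.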

Re: Solr middle-ware?

2014-01-21 Thread Artem Karpenko
Hello. Not really middleware, but it might be of interest as a possible way of implementing security. We use a custom-built Solr with a web.xml that includes a Spring Security filter, with the appropriate infrastructure classes for authentication added to the project as a dependency. We pass a token from fronten

Re: Memory Usage on Windows Os while indexing

2014-01-21 Thread Toke Eskildsen
On Tue, 2014-01-21 at 10:17 +0100, onetwothree wrote: > Does Solr on a Linux OS have better memory management than on a Windows OS, or > can you neglect this comparison? That is debatable, but in this context you can see them as fairly equal: Out of the box, they will both use all free memory for

Re: Optimizing index on Slave

2014-01-21 Thread Michael Della Bitta
Taking a step back: Are you sure you need to optimize? It might be hurting you more than it helps. Michael Della Bitta Applications Developer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions

Re: solr cloud + hdfs issue

2014-01-21 Thread Greg Walters
> You can configure the Solr client to use a replication factor of 1 for HDFS > and then let Solr replicate for you if you want to avoid this. What is Solr's behavior if the Lucene files underneath it suddenly disappear? Will a core that's running and can't access its files in the case of an HDFS

Interesting search question! How to match documents based on the least number of fields that match all query terms?

2014-01-21 Thread Daniel Shane
I have an interesting Solr/Lucene question, and it's quite possible that some new features in Solr might make this much easier than what I am about to try. If anyone has a clever idea on how to do this search, please let me know! Basically, let's say that I have an index in which each document h
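The question is cut off, but reading the subject line literally, one way to frame it is: for each document, find the smallest set of fields whose combined terms contain every query term, then rank documents by that number. That is a small set-cover problem. The brute-force Python sketch below assumes the terms are already analyzed and that documents have only a handful of fields; it illustrates the ranking idea outside Solr and is not an existing Solr feature.

```python
from itertools import combinations


def min_fields_covering(doc, query_terms):
    """Smallest number of fields whose combined terms contain every query term.

    ``doc`` maps field name -> set of already-analyzed terms. Returns None if
    the document cannot match all terms. Brute force over field subsets.
    """
    needed = set(query_terms)
    fields = list(doc)
    for k in range(1, len(fields) + 1):
        for combo in combinations(fields, k):
            covered = set().union(*(doc[f] for f in combo))
            if needed <= covered:
                return k
    return None


def rank(docs, query_terms):
    """Doc IDs ordered by ascending field count; non-matching docs dropped."""
    scored = []
    for doc_id, doc in docs.items():
        k = min_fields_covering(doc, query_terms)
        if k is not None:
            scored.append((k, doc_id))
    return [doc_id for _, doc_id in sorted(scored)]


if __name__ == "__main__":
    docs = {
        "a": {"title": {"red", "car"}, "body": {"fast"}},
        "b": {"title": {"red", "fast", "car"}, "body": set()},
        "c": {"title": {"red"}, "body": {"boat"}},
    }
    print(rank(docs, ["red", "fast", "car"]))  # ['b', 'a']
```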

How to get phrase recipe working?

2014-01-21 Thread eShard
Good morning, In the Apache Solr 4 Cookbook, p. 112, there is a recipe for setting up phrase searches, like so: I ran a sample query q=text_ph:"a-z index" and it didn't work very well at all. Is there a bet

Re: How to get phrase recipe working?

2014-01-21 Thread Rafał Kuć
Hello, Phrase search will work on any analyzed field if you use " to surround your phrase. Are you looking for something specific or a standard phrase search? Also, if you are looking for exact phrase search, you may want to remove the Snowball filter. Rafał Kuć Original message

Removing a node from Solr Cloud

2014-01-21 Thread Software Dev
What is the process for completely removing a node from Solr Cloud? We recently removed one but it's still showing up as "Gone" in the Cloud admin. Thanks

Re: Search Suggestion Filtering

2014-01-21 Thread Areek Zillur
Regarding LUCENE-5350, the context is the filter, i.e. the context is prefixed to every entry (suggestion) that is in the context. So when users look up the "foo" entry in the context of "bar", the actual lookup is bar(ctx_separator)foo. This filters out entries that match "foo" in another context in the loo
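Areek's description (prefix each entry with its context, then look up context + separator + user input) can be illustrated with a sorted list and a prefix scan. This toy Python version is not the Lucene implementation: the separator character is a hypothetical choice, and results come back in lexicographic order rather than by suggestion weight.

```python
import bisect

CTX_SEP = "\x1f"  # hypothetical separator; must never occur in contexts


class ContextSuggester:
    """Toy prefix suggester storing each entry as context + CTX_SEP + text."""

    def __init__(self, entries):
        # entries: iterable of (context, suggestion) pairs
        self.keys = sorted(ctx + CTX_SEP + text for ctx, text in entries)

    def lookup(self, prefix, context, limit=5):
        key = context + CTX_SEP + prefix
        start = bisect.bisect_left(self.keys, key)  # first key >= lookup key
        results = []
        for stored in self.keys[start:]:
            if not stored.startswith(key) or len(results) == limit:
                break
            results.append(stored[len(context) + 1:])  # strip context + SEP
        return results


if __name__ == "__main__":
    s = ContextSuggester([("bar", "foo"), ("bar", "food"),
                          ("baz", "foot"), ("bar", "zap")])
    print(s.lookup("foo", "bar"))  # ['foo', 'food']
    print(s.lookup("foo", "baz"))  # ['foot']
```

Because the separator terminates the context, a context such as "ba" can never match entries stored under "bar".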

Re: Re: How to get phrase recipe working?

2014-01-21 Thread eShard
Thanks, I'll remove the Snowball filter and give it a try. I guess I'm looking for an exact phrase match to start. (Is that the standard phrase search?) Is there something better or more versatile? Btw, great job on the book!

Setting leaderVoteWait for auto discovered cores

2014-01-21 Thread Software Dev
How is this accomplished? We currently have an empty solr.xml (auto-discovery), so I'm not sure where to put this value.

Re: Removing a node from Solr Cloud

2014-01-21 Thread Software Dev
Thanks. Any way to accomplish this if the machine crashed (i.e., I can't unload it from the admin)? On Tue, Jan 21, 2014 at 11:25 AM, Anshum Gupta wrote: > You could unload the cores. This optionally also deletes the data and > instance directory. > Look at http://wiki.apache.org/solr/CoreAdmin#UNLO

Re: Setting leaderVoteWait for auto discovered cores

2014-01-21 Thread Greg Walters
Allow me to quote Mark via StackOverflow: ** In solr.xml, add a cores attribute of leaderVoteWait=0. It defaults to 180000 (3 minutes). This is simply to protect against starting the cluster with an old node - you don't want it to become the leader before other nodes get to participate in the

Re: Removing a node from Solr Cloud

2014-01-21 Thread Anshum Gupta
You could unload the cores. This optionally also deletes the data and instance directory. Look at http://wiki.apache.org/solr/CoreAdmin#UNLOAD. On Tue, Jan 21, 2014 at 10:22 AM, Software Dev wrote: > What is the process for completely removing a node from Solr Cloud? We > recently removed one bu
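A CoreAdmin UNLOAD call is a plain HTTP GET. The sketch below only builds the URL; the base URL and core name are placeholders, and `deleteDataDir`/`deleteInstanceDir` are the optional cleanup flags of the UNLOAD action in Solr 4.x. It only helps while the node hosting the core is reachable.

```python
from urllib.parse import urlencode


def unload_core_url(base_url, core, delete_data_dir=False, delete_instance_dir=False):
    """Build a CoreAdmin UNLOAD request URL; nothing is sent.

    base_url and core are placeholders for your own node and core name.
    """
    params = {"action": "UNLOAD", "core": core}
    if delete_data_dir:
        params["deleteDataDir"] = "true"
    if delete_instance_dir:
        params["deleteInstanceDir"] = "true"
    return f"{base_url.rstrip('/')}/admin/cores?{urlencode(params)}"


if __name__ == "__main__":
    # To actually send it: urllib.request.urlopen(url) against a live node.
    print(unload_core_url("http://localhost:8983/solr/", "collection1_shard1_replica2",
                          delete_data_dir=True))
```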

RE: Indexing URLs from websites

2014-01-21 Thread Teague James
What I'm getting is just the anchor text. In cases where there are multiple anchors I am getting a comma separated list of anchor text - which is fine. However, I am not getting all of the anchors that are on the page, nor am I getting any of the URLs. The anchors I am getting back never include

RE: Indexing URLs from websites

2014-01-21 Thread Markus Jelsma
Hi, are you getting PDFs at all? Sounds like a problem with URL filters; those also work on the linkdb. You should also try dumping the linkdb and inspecting it for URLs. Btw, I noticed this is on the Solr list; it's best to open a new discussion on the Nutch user mailing list. Cheers Teague James

Re: Solr Cloud Bulk Indexing Questions

2014-01-21 Thread Software Dev
Any other suggestions? On Mon, Jan 20, 2014 at 2:49 PM, Software Dev wrote: > 4.6.0 > > > On Mon, Jan 20, 2014 at 2:47 PM, Mark Miller wrote: > >> What version are you running? >> >> - Mark >> >> On Jan 20, 2014, at 5:43 PM, Software Dev >> wrote: >> >> > We also noticed that disk IO shoots up

Trying to config solr cloud

2014-01-21 Thread svante karlsson
I've been playing around with Solr 4.6.0 for some weeks and I'm trying to get a SolrCloud configuration running. I've installed two physical machines and I'm trying to set up 4 shards on each. I installed a ZooKeeper on each host as well. I uploaded a config to ZooKeeper with /opt/solr-4.6.0/exa

Sorting by a dynamically-generated field in a distributed context

2014-01-21 Thread Andy Crossen
Hi folks, Using Solr 4.6.0 in a cloud configuration, I'm developing a SearchComponent that generates a custom score for each document. Its operational flow looks like this: 1. The score is derived from an analysis of search results coming out of the QueryComponent. Therefore, the component is i

RE: Trying to config solr cloud

2014-01-21 Thread Tim Potter
Hi Svante, It seems like the TermVectorComponent is in the search component chain of your /select search handler but you haven't indexed docs with term vectors enabled (at least from what's in the schema you provided). Admittedly, the NamedList code could be a little more paranoid but I think t

Re: Trying to config solr cloud

2014-01-21 Thread Mark Miller
If that is the case, we could probably use a JIRA issue, Svante. The component should really give a nice user error in this scenario. - Mark On Jan 21, 2014, 8:00:55 PM, Tim Potter wrote: Hi Svante, It seems like the TermVectorComponent is in the search component chain of your /select sear

Re: CloudSolrServer has thread safe issue?

2014-01-21 Thread longsan
Thanks. You are right. It's the key.

Re: Memory Usage on Windows Os while indexing

2014-01-21 Thread Shawn Heisey
On 1/21/2014 2:17 AM, onetwothree wrote: > Does Solr on a Linux OS have better memory management than on a Windows OS, or > can you neglect this comparison? As Toke said, this is indeed debatable. I personally believe that Linux is better at almost everything, but if you're running a recent 64-bi

Re: Removing a node from Solr Cloud

2014-01-21 Thread Shalin Shekhar Mangar
There is a deleteReplica collection admin command added in Solr 4.6 which can be used to remove a node even if it is down. See https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-DeleteaReplica On Wed, Jan 22, 2014 at 1:06 AM, Software Dev wrote: > Thanks. Anyway to
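When the node is already gone, the replicas can be removed through the Collections API instead. A minimal sketch that walks a clusterstate dictionary and builds one DELETEREPLICA URL per replica hosted on the dead node. It assumes the Solr 4.x clusterstate.json layout (collection, then `shards`, then `replicas` keyed by core node name, each with a `node_name` property) and uses placeholder host and collection names; it does not contact ZooKeeper or Solr.

```python
from urllib.parse import urlencode


def delete_replica_urls(base_url, clusterstate, dead_node):
    """Yield a DELETEREPLICA URL for every replica hosted on ``dead_node``.

    ``clusterstate`` is assumed to be shaped like Solr 4.x clusterstate.json:
    collection -> "shards" -> shard -> "replicas" -> coreNodeName -> props.
    Nothing is sent; the URLs are only built.
    """
    for collection, coll in sorted(clusterstate.items()):
        for shard, shard_state in sorted(coll["shards"].items()):
            for replica, props in sorted(shard_state["replicas"].items()):
                if props.get("node_name") == dead_node:
                    params = {"action": "DELETEREPLICA", "collection": collection,
                              "shard": shard, "replica": replica}
                    yield f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    state = {"coll1": {"shards": {"shard1": {"replicas": {
        "core_node1": {"node_name": "10.0.0.1:8983_solr"},
        "core_node2": {"node_name": "10.0.0.2:8983_solr"},
    }}}}}
    for url in delete_replica_urls("http://10.0.0.1:8983/solr", state, "10.0.0.2:8983_solr"):
        print(url)
```

Matching on `node_name` rather than on replica state is deliberate: a crashed node may leave its replicas recorded with a stale state.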

Re: Solr middle-ware?

2014-01-21 Thread Alexandre Rafalovitch
So, everybody so far is exposing Solr directly to the web, but with proxy/rewriting. Which means the html/JS libraries are Solr query-format aware as well? Is anybody using Solr clients (SolrNet, SolrJ) as a base? Regards, Alex.

Re: Memory Usage on Windows Os while indexing

2014-01-21 Thread Jason Hellman
To a very large extent, the capability of a platform is measurable by the skill of the team administering it. If core competencies lie in Windows OS then I would wager heavily the platform will outperform a similar Linux OS installation in the long haul. All things being equal, it’s really hard

Re: Solr middle-ware?

2014-01-21 Thread Raymond Wiker
Speaking for myself, I avoid using "client APIs" like SolrNet, SolrJ and FAST DSAPI for the simple reason that I feel the abstractions they offer are so thin that I may just as well talk directly to the HTTP interface. Doing that also lets me build web applications that maintain their own stat
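Talking directly to the HTTP interface mostly amounts to building a `/select` URL and decoding the JSON response. A minimal Python sketch using only the standard library; the base URL and collection name are placeholders, and only `search` needs a running Solr.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(base_url, collection, q, rows=10, **extra):
    """Build a /select URL asking for a JSON response (wt=json)."""
    params = {"q": q, "rows": rows, "wt": "json", **extra}
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


def parse_select(body):
    """Pull numFound and the docs list out of a wt=json /select response."""
    response = json.loads(body)["response"]
    return response["numFound"], response["docs"]


def search(base_url, collection, q, **kw):
    # Requires a reachable Solr node; not exercised in the example below.
    with urlopen(select_url(base_url, collection, q, **kw)) as resp:
        return parse_select(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(select_url("http://localhost:8983/solr", "collection1", "title:solr", rows=5))
    # http://localhost:8983/solr/collection1/select?q=title%3Asolr&rows=5&wt=json
```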