Need Solr 5.0 Support to search and upload doc and indexing.

2015-03-23 Thread rupak
Hi, I am new to Solr and am using the Solr 5.0.0 search server. After installing it, when I search for any keyword in Solr 5.0.0 it does not give any results back. But when I was using a previous version of Solr (1.3.0, previously installed) it returned every result for the queried keyword. For

Re: Securing Solr 5.0.0

2015-03-23 Thread davidphilip cherian
Hi Frederik Arnold, Could you please blog the steps to set this up using Apache as a reverse proxy and share them with the community? On Sun, Mar 22, 2015 at 10:16 PM, Frederik Arnold wrote: > I have and I tried all sorts of things and they didn't work. > But I figured it out now. I setup Apache

Can configName of a collection be change?

2015-03-23 Thread Derek Poh
Hi I created 3 collections (on Lucidworks Fusion). I noticed a difference in configName of the 3 collections. I did not specifically input the configName to use for them but 1 of the collection ('product') is defaulted to use 'myconf' as the configName. collection 'product' = {"configName":"myconf

Re: Block join ordering

2015-03-23 Thread StrW_dev
Hi, No this is unrelated to my issue, I actually made some adjustments to that code part to support some other needs, but at the moment I am not talking about passing scores or something similar. What I want is the order on the parent document. Maybe I can rewrite my query that it produces a s

How to deal with different configurations on different collection?

2015-03-23 Thread Nitin Solanki
Hello, A few days ago I created a collection (wikingram) in Solr 4.10.4 (SolrCloud) by applying the default configuration from collection1. *sudo /mnt/nitin/Solr/solr_lm/example/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir /mnt/nitin/Solr/solr_lm/examp

Re: Can configName of a collection be change?

2015-03-23 Thread Shawn Heisey
On 3/23/2015 2:59 AM, Derek Poh wrote: > I created 3 collections (on Lucidworks Fusion). I noticed a difference > in configName of the 3 collections. > I did not specifically input the configName to use for them but 1 of the > collection ('product') is defaulted to use 'myconf' as the configName. > > c

Re: How to deal with different configurations on different collection?

2015-03-23 Thread Shawn Heisey
On 3/23/2015 4:51 AM, Nitin Solanki wrote: >Few days before, I have created a collection (wikingram) in solr > 4.10.4(Solr cloud) by applying default configuration from collection1. > > *sudo /mnt/nitin/Solr/solr_lm/example/scripts/cloud-scripts/zkcli.sh > -zkhost localhost:9983 -cmd u

Re: Error trying to index files to Solr

2015-03-23 Thread Markus Jelsma
Hello Majisha, Nutch's Solr indexing plugin has support for stripping non-UTF-8 character codepoints from the input, but it does so only on the content field if I remember correctly. However, that stripping method was not built with the invalid middle byte exception in mind, and I have not seen

Re: How to deal with different configurations on different collection?

2015-03-23 Thread Nitin Solanki
Thanks Shawn. It is working now as you said. No need to switch to external ZooKeeper. It is also working in embedded ZooKeeper. On Mon, Mar 23, 2015 at 5:42 PM, Shawn Heisey wrote: > On 3/23/2015 4:51 AM, Nitin Solanki wrote: > >Few days before, I have created a collection (wikingra

Test of MapReduceIndexerTool with Solr 5.0.0 and Hadoop 2.6.0

2015-03-23 Thread Dominique Bejean
Hi, I am trying to adapt Mark Miller's solr-map-reduce-example scripts to use MapReduceIndexerTool with Solr 5.0.0 and Hadoop 2.6.0. I use the same twitter sample data with the same avro configuration, ... I had to change the set-map-reduce-classpath.sh file provided with Solr 5 under s

Indexing data from multiple sites

2015-03-23 Thread Bjørn Axelsen
Hello Solr users! I need suggestions on the best and most bullet-proof way to index data from multiple websites. - different websites, - running on different CMS systems (Drupal, Plone, Sharepoint, Wordpress) etc, - different site owners (somebody else is in control of each of the sites). Curren

Re: How to deal with different configurations on different collection?

2015-03-23 Thread Shawn Heisey
On 3/23/2015 7:19 AM, Nitin Solanki wrote: > Thanks Shawn. It is working now as you said.. No need to switch to > external zookeeper. It is also working in embedded zookeeper Failures happen. This is a reality of computer systems, and planning for that failure is absolutely critical. We can put

Creating facets based on the content field

2015-03-23 Thread phiroc
Hello, let's say that you have indexed hundreds of PDFs using the following curl command: curl -Ss -X POST 'http://mysolr:8990/solr/core0/update/extract?extractFormat=text&wt=json&literal.url=/path/to/the/pdf.pdf' The PDF's contents are now stored in core0's "content" field. I wonder how yo

How to boost records based on score than a custom field rank (double field)

2015-03-23 Thread Umang Agrawal
Hi All How can we boost the Solr records at query time based on score (calculated by the Solr search engine) and then by a custom field rank (a double field available in the record)? I have Solr documents containing fields "id", "title", "headline", "summary", "rank". Where id is string title is stri

Can hdfs be used as one of the sources

2015-03-23 Thread Alind Sinha
Can HDFS be used as one of the multiple sources to index and store the data? I want to install Solr outside HDFS and then use MySQL and HDFS as two sources to search the data. Kindly suggest. Sent from my iPhone

Re: PostFilter does not seem to work across shards

2015-03-23 Thread Kevin Osborn
A little more information here. I have verified that the post filter is giving me only documents that are in the first shard. Running two shards and a single replica in debug mode also shows that the collect method is only called for documents in the first shard. I never see any indication that the

Auto naming replicas via ADDREPLICA

2015-03-23 Thread Shai Erera
Hi I have a Solr cluster started (all programmatically) with one Solr node, one collection and one shard. I set the replicationFactor to 1. The name of the result core was set to mycollection_shard1_replica1. I then start a second Solr node and issue an ADDREPLICA command as described in the refe

Re: Creating facets based on the content field

2015-03-23 Thread Erik Hatcher
Philippe - can you provide a concrete example of what you mean by creating facets on field’s content? Or maybe rather, what’s missing from doing &facet.field=content currently? Erik > On Mar 23, 2015, at 10:48 AM, phi...@free.fr wrote: > > Hello, > > let's say that you haved indexed

SOLR 5.0.0 and Tomcat version ?

2015-03-23 Thread Vishal Swaroop
Hi, We are planning to configure new linux server for latest SOLR release i.e. 5.0.0 Please suggest which Tomcat version will be best compatible with SOLR5... latest Tomcat release is 8.0 Thanks

Re: SOLR 5.0.0 and Tomcat version ?

2015-03-23 Thread Aman Tandon
Hi Vishal, I am not aware of which version of Tomcat will suit best. But I would suggest using Solr as it is, because after a few more releases Solr will not be able to run in an application server. So it's good to use it as it is (without an application server) when you are redesigning the structure ag

Re: SOLR 5.0.0 and Tomcat version ?

2015-03-23 Thread Karl Kildén
Just curious, what will be done that is incompatible with servlet containers? On 23 March 2015 at 16:50, Aman Tandon wrote: > Hi Vishal, > > I am not aware of which version of tomcat will suit best. But I will > suggest to use Solr as it is, because after few more release solr will not > be ab

Re: Creating facets based on the content field

2015-03-23 Thread phiroc
Let's say that one pdf has the following contents: "[thousands of characters] blablabla Churchill blablabla [thousands of text characters]" ... and another PDF contains: "[thousands of characters] blablabla Gandhi [thousands of characters] Churchill blablabla [thousands of text characters]" A

Re: Auto naming replicas via ADDREPLICA

2015-03-23 Thread Shawn Heisey
On 3/23/2015 9:27 AM, Shai Erera wrote: > I have a Solr cluster started (all programmatically) with one Solr node, > one collection and one shard. I set the replicationFactor to 1. The name of > the result core was set to mycollection_shard1_replica1. > > I then start a second Solr node and issue a

Re: PostFilter does not seem to work across shards

2015-03-23 Thread Kevin Osborn
I think I found my issue. It has nothing to do with the post filter. In the constructor of my post filter, I am doing a TermQuery to get a single user document. I then later intersect this user's permissions with the collected documents. So, if the user document is in the shard that I am filtering

Re: Creating facets based on the content field

2015-03-23 Thread Alexandre Rafalovitch
I think you are over-complicating this before actually trying it. If you index your texts and tokenize them to have individual words then "facet.field=content" will actually give you the list of words sorted by their occurrence count. That's what facet will do. A bigger problem is - from your examp
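
As a rough illustration of the suggestion above, the sketch below builds a facet request against the "content" field. The host, port and core name are the ones from the original question (mysolr:8990, core0); the rows=0, facet.limit and wt=json settings and the helper name facet_query_url are assumptions added for this example and do not come from the thread.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, field, limit=20):
    """Build a select URL that returns only facet counts for `field`.

    `limit` caps the number of facet values returned (assumed value).
    """
    params = [
        ("q", "*:*"),            # match every document
        ("rows", 0),             # skip the documents, keep only facet counts
        ("facet", "true"),       # switch faceting on
        ("facet.field", field),  # field whose indexed terms are counted
        ("facet.limit", limit),
        ("wt", "json"),
    ]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Core URL taken from the question; adjust for a real installation.
url = facet_query_url("http://mysolr:8990/solr/core0", "content")
print(url)
```

Keeping facet.limit small is one way to bound the response size when the field holds many distinct terms.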

Query in Solr plugin across shards

2015-03-23 Thread Kevin Osborn
I have created a PostFilter. PostFilter creates a DelegatingCollector, which provides a Lucene IndexSearcher. However, I need to query for an object that may or may not be located on the shard that I am filtering on. Normally, I would do something like: searcher.search(new TermQuery(new Term("fi

Re: Creating facets based on the content field

2015-03-23 Thread Charlie Hull
On 23/03/2015 16:08, phi...@free.fr wrote: Let's say that one pdf has the following contents: Aren't you thinking of Named Entity Recognition? We've used Stanford NLP for this in the past and it's quite good at People, Places and Organisations out of the box (needs tuning for other classes of

Re: Creating facets based on the content field

2015-03-23 Thread phiroc
I reindexed the PDFs without specifying facets and they "magically" appeared in facets.vm! Many thanks! - Original Message - From: "Alexandre Rafalovitch" To: "solr-user" Sent: Monday, 23 March 2015 17:23:40 Subject: Re: Creating facets based on the content field I think you are over-complic

Re: Creating facets based on the content field

2015-03-23 Thread phiroc
I just want a list of recurring words (for now.) I removed the manually-created facets from solrconfig.xml and SOLR "automagically" created a facet list for me. But thanks for your suggestions. - Original Message - From: "Charlie Hull" To: solr-user@lucene.apache.org Sent: Monday, 23 March

document contained more than 100000 characters

2015-03-23 Thread Srinivas
Hi, In my project we are currently using Apache Tika to read file metadata. Whenever we handle large files (files containing more than 100000 characters), Tika reports an error saying that the file contained more than 100000 characters. Is it possible to handle large files using Tika? Pl

Re: Auto naming replicas via ADDREPLICA

2015-03-23 Thread Shai Erera
Shawn, that was a great tip! When I tried the URL, the core was named as expected (mycollection_shard1_replica2). I then compared the URLs as reported in the logs, and I believe I found the bug: SolrJ: [admin] webapp=null path=/admin/collections params={shard=shard1& *name=mycollection*&action=AD

Re: Need Solr 5.0 Support to search and upload doc and indexing.

2015-03-23 Thread Erick Erickson
bq: Then when we are going to execute $ bin/post -c gettingstarted example/exampledocs/*.json in cmd prompt it fires some errors like "'post' is not recognized This usually means you're not firing the query from the proper directory. Are you in the solr parent of "bin"? I'm really unclear where y

Re: SOLR 5.0.0 and Tomcat version ?

2015-03-23 Thread Erick Erickson
There will be no war file distributed for a start Best, Erick On Mon, Mar 23, 2015 at 9:04 AM, Karl Kildén wrote: > Just curious, what will be done that is incompatible with servlet > containers? > > > > On 23 March 2015 at 16:50, Aman Tandon wrote: > >> Hi Vishal, >> >> I am not aware of w

Re: How to boost records based on score than a custom field rank (double field)

2015-03-23 Thread Erick Erickson
Function queries are built for this, see: https://cwiki.apache.org/confluence/display/solr/Function+Queries, the _val_ trick (that's underscore-val-underscore in case italics happen). You can also wrap that value in any of the other functions. Best, Erick On Mon, Mar 23, 2015 at 7:58 AM, Umang A
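
A minimal sketch of the _val_ approach mentioned above, assuming the standard query parser. The field names "title" and "rank" come from the original question; the search term "solr", the fl list and the helper name boosted_query are hypothetical and only serve the example.

```python
from urllib.parse import urlencode


def boosted_query(user_query, function):
    """Append a _val_ clause so the function's value contributes to the score."""
    return '{} _val_:"{}"'.format(user_query, function)


# "rank" is the double field from the question; "title:solr" is a made-up query.
params = {
    "q": boosted_query("title:solr", "rank"),
    "fl": "id,title,rank,score",  # return the score so the effect is visible
    "wt": "json",
}
query_string = urlencode(params)
print(params["q"])
print(query_string)
```

The second argument can be any function query expression, so the raw field value could be wrapped in another function if it would otherwise dominate the relevance score.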

Re: PostFilter does not seem to work across shards

2015-03-23 Thread Erick Erickson
Whew! Thanks for bringing this to closure! Best, Erick On Mon, Mar 23, 2015 at 9:18 AM, Kevin Osborn wrote: > I think I found my issue. It has nothing to do with the post filter. In the > constructor of my post filter, I am doing a TermQuery do get a single user > document. I then later intersect t

Re: Query in Solr plugin across shards

2015-03-23 Thread Erick Erickson
How much information do you need from this document? If it's a reasonably small amount, can you read it at the application layer and attach it as a set of parameters to the query that are then available to the post filter? Or is it a huge ACL list or something? In this latter case, if you know

Re: Creating facets based on the content field

2015-03-23 Thread Erick Erickson
Be a little careful here about memory. Faceting on high-cardinality fields is a very good way to encounter OOM and/or performance problems. But you're right, in Solr faceting is a query-time construct, it needs nothing at index time. The NLP stuff can help narrow down the number of unique values i

Re: document contained more than 100000 characters

2015-03-23 Thread Alexandre Rafalovitch
Apache Tika has its own mailing list. You may have better luck asking this question there. If you then need help adapting it in Solr context, come back. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 23 March 2015 at 05:08,

Re: schemaless slow indexing

2015-03-23 Thread Alexandre Rafalovitch
I looked at SOLR-7290, but I think the discussion should stay on the mailing list for at least one more iteration. My understanding is that the reason copyField exists is so that a search actually works out of the box. Without knowing the field names, one cannot say what to search. So, the copyField

Re: schemaless slow indexing

2015-03-23 Thread Yonik Seeley
On Mon, Mar 23, 2015 at 1:54 PM, Alexandre Rafalovitch wrote: > I looked at SOLR-7290, but I think the discussion should stay on the > mailing list for at least one more iteration. > > My understanding that the reason copyField exists is so that a search > actually worked out of the box. Without k

Re: Query in Solr plugin across shards

2015-03-23 Thread Kevin Osborn
Thanks. It is a fairly large ACL, so I am hoping to avoid any sort of application redirect. That is sort of the problem we are trying to solve actually. Our list was getting too large and we were maxing out maxBooleanClauses. And I don't know which shard the user document is located on, just its

Re: SOLR 5.0.0 and Tomcat version ?

2015-03-23 Thread Adnan Yaqoob
Erick, Any specific reason for going away from war file? Adnan On Mon, Mar 23, 2015 at 12:35 PM, Erick Erickson wrote: > There will be no war file distributed for a start > > Best, > Erick > > On Mon, Mar 23, 2015 at 9:04 AM, Karl Kildén > wrote: > > Just curious, what will be done that is

Re: SOLR 5.0.0 and Tomcat version ?

2015-03-23 Thread Vishal Swaroop
a) Does this mean that Solr 5 cannot be deployed on Tomcat, or that it is not worth doing? Regards Vishal On Mon, Mar 23, 2015 at 2:27 PM, Adnan Yaqoob wrote: > Erick, > Any specific reason for going away from war file? > > Adnan > > On Mon, Mar 23, 2015 at 12:35 PM, Erick Erickson > wrote: > > > There

Re: schemaless slow indexing

2015-03-23 Thread Alexandre Rafalovitch
Yonik, those are all facts, which I do not disagree with at all. But there are also consequences when you bring the rest of the facts and the assumptions and documented workflows into play. My comment was trying to address the situation on that level. I am all for improving performance. I am just

Re: SOLR 5.0.0 and Tomcat version ?

2015-03-23 Thread Alexandre Rafalovitch
There is a very long chain of discussions both on Development mailing list and on JIRA about not supporting 3rd party containers. It's very hard to summarize well. Please check those discussions for details. But the end result is: it is what it is. Solr 5+ is coming in as a black box and will cont

Re: SOLR 5.0.0 and Tomcat version ?

2015-03-23 Thread Erick Erickson
The argument is that "you don't run a SQL engine from a servlet container, why should you run Solr that way?" Currently 5.x does have a war file in the webapps directory you can use. As Alexandre says, though, people are being encouraged to move away from that model and use the script way of doing

Custom updateProcessor for purpose of extracting interesting terms at index time

2015-03-23 Thread Ali Nazemian
Dear All, Hi, I wrote a custom updateProcessorFactory for the purpose of extracting interesting terms at index time and putting them in a new field. Since I use MLT interesting terms for this purpose, I have to check whether the added document exists in the index or not. If it was indexed before the

Re: Custom updateProcessor for purpose of extracting interesting terms at index time

2015-03-23 Thread Alexandre Rafalovitch
So, for a new document. You want to index the document, then read it, then add keywords and index again? This does sound like an infinite loop. Not sure there is a solution for this approach. You sure you cannot do it like spell checker does with compiling a side-car index on commit? Or even with

Re: Custom updateProcessor for purpose of extracting interesting terms at index time

2015-03-23 Thread Ali Nazemian
Dear Alex, Hi, I am not sure what the best way of doing such a process would be. Would you please provide me with a detailed example of doing that on commit, like the spell checker that you mentioned? Is it possible to do that using a custom analyzer on a copy field? In order to use MLT interesting

Re: How to use ConcurrentUpdateSolrServer for Secured Solr?

2015-03-23 Thread Mark Miller
Doesn't ConcurrentUpdateSolrServer take an HttpClient in one of its constructors? - Mark On Sun, Mar 22, 2015 at 3:40 PM Ramkumar R. Aiyengar < andyetitmo...@gmail.com> wrote: > Not a direct answer, but Anshum just created this.. > > https://issues.apache.org/jira/browse/SOLR-7275 > On 20 Mar

How To Remove an Alert

2015-03-23 Thread jack.met...@hp.com
Hello, I have a problem: I just created an alert but I set the threshold too low. Is there a way to edit or remove the alert?

Re: Securing Solr 5.0.0

2015-03-23 Thread Frederik Arnold
Sure! You have to install mod_proxy for Apache and activate it. And then I put a file with the following content in /etc/apache2/conf.d: ProxyRequests Off ProxyPreserveHost Off AddDefaultCharset off Order deny,allow Allow from all ProxyPass /solrsearch http://localhost:8983/solr/s

How to remove an Alert

2015-03-23 Thread jack.met...@hp.com
Hello, I have a problem with [ ... briefly describe your problem here ... ] [ ... insert additional info here - keep it short and to the point ... ] Below are some SPM graphs showing the state of my system. Here's the 'Threads' graph: https://apps.sematext.com/spm-reports/s/aFUIR1fecb

Re: Creating facets based on the content field

2015-03-23 Thread Philippe de Rochambeau
Hi Erick, can you use NLP for query-time faceting? How? Moreover, can you use it to find keyword patterns? Cheers, Philippe > On 23 March 2015 at 18:44, Erick Erickson wrote: > > Be a little careful here about memory. Faceting on high-cardinality > fields is a very good way to encounter OOM a

Difference in indexing using config file vs client i.e SolrJ

2015-03-23 Thread Purohit, Sumit
Hi All, I have recently started working with Solr and I have a trivial question to ask, as I could not find a suitable answer. A document's indexes can be defined in a config file (such as schema.xml) and on the fly using a Solr client such as SolrJ. 1. What is the difference in indexes creat

Re: How To Remove an Alert

2015-03-23 Thread Erick Erickson
What product? What alert? This doesn't sound like straight Solr. There is zero context here to help us help you... Please review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Mon, Mar 23, 2015 at 1:37 PM, jack.met...@hp.com wrote: > Hello, > > I have a problem I just created a

Re: Creating facets based on the content field

2015-03-23 Thread Erick Erickson
I wasn't talking about using NLP at query time. I was trying to convey that perhaps NLP processing on documents at _index_ time could reduce the number of distinct tokens you then facet over at query time. The basic caution still remains, faceting on high-cardinality fields is expensive, it's just

Re: schemaless slow indexing

2015-03-23 Thread Steve Rowe
> On Mar 23, 2015, at 11:51 AM, Alexandre Rafalovitch > wrote: > For example, I am not even sure if we can create a copyField > definition via REST API yet.

RE: Creating facets based on the content field

2015-03-23 Thread Markus Jelsma
Hi - trying to extract entities for facets or whatever using IDF is bad at best. MLT works well because of scoring, not for entity extraction, because it doesnt extract entities. The OpenNLP Lucene filters do what you need, but it depends on the model you built. The freely available maxent model

Re: Difference in indexing using config file vs client i.e SolrJ

2015-03-23 Thread Erick Erickson
1> Either none or lots, depending;). You're talking "schemaless" here I think. schemaless mode guesses what the field should be based on the document and creates a field in the doc. pre-defined schemas require you to make that decision up front. So in terms of what the underlying index looks like

Re: schemaless slow indexing

2015-03-23 Thread Steve Rowe
> On Mar 23, 2015, at 11:09 AM, Yonik Seeley wrote: > > On Mon, Mar 23, 2015 at 1:54 PM, Alexandre Rafalovitch > wrote: >> I looked at SOLR-7290, but I think the discussion should stay on the >> mailing list for at least one more iteration. >> >> My understanding that the reason copyField exist

Re: How To Remove an Alert

2015-03-23 Thread Otis Gospodnetic
Hi, I think this may have been for Sematext SPM for Solr monitoring and Jack got our help a few hours ago. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On Mon, Mar 23, 2015 at 7:

SolrFaceting -Help improving suggestions tag

2015-03-23 Thread MKGoose
Whenever a term is searched, we display the results and related tags. (We provide tags to each post using pre defined tags library) We would like to improve the tag suggestions to the user with a mix of relevancy and facet count In the current implementation, related tags are shown based on facet

Re: Can configName of a collection be change?

2015-03-23 Thread Derek Poh
Hi Shawn I understand what you mean on checking back with the vendor. Thank you for your explanation from the perspective of pure Solr. -Derek On 3/23/2015 8:08 PM, Shawn Heisey wrote: On 3/23/2015 2:59 AM, Derek Poh wrote: I created 3 collections (on Lucidworks Fusion). I noticed a differen

Re: Need Solr 5.0 Support to search and upload doc and indexing.

2015-03-23 Thread rupak
Hi Erick Erickson, I am executing the post script from the path "C:\Users\Desktop\solr-5.0.0\bin\" and I am trying to run it on the example docs. Can you tell me whether I am doing this right? If not, please tell me the correct way. One more thing, that I am receivi

Re: Need Solr 5.0 Support to search and upload doc and indexing.

2015-03-23 Thread Erick Erickson
Um, if the directory you're in has the "post" executable, how do you expect to path to bin/post? You're already _in_ the bin directory and this would be looking for a subdirectory called "bin" that contained the post tool. So back up one directory to the parent of 'bin' and try it again would
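
The path problem described above can be sketched as follows. The install directory is the one quoted in the previous message; the helper name resolve_relative is hypothetical, and the snippet only joins paths to show where the relative path bin/post would point from each working directory. It does not touch the file system or run the post tool.

```python
from pathlib import PureWindowsPath


def resolve_relative(cwd, relative):
    """Return the location that `relative` refers to when used from `cwd`."""
    return PureWindowsPath(cwd) / relative


# Install directory as quoted in the thread.
install_dir = r"C:\Users\Desktop\solr-5.0.0"

# From inside bin, "bin/post" points at a nested bin\bin\post.
from_bin = resolve_relative(install_dir + r"\bin", "bin/post")

# From the parent of bin, "bin/post" points at the intended location.
from_parent = resolve_relative(install_dir, "bin/post")

print(from_bin)
print(from_parent)
```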

Unable to setup solr cloud with multiple collections.

2015-03-23 Thread sthita
I have created a new collection and activated replication for 4 nodes (including masters). After making the config changes suggested at http://wiki.apache.org/solr/SolrReplication, the nodes of the newly created collections are down on sol

DIH debugging

2015-03-23 Thread Midas A
How can I debug in DIH? Query time ..