Re: Lucene cosine similarity score for more like this query

2015-02-03 Thread Koji Sekiguchi
Lucene uses the TFIDFSimilarity class to calculate similarity. It is based on the idea of cosine similarity, but it modifies the cosine formula. Please take a look at "Lucene Practical Scoring Function" in the following Javadoc: http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/
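
To make the difference concrete, here is a minimal Python sketch (not Lucene code) that puts plain cosine similarity next to the tf and idf factors documented for Lucene 4.x's DefaultSimilarity. The function names are made up for illustration. The full practical scoring function also multiplies in coord, queryNorm, boosts and length norms, so the final Lucene score is not confined to the 0..1 range that cosine gives for non-negative weights.

```python
import math

def cosine_similarity(a, b):
    """Plain cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def lucene_tf(freq):
    """Term-frequency factor as documented for DefaultSimilarity: sqrt(freq)."""
    return math.sqrt(freq)

def lucene_idf(doc_freq, num_docs):
    """Inverse-document-frequency factor: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))
```

With non-negative weights, cosine_similarity always lands between 0 and 1, which is why a fixed threshold is meaningful there and much less so for raw Lucene scores.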

Re: Lucene cosine similarity score for more like this query

2015-02-03 Thread Ali Nazemian
Dear Koji, Thank you very much. Do you know what the range of the score is in this new formula? What is a reasonable threshold for considering two documents similar enough under this formula? Regards. On Tue, Feb 3, 2015 at 1:35 PM, Koji Sekiguchi wrote: > Lucene uses TFIDFSimilarity class to calc

Stats calculation of existInDoc on multivalue fields which are doc valued

2015-02-03 Thread Elran Dvir
Hi all, I uploaded a patch (https://issues.apache.org/jira/browse/SOLR-5972) that contains a new statistics result for a field - existInDoc. It returns the number of documents in which the field has a value (not missing). This patch is based on Solr 4.4. For multivalue fields there is a calculat

Re: shell script or script in any language to scale a replica solr node with some configs from zookeeper and the remaining from svn/git

2015-02-03 Thread Rajesh Hazari
We have already started using this toolkit and have explored it completely. Do we have any sample script in Python to get the config file or other files from svn and deploy them in Tomcat? *Thanks,* *Rajesh**.* On Mon, Feb 2, 2015 at 3:32 PM, Anshum Gupta wrote: > Solr scale toolkit should be a go

Re: Delete By query on a multi-value field

2015-02-03 Thread Jean-Sebastien Vachon
Hi Lokesh, thanks for the information. I forgot to mention that the system I am working on is still using 3.5 so I will probably have to reindex the whole set of documents. Unless someone knows how to get around this... From: Lokesh Chhaparwal Sent:

Score results by only the highest scoring term

2015-02-03 Thread Burgmans, Tom
Hi All, I wonder if it's in some way possible to search for multiple terms like: ( OR OR OR ) and in case a document contains 2 or more of these terms: only the highest scoring term should contribute to the final relevancy score; possibly lower scoring terms should be discarded from the sco

RE: Score results by only the highest scoring term

2015-02-03 Thread Markus Jelsma
Either use the MaxScoreQueryParser [1] or set tie to zero when using a DisMax parser. [1]: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-MaxScoreQueryParser -Original message- > From:Burgmans, Tom > Sent: Tuesday 3rd February 2015 16:13 > To: solr-user@
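
As a rough illustration of why tie=0 gives the requested behaviour: a disjunction-max query scores a document as the best clause score plus tie times the sum of the remaining clause scores. The Python sketch below models only that arithmetic; the function name and the sample scores are invented for the example.

```python
def dismax_score(clause_scores, tie=0.0):
    """Combine per-clause scores the way a disjunction-max query does.

    The best clause always counts in full; every other matching clause
    contributes only `tie` times its score.
    """
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    return best + tie * (sum(clause_scores) - best)
```

With tie=0.0 only the highest-scoring term contributes; with tie=1.0 the result is the plain sum of all matching clauses.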

Re: Reverse deep paging

2015-02-03 Thread tedsolr
Oh, I know I have problems! My (b) option of reversing sort and using the current cursor mark is not working. It gets off by one record. paging forward: pg 1: docs 1-10 pg 2: docs 11-20 pg 3: docs 21-30 now paging backwards: pg 2: docs 10-19 I'll go back to tracking all the cursor marks. --

MoreLikeThis filter by score threshold

2015-02-03 Thread Ali Nazemian
Hi, I was wondering how I can limit the results of a MoreLikeThis query by score value instead of by document count. Thank you very much. -- A.Nazemian

RE: MoreLikeThis filter by score threshold

2015-02-03 Thread Markus Jelsma
Hi - sure you can, using the frange parser as a filter: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-FunctionRangeQueryParser http://lucene.apache.org/solr/4_10_3/solr-core/org/apache/solr/search/FunctionRangeQParserPlugin.html But this is very much not recommended,
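
For reference, a hedged sketch of what such a filter can look like on the request side, written in Python only to show the parameters. It assumes the similarity query is sent as the main q parameter, so that query($q) evaluates to its score; the helper name and the 0.5 floor are illustrative, and the caveat above about filtering by score still applies.

```python
from urllib.parse import urlencode

def params_with_score_floor(query, min_score, rows=10):
    """Build request parameters that keep only documents scoring >= min_score.

    The frange parser filters on the value of a function; query($q)
    is the score of the main query for each document.
    """
    return [
        ("q", query),
        ("fq", "{!frange l=%s}query($q)" % min_score),
        ("fl", "*,score"),
        ("rows", str(rows)),
    ]

# Example: the encoded query string to append to a /select URL.
encoded = urlencode(params_with_score_floor("title:solr", 0.5))
```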

DIH: entities in xml problem

2015-02-03 Thread Raul
Hi all! I'm trying to use Solr with the DIH and xslt processing. All is fine till I put xml with an html entity in the content (like &euro;) where I get a Caused by: javax.xml.transform.TransformerException: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException i put in the xsl the dt

Re: DIH: entities in xml problem

2015-02-03 Thread Michael Sokolov
If the entities are in the content, you would need to add the DTD to the content, not to the stylesheet. Or you could transform the content converting the entities. -Mike On 02/03/2015 10:41 AM, Raul wrote: Hi all! I'm trying to use Solr with the DIH and xslt processing. All is fine till i

Re: timestamp field and atomic updates

2015-02-03 Thread Chris Hostetter
: Recently, we have switched over to use atomic update instead of re-indexing : when we need to update a doc in the index. It looks to me that the : timestamp field is not updated during an atomic update. I have also looked : into TimestampUpdateProcessorFactory and it looks to me that won't hel

Re: Solr Logging files get high

2015-02-03 Thread Nishanth S
I feel the tlog size is perfectly fine since your hard commit interval is low. You can try increasing your hard commit and soft commit values. A soft commit of 1 sec is very low. Soft commit is about visibility of documents, so you can try to increase this as far as your SLAs allow. -Nishanth On Mon, Feb 2,

Re: Solr Logging files get high

2015-02-03 Thread Michael Della Bitta
If you're trying to do a bulk ingest of data, I recommend committing less frequently. Don't soft commit at all until the end of the batch, and hard commit every 60 seconds. Michael Della Bitta Senior Software Engineer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 E

Re: Reverse deep paging

2015-02-03 Thread Alexandre Rafalovitch
You could implement some sort of sparse map. E.g. discard 9 out of 10 marks for anything more than 20 marks back. If they actually go back that far again, you re-request from the nearest mark with a larger row count. And I would definitely add behavior analytics in this case. It may well be that 14
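
One way to read the sparse-map idea, sketched in Python under stated assumptions: the class and method names are hypothetical, marks[p] is taken to be the cursorMark that fetches page p, and "*" is the usual starting mark. Every mark is kept for the most recent 20 pages, only every 10th for older ones.

```python
class SparseCursorMarks:
    """Keep every cursor mark for recent pages, only every Nth for older ones."""

    def __init__(self, dense_window=20, keep_every=10):
        self.dense_window = dense_window
        self.keep_every = keep_every
        self.marks = {0: "*"}  # page index -> cursorMark that fetches that page
        self.latest = 0

    def record(self, page, mark):
        """Store the mark for `page`, then thin out marks that are far behind."""
        self.marks[page] = mark
        self.latest = max(self.latest, page)
        cutoff = self.latest - self.dense_window
        stale = [p for p in self.marks
                 if 0 < p < cutoff and p % self.keep_every != 0]
        for p in stale:
            del self.marks[p]

    def nearest(self, page):
        """Return (start_page, mark, pages_to_fetch) needed to reach `page`."""
        start = max(p for p in self.marks if p <= page)
        return start, self.marks[start], page - start + 1
```

When nearest() reports more than one page to fetch, the request would use rows = page_size * pages_to_fetch and show only the last page_size documents.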

Re: role of the wiki and cwiki

2015-02-03 Thread Chris Hostetter
: of official documentation, but I wonder abstractly how a non-committer then : should contribute to the documentation. I just did an evaluation of ... : With current technology, possibilities include: you pretty much nailed it... : * Make a comment within Confluence suggesting content

Where can we set the parameters in Solr Config?

2015-02-03 Thread O. Olson
I'm sorry if this is a basic question, but I am curious where, or at least, how can we set the parameters in the solrconfig.xml. E.g. Consider the solrconfig.xml shown here: http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_10/solr/example/example-DIH/solr/db/conf/solrconfig.xml?revis

Importing XML into SOLR, identifying a failed import document

2015-02-03 Thread Morris, Paul E.
Hi All, I'm using SOLR 4.9.0 to import XML using /dataimport from the dashboard and a suitably configured xml-data-config.xml file. Everything works fine, but very occasionally I encounter a bad XML file and the XML import handler fails with the following error, and the index rolls back. Caused

Re: Where can we set the parameters in Solr Config?

2015-02-03 Thread Jim . Musil
We set them as extra parameters sent to the servlet (jetty or tomcat), e.g. java -Dsolr.lock.type=native -jar start.jar Jim On 2/3/15, 11:58 AM, "O. Olson" wrote: >I'm sorry if this is a basic question, but I am curious where, or at >least, >how can we set the parameters in the solrconfig.xml

RE: MoreLikeThis filter by score threshold

2015-02-03 Thread Ali Nazemian
Dear Markus, Hi, Thank you very much for your response. I did check the reason why it is not recommended to filter by score in search query. But I think it is reasonable to filter by score in case of finding similar documents. I know in both of them (simple search query and mlt query) vsm of tf-idf

Re: Where can we set the parameters in Solr Config?

2015-02-03 Thread O. Olson
Thank you, Jim. I was hoping there is an alternative to putting the parameters on the command line, which would be a pain with more than a few parameters, for example something like a config file. Thanks again. Jim.Musil wrote > We set them as extra parameters sent to the servlet (jetty or

Re: MoreLikeThis filter by score threshold

2015-02-03 Thread Upayavira
I've seen this done (encouraged against it, but didn't win). It works. Except, sometimes things change in the index, and the scores change subtly. We get complaints that documents that previously were above the threshold now aren't, and vice versa. I try to explain that the score has no meaning bet

Re: DocumentAnalysisRequestHandler

2015-02-03 Thread melb
Thx, It worked -- View this message in context: http://lucene.472066.n3.nabble.com/DocumentAnalysisRequestHandler-tp4183449p4183736.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Where can we set the parameters in Solr Config?

2015-02-03 Thread Alexandre Rafalovitch
core.properties? https://cwiki.apache.org/confluence/display/solr/Configuring+solrconfig.xml#Configuringsolrconfig.xml-SubstitutingPropertiesinSolrConfigFiles Regards. Alex Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 3 February 2015 at 15:31, O. Olson wrot

Re: Where can we set the parameters in Solr Config?

2015-02-03 Thread Jack Krupansky
The Solr properties can also be defined in solrcore.properties and core.properties files: https://cwiki.apache.org/confluence/display/solr/Configuring+solrconfig.xml -- Jack Krupansky On Tue, Feb 3, 2015 at 3:31 PM, O. Olson wrote: > Thank you Jim. I was hoping if there is an alternative to pu
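
To show what the substitution amounts to, here is a simplified Python model of the ${name:default} placeholder syntax used in solrconfig.xml. It is an illustration, not Solr's implementation; the helper names are made up, and the properties are read from key=value lines as they would appear in a properties file.

```python
import re

# Matches ${name} and ${name:default}.
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")

def parse_properties(lines):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def substitute(text, properties):
    """Replace each placeholder with its property value, else its default."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in properties:
            return properties[name]
        if default is not None:
            return default
        raise KeyError("no value or default for property %r" % name)
    return _PLACEHOLDER.sub(replace, text)
```

For example, with solr.lock.type=simple in the properties, <lockType>${solr.lock.type:native}</lockType> resolves to simple; with no such property it falls back to native.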

Solr 4.9 Calling DIH concurrently

2015-02-03 Thread meena.sri...@mathworks.com
Hi, I am using Solr 4.9 and need to index millions of documents from a database. I am using DIH and sending requests to fetch by ids. Is there a way to run multiple indexing threads concurrently in DIH? I want to take advantage of parameter. How do I do it? I am just invoking DIH handler using sol

RE: Solr 4.9 Calling DIH concurrently

2015-02-03 Thread Dyer, James
DIH is single-threaded. There was once a threaded option, but it was buggy and subsequently was removed. What I do is partition my data and run multiple DIH request handlers at the same time. It means redundant sections in solrconfig.xml and it's not very elegant but it works. For instance,
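
A hedged Python sketch of the partitioning step follows. The handler names /dataimport1, /dataimport2, ... and the minId/maxId request parameters are assumptions for illustration; each data config would need to read them (for example as ${dataimporter.request.minId}) in its SQL.

```python
from urllib.parse import urlencode

def partition_ranges(min_id, max_id, parts):
    """Split the inclusive range [min_id, max_id] into contiguous sub-ranges."""
    total = max_id - min_id + 1
    size, extra = divmod(total, parts)
    ranges, start = [], min_id
    for i in range(parts):
        length = size + (1 if i < extra else 0)
        if length == 0:
            continue  # more partitions requested than ids available
        ranges.append((start, start + length - 1))
        start += length
    return ranges

def dih_urls(base_url, ranges):
    """One full-import URL per partition, each aimed at its own handler."""
    urls = []
    for i, (low, high) in enumerate(ranges, start=1):
        query = urlencode({"command": "full-import", "clean": "false",
                           "minId": low, "maxId": high})
        urls.append("%s/dataimport%d?%s" % (base_url, i, query))
    return urls
```

clean=false matters here: otherwise each handler would wipe the index, including what the other handlers have already loaded.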

RE: Solr 4.9 Calling DIH concurrently

2015-02-03 Thread Arumugam, Suresh
We are facing the same problem loading 14 billion documents into Solr 4.8.10. The data import runs single-threaded and is taking more than 3 weeks. It works without any issues, but it takes months to complete the load. When we tried SolrJ with the configuration below

RE: Solr 4.9 Calling DIH concurrently

2015-02-03 Thread meena.sri...@mathworks.com
Thanks James. After lots of search and reading now I think I understand a little from your answer. If I understand correctly my solrconfig.xml will have section like this db-data-config1.xml db-data-config1.xml . . . . . db-data-config1.xml

SolrJ Facetting: changing sort order

2015-02-03 Thread harish singh
Hi, I am trying to get the results of my facet-query in a sorted order. This is the code snippet: SolrQuery solrQuery = new SolrQuery(); solrQuery.setFacet(true); solrQuery.setFacetLimit(100); solrQuery.setFacetMinCount(1); solrQuery.setStart(0); solr

Solrcloud (to HDFS) poor indexing performance

2015-02-03 Thread Tim Smith
Hi, I have a SolrCloud (Solr 4.4, writing to HDFS on CDH-5.3) collection configured to be populated by flume Morphlines sink. The flume agent reads data from Kafka and writes to the Solr collection. The issue is that Solr indexing rate is abysmally poor (~6k docs/sec at best, dips to a few hundre

Re: Solrcloud (to HDFS) poor indexing performance

2015-02-03 Thread Mark Miller
What is your replication factor and doc size? Replication can affect performance a fair amount more than it should currently. For the number of nodes, that doesn’t sound like it matches what I’ve seen unless those are huge documents or you have some slow analyzer in the chain or something. Wit

Re: SolrJ Facetting: changing sort order

2015-02-03 Thread Tomoko Uchida
Hi, > I have been trying to find out a way to get the facet results in ascending order of counts. I could not look up online to find a way to do this. The short answer: Solr only supports sorting facet results by descending count or by lexicographical order of terms. See the description fo
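
Given that limitation, one workaround is to re-sort on the client. The Python sketch below assumes the default JSON response shape, where each entry under facet_fields is a flat list alternating term and count; the function name is made up.

```python
def facets_ascending(flat_counts):
    """Turn [term, count, term, count, ...] into pairs sorted by rising count.

    Ties are broken by term so the order is deterministic.
    """
    pairs = list(zip(flat_counts[0::2], flat_counts[1::2]))
    return sorted(pairs, key=lambda pair: (pair[1], pair[0]))
```

This only reorders the buckets Solr actually returned. With facet.limit=100 those are still the 100 most frequent terms, so getting the truly rarest terms would need facet.limit=-1, which can be expensive on high-cardinality fields.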

Re: WordDelimiterFilterFactory and position increment.

2015-02-03 Thread Modassar Ather
Hi, No I am not using WordDelimiterFilter on query side. Regards, Modassar On Fri, Jan 30, 2015 at 5:12 PM, Dmitry Kan wrote: > Hi, > > Do you use WordDelimiterFilter on query side as well? > > On Fri, Jan 30, 2015 at 12:51 PM, Modassar Ather > wrote: > > > Hi, > > > > An insight in the behav

Re: Core property name ignored when creating collection using API

2015-02-03 Thread Shawn Heisey
On 2/2/2015 1:08 AM, Avanish Raju wrote: > I'm learning to create collections by http for a new solr instance. To > create a new collection called "*user6*", I tried the following: > http://104.154.50.127:8983/solr/admin/collections?action=CREATE&name=*user6* > &numShards=1&replicationFactor=2&prop

Re: SOLR retrieve data using URL

2015-02-03 Thread Shawn Heisey
On 2/2/2015 11:57 AM, mathewvino wrote: > I am using solrj API to make call to Solr Server with the data that I am > looking for. Basically I am using > solrj api as below to get the data. Everything is working as expected > > HttpSolrServer solr = new > HttpSolrServer("http://server:8983/solr/co

Re: CopyField exclude patterns

2015-02-03 Thread danny teichthal
Alexander and Jack, thanks for the reply. Looking at both, I think that the CloneFieldUpdateProcessor can do what I need without having to implement a custom one. By the way, is there a performance penalty for an update processor compared to copyField? On Mon, Feb 2, 2015 at 4:29 PM, Alexandre Rafa

Re: SolrJ Facetting: changing sort order

2015-02-03 Thread Tomoko Uchida
FYI, this Jira ticket might be related to your question... you can check the patch. https://issues.apache.org/jira/browse/SOLR-1672 2015-02-04 11:41 GMT+09:00 Tomoko Uchida : > Hi, > > > I have been trying to find out a way to get the facet results in > ascending order of counts. I could not look

comparatorClass is not reflected on all nodes in Solr

2015-02-03 Thread Nitin Solanki
I have created a SolrCloud cluster with 4 nodes. I want to sort the suggestions by frequency. For this, I added a line to solrconfig.xml, *freq*, but it is not working and is not reflected on all nodes, even after I do the steps below: sudo /mnt/nitin/solr/example/scripts/cloud-scripts/zkcli.sh -zkh

Re: Solr Logging files get high

2015-02-03 Thread Nitin Solanki
Thanks Michael Della Bitta. Hi Mike Sokolov, no DEBUG entries appear in the logs. On Tue, Feb 3, 2015 at 10:06 PM, Michael Della Bitta < michael.della.bi...@appinions.com> wrote: > If you're trying to do a bulk ingest of data, I recommend committing less > frequently.

Re: DIH: entities in xml problem

2015-02-03 Thread Raul
How would you transform the content to convert the entities? With a pre-process? We have a lot of xml with the content inserted (and the content has the entities) and it will be difficult to add the DTD to the content... Thanks - Raul On 03/02/15 at 17:15, Michael Sokolov wrote: If the e
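
One possible pre-processing step, sketched in Python as an assumption rather than anything proposed in the thread: rewrite HTML named entities as numeric character references, which an XML parser accepts without a DTD. The five entities XML predefines are left alone, and unknown names are passed through unchanged. The function name is hypothetical.

```python
import re
from html.entities import name2codepoint

_XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def html_entities_to_numeric(text):
    """Replace HTML named entities such as &euro; with numeric references."""
    def replace(match):
        name = match.group(1)
        if name in _XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML's own entities and unknowns as-is
        return "&#%d;" % name2codepoint[name]
    return _NAMED_ENTITY.sub(replace, text)
```

Running the files through this before handing them to DIH would turn &euro; into &#8364; while keeping &amp; and &lt; intact.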

Is there any way to restrict search by maximum document count?

2015-02-03 Thread Jason
Hi, When I use a MultiTermQuery such as prefix or wildcard, Solr throws an exception if the maxBooleanClauses value in solrconfig.xml is exceeded. If I increase maxBooleanClauses, the problem is solved, but it can cause memory issues. So I want to know if there is any way to restrict search by maximum hitting docum

Re: Importing XML into SOLR, identifying a failed import document

2015-02-03 Thread Mikhail Khludnev
Given https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/XPathEntityProcessor.java#L309 you need to specify onError="continue" and check the log for LOG.warn("Failed for url : "... Developers, would you mind fixing the typo: app
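
As a complement to checking the log, the bad files can also be found before the import is started. The Python sketch below is an assumption on my part, not something from the thread: it runs each document through a standard XML parser and reports those that are not well-formed. The function name is hypothetical, and documents are passed in as a name-to-content mapping to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

def find_malformed(xml_documents):
    """Return (name, parser message) for every document that fails to parse."""
    failures = []
    for name, content in xml_documents.items():
        try:
            ET.fromstring(content)
        except ET.ParseError as exc:
            failures.append((name, str(exc)))
    return failures
```

This only catches well-formedness errors; a file that parses cleanly can still fail in DIH for other reasons, so the log remains the authoritative source.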