Lucene uses the TFIDFSimilarity class to calculate similarity.
It is based on the idea of the cosine measure, but it modifies the cosine
formula.
Please take a look at "Lucene Practical Scoring Function" in the following
Javadoc:
http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/
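For reference, the practical scoring function documented there has this overall
shape (see the Javadoc for the exact definition of each factor):
score(q,d) = coord(q,d) * queryNorm(q)
             * SUM over terms t in q of ( tf(t in d) * idf(t)^2 * t.getBoost() * norm(t,d) )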
Dear Koji,
Thank you very much.
Do you know what the range of the score is in this formula? What is a
reasonable threshold for considering two documents similar enough under
this formula?
Regards.
On Tue, Feb 3, 2015 at 1:35 PM, Koji Sekiguchi wrote:
> Lucene uses the TFIDFSimilarity class to calculate similarity.
Hi all,
I uploaded a patch (https://issues.apache.org/jira/browse/SOLR-5972) that
contains a new statistics result for a field - existInDoc.
It returns the number of documents in which the field has a value (not missing).
This patch is based on Solr 4.4.
For multivalued fields there is a calculat
We have already started using this toolkit and have explored it completely.
Is there a sample script in Python to get the config file or other
files from SVN and deploy them in Tomcat?
Thanks,
Rajesh.
On Mon, Feb 2, 2015 at 3:32 PM, Anshum Gupta wrote:
> Solr scale toolkit should be a go
Hi Lokesh,
thanks for the information.
I forgot to mention that the system I am working on is still using 3.5 so I
will probably have to reindex the whole set of documents.
Unless someone knows how to get around this...
From: Lokesh Chhaparwal
Sent:
Hi All,
I wonder if it is somehow possible to search for multiple terms, like:
(term1 OR term2 OR term3)
and, in case a document contains 2 or more of these terms, have only the highest
scoring term contribute to the final relevancy score; lower
scoring terms should possibly be discarded from the score.
Either use the MaxScoreQueryParser [1] or set tie to zero when using a DisMax
parser.
[1]:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-MaxScoreQueryParser
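For example (field and terms are illustrative):
q={!maxscore}text:solr OR text:lucene OR text:search
With dismax, tie=0.0 means each term contributes only its single
highest-scoring field match rather than a sum across fields.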
-----Original message-----
> From:Burgmans, Tom
> Sent: Tuesday 3rd February 2015 16:13
> To: solr-user@
Oh, I know I have problems! My (b) option of reversing sort and using the
current cursor mark is not working. It gets off by one record.
paging forward:
pg 1: docs 1-10
pg 2: docs 11-20
pg 3: docs 21-30
now paging backwards:
pg 2: docs 10-19
I'll go back to tracking all the cursor marks.
--
Hi,
I was wondering how I can limit the results of a MoreLikeThis query by
score value instead of filtering them by document count.
Thank you very much.
--
A.Nazemian
Hi - sure you can, using the frange parser as a filter:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-FunctionRangeQueryParser
http://lucene.apache.org/solr/4_10_3/solr-core/org/apache/solr/search/FunctionRangeQParserPlugin.html
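A sketch of the filter (the 0.85 threshold is illustrative, and $q is
assumed to hold the similarity query):
fq={!frange l=0.85}query($q)
This keeps only documents scoring at least 0.85 against the referenced query.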
But this is very much not recommended,
Hi all!
I'm trying to use Solr with the DIH and XSLT processing. All is fine
until I put XML with an HTML entity in the content (like &euro;), where I get:
Caused by: javax.xml.transform.TransformerException:
com.sun.org.apache.xml.internal.utils.WrappedRuntimeException
I put in the XSL the dt
If the entities are in the content, you would need to add the DTD to the
content, not to the stylesheet. Or you could transform the content
converting the entities.
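For example, prepending a small internal DTD subset to each document before
parsing (the root element name is illustrative):
<!DOCTYPE doc [ <!ENTITY euro "&#8364;"> ]>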
-Mike
On 02/03/2015 10:41 AM, Raul wrote:
Hi all!
I'm trying to use Solr with the DIH and xslt processing. All is fine
till i
: Recently, we have switched over to use atomic update instead of re-indexing
: when we need to update a doc in the index. It looks to me that the
: timestamp field is not updated during an atomic update. I have also looked
: into TimestampUpdateProcessorFactory and it looks to me that won't hel
I feel the tlog size is perfectly fine since your hard commit interval is
low. You can try increasing your hard commit and soft commit values. A soft
commit of 1 sec is very low. Soft commit is about visibility of
documents, so you can try increasing it as far as your SLAs allow.
-Nishanth
On Mon, Feb 2,
If you're trying to do a bulk ingest of data, I recommend committing less
frequently. Don't soft commit at all until the end of the batch, and hard
commit every 60 seconds.
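In solrconfig.xml, that advice would look something like:
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>-1</maxTime> <!-- disabled for the duration of the bulk load -->
</autoSoftCommit>
then issue one explicit commit when the batch finishes.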
Michael Della Bitta
Senior Software Engineer
o: +1 646 532 3062
appinions inc.
“The Science of Influence Marketing”
18 E
You could implement some sort of sparse map. E.g. discard 9 out of 10
marks for anything more than 20 marks back. If they actually go back
that far again, you re-request from the nearest mark with a larger row
count.
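A minimal Java sketch of that idea (the class and method names are mine, not
any Solr API):

import java.util.Map;
import java.util.TreeMap;

// Keeps every cursorMark for the 20 most recent pages, but only every
// 10th mark for older pages. To revisit an old page, resume from the
// nearest stored mark and re-fetch the gap with a larger rows= value.
class SparseMarkHistory {
    private final TreeMap<Integer, String> markByPage = new TreeMap<>();

    void record(int page, String cursorMark) {
        markByPage.put(page, cursorMark);
        int cutoff = page - 20;
        // In the sparse region, drop marks whose page is not a multiple of 10.
        markByPage.headMap(cutoff, true).keySet().removeIf(p -> p % 10 != 0);
    }

    // Nearest stored page at or before the requested one.
    Map.Entry<Integer, String> resumePoint(int page) {
        return markByPage.floorEntry(page);
    }
}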
And I would definitely add behavior analytics in this case. It may well
be that 14
: of official documentation, but I wonder abstractly how a non-committer then
: should contribute to the documentation. I just did an evaluation of
...
: With current technology, possibilities include:
you pretty much nailed it...
: * Make a comment within Confluence suggesting content
I'm sorry if this is a basic question, but I am curious where, or at least
how, we can set the parameters in solrconfig.xml.
E.g. Consider the solrconfig.xml shown here:
http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_10/solr/example/example-DIH/solr/db/conf/solrconfig.xml?revis
Hi All,
I'm using SOLR 4.9.0 to import XML using /dataimport from the dashboard and a
suitably configured xml-data-config.xml file.
Everything works fine, but very occasionally I encounter a bad XML file, and the
XML import handler fails with the following error and the index rolls back.
Caused
We set them as extra parameters sent to the servlet (Jetty or Tomcat).
eg java -Dsolr.lock.type=native -jar start.jar
Jim
On 2/3/15, 11:58 AM, "O. Olson" wrote:
>I'm sorry if this is a basic question, but I am curious where, or at
>least,
>how can we set the parameters in the solrconfig.xml
Dear Markus,
Hi,
Thank you very much for your response. I did check why it is not
recommended to filter by score in a search query. But I think it is
reasonable to filter by score when finding similar documents. I know that
in both (a simple search query and an MLT query) the VSM of TF-IDF
Thank you Jim. I was hoping there is an alternative to putting the
parameters on the command line, which would be a pain if there are more than
a few parameters, i.e. something like a config file.
Thanks again
Jim.Musil wrote
> We set them as extra parameters sent to to the servlet (jetty or
I've seen this done (I advised against it, but didn't win). It works.
Except, sometimes things change in the index, and the scores change
subtly. We get complaints that documents that previously were above the
threshold now aren't, and vice versa. I try to explain that the score
has no meaning bet
Thanks, it worked.
core.properties?
https://cwiki.apache.org/confluence/display/solr/Configuring+solrconfig.xml#Configuringsolrconfig.xml-SubstitutingPropertiesinSolrConfigFiles
Regards.
Alex
Sign up for my Solr resources newsletter at http://www.solr-start.com/
On 3 February 2015 at 15:31, O. Olson wrot
The Solr properties can also be defined in solrcore.properties and
core.properties files:
https://cwiki.apache.org/confluence/display/solr/Configuring+solrconfig.xml
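For example (the property name is made up for illustration), core.properties
can hold:
my.config.file=db-data-config.xml
and solrconfig.xml can reference it with:
<str name="config">${my.config.file:db-data-config.xml}</str>
where the part after the colon is the default used when the property is unset.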
-- Jack Krupansky
On Tue, Feb 3, 2015 at 3:31 PM, O. Olson wrote:
> Thank you Jim. I was hoping if there is an alternative to pu
Hi,
I am using Solr 4.9 and need to index millions of documents from a database. I
am using DIH and sending requests to fetch by IDs. Is there a way to run
multiple indexing threads concurrently in DIH?
I want to take advantage of the
parameter. How do I do it? I am just invoking the DIH handler using sol
DIH is single-threaded. There was once a threaded option, but it was buggy and
subsequently removed.
What I do is partition my data and run multiple DIH request handlers at the
same time. It means redundant sections in solrconfig.xml, and it's not very
elegant, but it works.
For instance,
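(sketching it from my description above; the handler names and config file
names are illustrative):
<requestHandler name="/dataimport1" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config1.xml</str>
  </lst>
</requestHandler>
<requestHandler name="/dataimport2" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config2.xml</str>
  </lst>
</requestHandler>
Each config file selects a different slice of the data (e.g. an ID range), and
the handlers are kicked off in parallel.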
We are also facing the same problem, loading 14 billion documents into Solr
4.8.10.
Dataimport runs single-threaded, which is taking more than 3 weeks.
It works fine without any issues, but it takes months to complete the
load.
When we tried SolrJ with the below configuration
Thanks James. After a lot of searching and reading, I think I now understand a
little from your answer.
If I understand correctly, my solrconfig.xml will have a section like this:
db-data-config1.xml
db-data-config2.xml
...
db-data-configN.xml
Hi,
I am trying to get the results of my facet query in sorted order.
This is the code snippet:
SolrQuery solrQuery = new SolrQuery();
solrQuery.setFacet(true);
solrQuery.setFacetLimit(100);
solrQuery.setFacetMinCount(1);
solrQuery.setStart(0);
solrQuery.addFacetField("category"); // facet field name assumed for illustration
Hi,
I have a SolrCloud (Solr 4.4, writing to HDFS on CDH-5.3) collection
configured to be populated by flume Morphlines sink. The flume agent reads
data from Kafka and writes to the Solr collection.
The issue is that the Solr indexing rate is abysmally poor (~6k docs/sec at
best, dips to a few hundre
What is your replication factor and doc size?
Replication can affect performance a fair amount more than it should currently.
For the number of nodes, that doesn’t sound like it matches what I’ve seen
unless those are huge documents or you have some slow analyzer in the chain or
something.
Wit
Hi,
> I have been trying to find out a way to get the facet results in
ascending order of counts. I could not look up online to find a way to do
this.
In short, Solr only supports sorting facet results by descending
order of counts or by lexicographical order of terms.
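For example, with the standard parameter: facet.sort=count gives descending
counts and facet.sort=index gives lexicographic term order; there is no
ascending-count option.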
See the description fo
Hi,
No, I am not using WordDelimiterFilter on the query side.
Regards,
Modassar
On Fri, Jan 30, 2015 at 5:12 PM, Dmitry Kan wrote:
> Hi,
>
> Do you use WordDelimiterFilter on query side as well?
>
> On Fri, Jan 30, 2015 at 12:51 PM, Modassar Ather
> wrote:
>
> > Hi,
> >
> > An insight in the behav
On 2/2/2015 1:08 AM, Avanish Raju wrote:
> I'm learning to create collections by http for a new solr instance. To
> create a new collection called "user6", I tried the following:
> http://104.154.50.127:8983/solr/admin/collections?action=CREATE&name=user6
> &numShards=1&replicationFactor=2&prop
On 2/2/2015 11:57 AM, mathewvino wrote:
> I am using solrj API to make call to Solr Server with the data that I am
> looking for. Basically I am using
> solrj api as below to get the data. Everything is working as expected
>
> HttpSolrServer solr = new
> HttpSolrServer("http://server:8983/solr/co
Alexander and Jack, thanks for the replies.
Looking at both, I think the CloneFieldUpdateProcessor can do what I
need without having to implement a custom one.
By the way, is there a performance penalty for an update processor compared to
copyField?
On Mon, Feb 2, 2015 at 4:29 PM, Alexandre Rafa
FYI, this Jira ticket might be related to your question... you can check
the patch.
https://issues.apache.org/jira/browse/SOLR-1672
2015-02-04 11:41 GMT+09:00 Tomoko Uchida :
> Hi,
>
> > I have been trying to find out a way to get the facet results in
> ascending order of counts. I could not look
I created a Solr cloud with 4 nodes. I want to sort the suggestions by
frequency. For this, I added a line to solrconfig.xml:
freq
but it is not working and is not reflected on all nodes, even after I do the
steps below:
sudo /mnt/nitin/solr/example/scripts/cloud-scripts/zkcli.sh -zkh
Thanks Michael Della Bitta.
Hi Mike Sokolov,
no DEBUG output appears in the logs.
On Tue, Feb 3, 2015 at 10:06 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:
> If you're trying to do a bulk ingest of data, I recommend committing less
> frequently.
How would you do the transformation of the content to convert the entities?
With a pre-process? We have a lot of XML with the content inserted (and the
content has the entities), and it would be difficult to add the DTD to the content...
Thanks
- Raul
On 03/02/15 at 17:15, Michael Sokolov wrote:
If the e
Hi,
When I use a MultiTermQuery such as prefix or wildcard, Solr throws an exception
if the maxBooleanClauses value in solrconfig.xml is exceeded.
If I increase maxBooleanClauses, the problem is solved. But it can cause memory
issues.
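(For reference, the setting is the <maxBooleanClauses>1024</maxBooleanClauses>
element in solrconfig.xml; 1024 is the default.)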
So I want to know if there is any way to restrict search by maximum hitting
docum
Given
https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/XPathEntityProcessor.java#L309
you need to specify
onError="continue"
and check the log for LOG.warn("Failed for url : "...
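That attribute goes on the entity element in your data config, e.g. (the
entity name is illustrative):
<entity name="page" processor="XPathEntityProcessor" onError="continue" ... />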
Developers, would you mind fixing the typo: app