I have usually used a logarithmic weighting for recency. The difference between
one and two days ago is similar to the difference between two and three weeks ago,
which is similar to the difference between five and six months ago.
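A quick sketch of what I mean (illustrative Java only, not any Solr API; the exact decay constant is an assumption):

```java
public class RecencyWeight {
    // Weight decays with the logarithm of age in days, so 1 vs 2 days
    // matters about as much as 2 vs 4 weeks or 5 vs 10 months.
    static double weight(double ageDays) {
        return 1.0 / (1.0 + Math.log1p(ageDays));
    }

    public static void main(String[] args) {
        System.out.println(weight(1));    // recent: close to 1
        System.out.println(weight(30));   // a month old: noticeably lower
        System.out.println(weight(180));  // old: much smaller, but never zero
    }
}
```

In Solr itself this would typically be wired in as a boost function rather than client-side code.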
The idea is to distinguish between news articles about the current Pres
First, I think the requirement is a bad one. Why should a document with
low relevance 29 days ago score higher than the perfect document from
31 days ago? That doesn't seem like it serves the user very well...
And then: "However, in cases where update date is unavailable, I need to
sort it using created date"
First hit from googling "solr config API"
https://cwiki.apache.org/confluence/display/solr/Config+API
Best,
Erick
On Mon, Mar 13, 2017 at 8:27 PM, Binoy Dalal wrote:
> Is there a simpler way of modifying solrconfig.xml in cloud mode without
> having to download the file from zookeeper, modifying it and reuploading it?
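For the record, a minimal Config API call looks roughly like this (collection name, property, and value are placeholders):

```
curl http://localhost:8983/solr/yourCollection/config \
  -H 'Content-type:application/json' \
  -d '{"set-property": {"updateHandler.autoCommit.maxTime": 15000}}'
```

It edits the configoverlay stored in ZooKeeper for you, so there's no manual download/upload round trip.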
Hi Joel,
>One thing it could be is that gatherNodes will only work on single value
>fields currently.
Regarding this, the field which I am using in the query is already a
single-value field, not a multi-value field.
Regards,
Edwin
On 14 March 2017 at 10:04, Zheng Lin Edwin Yeo wrote:
> Hi Joe
Is there a simpler way of modifying solrconfig.xml in cloud mode without
having to download the file from zookeeper, modifying it and reuploading it?
Something like the schema API maybe?
--
Regards,
Binoy Dalal
Hi Joel,
These are the details which I get from the logs.
java.lang.RuntimeException: java.util.concurrent.ExecutionException:
java.lang.RuntimeException: java.io.IOException:
java.util.concurrent.ExecutionException: java.io.IOException: -->
http://localhost:8984/solr/email/: An exception has occurred
Hi all,
I am trying to resolve a problem here where I have to fiddle around with
set of dates ( created and updated date).
My use case is that I have to make sure that the document with the latest
(most recent) update date comes higher in my search results.
Precisely, I am required to maintain 3 buckets
https://wiki.apache.org/solr/FieldCollapsing
> On Mar 13, 2017, at 9:59 PM, Dave wrote:
>
> Perhaps look into grouping on that field.
>
>> On Mar 13, 2017, at 9:08 PM, Scott Smith wrote:
>>
>> I'm trying to solve a search problem and wondering if facets (or something
>> else) might solve the problem.
Perhaps look into grouping on that field.
> On Mar 13, 2017, at 9:08 PM, Scott Smith wrote:
>
> I'm trying to solve a search problem and wondering if facets (or something
> else) might solve the problem.
>
> Let's assume I have a bunch of documents (100 million+). Each document has a
> category (keyword) assigned to it.
I'm trying to solve a search problem and wondering if facets (or something
else) might solve the problem.
Let's assume I have a bunch of documents (100 million+). Each document has a
category (keyword) assigned to it. A single document may only have one
category, but there may be multiple documents with the same category.
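If the grouping suggested further up the thread fits, the request is roughly (query and field name are placeholders):

```
q=your+query&group=true&group.field=category&group.limit=3
```

Note that result grouping wants a single-valued, indexed field, which matches the one-category-per-document setup described here.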
Are you sorting on a single field, or multiple fields?
Joel Bernstein
http://joelsolr.blogspot.com/
On Mon, Mar 13, 2017 at 6:49 PM, alexpusch wrote:
> As has been said, only the top N results are collected, but in order to find
> out which of the results are the top ones, all the results must be sorted,
> no?
On 3/13/2017 7:58 AM, Mahmoud Almokadem wrote:
> When I start my bulk indexer program the CPU utilization is 100% on each
> server but the rate of the indexer is about 1500 docs per second.
>
> I know that some solr benchmarks reached 70,000+ doc. per second.
There are *MANY* factors that affect indexing speed.
As has been said, only the top N results are collected, but in order to find
out which of the results are the top ones, all the results must be sorted,
no? Can't the docs be somehow accessible at that stage?
Anyway, I see SortingResponseWriter does its own manual sorting using a
priority queue. So
I'm suggesting that worrying about your indexing rate is premature.
13,000 docs/second is over 1B docs per day. As a straw-man number,
each Solr replica (think shard) can hold 64M documents. You need 16
shards at that size to hold a single day's input. Let's say you want
to keep these docs around f
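Re-deriving that straw-man arithmetic (strict ceiling division on the same inputs gives about 18 shards, in the same ballpark as the round 16 above):

```java
public class CapacityMath {
    static long docsPerDay(long docsPerSecond) {
        return docsPerSecond * 86_400L;                  // seconds in a day
    }

    static long shardsNeeded(long docs, long docsPerShard) {
        return (docs + docsPerShard - 1) / docsPerShard; // ceiling division
    }

    public static void main(String[] args) {
        long perDay = docsPerDay(13_000L);
        System.out.println(perDay);                          // over 1.1 billion docs/day
        System.out.println(shardsNeeded(perDay, 64_000_000L)); // ~18 shards for one day
    }
}
```

The 64M-docs-per-replica figure is only a straw man; real shard capacity depends heavily on document size, query load, and hardware.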
What kind of files are these?
Are these PDF files, each of which is 300 MB? Or Solr Update documents
(XML/JSON), where each document has long fields making it 300 MB per
document? Or Solr Update documents that have multiple documents and an
individual batch is more than 300 MB? Something else?
And
> On Mar 13, 2017, at 12:52 PM, Victor Hugo Olvera Morales
> wrote:
>
> How can I index files larger than 300 MB in solr-6.2.1?
Is that 300 MB of text or some source format, like PDF?
The King James Bible is only 4 MB of text, so 300 MB is extremely large.
wunder
Walter Underwood
How can I index files larger than 300 MB in solr-6.2.1?
Hi Erick,
Thanks for detailed answer.
The producer can sustain that rate; it's not just spikes.
So, can I run more clients that write to Solr, even though I hit maximum
utilization with a single client? Do you think it will increase throughput?
And do you advise me to add more shards
OK, so you can get a 360% speedup by commenting out the solr.add. That
indicates that, indeed, you're pretty much running Solr flat out, not
surprising. You _might_ squeeze a little more out of Solr by adding
more client indexers, but that's not going to drive you to the numbers
you need. I do have
Thanks Erick,
I've commented out the line SolrClient.add(doclist) and get 5500+ docs per
second from single producer.
Regarding more shards, do you mean using 2 nodes with 8 shards per node, so we
get 16 shards on the same 2 nodes, or spreading shards over more nodes?
I'm using solr 6.4.1 with zookeeper o
Hello Vincent and Michael,
Thank you for the question and answer here.
I have added an 'Applying changes' section to
https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank and changed
https://cwiki.apache.org/confluence/display/solr/Managed+Resources to
cross-reference to the reload
Thanks Joel! This is just a simplified sample query that I created to
better demonstrate the issue. I am not sure whether I want to upgrade to
solr 6.5 as only developer version is available yet and it's a stable
version as far as I know. Thanks for the clarification. I will try to find
some other
it's not a stable version*
On Mon, Mar 13, 2017 at 1:34 PM, Pratik Patel wrote:
> Thanks Joel! This is just a simplified sample query that I created to
> better demonstrate the issue. I am not sure whether I want to upgrade to
> solr 6.5 as only developer version is available yet and it's a stab
If you're using Solr 6.4 then the expression you're running won't work,
because only numeric comparisons are supported.
Solr 6.5 will have the expanded Evaluator functionality, which has string
comparisons.
In the expression you're working with, though, it would be much more
performant to filter the
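A sketch of that filtering approach (an assumption on my part: the eq() moves into the search's fq rather than a having() wrapper; syntax not verified against 6.4):

```
search(collection1,
       q="*:*",
       fq="tags:Company AND storeid:524efcfd505637004b1f6f24",
       fl="storeid",
       sort="storeid asc")
```

Pushing the filter into the query lets Solr use the filter cache instead of streaming every tuple through a comparison.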
Hi,
I am trying to write a streaming expression with 'having' function in it.
Following is my simple query.
having(
  search(collection1, q="*:*", fl="storeid", sort="storeid asc", fq=tags:"Company"),
  eq(storeid, 524efcfd505637004b1f6f24)
)
Here, storeid is a field of type "string" in s
You are right, I mean schemaless mode. I saw that it's your answer ;) I've
edited solrconfig.xml and fixed it. Thanks!
On Mon, Mar 13, 2017 at 5:46 PM, Alexandre Rafalovitch
wrote:
> There is managed schema, which means it is editable via API, and there
> is 'schemaless' mode that uses that to a
Syntax looks ok. The logs should have a stack trace.
One thing it could be is that gatherNodes will only work on single value
fields currently.
Joel Bernstein
http://joelsolr.blogspot.com/
On Mon, Mar 13, 2017 at 1:59 AM, Zheng Lin Edwin Yeo
wrote:
> Hi,
>
> I am getting this error when I trie
There is managed schema, which means it is editable via API, and there
is 'schemaless' mode that uses that to auto-define the field based on
the first occurrence.
'schemaless' mode does not know if the field will be multi-valued the
first time it sees content for that field. So, all the fields created
OK, I found the answer here:
http://stackoverflow.com/questions/38730035/solr-schemaless-mode-creating-fields-as-multivalued
On Mon, Mar 13, 2017 at 5:15 PM, Furkan KAMACI
wrote:
> Hi,
>
> I generate dummy documents to test Solr 6.4.2. I create a field like that
> at my test code:
>
>
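For what it's worth, pre-declaring the field via the Schema API avoids the multiValued guess entirely (collection name is a placeholder; "tlong" is my assumption for the TrieLongField type name in default 6.x configs):

```
curl -X POST -H 'Content-type:application/json' \
  http://localhost:8983/solr/yourCollection/schema \
  -d '{"add-field": {"name": "custom_count", "type": "tlong", "multiValued": false}}'
```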
And this is happening when the type is defined for the first time? You
have no field, you send documents, you get new field defined and it is
String, not Date? What's the value the field actually stores?
Regards,
Alex.
http://www.solr-start.com/ - Resources for Solr users, new and experienced
Hi,
I generate dummy documents to test Solr 6.4.2. I create a field like that
at my test code:
int customCount = r.nextInt(500);
document.addField("custom_count", customCount);
This field is indexed as:
org.apache.solr.schema.TrieLongField
and
Multivalued
Note that 70,000 docs/second pretty much guarantees that there are
multiple shards. Lots of shards.
But since you're using SolrJ, the very first thing I'd try would be
to comment out the SolrClient.add(doclist) call so you're doing
everything _except_ send the docs to Solr. That'll tell you wheth
Everything works well but type is predicted as String instead of Date. I
create just plain documents as follows:
SimpleDateFormat simpleDateFormat = new
SimpleDateFormat("yyyy-MM-dd'T'HH:mm");
Calendar startDate = new GregorianCalendar(2017, r.nextInt(6),
r.nextInt(28));
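One likely cause (an assumption, not confirmed in this thread) is that the pattern lacks seconds and a timezone marker, so Solr's schemaless date parsing doesn't recognize the value as a date. A sketch producing the canonical Solr form:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SolrDateFormat {
    // Full ISO-8601 instant in UTC -- the canonical form Solr date fields accept
    static String toSolrDate(Date d) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(d);
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate(new Date(0L))); // 1970-01-01T00:00:00Z
    }
}
```

Alternatively, the list of accepted patterns for schemaless date detection can be extended in the update processor chain in solrconfig.xml.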
We have a 30 node Hadoop cluster and each data node has a SOLR instance also
running. We are adding 10 nodes. After adding nodes, we'll run HDFS balancer.
This will affect data locality. Does this impact how Solr works (I mean
performance)? Thanks, Imad
Any other definitions in that URP chain are triggered?
Are you seeing this in a nested document by any chance?
Regards,
Alex.
http://www.solr-start.com/ - Resources for Solr users, new and experienced
On 13 March 2017 at 10:29, Furkan KAMACI wrote:
> Hi,
>
> I'm testing schemaless mode
Dear Solr-User,
I'm trying to use Solr clustering (Lingo algorithm) on my database (notices
with id, title, abstract fields)
All works fine when my query is simple (with or without Boolean operators)
but if I try with exact phrase like:
..&q=ti:snowboard binding&
Then Solr generates on
If you are using master/slave (non-cloud), here is an approach.
1. Build a new master with the new schema.
2. Index all content there.
3. Send all updates to both the old master and new master.
4. One by one, take a slave down, delete all the documents, configure with the
new schema and new master
On 3/10/2017 10:12 AM, Chris Hostetter wrote:
> If i understand correctly, you mean you've modified the init.d/solr
> script such that when "su" is run you pass "-s /bin/bash" ?
I do not think we can be absolutely certain that bash will *always* be
in that exact location.
Checked the bash source
Hi,
I'm testing schemaless mode of Solr 6.4.2. Solr predicts field types when
I generate dummy data and index it to Solr. However, I could not make Solr
predict date fields. I tried that:
"custom_start":["2017-05-16T00:00"]
which is a date parse result of SimpleDateFormat("yyyy-MM-dd'T'HH:mm
Hi great community,
I have a SolrCloud with the following configuration:
- 2 nodes (r3.2xlarge 61GB RAM)
- 4 shards.
- The producer can produce 13,000+ docs per second
- The schema contains about 300+ fields and the document size is about
3KB.
- Using SolrJ and SolrCloudClient,
Looks like changing the autoCommit maxTime setting is what did it for the
replication issues. Thanks Andrea/Erick for the reminders and pointers!
Scott
-Original Message-
From: Pouliot, Scott [mailto:scott.poul...@peoplefluent.com]
Sent: Friday, March 10, 2017 11:09 AM
To: solr-user@
This could be a good start: https://github.com/nicholasding/solr-lemmatizer
Regards,
Alex.
http://www.solr-start.com/ - Resources for Solr users, new and experienced
On 13 March 2017 at 09:17, OTH wrote:
> Hello all,
>
> I am looking to incorporate synonymization using Wordnet in my Solr application.
Hello all,
I am looking to incorporate synonymization using Wordnet in my Solr
application.
Does any one have any advice on how to do this, and what the 'best
practices' would be in this regard?
Much thanks
On 3/13/2017 3:16 AM, vbindal wrote:
> I am facing the same issue where my query *:* returns an inconsistent count,
> almost 3 times the actual number (in millions).
>
> When I try distrib=false on every machine, the results are correct, but
> without `distrib=false` the results are incorrect.
This most
On 3/13/2017 3:07 AM, danny teichthal wrote:
> I have a limitation that our Solr cluster is "live" during full
> indexing. We have many updates and the index is incrementally updated.
> There's no way to index everything on a side index and replace. So,
> I'm trying to find a solution where users c
Hello SOLR experts,
I am new to SOLR and I am trying to do alphanumeric sort on string field(s).
However, in my case, alphabets should come before numbers. I also have a large
number of such fields (~2500), any of which can be alphanumerically sorted upon
at runtime. I've explored the below concept
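Purely to illustrate the desired ordering, here is a client-side Java sketch (not a Solr feature; inside Solr this is usually solved with a collated or char-mapped sort field built via copyField):

```java
import java.util.Arrays;
import java.util.Comparator;

public class AlphaFirstSort {
    // rank: letters (and anything non-digit) sort before digits
    static int rank(char c) {
        return Character.isDigit(c) ? 1 : 0;
    }

    static final Comparator<String> ALPHA_FIRST = (a, b) -> {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            char ca = a.charAt(i), cb = b.charAt(i);
            int r = Integer.compare(rank(ca), rank(cb));
            if (r != 0) return r;                      // letter beats digit
            if (ca != cb) return Character.compare(ca, cb);
        }
        return Integer.compare(a.length(), b.length()); // shorter string first
    };

    public static void main(String[] args) {
        String[] vals = {"1a", "a1", "b2"};
        Arrays.sort(vals, ALPHA_FIRST);
        System.out.println(Arrays.toString(vals)); // [a1, b2, 1a]
    }
}
```

With ~2500 sortable fields, pre-computing an encoded sort key at index time scales better than any query-time trick.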
I'm using Solr 4.10.0. I'm using the "id" field as the unique key - it is passed
in with the document when ingesting the documents into Solr. When querying
on different shards, I get duplicate documents with different "_version_" values.
Out of approx. millions of these docs, some are duplicates.
Cloud has 3 shards
As Alex suggested, _version_ is a special field which can be used in atomic
updates with special semantics:
"If the content in the _version_ field is equal to '1', then the document
must simply exist. In this case, no version matching occurs, but if the
document does not exist, the updates will
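As an illustrative JSON atomic update using that semantic (doc id and field name are made up):

```
[{ "id": "doc1",
   "_version_": 1,
   "price": { "set": 99 } }]
```

With _version_ set to 1, the update succeeds only if some version of "doc1" already exists; no exact version match is required.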
Hi,
I am facing the same issue where my query *:* returns an inconsistent count,
almost 3 times the actual number (in millions).
When I try distrib=false on every machine, the results are correct, but
without `distrib=false` the results are incorrect.
Can you guys suggest something?
Thanks Shawn,
I understand that changing id to to string is not an option.
I have a limitation that our Solr cluster is "live" during full indexing.
We have many updates and the index is incrementally updated.
There's no way to index everything on a side index and replace.
So, I'm trying to find