Re: Need help with date boost

2017-03-13 Thread Walter Underwood
I have usually used a logarithmic weighting for recency. The difference between a day or two ago is similar to the difference between two or three weeks ago, which is similar to the difference between five or six months ago. The idea is to distinguish between news articles about the current Pres
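To make the logarithmic idea concrete, here is a rough sketch of one possible weighting. The class name, the formula and the choice of base-10 logarithm are illustrative assumptions, not something taken from the message above or from Solr itself; the point is only that the boost falls off with the logarithm of the age rather than linearly.

```java
// Hypothetical sketch: the formula and constants are illustrative assumptions,
// not taken from the thread or from Solr.
class RecencyBoost {

    /** Returns a multiplier in (0, 1] that decays with the logarithm of the document age. */
    static double boost(double ageDays) {
        double age = Math.max(0.0, ageDays);          // treat future-dated docs as brand new
        return 1.0 / (1.0 + Math.log10(1.0 + age));   // age 0 gives 1.0, older docs give less
    }

    public static void main(String[] args) {
        double[] ages = {1, 2, 14, 21, 150, 180};     // days
        for (double a : ages) {
            System.out.printf("age=%5.0f days  boost=%.3f%n", a, boost(a));
        }
    }
}
```

With a curve like this, one extra day matters a lot for fresh documents and hardly at all for documents that are already months old.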

Re: Need help with date boost

2017-03-13 Thread Erick Erickson
First, I think the requirement is a bad one. Why should a document with low relevance 29 days ago score higher than the perfect document from 31 days ago? That doesn't seem like it serves the user very well... And then "However in cases where update date is unavailable I need to sort it using creat

Re: Modifying solrconfig.xml in solr cloud

2017-03-13 Thread Erick Erickson
First hit from googling "solr config API" https://cwiki.apache.org/confluence/display/solr/Config+API Best, Erick On Mon, Mar 13, 2017 at 8:27 PM, Binoy Dalal wrote: > Is there a simpler way of modifying solrconfig.xml in cloud mode without > having to download the file from zookeeper, modifyin
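As a rough illustration of what a Config API call looks like, the sketch below only builds the endpoint and the JSON body of a `set-property` command and prints an equivalent curl line; it does not contact Solr. The base URL, the collection name `mycollection` and the chosen property/value are placeholders assumed for the example, and the class and method names are hypothetical.

```java
import java.net.URI;

// Hypothetical helper: builds (but does not send) a Config API request.
class ConfigApiSketch {

    /** Config endpoint of a collection, e.g. http://host:8983/solr/<collection>/config */
    static URI configEndpoint(String solrBaseUrl, String collection) {
        return URI.create(solrBaseUrl + "/" + collection + "/config");
    }

    /** JSON body for a "set-property" command with a numeric value. */
    static String setPropertyBody(String property, long value) {
        return "{\"set-property\": {\"" + property + "\": " + value + "}}";
    }

    public static void main(String[] args) {
        // Placeholder values, assumed for illustration only.
        URI endpoint = configEndpoint("http://localhost:8983/solr", "mycollection");
        String body = setPropertyBody("updateHandler.autoSoftCommit.maxTime", 5000);
        System.out.println("curl " + endpoint
                + " -H 'Content-type:application/json' -d '" + body + "'");
    }
}
```

The body can be POSTed with any HTTP client. Only some solrconfig.xml settings can be changed this way, so check the Config API page linked above for the current list.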

Re: Error for Graph Traversal using Streaming Expressions

2017-03-13 Thread Zheng Lin Edwin Yeo
Hi Joel, >One thing it could be is that gatherNodes will only work on single value >fields currently. Regarding this, the field which I am using in the query is already a single-value field, not a multi-value field. Regards, Edwin On 14 March 2017 at 10:04, Zheng Lin Edwin Yeo wrote: > Hi Joe

Modifying solrconfig.xml in solr cloud

2017-03-13 Thread Binoy Dalal
Is there a simpler way of modifying solrconfig.xml in cloud mode without having to download the file from zookeeper, modifying it and reuploading it? Something like the schema API maybe? -- Regards, Binoy Dalal

Re: Error for Graph Traversal using Streaming Expressions

2017-03-13 Thread Zheng Lin Edwin Yeo
Hi Joel, These are the details which I get from the logs. java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: --> http://localhost:8984/solr/email/: An exception has occur

Need help with date boost

2017-03-13 Thread Atita Arora
Hi all, I am trying to resolve a problem here where I have to fiddle around with a set of dates (created and updated date). My use case is that I have to make sure that the document with the latest (most recent) update date comes higher in my search results. Precisely, I am required to maintain 3 buckets

Re: Facet? Search problem

2017-03-13 Thread Dave
https://wiki.apache.org/solr/FieldCollapsing > On Mar 13, 2017, at 9:59 PM, Dave wrote: > > Perhaps look into grouping on that field. > >> On Mar 13, 2017, at 9:08 PM, Scott Smith wrote: >> >> I'm trying to solve a search problem and wondering if facets (or something >> else) might solve th

Re: Facet? Search problem

2017-03-13 Thread Dave
Perhaps look into grouping on that field. > On Mar 13, 2017, at 9:08 PM, Scott Smith wrote: > > I'm trying to solve a search problem and wondering if facets (or something > else) might solve the problem. > > Let's assume I have a bunch of documents (100 million+). Each document has a > cate

Facet? Search problem

2017-03-13 Thread Scott Smith
I'm trying to solve a search problem and wondering if facets (or something else) might solve the problem. Let's assume I have a bunch of documents (100 million+). Each document has a category (keyword) assigned to it. A single document may only have one category, but there may be multiple docu

Re: Iterating sorted result docs in a custom search component

2017-03-13 Thread Joel Bernstein
Are you sorting on a single field, or multiple fields? Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Mar 13, 2017 at 6:49 PM, alexpusch wrote: > As has been said, only the top N results are collected, but in order to > find > out which of the results are the top ones, all the results mus

Re: Indexing CPU performance

2017-03-13 Thread Shawn Heisey
On 3/13/2017 7:58 AM, Mahmoud Almokadem wrote: > When I start my bulk indexer program the CPU utilization is 100% on each > server but the rate of the indexer is about 1500 docs per second. > > I know that some solr benchmarks reached 70,000+ doc. per second. There are *MANY* factors that affect i

Re: Iterating sorted result docs in a custom search component

2017-03-13 Thread alexpusch
As has been said, only the top N results are collected, but in order to find out which of the results are the top ones, all the results must be sorted, no? Can't the docs somehow be accessible at that stage? Anyway, I see SortingResponseWriter does its own manual sorting using a priority queue. So
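On the question of whether everything has to be sorted to find the top N: in general it does not. A bounded priority queue can keep just the N best entries while streaming over all candidates. The sketch below shows that general technique on plain score values; it is an assumed illustration, not the code Solr or SortingResponseWriter actually uses, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch of top-N selection with a bounded min-heap (not Solr's implementation).
class TopN {

    /** Returns the n largest scores in descending order without sorting the whole input. */
    static List<Double> topN(double[] scores, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: the smallest score currently retained sits on top.
        PriorityQueue<Double> heap = new PriorityQueue<>(n);
        for (double s : scores) {
            if (heap.size() < n) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll();   // evict the weakest of the current top n
                heap.add(s);
            }
        }
        // Only the n survivors are sorted at the end.
        List<Double> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topN(new double[] {0.3, 2.5, 1.1, 0.9, 4.2}, 3));
    }
}
```

Every candidate is looked at once, but only N entries are ever held, which is why documents outside the top N are not available in sorted order afterwards.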

Re: Indexing CPU performance

2017-03-13 Thread Erick Erickson
I'm suggesting that worrying about your indexing rate is premature. 13,000 docs/second is over 1B docs per day. As a straw-man number, each Solr replica (think shard) can hold 64M documents. You need 16 shards at that size to hold a single day's input. Let's say you want to keep these docs around f
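The back-of-the-envelope numbers above can be checked with a few lines of arithmetic. The 13,000 docs/second rate and the 64M docs-per-shard straw-man figure come from the message; the class and method names are made up for this sketch.

```java
// Worked arithmetic for the shard estimate; names are hypothetical.
class ShardEstimate {

    static long docsPerDay(long docsPerSecond) {
        return docsPerSecond * 86_400L;                   // 86,400 seconds in a day
    }

    /** Ceiling division: how many shards of the given capacity hold this many docs. */
    static long shardsNeeded(long docs, long docsPerShard) {
        return (docs + docsPerShard - 1) / docsPerShard;
    }

    public static void main(String[] args) {
        long perDay = docsPerDay(13_000);                 // 1,123,200,000
        System.out.println("docs/day: " + perDay);
        // Using the rounded "1B docs per day" figure from the message:
        System.out.println("shards (1B/day): " + shardsNeeded(1_000_000_000L, 64_000_000L));
        // Using the unrounded daily total:
        System.out.println("shards (exact):  " + shardsNeeded(perDay, 64_000_000L));
    }
}
```

With the rounded 1B figure this gives 16 shards per day of input. With the unrounded total of about 1.12B it comes to 18, so the estimate is, if anything, on the low side.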

Re: I need to index files larger than 300 Mb, helpme please

2017-03-13 Thread Alexandre Rafalovitch
What kind of files are these? Are these PDF files, each of which is 300Mb? Or Solr Update documents (XML/JSON), where each document has long fields making it 300MB per document? Or Solr Update documents that have multiple documents and an individual batch is more than 300 Mb? Something else? And

Re: I need to index files larger than 300 Mb, helpme please

2017-03-13 Thread Walter Underwood
> On Mar 13, 2017, at 12:52 PM, Victor Hugo Olvera Morales > wrote: > > How can I index files with more than 300 MB in weight in solr-6.2.1 Is that 300 MB of text or some source format, like PDF? The King James Bible is only 4 MB of text, so 300 MB is extremely large. wunder Walter Underwood

I need to index files larger than 300 Mb, helpme please

2017-03-13 Thread Victor Hugo Olvera Morales
How can I index files larger than 300 MB in solr-6.2.1?

Re: Indexing CPU performance

2017-03-13 Thread Mahmoud Almokadem
Hi Erick, Thanks for the detailed answer. The producer can sustain producing at that rate; it's not a spike. So, I can run more clients that write to Solr although I got that maximum utilization with a single client? Do you think it will increase throughput? And you advise me to add more shard

Re: Indexing CPU performance

2017-03-13 Thread Erick Erickson
OK, so you can get a 360% speedup by commenting out the solr.add. That indicates that, indeed, you're pretty much running Solr flat out, not surprising. You _might_ squeeze a little more out of Solr by adding more client indexers, but that's not going to drive you to the numbers you need. I do have

Re: Indexing CPU performance

2017-03-13 Thread Mahmoud Almokadem
Thanks Erick, I've commented out the line SolrClient.add(doclist) and get 5500+ docs per second from a single producer. Regarding more shards, you mean use 2 nodes with 8 shards per node so we get 16 shards on the same 2 nodes or spread shards over more nodes? I'm using solr 6.4.1 with zookeeper o

Re: LTR on multiple shards

2017-03-13 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hello Vincent and Michael, Thank you for the question and answer here. I have added an 'Applying changes' section to https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank and changed https://cwiki.apache.org/confluence/display/solr/Managed+Resources to cross-reference to the reload

Re: BooleanEvaluator inside 'having' function of a streaming expression

2017-03-13 Thread Pratik Patel
Thanks Joel! This is just a simplified sample query that I created to better demonstrate the issue. I am not sure whether I want to upgrade to solr 6.5 as only developer version is available yet and it's a stable version as far as I know. Thanks for the clarification. I will try to find some other

Re: BooleanEvaluator inside 'having' function of a streaming expression

2017-03-13 Thread Pratik Patel
it's not a stable version* On Mon, Mar 13, 2017 at 1:34 PM, Pratik Patel wrote: > Thanks Joel! This is just a simplified sample query that I created to > better demonstrate the issue. I am not sure whether I want to upgrade to > solr 6.5 as only developer version is available yet and it's a stab

Re: BooleanEvaluator inside 'having' function of a streaming expression

2017-03-13 Thread Joel Bernstein
If you're using Solr 6.4 then the expression you're running won't work, because only numeric comparisons are supported. Solr 6.5 will have the expanded Evaluator functionality, which has string comparisons. In the expression you're working with it would be much more performant though to filter the

BooleanEvaluator inside 'having' function of a streaming expression

2017-03-13 Thread Pratik Patel
Hi, I am trying to write a streaming expression with the 'having' function in it. Following is my simple query. having(search(collection1,q="*:*",fl="storeid",sort="storeid asc",fq=tags:"Company"), eq(storeid,524efcfd505637004b1f6f24)) Here, storeid is a field of type "string" in s

Re: Managed Schema multiValued Predict Problem

2017-03-13 Thread Furkan KAMACI
You are right, I mean schemaless mode. I saw that it's your answer ;) I've edited solrconfig.xml and fixed it. Thanks! On Mon, Mar 13, 2017 at 5:46 PM, Alexandre Rafalovitch wrote: > There is managed schema, which means it is editable via API, and there > is 'schemaless' mode that uses that to a

Re: Error for Graph Traversal using Streaming Expressions

2017-03-13 Thread Joel Bernstein
Syntax looks ok. The logs should have a stack trace. One thing it could be is that gatherNodes will only work on single value fields currently. Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Mar 13, 2017 at 1:59 AM, Zheng Lin Edwin Yeo wrote: > Hi, > > I am getting this error when I trie

Re: Managed Schema multiValued Predict Problem

2017-03-13 Thread Alexandre Rafalovitch
There is managed schema, which means it is editable via API, and there is 'schemaless' mode that uses that to auto-define the field based on the first occurrence. 'schemaless' mode does not know if the field will be multi-valued the first time it sees content for that field. So, all the fields crea

Re: Managed Schema multiValued Predict Problem

2017-03-13 Thread Furkan KAMACI
OK, I found the answer here: http://stackoverflow.com/questions/38730035/solr-schemaless-mode-creating-fields-as-multivalued On Mon, Mar 13, 2017 at 5:15 PM, Furkan KAMACI wrote: > Hi, > > I generate dummy documents to test Solr 6.4.2. I create a field like that > at my test code: > >

Re: Predicting Date Field at Schemaless Mode

2017-03-13 Thread Alexandre Rafalovitch
And this is happening when the type is defined for the first time? You have no field, you send documents, you get new field defined and it is String, not Date? What's the value the field actually stores? Regards, Alex. http://www.solr-start.com/ - Resources for Solr users, new and experien

Managed Schema multiValued Predict Problem

2017-03-13 Thread Furkan KAMACI
Hi, I generate dummy documents to test Solr 6.4.2. I create a field like that at my test code: int customCount = r.nextInt(500); document.addField("custom_count", customCount); This field is indexed as: org.apache.solr.schema.TrieLongField and Multivalued

Re: Indexing CPU performance

2017-03-13 Thread Erick Erickson
Note that 70,000 docs/second pretty much guarantees that there are multiple shards. Lots of shards. But since you're using SolrJ, the very first thing I'd try would be to comment out the SolrClient.add(doclist) call so you're doing everything _except_ send the docs to Solr. That'll tell you wheth

Re: Predicting Date Field at Schemaless Mode

2017-03-13 Thread Furkan KAMACI
Everything works well but type is predicted as String instead of Date. I create just plain documents as follows: SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm"); Calendar startDate = new GregorianCalendar(2017, r.nextInt(6), r.nextInt(28));

Benefits of Data Locality in SOLR

2017-03-13 Thread Muhammad Imad Qureshi
We have a 30 node Hadoop cluster and each data node has a SOLR instance also running. We are adding 10 nodes. After adding nodes, we'll run HDFS balancer. This will affect data locality. Does this impact how Solr works (I mean performance)? Thanks, Imad

Re: Predicting Date Field at Schemaless Mode

2017-03-13 Thread Alexandre Rafalovitch
Are any other definitions in that URP chain triggered? Are you seeing this in a nested document by any chance? Regards, Alex. http://www.solr-start.com/ - Resources for Solr users, new and experienced On 13 March 2017 at 10:29, Furkan KAMACI wrote: > Hi, > > I'm testing schemaless mode

Solr5, Clustering & exact phrase problem

2017-03-13 Thread Bruno Mannina
Dear Solr-User, I’m trying to use solr clustering (Lingo algorithm) on my database (notices with id, title, abstract fields) All works fine when my query is simple (with or without Boolean operators) but if I try with exact phrase like: ..&q=ti:“snowboard binding”&… Then Solr generates on

Re: Changing definition of id field

2017-03-13 Thread Walter Underwood
If you are using master/slave (non-cloud), here is an approach. 1. Build a new master with the new schema. 2. Index all content there. 3. Send all updates to both the old master and new master. 4. One by one, take a slave down, delete all the documents, configure with the new schema and new maste

Re: solr init script won't execute under user account without login shell

2017-03-13 Thread Shawn Heisey
On 3/10/2017 10:12 AM, Chris Hostetter wrote: > If i understand correctly, you mean you've modified the init.d/solr > script such that when "su" is run you pass "-s /bin/bash" ? I do not think we can be absolutely certain that bash will *always* be in that exact location. Checked the bash source

Predicting Date Field at Schemaless Mode

2017-03-13 Thread Furkan KAMACI
Hi, I'm testing schemaless mode of Solr 6.4.2. Solr predicts field types when I generate dummy data and index it to Solr. However, I could not make Solr predict date fields. I tried this: "custom_start":["2017-05-16T00:00"] which is a date parse result of SimpleDateFormat("yyyy-MM-dd'T'HH:mm
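One thing that may be worth trying, without any claim that it is the cause here: the value "2017-05-16T00:00" has no seconds and no time zone, whereas Solr's own date fields use the full UTC form such as 2017-05-16T00:00:00Z. The sketch below produces that form with java.time; the class and method names are hypothetical, and whether this changes the schemaless type guess is an assumption that would need testing.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: formats a UTC date-time in the full ISO-8601 instant form.
class SolrDateFormat {

    /** Interprets the given local date-time as UTC and renders it like 2017-05-16T00:00:00Z. */
    static String toSolrDate(LocalDateTime utcDateTime) {
        return DateTimeFormatter.ISO_INSTANT.format(utcDateTime.toInstant(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate(LocalDateTime.of(2017, 5, 16, 0, 0)));
    }
}
```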

Indexing CPU performance

2017-03-13 Thread Mahmoud Almokadem
Hi great community, I have a SolrCloud with the following configuration: - 2 nodes (r3.2xlarge 61GB RAM) - 4 shards. - The producer can produce 13,000+ docs per second - The schema contains about 300+ fields and the document size is about 3KB. - Using SolrJ and SolrCloudClient,

RE: SOLR 4.8.0 Master/Slave Replication Issues

2017-03-13 Thread Pouliot, Scott
Looks like changing the autoCommit maxTime setting is what did it for the replication issues. Thanks Andrea/Erick for the reminders and pointers! Scott -Original Message- From: Pouliot, Scott [mailto:scott.poul...@peoplefluent.com] Sent: Friday, March 10, 2017 11:09 AM To: solr-user@

Re: Best way to synonymize with Wordnet

2017-03-13 Thread Alexandre Rafalovitch
This could be a good start: https://github.com/nicholasding/solr-lemmatizer Regards, Alex. http://www.solr-start.com/ - Resources for Solr users, new and experienced On 13 March 2017 at 09:17, OTH wrote: > Hello all, > > I am looking to incorporate synonymization using Wordnet in my Sol

Best way to synonymize with Wordnet

2017-03-13 Thread OTH
Hello all, I am looking to incorporate synonymization using Wordnet in my Solr application. Does anyone have any advice on how to do this, and what the 'best practices' would be in this regard? Much thanks

Re: Inconsistent numFound in SC when querying core directly

2017-03-13 Thread Shawn Heisey
On 3/13/2017 3:16 AM, vbindal wrote: > I am facing the same issue where my query *:* returns inconsistent number > (almost 3) time the actual number in millions. > > When I try distrib=false on every machine, the results are correct. but > without `distrib=false` results are incorrect. This most

Re: Changing definition of id field

2017-03-13 Thread Shawn Heisey
On 3/13/2017 3:07 AM, danny teichthal wrote: > I have a limitation that our Solr cluster is "live" during full > indexing. We have many updates and the index is incrementally updated. > There's no way to index everything on a side index and replace. So, > I'm trying to find a solution where users c

Alphanumeric sort with alphabets first

2017-03-13 Thread Srinivasan Narayanan
Hello SOLR experts, I am new to SOLR and I am trying to do alphanumeric sort on string field(s). However, in my case, alphabets should come before numbers. I also have a large number of such fields (~2500), any of which can be alphanumerically sorted upon at runtime. I’ve explored below concept
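To pin down the ordering being asked for, here is a small sketch of a comparator that puts letters before digits. It is written outside of Solr purely to illustrate the desired result; it is not a Solr feature or configuration, the class name is hypothetical, and it compares character by character (case-insensitively) rather than treating digit runs as numbers.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative comparator: letters sort before digits, everything else last.
class LettersFirst {

    private static int rank(char c) {
        if (Character.isLetter(c)) return 0;   // letters first
        if (Character.isDigit(c)) return 1;    // then digits
        return 2;                              // then any other character
    }

    static final Comparator<String> LETTERS_BEFORE_DIGITS = (a, b) -> {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            char x = Character.toLowerCase(a.charAt(i));
            char y = Character.toLowerCase(b.charAt(i));
            int byClass = Integer.compare(rank(x), rank(y));
            if (byClass != 0) return byClass;          // different character classes
            if (x != y) return Character.compare(x, y); // same class, different character
        }
        return Integer.compare(a.length(), b.length()); // a prefix sorts before the longer string
    };

    public static void main(String[] args) {
        String[] values = {"1abc", "abc", "a1", "ab", "2"};
        Arrays.sort(values, LETTERS_BEFORE_DIGITS);
        System.out.println(Arrays.toString(values));
    }
}
```

Inside Solr the same ordering would have to come from how the sort field is analyzed or collated, which this sketch does not cover.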

Duplicate Documents which different version

2017-03-13 Thread vbindal
I'm using solr 4.10.0. I'm using "id" field as the unique key - it is passed in with the document when ingesting the documents into solr. When querying on different shards, I get duplicate documents with different "_version_". Out of approx. millions of these docs are duplicates. Cloud has 3 shards

Re: Solr Update If Record Exists ?

2017-03-13 Thread alessandro.benedetti
As Alex suggested, _version_ is a special field which can be used in atomic updates with a special semantic : "If the content in the _version_ field is equal to '1', then the document must simply exist. In this case, no version matching occurs, but if the document does not exist, the updates will
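The rules for the `_version_` value sent with an update can be written down as a small decision function. This is only a model of the documented semantics (greater than 1: must match exactly; 1: document must exist; negative: document must not exist; 0: no check), not Solr's actual code, and the class and method names are hypothetical.

```java
// Model of the documented _version_ rules for updates; not Solr's implementation.
class VersionCheck {

    /**
     * @param requested the _version_ value sent with the update
     * @param existing  the stored version of the document, or null if it does not exist
     * @return true if the update would be accepted under the documented rules
     */
    static boolean accepts(long requested, Long existing) {
        if (requested > 1) {
            return existing != null && existing == requested; // versions must match exactly
        }
        if (requested == 1) {
            return existing != null;                          // document must exist
        }
        if (requested < 0) {
            return existing == null;                          // document must not exist
        }
        return true;                                          // 0: no version check
    }

    public static void main(String[] args) {
        System.out.println(accepts(1, null));   // update-only-if-exists against a missing doc
        System.out.println(accepts(1, 123L));   // same request against an existing doc
    }
}
```

So sending `_version_` = 1 gives "update only if the record exists" behaviour: an update to a missing document is rejected rather than creating it.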

Re: Inconsistent numFound in SC when querying core directly

2017-03-13 Thread vbindal
Hi, I am facing the same issue where my query *:* returns an inconsistent number (almost 3 times the actual number, in millions). When I try distrib=false on every machine, the results are correct, but without `distrib=false` the results are incorrect. Can you guys suggest something? -- View this m

Re: Changing definition of id field

2017-03-13 Thread danny teichthal
Thanks Shawn, I understand that changing id to string is not an option. I have a limitation that our Solr cluster is "live" during full indexing. We have many updates and the index is incrementally updated. There's no way to index everything on a side index and replace. So, I'm trying to find