RE: Default stop word list

2016-08-25 Thread Srinivasa Meenavalli
Hi Steven, The list of stop words for a language is not fixed; there is no single universal list of stop words used by all natural language processing tools. Ideally, stop words should be defined by search merchandisers based on their domain instead of relying on the default. https://en.wikipedia.org/wiki/S

Re: solr.NRTCachingDirectoryFactory

2016-08-25 Thread Mikhail Khludnev
Rough sampling under load makes sense as usual. JMC is one of the suitable tools for this. Sometimes even just jstack or looking at SolrAdmin/Threads is enough. If only a small ratio of documents is updated and the bottleneck is filterCache, you can experiment with segmented filters, which suit more

RE: solrcloud 6.0.1 any suggestions for fixing a replica that stubbornly remains down

2016-08-25 Thread Jon Hawkesworth
Thanks for your suggestion. Here's a chunk of info from the logging in the Solr admin page below. Is there somewhere else I should be looking too? It looks to me like it's stuck in a never-ending loop of attempting recovery that fails. I don't know if the warnings from IndexFetcher are releva

High load, frequent updates, low latency requirement use case

2016-08-25 Thread Brent P
I'm trying to set up a Solr Cloud cluster to support a system with the following characteristics: It will be writing documents at a rate of approximately 500 docs/second, and running search queries at about the same rate. The documents are fairly small, with about 10 fields, most of which range in

Re: Inventor-template vs Inventor template - issue with hyphen

2016-08-25 Thread shamik
Thanks Erick. I did look into the analyser tool and debug query and posted the results in my post. WDF is correctly stripping off the "-" from Inventor-template, both terms are getting broken down to "inventor templat". But not sure why the query construct is different during query time. Here's par

Re: Solr Suggester no results

2016-08-25 Thread Scott Vanderbilt
Bradley: You're a bloody genius! That's exactly what I needed to make it work. For the sake of the archives, after modifying the solrconfig.xml as indicated and rebuilding the suggester dictionary, the queries started to kick back results like crazy. For what it's worth, I'm running Solr 6.

Re: Solr Suggester no results

2016-08-25 Thread Bradley Belyeu
Scott, I’m fairly new to suggesters having just recently built my first one. But where my configuration differs from yours is on this line: string I used the field type name that I had defined instead like: textSuggest I’m not certain that would help, but I can’t see where your config is muc

Re: Inventor-template vs Inventor template - issue with hyphen

2016-08-25 Thread Erick Erickson
Look at your admin/analysis page. WordDelimiterFilterFactory breaks on non-alphanumeric characters. Also, adding &debug=query will show you the parsed form of the query, and that'll help. On Aug 25, 2016 4:41 PM, "Shamik Bandopadhyay" wrote: Hi, I'm trying to figure out search behaviour related to similar te

Re: solr.NRTCachingDirectoryFactory

2016-08-25 Thread Rallavagu
Follow up update ... Set autowarm count to zero for caches for NRT and I could negotiate latency from 2 min to 5 min :) However, still seeing high QTimes and wondering where else can I look? Should I debug the code or run some tools to isolate bottlenecks (disk I/O, CPU or Query itself). Loo

Re: Solr Suggester no results

2016-08-25 Thread Scott Vanderbilt
I'm having the exact same problem the O.P. describes from his email back in May, but my configuration does not have the same defect his had. So I am at a loss to understand why my suggest queries are returning no results. Here is my config: Relevant bits from schema.xml: --

Inventor-template vs Inventor template - issue with hyphen

2016-08-25 Thread Shamik Bandopadhyay
Hi, I'm trying to figure out search behaviour related to similar terms, one with and one without the hyphen. They generate different result sets; the search without the hyphen brings back more results than the other. Here's the fieldtype definition :

Default stopword list

2016-08-25 Thread Steven White
Hi everyone, I'm curious: how was the current "default" stopword list, for English and other languages, determined? And for English, why is "I" not in the stopword list? Thanks in advance. Steve

Re: Question about indexing PDFs

2016-08-25 Thread Erick Erickson
That is always a dangerous assumption. Are you sure you're searching on the proper field? Are you sure it's indexed? Are you sure it's The schema browser I indicated above will give you some idea what's actually in the field. You can not only see the fields Solr (actually Lucene) sees in your i

Re: solrcloud 6.0.1 any suggestions for fixing a replica that stubbornly remains down

2016-08-25 Thread Erick Erickson
This is odd. The ADDREPLICA _should_ be immediately listed as "down", but should shortly go to "recovering" and then "active". The transition to "active" may take a while as the index has to be copied from the leader, but you shouldn't be stuck at "down" for very long. Take a look at the Solr logs

solrcloud 6.0.1 any suggestions for fixing a replica that stubbornly remains down

2016-08-25 Thread Jon Hawkesworth
Anyone got any suggestions how I can fix up my solrcloud 6.0.1 replica remains down issue? Today we stopped all the loading and querying, brought down all 4 solr nodes, went into zookeeper and deleted everything under /collections/transcribedReports/leader_initiated_recovery/shard1/ and brought

changing the /solr path, additional steps needed for 6.1

2016-08-25 Thread Chris Morley
This might help some people: To change the URL to server:port/ourspecialpath from server:port/solr is a bit inconvenient. You have to change several files where the solr part of the request path is hardcoded: server/solr-webapp/webapp/WEB-INF/web.xml server/solr/solr.xml server/context

Re: another log question about solr 5

2016-08-25 Thread elisabeth benoit
Thanks! This is very helpful! Best regards, Elisabeth 2016-08-25 17:07 GMT+02:00 Shawn Heisey : > On 8/24/2016 6:01 AM, elisabeth benoit wrote: > > I was wondering what is the right way to prevent solr 5 from creating a > new > > log file at every startup (and renaming the actual file mv > > "$S

Re: Question about indexing PDFs

2016-08-25 Thread Betsey Benagh
Right, that's where I looked. No 'content'. Which is what confused me. On 8/25/16, 1:56 PM, "Erick Erickson" wrote: >when you say "I don't see it in the schema for that collection" are you >talking schema.xml? managed_schema? Or actual documents in the index? >Often >these are defined by dyna

Re: Question about indexing PDFs

2016-08-25 Thread Betsey Benagh
It looks like the metadata of the PDFs was indexed, but not the content (which is what I was interested in). Searches on terms I know exist in the content come up empty. On 8/25/16, 2:16 PM, "Betsey Benagh" wrote: >Right, that's where I looked. No 'content'. Which is what confused me. > > >On

Re: Sorting non-english text

2016-08-25 Thread Vasu Y
Thank you Ahmet. I have a couple of questions on using CollationKeyAnalyzer: 1) Is it enough to specify this Analyzer in schema.xml as shown below or do I need to pass any parameters like language etc.? 2) Do we need to define one CollationKeyAnalyzer per language? 3) I also noticed that there is o

Re: Sorting non-english text

2016-08-25 Thread Ahmet Arslan
Hi, I think there is a dedicated fieldType for this: https://cwiki.apache.org/confluence/display/solr/Language+Analysis#LanguageAnalysis-UnicodeCollation Ahmet On Thursday, August 25, 2016 9:08 PM, Vasu Y wrote: Thank you Ahmet. I have a couple of questions on using CollationKeyAnalyzer: 1) I

Re: Question about indexing PDFs

2016-08-25 Thread Erick Erickson
when you say "I don't see it in the schema for that collection" are you talking schema.xml? managed_schema? Or actual documents in the index? Often these are defined by dynamic fields and the like in the schema files. Take a look at the admin UI>>schema browser>>drop down and you'll see all the ac

Re: Is it safe to upgrade an existing field to docvalues?

2016-08-25 Thread Alessandro Benedetti
Of course I see your point Ronald, and don't get me wrong, I don't think it is a bad idea. I simply think it can bring some complexity and confusion if we start to use it as a common approach. Anyway let's see what the other Solr gurus think :) Cheers On Thu, Aug 25, 2016 at 2:21 PM, Ronald Wood wr

Re: Is it safe to upgrade an existing field to docvalues?

2016-08-25 Thread Ronald Wood
Thanks, Toke. I’m still surveying the code; do you know of a place in the code that might be more problematic? We’d be mainly concerned about searching, sorting and (simple, low-cardinality) faceting working for us. Some features like grouping are not currently used by us, so in a pinch a cu

Question about indexing PDFs

2016-08-25 Thread Betsey Benagh
Following the instructions in the quick start guide, I imported a bunch of PDF documents into my Solr 6.0 instance. As far as I can tell from the documentation, there should be a 'content' field indexing, well, the content, but I don't see it in the schema for that collection. Is there somethi

Re: help with DIH transformer to add a suffix to column names

2016-08-25 Thread Wendy
Hi Alex, Thank you for your response. It worked. I am very happy with the results. I report the steps below. The purpose is to create a dynamic field to simplify field definition in the managed-schema file and to simplify field rank in the solrconfig.xml file. STEPS: 1. file creation of db-data-config.x

Re: another log question about solr 5

2016-08-25 Thread Shawn Heisey
On 8/24/2016 6:01 AM, elisabeth benoit wrote: > I was wondering what is the right way to prevent solr 5 from creating a new > log file at every startup (and renaming the actual file mv > "$SOLR_LOGS_DIR/solr_gc.log" "$SOLR_LOGS_DIR/solr_gc_log_$(date > +"%Y%m%d_%H%M")" I think if you find and comm

Re: Use function in condition

2016-08-25 Thread Emir Arnautovic
Hi Nabil, You have a limited set of functions, but there are logical functions (or, and, not) and you have the query function, so you can do more complex queries: fq={!frange l=1}and(query($sub1),termfreq(field3, 300))&sub1={!frange l=100}sum(field1,field2) And will return 1 for doc matching both function t
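
A minimal Python sketch of how the request above might be assembled, assuming `fq` and `sub1` are meant to be sent as two separate request parameters so that `$sub1` is dereferenced by Solr. The `q=*:*` value is an assumption for illustration and is not part of the original post; the field names are taken from the message.

```python
from urllib.parse import urlencode, parse_qs

# Request parameters for the function range query quoted above.
# "q" is a placeholder match-all query (an assumption, not from the post).
params = {
    "q": "*:*",
    # Outer filter: keep docs where and(...) evaluates to at least 1.
    "fq": "{!frange l=1}and(query($sub1),termfreq(field3, 300))",
    # Sub-query referenced as $sub1: field1 + field2 must be at least 100.
    "sub1": "{!frange l=100}sum(field1,field2)",
}

# urlencode escapes braces, "$", spaces and commas so the local-params
# syntax survives the trip through the URL.
query_string = urlencode(params)
print(query_string)
```

Encoding the parameters programmatically avoids hand-escaping `{`, `}` and `$`, which is an easy place to introduce mistakes when the filter is pasted into a URL.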

Re: Sorting non-english text

2016-08-25 Thread Ahmet Arslan
Hi Vasu, There is a field type or something like that (CollationKeyAnalyzer) for language specific sorting. Ahmet On Thursday, August 25, 2016 12:29 PM, Vasu Y wrote: Hi, I have a text field which can contain values (multiple tokens) in English; to support sorting, I had in schema.xml to co

Re: Most popular fields under a list of documents

2016-08-25 Thread Mikhail Khludnev
Did you consider field facet? On Thu, Aug 25, 2016 at 3:35 PM, Algirdas Jokubauskas wrote: > Hi, > > So I've been trying to figure out how to accomplish this one, but couldn't > find anything that would not kill performance. > > I have a document type with a bunch of info that I use for various

Re: Is it safe to upgrade an existing field to docvalues?

2016-08-25 Thread Ronald Wood
Alessandro, yes I can see how this could be conceived of as a more general problem; and yes useDocValues also strikes me as being unlike the other properties since it would only be used temporarily. We’ve actually had to migrate fields from one to another when changing types, along with awkward

Most popular fields under a list of documents

2016-08-25 Thread Algirdas Jokubauskas
Hi, So I've been trying to figure out how to accomplish this one, but couldn't find anything that would not kill performance. I have a document type with a bunch of info that I use for various tasks, but I want to add a new field which is a list of ints. Then I want to do a free text search of t

Re: Range Filter for Multi-Valued Date Fields

2016-08-25 Thread Iana Bondarska
Thanks for the explanation; it seems that "between" isn't equivalent to 2 range filters for multivalued fields. 2016-08-24 8:19 GMT+03:00 Mikhail Khludnev : > It executes both half closed ranges first, here the undesired first doc > comes in. Then it intersects these document sets, and here again, the > und
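
The difference described in this thread can be illustrated with a small Python sketch. This is plain Python, not Solr code; the function names and sample dates are hypothetical and only model the matching semantics as described in the quoted reply.

```python
from datetime import date

def matches_two_filters(values, lo, hi):
    """Two independent half-open filters: some value >= lo AND some value <= hi.

    Each filter may be satisfied by a different value of the field.
    """
    return any(v >= lo for v in values) and any(v <= hi for v in values)

def matches_between(values, lo, hi):
    """A single 'between' test: one and the same value must lie in [lo, hi]."""
    return any(lo <= v <= hi for v in values)

# A document whose multi-valued date field holds two values,
# neither of which falls inside June 2016.
doc_dates = [date(2016, 1, 1), date(2016, 12, 31)]
lo, hi = date(2016, 6, 1), date(2016, 6, 30)

print(matches_two_filters(doc_dates, lo, hi))  # the two filters both match
print(matches_between(doc_dates, lo, hi))      # no single value is in range
```

The document passes both half-open filters (one value is late enough, another is early enough), so intersecting the two result sets still includes it, even though none of its values lies between the bounds.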

Re: Is it safe to upgrade an existing field to docvalues?

2016-08-25 Thread Toke Eskildsen
Ronald Wood wrote: > Did you find you had to do a full conversion all at once because simply > turning on > docvalues in the schema caused issues? Yes. > I ask because my presupposition has been that we could turn it on without any > harm as we incrementally converted our indexes. If you don't

Re: Is it safe to upgrade an existing field to docvalues?

2016-08-25 Thread Alessandro Benedetti
> switching is done in Solr on field.hasDocValues. The code would be amended > to (field.hasDocValues && field.useDocValues) throughout. > This is correct. Currently we use DocValues if they are available, and to check the availability we check the schema attribute. This can be problematic in the s
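
As a rough illustration of the proposed check, here is a Python sketch. It is not Solr code: `SchemaField` and `should_read_doc_values` are hypothetical names, and `use_doc_values` models the `useDocValues` flag proposed in this thread, which is a suggestion under discussion rather than an existing schema attribute.

```python
from dataclasses import dataclass

@dataclass
class SchemaField:
    # Mirrors the schema attribute: docValues are declared for this field.
    has_doc_values: bool
    # Hypothetical flag from the proposal: set to False while older
    # segments have not yet been reindexed with docValues.
    use_doc_values: bool = True

def should_read_doc_values(field: SchemaField) -> bool:
    """Proposed condition: use docValues only if declared AND enabled."""
    return field.has_doc_values and field.use_doc_values

# During an incremental migration the field declares docValues,
# but reads would still fall back to the previous mechanism.
migrating = SchemaField(has_doc_values=True, use_doc_values=False)
print(should_read_doc_values(migrating))
```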

Re: Use function in condition

2016-08-25 Thread nabil Kouici
Hi Emir, Thank you for your reply. I've tested the function range query and it solves 50% of my need. The problem is I'm not able to use it with other conditions. For example: fq={!frange l=100}sum(field1,field2)  and field3:200 or fq=({!frange l=100}sum(field1,field2))  and (field3:200) Thi

Sorting non-english text

2016-08-25 Thread Vasu Y
Hi, I have a text field which can contain values (multiple tokens) in English; to support sorting, I had in schema.xml to copy this to a new field of type "lowercase" (defined as below). I also have text fields of type text_de, text_es, text_fr, ja, cn etc. I intend to copy them to a new f

Search Configurations Merchandising tool

2016-08-25 Thread Srinivasa Meenavalli
Hi, Is there any Search Merchandising tool available in Solr, similar to Endeca Experience Manager, to manage Synonyms, Protwords, Keyword redirects, Template management etc.? Are there any plans to develop one within Solr? Regards Srinivas Meenavalli Disclaimer: The contents of this e-mail