Re: shards as subset of All Shards

2014-07-18 Thread Jack Krupansky
"500B" - as in 500,000,000,000? Really? -- Jack Krupansky -Original Message- From: tomasv Sent: Friday, July 18, 2014 8:18 PM To: solr-user@lucene.apache.org Subject: shards as subset of All Shards Hello, This is kind of weird, but here goes: We are setting up a document repository (

Re: Performance issues with facets and filter query exclusions

2014-07-18 Thread Hayden Muhl
That query is representative of some of the queries in my test, but I didn't notice any correlation between using the match all docs query and poor query performance. Here's another example of a query that took longer than expected. qt=en&q=dress green leather&fq=userId:(383)&fq={!tag=productR

shards as subset of All Shards

2014-07-18 Thread tomasv
Hello, This is kind of weird, but here goes: We are setting up a document repository (SOLR4). This will be a large (to us) repository of approximately 500B documents. The documents are based on "people". Once all my documents are uploaded, we will receive new (follow-up) information on our "peop

RE: SolrCloud performance issues regarding hardware configuration

2014-07-18 Thread Toke Eskildsen
search engn dev [sachinyadav0...@gmail.com] wrote: > out of 700 million documents 95-97% values are unique approx. That's quite a lot. If you are not already using DocValues for that, you should do so. So, each shard handles ~175M documents. Even with DocValues, there is an overhead of just hav

Re: Plugin init failure for custom analysis filter

2014-07-18 Thread Jack Krupansky
Further down in the stack trace you will find the "cause" of the exception. Solr is calling the "init" method, but your code is throwing an exception. Your jar is probably in the proper place, otherwise Solr wouldn't have been able to load it and call the init method for it. -- Jack Krupansky

Re: Range query and Highlighting

2014-07-18 Thread Jack Krupansky
You can specify an "alternate" query to use for highlighting purposes, with the "hl.q" parameter. It doesn't affect the query results, but lets you control which terms get highlighted. See: http://wiki.apache.org/solr/HighlightingParameters#hl.q -- Jack Krupansky -Original Message- F
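
As a minimal sketch of the hl.q suggestion, the Python snippet below builds a select URL in which q carries the full query while hl.q carries only the term clause. The host, the /solr/select path, the helper name, and the choice of title as the highlighted field (via hl.fl) are assumptions for illustration; the query itself is the one from the original question in this thread.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port and core to your installation.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_highlight_url(main_query, highlight_query, field):
    """Return a select URL that searches with main_query but highlights
    only the terms of highlight_query in the given field."""
    params = [
        ("q", main_query),           # decides which documents match
        ("hl", "true"),              # turn highlighting on
        ("hl.fl", field),            # field to highlight (assumed: title)
        ("hl.q", highlight_query),   # terms to highlight, independent of q
    ]
    return SOLR_SELECT + "?" + urlencode(params)


url = build_highlight_url(
    "+date:{20031231 TO *] +(title:red)",  # range clause plus term clause
    "title:red",                           # only the term clause is highlighted
    "title",
)
print(url)
```

Because hl.q does not affect matching, the range clause still restricts the result set; only the highlighting input changes.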

Range query and Highlighting

2014-07-18 Thread Jae Joo
If I use a combined query (a range query plus others, such as a term query), all matched terms in the field are highlighted. Is there any way to highlight only the term(s) in the term query? Here is an example: +date:{20031231 TO *] +(title:red) It highlights all terms except stopwords. Using fq would not be an option because th

Re: Match query string within indexed field?

2014-07-18 Thread prashantc88
Hi, Thanks for the reply. Is there a better way to do it if the scenario is the following: Indexed values: "abc def" Query String:"xy abc def z" So basically the query string has to match all the words present in the indexed data to give a MATCH. -- View this message in context: http://luc

text search problem

2014-07-18 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Hi, Below is the text_general field type. When I search for Text:Broadway, it does not return all the records; it returns only a few. But when I search for Text:*Broadway*, it returns more records. When I search for multiple words, like "Broadway Hotel", it may not get "Broadway"

Re: Performance issues with facets and filter query exclusions

2014-07-18 Thread Yonik Seeley
On Fri, Jul 18, 2014 at 2:10 PM, Hayden Muhl wrote: > I was doing some performance testing on facet queries and I noticed > something odd. Most queries tended to be under 500 ms, but every so often > the query time jumped to something like 5000 ms. > > q=*:*&fq={!tag=productBrandId}productBrandId:

Performance issues with facets and filter query exclusions

2014-07-18 Thread Hayden Muhl
I was doing some performance testing on facet queries and I noticed something odd. Most queries tended to be under 500 ms, but every so often the query time jumped to something like 5000 ms. q=*:*&fq={!tag=productBrandId}productBrandId:(156 1227)&facet.field={!ex=productBrandId}productBrandId&face

Re: Mixing ordinary and nested documents

2014-07-18 Thread Umesh Prasad
Comments inline On 16 July 2014 20:31, Bjørn Axelsen wrote: > Hi Solr users > > I would appreciate your input on how to handle a *mix* of *simple* and *nested* documents in the easiest and most flexible way. > > I need to handle: > > - simple documents: webpages, short articles etc. (approx

Re: Match query string within indexed field?

2014-07-18 Thread Umesh Prasad
You are looking for wildcard queries, but they can be quite costly and you will need to benchmark performance. Suffix wildcard queries in particular (of type *abc) are quite costly. You can convert a suffix query into a prefix query by using a ReverseTokenFilter during index-time analysis. A sea
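
The suffix-to-prefix trick can be sketched outside Solr in a few lines of Python. This is only an illustration of the idea, not Solr's implementation: terms are stored reversed in a sorted list, so a suffix lookup such as *abc becomes a prefix lookup for cba. The function names and sample terms are hypothetical. In Solr the reversal would be done by an analysis filter at index time (for example solr.ReversedWildcardFilterFactory).

```python
import bisect


def build_reversed_index(terms):
    """Store every term reversed, sorted so that prefixes are contiguous."""
    return sorted(term[::-1] for term in terms)


def suffix_search(reversed_index, suffix):
    """Find all original terms ending in `suffix` via a prefix scan
    over the reversed terms."""
    prefix = suffix[::-1]
    start = bisect.bisect_left(reversed_index, prefix)
    matches = []
    for reversed_term in reversed_index[start:]:
        if not reversed_term.startswith(prefix):
            break  # sorted order: no later entry can share the prefix
        matches.append(reversed_term[::-1])
    return sorted(matches)


index = build_reversed_index(["xyzabc", "xyabc", "abcxyz", "abc"])
print(suffix_search(index, "abc"))
```

The scan stops at the first non-matching entry, which is what makes a prefix lookup cheap compared with testing every term against a leading wildcard.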

Re: SolrCloud performance issues regarding hardware configuration

2014-07-18 Thread Erick Erickson
Right, this is the worst kind of use-case for faceting. You have 150M docs/shard and are asking up to 125M buckets to count into, plus control structures. Performance of this (even without OOMs) will be a problem. Having multiple queries execute this simultaneously will increase memory usage. So y
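
A back-of-the-envelope estimate shows why this gets expensive. The sketch below assumes one 4-byte counter per bucket and ignores the control structures mentioned above; the counter size and the number of concurrent queries are assumptions for illustration, not measured Solr internals.

```python
def facet_counter_bytes(buckets, bytes_per_counter=4, concurrent_queries=1):
    """Estimate the memory needed just for the per-bucket counters."""
    return buckets * bytes_per_counter * concurrent_queries


BUCKETS = 125_000_000  # upper bound on buckets per shard quoted in the reply

single_query = facet_counter_bytes(BUCKETS)
ten_queries = facet_counter_bytes(BUCKETS, concurrent_queries=10)

print(f"one query:   {single_query / 2**20:.0f} MiB")
print(f"ten queries: {ten_queries / 2**30:.2f} GiB")
```

Even under these optimistic assumptions, the counters alone scale linearly with the number of simultaneous facet requests.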

Match query string within indexed field?

2014-07-18 Thread prashantc88
Hi, My requirement is to give a match whenever a string is found within the indexed data of a field irrespective of where it is found. For example, if I have a field which is indexed with the data "abc". Now any of the following query string must give a match: xyzabc,xyabc, abcxyz .. I am using

Re: Does Solr deduplicate stored values

2014-07-18 Thread Erick Erickson
The data will be stored 100 times in your example, independently for each document, albeit compressed. Hmmm, doing that would certainly reduce the disk space requirements, but it'd also complicate the document read process. Instead of a single contiguous read from disk per document, there'd be mul

Re: Integrating Solr with HBase Using Lily Project

2014-07-18 Thread Erick Erickson
Probably a question better asked on the Lily or HBase user forums since those projects use Solr and will have a much better sense of what Solr versions are compatible. Best, Erick On Fri, Jul 18, 2014 at 4:20 AM, Vivekanand Ittigi wrote: > Hi, > > I tried to Integrate Solr with HBase Using HB

Re: Query join on multiple fields

2014-07-18 Thread Erick Erickson
Joins can be chained, don't quite know if that fits your use-case... But whenever I see a question that looks like "How can I make Solr behave like a database", I have to ask two questions: 1> Is Solr the right tool? It's a marvelous search engine, but not an RDBMS. If your problem really

Re: Context-aware suggesters in Solr

2014-07-18 Thread RaCo
Hi Alan or Areek, Were you able to make this work? I have Solr 4.9 with SuggestComponent and AnalyzingInfixLookupFactory, which is amazing, but I can't filter the suggestions based on a field that holds their IDs. Any help would be appreciated. -- View this message in context: http://lu

Re: Solr 4.7.2 auto suggestion

2014-07-18 Thread benjelloun
The dictionary file named "fwfsta.bin" contains NULL. That means the configuration is not correct. Maybe I need to add or change something on thanks for help -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-7-2-auto-suggestion-tp4147677p4147878.html Sent from the Sol

Re: Understanding query behaviour in LBHttpSolrServer

2014-07-18 Thread Jack Krupansky
For traditional, non-SolrCloud "distributed" mode, load balancing and sharded queries are independent concepts - you can use them each separately or together at your choice. If you want the query to be sharded for a non-SolrCloud Solr server, then you need to pass the "shards" parameter on each
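
As a minimal sketch of passing the "shards" parameter on a request, the Python snippet below builds a distributed query URL. The host names, ports, and helper name are hypothetical; the shards value is a comma-separated list of shard addresses given without the http:// scheme.

```python
from urllib.parse import urlencode


def build_sharded_url(coordinator, shard_addresses, query):
    """Return a select URL asking `coordinator` to fan the query out
    to every address listed in `shard_addresses`."""
    params = [
        ("q", query),
        ("shards", ",".join(shard_addresses)),  # host:port/path, no scheme
    ]
    return coordinator + "/select?" + urlencode(params)


url = build_sharded_url(
    "http://host1:8983/solr",                # hypothetical coordinating server
    ["host1:8983/solr", "host2:8983/solr"],  # hypothetical shard addresses
    "*:*",
)
print(url)
```

Load balancing stays a separate concern: a balancer only decides which server receives this URL, while the shards parameter decides where that server forwards the query.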

Re: Understanding query behaviour in LBHttpSolrServer

2014-07-18 Thread Shawn Heisey
On 7/18/2014 12:51 AM, search engn dev wrote: > From my understanding, Solr and SolrJ work as below: > 1. LBHttpSolrServer keeps pinging the above list of servers and maintains a list of live servers. > 2. Every time a query arrives, it picks one server from the list (round-robin fashion). > 3. Sends q
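
The round-robin behaviour described in the quoted steps can be sketched as follows. This is a toy Python model of the poster's description, not SolrJ's actual implementation; the class and method names are hypothetical, and the liveness pinging is replaced by explicit mark_dead / mark_alive calls.

```python
class RoundRobinBalancer:
    """Pick live servers in rotation, skipping any marked as dead."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._alive = set(self._servers)
        self._next = 0  # position of the next candidate in the rotation

    def mark_dead(self, server):
        self._alive.discard(server)

    def mark_alive(self, server):
        if server in self._servers:
            self._alive.add(server)

    def pick(self):
        # Try each server at most once, starting from the rotation point.
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server in self._alive:
                return server
        raise RuntimeError("no live servers available")


balancer = RoundRobinBalancer(["solr1", "solr2", "solr3"])
first_round = [balancer.pick() for _ in range(4)]
balancer.mark_dead("solr2")
after_failure = [balancer.pick() for _ in range(3)]
print(first_round, after_failure)
```

Each pick returns a single server, so a query routed this way reaches one node only; it is not spread across shards by the balancer itself.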

RE: SolrCloud performance issues regarding hardware configuration

2014-07-18 Thread search engn dev
Out of 700 million documents, approximately 95-97% of the values are unique. My facet query is: http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.limit=1&facet.field=user_digest The above query throws an OOM exception as soon as I fire it at Solr. -- View this message in context: http://lucene.

Does Solr deduplicate stored values

2014-07-18 Thread Alexandre Rafalovitch
Hello, Say I have 100 documents with the same large field value. Stored and indexed. I know the indexed tokens are stored only once with posting lists. But what about original stored values? Do I get 100 copies of those? Or is Solr smarter than that? Regards, Alex

Integrating Solr with HBase Using Lily Project

2014-07-18 Thread Vivekanand Ittigi
Hi, I tried to integrate Solr with HBase using the HBase Indexer project https://github.com/NGDATA/hbase-indexer/wiki (one of the sub-projects of Lily). I used Apache HBase running on HDFS and Solr 4.8.0, but I started getting the error below. 14/07/18 11:55:38 WARN impl.SepConsumer: Error processi

Re: solr boosting any perticular URL

2014-07-18 Thread Umesh Prasad
PS: You can give huge boosts to url at query time on a per-request basis. Don't specify the bqs in solrconfig.xml. Always determine and add the bqs for the query at run time. On 18 July 2014 15:49, Umesh Prasad wrote: > Or you can give huge boosts to url at query time. If you are using dismax > th

Re: solr boosting any perticular URL

2014-07-18 Thread Umesh Prasad
Or you can give huge boosts to url at query time. If you are using dismax then you can use bq like bq = myfield:url1 ^ 50 .. That will bring up url1 as the first result always. On 18 July 2014 15:27, benjelloun wrote: > hello, > > before index the URL to a field in Solr, you can use j
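
A query-time boost of this kind can be sketched in Python as below. The field name myfield comes from the message; the example URL value, the user query, and the helper names are hypothetical. The value is quoted and escaped because a raw URL contains characters such as ':' and '/' that the query parser would otherwise interpret.

```python
from urllib.parse import urlencode


def boost_clause(field, value, boost):
    """Build a bq clause such as myfield:"http://..."^50 with the value
    quoted so that ':' and '/' are treated as literal text."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"^{boost}'


def build_boosted_params(user_query, field, url_value, boost=50):
    """Request parameters for a dismax query with a per-request boost."""
    return [
        ("defType", "dismax"),
        ("q", user_query),
        ("bq", boost_clause(field, url_value, boost)),
    ]


params = build_boosted_params("dress", "myfield", "http://example.com/page1")
print(urlencode(params))
```

Building the bq per request keeps the boost out of solrconfig.xml, so different requests can promote different URLs.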

Re: solr boosting any perticular URL

2014-07-18 Thread benjelloun
Hello, before indexing the URL to a field in Solr, you can use the Java API (SolrJ) and do a test: if(URL=="") index on field1, else index on field2. Then use edismax to boost a specific field: explicit 10 edismax field1^5.0 field2^1.0 -- View this

Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-18 Thread benjelloun
Hello, for WordDelimiterFilterFactory, this is an example in schema.xml to follow: and for WordBreakSolrSpe

Re: Solr 4.7.2 auto suggestion

2014-07-18 Thread benjelloun
Hello, In Solr Admin I put on /selects: q="indexa". It should auto-suggest "indexation" and other suggestions if they exist. I have this response: { "responseHeader": { "status": 0, "QTime": 169, "params": { "indent": "true", "q": "indexa", "_": "1405671103093", "wt": "json" } }, "command": "build",

Plugin init failure for custom analysis filter

2014-07-18 Thread ssivakumaran
Getting this error below on adding new custom filter to schema.xml: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType "textCustom": Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.so

RE: SolrCloud performance issues regarding hardware configuration

2014-07-18 Thread Toke Eskildsen
From: search engn dev [sachinyadav0...@gmail.com]: > 1 collection : 4 shards : each shard has one master and one replica > total documents : 700 million Are you using DocValues for your facet fields? What is the approximate number of unique values in your facets and what is their type (string, nu