Boost documents based on criteria

2015-01-23 Thread Jorge Luis Betancourt González
Hi all, Recently I got an interesting use case that I'm not sure how to implement. The idea is that the client wants a fixed number of documents, let's call it N, to appear at the top of the results. Let me explain a little: we're working with web documents, so the idea is to promote the documen

Re: Solr regex query help

2015-01-23 Thread Erick Erickson
Right. As I mentioned on the original JIRA, the regex match is happening on _terms_. You are conflating the original input (the entire field) with the individual terms that the regex is applied to. I suggest that you look at the admin/analysis page. There you'll see the terms that are indexed and
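The difference between matching the whole field and matching individual terms can be illustrated outside Solr. This is only a sketch: the sample text is hypothetical, a plain whitespace split stands in for the real analyzer (ClassicTokenizer will produce different terms), and Python's `re.fullmatch` is used to mimic the fact that a regex has to match an entire term.

```python
import re

def field_matches(pattern, field_text):
    """True if the pattern matches the entire original field value."""
    return re.fullmatch(pattern, field_text) is not None

def any_term_matches(pattern, terms):
    """True if the pattern matches at least one whole indexed term."""
    return any(re.fullmatch(pattern, term) for term in terms)

# Hypothetical field value; whitespace split is an assumed stand-in
# for the analysis chain, not what ClassicTokenizer really emits.
field_text = "BMC Power/Reset action"
terms = field_text.split()

spans_terms = ".*Power.*action.*"   # needs text from more than one term
single_term = "Power.*"             # can be satisfied by one term

print(field_matches(spans_terms, field_text))   # pattern vs. whole field
print(any_term_matches(spans_terms, terms))     # same pattern vs. each term
print(any_term_matches(single_term, terms))     # term-sized pattern
```

A pattern that only works against the full field value finds nothing at the term level, which is why the admin/analysis page is the place to check what terms actually exist.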

Solr regex query help

2015-01-23 Thread Arumugam, Suresh
Hi All, We have indexed the documents into Solr but are not able to query them using a regex. Our data looks like the following in a text field, which is indexed using the ClassicTokenizer. 1b ::PIPE:: 04/14/2014 ::PIPE:: 01:32:48 ::PIPE:: BMC Power/Reset action ::PIPE:: Delayed shutdown time

Re: Replicas fall into recovery mode right after update

2015-01-23 Thread Nishanth S
Can you tell us what version of Solr you are using and what causes your replicas to go into recovery? On Fri, Jan 23, 2015 at 8:40 PM, gouthsmsimhadri wrote: > I'm working with a cluster of solr-cloud servers at a configuration of 10 > shards and 4 replicas on each shard in stress environment. > Pla

Replicas fall into recovery mode right after update

2015-01-23 Thread gouthsmsimhadri
I'm working with a cluster of solr-cloud servers at a configuration of 10 shards and 4 replicas on each shard in stress environment. Planned production configuration is 10 shards and 15 replicas on each shard. Current commit settings are as follows 50 18000

Re: Need help importing data

2015-01-23 Thread Carl Roberts
NVM I figured this out. The problem was this: pk="link" in rss-data-config.xml, but the unique id in schema.xml is not link - it is id. From rss-data-config.xml: https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip" processor="XPathEntityProcessor" forEach="/nv

Re: Need Help with custom ZIPURLDataSource class

2015-01-23 Thread Carl Roberts
NVM - I have this working. The problem was this: pk="link" in rss-data-config.xml, but the unique id in schema.xml is not link - it is id. From rss-data-config.xml: url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip" processor="XPathEntityProcessor"

Need help importing data

2015-01-23 Thread Carl Roberts
Hi, I have set log4j logging to level DEBUG and I have also modified the code to see what is being imported and I can see the nextRow() records, and the import is successful, however I have no data. Can someone please help me figure this out? Here is the logging output: ow: r1={{id=CVE-20

Re: SolrCloud result correctness compared with single core

2015-01-23 Thread Erick Erickson
you might, but probably not enough to notice. At 50G, the tf/idf stats will _probably_ be close enough you won't be able to tell. That said, recently distributed tf/idf has been implemented but you need to ask for it, see SOLR-1632. This is Solr 5.0 though. I've rarely seen it matter except in fa

Re: Avoiding wildcard queries using edismax query parser

2015-01-23 Thread Jorge Luis Betancourt González
Thank you Michael for sharing your patch! It was really helpful, but for our particular requirement a SearchComponent that rewrites our query is enough (as suggested by Alexandre, although thanks a lot), basically we just escape a bunch of * that we know are "problematic". This approach allow

Re: Connection Reset Errors with Solr 4.4

2015-01-23 Thread Mike Drob
I'm not sure what a reasonable workaround would be. Perhaps somebody else can brainstorm and make a suggestion, sorry. On Tue, Jan 20, 2015 at 12:56 PM, Nishanth S wrote: > Thank you Mike. Sure enough, we are running into the same issue you > mentioned. Is there a quick fix for this other than the

Re: Solr I/O increases over time

2015-01-23 Thread Shawn Heisey
On 1/23/2015 3:52 PM, Daniel Cukier wrote: > I am running around eight solr servers (version 3.5) instances behind a > Load Balancer. All servers are identical and the LB is weighted by number > connections. The servers have around 4M documents and receive a constant > flow of queries. When the sol

Need Help with custom ZIPURLDataSource class

2015-01-23 Thread Carl Roberts
Hi, I created a custom ZIPURLDataSource class to unzip the content from an http URL for an XML ZIP file and it seems to be working (at least I have no errors), but no data is imported. Here is my configuration in rss-data-config.xml: https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zi

Solr I/O increases over time

2015-01-23 Thread Daniel Cukier
I am running around eight solr servers (version 3.5) instances behind a Load Balancer. All servers are identical and the LB is weighted by number connections. The servers have around 4M documents and receive a constant flow of queries. When the solr server starts, it works fine. But after some time

Fwd: Need Help with custom ZIPURLDataSource class

2015-01-23 Thread Carl Roberts
Hi, I created a custom ZIPURLDataSource class to unzip the content from an http URL for an XML ZIP file and it seems to be working (at least I have no errors), but no data is imported. Here is my configuration in rss-data-config.xml: https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zi

SolrCloud result correctness compared with single core

2015-01-23 Thread Yandong Yao
Hi Guys, As the main scoring mechanism is based on tf/idf, will the same query running against SolrCloud return different results than running it against a single core with the same data set, as idf will only count df inside one core? E.g.: Assume I have 100GB data: A) Index those data using single core B)

Re: How to inject custom response data after results have been sorted

2015-01-23 Thread tedsolr
Thank you so much for your responses Hoss and Shalin. I gather the DocTransformer allows manipulations to the doc list returned in the results. That is very cool. So the transformer has access to the Solr Request. I haven't seen the hook yet, but I believe you - I'll have to keep looking. It would c

Re: multiple data source indexing through data import handler

2015-01-23 Thread Qiu Mo
Alex, Thanks, I tried ${item.id}, it doesn’t work. However, if I hardcode an id number instead of '${item.id}', then it adds this one line to every document. For example select description from feature where item_id=3456 then this single description is added to every document as a field. it s

Re: multiple data source indexing through data import handler

2015-01-23 Thread Alexandre Rafalovitch
Try ${item.id} as that's what you are mapping it to. See also: https://issues.apache.org/jira/browse/SOLR-4383 Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 23 January 2015 at 15:01, Qiu Mo wrote: > I am indexing data from two different databa

Re: Is it possible to read multiple RSS feeds and XML Zip file feeds with DIH into one core?

2015-01-23 Thread Alexandre Rafalovitch
Unzipping things might be an issue. You may need to do that as part of a batch job outside of Solr. For the rest, go through the documentation first, it does answer a bunch of questions. There is also a page on the Wiki as well, not just in the reference guide. Regards, Alex. Sign up for m

Re: Retrieving Phonetic Code as result

2015-01-23 Thread Jack Krupansky
That's what the phonetic filter is doing - transforming text into phonetic codes at index time. And at query time as well to do the phonetic matching in the query. The actual phonetic codes are stored in the index for the purposes of query matching. -- Jack Krupansky On Fri, Jan 23, 2015 at 12:57 PM, Ami

multiple data source indexing through data import handler

2015-01-23 Thread Qiu Mo
I am indexing data from two different databases, but I can't add the second database to the indexing. Can anyone help? Below is my data-config.xml. My log indicates that '${item.ID}' is not catching any value from entity i

Re: Is it possible to read multiple RSS feeds and XML Zip file feeds with DIH into one core?

2015-01-23 Thread Carl Roberts
Excellent - thanks Shalin. But how does delta-import work? Does it do a clean also? Does it require a unique Id? Does it update existing records and only add when necessary? And, how would I go about unzipping the content from a URL to then import the unzipped XML? Is the recommended way

Re: How to inject custom response data after results have been sorted

2015-01-23 Thread Chris Hostetter
: If you just need to transform an individual result, that can be done by a : custom DocTransformer. But from your email, I think you need a custom : SearchComponent. if your PostFilter has already collected all of the info you need, and you now just want to return a subset of that information t

Re: Is it possible to read multiple RSS feeds and XML Zip file feeds with DIH into one core?

2015-01-23 Thread Shalin Shekhar Mangar
If you add clean=false as a parameter to the full-import then deletion is disabled. Since you are ingesting RSS there is no need for deletion at all I guess. On Fri, Jan 23, 2015 at 7:31 PM, Carl Roberts wrote: > OK - Thanks for the doc. > > Is it possible to just provide an empty value to preIm
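A request of this kind can be assembled as below. This is a minimal sketch, not a definitive recipe: the base URL, the core name `collection1`, and the `/dataimport` handler path are assumptions about a typical setup, and the helper `dih_url` is hypothetical. `command`, `clean`, and `entity` are the DIH request parameters being illustrated.

```python
from urllib.parse import urlencode

def dih_url(base, command="full-import", clean=None, entity=None):
    """Build a Data Import Handler request URL (hypothetical helper)."""
    params = {"command": command}
    if clean is not None:
        # clean=false asks full-import not to delete existing documents first
        params["clean"] = "true" if clean else "false"
    if entity is not None:
        # restrict the run to one named entity from the DIH config
        params["entity"] = entity
    return f"{base}/dataimport?{urlencode(params)}"

# Assumed host, port, and core name
print(dih_url("http://localhost:8983/solr/collection1", clean=False))
```

The resulting URL can be fetched with curl or any HTTP client; scheduling repeated runs has to happen outside Solr.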

Re: Is it possible to read multiple RSS feeds and XML Zip file feeds with DIH into one core?

2015-01-23 Thread Carl Roberts
OK - Thanks for the doc. Is it possible to just provide an empty value to preImportDeleteQuery to disable the delete prior to import? Will the data still be deleted for each entity during a delta-import instead of full-import? Is there any capability in the handler to unzip an XML file from

Re: How to inject custom response data after results have been sorted

2015-01-23 Thread Shalin Shekhar Mangar
If you just need to transform an individual result, that can be done by a custom DocTransformer. But from your email, I think you need a custom SearchComponent. On Fri, Jan 23, 2015 at 6:23 PM, tedsolr wrote: > Hello! With the help of this community I have solved 2 problems on my way > to > crea

Re: Sporadic Socket Timeout Error during Import

2015-01-23 Thread Shalin Shekhar Mangar
The default is 10 seconds and you can increase it by adding a "readTimeout" attribute (whose value is in milliseconds) in the URLDataSource e.g. On Fri, Jan 23, 2015 at 6:33 PM, Carl Roberts wrote: > Hi, > > I am using the DIH RSS example and I am running into a sporadic socket > timeout error
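Since the attribute value is in milliseconds, it is easy to get the unit wrong. The sketch below renders a `dataSource` element from a timeout given in seconds; the helper name and the 30-second value are illustrative assumptions, and the element shown carries only the two attributes discussed here, not a complete DIH configuration.

```python
import xml.etree.ElementTree as ET

def url_data_source(read_timeout_seconds):
    """Render a URLDataSource element with readTimeout in milliseconds."""
    elem = ET.Element("dataSource", {
        "type": "URLDataSource",
        # readTimeout is expressed in milliseconds
        "readTimeout": str(int(read_timeout_seconds * 1000)),
    })
    return ET.tostring(elem, encoding="unicode")

# Illustrative value: raise the read timeout from the default to 30 seconds
print(url_data_source(30))
```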

Re: Is it possible to read multiple RSS feeds and XML Zip file feeds with DIH into one core?

2015-01-23 Thread Alexandre Rafalovitch
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler Admin UI has the interface, so you can play there once you define it. You do have to use Curl, there is no built-in scheduler. Regards, Alex. Sign up for my Solr resources n

Sporadic Socket Timeout Error during Import

2015-01-23 Thread Carl Roberts
Hi, I am using the DIH RSS example and I am running into a sporadic socket timeout error during every 3rd or 4th request. Below is the stack trace. What is the default socket timeout for reads and how can I increase it? 15046 [Thread-17] ERROR org.apache.solr.handler.dataimport.URLDataSource

Re: Is it possible to read multiple RSS feeds and XML Zip file feeds with DIH into one core?

2015-01-23 Thread Carl Roberts
Hi Alex, If I am understanding this correctly, I can define multiple entities like this? ... How would I trigger loading certain entities during start? How would I trigger loading other entities during update? Is there a way to set an auto-update for certain entities so

How to inject custom response data after results have been sorted

2015-01-23 Thread tedsolr
Hello! With the help of this community I have solved 2 problems on my way to creating a search that collapses documents based on multiple fields. The CollapsingQParserPlugin was key. I have a new problem now. All the custom stats I generate in my custom QParser make for way too much data to simply

Re: Retrieving Phonetic Code as result

2015-01-23 Thread Amit Jha
Can I extend Solr to add phonetic codes at indexing time, the way a uuid field gets added? I want to precompute the metaphone code, because calculating the code at runtime will give me some performance hit. Rgds AJ > On Jan 23, 2015, at 5:37 PM, Jack Krupansky wrote: > > Your app can use

Re: Suggester Example In Documentation Not Working

2015-01-23 Thread Chris Hostetter
: However, you will notice on page 228, under the section "Suggester", it : gives an example of a suggester search component using : solr.SpellCheckComponent. ... : So it would appear the solr.SuggestComponent has been around since 4.7, : but the documentation has not caught up with the

Re: Is it possible to read multiple RSS feeds and XML Zip file feeds with DIH into one core?

2015-01-23 Thread Alexandre Rafalovitch
You can define both multiple entities in the same file and nested entities if your list comes from an external source (e.g. a text file of URLs). You can also trigger DIH with a name of a specific entity to load just that. You can even pass DIH configuration file when you are triggering the process

Is it possible to read multiple RSS feeds and XML Zip file feeds with DIH into one core?

2015-01-23 Thread Carl Roberts
Hi, I have the RSS DIH example working with my own RSS feed - here is the configuration for it. https://nvd.nist.gov/download/nvd-rss.xml" processor="XPathEntityProcessor" forEach="/RDF/item" transformer="DateFormatTransformer

Re: SolrCloud timing out marking node as down during startup.

2015-01-23 Thread Shalin Shekhar Mangar
Hi Mike, This is a bug which was fixed in Solr 4.10.3 via http://issues.apache.org/jira/browse/SOLR-6610 and it slows down cluster restarts. Since you have a single node cluster, you will run into it on every restart. On Thu, Jan 22, 2015 at 6:42 PM, Michael Roberts wrote: > Hi, > > I'm seeing

Re: How do you query a sentence composed of multiple words in a description field?

2015-01-23 Thread Walter Underwood
It isn’t that complicated. You need to understand URL escaping for working with any REST client. As soon as you need to read the logs, you’ll need to understand it. The double quote becomes %22 and the colon becomes %3A. In a parameter, the spaces can be +, but in a path they need to be %20.
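The escaping rules mentioned here can be checked with Python's standard library. The query string and the path segment below are made-up examples; `quote_plus` is the form suited to parameter values and `quote` the form suited to path segments.

```python
from urllib.parse import quote, quote_plus

# Hypothetical phrase query against a description field
query = 'description:"remote attackers"'

# In a parameter value: colon -> %3A, double quote -> %22, space -> +
param_value = quote_plus(query)

# In a path segment a space has to be %20, not +
path_segment = quote("my core")

print(param_value)   # description%3A%22remote+attackers%22
print(path_segment)  # my%20core
```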

Re: Is Solr a good candidate to index 100s of nodes in one XML file?

2015-01-23 Thread Carl Roberts
I got the RSS DIH example to work with my own RSS feed and it works great - thanks for the help. On 1/22/15, 11:20 AM, Carl Roberts wrote: Thanks. I am looking at the RSS DIH example right now. On 1/21/15, 3:15 PM, Alexandre Rafalovitch wrote: Solr is just fine for this. It even ships with

Re: How do you query a sentence composed of multiple words in a description field?

2015-01-23 Thread Carl Roberts
Thanks Erick, I think I am going to start using the browser for testing...:) Perhaps also a REST client for the Mac. Regards, Joe On 1/22/15, 6:56 PM, Erick Erickson wrote: Have you considered using the admin/query form? Lots of escaping is done there for you. Once you have the form of the

solr 4.7 Converting from one boost method to another using ExternalFileField

2015-01-23 Thread Parnit Pooni
Hi, I'm currently running into issues creating a solr query to try and boost on two ExternalFileFields. The following query seems to work, but is extremely long and repeats query terms and does not use what I would like to use. http://localhost/solr/Index/select?fl=field(externalFileField1),fiel

Re: Using tmpfs for Solr index

2015-01-23 Thread Shawn Heisey
On 1/23/2015 2:40 AM, Toke Eskildsen wrote: > If you have a single index on a box with enough memory to fully cache > the index data, I would recommend just using MMapDirectory without > involving tmpfs. If it's Solr 4.x, I have pretty much the same advice, with one small change. I would actually

Re: trying to get Apache Solr working with Dovecot.

2015-01-23 Thread Shawn Heisey
On 1/23/2015 12:11 AM, Kevin Laurie wrote: > The solr / lucene version is 4.10.2 > > I am trying to figure out how to see if Dovecot and Solr can contact. > Apparently when I make searches there seems to be no contact. I might try > to rebuild dovecot again and see if that solves the problem. > >

Re: Suggester Example In Documentation Not Working

2015-01-23 Thread Charles Sanders
Well, I'm running LucidWorks 2.9.1 which contains Solr 4.8. I initially was working with the Solr documentation: http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.8.pdf However, you will notice on page 228, under the section "Suggester", it gives an example of a su

Re: In a SolrCloud, will a solr core(shard replica) failover to its good peer when its state is not Active

2015-01-23 Thread Shawn Heisey
On 1/22/2015 11:28 PM, 汤林 wrote: > From a testing aspect, if we would like to verify the case that a query > request to a "down" core on a running server will be failed over to the > good core on another running server, is there any way to make a core as > "down" on a running server? Thanks! I thi

RE: Avoiding wildcard queries using edismax query parser

2015-01-23 Thread Ryan, Michael F. (LNG-DAY)
Here's a Jira for this: https://issues.apache.org/jira/browse/SOLR-3031 I've attached a patch there that might be useful for you. -Michael -Original Message- From: Jorge Luis Betancourt González [mailto:jlbetanco...@uci.cu] Sent: Thursday, January 22, 2015 4:34 PM To: solr-user@lucene.a

Re: Retrieving Phonetic Code as result

2015-01-23 Thread Jack Krupansky
Your app can use the field analysis API (FieldAnalysisRequestHandler) to query Solr for what the resulting field values are for each filter in the analysis chain for a given input string. This is what the Solr Admin UI Analysis web page uses. See: http://lucene.apache.org/solr/4_10_2/solr-core/org
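A request to the field analysis handler might be built as follows. Treat this as a sketch under assumptions: the `/analysis/field` path is the conventional registration for FieldAnalysisRequestHandler but depends on solrconfig.xml, and the base URL, the field type name `phonetic_text`, and the helper name are hypothetical.

```python
from urllib.parse import urlencode

def field_analysis_url(base, field_type, value):
    """Build a field analysis request for one input string."""
    params = {
        "analysis.fieldtype": field_type,   # field type whose chain is run
        "analysis.fieldvalue": value,       # text to push through index-time analysis
        "wt": "json",
    }
    # Handler path is an assumption; check solrconfig.xml for the real one
    return f"{base}/analysis/field?{urlencode(params)}"

# Assumed core URL and hypothetical field type name
print(field_analysis_url("http://localhost:8983/solr/collection1",
                         "phonetic_text", "Krupansky"))
```

The response lists the tokens after each stage of the chain, so the phonetic code is the output of the phonetic filter stage.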

Re: Avoiding wildcard queries using edismax query parser

2015-01-23 Thread Jack Krupansky
Presence of a wildcard in a query term is detected by the traditional Solr and edismax query parsers and causes normal term analysis to be bypassed. As I said, wildcards are a specific feature that dismax specifically doesn't support - this has nothing to do with edismax. -- Jack Krupansky On Fri

Re: Count total frequency of a word in a SOLR index

2015-01-23 Thread Nitin Solanki
Ok.. Is there any way to use a user-defined field instead of word and freq in the suggestion block? On Fri, Jan 23, 2015 at 2:33 PM, Mikhail Khludnev < mkhlud...@griddynamics.com> wrote: > I don't think it's implemented. > I can propose to send the first request to termsComponent, that yields > terms by p

Re: Using tmpfs for Solr index

2015-01-23 Thread Toke Eskildsen
On Fri, 2015-01-23 at 07:34 +0100, deniz wrote: > Would it boost any performance in case the index has been switched from > RAMDirectoryFactory to use tmpfs? RAMDirectoryFactory does not perform well for non-small indexes, so ... probably yes. > Or it would simply do the same thing like MMap? A

Re: Field collapsing memory usage

2015-01-23 Thread Toke Eskildsen
On Thu, 2015-01-22 at 22:52 +0100, Erick Erickson wrote: > What do you think about folding this into the Solr (or Lucene?) code > base? Or is it too specialized? (writing under the assumption that DVEnabler actually works as it should for everyone and not just us) Right now it is an explicit tool.

Re: Count total frequency of a word in a SOLR index

2015-01-23 Thread Mikhail Khludnev
I don't think it's implemented. I can propose to send the first request to termsComponent, that yields terms by prefix, then the second request can gather totaltermfreqs. On Fri, Jan 23, 2015 at 11:51 AM, Nitin Solanki wrote: > Thanks Mikhail Khludnev.. > I tried this: > * > http://localhost:898
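The two-request approach could be sketched like this. The base URL, the `/terms` and `/select` handler paths, and the helper names are assumptions about a typical configuration; the field name `gram` is taken from the thread, and only the URLs are built here, not the HTTP calls or response parsing.

```python
from urllib.parse import urlencode

def terms_by_prefix_url(base, field, prefix):
    """First request: ask the terms component for terms with a prefix."""
    params = {"terms.fl": field, "terms.prefix": prefix, "wt": "json"}
    return f"{base}/terms?{urlencode(params)}"

def total_term_freq_url(base, field, term):
    """Second request: fetch totaltermfreq for one term via a function query."""
    params = {
        "q": "*:*",
        "rows": 1,   # the value is index-wide, so one row is enough
        "fl": f"totaltermfreq({field},{term})",
        "wt": "json",
    }
    return f"{base}/select?{urlencode(params)}"

base = "http://localhost:8983/solr/collection1"  # assumed core URL
print(terms_by_prefix_url(base, "gram", "th"))
print(total_term_freq_url(base, "gram", "the"))
```

Each term returned by the first request would be fed to the second, one request per term.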

Re: Count total frequency of a word in a SOLR index

2015-01-23 Thread Nitin Solanki
Thanks Mikhail Khludnev.. I tried this: *http://localhost:8983/solr/collection1/spell?q=gram:%22the%22&rows=1&fl=totaltermfreq(gram,the) * and it worked. I want to know more. Can we do same thing *(tota

Query to get A-Z index

2015-01-23 Thread Priya Rodrigues
Is there a way to get the A-Z index from a field eg. if the field name contains Alpha, Pogo, Zoro, it should return A, P, Z Found something similar here http://stackoverflow.com/questions/8974299/solr-query-by-range-of-name But is there a way to do this without copyField? Thanks, Priya

Re: Count total frequency of a word in a SOLR index

2015-01-23 Thread Mikhail Khludnev
https://cwiki.apache.org/confluence/display/solr/Function+Queries totaltermfreq() or do you need to sum term freq on docs from the resultset? On Fri, Jan 23, 2015 at 10:56 AM, Nitin Solanki wrote: > I indexed some text_file files in Solr as it is. Applied " > *StandardTokenizerFactory*" and "*Shingle