Hi,
I have a master Solr through which I query multiple Solr instances,
aggregate their responses, and respond back to the user.
Now the requirement is that when I get the data from querying the multiple
Solr instances, I want it sorted on some field.
Say I have 3 Slave Solrs
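(For reference: if the slaves are reachable as plain Solr cores, Solr's
built-in distributed search already does this merge-sort. A request like the
following - host and field names are hypothetical - returns one aggregated
result sorted across all three slaves:
http://master:8983/solr/select?q=*:*&shards=slave1:8983/solr,slave2:8983/solr,slave3:8983/solr&sort=somefield+asc )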
Hi,
I am trying to access XML files which are stored in our CMS. How do I pass a
username/password to DIH so I can get all the XML files?
It's throwing an exception:
java.io.IOException: Server returned HTTP response code: 401 for URL:
http://admin:admin...@cms1.zinio.com.com//articles/100850443.xml
Is ther
Kevin,
Those are some good query response times but they could be better. You've
configured the field type sub-optimally. Look again at
http://wiki.apache.org/solr/SpatialForTimeDurations and note in particular
maxDistErr. You've left it at the value that comes pre-configured with
Solr, 0.0
That sounds like a satisfactory solution for the time being -
I am assuming you dump the data from Solr in CSV format?
How did you implement the streaming processor? (What tool did you use for
this? I'm not familiar with that.)
You say it takes only a few minutes to dump the data - how long does it t
Hello Matt,
You can consider writing a batch processing handler which receives a query
and, instead of sending results back, writes them into a file that is then
available for streaming (it has its own UUID). I am dumping many GBs
of data from Solr in a few minutes - your query + streaming write
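A very rough client-side sketch of such a dump loop (plain SolrJ 4.x; host,
core and field names are illustrative - the real handler described above
streams on the server side instead):

  import java.io.FileWriter;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.SolrDocumentList;

  public class BatchDump {
    public static void main(String[] args) throws Exception {
      HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
      FileWriter out = new FileWriter("dump-" + java.util.UUID.randomUUID() + ".csv");
      int rows = 10000;
      for (int start = 0; ; start += rows) {
        SolrQuery q = new SolrQuery("*:*").setStart(start).setRows(rows);
        SolrDocumentList page = solr.query(q).getResults();
        for (SolrDocument doc : page) {
          out.write(doc.getFieldValue("id") + "," + doc.getFieldValue("title") + "\n");
        }
        // beware: large start offsets get slow - see the deep-paging reply below
        if (start + rows >= page.getNumFound()) break;
      }
      out.close();
      solr.shutdown();
    }
  }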
Sorry for the late response. I needed to find the time to load a lot of
extra data (closer to what we're anticipating). I have an index with close
to 220,000 documents, each with at least two coordinate regions anywhere
between -10 billion and +10 billion, but could potentially have up to maybe
half
Hi Matt,
This feature is commonly known as deep paging, and Lucene and Solr have
issues with it ... take a look at
http://solr.pl/en/2011/07/18/deep-paging-problem/ as a potential
starting point; it uses filters to bucketize a result set into
smaller sub-result sets (see the sketch below).
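A small untested sketch of that bucketizing idea in SolrJ; the numeric field
"doc_id" and the bucket size are assumptions for illustration:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrDocumentList;

  public class BucketPager {
    public static void main(String[] args) throws Exception {
      HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
      long bucket = 100000;                        // docs per slice, tune to taste
      for (long lo = 0; lo < 10000000L; lo += bucket) {
        SolrQuery q = new SolrQuery("*:*");        // the original query is unchanged
        q.addFilterQuery("doc_id:[" + lo + " TO " + (lo + bucket - 1) + "]");
        q.setRows((int) bucket);                   // start stays 0; the filter does the paging
        SolrDocumentList page = solr.query(q).getResults();
        // process this bucket of results here
      }
      solr.shutdown();
    }
  }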
Cheers,
Tim
On Tue, Jul 23, 2013
Bah! I didn't notice that you'd used edismax, ignore
my comments.
Sorry for the confusion
Erick
On Tue, Jul 23, 2013 at 2:34 PM, Joe Zhang wrote:
> I'm not sure I understand, Erick. I don't have a "text" field in my schema;
> "title" and "content" are both legal fields.
>
>
> On Tue, Jul 23, 201
Right, issuing a commit after every document is not good
practice. Relying on the auto commit parameters in
solrconfig.xml is usually best, although I will sometimes
issue a commit at the very end of the indexing run.
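In SolrJ terms, that pattern looks something like this (a sketch only; the
autocommit itself is configured in solrconfig.xml):

  import java.util.List;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class IndexRun {
    static void index(HttpSolrServer solr, List<SolrInputDocument> docs) throws Exception {
      for (SolrInputDocument doc : docs) {
        solr.add(doc);   // no commit per document; autocommit handles intermediate commits
      }
      solr.commit();     // optional single commit at the very end of the run
    }
  }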
Several things about this thread aren't making sense. First of
all your commitw
Are you mixing SolrCloud and old-style master/slave?
There was a bug a while ago (search the JIRA) where
replication was copying the entire index unnecessarily,
but I think that was fixed by 4.3.
Best
Erick
On Tue, Jul 23, 2013 at 6:33 AM, xiaoqi wrote:
>
> hi,all
>
> i have two solr ,one is ma
2.1 billion documents (including deleted documents) per Lucene index, but
essentially per Solr shard as well.
But don’t even think about going that high. In fact, don't plan on going
above 100 million unless you do a proof of concept that validates that you
get acceptable query and update perf
Perfect thanks so much. You just cleared up the other little bit, i.e. when
the SpellingQueryConverter is used/not used and why you might implement
your own.
Thanks again.
On Tue, Jul 23, 2013 at 6:48 PM, Dyer, James
wrote:
> You've got it. The only other thing is that "spellcheck.q" does not
Currently I am using Solr 3.5.x and I push updates to Solr via a queue (ActiveMQ)
and perform a hard commit every 30 minutes (since my index is relatively
big, around 30 million documents). I am thinking of using soft commit to
implement NRT search, but I am worried about the reliability.
For ex: If I
You've got it. The only other thing is that "spellcheck.q" does not analyze
anything. The whole purpose of this is to allow you to just send raw keywords
to be spellchecked. This is handy if you have a complex "q" parameter (say,
you're using local params, etc) and the SpellingQueryConverter
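For example (the collection name is from this thread; the query itself is
illustrative):
http://localhost:8981/solr/articles/select?q=markup_texts:(Perfrm%20HVC)&spellcheck=true&spellcheck.q=Perfrm%20HVC
Here the raw keywords in spellcheck.q are checked as-is, regardless of how
complex q is.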
still 2.1 billion documents?
Thanks James. That's it! Now:
http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0&maxCollationTries=0
returns:
collation "perform hvac" (hits: 4; corrections: Perfrm -> perform, HVC -> hvac)
collation "performed hvac" (hits: 4; corrections: Perfrm -> performed, HVC -> hvac)
If you have time, I'm still slightly unclear on the field element in the
spellcheck configu
For people who have the same issue: it was solved by adding
  text
in the requestHandler "/update/extract" in solrconfig.xml:
  last_modified
  ignored_
  text
  yyyy-MM-dd
So there is no need to add the content in SolrJ:
p.setParam("literal.text", handler.toString());
R
I don't believe you can specify more than 1 field on "df" (default field).
What you want, I think, is "qf" (query fields), which is available only if
using dismax/edismax.
http://wiki.apache.org/solr/SearchHandler#df
http://wiki.apache.org/solr/ExtendedDisMax#qf_.28Query_Fields.29
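For example (field names illustrative):
.../select?defType=edismax&q=some+terms&qf=title^10+content
searches both fields, boosting matches in title.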
James Dyer
I
Hello Solr users,
Question regarding processing a lot of docs returned from a query: I
potentially have millions of documents returned back from a query. What is
the common design to deal with this?
Two ideas I have are:
- create a client service that is multithreaded to handle this
- use the Sol
Thanks Alan and Shawn. Just installed Solr 4.4, and no longer experiencing
the issue.
Thanks! :)
On Tue, Jul 23, 2013 at 7:21 AM, Shawn Heisey wrote:
> On 7/23/2013 7:50 AM, Alan Woodward wrote:
> > Can you try upgrading to the just-released 4.4? Solr.xml persistence
> had all kinds of bugs i
Hi all,
I'm testing SolrCloud (version 4.3.1) with 2 shards and 1 external ZooKeeper.
All is running OK: documents are indexed into the 2 different shards, and a
select *:* gives me all documents.
Now I'm trying to add/index a new document via SolrJ using CloudSolrServer.
The code:
Try tacking &maxCollationTries=0 to the URL and see if the collation returns.
If you get a collation, then try the same URL with the collation as the "q"
parameter. Does that get results?
My suspicion here is that you are assuming that "markup_texts" is the default
search field for "/select" b
Hi James,
If I try:
http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0&maxCollationTries=0
I get the same result:
status 0, QTime 7 (spellcheck=true, q=Perfrm HVC, rows=0)
suggestions for "Perfrm" (numFound 3, startOffset 0, endOffset 6):
  perform (freq 4), performed (freq 1), performance (freq 3)
suggestions for "HVC" (numFound 2, startOffset 7, endOffset 10):
  hvac (freq 4), have (freq 5)
correctlySpelled: false
However, you're right that m
I think the best bet here would be a ping-like handler that would simply
return the state of only this box in the cluster:
Something like /admin/state which would return
"down","active","leader","recovering"
I'm not really sure where to begin however. Any ideas?
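(One possible starting point, sketched against the 4.x SolrJ API - untested,
collection name illustrative - is to read the replica states out of the
ZooKeeper cluster state; filtering to the local node is then straightforward:

  import org.apache.solr.client.solrj.impl.CloudSolrServer;
  import org.apache.solr.common.cloud.ClusterState;
  import org.apache.solr.common.cloud.Replica;
  import org.apache.solr.common.cloud.Slice;

  public class NodeStates {
    public static void main(String[] args) throws Exception {
      CloudSolrServer solr = new CloudSolrServer("localhost:2181");
      solr.connect();
      ClusterState state = solr.getZkStateReader().getClusterState();
      for (Slice slice : state.getSlices("collection1")) {
        for (Replica replica : slice.getReplicas()) {
          // "state" is active/down/recovering; leadership is tracked separately
          System.out.println(replica.getName() + " -> " + replica.getStr("state"));
        }
      }
      solr.shutdown();
    }
  }
)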
jim
On Mon, Jul 22, 2013 at 12:5
Hi James,
I get the following response for that query:
status 0, QTime 8 (spellcheck=true, q=Perfrm HVC, rows=0)
suggestions for "Perfrm" (numFound 3, startOffset 0, endOffset 6):
  perform (freq 4), performed (freq 1), performance (freq 3)
suggestions for "HVC" (numFound 2, startOffset 7, endOffset 10):
  hvac (freq 4), have (freq 5)
correctlySpelled: false
Thanks
Brendan
Thanks
Brendan
On Tue, Jul 23, 2013 at 3:19 PM, Dyer, James
wrote:
> For this query:
>
>
> http://localhost:8981/s
Hi,
Thanks for your suggestions. I'll be able to provide answers to a few of
your questions right now; the rest I'll answer after some time. It takes
around 150k to 200k queries before it goes down again after restarting it.
In a typical query we are returning around 20 fields. Memory utilization
peak
For this query:
http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0
...do you get anything back in the spellcheck response? Is it correcting the
individual words and not giving collations? Or are you getting no individual
word suggestions also?
James Dyer
Ingram Cont
Hi All,
I have an IndexBasedSpellChecker component configured as follows (note the
field parameter is set to the spellcheck field):
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_spell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">spellcheck</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <float name="thresholdTokenFrequency">.0001</float>
  </lst>
</searchComponent>
with the corresponding f
I'm not sure I understand, Erick. I don't have a "text" field in my schema;
"title" and "content" are both legal fields.
On Tue, Jul 23, 2013 at 5:15 AM, Erick Erickson wrote:
> this isn't doing what you think.
> title^10 content
> is actually parsed as
>
> text:title^10 text:content
>
> where
On 23 July 2013 21:52, Chris Hostetter wrote:
>
>
> : Can anyone remove this spammer please?
>
> The recent influx is not confined to a single user, or a single list. Nor
> is there a clear course of action just yet, since the senders in question
> are all legitimate subscribers who have been act
: Ok thanks, I just wanted the know is it possible to ignore boost value or
: not during score calculation and as you said its not.
: Now I would have to focus on nutch to fix the issue and not to send boost=0
: to Solr.
the index time boosts are encoded in field norms -- if you want to ignore
th
Oh cool! I'm glad it at least seemed to work. Can you post your
configuration of the field type and report from Solr's logs what "maxLevels"
value is used for this field? It is logged the first time you use
the field type.
Maybe there isn't a limit under 10B after all. Some quick'n'dirty
calcu
Hi Alistair,
You probably need a commit, and not an optimize.
Which version of Solr are you running against? The 4.0 releases have more
complications, but generally sending a commit will do. Not sure if GSearch
sends one, only partly because I never was able to make it work. :)
Michael Della Bi
Elodie: I just tested your configs (as close as i could get since i don't
have the com.kelkoo classes) using the current HEAD of the 4x branch and
had no problems with the entity includes.
What Java version/vendor are you using?
Are you using the provided Jetty or your own servlet container
What are the dangers of trying to use a range of 10 billion? Simply a
slower index time? Or will I get inaccurate results?
I have tried it on a very small sample of documents, and it seemed to
work. I could spend some time this week trying to get a more robust (and
accurate) dataset loaded to play
The use case is to prevent the necessity to download something else
(zookeeper) when everything needed to run it is (likely) present in the
Solr distribution already.
Maybe we don't need to start Jetty, maybe we can start Zookeeper with an
extra script in the Solr codebase.
At present, if you are
July 2013, Apache Solr™ 4.4 available
The Lucene PMC is pleased to announce the release of Apache Solr 4.4
Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, d
DirectSolrSpellChecker does not prepare any kind of dictionary. It just uses
the term dictionary from the indexed field. So what you are trying to do is
impossible.
You would think it would be possible with IndexBasedSpellChecker because it
creates a dictionary as a sidecar lucene index. But
Solr doesn't support any kind of short-circuting the original query and
returning the results of the corrected query or collation. You just re-issue
the query in a second request. This would be a nice feature to add though.
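A sketch of that two-request pattern in SolrJ (untested; the method shape is
illustrative):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.client.solrj.response.SpellCheckResponse;

  public class RequeryOnCollation {
    static QueryResponse search(HttpSolrServer solr, String userQuery) throws Exception {
      SolrQuery q = new SolrQuery(userQuery);
      q.set("spellcheck", true);
      q.set("spellcheck.collate", true);
      QueryResponse rsp = solr.query(q);
      SpellCheckResponse sc = rsp.getSpellCheckResponse();
      if (rsp.getResults().getNumFound() == 0 && sc != null && sc.getCollatedResult() != null) {
        rsp = solr.query(new SolrQuery(sc.getCollatedResult()));  // re-issue with the collation
      }
      return rsp;
    }
  }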
James Dyer
Ingram Content Group
(615) 213-4311
-Original Message-
: Can anyone remove this spammer please?
The recent influx is not confined to a single user, or a single list. Nor
is there a clear course of action just yet, since the senders in question
are all legitimate subscribers who have been active members of the
community.
There is an open issue to
Here is my fieldtype:
My input for indexing at analysis section of Solr admin page:
{| style="text-align: left; width: 50%; table-layout: fixed;"
Hello Chris,
Thank you for your help.
I checked differences between my files and your test files but I didn't
find bugs in my files.
All my files are in the same directory: collection1/conf
=> schema.xml content:
]>
Thanks for your comment, Erick.
When I use *server.add(doc);* everything is fine (but it takes a long time
to hard commit every single doc), so I am sure docs are uniquely indexed.
Maybe I shouldn't call *server.commit();* at all from the SolrJ code, so Solr
would use the autoCommit/autoSoftCommit configu
Are you actually seeing that output from the WikipediaTokenizerFactory??
Really? Even if you use the Solr Admin UI analysis page?
You should just see the text tokens plus the URLs for links.
-- Jack Krupansky
-Original Message-
From: Furkan KAMACI
Sent: Tuesday, July 23, 2013 10:53 A
If you use WikipediaTokenizer it will tag different wiki elements with
different types (you can see it in the admin UI).
So then follow up with TypeTokenFilter to keep only the types you care
about, and I think it will do what you want.
On Tue, Jul 23, 2013 at 7:53 AM, Furkan KAMACI wrote:
> Hi
There was also a bug in the lazy loading of multivalued fields at one point
recently in Solr 4.2
https://issues.apache.org/jira/browse/SOLR-4589
"4.x + enableLazyFieldLoading + large multivalued fields + varying fl =
pathological CPU load & response time"
Do you use multivalued fields very he
Here is a paper that I found useful:
http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf
On Tue, Jul 23, 2013 at 10:42 AM, Furkan KAMACI wrote:
> Thanks for your comments.
>
> 2013/7/23 Tommaso Teofili
>
>> if you need a specialized algorithm for detecting blogposts plagiarism /
Hi;
I have indexed Wikipedia data with the Solr DIH. However, when I look at the
data that is indexed in Solr I see something like this as well:
{| style="text-align: left; width: 50%; table-layout: fixed;" border="0"
|- valign="top"
| style="width: 50%"|
:*[[Ubuntu]]
:*[[Fedora]]
:*[[Mandriva]]
:*[[Linux Mint]]
Thanks for your comments.
2013/7/23 Tommaso Teofili
> if you need a specialized algorithm for detecting blogposts plagiarism /
> quotations (which are different tasks IMHO) I think you have 2 options:
> 1. implement a dedicated one based on your features / metrics / domain
> 2. try to fine tune
if you need a specialized algorithm for detecting blogposts plagiarism /
quotations (which are different tasks IMHO) I think you have 2 options:
1. implement a dedicated one based on your features / metrics / domain
2. try to fine tune an existing algorithm that is flexible enough
If I were to do
Curious what the use case is for this? Zookeeper is not an HTTP
service so loading it in Jetty by itself doesn't really make sense. I
also think this creates more work for the Solr team especially since
setting up a production ensemble shouldn't take more than a few
minutes once you have the nodes
On 7/23/2013 7:50 AM, Alan Woodward wrote:
> Can you try upgrading to the just-released 4.4? Solr.xml persistence had all
> kinds of bugs in 4.3, which should have been fixed now.
The 4.4.0 release has been finalized and uploaded, but the download link
hasn't been changed yet because the mirror
Thanks Mikhail,
I'll go for your EdgeNGramTokenFilter suggestion.
-
Kind regards,
Paul
The JSON keys within the "highlighting" object are the document IDs, and
then the keys within those objects are the highlighted field names.
Again, I repeat my question: Exactly why is it difficult to deserialize?
Seems simple enough.
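For instance, a response fragment shaped roughly like this (IDs and values
illustrative):
  "highlighting": { "doc42": { "content": ["... an <em>example</em> snippet ..."] } }
maps document ID -> field name -> snippet array.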
-- Jack Krupansky
-Original Message-
From: Mysur
Hi there,
My Solr is being fed by Fedora GSearch and when uploading a new resource, the
Collection is optimized but not current so the new resource can't be found. I
have to go to the Core Admin page and Optimize it from there, in order to make
the collection current. Is there anything I should
On 7/23/2013 3:33 AM, Furkan KAMACI wrote:
> Sometimes a huge part of a document may exist in another document. As like
> in student plagiarism or quotation of a blog post at another blog post.
> Does Solr/Lucene or its libraries (UIMA, OpenNLP, etc.) has any class to
> detect it?
Solr is designed
Do you need all of the fields loaded every time and are they stored? Maybe
there is a document with gigantic content that you don't actually need but
it gets deserialized anyway. Try the lazy loading
setting enableLazyFieldLoading in solrconfig.xml;
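it lives in the <query> section:
  <bool name="enableLazyFieldLoading">true</bool>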
Regards,
Alex.
Personal website: http://www.oute
One classic approach is to simply use the full text of the suspect text as
well as bigrams and trigrams (phrases) from that text with "OR" operators.
The top results will be the documents that most closely "match" the subject
text. That provides a visual set of similar results. You will then have t
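A rough SolrJ sketch of assembling such a query (analysis, stopwords and
weighting omitted; names illustrative - default OR semantics assumed):

  import org.apache.solr.client.solrj.SolrQuery;

  public class SimilarityQuery {
    public static SolrQuery build(String suspectText) {
      String[] w = suspectText.split("\\s+");
      StringBuilder q = new StringBuilder();
      for (String word : w) q.append(word).append(' ');   // single terms
      for (int i = 0; i + 1 < w.length; i++)              // bigram phrases
        q.append('"').append(w[i]).append(' ').append(w[i + 1]).append("\" ");
      for (int i = 0; i + 2 < w.length; i++)              // trigram phrases
        q.append('"').append(w[i]).append(' ')
         .append(w[i + 1]).append(' ').append(w[i + 2]).append("\" ");
      return new SolrQuery(q.toString().trim());
    }
  }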
Did you look at:
*) $deleteDocById
*) $deleteDocByQuery
*) deletedPkQuery
Just search for delete on https://wiki.apache.org/solr/DataImportHandler
If you tried all of those, maybe you need to explain your problem in more
specific details.
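For the delete-a-row case specifically, a hedged example of the third option
(table and column names hypothetical, assuming soft deletes via a flag):
deletedPkQuery="SELECT id FROM MyDoc WHERE deleted = 1 AND LastModificationTime > '${dataimporter.last_index_time}'"
on the entity in data-config.xml.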
Regards,
Alex.
Personal website: http://www.outerthou
Can you try upgrading to the just-released 4.4? Solr.xml persistence had all
kinds of bugs in 4.3, which should have been fixed now.
Alan Woodward
www.flax.co.uk
On 23 Jul 2013, at 13:36, Ali, Saqib wrote:
> Hello all,
>
> Every time I issue a SPLITSHARD using Collections API, the zkHost att
There is no such thing as a "qf filter" - "qf" is simply a list of names of
fields to search for the terms from the query, "q", as well as boost
factors. Filtering is done with "filter queries" - "fq".
-- Jack Krupansky
-Original Message-
From: Mysurf Mail
Sent: Tuesday, July 23, 201
Hi,
Use fq, not qf. It needs to be indexed. Filtering is like searching
without scoring.
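For example (field and value from this thread, query illustrative):
.../select?q=user+search+terms&fq=CreatedBy:giraffe
The fq clause restricts the result set without influencing scores.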
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
On Tue, Jul 23, 2013 at 9:39 AM, Mysurf Mail wrote:
> I am probably using it wrong.
> http:
I am probably using it wrong.
http://
...:8983/solr/vault10k/select?q=*%3A*&defType=edismax&qf=CreatedBy%BLABLA
returns all rows.
It neglects my qf filter.
Should I even use qf for filtering with edismax?
(It doesn't say that in the doc
http://wiki.apache.org/solr/ExtendedDisMax#qf_.28Query_Fields
Can anyone remove this spammer please?
On Tue, Jul 23, 2013 at 4:47 AM, wrote:
>
> Hi! http://mackieprice.org/cbs.com.network.html
>
>
But I don't want it to be searched on.
Let's say the user name is "giraffe".
I do want the filter to be "where created by = giraffe",
but when the user searches for his name, I want only documents with the name
"Giraffe".
Since it is indexed, wouldn't it return all rows created by him?
Thanks.
On Tue,
Moreover, you may want to use fq=CreatedBy:user1 for filtering.
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
On Tue, Jul 23, 2013 at 9:28 AM, Raymond Wiker wrote:
> Simple: the field needs to be "indexed" in order to search (or
Simple: the field needs to be "indexed" in order to search (or filter) on
it.
On Tue, Jul 23, 2013 at 3:26 PM, Mysurf Mail wrote:
> I want to restrict the returned results to be only the documents that were
> created by the user.
> I then load to the index the createdBy attribute and set it to
I want to restrict the returned results to be only the documents that were
created by the user.
I then load the createdBy attribute into the index and set it to
indexed="false" stored="true".
Then I want to filter by "CreatedBy", so I use the dashboard, check
edismax and add
I check edismax a
Assumptions:
* you currently have two choices to start Zookeeper: run it embedded
within Solr, or download it from the ZooKeeper site and start it
independently.
* everything you need to run ZooKeeper (embedded or not) is included
within the Solr distribution
Assuming I've got the above righ
Hello all,
Every time I issue a SPLITSHARD using Collections API, the zkHost attribute
in the solr.xml goes missing. I have to manually edit the solr.xml to add
zkHost after every SPLITSHARD.
Any thoughts on what could be causing this?
Thanks.
I have tried post.jar and it works when I set the literal.id in solrconfig.xml.
I can't pass the id with post.jar (-Dparams=literal.id=abc) because I get an
error: "could not find or load main class .id=abc".
On 20. Jul 2013, at 7:05 PM, Andreas Owen wrote:
> path was set text wasn't, but it do
I am updating my solr index using deltaQuery and deltaImportQuery
attributes in data-config.xml.
In my condition I write
where MyDoc.LastModificationTime > '${dataimporter.last_index_time}'
then after I add a row I trigger an update using data-config.xml.
Now, sometimes I delete a row.
How ca
this isn't doing what you think.
title^10 content
is actually parsed as
text:title^10 text:content
where "text" is my default search field.
assuming title is a field. If you look a little
farther up the debug output you'll see that.
You probably want
title:content^100 or some such?
Erick
On
First a minor nit: the server.add(doc, time) is a hard commit, not a soft one.
But as for the rest of it: when you add your 70 docs, do they all have the
same id (i.e. the uniqueKey field)? If so, there will be only one document,
the last one, since all the earlier ones will be overwritten.
Not quite sure why you
Hi,
On Tue, Jul 23, 2013 at 8:02 AM, Erick Erickson wrote:
> Neil:
>
> Here's a must-read blog about why allocating more memory
> to the JVM than Solr requires is a Bad Thing:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> It turns out that you actually do yourself
Hi Cosimo,
Very simple: Oracle Java 1.7 is your best bet. If you have a large heap
and are seeing STW pauses, try G1 - we've been using it and have been
happy with it.
Ciao,
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
On Tue, Jul 2
It can be done by extending LuceneQParser/SolrQueryParser; see
http://wiki.apache.org/solr/SolrPlugins#QParserPlugin
There is newTermQuery(Term); it should be overridden and delegate to
the newPrefixQuery() method.
Overall, I suggest you consider using EdgeNGramTokenFilter at index time,
and then search
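A rough, untested sketch of the parser-override route against the 4.x API
(the plugin name and the default field "text" are illustrative):

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.Query;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.search.QParser;
  import org.apache.solr.search.QParserPlugin;
  import org.apache.solr.search.SolrQueryParser;
  import org.apache.solr.search.SyntaxError;

  public class PrefixingQParserPlugin extends QParserPlugin {
    @Override
    public void init(NamedList args) {}

    @Override
    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
      return new QParser(qstr, localParams, params, req) {
        @Override
        public Query parse() throws SyntaxError {
          SolrQueryParser p = new SolrQueryParser(this, "text") {
            @Override
            protected Query newTermQuery(Term term) {
              // every plain term becomes a prefix query, i.e. term*
              return newPrefixQuery(new Term(term.field(), term.text()));
            }
          };
          return p.parse(qstr);
        }
      };
    }
  }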
Neil:
Here's a must-read blog about why allocating more memory
to the JVM than Solr requires is a Bad Thing:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
It turns out that you actually do yourself harm by allocating more
memory to the JVM than it really needs. Of course
To add to what Erick said, that *quantifying* is hugely important!
How do you measure your search relevance improvements?
How are you currently measuring it?
How will you see, after you apply any changes, whether relevance was
improved and how much?
How will you know whether, even test queries you
There has been a _ton_ of work since 4.0, and 4.4 will be out in a day
or two. I suspect the best advice is to try 4.4...
Best
Erick
On Mon, Jul 22, 2013 at 2:54 PM, Michael Long wrote:
> I'm seeing random crashes in solr 4.0 but I don't have anything to go on
> other than "IllegalStateException
Another thing I've seen people do is something like
text:(test AND pdf)^10 text:(test pdf).
so docs with both terms in the text field get boosted a lot, but docs
with either one will still get found.
But as Jack says, you have to demonstrate a problem before you propose
a solution.
You say " a l
Thanks!
On 23 July 2013 12:19, Markus Jelsma wrote:
> Eeh, here's the other one: https://issues.apache.org/jira/browse/SOLR-1712
>
>
> -Original message-
>> From:Markus Jelsma
>> Sent: Tuesday 23rd July 2013 13:18
>> To: solr-user@lucene.apache.org
>> Subject: RE: facet.maxcount ?
>>
>>
Ah, I think I misread your question. So your question is actually: how to make
Solr embed highlighting into the doc response itself. I'm not aware of such
functionality. This is why you have the "highlighting" section in your
response.
On Tue, Jul 23, 2013 at 2:30 PM, Dmitry Kan wrote:
> You just ne
You just need to specify the emphasizing tag in hl params by adding
something like this to your query:
&hl.fl=content&hl.simple.pre=<b>&hl.simple.post=<%2Fb>
Check the solr admin page, the querying item, it shows the constructed
query, so you don't need to guess!
Regards,
Dmitry
On Mon, Jul 22
My client has an installation with 3 different clients using the same Solr
index. These clients all append a * wildcard suffix in the query: user
enters "abc def" while search is performed against (abc* def*).
In order to move away from this way of searching, we'd like to move the
clients away from
Eeh, here's the other one: https://issues.apache.org/jira/browse/SOLR-1712
-Original message-
> From:Markus Jelsma
> Sent: Tuesday 23rd July 2013 13:18
> To: solr-user@lucene.apache.org
> Subject: RE: facet.maxcount ?
>
> Hi - No but there are two unresolved issues about this topic:
>
Hi - No but there are two unresolved issues about this topic:
https://issues.apache.org/jira/browse/SOLR-4411
https://issues.apache.org/jira/browse/SOLR-4411
Cheers
-Original message-
> From:Jérôme Étévé
> Sent: Tuesday 23rd July 2013 12:58
> To: solr-user@lucene.apache.org
> Subject: f
Hi all happy Solr users!
I was wondering if it's possible to have some sort of facet.maxcount equivalent?
In short, that would exclude from the facet any term (or query) that
matches at least facet.maxcount times.
That facet.maxcount would probably significantly improve the
performance of reques
Hi all,
I have two Solrs, one master and one replica; before, I used them
under version 3.5 and it worked fine.
When I upgraded to version 4.3, I found that when the replica Solr copies the
index from the master, it cleans the current index and copies the new version
into its own folder. The slave can't search durin
Actually I need a specialized algorithm. I want to use that algorithm to
detect duplicate blog posts.
2013/7/23 Tommaso Teofili
> Hi,
>
> I think you may leverage and/or improve the MLT component [1].
>
> HTH,
> Tommaso
>
> [1] : http://wiki.apache.org/solr/MoreLikeThis
>
>
> 2013/7/23 Furkan KAMACI
>
Hi,
Can anyone help with how to reference the solrcore.properties uploaded into
ZooKeeper?
Hi,
I think you may leverage and/or improve the MLT component [1].
HTH,
Tommaso
[1] : http://wiki.apache.org/solr/MoreLikeThis
2013/7/23 Furkan KAMACI
> Hi;
>
> Sometimes a huge part of a document may exist in another document. As like
> in student plagiarism or quotation of a blog post at another b
Hi;
Sometimes a huge part of a document may exist in another document, as in
student plagiarism or quotation of a blog post in another blog post.
Do Solr/Lucene or their libraries (UIMA, OpenNLP, etc.) have any class to
detect it?
I'm trying to index an Oracle Database 10g XE using Solr's Data Import Handler.
My data-config.xml looks like this
My schema.xml looks like this -
Now when I try to index it, Solr is not able to read the columns of the
table and
How do I cast datetimeoffset(7) to a Solr date?
On Tue, Jul 23, 2013 at 11:11 AM, Mysurf Mail wrote:
> Ahaa
> I deleted the data folder and now I get
> Invalid Date String:'2010-01-01 00:00:00 +02:00'
> I need to cast it to solr. as I read it in the schema using
>
> stored="true" required="true"