random results at specific slots

2012-06-05 Thread srinir
Hi, I would like to return results sorted by score (desc), but I would like to insert random results into some predefined slots (let's say 10, 14 and 18). The reason I want to do that is that I boost click-through-rate-based features significantly and I want to give a chance to documents which don't ha
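
The slot idea in this message can be sketched outside Solr as a post-processing step over an already score-sorted list. This is only an illustration of the merge logic, not a Solr component: the function name `insert_at_slots`, the candidate pool and the document ids are hypothetical, and slots are treated as 1-based positions in the final list.

```python
import random


def insert_at_slots(ranked, candidates, slots, rng=None):
    """Return a copy of `ranked` with random candidates placed at 1-based `slots`.

    `ranked` is assumed to be sorted by score already; `candidates` is a
    hypothetical pool of documents that would otherwise rank poorly.
    """
    rng = rng or random.Random()
    already_ranked = set(ranked)
    # Only consider candidates that are not in the ranked list already.
    pool = [doc for doc in candidates if doc not in already_ranked]
    picks = rng.sample(pool, min(len(slots), len(pool)))
    result = list(ranked)
    # Insert in ascending slot order so earlier inserts keep their position.
    for slot, doc in zip(sorted(slots), picks):
        index = min(slot - 1, len(result))  # clamp when the list is short
        result.insert(index, doc)
    return result


ranked_docs = [f"d{i}" for i in range(1, 21)]
mixed = insert_at_slots(ranked_docs, ["r1", "r2", "r3"], [10, 14, 18])
```

With slots 10, 14 and 18, the score-sorted documents keep their relative order and simply shift down by one position after each filled slot.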

RE: Add HTTP-header from ResponseWriter

2012-06-05 Thread Markus Jelsma
Thanks, i'll check the issues. -Original message- > From:Jack Krupansky > Sent: Mon 04-Jun-2012 17:19 > To: solr-user@lucene.apache.org > Subject: Re: Add HTTP-header from ResponseWriter > > There is some commented-out code in SolrDispatchFilter.doFilter: > > // add info to http heade

Re: random results at specific slots

2012-06-05 Thread srinir
Another option I could think of is to write a custom component which implements handleResponses, where I can pick random documents from across shards and insert them into the ResponseBuilder's resultIds? I would place this component at the end (or after QueryComponent). Will that work? Is there a bet

maxScore always returned

2012-06-05 Thread Markus Jelsma
Hi, On trunk the maxScore response attribute is always returned even if score is not part of fl. Is this intentional? Thanks,

Re: Multi-words synonyms matching

2012-06-05 Thread O. Klein
The reason multi word synonyms work better if you use LUCENE_33 is because then Solr uses the SlowSynonymFilter instead of SynonymFilterFactory (FSTSynonymFilterFactory). But I don't know if the difference between them is a bug or not. Maybe someone has more insight? Bernd Fehling-2 wrote > >

Re: Strip html

2012-06-05 Thread Tigunn
Hello, I made progress on my problem. The index and fieldtype are good: I forgot the copyField "body_strip_html" on text, the defaultSearchField. Newbie's mistake. Now, Solr returns all the XML files I want. But, in PHP, the text isn't displayed for 2 XML files (with term "castor" snipped by html or xml t

Re: Multi-words synonyms matching

2012-06-05 Thread Bernd Fehling
Do you have test cases? What are you sending to your SynonymFilterFactory? What are you expecting it should return? What is it returning when setting to Version.LUCENE_33? What is it returning when setting to Version.LUCENE_36? Am 05.06.2012 10:56, schrieb O. Klein: > The reason multi word s

Search timeout for Solrcloud

2012-06-05 Thread arin_g
Hi, We use SolrCloud in production, and we are facing some issues with queries that take very long, especially deep paging queries; these queries keep our servers very busy. I am looking for a way to stop (kill) queries taking longer than a specific amount of time (say 5 seconds). I checked timeAllo

Re: maxScore always returned

2012-06-05 Thread darul
Maybe look into your solrconfig.xml file to see whether fl is set by default on your request handler. -- View this message in context: http://lucene.472066.n3.nabble.com/maxScore-always-returned-tp3987727p3987733.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

2012-06-05 Thread Erick Erickson
Older versions of Solr didn't really sort correctly on multivalued fields, they just didn't complain. Hmmm. Off the top of my head, you can: 1> You don't say what the documents to be indexed are. Are they Solr-style documents on disk or do you process them with, say, a SolrJ program?

ReadTimeout on commit

2012-06-05 Thread spring
Hi, I'm indexing documents in batches of 100 docs. Then commit. Sometimes I get this exception: org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java

Solr instances: many singles vs multi-core

2012-06-05 Thread Christian von Wendt-Jensen
Hi, I'm running a cluster of Solr servers for an index split up in a lot of shards. Each shard is replicated. Current setup is one Tomcat instance per shard, even if the Tomcats are running on the same machine. My question is this: Would it be more advisable to run one Tomcat per machine with

RE: maxScore always returned

2012-06-05 Thread Markus Jelsma
Hi. We set fl in the request handler's default without score. thanks -Original message- > From:darul > Sent: Tue 05-Jun-2012 12:05 > To: solr-user@lucene.apache.org > Subject: Re: maxScore always returned > > maybe look into your solrconfig.xml file whether fl not set by default on >

SolrDispatchFilter, no hits in response NamedList if distrib=true

2012-06-05 Thread Markus Jelsma
Hi, I'm adding the numFound to the HTTP response header in a custom SolrDispatchFilter in the writeResponse() method, similar to the commented code in doFilter(). This works just fine but not for distributed requests. I'm trying to read "hits" from the SolrQueryResponse but it is not there for

Re: Search timeout for Solrcloud

2012-06-05 Thread Jason Rutherglen
There isn't a solution for killing long running queries that works. On Tue, Jun 5, 2012 at 1:34 AM, arin_g wrote: > Hi, > We use solrcloud in production, and we are facing some issues with queries > that take very long specially deep paging queries, these queries keep our > servers very busy. i a

RE: Search timeout for Solrcloud

2012-06-05 Thread Markus Jelsma
There's an open issue for improving deep paging performance: https://issues.apache.org/jira/browse/SOLR-1726 -Original message- > From:arin_g > Sent: Tue 05-Jun-2012 12:03 > To: solr-user@lucene.apache.org > Subject: Search timeout for Solrcloud > > Hi, > We use solrcloud in producti

filtering number and repeated contents

2012-06-05 Thread Mark , N
Is it possible to filter out numbers and disclaimers (repeated contents) while indexing to SOLR? These are all surplus information and I do not want to index them. I have tried using the boilerpipe algorithm as well to remove surplus information from web pages such as navigational elements, templates, and

Is it faster to search over many different fields or one field that combines the values of all those other fields?

2012-06-05 Thread santamaria2
Say I have various categories of 'tags'. I want a keyword search to search through my index of articles. So I search over: 1) the title. 2) the body 3) about 10 of these tag-categories. Each tag category is multivalued with a few words per value. Without considering the effect on 'relevance', and

Re: Strip html

2012-06-05 Thread Tigunn
I resolved my problem: I had to specify the field to return with my query. Thanks A LOT for your help! -- View this message in context: http://lucene.472066.n3.nabble.com/Strip-html-tp3987051p398.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

2012-06-05 Thread Jack Krupansky
By saying "dirty data" you imply that only one of the values is "good" or "clean" and that the others can be safely discarded/ignored, as opposed to true multi-valued data where each value is there for good reason and needs to be preserved. In any case, how do you know/decide which value should

Re: Can't index sub-entitties in DIH

2012-06-05 Thread Rafael Taboada
Hi Gora, > Your configuration files look fine. It would seem that something > is going wrong with the SELECT in Oracle, or with the JDBC > driver used to access Oracle. Could you try: * Manually doing the SELECT for the entity, and sub-entity > to ensure that things are working. > The SELECTs

Re: random results at specific slots

2012-06-05 Thread Jack Krupansky
Take a look at "query elevation". It may do exactly what you want, but at a minimum, it would show you how this kind of thing can be done. See: http://wiki.apache.org/solr/QueryElevationComponent -- Jack Krupansky -Original Message- From: srinir Sent: Tuesday, June 05, 2012 3:08 AM T

HypericHQ plugins?

2012-06-05 Thread Paul Libbrecht
Hello SOLR users, is there someone who wrote plugins for HypericHQ to monitor the very many metrics SOLR exposes through JMX? I am a kind of newbie to JMX and the tutorials of Hyperic aren't simple enough to my taste... so I'd be helped if someone did it already. thanks in advance Paul

RE: Can't index sub-entitties in DIH

2012-06-05 Thread Dyer, James
I successfully use Oracle with DIH although none of my imports have sub-entities. (slight difference, I'm on ojdbc5.jar w/10g...). It may be you have a driver that doesn't play well with DIH in some cases. You might want to try these possible workarounds: - rename the columns in SELECT with "

Re: score filter

2012-06-05 Thread debdoot
Hello Grant, I need to frame a query that is a combination of two query parts and I use a 'function' query to prepare the same. Something like: q={!type=func q.op=AND df=text}product(query($uq,0.0),query($cq,0.1)) where $uq and $cq are two queries. Now, I want a search result returned only if I

Re: Can't index sub-entitties in DIH

2012-06-05 Thread Rahul Warawdekar
Hi, One of the possibilities for this kind of issue to occur may be the case sensitivity of column names in Oracle. Can you apply a transformer and check the entity map which actually contains the keys and their values ? Also, please try specifying upper case field names for Oracle and try if that

Re: Is it faster to search over many different fields or one field that combines the values of all those other fields?

2012-06-05 Thread Michael Della Bitta
I don't have the answer to your question, but I certainly don't think anybody should be slapped in the face for asking a question! Michael Della Bitta Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com On Tue, Jun 5, 2012 a

Re: Can't index sub-entitties in DIH

2012-06-05 Thread Gora Mohanty
Hi, Sorry, I am stumped, and cannot help further without access to Oracle. Please disregard the bit about the quotes: I was reading a single quote followed by a double quote as three single quotes. There was no issue there. Since your configurations for Oracle, and mysql are different, are you us

Re: Can't index sub-entitties in DIH

2012-06-05 Thread Rafael Taboada
Hi James. Thanks for your advice. As I said, alias works for me. I use joins instead of sub-entities... Heavily... These config files work for me... db-data-config.xml

Re: filtering number and repeated contents

2012-06-05 Thread Jack Krupansky
My (very limited) understanding of "boilerpipe" in Tika is that it strips out "short text", which is great for all the menu and navigation text, but the typical disclaimer at the bottom of an email is not very short and frequently can be longer than the email message body itself. You may have to

Re: Can't index sub-entitties in DIH

2012-06-05 Thread Rafael Taboada
Hi Gora, Yes, I restart Solr for each change I make. Thanks for your help... A small question: does DIH work well with an Oracle database, using all the features it can do? On Tue, Jun 5, 2012 at 9:32 AM, Gora Mohanty wrote: > Hi, > > Sorry, I am stumped, and cannot help further without > acces

Re: Can't index sub-entitties in DIH

2012-06-05 Thread Gora Mohanty
On 5 June 2012 20:05, Rafael Taboada wrote: > Hi James. > > Thanks for your advice. > > As I said, alias works for me. I use joins instead of sub-entities... > Heavily... > These config files work for me... [...] How about NULL values in the column that you are doing a left outer join on? Cannot

Re: Can't index sub-entitties in DIH

2012-06-05 Thread Gora Mohanty
On 5 June 2012 20:08, Rafael Taboada wrote: > Hi Gora, > > Yes, I restart Solr for each change I do. > > Thanks for your help... > > An small question Is DIH work well with Oracle database? Using all the > features It can do? Unfortunately, I have never used DIH with Oracle. However, this sho

Re: Is it faster to search over many different fields or one field that combines the values of all those other fields?

2012-06-05 Thread Jack Krupansky
There may be a raw performance advantage to having all values in a single combined field, but then you lose the opportunity to boost title and tag field hits. With the extended dismax query parser you have the ability to specify the field list in the "qf" request parameter so that the query c
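
As a rough illustration of the "qf" approach mentioned here, the request parameters for an extended dismax query with per-field boosts might be assembled like this. The field names (title, body, tags) and the boost values are assumptions made up for the example; only `defType=edismax` and the `field^boost` syntax of `qf` come from Solr itself.

```python
from urllib.parse import urlencode


def edismax_params(user_query, field_boosts):
    """Build request parameters for an edismax query over several fields.

    `field_boosts` maps hypothetical field names to boost factors.
    """
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {"q": user_query, "defType": "edismax", "qf": qf}


# Title hits count most, tag hits next, body hits least (illustrative values).
params = edismax_params("solar panels", {"title": 5, "tags": 2, "body": 1})
query_string = urlencode(params)  # ready to append to a /select URL
```

Keeping the fields separate is what makes the boosts possible; with a single combined field there is nothing left to weight differently.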

Re: London OSS search social - meetup 6th June

2012-06-05 Thread Richard Marr
Quick reminder, we're meeting at The Plough in Bloomsbury tomorrow night. Details and RSVP on the meetup page: http://www.meetup.com/london-search-social/events/65873032/ -- Richard Marr On 3 Jun 2012, at 00:29, Richard Marr wrote: > > Apologies for the short notice guys, we're meeting up at

Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

2012-06-05 Thread Aaron Daubman
Thanks for the responses, By saying "dirty data" you imply that only one of the values is "good" or > "clean" and that the others can be safely discarded/ignored, as opposed to > true multi-valued data where each value is there for good reason and needs > to be preserved. In any case, how do you k

Re: Is it faster to search over many different fields or one field that combines the values of all those other fields?

2012-06-05 Thread Mikhail Khludnev
IRC, Lucene in Action book loops around this point almost every chapter: multifield query is faster. On Tue, Jun 5, 2012 at 7:04 PM, Jack Krupansky wrote: > There may be a raw performance advantage to having all values in a single > combined field, but then you loose the opportunity to boost titl

Re: Is it faster to search over many different fields or one field that combines the values of all those other fields?

2012-06-05 Thread Gora Mohanty
On 5 June 2012 22:05, Mikhail Khludnev wrote: > IRC, Lucene in Action book loops around this point almost every chapter: > multifield query is faster. [...] Surely this is dependent on the type, and volume of one's data? As with many issues, isn't the answer that "it depends", i.e., one should pr

Re: Search timeout for Solrcloud

2012-06-05 Thread Jack Krupansky
I'm curious... how deep is it that is becoming problematic? Tens of pages, hundreds, thousands, millions? And when you say deep paging, are you incrementing through all pages down to the depth or "gapping" to some very large depth outright? If the former, I am wondering if the Solr cache is bu

Boost by Nested Query / Join Needed?

2012-06-05 Thread naleiden
Hi, First off, I'm about a week into all things Solr, and still trying to figure out how to fit my relational-shaped peg through a denormalized hole. Please forgive my ignorance below :-D I need to store a One-to-N type relationship, and perform a boost on a related field. Let's say I want to

Is FileFloatSource's WeakHashMap cache only cleaned by GC?

2012-06-05 Thread Gregg Donovan
We've encountered GC spikes at Etsy after adding new ExternalFileFields a decent number of times. I was always a little confused by this behavior -- isn't it just one big float[]? why does that cause problems for the GC? -- but looking at the FileFloatSource code a little more carefully, I wonder i

Re: Search timeout for Solrcloud

2012-06-05 Thread arin_g
For example, when we set the start parameter to 1000, 2000 or higher (page 100, 200 ...), it takes very long (20, 30 seconds, sometimes even 100 seconds). This usually happens when there is a big gap between pages, mostly hit by web crawlers (when they crawl the last page link on our website). --
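
A back-of-the-envelope sketch of why large start values hurt in a distributed setup: each shard has to collect its own top start + rows entries, and the coordinating node then merges the lists from every shard. The numbers below assume rows=10 (which the message implies, since start=1000 is called page 100) and a hypothetical cluster of 4 shards; the function name is made up for the example.

```python
def deep_paging_cost(start, rows, shards):
    """Estimate how many entries are collected for one distributed page request.

    Every shard ranks and returns start + rows entries; the coordinator
    merges all of them just to keep the final `rows`.
    """
    per_shard = start + rows
    return {"per_shard": per_shard, "merged": per_shard * shards}


first_page = deep_paging_cost(start=0, rows=10, shards=4)
deep_page = deep_paging_cost(start=2000, rows=10, shards=4)
```

Under these assumptions the first page touches 10 entries per shard, while start=2000 touches 2010 per shard and 8040 at the merge step, all to return the same 10 documents.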

Solr 4.0 Clean Commit for production use

2012-06-05 Thread TheNova
Hey guys, I am trying to upgrade to Solr 4.0. Do you know where I get a clean 4.0 commit for production use? I did an SVN checkout from http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/ and it looks like they have migrated to 5.0. From the link below it looks like that happened by the end of

Re: Solr 4.0 Clean Commit for production use

2012-06-05 Thread Chris Hostetter
: Hey guys, I am trying to upgrade to Solr 4.0. Do you know where I get a clean Clarification: 4.0 does not exist yet. What does exist is the 4x branch, from which you can build snapshots that should be very similar to what will eventually be released as 4.0. : http://svn.apache.org/repos/asf

Re: Solr 4.0 Clean Commit for production use

2012-06-05 Thread Jack Krupansky
The Nightly Build wiki still says it is "4.x" even though it is now 5.x. See: https://wiki.apache.org/solr/NightlyBuilds AFAIK, there isn't a 4.x nightly build running. (Is that going to happen soon??) You can checkout the repo for the 4x branch: http://svn.apache.org/repos/asf/lucene/dev/bran

Re: Solr 4.0 Clean Commit for production use

2012-06-05 Thread Chris Hostetter
: The Nightly Build wiki still says it is "4.x" even though it is now 5.x. : See: : https://wiki.apache.org/solr/NightlyBuilds : : AFAIK, there isn't a 4.x nightly build running. (Is that going to happen : soon??) Yes... http://mail-archives.apache.org/mod_mbox/lucene-dev/201205.mbox/%3c3fd307e

Re: using Tika (ExtractingRequestHandler)

2012-06-05 Thread Chris Hostetter
I've updated the wiki to try and fill in some of these holes... http://wiki.apache.org/solr/ExtractingRequestHandler : i'm looking at using Tika to index a bunch of documents. the wiki page seems to be a little bit out of date ("// TODO: this is out of date as of Solr 1.4 - dist/apache-solr-ce

Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

2012-06-05 Thread Chris Hostetter
: The real issue here is that the docs are created externally, and the : producer won't (yet) guarantee that fields that should appear once will : actually appear once. Because of this, I don't want to declare the field as : multiValued="false" as I don't want to cause indexing errors. It would be

Re: TermComponent and Optimize

2012-06-05 Thread Chris Hostetter
: It seems that TermComponent is looking at all versions of documents in the index. : : Does this is the expected behavior for TermComponent? Any suggestion about how to solve this? Yes... http://wiki.apache.org/solr/TermsComponent "The doc frequencies returned are the number of documents tha

Re: using Tika (ExtractingRequestHandler)

2012-06-05 Thread Jack Krupansky
Hoss, In your edit, I noticed that the wiki makes "SolrPlugin" a link, but to a nonexistent page, although the page "SolrPlugins" does exist. See: "it is provided as a SolrPlugin," http://wiki.apache.org/solr/ExtractingRequestHandler I also noticed a few other things: 1. Reference to the "/s

Re: Solr instances: many singles vs multi-core

2012-06-05 Thread Jack Krupansky
It probably can work out reasonably well in both scenarios, but you do get some additional flexibility with multiple Tomcat instances: 1. Any "per-instance" Tomcat limits become per-core rather than for all cores on that machine. 2. If you have to restart Tomcat, only a single shard is impacte

Re: index special characters solr

2012-06-05 Thread KPK
Thanks for your reply! I tried using the types field in WordDelimiterFilterFactory wherein I was passing a text file which contained % $ as alphabets. But even then it didn't get indexed and neither did it show up in search results. Am I missing something? Thanks, Kushal -- View this message in c

Re: I got ERROR, Unable to execute query

2012-06-05 Thread Jihyun Suh
I used 3.x mysql. After I migrate to 5.x mysql, I don't get same error just like ' Unable to execute query'. Maybe low version of mysql and Solr have some problems, I don't know exactly. 2012/6/5 Jihyun Suh > That's why I made a new DB for dataimport test. So my tables have no > access or activ

Solr, I have perfomance problem for indexing.

2012-06-05 Thread Jihyun Suh
I have 128 tables of mysql 5.x and each table has 3,5000 rows. When I start dataimport (indexing) in Solr, it takes 5 minutes for one table. But when Solr indexes the 20th table, it takes around 10 minutes for one table. And then when it indexes the 40th table, it takes around 20 minutes for one table. Solr

Re: index special characters solr

2012-06-05 Thread KPK
Thanks Jack for your help! I found my mistake: rather than classifying those special characters as ALPHA, I classified them as DIGIT. Also I missed the same entry for the search analyzer. So probably that was the reason for not getting relevant results. I spent a lot of time figuring this out. So I'l
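
For anyone hitting the same issue, the mapping described here can be sketched as a small helper that produces the lines of a WordDelimiterFilterFactory types file, where each line has the form `character => TYPE`. The helper name is hypothetical and the script only generates text; the file still has to be referenced from both the index and the query analyzer, as the message points out.

```python
# Character classes accepted in a WordDelimiterFilterFactory types file.
VALID_TYPES = {"LOWER", "UPPER", "ALPHA", "DIGIT", "ALPHANUM", "SUBWORD_DELIM"}


def types_file_lines(mapping):
    """Turn {character: type} into types-file lines such as '% => ALPHA'."""
    lines = []
    for char, char_type in mapping.items():
        if char_type not in VALID_TYPES:
            raise ValueError(f"unknown type {char_type!r} for {char!r}")
        lines.append(f"{char} => {char_type}")
    return lines


# Treat % and $ as letters (ALPHA) rather than digits, per the fix above.
lines = types_file_lines({"%": "ALPHA", "$": "ALPHA"})
```

Writing `"\n".join(lines)` to the file named in the filter's `types` attribute gives the two entries the poster needed.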

Re: index special characters solr

2012-06-05 Thread Jack Krupansky
Thanks. I'm sure someone else will have the same issue at some point. -- Jack Krupansky -Original Message- From: KPK Sent: Tuesday, June 05, 2012 9:51 PM To: solr-user@lucene.apache.org Subject: Re: index special characters solr Thanks Jack for your help! I found my mistake, rather th

Re: Solr, I have perfomance problem for indexing.

2012-06-05 Thread Jack Krupansky
You wrote "3,5000", but is that 35 hundred (3,500) or 35 thousand (35,000)?? Your numbers seem far worse than what many people typically see with Solr and DIH. Is the database running on the same machine? Check the Solr log file to see if some errors (or warnings) might be occurring frequent

Re: Solr, I have perfomance problem for indexing.

2012-06-05 Thread Lance Norskog
Which Solr do you run? On Tue, Jun 5, 2012 at 8:02 PM, Jack Krupansky wrote: > You wrote "3,5000", but is that 35 hundred (3,500) or 35 thousand (35,000)?? > > Your numbers seem far worse than what many people typically see with Solr > and DIH. > > Is the database running on the same machine? > >

Hiring multiple Lucene/Solr Search Engineers

2012-06-05 Thread SV
Hi, We are hiring multiple Lucene/Solr engineers, tech leads, architects based in Minneapolis - both full time and consulting for developing new search platform. Please reach out to me - svamb...@gmail.com Thanks, Venkat Ambati Sr. Manager, Best Buy

Replication

2012-06-05 Thread William Bell
We are using SOLR 1.4, and we are experiencing full index replication every 15 minutes. I have checked the solrconfig and it has maxsegments set to 20. It appears like it is indexing a segment, but replicating the whole index. How can I verify it and possibly fix the issue? -- Bill Bell billnb.