You can have two fields: one which is stripped, and another which
stores the original data. You can use directives and make
the "stripped" field indexed but not stored, and the original field
stored but not indexed. You only have to upload the file once, and
only store the text once.
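For example, a schema.xml sketch (the field names "body" and "body_stripped" are made up for illustration; the actual stripping would be done by the analyzer on the indexed field's type, e.g. an HTMLStripCharFilterFactory):

```xml
<!-- stored but not indexed: the original text, returned with results -->
<field name="body" type="string" indexed="false" stored="true"/>
<!-- indexed but not stored: the analyzed/stripped text used for searching -->
<field name="body_stripped" type="text_general" indexed="true" stored="false"/>
<copyField source="body" dest="body_stripped"/>
```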
If you look
Ok, thanks Otis
Another question on merging
What is the best way to monitor merging?
Is there something in the log file that I can look for?
It seems like I have to monitor system resources (read/write IOPS, etc.)
and work out when a merge happened.
It would be great if I could do it by looking
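One option (a 3.x-style solrconfig.xml sketch) is to turn on Lucene's infoStream, which writes low-level segment, flush, and merge activity to a file:

```xml
<indexDefaults>
  <!-- set to true to have Lucene log merge/flush details to this file -->
  <infoStream file="INFOSTREAM.txt">true</infoStream>
</indexDefaults>
```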
Simply turn off replication during your rebuild-from-scratch. See:
http://wiki.apache.org/solr/SolrReplication#HTTP_API
the "disablereplication" command.
The autocommit thing was, I think, in reference to keeping
a partial rebuild from being replicated.
Autocommit is usually a f
Why do you care? Merging is generally a background process, or are
you doing heavy indexing? In a master/slave setup,
it's usually not really relevant except that (with 3.x), massive merges
may temporarily stop indexing. Is that the problem?
Look at the merge policies; there are configurations that
The FieldCache gets populated the first time a given field is referenced as
a facet and then will stay around forever. So, as additional queries get
executed with different facet fields, the number of FieldCache entries will
grow.
If I understand what you have said, these faceted queries do w
We have a fairly large scale system - about 200 million docs and fairly high
indexing activity - about 300k docs per day with peak ingestion rates of about
20 docs per sec. I want to work out what a good mergeFactor setting would be by
testing with different mergeFactor settings. I think the def
But again, with a master/slave setup merging should
be relatively benign. And at 200M docs, having an M/S
setup is probably indicated.
Here's a good writeup of mergepolicy
http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/
If you're indexing and searching on a single machine, merg
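For reference, a 3.x-style solrconfig.xml sketch of the knobs involved (values shown are the stock example defaults, not recommendations):

```xml
<indexDefaults>
  <mergeFactor>10</mergeFactor>
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy"/>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
</indexDefaults>
```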
Hi,
When I tried to remove data from the UI (which will in turn hit SOLR), the
whole application got stuck. When we took the log files of the UI, we
could see that this set of requests did not reach SOLR itself. In the SOLR
log file, we were able to find the following exception occurring at the s
Erick, I'll do that. Thank you very much.
Regards,
Jacek
On Tue, May 1, 2012 at 7:19 AM, Erick Erickson wrote:
> The easiest way is to do that in the app. That is, return the top
> 10 to the app (by score) then re-order them there. There's nothing
> in Solr that I know of that does what you want
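A sketch of that app-side re-ordering (the ids and the "price" field are made up; in practice the maps would be built from the Solr response):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReorderTopDocs {
    // Build a fake "document" as a field map, standing in for a response doc.
    static Map<String, Object> doc(String id, double price) {
        Map<String, Object> m = new HashMap<>();
        m.put("id", id);
        m.put("price", price);
        return m;
    }

    public static void main(String[] args) {
        // Pretend Solr already returned these 10 (here 3) docs ordered by score.
        List<Map<String, Object>> topDocs = new ArrayList<>();
        topDocs.add(doc("a", 9.99));
        topDocs.add(doc("b", 3.50));
        topDocs.add(doc("c", 7.25));
        // Re-order the same docs client-side by another field, price ascending.
        topDocs.sort(Comparator.comparingDouble(d -> (Double) d.get("price")));
        for (Map<String, Object> d : topDocs) System.out.println(d.get("id"));
    }
}
```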
Actually we are not thinking of a M/S setup
We are planning to have x number of shards on N number of servers, with each
shard handling both indexing and searching.
The expected query volume is not that high, so don't think we would need to
replicate to slaves. We think each shard will be able t
Greetings Solr folk,
How can I instruct the extract request handler to ignore metadata/headers
etc. when it constructs the "content" of the document I send to it?
For example, I created an MS Word document containing just the word
"SEARCHWORD" and nothing else. However, when I ship this doc to my
Somehow I missed that there was a solrclean command. Thanks.
On Tue, May 1, 2012 at 10:41 AM, Markus Jelsma
wrote:
> Nutch 1.4 has a separate tool to remove 404 and redirects documents from
> your
> index based on your CrawlDB. Trunk's SolrIndexer can add and remove
> documents
> in one run base
Optimizing is much less important query-speed wise
than historically, essentially it's not recommended much
any more.
A significant effect of optimize _used_ to be purging
obsolete data (i.e. that from deleted docs) from the
index, but that is now done on merge.
There's no harm in optimizing on o
I doubt if SOLR has this capability, given that it is based on a RESTful
architecture, but I wanted to ask in case I'm mistaken.
In lucene, it is easier to gain a direct handle to the collector / scorer
and access all the results as they're collected (as opposed to the SOLR
query call that perfor
In other words, .. as an alternative , what's the most efficient way to gain
access to all of the document ids that match a query
--
View this message in context:
http://lucene.472066.n3.nabble.com/Dumb-question-Streaming-collector-query-results-tp3955175p3955194.html
Sent from the Solr - User mailing list archive at Nabble.com.
Check to see if you have a CopyField for a wildcard pattern that copies to
"meta", which would copy all of the Tika-generated fields to "meta."
-- Jack Krupansky
-Original Message-
From: Joseph Hagerty
Sent: Wednesday, May 02, 2012 9:56 AM
To: solr-user@lucene.apache.org
Subject: Extr
I do not. I commented out all of the copyFields provided in the default
schema.xml that ships with 3.5. My schema is rather minimal. Here is my
fields block, if this helps:
On Wed, May 2, 2012 at 10:59 AM, Jack Krupansky wrote:
> Check to see if you have a CopyField
Hi :)
I'm starting to use Solr and I'm facing a little problem with dates. My
documents have a date property in the 'yyyyMMdd' format.
To index these dates, I use the following code:
String dateString = "20101230";
SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
Date date = sdf.parse(dateString);
doc.addField("date", date);
In the index, the date "20101230" is saved as "2010-12-29T23:00:00Z" (because of GMT).
The trailing "Z" is required in your input data to be indexed, but the Z is
not actually stored. Your query must have the trailing "Z" though, unless
you are doing a wildcard or prefix query.
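To make the round trip concrete, a small self-contained sketch (plain JDK, no Solr required) that parses the input in UTC and produces the trailing-Z form:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SolrDateFormat {
    // Format a java.util.Date the way Solr date fields expect:
    // UTC, ISO 8601, with a trailing "Z".
    static String toSolrDate(Date d) {
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        return out.format(d);
    }

    public static void main(String[] args) throws Exception {
        SimpleDateFormat in = new SimpleDateFormat("yyyyMMdd");
        // Parse in UTC so the date does not shift by the local GMT offset.
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        Date date = in.parse("20101230");
        System.out.println(toSolrDate(date)); // 2010-12-30T00:00:00Z
    }
}
```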
-- Jack Krupansky
-Original Message-
From: G.Long
Sent: Wednesday, May 02, 2012 11:18 AM
To:
I can achieve this by building a query with start and rows = 0, and using
.getResults().getNumFound().
Are there any more efficient approaches to this?
Thanks
--
View this message in context:
http://lucene.472066.n3.nabble.com/SOLRJ-Is-there-a-way-to-obtain-a-quick-count-of-total-results-for-a-
Oops... I meant to say that Solr doesn't *index* the trailing Z, but it is
"stored" (the stored value, not the indexed value.) The query must match the
indexed value, not the stored value.
-- Jack Krupansky
-Original Message-
From: Jack Krupansky
Sent: Wednesday, May 02, 2012 11:55 A
That wasn't right either... the query must have the trailing Z, which Solr
will strip off to match the indexed value which doesn't have the Z. So, my
corrected original statement is:
The trailing "Z" is required in your input data to be indexed, but the Z is
not actually indexed by Solr (it is
Hi Robert,
On May 1, 2012, at 7:07pm, Robert Muir wrote:
> On Tue, May 1, 2012 at 6:48 PM, Ken Krugler
> wrote:
>> Hi list,
>>
>> Does anybody know if the Suggester component is designed to work with shards?
>
> I'm not really sure it is? They would probably have to override the
> default mer
Hello,
I just started using elevation for solr. I am on solr 3.5, running with Drupal
7, Linux.
1. I updated my solrconfig.xml
from
<dataDir>${solr.data.dir:./solr/data}</dataDir>
to
<dataDir>/usr/local/tomcat2/data/solr/dev_d7/data</dataDir>
2. I placed my elevate.xml in my solr's data directory. Based on forum answers,
I thought
I did some testing, and evidently the "meta" field is treated specially by
the ERH.
I copied the example schema, and added both "meta" and "metax" fields and
set "fmap.content=metax", and lo and behold only the doc content appears in
"metax", but all the doc metadata appears in "meta".
Alt
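If the goal is to keep Tika metadata out of the content entirely, one approach is to map unknown/metadata fields to an ignored prefix (a sketch; the handler path and "ignored_" prefix follow the stock example config):

```xml
<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">metax</str>
    <!-- any Tika-generated field the schema doesn't know goes to ignored_* -->
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>
```

paired in schema.xml with a dynamicField of type "ignored", e.g. `<dynamicField name="ignored_*" type="ignored"/>`.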
I did some small research with a fairly modest result:
https://github.com/m-khl/solr-patches/tree/streaming
you can start exploring it from the trivial test
https://github.com/m-khl/solr-patches/blob/17cd45ce7693284de08d39ebc8812aa6a20b8fb3/solr/core/src/test/org/apache/solr/response/ResponseStreaming
I use jetty that comes with solr.
I use solr's dedupe:
<processor class="solr.processor.SignatureUpdateProcessorFactory">
  <bool name="enabled">true</bool>
  <str name="signatureField">id</str>
  <bool name="overwriteDupes">true</bool>
  <str name="fields">url</str>
  <str name="signatureClass">solr.processor.Lookup3Signature</str>
</processor>
and because of this id is not url itself but its encoded signature.
I see solrclean uses url to delete
How interesting! You know, I did at one point consider that perhaps the
fieldname "meta" may be treated specially, but I talked myself out of it. I
reasoned that a field name in my local schema should have no bearing on how
a plugin such as solr-cell/Tika behaves. I should have tested my
hypothesis
Hi:
I have been working on an integration project involving Solr 3.5.0 that
dynamically registers cores as needed at run-time, but does not contain any
cores by default. The current solr.xml configuration file is:-
This configuration does not include any cores as those are created
dynamica
: String dateString = "20101230";
: SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
: Date date = sdf.parse(dateString);
: doc.addField("date", date);
:
: In the index, the date "20101230" is saved as "2010-12-29T23:00:00Z" ( because
: of GMT).
"because of GMT" is misleading and vague
On Wed, May 2, 2012 at 12:16 PM, Ken Krugler
wrote:
> What confuses me is that Suggester says it's based on SpellChecker, which
> supposedly does work with shards.
>
It is based on spellchecker apis, but spellchecker's ranking is based
on simple comparators like string similarity, whereas sugge
i've installed tomcat7 and solr 3.6.0 on linux/64
i'm trying to get a single webapp + multicore setup working. my efforts
have gone off the rails :-/ i suspect i've followed too many of the
wrong examples.
i'd appreciate some help/direction getting this working.
so far, i've configured
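for reference, a minimal 3.x-era multicore solr.xml looks like this (core names and instanceDir paths are placeholders):

```xml
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>
```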
> BTW, in 4.0, there's DocumentWriterPerThread that
> merges in the background
It flushes without pausing, but does not perform merges. Maybe you're
thinking of ConcurrentMergeScheduler?
On Wed, May 2, 2012 at 7:26 AM, Erick Erickson wrote:
> Optimizing is much less important query-speed wise
>
Hello everybody,
I have a question about synonyms in Solr. In our company we are looking
for a way to resolve synonyms from a database rather than from a text file
as SynonymFilterFactory does.
The idea is to save all the synonyms in the database, index them, and they will
be ready to
I'm not sure I completely follow, but are you simply saying that you want to
have a synonym filter that reads the synonym table from a database rather
than the current text file? If so, sure, you could develop a replacement for
the current synonym filter which loads its table from a database, bu
Another solution is to write a script to read the database and create the
synonyms.txt file, dump the file to solr and reload the core.
This gives you the custom synonym solution.
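For the script route, a small sketch of just the formatting step (the table/column names and JDBC plumbing are omitted as assumptions; this only builds SynonymFilterFactory's comma-separated line format for synonyms.txt):

```java
import java.util.Arrays;
import java.util.List;

public class SynonymFileBuilder {
    // Turn one (term -> synonyms) row, e.g. fetched via JDBC, into the
    // comma-separated line format synonyms.txt uses: "term,syn1,syn2".
    static String toSynonymLine(String term, List<String> synonyms) {
        StringBuilder sb = new StringBuilder(term);
        for (String s : synonyms) sb.append(',').append(s);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSynonymLine("tv", Arrays.asList("television", "telly")));
    }
}
```

After writing the file to the core's conf directory, a core RELOAD makes Solr pick it up.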
-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Wednesday, May 02, 2012 4:5
I don't know if this will help, but I usually add a dataDir element to
each core's solrconfig.xml to point at a local data folder for the core,
like this:
<dataDir>${solr.data.dir:./solr/core0/data}</dataDir>
-Original Message-
From: loc...@mm.st [mailto:loc...@mm.st]
Sent: Wednesday, May 0
I chronicled exactly what I had to configure to slay this dragon at
http://vinaybalamuru.wordpress.com/2012/04/12/solr4-tomcat-multicor/
Hope that helps
--
View this message in context:
http://lucene.472066.n3.nabble.com/need-some-help-with-a-multicore-config-of-solr3-6-0-tomcat7-mine-reports-Se
You are missing the "pf", "pf2", and "pf3" request parameters, which say
which fields to do phrase proximity boosting on.
"pf" boosts using the whole query as a phrase, "pf2" boosts bigrams, and
"pf3" boosts trigrams.
You can use any combination of them, but if you use none of them, "ps"
app
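A sketch of what that might look like in a request handler's defaults (field names and boosts are made up):

```xml
<str name="defType">edismax</str>
<str name="qf">title^2 body</str>
<str name="pf">title^5 body^2</str>
<str name="pf2">title^3 body</str>
<str name="pf3">title^2 body</str>
<str name="ps">2</str>
```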
Thanks for your answers. Now I have another question: if I develop the
filter to replace the current synonym filter, I understand that this
process would happen at indexing time, because query-time synonym expansion
has a lot of known problems. If so, how can I create my index
file?
Hello Prabhu,
Look at SPM for Solr (URL in sig below). It includes Index Statistics graphs,
and from these graphs you can tell:
* how many docs are in your index
* how many docs are deleted
* size of index on disk
* number of index segments
* number of index files
* maybe something else I'm for
Anyone have any clues about this exception? It happened during the
course of normal indexing. This is new to me (we're running solr 3.6 on
tomcat 6/redhat RHEL) and we've been running smoothly for some time now
until this showed up:
>>>Red Hat Enterprise Linux Server release 5.3 (Tikanga)
>>>
: How do I search for things that have no value or a specified value?
Things with no value...
(*:* -fieldName:[* TO *])
Things with a specific value...
fieldName:A
Things with no value or a specific value...
(*:* -fieldName:[* TO *]) fieldName:A
..."or" if you aren't using
Sounds good. "OR" in the negation of any query that matches any possible
value in a field.
The Solr query parser doc lists the open range as you used:
-field:[* TO *] finds all documents without a value for field
See:
http://wiki.apache.org/solr/SolrQuerySyntax
This also include pure wil
Oops... that is:
(-fname:*) OR fname:(A B C)
or
(-fname:[* TO *]) OR fname:(A B C)
-- Jack Krupansky
-Original Message-
From: Jack Krupansky
Sent: Wednesday, May 02, 2012 7:48 PM
To: solr-user@lucene.apache.org
Subject: Re: syntax for negative query OR something
Sounds good. "OR
Hmmm... I thought that worked in edismax. And I thought that pure negative
queries were allowed in SolrQueryParser. Oh well.
In any case, in the Lucene or Solr query parser, add "*:*" to select all
docs before negating the docs that have any value in the field:
(*:* -fname:*) OR fname:(A B C)
There are lots of different strategies for dealing with synonyms, depending
on what exactly is most important and what exactly you are willing to
tolerate.
In your latest example, you seem to be using string fields, which is
somewhat different from the text synonyms we talk about in Solr. You
(12/05/03 1:39), Noordeen, Roxy wrote:
Hello,
I just started using elevation for solr. I am on solr 3.5, running with Drupal
7, Linux.
1. I updated my solrconfig.xml
from
<dataDir>${solr.data.dir:./solr/data}</dataDir>
to
<dataDir>/usr/local/tomcat2/data/solr/dev_d7/data</dataDir>
2. I placed my elevate.xml in my solr's data dire
I think a regular sync of the database table with the synonym text file is
the simplest solution. It lets you use Solr natively without any
customization, and updating the synonyms file from entries in the database
is not a very complicated operation.
thanks!
On Wed, May 2, 2012 at 4:43 PM, Chris Hostetter
wrote:
>
> : How do I search for things that have no value or a specified value?
>
> Things with no value...
> (*:* -fieldName:[* TO *])
> Things with a specific value...
> fieldName:A
> Things with no value or a specific val
Jack,
Yes, the queries work fine till I hit the OOM. The fields that start with
S_* are strings, F_* are floats, I_* are ints, and so on. The dynamic field
definitions from schema.xml:
*Each FieldCache will be an array with maxdoc entries (your total number of
documents - 1.4 mil
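A back-of-the-envelope sketch of why this grows (the field count is illustrative, and actual per-entry overhead varies by field type and Lucene version; the linear growth with maxDoc and number of faceted fields is the point):

```java
public class FieldCacheSizing {
    // Rough model: each faceted field gets a FieldCache array with one entry
    // per document (maxDoc). bytesPerEntry=4 matches int/float fields;
    // string fields cost considerably more.
    static long fieldCacheBytes(long maxDoc, int numFields, int bytesPerEntry) {
        return maxDoc * numFields * bytesPerEntry;
    }

    public static void main(String[] args) {
        long maxDoc = 1_400_000L;                      // from the thread: ~1.4M docs
        long bytes = fieldCacheBytes(maxDoc, 50, 4);   // faceting on 50 numeric fields
        System.out.println(bytes / (1024 * 1024) + " MB"); // array storage alone
    }
}
```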
Hi Francois,
The issue you describe looks like a similar issue we have fixed before
with matches count.
Open an issue and we can look into it.
Martijn
On 1 May 2012 20:14, Francois Perron
wrote:
> Thanks for your response Cody,
>
> First, I used distributed grouping on 2 shards and I'm sure th
Hi,
I just wanted to get some information about whether the Parent-Child
relationship between documents that Lucene has been talking about has been
implemented in Solr or not. I know the join patch is available; would that
be the only solution?
And another question, as and when this will be possible (if
It seems like the slave instance starts to pull the index from the master and
then dies, which causes a broken pipe at the master node.
On Thu, May 3, 2012 at 3:31 AM, Robert Petersen wrote:
> Anyone have any clues about this exception? It happened during the
> course of normal indexing. This is new to me (w