5G memory per JVM
There is a patch that fixes UTF-8 and performance issues with Jetty. So I
would recommend you use the patched version in 3.1/4.0.
On 4/13/11 9:47 AM, "stockii" wrote:
>is it necessary to update for solr ?
>
>-
>--- System
>
Can Solr list fields in fl=... this way: fl=!fieldName,score?
Floyd
2011/4/14 Otis Gospodnetic
> Floyd,
>
> You need to explicitly list all fields in &fl=...
>
> Otis
>
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
: Does anyone know if there is a Solr/Lucene user group /
: birds-of-feather that meets in Seattle?
I don't live in Seattle, but this group used to send meeting announcements
to solr-user promoting "Seattle Hadoop/Lucene/NoSQL" Meetups. They still
list "solr" in their keywords, but not in their
I have come across an issue with the DIH where I get a null exception when
pre-caching entities. I expect my entity to have null values so this is a bit
of a roadblock for me. The issue was described more succinctly in this
discussion:
http://lucene.472066.n3.nabble.com/DataImportHandlerExcepti
Hi all,
Does anyone know if there is a Solr/Lucene user group /
birds-of-feather that meets in Seattle?
If not, I'd like to start one up. I'd love to learn and share tricks
pertaining to NRT, performance, distributed solr, etc.
Also, I am planning on attending the Lucene Revolution!
Let's conn
Hi Ken,
It sounds like you want to just sort by "time changed/added" (reverse chrono
order). I would not worry about issues just yet unless you have some reasons
to
think this is going to cause problems (e.g. giant index, low RAM). Jonathan is
right about commits, and the NRT-ness of search
Floyd,
You need to explicitly list all fields in &fl=...
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
> From: Floyd Wu
> To: solr-user@lucene.apache.org
> Sent: Wed, April 13, 2011 2:34:49
all documents. But I would want the sort to be at the system level; I don't
want the overhead of sorting every query I ever make.
How would 'doing it at the system level' avoid the 'overhead of sorting
every query'? Every query has to be sorted, if you want it sorted.
Beyond setting a def
Hi,
I'm not sure how Solr allows for adjusting these Tika settings to get the
desired output. At least a few desirable Tika subsystems cannot be called from
the ExtractingRequestHandler such as Tika's BoilerPlateContentHandler. I'm
also not really sure if it's a good idea to normalize diacritic
As Hoss mentioned earlier in the thread, you can use the statistics page
from the admin console to view the current number of segments. But if you
want to know by looking at the files, each segment will have a unique
prefix, such as "_u". There will be one unique prefix for every segment in
the ind
> Is a new DocID generated everytime a doc with the same UniqueID is added to
> the index? If so, then docID must be incremental and would look like
> indexed_at ascending. What I see (and why it's a problem for me) is the
> following.
Yes, Solr removes the old and inserts a new when updating an
Hi all,
I'm wondering if there are any knobs or levers i can set in
solrconfig.xml that affect how pdfbox text extraction is performed by
the extraction handler. I would like to take advantage of pdfbox's
ability to normalize diacritics and ligatures [1], but that doesn't
seem to be the default be
Is a new DocID generated every time a doc with the same UniqueID is added to
the index? If so, then docID must be incremental and would look like
indexed_at ascending. What I see (and why it's a problem for me) is the
following.
a search brings back the first 5 documents in a result set of say 60.
You have to specify the query. In the query you will have the fq parameter, which
is a filter query.
http://wiki.apache.org/solr/solr-ruby
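As a rough sketch (the field and query here are made up), a filter query is just an
extra fq parameter on the request; each distinct fq value is what gets cached in the
filterCache:
curl 'http://localhost:8983/solr/select?q=ipod&fq=inStock:true&wt=ruby'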
-Original Message-
From: soumya rao [mailto:soumrao...@gmail.com]
Sent: Wednesday, April 13, 2011 2:27 PM
To: solr-user@lucene.apache.org
Subject: Re: Reg
Sorting a large set is costly; the more fields you sort on, the more memory is
consumed (and likely cached).
If I remember correctly, the result set will be ordered according to Lucene
DocIDs if there's nothing to sort on.
If I read correctly, you don't want to specify those fixed sort paramete
From the post.jar I think that you can do something like...
java -jar post.jar A*.xml
java -jar post.jar B*.xml
java -jar post.jar C*.xml
java -jar post.jar D*.xml
(I'm on Windows)
On Wed, Apr 13, 2011 at 4:41 PM, Markus Jelsma
wrote:
> Either put all documents in a large file or loop over them
Either put all documents in a large file or loop over them with a simple shell
script.
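Untested sketch (adjust the URL and file path to your setup):
for f in A*.xml B*.xml C*.xml D*.xml; do
  curl 'http://localhost:8080/solr/update' --data-binary @"$f" -H 'Content-type: text/xml; charset=utf-8'
done
curl 'http://localhost:8080/solr/update' --data-binary '<commit/>' -H 'Content-type: text/xml; charset=utf-8'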
> Hey guys, how do you curl update all the XML inside a folder from A-D?
> Example: curl http://localhost:8080/solr update
> Sent from my iPhone
If you omitNorms and omitTermFreqAndPositions on the query field(s) and use no
funky boost functions, all results will have identical score in AND-queries
(or queries with one search term). IDF has no meaning because of AND,
queryNorm is the same across the resultset, fieldNorm is 1 and TF is 1.
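For reference, a field with those options turned on would be declared along these
lines in schema.xml (the field name and type are only examples):
<field name="title" type="text" indexed="true" stored="true"
       omitNorms="true" omitTermFreqAndPositions="true"/>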
Hey guys, how do you curl update all the XML inside a folder from A-D?
Example: curl http://localhost:8080/solr update
Sent from my iPhone
You should just ask me.
Sent from my iPhone
On Apr 13, 2011, at 11:27 AM, soumya rao wrote:
> Thanks for the reply Josh.
>
> And where should I make changes in ruby to add filters?
>
> Soumya
>
> On Wed, Apr 13, 2011 at 11:20 AM, Joshua Bouchair <
> joshuabouch...@wasserstrom.com> wrote:
>
Au contraire, I have almost 4 million documents, representing businesses in
the US. And having the score be the same is a very common occurrence.
It is quite clear from testing that if score is the same, then it sorts on
indexed_at ascending. It seems silly to make me add a sort on every query,
th
In real life though, it seems unlikely that the relevancy score will
ever be identical, so the second sort field will never be used. Is
relevancy score ever identical? Rarely at any rate.
On 4/13/2011 3:22 PM, Rob Casson wrote:
you could just explicitly send multiple sorts...from the tutoria
you could just explicitly send multiple sorts...from the tutorial:
&sort=inStock asc, price desc
cheers.
On Wed, Apr 13, 2011 at 2:59 PM, kenf_nc wrote:
> Is sort order when 'score' is the same a Lucene thing? Should I ask on the
> Lucene forum?
>
Is sort order when 'score' is the same a Lucene thing? Should I ask on the
Lucene forum?
Not cleanly currently. SOLR-2193: Re-architect Update Handler, should take care
of this though.
- Mark
On Apr 12, 2011, at 8:21 AM, stockii wrote:
> Hello.
>
> When I start an optimize (which takes more than 4 hours) no updates from
> DIH are possible.
> I thought Solr copies the whole index a
Hi,
As I understand it, using fl=*,score means all fields plus the score are returned in
the search result, and if a field is stored, all of its text is returned as part of the
result.
Now I have 2x fields; some of the field names have no prefix or fixed naming
rule, so it cannot be predicted what the names will be.
I
Thanks for the reply Josh.
And where should I make changes in ruby to add filters?
Soumya
On Wed, Apr 13, 2011 at 11:20 AM, Joshua Bouchair <
joshuabouch...@wasserstrom.com> wrote:
> Uncomment solrconfig.xml at the following location.
>
>
>
> Josh B.
>
> -Original Message-
> From: so
Uncomment the following location in solrconfig.xml.
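A filterCache entry in solrconfig.xml looks roughly like this (the sizes are just
the stock example values):
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="0"/>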
Josh B.
-Original Message-
From: soumya rao [mailto:soumrao...@gmail.com]
Sent: Wednesday, April 13, 2011 1:59 PM
To: solr-user@lucene.apache.org
Subject: Regarding filterquery
Hi,
I am a newbie to solr. I could see that the quer
Hi,
I am a newbie to Solr. I can see that the queries are not cached. I would
like to apply the filterCache to queries in Ruby. Can anyone provide me the
syntax for this, please?
Thanks.
Name equals the product name.
Each separate product can have 1 to n prices based upon pricelist.
A single document represents that single product.
doc 1: name = The product name; prices = 1.00, 0.99, 0.98, 0.85
doc 2: name = The product name; price = 1.10
Is NAME a product name? Why would it be multivalue? And why would it appear
on more than one document? Is each 'document' a package of products? And
the pricing tiers are on the package, not individual pieces?
So sounds like you could, potentially, have a PriceListX column for each
user. As your
Is your current Solr installation with Jetty 6 working well for you in
a production environment?
I don't know enough about Jetty to help you further on this question.
On Wed, Apr 13, 2011 at 10:47 AM, stockii wrote:
> is it necessary to update for solr ?
>
> -
> ---
is it necessary to update for solr ?
-
--- System
One Server, 12 GB RAM, 2 Solr Instances, 7 Cores,
1 Core with 31 Million Documents other Cores < 100.000
- Solr1 for Search-Requests - commit every Minute - 5GB Xmx
- Sol
: Subject: phpnative response writer in SOLR 3.1 ?
: References:
: <15647_1302703023_zzh0o1kefjfix.00_4da5abae.5070...@uni-bielefeld.de>
: <0d30a85b-b981-4c27-9dbe-7fc8e0619...@gmail.com>
: In-Reply-To: <0d30a85b-b981-4c27-9dbe-7fc8e0619...@gmail.com>
http://people.apache.org/~hossman/#thread
I found this link after googling for a few minutes.
http://wiki.eclipse.org/Jetty/Howto/Upgrade_from_Jetty_6_to_Jetty_7
I hope that helps
Also, a question like this may be more appropriate for a jetty mailing list.
On Wed, Apr 13, 2011 at 8:44 AM, ramires wrote:
> hi
>
> how to update jetty 6 t
Don't know of any other way to organize the documents. We need to have the
specific price that belongs to the user, so I don't think that the facets would
be the issue. The facet querying would be modified to the corresponding price
list field for that user. Let's say the customer belongs to pri
Thanks both for your replies
Eric,
Yep, I use the Analysis page extensively, but what I was directly looking
for was whether all, or only the last line, of the values given by the analysis
page were eventually indexed.
I think we've concluded it's only the last line.
Cheers,
Ben
On Wed, Apr 13, 2011
Indexing isn't a problem, it's just disk space and space is cheap. But, if
you do facets on all those price columns, that gets put into RAM which isn't
as cheap or plentiful. Your cache buffers may get overloaded a lot and
performance will suffer.
2000 price columns seems like a lot, could the doc
We have an ecommerce application (B2C/B2B) with a large number of price lists,
ranging into 2000+ and growing. They want to index prices to have facets and
sorting. That seems like it would be a lot of columns to index; example below:
INDEX COLUMN: NamePrice PriceList1Price
On Wed, Apr 13, 2011 at 10:00 AM, Marco Martinez
wrote:
> It seems that it is a problem with my own query; now I need to investigate if
> there is something different between a normal query and my implementation of
> the query, because if you use it alone, it works properly.
Look at your advance() i
Hi Erik,
never mind.
Can't reproduce this strange behavior.
Obviously stopping and starting of solr solved this.
Thanks,
Bernd
On 13.04.2011 16:00, Erik Hatcher wrote:
What does the parsed query look like with debugQuery=true for both scenarios?
Any difference?
Doesn't make any sense that e
This is invalid XML. Entities must be encoded or embedded within CDATA tags.
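For example, using a made-up <url> element, either escape the ampersands:
<url>http://www.example.com/?cp=30_s&amp;st=a&amp;c=655</url>
or wrap the raw value in CDATA:
<url><![CDATA[http://www.example.com/?cp=30_s&st=a&c=655]]></url>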
On Wednesday 13 April 2011 16:10:51 Rosa (Anuncios) wrote:
> Hi
>
> I'm having an error when I import an XML file with DIH.
>
> In this file there is a URL which looks like this:
>
> http://www.example.com/?cp=30_s&st=a
Hi
I'm having an error when I import an XML file with DIH.
In this file there is a URL which looks like this:
http://www.example.com/?cp=30_s&st=a&c=655
Apparently the issue is with the "=" character?
Is there any workaround?
Error trace:
rows processed:0 Processing Document # 849
at
Hello,
I just updated to Solr 3.1 and am wondering if the phpnative response
writer plugin is part of it?
( https://issues.apache.org/jira/browse/SOLR-1967 )
When I try to compile the sources files I get some errors :
PHPNativeResponseWriter.java:57:
org.apache.solr.request.PHPNativeResponseWri
It seems that it is a problem with my own query; now I need to investigate if
there is something different between a normal query and my implementation of
the query, because if you use it alone, it works properly.
Thanks,
Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa,
What does the parsed query look like with debugQuery=true for both scenarios?
Any difference? Doesn't make any sense that echoParams would have an effect,
unless somehow your search client is relying on parameters returned to do
something with them.?!
Erik
On Apr 13, 2011, at 09:57 ,
Dear list,
after setting "echoParams" to "none" wildcard search isn't working.
Only if I set "echoParams" to "explicit" then wildcard is possible.
http://wiki.apache.org/solr/CoreQueryParameters
states that "echoParams" is for debugging purposes.
We use Solr 3.1.0.
Snippet from solrconfig.xml:
I'm using version 1.4.1. It appears that when several documents in a result
set have the same score, the secondary sort is by 'indexed_at' ascending.
Can this be altered in the config xml files? If I wanted the secondary sort
to be indexed_at descending for example, or by a different field, say
doc
Hi,
How do I update Jetty 6 to Jetty 7?
CharFilterFactories are applied to the raw input before tokenization.
Each token output from the tokenization is then sent through
the rest of the chain.
The Analysis page available from the Solr admin page is
invaluable in answering in great detail what each part of
an analysis chain does.
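A hypothetical fieldType in schema.xml that shows the ordering:
<fieldType name="text_example" class="solr.TextField">
  <analyzer>
    <!-- char filters see the raw input first -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- the tokenizer then splits the filtered text into tokens -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- token filters run last, on each token -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>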
Token
Or is only the final value after completing the whole chain indexed?
Yes.
Koji
--
http://www.rondhuit.com/en/
On Apr 13, 2011, at 12:06 AM, Liam O'Boyle wrote:
> Afternoon,
>
> After an upgrade to Solr 3.1 which has largely been very smooth and
> painless, I'm having a minor issue with the ExtractingRequestHandler.
>
> The problem is that it's inserting metadata into the extracted
> content, as well as
> I would like to build a component that during indexing
> analyses all tokens
> in a stream and adds metadata to a new field based on my
> analysis. I have
> different tasks that I would like to perform, like basic
> classification and
> certain more advanced phrase detections. How would I do
> th
Hi there,
Just a quick question that the wiki page (
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters) didn't seem to
answer very well.
Given an analyzer that has zero or more Char Filter Factories, one
Tokenizer Factory, and zero or more Token Filter Factories, which value(s)
are ind
Yes, you can assume this since that's the only
way new content will be searchable, as you've
discovered
Best
Erick
On Wed, Apr 13, 2011 at 4:42 AM, Reeza Edah Tally wrote:
> Thanks,
>
> I changed my searching to be triggered on a newSearcher event instead and
> use the new searcher to retrie
Erick,
I was under the misconception that a Solr "transaction" is ACID.
From what you said, I guess Solr "transactions" are not Isolated.
Thanks,
Phong
On Tue, Apr 12, 2011 at 2:54 PM, Erick Erickson wrote:
> See below:
>
> On Tue, Apr 12, 2011 at 2:21 PM, Phong Dais wrote:
>
> > Erick,
> >
>
If you are using the dismax query parser, perhaps you could take a look at
the minimum should match parameter 'mm':
http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
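A rough sketch of such a request (the qf fields are invented); with three query terms,
mm=2 means only two of them have to match:
curl 'http://localhost:8983/solr/select?defType=dismax&qf=name+description&mm=2&q=Blue+Wool+Rugs'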
Ludovic.
2011/4/13 Mark Mandel [via Lucene] <
ml-node+2815186-149863473-383...@n3.nabble.com>
Thanks!
I searched high and low for that, couldn't see it in front of my face!
Mark
On Wed, Apr 13, 2011 at 6:32 PM, Pierre GOSSE wrote:
> For (a) I don't think anything exists today providing this mechanism.
> But (b) is a good description of the dismax handler with a MM parameter of
> 66%.
>
Thanks,
I changed my searching to be triggered on a newSearcher event instead and
use the new searcher to retrieve the documents. This works.
Btw can I assume that a new searcher will always be created soon after a
commit?
Regards,
Reeza
-Original Message-
From: Otis Gospodnetic [mailto
For (a) I don't think anything exists today providing this mechanism.
But (b) is a good description of the dismax handler with a MM parameter of 66%.
Pierre
-Original Message-
From: Mark Mandel [mailto:mark.man...@gmail.com]
Sent: Wednesday, April 13, 2011 10:04
To: solr-user@lucene.ap
Not sure if the title explains it all, or if what I want is even possible,
but figured I would ask.
Say, I have a series of products I'm selling, and a search of:
"Blue Wool Rugs"
Comes in. This returns 0 results, as "Blue" and "Rugs" match terms that are
indexes, "Wool" does not.
Is there a w
No, this query returns a few more documents than if I do it with the Lucene query
parser. I'm going to generate another query parser that sends a simple term
query and see what the output is; when I have it, I will report back on the list.
Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de
"The current limitation or pause is when the ram buffer is flushing to disk "
-> When an optimize starts and runs for ~4 hours, are you saying that DIH is
flushing the docs into the index during this "pause"?
-
--- System
On
Afternoon,
After an upgrade to Solr 3.1 which has largely been very smooth and
painless, I'm having a minor issue with the ExtractingRequestHandler.
The problem is that it's inserting metadata into the extracted
content, as well as mapping it to a dynamic field. Previously the
same configuration
Bill Bell wrote:
>
> Just set up your schema with a "string" multivalued field...
>
I've this in my schema:
Worked.. Thanks...
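A declaration of that kind looks roughly like this (the field name is only an example):
<field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>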