5G memory per JVM
There is a patch that fixes UTF-8 and performance issues with Jetty. So I
would recommend you use the patched version in 3.1/4.0.
On 4/13/11 9:47 AM, "stockii" wrote:
>is it necessary to update for solr ?
>
>-
>--- System
>
Can Solr list fields in fl=... this way: fl=!fieldName,score?
Floyd
2011/4/14 Otis Gospodnetic
> Floyd,
>
> You need to explicitly list all fields in &fl=...
>
> Otis
>
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
: Does anyone know if there is a Solr/Lucene user group /
: birds-of-feather that meets in Seattle?
I don't live in Seattle, but this group used to send meeting announcements
to solr-user promoting "Seattle Hadoop/Lucene/NoSQL" Meetups. They still
list "solr" in their keywords, but not in their
I have come across an issue with the DIH where I get a null exception when
pre-caching entities. I expect my entity to have null values so this is a bit
of a roadblock for me. The issue was described more succinctly in this
discussion:
http://lucene.472066.n3.nabble.com/DataImportHandlerExcepti
Hi all,
Does anyone know if there is a Solr/Lucene user group /
birds-of-feather that meets in Seattle?
If not, I'd like to start one up. I'd love to learn and share tricks
pertaining to NRT, performance, distributed solr, etc.
Also, I am planning on attending the Lucene Revolution!
Let's conn
Hi Ken,
It sounds like you want to just sort by "time changed/added" (reverse chrono
order). I would not worry about issues just yet unless you have some reasons
to
think this is going to cause problems (e.g. giant index, low RAM). Jonathan is
right about commits, and the NRT-ness of search
Floyd,
You need to explicitly list all fields in &fl=...
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
> From: Floyd Wu
> To: solr-user@lucene.apache.org
> Sent: Wed, April 13, 2011 2:34:49
all documents. But I would want the sort to be at the system level; I don't
want the overhead of sorting every query I ever make.
How would 'doing it at the system level' avoid the 'overhead of sorting
every query'? Every query has to be sorted, if you want it sorted.
Beyond setting a def
Hi,
I'm not sure how Solr allows for adjusting these Tika settings to get the
desired output. At least a few desirable Tika subsystems cannot be called from
the ExtractingRequestHandler such as Tika's BoilerPlateContentHandler. I'm
also not really sure if it's a good idea to normalize diacritic
As Hoss mentioned earlier in the thread, you can use the statistics page
from the admin console to view the current number of segments. But if you
want to know by looking at the files, each segment will have a unique
prefix, such as "_u". There will be one unique prefix for every segment in
the ind
> Is a new DocID generated everytime a doc with the same UniqueID is added to
> the index? If so, then docID must be incremental and would look like
> indexed_at ascending. What I see (and why it's a problem for me) is the
> following.
Yes, Solr removes the old and inserts a new when updating an
Hi all,
I'm wondering if there are any knobs or levers i can set in
solrconfig.xml that affect how pdfbox text extraction is performed by
the extraction handler. I would like to take advantage of pdfbox's
ability to normalize diacritics and ligatures [1], but that doesn't
seem to be the default be
Is a new DocID generated every time a doc with the same UniqueID is added to
the index? If so, then docID must be incremental and would look like
indexed_at ascending. What I see (and why it's a problem for me) is the
following.
a search brings back the first 5 documents in a result set of say 60.
You have to specify the query. In the query you will have the fq parameter, which
is a filter query.
http://wiki.apache.org/solr/solr-ruby
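As a rough sketch (the field and query here are made up), a filter query is just an
extra fq parameter on the request; each distinct fq value is what gets cached in the
filterCache:
curl 'http://localhost:8983/solr/select?q=ipod&fq=inStock:true&wt=ruby'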
-Original Message-
From: soumya rao [mailto:soumrao...@gmail.com]
Sent: Wednesday, April 13, 2011 2:27 PM
To: solr-user@lucene.apache.org
Subject: Re: Reg
Sorting a large set is costly; the more fields you sort on, the more memory is
consumed (and likely cached).
If I remember correctly, the result set will be ordered according to Lucene
DocIDs if there's nothing to sort on.
If I read correctly, you don't want to specify those fixed sort paramete
From the post.jar I think that you can do something like...
java -jar post.jar A*.xml
java -jar post.jar B*.xml
java -jar post.jar C*.xml
java -jar post.jar D*.xml
(I'm on Windows)
On Wed, Apr 13, 2011 at 4:41 PM, Markus Jelsma
wrote:
> Either put all documents in a large file or loop over them
Either put all documents in a large file or loop over them with a simple shell
script.
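Untested sketch (adjust the URL and file path to your setup):
for f in A*.xml B*.xml C*.xml D*.xml; do
  curl 'http://localhost:8080/solr/update' --data-binary @"$f" -H 'Content-type: text/xml; charset=utf-8'
done
curl 'http://localhost:8080/solr/update' --data-binary '<commit/>' -H 'Content-type: text/xml; charset=utf-8'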
> Hey guys, how do you curl update all the XML inside a folder from A-D?
> Example: curl http://localhost:8080/solr update
> Sent from my iPhone
If you omitNorms and omitTermFreqAndPositions on the query field(s) and use no
funky boost functions, all results will have identical score in AND-queries
(or queries with one search term). IDF has no meaning because of AND,
queryNorm is the same across the resultset, fieldNorm is 1 and TF is 1.
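For reference, a field with those options turned on would be declared along these
lines in schema.xml (the field name and type are only examples):
<field name="title" type="text" indexed="true" stored="true"
       omitNorms="true" omitTermFreqAndPositions="true"/>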
Hey guys, how do you curl update all the XML inside a folder from A-D?
Example: curl http://localhost:8080/solr update
Sent from my iPhone
You should just ask me.
Sent from my iPhone
On Apr 13, 2011, at 11:27 AM, soumya rao wrote:
> Thanks for the reply Josh.
>
> And where should I make changes in ruby to add filters?
>
> Soumya
>
> On Wed, Apr 13, 2011 at 11:20 AM, Joshua Bouchair <
> joshuabouch...@wasserstrom.com> wrote:
>
Au contraire, I have almost 4 million documents, representing businesses in
the US. And having the score be the same is a very common occurrence.
It is quite clear from testing that if score is the same, then it sorts on
indexed_at ascending. It seems silly to make me add a sort on every query,
th
In real life though, it seems unlikely that the relevancy score will
ever be identical, so the second sort field will never be used. Is
relevancy score ever identical? Rarely at any rate.
On 4/13/2011 3:22 PM, Rob Casson wrote:
you could just explicitly send multiple sorts...from the tutoria
you could just explicitly send multiple sorts...from the tutorial:
&sort=inStock asc, price desc
cheers.
On Wed, Apr 13, 2011 at 2:59 PM, kenf_nc wrote:
> Is sort order when 'score' is the same a Lucene thing? Should I ask on the
> Lucene forum?
>
Is sort order when 'score' is the same a Lucene thing? Should I ask on the
Lucene forum?
Not cleanly currently. SOLR-2193: Re-architect Update Handler, should take care
of this though.
- Mark
On Apr 12, 2011, at 8:21 AM, stockii wrote:
> Hello.
>
> When I start an optimize (which takes more than 4 hours) no updates from
> DIH are possible.
> I thought Solr copies the whole index a
Hi,
As I understand it, using fl=*,score means all fields plus the score are returned in
the search result, and if a field is stored, all of its text is returned as part of the
result.
Now I have 2x fields; some of the field names have no prefix or fixed naming
rule, so it cannot be predicted what the names will be.
I
Thanks for the reply Josh.
And where should I make changes in ruby to add filters?
Soumya
On Wed, Apr 13, 2011 at 11:20 AM, Joshua Bouchair <
joshuabouch...@wasserstrom.com> wrote:
> Uncomment solrconfig.xml at the following location.
>
>
>
> Josh B.
>
> -Original Message-
> From: so
Uncomment the following location in solrconfig.xml.
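A filterCache entry in solrconfig.xml looks roughly like this (the sizes are just
the stock example values):
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="0"/>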
Josh B.
-Original Message-
From: soumya rao [mailto:soumrao...@gmail.com]
Sent: Wednesday, April 13, 2011 1:59 PM
To: solr-user@lucene.apache.org
Subject: Regarding filterquery
Hi,
I am a newbie to solr. I could see that the quer
Hi,
I am a newbie to Solr. I can see that the queries are not cached. I would
like to apply the filterCache to queries in Ruby. Can anyone provide me the
syntax for this, please?
Thanks.
Name equals the product name.
Each separate product can have 1 to n prices based upon pricelist.
A single document represents that single product.
doc 1: name = The product name; prices = 1.00, 0.99, 0.98, 0.85
doc 2: name = The product name; price = 1.10
Is NAME a product name? Why would it be multivalue? And why would it appear
on more than one document? Is each 'document' a package of products? And
the pricing tiers are on the package, not individual pieces?
So sounds like you could, potentially, have a PriceListX column for each
user. As your
Is your current Solr installation with Jetty 6 working well for you in
a production environment?
I don't know enough about Jetty to help you further on this question.
On Wed, Apr 13, 2011 at 10:47 AM, stockii wrote:
> is it necessary to update for solr ?
>
> -
> ---
is it necessary to update for solr ?
-
--- System
One Server, 12 GB RAM, 2 Solr Instances, 7 Cores,
1 Core with 31 Million Documents other Cores < 100.000
- Solr1 for Search-Requests - commit every Minute - 5GB Xmx
- Sol
: Subject: phpnative response writer in SOLR 3.1 ?
: References:
: <15647_1302703023_zzh0o1kefjfix.00_4da5abae.5070...@uni-bielefeld.de>
: <0d30a85b-b981-4c27-9dbe-7fc8e0619...@gmail.com>
: In-Reply-To: <0d30a85b-b981-4c27-9dbe-7fc8e0619...@gmail.com>
http://people.apache.org/~hossman/#thread
I found this link after googling for a few minutes.
http://wiki.eclipse.org/Jetty/Howto/Upgrade_from_Jetty_6_to_Jetty_7
I hope that helps
Also, a question like this may be more appropriate for a jetty mailing list.
On Wed, Apr 13, 2011 at 8:44 AM, ramires wrote:
> hi
>
> how to update jetty 6 t
Don't know of any other way to organize the documents. We need to have the
specific price that belongs to the user, so I don't think that the facets would
be the issue. The facet querying would be modified to the corresponding price
list field for that user. Let's say the customer belongs to pri
Thanks both for your replies
Eric,
Yep, I use the Analysis page extensively, but what I was directly looking
for was whether all, or only the last line, of the values given by the analysis
page were eventually indexed.
I think we've concluded it's only the last line.
Cheers,
Ben
On Wed, Apr 13, 2011
Indexing isn't a problem, it's just disk space and space is cheap. But, if
you do facets on all those price columns, that gets put into RAM which isn't
as cheap or plentiful. Your cache buffers may get overloaded a lot and
performance will suffer.
2000 price columns seems like a lot, could the doc
We have an ecommerce application (B2C/B2B) with a large number of price lists,
ranging into 2000+ and growing. They want to index prices to have facets and
sorting. That seems like it would be a lot of columns to index; example below:
INDEX COLUMN: NamePrice PriceList1Price
On Wed, Apr 13, 2011 at 10:00 AM, Marco Martinez
wrote:
> It seems that it is a problem with my own query; now I need to investigate if
> there is something different between a normal query and my implementation of
> the query, because if you use it alone, it works properly.
Look at your advance() i
Hi Erik,
never mind.
Can't reproduce this strange behavior.
Obviously stopping and starting of solr solved this.
Thanks,
Bernd
On 13.04.2011 16:00, Erik Hatcher wrote:
What does the parsed query look like with debugQuery=true for both scenarios?
Any difference?
Doesn't make any sense that e
This is invalid XML. Entities must be encoded or embedded within CDATA tags.
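For example, using a made-up <url> element, either escape the ampersands:
<url>http://www.example.com/?cp=30_s&amp;st=a&amp;c=655</url>
or wrap the raw value in CDATA:
<url><![CDATA[http://www.example.com/?cp=30_s&st=a&c=655]]></url>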
On Wednesday 13 April 2011 16:10:51 Rosa (Anuncios) wrote:
> Hi
>
> I'm having an error when I import an XML file with DIH.
>
> In this file there is a URL which looks like this:
>
> http://www.example.com/?cp=30_s&st=a
Hi
I'm having an error when I import an XML file with DIH.
In this file there is a URL which looks like this:
http://www.example.com/?cp=30_s&st=a&c=655
Apparently the issue is with the "=" character?
Is there any workaround?
Error trace:
rows processed:0 Processing Document # 849
at
Hello,
I just updated to Solr 3.1 and am wondering if the phpnative response
writer plugin is part of it?
( https://issues.apache.org/jira/browse/SOLR-1967 )
When I try to compile the sources files I get some errors :
PHPNativeResponseWriter.java:57:
org.apache.solr.request.PHPNativeResponseWri
It seems that it is a problem with my own query; now I need to investigate if
there is something different between a normal query and my implementation of
the query, because if you use it alone, it works properly.
Thanks,
Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa,
What does the parsed query look like with debugQuery=true for both scenarios?
Any difference? Doesn't make any sense that echoParams would have an effect,
unless somehow your search client is relying on parameters returned to do
something with them.?!
Erik
On Apr 13, 2011, at 09:57 ,
Dear list,
after setting "echoParams" to "none" wildcard search isn't working.
Only if I set "echoParams" to "explicit" then wildcard is possible.
http://wiki.apache.org/solr/CoreQueryParameters
states that "echoParams" is for debugging purposes.
We use Solr 3.1.0.
Snippet from solrconfig.xml:
I'm using version 1.4.1. It appears that when several documents in a result
set have the same score, the secondary sort is by 'indexed_at' ascending.
Can this be altered in the config xml files? If I wanted the secondary sort
to be indexed_at descending for example, or by a different field, say
doc
Hi,
How do I update Jetty 6 to Jetty 7?
CharFilterFactories are applied to the raw input before tokenization.
Each token output from the tokenization is then sent through
the rest of the chain.
The Analysis page available from the Solr admin page is
invaluable in answering in great detail what each part of
an analysis chain does.
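A hypothetical fieldType in schema.xml that shows the ordering:
<fieldType name="text_example" class="solr.TextField">
  <analyzer>
    <!-- char filters see the raw input first -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- the tokenizer then splits the filtered text into tokens -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- token filters run last, on each token -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>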
Token
Or is only the final value after completing the whole chain indexed?
Yes.
Koji
--
http://www.rondhuit.com/en/
On Apr 13, 2011, at 12:06 AM, Liam O'Boyle wrote:
> Afternoon,
>
> After an upgrade to Solr 3.1 which has largely been very smooth and
> painless, I'm having a minor issue with the ExtractingRequestHandler.
>
> The problem is that it's inserting metadata into the extracted
> content, as well as
> I would like to build a component that during indexing
> analyses all tokens
> in a stream and adds metadata to a new field based on my
> analysis. I have
> different tasks that I would like to perform, like basic
> classification and
> certain more advanced phrase detections. How would I do
> th
Hi there,
Just a quick question that the wiki page (
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters) didn't seem to
answer very well.
Given an analyzer that has zero or more Char Filter Factories, one
Tokenizer Factory, and zero or more Token Filter Factories, which value(s)
are ind
Yes, you can assume this since that's the only
way new content will be searchable, as you've
discovered
Best
Erick
On Wed, Apr 13, 2011 at 4:42 AM, Reeza Edah Tally wrote:
> Thanks,
>
> I changed my searching to be triggered on a newSearcher event instead and
> use the new searcher to retrie
Erick,
I was under the misconception that a Solr "transaction" is ACID.
From what you said, I guess Solr "transactions" are not Isolated.
Thanks,
Phong
On Tue, Apr 12, 2011 at 2:54 PM, Erick Erickson wrote:
> See below:
>
> On Tue, Apr 12, 2011 at 2:21 PM, Phong Dais wrote:
>
> > Erick,
> >
>
If you are using the dismax query parser, perhaps you could take a look at
the minimum should match parameter 'mm':
http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
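A rough sketch of such a request (the qf fields are invented); with three query terms,
mm=2 means only two of them have to match:
curl 'http://localhost:8983/solr/select?defType=dismax&qf=name+description&mm=2&q=Blue+Wool+Rugs'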
Ludovic.
2011/4/13 Mark Mandel [via Lucene] <
ml-node+2815186-149863473-383...@n3.nabble.com>
Thanks!
I searched high and low for that, couldn't see it in front of my face!
Mark
On Wed, Apr 13, 2011 at 6:32 PM, Pierre GOSSE wrote:
> For (a) I don't think anything exists today providing this mechanism.
> But (b) is a good description of the dismax handler with a MM parameter of
> 66%.
>
Thanks,
I changed my searching to be triggered on a newSearcher event instead and
use the new searcher to retrieve the documents. This works.
Btw can I assume that a new searcher will always be created soon after a
commit?
Regards,
Reeza
-Original Message-
From: Otis Gospodnetic [mailto
For (a) I don't think anything exists today providing this mechanism.
But (b) is a good description of the dismax handler with a MM parameter of 66%.
Pierre
-Original Message-
From: Mark Mandel [mailto:mark.man...@gmail.com]
Sent: Wednesday, April 13, 2011 10:04
To: solr-user@lucene.ap
Not sure if the title explains it all, or if what I want is even possible,
but figured I would ask.
Say, I have a series of products I'm selling, and a search of:
"Blue Wool Rugs"
Comes in. This returns 0 results, as "Blue" and "Rugs" match terms that are
indexes, "Wool" does not.
Is there a w
No, this query returns a few more documents than if I do it with the Lucene query
parser. I'm going to generate another query parser that sends a simple term
query and see what the output is; when I have it, I will report back on the list.
Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de
"The current limitation or pause is when the ram buffer is flushing to disk "
-> When an optimize starts and runs for ~4 hours, are you saying that DIH is
flushing the docs into the index during this "pause"?
-
--- System
On
Afternoon,
After an upgrade to Solr 3.1 which has largely been very smooth and
painless, I'm having a minor issue with the ExtractingRequestHandler.
The problem is that it's inserting metadata into the extracted
content, as well as mapping it to a dynamic field. Previously the
same configuration
Bill Bell wrote:
>
> Just set up your schema with a "string" multivalued field...
>
I've this in my schema:
Worked.. Thanks...
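A declaration of that kind looks roughly like this (the field name is only an example):
<field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>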