On Wed, Sep 30, 2009 at 3:01 PM, con wrote:
>
> Hi all
> I am getting incorrect results when I search with numbers only or with a
> string containing numbers.
> When such a search is done, all the results in the index are returned,
> irrespective of the search key.
> E.g., the phone number field is map
On Tue, Sep 29, 2009 at 6:42 PM, Jörg Agatz wrote:
> Hi Users...
>
> I have a problem.
>
> I have a lot of fields (type=text). To search across all of them, I copy
> all fields into the default text field and use that for the default search.
>
> Now I want to search...
>
> This is in a field:
>
> "RI-MC
Hi All,
I'm working with data that has multiple date precisions, most of
which do not have a time associated with them, but rather centuries (like
the 1800s), years (like 1867), and years/months (like 1918-11). I'm
able to sort and search using a workaround where we store the date as a
string
Hi Joe,
Currently the patch does not do that, but you can do something else
that might help you in getting your summed stock.
In the latest patch you can include fields of collapsed documents in
the result per distinct field value.
If you specify collapse.includeCollapseDocs.fl=num_in_stock in t
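(The preview is cut off above. For illustration only, with hypothetical field names, and parameter names taken from the field-collapsing patch (the exact syntax may vary between patch versions), a request could look something like:
http://localhost:8983/solr/select?q=beer&collapse.field=product_id&collapse.includeCollapseDocs.fl=num_in_stock
You would then sum num_in_stock over the returned collapsed documents on the client side.)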
1. In my playing around with
sending in an XML document within an XML CDATA tag,
with termVectors="true"
I noticed the following behavior:
<person>peter</person>
collapses to the term
personpeterperson
instead of
person
and
peter separately.
I realize I could try to do a search and replace of characters
Hi all,
I have two questions about the ReversedWildcardFilterFactory:
a) put it into both chains, index and query, or into index only?
b) where exactly in the/each chain do I have to put it? (Do I have to
respect a certain order - as I have wordDelimiter and lowercase in
there, as well.)
Hello list.
So, I set up my schema.xml with the different chains of analyzers and
filters for each field (i.e. I created the types text-en, text-de, text-it).
As I have to index documents in different languages, this is good.
But what defines the analyzers and filters for the query?
Let's suppose I
Hi Claudio,
in schema.xml, the <analyzer> element accepts the attribute type.
If you need different analyzer chains during indexing and querying,
configure it like this:
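(The XML example was stripped by the archive; a minimal sketch of the usual pattern, with the actual tokenizer/filter chains elided:
<fieldType name="text" class="solr.TextField">
  <analyzer type="index">
    ...
  </analyzer>
  <analyzer type="query">
    ...
  </analyzer>
</fieldType>
)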
If there is no difference, just remove one analyzer element and the type
att
Hi all,
Have you planned a release date for Solr 1.4? If I understood correctly, it
will use the Lucene 2.9 release from last Sept. 24th with a stable API?
Thanks.
Jerome.
--
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net
I am trying to automate a build process that adds documents to 10 shards
over 5 machines and need to limit the size of a shard to no more than
200GB because I only have 400GB of disk available to optimize a given shard.
Why does the size (du) of an index typically decrease after a commit?
I'v
On Oct 1, 2009, at 8:32 AM, Jérôme Etévé wrote:
Hi all,
Have you planned a release date for Solr 1.4? If I understood correctly, it
will use the Lucene 2.9 release from last Sept. 24th with a stable API?
Please have a look at https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310230&versi
It may take some time before resources are released and garbage
collected, so that may be part of the reason why things hang around
and du doesn't report much of a drop.
On Oct 1, 2009, at 8:54 AM, Phillip Farber wrote:
I am trying to automate a build process that adds documents to 10
shar
Phillip Farber wrote:
> I am trying to automate a build process that adds documents to 10
> shards over 5 machines and need to limit the size of a shard to no
> more than 200GB because I only have 400GB of disk available to
> optimize a given shard.
>
> Why does the size (du) of an index typically
Whoops - the way I have mail come in, it's not easy to tell if I'm replying
to the Lucene or Solr list ;)
The way Solr works with Searchers and reopen, it shouldn't run into a
situation that requires greater than
2x to optimize. I won't guarantee it ;) But based on what I know, it
shouldn't happen under n
Thanks, that's exactly the kind of answer I was looking for.
Chantal Ackermann wrote:
> Hi Claudio,
>
> in schema.xml, the <analyzer> element accepts the attribute type.
> If you need different analyzer chains during indexing and querying,
> configure it like this:
Chantal Ackermann wrote:
> Hi all,
>
> I have two questions about the ReversedWildcardFilterFactory:
> a) put it into both chains, index and query, or into index only?
> b) where exactly in the/each chain do I have to put it? (Do I have to
> respect a certain order - as I have wordDelimiter a
Hi guys,
Although I've been looking at Solr on and off for a few months, I'm still
getting to grips with the schema and filters/tokenizers.
I'm having trouble using the "solr.KeepWordFilterFactory" functionality and
there don't appear to be any previous discussions here regarding it. I
basically hav
Here is how you need 3X. First, index everything and optimize. Then
delete everything and reindex without any merges.
You have one full-size index containing only deleted docs, one full-
size index containing reindexed docs, and need that much space for a
third index.
Honestly, disk is che
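(To put numbers on this with the 200GB shard from the original question: you would have roughly 200GB of segments that are now all deleted docs, roughly 200GB of reindexed segments, and the optimize needs up to another 200GB to write the merged result before the old files can be released, i.e. about 600GB, or 3X.)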
Nice one ;) It's not technically a case where optimize requires > 2x
though in case the user asking gets confused. It's a case unrelated to
optimize that can grow your index. Then you need < 2x for the optimize,
since you won't copy the deletes.
It also requires that you jump through hoops to delete everyth
bq. and reindex without any merges.
That's actually quite a hoop to jump through as well - though if you're
determined and you have tons of RAM, it's somewhat doable.
Mark Miller wrote:
> Nice one ;) It's not technically a case where optimize requires > 2x
> though in case the user asking gets confused. It's a
OK, one more question on this issue. I used to have an "all" field into
which I used to copyField "title", "content" and "keywords", defined with
the fieldType "text", which used to have English-language-dependent
analyzers/filters. Now I can copyField all three "content-*" fields,
as I know that only one
Hi.
This situation is still bugging me.
I thought I had it fixed yesterday, but no...
This seems to happen both for deleting and adding, but I'll explain
the delete situation here:
When I'm deleting documents (~5k) from an index, I get an error message
saying
"Only one usage of each socket address (
Thanks, Mark!
But I suppose it does matter where in the index chain it goes? I would
guess it is applied to the tokens, so I suppose I should put it at the
very end - after WordDelimiter and Lowercase have been applied.
Is that correct?
>>
>> >splitOnCaseChange="1" splitOnNume
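(For illustration, an index-time chain along those lines. The class names are real, but take the exact order and attributes as an assumption, not a tested recommendation:
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ReversedWildcardFilterFactory"/>
</analyzer>
)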
I've now worked on three different search engines and they all have a
3X worst
case on space, so I'm familiar with this case. --wunder
On Oct 1, 2009, at 7:15 AM, Mark Miller wrote:
Nice one ;) It's not technically a case where optimize requires > 2x
though in case the user asking gets confuse
I just noticed this comment in the default schema:
Does that mean TrieFields are never going to get sortMissingLast?
Do you all think that a reasonable strategy is to use a copyField and
use "s" fields for sorting (only), and trie for everything else?
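(A sketch of that copyField strategy, with hypothetical field names; "tfloat" and "sfloat" are the trie and sortable types from the example schema:
<field name="price" type="tfloat" indexed="true" stored="true"/>
<field name="price_sort" type="sfloat" indexed="true" stored="false"/>
<copyField source="price" dest="price_sort"/>
Sort on price_sort, and use price for everything else.)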
On Wed, Sep 30, 2009 at 10:59 PM, Steve Co
Oops, the missing trailing Z was probably just a cut and paste error.
It might be tough to come up with a case that can reproduce it -- it's a
sticky issue. I'll post it if I can, though.
-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Tuesday, Septembe
I am trying to update to the newest version of Solr from trunk as of May
5th. I updated and compiled from trunk as of yesterday (09/30/2009). When
I try to do a full import I am receiving a GC heap error after changing
nothing in the configuration files. Why would this happen in the most
recent
Hello Martijn, thanks for the tip. I tried that approach but ran into two
snags: 1. returning the fields makes collapsing a lot slower for
results, but that might just be the nature of iterating large results;
2. it seems like only dupes of records on the first page are returned -
or is there a setti
Chantal Ackermann wrote:
Thanks, Mark!
But I suppose it does matter where in the index chain it goes? I would
guess it is applied to the tokens, so I suppose I should put it at the
very end - after WordDelimiter and Lowercase have been applied.
Is that correct?
>>
>> >split
Sorry about asking this here, but I can't reach wiki.apache.org right now.
What do I set in query.setMaxRows() to get all the rows?
--
http://www.linkedin.com/in/paultomblin
Jeff Newburn wrote:
> I am trying to update to the newest version of Solr from trunk as of May
> 5th. I updated and compiled from trunk as of yesterday (09/30/2009). When
> I try to do a full import I am receiving a GC heap error after changing
> nothing in the configuration files. Why would thi
Sorry, in my last question I meant setRows, not setMaxRows. What do I pass to
setRows to get all matches, not just the first 10?
-- Sent from my Palm Prē
You probably want to add the following command line option to java to
produce a heap dump:
-XX:+HeapDumpOnOutOfMemoryError
Then you can use jhat to see what's taking up all the space in the heap.
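For example (assuming the stock Jetty example launcher; adjust for your own container):
java -XX:+HeapDumpOnOutOfMemoryError -jar start.jar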
Bill
On Thu, Oct 1, 2009 at 11:47 AM, Mark Miller wrote:
> Jeff Newburn wrote:
> > I am trying to
Resuming this discussion in a new thread to focus only on this question:
What is the best way to get the size of an index so it does not get too
big to be optimized (or to allow a very large segment merge) given space
limits?
I already have the largest 15,000rpm SCSI direct attached storage
Hi Andrzej,
thanks! Unfortunately, I get a ClassNotFoundException for the
solr.ReversedWildcardFilterFactory with my nightly build from 22nd of
September. I've found the corresponding JIRA issue, but from the wiki
it's not obvious that this might require a patch? I'll have a closer
look at th
Andrew Clegg wrote:
>
>
> hossman wrote:
>>
>>
>> This is why the examples of using context files on the wiki talk about
>> keeping the war *outside* of the webapps directory, and using docBase in
>> your Context declaration...
>> http://wiki.apache.org/solr/SolrTomcat
>>
>>
>
> Great
Hi All,
I'm trying Solr CEL outside of the example and running into trouble
because I can't refer to
http://wiki.apache.org/solr/ExtractingRequestHandler (the wiki's down).
After realizing I needed to copy all the jars from /example/solr/lib to
my index's /lib dir, I am now hitting th
Hi folks,
I'm using the 2009-09-30 build, and any single or double quotes in the query
string cause an NPE. Is this normal behaviour? I never tried it with my
previous installation.
Example:
http://myserver:8080/solr/select/?title:%22Creatine+kinase%22
(I've also tried without the URL encoding
On 1 Oct 09, at 12:46 PM, Tricia Williams wrote:
https://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2
STREAM_SOURCE_INFO appears to be a constant from this page:
http://lucene.apache.org/solr/api/constant-values.html
This has it embedded as an "arr" in the re
It was added to trunk on the 11th and shouldn't require a patch. Are you
sure that nightly was actually built after then?
solr.ReversedWildcardFilterFactory should work fine.
Chantal Ackermann wrote:
> Hi Andrzej,
>
> thanks! Unfortunately, I get a ClassNotFoundException for the
> solr.ReversedWildca
don't forget q=... :)
Erik
On Oct 1, 2009, at 9:49 AM, Andrew Clegg wrote:
Hi folks,
I'm using the 2009-09-30 build, and any single or double quotes in
the query
string cause an NPE. Is this normal behaviour? I never tried it with
my
previous installation.
Example:
http://mys
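(I.e. the example URL from the question needs the q parameter:
http://myserver:8080/solr/select/?q=title:%22Creatine+kinase%22
)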
Sorry! I'm officially a complete idiot.
Personally I'd try to catch things like that and rethrow a
'QueryParseException' or something -- but don't feel under any obligation to
listen to me because, well, I'm an idiot.
Thanks :-)
Andrew.
Erik Hatcher-4 wrote:
>
> don't forget q=... :)
>
>
When I do a query directly from the web, the XML of the response
includes how many results would have been returned if it hadn't
restricted itself to the first 10 rows:
For instance, the query:
http://localhost:8080/solrChunk/nutch/select/?q=*:*&fq=category:mysites
returns:
<int name="status">0</int>
<int name="QTime">0</int>
<str name="q">*:*</str>
<str name="fq">category:mys
Added the parameter and it didn't seem to dump when it hit the GC limit
error. Any other thoughts?
--
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562
> From: Bill Au
> Reply-To:
> Date: Thu, 1 Oct 2009 12:16:53 -0400
> To:
> Subject: Re: Solr Trunk Heap Space I
Jeff Newburn wrote:
> Added the parameter and it didn't seem to dump when it hit the GC limit
> error. Any other thoughts?
>
>
You might use jmap to take a look at the heap (you can do it while it's
live with Java 6) or to force a heap dump when you specify.
Since it's spending 98% of the time in
Mark Miller wrote:
>
> You might use jmap to take a look at the heap (you can do it while it's
> live with Java 6)
Errr - just so I don't screw anyone in a production environment - it
will freeze your app while it's getting the info.
--
- Mark
http://www.lucidimagination.com
> My question is why isn't the DateField implementation of ISO 8601 broader so
> that it could include YYYY and YYYY-MM as acceptable date strings? What would
> it take to do so?
Nobody ever cared? But yes, you're right, the spurious precision is
annoying. However, there is no "fuzzy search" for
If the wiki isn't working
https://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2
gave me more information. The LucidImagination article helps too.
Now that the wiki is up again it is more obvious that I need to add:
fulltext
text
to my solrconfig.xml
Tricia
Don't be too hard on yourself.
Sometimes, mistakes like that can happen even to the most brilliant and most
experienced.
On Thu, Oct 1, 2009 at 2:15 PM, Andrew Clegg wrote:
>
> Sorry! I'm officially a complete idiot.
>
> Personally I'd try to catch things like that and rethrow a
> 'QueryParseEx
Doing this, you will send the dump where you want:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/the/dump
Then you can open the dump with jhat:
jhat /path/to/the/dump/your_stack.bin
It will probably give you an OutOfMemoryError due to the large size of the
dump. In case you can give
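(The preview is cut off here, but if jhat itself runs out of memory on a big dump, you can give it a larger heap via its -J pass-through option, e.g.:
jhat -J-mx4g /path/to/the/dump/your_stack.bin
)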
Trie fields also do not support faceting. They also take more RAM in
some operations.
Given these defects, I'm not sure that promoting tries as the default
is appropriate at this time. (I'm sure this is an old argument. :)
On Thu, Oct 1, 2009 at 7:39 AM, Steve Conover wrote:
> I just noticed this
Indeed... and the only reason I knew the answer right away is because
I've experienced this myself numerous times :)
Erik
On Oct 1, 2009, at 11:46 AM, Israel Ekpo wrote:
Don't be too hard on yourself.
Sometimes, mistakes like that can happen even to the most brilliant
and most
ex
For future reference, the Solr & Lucene wikis and mailing lists are
indexed on http://www.lucidimagination.com/search/
On Thu, Oct 1, 2009 at 11:40 AM, Tricia Williams
wrote:
> If the wiki isn't working
>>
>>
>> https://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2
>
On Thu, Oct 1, 2009 at 11:41 AM, Jeff Newburn wrote:
> I am trying to update to the newest version of Solr from trunk as of May
> 5th.
Tons of changes since... including the per-segment
searching/sorting/function queries (I think).
Do you sort on any single valued fields that you also facet on?
I've heard there is a new "partial optimize" feature in Lucene, but it
is not mentioned in the Solr or Lucene wikis so I cannot advise you
how to use it.
On a previous project we had a 500GB index for 450m documents. It took
14 hours to optimize. We found that Solr worked well (given enough RAM
fo
On Thu, Oct 1, 2009 at 10:39 AM, Steve Conover wrote:
> I just noticed this comment in the default schema:
>
>
>
> Does that mean TrieFields are never going to get sortMissingLast?
Not in time for 1.4, but yes they will eventually get it.
It has to do with the representation... currently we can'
bq. Tons of changes since... including the per-segment
searching/sorting/function queries (I think).
Yup. I actually didn't think so, because that was committed to Lucene in
February - but it didn't come into Solr till March 10th. March 5th just
ducked it.
Yonik Seeley wrote:
> On Thu, Oct 1, 200
Ha! Searching "partial optimize" on
http://www.lucidimagination.com/search , we discover SOLR-603, which
gives the 'maxSegments' option to the <optimize> command. The text
does not include the word 'partial'.
It's on http://wiki.apache.org/solr/UpdateXmlMessages. The <optimize> command
gives a number of Lucene segments
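(For reference, the XML update-message form with that option, as added by SOLR-603:
<optimize maxSegments="10"/>
)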
On Thu, Oct 1, 2009 at 3:14 PM, Mark Miller wrote:
> bq. Tons of changes since... including the per-segment
> searching/sorting/function queries (I think).
>
> Yup. I actually didn't think so, because that was committed to Lucene in
> February - but it didn't come into Solr till March 10th. March
Whoops. There is my lazy brain for you - March, May, August - all the
same ;)
Okay - forgot Solr went straight down and used FieldSortedHitQueue.
So it all still makes sense ;)
Still interested in seeing his field sanity output to see what's possibly
being doubled.
Yonik Seeley wrote:
> On Thu,
On Thu, Oct 1, 2009 at 3:37 PM, Mark Miller wrote:
> Still interested in seeing his field sanity output to see whats possibly
> being doubled.
Strangely enough, I'm having a hard time seeing caching at the different levels.
I made a multi-segment index (2 segments), and then did a sort and facet:
1) That is correct. Including the collapsed documents' fields can make your
search significantly slower (depending on how many documents are
returned).
2) It seems that you are using the parameters as intended. The
collapsed documents will contain all documents (from the whole query
result) that have bee
On Thu, Oct 1, 2009 at 4:05 PM, Yonik Seeley wrote:
> On Thu, Oct 1, 2009 at 3:37 PM, Mark Miller wrote:
>> Still interested in seeing his field sanity output to see whats possibly
>> being doubled.
>
> Strangely enough, I'm having a hard time seeing caching at the different
> levels.
> I made a
On Thu, Oct 1, 2009 at 4:35 PM, Yonik Seeley wrote:
> Since isTokenized() more reflects if something is tokenized at the
> Lucene level, perhaps we need something that specifies if there is
> more than one logical value per field value? I'm drawing a blank on a
> good name for such a method thoug
Thanks for the reply. I just want the number of dupes in the query
result, but it seems I don't get the correct totals.
For example, a non-collapsed dismax query for belgian beer returns X
results,
but when I collapse and sum the number of docs under collapse_counts,
it's much less than X.
It does
Is that possible? Implemented?
I want to be able to have a Solr slave instance on a publicly available host
(accessible via HTTP), and synchronize with the master securely (via HTTP).
I had it implicitly with cron jobs running as the 'root' user, and Tomcat as
'tomcat'... The slave wasn't able to update the index bec
Yonik Seeley wrote:
> On Thu, Oct 1, 2009 at 4:35 PM, Yonik Seeley
> wrote:
>
>> Since isTokenized() more reflects if something is tokenized at the
>> Lucene level, perhaps we need something that specifies if there is
>> more than one logical value per field value? I'm drawing a blank on a
>>
OK, I was able to get a heap dump from the GC limit error.
1 instance of LRUCache is taking 170MB
1 instance of SchemaIndex is taking 56MB
4 instances of SynonymMap are taking 112MB
There is no searching going on during this index update process.
Any ideas what on earth is going on? Like I said m
Thanks Lance,
I have Lucid's search as one of my OpenSearch tools in my browser.
Generally pretty useful (especially the ability to filter) but it's not
of much help when the tool points out that the best info is on the wiki
and the link to the wiki reveals that it can't be reached. This
I've gotten two different out-of-memory errors while using the field
collapsing component, using the latest patch (2009-09-26) and the
latest nightly.
Has anyone else encountered similar problems? My collection is 5
million results, but I've gotten the error collapsing as little as a few
thousand.
SEVE
Jeff Newburn wrote:
> OK, I was able to get a heap dump from the GC limit error.
>
> 1 instance of LRUCache is taking 170MB
> 1 instance of SchemaIndex is taking 56MB
> 4 instances of SynonymMap are taking 112MB
>
> There is no searching going on during this index update process.
>
> Any ideas what o
I loaded the JVM and started indexing. It is a test server, so unless
some errant query came in, there was no searching. Our instance has only
512MB, but my concern is the obvious leap in memory requirements, since it
worked before. What other data would be helpful with this?
On Oct 1, 2009, at 5:14 P
--- On Wed, 9/23/09, Amit Nithian wrote:
Hi Amit,
Thanks for your reply. How do I set the preference for the links, i.e. which
should appear first or second in the search results?
Which configuration file in Solr needs to be modified to achieve the same?
Regards
Bhaskar
From: Amit Nithian
Subject:
> Not in time for 1.4, but yes they will eventually get it.
> It has to do with the representation... currently we can't tell
> between a 0 and "missing".
Hmm. So does that mean that a query for latitudes, stored as trie
floats, from -10 to +10 matches documents with no (i.e. null) latitude
value
QueryResponse#getResults()#getNumFound()
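(In SolrJ terms, a minimal sketch; "server" is assumed to be a 1.4-era CommonsHttpSolrServer pointed at your core:
SolrQuery query = new SolrQuery("*:*");
query.addFilterQuery("category:mysites");
query.setRows(0); // no documents needed, just the count
QueryResponse rsp = server.query(query);
long total = rsp.getResults().getNumFound();
)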
On Thu, Oct 1, 2009 at 11:49 PM, Paul Tomblin wrote:
> When I do a query directly from the web, the XML of the response
> includes how many results would have been returned if it hadn't
> restricted itself to the first 10 rows:
>
> For instance, the query:
On Thu, Oct 1, 2009 at 11:09 PM, Steve Conover wrote:
>> Not in time for 1.4, but yes they will eventually get it.
>> It has to do with the representation... currently we can't tell
>> between a 0 and "missing".
>
> Hmm. So does that mean that a query for latitudes, stored as trie
> floats, from
On Thu, Oct 1, 2009 at 8:45 PM, Jeffery Newburn wrote:
> I loaded the JVM and started indexing. It is a test server, so unless some
> errant query came in, there was no searching. Our instance has only 512MB,
> but my concern is the obvious leap in memory requirements, since it worked
> before. What other data
Hello,
I am working with some XML/JSON feeds as well as a database, and I am using
a transformer to create the final index. I am no expert and I would like to
get some help on an hourly/daily rate basis. It might also be that this part
of the job can be outsourced to you completely; however, I need to unders