Hi Ahmed,
Fields that are empty do not impact the index. It's different from a
database.
I have text fields for different languages and per document there is
always only one of the languages set (the text fields for the other
languages are empty/not set). It all works very well and fast.
I wonder
On Thu, 29 Jul 2010 15:33:42 -0400
S Ahmed wrote:
> I understand (and it's straightforward) when you want to create an
> index for something simple like Products.
>
> But how do you go about creating a Solr index when you have data
> coming from 10-15 database tables, and the tables have unrelated
2010/7/28 Rafal Bluszcz Zawadzki
> Hi,
>
> In my schema I have (inter alia) the fields CollectionID and CollectionName.
> These two values always match, which means that for every value of
> CollectionID there is a matching value of CollectionName.
>
> I am interested in query which allow m
> What approach should I use to perform wildcard and proximity
> searches?
>
>
>
> Like: "solr mail*"~10
>
>
>
> For getting docs where solr is within 10 words of "mailing"
> for
> instance?
You can do it with the plug-in described here:
https://issues.apache.org/jira/browse/SOLR-1604
It
Hi to all Solr/Lucene Users...
Our team had a discussion today regarding the Solr/Lucene community closer to
home.
I am hereby putting out an SOS to all Solr/Lucene users in the South African
market and wish to organize a meet-up (or user support group) if at all
possible.
It would be great to
Hi Ahmet,
Thank you. I'll be happy to test it if I manage to install it OK. I'm a
newbie at Solr, but I'm going to try the instructions in the thread to
load it.
Another doubt I have about wildcard searches:
a) I think wildcard search is by default "case sensitive"? Is there a
way to make case
> a) I think wildcard search is by default "case sensitive"?
> Is there a
> way to make case insensitive?
Wildcard searches are not analyzed. For case-insensitive search, you can
lowercase the query terms on the client side (with a lowercase filter at
index time), e.g. Mail* => mail*
> I discovered
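The client-side lowercasing suggested above could look roughly like the
sketch below. It is only an illustration: the function name is made up, it
splits on single spaces, and it does not handle quoted phrases or escaped
characters. It also assumes the field was lowercased at index time, as the
reply says.

```python
def lowercase_wildcard_terms(query):
    """Lowercase only the terms that contain a wildcard (* or ?).

    Hypothetical helper: a field prefix such as "Title:" is left untouched,
    as are operators like AND/OR and terms without wildcards.
    """
    rewritten = []
    for token in query.split(" "):
        if "*" in token or "?" in token:
            # Keep any "field:" prefix as-is and lowercase only the term.
            field, sep, term = token.rpartition(":")
            token = field + sep + term.lower()
        rewritten.append(token)
    return " ".join(rewritten)


print(lowercase_wildcard_terms("Mail*"))                 # mail*
print(lowercase_wildcard_terms("Title:Mail* AND Solr"))  # Title:mail* AND Solr
```

Terms without wildcards are left alone because they still go through the
normal query analyzer.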
Sorry,
I had inspected the ...core.jar three times, without recognizing the
package. I was really blind. =8-)
thanks
Uwe
On 26.07.2010 20:48, Chris Hostetter wrote:
: where is a Jar, containing org.apache.solr.client.solrj.embedded?
Classes in the embedded package are useless w/o the rest
Hi Ahmet,
> a) I think wildcard search is by default "case sensitive"?
> Is there a
> way to make case insensitive?
>> Wildcard searches are not analyzed. For case-insensitive search, you can
>> lowercase the query terms on the client side (with a lowercase filter at
>> index time), e.g. Mail* => mail*
>
> I
I believe they come back alphabetically sorted (not sure if this is language
specific or not), so a quick way might be to change the name from createdate
to zz_createdate or something like that.
Generally with XML you should not be worried about order however. It's
usually a sign of a design iss
Thank you for your reply.
This is a background as to what I am trying to achieve. I want to be able
to perform a search across numeric index ranges and get the results in
logical ordering instead of a lexicographic ordering using dspace. Currently
if I do a search using the query: var:[10 TO
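To illustrate the difference between lexicographic and numeric ordering
described above, here is a small generic sketch. It is not DSpace or Solr
configuration. The sample values and the zero-padding width are assumptions
made for the example, and zero-padding is only one common workaround when
values are indexed as strings.

```python
values = ["9", "10", "100", "25"]

# Compared as strings, "100" sorts before "25" and "9" sorts last.
print(sorted(values))           # ['10', '100', '25', '9']

# Compared as numbers, the order is the logical one.
print(sorted(values, key=int))  # ['9', '10', '25', '100']


def in_range_lexicographic(value, low, high):
    """Range check on strings, as a string-typed index would do it."""
    return low <= value <= high


# A string range [10 TO 50] wrongly includes "100" and excludes "9" for
# the wrong reason (it compares character by character).
print(in_range_lexicographic("100", "10", "50"))  # True


def zero_pad(value, width=6):
    """Pad to a fixed width so string order matches numeric order."""
    return f"{int(value):0{width}d}"


padded = sorted(zero_pad(v) for v in values)
print(padded)  # ['000009', '000010', '000025', '000100']
```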
Highlighting time is mainly spent on fetching the field you want
to highlight and tokenizing it (if you don't store term vectors).
You can check what's wrong,
2010/7/30 Peter Spam :
> If I don't do highlighting, it's really fast. Optimize has no effect.
>
> -Peter
>
> On Jul 29, 2010, a
Hi,
Thanks a lot for the info and your time. I think field collapse will work
for us. I looked at https://issues.apache.org/jira/browse/SOLR-236, but
which file should I use for the patch? We use solr-1.3.
Thanks
Bharat Jain
On Fri, Jul 30, 2010 at 12:53 AM, Chris Hostetter
wrote:
>
> : 1. Th
We are using Solr Extract Handler for indexing document metadata with
attachments. (/update/extract)
However, the SolrContentHandler doesn't seem to support index time document
boost attribute.
Probably document.setDocumentBoost(Float.parseFloat(boost)) is missing.
Regards,
Jayendra
So I have tables like this:
Users
UserSales
UserHistory
UserAddresses
UserNotes
ClientAddress
CalenderEvent
Articles
Blogs
It just seems odd to me, jamming all these tables into a single index. But I
guess the idea of using a 'type' field to qualify exactly what I am
searching is a good idea, in cas
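A rough sketch of the 'type' field idea: rows from unrelated tables become
documents in one index, each tagged with its source type, and a search is
narrowed with a filter on that field. The field names ("id", "type"), the
helper names, and the sample rows are assumptions for illustration; only the
table names come from the message above.

```python
def to_doc(doc_type, row_id, fields):
    """Build one index document from a database row (hypothetical helper).

    The id is prefixed with the type so rows from different tables that
    share a primary key value do not collide in the index.
    """
    doc = {"id": f"{doc_type}-{row_id}", "type": doc_type}
    doc.update(fields)  # fields that don't apply are simply left out
    return doc


def type_filter(doc_type):
    """Filter query restricting a search to one kind of document."""
    return f"type:{doc_type}"


docs = [
    to_doc("Users", 1, {"name": "alice"}),
    to_doc("Articles", 1, {"title": "Getting started"}),
    to_doc("Blogs", 7, {"title": "Release notes"}),
]

print([d["id"] for d in docs])  # ['Users-1', 'Articles-1', 'Blogs-7']
print(type_filter("Articles"))  # type:Articles
```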
I do store term vector:
-Pete
On Jul 30, 2010, at 7:30 AM, Li Li wrote:
> Highlighting time is mainly spent on fetching the field you want
> to highlight and tokenizing it (if you don't store term vectors).
> You can check what's wrong,
>
> 2010/7/30 Peter Spam :
>> If I don't do hi
I want to programmatically retrieve the number of indexed documents. I.e., get
the value of numDocs.
The only two ways I've come up with are searching for "*:*" and reporting the
hit count, or sending an HTTP GET to
http://xxx.xx.xxx.xxx:8080/solr/admin/stats.jsp#core and searching for in
I'd just index the eventtype, eventby and eventtime as separate fields. Then
use a query like eventtype:update AND eventtime:[ TO *].
Similarly, for events updated by pramod, the query would be something like:
eventby:pramod AND eventtype:update
HTH
Erick
On Wed, Jul 28, 2010 at 11:05 PM, Pr
Glad to help. Do be aware that there are several config values that
influence the commit frequency; they might also be relevant.
Best
Erick
On Thu, Jul 29, 2010 at 5:11 AM, Christos Constantinou <
ch...@simpleweb.co.uk> wrote:
> Eric,
>
> Thank you very much for the indicators! I had a closer lo
See the subject about 1500 threads. The first place I'd look is how
often you're committing. If you're committing before the warmup queries
from the previous commit have done their magic, you might be getting
into a death spiral.
HTH
Erick
On Thu, Jul 29, 2010 at 7:02 AM, Peter Karich wrote:
>
Hi,
I'm new to Solr and I'm doing my first installation under Tomcat. I
followed the documentation at (
http://wiki.apache.org/solr/SolrTomcat#Installing_Tomcat_6) but there are
some problems.
The http://localhost:8080/solr/admin works fine, but in some cases, for
example to see my schema.x
Hi Erick!
thanks for the response!
I will answer your questions ;-)
> How often are you making changes to your index?
Every 30-60 seconds. Too heavy?
> Do you have autocommit on?
No.
> Do you commit when updating each document?
No. I commit after a batch update of 200 documents
> Committ
Both approaches are ok, I think. (although I don't know the python API)
BTW: If you query q=*:* then add rows=0 to avoid some traffic.
Regards,
Peter.
> I want to programmatically retrieve the number of indexed documents. I.e.,
> get the value of numDocs.
>
> The only two ways I've come up with
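A minimal sketch of the q=*:* with rows=0 approach in Python, in case it
helps. The base URL, the use of wt=json, and the sample response are
assumptions for the example. The hit count is read from
response/numFound, which is where the JSON response writer puts it.

```python
import json
from urllib.parse import urlencode


def count_url(base_url):
    """Build a match-all query URL that returns no documents (rows=0)."""
    params = {"q": "*:*", "rows": 0, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


def num_found(response_text):
    """Extract the total hit count from a JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


# Made-up response body, shaped like a JSON select response.
sample = (
    '{"responseHeader": {"status": 0},'
    ' "response": {"numFound": 1234, "start": 0, "docs": []}}'
)

print(count_url("http://localhost:8983/solr"))
print(num_found(sample))  # 1234
```

To run it against a live server, fetch count_url(...) with
urllib.request.urlopen and pass the decoded body to num_found.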
Hi Peter :-),
did you already try other values for
hl.maxAnalyzedChars=2147483647
? Also regular expression highlighting is more expensive, I think.
What does the 'fuzzy' variable mean? If you use this to query via
"~someTerm" instead of "someTerm"
then you should try the trunk of Solr which is a l
Hello,
I'm looking for a list of English words that, when stemmed by Porter stemmer,
end up in the same stem as some similar, but unrelated words. Below are some
examples:
# this gets stemmed to "iron", so if you search for "ironic", you'll get "iron"
matches
ironic
# same stem as animal
a
Peter, there are events in solrconfig where you define warm up queries when a
new searcher is opened.
There are also cache settings that play a role here.
30-60 seconds is pretty frequent for Solr.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: ht
I suppose you could write a component that just gets this info from
SolrIndexSearcher and write that in the response?
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
> From: John DeRosa
> To:
May I suggest looking at some of the related issues, say SOLR-1682
This issue is related to:
SOLR-1682 Implement CollapseComponent
SOLR-1311 pseudo-field-collapsing
LUCENE-1421 Ability to group search results by field
SOLR-1773 Field Collapsing (lightweight version)
Some collisions are listed here:
http://www.attivio.com/blog/34-attivio-blog/333-doing-things-with-words-part-three-stemming-and-lemmatization.html
Have you asked Martin Porter? You can find his e-mail here:
http://tartarus.org/~martin/
wunder
On Jul 30, 2010, at 1:41 PM, Otis Gospodnetic wrot
Hello,
I'm wondering if anyone has good ideas for handling the following (Porter)
stemming problem.
The word "city" gets stemmed to "citi". But "citi" is short for "citibank", so
we have a conflict - the stems of both "city" and "citi" are "citi", so when
you search for "city", you will get
Otis,
https://issues.apache.org/jira/browse/LUCENE-2055 may be of some help.
cheers
On 7/30/10 2:18 PM, Otis Gospodnetic wrote:
Hello,
I'm wondering if anyone has good ideas for handling the following (Porter)
stemming problem.
The word "city" gets stemmed to "citi". But "citi" is short for
Thanks!
On Jul 30, 2010, at 1:11 PM, Peter Karich wrote:
> Both approaches are ok, I think. (although I don't know the python API)
> BTW: If you query q=*:* then add rows=0 to avoid some traffic.
>
> Regards,
> Peter.
>
>> I want to programmatically retrieve the number of indexed documents. I.e
On Fri, Jul 30, 2010 at 4:41 PM, Otis Gospodnetic
wrote:
> I'm looking for a list of English words that, when stemmed by Porter stemmer,
> end up in the same stem as some similar, but unrelated words. Below are some
> examples:
>
> # this gets stemmed to "iron", so if you search for "ironic", y
Just starting with DataImportHandler and had a few simple questions.
Is there a location for more in depth documentation other than
http://wiki.apache.org/solr/DataImportHandler?
Specifically I was looking for a detailed document outlining
data-config.xml, the fields and attributes and how they a
Hi Otis,
does it mean that a new searcher is opened after I commit?
I thought only on startup...(?)
Regards,
Peter.
> Peter, there are events in solrconfig where you define warm up queries when a
> new searcher is opened.
>
> There are also cache settings that play a role here.
>
> 30-60 second
A good starting place might be the list of stemming errors for the original
Porter stemmer in this article that describes k-stem:
Krovetz, R. (1993). Viewing morphology as an inference process. In Proceedings
of the 16th annual international ACM SIGIR conference on Research and
development in i
Otis,
I think this is a great idea.
You could also go even further by making a better example for
StemmerOverrideFilter (stemdict.txt)
(
http://wiki.apache.org/solr/LanguageAnalysis#solr.StemmerOverrideFilterFactory
)
for example:
animated animate
animation animation
animations animation
thi
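The idea behind a stemmer override dictionary can be sketched as follows:
words listed in the dictionary get their stem from it, and everything else
goes to the normal stemmer. This is only an illustration of the lookup
order, not how StemmerOverrideFilter is implemented. naive_stem is a
deliberately crude stand-in and NOT the Porter stemmer. The three
dictionary entries are the ones from the example above.

```python
# Word -> stem pairs, as in the stemdict.txt example above.
OVERRIDES = {
    "animated": "animate",
    "animation": "animation",
    "animations": "animation",
}


def naive_stem(word):
    """Toy suffix stripper standing in for a real stemmer (hypothetical)."""
    for suffix in ("ations", "ation", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def stem_with_overrides(word):
    """Use the override dictionary first, then fall back to the stemmer."""
    word = word.lower()
    if word in OVERRIDES:
        return OVERRIDES[word]
    return naive_stem(word)


print(naive_stem("animations"))           # anim
print(stem_with_overrides("animations"))  # animation
print(stem_with_overrides("cats"))        # cat
```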
Wait- how much text are you highlighting? You say these logfiles are X
big- how big are the actual documents you are storing?
On Fri, Jul 30, 2010 at 1:16 PM, Peter Karich wrote:
> Hi Peter :-),
>
> did you already try other values for
>
> hl.maxAnalyzedChars=2147483647
>
> ? Also regular expre
As you make changes to your index, you probably want to see the new/modified
documents in your search results. In order to do that, a new searcher needs
to be opened, and this happens on commit.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: ht
On Sat, Jul 31, 2010 at 3:40 AM, Harry Smith wrote:
> Just starting with DataImportHandler and had a few simple questions.
>
> Is there a location for more in depth documentation other than
> http://wiki.apache.org/solr/DataImportHandler?
>
>
Umm, no, but let us know what is not covered well and i
: I want to programmatically retrieve the number of indexed documents. I.e.,
get the value of numDocs.
Index level stats like this can be fetched from the LukeRequestHandler in
any recent version of Solr...
http://localhost:8983/solr/admin/luke?numTerms=0
In future releases (ie: alread
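For reading numDocs from the Luke handler in Python, something like this
sketch might do. It assumes the handler is asked for JSON (wt=json) and that
the count sits under index/numDocs in the response. The sample body is made
up, so check the layout against what your own server returns.

```python
import json


def luke_url(base_url):
    """URL for index-level stats only (numTerms=0 skips per-field terms)."""
    return f"{base_url}/admin/luke?numTerms=0&wt=json"


def num_docs(response_text):
    """Read numDocs from the 'index' section (assumed response layout)."""
    return json.loads(response_text)["index"]["numDocs"]


# Made-up response body for illustration.
sample = '{"responseHeader": {"status": 0}, "index": {"numDocs": 1234, "maxDoc": 1300}}'

print(luke_url("http://localhost:8983/solr"))
print(num_docs(sample))  # 1234
```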