paulosalamat wrote:
Hi, I'm new to this group.
I would like to ask a question:
What does it mean when you see a plus sign in between two words inside
synonyms.txt?
e.g.
macbookair => macbook+air
Thanks,
Paulo
Welcome, Paulo!
It depends on your tokenizer. You can specify a tokenizer via
Hi Koji,
Thank you for the reply.
I have another question. If WhitespaceTokenizer is used, is the term text
"macbook+air" equal to "macbook air"?
Thank you,
Paulo
On Mon, Apr 5, 2010 at 5:50 PM, Koji Sekiguchi [via Lucene] <
ml-node+697386-2142071620-218...@n3.nabble.com
> wrote:
> paulosala
paulosalamat wrote:
Hi Koji,
Thank you for the reply.
I have another question. If WhitespaceTokenizer is used, is the term text
"macbook+air" equal to "macbook air"?
No. In the field, "macbook air" will be a phrase (not a term).
You can define not only terms but phrases in synonyms.txt:
ex
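To make Koji's point concrete, a sketch of what such entries might look like in synonyms.txt (the product names here are just illustrative):

```
# a single input token mapped to a two-token phrase
macbookair => macbook air
# multi-word inputs are allowed on the left side of the mapping too
mac book air => macbook air
```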
hi,
I am using the piece of code given below
ReplicationHandler handler2 = new ReplicationHandler();
System.out.println( handler2.getDescription());
NamedList statistics = handler2.getStatistics();
If you're using ReplicationHandler directly, you already have the xml from
which to extract the 'indexSize' attribute.
From a client, you can get the indexSize by issuing:
http://hostname:8983/solr/core/replication?command=details
This will give you an xml response.
Use:
http://hostname:8983/s
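Building on that, a minimal sketch of pulling indexSize out of the details response with Python's standard library. The XML shape below is an assumption modeled on the replication handler's output, not copied from a real response:

```python
import xml.etree.ElementTree as ET

# Sample body shaped like a ?command=details response
# (structure assumed here for illustration only).
sample = """<response>
  <lst name="details">
    <str name="indexSize">12.5 MB</str>
    <str name="indexPath">/var/solr/data/index</str>
  </lst>
</response>"""

def index_size(xml_text):
    root = ET.fromstring(xml_text)
    # find the <str name="indexSize"> element anywhere in the tree
    node = root.find(".//str[@name='indexSize']")
    return node.text if node is not None else None

print(index_size(sample))  # -> 12.5 MB
```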
Just a reminder: just over one week left on the CFP. Some great talks have
been entered already. Keep it up!
On Mar 24, 2010, at 8:03 PM, Grant Ingersoll wrote:
> Apache Lucene EuroCon Call For Participation - Prague, Czech Republic May 20
> & 21, 2010
>
> All submissions must be received by Tue
On Fri, Apr 2, 2010 at 7:07 AM, Na_D wrote:
>
> hi,
>
>
> I need to monitor the index for the following information:
>
> 1. Size of the index
> 2. Last time the index was updated.
>
If by 'size of the index' you mean document count, then check the Luke
Request Handler
http://wiki.apache.org/solr/Lu
Hi,
I got the picture now.
Not having distinct add/update actions forces me to implement a custom
queueing mechanism.
Thanks
Cheers.
Erick Erickson wrote:
> One of the most requested features in Lucene/SOLR is to be able
> to update only selected fields rather than the whole document. But
> that's no
Chris,
I don't see anything in the headers suggesting that Julian's message was a
hijack of another thread
On Thu, Apr 1, 2010 at 2:17 PM, Chris Hostetter wrote:
>
> : Subject: add/update document as distinct operations? Is it possible?
> : References:
> :
>
> : In-Reply-To:
> :
>
>
> http://p
> Not sure of the exact vocabulary I am looking for so I'll
> try to explain
> myself.
>
> Given a search term is there anyway to return back a list
> of related/grouped
> keywords (based on the current state of the index) for that
> term.
>
> For example say I have a sports catalog and I searc
I still don't see what the difference is. If there was a distinct
add/update process, how would that absolve you from having
to implement your own queueing? To have predictable index
content, you still must order your operations.
Best
Erick
On Mon, Apr 5, 2010 at 12:45 PM, Julian Davchev wrote:
: name->john
: year->2009;year->2010;year->2011
:
: And I query for:
: q=john&fq=-year:2010
:
: Doc1 won't be in the matching results. Is there a way to make it appear
: because even having 2010 the document has also years that don't match the
: filter query?
Not natively -- but you can index a
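To make the situation concrete, a sketch of the document and query from the question (field names as given there):

```
Doc1: name=john, year=[2009, 2010, 2011]

q=john&fq=-year:2010   -> Doc1 is excluded: the filter drops any document
                          where *any* of the multivalued years matches 2010
```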
On Apr 3, 2010, at 10:18 AM, MitchK wrote:
>
> Hello,
>
> I want to tinker a little bit with Solr, so I need a little feedback:
> Is it possible to define a Minimum Should Match for the document itself?
>
> I mean, it is possible to say, that a query "this is my query" should only
> match a do
We are using multiselect facets like what you have below (although I haven't
verified your syntax). So no, we are not using sessions.
See http://www.lucidimagination.com/search/?q=multiselect+faceting#/s:email for
help.
-Grant
http://www.lucidimagination.com
On Apr 1, 2010, at 12:35 PM, bbara
OK, it's now Monday and everyone should have had their nice morning cup of
coffee :)
--
View this message in context:
http://n3.nabble.com/MoreLikeThis-function-queries-tp692377p698304.html
Sent from the Solr - User mailing list archive at Nabble.com.
:
: I don't know whether this is the good place to ask it, or there is a special
: tool for issue
: requests.
We use Jira for bug reports and feature requests, but it's always a good
idea to start with a solr-user email before filing a new bug/request to
help discuss the behavior you are seeing
: I want to be able to direct some search terms to specific fields
:
: I want to do something like this
:
: keyword1 should search against book titles / authors
:
: keyword2 should search against book contents / book info / user reviews
your question is a little vague ... will keyword1 and key
Is it possible to access the core name in a config file (such as
solrconfig.xml) so I can include core-specific configlets into a common
config file? I would like to pull in different configurations for
things like shards and replication, but have all the cores otherwise use
an identical confi
Thanks for the response Mitch.
I'm not too sure how well this will work for my needs but I'll certainly play
around with it. I think something more along the lines of Ahmet's solution
is what I was looking for.
This sounds completely normal from what I remember about mergeFactor.
Segments are merged "by level", meaning that with a mergeFactor of 5, once
5 "level 1" segments are formed they are merged into a single "level 2"
segment. Then 5 more "level 1" segments are allowed to form before the
next m
Ahmet thanks, this sounds like what I was looking for.
Would one recommend using the TermsComponent prefix search or the Faceted
prefix search for this sort of functionality. I know for auto-suggest
functionality the general consensus has been leaning towards the Faceted
prefix search over the
I'm guessing the user is expecting there to be one cfs file for the
index, and does not understand that it's actually per segment.
On 04/05/2010 01:59 PM, Chris Hostetter wrote:
This sounds completely normal from what I remember about mergeFactor.
Segments are merged "by level", meaning that wit
: so I have tried to attach the xslt stylesheet to the response of SOLR with
: passing these 2 variables wt=xslt&tr=example.xsl
:
: while example.xsl is an included stylesheet in SOLR, but the response in
: HTML wasn't very perfect.
can you elaborate on what you mean by "wasn't very perfect" ?
: This client uses a simple user-agent that requires JSON syntax while parsing
: search results from solr, but when solr drops an exception, tomcat returns an
: error-500 page to the client and it crashes.
define "crashes" ? ... presumably you are talking about the client crashing
because it ca
Hi,
I am using cachedSqlEntityprocessor in DIH to index the data. Please find
below my dataconfig structure,
---> object
--> object properties
For each and every object I would be retrieving the corresponding object
properties (in my subqueries).
I get into OOM very often and I think that's a
: Some applications (such as Windows Notepad), insert a UTF-8 Byte Order Mark
: (BOM) as the first character of the file. So, perhaps the first word in your
: stopwords list contains a UTF-8 BOM and that's why you are seeing this
: behavior.
Robert: BOMs are one of those things that strike me as b
: Actually I needed time up to seconds granularity, so did you mean I
: should index the field after conversion into seconds
it doesn't really matter what granularity you need -- the point is if you
need to query for things based on time of day, independent of the actual
date, then the best way
: NOW/HOUR-5HOURS evaluates to 2010-03-31T21:00:00 which should not be the
: case if the current time is Wed Mar 31 19:50:48 PDT 2010. Is SOLR converting
: NOW to GMT time?
1) "NOW" means "Now" ... what moment in time is happening right at this
moment is independent of what locale you are in an
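For reference, a sketch of how Solr's date-math rounding behaves (all of it evaluated in UTC, regardless of the server's locale):

```
NOW/HOUR          current time rounded down to the hour, in UTC
NOW/HOUR-5HOURS   five hours before that
NOW/DAY-1DAY      midnight UTC, yesterday
```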
On Mon, Apr 5, 2010 at 2:28 PM, Chris Hostetter
wrote:
> If text files that start with a BOM aren't properly being dealt with by
> Solr right now, should we consider that a bug?
It's a Java bug:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058
But we should fix if it's practical to do
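In the meantime, a client-side workaround is easy to sketch in Python. This is a minimal sketch, not Solr's own loading code: the 'utf-8-sig' codec transparently strips a leading BOM, where plain 'utf-8' would leave U+FEFF glued to the first word:

```python
import io
import tempfile

def read_stopwords(path):
    # 'utf-8-sig' strips a leading UTF-8 BOM if present; plain 'utf-8'
    # would keep U+FEFF attached to the first word, breaking matching
    with io.open(path, encoding="utf-8-sig") as f:
        return [line.strip() for line in f if line.strip()]

# demo: a stopwords file saved by an editor that prepends a BOM
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False) as f:
    f.write("\ufeffthe\nand\nof\n")
    demo_path = f.name

print(read_stopwords(demo_path))  # -> ['the', 'and', 'of']
```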
: > However, I am searching for a solution that does something like: "this is my
: > query" and the document has to consist of this query plus maximal - for
: > example - two another terms?
...
: Not quite following. It sounds like you are saying you want to favor
: docs that are shorter,
On Mon, Apr 5, 2010 at 2:28 PM, Chris Hostetter wrote:
>
> Robert: BOMs are one of those things that strike me as being abhorent and
> inheriently evil because they seem to cause nothing but problems --
>
Yes.
>
> If text files that start with a BOM aren't properly being dealt with by
> Solr ri
Solr also has a feature to stream from a local file rather than over
the network. The parameter
stream.file=/full/local/file/name.txt
means 'read this file from the local disk instead of the POST upload'.
Of course, you have to get the entire file onto the Solr indexer
machine (or a common file
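For example (assuming a local Solr at the default port and an update handler that accepts the file's format), the upload might look like:

```
curl 'http://localhost:8983/solr/update?stream.file=/full/local/file/name.txt&commit=true'
```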
Hi
Can no-one help me with this?
Andrew
On 2 April 2010 22:24, Andrew McCombe wrote:
> Hi
>
> I am experimenting with Solr to index my gmail and am experiencing an error:
>
> 'Unable to load MailEntityProcessor or
> org.apache.solr.handler.dataimport.MailEntityProcessor'
>
> I downloaded a fres
The 2B limitation is within one shard, due to using a signed 32-bit
integer. There is no limit in that regard in sharding- Distributed
Search uses the stored unique document id rather than the internal
docid.
On Fri, Apr 2, 2010 at 10:31 AM, Rich Cariens wrote:
> A colleague of mine is using nati
Thank you both for responding.
Hoss,
what you've pointed out was exactly what I am looking for.
However, I would *always* prefer the second implementation, because you
have to compute the number of terms for all records only *one* time. :-)
At the moment I would feel like w
It seems to work ;).
However, trueman, you should subscribe to solr-user@lucene.apache.org, since
not everybody looks up Nabble for mailing-list postings.
- Mitch
In a word: "no".
What you can do instead of deleting them is to add them to a growing
list of "don't search for these documents". This could be listed in a
filter query.
We had exactly this problem in a consumer app; we had a small but
continuously growing list of obscene documents in the index,
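A sketch of what that filter query might look like (hypothetical uniqueKey values):

```
fq=-id:(doc17 doc42 doc99)
```

If the list is managed server-side, it could also live in the request handler's configuration (e.g. an "appends" fq in solrconfig.xml) so clients don't have to resend it on every request.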
Hey All,
Just to save some folks some time in case you are trying to get new
Lucene/Solr up and running in Eclipse. If you continue to get weird errors,
e.g., in solr/src/test/TestConfig.java regarding
org.w3c.dom.Node#getTextContent(), I found for me this error was caused by
including the Tidy.jar
This information is not available via the API. If you would like this
information added to the statistics request, please file a JIRA
requesting it.
Without knowing the size of the index files to be transferred, the
client cannot monitor its own disk space. This would be useful for the
cloud manag
Sorry for double-posting, but to avoid any misunderstanding:
Accessing instantiated filters is not a really good idea, since a new Filter
must be instantiated all the time. However, what I meant was: if I
create a WordDelimiterFilter or a StopFilter and I have set a param for a
file like stop
Hi,
Suppose I search for the word "international". A particular record (say
recordX) I am looking for is coming as the Nth result now.
I have a requirement that when a user queries for "international" I need
recordX to always be the first result. How can I achieve this?
Note: When the user searc
The MailEntityProcessor is an "extra" and does not come normally with
the DataImportHandler. The wiki page should mention this.
In the Solr distribution it should be in the dist/ directory as
dist/apache-solr-dataimporthandler-extras-1.4.jar. The class it wants
is in this jar. (Do 'unzip -l jar'
Making snippets is part of highlighting.
http://www.lucidimagination.com/search/s:lucid/li:cdrg?q=snippet
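A sketch of the highlighting parameters involved (field name is hypothetical):

```
q=solr&hl=true&hl.fl=content&hl.snippets=3&hl.fragsize=100
```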
On Mon, Apr 5, 2010 at 10:53 AM, Shawn Heisey wrote:
> Is it possible to access the core name in a config file (such as
> solrconfig.xml) so I can include core-specific configlets into a com
mergeFactor=5 means that if there are 42 documents, there will be 5 index files:
1 with 25 documents,
3 with 5 documents, and
1 with 2 documents.
Imagine making change with coins of 1 document, 5 documents, 5^2
documents, 5^3 documents, etc.
On Mon, Apr 5, 2010 at 10:59 AM, Chris Hostetter
wrote
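The "making change" analogy above can be sketched as a greedy base-mergeFactor decomposition. This is an idealized model, not Lucene's actual merge policy; real behavior also depends on buffer settings, which is why the leftover documents land here in two 1-doc segments rather than one 2-doc segment:

```python
def segment_sizes(num_docs, merge_factor=5):
    """Greedy 'making change' view of segment sizes: coins are
    1, merge_factor, merge_factor**2, ... documents."""
    sizes = []
    # find the largest power of merge_factor not exceeding num_docs
    power = merge_factor
    while power * merge_factor <= num_docs:
        power *= merge_factor
    # greedily spend the largest coin that still fits
    while num_docs > 0:
        while power > num_docs:
            power //= merge_factor
        sizes.append(power)
        num_docs -= power
    return sizes

print(segment_sizes(42))  # -> [25, 5, 5, 5, 1, 1]
```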
Hi,
I am using the dismax handler.
I have a field named myfield which has a value, say, XXX.YYY.ZZZ. I have
boosted myfield^20.0.
Even with such a high boost (in fact, among the qf fields specified this
field has the max boost given), when I search for XXX.YYY.ZZZ I see my
record as the second one
Hmmm, how do you know which particular record corresponds to which keyword?
Is this a list known at index time, as in "this record should come up first
whenever "bonkers" is the keyword?
If that's the case, you could copy the magic keyword to a different field
(say magic_keyword) and boost it righ
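A sketch of that suggestion (the field name, keyword, and boost value are all illustrative):

```
q=keywords:bonkers OR magic_keyword:bonkers^10000
```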
What do you get back when you specify &debugQuery=on?
Best
Erick
On Mon, Apr 5, 2010 at 7:31 PM, Mark Fletcher
wrote:
> Hi,
>
> I am using the dismax handler.
> I have a field named myfield which has a value, say, XXX.YYY.ZZZ. I have
> boosted myfield^20.0.
> Even with such a high boost (in fact
: If that's the case, you could copy the magic keyword to a different field
: (say magic_keyword) and boost it right into orbit as an OR clause
: (magic_keyword:bonkers ^1). This kind of assumes that a magic keyword
: corresponds to one and only one document
:
: If this is way off base, p
: Subject: Multicore and TermVectors
It doesn't sound like Multicore is your issue ... it seems like what you
mean is that you are using distributed search with TermVectors, and that
is causing a problem. Can you please clarify exactly what you mean ...
describe your exact setup (ie: how mana
: times. Is there any way to have the index keep its caches when the only thing
: that happens is deletions, then invalidate them when it's time to actually add
: data? It would have to be something I can dynamically change when switching
: between deletions and the daily import.
The problem is
: We had exactly this problem in a consumer app; we had a small but
: continuously growing list of obscene documents in the index, and did
: not want to display these. So, we had a filter query with all of the
: obscene words, and used this with every query.
that doesn't seem like it would really
On Mon, Apr 5, 2010 at 9:04 PM, Chris Hostetter
wrote:
> ... reusing the FieldCache seems like the only thing that would be
> advantageous in that case
And FieldCache entries are currently reused when there have only been
deletions on a segment (since Solr 1.4).
-Yonik
http://www.lucidimagin
: > ... reusing the FieldCache seems like the only thing that would be
: > advantageous in that case
:
: And FieldCache entries are currently reused when there have only been
: deletions on a segment (since Solr 1.4).
But that's kind of orthogonal to (what I think) Lance's point was: that
On Mon, Apr 5, 2010 at 9:10 PM, Chris Hostetter
wrote:
>
>
> : > ... reusing the FieldCache seems like the only thing that would be
> : > advantageous in that case
> :
> : And FieldCache entries are currently reused when there have only been
> : deletions on a segment (since Solr 1.4).
>
> But
Hi Erick,
Many thanks for your mail!
Please find attached the debugQuery results.
Thanks!
Mark
On Mon, Apr 5, 2010 at 7:38 PM, Erick Erickson wrote:
> What do you get back when you specify &debugQuery=on?
>
> Best
> Erick
>
> On Mon, Apr 5, 2010 at 7:31 PM, Mark Fletcher
> wrote:
>
> > Hi,
> >
On 04/05/2010 01:53 PM, Shawn Heisey wrote:
Is it possible to access the core name in a config file (such as
solrconfig.xml) so I can include core-specific configlets into a
common config file? I would like to pull in different configurations
for things like shards and replication, but have al
On 04/05/2010 02:28 PM, bbarani wrote:
Hi,
I am using cachedSqlEntityprocessor in DIH to index the data. Please find
below my dataconfig structure,
---> object
--> object properties
For each and every object I would be retrieveing corresponding object
properties (in my subqueries).
I ge
: The best you have to work with at the moment is Xincludes:
:
: http://wiki.apache.org/solr/SolrConfigXml#XInclude
:
: and System Property Substitution:
:
: http://wiki.apache.org/solr/SolrConfigXml#System_property_substitution
Except that XInclude is a feature of the XML parser, while proper
I had a slight hiccup that I just ignored. Even when I used Java 1.6
JDK mode, Eclipse did not know this method. I had to comment out the
three places that use this method.
javax.xml.parsers.DocumentBuilderFactory.setXIncludeAware(true)
Lance Norskog
On Mon, Apr 5, 2010 at 1:49 PM, Mattmann, Chr
Mark,
I have opened a JIRA issue - https://issues.apache.org/jira/browse/SOLR-1867
Thanks,
Barani
There is no query parameter. The query parser throws an NPE if there
is no query parameter:
http://issues.apache.org/jira/browse/SOLR-435
It does not look like term vectors are processed in distributed search anyway.
On Mon, Apr 5, 2010 at 4:45 PM, Chris Hostetter
wrote:
>
> : Subject: Multicor
On 04/05/2010 10:12 PM, Chris Hostetter wrote:
: The best you have to work with at the moment is Xincludes:
:
: http://wiki.apache.org/solr/SolrConfigXml#XInclude
:
: and System Property Substitution:
:
: http://wiki.apache.org/solr/SolrConfigXml#System_property_substitution
Except that XInclude
Maddy,
you need to reindex the whole record if you change or add any kind of data
that belongs to it.
Please note that you need to subscribe to the solr-user mailing list, since
not everyone is using Nabble to get mailing-list postings.
Kind regards,
- Mitch
Maddy.Jsh wrote:
>
> I indexed
Yeah, thanks for pointing this out.
I'm not using any relevancy functions (yet). The data indexed for my app is
basically log events.
The most relevant events are the newest ones, so sorting by timestamp is
enough.
BTW, your book is great ;)
-Janne
2010/3/31 Smiley, David W.
> Janne,
>