Given an input of "Windjacke" (probably "wind jacket" in English), I'd
like the code that prepares the data for the index (tokenizer etc) to
understand that this is a "Jacke" ("jacket") so that a query for "Jacke"
would include the "Windjacke" document in its result set.
It appears to me that such
> Given an input of "Windjacke" (probably "wind jacket" in English),
> I'd like the code that prepares the data for the index (tokenizer
> etc) to understand that this is a "Jacke" ("jacket") so that a
> query for "Jacke" would include the "Windjacke" document in its
> result set.
>
> It appears t
would slow down the update process but you don't need to split
> words during search.
> > On 12 Apr 2012 at 11:52, Michael Ludwig wrote:
> >
> >> Given an input of "Windjacke" (probably "wind jacket" in English),
> >> I'd like the co
> From: Markus Jelsma
> We've done a lot of tests with the HyphenationCompoundWordTokenFilter
> using an FOP XML file generated from TeX for the Dutch language and
> have seen decent results. A bonus was that now some tokens can be
> stemmed properly because not all compounds are listed in the
> dic
> From: Walter Underwood
> German noun decompounding is a little more complicated than it might
> seem.
>
> There can be transformations or inflections, like the "s" in
> "Weihnachtsbaum" (Weihnachten/Baum).
I remember from my linguistics studies that the terminus technicus for
these is "Fugenmorph
> From: Tomas Zerolo
> > > There can be transformations or inflections, like the "s" in
> > > "Weihnachtsbaum" (Weihnachten/Baum).
> >
> > I remember from my linguistics studies that the terminus technicus
> > for these is "Fugenmorphem" (interstitial or joint morpheme) [...]
>
> IANAL (I am not a l
plus 1 𐀀
Maybe the test script output says that such characters cannot be used
for querying. Hardly relevant if you consider that the BMP comprises
even languages such as Telugu, Bopomofo and French.
Best,
Michael Ludwig
some
profiling for your specific scenario.
The rule of thumb here is probably: Get what you need.
Michael Ludwig
System.out.println(Charset.defaultCharset().displayName());
System.out.println(new String(bytes));
System.out.println(new String(bytes, Charset.forName("UTF-8")));
}
}
Output:
windows-1252
KÃ¤se (bad)
Käse (good)
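A self-contained variant of the same experiment, as a minimal sketch. The class name DecodeDemo and the helper decode are made up for illustration, and ISO-8859-1 stands in for the platform default because it maps the bytes C3 A4 to the same two characters as windows-1252 does.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeDemo {
    /** Decodes bytes with the charset the caller names, never the platform default. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "Käse" encoded as UTF-8: the umlaut becomes the two bytes C3 A4.
        byte[] bytes = "K\u00e4se".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(bytes, StandardCharsets.ISO_8859_1)); // wrong charset: mojibake
        System.out.println(decode(bytes, StandardCharsets.UTF_8));      // right charset: round-trips
    }
}
```

Naming the charset on both the encoding and the decoding side is what makes the result independent of the machine the code runs on.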
Michael Ludwig
lt of favouring XML over strings,
I rather want something like this:
Eumel NDR Ländermagazine
There could be a parameter "hl.xml" which I could use to request
modified XML like this:
hl.xml=em
hl.xml=b
This would allow smoother processing with technologies like XSLT.
Is such a feature available?
Michael Ludwig
ke a look at the class
java.nio.charset.Charset and the methods encode, decode, newEncoder,
newDecoder.
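To give an idea of how these methods fit together, here is a rough sketch. The class StrictDecodeSketch and its method names are invented for this example; only the java.nio.charset calls are real API. It uses newDecoder to reject malformed input instead of silently replacing it, and encode to see how many bytes a string takes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictDecodeSketch {
    /** Returns true if the bytes are well-formed UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        // A fresh decoder that reports errors rather than substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Number of bytes the string occupies once encoded as UTF-8. */
    static int utf8Length(String s) {
        return StandardCharsets.UTF_8.encode(s).remaining();
    }
}
```

A check like isValidUtf8 can be run on incoming data before it is sent on for indexing, which turns a silent encoding mix-up into an explicit failure.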
Michael Ludwig
g to field/@type?
Or do these default to "true" regardless of what's specified in the
respective ?
Michael Ludwig
&cred;
m...@lobster:~/funkuhr > cat zwei.xml
]>
&cred;
m...@lobster:~/funkuhr > cat cred.ent
ich
geheim
m...@lobster:~/funkuhr > xmllint --noent eins.xml
]>
ich
geheim
m...@lobster:~/funkuhr > xmllint --noent zwei.xml
]>
ich
geheim
But that doesn'
to that
type of data that I could limit my search to, as per Otis' post?
(4) And is that what's called a "core" here?
(5) Or, failing (3), and lumping everything together in one search
domain (core?), would I use that "type field" to limit my search to
a particular type of data?
Michael Ludwig
Matt Weber wrote:
http://wiki.apache.org/solr/MultipleIndexes
Thanks, Mark. Your explanation and the pointer to the Wiki have
clarified things for me.
Michael Ludwig
Otis Gospodnetic wrote:
Attribute values for fields should be inherited from attribute values
of their field types.
Thanks, that answers my question pertaining to @indexed and @stored in
the "fieldtype" and "field" elements in "schema.xml".
Michael Ludwig
structions in the tutorial
and run Solr in Jetty as per the distribution, which works out of the
box:
http://lucene.apache.org/solr/tutorial.html
Michael Ludwig
hieve. I think
you should start there.
http://lucene.apache.org/solr/tutorial.html#Indexing+Data
Michael Ludwig
n the address bar of your browser.
Or even do a string replacement s/8983/8080/g on the Solr doc you're
viewing.
Michael Ludwig
uday kumar maddigatla wrote:
My intention is to use 8080 as port.
Is there any other way that Solr will post the files to port 8080?
Solr doesn't post, it listens.
Use the curl utility as indicated in the documentation.
http://wiki.apache.org/solr/UpdateXmlMessages
Michael Ludwig
u
can dream up.
Seriously, read the docs, it'll help you :-)
Michael Ludwig
asing overlaps and hence redundancy?
Michael Ludwig
with some encoding not getting supported by Solr.
Did you make sure to not rely on your platform default encoding
(Charset) when constructing the InputStreamReader? If in doubt, take
a look at the InputStreamReader constructors.
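A minimal sketch of what that looks like, assuming the input is UTF-8 (the class and method names are hypothetical):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ReaderCharsetSketch {
    /** Reads the first line of the stream, decoding it explicitly as UTF-8. */
    static String readFirstLine(InputStream in) {
        // The two-argument constructor names the charset; the one-argument
        // constructor would fall back to the platform default instead.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```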
Michael Ludwig
äse"));
}
}
Note the result of the above, which is plain wrong, reads:
[(k,0,1,type=), (se,2,4,type=)]
Thanks.
Michael Ludwig
nsidering that I'm a Solr/Lucene newbie, this approach might have a
disadvantage that escapes me, which is why other people haven't made
this particular suggestion. If so, I'd be happy to learn why this isn't
preferable.
Michael Ludwig
I'll plead ignorance of the 'ineluctable filter query' and will have
to read up on that one.
I meant a filter query that the application tags onto the query on
behalf of the user and without the user being able to do anything about
it so he cannot circumvent the filter.
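As a rough sketch of the idea: the application builds the request itself and only ever passes the user's text into q, while fq comes from the server side. The class name and the example filter are assumptions for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class ForcedFilterSketch {
    /**
     * Builds the query string for a select request. The user controls only
     * userQuery; forcedFilter is chosen by the application (for instance from
     * the logged-in user's permissions) and is never read from user input.
     */
    static String buildQueryString(String userQuery, String forcedFilter) {
        return "q=" + enc(userQuery) + "&fq=" + enc(forcedFilter);
    }

    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Because the user's text is URL-encoded as a whole, an ampersand typed into the search box ends up inside q and cannot smuggle in a second fq parameter.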
Best regards,
Michael Ludwig
do entities.
C:\MILU\dev\XML # type egpe-net.xml
http://lobster.as-guides.com/ds/solr.schema.ent"; >
]>
&egpe_from_the_net;
&egpe_from_the_local_disk;
C:\MILU\dev\XML # type egpe-local.ent
Michael Ludwig
Shalin Shekhar Mangar wrote:
On Mon, May 11, 2009 at 2:46 PM, Michael Ludwig
wrote:
Could you give an example of how the spellcheck.q parameter can be
brought into play to (take non-ASCII characters into account, so
that "Käse" isn't mishandled) given the following example:
L}\d_]+:-)
Michael Ludwig
ite different from a spellchecker.
IMHO, a name conveying the actual meaning, along the lines of
"suggest", would make more sense.
Michael Ludwig
laid
out in the thread referred to above, it seems you want to
use the spellcheck.q parameter for anything but what can
be encoded in ASCII. Is that true?
Michael Ludwig
and if possible, give a patch?
Please see: https://issues.apache.org/jira/browse/SOLR-1204
Regards,
Michael Ludwig
ess invasive. I added two sentences to the "Introduction" of:
http://wiki.apache.org/solr/SpellCheckComponent
Michael Ludwig
category values. I then allow the application to apply filtering
by category, incidentally, using faceting, which is a typical usage
pattern, I guess.
Michael Ludwig
@size) is concerned?
Michael Ludwig
is so terribly expensive.
Michael Ludwig
f you don't save, say, five or ten percent (YMMV), it
might not be worth the effort.
Michael Ludwig
side process based on top N (say 100) hits for this but it is my last
option.
Also a very interesting data mining question! I'm sorry I don't have any
answers for you. Maybe someone else does.
Best,
Michael Ludwig
), and (b) collecting all the
pesky little terms from the new structure mapping documents to term
numbers?
So basically, depending on expediency, you (a) know the facets and count
the documents which display them, or you (b) take the documents and see
what facets they have?
Michael Ludwig
ions is likely to scale as the product of
the number of your primary search results, the number of your search
terms, and the number of your facets.
I assume this is an expensive operation.
Michael Ludwig
Shalin Shekhar Mangar wrote:
On Tue, Jun 9, 2009 at 7:25 PM, Michael Ludwig
wrote:
A filter query should probably be orthogonal to the primary query,
which means in plain English: unrelated to the primary query. To give
an example, I have a field "category", which is a required
Shalin Shekhar Mangar wrote:
On Tue, Jun 9, 2009 at 7:47 PM, Michael Ludwig
wrote:
Given the following three filtering scenarios of (a) x:bla, (b)
y:blub, and (c) x:bla AND y:blub, will I end up with two or three
distinct filters? In other words, may filters be composites or are
they
Shalin Shekhar Mangar wrote:
No, both filters and queries are computed on the entire index.
My comment was related to the "A filter query should probably be
orthogonal to the primary query..." part. I meant that both kinds of
use-cases are common.
Got it. Thanks :-)
Michael Ludwig
Fergus McMenemie wrote:
On Tue, Jun 9, 2009 at 7:25 PM, Michael Ludwig
wrote:
A filter query is cached, which means that it is the more useful
the more often it is repeated. We know how often certain queries
arise, or at least have the means to collect that data - so we
know what might be
Yonik Seeley wrote:
Yep, all that sounds right.
An additional optimization counts terms for the documents *not* in the
set when the base set is over half the size of the index.
Cool :-) Thanks for confirming my assumptions!
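The complement trick Yonik mentions can be illustrated with a toy model. This is only a sketch of the idea, not Solr's code: the arrays docTerms (document number to term numbers) and termTotals (per-term document counts over the whole index) are assumed to be precomputed, and all names are invented.

```java
import java.util.BitSet;

final class ComplementCountSketch {
    /**
     * Counts, per term, how many documents of baseSet contain it.
     * If baseSet covers more than half the index, start from the index-wide
     * totals and subtract what the documents outside the set contribute.
     */
    static int[] facetCounts(int[][] docTerms, int[] termTotals, BitSet baseSet) {
        int maxDoc = docTerms.length;
        boolean useComplement = baseSet.cardinality() * 2 > maxDoc;
        int[] counts = useComplement ? termTotals.clone() : new int[termTotals.length];
        int step = useComplement ? -1 : 1;
        for (int doc = 0; doc < maxDoc; doc++) {
            // Visit only the smaller side: docs in the set, or docs outside it.
            if (baseSet.get(doc) == useComplement) {
                continue;
            }
            for (int term : docTerms[doc]) {
                counts[term] += step;
            }
        }
        return counts;
    }
}
```

Either way the term lists of at most half the documents are walked, which is where the saving comes from.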
Michael Ludwig
to be determined. Is that a correct assessment?
Michael Ludwig
such as GNU Gettext for this purpose.
May or may not make sense in your particular situation.
Michael Ludwig
address.
Michael Ludwig
ing to any other language.
Michael Ludwig
ashokc wrote:
Do I have to declare 'field1' also to be stored? 'field1' is never
returned in the response.
I find the following Wiki page helpful when dealing with @stored,
@indexed and friends:
http://wiki.apache.org/solr/FieldOptionsByUseCase
Michael Ludwig
might be
overkill for your particular situation.
Michael Ludwig
rough the files in question, but I can't seem to
find the issue. Any suggestions?
Run: ant -verbose
Michael Ludwig
"the DisMaxRequestHandler is simply the standard request
handler with the default query parser set to the DisMax Query Parser".
So maybe you could program your own CustomDisMaxRequestHandler that
reuses the DisMax query parser (and probably other components) to
achieve what you want.
Michael Ludwig
.
BUILD SUCCESSFUL
You might want to read up on Ant usage in the Ant User Manual, a copy of
which should be part of your installation, or can be found on the web.
Quick overview:
ant -help
When I wrote "ant -verbose", I meant "ant -verbose <target>", so:
ant -verbose example
Michael Ludwig
ch is what gets used to analyze
the data in order to determine clusters, if I understand correctly.
Michael Ludwig
Michael Ludwig wrote:
Martin Davidsson wrote:
I've tried to read up on how to decide, when writing a query, which
criteria go in the q parameter and which go in the fq parameter,
to achieve optimal performance. Is there [...] some kind of rule of
thumb to help me decide how to
.
Bottom line, I think it may make perfect sense to store dates and times
in integers, depending on your use case and your client.
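One possible encoding, purely as an illustration (the yyyyMMdd layout and the class name are my assumptions, not something prescribed by Solr): a calendar date packed into an int, which sorts and range-compares in date order for years 1 to 9999.

```java
import java.time.LocalDate;

final class IntDateSketch {
    /** Packs a date into an int of the form yyyyMMdd, e.g. 20090615. */
    static int toInt(LocalDate date) {
        return date.getYear() * 10000 + date.getMonthValue() * 100 + date.getDayOfMonth();
    }

    /** Reverses toInt; throws DateTimeException if the value is not a real date. */
    static LocalDate fromInt(int value) {
        return LocalDate.of(value / 10000, (value / 100) % 100, value % 100);
    }
}
```

The client pays for this with its own conversion code, and date arithmetic has to happen before packing, which is the kind of trade-off that depends on the use case.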
Michael Ludwig
o range faceting for a given
field and obtain, say, results reduced from their actual continuum
of values to three ranges {A,B,C}, you'd have to define three
"facet.query" parameters accordingly. A mere "facet.field", on the
other hand, creates as many filters as there are unique values in
the field. Is that correct?
Michael Ludwig
Shalin Shekhar Mangar wrote:
On Mon, Jun 15, 2009 at 4:39 PM, Michael Ludwig
wrote:
I think if you truncate dates to incomplete dates, you effectively
also lose all the date logic. You may still apply it, but what would
you take the result to mean? You can't regain precision you'
, but some less regular graph, then the notion of a
"main item" needs clarification.
Michael Ludwig
2 and
update the indexes. Is it possible to send only the differences
to shard 3 and then merge them at shard 3?
My (very limited) understanding of shards is that you repartition
your documents among shards and send each document to only one
shard. (Not sure this is correct.)
Michael Ludwig
Use the DisMaxRequestHandler and specify all fields you want to use in
your query in the qf parameter.
artist^3 album^2 track^1
http://wiki.apache.org/solr/DisMaxRequestHandler
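For what it's worth, a client could assemble such a request roughly as follows. This is a sketch only: the class is invented, and selecting the parser with defType=dismax is an assumption about your setup (a request handler configured for DisMax works as well).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;
import java.util.StringJoiner;

final class DisMaxQuerySketch {
    /** Turns field boosts into a qf value such as "artist^3 album^2 track^1". */
    static String qfValue(Map<String, Integer> boosts) {
        StringJoiner qf = new StringJoiner(" ");
        for (Map.Entry<String, Integer> boost : boosts.entrySet()) {
            qf.add(boost.getKey() + "^" + boost.getValue());
        }
        return qf.toString();
    }

    /** Builds the URL-encoded query string for the user's search text. */
    static String queryString(String userQuery, Map<String, Integer> boosts) {
        return "q=" + enc(userQuery) + "&defType=dismax&qf=" + enc(qfValue(boosts));
    }

    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Pass a LinkedHashMap if the order of the fields in qf matters to you for readability; the boosts themselves do not depend on the order.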
Michael Ludwig
's what most people do, though nothing prevents the indexing
client from sending the same doc to multiple shards. In some
scenarios that's exactly what you want to do.
What kind of scenario would that be?
Michael Ludwig
--
A: Because it messes up the order in which people normally read te
something like "solr
date range query". For example, see:
http://www.nabble.com/Date-Range-Query-%2B-Fields-to16108517.html
Michael Ludwig
://wiki.apache.org/solr/CoreAdmin
Michael Ludwig
"&qt=dismaxrequest -
return correct results
I'd attribute that to the "mm" (minimum match) parameter, the meaning
of which you can understand reading the following page, which it would
probably make a lot of sense to read anyway:
http://wiki.apache.org/solr/DisMaxRequestHandler
Michael Ludwig
ather than the terms
within a single field.
I added the comment because I think that a wiki page discussing fq vs
q should also mention facet.query.
It now does: http://wiki.apache.org/solr/FilterQueryGuidance
Michael Ludwig
e" titles :-)
Now with a phrase query with a small ps and a large posIncGap that
could work. But then I lose the ability to search for artist and
track name together.
Another thing, are you sure you have enabled "pf" for "track"?
Michael Ludwig
- every song or whatever, definitely multi-valued
Michael Ludwig
ifier
title - album title
interpret - the musician, possibly multi-valued
track - every song or whatever, definitely multi-valued
Read up about multi-valued fields (sample schema.xml, for example, or
Google) if you're unsure what this is; your posting subject, however,
suggests you aren't.
Regards,
Michael Ludwig
stops!
Imagine it did one day!
Michael Ludwig
cumulative_evictions : 61153787
As we can see, the cache hit ratio is almost zero. How do I improve the
filter cache?
Maybe these pages add some ideas to the mix:
http://wiki.apache.org/solr/FilterQueryGuidance
https://issues.apache.org/jira/browse/SOLR-475
Michael Ludwig
r:8983/solr/kk
For SolrJ, see this thread:
Using SolrJ with multicore/shards - ahammad
http://markmail.org/thread/qnytfrk4dytmgjis
If so, isn't there a better way to do that?
No idea.
Michael Ludwig
Rakhi Khatwani wrote:
On Thu, Jun 18, 2009 at 3:51 PM, Michael Ludwig
wrote:
I don't know how we're supposed to use it. I did the following:
http://flunder:8983/solr/xpg/select?q=bla&shards=flunder:8983/solr/xpg,flunder:8983/solr/kk
I am getting a page load error... "
st exact match which is nothing but unique key =
1001?
Yes, it is: q=id:1001
(1) Don't use DisMax here, that will not interpret field names.
(2) Replace "id" by whatever name you gave to your unique key field.
Michael Ludwig
MilkDud wrote:
Michael Ludwig-4 wrote:
What do you expect the user to enter?
* "dream theater innocence faded" - certainly wrong
* dream theater "innocence faded" - much better
Most likely they would just enter dream theater innocence faded, no
quotes. Without any quot
fq=y:blub" instead of "fq=x:bla
AND y:blub". See:
filterCache/@size, queryResultCache/@size, documentCache/@size
http://markmail.org/thread/tb6aanicpt43okcm
Michael Ludwig
nd to think that drop-down boxes
(the values of which you control) are a nice match for the filter query,
whereas user-entered text is more likely to be a candidate for the main
query.
Michael Ludwig
Radha C. wrote:
Is the "spelling suggestion" feature available in Solr? If so, can
you point me to some documentation?
Have you tried googling for: solr spelling ? First hit:
http://wiki.apache.org/solr/SpellCheckComponent
Michael Ludwig
rFieldType - Michael Ludwig
http://markmail.org/thread/dgi4llhc7x5wuroc
(BTW, the patch in SOLR-1204 is ready but still awaiting clarification.
See comments from June 11 and 18.)
My Config is :
spellcheck = 'true';
spellcheck.dictionary = 'jarowinkler'
spellcheck.onlyMorePop
ou'll probably find that the word "for" is removed as a so-called
stopword.
Michael Ludwig
e ok but more than 3 words
returned zero results. Why does this happen?
Hi Akinori,
I guess you're using the DisMax query parser. Please read this entire
page: http://wiki.apache.org/solr/DisMaxRequestHandler
The parameter that allows you to tweak this is the "mm" parameter.
Michael Ludwig
Koji Sekiguchi wrote:
I'm not a Windows user, but I think you can use Linux commands (e.g.
patch, to apply the SOLR-284 patch to a Solr nightly build) in a Cygwin
environment.
The standalone patch utility for Win32 is another option.
http://gnuwin32.sourceforge.net/packages/patch.htm
Michael Ludwig
Gurjot Singh wrote:
Hi,
Is there a way to monitor the number of search queries made on the
Solr index?
http://localhost:8983/solr/admin/stats.jsp
Look for "requests :".
Michael Ludwig
the inclusion of a stopword list result in stopwords being of
top importance in the MoreLikeThis query?
Michael Ludwig
Wallace wrote:
I'd like to hear what approaches are being used by users to know what
people are searching for in their apps.
You could process the access log.
You could write a filter servlet logging the relevant part of the query
string to a dedicated location.
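For the access log route, something along these lines might do as a starting point. It is a sketch under assumptions: the log is taken to contain the request line with its query string (as in the common log format), and the class and method names are made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QueryLogSketch {
    // Matches the q parameter only; "fq=" is skipped because q must follow ? or &.
    private static final Pattern Q_PARAM = Pattern.compile("[?&]q=([^&\\s\"]*)");

    /** Counts how often each (URL-decoded) q value occurs in the log lines. */
    static Map<String, Integer> countQueries(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher matcher = Q_PARAM.matcher(line);
            if (matcher.find()) {
                counts.merge(decode(matcher.group(1)), 1, Integer::sum);
            }
        }
        return counts;
    }

    private static String decode(String raw) {
        try {
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            return raw; // keep malformed escapes as they were logged
        }
    }
}
```

Note that queries sent via POST will not show up in the access log, so this only covers GET requests.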
Michael Ludwig