Hi All,
I am very new to Solr and still learning.
I have 10 columns in a table, like the following:
1. id
2. name
3. user_id
4. location
5. country
6. landmark1
7. landmark2
8. landmark3
9. landmark4
10. landmark5
When a user searches for a landmark, I want to return only the one landmark
that matches. Rest of the lan
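Something like the following is roughly what I am after, I think (a sketch
only - dismax syntax from the wiki, with my field names from the list above):

q=eiffel&defType=dismax&qf=landmark1 landmark2 landmark3 landmark4 landmark5&hl=true&hl.fl=landmark1,landmark2,landmark3,landmark4,landmark5

so that the highlighting section tells me which of the five landmark fields
actually matched.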
Hi,
I have a requirement where I want to sum up the scores of the faceted
fields. This will decide the relevancy for us. Is there a way to do it on
a facet field? Basically, instead of giving the count of records for a facet
field, I would like to have the total sum of scores for those records.
Any
I have studied some Russian. I kind of got the picture from the texts that all
the exceptions had already been 'found', and were listed in the book.
I do know that languages are living, changing organisms, but Russian has got to
be more regular than English I would think, even WITH all six case
Should this go into the trunk, or does it only solve problems unique
to your use case?
On Tue, Jul 27, 2010 at 5:49 AM, Chantal Ackermann
wrote:
> Hi Mitch,
>
> thanks for the code. Currently, I've got a different solution running
> but it's always good to have examples.
>
>> > I realized
>> > t
Solr respects case for field names. Database fields are supplied in
lower-case, so it should be 'attribute_name' and 'string_value'. Also
'product_id', etc.
It is easier if you carefully emulate every detail in the examples,
for example lower-case names.
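For example, a minimal data-config.xml entity along these lines (table and
column names hypothetical):

<entity name="item" query="select product_id, attribute_name, string_value from item_attributes">
  <field column="product_id" name="product_id"/>
  <field column="attribute_name" name="attribute_name"/>
  <field column="string_value" name="string_value"/>
</entity>

where each column attribute matches the case of what the JDBC driver returns.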
On Tue, Jul 27, 2010 at 2:59 PM, kenf_nc
Ah! You have junk files piling up in the slave index directory. When
this happens, you may have to remove data/index entirely. I'm not sure
if Solr replication will handle that, or if you have to copy the whole
index to reset it.
You said the slaves time out - maybe the files are so large that the
There are two different datasets that Solr (Lucene really) saves from
a document: raw storage and the indexed terms. I don't think the
ExtractingRequestHandler ever automatically stored the raw data; in
fact Lucene works in Strings internally, not raw byte arrays (this is
changing).
It should be i
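In schema.xml terms, that distinction is the stored vs. indexed flag on a
field, e.g. (field name hypothetical):

<field name="content" type="text" indexed="true" stored="true"/>

stored="true" keeps the raw value for retrieval and highlighting;
indexed="true" produces the searchable terms.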
"Yonik's Law of Patches" reads: "A half-baked patch in Jira, with no
documentation, no tests and no backwards compatibilty is better than no
patch at all."
It'd be perfectly appropriate, IMO, for you to post an outline of what your
enhancements do over on the SOLR dev list and get a reaction from
I would start over from the Solr 1.4.1 binary distribution and follow
the instructions on the wiki:
http://wiki.apache.org/solr/ExtractingRequestHandler
(Java classpath stuff is notoriously difficult, especially when
dynamically configured and loaded. I often cannot tell if Java cannot
load the c
> Is there a way to tell Solr to only return a specific set of facet values? I
> feel like the facet query must be able to do this, but I'm not really
> understanding the facet query. In my specific case, I'd like to only see
> facet
> values for the same values I pass in as query filters, i.e.
Is there a way to tell Solr to only return a specific set of facet values? I
feel like the facet query must be able to do this, but I'm not really
understanding the facet query. In my specific case, I'd like to only see facet
values for the same values I pass in as query filters, i.e. if I run
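Something like this, I imagine (field names hypothetical):

q=*:*&fq=color:red&fq=brand:acme&facet=true&facet.query=color:red&facet.query=brand:acme

i.e. instead of facet.field giving me counts for every value of color and
brand, I would get counts back only for the values I filtered on.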
(10/07/27 23:16), Stephen Green wrote:
The wiki entry for hl.highlightMultiTerm:
http://wiki.apache.org/solr/HighlightingParameters#hl.highlightMultiTerm
doesn't appear to be correct. It says:
If the SpanScorer is also being used, enables highlighting for
range/wildcard/fuzzy/prefix queries.
for STRING_VALUE, I assume there is a property in the 'select *' results
called string_value? If so, I'm not sure why it wouldn't work. If not, then
that's why - it doesn't have anything to put there.
For ATTRIBUTE_NAME, is it possibly a case issue? You called it
'Attribute_Name' in your query, but
Yonik,
One more update on this. I took the filter query that was throwing the
error and used it to delete a subset of results.
After that, the queries started working correctly.
Which indicates that the particular docId was present in the index somewhere,
but Lucene was not able to find it.
I haven't been able to reproduce anything...
But if you guys are sure you're not running any custom code, then
there definitely seems to be a bug somewhere.
Can anyone reproduce this in something you can share?
-Yonik
http://www.lucidimagination.com
Hi,
(The first version of this was rejected for spam).
I'm setting up a test instance of Solr, and keep running into the problem of
having Solr not work the way I think it should work. Specifically, the data I
want to go into the index isn't there after indexing. I'm extracting the data
from M
I thought I asked a variation of this before, but I don't see it on the
list; apologies if this is a duplicate, but I have new questions.
So I need to find the min and max value of a result set, which can be
several million documents. One way to do this is the StatsComponent.
One problem is t
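For reference, the StatsComponent request I have been trying looks like this
(field name hypothetical):

q=*:*&rows=0&stats=true&stats.field=price

which returns min and max (along with sum, count, mean, etc.) over the whole
result set.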
Erik,
You're right on both counts. I'll upgrade and then check into whether
our tokenizer is working properly.
Thanks,
Than
Erik Hatcher wrote:
Than -
Looks like maybe your text_bo field type isn't analyzing how you'd
like? Though that's just a hunch. I pasted the value of that field
I am getting a similar error with today's nightly build:
HTTP Status 500 - Index: 54, Size: 24
java.lang.IndexOutOfBoundsException: Index: 54, Size: 24
  at java.util.ArrayList.RangeCheck(ArrayList.java:547)
  at java.util.ArrayList.get(ArrayList.java:322)
  at org.apache.lucene.index.FieldInfos.fieldIn
Than -
Looks like maybe your text_bo field type isn't analyzing how you'd
like? Though that's just a hunch. I pasted the value of that field
returned in the link you provided into your analysis.jsp page and it
chunked tokens by whitespace. Though I could be experiencing a copy/
paste/i
Thank you very much Hoss for the reply.
I am using the embedded mode (SolrServer). I am not explicitly accessing
SolrIndexSearcher. I am explicitly closing the SolrCore after the request
has been processed.
Although I did notice that I am using a SolrQueryRequest object and am not
explicitly getti
On Jul 27, 2010, at 12:21pm, Chris Hostetter wrote:
:
: I was wondering if anyone has found any resolution to this email
thread?
As Grant asked in his reply when this thread was first started
(December 2009)...
It sounds like you are either using embedded mode or you have some
custom c
I'm a relative beginner at SOLR, indexing and searching Unicode Tibetan
texts. I am trying to use the highlighter, but it just returns empty
elements, such as:
What am I doing wrong?
The query that generated that is:
http://www.thlib.org:8080/thdl-solr/thdl-texts/select?inden
: Thanks for your reply. I could not find any mention of that in the log
: files. By the way, I only have YYYY_MM_DD.request.log files in my directory.
:
: Do I have to enable any specific log or level to catch those errors?
if you are using that "java -jar start.jar" command for the example jet
:
: I was wondering if anyone has found any resolution to this email thread?
As Grant asked in his reply when this thread was first started (December
2009)...
>> It sounds like you are either using embedded mode or you have some
>> custom code. Are you sure you are releasing your resources co
: Is there any way to have timeout support in distributed search? I
: searched https://issues.apache.org/jira/browse/SOLR-502 but it looks like
: it is not in the main release of Solr 1.4.
note that issue is marked "Fix Version/s: 1.3" ... that means it
was fixed in Solr 1.3, well before 1.4 came out.
Yo
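(For reference, what SOLR-502 added is the timeAllowed parameter - a sketch,
value in milliseconds:

q=foo&timeAllowed=1000

and partial results are returned if the limit is hit.)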
Mark,
I'd like to see your code if you open a JIRA for this. I recently
opened SOLR-2010 with a patch that does something similar to only the second
part of what you describe (find combinations that actually return a
match). But I'm not sure if my approach is the best one, so I would like
to see
In trunk (flex) you can ask each segment for its unique term count.
But to compute the unique term count across all segments is
necessarily costly (requires merging them, to de-dup), as Hoss
described.
Mike
On Tue, Jul 27, 2010 at 12:27 PM, Burton-West, Tom wrote:
> Hi Jason,
>
> Are you lookin
Alessandro & all,
I was having the same issue with Tika crashing on certain PDFs. I also noticed
the bug where no content was extracted after upgrading Tika.
When I went to the SOLR issue you link to below, I applied all the patches,
downloaded the Tika 0.8 jars, restarted tomcat, posted a f
Hi,
I found the suggestions returned from the standard Solr spellcheck not to be
that relevant. By contrast, aspell, given the same dictionary and misspelled
words, gives much more accurate suggestions.
I therefore wrote an implementation of SolrSpellChecker that wraps Jazzy,
the Java aspell libra
I'm adding lots of small docs with several threads to solr and the adds
start fast but then slow down. I didn't do any explicit commits, and
autocommit is turned off, but the logs show lots of commit activity on
this core, and restarting this Solr core logged the below. Where did all
these commits c
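For reference, this is the autocommit section I double-checked in
solrconfig.xml (commented out, i.e. disabled):

<!--
<autoCommit>
  <maxDocs>10000</maxDocs>
  <maxTime>60000</maxTime>
</autoCommit>
-->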
Hi,
I'm trying to sort by distance like this:
sort=dist(2,lat,lon,55.755786,37.617633) asc
In general results are sorted, but some documents are not in the right order.
I'm using DistanceUtils.getDistanceMi(...) from lucene spatial to calculate
real distance after reading documents from Solr.
Solr
Hi Jason,
Are you looking for the total number of unique terms or total number of term
occurrences?
CheckIndex reports both, but does a bunch of other work, so it is probably not
the fastest.
If you are looking for total number of term occurrences, you might look at
contrib/org/apache/lucene/misc
Look into -XX:-UseGCOverheadLimit
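It's a JVM flag, so it goes on the java command line, e.g. (heap size just an
example):

java -Xmx1024m -XX:-UseGCOverheadLimit -jar start.jar

Note it only disables that particular check; if the heap really is too small,
you will likely get a plain OutOfMemoryError instead.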
On 7/26/10, Jonathan Rochkind wrote:
> I am now occasionally getting a Java "GC overhead limit exceeded" error
> in my Solr. This may or may not be related to recently adding much
> better (and more) warming queries.
>
> I can get it when trying a 'commit', afte
According to SO:
http://stackoverflow.com/questions/1557616/retrieving-per-keyword-field-match-position-in-lucene-solr-possible
It says it is not possible, but that was a year ago - is it still true now?
Thanks.
Hi Yonik,
I am using the Solr 1.4 release dated Feb 9, 2010. There is no custom code. I am
using the regular out-of-the-box dismax request handler.
The query is a simple one with 4 filter queries (fq's) and one sort.
During the index generation, I delete a set of rows based on a date filter, then
add new
If you could, let me know how your testing goes with this change. I too am
interested in having the Collate work as well as it can. It looks like the
code would be better with this change, but then again I don't know what the
original author was thinking when this was put in.
James Dyer
E-Comm
The wiki entry for hl.highlightMultiTerm:
http://wiki.apache.org/solr/HighlightingParameters#hl.highlightMultiTerm
doesn't appear to be correct. It says:
If the SpanScorer is also being used, enables highlighting for
range/wildcard/fuzzy/prefix queries. Default is false.
But the code in Defaul
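For reference, the parameter combination I am testing with looks like this
(query and field are just examples):

q=fie*&hl=true&hl.fl=text&hl.usePhraseHighlighter=true&hl.highlightMultiTerm=true

where hl.usePhraseHighlighter=true is what enables the SpanScorer that the
wiki text refers to.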
right, but the problem is that this is the current output:
Ковров -> Ковр
Коврову -> Ковров
Ковровом -> Ковров
Коврове -> Ковров
so, if Ковров was simply left alone, all your forms would match...
2010/7/27 Oleg Burlaca
> Thanks Robert for all your help,
>
> The idea of ы[A-Z].* stopwords is ideal
Thanks Robert for all your help,
The idea of ы[A-Z].* stopwords is ideal for the English language,
although in Russian nouns are inflected: Борис, Борису, Бориса, Борисом.
I'll try the RussianLightStemFilterFactory (the article in the PDF mentioned
it's more accurate).
Once again thanks,
Oleg Bur
Thanks for the input, I'll check it out!
Marc
> Subject: RE: Spellcheck help
> Date: Fri, 23 Jul 2010 13:12:04 -0500
> From: james.d...@ingrambook.com
> To: solr-user@lucene.apache.org
>
> In org.apache.solr.spelling.SpellingQueryConverter, find the line (#84):
>
> final static String PATTERN =
Hi Mitch,
thanks for the code. Currently, I've got a different solution running
but it's always good to have examples.
> > I realized
> > that I have to throw an exception and add the onError attribute to the
> > entity to make that work.
> >
> I am curious:
> Can you show how to make a meth
Hi
I am using SolrCloud.
Suppose I have a total of 4 machines dedicated to Solr.
I want to have 2 machines as replicas (slaves) and 2 as masters,
but I want to work with 8 logical cores rather than 2,
i.e. each master (and each slave) will have 4 cores on it.
The reason is that I can optimize the cores on
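For example, a solr.xml on each machine along these lines (names and paths
hypothetical):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core1" instanceDir="core1"/>
    <core name="core2" instanceDir="core2"/>
    <core name="core3" instanceDir="core3"/>
    <core name="core4" instanceDir="core4"/>
  </cores>
</solr>

with each slave core pointing its replication at the matching core on its
master.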
Hi Chantal,
instead of:
/* multivalued, not required */
you do:
/* multivalued, not required */
Then yourCustomFunctionToReturnAQueryString(vip, querystring1, querystring2)
{
if(vip != n
> We have three dedicated servers for solr, two for slaves and one for master,
> all with linux/debian packages installed.
>
> I understand that replication always copies over the index in the exact
> form it has in the master index directory (or it is supposed to do that at least),
> and if the mast
Hi Mitch,
> New idea:
> Create a method which returns the query-string:
>
> returnString(theVIP)
> {
>     if ( theVIP != null && theVIP != "" )
>     {
>         return "a query-string to find the vip"
>     }
>     else
>     {
>         return "SELECT 1" // you nee
I did not realize the LucidWorks.jar comes with an option to install the
sources :-)
On Tue, Jul 27, 2010 at 10:59 AM, Eric Grobler wrote:
> Good Morning, afternoon or evening...
>
> If someone installed Solr using the LucidWorks.jar (1.4) installation how
> can one make a small change and recomp
Hi Chantal,
> However, with this approach indexing time went up from 20min to more
> than 5 hours.
>
This is 15x slower than the initial solution... wow.
From MySQL I know that IN ()-clauses are the embodiment of endlessness -
they perform very, very badly.
New idea:
Create a method which
Ouch! Absolutely correct - quoting the URL fixed it. Thanks for saving me a
sleepless night!
cheers - rene
2010/7/26 Chris Hostetter
>
> : However, when I'm trying this very URL with curl within my (perl) script,
> I
> : receive a NullPointerException:
> : CURL-COMMAND: curl -sL
> :
> http://lo
Hi,
I have been using DIH to index documents from a database. I am hoping to
use DIH to delete documents from the index. I searched the wiki and found the
special commands in DIH to do so.
http://wiki.apache.org/solr/DataImportHandler#Special_Commands
But there is no example on how to use them. I tr
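For the record, the kind of thing I am trying is the $deleteDocById special
command emitted from a transformer - a sketch only, I have not confirmed this
exact form works (entity and column names hypothetical):

<entity name="deleted" query="select id, status from item"
        transformer="script:maybeDelete"/>

with a script transformer like:

function maybeDelete(row) {
  if (row.get('status') == 'deleted') {
    row.put('$deleteDocById', row.get('id'));  // tells DIH to delete this id
  }
  return row;
}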
Hi Jon,
During the last few days we have faced the same problem.
Using stock Solr 1.4.1 (Tika 0.4), from some PDF files we can't extract
content, and from others Solr throws an exception during the indexing
process.
You must:
Update the Tika libraries (in /contrib/extraction/lib) with the tika-core 0.8
snapshot
Good Morning, afternoon or evening...
If someone installed Solr using the LucidWorks.jar (1.4) installation how
can one make a small change and recompile.
Is there a LucidWorks (tomcat) build somewhere?
Regards
ericz
Hi Mitch,
thanks for that suggestion. I wasn't aware of that. I've already added a
temporary field in my ScriptTransformer that does basically the same.
However, with this approach indexing time went up from 20min to more
than 5 hours.
The new approach is to query the solr index for that other d
We have three dedicated servers for solr, two for slaves and one for master,
all with linux/debian packages installed.
I understand that replication always copies over the index in the exact
form it has in the master index directory (or it is supposed to do that at least),
and if the master index wa
Hi Matt,
> I'm attempting to get the carrot based clustering component (in trunk) to
> work. I see that the clustering contrib has been disabled for the time
> being. Does anyone know if this will be re-enabled soon, or even better,
> know how I could get it working as it is?
>
I've recently create
Hi,
I'm attempting to get the carrot based clustering component (in trunk) to
work. I see that the clustering contrib has been disabled for the time
being. Does anyone know if this will be re-enabled soon, or even better,
know how I could get it working as it is?
Thanks,
Matt
2010/7/27 Oleg Burlaca
> Actually the situation with Немцов is OK,
> I've just checked how Yandex works with Немцов and Немцова:
> http://nano.yandex.ru/project/inflect/
>
> I think there are two solutions:
> a) manually search for both Немцов and then Немцова
> b) use wildcard query: Немцов*
>
Actually the situation with Немцов is OK,
I've just checked how Yandex works with Немцов and Немцова:
http://nano.yandex.ru/project/inflect/
I think there are two solutions:
a) manually search for both Немцов and then Немцова
b) use wildcard query: Немцов*
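e.g. (field name hypothetical, and lowercased by hand, since wildcard terms
skip the analyzer):

q=content:немцов*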
Robert, thanks for the RussianLightStemF
A similar word is Немцов.
The strange thing is that searching for "Немцова" will not find documents
containing "Немцов"
Немцова: 14 articles
http://www.sova-center.ru/search/?lg=1&q=%D0%BD%D0%B5%D0%BC%D1%86%D0%BE%D0%B2%D0%B0
Немцов: 74 articles
http://www.sova-center.ru/search/?lg=1&q=%D0%BD%D0%B
Yes, I'm sure I've enabled SnowballPorterFilterFactory at both index and
query time, because the search works OK,
except for names and geo locations.
I've noticed that searching by
Коврова
also shows documents that contain Коврову, Коврове
Search by Ковров, 7 results:
http://www.sova-center.ru/searc
another look: your problem is ковров itself... it's mapped to ковр.
a workaround might be to use the protected words functionality to
keep ковров and any other problematic people/geo names as-is.
separately, in trunk there is an alternative Russian stemmer
(RussianLightStemFilterFactory), which mig
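for example, in schema.xml (a sketch - protwords.txt would list ковров and
friends, one word per line):

<filter class="solr.SnowballPorterFilterFactory" language="Russian"
        protected="protwords.txt"/>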
Hi,
I've recently been looking into Spellchecking in solr, and was struck by how
limited the usefulness of the tool was.
Like most corpora, ours contains lots of different spelling mistakes for
the same word, so 'spellcheck.onlyMorePopular' is not really that useful
unless you click on it nu
All of your examples stem to "ковров":
assertAnalyzesTo(a, "Коврова Коврову Ковровом Коврове",
new String[] { "ковров", "ковров", "ковров", "ковров" });
}
Are you sure you enabled this at *both* index and query time?
2010/7/27 Oleg Burlaca
> Hello,
>
> I'm using SnowballPorter
Hello,
I'm using SnowballPorterFilterFactory with language="Russian".
The stemming works OK, except for people's names and geographical places.
Here are some examples:
searching for Ковров should also find Коврова, Коврову, Ковровом, Коврове.
Are there other stemming plugins for the Russian language that
How do I reduce the index file size, decrease the sync time between nodes,
and decrease the index create/update time?
Thanks.
I would use the string version, as Drupal will probably populate it with a
URL-like thing - something that may not validate as type url.
On 27 Jul 2010, at 04:00, Savannah Beckett wrote:
>
> I am trying to merge the schema.xml that is in the solr/nutch setup with the
> one from drupal apache solr mo
Hi,
IMHO you can do this with date range queries and (date) facets.
The DateMathParser will allow you to normalize dates on min/hours/days.
If you hit a limit there, then just add a field with an integer for
either min/hour/day. This way you'll lose the month information - which
is sometimes what
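For example, a date facet along these lines (field name hypothetical; the +
signs need URL-encoding as %2B):

q=*:*&facet=true&facet.date=created_at&facet.date.start=NOW/DAY-7DAYS&facet.date.end=NOW/DAY+1DAY&facet.date.gap=+1HOUR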