Solr is behaving a bit weirdly for some search terms, e.g.
co-ownership, "co ownership".
It works fine with terms like quasi-delict, non-interference etc.
The issue is that it's not returning any excerpts in the "highlighting" key of
the result dictionary. My search query is something like this:
http:/
You could write your query like:
q=fieldName1:searchValue AND fieldName2:value OR fieldName3:value
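As a rough sketch (the field names are placeholders, not from your schema, and the host/core path is an assumption), such a query string can be URL-encoded before sending it to Solr:

```python
from urllib.parse import urlencode

# Placeholder field names; substitute the ones from your schema.
query = 'fieldName1:searchValue AND fieldName2:value OR fieldName3:value'
params = urlencode({'q': query})          # colons become %3A, spaces become +
url = 'http://localhost:8983/solr/select?' + params
print(params)
```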
Regards,
Manas
From: Suram [mailto:reactive...@yahoo.com]
Sent: Wed 3/17/2010 12:44 AM
To: solr-user@lucene.apache.org
Subject: Issue in search
In solr how ca
Thank you, Tommy. But the real problem here is that the XML is dynamic and the
element names will be different in different docs which means that there will
be a lot of field names to be added in schema if I were to index those xml
nodes separately.
Is it possible to have nested indexing (xml with
Just turn your entire disk to RAM
http://www.hyperossystems.co.uk/
800X faster. Who cares if it swaps to 'disk' then :-)
Dennis Gearon
Signature Warning
EARTH has a Right To Life,
otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
--
In Solr, how can I perform AND, OR, NOT searches while querying the data?
--
View this message in context:
http://old.nabble.com/Issue-in-search-tp27927828p27927828.html
Sent from the Solr - User mailing list archive at Nabble.com.
You need to change your similarity object to be more sensitive at the
short end. Here is a patch showing how to do this:
http://issues.apache.org/jira/browse/LUCENE-2187
It involves Lucene coding.
On Fri, Mar 12, 2010 at 3:19 AM, muneeb wrote:
>
> Ah I see.
> Thanks very much Jay for your explan
Hi Giovanni,
Comments below:
> I'm pretty unclear on how to patch the Tika 0.7-trunk on our Solr instance.
> This is what I've tried so far (which was really just me guessing):
>
>
>
> 1. Got the latest version of the trunk code from
> http://svn.apache.org/repos/asf/lucene/tika/trunk
>
>
Use a + sign or %20 for the space. In URL query strings, a plus sign means a space.
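A minimal sketch of the two encodings, using Python's standard library (the term is just an example from this list):

```python
from urllib.parse import quote, quote_plus

term = 'co ownership'
plus_form = quote_plus(term)  # '+' for the space: fine in query strings
pct_form = quote(term)        # '%20' for the space: safe anywhere in a URL
print(plus_form, pct_form)
```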
On Tue, Mar 16, 2010 at 6:06 PM, KshamaPai wrote:
>
> Hi,
I am using autobench to benchmark Solr with the query
> http://localhost:8983/solr/select/?q=body:hotel AND
> _val_:"recip(hsin(0.7113258,-1.291311553,lat_rad
That would be a Tomcat question :)
On Tue, Mar 16, 2010 at 8:36 PM, blargy wrote:
>
> [java] INFO: The APR based Apache Tomcat Native library which allows optimal
> performance in production environments was not found on the
> java.library.path:
> .:/Library/Java/Extensions:/System/Library/Java/E
Hi all, we translated the Solr tutorial to Spanish due to a client's
request. For all you Spanish speakers/readers out there, you can have a look
at it:
http://www.linebee.com/?p=155
We hope this can expand the usage of the project and lower the language
barrier to non-english speakers.
Thanks
org/apache/solr/util/plugin/SolrCoreAware in the stack trace refers to
an interface in the main Solr jar.
I think this means that putting all of the libs in
apache-tomcat-6.0.20/lib is a mistake: the classloader finds
ExtractingRequestHandler in
apache-tomcat-6.0.20/lib/apache-solr-cell-1.4.1-dev.
[java] INFO: The APR based Apache Tomcat Native library which allows optimal
performance in production environments was not found on the
java.library.path:
.:/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java
What the heck is this and why is it recommended for production setti
I was reading "Scaling Lucene and Solr"
(http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr/)
and I came across the section StopWords.
In there it mentioned that it's not recommended to remove stop words at index
time. Why is this the case? Don't all t
There are certainly a number of widely varying opinions on the use of RAM
directory.
Basically, though, if you need the index to be persistent at some point
(i.e. saved across reboots, crashes etc.),
you'll need to write to a disk, so RAM directory becomes somewhat
superfluous in this case.
Genera
Hi,
I am using autobench to benchmark Solr with the query
http://localhost:8983/solr/select/?q=body:hotel AND
_val_:"recip(hsin(0.7113258,-1.291311553,lat_rad,lng_rad,30),1,1,0)"^100
But if I specify the same in the autobench command as
autobench --file bar1.tsv --high_rate 100 --low_rate 20 --rate
It seems that Solr's query parser doesn't pass a single term query
to the Analyzer for the field. For example, if I give it
2001年 (year 2001 in Japanese), the searcher returns 0 hits
but if I quote them with double-quotes, it returns hits.
In this experiment, I configured schema.xml so that
the f
Hey Peter,
Thanks for your reply.
My question was mainly about the fact that there seem to be two different
aspects to the Solr RAM usage: in-process and out-of-process.
By that I mean: yes, I know the many different parameters/caches to do with
Solr in-process memory usage and related culprits; however
Aha. That appears to be the issue. I hadn't realized that the query
handler had all of those definitions there.
-Alex
On 3/16/2010 6:56 PM, Erick Erickson wrote:
I suspect your problem is that you still have "price" defined in
solrconfig.xml for the dismax handler. Look for the section
Besides the other notes here, I agree you'll hit OOM if you try to
read all the rows into memory at once, but I'm absolutely sure you
can read then N at a time instead. Not that I could tell you how, mind
you.
You're on your way...
Erick
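A minimal sketch of the N-at-a-time idea, using an in-memory SQLite table as a stand-in for the real database (the table name, contents, and batch size are all made up):

```python
import sqlite3

# Stand-in for the real database; 10 dummy rows.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE docs (id INTEGER, body TEXT)')
conn.executemany('INSERT INTO docs VALUES (?, ?)',
                 [(i, 'text %d' % i) for i in range(10)])

batch_sizes = []
cur = conn.execute('SELECT id, body FROM docs')
while True:
    rows = cur.fetchmany(4)        # read 4 rows at a time instead of all at once
    if not rows:
        break
    batch_sizes.append(len(rows))  # real code would index this batch here

print(batch_sizes)
```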
On Tue, Mar 16, 2010 at 4:13 PM, Neil Chaudhuri <
nchau
I suspect your problem is that you still have "price" defined in
solrconfig.xml for the dismax handler. Look for the section
You'll see price defined as one of the default fields for "fl" and "bf".
HTH
Erick
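For reference, the relevant section of the example solrconfig.xml looks roughly like this (reconstructed from memory, so treat the exact field list and boost function as approximate):

```xml
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- "price" appears in the default fl and bf; remove it if the
         field is no longer in your schema -->
    <str name="fl">id,name,price,score</str>
    <str name="bf">recip(rord(price),1,1000,1000)^0.3</str>
  </lst>
</requestHandler>
```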
On Tue, Mar 16, 2010 at 6:55 PM, Alex Thurlow wrote:
> Hi guys,
>Based on some s
On Tue, Mar 16, 2010 at 9:08 PM, KaktuChakarabati wrote:
>
> Hey,
> I am trying to understand what kind of calculation I should do in order to
> come up with reasonable RAM size for a given solr machine.
>
> Suppose the index size is at 16GB.
> The Max heap allocated to JVM is about 12GB.
>
> The
Disclaimer: My Oracle experience is minuscule at best. I am also a
beginner at Solr, so grab yourself the proverbial grain of salt.
I googled a bit on CLOB. One page I found mentioned setting up a view
to return the data type you want. Can you use the functions described
on these pages in
Lance,
I tried that but no luck. Just in case the relative paths were causing a
problem, I also tried using absolute paths but neither seemed to help.
First, I tried adding ** as the
full directory so it would hopefully include everything. When that
didn't work, I tried adding paths directly
Since my original thread was straying to a new topic, I thought it made sense
to create a new thread of discussion.
I am using the DataImportHandler to index 3 fields in a table: an id, a date,
and the text of a document. This is an Oracle database, and the document is an
XML document stored as
The DataImportHandler has tools for this. It will fetch rows from
Oracle and allow you to unpack columns as XML with Xpaths.
http://wiki.apache.org/solr/DataImportHandler
http://wiki.apache.org/solr/DataImportHandler#Usage_with_RDBMS
http://wiki.apache.org/solr/DataImportHandler#XPathEntityProces
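A hedged sketch of what that configuration can look like; the table, column, and field names here are invented, but the FieldReaderDataSource/XPathEntityProcessor pairing is the documented pattern for unpacking an XML column:

```xml
<dataConfig>
  <dataSource name="db" driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:..."/>
  <dataSource name="fld" type="FieldReaderDataSource"/>
  <document>
    <entity name="doc" dataSource="db"
            query="SELECT id, doc_date, doc_xml FROM documents">
      <field column="id" name="id"/>
      <field column="doc_date" name="date"/>
      <!-- read the XML column and pull values out with XPath -->
      <entity name="body" dataSource="fld" dataField="doc.doc_xml"
              processor="XPathEntityProcessor" forEach="/record">
        <field column="text" xpath="/record/text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```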
Hi guys,
Based on some suggestions, I'm trying to use the dismax query
type. I'm getting a weird error though that I think it related to the
default test data set.
From the query tool (/solr/admin/form.jsp), I put in this:
Statement: artist:test title:test +type:video
query type: dismax
I'm pretty unclear on how to patch the Tika 0.7-trunk on our Solr instance.
This is what I've tried so far (which was really just me guessing):
1. Got the latest version of the trunk code from
http://svn.apache.org/repos/asf/lucene/tika/trunk
2. Built this using Maven (mvn install)
3
They are a namespace like other namespaces and are usable in
attributes, just like in the DB query string examples.
As for defaults, you can declare those in the
declarations in solrconfig.xml. Examples of this are in the wiki page
(search for "defaults").
On Tue, Mar 16, 2010 at 7:05 AM, Lukas Kahw
NoClassDefFoundError usually means that the class was found, but it
needs other classes and those were not found. That is, Solr finds the
ExtractingRequestHandler jar but cannot find the Tika jars.
In example/solr/conf/solrconfig.xml, there are several <lib> elements. These give classpath directories
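They typically look like this in the example config (the directory paths depend on your layout, so these are illustrative):

```xml
<lib dir="../../contrib/extraction/lib" />
<lib dir="../../dist/" />
```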
Thanks Chris!
I'll try the patch.
-Original Message-
From: Mattmann, Chris A (388J) [mailto:chris.a.mattm...@jpl.nasa.gov]
Sent: Tuesday, March 16, 2010 5:37 PM
To: solr-user@lucene.apache.org
Subject: Re: PDFBox/Tika Performance Issues
Guys, I think this is an issue with PDFBOX and t
Guys, I think this is an issue with PDFBOX and the version that Tika 0.6
depends on. Tika 0.7-trunk upgraded to PDFBox 1.0.0 (see [1]), so it may
include a fix for the problem you're seeing.
See this discussion [2] on how to patch Tika to use the new PDFBox if you can't
wait for the 0.7 release
Originally 16 (the number of CPUs on the machine), but even with 5 threads it's
not looking so hot.
-Original Message-
From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll
Sent: Tuesday, March 16, 2010 5:15 PM
To: solr-user@lucene.apache.org
Subject: Re: PDFBox/Ti
That is a great article, David.
For the moment, I am trying an all-Solr approach, but I have run into a small
problem. The documents are stored as XML CLOBs using Oracle's OPAQUE object.
Is there any facility to unpack this into the actual text? Or must I execute
that in the SQL query?
Thank
Hmm, that is an ugly thing in PDFBox. We should probably take this over to the
PDFBox project. How many threads are you indexing with?
FWIW, for that many documents, I might consider using Tika on the client side
to save on a lot of network traffic.
-Grant
On Mar 16, 2010, at 4:37 PM, Giovan
Hey,
I am trying to understand what kind of calculation I should do in order to
come up with reasonable RAM size for a given solr machine.
Suppose the index size is at 16GB.
The Max heap allocated to JVM is about 12GB.
The machine I'm trying now has 24GB.
When the machine is running for a while
Do you have the option of just importing each xml node as a
field/value when you add the document?
That'll let you do the search easily. If you need to store the raw XML,
you can use an extra field.
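A minimal sketch of that flattening idea (the element names are made up; a real indexer would post the dict to Solr rather than print it):

```python
import xml.etree.ElementTree as ET

raw = '<record><title>Hello</title><author>Smith</author></record>'
root = ET.fromstring(raw)

# One Solr field per child element, plus an extra field for the raw markup.
doc = {child.tag: child.text for child in root}
doc['raw_xml'] = raw

print(doc)
```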
Tommy Chheng
Programmer and UC Irvine Graduate Student
Twitter @tommychheng
http://tommy.chhe
If you do stay with Oracle, please report back to the list how that went. In
order to get decent filtering and faceting performance, I believe you will need
to use "bitmapped indexes" which Oracle and some other databases support.
You may want to check out my article on this subject:
http://ww
I've been trying to bulk index about 11 million PDFs, and while profiling our
Solr instance, I noticed that all of the threads that are processing indexing
requests are constantly blocking each other during this call:
http-8080-Processor39 [BLOCKED] CPU time: 9:35
java.util.Collections$Synchroni
I've also indexed a concatenation of 50k journal articles (making a
single document of several hundred MB of text) and it did not give me
an OOM.
-glen
On 16 March 2010 15:57, Erick Erickson wrote:
> Why do you think you'd hit OOM errors? How big is "very large"? I've
> indexed, as a single docum
For my purposes, the Porter analyzer was overly aggressive with stemming. So,
we then moved to KStem. It looks like this is no longer being maintained and
Lucid claimed much better performance with theirs, so I gave that a try and it
seems to be working fine. I didn't do any benchmarks though.
Certainly I could use some basic SQL count(*) queries to achieve faceted
results, but I am not sure of the flexibility, extensibility, or scalability of
that approach. And from what I have read, Oracle Text doesn't do faceting out
of the box.
Each document is a few MB, and there will be million
Hello Experts,
I need help on this issue of mine. I am unsure if this scenario is possible.
I have a field in my Solr document named inputxml, the value of which is an
XML string as below. This XML structure is within the inputxml field value. I
needed help on searching this xml structure i.e. if I sear
Why do you think you'd hit OOM errors? How big is "very large"? I've
indexed, as a single document, a 26-volume encyclopedia of civil war
records.
Although as much as I like the technology, if I could get away without using
two technologies, I would. Are you completely sure you can't get what
Kevin,
When you say you just included the war, you mean the /packs/solr.war, correct?
I see that the KStemmer is nicely packed in there but I don't see LucidGaze
anywhere. Have you had any experience using this?
So I'm guessing you would suggest using the LucidWorks solr.war over the
apache-solr-
I am working on an application that currently hits a database containing
millions of very large documents. I use Oracle Text Search at the moment, and
things work fine. However, there is a request for faceting capability, and Solr
seems like a technology I should look at. Suffice to say I am new
If you search the mail archive, you'll find many discussions of
multilingual indexing/searching that'll provide you a plethora
of information.
But the synopsis as I remember is that using a single stemmer for
multiple languages is generally a bad idea
Best
Erick
On Tue, Mar 16, 2010 at 12:19
I'm trying it out right now. I hope it will work well out of the box for
indexing/searching a set of documents with frequent update.
-aj
On Tue, Mar 16, 2010 at 11:52 AM, blargy wrote:
>
> Has anyone used this?:
> http://www.lucidimagination.com/Downloads/LucidWorks-for-Solr
>
> Other than the KStem
I used it mostly for KStemmer, but I also liked the fact that it included about
a dozen or so stable patches since Solr 1.4 was released. We just use the
included WAR in our project however. We don't use the installer or anything
like that.
From: blargy
To:
Most of our documents will be in English, but not all, and we are certainly in
the process of acquiring more international content. Does anyone have any
experience using all of the different stemmers for languages of unknown
origin? Which ones perform the best? Give the most relevant results? What
are
I generate a Solr index on a Hadoop cluster and I want to copy it from HDFS to
a server running solr.
I wish to copy the index on a different disk than the disk that solr
instance is using, then tell the solr server to switch from the current data
dir to the location where I copied the hadoop gene
Hi again,
I just tried the 1.5-dev version from the Solr trunk.
After applying the patch you provided and adding icu4j-3_8_1 to the classpath,
the results are quite different from before.
Now words and texts are not reversed and are displayed correctly, except for
some PDF files' text parts that
Hi,
I am trying to use $deleteDocById to delete rows based on an SQL query in my
db-data-config.xml. The following tag is a top-level tag in the tag.
However, it seems like it's only fetching the rows; it's not actually issuing any
index deletes.
regards,
Lukas Kahwe Smith
m...@pooteew
Hi,
According to the wiki, it's possible to pass parameters to the DIH:
http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters
I assume they are just being replaced via simple string replacements, which is
exactly what I need. Can they also be used in all places, even attributes (fo
This is my first post on this list -- apologies if this has been discussed
before; I didn't come upon anything exactly equivalent in searching the
archives via Google.
I'm using Solr 1.4 as part of the VuFind application, and I just noticed that
searches for hyphenated terms are failing in stra
On Mar 15, 2010, at 11:36 AM, Jean-Sebastien Vachon wrote:
> Hi All,
>
> I'm trying to figure out how to perform spatial searches using Solr 1.5 (from
> the trunk).
>
> Is the support for spatial search built-in?
Almost. Main thing missing right now is filtering. There are still ways to do
If you're going to spend time mucking w/ TermPositions, you should just spend
your time working with SpanQuery, as that is what I understand you to be asking
about. AIUI, you want to be able to get at the positions in the document where
the query matched. This is exactly what a SpanQuery and i
Shalin Shekhar Mangar wrote:
>
> On Sat, Mar 13, 2010 at 9:30 AM, Suram wrote:
>
>>
>> Erick Erickson wrote:
>> >
>> > Did you commit your changes?
>> >
>> > Erick
>> >
>> > On Fri, Mar 12, 2010 at 7:38 AM, Suram wrote:
>> >
>> >>
>> >> Can I set my index fields for auto-suggestion, sometime t
Thank you.
This works well as a workaround. Yesterday I got the tip to look for a wrong
solrconfig.xml, and that was right:
when uploading our files, the solrconfig.xml was LOST ;-)
Is it possible to start Java in debug mode for more info?
David
On 16.03.2010 02:02, Tom Hill wrote:
You need a que