I don't think you understand. Solr does not have the code to do that. It just
isn't there, nor would I expect it ever to be.
Solr is open source though. You could look at the code and figure out how to
do it (though why anyone would do that remains beyond my ability to
understand).
Nicely put. ;^)
-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Tuesday, September 13, 2011 9:16 AM
To: solr-user@lucene.apache.org
Subject: Re: can indexing information stored in db rather than filesystem?
On Sep 13, 2011, at 6:51 AM, kiran.bodigam wrote:
numDocs is not the number of documents in memory. It is the number of
documents currently in the index (which is kept on disk). Same goes for
maxDocs, except that it is a count of all of the documents that have ever been
in the index since it was created or optimized (including deleted documents).
Looking at the source for Jetty, line 149 in Jetty's HttpOutput java file looks
like this:
    if (_closed)
        throw new IOException("Closed");
http://www.jarvana.com/jarvana/view/org/eclipse/jetty/aggregate/jetty-all/7.1.0.RC0/jetty-all-7.1.0.RC0-sources.jar!/org/ec
> changed the configuration to point it to my solr dir and started it again
You might look in your logs to see where Solr thinks the Solr home directory is
and/or if it complains about not being able to find it. As a guess, it can't
find it, perhaps because solr.solr.home does not point to the
Just add a bogus 0 timestamp after it when you index it. That is what we did.
Dates are not stored or indexed as characters, anyway, so space would not be
any different one way or the other.
JRJ
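For example, if the source data carries only a date, it can be padded out to Solr's full ISO 8601 date format at index time (a sketch; the field name is hypothetical):

    <field name="birth_date">1960-01-01T00:00:00Z</field>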
-Original Message-
From: stockii [mailto:stock.jo...@googlemail.com]
Sent: Wednesday, Sep
Mail -
Von: "Jay Jaeger - DOT"
An: solr-user@lucene.apache.org, "JETTY user mailing list"
Gesendet: Mittwoch, 14. September 2011 15:21:19
Betreff: RE: EofException with Solr in Jetty
Looking at the source for Jetty, line 149 in Jetty's HttpOutput java file looks
l
I think folks are going to need a *lot* more information. Particularly
1. Just what does your "test script" do? Is it doing updates, or just
queries of the sort you mentioned below?
2. If the test script is doing updates, how are those updates being fed to
Solr?
3. What version of Solr
o 6000m,
particularly given your relatively modest number of documents (2,000,000).
I was trying everything before asking here.
5. Machine characteristics, particularly operating system and physical
memory on the machine.
OS => Debian 6.0, Physical Memory => 32 GB, CPU => 2x Intel Quad Cor
Some things to think about:
When Solr starts up, it should report the location of Solr home. Is it
what you expect?
Is there any security on the "dist" directory that would prevent solr from
accessing it?
Is there a classloader policy set on glassfish that could be getting in the way?
(y
nd its too much.
When I send a set of random queries (10-20 queries per second), response
times go crazy (8 seconds to 60+ seconds).
On Wed, Sep 14, 2011 at 6:07 PM, Jaeger, Jay - DOT wrote:
> I don't have enough experience with filter queries to advise well on when
> to use fq vs. pu
Actually, Windoze also has symbolic links. You have to manipulate them from
the command line, but they do exist.
http://en.wikipedia.org/wiki/NTFS_symbolic_link
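For example, on Vista/Windows 7 and later, from an elevated command prompt (paths hypothetical):

    mklink /D C:\solr\data D:\solr-data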
-Original Message-
From: Per Osbeck [mailto:per.osb...@lbi.com]
Sent: Thursday, September 15, 2011 7:15 AM
To: solr-user@lu
500 / second would be 1,800,000 per hour (much more than 500K documents).
1) how big is each document?
2) how big are your index files?
3) as others have recently written, make sure you don't give your JRE so much
memory that your OS is starved for memory to use for file system cache.
JRJ
--
We used copyField to copy the address to two fields:
1. Which contains just the first token up to the first whitespace
2. Which copies all of it, but translates to lower case.
Then our users can enter either a street number, a street name, or both. We
copied all of it to the second field bec
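A sketch of that arrangement in schema.xml (field and type names hypothetical; the original post does not show the actual analyzers used):

    <field name="address" type="string" indexed="true" stored="true"/>
    <field name="address_number" type="first_token" indexed="true" stored="false"/>
    <field name="address_lower" type="lowercase_whole" indexed="true" stored="false"/>
    <copyField source="address" dest="address_number"/>
    <copyField source="address" dest="address_lower"/>

    <fieldType name="first_token" class="solr.TextField">
      <analyzer>
        <!-- keep only the first run of non-whitespace characters (e.g. the street number) -->
        <tokenizer class="solr.PatternTokenizerFactory" pattern="^\S+" group="0"/>
      </analyzer>
    </fieldType>
    <fieldType name="lowercase_whole" class="solr.TextField">
      <analyzer>
        <!-- one token containing the whole address, lower-cased -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>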
That would still show up as the CPU being busy.
-Original Message-
From: Federico Fissore [mailto:feder...@fissore.org]
Sent: Wednesday, September 28, 2011 6:12 AM
To: solr-user@lucene.apache.org
Subject: Re: strange performance issue with many shards on one server
Frederik Kraus, on 28
one server
Jaeger, Jay - DOT, on 28/09/2011 18:40, wrote:
> That would still show up as the CPU being busy.
>
I don't know how the program (top, htop, whatever) displays the value,
but when the CPU has a cache miss, that thread definitely sits and waits
for a number of clock cyc
One time when we had that problem, it was because one or more cores had a
broken XML configuration file.
Another time, it was because solr/home was not set right in the servlet
container.
Another time it was because we had an older EAR pointing to a newer release
Solr home directory. Given wha
<cores adminPath="/admij/cores">
Was that a cut and paste? If so, the /admij/cores is presumably incorrect, and
ought to be /admin/cores
-Original Message-
From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov]
Sent: Wednesday, September 28, 2011 4:10 PM
To:
Are you changing just the host OS or the JVM, or both, from 32 bit to 64 bit?
If it is just the OS, the answer is definitely no, you don't need to do
anything more than copy.
If the answer is the JVM, I *think* the answer is still no, but others more
authoritative than I may wish to respond.
-
I am no expert, but here is my take and our situation.
Firstly, are you asking what the minimum number of documents is before it makes
*any* sense at all to use a distributed search, or are you asking what the
maximum number of documents is before a distributed search is essentially
required?
I am no expert, but based on my experience, the information you are looking
for should indeed be in your logs.
There are at least three logs you might look for / at:
- An HTTP request log
- The solr log
- Logging by the application server / JVM
Some information is available at http://wiki.apac
If you are asking how to tell which of 94000 records failed in a SINGLE HTTP
update request, I have no idea, but I suspect that you cannot necessarily tell.
It might help if you copied and pasted what you find in the solr log for the
failure (see my previous response for how to figure out where
I have no idea what might be causing your memory to increase like that (we
haven't run 3.4, and our index so far has been at most 28 million rows with
maybe 40 fields), but just as an aside, depending upon what you meant by "we
drop the whole index", I'd think it might work better to do an righ
We generated our own concatenated key (original customer, who may historically
have different addresses, etc.). If there is a way for Solr to do that
automagically, I'd love to hear about it.
I don't think that the extra bytes for the key itself (String vs. binary
integer) is all that much o
My thought about this, based on some work we did when we considered using Solr
to index our LAN files:
1) If it matters - if someone misusing the private tags is a real issue (and it
sounds like it would be), then I think you need an application out in front to
enforce this (a good idea with So
We do much the same (along with name, address, postal code, etc.).
However, we use AND when we search: the more data someone can provide, the
fewer and more applicable their search results.
JRJ
-Original Message-
From: Jason Toy [mailto:jason...@gmail.com]
Sent: Thursday, October 06, 2
Perhaps integrate this using a javascript or other application front end to
query solr, get the key to the database, and then run off to get the data?
-Original Message-
From: Ikhsvaku S [mailto:ikhsv...@gmail.com]
Sent: Tuesday, October 11, 2011 2:47 PM
To: solr-user@lucene.apache.org
We have used a VMware VM for testing our index (currently
around 3GB) and it has been just fine - at most maybe a 10 to 20% penalty, if
that, even when CPU bound. We also plan to use a VM for production.
What hypervisor one uses matters - sometimes a lot.
-Original Messag
One thing to consider is the case where the JVM is up, but the system is
otherwise unavailable (say, a NIC failure, firewall failure, load balancer
failure) - especially if you use a SAN (whose connection is different from the
normal network).
In such a case the old master might have uncommitte
It sounds like maybe you either have not told Solr where the Solr home
directory is, or, more likely, have not copied the jar files for this
particular class into the right directory (typically a "lib" directory) so
Tomcat cannot find that class. There is other correspondence on this list that
It depends upon whether you want Solr to do the XSL processing, or the browser.
After fussing a bit, and doing some reading and thinking, we decided it was
best to let the browser do the work, at least in our case.
If the browser is doing the processing, you don't need to modify solrconfig.xml
As others have reported, I also did not get your image.
I am interested in your situation because we will deploy to WAS 7 in
production, and have tested there.
One thing I noted that might point to a possible problem you might have:
1. "The owner of the files created in the 2 environment
I do not believe that it will work as you have written it, unless you put an
application in between to read that XML and then call Solr with what it
expects. See http://wiki.apache.org/solr/UpdateXmlMessages
You need to have an add/doc update message of the form described on that page.
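A minimal sketch of such a message, reusing the values from the original post (the field names "name" and "count" are hypothetical; "id" stands for whatever field your schema declares as the uniqueKey):

    <add>
      <doc>
        <field name="id">unique-value-if-any-1</field>
        <field name="name">abc</field>
        <field name="count">123</field>
      </doc>
    </add>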
I believe that if you have the Solr distribution, you have the source for the
web UI already: it is just .jsp pages. They are inside the solr .war file.
JRJ
-Original Message-
From: nagarjuna [mailto:nagarjuna.avul...@gmail.com]
Sent: Wednesday, October 19, 2011 12:07 AM
To: solr-user@
200 instances of what? The Solr application with lucene, etc. per usual? Solr
cores? ???
Either way, 200 seems to be very very very many: unusually so. Why so many?
If you have 200 instances of Solr in a 20 GB JVM, that would only be 100MB per
Solr instance.
If you have 200 instances of S
Solr does not have an "update" per se: you have to re-add the document. A
document with the same value for the field defined as the uniqueKey will
replace any existing document with that key (you do not have to query and
explicitly delete it first).
JRJ
-Original Message-
From: hadi
It won't do it for you automatically. I suppose you might create the thumbnail
image beforehand, Base64 encode it, and add it as a stored, non-indexed, binary
field (see schema: solr.BinaryField) when you index the document.
JRJ
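A sketch of the schema side of that idea (field and type names hypothetical; solr.BinaryField expects its value Base64-encoded in XML update messages):

    <fieldType name="binary" class="solr.BinaryField"/>
    <field name="thumbnail" type="binary" indexed="false" stored="true"/>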
-Original Message-
From: hadi [mailto:md.anb...@gmail.com
Commit does not particularly spike disk or memory usage, unless you are adding
a very large number of documents between commits. A commit can cause a need to
merge indexes, which can increase disk space temporarily. An optimize is
*likely* to merge indexes, which will usually increase disk space temporarily as well.
It certainly is possible to develop search pages, update pages, etc. in any
architecture you like: I think I'd suggest looking at SolrJ if you want to do
that: http://wiki.apache.org/solr/Solrj
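A minimal SolrJ sketch of querying from application code (3.x-era API; the URL, query, and class name are hypothetical):

    import java.net.MalformedURLException;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class SearchExample {
        public static void main(String[] args)
                throws MalformedURLException, SolrServerException {
            // Point this at your own Solr instance.
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
            QueryResponse rsp = server.query(new SolrQuery("title:example"));
            System.out.println("hits: " + rsp.getResults().getNumFound());
        }
    }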
PLEASE: Go read through the documentation and tutorial and browse thru the
Wiki and FAQ. It's a
There is not a
single answer or formula that fits every situation.
JRJ
-Original Message-
From: Sujatha Arun [mailto:suja.a...@gmail.com]
Sent: Wednesday, October 19, 2011 11:58 PM
To: solr-user@lucene.apache.org
Subject: Re: Optimization /Commit memory
Thanks Jay ,
I was trying to compute t
Instances, not Solr cores.
We get an average response time of below 1 sec.
The number of documents is not many in most of the instances; some of the
instances have about 5 lakh (500,000) documents on average.
Regards
Sujatha
On Thu, Oct 20, 2011 at 3:35 AM, Jaeger, Jay - DOT wrote:
> 200 instances of what? The S
> On Thu, Oct 20, 2011 at 6:23 PM, Jaeger, Jay - DOT
> wrote:
>
>> Well, since the OS RAM includes the JVM RAM, that is part of your
>> requirement, yes? Aside from the JVM and normal OS requirements, all you
>> need OS RAM for is file caching. Thus, for updates, the O
1. Solr, proper, does not index "files". An adjunct called Solr Cell can. See
http://wiki.apache.org/solr/ExtractingRequestHandler . That article describes
which kinds of files Solr Cell can handle.
2. I have no idea what you mean by "incidents per year". Please explain.
3. Even though
Maybe put them in a single string field (or any other field type that is not
analyzed -- certainly not text) using some character separator that will
connect them, but won't confuse the Solr query parser?
So maybe you start out with key value pairs of
Key1 value1
Key2 value2
Key3 value3
Preprocess
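A sketch of that preprocessing step (the '|' separator and all names are hypothetical; pick any character you know cannot occur in the data):

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class PairJoiner {
        public static void main(String[] args) {
            Map<String, String> pairs = new LinkedHashMap<String, String>();
            pairs.put("Key1", "value1");
            pairs.put("Key2", "value2");
            pairs.put("Key3", "value3");
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, String> e : pairs.entrySet()) {
                if (sb.length() > 0) sb.append('|'); // separator that won't confuse the query parser
                sb.append(e.getKey()).append('=').append(e.getValue());
            }
            System.out.println(sb); // Key1=value1|Key2=value2|Key3=value3
        }
    }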
website but found it was really technical,
since we are not on the developer side and we just want some basic
information or numbers about its usage.
Thanks for your answer, anyway.
2011/10/24 Jaeger, Jay - DOT
> 1. Solr, proper, does not index "files". An adjunct called Solr Ce
Could you replace it with something that will sort it last instead of an empty
string? (Say, for example, replacement="{}"). This would still give something
that looks empty to a person, and would sort last.
BTW, it looks to me as though your pattern only requires that the input contain
just
t the same thing as:
String silly = "";
JRJ
-Original Message-
From: themanwho [mailto:theman...@mac.com]
Sent: Tuesday, October 25, 2011 9:22 AM
To: solr-user@lucene.apache.org
Subject: RE: sort non-roman character strings last
Jay,
Thanks, good call on the pattern.
Sounds like a possible application of solr.PatternTokenizerFactory
http://lucene.apache.org/solr/api/org/apache/solr/analysis/PatternTokenizerFactory.html
You could use copyField to copy the entire string to a separate field (or set
of fields) that are processed by patterns.
JRJ
-Origina
I noted that in these messages the left hand side is lower case collection, but
the right hand side is upper case Collection. Assuming you did a cut/paste,
could you have a core name mismatch between a master and a slave somehow?
Otherwise (shudder): could you be doing a commit while the repli
My goodness. We do 4 million in about 1/2 HOUR (7+ million in 40 minutes).
First question: Are you somehow forcing Solr to do a commit for each and every
record? If so, that way leads to the house of PAIN.
The thing to do next, I suppose, might be to try and figure out whether the
issue is i
download them. By keeping older commits
we were able to work around this issue.
>
> -Original Message-
> From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov]
> Sent: 25 October 2011 20:48
> To: solr-user@lucene.apache.org
> Subject: RE: Replication issues with multi
No, we do not use DIH. Based on other responses I saw, it seems likely that
the issue is in the DIH component somehow.
JRJ
-Original Message-
From: Awasthi, Shishir [mailto:shishir.awas...@baml.com]
Sent: Tuesday, October 25, 2011 3:24 PM
To: solr-user@lucene.apache.org; Jaeger, Jay
It didn't look like that, but maybe.
Our experience has been very very good. I don't think we have seen a crash in
our prototype to date (though that prototype is also not very busy). We have
had as many as four cores, with as many as 35 million "documents".
-Original Message-
From
From your logs, it looks like the Solr library is being found just fine, and
that the servlet is initing OK.
Does your Jetty configuration specify index.jsp in a welcome list?
We had that problem in WebSphere: we got 404's the same way, and the cure was
to modify the Jetty web.xml to include
ERRATA: that should be the *SOLR* web.xml (not the Jetty web.xml)
Sorry for the confusion.
-Original Message-
From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov]
Sent: Wednesday, October 26, 2011 4:02 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: Difficulties Installing
I very much doubt that would work: different versions of Lucene involved, and
Solr replication does just a streamed file copy, nothing fancy.
JRJ
-Original Message-
From: Nemani, Raj [mailto:raj.nem...@turner.com]
Sent: Wednesday, October 26, 2011 12:55 PM
To: solr-user@lucene.apache.o
erbilt [mailto:li...@datagenic.com]
Sent: Wednesday, October 26, 2011 5:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Difficulties Installing Solr with Jetty 7.x
Jay:
Thanks for the response.
$JETTY_HOME/etc/webdefault.xml is the unmodified file that came with
Jetty, and it has a referencin
Shishir, we have 35 million "documents", and should be doing about 5000-1
new "documents" a day, but with very small "documents": 40 fields which have
at most a few terms, with many being single terms.
You may occasionally see some impact from top level index merges but those
should be
The file that he refers to, web.xml, is inside the solr WAR file in folder
web-inf. That WAR file is in ...\example\webapps. You would have to
uncomment the section under and change the
to something else. But, as the comments in the
section explain, you would also have to make other cha
It seems to me that this issue needs to be addressed in the FAQ and in the
tutorial, and that somewhere there should be a /select lock-down "how to".
This is not obvious to many (most?) users of Solr. It certainly wasn't obvious
to me before I read this.
JRJ
-Original Message-
From:
l out.
Jay R. Jaeger
State of Wisconsin,
Dept. of Transportation
1. Find a dictionary with the English words you find acceptable
2. Use the KeepWordFilterFactory (doc in the "AnalyzersTokenizersTokenFilters"
Wiki page); a sketch follows below.
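A sketch of such a field type in schema.xml (the type name and file name are hypothetical; keepwords.txt holds one acceptable word per line):

    <fieldType name="english_only" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
      </analyzer>
    </fieldType>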
-Original Message-
From: Omri Cohen [mailto:omri...@gmail.com]
Sent: Monday, August 15, 2011 1:23 AM
To: solr-user@lucene.apache.or
Note on i: Solr replication provides pretty good clustering support
out-of-the-box, including replication of multiple cores. Read the Wiki on
replication (Google +solr +replication if you don't know where it is).
In my experience, the problem with indexing PDFs is it takes a lot of CPU on
t
On the surface, you could simply add some more fields to your schema. But as
far as I can tell, you would have to have a separate Solr "document" for each
SKU/size combination, and store the rest of the information (brand, model,
color, SKU) redundantly and make the unique key a combination of
s an index.
-Original Message-
From: Steve Cerny [mailto:sjce...@gmail.com]
Sent: Tuesday, August 16, 2011 11:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Product data schema question
Jay, this is great information.
I don't know enough about Solr to say whether this is possible... Can we setup
Not particularly. Just trying to do my part to answer some questions on the
list.
-Original Message-
From: Steve Cerny [mailto:sjce...@gmail.com]
Sent: Tuesday, August 16, 2011 11:49 AM
To: solr-user@lucene.apache.org
Subject: Re: Product data schema question
Thanks Jay, if we come to
Perhaps your admin doesn't work because you don't have
defaultCoreName="whatever-core-you-want-by-default" in your <cores> tag? E.g., a sketch of solr.xml:
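    <!-- a sketch; core names are hypothetical -->
    <solr persistent="true">
      <cores adminPath="/admin/cores" defaultCoreName="core0">
        <core name="core0" instanceDir="core0"/>
      </cores>
    </solr>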
Perhaps this was enough to prevent it from starting any cores -- I'd expect a
default to be required.
Also, from experience, if you add cores, and you have securi
them, besides 404 errors.
On Tuesday, 16 August, 2011 at 1:10 PM, Jaeger, Jay - DOT wrote:
> Perhaps your admin doesn't work because you don't have
> defaultCoreName="whatever-core-you-want-by-default" in your <cores> tag? E.g.:
>
>
>
> Perhaps this was enough
I tried on my own test environment -- pulling the default core parameter
out, under Solr 3.1
I got exactly your symptom: an error 404.
HTTP ERROR 404
Problem accessing /solr/admin/index.jsp. Reason:
missing core name in path
The log showed:
2011-08-
Whoops: that was a Solr 4.0 (trunk) build (which pre-dates the 3.1 release).
I doubt very much that the release matters, though: I expect the behavior would
be the same.
-Original Message-
From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov]
Sent: Tuesday, August 16, 2011 4:04 PM
To: solr-user
now. Excellent! The site
schemas are loading!
Looks like the site schemas have an issue:
"SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype 'long'
specified on field area_id"
Errr. Why would `long` be an invalid type?
On Tuesday, 16 August, 2011 at 2:06 PM, Jaeg
okay, now. Thanks for the help. You guys saved me
from the insane asylum.
On Tuesday, 16 August, 2011 at 2:32 PM, Jaeger, Jay - DOT wrote:
> That said, the logs are showing a different error now. Excellent! The site
> schemas are loading!
>
> Great!
>
> "SEVERE: org.apa
I'd suggest looking at the logs of the master to see if the request is getting
thru or not, or if there are any errors logged there. If the master has a
replication config error, it might show up there.
We just went thru some master/slave troubleshooting. Here are some things that
you might l
It would perhaps help if you reported what you mean by "noticeably less time".
What were your timings? Did you run the tests multiple times?
One thing to watch for in testing: Solr performance is greatly affected by the
OS file system cache. So make sure when testing that you use the same
> What is the latest version of Tika that I can use with Solr 1.4.1? it
> comes packaged with 0.4. I tried 0.8 and it no workie.
When I was testing Tika last year, I used Solr build 1271 to get the most
recent Tika I could get my hands on at the time. That was before Solr 3.1, so
I expect it
> geospatial requirements
Looking at your email address, no surprise there. 8^)
> What insight can you share (if any) regarding moving forward to a later
> nightly build?
I used build 1271 (Solr 1.4.1, which seemed to be called Solr 4 at the time)
during some testing, and it performed well
You could presumably do it with solr.PatternTokenizerFactory with the pattern
set to .* as your tokenizer (see the sketch below).
Or, maybe, if Solr allows it, you don't use any tokenizer at all?
Or, maybe you could use solr.WhitespaceTokenizerFactory, allowing it to split
up the words, along with solr.WordDelimiterFilterFacto
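A sketch of the first option (the type name is hypothetical; group="0" makes the tokenizer emit each whole regex match as a single token, so .* yields the entire input):

    <fieldType name="whole_string" class="solr.TextField">
      <analyzer>
        <!-- emit the entire field value as one token -->
        <tokenizer class="solr.PatternTokenizerFactory" pattern=".*" group="0"/>
      </analyzer>
    </fieldType>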
I would suggest #3, unless you have some very unusual performance requirements.
It has the advantage of isolating your index environment requirements from
the database.
-Original Message-
From: Nicholas Fellows [mailto:n...@djdownload.com]
Sent: Thursday, August 18, 2011 8:40 AM
To:
I am not an XSLT expert, but believe that in XSLT, "not" is a function, rather
than an operator.
http://www.w3.org/TR/xpath-functions/#func-not
So, not(contains(...)) rather than not contains(...) should presumably do
the trick.
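A sketch over Solr's XML response format (the field name and test string are hypothetical):

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="doc">
        <!-- emit the title only when it does NOT contain the string 'draft' -->
        <xsl:if test="not(contains(str[@name='title'], 'draft'))">
          <xsl:value-of select="str[@name='title']"/>
        </xsl:if>
      </xsl:template>
    </xsl:stylesheet>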
-Original Message-
From: Christopher Gross [mailto:cog
You could run the HTML import from Tika (see the Solr tutorial on the Solr
website). The job that ran Tika would need the user/password of the site to be
indexed, but Solr would not. (You might have to write a little script to get
the HTML page using curl or wget or Nutch).
Users could then s
One way I had thought of doing this kind of thing: include in the index an
"ACL" of some sort. The problem I see in your case is that the list if
"friends" can presumably change over time.
So, given that, one way would be to have a little application in between. The
request goes to the appli
I don't think it has to be quite so bleak as that, depending upon the number of
queries done over a given timeframe, and the size of the result sets. Solr
does cache the identifiers of "documents" returned by search results. See
http://wiki.apache.org/solr/SolrCaching paying particular attent
Yes, but since Solr is written in Java to run in a JEE container, you would
host Solr in a web application server, either Jetty (which comes packaged), or
something else (say, Tomcat or WebSphere or something like that).
As a result, you aren't going to find anything that says how to run Solr un
"A programmer had a problem. He tried to solve it with regular expressions.
Now he has two problems" :).
A. That just isn't fair... 8^)
(I can't think of very many things that have allowed me to perform more magic
over my career than regular expressions, starting with SNOBOL. Uh oh: I ju
ndows shared hosting environment
Thank you!
Since it's shared hosting, how do I install java?
-Original Message-
From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov]
Sent: Thursday, August 25, 2011 4:34 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr in a windows shared hosting e