Hello,
Is it possible to define more than one schema? I'm reading the example
schema.xml, and it seems that we can define only one schema. What if I want
to define one schema for document type A and another schema for document type B?
Thanks a lot,
Kevin
You might examine what the Apache CouchDB people have done.
It's a document-oriented DB that stores JSON-structured documents,
combines them with Lucene indexing of the documents, and exposes a
RESTful HTTP interface.
It's a stretch, and it's written in Erlang... but perhaps there is some
inspiration to be had.
onError="continue" should help.
Which version of DIH are you using? onError is a Solr 1.4 feature.
--Noble
On Thu, Jan 29, 2009 at 5:04 AM, Nathan Adams wrote:
> I am constructing documents from a JDBC datasource and a HTTP datasource
> (see data-config file below.) My problem is that I cannot know if a
> particular HTTP URL is available at index time.
The problem you are trying to solve is that you cannot use
${dataimporter.last_index_time} as-is; you need something like
${dataimporter.last_index_time} minus 3 seconds.
Am I right?
There is no straightforward way to do this.
1) You may write your own function, say 'lastIndexMinus3Secs', and
register it in data-config.xml, as sketched below.
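A hedged sketch of how such a custom function might be wired into DIH's
data-config.xml (the evaluator class and function name are hypothetical;
${dataimporter.functions.*} is the invocation convention DIH uses for its
built-ins like formatDate):

<dataConfig>
  <!-- Hypothetical evaluator returning last_index_time shifted back 3s -->
  <function name="lastIndexMinus3Secs"
            class="com.example.dih.LastIndexMinus3SecsEvaluator"/>
  <document>
    <entity name="item"
            deltaQuery="select id from item where updated_at &gt;
                        '${dataimporter.functions.lastIndexMinus3Secs()}'">
      <field column="id" name="id"/>
    </entity>
  </document>
</dataConfig>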
Duh. Four cases. For extra credit, what language is "wunder" in?
wunder
On 1/28/09 5:12 PM, "Walter Underwood" wrote:
> I've done this. There are five cases for the tokens in the search
> index:
>
> 1. Tokens that are unique after stemming (this is good).
> 2. Tokens that are common after stemming (usually trademarks,
> like LaserJet).
I've done this. There are five cases for the tokens in the search
index:
1. Tokens that are unique after stemming (this is good).
2. Tokens that are common after stemming (usually trademarks,
like LaserJet).
3. Tokens with collisions after stemming:
German "mit", "MIT" the university
Germ
I'd like to use the DataImportHandler running against a slave database that,
at any given time, may be significantly behind the master DB. This can cause
updates to be missed if you use the clock-time as the "last_index_time."
E.g., if the slave catches up to the master between two delta-imports.
I'm not entirely sure about the fine points, but consider the
filters that are available that fold all the diacritics into their
low-ASCII equivalents. Perhaps using that filter at *both* index
and search time on the English index would do the trick.
In your example, both would be 'munchen'.
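A sketch of such a field type for schema.xml (the type name is made up;
ISOLatin1AccentFilterFactory is the accent-folding filter that ships with
Solr of this vintage, and a single <analyzer> applies at both index and
query time):

<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
</fieldType>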
But do note that there's also no requirement that all documents
have the same fields. So you could consider storing a special
"meta document" that had *no* fields in common with any other
document that records whatever information you want about the
current state of the index.
Best
Erick
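A sketch of what such a meta document could look like as a Solr XML update
(field names are made up; note that if your schema declares a uniqueKey,
the meta document does have to share that one field):

<add>
  <doc>
    <field name="id">index_metadata</field>
    <field name="meta_last_rebuild">2009-01-29T00:00:00Z</field>
  </doc>
</add>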
I am constructing documents from a JDBC datasource and a HTTP datasource
(see data-config file below.) My problem is that I cannot know if a
particular HTTP URL is available at index time, so I need DIH to
continue processing even if the HTTP location returns a 404.
onError="continue" does not appear to work.
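For reference, a hedged sketch of where onError would sit in data-config.xml
once on a DIH version that supports it (the URL, xpaths, and data source
name are illustrative):

<entity name="page"
        processor="XPathEntityProcessor"
        dataSource="http"
        url="http://example.com/docs/${item.id}.xml"
        forEach="/doc"
        onError="continue">
  <field column="body" xpath="/doc/body"/>
</entity>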
org/apache/catalina/connector/Connector      java/util/WeakHashMap$Entry      399,913,269 bytes
org/apache/catalina/connector/Connector      java/lang/Object[]               197,256,078 bytes
org/apache/lucene/search/ExtendedFieldCache  java/util/WeakHashMap$Entry[]    177,893,021 bytes
Hi, bear with me as I am new to Solr.
I have a requirement in an application where I need to show a list of
results by groups.
For instance, each document in my index corresponds to a person, and each has
a family name. I have hundreds of thousands of records (persons). What I
would like to do is show the results grouped by family name.
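Assuming the goal is counts of matches per family name, a faceting request
is the usual starting point; a sketch (host and field name are illustrative):

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=family_name&facet.limit=20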
Mark Miller wrote on 01/26/2009 04:30:00 PM:
> Just a point or I missed: with such a large index (not doc size large,
> but content wise), I imagine a lot of your 16GB of RAM is being used by
> the system disk cache - which is good. Another reason you don't want to
> give too much RAM to the JVM.
Hi,
I currently have two indexes in Solr: one for the English version and one
for the German version. They use the English and German2 snowball
factories, respectively.
Right now, depending on which language the website is currently in, I query
the corresponding index.
There is a requirement, though, that content is found regardless of the language.
There is no existing internal field like that.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message -----
> From: Ian Connor
> To: solr-user@lucene.apache.org
> Sent: Wednesday, January 28, 2009 4:59:28 PM
> Subject: Re: solr as the data store
>
> I am planning with backups, the recovery will only be incremental.
I am planning that, with backups, recovery will only ever be incremental.
Is there an internal field to know when the last document hit the index, or
is it best to build your own "created_at" type field to know from when you
need to rebuild?
After the backup is restored, this field could be read and the missing
documents re-indexed from that point.
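If you do add a "created_at" field, reading back the newest value after a
restore is a single query; a sketch (field name taken from the message
above, host illustrative):

http://localhost:8983/solr/select?q=*:*&sort=created_at+desc&rows=1&fl=created_at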
Although the idea that you will need to rebuild from scratch is
unlikely, you might want to fully understand the cost of recovery if you
*do* have to.
If it's incredibly expensive (in time or money), you need to keep that in
mind.
-Todd
-----Original Message-----
From: Ian Connor [mailto:ian.con...
Alejandro,
What you really want to do is identify the language of the email, store that in
the index and apply the appropriate analyzer. At query time you really want to
know the language of the query (either by detecting it or asking the user or
...)
Otis
--
Sematext -- http://sematext.com/
Mark,
I am not aware of anyone open-sourcing such tools. But note that changing the
files without a dedicated GUI is easy (editor + scp?). What makes things more
complicated is the need to make Solr reload those files and, in some cases,
changes really require a full index rebuild.
Otis
--
Sematext --
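For the reload part, multicore setups can at least reload a core's config
over HTTP via the CoreAdmin handler; a sketch (host and core name are
illustrative):

http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0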
Yeah, I think the begin/end chars are very helpful here. But I like the
suggestion of figuring out which words really need to support leading
wildcards... although that's typically impossible to predict, since people
are free to enter whatever queries they feel like.
Otis
--
Sematext
This is perfectly fine. Of course, you lose any relational model. If you
don't have or don't need one, why not.
It used to be the case that backups of live Lucene indices were hard, so people
preferred having an RDBMS be the primary data source, the one they know how to
back up and maintain well.
One thing to keep in mind is that things like joins are impossible in
Solr, but easy in a database. So if you ever need to do stuff like run
reports, you're probably better off with a database to query on -
unless you cover your bases very well in the Solr index.
Thanks for your time!
Matt
Hi All,
Is anyone using Solr (and thus the Lucene index) as their database store?
Up to now, we have been using a database to build Solr from. However, given
that Lucene already keeps the stored data intact, and that rebuilding from
Solr to Solr can be very fast, the need for the separate database seems
questionable.
On Thu, Jan 29, 2009 at 12:39 AM, Gert Brinkmann wrote:
>
> Hello again,
>
> is there nobody who could help me with this? Or is it an FAQ and my
> questions are dumb somehow? Maybe I should try to shorten the questions: ;)
>
Quite the opposite, you are actually working with some advanced stuff :)
Hello again,
is there nobody who could help me with this? Or is it an FAQ and my
questions are dumb somehow? Maybe I should try to shorten the questions: ;)
> A) fuzzy search
>
> What can I do to speed up the fuzzy query?
> B) combine stemming, prefix and fuzzy search
>
> Is there a way to
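On question A, one knob worth mentioning: Lucene fuzzy queries take an
optional minimum-similarity value, and raising it above the default 0.5
shrinks the set of candidate terms that must be enumerated, which speeds
the query up. A sketch (field name is illustrative):

q=name:muenchen~0.8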
Well, both pages I listed are in the search results :). But I agree
that they aren't obvious to find, and that this should be improved. (The
Wiki is a community-created site which anyone can contribute to,
incidentally.)
cheers,
-Mike
On 28-Jan-09, at 1:11 AM, Jarek Zgoda wrote:
I swear I was looking for this information in the Solr wiki.
IndexMergeTool - http://wiki.apache.org/solr/MergingSolrIndexes
Sameer.
--
http://www.productification.com
On Wed, Jan 28, 2009 at 7:30 AM, Jae Joo wrote:
> Hi,
>
> Is there any way to join multiple indexes in Solr?
>
> Thanks,
>
> Jae
>
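Note that IndexMergeTool merges whole indexes into one; it is not a
relational join. A sketch of a typical invocation (jar and path names are
illustrative):

java -cp lucene-core.jar:lucene-misc.jar \
     org.apache.lucene.misc.IndexMergeTool \
     /path/to/merged-index /path/to/index1 /path/to/index2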
Does your index stay at triple size after optimization? It is normal for
Lucene to use 2x or up to 3x disk space during optimization, but it should
fall back to the normal numbers once optimization completes and unused
segments are cleaned up per the index deletion policy.
If you search for threads in the mailing list archives, you will find this
discussed before.
Hi Ryuuichi,
Thanks for your quick reply.
I checked the <useCompoundFile> setting in solrconfig.xml, and the value
is 'false'. Here is what is in our solrconfig.xml:
===
false
1000
1
2147483647
10
1000
I'm coming in late on this thread, but I want to recommend the YourKit
Profiler product. It helped me track a performance problem similar to what
you describe. I had been futzing with GC logging etc. for days before
YourKit pinpointed the issue within minutes.
http://www.yourkit.com/
(My problem
Tried that. Basically, Solr really didn't want to do the internal rewrite.
So essentially we would have to rewrite with a full redirect and then change
the SolrJ source to allow it to follow the redirect. We are going with an
external rewriter. However, the seemingly easiest way would be to just
surfer10 wrote:
I'm a little bit of a noob with Java tooling, so could you please tell me
what tools are used to apply patch SOLR-236 (field grouping)? Does it need
to be applied to current solr-1.3 (or the nightly builds of 1.4), or is it
already in the box?
Which build script compiles Solr in its distribution?
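For what it's worth, the usual routine for trying a JIRA patch against a
source checkout looks roughly like this (the patch file name is
illustrative; Solr builds with Ant, not a batch file):

cd apache-solr-1.3.0
patch -p0 < SOLR-236.patch
ant dist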
I would think that using a servlet filter to rewrite the URL should be
pretty straightforward. You could write your own or use a tool like
http://tuckey.org/urlrewrite/ and just configure that.
Using something like this, I think the upgrade procedure could be:
- install the rewrite filter to rewrite single-core URLs onto the new core paths
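A sketch of what such a rule could look like in the UrlRewriteFilter's
urlrewrite.xml (the core name "core0" is illustrative):

<rule>
  <from>^/select(.*)$</from>
  <to type="forward">/core0/select$1</to>
</rule>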
Hi,
Your problem seems to be lower-level than the Solr code. You are sending
an XML request that contains an illegal (per the XML spec) character. You
should strip these characters out of the data that you send, or turn off
the XML validation (not recommended, because of all kinds of risks).
See
http://www
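A minimal sketch of the stripping approach in Java, keeping only the
character ranges the XML 1.0 spec allows:

public static String stripInvalidXmlChars(String in) {
    StringBuilder out = new StringBuilder(in.length());
    for (int i = 0; i < in.length(); i++) {
        char c = in.charAt(i);
        // XML 1.0 valid chars: #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD
        if (c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)) {
            out.append(c);
        }
    }
    return out.toString();
}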
Hi,
Is there any way to join multiple indexes in Solr?
Thanks,
Jae
We are moving from single core to multicore. We have a few servers that we
want to migrate one at a time to ensure that each one functions. This
process is proving difficult as there is no default core to allow the
application to talk to the solr servers uniformly (ie without a core name
during c
I know that I can see the search result after the commit, and that is OK.
I could disable the queryResultCache and the problem would be fixed, but I
need the queryResultCache because my index size is big and I need good
performance.
So I am trying to find out how to fix the bug, or maybe the Solr guys can help.
On Wed, Jan 28, 2009 at 4:29 PM, Parisa wrote:
>
> I should say that we also have this problem when we commit with waitFlush =
> true and waitSearcher = true,
>
> because it again closes the old searcher and opens a new one, so it has
> a warm-up process with the queryResultCache.
>
> besides, I
I should say that we also have this problem when we commit with waitFlush =
true and waitSearcher = true,
because it again closes the old searcher and opens a new one, so it has
a warm-up process with the queryResultCache.
Besides, I need to commit with waitFlush = false and waitSearcher = false to
avoid blocking.
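For reference, a sketch of that commit call in SolrJ of this era (the URL
is illustrative):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class CommitExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // commit(waitFlush, waitSearcher): return without waiting for the
        // flush or for the new searcher's warm-up to finish.
        server.commit(false, false);
    }
}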
From my past projects, our Lucene classification corpus looked like this:
0|document text...|categoryA
1|document text...|categoryB
2|document text...|categoryA
3|document text...|categoryA
...
800|document text...|categoryC
With the faceting capabilities of Solr it is now possible to design more
Hello Qingdi,
Have you changed the <useCompoundFile> setting in solrconfig.xml?
In my experience, when using the compound-file index
(<useCompoundFile>true</useCompoundFile>),
the size of the index grows up to triple during optimization.
My understanding is that when writing a new segment in compound format,
Lucene writes the multifile format first and then packs those files into
the compound file.
I swear I was looking for this information in the Solr wiki. See for yourself
if this is accessible at all:
http://wiki.apache.org/solr/?action=fullsearch&context=180&value=highlight&fullsearch=Text
Message written on 2009-01-28 at 00:58 by Mike Klaas:
They are documented in http://w
Oh wait.. looks like Otis' suggestion of "index n-grams with begin/end
delim characters" and relying on phrase-searching to link the chains
of characters.. logically doing a better version of my previous email.
- Neal
On Wed, Jan 28, 2009 at 1:04 AM, Neal Richter wrote:
> leading wildcard searc
leading wildcard search is called grep ;-)
Ditto on the indexing reversed words suggestion.
Can you create a second field in Solr that contains /only/ the words
from the fields you care to reverse? Once you do that, you could
pre-process the query, look for leading wildcards, and address those
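A sketch of that query pre-processing in Java (the "body_rev" field,
holding each token reversed at index time, is hypothetical):

public class LeadingWildcardRewriter {
    // Rewrite "*jet" into a trailing-wildcard query on the reversed field.
    public static String rewrite(String term) {
        if (term.startsWith("*")) {
            String reversed = new StringBuilder(term.substring(1)).reverse().toString();
            return "body_rev:" + reversed + "*";
        }
        return "body:" + term;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("*jet")); // prints body_rev:tej*
    }
}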