Thanks for the quick reply!
In fact it was a typo: the 200 rows I got were from Postgres. I was trying
to say that the full-import was omitting the 100 Oracle rows.
When I run the full import, I run it as a single job, using the URL
command=full-import. I've tried to clear the index both using the cle
Just for testing purposes, I would:
1. Use curl to create new docs (a sketch follows below).
2. Use SolrJ to go to the individual DBs and collect docs.
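For step 1, a minimal sketch, assuming the default /solr/update handler and
a unique key field named "id" (adjust both to your setup):

  curl 'http://localhost:8983/solr/update?commit=true' \
       -H 'Content-Type: text/xml' \
       --data-binary '<add><doc><field name="id">test-1</field></doc></add>'

If this doc is searchable afterwards but the DIH rows are not, the problem
is in the import config rather than in the index itself.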
On Wed, Jul 7, 2010 at 12:45 PM, Xavier Rodriguez wrote:
> Thanks for the quick reply!
>
> In fact it was a typo: the 200 rows I got were from Postgres. I was trying
> to say that the full-import was omitting the 100 Oracle rows.
I was wondering if anyone has any experience using huge pages[1] to
improve SOLR (or Lucene) performance (especially on 64-bit).
Some are reporting major performance gains in large, memory-intensive
applications (like EJBs)[2].
Ephemeral but significant performance regressions have also been
solved using huge pages.
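For concreteness, the setup I have in mind on Linux with HotSpot is
something like this (sizes are illustrative, and I have not measured any
of it myself):

  # Reserve 2MB huge pages for a ~4GB heap
  sysctl -w vm.nr_hugepages=2048
  # Run Solr's JVM with large pages enabled
  java -Xms4g -Xmx4g -XX:+UseLargePages -jar start.jar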
How did you verify it was not processed? Did you:
1. Query for docs, with no results? (See the example URL below.)
2. Use the Solr Admin tool?
3. Bypass the DataImportHandler and see if a direct doc post/commit works?
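For 1, the quickest check is a direct query; this assumes the default
select handler and a unique key field named "id":

  http://localhost:8983/solr/select?q=id:YOUR_DOC_ID

YOUR_DOC_ID is a placeholder for the key of a row you expect the import to
have produced.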
On Tue, Jun 15, 2010 at 10:29 PM, iboppana wrote:
>
> Hi All,
>
> We are trying to implement Solr for our newspaper site
1) Shouldn't you put your "entity" elements under the "document" tag?
(See the sketch at the end of this message.)
2) What happens if you try to run full-import with an explicitly
specified "entity" GET parameter?
command=full-import&entity=carrers
command=full-import&entity=hidrants
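The sketch promised in point 1; the dataSource attributes and queries are
placeholders, the point is only the nesting:

  <dataConfig>
    <dataSource driver="..." url="..."/>
    <document>
      <entity name="carrers" query="select ... from carrers">
        ...
      </entity>
      <entity name="hidrants" query="select ... from hidrants">
        ...
      </entity>
    </document>
  </dataConfig>

If the two entities come from different databases, DIH also lets you
declare a second, named dataSource and pick it per entity with the
entity's dataSource attribute.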
On Wed, Jul 7, 2010 at 11:1
This looks reasonable. I'll take a look at the patch. Originally, I had
intended that it was just for one Field Sub Type, thinking that if we ever
wanted multiple sub types, that a new, separate class would be needed, but if
this proves to be clean this way, then I see no reason not to incorporate it.
Hi,
I am trying to make a Lucene module for SKOS-based synonym expansion. When I
tried to use the Filter in Solr, I got a ClassCastException.
So I tried to take one of the existing Solr Filters and FilterFactories, change
the package information, compress it into a jar, and use it as a plugin.
I haven't used this myself, but Solr supports a rollback function:
http://wiki.apache.org/solr/UpdateXmlMessages#A.22rollback.22
It is supposed to roll back to the state at the previous commit. So
you may want to turn off auto-commit on the index you are updating if you
want to control what that rollback point is.
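For completeness, per that wiki page the rollback is just another XML
update message; a minimal invocation (untested by me):

  curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
       --data-binary '<rollback/>'

Auto-commit itself is the <autoCommit> block in solrconfig.xml; commenting
it out keeps the commit points (and therefore the rollback target) under
your control.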
Currently our only requirement is to be able to search on the
numerical part of the daterange field, so our field type overrides
getRangeQuery and getFieldQuery to consider only the first two
subfields. If we wanted to be able to search the name subfield as
well, I suppose we could do something similar.
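Roughly, the shape of those overrides is as follows (DateRangeType is a
stand-in name; this is simplified and from memory, so check FieldType in
your Solr version for the exact signatures, and note a real class needs
the remaining abstract methods too):

  import org.apache.lucene.search.Query;
  import org.apache.solr.schema.*;
  import org.apache.solr.search.QParser;

  public class DateRangeType extends AbstractSubTypeFieldType {
    @Override
    public Query getRangeQuery(QParser parser, SchemaField field,
                               String part1, String part2,
                               boolean minInclusive, boolean maxInclusive) {
      // Only the first (numeric) sub-field participates in range search.
      SchemaField start = subField(field, 0);
      return start.getType().getRangeQuery(parser, start, part1, part2,
                                           minInclusive, maxInclusive);
    }
  }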
So I will have a Solr field that contains "years", i.e. "1990", "2010",
maybe even "1492", "1209" and "907"/"0907".
I will be doing range limits over this field, i.e. [1950 TO 1975] or
what have you. The data represents the publication dates of books on a
large library's shelves; there will be around
Hi list,
I am wondering if Solr/Lucene can help improve my existing search engine.
I would like to have different results for each user - but still have
relevant results. Each user would have different score multipliers for
each searchable item.
Is this something possible?
Thanks,
--
Jean-Michel
On Wed, Jul 7, 2010 at 8:15 AM, Grant Ingersoll wrote:
> Originally, I had intended that it was just for one Field Sub Type, thinking
> that if we ever wanted multiple sub types, that a new, separate class would
> be needed
Right - this was my original thinking too. AbstractSubTypeFieldType
i
I'm still pretty new to SOLR and have a question about handling updates. I
currently have a db-config to do a bulk import. I have a single root entity
and then some data that comes from other tables. This works fine for an
initial bulk load. However, once indexed, is there a way I can tell SOLR
Hmmm, let's see your schema definitions please. I'm suspicious because
you've implied that you do use a unique key. If it's required, then your
definitions don't select it into the same name (i.e. you select it as
id_carrer in one and id_hidrant in the other). So if id_hidrant was defined
as your unique key
You need to look carefully at your schema.xml. There are plenty of
comments in that file describing what's going on. That's where you
set up your analyzers by chaining together various tokenizers
and filters.
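For orientation, the chaining in question looks like the text types in the
example schema, trimmed down:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

Each <filter> runs over the tokenizer's output in order; that ordering is
the chaining described above.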
I think you're confused about indexing and storing. Generally it's
a bad practice to allo
My index contains data in two different languages, English and German. Which
analyzer and stemmer should be applied to this data before feeding it to
the index?
-Sarfaraz
This isn't a very worrisome case. Most of the messages you see on the board
about
the dangers of dates arise because dates can be stored with many unique
values if
they include milliseconds. Then, when sorting on date your memory explodes
because
all the dates are loaded into memory.
In your case,
The short answer is "there isn't a single analyzer and stemmer that
really work well for mixed-language indexing and searching".
Take a look through the mail archive; try searching for multilanguage,
multi-language, or multiple languages. There's a wealth of info there
because this topic has been discussed many times.
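One common approach (not the only one) is a separate field per language,
each with its own stemmer; sketching the German side with stock factories,
the English twin would use language="English":

  <fieldType name="text_de" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="German"/>
    </analyzer>
  </fieldType>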
There are terms in my data like "one-way", separated by '-'. Now the
problem is that the standard analyzer is treating these as a single term
instead of two, but I need them to be stored as two terms in the index.
How can I do this?
Sarfaraz
Thanx Erick
:-)
--- On Thu, 8/7/10, Erick Erickson wrote:
From: Erick Erickson
Subject: Re: stemming the index
To: solr-user@lucene.apache.org
Date: Thursday, 8 July, 2010, 1:33 AM
The short answer is "there isn't a single analyzer and stemmer that
really work well for mixed-language indexing and searching".
Take a look at WordDelimiterFilterFactory
Erick
On Wed, Jul 7, 2010 at 4:15 PM, sarfaraz masood <
sarfarazmasood2...@yahoo.com> wrote:
> There are terms in my data like "one-way", separated by '-'. Now the
> problem is that the standard analyzer is treating these as a single term
> instead of two, but I need them to be stored as two terms in the index.
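To make that concrete, a sketch of the filter line for the field type; the
attribute values are a guess at what this case needs (the wiki page for
WordDelimiterFilterFactory lists the rest):

  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="0" catenateNumbers="0"/>

With generateWordParts="1", "one-way" comes out as the two tokens "one"
and "way" at index time.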
: Ubuntu server (see exception below). The same configuration works when
: injecting from a Windows client to a Windows server.
interesting ... so you're saying that if you use the exact same SolrJ
code, and just change the host:port, it works on windows? are you certain
that the version of So
: Does anyone know how to read in data from one or more of the example xml docs
: and ALSO store the filename and path from which it came?
Solr has no knowledge that your "xml docs" are actually files ... the XML
syntax ("<add><doc>...") is just a serialization mechanism for streaming
data to solr about the documents you want indexed.
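If you want the file name and path searchable, you have to put them into
that XML yourself when you generate it, e.g. (assuming you declare a
"filename" field in your schema):

  <add>
    <doc>
      <field name="id">doc-1</field>
      <field name="filename">/data/docs/monitor.xml</field>
    </doc>
  </add>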
: with multicore. i cannot access:
: http://localhost:8983/solr/collection1/admin/zookeeper.jsp
why would you expect that URL to work? you don't have a core named
"collection1" in the solr.xml you posted...
...the only time "collection1" appears is as the defaultCoreName, but unless
: I am fetching the following details programmatically :
1) you didn't tell us how you were fetching those details programmatically
.. what URL are you using?
2) the fact that the handlerStart times are different suggests that you are
not looking at the same handler (maybe you are looking at two
Hi,
I have a text file broken apart by carriage returns, and I'd like to only
return entire lines. So, I'm trying to use this:
&hl.fragmenter=regex
&hl.regex.pattern=^.*$
... but I still get fragments, even if I crank up the hl.regex.slop to 3 or so.
I also tried a pattern of
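For reference, the full parameter set I would try next, as one untested
guess; hl.fl=text assumes the field name, and [^\n]+ sidesteps the ^/$
anchors, which may not act as line anchors inside the fragmenter:

  &hl=true&hl.fl=text&hl.fragmenter=regex
  &hl.regex.pattern=[^\n]+
  &hl.fragsize=500&hl.regex.slop=1.0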
TokenFilterFactory is an interface. Your factory class has to
implement this interface.
If you look at the Lucene factories, they all subclass from
BaseTokenFilterFactory which then subclasses from
BaseTokenStreamFactory. That last one does various things for the
child factories (I don't know what, exactly).
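A minimal sketch for the SKOS case against that Solr 1.4-era base class;
the package name and the SkosSynonymFilter class are placeholders for your
own code:

  package com.example.solr.analysis;

  import org.apache.lucene.analysis.TokenStream;
  import org.apache.solr.analysis.BaseTokenFilterFactory;

  public class SkosSynonymFilterFactory extends BaseTokenFilterFactory {
    @Override
    public TokenStream create(TokenStream input) {
      // Wrap the incoming stream with your SKOS expansion filter.
      return new SkosSynonymFilter(input);
    }
  }

One common cause of that ClassCastException is compiling the plugin
against a different Solr version than the server is running, so build
against the same Solr jar your server uses.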
If autocommit does not do an automatic rollback, that is a serious bug.
There should be a way to detect that an automatic rollback has
happened, but I don't know what it is. Maybe something in the Solr
MBeans?
On Wed, Jul 7, 2010 at 5:41 AM, osocurious2 wrote:
>
> I haven't used this myself, but Solr supports a rollback function.
Yes, for a user's query you would include a different set of boosts as
a parameter in the search request. It's easy. You need the user->boost
set mapping in your front end, not in Solr.
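For example, with a dismax handler and two field names invented here
(title, description), the per-user numbers just land in qf:

  /solr/select?q=guitar&defType=dismax&qf=title^2.4+description^0.7

Your front end looks up the user's multiplier set and writes it into qf
(or bq) when it builds the request.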
On Wed, Jul 7, 2010 at 8:44 AM, Jean-Michel Philippon-Nadeau
wrote:
> Hi list,
>
> I am wondering if Solr/Lucene can help improve my existing search engine.
You can pass variables to the DIH from the URL parameters. This would
let you pass a query term into the DIH operation.
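A sketch of the mechanics; "userQuery" is a made-up parameter name, and
anything you append to the URL is visible in data-config.xml as
${dataimporter.request.<name>}:

  http://localhost:8983/solr/dataimport?command=full-import&clean=false&userQuery=foo

  <entity name="item"
          query="select * from item where name like '${dataimporter.request.userQuery}%'">

clean=false keeps the import from wiping what is already indexed.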
On Wed, Jul 7, 2010 at 11:53 AM, Frank A wrote:
> I'm still pretty new to SOLR and have a question about handling updates. I
> currently have a db-config to do a bulk import.
There is no 'trie string'.
If you use a trie type for this problem, sorting will take much less
memory. Sorting strings uses memory both per document and per unique
term. The Trie types do not use any memory per unique term. So, yes, a
Trie Integer is a good choice for this problem.
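A sketch of the schema entries for the years case; pub_year is a made-up
name and precisionStep is a tuning knob (this assumes the Solr 1.4 trie
types):

  <fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
             omitNorms="true" positionIncrementGap="0"/>
  <field name="pub_year" type="tint" indexed="true" stored="true"/>

Range queries such as pub_year:[1950 TO 1975] then use the trie structure
rather than enumerating every unique term.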
On Wed, Jul 7
Hey Robert,
You may want to check out Flume for log file collection:
http://github.com/cloudera/flume. We don't currently allow Flume to populate
a Solr index, but that would be quite an interesting use case!
Later,
Jeff
On Wed, Jun 30, 2010 at 3:06 PM, Robert Petersen wrote:
> Sorry if this i
I used SegmentInfos to read the segments_N file and found the error is
that it tries to load deletedDocs but the .del file's size is 0 (because
of a disk error). So I used SegmentInfos to set delGen=-1 to ignore
the deleted docs.
But I think there is a bug. The logic of the write may be -- it first
writes the
Although I have not tested it myself yet, the Lucene-Hunspell project might
be worth a look: http://code.google.com/p/lucene-hunspell/
Jaran
On Wed, Jul 7, 2010 at 10:15 PM, sarfaraz masood <
sarfarazmasood2...@yahoo.com> wrote:
> Thanx Erick
> :-)
>
> --- On Thu, 8/7/10, Erick Erickson wrote: