You can have two fields: one which is stripped, and another which
stores the original data. You can use directives and make
the "stripped" field indexed but not stored, and the original field
stored but not indexed. You only have to upload the file once, and
only store the text once.
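For example, a schema.xml sketch (the field names "body" and "body_stripped" are made up for illustration; the actual stripping would be done by the analyzer on the indexed field's type, e.g. an HTMLStripCharFilterFactory):

```xml
<!-- stored but not indexed: the original text, returned with results -->
<field name="body" type="string" indexed="false" stored="true"/>
<!-- indexed but not stored: the analyzed/stripped text used for searching -->
<field name="body_stripped" type="text_general" indexed="true" stored="false"/>
<copyField source="body" dest="body_stripped"/>
```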
If you look
Ok, thanks Otis
Another question on merging
What is the best way to monitor merging?
Is there something in the log file that I can look for?
It seems like I have to monitor system resources (read/write IOPS, etc.)
and work out when a merge happened.
It would be great if I could do it by looking
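One option (a 3.x-style solrconfig.xml sketch) is to turn on Lucene's infoStream, which writes low-level segment, flush, and merge activity to a file:

```xml
<indexDefaults>
  <!-- set to true to have Lucene log merge/flush details to this file -->
  <infoStream file="INFOSTREAM.txt">true</infoStream>
</indexDefaults>
```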
Simply turn off replication during your rebuild-from-scratch. See:
http://wiki.apache.org/solr/SolrReplication#HTTP_API
the "disablereplication" command.
The autocommit thing was, I think, in reference to keeping
a partial rebuild from being replicated.
Autocommit is usually a f
Why do you care? Merging is generally a background process, or are
you doing heavy indexing? In a master/slave setup,
it's usually not really relevant except that (with 3.x), massive merges
may temporarily stop indexing. Is that the problem?
Look at the merge policies; there are configurations that
The FieldCache gets populated the first time a given field is referenced as
a facet and then will stay around forever. So, as additional queries get
executed with different facet fields, the number of FieldCache entries will
grow.
If I understand what you have said, these faceted queries do w
We have a fairly large scale system - about 200 million docs and fairly high
indexing activity - about 300k docs per day with peak ingestion rates of about
20 docs per sec. I want to work out what a good mergeFactor setting would be by
testing with different mergeFactor settings. I think the def
But again, with a master/slave setup merging should
be relatively benign. And at 200M docs, having an M/S
setup is probably indicated.
Here's a good writeup of mergepolicy
http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/
If you're indexing and searching on a single machine, merg
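For reference, a 3.x-style solrconfig.xml sketch of the knobs involved (values shown are the stock example defaults, not recommendations):

```xml
<indexDefaults>
  <mergeFactor>10</mergeFactor>
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy"/>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
</indexDefaults>
```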
Hi,
When I tried to remove data from the UI (which will in turn hit SOLR), the
whole application got stuck. When we took the log files of the UI, we
could see that this set of requests did not reach SOLR itself. In the SOLR
log file, we were able to find the following exception occurring at the s
Erick, I'll do that. Thank you very much.
Regards,
Jacek
On Tue, May 1, 2012 at 7:19 AM, Erick Erickson wrote:
> The easiest way is to do that in the app. That is, return the top
> 10 to the app (by score) then re-order them there. There's nothing
> in Solr that I know of that does what you want
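A sketch of that app-side re-ordering (the ids and the "price" field are made up; in practice the maps would be built from the Solr response):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReorderTopDocs {
    // Build a fake "document" as a field map, standing in for a response doc.
    static Map<String, Object> doc(String id, double price) {
        Map<String, Object> m = new HashMap<>();
        m.put("id", id);
        m.put("price", price);
        return m;
    }

    public static void main(String[] args) {
        // Pretend Solr already returned these 10 (here 3) docs ordered by score.
        List<Map<String, Object>> topDocs = new ArrayList<>();
        topDocs.add(doc("a", 9.99));
        topDocs.add(doc("b", 3.50));
        topDocs.add(doc("c", 7.25));
        // Re-order the same docs client-side by another field, price ascending.
        topDocs.sort(Comparator.comparingDouble(d -> (Double) d.get("price")));
        for (Map<String, Object> d : topDocs) System.out.println(d.get("id"));
    }
}
```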
Actually we are not thinking of a M/S setup
We are planning to have x number of shards on N number of servers, with each
shard handling both indexing and searching.
The expected query volume is not that high, so don't think we would need to
replicate to slaves. We think each shard will be able t
Greetings Solr folk,
How can I instruct the extract request handler to ignore metadata/headers
etc. when it constructs the "content" of the document I send to it?
For example, I created an MS Word document containing just the word
"SEARCHWORD" and nothing else. However, when I ship this doc to my
Somehow I missed that there was a solrclean command. Thanks.
On Tue, May 1, 2012 at 10:41 AM, Markus Jelsma
wrote:
> Nutch 1.4 has a separate tool to remove 404 and redirects documents from
> your
> index based on your CrawlDB. Trunk's SolrIndexer can add and remove
> documents
> in one run base
Optimizing is much less important query-speed wise
than historically, essentially it's not recommended much
any more.
A significant effect of optimize _used_ to be purging
obsolete data (i.e. that from deleted docs) from the
index, but that is now done on merge.
There's no harm in optimizing on o
I doubt if SOLR has this capability, given that it is based on a RESTful
architecture, but I wanted to ask in case I'm mistaken.
In lucene, it is easier to gain a direct handle to the collector / scorer
and access all the results as they're collected (as opposed to the SOLR
query call that perfor
In other words, .. as an alternative , what's the most efficient way to gain
access to all of the document ids that match a query
--
View this message in context:
http://lucene.472066.n3.nabble.com/Dumb-question-Streaming-collector-query-results-tp3955175p3955194.html
Sent from the Solr - User mailing list archive at Nabble.com.
Check to see if you have a CopyField for a wildcard pattern that copies to
"meta", which would copy all of the Tika-generated fields to "meta."
-- Jack Krupansky
-Original Message-
From: Joseph Hagerty
Sent: Wednesday, May 02, 2012 9:56 AM
To: solr-user@lucene.apache.org
Subject: Extr
I do not. I commented out all of the copyFields provided in the default
schema.xml that ships with 3.5. My schema is rather minimal. Here is my
fields block, if this helps:
On Wed, May 2, 2012 at 10:59 AM, Jack Krupansky wrote:
> Check to see if you have a CopyField
Hi :)
I'm starting to use Solr and I'm facing a little problem with dates. My
documents have a date property in the 'yyyyMMdd' format.
To index these dates, I use the following code:
String dateString = "20101230";
SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
Date date = sdf.parse(dateString);
doc.addField("date", date);
In the index, the date "20101230" is saved as "2010-12-29T23:00:00Z" (because of GMT).
The trailing "Z" is required in your input data to be indexed, but the Z is
not actually stored. Your query must have the trailing "Z" though, unless
you are doing a wildcard or prefix query.
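To make the round trip concrete, a small self-contained sketch (plain JDK, no Solr required) that parses the input in UTC and produces the trailing-Z form:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SolrDateFormat {
    // Format a java.util.Date the way Solr date fields expect:
    // UTC, ISO 8601, with a trailing "Z".
    static String toSolrDate(Date d) {
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        return out.format(d);
    }

    public static void main(String[] args) throws Exception {
        SimpleDateFormat in = new SimpleDateFormat("yyyyMMdd");
        // Parse in UTC so the date does not shift by the local GMT offset.
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        Date date = in.parse("20101230");
        System.out.println(toSolrDate(date)); // 2010-12-30T00:00:00Z
    }
}
```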
-- Jack Krupansky
-Original Message-
From: G.Long
Sent: Wednesday, May 02, 2012 11:18 AM
To:
I can achieve this by building a query with start and rows = 0, and using
.getResults().getNumFound().
Are there any more efficient approaches to this?
Thanks
--
View this message in context:
http://lucene.472066.n3.nabble.com/SOLRJ-Is-there-a-way-to-obtain-a-quick-count-of-total-results-for-a-
Oops... I meant to say that Solr doesn't *index* the trailing Z, but it is
"stored" (the stored value, not the indexed value.) The query must match the
indexed value, not the stored value.
-- Jack Krupansky
-Original Message-
From: Jack Krupansky
Sent: Wednesday, May 02, 2012 11:55 A
That wasn't right either... the query must have the trailing Z, which Solr
will strip off to match the indexed value which doesn't have the Z. So, my
corrected original statement is:
The trailing "Z" is required in your input data to be indexed, but the Z is
not actually indexed by Solr (it is
Hi Robert,
On May 1, 2012, at 7:07pm, Robert Muir wrote:
> On Tue, May 1, 2012 at 6:48 PM, Ken Krugler
> wrote:
>> Hi list,
>>
>> Does anybody know if the Suggester component is designed to work with shards?
>
> I'm not really sure it is? They would probably have to override the
> default mer
Hello,
I just started using elevation for solr. I am on solr 3.5, running with Drupal
7, Linux.
1. I updated my solrconfig.xml
from
<dataDir>${solr.data.dir:./solr/data}</dataDir>
to
<dataDir>/usr/local/tomcat2/data/solr/dev_d7/data</dataDir>
2. I placed my elevate.xml in my solr's data directory. Based on forum answers,
I thought
I did some testing, and evidently the "meta" field is treated specially by
the ERH.
I copied the example schema, and added both "meta" and "metax" fields and
set "fmap.content=metax", and lo and behold only the doc content appears in
"metax", but all the doc metadata appears in "meta".
Alt
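If the goal is to keep Tika metadata out of the content entirely, one approach is to map unknown/metadata fields to an ignored prefix (a sketch; the handler path and "ignored_" prefix follow the stock example config):

```xml
<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">metax</str>
    <!-- any Tika-generated field the schema doesn't know goes to ignored_* -->
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>
```

paired in schema.xml with a dynamicField of type "ignored", e.g. `<dynamicField name="ignored_*" type="ignored"/>`.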
I did some small research with a fairly modest result:
https://github.com/m-khl/solr-patches/tree/streaming
you can start exploring it from the trivial test
https://github.com/m-khl/solr-patches/blob/17cd45ce7693284de08d39ebc8812aa6a20b8fb3/solr/core/src/test/org/apache/solr/response/ResponseStreaming
I use jetty that comes with solr.
I use solr's dedupe:
<processor class="solr.processor.SignatureUpdateProcessorFactory">
  <bool name="enabled">true</bool>
  <str name="signatureField">id</str>
  <bool name="overwriteDupes">true</bool>
  <str name="fields">url</str>
  <str name="signatureClass">solr.processor.Lookup3Signature</str>
</processor>
and because of this id is not url itself but its encoded signature.
I see solrclean uses url to delete
How interesting! You know, I did at one point consider that perhaps the
fieldname "meta" may be treated specially, but I talked myself out of it. I
reasoned that a field name in my local schema should have no bearing on how
a plugin such as solr-cell/Tika behaves. I should have tested my
hypothesis
Hi:
I have been working on an integration project involving Solr 3.5.0 that
dynamically registers cores as needed at run-time, but does not contain any
cores by default. The current solr.xml configuration file is:-
This configuration does not include any cores as those are created
dynamica
: String dateString = "20101230";
: SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
: Date date = sdf.parse(dateString);
: doc.addField("date", date);
:
: In the index, the date "20101230" is saved as "2010-12-29T23:00:00Z" ( because
: of GMT).
"because of GMT" is misleading and vague
On Wed, May 2, 2012 at 12:16 PM, Ken Krugler
wrote:
> What confuses me is that Suggester says it's based on SpellChecker, which
> supposedly does work with shards.
>
It is based on spellchecker apis, but spellchecker's ranking is based
on simple comparators like string similarity, whereas sugge
i've installed tomcat7 and solr 3.6.0 on linux/64
i'm trying to get a single webapp + multicore setup working. my efforts
have gone off the rails :-/ i suspect i've followed too many of the
wrong examples.
i'd appreciate some help/direction getting this working.
so far, i've configured
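for reference, a minimal 3.x-era multicore solr.xml looks like this (core names and instanceDir paths are placeholders):

```xml
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>
```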
> BTW, in 4.0, there's DocumentWriterPerThread that
> merges in the background
It flushes without pausing, but does not perform merges. Maybe you're
thinking of ConcurrentMergeScheduler?
On Wed, May 2, 2012 at 7:26 AM, Erick Erickson wrote:
> Optimizing is much less important query-speed wise
>
Hello everybody,
I have a question about synonyms in Solr. In our company we are looking
for a way to resolve synonyms from a database rather than from a text file
as SynonymFilterFactory does.
The idea is to save all the synonyms in the database, index them, and they will
be ready to
I'm not sure I completely follow, but are you simply saying that you want to
have a synonym filter that reads the synonym table from a database rather
than the current text file? If so, sure, you could develop a replacement for
the current synonym filter which loads its table from a database, bu
Another solution is to write a script to read the database and create the
synonyms.txt file, dump the file to solr and reload the core.
This gives you the custom synonym solution.
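For the script route, a small sketch of just the formatting step (the table/column names and JDBC plumbing are omitted as assumptions; this only builds SynonymFilterFactory's comma-separated line format for synonyms.txt):

```java
import java.util.Arrays;
import java.util.List;

public class SynonymFileBuilder {
    // Turn one (term -> synonyms) row, e.g. fetched via JDBC, into the
    // comma-separated line format synonyms.txt uses: "term,syn1,syn2".
    static String toSynonymLine(String term, List<String> synonyms) {
        StringBuilder sb = new StringBuilder(term);
        for (String s : synonyms) sb.append(',').append(s);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSynonymLine("tv", Arrays.asList("television", "telly")));
    }
}
```

After writing the file to the core's conf directory, a core RELOAD makes Solr pick it up.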
-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Wednesday, May 02, 2012 4:5
I don't know if this will help, but I usually add a dataDir element to
each core's solrconfig.xml to point at a local data folder for the core,
like this:
<dataDir>${solr.data.dir:./solr/core0/data}</dataDir>
-Original Message-
From: loc...@mm.st [mailto:loc...@mm.st]
Sent: Wednesday, May 0
I chronicled exactly what I had to configure to slay this dragon at
http://vinaybalamuru.wordpress.com/2012/04/12/solr4-tomcat-multicor/
Hope that helps
--
View this message in context:
http://lucene.472066.n3.nabble.com/need-some-help-with-a-multicore-config-of-solr3-6-0-tomcat7-mine-reports-Se
You are missing the "pf", "pf2", and "pf3" request parameters, which say
which fields to do phrase proximity boosting on.
"pf" boosts using the whole query as a phrase, "pf2" boosts bigrams, and
"pf3" boosts trigrams.
You can use any combination of them, but if you use none of them, "ps"
app
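A sketch of what that might look like in a request handler's defaults (field names and boosts are made up):

```xml
<str name="defType">edismax</str>
<str name="qf">title^2 body</str>
<str name="pf">title^5 body^2</str>
<str name="pf2">title^3 body</str>
<str name="pf3">title^2 body</str>
<str name="ps">2</str>
```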
Thanks for your answers. Now I have another question: if I develop the
filter to replace the current synonym filter, I understand that this
process would happen at indexing time, because query-time synonym expansion
has a lot of known problems. If so, how can I create my index
file?
Hello Prabhu,
Look at SPM for Solr (URL in sig below). It includes Index Statistics graphs,
and from these graphs you can tell:
* how many docs are in your index
* how many docs are deleted
* size of index on disk
* number of index segments
* number of index files
* maybe something else I'm for
Anyone have any clues about this exception? It happened during the
course of normal indexing. This is new to me (we're running solr 3.6 on
tomcat 6/redhat RHEL) and we've been running smoothly for some time now
until this showed up:
>>>Red Hat Enterprise Linux Server release 5.3 (Tikanga)
>>>
: How do I search for things that have no value or a specified value?
Things with no value...
(*:* -fieldName:[* TO *])
Things with a specific value...
fieldName:A
Things with no value or a specific value...
(*:* -fieldName:[* TO *]) fieldName:A
..."or" if you aren't using
Sounds good. "OR" in the negation of any query that matches any possible
value in a field.
The Solr query parser doc lists the open range as you used:
-field:[* TO *] finds all documents without a value for field
See:
http://wiki.apache.org/solr/SolrQuerySyntax
This also include pure wil
Oops... that is:
(-fname:*) OR fname:(A B C)
or
(-fname:[* TO *]) OR fname:(A B C)
-- Jack Krupansky
-Original Message-
From: Jack Krupansky
Sent: Wednesday, May 02, 2012 7:48 PM
To: solr-user@lucene.apache.org
Subject: Re: syntax for negative query OR something
Sounds good. "OR
Hmmm... I thought that worked in edismax. And I thought that pure negative
queries were allowed in SolrQueryParser. Oh well.
In any case, in the Lucene or Solr query parser, add "*:*" to select all
docs before negating the docs that have any value in the field:
(*:* -fname:*) OR fname:(A B C)
There are lots of different strategies for dealing with synonyms, depending
on what exactly is most important and what exactly you are willing to
tolerate.
In your latest example, you seem to be using string fields, which is
somewhat different from the text synonyms we talk about in Solr. You
(12/05/03 1:39), Noordeen, Roxy wrote:
Hello,
I just started using elevation for solr. I am on solr 3.5, running with Drupal
7, Linux.
1. I updated my solrconfig.xml
from
<dataDir>${solr.data.dir:./solr/data}</dataDir>
to
<dataDir>/usr/local/tomcat2/data/solr/dev_d7/data</dataDir>
2. I placed my elevate.xml in my solr's data dire
I think a regular sync of the database table with the synonym text file is
the simplest solution. It lets you use Solr natively without any
customization, and updating the synonyms file from entries in the database
is not a very complicated operation.
thanks!
On Wed, May 2, 2012 at 4:43 PM, Chris Hostetter
wrote:
>
> : How do I search for things that have no value or a specified value?
>
> Things with no value...
> (*:* -fieldName:[* TO *])
> Things with a specific value...
> fieldName:A
> Things with no value or a specific val
Jack,
Yes, the queries work fine till I hit the OOM. The fields that start with
S_* are strings, F_* are floats, I_* are ints, and so on. The dynamic field
definitions from schema.xml:
*Each FieldCache will be an array with maxdoc entries (your total number of
documents - 1.4 mil
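A back-of-the-envelope sketch of why this grows (the field count is illustrative, and actual per-entry overhead varies by field type and Lucene version; the linear growth with maxDoc and number of faceted fields is the point):

```java
public class FieldCacheSizing {
    // Rough model: each faceted field gets a FieldCache array with one entry
    // per document (maxDoc). bytesPerEntry=4 matches int/float fields;
    // string fields cost considerably more.
    static long fieldCacheBytes(long maxDoc, int numFields, int bytesPerEntry) {
        return maxDoc * numFields * bytesPerEntry;
    }

    public static void main(String[] args) {
        long maxDoc = 1_400_000L;                      // from the thread: ~1.4M docs
        long bytes = fieldCacheBytes(maxDoc, 50, 4);   // faceting on 50 numeric fields
        System.out.println(bytes / (1024 * 1024) + " MB"); // array storage alone
    }
}
```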
Hi Francois,
The issue you describe looks like a similar issue we have fixed before
with matches count.
Open an issue and we can look into it.
Martijn
On 1 May 2012 20:14, Francois Perron
wrote:
> Thanks for your response Cody,
>
> First, I used distributed grouping on 2 shards and I'm sure th
Hi,
I just wanted to get some information about whether the Parent-Child
relationship between documents that Lucene has been talking about has been
implemented in Solr or not. I know the join patch is available; would that
be the only solution?
And another question, as and when this will be possible (if
It seems like the slave instance starts to pull the index from the master and
then dies, which causes a broken pipe at the master node.
On Thu, May 3, 2012 at 3:31 AM, Robert Petersen wrote:
> Anyone have any clues about this exception? It happened during the
> course of normal indexing. This is new to me (w