I've got a very difficult project to tackle. I've been tasked with using
schemaless mode to index json files that we receive. The structure of the
json files will always be very different as we're receiving files from
different customers totally unrelated to one another. We are attempting to
build
Bumping this.
I'm seeing the error mentioned earlier in the thread - "Unable to download
completely. Downloaded 0!=" often in my logs. I'm
dealing with a situation where maxDoc count is growing at a faster rate
than numDocs and is now almost twice as large. I'm not optimizing but
rather relying o
What does "/no_coord" mean in the dismax scoring output? I've looked
through the wiki mail archives, lucidfind, and can't find any reference.
--
¡jah!
I'm on a project where we have 1B docs sharded across 20 servers. We're not
in production yet and we're doing load tests now. We're sending load to hit
100qps per server. As the load increases we're seeing query times
sporadically increasing to 10 seconds, 20 seconds, etc. at times. What
we're tryi
We're on the trunk:
4.0-2011-10-26_08-46-59 1189079 - hudson - 2011-10-26 08:51:47
Client timeouts are set to 4 seconds.
Thanks,
-Jay
On Thu, Jan 26, 2012 at 1:40 PM, Mark Miller wrote:
>
> On Jan 26, 2012, at 1:28 PM, Jay Hill wrote:
>
> >
> > I've tried sett
distributed search,
meaning if a response wasn't received w/in the timeAllowed, and if
partialResults is true, then that shard would not be waited on for results.
is that correct?
thanks,
-jay
On Thu, Jan 26, 2012 at 2:23 PM, Jay Hill wrote:
> We're on the trunk:
> 4.0-2011-10-26_08-
Working with SolrJ I'm doing a query using the StatsComponent, and the
stats.facet parameter. I'm not able to set multiple fields for the
"stats.facet" parameter using SolrJ. Here is the query I'm trying to create:
http://localhost:8983/solr/select/?q=*:*&stats=on&stats.field=fieldForStats&stats.f
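For what it's worth, SolrQuery extends ModifiableSolrParams, so one way I'd
expect this to work is with add(), which appends a value instead of replacing
it. A sketch, where the two facet field names are made up:

SolrQuery query = new SolrQuery("*:*");
query.set("stats", true);
query.set("stats.field", "fieldForStats");
// add() appends, so both stats.facet parameters go out on the request:
query.add("stats.facet", "fieldA");
query.add("stats.facet", "fieldB");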
I'm having trouble getting the core CREATE command to work with relative
paths in the solr.xml configuration.
I'm working with a layout like this:
/opt/solr [this is solr.solr.home: $SOLR_HOME]
/opt/solr/solr.xml
/opt/solr/core0/ [this is the "template" core]
/opt/solr/core0/conf/schema.xml [etc.]
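For reference, the CoreAdmin call I'd expect to work against that layout is
something like this (the new core name "core1" and its directory are
hypothetical):

http://localhost:8983/solr/admin/cores?action=CREATE&name=core1&instanceDir=core1&config=solrconfig.xml&schema=schema.xml

where instanceDir should be resolved relative to solr.solr.home.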
A merge factor of 100 is very high and out of the norm. Try starting with a
value of 10. I've never seen a running system with a value anywhere near
this high.
Also, what is your setting for ramBufferSizeMB?
-Jay
On Tue, Aug 17, 2010 at 10:46 AM, rajini maski wrote:
> yeah sorry I forgot to men
Removing those components is not likely to impact performance very much, if
at all. I would focus on other areas when tuning performance, such as
looking at memory usage and configuration, query design, etc. But there isn't
any harm in removing them either. Why not do some load tests with the
componen
I have a project where we need to search 1B docs and still have results <
700ms. The problem is, we are using geofiltering and that is happening *
before* the queries, so we have to geofilter on the 1B docs to restrict our
set of docs first, and then do the query on a name field. But it seems that
I have a situation where I want to show the term counts as is done in the
TermsComponent, but *only* for terms that are *matched* in a query, so I
get something returned like this (pseudo code):
q=title:(golf swing)
title: golf legends show how to improve your golf swing on the golf course
> Best
> Erick
>
> On Fri, Feb 24, 2012 at 3:31 PM, Jay Hill wrote:
> > I have a situation where I want to show the term counts as is done in the
> > TermsComponent, but *only* for terms that are *matched* in a query, so I
> > get something returned like this (p
Usually I would recommend trying to index all languages into one Solr core.
The determining factor for me is how much "overlap" there is in fields for
each language, i.e. how many common fields for each language. For example
if you have 60 common fields to all languages, but only 8 fields that are
You mentioned that dismax does not support wildcards, but edismax does. Not
sure if dismax would have solved your other problems, or whether you just
had to shift gears because of the wildcard issue, but you might want to have
a look at edismax.
-Jay
http://www.lucidimagination.com
On Mon, Jan 3
You can always try something like this out in the analysis.jsp page,
accessible from the Solr Admin home. Check out that page and see how it
allows you to enter text to represent what was indexed, and text for a
query. You can then see if there are matches. Very handy to see how the
various filters
As Hoss mentioned earlier in the thread, you can use the statistics page
from the admin console to view the current number of segments. But if you
want to know by looking at the files, each segment will have a unique
prefix, such as "_u". There will be one unique prefix for every segment in
the ind
Dismax works by first selecting the highest scoring sub-query of all the
sub-queries that were run. If I want to search on three fields, manu, name
and features, I can configure dismax like this:
<str name="defType">dismax</str>
<float name="tie">0.0</float>
<str name="qf">manu name features</str>
<str name="q.alt">*:*</str>
Now I'll use this query:
http
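A minimal SolrJ sketch of a query against that setup (the search term "ipod"
is made up, and the params could equally live in the handler defaults):

SolrQuery q = new SolrQuery("ipod");
q.set("defType", "dismax");
q.set("qf", "manu name features");
q.set("tie", "0.0");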
Looks good, thanks Tom.
-Jay
On Fri, Apr 15, 2011 at 8:55 AM, Burton-West, Tom wrote:
> Thanks everyone.
>
> I updated the wiki. If you have a chance please take a look and check to
> make sure I got it right on the wiki.
>
> http://wiki.apache.org/solr/DisMaxQParserPlugin#tie_.28Tie_breaker.2
I don't think I understand what you're trying to do. Are you trying to
preserve all facets after a user clicks on a facet, and thereby triggers a
filter query, which excludes the other facets? If that's the case, you can
use local parameters to tag the filter queries so they are not used for the
fa
I've worked with a lot of different Solr implementations, and one area that
is emerging more and more is using Solr in combination with other "big data"
solutions. My company, Lucid Imagination, has added a two-day course to our
upcoming Lucene Revolution conference, "Scaling Search with Big Data a
UnInvertedField is similar to Lucene's FieldCache, except, while the
FieldCache cannot work with multivalued fields, UnInvertedField is designed
for that very purpose. So since your f_dcperson field is multivalued, by
default you use UnInvertedField. You're not doing anything wrong, that's
default
I'm writing a custom update request handler that will poll a "hot"
directory for Solr xml files and index anything it finds there. The custom
class implements Runnable, and when the run method is called the loop
starts to do the polling. How can I tell Solr to load this class on startup
to fire off
set up the thread for my polling
UpdateRequestHandler.
This seems to work, but if anyone has a better (or more tested) approach
please let us know.
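In case it's useful, a minimal sketch of the SolrCoreAware idea. The class
name and the "hot" directory are hypothetical, and it assumes the class is
registered as a plugin (e.g. a request handler) in solrconfig.xml so that
Solr instantiates it:

import java.io.File;

import org.apache.solr.core.SolrCore;
import org.apache.solr.util.plugin.SolrCoreAware;

public class HotDirPoller implements SolrCoreAware, Runnable {

  private volatile File hotDir;

  // inform() is called once the core has finished loading, so it is a
  // safe place to start background work like a polling thread.
  public void inform(SolrCore core) {
    hotDir = new File(core.getResourceLoader().getConfigDir(), "hot");
    Thread poller = new Thread(this, "hot-dir-poller");
    poller.setDaemon(true);
    poller.start();
  }

  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      // scan hotDir for new Solr XML files and index them (omitted)
      try {
        Thread.sleep(5000);
      } catch (InterruptedException e) {
        return;
      }
    }
  }
}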
-Jay
On Mon, Jul 9, 2012 at 2:33 PM, Jay Hill wrote:
> I'm writing a custom update request handler that will poll a "hot"
> d
I'm doing some testing with field collapsing, and early results look good.
One thing seems odd to me however. I would expect to get back one block of
results, but I get two - the first one contains the collapsed results, the
second one contains the full non-collapsed results:
...
...
This see
Check the system request handler: http://localhost:8983/solr/admin/system
Should look something like this:
1.3.0.2009.07.28.10.39.42
1.4-dev 797693M - jayhill - 2009-07-28
10:39:42
2.9-dev
2.9-dev 794238 - 2009-07-15 18:05:08
-Jay
On Thu, Jul 30, 2009 at 10:32 AM, Walter Underwood wrote:
> I
Is it possible for the DataImportHandler to update records in the table it
is querying? For example, say I have a query like this in my entity:
query="select field1, field2, from someTable where hasBeenIndexed=false"
Is there a way I can mark each record processed by updating the
hasBeenIndexed f
updates.
> > > Writing a database procedure might be a good idea. In that case your
> > query
> > > will simply be > .../>.
> > > All the heavy lifting can be done by this query.
> > >
> > > Moreover, update queries only return the
I'm using the MoreLikeThisHandler with a content stream to get documents
from my index that match content from an html page like this:
http://localhost:8080/solr/mlt?stream.url=http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2009/08/06/SP5R194Q13.DTL&mlt.fl=body&rows=4&debugQuery=true
But, not su
On Aug 8, 2009, at 10:42 AM, Ken Krugler wrote:
>
>
>> On Aug 7, 2009, at 5:23pm, Jay Hill wrote:
>>
>> I'm using the MoreLikeThisHandler with a content stream to get documents
>>> from my index that match content from an html page like this:
>>>
>>> http:
This seems to work:
?q=field\ name:something
Probably not a good idea to have field names with whitespace though.
-Jay
2009/8/28 Marcin Kuptel
> Hi,
>
> Is there a way to query solr about fields which names contain whitespaces?
> Indexing such data does not cause any problems but I have been
Unfortunately you can't sort on a multi-valued field. In order to sort on a
field it must be indexed but not multi-valued.
Have a look at the FieldOptions wiki page for a good description of what
values to set for different use cases:
http://wiki.apache.org/solr/FieldOptionsByUseCase
-Jay
www.luc
All you have to do is use the "start" and "rows" parameters to get the
results you want. For example, the query for the first page of results might
look like this,
?q=solr&start=0&rows=10 (other params omitted). So you'll start at the
beginning (0) and get 10 results. The next page would be
?q=sol
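The SolrJ equivalent of that paging is just (a sketch):

SolrQuery query = new SolrQuery("solr");
query.setStart(10); // skip the first page of 10
query.setRows(10);  // page size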
If you need an alternative to using the TermsComponent for auto-suggest,
have a look at this blog on using EdgeNGrams instead of the TermsComponent.
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
-Jay
http://www.lucidimagination.com
On Wed, S
Set up the query like this to highlight a field named "content":
SolrQuery query = new SolrQuery();
query.setQuery("foo");
query.setHighlight(true).setHighlightSnippets(1); //set other params as
needed
query.setParam("hl.fl", "content");
QueryResponse queryResponse = getSolrSe
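The snippets then come back keyed by document id and field. A sketch, assuming
a hypothetical document id of "doc1":

List<String> snippets =
    queryResponse.getHighlighting().get("doc1").get("content");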
one line out of the
> whole field as a snippet.
>
> On Thu, Sep 10, 2009 at 7:45 PM, Jay Hill wrote:
> > Set up the query like this to highlight a field named "content":
> >
> >SolrQuery query = new SolrQuery();
> >query.setQuery("foo");
>
highlighted, even if the search term only occurs in the
> first line of a 300 page field. I'm not sure if mergeContinuous will
> do that, or if it will miss everything after the last line that
> contains the search term.
>
> On Fri, Sep 11, 2009 at 10:42 AM, Jay Hill wrote:
RequestHandlers are configured in solrconfig.xml. If no components are
explicitly declared in the request handler config, then the defaults are used.
They are:
- QueryComponent
- FacetComponent
- MoreLikeThisComponent
- HighlightComponent
- StatsComponent
- DebugComponent
If you wanted to have a cus
Will do Shalin.
-Jay
http://www.lucidimagination.com
On Fri, Sep 11, 2009 at 9:23 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:
> Jay, it would be great if you can add this example to the Solrj wiki:
>
> http://wiki.apache.org/solr/Solrj
>
> On Fri, Sep 11,
Use: ?q=*:*
-Jay
http://www.lucidimagination.com
On Mon, Sep 14, 2009 at 4:18 PM, Jonathan Vanasco wrote:
> I'm using Solr for seach and faceted browsing
>
> Is it possible to have solr search for 'everything' , at least as far as q
> is concerned ?
>
> The request handlers I've found don't li
With dismax you can use q.alt when the q param is missing:
q.alt=*:*
should work.
-Jay
On Mon, Sep 14, 2009 at 5:38 PM, Jonathan Vanasco wrote:
> Thanks Jay & Matt
>
> I tried *:* on my app, and it didn't work
>
> I tried it on the solr admin, and it did
>
> I checked the solr config file, and
The two jar files are all you should need, and the configuration is correct.
However I noticed that you are on Solr 1.3. I haven't tested the Lucid
KStemmer on a non-Lucid-certified distribution of 1.3. I have tested it on
recent versions of 1.4 and it works fine (just tested with the most recent
nightly build).
For security reasons (say I'm indexing very sensitive data, medical records
for example) is there a way to encrypt data that is stored in Solr? Some
businesses I've encountered have such needs and this is a barrier to them
adopting Solr to replace other legacy systems. Would it require a
custom-wri
When working with SolrJ I have typically batched a Collection of
SolrInputDocument objects before sending them to the Solr server. I'm
working with the latest nightly build and using the ExtractingRequestHandler
to index documents, and everything is working fine. Except I haven't been
able to figur
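For reference, the Collection batching I mean looks like this (a sketch, with
made-up field values and an existing SolrServer named server):

List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "1");
doc.addField("title", "example");
batch.add(doc);
server.add(batch); // one round trip for the whole batch
server.commit();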
Have a look at a blog I posted on how to use EdgeNGrams to build an
auto-suggest tool:
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
You could easily add filter queries to this approach. For example, the
query used in the blog could add filter
"Two other approaches are to use either the TermsComponent (new in Solr
> > 1.4) or faceting."
>
>
>
> On Wed, Oct 7, 2009 at 1:51 AM, Jay Hill wrote:
>
> > Have a look at a blog I posted on how to use EdgeNGrams to build an
> > auto-suggest tool:
>
Correct me if I'm wrong, but wasn't the ISOLatin1AccentFilterFactory
deprecated in favor of <filter class="solr.ASCIIFoldingFilterFactory"/>
in 1.4?
-Jay
http://www.lucidimagination.com
On Wed, Oct 7, 2009 at 1:44 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:
> On Tue, Oct 6, 2009 at 4:33 PM, Chantal Ackermann <
> chantal.
In the past setting rows=n with the full-import command has stopped the DIH
importing at the number I passed in, but now this doesn't seem to be
working. Here is the command I'm using:
curl '
http://localhost:8983/solr/indexer/mediawiki?command=full-import&rows=100'
But when 100 docs are imported
https://issues.apache.org/jira/browse/SOLR-1501
> >
> > On Fri, Oct 9, 2009 at 6:10 AM, Jay Hill wrote:
> > > In the past setting rows=n with the full-import command has stopped the
> > DIH
> > > importing at the number I passed in, but now this doesn't se
Use copyField to copy to a field with a field type like this:
This works for your example; however, I can't be sure it will work for all
of your content, but give it a try and see.
-Jay
http://www.lucid
You could use separate DIH config files for each of your three tables. This
might be overkill, but it would keep them separate. The DIH is not limited
to one request handler setup, so you could create a unique handler for each
case with a unique name:
<requestHandler name="/dataimport-table1" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">table1-config.xml</str>
  </lst>
</requestHandler>
Shouldn't that be: java -Dsolr.solr.home=multicore -jar start.jar
and then hit url: http://localhost:8983/solr/core0/admin/ or
http://localhost:8983/solr/core1/admin/
-Jay
http://www.lucidimagination.com
On Fri, Oct 9, 2009 at 1:17 PM, Jason Rutherglen wrote:
> I have a fresh checkout from t
> 2009-10-09 13:37:05.096::INFO: Started SocketConnector @ 0.0.0.0:8983
>
> And http://localhost:8983/solr/admin yields a 404 error.
>
> On Fri, Oct 9, 2009 at 1:27 PM, Jay Hill wrote:
> > Shouldn't that be: java -Dsolr.solr.home=multicore -jar start.jar
> >
> > an
1.4 has a good chance of being released next week. There was a hope that it
might make it this week, but another bug in Lucene 2.9.1 was found, pushing
things back just a little bit longer.
-Jay
http://www.lucidimagination.com
On Thu, Oct 29, 2009 at 11:43 AM, beaviebugeater wrote:
>
> Do you h
Have a look at the VelocityResponseWriter (
http://wiki.apache.org/solr/VelocityResponseWriter). It's in the contrib
area, but the wiki has instructions on how to move it into your core Solr.
Solr uses response writers to return results. The default is XML but
responses can be returned in JSON, Rub
So assuming you set up a few sample sort queries to run in the firstSearcher
config, and had very low query volume during that ten minutes so that there
were no evictions before a new Searcher was loaded, would those queries run
by the firstSearcher be passed along to the cache for the next Searche
Here is a brief example of how to use SolrJ with the
ExtractingRequestHandler:
ContentStreamUpdateRequest req = new
ContentStreamUpdateRequest("/update/extract");
req.addFile(fileToIndex);
req.setParam("literal.id", getId(fileToIndex));
req.setParam("literal
You can set up multiple request handlers each with their own configuration
file. For example, in addition to the config you listed you could add
something like this:
<requestHandler name="/dataimport-two" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-two-config.xml</str>
  </lst>
</requestHandler>
and so on with as many handlers as you need.
-Jay
http://www.lucidimagination.com
On Thu, Nov 5, 2009 a
There is a "text_rev" field type in the example schema.xml file in the
official release of 1.4. It uses the ReversedWildcardFilterFactory to reverse
a field. You can do a copyField from the field you want to use for leading
wildcard searches to a field using the text_rev field, and then do a regular
The replication admin page on slaves used to have an auto-reload set to
reload every few seconds. In the official 1.4 release this doesn't seem to
be working, but it does in a nightly build from early June. Was this changed
on purpose or is this a bug? I looked through CHANGES.txt to see if anythin
I don't think your queries are actually nested queries. Nested queries key
off of the "magic" field name _query_. You're right however that there is
very little in the way of documentation of examples of nested queries. If
you haven't seen this blog about them yet you might find this a helpful
over
Looking at the example version of schema.xml there seems to be some
confusion on which numeric field types are best used in different
situations. What confused me was that the type of "int" is now set to a
TrieIntField, but with a precisionStep of 0:
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
the "tint" type is set up as a TrieIntFiel
I'm on a project where I'm trying to determine the size of the field cache.
We're seeing lots of memory problems, and I suspect that the field cache is
extremely large, but I'm trying to get exact counts on what's in the field
cache.
One thing that struck me as odd in the output of the stats.jsp p
On Sat, Dec 19, 2009 at 11:37 AM, Yonik Seeley
wrote:
> On Sat, Dec 19, 2009 at 2:25 PM, Jay Hill wrote:
> > One thing that struck me as odd in the output of the stats.jsp page is
> that
> > the field cache always shows a String type for a field, even if it is not
> a
> >
Oh, forgot to add (just to keep the thread complete), the field is being
used for a sort, so it was able to use TrieDoubleField.
Thanks again,
-Jay
On Sat, Dec 19, 2009 at 12:21 PM, Jay Hill wrote:
> This field is of class type solr.SortableDoubleField.
>
> I'm actually migra
The string fieldtype is not being tokenized, while the text fieldtype is
tokenized. So the stop word "for" is being removed by a stop word filter,
which doesn't happen with the string field type (no tokenizing).
Have a look at the schema.xml in the example dir and look at the default
configuration f
It seems to me that this is just the expected behavior of the FrenchAnalyzer
using the FrenchStemmer. I'm not familiar with the French language, but in
English words like running, runner, and runs are all stemmed down to "run"
as intended. I don't know what other words in French would stem down to
Usually that means there is another log4j.properties or log4j.xml file in
your classpath that is being found before the one you are intending to use.
Check your classpath for other versions of these files.
-Jay
On Tue, May 12, 2009 at 3:38 AM, Sagar Khetkade
wrote:
>
> Hi,
> I have solr impleme
The only downside would be that you would have to update a document anytime
a user was granted or denied access. You would have to query before the
update to get the current values for grantedUID and deniedUID, remove/add
values, and update the index. If you don't have a lot of changes in the
syste
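At query time the check then becomes a filter query. A sketch, with a
hypothetical user id:

SolrQuery query = new SolrQuery("some search");
query.addFilterQuery("grantedUID:u42");
query.addFilterQuery("-deniedUID:u42");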
- Migrate configuration files from old master (or backup) to new master.
- Replicate from a slave to the new master.
- Resume indexing to new master.
-Jay
On Wed, May 13, 2009 at 4:26 AM, nk 11 wrote:
> Nice.
> What if the master fails permanently (like a disk crash...) and the new
> master is
If that is your complete input file then it looks like you are missing the
wrapping <add> element:
> <doc>
>   <field name="id">F8V7067-APL-KIT</field>
>   <field name="name">Belkin Mobile Power Cord for iPod w/ Dock</field>
>   <field name="manu">Belkin</field>
>   <field name="cat">electronics</field>
>   <field name="cat">connector</field>
>   <field name="features">car power adapter, white</field>
>   <field name="weight">4</field>
>   <field name="price">19.95</field>
>   <field name="popularity">1</field>
>   <field name="inStock">false</field>
> </doc>
Is it possible you just forgot
I was interested in this recently and also couldn't find anything on the
wiki. I found this in the list archive:
The version parameter determines the XML protocol used in the response.
Clients are strongly encouraged to ''always'' specify the protocol version,
so as to ensure that the format of th
Try using the admin analysis tool
(http://<host>:<port>/solr/admin/analysis.jsp)
to see what the analysis chain is doing to your query. Enter the field name
("question" in your case) and the Field value (Index) "customize" (since
that's what's in the document). For Field value (Query) enter "customer".
Check
Use the fl param to ask for only the fields you need, but also keep hl=true.
Something like this:
http://localhost:8080/solr/select/?q=bear&version=2.2&start=0&rows=10&indent=on&hl=true&fl=id
Note that &fl=id means the only field returned in the XML will be the id
field.
Highlights are still ret
Regarding being able to search SCHOLKOPF (o with no umlaut) and match
SCHÖLKOPF (with umlaut) try using the ISOLatin1AccentFilterFactory in your
analysis chain:
This filter removes accented chars and replaces them with non-accented
versions. As always, make sure to add it to the <analyzer> for both
In order to get the values you want for the service field you will need
to change the fieldType definition in schema.xml for "service" to use
something that doesn't alter your original values. Try the "string"
fieldType to start and look at the fieldType definition for "string". I'm
guessing yo
I'm having some trouble getting the PlainTextEntityProcessor to populate a
field in an index. I'm using the TemplateTransformer to fill 2 fields, and
have a timestamp field in schema.xml, and these fields make it into the
index. Only the plainText data is missing. Here is my configuration:
I'm using the DIH to index records from a relational database. No problems,
everything works great. But now, due to the size of index (70GB w/ 25M+
docs) I need to shard and want the DIH to distribute documents evenly
between two shards. Current approach is to modify the sql query in the
config fil
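One common way to split it (a sketch, assuming a numeric primary key named
id): give each shard its own DIH config, identical except for the query, e.g.
query="select ... from someTable where mod(id, 2) = 0" on the first shard and
mod(id, 2) = 1 on the second.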
I'm using the XPathEntityProcessor to parse an xml structure that looks like
this:
<book>
  <author>Joe Smith</author>
  <title>World Atlas</title>
  <body>
    <chapter>
      <p>Content I want is here</p>
      <p>More content I want is here.</p>
      <p>Still more content here.</p>
    </chapter>
  </body>
</book>
The author and title parse out fine:
under <body> irrespective of nesting, tag
> names use this:
>
> On Thu, Jul 2, 2009 at 5:31 AM, Jay Hill wrote:
> > I'm using the XPathEntityProcessor to parse an xml structure that looks
> like
> > this:
> >
> >
>
(/book/body/chapter//p) doesn't seem to be
> >supported.
> >
> >Thanks,
> >-Jay
> >
> >
2009/7/1 Noble Paul
> >
> >> complete xpath is not supported
> >>
> >> /book/body/chapter/p
> >>
> >> s
I'm on the trunk, built on July 2: 1.4-dev 789506
Thanks,
-Jay
On Thu, Jul 2, 2009 at 11:33 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:
> On Thu, Jul 2, 2009 at 11:38 PM, Mark Miller
> wrote:
>
> > Shalin Shekhar Mangar wrote:
> >
> >>
> >> It selects all matching nodes. But if t
Thanks Fergus, setting the field to multivalued did work:
<field column="body" xpath="/book/body/chapter/p" />
gets all the elements as multivalue fields in the body field.
The only thing is, the body field is used by some other content sources, so
I have to look at the implications setting it to multi-valued will have on
the other data sour
Mathieu, have a look at Solr's DataImportHandler. It provides a
configuration-based approach to index different types of datasources
including relational databases and XML files. In particular have a look at
the XpathEntityProcessor (
http://wiki.apache.org/solr/DataImportHandler#head-f1502b1ed71d9
Just to be sure: You mentioned that you "adjusted" schema.xml - did you
re-index after making your changes?
-Jay
On Wed, Jul 8, 2009 at 7:07 AM, Yang Lin wrote:
> Thanks for your reply. But it doesn't work.
>
> Yang
>
> 2009/7/8 Yao Ge
>
> >
> > Try with fl=* or fl=*,score added to your request
I haven't tried this myself, but it sounds like what you're looking for is
enabling remote streaming:
http://wiki.apache.org/solr/ContentStream#head-7179a128a2fdd5dde6b1af553ed41735402aadbf
As the link above shows you should be able to enable remote streaming like
this: <requestParsers enableRemoteStreaming="true" /> and then something like t
Francis, your question is a little vague. Are you looking for the
configuration for connecting the DIH to a JNDI datasource set up in
Weblogic?
-Jay
On Mon, Jul 6, 2009 at 2:41 PM, Francis Yakin wrote:
>
> Have any one had experience creating a datasource for DIH to an Oracle
> Database?
We're building a spell index from a field in our main index with the
following configuration:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
This works great and re-builds the spelling index on commits as expected.
However, we know there are misspellings in
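For reference, exercising the spellchecker from SolrJ looks something like
this (a sketch: the misspelled term is made up, and an existing SolrServer
named server is assumed):

SolrQuery query = new SolrQuery("golph");
query.set("spellcheck", true);
QueryResponse rsp = server.query(query);
SpellCheckResponse spell = rsp.getSpellCheckResponse();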
I am trying to run full and delta imports with the commit=false option, but
it doesn't seem to take effect - after the import a commit always happens no
matter what params I send. I've looked at the source and unless I'm missing
something it doesn't seem to process the commit param.
Here's the url
We had the same thing to deal with recently, and a great solution was posted
to the list. Create a stopwords filter on the field you're using for your
spell checking, and then populate a custom stopwords file with known
misspelled words:
Y
My bad, I had a configuration setting overriding this value. Sorry for the
mistake.
-Jay
On Wed, Jul 15, 2009 at 12:07 PM, Jay Hill wrote:
> I am trying to run full and delta imports with the commit=false option, but
> it doesn't seem to take effect - after the import a commit alw
Actually, "my good" after all. The parameter does not take effect. If
commit=false is passed in a commit still happens.
Will open a JIRA and supply a patch shortly.
-Jay
On Wed, Jul 15, 2009 at 5:50 PM, Jay Hill wrote:
> My bad, I had a configuration setting overriding this
I've noticed this as well, usually when working with a large field cache. I
haven't done in-depth analysis of this yet, but it seems like when the stats
page is trying to pull data from a large field cache it takes quite a long
time.
Are you doing a lot of sorting? If so, what are the field types
Also, what is your heap size and the amount of RAM on the machine?
I've also noticed that, when watching memory usage through JConsole or
YourKit while loading the stats page, the memory usage spikes dramatically -
are you seeing this as well?
-Jay
On Thu, Dec 24, 2009 at 9:12 AM, Jay
The version of Tika in the 1.4 release definitely parses the most current
Office formats (.docx, .pptx, etc.) and they index as expected.
-Jay
On Mon, Jan 4, 2010 at 6:02 PM, Peter Wolanin wrote:
> You must have been searching old documentation - I think tika 0.3+ has
> support for the new MS f
It's definitely still an issue. I've seen this with at least four different
Solr implementations. It clearly seems to be a problem when there is a large
field cache. It would be bad enough if the stats.jsp was just slow to load
(usually takes 1 to 2 minutes), but when monitoring memory usage with
j
Actually my cases were all with customers I work with, not just one case. A
common practice is to monitor cache stats to tune the caches properly. Also,
noting the warmup times for new IndexSearchers, etc. I've worked with people
that have excessive auto-warm count values which is causing extremely
A couple of follow up questions:
- What type of garbage collector is in use?
- How often are you optimizing the index?
- In solrconfig.xml what is the setting for ?
- Right before and after you see this pause, check the output of
http://<host>:<port>/solr/admin/system,
specifically the output of and send thi
My colleague at Lucid Imagination, Tom Hill, will be presenting a free
webinar focused on analysis in Lucene/Solr. If you're interested, please
sign up and join us.
Here is the official notice:
We'd like to invite you to a free webinar our company is offering next
Thursday, 28 January, at 2PM Eas
If I've done a lot of research and have a very good idea of where my cache
sizes are having monitored the stats right before commits, is there any
reason why I wouldn't just set the initialSize and size counts to the same
values? Is there any reason to set a smaller initialSize if I know reliably
t