Use TemplateTransformer
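As a sketch (the entity and column names below are hypothetical, not from your config), a data-config.xml entry using TemplateTransformer to build a composite unique id might look like:

```xml
<!-- Hypothetical sketch: "item" and its columns are placeholders -->
<entity name="item" transformer="TemplateTransformer"
        query="select id, category from items">
  <!-- Compose a unique Solr id from a prefix and the DB primary key -->
  <field column="solr_id" template="item-${item.id}"/>
</entity>
```

You would then map solr_id to your schema's uniqueKey field.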
On Wed, Jun 15, 2011 at 4:41 PM, MartinS wrote:
> Hello,
>
> I want to perform a data import from a relational database.
> Th
Yes Erick,
I did create an artificial load test with 30 users concurrently doing search
(around 28000 samples of actual queries). With 1.4.1, the test completes
within 3 hrs without any failures (SOLR 1.2.1 couldn't match this
performance; in 3 hrs it could only do 9700 samples).
I just don't want to suffer all the limitations a multiValued field has. (It
does have some limitations, doesn't it? I just remember reading somewhere
that it does.)
On Wed, Jun 15, 2011 at 4:01 PM, Bob Sandiford wrote:
> Oops - sorry - missed that...
>
> Well, the multiValued setting is explic
Or another way of saying this is - what is the maximum throughput you
get from the system (qps / indexing speed, etc) since that is what you
really (should) care about - and how does it compare to the previous setup?
-Mike
On 6/15/2011 3:52 PM, Erick Erickson wrote:
Yes, 100% CPU utilization
"Make the master a slave of the slaves"... Sounds like
an infinite loop to me ...
You don't have to "bring the old master back online". Just
leave the promoted slave as the new master forever and then
create a new slave.
If you have significant differences between hardware for
master and slave, t
You might be able to do something like this in a custom QParser. look at
the LuceneQParser as an example, but replace usages of QueryParser with
your own subclass of QueryParser where you override the getBooleanQuery
method and muck with the Occur property of the BooleanClauses if they all
h
: If I change the field type in my schema, do I need to rebuild the entire
: index? I'm at a point now where it takes over a day to do a full import due
: to the sheer size of my application and I would prefer not having to reindex
: just because I want to make a change somewhere.
it really depen
Hey Brian,
Catching up on my email from vacation, I noticed a bunch of questions from
you about similarity, per-field similarity, and the new
similarity-provider stuff that don't look like they were ever really
resolved.
A little background...
once upon a time, "Similarity" was a global sor
Hi Erick,
Thank you for the advice. Given what you've told us, we've modified our plan
to include three physical boxes: one master indexer, and two slaves. I have
a question, however.
Suppose that the master node goes down and we promote the designated slave
to take over. The new master node will
: Rather than reinventing wheels here, I think that fronting the conf/
: directory with a WebDAV server would be a great way to go. I'm not
: familiar with the state-of-the-art of WebDAV servers these days but
: there might be something pretty trivial that can be configured in Tomcat
: to do
Hmm, I suppose I have the same question from the Mahout side (I didn't
write that text). I would certainly call this far more related to
Hadoop than Lucene; though there are some Lucene touch-points, there is no
direct connection to Solr that I'm aware of.
If I'm not wildly mistaken then I can edit the
In addition to Bob's response:
On 15.06.2011 13:59, Omri Cohen wrote:
[...]
> stored="true" required="false" />
> stored="true" required="false" />
> stored="true" required="false" />
> stored="true" required="false" />.
1. The value for "indexed" should either be "true" or "fals
Hello,
I want to perform a data import from a relational database.
That all works well.
However, I want to dynamically create a unique id for my Solr documents
while importing, using my data config file. I can't get it to work; maybe
it's not possible this way, but I thought I would ask you all.
(I
What would the resulting single-valued field look like? Concatenate all input
fields into one long string?
If that's what you need, I've written a FieldCopy UpdateProcessor which can do
that. I'll contribute it in https://issues.apache.org/jira/browse/SOLR-2599
--
Jan Høydahl, search solution ar
Hi Dimitry,
>>The parameters you have mentioned -- termInfosIndexDivisor and
>>termIndexInterval -- are not found in the Solr 1.4.1 config|schema. Are you
>>using SOLR 3.1?
I'm pretty sure that the termIndexInterval (ratio of tii file to tis file) is
in the 1.4.1 example solrconfig.xml file, alt
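For reference, a hedged sketch of where that setting lives in solrconfig.xml (the value shown is illustrative, not a recommendation):

```xml
<indexDefaults>
  <!-- Every Nth term is loaded into the in-memory term index;
       a larger interval uses less RAM but makes term lookups slower -->
  <termIndexInterval>128</termIndexInterval>
</indexDefaults>
```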
Thanks. I'm trying to think through if there's any hypothetical way for
dismax to be improved to not be subject to this problem. Now that it's
clear that the problem isn't just with stopwords, and that in fact it's
very hard to predict if you'll get the problem and under what input,
when creat
I wonder whether CharFilters are applied to wildcard terms? I suspect
they might be. If that's the case, you could use the MappingCharFilter
to perform lowercasing (and strip diacritics too, if you want that).
-Mike
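If CharFilters do apply, a sketch of the analyzer entry could look like this (the mapping file name here is hypothetical):

```xml
<analyzer>
  <!-- Runs on the raw character stream before tokenization, so it would
       also apply to wildcard terms if CharFilters are used there -->
  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-lowercase.txt"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
```

The mapping file would contain lines such as `"A" => "a"` for each character to rewrite.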
On 06/15/2011 10:12 AM, Jamie Johnson wrote:
So simply lower-casing works
Hmmm, I admit I'm not using embedded, and I'm using 3.2, but I'm
not seeing the behavior you are.
My question about reindexing could have been better stated; I
was just making sure you didn't have some leftover cruft where
your field was multi-valued from previous experiments, but if
you're reinde
More hardware ...
Here's one scenario...
If you set up a master and two slaves, and then front the slaves
with a load balancer your system will be more robust.
In the event a slave goes down, all search requests will be handled
by the remaining slave while you create a new slave, have it replica
Yes, 100% CPU utilization will affect other processes, but
you've created an artificial situation with your load testing,
so I don't think it counts...
What kind of cpu utilization do you see when you simulate your
actual load rather than querying as fast as you can? That's a
more relevant number
Please review this page:
http://wiki.apache.org/solr/UsingMailingLists
You haven't stated what your problem is. Some
examples of what your inputs and desired outputs
are would be helpful
Meanwhile, see this page:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
but that's a wild gu
We rebuild the index from scratch each time we start (for now). The fields in
question are not multi-valued; in fact, I explicitly set multi-valued to false,
just to be sure.
Yes, this is SolrJ, using the embedded server, if that matters.
Using Solr/Lucene 3.1.0.
-Rich
-Original Message--
Jonathan:
Thanks for writing that up, you're right, it is arcane
I've starred this one!
Erick
>
> http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html
> http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/
>
> So to understand, first familiari
Did you perhaps change the schema but not re-index? I'm grasping
at straws here, but something like this might happen if part of
your index has that field as a multi-valued field
If that's not the problem, what version of Solr are you using? I
presume this is SolrJ?
Best
Erick
On Wed, Jun 15
I've found the problem in case someone is interested.
It's because of the indexReader.reopen(). If it is enabled, when opening a
new searcher due to the commit, this code is executed (in
SolrCore.getSearcher(boolean forceNew, boolean returnSearcher, final
Future[] waitSearcher)):
...
if
Hello,
Our development team is currently looking into migrating our search system
to Apache Solr, and we would greatly appreciate some advice on setup. We are
indexing approximately two hundred million database rows. We add about a
hundred thousand new rows throughout the day. These new database r
Try looking for the snapshot.current file in the logs folder in your SOLR home
dir on your slave server, and check whether it shows the older snapshot.
I also faced a similar issue (but with SOLR 1.2.1), using the
collection-distribution scripts.
The way I resolved it was:
1. Stopped the index replication script(
Next, however, I predict you're going to ask how you do a 'join' or
otherwise query across both these cores at once. You can't do
that in Solr.
On 6/15/2011 1:00 PM, Frank Wesemann wrote:
You'll configure multiple cores:
http://wiki.apache.org/solr/CoreAdmin
Hi.
How to have multiple
Hi Yonik,
Thanks for the prompt reply. This is a relief :)
Just one more question: wouldn't the 100% CPU load affect the system, as
system processes would starve for CPU?
I tried the load test first with 4 cores and then with 8 cores; still the CPU
usage was reaching 100%.
We have index of ab
On Wed, Jun 15, 2011 at 2:21 PM, pravesh wrote:
> I would need some help in minimizing the CPU load on the new system. Could
> NIOFSDirectory possibly be contributing to the high CPU?
Yes, it's a feature! The CPU is only higher because the threads
aren't blocked on IO as much.
So the increase in CPU you
Hi everybody, I am using TinyMCE to save the text I am indexing, but as you
know, the characters with accents get changed. Could anybody tell me how to
solve that problem? Are there any analyzers that recognize rich text?
I would appreciate your help.
Regards,
Ariel
Okay, I figured this one out -- I'm participating in a thread with
myself here, but for the benefit of posterity, or in case anyone's interested,
it's kind of interesting.
It's actually a variation of the known issue with dismax, mm, and fields
with varying stopwords. Actually a pretty tricky problem w
Hi - I am examining a SolrDocument I retrieved through a query. The field I am
looking at is declared this way in my schema:
I know multivalued defaults to false, but I set it explicitly because I'm
seeing some unexpected behavior. I retrieve the value of the field like so:
final String resou
Hi,
I'm planning to upgrade my system from SOLR 1.2.1 to SOLR 1.4.1.
We had done some Lucene-level optimizations on the SOLR slaves in the
earlier system (1.2.1), like:
1. removed the synchronized block from the SegmentReader class's
isDeleted() method
2. removed the synchronized block fro
Hi Roman,
do you have solved your problem and how?
Regards,
Kai Gülzau
> -Original Message-
> From: Roman Chyla [mailto:roman.ch...@gmail.com]
> Sent: Saturday, February 05, 2011 4:50 PM
> To: solr-user@lucene.apache.org
> Subject: Is there anything like MultiSearcher?
>
> Dear Sol
I am new to both Solr and Cell, so sorry if I am misusing some of the
terminology. The problem I am trying to solve is to index a PDF document
using Solr Cell, where I want to exclude part of it via XPath. I am using Solr
release 3.1. When researching the user list, I came across one entry o
Hi,
By in-memory, I mean you hold a list of users (+ some other parameters
like order number, expiry, what ever else you need) in one of those
Greek HashMaps, and use this list to determine what query
parameters/results will be processed for a given search request
(SOLR-1872 reads an acl file to p
Thanks, Peter.
I am not a Java programmer, and hence the code seems all Greek and Latin to
me. I do have basic knowledge, but all this Map, HashMap,
HashList, NamedList I don't understand.
However, I would like to implement the solution that you have mentioned, so
if you have any pointers for
Have you tried setting your default operator to AND
in schema.xml?
Best
Erick
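For reference, a sketch of that setting in schema.xml would be:

```xml
<!-- Require all query terms by default instead of OR-ing them -->
<solrQueryParser defaultOperator="AND"/>
```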
On Wed, Jun 15, 2011 at 12:36 PM, rajini maski wrote:
> ok. Thank you. I will consider this.
>
> One last doubt: how do I handle negation terms?
>
> In the above mail, as I mentioned, if I have 3 sentences like this:
>
>
You'll configure multiple cores:
http://wiki.apache.org/solr/CoreAdmin
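As a minimal sketch (core names and paths below are hypothetical), solr.xml defining two cores could look like:

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- Each core has its own schema.xml and solrconfig.xml under instanceDir/conf,
         so each index can have different fields and types -->
    <core name="products" instanceDir="products"/>
    <core name="articles" instanceDir="articles"/>
  </cores>
</solr>
```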
Hi.
How to have multiple indexes in SOLR, with different fields and
different types of data?
Thank you very much!
Bye.
--
mit freundlichem Gruß,
Frank Wesemann
Fotofinder GmbH USt-IdNr. DE812854514
Software En
Try to use multiple cores:
http://wiki.apache.org/solr/CoreAdmin
On Wed, Jun 15, 2011 at 5:55 PM, shacky wrote:
> Hi.
>
> How to have multiple indexes in SOLR, with different fields and
> different types of data?
>
> Thank you very much!
> Bye.
>
--
Edoardo Tosca
Sourcesense - making sense o
Hi.
How to have multiple indexes in SOLR, with different fields and
different types of data?
Thank you very much!
Bye.
>
> Is it possible to use the clustering component to use predefined clusters
> generated by Mahout?
Actually, the existing Solr ClusteringComponent's API has been designed to
deal with both search results clustering (implemented by Carrot2) and
off-line clustering of the whole index. The latter
On Wed, Jun 15, 2011 at 6:11 PM, Omri Cohen wrote:
> thanks for the quick response, though as I said in my original post:
>
> *"some one has any idea, how I solve this without changing at_location to
> multiField? "*
[...]
This requirement makes little sense on the face of it, and as far as I kno
ok. Thank you. I will consider this.
One last doubt: how do I handle negation terms?
In the above mail, as I mentioned, if I have 3 sentences like this:
1. tissue devitalization was observed in hepatocytes of liver
2. necrosis was observed in liver
3. Necrosis not found in liver
When I search "Ne
than
On Wed, Jun 15, 2011 at 9:42 PM, Erick Erickson wrote:
> Well, first it is usually unnecessary to specify the
> synonym filter both at index and query time, I'd apply
> it only at query time to start, then perhaps switch
> to index time, see the discussion at:
>
> http://wiki.apache.org/solr
I also thought of the LengthFilter stuff, provided it's a
text/KeywordTokenizer field:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory
cheers,
rob
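A sketch of that filter inside a field type's analyzer (the min/max values here are illustrative):

```xml
<analyzer>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <!-- Drop tokens shorter than 2 or longer than 100 characters -->
  <filter class="solr.LengthFilterFactory" min="2" max="100"/>
</analyzer>
```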
On Wed, Jun 15, 2011 at 12:00 PM, Erick Erickson
wrote:
> Have you tried setting 'facet.missing="false" '
I was hoping this wasn't the case :(
Is it possible to use the clustering component to use predefined
clusters generated by Mahout?
On 6/15/11 9:14 AM, Sean Owen wrote:
Hmm, I suppose I have the same question from the Mahout side (I didn't
write that text). I would certainly call this far mor
Well, first it is usually unnecessary to specify the
synonym filter both at index and query time, I'd apply
it only at query time to start, then perhaps switch
to index time, see the discussion at:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da7735
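A sketch of a query-time-only synonym setup in schema.xml (tokenizer and filter choices here are illustrative, not a prescription):

```xml
<fieldType name="text_syn" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Synonyms expanded at query time only -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```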
Have you tried setting 'facet.missing="false" '?
See:
http://wiki.apache.org/solr/SimpleFacetParameters#facet.missing
Best
Erick
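For reference, the parameter is passed per request (or set as a default in the request handler); a hedged example with a hypothetical field:

```
q=*:*&facet=true&facet.field=foo&facet.missing=false
```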
On Wed, Jun 15, 2011 at 11:52 AM, Adam Estrada
wrote:
> All,
>
> I have a field "foo" with several thousand blank or non-existing records in
> it. This is also my fac
All,
I have a field "foo" with several thousand blank or non-existent records in
it. This is also my faceting field. My question is: how can I deal with this
field so that I don't get a blank facet at query time?
5000
vs.
1000
Adam
The first question I have is whether you're sorting and/or
faceting on many unique string values? I'm guessing
that sometimes you are. So, some questions to help
pin it down:
1> what fields are you sorting on?
2> what fields are you faceting on?
3> how many unique terms in each (see the solr admin p
The only integration at this point (as far as I can tell) is that Mahout can
read the lucene index created by Solr. I agree that it would be nice to swap
out the Carrot2 clustering engine with Mahout's set of algorithms but that
has not been done yet. Grant has pointed out that you can use Solr's
c
I have some more info!
I've built another index bigger than the others, so the names of the files are
not the same. This way, if I move from any of the other indexes to the bigger
one or vice versa, it works (I can see the changes in the version, numDocs and
maxDocs)! So, I think it is related to the name of
Hi,
I just came across this:
If I abort an import via /dataimport/?command=abort the connections to
the (in my case) database stay open.
Shouldn't DocBuilder#rollback() call something like cleanup(), which in
turn tries to close EntityProcessors, DataSources etc.,
instead of relying on finalize
"Apache Mahout is a new Apache TLP project to create scalable, machine
learning algorithms under the Apache license. It is related to other
Apache Lucene projects and integrates well with Solr."
How does Mahout integrate well with Solr? Can someone give a brief
overview of what's available?
I don't know if this could have something to do with the problem, but some of
the files of the indexes have the same size and name (in all the indexes but not
in the empty one).
I have also realized that when moving back to the empty index and
committing, numDocs and maxDocs change. Once I'm with the empt
Hi all.
I'm starting to work with Solr.
I have a problem: how to index a single page of a multi-page document.
That is, I want to see in the search results the page number of the
multi-page document (e.g. PDF) where the search term appears.
I'm struggling with this problem for more than a month
and I have noth
So simply lower-casing works, but it can get complex. The query that I'm
executing may have things like ranges which require some words to be upper
case (i.e. TO). I think this would be much better solved on Solr's end; is
there a JIRA about this?
On Tue, Jun 14, 2011 at 5:33 PM, Mike Sokolov wr
Oops - sorry - missed that...
Well, the multiValued setting is explicitly to allow multiple values.
So - what's your actual use case - i.e. why do you want multiple values in a
field, but not want it to be multiValued? What's the problem you're trying to
solve here?
Bob Sandiford | Lead Softw
thanks for the quick response, though as I said in my original post:
*"some one has any idea, how I solve this without changing at_location to
multiField? "*
thank you very much though
*Omri Cohen*
Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 | +972-3-6036295
My profiles
Omri - you need to indicate to Solr that your at_location field can accept
multiple values. Add this to the field declaration:
multiValued="true"
See this reference for more information / options:
http://wiki.apache.org/solr/SchemaXml
Bob Sandiford | Lead Software Engineer | SirsiDyn
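A hypothetical schema.xml declaration with that attribute added (the field type shown is an assumption, since the original declaration was cut off):

```xml
<field name="at_location" type="text" indexed="true" stored="true"
       required="false" multiValued="true"/>
```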
Dear list,
after getting OOM exception after one week of operation with
solr 3.2 I used MemoryAnalyzer for the heapdumpfile.
It looks like the fieldCache eats up all memory.
Objects Shalow Heap
Retained Heap
org.apache.lucene.search.Fi
Tests are done on Solr 1.4.
The simplest way to reproduce my problem is having 2 indexes and a Solr box
with just one core. Both indexes must have been created with the same schema.
1- Remove the index dir of the core and start the server (core is up with an
empty index)
2- check status page of the co
First off, you didn't "violate group etiquette". In fact, yours was
one of the better first posts in terms of providing enough information
for us to actually help!
A very useful page is the admin/analysis page to see how the
analysis chain works. For instance, if you haven't changed the
field ty
Erick: I have tried what you said. I needed clarification on this; below is
my doubt:
Say If i have field type :
The data indexed in this field is :
sentence 1 : " tissue de
Hello everybody,
I have a problem with custom sorting in Solr. This problem (incorrect
sorting order) happens only when 2 or more shards are used in the Solr
configuration.
I did the following:
Extended TrieIntField just to override the comparator:
*public class OperatingStatusFieldType extends TrieIntF
Tomás, thanks for the answer. It was very helpful!
We are using your first option now with a workaround. :)
But initially we thought not to modify the user's request. So if the user
requests housenumber=14, we do not want to parse the request and add an
"OR 0". If there is no housenumber=14 solr s
Hi,
If you submit information to Solr using XML, does the server assume you're
using Unicode encoded in UTF-8? And does it accept the whole range of
possible characters in Unicode? (For example, characters that require
multiple bytes when encoded in UTF-8.)
I'm getting quite a few "Invalid UTF-8 m
String also does not seem to accept spaces. Currently the _id fields can
contain multiple ids (using it as a multiValued alternative). This is why I
used the text type.
On 15 June 2011 12:16, Judioo wrote:
> stored="true"/>
>
> so all attributes except 'id' are of type text.
>
> I didn't know t
What version of Solr is this?
Can you show steps to reproduce w/ the example server and data?
-Yonik
http://www.lucidimagination.com
On Wed, Jun 15, 2011 at 7:25 AM, Marc Sturlese wrote:
> Hey there,
> I've noticed a very odd behaviour with the snapinstaller and commit (using
> collectionDistri
Hello all,
in my schema.xml I have these fields:
.
I am trying to do the following:
I am getting the following exception:
ERROR: multiple values encountered for non multiValued copy field
at_location
Does anyone have an idea how I can solve this without changing at_location to
multiFie
Hey there,
I've noticed a very odd behaviour with the snapinstaller and commit (using
collectionDistribution scripts). The first time I install a new index
everything works fine. But when installing a new one, I can't see the new
documents. Checking the status page of the core tells me that the ind
so all attributes except 'id' are of type text.
I didn't know that about the string type. So is my problem as described
(that partial matches are contributing to the calculation), and does defining
the field type as string solve this problem?
Or is my understanding completely incorrect?
Th
> /solr/select/?q=b007vty6&defType=dismax&qf=id^10%20parent_id^9%20brand_container_id^8%20series_container_id^8%20subseries_container_id^8%20clip_container_id^1%20clip_episode_id^1&debugQuery=on&fl=id,parent_id,brand_container_id,series_container_id,subseries_container_id,clip_episode_id,clip_episo
Sub-entities can slow down indexing remarkably. What is that
datasource? A DB? Then try using CachedSqlEntityProcessor.
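As a sketch (table and column names below are made up), the sub-entity in data-config.xml might be declared like this:

```xml
<entity name="item" query="select id, name from item">
  <!-- Sub-entity is queried once and cached in memory,
       instead of running one SQL query per parent row -->
  <entity name="details" processor="CachedSqlEntityProcessor"
          query="select item_id, detail from item_details"
          where="item_id=item.id"/>
</entity>
```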
On Tue, Jun 14, 2011 at 8:31 PM, Mark wrote:
> Hello all,
>
> We are using DIH to index our data (~6M documents) and its taking an
> extremely long time (~24 hours). I am trying to
On Tue, Jun 14, 2011 at 8:31 PM, Mark wrote:
> Hello all,
>
> We are using DIH to index our data (~6M documents) and its taking an
> extremely long time (~24 hours). I am trying to find ways that we can speed
> this up. I've been reading through older posts and it's my understanding
> this should
Hello,
I use the SnowballPorterFilter (Dutch) to stem the words in my index, like
this:
restaurants => restaurant
restauranten => restaurant
apples => apple
Now I see on my Solr analysis page that this happens with mcdonald's:
mcdonald's => mcdonald'
I don't want stemming for apostrophes. Is
jFM sounds good... I have written a little stand-alone servlet that
reads in the XML/TXT Solr configuration files using Apache Commons IO,
outputs the current content for editing, and writes it back to the file
system, and after that I fire a cURL request from my (PHP based)
administration panel
Apologies
I have tried that method as well.
/solr/select/?q=b007vty6&defType=dismax&qf=id^10%20parent_id^9%20brand_container_id^8%20series_container_id^8%20subseries_container_id^8%20clip_container_id^1%20clip_episode_id^1&debugQuery=on&fl=id,parent_id,brand_container_id,series_container_id,subser
> I have 2 document types but want to return any documents
> where the requested
> ID appears. The ID appears in multiple attributes but I
> want to boost
> results based on which attribute contains the ID.
>
> so my query is
>
> q="id:b007vty6 parent_id:b007vty6
> brand_container_id:b007vty6
> s
Hi
I'm confused about exactly how boosts to relevancy scores work.
Apologies if I am violating this group's etiquette, but I could not find
Solr's pastebin anywhere.
I have 2 document types but want to return any documents where the requested
ID appears. The ID appears in multiple attributes but I w
Though not with WebDAV (which is underspecified for my taste and seems only to
work with common implementations such as mod_dav), I had success with jFM (I
used version 0.95):
http://java.net/projects/jfm
maybe that helps?
paul
On 15 June 2011 at 09:55, Erik Hatcher wrote:
> Rat
Rather than reinventing wheels here, I think that fronting the conf/ directory
with a WebDAV server would be a great way to go. I'm not familiar with the
state-of-the-art of WebDAV servers these days but there might be something
pretty trivial that can be configured in Tomcat to do this? Or ma
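Tomcat does ship a WebdavServlet; a hedged web.xml sketch (the URL pattern and the idea of pointing it at conf/ are illustrative assumptions):

```xml
<servlet>
  <servlet-name>webdav</servlet-name>
  <servlet-class>org.apache.catalina.servlets.WebdavServlet</servlet-class>
  <init-param>
    <!-- Allow writes so the files can be edited remotely -->
    <param-name>readonly</param-name>
    <param-value>false</param-value>
  </init-param>
</servlet>
<servlet-mapping>
  <servlet-name>webdav</servlet-name>
  <url-pattern>/conf/*</url-pattern>
</servlet-mapping>
```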