Hi,
Please comment on whether I should consider moving back to the old LogByteSize
merge policy when moving from 1.3 to 3.6.1, as I see improvements in query
performance after optimization.
Just to mention, we have a lot of indexes in multiple cores as well as multiple
webapps, and that's the reason we went for CFS in
Any comments on this?
On Mon, Sep 24, 2012 at 10:28 PM, Sujatha Arun wrote:
> Thanks Jack.
>
> so Qtime = Sum of all prepare components + sum of all process components -
> Debug comp process/prepare time
>
> In 3.6.1 the process part of Query component for the following query seems
> to take
Hi,
I'm new to UIMA. Solr does not have a lemmatization component, and I was
thinking of using UIMA for this.
Is this the correct choice, and if so, how would I go about it? Any ideas?
I see a couple of links for Solr-UIMA integration but don't know how that can
be used for lemmatization.
Any thoughts?
--
Hi Mark,
If this can be supported in the future, I think that's great. It's a really useful feature.
For example, a user could use it to refresh with a totally new core: build the
index on one core, and after the build is done, swap the old core and the new
core to get a totally new core for search.
It could also be used for backups. I
Dear all,
The company I'm working at has a website serving more than 10
customers, and every customer should have its own search category. So I
should create an independent index for every customer.
The site http://wiki.apache.org/solr/MultipleIndexes gives some solutions to
create m
Hi Otis,
I was just looking at how to implement that, but was hoping for a
cleaner method - it seems like I will have to actually parse the error
as text to find the field that caused it, then remove/mangle that
field and attempt re-adding the document - which seems less than
ideal.
I would think
Hi Aaron,
You could catch the error on the client, fix/clean/remove, and retry, no?
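That catch/clean/retry idea could be sketched like this in plain Java (the field list, method name, and the document-as-a-Map representation are all assumptions for illustration; a real client would wrap a SolrJ add() call and retry with the cleaned document after a format exception):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DocCleaner {
    // Hypothetical list of fields the schema declares as float
    static final List<String> FLOAT_FIELDS = List.of("price", "weight");

    // Return a copy of the document with unparseable float fields removed,
    // so a retry after a format exception has a chance of succeeding.
    static Map<String, Object> cleanFloatFields(Map<String, Object> doc) {
        Map<String, Object> cleaned = new LinkedHashMap<>(doc);
        for (String field : FLOAT_FIELDS) {
            Object v = cleaned.get(field);
            if (v instanceof String) {
                try {
                    Float.parseFloat(((String) v).trim());
                } catch (NumberFormatException e) {
                    cleaned.remove(field); // empty or malformed: drop it
                }
            }
        }
        return cleaned;
    }
}
```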
Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html
On Mon, Sep 24, 2012 at 9:21 PM, Aaron Daubman wrote:
> Greetings,
>
> Is t
Greetings,
Is there a way to configure more graceful handling of field formatting
exceptions when indexing documents?
Currently, there is a field being generated in some documents that I
am indexing that is supposed to be a float but sometimes slips
through as an empty string. (I know, fix the d
I have seen this happen.
We retry and that works. Is your Solr server stalled?
On Mon, Sep 24, 2012 at 4:50 PM, balaji.gandhi
wrote:
> Hi,
>
> I am encountering this error randomly (under load) when posting to Solr
> using SolrJ.
>
> Has anyone encountered a similar error?
>
> org.apache.solr.
: We've been struggling with solr hangs in the solr process that indexes
: incoming PDF documents. TLDR; summary is that I'm thinking that
: PDFBox needs to have COSName.clearResources() called on it if the solr
: indexer expects to be able to keep running indefinitely. Is that
I don't know muc
Hello,
Is there a way to provide multiple facet field names in the Admin UI?
I have tried spaces, commas and semi-colons to no effect. Would have
been nice to be able to push the UI just a tiny bit further before
switching to the URL query string directly.
Or is single facet field a limitation of
On 9/24/2012 11:37 AM, Daisy wrote:
One thing I would like to know: what is the difference between
PatternReplaceFilter and PatternReplaceCharFilter?
The CharFilter version gets applied before anything else, including the
Tokenizer. The Filter version gets applied in the order specified in
the
If you're concerned about throughput, consider moving all the
SolrCell (Tika) processing off the server. SolrCell is way cool
for showing what can be done, but its downside is you're
moving all the processing of the structured documents to the
same machine doing the indexing. Pretty soon, especiall
Be a little careful, spaces here can mess you up. Particularly
around the hyphen in -1hour. I.e. NOW -1HOUR is invalid but
NOW-1HOUR is ok (note the space between W and -). There aren't
any in your example, but just to be sure
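For example (the field name `timestamp` is assumed):

```
fq=timestamp:[NOW-1HOUR TO NOW]      valid
fq=timestamp:[NOW -1HOUR TO NOW]     invalid: space before -1HOUR
```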
One other note: you may get better performance out of making this
: database. I've got the admin page up, but I can't get
: localhost:8080/solr/dataimport/ to work. It returns a 404 error.
1) which version of solr are you using?
2) did you try localhost:8080/solr/dataimport (no trailing slash) ?
3) does anything in the admin UI work?
-Hoss
Hello, John,
Assuming this is a single core instance of Solr, does
"/solr/admin/dataimport.jsp" work?
Michael Della Bitta
Appinions
18 East 41st Street, 2nd Floor | New York, NY 10017-6271
www.appinions.com
Where Influence Isn’t a Game
On Mon,
I've been trying to set up Solr with Tomcat, in order to connect to a MySQL
database. I've got the admin page up, but I can't get
localhost:8080/solr/dataimport/ to work. It returns a 404 error.
I've been googling high and low, without finding the answer.
I've put this in my solrconfig.xml
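(The handler definition was stripped from the archive; a typical DataImportHandler registration in solrconfig.xml looks roughly like this, with `data-config.xml` as an assumed file name:)

```xml
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```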
Please see this post here:
http://stackoverflow.com/questions/12324837/apache-cassandra-integration-with-apache-solr/12326329#comment16936430_12326329
Does anyone have experience with or know if it's possible with the Solr
data-config combined with Cassandra JDBC drivers
(http://code.google.com/a/
https://issues.apache.org/jira/browse/SOLR-3883
-Yonik
http://lucidworks.com
On Mon, Sep 24, 2012 at 11:42 AM, Yonik Seeley wrote:
> On Mon, Sep 24, 2012 at 11:03 AM, dan sutton wrote:
>> Hi,
>>
>> This appears to happen in trunk too.
>>
>> It appears that the add command request parameters ge
My problem was that I specified the per-field similarity class INSIDE
the analyzer instead of outside it.
On 09/24/2012 02:56 PM, Carrie Coy wrote:
I'm trying to configure per-field similarity to disregard term
frequency (omitTf) in a 'title' field. I'm trying to follow the
example docs
Hi,
We are working on a DIH for our project and we are persisting the
last_modified_date in the ZooKeeper directory. Our understanding is that the
properties are uploaded to ZooKeeper when the first SOLR node comes up. When
the SOLR nodes are restarted whatever is persisted in the properties is
lo
We've been struggling with solr hangs in the solr process that indexes
incoming PDF documents. TLDR; summary is that I'm thinking that
PDFBox needs to have COSName.clearResources() called on it if the solr
indexer expects to be able to keep running indefinitely. Is that
likely? Is there anybody
I'm trying to configure per-field similarity to disregard term frequency
(omitTf) in a 'title' field. I'm trying to follow the example docs
without success: my custom similarity doesn't seem to have any effect on
'tf'. Is the NoTfSimilarity function below written correctly? Any
advice is
Using "solr.LengthFilterFactory" was great and also solved the problem of
using PatternReplaceFilter. So now I have two solutions. Thanks, all, for
helping me. One thing I would like to know: what is the difference between
PatternReplaceFilter and PatternReplaceCharFilter?
--
View this message in con
I've had problems with empty tokens. You can remove those with this as a step
in the analyzer chain.
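(The analyzer step was stripped from the archive; the suggestion most likely looks something like this, appended as the last step of the analysis chain — the `max` value is an arbitrary example:)

```xml
<!-- drop zero-length tokens produced by earlier filters -->
<filter class="solr.LengthFilterFactory" min="1" max="512"/>
```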
wunder
On Sep 24, 2012, at 10:07 AM, Jack Krupansky wrote:
> I tried it and PRFF is indeed generating an empty token. I don't know how
> Lucene will index or query an empty term. I me
When I do things like this and want to avoid empty tokens even though
previous analysis might result in some--I just throw one of these at the
end of my analysis chain:
A charfilter to filter raw characters can certainly still result in an
empty token, if an initial token wa
Thanks. Finally it works using
I wonder what the reason for that is, and what the difference is between the
filter and the charFilter?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-Remove-specific-punctuation-marks-tp4009795p4009918.html
Sent from the Solr - User
How could I know which query parser I am using?
Here is the part of my schema that I am using
As shown, even when I tried to remove "(", the same thing happened for the
parsed query and for numFound.
--
View this message in context:
htt
I tried it and PRFF is indeed generating an empty token. I don't know how
Lucene will index or query an empty term. I mean, what it "should" do. In
any case, it is best to avoid them.
You should be using a "charFilter" to simply filter raw characters before
tokenizing. So, try:
It has the
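(The suggested configuration was stripped by the archive; based on the rest of the thread, it was most likely a PatternReplaceCharFilterFactory along these lines, with the regex's `&&` escaped as `&amp;&amp;` for XML — the exact pattern is an assumption:)

```xml
<!-- strip punctuation from the raw character stream, before tokenizing;
     the character class subtracts ':' and '.' from \p{Punct} -->
<charFilter class="solr.PatternReplaceCharFilterFactory"
            pattern="[\p{Punct}&amp;&amp;[^:.]]" replacement=""/>
```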
Thanks Jack.
So QTime = sum of all prepare components + sum of all process components -
debug component prepare/process time.
In 3.6.1 the process part of the QueryComponent for the following query seems
to take 8 times more time. Is anything missing? For most queries the process
part of the QueryComponent
Never fails. Take the time to post this message, only to discover the answer
on my own a few minutes later.
The solution is to surround the -Durl value in double quotes. For example:
java
-Durl="http://localhost:8983/solr/contacts/update/csv?f.address.split=true&f.address.separator=%7C";
-Dtype
1. Which query parser are you using?
2. I see the following comment in the Java 6 doc for regex "\p{Punct}":
"POSIX character classes (US-ASCII only)", so if any of the punctuation is
some higher Unicode character code, it won't be matched/removed.
3. It seems very odd that the parsed query has
Could you supply some sample user queries and some sample data the queries
should match? In other words, how do your users expect to "view" the data?
If you are simply trying to replicate full SQL queries in Solr, you're
probably going to be disappointed, but if you look at what queries your
users
I'd like to wrap my head around how faceting in SolrCloud works, does
Solr ask each shard for their maximum value and then use that to
determine what else should be asked for from other shards, or does it
ask for all values and do the aggregation on the requesting server?
On Mon, Sep 24, 2012 at 11:03 AM, dan sutton wrote:
> Hi,
>
> This appears to happen in trunk too.
>
> It appears that the add command request parameters get sent to the
> nodes. If I comment these out like so for add and commit:
>
> core/src/java/org/apache/solr/update/processor/DistributedUpdate
I'm not sure if this will be relevant for you, but this is roughly what I do.
Apologies if it's too basic.
I have a complex view that normalizes all the data that I need to be
together -- from over a dozen different tables. For one to many and many to
many relationships, I have sql turn the data
Hi,
This appears to happen in trunk too.
It appears that the add command request parameters get sent to the
nodes. If I comment these out like so for add and commit:
core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java
- params = new ModifiableSolrParams(req.getPa
Hi Erick,
Thanks for your reply. Yes, I am using delete-by-query. I am currently
logging the number of items to be deleted before handing off to Solr, and
from the Solr logs I can see it deleted exactly that number. I will verify further.
Thanks.
On Mon, Sep 24, 2012 at 1:21 PM, Erick Erickson wrote:
>
Right - we need logs, admin->cloud dump to clipboard info, anything
else to go on.
On Mon, Sep 24, 2012 at 4:36 AM, Sami Siren wrote:
> hi,
>
> Can you share a little bit more about your configuration: how many
> shards, # of replicas, how does your clusterstate.json look like,
> anything suspici
Run a query on both old and new with &debugQuery=true on your query request and
look at the component timings for possible insight.
-- Jack Krupansky
From: Sujatha Arun
Sent: Monday, September 24, 2012 7:26 AM
To: solr-user@lucene.apache.org
Subject: Performance Degradation on Migrating from 1
Hi,
I'm currently experimenting with Solr Cell to index files into Solr. During
this, some questions came up.
1. Is it possible (and wise) to connect to Solr Cell with multiple threads
at the same time, to index several documents at the same time?
This question came up because my program takes abo
autoCommit (hard commit) is basically just to reduce how much RAM is
needed for the transaction log. You should generally use it with
openSearcher=false and don't need to use it for visibility.
It's also not required for durability due to the transaction log.
Soft commit should be used for visibi
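A solrconfig.xml sketch of that split (the interval values are arbitrary examples, not recommendations):

```xml
<autoCommit>
  <maxTime>600000</maxTime>          <!-- hard commit every 10 min: truncates the tlog -->
  <openSearcher>false</openSearcher> <!-- do not open a new searcher on hard commit -->
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime>            <!-- soft commit every 5 s: controls visibility -->
</autoSoftCommit>
```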
And QTime doesn't include the time spent in the container (e.g., Tomcat or
Jetty) or network latency. Usually a query benchmark would be from the time
the client sent the query request until the time the client received the
query results.
The debug timing will help you understand which Solr co
On Mon, Sep 24, 2012 at 9:21 AM, Radim Kolar wrote:
> and what about solr.NRTCachingDirectoryFactory? Is solr.MMapDirectoryFactory
> faster if there is no NRT search requirements?
NRTCachingDirectoryFactory is a wrapping directory - it's generally
going to use solr.MMapDirectoryFactory as it's d
That looks like a valid Solr date math expression, but you need to make sure
that the field type is actually a Solr "DateField" as opposed to simply an
integer Unix time value.
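(A schema.xml sketch of that distinction; the type and field names are assumptions. In the 3.x line the date type would be solr.DateField or solr.TrieDateField, as opposed to an integer field holding a Unix timestamp:)

```xml
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6"/>
<field name="timestamp" type="tdate" indexed="true" stored="true"/>
```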
-- Jack Krupansky
-Original Message-
From: Christian Bordis
Sent: Monday, September 24, 2012 7:16 AM
To: so
On Mon, Sep 24, 2012 at 9:47 AM, Mikhail Khludnev
wrote:
> Hi
> It seems like highlighting feature.
Thank you Mikhail. I actually do need the entire matched single entry,
not a snippet of it. Looking at the example in the OP, with
highlighting on "gold" I would get
glitters is gold
Whereas I ne
I tried & and it solved the 500 error code. But it could still find
punctuation marks.
Although the parsed query didn't contain the punctuation mark,
"{"
"{"
text:
text:
numFound still gives 1,
and the highlighting shows the result for the punctuation mark
{
The steps I did:
1- editing the sc
On 24 Sep 2012 14:05, Erick Erickson wrote:
I'm pretty sure all you need to do is disable autoSoftCommit. Or rather
don't un-comment it from solrconfig.xml
and what about solr.NRTCachingDirectoryFactory? Is
solr.MMapDirectoryFactory faster if there are no NRT search requirements?
On Mon, Sep 24, 2012 at 2:16 PM, Erick Erickson wrote:
> Hmmm, works for me. What is your entire response packet?
>
> And you've covered the bases with indexed and stored so this
> seems like it _should_ work.
>
I'm sorry, reducing the output to rows=1 helped me notice that the
highlighted sectio
-Original message-
> From:Daisy
> Sent: Mon 24-Sep-2012 15:09
> To: solr-user@lucene.apache.org
> Subject: RE: Solr - Remove specific punctuation marks
>
> Yes I am trying to index Arabic document. There is a problem that the &&
> regex couldn't be understood in the solr schema and
Yes, I am trying to index an Arabic document. There is a problem: the &&
regex couldn't be understood in the Solr schema, and it gives a 500 error
code.
Here is an example:
input:
هذا مثال: للتوضيح (مثال علي علامات الترقيم) انتهي.
I tried also the regex: pattern="([\(\)\}\{\,[^.:\s+\S+]])"
but I
Hi Daisy,
I can't see anything wrong with the regex or the XML syntax.
One possibility: if it's Arabic you're matching against, you may want to add
ARABIC FULL STOP U+06D4 to the set you subtract from \p{Punct}.
If you give an example of your input and your expected output, I might be able
to
I am using Google for location input.
*It often splits out something like this:*
Shorewood, Seattle, Wa
*Since I am using this index analyzer:*
It means that if I search for "Sho" or "Shorew" I get the result I want.
However, if I search for “Sea” or “Seatt” I get no results.
I guess I need
Indexing stops after 'x' documents.
I am using Bitnami and had upgraded the MySQL server from MySQL 5.1.* to MySQL
5.5.*. After the upgrade, when I ran indexing on Solr, it did not get
indexed.
I am using a procedure in which I find the parent of a child and
insert it in a
How do you delete items? By ID or by query?
My guess is that one of two things is happening:
1> your delete process is deleting too much data.
2> your index process isn't indexing what you think.
I'd add some logging to the SolrJ program to see what
it thinks it has deleted from or added to the index
Hmmm, works for me. What is your entire response packet?
And you've covered the bases with indexed and stored so this
seems like it _should_ work.
Best
Erick
On Mon, Sep 24, 2012 at 6:12 AM, Dotan Cohen wrote:
>> > indexed="true"
>> multiValued="true" />
>>
>> d
I'm pretty sure all you need to do is disable autoSoftCommit. Or rather
don't un-comment it from solrconfig.xml
Best
Erick
On Mon, Sep 24, 2012 at 5:44 AM, Radim Kolar wrote:
> Is it possible to use SolrCloud but without the real-time features? In my
> application I do not need realtime features a
NP, good luck!
On Sun, Sep 23, 2012 at 3:41 PM, wrote:
> Hello Erick,
>
> Thanks a lot for your reply! Your suggestion is actually exactly the
> alternative solution we are thinking about and with your clarification on
> Solr's performance we are going to go for it! Many thanks again!
>
> Mile
Hi,
On migrating from 1.3 to 3.6.1, I see query performance degrading by
nearly 2 times for all types of queries. Indexing performance shows a slight
degradation over 1.3. For indexing we use our custom scripts that post XML
over HTTP.
Is there anything I might have missed? I am thinking that this migh
Hi Everyone!
We are doing some nice stuff with Chef (http://wiki.opscode.com/display/chef/Home).
It uses Solr for search, but range queries don't work as expected. Maybe Chef
or Solr is just buggy, or I am doing it wrong ;-)
In Chef I have a bunch of nodes with a timestamp attribute. Now I want to search for nodes
Hi
On my Windows workstation I have tried to index a document into a
SolrCloud instance with the following "special" configuration:
120
60
...
${solr.data.dir:}
That is commit every 20 minutes and soft commit every 10 minutes.
Rig
Hi,
I am running Solr 3.5, using SolrJ with StreamingUpdateSolrServer to
index and delete items from Solr.
I basically index items from the DB into Solr every night. Existing items
can be marked for deletion in the DB, and a delete request is then sent to
Solr to delete such items.
My process runs a
> indexed="true"
> multiValued="true" />
>
> doctest
Note that in anonymizing the information, I introduced a typo. The
above "doctest" should be "doctext". In any case, the field names in
the production application and in production schema do in fact match!
--
It seems my clusterstate.json is still old. Is there a method to recreate it
without taking all nodes down at the same time?
-Original message-
> From:Markus Jelsma
> Sent: Thu 20-Sep-2012 10:14
> To: solr-user@lucene.apache.org
> Subject: RE: Nodes cannot recover and become unavail
Is it possible to use SolrCloud but without the real-time features? In my
application I do not need realtime features, and old-style processing
should be more efficient.
Hi;
I am working with apache-solr-3.6.0 on a Windows machine. I would like to
remove all punctuation marks before indexing, except the colon and the
full stop.
I tried:
But it didn't work. Any Ideas?
--
View this message in context:
http://lucene.472066.n3
hi,
Can you share a little bit more about your configuration: how many
shards, # of replicas, how does your clusterstate.json look like,
anything suspicious in the logs?
--
Sami Siren
On Mon, Sep 24, 2012 at 11:13 AM, Daniel Brügge
wrote:
> Hi,
>
> I am running Solrcloud 4.0-BETA and during th
Hi,
I am running SolrCloud 4.0-BETA, and during the weekend it 'crashed' somehow,
so that it wasn't reachable. CPU load was at 100%.
After a restart I couldn't access the data; it just told me:
"no servers hosting shard"
Is there a way to get the data back?
Thanks & regards
Daniel
Hi
It seems like highlighting feature.
On 24.09.2012 at 0:51, user "Dotan Cohen" wrote:
> Assuming a multivalued, stored and indexed field with name "comment".
> When performing a search, I would like to return only the values of
> "comment" which contain the match. For example:
>
> When searc