Hi:
I would start by looking at:
http://docs.lucidworks.com/display/solr/The+Standard+Query+Parser
and at
org.apache.lucene.queryparser.flexible.standard.StandardQueryParser.java
Hope it helps.
On Thu, Sep 5, 2013 at 11:30 PM, Scott Schneider <
scott_schnei...@symantec.com> wrote:
> Hello,
>
> I'm
the input string is a normal html page with the word Zahlungsverkehr in it and
my query is ...solr/collection1/select?q=*
On 5. Sep 2013, at 9:57 PM, Jack Krupansky wrote:
> And show us an input string and a query that fail.
>
> -- Jack Krupansky
>
> -Original Message- From: Shawn Heis
I will try this, thanks.
The replication handler's backup command was built for pre-SolrCloud.
It takes a snapshot of the index but it is unaware of the transaction
log which is a key component in SolrCloud. Hence unless you stop
updates, commit your changes and then take a backup, you will likely
miss some updates.
That
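For example, the safe sequence described above would look something like this (the core name and backup location here are assumptions, not from the original message):

# 1. stop sending updates, then force a hard commit
curl 'http://localhost:8983/solr/collection1/update?commit=true'
# 2. take the snapshot via the replication handler
curl 'http://localhost:8983/solr/collection1/replication?command=backup&location=/backups'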
Yes, if a document with the same key exists, then the old document
will be deleted and replaced with the new document. You can also
partially update documents (we call it atomic updates), which reads the
old document from the local index, updates it according to the request, and
then replaces the old doc
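A minimal sketch of such an atomic update over HTTP (the id and field names here are made up; this needs the updateLog enabled and all fields stored):

curl 'http://localhost:8983/solr/collection1/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"doc1","price":{"set":99.0},"tags":{"add":"sale"}}]'

Any stored field not mentioned in the request keeps its old value.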
Stats Component can give you a count of non-null values in a field.
See https://cwiki.apache.org/confluence/display/solr/The+Stats+Component
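For example (field name assumed), the "count" entry in the stats section is the number of non-null values:

http://localhost:8983/solr/collection1/select?q=*:*&rows=0&stats=true&stats.field=myfield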
On Fri, Sep 6, 2013 at 12:28 AM, Steven Bower wrote:
> Is there a way to get the count of buckets (ie unique values) for a field
> facet? the rudimentary a
Can you give exact steps to reproduce this problem?
Also, are you sure you supplied numShards=4 while creating the collection?
On Fri, Sep 6, 2013 at 12:20 AM, mike st. john wrote:
> using solr 4.4, I used the collection admin to create a collection with 4 shards
> and a replication factor of 1
>
> i did t
Hi
I am currently using Solr 3.5.0, having indexed a Wikipedia dump (50 GB), with
java 1.6. I am searching for tweets in Solr. Currently it takes an average
of 210 milliseconds for each post, out of which 200 milliseconds are consumed
by the Solr server (QTime). I used the JConsole monitor tool. The report a
Hello!
I remember some time ago people were interested in how Solr instances can
be monitored with graphite. This blog post gives a hands-on example from my
experience of monitoring RAM usage of Solr.
http://dmitrykan.blogspot.fi/2013/09/monitoring-solr-with-graphite-and-carbon.html
Please note,
My guess is that you have a single request handler defined with all
your language specific spell check components. This is why you see
spellcheck values from all spellcheckers.
If the above is true, then I don't think there is a way to choose one
specific spellchecker component. The alternative is
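One possible alternative, sketched here as an assumption rather than taken from the original message: give each language its own request handler with a single spellcheck dictionary, e.g. in solrconfig.xml (handler and dictionary names made up):

<requestHandler name="/select_en" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">spellchecker_en</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>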
Hi
I am currently using Solr 3.5.0, having indexed a Wikipedia dump (50 GB) with
java 1.6.
I am searching Solr with text (which is actually Twitter tweets).
Currently it takes an average of 210 milliseconds for each post, out of
which 200 milliseconds are consumed by the Solr server (QTime). I used th
Comments inline:
On Wed, Sep 4, 2013 at 10:38 PM, Lisandro Montaño
wrote:
> Hi all,
>
>
>
> I'm currently working on deploying a SolrCloud distribution on CentOS
> machines and wanted to have more guidance about Replication Factor
> configuration.
>
>
>
> I have configured two servers with solrcl
On Wed, Sep 4, 2013 at 4:56 PM, sebastian.manolescu
wrote:
> I want to implement the spell check functionality offered by Solr using a MySQL
> database, but I don't understand how.
> Here the basic flow of what I want to do.
>
> I have a simple inputText (in jsf) and if I type the word shwo the response
>
Understood, what I need is a count of the unique values in a field and that
field is multi-valued (which makes stats component a non-option)
On Fri, Sep 6, 2013 at 4:22 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:
> Stats Component can give you a count of non-null values in a field
Hi, I am new to Solr. I am looking for an option to restrict duplicate file
indexing in Solr. Please let me know if it can be done with any configuration
change.
hi All, I'm trying to store a 2-dimensional array in SOLR [version 4.0].
Basically I have the following data:
[[20121108, 1],[20121110, 7],[2012, 2],[20121112, 2]] ...
The inner array being used to keep some count say X for that particular day.
Currently, I'm using the following field to stor
Hi,
But I'm indexing RSS feeds. I want Solr to index them without changing the
existing information of a document with the same uniqueKey.
The best approach would be for Solr to update the doc if changes are detected, but I
can live without that.
I really would like Solr not to update the do
Hi,
We have implemented the Auto Complete feature on our site. Below are the Solr
config details.
schema.xml
First you need to tell us how you wish to use and query the data. That will
largely determine how the data must be stored. Give us a few example queries
of how you would like your application to be able to access the data.
Note that Lucene has only simple multivalued fields - no structure or
n
Explain what you mean by restricting duplicate file indexing. Solr doesn't work
at the "file" level - only documents (rows or records) and fields and
values.
-- Jack Krupansky
-Original Message-
From: shabbir
Sent: Friday, September 06, 2013 12:24 AM
To: solr-user@lucene.apache.org
Subj
Is there any chance that you changed your schema since you indexed the
data? If so, re-index the data.
If a "*" query finds nothing, that implies that the default field is empty.
Are you sure the "df" parameter is set to the field containing your data?
Show us your request handler definition
I've managed to get it working if I use the RegexTransformer and the string is on
the same line in my Tika entity. But when the string spans multiple lines it isn't
working, even though I tried (?s) to set the DOTALL flag.
Then I tried it like this and I get a StackOverflowError
Have you checked the hit ratio of the different caches? Try to tune them to get
rid of all evictions if possible.
Tuning the size of the caches and warming your searcher can give you a pretty
good improvement. You might want to check your analysis chain as well to see if
you're not doing anythin
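The cache sizes live in solrconfig.xml; a sketch with illustrative values (tune them against the hit ratio and evictions shown on the admin stats page):

<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="128"/>
<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="32"/>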
It's always frustrating when someone replies with "Why not do it
a completely different way?". But I will anyway :).
There's no requirement at all that you send things to Solr to make
Solr Cell (aka Tika) do its tricks. Since you're already in SolrJ
anyway, why not just parse on the client? This
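A rough sketch of that client-side approach (URL, file name, and field names are made up; SolrJ 4.x and Tika assumed on the classpath):

import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class ClientSideExtract {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    AutoDetectParser parser = new AutoDetectParser();
    BodyContentHandler text = new BodyContentHandler(-1); // -1 = no write limit
    Metadata metadata = new Metadata();
    try (InputStream in = new FileInputStream("some.pdf")) {
      parser.parse(in, text, metadata); // Tika detects the type and extracts the body text
    }
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc1");
    doc.addField("text", text.toString());
    server.add(doc);
    server.commit();
    server.shutdown();
  }
}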
Hi guyz,
Just a quick question:
I have a field that has CSV values in the database. So I will use the
DataImportHandler and will index it using RegexTransformer's splitBy
attribute. However, since this is the first time I am doing it, I just
wanted to be sure if it will work for Facet Count?
For
Thanks for clearing that up Erick. The updateLog XML element isn't present in
any of the solrconfig.xml files, so I don't believe this is enabled.
I posted the directory listing of all of the core data directories in a prior
post, but there are no files/folders found that contain "tlog" in th
Thanks!
-Original message-
> From:Erick Erickson
> Sent: Friday 6th September 2013 16:20
> To: solr-user@lucene.apache.org
> Subject: Re: SolrCloud 4.x hangs under high update volume
>
> Markus:
>
> See: https://issues.apache.org/jira/browse/SOLR-5216
>
>
> On Wed, Sep 4, 2013 at 11:
I don't know that it's too bad though - it's always been the case that if you do
a backup while indexing, it's just going to get up to the last hard commit.
With SolrCloud that will still be the case. So just make sure you do a hard
commit right before taking the backup - yes, it might miss a few
Phone typing. The end should not say "don't hard commit" - it should say "do a
hard commit and take a snapshot".
Mark
Sent from my iPhone
On Sep 6, 2013, at 7:26 AM, Mark Miller wrote:
> I don't know that it's too bad though - it's always been the case that if you
> do a backup while indexin
Yah, you're getting away with it due to the small data size. As
your data grows, the underlying mechanisms have to enumerate
every term in the field in order to find terms that match so it
can get _very_ expensive with large data sets.
Best to bite the bullet early or, better yet, see if you reall
Hi, thanks for the quick reply. Sure, please find below the details as per your
query.
Essentially, I want to retrieve the doc through JSON [using JSON format as the SOLR
result output] and want JSON to pick the data from the dataX field as a two-
dimensional array of ints. When I store the data as
Whoa! You should _not_ be using replication with SolrCloud. You can use
replication just fine with 4.4, just like you would have in 3.x say, but in
that case you should not be using the zkHost or zkRun parameters, should not
have a ZooKeeper ensemble running etc.
In SolrCloud, all updates are rout
Thank you Jack for the suggestion.
We can try grouping by site. But considering that the number of sites is only
about 1000 against an index size of 5 million, one can expect most of the
hits would be hidden, and for certain specific keywords only a handful of
actual results could be displayed if result
Markus:
See: https://issues.apache.org/jira/browse/SOLR-5216
On Wed, Sep 4, 2013 at 11:04 AM, Markus Jelsma
wrote:
> Hi Mark,
>
> Got an issue to watch?
>
> Thanks,
> Markus
>
> -Original message-
> > From:Mark Miller
> > Sent: Wednesday 4th September 2013 16:55
> > To: solr-user@lucen
bq: I'm actually not using the transaction log (or the
NRTCachingDirectoryFactory); it's currently set up to use the
MMapDirectoryFactory,
This isn't relevant to whether you're using the update log or not, this is
just how the index is handled. Look for something in your solrconfig.xml
like:
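(the stock 4.x snippet, inside the <updateHandler> section, looks like this):

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>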
Facet counts are per field - your counts are scattered across different
fields.
There are additional capabilities in the facet component, but first you
should describe exactly what your requirements are.
-- Jack Krupansky
-Original Message-
From: Raheel Hasan
Sent: Friday, September
On 9/6/2013 7:09 AM, Andreas Owen wrote:
> I've managed to get it working if I use the RegexTransformer and the string is on
> the same line in my Tika entity. But when the string spans multiple lines it isn't
> working, even though I tried (?s) to set the DOTALL flag.
>
> dataSource="dataUrl" onError="skip"
You still haven't supplied any queries.
If all you really need is the JSON as a blob, simply store it as a string
and parse the JSON in your application layer.
-- Jack Krupansky
-Original Message-
From: A Geek
Sent: Friday, September 06, 2013 10:30 AM
To: solr user
Subject: RE: Stor
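If you go the string-blob route Jack describes above, the schema entry is just a plain string field (the field name here is made up):

<field name="payload" type="string" indexed="false" stored="true"/>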
On 9/6/2013 2:54 AM, prabu palanisamy wrote:
> I am currently using solr -3.5.0, indexed wikipedia dump (50 gb) with
> java 1.6.
> I am searching the solr with text (which is actually twitter tweets) .
> Currently it takes average time of 210 millisecond for each post, out of
> which 200 millisec
OK, I have HTML pages with the content I
want. I want to extract (index, store) only what is
between the body comments. I thought RegexTransformer would be the best because
XPath doesn't work in Tika and I can't nest an XPathEntityProcessor to use XPath.
What I have also found out is that t
Has anyone ever hit this when adding documents to SOLR? What does it mean?
ERROR [http-8983-6] 2013-09-06 10:09:32,700 SolrException.java (line 108)
org.apache.solr.common.SolrException: Invalid CRLF
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:175)
at
org.apache.solr.handle
You're not being clear here - are the commas delimiting fields or do you
have one value per row?
Yes, you can tokenize a comma-delimited value in Solr.
-- Jack Krupansky
-Original Message-
From: Raheel Hasan
Sent: Friday, September 06, 2013 11:54 AM
To: solr-user@lucene.apache.org
Su
It's a CSV from the database. I will import it like this (say for example
the field is 'emailids' and it contains a CSV of email ids):
On Fri, Sep 6, 2013 at 9:01 PM, Jack Krupansky wrote:
> You're not being clear here - are the commas delimiting fields or do you
> have one value per row?
>
> Yes
Hi,
I'm running Solr 4.0 but using a legacy distributed search setup. I set the
shards parameter for search, but index into each Solr shard directly.
The problem I have been experiencing is building connections with the Solr
shards. If I run a query, by using wget, to get the number of records from each
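For reference, a legacy distributed request just lists the shards explicitly, along these lines (host names assumed):

http://localhost:8983/solr/select?q=*:*&shards=shard1host:8983/solr,shard2host:8983/solr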
let me further elaborate:
[db>table1]
field1 = int
field2 = string (solr indexing = true)
field3 = csv
[During import into solr]
splitBy=","
[After import]
solr will be searched for terms from field2.
[needed]
counts of occurrences of each value in the csv
On Fri, Sep 6, 2013 at 9:35 PM, Raheel Ha
Thanks. I realized there's an error in the ChunkedInputFilter...
I'm not sure if this means there's a bug in the client library I'm using
(solrj 4.3) or is a bug in the server SOLR 4.3? Or is there something in
my data that's causing the issue?
On Fri, Sep 6, 2013 at 1:02 PM, Chris Hostetter w
basically, a field having a CSV... and find counts / number of occurrences
of each CSV value..
On Fri, Sep 6, 2013 at 8:54 PM, Raheel Hasan wrote:
> Hi,
>
> What I want is very simple:
>
> The "query" results:
> row 1 = a,b,c,d
> row 2 = a,f,r,e
> row 3 = a,c,ff,e,b
> ..
>
> facet count needed:
>
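For what it's worth, a sketch of the two pieces involved (table, column, and field names are made up): the DIH entity splits the CSV column into a multi-valued field, and faceting on that field gives per-value counts like the ones above:

<entity name="table1" query="SELECT id, emailids FROM table1"
        transformer="RegexTransformer">
  <field column="emailids" splitBy=","/>
</entity>

http://localhost:8983/solr/collection1/select?q=*:*&rows=0&facet=true&facet.field=emailids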
: Has anyone ever hit this when adding documents to SOLR? What does it mean?
Always check for the root cause...
: Caused by: java.io.IOException: Invalid CRLF
:
: at
:
org.apache.coyote.http11.filters.ChunkedInputFilter.parseCRLF(ChunkedInputFilter.java:352)
...so while Solr is trying to rea
I'm migrating from 3.x to 4.x and I'm running some queries to verify that
everything works like before. I've found, however, that the query "galaxy s3"
returns far fewer results. In 3.x numFound=1628, in 4.x numFound=70.
Here's the relevant schema part:
: it shows type as undefined for dynamic field ignored_* , and I am using
That means the running solr instance does not know anything about a
dynamic field named ignored_* -- it doesn't exist.
: but on the admin page it shows schema :
the page showing the schema file just tells you what's on d
: I'm migrating from 3.x to 4.x and I'm running some queries to verify that
: everything works like before. I've found, however, that the query "galaxy s3"
: returns far fewer results. In 3.x numFound=1628, in 4.x numFound=70.
is your entire schema 100% identical in both cases?
what is the lucene
For what it's worth... I just updated to solrj 4.4 (even though my server
is solr 4.3) and it seems to have fixed the issue.
Thanks for the help!
On Fri, Sep 6, 2013 at 1:41 PM, Chris Hostetter wrote:
>
> : I'm not sure if this means there's a bug in the client library I'm using
> : (solrj 4.3)
Whether we like the behaviour we are getting in 3.x or not, I'm required to
keep everything working as close as possible to before.
I have no idea why this is happening, but setting that field to true solved
the issue; now I get the exact same number of items in both queries!
I wouldn't bother checki
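For reference, the flag being discussed sits on the field type in schema.xml; a sketch (the type name is an assumption):

<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  ...
</fieldType>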
: I'm not sure if this means there's a bug in the client library I'm using
: (solrj 4.3) or is a bug in the server SOLR 4.3? Or is there something in
: my data that's causing the issue?
It's unlikely that an error in the data you pass to SolrJ methods would be
causing this problem -- I'm pretty
Hey guys,
(copy of my post to SOLR-5216)
We tested this patch and unfortunately encountered some serious issues after a
few hours of 500 update-batches/sec. Our update batch is 10 docs, so we are
writing about 5000 docs/sec total, using autoCommit to commit the updates
(no explicit commits).
Our envir
: Sorry for the multi-post, seems like the .tdump files didn't get
: attached. I've tried attaching them as .txt files this time.
Interesting ... it looks like 2 of your cores are blocked in loading while
waiting for the searchers to open ... not clear if it's a deadlock or why
though - in bot
: Our schema is identical except the version.
: In 3.x it's 1.1 and in 4.x it's 1.5.
That's kind of a significant difference to leave out -- independent of the
question you are asking about here, it's going to make quite a few
differences in how things are being parsed, and what defaults a
On 9/6/2013 12:46 PM, Fermin Silva wrote:
Our schema is identical except the version.
In 3.x it's 1.1 and in 4.x it's 1.5.
Also in solrconfig.xml we have no luceneMatchVersion for 3.x (so it's using 2_4,
I believe) and in 4.x we fixed it to 4_4.
The autoGeneratePhraseQueries parameter didn't exist
: Do all of your cores have "newSearcher" event listeners configured or just
: 2 (i'm trying to figure out if it's a timing fluke that these two are
: stalled, or if it's something special about the configs)
All of my cores have both the "newSearcher" and "firstSearcher" event listeners
configured.
is there any way to change the dataDir while creating a collection via the
collection api?
Did you ever get to index that long before without hitting the deadlock?
There really isn't anything negative the patch could be introducing, other than
allowing for some more threads to possibly run at once. If I had to guess, I
would say it's likely this patch fixes the deadlock issue and your
Okay, thanks, useful info. Getting on a plane, but I'll look more at this soon.
That 10k thread spike is good to know - that's no good and could easily be part
of the problem. We want to keep that from happening.
Mark
Sent from my iPhone
On Sep 6, 2013, at 2:05 PM, Tim Vaillancourt wrote:
>
hi,
curl '
http://192.168.0.1:8983/solr/admin/collections?action=CREATE&name=collectionx&numShards=4&replicationFactor=1&collection.configName=config1
'
after that, i added approx 100k documents, verified they were in the
index and distributed across the shards.
i then decided to start addin
it shows type as undefined for dynamic field ignored_*, and I am using the
default collection1 core,
but on the admin page it shows schema :
Hey Mark,
The farthest we've made it at the same batch size/volume was 12 hours
without this patch, but that isn't consistent. Sometimes we would only get
to 6 hours or less.
During the crash I can see an amazing spike in threads to 10k which is
essentially our ulimit for the JVM, but I strangely
Enjoy your trip, Mark! Thanks again for the help!
Tim
On 6 September 2013 14:18, Mark Miller wrote:
> Okay, thanks, useful info. Getting on a plane, but I'll look more at this
> soon. That 10k thread spike is good to know - that's no good and could
> easily be part of the problem. We want to kee
Thanks Shalin and Mark for your responses. I am on the same page about the
conventions for taking the backup. However, I am less sure about the
restoration of the index. Let's say we have 3 shards across 3 SolrCloud
servers.
1.> I am assuming we should take a backup from each of the shard leaders t
Hello,
I'm working with the Pecl package, with Solr 4.3.1. I have a doc defined
in my schema where id is the uniqueKey:
<field name="id" ... multiValued="false" />
<uniqueKey>id</uniqueKey>
I tried to add a doc to my index with the following code (simplified for
the question):
$client = new SolrClient($options);
$doc = new SolrInput
Hi,
Our schema is identical except the version.
In 3.x it's 1.1 and in 4.x it's 1.5.
Also in solrconfig.xml we have no luceneMatchVersion for 3.x (so it's using 2_4,
I believe) and in 4.x we fixed it to 4_4.
Thanks
On Sep 6, 2013 3:34 PM, "Chris Hostetter" wrote:
>
> : I'm migrating from 3.x to 4.x
Hi,
What I want is very simple:
The "query" results:
row 1 = a,b,c,d
row 2 = a,f,r,e
row 3 = a,c,ff,e,b
..
facet count needed:
'a' = 3 occurrence
'b' = 2 occur.
'c' = 2 occur.
.
.
.
I searched and found a solution here:
http://stackoverflow.com/questions/9914483/solr-facet-multiple-words-with-
I wouldn't say I love this idea, but wouldn't it be safe to LVM snapshot
the Solr index? I think this may even work on a live server, depending on
some file I/O details. Has anyone tried this?
An in-Solr solution sounds more elegant, but considering the tlog concern
Shalin mentioned, I think this
Thanks a ton Mark. I have tried SOLR-4816 and it didn't help. But I will
try Mark's patch next week, and see what happens.
-Kevin
On Thu, Sep 5, 2013 at 4:46 AM, Erick Erickson wrote:
> If you run into this again, try a jstack trace. You should see
> evidence of being stuck in SolrCmdDistributo
Does anyone know if there is such a thing as a BatchSolrServer object in the
solrj code? I am currently using the ConcurrentUpdateSolrServer, but it
isn't doing quite what I expected. It will distribute the load of sending
documents through the HTTP client across different threads and manage the
connections
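There's no BatchSolrServer in SolrJ as far as I know; ConcurrentUpdateSolrServer is the closest thing. A minimal construction sketch (the URL and tuning values are assumptions):

// queue up to 1000 docs, drain them with 4 background threads
ConcurrentUpdateSolrServer server =
    new ConcurrentUpdateSolrServer("http://localhost:8983/solr/collection1", 1000, 4);

Note it swallows errors from individual requests unless you override handleError().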