Let me try to help you. First of all, I would like to encourage people to
post more information about their scenario than "This is my log, index
deleted, help me" :)
This kind of info can be really useful:
1) Solr version
2) Solr architecture (SolrCloud? SolrCloud configuration? Manual shards?)
3) Kind of data source indexed
4) What happened to the datasource? Any change in there?
Erick will correct me if I am wrong, but I don't think this function query
exists.
Maybe it could be a nice contribution, though.
It should take as input a date format and a field, and return the newly
formatted date.
It would then be simple to use:
fl=id,persian_date:dateFormat("/mm/dd",greg
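Purely as a sketch of what such a contribution might look like (everything
here is hypothetical: it assumes the Lucene/Solr 5.x ValueSource API, a date
field whose values come back as epoch millis via longVal(), and registration
in solrconfig.xml as <valueSourceParser name="dateFormat"
class="com.example.DateFormatValueSourceParser"/>):

import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.StrDocValues;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;

// Hypothetical dateFormat(pattern, field) function query.
public class DateFormatValueSourceParser extends ValueSourceParser {
  @Override
  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    final String pattern = fp.parseArg();            // e.g. "yyyy/MM/dd"
    final ValueSource dates = fp.parseValueSource(); // the source date field
    return new ValueSource() {
      @Override
      public FunctionValues getValues(Map context, LeafReaderContext readerContext) throws IOException {
        final FunctionValues vals = dates.getValues(context, readerContext);
        return new StrDocValues(this) {
          @Override
          public String strVal(int doc) {
            // assumes the underlying field exposes epoch millis
            return new SimpleDateFormat(pattern).format(new Date(vals.longVal(doc)));
          }
        };
      }
      @Override
      public String description() {
        return "dateFormat(" + pattern + "," + dates.description() + ")";
      }
      @Override
      public boolean equals(Object o) { return o == this; }
      @Override
      public int hashCode() { return System.identityHashCode(this); }
    };
  }
}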
Hi Edwin,
let's do this step by step.
Clustering is a problem solved by unsupervised machine-learning algorithms.
The goal of clustering is to group a corpus of documents by similarity,
trying to produce groups that are meaningful to a human being.
Solr currently provides different approaches for *Query Time
I've tried to use solr.HMMChineseTokenizerFactory with the following
configurations:
The documents can be indexed, but when I try to search for the words, the
query matches many other words, not just the words that I searched for. Why
is this so?
For example, the query
ht
On Wed, Jun 10, 2015, at 05:52 AM, William Bell wrote:
> Finding DIH issue with the new AngularJS DIH section, while indexing...
>
> 1,22613/s ?
>
> Last Update: 22:50:50
> *Indexing since 0:1:38.204*
> Requests: 1, Fetched: 1,22613/s, Skipped: 0, Processed: 1,22613/s
> Started: 3 minutes ago
The main objective here is actually to assign a title to the documents as
they are being indexed.
We found that the cluster labels provide good information on the key points
of the documents, but I'm not sure whether we can get good cluster labels
from a single document.
Besides getting
Hi Alessandro,
Please find the answers inline, and help me figure out this problem.
1) Solr version : *4.2.1*
2) Solr architecture :* Master -slave/ Replication with requestHandler*
3) Kind of data source indexed : *Mysql *
4) What happened to the datasource ? any change in there ? : *No c
Note the clean= parameter to the DIH. It defaults to true, and it will wipe
your index before the import runs. Perhaps it succeeded at wiping, but then
failed to connect to your database. Hence an empty index?
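For example, a full import that explicitly skips the wipe (a sketch; it
assumes the handler is registered at the conventional /dataimport path and a
core named yourcore):

http://localhost:8983/solr/yourcore/dataimport?command=full-import&clean=false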
clean=true is, IMO, a very dangerous default option.
Upayavira
On Wed, Jun 10, 2015, at 10:59 AM, Midas A
Let me answer inline, to get more info:
2015-06-10 10:59 GMT+01:00 Midas A :
> Hi Alessandro,
>
> Please find the answers inline, and help me figure out this problem.
>
> 1) Solr version : *4.2.1*
> 2) Solr architecture :* Master -slave/ Replication with requestHandler*
>
>
Where happene
It depends a lot on what the documents are. Some document formats have
metadata that stores a title. Perhaps you can just extract that.
If not, once you've extracted the content, perhaps you could just have a
special field that is the first n words (followed by an ellipsis).
If you use a clusteri
Wow, Upaya, I didn't know that clean defaulted to true in the delta import
as well!
I did know it was the default in the full import, but I agree with you that
having a default of true for the delta import is very dangerous!
But assuming the user was using the delta import so far, if cleaning every
time,
I was only speaking about the full import regarding the default of
clean=true. However, looking at the source code, it doesn't seem to
differentiate between a full and a delta import in relation to the
default of clean=true, which would be pretty crappy. However, I'd need
to try it.
Upayavira
On
Another technology that might make more sense is a Doc Transformer.
You also specify them in the fl parameter. I would imagine you could
specify
fl=id,[persian f=gregorian_Date]
See here for more cases:
https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents
This does no
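For illustration, a rough sketch of such a transformer against the Solr 5.x
API (the factory name, the f parameter, and the PersianDateUtil helper are
all assumptions, not existing code):

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.response.transform.TransformerFactory;

// Registered in solrconfig.xml as, e.g.:
//   <transformer name="persian" class="com.example.PersianDateTransformerFactory"/>
public class PersianDateTransformerFactory extends TransformerFactory {
  @Override
  public DocTransformer create(final String name, final SolrParams params, SolrQueryRequest req) {
    return new DocTransformer() {
      @Override
      public String getName() { return name; }

      @Override
      public void transform(SolrDocument doc, int docid) {
        // [persian f=gregorian_Date] -> read the source field named by f
        Object greg = doc.getFieldValue(params.get("f"));
        if (greg instanceof java.util.Date) {
          // PersianDateUtil is hypothetical; the real calendar
          // conversion would live there
          doc.setField(name, PersianDateUtil.format((java.util.Date) greg));
        }
      }
    };
  }
}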
I have used Solr 3.3 as a Data Import Handler (DIH) with Oracle. It's
working fine for me.
Now I am trying the same with MySQL. With the change in database, I have
changed the query used in data-config.xml for MySQL.
The query has variables which are passed as URL parameters over HTTP. The same thing works fi
For some reason, your email is completely unreadable, with a lot of nbsp
entities instead of spaces. Maybe your client is sending it as broken HTML?
You may want to reformat the message and resend.
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com
Hi Erik
When running Solr in standalone mode on my laptop, I found the *.vm files
under server/solr/COLLECTION_NAME/conf;
however, when running on my server in cloud mode (with only one node), I do
not find this conf/ directory under server/.
Does it live somewhere else?
thanks!
On Tue, Jun 9
In cloud mode, configurations live in ZooKeeper.
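If you want to pull a collection's config down to local disk to inspect it,
one way is the zkcli script that ships with Solr (a sketch; the zkhost,
target directory, and config name are assumptions based on a default 5.x
cloud install):

server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 \
  -cmd downconfig -confdir /tmp/myconf -confname COLLECTION_NAME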
By doing the
-Dvelocity.template.base.dir=/example/files/conf/velocity/ trick
(or baking that into your solrconfig setup for the VelocityResponseWriter) you
can have the templates on the file system instead though.
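For the baking-in option, the registration in solrconfig.xml might look
roughly like this (assuming a 5.x VelocityResponseWriter that understands
the template.base.dir init parameter):

<queryResponseWriter name="velocity" class="solr.VelocityResponseWriter">
  <!-- same directory the -D trick points at -->
  <str name="template.base.dir">/example/files/conf/velocity/</str>
</queryResponseWriter>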
—
Erik Hatcher, Senior Solutio
Thank you for your reply.
So my question is: can I get the timezone offset applied if I use NOW/MINUTE
rounding and not NOW/DAY?
You said " TZ affects what timezone is used when defining the concept of a
"day" for
the purposes of rounding by day. " I understand from this answer that query
like I mentione
I have used Solr 3.3 as a Data Import Handler (DIH) with Oracle. It's
working fine for me.
Now I am trying the same with MySQL. With the change in database, I have
changed the query used in data-config.xml for MySQL.
The query has variables which are passed as URL parameters over HTTP. The same thing works fi
On 6/10/2015 6:43 AM, abhijit bashetti wrote:
> >= to_date('[?, '28/05/2015 11:13:50']', 'DD/MM/YYYY HH24:MI:SS')
> Does anyone know where the problem is? Why is the variable resolver not
> working as expected?
> Note: to_date is a function written by us in MySQL.
> I have checked out the solr c
I agree with Upayavira:
title extraction is an activity independent from Solr.
Furthermore, I would say it's easy to extract the title before the Solr
indexing stage.
By the time the content arrives at the Solr update processors, it is
already a String.
If you want to do some clever title extraction, fo
Hello,
I have a cluster with 3 nodes (node1, node2 and node3). Now I want to create a
new collection with 3 shards using `implicit` routing:
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&router.name=implicit&shards=shard1,shard2,shard3&router.field
Take a look at the collections API CREATE command in more detail here:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1
Admittedly this is 5.2 but you didn't mention what version of Solr
you're using.
In particular the createNodeSet and createNodeSet.shuffle par
Just taking a look at the code:
"
if (requestParams.containsKey("clean")) {
  clean = StrUtils.parseBool((String) requestParams.get("clean"), true);
} else if (DataImporter.DELTA_IMPORT_CMD.equals(command) ||
    DataImporter.IMPORT_CMD.equals(command)) {
  // delta imports default to clean=false when the parameter is absent
  clean = false;
} else {
  // full imports default to clean=true, unless running in debug mode
  clean = debug ? false : true;
}
"
I am using RankQuery to implement my applicative scorer, which returns a
score based on the value of a specific field (let's call it 'score_field')
that is stored for every document.
The RankQuery creates a collector, and for every collected docId I retrieve
the value of score_field, calculate the scor
I need to execute close() because the scorer is opened in the context of a
query and caches some data in that scope, i.e. for the specific query. The
way to clear this cache, which is only relevant to that query, is to call
close(). I think this API is not so good, but I assume that the scorer's
co
Thank you very much.
It seems that a document transformer is the perfect extension point for
this conversion. I will try to implement it.
Best regards.
On Wed, Jun 10, 2015 at 3:54 PM, Upayavira wrote:
> Another technology that might make more sense is a Doc Transformer.
>
> You also specify the
Hi,
We have a Solr index with ~1M documents.
We want to give our users the ability to filter results out of queries,
meaning those results will not be shown again for any query of that specific
user (we currently have 10K users).
You can think of a scenario like a "recommendation engine" which you don't
wa
I'm having a config issue, I'm posting the error from Solrj which also
includes the cluster state JSON:
org.apache.solr.common.SolrException: No active slice servicing hash code
2ee4d125 in DocCollection(rfp365)={
"shards":{"shard1":{
"range":"-",
"state":"active",
Hello,
The problem is SlowCompositeReaderWrapper.wrap(searcher.getIndexReader());
you hardly ever need to do this, not least because Solr already does it.
DocValues need to be accessed per segment, via the leaf/atomic reader
context provided to the collector.
E.g. look at DocTermsIndexDocValues.strVal(int)
DocT
Start with a negated, cache-bypassing terms filter:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser
e.g.
fq=-{!terms f=p_id cache=false}1,3,5,already,seen
note:
Elastic can even store such filters via
https://www.elastic.co/guide/en/elasticsearch/reference/current
Hi all,
I'm using ZooKeeper 3.4.6 in the context of SolrCloud 5. When uploading
a config file containing the following, I get an "Invalid Path String"
error.
<filter class="solr.StopFilterFactory"
words="/netapp/dokubase/seeval/dicts/stopwords/stopwords_de.txt"
ignoreCase="true"/>
which obviously leads to
Invalid path string
\"/con
:
: The problem is SlowCompositeReaderWrapper.wrap(searcher.getIndexReader());
: you hardly ever need to do this, not least because Solr already does it.
Specifically you should just use...
searcher.getLeafReader().getSortedSetDocValues(your_field_name)
...instead of doing all this wrapp
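A minimal sketch of that per-segment pattern in a collector (Lucene 5.x API;
the field name and what you do with the value are assumptions):

import java.io.IOException;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.SimpleCollector;

// Reads a DocValues field per segment instead of wrapping the whole
// index with SlowCompositeReaderWrapper.
public class ScoreFieldCollector extends SimpleCollector {
  private NumericDocValues scoreField;

  @Override
  protected void doSetNextReader(LeafReaderContext context) throws IOException {
    // re-acquire the values for every new segment
    scoreField = context.reader().getNumericDocValues("score_field");
  }

  @Override
  public void collect(int doc) throws IOException {
    // 'doc' is segment-local here, so no rebasing against docBase is needed
    long value = scoreField == null ? 0L : scoreField.get(doc);
    // ... feed 'value' into the applicative score calculation ...
  }

  @Override
  public boolean needsScores() { return false; }
}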
On 6/10/2015 2:47 PM, Peter Scholze wrote:
> I'm using Zookeeper 3.4.6 in the context of SolrCloud 5. When
> uploading a config file containing the following, I get an "Invalid
> Path String" error.
>
> words="/netapp/dokubase/seeval/dicts/stopwords/stopwords_de.txt"
> ignoreCase="true"/>
>
> lead
I don't see any way around storing which recommendations have been delivered
to each user. It sounds like a separate collection with the unique ID created
from the combination of the user ID and the recommendation ID (with the IDs
also available as separate, searchable and returnable fields).
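A minimal SolrJ sketch of that bookkeeping (the collection layout and field
names are assumptions):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

// Record that a recommendation was delivered to a user, in a separate
// "seen" collection.
public class SeenRecorder {
  public static void recordSeen(SolrClient seenCollection, String userId, String recId) throws Exception {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", userId + "_" + recId); // unique ID from the combination
    doc.addField("user_id", userId);          // searchable on its own
    doc.addField("rec_id", recId);            // returnable, for building filters
    seenCollection.add(doc);
  }
}

At query time, the stored rec_id values for a user can then be fed into a
negated terms filter like the one shown earlier in the thread.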
: So my question is: can I get offset of time if I use NOW/MINUTE and not
NOW/DAY rounding?
i'm sorry, but your question is still too terse, vague, and ambiguous for
me to really make much sense of it; and the example queries you provided
really don't have enough context for me to understand
: The guy was using delta import anyway, so maybe the problem is
: different and not related to the clean.
that's not what the logs say.
Here's what i see...
Log begins with server startup @ "Jun 10, 2015 11:14:56 AM"
The DeletionPolicy for the "shopclue_prod" core is initialized at "Jun
10,
Hi,
Is it possible to list all the fields in the highlighting portion of the
output?
Currently, even when I use *, it only shows fields where highlighting is
possible; fields where highlighting is not possible are not shown.
I would like to have output where all the fields, regardless of whether
highli
Hi,
There are 5 cores and a separate server for indexing on this SolrCloud. Can
you please share your suggestions on the following:
How can the indexer know that the optimize has completed, even if the
commit/optimize runs in the background, without going to the Solr servers,
maybe by using SolrJ or some other API?
I t
If I knew, I would fix it ;). The sub-optimizes (i.e. the ones
sent out to each replica) should be sent in parallel, and then
each thread should wait for completion from the replicas. There
is no real "check for optimize"; I believe that the return from the
call is considered sufficient. If we can t
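In SolrJ terms that means the blocking call itself is the completion signal;
a minimal sketch (host and core name are assumptions):

import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class OptimizeAndWait {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/core1")) {
      // blocks until Solr responds; per the above, that return is the
      // only "optimize finished" signal available
      client.optimize(true, true); // waitFlush, waitSearcher
    }
  }
}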
Why would you care when the forced merge (not an “optimize”) is done? Start it
and get back to work.
Or even better, never force merge and let the algorithm take care of it.
Seriously, I’ve been giving this advice since before Lucene was written,
because Ultraseek had the same approach for mana
Thank you for your quick answer.
The two parameters createNodeSet and createNodeSet.shuffle seem to solve the
problem:
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&router.name=implicit&shards=shard1,shard2,shard3&router.field=shard&createNodeSet=node1,
Until somewhere around Lucene 3.5, you needed to optimise, because the
merge strategy used wasn't that clever and left lots of deletes in your
largest segment. Around that point, the TieredMergePolicy became the
default. Because its algorithm is much more sophisticated, it took away
the need to opt