Hi Bill,
the classical way would be to have a reverse proxy in front of the
application that catches such cases. A decent reverse proxy or even
application firewall router will allow you to define limits on bandwidth
and sessions per time unit. Some even recognize specific
denial-of-service patterns…
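As a sketch of such limits (nginx is just one reverse proxy among many; the zone names and the solr_backend upstream are made up for illustration), rate and connection limits per client IP could look like this:

```
# hypothetical nginx fragment: limit each client IP to 10 req/s and 20 connections
limit_req_zone  $binary_remote_addr zone=perip:10m rate=10r/s;
limit_conn_zone $binary_remote_addr zone=conns:10m;
server {
    location / {
        limit_req  zone=perip burst=20;
        limit_conn conns 20;
        proxy_pass http://solr_backend;
    }
}
```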
https://github.com/Heliosearch/heliosearch
Last commit a year ago... that tells me something :-)
heliosearch.com and heliosearch.org go to standard GoDaddy pages.
Heliosearch was a fork that has apparently been dormant for a year already.
Cheers,
--Jürgen
On 26.11.2015 14:26, Bernd Fehling wrote:
Subin,
Only the envelope is structured. What's inside the individual fields
of the structure may be single values (possibly considered structured
meta-data) or unstructured (like free text or other fields with informal
semantics).
Even if you pass a 5-hour video as a major case of unstructured data…
Abhishek,
given the vast amount of information you write, I suspect this is not
an HTTP error code (those are three digits, and the 2xx ones actually
indicate success), but rather a libcurl error code. Check
against this list to find out whether that's an explanation:
https://curl
To be precise: create one zoo.cfg for each of the instances. One config
file for all is a bad idea.
In each config file, use the same server.X lines, but use a unique
clientPort.
As you will also have separate data directories, I would recommend
having one root directory .../zookeeper where you c
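A hedged sketch of what the per-instance configs could look like (ports and paths are made up; each dataDir additionally needs a myid file holding that instance's number from the server.X lines):

```
# .../zookeeper/instance1/zoo.cfg
dataDir=/var/zookeeper/instance1/data
clientPort=2181
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

# .../zookeeper/instance2/zoo.cfg  -- same server.X lines, unique clientPort
dataDir=/var/zookeeper/instance2/data
clientPort=2182
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
```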
Replication on the storage layer will provide reliable storage for the
index and other data of Solr. However, this replication does not
guarantee that your index files are consistent at any given time, as
there may be intermediate states that are only partially replicated.
Replication is only a converg…
Hello,
have you tried the "createNodeSet" option of collection/shard creation
and the "node" option of replica creation in Solr 4.9.0+?
As you're just testing, I would strongly recommend going to the latest
version.
https://cwiki.apache.org/confluence/display/solr/Collections+API
This is useful
Hello Xinwu,
does it change anything if you use an underscore instead of the dash in
the collection name?
What is the result of the call? Any status or error message?
Did you actually feed data into the collection?
Cheers,
--Jürgen
On 03.09.2014 11:21, xinwu wrote:
> Hi , all:
> I crea
Hello all,
as the migration from FAST to Solr is a relevant topic for several of
our customers, there is one issue that does not seem to be addressed by
Lucene/Solr: document vectors FAST-style. These document vectors are
used to form metrics of similarity, i.e., they may be used as a
"semantic f
/solr/The+Term+Vector+Component
> And just to show some impressive search functionality of the wiki: ;)
> https://cwiki.apache.org/confluence/dosearchsite.action?where=solr&spaceSearch=true&queryString=document+vectors
>
> Cheers,
> Jim
>
>
> 2014-09-05 9:44 GMT+02:00 &
Thanks for posting this. I was just about to send off a message of
similar content :-)
Important to add:
- In FAST ESP, you could have more than one such docvector associated
with a document, in order to reflect different metrics.
- Term weights in docvectors are document-relative, not absolute.
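To illustrate the second point with a toy example (this is only a sketch of what "document-relative" means, not FAST's actual weighting formula): normalizing raw term counts by the document's total count makes weights comparable across documents of different lengths.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RelativeWeights {
    // Normalize raw term counts into document-relative weights summing to 1.0.
    static Map<String, Double> normalize(Map<String, Integer> counts) {
        double total = counts.values().stream().mapToInt(Integer::intValue).sum();
        Map<String, Double> weights = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            weights.put(e.getKey(), e.getValue() / total);
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("solr", 3);
        counts.put("search", 1);
        System.out.println(normalize(counts)); // {solr=0.75, search=0.25}
    }
}
```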
Depending on the size of the individual records returned, I'd use a
decent-size window (to minimize network and marshalling/unmarshalling
overhead) of maybe 1000 items sorted by id, and use that in
combination with cursorMark. That will be easier on the server side in
terms of garbage collection…
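The fetch loop can be sketched with a local stand-in for the cursorMark contract (a hypothetical helper, not the SolrJ API): each page returns items in id order that come strictly after the cursor, and the client keeps the last id of each batch as the next cursor until a page comes back empty.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CursorSketch {
    // Return up to pageSize ids strictly greater than cursor ("" = start).
    static List<String> page(List<String> sortedIds, String cursor, int pageSize) {
        List<String> out = new ArrayList<>();
        for (String id : sortedIds) {
            if (id.compareTo(cursor) > 0) {
                out.add(id);
                if (out.size() == pageSize) break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>(List.of("a1", "a2", "b1", "b2", "c1"));
        Collections.sort(ids);                 // cursoring requires a stable sort by id
        String cursor = "";                    // analogous to cursorMark=*
        List<String> fetched = new ArrayList<>();
        while (true) {
            List<String> batch = page(ids, cursor, 2);
            if (batch.isEmpty()) break;        // cursor stopped advancing: done
            fetched.addAll(batch);
            cursor = batch.get(batch.size() - 1);
        }
        System.out.println(fetched);           // [a1, a2, b1, b2, c1]
    }
}
```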
In a test scenario, I used stunnel for connections between some
zookeeper observers and the central ensemble, as well as between a SolrJ
4.9.0 client and the central zookeepers. This is entirely transparent
modulo performance penalties due to network latency and SSL overhead. I
finally ended up with…
Hello,
you have one shard and 11 replicas? Hmm...
- Why do you have to keep two nodes on some machines?
- Physical hardware or virtual machines?
- What is the size of this index?
- Is this all on a local network or are there links with potential
outages or failures in between?
- What is the query load…
Hello Anurag,
the CRLF problem with Cygwin can be cured by running the scripts all
through this filter:
tr -d '\r' < $script > $script.new ; mv $script.new $script
with $script holding the path of the script to be massaged.
Generally, however, I would advise using the standard scripts only for…
Hello Nabil,
isn't that what should be expected? Cores are local to nodes, so you
only get the core status from the node you're asking. Cluster status
refers to the entire SolrCloud cluster, so you will get the status over
all collections/nodes/shards [=cores]. Check the Core Admin REST
interface for…
> As you can see, I'm not using direct connection to node. It's a CloudServer.
> Do you have example to how to get Cluster status from solrJ.
>
> Regards,
> Nabil.
>
>
> On Monday, 20 October 2014, 13:44, Jürgen Wagner (DVT)
> wrote:
>
>
>
> Hello N
Hello Olivier,
for real production use, you won't really want to use any toys like
post.jar or curl. You want a decent connector to whatever data source
there is, that fetches data, possibly massages it a bit, and then feeds
it into Solr - by means of SolrJ or directly into the web service of
Solr…
Hello Greg,
Consul and Zookeeper are quite similar in their offering with respect
to what SolrCloud needs. Service discovery, watches on distributed
cluster state, updates of configuration could all be handled through
Consul. Plus, Consul does offer built-in capabilities for
multi-datacenter sc
Hello Greg,
we run Zookeeper not on dedicated Zookeeper machines, but rather on
admin nodes in search application clusters (that makes two instances),
plus on at least one more node that does not have much load (e.g., a
crawling node). Also, as long as you don't stuff too much data into
Zookeeper
Hello Dan,
ManifoldCF is a connector framework, not a processing framework.
Therefore, you may try your own lightweight connectors (which usually
are not really rocket science and may take less time to write than the
time to configure a super-generic connector of some sort), any connector
out there (…
Hello Enrico,
you may use the chroot feature of Zookeeper to root the different
SolrCloud instances differently. Instead of zoohost1:2181, you can use
zoohost1:2181/cluster1 as the Zookeeper location. Unless there is a load
issue with high rates of updates and other data traffic, a single
Zookeeper…
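For example (the chroot paths /cluster1 and /cluster2 are hypothetical; depending on the Solr and Zookeeper versions, the chroot node may have to be created in Zookeeper before first use):

```
# two SolrCloud clusters sharing one Zookeeper ensemble via chroot
bin/solr start -cloud -z zoohost1:2181/cluster1 -p 8983
bin/solr start -cloud -z zoohost1:2181/cluster2 -p 8984
```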
Hi guy,
there's not much of a search operation here. Why not store the
documents in a key/value store and simply fetch them by matching ids?
Another approach: as there is no query, you could easily partition the
set of ids and fetch the results in multiple batches.
The maximum number of clauses…
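The partitioning itself is straightforward; a minimal sketch (the batch size is something you would choose to stay below whatever clause limit applies on the server):

```java
import java.util.ArrayList;
import java.util.List;

public class IdBatches {
    // Split a list of ids into batches of at most batchSize elements,
    // so each batch can be fetched with one bounded id query.
    static List<List<String>> partition(List<String> ids, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                    ids.subList(i, Math.min(i + batchSize, ids.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<List<String>> batches =
                partition(List.of("d1", "d2", "d3", "d4", "d5"), 2);
        System.out.println(batches); // [[d1, d2], [d3, d4], [d5]]
    }
}
```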
Why rely on the default http client? Why not create one with

    // socketConfig, requestConfig and sslsf prepared beforehand
    CloseableHttpClient client = HttpClients.custom()
            .setDefaultSocketConfig(socketConfig)
            .setDefaultRequestConfig(requestConfig)
            .setSSLSocketFactory(sslsf)
            .build();

that has the SSLConnectionSocketFactory property set up with an
SSL…
Hi Ali,
the sizing is not just determined by the number of indexed documents
(and even less by the number of concurrent users).
- Document volume (number of documents, amount of text data to be
indexed with each document, number and types of fields, the cardinality
of fields) guides you to the n…
Hello,
no matter which search platform you will use, this will pose two
challenges:
- The size of the documents will render search less and less useful as
the likelihood of matches increases with document size. So, without a
proper semantic extraction (e.g., using decent NER or relationship
extraction…
Maybe you should consider creating different generations of indexes
rather than keeping everything in one index. If the likelihood of
documents being deleted is rather high in, e.g., the first week or so,
you could have one index for the high-deletion-probability documents
(the fresh ones) and a second…
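In Solr, such generations can be kept behind a collection alias so that queries always hit one stable name while generations rotate behind it (the collection names "fresh" and "archive" and the alias "docs" are made up):

```
# Collections API: one query alias spanning both generations
/admin/collections?action=CREATEALIAS&name=docs&collections=fresh,archive
```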
Hello Charlie,
theoretically, things may work as you describe them. A few big
HOWEVERs exist as far as I can see:
1. Attributes: as different organisations may use different schemata
(document attributes), the consolidation of results from multiple
sources may present a problem. This may not arise…