Everything about Tika extraction is covered under those links. Basically,
what you need is the following:
1) requestHandler for Tika in solrconfig.xml
2) keep all the fields in schema.xml that are needed for Tika (they are
marked in example schema.xml) and set those you don't need to
indexed=fals
I tried commenting out NOW in bq, but it didn't make any difference in
performance. I do see a minor entry in the queryfiltercache rate, which is a
meager 0.02.
I'm really struggling to figure out the bottleneck. Are there any known pain
points I should be checking?
--
View this message in context:
http
Hi,
Logically, maintenance will be easy, as both collections are in
different folders.
Next, even with separate fields in one collection, if the field list is
not specified at search time, the results will be a combination of
both domains. If this is mandatorily taking care at search/
Hi,
Regrets, I was confused about the bit-set. I have Shawn's suggested
approach in the system. I want to try other ways and test performance.
How can I use join? I have 2 different Solr indexes.
localhost:8080/solr_1/select?q=content:test&fl=id,name,type
localhost:8081/solr_1_1/select?q=text:t
On 10/17/2013 12:51 PM, Christopher Gross wrote:
OK, super confused now.
http://index1:8080/solr/admin/cores?action=CREATE&name=test2&collection=test2&numshards=1&replicationFactor=3
Nets me this:
400
15007
Error CREATEing SolrCore 'test2': Could not find configName
for collection test2 fou
But global on a qt would be awesome !!!
Bill Bell
Sent from mobile
> On Oct 17, 2013, at 2:43 PM, Yonik Seeley wrote:
>
> There isn't a global "cache=false"... it's a local param that can be
> applied to any "fq" or "q" parameter independently.
>
> -Yonik
>
>
>> On Thu, Oct 17, 2013 at 4:3
Would someone help me out with the syntax for setting Tokenizer.documentFields
in the ClusteringComponent engine definition in solrconfig.xml? Carrot2 is
expecting a Collection of Strings. There's no schema definition for this XML
file and a big TODO on the Wiki wrt init params. Every permuta
Awesome, this makes a lot of sense now. Thanks a lot, guys.
Currently the only mention of this setting in the docs is under
filterQuery on the "SolrCaching" page as:
" Solr3.4 Adding the
localParam flag of {!cache=false} to a query will prevent
the filterCach
: Does "cache=false" apply to all caches? The docs make it sound like it is for
: filterCache only, but I could be misunderstanding.
it's per *query* -- not per cache, or per request...
/select?q={!cache=true}foo&fq={!cache=false}bar&fq={!cache=true}baz
...should cause 1 lookup/insert in the
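The per-query behaviour described above can be sketched in a few lines. This is only an illustration: the helper name build_select_query is hypothetical, and the sketch just assembles the request string from the example URL; it does not talk to Solr or inspect any cache.

```python
from urllib.parse import urlencode


def build_select_query(q, filters):
    """Build a /select query string where every clause carries its own cache flag.

    `q` and each item of `filters` are (query_text, cache_flag) pairs.
    The helper name and signature are hypothetical, for illustration only.
    """
    def with_cache(text, cache):
        # Prefix the clause with the {!cache=...} local param.
        return "{!cache=%s}%s" % ("true" if cache else "false", text)

    params = [("q", with_cache(*q))]
    params += [("fq", with_cache(text, cache)) for text, cache in filters]
    # urlencode percent-escapes the braces and '!' so the URL is safe to send.
    return "/select?" + urlencode(params)


url = build_select_query(("foo", True), [("bar", False), ("baz", True)])
```

Because the flag is attached to each clause, the main query and each filter query can be switched independently, which matches the point that there is no single global setting.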
There isn't a global "cache=false"... it's a local param that can be
applied to any "fq" or "q" parameter independently.
-Yonik
On Thu, Oct 17, 2013 at 4:39 PM, Tim Vaillancourt wrote:
> Thanks Yonik,
>
> Does "cache=false" apply to all caches? The docs make it sound like it is
> for filterCac
Thanks Yonik,
Does "cache=false" apply to all caches? The docs make it sound like it
is for filterCache only, but I could be misunderstanding.
When I force a commit and run a /select query many times with
"cache=false", I notice my query still gets cached; my guess is in the
queryResul
> load the configs into zookeeper,
Yes.
> stop tomcat, add it to the solr.xml file,
and restart tomcat.
To your CREATE URL, add the parameter &collection.configName=
http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API
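A CREATE request with the extra parameter could be assembled roughly as follows. This is a sketch under assumptions: it targets the Collections API endpoint (/admin/collections) described on the linked wiki page, it uses the numShards spelling from that API, and the config name "myconf" is a made-up placeholder because the actual name is not given above.

```python
from urllib.parse import urlencode


def create_collection_url(base, name, config_name, num_shards=1, replication_factor=3):
    """Return a Collections API CREATE URL that names the config set explicitly.

    `config_name` must match a config set already uploaded to ZooKeeper.
    """
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("numShards", num_shards),
        ("replicationFactor", replication_factor),
        ("collection.configName", config_name),
    ]
    return base + "/admin/collections?" + urlencode(params)


# "myconf" is a hypothetical config set name, not taken from the thread.
url = create_collection_url("http://index1:8080/solr", "test2", "myconf")
```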
Michael Della Bitta
Applications Developer
o
I am trying to do this:
if (US_offers_i exists):
fq=US_offers_i:[1 TO *]
else:
fq=offers_count:[1 TO *]
Where:
US_offers_i is a dynamic field containing an int
offers_count is a status field containing an int.
I have tried this so far but it doesn't work:
http://solr_server/solr/col1/select?
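One way to express the if/else above as a single filter query is sketched below. It assumes "exists" means "the document has a value for US_offers_i", which is only one reading of the question; the helper names are hypothetical, and the matches function is a plain-Python restatement of the rule for checking the logic, not something Solr runs.

```python
from urllib.parse import urlencode


def offers_filter(country_field="US_offers_i", fallback_field="offers_count"):
    """Build one fq: use country_field where present, else fall back."""
    has_country = "%s:[* TO *]" % country_field  # matches docs that have the field
    return "%s:[1 TO *] OR (%s:[1 TO *] AND NOT %s)" % (
        country_field, fallback_field, has_country)


def matches(doc, country_field="US_offers_i", fallback_field="offers_count"):
    """Plain-Python version of the intended per-document rule."""
    if country_field in doc:
        return doc[country_field] >= 1
    return doc.get(fallback_field, 0) >= 1


fq = offers_filter()
query_string = urlencode({"q": "*:*", "fq": fq})
```

A document with US_offers_i set to 0 is rejected even if offers_count is positive, because the fallback clause only applies to documents without the dynamic field.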
I can't find it in the Admin->Cloud->Tree part of the UI.
Trying to "get" the file:
[zk: localhost:2181(CONNECTED) 0] get /aliases.json
Node does not exist: /aliases.json
So it didn't stick -- I'm guessing. I don't see an error message regarding
the "alias" in my logs either. Anywhere else I sh
Go to the admin screen for Cloud/Tree, and then click the node for
aliases.json. To the lower right, you should see something like:
{"collection":{"AdWorksQuery":"AdWorks"}}
Or access the Zookeeper instance, and do a 'get /aliases.json'.
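Once the contents of aliases.json have been fetched by either route, checking whether an alias was registered is a small lookup. The sketch below assumes the JSON has the shape shown above; the function name resolve_alias is hypothetical, and fetching the node from ZooKeeper is left out.

```python
import json


def resolve_alias(aliases_json, alias):
    """Return the collection(s) an alias points to, or None if it is absent."""
    data = json.loads(aliases_json)
    return data.get("collection", {}).get(alias)


# Sample content copied from the example above.
sample = '{"collection":{"AdWorksQuery":"AdWorks"}}'
target = resolve_alias(sample, "AdWorksQuery")
missing = resolve_alias(sample, "test1-alias")
```

A result of None for an alias suggests the CREATEALIAS call did not take effect.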
-Original Message-
From: Christopher Gross [mail
Also, when I make an alias:
http://index1:8080/solr/admin/collections?action=CREATEALIAS&name=test1-alias&collections=test1
I get a pretty useless response:
00
So I'm not sure if it is made. I tried going to:
http://index1:8080/solr/test1-alias/select?q=*:*
but that didn't work. How do I use an
OK, super confused now.
http://index1:8080/solr/admin/cores?action=CREATE&name=test2&collection=test2&numshards=1&replicationFactor=3
Nets me this:
400
15007
Error CREATEing SolrCore 'test2': Could not find configName
for collection test2 found:[xxx, xxx, , x, xx]
400
For that
Thanks Garth!
Yes, indeed, I know that issue.
I had set up my SolrCloud using 4.5.0 and then encountered this problem, so
I rolled back to 4.4.0
-
Thanks,
Michael
--
View this message in context:
http://lucene.472066.n3.nabble.com/Change-config-set-for-a-collection-tp4096032p4096136.html
S
I work at Chegg.com and I really like it, but we have more search work than I
can do by myself, so we are hiring a senior software engineer for search. Most
of our search services are on Solr.
http://www.chegg.com/jobs/listings/?jvi=oAQGXfwN,Job
If you'd like to know a lot more about Chegg's b
But if you're working with multiple configs in zookeeper, be aware that 4.5
currently has an issue creating multiple collections in a cloud that has
multiple configs. It's targeted to be fixed whenever 4.5.1 comes out.
https://issues.apache.org/jira/i#browse/SOLR-5306
-Original Message---
Thank you, Shawn!
"linkconfig" - that's exactly what I was looking for!
--
View this message in context:
http://lucene.472066.n3.nabble.com/Change-config-set-for-a-collection-tp4096032p4096134.html
Sent from the Solr - User mailing list archive at Nabble.com.
Thanks, I'll try upgrading.
On 17 October 2013 15:55, Mark Miller wrote:
> There was a reload bug in SolrCloud that was fixed in 4.4 -
> https://issues.apache.org/jira/browse/SOLR-4805
>
> Mark
>
> On Oct 17, 2013, at 7:18 AM, Grzegorz Sobczyk wrote:
>
> > Sorry for previous spam (something eat m
On 10/17/2013 2:36 AM, michael.boom wrote:
> This question was also asked some 10 months ago in
> http://lucene.472066.n3.nabble.com/SolrCloud-4-1-change-config-set-for-a-collection-td4037456.html,
> and then the answer was negative, but here it goes again, maybe now it's
> different.
>
> Is it possibl
Hi Roland,
(13/10/17 20:44), Roland Everaert wrote:
Hi,
I helped a customer deploy Solr+ManifoldCF and everything is going
quite smoothly, but every time Solr raises an exception, the ManifoldCF job
feeding Solr aborts. I would like to know if it is possible to configure the
ExtractRequ
Thanks Primoz, I was suspecting that too. But then, it's hard to imagine that
the query cache alone is contributing to the big performance hit. The setting
applies to the old configuration, and it works pretty well even with the
low query cache hit rate.
--
View this message in context:
http://lucene
Yes, right now this constraint could be implemented in either the web app
or Solr. I see now that many of the QTimes on these queries are <10 ms
(probably due to caching), so I'm a bit less concerned.
On Wed, Oct 16, 2013 at 2:13 AM, Furkan KAMACI wrote:
> I just wonder that: Don't you implement
There was a reload bug in SolrCloud that was fixed in 4.4 -
https://issues.apache.org/jira/browse/SOLR-4805
Mark
On Oct 17, 2013, at 7:18 AM, Grzegorz Sobczyk wrote:
> Sorry for previous spam (something ate my message)
>
> I have the same problem but with reload action
> ENV:
> - 3x Solr 4.2.
I have just found this JIRA report, which could explain your problem:
https://issues.apache.org/jira/browse/SOLR-2416
Regards,
Roland.
On Thu, Oct 17, 2013 at 3:30 PM, wonder wrote:
> Thanks for the answer. Yes, Tika extracts, but does not index the content. Here is the
> Solr response
> ...
> "content":
I am also trying with something like -
java -Durl=http://domainname.com:1981/solr/web/update -Dtype=application/json
-jar /solr4RA/example1/exampledocs/post.jar
/root/Desktop/web/*.json
but it is giving error -
19:06:22 ERROR SolrCore org.apache.solr.common.SolrException: Unknown
command: subDoma
Thanks for the answer. Yes, Tika extracts, but does not index the content. Here is the
Solr response
...
"content": [ " 9118_xmessengereu_v18ximpda.jar dimonvideo.ru.txt " ],
...
None of these files are in the index.
Any ideas?
17.10.2013 17:20, Roland Everaert wrote:
Even if I don't test it myself, you ca
Although I haven't tested it myself, you can use Tika; it can extract
documents from zip archives and index them, but of course it depends on the
file types in the archive.
Regards,
Roland.
On Thu, Oct 17, 2013 at 2:36 PM, wonder wrote:
> Does anybody know how to index files in zip archives?
>
The default Solr stopwords.txt file is empty, so SOMEBODY created that
non-empty stop words file.
The StopFilterFactory token filter in the field type analyzer controls stop
word processing. You can remove that step entirely, or different field types
can reference different stop word files, or
Wow, thanks for all that. I just upgraded and linked my plugins, and it seems fine
so far, but I have run into another issue:
while adding a document to the SolrCloud it says -
org.apache.solr.common.SolrException: Unknown document router
'{name=compositeId}'
in the clusterstate.json i can see -
Thanks for the answer. If I don't want to store and index any fields, I do:
multiValued="true"/>
multiValued="true"/>
multiValued="true"/>
multiValued="true"/>
multiValued="true"/>
multiValued="true"/>
multiValued="true"/>
multiValued="true"/>
multiValued="true"/>
stored="false" multiValued="true"/>
O
Does anybody know how to index files in zip archives?
Hi,
I helped a customer deploy Solr+ManifoldCF and everything is going
quite smoothly, but every time Solr raises an exception, the ManifoldCF job
feeding Solr aborts. I would like to know if it is possible to configure the
ExtractRequestHandler to ignore errors like it seems to be possibl
Hello everyone! Please tell me why Solr freezes when I add this file:
http://yadi.sk/d/dy-RtcHXB7KZU
The response from the server does not come.
curl
"http://localhost:8085/solr/myCollection/update/extract?literal.id=doc1&literal.fileName=as&uprefix=attr_&&commit=true";
-F "myfile=@/media/PENDR
Thanks Shalin.
Regards,
Bharat Akkinepalli
-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Thursday, October 17, 2013 1:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.4 - Master/Slave configuration - Replication Issue with
Commits after de
Tim, if a separate VLAN was an option, I wouldn't be trying to use SSL.
-- Chris
On Wed, Oct 16, 2013 at 7:27 PM, Tim Vaillancourt wrote:
> Not important, but I'm also curious why you would want SSL on Solr (adds
> overhead, complexity, harder-to-troubleshoot, etc)?
>
> To avoid the overhead, c
Sorry for previous spam (something ate my message)
I have the same problem but with reload action
ENV:
- 3x Solr 4.2.1 with 4 cores each
- ZK
Before error I have:
- 14, 2013 5:25:36 AM CollectionsHandler handleReloadAction INFO: Reloading
Collection : name=products&action=RELOAD
- hundreds of (
On 16 October 2013 11:48, RadhaJayalakshmi
wrote:
> Hi,
> My setup is
> Zookeeper ensemble - running with 3 nodes
> Tomcats - 9 Tomcat instances are brought up, by registering with
> zookeeper.
>
> Steps :
> 1) I uploaded the solr configuration like db_data_config, solrconfig,
> schema
> xmls int
On the SolrCloud wiki page (https://wiki.apache.org/solr/SolrCloud), I
found this statement:
The Grouping feature only works if groups are in the same shard. You
must use the custom sharding feature to use the Grouping feature.
However, the Distributed Search page
(https://wiki.apache.o
Why don't you check these:
- Content extraction with Apache Tika (
http://www.youtube.com/watch?v=ifgFjAeTOws)
- ExtractingRequestHandler (
http://wiki.apache.org/solr/ExtractingRequestHandler)
- Uploading Data with Solr Cell using Apache Tika (
https://cwiki.apache.org/confluence/display/solr/Upl
Hello everyone! Please tell me how and where to set Tika options in
Solr. Where is the Tika config? I want to know how I can eliminate response
attributes I do not need (such as links or images). I am also
interested in how I can extract and index only metadata for several file formats.
Hi,
Imagine the following situation. You have a corpus of documents and a list of
queries extracted from a production environment. The corpus hasn't been
manually annotated with relevant/non-relevant tags for every query. Then you
configure various Solr instances, changing the schema (adding synonyms,
sto
This question was also asked some 10 months ago in
http://lucene.472066.n3.nabble.com/SolrCloud-4-1-change-config-set-for-a-collection-td4037456.html,
and then the answer was negative, but here it goes again, maybe now it's
different.
Is it possible to change the config set of a collection using the Co
Thank you,
I found the file with the stopwords and noticed that my local file is
empty (comments only) and the one on my webserver has a big list of
English stopwords. That seems to be the problem.
I think in general it is a good idea to use stopwords for random
searches, but it is not useful in
Stopwords are small words such as "and", "the" or "is" that we might
choose to exclude from our documents and queries because they are such
common terms. Once you have stripped stop words from your above query,
all that is left is the word "wild", or so is being suggested.
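The effect can be illustrated with a small sketch. It is a simplification under assumptions: real stop word handling happens in the field type's analyzer chain, not in application code, the stop list here contains only the three words named above, and the sample query is hypothetical because the original query is not quoted in this message.

```python
# Only the three example words mentioned above; a real stopwords.txt may hold many more.
STOPWORDS = {"and", "the", "is"}


def strip_stopwords(query, stopwords=STOPWORDS):
    """Lower-case the query, split on whitespace, and drop stop words."""
    return [term for term in query.lower().split() if term not in stopwords]


# Hypothetical query: every term except "wild" is a stop word.
remaining = strip_stopwords("The wild and the")
```

With a stop list like this, a multi-word query can collapse to a single searchable term, which changes what matches.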
Somewhere in your config