Re: Question on index time de-duplication

2015-10-29 Thread Zheng Lin Edwin Yeo
Yes, you can try to use the SignatureUpdateProcessorFactory to do a hashing of the content to a signature field, and group the signature field during your search. You can find more information here: https://cwiki.apache.org/confluence/display/solr/De-Duplication I have been using this method to g
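
As a rough illustration of the idea only (this is not how Solr implements it), hashing content into a signature field and then grouping on that field can be sketched in plain Python. The field names `content`, `year`, and `signature`, the sample documents, and the choice of MD5 are all assumptions made for the example; in Solr the hashing would be configured through SignatureUpdateProcessorFactory and the grouping would happen at query time.

```python
import hashlib
from collections import defaultdict


def add_signature(doc, fields=("content",)):
    """Return a copy of doc with a 'signature' field hashed from the given fields."""
    # Join the chosen field values with a separator that is unlikely to occur in text.
    joined = "\x00".join(str(doc.get(f, "")) for f in fields)
    signed = dict(doc)
    signed["signature"] = hashlib.md5(joined.encode("utf-8")).hexdigest()
    return signed


def group_by_signature(docs):
    """Group documents that share the same signature value."""
    groups = defaultdict(list)
    for d in docs:
        groups[d["signature"]].append(d)
    return dict(groups)


# Hypothetical documents: the same content published for two release years.
docs = [
    {"id": "a-2014", "year": 2014, "content": "How to install the product"},
    {"id": "a-2015", "year": 2015, "content": "How to install the product"},
    {"id": "b-2015", "year": 2015, "content": "Release notes"},
]
signed = [add_signature(d) for d in docs]
groups = group_by_signature(signed)
```

Every document stays in the index, so year-specific searches still see each copy, while a generic search can collapse the copies that share a signature.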

Re: Solr 5.3.1 CREATE defaults to schema-less mode Java version 1.7.0_45

2015-10-29 Thread Erick Erickson
I'm pretty confused about what you're trying to do. You mention using the SolrCloud UI to look at your core, but on the other hand you also mention using the core admin to create the core. Trying to use the core admin commands with SolrCloud is a recipe for disaster. Under the covers, the _collect

Re: Problem with the Content Field during Solr Indexing

2015-10-29 Thread Zheng Lin Edwin Yeo
The "\n" actually means new line as decoded by Solr from the indexed document. What is the file extension of your image file, and which method are you using to do the indexing? Regards, Edwin On 30 October 2015 at 04:38, Shruti Mundra wrote: > Hi, > > When I'm trying index an image file dire

Re: Solr 5.3.1 CREATE defaults to schema-less mode Java version 1.7.0_45

2015-10-29 Thread natasha
Note, if I attempt to CREATE the core using Solr 5.3.0 on my openstack machine (Java version 1.7.0) I have no issues. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-5-3-1-CREATE-defaults-to-schema-less-mode-Java-version-1-7-0-45-tp4237305p4237307.html Sent from the S

Question on index time de-duplication

2015-10-29 Thread Shamik Bandopadhyay
Hi, I'm looking to customize index-time de-duplication. Here's my use case and what I'm trying to achieve. I have identical documents coming from different release years of a given product. I need to index them in Solr as they are required in the context of each individual year. But there's a generic search

Solr 5.3.1 CREATE defaults to schema-less mode Java version 1.7.0_45

2015-10-29 Thread natasha
Hi, I just downloaded Solr 5.3.1, and after starting Solr with the following command: bin/solr start -p 8985 I attempt to CREATE a Solr-core with the following command: curl 'http://localhost:8985/solr/admin/cores?action=CREATE&name=test-core&instanceDir=/Users/nw/Downloads/twc-session-dash/colle

Re: Solr collection alias - how rank is affected

2015-10-29 Thread Ronald Xiao
Using global IDF, if data is not even On Tuesday, October 27, 2015, Markus Jelsma wrote: > Hello - regarding fairly random/smooth distribution, you will notice it > for sure. A solution there is to use distributed collection statistics. On > top of that you might want to rely on docCount, not ma

Re: restore quorum after majority of zk nodes down

2015-10-29 Thread Pushkar Raste
How about having, let's say, 4 nodes on each side and making one node in one of the data centers an observer? When the data center with the majority of the nodes goes down, bounce the observer by reconfiguring it as a voting member. You will have to revert it to being an observer afterwards. There will be a short

Re: Closing Windows CMD kills Solr

2015-10-29 Thread Timothy Potter
would launching the java process with javaw help here? On Thu, Oct 29, 2015 at 4:03 AM, Zheng Lin Edwin Yeo wrote: > Yes, this is the expected behaviour. Once you close the command window, > Solr will stop running. This has happened to me several times. Just to > check, which version of Solr are

Using geotopic parser

2015-10-29 Thread Salonee Rege
We are using the geotopic parser on html pages. Does the geotopic parser only take .geot files? Kindly help *Salonee Rege* USC Viterbi School of Engineering University of Southern California Master of Computer Science - Student Computer Science - B.E salon...@usc.edu *||* *619-709-6756*

Re: org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: id

2015-10-29 Thread Shawn Heisey
On 10/29/2015 1:54 AM, fabigol wrote: > hi, > thank to your reply > When you says > 'You must have a field labeled "id" in the doc sent to Solr'. it's in the > response of the select that i must get an "id"? i must write "select > 'something' as ID" is it good??? > in schema.xml i have the fo

Re: org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: id

2015-10-29 Thread Erick Erickson
When you have a <uniqueKey> defined in schema.xml, that is a field that must exist in every doc you send to Solr. Actually, you'll see in the definition for the field that it has required="true" set. Assuming you have a column "carte_id" being selected for all docs inserted into Solr that should

Re: SolrJ stalls/hangs on client.add(); and doesn't return

2015-10-29 Thread Erick Erickson
You're sending 100K docs in a single packet? It's vaguely possible that you're getting a timeout although that doesn't square with no docs being indexed... Hmmm, to check you could do a manual commit. Or watch the Solr log to see if update requests ever go there. Or you're running out of memory o
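
One way to test whether the size of a single request is the problem would be to send the documents in smaller batches. The helper below is only a sketch of that idea in Python; the batch size of 1000 is an arbitrary assumption, and the actual send call (for example `client.add(...)` in SolrJ) is deliberately left out.

```python
def batches(docs, size=1000):
    """Yield consecutive slices of docs, each holding at most `size` items."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


# Hypothetical usage: 2500 placeholder documents split into batches of 1000.
docs = [{"id": str(i)} for i in range(2500)]
batch_sizes = [len(batch) for batch in batches(docs, size=1000)]
```

Each batch would then be sent as its own update request, which also makes it easier to see in the Solr log which request, if any, never arrives.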

Re: Solr for Pictures

2015-10-29 Thread Rallavagu
I was playing with exiftool (written in perl) and a custom java class built using the metadata-extractor project (https://github.com/drewnoakes/metadata-extractor) and wondering if there is anything built into Solr or are there any best practices (general practices) to index pictures. On 10/29/15

Re: Solr for Pictures

2015-10-29 Thread Daniel Valdivia
Some extra googling yielded this wiki page on an integration between Tika and EXIFTool: https://wiki.apache.org/tika/EXIFToolParser > On Oct 29, 2015, at 1:48 PM, Daniel Valdivia wrote: > > I think you can look into Tika for this https://tika.apache.or

RE: Solr for Pictures

2015-10-29 Thread Markus Jelsma
Hi - Solr does integrate with Apache Tika, which happily accepts images and other media formats. I am not sure if EXIF is exposed though but you might want to try. Otherwise patch it up or use Tika in your own process that indexes data to Solr. https://cwiki.apache.org/confluence/display/solr

Re: problem with solr auto add core after restart

2015-10-29 Thread Erick Erickson
What errors, if any, do you see in the Solr logs? The information here isn't enough to say much. Best, Erick On Thu, Oct 29, 2015 at 7:44 AM, sara hajili wrote: > hi, > i add this in solr.xml file : > ${coreRootDirectory:} > and in each core i added these to core.properties > loadOnStartup=true

Re: Solr for Pictures

2015-10-29 Thread Daniel Valdivia
I think you can look into Tika for this https://tika.apache.org/ There’s handlers to integrate Tika and Solr, some context: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

Solr for Pictures

2015-10-29 Thread Rallavagu
In general, is there a built-in data handler to index pictures (essentially, EXIF and other data embedded in an image)? If not, what is the best practice to do so? Thanks.

Fetching data from the Geotopic Parser

2015-10-29 Thread Shruti Mundra
Hi, We have started a Geotopic Parser on a specific port and when we tried to get the data using a socket connection we are receiving this error message - "Connected Message send HTTP/1.1 400 Bad Request Connection: close Server: Jetty(8.y.z-SNAPSHOT) Error: 400" What could be the reason f

Problem with the Content Field during Solr Indexing

2015-10-29 Thread Shruti Mundra
Hi, When I try to index an image file directly to Solr, the content attribute consists of trails of "\n"s and not the data. We are successful in getting the metadata for that image. Can anyone help us out on how we could get the content along with the metadata? Thanks! - Shruti Mundra

Re: restore quorum after majority of zk nodes down

2015-10-29 Thread Matteo Grolla
Hi Walter, it's not a problem to take down zk for a short (1h) time and reconfigure it. Meanwhile solr would go in readonly mode. I'd like feedback on the fastest way to do this. Would it work to just reconfigure the cluster with other 2 empty zk nodes? Would they correctly sync from the none

Re: restore quorum after majority of zk nodes down

2015-10-29 Thread Walter Underwood
You can't. Zookeeper needs a majority. One node is not a majority of a three node ensemble. There is no way to split a Solr Cloud cluster across two datacenters and have high availability. You can do that with three datacenters. You can probably bring up a new Zookeeper ensemble and configure t
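
The majority rule behind this answer is simple arithmetic, sketched here in Python. The function names are made up for the illustration and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size):
    """Smallest number of voting nodes that forms a strict majority."""
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size, nodes_up):
    """True if the surviving voting nodes still form a majority."""
    return nodes_up >= quorum_size(ensemble_size)


# Three-node ensemble split 2/1 across two datacenters:
# losing the datacenter that holds two nodes leaves one node, which is not a majority.
survives_losing_dc1 = has_quorum(3, nodes_up=1)
# Losing the datacenter that holds one node leaves two nodes, which is a majority.
survives_losing_dc2 = has_quorum(3, nodes_up=2)
```

However the voting nodes are divided between two sites, one site must hold at least half of them, so losing that site always drops the ensemble below a majority.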

Index Metatags in Nutch site.xml

2015-10-29 Thread Salonee Rege
We have finished running the bin/nutch solrindex command on our Nutch segments. The data is getting indexed. I followed this link: https://wiki.apache.org/nutch/IndexMetatags . The metatags description and keywords were the sample ones we used. But they are not getting indexed. What could be the proble

restore quorum after majority of zk nodes down

2015-10-29 Thread Matteo Grolla
I'm designing a solr cloud installation where nodes from a single cluster are distributed on 2 datacenters which are close and very well connected. Let's say that zk nodes zk1, zk2 are on DC1 and zk3 is on DC2 and let's say that DC1 goes down and the cluster is left with zk3. How can I restore a zk

Re: solr 5.3.0 master-slave: TWO segments after optimize

2015-10-29 Thread Andrii Berezhynskyi
Erick, they are not going away after reload. Emir, increased cpu and response time are on slaves. On all slaves. Here is a thread dump https://gist.github.com/andriiberezhynskyi/739d59cf78b043d653da (though not sure that I did it right, I used jstack -F PID). Thanks for your help!

Looking for SOLR consulting help

2015-10-29 Thread William Bell
Healthgrades is looking for Solr consulting assistance. Rate is negotiable based on skillsets. $125 - $175/hr We are flexible on time. 1. Solr 5.x experience 2. Tuning performance 3. Relevancy and Autosuggest experience 4. Move to Solr Cloud 5. Amazon Linux Debian experience 6. Java experience u

Re: Stem Words Highlighted - Keyword Not Highlighted

2015-10-29 Thread Jack Krupansky
Did you index the data before adding the word delimiter filter? The white space tokenizer preserves the period after "stocks.", but the WDF should remove it. The period is likely interfering with stemming. Are your filters the same for index time and query time? -- Jack Krupansky On Tue, Aug 18,

Re: language plugin

2015-10-29 Thread Jack Krupansky
Are you trying to do an atomic update without the content field? If so, it sounds like Solr needs an enhancement (bug fix?) so that language detection would be skipped if the input field is not present. Or maybe that could be an option. -- Jack Krupansky On Thu, Oct 29, 2015 at 3:25 AM, Chaushu,

Re: language plugin

2015-10-29 Thread Alexandre Rafalovitch
Could you post your full chain definition? It's an interesting problem, but hard to answer without seeing the exact current configuration. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 29 October 2015 at 03:25, Chaushu, Shani wr

problem with solr auto add core after restart

2015-10-29 Thread sara hajili
hi, i add this in solr.xml file : ${coreRootDirectory:} and in each core i added these to core.properties loadOnStartup=true and now if i stop and start solr from solr_home/bin after restart solr start and automatically added cores to solr. but when i restart solr service in linux as : service s

Re: Closing Windows CMD kills Solr

2015-10-29 Thread Zheng Lin Edwin Yeo
Yes, this is the expected behaviour. Once you close the command window, Solr will stop running. This has happened to me several times. Just to check, which version of Solr are you using? I have tried NSSM before, and it works for Solr 5.0 and Solr 5.1. However, when I move up to Solr 5.3.0, I wasn

Is it possible to use JiebaTokenizer for multilingual documents?

2015-10-29 Thread Zheng Lin Edwin Yeo
I would like to check, is it possible to use JiebaTokenizerFactory to index Multilingual documents in Solr? I found that JiebaTokenizerFactory works better for Chinese characters as compared to HMMChineseTokenizerFactory. However, for English characters, the JiebaTokenizerFactory is cutting the w

Re: Closing Windows CMD kills Solr

2015-10-29 Thread Charlie Hull
On 29/10/2015 00:12, Steven White wrote: Hi Folks, I don't understand if this is an expected behavior or not. On Windows, I start Solr from a command prompt like so: bin\solr start -p 8983 -s C:\MySolrIndex Now, once I close the command prompt the Java process that started Solr is killed

SolrJ stalls/hangs on client.add(); and doesn't return

2015-10-29 Thread Markus Jelsma
Hello - we have some processes periodically sending documents to 5.3.0 in local mode using ConcurrentUpdateSolrClient 5.3.0, it has queueSize 10 and threadCount 4, just chosen arbitrarily having no idea what is right. Usually it's a few thousand up to some tens of thousands of rather small docum

RE: Closing Windows CMD kills Solr

2015-10-29 Thread Routley, Alan
Hi Steve, This is expected behaviour. I get around this by creating a scheduled task, set to run at startup to start Solr. Hope this helps Alan -Original Message- From: Steven White [mailto:swhite4...@gmail.com] Sent: 29 October 2015 00:13 To: solr-user@lucene.apache.org Subject: Clos

Re: org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: id

2015-10-29 Thread fabigol
hi, thanks for your reply. When you say 'You must have a field labeled "id" in the doc sent to Solr', is it in the response of the select that I must get an "id"? Must I write "select 'something' as ID"? Is that right? In schema.xml I have the following line ID end my data-import file i have I

Re: Many mapping files

2015-10-29 Thread Gora Mohanty
On 28 October 2015 at 19:45, fabigol wrote: > > Thank for your response. > I have 7 files *.xml. I already worked with SOlr but i have an only file. My > question is why in this project there's 7 files describing an entity. I am afraid that it is still difficult for an external person to guess at

language plugin

2015-10-29 Thread Chaushu, Shani
Hi, I'm using the solr language detection plugin on a field named "content" (solr 4.10, plugin LangDetectLanguageIdentifierUpdateProcessorFactory). When I index for the first time it works fine, but if I want to set one field again (regardless of whether it's the content or not) it goes to its default lan