Customizing Solr Dedupe
I'm facing a challenge using de-duplication of Solr documents. De-duplication is done using TextProfileSignature with the following parameters: field1, field2, field3 0.5 3

Here field3 is normal text with a few lines of data. field1 and field2 can contain up to 5 or 6 words of data. I want to de-duplicate when the data in field1 and field2 is exactly the same and 90% of the lines in field3 match those in another document. Is there any way to achieve this?

--
View this message in context: http://lucene.472066.n3.nabble.com/Customzing-Solr-Dedupe-tp4196879.html
Sent from the Solr - User mailing list archive at Nabble.com.
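The rule asked for above can be written down as a pairwise comparison. The sketch below is plain Python, not a Solr Signature implementation: the function names and the 0.9 default are illustrative assumptions, and it reads "90% of the lines match" as "at least 90% of the distinct non-empty lines of the first document's field3 also occur in the second's". A signature-based dedupe compares a single hash per document, so a pairwise threshold like this would have to run as a separate comparison step.

```python
def line_overlap(text_a, text_b):
    """Fraction of distinct non-empty lines of text_a that also occur in text_b.

    Note: this is asymmetric; it measures text_a against text_b only.
    """
    lines_a = {ln.strip() for ln in text_a.splitlines() if ln.strip()}
    lines_b = {ln.strip() for ln in text_b.splitlines() if ln.strip()}
    if not lines_a:
        # Two empty texts count as a full match; empty vs. non-empty does not.
        return 1.0 if not lines_b else 0.0
    return len(lines_a & lines_b) / len(lines_a)


def is_duplicate(doc_a, doc_b, threshold=0.9):
    """True if field1 and field2 match exactly and field3 overlaps enough."""
    if doc_a["field1"] != doc_b["field1"] or doc_a["field2"] != doc_b["field2"]:
        return False
    return line_overlap(doc_a["field3"], doc_b["field3"]) >= threshold
```

Comparing every pair of documents this way is quadratic, so in practice one would probably first group documents by the exact field1/field2 values and only compare field3 within each group.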
Update jar file in Solr 4.4.0
I have a SolrCloud configuration which we run on 4 servers. We use Tomcat as the web server for Solr. I have 5 ZooKeeper nodes to maintain the data replication.

I have added a jar file with a custom update processor. It is in the shared folder which is mentioned in solr.xml. While creating the first version of this jar file I gave it the name updateProcessor.0.1.jar. Even though the folder was shared, the jar file was added on all 4 servers.

But now I have to update the update processor. For this I created updateProcessor0.2.jar. I deleted updateProcessor.0.1.jar from each server and added the new one. But the changes were not seen. Any ideas what I am doing wrong? Should this be checked using zkcli?

--
View this message in context: http://lucene.472066.n3.nabble.com/Update-jar-file-in-Solr-4-4-0-tp4282164.html
Re: Update jar file in Solr 4.4.0
Actually, my changes in updateProcessor.0.1.jar were not being reflected (functionality-wise). I was getting no errors.

Well, I dropped the jar file, updateProcessor.0.1.jar, in the shared folder only. The entry added in the solrconfig file was ** In updateProcessor.0.1.jar I had a class file with the path org.apache.solr.update.processor.MyUpdateProcessorFactory

However, I have made some changes and it is working as expected.

*Solution that worked for me:* I changed the entry in solrconfig to ** Then I created a new jar file, updateProcessor.0.2.jar, with the following class: org.apache.solr.update.processor.MyUpdateProcessorFactory2

Thanks for your help. I will check with the team about the ZooKeepers though :)

Regards,
Aayush

> First, having 5 Zookeeper nodes to manage 4 Solr nodes is serious overkill. Three should be more than sufficient.
>
> What did you put in your configuration? Does your directive in solrconfig.xml mention updateProcessor.0.1?
>
> And what error are you seeing exactly? When Solr starts up, part of the voluminous messages are where exactly it looks for jar files. So you should be able to see exactly what Solr is aware of.
>
> If you didn't specify a directive, one assumes you dropped the jar somewhere in the Tomcat hive. Is it in the right place? Did you restart Tomcat? (Not sure this last is necessary, but just in case...)
>
> Best,
> Erick
>
> On Mon, Jun 13, 2016 at 7:22 PM, thakkar.aayush <thakkar.aayush@> wrote:
>> I have Solr cloud configuration which we run on 4 servers. We use tomcat as web server for solr. I have 5 zookeepers to maintain the data-replication. I have added a jar file with custom update processor. This is in shared folder which is mentioned in solr.xml. While creating the first version of this jar file I gave the name updateProcessor.0.1.jar as the file name. Even though it was shared, jar files were added in all the 4 servers. But now I have to update the updateProcessor. For this I created updateProcessor0.2.jar. I deleted the updateProcessor.0.1.jar from each server and added a new one. But changes were not seen. Any ideas what I am doing wrong? Should this be checked using zkcli?

--
View this message in context: http://lucene.472066.n3.nabble.com/Update-jar-file-in-Solr-4-4-0-tp4282164p4282328.html
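When a jar swap seems to have no effect, one quick sanity check is to confirm which jar files on a server actually contain the class in question. The following is a minimal sketch using only the Python standard library; the function name is made up for illustration, and it only inspects files on disk. It does not tell you what Solr or Tomcat has loaded; for that, the startup log Erick mentions is the place to look.

```python
import zipfile


def jars_containing(class_name, jar_paths):
    """Return the jar paths that contain the given fully qualified class.

    A jar is a zip archive; a class a.b.C is stored as the entry a/b/C.class.
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for path in jar_paths:
        with zipfile.ZipFile(path) as jar:
            if entry in jar.namelist():
                hits.append(path)
    return hits
```

For example, calling it with org.apache.solr.update.processor.MyUpdateProcessorFactory and every jar found in the shared folder would show whether an old copy of the class is still present next to the new one.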
Solr facet search improvements
I have around 1 million job titles indexed in Solr and am looking to improve the faceted search results on job title matches. For example, when a job search for *Research Scientist Computer Architecture* is made, the facet field title, which is tokenized in Solr, gives the following results:

1. Senior Data Scientist
2. PARALLEL COMPUTING SOFTWARE ENGINEER
3. Engineer/Scientist 4
4. Data Scientist
5. Engineer/Scientist
6. Senior Research Scientist
7. Research Scientist-Wireless Networks
8. Research Scientist-Andriod Development
9. Quantum Computing Theorist Job
10. Data Sceintist Smart Analytics

I want to be able to improve / optimize the job titles and be able to make exclusions and some normalizations. Is this possible with Solr? What is the best way to have more granular control over the faceted search results? For example, *Engineer/Scientist 4* is not useful and too specific, and titles like *Quantum Computing Theorist* would ideally also be excluded.

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-facet-search-improvements-tp4182502.html
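The kind of normalization and exclusion described above could be prototyped outside Solr, either on the titles before they are indexed or on the facet values after they come back. The sketch below is plain Python under stated assumptions: the rules (lowercasing, dropping a trailing "Job", dropping a trailing level number) and the exclusion list are hypothetical examples taken from the titles in this message, not a Solr feature or a recommended rule set.

```python
import re
from collections import Counter

# Hypothetical exclusion list, written in normalized (lowercase) form.
EXCLUDED = {"engineer/scientist", "quantum computing theorist"}


def normalize_title(title):
    """Apply a few illustrative clean-up rules to one job title."""
    t = title.strip().lower()
    t = re.sub(r"\s+job$", "", t)   # "... Theorist Job" -> "... theorist"
    t = re.sub(r"\s+\d+$", "", t)   # "Engineer/Scientist 4" -> "engineer/scientist"
    t = re.sub(r"\s+", " ", t)      # collapse repeated whitespace
    return t


def facet_counts(titles, excluded=EXCLUDED):
    """Count normalized titles, skipping anything on the exclusion list."""
    counts = Counter()
    for title in titles:
        norm = normalize_title(title)
        if norm and norm not in excluded:
            counts[norm] += 1
    return counts
```

With rules like these, "Engineer/Scientist 4" and "Engineer/Scientist" collapse into one value that the exclusion list then drops, while "Data Scientist" in different casings is counted once. Misspelled titles such as "Data Sceintist Smart Analytics" are not handled here and would need a separate correction step.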