Solr Indexing error

2018-08-28 Thread kunhu0...@gmail.com
Hello all, I need help with an error related to Solr indexing. We are using Solr 6.6.3 and the Nutch crawler 1.14. While indexing data to Solr we see errors such as the one below: possible analysis error: Document contains at least one immense term in field="content" (whose UTF8 encoding is longer than the max length
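
As a rough illustration of what this error measures, the sketch below compares the UTF-8 byte length of a field value against Lucene's per-term limit. The value 32766 comes from Lucene's `IndexWriter.MAX_TERM_LENGTH`, not from the truncated message above. The helper `find_immense_values` and the sample documents are hypothetical, and the sketch assumes the field is indexed as a single untokenized term (for example a `string` type), which is one common way a whole `content` value ends up as one term.

```python
# Lucene's hard limit on the UTF-8 encoded length of one indexed term
# (IndexWriter.MAX_TERM_LENGTH); stated here as background, not taken
# from the truncated error message.
MAX_TERM_BYTES = 32766


def utf8_len(value: str) -> int:
    """Length of the value in bytes once encoded as UTF-8."""
    return len(value.encode("utf-8"))


def find_immense_values(docs, field="content", limit=MAX_TERM_BYTES):
    """Return (id, byte_length) pairs for docs whose field exceeds the limit.

    Hypothetical helper: `docs` is a list of plain dicts, not a Solr or
    Nutch data structure.
    """
    oversized = []
    for doc in docs:
        value = doc.get(field) or ""
        size = utf8_len(value)
        if size > limit:
            oversized.append((doc.get("id"), size))
    return oversized


# Made-up sample documents for illustration only.
docs = [
    {"id": "short", "content": "hello"},
    {"id": "long", "content": "x" * 40000},
    # 20000 two-byte characters: under the limit in characters, over it in bytes.
    {"id": "multibyte", "content": "é" * 20000},
]

print(find_immense_values(docs))
```

The third document shows why the check has to count bytes rather than characters.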

Solr indexing duplicate URLs ending with /

2018-08-29 Thread kunhu0...@gmail.com
Team, I need a suggestion on how to remove duplicate entries while indexing to Solr. Below are sample entries I see in the Solr collection; I need to remove the one ending with /: https://www.abc.com/2018/test.html https://www.abc.com/2018/test.html/ Thank you
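
One way to think about the trailing-slash duplicates is as a normalization rule applied before documents reach the index. The sketch below is a standalone illustration using only the Python standard library; `strip_trailing_slash` and `dedupe` are hypothetical names, and in a Nutch setup the same rule would more likely live in a URL normalizer configuration than in a separate script.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_trailing_slash(url: str) -> str:
    """Drop trailing slashes from the path, but keep a bare root path '/'."""
    parts = urlsplit(url)
    path = parts.path
    if len(path) > 1 and path.endswith("/"):
        # Fall back to "/" if the path consisted only of slashes.
        path = path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


def dedupe(urls):
    """Return normalized URLs in first-seen order, without duplicates."""
    seen = set()
    unique = []
    for url in urls:
        key = strip_trailing_slash(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# The two sample entries from the message above.
urls = [
    "https://www.abc.com/2018/test.html",
    "https://www.abc.com/2018/test.html/",
]
print(dedupe(urls))
```

Treating both forms as the same page is an assumption; some servers return different content for a path with and without the trailing slash.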

Solr Stale pages

2018-08-30 Thread kunhu0...@gmail.com
Hello all, I would like to know how Solr handles stale pages. For example, 30 documents are indexed for the domain abc.com, and in the second collection I have only 27 documents for the same abc.com domain, which need to be indexed in Solr. So how will Solr handle the old pages alr
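
The 30-versus-27 scenario can be pictured as a set difference: re-indexing overwrites documents whose unique key matches, and any document missing from the new batch stays in the index unless something deletes it. The sketch below only computes which IDs would be stale; `stale_ids` and the page URLs are hypothetical, and issuing the actual deletes against Solr is left out.

```python
def stale_ids(indexed_ids, crawled_ids):
    """IDs present in the index but absent from the latest crawl."""
    return sorted(set(indexed_ids) - set(crawled_ids))


# Made-up IDs: 30 documents already indexed, 27 found by the second crawl.
indexed = [f"https://abc.com/page{i}.html" for i in range(1, 31)]
crawled = indexed[:27]

to_delete = stale_ids(indexed, crawled)
print(len(to_delete))  # 3
for doc_id in to_delete:
    print(doc_id)
```

The resulting list is what a cleanup step would need to remove from the index.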

Re: Solr Stale pages

2018-08-30 Thread kunhu0...@gmail.com
Thanks for the update. I'm using Nutch 1.14, Solr 6.6.3, and ZooKeeper 3.4.12. We are using two Solr instances configured as SolrCloud. Please let me know if anything is missing. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Unknown field "cache"

2018-08-31 Thread kunhu0...@gmail.com
Hello Team, I need suggestions on Solr indexing. We are using Solr 6.6.3 and Nutch 1.14. I see an unknown field 'cache' error while indexing data to Solr, so I added the below entry in the field section of schema.xml for Solr. I tried indexing the data again, and this time the error is unknown field 'date'. How

Solr unknown field 'metatag.description'

2018-08-31 Thread kunhu0...@gmail.com
Team, I need suggestions on a Solr indexing error: unknown field 'metatag.description'. We are using Nutch 1.14 and Solr 6.6.3. nutch-site.xml is below: protocol-httpclient|urlfilter-regex|index-(basic|more)|query-(basic|site|url|lang)|indexer-solr|nutch-extensionpoints|parse-(html|tika|text|msexcel|