Hello All,
I need help with an error related to Solr indexing. We are using Solr 6.6.3 and
Nutch crawler 1.14. While indexing data to Solr we see the error below:
possible analysis error: Document contains at least one immense term in
field="content" (whose UTF8 encoding is longer than the max length
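For readers hitting the same message: Lucene rejects any single indexed term whose UTF-8 encoding exceeds 32766 bytes, which usually means the whole field value is being indexed as one token. Switching the field to a tokenized type is the more common fix. As a stopgap, the value can be cut to the byte budget before it is sent. A minimal sketch of that workaround, assuming the text is available as a Python string (the function name and the constant name are hypothetical, not part of Nutch or Solr):

```python
# Lucene's documented per-term limit, in bytes of UTF-8.
MAX_TERM_BYTES = 32766


def truncate_utf8(text: str, max_bytes: int = MAX_TERM_BYTES) -> str:
    """Return text cut so that its UTF-8 encoding fits in max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing bytes may split a multi-byte character; errors="ignore"
    # drops the incomplete trailing sequence instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

Truncation loses content, so it only makes sense when the field does not need to be fully searchable.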
Team,
I need suggestions on how to remove duplicate entries while indexing to
Solr. Below are sample entries I see in the Solr collection; I need to
remove the one ending with /:
https://www.abc.com/2018/test.html
https://www.abc.com/2018/test.html/
Thank you
--
Sent from: http
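The two sample URLs above differ only by a trailing slash, so one way to think about the problem is to normalize every URL to a single form before it becomes the document key. A minimal sketch of that idea, assuming the URL is used as the unique key (the helper names are hypothetical; in Nutch itself a rule like this would normally live in a URL normalizer such as the urlnormalizer-regex plugin, not in external code):

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Strip trailing slashes from the path, keeping "/" for the site root."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def dedupe(urls):
    """Return normalized URLs in first-seen order, without duplicates."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


samples = [
    "https://www.abc.com/2018/test.html",
    "https://www.abc.com/2018/test.html/",
]
print(dedupe(samples))
```

Whether the slash and non-slash forms are really the same page depends on the site, so this should be checked before applying it to a whole crawl.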
Hello All,
I would like to know how Solr handles stale pages. For example, there
are 30 documents indexed for the domain abc.com, and in the second collection I
have only 27 documents for the same abc.com domain, and these need to be
indexed in Solr.
So how will Solr handle the old pages alr
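To make the 30 versus 27 example concrete: re-indexing only adds or overwrites documents by their unique key, so documents that are missing from the newer set stay in the index until they are deleted explicitly. A minimal sketch of finding those leftovers, assuming the unique key is the page URL and that both ID lists can be exported (the function name and the page URLs are hypothetical):

```python
def find_stale_ids(indexed_ids, crawled_ids):
    """Return IDs present in the index but absent from the latest crawl."""
    return sorted(set(indexed_ids) - set(crawled_ids))


# Hypothetical data mirroring the example: 30 indexed, 27 still crawled.
indexed = [f"https://www.abc.com/page{i}.html" for i in range(30)]
crawled = [f"https://www.abc.com/page{i}.html" for i in range(27)]

stale = find_stale_ids(indexed, crawled)
print(len(stale), "stale documents would need a delete request")
```

The resulting IDs are the candidates for a delete-by-id request against the collection.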
Thanks for the update.
I'm using Nutch 1.14, Solr 6.6.3, and ZooKeeper 3.4.12. We are running two
Solr instances configured as SolrCloud. Please let me know if anything is missing.
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Hello Team,
I need suggestions on Solr indexing. We are using Solr 6.6.3 and Nutch 1.14.
I see an unknown field 'cache' error while indexing data to Solr, so I added
the entry below to the field section of schema.xml for Solr.
I tried indexing the data again, and this time the error is unknown field 'date'.
How
Team,
I need suggestions on a Solr indexing error: unknown field 'metatag.description'.
We are using Nutch 1.14 and Solr 6.6.3.
Our nutch-site.xml is below:
protocol-httpclient|urlfilter-regex|index-(basic|more)|query-(basic|site|url|lang)|indexer-solr|nutch-extensionpoints|parse-(html|tika|text|msexcel|