I agree with Upayavira:
title extraction is an activity independent of Solr.
Furthermore, I would say it's easy to extract the title before the Solr
indexing stage.
By the time the content arrives at the Solr update processors, it is already a
String.
If you want to do some clever title extraction, fo
It depends a lot on what the documents are. Some document formats have
metadata that stores a title. Perhaps you can just extract that.
If not, once you've extracted the content, perhaps you could just have a
special field that is the first n words (followed by an ellipsis).
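A minimal sketch of that fallback, assuming the extracted metadata is a plain dict and that a "title" key holds the format's own title. The helper name `derive_title` and the default `n=10` are hypothetical illustrations, not part of Solr or of any extraction library.

```python
def derive_title(metadata: dict, content: str, n: int = 10) -> str:
    """Return the metadata title if present, else the first n words plus an ellipsis.

    Hypothetical helper: the "title" metadata key and n=10 are assumptions.
    """
    # Prefer a non-empty title stored in the document's own metadata.
    title = (metadata.get("title") or "").strip()
    if title:
        return title
    # Otherwise fall back to the first n words of the extracted content.
    words = content.split()
    if len(words) <= n:
        return " ".join(words)
    return " ".join(words[:n]) + "..."


if __name__ == "__main__":
    print(derive_title({"title": "Annual Report"}, "ignored body text"))
    print(derive_title({}, "one two three four five", n=3))
```

Running a helper like this once per document, before the document is posted to Solr, keeps title extraction outside the indexing pipeline.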
If you use a clusteri
The main objective here is actually to assign a title to the documents as
they are being indexed.
We actually found that the cluster labels provide good information on
the key points of the documents, but I'm not sure if we can get good
cluster labels from a single document.
Besides getting
Hi Edwin,
let's do this step by step.
Clustering is a problem solved by unsupervised machine learning algorithms.
The goal of clustering is to group a corpus of documents by similarity,
trying to produce groups that are meaningful to a human being.
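To make "group by similarity" concrete, here is a toy sketch using single-pass "leader" clustering with Jaccard similarity over word sets. This is not the algorithm behind Solr's clustering support; the function names and the 0.3 threshold are hypothetical choices for illustration only.

```python
from __future__ import annotations


def tokens(text: str) -> set[str]:
    # Lowercased bag of words; no stemming or stopword removal (a simplification).
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    # Word-set overlap: |A & B| / |A | B|, defined as 0.0 for two empty sets.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_cluster(docs: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Each document joins the first cluster whose leader (first member) is
    similar enough; otherwise it starts a new cluster. Returns doc indices."""
    clusters: list[list[int]] = []
    leaders: list[set[str]] = []
    for i, doc in enumerate(docs):
        t = tokens(doc)
        for c, lead in enumerate(leaders):
            if jaccard(t, lead) >= threshold:
                clusters[c].append(i)
                break
        else:
            # No existing leader was similar enough: this doc leads a new group.
            clusters.append([i])
            leaders.append(t)
    return clusters


if __name__ == "__main__":
    docs = ["solr indexing pipeline", "solr indexing performance",
            "cat food review", "dog food review"]
    print(leader_cluster(docs))  # [[0, 1], [2, 3]]
```

No labels are assigned here; real clustering engines also generate human-readable cluster labels, which is the part that relies on having many documents to compare.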
Solr currently provides different approaches for *Query Time