I've only tested with the training data in its own collection, but it was designed for multiple training sets in the same collection.
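For example, each training set could carry its own tag field, and the q
parameter on features()/train() then selects one set at a time. A minimal
sketch (the trainingData collection and the trainingSet_s and body_s field
names are hypothetical):

features(trainingData,
         q="trainingSet_s:threats",
         featureSet="threatFeatures",
         field="body_s",
         outcome="out_i",
         positiveLabel=1,
         numTerms=250)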
I suspect your training set is too small to get a reliable model from. The
training sets we tested with were considerably larger. All the idfs_ds
values being the same does seem odd, though. The idfs_ds in particular were
designed to be accurate when there are multiple training sets in the same
collection.

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Mar 20, 2017 at 5:41 PM, Joe Obernberger
<joseph.obernber...@gmail.com> wrote:

> If I put the training data into its own collection and use q="*:*", then
> it works correctly. Is that a requirement?
>
> Thank you.
>
> -Joe
>
> On 3/20/2017 3:47 PM, Joe Obernberger wrote:
>
>> I'm trying to build a model using tweets. I've manually tagged 30 tweets
>> as threatening and 50 random tweets as non-threatening. When I build the
>> model with:
>>
>> update(models2, batchSize="50",
>>        train(UNCLASS,
>>              features(UNCLASS,
>>                       q="ProfileID:PROFCLUST1",
>>                       featureSet="threatFeatures3",
>>                       field="ClusterText",
>>                       outcome="out_i",
>>                       positiveLabel=1,
>>                       numTerms=250),
>>              q="ProfileID:PROFCLUST1",
>>              name="threatModel3",
>>              field="ClusterText",
>>              outcome="out_i",
>>              maxIterations="100"))
>>
>> it appears to work, but all the idfs_ds values are identical. The
>> terms_ss values look reasonable, but nearly all the weights_ds are 1.0.
>> out_i is -1 for non-threatening tweets and +1 for threatening tweets. I'm
>> trying to follow along with Joel Bernstein's excellent post here:
>> http://joelsolr.blogspot.com/2017/01/deploying-ai-alerting-system-with-solrs.html
>>
>> Tips?
>>
>> Thank you!
>>
>> -Joe
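For reference, once a model is stored it can be pulled back out of the
model collection and applied to new documents with classify(). A minimal
sketch reusing the collection, field, and model names from the expression
quoted above, and assuming the model id matches the name passed to train():

classify(model(models2, id="threatModel3", cacheMillis=5000),
         search(UNCLASS,
                q="ProfileID:PROFCLUST1",
                fl="id,ClusterText",
                sort="id asc"),
         field="ClusterText")

Sent to the /stream handler, this scores each matching tweet against the
trained model.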