I did a review of the code and it was definitely written to support having
multiple training sets in the same collection. So, it sounds like something
is not working as designed.
I planned on testing out model building with different types of training
sets anyway, so I can comment on my findings.
Thank you, Tim. I appreciate the tips. At this point, I'm just trying
to understand how to use it. The 30 tweets that I've selected so far
are, in fact, threatening. The things people say! My favorite so far is
'disingenuous twat waffle'. No kidding.
The issue that I'm having is not with
Joe,
To do this correctly and soundly, you will need to sample the data and mark
each sample as threatening or neutral. You can probably expand on this quite a
bit, but that would be a good start. You can then draw another set of
samples and see how you did. You use one to train and one to validate.
Wha
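Roughly, with streaming expressions, the sampling side of that could look
something like the sketch below. The collection and field names are just
placeholders, and I may be off on the exact parameter names, so check the
Streaming Expressions docs for your version.

Draw a sample of tweets to hand-label for training:

    random(tweets, q="*:*", rows="500", fl="id,text_t")

Draw a second, separate sample to hand-label for validation:

    random(tweets, q="*:*", rows="200", fl="id,text_t")

Once a model has been trained, you can run it over the held-out set and
compare the predicted class against your hand labels:

    classify(model(models, id="threatModel"),
             search(tweets, q="set_s:validation", fl="id,text_t", sort="id asc"),
             field="text_t")

The fraction you get right on the validation sample, which the model never
saw during training, is a much more honest measure than accuracy on the
training sample itself.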
I've only tested with the training data in its own collection, but it was
designed for multiple training sets in the same collection.
I suspect your training set is too small to get a reliable model from.
The training sets we tested with were considerably larger.
All the idfs_ds values being t
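For reference, the way multiple training sets in one collection are meant to
work is that the q parameter on train() and features() carves out the
training set, rather than q="*:*". A sketch, with placeholder names
(training_set_s, text_t, out_i, etc. are just examples; the exact parameters
are in the Streaming Expressions docs):

    train(tweets,
          features(tweets,
                   q="training_set_s:threats_v1",
                   featureSet="threatFeatures",
                   field="text_t",
                   outcome="out_i",
                   numTerms=250),
          q="training_set_s:threats_v1",
          name="threatModel",
          field="text_t",
          outcome="out_i",
          maxIterations="100")

If I remember right, the ref guide examples store the result by wrapping
train() in update()/commit() on a models collection; other training sets can
then live in the same tweets collection under different training_set_s values.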
If I put the training data into its own collection and use q="*:*", then
it works correctly. Is that a requirement?
Thank you.
-Joe
On 3/20/2017 3:47 PM, Joe Obernberger wrote:
I'm trying to build a model using tweets. I've manually tagged 30
tweets as threatening, and 50 random tweets as neutral