Re: model building

2017-03-22 Thread Joel Bernstein
I reviewed the code, and it was definitely written to support multiple training sets in the same collection. So it sounds like something is not working as designed. I planned on testing model building with different types of training sets anyway, so I can comment on my finding

Re: model building

2017-03-22 Thread Joe Obernberger
Thank you, Tim. I appreciate the tips. At this point, I'm just trying to understand how to use it. The 30 tweets that I've selected so far are, in fact, threatening. The things people say! My favorite so far is 'disingenuous twat waffle'. No kidding. The issue that I'm having is not with

Re: model building

2017-03-21 Thread Tim Casey
Joe, to do this correctly and soundly, you will need to sample the data and mark each sample as threatening or neutral. You can probably expand on this quite a bit, but that would be a good start. You can then draw another set of samples and see how you did. You use one to train and one to validate. Wha
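
For illustration only, the train/validate idea described above could be sketched in Python roughly as follows. This is a minimal sketch, not anything from the thread: the names `split_labeled` and `accuracy`, the placeholder texts, and the 50/50 split are all assumptions, and it splits one labeled sample into two disjoint parts rather than drawing a second sample separately.

```python
import random

def split_labeled(samples, validation_fraction=0.5, seed=0):
    """Shuffle (text, label) pairs and split them into (train, validation).

    Hypothetical helper: the two parts are disjoint, so the validation
    part is never seen during training.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(predict, validation):
    """Fraction of validation pairs whose label matches predict(text)."""
    if not validation:
        return 0.0
    correct = sum(1 for text, label in validation if predict(text) == label)
    return correct / len(validation)

# Placeholder data standing in for hand-labeled tweets:
# label 1 = threatening, label 0 = neutral.
labeled = (
    [("example threatening text %d" % i, 1) for i in range(30)]
    + [("example neutral text %d" % i, 0) for i in range(50)]
)

train, validation = split_labeled(labeled, validation_fraction=0.5, seed=42)
```

Whatever model is built from `train` would then be scored with `accuracy(model_predict, validation)`, where `model_predict` is any function that maps a text to 0 or 1.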

Re: model building

2017-03-20 Thread Joel Bernstein
I've only tested with the training data in its own collection, but it was designed for multiple training sets in the same collection. I suspect your training set is too small to produce a reliable model. The training sets we tested with were considerably larger. All the idfs_ds values being t

Re: model building

2017-03-20 Thread Joe Obernberger
If I put the training data into its own collection and use q="*:*", then it works correctly. Is that a requirement? Thank you. -Joe On 3/20/2017 3:47 PM, Joe Obernberger wrote: I'm trying to build a model using tweets. I've manually tagged 30 tweets as threatening, and 50 random tweets as n