Background on this: I now have an example corpus of 1300+ spam/junk messages, and I have noticed detection accuracy degrading. Originally I trained Moz on a large corpus of junk mail and found it overzealous. Now that I have marked a lot of messages as 'not junk', I see the exact opposite: it doesn't detect some obvious junk at all...
So is it better to just have a rather small (200+) corpus of example junk, or what?
And then, is there a difference between messages not marked at all and messages marked 'not junk'?
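To make the last question concrete, here is a minimal sketch of how a Graham-style Bayesian filter might score a single token. Whether Moz computes exactly this is an assumption on my part; the function name and the counts are made up for illustration. In this model an unmarked message feeds neither count, while a message marked 'not junk' raises the good-mail counts, so the two would not be equivalent.

```python
def token_junk_probability(junk_hits, good_hits, n_junk, n_good):
    """Hypothetical Graham-style estimate of P(junk | token).

    junk_hits / good_hits: trained messages of each kind containing the token.
    n_junk / n_good: total messages marked 'junk' / 'not junk'.
    Unmarked messages are assumed to contribute to none of these counts.
    """
    junk_rate = junk_hits / n_junk if n_junk else 0.0
    good_rate = good_hits / n_good if n_good else 0.0
    if junk_rate + good_rate == 0:
        return 0.5  # token never seen in training: no evidence either way
    return junk_rate / (junk_rate + good_rate)


# Trained on junk only: any token seen in junk looks like pure junk.
only_junk = token_junk_probability(300, 0, 1300, 0)

# After marking 500 messages 'not junk', 150 of which contain the token,
# the same token now leans towards good mail.
with_good = token_junk_probability(300, 150, 1300, 500)
```

Under these assumptions the junk-only case scores 1.0, which would fit the overzealous behaviour, and every 'not junk' mark that shares words with real junk pulls those words back towards neutral or below.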
[by Martin]