Traumwind - Mozilla Junk filter questions

I can't really find answers to:

What messages are considered when comparing for Junk? All the ones tagged as Junk?
What happens when I delete messages that are tagged as Junk? Does Moz 'forget' those examples?
Is it better to just keep a certain amount of example Junk around? Or should I save all Spam/Junk?
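To make the "does Moz forget?" question concrete, here is a minimal sketch of the kind of training store a naive Bayesian filter typically keeps. It assumes the filter remembers only per-token counts and per-class message totals; the class and method names are hypothetical, not Mozilla's code. In a design like this, deleting a message from a folder changes nothing, and only an explicit untrain step removes its contribution. Whether Mozilla untrains on delete is exactly the open question here.

```python
from collections import Counter


class TokenStore:
    """Hypothetical training store for a naive Bayesian junk filter.

    Only per-token counts and per-class message totals are kept; the
    messages themselves are not referenced once they have been trained on.
    """

    def __init__(self):
        self.counts = {"junk": Counter(), "good": Counter()}
        self.messages = {"junk": 0, "good": 0}

    def train(self, tokens, label):
        # Count each distinct token once per message.
        self.counts[label].update(set(tokens))
        self.messages[label] += 1

    def untrain(self, tokens, label):
        # Explicitly reverse an earlier train() call, e.g. when the user
        # reclassifies a message. Unary + drops counts that fell to zero.
        self.counts[label].subtract(set(tokens))
        self.counts[label] = +self.counts[label]
        self.messages[label] -= 1


store = TokenStore()
junk_folder = ["cheap pills buy now".split()]
for msg in junk_folder:
    store.train(msg, "junk")

junk_folder.clear()  # "deleting" the message from the folder...
print(store.messages["junk"], store.counts["junk"]["pills"])  # ...leaves the counts: 1 1
```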

Background on this: I now have an example corpus of about 1300+ Spam/Junk messages, and I noticed a drop in detection accuracy. Actually, I had trained Moz on a large corpus of Junk mail and found it was overzealous. Having now marked a lot of messages as 'not junk', I see the exact opposite: it doesn't detect some obvious Junk at all...

So is it better to just have a rather small (200+) corpus of example Junk or what?
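The corpus-size question can be poked at with a sketch of the per-token formula from Paul Graham's essay "A Plan for Spam". Treating Mozilla's filter as working this way, and the exact constants used here (doubling good counts, the 5-occurrence cutoff, the 0.4 default for rare tokens, the 15 most decisive tokens), are assumptions; the function names are hypothetical.

```python
import math
from collections import Counter


def token_prob(token, junk, good, n_junk, n_good):
    """Graham-style per-token junk probability (hypothetical sketch).

    junk/good are Counters: token -> number of messages containing it.
    n_junk/n_good are the number of trained messages in each class.
    """
    b = junk[token]
    g = 2 * good[token]  # doubling good counts biases against false positives
    if b + g < 5:
        return 0.4  # too rare to judge: slightly innocent default
    # Frequencies rather than raw counts, so corpus size is normalised away.
    b_freq = min(1.0, b / max(n_junk, 1))
    g_freq = min(1.0, g / max(n_good, 1))
    return max(0.01, min(0.99, b_freq / (g_freq + b_freq)))


def junk_score(tokens, junk, good, n_junk, n_good, top=15):
    probs = [token_prob(t, junk, good, n_junk, n_good) for t in set(tokens)]
    # Keep the `top` most decisive tokens (farthest from a neutral 0.5).
    probs = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:top]
    spam = math.prod(probs)
    ham = math.prod(1 - p for p in probs)
    return spam / (spam + ham)


junk = Counter({"viagra": 100, "meeting": 10})
good = Counter({"meeting": 3})
print(round(token_prob("meeting", junk, good, 1300, 200), 3))  # 0.204
```

With these numbers, a token seen in 10 of 1300 junk messages and 3 of 200 good ones still leans innocent, because each count is divided by its own corpus size. In a scheme like this, a big Junk corpus alone shouldn't swamp the filter, but marking many messages as 'not junk' raises the good counts and pulls shared tokens toward innocent, which would be consistent with the swing described above.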

And then, is there a difference between messages not marked at all and messages marked 'not junk'?
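One plausible answer can be sketched, assuming the filter trains only on messages the user explicitly marks and never on its own automatic classifications. That assumption is not confirmed for Mozilla, and the names below are hypothetical. Under it, an unmarked message contributes nothing, while a message marked 'not junk' adds its tokens to the good counts.

```python
from collections import Counter
from enum import Enum


class Mark(Enum):
    UNMARKED = "unmarked"  # never touched by the user
    JUNK = "junk"
    NOT_JUNK = "not junk"


def training_counts(messages):
    """Hypothetical: build training data from (tokens, mark) pairs,
    using only messages the user has explicitly marked."""
    counts = {Mark.JUNK: Counter(), Mark.NOT_JUNK: Counter()}
    totals = Counter()
    for tokens, mark in messages:
        if mark is Mark.UNMARKED:
            continue  # assumption: unmarked mail is never training data
        counts[mark].update(set(tokens))
        totals[mark] += 1
    return counts, totals


inbox = [
    ("lunch on friday".split(), Mark.UNMARKED),
    ("lunch with the team".split(), Mark.NOT_JUNK),
    ("cheap pills".split(), Mark.JUNK),
]
counts, totals = training_counts(inbox)
print(counts[Mark.NOT_JUNK]["lunch"], totals[Mark.NOT_JUNK])  # 1 1
```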

by Martin

similar entries (vs):

ok, it's proven (# 21%)
Mozilla Junk (# 20%)
If you think leaving rude messages will get my attention (# 10%)
Dave Farquhar on the new naive bayesian spam filter in Mozilla (# 9%)

similar entries (cg):

Mozilla Junk (# 20%)
