TextCat is an implementation of the text categorization algorithm presented in Cavnar, W. B. and J. M. Trenkle, "N-Gram-Based Text Categorization," in Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, NV, UNLV Publications/Reprographics, pp. 161-175, 11-13 April 1994.
via Idle words
Another point in the debate that seems to be ensuing over meta-info vs. categorizing content by its nature. This script uses n-grams to identify about 69 natural languages.
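For the curious, here is a rough Python sketch of how the n-gram approach from the Cavnar and Trenkle paper can work: build a ranked list of the most frequent character n-grams for each language, do the same for the unknown text, and pick the language whose ranking is least "out of place". This is not TextCat's actual code. The function names, the single-underscore padding, and the `top_k=300` cutoff are my own assumptions for illustration.

```python
import re
from collections import Counter


def ngram_profile(text, max_n=5, top_k=300):
    """Return the top_k most frequent character n-grams (n = 1..max_n), best first."""
    counts = Counter()
    # Keep only runs of letters; digits and punctuation are discarded.
    for token in re.findall(r"[^\W\d_]+", text.lower()):
        padded = f"_{token}_"  # underscores mark word boundaries (a simplification)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; ties are broken alphabetically so results are stable.
    ranked = sorted(counts, key=lambda gram: (-counts[gram], gram))
    return ranked[:top_k]


def out_of_place(doc_profile, cat_profile):
    """Sum of rank differences; n-grams missing from cat_profile get a fixed penalty."""
    rank = {gram: r for r, gram in enumerate(cat_profile)}
    max_penalty = len(cat_profile)
    return sum(
        abs(r - rank[gram]) if gram in rank else max_penalty
        for r, gram in enumerate(doc_profile)
    )


def classify(text, category_profiles):
    """Return the category name whose profile is closest to the text's profile."""
    doc_profile = ngram_profile(text)
    return min(
        category_profiles,
        key=lambda name: out_of_place(doc_profile, category_profiles[name]),
    )
```

To try it, build one profile per language from a sample text, e.g. `profiles = {"english": ngram_profile(english_sample), "german": ngram_profile(german_sample)}`, then call `classify(unknown_text, profiles)`. The samples need to be reasonably long for the rankings to mean much.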
[by Martin]