WebSPHINX (Website-Specific Processors for HTML INformation eXtraction) is a Java class library and interactive development environment for web crawlers. A web crawler (also called a robot or spider) is a program that browses and processes web pages automatically.
WebSPHINX consists of two parts: the Crawler Workbench and the WebSPHINX class library.
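To make "browses and processes web pages automatically" concrete, here is a minimal sketch of the core crawl loop. It does not use the WebSPHINX API: the class `TinyCrawler`, the method `crawl`, and the page names are all made up for illustration, and the "web" is an in-memory map of page to outgoing links, so nothing is fetched over the network.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical illustration only; this is not a WebSPHINX class.
public class TinyCrawler {

    /**
     * Visits pages breadth-first, starting at {@code start}, and returns
     * them in visit order. Stops once {@code maxPages} pages were visited.
     */
    public static List<String> crawl(Map<String, List<String>> links,
                                     String start, int maxPages) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();        // pages already queued
        Queue<String> frontier = new ArrayDeque<>(); // pages waiting to be visited
        frontier.add(start);
        seen.add(start);

        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String page = frontier.remove();
            visited.add(page); // "processing" a page would happen here

            // Queue every outgoing link that has not been seen yet.
            for (String next : links.getOrDefault(page, List.of())) {
                if (seen.add(next)) {
                    frontier.add(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // A tiny made-up link graph standing in for real pages.
        Map<String, List<String>> links = Map.of(
            "a.html", List.of("b.html", "c.html"),
            "b.html", List.of("a.html", "d.html"),
            "c.html", List.of("d.html"));

        System.out.println(crawl(links, "a.html", 10));
        // prints [a.html, b.html, c.html, d.html]
    }
}
```

A real crawler replaces the map lookup with an HTTP fetch plus link extraction, and adds rules for which links to follow; that is the part a library like WebSPHINX is meant to take off your hands.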
Anybody interested in crawling and information retrieval should download this and play with it a little. It's like the RegexCoach for crawlers.
[by Martin]