reinventing the wheel...
(Wednesday 31st March 2004)

Holy cow, I'm doing that right now. I need a robust HTML parser in Tcl. No DOM bells and whistles, just turn an HTML table into a Tcl data structure, filter that and spit out some reformed HTML. Key word here is 'just'...
So far everything I've seen is either C-based (a no-go for this installation) or requires a complete library installation (tcllib)...

Well, looks like I'll use the allotted time budget to do some nice reinventing, tralala

I'll also take it as an opportunity to get some practice coding on my Mac... getting a better feel for the machine etc.
I need to use every opportunity to break my dependency on the win2k machine here for such things... You know, one puts up with all those little annoyances of a Windows machine just because everything is already set up and one is 'used to it'. But I will never fully benefit from Mac OS X if I don't start using it and getting settled in.

I'm going for REX-like shallow parsing (tokenizing) first.

Step one: the tokenizer

# $html contains our raw HTML data (and is expected to start with a tag)
while {[regexp {^<.*?>} $html tag]} {
	# remove the tag from the front of the string
	set html [string replace $html 0 [expr {[string length $tag] - 1}]]
	puts "tag: $tag"
	# get the text between this tag and the next one (may be empty)
	regexp {^[^<]*} $html text
	set html [string replace $html 0 [expr {[string length $text] - 1}]]
	if {[string length $text]} {
		puts "text: '$text'\n"
	}
}
puts "remaining: \[$html]"

If everything went right, the last line of output should read:
remaining: []
Instead of printing out the tokens, we'll need to process them of course!
Lesson learned so far: regexps only go so far before they become more trouble than they are worth. That's why I'm using string replace in there.
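
To process the tokens rather than print them, one option is to collect them in a list first. What follows is only a rough sketch under my own assumptions: the proc name tokenize and the flat type/value list layout are made up for illustration, and it uses string range instead of string replace to chop off the front of the string. Unlike the loop above, it also accepts text before the first tag.

```tcl
# Hypothetical helper (not part of the original code): same idea as the
# loop above, but tokens are collected as a flat list of type/value pairs.
proc tokenize {html} {
    set tokens {}
    # text before the first tag, if any
    regexp {^[^<]*} $html text
    set html [string range $html [string length $text] end]
    if {[string length $text]} {
        lappend tokens text $text
    }
    while {[regexp {^<.*?>} $html tag]} {
        # cut the tag off the front and remember it
        set html [string range $html [string length $tag] end]
        lappend tokens tag $tag
        # text up to the next tag (may be empty)
        regexp {^[^<]*} $html text
        set html [string range $html [string length $text] end]
        if {[string length $text]} {
            lappend tokens text $text
        }
    }
    # whatever is left could not be tokenized (e.g. an unclosed "<")
    return [list $tokens $html]
}

# usage: unpack the two return values, then walk the pairs
foreach {tokens rest} [tokenize "<tr><td>a</td><td>b</td></tr>"] break
foreach {type value} $tokens {
    puts "$type: $value"
}
puts "remaining: \[$rest]"
```

Returning the leftover string along with the tokens keeps the 'remaining: []' sanity check from above available to the caller.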

OK, you guessed it, that's not even half the work done yet...
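
For the record, here is roughly where this could go next: turning tokens into rows and cells. This is a hedged sketch, not finished code. It assumes the tokens are already in a flat list of type/value pairs (the loop above only prints them), that there is a single non-nested table, and that every cell and row has an explicit closing tag. The proc name tokens2rows is made up for illustration.

```tcl
# Hypothetical sketch: build a list of rows, each row a list of cell
# texts, from a flat type/value token list. Attributes and any markup
# inside cells are ignored; only the text is kept.
proc tokens2rows {tokens} {
    set rows {}
    set row {}
    set cell ""
    set incell 0
    foreach {type value} $tokens {
        if {$type eq "text"} {
            if {$incell} { append cell $value }
            continue
        }
        # reduce e.g. "<TD align=left>" to "TD" and "</td>" to "/td";
        # skip anything that does not look like a tag (comments etc.)
        if {![regexp {^<\s*(/?[A-Za-z0-9]+)} $value -> name]} continue
        switch -- [string tolower $name] {
            td - th   { set cell ""; set incell 1 }
            /td - /th { lappend row [string trim $cell]; set incell 0 }
            /tr       { lappend rows $row; set row {} }
        }
    }
    return $rows
}

set tokens [list tag <tr> tag <td> text a tag </td> tag <td> text b tag </td> tag </tr>]
puts [tokens2rows $tokens]
```

A list of lists is about the simplest Tcl structure that can be filtered with plain foreach and lindex before writing the HTML back out.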

[ by Martin ]

Martin Spernau
© 1994-2003
