Sep. 14th, 2005

dr_tectonic
So, there's a really huge and significant difference between </b> and <\b> in HTML. If the HTML is part of XML that gets serialized and deserialized, it's even more significant.

It is also a difference that is nearly impossible to see when looking at code.

Because your brain knows what kind of slash belongs in a close tag, so it doesn't bother to look closely enough to check what kind it is -- why would it be a backslash? That would be stupid.

Yes, exactly.

If anybody can figure out how to make a program that can look at a bunch of text/code, figure out what context each chunk of it is probably in, and point out the things that probably don't belong (e.g., "if this is HTML, that bit looks really unlikely"), that would be really, really cool, and I bet you could make a kajillion dollars off it.

(It might also be AI.)

EDIT: I forgot to mention that this HTML was occurring inside a bunch of Java code -- which makes it much harder to pick out.
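
For this one specific mistake, nothing nearly that clever is needed. Here's a minimal sketch of a dumb check that scans text for things shaped like a close tag but with a backslash. The class name, method name, and regex are all made up for illustration; it does none of the "figure out the context" part, and it would happily flag a legitimate `<\b>` if such a thing ever existed. Worth noting: inside a Java string literal, `\b` is a valid escape (backspace), so the compiler doesn't complain about it either.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from any real library.
class SlashCheck {
    // Matches "<", a literal backslash, a tag-like name, then ">",
    // for example <\b> or <\div>.
    private static final Pattern BACKSLASH_CLOSE =
            Pattern.compile("<\\\\([A-Za-z][A-Za-z0-9]*)\\s*>");

    // Returns one "offset N: <\tag>" entry per suspicious match in the text.
    static List<String> findBackslashCloseTags(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = BACKSLASH_CLOSE.matcher(text);
        while (m.find()) {
            hits.add("offset " + m.start() + ": " + m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // The doubled backslash is Java escaping; the text being
        // scanned contains <\b>, as it would when read from a source file.
        String sample = "<b>bold<\\b> and <i>fine</i>";
        for (String hit : findBackslashCloseTags(sample)) {
            System.out.println(hit);
        }
    }
}
```

Run over source files as raw text, this would catch the backslash version wherever it hides, including inside Java string literals. It is a one-trick pattern match, though, not the general "this bit looks unlikely" detector.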