semantic blindness
Sep. 14th, 2005 06:38 pm
So, there's a really huge and significant difference between </b> and <\b> in HTML. If the HTML is embedded in XML that gets serialized and deserialized, it's even more significant.
It is also a difference that is nearly impossible to see when looking at code.
Your brain knows what kind of slash belongs in a close tag, so it doesn't bother to look closely enough to check which kind is actually there. Why would it be a backslash? That would be stupid.
Yes, exactly.
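To make the "even more significant" part concrete: </b> closes the element, but <\b> isn't markup at all, because a backslash can't start a tag name. A browser's HTML parser will typically just show <\b> as literal text and leave the bold running, while an XML parser stops with a fatal well-formedness error. The sketch below uses the JDK's standard DOM parser; the class and method names (SlashCheck, isWellFormed) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class SlashCheck {
    // Hypothetical helper: returns true if the string parses as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler (which prints "[Fatal Error]" to stderr)
            // with one that stays quiet and simply rethrows fatal errors.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) {}
                public void error(SAXParseException e) {}
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            // Parser configuration, I/O, or well-formedness failure.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p><b>bold</b></p>"));  // prints true
        // "\\b" in a Java literal is a real backslash followed by b, i.e. <\b>.
        System.out.println(isWellFormed("<p><b>bold<\\b></p>")); // prints false
    }
}
```

Note that the failure shows up at parse time, far from wherever the typo was made, which is part of why it's so easy to blame the wrong thing.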
If anybody can figure out how to make a program that looks at a bunch of text or code, works out what context each chunk is probably in, and points out the things that probably don't belong (e.g., "if this is HTML, that bit looks really unlikely"), that would be really, really cool, and I bet you could make a kajillion dollars off it.
(It might also be AI.)
EDIT: I forgot to mention that this HTML was sitting inside a bunch of Java code, which makes it much harder to pick out.
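The general context-guessing tool is a hard problem, but this one specific mistake can be caught by a much dumber check: "<", a backslash, a tag name, ">" is almost never intended. Below is a hypothetical lint (the name BackslashTagLint and its regex are assumptions, not an existing tool) that scans raw source text line by line and reports every such sequence. As an aside, in Java source a string literal containing <\b> compiles without complaint, because \b is Java's backspace escape; the resulting string holds "<", U+0008, ">", and U+0008 is not a legal character anywhere in an XML 1.0 document. Scanning the raw source text, as this sketch does, catches the typo before the compiler turns it into that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackslashTagLint {
    // Hypothetical heuristic: '<', a literal backslash, a tag name, then '>'.
    // In a Java string, "<\\\\[A-Za-z][A-Za-z0-9]*>" is the regex <\\[A-Za-z][A-Za-z0-9]*>.
    private static final Pattern SUSPECT = Pattern.compile("<\\\\[A-Za-z][A-Za-z0-9]*>");

    // Returns "line N: <match>" for every suspicious tag in the given source text.
    static List<String> findSuspects(String source) {
        List<String> hits = new ArrayList<>();
        String[] lines = source.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            Matcher m = SUSPECT.matcher(lines[i]);
            while (m.find()) {
                hits.add("line " + (i + 1) + ": " + m.group());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Two lines of (hypothetical) Java source with HTML inside string literals.
        String javaSource =
            "String html = \"<p><b>Total:</b> 5</p>\";\n" +
            "String oops = \"<p><b>Total:<\\b> 5</p>\";\n";
        for (String hit : findSuspects(javaSource)) {
            System.out.println(hit); // prints: line 2: <\b>
        }
    }
}
```

A check this narrow will miss other kinds of "doesn't belong here" text, but it costs almost nothing to run over a whole source tree.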
Date: 2005-09-14 05:57 pm (UTC)
I think it does a lot of what you're looking for, and has been out there for years.
Date: 2005-09-14 06:00 pm (UTC)
Date: 2005-09-14 07:22 pm (UTC)
Date: 2005-09-14 08:20 pm (UTC)
I've not used Tidy XML before, but my text editor has a decent code validator.
Date: 2005-09-14 10:37 pm (UTC)
I, on the other hand, helped debug roughly the following code today:
String value = getValue();
value.replaceAll("([$\\\\])", "\\\\$1");
text = text.replaceAll(key, value);
I want something that will detect if I'm alternating between different languages' regular expression idioms within the same function.
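One plausible reading of that bug: Java strings are immutable, so the second line computes an escaped copy of value and throws it away (a Perl s/// would have edited the variable in place), and key is then used as a raw regex. Below is a minimal sketch of a corrected version, assuming key and value are both meant to be taken literally. Pattern.quote and Matcher.quoteReplacement are real JDK methods; the class ReplaceDemo and helper replaceLiteral are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceDemo {
    // Hypothetical helper: replace every literal occurrence of key in text with value.
    static String replaceLiteral(String text, String key, String value) {
        // Pattern.quote makes key match literally; an unquoted "{PRICE}" is not a valid regex.
        // Matcher.quoteReplacement escapes '$' and '\' so value is inserted verbatim.
        return text.replaceAll(Pattern.quote(key), Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        String value = "costs $5 \\ each";

        // The pattern from the comment: the result is returned and discarded,
        // so value itself is unchanged.
        value.replaceAll("([$\\\\])", "\\\\$1");

        // Keeping the result works, and matches what quoteReplacement produces.
        String escaped = value.replaceAll("([$\\\\])", "\\\\$1");
        System.out.println(escaped.equals(Matcher.quoteReplacement(value))); // prints true

        System.out.println(replaceLiteral("Price: {PRICE}", "{PRICE}", value));
        // prints: Price: costs $5 \ each
    }
}
```

If no regex features are needed at all, String.replace(CharSequence, CharSequence) does the same literal replacement and sidesteps both escaping idioms.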
Date: 2005-09-14 10:40 pm (UTC)
All the HTML was in beige, because it was part of a String.
Date: 2005-09-14 10:45 pm (UTC)
Date: 2005-09-15 12:45 pm (UTC)
Date: 2005-09-15 01:03 pm (UTC)
The problem was, it was only meaningful in hindsight, because the parser is part of the Java code, too, and the whole thing broke when I added some code. So naturally I assumed the new code I had written was at fault, rather than the new data. The possibility "actually the code is working, but the parser is getting malformed XML" was very low on my list until I realized that's what it was...
Date: 2005-09-15 04:19 pm (UTC)
Date: 2005-09-15 04:27 pm (UTC)
Date: 2005-09-15 04:43 pm (UTC)