dr_tectonic | semantic blindness

So, there's a really huge and significant difference between </b> and <\b> in HTML. If the HTML is part of XML that gets serialized and unserialized, it's even more significant.

It is also a difference that is nearly impossible to see when looking at code.

Because your brain knows what kind of slash belongs in a close tag, so it doesn't bother to look closely enough to check what kind it is -- why would it be a backslash? That would be stupid.

Yes, exactly.

If anybody can figure out how to make a program that can look at a bunch of text/code, figure out what context chunks of it are probably in, and point out the things that probably don't belong (i.e., "if this is HTML, that bit looks really unlikely"), that would be really, really cool, and I bet you could make a kajillion dollars off it.

(It might also be AI.)

EDIT: I forgot to mention that this HTML was occurring inside a bunch of Java code -- which makes it much harder to pick out.
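A narrow version of that wished-for checker is easy to sketch, even if the general "what context is this text in" problem is not. The class below is hypothetical (neither its name nor its regex comes from the post) and it only flags one pattern: something shaped like a close tag but written with a backslash. It is meant to be run over raw text, such as the contents of a Java source file. One relevant fact if the tag sat in a Java string literal: `\b` is a legal escape there (backspace), so the compiler would accept `"<\b>"` without complaint.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original post: flags text that looks
// like an HTML close tag written with a backslash, e.g. <\b> for </b>.
class BackslashTagScanner {

    // "<", a literal backslash, a tag name, optional whitespace, then ">".
    private static final Pattern SUSPECT =
            Pattern.compile("<\\\\([A-Za-z][A-Za-z0-9]*)\\s*>");

    /** Returns one message per suspicious tag, with its character offset. */
    static List<String> scan(String text) {
        List<String> findings = new ArrayList<>();
        Matcher m = SUSPECT.matcher(text);
        while (m.find()) {
            findings.add("offset " + m.start() + ": found " + m.group()
                    + ", did you mean </" + m.group(1) + ">?");
        }
        return findings;
    }

    public static void main(String[] args) {
        // The doubled backslash is Java escaping; the string scanned here
        // contains the characters <b>bold<\b>, as raw file text would.
        String text = "<b>bold<\\b> and <i>fine</i>";
        scan(text).forEach(System.out::println);
    }
}
```

This is a pattern match, not an understanding of context, so it will miss anything it was not told to look for.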

From:

navrins

Check out Tidy XML - http://www.w3.org/People/Raggett/tidy/

I think it does a lot of what you're looking for, and has been out there for years.

From:

navrins

Also, your brain apparently works differently than mine - I think that's a mistake that would stand out to me like a blooming dandelion in a landscaped lawn. Whereas I can miss a *missing* close tag (or a close tag that doesn't quite match the open tag, e.g. </td> when I should have used </th>) many times in a row, till I run Tidy XML on it.

From:

orbitalmechanic.livejournal.com

There has to be an emacs mode for this, too. M-x be-smarter-than-me or something. No, but seriously emacs has some awesome text-highlighting stuff that makes things like this show up well.

From:

k8cre8.livejournal.com

Yeah. I'm with you. I'd notice the slash quickly, but, will miss a non-matching closed tag, or forget to close something like a
. I've not used Tidy XML before, but my text editor has a decent code validator.

From:

flwyd.livejournal.com

If it's supposed to be well formed, parse it and see if you're missing closing tags... <\b> isn't a valid tag anyway...
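As a rough illustration of "parse it and see", here is a minimal sketch using the JDK's built-in XML parser. The class and method names are made up for this example, and it assumes the markup is meant to be well-formed XML (XHTML-style), which ordinary HTML often is not.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports where a string stops being well-formed XML.
class WellFormedCheck {

    /** Returns null if the XML parses, otherwise a message with line and column. */
    static String check(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (Exception e) {
            return "could not parse: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(check("<p><b>bold</b></p>"));      // well-formed
        System.out.println(check("<p><b>bold<\\b></p>"));     // backslash close tag
        System.out.println(check("<tr><th>head</td></tr>"));  // mismatched close tag
    }
}
```

The same check also catches the mismatched </td> for </th> case mentioned earlier in the thread, since a well-formedness parse requires each close tag to match its open tag.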

I, on the other hand, helped debug roughly the following code today:

String value = getValue();
value.replaceAll("([$\\\\])", "\\\\$1");
text = text.replaceAll(key, value);

I want something that will detect if I'm alternating between different languages' regular expression idioms in the same functions.
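For what it's worth, here is one way the substitution could be written so that the regex escaping is left to the standard library. This is a sketch under assumptions, not a reconstruction of the fix: the comment doesn't say what the bug turned out to be, and `substitute` is a made-up name. It does address one thing visible in the snippet as quoted: Java strings are immutable, so the result of `value.replaceAll(...)` is thrown away unless it is assigned.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: literal find-and-replace built on replaceAll.
class LiteralReplace {

    /** Replaces every literal occurrence of key in text with the literal value. */
    static String substitute(String text, String key, String value) {
        // quoteReplacement escapes backslashes and dollar signs so that
        // replaceAll does not treat them as escapes or group references.
        String safeValue = Matcher.quoteReplacement(value);
        // Pattern.quote makes the key match as plain text, not as a regex.
        String safeKey = Pattern.quote(key);
        // The result is returned rather than discarded.
        return text.replaceAll(safeKey, safeValue);
    }

    public static void main(String[] args) {
        // Prints: cost: $5\each
        System.out.println(substitute("cost: ${price}", "${price}", "$5\\each"));
    }
}
```

If no regex behaviour is wanted at all, `String.replace(CharSequence, CharSequence)` does a literal replacement and avoids the question entirely.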

From:

dr-tectonic.livejournal.com

Ah, but my awesome emacs text highlighting was already dedicated to coloring the java code that was generating the XML in question...

All the HTML was in beige, because it was part of a String.

From:

dr-tectonic.livejournal.com

The parser is what was crashing on the bad tag. EIT!

From:

madbodger.livejournal.com

That parser seems to not follow one of the old Unix ideas: "succeed quietly, fail noisily". It should have been verbose and specific about what it gagged on. Maybe I should hack up my old parser to serve as a quick-and-dirty scanner for obvious types of errors. It sure was bitchy enough.

From:

dr-tectonic.livejournal.com

It was, actually. It even gave a meaningful error message.

The problem was, it was only meaningful in hindsight, because the parser is part of the Java code, too, and the whole thing broke when I added some code. So naturally I assumed the new code I had written was at fault, rather than the new data. The possibility "actually the code is working, but the parser is getting malformed XML" was very low in my brain until I realized that's what it was...

From:

flwyd.livejournal.com

Ah, but you neglect the two fundamental realizations of computer science:

Code is Data
Data is Code

From:

dr-tectonic.livejournal.com

Only in a sufficiently high-level programming language...

From:

flwyd.livejournal.com

Formally, a function is anything which takes input and produces output. javac is a function which takes Java code and produces bytecode, or takes non-Java code and produces error messages. In this instance, your code is data, but it's data which describes another function. Your function takes text and does HTML stuff. <\b> is data, but it is also a function written in the language of BeemersProgram. This function takes no input and produces an exception.

The Mad Schemes of Dr. Tectonic

The Secret Identity of Beemer, Baron Mustache-Wax
