The ILX2 Tech Discussion Thread

Keef asked me to start this thread some months ago, but I forgotted. Anyways, here it is, and on the right board too.

A starter Q: Where's the ILX2 project page? I'd like to volunteer to help out with the code.

libcrypt, Thursday, 22 February 2007 19:56 (eighteen years ago)

The answer: Code

libcrypt, Thursday, 22 February 2007 20:08 (eighteen years ago)

In the absence of booze-hound Keith W, a brief outline of the idea:

ILX's last problems stem from trying to do waaaay too much on the DB, which was already chock-full of millions of messages. Some pages pulled queries up to 16 times, every time they were loaded.

The new code is designed to cut that right down, and keeps all the threads in new answers in a cache. Posts are written to and read from the cache. This keeps database usage to a minimum -- since we started, 98.8% of pages have been drawn from the cache. This explains why it's so fast.
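To illustrate the idea of posts being written to and read from a cache, here is a minimal sketch. It is not the ILX2 code (which is Java on Tomcat); the class name, the methods and the plain dict standing in for the database are all made up for the example.

```python
class ThreadCache:
    """Write-through cache sketch: a post goes to the store and the cache together."""

    def __init__(self, store):
        self.store = store      # hypothetical stand-in for the database (a dict)
        self.threads = {}       # thread_id -> list of posts held in memory
        self.hits = 0
        self.misses = 0

    def read(self, thread_id):
        if thread_id in self.threads:
            self.hits += 1
        else:
            self.misses += 1
            # Only a cache miss touches the store.
            self.threads[thread_id] = list(self.store.get(thread_id, []))
        return self.threads[thread_id]

    def post(self, thread_id, message):
        # Persist first, then update the cached copy so readers never need the store.
        self.store.setdefault(thread_id, []).append(message)
        if thread_id in self.threads:
            self.threads[thread_id].append(message)

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

With something shaped like this, a thread page costs one store read the first time it is requested and none afterwards, which is the kind of behaviour a 98.8% cache figure would suggest.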

The vast majority of features users are asking for are per-user (some aren't, like xposts). Per-user implies pulling the db for each page load again (bad news) or running another cache system (extra complexity). That's why they're lower down the food chain until the rockness of the cache has been established.

The timed logout depends on Tomcat's session setup, from which a lot of the security stuff is derived.

.stet., Thursday, 22 February 2007 20:15 (eighteen years ago)

There's nothing wrong with hitting the database to get info, it's just that the old db was clunkily constructed, implemented, and configured. Sounds like you guys overcompensated a little in trying to avoid hitting the db.

Some possible approaches for the timed logout issue, if you don't want to store the info on everyone who's ever logged in to memory - store a login in the db, table just has to be a unique id (also stored in the cookie) and a bit flag. The overhead in hitting the table would be extremely low - it's only returning one bit per user. If you wanted to limit the open connection overhead, you could load the returned value into a session and when the session expires, you check the db again before asking the user to log in. Logging out would delete from the table and the cache.
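A rough sketch of that approach, using an in-memory SQLite table purely for illustration. The table name, column names, token format and session length are assumptions, not anything from ILX2.

```python
import sqlite3
import time
import uuid


class LoginStore:
    """Login table plus a short-lived session cache (illustrative only)."""

    def __init__(self, session_ttl=1800.0, clock=time.monotonic):
        self.db = sqlite3.connect(":memory:")
        # One row per login: a unique id (also kept in the cookie) and a flag.
        self.db.execute(
            "CREATE TABLE logins (token TEXT PRIMARY KEY, active INTEGER NOT NULL)"
        )
        self.session_ttl = session_ttl
        self.clock = clock
        self.sessions = {}  # token -> time at which the cached session expires

    def login(self):
        token = uuid.uuid4().hex
        self.db.execute("INSERT INTO logins VALUES (?, 1)", (token,))
        self.sessions[token] = self.clock() + self.session_ttl
        return token

    def is_logged_in(self, token):
        expiry = self.sessions.get(token)
        if expiry is not None and self.clock() < expiry:
            return True  # answered from the session, no db access
        # Session missing or expired: check the db before asking for a new login.
        row = self.db.execute(
            "SELECT active FROM logins WHERE token = ?", (token,)
        ).fetchone()
        if row and row[0]:
            self.sessions[token] = self.clock() + self.session_ttl
            return True
        self.sessions.pop(token, None)
        return False

    def logout(self, token):
        # Logging out deletes from both the table and the cache.
        self.db.execute("DELETE FROM logins WHERE token = ?", (token,))
        self.sessions.pop(token, None)
```

The db is only consulted when a session has lapsed, so the per-request cost stays in memory for active users.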

Edward III, Thursday, 22 February 2007 20:26 (eighteen years ago)

Sounds like you guys overcompensated a little in trying to avoid hitting the db.

Yeh, possibly. I mean, it's been tested up to 400 connections/sec so far, and there's plenty more to go. Compare the statscock, which would do 1/30th of a connection/sec sometimes.

Keith has Views on the session/login issue, so I'll leave that one for him.

.stet., Thursday, 22 February 2007 20:33 (eighteen years ago)

Normally, a database keeps a query cache, which allows for very fast responses to frequent static queries. Is the problem with the old-ILX code that this cache underperformed, or what?

libcrypt, Thursday, 22 February 2007 20:50 (eighteen years ago)

Didn't someone on the old thread mention he had access to load-testing software? Don't know if you'd be interested in pursuing that.

I like that Keith has "Views" on it - upper case makes them sound so important!

Edward III, Thursday, 22 February 2007 20:51 (eighteen years ago)

It would take so long to detail all the things in the old db, the codebase, and data access methods that caused performance problems. Suffice it to say that caching did not provide relief.

Edward III, Thursday, 22 February 2007 20:55 (eighteen years ago)

I don't think I overcompensated in the design - I don't see how I could have without knowing what the load, and potential future load, could be... My assumption was that really I couldn't overcompensate, as we've limited hardware. If I do stuff that makes it slower, it's just sooner when everyone has to donate again for more hardware (assuming the number of users doesn't just stay the same), which obviously I'd like to avoid.

What I do know is that going to disk is ~1000 times slower than going to (in-process) memory, so if you go to disk for a primary function instead of memory, your server will fall over with 0.1% of the users it would handle were it using memory.

Certain pages do hit the DB, and certain queries are not as fast as they should be. Paradoxically, new answers is a relatively slow query (500ms or so), but it doesn't need to be fast, as it only does it once and then the cache is maintained as well as inserts to the DB. The only way to make that query faster, would have been to do pretty much what Graeme did, which is a fine approach (make another table and keep it up to date), but I wanted to keep the database as simple as possible. I'm not dogmatically against going to a database in principle, but it's clear that with the most heavily used functions, that would just make it orders of magnitude slower. The added side effect of caching things is that even if it were to start getting a bit hairy on the database (lots of load), that people won't be being 'poxy fuled', because it doesn't need to go there.
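The "run the slow query once, then keep the cache up to date alongside the inserts" pattern could look roughly like this. It is a sketch only: the class, the method names and the 100-thread limit are invented for the example, and the real thing is Java.

```python
class NewAnswers:
    """Keeps the 'new answers' list in memory after one expensive load."""

    def __init__(self, load_from_db, limit=100):
        # load_from_db stands in for the slow (~500ms) query. It runs exactly
        # once and is assumed to return thread ids, most recently answered first.
        self.limit = limit
        self.thread_ids = list(load_from_db())[:limit]

    def record_post(self, thread_id):
        # Called next to the db insert: move the thread to the top of the list.
        if thread_id in self.thread_ids:
            self.thread_ids.remove(thread_id)
        self.thread_ids.insert(0, thread_id)
        del self.thread_ids[self.limit:]

    def page(self):
        # Serving the page never goes back to the database.
        return list(self.thread_ids)
```

The speed of the initial query stops mattering because every later page view is answered from the in-memory list.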

I load tested stuff with JMeter, but it's always tricky in load testing to get real world load, when you just don't know what the real world load looks like, as I said at the start. It's not 400 connections per second; on my five year old PC I could run 400 transactions per second. The number of DB connections would have likely not exceeded 1.

I now have a bunch of stats that I'm writing periodically to the database, so know much more about traffic than I did before, though I'm not writing the stats yet in all the places that I'd like to be, but the heavy load stuff (threads; new answers) is. What I can see, interestingly, is that threads outnumber new answers by 100-1. I think this means googlers. Were I to make the thread page go to the database for anything, the load would take a serious leap.

As I've said before, my primary driver in architecting this was to have a piece of kit that, when someone stuck a link to it in a newspaper, didn't just fall over for weeks, as happened before. It seems like such a missed opportunity that loads of people who were interested enough to type the link in didn't get anything at all, and no doubt gave up at some point later.

Anyway, if I get the time at some point soon, I will draw a few diagrams to explain how stuff works.

Keith, Sunday, 25 February 2007 12:55 (eighteen years ago)

It took a little while to refresh my grown-stale Tomcat knowledge, but this weekend I got my own personal ILX2 running on my MacBook Pro. I wrote a little patch to obscure email addresses entered in posts, even though it's not going to be as relevant now, since email addresses aren't publicly displayed anywhere. I dunno if my mail to Keith got through, but if you're innerested, let me know. If'n you don't hate my coding, maybe I can help out more going forward.

libcrypt, Monday, 26 February 2007 17:08 (eighteen years ago)

libcrypt do you have a nice how-to for getting this up and running on OS X? I tried once and I think I gave up around configuring sql.

TOMBOT, Monday, 26 February 2007 17:12 (eighteen years ago)

This is a fun thread to read, even if you don't understand any of it, like me.

Tracer Hand, Monday, 26 February 2007 17:59 (eighteen years ago)

TOMBOT, I've probably forgotten most of the zillion little details needed to get Tomcat all happy on OS X, but I can advise if you get stuck.

libcrypt, Monday, 26 February 2007 18:36 (eighteen years ago)

tomcat is SHIT

JW, Monday, 26 February 2007 20:22 (eighteen years ago)

Neither insightful nor helpful I'm afraid.

Ed, Monday, 26 February 2007 20:23 (eighteen years ago)

maybe this got removed. I swear I asked (and maybe this is the wrong place for it): is it possible to get sub-board links back up in the nav, so we don't always have to jump back to the "boards" page?

akm, Monday, 26 February 2007 22:40 (eighteen years ago)

It's been asked for several times in several places. It's not a high priority, I don't think. Advice has been to use browser bookmarks/tabs.

Ed, Monday, 26 February 2007 22:42 (eighteen years ago)

My advice is to just install phpbb

JW, Monday, 26 February 2007 23:29 (eighteen years ago)

My advice is to just install phpbb

Yeah, and then we can all get cute "which-South-Park-character-are-you?" avatars with pastel-blue or eye-gouging #00FF00 on black themes, user ratings, and enough login stats to choke a Hilton!

libcrypt, Tuesday, 27 February 2007 00:21 (eighteen years ago)

And persistent cookies!

JW, Tuesday, 27 February 2007 00:32 (eighteen years ago)

Hey Jon, here's a great idea - why don't you like, actually help out with the coding if you're so good at this? Jesus H.

Trayce, Tuesday, 27 February 2007 00:38 (eighteen years ago)

Install a backdoor while you're at it.

Casuistry, Tuesday, 27 February 2007 00:57 (eighteen years ago)

Hey Jon, here's a great idea - why don't you like, actually help out with the coding if you're so good at this? Jesus H.

Sure I'll just learn the most ass backward internet technology (TOMBOT JABA BEANGS) in the world that is completely useless to me just so I can have my contributions rejected. MORE LIKELY TO RECODE ILX MYSELF.

I'd consider writing an HTML validator but IIRC the data is stored in the DB not in HTML.

JW, Tuesday, 27 February 2007 01:31 (eighteen years ago)

I don't get the problem with persistent cookies. "If an exploit happened that allowed hackers to read your cookies, they could log in as you." - that would be a browser based exploit, not an ilx exploit, right? What are the chances of that? Why do the gazillion sites out there that use persistent cookies not think this is a problem?

n.b. as a currently unemployed web developer I'd be happy to offer a bit of coding help on the ui side of things.

ledge, Tuesday, 27 February 2007 10:19 (eighteen years ago)

Have you all considered that our own Jon Williams might in fact not be very good at programming, and is, truth be told, just a tiresome braggart who has been able to break other volunteers' code on a few occasions in the past and now just wants to throw fits because he can't find a website that tells him how to insert javascript redirects into the new restricted formatting markup?

TOMBOT, Tuesday, 27 February 2007 13:53 (eighteen years ago)

Is anyone working on the stylesheet at the moment? Any chance someone can fix this:

.headingblock {
left:35px;
width:100%;
position:relative;
color : #000000;
font-weight: bold;
}

The combination of the 35px offset from the left and the forced 100% width is causing a horizontal scrollbar.

.headingblock {
margin-left:35px;
position:relative;
color : #000000;
font-weight: bold;
}

should fix it.

There will also be an unnecessary horizontal scrollbar when 5% of the width is less than 20px, because of a similar problem with the .mainblock class:

.mainblock {
left:20px;
width:95%;
min-width: 25em;
margin-top: 2em;
margin-bottom: 1.5em;
padding: 1em;
position:relative;
background-color: #FFF;
color : #000000;
}

The fix is the same.

caek, Tuesday, 27 February 2007 18:28 (eighteen years ago)

Jon Williams may not be able to code, but he wrote the bestest Star Trek soundtrack evah!

libcrypt, Tuesday, 27 February 2007 18:38 (eighteen years ago)

I skinned that cat slightly differently (xpost to caek)

TOMBOT, Tuesday, 27 February 2007 18:45 (eighteen years ago)

whatever it choad

JW, Tuesday, 27 February 2007 18:58 (eighteen years ago)

xpost, dope. I've still got the scrollbar -- your fix is still to be rolled out, amirite?

caek, Tuesday, 27 February 2007 20:26 (eighteen years ago)

btw, the more things change, the more they stay the same

(summary, things change, people grumble, people get used to it, jon haXors it to prove a point rather than actually helpfully point stuff out to mods. Then, um, it turns into a discussion about the nature of art)

ailsa, Tuesday, 27 February 2007 20:34 (eighteen years ago)

OK, I have no idea why I tried to close a link tag with an italic tag. I R moron (can we have the "yer tags are rubbish" warning back please?)

ailsa, Tuesday, 27 February 2007 20:35 (eighteen years ago)

Ok so all these problems that need fixing with the code are bugging me, and I'm no fan of bbcode, and ajax is sooooo last year... but I just started to type &lt;i&gt; and this big red text flashed up telling me I can't use HTML tags and it BLEW MY MIND!

OMG it just happened again1! awesome!

ledge, Tuesday, 27 February 2007 23:38 (eighteen years ago)

oh fine escape my ampersnads for me!

ledge, Tuesday, 27 February 2007 23:39 (eighteen years ago)

ampersnads!

ledge, Tuesday, 27 February 2007 23:40 (eighteen years ago)

I don't think that's AJAX, dude.

Tracer Hand, Tuesday, 27 February 2007 23:41 (eighteen years ago)

just dhtml

JW, Tuesday, 27 February 2007 23:49 (eighteen years ago)

AJAX is like numbers actually being crunched or a databasey type thing happening and the results are returned to you via DHTML/Javascript in real-time, without reloading the page - the red warning thing on the text input field here is purely client-side, it's form validation

Tracer Hand, Wednesday, 28 February 2007 00:01 (eighteen years ago)

yeah. of course. i was drunk.

ledge, Wednesday, 28 February 2007 10:16 (eighteen years ago)

Have you all considered that our own Jon Williams might in fact not be very good at programming, and is, truth be told, just a tiresome braggart who has been able to break other volunteers' code on a few occasions in the past and now just wants to throw fits because he can't find a website that tells him how to insert javascript redirects into the new restricted formatting markup?

Rofflicious. Would be funny if he knew all these terms but was unable to really do it. So, in short, no.

nathalie, Wednesday, 28 February 2007 10:26 (eighteen years ago)

JNDI + NetBeans == PITA.

libcrypt, Wednesday, 14 March 2007 07:57 (eighteen years ago)

I know I'll love JNDI once Tomcat stops poking me in the eye with it, I'm sure.

libcrypt, Wednesday, 14 March 2007 07:59 (eighteen years ago)

JNDI always seems a bit needlessly complicated to me, but it's probably not JNDI itself is it? It's a bit like connection strings, which are just some random string you have to get right (this helps with that...). No amount of understanding helps...

Keith, Wednesday, 14 March 2007 08:59 (eighteen years ago)

This page has 3 very good examples of JNDI, and they all work perfectly when I cook up a sample project following their guidelines, but the JDBC example doesn't seem to translate from boilerplate to real-world very well: Tomcat keeps throwing a javax.naming.NameNotFoundException: Name jdbc is not bound in this Context error with no ILX2 source lines referenced in the stack trace, since the exception is apparently thrown in j_security_check. I may have to grab Tomcat source to see what's going on, but the stubborn bit of me is resisting, based on the fact that the config is isomorphic to a working example of JDBC thru JNDI.

libcrypt, Wednesday, 14 March 2007 14:41 (eighteen years ago)

Now that I've slept on it, I'll bet the problem is the <Realm>, not the <Resource>: Since I'm using context.xml, its scope probably doesn't extend as far as it would in server.xml. If that's the problem, I may just need to move it up a level.

libcrypt, Wednesday, 14 March 2007 15:14 (eighteen years ago)

A few more rounds of musical XML in-between Oracle classes, band practice, etc., and I have version 294 building in NetBeans 5.5 and running with much, if not all, major functionality intact. This time, though, I documented everything, at least on the Tomcat/NetBeans end. Getting svn -- at least the very latest version -- running on Mac OS X is a bit laborious but pretty much standard unix geek stuff, and the MySQL bits are boilerplate. On the way, I did discover that MySQL 5.1 has dynamically-settable SQL tracing of a sort, which is a useful thing to know. If anyone wants to fool with the code in NetBeans, I'll post my doc.

libcrypt, Friday, 16 March 2007 07:44 (eighteen years ago)

I did it all in Eclipse 3.2, but on Linux rather than the Mac, though it shouldn't really be much different. I did write a document about getting up and running at some point and it's in Subversion. Possibly needs some updating.

Keith, Friday, 16 March 2007 08:02 (eighteen years ago)

NetBeans uses a layout with significant differences from Eclipse, so there's definitely a few changes that must be made to get NetBeans happy: Source stays out of WEB-INF, for instance, as do libraries. Also, Mac OS X doesn't come with Subversion, and you have to install Java 1.6.0 to get access to deque. Figuring out and fixing those issues was fairly routine, but the bit that tripped me up was that NetBeans prefers that context configuration go in context.xml, not server.xml: I couldn't just drop the context section from my old server.xml (from like a month ago or whatever) into context.xml: It didn't work until I realized that I needed the JDBC JNDI info in GlobalNamingResources, which I now realize you've added in in your sample Tomcat config too.

So why did I go to this bother? I like NetBeans better than Eclipse on OS X: The UI is a lot more polished and consistent, it comes bundled with Tomcat 5.5.17, and it doesn't look like ass on the Mac. The big negative is that there's no Subversion support, so while you can check out ILX2 right inside Eclipse, all source management has to take place outside NetBeans (which seems to have only CVS support).

libcrypt, Friday, 16 March 2007 14:29 (eighteen years ago)

Here is my guide to getting a development environment running on Mac OS X for ILX2. Questions are welcome.

libcrypt, Sunday, 18 March 2007 05:17 (eighteen years ago)

I don't think I wound up using the Deque class... Well, there may be a cache in there that uses it at compile time, but I'm fairly sure it's not used at runtime.

Still 1.6 is 20-25% faster than 1.5, which is fairly amazing. The throughput went up quite a bit when I switched to using it.

I've only tried out NetBeans occasionally. Seemed good for GUIs (at least, in as much as Java GUIs are generally shit), but on the whole I'm just more used to Eclipse, myself.

Keith, Sunday, 18 March 2007 11:51 (eighteen years ago)

You have clearly never had a job in tech.

libcrypt, Monday, 19 March 2007 02:47 (eighteen years ago)

I was a teenage .com-er

JW, Monday, 19 March 2007 05:39 (eighteen years ago)

libcrypt - to me URLs are part of the user interface. The less technical and the more intuitive they appear, the better.

This may be diverging from the spec and if so feel free to tell me to shut up (you may already have done that actually...) But how much mod_rewriting would it take to put in place a scheme something like this:

ilxor.com/boards
ilxor.com/40/newquestions
ilxor.com/40/newanswers
ilxor.com/40/52386
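
To make the proposed scheme concrete, here is a sketch of the mapping it implies, written as a small lookup in Python rather than as mod_rewrite rules. ThreadSelectedControllerServlet and its boardid/threadid parameters are named elsewhere in this thread; the other servlet names and the /ILX prefix are guesses made up for the example.

```python
import re

# Each rule: a pattern for the pretty path, and the servlet URL it stands for.
# Only ThreadSelectedControllerServlet is a name taken from the thread; the
# other three targets are hypothetical.
RULES = [
    (re.compile(r"/boards"), "/ILX/BoardsControllerServlet"),
    (re.compile(r"/(\d+)/newquestions"),
     r"/ILX/NewQuestionsControllerServlet?boardid=\1"),
    (re.compile(r"/(\d+)/newanswers"),
     r"/ILX/NewAnswersControllerServlet?boardid=\1"),
    (re.compile(r"/(\d+)/(\d+)"),
     r"/ILX/ThreadSelectedControllerServlet?boardid=\1&threadid=\2"),
]


def rewrite(path):
    """Return the servlet URL for a pretty path, or None if nothing matches."""
    for pattern, target in RULES:
        # fullmatch anchors both ends, so only digits can reach the query string.
        match = pattern.fullmatch(path)
        if match:
            return match.expand(target)
    return None
```

Under these assumptions rewrite("/40/52386") gives "/ILX/ThreadSelectedControllerServlet?boardid=40&threadid=52386", and anything that is not a known shape falls through to None.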

we woz robbed, Wednesday, 21 March 2007 21:07 (eighteen years ago)

we woz robbed: Technically, it's simple to implement a slash-based URL rewriting scheme, and yes, I agree that "/" looks nicer than ugly GET strings and their "?"s and "&"s. However, there are at least two issues to consider before implementing such a thing:

1. Performance: Rewriting URLs would put a nontrivial load on a machine that ought stand up to a hearty Pitchfork-dotting. ILX2 runs on just one server, and there are times when it's handling a rather massive amount of non-member page fetches.

2. Security: The instant you introduce programmatic URL transformations is the moment you have to add yet another XSS worry to yr list of concerns: Those with an interest in ILXSS would no doubt be intrigued by the notion of mapping old XSS'ties to a new scheme.

A good, safe implementation of URL rewriting isn't something that can or should occur without much consideration, and almost surely isn't on anyone's short list of ILX desiderata, at least for 2007.

libcrypt, Thursday, 22 March 2007 02:27 (eighteen years ago)

slash rewrites aside, you have to admit that wwr is pretty right... "ThreadSelectedControllerServlet" is pretty damned nerdified. i understand how nice it is to have your classes/endpoints/etc named something meaningful... the "ControllerServlet" part should get lopped tho..."ThreadSelected?blah blah" "NewAnswers?blah blah"... that would help the aesthetics of probably the most vanilla site i can think of. (craiglist would probably arm wrestle you for that.) it's cutting close to bbs levels here with your nerdy bits hanging out and all.

also, i might add that depending on how much slash rewriting is done, the preferred method is using a hash so there's no chance of processing error and the performance cost is only a look up. of course, you're only rewriting very specific urls and anything not found in your hash goes straight to /.

is there a feature list/back log out there somewhere? just curious. in the back of my head there's two trolls fighting over whether to fire the app up on my local. "you fucker, you have no time!" "aw man, come on!"
m.

msp, Thursday, 22 March 2007 04:49 (eighteen years ago)

In a sense, ILX is its own feature list/backlog.

libcrypt, Thursday, 22 March 2007 05:16 (eighteen years ago)

a thing:

1. Performance: Rewriting URLs would put a nontrivial load on a machine that ought stand up to a hearty Pitchfork-dotting. ILX2 runs on just one server, and there are times when it's handling a rather massive amount of non-member page fetches.

2. Security: The instant you introduce programmatic URL transformations is the moment you have to add yet another XSS worry to yr list of concerns: Those with an interest in ILXSS would no doubt be intrigued by the notion of mapping old XSS'ties to a new scheme.

A good, safe implementation of URL rewriting isn't something that can or should occur without much consideration, and almost surely isn't on anyone's short list of ILX desiderata, at least for 2007.


Every point and piece of reasoning you use here is just fucking incorrect.

JW, Thursday, 22 March 2007 06:13 (eighteen years ago)

And to suggest I used XSS is pretty funny considering it could be done without being "cross site" or using any "Scripting".... but yea same kind of vector

JW, Thursday, 22 March 2007 06:13 (eighteen years ago)

With respect to rewriting URLs, tracer has volunteered to collate a list of suggestions from the site, get a poll up and get people voting on what they want. Once that's done, then things can be worked on in that order.

Naturally, libcrypt's points will have to be taken into serious consideration here wherever it might wind up on a poll. If it were people's number one concern, then the CPU used is less of an issue - it's a cost/benefit thing - it will use CPU, but why not use it on that if that's what everyone wants and it's not going to break the box (I am assuming that people would be in favour of a site that works more than one that has different URLs)? However, the hacking side of it is still a concern and would influence the decision, as it should.

I am sure URL rewriting will not break the box in itself, but most of the suggestions will add to the load and at some point regardless will start causing problems. I think it's right that we should be careful about it so we don't wind up with a board that falls over as it used to.

Keith, Thursday, 22 March 2007 09:09 (eighteen years ago)

> i might add that depending on how much slash rewriting is done, the preferred method is using a hash

if they're talking about using mod_rewrite for it (so apache does it before it even hits the website proper) then it's regular expressions or nothing, no hash lookup available, just additional entries in httpd.conf.

and simple changes like replacing anchored strings with other anchored strings is, i'd say, as safe as houses (and yes, there's no reason why the servlet can't be called 'NewAnswers'). performance-wise, i've never considered it.

koogs, Thursday, 22 March 2007 09:43 (eighteen years ago)

And to suggest I used XSS is pretty funny...


You're so vain.

libcrypt, Thursday, 22 March 2007 13:07 (eighteen years ago)

I lost the thread about :8090 (and I can't access :8090 from work), but is the plan still to switch over to that code this week?

caek, Tuesday, 27 March 2007 12:13 (eighteen years ago)

Skipping 2 messages at this point... Click here if you want to load them all.

2 messages!

I suggest a threshold (e.g. 150 posts) before it truncates to 100.
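
Something along these lines, as a sketch. The function name is made up, and it assumes the truncated view keeps the newest posts, which may not be how the board actually picks them.

```python
def visible_posts(posts, threshold=150, keep=100):
    """Return (posts_to_show, number_skipped).

    Threads at or below the threshold are shown whole; longer ones are cut
    down to the newest `keep` posts (an assumption about which end is kept).
    """
    if len(posts) <= threshold:
        return posts, 0
    skipped = len(posts) - keep
    return posts[-keep:], skipped
```

That way a 102-post thread is shown in full, and the smallest possible "Skipping N messages" is 51 rather than 2.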

onimo, Tuesday, 27 March 2007 13:54 (eighteen years ago)

if they're talking about using mod_rewrite for it (so apache does it before it even hits the website proper) then it's regular expressions or nothing, no hash lookup available

http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html#RewriteMap

ledge, Tuesday, 27 March 2007 15:40 (eighteen years ago)

The type of "hash lookup" available in mod_rewrite really isn't meant to be a substitute for regex-based string processing; it's more a convenience that keeps the configuration from getting outsized when you have a zillion specific rewrites to maintain. But more generally, maintaining a hash rewrite table for a dynamic board like ILX would be more expensive than regex rewrites: You'd have to write to a shared resource each time a post was made. In the case of mod_rewrite, that'd be some kind of file-based resource, which would mean a massive amount of file-locking contention in the worst case, and in the best (say, Sleepycat, which is likely inapplicable to NDBM), at least a lot of very slow I/O. Until mod_rewrite gets RDBMS support, regular expressions are the best of all possible worlds.

libcrypt, Saturday, 31 March 2007 16:43 (eighteen years ago)

that does look useful, hadn't seen it before. it'll be easier to maintain for large numbers than regexp rules in httpd.conf (or whatever) and hopefully it'll be kept in memory rather than read from file every time (they are).

plus 2.0 and above has support for different db types (but not rdbms)

BUT all the examples they give still use regexp:

> RewriteRule ^/ex/(.*) ${examplemap:$1}

and i don't see how a lookup can be easier than substituting one string for another given that we'd probably only use it to translate 'ile?' to 'ILX/ThreadSelectedControllerServlet?boardid=40&threadid=' (although that would be enough, really, one for each board)

koogs, Saturday, 31 March 2007 20:21 (eighteen years ago)

hopefully it'll be kept in memory rather than read from file every time (they are).


I view the file-vs-NDBM issue as a red herring. I can't speak to the actual design of mod_rewrite, but a good design would read the entire plaintext mapping file into memory and keep it sorted there for fast access, never touching it again without an Apache reload. Only when the map became so large as to impede performance would it make sense to use a database file: A reasonable db API would then make transparent all issues of memory and IO balance, perhaps with some kind of "system global area" tunable memory setting.

I'll take yr word for it that mod_rewrite stupidly doesn't use that kind of a strategy. But even if mod_rewrite didn't have any such issues, there would still be the problem of the shared hash table: Locking issues aside, each URL access would cause the file at least to be touched to obtain the modification time, and there would probably be some IO, if it had been. That'd be FAR slower than even the poorest-tuned regex. Considering that the regex engine is going to be involved anyways, a URL lookup table is a non-starter.

libcrypt, Saturday, 31 March 2007 21:02 (eighteen years ago)

a good object cache would never look at the cache source file mod time if you tell it not to. you could specify that it would only load when the app fires up or on some other event. meanwhile, you synchronize additions to the live cache hashmap and the source file when new questions come along.

i don't know. it really depends on what urls are part of a rewrite. like i said, "limited number". which, if you're talking about all urls at ilx, that's a few more orders of magnitude than what i was considering.

a well-balanced binary sorted hash would probably not be so much slower than a regex.

what was the whole point of this crap again? oh yeah, you guys need your classes named nicely.

isn't the real solution to bite the bullet and rename the stuff...?

how many hit/sec and what are the loads you guys are seeing anyway?

i'm still a little taken aback by some of the discussion on this thread. seriously, i've seen some really shitty code work just fine under a fairly large user base.

i appreciate the comments above. just batting some ideas around really.
m.

msp, Sunday, 1 April 2007 04:12 (eighteen years ago)

Is anyone looking at improving the HTML - or does it come from libraries and so is difficult to tweak?

I'm not that sure how important this is to anybody though, the main problem with all those invalid br-slash or hr-slash codes would be that it is throwing the browser into quirks mode, which means some stylesheet instructions get messed up.

However, aiming for clean valid html4 strict, complete separation of content and presentation and - especially - carefully semantically marking up the code would give a great set of hooks for inventive zengarden style sheet creators.

Sandy Blair, Sunday, 1 April 2007 07:36 (eighteen years ago)

Seconded! As priorities go it's hardly u+k, but nowt wrong with ensuring valid html, and it would indeed make writing custom css easier.

ledge, Sunday, 1 April 2007 16:21 (eighteen years ago)

What's wrong with it?

Keith, Sunday, 1 April 2007 17:12 (eighteen years ago)

All the <br/>s in comments. They're paragraphs and they should be marked up as such.

caek, Sunday, 1 April 2007 17:21 (eighteen years ago)

Some of this may sound like nitpicking...

For validity:
space needed before the slash in empty tags (or go with html instead of xhtml and lose the slash)
unescaped ampersands in links
(those are the most obvious, yr friendly html validator will of course show 'em all)

For css support/best practice in general:
paras instead of linebreaks in messages - in fact lose all <br>s, they're unstylable and not nice.
A wrapper with class or id for anything that could be stylable - there are a lot there already but could do with ones for e.g. the "--" in sigs, and separate ones for started by/last updated on the new answers page.

Incidentally why are all the messages in a thread inside a form?

xpostage

ledge, Sunday, 1 April 2007 17:25 (eighteen years ago)

I realise the paras instead of linebreaks in messages would require some more sophisticated message parsing.
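
For a sense of scale, the simplest version of that parsing is not much code. This is a sketch, not the board's formatter: it assumes a blank line separates paragraphs, folds single line breaks into spaces (which is a choice, and not everyone would want it), and escapes everything else.

```python
import html
import re


def message_to_html(text):
    """Turn a plain-text message into <p> elements, one per blank-line-separated block."""
    text = text.replace("\r\n", "\n").strip()
    # Split on blank lines; drop empty blocks left by runs of blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Collapse internal whitespace (including single newlines), then escape.
    return "\n".join(
        "<p>" + html.escape(" ".join(p.split())) + "</p>" for p in paragraphs
    )
```

Anything beyond this (quoted text, links, the allowed formatting markup) is where the "more sophisticated" part would come in.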

ledge, Sunday, 1 April 2007 17:28 (eighteen years ago)

Ledge, thanks for this, but I don't quite follow the validity comments (space needed before slash and unescaped ampersands), can you give me an example? I have used a validator and it doesn't tell me much, to be honest. What's a good one?

The CSS stuff is fine and right enough about the BRs.

The messages are all in a form because an admin requested functionality whereby they could select multiple threads to be deleted at once.

Cheers.

Keith, Sunday, 1 April 2007 17:31 (eighteen years ago)

For backwards compatibility with un-xhtml aware browsers, you have to have a space before the slash that indicates an empty tag - <br />, <hr />. And empty tags shouldn't be closed like normal tags - <input />, not <input></input>. Ampersands in all the inter-site links (and anywhere else) should be escaped - yadayadaservlet?boardid=40&amp;threadid=52386 (this doesn't affect link functionality).

http://validator.w3.org/ is the proverbial horse's mouth. Wordy but comprehensive and does a good job of explaining the errors. It gives a vast quantity of errors but they're mostly all 'reference to entity "boardid" for which no system identifier could be generated' which is just referring to the unescaped ampersands, of which there are many.

ledge, Sunday, 1 April 2007 17:45 (eighteen years ago)

Thanks...

Keith, Sunday, 1 April 2007 17:54 (eighteen years ago)

So of course, things are never as simple as you think! I replaced all the &s with &amp; and some of them are in sendRedirects, which means that the server instructs the browser to go to a different URL. These needed to go back to plain &s.
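One way to keep the two cases apart, sketched under assumptions: hold the URL in its raw form and escape it only when it is written into a page. The ThreadUrl class and its method names are invented for this example, and "yadayadaservlet" is the placeholder used earlier in the thread. The raw form is what would be passed to HttpServletResponse.sendRedirect, since a redirect URL travels in an HTTP header rather than in markup.

```java
class ThreadUrl {

    // The URL exactly as the browser should request it.
    private final String raw;

    ThreadUrl(int boardId, int threadId) {
        this.raw = "yadayadaservlet?boardid=" + boardId + "&threadid=" + threadId;
    }

    // For redirects: a Location header is not HTML, so the ampersand stays bare.
    String forRedirect() {
        return raw;
    }

    // For href/action attributes in a page: the ampersand must be an entity.
    String forHtml() {
        return raw.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        ThreadUrl url = new ThreadUrl(40, 52386);
        System.out.println("redirect: " + url.forRedirect());
        System.out.println("markup:   " + url.forHtml());
    }
}
```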

Keith, Sunday, 1 April 2007 22:16 (eighteen years ago)

Hey Keith, as I said, the main pragmatic reason to get the HTML valid is that Internet Explorer has two layout modes, compliant and quirks mode. Quirks mode follows all sorts of Microsoft-specific layout rules that drive everyone nuts, such as double counting padding and borders for the size of an element.
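To put numbers on the padding-and-borders point, here is a small worked sketch of the two box models. In compliant mode the CSS width is the content area, with padding and border added outside it. In IE's quirks mode the CSS width already includes them. The class and method names are made up for the example, and equal padding and border on both sides is assumed.

```java
class BoxModel {

    // Compliant mode: padding and border are added on top of the CSS width.
    static int renderedWidthCompliant(int cssWidth, int padding, int border) {
        return cssWidth + 2 * padding + 2 * border;
    }

    // Quirks mode: the element is exactly as wide as the CSS width...
    static int renderedWidthQuirks(int cssWidth, int padding, int border) {
        return cssWidth;
    }

    // ...so padding and border eat into the space left for content.
    static int contentWidthQuirks(int cssWidth, int padding, int border) {
        return Math.max(0, cssWidth - 2 * padding - 2 * border);
    }

    public static void main(String[] args) {
        int width = 200, padding = 10, border = 5;
        System.out.println("compliant rendered width: " + renderedWidthCompliant(width, padding, border));
        System.out.println("quirks rendered width:    " + renderedWidthQuirks(width, padding, border));
        System.out.println("quirks content width:     " + contentWidthQuirks(width, padding, border));
    }
}
```

With width 200, padding 10 and border 5, the same rule gives a 230px box in compliant mode and a 200px box (170px of content) in quirks mode, which is why layouts shift between the two.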

I'm not sure that not following the entities ruling invokes quirks mode though, but not having a space between the 'br' and the slash will do. I don't know of any advantage of having ILX in xhtml so would suggest just making them slashless.

Does anyone know if the 'microformat' advocates have made any suggestions for how to semantically mark up a message thread?

Sandy Blair, Monday, 2 April 2007 01:37 (eighteen years ago)

An XHTML 1.0 Strict valid ILX would force the markup to be better and more amenable to CSS than HTML 4.01 Strict. It should also render more quickly, especially on embedded devices, if not necessarily on IE.

libcrypt, Monday, 2 April 2007 04:32 (eighteen years ago)

libcrypt - ooh several derail debating opportunities, but I better not!

Did Keith say how easy it is to tweak the xhtml? If it is easy we could write up an ideal ilx DOM.

Sandy Blair, Monday, 2 April 2007 06:20 (eighteen years ago)

> hopefully it'll be kept in memory rather than read from file every time (they are).

by which i meant they *are* cached in memory. not sure that was clear.

i realised about 5 minutes after disconnecting that the example i gave was a prime one for the use of hash map - rather than 20 odd regexps that differed only in board name and board id (and which would be run for every request) you'd use one that used a map from board name to board id (or vice versa). i don't think this needs to be dynamic, just one entry per board.
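A rough sketch of that idea, with everything concrete in it assumed: the old-style link shape (/boardname/threadid), the board names and ids in the map, and the class name are all invented for illustration, and "yadayadaservlet" is the placeholder used earlier in the thread. The point is only that one pattern plus one map lookup replaces a regexp per board.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoardLinkRewriter {

    // One entry per board; static, since boards rarely change.
    // These names and ids are made up for the example.
    private static final Map<String, Integer> BOARD_IDS =
            Map.of("everything", 40, "music", 41);

    // A single pattern that captures the board name and the thread id,
    // instead of one near-identical pattern per board.
    private static final Pattern OLD_LINK = Pattern.compile("^/([a-z]+)/(\\d+)$");

    // Returns the new-style URL, or empty if the path is not an old link
    // or names a board that is not in the map.
    static Optional<String> rewrite(String oldPath) {
        Matcher m = OLD_LINK.matcher(oldPath);
        if (!m.matches()) {
            return Optional.empty();
        }
        Integer boardId = BOARD_IDS.get(m.group(1));
        if (boardId == null) {
            return Optional.empty();
        }
        return Optional.of("yadayadaservlet?boardid=" + boardId + "&threadid=" + m.group(2));
    }

    public static void main(String[] args) {
        System.out.println(rewrite("/music/52386").orElse("(no match)"));
        System.out.println(rewrite("/nosuchboard/1").orElse("(no match)"));
    }
}
```

Per request this costs one regexp match and one hash lookup however many boards there are, and adding a board means adding a map entry rather than another pattern.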

i think there's an html validator built into firefox - view source gives me a box full of errors at the bottom. 192 errors and 2 warnings for this page! (mostly it thinking unescaped ampersands are some weird entity)

koogs, Monday, 2 April 2007 08:50 (eighteen years ago)

Sandy, I couldn't bring myself to put in <br>! Not after almost a decade of dealing with XML systems.

In my head, I kind of feel HTML was bent and twisted into shape by the formative years of the WWW and that XHTML is an attempt to get that into a stricter form, based on XML rather than the older and more complex SGML. I don't really know a lot about the detail of the differences between HTML and XHTML, but I did write an SGML parser in 1996, to deal with the output from a Unix desktop publishing package and that was a bucketload of daft special cases, and I think it would make sense to avoid that sort of mess again, for a number of reasons, including my health!

Keith, Monday, 2 April 2007 10:35 (eighteen years ago)

An illegal link made the Chicago thread all in italics. :(

Also a couple bugs I'm seeing in IE:

--I deleted my cookies but after posting I don't get redirected back to the thread (or rather I do, but it doesn't load, I have to hit refresh)

--The hyperlinks at the bottom of the page (search, new answers, etc.) disappear when I mouse over them sometimes. At least on the fucked up Chicago thread.

Jordan, Monday, 2 April 2007 14:40 (eighteen years ago)

Woah! What happened there? I hit refresh and everything looks different!

kv_nol, Monday, 2 April 2007 15:01 (eighteen years ago)

What version of IE, Jordan?

libcrypt, Monday, 2 April 2007 15:05 (eighteen years ago)

IE6

Jordan, Monday, 2 April 2007 15:18 (eighteen years ago)

I can't get that effect to happen on IE 6.0.2900.2180, at least.

libcrypt, Monday, 2 April 2007 15:26 (eighteen years ago)

The hyperlink thing got fixed along with the italics.

That's the version I'm using, + SP2 and some other shit.

Jordan, Monday, 2 April 2007 15:28 (eighteen years ago)

Is there a way to access threads not in the top 50/new answers without trying to search?

milo z, Monday, 2 April 2007 19:48 (eighteen years ago)

APPARENTLY NOTT

Dr Morbius, Monday, 2 April 2007 20:30 (eighteen years ago)

the really rough part about attempting compliance at this point is that there's a mound of legacy html in the data of back posts from earlier versions of ilx. that's a tough problem to enforce new coolness but still have old nastiness.

m.

msp, Tuesday, 3 April 2007 00:42 (eighteen years ago)

>> Is there a way to access threads not in the top 50/new answers without trying to search?

> APPARENTLY NOTT

'More...', then 'Recent Questions' lets you page back through time.

koogs, Tuesday, 3 April 2007 08:15 (eighteen years ago)

Why is it so slow?

Eyeball Kicks, Tuesday, 3 April 2007 09:28 (eighteen years ago)

msp - indeed. I was thinking about this yesterday. No way am I going through some cleanup exercise either, as it'd just be too easy to screw up all the data, as well as being just too much work. The job to migrate from the old ILX database to the new one was bad enough, what with all the corruption that existed in the old one and that's not paying any attention to thread content (except the links, that had to be munged).

Libcrypt's done a lot of work to force XHTML compliance that'll go in with the next lot of stuff, but you're right and there's really no easy answer to it.

Keith, Tuesday, 3 April 2007 09:33 (eighteen years ago)

Yeah forget the HTML in old threads, if you get valid xhtml on a new thread - or a thread with no html playing, and if there are enough semantic hooks - basically decent class names against the elements of the message thread - then we're sorted for having a CSS zengarden style playground.

Sandy Blair, Tuesday, 3 April 2007 21:10 (eighteen years ago)

Whenever I post an answer, I get a blank screen, and then I have to click refresh to make ILX work again. Does anyone else have this problem?

Tuomas, Wednesday, 4 April 2007 07:40 (eighteen years ago)

