PHP/SQL Geeks Click Here For Fun N' Games

This is the thread to talk about the future of the ILX codebase, as per Pashmina, the future of ILX.

TOMBOT (TOMBOT), Thursday, 10 August 2006 12:32 (eighteen years ago) link

First question: entirely new codebase (long and complicated) or patching up the existing code (potential probs if Graham doesn't want just anyone looking at it)?

steal compass, drive north, disappear (tissp), Thursday, 10 August 2006 12:38 (eighteen years ago) link

As my boss would say, "do both"

Seriously, it would make sense to do a clean rewrite; and potentially it would be faster than trying to understand every nook and cranny of the old code.

(I have not seen the old code, but from what I understand it isn't wonderfully clear)

Forest Pines (ForestPines), Thursday, 10 August 2006 12:41 (eighteen years ago) link

yeah. it's not like there really was something really special about the old code.

(i mean, what everyone's after is the look and feel, which frankly seems easier to do than the usual fancy horror of most bbs!!!!)

ken c (ken c), Thursday, 10 August 2006 12:43 (eighteen years ago) link

I think the lack of comments in Gr4h4m's spaghetti recipe, as alluded to in the other thread, is enough to make it worthwhile to rewrite the thing. "patching up" should consist of the bare minimum of maintenance, as with anything that's planned for deprecation, I would think.

TOMBOT (TOMBOT), Thursday, 10 August 2006 12:44 (eighteen years ago) link

Did we have any good recommendations for frameworks on the other thread?

TOMBOT (TOMBOT), Thursday, 10 August 2006 12:44 (eighteen years ago) link

A scratch rewrite would be the way forward, but it needs to be made compatible with the archive. Either keep the tables as is or make it easy to migrate them over.

Johnny B Was Quizzical (Johnney B), Thursday, 10 August 2006 12:45 (eighteen years ago) link

Re: Frameworks,

We mentioned Ruby On Rails, but I think that's something that's not gonna fly. LAMP seemed to be the way people wanted to keep it.

steal compass, drive north, disappear (tissp), Thursday, 10 August 2006 12:46 (eighteen years ago) link

Keeping a rewrite compatible with existing database stuff is easy enough so long as the DB schemas are the same.

steal compass, drive north, disappear (tissp), Thursday, 10 August 2006 12:46 (eighteen years ago) link

If the DB schema is fairly clear and straightforward then rewriting the SQL parts should be clear and straightforward too.

(I love theories like that)

Forest Pines (ForestPines), Thursday, 10 August 2006 12:48 (eighteen years ago) link

a look at the current database schema would be handy though (if we have any aspirations to migrate the old data/put new code over the same db)

xxxpost

ken c (ken c), Thursday, 10 August 2006 12:48 (eighteen years ago) link

(Things needed for a rewrite, while it's on my mind:

SVN server
Bugzilla installation)

steal compass, drive north, disappear (tissp), Thursday, 10 August 2006 12:48 (eighteen years ago) link

Assuming Andrew doesn't get the sponsorship renewed, and assuming it's ok to continue running it on as it is on a pay-per-month basis for a couple of months, what kind of timescale are you all looking at to do this?

Pashmina (Pashmina), Thursday, 10 August 2006 12:49 (eighteen years ago) link

depends how much time people have really i suppose.. i'd guess around 2-4 weeks?

ken c (ken c), Thursday, 10 August 2006 12:51 (eighteen years ago) link

Depends really on how many people are working on the code, plus what level of functionality we're talking about: a basic ILX, i.e. the ability to add and respond to threads, could be completed before any extra functionality is coded in.

steal compass, drive north, disappear (tissp), Thursday, 10 August 2006 12:52 (eighteen years ago) link

plus a couple of weeks for testing/beta (web 2.0 stylee)

ken c (ken c), Thursday, 10 August 2006 12:52 (eighteen years ago) link

i mean, (a couple of years if we're being web 2.0 stylee)

ken c (ken c), Thursday, 10 August 2006 12:53 (eighteen years ago) link

A couple of infinities if you're Google.

Forest Pines (ForestPines), Thursday, 10 August 2006 12:54 (eighteen years ago) link

So if the sponsorship's running out on the 7th of Sept, we could do with another month, maybe another 2. I'll ask Andrew about it when he pops up on the other thread tomorrow.

Pashmina (Pashmina), Thursday, 10 August 2006 12:55 (eighteen years ago) link

google now provides free SVN hosting, fwiw.

gbx (skowly), Thursday, 10 August 2006 12:55 (eighteen years ago) link

Is this on the Google Code thing?

steal compass, drive north, disappear (tissp), Thursday, 10 August 2006 12:56 (eighteen years ago) link

i still remember some BASIC, if it helps

heavyweight grebt (sanskrit), Thursday, 10 August 2006 12:57 (eighteen years ago) link

(um, i think that should say 'MySQL' in the title, just to be precise as it's not exactly SQL. a version number for both that and the PHP codebase would be good too - they do differ quite a bit between revisions. i guess if we're rewriting we have a bit of leeway there).

have used php and mysql before for small webby projects, nothing anybody but stevem has ever seen though. have some smarty experience too (andrew mentioned it in his rewrite plans but i'm not sure it'll be useful). i also have to take some holiday (29 days left this year...) maybe i can take time off and sit and contribute to this...

would like to see current code before deciding on a complete rewrite though, 'licence' permitting.

Koogy Yonderboy (koogs), Thursday, 10 August 2006 13:06 (eighteen years ago) link

Fixing the PHP code has been discussed numerous times in the past. The sticking point has always been that the original authors didn't want it to be public right? Seems like that will continue to be a thorn unless a new codebase is made from the ground up. Seems to be the way forward as long as the DB schema can stay intact. Also PHP5 with support for classes would be a good move if a new codebase is undertaken.

Songbirds of Darker Florida (cprek), Thursday, 10 August 2006 13:23 (eighteen years ago) link

how would classes help? which classes would you use? (i am OO dumb and tend to see everything through procedural eyes rather than OO eyes, sorry)

Koogy Yonderboy (koogs), Thursday, 10 August 2006 13:26 (eighteen years ago) link

COBOL + tables + frames

=[[ (eman), Thursday, 10 August 2006 13:31 (eighteen years ago) link

http://www.koogy.clara.co.uk/ile.html (sorry)

Koogy Yonderboy (koogs), Thursday, 10 August 2006 13:51 (eighteen years ago) link

I'm not sure if classes would be useful or not, having not seen any of the current code or schema. But if there is an option of being able to use fancy class features and not needing them, or needing fancy class features and not being able to use them... pick the former.

PHP5 class and object handling was rewritten for more functionality and optimization. It's a little different from PHP4 though, so better to know right away what you plan on using. I imagine that for this board it would be a minor detail, but better

http://www.alternateinterior.com/2006/06/differences-between-php4-and-5s-object.html

haha FRAMES!

Songbirds of Darker Florida (cprek), Thursday, 10 August 2006 13:51 (eighteen years ago) link

eman OTM

steal compass, drive north, disappear (tissp), Thursday, 10 August 2006 13:52 (eighteen years ago) link

C'mon you pussies, do it in assembly.

I'm "familiar" with PHP and MySQL, most of my experience is with teh ASP/SQL Server evil empire, tho. No experience with forum apps but plenty with enterprise apps.

A rewrite would make it more extensible in the future, but I think what we want to avoid is having a whole bunch of ILXors pay for hosting and then be plagued by performance problems (or working out the bugs of an untested app). It would be ideal if an interim solution could reduce the poxy fule's of the current codebase while everybody dabbles in an open source rewrite.

What I'm really curious about is the root cause of the poxy fule errors - Is the code not cleaning up its connections to the db? Poor indexing on the tables? I'm wondering if some tweaks to the db could resolve it. There was a very bad period for the past couple of days, which then was alleviated - wha happened? Did the errors cause reduced traffic, or did somebody actually do something?

Edward III (edward iii), Thursday, 10 August 2006 13:52 (eighteen years ago) link

The sticking point has always been that the original authors didn't want it to be public right?

The original author of the php port didn't want it to be public. Greenspun did his in Tcl/Tk for vastly different reasons.

Rufus 3000 (Mr Noodles), Thursday, 10 August 2006 14:03 (eighteen years ago) link

I'm not sure I could be that much help, as most of my experience is with ASP/SQL Server and a bit of Java/JSP/MySQL and the tiniest bit of PHP. I'm pretty handy with CSS, but that's not much use if we're replicating the current look and feel of ILX as it stands.

Greig (treefell), Thursday, 10 August 2006 14:06 (eighteen years ago) link

frameworks is not something i'm really qualified to talk about, but you know, despite the complexity of reading the code, the code itself is NOT too complex, but there is a lot of it.

making it clearer is key, php classes/objects could well play a part in making the code more transparent. just better commenting and sensible variable names and a convention for the important globals would do just as well.

Britain's Obtusest Shepherd (Alan), Thursday, 10 August 2006 14:08 (eighteen years ago) link

Yeah I'm curious as well. Could be coding issues not being as efficient with its open connections. Could be mysql server needs to have a little my.cnf tweaking. Could be the hardware just can't keep up with the site volume when the entirety of the content for each page is dynamic from the DB.. Is there anyone who could offer more than speculation?

I'd also suggest using PEAR:DB for simplifying/optimizing any code relating to MySQL connections. Works really well.

http://pear.php.net/package/DB

First step seems to me to start an SVN repository, perhaps on the google thing. Upload/edit any relevant schema files, specs, feature lists onto it so every techie that wants to can have a look.

Songbirds of Darker Florida (cprek), Thursday, 10 August 2006 14:09 (eighteen years ago) link

One feature that would be nice is user-definable stylesheets, which would be very easy to implement.

steal compass, drive north, disappear (tissp), Thursday, 10 August 2006 14:09 (eighteen years ago) link

(xposts)

steal compass, drive north, disappear (tissp), Thursday, 10 August 2006 14:10 (eighteen years ago) link

I'm with Greig in the ASP/SQL camp, but I'll be glad to help in any way possible (testing/migration grunt work/etc). Will it be possible to deploy a separate test db somewhere?

As a person whose job consists of rescuing hopelessly spaghetti coded projects (SCADA systems, not your typical db stuff), I sincerely recommend starting from scratch. use the existing code as a fieldguide, if necessary, to see what the object was, but don't try to patch it.

Jaq (Jaq), Thursday, 10 August 2006 14:18 (eighteen years ago) link

implementing new features like user themes should take a backseat to replicating the current functionality in more transparent and manageable (open) code.

having the current db schema is paramount to starting any of this, so who has it, and how do we control distribution?

TOMBOT (TOMBOT), Thursday, 10 August 2006 14:32 (eighteen years ago) link

possible plan of action (whether or not we're moving server - this whole rewrite thing really is kind of a separate issue to the moving server thing!! but if it improves things, it can mean we can have a much cheaper solution on the new server front)..

1. keep current ilx running (whatever way)
2. copy a few of the smaller boards from currentilx db into a nu-ilx db
3. create nu-ilx code
4. test out nu-ilx
5. migrate current-ilx db to nu-ilx, do whatever final tests necessary (perhaps officially move some smaller boards on nu-ilx as pilots for a couple of weeks or so)
6. turn off rest of old-ilx
7. slag each other off, wave dicks in nu-ilx like the old days.

ken c (ken c), Thursday, 10 August 2006 14:36 (eighteen years ago) link

implementing new features like user themes should take a backseat to replicating the current functionality in more transparent and manageable (open) code.

Oh, agreed, without a doubt.

As an addition to that, we should be aiming for small, tight code (this will benefit both the developers and the server itself).

steal compass, drive north, disappear (tissp), Thursday, 10 August 2006 14:37 (eighteen years ago) link

ok nevermind, first hurdle
http://static.flickr.com/66/211800945_dca1f9aa1e_o.jpg

TOMBOT (TOMBOT), Thursday, 10 August 2006 14:39 (eighteen years ago) link

An admin who is able to see all ILX functionality should compile a current_features.txt file and make another file feature_requests.txt.

Schema is urgent and key.

(xpost)

Songbirds of Darker Florida (cprek), Thursday, 10 August 2006 14:40 (eighteen years ago) link

Is there anyone who could offer more than speculation?

From the impression I've gotten, Andrew or Alan would need to sound off on this.

Unfortunately, a major design flaw in ILX is also its key feature - no pagination of threads, each one opens in its full glory. With small/short threads it's not a problem, but if you've got a popular and very long thread, the poxy fule's may be a result of delivering the full payload on each request. Just to test this I saved off a couple of big threads and here are the file sizes (note this just represents the textual HTML, not any images which are delivered from other people's servers and aren't our goddamned problem):

Steely Dan: "Steely Dan's name has been popping up as a hip musical crush. Remember, this glossy bop-pop was the indifferent aristocracy to punk rock's stone-throwing in the late 70's. People fought and died so our generation could listen to something better. "
347 KB

A New Thread fot the Current Israel/Palestine/Lebanon mess
660 KB

Rolling Teenpop 2006 Thread
1.08 MB (!!!)

So each time the Rolling Teenpop thread gets hit, the codebase and db have to pull together over a meg of text and deliver it through the pipe. You also have the problem of people hitting refresh refresh refresh, and if the refresh occurs before their last request was completed the server may still be occupied threading their last one. It's not difficult to see how the environment could get stressed out under these conditions.

The question is, what's the best way to optimize when the mandate is to not reduce the payload?

Edward III (edward iii), Thursday, 10 August 2006 14:42 (eighteen years ago) link

what's the best way to optimize when the mandate is to not reduce the payload?

Keep a pregenerated cache of each page in New Answers? That way the most popular threads would only get generated once per post, not once per view.

Forest Pines (ForestPines), Thursday, 10 August 2006 14:46 (eighteen years ago) link

exactly. then the server only needs to copy and paste the html into your browser rather than working it out each time.

ken c (ken c), Thursday, 10 August 2006 14:48 (eighteen years ago) link

I can set up an SVN server if we need it (alternatively, what about sourceforge?)

steal compass, drive north, disappear (tissp), Thursday, 10 August 2006 14:50 (eighteen years ago) link

I'd just like to say to everyone... good luck, we're all counting on you.

Ste (Fuzzy), Thursday, 10 August 2006 14:52 (eighteen years ago) link

Next question is, how difficult would it be to implement precaching in the current codebase?

Edward III (edward iii), Thursday, 10 August 2006 14:52 (eighteen years ago) link

How easy would it be to set up another board on the current ILX where we can discuss code issues? (a limited poster board, like Mod Discussion)

Would it be worth it? It seems silly just to use one single thread for every potential issue.

steal compass, drive north, disappear (tissp), Thursday, 10 August 2006 14:55 (eighteen years ago) link

At this point I think we're okay w/ the single thread. We're just chatting. Once the rubber hits the road a dedicated forum would be appropriate.

Xpost - we're assuming that pre-caching isn't being performed... perhaps another question for Andrew and Alan. Maybe someone should review the thread and compile a list of questions for them?

Edward III (edward iii), Thursday, 10 August 2006 14:56 (eighteen years ago) link

I've been under the impression that post/thread deletion (as opposed to editing) has invariably caused poxy fulery - is that not the case?

Jaq (Jaq), Thursday, 10 August 2006 14:56 (eighteen years ago) link

http://code.google.com/p/ilxor/

hmmm

TOMBOT (TOMBOT), Thursday, 10 August 2006 14:59 (eighteen years ago) link

Aha. Only someone who knows the codebase can say that.

Starting from scratch, it wouldn't be that hard. The basic operations would be fairly simple:

1) when loading a thread, look for a thread-cache file and serve that if possible
2) when inserting a post, generate a thread-cache file too
3) when deleting or editing a post, either delete any thread-cache file, or generate a replacement.
4) run the following shell command, or something like it, once per day or whenever:

find $ilx_threadcache_dir -mtime +2 -print0 | xargs -0 rm

(which would delete any cached threads that haven't been updated in a couple of days, so you don't end up with a cached copy of the entire database)
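
The four steps above can be sketched roughly as follows. This is a minimal illustration in Python rather than the PHP the board runs on, and everything named here (`CACHE_DIR`, `render_thread`, `fetch_posts`, the file naming) is an assumption made for the example, not anything taken from the real codebase or schema.

```python
import html
import os
import tempfile
import time
from pathlib import Path

# Hypothetical cache location; a real deployment would pick a fixed directory.
CACHE_DIR = Path(tempfile.mkdtemp(prefix="ilx_threadcache_"))
MAX_AGE_SECONDS = 2 * 24 * 60 * 60  # roughly the "find -mtime +2" cutoff


def cache_path(thread_id):
    return CACHE_DIR / f"thread_{thread_id}.html"


def render_thread(posts):
    """Stand-in for the expensive part: building the page from DB rows."""
    body = "".join(f"<p>{html.escape(p)}</p>" for p in posts)
    return f"<html><body>{body}</body></html>"


def write_cache(thread_id, page):
    """Step 2: write to a temp file, then rename, so readers never see half a page."""
    path = cache_path(thread_id)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(page, encoding="utf-8")
    os.replace(tmp, path)


def load_thread(thread_id, fetch_posts):
    """Step 1: serve the cache file if it exists, otherwise render and cache."""
    path = cache_path(thread_id)
    if path.exists():
        return path.read_text(encoding="utf-8")
    page = render_thread(fetch_posts(thread_id))
    write_cache(thread_id, page)
    return page


def invalidate(thread_id):
    """Step 3: after an edit or delete, drop the cache so the next view rebuilds it."""
    cache_path(thread_id).unlink(missing_ok=True)


def purge_stale(now=None):
    """Step 4: remove cache files not updated within MAX_AGE_SECONDS."""
    now = time.time() if now is None else now
    removed = 0
    for path in CACHE_DIR.glob("thread_*.html"):
        if now - path.stat().st_mtime > MAX_AGE_SECONDS:
            path.unlink()
            removed += 1
    return removed
```

Here `fetch_posts` is whatever function talks to the database; it only gets called on a cache miss, which is the whole point.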

(xpost)

I thought it was user locking which caused the *worst* outages, but I could be wrong. I was also under the impression that some of the outages are inexplicable; or have been happening for reasons that haven't been disclosed.

Forest Pines (ForestPines), Thursday, 10 August 2006 14:59 (eighteen years ago) link

Edward, if you go to your settings page (at the bottom, second from left), you can set the number of thread messages that are displayed - "last 20/50/100 or 200 posts" - whoever has that feature enabled pulls up a much smaller file than the ones listed. Quick and dirty solution to this could be to default that setting to, say, the last 50 answers displayed. The default for non-logged in, at least is to display the whole thread, which is probably a bit ridiculous, thinking about it.
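
A "last 50 by default" view could look something like this sketch. It is Python with an in-memory SQLite table purely so it runs anywhere; the `posts` table, its columns and `DEFAULT_LIMIT` are assumptions for illustration, since nobody on the thread has seen the real schema yet.

```python
import sqlite3

DEFAULT_LIMIT = 50  # hypothetical default for viewers with no saved setting


def last_posts(db, thread_id, limit=DEFAULT_LIMIT):
    """Return the newest `limit` posts of a thread, oldest first for display."""
    rows = db.execute(
        "SELECT id, body FROM posts WHERE thread_id = ? ORDER BY id DESC LIMIT ?",
        (thread_id, limit),
    ).fetchall()
    return rows[::-1]  # the query walks newest-first; flip into reading order


# Tiny in-memory demo table standing in for the real (unseen) schema.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, thread_id INTEGER, body TEXT)")
demo.executemany(
    "INSERT INTO posts (thread_id, body) VALUES (?, ?)",
    [(1, f"post {n}") for n in range(1, 121)],
)
page = last_posts(demo, thread_id=1)
print(len(page), page[0][1], page[-1][1])
```

Only the requested rows leave the database, so a 120-post thread costs the same to serve as a 50-post one.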


x-post, I don't think post/thread deletion has been causing the issue - I've deleted numerous threads from the moderator request board, without any noticeable degradation. The slow-down usually seems to hit around the end of the afternoon, uk time.


Pashmina (Pashmina), Thursday, 10 August 2006 15:00 (eighteen years ago) link

(has anyone considered splitting ILX2.0 between two servers, public for the PHP and private for the database? If we did do thread caching then that would be very worthwhile, because gets of cached threads wouldn't even touch the DB machine)

xpost: I'd forgotten about that feature, which would add an extra level of complexity to caching stuff.

Forest Pines (ForestPines), Thursday, 10 August 2006 15:02 (eighteen years ago) link

the google code project page gives us some bugtracking and has svn all ready to go. just needs to be populated!

TOMBOT (TOMBOT), Thursday, 10 August 2006 15:03 (eighteen years ago) link

xpost: I'd forgotten about that feature, which would add an extra level of complexity to caching stuff.

that's quite easily solved.

the hardest bit is at the moment how each post displays is different depending on 1) whether you're logged in and 2) what user settings you have (show all details etc)

ken c (ken c), Thursday, 10 August 2006 15:08 (eighteen years ago) link

Anyone know if MySQL query cache is enabled?

Songbirds of Darker Florida (cprek), Thursday, 10 August 2006 15:11 (eighteen years ago) link

but i guess you can transform those bits with some kind of sed thingymajik which will still be less painful
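
One way the "sed thingymajik" might work, sketched in Python: cache the page once with placeholders where the per-viewer bits go, then fill or strip them on the way out. The `{{details:...}}` marker syntax and the `personalise` function are invented for this example.

```python
import re

# Cached once per thread; the placeholder marks a bit that varies per viewer.
CACHED = '<p>hello <span class="sig">ken c{{details:ken c}}</span></p>'
PLACEHOLDER = re.compile(r"\{\{details:(.*?)\}\}")


def personalise(cached_html, show_details, details_by_user):
    """Fill in or strip the optional user details without re-rendering the thread."""
    def fill(match):
        if not show_details:
            return ""
        return " (" + details_by_user.get(match.group(1), "") + ")"
    return PLACEHOLDER.sub(fill, cached_html)


print(personalise(CACHED, False, {}))
print(personalise(CACHED, True, {"ken c": "ken c"}))
```

The substitution is a single pass over text already in hand, which should be far cheaper than rebuilding the page from the database for every combination of settings.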

ken c (ken c), Thursday, 10 August 2006 15:11 (eighteen years ago) link

It may not be worth the effort to figure out what's causing the poxy fules - just write the new code to be optimized and efficient (and modular, if at all possible).

Jaq (Jaq), Thursday, 10 August 2006 15:11 (eighteen years ago) link

We should start by getting a basic post thread/post response thing going, without worrying about the different display modes.

steal compass, drive north, disappear (tissp), Thursday, 10 August 2006 15:11 (eighteen years ago) link

I'll set up a bugzilla later tonight if anyone thinks it'd be useful

stet (stet), Thursday, 10 August 2006 15:14 (eighteen years ago) link

> Keep a pregenerated cache of each page in New Answers?

problem with cached pages is all the optional bits that users like, like the bracketed bits in the signature (most can be replaced by a link to user details page i guess, maybe one that passes message id as a param).

'show all posts' is also very useful on dialup when you hit 'load', wait for page to download and then disconnect and read all posts at your leisure. so suggest caching an 'all posts' version too

xposts, maybe.

Koogy Yonderboy (koogs), Thursday, 10 August 2006 15:14 (eighteen years ago) link

I'm thinking I'll just go post this over on Mod Req, perhaps? With "please" and "thank you" appended

1. An admin who is able to see all ILX functionality should compile a current_features.txt file and make another file feature_requests.txt.

2. Schema is urgent and key.

3. we're assuming that pre-caching isn't being performed

4. Anyone know if MySQL query cache is enabled?

TOMBOT (TOMBOT), Thursday, 10 August 2006 15:16 (eighteen years ago) link

(bugzilla i find unreadable. we use mantis here at work. http://www.mantisbt.org/)

Koogy Yonderboy (koogs), Thursday, 10 August 2006 15:16 (eighteen years ago) link

(Okay, am I the only one reading TOMBOT's posts?)

Ned Raggett (Ned), Thursday, 10 August 2006 15:17 (eighteen years ago) link

seriously, if mysql query cache is enabled then all the major pageload queries (including show only 50 new, etc) are already being cached, right?

Songbirds of Darker Florida (cprek), Thursday, 10 August 2006 15:17 (eighteen years ago) link

haha no I am.

Songbirds of Darker Florida (cprek), Thursday, 10 August 2006 15:17 (eighteen years ago) link

:-)

Ned Raggett (Ned), Thursday, 10 August 2006 15:18 (eighteen years ago) link

(oh, the google project page has an issues tracking thing anyway)

Koogy Yonderboy (koogs), Thursday, 10 August 2006 15:18 (eighteen years ago) link

it needs ppl's google usernames for some of the functionality. since I'm the owner, I have to add them in. webmail me, the address actually works!

TOMBOT (TOMBOT), Thursday, 10 August 2006 15:22 (eighteen years ago) link

(also if I can get pash's gmail name I'll put him in as an owner)

TOMBOT (TOMBOT), Thursday, 10 August 2006 15:23 (eighteen years ago) link

(not sure what owners can do that members can't, yet)

TOMBOT (TOMBOT), Thursday, 10 August 2006 15:23 (eighteen years ago) link

vietgrove(at)gmail(dot)com

Thanks, Tom.

Pashmina (Pashmina), Thursday, 10 August 2006 15:24 (eighteen years ago) link

tissp28, Tom

steal compass, drive north, disappear (tissp), Thursday, 10 August 2006 15:26 (eighteen years ago) link

It may not be worth the effort to figure out what's causing the poxy fules - just write the new code to be optimized and efficient (and modular, if at all possible).

-- Jaq (js...), August 10th, 2006.

I'll make this point again - if the move occurs, it's going to happen on September 7th. If a bunch of part-timers can re-code the entire app, enter a test/revision cycle, perform a stress test, and migrate the existing database into a new structure in less than a month's time, I'd be very impressed. It's not impossible, but keep in mind that forum members will be paying out of pocket to keep it going post 9/7. I'd hate to rollout a halfbaked app to somebody who just laid out cold hard cash, and as I noted on the other thread, I've seen bad migrations kill good forums.

This is a bit like changing the tire on a moving car, so I'd advise to err on the side of deliberation and a phased transition. If necessary, the dev team could be split into an optimization crew and a new app crew. That way there's three possible outcomes (1) move to new server with existing codebase, 2) move to new server with optimized codebase, 3) move to new server with new app) and maximum flexibility in case contingencies arise.

Edward III (edward iii), Thursday, 10 August 2006 15:27 (eighteen years ago) link

> it needs ppl's google usernames

that's me out then 8)

is there any guarantee that all the code isn't going to be indexed and made searchable by googly?

4) not have to move, rewrite at leisure.

Koogy Yonderboy (koogs), Thursday, 10 August 2006 15:28 (eighteen years ago) link

I have identified the first problem there.

IPSISSIMUS (Uri Frendimein), Thursday, 10 August 2006 15:38 (eighteen years ago) link

If necessary, the dev team could be split into an optimization crew and a new app crew. That way there's three possible outcomes (1) move to new server with existing codebase, 2) move to new server with optimized codebase, 3) move to new server with new app) and maximum flexibility in case contingencies arise.

Because the code as it exists is currently closed and I think it's wise to operate under the assumption that we probably can't or won't get new coder accounts in the next two weeks, replace "optimization" with "babysit" and that's what's happening already.

TOMBOT (TOMBOT), Thursday, 10 August 2006 15:38 (eighteen years ago) link

http://code.google.com/p/ilxor/issues/list

IPSISSIMUS (Uri Frendimein), Thursday, 10 August 2006 15:40 (eighteen years ago) link

Thanks for summarizing the q's Tombot. Wanna add me to the Google Code... my gmail's in my sig ....

Edward III (edward iii), Thursday, 10 August 2006 15:43 (eighteen years ago) link

A good way to end a lot of the more speculative optimization ideas is to add to the repository:

-httpd.conf
-my.cnf
-machine_specs.txt

Of course any sensitive bits should be edited out. Maybe add a directory just to hold non-code relevant configs and feature documentation. Would the admins be willing to do this?

Songbirds of Darker Florida (cprek), Thursday, 10 August 2006 15:43 (eighteen years ago) link

Because the code as it exists is currently closed and I think it's wise to operate under the assumption that we probably can't or won't get new coder accounts in the next two weeks, replace "optimization" with "babysit" and that's what's happening already.

-- TOMBOT (tombo...), August 10th, 2006.

That's assuming the optimization team would be working on code changes - however, there are plenty of things external to the code that could possibly provide relief (e.g. turning on MySQL cache).

Edward III (edward iii), Thursday, 10 August 2006 15:46 (eighteen years ago) link

It seems highly likely that MySQL is capable of caching the results of queries, and I'd be surprised if that isn't turned on by default; however, the problem is, in order that the cache is utilised, you have to make a query, which means you have to first check out an expensive database connection from a limited pool of these, so even if it is cached, you may run out of connections (the poxy fule messages seem to suggest this is the issue, but without looking at code...).

If it were me, similar to what Forest Pines is suggesting, I would cache the whole of the new answers page's threads in memory (this assumes that this is what most people look at, but the principle would apply to other places) - hash table of linked lists. It's not so big. Reading files is expensive compared to reading from memory. You could compress if it's an issue, trading off CPU for memory. I don't think the presentation is too much of an issue Deano, that's just post-processing isn't it? Key to scalability is to use as few resources (connections, memory, disk, CPU etc.) as possible for as little time as possible. This approach trades off memory for servicing a request much more quickly (memory access being 1000s of times faster than disk and likely 10,000 times faster than DB access). It saves you using a database connection at all, and releases an HTTP connection much more quickly due to shortening the length of the transaction. This frees those up for other users to make use of. Given the amount of info on the new answers page's threads, it seems feasible (can't be more than a few megs I expect, but even if it's 100MB, that's still only about 6 quid of memory).
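
As a rough sketch of that "hash table of lists" idea, here it is in Python (not PHP, and not tied to any real ILX structure). The `ThreadCache` class, the 200-thread bound and the optional zlib compression are all assumptions for illustration.

```python
import zlib
from collections import OrderedDict


class ThreadCache:
    """In-memory map of thread id -> list of posts, bounded by thread count."""

    def __init__(self, max_threads=200, compress=False):
        self.max_threads = max_threads
        self.compress = compress  # trade CPU for memory if space is tight
        self._threads = OrderedDict()  # least recently answered thread first

    def _pack(self, text):
        raw = text.encode("utf-8")
        return zlib.compress(raw) if self.compress else raw

    def _unpack(self, blob):
        raw = zlib.decompress(blob) if self.compress else blob
        return raw.decode("utf-8")

    def put(self, thread_id, posts):
        """Load a whole thread (e.g. after a cache miss was served from the DB)."""
        self._threads[thread_id] = [self._pack(p) for p in posts]
        self._threads.move_to_end(thread_id)
        while len(self._threads) > self.max_threads:
            self._threads.popitem(last=False)  # evict least recently answered

    def append(self, thread_id, post):
        """Add a new post to a cached thread; returns False if it is not cached."""
        posts = self._threads.get(thread_id)
        if posts is None:
            return False  # caller should reload the full thread with put()
        posts.append(self._pack(post))
        self._threads.move_to_end(thread_id)
        return True

    def get(self, thread_id):
        """Return the thread's posts, or None so the caller can fall back to the DB."""
        posts = self._threads.get(thread_id)
        return None if posts is None else [self._unpack(p) for p in posts]

    def size_bytes(self):
        return sum(len(p) for posts in self._threads.values() for p in posts)
```

Refusing to `append` to an uncached thread matters: otherwise an evicted thread would come back holding only its newest post.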

For updates, first update the cache, then ideally stick the update on a persistent message queue (like MQseries or MSMQ; there are open source versions of these) to which there is a single thread (possibly more than one depending on volume) servicing the updates in series (no idea if you can do this in PHP, I'm afraid). This lets the system record the information to the database ultimately, without placing load on the server that's directly proportional to the number of users currently accessing the site (effectively, spreads updates out over a longer time, smoothing the profile). Obviously need to check that the queue isn't just getting longer and longer. Persistent queues will protect against the server crashing. It won't affect users, since they'll get all the data from the cache.
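
A minimal write-behind sketch of that flow, in Python with SQLite standing in for MySQL. Note the big caveat: `queue.Queue` lives in process memory, so unlike the persistent queues mentioned above it would lose pending writes in a crash. `WriteBehindStore` and the `posts` table are made up for the example.

```python
import queue
import sqlite3
import threading


class WriteBehindStore:
    """Readers see new posts at once via the cache; one worker writes to the DB."""

    def __init__(self, db_path):
        self.cache = {}  # thread_id -> list of post bodies
        self.pending = queue.Queue()  # NOT persistent, unlike a real message queue
        self._db_path = db_path
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def add_post(self, thread_id, body):
        # Update the cache first, then queue the slow database write.
        self.cache.setdefault(thread_id, []).append(body)
        self.pending.put((thread_id, body))

    def backlog(self):
        """Approximate queue length; worth monitoring in case it only ever grows."""
        return self.pending.qsize()

    def _drain(self):
        # The single consumer owns the only DB connection and writes in series.
        db = sqlite3.connect(self._db_path)
        db.execute("CREATE TABLE IF NOT EXISTS posts (thread_id INTEGER, body TEXT)")
        while True:
            item = self.pending.get()
            if item is None:  # sentinel pushed by close()
                self.pending.task_done()
                break
            db.execute("INSERT INTO posts (thread_id, body) VALUES (?, ?)", item)
            db.commit()
            self.pending.task_done()
        db.close()

    def close(self):
        self.pending.put(None)
        self._worker.join()
```

However many people post at once, the database only ever sees one connection doing one insert at a time.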

Of course, all this depends on my guesswork as to how people use ILX. I would say that the most important thing to do now is to find out how people actually do use it, but in a typical situation, 95% of access is read and 5% update. I would suspect ILX has a lower proportion of updates than this. With this in mind, you would optimise towards reading, which is kind of what myself and others are suggesting.

KeefW (kmw), Thursday, 10 August 2006 17:30 (eighteen years ago) link

Query caching in MySQL isn't turned on by default. The beauty of it as well is that it will automatically regenerate a cache file upon seeing updates to the data. I don't know if this is in memory or not, but that could be easily remedied. Again, we don't know until someone tells us.

Keef OTM. And there's no reason to do all of these modifications really. The only concern I can think of regarding queuing page caches would be the potential for really wild xposts under heavy load. I'm probably missing a detail in the technical process though.

Optimizing reads is definitely key in the new ILX engine, codenamed Excelsior.

Songbirds of Darker Florida (cprek), Thursday, 10 August 2006 17:41 (eighteen years ago) link

Songbirds, what I was meaning was that all the updates happen in memory, but that the actual DB updates get queued (you do both). So users would get the updates immediately, because they read from the cache, just that they wouldn't yet be written to the DB. It's possibly a bit enterprise-class, and you could probably get away with just caching, but it would service an awful lot of users.

It would actually require a slight change to ensure something didn't get removed from the cache before it was written to the DB, so cache entries would need to have a counter incremented when inserted, and the update would need to decrement the counter once it was flushed to disk. Once it's at zero, it's OK to remove from the cache.
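
The counter could be as simple as this Python sketch. The names `CountingCache`, `mark_flushed` and `evict` are hypothetical, and a lock is used on the assumption that the request handlers and the DB writer run concurrently.

```python
import threading


class CacheEntry:
    def __init__(self):
        self.posts = []
        self.unflushed = 0  # posts inserted but not yet written to the DB


class CountingCache:
    """Cache whose entries can only be evicted once all their writes are flushed."""

    def __init__(self):
        self._entries = {}
        self._lock = threading.Lock()

    def insert(self, thread_id, post):
        with self._lock:
            entry = self._entries.setdefault(thread_id, CacheEntry())
            entry.posts.append(post)
            entry.unflushed += 1  # incremented when inserted

    def mark_flushed(self, thread_id):
        """Called by the DB writer after one post has reached the database."""
        with self._lock:
            entry = self._entries[thread_id]
            if entry.unflushed == 0:
                raise ValueError("flush reported with no writes pending")
            entry.unflushed -= 1

    def evict(self, thread_id):
        """Remove the entry only if nothing is pending; returns True on removal."""
        with self._lock:
            entry = self._entries.get(thread_id)
            if entry is None or entry.unflushed > 0:
                return False
            del self._entries[thread_id]
            return True
```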

At a guess of an average 100K per thread, and 200 threads on the new answers page, that would need a cache of 20M, assuming no compression.

KeefW (kmw), Thursday, 10 August 2006 17:52 (eighteen years ago) link

I'd be surprised if that isn't turned on by default; however, the problem is, in order that the cache is utilised, you have to make a query, which means you have to first check out an expensive database connection from a limited pool of these, so even if it is cached, you may run out of connections

If you're currently running out of connections, minimizing the amount of time those connections are open will allow them to return to the pool more quickly. This is helpful if long-running queries are hogging connections. Of course, this is all theory until someone can look at the MySQL logs on the server to verify where the poxy fules are coming from.

The rest of your suggestions are helpful, but that's new app talk. I'm suggesting that someone investigate small environment tweaks that might relieve some of the current lockouts, since we may not be able to mod the existing codebase. I had an ETL job that was running for 90 minutes - by building one index over a single column it dropped to 5 minutes. Sometimes the simplest acts can fix these issues, the real bear is figuring out exactly where the bottleneck is.
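To show the kind of difference a single-column index makes, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL, and the `posts` table and `thread_id` column are made-up names, not the real ILX schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, thread_id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO posts (thread_id, body) VALUES (?, ?)",
    [(i % 500, "post %d" % i) for i in range(10000)],
)


def query_plan(connection, sql):
    """Return the planner's description of how it would run `sql`."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # the detail text is the last column


lookup = "SELECT * FROM posts WHERE thread_id = 42"
plan_before = query_plan(conn, lookup)  # without an index: a scan of the whole table
conn.execute("CREATE INDEX idx_posts_thread ON posts (thread_id)")
plan_after = query_plan(conn, lookup)   # with the index: a search via idx_posts_thread

print(plan_before)
print(plan_after)
```

On the real server the equivalent check would be MySQL's own EXPLAIN against the slow queries found in the logs, which is why the logs come first.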

Edward III (edward iii), Thursday, 10 August 2006 17:55 (eighteen years ago) link

Absolutely... Minimise resource usage and the amount of time that resources are used for.

So by contrast, if updates are much more frequent than reads, then indexing will make it slower for high volumes. As you say, working out where the bottlenecks are is highly important.

The key would seem to be analysing the usage patterns of the current version of ILX.

KeefW (kmw), Thursday, 10 August 2006 18:00 (eighteen years ago) link

this is all theory until someone can look at the MySQL logs on the server to verify where the poxy fules are coming from

boardfunctions.php on line 127 seems like a good place to start

TOMBOT (TOMBOT), Thursday, 10 August 2006 18:09 (eighteen years ago) link

Incidentally, without actually looking, I would guess that searching is the obvious connection hog. Assuming a search on message contents runs an SQL query against the actual database (rather than against a data structure suited to searching this type of info), it will do a tablespace scan on the entire database, holding a connection until all of this is done, regardless of whether the user has given up and is doing other stuff. If anyone did this at my workplace they would be taken out, shot in the head, and then sacked, and we've got thousands of times the computing power I expect ILX has.

The easy remedy is to simply remove the function to search on thread content, rather than title, or if that's not palatable, make it blindingly obvious that one shouldn't use it unless it's absolutely necessary.

KeefW (kmw), Thursday, 10 August 2006 18:17 (eighteen years ago) link

searching has always been a big ugly slowdown thing too -- the search code was k-awful last i recall and i don't know how much was ever finally done.

unfortunately current job sitch doesn't let me offer to volunteer time this time (not that i ever ended up doing much in the past).

as i recall, there's no caching now, and also there's some very nice simple php caching libraries that i've used (and whose names are not on the tip of my tongue).

i'm all for the moving over as is and then doing work on the side (if the database schema is preserved, we could even have the old codebase and the beta one talking to it simultaneously, assuming we trust the coders not to make dangerous beta code and trust [ahem] people not to go and try to break the database with hacks)

Sterling Clover (s_clover), Thursday, 10 August 2006 18:25 (eighteen years ago) link

anyway the search function could probably be improved vis a vis the "entire database" thing with a few simple LIMITs, etc.

Sterling Clover (s_clover), Thursday, 10 August 2006 18:25 (eighteen years ago) link

boardfunctions.php on line 127 seems like a good place to start

-- TOMBOT (tombo...), August 10th, 2006.

Haha, where in the world did you ever come up with that idea?

If it is (as I suspect) a generic OpenDBConnection on that line then we're back to the database logs.

KeefW does have a good point about the searching; if the search doesn't utilize a basic word index it will have some deleterious effects on performance. The search function here is quite pokey, and it could be commandeering db connections that eventually time-out, all the while poxy fule'ing the rest of the universe...
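For anyone wondering what a "basic word index" amounts to, here is a minimal sketch in Python. It is an in-memory illustration only; the class name and the tokenizing rule are assumptions, and in practice this would more likely be a database-side full-text index:

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lower-case the text and return the set of distinct words in it."""
    return set(WORD_RE.findall(text.lower()))


class WordIndex:
    def __init__(self):
        self._postings = defaultdict(set)  # word -> ids of posts containing it

    def add(self, post_id, text):
        for word in tokenize(text):
            self._postings[word].add(post_id)

    def search(self, query):
        """Return ids of posts that contain every word in the query."""
        words = tokenize(query)
        if not words:
            return set()
        # Intersect the smallest posting sets first to keep intermediate results small.
        lists = sorted((self._postings.get(w, set()) for w in words), key=len)
        result = set(lists[0])
        for ids in lists[1:]:
            result &= ids
        return result
```

The point is that a search touches only the entries for the words asked about, instead of reading every message body while holding a connection.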

Edward III (edward iii), Thursday, 10 August 2006 18:27 (eighteen years ago) link

OMG it's Sterling!

Edward III (edward iii), Thursday, 10 August 2006 18:30 (eighteen years ago) link

sterling I can go ahead and add you as a member to the project if you want, you've at least looked over big chunks of the current codebase so even if you don't have time to build you can review and compare. Plz 2help.

TOMBOT (TOMBOT), Thursday, 10 August 2006 18:30 (eighteen years ago) link

Thanks for the Google add, tombot...

Edward III (edward iii), Thursday, 10 August 2006 18:35 (eighteen years ago) link

may i suggest automatically cutting a thread at a predefined limit (whether posts or size) and beginning another thread under the same name (e.g. "Rolling 2006 Teenpop Thread Part II") with a link to the original and say the last 10-15 posts to keep some continuity.

this would prevent the chicagoans or teenpoppers from creating 4000+ post memory drain threads...

john, a resident of chicago. (john s), Thursday, 10 August 2006 18:50 (eighteen years ago) link

That approach sorta defies user expectation regarding thread behavior. I like Pash's suggestion to make the default view a limited number of posts, that seems like a more elegant solution.

It also might have a nice effect on bandwidth requirements.

Edward III (edward iii), Thursday, 10 August 2006 19:05 (eighteen years ago) link

Edward, if you go to your settings page (at the bottom, second from left), you can set the number of thread messages that are displayed - "last 20/50/100 or 200 posts" - whoever has that feature enabled pulls up a much smaller file than the ones listed. Quick and dirty solution to this could be to default that setting to, say, the last 50 answers displayed. The default for non-logged in, at least is to display the whole thread, which is probably a bit ridiculous, thinking about it.

-- Pashmina (vietgrov...), August 10th, 2006.

I made the change Pashmina notes above and remeasured those threads - quite a difference:

Rolling Teenpop 2006 Thread
347 KB -> 11 KB

A New Thread fot the Current Israel/Palestine/Lebanon mess
660 KB -> 25 KB

Rolling Teenpop 2006 Thread
1.08 MB -> 29 KB

If this default setting for users is easily configurable, I'd suggest globally changing it from "ALL" to "50" or "20". The nice thing is you get a link to view all comments on the truncated thread, so if you really want to see the whole thing you can get it.
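The truncation itself is simple; a sketch in Python, where the function name and the tuple it returns are assumptions rather than anything in the existing code:

```python
def visible_posts(posts, limit=50):
    """Return (posts_to_render, truncated) for a thread view.

    `limit=None` means "ALL"; otherwise only the newest `limit` posts are shown
    and `truncated` tells the template to render the "view all" link.
    """
    if limit is not None and limit <= 0:
        raise ValueError("limit must be a positive number or None")
    if limit is None or len(posts) <= limit:
        return list(posts), False
    return list(posts[-limit:]), True
```

In the real thing the cut would want to happen in the query itself, so the database never ships the other 750 posts in the first place.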

Edward III (edward iii), Thursday, 10 August 2006 19:16 (eighteen years ago) link

as soon as we get some answers on new coder accounts for the current build and the configuration/schema, we can split folks into a Bottleneck Mitigation group and an Excelsior group to start actually tackling shit. I think the lines are pretty well drawn from this discussion so far.

TOMBOT (TOMBOT), Thursday, 10 August 2006 19:20 (eighteen years ago) link

(Okay, am I the only one reading TOMBOT's posts?)

I am too. And actually TOMBOT, if you could add me to the Google Code project page I'd be obliged (googlename = quartzcity)

Elvis Telecom (Chris Barrus), Thursday, 10 August 2006 19:47 (eighteen years ago) link

The order of magnitude here is evident.

Edward III (edward iii), Thursday, 10 August 2006 20:02 (eighteen years ago) link

the schema is indeed going to be key to anyone getting started. i don't have a copy and i no longer have server access to get it for you. sorry. it's all up to andrew from this point.

but i can give you some pointers on the general architecture. and yes the sql requests are very flabby - too much stuff is requested on generating threads i can remember that much.

the main approach with the very large table of posts is to have a cache of "new answers", that is some of the fields from posts from the last week, with oldest threads culled regularly.

then a cache of recently updated threads is generated from the newanswers_cache.

the cache of threads is refreshed every couple of minutes, so that when a thread off the list is "bumped" with a new answer/post, it doesn't appear on new answers page until that cache is refreshed.

the list of threads (on the new answers page) is ordered by the most recent post in the newanswers_cache, which is why a thread in the new answers cache is bumped to the top on that page immediately.

clear?
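A sketch of the two caches as described, in Python and heavily simplified. The names, the two-minute interval and culling by post age are assumptions; the real tables and refresh job may differ:

```python
import time

WEEK = 7 * 24 * 3600
REFRESH_INTERVAL = 120  # seconds; "every couple of minutes"


class NewAnswers:
    def __init__(self, clock=time.time):
        self._clock = clock
        self.newanswers_cache = []  # (thread_id, posted_at) for recent posts
        self.thread_cache = set()   # thread ids currently listed on the new answers page
        self._last_refresh = None

    def record_post(self, thread_id):
        self.newanswers_cache.append((thread_id, self._clock()))

    def refresh_threads(self):
        """Cull posts older than a week, then rebuild the thread cache from what is left."""
        cutoff = self._clock() - WEEK
        self.newanswers_cache = [p for p in self.newanswers_cache if p[1] >= cutoff]
        self.thread_cache = {thread_id for thread_id, _ in self.newanswers_cache}
        self._last_refresh = self._clock()

    def page(self):
        """Thread ids for the new answers page, most recently posted first."""
        now = self._clock()
        if self._last_refresh is None or now - self._last_refresh >= REFRESH_INTERVAL:
            self.refresh_threads()
        latest = {}
        for thread_id, posted_at in self.newanswers_cache:
            if thread_id in self.thread_cache:
                latest[thread_id] = max(posted_at, latest.get(thread_id, posted_at))
        return sorted(latest, key=latest.get, reverse=True)
```

This reproduces both behaviours: a thread already on the list jumps to the top as soon as it gets a post, while a thread that was off the list only shows up after the next refresh.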

Britain's Obtusest Shepherd (Alan), Thursday, 10 August 2006 20:31 (eighteen years ago) link

Query caching is turned on. The issue is wrt connections as mentioned above. Also the cached results for the messages table expire every time somebody posts a new message, so it becomes effectively useless most of the time.

I strongly suggest core integration of memcached in the design of nu-ILX. That would solve most problems almost immediately. A rewrite should be possible within, say, a month (minus testing) if a good spec is written. There aren't that many features, after all.
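To illustrate why an application-level cache avoids the expiry problem, here is a sketch in Python with a plain dict standing in for the memcached-style store. The class, the key format and the `load_thread` callable are all hypothetical:

```python
class ThreadPageCache:
    def __init__(self, load_thread):
        self._load_thread = load_thread  # callable(thread_id); stands in for the SQL query
        self._store = {}                 # stands in for a memcached-style key/value store
        self.db_hits = 0                 # how many times we had to go to the database

    def get(self, thread_id):
        key = "thread:%d" % thread_id
        if key not in self._store:
            self.db_hits += 1
            self._store[key] = self._load_thread(thread_id)
        return self._store[key]

    def on_new_post(self, thread_id):
        # Only the thread that changed is invalidated; every other thread stays cached.
        self._store.pop("thread:%d" % thread_id, None)
```

Because entries are keyed per thread, a new post throws away one entry rather than every cached result for the messages table, and a cache hit never needs a database connection at all.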

Andrew (enneff), Thursday, 10 August 2006 22:35 (eighteen years ago) link

I have put a copy of the current ILX code and database schema here:

http://nf.wh3rd.net/ilxor/

It is password protected. If Tombot or whoever's coordinating this wants access to it, my email address is andrewdg @ gmail.

Andrew (enneff), Thursday, 10 August 2006 22:46 (eighteen years ago) link

Also I wouldn't mind being added to the Google Code group thing. I might be able to contribute some coding gruntwork if it's all laid out smoothly.

Andrew (enneff), Thursday, 10 August 2006 22:47 (eighteen years ago) link

I can't add much to the coding talk, but to try and resolve searching - is it possible to limit the search to de-indexed threads only? Google is far better for searching the threads it can see (as stated by many people on threads passim, "an scary clown" is one that comes to mind).

aldo_cowpat (aldo_cowpat), Friday, 11 August 2006 07:53 (eighteen years ago) link

on top of the schema, a dump, or partial dump, of the TimeZone table would help.

is there any way to improve the TZ calculations? all posts have a UTC timestamp, and then if a user is logged in, the time is adjusted to reflect their own zone, which going back to a post X years ago, means knowing what the offset was for every time zone going back X years. which is what that table is.

the last useful thing i remember coding was a brain-dead caching of those offsets.

Britain's Obtusest Shepherd (Alan), Friday, 11 August 2006 08:52 (eighteen years ago) link

hi my gmail is redbulldozers at that thingie dot com

ken c (ken c), Friday, 11 August 2006 10:26 (eighteen years ago) link

Dunno if this has been mentioned elsewhere, but I've started using mvc-ish frameworks myself on projects, and it's really helped getting the application structured.

There's a load for PHP, quite a few influenced by Rails. CakePHP was a bit of a documentation nightmare when I last looked at it, but it might've improved since. I stuck with CodeIgniter because of the quality of the docs. CI has some caching classes built in, could be useful.

ringroad (ringroad), Friday, 11 August 2006 10:55 (eighteen years ago) link

the schema is indeed going to be key to anyone getting started. i don't have a copy and i no longer have server access to get it for you. sorry. it's all up to andrew from this point.

cough cough. You forgetting someone Alan?

Rufus 3000 (Mr Noodles), Friday, 11 August 2006 11:42 (eighteen years ago) link

Don't worry, I got the blueprints in my inbox this morning. Before coffee, even.

TOMBOT (TOMBOT), Friday, 11 August 2006 11:47 (eighteen years ago) link

Try not to laugh too hard at them.

Rufus 3000 (Mr Noodles), Friday, 11 August 2006 11:52 (eighteen years ago) link

Do somebody tweak something? Seems like the site's buffering the HTML payload now - I've had a couple messages load in sections, and a couple crap out halfway through loading (this is more preferable behavior than TURN BACK YOU etc etc anyway).

Edward III (edward iii), Friday, 11 August 2006 12:22 (eighteen years ago) link

sorry Z. i wasn't forgetting you, but i figured it was andrew's call – and was sort of justifying my own inaction more than anything else. wuss that i am.

Britain's Obtusest Shepherd (Alan), Friday, 11 August 2006 12:25 (eighteen years ago) link

Make that "Did somebody tweak something?"

Site seems more responsive in general...

Edward III (edward iii), Friday, 11 August 2006 12:27 (eighteen years ago) link

At my current job I'm a developer for a stress testing/load testing tool. Something like this might be useful when you reach the stage where you want to start testing nu-ilx before putting it into production. It won't solve all your problems but can give you some metrics on response time, throughput and an idea of how many concurrent users you can have hitting the refresh button on the trucker hat thread while posting about how much Finder sucks in the I hate apple thread.

I'm not trying to sell this mind you, it's just an offer to run some scenarios in my free time that would simulate a ton of people all hitting the system at once, and then reporting back the relevant details.

k james (hypo emesis), Friday, 11 August 2006 12:40 (eighteen years ago) link

Also to TOMBOT or whoever else is in charge of the google code group thing - I have a little bit of PHP and MySql experience and would like to be added to the group, my user name is hypoemesis.

k james (hypo emesis), Friday, 11 August 2006 12:48 (eighteen years ago) link

Don't see any changes in the code. Apache hasn't been tweaked since Oct 2005, not that Alan or I can read the file, though we could read the httpd.conf.bak.

Rufus 3000 (Mr Noodles), Friday, 11 August 2006 12:58 (eighteen years ago) link

k james - Thanks for the offer. Given the traffic this site sees a load test will be key (once there's an app to test)...

Edward III (edward iii), Friday, 11 August 2006 15:00 (eighteen years ago) link

tombot can you add me? my username is as my email address.

agree v. much on the memcached thing.

Sterling Clover (s_clover), Friday, 11 August 2006 22:04 (eighteen years ago) link

i didn't read the whole thread (so sorry if it's redundant) but i did read this:
The question is, what's the best way to optimize when the mandate is to not reduce the payload?

why not enable "show unread" by default? if people were only loading the last twenty messages of hastings or israel/palestine/lebanon FAP instead of 800 messages, that would take a huge load off the db.

a name means a lot just by itself (lfam), Saturday, 12 August 2006 18:56 (eighteen years ago) link

i mean that as a quick temp solution, just putting out a fire.

a name means a lot just by itself (lfam), Saturday, 12 August 2006 18:57 (eighteen years ago) link

okay, i read some more and my suggestion was redundant. good luck everybody.

a name means a lot just by itself (lfam), Sunday, 13 August 2006 00:03 (eighteen years ago) link

can i add a functionality request?

anonymous posting. just a checkbox that allows a logged-in, registered user to be completely anonymous on the website (though not necessarily in the db - gotta be some accountability)

will register with gmail later (don't need invite anymore i guess?) and post address here.

Koogy Yonderboy (koogs), Wednesday, 16 August 2006 07:37 (eighteen years ago) link

we should try and build in an imagespam filter with some cleverness (block imagesize > 1200x1200 pixels or something or enforce a "width" and "height" attribute)
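A sketch of what that check could look like, in Python using only the standard library's HTML parser. The function name and the problem messages are made up, and it only inspects the declared `width`/`height` attributes rather than fetching the image:

```python
from html.parser import HTMLParser

MAX_SIDE = 1200  # pixels, the figure suggested above


class _ImgCollector(HTMLParser):
    """Collect the attributes of every <img> tag in a post."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> tags are routed here too by the default handler.
        if tag == "img":
            self.images.append(dict(attrs))


def _side(value):
    """Parse a width/height attribute; None if it is missing or not a plain integer."""
    if value is None or not value.strip().isdigit():
        return None
    return int(value)


def image_problems(post_html):
    """Return a list of human-readable problems with the images in a post."""
    parser = _ImgCollector()
    parser.feed(post_html)
    parser.close()
    problems = []
    for attrs in parser.images:
        src = attrs.get("src", "")
        width, height = _side(attrs.get("width")), _side(attrs.get("height"))
        if width is None or height is None:
            problems.append("%s: missing width/height" % src)
        elif width > MAX_SIDE or height > MAX_SIDE:
            problems.append("%s: larger than %dx%d" % (src, MAX_SIDE, MAX_SIDE))
    return problems
```

Enforcing the attributes is the cheap option: the browser scales the image to the declared size, so the page layout survives even if the file itself is huge.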

ken c (ken c), Wednesday, 16 August 2006 14:44 (eighteen years ago) link

I think that's in already, ken

stet (stet), Wednesday, 16 August 2006 15:21 (eighteen years ago) link

I thought that was in for the i... shortcut, but not for manually-coded <img>

Forest Pines (ForestPines), Wednesday, 16 August 2006 15:22 (eighteen years ago) link

That's probably it, I'm not nearly understanding all the code yet.

stet (stet), Wednesday, 16 August 2006 15:35 (eighteen years ago) link

have google account now. address is same as one below but with 'goo' instead of 'ile'.

links in the right hand bar of google page don't work btw.

> Site seems more responsive in general...
> Edward III, August 11th, 2006 2:27 PM

ha ha. 8)

Koogy Yonderboy (koogs), Wednesday, 16 August 2006 16:07 (eighteen years ago) link

You know, this is probably going to get a good chuckle, but there really should be an IRC channel for ILX hacking.

Although others will probably have their own choice, irc.gimp.net #ilxor could work.


mikef (mfleming), Thursday, 17 August 2006 05:23 (eighteen years ago) link

how about more modern urls, inna wordpress stylee, to reflect the structure of ilx>board>thread

well ilx>board at least so that urls are like

http://nuilx.org/ilm/ -> ilm newanswers
http://nuilx.org/thread/722857/ -> this thread, or possibly
http://nuilx.org/ile/thread/722857/ -> this thread
http://nuilx.org/ile/faq/ -> ile's specific faq

you get the idea

obv need to stay back compatible for old links to come in, but a little pre-processing of the URI wouldn't go amiss.

Britain's Obtusest Shepherd (Alan), Thursday, 17 August 2006 08:30 (eighteen years ago) link

that all looks mod_rewritable, doable via apache config.

Koogy Yonderboy (koogs), Thursday, 17 August 2006 08:45 (eighteen years ago) link

quite. the wordpress technique is to put this in the .htaccess:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

leaving index.php to interpret the $_SERVER['REQUEST_URI'] into variables
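The interpreting step might look roughly like this, sketched in Python rather than PHP just to show the logic. The `route` function and the keys in the dicts it returns are assumptions; the URL shapes are the ones proposed earlier in the thread:

```python
def route(request_uri):
    """Map a request path to a dict of routing variables, or None if unrecognised."""
    path = request_uri.split("?", 1)[0]          # ignore any query string
    parts = [p for p in path.split("/") if p]    # "/ile/faq/" -> ["ile", "faq"]

    if len(parts) == 1 and parts[0].isdigit():
        return {"page": "thread", "thread": int(parts[0])}        # /722857
    if len(parts) == 1:
        return {"page": "newanswers", "board": parts[0]}          # /ilm/
    if len(parts) == 2 and parts[0] == "thread" and parts[1].isdigit():
        return {"page": "thread", "thread": int(parts[1])}        # /thread/722857/
    if len(parts) == 2 and parts[1] == "faq":
        return {"page": "faq", "board": parts[0]}                 # /ile/faq/
    if len(parts) == 3 and parts[1] == "thread" and parts[2].isdigit():
        return {"page": "thread", "board": parts[0], "thread": int(parts[2])}
    return None
```

Note that the two RewriteCond lines skip the rewrite whenever the request matches a real file or directory, so any old-format URL that points at an existing script never reaches this code and keeps working as before.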

Britain's Obtusest Shepherd (Alan), Thursday, 17 August 2006 09:40 (eighteen years ago) link

ah, have just realised that the other part of this, the bit you were probably on about, is the scripting of the links themselves within the pages, which will require code changes, yes. as you were. 8)

i think even just http://nuilx.org/722857 should be possible.

Koogy Yonderboy (koogs), Thursday, 17 August 2006 10:46 (eighteen years ago) link

how about more modern urls, inna wordpress style

That's fine, but please please keep the old URL format alive whatever you do.

mikef (mfleming), Thursday, 17 August 2006 13:56 (eighteen years ago) link

Mmmm - getting rid of the old format will break every inter-thread link, ever, not to mention links from external sites.

Forest Pines (ForestPines), Thursday, 17 August 2006 14:00 (eighteen years ago) link

can someone add me to google project please. email is goo at the domain below. thks.

doesn't seem to be much going on there tbh. read the mvc / php framework thing last night. have long weekend here, what with bank holiday and tomorrow off, and thought i could take the time to at least look through the existing code / schema, maybe attack the image url parser thing.

(oh, and new aliases will map to the old ones, be rewritten by apache to map exactly to old format so the old ones would work as always. it's just that when we script any new urls we'd use the new, shorter format)

Koogy Yonderboy (koogs), Thursday, 24 August 2006 07:24 (eighteen years ago) link

