Sparks

Forum Tweaks for the Season

10 posts in this topic

So, Ran and I spent some time during the heaviest load of the episode's airing tweaking a few things, to see what would work best for the insane load that hits the server. We reconfigured Varnish, tweaked PHP-FPM, adjusted the MySQL settings, and moved a few SQL tables to InnoDB.
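
For anyone curious what moving a table to InnoDB involves: in MySQL it comes down to one `ALTER TABLE ... ENGINE=InnoDB` statement per table. Here is a minimal sketch in Python that only builds those statements as strings. The table names are hypothetical placeholders, not the forum's actual tables, and this is not necessarily how the change was applied here.

```python
def innodb_conversion_statements(tables):
    """Return one ALTER TABLE statement per table name.

    Table names are assumed to be plain identifiers (no backticks);
    this sketch does no escaping and does not talk to a database.
    """
    return [f"ALTER TABLE `{name}` ENGINE=InnoDB;" for name in tables]


# Hypothetical table names, for illustration only.
for statement in innodb_conversion_statements(["forum_posts", "forum_topics"]):
    print(statement)
```

Each statement rebuilds its table, so on a large table this is something to run during a quiet period rather than during an episode.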

Overall, we've made quite a few changes to things; it'll take a little while for everything to shake out into balance again.

After things finish shaking out tonight, do please let us know if anything has gone horribly wrong. There may still be times when the load gets too high and the 'Try again' page pops up, but those should be less frequent, and things should usually recover if you wait a minute or so and then click 'Try again'.

(Ideally, these tweaks should be ones we can leave in place even out-of-season, which would make my life considerably easier.) ;)


An additional tweak: for the duration of the season, I am attempting to disable email notifications. I know some of you like them, and I like them too, but the fact is that every outgoing mail has to be scanned for viruses and so on, and that leads to massive spikes in lag when a popular thread gets updated and a couple hundred notifications flood out.


While we're working on trying to resolve outstanding issues, I've tried to shut down some things on the forum -- activity stream, search, and so on -- to try and reduce overall load while retaining the core posting and browsing functionality.


We're going to see how things go with the episode tomorrow; if the current setup (which seems to be handling things well right now) dies under the episode load, I'm going to split the forum off onto a separate machine and see if we can't do something a little more efficient over there, without worrying about impacting the wiki.


So, things are running smoothly—no massive issues, nothing breaking or spiraling out of control—but very slowly.  (As I have no doubt you have all noticed.)  

This isn't a surprise; our traffic after episode 2 last Sunday was in the top three stretches of traffic we've recorded over Westeros.org's history, well past the level where the server would previously have completely keeled over and gone unresponsive.

This means our new design works, which is great!  

However, the slowdown under heavy load is still dramatic enough that we are still going to bring the dedicated forum machine online.  Alas, I won't have time tonight (because Captain America: Civil War!).  But probably sometime Friday evening Pacific time, we'll be turning off the forum and swapping it to a new dedicated machine.  So if you encounter forum downtime on Friday, that'd be why.

Once we're up on the dedicated machine, I'll talk to Ran about whether or not we want to try turning search back on during the season. We probably won't, just because search really does dramatically increase the load the forum puts on the server. So don't get your hopes up! But we'll at least investigate the possibility. :)


I've put search on now, but yeah, Sunday through Tuesday (maybe even Wednesday) it'll be turned off to lighten the load!


And... welcome to the new server, everyone!  Fingers crossed this helps clear up some of the traffic issues.  We may end up throwing a little more RAM into this machine if not, but I think without sharing CPU and RAM with the wiki, we should be better off.


So, we've changed how our caching is working, which may have freed up a bunch of resources.  This is a fairly significant set of changes (changes touching configuration for the PHP queues on both machines, for nginx on the forum and Varnish on the wiki, the MySQL connection profiles, AND how the forum and wiki are configured).  

If the Seven are kind, this will make a huge lasting difference (not just the short-term one from unclogging the bottleneck we were having this morning); we'll see what happens.


I think I have finally tracked down why mobile pages for the wiki were sometimes being cached as desktop pages in Varnish, as well. Fingers crossed! Let Ran or me know if you see that happening again.


Aaaaaand no, apparently not.  Sadly, there's still some browser out there someone is using that doesn't report itself with the standard mobile strings in User-Agent, but DOES request mobile layout once CSS is loading.  So it shows up as a desktop client when we're caching pages, and mobile when the page actually loads.  

I will track this down yet!
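
To illustrate the failure mode described above, here is a rough sketch in Python of the kind of device bucketing a cache can do. The real logic lives in the Varnish configuration, so this is not that code: the token list, the function names, and the example host and path are all assumptions made for illustration.

```python
import re

# Hypothetical list of tokens commonly found in mobile User-Agent strings.
# A real deployment would likely maintain a longer list.
MOBILE_UA_PATTERN = re.compile(
    r"Mobile|Android|iPhone|iPod|Opera Mini|IEMobile", re.IGNORECASE
)


def device_class(user_agent):
    """Bucket a User-Agent header value into 'mobile' or 'desktop'."""
    return "mobile" if MOBILE_UA_PATTERN.search(user_agent or "") else "desktop"


def cache_key(host, path, user_agent):
    """Build a cache key that keeps mobile and desktop renderings apart."""
    return f"{device_class(user_agent)}:{host}{path}"


# A browser that never sends a recognised mobile token lands in the
# desktop bucket, even if it later asks for the mobile layout.
odd_browser = "SomeBrowser/1.0 (Linux; SmallScreen)"
print(cache_key("wiki.example", "/index.php/Main_Page", odd_browser))
```

Anything the pattern does not recognise falls through to the desktop bucket, which matches the symptom: the cache files the page as desktop, and the browser then asks for the mobile layout once CSS loads.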
