Sparks

Administrators
  • Content count

    33

About Sparks

  • Rank
    Sysadmin of Ice and Fire
  • Birthday May 17

Profile Information

  • Gender Female
  • Location Seattle, WA

Previous Fields

  • Name Rachel Blackman

1,611 profile views
  1. Links in the Bran Re-Read

    Ugh. I added some stuff to rewrite the old-style URLs into the new ones, but it doesn't work on those because those links contain both a query path and a query string, not one or the other, and the logic is trying to switch between the two. A simple rewrite to the other form wouldn't work. I can try to rewrite the old-URL-handler logic to handle those. Basically, instead of the ?p=<postID> on the end of the path, if we're rewriting to the method nginx uses, it needs to turn into a &do=findComment&comment=<postID> on the end, if it's using the ?/ portion at the beginning. (Alternatively, just switching back to lighttpd would allow us to use both formats, and I'm seriously considering that at this point.) Either way, I'll try to come up with a fix for that URL format later.
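    As a rough illustration of the rewrite described above, here is a minimal Python sketch (not the actual server configuration). The example URL shapes, the `/index.php/` prefix, and the function name `rewrite_old_url` are assumptions made for illustration; only the `?p=<postID>` and `&do=findComment&comment=<postID>` pieces come from the post.

```python
from urllib.parse import urlsplit, parse_qs


def rewrite_old_url(url: str) -> str:
    """Rewrite an old path-style link to the query-style ("?/") form.

    Assumed old form: /index.php/topic/123-title/?p=456
    Assumed new form: /index.php?/topic/123-title/&do=findComment&comment=456
    """
    parts = urlsplit(url)
    marker = "/index.php/"  # assumed script prefix, for illustration only
    if marker not in parts.path:
        return url  # not an old path-style link; leave it untouched

    prefix, _, rest = parts.path.partition(marker)
    # Move the path after index.php into the "?/" query-path position.
    new_url = f"{prefix}/index.php?/{rest}"

    # The old ?p=<postID> becomes &do=findComment&comment=<postID>.
    post_ids = parse_qs(parts.query).get("p")
    if post_ids:
        new_url += f"&do=findComment&comment={post_ids[0]}"

    if parts.scheme and parts.netloc:
        new_url = f"{parts.scheme}://{parts.netloc}{new_url}"
    return new_url
```

    The point of the sketch is that the path and the query string have to be handled together: the path moves behind `?/`, and the post ID is appended with `&` instead of starting a second query string.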
  2. Access to site at work, not at home

    Sorry, it's been a little insane for me at work, so I fell behind on reading the forums. (But hey, we stayed up pretty well during the episode!) I would toss the IP to both me and Ran in PM here, that's probably quickest. Reference this thread, so Ran knows why he's getting emailed a random IP (in case he hasn't glanced at this thread). Also, if you used to be able to log on from home and can't now, chances are your ISP rotated dynamic IPs around, and you used to be on a different one but now hit a tainted/burned/blacklisted one.
  3. Access to site at work, not at home

    That particular code means an IP has been banned in the Invision anti-spam blacklist; either your machine was compromised or, if you're on a home network where the IP rotates around (as with most cable/DSL providers), a previous person on your IP had a compromised machine. That machine was then stuck into a net to work as a proxy for people who would try to go around and register on Invision sites, then immediately post 40 'mail-order Russian bride' or 'best video download' or 'cheap website design' ads wherever they could. Once enough Invision sites report that IP as a spammer, Invision's mothership adds the IP to the list of bad net citizens, and all Invision boards then reject that IP address (so the spammer is cut off from Invision boards as a whole). The spammer moves along to a new compromised machine as their proxy, thus changing their IP, and whoever inherits the now-tainted IP gets shafted, because anything linked to the Invision mothership figures they're a spammer and blocks them. What you'll need to do is get your external home IP address (just Google 'What Is My IP' and hit that site once you are at home), then send that to one of the admins and we'll see about adding an override. (You may also want to do a malware scan of your home machine just to make sure it wasn't you that was the compromised one, and that it was someone else who had your home IP previously.)
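    To make the blacklist-plus-override idea concrete, here is a minimal Python sketch. It is not Invision's actual mechanism; the function name `is_blocked`, the two sets, and the example addresses (taken from the reserved documentation ranges) are all assumptions for illustration.

```python
import ipaddress


def is_blocked(ip: str, shared_blacklist: set[str], local_overrides: set[str]) -> bool:
    """Return True if the IP should be rejected.

    shared_blacklist: hypothetical stand-in for the centrally maintained list.
    local_overrides:  hypothetical per-site exceptions added by an admin.
    """
    addr = str(ipaddress.ip_address(ip))  # validates and normalizes the address
    if addr in local_overrides:
        return False  # a local override wins over the shared list
    return addr in shared_blacklist
```

    In this sketch an inherited "tainted" address stays on the shared list, and the site-level override is what lets its new owner back in.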
  4. Board Issues 4

    Ugh. I hadn't noticed that nginx blocked path-type URLs. I've added some rewrite logic which hopefully fixes things.
  5. Forum Tweaks for the Season

    Aaaaaand no, apparently not. Sadly, there's still some browser out there someone is using that doesn't report itself with the standard mobile strings in User-Agent, but DOES request mobile layout once CSS is loading. So it shows up as a desktop client when we're caching pages, and mobile when the page actually loads. I will track this down yet!
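    The failure mode above can be sketched in a few lines of Python. This is not the site's real caching configuration; the token list in `MOBILE_TOKENS`, the function names, and the key format are assumptions chosen only to show how a cache keyed on User-Agent sniffing misfiles a browser that omits the usual mobile strings.

```python
# Assumed list of "standard mobile strings"; the real list is not given in the post.
MOBILE_TOKENS = ("Mobile", "Android", "iPhone", "iPad")


def device_class(user_agent: str) -> str:
    """Classify a request as 'mobile' or 'desktop' from its User-Agent alone."""
    ua = user_agent.lower()
    if any(token.lower() in ua for token in MOBILE_TOKENS):
        return "mobile"
    return "desktop"


def cache_key(path: str, user_agent: str) -> str:
    """Build a cache key that keeps mobile and desktop copies of a page apart."""
    return f"{device_class(user_agent)}:{path}"
```

    A browser whose User-Agent contains none of the tokens lands in the `desktop` bucket, even if it later asks for the mobile layout, which is the mismatch described in the post.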
  6. Forum Tweaks for the Season

    I think I have finally tracked down why mobile pages for the wiki were sometimes being cached as desktop pages in Varnish, as well. Fingers crossed! Let Ran or me know if you see that happening again.
  7. Forum Tweaks for the Season

    So, we've changed how our caching is working, which may have freed up a bunch of resources. This is a fairly significant set of changes (changes touching configuration for the PHP queues on both machines, for nginx on the forum and Varnish on the wiki, the MySQL connection profiles, AND how the forum and wiki are configured). If the Seven are kind, this will make a huge lasting difference (not just the short-term one from unclogging the bottleneck we were having this morning); we'll see what happens.
  8. Board Issues 4

    As background, think of the forum and wiki like a ride at Disneyland. You can only have a certain number of people on the ride at a time (i.e., the maximum number of PHP instances), but you can have a larger number of people waiting in line for the ride (the pending connections). Some people are okay just watching a video of the ride (Varnish cache for non-logged-in users), and so they can come to the end of the line, watch the video, and leave. Others, however, are stuck waiting for a space on the ride. That long delay before the page starts loading, that's the waiting-in-line part. As soon as the page starts loading, that's you actually being on the ride; you can see that the latter part has gotten considerably faster, but less so the first part.

    Now, there are two factors that determine how quickly that line moves: the speed of the ride (i.e., the server's CPU capacity) and the number of people you can fit on the ride (i.e., the server's RAM capacity). The faster the ride goes, the quicker you can get people back off and get the next group on. The more people you can fit on per run, the bigger a chunk of the line you can take away each time the ride comes back to put new people on.

    When the line gets gummed up, that's when things go wrong. Too many people in line, and the amusement park staff cap the line and refuse to let anyone else get in. (That's the 500 error you get sometimes.) Enough of those 500 errors, and Cloudflare itself marks the site offline. So, clearly, we need to keep the line from getting too long, and not just for performance reasons!

    But what if we just, I dunno, crammed more people in the ride? Those safety guidelines are for wusses, right? Well, turns out, if we risk that, that's when we hit the server's RAM limits. Lots of page faults, and the processors become bogged down, and we end up in the death spiral. People flying off the ride screaming, things on fire, things turning just generally Not Good. That's the part where instead of the server recovering on its own, I get back to the keys and have to try to balance things (or Ran just shuts things off until it recovers).

    Now, previously, the Wiki and Forum were two tracks of the same ride; they shared capacity and the line waiting to get in. This obviously wasn't ideal. We've split them onto two separate servers (made them into separate rides), with separate waiting lists and separate capacity. There are still some shared resources (notably the SQL database backing it, off on a third machine), but it's an improvement. And for a couple of days, everything ran very smoothly.

    However, apparently the forum has proven quite capable of hungrily devouring all new capacity; the faster things go, the more people show up, and the quicker we get back to where we were. You all collectively are basically Cookie Monster, but for servers. (Which is actually appropriate, if anyone's seen where the Muppet that became Cookie Monster first came from: he was made for an IBM training film that Henson was hired to do, where he ate a computer. But I digress.)

    To stay with the amusement park analogy: the ride itself doesn't take long to go on, so we can move people through faster that way, but the capacity of the ride hasn't increased much, so we still can only run a small number of people at a time. As such, we're probably going to increase the RAM of the machine later this week, so that I can increase the number of concurrent PHP sessions (i.e., the capacity of the ride).

    ...and now I have mental images of Muppets riding amusement park rides, and periodically eating them. Clearly, I need more coffee.

    (This post has been brought to you by the letter W and the number 16. Status Posts are a production of the Decaffeinated Sysadmin's Workshop.)
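    The ride analogy can be turned into a toy simulation. The Python sketch below is only a model of the idea, not of the real server: the function `simulate`, its parameters, and the fixed arrival rate are all assumptions. `workers` plays the role of the maximum number of PHP instances, `backlog_limit` is the cap on the line, `service_ticks` is how long one ride takes, and a rejected arrival stands in for a 500 error.

```python
from collections import deque


def simulate(arrivals_per_tick, ticks, workers, backlog_limit, service_ticks):
    """Return (served, rejected) for a fixed-rate stream of requests."""
    waiting = deque()  # the line for the ride (pending connections)
    busy = []          # remaining ticks for each request currently being served
    served = rejected = 0

    for _ in range(ticks):
        # Requests on the ride advance by one tick; finished ones leave.
        busy = [remaining - 1 for remaining in busy]
        served += sum(1 for remaining in busy if remaining == 0)
        busy = [remaining for remaining in busy if remaining > 0]

        # New arrivals join the line, or are turned away once it is capped.
        for _ in range(arrivals_per_tick):
            if len(waiting) < backlog_limit:
                waiting.append(1)
            else:
                rejected += 1

        # Free seats are filled from the front of the line.
        while waiting and len(busy) < workers:
            waiting.popleft()
            busy.append(service_ticks)

    return served, rejected
```

    In this toy model, 4 workers with a 2-tick ride keep up with 2 arrivals per tick, start turning people away at 3 arrivals per tick, and stop doing so again once `workers` is raised to 6. That is the same trade-off as adding RAM so more concurrent PHP sessions fit.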
  9. Board Issues 4

    I think I found the issue; it should be fixed, but you may have to log in again for the fix to take for your user session. (I.e., you might get logged out one more time, but that should be the last time.)
  10. Board Issues 4

    Mm. Copying over the cache/upload directory caused all the webserver's permissions on those directories to be rewritten. Try again!
  11. Forum Tweaks for the Season

    And... welcome to the new server, everyone! Fingers crossed this helps clear up some of the traffic issues. We may end up throwing a little more RAM into this machine if not, but I think without sharing CPU and RAM with the wiki, we should be better off.
  12. Friends list/Search Function

    Ran posted in the announcements thread about this, but the short form is that anything powered by the underlying search tables (search itself, obviously, but also a number of other features) puts a far heavier load on the forum than just reading/posting does. To use an analogy, it's the difference between going for a jog in running shorts and a tank-top, versus going for a jog in full plate armor while dragging a cart behind you as if you were a horse. Those functions are the armor and cart. The forum software has performance modes you can enable (disabling a lot of features) if the forum is seeing high traffic. Since the HBO seasons are airing and we're seeing traffic levels among the highest Westeros has ever seen, Ran turned those on for now, which disabled pretty much any feature you're abruptly missing. The features will be re-enabled post-season when performance mode gets kicked off. We are going to move the forum to a dedicated machine, per my post in the announcements forum, and we'll examine whether that gives us enough leeway to turn the features back on. But we're seeing traffic among the highest levels Westeros.org has ever seen right now, so it may prove to make more sense to keep the forum in performance mode, even on a separate machine.
  13. Forum Tweaks for the Season

    So, things are running smoothly (no massive issues, nothing breaking or spiraling out of control), but very slowly. (As I have no doubt you have all noticed.) This isn't a surprise; our traffic after episode 2 last Sunday was in the top three stretches of traffic we've recorded over Westeros.org's history, well past the level where the server would previously have completely keeled over and gone unresponsive. This means our new design works, which is great! However, the slowdown under heavy load is still dramatic enough that we are still going to bring the dedicated forum machine online. Alas, I won't have time tonight (because Captain America: Civil War!). But probably sometime Friday evening Pacific time, we'll be turning off the forum and swapping it to a new dedicated machine. So if you encounter forum downtime on Friday, that'd be why. Once we're up on the dedicated machine, I'll talk to Ran about whether or not we want to try turning search back on during the season. We probably won't, just because search really does exponentially increase the load the forum puts on things. So don't get your hopes up! But we'll at least investigate the possibility.
  14. Forum Tweaks for the Season

    We're going to see how things go with the episode tomorrow; if the current setup (which seems to be handling things well right now) dies under the episode load, I'm going to split the forum off onto a separate machine and see if we can't do something a little more efficient over there, without worrying about impacting the wiki.
  15. Tapatalk not working with the forum again?

    Tapatalk's add-on for Invision broke horribly last time we updated Invision to get some security fixes, about... I want to say 4-5 weeks ago. Worse, it broke in a horrible manner for more than just Tapatalk users, so Ran sent in a bug report and turned it off until there's a fix. Sadly, it's not the only Invision forum I'm on that had the problem—just the only one I sysadmin—so I suspect it's an issue with Tapatalk's extension and certain configurations under the most recent versions of Invision. Ran could provide more details, as he was the one who did that particular investigation into "What The Heck Is Going Wrong Here?"