Monday, January 7, 2008

58,281

Second Life has had some login and asset server issues again lately. Some have noted that this is related to each increase in concurrency. We're seeing upwards of fifty-five thousand people online during the day. The highest I have seen was 58,281 people online.


This is cited as one of the problems with Second Life: With each increase in concurrency, failures start ripping across Second Life as the grid struggles with the load. It does this every time. The simple fact of the matter is that Second Life was just not meant for this volume of people online at the same time. When it was first opened, ten thousand was probably seen as the highest that would ever be achieved. This is reasonable given that at the time Second Life was hardly marketed to the public and sign-ups were throttled by the requirement of a credit card.


The oversight lay in opening up the grid to free registrations. This in itself is not a bad thing. The issue arises when you consider that, judging from the records (i.e., the town hall transcripts I managed to pull off the SL History Wiki), the Lindens just opened it up with no prior preparation other than expanding the help desk and updating the client to fix many of the contemporary bugs. I couldn't find anything that indicated that they took a good hard look at the structure upon which Second Life was built and adjusted it to ensure the smooth running of the grid at 100,000 concurrent users. Is that excessive? Oh yes, but one has to take the predicted future numbers and multiply them by seven to get a better idea of the kind of growth you get after relentless advertising by the Lindens and free-for-all registrations.


Some may say my flaw is my inherent assumption that Second Life would have grown as large as it did, and that no one at the time could have foreseen this. Nonsense. It never hurts to be overcautious. Someone at some time should have sat down and honestly asked themselves how many people can load onto Second Life before server/client communication breaks down, and what needs to be done to make sure that doesn't happen. Would you accept a suspension bridge built between Manhattan and Brooklyn that could only hold the current peak capacity? No, you'd reinforce it so it could handle the traffic of two other bridges nearby and a little extra. You can't be too sure.


That's extreme, though. Second Life, unlike a bridge, won't cause terrible harm and tragedy if it collapses. But the general idea is the same. Linden Lab wants people to come in and pay for membership and buy islands and pay tier, but customers cannot do that if the store is closed. It has been said before, but it bears repeating that the general Second Life community has expressed its desire for stability over additions and features. I love WindLight. But it lags like sin. I believe voice should be available to those who want to use it. But when it's enabled, I get the very wonderful experience of lagging from two servers: Second Life and the Voice server. Sculpties allow for more natural builds. But were they really top priority?


I believe I went over this before, but it seems like Linden Lab in general has its priorities mixed up because of Jira itself. Jira is by the people, for the people, and moderated to a great extent by the people themselves. I could go on about how clunky it can be to use and parse through, but the fact that we all managed to get through and log into Second Life itself is qualification enough; the general user should be able to handle it.


The issue arises from people. The way Jira works is you make a ticket detailing the bug/complaint you have, and other people browsing Jira (perhaps suffering the same problem) vote for your proposal. If you get enough votes, IN THEORY the Lindens look at and fix the issue. When it works, this system is beautiful and things get fixed.


But like most things designed by engineers, it fails spectacularly when you fail to account for the human factor (as engineers are wont to do). Let's go over some Jira mistakes people usually fall into:


1) Usually, people don't search for issues similar to their own. All too often, they make a new ticket. Hundreds of duplicates all pertaining to the same bug flood the system. This not only causes search to overflow, it also confuses people using search in the first place. If I see seventy similar issues, which one do I vote for?


2) Usually, people create their tickets incorrectly. I have seen tickets labeled as "Show-Stopper" (the highest priority) which involve such frivolous things as "My boat became unlinked" or "My TV won't play". Not only are these things insignificant in the grand scheme of things, they are also far, far too vague for the Lindens to figure out where to start. Hardly anyone lists a way to reproduce a bug, which helps when initially searching for it. No one lists their system specifications, which, if you're running SL with 128 MB of RAM, might explain your trouble with crashing. I could go on forever and a day about all the things people do wrong when filling out a ticket on Jira.


3) The people weeding out the good tickets from the bad sometimes foul up. Sometimes they foul up a lot. Do not get me wrong: They are performing a needed service for Second Life, one that does not involve (to my knowledge) a paycheck or any thanks and gratitude. They are all volunteers. But they do screw up. And when they do, Jira pays for it. Being human, they have biases and prejudices which are reflected in the issues and tickets they keep or fix or delete. This is pure theory on my part as I have not done any extensive research into this, but the Law of Human F***-ups dictates that they will mess something up somewhere. Their actions are less noticeable, so when they do foul up, it is not as evident or detectable.


4) The last is with the Lindens themselves. Jira is littered with issues that have been marked as resolved. Many of them you will recognise: "Phantom prims appearing on parcel, can't be deleted", "animations getting stuck in sim seams/crashed vehicles/upon log-in", etc. The hilarious part is that these 'resolved' issues aren't. They all still occur. Who claimed they were resolved? There was a minor issue on Jira a few months back which I will recount.


About five months ago, a new bug stalked Second Life: The Ass Affinity (or Ass Effect to some) bug. For those extremely new or absent during this period, I will explain. Occasionally, when you teleported or crossed into a different sim, all of your attachments would fly up into your rectum and wedge into your colon. The only way to fix this was to detach and reattach each attachment, which was a hassle if you wore lots and lots of attachments. HUDs were also subject to this bug on occasion.


This bug was claimed to have been fixed no less than three times in four separate blog posts. After the first 'fix', they listed it as resolved on Jira, and never was it listed as 'reopened' or any of the other silly titles they have for things they overlooked. It was a very, very surreal time, being told this bug was fixed when your hair suddenly teleports from your head to your anus. It defied all logic. Any normal human being spending ten minutes inworld would have noticed it within the first few minutes. This indicates that the Lindens don't spend enough time inworld, or they don't log in at all, or they place too much trust in Jira. "Oh, it says resolved. I guess since it just happened to me I need to clear cache and reinstall the client or something!"


I have gone far off topic, but the issues are interrelated. Given that no forethought went into the measures needed to support a rapidly growing world, it amazes me that the last remaining recourse, Jira, is as fouled up as it is. I don't have any idea of how to go about fixing this problem, but it's not my job. I don't earn a salary from Philip to figure out LL's problems. But I sure can whine about it.
