PSA: Memory Leak found in TF2 and most Source Engine games
[I][B]Foreword:[/B][/I]
I already reported this to the TF2 team, but since then I found out that the bug affects more than just one game, so I think a public thread will be more effective at bringing attention to the issue.
It's not something that can be exploited by anyone, but rather a really annoying bug that will frequently crash your game out of nowhere. I also frequently see posts on other forums about the error messages mentioned below, but the solutions given to the affected users all treat the symptoms rather than attacking the problem at its core. It's also possible that certain people have already found out about this before me, but as of yet I haven't seen any real discussion about it, despite the bug being one of the main sources of crashes on community servers for both TF2 and CS:S.
I'm also putting this thread in this subforum because I think TF2 is the most affected by this, with CS:S being second.
[B][I]With that, let's get on to the topic:[/I][/B]
So I recently discovered that there's a Memory Leak affecting almost every current version of the Source Engine, excluding all versions of Source2.
Note that DOTA 2 and Source Engine games based on the 2004 Build are unaffected. Gmod also doesn't have this problem, possibly due to originally being based on the 2004 build and hence using a different method to load custom sounds.
[B][U]Long story short: [/U][/B]
Whenever someone or something attempts to play an MP3 file that doesn't exist on disk or cannot be loaded through normal means, the engine allocates memory but never frees it, so it takes up space in RAM until the game is closed.
For TF2, CS:S, HL2:DM and HL:DM Source, the associated error message is the following:
[CODE]Failed to create decoder for MP3 [ file.mp3 ]
Failed to create decoder for MP3 [ file.mp3 ]
*** Invalid sample rate (0) for sound 'file.mp3'.[/CODE]
As a sidenote, something is clearly going wrong here, although it is NOT the cause of the underlying memory leak.
It attempts to call the decoder twice for a file that doesn't exist, and then acts as if the file was loaded but had an unsupported sample rate of 0.
For reference, this is what it should output for a file that is missing from disk:
[CODE]Failed to load sound "file.mp3", file probably missing from disk/repository[/CODE]
And this is what it should say for a file that has an unsupported sample rate:
[CODE]*** Invalid sample rate (32000) for sound 'file.mp3'.[/CODE]
The memory leak does [B]NOT[/B] occur if the file you are trying to load is actually found on disk, no matter whether the sample rate is correct or not.
It also does not occur with files of any other type than MP3, leading me to believe that it's specifically the MP3 decoding causing this issue.
Now normally, a single call to the MP3 handler does not take up much RAM and would not cause crashes on its own. Had it remained like this, the bug would probably have remained undiscovered.
However, as of at least the beginning of this year there appears to be yet another glitch in both TF2 and CS:S that causes random sound files downloaded from community servers to not be loaded correctly into the sound cache.
This in turn can lead to random MP3 files failing to be loaded at completely random times, and in certain cases cause the MP3 handler to be executed in an endless loop, flooding the console with the message [I]"Failed to create decoder for MP3"[/I].
[B][U]The bottom line is:[/U][/B] Whenever you see this [I]"Failed to create decoder for MP3"[/I] message being spammed in console, you are currently experiencing the full effects of the memory leak and are likely to crash within 5-7 minutes due to running out of memory.
A temporary fix for this issue is to quickly delete the sound.cache files from your TF2/CS:S folder and then use the console command "snd_restart" when you notice the message appearing. This should allow the affected MP3 file to be loaded properly.
It is no substitute for a proper fix from Valve's end however, since you usually have no idea whether or not the glitch is occurring [I]until after[/I] you crashed, which makes this leak VERY annoying to deal with.
It is also possible that the crash occurs much more quickly for people with less RAM.
I myself have 16GB of RAM, and the games allocate somewhere between 3,000 and 3,500 MB before they crash. It's possible that this amount depends on globally available RAM, so the time to crash might be much shorter on your machine.
Additionally, anybody casually playing would have no idea this was a Memory Leak, since the associated error message "Out of Memory" does not appear. Every time it crashes on my end, it's now a CTD without error message.
[B]Moving on[/B], I originally thought that this bug was restricted only to TF2 and CS:S, as the aforementioned error messages don't seem to appear in Half Life, Portal, CS:GO or Left4Dead.
This is however merely an illusion.
Regardless of what error message appears in console, by using the console command "play X.mp3" and referencing a file X that does not exist on disk, you can reproduce the bug on almost every current Source engine game.
Simply execute the command a number of times and watch the hl2.exe process. If the memory usage goes up, then the game is affected.
A convenient command to reproduce the bug on games that support the "wait" command is [CODE]alias MLeak "play x.mp3; wait 1; MLeak"
//And then execute MLeak[/CODE]
This automatically executes the play command over and over, allowing you to watch the game fill up the RAM.
On games where the "wait" command was removed, such as CS:GO, you can use multiple layers of aliases to execute multiple "play" commands in one single call instead.
[B]Example:[/B]
[CODE]alias M1 "play x.mp3;play x.mp3;play x.mp3;play x.mp3;play x.mp3;play x.mp3;play x.mp3;play x.mp3;play x.mp3;play x.mp3"
alias M2 "M1;M1;M1;M1;M1;M1;M1;M1;M1;M1"
alias M3 "M2;M2;M2;M2;M2;M2;M2;M2;M2;M2"
//And then simply execute M3 to instantly allocate hundreds of Megabytes of memory[/CODE]
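To sanity-check how many "play" commands the layered aliases actually issue, here is a minimal Python sketch that expands them the way the console would. The per-play figure of 16 KB is an assumption taken from the measurements reported further down in this thread (roughly 14 to 18 MB per 1000 plays), not something the alias itself guarantees, and `count_plays` is a hypothetical helper, not part of any game tool.

```python
def count_plays(aliases, name):
    """Count how many `play` commands running alias `name` would issue."""
    total = 0
    for command in aliases[name].split(";"):
        command = command.strip()
        if not command:
            continue
        if command.split()[0] == "play":
            total += 1
        elif command in aliases:
            # Nested alias: expand it recursively.
            total += count_plays(aliases, command)
    return total


# The three alias layers from the example above.
aliases = {
    "M1": ";".join(["play x.mp3"] * 10),
    "M2": ";".join(["M1"] * 10),
    "M3": ";".join(["M2"] * 10),
}

plays = count_plays(aliases, "M3")
KB_PER_PLAY = 16  # assumption: rough per-play figure reported later in the thread
print(plays, "plays, roughly", plays * KB_PER_PLAY / 1024, "MB leaked")
```

If that per-play figure holds, a single M3 run is 1000 plays and on the order of 16 MB, so reaching hundreds of megabytes would take either repeated runs or a fourth alias layer (M4 calling M3 ten times).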
[B]The Source engine games that are confirmed to be affected by this (so far) are: [/B]
[QUOTE]Alien Swarm
Counter-Strike: Source, Counter Strike: Global Offensive
Day of Defeat: Source
Dino D-Day
Empires
Fortress Forever (Build 4044)
Half Life 2, Half Life 2: Episode 1, Half Life 2: Episode 2, Half Life 2: Lost Coast, Half Life 2: Deathmatch
Half Life: Source, Half Life Deathmatch: Source
Left 4 Dead, Left 4 Dead 2
No More Room in Hell
Portal 1, Portal 2
Team Fortress 2
[/QUOTE]
[B]The Source engine games that [del] are confirmed not to be affected[/del] where people couldn't reproduce the bug (in certain cases) are:[/B]
[QUOTE]
Gmod
The Ship (Build 3057)
Vampire: The Masquerade - Bloodlines
Black Mesa[/QUOTE]
These lists aren't comprehensive, and it's possible that more games are affected. However I assume that games based on engines before Build 3057 do not have the bug, and anything after 4044 does.
[B]And finally, Source 2 AND GoldSource are not affected by this issue at all.[/B]
I believe this to be one of the main causes for crashes on community servers for TF2 and CS:S at the moment. Singleplayer games and Multiplayer games without a significant community will most likely not suffer from this bug.
Note that you cannot cause anybody to crash with these console commands but yourself, so if you were hoping to grief somebody with this bug, you've come to the wrong address.
And yes, that's about it.
Would be good if anyone could test the remaining Source engine games to see how far back the bug goes, as I don't own all of them.
[I]PS: I wonder if Titanfall is affected too.[/I]
Nice find, but I'm just curious as to why you're only posting this thread in just the tf2 subforum when it affects other source games?
Also, no-one plays titanfall lol
[QUOTE=Exploderguy;50800294]Nice find, but I'm just curious as to why you're only posting this thread in just the tf2 subforum when it affects other source games?[/QUOTE]
It's said why in the foreword.
It's also the only game affected with its own subforum.
Valve are professional coders.
At least source 2 fixes
Memory leaks are small bugs with big consequences. It's really easy to forget to free memory once you allocated it and it happens to good programmers as well occasionally.
And in a large system like Source it's difficult to determine if there actually is a memory leak. This bug has existed for at least 4 years from what I can tell and nobody really noticed what was going on until now, mostly because its causes aren't really apparent unless you know exactly where to look, and its bad properties only manifest under the right circumstances.
Occasionally I had L4D2 crash while changing settings, could it be related to this or is it a different matter altogether?
if there's no missing file involved, then I assume there's no leak?
[QUOTE=Hell-met;50801286]if there's no missing file involved, then I assume there's no leak?[/QUOTE]
If it can find the file the leak doesn't happen, yes. Memory usage doesn't increase even with hundreds of play commands being spammed.
However, there are circumstances where the file is present on disk but something is screwed up in the sound.cache file, causing the game to behave like it can't find the file at all.
The leak occurs then also, which is where the random crashes on community servers come from.
For example, the latter situation has happened to me in a Deathrun server once where the map itself had an .mp3 file that was supposed to be played at a certain spot in the map.
There was no way the file could've been missing, as it was packed into the bsp file, and other people heard the music perfectly fine. I on the other hand always crashed after the same amount of time being spent on the server.
The problem was fixed for the time being once I deleted all sound.cache files from the TF2 folder and used "snd_restart". So I'm pretty sure that the random cache issues and the memory leak are two separate problems synergizing in the worst way possible.
It however reoccurred on a VSH server, again with one of the music files being unable to load. It was present on disk at the time, and again deleting sound.cache, restarting the game and then executing snd_restart fixed it.
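For anyone who wants to script the first half of that workaround, here is a minimal Python sketch that lists every file named sound.cache below a game folder and only deletes them when explicitly asked to. The function names and the `dry_run` flag are hypothetical, and the folder you pass in is whatever your own TF2/CS:S install path is. The "snd_restart" step still has to be done in the game console afterwards.

```python
from pathlib import Path


def find_sound_caches(game_dir):
    """Return every file named sound.cache below game_dir, sorted by path."""
    return sorted(p for p in Path(game_dir).rglob("sound.cache") if p.is_file())


def delete_sound_caches(game_dir, dry_run=True):
    """Return the cache files found; delete them only when dry_run is False."""
    found = find_sound_caches(game_dir)
    for path in found:
        if not dry_run:
            path.unlink()
    return found


# Example (path is a placeholder, fill in your own game folder):
# for path in delete_sound_caches("path/to/your/game/folder"):
#     print("would delete", path)
```

Running it with the default `dry_run=True` first shows which files would be removed, which seems the safer order of operations.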
[editline]30th July 2016[/editline]
[QUOTE=Fox Powers;50801261]Occasionally I had L4D2 crash while changing settings, could it be related to this or is it a different matter altogether?[/QUOTE]
Did it occur instantly after changing settings? Or did it take a while to manifest?
Most importantly, did it happen inside or outside of the game? Because without explicitly invoking the play command, you should not be getting the leak on the main menu.
There needs to be something attempting to play a sound after all.
[QUOTE=Doom64hunter;50801484]
Did it occur instantly after changing settings? Or did it take a while to manifest?
Most importantly, did it happen inside or outside of the game? Because without explicitly invoking the play command, you should not be getting the leak on the main menu.
There needs to be something attempting to play a sound after all.[/QUOTE]
I recall changing the settings to low, and then hitting accept, it would crash and after a couple of minutes it went to desktop
was while ingame
[QUOTE=Fox Powers;50801600]I recall changing the settings to low, and then hitting accept, it would crash and after a couple of minutes it went to desktop
was while ingame[/QUOTE]
Well I don't know, perhaps changing the settings screwed something up.
The only way to verify that it is indeed this particular issue is to
1.) Have the console open while playing, and regularly check whether any missing sound messages are coming up.
2.) Check your memory usage from time to time if the problem keeps occurring.
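Step 1 can be partly automated if the console output is saved to a text file (as far as I know the -condebug launch option does this, but treat that as an assumption). Below is a minimal Python sketch that counts the "Failed to create decoder for MP3" lines per file. The function name is hypothetical, and the sample text is simply the error output quoted in the opening post.

```python
LEAK_MARKER = "Failed to create decoder for MP3"


def count_decoder_failures(log_text):
    """Map each MP3 name to the number of decoder-failure lines seen for it."""
    counts = {}
    for line in log_text.splitlines():
        if LEAK_MARKER not in line:
            continue
        # Message format: Failed to create decoder for MP3 [ file.mp3 ]
        start, end = line.find("["), line.rfind("]")
        name = line[start + 1:end].strip() if 0 <= start < end else "(unknown)"
        counts[name] = counts.get(name, 0) + 1
    return counts


sample = """Failed to create decoder for MP3 [ file.mp3 ]
Failed to create decoder for MP3 [ file.mp3 ]
*** Invalid sample rate (0) for sound 'file.mp3'.
"""
print(count_decoder_failures(sample))  # {'file.mp3': 2}
```

A count that keeps climbing for the same file between two checks would match the "message being spammed" situation described in the opening post.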
Are the "MP3 initialized with no sound cache, this may cause janking" and "BlockingGetDataPointer: Async I/O Force" errors related to this by any chance?
So this is why i had random crashes while playing TF2?
[QUOTE=MattTheSpy;50801636]Are the "MP3 initialized with no sound cache, this may cause janking" and "BlockingGetDataPointer: Async I/O Force" errors related to this by any chance?[/QUOTE]
You're on to something here -- specifically concerning the former message.
I looked around a bit on the internet and found [URL="https://forums.alliedmods.net/showpost.php?p=2381777&postcount=644"]this random report [/URL] of some Robot footstep sounds for some mod not working properly.
Looking at the report, we can see the message you mentioned along with the familiar "Failed to create decoder for MP3 [...]" message that indicates the leak happening. I think I remember seeing this error myself sometimes when the bug occurred, but it's been some time.
Anyways, the memory leak itself is present in any Source game dating back to Build 4044, and is completely independent of the above messages. You can invoke it by simply referencing a file that doesn't exist.
However, this error message supports the theory that there is something wrong with the sound cache system itself, and that it causes the engine to fail to load certain files, directly synergizing with the memory leak and enabling it to crash the game in community servers.
The second message you mentioned however occurs often and it doesn't seem like it has any effect on memory usage.
[QUOTE=mr appie;50801650]So this is why i had random crashes while playing TF2?[/QUOTE]
Nobody can guarantee that this is the exact cause for your crashes. It might as well be something completely different.
In other news, I tried to reproduce the memory leak in Fortress Forever, which is apparently based on Build 4044, and it's also present there, making this the oldest known occurrence of the leak so far. That's almost 7 years without discovery!
Just tested Half-Life: Source. It also has a Memory Leak
EDIT: GoldSrc games are not affected by this.
Can confirm that Day of Defeat Source also has a Memory Leak.
Oh yeah, I forgot to mention that GoldSrc games weren't affected. Thanks!
That completes it then, all Valve-made Source 1 games are currently affected.
I'm going to look into this.
I'm seeing about 14 MB of added memory usage per 1000 plays. Does that correlate with what everyone else is seeing?
[QUOTE=sigsegv;50803416]I'm going to look into this.[/QUOTE]
Oh hey, you're that dude that got contacted by Eric for your efforts.
Welcome to Facepunch!
Thanks.
I'm pretty sure I've already narrowed down the probable cause of the leak. I'll give more details in a little while if I'm right about this.
[QUOTE=sigsegv;50804045]I'm seeing about 14 MB of added memory usage per 1000 plays. Does that correlate with what everyone else is seeing?[/QUOTE]
Good to see you here sigsegv!
I'm getting about 18MB of added memory per 1000 plays, but it's very inconsistent.
From a single execution of the play command, I get memory usage increases of either 4KB, 8KB, 12KB, 16KB, 20KB, 24KB or some very rare jumps far above the hundreds.
The fact that it's multiples of 4 is probably related to the page frame size in x86-64.
I currently have two leads on possible sources of the leak.
Both of these possible causes involve ~16KB allocations, which incidentally don't actually have anything directly to do with page size; it's just the arbitrary buffer size they decided to use for a couple of different things. But the reported memory usage of the process would tend to be rounded to multiples of 4KB due to the page size.
Incidentally the increase-per-1000-plays is looking more like 16MB-per-1000 to me right now. And currently I'm suspicious that either CAudioMixerWaveMP3's or CWaveDataStreamAsync's are getting leaked. The former class contains a 16KB buffer as a member, and the latter class does a dynamic allocation of a 16KB buffer in its constructor. And I have good reason to believe that either may well be involved in some "allocate, then immediately lose the pointer and never bother to delete" shenanigans. (One ~16KB buffer leaked per play fits in nicely with 16MB leaked per 1000 plays.)
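To make the arithmetic and the suspected pattern concrete, here is a toy Python model. It is not engine code: `ToyHeap`, `play_leaky` and `play_fixed` are hypothetical names, the 16 KB buffer and 4 KB page sizes are the figures mentioned above, and the "allocate, bail out early, never free" path is only the hypothesis being investigated, not a confirmed cause.

```python
BUFFER_BYTES = 16 * 1024  # assumption: the ~16KB buffer size mentioned above
PAGE_BYTES = 4 * 1024     # 4KB pages


class ToyHeap:
    """Tracks outstanding allocations so a leak shows up as a nonzero total."""

    def __init__(self):
        self.live = {}
        self.next_id = 0

    def alloc(self, size):
        self.next_id += 1
        self.live[self.next_id] = size
        return self.next_id

    def free(self, handle):
        del self.live[handle]

    def outstanding(self):
        return sum(self.live.values())


def play_leaky(heap, file_exists):
    buf = heap.alloc(BUFFER_BYTES)  # buffer is allocated up front
    if not file_exists:
        return False                # early exit: the handle is lost, never freed
    heap.free(buf)                  # toy simplification: playback ends at once
    return True


def play_fixed(heap, file_exists):
    buf = heap.alloc(BUFFER_BYTES)
    try:
        return file_exists
    finally:
        heap.free(buf)              # released on every path, including failure


def pages_needed(num_bytes):
    """Round a byte count up to whole pages."""
    return -(-num_bytes // PAGE_BYTES)


leaky, fixed = ToyHeap(), ToyHeap()
for _ in range(1000):
    play_leaky(leaky, file_exists=False)
    play_fixed(fixed, file_exists=False)

print(leaky.outstanding() / 2**20)  # 15.625
print(fixed.outstanding())          # 0
```

In this model one lost 16 KB buffer per failed play adds up to about 15.6 MiB per 1000 plays, which is in the same range as the 14 to 18 MB people measured, and each buffer spans exactly 4 pages, consistent with reported usage moving in 4 KB steps.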
I spent a few hours earlier debugging this on Windows using detours, malloc/free hooking, etc, and I'm getting a bit tired of that. Going to fire up the Mac and dig in with the debugger on a platform where I can actually look at some damn symbols.
Unfortunately, the fact that the change involving the MP3 system happened somewhere in the ~2009 timeframe means that looking at the leaked ~2007 source code doesn't give the full picture. And essentially all of the sound-related code isn't in the public 2013 SDK, since it's engine stuff. So I have to work off of hints from the 2007 source, plus whatever I can gather from disassembling the current binaries.
[editline]30th July 2016[/editline]
Also, I don't happen to have any engine dll/so/dylib's from that timeframe, so I can't BinDiff it either.
[editline]31st July 2016[/editline]
You sure GMod isn't affected?
I was able to get it to saturate the 32-bit address space with the infinite-loop alias command. On the other hand, it took about 10 minutes on a fast machine, and the process didn't crash [I]per se[/I]. (The command did essentially soft-lock the game for the entire time.)
[IMG]http://sigpipe.info/misc/20160731-gmod-leak.png[/IMG]
That is very strange, no matter what I do I can't get it to occur in GMod.
Whenever I use the play command with a nonexistent mp3 file there, I get the following message:
[CODE]Create Stream Failed error 41
Failed to load sound "x.mp3", file probably missing from disk/repository
[/CODE]
Which is distinct from all other Source games.
Version info is:
[QUOTE]Protocol version 24
Exe version 16.02.26 (garrysmod)
Exe build: 16:43:36 Jul 2 2016 (6447) (4000)
[/QUOTE]
Yeah I get that same error message in GMod. None of the additional stuff about being unable to create the MP3 decoder, or janking, etc.
[QUOTE]Protocol version 24
Exe version 16.02.26 (garrysmod)
Exe build: 16:43:36 Jul 2 2016 (6447) (4000)[/QUOTE]
What's also strange is that the wait command seems to have worked on your end, but not on mine.
Could this discrepancy have something to do with the way Gmod draws resources from other games?
For me, Gmod only freezes when I use MLeak in console. If I keep the game open for 2 minutes, it would crash. Also nothing was added to the memory.
[QUOTE=mr appie;50804839]For me, Gmod only freezes when I use MLeak in console. If I keep the game open for 2 minutes, it would crash. Also nothing was added to the memory.[/QUOTE]
If I recall correctly the 'wait' command is actually disabled in GMod.
[QUOTE=mr appie;50804839]For me, Gmod only freezes when I use MLeak in console. If I keep the game open for 2 minutes, it would crash. Also nothing was added to the memory.[/QUOTE]
Well if "wait" doesn't work it will just skip that command and go right into the next recursion, using up all the available processing time for executing further "play" commands. That's why it freezes.
And yes, that's the same effect I'm getting if I execute the alias in the OP.
[QUOTE=MattTheSpy;50804862]If I recall correctly the 'wait' command is actually disabled in GMod.[/QUOTE]
Didn't know that. Anyway, I just tried the one without 'wait' command.
I got the same error as the OP said. But again, nothing was added to the memory.
Wow, Source 1 has become a goddamn mess.
First time replying here, I had to make an account just to comment on this.
There's been a memory leak on OS X for more than a year now (with the roots of it pointing towards the April 29th, 2015 update, the update that caused the game to crash upon map switching with the "can't load lump x, allocation of xxxx bytes failed!" error. That update's patch notes had a "fixed a crash related to the material system" note in it, so my guess is that the issues started all the way back then, and Gun Mettle just made it a bazillion times worse). You may remember how the Gun Mettle update caused TF2 to crash on OS X, which was then "solved" by Valve by not allowing us to go over medium texture quality. More than half a year passed, and Tough Break came around with a supposed fix. Texture quality is allowed on high again (but still not ultra high), yet the game keeps running out of memory on high. Back to medium it is. More recently I've seen people on Windows 10 experiencing this game-breaking bug as well, with the same "lowering texture quality seemed to fix it" solution. I've been sending Valve emails ever since Gun Mettle, with some response back, but no actual working fix.
So it seems that there are now two memory leaks in Source. One regarding audio (as this thread explains), one regarding the rendering backend (presumably in the ToGL conversion layer Valve uses on OS X and Linux, however that doesn't explain the W10 crashes with the same cause since TF2 on Windows doesn't use OpenGL (and -gl in launch options doesn't work)). I was linked to this thread by our resident bug tracking/fixing lord sigsegv, I've been talking to him about the texture quality memory leak for about a month now.