Server lag and general optimisation

*Before I get started on this, please note that I am not exactly the most knowledgeable person when it comes to dedicated servers. I came from using pay-per-slot servers and am now on a learning curve; this is a part of that curve, so please refrain from posting useless comments or criticising my knowledge. Thanks.*

OK, so I have scoured the internet for any information I could find and gathered together some programs, info and code to optimise my game server to perform as well as possible.

My specs:
I am currently using a server hosted by OVH, the MC-32 package, including:
32GB RAM
i7-4790K, 4 cores / 8 threads, running at 4.0GHz (4.4GHz turbo)
2× 240GB SSDs (RAID)

Using TCAdmin 2.0
What I host:

I host 3 game servers on Garry's Mod and 2 TeamSpeak 3 servers on the same CPU; each GMod server has its own core (2 threads) and the TS servers share a core.

DarkRP - roughly 30 players on at a time, with 45 player slots; the server automatically restarts daily at 3AM.
TTT - only just opened, but continues to hold around 20 people at a time.
GHalo - being made.

I would just like to note that I used to host a TTT server and a few TeamSpeaks back at the start of the year and they ran smoothly.
Also, all of the servers are set to "normal" priority.

The Main Issue:

My main issue is that my DarkRP server has heavy lag spikes when there are more than 20 players on it. The TTT server and TeamSpeaks are completely fine and have no issues.
The server tends to drop to around 5 FPS, as seen in one of the images below; when not lagging it runs at around 60 FPS.
After doing my research online, I found that Garry's Mod uses 1 core maximum, i.e. 2 threads: 1 for the server's physics side and 1 for the networking side.

What I am looking for;

I am looking for someone with past experience to help me figure out what could be causing this issue; your comments below are helpful to me. I am not entirely clued up on dedicated machines, as I have previously used pay-per-slot servers and they tend to have the optimisation side of things all sorted for you.
I did have a look at upgrading the processor to a better one (here are the packages), but I am unsure if this would be a waste of money. Financially this is not an issue for me; I just want to fix the lag.

Please find below some screenshots:

Net_graph 4 - the lag spike is at the bottom.
htop from SSH (PuTTY).
Lag spike with sv FPS.
Prop limits for players.
server.cfg values.

If there are any more questions that you will need answers to help me, please let me know. Thanks.

[editline]28th November 2016[/editline]

I also forgot to mention: with htop, which I installed on Ubuntu, it shows that DarkRP is running 1 thread at 100% while TTT runs its thread at around 44%.

Why run 2 TeamSpeaks? Why not one big TeamSpeak? Two communities?

Try removing the most recently added addon. Might be too hard on the server’s resources.

Try blocking all big models using FPP (Built-in). (Giant cube, etc.)

If that doesn’t fix it, try lowering the prop limit: 40 props × 30 players = 1,200 props in the world. Maybe 30; then it would be 900 for 30 players. Many servers even go for 15, but when you play on them you are forced to team up to get a good base.

All of your Garry’s Mod servers are using the first core. Make sure to set affinity on them.

(And try adding -tickrate 33 to your commandline)
(Also, your server is more than powerful enough to run Garry’s Mod servers - just read my first hint)

If you haven’t already figured it out, try this link to turn on affinity.
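On Linux you can also set affinity straight from the startup script with taskset. A sketch, with core numbers as assumptions - check your own topology first, since hyperthread siblings are usually not numbered consecutively:

```shell
# Show which logical CPUs share a physical core, so a server and its
# second thread can be pinned to the same core's two hyperthreads.
lscpu -e=CPU,CORE

# taskset restricts a command to the listed logical CPUs.
# Demonstrated here with a trivial command:
taskset -c 0 echo "pinned to CPU 0"
```

The same `taskset -c <cpus>` prefix can go in front of `./srcds_run` (together with `-tickrate 33` from the hint above) in the server's launch line.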

We run a 30 prop limit for users and 50 for VIPs at an average of 100-120 players. OP's problem is more than likely his pasted addons.

Just wondering, is the prop limit per player or per server?

Per player…

[editline]29th November 2016[/editline]

I used to have someone who dealt with the TeamSpeak side of things. I have 1 TeamSpeak that does not have a licence (you need to pay for one), meaning it only has 30 player slots; that's my main one. The other is for testing features without directly fucking shit sideways.

It’s been lagging from day 1. I spent a lot of time creating this server and it's been in slow development over time. The main addons are GUIs; content-wise it has M9K and playermodels, plus aritz and drugs mods. Overall there isn't much content, so I do not believe this is the issue. However, if @StonedPenguin is saying this too then I will have another look over it tonight and see if there could be any addons that are fairly CPU heavy.

Tried this; the server lags when there are around 30 players on, using generic props that people use on all servers.

Forgot to mention, I am using Ubuntu 14.04 64-bit - any advice on this?

Thanks, I will try this and let you know.

TCAdmin manages affinity for you. Go to your service, and then into “Service Settings”, and you’ll find this. Make sure they’re all un-ticked.

Also, as shown above, your game servers should run on “Above Normal”, or at least that's what I do. It just means they have higher priority than other system processes.
Hope this helps

[editline]29th November 2016[/editline]

Just as a note though; I doubt upgrading will help you here. It’s quite clearly a configuration issue.

Thanks for the help. In my first post I mentioned that each server has its own core, was unsure of the correct terminology.

DarkRP - CPU 1
GHalo - CPU 3
TS3 servers - CPU 4

I'll try changing the priority to Above Normal and see if that works, cheers.

No point in doing that. You're best leaving the OS to handle the threading itself, honestly.

So should I just assign the servers to every core or every server to core 0 for example?

Do what I did in my screenshot; untick them all

I’ve seen mixed results when assigning a server to a single core; it's helped in some cases but in others made it worse when at 12-13% (a maxed core).

Though this was with Windows, which has terrible thread handling. I'd suggest setting your sv_maxrate to 0; this should alleviate any choke you are seeing.
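For reference, a server.cfg sketch of the rate settings being discussed - the non-zero values are illustrative defaults, not tuned numbers:

```
// server.cfg - network rate settings (sketch)
sv_maxrate 0          // 0 = no per-client bandwidth cap
sv_minrate 20000      // floor so slow clients don't choke everyone
sv_maxupdaterate 33   // match the server tickrate
sv_minupdaterate 10
```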

With help from Lew over at Neat Node / Visca Gaming, he showed me that the issue was that I had DarkRP on a MySQL database hosted on a different server in a different location. FYI, in case anyone else has this problem in the future.
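For anyone checking whether a remote database is the culprit, a quick sketch - the hostname and credentials below are placeholders, not real values:

```
# Round-trip latency to the database host (placeholder hostname).
ping -c 5 db.example.com

# Time a trivial query; sustained latency of tens of ms per query will
# stall a gamemode that queries the database from the main thread.
time mysql -h db.example.com -u darkrp -p -e "SELECT 1;"
```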

Care to explain how Windows handles threads terribly?

I don’t mind.

For explanation purposes, I'm going to assume there is only 1 CPU with 1 core; this all scales up, though, as long as you have many more processes than cores.

First, a little terminology:
There are many places that data can be stored on your computer. One of these places is the registers: small bits of memory physically inside your CPU that store the values your CPU is operating on at the moment. In the x86 architecture, you have 40 registers (real processors have a few more, because there are some clever tricks you can do to speed up execution in certain cases). At any rate, if you want to save all the data a CPU is using at the moment, you need to save at least 40 numbers somewhere else.

If you’ve ever programmed in C or C++ you may have seen a picture that looks like this:

This is a program's “address space”. It contains all the things your program needs to run, and is a neat model for many reasons. Among them: you can grow your program indefinitely! A whole 4GB (2^32 bytes in a 32-bit architecture, or 2^64 bytes in 64-bit) of address space is either yours, or not yours yet.
Under closer inspection, this seems a little silly. You can write a program and start it twice. Each program has this model, which means each program has an address 0x00001000, but of course there's only 1 physical address 0x00001000, so who gets it?
The answer is, usually, neither of them. This is where the operating system comes into play. It is the operating system's job to hand out memory to user-space processes and keep track of who has what memory. Between the hardware and the operating system, an address translation has to happen every time you read or write data, and even every time your program advances an instruction. This would be way too slow to do in the operating system, so there is hardware support, but the operating system still needs to do a little setup work before the hardware can do address translation.

The hardware setup for all this is done when the operating system decides your program gets to have a little CPU time: it performs a context switch, where it loads your program's registers and tells the CPU to start executing your instructions, either from the start or from where you last left off.

There are many reasons the operating system might decide you've had enough processing time; the most common is that your program is waiting on I/O of some kind - disk, network, or keyboard/mouse. If your program is just waiting around, it makes sense to run some other program and come back later. Every time you wait on something, the operating system will do a context switch and come back to you later.

These context switches are expensive. They are pure overhead, and they are so expensive that we can generally talk about the speed of an operating system in terms of the number of context switches that must happen. In Windows, depending on your processor, a context switch takes between 50 and 100 ns.
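You can actually watch this happening on a Linux box. A small sketch that counts system-wide context switches over roughly one second by sampling the cumulative counter in /proc/stat:

```shell
# Sample the kernel's cumulative context-switch counter, wait, sample again.
a=$(awk '/^ctxt/ {print $2}' /proc/stat)
sleep 1
b=$(awk '/^ctxt/ {print $2}' /proc/stat)
echo "context switches in ~1s: $((b - a))"
```

Even a mostly idle machine typically shows thousands per second, which is why per-switch cost matters for a busy game server.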

But wait, there’s more!
Microsoft Windows is designed on something called a micro-kernel architecture. The point of a micro-kernel is to take most of the code and put it into user-space. This hopefully enhances the security of the operating system, since your “real kernel” is very small, and the programs that manage file systems, drivers, and so on are under the guarantees of user-space programs. (Windows actually decided not to have these in true user space, but instead made 3 “rings” of protection, where regular user-space programs live in ring 3, the kernel lives in ring 1, and the various parts that are still operating system live in ring 2.) This all sounds well and good until you realise that, now that these things live in your space, you have to context switch again to get them to do anything! Oh no!

I’m sorry, but that does not explain jack shit. It does not show why Windows threading is terrible, why Linux/OSX/whatever threading is better, or anything like this. Also do you realize that what you’ve described pretty much applies to Linux and OSX too?

And then there’s quite a bit of bullshit too:

First, that number is rubbish. Second, you don’t need to save the state of various control registers, which means there’s a lot less data to save than what you claim.

Sorry, wrong. All Windows NT operating systems use a hybrid kernel, which means that, unlike you stated, you don't have to do context switches when accessing various kernel components. Also, it should be kinda obvious that shit like drivers doesn't run “under the guarantees of user-space programs”, because there's such a thing as the Blue Screen of Death, which you see when a driver crashes.

What complete rubbish. First, Windows didn't make the protection rings; they are actually implemented in hardware on x86 CPUs. Second, the kernel runs in ring 0, not 1. Third, rings 1 and 2 on x86 are basically obsolete now.

Also about that image… If you’re talking about Windows primarily, why did you use a diagram of Linux address space? :goodjob:

Not quite! Linux and OSX are built as “monolithic” kernels (although we can argue this all day too, with things like FUSE), as opposed to the microkernel; that is, the various operating system peripheries are really part of kernel space - no second context switch needed to do the work!

16 universal registers
5 control registers
6 segment registers
3 descriptor registers
8 memory registers
= 40 registers; am I missing any? I think I'm only counting user-mode registers - I believe there are 80-odd registers defined in x86.

You’re right: if you want to be pedantic, Windows NT is a hybrid kernel, and you only need to do one context switch from user space to kernel space…
Unless you want to access the Windows environment, or Windows services, or the windowing system…
I’ve always wondered why people call Windows NT a “hybrid”, as I see it as a marketing trick. Can you elaborate?

Again, you’re getting pedantic. I didn't mention it because most drivers are user-space, and I didn't feel like explaining (and probably don't completely understand - I've never written one) the difference between kernel-space and user-space drivers in Windows.

My mistake on the ring count (or a typo if you’re feeling generous, I promise I know it’s a 2-bit identifier!). As for them being implemented in the CPU, x86 only calls them “privileges”, and says that 1 and 2 should be used for device drivers. “Rings” is Windows terminology.

Because it’s the one “everybody’s taught”. I used the Linux diagram for the same reason I talked about 32-bit x86 for the whole post and not x86-64, even though we mostly all use 64-bit: it's the simpler one. Windows is mostly the same anyway, just with the system at the bottom instead of the top.

This, for example, has more details:

Windows kernel, despite being a microkernel, has nearly all “extra” services running in same address space as the kernel itself, which means they can be invoked directly without need for IPC.

There are at least half a dozen model-specific registers on modern CPUs, and there are also SSE/AVX registers. But as I've said, you don't need to save about half of the ones you've specified, because they're service registers, not general-purpose registers.

It's not like you can use all 40 registers. First of all, integer calculation mainly uses the general-purpose registers, of which there are 8 (32-bit) / 16 (64-bit), and you can't even control EIP/RIP via instructions.

Then the segment registers: well, go on and read how segmentation works; x86 defaults to DS for all data transfers.

And for your previous post I highly doubt you understand anything about the Windows Kernel.

““Rings” is Windows terminology.” <- Get your facts right before posting.