Language Question - How are strings handled in Lua?

I know strings are immutable in Lua. I was googling around for a while, and while changing a string in any way simply creates a newly allocated string, does assigning it to another variable or passing it into a function make a full copy of the string as well?

Say I have the following example:
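[lua]
local function somefunction(s) -- placeholder: any function that takes a string
  return #s
end

local strA = string.rep("a", 12800) -- 12,800 bytes, i.e. 100 kilobits
local strB

strB = strA
-- Are there now two copies of this string in memory?
somefunction(strA)
-- Again, was a new copy of the string allocated and pushed onto somefunction's stack?
[/lua]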

If the answer to those questions is yes, then how can something so absurd not be compensated for in Lua? Is there no way to handle a large string efficiently in Lua? I realize that converting the string into a table of its bytes, indexed by integer keys, would be more efficient to work with (I do this when building large strings in Lua). However, there is no built-in Lua function to take an existing large string and convert it into a table, which means there is no fast way to do it, and thus that solution is futile without a custom library.
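For what it's worth, standard Lua can do this conversion without a custom library: string.byte(s, i, j) returns the bytes of a whole range in one call, and string.char accepts many byte values at once. Below is a minimal sketch; the names toBytes and fromBytes are hypothetical, and the chunk size of 4096 is an assumption chosen to stay well below the stack limit that string.byte and unpack run into on very long ranges.

```lua
-- Hypothetical helpers: convert between a string and a table of byte values.
-- Each string.byte / string.char call handles a whole chunk in C, so the
-- number of Lua-level calls is about #s / CHUNK rather than #s.
local unpack = table.unpack or unpack -- Lua 5.2+ moved unpack into table
local CHUNK = 4096 -- assumption: small enough for the C stack limit

local function toBytes(s)
  local t, n = {}, 0
  for i = 1, #s, CHUNK do
    -- One call returns up to CHUNK byte values at once.
    local bytes = { string.byte(s, i, i + CHUNK - 1) }
    for k = 1, #bytes do
      n = n + 1
      t[n] = bytes[k]
    end
  end
  return t
end

local function fromBytes(t)
  local parts = {}
  for i = 1, #t, CHUNK do
    -- string.char turns a chunk of byte values back into a string.
    parts[#parts + 1] = string.char(unpack(t, i, math.min(i + CHUNK - 1, #t)))
  end
  return table.concat(parts) -- joins the chunks into one string
end

local arr = toBytes(string.rep("a", 12800))
print(#arr, arr[1]) -- prints 12800 and 97
```

When building a large string, collecting pieces in a table and calling table.concat once at the end avoids creating a new intermediate string on every append.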

string.Explode("", "explode this string") puts every character into a table in GMod, but I forgot whether there is a function for it in standard Lua. (Isn’t string.split a function?)

What you’re saying would make sense, because strings are arrays, but since a new string is allocated whenever one is changed, strings do not appear to be referenced the same way tables are.

In short, the answer to your question is probably yes, in which case this should indeed be changed.

Right, but you’re still going off speculation, like me.

Also, string.Explode is written in Lua, so it is in no way a solution: it has to iterate over the string with string.sub and/or string.find, referencing it several times.

Use string.gmatch. You can easily iterate over a truly massive string in very little time with it.

In Lua, the single most expensive thing you can do is call a function, followed by garbage collection.

So, if you are going to iterate over a string, you want to do it as much in C as possible, which means as few function calls as possible.



for line in str:gmatch( "[^\n]+" ) do
  --something
end


This is the absolute fastest way to go over a string; you just adjust the pattern to fit what you are doing.
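To illustrate adjusting the pattern, here is a small sketch (the sample text and table names are made up) that walks the same string three ways: by line, by whitespace-separated word, and by comma-separated field.

```lua
local text = "alpha,beta gamma\ndelta,epsilon\n"

-- "[^\n]+": runs of characters that are not newlines, i.e. non-empty lines.
local lines = {}
for line in text:gmatch("[^\n]+") do
  lines[#lines + 1] = line
end

-- "%S+": runs of non-whitespace characters, i.e. words.
local words = {}
for word in text:gmatch("%S+") do
  words[#words + 1] = word
end

-- "[^,\n]+": fields separated by commas or newlines.
local fields = {}
for field in text:gmatch("[^,\n]+") do
  fields[#fields + 1] = field
end

print(#lines, #words, #fields) -- prints 2, 3 and 4
```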

For example:


x = string.rep( "a", 12800 )
t = { }

z = os.clock( )

for k in x:gmatch( "%a" ) do
	table.insert( t, k )
end

print( ( os.clock( ) - z ) * 1000, "milliseconds" )

This takes about 20 milliseconds per run on my computer, although your mileage may vary.
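For comparison, here is a sketch of the same collection done with one string.sub call per character, which is roughly what a pure-Lua explode has to do. Both loops build identical tables; the timings depend on the machine and on the Lua implementation, so no numbers are claimed here.

```lua
-- Same 12,800-character input as the benchmark above.
local x = string.rep("a", 12800)

-- gmatch: matching happens in C; the generic for calls the C iterator once per match.
local t1, n1 = {}, 0
local z = os.clock()
for k in x:gmatch("%a") do
  n1 = n1 + 1
  t1[n1] = k
end
print((os.clock() - z) * 1000, "milliseconds (gmatch)")

-- string.sub: an explicit method lookup and call per character.
local t2 = {}
z = os.clock()
for i = 1, #x do
  t2[i] = x:sub(i, i)
end
print((os.clock() - z) * 1000, "milliseconds (string.sub)")
```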

Thanks for the example, though I do know about gmatch. So you are confirming that strings are copied when passed into functions or assigned to other variables?

If so, then when I call stringvar:gmatch() like you did above, wouldn’t stringvar be copied for every call to gmatch (since it is pushed onto the function’s stack)? If it is, the problem still exists.

Also, garbage collection is more expensive than calling a function, though function calls would make sense in second place.

In Garry’s Mod, garbage collection doesn’t run all the time; it only happens once a certain amount of garbage has accumulated. In theory, you can minimize the damage this causes by reducing the amount of garbage you create. Reusing tables instead of creating new ones is the best way to do that: scripts that create a new table every time they call surface.DrawPoly or similar will trigger garbage collection very quickly.
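Since surface.DrawPoly only exists inside Garry's Mod, the sketch below uses plain Lua to show the same effect. The helpers fill and garbageKB are hypothetical: fill stands in for building vertex data, and garbageKB stops the collector while measuring so that allocated memory is not reclaimed mid-run.

```lua
-- Hypothetical stand-in for building vertex data: writes n numbers into out.
local function fill(out, n)
  for i = 1, n do
    out[i] = i
  end
  return out
end

-- Measures how many KB of heap f allocates over `iterations` calls.
-- The collector is stopped so that garbage is not reclaimed during the run.
local function garbageKB(f, iterations)
  collectgarbage("collect")
  collectgarbage("stop")
  local before = collectgarbage("count")
  for _ = 1, iterations do
    f()
  end
  local used = collectgarbage("count") - before
  collectgarbage("restart")
  return used
end

local shared = {}
local fresh = garbageKB(function() fill({}, 16) end, 1000)      -- new table per call
local reused = garbageKB(function() fill(shared, 16) end, 1000) -- one table, reused
print(string.format("new table each call: %.1f KB, reused table: %.1f KB", fresh, reused))
```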

Therefore, inside a loop the most expensive thing you can do is call a function, since under most conditions (a loop that creates little garbage) garbage collection will not run during the loop.
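Following that reasoning, the table.insert in the benchmark above is itself one function call per iteration. A sketch that isolates that cost by filling two tables with plain numbers, once through table.insert and once by direct index assignment (N is an arbitrary size; implementations such as LuaJIT may narrow the gap, so no timings are claimed):

```lua
local N = 100000

-- One table.insert call per element: a global lookup, a field lookup and a call.
local t1 = {}
local z = os.clock()
for i = 1, N do
  table.insert(t1, i)
end
print((os.clock() - z) * 1000, "milliseconds (table.insert)")

-- Direct assignment: no function call in the loop body.
local t2 = {}
z = os.clock()
for i = 1, N do
  t2[i] = i
end
print((os.clock() - z) * 1000, "milliseconds (direct index)")
```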

A fun thing about Lua is that all string functions are reachable through the string metatable (its __index field points to the string library table), so you can do this:



x = "12"

print( x.sub )


Will print out a function.
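Concretely, in stock Lua every string shares one metatable whose __index field is the string library table, which is why x.sub finds string.sub and why x:sub(1, 1) is shorthand for string.sub(x, 1, 1). Environments such as Garry's Mod may replace __index with a function, in which case the first check below may not hold there, but the other two should. A quick check:

```lua
local x = "12"
local mt = getmetatable(x) -- every string shares this one metatable

-- In stock Lua, __index is the `string` library table itself.
print(mt.__index == string) -- true
print(x.sub == string.sub)  -- true

-- Method syntax just passes x as the first argument.
print(x:sub(1, 1), string.sub(x, 1, 1)) -- prints 1 twice
```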

I don’t know whether you end up with more string garbage from x:gmatch than from string.gmatch( x, … ), though I just prefer the former.

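Everything except tables in Lua is passed by value. (Strictly speaking, functions, userdata and coroutines are reference types too, and because strings are immutable, Lua passes a reference to the same string object rather than copying its contents.)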
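One way to check this on your own setup, instead of going off speculation, is to watch the heap size reported by collectgarbage("count") while a large string is passed into a function and assigned to many table slots. In this sketch the 1 MB size and the names heapKB and takesString are arbitrary; if either operation copied the string, the heap would grow by roughly 1024 KB per copy.

```lua
-- Returns the current Lua heap size in KB after a full collection.
local function heapKB()
  collectgarbage("collect")
  return collectgarbage("count")
end

local big = string.rep("a", 1024 * 1024) -- a 1 MB string

local before = heapKB()

-- Passing the string into a function: measure while it is on that stack.
local function takesString(s)
  local kb = heapKB() -- not a tail call, so s stays on this function's stack
  return kb
end
local duringCall = takesString(big)

-- Assigning the same string to 100 table slots.
local copies = {}
for i = 1, 100 do
  copies[i] = big
end
local afterAssign = heapKB()

print(string.format("during call: %+.1f KB, after 100 assignments: %+.1f KB",
  duringCall - before, afterAssign - before))
```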

[editline]2nd June 2011[/editline]

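It’s exactly the same. The string metatable’s __index points to the string library table, so x:gmatch(...) just looks up string.gmatch and calls it with x as its first argument, meaning no copy is created.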