I had wanted to delay this post for a bit so that I could have everything finely polished, but my coursework is kind of cutting in. I'm putting this out there for discussion so that when I come back to it I've got a fresh perspective and a bunch of ideas.
I think many of you are familiar with the situation with model formats. Most of them suck. There are plenty of libraries like Assimp to make things a little less painful, but they're big and heavy and sometimes you just don't want to deal with it all. You guys seemed to agree, but in typical WAYWO fashion we started arguing over names and format details without ever having the prospect of getting things complete. What I've tried to do is give you guys a core to build from. I've got the basics down on paper and in code, so the rest of the work can be done pretty much individually.
I think a lot of people resort to writing their own model formats, so I'm just trying to unify some of the effort here.
What is it?
It's a simple binary format for static meshes. It's not unlike IQM, because I tried to emulate a lot of what they've done. Of course, there are differences. Instead of having skeletal animation, there's collision hulls, bounding boxes/spheres, and an occlusion mesh for dynamic occlusion culling. Everything you need for static scenery.
Why should you use it?
Because your only real alternative is Wavefront OBJ :|
What can you do?
- Help edit the spec. Your input is valuable!
- Write an exporter/converter. The git repo has a file called 'writing_exporters.txt' with my notes. I've also written a Blender 2.6.2 input script to serve as an example.
- Use it!
What's done already?
There's a spec, a C library, a Blender exporter, and a test/example application that allows you to view BSM models.
[t]http://i.imgur.com/d2nrM.jpg[/t]
Where can I get it?
[url=https://github.com/ml32/Binary-Static-Mesh]Git Repo[/url]
[url=https://github.com/ml32/Binary-Static-Mesh/raw/master/spec/BSM_spec_v1_DRAFT.pdf]BSM v1 Draft[/url]
I had a quick look and it seems great to me.
I have a suggestion about the material and grouping though. I personally think that the material path and name should be removed because it's only going to be relevant for one program, and I just kind of don't see the point for having a default when there will (should?) always be a way to override it. I also think that you should leave 1 file as one mesh (not sure if this was the plan or not), but have a list of vertex ranges as groups. Thoughts?
You're pretty much always going to want a material/texture on a mesh, and I provide only the absolute bare minimum (a single free-form string, which can be a file path or any other unique identifier).
'Meshes' are groups, in this context. They're just groups with separate materials.
[QUOTE=ROBO_DONUT;35420300]'Meshes' are groups, in this context. They're just groups with separate materials.[/QUOTE]
Yeah. Just a mix up of words on my part.
Although my point about materials isn't that you don't want one, it's just that it doesn't make sense to have it in a model file in my opinion. I think it's down to the program to put assets together. I know it can be easily ignored but that's kind of the point. There are so many kinds of meshes that don't have a default material and/or have multiple swappable materials that, unless you want to start handling this mess in the model file, why not just leave the whole thing down to the program, like it will already have to do?
What?
I realise it's trivial but I don't know a better way to sum it up. Just a niggle with huge obj files that try to represent an entire scene, with a huge list of groups separable only by material with the emphasis on the program being able to support it. That doesn't help anybody at all.
Feel free to ignore I guess? I still think I've got a point.
I dunno, it sounds like you're trying to solve a problem with content authors by imposing unnecessary restrictions :\
I kind of get the gist of what you're trying to say, but what about if you're authoring terrain where you want to have small plateaus with grass on top, but rock on the side. You couldn't do that if you only support one material/model. Or you could do it, but you'd have to pack it all into a single UV map and you couldn't share tiling terrain textures with other scene elements.
Can you give me an example model file to test an implementation?
[editline]3rd April 2012[/editline]
Nevermind, I found one hiding from me :v:
I don't want to restrict the mesh, I just think it's up to some other file to piece together assets, such as which materials to use where. This is almost definitely going to happen anyway in some form so you can change them. So no point letting the mesh have first control over it, which is the expected functionality, just have mesh data and that's it. Sorry it took me so long to have hopefully explained this a bit better.
Bit big for a test file, is it Suzanne?
My only suggestion would be for the material name string as well. Instead of it being null terminated, there should be a single byte that tells you how many bytes there are in the field. This way someone might choose to not use an actual string, but instead a UUID or some other way for them to uniquely identify information, which a content pipeline could use (not to mention that being explicit about the length would probably cut down on any vulnerabilities one might get from treating it as a null-terminated string :P)
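Roughly what I have in mind, as a minimal sketch only. The names `bsm_ident` and `bsm_read_ident` are made up for illustration and are not part of the spec or the library; the only assumption is a one-byte length followed by that many raw bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one length-prefixed identifier inside a loaded file. */
typedef struct {
    const uint8_t *bytes; /* points into the caller's buffer, nothing is copied */
    uint8_t        len;   /* number of bytes; the payload may contain 0x00 */
} bsm_ident;

/* Reads one identifier starting at buf[*pos].
 * Returns 1 and advances *pos on success, 0 if the field would overrun buf. */
static int bsm_read_ident(const uint8_t *buf, size_t buf_len,
                          size_t *pos, bsm_ident *out)
{
    if (*pos >= buf_len) return 0;          /* no room for the length byte */
    uint8_t len = buf[*pos];
    if (buf_len - *pos - 1 < len) return 0; /* payload would run past the end */
    out->bytes = buf + *pos + 1;
    out->len   = len;
    *pos += (size_t)len + 1;
    return 1;
}
```

Since the length is explicit, the reader never has to scan for a terminator, so a truncated or malicious file fails the bounds check instead of running off the end of the buffer.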
what if the material path is more than 255 bytes long
[editline]4th April 2012[/editline]
If it's not asking for too much can you export a simple cube BSM?
[QUOTE=Chandler;35421723]My only suggestion would be for the material name string as well. Instead of it being null terminated, there should be a single byte that tells you how many bytes there are in the field. This way someone might choose to not use an actual string, but instead a UUID or some other way for them to uniquely identify information, which a content pipeline could use (not to mention that being explicit about the length would probably cut down on any vulnerabilities one might get from treating it as a null-terminated string :P)[/QUOTE]
Sorry to use you as an example but this is kind of why it's irrelevant to put this in a mesh file. Everybody wants to do something different and it doesn't solve anything. What if you get a model from somewhere else and will have to change the material path to something else because it's program specific? You import and export it? You also have to decide at export time what the material path and file is. It makes no sense to handle it this way in your code and therefore no sense to put it in the mesh file.
I really didn't mean to make such a big deal of this so I'll just shut up now.
[QUOTE=Map in a box;35421873]what if the material path is more than 255 bytes long[/QUOTE]
Considering game/program assets are generally relative paths, it really shouldn't be. 255 is a lot of characters.
Maybe it could be a problem with Hiragana or other non-Latin character sets that take like 3 bytes/char in UTF-8? That's still ~85 chars, though, which is still a lot.
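A quick sketch of that arithmetic, assuming the input is already valid UTF-8 (the helper name `utf8_codepoints` is made up for illustration). It counts characters by skipping continuation bytes, so you can compare the character count against the byte count that the length limit actually applies to.

```c
#include <stddef.h>

/* Counts code points in a UTF-8 string by skipping continuation bytes,
 * which always have the bit pattern 10xxxxxx.
 * Assumes valid UTF-8; no validation is performed. */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) n++;
    }
    return n;
}
```

Hiragana falls in the three-byte range, so a 255-byte limit leaves room for 85 such characters.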
[QUOTE=Map in a box;35421873]If it's not asking for too much can you export a simple cube BSM?[/QUOTE]
[URL=http://www.mediafire.com/?44blp7d4hqykw6i]http://www.mediafire.com/?44blp7d4hqykw6i[/URL]
[QUOTE=Chandler;35421723]My only suggestion would be for the material name string as well. Instead of it being null terminated, there should be a single byte that tells you how many bytes there are in the field. This way someone might choose to not use an actual string, but instead a UUID or some other way for them to uniquely identify information, which a content pipeline could use (not to mention that being explicit about the length would probably cut down on any vulnerabilities one might get from treating it as a null-terminated string :P)[/QUOTE]
This isn't a bad idea. I'm liking the possibility of UUIDs.
Pulling the strings out of the mesh structure would fix its special case for endianness conversion, but it's an extra level of indirection. Using a buffer length value instead of null-termination allows null values in the string (binary identifiers, etc.), but it means that you have to allocate a new buffer for each string just to tack the null-terminator on the end when you're working with C. Which means a lot of mallocs, and the whole memory management situation just gets messy. Maybe use a mix? Require strings to be followed by a null byte, but also provide a string length? Then the loader can use the string length to enforce the null-termination of strings (drop a '\0' in the gap between strings) and you can malloc() the entire string buffer all at once.
Opinions?
[editline]4th April 2012[/editline]
Also, I wrote about the special case of endianness conversions and the mesh structure, but I forgot about it in the code and reorder the entire structure regardless. Oops. :v:
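For reference, a minimal sketch of what that special case could look like. The record layout and field names here are invented for illustration and are not taken from the spec; the only thing assumed from the format is a fixed 256-byte material buffer sitting next to 32-bit numeric fields.

```c
#include <stdint.h>

#define MATERIAL_LEN 256 /* assumed size of the fixed material field */

/* Hypothetical mesh record; the real BSM layout may differ. */
typedef struct {
    uint32_t first_index;
    uint32_t index_count;
    char     material[MATERIAL_LEN];
} mesh_record;

/* Reverses the byte order of one 32-bit word. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Swaps only the numeric fields; the material bytes are a byte string
 * and have no endianness, so they are left untouched. */
static void mesh_record_swap(mesh_record *m)
{
    m->first_index = swap32(m->first_index);
    m->index_count = swap32(m->index_count);
}
```

Blindly swapping every 32-bit word in the structure would scramble the material string, which is why the string field needs to be skipped.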
[QUOTE=ROBO_DONUT;35427013]Pulling the strings out of the mesh structure would fix its special case for endianness conversion, but it's an extra level of indirection. Using a buffer length value instead of null-termination allows null values in the string (binary identifiers, etc.), but it means that you have to allocate a new buffer for each string just to tack the null-terminator on the end when you're working with C. Which means a lot of mallocs, and the whole memory management situation just gets messy. Maybe use a mix? Require strings to be followed by a null byte, but also provide a string length? Then the loader can use the string length to enforce the null-termination of strings (drop a '\0' in the gap between strings) and you can malloc() the entire string buffer all at once.
Opinions?
[editline]4th April 2012[/editline]
Also, I wrote about the special case of endianness conversions and the mesh structure, but I forgot about it in the code and reorder the entire structure regardless. Oops. :v:[/QUOTE]
I think that could lead to some interesting bugs with regards to strings, for example if the mesh is missing a \0 at the end, but in that case you could just abort the mesh loading.
Like the string buffer would be a contiguous list of:
[cpp]
uint8_t size;
uint8_t string[];
uint8_t delimiter;
[/cpp]
'delimiter' [i]should be[/i] '\0', but it is probably a bad idea to assume that when loading, so you'd have to enforce it manually.
Then, when you load the strings in the string buffer, instead of having to allocate and return a bunch of malloc'd buffers (something which I don't like, I think that the [i]caller[/i] should always be responsible for allocating memory and never the callee) or requiring the caller to allocate a buffer for each, you can just have the caller malloc a single buffer large enough for all the strings, then go over the entire thing and:
[cpp]
uint8_t *strbuf = &data[header.offs_strings];
size_t i = 0;
while (i < header.size_strings) {
    /* each entry starts with its length byte */
    uint8_t size = strbuf[i];
    i++;
    /* bounds check: the string bytes plus the delimiter must fit */
    if (i + size + 1 > header.size_strings) return false;
    i += size;
    /* null-delimiter enforcement */
    strbuf[i] = '\0';
    i++;
}
return true;
[/cpp]
Still kinda gross, though. Null-terminated strings are always gross, but it's what a lot of code expects :\.
I think the fixed-length buffer in the mesh structure is a little 'cleaner', because it's got an implicit upper limit on string length (256), and requiring that strings be null-terminated means you've got room to put the null byte if it isn't already there. I think I might just add a string 'length' field and make the UTF-8 encoding a recommendation instead of a requirement so that people can encode any sort of binary information in that field.
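Something like this, as a rough sketch. The struct and function names are placeholders, not what's in the library; the assumption is a one-byte length next to the 256-byte buffer.

```c
#include <stdint.h>

#define BSM_MATERIAL_SIZE 256 /* fixed buffer size discussed above */

/* Hypothetical layout: explicit length plus a fixed buffer. */
typedef struct {
    uint8_t length;                      /* bytes in use, 0..255 */
    char    material[BSM_MATERIAL_SIZE]; /* always has room for a '\0' */
} bsm_material;

/* Writes a terminator right after the declared payload so that C string
 * functions stay in bounds even if the file left the terminator out. */
static void bsm_material_terminate(bsm_material *m)
{
    m->material[m->length] = '\0';
}
```

Because the length is a single byte, it can never exceed 255, so the index written by the loader always lands inside the 256-byte buffer and no extra bounds check is needed.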
How would I go about making the vertices defined in a clockwise winding?
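One possible approach, sketched under the assumption that the mesh is an indexed triangle list with 32-bit indices (not checked against the BSM spec): swapping any two indices within each triangle reverses its winding. The function name `flip_winding` is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverses the winding of every triangle in an index buffer by swapping
 * the second and third index of each triple.
 * index_count is assumed to be a multiple of 3; trailing leftovers are ignored. */
static void flip_winding(uint32_t *indices, size_t index_count)
{
    for (size_t i = 0; i + 2 < index_count; i += 3) {
        uint32_t tmp   = indices[i + 1];
        indices[i + 1] = indices[i + 2];
        indices[i + 2] = tmp;
    }
}
```

Note that this only changes the vertex order; if normals are derived from winding elsewhere in a pipeline, they would need to be handled separately.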
[QUOTE=ROBO_DONUT;35427803]Like the string buffer would be a contiguous list of:
[cpp]
uint8_t size;
uint8_t string[];
uint8_t delimiter;
[/cpp]
'delimiter' [i]should be[/i] '\0', but it is probably a bad idea to assume that when loading, so you'd have to enforce it manually.
Then, when you load the strings in the string buffer, instead of having to allocate and return a bunch of malloc'd buffers (something which I don't like, I think that the [i]caller[/i] should always be responsible for allocating memory and never the callee) or requiring the caller to allocate a buffer for each, you can just have the caller malloc a single buffer large enough for all the strings, then go over the entire thing and:
[cpp]
uint8_t *strbuf = &data[header.offs_strings];
size_t i = 0;
while (i < header.size_strings) {
    /* each entry starts with its length byte */
    uint8_t size = strbuf[i];
    i++;
    /* bounds check: the string bytes plus the delimiter must fit */
    if (i + size + 1 > header.size_strings) return false;
    i += size;
    /* null-delimiter enforcement */
    strbuf[i] = '\0';
    i++;
}
return true;
[/cpp]
Still kinda gross, though. Null-terminated strings are always gross, but it's what a lot of code expects :\.
I think the fixed-length buffer in the mesh structure is a little 'cleaner', because it's got an implicit upper limit on string length (256), and requiring that strings be null-terminated means you've got room to put the null byte if it isn't already there. I think I might just add a string 'length' field and make the UTF-8 encoding a recommendation instead of a requirement so that people can encode any sort of binary information in that field.[/QUOTE]
Why not have it be something like:
[cpp]
uint8_t size;
uint8_t* data;
if (size) { /* read size bytes into data */ }
else { /* strlen, set to size, make data point to it */ }
[/cpp]
So a null string will actually be like "\0STRINGDATA\0", but a CRC32 of a name would be like 0x04 0x04C11DB7, and then the user would know what to do with it?
Actually that is probably a bad idea. If anything, the naming scheme stuff should probably go into the extensions section. That way you'll have your full spec, and an example of how to use the extension section :v:.
In practical use are people going to be mixing path names with UUIDs and other sorts of binary identifiers?
My instinct is that people are going to choose just one and stick to it throughout, so having a separate field length for each is a tad redundant, I think.
If we look at other formats, they tend to use textual material identifiers (file paths, etc.) exclusively. If BSM is the only format to support a feature, I'm inclined to think that nobody is ever going to use it, since it becomes a portability/abstraction concern. I'd like to make the format as generic as possible.
I think that if it's limited to fixed-length, null-terminated strings, it'll be more typical, and 'binary' identifiers can be encoded as either ASCII hexadecimal or base-64 strings. That way, everything is coerced into a single format, everything is printable, and you aren't trying to juggle plaintext and binary.
If you leave it as-is, there are fewer special cases and more options. Base-64/ASCII Hex are also compatible with UTF-8, so the end result is that you still have 256-byte, null-terminated UTF-8 strings.
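As a sketch of the ASCII hex option, assuming the identifier is just a run of raw bytes (the function name `ident_to_hex` is made up for illustration and isn't in the library):

```c
#include <stddef.h>
#include <stdint.h>

/* Writes src as lowercase hex into dst and null-terminates it.
 * Returns 0 if dst cannot hold 2*len characters plus the terminator. */
static int ident_to_hex(const uint8_t *src, size_t len,
                        char *dst, size_t dst_size)
{
    static const char digits[] = "0123456789abcdef";
    if (dst_size == 0 || len > (dst_size - 1) / 2) return 0;
    for (size_t i = 0; i < len; i++) {
        dst[2 * i]     = digits[src[i] >> 4];   /* high nibble */
        dst[2 * i + 1] = digits[src[i] & 0x0F]; /* low nibble */
    }
    dst[2 * len] = '\0';
    return 1;
}
```

A 16-byte UUID comes out as 32 printable characters, which fits comfortably in a 256-byte field.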
[QUOTE=ROBO_DONUT;35431487]My instinct is that people are going to choose just one and stick to it throughout, so having a separate field length for each is a tad redundant, I think.[/QUOTE]
Well you could make the 'size' variable simply equal the length (in bytes) of the field, so that running strlen on it is optional. (precomputed data all up in this :v:)