February 6th, 2008
LinQ
About a year ago I said I would make a post about LinQ. Now that Visual Studio 2008 is out, and LinQ is a released technology, I think I can get around to it. I still haven’t built a project that makes good use of LinQ, but I’ve got a fairly good understanding of it, and I’ve found some interesting ways to use it.
Now, you may be wondering what LinQ is, and what it stands for. The answer is that it stands for Language INtegrated Query. It is an extension to C# and VB.NET that allows you to use a SQL-like syntax for constructing queries against a wide variety of data sources. The most obvious example is XML, but you can also query most of the built-in data structures in the .NET Framework.
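To make the SQL-like syntax concrete, here is a minimal sketch of a query over an ordinary in-memory Dictionary. The class name, method name, and building costs are all made up for illustration; only the query syntax itself is the point.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LinqToObjectsSketch
{
    // Returns the names of buildings costing more than minCost, cheapest first.
    public static List<string> ExpensiveBuildings(Dictionary<string, int> costs, int minCost)
    {
        IEnumerable<string> query =
            from pair in costs
            where pair.Value > minCost
            orderby pair.Value
            select pair.Key;
        return query.ToList();
    }

    static void Main()
    {
        // Made-up sample data, purely for illustration.
        var costs = new Dictionary<string, int>
        {
            { "PowerPlant", 800 },
            { "Barracks", 500 },
            { "WarFactory", 2000 }
        };
        foreach (string name in ExpensiveBuildings(costs, 600))
            Console.WriteLine(name);
    }
}
```

The same from/where/select shape applies whether the source is a collection like this one or a tree of XML elements.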
With that out of the way, on to an example or two of how I’ve managed to use LinQ, and the challenges I’ve run into.
First, a bit about the project that I’m using as a basis. I’m working on creating a data mining tool for the XML files used to modify Command and Conquer 3: Tiberium Wars. As was mentioned in a previous post, I was involved in the development of the Mod SDK. In this case, I’m looking to find ways to extend the SDK in new directions. The idea was inspired by another developer, Petroglyph Games.
For the development of their recent game, Universe at War, they developed an easy-to-use data mining tool for their game data files, which happened to be in XML. You could query things like the costs of all of the buildings belonging to one faction, and the tool would scan all of the game XML files to produce the result.
The idea was to duplicate that, but for the C&C 3 XML files. Although it seems like a relatively straightforward task, there are some stumbling points. The first is how to create a simple interface for creating the query. That’s something that I’m still not sure about, and suggestions would be welcome. One advantage is that the C&C 3 SDK includes a full set of XSD files, which are schemas defining the allowable data in the game XML. Since XSD files are XML themselves, I can use LinQ to run queries against them, and gather information about what might be queried. It wouldn’t be easy, but it’s doable.
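As a rough sketch of what querying a schema could look like, the following lists the attribute names declared inside a named complex type. The schema fragment, type name, and attribute names are hypothetical and not taken from the C&C 3 SDK; real schemas also use references and inherited types that this ignores.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class SchemaQuerySketch
{
    // The standard XML Schema namespace.
    static readonly XNamespace xs = "http://www.w3.org/2001/XMLSchema";

    // Lists the attribute names declared anywhere inside the named complex type.
    public static List<string> AttributeNames(XDocument schema, string typeName)
    {
        return (from t in schema.Descendants(xs + "complexType")
                where (string)t.Attribute("name") == typeName
                from a in t.Descendants(xs + "attribute")
                select (string)a.Attribute("name")).ToList();
    }

    static void Main()
    {
        // Hypothetical schema fragment, for illustration only.
        XDocument schema = XDocument.Parse(@"
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:complexType name='GameObject'>
    <xs:attribute name='id' type='xs:string' />
    <xs:attribute name='BuildCost' type='xs:int' />
  </xs:complexType>
</xs:schema>");
        foreach (string name in AttributeNames(schema, "GameObject"))
            Console.WriteLine(name);
    }
}
```

A list like this could feed a drop-down of queryable fields in the interface.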
The second stumbling point is that LinQ doesn’t support dynamic queries easily. It is certainly possible, but I need to understand how that works a bit better. Even if that turns out to be an issue, there’s always the option of having the program compile C# code at runtime and execute it. So that problem is solved, even if the solution is less than optimal.
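One simple form of dynamic query needs no runtime compilation at all: query syntax is shorthand for chained method calls such as Where and Select, so a query can be assembled step by step from values chosen at runtime. The sketch below assumes the element and attribute names arrive as plain strings (say, from the interface); the names and data are placeholders, and namespaces are ignored for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class DynamicQuerySketch
{
    // Builds a query at runtime from names that would come from the user interface.
    public static IEnumerable<string> Run(XDocument doc, string element,
        string filterAttr, string filterValue, string resultAttr)
    {
        IEnumerable<XElement> query = doc.Descendants(element);

        // An optional filter just chains another Where onto the query.
        if (filterAttr != null)
            query = query.Where(e => (string)e.Attribute(filterAttr) == filterValue);

        // The cast yields null instead of throwing when the attribute is missing.
        return query.Select(e => (string)e.Attribute(resultAttr));
    }

    static void Main()
    {
        // Made-up data; the element and attribute names are placeholders.
        XDocument doc = XDocument.Parse(
            "<Objects>" +
            "<GameObject id='PowerPlant' Side='GDI' BuildCost='800' />" +
            "<GameObject id='HandOfNod' Side='Nod' BuildCost='500' />" +
            "</Objects>");
        foreach (string cost in Run(doc, "GameObject", "Side", "GDI", "BuildCost"))
            Console.WriteLine(cost);
    }
}
```

This only covers queries whose shape is known in advance (filter on one attribute, return another); anything more free-form would still need a different approach, such as the runtime compilation route.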
The next problem is the layout of the C&C 3 XML. The game data is spread out over a very large number of files (Windows says 2,021 files). The files are linked together, however, and each one includes information about all of the files that it references. I’ve actually already got working code for that, so parsing the file hierarchy is not a problem either. However, constructing a meaningful database of data from that is more challenging. My current plan is to have the user select the type of objects they intend to query (such as GameObjects, or ParticleFX, or any other basic type) and then create a “database” of all of the objects of that type to run queries against. It’s not perfect, but it allows me to amortize the cost of a full file scan over all of the searches, and should improve performance for multiple queries.
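The scan-once, query-many idea can be sketched with ToLookup, which groups elements by a key in a single pass. This is only an illustration under simplifying assumptions: it indexes the direct children of each document root by element name, and the documents and element names shown are invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class TypeIndexSketch
{
    // Groups the children of every document root by element name in one pass,
    // so later queries do not have to rescan the files.
    public static ILookup<string, XElement> Build(IEnumerable<XDocument> docs)
    {
        return docs.SelectMany(d => d.Root.Elements())
                   .ToLookup(e => e.Name.LocalName);
    }

    static void Main()
    {
        // Two invented documents standing in for the parsed game files.
        var docs = new List<XDocument>
        {
            XDocument.Parse("<Root><GameObject id='A' /><ParticleFX id='P' /></Root>"),
            XDocument.Parse("<Root><GameObject id='B' /></Root>")
        };
        ILookup<string, XElement> index = Build(docs);

        // Each later query runs against the prebuilt group for one type.
        foreach (XElement e in index["GameObject"])
            Console.WriteLine((string)e.Attribute("id"));
    }
}
```

Asking the lookup for a type that never appeared returns an empty sequence rather than throwing, which keeps the calling code simple.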
Now, all of this has had very little to do with LinQ, and is getting long, so I’ll end with a little code snippet.
// x is the loaded XDocument; ealaAsset is the XNamespace of the game XML.
IEnumerable<string> includes =
    from e in x.Descendants(ealaAsset + "Include")
    where e.Attribute("type").Value == "all"
    select e.Attribute("source").Value;
That’s the main component of my code to traverse the include hierarchy. It is a simple LinQ query that selects the source attributes of all Include tags (in the proper namespace) whose type attribute is “all”. It’s amazingly compact, and gathers exactly what I want. Something similar could be constructed using other XML parsing toolkits, but LinQ makes it fast and easy. Of course, since each file can include other files, I use a recursive algorithm to process all of them, but that’s another blog post.
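Until then, here is one way the recursion might be sketched. It is not the code from the tool: the namespace URI is an assumption, the file names and contents are invented, the loader is a stand-in for reading from disk, and resolving relative paths is left out entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class IncludeWalkSketch
{
    // Assumed namespace URI; substitute whatever the real files declare.
    static readonly XNamespace ealaAsset = "uri:ea.com:eala:asset";

    // Depth-first walk. The 'seen' set stops a file from being processed
    // twice, and also stops infinite recursion if two files include each other.
    public static void Collect(string file, Func<string, XDocument> load, HashSet<string> seen)
    {
        if (!seen.Add(file)) return;

        XDocument x = load(file);
        IEnumerable<string> includes =
            from e in x.Descendants(ealaAsset + "Include")
            where (string)e.Attribute("type") == "all" && e.Attribute("source") != null
            select (string)e.Attribute("source");

        foreach (string source in includes)
            Collect(source, load, seen);
    }

    static void Main()
    {
        // Stand-in for the file system: hypothetical file names mapped to XML text.
        var files = new Dictionary<string, string>
        {
            { "a.xml", "<AssetDeclaration xmlns='uri:ea.com:eala:asset'><Includes>" +
                       "<Include type='all' source='b.xml' />" +
                       "<Include type='reference' source='c.xml' />" +
                       "</Includes></AssetDeclaration>" },
            { "b.xml", "<AssetDeclaration xmlns='uri:ea.com:eala:asset'><Includes>" +
                       "<Include type='all' source='a.xml' />" +
                       "</Includes></AssetDeclaration>" }
        };
        var seen = new HashSet<string>();
        Collect("a.xml", name => XDocument.Parse(files[name]), seen);
        Console.WriteLine(string.Join(", ", seen));
    }
}
```

Casting the attribute to string, rather than reading .Value, avoids a NullReferenceException when an Include tag has no type attribute.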