Speaking of which, Thorsten, what did you think of methods?
I guess it all hinges on the question you want to answer.
* for a particular user, we typically want to know which bottleneck actually determines the performance they see.
* for deciding how to code a feature, we want to see how fast one alternative is compared with the other, i.e. we need to either benchmark the feature in isolation or rig the situation so that the feature being tested is the bottleneck
* for deciding what to focus optimization on, we want to find out what the tradeoff between typical bottlenecks is (usually you can trade performance against memory, for instance)
* for deciding what features to default to, we want to know how a typical computer performs with all settings at default
I guess the most important thing is that you know what your test has measured and what it hasn't measured - once you understand that, it doesn't matter so much what precisely you do. Whereas if you don't understand what your benchmark does, you end up drawing completely wrong conclusions.
For the question at hand, I'm interested in a) whether a typical user can use the aircraft at all, b) how it performs in comparison with an optimized aircraft of similar quality in FDM and visuals, and c) what the cause of the difference is.
For that, I think the 7 fps with the Su-15 vs. the 25 fps for the F-15 at EGOD under identical conditions were most revealing. We see a user who has adjusted settings such that he can fly a high-quality plane at 25 fps, so he evidently isn't limited by the visible scenery - and changing to a plane of similar quality drops the frame rate to an unusable level. It tells us that the plane is very much not optimized in comparison to the F-15.
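To put those two numbers on a common scale, it helps to convert fps into time per frame. The snippet below is only a back-of-the-envelope sketch: it uses the 7 and 25 fps quoted above and assumes (my assumption, not something that was measured) that scenery and everything else cost the same in both runs, so that the whole difference in frame time can be attributed to the aircraft. The function and variable names are made up for illustration.

```python
def frame_time_ms(fps):
    """Convert a frame rate in frames per second to milliseconds per frame."""
    return 1000.0 / fps

# Frame rates reported at EGOD under identical conditions.
f15_ms = frame_time_ms(25)   # time per frame with the F-15
su15_ms = frame_time_ms(7)   # time per frame with the Su-15

# Assumption: everything except the aircraft costs the same in both runs,
# so the difference is the extra per-frame cost of the Su-15 over the F-15.
extra_ms = su15_ms - f15_ms

print(f"F-15:  {f15_ms:.1f} ms per frame")
print(f"Su-15: {su15_ms:.1f} ms per frame")
print(f"Extra cost of the Su-15: {extra_ms:.1f} ms per frame")
```

Under that assumption the Su-15 adds roughly 100 ms to every frame compared with the F-15, which is more than twice the time the entire F-15 frame takes, scenery included.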
We might rig this comparison: for instance, with vegetation density set high enough, the same user would be driven to 3 fps even with the ufo, and the choice of plane wouldn't make a difference. But that is not how this user typically operates FG, so it's not that relevant. We might also study the FDM in isolation, and we'd presumably find that the Su-15 is heavier. However, given that there's no evidence that the FDM leads to a bottleneck anywhere, it doesn't matter whether it's heavier or lighter.
(We may reasonably ask whether it is optimized; for instance, in the Space Shuttle I only run the systems and fcs channels I actually need, which probably makes the FDM half as heavy as it would be if I did not optimize it. But I suspect that in the overall scheme of things this doesn't matter too much, because rendering is much heavier.)
I think we've seen the same theme (a model far too vertex-heavy for the visuals it provides) from a number of angles now, and as I said previously, there's a reason real-time 3d rendering has introduced normal maps and similar techniques - you can construct use cases in which this isn't apparent (a very powerful GPU in combination with choking the system with other settings, for instance), but that doesn't mean it doesn't matter in general.
You can of course argue 'I don't ever want to optimize, I want users to buy expensive hardware' - and since a plane is in a sense an optional feature, that's your choice as a developer. I think it's a pretty poor strategy, but hey - I don't need to fly the plane, right?