
CPU vs GPU Bound (split from other topic)

Graphics issues like bad framerates, weird colors, OpenGL errors etc. Bad graphics are usually the result of bad graphics cards or drivers.
Forum rules
In order to help you, we need to know a lot of information. Make sure to include answers to at least the following questions in your initial post.

- what OS (Windows XP/Vista, Mac etc.) are you running?
- what FlightGear version do you use?
- what graphics card do you have?
- does the problem occur with any aircraft, at any airport?
- is there any output printed to the console (black window)?
- copy & paste your command line (tick the "Show commandline" box on the last page of FGRun, or use the "Others" section on the Mac launcher).
- please upload a screenshot of the problem.

If you experience FlightGear crashes, please report a bug using the issue tracker (it can also be used for feature requests).
To run FlightGear on old computers with bad OpenGL support, please take a look at this wiki article. If you are seeing corrupted/broken textures, please see this article.

Note: If you did not get a response, even after 7 days, you may want to check out the FlightGear mailing lists to ask your question there.

Re: FG using 90%+ on 3600X, GTX1060, 64GB RAM... ? Multicco

Postby vnts » Sun May 24, 2020 10:52 pm

TorstenD (2016) wrote:I just recently got a new laptop with an i7/8GB and a gtx960/4GB.
It is able to run flightgear at 1920x1080 at 30-40fps when flying around
KSFO with the c172p, ALS, all shaders max

A 960 was about my conclusion too [1] for very conservative requirements to run at 1080p with settings turned up really high (but not everything maxed, e.g. trees). I was being conservative: I'd expect something closer to 40-60+ FPS with today's more performance-intensive C172P and effects, with trees turned up but not maxed, but that estimate was for an i5 desktop CPU (around Sandy Bridge, 2012). Torsten's 2016 CPU may be slower, as it is a laptop part, despite being newer technology than a Sandy Bridge i5.

From that thread, legoboyvdlp uses a GTX 920M (laptop) and can run ALS at max shaders. From his screenshots he has trees, maybe cloud density, and LoD settings turned lower, and his laptop's resolution is smaller than 1080p. It goes to show how optimised ALS is on the GPU side.

Website with GPUs sorted by approximate benchmark scores: link.
A 920M scores ~300, and a 960 ~2300, in the current ranking. A GTX 660 from around 2012, as icecode mentions, is ~1300, exactly in the middle of that range.

Very high FG settings and control panel (driver) settings can use up GPU time though. Trees at Ultra (maybe memory related) and overlays (definitely GPU) can suck up performance, and setting transparency anti-aliasing to supersampling makes both of those even slower. GPU drivers these days offer supersampling AA (NVIDIA DSR) to give the GPU something to bottleneck itself on; AIUI presumably because games are designed for the popular but low-performance current consoles, they have no fundamental use for lots of GPU power, and games drive GPU sales/development a fair amount.
Mathias (2012) wrote:Our integration of the particle systems
need to be rethought as this contains geometry with culling disabled which
makes a pagedlod just never expire. Switching the particle systems off works
pretty good so far.

Wondering if this is still the case. I recall wkitty42 having the impression [2] that volcanoes/waterfalls could consume processing power even when out of range; maybe it was just that they didn't expire after he visited them to test(?).
Mathias (2012) wrote:OpenGL wise we are basically geometry setup bound - at least for the
models.
..
That still means that for setting up that one draw with 3 triangles is about
as heavy as setting up say 500 triangles
..
Appart from OpenGL we spend a lot of time in scenegraph traversal.

Interesting. I found out recently that I get 2x+ the FPS, while not being GPU fragment bound, looking at a piece of ground close up or at empty sky (so one effect fills the screen), compared to looking at a scene on the horizon at ENBR. I made sure I wasn't fragment bound by shrinking the window size right down, so it's unlikely to be overdraw of trees etc. Not sure if I was vertex bound, but probably not: it still occurred with trees off, OSMCity off, and all LoD ranges set to <= 3.7 km to reduce vertices, and with shaders turned down to minimum. AI traffic was off. I was using the ufo/video assistant, so results were not aircraft specific. Increasing LoD:bare from 3.7 km to 30 km while looking at the horizon in the same direction reduced FPS a lot. This was on a somewhat modern 4-5 year old K-series i5 (stock clocks). This could be scenegraph traversal related, with OSG taking a shortcut when looking at the ground or sky; not sure how OSG and FG's use of it have changed over the years.

(As a quick band-aid, perhaps(?) there's a non-rendering part of FG that could be moved to a separate thread on systems with enough cores, to free up CPU time without complicated rendering changes.)
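
For illustration, a rough sketch of that pattern (generic C++, not actual FlightGear code; all names are hypothetical): a subsystem runs its heavy update on a worker thread and publishes results through a mutex-guarded copy, so the render/main thread only pays for a cheap read each frame.

Code: Select all
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

struct WeatherSample { double windKts = 0.0; };   // hypothetical payload

class AsyncSubsystem {
public:
    void start() {
        worker_ = std::thread([this] {
            while (running_.load()) {
                WeatherSample s = computeUpdate();   // heavy CPU work, off the render thread
                {
                    std::lock_guard<std::mutex> lock(mtx_);
                    latest_ = s;                     // publish the finished result
                }
                std::this_thread::sleep_for(std::chrono::milliseconds(100));
            }
        });
    }
    void stop() {
        running_.store(false);
        worker_.join();
    }
    // Called from the render/main thread each frame: a quick copy, no stall.
    WeatherSample latest() const {
        std::lock_guard<std::mutex> lock(mtx_);
        return latest_;
    }
private:
    WeatherSample computeUpdate() { return WeatherSample{12.0}; }  // stand-in for real work
    std::atomic<bool> running_{true};
    mutable std::mutex mtx_;
    WeatherSample latest_;
    std::thread worker_;
};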

Kind regards
vnts
 

Re: FG using 90%+ on 3600X, GTX1060, 64GB RAM... ? Multicco

Postby legoboyvdlp » Sun May 24, 2020 11:25 pm

vnts wrote in Sun May 24, 2020 10:52 pm:From that thread, legoboyvdlp uses a GTX 920M (laptop) and can run ALS at max shaders. From his screenshots he has trees, maybe cloud density, and LoD settings turned lower, and his laptop's resolution is smaller than 1080p. It goes to show how optimised ALS is on the GPU side.


This is correct. The only things I don't use are overlay / urban, since they're geometry shaders and bad for performance.

In fact, on my hardware, ALS is faster than default :lol:

Re: CPU vs GPU Bound (split from other topic)

Postby icecode » Mon May 25, 2020 12:31 am

vnts wrote:(As a quick band-aid, perhaps(?) there's a non-rendering part of FG that could be moved to a separate thread on systems with enough cores, to free up CPU time without complicated rendering changes.)


There is some testing in that direction by Richard. See https://gitlab.com/flightgear/flightgea ... ded-viewer (GitLab link because I find it easier to read than SourceForge).

Re: FG using 90%+ on 3600X, GTX1060, 64GB RAM... ? Multicco

Postby amue » Tue May 26, 2020 8:02 am

vnts wrote in Sun May 24, 2020 10:52 pm:I found out recently that I get 2x+ the FPS, while not being GPU fragment bound, looking at a piece of ground close up or empty sky so one effect fills the screen, compared to looking at a scene in the horizon at ENBR.

That's simple view frustum culling. Geometry outside the view frustum isn't drawn.
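
For illustration, a minimal sketch of that CPU-side test (generic code, not OSG's actual implementation): each drawable's bounding sphere is checked against the six frustum planes, and anything fully outside one plane is never submitted for drawing.

Code: Select all
#include <array>

struct Plane  { float a, b, c, d; };   // ax + by + cz + d = 0, normal pointing into the frustum
struct Sphere { float x, y, z, r; };   // bounding sphere of a drawable

bool sphereInFrustum(const std::array<Plane, 6>& frustum, const Sphere& s)
{
    for (const Plane& p : frustum) {
        float dist = p.a * s.x + p.b * s.y + p.c * s.z + p.d;
        if (dist < -s.r)
            return false;   // completely outside one plane: culled on the CPU
    }
    return true;            // inside or intersecting: submitted for drawing
}

OSG runs this kind of test per node during the cull traversal, which is why what's in front of the camera changes CPU time, not just GPU time.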

Re: CPU vs GPU Bound (split from other topic)

Postby Hooray » Tue May 26, 2020 9:01 am

What's really needed is a benchmark suite, because different people have different hardware/systems and use different startup/runtime settings.

Thus, to draw firm conclusions, different combinations of settings must be tested, with performance being sampled in the background.

It would actually be a neat little project to create different rendering profiles, save those to an XML file, and then replay a few fgtape files while sampling performance metrics (frame rate, frame spacing). This could be done with different draw masks set to enable/disable terrain/scenery.


Something like this would be easy to create via Nasal, and the sampled data could be written to a CSV file for plotting purposes (think gnuplot).
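
For illustration, the same sampling idea sketched in C++ rather than Nasal (generic code, invented file/function names): record per-frame spacing and dump a CSV that gnuplot can read directly.

Code: Select all
#include <chrono>
#include <cstddef>
#include <fstream>
#include <vector>

int main()
{
    using clock = std::chrono::steady_clock;
    std::vector<double> spacingMs;
    auto prev = clock::now();

    for (int frame = 0; frame < 1000; ++frame) {
        // renderOneFrame();   // stand-in for the real frame (e.g. an fgtape replay)
        auto now = clock::now();
        spacingMs.push_back(std::chrono::duration<double, std::milli>(now - prev).count());
        prev = now;
    }

    // One row per frame; plot the second column against the first in gnuplot.
    std::ofstream csv("frame_spacing.csv");
    csv << "frame,spacing_ms\n";
    for (std::size_t i = 0; i < spacingMs.size(); ++i)
        csv << i << ',' << spacingMs[i] << '\n';
    return 0;
}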

Alternatively, we could also create a dedicated "Benchmark" add-on and use the Canvas system for plotting.

http://wiki.flightgear.org/How_to_manip ... s_elements

All the building blocks are already in place:

http://wiki.flightgear.org/Graphics_card_profiles
http://wiki.flightgear.org/Flight_Recorder

Thus, the first step would be to copy the flight recorder dialog, remove unnecessary stuff, and add a combo box to load rendering settings from an XML file.

We could then grow a library of such benchmarks for different parts of FlightGear (aircraft, scenery).

For plotting purposes, we can either write all sampled data to a file in $FG_HOME or show the results via the Canvas.

Re: CPU vs GPU Bound (split from other topic)

Postby vnts » Tue May 26, 2020 3:52 pm

@icecode: Thanks
amue wrote in Tue May 26, 2020 8:02 am:
vnts wrote in Sun May 24, 2020 10:52 pm:I found out recently that I get 2x+ the FPS, while not being GPU fragment bound, looking at a piece of ground close up or at empty sky (so one effect fills the screen), compared to looking at a scene on the horizon at ENBR.

That's simple view frustum culling. Geometry outside the view frustum isn't drawn.

Whatever it is, it has to be something done on the CPU, e.g. by OSG. I don't mean the GPU rejecting geometry outside the frustum after vertices have been transformed: since fragments were minimal and vertex work should have been negligible, the difference in FPS had to be CPU-side. I thought it was likely OSG scene traversal related, with traversal skipped for nodes outside the frustum based on an octree lookup or whatever spatially-based data structure OSG may use.
Mathias (2012) wrote:This rather means that the geometry setup/state
change time - the yellow one - must decrease!

Out of curiosity, I quickly downloaded an old version of gDEBugger (it seems AMD bought it and deleted the old NVIDIA-compatible versions) and tried profiling. I'll possibly look for a more up-to-date tool (like NVIDIA Nsight; edit: it may not support older OpenGL).

Redundant state changes (out of ~24k calls): https://i.imgur.com/mKseTlD.png

It seems a bunch of state changes are made even though the state is already correct.
glMaterialf/glMaterialfv calls seem to be a big culprit, with ~3.1k out of ~24k calls.

If FG-side code makes those calls, I guess some way of tracking state in RAM and checking before calling OpenGL might help performance. If it's OSG-side, the better solution might be to ask OSG to do the tracking, unless upstream won't maintain the old OSG version FG uses (OSG should have such support, as I thought one of its goals was to reduce state changes overall). If it's not OSG-side, maybe replacing the OpenGL calls with FG's own functions that check against tracked state first might be a quick hack(?). Tracking state everywhere might be hard; instead, querying and updating state via FG's functions at the start of a block with lots of redundant calls might give a noticeable improvement. State changes do make up a reasonable share of OpenGL calls: [35%] in the profile run. But I'm not sure these calls are a big enough performance hit to be worthwhile compared to easy gains elsewhere.
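
For illustration, a rough sketch of what such a checking wrapper could look like (hypothetical names, not actual FG or OSG code): cache the last value sent for each (face, pname) pair and skip the driver call when nothing changed.

Code: Select all
#include <GL/gl.h>
#include <array>
#include <cstddef>
#include <cstring>
#include <map>
#include <utility>

// Shadow copy of material state already sent to the driver.
static std::map<std::pair<GLenum, GLenum>, std::array<GLfloat, 4>> g_materialCache;

void cachedMaterialfv(GLenum face, GLenum pname, const GLfloat* params)
{
    // GL_SHININESS carries one float; the other material parameters carry four.
    const std::size_t count = (pname == GL_SHININESS) ? 1 : 4;

    std::array<GLfloat, 4> value{};   // zero-padded so comparison is well defined
    std::memcpy(value.data(), params, count * sizeof(GLfloat));

    const auto key = std::make_pair(face, pname);
    auto it = g_materialCache.find(key);
    if (it != g_materialCache.end() && it->second == value)
        return;                        // redundant call: skip the driver round trip

    g_materialCache[key] = value;
    glMaterialfv(face, pname, params); // state actually changed
}

A real cache would also have to handle GL_FRONT_AND_BACK aliasing GL_FRONT and GL_BACK, and be reset whenever something else touches the state behind its back.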
Mathias (2012) wrote:That still means that for setting up that one draw with 3 triangles is about
as heavy as setting up say 500 triangles

There seem to be a lot of vertex batches (draw calls) with 1-12 vertices at ENBR: https://i.imgur.com/mKseTlD.png (not sure what is being drawn in these batches).
V12 wrote in Sat May 23, 2020 5:44 am:But in P3D was results very different, i7 @5GHz was significant faster than my R7 @4.2 GHz, i3 was too weak for serious flying with PMDG 737 or FSL A320.

It's hard(?) to compare bottlenecks between FG and another sim, as FG does a lot more things on both the CPU side and the GPU side.
For example: terrain lookups for weather, weather simulation, an optimized 120 Hz JSBSim simulation with a lot more detail, etc. GPU-side, FG models lots of things in its light simulation and creates content with GPU time, like procedural texturing & overlays. The GPU side is just extremely optimised, with custom rendering strategies; if another sim did those things at its average level of optimisation, that would affect its bottlenecks. There are also some parts of FG that are old/unoptimised. The craft's own simulation and rendering can also interfere with bottlenecks, and may be optimised to a different standard than the rest of the sim; the best comparison is to use the other sim's equivalent of the ufo.

How far were you asking FG to render terrain? 250 km(?), as you mentioned in the scenery forum. What about the other sim's settings, if its renderer will even show terrain that far away, rather than not rendering it and using fog to cover the evidence?

If LoD rough is making a large difference, then performance and CPU/GPU bottlenecks may change when LoD rough terrain is brought in.

As Hooray mentioned, it is hard to determine without benchmarks. Even then, some specific part of the GPU, or maybe RAM/CPU/disk I/O, is the limiting factor in a given situation, and that can change in different situations. There's also interference in benchmarks from Intel CPU turbo boost, GPU/CPU thermal throttling/power management, and even (AIUI) exotic effects that don't apply to FG yet, like the CPU reducing clock speeds when it sees AVX instructions.

@wkitty42: if LoD:rough is set high, that may be a reason why you are GPU bottlenecked (although an older-generation GPU won't help). Richard's work may help fix things, as GPU utilisation can stay below 100% even on older GPUs when there are long pauses while the CPU sets up the scene.

Kind regards
vnts

Re: CPU vs GPU Bound (split from other topic)

Postby Hooray » Tue May 26, 2020 4:17 pm


That's pretty good research!

A number of senior developers active in the rendering department have repeatedly pointed out that our scene graph is not particularly optimized. Mathias in particular noted on several occasions that he had to put up with a ton of legacy code and used generic wrappers to make things work well enough, and that our scene graph may contain a ton of redundant state under some circumstances.

Note that you can use the debug menu to dump the scene graph to disk and inspect it with fgviewer/osgviewer respectively. We also have some topics covering how to run the OSG optimizer on it, which is what you would want to use for sampling purposes too (stats/metrics).

For instance, see: viewtopic.php?f=4&t=31502&p=304138&#p303926

The important thing at this point is to keep things really simple, i.e. use the aforementioned "draw masks" to selectively toggle scene features on/off and see what impact that has (frame rate, frame spacing, OSG stats).

I would suggest starting with a really simple aircraft in a location without any scenery, and then recording an fgtape that you can easily replay with different combinations of settings.

Once that is in place, you can easily try different draw masks / rendering settings; once you find something interesting, a system-wide profiler (oprofile) and an OpenGL debugger can be really helpful.

I have previously also dumped a scene graph to disk (DEBUG menu) and viewed the file using osgviewer.

But before you even look at complex aircraft (think Shuttle) or complex scenery/terrain (think LOWI), it's really a good idea to establish a baseline first and then take it from there.

There's quite a bit of progress to be made by simply taking an existing aircraft like the ogel, flying a flight plan or recording a flight, and then using a bash script to repeat this with different settings over and over again, while sampling frame rate and frame spacing.

We can also track RAM/VRAM and CPU utilization that way, to know if/when a profiler/debugger might help.

But the first step really is coming up with a set of flights so that different rendering settings can be tried.

The next step would be to use different aircraft and aircraft views (think complex cockpits), and possibly different airports/locations.

Setting this up for even just a single aircraft will make it much easier for people to tinker with other configurations.

We could then also get this added to the Phoronix Test Suite and/or the open benchmark sites:

https://www.phoronix.com/scan.php?page= ... st%20Suite
https://openbenchmarking.org/
http://www.opengamebenchmarks.org/

Once we have the tooling in place, such benchmarks can be executed automatically in a cluster (or even partly run on a build server instance).

Re: CPU vs GPU Bound (split from other topic)

Postby vnts » Thu May 28, 2020 4:33 pm

Some info on the standalone graphics profilers that don't need compiling from source.

gDEBugger (quick look):
- AIUI you have to download v6 from archive.org (page link, and download link). There is a Linux version. It seems AMD bought gDEBugger and then took the old versions offline, as they supported reading NVIDIA-specific GPU performance counters.
- Create a project: bin/fgfs.exe, with --launcher and any other options used as the command line. Working directory is the same as the shortcut's (the one /bin is in).
- 3 modes. Each mode has different functions available; see the menu and shortcut buttons. The modes cause different amounts of slowdown; switch to the faster modes to fiddle with FG in-sim settings and change locations.
- Menu bar: Play to launch/resume, Pause to halt and examine data/stats for the duration of the run. There are buttons to play one frame and stop, or to progress to the next draw call.
- Profile mode: fastest / fewest features. Right-click on graphs to add counters. Add counters for different rendering contexts: no. 5 seems to be the main rendering one. There are GPU-specific counters, but they need 'NVInstEnabler.exe' to switch driver instrumentation on.
- Debug mode has menu-bar buttons to give different GPU stages trivial workloads, to see if the frame rate jumps up because that stage was a bottleneck. It shows OpenGL call stats & history, and resources like textures (with IDs) and GLSL. Analyze mode can show redundant state changes & call stacks.
- 'NVInstEnabler.exe' from NVIDIA PerfKit is needed to enable reading internal GPU counters from the driver. The publicly available PerfKit only supports up to 900-series (Maxwell) GPUs. PerfWorks is the new replacement; it supports 1000-series and later, but doesn't seem to be publicly available. I haven't got internal counters working with gDEBugger yet.

NVIDIA Nsight Graphics:
- Just glanced at it. It only supports the OpenGL 4.5 core profile API [2]: programs that only use API calls still present in the 4.5 core profile can be profiled. FG currently uses interface calls that were removed, so it doesn't seem to work; starting the analysis interface via the frame debugger/profiler/trace fails.
- According to the wiki [1], next-gen WS3.0 scenery using VPB needs the OpenGL 3.3 core profile. If the VPB route is taken, maybe it's possible to use only OpenGL interface elements that still work in 4.5 core and still do everything needed for WS3.0; not sure (that would also make it easier to port to a higher core profile later).
- However, the internal performance counter reporting works just enough to view counters in a graph. It works with 10-series and later GPUs. Each version of Nsight needs a minimum driver version, so you may need to check/update.
- Nsight is also much faster than gDEBugger, even in profile mode.
Hooray wrote:using the aforementioned "draw masks" to selectively toggle scene features on/off and see how that has an impact

There seem to be 4 draw masks in sim/rendering/drawmasks, for terrain (& objects on it), clouds, aircraft, and models.
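
For illustration, flipping one of these from C++ via the property tree might look like the sketch below (the property path is taken from above and may not match the real tree layout):

Code: Select all
#include <simgear/props/props.hxx>   // SGPropertyNode

// 'root' would be FlightGear's global property tree root.
void setCloudsDrawMask(SGPropertyNode* root, bool visible)
{
    // getNode(path, true) creates the node if it doesn't exist yet; the
    // rendering code would react to this value and skip the clouds branch.
    root->getNode("sim/rendering/drawmasks/clouds", true)->setBoolValue(visible);
}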

Not sure how FG implements draw masks, i.e. whether they just skip over the objects at the OSG level and also remove the associated driver CPU time and bus/memory traffic. The graphics profilers have options to kill all drawables while seemingly preserving CPU/bus load; results depend on how late these changes happen, probably late in the driver or similar. The performance counters that show utilisation/bottlenecks in detail at various stages of the GPU pipeline need GPU-vendor-specific APIs, with switches to enable talking to the driver, for NVIDIA at least. GPU profilers do seem to have brute-force options to give individual pipeline stages a trivial load to see where the bottlenecks are, as FPS will jump when a bottleneck is removed.

I looked at draw masks a bit to see if clouds were responsible for some of the small vertex batches. They were.

Everything turned on, using the UFO, fixed view. Due to earlier experimenting I had detailed = 1.7 km, rough = 7 km, bare = 33 km. All shaders maxed, except overlays off and the urban shader off.

All results - single frame stats: https://imgur.com/a/CLY2jiR

Full scene

Clouds only. Terrain off & objects on terrain off:

Terrain & objects only. Clouds off.

Clouds seem to be responsible for a substantial share of the redundant state changes. As a quick fix, maybe some state could be read just before drawing clouds, and calls replaced with a function that checks before making API calls.

Clouds in this scene seem to use batches of 1-40 vertices for almost everything. Clouds alone (terrain off) did 6k calls per frame; terrain and objects with clouds off did 4k per frame.

---------

Frame spacing is not smooth at 40-60 FPS, whether throttled or not. Turning especially seems to make it worse, at least on my system.

Settings turned up. Default throttle of 50 FPS. Using Nsight Graphics, as gDEBugger has a lot of FPS/freezing impact.

Repeatable test:
- UFO. ENBR runway. High settings. The surrounding scenery varies in the amount of visible things: sea & islands to one side.
- NVIDIA control panel settings, to see frame spacing issues better: maximum pre-rendered frames set to 1, vsync off. Triple buffering was also off, to see output without delay.
- Gain a bit of altitude. Stationary. Bank to turn at a steady rate around one spot.
- Go to the render menu and move it to the far right side to clear the view. Click the throttle-FPS box on and off to see the difference in smoothness (if normal FPS is much higher). You can reduce the window size to get higher FPS if GPU bound. This should be enough to see subtle frame spacing issues.

Results:
- The lack of smoothness decreases with increasing throttle FPS
- The lack of smoothness seems to happen even without the throttle, when FG settings/driver overrides are turned up high enough to bring FPS into the throttled range, so it doesn't seem to be throttle specific
- There is a subtle lack of smoothness even at high FPS (as expected)
- Turning especially causes spikes
- Same results when all scenery objects like trees and buildings were switched off, so geometry was at a minimum, and the same after all shaders were then set to minimum. This was really subtle due to the high FPS.

This could just be slowdown from large amounts of data being moved between RAM and VRAM (only a small portion of my VRAM is used, 1-2 GB), or differences in scene graph traversal in various directions causing spikes in CPU time (as discussed before), or maybe disk I/O loading scenery somehow blocking the rendering thread(?), or something else; or a combination of all of these.

The issue is that, if other people see this lack of smoothness as well, the problems occur in the common 0-60 FPS range. The maximum pre-rendered frames setting may help smoothness.

Kind regards
vnts
 

Re: CPU vs GPU Bound (split from other topic)

Postby Hooray » Thu May 28, 2020 4:45 pm

Frame spacing spikes are usually attributed to Nasal memory management (GC, see the wiki for details); however, Richard has implemented a new/experimental scheme to improve the situation (hopefully).
Draw masks are conventional OSG switch nodes (google for details), so they're basically nodes to be skipped.
Additional draw masks can be easily added by looking at existing ones and adapting those, e.g. see: http://wiki.flightgear.org/Canvas_Troub ... for_Canvas
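
For illustration, a minimal sketch of the switch-node idea (generic OSG code, not FlightGear's actual draw-mask implementation):

Code: Select all
#include <osg/Switch>
#include <osg/ref_ptr>

// Wrap a scene branch (e.g. the clouds root) in a switch node.
osg::ref_ptr<osg::Switch> wrapInDrawMask(osg::Node* branch, bool initiallyVisible)
{
    osg::ref_ptr<osg::Switch> mask = new osg::Switch;
    mask->addChild(branch, initiallyVisible);   // the bool sets the child's on/off state
    return mask;
}

// Later, e.g. from a property listener:
//   mask->setValue(0, false);   // child 0 is skipped during the cull traversal
//   mask->setValue(0, true);    // child 0 is drawn again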


Generally speaking, it makes sense to come up with a tiny test case to determine each system's baseline, and then take it from there by incrementally adding more options/features while monitoring how performance behaves over time.

http://wiki.flightgear.org/Minimal_Startup_Profile
