bugman wrote in Fri Jan 31, 2020 3:27 pm:@Hooray: I've looked at such things with the test suite (in a number of my branches). The issue though is that you could spend a lot of time implementing the private property tree and synchronization method, only to find out that the data transfer between threads is a bottleneck that causes things to run slower than without threads. This issue is quite important and causes developers to battle through many failed threading attempts (e.g. In one of my RL projects, I attempted 4 different parallelizations with different techniques and granularity, and all were much slower than serial execution, so I had to abandon this effort after a year of work).
Plus this is more interesting for developers - what users want is higher framerates, which threading the CPU in most cases will not do.
Right, that is the "data flow/dependencies" part mentioned previously - it only makes sense to look at factoring out certain subsystems, i.e. those that are primarily about properties and that don't do much else in terms of calling out to other subsystems/APIs.
In other words, the power of the "private property tree" only comes into play if it can be trivially implemented by getting rid of the hard-coded property tree assumption, and using a private/local tree that is specific to the owning subsystem instance. For instance, having a private property tree for the autopilot, the AI system or the canvas is relatively straightforward, because these systems are already primarily property-driven, i.e. properties are what they use as their inputs - and ideally, the output of the subsystem should also be just/primarily "properties", too. That applies to things like the FDM, the autopilot, the route manager - and the AI traffic system (because it's basically just a multiplexer on top of a pseudo-FDM/autopilot and route manager).
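To illustrate the idea, here's a minimal C++ sketch of the fetch/compute/publish pattern - note that this uses a toy flat map as a stand-in for the hierarchical SGPropertyNode tree, and the subsystem, property paths and gain value are all made up for illustration, not taken from the actual FlightGear code:

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for a property tree; the real SGPropertyNode is hierarchical,
// but a flat path->value map is enough to show the copy-in/copy-out idea.
using PropertyTree = std::map<std::string, double>;

// Hypothetical property-driven subsystem: all inputs and outputs are
// properties, so it can run its computation against a private tree.
class AutopilotLike {
public:
    // Copy the inputs this subsystem declares from the global tree.
    void fetchInputs(const PropertyTree& global) {
        for (const char* p : {"/position/altitude-ft",
                              "/autopilot/target-altitude-ft"})
            priv_[p] = global.count(p) ? global.at(p) : 0.0;
    }
    // Compute entirely against the private tree - no global state touched.
    void update() {
        priv_["/autopilot/pitch-cmd"] =
            (priv_["/autopilot/target-altitude-ft"] -
             priv_["/position/altitude-ft"]) * 0.001; // made-up gain
    }
    // Copy the declared outputs back into the global tree.
    void publishOutputs(PropertyTree& global) const {
        global["/autopilot/pitch-cmd"] = priv_.at("/autopilot/pitch-cmd");
    }
private:
    PropertyTree priv_; // the subsystem's private/local property tree
};
```

Because update() only ever touches the private tree, it no longer matters where or when it runs - which is exactly what makes such subsystems candidates for worker threads or even separate processes.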
When it comes to the canvas, the output is not "just a property (tree)" but instead a whole texture - and that in itself makes little sense to create in another process (unless you use shared memory or some other IPC to access it across process boundaries) - the thing is, all the fetching, update and texture re-creation can be done asynchronously, and that is something that OSG is pretty good at.
Given what we currently have in terms of canvas functionality, most canvas textures could be created asynchronously - we're only just about in the process of adding more complex canvas elements, that add to the complexity - e.g. texture elements/parts that render scene elements or sub-cameras.
Also, like you said, end-users care primarily about frame rate - but the other thing you forgot to mention is frame spacing, i.e. the latency needed to build a single frame, and doing that at 30+ Hz. In other words, it does make sense to free up the main loop so that we end up with a tight loop that primarily fetches needed resources (think textures or property state) from other subsystems that may not necessarily run inside the main loop.
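The difference between the two metrics is easy to demonstrate: two frame sequences can have the same average frame rate while one of them stutters badly. A small self-contained sketch (the helper names are mine, not FlightGear's):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Average frame rate over a run of per-frame durations (in milliseconds).
double averageFps(const std::vector<double>& frameMs) {
    double total = 0.0;
    for (double ms : frameMs) total += ms;
    return 1000.0 * frameMs.size() / total;
}

// Worst-case frame spacing: the longest single frame, which is what the
// user actually perceives as a stutter.
double worstSpacingMs(const std::vector<double>& frameMs) {
    return *std::max_element(frameMs.begin(), frameMs.end());
}
```

For example, {33, 33, 33} and {10, 10, 79} both average roughly 30 fps, but their worst-case spacing is 33 ms versus 79 ms - only the first feels smooth.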
The neat thing about Erik's idea of supporting "remote properties", or David Megginson's idea of using subsystem-specific SGPropertyNode _root nodes, is that this setup scales pretty well without requiring a complete overhaul of FlightGear - that is, as long as you primarily deal with property-based subsystems that mainly use properties for their input/output needs.
Like I said, this applies to the property tree recorder subsystem (fgtape), but also to things like AI traffic. To see how much of an impact these systems have on FlightGear at runtime, you can simply disable/remove them completely from the main loop - you'll be surprised to see that you end up with a runtime profile with a rather competitive framerate and frame spacing.
And like you said, there are many other subsystems whose I/O needs are unfortunately not as straightforward, but there's at least a handful of subsystems that could be moved to worker threads by going down that path, i.e. by using subsystem-specific property trees for computation purposes, so that inputs are fetched from the main/root (global) tree and outputs are copied back into it after the computation has finished.
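Sketched as actual threading, the pattern could look like this - again a toy example, with a mutex-guarded map standing in for the global property tree; the locking granularity, paths and "subsystem" are assumptions for illustration, not FlightGear's actual threading API:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>

using PropertyTree = std::map<std::string, double>;

// Global tree plus the lock protecting it.
struct SharedTree {
    PropertyTree tree;
    std::mutex mtx;
};

// One worker-thread update cycle for a hypothetical property-driven subsystem:
// snapshot inputs under the lock, compute against a private copy with the
// lock released, then merge the outputs back under the lock.
void workerUpdate(SharedTree& global) {
    PropertyTree priv;
    {   // copy-in: snapshot only the inputs this subsystem needs
        std::lock_guard<std::mutex> lock(global.mtx);
        priv["/ai/traffic-count"] = global.tree["/ai/traffic-count"];
    }
    // compute against the private tree, outside the lock (made-up formula)
    priv["/ai/load"] = priv["/ai/traffic-count"] * 0.5;
    {   // copy-out: publish the results back to the global tree
        std::lock_guard<std::mutex> lock(global.mtx);
        global.tree["/ai/load"] = priv["/ai/load"];
    }
}
```

The main loop only ever pays for the two short locked copies, never for the computation itself - which is the whole point of the exercise, and also where bugman's warning applies: if the copy-in/copy-out cost dominates the computation, threading loses.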
The added benefit is that you end up with a design that can also be distributed, i.e. multi-instance setups can share certain subsystems (think multiplayer, or multi-machine setups like those commonly shown at FSWeekend/Linuxtag).
There is the remaining issue that our way of adding scripting functionality to the simulator doesn't scale at all: whatever is currently added to the sim via Nasal isn't easily parallelized, let alone distributed, because of the free-form nature of the contributions that Nasal supports, on top of fragile concepts like timers and listeners.
One of the first steps would be examining how much is to be gained from disabling subsystems entirely, comparing frame rate/spacing accordingly.
The next step would be sampling property I/O at the subsystem level, which is unfortunately not as straightforward - but at least for subsystems that use their own dedicated branches in the global tree, this can be done easily by tracking reads/writes to that location (think multiplayer, /ai or /canvas, /logging etc.)
Thinking about it, that would actually be a worthwhile addition to the built-in performance monitor: sampling property I/O per subsystem over time, i.e. in terms of node reads/writes, possibly in conjunction with listener/timer requests handled per unit of time.
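For what it's worth, the per-branch counting part is cheap to prototype. A toy sketch (again a flat map instead of SGPropertyNode, and all names are mine) that tallies reads/writes per top-level branch, which is exactly the per-subsystem I/O sampling described above:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical instrumented property tree: every read/write is counted
// against the top-level branch of the accessed path (e.g. /ai, /canvas),
// so per-subsystem I/O can be sampled over time.
class CountingTree {
public:
    double read(const std::string& path) {
        ++reads_[branchOf(path)];
        return values_[path];
    }
    void write(const std::string& path, double v) {
        ++writes_[branchOf(path)];
        values_[path] = v;
    }
    int readsFor(const std::string& branch) const {
        auto it = reads_.find(branch);
        return it == reads_.end() ? 0 : it->second;
    }
    int writesFor(const std::string& branch) const {
        auto it = writes_.find(branch);
        return it == writes_.end() ? 0 : it->second;
    }
private:
    // "/ai/models/count" -> "/ai"
    static std::string branchOf(const std::string& path) {
        auto pos = path.find('/', 1);
        return pos == std::string::npos ? path : path.substr(0, pos);
    }
    std::map<std::string, double> values_;
    std::map<std::string, int> reads_, writes_;
};
```

A periodic reset of the counters would turn this into the "per time unit" sampling a performance monitor would need; hooking the counting into the real getter/setter paths of SGPropertyNode is the actual work.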