Discussion:
Optimised, high-performance, multi-threaded rendering pipeline
Felix Bembrick
2016-11-10 18:11:21 UTC
(Thanks to Kevin for lifting my "awaiting moderation" impasse).

So, with all the recent discussions regarding the great contribution by
Laurent Bourgès of MarlinFX, it was suggested that a separate thread be
started to discuss parallelisation of the JavaFX rendering pipeline in
general.

As has been correctly pointed out, converting or modifying the existing
rendering pipeline into a fully multi-threaded and performant beast is
indeed quite a complex task.

But that's exactly what my colleagues and I have been working on for
about 2 years.

The result is what we call the Hyper Rendering Pipeline (HPR).

Work on HPR started when we developed FXMark and were (bitterly)
disappointed with the performance of the JavaFX scene graph. Many JavaFX
developers have blogged about the need to dramatically minimise the number
of nodes (especially on embedded devices) in order to achieve even
"acceptable" performance. Often it is the case that most (if not all)
rendering is eventually done in a single Canvas node.

Now, as we all already know, the JavaFX Canvas does perform very well and the
recent awesome work (DemoFX) by Chris Newland, just for example, shows what
can be done with this one node.

But, the majority of the animation plumbing in JavaFX is related to the
scene graph itself and is designed to make use of multiple nodes and node
types. At the moment, the performance of this scene graph is the Achilles'
heel of JavaFX (or at least one of them).

Enter HPR.

I personally have worked with a number of hardware-accelerated toolkits
over the years and am astounded by just how sluggish the rendering pipeline
for JavaFX is. When I am animating just a couple of hundred nodes using
JavaFX and transitions, I am lucky to get more than about 30 FPS, but on
the same (very powerful) machine, I can use other toolkits to render
thousands of "objects" and achieve frame rates well over 1000 FPS.

So, we refactored the entire scene graph rendering pipeline with the
following goals and principles:

1. It is written using JavaFX 9 and Java 9 (but could theoretically be
back-ported to JavaFX 8 though I see no reason to).

2. We analysed how other toolkits had optimised their own rendering
pipelines (especially Qt which has made some significant advances in this
area in recent years). We also analysed recent examples of multi-threaded
rendering using the new Vulkan API.

3. We carefully analysed and determined which parts of the pipeline should
best utilise the CPU and which parts should best utilise the GPU.

4. For those parts most suited to the CPU, we use the advanced concurrency
features of Java 8/9 to maximise parallelisation and throughput by
utilising multiple cores & threads in as efficient a manner as possible.

5. We devoted a large amount of time to optimising the "communication"
between the CPU and GPU to be far less "chatty" and this alone led to some
huge performance gains.

6. We also looked at the structure of the scene graph itself and after
studying products such as OpenSceneGraph, we refactored the JavaFX scene
graph in such a way that it lends itself to optimised rendering much more
easily.

7. This is clearly not a "small" patch. In fact, to refer to it as a
"patch" is probably rather inappropriate.
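
To make points 4 and 5 a little more concrete, here is a minimal sketch of
the general idea: do the CPU-bound per-node work in parallel, then hand the
results over in one batch instead of one call per node. This is NOT HPR code
(none has been posted in this thread). SketchNode, DrawCommand, prepareAll
and submitBatch are hypothetical names invented for the illustration, and
submitBatch only stands in for a real GPU submission.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-in for a scene graph node; not a JavaFX class. */
final class SketchNode {
    final int id;
    final double x, y, angle;

    SketchNode(int id, double x, double y, double angle) {
        this.id = id;
        this.x = x;
        this.y = y;
        this.angle = angle;
    }
}

/** Hypothetical immutable draw command computed on the CPU for one node. */
final class DrawCommand {
    final int nodeId;
    final double tx, ty, cos, sin;

    DrawCommand(int nodeId, double tx, double ty, double cos, double sin) {
        this.nodeId = nodeId;
        this.tx = tx;
        this.ty = ty;
        this.cos = cos;
        this.sin = sin;
    }
}

final class ParallelPrepareSketch {

    /** CPU-bound stage: a pure function of one node, so any thread may run it. */
    static DrawCommand prepare(SketchNode n) {
        return new DrawCommand(n.id, n.x, n.y, Math.cos(n.angle), Math.sin(n.angle));
    }

    /** Prepares all nodes on the common fork/join pool; output order matches input order. */
    static List<DrawCommand> prepareAll(List<SketchNode> nodes) {
        return nodes.parallelStream()
                    .map(ParallelPrepareSketch::prepare)
                    .collect(Collectors.toList());
    }

    /** Stand-in for a single batched hand-off to the GPU; returns the number of commands. */
    static int submitBatch(List<DrawCommand> batch) {
        return batch.size();
    }

    public static void main(String[] args) {
        List<SketchNode> nodes = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            nodes.add(new SketchNode(i, i, 2.0 * i, i * 0.01));
        }
        List<DrawCommand> commands = prepareAll(nodes); // parallel CPU work
        int submitted = submitBatch(commands);          // one hand-off, not 1000
        System.out.println("submitted " + submitted + " commands in one batch");
    }
}
```

Keeping the per-node stage free of shared mutable state is what lets a
parallel stream run it safely. Whether this pays off for a given scene
depends on how expensive the per-node work really is.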

The end result is that we now have a fully-functional prototype of HPR and,
already, we are seeing very significant performance improvements.

At the minimum, scene graph rendering performance has improved by 500% and,
with judicious and sometimes "tricky" use of caching, we have seen
improvements in performance of 10x or more.

And... we are only just *starting* with the performance optimisation phase.

The potential for HPR is massive as it opens up the possibility for the
JavaFX scene graph and the animation/transition infrastructure to be used
for a whole new class of applications including games, advanced
visualisations etc., without having to rely on imperative programming of a
single Canvas node.

I believe that HPR, along with tremendous recent developments like JPro and
the outstanding work by Gluon on mobiles and embedded devices, could
position JavaFX to be the best graphics toolkit of any kind in any language
and be the ONLY *truly* cross-platform graphics technology available.

WORA for graphics and UIs is finally within reach!

Blessings,

Felix
Tobi
2016-11-11 10:27:27 UTC
Hi,

Thanks Felix, Laurent and Chris for sharing your work with the community!

I am happy to see a discussion starting about boosting JavaFX rendering performance. I can confirm that the performance of the JavaFX scene graph is not where it should be. So multithreading would be an excellent, but difficult, approach.

Felix, concerning your research into other toolkits: do they all use multithreading, or are there any toolkits which are single-threaded but still faster than JavaFX?

So maybe there are areas other than multithreading where we can boost performance?

2) Your HPR sounds great. Have you already tried the DemoFX (part 3) benchmark with HPR?


Best regards,
Tobi
Laurent Bourgès
2016-11-11 10:55:54 UTC
Hi,

To optimize Pisces, which became the Marlin rasterizer, I carefully avoided
any array allocation (by using byte/int/float array pools) and also reduced
array copies and clean-up, i.e. only the dirty parts are cleared.

This approach is generic and could be applied in other critical places of
the rendering pipelines.
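
A minimal sketch of that idea, assuming a fixed array length and
single-threaded use: arrays are recycled through a pool, and on release only
the range the caller actually wrote to is reset. This is an illustration
only, not Marlin's actual code. The class IntArrayPoolSketch and its methods
are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/**
 * Hypothetical fixed-size int[] pool. Not thread-safe: intended to be owned
 * by a single rendering thread.
 */
final class IntArrayPoolSketch {
    private final ArrayDeque<int[]> free = new ArrayDeque<>();
    private final int length;
    private int allocations;

    IntArrayPoolSketch(int length) {
        this.length = length;
    }

    /** Returns an all-zero array, reusing a pooled one when available. */
    int[] acquire() {
        int[] a = free.pollFirst();
        if (a == null) {
            a = new int[length]; // only allocate when the pool is empty
            allocations++;
        }
        return a;
    }

    /**
     * Returns the array to the pool, clearing only [dirtyFrom, dirtyTo),
     * the range the caller reports having written to.
     */
    void release(int[] a, int dirtyFrom, int dirtyTo) {
        Arrays.fill(a, dirtyFrom, dirtyTo, 0);
        free.addFirst(a);
    }

    /** Number of arrays actually allocated so far. */
    int allocations() {
        return allocations;
    }

    public static void main(String[] args) {
        IntArrayPoolSketch pool = new IntArrayPoolSketch(1024);
        for (int frame = 0; frame < 3; frame++) {
            int[] buf = pool.acquire();
            buf[10] = 7;               // touch a small part of the buffer
            buf[11] = 9;
            pool.release(buf, 10, 12); // clear 2 ints instead of 1024
        }
        System.out.println("allocations: " + pool.allocations()); // prints "allocations: 1"
    }
}
```

The contract is that the caller tracks its own dirty range. If it
under-reports, stale data leaks into the next user of the array, so this
trades some safety for speed.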

FYI, here are my FOSDEM 2016 slides on the Marlin renderer:
https://bourgesl.github.io/fosdem-2016/slides/fosdem-2016-Marlin.pdf

Of course I would be happy to share my experience and work with a tiger
team on optimizing JavaFX graphics.

However, I would like to get some sort of sponsorship for my potential
contributions...

Cheers,
Laurent
Felix Bembrick
2016-11-11 11:08:35 UTC
Thanks Laurent,

That's another thing we discovered: using Java itself in the most performant way can help a lot.

It can be tricky, but profiling can often highlight patterns of object instantiation that raise red flags and can lead you directly to regions of the code that can be refactored to be significantly more efficient.

Also, the often-overlooked analysis of GC logs can lead to similar discoveries and remedies.
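
GC logs themselves are enabled with JVM flags (for example
-XX:+PrintGCDetails on Java 8, or -Xlog:gc* with the unified logging in
Java 9). As a complementary, in-process check, the standard
GarbageCollectorMXBean counters can show how many collections a piece of
code triggers. The sketch below uses that different technique rather than
log parsing. GcProbeSketch and its methods are hypothetical names, and the
counters are JVM-wide, so other threads can affect the numbers.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcProbeSketch {

    /** Sums collection counts over all collectors; an undefined count (-1) is treated as 0. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    /** Sums approximate accumulated collection time in milliseconds over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    /** Runs the workload and returns how many collections the JVM performed meanwhile. */
    static long gcCountDuring(Runnable workload) {
        long before = totalGcCount();
        workload.run();
        return totalGcCount() - before;
    }

    /** Written to by the demo workload so the JIT cannot discard it entirely. */
    static long sink;

    public static void main(String[] args) {
        // Allocation-heavy loop: a fresh temporary array on every iteration.
        long collections = gcCountDuring(() -> {
            for (int i = 0; i < 200_000; i++) {
                int[] tmp = new int[1024];
                tmp[0] = i;
                sink += tmp[0];
            }
        });
        System.out.println("collections during allocating loop: " + collections);
        System.out.println("total GC time so far (ms): " + totalGcMillis());
    }
}
```

Comparing this count before and after a refactoring (for instance, after
introducing array reuse) gives a quick signal, but a profiler or the GC log
is still needed to find where the allocations come from.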

Blessings,

Felix
Tobias Bley
2016-11-25 10:45:44 UTC
Hi,

@Felix: Is there any GitHub project, demo video or trial version to test HPR with JavaFX?

Best regards,
Tobi
Felix Bembrick
2016-11-25 11:19:22 UTC
Yes.
Post by Tobias Bley
Hi,
@Felix: Is there any Github project, demo video or trial to test HPR with JavaFX?
Best regards,
Tobi
Post by Felix Bembrick
Thanks Laurent,
That's another thing we discovered: using Java itself in the most performant way can help a lot.
It can be tricky, but profiling can often highlight various patterns of object instantiation that show-up red flags and can lead you directly to regions of the code that can be refactored to be significantly more efficient.
Also, the often overlooked GC log analysis can lead to similar discoveries and remedies.
Blessings,
Felix
Hi,
To optimize Pisces that became the Marlin rasterizer, I carefully avoided any both array allocation (byte/int/float pools) and also reduced array copies or clean up ie only clear dirty parts.
This approach is generic and could be applied in other critical places of the rendering pipelines.
https://bourgesl.github.io/fosdem-2016/slides/fosdem-2016-Marlin.pdf
Of course I would be happy to share my experience and work with a tiger team on optimizing JavaFX graphics.
However I would like getting sort of sponsoring for my potential contributions...
Cheers,
Laurent
Post by Tobi
Hi,
thanks Felix, Laurent and Chris for sharing your stuff with the community!
I am happy to see starting a discussion about boosting up the JavaFX rendering performance. I can confirm that the performance of JavaFX scene graph is not there where it should be. So multithreading would be an excellent, but difficult approach.
Felix, concerning your research of other toolkits: Do they all use multithreading or are there any toolkits which use single threading but are faster than JavaFX?
So maybe there are other points than multithreading where we can boost the performance?
2) your HPR sounds great. Did you already try DemoFX (part 3) benchmark with your HPR?
Best regards,
Tobi
Post by Felix Bembrick
(Thanks to Kevin for lifting my "awaiting moderation" impasse).
So, with all the recent discussions regarding the great contribution by
Laurent Bourgès of MarlinFX, it was suggested that a separate thread be
started to discuss parallelisation of the JavaFX rendering pipeline in
general.
As has been correctly pointed-out, converting or modifying the existing
rendering pipeline into a fully multi-threaded and performant beast is
indeed quite a complex task.
But, that's exactly what myself and my colleagues have been working on for
about 2 years.
The result is what we call the Hyper Rendering Pipeline (HPR).
Work on HPR started when we developed FXMark and were (bitterly)
disappointed with the performance of the JavaFX scene graph. Many JavaFX
developers have blogged about the need to dramatically minimise the number
of nodes (especially on embedded devices) in order to achieve even
"acceptable" performance. Often it is the case that most (if not all
rendering) is eventually done in a single Canvas node.
Now, as well already know, the JavaFX Canvas does perform very well and the
recent awesome work (DemoFX) by Chris Newland, just for example, shows what
can be done with this one node.
But, the majority of the animation plumbing in JavaFX is related to the
scene graph itself and is designed to make use of multiple nodes and node
types. At the moment, the performance of this scene graph is the Achilles
Heel of JavaFX (or at least one of them).
Enter HPR.
I personally have worked with a number of hardware-accelerated toolkits
over the years and am astounded by just how sluggish the rendering pipeline
for JavaFX is. When I am animating just a couple of hundred nodes using
JavaFX and transitions, I am lucky to get more than about 30 FPS, but on
the same (very powerful) machine, I can use other toolkits to render
thousands of "objects" and achieve frame rates well over 1000 FPS.
So, we refactored the entire scene graph rendering pipeline with the
1. It is written using JavaFX 9 and Java 9 (but could theoretically be
back-ported to JavaFX 8 though I see no reason to).
2. We analysed how other toolkits had optimised their own rendering
pipelines (especially Qt which has made some significant advances in this
area in recent years). We also analysed recent examples of multi-threaded
rendering using the new Vulkan API.
3. We carefully analysed and determined which parts of the pipeline should
best utilise the CPU and which parts should best utilise the GPU.
4. For those parts most suited to the CPU, we use the advanced concurrency
features of Java 8/9 to maximise parallelisation and throughput by
utilising multiple cores & threads in as efficient a manner as possible.
5. We devoted a large amount of time to optimising the "communication"
between the CPU and GPU to be far less "chatty" and this alone led to some
huge performance gains.
6. We also looked at the structure of the scene graph itself and after
studying products such as OpenSceneGraph, we refactored the JavaFX scene
graph in such a way that it lends itself to optimised rendering much more
easily.
7. This is clearly not a "small" patch. In fact to refer to it as a
"patch" is probably rather inappropriate.
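To make point 4 a little more concrete: no HPR code is shown in this thread, so the following is only a minimal sketch of the general idea of spreading CPU-side scene graph work across cores with the fork/join framework. It computes the combined bounds of a node tree, one task per subtree. `SketchNode`, `Bounds`, `BoundsTask` and `ParallelBoundsDemo` are hypothetical names invented for this example (they are not JavaFX or HPR classes), and all nodes are assumed to share one coordinate space with no transforms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical, simplified node (not a JavaFX class): a rectangle plus children.
final class SketchNode {
    final double x, y, width, height;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(double x, double y, double width, double height) {
        this.x = x; this.y = y; this.width = width; this.height = height;
    }

    // Adds a child and returns this node so that trees can be built fluently.
    SketchNode add(SketchNode child) { children.add(child); return this; }
}

// Immutable axis-aligned bounding box.
final class Bounds {
    final double minX, minY, maxX, maxY;

    Bounds(double minX, double minY, double maxX, double maxY) {
        this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
    }

    Bounds union(Bounds o) {
        return new Bounds(Math.min(minX, o.minX), Math.min(minY, o.minY),
                          Math.max(maxX, o.maxX), Math.max(maxY, o.maxY));
    }
}

// One task per subtree: child subtrees are forked and may run on other worker threads.
final class BoundsTask extends RecursiveTask<Bounds> {
    private final SketchNode node;

    BoundsTask(SketchNode node) { this.node = node; }

    @Override
    protected Bounds compute() {
        List<BoundsTask> subtasks = new ArrayList<>();
        for (SketchNode child : node.children) {
            BoundsTask task = new BoundsTask(child);
            task.fork();                 // schedule the child subtree asynchronously
            subtasks.add(task);
        }
        // This node's own rectangle is handled while the children are in flight.
        Bounds result = new Bounds(node.x, node.y, node.x + node.width, node.y + node.height);
        for (BoundsTask task : subtasks) {
            result = result.union(task.join());
        }
        return result;
    }
}

final class ParallelBoundsDemo {
    static Bounds computeBounds(SketchNode root) {
        return ForkJoinPool.commonPool().invoke(new BoundsTask(root));
    }
}
```

Each task only reads the tree and returns an immutable result, so no locking is needed. For very small trees the cost of forking would likely outweigh the gain, which is one reason deciding what to parallelise is itself part of the work.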
The end result is that we now have a fully-functional prototype of HPR and,
already, we are seeing very significant performance improvements.
At the minimum, scene graph rendering performance has improved by 500% and,
with judicious and sometimes "tricky" use of caching, we have seen
improvements in performance of 10x or more.
And... we are only just *starting* with the performance optimisation phase.
The potential for HPR is massive as it opens-up the possibility for the
JavaFX scene graph and the animation/transition infrastructure to be used
for a whole new class of applications including games, advanced
visualisations etc., without having to rely on imperative programming of a
single Canvas node.
I believe that HPR, along with tremendous recent developments like JPro and
the outstanding work by Gluon on mobiles and embedded devices, could
position JavaFX to be the best graphics toolkit of any kind in any language
and be the ONLY *truly* cross-platform graphics technology available.
WORA for graphics and UIs is finally within reach!
Blessings,
Felix
Tobias Bley
2016-11-25 13:07:19 UTC
Permalink
A very short answer ;) ….

Do you have any URL?
Yes.
Post by Tobias Bley
Hi,
@Felix: Is there any Github project, demo video or trial to test HPR with JavaFX?
Best regards,
Tobi
Post by Felix Bembrick
Thanks Laurent,
That's another thing we discovered: using Java itself in the most performant way can help a lot.
It can be tricky, but profiling can often highlight patterns of object instantiation that show up as red flags and lead you directly to regions of the code that can be refactored to be significantly more efficient.
Also, the often overlooked GC log analysis can lead to similar discoveries and remedies.
Blessings,
Felix
Hi,
To optimize Pisces, which became the Marlin rasterizer, I carefully avoided any array allocation (using byte/int/float pools) and also reduced array copies and clean-up, i.e. only clearing dirty parts.
This approach is generic and could be applied in other critical places of the rendering pipelines.
https://bourgesl.github.io/fosdem-2016/slides/fosdem-2016-Marlin.pdf
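As a rough illustration of the pooling and dirty-part clearing described above, here is a minimal sketch. It is not Marlin's actual code: `IntArrayPool` and its methods are hypothetical names, it assumes single-threaded use, and it only handles `int[]` arrays of one fixed length.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

// Hypothetical helper (not a Marlin class): reuses int[] arrays instead of
// allocating a new one per use, and only clears the part that was written to.
final class IntArrayPool {
    private final ArrayDeque<int[]> free = new ArrayDeque<>();
    private final int arrayLength;
    private int allocations;   // how many arrays were really allocated

    IntArrayPool(int arrayLength) { this.arrayLength = arrayLength; }

    // Returns a zero-filled array, reusing a released one when available.
    int[] acquire() {
        int[] array = free.pollFirst();
        if (array == null) {
            array = new int[arrayLength];
            allocations++;
        }
        return array;
    }

    // Gives the array back. The caller states which index range
    // [dirtyFrom, dirtyTo) it wrote to, so only that range is cleared.
    void release(int[] array, int dirtyFrom, int dirtyTo) {
        Arrays.fill(array, dirtyFrom, dirtyTo, 0);
        free.addFirst(array);
    }

    int allocations() { return allocations; }
}
```

The pool trusts the caller to report the dirty range correctly. A renderer running on several threads would need one pool per thread or some form of synchronisation, which this sketch leaves out.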
Of course I would be happy to share my experience and work with a tiger team on optimizing JavaFX graphics.
However, I would like to get some sort of sponsorship for my potential contributions...
Cheers,
Laurent
Post by Tobi
Hi,
thanks Felix, Laurent and Chris for sharing your stuff with the community!
I am happy to see a discussion starting about boosting JavaFX rendering performance. I can confirm that the performance of the JavaFX scene graph is not where it should be. So multithreading would be an excellent, but difficult, approach.
Felix, concerning your research of other toolkits: Do they all use multithreading or are there any toolkits which use single threading but are faster than JavaFX?
So maybe there are other points than multithreading where we can boost the performance?
2) Your HPR sounds great. Have you already tried the DemoFX (part 3) benchmark with your HPR?
Best regards,
Tobi
Felix Bembrick
2016-11-25 15:45:30 UTC
Permalink
Short answer? Maybe.

But exactly one more word than any from Oracle ;-)
Tobias Bley
2016-11-27 19:57:47 UTC
Permalink
Where can we read more about your HPR renderer?
Felix Bembrick
2016-11-27 21:57:31 UTC
Permalink
Well, given that you and Benjamin seem to be the only people interested in it, perhaps we should discuss it offline (so as not to bother Oracle or spam this list)...
Felix Bembrick
2016-11-28 05:54:47 UTC
Permalink
Sorry Gerrit - you did indeed.

Maybe you'd also like to participate in the offline discussion (especially now that you don't work for Oracle)?
Well I mentioned before that I'm interested too :)
Cheers,
Gerrit
Post by Felix Bembrick
Well, given that you and Benjamin seem to be the only people interested in it, perhaps we should discuss it offline (so as not to bother Oracle or spam list this)...
Post by Tobias Bley
Where can we read more about your HPR renderer?
Post by Felix Bembrick
Short answer? Maybe.
But exactly one more word than any from Oracle ;-)
Post by Tobias Bley
A very short answer ;) ….
Do you have any URL?
Yes.
Post by Tobias Bley
Hi,
@Felix: Is there any Github project, demo video or trial to test HPR with JavaFX?
Best regards,
Tobi
Post by Felix Bembrick
Thanks Laurent,
That's another thing we discovered: using Java itself in the most performant way can help a lot.
It can be tricky, but profiling can often highlight various patterns of object instantiation that show-up red flags and can lead you directly to regions of the code that can be refactored to be significantly more efficient.
Also, the often overlooked GC log analysis can lead to similar discoveries and remedies.
Blessings,
Felix
Hi,
To optimize Pisces that became the Marlin rasterizer, I carefully avoided any both array allocation (byte/int/float pools) and also reduced array copies or clean up ie only clear dirty parts.
This approach is generic and could be applied in other critical places of the rendering pipelines.
https://bourgesl.github.io/fosdem-2016/slides/fosdem-2016-Marlin.pdf
Of course I would be happy to share my experience and work with a tiger team on optimizing JavaFX graphics.
However I would like getting sort of sponsoring for my potential contributions...
Cheers,
Laurent
Post by Tobi
Hi,
thanks Felix, Laurent and Chris for sharing your stuff with the community!
I am happy to see starting a discussion about boosting up the JavaFX rendering performance. I can confirm that the performance of JavaFX scene graph is not there where it should be. So multithreading would be an excellent, but difficult approach.
Felix, concerning your research of other toolkits: Do they all use multithreading or are there any toolkits which use single threading but are faster than JavaFX?
So maybe there are other points than multithreading where we can boost the performance?
2) your HPR sounds great. Did you already try DemoFX (part 3) benchmark with your HPR?
Best regards,
Tobi
Post by Felix Bembrick
(Thanks to Kevin for lifting my "awaiting moderation" impasse).
So, with all the recent discussions regarding the great contribution by
Laurent Bourgès of MarlinFX, it was suggested that a separate thread be
started to discuss parallelisation of the JavaFX rendering pipeline in
general.
As has been correctly pointed-out, converting or modifying the existing
rendering pipeline into a fully multi-threaded and performant beast is
indeed quite a complex task.
But, that's exactly what myself and my colleagues have been working on for
about 2 years.
The result is what we call the Hyper Rendering Pipeline (HPR).
Work on HPR started when we developed FXMark and were (bitterly)
disappointed with the performance of the JavaFX scene graph. Many JavaFX
developers have blogged about the need to dramatically minimise the number
of nodes (especially on embedded devices) in order to achieve even
"acceptable" performance. Often it is the case that most (if not all
rendering) is eventually done in a single Canvas node.
Now, as well already know, the JavaFX Canvas does perform very well and the
recent awesome work (DemoFX) by Chris Newland, just for example, shows what
can be done with this one node.
But, the majority of the animation plumbing in JavaFX is related to the
scene graph itself and is designed to make use of multiple nodes and node
types. At the moment, the performance of this scene graph is the Achilles
Heel of JavaFX (or at least one of them).
Enter HPR.
I personally have worked with a number of hardware-accelerated toolkits
over the years and am astounded by just how sluggish the rendering pipeline
for JavaFX is. When I am animating just a couple of hundred nodes using
JavaFX and transitions, I am lucky to get more than about 30 FPS, but on
the same (very powerful) machine, I can use other toolkits to render
thousands of "objects" and achieve frame rates well over 1000 FPS.
So, we refactored the entire scene graph rendering pipeline with the
1. It is written using JavaFX 9 and Java 9 (but could theoretically be
back-ported to JavaFX 8 though I see no reason to).
2. We analysed how other toolkits had optimised their own rendering
pipelines (especially Qt which has made some significant advances in this
area in recent years). We also analysed recent examples of multi-threaded
rendering using the new Vulkan API.
3. We carefully analysed and determined which parts of the pipeline should
best utilise the CPU and which parts should best utilise the GPU.
4. For those parts most suited to the CPU, we use the advanced concurrency
features of Java 8/9 to maximise parallelisation and throughput by
utilising multiple cores & threads in as efficient a manner as possible.
5. We devoted a large amount of time to optimising the "communication"
between the CPU and GPU to be far less "chatty" and this alone led to some
huge performance gains.
6. We also looked at the structure of the scene graph itself and after
studying products such as OpenSceneGraph, we refactored the JavaFX scene
graph in such a way that it lends itself to optimised rendering much more
easily.
7. This is clearly not a "small" patch. In fact to refer to it as a
"patch" is probably rather inappropriate.
The end result is that we now have a fully-functional prototype of HPR and,
already, we are seeing very significant performance improvements.
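As an illustration of the kind of CPU-side parallelisation mentioned in point 4 above, here is a minimal sketch using the parallel streams introduced in Java 8. It is not HPR code (none has been published in this thread); the `Rect` type and the `transformAll` method are hypothetical stand-ins for per-node preparation work that has no dependencies between nodes.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: per-node CPU work spread across cores with a parallel stream. */
final class ParallelPrepareSketch {

    /** Hypothetical stand-in for a scene graph node: an immutable axis-aligned rectangle. */
    static final class Rect {
        final double x, y, w, h;

        Rect(double x, double y, double w, double h) {
            this.x = x;
            this.y = y;
            this.w = w;
            this.h = h;
        }
    }

    /**
     * Scales and then translates every rectangle. Each element is independent,
     * so the work can be split across the common fork/join pool. The source is
     * an ordered List, so the result keeps the input order.
     */
    static List<Rect> transformAll(List<Rect> nodes, double scale, double dx, double dy) {
        return nodes.parallelStream()
                .map(r -> new Rect(r.x * scale + dx, r.y * scale + dy, r.w * scale, r.h * scale))
                .collect(Collectors.toList());
    }
}
```

Whether this helps in practice depends on how much work each node needs; for very cheap per-node work the cost of splitting and merging may outweigh the gain.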
At the minimum, scene graph rendering performance has improved by 500% and,
with judicious and sometimes "tricky" use of caching, we have seen
improvements in performance of 10x or more.
And... we are only just *starting* with the performance optimisation phase.
The potential for HPR is massive as it opens up the possibility for the
JavaFX scene graph and the animation/transition infrastructure to be used
for a whole new class of applications including games, advanced
visualisations etc., without having to rely on imperative programming of a
single Canvas node.
I believe that HPR, along with tremendous recent developments like JPro and
the outstanding work by Gluon on mobiles and embedded devices, could
position JavaFX to be the best graphics toolkit of any kind in any language
and be the ONLY *truly* cross-platform graphics technology available.
WORA for graphics and UIs is finally within reach!
Blessings,
Felix
Michael Paus
2016-11-28 07:10:21 UTC
Permalink
I am interested too although I have only been listening quietly so far
due to lack of time.
Cheers
Michael
Post by Felix Bembrick
Sorry Gerrit - you did indeed.
Maybe you'd also like to participate in the offline discussion (especially now that you don't work for Oracle)?
Well I mentioned before that I'm interested too :)
Cheers,
Gerrit
Post by Felix Bembrick
Well, given that you and Benjamin seem to be the only people interested in it, perhaps we should discuss it offline (so as not to bother Oracle or spam this list)...
Post by Tobias Bley
Where can we read more about your HPR renderer?
Post by Felix Bembrick
Short answer? Maybe.
But exactly one more word than any from Oracle ;-)
Post by Tobias Bley
A very short answer ;) ….
Do you have any URL?
Yes.
Post by Tobias Bley
Hi,
@Felix: Is there any GitHub project, demo video or trial to test HPR with JavaFX?
Best regards,
Tobi
Post by Felix Bembrick
Thanks Laurent,
That's another thing we discovered: using Java itself in the most performant way can help a lot.
It can be tricky, but profiling can often highlight various patterns of object instantiation that show up as red flags and can lead you directly to regions of the code that can be refactored to be significantly more efficient.
Also, the often overlooked GC log analysis can lead to similar discoveries and remedies.
Blessings,
Felix
Hi,
To optimize Pisces, which became the Marlin rasterizer, I carefully avoided any array allocation (using byte/int/float pools) and also reduced array copies and clean-up, i.e. only clearing the dirty parts.
This approach is generic and could be applied in other critical places of the rendering pipelines.
https://bourgesl.github.io/fosdem-2016/slides/fosdem-2016-Marlin.pdf
Of course I would be happy to share my experience and work with a tiger team on optimizing JavaFX graphics.
However, I would like to get some sort of sponsorship for my potential contributions...
Cheers,
Laurent
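The pooling and dirty-part clearing that Laurent describes can be sketched roughly as follows. This is not Marlin's actual code; the class name `DirtyIntArrayPool` and its methods are hypothetical, and the sketch assumes single-threaded use and a caller that knows which index range it wrote to.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical sketch: reuses int[] buffers and clears only the range that was written. */
final class DirtyIntArrayPool {
    private final ArrayDeque<int[]> free = new ArrayDeque<>();
    private final int capacity;

    DirtyIntArrayPool(int capacity) {
        this.capacity = capacity;
    }

    /** Returns a zero-filled array, reusing a pooled one when available. */
    int[] acquire() {
        int[] a = free.pollFirst();
        return (a != null) ? a : new int[capacity];
    }

    /**
     * Returns the array to the pool. The caller reports the half-open range
     * [dirtyFrom, dirtyTo) it wrote to, so only that part is reset to zero
     * instead of the whole array.
     */
    void release(int[] a, int dirtyFrom, int dirtyTo) {
        if (a.length != capacity) {
            throw new IllegalArgumentException("array does not belong to this pool");
        }
        Arrays.fill(a, dirtyFrom, dirtyTo, 0);
        free.addFirst(a);
    }

    public static void main(String[] args) {
        DirtyIntArrayPool pool = new DirtyIntArrayPool(1024);
        int[] first = pool.acquire();
        first[10] = 7;
        first[11] = 9;
        pool.release(first, 10, 12);      // clears two elements, not 1024
        int[] second = pool.acquire();    // no new allocation here
        System.out.println(first == second);
    }
}
```

The point of the design is that steady-state rendering allocates nothing and the cost of cleaning a buffer is proportional to what was touched rather than to its size. The price is that a caller reporting too small a dirty range will leak stale data into the next user of the array.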
Post by Tobi
Hi,
thanks Felix, Laurent and Chris for sharing your stuff with the community!
I am happy to see a discussion starting about boosting the JavaFX rendering performance. I can confirm that the performance of the JavaFX scene graph is not where it should be. So multithreading would be an excellent, but difficult, approach.
Felix, concerning your research of other toolkits: Do they all use multithreading or are there any toolkits which use single threading but are faster than JavaFX?
So maybe there are other points than multithreading where we can boost the performance?
2) Your HPR sounds great. Have you already tried the DemoFX (part 3) benchmark with your HPR?
Best regards,
Tobi
Felix Bembrick
2016-11-28 07:51:48 UTC
Permalink
Great - good to see interest growing.

Especially given that you work for Oracle, right?
Michael Paus
2016-11-28 08:08:28 UTC
Permalink
Post by Felix Bembrick
Great - good to see interest growing.
Especially given that you work for Oracle, right?
Sorry, if I have to disappoint you on that but I do not work for Oracle.
I run my own little company and am the head of the Java User Group
Stuttgart.
<http://www.jugs.de/>
Felix Bembrick
2016-11-28 08:28:32 UTC
Permalink
No disappointment, no surprises.

It was a rhetorical question...
Felix Bembrick
2016-11-28 08:29:07 UTC
Permalink
Agreed.
Post by Michael Paus
Post by Felix Bembrick
Great - good to see interest growing.
Especially given that you work for Oracle, right?
Sorry, if I have to disappoint you on that but I do not work for Oracle.
I run my own little company and are the head of the Java User Group Stuttgart.
<http://www.jugs.de/>
Post by Felix Bembrick
I am interested too although I have only been listening quietly so far due to lack of time.
Cheers
Michael
Post by Felix Bembrick
Sorry Gerrit - you did indeed.
Maybe you'd also like to participate in the offline discussion (especially now that you don't work for Oracle)?
Well I mentioned before that I'm interested too :)
Cheers,
Gerrit
Post by Felix Bembrick
Well, given that you and Benjamin seem to be the only people interested in it, perhaps we should discuss it offline (so as not to bother Oracle or spam this list)...
Post by Tobias Bley
Where can we read more about your HPR renderer?
Post by Felix Bembrick
Short answer? Maybe.
But exactly one more word than any from Oracle ;-)
Post by Tobias Bley
A very short answer ;) ….
Do you have any URL?
Yes.
Post by Tobias Bley
Hi,
@Felix: Is there any GitHub project, demo video or trial to test HPR with JavaFX?
Best regards,
Tobi
Post by Felix Bembrick
Thanks Laurent,
That's another thing we discovered: using Java itself in the most performant way can help a lot.
It can be tricky, but profiling can often highlight patterns of object instantiation that raise red flags and lead you directly to regions of the code that can be refactored to be significantly more efficient.
Also, the often overlooked GC log analysis can lead to similar discoveries and remedies.
Blessings,
Felix
Post by Laurent Bourgès
Hi,
To optimize Pisces, which became the Marlin rasterizer, I carefully avoided array allocations (by using byte/int/float array pools) and also reduced array copies and clean-up, i.e. only the dirty parts are cleared.
This approach is generic and could be applied in other critical places of the rendering pipelines.
https://bourgesl.github.io/fosdem-2016/slides/fosdem-2016-Marlin.pdf
Of course I would be happy to share my experience and work with a tiger team on optimizing JavaFX graphics.
However, I would like to get some sort of sponsorship for my potential contributions...
Cheers,
Laurent
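The pooling idea Laurent describes can be sketched roughly as follows. This is a minimal, single-threaded illustration and not Marlin's actual code: IntArrayPool, PoolDemo and the method names are hypothetical, chosen only for this example. An array is taken from the pool instead of being allocated, and on release only the range that was written to (the "dirty" part) is cleared, rather than the whole array.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical pool of reusable int[] buffers; not Marlin's actual classes. */
final class IntArrayPool {
    private final ArrayDeque<int[]> free = new ArrayDeque<>();
    private final int length;
    private int allocations; // counts real allocations, for illustration only

    IntArrayPool(int length) {
        this.length = length;
    }

    /** Returns a zero-filled array, reusing a pooled one when available. */
    int[] acquire() {
        int[] a = free.pollFirst();
        if (a == null) {
            a = new int[length];
            allocations++;
        }
        return a;
    }

    /** Clears only the dirty range [from, to) and puts the array back in the pool. */
    void release(int[] a, int from, int to) {
        Arrays.fill(a, from, to, 0);
        free.addFirst(a);
    }

    int allocations() {
        return allocations;
    }
}

final class PoolDemo {
    public static void main(String[] args) {
        IntArrayPool pool = new IntArrayPool(4096);
        for (int frame = 0; frame < 1000; frame++) {
            int[] row = pool.acquire();
            row[10] = frame;           // this frame only touches indices 10 and 11
            row[11] = frame;
            pool.release(row, 10, 12); // so only those two slots are cleared
        }
        System.out.println("allocations: " + pool.allocations());
    }
}
```

The loop runs 1000 "frames" but allocates a single 4096-element array, and each release clears 2 elements instead of 4096. The caller has to track the dirty range correctly; a range that is too small would leak stale data into the next user of the array.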
Post by Tobi
Hi,
thanks Felix, Laurent and Chris for sharing your stuff with the community!
I am happy to see a discussion starting about boosting the JavaFX rendering performance. I can confirm that the performance of the JavaFX scene graph is not where it should be. So multithreading would be an excellent, but difficult, approach.
Felix, concerning your research into other toolkits: do they all use multithreading, or are there any toolkits which are single-threaded but still faster than JavaFX?
So maybe there are areas other than multithreading where we can boost the performance?
2) Your HPR sounds great. Have you already tried the DemoFX (part 3) benchmark with your HPR?
Best regards,
Tobi
Tobi
2016-11-28 08:10:57 UTC
Permalink
We should discuss a new rendering pipeline on the openjfx mailing list. It’s not off topic - it’s an important topic for the future of JavaFX.
Post by Felix Bembrick
Sorry Gerrit - you did indeed.
Maybe you'd also like to participate in the offline discussion (especially now that you don't work for Oracle)?
Well I mentioned before that I'm interested too :)
Cheers,
Gerrit
Post by Felix Bembrick
Well, given that you and Benjamin seem to be the only people interested in it, perhaps we should discuss it offline (so as not to bother Oracle or spam this list)...
Post by Tobias Bley
Where can we read more about your HPR renderer?
Felix Bembrick
2016-11-28 08:33:26 UTC
Permalink
Sorry, the "agreed" comment was meant to be a reply to you Tobi.

Pity not everyone "agrees"...
Post by Tobi
We should discuss a new rendering pipeline on the openjfx mailing list. It’s not off topic - it’s an important topic for the future of JavaFX.
Post by Felix Bembrick
Sorry Gerrit - you did indeed.
Maybe you'd also like to participate in the offline discussion (especially now that you don't work for Oracle)?
Well I mentioned before that I'm interested too :)
Cheers,
Gerrit
Post by Felix Bembrick
Well, given that you and Benjamin seem to be the only people interested in it, perhaps we should discuss it offline (so as not to bother Oracle or spam this list)...
Post by Tobias Bley
Where can we read more about your HPR renderer?
Benjamin Gudehus
2016-11-25 11:25:48 UTC
Permalink
Wow, thanks for all the great work (Felix and Laurent)! Marlin and HPR seem
to really fit into what needs to be done to improve the performance.

Speaking of the Vulkan API: Does HPR use shaders to optimize the rendering
or does this only apply to rasterization (i.e. Marlin)?

WebRender and Servo (by Mozilla, written in Rust) use GPU shaders a lot,
along with parallelized DOM (scene graph) access, aggressive culling,
caching and batching.

--Benjamin
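For readers unfamiliar with the culling Benjamin mentions, a minimal Java sketch follows. It does not reproduce WebRender, Servo or HPR code; Box and CullingSketch are hypothetical names, and nodes are reduced to axis-aligned bounding boxes as a simplifying assumption. Nodes whose bounds do not overlap the viewport are dropped before any drawing work is done.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical axis-aligned bounding box; not a JavaFX or WebRender type. */
final class Box {
    final double minX, minY, maxX, maxY;

    Box(double minX, double minY, double maxX, double maxY) {
        this.minX = minX;
        this.minY = minY;
        this.maxX = maxX;
        this.maxY = maxY;
    }

    /** True when the two boxes share some area (touching edges do not count). */
    boolean intersects(Box o) {
        return minX < o.maxX && maxX > o.minX && minY < o.maxY && maxY > o.minY;
    }
}

final class CullingSketch {
    /** Returns only the boxes that overlap the viewport; the rest are never drawn. */
    static List<Box> visible(List<Box> nodes, Box viewport) {
        List<Box> out = new ArrayList<>();
        for (Box b : nodes) {
            if (b.intersects(viewport)) {
                out.add(b);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Box viewport = new Box(0, 0, 100, 100);
        List<Box> nodes = Arrays.asList(
                new Box(10, 10, 20, 20),      // fully inside
                new Box(200, 200, 210, 210),  // fully outside
                new Box(90, 90, 150, 150));   // partly inside
        System.out.println("visible nodes: " + visible(nodes, viewport).size());
    }
}
```

Here two of the three boxes survive the test. A real scene graph would typically cull whole subtrees via the bounds of a parent node rather than testing every leaf.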
Post by Tobias Bley
Hi,
@Felix: Is there any GitHub project, demo video or trial to test HPR with JavaFX?
Best regards,
Tobi
Felix Bembrick
2016-11-25 15:44:03 UTC
Permalink
Thanks Benjamin,

We studied those products you mentioned when designing HPR and, yes, there is extensive use of shaders and much more utilisation of the GPU in general.

We also have the beginnings of a Vulkan-only version written in (coincidentally) Rust which is showing amazing promise. Vulkan is something we are investing a lot of research and effort into.

Felix
Wow, thanks for all the great work (Felix and Laurent)! Marlin and HPR seem to really fit into what needs to be done to improve the performance.
Speaking of the Vulkan API: Does HPR use shaders to optimize the rendering or does this only apply to rasterization (i.e. Marlin)?
WebRender and Servo (by Mozilla, written in Rust) use GPU shaders a lot, along with parallelized DOM (scene graph) access, aggressive culling, caching and batching.
--Benjamin
Post by Tobias Bley
Hi,
@Felix: Is there any Github project, demo video or trial to test HPR with JavaFX?
Best regards,
Tobi
Post by Felix Bembrick
Thanks Laurent,
That's another thing we discovered: using Java itself in the most performant way can help a lot.
It can be tricky, but profiling can often highlight patterns of object instantiation that raise red flags and lead you directly to regions of the code that can be refactored to be significantly more efficient.
Also, the often-overlooked analysis of GC logs can lead to similar discoveries and remedies.
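As a rough illustration of the idea (not the GC log analysis itself, which relies on JVM logging options), the sketch below uses the standard `java.lang.management` API to count how many collections happen while a piece of work runs. The class name `GcProbe` and its methods are hypothetical names chosen for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration only; not part of JavaFX or HPR.
final class GcProbe {

    /** Sums the collection counts reported by all collectors of this JVM. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() returns -1 when the collector does not report a count.
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    /** Runs the work and returns how many collections were reported while it ran. */
    static long collectionsDuring(Runnable work) {
        long before = totalCollections();
        work.run();
        return totalCollections() - before;
    }
}
```

A result well above zero for a single frame's worth of work would be one hint that the code allocates more than it needs to.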
Blessings,
Felix
Hi,
To optimize Pisces, which became the Marlin rasterizer, I carefully avoided array allocations (using byte/int/float array pools) and also reduced array copies and clean-up, i.e. only the dirty parts are cleared.
This approach is generic and could be applied in other critical places of the rendering pipelines.
https://bourgesl.github.io/fosdem-2016/slides/fosdem-2016-Marlin.pdf
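A minimal sketch of that pattern, assuming a single reusable `int[]` per renderer: the array is handed out again instead of being reallocated, and only the range that was written is zeroed on release. This is not Marlin's actual code; `DirtyIntArrayCache` and its methods are hypothetical names for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration of array reuse with dirty-range clearing; not Marlin code.
final class DirtyIntArrayCache {
    private int[] buffer;
    private int dirtyFrom = Integer.MAX_VALUE; // inclusive start of the written range
    private int dirtyTo = 0;                   // exclusive end of the written range

    DirtyIntArrayCache(int initialCapacity) {
        buffer = new int[initialCapacity];
    }

    /** Returns the cached array, growing it only when it is too small. */
    int[] acquire(int minLength) {
        if (buffer.length < minLength) {
            buffer = new int[minLength]; // a fresh array is already zeroed
            dirtyFrom = Integer.MAX_VALUE;
            dirtyTo = 0;
        }
        return buffer;
    }

    /** Records that indices [from, to) were written by the caller. */
    void markDirty(int from, int to) {
        dirtyFrom = Math.min(dirtyFrom, from);
        dirtyTo = Math.max(dirtyTo, to);
    }

    /** Clears only the written range instead of the whole array. */
    void release() {
        if (dirtyFrom < dirtyTo) {
            Arrays.fill(buffer, dirtyFrom, dirtyTo, 0);
        }
        dirtyFrom = Integer.MAX_VALUE;
        dirtyTo = 0;
    }
}
```

The sketch assumes the caller always calls `release()` before the next `acquire()`; a real pool would also need to handle several array sizes and concurrent users.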
Of course I would be happy to share my experience and work with a tiger team on optimizing JavaFX graphics.
However, I would like to get some sort of sponsorship for my potential contributions...
Cheers,
Laurent
Post by Tobi
Hi,
thanks Felix, Laurent and Chris for sharing your stuff with the community!
I am happy to see a discussion starting about boosting JavaFX rendering performance. I can confirm that the performance of the JavaFX scene graph is not where it should be. So multithreading would be an excellent, but difficult, approach.
Felix, concerning your research into other toolkits: do they all use multithreading, or are there toolkits that are single-threaded but still faster than JavaFX?
So maybe there are areas other than multithreading where we can boost performance?
2) Your HPR sounds great. Have you already tried the DemoFX (part 3) benchmark with HPR?
Best regards,
Tobi
Felix Bembrick
2016-11-11 11:01:23 UTC
Permalink
Hi Tobi,

Thanks for the input.

In answer to your first question, not all toolkits use as much parallelisation as HPR, yet some are indeed more performant than JavaFX, for a few reasons:

1. They are "closer to the metal". The more layers of architecture you add, the more performance you almost inevitably lose. There's an OpenGL toolkit named Visualization Library which is very low-level and whose performance is outstanding, even though it is basically single-threaded.

2. The structure of the scene graph. A cumbersome or memory-hogging scene graph, along with the actual "scene graph model" used, can seriously impair the ability to optimise the rendering pipeline. That's why it was necessary for us to restructure the JavaFX scene graph itself.

3. Significant performance improvements can be achieved (even in a single-threaded pipeline) simply by batching GPU commands or by using various forms of "caching" of those commands on the GPU (which both OpenGL and Direct3D support). JavaFX is particularly poor in this area of CPU-to-GPU communication.
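
To make point 3 a little more concrete, here is a minimal sketch of command batching: consecutive draw commands that share the same GPU state (here just a texture id) are merged, so the GPU receives one submission per run instead of one per command. The types `DrawCommand` and `Batch` are hypothetical and exist only for this example; this is not how JavaFX, Prism or HPR actually represent commands.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration; not part of JavaFX, Prism or HPR.
final class DrawBatching {

    /** One draw request: the texture (GPU state) it needs and how many quads it draws. */
    static final class DrawCommand {
        final int textureId;
        final int quadCount;

        DrawCommand(int textureId, int quadCount) {
            this.textureId = textureId;
            this.quadCount = quadCount;
        }
    }

    /** A run of consecutive commands sharing a texture, to be submitted in one call. */
    static final class Batch {
        final int textureId;
        int quadCount;

        Batch(int textureId) {
            this.textureId = textureId;
        }
    }

    /** Merges consecutive commands with the same texture, preserving paint order. */
    static List<Batch> batch(List<DrawCommand> commands) {
        List<Batch> batches = new ArrayList<>();
        Batch current = null;
        for (DrawCommand c : commands) {
            if (current == null || current.textureId != c.textureId) {
                current = new Batch(c.textureId); // state change: start a new batch
                batches.add(current);
            }
            current.quadCount += c.quadCount;
        }
        return batches;
    }
}
```

Only consecutive commands are merged so that the back-to-front paint order is kept; a renderer that can reorder opaque geometry could sort by state first and get even fewer batches.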

Now, as for testing HPR with DemoFX, the answer is no.

There are 2 main reasons:

1. HPR currently only works with Java/JavaFX 9 (and that is unlikely to change).

2. Given that DemoFX is mostly Canvas-based (along with some 3D scene overlays), I doubt HPR would have much impact (although I expect the 3D parts would perform better).

I hope these answers are helpful.

Blessings,

Felix