Category: Proxy usage

Q

How do I get the best performance out of the proxy classes?

A

That's a really tough question, but there are some tips that will typically help you a lot. Let's talk about general application performance first, though. Integrating two different technologies unavoidably incurs a performance penalty. The questions are always the same:

  • How big is the penalty?
  • How does that compare to a "pure", non-integrated application?
  • How does that compare to "integration technology X"?
  • Is it fast enough for my use case?

How does it compare to "integration technology X"?

Let's start with the third question because it's the easiest to answer. The JNI-based, in-process integration with JunC++ion-generated proxy classes is among the fastest possible ways of integrating Java with C++. All other commonly used integration technologies rely on out-of-process approaches like messaging, socket-based communications, etc. and are much slower than JNI, often by one to two orders of magnitude.

This is not to say that it is impossible to write a faster integration solution using other approaches. If you apply your expert domain knowledge to an integration problem you will almost always be able to create a faster integration solution. That's because you know exactly what you need and you know exactly how you're going to use the solution. This allows you to optimize the code for this use case.

Our solution is generic and our expertise lies in the domain of Java/C++ integration, not in your application domain. That being said, application requirements change with nice regularity and your expensive, hand-coded, hand-optimized integration might become obsolete before it even reaches production!

Another consideration is the reliability of the code. Take hand-written JNI as an example: when you hand-write JNI code, you can make certain assumptions, for example that a certain class and certain methods or fields will always be present in a correctly deployed application. This allows you to skip some error checking, which improves performance but runs the risk of catastrophic and possibly hard-to-debug failures in the face of change. The key point we're trying to make is: your expert knowledge allows you to make a reliability/performance tradeoff that we cannot afford to make in a generically useful product; we have to choose reliability at all times.

So, how much slower than a pure Java application is it?

That's a really hard question because the answer depends on many factors. Factors that we know to be relevant include:

  • the JVM that is being used
  • garbage collector configuration
  • the Java APIs being used
  • the way the Java APIs are used

Some applications that are I/O-bound or compute-bound might not show any performance degradation at all. Other applications that are heavily interactive and have many cross-language calls involving little or no work on the Java side could have a significant overhead. Typically, we see an overhead of no more than 25% in desktop applications.

There are some factors that really contribute to bad performance. The following section lists a number of these factors.

JVM

You might not have a choice about the JVM you're using, but if you do, you might want to benchmark your application with a few different JVMs. Most Java Runtime Environments give you a choice between a JVM that has been optimized for GUIs and one that has been optimized for non-GUI applications. The server JVM usually performs better than the client JVM, unless you're writing an integrated Swing GUI solution.
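To compare JVMs, or to check an optimization before and after, a small timing helper can be enough. The following is a minimal sketch that uses only the C++ standard library; averageMicros is a hypothetical helper name and not part of the framework.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not part of the framework): runs `work` the given
// number of times and returns the average wall-clock time per call in
// microseconds.
template <typename Work>
double averageMicros(Work work, int iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;

    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i)
        work();
    const std::chrono::duration<double, std::micro> elapsed = clock::now() - start;

    return elapsed.count() / iterations;
}
```

You would pass the cross-language call you care about as a lambda. It is advisable to run the work a number of times before measuring, because a JVM's JIT compiler typically makes the first invocations slower than later ones.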

String performance

If you're exchanging a lot of String data between the two sides, see whether you can use the UTF-8 encoding for the framework. The framework encoding is responsible for the translation of single/multi-byte C++ characters to characters in a Java string. By default, the framework uses a method that uses the platform encoding and requires a number of steps to perform the translation. If you know for sure that your application is only using ASCII characters or true UTF-8-encoded strings, you can configure the UTF-8 encoding to be used. This causes a shortcut to be taken during these marshalling operations that performs much better and can make quite a difference in total performance.

Reusing String instances can also gain you some performance. Let's look at a relatively common use case of looking up String-indexed values in a Java Hashtable. Let's say you wrote a C++ method that looks like this:

Object    MyType::getAdditionalData() const
{
    return mHashtable.get( "ADDITIONAL_DATA" );
}

This method will return the correct result, but you could make it perform faster by saving yourself the cost of creating a temporary Java String instance which is used as an argument for the lookup operation. Consider writing this instead:

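// pAdditionalData is assumed to be a String * member that is initialized to NULL
// and declared static or mutable so that it can be assigned in a const method.
Object    MyType::getAdditionalData() const
{
    // create the key only once and reuse it for every subsequent lookup
    if( pAdditionalData == NULL )
        pAdditionalData = new String( "ADDITIONAL_DATA" );

    return mHashtable.get( *pAdditionalData );
}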

You create only one String instance, which both reduces the number of JNI calls and saves the garbage collector some work. Don't do this everywhere; just be aware of the pattern and use it for your performance hotspots.
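The caching pattern above can be tried out without a JVM. In the following sketch, FakeString and FakeHashtable are hypothetical stand-ins for the generated proxy classes (they are not framework types), and the construction counter only illustrates how many instances get created. The cached pointer is declared static here, which is one way to make it assignable from a const method.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a proxy String; counts created instances.
struct FakeString {
    static int created;
    std::string value;
    explicit FakeString(const char* s) : value(s) {
        assert(s != nullptr);
        ++created;
    }
};
int FakeString::created = 0;

// Hypothetical stand-in for a proxy Hashtable.
struct FakeHashtable {
    std::map<std::string, int> entries;
    int get(const FakeString& key) const {
        auto it = entries.find(key.value);
        return it == entries.end() ? -1 : it->second;
    }
};

class MyType {
public:
    MyType() { mHashtable.entries["ADDITIONAL_DATA"] = 42; }

    int getAdditionalData() const {
        // Created on first use only and intentionally never deleted.
        if (pAdditionalData == nullptr)
            pAdditionalData = new FakeString("ADDITIONAL_DATA");
        return mHashtable.get(*pAdditionalData);
    }

private:
    FakeHashtable mHashtable;
    // static: shared by all instances and assignable from a const method
    static FakeString* pAdditionalData;
};
FakeString* MyType::pAdditionalData = nullptr;
```

As written, the lazy initialization is not thread-safe; if the method can be called from several threads, the first-use check would need to be guarded.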

Array performance

Arrays are among the trickiest parts of the framework in terms of performance. Let's consider a naive piece of C++ code that we've written using generated proxy classes:

float     sum = 0;
for( int i=0; i<myInst.getArray().length; i++ ) sum += myInst.getArray()[ i ];

This snippet will work as expected, but it will perform quite badly. It can easily be optimized by moving the myInst.getArray() call out of the loop, thereby saving a lot of JNI calls:

float               sum = 0;
xmog_float_array    arrFloat = myInst.getArray();

for( int i=0; i<arrFloat.length; i++ ) sum += arrFloat[ i ];

This will perform better, but it still has a lot of JNI calls: every array element access requires a JNI call! We can improve on that by "marshalling" the array data across the boundary in one step:

float               sum = 0;
xmog_float_array    arrFloat = myInst.getArray();
float *             nativeArrFloat = new float[arrFloat.length];

arrFloat.to_native( nativeArrFloat, 0, arrFloat.length );

for( int i=0; i<arrFloat.length; i++ ) sum += nativeArrFloat[ i ];

delete[] nativeArrFloat;

This technique of "block-accessing" array elements works for all primitive array types. Don't forget to clean up after yourself when you (rather than the framework) dynamically allocate memory! You get the best performance if you have a statically allocated native array that's large enough to hold the contents of the largest Java array that you're using. You can also process the Java array data in chunks by customizing the start/end indices of the to_native() invocation.
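The chunked variant with a statically allocated buffer could look roughly like the following sketch. FakeFloatArray is a hypothetical stand-in for a proxy array type such as xmog_float_array, so that the example runs without a JVM. Its to_native( dest, start, end ) is assumed to copy the elements in the range [start, end) into dest; please check the exact meaning of the index arguments in your generated code before relying on this.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a proxy array type such as xmog_float_array.
// In real code, every to_native() call would cross the language boundary.
struct FakeFloatArray {
    std::vector<float> data;   // pretend this lives on the Java side
    int length;
    int blockCopies = 0;       // counts simulated boundary crossings

    explicit FakeFloatArray(std::vector<float> v)
        : data(std::move(v)), length(static_cast<int>(data.size())) {}

    // Assumption: copies the elements [start, end) into dest.
    void to_native(float* dest, int start, int end) {
        assert(0 <= start && start <= end && end <= length);
        std::copy(data.begin() + start, data.begin() + end, dest);
        ++blockCopies;
    }
};

// Sums the array through a fixed-size, statically allocated buffer,
// one block copy per chunk instead of one access per element.
float sumInChunks(FakeFloatArray& arr) {
    const int chunk = 256;
    static float buffer[chunk];

    float sum = 0;
    for (int start = 0; start < arr.length; start += chunk) {
        const int end = std::min(start + chunk, arr.length);
        arr.to_native(buffer, start, end);
        for (int i = 0; i < end - start; ++i)
            sum += buffer[i];
    }
    return sum;
}
```

The static buffer avoids a heap allocation per call, but it also makes the function non-reentrant, so it should not be called from several threads at once without protection.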

Object reference performance

Every interaction with the Java side is costly, at least in relative terms. When you have a choice between writing:

int      result = 0;
 
if( myInst.foo.bar.id != otherInst.foo.bar.id )
    result = myInst.foo.bar.id;

and writing:

int      result = 0;
int      temp = myInst.foo.bar.id;

if( temp != otherInst.foo.bar.id )
    result = temp;

you should choose the latter because it only performs the costly deep dereferencing once.

If you were writing pure C++ or pure Java code, the compiler or JIT might take care of this optimization for you. In a mixed-language environment, what you write is what gets executed!

Do it in Java

This is the one recommendation that we don't like making because it doesn't apply to every performance problem, but sometimes it's the one that gives you the biggest benefit. Sometimes you have a performance hotspot that is due to a usage pattern in C++ that requires many calls across the language boundary. Consider this snippet:

MyDataType  data( _use_java_ctor );
 
data.foo = 5;
data.bar = 6;
data.answer = 42; 

Here you have at least 4 JNI operations (object creation plus three field sets). Consider the following snippet instead:

MyDataType  data( 5, 6, 42 );

Here you have just the constructor invocation. The second snippet will perform much better than the first one.

If you control the Java side of the integration problem, you might also consider creating additional, performance-optimized entry points for consumption by the C++ side. The more work is performed per cross-language call, the better the performance. Let us make it perfectly clear that changing your Java code is rarely required! We just mention it as a last-ditch optimization technique for critical performance problems.


Copyright 2006-2011 by Codemesh, Inc., ALL RIGHTS RESERVED
