Optical design has always pushed the boundaries of what's possible on computers. ZEMAX currently supports up to 16 processors per machine, helping to solve the most difficult problems in optical design.
ZEMAX automatically divides most lengthy calculations, such as ray-tracing, diffraction analysis, and optimization, into multiple parallel tasks. For example, ZEMAX can trace one ray on one processor while tracing another on a second processor, and so on, and then combine the results. ZEMAX supports up to 16 CPUs without any user intervention, giving speed increases of up to 16x over a single processor.
This article discusses multi-CPU operation in more detail, and gives examples of how well performance scales over multiple processors.
ZEMAX automatically divides most lengthy calculations, such as ray-tracing, diffraction analysis, and optimization, into multiple parallel tasks, called threads. For example, consider the Geometric Bitmap Image Analysis feature, which is discussed in detail in the article "How to Produce Photo-Realistic Output Images". This feature takes a .jpg or .bmp bitmap image such as this:
and traces rays from each pixel through the optical system to the detector. For a fairly poor imaging system the resulting image is like so:
On a computer with a single CPU, ZEMAX would start at pixel 1 and trace all of its rays, then move on to pixel 2, and so on, until all pixels have been traced. On a four-processor machine, however, the task can be cut up like so:
This calculation would run four times faster than the single CPU case, minus a small amount for the overhead of splitting the image up, launching the threads, receiving the thread data when the threads return, and stitching the data back into a single image. Good software engineering can minimize, but not eliminate entirely, this overhead.
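The split, trace, and stitch pattern described above can be sketched in Python. This is only an illustration of the structure, not ZEMAX's implementation: `trace_pixel` is a hypothetical stand-in for tracing rays from one pixel, and cutting the image into horizontal strips is an assumption about how the work is divided.

```python
from concurrent.futures import ThreadPoolExecutor


def trace_pixel(value):
    # Hypothetical stand-in for tracing all rays from one source pixel.
    # This toy "optical system" simply inverts the pixel brightness.
    return 255 - value


def trace_strip(rows):
    # Trace every pixel in a contiguous block of image rows.
    return [[trace_pixel(v) for v in row] for row in rows]


def split_rows(image, n_workers):
    # Cut the image into n_workers strips of nearly equal height.
    size, extra = divmod(len(image), n_workers)
    strips, start = [], 0
    for i in range(n_workers):
        stop = start + size + (1 if i < extra else 0)
        strips.append(image[start:stop])
        start = stop
    return strips


def trace_image(image, n_workers=4):
    # Split the image, trace each strip in its own thread, then stitch
    # the strips back together in their original order.
    strips = split_rows(image, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        traced = list(pool.map(trace_strip, strips))
    return [row for strip in traced for row in strip]


# A small synthetic 10 x 8 "bitmap" of brightness values.
image = [[(r * 8 + c) % 256 for c in range(8)] for r in range(10)]
serial = trace_strip(image)                 # one worker, pixel by pixel
parallel = trace_image(image, n_workers=4)  # four strips, four threads
print(parallel == serial)
```

The splitting, thread launch, and stitching steps in `trace_image` are exactly the overhead the text refers to. Note that standard CPython threads do not speed up pure-Python arithmetic like this; the sketch shows the shape of the decomposition, not a benchmark.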
Every feature in ZEMAX runs as a separate thread, so it is independent of all other features. Not all features are internally multi-threaded, however. Launching and managing threads, receiving their data on completion, and stitching the results back together all carry an overhead.
Also, remember that each thread must include a full copy of all lens data. If every feature spawned multiple internal threads, memory use would rise quickly.
Instead, the computationally demanding features are internally multi-threaded. These include (but are not limited to) optimization, global optimization, tolerancing, Huygens calculations, diffraction calculations, physical optics and non-sequential ray-tracing. Many individual Analysis features are also internally multi-threaded. ZEMAX manages the multi-threading to ensure optimal use of the machine's resources for the task at hand.
For example, let's optimize the double Gauss sample file using a default wavefront merit function. Make all radii (except the surfaces with infinite radii) and thicknesses variable. Place an f/# solve on the last radius, and make it f/3. Then build a default merit function like so:
I then optimize it and tell ZEMAX to use all 8 CPUs:
Now, damped least squares (DLS) optimization involves taking the derivative of the merit function with respect to each variable. The calculation is multi-threaded at the variable level, so that each derivative can be computed in its own thread. However, when the optimizer is run, the result is surprising:
Only one CPU is being used! (13% CPU Usage represents only 1 of the 8 CPUs in the machine being used.) Why?
Well, in some respects this problem is too simple to benefit from multi-threading, or at least, it is too efficiently implemented in ZEMAX to need to be multi-threaded. This simple macro:
! How long does it take to compute the merit function?
FORMAT 4.3 EXP
cycles = 100
TIMER # set the timer
FOR i = 1, cycles, 1
dummy = MFCN() # update the merit function
NEXT i
PRINT "The average time to compute the merit function is: ", ETIM()/cycles, " seconds"
PRINT "Program End"
END
reveals:
Executing C:\Program Files (x86)\ZEMAX\MACROS\QUICKIE.ZPL.
The average time to compute the merit function is: 3.100E-004 seconds
Program End
The merit function takes only 310 microseconds to compute on one processor of this machine! Remember that the merit function in this case is the RMS wavefront error over three fields and three wavelengths. This shows the incredible efficiency of ZEMAX's Gaussian Quadrature and DLS algorithms. The overhead of copying all the data, launching threads, and receiving data back again is not justified when the merit function is computed this quickly. If a DENC (Diffraction Encircled Energy) operand is added to the merit function, the CPU utilization approaches 100%.
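The trade-off can be made concrete with a rough cost model. The sketch below is an assumption-laden illustration, not a description of ZEMAX internals: the number of variables (20) and the per-thread overhead (1 ms) are hypothetical values chosen only for the example, and the model assumes threads are launched one after another. Only the 310 microsecond merit function time comes from the measurement above.

```python
import math


def serial_time(n_tasks, t_task):
    # One CPU evaluates every task (e.g. one derivative per variable) in turn.
    return n_tasks * t_task


def threaded_time(n_tasks, t_task, n_cpus, overhead_per_thread):
    # Simplified model: tasks run in rounds across the available threads,
    # and each thread costs a fixed overhead (copying lens data, launching,
    # collecting results) that is paid serially.
    n_threads = min(n_tasks, n_cpus)
    rounds = math.ceil(n_tasks / n_threads)
    return rounds * t_task + n_threads * overhead_per_thread


N_VARIABLES = 20   # hypothetical number of optimization variables
N_CPUS = 8         # as in the example machine
OVERHEAD = 1e-3    # hypothetical per-thread overhead, in seconds

# Fast merit function (310 microseconds, as measured by the macro).
fast_serial = serial_time(N_VARIABLES, 310e-6)
fast_threaded = threaded_time(N_VARIABLES, 310e-6, N_CPUS, OVERHEAD)

# Slow merit function (hypothetical 0.1 s, e.g. with a costly operand).
slow_serial = serial_time(N_VARIABLES, 0.1)
slow_threaded = threaded_time(N_VARIABLES, 0.1, N_CPUS, OVERHEAD)

print("fast merit function: threading helps?", fast_threaded < fast_serial)
print("slow merit function: threading helps?", slow_threaded < slow_serial)
```

Under these assumed numbers, threading loses for the fast merit function and wins for the slow one, which is consistent with the behaviour described: the overhead only pays off once each evaluation is expensive enough.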
In general, a computation can be divided into parallelizable tasks, which can be spread over multiple CPUs, and serial tasks, which can only run in a single thread of execution. The CPU utilization shown in Task Manager is a rough representation of CPU usage, and during execution of a multi-threaded task you will see the CPU usage vary from 13% (a single CPU in this case) to 100%.
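The effect of the serial portion on overall speed is commonly estimated with Amdahl's law, which is not named in this article but matches the parallel/serial split described here. A minimal sketch follows; the parallel fractions used are hypothetical, not measured ZEMAX figures.

```python
def amdahl_speedup(parallel_fraction, n_cpus):
    # Amdahl's law: the serial part (1 - p) always runs on one CPU,
    # while the parallelizable part p is divided across n_cpus.
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cpus)


# Hypothetical parallel fractions on an 8-CPU machine.
for p in (0.0, 0.5, 0.9, 1.0):
    print(f"parallel fraction {p:.1f}: speedup {amdahl_speedup(p, 8):.2f}x")
```

Only a fully parallelizable task reaches the ideal 8x on 8 CPUs; any serial portion caps the achievable speedup, regardless of how many processors are added.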