The last decade witnessed rapid increase in multimedia and other applications that require transmitting and protecting huge amount of data streams simultaneously. For such applications, a high-performance cryptosystem is compulsory to provide necessary security services. Elliptic curve cryptosystem (ECC) has been introduced as a considerable option. However, the usual sequential implementation of ECC and the standard elliptic curve (EC) form cannot achieve required performance level. Moreover, the widely used Hardware implementation of ECC is costly option and may be not affordable. This research aims to develop a high-performance parallel software implementation for ECC. To achieve this, many experiments were performed to examine several factors affecting ECC performance including the projective coordinates, the scalar multiplication algorithm, the elliptic curve (EC) form, and the parallel implementation. The ECC performance was analyzed using the different factors to tune-up them and select the best choices to increase the speed of the cryptosystem. Experimental results illustrated that parallel Montgomery ECC implementation using homogenous projection achieves the highest performance level, since it scored the shortest time delay for ECC computations. In addition, results showed that NAF algorithm consumes less time to perform encryption and scalar multiplication operations in comparison with Montgomery ladder and binary methods. Java multi-threading technique was adopted to implement ECC computations in parallel. The proposed multithreaded Montgomery ECC implementation significantly improves the performance level compared to previously presented parallel and sequential implementations.

Elliptic Curve cryptosystem (ECC) is a next-generation approach to public key cryptosystems that uses relatively small keys to provide the same or greater level of security compared to the other public-key cryptosystems [

The performance of ECC is affected by the choices of implementation environment; software and hardware implementations, and the domain parameters such as field representation, Elliptic curve (EC) form, algorithm for scalar multiplication, and the coordinates system. Every one of the aforementioned parameters plays important role in the ECC performance [

The current research investigates the ECC performance using the different parameters, and selects the best possible parameters’ choices and scalar multiplication algorithms to optimize the cryptosystem's performance. This study considers software implementation of ECC since it is cost-effective and consumes less resources in comparison with hardware implementation. The rest of this article is organized as follows: Section 2 mathematical background, Section 3 related works, Section 4 proposed cryptosystem design, Section 5 results and discussion, and finally the conclusion in Section 6.

ECC is a type of public key cryptography that depends on the discrete logarithm problem for EC. ECC has two levels of computations; upper and lower levels of computations. The main operation in the upper level is the scalar multiplication, which consists of two operations; point doubling and point addition (called point operations). The lower level, on the other hand, includes finite field (called Galois field GF) computations, which are modular addition, subtraction, multiplication, and division operations. The latter is the most time consuming operation because it requires finding the multiplicative inverse [

The scalar multiplication algorithm performs either one or both of point operations in each iteration. The point addition operation adds two different points G and Q on the EC to obtain another EC point R. The point addition can be calculated via computing the coordinates (x_{3}, y_{3}) of the resulting point, as follows:

G(

Alternatively, the point doubling operation adds the point to itself (Q = G) and can be calculated as follows:

It worth mentioning that the EC form does not affect the point addition calculations, hence it relies only on the two points on the curve. On the contrary, point doubling calculation changes according to the EC form and the use of coordinate systems, since it requires to derive the slope equation from the elliptic curve equation.

There are two common types of finite fields in which ECC computations are applied; the prime field GF(P) and the binary field GF(

Several elliptic curve forms over GF(P) were presented previously. Among the most important forms is the Montgomery curve since it has less computational complexity, which makes it appropriate for security applications that need high speed ECC.

The Montgomery curve equation is defined as follows: E:

To find all points on the Montgomery curve over GF(P), substitute the value of x = 0, 1,…P in the Montgomery curve equation then take the square roots to find the value of y. The EC is distinguished from other curves since it is symmetric about the x-axis; every point on the elliptic curve has a mirrored point on the other side of the x-axis. For any non-vertical line drawn through the elliptic curve, it will intersect exactly at three points. The EC has a point Ɵ called point at infinity [

Points on a curve can be represented using different types of coordinates systems. The standard coordinates is affine coordinates P(x, y). Alternatively, the use of projective coordinates introduced to improve the performance of elliptic curve computations due to its ability to avoid the time-consuming inversion operation. In projective coordinates a point

The main operation in ECC encryption is the scalar multiplication, which is usually used to judge the cryptosystem performance. Three main algorithms were used to perform scalar multiplication, which are the Montgomery ladder, NAF, and Binary left to right (LTR). The scalar multiplication algorithms vary in terms of speed and security levels [

There has been an intense amount of researches done in the field of ECC since it was introduced in 1985. The majority of previous studies investigated possible factors to improve ECC performance such as the use of certain EC form, the use of projective coordinates, the parallel and concurrent execution of elliptic curve computations, and the use of efficient scalar multiplication algorithms. Again, all these factors affect the ECC performance and should be considered when designing high-speed cryptosystem. Other research works focused on safeguarding ECC against certain types of attacks such as simple time attack (STA). This section surveys the most important research works that studied the possible improvements on the performance and security of ECC.

EC point operations (point doubling (ECDBL) and point addition (ECADD)) include the time expensive modular Inversion when using affine coordinates. Many researchers proposed using projective coordinates systems to eliminate the modular Inversion [

There is a plenty of studies that examined the hardware implementation of ECC forms. Some of them used sequential implementation [

However, the major disadvantage of hardware implementations is that they require a dedicated crypto-processor and hardware components, entailing significant increase in the cost of designing and implementing the cryptosystem. This makes hardware implementation of ECC not affordable and complicated choice for many security applications, especially those applications with limited resources. Software implementations for ECC, on the other hand, have low development costs, easy to upgrade, and more flexible [

Authors of [

In [

Authors of [

The study presented in [

It has been noticed that majority of previous research works did not conduct performance testing on their software implementations of ECC to highlight the efficiency of their proposed cryptosystems. In addition, previous studies focused on the sequential software implementation for ECC computations and considered one scalar multiplication algorithm per each proposed ECC implementation.

Furthermore, the reviewed research works that studied the software implementation of ECC focused on the standard EC form and did not investigate the use of newly introduced EC representations such as Montgomery, and tripling oriented curves. Another issue in previous works is that they are predominately use the affine coordinates system to represent the EC points, which leads to a significant drawback in ECC performance. This is because the use of affine coordinates requires finding the multiplicative inverse to perform the modular division operation. Computing the inverse is the most time consuming operation in EC cryptography.

Although, the projective coordinates were used in some studies, those researches did not benefit from the parallel and concurrent execution of ECC computations with projective coordinates. Instead, the vast majority of reviewed researches used the sequential implementation of ECC, in which only one finite field operation can be performed in each level of computations. Again, this increases the time delay of ECC operations and exposes the cryptosystem to the risk of side channel attacks. The sequential software implementation of ECC in not efficient for security applications that require high-speed cryptosystems such as multimedia and military applications.

This research seeks to develop improved software implementation for ECC. To this end, the different factors affecting the ECC performance will be investigated, which are the use of projective coordinates, the use of EC forms with less computations complexity, the scalar multiplication algorithm, and the use of parallel software implementation to perform ECC computations. In the proposed ECC's implementations, the different factors are tuned up and combined to reduce the time delay. The ECC performance is evaluated and analyzed to find the best combination of these factors that achieves the optimum performance level. This aims to develop a high-performance and efficient ECC. To our knowledge, this combination of factors and optimization of the software implementation of ECC have never been proposed and studied previously.

This section presents the equations and methods used to implement ECC computations. The parallel computational schemes, which are required for parallel software implementations, are designed in this section. Eventually, this section introduces the enhanced software implementation of ECC.

This research uses the recently introduced form of EC called Montgomery curve, which was presented in Section 2. The Montgomery EC is selected since it has relatively low computational complexity. EC points are represented using projective coordinates instead of usual affine form to avoid the modular division operation. Three types of projective coordinates (Homogenous, Lopez Dahab, and Jacobian) were tested in this research to find the one that gives the best performance.

The computations of point doubling and addition operations were implemented in parallel manner via utilizing the inherent parallelism in ECC computations. Different parallelization levels were tested to find out the best level that achieves the least time delay.

The major scalar multiplication algorithms were examined in this research. In particular, we evaluated the performance of ECC using the Binary method, the NAF algorithm, and the Montgomery scalar algorithm. Proposed ECC implementations are evaluated and compared in terms of time consumption of the scalar multiplication operation. The Java programming language is used to implement the proposed ECC and perform encryption and decryption operations. The following section presents the underlying computations for Montgomery point operations using the three coordinates systems.

In this section, the ECC point computations using projective coordinates are illustrated. The main EC point operations are point doubling and point addition. Each one of them requires performing a number of finite filed computations; multiplication, addition, and subtraction. The projective coordinates use three dimensions (x, y, and z) to represent an EC point instead of two dimensions. The point addition operation adds two different EC points to find a third point, while the point doubling operation duplicates an EC point to find another point. Equations of point doubling and addition are presented in Section 2.

The Homogeneous coordinates were used to represent the points on the Montgomery curve. The point doubling operation computes the third point represented by the coordinates X3, Y3, and Z3. This can be done using the following equations:

In the Lopez-Dahap coordinates, the EC point is represented using the coordinates system. The equations used to compute the result of Montgomery point doubling using Lopez-Dahap coordinates are as follows:

The Jacobean coordinates system uses the following equations to find the result of the Montgomery point doubling operation:

The point addition computations, on the other hand, are similar for all ECs over GF(P) and are not affected by the change in EC equation. The point addition computations are presented in great details in [

In this section, the ECC computational schemes to implement finite field arithmetic are designed. All possible design choices to perform ECC points computations were investigated; starting from the sequential design tell reaching the design with maximum parallel operations. The rationale behind parallelizing ECC computations is to reduce the time delay as much as possible and hence improving the cryptosystem performance. This study focused on the parallel ECC design that achieves the shortest time delay. The main disadvantage of sequential implementation of ECC is that it consumes longer time to implement the cryptosystem's computations and wastes the CPU capability by executing only one modular operation per each step.

The current study aims to improve the speed of ECC through utilizing the CPU's capability to allow the concurrent execution of multiple modular operations simultaneously. In this section we present the data flows from inputs to outputs of the Montgomery ECC point Doubling over GF(P). These designs created by mapping the computational operation to a parallel software designs.

The proposed designs use four parallel multiplications (4-PM) and two parallel additions (2-PA), since it is considered the best parallelization level that yields the highest performance for point doubling. In each level of computations, either addition or multiplication operations are performed. The operations in each level are executed in a parallel manner. This was achieved through using the multi-threading technique in software implementation as will be clarified in Section 4.3.

Note that the modular operations within each sequential level cannot begin unless the operations from the previous level are fully executed.

Unlike the sequential design, the proposed ECC designs exploit the inherited parallelism in EC commutations by allowing the concurrent executions of multiple modular operations within each level.

It can be noticed from

This section elaborates on the software implementation of proposed parallel and high-speed GF(P) ECC using Montgomery curve. First, we illustrate the implementation of modular arithmetic; multiplication, addition, subtraction, and inversion operations. Then, the parallel multi-threaded implementation of point doubling and addition operations is presented. The software implementation of the scalar multiplication using the Montgomery ladder, the NAF, and the Binary algorithms is shown in this section. Finally, an ECC encryption and decryption experiments are performed to verify the cryptosystem efficiency and evaluate its performance level.

For Modular arithmetic operations, the java BigInteger class is used to allow very large integer (typically 256-bit) calculations beyond the limits of all primitive data types, which cannot handle values greater than 64 bits. In addition, the java BigInteger class provides a set of methods to support the execution of modular addition, modular subtraction, modular multiplication and modular inversion. All of these operations are implemented using built-in methods in the BigInteger class. For example, the modular multiplication is implemented by using the multiply and mod methods of BigInteger class. The following code segment shows an example for the modular multiplication operation used in the proposed cryptosystem.

BigInteger A = new BigInteger(“587995594274”);

BigInteger B = new BigInteger(“2000000000000”);

BigInteger P = new BigInteger (“504842927415879955942747”);

BigInteger C = A.multiply(B).mod(P);

System.out.println(“The result of A * B mod P is “+C);

In this research, we examined all possible design choices for Montgomery ECC using the three main projections. The next section presents the software implementations for the most efficient design choices used to perform ECC points operations.

To facilitate the points operations on an elliptic curve, we implemented Point and EllipticCurve Classes. The Point Class has three private members, representing the X and Y and Z coordinates of a point. These members are objects of the BigInteger class.

The EllipticCurve class handles the domain parameters and have public functions pointAddition() and PointDoubling () and scalarMultplication(). The pointAddition() and PointDoubling () methods have sequential and multithreaded java implementations for each of the three coordinates systems.

While the sequential implementation of point operations is straightforward and performs one operation per each level, the parallel implementation of point operations reflects the parallel designs presented in Section 4.2 and exploit the inherited parallelism in ECC computations. To achieve this, the multi-threading technique in java was used to allow the parallel execution of ECC computations. The Java threads are independent and save time; multiple operations can be performed simultaneously and separately. Three main methods from the Java Thread class were used in the proposed implementation, as follows:

Run (): It is used to do an action for a thread.

start (): It is used to trigger the execution of the finite field operation within the thread.

join (): It is used to wait for the results of the finite field operation within a specific thread. This method can be used for the caller thread to wait for the completion of called thread.

At each level, multiple threads are created; one thread for each finite field operation within the level. Within the bracket of the Run method, any segment of Java code can be placed. In our implementation, a prime finite field operation using BigInteger Class's object and procedures is used. For the purpose of controlling the order of the executions of the levels within the parallel design, the join () method used, to make the main thread waits for the completion of all called threads that perform the field operations within a particular level. The code segment below showcases the creation and start of a Java thread used in the parallel Implementations of ECC.

Thread thread1 =

thread1.start();

Within the bracket of the Run method, any segment of Java code can be placed. In our implementation, a prime finite field operation using BigInteger Class's object and procedures is used. And for the purpose of controlling the order of the executions of the levels within the parallel design, the join () method used, to make the main thread waits for the completion of all called threads that performs the finite field operations within the same level. For instance, in the code of the parallel implementation for Montgomery Point Doubling using Homogeneous projection, four thread objects were created at the first multiplication level and named thread1, thread2, thread3, and thread4. In this level, each thread responsible for carrying out the result of modular multiplication assigned to it within the design presented in

At the end of a particular level of computations, each thread was created in this level will call the join method to ensure that each thread completes its calculations before starting the next level. Other levels of computations for ECC points are implemented in parallel using similar mechanism. Sixteen java threads were created to implement the Montgomery point doubling using Homogeneous Projective system.

Thread thread1 =

thread1.start();

Thread thread2 =

thread2.start();

Thread thread3 =

thread3.start();

Thread thread4 =

thread4.start(); thread1.join(); thread2.join(); thread3.join(); thread4.join();

The scalar multiplication is the most critical operation in ECC. The current research investigates the three main algorithms used for performing the scalar multiplication, which are the Montgomery Ladder, the NAF, and the Binary algorithms. This aims to determine the fastest algorithm that provides best performance level for ECC encryption/decryption processes. Experimental results showed that the NAF algorithm is faster than the Montgomery ladder and the Binary Left-to-Right method, because it requires less number of point addition operations. The following code segment is used to implement the NAF algorithm after computing the signed binary representation (NAF) of the scalar K through the function (computeNAF(k)).

Point R =

BigInteger [] s = {computeNAF}(k); // compute the NAF representation

R = G; //The result point R initialized to the input point G

R =

R = PointAddition(R,G); // if value = 1, perform a point addition R = R + G

{Point inverseG =

R = PointAddition(R, inverseG); // If value = −1 perform a point subtraction

} }

In order to test the proposed cryptosystem, and verify its functionality, a number of encryption and decryption experiments were conducted. Moreover, this contributes in evaluating the ECC performance more precisely, and assists developers to build high-speed and efficient cryptosystems.

For multithreaded implementations of point operations. all of the tests are run on a Dell PC (Intel(R) Core(TM) i7–4770 CPU @ 3.40 GHz, 4GB RAM), running the Windows 7 operating system. The code was compiled and run with Java version 1.8 using the Eclipse IDE tool. The M-221 curve used for the implementation and testing of the cryptosystems.

The process of implementing the ECC encryption and decryption went through the following phases:

To set up an ECC, both senders and receivers are required to select and agree upon the same EC domain parameters, i.e., the underlying finite field

Each side selects a random integer as its private keys and then computes its public key. The selectPrivatekey, getRandomNumber, and computePublic_key methods were used for the key generation purposes.

In this phase, each char in the message is mapped to its corresponding point in the lookup table that the senders and receivers have agreed upon. The encoding method takes an array of char types and returns an array of Point types. The method loop the input array to map each char element with its EC Point using the ASCII value of the char, and then this value is used as an index for the point representing this char.

4)

In this phase, the cryptosystem encrypts the point array representing the message after encoding to produce a cipher array. The encryption function is a part of the EllipticCurve class, and takes as an input; the Private Key, the message points, and the other Party's Public Key. This method returns a cipher array of points for each char in the message after encryption. The encryption process is implemented through the following code segment:

Point [] cipher =

cipher =

Point[] Encryption (BigInteger Private_Key, Point[] M , Point otherPartyPublicKey)

Ciphers[n] = PointAddition(M[n], scalarMultplication(Private_Key, otherPartyPublicKey));

5)

In the decryption phase, the cryptosystem decrypts the ciphertext using a member method of the EllipticCurve Class. The Decryption method takes the cipher array of points as an input and returns the message mapped points back. The decryption process inverses the encryption to restore the plaintext.

6)

In the message decoding phase, each elliptic curve Point in the cipher array is mapped back to its corresponding char using the lookup table that both sender and receiver have agreed upon at the setup phase. The decoding method takes an array of Point types and returns an array of Char types representing the plaintext.

The following section presents the experimental results for the proposed ECC, and discussion about the performance evaluation.

GF(P) operations | Average execution time (nanosecond (ns)) |
---|---|

Multiplication | 731891 ns |

Addition | 647204 ns |

Inversion | 1393295 ns |

EC operation/ Projective coordinates system | Homogenous | Lopez–Dahab | Jacobian | |||
---|---|---|---|---|---|---|

Sequential | Parallel | Sequential | Parallel | Sequential | Parallel | |

Montgomery point doubling | 447402 ns | 176611 ns | 458541 ns | 191111 ns | 429292 ns | 384655 ns |

Point addition | 411182 ns | 382558 ns | 434728 ns | 404962 ns | 393655 ns | 368348 ns |

In order to highlight the improvement on the performance achieved by using the proposed parallel implementation,

It can be observed from the results presented in

EC operation/Projective coordinates system | Homogenous | Lopez–Dahab | Jacobian |
---|---|---|---|

Montgomery point doubling | 253% | 239% | 111% |

Point addition | 107% | 107% | 106% |

The scalar multiplication is the main operation in ECC encryption, and usually used to evaluate the cryptosystem performance.

Scalar algorithms | Sequential | Parallel | ||||
---|---|---|---|---|---|---|

Homogenous | Lopez–Dahab | Jacobian | Homogenous | Lopez–Dahab | Jacobian | |

Montgomery ladder | 18.733 | 22.269 | 21.624 | 14.608 | 19.545 | 17.448 |

NAF method | 13.728 | 14.225 | 12.005 | 5.335 | 12.981 | 11.154 |

Binary (LTR) | 17.772 | 19.557 | 20.972 | 9.722 | 17.481 | 19.440 |

It can be noticed from

The research works | ECC operation | Time delays | Proposed implementation |
---|---|---|---|

[ |
Encrypt | 50.68153 ms | 8.009 ms |

[ |
ECSM | 7.762 ms | 5.335 ms |

[ |
ECADD | 3770 ms | 5.335 ns |

[ |
ECSM | 556.51 ms | 5.335 ns |

The explosive increase in the amount of data being transmitted over communication networks makes it necessary to develop a high-speed cryptosystem. Elliptic curve cryptosystem (ECC) has been proposed as an efficient public key cryptosystem since it can provide comparable security level to other systems with using shorter key length. However, ECC using its usual sequential implementation and affine coordinates cannot achieve adequate performance level to protect the huge amount data processes by modern applications such as multimedia and military. Moreover, the majority of previous studies investigated the hardware implementation of ECC, which is costly option, and used only one standard curve.

In this article, a high-speed parallel software implementation for ECC is developed. It studied and tuned-up the main factors playing significant role in improving ECC performance. A newly introduced form called Montgomery curve is studied in this research because it has lower computational complexity, and hence requires less field operations. This research studied the performance level of sequential and parallel implementation of Montgomery ECC. The parallel implementation was done using Java multithreading techniques. Three projective coordinates were implemented and examined due to their ability to avoid the long time inversion operation. It has been shown the proposed multi-threaded ECC implementation using Homogenous projection achieved the best performance results. Such implementation is efficient choice for applications that need fast encryption process. For sequential Montgomery ECC, Jacobian projection obtained the shortest time delay. As a future research, other forms of EC and extensions of projective coordinates can be studied, since it may reduce the time complexity. Parallelizing the upper level of computations for ECC could be considered to increase the speed of encryption process as well.

Authors Acknowledge the support from Imam Mohammad Ibn Saud Islamic University for this research.