### 2.1 Experimental setup

The basic experimental set-up is shown schematically in Fig. 1a. The plaintext image, \(f(x_0,y_0)\), where \((x_0,y_0)\) denotes the coordinates of the input plane, was displayed on an amplitude-only spatial light modulator (SLM-A). A 4-f imaging system was then used to project the plaintext image displayed on SLM-A to the input of the proposed encryption engine, which is depicted in Fig. 1b. The engine has a cascaded structure, each stage of which is composed of a phase-only SLM (SLM-P) and a photorefractive crystal [a Sr\(_{0.61}\)Ba\(_{0.39}\)Nb\(_2\)O\(_6\) (SBN:61) crystal in this study]. In our experiments, the first phase-only SLM (SLM-P\(_1\)) was placed on the conjugate plane of SLM-A so as to introduce a random phase, R\(_0=\exp [j\phi (x_0,y_0)]\), to the plaintext image, resulting in a random-phase-encoded image \(\psi _0(x_0,y_0)=f(x_0,y_0)\exp [j\phi (x_0,y_0)]\), where the subscript 0 in \(\psi\) stands for the axial position \(z=0\). This random-phase-encoded image \(\psi _0(x_0,y_0)\) was then projected to the front surface of an SBN:61 crystal whose crystalline *c*-axis was perpendicular to the beam propagation direction. The complex wave field at the back surface of the SBN:61 crystal was then projected to SLM-P\(_2\), the second phase-only SLM, which random-phase encoded the incoming light field with R\(_1=\exp [j\varphi (x_1,y_1)]\) displayed on it. The resulting complex image is called the cyphertext image, written as *g*(*x*, *y*) for convenience. This complex cyphertext image was recorded holographically by interfering it with an additional reference beam, usually a plane wave with a known carrier frequency, as shown in Fig. 1a. For the SBN crystal, we used the self-defocusing nonlinearity, which is evoked by applying an external negative electric field *E* along the *c*-axis.
Technically, this corresponds to a refractive-index change \(\delta n \propto r_{33} E\bar{I}/(1+\bar{I})\), where \(\bar{I}\) is the input intensity \(|\psi _0(x,y)|^2\) measured relative to a background (dark current) intensity [33], and \(r_{33}=255\) pm/V is the electro-optic coefficient with respect to the applied field *E* and the *c*-axis [34].
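
To make the saturable form of \(\delta n\) concrete, the following minimal Python sketch evaluates it for a given normalized intensity. The text only states a proportionality, so the prefactor \(\tfrac{1}{2}n_e^3\), the value \(n_e=2.3\), and the function name `delta_n` are assumptions introduced here for illustration; only \(r_{33}=255\) pm/V and the \(\bar{I}/(1+\bar{I})\) dependence come from the text.

```python
import numpy as np

def delta_n(intensity, e_field, r33=255e-12, n_e=2.3, i_background=1.0):
    """Saturable index change, delta_n ∝ r33 * E * I_bar / (1 + I_bar).

    ASSUMPTIONS (not given in the text): the prefactor 0.5 * n_e**3 and
    n_e = 2.3. e_field is in V/m, r33 in m/V.
    """
    i_bar = np.asarray(intensity, dtype=float) / i_background
    prefactor = 0.5 * n_e**3 * r33          # assumed proportionality constant
    return prefactor * e_field * i_bar / (1.0 + i_bar)

# E = -500 V/cm expressed in V/m; a negative field gives a negative
# (self-defocusing) index change that saturates at high intensity.
e_field = -500 * 100.0
print(delta_n([0.0, 1.0, 10.0, 1e3], e_field))
```

Under these assumed values the index change stays several orders of magnitude below the linear index and saturates as \(\bar{I}\) grows, which is the qualitative behavior the proportionality describes.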

### 2.2 Theory and experimental results

In the experimental demonstration, we used the binary image shown in Fig. 2a as the plaintext for simplicity. Its direct image through our experimental set-up is shown in Fig. 2b. It was taken by the camera when the binary image shown in Fig. 2a was displayed on SLM-A while the two SLM-Ps and the external electric field *E* were switched off. The distortion exhibited in the image was mainly due to the imperfection of the crystal and the aberration of the imaging optics. More careful alignment of the optics did not bring significant improvement in our experiments. Nevertheless, we take it as the ground-truth plaintext image in our proof-of-principle demonstration. To encrypt the plaintext image, we displayed two statistically independent random phases on the two phase-only SLMs and turned on the nonlinearity. Mathematically, this nonlinear encryption process can be written as

$$\begin{aligned} g(x,y)&=~T\{\exp [j\varphi (x_1,y_1)]\psi _{z_1}(x_1,y_1);z_2\}\nonumber \\&=~T\{\exp [j\varphi (x_1,y_1)]T\{\psi _0(x_0,y_0);z_1\};z_2\}, \end{aligned}$$

(1)

where the transform \(T\{\cdot \}\) is defined as the nonlinear Schrödinger transform whose integral form is given by [35]

$$\begin{aligned} \psi _z(x,y)=\mathrm {FST}\{\psi _0(x_0,y_0);z\}-j\int _0^zU(z-z')\delta n(|\psi _{z'}(x',y')|^2)\psi _{z'}(x',y')\mathrm {d}z', \end{aligned}$$

(2)

where \(\mathrm {FST}\{\psi _0(x_0,y_0);z\}\) denotes the linear propagation of \(\psi _0(x_0,y_0)\) over a length *z* within the crystal, *U*(*z*) is the free Schrödinger operator given by \(U(z)\propto \exp [ik\Delta /z]\), where *k* is the wave vector and \(\Delta\) denotes the transverse Laplacian, and \(\delta n(|\psi _{z'}(x',y')|^2)\) is the refractive-index change induced by the nonlinearity of the crystal at the plane \(z'\). The nonlinear term in Eq. (2) suggests that any initial change to the beam is accumulatively amplified upon propagation. As a consequence, the spatial modes of the beam evolve in a coupled manner, and new modes are even generated owing to the wave-mixing process [36], rather than propagating independently as in a linear system [7, 8], on which all current techniques for optical image encryption operate. It is in this way that the proposed scheme breaks the linearity.

As mentioned above, the cyphertext obtained in this way is a complex-valued image [the intensity of which is shown in Fig. 2c]. It should therefore be recorded using interferometry-based techniques such as digital holography [37]. This makes the encryption process described by Eq. (1) reversible, provided that the nonlinear medium is fully characterized and the amplitude and phase of the cyphertext image *g*(*x*, *y*) are known [36, 38]. Thus the plaintext can be reconstructed numerically from the digital hologram of the cyphertext, with the complex conjugates of the two random phase keys applied at their respective planes to demodulate the random phases:

$$\begin{aligned} f(x_0,y_0) = \exp [-j\phi (x_0,y_0)]T\{\exp [-j\varphi (x_1,y_1)]T\{g(x,y);-z_2\};-z_1\} \end{aligned}$$

(3)

The decrypted image with the correct keys is shown in Fig. 2d. This demonstrates that the numerical decryption can reverse the wave-mixing process and demodulate the random phase. Here the external field applied along the *c*-axis of the SBN crystal was \(E=-500\) Vcm\(^{-1}\), and the geometric parameters were \(z_1=9.7\) mm and \(z_2=8\) mm. Note that only the first nonlinear transform was performed optically, because only one SBN:61 crystal (with a size of \(4.4\times 4.4\times 9.7\) mm\(^3\)) was available for this experiment. The second nonlinear transform was performed numerically. Various algorithms have been proposed for the numerical solution of the nonlinear Schrödinger equation [39]. Here we simply employed the split-step Fourier propagation method, which has been used extensively in nonlinear optics [34, 36, 40].
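
As an illustration of how Eqs. (1) and (3) can be evaluated with a split-step Fourier scheme, the sketch below propagates a toy field through two cascaded nonlinear transforms and then reverses them. It is not the code used in this work: the dimensionless units, the grid size, the saturable phase coefficient `gamma`, the sign convention of the diffraction term, and the function names `nonlinear_transform`, `encrypt`, and `decrypt` are all assumptions made for illustration.

```python
import numpy as np

def nonlinear_transform(psi, z, gamma, dx=1.0, k=1.0, steps=200):
    """Symmetric split-step sketch of T{psi; z}; a negative z back-propagates.

    ASSUMPTIONS: dimensionless units, periodic boundaries, paraxial
    diffraction, and a saturable nonlinear phase gamma * I / (1 + I).
    """
    n = psi.shape[0]
    freq = 2 * np.pi * np.fft.fftfreq(n, d=dx)
    kx, ky = np.meshgrid(freq, freq, indexing="ij")
    dz = z / steps
    half_linear = np.exp(-1j * (kx**2 + ky**2) * dz / (4 * k))  # half diffraction step
    for _ in range(steps):
        psi = np.fft.ifft2(half_linear * np.fft.fft2(psi))
        inten = np.abs(psi) ** 2
        psi = psi * np.exp(1j * gamma * dz * inten / (1 + inten))  # phase-only step
        psi = np.fft.ifft2(half_linear * np.fft.fft2(psi))
    return psi

rng = np.random.default_rng(0)
n = 64
f = np.zeros((n, n))
f[20:44, 28:36] = 1.0                        # toy binary plaintext
phi = rng.uniform(0, 2 * np.pi, (n, n))      # phase of key R0
varphi = rng.uniform(0, 2 * np.pi, (n, n))   # phase of key R1
z1, z2, gamma = 9.7, 8.0, -0.5               # hypothetical dimensionless values

def encrypt(f, phi, varphi):
    """Eq. (1): g = T{exp(j varphi) T{f exp(j phi); z1}; z2}."""
    psi1 = nonlinear_transform(f * np.exp(1j * phi), z1, gamma)
    return nonlinear_transform(np.exp(1j * varphi) * psi1, z2, gamma)

def decrypt(g, phi, varphi):
    """Eq. (3): f = exp(-j phi) T{exp(-j varphi) T{g; -z2}; -z1}."""
    psi1 = np.exp(-1j * varphi) * nonlinear_transform(g, -z2, gamma)
    return np.exp(-1j * phi) * nonlinear_transform(psi1, -z1, gamma)

g = encrypt(f, phi, varphi)
f_hat = decrypt(g, phi, varphi)
print("max reconstruction error:", np.max(np.abs(f_hat - f)))
```

In this idealized sketch the nonlinear step changes only the phase, so each split step is undone by the same step taken with the opposite sign of `dz`. This mirrors the reversibility argument above, but it ignores crystal imperfections and light-induced scattering.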

In the extreme case where the nonlinearity reduces to zero (no applied voltage across the crystal), i.e., the second term in Eq. (2) is absent, the system becomes a Fresnel-based system [8], except that the beam propagates in the crystal rather than in free space. The intensities of the cyphertext and the decrypted image are plotted in Fig. 2c and d, respectively. One can see that the plaintext image recovered in the linear case is comparable to that in the nonlinear case shown in Fig. 2f. Both images are slightly distorted in comparison with the ground truth, either because the optics were not perfectly aligned in our proof-of-principle experiments or because the numerical reconstruction algorithm did not take the imperfection of the crystal into account. Further calibration of the algorithm with respect to the experimental setting will help improve the reconstructed results [36]. Compared with its linear counterpart, the nonlinearly encrypted cyphertext image is more obliterated by virtue of the nonlinear self-defocusing and the light-induced scattering that arises from amplification of the beam scattered by imperfections of the crystal [41]. Such a difference in the intensity patterns has been observed even without random phase modulation [36], and thus has the potential to add further physical security features.

The plaintext image can be recovered even when the nonlinearity is further increased. However, it can be more seriously distorted because the light-induced scattering effect is stronger in this case. In Fig. 2h, we plot the reconstructed plaintext image when the external field was tuned to \(E=-1000\) Vcm\(^{-1}\). It is clearly seen that the noise is amplified as the nonlinearity increases, and thus the recovered image is distorted. The distortion becomes severe when the external field goes up to \(-2000\) Vcm\(^{-1}\) (Fig. 2j), even when all the keys are correctly presented. It is quite challenging to remove the light-induced scattering effect in the numerical decryption algorithm because it can be invoked by imperfections anywhere inside the crystal [42] or even on its surface [43]. In addition, the actual nonlinear effects can be even more intriguing [44]. For example, wave mixing in the self-defocusing crystal can induce focusing as well [45, 46]. The numerical decryption algorithm at the current stage does not take these effects into account. Fortunately, they can be ignored when the nonlinearity is not too strong, as in our study. The experimental results confirm that the proposed encryption system works well at small nonlinearity, as the light-induced scattering can be very weak in this case [47]. Indeed, in numerical simulation, the recovered plaintext images are identical to the ground truth regardless of the nonlinearity strength. Details can be found in the Appendix.

### 2.3 Tolerance analysis

For an optical encryption system, it is important to analyze how misalignment of the keys affects the decryption performance, since decryption relies explicitly on the reversibility of the system. One expects the decryption to be sensitive to the alignment, as otherwise the modulation cannot be feasibly undone. However, a certain level of tolerance to misalignment is desirable in practice.

As described in Eq. (1), there are several keys to the proposed system: the random phases R\(_0\) and R\(_1\), their geometric positions in the system, and the nonlinearity. To perform the tolerance analysis, we assume that the correct random phases R\(_0\) and R\(_1\) are presented. Otherwise it is not possible to recover any meaningful image. This has been well studied in the linear counterpart [7], and one can expect that the situation will not improve in the nonlinear case. Thus we focus on the tolerance to misalignment of the random phases along the transverse and longitudinal directions, and to a change of the nonlinearity strength used for decryption with respect to that used for encryption. We examine these factors independently.

First, we analyze the tolerance to the displacement of the random phase key along a transverse direction. To perform the analysis, one can first calculate the complex conjugate of a cyphertext image, \(g^*(x,y)\), and then numerically reverse the second nonlinear transform in Eq. (1). The resulting complex disturbance can be written as \(\exp [-j\varphi (x_1,y_1)]T\{\psi _0^*(x_0,y_0);-z_1\}\). If R\(_1\) is not placed at its original position but is translated transversely over a distance \(\Delta x_1\) along the *x*-axis, the demodulated image can be written as \(\exp [j\varphi (x_1-\Delta x_1,y_1)]\exp [-j\varphi (x_1,y_1)]T\{\psi _0^*(x_0,y_0);-z_1\}\). This means that the random phase cannot be demodulated completely in this case. The residual phase distortion \(\exp \{j[\varphi (x_1-\Delta x_1,y_1)-\varphi (x_1,y_1)]\}\) invokes speckle noise, which is accumulatively amplified upon nonlinear propagation through the crystal [42, 47]. As a result, the recovered plaintext image is corrupted by noise, and its quality can be evaluated with a standard metric such as the normalized mean-squared error (NMSE). One can expect the NMSE value to increase monotonically with \(\Delta x_1\) from 0 to \(l_x\), the correlation length of R\(_1\). Indeed, we observed a linear relation between them, as shown in Fig. 3a. In comparison with the linear counterpart [48], the proposed nonlinear encryption engine is more sensitive to the transverse translation of R\(_1\): as one can see from the inset in Fig. 3a, a position mismatch of \(l_x/2\) is sufficient to corrupt the decrypted image completely.
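
The two quantities used in this analysis can be sketched in a few lines of Python. The NMSE is not defined explicitly in the text, so the normalization below (squared magnitude error divided by the energy of the reference) is an assumption, as are the function names, the integer-pixel shift, and the circular boundary handling via `np.roll`.

```python
import numpy as np

def nmse(recovered, reference):
    """Assumed definition: sum (|recovered| - |reference|)^2 / sum |reference|^2."""
    err = np.sum((np.abs(recovered) - np.abs(reference)) ** 2)
    return err / np.sum(np.abs(reference) ** 2)

def residual_phase(varphi, shift):
    """exp{j[varphi(x1 - shift, y1) - varphi(x1, y1)]} for an integer pixel
    shift along x (axis 1), with periodic wrap-around (an assumption)."""
    return np.exp(1j * (np.roll(varphi, shift, axis=1) - varphi))

rng = np.random.default_rng(1)
varphi = rng.uniform(0, 2 * np.pi, (64, 64))  # toy key with a one-pixel correlation length

for shift in (0, 1, 2):
    r = residual_phase(varphi, shift)
    # |mean| is 1 for a perfectly aligned key and drops once the key is displaced.
    print(shift, abs(np.mean(r)))
```

For a key whose correlation length is one pixel, as in this toy example, any nonzero shift leaves an essentially random residual phase, which is consistent with the statement that a transverse mismatch behaves like a wrong key.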

The response to the axial translation of R\(_1\) can be seen clearly by writing the decrypted image as \(\hat{f}(x_0,y_0) = \exp [-j\phi (x_0,y_0)]T\{\exp [-j\varphi (x_1,y_1)]T\{g(x,y);-z_2-\Delta z\};-z_1\}\) according to Eq. (3). The noise comes from the mismatch \(\Delta z\) and is further amplified by the second nonlinear transform in the decryption process. Thus the level of noise is expected to increase as \(\Delta z\) increases in either the \(+\) or the \(-\) direction, as evidenced by the experimental results shown in Fig. 3b. The NMSE value increases quickly from 0 to about 0.5 as \(|\Delta z|\) increases from 0 to 4 mm, and then becomes steady as \(|\Delta z|\) increases further. One can clearly see that the proposed engine is more tolerant to misalignment of R\(_1\) in the longitudinal direction than in the transverse one. This is reasonable because a transverse misalignment effectively amounts to a wrong random phase key.

The analysis of the tolerance to misalignment of R\(_0\) is straightforward. Transverse misalignment of R\(_0\) has no effect on the decrypted image when the plaintext image \(f(x_0,y_0)\) is real, as in our study. However, if R\(_0\) is misaligned in the longitudinal direction, a residual random phase remains and gives rise to noise. Because there is no further amplification, the decryption is more tolerant to misalignment of R\(_0\) than to that of R\(_1\), as depicted in Fig. 3b.

Next we examine the robustness of the decrypted image to additive noise. This is done by adding zero-mean Gaussian white noise to the cyphertext image so that \(g'(x,y) = g^*(x,y) + \alpha n(x,y)\), where \(\alpha\) is a weighting factor that specifies the strength of the noise with respect to the signal, and \(n(x,y) \sim \mathcal {N}(0,\sigma )\), where \(\sigma\) is the standard deviation. Owing to the nonlinearity, the additive noise *n*(*x*, *y*) is coupled with the signal term \(g^*(x,y)\) as \(g'(x,y)\) propagates back to the original input plane. The resulting noise on the recovered plaintext \(\hat{f}'\) is then no longer additive. The other immediate consequence of the nonlinear coupling is that the strength of the noise on \(\hat{f}'\) is not linearly proportional to *n*(*x*, *y*). Indeed, it has been reported that a portion of the noise power can be transferred to the signal [49]. As a result, the proposed nonlinear encryption technique should be more robust to noise, although the PSNR of \(\hat{f}'\) should be a nonlinear function of the signal-to-noise ratio \(\text {SNR} = 10\log _{10}[|g^*|^2/(\alpha |n|)^2]\) of \(g'\). Indeed, we observed such a nonlinear dependence in our experiment (Fig. 4). The NMSE value varies nonlinearly as the strength of *n*(*x*, *y*) linearly increases. As an example, we plot in Fig. 4a–d the recovered plaintext when the SNR of \(g'(x,y)\) is 10, 0, \(-10\), and \(-20\) dB, respectively. It is clearly seen that the details of the plaintext are retained even when the SNR of \(g'(x,y)\) is 0 dB. In contrast, a linear dependence is expected in the linear counterpart [50], as additive noise on \(g^*\) is transformed to additive noise on \(\hat{f}'\) and the noise power is conserved owing to the canonical nature of the linear encryption system [51].
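
A minimal sketch of how the weighting factor \(\alpha\) can be chosen to reach a prescribed SNR of \(g'\) is given below. It assumes the SNR is defined as the ratio of mean signal power to mean noise power in decibels and that the noise is complex circular Gaussian; the text specifies only zero-mean Gaussian white noise, so both choices, along with the function name `add_noise`, are assumptions.

```python
import numpy as np

def add_noise(g, snr_db, rng):
    """Return g + alpha * n and alpha, with alpha set so that
    10 * log10(mean|g|^2 / mean|alpha * n|^2) equals snr_db.

    ASSUMPTION: n is complex circular Gaussian white noise.
    """
    noise = rng.normal(0.0, 1.0, g.shape) + 1j * rng.normal(0.0, 1.0, g.shape)
    signal_power = np.mean(np.abs(g) ** 2)
    noise_power = np.mean(np.abs(noise) ** 2)
    alpha = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return g + alpha * noise, alpha

rng = np.random.default_rng(2)
g_conj = np.exp(1j * rng.uniform(0, 2 * np.pi, (32, 32)))  # stand-in for g*(x, y)

for snr_db in (10, 0, -10, -20):  # the SNR levels quoted in the text
    g_noisy, alpha = add_noise(g_conj, snr_db, rng)
    print(snr_db, alpha)
```

The noisy field `g_noisy` would then be back-propagated through the decryption transform to obtain \(\hat{f}'\), and the NMSE evaluated against the ground truth.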

We also examined how the decrypted image is affected when only the strength of the nonlinearity used for decryption (denoted by *Q*) deviates from that used for encryption (denoted by *q*). Specifically, this can be seen from the change of the NMSE value with respect to *Q*/*q*. The result is plotted in Fig. 5. It suggests that the decryption is quite robust to a change of nonlinearity. The NMSE value is less than 0.1 when \(Q/q=2\), and is about 0.4 even when \(Q/q=5\). Even when the decryption is carried out with \(Q=0\), one can obtain a plaintext of acceptable quality (\(\text {NMSE}\approx 0.15\)) if the two random phase masks and the length of the crystals are known. This is reasonable because the nonlinear refractive index \(\delta n\) is four orders of magnitude smaller than the linear one [34, 36]. However, this does not mean that the introduction of spatial nonlinearity is trivial. In fact, the nonlinearity is not meant to be used in this way. It is used to protect the system from cryptanalysis when the random phase keys are unknown. We will show in Sect. that it has a significant impact on the enhancement of the security.

### 2.4 Security analysis

Most cryptanalysis techniques [17,18,19,20] rely on Kerckhoffs’s principle [52], under which an intruder is assumed to have full access to the cyphertext image *g*(*x*, *y*) and/or the corresponding plaintext image \(f(x_0,y_0)\). Thus, one more transform of *g*(*x*, *y*) does not add significant intrinsic security. This is particularly true for a linear system, in which one can easily calculate the Fourier spectrum of the cyphertext *g*(*x*, *y*) and subsequently recover the plaintext image by using phase retrieval algorithms owing to the memory effect [20]. Here we show that the proposed nonlinear encryption technique is immune to such a phase-retrieval-based known-plaintext attack (KPA).

According to Kerckhoffs’s principle [52], we are assumed to know *M* pairs of cyphertext–plaintext images, i.e., \([g_m(x,y),f_m(x_0,y_0)]\), where \(m=1,\ldots ,M\). To examine the KPA, we also assume that the strength of the nonlinearity and the crystal lengths \(z_1\) and \(z_2\) are known. It is straightforward to calculate \(\psi _{z_1,m}(x_1,y_1)\exp [j\varphi (x_1,y_1)]\) from \(g_m(x,y)\) using nonlinear digital holography [36]. Since the random phase \(\text {R}_1 = \exp [j\varphi (x_1,y_1)]\) is unknown, it is not possible to directly use digital holography to reconstruct \(f_m(x_0,y_0)\) from \(\psi _{z_1,m}(x_1,y_1)\exp [j\varphi (x_1,y_1)]\). Note that the random phase R\(_1\) does not change the magnitude \(|\psi _{z_1,m}(x_1,y_1)|\). Thus an alternative approach is to retrieve the unknown phase \(\varphi (x_1,y_1)\), and therefore \(\phi (x_0,y_0)\), from \(f_m(x_0,y_0)\) and \(|\psi _{z_1,m}(x_1,y_1)|\). In contrast to the linear counterpart, a nonlinear phase retrieval algorithm is needed in this case [53, 54]. If such a KPA succeeds, the retrieved phase, denoted as \(\hat{\varphi }(x_1,y_1)\), can be used to decrypt any other cyphertext image, \(g_t(x,y)\), encrypted by the same system with the same set of keys. For the cryptanalysis of linear double random-phase encoding [7, 8], the multiple-phase retrieval algorithm [19] has been demonstrated to be feasible. Here we adopt the routine of this algorithm but replace the linear canonical transform in [19] with the nonlinear Schrödinger transform to perform the attack.

Clearly, if nothing but a noise-like pattern is recovered, we can conclude that the proposed nonlinear encryption method is immune to the phase-retrieval-based KPA. This can be verified on experimental data. However, one may argue that such a result could be attributed to defects of the crystal or noise in the system, as these may break the reversibility of the system [55]. Thus, we examine the security via numerical experiments, which can be regarded as a fundamental baseline.

In the numerical study, we used \(M=4\) pairs of cyphertext-plaintext images to perform the aforementioned KPA. The four plaintext images are shown in Fig. 6a–d, and the corresponding cyphertext images are shown in Fig. 6e–h, respectively. The KPA algorithm attempts to recover the random phase \(\varphi (x_1,y_1)\) and uses it to decode the cyphertext of an unknown plaintext image shown in Fig. 6i. The recovered random phase key \(\hat{\varphi }(x_1,y_1)\) is shown in Fig. 6j. The difference between it and the original phase \(\varphi (x_1,y_1)\) is shown in Fig. 6k. Its random-like distribution implies that the KPA algorithm [19], although it has been demonstrated to be very efficient in attacking a linear encryption system, is not able to retrieve the phase key of the proposed nonlinear encryption system. Indeed, nothing about the image to be analyzed (Fig. 6i) is revealed in the recovered image (Fig. 6l). Instead, it is some information about the known plaintext images that is revealed. In the specific case shown in Fig. 6l, a clear ‘S’ together with a dimmed ‘M’ against a noisy background is recovered with \(\hat{\varphi }(x_1,y_1)\). A close look at the positions of ‘S’ and ‘M’ shows that they appear at their original positions as in Fig. 6a, d, as if they were memorized by the retrieved phase key \(\hat{\varphi }(x_1,y_1)\). This exotic phenomenon is mainly due to the fact that phase evolution in a nonlinear optical system depends significantly on intensity-induced refractive index changes [36]. Although the nonlinear refractive index is small compared with the base one, it does have a significant impact on the enhancement of the security level, protecting the system from the powerful KPA. Which known plaintext image appears in the recovered image is determined by where the KPA algorithm stops. In the case of Fig. 6l, the KPA algorithm stops after using the known plaintext image ‘S’ (Fig. 6a) to iteratively compute the phase key \(\hat{\varphi }(x_1,y_1)\), and thus \(\hat{\varphi }(x_1,y_1)\) clearly *memorizes* the information of the image ‘S’. The iteration before this uses the image ‘M’, and thus \(\hat{\varphi }(x_1,y_1)\) still retains a little memory of it. This phenomenon has never been observed in the linear counterparts [16,17,18,19,20] or in phase retrieval using nonlinear diversity [53, 54].