Noise-Robust Estimation of Quantum Observables on Noisy Hardware
Quantum computing is at a pivotal moment in its evolution. However, qubits are highly susceptible to losing their quantum coherence, experiencing both random fluctuations and systematic errors caused by unintended interactions with the environment and calibration imperfections. This noise poses a significant challenge for quantum computation, limiting the reliability and scalability of quantum devices. While hardware advancements and quantum error correction (QEC) hold promise for addressing these issues in the long term, quantum error mitigation methods — techniques that improve the accuracy of noisy outcomes without the physical-qubit overhead of error correction — are increasingly recognized for their potential to enhance the usability of quantum devices in the current Noisy Intermediate-Scale Quantum (NISQ) era as well as in the near-future partial error-correction era. Existing quantum error mitigation techniques for quantum algorithms fall into two categories:
- Noise-Aware Methods: These rely on detailed noise models to counteract specific noise effects. While theoretically capable of providing a bias-free result, their accuracy is limited in practice by the difficulty of obtaining accurate noise characterizations and by the stability of the noise profile, giving rise to a finite bias in the final estimation.
- Noise-Agnostic Methods: These techniques, like Zero Noise Extrapolation (ZNE) and Clifford Data Regression (CDR), attempt to mitigate the effect of noise without knowledge of the noise profile. However, they often introduce unknown systematic biases due to various forms of model mismatch. For ZNE, the model mismatch arises from using a particular mathematical fitting function to extrapolate to the zero-noise limit; in addition, any inaccuracy in noise amplification contributes to the final bias error. In CDR, on the other hand, the model mismatch arises from using an ansatz learned from a set of ‘near-Clifford’ circuits to error-mitigate the target ‘non-Clifford’ quantum circuit.
These limitations highlight the need for a noise-agnostic and general-purpose approach that addresses the shortcomings of the existing methods.
Overview of Noise-Robust Estimation.
Noise-Robust Estimation (NRE) is a novel framework that leverages a bias-dispersion correlation discovered within its framework to address, and suppress the impact of, the model-mismatch problem of the noise-agnostic methods explained above. NRE introduces an auxiliary quantity, $\mathcal{A}$, which is designed to be less sensitive to noise than the directly measured expectation value of the observable $O$. The latter, denoted $\langle \widetilde{O} \rangle_t$, is obtained by executing the noisy target ($t$) quantum circuit. Unlike existing methods, NRE employs a distinctive two-step post-processing framework, enabling robust estimations.
First post-processing step: Baseline Estimation
This step constructs an initial estimator using a noise-canceling circuit ($ncc$) and a tunable control parameter. As explained in the following, the $ncc$ is structurally similar to the target circuit but designed such that its noiseless expectation value is known. While the baseline estimation suppresses noise effects, it generally retains an unknown residual bias, $\mathcal{B}$.
Second post-processing step: Bias-Dispersion Correlation Leads to the Final Estimation
A key insight of NRE is the discovery of a strong correlation between the unknown residual bias and a measurable quantity, the normalized dispersion $\mathcal{D}$. The latter quantifies the noise sensitivity of the auxiliary quantity relative to that of the observable of interest. Ideally, after the first post-processing step, the auxiliary quantity should be completely insensitive to noise, implying $\mathcal{D} \to 0$. Since $\mathcal{D}$ quantifies noise sensitivity, we find that a smaller $\mathcal{D}$ statistically correlates with a reduced residual bias $\mathcal{B}$, improving the estimation accuracy. A unique feature of NRE is its use of classical bootstrapping, which serves a dual purpose:
- (i) providing an estimate of statistical uncertainties, and
- (ii) generating a data set of generally non-ideal baseline estimations (i.e., estimations with nonzero $\mathcal{D}$).
The final noise-mitigated estimate is obtained by applying data regression to this data set, to find the estimation at the $\mathcal{D} \to 0$ limit, thereby maximally suppressing the residual bias of the estimation.
The main panel of Figure 1 schematically illustrates the probability density of a noisy measurement as well as the NRE estimations after each of the two post-processing steps. The inset shows an example of the correlation between the normalized dispersion and the bias error of the baseline estimations obtained from bootstrapping. The data points are obtained using IQM's Garnet quantum processor for a specific quantum algorithm explained in detail in the “implementation and verification” section. We observe in the inset plot that performing linear regression on the baseline estimations to approximate the value associated with the limit of $\mathcal{D} \to 0$ results in a highly accurate final estimation.
Methodology
The Auxiliary Quantity and Taylor Expansion
Let us assume that the baseline hardware noise can be quantified by a rate $\epsilon_0$. We parameterize the noise rate as $\epsilon = \lambda\epsilon_0$, where $\lambda$ is the dimensionless noise scale factor. As mentioned before, our objective is to estimate the noiseless expectation value of observable $O$ in a target quantum state, $\langle O \rangle_t$. Due to the inherent noise in executing the target state, the measurement yields a noisy expectation value, denoted $\langle \widetilde{O} \rangle_t (\lambda)$.
The NRE framework introduces an auxiliary quantity, $\mathcal{A}$, constructed from measurements of both the target circuit ($t$) and a noise-canceling circuit ($ncc$); see the circuits in Figure 2. $\mathcal{A}$ is designed to converge to the ideal noiseless result as noise vanishes ($\lambda \to 0$), while at a finite noise scale, $\mathcal{A}$ also depends on a real-valued control parameter, $n$.
To implement NRE, $\mathcal{A}$ is measured at various noise scale factors $\lambda_i$. The auxiliary quantity is defined as,
$$\mathcal{A}(n, \lambda) = P_1(\lambda) + n \cdot P_2(\lambda)$$

where the real number $n$ is the control parameter used in the first post-processing step, as explained in the following, and the terms $P_1$ and $P_2$ are defined as,
$$P_1(\lambda_i) = \langle \widetilde{O} \rangle_t(\lambda_i)\, \frac{\langle O \rangle_{ncc}}{\langle \widetilde{O} \rangle_{ncc}(\lambda_i)}$$

$$P_2(\lambda_i) = \log \frac{\langle O \rangle_{ncc}}{\langle \widetilde{O} \rangle_{ncc}(\lambda_i)}$$

The noise-canceling circuit is chosen such that the ideal expectation value $\langle O \rangle_{ncc}$ is known. This can be achieved, for example, by replacing all non-Clifford gates in the target circuit with Clifford gates so that $\langle O \rangle_{ncc}$ can be simulated efficiently classically.
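As a concrete illustration, the auxiliary quantity can be computed directly from measured expectation values. The sketch below is our own illustrative helper, not code from Ref. [1]; all function and variable names are assumptions.

```python
import numpy as np

def auxiliary_quantity(n, o_t_noisy, o_ncc_ideal, o_ncc_noisy):
    """A(n, lambda_i) = P1(lambda_i) + n * P2(lambda_i).

    o_t_noisy / o_ncc_noisy: noisy expectation values of the target and
    noise-canceling circuits at noise scales lambda_1..lambda_M;
    o_ncc_ideal: the known noiseless ncc expectation value.
    """
    ratio = o_ncc_ideal / np.asarray(o_ncc_noisy, dtype=float)
    p1 = np.asarray(o_t_noisy, dtype=float) * ratio   # P1: noise-compensated target value
    p2 = np.log(ratio)                                # P2: logarithmic correction term
    return p1 + n * p2
```

In the noiseless limit the ratio tends to one, so $P_2$ vanishes and $\mathcal{A}$ reduces to the target expectation value, independently of $n$.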
The purpose of the NRE ansatz is to maximally suppress the noise sensitivity of the auxiliary quantity by introducing the $ncc$ and the tunable control parameter. To justify the form of the proposed ansatz, we note that $P_1$ accounts for the competition between the noise affecting the target circuit and the noise affecting the noise-canceling circuit. Because these circuits have similar gate structures but different error scaling properties, their noise contributions compete in a nontrivial manner. However, one can generally expect that $P_1(\lambda)$ should have reduced noise-sensitivity compared with $\langle \widetilde{O} \rangle_t(\lambda)$. We can therefore expect $P_1(\lambda)$ to scale approximately as a (multi-)exponential function with smaller exponent(s), allowing it to be locally approximated as a linear function, particularly in the vicinity of the smallest noise scale factor $\lambda_1$.
To further reduce the noise sensitivity of $\mathcal{A}$, we introduce the term $P_2$, which is defined in logarithmic form. The logarithm ensures that $P_2$ also behaves approximately linearly with respect to the noise scale factor. This property allows the tunable control parameter $n$ to optimally adjust $\mathcal{A}$, enhancing its robustness against noise variations and enabling a more precise expectation value estimation.
Assuming we measure both the target and noise-canceling circuits at $M$ different values of the noise scale factor, a Taylor expansion then gives for the ideal expectation value,

$$\langle O \rangle_t = \mathcal{A}_{ncc}(n, \lambda_1) + \sum_{j=1}^{M-1} \mathcal{A}^{[j]}_{\lambda_1}(n) \cdot \frac{(-\lambda_1)^j}{j!} + \mathcal{B}$$

In the above equation, the center of the Taylor expansion is at the smallest noise scale $\lambda_1$, and $\mathcal{A}^{[j]}_{\lambda_1}$ denotes the numerically approximated derivative of $\mathcal{A}$ of order $j$ at $\lambda_1$. The last term, $\mathcal{B}$, is the unknown residual bias error, which results from truncating the Taylor series and from the discretization error (due to finite noise-level spacing). We note that different noise-scaling behavior between the target circuit and the noise-canceling circuit impacts the magnitude of both the truncation and discretization errors. In addition, inaccurate noise amplification also contributes to the residual bias error.
As already explained, by introducing the noise-canceling circuit and the optimal control parameter, we expect the sensitivity of $\mathcal{A}$ to the noise scale factor $\lambda_i$ to be smaller than that of $\langle \widetilde{O} \rangle_t(\lambda_i)$. This is in fact the justification for seeking an approximation of $\langle O \rangle_t$ by Taylor expanding $\mathcal{A}$ rather than $\langle \widetilde{O} \rangle_t$: in this scenario, the residual bias error, encompassing the sum of higher-order derivatives, is anticipated to be smaller.
First post-processing step
We define the optimal control parameter as the value that ensures the auxiliary quantity at the smallest noise level $\lambda_1$ serves as an estimator for the ideal expectation value. The optimal control parameter $n_{op}$ therefore satisfies the following equation,

$$\sum_{j=1}^{M-1} \mathcal{A}^{[j]}_{\lambda_1}(n_{op}) \cdot \frac{(-\lambda_1)^j}{j!} = 0$$

The solution to this equation is discussed in detail in Ref. [1]. We then arrive at the baseline estimation, defined as,
$$\langle O \rangle_{b-NRE} = \mathcal{A}_{ncc}(n_{op}, \lambda_1)$$

The above equation indicates that the accuracy of the baseline estimation depends on the choice of the noise-canceling circuit. In the following, we set up the second post-processing step, which aims to improve the accuracy of the final estimation for a given noise-canceling circuit.
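Because $\mathcal{A}$ is linear in $n$, the defining condition for $n_{op}$ reduces to a linear equation. Below is a minimal numerical sketch under simplifying assumptions of our own (uniformly spaced noise scale factors, forward finite differences for the derivatives); Ref. [1] treats the general case.

```python
import math
import numpy as np

def taylor_correction(values, lambdas):
    """Truncated sum_{j=1}^{M-1} f^[j](lambda_1) * (-lambda_1)^j / j!, with the
    j-th derivative at lambda_1 approximated by forward finite differences
    (uniform spacing assumed)."""
    v = np.asarray(values, dtype=float)
    lam = np.asarray(lambdas, dtype=float)
    h = lam[1] - lam[0]
    total, diffs = 0.0, v.copy()
    for j in range(1, len(v)):
        diffs = np.diff(diffs)                       # j-th forward difference
        total += (diffs[0] / h**j) * (-lam[0])**j / math.factorial(j)
    return total

def baseline_estimate(p1_vals, p2_vals, lambdas):
    """Solve the linear condition for n_op and return A(n_op, lambda_1),
    i.e. the baseline estimation <O>_b-NRE."""
    n_op = -taylor_correction(p1_vals, lambdas) / taylor_correction(p2_vals, lambdas)
    return p1_vals[0] + n_op * p2_vals[0]
```

For data exactly linear in $\lambda$, the truncated correction is exact, so the baseline estimate recovers the zero-noise value.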

Second post-processing step
As mentioned before, the residual bias error of the baseline estimation, $\mathcal{B}$, includes a term that corresponds to the sum of higher-order derivatives of $\mathcal{A}_{ncc}$ at $\lambda_1$. Although these derivatives cannot be directly computed, a smaller variation in $\mathcal{A}_{ncc}(n_{op},\lambda_i)$ across different noise scale factors $\lambda_i$ suggests that these higher-order derivatives—and consequently, the residual bias—are reduced. Additionally, as shown in detail in Ref. [1], modeling the effects of imprecise noise amplification shows that its contribution to $\mathcal{B}$ also decreases as the variation of $\mathcal{A}_{ncc}(n_{op},\lambda_i)$ with respect to $\lambda_i$ becomes smaller.
In other words, when the dispersion of the set of auxiliary data points $\{\mathcal{A}(\lambda_i)\}$ is smaller, the residual bias error in the baseline estimation is statistically expected to be reduced. We now introduce the normalized dispersion, $\mathcal{D}$, a dimensionless metric that serves as a proxy for the unknown residual bias,
$$\mathcal{D} = \frac{\mathrm{MAD}[\{\mathcal{A}(\lambda_i)\}]}{\mathrm{MAD}[\{\langle \widetilde{O} \rangle_t(\lambda_i)\}]}$$

Here the Mean Absolute Deviation (MAD) quantifies the dispersion of a data set:

$$\mathrm{MAD}[\{x(\lambda_j)\}] = \frac{1}{m}\sum_{j=1}^{m}|x_j - \bar{x}|$$

where $\bar{x}$ represents the mean value of the set $\{x(\lambda_j)\}$ with $m$ members.
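The normalized dispersion is straightforward to compute from the two data sets; a small sketch with illustrative helper names of our own:

```python
import numpy as np

def mad(xs):
    """Mean absolute deviation: (1/m) * sum |x_j - mean(x)|."""
    xs = np.asarray(xs, dtype=float)
    return float(np.mean(np.abs(xs - xs.mean())))

def normalized_dispersion(aux_values, target_values):
    """D = MAD of the auxiliary values over MAD of the noisy target values."""
    return mad(aux_values) / mad(target_values)
```

A perfectly noise-insensitive auxiliary quantity gives identical values at every noise scale and hence $\mathcal{D} = 0$.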
To arrive at the final NRE estimation, we first apply bootstrapping to the original set of experimental counts, generating an ensemble of bootstrapped counts for each measurement. From these, we compute a set of independent estimations of expectation values. Note that the bootstrapping procedure is commonly used to estimate statistical uncertainties in expectation values caused by shot noise.
Next, for each bootstrapped set of expectation values, we compute the auxiliary quantity and apply the first post-processing step to obtain the corresponding baseline estimator. Simultaneously, we calculate the normalized dispersion associated with all bootstrapped sets of expectation values. This results in a data set of baseline estimations and their corresponding normalized dispersion values. The second post-processing step performs regression on this data set to find the error-mitigated estimation at the $\mathcal{D} \to 0$ limit. The final NRE estimator is then defined as:
$$\langle O \rangle_{NRE} \equiv \lim_{\mathcal{D} \to 0} \langle O \rangle_{b-NRE}(\mathcal{D})$$
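The bootstrapping step above amounts to resampling the measured counts with replacement. A minimal sketch (single-qubit counts, hypothetical helper names of our own) of generating one bootstrapped copy of an expectation value:

```python
import numpy as np

def bootstrap_counts(counts, rng):
    """Resample a bitstring->count dictionary with replacement, preserving
    the total number of shots."""
    outcomes = list(counts)
    freqs = np.array([counts[o] for o in outcomes], dtype=float)
    shots = int(freqs.sum())
    resampled = rng.multinomial(shots, freqs / shots)
    return dict(zip(outcomes, resampled.tolist()))

def z_expectation(counts):
    """<Z> of a single qubit estimated from measurement counts."""
    shots = sum(counts.values())
    return (counts.get("0", 0) - counts.get("1", 0)) / shots
```

Repeating this for every circuit and noise scale yields one bootstrapped set of expectation values, from which one baseline estimation and one $\mathcal{D}$ value follow.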
In Figure 2, we illustrate the workflow of the NRE framework as explained above. Figure 3 presents a schematic representation where measurements are performed on both a target and a noise-canceling circuit at various noise scale factors. Each experimental count undergoes $s=2$ bootstrap iterations, generating independent copies of expectation values and the auxiliary quantity. The same figure also schematically depicts the dispersion of $\{\mathcal{A}(\lambda_i)\}$ and $\{\langle \widetilde{O} \rangle_t(\lambda_i)\}$, quantified using the mean absolute deviation (MAD). We note that performing NRE requires a statistically large number of baseline estimations. Ref. [1] also explains in detail how the framework shown in Figure 2 can be extended to produce a final estimation that includes both a mean and a standard deviation. This is achieved by resampling from each bootstrapped expectation value. For further details, please refer to Ref. [1].
NRE implementation and verification examples
We implemented NRE and verified its performance on three different IQM quantum processors with different qubit counts and connectivity maps. Here we present some implementation results on IQM Garnet which is a 20-qubit QPU with square-grid topology as shown in Figure 4(a).
Our case study is error mitigating the measured ground state energy of transverse-field Ising model (TFIM), a quantum mechanical model used to study magnetic systems. The model Hamiltonian for this system reads,
$$H = - g \sum_j \sigma^j_X - \sum_{(i,j)}\sigma^i_Z \sigma^j_Z$$

We first simulate a noise-free system at a transverse-field strength of $g=2$ using a Hamiltonian Variational Ansatz with 4 layers to obtain the ground state energy. Once the variational algorithm has converged, we take the associated quantum circuit as the target circuit.
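For small systems, the reference ground-state energy can be checked by exact diagonalization. The sketch below builds the TFIM Hamiltonian on a short open chain; this is an illustrative reduced-size check of our own, not the 20-qubit Garnet experiment itself.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def kron_chain(ops):
    """Tensor product of a list of single-qubit operators."""
    return reduce(np.kron, ops)

def tfim_hamiltonian(n_qubits, g):
    """H = -g * sum_j X_j - sum_j Z_j Z_{j+1} on an open chain."""
    dim = 2 ** n_qubits
    ham = np.zeros((dim, dim))
    for j in range(n_qubits):
        ham -= g * kron_chain([X if k == j else I2 for k in range(n_qubits)])
    for j in range(n_qubits - 1):
        ham -= kron_chain([Z if k in (j, j + 1) else I2 for k in range(n_qubits)])
    return ham

# exact ground-state energy of a 4-qubit chain at g = 2
e0 = np.linalg.eigvalsh(tfim_hamiltonian(4, 2.0))[0]
```

Such an exact value serves as the noiseless reference against which the error-mitigated estimations can be compared at small sizes.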
In the table below, we summarize the relevant information for the study and the parameters for the application of NRE.

| Item | Choice |
| --- | --- |
| Observable | Ground state energy |
| Target circuit | Paramagnetic phase ($g=2$) of the TFIM, obtained from a Hamiltonian Variational Ansatz with 4 layers |
| Noise-canceling circuit | Obtained from the target circuit by replacing each non-Clifford gate with the Clifford gate that minimizes the Frobenius norm (don't worry if you are not familiar with the term 😊; it essentially means we pick the "closest" Clifford gate for each non-Clifford gate) |
| Noise amplification method | Global unitary folding (a gate-based noise amplification method) |
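Global unitary folding maps the circuit unitary $U \to U (U^\dagger U)^k$, which leaves the ideal action unchanged while (ideally) scaling the noise by $\lambda = 2k+1$. A matrix-level illustration of our own follows; on hardware the folding is applied to the circuit's gates rather than to a dense matrix.

```python
import numpy as np

def fold_global(unitary, scale):
    """Return U (U^dagger U)^k for scale = 2k + 1 (odd integer scale factors).

    Mathematically the result equals U, since U^dagger U = I for a unitary;
    on hardware, each extra pair of layers accumulates additional noise.
    """
    if scale % 2 != 1:
        raise ValueError("global folding realizes odd integer noise scale factors")
    k = (scale - 1) // 2
    folded = unitary.copy()
    for _ in range(k):
        folded = folded @ unitary.conj().T @ unitary   # each fold appends U^dagger U
    return folded
```

This also makes clear why gate-based folding natively produces the odd scale factors $\lambda = 1, 3, 5, \ldots$ used in some of the experiments below.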

Figures 5(a)-(c) show results obtained using noise scale factors of the form $\{\lambda_i\}=[1,1+h,1+2h]$, with $h=0.5$, $1$, and $2$, respectively. In Figure 5(d), we increase the number of noise scales to 4 while keeping $h=1$. Figure 5(e) is obtained by setting $\{\lambda_i\}=[2,4,6]$; this is equivalent to the setup of panel (b), in which $\{\lambda_i\}=[1,2,3]$, except that the circuit depth is doubled. Consequently, NRE is performed on a much noisier circuit, where $\langle \widetilde{O} \rangle_t(\lambda_2)$ is reduced by approximately 80% compared to the noiseless value. In all panels, the shaded areas represent the baseline estimations obtained via bootstrapping and resampling, as detailed in the figure caption.
Across all cases, we find a strong correlation between the baseline estimation and the normalized dispersion. We observe that the shaded areas corresponding to lower values of $\mathcal{D}$ generally provide more accurate estimations of the ideal value. This reinforces the theoretical foundation of NRE, confirming that $\mathcal{D}$ serves as a reliable proxy for the residual bias $\mathcal{B}$. To obtain the final estimation, we apply a linear regression to the baseline estimations, assigning each baseline estimation a weight of $1/\mathcal{D}$ during the regression. This ensures that baseline estimations with lower values of normalized dispersion carry greater weight in determining the final estimation.
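The weighted extrapolation to the $\mathcal{D} \to 0$ limit can be sketched with a standard weighted linear fit. Here we use `np.polyfit`, which weights each residual by `w`; taking $1/\mathcal{D}$ directly as the residual weight is one reasonable reading of the scheme described above, not necessarily the exact implementation of Ref. [1].

```python
import numpy as np

def nre_final_estimate(baselines, dispersions):
    """Weighted linear fit of baseline estimations versus normalized
    dispersion; the intercept is the extrapolated value at D = 0."""
    d = np.asarray(dispersions, dtype=float)
    y = np.asarray(baselines, dtype=float)
    slope, intercept = np.polyfit(d, y, deg=1, w=1.0 / d)  # weight small-D points more
    return intercept
```

Points with small $\mathcal{D}$, which are statistically expected to carry less residual bias, thus dominate the fit that determines the intercept.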
In Figures 5(f-j), we compare the performance of NRE in reducing the bias error of the error-mitigated estimation relative to the ideal expectation value against other noise-agnostic methods, including Zero Noise Extrapolation (ZNE) [2], Clifford-Data Regression (CDR) [3], variable-noise Clifford-Data Regression (vnCDR) [4], and Mitigating Depolarizing Noise (MDN) with noise-estimation circuits [5]. For ZNE, we used a single-exponential fit. For CDR and vnCDR, we generated 40 near-Clifford training circuits by replacing approximately 90% of the non-Clifford gates in the target circuits with Clifford gates, following the procedure in Ref. [3]. For MDN, we employed the same noise-canceling circuit used in NRE as its noise-estimation circuit.
We observe that the accuracy of ZNE is highly sensitive to the choice of noise scale factors, and the optimal selection is generally unknown a priori; however, in these figures, we find that ZNE performs best in panel (h), where the noise scale factors are odd integers. This behavior may be attributed to the gate-based noise amplification working more accurately in this setting. In sharp contrast, NRE provides remarkably stable and accurate predictions across all choices of noise scale factors.
In Figure 5(i), NRE reduces the relative bias by two orders of magnitude more than any other method. Furthermore, in Figure 5(j), which corresponds to the deeper and noisier circuit of panel 5(e), all other methods fail to produce an accurate prediction. Notably, CDR and vnCDR perform particularly poorly in this case, due to an increased mismatch between the noise scaling of the target and training circuits. In contrast, NRE remains the most reliable, recovering the ideal expectation value with 90% accuracy.

Conclusions and outlook
The ability to perform reliable error mitigation is expected to play a crucial role in achieving quantum advantage before the advent of fault-tolerant quantum computing. Even in the early stages of fault-tolerant quantum computing, residual errors from imperfect logical operations will still require mitigation. In this module, we reviewed Noise-Robust Estimation (NRE), a newly developed noise-agnostic error mitigation method comprising two distinct post-processing steps. The first post-processing step yields a baseline estimation containing an unknown residual error. NRE introduces a calculable metric, the normalized dispersion, that correlates with the bias of the baseline estimation. The second post-processing step then uses this correlation to arrive at the final error-mitigated estimation.
We implemented and benchmarked NRE using different IQM backends and different quantum algorithms, including the Transverse Field Ising model (in different phases) and the quantum simulation of the H4 molecule. We consistently observed that NRE outperforms other noise-agnostic methods such as ZNE and MDN across different noise-amplification settings. In addition, the sampling cost of NRE is analyzed in detail in Ref. [1], where it is found that NRE has a lower sampling cost than MDN and a (modestly) higher sampling cost than ZNE (i.e., within the same order of magnitude as ZNE). These findings establish NRE as a reliable noise-agnostic error mitigation approach with manageable sampling cost.
If you want to try out NRE yourself, you will be able to use it via IQM Resonance soon. Stay tuned 😊.
NRE is patented by IQM Quantum Computers.

References
- Amin Hosseinkhani, Fedor Šimkovic, Alessio Calzona, Tianhan Liu, Adrian Auer, and Inés de Vega, Noise-Robust Estimation of Quantum Observables with Noisy Hardware, arXiv:2503.06695.
- Kristan Temme, Sergey Bravyi, and Jay M. Gambetta, Error mitigation for short-depth quantum circuits, Phys. Rev. Lett. 119, 180509 (2017).
- Piotr Czarnik, Andrew Arrasmith, Patrick J. Coles, and Lukasz Cincio, Error mitigation with Clifford quantum-circuit data, Quantum 5, 592 (2021).
- Angus Lowe, Max Hunter Gordon, Piotr Czarnik, Andrew Arrasmith, Patrick J. Coles, and Lukasz Cincio, Unified approach to data-driven quantum error mitigation, Phys. Rev. Res. 3, 033098 (2021).
- Miroslav Urbanek, Benjamin Nachman, Vincent R. Pascuzzi, Andre He, Christian W. Bauer, and Wibe A. de Jong, Mitigating depolarizing noise on quantum computers with noise-estimation circuits, Phys. Rev. Lett. 127, 270502 (2021).