Side channel analysis with chipwhisperer SCAPACK L1
(notes by Przemyslaw Kubiak)
--------------------------------------------------------------------------------

1. How to start
---------------
Installation instructions (I have chosen the git way):
    https://chipwhisperer.readthedocs.io/en/latest/installing.html

2. How to connect
-----------------
In chipwhisperer/jupyter run
    jupyter notebook
Then go to the file (i.e., to the "chapter"):
    1 - Connecting to Hardware.ipynb
In that file:
    Section: ChipWhisperer-Lite (Capture + UFO), Includes SCAPACK-L1/SCAPACK-L2
        shows how to connect the capture board with the target board.
    Section: UFO Target Settings
        shows the default jumper settings of the UFO board.
The UFO board is explained on the webpage (including how to change a target):
    https://rtfm.newae.com/Targets/CW308%20UFO/

3. Communication protocol - where to find it
--------------------------------------------
The notebook directory includes the subdirectory:
    courses
There we can see, for example, Lab 3_3 (three files) and SOLN_Lab_3_3.
The last file sets the variables
    SCOPETYPE = 'OPENADC'
    PLATFORM = 'CWLITEARM'
    CRYPTO_TARGET = 'AVRCRYPTOLIB'
    VERSION = 'HARDWARE'
and shows how to call one of the previous files.
The file
    Lab 3_3 - DPA on Firmware Implementation of AES (HARDWARE)
includes the very basic steps necessary to run the target library on the target board:

    %run "../../Setup_Scripts/Setup_Generic.ipynb"
- which tries to connect to the capture board

    %%bash -s "$PLATFORM" "$CRYPTO_TARGET" "$SS_VER"
    cd ../../../hardware/victims/firmware/simpleserial-aes
    make PLATFORM=$1 CRYPTO_TARGET=$2 SS_VER=$3
- which compiles the sources of the target "library" together with the communication interface

    cw.program_target(scope, prog, "../../../hardware/victims/firmware/simpleserial-aes/simpleserial-aes-{}.hex".format(PLATFORM))
- which programs the target board

When we go to the directory
    chipwhisperer/hardware/victims/firmware/simpleserial-aes
we see the "flash" files encoded in hexadecimal format, and we also see the corresponding sources and the makefile.
The makefile includes
    ../simpleserial/Makefile.simpleserial
The sources in the ../simpleserial directory define the communication protocol between the target and the computer
(we are using SS_VER='SS_VER_1_1' of the protocol).
To understand the protocol you need to read the sources from the simpleserial directory.

4. Communication protocol - how to use it
-----------------------------------------
I am learning chipwhisperer by working on an attack on ECDSA signatures in the mbedtls library.
The target is the STM32F3 microcontroller (32-bit Arm Cortex-M4 core).
For ECDSA see, e.g., sect. 4.2.1 of
    https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/TechGuidelines/TR03111/BSI-TR-03111_V-2-0_pdf.pdf?__blob=publicationFile&v=2
I started with simple power analysis.
For each experiment on ECDSA a new version of the "firmware" for the target was prepared,
that is, a new directory next to simpleserial-aes was created.
The first one, however, was simply a test of the mbedtls library:
    chipwhisperer/hardware/victims/firmware/simpleserial-ecdsa
with two source files:
    simpleserial-ecdsa.c
    simpleserial-ecdsa-arm.c

where simpleserial-ecdsa.c:

    #include "hal.h"
    #include "simpleserial.h"
    #include
    #include
    #include

    void ecdsa_init(void);
    uint8_t ecdsa_set_key(uint8_t *pt);
    uint8_t ecdsa_gen_key(uint8_t *pt);
    uint8_t ecdsa_gen_sig(uint8_t *pt);
    uint8_t ecdsa_gen_sig_det(uint8_t *pt);

    int main(void)
    {
        platform_init();
        init_uart();
        trigger_setup();

        ecdsa_init();

        simpleserial_init();
        simpleserial_addcmd('k', 32, ecdsa_set_key);
        simpleserial_addcmd('g', 0, ecdsa_gen_key);
        simpleserial_addcmd('s', 33, ecdsa_gen_sig);
        simpleserial_addcmd('d', 33, ecdsa_gen_sig_det);
        while(1)
            simpleserial_get();
    }

and simpleserial-ecdsa-arm.c:

    (...)

    uint8_t ecdsa_gen_key(uint8_t *pt)
    {
        int ret = 0; //longer type than the return type, but simpleserial_get uses a single-octet array in the ack
        //const char *pers = "ecdsa";
        uint8_t buf_for_compressed_point[1+FIELD_LEN];
        size_t compressed_point_length;
        //mbedtls_entropy_context entropy;
        //mbedtls_ctr_drbg_context ctr_drbg;

        //mbedtls_entropy_init( &entropy ); //!!!!!!!!!!! STM32F3 entropy mbedtls HowTo
        //mbedtls_ctr_drbg_init( &ctr_drbg );

        ((void) pt);

        if (key_is_empty)
        {
            MBEDTLS_MPI_CHK( mbedtls_ecp_gen_key( ECPARAMS, &ctx, myrand, NULL ) );
            key_is_empty = 0;
        }

        MBEDTLS_MPI_CHK( mbedtls_ecp_check_pub_priv( &ctx, &ctx) );

        memset(buf_for_compressed_point, 0, 1 + FIELD_LEN);
        MBEDTLS_MPI_CHK( mbedtls_ecp_point_write_binary( &ctx.grp, &ctx.Q, MBEDTLS_ECP_PF_COMPRESSED,
                                                         &compressed_point_length,
                                                         buf_for_compressed_point, 1 + FIELD_LEN ) );
        simpleserial_put('r', compressed_point_length, buf_for_compressed_point);

    cleanup:
        if (ret)
            simpleserial_put('r', sizeof(int), (uint8_t *)&ret);

        return( ret );
    }

    uint8_t ecdsa_set_key(uint8_t *pt)
    {
        int ret = 0; //longer type than the return type, but simpleserial_get uses a single-octet array in the ack
        //const char *pers = "ecdsa";
        uint8_t buf_for_compressed_point[1+FIELD_LEN];
        size_t compressed_point_length;
        //mbedtls_entropy_context entropy;
        //mbedtls_ctr_drbg_context ctr_drbg;

        //mbedtls_entropy_init( &entropy ); //!!!!!!!!!!! STM32F3 entropy mbedtls HowTo
        //mbedtls_ctr_drbg_init( &ctr_drbg );

        if (key_is_empty)
        {
            MBEDTLS_MPI_CHK( mbedtls_ecp_group_load( &ctx.grp, ECPARAMS ) );
            MBEDTLS_MPI_CHK( mbedtls_mpi_read_binary( &ctx.d, pt, FIELD_LEN ) );
            MBEDTLS_MPI_CHK( mbedtls_ecp_check_privkey( &ctx.grp, &ctx.d) );
            //MBEDTLS_MPI_CHK( mbedtls_ctr_drbg_seed( &ctr_drbg, mbedtls_entropy_func, &entropy, (const unsigned char *) pers, strlen( pers ) ) );
            MBEDTLS_MPI_CHK( mbedtls_ecp_mul( &ctx.grp, &ctx.Q, &ctx.d, &ctx.grp.G, NULL, NULL ) ); //mbedtls_ctr_drbg_random, &ctr_drbg ) );
            key_is_empty = 0;
        }

        memset(buf_for_compressed_point, 0, 1 + FIELD_LEN);
        MBEDTLS_MPI_CHK( mbedtls_ecp_point_write_binary( &ctx.grp, &ctx.Q, MBEDTLS_ECP_PF_COMPRESSED,
                                                         &compressed_point_length,
                                                         buf_for_compressed_point, 1 + FIELD_LEN ) );
        simpleserial_put('r', compressed_point_length, buf_for_compressed_point);

    cleanup:
        if (ret)
            simpleserial_put('r', sizeof(int), (uint8_t *)&ret);

        return( ret );
    }

    //pt[0] contains the length of the hash, the next pt[0] octets contain the hash value
    uint8_t ecdsa_gen_sig(uint8_t *pt)
    {
        int ret = 0; //longer type than the return type, but simpleserial_get uses a single-octet array in the ack
        uint8_t buf_for_sig[2*(BASEPOINT_ORDER_LEN)];
        mbedtls_mpi r, s;

        mbedtls_mpi_init( &r );
        mbedtls_mpi_init( &s );

        MBEDTLS_MPI_CHK( mbedtls_ecdsa_sign( &ctx.grp, &r, &s, &ctx.d, pt+1, pt[0], myrand, NULL ) );
        //MBEDTLS_MPI_CHK( mbedtls_ecdsa_verify( &ctx.grp, pt+1, pt[0], &ctx.Q, &r, &s ) );

        //the response is r followed by s, each written as BASEPOINT_ORDER_LEN big-endian octets
        memset(buf_for_sig, 0, 2*(BASEPOINT_ORDER_LEN));
        MBEDTLS_MPI_CHK( mbedtls_mpi_write_binary( &r, buf_for_sig, BASEPOINT_ORDER_LEN ) );
        MBEDTLS_MPI_CHK( mbedtls_mpi_write_binary( &s, buf_for_sig + BASEPOINT_ORDER_LEN, BASEPOINT_ORDER_LEN ) );
        simpleserial_put('r', 2*(BASEPOINT_ORDER_LEN), buf_for_sig);

    cleanup:
        if (ret)
            simpleserial_put('r', sizeof(int), (uint8_t *)&ret);

        mbedtls_mpi_free( &r );
        mbedtls_mpi_free( &s );

        return( ret );
    }

    (...)
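On the host side, the payload layout expected by ecdsa_gen_sig (pt[0] = hash length, then the hash, 33 octets in total for the 's' command) and the r||s response can be handled with a few lines of Python. This is only a sketch: the helper names build_sign_payload and split_signature are hypothetical, and ORDER_LEN = 32 is an assumption about BASEPOINT_ORDER_LEN (a 256-bit curve), which the listing above does not state.

```python
HASH_SLOT = 32            # octets available for the hash after the length octet
CMD_LEN = 1 + HASH_SLOT   # matches simpleserial_addcmd('s', 33, ecdsa_gen_sig)
ORDER_LEN = 32            # ASSUMPTION: value of BASEPOINT_ORDER_LEN in the firmware


def build_sign_payload(digest: bytes) -> bytearray:
    """Pack a hash as pt[0] = length, pt[1..] = hash, zero-padded to CMD_LEN."""
    if not 0 < len(digest) <= HASH_SLOT:
        raise ValueError("hash must be 1..%d octets long" % HASH_SLOT)
    payload = bytearray(CMD_LEN)          # all zeros
    payload[0] = len(digest)              # length octet read by the firmware
    payload[1:1 + len(digest)] = digest   # hash value itself
    return payload


def split_signature(response: bytes):
    """Split the 'r' response into the integers (r, s).

    mbedtls_mpi_write_binary writes big-endian, so both halves are
    decoded as big-endian integers.
    """
    if len(response) != 2 * ORDER_LEN:
        raise ValueError("unexpected signature length")
    r = int.from_bytes(response[:ORDER_LEN], "big")
    s = int.from_bytes(response[ORDER_LEN:], "big")
    return r, s
```

With the hardware connected, the payload would presumably be sent with target.simpleserial_write('s', payload) and the answer read with target.simpleserial_read_witherrors('r', 2 * ORDER_LEN, timeout=...), with a timeout long enough for the scalar multiplication.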
The return value from the target to the capture board (and then to the computer) is sent by the target with the
    void simpleserial_put(char c, uint8_t size, uint8_t* output)
function implemented in the simpleserial module.
On the side of the jupyter notebook (ECDSA-mbedtls) we use
    target.simpleserial_write
    target.simpleserial_read_witherrors
The API documentation is here:
    https://chipwhisperer.readthedocs.io/en/latest/api.html
In the case of scalar multiplication the timeout is very important (the target takes a long time to respond)!

5. Sources of the mbedtls
-------------------------
Github repository of mbedtls:
    https://github.com/ARMmbed/mbedtls
The sources are also available in the directory:
    chipwhisperer/hardware/victims/firmware/crypto/mbedtls

6. The idea of the attack - reconstruct ephemeral private key k
---------------------------------------------------------------
Having k it is easy to reconstruct the private key d: from s = k^(-1) * (h + r*d) mod n we get
    d = r^(-1) * (s*k - h) mod n
where h is the (truncated) hash of the message and n is the order of the base point.
For the calculation of r (derived from the ephemeral public key k*G) the scalar multiplication algorithm is needed.
In mbedtls the scalar multiplication algorithm on an elliptic curve in Weierstrass form is the comb method:
1. Chae Hoon Lim, Pil Joong Lee: "More Flexible Exponentiation with Precomputation", CRYPTO 1994,
   https://link.springer.com/chapter/10.1007/3-540-48658-5_11
2. Biljana Cubaleska, Andreas Rieke, Thomas Hermann: "Improving and Extending the Lim/Lee Exponentiation Algorithm", SAC 1999,
   https://link.springer.com/chapter/10.1007%2F3-540-46513-8_12
----------------------
To see that the comb method is really used by mbedtls, just follow the calls in the function
    mbedtls_ecdsa_sign
implemented in
    ./library/ecdsa.c
----------------------
Alternatively, in the main directory of the mbedtls repository we may use the command:
    grep -rin -A5 -B5 "comb" ./
We get a lot of false positives, but in the file
    ./library/ecp.c
we see the functions:

    void ecp_comb_recode_core( unsigned char x[], size_t d,
                               unsigned char w, const mbedtls_mpi *m )

which is called by:

    int ecp_comb_recode_scalar( const mbedtls_ecp_group *grp,
                                const mbedtls_mpi *m,
                                unsigned char k[COMB_MAX_D + 1],
                                size_t d,
                                unsigned char w,
                                unsigned char *parity_trick )

    int ecp_precompute_comb( const mbedtls_ecp_group *grp,
                             mbedtls_ecp_point T[], const mbedtls_ecp_point *P,
                             unsigned char w, size_t d,
                             mbedtls_ecp_restart_ctx *rs_ctx )

    static int ecp_mul_comb_core( const mbedtls_ecp_group *grp, mbedtls_ecp_point *R,
                                  const mbedtls_ecp_point T[], unsigned char T_size,
                                  const unsigned char x[], size_t d,
                                  int (*f_rng)(void *, unsigned char *, size_t),
                                  void *p_rng,
                                  mbedtls_ecp_restart_ctx *rs_ctx )

The last one is called by:

    static int ecp_mul_comb_after_precomp( const mbedtls_ecp_group *grp,
                                           mbedtls_ecp_point *R,
                                           const mbedtls_mpi *m,
                                           const mbedtls_ecp_point *T,
                                           unsigned char T_size,
                                           unsigned char w,
                                           size_t d,
                                           int (*f_rng)(void *, unsigned char *, size_t),
                                           void *p_rng,
                                           mbedtls_ecp_restart_ctx *rs_ctx )

and ecp_mul_comb_after_precomp and ecp_precompute_comb are called by:

    static int ecp_mul_comb( mbedtls_ecp_group *grp, mbedtls_ecp_point *R,
                             const mbedtls_mpi *m, const mbedtls_ecp_point *P,
                             int (*f_rng)(void *, unsigned char *, size_t),
                             void *p_rng,
                             mbedtls_ecp_restart_ctx *rs_ctx )

The mbedtls implementation of the comb method uses a trick:
the indices of columns are always odd numbers, but the scalar might be
longer by one bit.

6a. Find the fragments of the code that depend on the bits of the private ephemeral scalar
-------------------------------------------------------------------------------------------
(in the mbedtls sources the scalar is pointed to by m):
- ecp_comb_recode_core
- ecp_mul_comb_core (the call of ecp_select_comb)
- ecp_comb_recode_scalar (the call of mbedtls_mpi_safe_cond_assign and ecp_comb_recode_core)
We shall consider
    ecp_select_comb
A useful note: for a given size of T[] and given domain parameters the values in T[] will always be the same!

6b. How to collect traces?
--------------------------
- Two important functions on the side of the C code:
      void trigger_low(void);
      void trigger_high(void);
- How many cycles does each trace take? Read the variable
      scope.adc.trig_count
  after the call of
      scope.capture()
  (This value indicates how long the trigger was high or low >>>last time a trace was captured.<<<)
  See the examples:
      chipwhisperer/jupyter/archive/Fault_5-RSA_Fault_Attack.ipynb
      chipwhisperer/jupyter/archive/Fault_1-Introduction_to_Clock_Glitch_Attacks.ipynb
      chipwhisperer/jupyter/courses/fault101/SOLN_Fault 1_3 - Clock Glitching to Memory Dump.ipynb

Simplification in chipwhisperer: the target is synchronized with the scope (i.e., the clocks are synchronized).
    scope.clock.adc_src = "clkgen_x4"
- on the side of the jupyter notebook, sets the scope to sample 4x faster than the target clock
    scope.adc.samples = NNN
- on the side of the jupyter notebook, sets the length of the traces to NNN samples
...and reset the target before collecting traces.

For collecting traces use
    scope.capture()
    scope.get_last_trace()
For usages, just grep the jupyter notebooks delivered with the chipwhisperer package.

6c. A shortcoming of SCAPACK L1
-------------------------------
In SCAPACK L1 the trace buffer on the side of the scope is small.
Work around this by capturing the same operation several times with a shifted window
and concatenating the fragments of traces - use the variables
    scope.adc.offset
    scope.adc.samples
In this way we mimic the continuous (streaming) mode implemented in ChipWhisperer Pro.

6d. For known *m collect the traces depending on different values of the fragments of *m
----------------------------------------------------------------------------------------
Template attacks:
1. Traces corresponding to the same value of the fragment of *m are placed in the same bin.
2. Calculate the mean trace of each bin.
3. Find points of interest - the points where the mean traces of the bins differ significantly.
4. Prepare a template (the size of the covariance matrix is important: it grows with the square of the number of points of interest).
5. Make the attack.
For template attacks see chapter 5.3 of
    S. Mangard, E. Oswald, T. Popp: "Power Analysis Attacks: Revealing the Secrets of Smart Cards"

Recommended:
1. Use at least two devices, one for profiling, another one for attacks.
2. If you do not know where the data will be placed in memory, make the attack independent of addresses.
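The first three steps of the template procedure in 6d (binning, mean traces, points of interest) can be sketched in plain Python as below. The function names are hypothetical, and ranking samples by the sum of absolute differences between the mean traces is just one common selection rule, assumed here for illustration; the notes do not say which rule was used. Traces are plain lists of numbers and the labels stand for the known values of the fragment of *m.

```python
from collections import defaultdict


def mean_traces(traces, labels):
    """Steps 1-2: put traces with the same label in one bin, average each bin."""
    bins = defaultdict(list)
    for trace, label in zip(traces, labels):
        bins[label].append(trace)
    # zip(*group) iterates over the samples column by column
    return {label: [sum(col) / len(group) for col in zip(*group)]
            for label, group in bins.items()}


def points_of_interest(means, count, min_gap=1):
    """Step 3: pick the `count` sample indices where the bin means differ most.

    The score of a sample is the sum of absolute differences between all
    pairs of mean traces; chosen indices are at least `min_gap` apart.
    """
    labels = sorted(means)
    n_samples = len(means[labels[0]])
    score = [0.0] * n_samples
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            for t in range(n_samples):
                score[t] += abs(means[a][t] - means[b][t])
    chosen = []
    for t in sorted(range(n_samples), key=lambda t: score[t], reverse=True):
        if all(abs(t - c) >= min_gap for c in chosen):
            chosen.append(t)
        if len(chosen) == count:
            break
    return sorted(chosen)


# Toy data: only sample 1 depends on the label, sample 3 is just noisy.
toy_traces = [[0, 1, 0, 5], [0, 1, 0, 7], [0, 9, 0, 5], [0, 11, 0, 7]]
toy_labels = [0, 0, 1, 1]
toy_means = mean_traces(toy_traces, toy_labels)
toy_pois = points_of_interest(toy_means, count=1)
```

Keeping the number of points of interest small keeps the covariance matrix of step 4 small; min_gap is there so that neighbouring samples of the same clock cycle are not all selected.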