Side channel analysis with chipwhisperer SCAPACK L1 (notes by Przemyslaw Kubiak)
--------------------------------------------------------------------------------

1. How to start
---------------

Installation instructions (I have chosen the git way):
https://chipwhisperer.readthedocs.io/en/latest/installing.html

2. How to connect
-----------------

In chipwhisperer/jupyter  run

  jupyter notebook

Then go to the file (i.e., to the "chapter"):  

1 - Connecting to Hardware.ipynb

In that file:
Section: ChipWhisperer-Lite (Capture + UFO), Includes SCAPACK-L1/SCAPACK-L2 
shows how to connect the capture board with the target board.

Section: UFO Target Settings 
shows the default jumper setting of the UFO board

UFO Board is explained on the webpage (including how to change a target):
https://rtfm.newae.com/Targets/CW308%20UFO/


3. Communication protocol - where to find it
--------------------------------------------

The jupyter directory includes the subdirectory:

	courses

There we can see, for example, Lab 3_3 (three files) and SOLN_Lab_3_3.
The last file sets the following configuration variables

SCOPETYPE = 'OPENADC'
PLATFORM = 'CWLITEARM'
CRYPTO_TARGET = 'AVRCRYPTOLIB'
VERSION = 'HARDWARE'

and shows how to call one of the previous files.

The file  Lab 3_3 - DPA on Firmware Implementation of AES (HARDWARE)  includes the very basic steps necessary to run the target library on the target board:

%run "../../Setup_Scripts/Setup_Generic.ipynb"              - which tries to connect to capture board

%%bash -s "$PLATFORM" "$CRYPTO_TARGET" "$SS_VER"
cd ../../../hardware/victims/firmware/simpleserial-aes
make PLATFORM=$1 CRYPTO_TARGET=$2 SS_VER=$3                 - which compiles the sources of the target "library" together with the communication interface

cw.program_target(scope, prog, "../../../hardware/victims/firmware/simpleserial-aes/simpleserial-aes-{}.hex".format(PLATFORM))    - which programs the target board 


When we go to the directory  chipwhisperer/hardware/victims/firmware/simpleserial-aes
we see the firmware images in Intel HEX format (the .hex files), together with the corresponding sources and the makefile.
The makefile includes ../simpleserial/Makefile.simpleserial
The sources in the ../simpleserial directory  define the communication protocol between the target and the computer (we are using SS_VER='SS_VER_1_1' of the protocol). 
To understand the protocol you need to read the sources from the simpleserial directory.



4. Communication protocol - how to use it
-----------------------------------------

I am learning chipwhisperer by working on an attack on ECDSA signatures in the mbedtls library. The target is an STM32F3 microcontroller (32-bit Arm Cortex-M4 core).
For ECDSA see e.g., sect.4.2.1 of https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/TechGuidelines/TR03111/BSI-TR-03111_V-2-0_pdf.pdf?__blob=publicationFile&v=2 


I started with simple power analysis.
For each experiment on ECDSA a new version of the "firmware" for the target was prepared, that is, a new directory was created next to  simpleserial-aes.
The first one, however, was simply a test of the  mbedtls  library:

    chipwhisperer/hardware/victims/firmware/simpleserial-ecdsa

with two source files:

simpleserial-ecdsa.c
simpleserial-ecdsa-arm.c

where  simpleserial-ecdsa.c:


#include "hal.h"
#include "simpleserial.h"
#include <string.h>
#include <stdint.h>
#include <stdlib.h>



void ecdsa_init(void);
uint8_t ecdsa_set_key(uint8_t *pt);
uint8_t ecdsa_gen_key(uint8_t *pt);
uint8_t ecdsa_gen_sig(uint8_t *pt);
uint8_t ecdsa_gen_sig_det(uint8_t *pt);

int main(void)
{
    platform_init();
    init_uart();
    trigger_setup();

    ecdsa_init();

    simpleserial_init();
    simpleserial_addcmd('k', 32, ecdsa_set_key);
    simpleserial_addcmd('g', 0, ecdsa_gen_key);
    simpleserial_addcmd('s', 33, ecdsa_gen_sig);
    simpleserial_addcmd('d', 33, ecdsa_gen_sig_det);
    while(1)
        simpleserial_get();
}



and simpleserial-ecdsa-arm.c:


(...)


uint8_t ecdsa_gen_key(uint8_t *pt)
{
    int      ret = 0;                     //longer type than the return type; simpleserial_get sends only a single octet in the ack
    //const char *pers = "ecdsa";
    uint8_t  buf_for_compressed_point[1+FIELD_LEN];    
    size_t   compressed_point_length;
    //mbedtls_entropy_context entropy;
    //mbedtls_ctr_drbg_context ctr_drbg;

    //mbedtls_entropy_init( &entropy );        //!!!!!!!!!!! STM32F3 entropy mbedtls HowTo
    //mbedtls_ctr_drbg_init( &ctr_drbg );

    
    ((void) pt);

    if (key_is_empty) 
    {
        MBEDTLS_MPI_CHK( mbedtls_ecp_gen_key( ECPARAMS, &ctx, myrand, NULL ) );
        key_is_empty = 0;
    }    
    MBEDTLS_MPI_CHK( mbedtls_ecp_check_pub_priv( &ctx, &ctx) );

    memset(buf_for_compressed_point, 0, 1 + FIELD_LEN);
    MBEDTLS_MPI_CHK( mbedtls_ecp_point_write_binary( &ctx.grp, &ctx.Q, MBEDTLS_ECP_PF_COMPRESSED, &compressed_point_length, buf_for_compressed_point, 1 + FIELD_LEN ) );    
    simpleserial_put('r', compressed_point_length, buf_for_compressed_point);
   
cleanup:
    if (ret) simpleserial_put('r', sizeof(int), (uint8_t *)&ret);
    return( ret );
}




uint8_t ecdsa_set_key(uint8_t *pt)
{
    int      ret = 0;                     //longer type than the return type; simpleserial_get sends only a single octet in the ack
    //const char *pers = "ecdsa";
    uint8_t  buf_for_compressed_point[1+FIELD_LEN];        
    size_t   compressed_point_length;
    //mbedtls_entropy_context entropy;
    //mbedtls_ctr_drbg_context ctr_drbg;
    
    //mbedtls_entropy_init( &entropy );        //!!!!!!!!!!! STM32F3 entropy mbedtls HowTo
    //mbedtls_ctr_drbg_init( &ctr_drbg );

    if (key_is_empty) 
    {
        MBEDTLS_MPI_CHK( mbedtls_ecp_group_load( &ctx.grp, ECPARAMS ) );
        MBEDTLS_MPI_CHK( mbedtls_mpi_read_binary( &ctx.d, pt, FIELD_LEN ) );
        MBEDTLS_MPI_CHK( mbedtls_ecp_check_privkey( &ctx.grp, &ctx.d) );
        //MBEDTLS_MPI_CHK( mbedtls_ctr_drbg_seed( &ctr_drbg, mbedtls_entropy_func, &entropy, (const unsigned char *) pers, strlen( pers ) ) );
        MBEDTLS_MPI_CHK( mbedtls_ecp_mul( &ctx.grp, &ctx.Q, &ctx.d, &ctx.grp.G, NULL, NULL ) );   //mbedtls_ctr_drbg_random, &ctr_drbg ) );

        key_is_empty = 0;
    }
    
    memset(buf_for_compressed_point, 0, 1 + FIELD_LEN);
    MBEDTLS_MPI_CHK( mbedtls_ecp_point_write_binary( &ctx.grp, &ctx.Q, MBEDTLS_ECP_PF_COMPRESSED, &compressed_point_length, buf_for_compressed_point, 1 + FIELD_LEN ) );
    simpleserial_put('r', compressed_point_length, buf_for_compressed_point);
   
cleanup:
    if (ret) simpleserial_put('r', sizeof(int), (uint8_t *)&ret);
    return( ret );
}




uint8_t ecdsa_gen_sig(uint8_t *pt)   //pt[0] contains the length of the hash, the next pt[0] octets contain the hash value
{
    int      ret = 0;          //longer type than the return type; simpleserial_get sends only a single octet in the ack
    
    uint8_t  buf_for_sig[2*(BASEPOINT_ORDER_LEN)]; 
    mbedtls_mpi r, s;

    mbedtls_mpi_init( &r );
    mbedtls_mpi_init( &s );
    MBEDTLS_MPI_CHK( mbedtls_ecdsa_sign( &ctx.grp, &r, &s, &ctx.d, pt+1, pt[0], myrand, NULL ) );
    //MBEDTLS_MPI_CHK( mbedtls_ecdsa_verify( &ctx.grp, pt+1, pt[0], &ctx.Q, &r, &s ) );

    memset(buf_for_sig, 0, 2*(BASEPOINT_ORDER_LEN));
    MBEDTLS_MPI_CHK( mbedtls_mpi_write_binary( &r, buf_for_sig, BASEPOINT_ORDER_LEN ) );
    MBEDTLS_MPI_CHK( mbedtls_mpi_write_binary( &s, buf_for_sig + BASEPOINT_ORDER_LEN, BASEPOINT_ORDER_LEN ) );
    simpleserial_put('r', 2*(BASEPOINT_ORDER_LEN), buf_for_sig);

cleanup:
    if (ret) simpleserial_put('r', sizeof(int), (uint8_t *)&ret);

    mbedtls_mpi_free( &r );
    mbedtls_mpi_free( &s );
    return( ret );
}


(...)



The return value from the target to the capture board (and then to the computer) is sent by the target with the

void simpleserial_put(char c, uint8_t size, uint8_t* output)

function implemented in the simpleserial module.


On the side of the jupyter notebook (ECDSA-mbedtls) we use 

target.simpleserial_write
target.simpleserial_read_witherrors


The API documentation is here:  https://chipwhisperer.readthedocs.io/en/latest/api.html

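In the case of scalar multiplication the read timeout is very important! The operation takes much longer than AES, so a timeout that is too short may expire before the target replies.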
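
A minimal sketch of the notebook side of the 's' command, assuming the firmware shown above (33-octet input: one length octet followed by the hash). The helper name request_signature, the zero padding of short digests and the 20000 ms timeout are my assumptions, not part of the chipwhisperer package; sig_len must equal 2*(BASEPOINT_ORDER_LEN) of the firmware build.

```python
def request_signature(target, digest, sig_len, timeout_ms=20000):
    """Send an 's' command and return the signature as integers (r, s).

    target     : object offering simpleserial_write and
                 simpleserial_read_witherrors (the SimpleSerial target)
    digest     : hash value, at most 32 octets
    sig_len    : 2 * BASEPOINT_ORDER_LEN of the firmware build
    timeout_ms : assumed value; tune it to the measured run time
    """
    if len(digest) > 32:
        raise ValueError("the 's' command carries at most 32 hash octets")
    # pt[0] = hash length, then the hash, zero-padded to 32 octets
    payload = bytearray([len(digest)]) + bytearray(digest)
    payload += bytearray(33 - len(payload))
    target.simpleserial_write('s', payload)
    resp = target.simpleserial_read_witherrors('r', sig_len, timeout=timeout_ms)
    if not resp['valid']:
        raise IOError("no valid response: {!r}".format(resp.get('full_response')))
    sig = bytes(resp['payload'])
    half = sig_len // 2
    # mbedtls_mpi_write_binary produces big-endian octets
    r = int.from_bytes(sig[:half], 'big')
    s = int.from_bytes(sig[half:], 'big')
    return r, s
```

If the firmware reports an error it sends sizeof(int) octets instead of the signature, so the read above is expected to come back as not valid in that case.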









5. Sources of mbedtls
---------------------
GitHub repository of mbedtls:

	https://github.com/ARMmbed/mbedtls

The sources are also available in the directory:

	chipwhisperer/hardware/victims/firmware/crypto/mbedtls




6. The idea of the attack - reconstruct ephemeral private key  k
----------------------------------------------------------------
Given  k  and a single signature (r, s), it is easy to reconstruct the private key  d.
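
The reconstruction follows from the ECDSA signing equation s = k^(-1) * (e + r*d) mod n, where e is the integer derived from the message hash and n is the order of the base point. Solving for d gives d = (s*k - e) * r^(-1) mod n. A small sketch with made-up toy numbers (no real curve is involved; the function name is mine):

```python
def recover_private_key(r, s, e, k, n):
    """Return d = (s*k - e) * r^(-1) mod n.

    Derived from s = k^(-1) * (e + r*d) mod n.
    pow(r, -1, n) needs Python 3.8 or newer.
    """
    return ((s * k - e) * pow(r, -1, n)) % n


# Toy check: n stands in for the (prime) order of the base point,
# r and e stand in for x(k*G) mod n and for the hash value.
n = 101
d, k = 57, 23
r, e = 42, 77
s = (pow(k, -1, n) * (e + r * d)) % n
assert recover_private_key(r, s, e, k, n) == d
```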
To compute  r  (which is derived from the ephemeral public key k*G) a scalar multiplication algorithm is needed.
In mbedtls the scalar multiplication algorithm on elliptic curves in Weierstrass form is the comb method:


1. Chae Hoon Lim, Pil Joong Lee: "More Flexible Exponentiation with Precomputation", CRYPTO 1994, https://link.springer.com/chapter/10.1007/3-540-48658-5_11 
2. Biljana Cubaleska, Andreas Rieke, Thomas Hermann: "Improving and Extending the Lim/Lee Exponentiation Algorithm", SAC 1999, https://link.springer.com/chapter/10.1007%2F3-540-46513-8_12
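
To get a feeling for the method, here is a sketch of the textbook unsigned comb, written for an arbitrary additive group (by default plain integers, so that the result can be checked as m*P). This is NOT the mbedtls code: mbedtls uses a recoded, signed variant. The names comb_mul, w (number of rows) and d (number of columns) are chosen for illustration.

```python
def comb_mul(m, P, w, d, add=lambda a, b: a + b, zero=0):
    """Textbook unsigned comb: compute m*P for 0 <= m < 2**(w*d)."""
    if not 0 <= m < 1 << (w * d):
        raise ValueError("scalar does not fit into w*d bits")

    def double(x):
        return add(x, x)

    # base[b] = 2**(b*d) * P, one entry per row of the comb
    base = [P]
    for _ in range(1, w):
        x = base[-1]
        for _ in range(d):
            x = double(x)
        base.append(x)

    # T[j] = sum of base[b] over the bits b set in j (2**w entries)
    T = [zero] * (1 << w)
    for j in range(1, 1 << w):
        low = (j & -j).bit_length() - 1          # index of the lowest set bit
        T[j] = add(T[j & (j - 1)], base[low])

    # Scan the columns from the most significant one:
    # one doubling and one table lookup per column.
    R = zero
    for i in reversed(range(d)):
        col = 0
        for b in range(w):
            col |= ((m >> (i + b * d)) & 1) << b  # bit i of row b
        R = add(double(R), T[col])
    return R


print(comb_mul(718, 1, w=2, d=5))   # integers with P = 1: prints 718
```

Note that the table index col is built directly from bits of the scalar, which is why the table lookup is the interesting place for a side channel.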


----------------------

To confirm that mbedtls really uses the comb method, just follow the calls made by the function  mbedtls_ecdsa_sign,  implemented in ./library/ecdsa.c

----------------------

Alternatively, in the main directory of the mbedtls repository  we may use the command:

   grep -rin -A5 -B5 "comb" ./
	
This gives a lot of false positives, but in the file:

	./library/ecp.c

we see the functions:


void ecp_comb_recode_core( unsigned char x[], size_t d,
                                  unsigned char w, const mbedtls_mpi *m )


which is called by:

int ecp_comb_recode_scalar( const mbedtls_ecp_group *grp,
                                   const mbedtls_mpi *m,
                                   unsigned char k[COMB_MAX_D + 1],
                                   size_t d,
                                   unsigned char w,
                                   unsigned char *parity_trick )


int ecp_precompute_comb( const mbedtls_ecp_group *grp,
                                mbedtls_ecp_point T[], const mbedtls_ecp_point *P,
                                unsigned char w, size_t d,
                                mbedtls_ecp_restart_ctx *rs_ctx )

static int ecp_mul_comb_core( const mbedtls_ecp_group *grp, mbedtls_ecp_point *R,
                              const mbedtls_ecp_point T[], unsigned char T_size,
                              const unsigned char x[], size_t d,
                              int (*f_rng)(void *, unsigned char *, size_t),
                              void *p_rng,
                              mbedtls_ecp_restart_ctx *rs_ctx )
                              
The last one is called by:                              

static int ecp_mul_comb_after_precomp( const mbedtls_ecp_group *grp,
                                mbedtls_ecp_point *R,
                                const mbedtls_mpi *m,
                                const mbedtls_ecp_point *T,
                                unsigned char T_size,
                                unsigned char w,
                                size_t d,
                                int (*f_rng)(void *, unsigned char *, size_t),
                                void *p_rng,
                                mbedtls_ecp_restart_ctx *rs_ctx )


and both  ecp_mul_comb_after_precomp  and  ecp_precompute_comb  are called by:

static int ecp_mul_comb( mbedtls_ecp_group *grp, mbedtls_ecp_point *R,
                         const mbedtls_mpi *m, const mbedtls_ecp_point *P,
                         int (*f_rng)(void *, unsigned char *, size_t),
                         void *p_rng,
                         mbedtls_ecp_restart_ctx *rs_ctx )
                         

The mbedtls implementation of the comb method uses a trick: the scalar is recoded so that the column indices are always odd numbers, at the price that the recoded scalar might be longer by one bit.

                         
                         
6a. Find the fragments of the code that depend on the bits of the private ephemeral scalar
------------------------------------------------------------------------------------------

(in the mbedtls sources the scalar is pointed to by  m ):

- ecp_comb_recode_core
- ecp_mul_comb_core  (the call of ecp_select_comb )
- ecp_comb_recode_scalar (the call of mbedtls_mpi_safe_cond_assign  and  ecp_comb_recode_core)


We shall consider  ecp_select_comb.
A useful note: for a given size of T[] and given domain parameters the values in T[] will always be the same!


6b. How to collect traces?
--------------------------

- Two important functions on the side of the C code:

void trigger_low(void);   
void trigger_high(void);



- How many cycles does each trace take?
Read the variable  scope.adc.trig_count  after the call of scope.capture(). (This value indicates how long the trigger was high or low the last time a trace was captured, so it always refers to the most recent capture only.)

See the examples:
chipwhisperer/jupyter/archive/Fault_5-RSA_Fault_Attack.ipynb
chipwhisperer/jupyter/archive/Fault_1-Introduction_to_Clock_Glitch_Attacks.ipynb
chipwhisperer/jupyter/courses/fault101/SOLN_Fault 1_3 - Clock Glitching to Memory Dump.ipynb



Simplification in chipwhisperer: target is synchronized with the scope (i.e., the clocks are synchronized).

scope.clock.adc_src = "clkgen_x4"  - on the side of jupyter notebook  set the scope to be 4x faster than the target
scope.adc.samples = NNN  - set to NNN the length of the traces on the side of jupyter notebook
...and reset target before collecting traces


For collecting traces use 

scope.capture()
scope.get_last_trace()

For usage examples just grep the jupyter notebooks delivered with the chipwhisperer package.
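
The calls above can be combined roughly as follows. This is a sketch: the helper name capture_one is mine, and scope and target are assumed to be the objects created by the setup notebook.

```python
def capture_one(scope, target, cmd, payload, n_samples):
    """Arm the scope, start one operation, return (trace, trig_count)."""
    scope.adc.samples = n_samples
    scope.arm()                               # wait for the trigger line
    target.simpleserial_write(cmd, payload)   # firmware raises the trigger
    timed_out = scope.capture()               # True means the capture timed out
    if timed_out:
        raise RuntimeError("capture timed out (no trigger, or operation too long)")
    trace = scope.get_last_trace()
    # trig_count describes the capture that has just finished
    return trace, scope.adc.trig_count
```

The caller still has to read the target's response afterwards (with a sufficiently long timeout) so that the serial buffer is empty before the next command. For long operations the scope's own capture timeout (scope.adc.timeout) may need to be increased as well.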



6c. A shortcoming of SCAPACK L1
-------------------------------

In SCAPACK L1 the trace buffer on the side of the scope is small.
Bypass this by repeating the same operation and concatenating fragments of traces - use the variables

    scope.adc.offset
    scope.adc.samples 

In this way we mimic the continuous (streaming) mode implemented in ChipWhisperer Pro.
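
A sketch of the concatenation, under the assumption that the target can be made to repeat exactly the same computation (same inputs and the same ephemeral scalar), since otherwise the fragments do not belong to one trace. The names capture_segmented and run_operation are hypothetical.

```python
def capture_segmented(scope, run_operation, total_samples, chunk):
    """Stitch one long trace from captures taken at increasing offsets.

    run_operation : hypothetical callback that makes the target repeat
                    exactly the same computation
    chunk         : samples per capture; must fit into the scope buffer
    """
    pieces = []
    for offset in range(0, total_samples, chunk):
        scope.adc.offset = offset                        # skip samples after the trigger
        scope.adc.samples = min(chunk, total_samples - offset)
        scope.arm()
        run_operation()
        if scope.capture():                              # True means timeout
            raise RuntimeError("capture timed out at offset {}".format(offset))
        pieces.extend(scope.get_last_trace())
    return pieces
```

The price is one full run of the operation per fragment, so the total capture time grows with total_samples / chunk.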


6d. For known *m collect the traces depending on different values of the fragments of *m
----------------------------------------------------------------------------------------

Template attacks:

1. Traces corresponding to the same value of the fragment of *m  are placed in the same bin.
2. Calculate the mean trace of each bin.
3. Find points of interest - the points where the mean traces of the bins differ significantly.
4. Prepare a template for each bin (the size of the covariance matrix is important).
5. Make the attack.
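
The steps above can be sketched with numpy as follows. This is only an illustration under my own assumptions: points of interest are chosen by the sum of absolute differences of the mean traces, a small ridge term keeps the covariance matrices invertible, and every bin holds at least two traces. The function names and parameters (n_poi, min_gap) are made up.

```python
import numpy as np

def build_templates(traces, labels, n_poi=5, min_gap=1):
    """Steps 1-4: bin the traces, average, pick POIs, fit mean and covariance."""
    traces = np.asarray(traces, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Steps 1-2: one bin per label value, mean trace of each bin
    means = np.array([traces[labels == c].mean(axis=0) for c in classes])
    # Step 3: score each sample by the summed |difference| of the bin means
    score = np.zeros(traces.shape[1])
    for i in range(len(classes)):
        for j in range(i):
            score += np.abs(means[i] - means[j])
    pois = []
    for _ in range(n_poi):
        p = int(np.argmax(score))
        pois.append(p)
        # suppress neighbours so that the POIs are not all on one peak
        score[max(0, p - min_gap):p + min_gap + 1] = -np.inf
    pois = np.array(sorted(pois))
    # Step 4: mean vector and covariance matrix at the POIs, per bin
    templates = {}
    for c, mu in zip(classes, means):
        x = traces[labels == c][:, pois]
        cov = np.atleast_2d(np.cov(x, rowvar=False))
        cov = cov + 1e-9 * np.eye(len(pois))     # ridge term (assumption)
        templates[c] = (mu[pois], cov)
    return pois, templates

def classify(trace, pois, templates):
    """Step 5: return the label whose Gaussian template fits the trace best."""
    x = np.asarray(trace, dtype=float)[pois]
    best, best_ll = None, -np.inf
    for c, (mu, cov) in templates.items():
        diff = x - mu
        _, logdet = np.linalg.slogdet(cov)
        ll = -0.5 * (logdet + diff @ np.linalg.solve(cov, diff))
        if ll > best_ll:
            best, best_ll = c, ll
    return best
```

The covariance matrix has n_poi x n_poi entries per bin, so n_poi has to stay small compared with the number of profiling traces in each bin.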


For template attacks see chapter 5.3 of  S. Mangard, E. Oswald, T. Popp: "Power Analysis Attacks: Revealing the Secrets of Smart Cards"
 
Recommended: 
1. Use at least two devices, one for profiling, another one for attacks.
2. If you do not know where the data will be placed in memory, make the attack independent of the addresses.