Federated Learning Concepts
Federated learning trains a shared model across many decentralised devices or servers, each keeping its data local. Only model updates (gradients or weights) are shared, so raw training data never leaves the device that holds it.
The FedAvg Algorithm
The canonical federated learning algorithm, Federated Averaging (FedAvg), alternates between distributing the global model to clients, local training on private data, and aggregating client updates on a central server.
FedAvg Step-by-Step
- Server broadcasts the current global model weights \\(w_t\\) to a selected subset of clients
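- Each selected client k runs \\(E\\) epochs of SGD on its local dataset, starting from \\(w_t\\) and producing updated weights \\(w_{t+1}^k\\)
- Server aggregates client weights via a weighted average: \\(w_{t+1} = \\sum_k \\frac{n_k}{n} w_{t+1}^k\\), where \\(n_k\\) is client k's dataset size and \\(n = \\sum_k n_k\\) is the total over the participating clients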
- Repeat until convergence
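Under suitable conditions (for example, a well-chosen learning rate and client data that is not too heterogeneous), this procedure converges to a global model without any raw data leaving the clients.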
Conceptual Simulation in NumPy
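The FedAvg steps can be sketched in NumPy as a minimal simulation. It is a toy under stated assumptions, not a reference implementation: the model is linear least-squares regression, local training uses full-batch gradient steps in place of mini-batch SGD, every client takes part in every round, and the names `local_sgd`, `fedavg_round`, and `make_clients` are illustrative.

```python
import numpy as np

def local_sgd(w, X, y, epochs=5, lr=0.1):
    """Client step: a few full-batch gradient steps on a least-squares loss."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg_round(w_global, clients, epochs=5, lr=0.1):
    """One round: broadcast w_global, train locally, average weighted by n_k / n."""
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    local_weights = np.stack(
        [local_sgd(w_global, X, y, epochs, lr) for X, y in clients]
    )
    return (sizes / sizes.sum()) @ local_weights

def make_clients(rng, w_true, sizes):
    """Synthetic data (assumption): each client draws IID samples from one linear model."""
    clients = []
    for n_k in sizes:
        X = rng.normal(size=(n_k, w_true.size))
        y = X @ w_true + 0.01 * rng.normal(size=n_k)
        clients.append((X, y))
    return clients

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5])
clients = make_clients(rng, w_true, sizes=[20, 50, 200])

w = np.zeros(3)  # initial global model
for t in range(50):
    w = fedavg_round(w, clients)

print("global model:", w)
print("true weights:", w_true)
```

Only the weight vectors cross the client boundary in `fedavg_round`; each `(X, y)` pair is read solely inside `local_sgd`, which stands in for computation on the client.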
Privacy Enhancements
Sharing raw gradients can leak information about training data. Two key techniques harden federated learning against privacy attacks.
Differential Privacy
Differential Privacy (DP) clips each client update to a bounded norm and adds calibrated Gaussian noise before uploading, providing a mathematical bound on how much any individual data point can influence the shared updates. The privacy budget \\(\\epsilon\\) controls the trade-off: a smaller \\(\\epsilon\\) means stronger privacy but lower model utility.
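As a rough illustration of the noise step, the sketch below bounds an update's L2 norm and then adds Gaussian noise scaled to that bound. The function name `privatize_update` and the parameters `clip_norm` and `noise_multiplier` are assumptions made for this example. Translating a noise level into a concrete \\(\\epsilon\\) needs a privacy accountant, which is not shown.

```python
import numpy as np

def privatize_update(update, clip_norm, noise_multiplier, rng):
    """Clip the update to L2 norm <= clip_norm, then add Gaussian noise."""
    norm = np.linalg.norm(update)
    # Scale down only if the update exceeds the clipping bound.
    scale = min(1.0, clip_norm / (norm + 1e-12))
    clipped = update * scale
    # Noise standard deviation is proportional to the clipping bound (the sensitivity).
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

rng = np.random.default_rng(0)
update = np.array([3.0, 4.0])  # L2 norm is 5

# With no noise, only the clipping is visible: direction kept, norm reduced to 1.
clipped_only = privatize_update(update, clip_norm=1.0, noise_multiplier=0.0, rng=rng)

# With noise, the uploaded vector is a perturbed version of the clipped update.
noisy = privatize_update(update, clip_norm=1.0, noise_multiplier=0.5, rng=rng)
print(clipped_only, noisy)
```

Clipping matters because the Gaussian noise can only be calibrated once the maximum contribution of a single update is bounded.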
Secure Aggregation
Secure Aggregation uses cryptographic protocols (secret sharing, homomorphic encryption) so the server learns only the sum of client updates and never any individual client's update. This protects against an honest-but-curious server while still enabling accurate FedAvg aggregation.
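The idea that the server sees only the sum can be illustrated with a toy pairwise-masking sketch. This is a swapped-in simplification, not secret sharing or homomorphic encryption: each pair of clients is assumed to agree on a random mask that one adds and the other subtracts, so all masks cancel in the total. Practical protocols work over finite fields, derive masks from key agreement, and handle client dropouts, none of which is modelled here. The name `pairwise_masks` is hypothetical.

```python
import numpy as np

def pairwise_masks(num_clients, dim, rng):
    """Build one mask per client such that the masks sum to zero."""
    masks = np.zeros((num_clients, dim))
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            # Stand-in for a secret shared only between clients i and j.
            shared = rng.normal(size=dim)
            masks[i] += shared
            masks[j] -= shared
    return masks

rng = np.random.default_rng(0)
updates = rng.normal(size=(4, 3))      # true client updates, never sent in the clear
masks = pairwise_masks(4, 3, rng)

masked = updates + masks               # what each client uploads
server_sum = masked.sum(axis=0)        # masks cancel, leaving the true sum

print(np.allclose(server_sum, updates.sum(axis=0)))
```

Each uploaded row of `masked` looks like noise on its own, yet the sum the server computes matches the sum of the unmasked updates up to floating-point error.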
Challenges in Federated Learning
Federated learning introduces challenges absent in centralised training: non-IID data across clients, unreliable client connectivity, and slower convergence.
Key Challenges and Mitigations
- Statistical heterogeneity (non-IID data): FedProx adds a proximal term to each client's local objective, penalising large deviations from the global model
- System heterogeneity: Asynchronous aggregation or client sampling handles slow or dropping clients
- Communication efficiency: Gradient compression, quantisation, and model distillation reduce upload costs
- Convergence: More local epochs per round reduce the number of communication rounds needed, but can increase client drift on non-IID data
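To make the FedProx mitigation concrete, the sketch below adds the gradient of a proximal penalty \\(\\frac{\\mu}{2} \\lVert w - w_t \\rVert^2\\) to a client's local update and compares the resulting drift from the global model with plain local training. The least-squares model, the synthetic data, the value of `mu`, and the function name `fedprox_local_train` are all assumptions chosen for illustration.

```python
import numpy as np

def fedprox_local_train(w_global, X, y, mu=0.1, epochs=5, lr=0.1):
    """Local least-squares training with a proximal pull towards w_global."""
    w = w_global.copy()
    for _ in range(epochs):
        grad_loss = 2.0 * X.T @ (X @ w - y) / len(y)
        grad_prox = mu * (w - w_global)  # gradient of (mu / 2) * ||w - w_global||^2
        w -= lr * (grad_loss + grad_prox)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([5.0, -3.0, 2.0])  # this client's optimum is far from the global model
w_global = np.zeros(3)

w_plain = fedprox_local_train(w_global, X, y, mu=0.0)  # mu = 0 is plain local training
w_prox = fedprox_local_train(w_global, X, y, mu=5.0)

drift_plain = np.linalg.norm(w_plain - w_global)
drift_prox = np.linalg.norm(w_prox - w_global)
print(drift_plain, drift_prox)
```

A larger `mu` keeps local models closer to the global one, which limits drift on non-IID data at the cost of slower local progress.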