Setting up AI Infrastructure with sensitive data (part 1)

I searched the internet for information on hosting AI training software, with little luck. Here is my setup; maybe it will help you make better decisions.

The training rig

With a strict budget limit of $15k, we are not going to outperform Alibaba servers, but we will set up a decent training rig with secure object storage.

Labeled data will keep flowing in from collection sites and will be put in a read-only bucket. Experiments will need differently processed data, so processed data will be stored in separate buckets. The object storage server should be able to handle all that securely and efficiently.
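
The bucket layout can be written down as an S3-style access policy, which is the policy format MinIO uses for users and groups. This is a minimal sketch: the bucket name `raw-labeled` and the `processed-` prefix are made-up examples, and the code only builds the policy document without contacting a server.

```python
import json

def training_user_policy(raw_bucket: str, processed_prefix: str) -> dict:
    """Policy for a training user: read-only on raw data, read/write on processed data."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Raw labeled data: list and read only, no PutObject or DeleteObject.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{raw_bucket}",
                    f"arn:aws:s3:::{raw_bucket}/*",
                ],
            },
            {
                # Processed data: experiments may also write their own variants.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{processed_prefix}*",
                    f"arn:aws:s3:::{processed_prefix}*/*",
                ],
            },
        ],
    }

# Hypothetical names, for illustration only.
policy = training_user_policy("raw-labeled", "processed-")
print(json.dumps(policy, indent=2))
```

Keeping write permission out of the first statement is what makes the raw bucket read-only for this user; the collection sites would need a separate write-only policy.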

Objectives

  • data privacy
  • on-premise computing
  • reproducible models
  • models optimized for specialised hardware (quantization aware training)
  • team collaboration

This training setup is supposed to suit our specific use case, which includes PII data. If that wasn’t the case, I would go with one of the cloud services.

Server specs

Training server

- CPU Intel® Xeon® GOLD 5220R, 24 cores, 2.2 GHz, 35.75MB L3 cache (150W)
- RAM 32GB, DDR4 ECC Registered 2666 MHz
- GPU GeForce RTX 3090 24GB
- SSD 1.6TB Samsung PM1725b, HHHL PCIe 3.0 x8, NVMe

Storage server

- CPU Intel® Xeon® E-2224, 4 cores, 3.4 GHz, 8MB L3 cache (71W)
- RAM 16GB DDR4 ECC Unbuffered 2666 MHz
- SSD SATA 240GB, 2.5in SATA 6Gb/s, enterprise
- Disk HDD SATA, 8TB, 7200rpm, Enterprise

Software

A few open source object storage solutions are available; I chose MinIO. For extra security it is configured in SSE-S3 mode, where the secret key is managed by an external KMS.
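
From the client side, SSE-S3 is requested with a single standard S3 header on upload; the server and the KMS handle the keys. A small sketch of the headers for such a PUT request follows. The function name is made up, and a real request also needs authentication headers, which an S3 SDK would normally add.

```python
def sse_s3_put_headers(content_type: str = "application/octet-stream") -> dict:
    """Headers asking the object store to encrypt the uploaded object with SSE-S3."""
    return {
        "Content-Type": content_type,
        # Standard S3 header: the server encrypts with keys it obtains itself (here via the KMS).
        "x-amz-server-side-encryption": "AES256",
    }

headers = sse_s3_put_headers("image/png")
print(headers)
```

If the bucket has default encryption configured, the header can be left out and objects are still encrypted on write.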

For running experiments we use JupyterLab. We have created a shared space on disk so we can share notebooks without the trouble of going through git repos. A Coral TPU M.2 is connected to the AI server to run quantized neural networks the same way they run on the Edge.
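
To give an idea of what "quantized" means here: during quantization aware training, values are typically rounded to a small integer grid and converted back in the forward pass, so the network learns to tolerate the rounding error. The sketch below shows that quantize/dequantize step in plain Python with 8 bits. It illustrates the idea only and is not the framework code we train with.

```python
def fake_quantize(values, num_bits=8):
    """Round floats to an unsigned integer grid and map them back (affine scheme)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range must contain 0.0 so that zero stays exact.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for all-zero input
    zero_point = round(qmin - lo / scale)
    result = []
    for v in values:
        q = round(v / scale) + zero_point   # quantize
        q = max(qmin, min(qmax, q))         # clamp to the integer range
        result.append((q - zero_point) * scale)  # dequantize
    return result

weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
print(fake_quantize(weights))
```

Each output differs from its input by at most about one grid step, which is the error the network sees during training and later on the device.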

For data versioning we stick with DVC, which introduces some redundancy but really helps to organise training.
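
The redundancy largely comes from content-addressed storage: every version of a tracked file is kept as a separate object named after a hash of its content. The sketch below imitates that idea with an MD5 hash and a two-character directory prefix. It is a simplified illustration, not DVC's actual code, and the `cache` directory name is made up.

```python
import hashlib
from pathlib import Path

def content_key(data: bytes) -> str:
    """Identify a file version by the hash of its content."""
    return hashlib.md5(data).hexdigest()

def cache_path(cache_dir: Path, data: bytes) -> Path:
    """Place the object under <cache>/<first two hex chars>/<remaining hex chars>."""
    key = content_key(data)
    return cache_dir / key[:2] / key[2:]

v1 = b"label,file\ncat,001.png\n"
v2 = b"label,file\ncat,001.png\ndog,002.png\n"
print(cache_path(Path("cache"), v1))
print(cache_path(Path("cache"), v2))  # a changed file becomes a second, separate object
```

Identical content always maps to the same path, so unchanged files are stored once no matter how many experiments reference them.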

Experiments are tracked using Weights & Biases. It is a pretty expensive service, but we found it the easiest to use. We also use W&B Sweeps to automate hyperparameter tuning, which is very convenient.
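
A sweep is driven by a configuration that names the search method, the metric to optimize, and the parameter ranges. In the sketch below, the parameter names, the ranges, and the metric name `val_loss` are assumptions, and `sample_params` is a local stand-in that imitates one random draw so the example runs without a W&B account.

```python
import random

# Sweep configuration in the dictionary form W&B accepts.
sweep_config = {
    "method": "random",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-2},
        "batch_size": {"values": [16, 32, 64]},
    },
}

def sample_params(config: dict, rng: random.Random) -> dict:
    """Stand-in for one random-search trial: draw one value per parameter."""
    params = {}
    for name, spec in config["parameters"].items():
        if "values" in spec:
            params[name] = rng.choice(spec["values"])
        else:
            params[name] = rng.uniform(spec["min"], spec["max"])
    return params

# With the wandb package, the same dictionary would be passed to wandb.sweep(...)
# and trials would be started with wandb.agent(...).
print(sample_params(sweep_config, random.Random(0)))
```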

W&B is an outside service, so we don’t post any data there. At some point the plan is to switch to on-prem experiment tracking.

In Part two I will share how data flows during the experiment.

Let me know on twitter if you enjoyed this read.

Incorporating Swift in Objective-C projects

Background

Since Apple released Swift, a lot of people have been eager to use it, but some of us don’t start new projects that often. Newcomers don’t even learn Objective-C anymore. Before Swift, our iOS community never had the chance to write in a new language. The last update I remember was Modern Objective-C, which brought us a few useful things, but the changes were small. We had the Convert to Modern Objective-C Syntax button, which mostly did the job. It’s also worth remembering that along with Swift we got an update for Obj-C. And remember the days of gcc? Since Apple changed compilers, things have been developing at a faster pace.

The Swift Decision

So, if you’re wondering whether you should port your app to Swift, this is my advice.

Wait for the occasion. Look for:

  • refactor planned
  • new UI coming
  • client demanding the change
  • a lot of free time

In my case it was a new UI. We decided to hold off on Swift and start incorporating it with the new UI. It went better than expected. What we strove to achieve was to leave model data, communicators, and managers in Obj-C, and build a Swift UI layer on top of them. The results are fantastic: we have separated the “backend” and “frontend” of the app. A lot of functions were moved from ViewControllers to Obj-C managers, along with the relevant getters we needed for the UI.

Implementation

I won’t get into details, because here is a great intro.

My notes on mixing Swift with Objective-C:

  • If you have multiple targets, Xcode will generate a header for each target. In my case, I wanted one header for all targets.
  • Swift can’t access class variables. So you’ll have to prepare getters.

How to make bitcoin a better currency

Dump from a presentation I gave at the uni.

Bitcoin advantages

  1. No central point of trust
  2. Incentives and economic system
  3. Predictable money supply
  4. Divisibility and fungibility
  5. Versatility, openness, vibrancy
  6. Scripting
  7. Transaction irreversibility
  8. Low fees and friction
  9. Readily available implementations


Threats:

  • compromised private key
  • signature forgeries


Possible solutions:

  • threshold cryptography - split private keys into multiple devices
  • super wallet - threshold cryptography + sub-wallet in a smartphone
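
The splitting idea can be sketched with Shamir secret sharing: the key becomes the constant term of a random polynomial, each device gets one point on it, and any `threshold` points recover the key. This is only an illustration under stated assumptions. Real threshold signature schemes sign without ever reassembling the key in one place, and the function names and the choice of field are mine.

```python
import secrets

# Mersenne prime, large enough to hold a 256-bit private key.
PRIME = 2 ** 521 - 1

def split_secret(secret: int, threshold: int, shares: int):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]

def recover_secret(points) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

key = secrets.randbits(256)              # stand-in for a private key
parts = split_secret(key, threshold=2, shares=3)
print(recover_secret(parts[:2]) == key)  # any two devices are enough
```

With a 2-of-3 split, a thief who compromises a single device learns nothing about the key, and the owner can still lose one device without losing the coins.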


Accidental loss of bitcoin

lost private key = zombie coins

Solutions:

  • backup
  • pseudo random keys (keep only a seed)
  • encryption
  • trusted paths (DigiPass)
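
The "keep only a seed" idea can be sketched as follows: every key is derived from one secret seed and a counter, so backing up the seed backs up all keys, including future ones. This is a simplified illustration using HMAC-SHA512, not the derivation scheme of any particular wallet. A real wallet must also check that the result is a valid private key for the curve.

```python
import hashlib
import hmac

def derive_key(seed: bytes, index: int) -> bytes:
    """Derive the index-th 32-byte key deterministically from the seed."""
    message = index.to_bytes(4, "big")
    return hmac.new(seed, message, hashlib.sha512).digest()[:32]

seed = b"example seed - use real randomness in practice"
keys = [derive_key(seed, i) for i in range(3)]
for k in keys:
    print(k.hex())
```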


Deflation

  • Bitcoin’s supply was planned at the very beginning
  • there will never be more than 21M Bitcoins (lost included)
  • growing value of Bitcoin encourages saving
  • saving decreases circulation
  • low circulation discourages block creation
  • low block creation may lead to sudden collapse of value or large-scale fraud
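
The 21M cap follows from the block reward schedule: the reward starts at 50 BTC and is halved every 210,000 blocks, with integer rounding in satoshi (1 BTC = 100,000,000 satoshi). A short sketch that sums the schedule:

```python
SATOSHI_PER_BTC = 100_000_000

def total_supply_satoshi(initial_reward=50 * SATOSHI_PER_BTC, halving_interval=210_000):
    """Sum all block rewards until the reward rounds down to zero."""
    total = 0
    reward = initial_reward
    while reward > 0:
        total += halving_interval * reward
        reward >>= 1  # integer halving, fractions of a satoshi are dropped
    return total

supply = total_supply_satoshi()
print(supply)                           # 2099999997690000
print(supply < 21_000_000 * SATOSHI_PER_BTC)  # True
```

Because of the rounding, the total ends up slightly below 21M BTC, and lost coins only reduce what is actually in circulation.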


History revision attack

  • if two blocks are published nearly simultaneously, a fork in the chain can occur
  • nodes are programmed to follow the blockchain whose total proof-of-work difficulty is the largest and discard blocks from other forks
  • that makes the “history revision attack” possible
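
The fork rule can be sketched in a few lines. Blocks are represented here as plain dictionaries with a `difficulty` field, which is a made-up simplification of a real block header.

```python
def total_work(chain) -> int:
    """Total proof-of-work of a chain, as the sum of its blocks' difficulties."""
    return sum(block["difficulty"] for block in chain)

def best_chain(forks):
    """Pick the fork a node would follow: the one with the most total work."""
    return max(forks, key=total_work)

fork_a = [{"difficulty": 10}, {"difficulty": 10}, {"difficulty": 10}]  # longer
fork_b = [{"difficulty": 20}, {"difficulty": 20}]                      # heavier
print(best_chain([fork_a, fork_b]) is fork_b)  # True: work counts, not length
```

An attacker who can produce a heavier alternative chain therefore makes honest nodes discard the blocks they had accepted, which is exactly the history revision.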


There are some simple guidelines defending against the attack

  • trust your own remembered history
  • don’t trust ancient forks
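
One way to read these guidelines is as a limit on how far back a competing fork may rewrite the history a node already remembers. A sketch, with chains given as lists of block hashes and a hypothetical limit `max_reorg_depth` chosen only for the example:

```python
def fork_depth(known, candidate) -> int:
    """Number of blocks at the end of `known` that `candidate` would replace."""
    common = 0
    for a, b in zip(known, candidate):
        if a != b:
            break
        common += 1
    return len(known) - common

def accept_fork(known, candidate, max_reorg_depth=6) -> bool:
    """Refuse forks that branch off too far behind the remembered tip."""
    return fork_depth(known, candidate) <= max_reorg_depth

known = ["genesis", "a", "b", "c"]
recent_fork = ["genesis", "a", "b", "x", "y"]
ancient_fork = ["genesis", "p", "q", "r", "s", "t"]
print(accept_fork(known, recent_fork, max_reorg_depth=2))   # True
print(accept_fork(known, ancient_fork, max_reorg_depth=2))  # False
```

A check like this would run before comparing total proof-of-work, so a heavy but ancient fork is rejected regardless of its difficulty.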


Scalability problems

  • smooth operation of Bitcoin relies on the timely broadcast of transactions and blocks

  • wallet software fetches the entire Bitcoin blockchain at installation

  • all new transactions and blocks are (supposedly) broadcast to all nodes

  • private key storage is dynamically growing


Solutions

  • verifiers, i.e. nodes that create new blocks, need to receive all transactions

  • clients, i.e. nodes that are not minting new coins, need to receive only transactions payable to their public keys

  • a third-party cloud service provider might filter Bitcoin transactions and send only relevant transactions to nodes that have registered for the service
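
The filtering step for clients can be sketched as below. The transaction layout (a dictionary with an `id` and a list of `outputs` holding recipient public keys) is a made-up simplification.

```python
def relevant_transactions(transactions, registered_keys):
    """Keep only the transactions that pay to at least one registered public key."""
    keys = set(registered_keys)
    return [tx for tx in transactions if keys & set(tx["outputs"])]

transactions = [
    {"id": "tx1", "outputs": ["alice_key", "bob_key"]},
    {"id": "tx2", "outputs": ["carol_key"]},
    {"id": "tx3", "outputs": ["alice_key"]},
]
mine = relevant_transactions(transactions, ["alice_key"])
print([tx["id"] for tx in mine])  # ['tx1', 'tx3']
```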


Improving anonymity

  • multiple public keys of the same user can potentially be linked when the user pays change to herself 

  • to address this issue, third-party services called mixers take multiple users’ coins, mix them, and issue back coins in equal denominations

  • a malicious mixer can cheat and not pay the money back

  • a cautious user could send the money to the mixer in small amounts, and only continue sending when the mixer has paid back
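
The cautious strategy bounds the loss to a single installment. A sketch, where `mixer` is a hypothetical callable that takes an amount and returns how much it paid back (fees are ignored and amounts are integers):

```python
def send_in_increments(total: int, chunk: int, mixer):
    """Send `total` in pieces of `chunk`; stop once a piece is not fully paid back."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    sent = 0
    returned = 0
    while sent < total:
        amount = min(chunk, total - sent)
        sent += amount
        paid_back = mixer(amount)
        returned += paid_back
        if paid_back < amount:
            break  # the mixer cheated: stop sending
    return sent, returned

def honest_mixer(amount: int) -> int:
    return amount

print(send_in_increments(100, 10, honest_mixer))  # (100, 100)
```

With a cheating mixer the user loses at most one `chunk`, at the cost of more transactions and a slower mix.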


Conclusion

  • Bitcoin’s appeal lies in its simplicity, flexibility, and decentralization

  • the core design could support a robust decentralized currency if done right

