How I moved 365 GB of instructor videos in one weekend (Dropbox → SharePoint, zero failures)
A client (MartialApps Inc., a karate platform) had 654 instructor video files — about 365 GB — sitting in Dropbox. They needed them in SharePoint for compliance. They didn't want to use Dropbox's "move to SharePoint" Zapier path because the files had specific folder hierarchies that had to be preserved exactly.
I had 48 hours. Here's what I built.
The shape of the problem
- Source: Dropbox shared folder, ~150 nested subfolders, mixed video formats.
- Destination: SharePoint Online (vonkcanada tenant), a deep path under Shared Documents/IT Developers/Videos app/.
- Constraint: Preserve all metadata (creation dates, file names with Bengali characters, folder structure).
- Constraint: Verifiable — leave a per-file CSV proving each upload completed.
- Constraint: Idempotent — if it fails halfway, the next run picks up where it left off.
- Constraint: Run somewhere other than my laptop. I sleep.
The architecture
Dropbox shared link
│
│ (Dropbox API — list_folder, download)
▼
Python worker
│
│ (MS Graph — large_upload_session)
▼
SharePoint folder
│
└─ writes results.csv to GitHub Actions artifact
Worker runs on GitHub Actions. Why GHA instead of a droplet:
- Free 2000 minutes/mo on personal accounts — plenty for batch jobs.
- Auth via OIDC = no long-lived tokens to leak.
- Logs + artifacts archived for free.
- Workflow file lives in the repo = the whole job is version-controlled.
- Re-running a job = one click in the GitHub UI.
The pieces
1. Auth — MSAL device code flow
SharePoint via MS Graph needs an Azure app registration + delegated user token. Device-code flow lets you authenticate once locally, then store the refresh token as a GitHub secret. The Action picks it up and refreshes silently for each run.
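    import os
    from msal import PublicClientApplication, SerializableTokenCache

    # CLIENT_ID, TENANT_ID, SCOPES and CACHE_PATH come from the worker's config.
    cache = SerializableTokenCache()
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as f:
            cache.deserialize(f.read())

    app = PublicClientApplication(
        CLIENT_ID,
        authority=f"https://login.microsoftonline.com/{TENANT_ID}",
        token_cache=cache,
    )

    # Try the cached refresh token first; fall back to the device-code prompt
    # when there is no account or the silent refresh returns nothing.
    accounts = app.get_accounts()
    result = app.acquire_token_silent(SCOPES, account=accounts[0]) if accounts else None
    if not result:
        flow = app.initiate_device_flow(scopes=SCOPES)
        print(flow["message"])
        result = app.acquire_token_by_device_flow(flow)

    # Persist the cache only when MSAL reports a change.
    if cache.has_state_changed:
        with open(CACHE_PATH, "w") as f:
            f.write(cache.serialize())
2. Dropbox — list + download with resumable streams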
Dropbox's files/list_folder plus files/list_folder/continue give you the whole tree, one page at a time. Stream downloads via files/download into a tempfile. Don't slurp into memory: some files are 2+ GB.
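The pagination loop and the streaming copy can be sketched roughly as follows. This is an illustration, not the worker's actual code: `iter_entries`, `stream_to_tempfile`, and the two callables passed in are hypothetical names, and the only Dropbox-specific assumption is that each listing page carries `entries`, `cursor`, and `has_more`.

```python
import shutil
import tempfile


def iter_entries(list_first_page, list_next_page):
    """Yield every entry across all listing pages.

    list_first_page() stands in for a files/list_folder call and
    list_next_page(cursor) for files/list_folder/continue. Both are
    hypothetical callables that return the decoded JSON response.
    """
    page = list_first_page()
    while True:
        yield from page["entries"]
        if not page["has_more"]:
            return
        page = list_next_page(page["cursor"])


def stream_to_tempfile(body, chunk_size=4 * 1024 * 1024):
    """Copy a readable binary stream to a temp file in fixed-size chunks.

    Only one chunk is held in memory at a time. Returns the temp file path;
    the caller is responsible for deleting it once the upload is done.
    """
    with tempfile.NamedTemporaryFile(delete=False, suffix=".part") as tmp:
        shutil.copyfileobj(body, tmp, chunk_size)
        return tmp.name
```

If the download is made with `requests` and `stream=True`, `response.raw` is one file-like object that could be passed as `body`.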
3. SharePoint — large-upload sessions
Files under 4 MB can be PUT in one shot. Anything bigger needs an upload session: create a session URL, then PUT chunks with Content-Range headers. I used 10 MB chunks; Graph requires chunk sizes to be a multiple of 320 KiB, which 10 MiB is. MS Graph reassembles them. Each chunk has its own retry budget.
    import os
    import time
    from urllib.parse import quote

    import requests

    def upload_large(graph_token, drive_id, target_path, local_path):
        # Percent-encode the path so spaces and non-ASCII names survive the URL.
        url = (f"https://graph.microsoft.com/v1.0/drives/{drive_id}"
               f"/root:/{quote(target_path)}:/createUploadSession")
        r = requests.post(url, headers={"Authorization": f"Bearer {graph_token}"},
                          json={"item": {"@microsoft.graph.conflictBehavior": "replace"}})
        r.raise_for_status()
        upload_url = r.json()["uploadUrl"]
        file_size = os.path.getsize(local_path)
        chunk_size = 10 * 1024 * 1024  # 10 MB
        with open(local_path, "rb") as f:
            offset = 0
            while offset < file_size:
                chunk = f.read(chunk_size)
                end = offset + len(chunk) - 1
                headers = {
                    "Content-Length": str(len(chunk)),
                    "Content-Range": f"bytes {offset}-{end}/{file_size}",
                }
                for attempt in range(3):
                    # The upload URL is pre-authenticated, so no Authorization header here.
                    resp = requests.put(upload_url, headers=headers, data=chunk)
                    if resp.status_code in (200, 201, 202):
                        break
                    time.sleep(2 ** attempt)
                else:
                    # All three attempts failed: stop rather than silently skip the chunk.
                    raise RuntimeError(f"chunk at offset {offset} failed: {resp.status_code}")
                offset += len(chunk)
        return True
4. Idempotency — the receipt CSV
Before uploading a file, check if it already exists at the target path with the same size. If yes, skip. This makes the entire job restartable. After upload, append a row to results.csv:
source_path,bytes,sha256_short,sharepoint_url,uploaded_at
KaratePunches/forward-punch.mp4,12435421,a8f2c1,https://...,2026-04-30T11:14:22Z
KaratePunches/back-fist.mp4,9821339,c1d3e5,https://...,2026-04-30T11:15:08Z
End of job: the CSV is the ground truth. Diff against the source listing to find any gaps.
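A minimal sketch of the skip check, the receipt row, and the end-of-job diff might look like this. The function names are hypothetical, `remote_sizes` is assumed to be a `{path: size}` map of what already sits in SharePoint, and the six-character hash prefix is inferred from the sample rows above rather than confirmed.

```python
import csv
import hashlib
import os
from datetime import datetime, timezone

FIELDS = ["source_path", "bytes", "sha256_short", "sharepoint_url", "uploaded_at"]


def should_skip(remote_sizes, rel_path, size):
    """True when the target already holds a file of the same size at rel_path."""
    return remote_sizes.get(rel_path) == size


def short_sha256(path, length=6, block=1024 * 1024):
    """Hash the file in blocks and keep the first `length` hex characters."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for piece in iter(lambda: f.read(block), b""):
            digest.update(piece)
    return digest.hexdigest()[:length]


def append_receipt(csv_path, source_path, local_path, sharepoint_url):
    """Append one row to the receipt CSV, writing the header on first use."""
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "source_path": source_path,
            "bytes": os.path.getsize(local_path),
            "sha256_short": short_sha256(local_path),
            "sharepoint_url": sharepoint_url,
            "uploaded_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        })


def find_gaps(csv_path, source_paths):
    """Return source paths that have no receipt row, in sorted order."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        done = {row["source_path"] for row in csv.DictReader(f)}
    return sorted(set(source_paths) - done)
```

A size-only skip check is cheap but cannot catch a same-size corrupt file; the hash column in the receipt is what lets you spot-check content later.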
5. Verification — a separate script
Once upload finishes, a second script walks the SharePoint target recursively and confirms every file in the source listing has a SharePoint counterpart with matching size. Reports any mismatch.
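The recursive walk could be sketched along these lines. This is not the kit's own helper: `list_drive_folder` and the injected `fetch_json(url)` callable (an authenticated GET that returns decoded JSON) are hypothetical, and the sketch assumes the Graph `children` listing shape, where folders carry a `folder` facet and further pages arrive via `@odata.nextLink`.

```python
from urllib.parse import quote

GRAPH = "https://graph.microsoft.com/v1.0"


def list_drive_folder(fetch_json, drive_id, root_path):
    """Return {relative_path: size} for every file under root_path.

    fetch_json(url) is a hypothetical callable that performs an
    authenticated GET and returns the decoded JSON body.
    """
    sizes = {}
    start = f"{GRAPH}/drives/{drive_id}/root:/{quote(root_path.strip('/'))}:/children"
    # Each stack entry is (listing URL, path prefix relative to root_path).
    stack = [(start, "")]
    while stack:
        url, prefix = stack.pop()
        while url:
            page = fetch_json(url)
            for item in page.get("value", []):
                rel = prefix + item["name"]
                if "folder" in item:
                    # Queue the subfolder; its children are listed by item id.
                    child = f"{GRAPH}/drives/{drive_id}/items/{item['id']}/children"
                    stack.append((child, rel + "/"))
                else:
                    sizes[rel] = item["size"]
            # Follow the next page of this folder, if there is one.
            url = page.get("@odata.nextLink")
    return sizes
```

With a listing like that in hand, the comparison itself is short: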
    def verify(source_listing, sp_path):
        # list_sharepoint_recursive (defined elsewhere) returns {path: size}.
        sp_files = list_sharepoint_recursive(sp_path)
        missing = []
        size_mismatch = []
        for src in source_listing:
            rel = src["path"].lstrip("/")
            if rel not in sp_files:
                missing.append(rel)
            elif sp_files[rel] != src["size"]:
                size_mismatch.append((rel, src["size"], sp_files[rel]))
        return missing, size_mismatch
The results
| Metric | Value |
|---|---|
| Files transferred | 654 |
| Total size | 365 GB |
| Wall time | 1h 56m (single GH Actions job) |
| Failures | 0 |
| Retries triggered (chunk-level) | 14 |
| Manual interventions | 0 |
Cost: $0 (within free GitHub Actions allowance).
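For a sense of scale, the numbers in the table imply a sustained end-to-end rate of roughly 52 MB/s and a chunk-level retry rate well under a tenth of a percent. The back-of-envelope calculation below assumes decimal gigabytes and estimates the chunk count from total size alone, so treat it as an approximation.

```python
total_bytes = 365 * 10**9          # 365 GB, assuming decimal gigabytes
wall_seconds = 1 * 3600 + 56 * 60  # 1h 56m
chunk_bytes = 10 * 1024 * 1024     # chunk size used in the upload code
retries = 14                       # chunk-level retries from the table

throughput_mb_s = total_bytes / wall_seconds / 10**6
# Rough count: ignores the short final chunk of each file.
approx_chunks = total_bytes / chunk_bytes
retry_rate = retries / approx_chunks

print(f"{throughput_mb_s:.1f} MB/s end to end")
print(f"~{approx_chunks:,.0f} chunks, retry rate {retry_rate:.3%}")
```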
Why this beats no-code tools for this job
- Zapier / Make cap out at small files. Most file-mover Zaps refuse anything above 100-150 MB. Our files were up to 2.3 GB.
- Power Automate works but charges per run. 654 runs at "premium connector" prices = ~$200/mo of license cost. We pay $0.
- Dropbox Smart Sync + manual drag-and-drop: would take a person 2 days, lose folder hierarchy somewhere, and have no audit trail.
Reuse — the kit
The whole thing is parameterized. Want to run a new transfer? Two commands:
    # 1. Mint a fresh Dropbox short-lived token, paste into .env
    # 2. Trigger:
    bash automation/new-job.sh \
        "Sales Marketing/Videos social media/MyNewFolder" \
        "https://www.dropbox.com/scl/fo/.../?dl=0"
GitHub Actions picks it up, runs the worker, drops a verification CSV in the artifact, and emails me when done.
Since the original run we've done four more transfers — all clean, all under 2 hours, all $0 marginal cost. The kit is open source, see my GitHub.
What this kind of work costs
For "I have N gigabytes of files in System A that need to be in System B, with audit trail and zero data loss" projects, ballpark:
- Source + destination are mainstream (Dropbox / GDrive / S3 / SharePoint): $800-1500 fixed.
- One end is a legacy / niche system needing custom auth: $1500-3000.
- Ongoing pipeline (new files arrive monthly): +$200/mo retainer for monitoring and re-runs.
If you have a file-pile problem like this, book a free call. Most of these are solvable in a weekend.