Field Notes | Ahsan Habib Akik — Full-Stack Developer

A client (MartialApps Inc., a karate platform) had 654 instructor video files — about 365 GB — sitting in Dropbox. They needed them in SharePoint for compliance. They didn't want to use Dropbox's "move to SharePoint" Zapier path because the files had specific folder hierarchies that had to be preserved exactly.

I had 48 hours. Here's what I built.

The shape of the problem

The brief: preserve hierarchy, hit zero failures, leave receipts.

Source: Dropbox shared folder, ~150 nested subfolders, mixed video formats.
Destination: SharePoint Online (vonkcanada tenant), a deep path under Shared Documents/IT Developers/Videos app/.
Constraint: Preserve all metadata (creation dates, file names with Bengali characters, folder structure).
Constraint: Verifiable — leave a per-file CSV proving each upload completed.
Constraint: Idempotent — if it fails halfway, the next run picks up where it left off.
Constraint: Run somewhere other than my laptop. I sleep.

The architecture

Dropbox shared link
       │
       │  (Dropbox API: files/list_folder, files/download)
       ▼
   Python worker
       │
       │  (MS Graph: createUploadSession)
       ▼
SharePoint folder
       │
       └─ writes results.csv to GitHub Actions artifact

Worker runs on GitHub Actions. Why GHA instead of a droplet:

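Free 2,000 minutes/mo for private repos on personal accounts (public repos on standard runners don't count against it), plenty for batch jobs; a single job can run up to 6 hours.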
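Credentials live in encrypted GitHub secrets (OIDC federation where the provider supports it) = no long-lived tokens sitting on a server to leak.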
Logs + artifacts archived for free.
Workflow file lives in the repo = the whole job is version-controlled.
Re-running a job = one click in the GitHub UI.

The pieces

1. Auth — MSAL device code flow

SharePoint via MS Graph needs an Azure app registration plus a delegated user token. Device-code flow lets you authenticate once locally; you then store the serialized MSAL token cache (which holds the refresh token) as a GitHub secret. The Action restores it and refreshes the access token silently on each run.

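import os
from msal import PublicClientApplication, SerializableTokenCache

# Hypothetical config names; values come from your Azure app registration.
CLIENT_ID = os.environ["AZURE_CLIENT_ID"]
TENANT_ID = os.environ["AZURE_TENANT_ID"]
SCOPES = ["Files.ReadWrite.All", "Sites.ReadWrite.All"]  # delegated Graph scopes (assumed)
CACHE_PATH = "msal_cache.json"  # in CI, written from the GitHub secret before this runs

cache = SerializableTokenCache()
if os.path.exists(CACHE_PATH):
    with open(CACHE_PATH) as fh:
        cache.deserialize(fh.read())

app = PublicClientApplication(
    CLIENT_ID, authority=f"https://login.microsoftonline.com/{TENANT_ID}",
    token_cache=cache,
)

result = None
accounts = app.get_accounts()
if accounts:
    # Silent path: redeems the cached refresh token; None if nothing usable is cached.
    result = app.acquire_token_silent(SCOPES, account=accounts[0])
if not result:
    # First (local) run: prints a URL and a code to enter in a browser.
    flow = app.initiate_device_flow(scopes=SCOPES)
    if "user_code" not in flow:
        raise RuntimeError(flow.get("error_description", "could not start device flow"))
    print(flow["message"])
    result = app.acquire_token_by_device_flow(flow)
if "access_token" not in result:
    raise RuntimeError(result.get("error_description", "token acquisition failed"))
graph_token = result["access_token"]

# Write back only when MSAL changed the cache (e.g. a rotated refresh token).
if cache.has_state_changed:
    with open(CACHE_PATH, "w") as fh:
        fh.write(cache.serialize())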

2. Dropbox — list + download with resumable streams

Dropbox's files/list_folder, paged with files/list_folder/continue, gives you the whole tree. Stream each file from files/download into a tempfile; don't slurp it into memory, since some files are 2+ GB.
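Here's a minimal sketch of that listing and streaming download against Dropbox's HTTP API, using only the standard library. It assumes you already hold a bearer token and that the shared folder is mounted in that token's account so plain paths work. The helper names (make_post, list_files, download_arg, download_to_tempfile) are hypothetical. The post callable is injected so the pagination logic can be tested without the network.

```python
import json
import tempfile
import urllib.request
from typing import Callable, Iterator

API = "https://api.dropboxapi.com/2"
CONTENT = "https://content.dropboxapi.com/2"
PIECE = 8 * 1024 * 1024  # copy 8 MiB at a time; memory stays flat for 2+ GB files


def make_post(token: str) -> Callable[[str, dict], dict]:
    """RPC-style Dropbox call: JSON body in, JSON body out."""
    def post(url: str, body: dict) -> dict:
        req = urllib.request.Request(
            url,
            data=json.dumps(body).encode(),
            headers={"Authorization": f"Bearer {token}",
                     "Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=60) as resp:
            return json.load(resp)
    return post


def list_files(post: Callable[[str, dict], dict], root: str) -> Iterator[dict]:
    """Yield every file entry under root, following list_folder pagination."""
    page = post(f"{API}/files/list_folder", {"path": root, "recursive": True})
    while True:
        for entry in page["entries"]:
            if entry[".tag"] == "file":  # skip folder and deleted entries
                yield entry
        if not page["has_more"]:
            return
        page = post(f"{API}/files/list_folder/continue", {"cursor": page["cursor"]})


def download_arg(path: str) -> str:
    # HTTP headers must be ASCII; json.dumps escapes Bengali (and other
    # non-ASCII) characters as \uXXXX by default.
    return json.dumps({"path": path})


def download_to_tempfile(token: str, path: str) -> str:
    """Stream one Dropbox file to a temp file on disk and return its name."""
    req = urllib.request.Request(
        f"{CONTENT}/files/download",
        headers={"Authorization": f"Bearer {token}", "Dropbox-API-Arg": download_arg(path)},
        method="POST",  # no body, so urllib adds no Content-Type header
    )
    with urllib.request.urlopen(req, timeout=300) as resp, \
            tempfile.NamedTemporaryFile(delete=False) as tmp:
        while piece := resp.read(PIECE):
            tmp.write(piece)
        return tmp.name
```

Each yielded entry carries path_display and size, which is all the later verification step needs from the source side.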

3. SharePoint — large-upload sessions

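Files under 4 MB can be PUT in one shot. Anything bigger needs an upload session: create the session, then PUT 10 MB chunks to its uploadUrl with Content-Range headers. Graph requires chunk sizes to be multiples of 320 KiB, and 10 MiB is exactly 32 of those. MS Graph reassembles the chunks into one file, and each chunk gets its own retry budget.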

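import os
import time
from urllib.parse import quote

import requests

CHUNK_SIZE = 10 * 1024 * 1024  # 10 MiB: a multiple of Graph's 320 KiB chunk unit


def upload_large(graph_token, drive_id, target_path, local_path):
    """Upload local_path via a Graph upload session; return the new driveItem JSON."""
    # quote() percent-encodes spaces, '#', and Bengali characters; '/' is kept.
    url = (f"https://graph.microsoft.com/v1.0/drives/{drive_id}"
           f"/root:/{quote(target_path)}:/createUploadSession")
    r = requests.post(url, headers={"Authorization": f"Bearer {graph_token}"},
                      json={"item": {"@microsoft.graph.conflictBehavior": "replace"}},
                      timeout=60)
    r.raise_for_status()
    upload_url = r.json()["uploadUrl"]  # pre-authenticated: no Authorization header on PUTs

    file_size = os.path.getsize(local_path)
    with open(local_path, "rb") as f:
        offset = 0
        while offset < file_size:
            chunk = f.read(CHUNK_SIZE)
            end = offset + len(chunk) - 1
            headers = {
                "Content-Length": str(len(chunk)),
                "Content-Range": f"bytes {offset}-{end}/{file_size}",
            }
            for attempt in range(3):
                try:
                    resp = requests.put(upload_url, headers=headers, data=chunk, timeout=300)
                    if resp.status_code in (200, 201, 202):  # 202 = more chunks expected
                        break
                except requests.RequestException:
                    pass  # network error: back off and retry like a bad status
                time.sleep(2 ** attempt)
            else:
                # Every attempt failed: abort instead of silently skipping the chunk.
                raise RuntimeError(f"bytes {offset}-{end} of {local_path} failed 3 times")
            offset += len(chunk)
    # The last chunk's 200/201 response is the driveItem (id, size, webUrl, ...).
    return resp.json()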
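The loop above always starts a fresh session at byte 0. If a run dies mid-file and you had saved the uploadUrl somewhere (hypothetical; the job as described restarts per file via the size check instead), Graph can tell you which bytes it still needs: a GET on the uploadUrl returns nextExpectedRanges. Here's a minimal sketch with hypothetical helper names:

```python
import json
import urllib.request


def next_offset(session_status: dict) -> int:
    """Byte offset to resume from, given the JSON returned by GET on an uploadUrl."""
    ranges = session_status.get("nextExpectedRanges") or []
    if not ranges:
        raise ValueError("no missing ranges: session finished or expired")
    # Entries look like "26214400-" (open-ended) or "0-1048575"; resume at the first gap.
    return int(ranges[0].split("-", 1)[0])


def fetch_session_status(upload_url: str) -> dict:
    """Ask Graph which bytes it still needs. The uploadUrl is pre-authenticated,
    so no Authorization header is sent."""
    with urllib.request.urlopen(upload_url, timeout=60) as resp:
        return json.load(resp)


def resume_point(upload_url: str, file_size: int) -> int:
    """Validated offset at which to seek() the local file and continue PUTting."""
    offset = next_offset(fetch_session_status(upload_url))
    if not 0 <= offset < file_size:
        raise ValueError(f"server expects offset {offset}, file has {file_size} bytes")
    return offset
```

From there you seek the file to that offset and reuse the same PUT loop. Upload sessions expire (the status JSON also carries expirationDateTime), so this only helps after a recent interruption.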

4. Idempotency — the receipt CSV

Before uploading a file, check if it already exists at the target path with the same size. If yes, skip. This makes the entire job restartable. After upload, append a row to results.csv:

source_path,bytes,sha256_short,sharepoint_url,uploaded_at
KaratePunches/forward-punch.mp4,12435421,a8f2c1,https://...,2026-04-30T11:14:22Z
KaratePunches/back-fist.mp4,9821339,c1d3e5,https://...,2026-04-30T11:15:08Z

End of job: the CSV is the ground truth. Diff against the source listing to find any gaps.
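Here's a sketch of the skip check, the receipt rows, and the gap diff, using only the standard library. The remote lookup itself is a GET on /drives/{drive-id}/root:/{path}, which returns the driveItem (including size) or a 404 if nothing is there. The helpers below take that JSON, or None for a 404, as input. The function names are hypothetical. The column order matches the CSV above, and the 6-character hash prefix matches the sample rows.

```python
import csv
import hashlib
import os
from datetime import datetime, timezone
from typing import Iterable, List, Optional

FIELDS = ["source_path", "bytes", "sha256_short", "sharepoint_url", "uploaded_at"]


def should_skip(remote_item: Optional[dict], local_size: int) -> bool:
    """remote_item: Graph driveItem JSON for the target path, or None on 404."""
    return remote_item is not None and remote_item.get("size") == local_size


def sha256_short(path: str, n: int = 6) -> str:
    """First n hex chars of the file's SHA-256, hashed in 1 MiB blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1024 * 1024), b""):
            h.update(block)
    return h.hexdigest()[:n]


def append_receipt(csv_path: str, source_path: str, local_path: str,
                   sharepoint_url: str) -> None:
    """Append one row; write the header only when the CSV is new or empty."""
    new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if new:
            writer.writeheader()
        writer.writerow({
            "source_path": source_path,
            "bytes": os.path.getsize(local_path),
            "sha256_short": sha256_short(local_path),
            "sharepoint_url": sharepoint_url,
            "uploaded_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        })


def find_gaps(csv_path: str, source_paths: Iterable[str]) -> List[str]:
    """Source paths with no receipt row, i.e. the files the next run must retry."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        done = {row["source_path"] for row in csv.DictReader(fh)}
    return [p for p in source_paths if p not in done]
```

Opening the CSV in append mode means a crashed run loses at most the row it was writing. Everything already recorded stays as proof for the next run's diff.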

5. Verification — a separate script

Once the upload finishes, a second script walks the SharePoint target recursively, confirms that every file in the source listing has a SharePoint counterpart with a matching size, and reports any mismatch.

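import unicodedata


def _key(path):
    # Compare paths in one Unicode normal form, in case Bengali names arrive as NFC vs NFD.
    return unicodedata.normalize("NFC", path.lstrip("/"))


def verify(source_listing, sp_path):
    """Return (missing, size_mismatch) for source files vs. their SharePoint copies."""
    # list_sharepoint_recursive returns {path relative to sp_path: size in bytes}
    sp_files = {_key(p): size for p, size in list_sharepoint_recursive(sp_path).items()}
    missing = []
    size_mismatch = []
    for src in source_listing:
        rel = _key(src["path"])
        if rel not in sp_files:
            missing.append(rel)
        elif sp_files[rel] != src["size"]:
            size_mismatch.append((rel, src["size"], sp_files[rel]))
    return missing, size_mismatch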
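verify() relies on a list_sharepoint_recursive helper that isn't shown. Here's one way to write it, assuming Graph's children endpoint and following @odata.nextLink for large folders. This version's signature is hypothetical: it takes a get_json callable and a drive ID in addition to the path. You'd wrap it (e.g. with functools.partial) so verify() can call it with the path alone.

```python
import json
import urllib.request
from typing import Callable, Dict
from urllib.parse import quote

GRAPH = "https://graph.microsoft.com/v1.0"


def make_get_json(token: str) -> Callable[[str], dict]:
    """Authenticated GET returning parsed JSON."""
    def get_json(url: str) -> dict:
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
        with urllib.request.urlopen(req, timeout=60) as resp:
            return json.load(resp)
    return get_json


def list_sharepoint_recursive(get_json: Callable[[str], dict], drive_id: str,
                              sp_path: str) -> Dict[str, int]:
    """Return {path relative to sp_path: size} for every file under sp_path."""
    found: Dict[str, int] = {}
    stack = [""]  # relative folder paths still to visit
    while stack:
        rel = stack.pop()
        full = f"{sp_path.strip('/')}/{rel}".strip("/")
        url = (f"{GRAPH}/drives/{drive_id}/root:/{quote(full)}:/children"
               "?$select=name,size,folder")
        while url:
            page = get_json(url)
            for item in page["value"]:
                child = f"{rel}/{item['name']}".lstrip("/")
                if "folder" in item:  # only folders carry the folder facet
                    stack.append(child)
                else:
                    found[child] = item["size"]
            url = page.get("@odata.nextLink")  # None on the last page
    return found
```

An explicit stack instead of recursion keeps deep trees like the ~150 nested subfolders here from ever approaching Python's recursion limit.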

The results

Metric	Value
Files transferred	654
Total size	365 GB
Wall time	1h 56m (single GH Actions job)
Failures	0
Retries triggered (chunk-level)	14
Manual interventions	0

Cost: $0 (within free GitHub Actions allowance).

Why this beats no-code tools for this job

Zapier / Make cap out at small files. Most file-mover Zaps refuse anything above 100-150 MB; our files were up to 2.3 GB.
Power Automate works but charges per run. 654 runs at "premium connector" prices = ~$200/mo of license cost. We pay $0.
Dropbox Smart Sync + manual drag-and-drop: would take a person 2 days, lose folder hierarchy somewhere, and have no audit trail.

Reuse — the kit

The whole thing is parameterized. Want to run a new transfer? Two steps:

# 1. Mint a fresh Dropbox short-lived token, paste into .env
# 2. Trigger:
bash automation/new-job.sh \
  "Sales Marketing/Videos social media/MyNewFolder" \
  "https://www.dropbox.com/scl/fo/.../?dl=0"

GitHub Actions picks it up, runs the worker, drops a verification CSV in the artifact, and emails me when done.
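new-job.sh isn't shown here, so treat this as a guess at one way such a trigger can work. GitHub's REST endpoint for workflow_dispatch queues a run with typed inputs. The workflow file name and the input names (sharepoint_folder, dropbox_link) are hypothetical, and they would have to match on.workflow_dispatch.inputs in the workflow.

```python
import json
import urllib.request


def dispatch_request(owner: str, repo: str, workflow_file: str, token: str,
                     sharepoint_folder: str, dropbox_link: str,
                     ref: str = "main") -> urllib.request.Request:
    """Build a workflow_dispatch call; input names must match the workflow's inputs."""
    url = (f"https://api.github.com/repos/{owner}/{repo}"
           f"/actions/workflows/{workflow_file}/dispatches")
    body = {
        "ref": ref,  # branch whose copy of the workflow file should run
        "inputs": {"sharepoint_folder": sharepoint_folder, "dropbox_link": dropbox_link},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


def trigger(req: urllib.request.Request) -> int:
    """Send the request; GitHub replies 204 No Content when the run is queued."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status
```

Because each run is just a dispatch with two strings, re-running a failed transfer is the same call again. The idempotency check then skips everything already uploaded.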

Since the original run we've done four more transfers: all clean, all under 2 hours, all at $0 marginal cost. The kit is open source; see my GitHub.

What this kind of work costs

For "I have N gigabytes of files in System A that need to be in System B, with audit trail and zero data loss" projects, ballpark:

Source + destination are mainstream (Dropbox / GDrive / S3 / SharePoint): $800-1500 fixed.
One end is a legacy / niche system needing custom auth: $1500-3000.
Ongoing pipeline (new files arrive monthly): +$200/mo retainer for monitoring and re-runs.

If you have a file-pile problem like this, book a free call. Most of these are solvable in a weekend.

How I moved 365 GB of instructor videos in one weekend (Dropbox → SharePoint, zero failures)
