The above image was made with Stable Diffusion using the prompt 'A colorful network graph with a faded urban landscape in the background.'
My new conspiracy news search app got a major upgrade today. The page now features an embedded interactive graph visualization. Each of WantToKnow.info's 20 highest-rated stories is shown along with the 10 archive articles it's most closely related to. I'm satisfied with how it turned out.
To make this graph, I started with two CSV files containing the data I needed. I combined them, ran the data through a series of pandas transformations, and produced a two-sheet Excel file describing the graph. I then uploaded that file to Kumu, where a CSS-like styling language let me control the graph's display. Here's the Python script I wrote to build the Excel file:
import ast
import pandas as pd
# Load the article archive and the related-articles map (both pipe-delimited)
df = pd.read_csv("C:/datasources/WTKvectorsearch.csv", sep='|')
dfkey = pd.read_csv("C:/datasources/WTKrelatedmap.csv", sep='|')
# Rename columns
df.rename(columns={"url": "WTKlink"}, inplace=True)
df.rename(columns={"Links": "ArticleSource"}, inplace=True)
# Format tags
df['tags'] = (df['tags'].astype(str)
                        .str.strip('[]')
                        .str.replace('\'', '')
                        .str.replace(' ', '')
                        .str.replace(',', '|'))
# Ensure Priority column is numeric
df['Priority'] = pd.to_numeric(df['Priority'], errors='coerce')
# Sort by Priority and reset index
df_sorted = df.sort_values(by='Priority', ascending=False).reset_index(drop=True)
# Create a list of top 20 ArticleId values
top = df_sorted['ArticleId'].head(20).tolist()
df = df_sorted.head(20).copy()  # copy so the insert below doesn't trigger SettingWithCopyWarning
df.insert(loc=1, column='Type', value="Primary", allow_duplicates=True)
# Select rows from dfkey where ArticleId is in the top list
dfkey = dfkey[dfkey['ArticleId'].isin(top)]
# Convert Related column from string representation of list to actual list
dfkey['Related'] = dfkey['Related'].apply(ast.literal_eval)  # literal_eval is safer than eval for parsing list strings
# Expand the Related column
dfkey_expanded = dfkey.explode('Related').reset_index(drop=True)
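#   e.g. one row (ArticleId=100, Related=[101, 102, 103]) becomes three rows,
#   (100, 101), (100, 102), (100, 103) - one future graph edge per pair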
dfkey_expanded = dfkey_expanded.drop_duplicates()
# Get all unique values from ArticleId and Related columns
unique_values = pd.concat([dfkey_expanded['ArticleId'], dfkey_expanded['Related']]).unique()
# Convert to list
unique_values_list = unique_values.tolist()
dftemp = df_sorted[df_sorted['ArticleId'].isin(unique_values_list)].copy()
dftemp.insert(loc=1, column='Type', value="Related", allow_duplicates=True)
# Merge df and dftemp, prioritizing df values
merged_df = pd.concat([df, dftemp], ignore_index=True)
# Sort by 'Type' to prioritize 'Primary' over 'Related'
merged_df = merged_df.sort_values(by='Type', ascending=True)
# Drop duplicates, keeping the first occurrence (Primary)
merged_df = merged_df.drop_duplicates(subset='ArticleId', keep='first').reset_index(drop=True)
# Rename columns
dfkey_expanded.rename(columns={"ArticleId": "From"}, inplace=True)
dfkey_expanded.rename(columns={"Related": "To"}, inplace=True)
merged_df.rename(columns={"Priority": "Weight"}, inplace=True)
# Select relevant columns
final = merged_df[['ArticleId', 'Type', 'Title', 'tags', 'PublicationDate', 'Publication', 'Summary', 'ArticleSource', 'wtkURL', 'Weight']].copy()
# Subtract 699 from each value in the 'Weight' column
final['Weight'] = final['Weight'] - 699
# Use qcut to scale data into simplified integer values (0-11 range)
final['Weight'] = pd.qcut(final['Weight'], 12, labels=False, duplicates='drop')
final = final[final['Weight'].notna()]
final['Weight'] = final['Weight'].astype(int)
# Plot to confirm binning success
#final['Weight'].value_counts(sort=True).plot.bar()
# Strip HTML from summaries for display
final['Summary'] = final['Summary'].str.replace(r'<[^<>]*>', '', regex=True)
# Rename columns for Kumu
final.rename(columns={"ArticleId": "Label"}, inplace=True)
# Write the two-sheet Excel file for upload to Kumu
with pd.ExcelWriter("C:\\datasources\\topstoriesmap.xlsx") as writer:
    final.to_excel(writer, sheet_name='Sheet1', index=False)
    dfkey_expanded.to_excel(writer, sheet_name='Sheet2', index=False)
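One thing worth checking before upload: every edge endpoint in the connections sheet should exist as a node in the elements sheet, otherwise the graph can end up with dangling edges (related articles that never made it into the archive CSV). Here's a minimal sanity check along those lines, a sketch of my own that assumes the 'Label', 'From', and 'To' columns from the script above:
# Optional sanity check: every edge endpoint should exist as a node
labels = set(final['Label'])
endpoints = set(dfkey_expanded['From']) | set(dfkey_expanded['To'])
dangling = endpoints - labels
if dangling:
    print(f"{len(dangling)} edge endpoints have no matching node, e.g. {list(dangling)[:5]}")
else:
    print("All edge endpoints have matching nodes.")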
Reflections
When I woke up today, I was determined to add this feature to my app. Initially I considered computing the graph directly in the page with D3, and maybe I'll end up doing that eventually. I'm not thrilled about embedded iframes, but nearly everything else about Kumu is great and relatively easy, so I went with it.
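If I do eventually go the D3 route, the same dataframes could be dumped straight to the node-link JSON that D3's force-layout examples typically consume. Here's a rough sketch, assuming the final and dfkey_expanded frames from the script above (the output path is just a placeholder):
import json
nodes = final[['Label', 'Type', 'Title', 'Weight']].rename(columns={'Label': 'id'}).to_dict('records')
links = dfkey_expanded.rename(columns={'From': 'source', 'To': 'target'}).to_dict('records')
with open("C:/datasources/topstoriesmap.json", "w") as f:
    # default=int guards against numpy integer types json can't serialize
    json.dump({"nodes": nodes, "links": links}, f, indent=2, default=int)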
It was a bit of a heavy lift to get all of this done in one long day. But the result of that labor is absolutely cool. I've spent a long time imagining a webpage like the one I've created. Now that it's made, I'm feeling like I did something personally significant, even if not that many people end up using it.
Read Free Mind Gazette on Substack
Read my novels:
- Small Gods of Time Travel is available as a web book on IPFS and as a 41-piece Tezos NFT collection on Objkt.
- The Paradise Anomaly is available in print via Blurb and for Kindle on Amazon.
- Psychic Avalanche is available in print via Blurb and for Kindle on Amazon.
- One Man Embassy is available in print via Blurb and for Kindle on Amazon.
- Flying Saucer Shenanigans is available in print via Blurb and for Kindle on Amazon.
- Rainbow Lullaby is available in print via Blurb and for Kindle on Amazon.
- The Ostermann Method is available in print via Blurb and for Kindle on Amazon.
- Blue Dragon Mississippi is available in print via Blurb and for Kindle on Amazon.
See my NFTs:
- Small Gods of Time Travel is a 41-piece Tezos NFT collection on Objkt that goes with my book of the same name.
- History and the Machine is a 20-piece Tezos NFT collection on Objkt based on my series of oil paintings of interesting people from history.
- Artifacts of Mind Control is a 15-piece Tezos NFT collection on Objkt based on declassified CIA documents from the MKULTRA program.