So you inherited took over some Tableau Workbooks and need to rework them into something more efficient and migrate to Tableau server. I am going to recreate teh Tableau Workbooks on Tableau server rather than just publishing as I found the calculated fields get published as a dimension. Other than calculated fields, I found there are group definition to take care of too and I have not look into sets yet. With multiple data sources and calculated fields and groups, I wanted get a complete list of these to work through.
🎶Meanwhile, some music for you to enjoy as you read. This piece is part of Vivaldi’s The Four Seasons, particularly the first of the 4 violin concertos, called Spring.
🎶让大家继续读的时候,欣赏一点音乐。这件作品是维瓦尔第的“四季”的一部分 - 4小提琴协奏曲中的第一首,名为“春天”。
Source | 来源: https://musopen.org/
https://soundcloud.com/alan-chang-563971986/four-concerti-grossi-4-vivaldi-op8-i-la-primavera-allegro-largo-allegro
Source | 来源: https://musopen.org/music/2213/antonio-vivaldi/the-four-seasons/
I did some research and found that Tableau Workbooks are basically saved as XML files. So I should be able to “look” into it. I found a page which gives a clue on handling the calculated fields but that doesn’t take care of the group definitions. So I studied the way it works for the calculated fields and rework a bit to cater for getting the information for calculated fields and group definitions. My program will produce two Excel files: one with the information for calculated fields and the other with the information for group definitions. It’s not perfect but here is the program which I wrote in Jupyter Notebook and downloaded as python file if anyone is interested.
#!/usr/bin/env python
# coding: utf-8
# This program has two parts:
# 1. Extract Tableau TWB Calculated Fields into an Excel file
# Reference :https://vizartpandey.com/how-to-extract-tableau-field-information-using-python-api/
#
#
# 2. Extract Tableau TWB Group definition into an Excel file
# Note: I decided to output to 2 worksheets because I wanted to look at them in different format
Librarys import and setup
import easygui
import xml.etree.ElementTree as ET
import os
import pandas as pd
from IPython.display import Markdown, display, HTML
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)
To prompt user for twb file, uncomment the next line and comment out the line after.
file = easygui.fileopenbox(default="C:\projects\ace\twb*.twb")
Update your TWB filename to read
file = r"C:\Users\ace\Documents\My Tableau Repository\Workbooks\Tableau For Healthcare - Ambulatory Access.twb" # <--- change filename if necessary
#parse the twb file
tree = ET.parse(file)
root = tree.getroot()
create a dictionary of name and tableau generated name
calc_dict = {}
for item in root.findall('.//column[@caption]'):
if item.find(".//calculation") is None:
continue
else:
calc_dict[item.attrib['name']] = '[' + item.attrib['caption'] + ']'
list of calculation field name, tableau generated name, and calculation/formula
calc_list = []
for item in root.findall('.//column[@caption]'):
if item.find(".//calculation") is None:
continue
else:
if item.find(".//calculation[@formula]") is not None:
calc_caption = '[' + item.attrib['caption'] + ']'
calc_name = item.attrib['name']
calc_raw_formula = item.find(".//calculation").attrib['formula']
calc_comment = ''
calc_formula = ''
for line in calc_raw_formula.split('\r\n'):
if line.startswith('//'):
calc_comment = calc_comment + line + ' '
else:
calc_formula = calc_formula + line + ' '
for name, caption in calc_dict.items():
calc_formula = calc_formula.replace(name, caption)
calc_row = (calc_caption, calc_name, calc_formula, calc_comment)
calc_list.append(list(calc_row))
else:
pass
convert the list of calculation into a data frame
data = calc_list
data = pd.DataFrame(data, columns=['Name', 'Remote Name', 'Formula', 'Comment'])
remove duplicate rows from data frame
data = data.drop_duplicates(subset=None, keep='first', inplace=False)
Note: This part does not handle the comments enter in the same line as the formula but works for me well enough and I'm not going to amend further.
get the name of the file
export to Excel
base = os.path.basename(file)
os.path.splitext(base)
filename = os.path.splitext(base)[0]
out_folder = 'C:\projects\ace\output\'
out_file = out_folder + filename + '_' + 'calculations.xlsx'
data.to_excel(out_file, 'Calculations')
display(f'Opening output file containinng calculations {out_file}')
os.startfile(out_file)
list of group name, tableau generated name, and group definitions
group_list = []
for item in root.findall(".//column[@caption]"):
group_caption = item.attrib.get('caption')[0:]
group_name = item.attrib.get('name')[1:-1]
for group in item.findall(".//calculation[@column]"):
group_depend_field = group.attrib.get('column')[1:-1]
for bin in group.findall(".//bin[@default-name]"):
default_bin = bin.attrib.get('default-name')
bin_value = bin.attrib.get('value')[0:]
bin_items = []
for bin_item in bin.findall(".//value"):
bin_item_value = bin_item.text
bin_items.append(bin_item_value)
bin_items_dict = dict(enumerate(bin_items, 1))
group_row = (group_caption, group_name, group_depend_field, bin_value, default_bin, bin_items_dict)
group_list.append(list(group_row))
convert the list of groups into a data frame
group_data = group_list
group_df = pd.DataFrame(group_data, columns=['Name', 'Remote Name', 'Dependent Field', 'Bin', 'Default', 'Items'])
export to Excel
out_file = out_folder + filename + '_' + 'groups.xlsx'
group_df.to_excel(out_file, 'Groups')
display(f'Opening output file containinng groups {out_file}')
os.startfile(out_file)


Please see my other posts: @ace108
请看我其他帖: @ace108
