Hello everyone! I have continued to analyse the data set I began looking into a few days ago, and have some new analysis to present to the community.
Before I show the analytics, a few important caveats.
Not Complete
This is by no means a complete picture of commenting behaviour on the HIVE blockchain. It covers just one week of comments, from 18 May to 24 May. I will be pulling a larger sample of data to investigate trends further.
Classification of Account Types can be improved
For a discussion of this, see the section "Non Human" accounts for the purpose of this analysis in the Appendices of this post.
Not every suggestion is yet implemented
I am very grateful to everyone who took the time to discuss, in the comments of my previous post, other things that could be investigated and analysed. This is an interim analysis based on what has interested me and what was quickest to take from MVP (my first post) to something more polished and repeatable.
I'm okay with being proven wrong
If you see stuff in the data that doesn't line up with what you perceive to be facts about the chain, please let me know and I will happily investigate further. Anyway, without further ado, let's look at the data.
I have completed this analysis using Power Query and Power BI.
I will begin with the homepage of the report, which provides an overview. For the purpose of this example, and to explain the metrics displayed, I will start with only the information relating to my own comments for the period:
On the left-hand side, we can see the overall information about my comments: the range of dates in which they were found, the total payout value, the average payout value, the number of comments made, the average length of my comments, how many replies they got, and how long the... longest comment was.
My own account is tagged as "human", so the account type shows as human. Below that is the user search function.
The table to the right shows the same metrics in a table view, which makes more sense when we want to compare users with each other.
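If you want to reproduce the overview metrics outside Power BI (say, from a CSV export), here's a minimal Python sketch of the per-author aggregation. The field names (`author`, `body`, `payout`, `children`) are my assumptions, loosely modelled on HiveSQL's Comments columns; in particular `payout` is a single simplified number standing in for whichever payout column you sum. Comment length is measured in characters.

```python
from collections import defaultdict


def author_summary(comments):
    """Aggregate per-author metrics like those on the report's overview page.

    Each comment is a dict with (assumed) keys: 'author', 'body',
    'payout' (a single float, a simplification) and 'children'
    (the number of direct replies).
    """
    groups = defaultdict(list)
    for c in comments:
        groups[c["author"]].append(c)

    summary = {}
    for author, rows in groups.items():
        lengths = [len(r["body"]) for r in rows]
        total_payout = sum(r["payout"] for r in rows)
        summary[author] = {
            "comments": len(rows),
            "total_payout": total_payout,
            "avg_payout": total_payout / len(rows),
            "avg_length": sum(lengths) / len(lengths),
            "replies": sum(r["children"] for r in rows),
            "longest": max(lengths),
        }
    return summary


# Toy data for illustration only.
sample = [
    {"author": "alice", "body": "Great post!", "payout": 0.5, "children": 1},
    {"author": "alice", "body": "Thanks", "payout": 0.1, "children": 0},
    {"author": "bob", "body": "Nice", "payout": 0.0, "children": 2},
]
print(author_summary(sample)["alice"])
```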
The table to the right breaks down my conversations with people on the chain. You can see who I replied to the most, and the average length of my replies to that person. I've also added how many votes those replies received.
In my example, you can see that my most common conspirators are @riverflows and @galenkp, along with a whole host of other people.
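A rough Python sketch of that conversation table: keep one author's comments, group them by the account they replied to (`parent_author`), then average the reply length and sum the votes. The field names are assumptions, and `net_votes` is assumed to be a net vote count per comment.

```python
from collections import defaultdict


def reply_partners(comments, author):
    """Rank the accounts that `author` replied to, most frequent first.

    Returns (partner, replies, average reply length, total net votes) tuples.
    Assumed keys per comment: 'author', 'parent_author', 'body', 'net_votes'.
    """
    stats = defaultdict(lambda: {"replies": 0, "chars": 0, "votes": 0})
    for c in comments:
        if c["author"] != author:
            continue
        s = stats[c["parent_author"]]
        s["replies"] += 1
        s["chars"] += len(c["body"])
        s["votes"] += c["net_votes"]

    ranked = [
        (partner, s["replies"], s["chars"] / s["replies"], s["votes"])
        for partner, s in stats.items()
    ]
    # Most replies first; ties keep first-seen order because Python's sort is stable.
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked
```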
Comments in General
Now, if I remove the filter showing just myself on the report, things start to get a little more interesting:
We can now see that about 30% of comments are flagged as coming from "non human" accounts, and when sorting by payout value, three non human accounts have received the majority of comment rewards.
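The headline share is just the fraction of comments whose author is in the "non human" set from the Appendices, and the ranking sums payouts for those authors. A minimal sketch, assuming the same simplified `author` / `payout` fields; the sample data below is made up for illustration and is not taken from the report.

```python
from collections import Counter


def non_human_share(comments, non_human):
    """Return (share of comments by flagged accounts, payout ranking of those accounts).

    `non_human` is a set of account names, e.g. built from the appendix list.
    Assumed keys per comment: 'author' and 'payout' (a single float).
    """
    if not comments:
        return 0.0, []
    payouts = Counter()
    flagged = 0
    for c in comments:
        if c["author"] in non_human:
            flagged += 1
            payouts[c["author"]] += c["payout"]
    return flagged / len(comments), payouts.most_common()


# Toy data, not figures from the report.
share, ranking = non_human_share(
    [
        {"author": "alice", "payout": 1.0},
        {"author": "hivebuzz", "payout": 0.25},
        {"author": "pizzabot", "payout": 0.75},
        {"author": "hivebuzz", "payout": 0.25},
    ],
    non_human={"hivebuzz", "pizzabot"},
)
print(share, ranking)
```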
There is, however, an opportunity to improve this analysis, as I am not (at this stage) checking whether a user has defined beneficiaries on their comments. For instance, redditposh sometimes sets beneficiaries on its comments, directing rewards to the author of the top-level post.
Working out where the rewards for such accounts actually go is in my "backlog", so take this with a grain of salt for the time being. However, if I flick back over to accounts that I've comfortably defined as human (again, see the Appendices at the bottom of this post for what is used to determine that), then we can start to examine "human" users in more detail.
I can then sort by the other dimensions to find out, for instance, who made the most comments in the period:
Then, I can see who got the most votes, which is where I notice my first "BIG HMMM". There are a number of users with a low number of comments but a HIGH number of votes, on comments that are only around average length. So why on Earth would these comments be receiving ~200 votes each?
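The vote:comment ratio behind this "HMMM" is simple to compute. This sketch, with assumed field names and an arbitrary `min_ratio` threshold (an illustration, not a value used in the report), lists authors whose average net votes per comment reach the threshold.

```python
from collections import defaultdict


def votes_per_comment(comments, min_ratio=100.0):
    """List authors whose average net votes per comment is at least `min_ratio`.

    Returns (author, comment count, votes per comment), highest ratio first.
    Assumed keys per comment: 'author' and 'net_votes'.
    """
    counts = defaultdict(int)
    votes = defaultdict(int)
    for c in comments:
        counts[c["author"]] += 1
        votes[c["author"]] += c["net_votes"]

    flagged = [
        (author, counts[author], votes[author] / counts[author])
        for author in counts
        if votes[author] / counts[author] >= min_ratio
    ]
    flagged.sort(key=lambda row: row[2], reverse=True)
    return flagged
```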
200 VOTES PER COMMENT? WHAT?
Here is what that brief investigation uncovers for some of the users with this high vote:comment ratio. It appears that there is an app on Hive called skatehype, and well, it looks like there's an incredibly long curation trail for comments on its posts...
Why? I don't know. Does this add value? I don't know. Should I ignore it in future releases of this report? You tell me.
It appears that almost all users of this app / front end / site have around 200 votes per comment. I feel like that is pretty spammy on the chain, but the app looks like it rewards and onboards people for uploading skateboard tricks to Hive.
I didn't know this app existed before this analysis, but investigation like this lets us learn new things. The first person who appears to be a genuine author on this list, with solid engagement across their comments (in terms of net positive votes), is @davideownzall.
Average length of Comments
Here, a balance between long comments and lots of comments is probably what we're looking for to find people who engage deeply with other people's content. Already, I can see a few accounts in this list that probably should not be classified as "human". A reminder that this is not complete analytics or research, and will be a work in progress for some time to come.
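One way to express that balance, offered as a hypothetical approach rather than what the report actually does, is to ignore authors below a minimum number of comments and then rank the rest by average length. `min_comments` and `top` are illustrative parameters, and the field names are assumptions.

```python
from collections import defaultdict


def longest_average_commenters(comments, min_comments=5, top=10):
    """Rank authors by average comment length, ignoring authors with few comments.

    Returns (author, comment count, average length in characters) tuples.
    """
    lengths = defaultdict(list)
    for c in comments:
        lengths[c["author"]].append(len(c["body"]))

    ranked = [
        (author, len(ls), sum(ls) / len(ls))
        for author, ls in lengths.items()
        if len(ls) >= min_comments
    ]
    ranked.sort(key=lambda row: row[2], reverse=True)
    return ranked[:top]
```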
Average Depth of Comments:
Who leaves the longest comment chains? Who talks to people a lot more than others? Again, I'm seeing some accounts that are unlikely to meet the definition of human, but there are definitely a few humans in this list. I will add this to the list of things to investigate later.
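For anyone wondering how depth can be worked out: each comment points at its parent via `parent_author` and `parent_permlink`, so depth can be computed by walking up that chain. HiveSQL's Comments table may already carry a depth column, in which case that is simpler; this sketch assumes you only have the parent links, and is not necessarily how the report computes it. Note that with a one-week sample, parents posted outside the window are missing, so the computed depths are lower bounds.

```python
from collections import defaultdict


def comment_depths(comments):
    """Compute each comment's depth by following parent links.

    A direct reply to a post has depth 1, a reply to that reply depth 2, and so on.
    Comments are keyed by (author, permlink). Parents missing from the data set
    (the top-level post, or replies made outside the sampled week) count as
    depth 0, so the results are lower bounds.
    """
    parent_of = {
        (c["author"], c["permlink"]): (c["parent_author"], c["parent_permlink"])
        for c in comments
    }
    depths = {}
    for start in parent_of:
        chain, seen = [], set()
        key = start
        # Walk upwards until we hit a known depth, leave the data set, or loop.
        while key in parent_of and key not in depths and key not in seen:
            chain.append(key)
            seen.add(key)
            key = parent_of[key]
        depth = depths.get(key, 0)
        for node in reversed(chain):
            depth += 1
            depths[node] = depth
    return depths


def average_depth_by_author(comments):
    """Average the computed depth of each author's comments."""
    per_author = defaultdict(list)
    for (author, _permlink), depth in comment_depths(comments).items():
        per_author[author].append(depth)
    return {author: sum(ds) / len(ds) for author, ds in per_author.items()}
```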
Longest Comment
Who had the most "points" to make? Or who just unloaded half a novella into someone's comments section, or... perhaps quoted the entire content of a really long chain by other users?
I love the fact that a user by the name of "short segments" has the longest comment here. There are a lot of human accounts in this list, but again, at a glance, there are a few accounts that may not be human, for instance "dcityrewards". I will need to look into this further and adjust the "likely not human" list in the Appendices to this post.
Who talks to who the most?
Here is a screenshot of who talks to who the most, taken from the right hand side of the report, taking into account the frequency of replies.
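Unlike the per-user table earlier, this view is about pairs of people talking. A sketch of that idea in Python, assuming each reply is credited to the pair (`author`, `parent_author`) regardless of direction, and skipping replies to oneself (my assumption, not necessarily what the report does):

```python
from collections import Counter


def conversation_pairs(comments, top=10):
    """Count replies between each pair of accounts, ignoring who replied to whom."""
    pairs = Counter()
    for c in comments:
        a, b = c["author"], c["parent_author"]
        if a == b:
            continue  # replies to your own comment are not a conversation between two people
        pairs[tuple(sorted((a, b)))] += 1
    return pairs.most_common(top)
```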
Now, on to some more exciting stuff...
Duplicate Comments Left by the same author
This will be a series of screenshots, in order to demonstrate the content of the comment, along with the number of times it has appeared on the chain. Please remember that this is just a week's worth of data for the purposes of this analysis, so it is likely that these accounts have made these identical comments many more times than depicted here.
As a discussion point: does the HIVE community as a whole feel that these identical comments add any value to the chain? Or do they only add to the bloat of the block log and make the chain harder to preserve, given that much of the content in these comments is already visible elsewhere on chain where it does serve a purpose, such as in the voting ledger?
Anyhow, I present to you the most prevalent comments by a SINGLE user that are repeated many times on HIVE. That is to say, a user that leaves the EXACT same comment more than once.
I'll stop here, but it is evident that this page is NOT FILTERED to human creators, and we can see a lot of curation projects voting on content. In the future, I plan to use this page of the report to identify users who make the identical comment multiple times on chain, and potentially add them to the "not human" user class in further analysis.
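The "same author, same text" check is a group-by on the author and the comment body. A minimal Python sketch, assuming the simplified field names used elsewhere and a hypothetical `min_count` threshold. Text is compared after trimming surrounding whitespace (similar in spirit to the "body trim clean" column in the Power Query at the end of this post); otherwise it must match exactly.

```python
from collections import Counter


def repeated_comments(comments, min_count=2):
    """Return (author, text, count) where one author posted the same text min_count+ times."""
    counts = Counter((c["author"], c["body"].strip()) for c in comments)
    return [
        (author, body, n)
        for (author, body), n in counts.most_common()
        if n >= min_count
    ]
```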
But what about the same comment that may be left by more than one user?
This is an interesting question: how many users have left a comment like "thank you" or "great post"? For this step, I moved out of Power BI and into KNIME to use its powerful group by and aggregation / concatenation features. I exported my table as a CSV, then grouped by the body of the comment (so we know what the comment is), followed by a sum (so we know how many times that comment was published), then a concatenation of the authors who posted that comment and how many times each one contributed.
This screenshot is pretty hard to read, so here's a table of anything that has appeared more than fifty times. There are probably some accounts here that should be flagged as non human; I'll add that to my list of things to do for future iterations of reporting.
I have stripped out any HTML and images (hopefully), but the table below is a bit of a nightmare owing to the large number of users making calls to bots. I should probably exclude those bot calls from future iterations.
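The KNIME steps described above (strip markup, group by comment body, count, concatenate authors with their counts) can be sketched in Python as below. The regular expressions are rough heuristics of my own, not the filters actually used for these tables: they drop HTML tags, markdown images, bare image URLs and any word starting with "!" (bot calls such as !BEER). The image-URL pattern is loose, so a link like example.gifts.com would also be removed.

```python
import re
from collections import Counter, defaultdict

# Rough, illustrative patterns (assumptions, not the post's actual filters).
MD_IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)")  # ![alt](url)
HTML_TAG = re.compile(r"<[^>]+>")  # <p>, <img ...>, </div>
IMAGE_URL = re.compile(r"https?://\S+?\.(?:png|gif|jpe?g)\S*", re.IGNORECASE)
BOT_CALL = re.compile(r"(?<!\S)![A-Za-z]\w*")  # !BEER, !PIZZA at the start of a word


def clean_body(body):
    """Remove images, HTML tags and bot calls, then collapse whitespace."""
    text = MD_IMAGE.sub(" ", body)  # images first, so their "!" is not read as a bot call
    text = HTML_TAG.sub(" ", text)
    text = IMAGE_URL.sub(" ", text)
    text = BOT_CALL.sub(" ", text)
    return " ".join(text.split())


def group_by_body(comments, min_count=1):
    """Group cleaned comment text; count uses and list each author with their count."""
    by_body = defaultdict(Counter)
    for c in comments:
        text = clean_body(c["body"])
        if text:  # skip comments that were nothing but images or bot calls
            by_body[text][c["author"]] += 1

    rows = []
    for text, authors in by_body.items():
        total = sum(authors.values())
        if total >= min_count:
            listing = ", ".join(f"{a} ({n})" for a, n in authors.most_common())
            rows.append((text, total, listing))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

With `min_count=51` this would reproduce the "more than fifty times" cut-off, assuming the cleaning matches closely enough.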
Comments Published more than ten times:
Finally, Comment Reward Distribution:
This needs further investigation to see whether the "non human" comment payments involve beneficiaries.
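On Hive, beneficiaries are set per post or comment (via the `comment_options` operation) as a list of accounts with weights in basis points, where 10000 means 100% of the author reward. Here's a simplified sketch of how an author reward would be split, ignoring the HBD / Hive Power breakdown and on-chain rounding. `split_author_reward` is a hypothetical helper, and pulling the beneficiary list out of HiveSQL is not shown.

```python
def split_author_reward(author_reward, author, beneficiaries):
    """Split an author reward between beneficiaries and the author.

    `beneficiaries` is a list of (account, weight) pairs with weights in basis
    points (10000 = 100%). Simplified: ignores the HBD / Hive Power split and
    any on-chain rounding.
    """
    total_weight = sum(weight for _, weight in beneficiaries)
    if total_weight > 10000:
        raise ValueError("beneficiary weights cannot exceed 10000 (100%)")
    shares = {}
    for account, weight in beneficiaries:
        shares[account] = shares.get(account, 0.0) + author_reward * weight / 10000
    # Whatever is not routed to beneficiaries stays with the author.
    shares[author] = shares.get(author, 0.0) + author_reward * (10000 - total_weight) / 10000
    return shares


# Hypothetical weights: a comment routing half its author reward to the post author.
print(split_author_reward(1.0, "redditposh", [("alice", 5000)]))
```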
Appendices:
"Non Human" accounts for the purpose of this analysis
The definition of "human" here is a single person acting on their own, not a part of a curation team or community. (And certainly not a bot!)
To create this list, I looked at comments that appeared more than once (where the text of each comment was identical, and the author was also the same).
Most of these accounts relate to curation projects stating that they've voted on something. I'm not counting them as "human", as they're often run by a collective of humans, which by the definition above is not "human".
If you are a human and you appear on this list, please comment and I will investigate further based on your comment history. This list is likely to grow as I look at data over a longer period.
"a-colmena", "actifit", "airhawk-project", "ajolote", "ajolte", "aliveandthriving", "amazingdrinks", "aquarius.academy", "asd09", "asean.hive", "beerlover", "bilpcoinbpc", "bot-bdbhueso", "bpcvoter1", "bpcvoter2", "bpcvoter3", "ccceo.voter", "celf.magazine", "centtoken", "chessbrotherspro", "cinnccf", "commentrewarder", "coolmonsters", "digital.hub", "discovery-it", "diyhub", "dmhafiz", "dookbot", "douglas.life", "drawmatic", "duo-tip", "dw38h", "ecency.waves", "enlace", "entropia", "es-literatos", "f76wz", "fallen.angels", "fgh87", "foodiesunite", "fulldeportes", "guest06", "guest07", "guest08", "guest09", "guest10", "helios.daily", "helios.notify", "helios.voter", "hiq.smartbot", "hispapro", "hive-103505", "hive-106316", "hive-106444", "hive-118507", "hive-124452", "hive-134572", "hive-174680", "hive-177745", "hive-179017", "hive-br.voter", "hive-lu", "hive=14396", "hiveargentina", "hivebits", "hivebuzz", "hivecurious", "hivepakistan", "hivewatchers", "hk14d", "hug.bot", "india-leo", "indiaunited", "indonesianhiver", "innerblocks", "itharagaian", "jamerussell", "jkl65", "keys-defender", "la-colmena", "ladiesofhive", "ladytoken", "leothreads", "liketu.moments", "lilybee", "lolzbot", "lovesniper", "luvshares", "meme.bot", "minimalistliving", "music-community", "musiczone", "nowayjosecuisine", "osomar357", "pandex", "peak.snaps", "pixbee", "pizzabot", "poshtoken", "qurator", "redditposh", "rutablockchain", "sahi1", "scifimultiverse", "sor31", "splinterboost", "ssg-community", "steemmonsters", "stemsocial", "strava2hive", "swc-curation", "terraboost", "theinkwell", "thepimpdistrict", "thoughtfulposts", "tippybot", "tipu", "tokenfaucet", "topcomment", "travelfeed", "tydynrain", "u89gw", "upme.notify", "vibes-voter", "vmn31", "w7ngc", "w95hj", "waivio", "waivio.updates01", "waivio.updates02", "waivio.updates03", "waivio.updates04", "waivio.updates05", "waivio.updates06", "waivio.updates07", "waivio.updates08", "waivio.updates09", "waivio.updates10", "wine.bot", "witnessbot", "wiv01", "womentribe", "worldmappin", "x6oc5", "xcv47", "youhive", "zxc43"
Comments Published More than Once (HTML, PNG, GIF, JPEG, JPG, "<", ">", "/", "", "@" and "1" filtered out):
Things to still do
- Investigate beneficiary rewards
- Find more accounts that "aren't human" by investigating duplicate comments and Hive curation accounts
- Who swears the most?
- Operationalise the dataset
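As a possible starting point for "Who swears the most?", the "Swearing?" flag from the Power Query below could be aggregated per author. This Python sketch uses the same word list and the same plain substring matching, so it shares that approach's false positives (for example "shitake" would count); `min_comments` is a hypothetical filter for low-volume accounts.

```python
from collections import defaultdict

# Same word list as the "Swearing?" column in the Power Query.
SWEAR_WORDS = ("fuck", "shit", "cunt", "wanker")


def swearing_rate(comments, min_comments=1):
    """Rank authors by the share of their comments containing a listed word.

    Returns (author, swearing comments, share of comments), highest share first.
    """
    totals = defaultdict(int)
    swears = defaultdict(int)
    for c in comments:
        body = c["body"].lower()
        totals[c["author"]] += 1
        if any(word in body for word in SWEAR_WORDS):
            swears[c["author"]] += 1

    ranked = [
        (author, swears[author], swears[author] / totals[author])
        for author in totals
        if totals[author] >= min_comments
    ]
    ranked.sort(key=lambda row: (row[2], row[1]), reverse=True)
    return ranked
```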
Power Query for extraction and tagging:
```
let
    // Connect to HiveSQL and load the Comments table
    Source = Sql.Databases("vip.hivesql.io"),
    DBHive = Source{[Name="DBHive"]}[Data],
    dbo_Comments = DBHive{[Schema="dbo",Item="Comments"]}[Data],
    // 18 May to 24 May inclusive: the upper bound is midnight at the start of 25 May (exclusive)
    #"Select Date" = Table.SelectRows(dbo_Comments, each [created] >= #datetime(2025, 5, 18, 0, 0, 0) and [created] < #datetime(2025, 5, 25, 0, 0, 0)),
    // Replies only: top-level posts have an empty parent_author, and replies have no title
    #"Only Comments" = Table.SelectRows(#"Select Date", each ([parent_author] <> null and [parent_author] <> "") and ([title] = "")),
    #"Removed Columns" = Table.RemoveColumns(#"Only Comments",{"author_rewards", "promoted", "body_language", "TS"}),
    // Lowercased copy of the body so the keyword checks are case-insensitive
    #"Duplicated Column" = Table.DuplicateColumn(#"Removed Columns", "body", "body - Copy"),
    #"Lowercased Text" = Table.TransformColumns(#"Duplicated Column",{{"body - Copy", Text.Lower, type text}}),
    // Substring matches, so these also catch longer words that merely contain the term
    #"Added Conditional Column" = Table.AddColumn(#"Lowercased Text", "Swearing?", each if List.AnyTrue(List.Transform({"fuck", "shit", "cunt", "wanker"}, (word) => Text.Contains([#"body - Copy"], word))) then "Contains Swearing" else "No Swearing"),
    #"Added Conditional Column1" = Table.AddColumn(#"Added Conditional Column", "Thankful?", each if Text.Contains([#"body - Copy"], "thank") then "Uses Thank" else "Doesn't Thank"),
    #"Added Conditional Column2" = Table.AddColumn(#"Added Conditional Column1", "Uses Exclamation", each if Text.Contains([#"body - Copy"], "!") then "Uses Exclamation" else "Doesn't use exclamation"),
    // "Calls Bot": one of a few known bot commands appears anywhere in the text
    #"Added Conditional Column3" = Table.AddColumn(#"Added Conditional Column2", "Calls Bot", each if List.AnyTrue(List.Transform({"!bbh", "!beer", "!wusang", "!lady", "!wine"}, (command) => Text.Contains([#"body - Copy"], command))) then "Yes" else "Probably Not"),
    // "Custom": any word starts with "!"; line breaks and tabs also count as word separators
    #"Added Custom" = Table.AddColumn(#"Added Conditional Column3", "Custom", each let
        textValue = [#"body - Copy"],
        splitWords = Text.SplitAny(Text.Lower(textValue), " .,;?()[]{}-#(cr)#(lf)#(tab)"),
        startsWithExclamation = List.AnyTrue(List.Transform(splitWords, each Text.StartsWith(_, "!")))
    in
        startsWithExclamation),
    // Trimmed and cleaned copy of the lowercased body, used for duplicate matching
    #"Duplicated Column1" = Table.DuplicateColumn(#"Added Custom", "body - Copy", "body - Copy - Copy"),
    #"Trimmed Text" = Table.TransformColumns(#"Duplicated Column1",{{"body - Copy - Copy", Text.Trim, type text}}),
    #"Cleaned Text" = Table.TransformColumns(#"Trimmed Text",{{"body - Copy - Copy", Text.Clean, type text}}),
    #"Renamed Columns1" = Table.RenameColumns(#"Cleaned Text",{{"body - Copy - Copy", "body trim clean"}}),
    // Note: "Calls Bot" (known commands) becomes "Starts with Exclamation Calls Bot"; "Custom" (any "!" word) becomes "Tokenised Call Bots"
    #"Renamed Columns" = Table.RenameColumns(#"Renamed Columns1",{{"Calls Bot", "Starts with Exclamation Calls Bot"}, {"Custom", "Tokenised Call Bots"}})
in
    #"Renamed Columns"
```