79434400

Date: 2025-02-12 20:46:56
Score: 7.5 🚩
Natty: 4.5

Thanks for the answer. I've developed the following as a solution to the problem I have to solve:

//PART 1: Aggregate on temporal dimension and obtain percentage of posts classified as NSFW
//Posts are :(id,subreddit.id,subreddit.name,subreddit.nsfw,created_utc,permalink,domain,url,selftext,title,score)
//(x._1, x._3, x._4)
val percentageNSFWPosts = rddPosts.map(x => (x._5, x._4))  // (created_utc, nsfw flag)
  .groupByKey()
  .mapValues { nsfwFlags =>
    val totalPostsAtTime = nsfwFlags.size
    val totNSFWPost = nsfwFlags.count(el => el == true)  // number of posts flagged NSFW
    // Multiply by 100.0 so the division is floating-point instead of truncating integer division
    totNSFWPost * 100.0 / totalPostsAtTime
  }

By doing this, I'm able to group the posts along the temporal dimension and get the percentage of posts in each group that are flagged NSFW. Since you mentioned using reduce (which was also suggested to me as a way to make this more efficient), could you help me optimize this by using reduce or reduceByKey?
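
A minimal sketch of how the same aggregation might be expressed with reduceByKey, so that partial counts are combined per key instead of materializing every flag with groupByKey. It is untested against the real data and rests on assumptions: the object name NsfwPercentage and its helpers merge, toPercentage and percentageByTime are hypothetical, the nsfw field (x._4) is assumed to be a Boolean, and the key type is left generic because the type of created_utc is not shown.

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

object NsfwPercentage {
  // Per-key accumulator: (number of NSFW posts, total number of posts)
  type Counts = (Long, Long)

  // Associative and commutative, as reduceByKey requires
  def merge(a: Counts, b: Counts): Counts = (a._1 + b._1, a._2 + b._2)

  // Floating-point percentage of NSFW posts for one key
  def toPercentage(c: Counts): Double = c._1 * 100.0 / c._2

  // Assumption: `flags` holds (created_utc, nsfw flag) pairs,
  // e.g. rddPosts.map(x => (x._5, x._4)) with a Boolean flag.
  def percentageByTime[K: ClassTag](flags: RDD[(K, Boolean)]): RDD[(K, Double)] =
    flags
      .mapValues(isNsfw => (if (isNsfw) 1L else 0L, 1L)) // one post -> (nsfw?, 1)
      .reduceByKey(merge)                                // sum both counters per key
      .mapValues(toPercentage)
}
```

Under those assumptions it would be called as NsfwPercentage.percentageByTime(rddPosts.map(x => (x._5, x._4))). Each record is reduced to a small pair of counters before the shuffle, which is the usual reason reduceByKey is suggested over groupByKey for this kind of aggregate.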

Reasons:
  • Blacklisted phrase (0.5): thanks
  • Blacklisted phrase (1): help me
  • Blacklisted phrase (1): :(
  • RegEx Blacklisted phrase (3): could you help me
  • Long answer (-0.5):
  • Has code block (-0.5):
  • Ends in question mark (2):
  • Low reputation (1):
Posted by: Andrea Bianchi