[img]http://i.imgur.com/XfZHuLS.png[/img]
So far, these competitions have been held only for fun. But what if we could use our programming skills for the good of the many? That's what this competition is about. More specifically, it's about something we Facepunch programmers hold near and dear: the highlights.
The highlights of the monthly WAYWO have been a subject of much debate recently. Of course, being programmers, we can't just manually find highlights - where's the fun in that? That's where this competition comes in. The goal? Create a program that can find the best content in a thread, without human intervention.
[b]Rules[/b]
• You can submit any application you want, as long as you wrote it. Yes, you can submit one you've made before. This is a contest to find the [b]best[/b] program, and if yours is the best, well, that's just the kind of thing we want.
• Your program must be open source. For the good of the subforum, come on.
[b]Details[/b]
The goal is simple: your program must provide the best content from [url=http://facepunch.com/showthread.php?t=1398111]version 45[/url] of WAYWO (the previous version).
Your program must produce at least twenty posts.
Your program should output a list of post IDs (no particular order), comma separated. For example, if your program just outputted the first three posts (which would not be a valid entry):
[code]
$ highlights
44969379,44969392,44969401
[/code]
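A minimal sketch of producing that output format in Python. The function name `format_highlights` and the duplicate-dropping step are illustrative assumptions, not part of any contest tooling; the minimum of twenty comes from the rule above.

```python
def format_highlights(post_ids, minimum=20):
    """Return post IDs as one comma-separated line, as the contest expects."""
    # Drop duplicates while preserving the order the IDs were given in.
    unique_ids = list(dict.fromkeys(str(pid) for pid in post_ids))
    if len(unique_ids) < minimum:
        raise ValueError(f"need at least {minimum} posts, got {len(unique_ids)}")
    return ",".join(unique_ids)


if __name__ == "__main__":
    # Placeholder IDs purely for demonstration; a real entry ranks real posts.
    demo_ids = range(44969379, 44969379 + 20)
    print(format_highlights(demo_ids))
```

The check fails loudly on short lists, so an invalid entry is caught before it is submitted.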
The contest will run from the date of this posting until the end (in GMT) of July 27th (next Sunday).
[b]Judging[/b]
For each program, I'll take the output, shuffle it up, and have an impartial judge (i.e. me, since I'm not competing) score the results by the percentage of content. For example, if a program outputted 15 posts of content and 5 posts of noise, that'd be 75% content.
[b]Extra Info[/b]
You can scrape the thread if you want, or you can download an archive of the thread from [url=http://fpcomp.github.io/files/waywo45.zip]here[/url]. The archive contains every post in the thread in JSON format, including its author, content, and ratings. It should be everything you need.
You can also check out the newly renovated website: [url]http://fpcomp.github.io/[/url]
Awesome! I just wrote a [url=http://facepunch.com/showthread.php?t=1389034&p=45433779&viewfull=1#post45433779]thread stitcher[/url] so I'm already set up.
[editline]18th July 2014[/editline]
[url]http://lab.facepunch.com/api/post/list/?threadid=1398111&page=1[/url]
Just to make everyone's life easier.
[QUOTE=Ott;45434569]Awesome! I just wrote a [url=http://facepunch.com/showthread.php?t=1389034&p=45433779&viewfull=1#post45433779]thread stitcher[/url] so I'm already set up.
[editline]18th July 2014[/editline]
[url]http://lab.facepunch.com/api/post/list/?threadid=1398111&page=1[/url]
Just to make everyone's life easier.[/QUOTE]
There's also an archive in JSON format in the OP.
You can use whatever though, as long as it works.
Yeah it doesn't. There's no ratings :v:
[QUOTE=Ott;45434755]Yeah it doesn't. There's no ratings :v:[/QUOTE]
There are, they're just only in posts that have ratings:
[code]
{
    "thread": 1398111,
    "page": 12,
    "author": {
        "name": "Berkin",
        "info": {
            "browser": "Unknown"
        },
        "type": "gold"
    },
    "date": 1402286400,
    "number": "444",
    "id": "45040455",
    "content": "<div class=\"quote\">\r\n\t\r\n\t\t<div class=\"information\">\r\n\t\t\t\r\n\t\t\t\t<a href=\"showthread.php?p=45040432#post45040432\" rel=\"nofollow\">supersnail11 posted:</a>\r\n\t\t\t\r\n\t\t</div>\r\n\t\r\n\t<div class=\"message\">Can it be used in reverse, like regex but fuzzier?</div>\r\n\t\r\n</div> You can generate regular expressions with it.<br>\r\n<br>\r\nI have done it in the past for a security question generator.",
    "sanitized_content": "You can generate regular expressions with it.\r\n\r\nI have done it in the past for a security question generator.",
    "ratings": {
        "winner": 3
    }
}
[/code]
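Since the `ratings` key is only present on posts that actually received ratings, any code reading the archive has to tolerate its absence. A small sketch of that in Python; the helper names are made up for illustration, and the unrated post in the demo is a hypothetical one, not taken from the archive.

```python
import json


def rating_counts(post):
    """Return the post's ratings dict, or an empty dict when it has none."""
    return post.get("ratings", {})


def total_ratings(post):
    """Total number of ratings of any kind on the post."""
    return sum(rating_counts(post).values())


if __name__ == "__main__":
    # Trimmed copy of the post shown above, plus a hypothetical unrated post.
    rated = json.loads('{"id": "45040455", "ratings": {"winner": 3}}')
    unrated = json.loads('{"id": "0"}')
    print(total_ratings(rated), total_ratings(unrated))
```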
Also, there are other ways to find highlights than with ratings! Maybe make a neural network and teach it to [del]love[/del] find good highlights.
I was talking about Labpunch.
[QUOTE=Ott;45434851]I was talking about Labpunch.[/QUOTE]
Oh, yeah, that might be an issue.
Still, though, don't be afraid to think outside the box!
Good thing I made a program that does pretty much exactly this a couple of months back: [url]https://github.com/Catuna/FP_bot[/url]
Though now that I have an archive, I might just as well use that.
I'm in. I'll write something in a bit.
Also, I think it goes without saying, don't hand-pick your results and have your program spit out the posts you picked. That's just not cool.
I don't really know anything about neural networks or machine learning or any of this fancy jazz so I've probably taken the naive approach to this challenge. My algorithm currently does the following:
1) Heavily penalize the OP to remove from rankings
2) Assign multiplier based on image content and length (short posts are penalized)
3) Add or take away score*multiplier (multiplier only in effect if a positive increase) based on each rating received, promoting posts with "positive" ratings and punishing those with "negative" ratings
4) Sort list
5) Take top 50, take highest scoring post of each user in top 50, penalize rest of posts in top 50 by half to diversify list and re-sort.
6) Return list of ranked posts with id, score, username
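The steps above can be sketched roughly as follows. This is not the poster's actual code: the rating values, the size of the OP penalty, the image and length factors, the "short post" cutoff, detecting images via `<img`, and treating post number 1 as the OP are all assumptions made for illustration. Field names follow the JSON archive sample.

```python
# Hypothetical per-rating values; the post above does not list the real ones.
RATING_VALUES = {"winner": 1.0, "dumb": -1.0}
OP_PENALTY = 1000.0    # assumed size of the "heavy" penalty
SHORT_POST_CHARS = 80  # assumed cutoff for a "short" post


def multiplier(post):
    """Step 2: boost posts with images, dampen short ones (assumed factors)."""
    factor = 1.0
    if "<img" in post.get("content", ""):
        factor *= 1.5
    if len(post.get("sanitized_content", "")) < SHORT_POST_CHARS:
        factor *= 0.5
    return factor


def score(post):
    """Steps 1-3: rating-based score with the OP pushed out of contention."""
    factor = multiplier(post)
    total = 0.0
    for name, count in post.get("ratings", {}).items():
        delta = RATING_VALUES.get(name, 0.0) * count
        # The multiplier only applies to positive contributions.
        total += delta * factor if delta > 0 else delta
    if str(post.get("number")) == "1":  # assumes the OP is post number 1
        total -= OP_PENALTY
    return total


def rank(posts, window=50):
    """Steps 4-6: sort, diversify the top window by author, re-sort."""
    scored = [[score(p), p] for p in posts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    seen_authors = set()
    for pair in scored[:window]:
        author = pair[1]["author"]["name"]
        # Halve every post after the author's best one; negative scores are
        # left alone, since halving them would raise them instead.
        if author in seen_authors and pair[0] > 0:
            pair[0] /= 2
        seen_authors.add(author)
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(p["id"], s, p["author"]["name"]) for s, p in scored]
```

`rank` returns `(id, score, username)` tuples, best first, so the first twenty IDs can be joined with commas for submission.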
It's not the best thing on earth, and not particularly clever, but it was a little fun. I'll probably only work on it a little more, so that it outputs in the correct format and to try any ideas I get overnight on how it should favor posts. At the moment the system can be pretty easily gamed. Here is its current output:
[code]
1 45246886 SCORE: 32.25 Jitterz
2 45041782 SCORE: 31.425 adnzzzzZ
3 45102808 SCORE: 18.45 Icedshot
4 45177763 SCORE: 15.15 Ziks
5 45062799 SCORE: 14.375 geel9
6 45022336 SCORE: 12.675 JohnnyOnFlame
7 45113720 SCORE: 12.45 Swebonny
8 45020599 SCORE: 11.7 thomasfn
9 45010382 SCORE: 10.35 Kamil_
10 45003105 SCORE: 10.3 layla
11 44984552 SCORE: 9.225 Hypershadsy
12 45024148 SCORE: 9.2 AntonioR
13 45246425 SCORE: 9.075 Vilusia
14 44969410 SCORE: 8.645 Cold
15 45144508 SCORE: 8.5 Makke
16 45198987 SCORE: 7.65 Z_guy
17 45096776 SCORE: 7.5 Simspelaaja
18 45071728 SCORE: 7.125 cra0kalo
19 45157975 SCORE: 6.975 chaz13
20 45060468 SCORE: 6.95 Deco Da Man
[/code]
The top-ranked post is Jitterz's baby. Which I guess is a highlight? I mean he did make it... The algorithm actually does pretty well at filtering out posts that aren't content and heavily penalizing things that wouldn't make great highlights, for example posts that were rated dumb a lot or short one-sentence replies. The worst post of that thread (apart from the OP) would be:
[url]http://facepunch.com/showthread.php?t=1398111&p=45159729&viewfull=1#post45159729[/url]
Okay, I did mine. The algorithm takes the thread posts as JSON, as supersnail provided, and outputs a CSV of highlights.
Currently it highlights these:
[code]
1. 45246886
2. 45041782
3. 45246425
4. 45062799
5. 45177763
6. 45022336
7. 45113720
8. 45144508
9. 45024148
10. 45020599
11. 45003105
12. 45010382
13. 45239769
14. 45089673
15. 45260359
16. 45198987
17. 45070157
18. 45060468
19. 44984552
20. 45214278
21. 45071728
22. 45074464
23. 45008270
24. 45208128
25. 45147064
[/code]
Gonna post link to github soon
[editline]20th July 2014[/editline]
There [url]https://github.com/cartman300/WAYWOrithm[/url]
Oooh! I have an idea! A really simple one, but trainable!
Just got off night shift again so no idea when I'll have the time :(
But essentially back-propagation on the ratings weights, and get a human (me) to compare two random posts and say which is more highlights-worthy (show them two posts with a button below each);
for each rating the better post has, add epsilon * the number of ratings to that rating's weights;
for each rating the worse post has, subtract the above;
eventually each rating should have a nearly optimal score weighting, e.g. Dumb should have a large negative weight because bad posts will almost invariably be dumb.
I'll try this out when I get the chance, or anyone else can give it a go if they feel like it.
There are also other things you'd want to incorporate, so posts with images should be given a slight preference (maybe a *1.1 multiplier, can tune this in a similar way) because they look flashier (videos too), and a bonus each time a post is quoted by other posts (do this in an initial first pass) because it means people are talking about it, so it's more likely to be memorable.
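A rough sketch of the weight update described above, in Python. As described it is a plain additive update on a linear scorer rather than full back-propagation. The value of `EPSILON`, the function names, and the lowercase rating names used in the demo are assumptions; the comparison itself would come from a human clicking one of the two posts.

```python
EPSILON = 0.01  # assumed small step size; the post leaves its value open


def update_weights(weights, better, worse, epsilon=EPSILON):
    """Nudge rating weights after a human says `better` beats `worse`."""
    # Ratings on the preferred post gain weight, scaled by how many there are.
    for name, count in better.get("ratings", {}).items():
        weights[name] = weights.get(name, 0.0) + epsilon * count
    # Ratings on the rejected post lose weight by the same rule.
    for name, count in worse.get("ratings", {}).items():
        weights[name] = weights.get(name, 0.0) - epsilon * count
    return weights


def weighted_score(weights, post):
    """Score a post as the weighted sum of its rating counts."""
    return sum(weights.get(name, 0.0) * count
               for name, count in post.get("ratings", {}).items())


if __name__ == "__main__":
    weights = {}
    # Two made-up posts standing in for a single human comparison.
    good = {"ratings": {"winner": 3}}
    bad = {"ratings": {"dumb": 2}}
    update_weights(weights, good, bad)
    print(weights)
```

After enough comparisons, sorting posts by `weighted_score` gives the ranking; the image multiplier and quote bonus would be applied on top of that score.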
[QUOTE=r0b0tsquid;45444035]Oooh! I have an idea! A really simple one, but trainable!
Just got off night shift again so no idea when I'll have the time :(
But essentially back-propagation on the ratings weights, and get a human (me) to compare two random posts and say which is more highlights-worthy (show them two posts with a button below each);
for each rating the better post has, add epsilon * the number of ratings to that rating's weights;
for each rating the worse post has, subtract the above;
eventually each rating should have a nearly optimal score weighting, e.g. Dumb should have a large negative weight because bad posts will almost invariably be dumb.
I'll try this out when I get the chance, or anyone else can give it a go if they feel like it.
There are also other things you'd want to incorporate, so posts with images should be given a slight preference (maybe a *1.1 multiplier, can tune this in a similar way) because they look flashier (videos too), and a bonus each time a post is quoted by other posts (do this in an initial first pass) because it means people are talking about it, so it's more likely to be memorable.[/QUOTE]
Are you sure you don't mean [code]for each rating the better post has, add epsilon / the number of ratings to that rating's weights;[/code]?
Pretty sure that I meant multiplication? Where epsilon is some small arbitrary value, and the more of a particular rating there is, the more we can assume that it has contributed to the post's worthiness or unworthiness, so the more we want to change the weight!
[img]http://i.imgur.com/KioQm8W.png[/img]
No CSS :v:
also no backend as of yet, but this is the training screen
I really shouldn't be doing this in C++ :v:
oh well, time for sleep!
[QUOTE=Borsty;45446916]My current status. Takes ratings, count of images and videos into account.[/QUOTE]
You didn't take into account multiple entries with the progression of the same content.
You should take the top rank for each unique user.
Commie got perma'd
[img]http://i.cubeupload.com/ep6ZNT.png[/img]
Hope you all have gold
Aw man. :(
I don't have time to code this up right now, but it might be worth experimenting with weighting posts based on the number of times they were quoted. This is just an idea from lurking through the WAYWO threads for the past few years -- often the 'highlights' are being quoted over and over as people discuss them.
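For anyone who wants to try the quote-count idea, here is a minimal Python sketch. It assumes quotes show up in a post's `content` field as links of the form `showthread.php?p=ID#postID`, as in the archive sample posted earlier in the thread; the function name is made up, and an ordinary link to a post in that same form would be counted as a quote too.

```python
import re
from collections import Counter

# Matches the quote link format seen in the archive sample, e.g.
# showthread.php?p=45040432#post45040432
QUOTE_LINK = re.compile(r"showthread\.php\?p=(\d+)#post\1")


def quote_counts(posts):
    """Count how many distinct posts quote each post ID."""
    counts = Counter()
    for post in posts:
        # A set, so quoting the same post twice in one reply counts once.
        quoted_ids = set(QUOTE_LINK.findall(post.get("content", "")))
        quoted_ids.discard(str(post.get("id")))  # ignore self-references
        counts.update(quoted_ids)
    return counts
```

The resulting counts can be added to a post's score as a bonus in a first pass, before any rating-based ranking.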