[QUOTE=G4MB!T;47425901]Just tag them as not working?[/QUOTE]
silly Gambit, people can't read
Right, so there's WAAAAAAYYYYY too many files for me to move across automatically with a script. Instead, I'm going to make the site fetch the file and images for any visit to a garrysmod.org url and create a download on the new site for it (if it doesn't already exist) and redirect the old page to the new one. That way only things actually being linked to will be copied across and it'll save me downloading (potentially) ~135,000 zip files and ~135,000/270,000/405,000 images all at once.
[editline]30th March 2015[/editline]
I'll add filters to include/exclude saves/dupes/garrysmod.org content soon.
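The on-demand flow described above could be sketched roughly like this (the helper names and the in-memory store are illustrative assumptions, not the site's actual code):

```php
<?php
// Runnable sketch of the lazy-import idea: when an old garrysmod.org
// URL is visited, import the download on demand (if it doesn't already
// exist) and redirect to the new page. importFromGarrysmodOrg() and the
// $imported array are stand-ins for the real fetch + database.
$imported = array(); // legacy id => new download id

function importFromGarrysmodOrg($legacyId) {
    // Stand-in for fetching the zip + images from garrysmod.org
    // and creating a new download record on the new site.
    return 'new-' . $legacyId;
}

function handleLegacyUrl($legacyId, array &$imported) {
    if (!isset($imported[$legacyId])) {
        // First visit: import on demand.
        $imported[$legacyId] = importFromGarrysmodOrg($legacyId);
    }
    // Then the old URL redirects to the new download page.
    return 'https://garrysmods.org/download/' . $imported[$legacyId];
}

echo handleLegacyUrl(3952, $imported) . "\n"; // imports, then redirects
echo handleLegacyUrl(3952, $imported) . "\n"; // already imported: same target
```

Only files that are actually visited ever get copied, which is what keeps the one-off transfer small.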
[QUOTE=adamdburton;47425961]Right, so there's WAAAAAAYYYYY too many files for me to move across automatically with a script. ...[/QUOTE]
Does an addon we fetch automatically become a public download once we input it? Or can we just get a private one? Personally I'd love to grab some files from old GMod without them being public.
Is this the new garrysmod.org with subjective rules I have been hearing about?
My old map from 2013 has been the top download for six days
If I last a week do I get a prize? :v:
[QUOTE=Sally;47426389]Is this the new garrysmod.org with subjective rules I have been hearing about?[/QUOTE]
Yeah hi welcome to the forum with the most subjective rules ever.
[editline]31st March 2015[/editline]
In all seriousness, the rules exist to stop idiots remaining on the site and claiming 'BUT ITS NOT IN THE RULES' when they get banned for uploading stupid shit. Further to that, only 2 people have been removed from the site out of the ~3,500 registered. One Because He's Tyler Wearing And Apparently Uploading LUA Hacks And Talking Like This Is Still Cool, and the other uploaded someone else's work without permission on 3 different accounts.
[editline]31st March 2015[/editline]
[QUOTE=BigBadWilly;47426273]Does an addon we fetch automatically become a public download once we input it? Or can we just get a private one? ...[/QUOTE]
Err, they become public. Perhaps you should find another way to get the files if you want to keep them to yourself.
Once you're done uploading all the old files (and putting them in a different category?), we should make a new thread to fix them up and re-upload them! Maybe make a competition out of it? The person who fixes the most old addons could get their name displayed on the site?
The site is now importing files from the old site when their urls are visited!
[QUOTE=adamdburton;47428288]The site is now importing files from the old site when their urls are visited![/QUOTE]
[img]http://i.imgur.com/u75yima.png[/img]
I fully support this change.
Maybe you should make a list of trusted users to help organize the garrysmod.org uploads, allow them to set the category.
How many TB harddrives do I have to buy to get each and every file?
I guess you could also programmatically rip tons of descriptions and authors from web.archive.org.
[url]http://web.archive.org/web/20120920093113/http://www.garrysmod.org/downloads/?a=list&b=downloads[/url]
It's all inside a div with the id 'TabWindow' and a class 'enteredText'
[editline]31st March 2015[/editline]
In fact hang on I'll write a concept on that one
[QUOTE=maurits150;47428620]How many TB harddrives do I have to buy to get each and every file?
I guess you could also programmatically rip tons of descriptions and authors from web.archive.org. ...[/QUOTE]
I can't even calculate it. It's pulling downloads as-and-when they're needed for now. I'll post some stats after a few weeks of it doing that.
Great idea with the descriptions, though; I can get uploader names and tags as well. I'll write something to scrape those too.
I just noticed that the Wayback Machine API seems broken and ignores some URL parameters. I'm not sure if you can fix that, but otherwise you just have to use the "web.archive.org/web/http://www.garrysmod.org/downloads/?a=view&id=<id>" URL format directly.
Oh god so much shit getting reuploaded ;-; thank you based Adam
Here's my PoC. I didn't use any fancy frameworks to write this one, so you may want to substitute some functions for more reliable ones with better error reporting.
[php]
<?php
function extractDownloadFromHtml($html) {
    // Initialize return array.
    $downloadData = array();
    // Create DOM document.
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // Load HTML but suppress parser warnings.
    $divs = $dom->getElementsByTagName('div');
    // Fetch tags.
    foreach ($divs as $div) {
        if ($div->getAttribute('class') == 'downloadinfo') {
            if (substr($div->nodeValue, 0, strlen('Tags:')) == 'Tags:') {
                $downloadData['tags'] = explode(' ', trim(substr($div->nodeValue, strlen('Tags: '))));
            }
        }
    }
    // Fetch description.
    foreach ($divs as $div) {
        if ($div->getAttribute('id') == 'TabWindow') {
            $downloadData['description'] = $div->nodeValue;
        }
    }
    // Fetch meta (size, upload time, download counts, uploader).
    foreach ($divs as $div) {
        if ($div->getAttribute('class') == 'downloadmeta') {
            preg_match("/Size:\s+(\d+\.\d+ \w+)\s+-\s+Uploaded\s+document\.write\(\s+GetElapsedTime\(\s+(\d+)\s+\)\s+\);\s+-\s+(\d+)\s+Downloads\s+\((\d+)\s+this\s+week\)\s+-\s+Uploaded\s+by\s+(.*)/m", $div->nodeValue, $matches);
            if ($matches && count($matches) == 6) {
                $downloadData['size'] = $matches[1];
                $downloadData['uploadtime'] = $matches[2];
                $downloadData['downloads'] = $matches[3];
                $downloadData['downloads_this_week'] = $matches[4];
                $downloadData['uploader'] = $matches[5];
            }
        }
    }
    return $downloadData;
}

function getAddonDescription($id) {
    $waybackPage = @file_get_contents("http://web.archive.org/web/http://www.garrysmod.org/downloads/?a=view&id=" . $id); // Download but suppress 'failed to open stream' warnings.
    if ($waybackPage) { // Check that we actually got a page back.
        $downloadData = extractDownloadFromHtml($waybackPage);
        $downloadData['id'] = $id;
        return $downloadData;
    } else {
        throw new Exception("failed to download wayback page");
    }
}

var_dump(getAddonDescription(3952));
?>
[/php]
Output:
[code]
array (size=8)
'tags' =>
array (size=7)
0 => string 'maps' (length=4)
1 => string 'cscdesert' (length=9)
2 => string 'fallout' (length=7)
3 => string 'captsupercow' (length=12)
4 => string 'new' (length=3)
5 => string 'wasteland' (length=9)
6 => string 'desert' (length=6)
'description' => string '
This map is designed as a post-apocalyptic RP map set in the southwestern U.S. desert after
a global nuclear war. A small town has emerged on the surface from survivors, while pre-war
military installations and fallout shelters still exist in the area. The valley was sealed off when the tunnels
leading out of the valley collapsed. If you have any questions, additions or requests,
please email me [email protected]
/* <![CDATA[ */
(function(){try{var s,a,i,j,r,c,l=document.getElementById("__cf_email__");a=l'... (length=1090)
'size' => string '9.58 MB' (length=7)
'uploadtime' => string '1169043153' (length=10)
'downloads' => string '153030' (length=6)
'downloads_this_week' => string '24' (length=2)
'uploader' => string 'Captsupercow' (length=12)
'id' => int 3952
[/code]
[editline]31st March 2015[/editline]
Just noticed there's some weird email junk in there (probably leftover from Cloudflare's email obfuscation). I guess script tags aren't filtered out, but you can fix that easily.
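One way to fix it: strip the script elements out of the DOM before reading nodeValue. This is a minimal standalone sketch, assuming the same DOMDocument setup as the PoC above:

```php
<?php
// Remove <script> elements from a DOMDocument so Cloudflare's
// email-obfuscation script doesn't leak into the description text.
function stripScriptTags(DOMDocument $dom) {
    // getElementsByTagName() returns a live list, so collect
    // the nodes first, then remove them.
    $toRemove = array();
    foreach ($dom->getElementsByTagName('script') as $script) {
        $toRemove[] = $script;
    }
    foreach ($toRemove as $script) {
        $script->parentNode->removeChild($script);
    }
}

// Small demo on a fragment shaped like the download page.
$dom = new DOMDocument();
@$dom->loadHTML('<div id="TabWindow">Hello<script>var x = 1;</script> world</div>');
stripScriptTags($dom);
$div = $dom->getElementsByTagName('div')->item(0);
echo trim($div->nodeValue); // "Hello world"
```

Calling stripScriptTags($dom) right after loadHTML() in extractDownloadFromHtml() would keep the junk out of the 'description' field.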
[QUOTE=maurits150;47428797]Here's my PoC. ...[/QUOTE]
Awesome, thanks for this. I'll get it implemented.
I forgot to fetch the most important thing, the title!
[php]
// Fetch title.
foreach ($divs as $div) {
    if ($div->getAttribute('id') == 'downloadtitle') {
        $downloadData['title'] = $div->getElementsByTagName('h2')->item(0)->nodeValue;
    }
}
[/php]
I'm sure you'll manage to get anything else you need yourself, though. Good to see it getting implemented; it makes the site feel so much more complete. Are you going to update the information on existing garrysmod.org downloads too?
[QUOTE=maurits150;47428825]I forgot to fetch the most important thing, the title! ...[/QUOTE]
Thanks for this, it's much appreciated. I won't be able to populate the information instantly for every download because archive.org would probably block me. I'll set it running on a schedule to populate one per minute.
[editline]31st March 2015[/editline]
Right, any new ones are now grabbing the info straight away, old ones I'll manually set to populate.
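The backfill boils down to draining a queue one item per run; a rough sketch, assuming a scheduler (e.g. cron) invokes it each minute, with populateNext() and the queue as illustrative names rather than the site's actual code:

```php
<?php
// One-per-minute backfill worker sketch: each scheduled run takes
// exactly one legacy id off the queue, so archive.org only ever
// sees one request per minute.
function populateNext(array &$queue) {
    if (empty($queue)) {
        return null; // nothing left to backfill
    }
    // In the real worker this id would be passed to
    // getAddonDescription() and the result saved.
    return array_shift($queue);
}

$queue = array(3952, 1343, 815); // legacy download ids awaiting info
echo populateNext($queue) . "\n"; // 3952
echo count($queue) . "\n";        // 2
```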
I can already see them popping up with correct info. Awesome!
Would there be any way to allow people with nothing better to do (like me) to go through and tag the re-uploads?
[QUOTE=Jeezy;47429107]Would there be any way to allow people with nothing better to do (like me) to go through and tag the re-uploads?[/QUOTE]
I'll make another level of user to do this. I already have moderators but they have access to remove downloads and users too.
[editline]31st March 2015[/editline]
I honestly can't believe how many old downloads are being linked to. There are over 1,000 that have been reuploaded in the last 6 hours.
[editline]31st March 2015[/editline]
[QUOTE=Jeezy;47429107]Would there be any way to allow people with nothing better to do (like me) to go through and tag the re-uploads?[/QUOTE]
Added you as an upload organiser, see the organisation menu option under account.
[editline]31st March 2015[/editline]
OK, so taking the sum of the file sizes for 1,000 downloads and scaling it up to the number of downloads on the old site (~135,000), there's [b]~1.92TB[/b] of content, with ~14GB already uploaded.
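The extrapolation works out like this (the 14.2 GB sample total is an assumption chosen to match the rounded figures above; the post only gives ~14GB):

```php
<?php
// Rough check of the ~1.92TB estimate: average file size from a
// 1,000-download sample, scaled up to ~135,000 files on the old site.
$sampleFiles = 1000;
$sampleGb    = 14.2;   // assumed sample total in GB (post says ~14GB)
$totalFiles  = 135000;
$estimateTb  = ($sampleGb / $sampleFiles) * $totalFiles / 1000;
printf("%.2f TB\n", $estimateTb); // 1.92 TB
```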
[URL=https://garrysmods.org/download/1343/addonszip]Glad to see the nude mods are getting reuploaded[/URL]
[editline]31st March 2015[/editline]
Makes it more interesting that we can see who downloads them.
Some requests:
1) Could the old garrysmod.org re-uploads be removed from the front page? (I want to see new addons, not old and broken ones)
2) I started reporting some re-uploads because they're effortless (2-second shit) or nude/sexy mods. Could you please add some sort of filter to block these kinds of reuploads? It's really annoying.
Shit like this:
[url=https://garrysmods.org/download/1451/lolwtfzip][img]https://garrysmods.org/download/1451/button.png[/img][/url]
[URL="https://garrysmods.org/download/815/parakeets-pill-pack-r1zip"]This sure takes me back...[/URL]
Looks like someone even downloaded it. I wonder how broken it is now. Sort of conflicted about having it taken down.
You still need to fix the images on garrysmod.org URLs (if you can).
Like:
[img]https://maurits.tv/data/img/March%202015/2015-03-31_17-32-32.png[/img]
[QUOTE=maurits150;47429822]You still need to fix the images on garrysmod.org urls (if you can).
Like:
[img]https://maurits.tv/data/img/March%202015/2015-03-31_17-32-32.png[/img][/QUOTE]
Fixed!
This is really great. How are you currently doing with hosting costs? I know you said you can absorb the hosting costs unless it becomes unmanageable, however I'm sure there are a lot of people that wouldn't mind pitching in already as a thanks for working so hard on this site.
So many old addons! This is amazing!
But with so many pages of addons, there should be an easy way to get to a specific page instead of clicking >> 100 times.
Oh my god, old garrysmod.org banners now auto link to your site!! This is amazing