[QUOTE=ultradude25;33982171]Self rating is impossible.[/QUOTE]
"Only losers self-rate"
Well dang.
[editline]31st December 2011[/editline]
Meh, NovembrDobby deserves it anyway :v: on with the project!
[QUOTE=high;33981944]You are still matching around tags though. If you have html tags in your regex string then you should really be using an html parser instead.[/QUOTE]
Not sure about that, considering that the page offers no easy way to identify the table I'm supposed to extract my data from.
Regex, on the other hand, can extract the data even if they split it across multiple tables, or change the hierarchy of the page.
[QUOTE=danharibo;33982050]I would recommend using XPath for querying XML documents, this page having some decent examples: [url]http://msdn.microsoft.com/en-us/library/ms256086.aspx[/url]
If you're lucky you can probably find a library that can use XPath to query HTML too.[/QUOTE]
I'd considered using XPath but ran into the problems above. There's no easy way to write a path to that data (no id/class attributes) unless I rely on the fact that it's preceded by an <H3> tag, and again, if the page's layout or hierarchy changes, it would suddenly break.
redid these
[IMG]http://i.imgur.com/kGxFL.gif[/IMG]
[QUOTE=Overv;33981820]No, that extra question mark indicates a non-greedy match. That means it will stop as soon as it finds a match instead of trying to find the longest available match. In the case of [i].*?>[/i], that means it will stop when it finds the first > and not go any further.[/QUOTE]
Why not use [I][^>]*>[/I] ? It's much faster and easier to read.
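For comparison, a minimal sketch of the three forms in .NET's Regex (the `TagPatterns` class and its members are just illustrative names). Greedy `.*` runs to the last `>`, while non-greedy `.*?` and the negated class `[^>]*` both stop at the first one. They still differ in one way: `.` does not match a newline unless RegexOptions.Singleline is set, whereas `[^>]` does.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagPatterns
{
    // Greedy: ".*" grabs as much as possible, then backtracks to the LAST ">".
    public static readonly Regex Greedy = new Regex("<.*>");

    // Non-greedy: ".*?" grows one character at a time until ">" can match.
    public static readonly Regex NonGreedy = new Regex("<.*?>");

    // Negated class: "[^>]*" consumes everything except ">", so no backtracking
    // is needed to find the closing bracket (and it also crosses newlines).
    public static readonly Regex NegatedClass = new Regex("<[^>]*>");

    // Returns the text of every match, in order.
    public static string[] Tags(Regex pattern, string input)
    {
        MatchCollection matches = pattern.Matches(input);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            result[i] = matches[i].Value;
        return result;
    }
}
```

On `<td><b>12</b></td>` the greedy pattern returns the whole string as one match, while the other two return the four individual tags.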
[QUOTE=voodooattack;33982201]Not sure about that, considering that the page offers no easy way to identify the table I'm supposed to extract my data from.
Regex, on the other hand, can extract the data even if they split it across multiple tables, or change the hierarchy of the page.[/QUOTE]
You can find all <a> tags with an href that contains "pandora/talk?botid=". Then to get the count you just do link->parent->nextsibling.
The regex match is even more prone to breaking than using an html parser.
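For reference, the link->parent->nextsibling walk maps onto a single XPath step chain. A minimal sketch with System.Xml, assuming the markup has already been made well-formed (System.Xml won't load the raw page) and assuming a hypothetical row layout of `<tr><td><a href=...>Name</a></td><td>count</td></tr>`. `BotTableXPath` and `ReadCounts` are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class BotTableXPath
{
    // Every link whose href points at a bot.
    private const string LinkXPath = "//a[contains(@href, 'pandora/talk?botid=')]";

    // Returns (link text, text of the cell after the link's parent) pairs.
    // Only works on well-formed XML/XHTML input.
    public static List<KeyValuePair<string, string>> ReadCounts(string xhtml)
    {
        var doc = new XmlDocument(); // PreserveWhitespace is false by default
        doc.LoadXml(xhtml);

        var result = new List<KeyValuePair<string, string>>();
        foreach (XmlNode link in doc.SelectNodes(LinkXPath))
        {
            // Equivalent of link->parent->nextsibling, restricted to elements
            // so whitespace text nodes are never picked up.
            XmlNode count = link.SelectSingleNode("parent::*/following-sibling::*[1]");
            if (count != null)
                result.Add(new KeyValuePair<string, string>(link.InnerText, count.InnerText.Trim()));
        }
        return result;
    }
}
```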
[QUOTE=voodooattack;33982201]Not sure about that, considering that the page offers no easy way to identify the table I'm supposed to extract my data from.
Regex, on the other hand, can extract the data even if they split it across multiple tables, or change the hierarchy of the page.
I'd considered using XPath but ran into the problems above. There's no easy way to write a path to that data (no id/class attributes) unless I rely on the fact that it's preceded by an <H3> tag, and again, if the page's layout or hierarchy changes, it would suddenly break.[/QUOTE]When I'm faced with a poor data hierarchy spread across multiple tables, I first match the tables within the page and then run a second query on each table to extract the data.
Or use High's approach above, both are fine.
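A minimal sketch of that two-pass idea with System.Xml's XPath support, again assuming well-formed input: the first query picks out candidate tables anywhere in the document, and the second runs relative to each table (the leading `.` scopes it). `TableScraper`, `ExtractRows` and the sample markup are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class TableScraper
{
    // tableXPath selects the tables to read, e.g. "//table" or
    // "//h3/following-sibling::table[1]" (the first table after each <h3>).
    public static List<string[]> ExtractRows(string xhtml, string tableXPath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);

        var rows = new List<string[]>();
        // First pass: locate the tables.
        foreach (XmlNode table in doc.SelectNodes(tableXPath))
        {
            // Second pass: query rows relative to this table only.
            foreach (XmlNode tr in table.SelectNodes(".//tr"))
            {
                XmlNodeList cells = tr.SelectNodes("td");
                if (cells.Count == 0)
                    continue; // e.g. a header row made of <th> cells

                var values = new string[cells.Count];
                for (int i = 0; i < cells.Count; i++)
                    values[i] = cells[i].InnerText.Trim();
                rows.Add(values);
            }
        }
        return rows;
    }
}
```

Swapping the first query is the only change needed if the tables move around, since the second query doesn't care where the table sits in the page.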
[QUOTE=Robber;33982276]Why not use [I][^>]*>[/I] ? It's much faster and easier to read.[/QUOTE]
In this case, yes, that would be better. The non-greedy form is more useful when the delimiter is longer than one character.
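A quick illustration of the multi-character case, using HTML comments as the example delimiter (the `CommentStripper` name is made up). A negated class can only exclude single characters, so there's no `[^...]` way to say "anything except `-->`", and `<!--[^>]*-->` would fail on a comment that contains a `>`. Non-greedy `.*?` handles it directly.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentStripper
{
    // ".*?" stops at the first "-->"; Singleline lets "." cross newlines
    // so multi-line comments are removed as well.
    private static readonly Regex Comment =
        new Regex("<!--.*?-->", RegexOptions.Singleline);

    public static string Strip(string html)
    {
        return Comment.Replace(html, "");
    }
}
```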
Aha, great idea. Thanks guys.
I'll go implement this quick. Be right back. :eng101:
[editline]31st December 2011[/editline]
Grrr..
[csharp]System.Xml.XmlException was unhandled
Message=The 'link' start tag on line 6 position 2 does not match the end tag of 'head'. Line 7, position 3.
Source=System.Xml
LineNumber=7
LinePosition=3
SourceUri=http://www.pandorabots.com/botmaster/en/mostactive
StackTrace:
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.Throw(String res, String[] args)
at System.Xml.XmlTextReaderImpl.ThrowTagMismatch(NodeData startTag)
at System.Xml.XmlTextReaderImpl.ParseEndElement()
at System.Xml.XmlTextReaderImpl.ParseElementContent()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.XPath.XPathDocument.LoadFromReader(XmlReader reader, XmlSpace space)
at System.Xml.XPath.XPathDocument..ctor(String uri, XmlSpace space)
at System.Xml.XPath.XPathDocument..ctor(String uri)
at CleverOmegleGUI.PandoraBotSelectionForm.refreshBotList() in PandoraBotSelectionForm.cs:line 198
at CleverOmegleGUI.PandoraBotSelectionForm.<btnRefresh_Click>b__4(Object ) in PandoraBotSelectionForm.cs:line 269
at System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(Object state)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()
InnerException:
[/csharp]
[html]<html>
<head>
...
<link rel="stylesheet" title="default" href="http://static.pandorabots.com:10081/botmaster/common/default.css" type="text/css">
</head>[/html]
They forgot to close that tag...
[QUOTE=voodooattack;33982407]They forgot to close that tag...[/QUOTE]
That's why I recommended using MIL or DOL (derived from MIL). XML requires every tag to be closed, but HTML does not.
Here is a quick example I threw together using MIL.
[csharp]using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;
// Plus a using directive for the MIL library's namespace.

internal class Program
{
    public class Bot
    {
        public string Name { get; set; }
        public string Id { get; set; }
        public int Interactions { get; set; }
    }

    private static void Main(string[] args)
    {
        var bots = new List<Bot>();
        string html;

        // Retry the download until it succeeds or the user answers "n".
        while (true)
        {
            try
            {
                using (var wc = new WebClient())
                {
                    html = wc.DownloadString("http://www.pandorabots.com/botmaster/en/mostactive");
                    break;
                }
            }
            catch (WebException we)
            {
                Console.Write("Failed to retrieve page ({0}). Try again (Y/N)?: ", we.Status);
                var answer = Console.ReadLine();
                // ReadLine returns null at end of input, so treat that as "no" too.
                if (answer == null || answer.Trim().ToLower() == "n")
                    return;
            }
        }

        // Parse with MIL's HTML parser instead of System.Xml.
        var doc = HtmlDocument.Create(html);
        foreach (var node in doc.Nodes.FindByName("a"))
        {
            // Check if the node is an element
            var ele = node as HtmlElement;
            if (ele == null)
                continue;

            // Check if the link is to a bot
            var href = ele.Attributes.GetAttributeValue("href");
            if (href == null || !href.Contains("pandora/talk?botid="))
                continue;

            // Make sure the parent and the parent's next sibling exist
            if (ele.Parent == null || ele.Parent.Next == null)
                continue;

            // Make sure the parent's next sibling is an element and not a text node
            var iele = ele.Parent.Next as HtmlElement;
            if (iele == null)
                continue;

            // Get the bot id from the link
            var id = GetBotId(href);

            // Get the interaction count from the neighbouring cell
            int interactions;
            if (!int.TryParse(iele.InnerText, out interactions))
                continue;

            bots.Add(new Bot { Name = ele.InnerText, Id = id, Interactions = interactions });
        }

        // Show what was collected before waiting for Enter.
        foreach (var bot in bots)
            Console.WriteLine("{0} ({1}): {2}", bot.Name, bot.Id, bot.Interactions);
        Console.ReadLine();
    }

    // Pulls the id off the end of an href such as "...pandora/talk?botid=abc123".
    private static string GetBotId(string href)
    {
        // \w already includes digits, so the original [\d\w] is the same as \w.
        var match = Regex.Match(href, @"=(\w+)$");
        return match.Success ? match.Groups[1].Value : null;
    }
}[/csharp]
[QUOTE=voodooattack;33982407]They forgot to close that tag...[/QUOTE]
They don't specify a DOCTYPE, but in HTML some tags don't need to be closed. You should use an HTML parser, since that page isn't XML.
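A minimal sketch reproducing the failure: `<link>` is a void element in HTML (like `<br>`, `<img>` and `<meta>`), so an unclosed `<link ...>` is valid HTML but not well-formed XML, while the self-closing `<link ... />` form parses. This uses XmlDocument.LoadXml rather than the XPathDocument constructor from the stack trace, but it raises the same kind of XmlException. `VoidElementDemo.TryParseAsXml` is a hypothetical helper.

```csharp
using System;
using System.Xml;

public static class VoidElementDemo
{
    // Returns null if the markup is well-formed XML, otherwise the parser's error.
    public static string TryParseAsXml(string markup)
    {
        try
        {
            new XmlDocument().LoadXml(markup);
            return null;
        }
        catch (XmlException ex)
        {
            return ex.Message;
        }
    }
}
```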
[QUOTE=voodooattack;33982407]They forgot to close that tag...[/QUOTE]
[quote]In HTML the <link> tag has no end tag.[/quote]
I personally use [url]http://htmlagilitypack.codeplex.com/[/url] for HTML parsing, it's quite nifty.
[IMG]http://i.imgur.com/x5ZFS.png[/IMG]
I would've ported my Java application to C# but not anymore.
[QUOTE=iffamies;33983088][IMG]http://i.imgur.com/x5ZFS.png[/IMG]
I would've ported my Java application to C# but not anymore.[/QUOTE]
I hate that
"Look at this shiny Restart Later button! Too bad, we have disabled the button. Also, we removed the close button so it looks like you have a choice while you really don't."
[QUOTE=NovembrDobby;33982256]redid these
[IMG]http://i.imgur.com/kGxFL.gif[/IMG][/QUOTE]
There is something about that shine that doesn't look right... Maybe it is the interval?
[QUOTE=uitham;33983439]I hate that
"Look at this shiny Restart Later button! Too bad, we have disabled the button. Also, we removed the close button so it looks like you have a choice while you really don't."[/QUOTE]
I just ignore the window. :v:
[QUOTE=iffamies;33983088][IMG]http://i.imgur.com/x5ZFS.png[/IMG]
I would've ported my Java application to C# but not anymore.[/QUOTE]
It's worse on Windows XP: when you run Windows Update and click 'Restart Later', the 'Would you like to restart now?' box almost always pops up while you're typing, and if you hit Enter, it restarts.
Every time.
Alright, I'm going to stop trying now.
[IMG]http://i.imgur.com/2yI1v.png[/IMG]
[QUOTE=iffamies;33984669]Alright, I'm going to stop trying now.
[IMG]http://i.imgur.com/2yI1v.png[/IMG][/QUOTE]
How the hell have you broken Visual Studio so badly?
[img]http://i.imgur.com/WglDF.png[/img]
[URL="http://blogcake.x10.mx/wordpress/2011/12/skyrim-music-converter/"]Skyrim Music Converter[/URL]
Whose brilliant idea was it to make the default action the one that interrupts the user anyway? In fact, why doesn't Microsoft follow Mozilla's approach to interrupting dialogs: do nothing until you click on the window itself, but keep it on top to show that it needs attention?
[editline]31st December 2011[/editline]
[QUOTE=supersnail11;33984715][img]http://i.imgur.com/WglDF.png[/img]
[URL="http://blogcake.x10.mx/wordpress/2011/12/skyrim-music-converter/"]Skyrim Music Converter[/URL][/QUOTE]
That's pretty cool.
More on my quadtree:
[img]http://i55.tinypic.com/29pen3r.png[/img]
Objects are all at the same depth, but if an object overlaps multiple leaves, a pointer to that object is added to the object vector of each leaf it overlaps. I now have a vector per leaf plus a master vector that holds pointers to all the objects. When I add an object to a leaf, I simply store a pointer to the object in the master vector.
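A rough C# sketch of that layout, with object references standing in for the C++ pointers: every object is stored once in a master list, and each leaf it overlaps gets a reference to that same object. The fixed-depth, eagerly subdivided tree and all type names (`Rect`, `GameObject`, `QuadNode`, `QuadTree`) are assumptions for illustration, not the actual code.

```csharp
using System;
using System.Collections.Generic;

// Axis-aligned rectangle; touching edges do not count as overlapping.
public struct Rect
{
    public float X, Y, W, H;
    public Rect(float x, float y, float w, float h) { X = x; Y = y; W = w; H = h; }
    public bool Intersects(Rect o) =>
        X < o.X + o.W && o.X < X + W && Y < o.Y + o.H && o.Y < Y + H;
}

public class GameObject
{
    public string Name;
    public Rect Bounds;
}

public class QuadNode
{
    public readonly Rect Bounds;
    public readonly List<GameObject> Items = new List<GameObject>(); // filled only in leaves
    private readonly QuadNode[] children;                            // null for a leaf

    // Subdivides eagerly until depth reaches 0.
    public QuadNode(Rect bounds, int depth)
    {
        Bounds = bounds;
        if (depth > 0)
        {
            float hw = bounds.W / 2, hh = bounds.H / 2;
            children = new[]
            {
                new QuadNode(new Rect(bounds.X,      bounds.Y,      hw, hh), depth - 1),
                new QuadNode(new Rect(bounds.X + hw, bounds.Y,      hw, hh), depth - 1),
                new QuadNode(new Rect(bounds.X,      bounds.Y + hh, hw, hh), depth - 1),
                new QuadNode(new Rect(bounds.X + hw, bounds.Y + hh, hw, hh), depth - 1),
            };
        }
    }

    public bool IsLeaf => children == null;

    // Adds a reference to obj to every leaf its bounds overlap.
    public void Insert(GameObject obj)
    {
        if (!Bounds.Intersects(obj.Bounds)) return;
        if (IsLeaf) { Items.Add(obj); return; }
        foreach (var child in children) child.Insert(obj);
    }

    // Counts how many leaves hold a reference to obj.
    public int LeafCountContaining(GameObject obj)
    {
        if (IsLeaf) return Items.Contains(obj) ? 1 : 0;
        int n = 0;
        foreach (var child in children) n += child.LeafCountContaining(obj);
        return n;
    }
}

public class QuadTree
{
    public readonly List<GameObject> All = new List<GameObject>(); // the master list
    public readonly QuadNode Root;

    public QuadTree(Rect bounds, int depth) { Root = new QuadNode(bounds, depth); }

    public void Add(GameObject obj)
    {
        All.Add(obj);     // stored once here...
        Root.Insert(obj); // ...and referenced from each overlapping leaf
    }
}
```

With a depth-1 tree over (0, 0, 100, 100), an object at (40, 40, 20, 20) straddles the centre and ends up in all four leaves, while one at (5, 5, 10, 10) lands in a single leaf. One thing to watch in the C++ version: if the master vector holds the objects by value, pointers to its elements are invalidated whenever the vector reallocates, so storing the objects by pointer (or reserving capacity up front) keeps the leaf entries valid.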
I must have installed C# 2010 Express about 70 times before, and I have not once had any trouble. What the hell are you doing wrong? :v:
[URL="http://www.toodledo.com/public/td4eff7a9bcb73d/0/0/list.html"]To-do list for CleverOmegle/dotOmgle[/URL]
[QUOTE=Darwin226;33985607]Happy
[CSHARP]time.Push(new Year()); [/CSHARP]
everyone.[/QUOTE]
WARNING: Failed to specify argument #1 on Year constructor, assuming 0.
[QUOTE=Tezzanator92;33984899]I must have installed c# 2010 express about 70 times before, I have not once had any trouble. What the hell are you doing wrong? :v:[/QUOTE]
I don't know honestly. I'm installing it normally. Reinstall didn't fix anything :((
[QUOTE=amcfaggot;33985620]WARNING: Failed to specify argument #1 on Year constructor, assuming 0.[/QUOTE]
The stack keeps the IDs, not the year.
How does the stack handle entities before 0? :O~
There is a line where I should have stopped posting, and I've crossed it, so I'll stop now.
Spent nearly an hour trying to get the print() function to work properly. I was passing a pointer to a member function of the dialog class into the tokenizer and parser, along with a bunch of pointer-type conversions whose necessity I still don't quite understand. After an hour of watching it segfault every time the function was called, I gave up: I stored a pointer to the textbox in a global and wrote a regular function that writes to the textbox through it. I feel dirty :S
Anyway, beats VB6 :v: