Wednesday, July 15, 2009

Perl regular expressions, can you help?

Say I created a Perl regular expression to capture any URL on the net. Then I save that file with a .pl extension and upload it to a directory in my web hosting company's account as part of my Perl-based web page.



Is that it? As soon as I run the file, it starts collecting?



Thanks.








No, that's not it.



Regular expressions allow you to match patterns within strings. Having a regular expression says nothing about where those strings come from.
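To make that concrete, here is a minimal sketch of a URL-matching pattern applied to a string. The pattern, the sub name extract_urls, and the sample text are all made up for illustration; the pattern is deliberately loose and is not a complete URL grammar (for instance, it would keep a trailing period that directly follows a URL).

```perl
use strict;
use warnings;

# Illustrative pattern only: matches http/https URLs up to the next
# whitespace, quote, or angle bracket.
my $url_re = qr{https?://[^\s"'<>]+};

# Return every URL found in the string passed in (call in list context).
sub extract_urls {
    my ($text) = @_;
    my @found = $text =~ /($url_re)/g;
    return @found;
}

# The string has to come from somewhere; here it is a hard-coded sample.
my $sample = 'See http://www.perlmonks.org/ and <a href="https://example.com/a?b=1">this</a>.';
print "$_\n" for extract_urls($sample);
```

Note that the script only ever sees the string it is handed. Nothing here fetches or collects anything by itself.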



If you have a regular expression in a Perl script on a web server, the script runs only when a user requests it (or when it is invoked via server-side includes on other pages). In other words, it executes on demand. So which URLs would you like to match? Your own? Perhaps the referring page (if supplied by the user's browser)? You won't collect many URLs that way, and it would be more thorough to use Perl to extract data from your web server's log files.
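As a rough sketch of the log-file approach, the following pulls the referrer out of each line and counts how often each one appears. It assumes your server writes the Apache "combined" log format, which may not be true for your host; the sub name referrer_from_log_line and the two sample lines are invented for the example.

```perl
use strict;
use warnings;

# Pull the referrer field out of one line in Apache "combined" log format.
# Returns undef when the line does not match or the referrer is "-".
sub referrer_from_log_line {
    my ($line) = @_;
    # host ident user [time] "request" status bytes "referrer" "agent"
    if ($line =~ /^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d+ \S+ "([^"]*)" "[^"]*"/) {
        my $ref = $1;
        return undef if $ref eq '-';
        return $ref;
    }
    return undef;
}

# Made-up sample lines; in practice you would read them from the log file.
my @sample_lines = (
    '203.0.113.5 - - [15/Jul/2009:10:12:01 +0000] "GET /index.html HTTP/1.1" 200 512 "http://www.perlmonks.org/" "Mozilla/5.0"',
    '203.0.113.9 - - [15/Jul/2009:10:13:44 +0000] "GET /about.html HTTP/1.1" 200 734 "-" "Mozilla/5.0"',
);

my %count;
for my $line (@sample_lines) {
    my $ref = referrer_from_log_line($line);
    $count{$ref}++ if defined $ref;
}
print "$count{$_}  $_\n" for sort keys %count;
```

Where the log file lives, and whether you can read it at all, depends on the hosting company.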



If you were expecting your script to visit other web sites and try to match URLs, then you'd need to do something to make it scan the internet, such as give it a starting page, and write the code to scan for all the URLs on that page and follow them. This is called spidering.



Spidering, if that's what you're trying to do, has a number of risks. The internet is rather large; the chance that you have sufficient storage and bandwidth to collect all the URLs on the internet is zero. And that's assuming you avoid circular linking and infinite linking (where sites give you new, unique URLs to index every time you ask).
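For illustration only, a very small spider might look something like the sketch below. It starts from one page, remembers every URL it has already queued so circular links are not followed twice, and stops after a fixed number of fetches so an infinite-linking site cannot keep it running forever. The names crawl and http_fetch, the page limit, and the href pattern are assumptions made for this example. It follows only absolute http/https links inside double-quoted href attributes, and fetching https pages with HTTP::Tiny needs extra SSL modules installed.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Breadth-first crawl from $start. $fetch is a code ref that takes a URL and
# returns the page body (or undef on failure); $max_pages caps the number of
# fetch attempts. Returns every URL discovered, fetched or not.
sub crawl {
    my ($start, $fetch, $max_pages) = @_;
    my %seen    = ($start => 1);   # URLs already queued, so loops are not re-followed
    my @queue   = ($start);
    my $fetched = 0;

    while (@queue && $fetched < $max_pages) {
        my $url  = shift @queue;
        my $body = $fetch->($url);
        $fetched++;
        next unless defined $body;

        # Relative links are ignored in this sketch.
        for my $found ($body =~ m{href="(https?://[^"\s]+)"}gi) {
            my $link = $found;
            $link =~ s/#.*//;      # drop fragments so page#a and page#b count once
            next if $seen{$link}++;
            push @queue, $link;
        }
    }
    my @all = sort keys %seen;
    return @all;
}

# Default fetcher using HTTP::Tiny (a core module since Perl 5.14).
sub http_fetch {
    my ($url) = @_;
    my $res = HTTP::Tiny->new(timeout => 10)->get($url);
    return $res->{success} ? $res->{content} : undef;
}

# Example (performs real network requests, so it is left commented out):
# print "$_\n" for crawl('http://www.perlmonks.org/', \&http_fetch, 20);
```

The page limit is the important part: without it, the loop ends only when no new URLs turn up, which on the open internet is effectively never.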



Perhaps you should ask another question where you state what you are trying to accomplish, and then more specific help could be provided.



You might also consider asking your Perl questions in a Perl-specific forum, such as at http://www.perlmonks.org/
