Archive for the 'Using the Internet as a Student' Category
Utilizing the internet as a modern student: Tips from a Master Googler
This is an article in two parts: the first explains search engines and searching, while the second gives actual tips about searching. Skip to the second part if you don’t want to read about search engines and searching.
Part 1: Understanding the Google Algorithm and computer searching
The internet is a big place. It amounts to petabytes of information (1 petabyte = 1,000,000 gigabytes; the total is possibly even larger), which makes it one of the largest data archives in the known universe, and best yet, it’s actively archived by search engines such as Google, Microsoft Live, and Yahoo. However, in their early days, search engines were loaded down with poor information and bad sorting algorithms. Fortunately, this is no longer the case. Ever since Google arrived with the original revolutionary search engine sorting algorithm, pretty much everyone has updated their algorithms to be more efficient, and thus more useful.
However, despite this, it can be difficult to navigate the ever-growing Google index and find useful information in it. With literally trillions of entries (link) in the index, searching through it to find anything from a niche topic to the latest big news article can be an absolute nightmare (and just imagine what it’s like to INDEX all of this).
First, however, you need to understand a little about searching.
For the most part, computer search programs index information based on key words. Indexing is the action of taking significant (important) information, id est key words, and adding the key words plus their location to a massive database called an index. Like the index cards at your local library, the index contains significant information and its location, along with some other information (Google actually keeps a cache of the sites they crawl, id est they keep a copy). However, unlike your index cards, indexes can span trillions of entries across terabytes of drive space.
Computers, however, are able to take this information and process it based on rules (aka an algorithm) and update the index with this information.
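To make the idea of an index concrete, here’s a minimal sketch in Python. It is only an illustration of the "key word plus location" idea described above, not how any real search engine stores its data. The page names and text are made up, and the `build_index` function is a hypothetical helper.

```python
from collections import defaultdict

def build_index(pages):
    """Map each key word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

# Hypothetical pages standing in for crawled sites.
pages = {
    "page_a": "cell phone reviews and cell phone news",
    "page_b": "register to vote in Texas",
}

index = build_index(pages)
print(sorted(index["phone"]))  # ['page_a']
print(sorted(index["texas"]))  # ['page_b']
```

Looking up a word in the index is now a single step, instead of rereading every page each time someone searches.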
This introduces a few problems.
The first two are technical. The first of the two is searching. It’s extremely inefficient to just go from A to Z through the index. Computers on their own are incapable of just "skimming" through the index like we do to find the information they’re looking for. Therefore, more efficient search algorithms such as binary search, which works on a sorted list, are used (although I highly doubt Google uses this mechanism; binary searching is a lot more efficient than "linear" searching, but is still inefficient in its own respect, id est for serving up the kind of traffic Google serves).
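Here’s a rough sketch of the difference between the two approaches, counting how many entries each one has to look at. The word list is made up, and both functions are hypothetical helpers written for this comparison.

```python
def linear_search(words, target):
    """Check entries one by one from the start; returns (position, steps)."""
    steps = 0
    for i, word in enumerate(words):
        steps += 1
        if word == target:
            return i, steps
    return -1, steps

def binary_search(sorted_words, target):
    """Halve the remaining range each step; the list must already be sorted."""
    lo, hi, steps = 0, len(sorted_words) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_words[mid] == target:
            return mid, steps
        if sorted_words[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps

# A made-up sorted "index" of 100,000 key words: word000000 ... word099999
words = [f"word{i:06d}" for i in range(100_000)]
target = "word099999"

print(linear_search(words, target))  # finds it only after checking all 100,000 entries
print(binary_search(words, target))  # finds the same position in at most 17 steps
```

The catch is that binary search only works if the list is kept sorted, which is its own cost on an index that changes all the time.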
The second is keeping all of this information up to date. Sites, such as this blog, are updated with greater or lesser frequency. Wikipedia, for instance, gets updated very rapidly, as do popular forums. Other sites, such as the W3C’s site or a personal "about" page, are updated less frequently. There are quite a few initiatives working to resolve some of these issues, but more often than not you’ll notice it takes a few hours/days/weeks/months for you to see changes in a search engine’s index because of the time it takes to "crawl" your site (crawling is the term for what search engine "spiders" do; they’re computer programs that crawl the web, parsing information and updating the index).
The third is probably the most obvious and also arguably the most important. How do you group all of that significant information together in a way that makes sense and is easy to use?
Google pioneered this thought with their revolutionary algorithm, of which the byproduct is a "Page Rank". Google’s method was to combine the archaic method of taking your key words and finding the site with the most of those key words (which on its own resulted in quite a few "spam" sites) with a new method that measured how many sites linked to a given site with the given key words. Exempli gratia, say LifeHacker links to an article on Gizmodo with the key words "cell phone". This increases Gizmodo’s "Page Rank" because another site linked to it with those key words, so that when you search for "cell phone" you’re more likely to be sent to Gizmodo than to some other site that has just the key words "cell phone" and no site linking to it with those key words.
Confused yet? Well, it gets worse. In this example, LifeHacker has a high Page Rank, and thus its link to Gizmodo has more "weight" (id est, influence) with the Google spiders than, say, a link from a site you opened on Geocities to another site with the key words "cell phone".
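The "weight" idea can be sketched with a heavily simplified link-scoring loop. This is a textbook-style illustration and an assumption on my part, not Google’s actual algorithm: it ignores key words entirely, the site names are made up, and `link_scores` with its `damping` and `rounds` parameters is a hypothetical helper.

```python
def link_scores(links, damping=0.85, rounds=50):
    """Repeatedly pass each page's score along its outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    score = {p: 1.0 / len(pages) for p in pages}
    for _ in range(rounds):
        # Every page starts each round with a small baseline score.
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for t in pages:
                    new[t] += damping * score[page] / len(pages)
        score = new
    return score

# Made-up sites: three fans link to big_blog, nobody links to small_page.
links = {
    "fan1": ["big_blog"],
    "fan2": ["big_blog"],
    "fan3": ["big_blog"],
    "big_blog": ["gadget_site"],
    "small_page": ["other_site"],
}

scores = link_scores(links)
print(scores["gadget_site"] > scores["other_site"])  # True
```

Both gadget_site and other_site have exactly one incoming link, but the link from the well-linked big_blog counts for more than the link from small_page.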
Short little trip off the path. You’re probably thinking "can’t this be exploited?" and the answer is yes. Exploiting the Google algorithm is called "Google Bombing" and was very prominent in the 2004 US Presidential election (search "failure" or "waffle" and you’d see George Bush’s website and Michael Moore’s website linked to the former and John Kerry’s website linked to the latter).
Now for a little more information. Google has recently started keeping a history of your web searches (and web history if you use Google Toolbar or Google Chrome). This adds a decent level of personalization to your search results as Google "learns" how you search (how you phrase your search terms) and what information you look for. However, the extent to which this is effective is rather debatable.
Now that you have a better understanding of how Google sorts its data, let’s work into part 2, which explains more of how to fully utilize the algorithm to find information.
Part 2: Utilizing the Google Algorithm (aka, good searching techniques)
First off, the most common mistake is to ask Google a question. This mistake comes from (A) natural habit (you have a question, and we’re all trained to ask it a specific way) and (B) Ask Jeeves, which (misleadingly) gets users to ask it questions. Google is smart enough to omit some of the commonly used "question key words" such as "how do I ___?" It will strip that phrase out and guide you to a site that has instructions on how to do whatever.
However, a better method is to completely remove the "how do I?" part, and just go with the ____ part. Exempli gratia, if your question is how do I register to vote, you’d enter register to vote in Google and it will likely lead you to a site that has registration information. This search is further enhanced by adding your state of residence, such as register to vote Texas.
Also keep in mind that Google does not keep your search terms in order unless you specifically ask it to. So register to vote Texas is the same as Texas register to vote, which is the same as register Texas vote to. If you want to signal Google to keep all search terms (not omit any search terms that you entered) and in that specific order, use quotation marks around your search term. Keep in mind that this will seriously reduce the number of search results returned, and while useful for homing in on specific things such as quotes, it will be a lot less useful for generic information because other people might have phrased their information differently. Also keep in mind that words like "to" and "for" are largely unnecessary in a search, but can sometimes be helpful if you need to reference back to that search later.
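The difference between a plain search and a quoted search can be sketched like this. It is a toy model of the behavior described above, not Google’s real matching logic. The page text is made up, and both functions are hypothetical helpers.

```python
def matches_any_order(query, text):
    """True if every query word appears somewhere in the text."""
    words = set(text.lower().split())
    return all(term in words for term in query.lower().split())

def matches_exact_phrase(phrase, text):
    """True only if the words appear together, in the given order."""
    terms = phrase.lower().split()
    words = text.lower().split()
    return any(words[i:i + len(terms)] == terms
               for i in range(len(words) - len(terms) + 1))

# Made-up page text.
page = "How to register to vote in Texas before the deadline"

print(matches_any_order("Texas register to vote", page))     # True
print(matches_any_order("register Texas vote to", page))     # True
print(matches_exact_phrase("register to vote", page))        # True
print(matches_exact_phrase("Texas register to vote", page))  # False
```

The last line shows why quotation marks cut down your results: the page has all four words, but not in that exact order.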
Second, this probably isn’t what a lot of you are going to want to hear, but while decent search terms yield decent to good answers, great search terms will often lead you to exactly what you want. Great search terms often require research, unless you already know about the given topic.
Like the Substitution Method in Integral Calculus, finding great search terms is more or less a trial-and-error process in which you keep refining your search terms. On average, I usually do 3-4 searches in Google before I find the information I was looking for. Good search terms will often lead you to sites with good information that helps you refine your search terms.
However, I will most often (probably 95%-98% of the time) find my answer (or information to refine my search terms) on the first 2 pages of search results. The thing about the Google algorithm is the information grows more irrelevant as you proceed farther down the search results.
This can usually be a huge time saver for those of you used to poring over all those pages of miscellaneous and seemingly irrelevant information (often it really is irrelevant). If your top results are irrelevant to your topic, chances are you need to change your search terms.
But what is a good search term? Well, like I said, it’s largely a trial-and-error method, but once you get used to doing it you can usually come up with good to great search terms off the top of your head in seconds. It also has a lot to do with how much you know on a given topic.
However, let’s take an example. I know a lot about Windows Mobile, media formats (such as H.264), and the AT&T Tilt. Someone asked me a question about H.264 playback on the AT&T Tilt. Personally I had never tried this (I usually use my laptop for video playback if it’s necessary), so I had no idea which techniques work and which don’t. However, I knew that the Tilt in all logic should be able to run SD H.264 videos with decent framerates. I searched "Windows Mobile H.264" (because Google handles search terms based on key words, not on the actual question), and then scanned through the search results. Turns out my answer was the fourth link down, "Hackzine.com: HOWTO - iPod and PSP movies on Windows Mobile". I happened to know this was correct information because I know the iPod and PSP use H.264 as their primary codec. I also checked the description of the link, which read "Last week I mentioned that you can use TCPMP on Windows Mobile Smartphones and Pocket PCs to view H.264 encoded MP4s". Notice how my search terms are bold-faced? That description was exactly the information I wanted, or at least the link had further information on it.
How do you refine your search terms though? This can be more ambiguous, but it will often require you to read an article. However, reading entire articles takes time, something we as students have very little of. So I’d like to introduce you to a keyboard shortcut: Ctrl+F. I know this works in Firefox (my favorite method) and in Internet Explorer (a more annoying version with a popup box). This is the "find" shortcut (you can also go into your menu bar, where Find is usually under "Edit", sometimes called "Find in this page" or "Where is") and it can be a life saver. Remember how I referenced the link descriptions in Google? How they bold-face the key words on that page that match your search terms? Well, this is where that comes in really handy. Typically you can see a short preview of the information on the page before you even enter it, allowing you to discern if it’s even worth looking at, but sometimes you need to read that entire paragraph of information in order to comprehend what it’s talking about. So you enter the page, hit Ctrl+F, and enter the search terms that were bold-faced on Google, and this will take you directly to the location of those key words in that page. Keep in mind that, while Google’s search tool does not care about word order, Firefox’s search (and probably every browser’s) DOES. So in Firefox, Windows Mobile is not the same as Mobile Windows, yet the two will yield the same results if you search them in Google.
So once your search terms are highlighted in the page (you may have to hit "Next" a few times to find that paragraph, and you may also need to find further information in that page) you can read up a little more and may even be able to refine your search topic, or even find your answer.
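Here’s a small sketch of what an in-page find is doing, as I understand it: a plain left-to-right text match, where each hit is one press of "Next". The page text is made up (loosely based on the description quoted earlier), and `find_all` is a hypothetical helper, not any browser’s actual code.

```python
def find_all(page_text, term):
    """Return the starting offset of every occurrence, like pressing "Next"."""
    text, needle = page_text.lower(), term.lower()
    offsets, start = [], text.find(needle)
    while start != -1:
        offsets.append(start)
        start = text.find(needle, start + 1)
    return offsets

# Made-up page text.
page = ("You can use TCPMP on Windows Mobile Smartphones and Pocket PCs. "
        "Windows Mobile devices can view H.264 encoded MP4s.")

print(len(find_all(page, "Windows Mobile")))  # 2 matches to step through
print(find_all(page, "Mobile Windows"))       # [] because the order matters
```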
Now I know that sounds like a lot to learn and practice. It does take quite a while to learn well. I’ve been using Google for years and years, and it’s taken a long time to work out my technique. However, with all the searching we do as students, you can work out a method rather quickly.
Utilizing the Internet as a modern student: Part 1
Ok so this is probably going to be the first in a series I do on utilizing the internet as a modern student, making use of all the different tools we have in modern times to better ourselves.
What prompted this is that I was never taught Algebra properly (and never really listened either, to be bluntly honest), so I’m terrible with Algebra. However, if you’ve read my About page, you know I’m also an engineer. Math is essential to my degree, including a bunch of Calculus courses. I happen to be in Calculus 2 (or Integral Calculus; the name differs between colleges) and I’m having a hard time with all the Algebra tricks required to solve a bunch of these equations.
In the old days I could go for tutoring, get a lot of help from my professor, or review through the book.
But this is the Information Age, what can I do now that doesn’t inconvenience the tutor (they have lots of other students to help out), inconvenience my professor (he’s busy enough as it is), or make me want to pull my hair out in a generally ineffective manner?
If you’re not familiar, a lot of colleges have been putting some of their courses online. MIT, UC Berkeley, Stanford, and many more have been releasing free courses (called courseware) that anyone can download and view on demand.
So, as bad a taste as this leaves in my mouth, I launched iTunes and went to the "iTunes U" category in the iTunes store, then scrolled down to mathematics and found a College Algebra course with ~35 lectures for download from Florida Community College at Jacksonville. Perfect.
I also found differential and integral calculus courses from MIT which will be great for review throughout the days. Plus I can have the MIT training in Calculus and other courses. It may be beneficial in an interview to say "yeah I took the course at my own college but I also took MIT’s course via their OpenCourseware service, so I have both my college’s training and MIT’s training".
So it’s time to begin the review, aka totally retaking the course, but I can’t think of a much better way to do it.
