<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>David Sterry&#039;s Blog &#187; Search</title>
	<atom:link href="http://davidsterry.com/blog/category/search/feed/" rel="self" type="application/rss+xml" />
	<link>http://davidsterry.com/blog</link>
	<description>Better than bad, it&#039;s good</description>
	<lastBuildDate>Tue, 03 Jan 2012 22:19:06 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Search Phrases</title>
		<link>http://davidsterry.com/blog/2006/04/search-phrases/</link>
		<comments>http://davidsterry.com/blog/2006/04/search-phrases/#comments</comments>
		<pubDate>Tue, 04 Apr 2006 09:48:00 +0000</pubDate>
		<dc:creator>David Sterry</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[Trend Sweet Trend]]></category>

		<guid isPermaLink="false">http://10.168.29.5/blog/?p=35</guid>
		<description><![CDATA[
A while back I created Trend Sweet Trend and added a search phrase tracker to it. It tracks the most popular search phrases people type into Yahoo! Search, thanks to Yahoo! Buzz, which updates each weekday. The search phrases are on the left, and on the right is a grid showing what rank each term [...]]]></description>
			<content:encoded><![CDATA[
<p>A while back I created Trend Sweet Trend and added a search phrase tracker to it. It tracks the most popular search phrases people type into Yahoo! Search, thanks to Yahoo! Buzz, which updates each weekday. The search phrases are on the left, and on the right is a grid showing what rank each term held in the past. American Idol, for instance, is in 1st place and has been for the last two days that Yahoo! Buzz was reporting. Before that it was 2nd, and before that 12th. Green means its rank has risen or held steady after rising. Red means it fell. Purple shows you that it&#8217;s the first time it&#8217;s ever been on the list. &#8220;Ever&#8221; is not a very long time (I started tracking in late November 2005).</p>
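<p>The colour rule in the paragraph above can be sketched in a few lines. This is only an illustration, not the code behind the page: the function name <code>rank_color</code> and the list-of-ranks input are hypothetical, and two cases the post does not spell out are filled in by assumption (a rank that holds steady after falling stays red, and a rank that holds steady right after its first appearance is treated as green).</p>

```python
from typing import Optional, Sequence


def rank_color(history: Sequence[Optional[int]]) -> str:
    """Return the grid color for the most recent rank of one phrase.

    history holds one rank per report, oldest first. Rank 1 is the top
    spot; None means the phrase was not on the list in that report.
    """
    current = history[-1]
    if current is None:
        return "none"  # not on the list today, nothing to color

    earlier = [rank for rank in history[:-1] if rank is not None]
    if not earlier:
        return "purple"  # first time ever on the list

    # Walk back to the most recent rank that differs from today's.
    # A run of equal ranks inherits the color of the move before it.
    for previous in reversed(earlier):
        if previous != current:
            # A smaller number is a better rank.
            return "green" if current < previous else "red"

    # Assumption: unchanged since the first appearance counts as a rise.
    return "green"
```

<p>With the American Idol ranks quoted above, <code>rank_color([12, 2, 1, 1])</code> comes out green: the phrase rose from 2nd to 1st and then held steady.</p>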
<p>I am always amazed that most of what people search for is celebrities. Go <a href="http://www.sterryit.com/sptrends.shtml">here</a> if you want to see the live page and click on the links.</p>
<p><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.sterryit.com/images/tstbuzz.gif"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 412px;" src="http://www.sterryit.com/images/tstbuzz.gif" alt="" border="0" /></a><br />Since I&#8217;ve been following this, I&#8217;ve noticed that entertainment is really what bonds us all together. We don&#8217;t all do the same work or spend much time worrying about the same news, but we do watch the same hit TV shows and follow the most famous celebrities. This page is really just there to try to make sense of what&#8217;s most popular, and for me it was an interesting extension of the first <a href="http://www.sterryit.com/trends.shtml">Trend Sweet Trend</a> page. If you like it or want to see something changed about it, let me know.</p>
]]></content:encoded>
			<wfw:commentRss>http://davidsterry.com/blog/2006/04/search-phrases/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Our Own Google</title>
		<link>http://davidsterry.com/blog/2006/03/our-own-google/</link>
		<comments>http://davidsterry.com/blog/2006/03/our-own-google/#comments</comments>
		<pubDate>Mon, 06 Mar 2006 05:17:00 +0000</pubDate>
		<dc:creator>David Sterry</dc:creator>
				<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://10.168.29.5/blog/?p=31</guid>
		<description><![CDATA[
The other day I began to think about search engines and their innately commercial nature. I thought, why is it that the Internet, which exists as a large network of networks, relies on commercial organizations to index the web and serve search results? Shouldn&#8217;t the Internet index itself and provide search as a basic utility [...]]]></description>
			<content:encoded><![CDATA[
<p>The other day I began to think about search engines and their innately commercial nature. I thought, why is it that the Internet, which exists as a large network of networks, relies on commercial organizations to index the web and serve search results? Shouldn&#8217;t the Internet index itself and provide search as a basic utility or protocol? With the success of the open source development model there must be great potential to create a community-owned-and-operated search engine.</p>
<p>What would be the benefits of such a system? Ideally, it would collect only aggregate, anonymous logs (IP addresses hashed with random session keys) that would be used for the sole purpose of improving search. The servers providing search would be donated through a distributed effort similar to SETI@home, except limited mostly to local ISPs with the server resources to provide the search results and maintain replicated indices.</p>
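<p>As a rough illustration of hashing IP addresses with random session keys, here is a minimal sketch. The choice of HMAC-SHA-256 and the function names <code>new_session_key</code> and <code>anonymize_ip</code> are assumptions made for this example; the post does not name a particular scheme.</p>

```python
import hashlib
import hmac
import secrets


def new_session_key() -> bytes:
    """Generate a random key that is discarded when the session ends."""
    return secrets.token_bytes(32)


def anonymize_ip(ip_address: str, session_key: bytes) -> str:
    """Return a keyed hash of the IP address for use in search logs.

    Within one session the same address maps to the same token, so
    queries can still be grouped. Once the key is thrown away, tokens
    from different sessions can no longer be linked to each other.
    """
    digest = hmac.new(session_key, ip_address.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()
```

<p>The key matters because the IPv4 address space is small enough to enumerate: an unkeyed hash could be reversed by hashing every possible address and comparing.</p>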
<p>After a bit of thought on this concept, I did some research. It turns out there are a couple of projects over at the Apache Software Foundation by the names of Lucene and Nutch. These are free software projects with the goal of developing world-class indexing and search software. I&#8217;m not sure if they&#8217;re trying to build a distributed web search infrastructure, but their work is certainly important to the idea I&#8217;ve been thinking about. One thing I learned from their FAQ is that indexing the web is not the most bandwidth-intensive task a company like Google or Yahoo has to deal with&#8230; it&#8217;s actually serving the search results. That makes sense when you think of how many times people look at a specific site versus how often it changes.</p>
<p>One other thought before I call an end to this post. Free software is generally good at recreating or making small improvements to commodity proprietary software whose features have been stable for some time. Since search engines are being rapidly developed in the commercial sector, it is difficult for free software to supplant leaders like Google and Yahoo at their own game. It may be necessary to start the search engine I&#8217;m talking about by indexing sites that would provide little or no ad revenue. Think scientific research and academia.</p>
]]></content:encoded>
			<wfw:commentRss>http://davidsterry.com/blog/2006/03/our-own-google/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>My own Google</title>
		<link>http://davidsterry.com/blog/2006/01/my-own-google/</link>
		<comments>http://davidsterry.com/blog/2006/01/my-own-google/#comments</comments>
		<pubDate>Tue, 31 Jan 2006 09:46:00 +0000</pubDate>
		<dc:creator>David Sterry</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[Trend Sweet Trend]]></category>

		<guid isPermaLink="false">http://10.168.29.5/blog/?p=29</guid>
		<description><![CDATA[
I&#8217;m writing my own search engine. I know it sounds crazy, but it&#8217;s just the sort of project that fuels my fire. At this point, I&#8217;m keeping the scope of the search to URLs that have appeared on my Trend Sweet Trend page (retired in 2008), which currently number more than 7,000. I&#8217;ve written 3 Perl scripts so far [...]]]></description>
			<content:encoded><![CDATA[
<p>I&#8217;m writing my own search engine. I know it sounds crazy, but it&#8217;s just the sort of project that fuels my fire. At this point, I&#8217;m keeping the scope of the search to URLs that have appeared on my <a href="http://www.sterryit.com/trends.shtml">Trend Sweet Trend</a> page (retired in 2008), which currently number more than 7,000. I&#8217;ve written 3 Perl scripts so far and set up 5 tables on my newly LAMP&#8217;d web server.</p>
<p>One table stores the URLs and the last time they were indexed. Another is a queue of URLs to be indexed. A third stores the parsed text from the indexing operation. The other two aren&#8217;t storing anything yet, but they will hold the cache of each page and keep track of links between sites once I unleash a crawling feature.</p>
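<p>For readers who want something concrete, the three tables already in use might look roughly like the sketch below. It uses Python with SQLite rather than the MySQL of a LAMP server, purely so it runs without any setup, and every table and column name is hypothetical; the post does not give the real schema.</p>

```python
import sqlite3

# Hypothetical schema for the three populated tables described above.
SCHEMA = """
CREATE TABLE urls (
    id           INTEGER PRIMARY KEY,
    url          TEXT NOT NULL UNIQUE,
    last_indexed TEXT              -- NULL until the page is first indexed
);
CREATE TABLE index_queue (
    url_id INTEGER PRIMARY KEY REFERENCES urls(id)
);
CREATE TABLE term_counts (
    url_id INTEGER NOT NULL REFERENCES urls(id),
    term   TEXT    NOT NULL,
    count  INTEGER NOT NULL,
    PRIMARY KEY (url_id, term)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the URL, queue, and term-count tables on the given connection."""
    conn.executescript(SCHEMA)
```

<p>Calling <code>create_schema(sqlite3.connect(":memory:"))</code> builds the tables in a throwaway in-memory database.</p>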
<p>As for the three scripts, one grabs the URLs from my Trend Sweet Trend database and puts them in my search engine&#8217;s crawling table. The second checks the last time the sites in the URL table were indexed and adds URLs that need indexing to the queue. Finally, the third script visits each page with curl, parses out the text using HTML::Parser, and stores the counts of each term in my search terms table. It&#8217;s quite primitive, indexes a lot of garbage text, and can&#8217;t do any booleans, but besides that it&#8217;s great. Watch out, Google. Anybody got a spare datacenter with 100,000 commodity boxes?</p>
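<p>The core of the third script, pulling the text out of a page and counting terms, can be sketched as follows. This is not the original Perl: it uses Python&#8217;s standard <code>html.parser</code> module as a stand-in for HTML::Parser, leaves out the curl fetch and the database write, and the names <code>TextExtractor</code> and <code>count_terms</code> are made up for the example. Skipping script and style blocks is an added assumption meant to cut down on garbage text.</p>

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of script and style tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_terms(html: str) -> Counter:
    """Return a Counter mapping each lowercase term to its frequency."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Terms are runs of ASCII letters and digits; everything else separates them.
    return Counter(re.findall(r"[a-z0-9]+", text))
```

<p>Each (term, count) pair in the returned counter would then become one row in the search terms table for that page.</p>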
]]></content:encoded>
			<wfw:commentRss>http://davidsterry.com/blog/2006/01/my-own-google/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

