<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Guy Rutenberg &#187; mctext</title>
	<atom:link href="http://www.guyrutenberg.com/category/projects/mctext/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.guyrutenberg.com</link>
	<description>Keeping track of what I do</description>
	<lastBuildDate>Wed, 16 Jun 2010 19:53:40 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>mctext 0.2 &#8211; A Markov Chain Text Generator</title>
		<link>http://www.guyrutenberg.com/2008/04/30/mctext-02-a-markov-chain-text-generator/</link>
		<comments>http://www.guyrutenberg.com/2008/04/30/mctext-02-a-markov-chain-text-generator/#comments</comments>
		<pubDate>Wed, 30 Apr 2008 17:41:57 +0000</pubDate>
		<dc:creator>Guy</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[mctext]]></category>

		<guid isPermaLink="false">http://www.guyrutenberg.com/?p=42</guid>
		<description><![CDATA[This is the second release of my Markov Chain text generator &#8211; mctext. This text generator takes existing sample text, and generates a new text using Markov Chains.
The main new feature in this version is that it allows users to specify via the command line how many words should be considered when generating the [...]]]></description>
			<content:encoded><![CDATA[<p>This is the second release of my Markov Chain text generator &#8211; <a href="/2008/01/29/mctext-using-markov-chains-to-generate-text/"><code>mctext</code></a>. This text generator takes existing sample text, and generates a new text using Markov Chains.</p>
<p>The main new feature in this version is that users can specify on the command line how many words are considered when generating the next one. The bigger the step number, the closer the generated text is to the original. The value used in mctext-0.1 was 2, and this remains the default. The number of steps can be set using the <code>--steps</code> command line switch.<br />
<span id="more-42"></span><br />
This version also fixes a couple of bugs (mostly segmentation faults). Another change happened under the hood, so the regular user will not notice it: I&#8217;ve redesigned the program and given it a better software architecture that will hopefully make it possible to extend its abilities and generalize its output generation.</p>
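<p>The effect of the step count can be illustrated with a minimal sketch. This is not the <code>mctext</code> source (which is written in C++); the function name <code>generate_with_steps</code> and its parameters are hypothetical, and it assumes <code>steps</code> is at least 1 and that followers are picked uniformly from every occurrence in the sample.</p>

```python
import random
from collections import defaultdict


def generate_with_steps(words, count, steps=2, rng=random):
    """Generate up to `count` words; each new word depends on the previous `steps` words.

    Hypothetical illustration only, not the mctext implementation.
    """
    # Map every run of `steps` adjacent words to the words that follow it.
    table = defaultdict(list)
    for i in range(len(words) - steps):
        table[tuple(words[i:i + steps])].append(words[i + steps])
    if not table:
        # The sample is too short to form a single context.
        return list(words[:max(count, 0)])

    # Seed the output with a randomly chosen run of `steps` adjacent words.
    start = rng.randrange(len(words) - steps + 1)
    output = list(words[start:start + steps])

    for _ in range(count - steps):
        followers = table.get(tuple(output[-steps:]))
        if not followers:
            break  # this context only occurs at the very end of the sample
        output.append(rng.choice(followers))
    return output[:max(count, 0)]
```

<p>With a small <code>steps</code> value many contexts have several possible followers, so the output wanders. As <code>steps</code> grows, most contexts occur only once in the sample, leaving a single follower, and the output reproduces longer and longer stretches of the original text.</p>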
<p>I planned to add more features in this release, but due to lack of time I&#8217;ve decided to release it as-is. The design rewrite is part of a future plan to allow <code>mctext</code> to operate on music pieces (probably MIDI). This should add a new dimension to the program and allow it to utilize the new generalized structure to generate new music pieces based on sample ones. This future project can have much nicer results than just plain text generation. I really hope I&#8217;ll find the time to complete it.</p>
<p>The new package can be downloaded here &#8211; <a href="/wp-content/uploads/2008/04/mctext-0.2.tar.bz2">mctext-0.2.tar.bz2</a>. Compilation and installation remain the same as in the previous version. The only dependency is the Boost C++ library.</p>
<p>This is free software, so please feel free to modify or hack it any way you like. It would be great if you could leave a comment when you do so. Also, if you have an interesting idea for how this program can be used or modified, please comment.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.guyrutenberg.com/2008/04/30/mctext-02-a-markov-chain-text-generator/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>mctext &#8211; Using Markov Chains to Generate Text</title>
		<link>http://www.guyrutenberg.com/2008/01/29/mctext-using-markov-chains-to-generate-text/</link>
		<comments>http://www.guyrutenberg.com/2008/01/29/mctext-using-markov-chains-to-generate-text/#comments</comments>
		<pubDate>Tue, 29 Jan 2008 07:37:45 +0000</pubDate>
		<dc:creator>Guy</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[mctext]]></category>

		<guid isPermaLink="false">http://www.guyrutenberg.com/2008/01/29/mctext-using-markov-chains-to-generate-text/</guid>
		<description><![CDATA[mctext is a new project of mine, focusing on text generation using Markov Chains. This little utility reads a sample text file, preferably a large one, and generates new text based on the semantics given in the sample text.

How does it work?
mctext reads the given file and treats it as a list of words. Now [...]]]></description>
			<content:encoded><![CDATA[<p><code>mctext</code> is a new project of mine, focusing on text generation using <a href="http://en.wikipedia.org/wiki/Markov_chain">Markov Chains</a>. This little utility reads a sample text file, preferably a large one, and generates new text based on the semantics given in the sample text.<br />
<span id="more-38"></span></p>
<h4>How does it work?</h4>
<p><code>mctext</code> reads the given file and treats it as a list of words. It first randomly chooses two adjacent words and puts them in the output string. From there, text generation uses Markov Chains to continue: it takes the last two words in the output string and searches for all the words that follow them in the sample file. It chooses one of those words at random and appends it to the string. It then repeats the process until enough new text is generated.</p>
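<p>The procedure above can be sketched in a few lines. This is a rough illustration rather than the actual <code>mctext</code> code (which is written in C++); the names <code>build_table</code> and <code>generate</code> are hypothetical, and it assumes a follower is picked uniformly from every occurrence in the sample, so words that follow a pair more often are chosen more often.</p>

```python
import random
from collections import defaultdict


def build_table(words):
    """Map each pair of adjacent words to the list of words that follow it."""
    table = defaultdict(list)
    for i in range(len(words) - 2):
        table[(words[i], words[i + 1])].append(words[i + 2])
    return table


def generate(words, count, rng=random):
    """Generate up to `count` words from a sample list of words.

    Hypothetical illustration only, not the mctext implementation.
    """
    table = build_table(words)
    if not table:
        # Fewer than three words: there is nothing to chain.
        return list(words[:max(count, 0)])

    # Start from a randomly chosen pair of adjacent words.
    start = rng.randrange(len(words) - 1)
    output = [words[start], words[start + 1]]

    for _ in range(count - 2):
        # Look up every word that follows the last two output words.
        followers = table.get((output[-2], output[-1]))
        if not followers:
            break  # the pair only occurs at the very end of the sample
        output.append(rng.choice(followers))
    return output[:max(count, 0)]
```

<p>Calling <code>generate(text.split(), 100)</code> on a large sample would produce up to 100 words. Every three consecutive words in the result also appear consecutively somewhere in the sample, which is why the output looks locally plausible while drifting between topics.</p>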
<p>For example, this was an output of the program when given 500 posts from Tech Crunch as sample text:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">$ ./mctext -w 100 tc.txt
declined to name a specific position on the internet is now 
extended through the birth of high velocity P2P file sharing
and broadcasting short experiences, thoughts and fantasies.
By that we can look forward to seeing everyone. Loic Le Meur,
a well of useful contextual information that would be complete
without a second's hesitation. DonorsChoose Doing Well, But
Fred Wilson and the</pre></div></div>

<p>(tc.txt was the file holding the text of the 500 posts)</p>
<h4>Compiling and Using <code>mctext</code></h4>
<p>If you want to try it yourself, download the source package from <a href="/wp-content/uploads/2008/01/mctext-0.1.tar.bz2">here</a>. Compiling is pretty straightforward (<code>./configure &#038;&#038; make</code>). The program depends on the <a href="http://www.boost.org">Boost</a> library. Some Linux distributions separate the additional Boost libraries from the core ones, so if that is the case for you, you will need to install the <code>program-options</code> library.</p>
<p>Invoking the program is simple. Just pass it the sample text file as an argument and use the <code>-w NUM</code> flag to specify how many words you want it to generate. <code>mctext</code> can also take the sample text from stdin. See <code>mctext --help</code> for more information.</p>
<p><code>mctext</code> is a new project, and the current implementation is a proof-of-concept. As such, there is still a lot to improve. For the next release I&#8217;m planning to allow changing the number of words considered at each step from the command line, as well as improving the sentence recognition. If you find a bug or have a suggestion for a new feature, I will be glad to hear it.</p>
<p>Update: I&#8217;ve released a new version of <a href="/2008/04/30/mctext-02-a-markov-chain-text-generator/">mctext</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.guyrutenberg.com/2008/01/29/mctext-using-markov-chains-to-generate-text/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

