My Java Text Mining Toolkit
About the Project
I started working on text mining and information retrieval sometime around 2007, and I am still learning. This project is a work in progress and contains some components and building blocks I built as part of learning this stuff. As I learn more, more code will be added. Use it if you find it useful.
I plan on using this site as a placeholder for this code. There is no formal documentation. However, I usually talk about the code in my blog. Links to posts that deal with particular aspects of the code in the project are listed below. I am also usually pretty liberal with inline comments (Javadoc and otherwise) in my code, so you may want to download the source code and generate the Javadocs locally if that makes sense, or read through the code.
If you have questions, please post them as comments on the relevant blog post; that way you have a better chance of getting an answer, either from me or from other readers. If you find bugs, it would be awesome if you could send me a patch through the tracker; otherwise, just point them out on the relevant blog post.
Links to specific posts
Here are links to some of my blog posts covering parts of the code in the project as I built them.
The code in this project is released under the GNU Lesser General Public License (LGPL), which pretty much allows you to incorporate code from this project into your own (commercial or otherwise) project without fear of liability and without an expectation from you to open source your own project.
About the Site
I got the template for the site from Open Source Web Design, where it was contributed by Craig from DesignCreek (thanks, Craig). I chose the template because the pencil caps reminded me of overlapping bell curves, because pencil and paper are tools that are typically forgotten when we talk about all this fancy computer stuff, and because they symbolize collaborative learning, which is pretty much what I hope to do with this site.