
Java Pitfalls, Groovy to the rescue

I came across an interesting Java gotcha recently, one that has surely caused some infuriating bugs for some hapless programmer out there. I thought it would be good to document it here, along with one workaround. Consider the following:

import java.util.ArrayList;

public class TestCode
{
	public static void main(String[] args)
	{
		ArrayList<String> list = new ArrayList<String>();
		list.add("one");
		list.add("two");
		list.add("three");
		list.remove(0);
		System.out.println("Result1: " + (2 == list.size()));
		// Result1: true
	}
}

The result, of course, is expected. However, consider the following version, where we change ArrayList to Collection:

import java.util.ArrayList;
import java.util.Collection;

public class TestCode
{
	public static void main(String[] args)
	{
		Collection<String> list = new ArrayList<String>();
		list.add("one");
		list.add("two");
		list.add("three");
		list.remove(0);
		System.out.println("Result2: " + (2 == list.size()));
		// Result2: false
	}
}

The result is false because the list size is still 3! We can thank autoboxing for this. The Collection interface has a remove method, but it takes an Object and removes the element matching that content, not the element at that index, which is what the remove(int) method of ArrayList does. Because the variable is declared as a Collection, only the content-based version is available to the compiler. In this case, Java autoboxes the integer 0 into an Integer object, looks in the list for an element equal to it, fails to find one, and doesn’t remove anything from the list.
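To see the two overloads side by side, here is a minimal sketch. The class name RemoveOverloadDemo and its helper methods are made up for illustration; only the remove calls come from the JDK. It also shows one possible way out when all you have is a Collection reference: cast back to List so that remove(int) becomes visible again.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical demo class, not part of the original example.
class RemoveOverloadDemo
{
	// Builds a fresh three-element list for each experiment.
	static ArrayList<String> sample()
	{
		ArrayList<String> list = new ArrayList<String>();
		list.add("one");
		list.add("two");
		list.add("three");
		return list;
	}

	// Declared as Collection: only remove(Object) exists, so 0 is boxed to an Integer.
	static int sizeAfterCollectionRemove()
	{
		Collection<String> items = sample();
		items.remove(0);
		return items.size();
	}

	// Declared as List: remove(int) is visible and is chosen for an int argument.
	static int sizeAfterListRemove()
	{
		List<String> items = sample();
		items.remove(0);
		return items.size();
	}

	// Workaround sketch: cast back to List to reach the index-based overload.
	static int sizeAfterCastRemove()
	{
		Collection<String> items = sample();
		if (items instanceof List)
			((List<String>) items).remove(0);
		return items.size();
	}

	public static void main(String[] args)
	{
		System.out.println("Collection reference: " + sizeAfterCollectionRemove());
		System.out.println("List reference: " + sizeAfterListRemove());
		System.out.println("Cast to List: " + sizeAfterCastRemove());
	}
}
```

The cast only helps when the object really is a List, hence the instanceof check. If the code can be changed, declaring the variable as List in the first place is probably the simpler fix.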

To me, this is a definite code smell, as one of the guidelines of clean code is that a method named foobar() in one class should have the same functionality as a foobar() in another class. For example, if you have a method called checkForUser(userName) in an authentication module that, well, checks if a user exists, then a checkForUser(userName) in another authentication module should also only check if the user exists; it should NOT add the user if the check is negative! Code smells like this lead to many hours of frustrating debugging, especially in big teams on big projects.

However, all is not lost. As mentioned in the title, Groovy will help to save your day. Try out the Groovy code below:

ArrayList<String> list = new ArrayList<String>()
list.add("one")
list.add("two")
list.add("three")
list.remove(0)
println 2 == list.size()
// true

Collection<String> list_ = new ArrayList<String>()
list_.add("one")
list_.add("two")
list_.add("three")
list_.remove(0)
println 2 == list_.size()
// true

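You’ll see that Groovy is smart enough to keep things consistent, as the results are both true! Groovy picks the method at runtime based on the actual object (an ArrayList) and the actual argument (an int), rather than on the declared type of the variable, so the index-based remove is called in both cases. Of course, on the flip side, if you’re a seasoned Java veteran for whom quirks like this are ingrained in your soul, you might find the Groovy interpretation of remove unnatural. Personally, though, I find the Groovy interpretation much more natural and intuitive.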


Game of Thrones Word Cloud Fun

Given the recent interest in A Song of Ice and Fire due to the HBO premiere of Game of Thrones, I did up the word cloud below for GoT using Wordle.
Wordle: Game of Thrones

This is pretty interesting, since an ASoIaF newbie can draw the following conclusions from looking at the word cloud:

  • Lords and Sers abound in the book
  • Jon, Catelyn, Tyrion, Ned, Dany and Arya all play a pretty major role in the book
  • Pycelle? Not so much
  • etc…

For those who are interested in doing this to their favorite PDFs, here’s the Groovy code that I cobbled together to extract the text from a PDF:

package extract

import com.itextpdf.text.pdf.PdfReader
import com.itextpdf.text.pdf.parser.PdfTextExtractor

class Book
{
	def path
	def start
	def end
	def outputFileName

    // start - page in the .pdf you want to start extracting from. No point extracting from preface and content pages
    // end - last page to stop extracting. Not interested in the family descriptions, etc
	Book(path, start, end, outputFileName)
	{
		this.path = path
		this.start = start
		this.end = end
		this.outputFileName = outputFileName
	}
}

books = []
books.add(new Book("C:\\book1.pdf", 3, 553, "book1.txt"))
books.add(new Book("C:\\book2.pdf", 3, 596, "book2.txt"))

for (eachBook in books)
{
	reader = new PdfReader(eachBook.path)
	wordList = [];

	for (i in eachBook.start..eachBook.end)
	{
		page = PdfTextExtractor.getTextFromPage(reader, i)
		lines = page.split()
		for (eachWord in lines)
		{
            // Because I only want to capture entities and not ALL the text,
            // the regex below is a naive method to capture only words that start with
            // an uppercase letter, e.g. "Ned", and have at least 2 characters, as there's
            // a good chance that such a word is an entity. This can be made more sophisticated with time.
			capitalisedWordRegex= /.*?([A-Z][a-zA-Z]+).*/
			matcher = (eachWord =~ capitalisedWordRegex)
			if (matcher.matches())
				wordList.add(matcher[0][1])
		}
	}

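	outputFile = new File(eachBook.outputFileName)
	// Open the output file once in append mode and write one word per line
	outputFile.withWriterAppend { writer ->
		for (eachWord in wordList)
			writer << eachWord + "\n"
	}
	// Release the PDF once all its pages have been read
	reader.close()
	println "Finished $eachBook.path"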
}
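If you want to try the capitalised-word filter on its own before pointing it at a whole book, here is a rough Java sketch of just that step. It reuses the same regex as the script above; the class name EntityFilter, the method capitalisedWords and the sample sentence are my own inventions for illustration, and splitting on whitespace is assumed to behave like the split() call in the Groovy script.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; only the regex is taken from the script above.
class EntityFilter
{
	// An uppercase letter followed by at least one more letter, anywhere in the token.
	private static final Pattern CAPITALISED_WORD = Pattern.compile(".*?([A-Z][a-zA-Z]+).*");

	// Splits the text on whitespace and keeps the capitalised part of each matching token.
	static List<String> capitalisedWords(String text)
	{
		List<String> words = new ArrayList<String>();
		for (String token : text.trim().split("\\s+"))
		{
			Matcher matcher = CAPITALISED_WORD.matcher(token);
			if (matcher.matches())
				words.add(matcher.group(1));
		}
		return words;
	}

	public static void main(String[] args)
	{
		String sample = "Lord Eddard Stark, called \"Ned\" by his friends, rode north.";
		// Surrounding quotes and trailing commas are dropped by the capture group.
		System.out.println(capitalisedWords(sample));
	}
}
```

As the comment in the script admits, this is naive: any word that starts a sentence is captured too, so common words like "The" will still show up unless you filter them out afterwards.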

Once you have the output file, you can create your word cloud by going to Wordle, or you can download Wordle and generate a picture which you can save and use. Here are some of the clouds I generated.