Magnus K Karlsson: February 2009

February 28, 2009

Enterprise Integration Diagram for Apache ServiceMix, Camel and Petal

One of the challenges when doing integration work with an ESB solution is visualizing it. If you are using Eclipse, rescue is on the way: the Eclipse Enterprise Integration Designer from the STP project.

The Eclipse STP project implements some of the most important patterns of the Enterprise Integration Patterns.

The project is still in incubation and there are some problems getting it to run, even if you follow the installation instructions from the STP site.

Installation

Download Eclipse Ganymede SR1 Enterprise edition.
Start Eclipse and open Help → Software Updates...
Select Ganymede → SOA → STP Designer and install.
After the restart, close Eclipse and download the patch from http://webui.sourcelabs.com/eclipse/issues/240077
Extract the zip file to the eclipse/plugins folder and start Eclipse.

Creating Integration Patterns Diagram

Create a pattern project, e.g. a simple Java project.
Choose New → Other and select Integration Patterns Diagram.
Unfortunately, only ServiceMix and Petal are supported at the moment, so select ServiceMix.
Complete the wizard and click Finish.

After that you are set to create diagrams.

February 27, 2009

Clustering and CAP Theorem

When designing an application for clustering one should first be aware of the CAP (Consistency, Availability and Partition tolerance) theorem. The theorem states that a distributed system can only guarantee two of the three CAP properties at the same time.

Consistency means that all users see the same set of data, even when data is updated or deleted. This is normally achieved by storing data in a database and using transactions.

Availability is achieved through replicating data, so that data is always available even in case of failure.

The last, Partition tolerance, means that the system holds when it is distributed over several servers and machines, even when the network between them fails.

So which one should go? Well, when building highly loaded applications such as Google or Amazon, the natural answer is Consistency, because letting the system be unavailable during a failure or unresponsive under high load is not an option. This is quite interesting, because most programmers are raised with the idea of using a database as the foundation of an application. And this idea must now go. Or must it? Well, it is not so black and white.

The key to building a clustered application is twofold:

Asynchronous communication. E.g. Amazon has a separate service displaying which books other people have also bought. Failure of this service should not prevent the rest of the page from being rendered.
Break down your application and analyze each part individually according to the CAP properties. Must payment be consistent? Probably. Does user information need to be consistent? Probably not. Etc.

After you have broken your application down into different clustering functions, keep their data separated, because they will be deployed differently and independently.
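The asynchronous point can be sketched in plain Java. This is only an illustration under assumptions: the class name RecommendationBlock, the method render and the empty-string fallback are made up for the example and do not describe how Amazon actually does it. The optional service runs on its own thread, and the page gets an empty block when the service fails or is too slow.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: render a page even when an optional service fails.
final class RecommendationBlock {

    private static final String FALLBACK = ""; // page is shown without the block

    private final ExecutorService pool = Executors.newCachedThreadPool();

    // Calls the optional service asynchronously and never lets its failure
    // or slowness break the rest of the page.
    String render(Callable<String> recommendationService, long timeoutMillis) {
        Future<String> future = pool.submit(recommendationService);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag
        } catch (ExecutionException e) {
            // the service itself failed; fall through to the fallback
        } catch (TimeoutException e) {
            future.cancel(true); // the service was too slow
        }
        return FALLBACK;
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

The design choice here is that the caller decides how long it is willing to wait; everything beyond that is treated the same way as a failure.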

So the way to achieve a highly available and partition-tolerant application is to scale out, i.e. to use more machines together with:

Stateless application servers, i.e. the application server only holds request data.
Replication of read-intensive data through a master/slave setup.
Caching of data.
Database sharding, i.e. using multiple databases and manually deciding which tables go into which database.
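The last point, deciding by hand which tables go into which database, can be sketched as a simple lookup. This is a minimal illustration under assumptions: the class TableRouter, its method names and the JDBC URLs used with it are invented for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: a hand-maintained mapping from table to database.
final class TableRouter {

    private final Map<String, String> tableToJdbcUrl = new HashMap<String, String>();
    private final String defaultJdbcUrl;

    TableRouter(String defaultJdbcUrl) {
        this.defaultJdbcUrl = defaultJdbcUrl;
    }

    // Registers which database holds the given table (case-insensitive).
    void assign(String table, String jdbcUrl) {
        tableToJdbcUrl.put(table.toUpperCase(Locale.ROOT), jdbcUrl);
    }

    // Returns the JDBC URL for the table, or the default database
    // when the table has not been assigned explicitly.
    String jdbcUrlFor(String table) {
        String url = tableToJdbcUrl.get(table.toUpperCase(Locale.ROOT));
        return url != null ? url : defaultJdbcUrl;
    }
}
```

A data access layer would ask the router for the URL before opening a connection, so moving a table to another database only means changing one assignment.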

February 22, 2009

Hibernate Search Example

In my previous post I wrote about Hibernate Search in general; in this post I will show code examples of using it. The preferred and easiest way to get started is to use Maven2. If you are not able to use Maven2, please download the relevant jar files.

<project xmlns="http://maven.apache.org/POM/4.0.0"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
       http://maven.apache.org/maven-v4_0_0.xsd">

<modelVersion>4.0.0</modelVersion>
<groupId>se.msc</groupId>
<artifactId>hibernatesearch</artifactId>
<version>0.0.1-SNAPSHOT</version>
<repositories>
 <repository>
  <id>repository.jboss.org</id>
  <name>JBoss Maven Repository</name>
  <url>http://repository.jboss.org/maven2</url>
  <layout>default</layout>
 </repository>
</repositories>
<dependencies>
 <!-- For Test -->
 <dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.5</version>
  <scope>test</scope>
 </dependency>
 <!-- Hibernate Core -->
 <dependency>
  <groupId>org.hibernate</groupId>
  <artifactId>hibernate-core</artifactId>
  <version>3.3.1.GA</version>
 </dependency>
 <!-- Hibernate Annotation -->
 <dependency>
  <groupId>org.hibernate</groupId>
  <artifactId>hibernate-annotations</artifactId>
  <version>3.4.0.GA</version>
 </dependency>
 <!-- Hibernate Annotation uses SLF4J -->
 <dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <version>1.5.2</version>
 </dependency> 
 <!-- Hibernate EntityManager -->
 <dependency>
  <groupId>org.hibernate</groupId>
  <artifactId>hibernate-entitymanager</artifactId>
  <version>3.4.0.GA</version>
 </dependency>
 <!-- Hibernate Validator
 <dependency>
  <groupId>org.hibernate</groupId>
  <artifactId>hibernate-validator</artifactId>
  <version>3.0.0.ga</version>
 </dependency> --> 
 <!-- Hibernate Search -->
 <dependency>
  <groupId>org.hibernate</groupId>
  <artifactId>hibernate-search</artifactId>
  <version>3.1.0.GA</version>
 </dependency>
 <!-- Hibernate Search 3part Lib -->
 <!-- Solr's Analyzer Framework -->
 <dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-common</artifactId>
  <version>1.3.0</version>
 </dependency>
 <dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-core</artifactId>
  <version>1.3.0</version>
 </dependency>
 <dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-snowball</artifactId>
  <version>2.4.0</version>
 </dependency>
 <!-- MySQL JDBC connector -->
 <dependency>
  <groupId>mysql</groupId>
  <artifactId>mysql-connector-java</artifactId>
  <version>5.1.6</version>
 </dependency>
</dependencies>
<build>
 <plugins>
  <plugin>
   <groupId>org.apache.maven.plugins</groupId>
   <artifactId>maven-compiler-plugin</artifactId>
   <configuration>
    <source>1.6</source>
    <target>1.6</target>
   </configuration>
  </plugin>
 </plugins>
</build>
</project>

In this example I will be using Hibernate Core, Hibernate Annotations and Hibernate Search. I will not use Hibernate EntityManager, but using EntityManager instead of Session is an easy thing to do and will not impact the concept or code significantly. Let's start by configuring Hibernate Core with hibernate.cfg.xml.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE hibernate-configuration PUBLIC
   "-//Hibernate/Hibernate Configuration DTD 3.0//EN"
   "http://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd">

<hibernate-configuration>
<session-factory>
 <property name="hibernate.dialect">org.hibernate.dialect.MySQLInnoDBDialect</property>
 <property name="hibernate.connection.driver_class">com.mysql.jdbc.Driver</property>
 <property name="hibernate.connection.url">
  jdbc:mysql://localhost:3306/test?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=utf-8
 </property>
 <property name="hibernate.connection.username">root</property>
 <property name="hibernate.connection.password"></property>
 <property name="hibernate.hbm2ddl.auto">create-drop</property> <!-- update -->

 <!-- Hibernate Search -->
 <!-- org.hibernate.search.store.FSDirectoryProvider -->
 <!-- org.hibernate.search.store.RAMDirectoryProvider for test -->
 <property name="hibernate.search.default.directory_provider">
  org.hibernate.search.store.RAMDirectoryProvider
 </property>
 <property name="hibernate.search.default.indexBase">
  /home/magnus/tmp/lucene/indexes
 </property>  

 <!-- Mapped classes -->
 <mapping class="se.msc.hibernatesearch.domain.Person" />
</session-factory>
</hibernate-configuration>

Hibernate Search comes with sensible default values, and there are actually only two values that need configuring: the directory provider and the base directory of the index files. In this example I will be using JUnit as the start class, and since unit tests are run over and over again I will, for consistency, use an in-memory index in combination with dropping and creating the database schema. This way I will always get a clean start whenever I restart the JUnit test. To make our example complete, here is the log4j.properties file.

# Root logger option
log4j.rootLogger=INFO, stdout

# Log native SQL
log4j.logger.org.hibernate.SQL=debug
log4j.logger.org.hibernate.bind=debug

# Direct log messages to stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ABSOLUTE} %5p %c{1}:%L - %m%n

To be able to use Hibernate from JUnit we need the popular HibernateUtil helper class.

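import org.hibernate.HibernateException;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.AnnotationConfiguration;

// Builds one SessionFactory from hibernate.cfg.xml and hands out sessions.
public class HibernateUtil {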

private static final SessionFactory sessionFactory;

static {
 try {
  AnnotationConfiguration conf = new AnnotationConfiguration();
  sessionFactory = conf.configure().buildSessionFactory();
 } catch (Throwable ex) {
  throw new ExceptionInInitializerError(ex);
 }
}

public static Session getSession() throws HibernateException {
 return sessionFactory.openSession();
}
}

Now let's start annotating our domain class. For simplicity I will only use one class here.

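import java.io.Serializable;
import java.util.Date;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;
import org.hibernate.search.annotations.DateBridge;
import org.hibernate.search.annotations.DocumentId;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Index;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.Resolution;
import org.hibernate.search.annotations.Store;

// StringUtil and DateUtil are helper classes that are not shown in this post.
@Entity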
@Table(name = "PERSON")
@Indexed
public class Person implements Serializable {

private static final long serialVersionUID = 1L;

@Id
@Column(name = "PERSON_ID", updatable = false)
@GeneratedValue(strategy = GenerationType.IDENTITY)
@DocumentId
private Long id = null;

@Column(name = "FIRSTNAME", nullable = false, length = 250)
@Field(index = Index.TOKENIZED, store = Store.YES)
private String firstname = "";

@Column(name = "BIRTHDATE", nullable = false)
@Field(index = Index.UN_TOKENIZED, store = Store.YES)
@DateBridge(resolution = Resolution.DAY)
private Date birthdate = new Date();

public Person() {
}

public Person(String firstname, String birthdate)
  throws IllegalArgumentException {
 setFirstnameFromInput(firstname);
 setBirthdateFromInput(birthdate);
}

public Long getId() {
 return id;
}

protected void setId(Long id) {
 this.id = id;
}

public String getFirstname() {
 return firstname;
}

public void setFirstname(String firstname) {
 this.firstname = StringUtil.setEmptyStringAsNull(firstname);
}

public void setFirstnameFromInput(String firstname) {
 this.firstname = StringUtil.setEmptyStringAsNullAndTrim(firstname);
}

public Date getBirthdate() {
 return birthdate;
}

public void setBirthdate(Date birthdate) {
 this.birthdate = DateUtil.setTodayAsNull(birthdate);
}

public void setBirthdateFromInput(String birthdate)
  throws IllegalArgumentException {
 this.birthdate = DateUtil.setTodayAsNullAndParse(birthdate);
}

public String toString() {
 return "Person {" + "id=" + id + ", firstname='" + firstname
   + "', birthdate='" + DateUtil.format(birthdate) + "'}";
}
}

There is not much to it. We use @Indexed to mark the class as searchable, @DocumentId for the primary key, @Field for simple properties and @DateBridge for properties that need transformation; remember that Lucene only works with strings. All Hibernate Search annotations are documented in the org.hibernate.search.annotations package. Only two attributes of @Field need further explanation:

index (Index.TOKENIZED | Index.UN_TOKENIZED). Tokenized splits the text into words (tokens) and removes insignificant words. Untokenized leaves the text unchanged.

store (Store.YES | Store.NO). Both options index the field, but Store.YES also writes the field value to the Lucene index file and makes it visible in Luke. The main difference is that you can then use projection, which means you can avoid touching the database at all, and that is the benefit we are looking for when writing a high-speed search application. The main drawback of projection is that raw Object arrays holding the stored field values are returned instead of domain object graphs. Using Store.YES should be the preferred way whenever you want high performance; if you need to manipulate the object further, simply do a database round trip and fetch the persisted domain object via the primary key. Another drawback of projection is that you can only index simple properties and one-to-one (embedded) objects, but not to-many relations. This is due to the difference in object model between Lucene and Hibernate.

public class HibernateTemplate {

public Object execute(HibernateCallback action) throws HibernateException {
 Session session = null;
 Transaction tx = null;
 Object object = null;
 try {
  session = HibernateUtil.getSession();
  tx = session.getTransaction();
  tx.begin();
  object = action.execute(session);
  tx.commit();
 } catch (HibernateException e) {
  if (tx != null && tx.isActive())
   tx.rollback();
  throw e;
 } finally {
  if (session != null)
   session.close();
 }
 return object;
}

public void saveOrUpdate(final Object entity) throws HibernateException {
 execute(new HibernateCallback() {

  @Override
  public Object execute(Session session) throws HibernateException {
   session.saveOrUpdate(entity);
   return null;
  }
 });
}

public List find(final String query) throws HibernateException {
 return (List) execute(new HibernateCallback() {

  @Override
  public Object execute(Session session) throws HibernateException {
   return session.createQuery(query).list();
  }
 });
}

public List findWithFullText(String query, String field,
  final Class entity) throws HibernateException, ParseException {

 QueryParser parser = new QueryParser(field, new StandardAnalyzer());
 final org.apache.lucene.search.Query lucQuery = parser.parse(query);

 return (List) execute(new HibernateCallback() {

  @Override
  public Object execute(Session session) throws HibernateException {
   FullTextSession ftSess = Search.getFullTextSession(session);
   return ftSess.createFullTextQuery(lucQuery, entity).list();
  }
 });

}

public List findWithFullTextAndProjection(String query, String field,
  final Class entity) throws HibernateException, ParseException {

 QueryParser parser = new QueryParser(field, new StandardAnalyzer());
 final org.apache.lucene.search.Query lucQuery = parser.parse(query);

 return (List) execute(new HibernateCallback() {

  @Override
  public Object execute(Session session) throws HibernateException {
   FullTextSession fTS = Search.getFullTextSession(session);
   FullTextQuery fTQ = fTS.createFullTextQuery(lucQuery, entity);
   fTQ.setProjection("id", "firstname", "birthdate");
   return fTQ.list();
  }
 });

}
}

public interface HibernateCallback {

public Object execute(Session session) throws HibernateException;
}

public class SessionTest {

private static final Person[] INIT_DATA = new Person[] {
  new Person("Magnus", "1974-01-01"),
  new Person("Bertil", "1973-02-02"),
  new Person("Klara", "1972-03-03") };

private static final String FIELD = "firstname";

private static final Class ENTITY = Person.class;

HibernateTemplate temp = new HibernateTemplate();

private void printPersonResult(Person[] persons) {
 System.out.println("Number of hits: " + persons.length);
 for (Person person : persons) {
  System.out.println(person);
 }
}

private void printObjectResult(List res) {
 System.out.println("Number of hits: " + res.size());
 for (Object row : res) {
  Object[] objects = (Object[]) row; 
  for (Object o : objects)
   System.out.println(o);
 }
}

@Test
public void testSaveAll() throws Exception {
 for (Person person : INIT_DATA)
  temp.saveOrUpdate(person);
}

@Test
@SuppressWarnings("unchecked")
public void testFindAll() throws Exception {
 // Typed list: toArray(new Person[0]) on a raw List returns Object[].
 List<Person> res = temp.find("from Person");
 Assert.assertEquals("Testing find all.", 3, res.size());
 printPersonResult(res.toArray(new Person[0]));
}

@Test
@SuppressWarnings("unchecked")
public void testFullText() throws Exception {
 String query = "firstname:Magnus";
 List<Person> res = temp.findWithFullText(query, FIELD, ENTITY);
 Assert.assertEquals("Testing firstname search.", 1, res.size());
 printPersonResult(res.toArray(new Person[0]));
}

@Test
@SuppressWarnings("unchecked")
public void testFullText2() throws Exception {
 String query = "birthdate:19720303";
 List<Person> res = temp.findWithFullText(query, FIELD, ENTITY);
 Assert.assertEquals("Testing birthdate search.", 1, res.size());
 printPersonResult(res.toArray(new Person[0]));
}

@Test
public void testFullTextProjection() throws Exception {
 String query = "firstname:Magnus OR birthdate:19730202";
 List res = temp.findWithFullTextAndProjection(query, FIELD, ENTITY);
 Assert.assertEquals("Testing search projection.", 2, res.size());
 printObjectResult(res);
}
}

February 20, 2009

Hibernate Search

In my last project I worked with Apache Lucene doing full-text/free-text search. I was quite impressed by the Lucene library, by what it was capable of and by the speed at which it executed searches. After that I was thrilled to look at the Hibernate Search project, which unites the popular ORM library Hibernate with Apache Lucene, and this is what I concluded.

Using the Database Free-Text Capability
Full-text search is not new, and several popular databases such as Oracle DB, Microsoft SQL Server and MySQL already implement it, but there are problems with this approach:

You cannot use HQL but must use native SQL, i.e. your solution will not be portable.

But the greatest problem is scalability. Most tiers of a server solution can easily be clustered, but the database is normally not deployed that way, since its primary task is to uphold atomicity. Normally you mirror a database for fail-over, but you do not cluster it. And since full-text search can be very CPU and memory intensive, running it directly against the database is not the best way.

SQL shortcoming
But there are also other problems that SQL addresses poorly.

First, when searching a text one is not interested in all the "glue" words, e.g. a, the, over, under, but merely in nouns and verbs. The same goes for the query. This kind of analysis is not part of SQL, where a match depends on all the words the query contains, in the same order.
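As a rough illustration of that analysis step, the sketch below splits a text into lower-case words and drops a few glue words. It is plain Java and not Lucene's analyzer; the class name SimpleAnalyzer and the tiny stop-word list are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of tokenizing and stop-word removal (not Lucene code).
final class SimpleAnalyzer {

    private static final Set<String> STOP_WORDS = new HashSet<String>(
            Arrays.asList("a", "the", "over", "under"));

    // Splits on anything that is not a letter or digit, lower-cases each
    // token and skips empty tokens and stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

For example, analyze("The fox jumps over a fence") yields the tokens fox, jumps and fence. Applying the same method to both the indexed text and the query is what makes the two comparable.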

Another important feature of a rich text-search library is the handling of words with the same root and meaning, e.g. save, saving, saved. A good text-search library should take this into account.

To be appreciated, a search library should also cope with typos; it should take a more phonetic approach.

Last, but not least, is returning search results sorted by relevance. Relevance is often defined as:

If a query contains multiple words, results that more closely match the word order of the query should rank higher.
If a query contains multiple words, results in which the words match most frequently should rank higher.
If the query contains typos, the closer the resemblance, the higher the rank.

When NOT to use Hibernate Search
Even if Lucene is great, there are times when you do not want to use it. These are the cases when you want to search on a specific column, e.g. a date or integer column, or when you want to do a wildcard search for a specific word; then you are better off with SQL queries. This is of course natural when you think about it, since these are not free-text fields, merely single-value columns.

Apache Lucene promises all of this, so why Hibernate Search; what does it offer? The mismatch between Lucene and Hibernate is twofold:
Hibernate uses a structured domain model with associations, etc., whereas Lucene stores its index in a flat structure.
How to handle synchronization between Hibernate's ACID CRUD operations and the Lucene index. Once the database is updated, one expects the index to be updated as well.

February 19, 2009

Remove Gnome Globalmenu

I have been playing around with making Ubuntu look like Mac OSX, and when doing so I installed gnome2-globalmenu. Before the latest version there was no Synaptic package available, so one was left with compiling and installing the applet by hand. This worked, and there are several guides written to help you, but the problem arises when you want to uninstall it. I searched the web but found only one forum discussion, in German, on how to completely uninstall the Gnome global menu.

First remove the applet and the environment variable GTK_MODULES.

After doing this one would normally think you are done, but NO, you are not. Every time you start an application you get this error message:

Gtk-Message: Failed to load module "globalmenu-gnome": 
libglobalmenu-gnome.so: cannot open shared object file: No such file or 
directory

To get rid of this, do the following:

Press Alt+F2 and run gconf-editor.
Disable globalmenu-gnome

February 14, 2009

Make Ubuntu 8.10 Intrepid Look Like Mac OSX

There are numerous sites describing how to make Ubuntu look like Mac OSX, but most sites that I have come across are quite buggy. Even the home page for the Google Global Menu is not correct. But yesterday I came across a really good web page that works and, maybe even more important, explains how to uninstall the theme: http://maketecheasier.com/turn-your-ubuntu-intrepid-into-mac-osx-leopard/2009/01/08

And if you are annoyed with the close, minimize and maximize (metacity) buttons, you can change them as follows.

Press Alt+F2 and run gconf-editor.

Edit the key: apps → metacity → general

Set the button_layout to: menu:close,minimize,maximize

One last thing that is often forgotten is that Mac4Lin comes with themes and plugins for Firefox and Thunderbird. These packages are located under Mozilla in the Mac4Lin directory.

February 11, 2009

Java Concurrency: How to Share Data Between Threads.

One of the challenging things about writing a thread-safe application is sharing data between different threads. The problem is twofold:

Visibility, i.e. a reading thread must see the latest value that another thread has written.
Atomicity, e.g. one thread incrementing a shared class variable must be consistent with another thread setting the same variable to a new value.

The simplest case is when a class variable does not depend on its previous state/value. A typical example is a boolean value that is set explicitly and does not make use of its previous value.

public class Worker {

private volatile boolean working = false;

private long workCount = 0L;

public boolean isWorking() {
return working;
}

public void startWorking() {
working = true;
work();
}

public void stopWorking() {
working = false;
}

private void work() {
while (working) {
 ++workCount;
}
}
}

In this example we do not use synchronization; instead we use volatile. What volatile does is guarantee that a write to the variable by one thread is visible to every other thread that subsequently reads it, instead of each thread working on its own cached copy (e.g. in a register). What is important not to forget about volatile is that it does not uphold atomicity, so if the class variable depended on its previous state/value, this implementation would not be thread safe. Instead one should use one of the classes in the java.util.concurrent.atomic package.
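A minimal sketch of that last point, assuming a counter shared by several threads. The class name HitCounter and its methods are made up for the example; AtomicLong and incrementAndGet are part of java.util.concurrent.atomic.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a counter whose new value depends on its old value.
final class HitCounter {

    private final AtomicLong hits = new AtomicLong(0L);

    // Read-modify-write as one atomic step, so no increment is lost.
    long hit() {
        return hits.incrementAndGet();
    }

    long current() {
        return hits.get();
    }

    // Starts the given number of threads, each incrementing the counter
    // perThread times, waits for them and returns the final count.
    static long countConcurrently(int threads, final int perThread) {
        final HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    for (int n = 0; n < perThread; n++) {
                        counter.hit();
                    }
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.current();
    }
}
```

With a plain long and ++ the same run could lose updates; with AtomicLong, countConcurrently(4, 10000) returns 40000.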