I wish to inform: 2014

Sunday, September 28, 2014

Finagle, promises, local and dtab

Finally figured out how request context is defined in finagle. It was a bunch of things that had been puzzling me for some time and all started making sense together.

So what is a request context. To understand request context we first have to understand how computations work in finagle. The first two rules of finagle are that you do not block. So how then do you make network calls and other typically blocking operations? The answer is that you still, of course, perform all those computations but by not blocking. Each blocking computation returns a future and then you define future computations on that. This way you never block a thread, you just do your thing and leave the thread, defining the next steps when you get a chance again. In this way you can define what you want to do as a graph of these chained operations. It may sound complex but in practice it is very easy to express as monadic operations on futures. Scala even provides a convenient (for comprehension) syntax for expressing these. So you can imagine an incoming request coming in, starting a bunch of async computations that in turn spawn other computations and so on. Ultimately, a request ends up occupying chunks of execution time slices on many threads. Request Context is state that is available in all those time slices. If all these operations were happening on the same thread in a blocking manner then defining this context would be very simple, it's just any state defined on the thread, whatever variable you define is available to all subsequent operations. In an async scenario this has to be managed in a more sophisticated manner. What we basically have to do is save the request context before starting any async operation and restore it afterwards. Request Context is stored in com.twitter.util.Local.

Local is a utility class built on Java's ThreadLocal that provides operations to easily save and restore data saved in Local onto the current thread. This saving and restoring of state is exactly what happens in async operations in finagle. Async operations are typically implemented as promises. A promise is created at the start of an async operation. This promise implements the future interface and can be passed to callers who can use them to define operations pending on the execution of the underlying async operation of the promise. When the async operation finishes, the promise is marked complete. Thus a promise wraps the execution of an async operation. Finagle does an additional thing here. Before executing the async operation the local context, defined using com.twitter.util.Local, is stored away and after the async operation finishes it is restored. Thus the local context survives the async operation. This facility is extremely powerful as we'll see. We can stow away some piece of data regarding a particular request and have it available in all computations spawned by the request.

Delegations Tables or DTabs are a mechanism of defining how names map to locations where locations are network locations defined ultimately as ipaddress+port. Imagine on your desktop machine you had mounted a remote server's directory, the remote server directory in turn had mounted another server's directory, which had mounted a disk under it's directory somewhere and so on. When you want to read from a file in such a filesystem a lot of resolutions may happen behind the scenes. After some prefix of the path, when a mounted location is encountered, it would have to be mapped to a network location. This kind of a thing will go on until we find the file in question in some sector of some disk somewhere. My point is that names translate to locations. Similarly in Wily naming names translate to locations albeit the locations are host port pairs in the ip universe. There is an elaborate process of resolution to achieve that. Finagle allows overriding the resolution process at various granularities, the finest of which is an individual request. The overrides are defined by things called delegation tables, which basically define prefix substitutions. I won't go into the details of that mechanism here, that's a different topic, but talk about how overriding of these dtabs per request is achieved.

The DTab overrides for a request are basically stored in the local context, defined by com.twitter.util.Local. This dtab override then stays available through out the asynchronous life of a request and affects every network request that this request in turn spawns. Thus every downstream network request that this request triggers uses the overridden delegation table. Well that's not the end of it, for participating protocols such as Http and Thrift this overridden delegation table gets written to outgoing requests and read back on the receiving services. Thus these overrides are basically preserved across the entire network stack for the request. Is this cool or what? Calling it amazing would be an understatement, if you ask me.

Thursday, May 22, 2014

Embrace the implicit

When you first learn scala, implicits come as one of the most advanced and complicated features to grasp. Over time, you get an impression from the scala community that implicits are an advanced feature that should be used carefully. This makes a lot of sense because when you use them in your code, everybody that reads the code needs to know about implicits first of all and then how the particular implicit works in that particular context. Implicits are somewhat magical (and by magical I mean the bad sort of magic) in that sense. Usually when you read code, the context that you need to understand the code with, is around there or at least there is a direct clue to where go look. With implicits, the process to go look is complicated, because the best fit is chosen through an involved process. You have to look at all related companion objects and see if an implicit is defined there that would fit the invocation location best. For compiler, this is an easy thing to do, but for humans it’s very involved.

So well, if there is so much complexity associated with implicits then why embrace them. I’ll give you two reasons:

Implicits are used all over the place in most of the useful and popular scala libraries. Without understanding how they work it will feel like magic (not good); you may as well be working in ruby.
Implicits happen to be fundamental to scala

The first point is self explanatory. Let me expand on the second one.

A great thing about scala collections is that when you map over a collection you get back the collection of the same type or the type that fits best. E.g. if you map over list you get back a list. If you map over an array you get back an array, when you map over a map you get back a map !! This, as you would expect, is a very important property of the collections and arguably without this property the collections would be far less useful. The way this is achieved in the scala collections is through the CanBuildFrom pattern. There is a great explanation of how this works in the "Programming in Scala" book in the chapter titled "The Architecture of Scala Collections". In short, the way it works is that each collection type has an associated builder that knows how to build a collection of the type, adding one element at a time. The map function on the collection takes an implicit builder factory as input. At compile time, the builder that best fits in the context is implicitly chosen and thus the builder of the same type or the "most similar" type is chosen. The concept is explained in great detail in the paper titled Generics of a Higher Kind.

Context bounds and adhoc polymorphism in scala work using implicit conversions.

Martin Odersky, the creator of scala, even includes implicit parameters in the simple parts of Scala.

For better or for worse, implicits are hard to avoid in Scala. I feel it’s best to embrace them rather than avoid learning about them. The important rules for implicits are these:

Only constructs defined with the implicit keyword can be used in an implicit context. So, if your function takes an implicit parameter of type MyType then only a variable of type MyType that is defined marked with the keyword implicit would fit, not any other variable of that type.
val builder = new Builder[MyType]("some") implicit val implicitBuilder = new Builder[MyType]("imp") def myfunc(param1: String)(implicit builder: Builder[MyType])
Only implicitBuilder fits and will be picked up.
An inserted implicit conversion must be in scope as a single identifier, or be associated with the source or target type of the conversion.
- This is the most complex part. The scope here is not just the lexical scope but includes the companion types, and associated companion types) of the source and target as well. Source type is the object that is passed in and target type is the type that is expected in the context. The implicit conversions may be defined in the companion object of either type. Any time you see a function call it’s hard to predict by quickly reading the code what types it was passed in because the passed in type may be any that can convert to the expected type through implicit conversions, which can be defined in the imported definitions, companion objects of the source and target types. If this wasn’t enough the definitions are also searched in associated types of the source and target types where association is defined by
  - All subtypes of a type
  - All type parameters of a types
  - Special associations for Singleton types and type projections
    
    Mind boggles at the possibilities.
One-at-a-time Rule: Only one implicit is tried.

The compiler will never rewrite x + y to convert1(convert2(x)) + y (example taken directly from the book Programming in Scala)
Explicits-First Rule: Whenever code type checks as it is written, no implicits are attempted.

If the code compiles already, i.e. without needing any implicit conversions then none will be attempted. Implicit conversions are always used as a fall back.

Where implicits are tried.

implicit parameters

def map[B, That](f: (A) ⇒ B)(implicit bf: CanBuildFrom[List[A], B, That]): That

implicit conversions to expected type
implicit def intToString(i: Int): String = i.toString def hyphenate(str: String) val i = 1234 hyphenate(i)
The above compiles even though hyphenate receives an int instead of string because int gets converted to the expected type, i.e. string, automatically

Conversions of the receiver of a selection

import OptToTry._
val opt = funcThatMaybeReturnsAnInt()
opt.toTry(new Exception("No dice"))
object OptToTry {
  class TryConvertableOption[T](opt: Option[T]) {
    def toTry(e: Exception) = opt match {
      case Some(v) => Return(v)
      case None => Throw(e)
    }
  }
  implicit def optToTryConvertableOption(opt: Option[T]): TryConvertableOption[T] =
    new TryConvertableOption(opt)
}

OptToTry supplies an implict conversion from option to TryConvertableOption that supports a method to convert an option to a Try. This enables calling toTry on options to convert them to Try.

Having a good grasp of implicits is not easy but is essential to get a good grip on scala. I wish I could give you a silver bullet but there is none that I know of. Get over it and embrace the implicits. One caveat though, implicits allow you to play nice magical tricks, tricks that will be a puzzle to everyone who reads that code every time, including you. I like solving puzzles but not when I’m reading code, I hope you feel the same and keep it simple with implicits.

Tuesday, May 20, 2014

Blogging from command line using b.py

Since it took me some time to figure out how to set up b.py to start blogging from command line, I thought I'd post it out here.

pip install b.py

pip install markdown

pip install google-api-python-client

pip install python-gflags

pip install docutils

brew install source-highlight

Then make a directory where you'll keep your blogs. This directory needs to have a configuration file called brc.py

Mine looks like this(I use blogger, blogid jumbled):

brc.py

service = "b"
service_options = {
  'blog': 3865076449691263698
}

Save the above in a file called brc.py in the same folder where you want to keep your blogs.

To get your blog id run the following:

b.py blogs

This does two things:

It would ask you login to blogger in the browser and then save the creds in a file called b.dat. Make sure you doe this in the blog directory, the one that has brc.py
It lists your blogs along with the blog id. You should update the blog id in the brc.py file

Now create blog post in markdown e.g. like this:

blogpost.md

!b
service: blogger
title: My First Post
labels: blogging

My first blog from command line

The first four lines are special. !b tells b.py that headers follow. The three headers provide info about the blog post.

Make sure the following files are in the blog directory:

brc.py
b.dat
blogpost.md

Now publish the post: b.py post blogpost.md

This should have published the blog. b.py would have changed your blog file, don't be surprised. It inserts some metadata in the file, perhaps to keep track of it to update it later, I haven't tried.

If you need speed nothing beats command line. Thank the creator of b.py and enjoy!!

I wish to inform