At work, I use Node.js and find it to be quite useful. If I had to describe Node.js in one word, I'd say "interesting" (which is not a purely positive adjective). The community is thriving and growing. Despite its quirks, JavaScript can be a fantastic language. And you'll find yourself rethinking "best practice" and the patterns of well-structured code on a daily basis. There's a lot of energy in Node.js right now, and working with it exposes you to a lot of new thinking - great mental weightlifting.
It is possible to use Node.js in production, but it is far from the "turn-key" deployment the documentation seems to promise. Although "cluster" has been integrated into Node.js v0.6.x, providing one of the platform's basic building blocks, my "production.js" script still has 150 lines of logic to handle things like creating the log directory, recycling dead workers, and so on. For a "real" production service, you also need to throttle incoming connections and do all the things that Apache does for PHP. To be fair, Ruby on Rails has the same problem. It is solved via two complementary mechanisms: 1) Putting Ruby on Rails / Node.js behind a dedicated webserver like Nginx (or Apache / Lighttpd), written in C and battle-tested. The webserver can serve static content, do access logging, rewrite URLs, terminate SSL, enforce access rules, and manage multiple sub-services, proxying requests for dynamic content to the actual Node.js service. 2) Using a framework like Unicorn to manage the worker processes, recycle them periodically, and so on. I have yet to find a fully-baked Node.js serving framework; it may exist, but I haven't found it yet, and my hand-rolled "production.js" still has 150 lines.
According to Express, the standard practice is to serve everything through a single jack-of-all-trades Node.js service... "app.use(express.static(__dirname + '/public'))". That's probably fine for low-load services and development. But as soon as you try to put serious load on your service and have it run 24/7, you'll quickly see why big sites use well-tested, hardened C code like Nginx to front their site and handle all of the static content requests (unless you set up a CDN like Amazon CloudFront). See this guy for a somewhat humorous and unabashedly negative take on it.
Node.js is also finding its way into non-service applications. Even if you aren't serving web content with Node.js, you can use npm modules to structure your code, Browserify to stitch it together into a single deployable file, and uglify-js to minify it for deployment. JavaScript is a fantastic impedance match for working with the web, and it is often the easiest line of attack. For example, if you want to dig through a bunch of JSON response payloads, use my underscore-CLI module, the utility-belt of structured data.
Advantages and disadvantages:
Pro: For a server guy, writing JavaScript on the backend has been a "gateway drug" to learning modern UI patterns. Writing client code is no longer a chore for me.
Pro: It encourages thorough error checking (virtually every callback takes err as its first argument, nagging the programmer to handle it; also, async.js and other libraries handle the "fail if any of these subtasks fails" paradigm much better than typical synchronous code does - see the sketch after this list)
Pro: Getting status on tasks in flight, communicating between workers, and sharing cache state are just a few of the interesting and normally difficult tasks that become straightforward.
Pro: A large community and a large number of excellent libraries built on a reliable package manager (npm)
Con: There is no standard library for JavaScript. You get so used to importing functionality that it feels weird when you use JSON.parse or some other built-in method that doesn't require an npm module. This means that everything has five variants. Even the modules in the Node.js "core" have five more variants if you aren't happy with the default implementation. This leads to rapid evolution, but also considerable confusion.
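As promised above, here is a minimal sketch of the error-first callback convention (the file path and handler body are just illustrative):

var fs = require('fs');

// Node convention: the callback's first argument is always the error (or null).
fs.readFile('/tmp/config.json', function (err, data) {
  if (err) {
    // Forgetting this branch is hard; the err parameter stares you in the face.
    return console.error('read failed:', err);
  }
  console.log('loaded ' + data.length + ' bytes');
});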
Versus a simple one-process-per-request model (LAMP):
Pro: Thousands of active connections are possible. Very fast and very efficient. For a web fleet, this could mean a 10X reduction in the number of boxes needed compared to PHP or Ruby.
Pro: It's simple to create parallel patterns. Consider the situation where you need to fetch three (or N) blobs from Memcached. Do it in PHP... did you just write code that fetches the first blob, then the second, then the third? Wow, that's slow. There's a special PECL module for Memcached that fixes this specific problem, but what if you want to fetch your Memcached data in parallel with your database query? Because Node.js is asynchronous, having a web request do multiple things in parallel comes naturally (see the sketch after this list).
Con: Asynchronous code is intrinsically more complex than synchronous code, and the initial learning curve can be hard for developers without a firm grasp of what concurrent execution actually means. Still, it's a lot easier than writing any kind of multithreaded code that requires locking.
Con: If a compute-intensive request takes, say, 100 milliseconds to complete, it will stall the processing of other requests being handled in the same Node.js process... a.k.a. cooperative multitasking. The Web Workers pattern (spinning off a subprocess to deal with the expensive task) can help with this. Alternatively, you could use a large number of Node.js workers and let each one handle only a single request concurrently (still fairly efficient because there is no process recycle).
Con: Managing a production system is far more difficult than with a CGI-style approach such as Apache + PHP, Perl, Ruby, and so on. Unhandled exceptions crash the entire process, requiring logic to restart failed workers (see cluster). Bugs in native code modules can bring down the process. Any requests being handled by a worker are lost when it dies, so one flawed API can easily degrade service for all cohosted APIs.
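As promised above, here is a minimal sketch of parallel fetches using the Async library (memcached, db, handleError, and render are hypothetical placeholders for your real clients and handlers):

var async = require('async');

// All three tasks are kicked off at once; each uses an error-first callback.
async.parallel({
  user:  function (cb) { memcached.get('user:42', cb); },
  posts: function (cb) { memcached.get('posts:42', cb); },
  stats: function (cb) { db.query('SELECT COUNT(*) FROM hits', cb); }
}, function (err, results) {
  // Fires once all three complete (or as soon as any one fails).
  if (err) return handleError(err);
  render(results.user, results.posts, results.stats);
});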
Compared to developing a "real" service in Java, C#, or C (really, C?)
Pro: Asynchronous programming in Node.js is both easier and more forgiving than thread-safe programming in other languages. It is by far the most painless asynchronous paradigm I have ever worked in. With the right libraries, it is only slightly harder than writing synchronous programs.
Pro: There are no multithreading or locking issues. True, you pay up front by writing more verbose code that expresses an asynchronous workflow with no blocking operations. And you'll need to write some tests and get the thing running (it is a scripting language, and fat-fingered variable names are only caught at unit-test time). BUT, once you get it working, the surface area for heisenbugs (weird problems that only emerge once in a million runs) is dramatically smaller. The taxes of writing Node.js code are heavily front-loaded in the development process. Then you tend to end up with stable code.
Pro: When it comes to expressing functionality, JavaScript is far more lightweight. JSON, dynamic typing, lambda notation, prototypal inheritance, lightweight modules, etc... it just takes less code to express the same ideas.
Con: Perhaps you adore writing Java code for services?
Check out From Java to Node.js, a blog entry about a Java developer's impressions and experiences with Node.js, for a different viewpoint on JavaScript and Node.js.
Modules:
When considering Node, keep in mind that your choice of JavaScript libraries will DEFINE your experience. Most people use at least two: an asynchronous pattern helper (Step, Futures, Async) and a JavaScript sugar module (Underscore.js).
Helper / JavaScript Sugar:
Underscore.js - use this. Just do it. It makes your code nice and readable with stuff like _.isString(), and _.isArray(). I'm not really sure how you could write safe code otherwise. Also, for enhanced command-line-fu, check out my own Underscore-CLI.
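For instance, here is a tiny sketch of the kind of defensive checks Underscore makes readable (the greet function is just a made-up illustration):

var _ = require('underscore');

function greet(names) {
  if (_.isString(names)) names = [names];  // accept a single name too
  if (!_.isArray(names)) throw new TypeError('names must be a string or an array');
  return _.map(names, function (n) { return 'Hello, ' + n; });
}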
Asynchronous Pattern Modules:
Step - a very elegant way to express combinations of serial and parallel actions. My personal recommendation (see the sketch at the end of this section). See my post on what Step code looks like.
Futures - a much more flexible (is that really a good thing?) way to express ordering through requirements. Can express things like "start a, b, c in parallel. When A and B finish, start AB. When A and C finish, start AC." Such flexibility requires more care to avoid bugs in your workflow (like never calling the callback, or calling it multiple times). See Raynos's post on using futures (this is the post that made me "get" futures).
Async - a more traditional library with one method for each pattern. I started with this before my religious conversion to Step and the subsequent realization that every pattern in Async could be expressed in Step with a single, more readable paradigm.
TameJS - Written by OKCupid, it's a precompiler that adds a new language primitive, "await", for elegantly writing serial and parallel workflows. The pattern looks amazing, but it does require precompilation. I'm still making up my mind on this one.
StreamlineJS - competitor to TameJS. I'm leaning toward Tame, but you can make up your own mind.
Or to read all about the asynchronous libraries, see this panel-interview with the authors.
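As promised, here is a minimal sketch of what Step code looks like, in the spirit of Step's README (it reads this very file and transforms it; illustrative, not production code):

var Step = require('step');
var fs = require('fs');

Step(
  function readSelf() {
    fs.readFile(__filename, this);         // `this` is the callback for the next step
  },
  function capitalize(err, text) {
    if (err) throw err;                    // thrown errors propagate to the next step
    return text.toString().toUpperCase();  // return values pass along synchronously
  },
  function showIt(err, newText) {
    if (err) throw err;
    console.log(newText);
  }
);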
Web Framework:
Express - A great Ruby on Rails-esque framework for organizing web sites. It uses Jade as an HTML templating engine, which makes building HTML far less painful, almost elegant even (a minimal sketch follows these items).
jQuery - While not technically a Node module, jQuery is quickly becoming a de facto standard for client-side user interfaces. jQuery provides CSS-like selectors to 'query' for sets of DOM elements that can then be operated on (set handlers, properties, styles, etc). In the same vein: Twitter's Bootstrap CSS framework, Backbone.js for an MVC pattern, and Browserify.js to stitch all your JavaScript files into a single file. These modules are all becoming de facto standards, so you should at least check them out if you haven't heard of them.
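As promised above, a minimal Express sketch (the /status route and port are made up; express.createServer() is the Express 2.x constructor, while Express 3+ uses express() instead):

var express = require('express');
var app = express.createServer();  // Express 2.x style; Express 3+ uses express()

// Serve static files, as quoted earlier in this post.
app.use(express.static(__dirname + '/public'));

app.get('/status', function (req, res) {
  res.send({ ok: true, uptime: process.uptime() });  // objects are sent as JSON
});

app.listen(3000);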
Testing:
JSHint - Must use; I didn't use this at first, which now seems incomprehensible. JSHint adds back a bunch of the basic verifications you get with a compiled language like Java: mismatched parentheses, undeclared variables, typos of many shapes and sizes. You can also turn on various forms of what I call "anal mode", where you verify style of whitespace and whatnot, which is fine if that's your cup of tea -- but the real value comes from getting instant feedback on the exact line number where you forgot a closing ")" ... without having to run your code and hit the offending line. "JSHint" is a more-configurable variant of Douglas Crockford's JSLint.
Mocha - A competitor to Vows, which I'm starting to prefer. Both frameworks handle the basics well enough, but complex patterns tend to be easier to express in Mocha (see the sketch after this list).
Vows - Vows is really quite elegant. And it prints out a lovely report (--spec) showing you which test cases passed / failed. Spend 30 minutes learning it, and you can create basic tests for your modules with minimal effort.
Zombie - Headless testing for HTML and JavaScript using JSDom as a virtual "browser". Very powerful stuff. Combine it with Replay to get lightning fast deterministic tests of in-browser code.
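As promised above, a minimal Mocha sketch (lib/math.js and its add() function are hypothetical; note that the first case is just the "nothing breaks" test advocated below):

// test/math.test.js -- run with `mocha`
var assert = require('assert');
var math = require('../lib/math');  // hypothetical module under test

describe('math.add', function () {
  it('nothing breaks', function () {
    math.add(1, 2);                 // just prove it runs without throwing
  });

  it('adds two numbers', function () {
    assert.equal(math.add(1, 2), 3);
  });
});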
A comment on how to "think about" testing:
Testing is non-optional. With a dynamic language like JavaScript, there are very few static checks. For example, passing two parameters to a method that expects four won't break until the code is executed. That's a pretty low bar for creating bugs in JavaScript. Basic tests are essential to closing the verification gap with compiled languages.
Forget validation, just make your code execute. For every method, my first validation case is "nothing breaks", and that's the case that fires most often. Proving that your code runs without throwing catches 80% of the bugs and will do so much to improve your code confidence that you'll find yourself going back and adding the nuanced validation cases you skipped.
Start small and break the inertial barrier. We are all lazy, and pressed for time, and it's easy to see testing as "extra work". So start small. Write test case 0 - load your module and report success. If you force yourself to do just this much, then the inertial barrier to testing is broken. That's <30 min to do it your first time, including reading the documentation. Now write test case 1 - call one of your methods and verify "nothing breaks", that is, that you don't get an error back. Test case 1 should take you less than one minute. With the inertia gone, it becomes easy to incrementally expand your test coverage.
Now evolve your tests with your code. Don't get intimidated by what the "correct" end-to-end test would look like with mock servers and all that. Code starts simple and evolves to handle new cases; tests should too. As you add new cases and new complexity to your code, add test cases to exercise the new code. As you find bugs, add verifications and / or new cases to cover the flawed code. When you are debugging and lose confidence in a piece of code, go back and add tests to prove that it is doing what you think it is. Capture strings of example data (from other services you call, websites you scrape, whatever) and feed them to your parsing code. A few cases here, improved validation there, and you will end up with highly reliable code.
Also, check out the official list of recommended Node.js modules. However, GitHub's Node Modules Wiki is much more complete and a good resource.
To understand Node, it's helpful to consider a few of the key design choices:
Node.js is EVENT BASED and ASYNCHRONOUS / NON-BLOCKING. Events, like an incoming HTTP connection, fire off a JavaScript function that does a little bit of work and kicks off other asynchronous tasks, like connecting to a database or pulling content from another server. Once these tasks have been kicked off, the event function finishes and Node.js goes back to sleep. As soon as something else happens, like the database connection being established or the external server responding with content, the callback functions fire, and more JavaScript code executes, potentially kicking off even more asynchronous tasks (like a database query). In this way, Node.js happily interleaves activities for multiple parallel workflows, running whatever activities are unblocked at any point in time. This is why Node.js does such a great job managing thousands of simultaneous connections.
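Here is a minimal sketch of that flow (the file name and port are arbitrary):

var fs = require('fs');
var http = require('http');

http.createServer(function (req, res) {
  // Kick off an asynchronous task; the handler returns immediately
  // and Node.js goes back to the event loop.
  fs.readFile('/tmp/hits.txt', function (err, data) {
    // Fires later, once the read completes; meanwhile other
    // connections were being serviced.
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end(err ? 'no data yet' : data.toString());
  });
}).listen(8000);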
Why not just use one process/thread per connection like everyone else? In Node.js, a new connection is just a very small heap allocation. Spinning up a new process takes significantly more memory, a megabyte on some platforms. But the real cost is the overhead associated with context-switching. When you have 10^6 kernel threads, the kernel has to do a lot of work figuring out who should execute next. A bunch of work has gone into building an O(1) scheduler for Linux, but in the end, it's just way way more efficient to have a single event-driven process than 10^6 processes competing for CPU time. Also, under overload conditions, the multi-process model behaves very poorly, starving critical administration and management services, especially SSHD (meaning you can't even log into the box to figure out how screwed it really is).
Node.js is SINGLE THREADED and LOCK FREE. Node.js, as a very deliberate design choice, has only a single thread per process. Because of this, it's fundamentally impossible for multiple threads to access data simultaneously. Thus, no locks are needed. Threads are hard. Really, really hard. If you don't believe that, you haven't done enough threaded programming. Getting locking right is hard and results in bugs that are really hard to track down. Eliminating locks and multithreading makes one of the nastiest classes of bugs just go away. This might be the single biggest advantage of Node.
But how do I take advantage of my 16 core box?
Two ways:
For big heavy compute tasks like image encoding, Node.js can fire up child processes or send messages to additional worker processes. In this design, you'd have one thread managing the flow of events and N processes doing heavy compute tasks and chewing up the other 15 CPUs.
For scaling throughput on a webservice, you should run multiple Node.js servers on one box, one per core, using cluster (With Node.js v0.6.x, the official "cluster" module linked here replaces the learnboost version which has a different API). These local Node.js servers can then compete on a socket to accept new connections, balancing load across them. Once a connection is accepted, it becomes tightly bound to a single one of these shared processes. In theory, this sounds bad, but in practice it works quite well and allows you to avoid the headache of writing thread-safe code. Also, this means that Node.js gets excellent CPU cache affinity, more effectively using memory bandwidth.
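Here is a minimal sketch of the second approach using the v0.6 cluster module (the port is arbitrary; for the first approach you would fork compute workers with child_process instead):

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork one worker per core; they all share the listening socket.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  // Recycle dead workers ('death' is the v0.6 event name; later versions renamed it 'exit').
  cluster.on('death', function (worker) {
    console.log('worker ' + worker.pid + ' died, restarting');
    cluster.fork();
  });
} else {
  http.createServer(function (req, res) {
    res.writeHead(200);
    res.end('handled by pid ' + process.pid + '\n');
  }).listen(8000);
}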
Node.js lets you do some really powerful things without breaking a sweat. Suppose you have a Node.js program that does a variety of tasks, listens on a TCP port for commands, encodes some images, whatever. With five lines of code, you can add in an HTTP based web management portal that shows the current status of active tasks. This is EASY to do:
var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end(myJavascriptObject.getSomeStatusInfo());
}).listen(1337, "127.0.0.1");
Now you can hit a URL and check the status of your running process. Add a few buttons, and you have a "management portal". If you have a running Perl / Python / Ruby script, just "throwing in a management portal" isn't exactly simple.
But isn't JavaScript slow / bad / evil / spawn-of-the-devil?
JavaScript has some weird oddities, but with "the good parts" there's a very powerful language there, and in any case, JavaScript is THE language on the client (the browser). JavaScript is here to stay; other languages are targeting it as an IL (intermediate language), and world-class talent is competing to produce the most advanced JavaScript engines. Because of JavaScript's role in the browser, an enormous amount of engineering effort is being thrown at making JavaScript blazing fast. V8 is the latest and greatest JavaScript engine, at least for this month. It blows away the other scripting languages in both efficiency AND stability (looking at you, Ruby). And it's only going to get better with huge teams working on the problem at Microsoft, Google, and Mozilla, competing to build the best JavaScript engine (it's no longer a JavaScript "interpreter", as all the modern engines do tons of JIT compiling under the hood, with interpretation only as a fallback for execute-once code). Yeah, we all wish we could fix a few of the odder JavaScript language choices, but it's really not that bad. And the language is so darn flexible that you really aren't coding JavaScript, you are coding Step or jQuery -- more than in any other language, in JavaScript the libraries define the experience. To build web applications, you pretty much have to know JavaScript anyway, so coding with it on the server has a sort of skill-set synergy. It has made me not dread writing client code.
Besides, if you REALLY hate JavaScript, you can use syntactic sugar like CoffeeScript. Or anything else that creates JavaScript code, like Google Web Toolkit (GWT).
Speaking of JavaScript, what's a "closure"? - Pretty much a fancy way of saying that you retain lexically scoped variables across call chains. ;) Like this:
var myData = "foo";
database.connect('user:pass', function myCallback(result) {
  database.query("SELECT * from Foo where id = " + myData);
});
// Note that doSomethingElse() executes _BEFORE_ database.query, which runs inside a callback
doSomethingElse();
See how you can just use "myData" without doing anything awkward like stashing it in an object? And unlike in Java, the "myData" variable doesn't have to be read-only. This powerful language feature makes asynchronous programming much less verbose and much less painful.