There are many things to consider when choosing the storage media for a computer system or other electronic equipment. As we know, the storage system is an essential part of computer architecture. And as open source hardware, we must consider more than a traditional computer does, because we have to cover a broader range of applications.
As you know, we use Allwinner’s A10 and A20 as our main chips because of their low price, high performance, rich interfaces and Allwinner’s open-source-friendly attitude. We also found that the chips are well designed in terms of storage media support, so when designing the boards we tried our best to support these features. Cubieboard1, Cubieboard2 and Cubieboard3 support NAND flash, TF card, TSD and HDD, but they don’t support eMMC. eMMC is a chip with an 8-line eMMC interface. It cannot be used with the A10 and A20 because they don’t support the 8-line eMMC interface. Fortunately, the A80 does. So on the upcoming A80 platform, we will adopt eMMC to get faster read/write speeds.
Another issue everyone is concerned about is the SATA port. In my opinion, SATA is not very important at this point if eMMC is supported. The A80 board has up to 64GB of storage capacity with just one on-board eMMC chip, and its speed is not much lower than that of a SATA port. In addition, the USB 3.0 port is also a very good way to extend the storage capacity with fast access.
In the table above, I list 5 commonly used storage media for easy comparison. I think your choice will depend on your situation. Here I would like to give you some advice.
If you are a developer or are playing with the Cubieboard for fun, the Cubieboard dual-card version is very suitable. You can try the many card-based operating systems from the community at your leisure. But you should buy a TF card rated Class 10. Higher read/write speeds will give you a smoother experience and a shorter start-up time. I have seen start-up times ranging from 30s to 90s with different TF cards.
If you plan to use Cubieboards in your product, you should consider this problem carefully. If the product has no backup power supply and will probably lose power at random, you should not choose NAND flash or HDD. The code or data may be damaged if power is cut before a batch of data has finished writing. TSD and TF card can remove this serious hidden danger. TSD is a chip in a TSSOP48 package that looks like NAND flash, but it is actually a TF card that contains NAND flash and a card controller. The controller has a good firmware backup mechanism.
In other applications with a backup battery, you can choose NAND flash for its low cost. NAND flash is widely used in many electronic devices. Where possible, you can also protect the code at the software level.
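As one illustration of what protection at the software level can look like, here is a minimal Python sketch of an atomic file update: write to a temporary file, flush it to the device, then rename it over the old file. This is a common general technique, not something the Cubieboard team describes. The function name `atomic_write` is hypothetical, and the sketch assumes a POSIX file system that honours `fsync` and atomic rename. It narrows the window for a half-written file but cannot make up for a storage device that corrupts data on power loss.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes) in one atomic step.

    Hypothetical helper: a reader should see either the complete old file
    or the complete new file, never a partially written one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the rename
    # stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new contents to the device
        os.replace(tmp_path, path)  # atomic rename on POSIX file systems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    if os.name == "posix":
        # Also sync the directory entry so the rename itself is durable.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

A call such as `atomic_write("settings.bin", b"...")` would then replace the file in one step. How well this survives a real power cut depends on the file system and on the storage medium underneath it.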
An ARM chip is not designed for processing Big Data. However, it has gradually become powerful enough to do so. Many people have been trying to at least run Apache Hadoop on top of an ARM cluster. The Cubieboard team posted last August that they were able to run Hadoop on an 8-node machine. Also, Jamie Whitehorn seems to be the first person to have successfully run Hadoop on a Raspberry Pi cluster, in October 2013. Both show that an ARM cluster is able to get Hadoop up and running.
But is it feasible to seriously do Big Data on a low-cost ARM cluster? This question really bugs me.
We know that Hadoop’s MapReduce does its operations on disk. With the slow I/O and networking of these ARM SoCs, MapReduce will not really be able to process a real Big Data computation, where files average 15GB each.
Yes, the true nature of Big Data is not that big. We often do batch processing over a list of files, each of which is no larger than that size. So if a cluster is capable of processing a 15GB file, it is good enough for Big Data.
To answer this question, we have prototyped an ARM-based cluster designed to process Big Data. It’s a 22-node Cubieboard A10 cluster with 100 Mbps Ethernet. Here’s what it looks like:
The cluster running Spark and Hadoop
As we learned that Hadoop’s MapReduce is not a good choice for processing on this kind of cluster, we decided to use only HDFS, looked for an alternative, and stumbled upon Apache Spark.
Luckily for us, Spark is an in-memory framework that optionally spills intermediate results to disk when a computing node runs out of memory. So it runs fine on our cluster. Although the cluster has 20GB of RAM in total, only 10GB is available for data processing. If we try to allocate more memory, some nodes die during the computation.
So what is the result? Our cluster is good enough to crush a single 34GB Wikipedia article file from the year 2012. That is about twice the average Big Data file size mentioned above, and about 3 times the memory allocated to process data.
We simply ran a tweaked word count program in Spark’s shell, waited for 1 hour 50 minutes, and finally the cluster answered that the Wikipedia file contains 126,938,368 occurrences of the word “the”. The cluster spent around 30.5 hours in total across all nodes.
The result printed out from Spark’s shell
(Just don’t mind the date. We didn’t set the cluster date/time properly.)
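The tweaked word count program itself is not shown here, so the following is only a plain-Python sketch of the same idea, not the code that ran on the cluster. It counts one word by producing a partial count per line (the "map" step) and then adding the partial counts together (the "reduce" step). The function name `count_word` and the matching rule (case-insensitive, surrounding punctuation stripped) are assumptions, so it would not necessarily reproduce the figure reported above.

```python
from functools import reduce


def count_word(lines, word="the"):
    """Count occurrences of `word` across an iterable of text lines.

    Assumed matching rule: case-insensitive, whitespace-separated tokens,
    with common punctuation stripped from both ends of each token.
    """
    target = word.lower()
    # "Map" step: each line is turned into its own partial count.
    partial_counts = (
        sum(
            1
            for token in line.lower().split()
            if token.strip(".,;:!?\"'()") == target
        )
        for line in lines
    )
    # "Reduce" step: the partial counts are added together.
    return reduce(lambda a, b: a + b, partial_counts, 0)


if __name__ == "__main__":
    sample = ["The cat and the hat.", "Then the end; other."]
    print(count_word(sample))  # prints 3
```

On a cluster the per-line counts are computed on many nodes at once and only the small partial sums travel over the network, which is why this kind of job suits machines with little memory per node.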
Design and Observation
We have 20 Spark worker nodes, and 2 of them also run Hadoop Data Nodes. This enables us to understand the data locality of our Spark/Hadoop cluster. We run Hadoop’s Name Node and Spark’s master node on the same machine. Another machine is the driver. The file is stored on 2 SSDs connected by SATA to the Data Nodes. We set the replication factor to 2 as we have only 2 SSDs.
We have observed that the current bottleneck may be the 100 Mbps Ethernet, but we have no way to confirm this until we build a new cluster with 1 Gbps Ethernet. We have 2 power supplies attached and the cluster does not seem to consume much power. We’ll measure this in detail later. We are located in one of the warmest cities of Thailand, but we’ve found that the cluster is able to run fine here (with some additional fans). Our room is air-conditioned to 25 degrees Celsius (or 77 degrees Fahrenheit). It’s great to have a cluster running without a costly data center, right?
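The figures quoted in this article allow a rough back-of-envelope check, sketched below in Python. It rests on several assumptions that are not measurements: GB means 10^9 bytes, the "30.5 hours in total" is task time summed over all nodes, every byte leaves the 2 Data Nodes over their 100 Mbps links, and protocol overhead is ignored. The variable names are illustrative only.

```python
# Figures taken from the article.
FILE_BYTES = 34e9                  # 34GB file, assuming decimal gigabytes
WALL_SECONDS = 1 * 3600 + 50 * 60  # 1 hour 50 minutes of waiting
NODE_HOURS = 30.5                  # total time across all nodes (assumed summed)
AVERAGE_FILE_GB = 15               # average Big Data file size
MEMORY_GB = 10                     # memory available for data processing
LINK_BITS_PER_S = 100e6            # 100 Mbps Ethernet per node
DATA_NODES = 2                     # nodes that hold the file

# How the file compares with the average file and with available memory.
size_vs_average = 34 / AVERAGE_FILE_GB   # about 2.27
size_vs_memory = 34 / MEMORY_GB          # 3.4

# Throughput actually achieved over the whole job, in MB/s.
observed_mb_s = FILE_BYTES / WALL_SECONDS / 1e6

# Upper limit on what the two Data Node links can deliver, in MB/s.
aggregate_mb_s = LINK_BITS_PER_S / 8 / 1e6 * DATA_NODES

# Shortest possible time just to ship the file over those links, in minutes.
min_transfer_minutes = FILE_BYTES / (aggregate_mb_s * 1e6) / 60

# Average number of nodes busy at once, if NODE_HOURS is summed task time.
effective_parallelism = NODE_HOURS * 3600 / WALL_SECONDS

print(f"observed throughput:   {observed_mb_s:.2f} MB/s")
print(f"link ceiling:          {aggregate_mb_s:.1f} MB/s")
print(f"minimum transfer time: {min_transfer_minutes:.1f} min")
print(f"effective parallelism: {effective_parallelism:.1f} nodes")
```

Under these assumptions the observed rate of about 5 MB/s sits well below the 25 MB/s the two links could carry, so the arithmetic alone neither confirms nor rules out Ethernet as the bottleneck. That matches the caution above about needing a 1 Gbps cluster to find out.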
An ARM system-on-chip board has demonstrated enough power to form a cluster and process a non-trivial amount of data. The missing piece of the puzzle we have found is not to rely only on Hadoop. We need to choose the right software package. Then, with some engineering effort, we can tune the software to fit the cluster. We have successfully used it to process a single large 34GB file in an acceptable amount of time.
We are looking forward to developing a new one, bigger in CPU cores, but we’ll keep its size small. Yes, and we’re seriously thinking of putting the new one into production.
Ah, I forgot to tell you the cluster’s name. It’s SUT Aiyara Cluster: Mk-I.
Chird Team is made up of several individuals who are passionate about technology. Chird is a young start-up specializing in embedded technology training and embedded product R&D, located in Hangzhou, China.
After carefully assessing numerous board solutions, Chird Team found that CubieTruck works best and fits i-cloud education.
Chird is a well-known embedded training institution and the first to introduce an ARM-based i-cloud service into the education field, using CubieTruck. Chird Team actively promotes reform of the teaching model. It is the first step towards a combined education system with local colleges and universities. As a practical teaching platform, CubieTruck provides an effective way to teach advanced embedded system development in all its dimensions.
Why choose Cubietruck as an embedded i-cloud education platform?
Most traditional embedded training institutions adopt a PC + ARM target board solution. Students work in a virtual machine installed on the PC, so both the utilization of PC resources and the students’ development efficiency are very low. Either way, this does not help students absorb knowledge of the Linux system or understand hardware peripherals.
The Cubietruck, with Debian Linux installed, provides the i-cloud service for students and helps them deepen their understanding of the Linux system through a real runtime environment. Every student gets a user account on the Windows server and connects to it over RDP. The students can study and practice at the Cubietruck terminal. This improves the utilization of server resources.
Some Technical Features
- Cubietruck works as a combination of one server and multiple cloud terminals over a Gigabit network, all sharing the same PC server resources.
- Every student gets a user account on the central PC server running Windows, which is convenient for writing or reviewing documents.
- The Cubietruck, with Debian Linux installed, provides the i-cloud service for students and helps them deepen their understanding of the Linux system through a real runtime environment. Students can also freely log into the Windows system with RDP applications such as rdesktop and freerdp. The Chird classroom transfers data over Gigabit cables and a switch to guarantee a smooth network.
- With a Linux virtual machine installed under the Windows OS of the central PC server, students can connect to the server to compile and develop. This saves precious time, improves work efficiency and reduces the heavy load on the Cubietruck.
The advantages of Cubietruck cloud classroom
- Saves hardware cost: Cubietruck is much cheaper than a PC
- Very convenient to manage and maintain.
- Small size (less than palm size), low power consumption, high integration and high performance
- Makes it easier for students to understand the Linux system
- Provides multiple interfaces and experiment cases for students, to spark their creative ideas
Desire of Chird
Integrate embedded training, embedded teaching plans and embedded product development into a whole. Help train embedded talent and build a bridge between students and company personnel. Form a social circle of embedded engineers, making it easy for everybody to make like-minded friends, keep up with industry trends, and enrich their lives…
DVK521 is an expansion board designed for Cubieboard that integrates various components and interfaces for connecting external accessory boards. It’s ideal for Cubieboard evaluation and development.
- Cubieboard socket
- OV7670 interface: for connecting OV7670 Camera Module
- 8I/Os interface: easily connects to modules controlled by I/Os, such as 8 Push Buttons
- SPI interface: easily connects to SPI modules such as AT45DBXX Dataflash, etc.
- I2C interface: easily connects to I2C modules such as PCF8574 Expansion Module, PCF8563 RTC Module, etc.
- USB interface: USB TO UART, convenient for debugging
- VGA interface: for connecting VGA display Module
- Capacitive touch screen socket: for connecting capacitive touch screen using I2C interface
- 7inch LCD interface: for connecting 7inch LCD
- ONE-WIRE interface: easily connects to ONE-WIRE devices (TO-92 package), such as temperature sensor (DS18B20), electronic registration number (DS2401), etc.
- UART interface (PL2303TA): connects to the UART interface of Cubieboard
- Cubieboard Ext Ports
- 5V/3.3V power input/output: usually used as power output; also provides a common ground with other user boards
- Power indicator
- User LEDs
- User Keys
- Joystick: five positions
- PL2303TA: onboard USB TO UART converter
- 12M crystal: for PL2303TA
- Joystick jumper
- User Keys jumper
- ONE-WIRE jumper
- Buzzer jumper
- User LEDs jumper
- Prototyping area: can be used to place user components for experiments
For jumpers above:
- short the jumper to connect to I/Os used in example code
- open the jumper to connect to other custom pins via jumper wires
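Among the ONE-WIRE devices listed above, the DS18B20 temperature sensor is a common first experiment. On Linux it is usually read through the kernel's 1-Wire drivers, which expose a small text file per sensor (typically `/sys/bus/w1/devices/28-*/w1_slave`). Whether the DVK521 image enables these drivers, and on which pin, is not stated here, so treat the path as an assumption. The Python sketch below only parses the text of such a file; the function name `parse_w1_slave` and the sample reading are hypothetical.

```python
def parse_w1_slave(text):
    """Return the temperature in degrees Celsius from w1_slave file contents.

    Returns None when the driver reports a failed CRC check or when the
    text does not have the expected two-line layout.
    """
    lines = text.strip().splitlines()
    # The first line ends with "YES" when the sensor data passed its CRC check.
    if len(lines) < 2 or not lines[0].strip().endswith("YES"):
        return None
    # The second line ends with "t=<value>" in thousandths of a degree.
    marker = lines[1].rfind("t=")
    if marker == -1:
        return None
    return int(lines[1][marker + 2:]) / 1000.0


if __name__ == "__main__":
    # Made-up sample in the driver's output format, for illustration only.
    sample = (
        "72 01 4b 46 7f ff 0e 10 57 : crc=57 YES\n"
        "72 01 4b 46 7f ff 0e 10 57 t=23125\n"
    )
    print(parse_w1_slave(sample))  # prints 23.125
```

Keeping the parsing separate from the file access makes it easy to test on a desktop machine before the sensor is wired up.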
The User Guide CD includes the development resources listed as follows:
- User manual
- Development environment setup guide
- Kernel porting and configuration guide
- Drivers porting and configuration guide
- Schematic (PDF)
- Demo code drivers and API testing source code
- Linux kernel driver source code, and the high-level API source code based on Linux
- Pre-configured Cubieboard image
- Ubuntu for ARM chips: just write it to an SD card and it’s ready to run
- Related documentation and software
|DS18B20 Temperature sensor|1-WIRE|Y|
|PL2303 USB TO UART|UART|Y|
|7inch multi-color LCD|LCD|Y|
|Capacitive touch screen|I2C|Y|
- DVK521 x 1
- USB type A plug to mini-B plug cable x 1
- 4-pin 2-pin wires pack x 1
- LCD screws pack x 1