黑马程序员 Technical Community

Title: Could a kind passer-by take a look at this problem?~~

Author: Abstact小哲 Time: 2013-9-10 19:40
Last edited by Abstact小哲 on 2013-9-11 11:13

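Write a function that cuts a substring out of a string by byte count, without ever producing half of a Chinese character (GBK code table).

// Example: cutting 2 bytes from "HM程序员" gives "HM", cutting 4 gives "HM程", and cutting 3 must also give "HM" rather than half a Chinese character.

I have no idea how to approach this problem. Could someone give me a pointer?
Thanks in advance~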
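
To see why a cut at 3 bytes is a problem, it may help to look at the byte layout first. The sketch below is only an illustration: `GbkByteLayout` and `hexDump` are hypothetical names, not part of the exercise. It prints the GBK bytes of the example string, where each ASCII letter takes one byte and each Chinese character takes two.

```java
import java.nio.charset.Charset;

class GbkByteLayout {
    // Hypothetical helper: formats the GBK bytes of a string as two-digit hex values.
    static String hexDump(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(Charset.forName("GBK"))) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "H" and "M" take one byte each; each of the three Chinese characters takes two.
        System.out.println(hexDump("HM程序员"));
    }
}
```

The dump has 8 bytes in total, so byte 3 is the first half of "程", which is why the expected result for 3 bytes is still "HM".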

Author: .....淡定 Time: 2013-9-10 19:49

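public static String splitString(String str, int len) {
    // Null or empty input. This must be ||, because a && test can never be true here.
    if (str == null || "".equals(str)) {
        return null;
    }
    byte[] strBytes;
    try {
        strBytes = str.getBytes("GBK");
    } catch (java.io.UnsupportedEncodingException e) {
        // Without GBK support there is nothing to cut; fail here rather than with a NullPointerException below.
        throw new IllegalStateException(e);
    }
    int strLen = strBytes.length; // length of the string in bytes
    if (len >= strLen || len < 1) {
        return str; // out-of-range byte counts return the string unchanged
    }
    // Count the negative bytes among the first len bytes. This assumes that both bytes
    // of every Chinese character are negative, which holds for the GB2312 range but
    // not for every GBK character (a GBK second byte can be 0x40-0x7E).
    int count = 0;
    for (int i = 0; i < len; i++) {
        if (strBytes[i] < 0) {
            count++;
        }
    }
    // Every complete Chinese character uses two bytes but only one char, so subtract count / 2.
    // An odd count means the cut fell inside a character, so drop that half as well.
    if (count % 2 != 0) {
        len = len - count / 2 - 1;
    } else {
        len = len - count / 2;
    }
    return str.substring(0, len);
}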


For reference only. Please don't copy it verbatim.

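Author: 昝文萌 Time: 2013-9-10 20:06
Here is the idea:
1. First convert the source string to a byte array.
String str = "HM程序员";
byte[] origin = str.getBytes("GBK");
2. Take the requested number of bytes from the source byte array to build a new substring.
The new substring is newStr = new String(origin, 0, 3, "GBK");
3. Compare the characters of the new substring and the source string position by position to decide whether the last byte taken is half of a Chinese character.
Suppose the new substring takes three bytes, newStr = new String(origin, 0, 3, "GBK"). Comparing it with the source string position by position, the third character of the source string is "程", while the new substring holds only the first byte of "程", which does not decode to the same character. The two are not equal, so that character is discarded.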
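
The three steps above could be sketched roughly as follows. This is one possible reading of the idea, not necessarily what the poster had in mind: `ComparePrefixSketch` and `cutByCompare` are hypothetical names, and the sketch assumes every character of the input can be encoded in GBK. It relies on the fact that `new String(byte[], int, int, Charset)` replaces a malformed sequence (here, a dangling first byte) with a replacement character, which will not match the original character.

```java
import java.nio.charset.Charset;

class ComparePrefixSketch {
    // Hypothetical helper: keeps the first n GBK bytes of str, dropping a trailing half character.
    static String cutByCompare(String str, int n) {
        Charset gbk = Charset.forName("GBK");
        byte[] origin = str.getBytes(gbk);                    // step 1: string -> bytes
        int take = Math.max(0, Math.min(n, origin.length));   // never read past the array
        String candidate = new String(origin, 0, take, gbk);  // step 2: decode the first bytes
        // Step 3: keep only the leading characters that match the source string.
        int same = 0;
        while (same < candidate.length() && same < str.length()
                && candidate.charAt(same) == str.charAt(same)) {
            same++;
        }
        return str.substring(0, same);
    }

    public static void main(String[] args) {
        String str = "HM程序员";
        for (int n = 2; n <= 4; n++) {
            System.out.println(n + " -> " + cutByCompare(str, n));
        }
    }
}
```

For the example string this gives "HM" for 2 and 3 bytes and "HM程" for 4 bytes. The approach decodes the prefix on every call, so it trades some speed for not having to know the byte ranges of GBK.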

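Author: Yuan先生 Time: 2013-9-10 21:05
public class Demo {
    public static void main(String[] args) throws Exception {
        String str = "HM程序员";
        int num = trimGBK(str.getBytes("GBK"), 4); // HM程
        System.out.println(str.substring(0, num));
    }

    // Returns how many whole characters fit into the first n bytes of a GBK byte array.
    public static int trimGBK(byte[] buf, int n) {
        int num = 0;
        boolean bChineseFirstHalf = false; // true right after the first byte of a Chinese character
        n = Math.min(n, buf.length); // do not read past the end of the array
        for (int i = 0; i < n; i++) {
            if (buf[i] < 0 && !bChineseFirstHalf) {
                // First half of a Chinese character: count it only once its second byte arrives.
                bChineseFirstHalf = true;
            } else {
                num++;
                bChineseFirstHalf = false;
            }
        }
        return num;
    }
}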

Author: 流浪的风 Time: 2013-9-10 21:42

import java.io.UnsupportedEncodingException;
import java.util.Scanner;

public class Test10 {
    public static void main(String[] args) throws UnsupportedEncodingException {
        Scanner sc = new Scanner(System.in); // create a Scanner
        System.out.println("Enter a string");
        String s = sc.nextLine(); // read the string
        System.out.println("Enter the number of bytes");
        int num = sc.nextInt(); // read the byte count
        sc.close(); // close the stream
        String s1 = splitDemo(s, num); // pass both values to the method
        System.out.println("The truncated string is: " + s1); // print the result
    }

    public static String splitDemo(String str, int num)
            throws UnsupportedEncodingException {
        if (str == null) { // the string is null
            return "Your string has no value";
        }
        if ("".equals(str)) { // the string has length zero
            return "Your input is empty";
        }
        int count = 0; // number of negative bytes, used to work out how many Chinese characters there are
        byte[] arr = str.getBytes("GB2312"); // encode the string as a GB2312 byte array
        num = Math.max(0, Math.min(num, arr.length)); // keep num inside the array
        for (int i = 0; i < num; i++) { // count the negative bytes in [0, num)
            if (arr[i] < 0) {
                count++;
            }
        }
        if (count % 2 == 1) {
            // Odd number of negative bytes: the cut falls inside a Chinese character,
            // so drop the dangling first half (3 bytes of "HM程序员" must give "HM").
            num--;
        }
        return new String(arr, 0, num, "GB2312");
    }
}


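I wrote this one myself. In GBK, the bytes of a Chinese character are not necessarily negative (the second byte can be a positive value), so I used the GB2312 code table instead. I hope it solves your problem.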

Name	First byte	Second byte
GB2312	0xB0-0xF7 (176-247)	0xA1-0xFE (161-254)
GBK	0x81-0xFE (129-254)	0x40-0xFE (64-254), excluding 0x7F

That is the difference between the GBK and GB2312 code tables. Hope it helps!
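
The byte ranges in the table also suggest a way to stay with GBK: instead of counting negative bytes, walk the array and treat any byte in 0x81-0xFE as the start of a two-byte character, whatever the sign of the byte that follows. The following is a minimal sketch under that assumption; `GbkLeadByteSketch` and `truncate` are hypothetical names. The character "丂" is used because GBK encodes it as 0x81 0x40, so its second byte is positive.

```java
import java.nio.charset.Charset;

class GbkLeadByteSketch {
    static final Charset GBK = Charset.forName("GBK");

    // Hypothetical helper: returns the longest prefix of str whose GBK encoding fits in maxBytes.
    static String truncate(String str, int maxBytes) {
        byte[] bytes = str.getBytes(GBK);
        int limit = Math.max(0, Math.min(maxBytes, bytes.length));
        int i = 0;
        while (i < limit) {
            int b = bytes[i] & 0xFF; // view the byte as an unsigned value
            // A byte in 0x81-0xFE starts a two-byte character; anything else stands alone.
            int width = (b >= 0x81 && b <= 0xFE) ? 2 : 1;
            if (i + width > limit) {
                break; // the next character would be cut in half, so stop before it
            }
            i += width;
        }
        return new String(bytes, 0, i, GBK);
    }

    public static void main(String[] args) {
        byte[] b = "丂".getBytes(GBK);
        System.out.printf("%02X %02X%n", b[0] & 0xFF, b[1] & 0xFF);
        System.out.println(truncate("A丂B", 2));
        System.out.println(truncate("HM程序员", 3));
        System.out.println(truncate("HM程序员", 4));
    }
}
```

With this walk, 3 bytes of "HM程序员" give "HM" and 4 bytes give "HM程", and 2 bytes of "A丂B" give "A", a case where counting negative bytes would miscount.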

Welcome to the 黑马程序员 Technical Community (http://bbs.itheima.com/)
